
4. MODEL ADAPTATION

4.1 EXPLANATION OF THE ADAPTATION PERFORMED

In this part, we used the success and surprise indicators defined in Section 7.2.3 to evaluate

Dedalo’s performance.

On Dedalo’s knowledge. First, we analysed what and how much Dedalo “knew” (or did

not). To do so, we compared DSX and DSP (Dedalo’s success and surprise), as in Figure 7.10.

We finally distinguished three groups of performance.

Figure 7.10 Dedalo’s Success Rate (DSX) and Surprise Rate (DSP).

A first group includes trends where Dedalo’s success (in blue) was higher than its surprise (in red); see the trends “A Song of Ice and Fire”, “Germany”, “Hurricane”, “Rugby” and “Turkey”. Here, users declared that Dedalo was able to find a high number of explanations (high DSX) while missing few (low DSP). In those cases, we considered Dedalo’s performance to be satisfactory.

A second group consists of trends with an acceptable performance, i.e. “Brazil”, “Daniel Radcliffe”, “Italy”, “Obama” and “Wembley”. Here, the explanations Dedalo missed (DSP) outnumbered the ones it successfully found (or equalled them; see “Obama”); however, although explanations were missed by Dedalo (high DSP), many others were actually found (high DSX). Such a performance is again motivated by the nature of the trend: some of the peaks relate to events that are too specific to be identified by induction, and this prevented Dedalo from finding explanations that the users, on their side, could find more easily. For example, Dedalo did find the generic explanations for both “Italy” and “Brazil”, i.e. the ones related to the Football World Cup were indeed identified and ranked high; nevertheless, unique events such as Italy’s disasters or Brazil’s riots could not be found.

The last group, for which the DSX rate was really low, corresponds to the “negative examples” (cf. the cases marked ✗ in Table 7.1), for which Dedalo was able to find the right explanations, but not to rank them among the top 10.
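The three groups above can be read as a simple decision rule over the two rates. The following is a minimal sketch under stated assumptions: the function name `performance_group` and the cut-off `low_dsx` are hypothetical, since the text only says the DSX rate was "really low" for the last group and gives no numeric threshold.

```python
def performance_group(dsx: float, dsp: float, low_dsx: float = 0.1) -> str:
    """Assign a trend to one of the three performance groups.

    dsx: Dedalo's success rate for the trend.
    dsp: Dedalo's surprise rate for the trend.
    low_dsx: hypothetical cut-off for a "really low" success rate
             (an assumption; the source states no numeric value).
    """
    if dsx < low_dsx:
        # Right explanations may exist but were not ranked among the best.
        return "negative example"
    if dsx > dsp:
        # More explanations found than missed.
        return "satisfactory"
    # As many or more explanations missed than found, yet many still found.
    return "acceptable"


# Illustrative values only, not figures from the study.
print(performance_group(dsx=1.5, dsp=0.5))
print(performance_group(dsx=1.0, dsp=1.0))
```

A tie between the two rates falls into the "acceptable" group, which mirrors how the text treats “Obama”.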

On the users’ knowledge. In this case, we focused on what users knew and what they missed. Since we could not directly measure the surprise of the users, we compared the number of explanations that the users did not think about (USP) with the ones Dedalo had found (DSX), and estimated whether, and how much, users did not know.

Again, the idea here was that if an explanation edi did not appear in D0 of Task 2, but did appear in the ranked edi in R of Task 3, then this explanation was one the user did not think about. We also put USP and DSX in relation to the usage of external background knowledge to see whether there was a correlation with the two indicators. Results are presented in Table 7.3.
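The idea behind the user surprise can be sketched as a set comparison: an explanation counts as one the user did not think about if the user ranked it in Task 3 but had not given it in Task 2. The function name and the sample explanations below are hypothetical, and the sketch only collects the unthought explanations; how they are normalised into the USP rate is defined in Section 7.2.3 and is not reproduced here.

```python
def unthought_explanations(task2_given, task3_ranked):
    """Return the explanations ranked in Task 3 that were absent from Task 2.

    task2_given: explanations the user proposed on their own (Task 2).
    task3_ranked: explanations the user ranked among Dedalo's results (Task 3).
    Ranking order is preserved in the returned list.
    """
    given = set(task2_given)
    return [e for e in task3_ranked if e not in given]


# Illustrative input only, not data from the study.
task2 = ["football world cup"]
task3 = ["football world cup", "national elections", "street protests"]
print(unthought_explanations(task2, task3))
```

Exact string matching is a simplification; in practice a user's free-text answer would have to be aligned with Dedalo's explanations before such a comparison.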

In general, we observed that there were always one or two explanations per trend that the users missed. More specifically, we distinguished four behaviours (®, ©, ™, ´) among the users, presented below.

Case ®. Trends where users knew the topic well enough not to need much external knowledge, but still missed some explanations. In those cases, USP was higher than DSX, but the percentage of external knowledge was low, which confirmed that few users needed it. See, for instance, “Obama” (it was easy to relate Obama to the U.S. presidential elections, but not to the federal elections), “Brazil” (assuming one was not expert enough to know the exact dates of the World Cup), and “Turkey”, for which users probably thought about the country while the trend clearly refers to the searches for turkey recipes during Thanksgiving and Christmas.

Case ™. Popular trends the users knew about, for which external knowledge was only a support to confirm their explanations, and for which they missed very few explanations. This was the case of “A Song of Ice and Fire”, “Italy”, “Rugby” or “Daniel Radcliffe”,

Table 7.3 Comparison between the user surprise (USP) and Dedalo’s success (DSX). We also include the use of external knowledge (EK), measured as the percentage of users that filled the final box in Task 1 with a non-negative answer (i.e. they relied on some external knowledge to give their explanations).

Case  Trend                   % EK    USP    DSX
©     Taylor                  78.57   0.42   0.28
™     A Song of Ice and Fire  62.16   0.70   1.65
™     Rugby                   58.82   1.35   1.76
™     Italy                   52.00   0.52   1.12
™     Daniel Radcliffe        51.85   0.40   1.00
´     Wembley                 47.61   1.09   1.09
©     Dakar                   47.36   0.42   0.05
®     Obama                   45.83   1.79   1.37
®     Brazil                  43.75   1.34   1.09
™     How I met your Mother   36.36   0.36   0.51
´     Hurricane               33.33   1.57   1.81
®     Turkey                  23.80   1.00   0.95
´     Germany                 21.73   0.73   1.21

Case ©. Trends, like “Taylor” or “Dakar”, that the users did not know well enough, where their surprise was high (USP higher than DSX). Considering that in Dedalo’s top 20 explanations

neither Taylor Swift nor the Paris-Dakar rally were mentioned, we concluded that users might have misunderstood those trends and wrongly ranked some explanations in Task 3. The high percentage of external knowledge usage confirms this idea.

Case ´. Miscellaneous trends of average popularity, such as “Germany”, “Hurricane” or “Wembley”, for which USP is lower, but still balanced, when compared to DSX. The usage of external knowledge is low in those cases.
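The relations that the four cases describe can be checked directly against the rows of Table 7.3. The sketch below copies the table values as given; the helper names (`rows_for`, `mean_ek`) are hypothetical, and the per-case average of % EK is only a convenience summary for this illustration, not a figure reported in the study.

```python
# Rows copied from Table 7.3: (case, trend, % EK, USP, DSX).
TABLE_7_3 = [
    ("©", "Taylor", 78.57, 0.42, 0.28),
    ("™", "A Song of Ice and Fire", 62.16, 0.70, 1.65),
    ("™", "Rugby", 58.82, 1.35, 1.76),
    ("™", "Italy", 52.00, 0.52, 1.12),
    ("™", "Daniel Radcliffe", 51.85, 0.40, 1.00),
    ("´", "Wembley", 47.61, 1.09, 1.09),
    ("©", "Dakar", 47.36, 0.42, 0.05),
    ("®", "Obama", 45.83, 1.79, 1.37),
    ("®", "Brazil", 43.75, 1.34, 1.09),
    ("™", "How I met your Mother", 36.36, 0.36, 0.51),
    ("´", "Hurricane", 33.33, 1.57, 1.81),
    ("®", "Turkey", 23.80, 1.00, 0.95),
    ("´", "Germany", 21.73, 0.73, 1.21),
]


def rows_for(case):
    """Return all table rows labelled with the given case symbol."""
    return [row for row in TABLE_7_3 if row[0] == case]


def mean_ek(case):
    """Average % EK over the trends of one case (illustrative summary)."""
    rows = rows_for(case)
    return sum(row[2] for row in rows) / len(rows)


# Cases ® and ©: users missed more than Dedalo found (USP > DSX).
usp_above_dsx = {
    case: all(usp > dsx for _, _, _, usp, dsx in rows_for(case))
    for case in ("®", "©")
}

# Cases ™ and ´: users missed no more than Dedalo found (USP <= DSX).
usp_not_above_dsx = {
    case: all(usp <= dsx for _, _, _, usp, dsx in rows_for(case))
    for case in ("™", "´")
}

print(usp_above_dsx, usp_not_above_dsx)
for case in ("®", "©", "™", "´"):
    print(case, round(mean_ek(case), 2))
```

Keeping the rows as plain tuples makes it easy to compare each stated relation with the published numbers without introducing extra dependencies.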

Error Interpretation and Possible Solutions. We finally analysed those cases in which Dedalo did not perform well, and found three major reasons.

In the case of “How I met your Mother”, the pattern was big and noisy: the number of peaks actually unrelated to the trend was too high, and this prevented Dedalo from inducing the correct explanation(s). This was therefore a pattern quality problem. The same reason explains why some of the #Tre trends, related either to too specific entities (people such as “Carla” or “Beppe”, TV series such as “Sherlock”) or unique events (“Volcano”, “AT&T”, “Bitcoin”, “Star Wars”⁷) were excluded from the user evaluation. While noisy patterns may be refined to obtain good explanations, specific events cannot be explained using an inductive

⁷ Only two out of the seven current Star Wars movies have been released after 2004, when Google Trends

