1.1 Justification
1.1.4 Formulation of the Research Problem
cup brazil”, referring to “world cup 2014” (Kanhabua and Nørvåg, 2010a). In these cases, the implied time period can be inferred to temporally augment the query and enable temporal relevance modelling to improve retrieval.
3.10 Time in Matching/Relevance
Matching is the process of identifying information items in the collection that are expected to satisfy the user’s information need. Relevance quantifies the strength of a match and facilitates relevance ranking, whereby items with the greatest expected utility can be presented to the user first. In the time-aware IR model presented in Figure 3.1, matching and relevance are characterised in part (6).
Research on the fundamental nature of relevance as a dynamic and multi-dimensional concept has long been at the heart of the interface between information science and IR. Several extensive works have wrestled with philosophical, cognitive and practical definitions of relevance (Saracevic, 2007). Context is a fundamental part of relevance. Since context is dynamic, so too is relevance, both across users and for the same user over time (Saracevic, 2007). Elaborating on the role of time, Hjørland (2010) suggests that user-based relevance theories tend to ignore the social nature of the world, of which time is a key element, instead treating the user as an isolated individual. User-oriented relevance research has concentrated on discovering a general psychological mechanism residing in the mind of each individual user. Instead, Hjørland (2010) recognises that knowledge is constantly expanding and changing. Therefore, since relevance has been shown to be closely related to the user’s previous information interaction, he posits that information needs, or relevance as it develops in the mind of a user, cannot be understood without regard to the development of our collective knowledge, and thus to the time reflected throughout information collections.
In this section I examine work on temporal relevance modelling, which incorporates temporal aspects into IR relevance models that have conventionally focused solely on topical relevance.
3.10.1 Implicit Temporal Clues: Relevant Item Distribution
Several temporal dynamics arise when relevance is considered over time. Firstly, in a collection where items are timestamped, the post-retrieval timestamp distribution of the top retrieved items (i.e., those deemed most relevant) often contains implicit time clues about events and phenomena related to the query posed. Secondly, the items considered relevant to a given query may change over time as the intentions underlying users’ information needs change.
Figure 3.4: Temporal dynamics in the timestamp distribution of relevant documents for TREC-2 topic 64 (“hostage taking”) in the AP 88-89 news wire collection.
These items may or may not be timestamped; hence, more subtle temporal behavioural factors are involved. I discuss these temporal dynamics in the following subsections.
Retrieved Item Timestamp Distribution
Collections containing timestamped information items often cover events (e.g., news with a publication time, or email with a sent/received time). Many information needs for these collections are time-sensitive in the sense that the topics they cover may relate to specific periods in time (Dakka et al., 2012; Jones and Diaz, 2007).
Figure 3.4 illustrates the temporal dynamics in the relevant document timestamp distribution for the query “hostage taking” in the TREC AP88-89 news wire collection¹.
While hostage taking is discussed for the duration of the collection, there are clear events represented by spikes in relevant documents, most notably in May 1988 and August 1989. These periods correspond with reporting on major hostage taking events.
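Spikes like these can also be located automatically. The sketch below flags periods whose relevant-document count exceeds the mean by more than k population standard deviations. This simple threshold rule, the function name `detect_bursts` and the toy counts are hypothetical illustrations; they are neither the data behind Figure 3.4 nor a method taken from the works cited in this section.

```python
from statistics import mean, pstdev


def detect_bursts(counts, k=2.0):
    """Return indices of periods whose count exceeds mean + k * population std dev.

    Hypothetical burst heuristic used only for illustration.
    """
    threshold = mean(counts) + k * pstdev(counts)
    return [i for i, c in enumerate(counts) if c > threshold]


# Hypothetical monthly counts of relevant documents (toy data).
monthly_counts = [3, 4, 2, 5, 30, 4, 3, 2, 4, 3, 25, 3]
print(detect_bursts(monthly_counts, k=1.5))  # → [4, 10]
```

A lower `k` flags more periods as bursts. Mean-plus-deviation thresholds are only a crude stand-in for dedicated burst detection models.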
Following an initial retrieval, and with an intuition analogous to pseudo-relevance feedback, the patterns in the timestamp distribution of the top-k topically relevant results may be considered an indicative sample of the time distribution of all relevant results (e.g., periodic, or skewed towards recent items). A substantial body of work exploits this temporal dynamic in time-aware IR tasks, including identifying highly relevant time periods for exploration, diversifying results, query expansion and query performance prediction.
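The sketch below is a minimal illustration of this idea, assuming each retrieved item carries a publication date. It bins the timestamps of the top-k results by month and normalises them into a temporal profile. The helper `temporal_profile`, the monthly granularity and the unweighted counting are illustrative assumptions. Weighting each item by its retrieval score would be an equally plausible choice.

```python
from collections import Counter
from datetime import date


def temporal_profile(ranked_dates, k=10):
    """Normalised distribution over (year, month) bins of the top-k results' dates."""
    counts = Counter((d.year, d.month) for d in ranked_dates[:k])
    total = sum(counts.values())
    return {bin_: c / total for bin_, c in sorted(counts.items())}


# Publication dates of a hypothetical ranked result list, best first.
ranked = [date(1988, 5, 3), date(1988, 5, 10), date(1989, 8, 1),
          date(1988, 5, 20), date(1989, 8, 15)]
print(temporal_profile(ranked, k=4))  # → {(1988, 5): 0.75, (1989, 8): 0.25}
```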
Jones and Diaz (2007) propose several techniques to identify and classify temporal information needs based on the relevant document timestamp distribution. Following classification of a temporal query, query performance prediction is enhanced by modelling the time-based distribution of all results.
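One way to quantify how temporally focused a query is uses the KL divergence between the query's smoothed temporal profile and the background temporal profile of the collection. This follows the spirit of the profile-based features of Jones and Diaz (2007). The sketch below assumes both profiles are dictionaries over the same time bins, with positive background mass in every bin. The linear smoothing weight `lam` and the function names are hypothetical, and the exact formulation in the original work may differ.

```python
import math


def smooth(query_profile, background, lam=0.9):
    """Linearly interpolate P(t|q) with the background P(t|C) over the background's bins."""
    return {t: lam * query_profile.get(t, 0.0) + (1 - lam) * background[t]
            for t in background}


def temporal_kl(query_profile, background, lam=0.9):
    """KL(P(t|q) || P(t|C)) in nats; larger values mean a more temporally focused query."""
    p = smooth(query_profile, background, lam)
    return sum(p[t] * math.log(p[t] / background[t]) for t in p if p[t] > 0)


# Hypothetical background: four equally likely months.
background = {"m1": 0.25, "m2": 0.25, "m3": 0.25, "m4": 0.25}
peaked = temporal_kl({"m1": 1.0}, background)           # query concentrated in one month
broad = temporal_kl({"m1": 0.5, "m2": 0.5}, background)  # query spread over two months
print(peaked > broad)  # → True
```

A profile identical to the background scores zero. The more the top results concentrate in a few periods, the larger the divergence.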
Dakka et al. (2012) use the relevant item timestamp distribution to estimate temporal priors, which are used as feedback for established retrieval models to improve retrieval effectiveness. Efron et al. (2014) extend this idea to microblog search and propose a framework for what they term the ‘temporal cluster hypothesis’, positing that relevant documents often cluster together in time (i.e., around a specific event). Rather than modelling priors, IR effectiveness in this scenario is improved by favouring temporally adjacent results using a temporal density function based on kernel density estimation. Meanwhile, Berberich and Bedathur (2013) diversify retrieval results by selecting the most relevant items from all temporal clusters. Further exploiting the fact that relevant documents for many time-based information needs tend to cluster in time, Peetz et al. (2014) extract distinctive terms found in bursts of documents detected in the relevant item timestamp distribution in order to expand queries and improve retrieval performance. Similarly, Massoudi et al. (2011) improve retrieval by expanding queries with distinctive terms found in the most recent relevant items, thus capturing only the most recent discussion topics.
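The kernel density idea behind the temporal cluster hypothesis can be sketched as follows. A Gaussian kernel density is estimated over the timestamps (in days) of the top-k results, and each result's normalised retrieval score is then linearly interpolated with its normalised temporal density. The bandwidth, the interpolation weight `alpha`, the assumption of positive retrieval scores and all names are hypothetical choices for illustration. This is not the exact model of Efron et al. (2014).

```python
import math


def make_kde(points, bandwidth):
    """Return a Gaussian kernel density estimate over 1-D points (e.g. days)."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))

    def density(t):
        return norm * sum(math.exp(-0.5 * ((t - x) / bandwidth) ** 2) for x in points)

    return density


def temporal_rerank(results, k=10, bandwidth=2.0, alpha=0.3):
    """Re-rank (doc_id, score, day) tuples, sorted by score, by favouring items
    that lie in dense temporal regions of the top-k results.

    Assumes retrieval scores are positive (e.g. BM25-like) and results is non-empty.
    """
    density = make_kde([day for _, _, day in results[:k]], bandwidth)
    max_score = max(score for _, score, _ in results)
    dens = {doc: density(day) for doc, _, day in results}
    max_dens = max(dens.values())

    def combined(item):
        doc, score, _ = item
        # Linear interpolation of normalised topical score and temporal density.
        return (1 - alpha) * score / max_score + alpha * dens[doc] / max_dens

    return [doc for doc, _, _ in sorted(results, key=combined, reverse=True)]


# Toy results: (doc_id, retrieval score, publication day).
results = [("d1", 10.0, 0), ("d2", 9.0, 100), ("d3", 8.5, 1), ("d4", 8.0, 2)]
print(temporal_rerank(results, k=3))  # → ['d1', 'd3', 'd4', 'd2']
```

In the toy example, the temporally isolated item `d2` drops below the items clustered around days 0 to 2, even though its topical score is higher.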
Since the relevant item timestamp distribution is an implicit artefact of the retrieval model, it is relatively sensitive to the effectiveness of the underlying retrieval model. Consequently, when the initial retrieval is poor, it is unlikely to reveal an accurate timestamp distribution of relevant results. Further work is therefore needed to understand how reliable this feature is for temporal modelling in different scenarios.
Relevant Item Distribution
In contrast to the relevant item timestamp distribution, temporal dynamics in the relevant items themselves may be observed as users select different items for the same query over time (thus indicating changes in temporal relevance). This change reflects a shift in the intent underlying a user’s query. This notion of time-sensitive relevance was first noted by Kulkarni et al. (2011) after observing increased entropy in search result clicks for some queries over time.
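Click entropy per time window can be computed as a simple Shannon entropy over the clicked URLs for a query. A rise in entropy across windows suggests that users' intents are diversifying. The log format of `(window, url)` pairs, the monthly windows and the function names are illustrative assumptions, not the experimental setup of Kulkarni et al. (2011).

```python
import math
from collections import Counter, defaultdict


def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of the distribution of clicked URLs."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def entropy_by_window(click_log):
    """Group (window, url) click records by window and return {window: entropy}."""
    by_window = defaultdict(list)
    for window, url in click_log:
        by_window[window].append(url)
    return {w: click_entropy(urls) for w, urls in sorted(by_window.items())}


# Hypothetical click log for a single query.
log = [("2011-01", "a"), ("2011-01", "a"), ("2011-02", "a"), ("2011-02", "b")]
print(entropy_by_window(log))  # → {'2011-01': 0.0, '2011-02': 1.0}
```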
For example, consider the non-specific web search query “party planning”. Depending on the time of year, the type of party the user is likely to be planning will vary, and so too will the relevant search results they are likely to desire. To illustrate the change in party planning intent, Figure 3.5 presents the temporal dynamics in popularity of each party-type intent over two years of Google Trends data, e.g., “halloween”, “christmas”,
Figure 3.5: Temporal dynamics of six possible intents for the web search query “party planning”, based on data from Google Trends.
“easter”, “birthday” and “summer”. Also included is the scaled popularity of the “party planning” query itself, showing occasional temporal correlation with the intent popularities – for relatively large query bursts around major events – over time. Note that the scaling is relative only, since Google does not release absolute popularity statistics for queries.
Figure 3.5 shows that birthday parties are relatively uniformly popular throughout the year, albeit with a decline over summer in favour of more general summer parties. Clear temporal dynamics indicate the seasonal change in party planning intent, such as Christmas party planning in the run-up to Christmas. While most of the temporal dynamics in this example are periodic, there could equally be one-off, short-term bursts of changing query intent, such as for “street party planning” during the British Queen’s Diamond Jubilee celebrations¹ in the United Kingdom at the start of June 2012.
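Popularity series such as those in Figure 3.5 might serve as a rough proxy for the distribution of intents at each point in time. Under that assumption, the series can be normalised per period to estimate which intent dominates. The sketch further assumes the series share a common scale. The values below are made-up toy numbers, not Google Trends data, and `intent_distribution` and `dominant_intent` are hypothetical helpers.

```python
def intent_distribution(popularity):
    """popularity maps intent -> list of values per period (common scale assumed).
    Returns one {intent: share} dictionary per period."""
    intents = list(popularity)
    n_periods = len(popularity[intents[0]])
    dists = []
    for i in range(n_periods):
        total = sum(popularity[q][i] for q in intents)
        dists.append({q: (popularity[q][i] / total if total else 0.0) for q in intents})
    return dists


def dominant_intent(popularity):
    """Most popular intent in each period."""
    return [max(d, key=d.get) for d in intent_distribution(popularity)]


# Made-up popularity values for three periods (not Google Trends data).
popularity = {
    "birthday": [50, 50, 50],
    "christmas": [5, 10, 80],
    "summer": [10, 60, 5],
}
print(dominant_intent(popularity))  # → ['birthday', 'summer', 'christmas']
```

The per-period shares could then weight results for each intent when the ambiguous query is issued at that time.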
Keikha et al. (2011) take a different view of the temporal dynamics of relevant items to improve blog discovery. To this end, they examine the relevance of a blog to a given query over different periods of time. A blog that is consistently relevant to a topic across all time periods is considered most relevant to the query.
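One simple way to reward this kind of consistency is to score each blog per time period and then aggregate the scores so that stable relevance beats a single burst. The sketch below uses the mean per-period score minus a penalty on its standard deviation. This aggregation, the `penalty` parameter and the function names are hypothetical, and they are not the model proposed by Keikha et al. (2011).

```python
from statistics import mean, pstdev


def consistency_score(period_scores, penalty=1.0):
    """Mean per-period relevance minus a penalty on its spread across periods."""
    return mean(period_scores) - penalty * pstdev(period_scores)


def rank_blogs(blog_period_scores, penalty=1.0):
    """Rank blog ids by the consistency of their per-period relevance scores."""
    return sorted(
        blog_period_scores,
        key=lambda blog: consistency_score(blog_period_scores[blog], penalty),
        reverse=True,
    )


# Toy per-period relevance scores: same mean, different stability.
scores = {"bursty": [0.0, 0.0, 0.0, 2.0], "steady": [0.5, 0.5, 0.5, 0.5]}
print(rank_blogs(scores))  # → ['steady', 'bursty']
```

A mean-only ranking would tie these two blogs. The spread penalty is what separates them.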
Central to many of the temporal dynamics themes discussed throughout this thesis, Radinsky
¹ The Diamond Jubilee event was the period of national celebration for the 60-year reign of Queen Elizabeth
3.11 Time in Information Collections