Encoding and decoding

3. Linear codes

3.4. Encoding and decoding

In this thesis, query intent is considered in terms of the popularity of user needs. In other words, query intent is defined as the most popular understanding of the information need of a typical user in Web search. For this reason, throughout the labeling process, the annotators were asked to judge the assumed commercial intent of the search queries from the perspective of a general user. However, different users may issue the same query with varying information needs, which means that query intent identification is inherently ambiguous. Novelty and diversity (Agrawal et al., 2009; Clarke et al., 2008) behind users’ queries are another aspect that could be targeted in sponsored search. It is unrealistic to assume that the relevance of an ad to the commercial need of a user is independent of the neighboring ads displayed at the same time. Under this assumption, redundant ads, in terms of their topic coverage, may be shown for a given query. For example, the ads for the query “Banff Alberta” should be diverse, covering different topics about Banff, such as “Hotels in Banff”, “Vacation Packages in Banff”, “Hot Springs in Banff”, and “Banff Helicopter Tour”, rather than dedicating several ads to pages specifically on hotel packages for Banff.
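The Banff example can be made concrete with a minimal sketch of greedy diversification. This is an illustration only, not a method from the thesis: the tuple format, the relevance scores, the topic labels, the extra ad title “Banff Hotel Deals”, and the `penalty` parameter are all assumptions introduced here.

```python
def select_diverse_ads(candidates, k, penalty=0.5):
    """Greedily pick k ads, discounting ads whose topic is already covered.

    candidates: list of (title, topic, relevance) tuples (hypothetical format).
    penalty: multiplicative discount applied once per earlier ad on the same topic.
    """
    selected = []
    topic_counts = {}
    remaining = list(candidates)

    def discounted(ad):
        # Relevance shrinks geometrically with the number of same-topic ads chosen.
        _, topic, relevance = ad
        return relevance * (penalty ** topic_counts.get(topic, 0))

    while remaining and len(selected) < k:
        best = max(remaining, key=discounted)
        remaining.remove(best)
        selected.append(best)
        topic_counts[best[1]] = topic_counts.get(best[1], 0) + 1
    return selected


# Made-up candidates for the query "Banff Alberta"; scores are illustrative.
ads = [
    ("Hotels in Banff", "hotels", 0.9),
    ("Banff Hotel Deals", "hotels", 0.85),
    ("Vacation Packages in Banff", "packages", 0.7),
    ("Hot Springs in Banff", "hot springs", 0.6),
    ("Banff Helicopter Tour", "tours", 0.5),
]
picked = select_diverse_ads(ads, k=4)
picked_titles = [title for title, _, _ in picked]
```

With these illustrative scores, ranking by relevance alone would place two hotel ads in the top four, whereas the discounted selection keeps one hotel ad and fills the remaining slots with the other three topics.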

As Web queries have different meanings for different users, the results shown for a query should reflect the diversity of its topics. Abandonment appears to be more likely for ads than for regular Web results, due either to users’ bias against ads or to the lack of coverage of the topic of interest. Returning similar ads, each strongly relevant to a given query, may produce a high score on a standard evaluation measure (Aslam et al., 2005; Sakai, 2009), but would certainly be viewed unfavorably by a user who is interested in a rarer and rather different aspect of the same query, and therefore it may not score high on intent-aware evaluation measures (Ashkan and Clarke, 2011; Clarke et al., 2011). Possible future work in this direction could evaluate ads in a context that takes the diversity of displayed results into account, as well as their relevancy.
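The gap between per-ad relevance and topic coverage can be illustrated with subtopic recall, the fraction of a query's topics covered by the top-k results. This is a deliberately simple coverage measure swapped in for illustration; it is not one of the intent-aware measures cited above, and the topic labels are hypothetical.

```python
def subtopic_recall(ad_topics, query_topics, k):
    """Fraction of the query's topics covered by the top-k ads."""
    covered = set(ad_topics[:k]) & set(query_topics)
    return len(covered) / len(query_topics)


# Hypothetical topics for one query and two candidate ad lists.
query_topics = ["hotels", "packages", "hot springs", "tours"]
redundant = ["hotels", "hotels", "hotels", "hotels"]
diverse = ["hotels", "packages", "hot springs", "tours"]

redundant_score = subtopic_recall(redundant, query_topics, k=4)  # 0.25
diverse_score = subtopic_recall(diverse, query_topics, k=4)      # 1.0
```

Every ad in both lists is on a topic of the query, so a measure that only counts relevant ads would not separate them, while the coverage measure does.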

The query intent analysis in this work is limited to the commercial category of query intent, its sub-categories in terms of brand/retailer/product information, and the traditional categories of query intent, navigational and informational. This study can be seen as an initial step towards a long-term goal of extending the traditional categories of Web queries by developing and evaluating the seeds of a taxonomy for commercial search. Expanding this taxonomy, and studying and comparing the clickthrough behavior for different dimensions of commercial intent, are future directions for this work.

The empirical evaluations of this study are limited to a data set from a single commercial search engine. An interesting experiment would be to replicate this work over other sources of data and compare the results. The available data consists of a sample of SERPs from a larger pool of data, and the thesis assumes that this sample is independent and identically distributed (i.i.d.). Once the SERPs are sorted by time, they are treated as a sequence of result pages that can be targeted in an online setting. The complete pool of data with the sorted SERPs would give a more faithful picture of reality.

Furthermore, the limitations of the available data set constrain the experimental studies performed in the thesis. For instance, the location of the ads is not recorded in the data, which creates ambiguity in location-based analysis. For this reason, in the last round of experiments in Chapter 5, a sample of search result pages with eight ads displayed is used, so that the location of each ad can be identified with certainty.

textual factors would be helpful in clickthrough analysis. An effective way of evaluating the contribution of contextual factors would be to run a general click prediction model based on the content of the ads twice, once with and once without context-based information, and compare the two results. This could not be studied in the thesis because the data contains no information about the content of ads.

The evaluation results from Chapter 5 indicate that significant improvements can be achieved in click prediction once the overall quality of ads shown on a result page, along with location bias and query biases, is taken into consideration. Further investigation in this direction over other well-known click models is among the future directions for this work, as is comparing the effectiveness of these factors in sponsored search versus organic search.

With respect to the user behavior modeling studies conducted towards the end of the thesis, the simplifying assumptions regarding a user’s approach to browsing an ad list introduce limitations into the work. Instead of linearly browsing through the list, a user may randomly view an ad at a particular rank position or location, or may move up and down in the list during a browsing session. However, the cascade assumption of linear browsing enables us to represent the behavior of the majority of users, which may be considered a reasonable starting point for better understanding user behavior in sponsored search. It also enables us to model ads in the context of the preceding ads. More complex models are required in the future in order to address random viewing and skipping over different positions in the ad list.
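To make the cascade assumption concrete, the sketch below computes click probabilities under the basic cascade model, in which a user scans the list from the top, clicks the first ad found attractive, and then stops. This is the textbook form of the assumption, not necessarily the exact model used in the thesis, and the attractiveness values are made up for illustration.

```python
def cascade_click_probabilities(attractiveness):
    """Click probability per rank under the basic cascade model.

    attractiveness: per-ad probability of a click given that the ad is examined
    (hypothetical values), listed from the top rank down.
    """
    probs = []
    p_reach = 1.0  # probability that the user reaches the current rank
    for a in attractiveness:
        probs.append(p_reach * a)
        # The user continues only if the current ad was examined and not clicked.
        p_reach *= 1.0 - a
    return probs


click_probs = cascade_click_probabilities([0.5, 0.5, 0.2])
```

Each ad's click probability depends on the ads ranked above it, which is what allows an ad to be modeled in the context of the preceding ads; random viewing or skipping would break the product form used for `p_reach`.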

Last, but not least, variability of user behavior is modeled through parameters defined over queries, on the assumption that users issuing the same query generally behave similarly. As indicated by Carterette et al. (2012), no two users interact with a system in exactly the same way. While a query-based representation of users appears to be effective in the domain of sponsored search (Yan et al., 2009), a more realistic assumption would define the parameters over user/query pairs.

Appendix A

Details of Parameter Estimation for the Location- and Query-Aware Model

Details on the inference algorithm used throughout Chapter 5 are provided in this appendix. Note that the variables across this section are mostly used in their general form. By adding a superscript j to the variables, the same formulations can be used for a particular SERP j from the search log.
