

State-of-the-Art and Its Limitations: Collaborative filtering algorithms are at the heart of recommender systems for items like movies, cameras, restaurants and beer. Most of these methods exploit user-user and item-item similarities in addition to the history of user-item ratings, with similarities based on latent factor models over user and item features [Koren 2015] and, more recently, on explicit links and interactions among users [Guha 2004b, West 2014].

All these data evolve over time, leading to bursts in item popularity and other phenomena like anomalies [Günnemann 2014]. State-of-the-art recommender systems capture these temporal aspects by introducing global bias components that reflect the evolution of the user and community as a whole [Koren 2010]. A few models also consider changes in the social neighborhood of users [Ma 2011]. What is missing in all these approaches, though, is the awareness of how experience and maturity levels evolve in individual users.

Individual experience is crucial in how users appreciate items, and thus react to recommendations. For example, a mature cinematographer would appreciate tips on art movies much more than recommendations for new blockbusters. Also, the facets of an item that a user focuses on change with experience. For example, a mature user pays more attention to narrative, light effects, and style rather than to actors or special effects. Similar observations hold for ratings of wine, beer, food, etc.

Our approach advances the state of the art by tapping review texts, modeling their properties as latent factors, and using them to explain and predict item ratings as a function of a user's experience evolving over time. Prior works considering review texts (e.g., [McAuley 2013a, Wang 2011b, Mukherjee 2014a, Lakkaraju 2011]) did this only to learn topic similarities in a static, snapshot-oriented manner, without considering time at all. The only prior work that considers time [McAuley 2013b] ignores the text of user-contributed reviews when harnessing user experience. However, a user's experience and her interest in specific item facets at different timepoints can often be observed only indirectly through her ratings, and more vividly through her vocabulary and writing style in reviews.

Consider the reviews and ratings by a user on a Canon DSLR camera about the facet lens at two different timepoints in his lifecycle in the electronics review community.

Example IV.2.1 [Posted on: August, 1997]: My first DSLR. Excellent camera, takes great pictures in HD, without a doubt it brings honor to its name. [Rating: 5]

[Posted on: October, 2012]: The EF 75-300 mm lens is only good to be used outside. The 2.2X HD lens can only be used for specific items; filters are useless if ISO, AP, ... The short 18-55mm lens is cheap and should have a hood to keep light off lens. [Rating: 3]

The user was clearly an amateur when posting the first review, whereas he is clearly more experienced some fifteen years later when writing the second review, and more reserved about the lens quality of that camera model.

IV.2. Motivation and Approach

Future recommendations for this user should take into consideration his evolved maturity at the current timepoint.

As another example, consider the following reviews of Christopher Nolan movies where the facet of interest is the non-linear narrative style.

Example IV.2.2 User 1 on Memento (2001): "Backwards told is thriller noir-art empty ultimately but compelling and intriguing this."

User 2 on The Dark Knight (2008): “Memento was very complicated. The Dark Knight was flawless. Heath Ledger rocks!”

User 3 on Inception (2010): “Inception is a triumph of style over substance. It is complex only in a structural way, not in terms of plot. It doesn’t unravel in the way Memento does.”

The first user does not appreciate complex narratives, making fun of them by writing her review backwards. The second user prefers simpler blockbusters. The third user seems to appreciate the complex narration style of Inception and, even more so, of Memento. We would take this maturity level of the more experienced User 3 into account when generating future recommendations for her.

We model the joint evolution of user experience, interests in specific item facets, rating behavior, and writing style (captured by a user's language model) in a community. As only item ratings and review texts are directly observed, we capture a user's experience and interests by a latent model learned from her reviews and vocabulary. All this is conditioned on time, taking into account the maturing rate of a user. Intuitively, a user gains experience not only by writing many reviews but also by continuously improving their quality. This maturing rate varies across users, as some enter the community already experienced. Modeling these factors jointly allows us to generate individual recommendations that take into account the user's maturity level and interest in specific item facets at different timepoints.

We propose two approaches to model this evolving user experience and her writing style: the first approach considers a user's experience to progress in a discrete manner (refer to Section IV.2.1 for an overview), whereas the second approach (refer to Section IV.2.2 for an overview) addresses several drawbacks of this discrete evolution and proposes a natural, continuous mode of temporal evolution of a user's experience and her language model.

IV.2.1 Discrete Experience Evolution

Approach: In the first approach, we assume that the user experience level is categorical with discrete levels (e.g., [1, 2, 3, ..., E]), and that users progress from each level to the next in a discrete manner. The experience level of each user is considered to be a latent variable that evolves over time conditioned on the user's progression in the community.
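The discrete progression can be pictured as a left-to-right Markov chain. The following minimal Python sketch assumes that, at each review, a user at level e either stays at e (with a hypothetical per-user probability `p_stay`, a stand-in for her maturing rate) or moves up exactly one level, and that the top level E is absorbing. The function names and this parameterization are illustrative assumptions, not the model's actual specification.

```python
from typing import List


def transition_matrix(num_levels: int, p_stay: float) -> List[List[float]]:
    """Left-to-right transition matrix over experience levels (0-indexed here).

    Assumption: from level e a user stays with probability p_stay or moves up
    exactly one level; she never moves down, and the top level is absorbing.
    """
    T = [[0.0] * num_levels for _ in range(num_levels)]
    for e in range(num_levels):
        if e == num_levels - 1:
            T[e][e] = 1.0  # most experienced level: nowhere left to go
        else:
            T[e][e] = p_stay
            T[e][e + 1] = 1.0 - p_stay
    return T


def level_distribution(num_levels: int, p_stay: float, num_reviews: int) -> List[float]:
    """Probability of each level after num_reviews transitions, starting at the lowest level."""
    T = transition_matrix(num_levels, p_stay)
    dist = [1.0] + [0.0] * (num_levels - 1)
    for _ in range(num_reviews):
        # One step of the chain: dist_new[j] = sum_i dist[i] * T[i][j]
        dist = [sum(dist[i] * T[i][j] for i in range(num_levels)) for j in range(num_levels)]
    return dist


# Hypothetical user with three levels and p_stay = 0.5, after two reviews.
print(level_distribution(3, 0.5, 2))  # → [0.25, 0.5, 0.25]
```

Under this assumption, the experience level can only jump by one whole level per review, which is exactly the abrupt, step-wise behavior that the continuous model later relaxes.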

We develop a generative HMM-LDA model for a user's evolution, where the Hidden Markov Model (HMM) traces her latent experience progressing over time, and the Latent Dirichlet Allocation (LDA) model captures her interests in specific item facets as a function of her (again, latent) experience level. The only explicit inputs to our model are the ratings and review texts up to a certain timepoint; everything else, especially the user's experience level, is a latent variable. The output consists of the predicted ratings for the user's reviews following the given timepoint. In addition, we can derive interpretations of a user's experience and interests from salient words in the distributional vectors for latent dimensions. Although it is unsurprising to see users writing more sophisticated words as they gain experience, we observe something more interesting. For instance, in specialized communities like beeradvocate.com and ratebeer.com, experienced users write more descriptive and fruity words to depict the beer taste (cf. Table IV.5). Table IV.1 shows a snapshot of the words used by users at different experience levels to depict the facets beer taste, movie plot, and bad journalism, respectively.

Experience | Beer                           | Movies                       | News
Level 1    | bad, shit                      | stupid, bizarre              | bad, stupid
Level 2    | sweet, bitter                  | storyline, epic              | biased, unfair
Level 3    | caramel finish, coffee roasted | realism, visceral, nostalgic | opinionated, fallacy, rhetoric

Table IV.1: Vocabulary at different experience levels.
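To illustrate how the LDA component can make facets and vocabulary depend on the latent experience level, here is a minimal Python sketch that generates the words of one review. It assumes, for illustration only, a facet distribution `theta[level]` per experience level and a word distribution `phi[level][facet]` per level and facet; the toy vocabulary borrows beer words from Table IV.1, and all probabilities are made up.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy parameters: two experience levels (0 = novice, 1 = experienced),
# two facets per level, and made-up word probabilities.
theta: Dict[int, List[float]] = {0: [0.7, 0.3], 1: [0.4, 0.6]}
phi: Dict[int, List[Dict[str, float]]] = {
    0: [{"bad": 0.5, "sweet": 0.5}, {"bitter": 1.0}],
    1: [{"caramel": 0.6, "finish": 0.4}, {"coffee": 0.5, "roasted": 0.5}],
}


def generate_review(level: int, num_words: int, rng: random.Random) -> Tuple[List[int], List[str]]:
    """Sample (facet, word) pairs for one review written at a given experience level."""
    facets: List[int] = []
    words: List[str] = []
    for _ in range(num_words):
        # 1) Draw a facet from the level-specific facet distribution.
        z = rng.choices(range(len(theta[level])), weights=theta[level])[0]
        # 2) Draw a word from the level- and facet-specific word distribution.
        vocab = list(phi[level][z])
        w = rng.choices(vocab, weights=[phi[level][z][v] for v in vocab])[0]
        facets.append(z)
        words.append(w)
    return facets, words


rng = random.Random(42)
print(generate_review(1, 5, rng))
```

Inference runs in the opposite direction: only the words (and ratings) are observed, and the level and facet assignments must be recovered as latent variables.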

Contributions: This discrete-experience evolution model is discussed in depth in Section IV.3, which introduces the following novel contributions:

a) The first model (Section IV.3.1) to consider the progression of user experience as expressed through the text of item reviews, thereby elegantly combining text and time.

b) An approach (Sections IV.3.3 and IV.3.4) to capture the natural, smooth temporal progression in user experience, factoring in the maturing rate of the user as expressed through her writing.

c) Interpretability, obtained by learning the vocabulary usage of users at different levels of experience.

d) A large-scale experimental study (Section IV.3.5) on five real-world datasets from different communities such as movies, beer, and food.

IV.2.2 Continuous Experience Evolution

Limitations of Discrete Evolution Models: Section IV.2.1 gives the motivation for the evolution of user experience and how it affects ratings. However, the proposed approach and its precursor [McAuley 2013b] make the simplifying assumption that user experience is categorical with discrete levels (e.g., [1, 2, 3, ..., E]), and that users progress from one level to the next in a discrete manner. As an artifact of this assumption, the experience level of a user changes abruptly, by one whole level at each transition. Another undesirable consequence of the discrete model is that all users at the same level of experience are treated alike, although their maturity could still be far apart on a continuous scale of experience. Therefore, the assumption of exchangeability of reviews for users at the same level of experience, made by the latent factor model in the discrete approach, may not hold as the language model changes.

The prior work [McAuley 2013b] assumes user activity (e.g., number of reviews) to play a major role in experience evolution, which biases the model towards highly active users (as opposed to an experienced person who posts only once in a while). In contrast, the discrete version of our own approach (refer to Section IV.2.1) captures interpretable evidence for a user's experience level using her vocabulary, cast into a language model with latent facets. However, this approach also exhibits the drawbacks of discrete levels of experience, as discussed above.

Therefore, we propose a continuous version of experience evolution that overcomes these limitations by modeling the evolution of user experience, and the corresponding language model, as a continuous-time stochastic process. We model time explicitly in this work, in contrast to the prior works.

Approach: This is the first work to develop a continuous-time model of user experience and language evolution. Unlike prior work, we do not rely on explicit features like ratings or number of reviews. Instead, we capture a user's experience by a latent language model learned from the user-specific vocabulary in her review texts. We present a generative model where the user's experience and language model evolve according to a Geometric Brownian Motion (GBM) and a Brownian Motion process, respectively. Analysis of the users' GBM trajectories offers interesting insights; for instance, users who reach a high level of experience progress faster than those who do not, and also exhibit a comparatively higher variance. Also, the number of reviews written by a user does not have a strong influence, unless they are written over a long period of time.
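A Geometric Brownian Motion keeps the experience value strictly positive and lets it grow multiplicatively, with a drift μ (how fast a user matures) and a volatility σ (how erratic her progress is). The minimal Python sketch below samples a GBM path at irregular review timestamps using the exact solution of the GBM stochastic differential equation; the user-specific values of `mu` and `sigma`, the timestamps, and the starting experience `x0` are all hypothetical.

```python
import math
import random
from typing import List


def simulate_gbm(x0: float, mu: float, sigma: float, times: List[float],
                 rng: random.Random) -> List[float]:
    """Sample a GBM path at sorted timepoints, where times[0] is the time of x0.

    Exact solution of dX = mu * X dt + sigma * X dW over a step of length dt:
        X(t + dt) = X(t) * exp((mu - sigma**2 / 2) * dt + sigma * sqrt(dt) * Z),
    with Z ~ N(0, 1), so irregular gaps between reviews are handled exactly.
    """
    path = [x0]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt
                                        + sigma * math.sqrt(dt) * z))
    return path


# Two hypothetical users reviewing at the same irregular timepoints (in years).
review_times = [0.0, 0.2, 0.5, 1.5, 3.0, 3.1]
rng = random.Random(7)
fast = simulate_gbm(1.0, mu=0.6, sigma=0.4, times=review_times, rng=rng)
slow = simulate_gbm(1.0, mu=0.1, sigma=0.1, times=review_times, rng=rng)
print([round(x, 2) for x in fast])
print([round(x, 2) for x in slow])
```

With σ = 0 the path reduces to the deterministic curve x0·exp(μ(t − t0)); with σ > 0 the expected value is still x0·exp(μ(t − t0)). In this sketch the distribution of experience at a given time depends only on elapsed time, not on how many reviews were written in between.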

The facets in our model (e.g., narrative style, actor performance, etc. for movies) are generated using Latent Dirichlet Allocation. User experience and item facets are latent variables, whereas the observables are words at explicit timepoints in user reviews.

The parameter estimation and inference for our model are challenging since we combine discrete multinomial distributions (generating words per review) with a continuous Brownian Motion process for the language models’ evolution, and a continuous Geometric Brownian Motion (GBM) process for the user experience.
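To make this mix of discrete and continuous components concrete, the generative process can be written roughly as follows. This is an illustrative formulation, not necessarily the exact parameterization of Section IV.4: the symbols E_u (experience of user u), μ_u and σ_u (her GBM drift and volatility), β_{z,w}(t) (natural parameter of word w in facet z), σ_β, and θ_d and t_d (facet proportions and timestamp of review d) are assumptions, the language model is indexed by time t for simplicity, and the softmax link follows the style of dynamic topic models.

```latex
\begin{aligned}
dE_u(t) &= \mu_u\,E_u(t)\,dt + \sigma_u\,E_u(t)\,dW_u(t)
  && \text{(user experience: GBM)}\\
\beta_{z,w}(t) \mid \beta_{z,w}(s) &\sim \mathcal{N}\!\big(\beta_{z,w}(s),\; \sigma_\beta^{2}\,(t-s)\big),\quad s<t
  && \text{(language model: Brownian motion)}\\
p(w \mid z, t) &= \frac{\exp \beta_{z,w}(t)}{\sum_{w'} \exp \beta_{z,w'}(t)}
  && \text{(continuous parameters} \to \text{multinomial)}\\
z_{d,n} &\sim \mathrm{Mult}(\theta_d), \qquad
w_{d,n} \sim \mathrm{Mult}\big(p(\cdot \mid z_{d,n}, t_d)\big)
  && \text{(LDA facets and words)}
\end{aligned}
```

Because the Gaussian prior on β is not conjugate to the multinomial word likelihood under the softmax link, the posterior has no closed form, which is one reason approximate inference is needed.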

Contributions: To solve this technical challenge, we present an inference method consisting of three steps: a) estimation of user experience from a user-specific GBM using the Metropolis-Hastings algorithm, b) estimation of the language model evolution by a Kalman filter, and c) estimation of latent facets using Gibbs sampling. Our experiments, with real-life data from five different communities on movies, food, beer, and news media, show that the three components work together coherently and yield a better fit of the data (in terms of log-likelihood) than the previously best models with discrete experience levels. We also achieve an improvement of ca. 11% to 36% in the mean squared error for predicting user-specific item ratings, compared to the baseline of [McAuley 2013b] and the discrete version of the model (refer to Section IV.2.1 for an overview).
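The Kalman-filter step can be illustrated on its simplest case. The following minimal Python sketch filters a single scalar parameter (say, one word weight in one facet) that follows a Brownian motion and is observed with Gaussian noise at irregular timepoints. The process variance rate `q`, observation noise `r`, and prior `m0`, `p0` are hypothetical; since the model's observables are words rather than Gaussian measurements, this is a generic scalar Kalman filter, not the exact update used in the model.

```python
from typing import List, Tuple


def kalman_filter_bm(times: List[float], observations: List[float], q: float, r: float,
                     m0: float = 0.0, p0: float = 1.0) -> Tuple[List[float], List[float]]:
    """Filter a scalar Brownian-motion state observed with Gaussian noise.

    State:        beta(t_k) = beta(t_{k-1}) + N(0, q * (t_k - t_{k-1}))
    Observation:  y_k       = beta(t_k) + N(0, r)
    The prior N(m0, p0) refers to time times[0]. Returns filtered means and variances.
    """
    means: List[float] = []
    variances: List[float] = []
    m, p = m0, p0
    t_prev = times[0]
    for t, y in zip(times, observations):
        # Predict: Brownian motion adds variance proportional to the elapsed time.
        p = p + q * (t - t_prev)
        # Update: scalar Kalman gain weighs the prediction against the new observation.
        k = p / (p + r)
        m = m + k * (y - m)
        p = (1.0 - k) * p
        means.append(m)
        variances.append(p)
        t_prev = t
    return means, variances


print(kalman_filter_bm([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], q=0.0, r=1.0))
```

With `q = 0` the state is static and the filter reduces to recursive averaging; a larger `q` or a longer gap between timepoints inflates the predicted variance, so the filter trusts the newest observation more.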

This continuous-experience evolution model is discussed in depth in Section IV.4, which introduces the following novel contributions:

a) Model: We devise a probabilistic model (Section IV.4.1) for tracing continuous evolution of user experience, combined with a language model for facets that explicitly captures smooth evolution over time.

b) Algorithm: We introduce an effective learning algorithm (Section IV.4.2) that infers each user's experience progression, time-sensitive language models, and the latent facet of each word.

c) Experiments: We perform extensive experiments (Section IV.4.3) with five real-world datasets, together comprising 12.7 million ratings from 0.9 million users on 0.5 million items, and demonstrate substantial improvements of our method over state-of-the-art baselines.

As an interesting use case of our experience-evolution model, we perform an experimental study (Section IV.5) in a news community to identify experienced members who can play the role of citizen journalists in the community. This study is similar to the credibility analysis in Section III.7.5, with the additional incorporation of temporal evolution.

IV.3 Discrete Experience Evolution