4 Information Analysis with the Geotech Module
4.10 Highlighting Structural Data
We introduce another algorithm that determines the rate popularity of process information based on user ratings. In enterprises, existing portals often allow users to rate the quality of process information, e.g., by means of “like buttons” or “five-star ratings”. The set of ratings R can then be used to determine the rate popularity RP(v) of an information object v. However, ranking information objects based on user ratings is a non-trivial task. First, we show that existing algorithms are not directly applicable to POIL. Second, we develop the SIN rate popularity (RP) algorithm, which determines the rate popularity of information objects in a SIN (cf. Figure 5.22).
[Figure 5.22: Rate popularity algorithms. Panels: TotalNumber algorithm (not sufficient), AverageRate algorithm (not sufficient), SIN RP algorithm (sufficient).]
An approach to determine the rate popularity RP (v) of an information object v is to rank information objects according to their total number of ratings (cf. Formula 5.8):
$$RP(v) = |R(v)| \tag{5.8}$$
Another approach is to determine the rate popularity RP (v) based on the average user rating of an information object v (cf. Formula 5.9):
$$RP(v) = \mathrm{avg}(R(v)) = \frac{\sum_{r \in R(v)} r}{|R(v)|} \tag{5.9}$$
Neither Formula 5.8 nor Formula 5.9 is appropriate in the context of a SIN. Both formulas tend to prefer older information objects that have been available for a longer time; i.e., there has been more time for users to rate these information objects. This shortcoming is problematic in enterprise environments with continuously emerging information objects. Moreover, Formula 5.9 causes another problem: Assume that, in a “five-star rating” scheme, an information object has an average rating of 4.8 based on hundreds of individual ratings, and that another information object has been rated 5.0 by a single user. The latter information object is then directly ranked first. To avoid this, the ratings of all information objects must be taken into account.
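The shortcoming of the two baseline approaches can be illustrated with a minimal Python sketch. The function names and the rating lists below are hypothetical and chosen only to mirror the 4.8 versus 5.0 example in the text; they are not taken from the enterprise portal data.

```python
from statistics import mean

def total_number(ratings):
    """Formula 5.8: rate popularity as the number of ratings."""
    return len(ratings)

def average_rate(ratings):
    """Formula 5.9: rate popularity as the mean of the ratings."""
    return mean(ratings)

# Hypothetical data: an established object with 200 ratings (average 4.8)
# and a new object with a single five-star rating.
established = [5] * 160 + [4] * 40
newcomer = [5]

# TotalNumber favors the object that has had more time to collect ratings.
print(total_number(established), total_number(newcomer))

# AverageRate puts the object with one perfect rating in first place.
print(average_rate(established), average_rate(newcomer))
```

Under these assumptions, TotalNumber ranks the established object first merely because it has more ratings, while AverageRate ranks the newcomer first on the basis of a single rating.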
Thus, we calculate the rate popularity in line with a Bayesian interpretation (to evaluate the probability of the hypothesis) [192]. Formula 5.10 calculates the average rating avg(R) over all information objects. Formula 5.11 then calculates the rate popularity RP(v) of a single information object v, taking both the set of ratings R and the age of the information object into account. This ensures that information objects with few but favorable ratings are not automatically ranked in the top positions:
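$$\mathrm{avg}(R) = \sum_{v \in V} \frac{|R(v)| \cdot \mathrm{avg}(R(v))}{|R|} \tag{5.10}$$

$$RP(v) = \frac{\;\dfrac{\dfrac{|R|}{|\{v \in V \mid |R(v)| > 0\}|} \cdot \mathrm{avg}(R) + |R(v)| \cdot \mathrm{avg}(R(v))}{\dfrac{|R|}{|\{v \in V \mid |R(v)| > 0\}|} + |R(v)|}\;}{\mathrm{age}(v)} \tag{5.11}$$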
Based on Formulas 5.10 and 5.11, one can determine the rate popularity of information objects. The SIN RP algorithm shows how the rate popularity value is calculated for each information object v, taking the set of user ratings R into account:
Algorithm 5: SIN RP Algorithm.
Input: SIN = (V, E, L, W, fl, fw) a SIN, R the set of ratings;
Output: RP(v) for each v ∈ V where |R(v)| > 0;
begin
    avg(R) = 0;
    foreach v ∈ V do
        if |R(v)| > 0 then
            avg(R) += |R(v)| ∗ avg(R(v)) / |R|;
    foreach v ∈ V do
        if |R(v)| > 0 then
            pop = (|R| / |{v ∈ V | |R(v)| > 0}| ∗ avg(R)) + (|R(v)| ∗ avg(R(v)));
            pop = pop / (|R| / |{v ∈ V | |R(v)| > 0}| + |R(v)|);
            RP(v) = pop / age(v);
end
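The algorithm can be sketched in Python as follows. This is an illustrative sketch rather than the original implementation: the function name, the input layout (a dictionary mapping each information object to its number of ratings and its average rating), and the default age of 1 are assumptions. The example input uses the rating counts and averages reported in Table 5.4.

```python
def sin_rate_popularity(ratings, age=None):
    """Return RP(v) for every object with at least one rating.

    ratings: dict mapping object id -> (|R(v)|, avg(R(v)))   (assumed layout)
    age:     optional dict mapping object id -> age(v) > 0; 1 if omitted
    """
    # Only objects with |R(v)| > 0 take part in the computation.
    rated = {v: (n, avg) for v, (n, avg) in ratings.items() if n > 0}
    if not rated:
        return {}

    total = sum(n for n, _ in rated.values())                        # |R|
    global_avg = sum(n * avg for n, avg in rated.values()) / total   # Formula 5.10
    prior_weight = total / len(rated)     # |R| / |{v in V : |R(v)| > 0}|

    result = {}
    for v, (n, avg) in rated.items():
        # Formula 5.11: blend the global average with the object's own average.
        pop = (prior_weight * global_avg + n * avg) / (prior_weight + n)
        result[v] = pop / (age[v] if age is not None else 1.0)
    return result

# Rating counts and averages as reported in Table 5.4 (age excluded).
table_5_4 = {
    "A": (22, 4.0), "B": (4, 3.2), "C": (10, 4.2), "D": (8, 1.4),
    "E": (12, 3.7), "F": (2, 5.0), "G": (15, 2.8), "H": (12, 1.6),
}
rp = sin_rate_popularity(table_5_4)
ranking = sorted(rp, key=rp.get, reverse=True)
print(ranking)  # ['A', 'C', 'F', 'E', 'B', 'G', 'D', 'H']
```

Taking counts and averages as input, instead of individual ratings, is a design choice of this sketch; it is sufficient because Formulas 5.10 and 5.11 depend only on |R(v)| and avg(R(v)).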
To better understand the SIN RP algorithm, we compare it with other approaches, specifically with the TotalNumber and the AverageRate algorithms. For this purpose, we use user ratings of information objects that we have adopted from a real-world enterprise portal. Note that we exclude the age of information objects from our analysis so that the results of the SIN RP algorithm are comparable to those of the other algorithms, which do not consider the age of information objects. Table 5.4 shows the comparison results.
Object  TotalNumber        AverageRate        SIN RP
        (cf. Formula 5.8)  (cf. Formula 5.9)  (cf. Formula 5.11)
A       22                 4.0                3.730
B       4                  3.2                3.179
C       10                 4.2                3.670
D       8                  1.4                2.411
E       12                 3.7                3.452
F       2                  5.0                3.461
G       15                 2.8                2.954
H       12                 1.6                2.338
Table 5.4: Results obtained when applying the SIN RP algorithm to the example.
If we rank the rated information objects by the TotalNumber algorithm (cf. Formula 5.8), the most popular information object is “A”. When applying the AverageRate algorithm (cf. Formula 5.9), by contrast, information object “F” is the most popular one; that is, an information object with only a few good ratings is ranked first. The SIN RP algorithm (cf. Formula 5.11) addresses this problem. Information objects with many good ratings (e.g., “A” or “C”) are now ranked higher than “F”. The SIN RP algorithm ensures, for example, that an information object with two user ratings and an average rating of 5.0 is not ranked higher than an information object with 50 user ratings and an average rating of 4.9.
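As a worked check of the SIN RP column in Table 5.4 (age excluded, i.e., age(v) = 1): the eight rated information objects have |R| = 85 ratings in total, so |R| / |{v ∈ V | |R(v)| > 0}| = 85/8 = 10.625, and Formula 5.10 gives avg(R) = 269.6/85 ≈ 3.172. For “A” and “F”, Formula 5.11 then yields:

$$RP(A) = \frac{10.625 \cdot 3.172 + 22 \cdot 4.0}{10.625 + 22} \approx \frac{121.7}{32.625} \approx 3.730$$

$$RP(F) = \frac{10.625 \cdot 3.172 + 2 \cdot 5.0}{10.625 + 2} \approx \frac{43.7}{12.625} \approx 3.461$$

The term 10.625 · avg(R) is identical for all information objects. With only two ratings, “F” is therefore pulled much closer to the overall average of 3.172 than “A” with its 22 ratings.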
Other research influenced the development of the SIN RP algorithm. An approach to improve search results based on user ratings, for example, is presented by Vassilvitskii and Brill [193]. Lowd et al. [194] present a study on rate popularity algorithms and discuss their advantages and disadvantages. Similar to the SIN RP algorithm, a self-learning algorithm is provided by Bian et al. [195]; it addresses both user ratings and content relevance. Like the link popularity algorithms, existing rate popularity algorithms cannot be directly applied to a SIN. The reason is that they cannot deal with the specific characteristics of a SIN.
Validation (SIN RP Algorithm). We use an automotive scenario to validate the applicability of the SIN RP algorithm. We summarize the results in this section; further results are provided in Section 9.2. The case study participants considered the results of the SIN RP algorithm useful. In fact, most participants stated that the ranking of process information as suggested by the SIN RP algorithm is both plausible and useful. Additionally, the SIN RP algorithm avoids the problematic situation in which process information with only a few good user ratings is directly ranked first.
Altogether, the SIN RP algorithm allows determining the rate popularity of information objects based on user ratings. The algorithm can be further extended to include additional factors, e.g., the experience of knowledge workers and decision makers. For example, user ratings of experienced knowledge workers could be weighted higher.
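One conceivable way to realize such a weighting is sketched below. This is purely an assumption for illustration and not part of the SIN RP algorithm as described: each rating carries a hypothetical experience weight, and the resulting effective count and weighted average could stand in for |R(v)| and avg(R(v)) in Formula 5.11.

```python
def weighted_rating_summary(ratings):
    """Summarize weighted ratings of one information object.

    ratings: list of (rating, weight) pairs, where weight is a hypothetical
             experience factor of the rating user (e.g., 1.0 for a regular
             user, 2.0 for an experienced knowledge worker).
    Returns (effective number of ratings, weighted average rating).
    """
    effective_count = sum(weight for _, weight in ratings)
    weighted_avg = sum(r * weight for r, weight in ratings) / effective_count
    return effective_count, weighted_avg

# Two regular users rate 5, one experienced user (weight 2.0) rates 3.
ratings_v = [(5, 1.0), (5, 1.0), (3, 2.0)]
count, avg = weighted_rating_summary(ratings_v)
print(count, avg)
```

In this example the unweighted average would be about 4.33, whereas the experienced user's rating lowers the weighted average to 4.0.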