• No se han encontrado resultados

HÉCTOR ZULUAGA CASTELLANOS

5.2 COMUNICACIÓN Y LENGUAJE

5.2.1 Comunicación: Hablamos de comunicación en el sentido amplio

5.2.2.1 Lenguaje y cognición: Dada la estrecha relación existente entre

This experimentation was conducted in order to compute the execution time for the algorithms used in the recommendation process. The systematic review conducted in this thesis demonstrated that few works evaluated the performance of RS, which is a critical factor specially for applications that need responses with short timeouts. Therefore it is still an open issue considering that accessing to Linked Data datasets in most cases is time consuming and requires that researchers download dumps of the datasets to access them in local repositories.

Accordingly, this section presents the results of the evaluation of the perfor- mance for the generation layer and ranking layer, because the most critical tasks in the recommendation process, described in chapter 3 for the AlLied framework, are performed by these layers. This experimentation is divide in evaluation for graph-based algorithms and machine learning algorithms.

6.4.1

Performance for graph-based algorithms

This study evaluated the performance for the algorithms for generation and ranking layers. Table 6.4 shows the mean values for the execution time (in milliseconds) as well as the number of candidate films (resources) generated by the three algorithms for the generation layer in the graph-based implementation. The number of candi- date films generated was different for each algorithm. The traversal generator shown to be the more restrictive as it extracted films that were only related by traversal links (direct or indirect links), while the hierarchical obtained the films that were re- lated by hierarchical information which contains much more links than the traversal relationships. The ReDyAl algorithm generated an intermediate number of can- didate resources as it uses dynamically the hierarchical and traversal relationships depending on the number of traversal links of the initial film (query).

With regard to the execution time, it was expected that even though the best relationship candidate resources-execution time was scored by the ReDyAl algo- rithm with a value of 3.7 resources per millisecond and the worse was scored by the Traversal generator with (0.28). This results shown, that generating candidate resources dynamically, not only allows to improve the accuracy and novelty but also the execution time.

Generator Algorithm T (ms) Candidate Films

ReDyAl 1911.3 7069.0 HierarchicalREC 5379.2 11513.1 Traversal REC 15637.5 4404.5

Table 6.4. Performance for generation layer algorithms

Table 6.5 shows the results of the performance for the ranking layer algorithms. The algorithms evaluated were the HybridSimpleRank, Traversal LDSD, and the Traversal HyProximity.

The ranker algorithms were tested with a fixed number of candidate resources, otherwise it is not possible to compare the algorithms. Though, this is not the real situation because the generation layer algorithms may generate different number of candidate resources depending on the initial film. The value of candidate resources selected for the experimentation was 23000 because it was the mean value among the generation algorithms presented in table 6.4.

This experimentation demonstrated that the faster ranking algorithm was the Traversal Ranker HyProximity and the lowest was the HybridSimpleRank. This was expected as the HybridSimpleRanker computes the similarity values based on both hierarchical and traversal links.

Ranking Algorithm T(ms)

HybridSimpleRank 3545092 Traversal LDSD Rank 3255383 TraversalRankerHyProx 1454162

Table 6.5. Performance for ranking layer algorithms

The results for performance of the graph-based algorithms demonstrated that there is still a need for improving them because, even though the accuracy measures are good enough, the execution times are not suitable for applications where the final user wants a response time of only few milliseconds.

6.4.2

Performance for machine learning algorithms

This experimentation was conducted in order to compute the execution time for the algorithms used in the recommendation process. In this study the algorithms evaluated are: the algorithm k-means for clustering used in the generation layer; the algorithm kNN for classification (used with new films that are not part of the LODMatrix, that needed to be assigned to a cluster); and the ranking functions. Execution times were computed for two phases, because some machine learning algorithms require a previous phase for training know as offline execution and a second phase of testing known as online execution.

Offline execution

The offline execution is considered for those tasks of the machine learning that may be performed at different time than when the user is requiring recommendations, for example before the system is deployed for its first use, or in night hours when the system is rarely used. The two algorithms that needed a training phase were the k-means and the kNN and their execution time are presented in table 6.6. The most expensive task in the online execution was the clustering with the k-means, because it had to load the full LODMatrix to compute the similarity among all the films and locate them in clusters with similar films.

Offline execution T(ms)

Training k-means 211440 Training kNN classifier 6000

Online execution

The online execution is when the user is executing the recommender system, so the algorithms in this phase are: testing the KNN algorithm, only for those films that were not within the LODMatrix, so they needed to be located in one of the 55 clusters precomputed; searching for films in the same cluster as the query film; and the ranking films which sorts the films of a cluster according to their similarity with the query film.

Table 6.7 shows the execution times for the online tasks executed by the machine learning implementation of the AlLied framework. The most expensive task in the online execution was testing the KNN algorithm, however this task is only used when a new film is added to the LODMatrix as explained before. The second expensive task is the searching for films in the same cluster as the query film followed nearly by the ranking task. The time of the ranking task was computed as the mean among 100 execution of the ranking functions.

Online execution T(ms)

Testing KNN 60000

Searching for films in the same cluster as the query 5000

Ranking 8000

Table 6.7. Online execution