We find scoring the difficulty and education level of web pages very challenging. The main issue is how to identify them automatically, with little ambiguity, from a web page. Moreover, our analysis of the teaching resources available in the DAtaset of Joint Educational Entities (DAJEE) (Estivill-Castro et al., 2016) did not find any set of terms that indicates the difficulty or education level of the resources. The source of the problem is that educational resources contain words that teach a concept, with no reference to the difficulty or education level of the resource itself. Therefore, we will not invest more effort in discovering the difficulty or the education level of a web page.
Although in Chapter 5 we found that existing search engines and IR methods benefit from either difficulty or education level, after several attempts we have not found a systematic approach to exploit this information for a further boost in the performance of WebEduRank. Thus, these two attributes require further investigation before we include them in WebEduRank. In Section 5.2 and Section 5.3, we reported a comprehensive analysis of the performance of the baseline methods with and without difficulty and education level in the query. For some of the systems we tested, the performance deteriorates if we remove one of them. Hence, these attributes carry valuable information, but we have not yet found an approach to properly include them in WebEduRank. Regardless of which attributes the ranking methods use, we can still compare which method offers a more accurate education-oriented ranking of web pages.
6.3 Matching the attributes of the Instructor Profile with web pages: the Expectancy Appearance Matrix (EAM)
Each attribute of the Teaching Context represents particular information about the educational context of an instructor. We have defined four components of a web page, and the attributes of the Teaching Context will have a different presence in each of those sections of a web page. For the ranking process, WebEduRank considers five attributes of our Teaching Context, and we analyse their appearance in the text fragments of the four sections of a web page. We propose to use an Expectancy Appearance Matrix (EAM) to implement a weighting mechanism. This mechanism is based on the expectation that an attribute of the Teaching Context appears in a given section of the web page. The EAM therefore reflects a form of covariance between an attribute of our Teaching Context and a section of the web page. The rationale is that an attribute is more likely to appear in some parts of a web page than in others. We do not want to simply boost the score, as is common for BM25F and TF-IDF; instead, we aim to reward web pages that have elements of the Teaching Context in the right section, where those elements should be.
Each section of a web page expresses content with a certain meaning (for example, the links section mostly refers to related concepts, while the highlights section should hold content about important concepts). A weighting method based on the EAM should filter noise when an attribute of the Teaching Context appears repeatedly in a section of the web page where we do not expect to find it. For example, we consider it noise when a web page presents a high frequency of the attribute concept name in the links section. Our hypothesis is that a suitable web page explains the concept itself, without referring to other material to explain it. Following this example, we want to give a greater reward to a web page that has a high frequency of the attribute concept name in the title and body sections.
We design the EAM to help WebEduRank rank web pages better by looking for each piece of the Teaching Context in the right section. Formally, the EAM is a 4×5 matrix, where the rows are the four components of the web pages analysed by WebEduRank and the columns are the five attributes of the Teaching Context. The element a_ij ∈ EAM is a weight expressing the expectancy that the i-th section of the web page contains the j-th attribute of the Teaching Context.
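To make the structure of the matrix concrete, the following is a minimal sketch in Python and not the actual implementation of WebEduRank. It rests on several assumptions: the section labels are taken from the examples in the text (title, body, links, highlights); only the attribute concept name is named here, so the other four column labels are placeholders; all weight values are invented for illustration; and the scoring rule (a sum of section-level attribute frequencies weighted by a_ij) is a hypothetical stand-in for whatever combination WebEduRank actually uses.

```python
# Rows of the EAM: the four web-page sections (labels taken from the examples in the text).
SECTIONS = ("title", "body", "links", "highlights")

# Columns of the EAM: five Teaching Context attributes. Only "concept_name" is
# named in the text; the remaining labels are placeholders (assumption).
ATTRIBUTES = ("concept_name", "attr_2", "attr_3", "attr_4", "attr_5")

# Hypothetical hand-set weights a_ij: the expectancy that section i contains
# attribute j. These values are illustrative only, not tuned or reported values.
EAM = [
    [1.0, 0.5, 0.5, 0.5, 0.5],  # title
    [0.8, 0.5, 0.5, 0.5, 0.5],  # body
    [0.1, 0.5, 0.5, 0.5, 0.5],  # links: concept name here is treated as noise
    [0.6, 0.5, 0.5, 0.5, 0.5],  # highlights
]


def freq_matrix(counts):
    """Build a 4x5 frequency matrix from a {(section, attribute): count} dict."""
    return [[counts.get((s, a), 0) for a in ATTRIBUTES] for s in SECTIONS]


def eam_score(freq, eam=EAM):
    """Sum of frequencies weighted element-wise by the EAM (assumed scoring rule)."""
    same_shape = len(freq) == len(eam) and all(
        len(f_row) == len(e_row) for f_row, e_row in zip(freq, eam)
    )
    if not same_shape:
        raise ValueError("freq must have the same shape as the EAM")
    return sum(
        a_ij * f_ij
        for e_row, f_row in zip(eam, freq)
        for a_ij, f_ij in zip(e_row, f_row)
    )


# Two pages with the same total frequency of the concept name (six occurrences),
# placed in different sections.
page_a = freq_matrix({("title", "concept_name"): 1, ("body", "concept_name"): 5})
page_b = freq_matrix({("links", "concept_name"): 6})

print(eam_score(page_a) > eam_score(page_b))  # page_a is rewarded more than page_b
```

Under these assumed weights, the page that mentions the concept name in its title and body scores higher than the page that repeats it only in its links, which is the behaviour the EAM is intended to encourage.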
The formulation of the EAM raises a new challenge: deciding the expectancy of each attribute in each section of the web page. We can determine values for this matrix using different approaches, which allows us to tune WebEduRank by discovering the most appropriate value for each element of the matrix. In theory, we could tune the EAM using machine learning techniques, finding the best setting according to the ratings of web pages (Pérez-Agüera et al., 2010). In practice, this would require a considerable number of web pages contextualised in teaching contexts, together with user ratings or other information that identifies which pages are useful for a context. Since we are using a new formulation of the Teaching Context, such data is not available in current datasets. Although we have conducted a data collection phase for evaluation purposes (see Section 5.3.3), we do not have enough data to tune the EAM with machine learning approaches. The optimisation of IR methods is a common problem for researchers in IR because of the considerable amount of data required for this task, even for data scientists who work with large IR systems (Pérez-Agüera et al., 2010).
Therefore, we can only tune the EAM based on our experience, although this may not yield the optimal configuration. However, this is not a major issue for the scope of this thesis. If we show that WebEduRank performs better than the baseline methods even with a non-optimal EAM, then optimising the EAM should improve that performance further. Therefore, we propose and evaluate a simpler WebEduRank, although we are aware that it could be optimised further.