2. State of the art
2.3. Image matching
Canonical correlation analysis (CCA) is a common way to inspect the relationship between two sets of variables based on their correlation. However, when the sample size is insufficient or the variables in the data are highly correlated, the method produces inaccurate parameter estimates and non-generalizable results that are hard to interpret. Recently developed modifications of CCA, such as regularized CCA and sparse CCA, which impose L2 and L1 norm regularization respectively, were used to address these weaknesses. More details about the methods are presented in Chapter 8, sections 8.3.2 and 8.3.3. All three versions of CCA were used to analyze the same sample of evaluation data for the Introductory Programming with MatLab course held in January 2010, the largest course, where all 6 questions of form B (evaluations of instructor) were active.
Figure 5.7 illustrates that the correlations between the students' answers to different questions of the evaluation survey were quite high.
The association between how students evaluate the course and how students evaluate the teacher was found to be quite strong in all three cases. However, regularized and sparse CCA produced results with increased interpretability over traditional CCA. The traditional CCA reported that the first 4 canonical correlations are statistically significant. This means that the structure of correlation lies in a 4-dimensional space, which is hard to visualize and interpret. The regularized and sparse CCA reported just one significant canonical correlation. The structures of these correlations are presented in figure 5.8 and figure 5.9 respectively.
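The contrast between traditional and L2-regularized CCA can be sketched numerically. The snippet below is only an illustration under stated assumptions, not the estimation procedure used in this work: the function name `canonical_correlations`, the ridge parameter `reg`, and the synthetic data are all hypothetical. It adds `reg` times the identity matrix to each within-set covariance matrix before whitening, which is one common way of writing ridge-type CCA.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def canonical_correlations(X, Y, reg=0.0):
    """Canonical correlations of X (n x p) and Y (n x q).

    reg > 0 adds a ridge (L2) term to both within-set covariances;
    reg = 0 gives traditional CCA. (Hypothetical helper for illustration.)
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    # Singular values of the whitened cross-covariance are the correlations.
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(K, compute_uv=False)

# Synthetic data (an assumption): every Y column is a noisy copy of X's first column.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = X[:, :1] + 0.1 * rng.normal(size=(200, 3))

rho_plain = canonical_correlations(X, Y)           # traditional CCA
rho_ridge = canonical_correlations(X, Y, reg=0.5)  # regularized CCA
```

In this toy setting the leading regularized correlation is smaller than the unregularized one, because the ridge term enlarges both within-set variances in the denominator. In practice the penalty would be tuned, for example by cross-validation, rather than fixed by hand.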
Figure 5.7: The correlations between the students' answers on the student evaluation survey.
Figure 5.8: The structure of the regularized canonical correlation between the two parts of course evaluation.
Figure 5.9: The structure of the sparse canonical correlation between the two parts of course evaluation.
The two structures were similar, but the correlation structure resulting from the regularized CCA had more variables than the correlation structure for the sparse CCA. This is mainly because sparse CCA sets the canonical weights of unimportant variables to zero, whereas regularized CCA only shrinks these canonical weights, so the canonical factor loadings and cross-loadings can still show the importance of the variable.
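The different behaviour of the two penalties can be seen on a single weight vector. The sketch below is an assumption-laden illustration rather than the algorithm used here: `soft_threshold` is the standard proximal operator of an L1 penalty, `ridge_shrink` is the closed-form solution of a one-dimensional L2-penalized least-squares problem, and the weights and the penalty value 0.2 are invented.

```python
def soft_threshold(weights, lam):
    """L1-type update: shrink towards zero and cut small weights to exactly zero."""
    result = []
    for w in weights:
        magnitude = max(abs(w) - lam, 0.0)
        result.append(magnitude if w > 0 else -magnitude if magnitude > 0 else 0.0)
    return result

def ridge_shrink(weights, lam):
    """L2-type update: scale every weight down, none becomes exactly zero."""
    return [w / (1.0 + lam) for w in weights]

# Hypothetical canonical weights for four survey questions.
weights = [0.9, -0.6, 0.15, -0.05]

sparse_weights = soft_threshold(weights, 0.2)
ridge_weights = ridge_shrink(weights, 0.2)

n_sparse = sum(1 for w in sparse_weights if w != 0.0)
n_ridge = sum(1 for w in ridge_weights if w != 0.0)
```

With these made-up numbers the L1-type update keeps two of the four variables, while the L2-type update keeps all four with smaller weights, which mirrors why the regularized structure contains more variables than the sparse one.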
The simplest model was obtained from the sparse canonical correlation analysis. The association between how students evaluate the course and how students evaluate the teacher was found to be due to the relationship between (on the course side, Form A) the good continuity between teaching activities in the course, content of the course, teaching material and overall quality of the course and (on the teacher side, Form B) the teacher's ability to give a good grasp of the academic content of the course, the teacher's ability to motivate the students, and the teacher's good communication about the subject.
To check the stability of the correlation structures, the subsequent years of the course should be analyzed. However, the Introductory Programming with MatLab course had different numbers of students registered (from 100 to 350 students), and the evaluation response rates ranged from 19% to 50%. Therefore, in some of the terms the number of observations was too small to conduct a proper analysis. Figures 5.10 and 5.11 present the correlation structures resulting from a sparse canonical correlation analysis of the evaluations in June 2011 and June 2012 respectively.
Figure 5.10: Structure of canonical correlation between the two parts of the course evaluation in June 2011
Overall, the two structures are similar. The only difference is on the side of the evaluation of the teacher, where question B.2.2 (The teacher is good at helping me to understand the academic content) appears in the structure for 2011, while B.1.3 (The teacher motivates us to actively follow the class) was in the canonical correlation structure for 2012. The figures also show the weight each variable had in the latent canonical variable. The weights were different for the two years. However, this can be explained by the main teachers of the course being different.
Figure 5.11: Structure of canonical correlation between the two parts of the course evaluation in June 2012
The association between how students rate the teacher and the course was found to be subject to change with the change of teaching methods and with the change of teacher.
5.2 Text mining of student comments
The first part of the project considered the association between the two quantitative parts of the evaluation survey. However, many lecturers pointed out that the students' written feedback provides more precise information about students' points of satisfaction or dissatisfaction than the quantitative score. Moreover, student ratings of courses and teachers are subjective.
The current process of analysing SET results at DTU does not include analysis of students’ open-ended feedback. The students’ comments on what went well, what did not go so well, and students’ suggestions are available to course teachers and to the university administration. The traditional way of analysing students’ comments, i.e. reading, is hardly applicable when all courses of the university or a department are analysed. In this situation, an automated method for extracting the most important information from the students’ written feedback may be able to provide insight to university administration and department study boards on how a course was conducted, what went well, and what could be improved.
There are some challenges in analyzing the students' written comments. First of all, the response rates on open-ended SET questions are usually below 20% (Jordan, 2011). Moreover, standard text-mining methods are developed for analysis of large documents or large collections of documents, while the students' comments differ in length, ranging from just a few words to several paragraphs of detailed discussion. Another challenge is that many comments contain slang, mistakes, misprints, word contractions, course-specific terms and abbreviations. However, one of the advantages of DTU's survey is that students write their positive and negative feedback separately.
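To make these challenges concrete, the sketch below shows one possible first step for short comments: normalising contractions and abbreviations, removing very common words, and counting terms separately for the positive and the negative feedback. It is a minimal illustration only. The replacement table, the stop-word list, and the example comments are all invented for this sketch and do not come from the survey data.

```python
import re
from collections import Counter

# Hypothetical normalisation table; real comments would need a course-specific one.
REPLACEMENTS = {"didn't": "did not", "can't": "cannot", "ta": "teaching assistant"}
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "was", "were", "and", "to", "of", "not", "did", "is"}

def tokenize(comment):
    """Lower-case a comment, expand known contractions, drop stop words."""
    words = re.findall(r"[a-z']+", comment.lower())
    expanded = []
    for word in words:
        expanded.extend(REPLACEMENTS.get(word, word).split())
    return [w for w in expanded if w not in STOP_WORDS]

def term_counts(comments):
    """Count term occurrences over a list of comments of any length."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts

# Invented examples; positive and negative feedback are kept apart,
# as they are in the survey itself.
positive = ["Good lectures and exercises", "The TA was helpful"]
negative = ["Exercises were too long", "Didn't get feedback on the exercises"]

pos_counts = term_counts(positive)
neg_counts = term_counts(negative)
```

Because the two kinds of feedback arrive separately, the same term (here "exercises") can be counted once as a point of satisfaction and twice as a point of dissatisfaction without any sentiment classification step.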