2. CHAPTER II: THEORETICAL FRAMEWORK
2.8. Description of the Selected Technology
This limitation is overcome to some extent by Piech et al., who let students use the Eclipse programming environment, a non-structured editor, to invoke methods defined by instructors, and who use statistical models to automatically determine how much difficulty students had with their assignments [47].
The goal of their work was to automatically monitor students’ progress through assignments and determine where they were having difficulty. Their intuition was that a) incremental submissions of students’ programs can be automatically assigned a label such as “the student has just started”, b) students’ transitions from label to label can be modeled graphically to show how students progress through an assignment, and c) the amount of difficulty a student had on an assignment can be measured by the number of code submissions made before moving on to the next label.
To test their intuition, they created an Eclipse plug-in that logged students’ code whenever they saved or compiled their programs. Since labels were not predefined, they needed a way to a) automatically convert incremental code submissions into labels and b) predict the likelihood that a student would transition to the next label. To create labels, they clustered 2000 code submissions from different students using the K-Medoids algorithm. Given n versions of code and k (the number of clusters to produce), the algorithm selects k representative versions (medoids), assigns every version to its nearest medoid, and chooses as each cluster’s medoid the member with the smallest total distance to the other members. In their case, n was 2000 and k was 26. Since there is no well-known measure of similarity between two pieces of code, they created three measures.
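The clustering step can be illustrated with a minimal K-Medoids sketch that works with any pairwise code-distance function. It uses the simple alternating (Voronoi-iteration) variant rather than the full PAM swap search, and a seeded random initialisation; these choices and the name `k_medoids` are assumptions for illustration, not Piech et al.’s implementation.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_medoids(items: Sequence[T], k: int, dist: Callable[[T, T], float],
              max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Alternating K-Medoids sketch (not full PAM).

    Returns (indices of the medoid items, cluster index assigned to each item).
    """
    n = len(items)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(items)")
    # Precompute all pairwise distances once; code-distance measures can be costly.
    d = [[dist(items[i], items[j]) for j in range(n)] for i in range(n)]
    medoids = random.Random(seed).sample(range(n), k)

    def assign_all(meds: List[int]) -> List[int]:
        # Assignment step: every item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: d[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        assign = assign_all(medoids)
        # Update step: each cluster's new medoid is the member with the
        # smallest total distance to the other members of that cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if assign[i] == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_all(medoids)
```

With numbers standing in for code versions, `k_medoids([0, 1, 2, 10, 11, 12], 2, lambda a, b: abs(a - b))` separates the two groups and picks the central member of each (indices 1 and 4) as medoids. Because medoids are actual data points, the method only needs a distance function, which is what makes it usable with hand-built code-distance measures.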
The first measure, Bag of Words Difference, represents each version of code as a histogram of how often keywords appear and uses the Euclidean distance between the two histograms as the difference between the two versions. The second measure, Application Program Interface (API) Call Dissimilarity, is computed by a) executing students’ programs to capture the sequence of API calls each one makes, b) aligning the two sequences with the Needleman-Wunsch global DNA alignment algorithm [45] to find the parts that do not match, and c) using the amount of mismatch as the difference between the two programs. The last measure, Abstract Syntax Tree (AST) Change Severity, is computed by first creating AST representations of the two programs. An abstract syntax tree is a tree representation of a program in which each node represents a syntactic construct; the term “abstract” means that some syntax, such as parentheses, is omitted from the tree. The next step is to compute the Evolizer change severity score, which is the minimum number of changes needed to transform the AST of one program into the AST of the other. Finally, they use this score as the difference between the two programs.
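The first two measures can be sketched as follows. The keyword vocabulary `KEYWORDS` and the example API call names are hypothetical, and the alignment uses assumed unit costs (0 for a match, 1 for a mismatch or gap), so the result counts unmatched positions in the optimal global alignment; with these costs the Needleman-Wunsch recurrence yields the same value as an edit distance. The AST Change Severity measure depends on the Evolizer tool and is not reproduced here.

```python
import math
import re
from collections import Counter
from typing import Iterable, List, Sequence

# Hypothetical keyword vocabulary; the keyword list used in the study is not given.
KEYWORDS = ("if", "else", "for", "while", "return", "int", "double", "void")


def keyword_histogram(code: str, keywords: Iterable[str] = KEYWORDS) -> List[int]:
    """Count how often each keyword appears as a whole identifier-like token."""
    counts = Counter(re.findall(r"[A-Za-z_]\w*", code))
    return [counts[k] for k in keywords]


def bag_of_words_difference(code_a: str, code_b: str) -> float:
    """Euclidean distance between the two keyword histograms."""
    ha, hb = keyword_histogram(code_a), keyword_histogram(code_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ha, hb)))


def api_call_dissimilarity(calls_a: Sequence[str], calls_b: Sequence[str]) -> int:
    """Needleman-Wunsch global alignment with assumed costs: match 0,
    mismatch 1, gap 1. Returns the number of unmatched positions."""
    n, m = len(calls_a), len(calls_b)
    # score[i][j] = cheapest alignment of calls_a[:i] with calls_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i  # calls_a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (0 if calls_a[i - 1] == calls_b[j - 1] else 1)
            score[i][j] = min(diag, score[i - 1][j] + 1, score[i][j - 1] + 1)
    return score[n][m]
```

For example, the hypothetical call traces `["move", "turnLeft", "move"]` and `["move", "move"]` differ by one unmatched call, so their dissimilarity is 1.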
To evaluate each measure, they a) selected 90 pairs of programs, each pair from the same student, b) computed each measure for each pair, c) recruited five experts to label each pair as either similar or different based on style and functional rules given to them, and d) compared the experts’ assessments with each measure. The API Call Dissimilarity and AST Change Severity measures performed best, so they combined them in a weighted sum. We refer to this weighted sum as code distance because it quantifies the difference between two versions of code. They then used the K-Medoids algorithm to partition the code submissions into clusters based on code distance. A manual inspection confirmed that submissions clustered together were similar in functionality and that the clusters intuitively made sense.
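The combination and evaluation steps can be sketched as below. The weight `w_api`, the threshold rule that turns a distance into a “similar” prediction, and both function names are assumptions; the text does not report the actual weights or the exact comparison procedure.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_code_distance(api_dist: Callable[[T, T], float],
                       ast_dist: Callable[[T, T], float],
                       w_api: float = 0.5) -> Callable[[T, T], float]:
    """Weighted sum of two component distances; w_api is a hypothetical weight."""
    if not 0.0 <= w_api <= 1.0:
        raise ValueError("w_api must lie in [0, 1]")

    def code_distance(a: T, b: T) -> float:
        return w_api * api_dist(a, b) + (1.0 - w_api) * ast_dist(a, b)

    return code_distance


def agreement_with_experts(pairs: Sequence[Tuple[T, T]],
                           expert_similar: Sequence[bool],
                           dist: Callable[[T, T], float],
                           threshold: float) -> float:
    """Fraction of pairs on which the prediction 'dist <= threshold'
    agrees with the experts' similar/different label."""
    if not pairs or len(pairs) != len(expert_similar):
        raise ValueError("need at least one pair and one expert label per pair")
    hits = sum((dist(a, b) <= threshold) == label
               for (a, b), label in zip(pairs, expert_similar))
    return hits / len(pairs)
```

The resulting `code_distance` function can be passed directly to any clustering routine that accepts a pairwise distance, such as a K-Medoids implementation.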
To model students’ transitions graphically, they used Hidden Markov Models (HMMs), which are probabilistic finite state machines, as shown in Figure 2.7. The term “hidden” means that the states (labels) are not directly observed but are inferred from data that correlate with them; in their case, the data are the incremental code submissions. Each label is a node in the finite state machine, and the HMM specifies both the probability of transitioning from one label to the next and the probability of observing a given code submission while the student is in a given label. The final step is to determine the amount of difficulty students had using the graphical model.
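Inferring hidden labels from observed submissions is commonly done with Viterbi decoding. The sketch below assumes each code submission has already been reduced to a discrete symbol (for example, its cluster index) and that the start, transition, and emission probabilities are known; it is an illustrative decoder, not the procedure reported by Piech et al.

```python
import math
from typing import List, Sequence


def viterbi(obs: Sequence[int], start: Sequence[float],
            trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]]) -> List[int]:
    """Most likely hidden-label sequence for a sequence of observed symbols.

    start[s]    : P(first label = s)
    trans[s][t] : P(next label = t | current label = s)
    emit[s][o]  : P(observed symbol o | label s)
    """
    if not obs:
        raise ValueError("obs must contain at least one observation")

    def log(p: float) -> float:
        return math.log(p) if p > 0 else -math.inf

    n_states = len(start)
    # best[s] = log-probability of the best label path ending in label s
    best = [log(start[s]) + log(emit[s][obs[0]]) for s in range(n_states)]
    back: List[List[int]] = []
    for o in obs[1:]:
        prev, pointers, best = best, [], []
        for t in range(n_states):
            s_best = max(range(n_states), key=lambda s: prev[s] + log(trans[s][t]))
            pointers.append(s_best)
            best.append(prev[s_best] + log(trans[s_best][t]) + log(emit[t][o]))
        back.append(pointers)
    # Follow the back-pointers from the most likely final label.
    path = [max(range(n_states), key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With two hypothetical labels (0 = “just started”, 1 = “working solution”), `viterbi([0, 0, 1, 1], [1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]])` decodes the label path `[0, 0, 1, 1]`. Working in log space avoids numerical underflow on long submission histories.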
To find patterns, they compared students’ paths through the labels in the HMM by clustering those paths with the K-Means algorithm. Given the paths and k (the number of clusters to produce), the algorithm partitions the paths into clusters based on the average probability that one student’s path could be produced by another student’s HMM and vice versa. They found several clusters in which students submitted several versions of code but remained in the same label. The number of times students remained in the same label indicated how much difficulty they had while programming. Interestingly, once students were having difficulty, there was a high probability that they would remain stuck.
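The similarity used for clustering paths, and the “stuck” count used as a difficulty indicator, can be sketched as follows. The `HMM` container, the function names, and the discrete-symbol encoding of submissions are assumptions; the forward algorithm computes the probability that a model produces an observation sequence, and the similarity averages that probability in both directions as described above. The K-Means step itself is omitted.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HMM:
    start: Sequence[float]            # P(first label = s)
    trans: Sequence[Sequence[float]]  # trans[s][t] = P(next label t | label s)
    emit: Sequence[Sequence[float]]   # emit[s][o] = P(symbol o | label s)


def sequence_probability(hmm: HMM, obs: Sequence[int]) -> float:
    """Forward algorithm: P(obs | hmm), summed over all hidden-label paths.
    Plain probabilities underflow on long sequences; a scaled or log-space
    version would be needed there."""
    if not obs:
        raise ValueError("obs must contain at least one observation")
    n = len(hmm.start)
    alpha = [hmm.start[s] * hmm.emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[s] * hmm.trans[s][t] for s in range(n)) * hmm.emit[t][o]
                 for t in range(n)]
    return sum(alpha)


def path_similarity(obs_a: Sequence[int], hmm_a: HMM,
                    obs_b: Sequence[int], hmm_b: HMM) -> float:
    """Average probability that each student's observations are produced
    by the other student's HMM."""
    return 0.5 * (sequence_probability(hmm_b, obs_a) + sequence_probability(hmm_a, obs_b))


def stuck_count(labels: Sequence[int]) -> int:
    """Number of consecutive submissions that stayed in the same label."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev == cur)
```

For instance, a student whose decoded labels are `[0, 0, 0, 1, 1, 2]` stayed in the same label on three consecutive submissions, so `stuck_count` returns 3.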
Figure 2.7: Hidden Markov Model of state transitions for a student. “Code” nodes represent a version of a student’s code at a particular time, and “State” nodes represent the high-level label the student is in at that same time. N represents the number of states and versions of code for a student.