Approaches to knowledge tracing often aim to model the learner’s mastery of the skills being taught during a tutoring interaction (see Pelánek (2017) for an overview). This mastery is one of the important pieces of information stored in the student model of an ITS, and it can serve as a knowledge base for addressing the learner’s individual needs by planning the next steps of a tutoring interaction accordingly. One possible approach is to extract information about the student from data using complex machine learning algorithms, such as recurrent neural networks (Piech et al., 2015; Khajah et al., 2016), collaborative filtering techniques (Töscher and Jahrer, 2010), or even ensembles of different approaches (Pardos et al., 2012). While these models often achieve good predictive accuracy, they lack interpretability. Especially in educational applications, however, educators and teachers are often concerned with the interpretability and validity of the applied models, and thus these approaches are barely used in practice. Alternatively, simple and easily understandable assumption-free approaches, such as the exponential moving average, can be used. Here, past attempts to solve a task are weighted by an exponentially decreasing function to estimate the learner’s knowledge. These approaches have the advantages of computational efficiency and ease of application, while often providing reasonable predictions. Nevertheless, they still cannot keep up with more sophisticated knowledge tracing algorithms (cf. Wauters et al., 2012; Pelánek, 2014).
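The exponential moving average mentioned above can be illustrated with a minimal sketch. The function name `ema_mastery`, the smoothing factor `alpha`, and the starting value `initial` are assumptions chosen for illustration and are not taken from the cited works.

```python
def ema_mastery(outcomes, alpha=0.3, initial=0.5):
    """Estimate mastery from a sequence of 0/1 answer outcomes.

    Each new outcome gets weight `alpha`; the weight of older outcomes
    decays by a factor of (1 - alpha) per step, i.e. exponentially.
    `alpha` and `initial` are illustrative values, not fitted ones.
    """
    estimate = initial
    for correct in outcomes:
        estimate = alpha * float(correct) + (1.0 - alpha) * estimate
    return estimate


# Recent successes count more than early ones:
recent_success = ema_mastery([0, 0, 1, 1])
early_success = ema_mastery([1, 1, 0, 0])
```

With these settings, `recent_success` is larger than `early_success`, even though both sequences contain the same number of correct answers.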
A more elaborate approach to modeling the learner’s knowledge is based on logistic models, which are usually used to model the acquisition and forgetting of declarative knowledge (White, 2001; Pavlik and Anderson, 2005; Pelánek, 2015; Sense et al., 2016). To achieve this, the skill is represented as a continuous variable and learning is modeled as a gradual change. Furthermore, a logistic function, e.g., f(x) = 1/(1 + e^(-x)), is used to relate the task difficulty and the current skill mastery to the probability of answering correctly. A typical logistic model is Performance Factors Analysis (Pavlik et al., 2009), which estimates the skill mastery based on the learner’s performance during the interaction. Similar models are the Additive Factors Model (Cen et al., 2006; Käser et al., 2014b), which is also sensitive to the frequency of prior practice of a skill, Instructional Factors Analysis (Chi et al., 2011), which also incorporates different types of instructional interventions and their effects, and the Elo Rating System (Pelánek, 2016). The latter was originally developed to rate chess players. It allows the skill level of students, as well as the difficulty of tasks, to be estimated easily and dynamically by interpreting each student answer as a match between the student and the task.
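The idea of treating an answer as a match between student and task can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the constant step size `k`, and the use of the plain difference between skill and difficulty as the logistic argument are simplifications made here, not the exact formulation of the cited work.

```python
import math


def logistic(x):
    """Standard logistic function f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def elo_update(skill, difficulty, correct, k=0.4):
    """One Elo-style update after a single answer.

    The expected probability of a correct answer depends on the gap
    between skill and difficulty. The surprise (outcome - expectation)
    moves both estimates in opposite directions.
    `k` is an assumed constant step size.
    """
    expected = logistic(skill - difficulty)
    delta = k * (float(correct) - expected)
    return skill + delta, difficulty - delta


# A student and a task that both start at 0; the student answers correctly.
new_skill, new_difficulty = elo_update(0.0, 0.0, correct=True)
```

After a correct answer the skill estimate rises and the task is judged easier; an incorrect answer has the opposite effect.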
Another widely used group of approaches incorporates Bayesian models. They handle uncertainty easily, can recover from errors during an interaction, and allow hidden state values to be inferred from evidence. The On-Line Assessment of Expertise (OLAE) tool, for instance, observes the individual steps taken by the learner to infer her skill mastery and domain knowledge (Vanlehn and Martin, 1998). Similarly, the ITS called Ecolab logs the learner’s requests for help to predict the mastery of the current domain, as well as the readiness to learn new topics (Luckin and du Boulay, 1999). Moreover, Gordon and Breazeal (2015) presented a so-called “active learner model” to trace the word reading skill of young children. It employs a simple distance metric to approximate the conditional probability p(w2|w1)
evaluation showed that this approach is able to adapt to users of different ages and to trace their reading knowledge fairly well (Gordon and Breazeal, 2015). An additional benefit of Bayesian models is that they can be optimized through machine learning, which accelerates the development of ITSs and allows models to be refined based on the learner’s data either during the interaction (Schadenberg et al., 2017) or even beforehand (Arroyo et al., 2004; Ferguson et al., 2006). Schadenberg et al. (2017), for instance, based their lesson planning on the likelihood that the learner will answer correctly. This likelihood is modeled with just two parameters, namely the user’s learning ability and the task difficulty. Both are fitted during the interaction while observing the learner’s performance to optimize the knowledge tracing and, with that, the lesson planning (Schadenberg et al., 2017).
In addition, Bayesian models are also able to model changes over time, e.g., in Dynamic Bayesian Networks (DBNs). Similar to traditional Bayesian models, DBNs can be used by an ITS to decide what to do next based on the current knowledge about the learning situation. However, they also allow for more detailed planning of the next steps by simulating the future course of the interaction. Early versions of DBN-like constructs can be found in the Andes physics tutor (Conati, 2002) and the Prime Climb math tutor (Conati et al., 2002; Conati and Maclaren, 2009), which model the learner’s goals and affective states while using a non-Bayesian update rule for better scaling. However, probably the most common DBN-like Bayesian model for tracing the learner’s knowledge is Bayesian Knowledge Tracing (BKT) (Corbett and Anderson, 1994). It is based on a Hidden Markov Model (HMM) consisting of a latent variable (the knowledge of a single skill) and an observable variable (the answer correctness), and it often serves as a basis for more complex models of knowledge tracing (e.g., de Baker et al., 2008; Lee and Brunskill, 2012; Spaulding et al., 2016). Spaulding et al. (2016), for instance, proposed the Affective BKT model to trace the language reading skills of children in a cHRI. They introduced two further observable variables, namely “smile” and “engagement”, to enable the system to take the affective and cognitive state of the child into account and to calculate the skill belief correspondingly. Their evaluation showed that this model outperforms traditional BKT-based models for knowledge tracing in educational settings (Spaulding et al., 2016). Similarly, Käser et al. (2014a) extended the traditional BKT and defined a comprehensive DBN to trace the knowledge of all skills to be learned in a single network. This allows the system to trace the learner’s knowledge of each skill individually and, additionally, to represent and reason about skill interdependencies, which in turn makes it possible to specify the best learning order of all skills or even to let the system search for it autonomously. Their evaluation demonstrated that this more detailed model outperforms the traditional BKT with regard to the accuracy of the traced skill beliefs, at least in domains with interdependent skills (Käser et al., 2014a). In addition to these examples, many other extensions and variants of the basic BKT can be found that include the item difficulty (Pardos and Heffernan, 2011), forgetting (Khajah et al., 2016), extended learning states (Zhang and Yao, 2018), or the time between attempts (Qiu et al., 2011), or that investigate the individualization of these models in more detail (Pardos and Heffernan, 2010; Yudelson et al., 2013).
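To make the basic BKT mechanism described above concrete, the following minimal sketch shows the belief update for a single skill. It uses the four standard BKT parameters (initial knowledge, learning, guess, and slip probabilities); the function names and all numeric values are assumptions chosen for illustration, not fitted parameters from any of the cited studies.

```python
def bkt_predict_correct(p_know, p_guess=0.2, p_slip=0.1):
    """Probability that the next answer is correct under the current belief."""
    return p_know * (1.0 - p_slip) + (1.0 - p_know) * p_guess


def bkt_update(p_know, correct, p_learn=0.1, p_guess=0.2, p_slip=0.1):
    """One BKT step: Bayesian evidence update, then the learning transition.

    p_know  : current belief that the skill is known (latent variable)
    correct : observed answer correctness (observable variable)
    Parameter values are illustrative and assumed to lie strictly in (0, 1).
    """
    if correct:
        # A correct answer comes from knowing without slipping, or from guessing.
        known_and_observed = p_know * (1.0 - p_slip)
        evidence = known_and_observed + (1.0 - p_know) * p_guess
    else:
        # An incorrect answer comes from slipping, or from not knowing and not guessing.
        known_and_observed = p_know * p_slip
        evidence = known_and_observed + (1.0 - p_know) * (1.0 - p_guess)
    posterior = known_and_observed / evidence
    # The learner may acquire the skill during this practice opportunity.
    return posterior + (1.0 - posterior) * p_learn


# Trace the belief over a short, hypothetical answer sequence.
belief = 0.3  # assumed initial knowledge probability
for answer in [False, True, True, True]:
    belief = bkt_update(belief, answer)
```

This basic form has no forgetting transition, so the belief can only drop through the evidence update after an incorrect answer.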
Generalizations and combinations of Bayesian and logistic models have also been developed in recent years (Wang et al., 2013; Gonzalez-Brenes et al., 2014; Khajah et al., 2014a,b; Streeter, 2015). Khajah et al. (2014b), for instance, combined Item Response Theory (a logistic model), which models different student abilities and problem difficulties, with an HMM to trace the learner’s skill acquisition. Their evaluation showed that the combination outperforms each model on its own (Khajah et al., 2014b). Streeter (2015), instead, used a more generalized approach called mixture modeling, which also combines both types of models but shows higher improvements in prediction accuracy on real data (Streeter, 2015). However, the superiority of these more complex hybrid models is mostly observable only when they are compared with fairly simple versions of the basic models. In fact, Zhu et al. (2018) demonstrated that the basic BKT approach extended with temporal information about the performance data already achieves comparable results (Zhu et al., 2018).