Monterrey Campus
School of Engineering and Sciences
Exploring Data-Driven Selection Hyper-Heuristic Approaches for the Curriculum-Based Course Timetabling
A thesis presented by
Carlos Alfonso Hinojosa Cavada
Submitted to the
School of Engineering and Sciences
in partial fulfillment of the requirements for the degree of
Master of Science
in
Computer Science
Monterrey, Nuevo León, December, 2020
Declaration of Authorship
I, Carlos Alfonso Hinojosa Cavada, declare that this thesis titled, Exploring Data-Driven Selection Hyper-Heuristic Approaches for the Curriculum-Based Course Timetabling and the work presented in it are my own. I confirm that:
• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.
Carlos Alfonso Hinojosa Cavada
Monterrey, Nuevo León, December, 2020
© 2020 by Carlos Alfonso Hinojosa Cavada. All Rights Reserved
Dedication
I would like to express my deepest gratitude to all those who have been by my side on every step of the way: my parents, friends and family. Thank you all for your unconditional confidence, support, patience, and encouragement. You were my main motivation for completing this work.
Acknowledgements
First and foremost, I would like to thank my family for being my driving force and motivation.
This achievement is also yours.
To Brianda Daniela Flores García for your love and support, you inspire me every day.
Life by your side is beautiful. I am very grateful to share it with you.
Special thanks to my advisor, Dr. Santiago Conant, who was always there to guide the project through these last two years. The encouragement of Dr. José Carlos Ortiz Bayliss was also fundamental for driving this project to conclusion.
To Erick Lara Cárdenas, for your everlasting willingness to help, and for sharing your passion for technology and software development. I appreciate your advice for completing this dissertation.
To my classmates Eider Díaz and Miguel Ángel Cortés. We made a great team, may we continue doing so for years to come. Thanks to Wario Chávez, Axel Ramos and the great group of people I was fortunate to meet these years. You were brilliant.
This work was possible partly because of the tuition support and computing resources provided by Tecnologico de Monterrey and CONACyT.
Exploring Data-Driven Selection Hyper-Heuristic Approaches for the Curriculum-Based Course Timetabling
by
Carlos Alfonso Hinojosa Cavada
Abstract
The curriculum-based timetabling problem (CB-CTT) represents a challenging field of study within educational timetabling, with real-world applications that stress its importance. Solving a CB-CTT problem requires allocating a set of courses using limited resources, subject to a set of hard constraints that must be satisfied [102]. The goal then is to find a feasible assignment of every lecture of every course to a position in the timetable, formed by a combination of day, period, and room, while minimizing an objective function as specified by the constraints in the problem.
Designing the timetable for the courses in the incoming term is a problem faced by universities each academic period. Given the complexity of manually designing timetables, automated methods have attracted the attention of many researchers for solving this problem.
The design of timetables remains an open problem to this day. According to the no free lunch theorem [120], different heuristics are effective on different problem instances, stressing the importance of finding automated methods for designing timetables. This dissertation explores novel hyper-heuristic models that rely on various machine learning techniques, such as boosting, clustering and principal component analysis. In total, two models were designed and implemented as a result of this work.
The first model relies on gradient boosting algorithms to generate a selection hyper-heuristic. The general idea is that different instances of the CB-CTT are best solved by different heuristics. Hence, the aim is to create a model that learns from the features that describe problem instances and predicts which would be the most suitable heuristic to apply. While the classification model produces promising results in terms of accuracy, the quality of the generated solutions is bounded by the best-known single heuristic.
The second model aims to remove the bounds set by the use of a single heuristic by exploring ways of combining heuristics during the timetable construction process. The selection hyper-heuristic approach is powered by principal component analysis and k-means.
The model starts by identifying similar regions in the instance space and keeping track of the performance of each heuristic for those regions. Then, when constructing new timetables, the model determines the most suitable heuristic for a given region of the instance space. The method was able to outperform the synthetic oracle created by taking the result of the best isolated heuristic in several instances.
This dissertation is submitted to the Graduate Programs in Engineering and Information Technologies in partial fulfillment of the requirements for the degree of Master of Science in Computer Sciences with a major in Intelligent Systems.
List of Figures
2.1 Example data for the heuristic ordering of five courses with six timeslots and three rooms.
2.2 Hyper-heuristic classification by learning methodology and the nature of the heuristic space as proposed by Burke et al. [20].
2.3 Example of the steps described by the Cross-Industry Standard Process for Data Mining (CRISP-DM). The outer circle represents the cyclic nature of the data analysis process.
3.1 Lecture distribution for a 200-lecture CB-CTT instance with a 9-block long hyper-heuristic. Lectures are randomly distributed, with the goal of exploring the space of the instance without bias.
4.1 Block diagram showing the process of generating feature-driven hyper-heuristics.
4.2 Graphical representation of the accuracy of XGBoost after feature transformation. The figure shows that the model is able to choose a competitive heuristic, even when failing to predict the best known value.
4.3 Graphical representation of the importance of features used for classification as measured by their F-Score.
5.1 Block diagram showing the process of generating hyper-heuristics for a single experiment scenario as shown in Table 5.2. The methods and techniques used for PCA and clustering are described in Section 5.1.1. The method by which the input dataset is created is described in Section 3.3.2.
5.2 Graphical representation of the cumulative percentage of the variance as explained by the principal components.
5.3 Graphical representation of the Elbow method. As the number of clusters increases, the WCSS value decreases. A change in the steepness of the rate of decrease indicates the optimal number of clusters.
5.4 Success rate of the best hyper-heuristics (30 runs) trained with 7500 data points using PCA for feature selection and the lowest average to determine the best heuristic on the cluster (HH17, HH25), feature importance for selection (FI) and roulette wheel for deciding which heuristic to use (RW).
5.5 Graphical representation of the penalties for different solution methods on the benchmark instances. The best known value using the heuristics alone (H*, blue) is outperformed by the hyper-heuristic methods (HH18, green).
List of Tables
3.1 The set of features used to describe instances on the problem. Some features are extracted only after an instance has gone through the solution process.
3.2 The set of features used to describe the state of the problem as it changes during the construction phase.
3.3 Description of the ITC-2007 instances.
4.1 Curriculum-based course timetabling features considered for training classification models. Each feature is mapped to a unique ID for simplicity.
4.2 Accuracy scores for XGBoost and Adaboost models with varying number of features, according to their importance during training stages.
5.1 The mean of the cost for each observation on every cluster is calculated. The heuristic with the lowest average cost is considered the best for the observations related to that cluster.
5.2 Experiments for data-driven hyper-heuristic models are divided into three stages: exploratory, extended and confirmatory. On the first two scenarios, we increase the count of data points used to train the hyper-heuristics. Confirmatory experiments are performed on the benchmark set.
5.3 Hyper-heuristic success rate (%) for scenario 1 (see Table 5.2). The best hyper-heuristic (considering its performance against H*) is highlighted in bold. A test is considered a success when the hyper-heuristic is better than H*.
5.4 Hyper-heuristic success rate (%) for scenario 2 (see Table 5.2). The best hyper-heuristic (considering its performance against H*) is highlighted in bold. A test is considered a success when the hyper-heuristic is better than H*.
5.5 Hyper-heuristic success rate (%) for scenario 3 (see Table 5.2). The best hyper-heuristic (considering its performance against H*) is highlighted in bold. A test is considered a success when the hyper-heuristic is better than H*.
5.6 Hyper-heuristic success rate (%) for the confirmatory experiments on scenario 4 (see Table 5.2). The best hyper-heuristic (considering its performance against H*) is highlighted in bold. A test is considered a success when the hyper-heuristic is better than H*.
Contents
Abstract
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.2 The Curriculum-Based Course Timetabling
1.3 Hypothesis
1.4 Objectives
1.5 Methodology
1.6 Contributions
1.7 Document Structure
2 Theoretical Background
2.1 The Curriculum-Based Timetabling Problem
2.1.1 Definitions and Formal Representation
2.1.2 Exact Methods for Solving the CB-CTT
2.1.3 Approximation Methods for Solving the CB-CTT
2.1.4 Constructive Heuristics for Solving the CB-CTT
2.1.5 Other Educational Timetabling Problems
2.2 Metaheuristics and Machine Learning Methods
2.2.1 Tabu Search
2.2.2 Simulated Annealing
2.2.3 Genetic Algorithms
2.2.4 Neural Networks
2.2.5 Meta-Learning
2.3 Hyper-Heuristics
2.3.1 Hyper-Heuristics for Educational Timetabling
2.4 Cross-Industry Standard Process for Data Mining
2.5 Summary
3 Methodology: Hyper-Heuristic Generation Through CRISP-DM
3.1 Business Understanding
3.1.1 The Curriculum-Based Course Timetabling Problem
3.2.1 Static Features
3.2.2 Dynamic Features
3.2.3 The CB-CTT Instances
3.3 Data Preparation
3.3.1 Collecting Static Features
3.3.2 Collecting Dynamic Features
3.4 CRISP-DM: Modeling, Evaluation, and Deployment
3.4.1 Model A. Feature-Driven Hyper-Heuristic Model
3.4.2 Model B. Data-driven hyper-heuristic model
3.5 Summary
4 A Feature-Driven Hyper-Heuristic Model
4.1 Solution Model
4.2 Experiments and Results
4.2.1 CRISP-DM: Modeling
4.2.2 CRISP-DM: Model Evaluation
4.3 Summary
5 A Data-Driven Hyper-Heuristic Model
5.1 Solution Model
5.1.1 Modeling: Training a Data-Driven Hyper-Heuristic
5.2 Model Evaluation: Experiments and Results
5.2.1 Exploratory Experiments
5.2.2 Extended Experiments
5.2.3 Confirmatory Experiments
5.3 Discussion
5.4 Summary
6 Conclusions
6.1 General Conclusions
6.1.1 Remarks on a Feature-Driven Hyper-Heuristic Model
6.1.2 Remarks on a Data-Driven Hyper-Heuristic Model
6.2 Future Work
A Base Solver
A.1 Neighborhood structure
A.2 Construction phase
A.3 Feasibility and Improvement Phases
Bibliography
Chapter 1 Introduction
Timetables are present in many different areas of life. As an organized society, we need systems to schedule when things are going to happen, whether it is consulting the time for a movie screening, booking a flight, or attending meetings at work. In universities, scheduling courses is a recurring activity, with timetables often adjusted or redesigned for each academic period. Moreover, while we can easily read and understand timetables, creating them can become very complex due to the large number of entities involved. Problems for which finding a solution may be computationally expensive, but for which a given solution can be verified easily, belong to the nondeterministic polynomial time class (NP) [69].
A further classification of problem complexity is the NP-hard class. These problems are at least as hard as the hardest problems in NP, but do not necessarily belong to the NP class [67, 118]. The halting problem is an example of an NP-hard problem [117].
Furthermore, the intersection between NP and NP-hard problems is called NP-complete [4].
While some instances of NP-complete problems may be solved in polynomial-time, there is no known polynomial-time algorithm that guarantees optimal solutions as the instance size increases [47].
Educational timetabling problems belong to the NP-hard class. The problems in this field of artificial intelligence and operational research have many real-world applications, as designing timetables is a recurring activity for educational institutions. This work focuses on the Curriculum-Based Course Timetabling Problem (CB-CTT), which deals with the allocation of lectures that involve teachers and students in a previously defined period of time, while satisfying a set of constraints [102]. Depending on the size of the CB-CTT instance, performing an exhaustive search to find the optimal solution may not be feasible.
In the following sections, we present the Curriculum-Based Course Timetabling problem and the motivation to conduct the study in this area. This is followed by the hypothesis and research questions that led to the objectives of this dissertation. Next, a description of the methodology of how hyper-heuristics and data science will be applied to the problem is presented. Finally, the chapter ends with the contributions derived from this work and a description of how this document is organized.
1.1 Motivation
Formally introduced as the third track in the Second International Timetabling Competition (ITC-2007), the curriculum-based course timetabling (CB-CTT) is a type of university course timetabling where the task is to schedule a set of lectures on a set of time slots, subject to specific constraints. The constraints are divided into two sets: hard constraints, which must be satisfied for a solution to be considered feasible, and soft constraints, which make a timetable more desirable. Thus, solving the CB-CTT requires finding a timetable that allocates every lecture of every course while minimizing the penalty for failing to satisfy the soft constraints.
Educational timetabling belongs to the NP-hard problems [35, 40], and real-world applications can become quite challenging. Additionally, the formulation of the problem varies to accommodate the requirements of different universities in specific periods of time, making CB-CTT an essential problem to study.
While exact methods aiming at optimality through lower bounds have been proposed for solving the CB-CTT [73, 90], it is still considered an open problem, attracting the attention of researchers around the world at the time of writing [30, 34]. These exact methods have obtained varying degrees of success, but are limited by instance size and constraints [48], hence the use of heuristic methods such as metaheuristics and hyper-heuristics.
Heuristic methods [21] specifically tailored for the CB-CTT problem have attracted attention from researchers in recent years, as surveyed by Pillay in [93]. The goal of heuristic methods is to generate approximate solutions in a short time using few computational resources. In educational timetabling, heuristic methods are usually applied to create an initial solution, which cannot be guaranteed to be feasible, let alone optimal, yet this initial timetable can later be optimized by more sophisticated methods. Some examples of this approach include the automated generation of constructive ordering heuristics [94] or iterated local search combining perturbation and hill-climbing [111].
Selecting the most suitable algorithm for an instance of a problem or a specific situation during the solution process can be approached by selection hyper-heuristics [21, 36]. In this study, procedures based on data mining are presented as an alternative for constructing the timetables for the CB-CTT problem, as research has shown that early decisions impact the overall quality of the solutions [82]. The process of generating data-driven selection hyper-heuristics for the CB-CTT is framed using the cross-industry standard process for data mining (CRISP-DM) [104].
1.2 The Curriculum-Based Course Timetabling
Educational timetabling is defined as the allocation, subject to constraints, of given resources to objects placed in space-time in such a way as to satisfy as much as possible a set of objectives. Timetabling is similar to the graph coloring problem [100, 84], in that nodes represent events and edges represent conflicts between them.
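The analogy with graph coloring can be illustrated with a minimal Python sketch. It is not the method used in this work: the curricula below are hypothetical, the function names `build_conflict_graph` and `greedy_coloring` are introduced only for this illustration, and the greedy largest-degree-first ordering is just one common way of coloring a conflict graph, where each color stands for a time period.

```python
from itertools import combinations

def build_conflict_graph(curricula):
    """Return adjacency sets: two courses conflict if they share a curriculum."""
    graph = {}
    for courses in curricula.values():
        for course in courses:
            graph.setdefault(course, set())
        for a, b in combinations(courses, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def greedy_coloring(graph):
    """Give each node the smallest color (period index) unused by its neighbors."""
    colors = {}
    # Visit nodes by decreasing degree; ties are broken by name for determinism.
    for node in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        used = {colors[nb] for nb in graph[node] if nb in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors

# Hypothetical curricula: course c3 belongs to both of them.
curricula = {"k1": ["c1", "c2", "c3"], "k2": ["c3", "c4"]}
graph = build_conflict_graph(curricula)
periods = greedy_coloring(graph)
print(periods)
```

In this toy case, conflicting courses always receive different period indices, while courses without a shared curriculum (such as c1 and c4) may reuse the same one.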
In its most basic form, the timetabling problem is concerned with combining a set of courses, a set of professors, a set of rooms, and a set of timeslots while following a set of constraints. The constraints of the timetabling problem can be classified into two categories: hard and soft. The hard constraints must be satisfied for a solution to be feasible, while soft constraints can be expressed through a cost function whose minimization relates to the quality or desirability of that solution.
1.3 Hypothesis
The application of data science techniques to analyze the data collected from the execution of heuristic methods can be used to discover insights for building selection hyper-heuristics that perform better than the best heuristic for the initial timetable construction on instances of the curriculum-based course timetabling problem.
The main research question for this work is how data science can uncover patterns in the data generated by the execution of heuristic methods to create hyper-heuristics that find high-quality solutions for the CB-CTT problem. To find this answer, several more specific questions must be answered first. These questions include:
• What data should be collected to best describe the different states of the problem during the solution process? How often should such data be collected?
• Which low-level heuristic methods can be used to create an initial solution for the CB-CTT instances? How do such heuristics behave on different instances of the CB-CTT problem?
• How will partial states of the CB-CTT solution process be evaluated and associated with different low-level heuristics?
• How do the different features that characterize the instances of the CB-CTT problem affect the training process for the hyper-heuristic?
• How can the trained machine learning models be evaluated to decide which one provides the best information for assigning the best possible heuristic at every step of the solution process?
• Will the combination of different low-level heuristics produce better solutions compared against applying each heuristic in isolation?
1.4 Objectives
The main objective of this work is to conduct an experimental analysis of a data-driven model for improving hyper-heuristic performance on the Curriculum-Based Course Timetabling problem. We analyze the role of the data science pipeline for the implementation and improvement of this hyper-heuristic model. For this, the following particular objectives must be achieved:
• Collect data during the solution process that can be leveraged to describe the impact of heuristics for different stages of the solution.
• Determine the feasibility of designing and implementing a hyper-heuristic model by comparing the results obtained by applying heuristics in isolation.
• Design and implement at least one model to produce selection hyper-heuristics for the CB-CTT based on the features of the instances.
• Design and implement at least one model to produce selection hyper-heuristics for the CB-CTT based on the data collected from different states of the solution process.
• Present the advantages and disadvantages of solving the CB-CTT problem with the proposed models.
The expected contribution depends directly on the successful completion of these objectives. Once they are completed, the results will provide evidence that data science can be leveraged to guide the design of hyper-heuristic models for the Curriculum-Based Course Timetabling.
1.5 Methodology
This section describes the steps needed to test the hypothesis and accomplish the objectives stated in the previous sections. The following elements are sorted by priority, starting by determining the data collection methodology and concluding with the experimentation and analysis of results.
• Determine the data collection methodology. The first step of the research involves the creation of a dataset with features that describe the effect of heuristic methods applied to the problem instances during the different stages of the solution process. This step includes the research as to which features are commonly used to describe the problem, as well as designing other characteristics that help describe the problem during the solution process. In addition to determining what data will be collected, the frequency for the data collection has to be set before selecting the final dataset.
• Define the set of data science procedures to get insights from the collected data. The data generated in the previous step holds the key to defining a data-driven hyper-heuristic method, but data science techniques must be applied to the raw data to uncover insights.
The selection of these techniques will be based on literature research and may include feature selection, feature scaling, the training of machine learning models and statistical evaluation of the results.
• Specify the data-driven hyper-heuristic model. Define and implement a specific hyper-heuristic model based on the insights gathered from data per the steps above. The hyper-heuristic is designed in a way that makes it possible to compare its results with the ones obtained by the heuristics applied in isolation.
• Conduct experiments and analyze their results. Train the hyper-heuristic model using the problem instances from which data was collected. Define a performance metric to evaluate the different solution strategies across different instances.
1.6 Contributions
The present dissertation delves into hyper-heuristics for solving the CB-CTT. The main contribution of this thesis is the design, experimentation and analysis of two data-driven hyper-heuristic models for the CB-CTT. In more detail, the contributions of this investigation are:
Applying a feature-driven selection hyper-heuristic to the CB-CTT. This work proposes a selection hyper-heuristic model which uses features of the instances to make decisions.
In the model presented in Chapter 4, knowledge obtained from previous experience is leveraged to select the most suitable heuristics while constructing a timetable for the CB-CTT.
A data-driven selection hyper-heuristic for the CB-CTT. The selection hyper-heuristic model for combining heuristics based on previous experience is presented in Chapter 5. In this model, clustering algorithms are used to group observations of intermediate problem states to determine which is the most suitable heuristic to apply at those stages. This approach could be extended to other domains of combinatorial optimization.
A framework for studying the overlap of data science and optimization in timetabling. To the best of the author’s knowledge, there is little to no research on selection hyper-heuristics generated through a data mining process that has been used to solve the CB-CTT. In the models proposed in this dissertation, the optimization process is guided by a strategy based on data science.
1.7 Document Structure
The following chapter presents the formal definition of the CB-CTT and entries in the literature related to the problem. Chapter 3 describes the methodology followed to generate hyper-heuristics through CRISP-DM. Chapters 4 and 5 present the feature-driven and data-driven hyper-heuristic models, along with their experiments and the corresponding analysis in light of the CRISP-DM methodology. Finally, Chapter 6 presents the conclusions and future work derived from this dissertation.
Chapter 2
Theoretical Background
This chapter presents a review of the state of the art related to this research. A brief description of the Curriculum-Based Course Timetabling (CB-CTT) problem is presented, along with selected approaches from the literature to solve it. The concept of hyper-heuristic is defined next, with a special focus on selection hyper-heuristics and some background on their application to the CB-CTT problem. Next, the concept of meta-learning is presented. Finally, the concepts and definitions relevant to this research are summarized.
2.1 The Curriculum-Based Timetabling Problem
This section covers a formal definition for the curriculum-based course timetabling problem (CB-CTT). A brief introduction of the key concepts is presented, followed by some popular solving strategies for the problem and example instances. Finally, we briefly introduce other formulations of the university timetabling problem.
2.1.1 Definitions and Formal Representation
A classic timetabling problem deals with the allocation of events to slots in time, subject to the available resources and explicit constraints. Therefore, the problem can be represented by four sets: the set of events, the set of time periods, the set of resources, and the set of constraints.
Events are activities to be scheduled. In the domain of the CB-CTT, the events represent the lectures that form each of the courses in the problem instance. These courses are aggregated to form the curricula, each of which represents a set of courses that must be taken by a group of students. The set of time periods is formed by multiplying the number of teaching days by the number of time slots of each day. Hence, a time period is a pair formed by day and time slot. Finally, the set of constraints is formed by hard constraints, which must be satisfied for a timetable to be valid, and soft constraints, which indicate the quality of the solution.
The problem is detailed as follows. Let $l_{i,k} \in L_i$ be a lecture of course $c_i$ from the set of courses $C = \bigcup_{i=1}^{n} c_i$, to be scheduled in the set $P$ of $p = d \cdot h$ time periods, where $d$ is the number of teaching days and $h$ is the number of time slots per day. Likewise, each lecture $l_{i,k}$ must be scheduled in a room $r$ from the given set of rooms $R = \bigcup_{i=1}^{n} r_i$. In addition, the $k$ curricula in $K = \bigcup_{i=1}^{n} k_i$ are formed by courses which share common students. This way, a timetable $x$ from the space of all possible timetables $X$ is represented as a $p \times r$ matrix, where $x_{i,j}$ corresponds to a lecture assigned to a period $p_i$ and a room $r_j$. Given this notation, we have a candidate timetable that might be a valid solution.
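As an illustration of the matrix representation, the following Python sketch stores a timetable as a nested list with one row per period and one column per room. The toy dimensions, the room names, and the helper names `period_index` and `assign` are assumptions made for this example only, not part of the formulation or of the solver used in this work.

```python
# Hypothetical toy instance: d teaching days, h time slots per day, two rooms.
d, h = 2, 3
rooms = ["r1", "r2"]
p = d * h  # number of time periods

# The timetable is a p x r matrix; None marks an empty (period, room) position.
timetable = [[None for _ in rooms] for _ in range(p)]

def period_index(day, slot):
    """Map a (day, slot) pair to its row in the timetable."""
    return day * h + slot

def assign(timetable, lecture, day, slot, room):
    """Place a lecture at (period, room); refuse to overwrite an occupied position."""
    i, j = period_index(day, slot), rooms.index(room)
    if timetable[i][j] is not None:
        raise ValueError(f"position ({i}, {j}) is already occupied")
    timetable[i][j] = lecture

# Lectures are identified here by a (course, lecture number) pair.
assign(timetable, ("c1", 0), day=0, slot=0, room="r1")
assign(timetable, ("c1", 1), day=1, slot=2, room="r2")
```

Because each matrix position holds at most one lecture, this representation makes it impossible to place two lectures in the same room and period.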
For a timetable to be considered feasible, it must satisfy every hard constraint, while every soft constraint violation incurs a penalty cost. This study adopts a formulation from ITC-2007 commonly found in the literature, with a set of four hard constraints H = {H1, H2, H3, H4} and a set of four soft constraints S = {S1, S2, S3, S4} considered as follows:
Lectures (H1) All lectures of every course must be scheduled in distinct periods. In other words, H1 will not be satisfied and a timetable will not be considered valid until every lecture has been allocated.
Room Occupancy (H2) Two lectures cannot be assigned to the same room in the same period. In simpler terms, lectures cannot share rooms at the same time for H2 to be satisfied.
Conflicts (H3) Lectures of courses in the same curriculum or taught by the same teacher cannot be scheduled in the same period, to prevent overlapping of students or teachers.
Availability (H4) If the teacher of a course is not available at a given period, then no lectures of the course can be assigned to that period.
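The four hard constraints can be checked with a short Python sketch. It is only an illustration under stated assumptions: a timetable is given as a list of hypothetical (course, period, room) assignments, and the function `hard_violations` together with its input dictionaries are names introduced here, not the data structures of the solver used in this work.

```python
from collections import Counter

def hard_violations(assignments, lectures_required, teacher, curricula, unavailable):
    """Count violations of H1-H4 for a list of (course, period, room) assignments."""
    v = {"H1": 0, "H2": 0, "H3": 0, "H4": 0}

    # H1: every lecture of every course is scheduled, in distinct periods.
    for course, required in lectures_required.items():
        periods = {p for c, p, _ in assignments if c == course}
        v["H1"] += max(0, required - len(periods))

    # H2: at most one lecture per (period, room) position.
    occupancy = Counter((p, r) for _, p, r in assignments)
    v["H2"] += sum(n - 1 for n in occupancy.values() if n > 1)

    # H3: courses sharing a teacher or a curriculum cannot share a period.
    for i, (c1, p1, _) in enumerate(assignments):
        for c2, p2, _ in assignments[i + 1:]:
            if p1 != p2 or c1 == c2:
                continue
            same_teacher = teacher[c1] == teacher[c2]
            same_curriculum = any(c1 in k and c2 in k for k in curricula)
            if same_teacher or same_curriculum:
                v["H3"] += 1

    # H4: no lecture in a period where the teacher of the course is unavailable.
    v["H4"] += sum(1 for c, p, _ in assignments if p in unavailable.get(c, set()))
    return v

# Hypothetical data: two courses in one curriculum, taught by different teachers.
lectures_required = {"c1": 2, "c2": 1}
teacher = {"c1": "t1", "c2": "t2"}
curricula = [{"c1", "c2"}]
unavailable = {"c2": {0}}

feasible = [("c1", 0, "r1"), ("c1", 1, "r1"), ("c2", 2, "r1")]
broken = [("c1", 0, "r1"), ("c2", 0, "r1")]
print(hard_violations(feasible, lectures_required, teacher, curricula, unavailable))
print(hard_violations(broken, lectures_required, teacher, curricula, unavailable))
```

A timetable is feasible in this sketch only when all four counters are zero; the second example violates each hard constraint exactly once.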
Recall that soft constraints have no impact over the feasibility of a timetable. Once every hard constraint has been satisfied, the quality of a timetable is determined by the weighted sum of the soft constraint violations. The set of soft constraints is detailed as follows:
Room capacity (S1) For each lecture, the number of students sr attending the course should not be greater than the capacity of the room cr hosting the lecture. S1 is formally described by Equation 2.1.
Room stability (S2) The number of rooms r for scheduling the lectures of a course ci should not exceed 1. S2 is formally described by Equation 2.2.
Minimum working days (S3) The lectures of a course ci should be spread over at least a given minimum number of days mi, where s(ci) denotes the number of days over which the lectures are actually spread. S3 is formally described by Equation 2.3.
Curriculum compactness (S4) For a given curriculum, a violation is counted for every lecture that is not adjacent to any other lecture l belonging to the same curriculum Ci within the same day d, which means the agenda of students should be as compact as possible. S4 is formally described by Equation 2.4.
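\[ f_1(x_{i,j}) = \begin{cases} s_r - c_r, & \text{if } s_r > c_r; \\ 0, & \text{otherwise.} \end{cases} \tag{2.1} \]

\[ f_2(c_i) = r(c_i) - 1 \tag{2.2} \]

\[ f_3(c_i) = \begin{cases} m_i - s(c_i), & \text{if } m_i > s(c_i); \\ 0, & \text{otherwise.} \end{cases} \tag{2.3} \]

\[ f_4(x_{i,j}) = \sum_{c_i \in C} d \cdot l \tag{2.4} \]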
Finally, the quality of a candidate solution is determined by the total soft penalty cost, calculated by evaluating the objective function defined in Equation 2.5. Soft constraint violations are associated with penalty coefficients as follows: α1 = 1, α2 = 1, α3 = 5, α4 = 2, per the formulation proposed in [17]. The goal is to find a feasible solution X′ such that f(X′) ≤ f(X) for all candidate solutions X in the solution search space.
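\[ f(X) = \alpha_1 \cdot \sum_{x_{i,j} \in X} f_1(x_{i,j}) + \alpha_2 \cdot \sum_{c_i \in C} f_2(c_i) + \alpha_3 \cdot \sum_{c_i \in C} f_3(c_i) + \alpha_4 \cdot \sum_{x_{i,j} \in X} f_4(x_{i,j}) \tag{2.5} \]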
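The weighted sum in Equation 2.5 can be sketched in Python as follows. This is a minimal illustration, not the evaluator used in this work: the (course, period, room) assignment list, the function `soft_penalty` and the toy data are assumptions, the timetable is assumed to be feasible, and the S4 term follows the textual description of curriculum compactness (a lecture with no same-day adjacent lecture of the same curriculum counts as one violation).

```python
from collections import defaultdict

ALPHA = {"S1": 1, "S2": 1, "S3": 5, "S4": 2}  # penalty coefficients from the text

def soft_penalty(assignments, students, capacity, min_days, curricula, slots_per_day):
    """Weighted soft-constraint cost of a list of (course, period, room) assignments."""
    rooms_used, days_used = defaultdict(set), defaultdict(set)
    s1 = 0
    for course, period, room in assignments:
        s1 += max(0, students[course] - capacity[room])  # Eq. 2.1
        rooms_used[course].add(room)
        days_used[course].add(period // slots_per_day)
    s2 = sum(len(r) - 1 for r in rooms_used.values())  # Eq. 2.2
    s3 = sum(max(0, min_days[c] - len(days_used[c])) for c in min_days)  # Eq. 2.3

    # S4: count lectures with no same-day adjacent lecture of the same curriculum.
    s4 = 0
    for curriculum in curricula:
        periods = {p for c, p, _ in assignments if c in curriculum}
        for p in periods:
            day = p // slots_per_day
            neighbors = [q for q in (p - 1, p + 1)
                         if q in periods and q // slots_per_day == day]
            if not neighbors:
                s4 += 1
    return (ALPHA["S1"] * s1 + ALPHA["S2"] * s2
            + ALPHA["S3"] * s3 + ALPHA["S4"] * s4)

# Hypothetical toy data: 3 slots per day, so periods 0-2 are day 0 and 3-5 are day 1.
students = {"c1": 30, "c2": 20}
capacity = {"r1": 25, "r2": 40}
min_days = {"c1": 2, "c2": 1}
curricula = [{"c1", "c2"}]
assignments = [("c1", 0, "r1"), ("c1", 1, "r2"), ("c2", 5, "r1")]

cost = soft_penalty(assignments, students, capacity, min_days, curricula, 3)
print(cost)  # 1*5 + 1*1 + 5*1 + 2*1 = 13
```

In the toy data, five students exceed the capacity of r1 (S1), course c1 uses two rooms (S2) and only one of its two required days (S3), and the lecture of c2 in period 5 is isolated (S4).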
2.1.2 Exact Methods for Solving the CB-CTT
The literature refers to exact methods as those capable of producing the best possible solution every time they are executed. With exact methods, the solution found must always be the optimal one. The following lines describe two of these methods.
Branch and cut In branch and cut, the problem is formulated as a mixed-integer linear pro- gramming problem and solved combining the cutting plane method with a branch-and- bound algorithm [39]. The purpose of the branch-and-bound method is to search on the space of all possible solutions, such that solution attributes not shared among op- timal solutions are found as early as possible. The set of feasible solutions is thought of as forming a rooted tree with the full set at the root. In branch-and-cut, a cutting plane algorithm is applied in each node before branching. The objective is to improve the formulation in each node. Compared to a classical Branch and bound algorithm, it decreases the number of nodes to be explored but it increases the computation effort in each node [12, 22].
Boolean Satisfiability (SAT) and MaxSAT Herein, the CB-CTT is modeled through different encodings based on a basic SAT encoding of the subsets of hard and soft constraints [31]. MaxSAT is a generalization of the Boolean satisfiability problem, which asks whether there exists an assignment of the values true or false to the variables of a Boolean formula that makes the formula evaluate to true [29]. In this exact method, both soft and hard constraints of the CB-CTT problem are defined as hard. Hence, when the SAT solver finds a solution for an instance, the resulting timetable is guaranteed to be optimal at zero cost [2].
2.1.3 Approximation Methods for Solving the CB-CTT
While the exact methods presented in the previous section guarantee to find an optimal solution when it exists, these are rarely used in real-world scenarios. Exact methods are limited by the size of the instance since the search space of the CB-CTT can grow dramatically with the number of lectures. Hence, performing an exhaustive search in the space of the solutions might require many resources. The literature describes approximation methods as an alternative for dealing with problem instances where exact methods cannot be used [53]. Approximation methods present a trade-off where they do not guarantee to find an optimal solution, but they can be executed using only a few resources in terms of memory and CPU time when compared with exact methods. The results obtained from approximation methods can be competitive when compared to the optimal solution, making them a field worth studying.
Approximation methods for scheduling problems include rule-based approaches and heuristics, terms which have been used interchangeably in the past [86]. However, these concepts have some slight differences [102]. The rule-based approach is a solution technique based on expert systems for resource allocation, where lectures are considered activities and periods are considered resources to be assigned to activities. The RAPS rule-based language proposed by Solotorevsky et al. [108] is the most common basis of this kind of approach. In contrast, heuristics are defined simply as rules of thumb for resource allocation.
Constraint programming is another relevant solution technique that has proved effective for CB-CTT [25]. In this approach, the timetabling problem is framed as a constraint satisfaction problem where the lectures are defined as the variables and the time slots are the domains. Then, the exploration of the relevant search space can be carried out by any search method available, as demonstrated by [122], such as depth-first search, iterative broadening, large neighborhood search and so on, in conjunction with common constraint satisfaction heuristics for ordering the events.
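To make the constraint satisfaction framing concrete, the sketch below treats lectures as variables and time slots as their domain, and explores the search space with a plain depth-first search. It is a simplified illustration, not the method of any cited work: the only constraint modeled is that two conflicting lectures cannot share a time slot, and the names solve, lectures, and conflicts are hypothetical.

```python
def solve(lectures, slots, conflicts, assignment=None):
    """Depth-first search that assigns one time slot to every lecture.

    `conflicts` maps each lecture to the lectures it cannot share a slot
    with. Returns a dict lecture -> slot, or None if no assignment exists.
    """
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(lectures):
        return assignment
    # Pick the next unassigned lecture in the given order.
    lecture = next(l for l in lectures if l not in assignment)
    for slot in slots:
        # A slot is admissible if no conflicting lecture already uses it.
        if all(assignment.get(other) != slot
               for other in conflicts.get(lecture, ())):
            result = solve(lectures, slots, conflicts,
                           {**assignment, lecture: slot})
            if result is not None:
                return result
    return None  # dead end: backtrack


# Three mutually conflicting lectures (hypothetical example).
lectures = ["A", "B", "C"]
conflicts = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(solve(lectures, [0, 1, 2], conflicts))  # {'A': 0, 'B': 1, 'C': 2}
print(solve(lectures, [0, 1], conflicts))     # None
```

The order in which `lectures` is given plays the role of the event-ordering heuristic mentioned above; a different ordering changes which branches are explored first.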
2.1.4 Constructive Heuristics for Solving the CB-CTT
This investigation explores systematic methods for combining the use of heuristics, or their components, as a way to improve the search process. In the following lines, we describe the set of heuristics and how they work on an example case. The choice of heuristics for this investigation takes some ideas from related works [77, 94]. Thus, we have only considered constructive heuristics, which means that they build a solution from scratch by making one decision at a time. In other words, at each step of the search, they decide which lecture to schedule next, among all the available options. The heuristics selected for this work are described as follows:
Largest Enrollment (LE) The events are ordered by the number of students attending each course, descending. The intuition in this heuristic is that events with a large number of students should be given the highest priority to be scheduled.
Largest Degree (LD) The events are ordered by the number of conflicts with other courses, descending. The intuition in this heuristic is to assess the potential difficulty of scheduling each event, and give the highest priority to the hardest ones while time slot availability is still high.
Largest Weighted Degree (LWD) This heuristic is similar to the largest degree heuristic, but the conflicts between events are weighted by the number of students enrolled in them. Events with a larger weighted degree are given priority.
Least Available Periods (APD) The events are sorted in increasing order of the ratio of the number of available time slots to the number of unassigned lectures. The idea is to assign first the courses with the most lectures and the lowest availability.
Least Available Positions (APS) Works similarly to APD, but instead of time slots it uses the number of available positions in the timetable to compute the availability of a course. As with APD, this heuristic assigns a higher priority to courses with a large number of lectures and low availability.
Figure 2.1: Example data for the heuristic ordering of five courses with six timeslots and three rooms.
In all cases, these heuristics break ties by using the index of the lecture that corresponds to the course as presented in the problem formulation in a first-come-first-served fashion.
Figure 2.1 illustrates how these heuristics prioritize an example with five events, six time slots, and three rooms. The following lines present a further explanation on the calculations of the heuristics described above and how they make their decisions:
• LE would prioritize the Data Science course, simply because it is the one with the largest number of students.
• The priority given by LD is also very simple. This heuristic would prioritize the lectures on the ML course because it is the one with the largest number of conflicts.
• The evaluation of the LWD heuristic results in a tie between Algorithms and Data Science, both resulting in 50 after taking into account the number of conflicts and the enrolled students. The heuristic favors Algorithms in this tie since it appears earlier in the list of courses.
• Both APD and APS assign the highest priority to ML, given the low number of time slots and positions available for the course, respectively. This holds even though the course has a relatively low number of lectures pending assignment.
The example above presents the sequence of courses to be scheduled given the priority assigned by each heuristic. It is worth noting that the heuristics work by prioritizing courses, so during the process of assigning a period to an event or lecture, possible changes in the evaluation of conflicts, available time slots, or available positions would not be taken into account until a new heuristic is applied.
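The ordering logic of these heuristics can be sketched as sort keys, with the course index as the tie-breaker. The snippet below is a minimal illustration under stated assumptions: the Course fields and all numeric values are hypothetical and are not the data of Figure 2.1, although they were chosen so that the first choices match the discussion above. The weighted degree is supplied as a precomputed field rather than derived, and APS is omitted because it would only replace free time slots with free positions.

```python
from dataclasses import dataclass


@dataclass
class Course:
    index: int            # position in the problem formulation (tie-breaker)
    name: str
    students: int         # enrolled students
    conflicts: int        # number of conflicting courses
    weighted_degree: int  # conflicts weighted by students (assumed precomputed)
    lectures: int         # lectures still unassigned (assumed > 0)
    free_slots: int       # time slots still available for the course


# Each key sorts ascending, so descending criteria are negated.
HEURISTICS = {
    "LE": lambda c: (-c.students, c.index),
    "LD": lambda c: (-c.conflicts, c.index),
    "LWD": lambda c: (-c.weighted_degree, c.index),
    "APD": lambda c: (c.free_slots / c.lectures, c.index),
}


def ordering(courses, heuristic):
    """Course names in the priority order given by one heuristic."""
    return [c.name for c in sorted(courses, key=HEURISTICS[heuristic])]


# Hypothetical values, for illustration only.
courses = [
    Course(0, "Algorithms", 25, 2, 50, 3, 5),
    Course(1, "Data Science", 50, 1, 50, 2, 4),
    Course(2, "ML", 20, 3, 45, 2, 1),
    Course(3, "Course D", 10, 1, 10, 3, 6),
    Course(4, "Course E", 15, 0, 0, 1, 6),
]

for name in HEURISTICS:
    print(name, ordering(courses, name)[0])
```

With these values, LE selects Data Science, LD and APD select ML, and LWD resolves the tie at 50 in favor of Algorithms because of its lower index.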
2.1.5 Other Educational Timetabling Problems
This section presents formulations of the educational timetabling problem that have attracted the attention of researchers in recent years. However, the focus of this work is exclusively the CB-CTT. Therefore, although other problems may be very similar and have real-world applications, they are considered beyond the scope of this research work.
Examination Timetabling In examination timetabling, room assignments offer more flexibility when compared to CB-CTT. A room can be shared between two exams, or an exam can be split into two or more rooms if one is too small to accommodate the number of students, since the lecturer does not have to be present at all times. Another key difference is that courses must be scheduled within a week that repeats until the end of the term. Examinations, on the other hand, usually take place at the end of the term, so the available time slots offer more flexibility when compared to course timetabling.
Notable work on examination timetabling includes a fast simulated annealing procedure, which obtains competitive results in a relatively small amount of computational time [71], and a mixed-integer programming method for obtaining lower bounds [10].
School Timetabling Contrary to university timetabling, students of a school are aggregated in classes, where they share the same courses at the same time. Hence, here the assignment of rooms is reduced to a minor concern. Another difference is that schools typically have very dense teaching schedules, so curriculum compactness is also of little concern. A recent survey of the state-of-the-art on school timetabling is presented in [113].
Post Enrolment Course Timetabling PE-CTT is very similar to university timetabling, with the key difference that in this problem the timetable is produced after the student enrolment on courses has taken place. The idea behind this formulation is to maximize the number of options for students when choosing their own timetables. PE-CTT continues to attract the interest of researchers due to its high degree of complexity [51, 115, 121].
While there exist some key differences that distinguish educational timetabling problems from one another, they all share a largely similar structure. Moreover, a procedure able to find a solution for one of them could also be used to find solutions to other timetabling problems, with minor modifications.
2.2 Metaheuristics and Machine Learning Methods
The literature defines a metaheuristic as an iterative procedure that guides a subordinate heuristic by exploring and exploiting the search space in an intelligent manner, with the goal of finding optimal or near-optimal solutions [101]. While metaheuristics can be considered approximation methods, the literature has emphasized their relevance for the CB-CTT, which is why metaheuristics are presented in a different classification for this work. This section presents some metaheuristic methods that have been applied to solve the CB-CTT.
2.2.1 Tabu Search
Tabu search (TS) is a method based on a local search procedure that employs a set of moves for transforming one solution state into another until a stopping criterion is met [50]. Local search methods are prone to get stuck in a local optimum, unable to reach other regions of the search space. TS implements a neighborhood structure for each solution, which is explored during the search process. Thus, the method can move to an improved solution in the neighborhood, guided by tabu restrictions and aspiration criteria.
In simpler terms, TS not only moves towards higher-quality solutions (i.e. hill-climbing), but also looks at the neighborhoods of the solutions from the moves produced by the search.
By also tracking solutions of poorer quality (or bad moves), this method avoids getting trapped in a specific region of the search space. Therefore, the effectiveness of TS is related to the neighborhood structure and the criteria to evaluate the moves made during the search process [63].
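The following sketch illustrates the general idea on a toy one-dimensional landscape. It is a simplified variant written for illustration, not the algorithm of any cited work: the tabu list stores recently visited solutions (rather than moves), and the aspiration criterion admits a tabu solution only if it improves on the best one found so far. The names tabu_search, LANDSCAPE, and the parameter `tenure` are hypothetical.

```python
from collections import deque


def tabu_search(cost, neighbors, start, tenure=5, iterations=100):
    """Minimize `cost`, forbidding a return to recently visited solutions."""
    current = best = start
    tabu = deque([start], maxlen=tenure)  # oldest entries expire automatically
    for _ in range(iterations):
        # Admissible neighbors: not tabu, or good enough to override the
        # tabu status (aspiration criterion).
        candidates = [n for n in neighbors(current)
                      if n not in tabu or cost(n) < cost(best)]
        if not candidates:
            break
        # Move to the best admissible neighbor, even if it is worse than
        # the current solution; this is what lets the search leave a
        # local optimum.
        current = min(candidates, key=cost)
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best


# Toy landscape: index 1 is a local optimum, index 5 the global one.
LANDSCAPE = [5, 3, 4, 6, 2, 1, 7]


def cost(i):
    return LANDSCAPE[i]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


print(tabu_search(cost, neighbors, start=0))  # 5
```

A pure hill climber started at index 0 would stop at index 1, since both of its neighbors are worse; the tabu list forces the search to continue through indices 2 and 3 until it reaches the better region.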
The CB-CTT has also been solved using tabu search. Lü and Hao [77] proposed an adaptive tabu search algorithm with a double Kempe chain neighborhood structure. Their method starts by generating an initial solution using a greedy heuristic, followed by an adaptive stage which combines intensification and diversification, aiming to reduce the soft constraint violations while maintaining the satisfaction of hard constraints. Amaral and Cardal [7] proposed a solution using a rule-based approach to handle the elements in the tabu list and a variation of the compromise ratio to rank the solutions in a neighborhood.
2.2.2 Simulated Annealing
First proposed by Kirkpatrick et al. [66], simulated annealing (SA) is a metaheuristic that assigns a probability to accepting worse solutions as a strategy to avoid getting trapped in local optima. The algorithm is named after the annealing technique in metallurgy, which involves heating and controlled cooling of a material to reduce its defects. As the search process advances, the probability of accepting low-quality solutions decreases. Hence, the method increasingly selects solutions that improve on the quality of the current solution.
The probability of transitioning between solutions, as described by Equation 2.6, depends largely on a temperature parameter T, which decreases during the search process. At the start of the process, the temperature and the probability of accepting low-quality solutions are high. However, as they progressively decrease, the algorithm converges to a simple iterative improvement algorithm (i.e. hill climbing).
\[ P = \begin{cases} 1, & \text{if } \Delta c \le 0, \\ e^{-\Delta c / T}, & \text{if } \Delta c > 0. \end{cases} \tag{2.6} \]
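The acceptance rule of Equation 2.6 can be written in a few lines. The sketch below is only an illustration: the function names and the example cost differences and temperatures are hypothetical, and the cooling schedule that lowers the temperature is left out.

```python
import math
import random


def acceptance_probability(delta_c, temperature):
    """Equation 2.6: improving or equal moves are always accepted."""
    if delta_c <= 0:
        return 1.0
    return math.exp(-delta_c / temperature)


def accept(delta_c, temperature, rng=random):
    """Draw a uniform number in [0, 1) and compare it with the probability."""
    return rng.random() < acceptance_probability(delta_c, temperature)


# The same worsening move becomes less likely to be accepted as the
# temperature drops (hypothetical values).
for temperature in (100.0, 10.0, 1.0):
    print(temperature, round(acceptance_probability(5.0, temperature), 4))
```

At a temperature of 100 the worsening move of size 5 is accepted with probability of about 0.95, whereas at a temperature of 1 the probability falls below 0.01, which is the convergence towards hill climbing described above.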
Applications of SA for solving the CB-CTT in the literature include the work of Bellio et al. [14], who designed and implemented a statistically-principled methodology for the parameter tuning procedure. The authors modeled the relationship between the search parameters and the instance features, allowing the algorithm to generalize for previously unseen instances of the problem. Tarawneh et al. [5] proposed a variation of SA where unaccepted solutions are stored to be used in later stages when the search process gets stuck in a local optimum.
2.2.3 Genetic Algorithms
First popularized by Holland [58], genetic algorithms (GA) are metaheuristics based on the mechanics of natural selection. GAs start with an initial population, whose individuals are evolved through biologically inspired operators such as selection, crossover, and mutation [52]. The overall fitness is improved by following the Darwinian principle of the survival of the fittest. Typically, a GA requires a genetic representation of the problem domain and a fitness function to evaluate the solutions.
GAs have been used for the CB-CTT. Abdullah and Tarubieh [1] presented a hybrid genetic algorithm that uses a tabu list to control the selection of neighborhoods. The crossover represents a period exchange while preserving the feasibility of the solution, while the mutation is used to swap events, allowing diversification. Akkan and Gülcü [6] tackled the CB-CTT as a bi-objective optimization problem, presenting a hybridization of the standard GA approach by using simulated annealing. The authors attempted to find good solutions in terms of penalty and a custom robustness measure.
2.2.4 Neural Networks
In recent years, there has been an increasing interest in neural networks (NN) [80, 87]. NNs are machine learning methods able to leverage large amounts of data for clustering, regression, and classification [85, 103]. The ability to correctly generalize to unseen examples, even with imbalanced or small amounts of data given as input, has made neural networks a popular field of study [18, 49].
An artificial neural network is composed of artificial neurons, which act as simple processing units. The network consists of interconnected neurons, where the output of one neuron is the input of another [37]. The neurons are activated through weighted connections established during the training phase. Hence, connections with a low weight represent low importance to the neuron that receives it as input. The neurons are organized into layers, where the middle layers are known as hidden ones. A standard neural network will have every neuron on a layer connected to the adjacent layer [72].
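The layered structure described above can be illustrated with a forward pass through a small fully connected network. This is a minimal sketch under assumptions that the text does not fix: a sigmoid activation, an additive bias per neuron, and hand-picked (untrained) weights. All names and values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, layers):
    """Propagate the inputs layer by layer; the output of one layer is
    the input of the next."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Two inputs -> hidden layer with two neurons -> one output neuron.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                   # output layer
]
print(forward([1.0, 1.0], layers))
```

A weight close to zero makes the corresponding input contribute little to the weighted sum, which is the sense in which low-weight connections carry low importance.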
Neural networks were used to solve the school timetabling problem by Carrasco and Pato [27]. In their work, they compared discrete and continuous neural network approaches, finding that the discrete NN had a better performance on a set of five hard problem instances.
Similarly, Smith et al. [105] used modified discrete Hopfield neural networks to solve the university timetabling problem. Their work obtained competitive results compared against metaheuristic algorithms like tabu search or simulated annealing.
2.2.5 Meta-Learning
Meta-learning is a machine learning method based on the idea of learning to learn. This method learns by systematically observing the results produced by applying different algorithms to problem instances, rather than from the traditional features used to describe the instances themselves [70]. Meta-learning methods have gained much popularity due to their ability to generalize across different domains by benefiting from prior experience with other tasks [3, 98]. Meta-learning approaches have shown significant generalization capabilities even from small amounts of data [42].
A common type of meta-learning is the algorithm selection problem (ASP) [99]. In the ASP, the goal is to relate problem instances and the space of algorithms in such a way as to maximize a performance measure. The meta-learning approach to ASP is based on the use of meta-features, which can be expressed in two different groups. A group of basic meta-features includes general information about problem instances, such as the number of courses or room occupation percentage in the domain of curriculum-based course timetabling. The second group of meta-features, for which we will borrow the denomination of landmarking meta-features, is used to describe the performance of the simple heuristics when applied to different instances, with no concern about the features that describe the given problem instance. Hence, the combination of these groups of meta-features enables a meta-learner, usually implemented in the form of a classifier, to select the best algorithm for new instances of the problem.
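A meta-learner for the ASP can be as simple as a nearest-neighbor classifier over meta-features. The sketch below is a hypothetical illustration, not a method from the cited works: it uses only two basic meta-features (number of courses and room occupation), invented training pairs, and unscaled Euclidean distance, which in practice would let the feature with the larger range dominate.

```python
import math

# (meta-features, best-performing heuristic) pairs from past instances.
# Meta-features: (number of courses, room occupation). All values are
# hypothetical.
TRAINING = [
    ((30, 0.60), "LD"),
    ((80, 0.90), "APD"),
    ((45, 0.40), "LE"),
]


def select_algorithm(meta_features, training=TRAINING):
    """Return the label of the closest known instance (1-nearest neighbor)."""
    _, label = min(training,
                   key=lambda row: math.dist(row[0], meta_features))
    return label


# A new, unseen instance with 78 courses and 85% room occupation.
print(select_algorithm((78, 0.85)))  # APD
```

Landmarking meta-features could be appended to the same tuples, for example the penalty obtained by each simple heuristic on the instance, without changing the selection function.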
In combinatorial optimization, meta-learning techniques have been applied by many researchers to solve problems from different domains. To mention a few examples, Kanda et al. [62] studied a meta-learning method for the Traveling Salesman Problem (TSP). In their work, they recommend the best meta-heuristic for new problem instances without having to execute them. They showed that by predicting a rank of meta-heuristics it was possible to obtain better solutions when compared to other simpler selection methods. Gutierrez et al. [54] present a meta-learning approach for the Vehicle Routing Problem with Time Windows (VRPTW). In their research, they considered two groups of meta-features to train a multi-layer perceptron to select the best meta-heuristic. They outperformed individual solvers in a reasonable amount of time. Smith-Miles [107] used meta-learning techniques for the Quadratic Assignment Problem (QAP), considering three meta-heuristic algorithms. The study was proposed as a classification problem, using a neural network as a meta-learner to predict which algorithm would perform best. The results encourage the use of meta-learning for combinatorial optimization, even when there is limited data to assess the performance of meta-heuristic algorithms.
2.3 Hyper-Heuristics
Hyper-heuristics emerged as a method to build solution systems which can solve not only individual combinatorial optimization problems but also a wide range of problem domains. In other words, hyper-heuristics aim to raise the level of generality at which optimization systems operate. The term hyper-heuristics was coined by Cowling et al. [32], who described them as “heuristics to choose heuristics”. Burke et al. [19] proposed a hyper-heuristic method for automated scheduling which is not restricted to one problem.
The literature on hyper-heuristics lacks clarity regarding a common formal definition of the concept. For this work, we accept the formal definition presented by Pillay and Qu [95].
Hence, a hyper-heuristic is defined as a search method for a given problem p. To find a solution state S, the hyper-heuristic explores a set of heuristics. Heuristic configurations are explored to move from a problem state s to the next state s′, stopping when the solution state S for problem p is reached.
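The definition above can be read as a loop in which a high-level strategy repeatedly picks a low-level heuristic to move from one problem state to the next. The sketch below illustrates that loop on a toy problem; the state representation, the two low-level heuristics, and the selection rule are all hypothetical and stand in for the models developed in later chapters.

```python
def run_hyper_heuristic(state, heuristics, select, is_solved):
    """Apply selected heuristics until the solution state is reached.

    Returns the final state and the sequence of heuristics applied.
    """
    trace = []
    while not is_solved(state):
        name = select(state)             # high-level decision
        state = heuristics[name](state)  # low-level move from s to s'
        trace.append(name)
    return state, trace


def drop(state, value):
    """Return a copy of the state without one occurrence of `value`."""
    items = list(state)
    items.remove(value)
    return tuple(items)


# Toy problem: the state is the tuple of lecture sizes still unscheduled.
heuristics = {
    "largest_first": lambda s: drop(s, max(s)),
    "smallest_first": lambda s: drop(s, min(s)),
}


def select(state):
    # Hypothetical rule: handle large lectures while many remain.
    return "largest_first" if len(state) > 2 else "smallest_first"


final, trace = run_hyper_heuristic((3, 1, 4, 2), heuristics, select,
                                   is_solved=lambda s: not s)
print(trace)
```

Here the rule inside `select` is fixed by hand; in a learning hyper-heuristic that mapping from state to heuristic is what gets learned.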
In a comprehensive literature review on hyper-heuristics, Burke et al. [20] present a survey on the state-of-the-art. The authors extended Cowling’s definition of hyper-heuristics to a “search method or learning mechanism for selecting or generating heuristics to solve computational search problems”. Their work presents a commonly accepted classification of hyper-heuristics, distinguishing them by the learning methodology they use and the nature of the heuristic space as shown in Figure 2.2. Then, if distinguishing hyper-heuristics by their learning strategy, they are classified as follows:
Figure 2.2: Hyper-heuristic classification by learning methodology and the nature of the heuristic space as proposed by Burke et al. [20].
Online learning In online learning, the learning takes place in parallel, as the algorithm is solving an instance of the problem. Hence, the problem features are used by the high-level strategy to determine the next heuristic to be applied. Since the algorithm makes decisions as the values of the problem features change, there is no distinction between the training and testing phases that are common in machine learning methods.
Offline learning Contrary to online learning, this type of hyper-heuristic learns from a set of training instances. In offline learning, the hyper-heuristic collects knowledge in the form of rules during training, which should generalize well to later be applied for solving unseen instances from a related domain. As in online learning, the problem features are mapped to the heuristics.
No learning Hyper-heuristics that do not use feedback from the search process belong to this classification.
Then, according to the nature of the heuristic space, hyper-heuristics can be classified as follows:
Selection hyper-heuristics The basis of selection hyper-heuristics is to choose among existing heuristics. The framework is provided with a set of existing heuristics, and the algorithm must select the most suitable to apply, depending on a given problem state. The set of heuristics from which the hyper-heuristic framework can choose is generally problem-specific.
Generation hyper-heuristics In generation hyper-heuristics, the idea is to create new heuristics from the components of existing ones. In addition to producing a solution, this hyper-heuristic framework is also able to produce new heuristics as outputs, which can potentially be reused to solve other problems.
Furthermore, the literature considers a second level in this dimension. When considering the search paradigms, hyper-heuristics can be classified as follows:
Constructive hyper-heuristics Starting with an empty solution, constructive hyper-heuristics build a solution incrementally. A sequence of heuristics is applied until the final solution state has been reached. There is an interesting challenge in this paradigm in the form of associating the partial states of the solution with the most adequate heuristic to apply.
Perturbative hyper-heuristics Contrary to constructive methods, perturbative hyper-heuristics start with a complete candidate solution. The perturbative framework then modifies one or more components of the solution to move from a solution state to another. The process is repeated until a final solution state has been reached.
2.3.1 Hyper-Heuristics for Educational Timetabling
The literature contains a wide range of examples of hyper-heuristics used to solve educational timetabling problems, particularly the CB-CTT. Qu and Burke evaluated different local search algorithms for use in a selection hyper-heuristic [96]. The hybrid method was employed to search the space of the heuristic combination and the space of the solutions. This hybrid method was tested on both course and examination timetabling problems, achieving competitive results. Soria-Alcaraz et al. proposed an iterated local search hyper-heuristic, alternating between a perturbation and improvement phase [110]. These phases employ offline and online learning, respectively, as feedback from the search process.
Burke et al. proposed a multi-objective hyper-heuristic that employs tabu search for searching in the space of perturbation heuristics [24]. Then, each heuristic is selected to optimize a specific objective during each iteration of the search. Kalender et al. proposed a selection hyper-heuristic employing greedy gradient for heuristic selection and simulated annealing (SA) for move acceptance to guide the search process [60, 61].
Hyper-heuristics based on evolutionary algorithms (EA) are also a popular approach for educational timetabling. Terashima-Marin et al. used a non-direct chromosome representation based on evolving the configuration of constraint satisfaction methods for examination problems [114]. Pillay et al. [91, 92] also implemented an EA hyper-heuristic to search in the space of heuristic combinations using three different chromosome representations, which produced competitive results when compared against those of hyper-heuristic methods based on other methodologies.
The works cited in this section show the feasibility of applying hyper-heuristics for the CB-CTT and similar problem domains and as a general solution for educational timetabling.
Moreover, evidence on hyper-heuristics successfully generalizing over problem domains [79, 81] makes this an interesting topic of research for the advancement of educational timetabling.
2.4 Cross-Industry Standard Process for Data Mining
Extracting information from data has occurred for centuries. Manual methods for identifying patterns in data include Bayes’ theorem in the 1700s [33] and regression analysis dating back to the 19th century [112]. However, with the power of computer technology dramatically increasing the capacity for data collection and storage, manual methods gave way to more sophisticated techniques like Knowledge Discovery in Databases (KDD) [41]. KDD introduced data mining as the process of extracting information from large sets of data, usually in the form of patterns, anomalies and correlations for predicting outcomes. Cross-Industry Standard Process for Data Mining (CRISP-DM) is an open standard that describes the steps for data mining [104].
Figure 2.3: Example of the steps described by the Cross-Industry Standard Process for Data Mining (CRISP-DM). The outer circle represents the cyclic nature of the data analysis process.
The CRISP-DM methodology provides a general framework that can work with any project. This methodology for data mining is structured in six sequential phases where the output of one becomes the input of the next. However, the stages in CRISP-DM are flexible in such a way that some phases allow partial or complete revision of previous stages as shown in Figure 2.3. The six phases of CRISP-DM are described as follows:
Business Understanding The initial phase of the CRISP-DM methodology focuses on understanding the goals of the project. Once the objectives have been established clearly, the data analysis process can begin in the context of a particular model.
Data Understanding The data understanding phase starts with an initial data collection, followed by strategies to get more familiar with the data, such as Exploratory Data Analysis (EDA) [116]. The preliminary knowledge extracted by applying EDA can be used to identify opportunities with the data.
Data Preparation The goal of the data preparation phase is to process the available raw data to produce the datasets used for modeling. Common tasks include data cleaning [97], feature selection [65] and feature transformation [68].
Modeling In the modeling phase, machine learning algorithms are implemented and applied to the input data. Complementary tasks like parameter tuning [11] usually take place to improve the performance of the models. Since the implementation of machine learning algorithms may have specific requirements regarding input data, it is very common to move back and forth with the data preparation phase.
Evaluation The goal of the evaluation phase is to determine if the produced models satisfy the requirements established in the business understanding phase. The output of this phase is an assessment of the overall quality of the models and the decision on whether to deploy them into production.
Deployment The last phase of the CRISP-DM methodology focuses on deploying the created models into production. However, note that the implementation of the models presented in this thesis is beyond the scope of this work.
Existing research recognises the feasibility of solving combinatorial optimization problems under the light of the CRISP-DM methodology. Makrymanolakis et al. proposed an algorithm that combined various local search techniques, guided by CRISP-DM, for solving the Permutation Flow Shop Scheduling Problem (PFSSP) [78]. The authors reported that a large number of systematic executions were required to extract knowledge from the available datasets, contrary to more empirical random executions. While initially expensive in terms of computational power, the data mining procedure proved useful for parameter control. Haeri et al. presented a hybrid data mining approach based on multi-objective particle swarm optimization (MOPSO) for the Travelling Salesman Problem (TSP) [56]. Data mining was used to extract rules based on the most efficient solutions of MOPSO. The extracted rules were then applied to solve new TSP instances, improving on the quality of the solutions obtained by MOPSO in a subset of the instances.
2.5 Summary
This chapter described the most relevant topics related to this work. The theoretical framework focused on the Curriculum-Based Timetabling problem and strategies to solve the problem, including metaheuristics, hyper-heuristics, and other machine learning methods. The CB-CTT represents a significant field of study, both for its NP-hard nature and the recurrent task of building timetables by administrative personnel in universities.
The concept of hyper-heuristic introduced in this chapter is quite relevant for this investigation. The next chapters will focus on hyper-heuristics as high-level heuristics that solve a problem by selecting low-level heuristics. These approaches have been widely used to solve combinatorial optimization problems from a wide range of domains [36]. However, this dissertation focuses on hyper-heuristics for the CB-CTT.
The next chapter presents the methodology followed throughout this research work to develop two hyper-heuristic models.
Chapter 3
Methodology: Hyper-Heuristic Generation Through CRISP-DM
This chapter introduces a hyper-heuristic model generated by following the stages of the Cross-Industry Standard Process for Data Mining (CRISP-DM). The training process based on a data-driven approach provides a model to generalize the patterns of permutations of existing heuristics so that specific combinations increase the performance on a wider set of problem instances. These steps of CRISP-DM are used to generate selection hyper-heuristics.
3.1 Business Understanding
This section presents the objective of the CRISP-DM process, for finding suitable rules and patterns on instances of the CB-CTT. In particular, we want to define in a very concise manner the timetabling problem we are going to study, the characteristics of the problem we consider, and the ultimate goal we aim to achieve.
3.1.1 The Curriculum-Based Course Timetabling Problem
Scheduling is a very important activity for many aspects of life, as timetables are necessary in many different fields like education, healthcare, transportation, entertainment, and so on. This research focuses on educational timetabling, as this is a recurring activity for the administration of educational institutions that still does not have a trivial solution, being considered among the NP-complete problems.
A widely accepted breakdown of the educational timetabling problem is university course timetabling, school timetabling, and examination timetabling. For this research, we decided to focus on the university course timetabling as it poses the interesting challenge of designing different arrangements of lectures that allow students to choose their own personal schedule, which might be different from their peers. Contrary to this, in school timetabling the students are aggregated in classes, and classes are assigned to specific rooms. Another interesting challenge presented by the university course timetabling is that courses are normally scheduled within a week, repeating through the semester, while in examination timetabling the events take place only once at the end of the semester.
University course timetabling can be broken down into post-enrolment course timetabling and curriculum-based course timetabling. Each presents different challenges. The main difference between them is that in the former the timetable is produced after student enrolment has taken place, while in the latter the timetable is created based on curricula published by the university. In this study, we decided to tackle curriculum-based course timetabling, as it corresponds to the way timetables are produced at Tecnologico de Monterrey.
The curriculum-based course timetabling problem (CB-CTT) deals with allocating the lectures that make up each course to timeslots, which are defined by the number of working days and the number of periods per day, subject to a set of constraints. The distribution of these entities may vary among instances of the problem, making it difficult to tell how easy or hard it will be to find a good solution. For example, an instance with a large number of lectures and a small number of timeslots might look very constrained, but it can be easy to solve if a large number of rooms is available.
These characteristics of an instance can also vary during the process of allocating lectures to timeslots. For example, it might be better to schedule early the lectures that can produce many conflicts, whether by sharing students or professors, rather than later, when the number of available timeslots has been reduced.
3.1.2 Justification of the Hyper-Heuristic Approach
Previous work on university timetabling problems has demonstrated that combining heuristics yields good results, including research on the CB-CTT [93]. Therefore, the question arises: how can we find a sequence of heuristics that performs better than others on CB-CTT instances? Experiments were conducted on synthetically generated CB-CTT instances, and the results were compared against random methods. These results supported the idea that data-driven hyper-heuristic models can obtain competitive results when compared with traditional heuristics run in isolation. A set of experiments was then designed to test the best hyper-heuristic model against heuristics commonly used on scheduling problems.
In the next section, we identify the properties that can be used to characterize an instance and how they vary during the solution process, with the goal of estimating how easy or difficult it will be to find a good solution. In particular, we are interested in how suitable a given heuristic is for different instances of the problem.
3.2 Data Understanding
This section presents the data we consider for understanding the instances of the CB-CTT.
More specifically, this section is divided into two branches: static and dynamic characteristics.
The distinction is made to separate the characteristics that can be used to describe the initial state of an instance and the features that are better suited to describe specific states of the problem throughout the construction phase.
3.2.1 Static Features
In this study, we call static features the set of features that describe individual instances of the problem and do not change throughout the construction process of assigning lectures. These features are common in the literature [17] and include features related to the size of the problem, such as the number of lectures, courses, and curricula. They also include features that describe how many positions an instance has to accommodate lectures: the number of available rooms, the number of working days in the week, and the number of periods per day.
Other static features can be computed from the ones mentioned previously with the idea of establishing the relationship between the individual features. Some of these calculated static features include the total available positions, the average conflicts among lectures, the average teacher availability, and the average room occupation.
The total available positions P represent the number of slots to which a lecture can be allocated. This value is obtained by multiplying the number of rooms r by the number of working days in the week d and the number of periods per day p, as shown in Equation 3.1.
$$P = r \, p \, d \tag{3.1}$$
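As a minimal illustration of Equation 3.1, the following Python sketch computes the total available positions. The function name and the instance sizes are hypothetical and serve only as an example; they are not taken from the instances studied in this work.

```python
def total_positions(rooms: int, periods_per_day: int, days: int) -> int:
    """Total available positions P = r * p * d (Equation 3.1)."""
    if min(rooms, periods_per_day, days) < 0:
        raise ValueError("rooms, periods and days must be non-negative")
    return rooms * periods_per_day * days


# Hypothetical instance: 6 rooms, 6 periods per day, 5 working days.
positions = total_positions(rooms=6, periods_per_day=6, days=5)
print(positions)  # 180
```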
To calculate the average percentage of conflicts among lectures, let $k_c$ denote the conflicts of a lecture of course $c$ with other lectures. Two lectures conflict if they cannot be scheduled in the same timeslot because they share the same teacher, course, or curriculum. Let $C$ denote the set of all courses and $l_c$ the number of lectures of course $c$; the total number of lectures $L$ is then the sum of $l_c$ over all courses. Therefore, the average conflict percentage $Co$ can be computed as in Equation 3.2.
$$Co = \frac{\sum_{c \in C} k_c \, l_c}{L} \tag{3.2}$$
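The computation in Equation 3.2 can be sketched in Python as follows. This is only an illustrative sketch: the toy instance, the function names, and in particular the way $k_c$ is counted (the number of other lectures, from the same course or from a course sharing a teacher or curriculum, that a lecture of course $c$ cannot overlap with) are assumptions made for this example. The text above does not fix how $k_c$ is counted or normalized.

```python
from itertools import combinations

# Hypothetical toy instance: course -> (teacher, number of lectures l_c).
courses = {
    "Algebra": ("t1", 3),
    "Physics": ("t2", 2),
    "Calculus": ("t1", 2),
    "History": ("t3", 1),
}
# Hypothetical curricula: groups of courses attended by the same students.
curricula = {"q1": {"Algebra", "Physics"}, "q2": {"Physics", "History"}}


def conflicting_pairs(courses, curricula):
    """Pairs of distinct courses that share a teacher or a curriculum."""
    pairs = set()
    for a, b in combinations(sorted(courses), 2):
        same_teacher = courses[a][0] == courses[b][0]
        same_curriculum = any(a in m and b in m for m in curricula.values())
        if same_teacher or same_curriculum:
            pairs.add((a, b))
    return pairs


def conflicts_per_lecture(courses, curricula):
    """k_c under the assumption stated above (a count of other lectures)."""
    # Lectures of the same course always conflict with each other.
    k = {c: n - 1 for c, (_, n) in courses.items()}
    for a, b in conflicting_pairs(courses, curricula):
        k[a] += courses[b][1]
        k[b] += courses[a][1]
    return k


def average_conflicts(courses, curricula):
    """Co = sum over courses of (k_c * l_c), divided by L (Equation 3.2)."""
    k = conflicts_per_lecture(courses, curricula)
    total_lectures = sum(n for _, n in courses.values())
    return sum(k[c] * n for c, (_, n) in courses.items()) / total_lectures


print(conflicts_per_lecture(courses, curricula))
print(average_conflicts(courses, curricula))  # 4.75
```

With this counting, $Co$ is an average number of conflicting lectures per lecture. Expressing it as a percentage would require an additional normalization that is not specified here.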
Let $u_c$ be the number of unavailable periods for course $c$, that is, the periods in which the lectures of course $c$ must not be scheduled. The availability percentage $a_c$ for the lectures of a single course can then be computed as in Equation 3.3, and the average teacher availability $A$ over all lectures as in Equation 3.4.
$$a_c = 1 - \frac{u_c}{p \, d} \tag{3.3}$$
$$A = \frac{\sum_{c \in C} a_c \, l_c}{L} \tag{3.4}$$
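Equations 3.3 and 3.4 can be illustrated with the short Python sketch below. The function names and the toy values of $u_c$ and $l_c$ are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def course_availability(unavailable: int, periods_per_day: int, days: int) -> float:
    """a_c = 1 - u_c / (p * d) (Equation 3.3)."""
    return 1 - unavailable / (periods_per_day * days)


def average_availability(courses, periods_per_day: int, days: int) -> float:
    """A = sum over courses of (a_c * l_c), divided by L (Equation 3.4).

    `courses` maps a course name to a pair (u_c, l_c).
    """
    total_lectures = sum(l_c for _, l_c in courses.values())
    weighted = sum(
        course_availability(u_c, periods_per_day, days) * l_c
        for u_c, l_c in courses.values()
    )
    return weighted / total_lectures


# Hypothetical instance with 6 periods per day and 5 days (30 timeslots).
toy_courses = {
    "Algebra": (6, 3),    # a_c = 0.8
    "Physics": (0, 2),    # a_c = 1.0
    "Calculus": (15, 2),  # a_c = 0.5
    "History": (3, 1),    # a_c = 0.9
}
print(round(average_availability(toy_courses, periods_per_day=6, days=5), 4))  # 0.7875
```

Weighting each $a_c$ by $l_c$ means that courses with many lectures influence the average more than courses with few lectures.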
Finally, the average room occupation $O$ can be computed by dividing the total number of lectures $L$ by the total number of positions given by Equation 3.1. Formally, $O$ can be computed as:
$$O = \frac{L}{r \, p \, d} \tag{3.5}$$
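The following Python sketch illustrates Equation 3.5. The function name and the numbers are hypothetical and are not drawn from the instances used in this work. The sketch compares two instances that have the same lectures and timeslots and differ only in the number of rooms.

```python
def room_occupation(lectures: int, rooms: int, periods_per_day: int, days: int) -> float:
    """O = total lectures / (r * p * d) (Equation 3.5)."""
    positions = rooms * periods_per_day * days
    if positions == 0:
        raise ValueError("the instance has no available positions")
    return lectures / positions


# Hypothetical instance: 150 lectures, 4 periods per day, 5 days (20 timeslots).
print(room_occupation(150, rooms=10, periods_per_day=4, days=5))  # 0.75
print(room_occupation(150, rooms=2, periods_per_day=4, days=5))   # 3.75
```

A value above 1 means that there are more lectures than positions, so not every lecture can be placed. A value well below 1 indicates spare room capacity, even when the number of timeslots is small.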