Instituto Tecnologico y de Estudios Superiores de Monterrey


Monterrey Campus

School of Engineering and Sciences

Exploring Data-Driven Selection Hyper-Heuristic Approaches for the Curriculum-Based Course Timetabling

A thesis presented by

Carlos Alfonso Hinojosa Cavada

Submitted to the

School of Engineering and Sciences

in partial fulfillment of the requirements for the degree of

Master of Science

in

Computer Science

Monterrey, Nuevo León, December, 2020

Declaration of Authorship

I, Carlos Alfonso Hinojosa Cavada, declare that this thesis, titled "Exploring Data-Driven Selection Hyper-Heuristic Approaches for the Curriculum-Based Course Timetabling", and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.

• I have acknowledged all main sources of help.

• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Carlos Alfonso Hinojosa Cavada
Monterrey, Nuevo León, December, 2020

Dedication

I would like to express my deepest gratitude to all those who have been by my side on every step of the way: my parents, friends and family. Thank you all for your unconditional confidence, support, patience, and encouragement. You were my main motivation for completing this work.

Acknowledgements

First and foremost, I would like to thank my family for being my driving force and motivation.

This achievement is also yours.

To Brianda Daniela Flores García for your love and support, you inspire me every day.

Life by your side is beautiful. I am very grateful to share it with you.

Special thanks to my advisor, Dr. Santiago Conant, who was always there to guide the project through these last two years. The encouragement of Dr. José Carlos Ortiz Bayliss was also fundamental for driving this project to conclusion.

To Erick Lara Cárdenas, for your everlasting willingness to help, and for sharing your passion for technology and software development. I appreciate your advice for completing this dissertation.

To my classmates Eider Díaz and Miguel Ángel Cortés. We made a great team, may we continue doing so for years to come. Thanks to Wario Chávez, Axel Ramos and the great group of people I was fortunate to meet these years. You were brilliant.

This work was possible partly because of the tuition support and computing resources provided by Tecnologico de Monterrey and CONACyT.

Exploring Data-Driven Selection Hyper-Heuristic Approaches for the Curriculum-Based Course Timetabling

by

Carlos Alfonso Hinojosa Cavada

Abstract

The curriculum-based timetabling problem (CB-CTT) represents a challenging field of study within educational timetabling, with real-world applications that stress its importance. Solving a CB-CTT problem requires allocating a set of courses using limited resources, subject to a set of hard constraints that must be satisfied [102]. The goal then is to find a feasible assignment for every lecture that constitutes the courses to the positions in the timetable formed by a combination of day, period, and room; all while minimizing an objective function as specified by the constraints in the problem.

Designing the timetable for the courses in the incoming term is a problem faced by universities each academic period. Given the complexity of manually designing timetables, automated methods have attracted the attention of many researchers for solving this problem.

The design of timetables remains an open problem to this day. According to the no free lunch theorem [120], different heuristics are effective on different problem instances, stressing the importance of finding automated methods for designing timetables. This dissertation explores novel hyper-heuristic models that rely on various machine learning techniques, such as boosting, clustering and principal component analysis. In total, two models were designed and implemented as a result of this work.

The first model relies on gradient boosting algorithms to generate a selection hyper-heuristic. The general idea is that different instances of the CB-CTT are best solved by different heuristics. Hence, the aim is to create a model that learns from the features that describe problem instances and predicts which would be the most suitable heuristic to apply. While the classification model produces promising results in terms of accuracy, the quality of the generated solutions is bounded by the best-known single heuristic.

The second model aims to remove the bounds set by the use of a single heuristic by exploring ways of combining heuristics during the timetable construction process. The selection hyper-heuristic approach is powered by principal component analysis and k-means.

The model starts by identifying similar regions in the instance space and keeping track of the performance of each heuristic for those regions. Then, when constructing new timetables, the model determines the most suitable heuristic for a given region of the instance space. The method was able to outperform the synthetic oracle created by taking the result of the best isolated heuristic in several instances.

This dissertation is submitted to the Graduate Programs in Engineering and Information Technologies in partial fulfillment of the requirements for the degree of Master of Science in Computer Sciences with a major in Intelligent Systems.


List of Figures

2.1 Example data for the heuristic ordering of five courses with six timeslots and three rooms.

2.2 Hyper-heuristic classification by learning methodology and the nature of the heuristic space as proposed by Burke et al. [20].

2.3 Example of the steps described by the Cross-Industry Standard Process for Data Mining (CRISP-DM). The outer circle represents the cyclic nature of the data analysis process.

3.1 Lecture distribution for a 200-lecture CB-CTT instance with a 9-block long hyper-heuristic. Lectures are randomly distributed, with the goal of exploring the space of the instance without bias.

4.1 Block diagram showing the process of generating feature-driven hyper-heuristics.

4.2 Graphical representation of the accuracy of XGBoost after feature transformation. The Figure shows that the model is able to choose a competitive heuristic, even when failing to predict the best known value.

4.3 Graphical representation of the importance of features used for classification as measured by their F-Score.

5.1 Block diagram showing the process of generating hyper-heuristics for a single experiment scenario as shown on Table 5.2. The methods and techniques used for PCA and clustering are described in Section 5.1.1. Also, the method by which the input dataset is created is described in Section 3.3.2.

5.2 Graphical representation of the cumulative percentage of the variance as explained by the principal components.

5.3 Graphical representation of the Elbow method. As the number of clusters increases, the WCSS value decreases. A change in the steepness on the rate of decrease indicates the optimal number of clusters.

5.4 Success rate of the best hyper-heuristics (30 runs) trained with 7500 data points using PCA for feature selection and the lowest average to determine the best heuristic on the cluster (HH17, HH25), feature importance for selection (FI) and roulette wheel for deciding which heuristic to use (RW).

5.5 Graphical representation of the penalties for different solution methods on the benchmark instances. The best known value using the heuristics alone (H*, blue) is outperformed by the hyper-heuristic methods (HH18, green).

List of Tables

3.1 The set of features used to describe instances on the problem. Some features are extracted only after an instance has gone through the solution process.

3.2 The set of features used to describe the state of the problem as it changes during the construction phase.

3.3 Description of the ITC-2007 instances.

4.1 Curriculum-based course timetabling features considered for training classification models. Each feature is mapped to a unique ID for simplicity.

4.2 Accuracy scores for XGBoost and Adaboost models with varying number of features, according to their importance during training stages.

5.1 The mean of the cost for each observation on every cluster is calculated. The heuristic with the lowest average cost is considered the best for the observations related to that cluster.

5.2 Experiments for data-driven hyper-heuristic models are divided into three stages: exploratory, extended and confirmatory. On the first two scenarios, we increase the count of data points used to train the hyper-heuristics. Confirmatory experiments are performed on the benchmark set.

5.3 Hyper-heuristic success rate (%) for scenario 1 (see Table 5.2). The best hyper-heuristic (considering its performance against H*) is highlighted in bold. A test is considered as a success when the hyper-heuristic is better than H*.

5.4 Hyper-heuristic success rate (%) for scenario 2 (see Table 5.2). The best hyper-heuristic (considering its performance against H*) is highlighted in bold. A test is considered a success when the hyper-heuristic is better than H*.

5.5 Hyper-heuristic success rate (%) for scenario 3 (see Table 5.2). The best hyper-heuristic (considering its performance against H*) is highlighted in bold. A test is considered a success when the hyper-heuristic is better than H*.

5.6 Hyper-heuristic success rate (%) for the confirmatory experiments on scenario 4 (see Table 5.2). The best hyper-heuristic (considering its performance against H*) is highlighted in bold. A test is considered as a success when the hyper-heuristic is better than H*.

Chapter 1 Introduction

Timetables are present in many different areas of life. As an organized society, we need systems to schedule when things are going to happen, whether it is consulting the time for a movie screening, booking a flight, or attending meetings at work. In universities, scheduling courses is a recurring activity, with timetables often adjusted or redesigned for each academic period. Moreover, while we can easily read and understand timetables, creating them can become very complex due to the large number of entities involved. Problems for which finding a solution may be computationally expensive, but a given solution can be verified easily, belong to the class of nondeterministic polynomial-time (NP) problems [69].

A further classification of problem complexity is the NP-hard class. These problems are at least as hard as the hardest problems in NP, but do not necessarily belong to the NP class [67, 118]. The halting problem is an example of an NP-hard problem [117]. Furthermore, the intersection between NP and NP-hard problems is called NP-complete [4]. While some instances of NP-complete problems may be solved in polynomial time, there is no known polynomial-time algorithm that guarantees optimal solutions as the instance size increases [47].

Educational timetabling problems belong to the NP-hard class. The problems in this field of artificial intelligence and operational research have many real-world applications, as designing timetables is a recurring activity for educational institutions. This work focuses on the Curriculum-Based Course Timetabling Problem (CB-CTT), which deals with the allocation of lectures that involve teachers and students in a previously defined period of time, while satisfying a set of constraints [102]. Depending on the size of the CB-CTT instance, performing an exhaustive search to find the optimal solution may not be feasible.

In the following sections, we present the Curriculum-Based Course Timetabling problem and the motivation to conduct the study in this area. This is followed by the hypothesis and research questions that led to the objectives of this dissertation. Next, we describe the methodology by which hyper-heuristics and data science are applied to the problem. Finally, the chapter ends with the contributions derived from this work and a description of how this document is organized.

1.1 Motivation

Formally introduced as the third track in the Second International Timetabling Competition (ITC-2007), the curriculum-based course timetabling (CB-CTT) is a type of university course timetabling where the task is to schedule a set of lectures on a set of time slots, subject to specific constraints. The constraints are divided into two sets: hard constraints, which must be satisfied for a solution to be considered feasible, and soft constraints, which make a timetable more desirable. Thus, solving the CB-CTT requires finding a timetable that allocates every lecture of every course while minimizing the penalty for failing to satisfy the soft constraints.

Educational timetabling belongs to the NP-hard problems [35, 40], as real-world applications can become quite challenging. Additionally, the formulation of the problem varies to accommodate the requirements of different universities in specific periods of time, making CB-CTT an essential problem to study.

While exact methods aiming for lower-bound optimality have been proposed for solving the CB-CTT [73, 90], it is still considered an open problem, attracting the attention of researchers around the world at the time of writing [30, 34]. These exact methods have obtained varying degrees of success, but are limited by instance size and constraints [48], hence the use of heuristic methods such as metaheuristics and hyper-heuristics.

Heuristic methods [21] specifically tailored for the CB-CTT problem have attracted attention from researchers in recent years, as surveyed by Pillay in [93]. The goal of heuristic methods is to generate approximate solutions in a short time using few computational resources. In educational timetabling, heuristic methods are usually applied to create an initial solution, which cannot be guaranteed to be feasible let alone optimal, yet this initial timetable can later be optimized by more sophisticated methods. Some examples of this approach include the automated generation of constructive ordering heuristics [94] or iterated local search combining perturbation and hill-climbing [111].

Selecting the most suitable algorithm for an instance of a problem or a specific situation during the solution process can be approached by selection hyper-heuristics [21, 36]. In this study, procedures based on data mining are presented as an alternative for constructing the timetables for the CB-CTT problem, as research has shown early decisions impact the overall quality of the solutions [82]. The process of generating data-driven selection hyper-heuristics for the CB-CTT is framed using the cross-industry standard process for data mining (CRISP-DM) [104].

1.2 The Curriculum-Based Course Timetabling

Educational timetabling is defined as the allocation, subject to constraints, of given resources to objects placed in space-time in such a way as to satisfy as much as possible a set of objectives. Timetabling is similar to the graph coloring problem [100, 84], in that nodes represent events and edges represent conflicts between them.
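The analogy with graph coloring can be illustrated with a minimal sketch. The snippet below is not a method used in this work: it is a plain greedy coloring in which events are nodes, conflicts are edges, and colors stand for timeslots. The function name `greedy_coloring` and the toy conflict data are assumptions made only for this example.

```python
from typing import Dict, List, Set


def greedy_coloring(conflicts: Dict[str, Set[str]]) -> Dict[str, int]:
    """Give each event the smallest timeslot index not used by a conflicting event."""
    # Visit events with many conflicts first; ties are broken by name.
    order: List[str] = sorted(conflicts, key=lambda e: (-len(conflicts[e]), e))
    slot_of: Dict[str, int] = {}
    for event in order:
        taken = {slot_of[n] for n in conflicts[event] if n in slot_of}
        slot = 0
        while slot in taken:
            slot += 1
        slot_of[event] = slot
    return slot_of


# Hypothetical events: an edge means two events cannot share a timeslot.
toy_conflicts = {
    "Algebra": {"Calculus", "Physics"},
    "Calculus": {"Algebra", "Physics"},
    "Physics": {"Algebra", "Calculus", "History"},
    "History": {"Physics"},
}
slots = greedy_coloring(toy_conflicts)
```

In this toy graph, three mutually conflicting events force at least three timeslots, while the fourth event can reuse one of them.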

In its most basic form, the timetabling problem is concerned with combining a set of courses, a set of professors, a set of rooms, and a set of timeslots while following constraints. The constraints on the timetabling problem can be classified into two categories: hard and soft. The hard constraints must be satisfied for a solution to be feasible, while soft constraints are expressed through a cost function whose minimization reflects the quality or desirability of that solution.

1.3 Hypothesis

The application of data science techniques to analyze the data collected from the execution of heuristic methods can be used to discover insights for building selection hyper-heuristics that perform better than the best heuristic for the initial timetable construction on instances of the curriculum-based course timetabling problem.

The main research question for this work is how data science can uncover patterns in the data generated by the execution of heuristic methods to create hyper-heuristics that find high-quality solutions for the CB-CTT problem. To find this answer, several more specific questions must be answered first. These questions include:

• What data should be collected to best describe the different states of the problem during the solution process? How often should such data be collected?

• Which low-level heuristic methods can be used to create an initial solution for the CB-CTT instances? How do such heuristics behave on different instances of the CB-CTT problem?

• How will partial states of the CB-CTT solution process be evaluated and associated with different low-level heuristics?

• How do the different features that characterize the instances of the CB-CTT problem affect the training process for the hyper-heuristic?

• How can the trained machine learning models be evaluated to decide which one provides the best information for assigning the best possible heuristic at every step of the solution process?

• Will the combination of different low-level heuristics produce better solutions compared against applying each heuristic in isolation?

1.4 Objectives

The main objective of this work is to conduct an experimental analysis of a data-driven model for improving hyper-heuristic performance on the Curriculum-Based Course Timetabling problem. We analyze the role of the data science pipeline for the implementation and improvement of this hyper-heuristic model. For this, the following particular objectives must be achieved:

• Collect data during the solution process that can be leveraged to describe the impact of heuristics for different stages of the solution.

• Determine the feasibility of designing and implementing a hyper-heuristic model by comparing the results obtained by applying heuristics in isolation.


• Design and implement at least one model to produce selection hyper-heuristics for the CB-CTT based on the features of the instances.

• Design and implement at least one model to produce selection hyper-heuristics for the CB-CTT based on the data collected from different states of the solution process.

• Present the advantages and disadvantages of solving the CB-CTT problem with the proposed models.

The expected contribution depends directly on the successful completion of these objectives. Once they are completed, the results will provide evidence that data science can be leveraged to guide the design of hyper-heuristic models for the Curriculum-Based Course Timetabling.

1.5 Methodology

This section describes the steps needed to test the hypothesis and accomplish the objectives stated in the previous sections. The following elements are sorted by priority, starting by determining the data collection methodology and concluding with the experimentation and analysis of results.

• Determine the data collection methodology. The first step of the research involves the creation of a dataset with features that describe the effect of heuristic methods applied to the problem instances during the different stages of the solution process. This step includes the research as to which features are commonly used to describe the problem, as well as designing other characteristics that help describe the problem during the solution process. In addition to determining what data will be collected, the frequency for the data collection has to be set before selecting the final dataset.

• Define the set of data science procedures to get insights from the collected data. The data generated in the previous step holds the key to defining a data-driven hyper-heuristic method, but data science techniques must be applied to the raw data to uncover insights. The selection of these techniques will be based on literature research and may include feature selection, feature scaling, the training of machine learning models and statistical evaluation of the results.

• Specify the data-driven hyper-heuristic model. Define and implement a specific hyper-heuristic model based on the insights gathered from data per the steps above. The hyper-heuristic is designed in a way that makes it possible to compare its results with the ones obtained by the heuristics applied in isolation.

• Conduct experiments and analyze their results. Train the hyper-heuristic model using the problem instances from which data was collected. Define a performance metric to evaluate the different solution strategies across different instances.

1.6 Contributions

The present dissertation delves into hyper-heuristics for solving the CB-CTT. The main contribution of this thesis is the design, experimentation and analysis of two data-driven hyper-heuristic models for the CB-CTT. In more detail, the contributions of this investigation are:

Applying a feature-driven selection hyper-heuristic to the CB-CTT. This work proposes a selection hyper-heuristic model which uses features of the instances to make decisions. In the model presented in Chapter 4, knowledge obtained from previous experience is leveraged to select the most suitable heuristics while constructing a timetable for the CB-CTT.

A data-driven selection hyper-heuristic for the CB-CTT. The selection hyper-heuristic model for combining heuristics based on previous experience is presented in Chapter 5. In this model, clustering algorithms are used to group observations of intermediate problem states to determine which is the most suitable heuristic to apply at those stages. This approach could be extended to other domains of combinatorial optimization.

A framework for studying the overlap of data science and optimization in timetabling. To the best of the author's knowledge, there is little to no research on selection hyper-heuristics generated through a data mining process that has been used to solve the CB-CTT. In the models proposed in this dissertation, the optimization process is guided by a strategy based on data science.

1.7 Document Structure

The following chapter presents the formal definition of the CB-CTT and entries in the literature related to the problem. Chapter 3 describes the proposed solution model for this dissertation. In Chapter 4, the reader will find experiments and their corresponding analysis under the light of the CRISP-DM methodology. Finally, Chapter 5 presents the conclusions and future work derived from this dissertation.

Chapter 2 Theoretical Background

This chapter presents a review of the state of the art related to this research. A brief description of the Curriculum-Based Course Timetabling (CB-CTT) problem is presented, along with selected approaches from the literature to solve it. The concept of hyper-heuristic is defined next, with a special focus on selection hyper-heuristics and some background on their application to the CB-CTT problem. Next, the concept of meta-learning is presented. Finally, the concepts and definitions relevant to this research are summarized.

2.1 The Curriculum-Based Timetabling Problem

This section covers a formal definition for the curriculum-based course timetabling problem (CB-CTT). A brief introduction of the key concepts is presented, followed by some popular solving strategies for the problem and example instances. Finally, we briefly introduce other formulations of the university timetabling problem.

2.1.1 Definitions and Formal Representation

A classic timetabling problem deals with the allocation of events to slots in time, subject to the available resources and explicit constraints. Therefore, the problem can be represented by four sets: the set of events, the set of time periods, the set of resources, and the set of constraints.

Events are activities to be scheduled. In the domain of the CB-CTT, the events represent the lectures that form each of the courses in the problem instance. These courses are aggregated to form the curricula, where each curriculum represents a set of courses that must be taken by a group of students. The number of time periods is obtained by multiplying the number of teaching days by the number of time slots of each day. Hence, a time period is a pair formed by day and time slot. Finally, the set of constraints is formed by hard constraints, which must be satisfied for a timetable to be valid, and soft constraints, which indicate the quality of the solution.

The problem is detailed as follows. Let $l_{i,k} \in L_i$ be a lecture of course $c_i$ from the set of courses $C = \bigcup_{i=1}^{n} c_i$ to be scheduled in the set of $p$ time periods $P = d \times h$, where $d$ is the number of teaching days and $h$ is the number of time slots per day. Likewise, each lecture $l_i$ must be scheduled in a room $r$ from the given set of rooms $R = \bigcup_{i=1}^{n} r_i$. In addition, sets of $k$ curricula $K = \bigcup_{i=1}^{n} k_i$ are formed by courses which share common students. This way, a timetable $x$ from the space of all possible timetables $X$ is represented as a $p \times r$ matrix, where $x_{i,j}$ corresponds to a lecture assigned to a period $p_i$ and a room $r_j$. Given this notation, we have a candidate timetable that might be a valid solution.
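As an illustration of this representation, the following minimal sketch stores a timetable as a matrix with one row per period and one column per room. The helper names (`empty_timetable`, `period_index`, `period_pair`), the lecture label, and the instance sizes are assumptions made for the example and are not taken from the implementation used in this work.

```python
from typing import List, Optional, Tuple

# Rows are periods, columns are rooms; None marks a free position.
Timetable = List[List[Optional[str]]]


def empty_timetable(days: int, slots_per_day: int, rooms: int) -> Timetable:
    """Create a p x r matrix, where p = days * slots_per_day."""
    return [[None] * rooms for _ in range(days * slots_per_day)]


def period_index(day: int, slot: int, slots_per_day: int) -> int:
    """Map a (day, time slot) pair to the row index of its period."""
    return day * slots_per_day + slot


def period_pair(index: int, slots_per_day: int) -> Tuple[int, int]:
    """Recover the (day, time slot) pair from a period row index."""
    return divmod(index, slots_per_day)


# Hypothetical instance: 5 teaching days, 6 time slots per day, 3 rooms.
timetable = empty_timetable(days=5, slots_per_day=6, rooms=3)
row = period_index(day=2, slot=1, slots_per_day=6)
timetable[row][0] = "c1-lecture0"  # assign a lecture to period `row`, room 0
```

Here every cell of the matrix is one position of the timetable, so assigning a lecture amounts to writing its identifier into the cell for the chosen period and room.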

For a timetable to be considered feasible, it must satisfy every hard constraint, whereas every soft constraint violation incurs a penalty cost. This study adopts a formulation from ITC-2007 commonly found in the literature, with a set of four hard constraints H = {H1, H2, H3, H4} and a set of four soft constraints S = {S1, S2, S3, S4}, considered as follows:

Lectures (H1) All lectures of every course must be scheduled in distinct periods. In other words, H1 will not be satisfied and a timetable will not be considered valid until every lecture has been allocated.

Room Occupancy (H2) Two lectures cannot be assigned to the same room in the same period. In simpler terms, lectures cannot share rooms at the same time for H2 to be satisfied.

Conflicts (H3) Lectures of courses in the same curriculum or taught by the same teacher cannot be scheduled in the same period, to prevent overlapping of students or teachers.

Availability (H4) If the teacher of a course is not available at a given period, then no lectures of the course can be assigned to that period.
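The four hard constraints can be checked mechanically. The sketch below counts violations of H1 to H4 for a list of assignments. The `Course` record, the `(course, period, room)` tuple layout, and the toy data are assumptions made for illustration, and the way violations are counted is one reasonable choice rather than the official ITC-2007 validator.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Assignment = Tuple[str, int, int]  # (course id, period index, room index)


@dataclass
class Course:
    teacher: str
    lectures: int
    curricula: FrozenSet[str] = frozenset()
    unavailable: FrozenSet[int] = frozenset()


def hard_violations(courses: Dict[str, Course], schedule: List[Assignment]) -> Dict[str, int]:
    counts = {"H1": 0, "H2": 0, "H3": 0, "H4": 0}
    # H1: each course needs as many distinct periods as it has lectures.
    periods_of: Dict[str, Set[int]] = {cid: set() for cid in courses}
    for cid, period, _ in schedule:
        periods_of[cid].add(period)
    for cid, course in courses.items():
        counts["H1"] += max(0, course.lectures - len(periods_of[cid]))
    # H2: a (period, room) position may host at most one lecture.
    usage = Counter((period, room) for _, period, room in schedule)
    counts["H2"] = sum(n - 1 for n in usage.values() if n > 1)
    # H3: courses sharing a curriculum or a teacher cannot share a period.
    for (c1, p1, _), (c2, p2, _) in combinations(schedule, 2):
        if p1 == p2 and c1 != c2:
            a, b = courses[c1], courses[c2]
            if a.teacher == b.teacher or a.curricula & b.curricula:
                counts["H3"] += 1
    # H4: no lecture in a period where the teacher is unavailable.
    counts["H4"] = sum(1 for cid, p, _ in schedule if p in courses[cid].unavailable)
    return counts


# Hypothetical instance with three courses.
courses = {
    "c1": Course("t1", 2, frozenset({"k1"}), frozenset({4})),
    "c2": Course("t2", 1, frozenset({"k1"})),
    "c3": Course("t1", 1, frozenset({"k2"})),
}
feasible = [("c1", 0, 0), ("c1", 1, 0), ("c2", 2, 0), ("c3", 3, 1)]
broken = [("c1", 0, 0), ("c2", 0, 0), ("c3", 0, 1), ("c1", 4, 1)]
ok_counts = hard_violations(courses, feasible)
bad_counts = hard_violations(courses, broken)
```

A timetable is feasible in this sketch exactly when all four counters are zero, as happens for `feasible` but not for `broken`.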

Recall that soft constraints have no impact over the feasibility of a timetable. Once every hard constraint has been satisfied, the quality of a timetable is determined by the weighted sum of the soft constraint violations. The set of soft constraints is detailed as follows:

Room capacity (S1) For each lecture, the number of students s_r attending the course should not be greater than the capacity of the room c_r hosting the lecture. S1 is formally described by Equation 2.1.

Room stability (S2) The number of distinct rooms r(c_i) used for scheduling the lectures of a course c_i should not exceed 1. S2 is formally described by Equation 2.2.

Minimum working days (S3) The lectures of a course c_i should be spread over at least a given minimum number of days m_i, where s(c_i) denotes the number of days over which they are actually spread. S3 is formally described by Equation 2.3.

Curriculum compactness (S4) For a given curriculum, a violation is counted for every lecture l that is not adjacent to any other lecture belonging to the same curriculum C_i within the same day d, which means the agenda of students should be as compact as possible. S4 is formally described by Equation 2.4.

\[
f_1(x_{i,j}) =
\begin{cases}
s_r - c_r, & \text{if } s_r > c_r; \\
0, & \text{otherwise.}
\end{cases}
\tag{2.1}
\]

\[
f_2(c_i) = r(c_i) - 1
\tag{2.2}
\]

\[
f_3(c_i) =
\begin{cases}
m_i - s(c_i), & \text{if } m_i > s(c_i); \\
0, & \text{otherwise.}
\end{cases}
\tag{2.3}
\]

\[
f_4(x_{i,j}) = \sum_{c_i \in C} d \cdot l
\tag{2.4}
\]

Finally, the quality of a candidate solution is determined by the total soft penalty cost, calculated by evaluating the objective function defined in Equation 2.5. Soft constraint violations are associated with penalty coefficients as follows: α₁ = 1, α₂ = 1, α₃ = 5, α₄ = 2, per the formulation proposed in [17]. The goal is to find a feasible solution X⁰ such that f(X⁰) ≤ f(X) for all candidate solutions X in the solution search space.

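\[
f(X) = \alpha_1 \cdot \sum_{x_{i,j} \in X} f_1(x_{i,j})
     + \alpha_2 \cdot \sum_{c_i \in C} f_2(c_i)
     + \alpha_3 \cdot \sum_{c_i \in C} f_3(c_i)
     + \alpha_4 \cdot \sum_{x_{i,j} \in X} f_4(x_{i,j})
\tag{2.5}
\]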
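To make the weighted sum concrete, the following sketch evaluates the soft penalty of a small schedule using the coefficients stated above (1, 1, 5, 2). The input layout and the function name `soft_penalty` are assumptions made for this example. In particular, the S4 term follows the prose description of curriculum compactness (one violation per lecture with no adjacent lecture of the same curriculum on the same day), which is our reading of the constraint rather than a transcription of Equation 2.4.

```python
from typing import Dict, List, Set, Tuple

Assignment = Tuple[str, int, int]  # (course id, period index, room index)
WEIGHTS = (1, 1, 5, 2)  # alpha_1 .. alpha_4 as stated in the text


def soft_penalty(
    schedule: List[Assignment],
    students: Dict[str, int],
    min_days: Dict[str, int],
    capacity: List[int],
    curricula: Dict[str, Set[str]],
    slots_per_day: int,
) -> int:
    a1, a2, a3, a4 = WEIGHTS
    # S1: students that do not fit in the room, summed over lectures.
    s1 = sum(max(0, students[c] - capacity[r]) for c, _, r in schedule)
    rooms_used: Dict[str, Set[int]] = {}
    days_used: Dict[str, Set[int]] = {}
    for c, p, r in schedule:
        rooms_used.setdefault(c, set()).add(r)
        days_used.setdefault(c, set()).add(p // slots_per_day)
    # S2: every room beyond the first one used by a course.
    s2 = sum(len(rs) - 1 for rs in rooms_used.values())
    # S3: missing days with respect to the required minimum.
    s3 = sum(max(0, min_days[c] - len(ds)) for c, ds in days_used.items())
    # S4 (assumed reading): isolated lectures within each curriculum.
    s4 = 0
    for members in curricula.values():
        periods = {p for c, p, _ in schedule if c in members}
        for p in periods:
            day = p // slots_per_day
            adjacent = [q for q in (p - 1, p + 1)
                        if q in periods and q // slots_per_day == day]
            if not adjacent:
                s4 += 1
    return a1 * s1 + a2 * s2 + a3 * s3 + a4 * s4


# Hypothetical data: two courses in one curriculum, four slots per day.
total = soft_penalty(
    schedule=[("c1", 0, 0), ("c1", 5, 1), ("c2", 1, 0)],
    students={"c1": 30, "c2": 20},
    min_days={"c1": 3, "c2": 1},
    capacity=[25, 40],
    curricula={"k1": {"c1", "c2"}},
    slots_per_day=4,
)
```

In this toy case the four terms contribute 5, 1, 1 and 1 violations respectively, so the weighted total is 1·5 + 1·1 + 5·1 + 2·1 = 13.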

2.1.2 Exact Methods for Solving the CB-CTT

The literature refers to exact methods as those capable of producing the best possible solution every time they are executed. With exact methods, the solution found must always be the optimal one. The following lines describe two of these methods.

Branch and cut In branch and cut, the problem is formulated as a mixed-integer linear programming problem and solved by combining the cutting plane method with a branch-and-bound algorithm [39]. The purpose of the branch-and-bound method is to search the space of all possible solutions, such that solution attributes not shared among optimal solutions are found as early as possible. The set of feasible solutions is thought of as forming a rooted tree with the full set at the root. In branch and cut, a cutting plane algorithm is applied in each node before branching. The objective is to improve the formulation in each node. Compared to a classical branch-and-bound algorithm, it decreases the number of nodes to be explored but increases the computational effort in each node [12, 22].

Boolean Satisfiability (SAT) and MaxSAT Herein, the CB-CTT is formulated through different encodings based on a basic SAT encoding for modeling the subsets of hard and soft constraints [31]. MaxSAT is a generalization of the Boolean satisfiability problem, which asks whether there exists an assignment of the values true or false to the variables of a Boolean formula that makes the formula evaluate to true [29]. In this exact method, both soft and hard constraints of the CB-CTT problem are defined as hard. Hence, when the SAT solver finds a solution for an instance, the resulting timetable is guaranteed to be optimal at zero cost [2].
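The satisfiability question itself can be illustrated with a tiny brute-force check. This sketch only shows what it means for a formula in conjunctive normal form to be satisfiable; it is not one of the encodings from [31], and real SAT solvers are far more sophisticated. The clause format (signed integers for literals) and the toy timetabling clauses are assumptions made for the example.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]  # signed literals: 3 means x3, -3 means NOT x3


def brute_force_sat(clauses: List[Clause], num_vars: int) -> Optional[Dict[int, bool]]:
    """Return a satisfying assignment, or None if the formula is unsatisfiable."""
    for values in product([False, True], repeat=num_vars):
        model = {i + 1: value for i, value in enumerate(values)}
        # A clause holds if at least one of its literals is true.
        if all(any(model[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return model
    return None


# Hypothetical encoding: x1 = "lecture in period 1", x2 = "lecture in period 2".
toy_clauses = [
    [1, 2],    # the lecture must be scheduled in some period
    [-1, -2],  # but not in both periods
    [-1],      # period 1 is unavailable
]
model = brute_force_sat(toy_clauses, num_vars=2)
unsat = brute_force_sat([[1], [-1]], num_vars=1)
```

Treating every constraint as a clause that must hold mirrors the idea above of defining both hard and soft constraints as hard: any model found violates none of them.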

2.1.3 Approximation Methods for Solving the CB-CTT

While the exact methods presented in the previous section are guaranteed to find an optimal solution when it exists, they are rarely used in real-world scenarios. Exact methods are limited by the size of the instance, since the search space of the CB-CTT can grow dramatically with the number of lectures. Hence, performing an exhaustive search in the space of the solutions might require many resources. The literature describes approximation methods as an alternative for dealing with problem instances where exact methods cannot be used [53]. Approximation methods present a trade-off: they do not guarantee finding an optimal solution, but they can be executed using only a few resources in terms of memory and CPU time when compared with exact methods. The results obtained from approximation methods can be competitive when compared to the optimal solution, making them a field worth studying.

Approximation methods for scheduling problems include rule-based approaches and heuristics, terms which have been used interchangeably in the past [86]. However, these concepts have some slight differences [102]. The rule-based approach is a solution technique based on expert systems for resource allocation, where lectures are considered activities and periods are considered resources to be assigned to activities. The RAPS rule-based language proposed by Solotorevsky et al. [108] is the most common basis of this kind of approach. In contrast, heuristics are defined simply as rules of thumb for resource allocation.

Constraint programming is another relevant solution technique that has proved effective for CB-CTT [25]. In this approach, the timetabling problem is framed as a constraint satisfaction problem where the lectures are defined as the variables and the time slots are the domains. Then, the exploration of the relevant search space can be carried out by any search method available as demonstrated by [122], such as depth-first search, iterative broadening, large neighborhood search and so on, in conjunction with common constraint satisfaction heuristics for ordering the events.

2.1.4 Constructive Heuristics for Solving the CB-CTT

This investigation explores systematic methods for combining the use of heuristics, or their components, as a way to improve the search process. In the following lines, we describe the set of heuristics and how they work on an example case. The choice of heuristics for this investigation takes some ideas from related works [77, 94]. Thus, we have only considered constructive heuristics, which means that they build a solution from scratch by making one decision at a time. In other words, at each step of the search, they decide which lecture to schedule next, among all the available options. The heuristics selected for this work are described as follows:

Largest Enrollment (LE) The events are ordered by the number of students attending each course, descending. The intuition in this heuristic is that events with a large number of students should be given the highest priority to be scheduled.

Largest Degree (LD) The events are ordered by the number of conflicts with other courses, descending. The intuition in this heuristic is to assess the potential difficulty of scheduling each event, and give the highest priority to the hardest ones, while there still is a high time slot availability.

Largest Weighted Degree (LWD) This heuristic is similar to the largest degree heuristic, but the conflicts between events are weighted by the number of students enrolled in them. Events with a larger weighted degree are given priority.


Least Available Periods (APD) The events are sorted in increasing order of the ratio of the number of available time slots to the number of unassigned lectures. The idea is to assign first the courses with the most lectures and the lowest availability.

Least Available Positions (APS) Works similarly to APD, but computes the availability of a course from the number of available positions in the timetable instead of the number of time slots. As with APD, this heuristic assigns a higher priority to courses with a large number of lectures and low availability.

Figure 2.1: Example data for the heuristic ordering of five courses with six timeslots and three rooms.

In all cases, these heuristics break ties by using the index of the lecture that corresponds to the course as presented in the problem formulation in a first-come-first-served fashion.

Figure 2.1 illustrates how these heuristics prioritize an example with five events, six time slots, and three rooms. The following lines present a further explanation on the calculations of the heuristics described above and how they make their decisions:

• LE would prioritize the Data Science course, simply because it is the one with the largest number of students.

• The priority given by LD is also very simple. This heuristic would prioritize the lectures of the ML course because it is the one with the largest number of conflicts.

• The evaluation of the LWD heuristic results in a tie between Algorithms and Data Science, both resulting in 50 after taking into account the number of conflicts and the enrolled students. The heuristic favors Algorithms in this tie since it appears earlier in the list of courses.

• Both APD and APS assign the highest priority to ML, given the low number of time slots and positions available for the course, respectively. This holds even though the course has a relatively low number of lectures pending assignment.

The example above presents the sequence of courses to be scheduled given the priority assigned by each heuristic. It is worth noting that the heuristics work by prioritizing courses, so during the process of assigning a period to an event or lecture, possible changes in the evaluation of conflicts, available time slots, or available positions would not be taken into account until a new heuristic is applied.
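As a minimal sketch, the five orderings can be expressed as sort keys over the courses. The course data below is hypothetical and does not reproduce the values of Figure 2.1; the field names, the `rank` helper, and the weighted-degree formula (number of conflicts multiplied by the number of enrolled students) are assumptions made only for illustration. Ties are broken by the course index, as stated above.

```python
from dataclasses import dataclass

@dataclass
class Course:
    name: str
    students: int        # enrolled students
    conflicts: int       # number of conflicting courses
    unassigned: int      # lectures still to be scheduled
    free_slots: int      # time slots still available to the course
    free_positions: int  # (time slot, room) pairs still available

# Hypothetical data, not the values of Figure 2.1.
COURSES = [
    Course("Algorithms", 25, 2, 3, 5, 12),
    Course("Data Science", 50, 1, 2, 4, 10),
    Course("ML", 15, 3, 2, 2, 4),
    Course("Databases", 30, 1, 1, 6, 15),
    Course("Networks", 10, 2, 2, 5, 11),
]

# Each heuristic maps a course to a key; smaller keys are scheduled first.
HEURISTICS = {
    "LE": lambda c: -c.students,
    "LD": lambda c: -c.conflicts,
    "LWD": lambda c: -(c.conflicts * c.students),  # assumed weighting
    "APD": lambda c: c.free_slots / c.unassigned,
    "APS": lambda c: c.free_positions / c.unassigned,
}

def rank(courses, heuristic):
    """Return course names by priority; ties fall back to the course index."""
    key = HEURISTICS[heuristic]
    order = sorted(range(len(courses)), key=lambda i: (key(courses[i]), i))
    return [courses[i].name for i in order]

for h in HEURISTICS:
    print(h, rank(COURSES, h))
```

With this made-up data, Algorithms and Data Science tie under the assumed LWD key, and the index-based tie-break places Algorithms first.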


2.1.5 Other Educational Timetabling Problems

This section presents formulations of the educational timetabling problem that have attracted the attention of researchers in recent years. However, the focus of this work is exclusively the CB-CTT. Therefore, although other problems may be very similar and have real-world applications, they are considered beyond the scope of this research work.

Examination Timetabling In examination timetabling, room assignments offer more flexibility when compared to CB-CTT. A room can be shared between two exams, or an exam can be split into two or more rooms if one is too small to accommodate the number of students, since the lecturer does not have to be present at all times. Another key difference is that courses must be scheduled within a week that repeats until the end of the term, whereas examinations usually take place at the end of the term, so the available time slots offer more flexibility when compared to course timetabling.

Notable work on examination timetabling includes a fast simulated annealing procedure, which obtains competitive results in a relatively small amount of computational time [71] or a mixed-integer programming method for obtaining lower bounds [10].

School Timetabling Contrary to university timetabling, students of a school are aggregated in classes, where they share the same courses at the same time. Hence, here the assignment of rooms is reduced to a minor concern. Another difference is that schools have typically very dense teaching schedules, so curriculum compactness is also of little concern. A recent survey of the state-of-the-art on school timetabling is presented in [113].

Post Enrolment Course Timetabling PE-CTT is very similar to university timetabling, with the key difference that in this problem the timetable is produced after the student enrolment on courses has taken place. The idea behind this formulation is to maximize the number of options for students when choosing their own timetables. PE-CTT continues to attract the interest of researchers due to its high degree of complexity [51, 115, 121].

While there exist some key differences that distinguish educational timetabling problems from one another, they all share a largely similar structure. Moreover, a procedure able to find a solution for one of them could also be used to find solutions to other timetabling problems, with minor modifications.

2.2 Metaheuristics and Machine Learning Methods

The literature defines a metaheuristic as an iterative procedure that guides a subordinate heuristic by exploring and exploiting the search space in an intelligent manner, with the goal of finding optimal or near-optimal solutions [101]. While metaheuristics can be considered approximation methods, the literature has emphasized their relevance for the CB-CTT, which is why metaheuristics are presented in a different classification for this work. This section presents some metaheuristic methods that have been applied to solve the CB-CTT.


2.2.1 Tabu Search

Tabu search (TS) is a method based on a local search procedure that employs a set of moves for transforming one solution state into another until a stopping criterion is met [50]. Local search methods are prone to get stuck in a local optimum, unable to reach other regions of the search space. TS implements a neighborhood structure for each solution, which is explored during the search process. Thus, the method can move to an improved solution in the neighborhood, guided by tabu restrictions and aspiration criteria.

In simpler terms, TS not only moves towards higher-quality solutions (i.e. hill-climbing), but also looks at the neighborhoods of the solutions from the moves produced by the search. By also tracking solutions of poorer quality (or bad moves), this method avoids getting trapped in a specific region of the search space. Therefore, the effectiveness of TS is related to the neighborhood structure and the criteria to evaluate the moves made during the search process [63].
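The mechanism can be illustrated with a minimal sketch on a toy problem. The cost table, the function names, and the choice of storing whole solutions (rather than moves) in the tabu list are assumptions made for brevity; this is not the implementation of any of the cited works.

```python
from collections import deque

def tabu_search(cost, neighbors, start, tenure=3, max_iters=50):
    """Minimize `cost` from `start`; recently visited solutions are tabu."""
    current = best = start
    tabu = deque([start], maxlen=tenure)  # short-term memory
    for _ in range(max_iters):
        candidates = [
            n for n in neighbors(current)
            # Aspiration criterion: a tabu solution is allowed if it beats the best.
            if n not in tabu or cost(n) < cost(best)
        ]
        if not candidates:
            break
        # Move to the best admissible neighbor, even if it is worse.
        current = min(candidates, key=cost)
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best

# Toy landscape over the integers 0..9 with a local optimum at x = 2
# and the global optimum at x = 7.
COSTS = [5, 3, 2, 4, 6, 4, 1, 0, 2, 5]

def cost(x):
    return COSTS[x]

def neighbors(x):
    return [n for n in (x - 1, x + 1) if 0 <= n < len(COSTS)]

print(tabu_search(cost, neighbors, start=0))
```

On this toy landscape, a plain hill climber starting at 0 stops at the local optimum x = 2, whereas the tabu restrictions force the search through worse solutions until it reaches x = 7.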

The CB-CTT has also been solved using tabu search. Lü and Hao [77] proposed an adaptive tabu search algorithm with a double Kempe chain neighborhood structure. Their method starts by generating an initial solution using a greedy heuristic, followed by an adaptive stage which combines intensification and diversification, aiming to reduce the soft constraint violations while maintaining the satisfaction of hard constraints. Amaral and Cardal [7] proposed a solution using a rule-based approach to handle the elements in the tabu list and a variation of the compromise ratio to rank the solutions in a neighborhood.

2.2.2 Simulated Annealing

First proposed by Kirkpatrick et al. [66], simulated annealing (SA) is a metaheuristic that assigns a probability to accept worse solutions as a strategy to avoid getting trapped in local optima. The inspiration for naming the algorithm comes from the annealing technique in metallurgy, which involves heating and a controlled cooling of a material to reduce its defects. As the search process advances, the method's probability of accepting low-quality solutions decreases. Hence, the method increasingly selects solutions that improve on the quality of the current solution.

The probability of transitioning between solutions, as described by Equation 2.6, depends largely on a temperature parameter T , which decreases during the search process. At the start of the process, the temperature and the probability of accepting low-quality solutions are high. However, as they progressively decrease, the algorithm converges into a simple iterative improvement algorithm (i.e. hill climbing).

\[
P = \begin{cases}
1 & \text{if } \Delta c \le 0, \\
e^{-\Delta c / T} & \text{if } \Delta c > 0
\end{cases}
\tag{2.6}
\]
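The acceptance rule of Equation 2.6 translates directly into code. The following is a minimal sketch; the function names and the geometric cooling schedule are assumptions, since the text only states that the temperature decreases during the search.

```python
import math
import random

def acceptance_probability(delta_c, temperature):
    """Probability of accepting a move with cost change delta_c (Equation 2.6)."""
    if delta_c <= 0:
        return 1.0  # improving or equal moves are always accepted
    return math.exp(-delta_c / temperature)

def accept(delta_c, temperature, rng=random):
    """Accept or reject a candidate move at random."""
    return rng.random() < acceptance_probability(delta_c, temperature)

# Geometric cooling is an assumption; any decreasing schedule fits the text.
temperature = 10.0
for step in range(5):
    p = acceptance_probability(2, temperature)
    print(f"T={temperature:.2f}  P(accept delta_c=2)={p:.3f}")
    temperature *= 0.5
```

As the temperature drops, the probability of accepting the same worsening move shrinks towards zero, which is the convergence to iterative improvement described above.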

Applications of SA for solving the CB-CTT in the literature include the work of Bellio et al. [14], who designed and implemented a statistically-principled methodology for the parameter tuning procedure. The authors modeled the relationship between the search parameters and the instance features, allowing the algorithm to generalize to previously unseen instances of the problem. Tarawneh et al. [5] proposed a variation of SA where unaccepted solutions are stored to be used in later stages when the search process gets stuck in a local optimum.


2.2.3 Genetic Algorithms

First popularized by Holland [58], genetic algorithms (GA) are metaheuristics based on the mechanics of natural selection. GAs start with an initial population, whose individuals are evolved through biologically inspired operators such as selection, crossover, and mutation [52]. The overall fitness is improved by following the Darwinian principle of the survival of the fittest. Typically, a GA requires a genetic representation of the problem domain and a fitness function to evaluate the solutions.
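A minimal sketch of this loop is shown below on bit strings. The operator choices (binary tournament selection, one-point crossover, bit-flip mutation, and elitism), the parameter values, and the toy fitness function are assumptions for illustration and are not taken from the cited works.

```python
import random

def genetic_algorithm(fitness, length, pop_size=20, generations=40,
                      mutation_rate=0.05, seed=0):
    """Evolve bit strings toward higher `fitness` (a minimal, elitist GA)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]  # elitism: keep the best individual
        while len(next_gen) < pop_size:
            # Selection: binary tournament.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # Crossover: one-point.
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, history

# Toy fitness: the number of ones in the chromosome.
best, history = genetic_algorithm(sum, length=16)
print(sum(best), history[0], history[-1])
```

Because the best individual is copied unchanged into the next generation, the recorded best fitness never decreases from one generation to the next.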

GAs have been used for the CB-CTT. Abdullah and Tarubieh [1] presented a hybrid genetic algorithm that uses a tabu list to control the selection of neighborhoods. The crossover represents a period exchange while preserving the feasibility of the solution, while the mutation is used to swap events, allowing diversification. Akkan and Gülcü [6] tackled the CB-CTT as a bi-objective optimization problem, presenting a hybridization of the standard GA approach by using simulated annealing. The authors attempted to find good solutions in terms of penalty and a custom robustness measure.

2.2.4 Neural Networks

In recent years, there has been an increasing interest in neural networks (NN) [80, 87]. NNs are machine learning methods able to leverage large amounts of data for clustering, regression, and classification [85, 103]. The ability to correctly generalize to unseen examples, even with imbalanced or small amounts of data given as input, has made neural networks a popular field of study [18, 49].

An artificial neural network is composed of artificial neurons, which act as simple processing units. The network consists of interconnected neurons, where the output of one neuron is the input of another [37]. The neurons are activated through weighted connections established during the training phase. Hence, connections with a low weight represent low importance to the neuron that receives them as input. The neurons are organized into layers, where the middle layers are known as hidden ones. A standard neural network will have every neuron on a layer connected to the neurons of the adjacent layer [72].

Neural networks were used to solve the school timetabling problem by Carrasco and Pato [27]. In their work, they compared discrete and continuous neural network approaches, finding that the discrete NN had a better performance on a set of five hard problem instances. Similarly, Smith et al. [105] used modified discrete Hopfield neural networks to solve the university timetabling problem. Their work obtained competitive results compared against metaheuristic algorithms like tabu search or simulated annealing.

2.2.5 Meta-Learning

Meta-learning is a machine learning method based on the idea of learning to learn. This method learns by systematically observing the results produced by applying different algorithms to problem instances, rather than from the traditional features used to describe the instances themselves [70]. Meta-learning methods have gained much popularity due to their ability to generalize across different domains by benefiting from prior experience with other tasks [3, 98]. Meta-learning approaches have shown significant generalization capabilities even from small amounts of data [42].

A common application of meta-learning is the algorithm selection problem (ASP) [99]. In the ASP, the goal is to relate problem instances and the space of algorithms in such a way as to maximize a performance measure. The meta-learning approach to ASP is based on the use of meta-features, which can be divided into two groups. A group of basic meta-features includes general information about problem instances, such as the number of courses or the room occupation percentage in the domain of curriculum-based course timetabling. The second group of meta-features, for which we will borrow the denomination of landmarking meta-features, is used to describe the performance of simple heuristics when applied to different instances, with no concern about the features that describe the given problem instance. Hence, the combination of these groups of meta-features enables a meta-learner, usually implemented in the form of a classifier, to select the best algorithm for new instances of the problem.
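To make the idea concrete, the sketch below uses a 1-nearest-neighbor classifier as the meta-learner. The training set, the three meta-features (two basic ones and one landmarking score), and the algorithm labels are all hypothetical; any classifier could take the place of the nearest-neighbor rule.

```python
import math

# Hypothetical training set: (meta-features, best algorithm observed).
# Meta-features: (number of courses, room occupation, landmark score).
TRAINING = [
    ((30, 0.45, 0.80), "LD"),
    ((35, 0.50, 0.75), "LD"),
    ((80, 0.85, 0.30), "APD"),
    ((90, 0.90, 0.25), "APD"),
    ((60, 0.60, 0.55), "LWD"),
]

def normalize(rows):
    """Build a scaler mapping every meta-feature to [0, 1]."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]

    def scale(row):
        return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                     for v, lo, hi in zip(row, lows, highs))
    return scale

def select_algorithm(meta_features, training=TRAINING):
    """1-nearest-neighbor meta-learner: reuse the choice of the closest instance."""
    scale = normalize([features for features, _ in training])
    target = scale(meta_features)
    _, label = min(training,
                   key=lambda item: math.dist(scale(item[0]), target))
    return label

print(select_algorithm((85, 0.88, 0.28)))
```

Scaling the meta-features before measuring distances keeps the number of courses, which has the largest numeric range, from dominating the comparison.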

In combinatorial optimization, meta-learning techniques have been applied by many researchers to solve problems from different domains. To mention a few examples, Kanda et al. [62] studied a meta-learning method for the Traveling Salesman Problem (TSP). In their work, they recommend the best meta-heuristic for new problem instances without having to execute them. They showed that by predicting a ranking of meta-heuristics it was possible to obtain better solutions when compared to other simpler selection methods. Gutierrez et al. [54] present a meta-learning approach for the Vehicle Routing Problem with Time Windows (VRPTW). In their research, they considered two groups of meta-features to train a multi-layer perceptron to select the best meta-heuristic. They outperformed individual solvers in a reasonable amount of time. Smith-Miles [107] used meta-learning techniques for the Quadratic Assignment Problem (QAP), considering three meta-heuristic algorithms. The study was posed as a classification problem, using a neural network as a meta-learner to predict which algorithm would perform best. The results encourage the use of meta-learning for combinatorial optimization, even when there is limited data to assess the performance of meta-heuristic algorithms.

2.3 Hyper-Heuristics

Hyper-heuristics emerged as a method to build solution systems that can solve not only individual combinatorial optimization problems but a wide range of problem domains. In other words, hyper-heuristics aim to raise the level of generality for the operation of optimization systems. The term hyper-heuristics was coined by Cowling et al. [32], who described them as “heuristics to choose heuristics”. Burke et al. [19] proposed a hyper-heuristic method for automated scheduling which is not restricted to one problem.

The literature on hyper-heuristics lacks clarity regarding a common formal definition of the concept. For this work, we accept the formal definition presented by Pillay and Qu [95]. Hence, a hyper-heuristic is defined as a search method for a given problem p. To find a solution state S, the hyper-heuristic explores a set of heuristics. Heuristic configurations are explored to move from a problem state s to the next s′, stopping when the solution state S for problem p is reached.
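The definition above can be read as a simple loop over problem states. The following sketch is one possible reading, not the formalization of Pillay and Qu; the toy problem, the two low-level heuristics, and the selection rule are assumptions introduced only to make the loop runnable.

```python
def hyper_heuristic(state, heuristics, select, is_solution):
    """Apply selected heuristics until a solution state is reached.

    state:       the current problem state s
    heuristics:  mapping from a name to a function s -> s'
    select:      high-level strategy mapping s to a heuristic name
    is_solution: predicate that recognizes the solution state S
    """
    trace = []
    while not is_solution(state):
        name = select(state)
        state = heuristics[name](state)  # move from s to s'
        trace.append(name)
    return state, trace

# Toy problem: a state is (pending lecture sizes, scheduled lecture sizes).
def _take(state, pick):
    pending, done = state
    rest = list(pending)
    rest.remove(pick)
    return tuple(rest), done + (pick,)

HEURISTICS = {
    "largest": lambda s: _take(s, max(s[0])),
    "smallest": lambda s: _take(s, min(s[0])),
}

def select(state):
    # Hypothetical rule: use "largest" while many lectures are pending.
    return "largest" if len(state[0]) > 2 else "smallest"

final, trace = hyper_heuristic(((30, 10, 50, 20), ()), HEURISTICS,
                               select, lambda s: not s[0])
print(final, trace)
```

The high-level strategy never manipulates the solution directly; it only decides which low-level heuristic produces the next state.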

Burke et al. [20] present a comprehensive survey of the state of the art in hyper-heuristics. The authors extended Cowling’s definition of hyper-heuristics to a “search method or learning mechanism for selecting or generating heuristics to solve computational search problems”. Their work presents a commonly accepted classification of hyper-heuristics, distinguishing them by the learning methodology they use and the nature of the heuristic space, as shown in Figure 2.2. When distinguished by their learning strategy, hyper-heuristics are classified as follows:

Figure 2.2: Hyper-heuristic classification by learning methodology and the nature of the heuristic space as proposed by Burke et al. [20].

Online learning In online learning, the learning takes place in parallel, as the algorithm is solving an instance of the problem. Hence, the problem features are used by the high-level strategy to determine the next heuristic to be applied. Since the algorithm makes decisions as the values of the problem features change, there is no distinction between the training and testing phases that are common in machine learning methods.

Offline learning Contrary to online learning, this type of hyper-heuristic learns from a set of training instances. In offline learning, the hyper-heuristic collects knowledge in the form of rules during training, which should generalize well so that they can later be applied to solve unseen instances from a related domain. As in online learning, the problem features are mapped to the heuristics.

No learning Hyper-heuristics that do not use feedback from the search process belong to this classification.

Then, according to the nature of the heuristic space, hyper-heuristics can be classified as follows:

Selection hyper-heuristics The basis of selection hyper-heuristics is to choose among existing heuristics. The framework is provided with a set of existing heuristics, and the algorithm must select the most suitable to apply, depending on a given problem state. The set of heuristics from which the hyper-heuristic framework can choose is generally problem-specific.


Generation hyper-heuristics In generation hyper-heuristics, the idea is to create new heuristics from the components of existing ones. In addition to producing a solution, this hyper-heuristic framework is also able to produce new heuristics as outputs, which can potentially be reused to solve other problems.

Furthermore, the literature considers a second level in this dimension. When considering the search paradigms, hyper-heuristics can be classified as follows:

Constructive hyper-heuristics Starting with an empty solution, constructive hyper-heuristics build a solution incrementally. A sequence of heuristics is applied until the final solution state has been reached. There is an interesting challenge in this paradigm in the form of associating the partial states of the solution with the most adequate heuristic to apply.

Perturbative hyper-heuristics Contrary to constructive methods, perturbative hyper-heuristics start with a complete candidate solution. The perturbative framework then modifies one or more components of the solution to move from a solution state to another. The process is repeated until a final solution state has been reached.

2.3.1 Hyper-Heuristics for Educational Timetabling

The literature contains a wide range of examples of hyper-heuristics used to solve educational timetabling problems, particularly the CB-CTT. Qu and Burke evaluated different local search algorithms for use in a selection hyper-heuristic [96]. The hybrid method was employed to search the space of the heuristic combination and the space of the solutions. This hybrid method was tested on both course and examination timetabling problems, achieving competitive results. Soria-Alcaraz et al. proposed an iterated local search hyper-heuristic, alternating between a perturbation and improvement phase [110]. These phases employ offline and online learning, respectively, as feedback from the search process.

Burke et al. proposed a multi-objective hyper-heuristic that employs tabu search for searching in the space of perturbation heuristics [24]. Then, each heuristic is selected to optimize a specific objective during each iteration of the search. Kalender et al. proposed a selection hyper-heuristic employing greedy gradient for heuristic selection and simulated annealing (SA) for move acceptance to guide the search process [60, 61].

Hyper-heuristics based on evolutionary algorithms (EA) are also a popular approach for educational timetabling. Terashima-Marin et al. used a non-direct chromosome representation based on evolving the configuration of constraint satisfaction methods for examination problems [114]. Pillay et al. [91, 92] also implemented an EA hyper-heuristic to search in the space of heuristic combinations using three different chromosome representations, which produced competitive results when compared against those of hyper-heuristic methods based on other methodologies.

The works cited in this section show the feasibility of applying hyper-heuristics to the CB-CTT and similar problem domains, and as a general solution approach for educational timetabling. Moreover, evidence of hyper-heuristics successfully generalizing over problem domains [79, 81] makes this an interesting topic of research for the advancement of educational timetabling.


2.4 Cross-Industry Standard Process for Data Mining

Extracting information from data has occurred for centuries. Manual methods for identifying patterns in data include Bayes’ theorem in the 1700s [33] and regression analysis dating back to the 19th century [112]. However, with the power of computer technology dramatically increasing the capacity for data collection and storage, manual methods gave way to more sophisticated techniques like Knowledge Discovery in Databases (KDD) [41]. KDD introduced data mining as the process of extracting information from large sets of data, usually in the form of patterns, anomalies and correlations for predicting outcomes. Cross-Industry Standard Process for Data Mining (CRISP-DM) is an open standard that describes the steps for data mining [104].

Figure 2.3: Example of the steps described by the Cross-Industry Standard Process for Data Mining (CRISP-DM). The outer circle represents the cyclic nature of the data analysis process.

The CRISP-DM methodology provides a general framework that can work with any project. This methodology for data mining is structured in six sequential phases where the output of one becomes the input of the next. However, the stages in CRISP-DM are flexible in such a way that some phases allow partial or complete revision of previous stages as shown in Figure 2.3. The six phases of CRISP-DM are described as follows:


Business Understanding The initial phase of the CRISP-DM methodology focuses on understanding the goals of the project. Once the objectives have been established clearly, the data analysis process can begin in the context of a particular model.

Data Understanding The data understanding phase starts with an initial data collection, followed by strategies to get more familiar with the data, such as Exploratory Data Analysis (EDA) [116]. The preliminary knowledge extracted by applying EDA can be used to identify opportunities with the data.

Data Preparation The goal of the data preparation phase is to process the available raw data to produce the datasets used for modeling. Common tasks include data cleaning [97], feature selection [65] and feature transformation [68].

Modeling In the modeling phase, machine learning algorithms are implemented and applied to the input data. Complementary tasks like parameter tuning [11] usually take place to improve the performance of the models. Since the implementation of machine learning algorithms may have specific requirements regarding input data, it is very common to move back and forth with the data preparation phase.

Evaluation The goal of the evaluation phase is to determine if the produced models satisfy the requirements established in the business understanding phase. The output of this phase is an assessment of the overall quality of the models and the decision on whether to deploy them into production.

Deployment The last phase of the CRISP-DM methodology focuses on deploying the created models into production. However, note that the implementation of the models presented in this thesis is beyond the scope of this work.

Existing research recognizes the feasibility of solving combinatorial optimization problems in the light of the CRISP-DM methodology. Makrymanolakis et al. proposed an algorithm that combined various local search techniques, guided by CRISP-DM, for solving the Permutation Flow Shop Scheduling Problem (PFSSP) [78]. The authors reported that a large number of systematic executions were required to extract knowledge from the available datasets, in contrast to more empirical random executions. While initially expensive in terms of computational power, the data mining procedure proved useful for parameter control. Haeri et al. presented a hybrid data mining approach based on multi-objective particle swarm optimization (MOPSO) for the Travelling Salesman Problem (TSP) [56]. Data mining was used to extract rules based on the most efficient solutions of MOPSO. The extracted rules were then applied to solve new TSP instances, improving on the quality of the solutions obtained by MOPSO in a subset of the instances.

2.5 Summary

This chapter described the most relevant topics related to this work. The theoretical framework focused on the Curriculum-Based Timetabling problem and strategies to solve the problem, including metaheuristics, hyper-heuristics, and other machine learning methods. The CB-CTT represents a significant field of study, both for its NP-hard nature and the recurrent task of building timetables by administrative personnel in universities.

The concept of hyper-heuristic introduced in this chapter is quite relevant for this investigation. The next chapters will focus on hyper-heuristics as high-level heuristics that solve a problem by selecting low-level heuristics. These approaches have been widely used to solve combinatorial optimization problems from a wide range of domains [36]. However, this dissertation focuses on hyper-heuristics for the CB-CTT.

The next chapter presents the methodology followed throughout this research work to develop two hyper-heuristic models.


Chapter 3 Methodology: Hyper-Heuristic Generation Through CRISP-DM

This chapter introduces a hyper-heuristic model generated by following the stages of the Cross-Industry Standard Process for Data Mining (CRISP-DM). The training process based on a data-driven approach provides a model to generalize the patterns of permutations of existing heuristics so that specific combinations increase the performance on a wider set of problem instances. These steps of CRISP-DM are used to generate selection hyper-heuristics.

3.1 Business Understanding

This section presents the objective of the CRISP-DM process, namely finding suitable rules and patterns in instances of the CB-CTT. In particular, we want to define in a very concise manner the timetabling problem we are going to study, the characteristics of the problem we consider, and the ultimate goal we aim to achieve.

3.1.1 The Curriculum-Based Course Timetabling Problem

Scheduling is a very important activity for many aspects of life, as timetables are necessary in many different fields like education, healthcare, transportation, entertainment, and so on. This research focuses on educational timetabling, as this is a recurring activity for the administration of educational institutions that still does not have a trivial solution, being considered among the NP-complete problems.

A widely accepted breakdown of the educational timetabling problem is university course timetabling, school timetabling, and examination timetabling. For this research, we decided to focus on the university course timetabling as it poses the interesting challenge of designing different arrangements of lectures that allow students to choose their own personal schedule, which might be different from their peers. Contrary to this, in school timetabling the students are aggregated in classes, and classes are assigned to specific rooms. Another interesting challenge presented by the university course timetabling is that courses are normally scheduled within a week, repeating through the semester, while in examination timetabling the events take place only once at the end of the semester.


University course timetabling can be broken down into post enrolment course timetabling and curriculum-based course timetabling. Both of them present different challenges; the main difference between them is that in the former the timetable is produced after the student enrolment has taken place, while in the latter the timetable is created based on curricula published by the university. In this study, we decided to tackle the curriculum-based course timetabling, as it corresponds to the way timetables are produced in Tecnologico de Monterrey.

The curriculum-based course timetabling problem (CB-CTT) deals with the allocation of courses, each formed by lectures, to timeslots, which are defined by the number of working days and the number of periods per day, subject to some constraints. The distribution of these entities may vary among different instances of the problem, making it difficult to tell how easy or difficult it will be to find a good solution. For example, an instance with a large number of lectures and a small number of timeslots might look very constrained, but can be easy to solve if there is a large number of rooms.

These characteristics of an instance can also vary during the process of allocating lectures into timeslots. For example, it might be good to schedule early the lectures that can produce a lot of conflicts whether by sharing students or professors, rather than later when the number of available timeslots has been reduced.

3.1.2 Justification of the Hyper-Heuristic Approach

Previous work on university timetabling problems has demonstrated that combining heuristics yields good results, including research on the CB-CTT [93]. Therefore, the question arises: how can we find a sequence of heuristics that performs better than others on CB-CTT instances? Experiments were conducted on synthetically generated CB-CTT instances and then compared against random methods. The results encouraged the idea that data-driven hyper-heuristic models can obtain competitive results when compared with the traditional heuristics run in isolation. A set of experiments was created to test the best hyper-heuristic model against heuristics commonly used on scheduling problems.

In the next section, we identify the properties that can be used to characterize an instance and how they vary during the solution process, with the goal of deciding how easy or difficult it will be to find a good solution and, in particular, how suitable a given heuristic is for different instances of the problem.

3.2 Data Understanding

This section presents the data we consider for understanding the instances of the CB-CTT. More specifically, this section is divided into two branches: static and dynamic characteristics. The distinction is made to separate the characteristics that can be used to describe the initial state of an instance from the features that are better suited to describe specific states of the problem throughout the construction phase.

3.2.1 Static Features

In this study, we call static features the set of features that describe individual instances of the problem and do not change throughout the construction process of assigning lectures. These features are common in the literature [17] and include features related to the size of the problem, such as the number of lectures, courses, and curricula. They also include features that describe how many positions an instance has to accommodate lectures: the number of available rooms, the number of working days in the week, and the number of periods per day.

Other static features can be computed from the ones mentioned previously with the idea of establishing the relationship between the individual features. Some of these calculated static features include the total available positions, the average conflicts among lectures, the average teacher availability, and the average room occupation.

The total available positions P represents the number of spaces where a lecture can be allocated. It is calculated by multiplying the number of rooms r by the number of working days in the week d and the number of periods per day p. Formally, the positions can be calculated as in Equation 3.1.

$$P = rpd \tag{3.1}$$

To calculate the average percentage of conflicts among lectures, let k_c denote the number of conflicts that a lecture of course c has with other lectures. A conflict between a pair of lectures occurs if they cannot be scheduled in the same timeslot because they share the same teacher, course, or curriculum. Let C denote the set of all courses and l_c the number of lectures of course c; then the total number of lectures L is the sum of l_c over every course. Therefore, the average conflict percentage Co can be computed as in Equation 3.2.

$$Co = \frac{\sum_{c \in C}^{n} (k_c l_c)}{L} \tag{3.2}$$
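As a hedged illustration of Equation 3.2, the following sketch computes Co as the lecture-weighted average of the per-course conflict counts. The function name, the dictionary-based inputs, and the sample values are assumptions made for this example only.

```python
def average_conflicts(k, l):
    """Lecture-weighted average of conflicts, following Equation 3.2.

    k: dict mapping each course c to k_c, the conflicts of one of its lectures.
    l: dict mapping each course c to l_c, its number of lectures.
    Both dictionaries are a hypothetical input format for this sketch.
    """
    total_lectures = sum(l.values())  # L, the sum of l_c over every course
    if total_lectures == 0:
        raise ValueError("the instance has no lectures")
    return sum(k[c] * l[c] for c in l) / total_lectures


# Illustrative values: (4*3 + 2*2 + 0*5) / (3 + 2 + 5) = 16 / 10
k = {"c1": 4, "c2": 2, "c3": 0}
l = {"c1": 3, "c2": 2, "c3": 5}
co = average_conflicts(k, l)
```

Weighting by l_c means that a course with many lectures contributes proportionally more to the average than a course with a single lecture.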

Let u_c be the number of unavailable periods for course c, that is, the periods in which the lectures of course c must not be scheduled. The availability a_c of the lectures from a single course can then be computed as in Equation 3.3, and the average teacher availability A over every lecture as in Equation 3.4.

$$a_c = 1 - \frac{u_c}{pd} \tag{3.3}$$

$$A = \frac{\sum_{c \in C}^{n} (a_c l_c)}{L} \tag{3.4}$$
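Equations 3.3 and 3.4 can be sketched in the same way. The function names, the dictionary inputs, and the sample values below are assumptions for illustration, not the implementation used in this work.

```python
def course_availability(u_c, p, d):
    """Fraction of timeslots available to a course, following Equation 3.3.

    u_c: number of unavailable periods for the course.
    p: periods per day, d: working days in the week.
    """
    return 1 - u_c / (p * d)


def average_availability(u, l, p, d):
    """Lecture-weighted average availability, following Equation 3.4.

    u: dict mapping each course c to u_c.
    l: dict mapping each course c to l_c, its number of lectures.
    Both dictionaries are a hypothetical input format for this sketch.
    """
    total_lectures = sum(l.values())  # L, the sum of l_c over every course
    weighted = sum(course_availability(u[c], p, d) * l[c] for c in l)
    return weighted / total_lectures


# Illustrative values with p = 6 and d = 5 (30 timeslots):
# a_c1 = 0.8, a_c2 = 1.0, a_c3 = 0.5, so A = (2.4 + 2.0 + 2.5) / 10
u = {"c1": 6, "c2": 0, "c3": 15}
l = {"c1": 3, "c2": 2, "c3": 5}
avg_a = average_availability(u, l, p=6, d=5)
```

A value of A close to 1 indicates that lectures are, on average, free to be placed in almost any timeslot, whereas lower values indicate tighter availability constraints.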

Finally, the average room occupation O can be computed by dividing the total number of lectures l by the total number of positions computed by Equation 3.1. Formally, O can be computed as:

$$O = \frac{l}{rpd} \tag{3.5}$$
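The following sketch puts Equations 3.1 and 3.5 together. It also illustrates, with made-up numbers, the earlier observation that an instance with many lectures and few timeslots becomes much less constrained when more rooms are available. Function and parameter names are assumptions for this example.

```python
def total_positions(rooms, periods_per_day, days):
    """Total available positions P = r * p * d (Equation 3.1)."""
    return rooms * periods_per_day * days


def room_occupation(lectures, rooms, periods_per_day, days):
    """Average room occupation O = l / (r * p * d) (Equation 3.5)."""
    return lectures / total_positions(rooms, periods_per_day, days)


# Illustrative instance: 200 lectures, 4 periods per day, 5 working days
# (20 timeslots). Only the number of rooms changes between the two cases.
tight = room_occupation(lectures=200, rooms=10, periods_per_day=4, days=5)
loose = room_occupation(lectures=200, rooms=25, periods_per_day=4, days=5)
```

With 10 rooms every position must be used (O = 1.0), while with 25 rooms only 40% of the positions are needed (O = 0.4), even though the number of lectures and timeslots is unchanged.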
