• No se han encontrado resultados

CAPÍTULO I. EL CONTROL TELEMÁTICO Y SU APLICACIÓN EN EL

4. La aplicación del control telemático en el ámbito comparado

4.2. Ámbitos de aplicación del control telemático

4.2.9. Síntesis

To generate the association rules, the minimum support used was 0.1, i.e. the rule needs to have occurred at least 10% of the time to be considered. Moreover, the minimum confidence value assumed was 0.9, i.e. we would want to be more than 90% confident that the rule is applicable. These values were chosen to ensure that the rules are frequent and interesting enough to be considered in an educational setting [286]. Based on the aforementioned settings, the following rules have been generated:

• Engagement level=H & Quiz1≥90=⇒Course Grade≥ 90:

The rule had a support = 0.2, i.e. 20% of all students were classified as highly engaged, had a grade higher than 90 in their quiz, and completed the course with a grade higher than 90. The rule had a confidence of 1 and a lift of 4.02. This means that all the students who were highly engaged and performed well in their quiz ended up excelling in the course. The high lift value shows that the components are highly correlated. This means that engagement level can indeed be used as a predictor for academic performance.

• Number of Logins≥60 & Quiz1≥80=⇒ Course Grade ≥ 80:

The support of this rule is 0.61, which indicates that around 61% of the students logged into the system more than 60 times. This is an indication of being moder- ately or highly engaged as per the results obtained in [210]. The rule’s confidence

is 1 and the lift is 1.37, indicating that all students who were moderately or highly engaged and performed well in their quiz had a high final course grade. Moreover, the lift value indicates that there is a positive correlation between the antecedent and consequent of the rule.

• Engagement level=M & Quiz1≥70 =⇒ 70≤Course Grade≤90:

This rule had a support of 0.4, confidence of 0.93, and a lift of 1.1. This shows again that there is a correlation between engagement and academic performance. This is because students that were moderately engaged tended to get a moderate grade in the course.

• Engagement level=M =⇒ 70≤Course Grade≤90:

This rule further emphasizes the previous obtained rule. The rule had a support of 0.4 as well. The confidence is 0.9 and the lift is 1.1 too.

In addition to the obtained rules, the average grade of the students is calculated based on their engagement. It can be shown that students with medium engagement had an average final grade of 77 while those that were highly engaged had an average final grade of 79. On the other hand, students that were lowly engaged had an average grade of 61. Moreover, when conducting the t-test at 5% significance level, it is shown that the grades of the low engagement students (target group we are trying to identify) is statistically different than those with medium and high engagement. This cements the hypothesis that student engagement is highly correlated with academic performance even in a blended e-learning environment. Hence, it can be used as a predictor and identifier of students that may need help based on their engagement in the course.

7.7

Conclusion

The field of e-learning has become more popular due to the proliferation of tech- nology. Among the challenges facing e-learning is keeping the students engaged and motivated. This is especially important due to the impact that engagement has on aca- demic performance. To that end, this chapter proposed the use of apriori algorithm to investigate the impact of engagement on students’ academic performance in a blended e-learning environment. Nine engagement metrics that include both interaction-related

as well as effort-related metrics were identified and calculated from the event log dataset available and combined with the grades to form a new dataset composed of eighteen fea- tures. Apriori algorithm was used to generate rules that studied the impact of students’ engagement level on their academic performance. Experimental results showed that there is a high positive correlation between engagement and academic performance. This was shown through the association rules generated which stated with high confidence and lift that students with higher levels of engagement tended to perform better in the course. Due to this positive correlation, engagement can be used as a predictor of the students’ academic performance. This in turn can be used to identify the unengaged students who may need help with the course and work with them to improve their engagement and possibly their performance.

Chapter 8

DNS Typosquatting Domain Detection: A

Data Analytics & Machine Learning Based

Approach

8.1

Introduction

Domain Name System (DNS) protocol, is the standard mechanism for name to IP address resolution [287]. Name servers generally maintain complete name/address information about a particular zone in a file known as the zone file. Every zone needs to provide a primary and a secondary name server to enhance resiliency, redundancy and load balancing. Hence, the DNS protocol is one of the core components in today’s Internet given that it helps users locate servers and mailing hosts [287].

However, DNS suffers from lack of data integrity and origin authentication pro- cesses. This makes it vulnerable to a variety of security concerns and breaches as shown by the various DNS attacks in recent years [289, 290]. For example, the distributed denial of service (DDoS) attack on Dyn in October 2016 was one of the largest attacks of this kind with a reported strength of 1.2 Terabytes/s (Tbps) [291, 292]. This attack managed to bring down a significant portion of America’s Internet Service [291, 292]. Another example is the attack on a Brazilian Bank’s website in which attackers redirected all of the traffic targeted to the bank’s website to their own servers by changing the DNS registrations of all the bank’s domains [293].

Among the many vulnerabilities and security challenges of DNS protocol is the issue of typosquatting. Typosquatting refers to the registration of a domain name that

is extremely similar to that of an existing popular brand (ex: www.google.com and

www.goggle.com,www.paypal.com andwww.paypa1.com,...). The goal is to redirect unsus- pecting users to malicious/suspicious websites by registering confusingly similar domain names that the user might not pay attention to [294]. This is particularly important given that it can be a threat to corporate secrets [295]. Moreover, it can be used to steal information or commit fraud [295].

For practical security and availability reasons it is important that DNS is able to tolerate failures and attacks [290]. That is why a variety of techniques have been pro- posed in the literature to combat and protect against failures and attacks. For example, the DNSSEC protocol, which is a set of security extensions added to the original DNS protocol, has been proposed to address some of the existing vulnerabilities by providing data integrity and origin authentication [296, 297, 298]. However, DNSSEC is still prone to other attacks such as synchronization attacks and amplified denial of service attacks [290, 299]. Therefore, it is crucial that new efficient detection algorithms are developed that can help identify malicious/suspicious queries and protect systems from the various attacks.

In this chapter, a machine learning-based approach is proposed to tackle the ty- posquatting DNS vulnerability. To that end, exploratory data analytics is first used to better understand the behavior and trends observed in eight features that characterize the domain name. Furthermore, a majority voting-based ensemble learning classifier that is based on five traditional supervised machine learning classification algorithms is pro- posed that can detect suspicious domains. Moreover, the observed trends are verified by studying the same features in an unlabeled dataset using unsupervised machine learning clustering algorithm and through applying the developed ensemble learning classifier.

The remainder of this chapter is organized as follows: Section 8.2 presents the main vulnerabilities and security challenges facing the DNS protocol. Section 8.3 gives an overview about the methodologies that have been proposed in the literature as well as other potential methodologies to adopt. Section 8.4 presents the proposed approach adopted in this work. Section 8.5 describes the two dataset used in this work including the extraction of the considered features. Section 8.6 discusses the experiment details and results. Finally, Section 8.7 concludes the chapter.

Figure 8.1: DNS Vulnerabilities and Challenges