Following discussions with the University’s Information Systems Services (ISS) Department and DCU’s Institutional Research & Analysis Officer, we received approval to use data collected by the University IT systems relating to student WiFi connections on campus, under strict security and anonymisation controls, as described earlier in Section 1.4. This approval also conformed to the University’s ethics guidelines.
Approximately 1,000 Network Access Server (NAS) units located across the main Glasnevin campus capture and control all wireless traffic on the WiFi network. The Eduroam log data contains the captured connections from all devices connecting to the WiFi network on campus. The specific log our research is interested in is called the “auth-details” log. This log captures the request made by a WiFi-enabled device, such as a smartphone, laptop or tablet, to access the network. For network efficiency reasons these files are saved to one of two separate servers maintained by the University’s IT Department. These servers, Orcus2 and Senda2, have been configured to record network activity on a continuous basis.
A sample of the data contained within a single request packet or log entry is shown below:
• Tue May 12 00:00:21 2015
• Packet-Type = Access-Request
• User-Name = "0cj8sn5isbr4ojtna9ne678hg439nhed"
• Acct-Session-Id = "1ADDD1B4-F437B7DFAC71-0000003805"
• Calling-Station-Id = "F4-37-B7-DF-AC-71"
• Called-Station-Id = "FC-0A-81-DE-1C-F1:eduroam"
• Vendor-388-Attr-2 = 0x656475726f616d
• NAS-Port = 6
• NAS-Port-Type = Wireless-802.11
• Framed-MTU = 1400
• Service-Type = Framed-User
• NAS-Identifier = "Res CP W514"
• NAS-Port-Id = "radio1"
• Vendor-388-Attr-17 = 0x5265735f43505f57353134
• Vendor-388-Attr-19 = 0x46432d30412d38312d44442d44312d4234
• Vendor-388-Attr-23 = 0x5265735f4873655f31372d3139
• Connect-Info = "CONNECT 72.2Mbps 802.11bgn"
• EAP-Message = 0x0201000a01636f786337
• NAS-IP-Address = 136.206.23.253
• Proxy-State = 0x000100051addd1b4001a
• Message-Authenticator = 0x8e4c386a5adcaafe0faaef82e614cf7d
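As an illustration of how such an entry might be read, the following is a minimal sketch rather than the system used in this work. It assumes the raw file stores each entry as a timestamp line followed by one `name = value` line per attribute; the function name `parse_entry` and the `Timestamp` key are hypothetical.

```python
def parse_entry(text):
    """Parse one log entry into a dict.

    Assumption: the first non-empty line is the timestamp and every
    following line has the form 'Attribute-Name = value'.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    record = {"Timestamp": lines[0]}  # "Timestamp" is a hypothetical key name
    for line in lines[1:]:
        name, sep, value = line.partition(" = ")
        if sep:  # ignore anything that is not a 'name = value' pair
            record[name.strip()] = value.strip().strip('"')
    return record


# Shortened copy of the sample entry shown above.
sample = """Tue May 12 00:00:21 2015
    Packet-Type = Access-Request
    User-Name = "0cj8sn5isbr4ojtna9ne678hg439nhed"
    Called-Station-Id = "FC-0A-81-DE-1C-F1:eduroam"
    NAS-Port = 6
"""
entry = parse_entry(sample)
```

All values are kept as strings here; any conversion (for example of the timestamp) is left to a later step.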
This packet contains a mixture of data fields, only some of which are useful. As part of our process definition we identified the minimum information from a request packet required to identify student location. Using Python, we developed a model to differentiate between the fields that are important to our research:
• Date and time of the access request
• User Name
• Session Id, assigned for the duration of that user’s visit (session)
• Calling-Station-Id, the unique id of the device making the request
• Called-Station-Id, the id of the NAS recording the request
• NAS-Identifier, a descriptor provided by the network administrator
• NAS-IP-Address, the IP address of the NAS
and the superfluous fields that would not add value to our analysis, such as:
• Vendor information
• Connect-Info
• EAP-Message
• Proxy-State
• Message-Authenticator
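The separation of useful from superfluous fields can be sketched as a simple whitelist. This is an illustrative sketch only, assuming each entry has already been parsed into a dictionary; `REQUIRED_FIELDS`, `keep_required` and the `Timestamp` key are hypothetical names, and the field list is the one given above.

```python
# Fields listed above as important; "Timestamp" is a hypothetical key
# standing in for the date and time of the access request.
REQUIRED_FIELDS = (
    "Timestamp",
    "User-Name",
    "Acct-Session-Id",
    "Calling-Station-Id",
    "Called-Station-Id",
    "NAS-Identifier",
    "NAS-IP-Address",
)


def keep_required(record):
    """Return a copy of the record holding only the required fields.

    Missing fields are filled with an empty string so that every
    output row has the same columns.
    """
    return {field: record.get(field, "") for field in REQUIRED_FIELDS}


raw = {
    "Timestamp": "Tue May 12 00:00:21 2015",
    "User-Name": "0cj8sn5isbr4ojtna9ne678hg439nhed",
    "Called-Station-Id": "FC-0A-81-DE-1C-F1:eduroam",
    "EAP-Message": "0x0201000a01636f786337",
    "Proxy-State": "0x000100051addd1b4001a",
}
slim = keep_required(raw)
```

A whitelist is used here rather than a blacklist so that any unexpected vendor attribute is dropped by default.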
The fields we identified as containing the minimal data upon which we could base our research, and which are necessary to identify individuals by date, time and location, are outlined here:
When:
• Mon Jun 2 16:04:35 2014
Who:
• User-Name = "0cj8sn5isbr4ojtna9ne678hg439nhed"
• Acct-Session-Id = "0B34FF44-28E02C402CD2-0000073359"
• Calling-Station-Id = "28-E0-2C-40-2C-D2"
Where:
• Called-Station-Id = "5C-0E-8B-26-2A-50:eduroam"
• NAS-Port-Type = Wireless-802.11
• NAS-Identifier = "Sports Gallery"
• Connect-Info = "CONNECT 65Mbps 802.11bgn"
• NAS-IP-Address = 136.206.23.253
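To show how the When, Who and Where grouping could be represented in practice, the sketch below converts a parsed record into a small event dictionary. It is an assumption-laden illustration: the `Timestamp` key, the `to_event` function and the choice of `User-Name` and `NAS-Identifier` as the "who" and "where" values are hypothetical, and the timestamp format relies on English day and month names.

```python
from datetime import datetime

# Format of the timestamp line, e.g. "Mon Jun 2 16:04:35 2014".
# %a and %b are locale-dependent; an English/C locale is assumed.
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def to_event(record):
    """Reduce a parsed record to its when / who / where components."""
    return {
        "when": datetime.strptime(record["Timestamp"], TIMESTAMP_FORMAT),
        "who": record["User-Name"],
        "where": record["NAS-Identifier"],
    }


record = {
    "Timestamp": "Mon Jun 2 16:04:35 2014",
    "User-Name": "0cj8sn5isbr4ojtna9ne678hg439nhed",
    "NAS-Identifier": "Sports Gallery",
}
event = to_event(record)
```

Parsing the timestamp into a `datetime` makes later steps such as sorting and grouping by day straightforward.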
The data collected per log entry can be categorised as When the connection event occurred, Who was making the request and Where the request was being made from. The quantity of data made available for the research was extensive, as Table 3.1 outlines. The 2014 data contains WiFi logs from September 2014 to December 2014 inclusive, which is the first semester of the 2015 academic year (2014/2015); both 2015 and 2016 contain two semesters of WiFi data, and 2017 contains the second semester (January to May) of the 2017 academic year.
Year Size (Gigabytes)
2014 44 GB
2015 99 GB
2016 96 GB
2017 24 GB
Table 3.1: Volume of raw log file data used in experiment
All of the data cleansing, preparation and transformation stages were carried out using a bespoke Python system that reads in the raw log files, removes all unnecessary data and stores a single day’s log file per server in .csv format.
We classify the raw input file as semi-structured data comprising non-linear fields whose order varies from record to record. Having extracted our required fields, we needed to standardise the data within each field. For example, the “Called-Station-Id” field, which holds the NAS identifier or MAC address, may be formatted as:
“5C-0E-8B-26-2A-50:eduroam” or
“5C-0E-8B-26-2A-50”.
We therefore standardised these addresses to “5C-0E-8B-26-2A-50”.
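The standardisation just described can be sketched as follows. This is a minimal illustration, not the code used in this work; the function name `normalise_station_id` is hypothetical, and the upper-casing step is an added assumption to guard against mixed-case addresses.

```python
def normalise_station_id(value):
    """Strip surrounding quotes and any ':<SSID>' suffix from a station id.

    The MAC address itself uses hyphens, so the first colon (if any)
    marks the start of the network name.
    """
    mac = value.strip().strip('"').split(":", 1)[0]
    return mac.upper()  # assumption: addresses are compared in upper case


with_suffix = normalise_station_id("5C-0E-8B-26-2A-50:eduroam")
without_suffix = normalise_station_id("5C-0E-8B-26-2A-50")
```

Both input forms map to the same value, so records from the same NAS can be matched on this field.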
At this point each day’s data is held in two files, one from each of the two different servers (Orcus2 and Senda2); from these files we created a single file representing the WiFi activity for each day.
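Combining the two per-server files for a day could look like the sketch below. It is illustrative only and assumes both .csv files share the same columns, including a hypothetical `Timestamp` column in the format of the raw log; the function name `merge_daily_logs` and the in-memory demo data are made up for the example.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue May 12 00:00:21 2015"


def merge_daily_logs(file_a, file_b, out_file):
    """Merge two per-server CSV files into one time-ordered daily file.

    Returns the number of rows written.
    """
    rows = list(csv.DictReader(file_a)) + list(csv.DictReader(file_b))
    rows.sort(key=lambda r: datetime.strptime(r["Timestamp"], TIMESTAMP_FORMAT))
    if rows:
        writer = csv.DictWriter(out_file, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# In-memory stand-ins for the two server files (made-up rows).
orcus = io.StringIO(
    "Timestamp,NAS-Identifier\nTue May 12 00:00:21 2015,Res CP W514\n"
)
senda = io.StringIO(
    "Timestamp,NAS-Identifier\nTue May 12 00:00:05 2015,Sports Gallery\n"
)
merged = io.StringIO()
count = merge_daily_logs(orcus, senda, merged)
```

With real files, the three arguments would be file objects opened with `newline=""`, as the `csv` module recommends. The whole day is loaded into memory here, which may need revisiting for very large daily files.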
Each request for access to Eduroam generates a number of log entries, owing to the handshaking or conversation between the device and the network as the authentication process is completed. Each packet in this conversation contains the same session id (Acct-Session-Id), allowing both parties to differentiate it from other conversations on the network. We focus our attention on the first of these tagged entries, as it indicates the first request by the device for access to the NAS and therefore the most likely arrival time of the student at that location. We remove all other duplicate records from our dataset to avoid any form of double counting.
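Keeping only the first entry of each conversation can be sketched as below. This is a hedged illustration that assumes the session id alone identifies a conversation and that each record carries a hypothetical `Timestamp` key; the function name and the short session ids `S1` and `S2` are made up for the example.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def first_request_per_session(records):
    """Keep the earliest record for each Acct-Session-Id, in time order."""
    first = {}
    for rec in records:
        session = rec["Acct-Session-Id"]
        when = datetime.strptime(rec["Timestamp"], TIMESTAMP_FORMAT)
        if session not in first or when < first[session][0]:
            first[session] = (when, rec)
    ordered = sorted(first.values(), key=lambda pair: pair[0])
    return [rec for _, rec in ordered]


records = [
    {"Timestamp": "Tue May 12 00:00:21 2015", "Acct-Session-Id": "S1"},
    {"Timestamp": "Tue May 12 00:00:22 2015", "Acct-Session-Id": "S1"},
    {"Timestamp": "Tue May 12 00:00:05 2015", "Acct-Session-Id": "S2"},
]
arrivals = first_request_per_session(records)
```

Comparing timestamps explicitly, rather than relying on file order, means the result does not depend on how the two server files were interleaved.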
In addition to the access requests made by those associated with the University, the network receives requests for access from University visitors who have Eduroam credentials from other institutions. As part of our data preparation process we removed these non-DCU requests from the dataset. Once these were removed, we believe we have a clean, robust dataset on which we can carry out our analysis. The following Section 3.2.2 gives a brief illustration of the ability of this data to be used to identify activity