Once the location of each activity is estimated, the next step is to select the servicing cell towers. In real-world situations, the selection of the cell which enables a call, SMS or data transfer to a mobile device is influenced by factors including cell network topology, cell congestion and the perceived distance between the cell tower and the device, which takes account of current channel characteristics and transmission power. To model this phenomenon, cell tower connection probabilities are derived from all relevant GPS-tracked mobile phone cell tower connection traces sourced from OpenCellID [196]. Cell tower connection probabilities allow a mobile device to be localised deterministically by isolating the most probable area in which the device was located while being serviced by that cell. This measure is then used to evaluate the likelihood of a cell servicing an activity which occurred at the location (Ax, Ay). By evaluating this measure for each cell in the network of interest, the cell which services the activity may be selected stochastically.
The OpenCellID [196] database allows users to upload their GPS locations, together with the IDs of the cell towers detected at those locations, via a client-side application installed on their mobile devices. Data from this database for the network under investigation amounted to just over 21,000 entries. These entries spanned the entire country, mostly along main roads and in cities, although a significant proportion were located in rural areas. The spatial distribution of these recordings is displayed in Figure 4.7.
To form cell tower connection probabilities, the distance between each OpenCellID GPS recording and the cell tower to which it was connected is tabulated. These measurements may then be collated into a single distribution which describes how far users were from cell towers while being serviced by them. If sufficient data were available, it would be possible to derive a connection distribution specific to each cell tower from the GPS-sourced information. In practice, the low number of samples and the biased sampling of the GPS recordings (e.g. many collected along roads but few in forests) mean that individual connection distributions are not suitable for inference purposes. Instead, using the aforementioned distance measurements, it is possible to estimate a generic probability density function (pdf) which measures the likelihood of connecting to a cell as a function of the distance from the cell tower location. This distribution is displayed in Figure 4.8. Note that the distribution of connection distances is, in general, unique to each cell tower, reflecting the practical factors influencing its connection characteristics.
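The tabulation and pooling step can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes the GPS recordings and tower locations have already been projected onto a planar grid in metres (e.g. Easting/Northing), and the function names, the `(x, y, cell_id)` record layout and the histogram bin width are hypothetical.

```python
import math

def connection_distances(recordings, towers):
    """Distance in km from each GPS recording to the tower that serviced it.

    recordings: iterable of (x, y, cell_id) in projected metres (assumed).
    towers: dict mapping cell_id -> (x, y) tower location in the same units.
    """
    distances = []
    for x, y, cell_id in recordings:
        tx, ty = towers[cell_id]
        distances.append(math.hypot(x - tx, y - ty) / 1000.0)
    return distances

def histogram_pdf(values, bin_width, max_value):
    """Pooled pdf over bins [k*bin_width, (k+1)*bin_width), normalised so
    that sum(pdf) * bin_width == 1 for the values below max_value."""
    n_bins = int(math.ceil(max_value / bin_width))
    counts = [0] * n_bins
    for v in values:
        if 0 <= v < max_value:
            counts[min(int(v // bin_width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no distances fall inside the histogram range")
    return [c / (total * bin_width) for c in counts]

# Usage with two hypothetical recordings serviced by tower "A".
towers = {"A": (0.0, 0.0)}
recordings = [(3000.0, 4000.0, "A"), (0.0, 1000.0, "A")]
dists = connection_distances(recordings, towers)      # [5.0, 1.0]
pdf = histogram_pdf(dists, bin_width=1.0, max_value=25.0)
```

Pooling every cell into one histogram mirrors the choice made above: the per-cell samples are too sparse and too biased to support separate distributions.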
Figure 4.7: Distribution of GPS recordings from OpenCellID from devices active on the Meteor Network.
Figure 4.8: Distribution of the observed distances OpenCellID recordings were from servicing cell site locations (axes: distance s in km against P(s)).
A commonly observed phenomenon with GPS recordings is that repeated samples cluster into dense regions once the device is stationary [197]. These dense regions are clearly evident within the OpenCellID recordings, as illustrated in Figure 4.9, and introduce peaks within the derived pdf (e.g. the peak at the 5 km mark in Figure 4.8). As such regions unduly bias particular distances, it is necessary to remove them. To isolate these regions, samples were grouped using DBSCAN [179], a density-based clustering algorithm which explicitly identifies noise points. Each identified cluster was then replaced with a single point measurement located at its centroid. The pdf is then recomputed from the reduced set of measurements, as depicted in Figure 4.10.
Figure 4.9: Irregularities (red circle) in OpenCellID recordings.
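A minimal sketch of the cluster-removal step is given below. It implements plain DBSCAN in pure Python rather than calling any particular library, and assumes (hypothetically) that clustering is run separately on the recordings of each cell so that every centroid keeps the ID of the cell it was connected to; the `eps` and `min_pts` values in the usage are placeholders, not the parameters used in this work.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one label per point, cluster id >= 0 or -1 for noise.
    Neighbourhoods include the point itself."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # noise for now; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbours(j)
            if len(j_seeds) >= min_pts:   # core point: keep expanding
                queue.extend(j_seeds)
    return labels

def collapse_clusters(recordings, eps, min_pts):
    """Replace each dense cluster of (x, y, cell_id) recordings by its centroid,
    clustering each cell's recordings separately (assumption)."""
    by_cell = {}
    for x, y, cell_id in recordings:
        by_cell.setdefault(cell_id, []).append((x, y))
    result = []
    for cell_id, pts in by_cell.items():
        groups = {}
        for (x, y), label in zip(pts, dbscan(pts, eps, min_pts)):
            if label == -1:
                result.append((x, y, cell_id))   # keep noise points as-is
            else:
                groups.setdefault(label, []).append((x, y))
        for members in groups.values():
            cx = sum(p[0] for p in members) / len(members)
            cy = sum(p[1] for p in members) / len(members)
            result.append((cx, cy, cell_id))
    return result

# Usage: five stationary samples around (0.5, 0.5) plus two isolated samples.
pts = [(0, 0, "A"), (1, 0, "A"), (0, 1, "A"), (1, 1, "A"), (0.5, 0.5, "A"),
       (100, 100, "A"), (200, 200, "A")]
reduced = collapse_clusters(pts, eps=2.0, min_pts=3)
```

Here the dense group collapses to a single centroid while the isolated samples are kept, so the stationary device contributes one distance instead of five.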
A single distribution characterising the probability of connecting to a cell, which accounts for varying cell size, is constructed by dividing the distance of each OpenCellID recording from its connecting cell tower by the theoretical radius of that cell (Cr), as estimated from the Voronoi cell polygons. The resulting pdf, illustrated in Figure 4.11, measures the likelihood of connecting to a cell as a function of the normalised distance ŝ from the cell tower location, where

$$\hat{s} = \frac{s}{C_r}.$$
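The normalisation can be sketched as below. How Cr is derived from a Voronoi polygon is not specified above, so this sketch assumes, purely for illustration, that Cr is the radius of a circle with the same area as the cell's Voronoi polygon; the helper names are hypothetical.

```python
import math

def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    twice_area = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def equivalent_radius(vertices):
    """ASSUMPTION: Cr is the radius of the circle whose area equals the
    area of the cell's Voronoi polygon."""
    return math.sqrt(polygon_area(vertices) / math.pi)

def normalised_distances(distances_by_cell, voronoi_polygons):
    """Pool s_hat = s / Cr over all cells; distances and polygon vertices
    must be in the same units."""
    s_hat = []
    for cell_id, distances in distances_by_cell.items():
        c_r = equivalent_radius(voronoi_polygons[cell_id])
        s_hat.extend(d / c_r for d in distances)
    return s_hat

# Usage: a hypothetical 2 km x 2 km square cell (area 4 km^2).
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(polygon_area(square))  # 4.0
print(normalised_distances({"A": [equivalent_radius(square)]}, {"A": square}))  # [1.0]
```

Any other radius definition (for example the distance to the farthest polygon vertex) can be swapped into `equivalent_radius` without changing the pooling logic.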
Figure 4.10: Distribution of the observed distances OpenCellID recordings were from servicing cell site locations with irregularities removed (axes: distance s in km against P(s)).
Figure 4.11: Distribution of likelihood of connecting to a cell as a function of the normalised distance ŝ from the cell tower location (axes: ŝ against Pn(ŝ)).
Using this pdf, the probability of connecting to a cell while being at least a distance s from a cell with connection radius Cr is given by

$$P(\hat{s} > s/C_r) = P(\hat{s} > s^{*}) = \int_{s^{*}}^{\infty} P_n(\hat{s})\,d\hat{s} \qquad (4.7)$$

where $s^{*} = s/C_r$.
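Equation (4.7) can be evaluated numerically from the tabulated pdf. The sketch below assumes Pn is stored as a piecewise-constant histogram over bins of width `bin_width` in ŝ and that no probability mass lies beyond the last bin, so the infinite upper limit is effectively truncated; the function name is hypothetical.

```python
def tail_probability(pdf, bin_width, s, c_r):
    """P(s_hat > s / Cr) from Eq. (4.7), for a piecewise-constant pdf Pn
    tabulated on bins [k*bin_width, (k+1)*bin_width). Any mass beyond the
    last bin is assumed to be zero."""
    s_star = s / c_r
    total = 0.0
    for k, p in enumerate(pdf):
        lo, hi = k * bin_width, (k + 1) * bin_width
        if hi <= s_star:
            continue                         # bin lies entirely below s*
        total += p * (hi - max(lo, s_star))  # full or partial bin above s*
    return total

# Usage: a hypothetical Pn that is uniform on [0, 2).
pdf = [0.5, 0.5]
print(tail_probability(pdf, 1.0, s=0.5, c_r=1.0))  # 0.75
print(tail_probability(pdf, 1.0, s=2.0, c_r=2.0))  # 0.5
```

For a piecewise-constant pdf the integral is exact bin by bin, so no quadrature error is introduced beyond the histogram itself.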
Given that connections can be assumed to occur on a 2D plane, the probability density function for connecting at a given radius ψ and bearing θ to the cell tower C can be expressed as

$$P_C(\hat{\psi}, \theta) = \frac{P_n(\hat{\psi})}{2\pi\hat{\psi}} \qquad (4.8)$$

where ψ̂ = ψ/Cr and ψ is the Euclidean distance between the activity location (Ax, Ay) and the cell centroid (Cx, Cy). Angle θ is as depicted in Figure 4.12.
Figure 4.12: Illustration showing the calculation of distance ψ and angle θ between the activity location (Ax, Ay) and the centroid (Cx, Cy) of a cell of radius Cr.
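The quantities in Figure 4.12 and the density PC can be computed as sketched below. This sketch assumes the isotropic form PC(ψ̂, θ) = Pn(ψ̂)/(2πψ̂), which spreads the radial density evenly over all bearings, measures θ anticlockwise from the east axis with `math.atan2`, and clamps ψ̂ to a small hypothetical minimum to avoid division by zero at the tower itself.

```python
import math

def polar_coords(ax, ay, cx, cy, c_r):
    """Normalised distance psi_hat = psi / Cr and bearing theta of the activity
    (ax, ay) relative to the cell centroid (cx, cy). theta is measured
    anticlockwise from the east axis (assumed convention)."""
    psi = math.hypot(ax - cx, ay - cy)
    theta = math.atan2(ay - cy, ax - cx)
    return psi / c_r, theta

def pn_lookup(pdf, bin_width, s_hat):
    """Value of the piecewise-constant pdf Pn at s_hat (zero outside the table)."""
    k = int(s_hat // bin_width)
    return pdf[k] if 0 <= k < len(pdf) else 0.0

def connection_density(psi_hat, theta, pdf, bin_width, min_psi_hat=1e-3):
    """Isotropic density P_C(psi_hat, theta) = Pn(psi_hat) / (2*pi*psi_hat).
    theta is unused because the density is assumed not to depend on bearing;
    psi_hat is clamped to a hypothetical minimum to avoid dividing by zero."""
    psi_hat = max(psi_hat, min_psi_hat)
    return pn_lookup(pdf, bin_width, psi_hat) / (2.0 * math.pi * psi_hat)

# Usage: activity at (3, 4) km from a hypothetical cell centred at the origin
# with a radius of 5 km, so psi_hat = 1.
psi_hat, theta = polar_coords(3.0, 4.0, 0.0, 0.0, c_r=5.0)
density = connection_density(psi_hat, theta, [0.5, 0.5], bin_width=1.0)
```

Dividing by 2πψ̂ is what turns a distribution over distance into a density over the plane: integrating it over θ and ψ̂ dψ̂ recovers the integral of Pn.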
The probability of an event occurring within a selected region or area, Aβ, is given by $\iint_{A_\beta} P_C(\hat{\psi}, \theta)\, dA_\beta$. This can be approximated by summing over an evenly distributed grid of discrete point measurements which fall within the enclosed region, that is

$$P(A_\beta) \approx \sum_{i=1}^{M_A} P_C(\hat{\psi}_i, \theta_i)\,\Delta_i \qquad (4.9)$$
where Δi is the area of the ith grid pixel, (ψ̂i, θi) are the polar coordinates of the pixel centre and MA is the number of points which fall within area Aβ. For a uniform grid, Δi = Δ ∀i and hence

$$P(A_\beta) = \Delta \sum_{i=1}^{M_A} P_C(\hat{\psi}_i, \theta_i). \qquad (4.10)$$
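Equation (4.10) can be sketched for a rectangular region Aβ as below. The rectangular region, the square pixels of side `pixel` and the isotropic density Pn(ψ̂)/(2πψ̂) are assumptions made for illustration; because PC is treated as a density over the normalised plane, the pixel area Δ is expressed in normalised units, i.e. divided by Cr².

```python
import math

def pn_lookup(pdf, bin_width, s_hat):
    """Value of the piecewise-constant pdf Pn at s_hat (zero outside the table)."""
    k = int(s_hat // bin_width)
    return pdf[k] if 0 <= k < len(pdf) else 0.0

def region_probability(bounds, centroid, c_r, pdf, bin_width, pixel,
                       min_psi_hat=1e-6):
    """Approximate P(A_beta) with Eq. (4.10) for a rectangular region
    bounds = (x_min, x_max, y_min, y_max), sampled at the centres of square
    pixels of side `pixel` (same units as the coordinates and c_r)."""
    x_min, x_max, y_min, y_max = bounds
    cx, cy = centroid
    nx = int(round((x_max - x_min) / pixel))
    ny = int(round((y_max - y_min) / pixel))
    delta = (pixel / c_r) ** 2        # uniform pixel area in normalised units
    total = 0.0
    for i in range(nx):
        px = x_min + (i + 0.5) * pixel
        for j in range(ny):
            py = y_min + (j + 0.5) * pixel
            psi_hat = max(math.hypot(px - cx, py - cy) / c_r, min_psi_hat)
            # isotropic density P_C = Pn(psi_hat) / (2*pi*psi_hat) (assumed)
            total += pn_lookup(pdf, bin_width, psi_hat) / (2.0 * math.pi * psi_hat)
    return delta * total

# Usage: hypothetical Pn uniform on [0, 1), i.e. every connection within Cr.
whole = region_probability((-2.0, 2.0, -2.0, 2.0), (0.0, 0.0), 1.0, [1.0], 1.0, 0.02)
east = region_probability((0.0, 2.0, -2.0, 2.0), (0.0, 0.0), 1.0, [1.0], 1.0, 0.02)
```

A region covering the whole cell should give a value close to 1 and the eastern half close to 0.5; the small shortfall comes from the midpoint rule under-sampling the peak of the density next to the tower, and shrinks as `pixel` is reduced.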
An example illustrating the spatial dispersion of connection probability from two cells of varying radius is depicted in Figure 4.13. To enable a direct comparison, both panels have been normalised to the same scale.
Given the position of an activity (Ax, Ay) along a travel path, as determined in Section 4.3.1, PC(ψ̂, θ) is tabulated for each cell in the network under investigation. The servicing cell is then sampled stochastically from the 50 top-ranked cell towers. Fifty cells are selected as a compromise between spatial variance and realistic network behaviour. An example of two simulated CDR trajectories is depicted in Figure 4.14.
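The selection step can be sketched as follows. The text does not state how the 50 top-ranked cells are weighted, so this sketch assumes, hypothetically, that the servicing cell is drawn with probability proportional to its PC value among those cells, falling back to a uniform draw if all weights are zero. It uses only the standard-library `heapq.nlargest` and `random.Random.choices`.

```python
import heapq
import random

def select_servicing_cell(cell_scores, top_n=50, rng=None):
    """Sample the servicing cell from the top_n cells ranked by P_C.

    cell_scores: non-empty dict mapping cell_id -> P_C at the activity location.
    Weighting by P_C within the top_n cells is an assumption of this sketch.
    """
    rng = rng if rng is not None else random.Random()
    top = heapq.nlargest(top_n, cell_scores.items(), key=lambda kv: kv[1])
    ids = [cell_id for cell_id, _ in top]
    weights = [score for _, score in top]
    if sum(weights) <= 0.0:
        return rng.choice(ids)            # no information: uniform fallback
    return rng.choices(ids, weights=weights, k=1)[0]

# Usage with hypothetical scores and a seeded generator for reproducibility.
rng = random.Random(42)
scores = {"A": 0.6, "B": 0.3, "C": 0.1, "D": 0.0}
picks = [select_servicing_cell(scores, top_n=3, rng=rng) for _ in range(1000)]
```

Restricting the draw to the top-ranked cells stops distant cells with tiny but non-zero probability from ever servicing an activity, while the weighting still lets a neighbouring cell occasionally win over the closest one.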
Figure 4.13: The spatial dispersion of connection probability from selected cells: (a) Cell with a radius of 2 km; and (b) Cell with a radius of 10 km.
Figure 4.14: Simulated CDR trajectories, J, travelling along travel paths of interest, T, between Dublin City and Cork City, shown with the servicing cell locations (axes: Easting against Northing, both ×10^5): (a) Path following rail line; and (b) Path following motorway.