As described in the literature, physical visits to POIs and events depend on a
complex set of factors related to the attending individual and social group, as well
as on contextual and entity-specific factors. Due to a lack of personal user data, many
of these factors cannot be determined: for reasons of privacy protection,
Facebook does not make this information publicly available. Thus, estimates of offline
popularity must be based on content that is both publicly accessible and not
privacy-critical.
[...] introduced by the crowdsourced character of social media. As discussed in the
literature, the representativeness of social media is determined by the
over- and underrepresentation of certain social groups. In fact, the user-generated
content offered on these platforms represents only the interests of certain parts of
society.
6.2.1 Adjustment using a reference data source
As OSM is a general mapping resource, it is assumed that the thematic distribution
of its place objects corresponds to their actual real-world occurrence. Given
the high number of contributors, especially in Germany, the information it contains
is expected to be unbiased and highly detailed. Facebook, on the other hand,
maintains a largely uncontrolled POI database without public review or correction
processes. Thus, the over- and underrepresentation in the Facebook dataset is
examined based on the thematic differences between both datasets.
First, city center coordinates for the 70 largest cities in Germany are used
as a basis for creating square polygons that define the focus area. As parking
is an issue mainly in urban contexts, the representativeness of this subset is
assumed to be higher than that of results covering all of Germany.
Moreover, while OSM data is available nationwide, the data acquisition
applied for the Facebook dataset limits its availability to urban areas. A
nationwide comparison would therefore lead to biased results, which is avoided by
focusing on city areas only. For the square polygons, an edge length of 25 km is
chosen to obtain a sufficiently large subset of OSM and Facebook POIs for these areas.
Taking into account the categorical matches defined in the course of the data source
benchmarking (Appendix F3), the sum of thematically similar objects over all city
polygons is calculated for both data sources. Subsequently, the relative difference
[...] the raw online popularity values. As illustrated in Figure 27(a), if the
sum of Facebook objects is lower than the corresponding value on the OSM side, it
is assumed that this POI category is underrepresented in Facebook. In this case,
the given popularity attributes must be increased to correct the observed bias.
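The focus-area construction and the category comparison described above can be
sketched as follows. This is a minimal illustration, not the original
implementation: the function names, the spherical approximation of 111.32 km per
degree of latitude, and the sign convention of the relative difference (Facebook
count relative to OSM count, so that negative values indicate underrepresentation
in Facebook) are all assumptions.

```python
import math

EDGE_KM = 25.0            # edge length of the square polygon, as stated in the text
KM_PER_DEG_LAT = 111.32   # assumption: simple spherical approximation


def square_bounds(lat, lon, edge_km=EDGE_KM):
    """Return (south, west, north, east) of a square centred on a city center."""
    half = edge_km / 2.0
    dlat = half / KM_PER_DEG_LAT
    # One degree of longitude shrinks with the cosine of the latitude.
    dlon = half / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


def contains(bounds, lat, lon):
    """Check whether a POI coordinate lies inside the square."""
    south, west, north, east = bounds
    return south <= lat <= north and west <= lon <= east


def relative_difference(fb_count, osm_count):
    """Facebook count relative to OSM (assumed convention).

    Negative values mean the category is underrepresented in Facebook.
    """
    return (fb_count - osm_count) / osm_count
```

Under this convention, a category with 84% fewer Facebook objects than OSM objects
has a relative difference of about -0.84.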
Figure 27. Adjusted popularity with reference data source: (a) general adjustment
principle; (b) relative POI count differences by category
Figure 27(b) shows several exemplary OSM categories and their relation to the
corresponding Facebook object counts. While certain categories, for example
restaurants, are almost equally represented, the relative differences vary greatly
for other categories. Bakeries, for instance, are found to be highly underrepresented
in Facebook, with about 84% fewer objects counted for the test area compared to OSM.
Bars and ice cream places, by contrast, are considerably overrepresented in
Facebook compared to OSM. Figure 28 shows the distribution of relative object
occurrences using a histogram with ten bins. Thin lines at the bottom of the diagram
indicate occurrences of single values. It can be seen that the section indicating higher
[...] directly into adjustment factors would practically remove the popularity
information. Thus, the general extent of the applied adjustment is limited to account
for only up to 50% of the observed popularity. For example, a bakery with 100 fans
on Facebook qualifies for an 84% popularity increase. With the adjustment limited
to 50%, only half of this increase (42%) is applied, which yields an adjusted
popularity of 142 fans. This procedure is repeated analogously for check-ins and the
number of people talking about the POI. As the weighting of the adjustment is chosen
arbitrarily, additional feature sets using 30% and 70% adjustment influence are generated.
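The damped adjustment can be illustrated with a short sketch. The formula below is
inferred from the bakery example (100 fans, 84% deficit, 50% influence, 142 fans)
and is therefore an assumption rather than the documented implementation; the names
`adjust_popularity`, `relative_deficit` and `feature_sets` are hypothetical.

```python
# Influence levels mentioned in the text: 30%, 50% and 70%.
INFLUENCE_LEVELS = (0.3, 0.5, 0.7)


def adjust_popularity(raw_value, relative_deficit, influence=0.5):
    """Scale a raw popularity value by a damped share of the category deficit.

    relative_deficit is the share by which a category is underrepresented in
    Facebook compared to OSM (0.84 for bakeries); a negative value would
    lower the popularity of an overrepresented category (assumption).
    """
    return raw_value * (1.0 + influence * relative_deficit)


def feature_sets(raw_value, relative_deficit):
    """Return one adjusted value per influence level."""
    return {level: adjust_popularity(raw_value, relative_deficit, level)
            for level in INFLUENCE_LEVELS}
```

For the bakery example, `adjust_popularity(100, 0.84)` evaluates to approximately
142, in line with the value given in the text. The same function would be applied
to check-ins and to the number of people talking about the POI.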
Figure 28. Facebook POI popularity adjustment factor distribution
Before the adjustment of online popularity can take place, categories with fewer
than ten objects in either of the two data sources are removed from the analysis
set. Small absolute differences among object counts for these samples would result
in large popularity adjustment factors that do not reflect the general distribution.
Taking into account this filter logic, only 69 OSM categories fulfill all described
[...] 21% of all 1,124 categories. In total, about 254,000 Facebook POIs are affected by
the category-based popularity adjustment. This corresponds to about 19% of the
filtered dataset. Popularity adjustment with OSM as the reference data source cannot
be conducted for the remaining POI objects.
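The minimum-count filter described above might look as follows. This is a sketch
under the assumption that per-category object counts are available as dictionaries
keyed by a shared (matched) category name; `eligible_categories` is a hypothetical
helper.

```python
MIN_OBJECTS = 10  # threshold stated in the text


def eligible_categories(fb_counts, osm_counts, min_objects=MIN_OBJECTS):
    """Keep matched categories with at least min_objects in both sources."""
    shared = fb_counts.keys() & osm_counts.keys()
    return sorted(
        category for category in shared
        if fb_counts[category] >= min_objects
        and osm_counts[category] >= min_objects
    )
```

Categories that fail the threshold in either source, or that have no match in the
other source, are left unadjusted.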
For events, no generalized data source exists that reflects objects for various
genres and target groups in an equal manner. The benchmark data sources taken
into account, Ticketmaster and Eventbrite, are both considered biased due to
their commercial focus. These platforms generally have little interest in promoting
free events, as these offer no direct revenue potential under their business
models. Consequently, popularity adjustment based on a reference data source,
analogous to the procedure applied on the POI side, is not possible.
6.2.2 Adjustment using domain knowledge
As there are no publications describing patterns or generalized influence
factors for physical visits to POIs, findings from the literature cannot be used to
define adjusted popularity features. For public events, however, the platform
Doorkeeper [121] has published domain knowledge that is used as a basis for
adjustment. As Doorkeeper is a Japanese event platform with a comparatively small
number of events and a tendency towards professional themes, the data
source describes a rather specific subgroup of all possible events. Nevertheless, the
information published provides interesting insights on a general rather than
an individual level. Domain knowledge with regard to the influences of weekday
and event size is provided and displayed in Figure 29. Experiences related to
event pricing are also provided but cannot be accounted for in the context of popularity
adjustment, as Facebook does not provide corresponding data for cross-reference.
For each event in the Facebook dataset, the respective weekday is extracted
Figure 29. Event domain knowledge from Doorkeeper, based on [121]: (a) check-in
rate by weekday; (b) check-in rate by event size
[...] attendance rates. Detailed information regarding the check-in rate depending on
the number of event participants is only available in graphical form. Thus,
data points are manually read off the chart and used to fit a trend line to the
data sample. Size-dependent popularity is adjusted in accordance with this function. As
only the value range up to 200 participants is presented, larger events are assumed
to fall into the range of approximately 50% offline attendance. Other external data
confirm this range [120].
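The size-dependent adjustment can be sketched as a trend-line fit with a fixed
rate for large events. The text does not state the functional form of the trend
line, so the ordinary least-squares straight line used here is an assumption, as
are the names `fit_line` and `size_dependent_rate`. The data points themselves are
not reproduced; they would be the (participants, check-in rate) pairs read off the
chart in [121].

```python
def fit_line(points):
    """Least-squares fit of rate = slope * participants + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def size_dependent_rate(participants, slope, intercept,
                        max_size=200, large_event_rate=0.5):
    """Check-in rate for an event of the given size.

    Events above max_size fall outside the published value range and are
    assigned the approximately 50% attendance assumed in the text.
    """
    if participants > max_size:
        return large_event_rate
    return slope * participants + intercept
```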
No domain knowledge is available with regard to the interaction of the two
factors considered, weekday and event size. Thus, the mean of both is calculated
and applied to the popularity attributes in the Facebook event database. This
represents an equally weighted relationship. Moreover, two additional feature sets
are generated in which the weighting is shifted towards one of the factors at a
ratio of 70:30 in each case.
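The combination of both factors can be sketched as a weighted mean. The function
names and the multiplicative application of the combined rate to a popularity
attribute are assumptions; the text only states that the mean is "applied" to the
attributes.

```python
# Weekday weights for the three feature sets: equal, weekday-heavy, size-heavy.
WEEKDAY_WEIGHTS = {"50:50": 0.5, "70:30": 0.7, "30:70": 0.3}


def combined_rate(weekday_rate, size_rate, weekday_weight=0.5):
    """Weighted mean of the weekday-based and size-based check-in rates."""
    return weekday_weight * weekday_rate + (1.0 - weekday_weight) * size_rate


def adjusted_attribute(raw_value, weekday_rate, size_rate, weekday_weight=0.5):
    """Apply the combined rate to a raw popularity attribute (assumed to be
    a simple multiplication)."""
    return raw_value * combined_rate(weekday_rate, size_rate, weekday_weight)


def event_feature_sets(raw_value, weekday_rate, size_rate):
    """Return the adjusted attribute for each of the three weightings."""
    return {label: adjusted_attribute(raw_value, weekday_rate, size_rate, w)
            for label, w in WEEKDAY_WEIGHTS.items()}
```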