We first evaluated our geotagger against the ground truth, a dataset of 500 randomly sampled, manually geotagged tweets. As displayed in Table 6.7, the OzCT geotagger achieves 81% recall and 80.19% precision for the detection of definite locations, which results in an F1-score of 80.40%. These results indicate that when the tweet content contains a definite location reference, the geotagger detects it in over 80% of cases, and about 80% of its detections are correct. In addition, our approach reaches a precision of 70.14% on no-location detection, that is, when there is no geographical reference in the tweet content, whereas for resolving ambiguous locations, the most complex case in geotagging tweets, recall drops to 57.46% and precision to 59.23%.
We studied the OzCT geotagger’s behaviour in more detail and identified two reasons for this considerable drop. One problem lies with the gazetteers used: the Google Maps API is designed to retrieve the most relevant geographical location for a location entity and neglects less frequent results. Another issue is that the toponym resolution phase could be further improved by using more heuristic rules. In addition, the proposed ontological model may be
Results and Discussion 138
                                          STATE    CITY   SUBURB  STREET
TWEET (#) (300 in total)                    110      85       64      41
AlchemyAPI NER      TRUE DETECTION (#)       36      28        7       1
                    ACCURACY (%)          32.72   30.58    10.93    2.43
Yahoo! GeoPlanet    TRUE DETECTION (#)       53      34       14       4
                    ACCURACY (%)          48.18   40.00    21.87     9.7
OzCT GEOTAGGER      TRUE DETECTION (#)       69      60       39      21
                    ACCURACY (%)          62.72   70.58    60.93   51.21
Table 6.5: Accuracy of the Geographical Focus of Definite Locations in Three Geotaggers (Confusion Matrix)

             Geotagging Time (sec)
             Per Tweet                                            Total
TWEET (#)    AlchemyAPI NER  Yahoo! GeoPlanet  OzCT Geotagger     AlchemyAPI NER  Yahoo! GeoPlanet  OzCT Geotagger
1                      0.83              0.91            1.64               0.83              0.91            1.64
10                     0.83              0.91            1.64               8.30              9.10           16.40
100                    0.83              0.91            1.52              83.00             91.00          152.00
500                    0.83              0.91            1.43             415.00            455.00          715.00
Table 6.6: Geotagging Time of Three Geotagging Systems
              Recall   Precision   F1 score
DEFINITE       81.00       80.19      80.40
AMBIGUOUS      57.46       59.23      58.33
NO-LOCATION    71.21       70.14      70.67
Table 6.7: Recall, Precision and F1-score for OzCT Geotagger
Figure 6.2: Geographical Focus of Three Geotaggers
able to resolve such ambiguities.
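As a quick check on Table 6.7, the F1-score is conventionally the harmonic mean of precision and recall. The sketch below assumes that definition (the text does not state the formula that was used) and applies it to the reported percentages; the name `f1_score` is introduced here for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs as reported in Table 6.7
reported = {
    "DEFINITE": (81.00, 80.19),
    "AMBIGUOUS": (57.46, 59.23),
    "NO-LOCATION": (71.21, 70.14),
}

# F1 per location type, rounded to two decimals like the table
f1_by_type = {
    name: round(f1_score(precision, recall), 2)
    for name, (recall, precision) in reported.items()
}
```

Under this assumption the AMBIGUOUS and NO-LOCATION rows reproduce the tabulated 58.33 and 70.67, while the DEFINITE row comes out at 80.59 rather than 80.40; the small gap may stem from rounding of the inputs or from a different averaging scheme.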
In a separate experiment, to measure the accuracy of the OzCT geotagger’s geographical focus, we compared our approach in recognising definite locations with two automated geocoding systems: Yahoo! GeoPlanet and AlchemyAPI NER were used to geotag the same dataset. As Figure 6.2 illustrates, all three platforms perform reasonably well in detecting geographical references at the level of “state” and “city”, although their degree of coverage varies. Table 6.5 indicates that AlchemyAPI NER correctly detects about 30% of the state and city references. The Yahoo! GeoPlanet service performs better, at about 45% accuracy. These numbers improve significantly in our geotagger, to 62.72% and 70.58% for the detection of states and cities respectively. From another perspective, AlchemyAPI and Yahoo! GeoPlanet perform poorly when there are suburb references in the tweet content (on average about 15%), whereas the OzCT geotagger detects suburbs in more than 60% of cases and is correct in over half of the cases where there are street-type references.
Figure 6.3: Precision and Accuracy of Geographical Focus of OzCT Geotagger
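The accuracy figures quoted from Table 6.5 can be recomputed from the raw counts. The sketch below assumes that accuracy is the number of true detections divided by the number of tweets at that granularity, truncated (not rounded) to two decimals; both the definition and the truncation are inferred from the table values rather than stated in the text, and the names are illustrative.

```python
import math

# Number of definite-location tweets per granularity (Table 6.5)
TWEETS = {"state": 110, "city": 85, "suburb": 64, "street": 41}

# True detections of the OzCT geotagger per granularity (Table 6.5)
OZCT_TRUE = {"state": 69, "city": 60, "suburb": 39, "street": 21}


def accuracy_percent(true_detections: int, tweets: int) -> float:
    """Accuracy in percent, truncated to two decimal places."""
    return math.floor(true_detections / tweets * 10_000) / 100


ozct_accuracy = {
    level: accuracy_percent(OZCT_TRUE[level], TWEETS[level])
    for level in TWEETS
}
```

With these counts the sketch yields 62.72, 70.58, 60.93 and 51.21 for state, city, suburb and street, matching the OzCT row of the table.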
Time Analysis
The comparison of geotagging time for the three systems is shown in Table 6.6, which explains some of the behaviour of each system described above. We summarise our findings as follows.
1. The total geotagging time of the OzCT geotagger is nearly twice that of the other two geocoding systems.
2. The time to geotag a single tweet stays constant for AlchemyAPI NER and Yahoo! GeoPlanet regardless of the number of tweets, whereas in the OzCT geotagger it decreases as the number of tweets increases. As a result, the total geotagging time in our method grows more slowly than the number of tweets.
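The two findings above follow directly from the per-tweet times in Table 6.6. The following sketch recomputes the totals and the relative slowdown of the OzCT geotagger; the helper names `total_time` and `slowdown` are illustrative and not part of the original system.

```python
# Per-tweet geotagging time in seconds, keyed by batch size (Table 6.6)
PER_TWEET = {
    "AlchemyAPI NER": {1: 0.83, 10: 0.83, 100: 0.83, 500: 0.83},
    "Yahoo! GeoPlanet": {1: 0.91, 10: 0.91, 100: 0.91, 500: 0.91},
    "OzCT Geotagger": {1: 1.64, 10: 1.64, 100: 1.52, 500: 1.43},
}


def total_time(system: str, n_tweets: int) -> float:
    """Total time in seconds: per-tweet time multiplied by batch size."""
    return round(PER_TWEET[system][n_tweets] * n_tweets, 2)


def slowdown(n_tweets: int, baseline: str) -> float:
    """How many times slower OzCT is than the baseline for a batch."""
    ozct = total_time("OzCT Geotagger", n_tweets)
    return round(ozct / total_time(baseline, n_tweets), 2)
```

For example, `slowdown(1, "AlchemyAPI NER")` gives 1.98 and `slowdown(500, "AlchemyAPI NER")` gives 1.72, so the gap narrows as the batch grows.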
The results can be interpreted as follows. In designing the OzCT geotagger, we placed more emphasis on street and suburb detection by applying heuristic rules at the toponym resolution phase, whereas most geocoding systems consider an identified location entity a success and do not query for more detailed results. This could be because such systems are designed to detect the most detectable location in unstructured data and do not necessarily seek more detail once a geolocation match is identified. The OzCT geotagger, by contrast, aims to detect location references in the tweet content at the highest granularity. This explains why our method is almost two times slower than those systems: it spends time on additional calculations.
Another explanation for this difference in behaviour could be that Yahoo! GeoPlanet and AlchemyAPI are not designed exclusively to identify locations in tweet content, given the limitations of tweets (for example, short length and noisy content). This may be why the geotagging time per tweet stays constant for any number of tweets in these systems, AlchemyAPI NER (0.83 sec) and Yahoo! GeoPlanet (0.91 sec): they most likely perform the same location detection process for every tweet. In the OzCT geotagger, by comparison, the geotagging time per tweet decreases over time because our method performs location detection in a more intelligent way. For example, the OzCT geotagger stores the detected location keywords, together with the results of the gazetteer searches, in a map at run time. When the same location keyword(s) appear in later tweets, the program first searches the map before querying the gazetteers, which reduces the overall geotagging time. This is also shown in Algorithm 4.1, in the second IF-ELSE statement.
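The map-before-gazetteer lookup described above can be illustrated with a minimal sketch. It is not a reproduction of Algorithm 4.1: the class `CachedGeotagger`, the stand-in `fake_gazetteer` and its data are all hypothetical, and the keyword normalisation (trimming and lower-casing) is an assumption.

```python
from typing import Callable, Dict, Optional

Location = Optional[dict]


class CachedGeotagger:
    """Checks an in-memory map before querying the gazetteer."""

    def __init__(self, gazetteer_lookup: Callable[[str], Location]) -> None:
        self._lookup = gazetteer_lookup
        self._cache: Dict[str, Location] = {}
        self.gazetteer_calls = 0  # counts the (slow) gazetteer queries

    def resolve(self, keyword: str) -> Location:
        key = keyword.strip().lower()  # assumed normalisation
        if key in self._cache:
            # Keyword seen in an earlier tweet: reuse the stored result
            return self._cache[key]
        # First occurrence: query the gazetteer and remember the answer,
        # including "not found" (None), so it is not queried again
        self.gazetteer_calls += 1
        result = self._lookup(key)
        self._cache[key] = result
        return result


def fake_gazetteer(keyword: str) -> Location:
    """Stand-in for a real gazetteer query; the data is made up."""
    known = {"melbourne": {"name": "Melbourne", "level": "city"}}
    return known.get(keyword)


tagger = CachedGeotagger(fake_gazetteer)
results = [tagger.resolve(k) for k in ["Melbourne", "melbourne ", "Atlantis"]]
```

Here the second occurrence of the keyword is served from the map, so only two gazetteer queries are made for three lookups. With a real gazetteer, each avoided query saves a network round trip, which is consistent with the falling per-tweet time in Table 6.6.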
We expect that the results would be different if the OzCT geotagger were evaluated against geocoding systems designed specifically for tweets; to the best of our knowledge, no such tweet geotagging system exists so far.
Finally, we compared the precision of the OzCT geotagger in detecting the three types of location with the accuracy of its geographical focus, particularly for detecting a definite location, as shown in Figure 6.3. The comparison indicates that almost two thirds of the testing dataset is geotagged as a definite location. Additionally, the OzCT geotagger is capable of location detection at different granularities (for example, state, city, suburb and street) for almost a quarter of the datasets with definite locations. These figures are based on the 500 sample tweets, and the results might differ for a dataset of larger or smaller scale. However, given the nature of the testing dataset used in this research (a sample of 500 tweets out of over 22,000), we expect the overall behaviour of the OzCT geotagger to be similar for other tweet collections.