The MediaEval Placing Task 2014 (ME2014PT) required participants to build systems that automatically assign geographical coordinates (latitude and longitude) to Flickr photos and videos using one or more of the following data: Flickr metadata, visual content, audio content, and social information (see Choi et al. (2014) for more details about this evaluation). The ME2014PT training data consisted of 5,000,000 geotagged photos and 25,000 geotagged videos, and the test data consisted of 500,000 photos and 10,000 videos. These data were extracted from the YFCC100M¹¹ (Yahoo Flickr Creative Commons 100M) dataset (Thomee et al., 2015), which contains 99.3 million images and 0.7 million videos.
6.5.5.1 Official Experiments at MediaEval Placing Task 2014
A set of four experiments was designed for the ME2014PT main-task test set of 510,000 Flickr photos and videos (see Table 6.13 for a description of the experiments, and Figure 6.12 and Table 6.14 for the results):
1. The experiment run1 used the HLM approach with Re-Ranking up to 100 km, with the metadata of the MediaEval 2014 training set as training data. From the 5,025,000 photos and videos of the MediaEval 2014 training set, 3,057,718 unique coordinate pairs with their associated metadata were built as textual documents and then indexed with Terrier.
2. The experiment run3 used the GeoKB approach.
3. The experiment run4 used the GeoFusion approach with the MediaEval training corpus.
4. The experiment run5 used the GeoFusion approach with the MediaEval training corpus, combined with the HLM model built from the English Wikipedia georeferenced pages.
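Run1 and the other IR-based runs rely on a Re-Ranking step with a 100 km threshold whose exact procedure is not detailed in this section. The sketch below is only a hypothetical illustration of distance-based re-ranking: it assumes the IR engine returns scored candidate coordinates, and it promotes the top-ranked candidate whose neighbourhood within the threshold accumulates the most retrieval score. The function names, the `top_k` cut-off and the score-summing rule are assumptions, not the method used in the official runs.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical candidate format: (latitude, longitude, retrieval_score).
Candidate = Tuple[float, float, float]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rerank_by_neighbourhood(candidates: List[Candidate],
                            threshold_km: float = 100.0,
                            top_k: int = 10) -> Optional[Tuple[float, float]]:
    """Assumed rule: among the top_k retrieved coordinates, pick the one whose
    neighbours within threshold_km (itself included) have the largest summed score."""
    pool = sorted(candidates, key=lambda c: c[2], reverse=True)[:top_k]
    best: Optional[Tuple[float, float]] = None
    best_support = float("-inf")
    for lat, lon, _ in pool:
        support = sum(score for (la, lo, score) in pool
                      if haversine_km(lat, lon, la, lo) <= threshold_km)
        # Strict '>' keeps the higher-ranked candidate when two supports tie.
        if support > best_support:
            best, best_support = (lat, lon), support
    return best


# Usage: one isolated high-scoring hit versus two nearby hits in Barcelona.
hits = [(48.8566, 2.3522, 5.0), (41.3874, 2.1686, 3.0), (41.4036, 2.1744, 2.5)]
print(rerank_by_neighbourhood(hits))  # prints (41.3874, 2.1686)
```

Here the Barcelona pair supports each other (5.5 in total) and overtakes the single Paris hit (5.0); with a 1 km threshold the pair is no longer grouped and the top-ranked Paris hit would be kept.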
Table 6.13: MediaEval Placing Task 2014 Experiments.
run Approach
run1 IR (HLM) Re-Rank (100km)
run3 GeoKB
run4 GeoFusion: IR (HLM) Re-Rank (100km) + GeoKB
run5 GeoFusionWiki: IR (HLM) Re-Rank (100km) + GeoKB + GeoWiki
6.5. Experiments Georeferencing Informal Documents 159
Table 6.14: Official TALP-UPC results at MediaEval Placing Task 2014. Percentage of photos/videos correctly georeferenced within each distance margin, and median error, for each run.
Margin              run1    run3    run4    run5
10m                 0.29    0.08    0.23    0.23
100m                4.12    0.80    3.00    3.00
1km                 16.54   10.71   15.90   15.90
10km                34.34   33.89   38.52   38.53
100km               51.06   42.35   52.47   52.47
1000km              64.67   52.54   65.87   65.86
5000km              78.63   69.84   79.29   79.28
Median error (km)   83.98   602.21  64.36   64.41
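The figures in Table 6.14 are the share of test items whose predicted coordinates fall within a given distance of the true coordinates, plus the median of those distances. A minimal sketch of both measures, assuming great-circle (haversine) distances on a spherical Earth of radius 6,371 km; the official MediaEval scorer may compute distances differently.

```python
import math
from statistics import median
from typing import Dict, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption
# Distance margins of Table 6.14, in kilometres (10 m up to 5,000 km).
MARGINS_KM = (0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 5000.0)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def evaluate(predicted: Sequence[Tuple[float, float]],
             gold: Sequence[Tuple[float, float]],
             margins_km: Sequence[float] = MARGINS_KM) -> Tuple[Dict[float, float], float]:
    """Return (percentage of items within each margin, median error in km)."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predictions and gold coordinates must be non-empty and aligned")
    errors = [haversine_km(p[0], p[1], g[0], g[1]) for p, g in zip(predicted, gold)]
    accuracy = {m: 100.0 * sum(e <= m for e in errors) / len(errors) for m in margins_km}
    return accuracy, median(errors)


# Usage: errors of 0 km, about 111 km and about 5,560 km.
pred = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
gold = [(0.0, 0.0), (0.0, 1.0), (0.0, 50.0)]
acc, med = evaluate(pred, gold)
print(round(acc[1000.0], 2), round(med, 1))  # prints 66.67 111.2
```

Because accuracy is cumulative over margins, each column of Table 6.14 can only grow (or stay equal) from 10 m to 5,000 km, while the median error summarises the whole error distribution in a single number.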
Figure 6.12: Official TALP-UPC results at MediaEval Placing Task 2014. Accuracy against margin of error in km.
[Plot: accuracy (0% to 100%) versus margin of error (0.01 km to 5,000 km) for run1, run3, run4 and run5.]
160 Chapter 6. Textual Georeferencing Approaches
6.5.5.2 Experiments after MediaEval Placing Task 2014
The approaches were tested with three training corpora (see Table 6.15): 1) the ME2014PT training dataset; 2) YFCC100M_A, the YFCC100M geotagged dataset (47,959,829 geotagged items) restricted to items that are not contained in the test set; and 3) YFCC100M_B, the YFCC100M geotagged dataset further restricted to items that do not belong to any user of the test set (44,000,224 geotagged items). From the ME2014PT training dataset and the two YFCC100M geotagged datasets we extracted all unique coordinates with associated text: 2,741,717, 11,382,289, and 11,253,099 coordinates, respectively.
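The three training variants and the coordinate documents extracted from them can be sketched as follows. This is a hedged illustration: the record fields (`item_id`, `user_id`, `lat`, `lon`, `text`) and the exact-match grouping of coordinates are assumptions about the data layout, not the thesis's implementation. Item-level filtering yields a YFCC100M_A-style corpus, additional user-level filtering yields a YFCC100M_B-style corpus, and each unique coordinate pair with non-empty text becomes one document.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Item:
    """Hypothetical record layout for one geotagged photo or video."""
    item_id: str
    user_id: str
    lat: float
    lon: float
    text: str  # concatenated metadata (e.g. title, tags, description)


def filter_training(train: Iterable[Item], test: Iterable[Item],
                    drop_test_users: bool) -> List[Item]:
    """YFCC100M_A-style filter when drop_test_users is False, YFCC100M_B-style when True."""
    test = list(test)
    test_ids = {t.item_id for t in test}
    test_users = {t.user_id for t in test}
    kept = []
    for item in train:
        if item.item_id in test_ids:
            continue  # both variants: never train on a test item
        if drop_test_users and item.user_id in test_users:
            continue  # B only: drop every item of a user who appears in the test set
        kept.append(item)
    return kept


def coordinate_documents(items: Iterable[Item]) -> Dict[Tuple[float, float], str]:
    """One textual document per unique (lat, lon) pair; coordinates without text are skipped."""
    texts = defaultdict(list)
    for item in items:
        if item.text.strip():
            texts[(item.lat, item.lon)].append(item.text.strip())
    return {coord: " ".join(parts) for coord, parts in texts.items()}


train = [
    Item("1", "u1", 41.38, 2.17, "barcelona sagrada familia"),
    Item("2", "u2", 41.38, 2.17, "gaudi"),
    Item("3", "u3", 48.85, 2.35, ""),
    Item("4", "u4", 40.41, -3.70, "retiro madrid"),
]
test = [Item("9", "u1", 40.42, -3.70, "plaza mayor")]
docs_a = coordinate_documents(filter_training(train, test, drop_test_users=False))
docs_b = coordinate_documents(filter_training(train, test, drop_test_users=True))
print(docs_a[(41.38, 2.17)])  # prints barcelona sagrada familia gaudi
print(docs_b[(41.38, 2.17)])  # prints gaudi
```

In this toy example user `u1` also owns a test item, so the B-style filter removes that user's training text; this mirrors the user-level leakage discussed later for YFCC100M_A.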
Table 6.15: Features of the training corpora used in the official and post-evaluation experiments.
Training Corpus #items #unique_coordinates #coordinates_with_text #users
MEPT2014 5,025,000 3,057,718 2,741,717 172,024
YFCC100M_A 47,959,829 12,578,450 11,382,289 212,877
YFCC100M_B 44,000,224 11,619,425 11,253,099 205,988
Two sets of experiments were performed:
1. Official experiments with the ME2014PT training dataset only, and post-evaluation experiments with and without the use of gazetteers. The results of these experiments are shown in Table 6.16. The official run1 at the benchmark used the HLM model with a 100 km distance threshold for Re-Ranking, and it achieved the best official accuracies at long distances (1,000 km and 5,000 km). It is worth noting that no system in the benchmark performed best at all distances. In the experiments with the ME2014PT training dataset, the GeoFusion approaches achieved the best results at ranges from 10 km to 5,000 km, clearly outperforming the GeoKB, IR, and IR with Re-Ranking approaches. GeoFusion performs best at these ranges because it combines high-precision rules based on Toponym Disambiguation heuristics with the predictions of an IR model, which are used when these rules are not activated. When the rules are activated (144,074 of the 510,000 test items), they reach 87.37% accuracy within 100 km (125,878 of 144,074 items). By contrast, the HLM IR model with Re-Ranking trained on the ME2014PT training set achieved 78.34% accuracy within 100 km on this same subset of 144,074 items. The HLM approach with Re-Ranking obtained the best results at distances from 10 m to 1 km because it captures highly descriptive, unique non-geographical keywords and place names that appear in the metadata associated with the geographical coordinates but are not present in the gazetteer. The variant that uses the English Wikipedia georeferenced pages to handle difficult cases does not generally perform better than the original GeoFusion approach.
Table 6.16: Results of run1 at ME2014PT (provided training dataset only) and post-evaluation experiments (without and with gazetteers). Accuracy (%) within each distance margin.
System                                  10m     100m    1km     10km    100km   1000km  5000km
Benchmark results
CEALIST (Popescu et al., 2014)          0.01    0.61    22.62   40.00   47.36   61.17   74.94
RECOD (L. Li et al., 2014)              0.55    6.06    21.04   37.59   46.14   61.69   76.76
SonSensCERTH (Kordopatis et al., 2014)  0.50    5.85    23.02   39.92   46.87   60.11   74.80
UQ-DKE (Cao et al., 2014)               1.07    4.98    19.57   41.71   52.46   63.61   77.28
USEMP (Popescu et al., 2014)            0.78    1.61    23.48   40.77   48.11   61.79   75.30
ICSI/TUDelft (Choi and X. Li, 2014)     0.24    3.15    16.65   34.70   45.58   60.67   75.03
TALP-UPC¹² (Ferrés et al., 2014)        0.29    4.12    16.54   34.34   51.06   64.67   78.63
Post-evaluation experiments
GeoKB                                   0.07    0.89    11.31   34.44   42.26   48.45   58.32
HLM top-ranked                          0.46    5.58    20.07   37.17   46.34   60.40   75.59
HLM@10km                                0.29    4.18    17.35   41.99   50.97   63.38   77.91
HLM@1km                                 0.30    4.65    24.03   41.10   49.53   62.20   75.79
[email protected]                       0.46    7.20    22.29   38.37   46.86   60.10   74.59
TFIDF@100km                             0.29    4.21    16.84   34.32   50.15   63.52   77.69
BM25@100km                              0.29    4.24    17.01   34.63   50.60   63.88   77.93
HLM@100km+GeoKB                         0.25    3.25    16.82   39.71   53.61   66.78   80.06
HLM@10km+GeoKB                          0.26    3.32    17.30   43.48   53.47   65.67   79.47
HLM@1km+GeoKB                           0.25    3.56    20.74   42.80   52.36   64.76   77.48
[email protected]+GeoKB                 0.35    5.03    19.69   40.95   50.53   63.22   76.58
TFIDF@100km+GeoKB                       0.25    3.19    16.72   39.34   53.07   66.10   79.39
BM25@100km+GeoKB                        0.25    3.21    16.83   39.53   53.31   66.30   79.52
HLM@100km+GeoKB+Wiki                    0.25    3.25    16.82   39.72   53.61   66.77   80.05
2. Official experiments in which external data and gazetteers were allowed, and post-evaluation experiments with the YFCC100M geotagged dataset. The results and details of these experiments are shown in Table 6.17. In these experiments the official results were weaker: they only reached the median of all participants at distances greater than 10 km. In this setting the CEALIST and USEMP (Popescu et al., 2014) systems¹³ obtained the best results. On the other hand, the GeoFusion approaches trained with YFCC100M_A improve on the IR models only slightly, and only at the 1,000 km and 5,000 km margins. The results obtained with the YFCC100M_A geotagged dataset as training data lead to the following conclusions: 1) with YFCC100M_A data, the Data-Driven approach outperforms the GeoKB approach; 2) although the YFCC100M_A geotagged dataset used in this study excluded the items appearing in the test set, some users with items in the test set may also have items in the training set, and this could give the IR model an advantage by modeling each user's particular way of tagging (M. Larson et al., 2015). Compared with the other participants, the IR with Re-Ranking and GeoFusion approaches achieved state-of-the-art results in the ME2014PT evaluation. The HLM with Re-Ranking approach obtained the best accuracies at 1,000 km and 5,000 km in the task where only the official training data could be used. In post-evaluation experiments using the YFCC100M_A geotagged dataset, the IR with Re-Ranking and GeoFusion approaches outperformed the best official results at the 10 m and 100 m margins, with HLM@100km reaching 20.63% and 26.64% respectively.
A final experiment was carried out to assess the effect of filtering out every user of the YFCC100M training set who also has items in the test set. It used the IR HLM with Re-Ranking approach at 100 km, trained on the YFCC100M_B geotagged dataset, from which those users had been removed. The results differ substantially from those of the same algorithm trained on YFCC100M_A: for example, accuracy drops from 20.63% to 0.36% at 10 m and from 68.52% to 51.31% at 100 km. These results therefore seem to confirm the observation of M. Larson et al. (2015) that training on the YFCC100M_A dataset can lead the model to capture users' particular ways of tagging.
¹² This run used the HLM@100km approach (Re-Ranking at 100 km).
¹³ In these official experiments the CEALIST and USEMP systems were trained with the YFCC100M_A geotagged
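The fusion logic described for GeoFusion in item 1 (use the high-precision Toponym Disambiguation rules when they fire, otherwise fall back to the IR prediction) can be sketched as follows. `rule_predict` and `ir_predict` are hypothetical stand-ins: the actual rules and the HLM model are not reproduced here. The snippet also recomputes the coverage and accuracy of the rule-activated subset from the counts reported for the 510,000 test items.

```python
from typing import Callable, Optional, Tuple

Coord = Tuple[float, float]


def geofusion_predict(text: str,
                      rule_predict: Callable[[str], Optional[Coord]],
                      ir_predict: Callable[[str], Coord]) -> Coord:
    """Prefer the high-precision rule output; fall back to the IR model when no rule fires."""
    coord = rule_predict(text)
    return coord if coord is not None else ir_predict(text)


# Counts reported for the ME2014PT test set.
TEST_SIZE = 510_000
RULE_ACTIVATED = 144_074      # items on which the rules fired
RULE_CORRECT_100KM = 125_878  # of those, correct within 100 km

rule_accuracy = 100.0 * RULE_CORRECT_100KM / RULE_ACTIVATED  # about 87.37 %
rule_coverage = 100.0 * RULE_ACTIVATED / TEST_SIZE           # about 28.25 %


# Toy stand-ins for the two components (hypothetical, for illustration only).
def toy_rules(text: str) -> Optional[Coord]:
    return (41.4036, 2.1744) if "sagrada familia" in text.lower() else None


def toy_ir(text: str) -> Coord:
    return (48.8566, 2.3522)


print(geofusion_predict("Sagrada Familia at night", toy_rules, toy_ir))  # prints (41.4036, 2.1744)
print(round(rule_accuracy, 2), round(rule_coverage, 2))  # prints 87.37 28.25
```

The arithmetic shows why the fusion helps at the larger margins: the rules fire on roughly 28% of the test items, and on that subset they are about nine points more accurate within 100 km than the HLM model with Re-Ranking (87.37% versus 78.34%).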
Table 6.17: Overall best official results of the ME2014 runs (anything allowed except crawling the exact items of the test set) and post-evaluation experiments (training with the YFCC100M geotagged dataset). Accuracy (%) within each distance margin.
System                                  10m     100m    1km     10km    100km   1000km  5000km
Benchmark results
CEALIST (Popescu et al., 2014)          0.01    1.22    40.25   55.98   62.26   72.14   81.95
RECOD (L. Li et al., 2014)¹⁴            0.59    6.26    21.15   37.50   46.03   61.41   75.07
SonSensCERTH (Kordopatis et al., 2014)  0.50    5.85    23.02   39.92   46.87   60.11   74.80
UQ-DKE (Cao et al., 2014)               1.08    5.05    20.23   43.68   56.03   69.08   81.14
USEMP (Popescu et al., 2014)            2.56    4.33    44.14   61.34   69.10   78.69   86.52
ICSI/TUDelft (Choi and X. Li, 2014)     0.32    3.41    12.13   19.95   22.82   33.79   53.06
TALP-UPC¹⁵ (Ferrés et al., 2014)        0.23    3.00    15.90   38.52   52.47   65.87   79.29
Post-evaluation experiments: training with YFCC100M_A geotagged photos/videos
HLM@100km                               20.63   26.64   40.65   56.13   68.52   76.60   84.76
BM25@100km                              19.96   26.10   40.30   55.80   68.30   76.72   85.69
TFIDF@100km                             19.84   25.97   40.11   55.57   68.06   76.54   85.56
HLM@100km+GeoKB                         13.72   18.14   32.62   54.53   67.49   77.05   86.10
BM25@100km+GeoKB                        13.20   17.64   32.16   54.05   67.09   76.83   85.97
TFIDF@100km+GeoKB                       13.12   17.55   32.03   53.88   66.91   76.69   85.87
Post-evaluation experiments: training with YFCC100M_B (geotagged, without users in the test set)
HLM@100km                               0.36    4.53    17.27   34.10   51.31   64.95   78.52
¹⁴ In this run they used both textual and visual features.