
The proposed location NER method identifies only the event location and ignores other locations mentioned in a tweet. For the evaluation, the tweets in the test dataset were used to compare the proposed method with two existing Arabic named entity recognition systems. The first is FARASA (Institute, 2017), which is based on the work of Darwish and Gao (2014); it was adopted because it produces near state-of-the-art results. The second is Polyglot-NER (Al-Rfou, 2017), which implements the work of Al-Rfou et al. (2015). Figure 6-5 shows example results on the test dataset: highlights in blue with a single underline mark a correct event location NE, and highlights in red with a double underline mark an incorrect event location NE.

[Figure 6-5: for each of the three example tweets below, the figure shows the filtered tweet and its tagging by FARASA, Polyglot-NER and our system.]

Example 1: “سيول شرق تربة البقوم شرق الطائف اليوم الاحد ه رصد”

Example 2: “سيول قرية هبت جنوب غرب المدينة الان عضو الفريق المبدع عمار الاحمدي مكشات المدينة طقس”

Example 3: “بالصور امطار وسيول شرق القراصه شرق العيص الثلاثاء”

Figure 6-5 Examples of location NER system results. The markings denote true positive, false positive and …

Recall, precision and F-measure are the measures usually used to assess a system's performance in location NER. They were computed as follows:

\[
\text{Precision} = \frac{\text{number of correct NEs recognised by the system (TP)}}{\text{number of NEs given by the system (TP + FP)}} \qquad (2)
\]

\[
\text{Recall} = \frac{\text{number of correct NEs recognised by the system (TP)}}{\text{number of correct NEs in the corpus (TP + FN)}} \qquad (3)
\]

\[
F\text{-measure} = \frac{2 \times (\text{precision} \times \text{recall})}{\text{precision} + \text{recall}} \qquad (4)
\]

where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives (location NEs in the corpus that the system did not recognise).
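As a minimal sketch of how equations (2) to (4) can be evaluated, the Python snippet below computes the three measures from raw counts, using the standard definitions in which recall is taken over TP + FN (the location NEs actually present in the corpus). The function names are illustrative rather than part of the thesis implementation, and returning 0.0 for an empty denominator is an assumed convention. The counts are those reported for the proposed system on tweet 1 in Table 6-2.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (2): share of the NEs returned by the system that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 on empty denominator is an assumption


def recall(tp: int, fn: int) -> float:
    """Equation (3): share of the NEs in the corpus that the system recognised."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p: float, r: float) -> float:
    """Equation (4): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Counts reported for the proposed system on tweet 1 (Table 6-2).
tp, fp, fn = 2, 1, 0
p = precision(tp, fp)
r = recall(tp, fn)
f1 = f_measure(p, r)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints: precision=0.67 recall=1.00 F1=0.80
```

TN does not enter any of the three measures, which is why a tweet with many non-location words does not inflate the scores.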

The comparative results of the NER systems in Table 6-1 highlight one of the key contributions of this thesis. The table reports the precision, recall and F1 obtained by the three location NER systems on the 100 tweets of the test dataset (see Appendix A “Test data of NER task” for details of the systems' performance), and therefore shows how well each system recognises locations in colloquial Arabic text obtained from Twitter. The proposed system outperformed the other systems, achieving an F1 of 86%. The recall results in Table 6-1 show that the FARASA system performed worst. The main reason is that FARASA is based on cross-lingual links between Arabic and English: it translates Arabic text into English in order to exploit the discriminative features and large resources available for English. As noted by Darwish and Gao (2014), the accuracy of the FARASA system dropped significantly when it was tested on a Twitter dataset. Some of the factors observed are:

1) Some words that would typically be regular words are recognised as location NEs. For example, “سيول”, which means “floods”, is recognised as a location NE referring to Seoul (“سيول - سيئول”), the capital city of South Korea.

2) Some location NEs in the tweet dataset are unknown or uncommon, and when they are translated into English the translator treats them as regular words.
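The two factors above can be illustrated with a deliberately naive sketch. The Python snippet below is not FARASA's mechanism; it is a hypothetical context-free lookup against a toy gazetteer, written only to show how a surface form shared by “floods” and “Seoul” produces a false positive, and how a location absent from the resource is missed. The gazetteer contents and the function name are assumptions made for the example.

```python
FLOODS = "سيول"   # colloquial "floods"; also one Arabic spelling of "Seoul"
TAIF = "الطائف"

# Hypothetical toy gazetteer (surface form -> English name); not FARASA's resources.
GAZETTEER = {FLOODS: "Seoul", TAIF: "Taif"}


def naive_location_lookup(tokens):
    """Flag every token whose surface form appears in the gazetteer, ignoring context."""
    return [(i, GAZETTEER[tok]) for i, tok in enumerate(tokens) if tok in GAZETTEER]


# First six words of the first example tweet in Figure 6-5.
tweet = [FLOODS, "شرق", "تربة", "البقوم", "شرق", TAIF]
hits = naive_location_lookup(tweet)
print(hits)
# prints: [(0, 'Seoul'), (5, 'Taif')]
```

Token 0 is flagged as Seoul although it means “floods” here (factor 1), while “تربة” at position 2 is not in the toy gazetteer and is therefore missed (factor 2).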


The Polyglot-NER system achieved better results than the FARASA system; however, its results are still far below those of the proposed system. Polyglot-NER treats NER as a word-level classification problem and uses Wikipedia to create its named entity training corpus. The main reason the proposed system outperformed Polyglot-NER is that Polyglot-NER aims to recognise all location NEs mentioned in a tweet, whereas the proposed system aims to recognise the event location NE and ignore other location NEs mentioned in the tweet. Moreover, the proposed system considers the neighbouring words and the word order in the sentence (tweet) to distinguish the event location NE from other words.
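As a quick consistency check, the F1 values reported in Table 6-1 for FARASA and Polyglot-NER can be recomputed from their reported precision and recall with equation (4). The sketch below does this in Python; the dictionary layout and names are illustrative only, and the figures are copied from Table 6-1.

```python
def f_measure(precision: float, recall: float) -> float:
    """Equation (4): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, as given in Table 6-1.
reported = {
    "FARASA": (60.39, 23.01, 33.32),
    "Polyglot-NER": (62.68, 47.90, 54.30),
}

# Recompute F1 from precision and recall, rounded like the table.
recomputed = {
    name: round(f_measure(p, r), 2) for name, (p, r, _) in reported.items()
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: reported F1={f1:.2f}, recomputed F1={recomputed[name]:.2f}")
```

Both recomputed values agree with the table to two decimal places, so the reported F1 figures are consistent with the reported precision and recall.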

Table 6-1 Location NER systems results

| System | Precision (%) | Recall (%) | F1 (%) |
| --- | --- | --- | --- |
| FARASA | 60.39 | 23.01 | 33.32 |
| Polyglot-NER | 62.68 | 47.90 | 54.30 |

Table 6-2 presents the performance of each NER system on three example tweets (the tweets are presented in Figure 6-5). For example, the tweet “سيول شرق تربة البقوم شرق الطائف اليوم الاحد ه رصد” is made up of 10 words, and the proposed system returned three words as location NEs: “تربة البقوم الطائف”. Figure 6-5 shows that the location NEs for this tweet are “تربة” and “الطائف”; therefore TP = 2. The proposed system also returned “البقوم” as a location NE, which is incorrect; therefore FP = 1. FN is the number of words that are NEs but that the system failed to recognise; for this example FN = 0. The remaining words are not location NEs and the proposed system correctly left them untagged, so they count as TN (TN = 7). The precision, recall and F1 values for the three location NER systems were calculated from the TP, FP, FN and TN values.
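The word-level counting described above can be sketched as follows. The snippet is an illustration rather than the thesis code: it identifies words by their position in the tweet (so that the repeated word “شرق” is not conflated), and the token order shown is reconstructed from the extracted text and should be treated as an assumption. The function name `confusion_counts` is hypothetical.

```python
def confusion_counts(n_tokens: int, gold: set, predicted: set):
    """Return (TP, FP, FN, TN) for one tweet, given token positions tagged as location NEs."""
    tp = len(gold & predicted)       # tagged by the system and correct
    fp = len(predicted - gold)       # tagged by the system but not a location NE
    fn = len(gold - predicted)       # location NE that the system missed
    tn = n_tokens - tp - fp - fn     # every remaining word, correctly left untagged
    return tp, fp, fn, tn


# Example tweet 1 (10 words); the word order is an assumed reconstruction.
tokens = ["سيول", "شرق", "تربة", "البقوم", "شرق", "الطائف", "اليوم", "الاحد", "ه", "رصد"]
gold = {2, 5}          # positions of "تربة" and "الطائف"
predicted = {2, 3, 5}  # the proposed system also tagged "البقوم"

print(confusion_counts(len(tokens), gold, predicted))
# prints: (2, 1, 0, 7)
```

The four counts always sum to the number of words in the tweet, which provides a simple check on each row of Table 6-2.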

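Table 6-2 Performance of the NER systems on 3 example tweets (TP = True Positive, FP = False Positive, FN = False Negative, TN = True Negative)

| Tweet ID | Location NEs inferred by proposed system | Proposed TP | Proposed FP | Proposed FN | Proposed TN | FARASA TP | FARASA FP | FARASA FN | FARASA TN | Polyglot-NER TP | Polyglot-NER FP | Polyglot-NER FN | Polyglot-NER TN | Words in tweet |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | تربة البقوم الطائف | 2 | 1 | 0 | 7 | 0 | 1 | 2 | 7 | 2 | 0 | 2 | 6 | 10 |
| 3 | قرية هبت المدينة | 3 | 0 | 0 | 12 | 0 | 0 | 3 | 12 | 0 | 3 | 1 | 11 | 15 |
| 67 | القراصه العيص | 2 | 0 | 0 | 7 | 0 | 0 | 2 | 7 | 1 | 0 | 1 | 7 | 9 |

6.6 Chapter summary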

This chapter has identified the main research challenges in inferring the location NE for high-risk floods from Arabic tweets, and has formulated the research hypotheses tested in the chapter. The flood location NER approach was developed using the L2S method, together with a named entity annotator that uses a BIO tag scheme. The flood location NER task was evaluated through a comparative experiment that measured the performance of the proposed system against the FARASA and Polyglot-NER systems. The experimental results show that the proposed system significantly outperformed the other systems when applied to colloquial Arabic text collected from Twitter. Hence, the null hypotheses have been rejected, and it is accepted that the flood location can be accurately estimated by recognising Arabic named entities, especially the locations and organisations mentioned in the tweets. The next chapter addresses location NEL problems, using the location NEs identified by the proposed NER system.


Chapter 7: Locations Named Entity Linking
