
We have stressed that domain dependency is one of the issues in opinion mining. A common problem is that a word may have a completely different sentiment orientation in different domains [59] [75]. The study in [68] found that achieving good sentiment analysis results requires a domain-specific lexicon that is related to both the entities/aspects and their sentiment expressions. Rastogi [82] stated that incorporating the information in domain-specific lexicons can bring about a drastic improvement in sentiment analysis accuracy.

These lexicons are sometimes created manually, either by hand annotation [66] [75] or by using web searches [82]. Alternatively, a seed list of common words is compiled by hand and then grown through bootstrapping from a lexicon such as WordNet [38] [115].

Polarity scores can be determined manually, but manual scoring relies on human intuition. It is therefore subject to bias and can be influenced by the annotator's education and cultural background [32]. Manually built lexicons are also constrained to a small number of terms and are time-consuming to create [67]. Because they rely on human intervention, they are costly and therefore unsuitable for processing large volumes of data [32].

To address these issues, we propose a novel semi-automatic approach to generating a domain-specific lexicon, based on the words present in our corpus and how they are used.

6.6.1 Seed Word Extraction

One of the key aspects of developing a domain-specific lexicon is the seed word selection process. Our intention is to determine this list semi-automatically, without any interference from human subjects.

To achieve this aim, we use WLLS because of its ability to capture a term's relevance with respect to a given class.

Our motivation is to create an approach in which emotive words are classified according to the sentiment they express in the domain in focus. WLLS has been shown to work well as a tool for selecting informative features.

We extract the top 100 positive emotive words and the top 100 negative emotive words. By emotive words, we mean words whose part-of-speech tag is adjective, adverb, or verb. The figure of 100 is somewhat arbitrary, but it follows the number of emotive seed words chosen in [115].

These 200 words form our seed word list, which is used in the bootstrapping approach to generate our domain-specific lexicon.
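The selection step can be illustrated with a minimal sketch. It assumes that a relevance score (such as WLLS) has already been computed for every word with respect to each class, and that each word carries a part-of-speech label. The function name `select_seed_words`, the tag labels, and the toy scores are all hypothetical and are not taken from our implementation.

```python
# Tag labels are an assumption; any tagger's labels can be mapped onto these.
EMOTIVE_TAGS = {"ADJ", "ADV", "VERB"}


def select_seed_words(class_scores, pos_tags, k=100):
    """Return the k highest-scoring emotive words for one sentiment class.

    class_scores: dict mapping word -> precomputed relevance score for the class
    pos_tags:     dict mapping word -> part-of-speech label
    """
    # Keep only adjectives, adverbs and verbs, then rank by descending score.
    emotive = [w for w in class_scores if pos_tags.get(w) in EMOTIVE_TAGS]
    return sorted(emotive, key=lambda w: class_scores[w], reverse=True)[:k]


# Toy, invented data; k is reduced to 2 to keep the example short.
pos_tags = {"spacious": "ADJ", "hotel": "NOUN", "loved": "VERB",
            "quickly": "ADV", "dirty": "ADJ"}
positive_scores = {"spacious": 0.9, "hotel": 1.5, "loved": 0.7,
                   "quickly": 0.2, "dirty": -0.4}
negative_scores = {"spacious": -0.5, "hotel": 1.1, "loved": -0.6,
                   "quickly": 0.1, "dirty": 0.8}

positive_seeds = select_seed_words(positive_scores, pos_tags, k=2)
negative_seeds = select_seed_words(negative_scores, pos_tags, k=2)
seed_words = positive_seeds + negative_seeds
print(seed_words)
```

In this toy run the noun "hotel" is excluded despite its high score, because only adjectives, adverbs and verbs count as emotive words.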

6.6.2 Bootstrapping

Our bootstrapping approach uses the SentiWordNet lexicon. It involves extracting the synonyms of the emotive seed words from SentiWordNet and using them to populate the domain-specific lexicon.

Hu and Liu [38] took a similar approach in their work on mining and summarizing product reviews. They determined the semantic orientation of adjectives with a simple but effective method that used the adjective synonym and antonym sets in WordNet, on the premise that adjectives share the same orientation as their synonyms. Mining WordNet for a domain-specific lexicon is also done in [115] and [28].

Our approach differs from the above in that we use SentiWordNet, a lexicon that was generated for sentiment analysis. We also use the subjectivity lexicon as a supporting lexicon. Fei et al. [28], who generated two lexicons, one from the first sense of a seed word in WordNet and another from all of its senses, did not carry out sentiment classification.

To the best of our knowledge, our bootstrapping approach also differs from what has been reported in the literature.

In our approach, bootstrapping is carried out by looking up each word of the seed list in SentiWordNet and checking all of its synsets under each part of speech. The score of the word in each synset is examined. If the positive score is the highest in the score triple, the synonyms in the synset marked #1 and #2 are selected and added to the seed list. Entries from #3 onwards are ignored. The same process is repeated for the negative words, with the negative score required to be the highest. This gives rise to two lists, one of positive words and the other of negative words. Any word that is not found in SentiWordNet but exists in the dataset as an emotive word is looked up in the subjectivity lexicon.

Data: List of top 100 positive emotive words (A) and list of top 100 negative emotive words (B)

Result: Domain-specific lexicon

Posscore is the positive score, Negscore is the negative score and Objscore is the objective score

    for word i in A do
        if i in SentiWordNet then
            if POS(i) = Adjective or Adverb or Verb then
                for synsets 1 to n do
                    if Posscore > Negscore AND Posscore > Objscore then
                        select word#1 AND word#2
                        add to List A
                        move to next word
                    end
                end
            end
        else
            add word to Temp
        end
    end

    for word j in B do
        if j in SentiWordNet then
            if POS(j) = Adjective or Adverb or Verb then
                for synsets 1 to n do
                    if Negscore > Posscore AND Negscore > Objscore then
                        select word#1 AND word#2
                        add to List B
                        move to next word
                    end
                end
            end
        else
            add word to Temp2
        end
    end

    for each word in Temp and Temp2 do
        look up word in the subjectivity lexicon
        if word in Lexicon then
            if word is positive then add to List A
            if word is negative then add to List B
        else
            discard word
        end
    end
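The procedure above can also be sketched in Python. This is a minimal illustration under stated assumptions, not our implementation. SentiWordNet is replaced by a small in-memory list of hypothetical `Synset` records, the subjectivity lexicon by a plain dictionary, and every word and score shown is invented. The sketch examines every synset of a seed word, following the prose description, and makes a single pass over the seeds, so newly added synonyms are not expanded further. The names `Synset`, `bootstrap` and `build_lexicon` are likewise hypothetical.

```python
from dataclasses import dataclass

EMOTIVE_POS = {"a", "r", "v"}  # adjective, adverb, verb


@dataclass(frozen=True)
class Synset:
    pos: str          # part of speech of the synset
    pos_score: float
    neg_score: float
    terms: tuple      # (word, sense_number) pairs, e.g. ("good", 1) for good#1

    @property
    def obj_score(self):
        # In SentiWordNet the objective score is 1 - (positive + negative).
        return 1.0 - self.pos_score - self.neg_score


def bootstrap(seeds, synsets, polarity):
    """Expand seeds with #1/#2 synonyms from synsets dominated by `polarity`.

    Returns (expanded_list, temp), where temp holds seeds absent from the lexicon.
    """
    by_word = {}
    for syn in synsets:
        for word, _ in syn.terms:
            by_word.setdefault(word, []).append(syn)

    expanded, temp = [], []
    for seed in seeds:
        if seed not in by_word:
            temp.append(seed)          # resolved later via the subjectivity lexicon
            continue
        if seed not in expanded:
            expanded.append(seed)
        for syn in by_word[seed]:
            if syn.pos not in EMOTIVE_POS:
                continue
            own, other = syn.pos_score, syn.neg_score
            if polarity == "negative":
                own, other = other, own
            if own > other and own > syn.obj_score:
                for term, sense in syn.terms:
                    if sense <= 2 and term not in expanded:
                        expanded.append(term)
    return expanded, temp


def build_lexicon(pos_seeds, neg_seeds, synsets, subjectivity):
    list_a, temp = bootstrap(pos_seeds, synsets, "positive")
    list_b, temp2 = bootstrap(neg_seeds, synsets, "negative")
    for word in temp + temp2:
        label = subjectivity.get(word)  # None: word is discarded
        if label == "positive" and word not in list_a:
            list_a.append(word)
        elif label == "negative" and word not in list_b:
            list_b.append(word)
    return list_a, list_b


# Toy, invented entries standing in for SentiWordNet and the subjectivity lexicon.
toy_synsets = [
    Synset("a", 0.75, 0.0, (("good", 1), ("beneficial", 2), ("salutary", 3))),
    Synset("a", 0.125, 0.0, (("good", 2), ("solid", 1))),   # objective dominates
    Synset("a", 0.0, 0.625, (("bad", 1), ("poor", 2))),
]
toy_subjectivity = {"comfy": "positive", "pricey": "negative"}

positive, negative = build_lexicon(
    ["good", "comfy", "roomy"], ["bad", "pricey"], toy_synsets, toy_subjectivity
)
print(positive)
print(negative)
```

In this toy run, "salutary" is ignored because it is entry #3, "solid" is skipped because the objective score dominates its synset, and "roomy" is discarded because it appears in neither lexicon.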
