
ANSWERS TO THE RESEARCH QUESTIONS

Based on the results of the systematic study of different aspects of methodological design presented in the previous section, this section examines three strategies to boost the performance of automatic user profiling when only a single message per user is available: a feature union approach (Section 4.5.1), a balancing strategy (Section 4.5.2) and a cross-task classification approach (Section 4.5.3).

4.5.1 Combining Features into Complex Models

Because the experiments based on token unigram and character 4-gram features showed the best results for both age groups, in the next step we combined each of these two types with other single feature types. Interestingly, as can be seen in Figures 4.2 and 4.3, only one combination was able to top the performance of the BOW features for both categories: when character 4-grams were merged with sociolinguistic features, the NB classifier achieved an 83.3% precision, a 56.2% recall and a 67.1% F-score for the adult class and a 91.5% precision, a 97.7% recall and a 94.5% F-score for the adolescent age group. Threefold combinations of single feature types did not produce higher results for either category.

Similar to the feature union experiments for age group detection described above, the best single feature types were merged with other types to examine which combinations could boost the performance of the gender classifier. Again, the best results were achieved when combining character tetragrams with the sociolinguistic features introduced in this chapter, resulting in a 47.8% precision, a 45.1% recall and a 46.6% F-score for the male category. With regard to identifying females, the BOW model still outperformed all other combinations. An overview of the performance of the different (combinations of) feature types is presented in Figures 4.4 and 4.5.
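The feature union described above can be illustrated with a minimal sketch. It is not the implementation used in this study: the three sociolinguistic features below (emoticon count, all-caps tokens, exclamation marks) are hypothetical stand-ins for the feature set of Section 4.3.2, and the `C4_`/`SOC_` prefixes are an assumed naming convention that keeps the two feature spaces apart.

```python
from collections import Counter

# Hypothetical emoticon list, used for illustration only.
EMOTICONS = {":)", ":-)", ":(", ";)", ":p", "xd"}


def char_ngrams(text, n=4):
    """Count overlapping character n-grams (n=4 gives character tetragrams)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def sociolinguistic_features(text):
    """Hypothetical stand-ins for the sociolinguistic features of Section 4.3.2."""
    tokens = text.split()
    return {
        "SOC_emoticons": sum(tok.lower() in EMOTICONS for tok in tokens),
        "SOC_all_caps_tokens": sum(tok.isupper() and len(tok) > 1 for tok in tokens),
        "SOC_exclamations": text.count("!"),
    }


def feature_union(text):
    """Merge both feature types into a single feature dictionary."""
    features = {"C4_" + gram: count for gram, count in char_ngrams(text.lower()).items()}
    features.update(sociolinguistic_features(text))
    return features


example = feature_union("hey :) LOL!!")
```

The resulting dictionary can be passed to any classifier that accepts sparse count features, such as a Multinomial Naïve Bayes model.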

Figure 4.2: Precision, recall and F-score for age prediction per feature type (combination) for the PLUS20 class.

Figure 4.3: Precision, recall and F-score for age prediction per feature type (combination) for the

Figure 4.4: Precision, recall and F-score for gender prediction per feature type (combination) for the MALE class.

Figure 4.5: Precision, recall and F-score for gender prediction per feature type (combination) for

4.5.2 Balancing the Data

To reflect reality as closely as possible, all learning experiments up to this point adopted a highly skewed data distribution. The results discussed above showed that it is feasible to improve upon the baseline performance for the minority classes that are the focus of this study (i.e., male adults), based on a heavily imbalanced dataset containing only a single message per user. The series of experiments described in this section investigated whether balancing the training data while maintaining the original skewed distribution in the test data could increase the performance for these categories. Furthermore, experiments on the completely balanced NETLOG_SUBSET2 dataset were included to enable a comparative analysis with prior studies in automatic user profiling (see Section 4.2.1).

The best results for both types of balanced data experiments were again obtained by the Multinomial Naïve Bayes classifier, using a combination of the character 4-grams and sociolinguistic features. Balancing the dataset in each training partition only led to a considerably higher recall score for the minority classes when predicting age and gender compared to the imbalanced data experiments described above, but the precision and F-score decreased for both the PLUS20 and the MALE category.
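Balancing only the training partition can be sketched as follows. The text does not state how the balance was obtained, so random undersampling of the larger classes is an assumption here, and `undersample_training_partition` is a hypothetical helper name. The test partition is simply never passed through this function, so it keeps its original skewed distribution.

```python
import random
from collections import Counter, defaultdict


def undersample_training_partition(instances, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    instances: list of (message, label) pairs from one training partition.
    Assumption: balancing is done by undersampling; the source does not say.
    """
    by_label = defaultdict(list)
    for instance in instances:
        by_label[instance[1]].append(instance)
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    rng.shuffle(balanced)
    return balanced


# Toy training partition with a 9:3 skew towards the adolescent class.
train = [("m%d" % i, "MIN16") for i in range(9)] + [("a%d" % i, "PLUS20") for i in range(3)]
balanced_train = undersample_training_partition(train)
label_counts = Counter(label for _, label in balanced_train)
```

A fixed seed keeps the sampled subset reproducible across runs, which matters when results from different folds are compared.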

Finally, regarding the completely balanced learning experiments, this study's findings show that adding the sociolinguistic features (which were newly introduced in Section 4.3.2) to the more traditional character n-gram features produced an accuracy of 84.1%, which is higher than that reported in prior work for age prediction on blogs (e.g., [73, 116, 186]) and social network postings ([152, 204]), all of which incorporated multiple messages per user in their experiments. With regard to gender prediction, the Naïve Bayes classifier yielded slightly lower results (61.0% accuracy) than those reported by [34], who set up a learning experiment in which a single tweet per user was incorporated. However, the authors of [34] did not include adolescent messages in their dataset.

4.5.3 Combining Age and Gender Prediction

Finally, the third part of this section investigates whether gender meta-data can be a helpful information source in constructing more accurate classifiers for age group detection.

In addition to the binary age classification experiments (see Section 4.4.1), which are referred to as BIN_EXP. in this section, three different approaches to including the meta-data for gender are examined in order to investigate their effect on age group prediction. Given that one of the key objectives of this thesis is to investigate the feasibility of detecting adults posing as adolescents in social network environments, the focus during these experiments lies on the scores for the adult class.

In the first experiment (EXP_1) the data was balanced according to both age and gender in each training partition of the NETLOG_SUBSET2 dataset. Next, the Multinomial Naïve Bayes classifier was retrained, extracting the 50,000 most frequent features (DF) and using binary feature values (see Section 4.4.1). Compared to the results of the imbalanced data experiments, both the recall and the F-score for the PLUS20 category improved, from 56.2% to 81.9% and from 67.1% to 69.8%, respectively. The precision, however, decreased from 83.3% to 60.8%.
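The feature selection and weighting step mentioned above can be sketched in a few lines. This is an illustration under the assumption that "most frequent (DF)" means ranking features by document frequency, i.e. the number of messages in which a feature occurs; the function names are hypothetical.

```python
from collections import Counter


def select_by_document_frequency(documents, k=50_000):
    """Return the k features that occur in the largest number of documents.

    documents: list of feature lists, one list per message.
    """
    document_frequency = Counter()
    for features in documents:
        # set() ensures a feature is counted once per message, not once per occurrence.
        document_frequency.update(set(features))
    return {feature for feature, _ in document_frequency.most_common(k)}


def binarize(features, vocabulary):
    """Binary feature values: 1 if a selected feature is present, however often."""
    return {feature: 1 for feature in set(features) if feature in vocabulary}


docs = [["a", "a", "b"], ["a", "c"], ["a", "b"]]
vocabulary = select_by_document_frequency(docs, k=2)
vector = binarize(["a", "a", "c"], vocabulary)
```

In the toy example, "a" occurs in three messages, "b" in two and "c" in one, so with k=2 the feature "c" is discarded and the repeated "a" is reduced to a single binary value.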

In the second experiment (EXP_2) a three-way NB model was trained on the original, imbalanced dataset, which included the MIN16 category and two complex classes for the older class in which gender was included, namely PLUS20_MALE and PLUS20_FEMALE. Subsequently, the complex classes were reduced to PLUS20 in the classifier's output in order to examine whether the extra gender information the classifier had acquired during training would lead to a better age prediction on the binary test sets. Although the recall dropped slightly to 80.9%, the results of EXP_2 showed an improvement upon the EXP_1 approach with regard to the precision (67.7%) and the F-score (73.7%).
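The reduction of the complex classes in EXP_2 amounts to a simple post-processing step on the classifier's output, sketched below. The predictions and gold labels are invented toy values, not data from this study, and the helper names are hypothetical.

```python
def collapse_to_age(label):
    """Map PLUS20_MALE and PLUS20_FEMALE to PLUS20; leave MIN16 unchanged."""
    return "PLUS20" if label.startswith("PLUS20_") else label


def precision_recall(gold, predicted, target):
    """Precision and recall for one target class, as fractions in [0, 1]."""
    true_pos = sum(g == target and p == target for g, p in zip(gold, predicted))
    predicted_pos = sum(p == target for p in predicted)
    gold_pos = sum(g == target for g in gold)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall


# Toy output of a three-way model, scored against binary gold labels.
three_way_output = ["MIN16", "PLUS20_MALE", "PLUS20_FEMALE", "MIN16"]
gold_labels = ["MIN16", "PLUS20", "PLUS20", "PLUS20"]
binary_output = [collapse_to_age(label) for label in three_way_output]
precision, recall = precision_recall(gold_labels, binary_output, "PLUS20")
```

Because the mapping is applied after prediction, a message assigned to the wrong gender but the right age group still counts as a correct PLUS20 prediction.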

The third experiment (EXP_3) consisted of including gender as an additional feature in every instance of the original NETLOG_SUBSET2 data (i.e., a feature union approach). Again the results improved upon those of BIN_EXP. and EXP_1, but not upon those of EXP_2, resulting in a precision of 65.8%, a recall of 81.4% and an F-score of 72.8% for the adult class. Table 4.8 provides an overview of the results for the three additional experiments compared to those of the binary age classification experiments (BIN_EXP.). In EXP_2, the model showed a classification error of

Table 4.8: Results for age prediction when including gender meta-data: EXP_1 (data balanced according to age group and gender in train), EXP_2 (3 classes in train, 2 in test) and EXP_3 (gender as feature).

                               Including Gender
Scores (%)  Class    Bin.    Exp_1   Exp_2   Exp_3

Precision   Min16    91.5    96.0    95.9    95.9
            Plus20   83.3    60.8    67.7    65.8
Recall      Min16    97.7    89.0    92.0    91.2
            Plus20   56.2    81.9    80.9    81.4
F-score     Min16    94.5    92.4    93.9    93.5
            Plus20   67.1    69.8    73.7    72.8
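As a reading aid for Table 4.8, the short check below recomputes each F-score from the reported precision and recall. It assumes that the F-score is the balanced F1 measure, i.e. the harmonic mean of precision and recall, which this section does not state explicitly. The tolerance of 0.05 accounts for the one-decimal rounding of the reported values.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (experiment, class): (precision, recall, reported F-score), all in percent.
TABLE_4_8 = {
    ("Bin.", "Min16"): (91.5, 97.7, 94.5),
    ("Bin.", "Plus20"): (83.3, 56.2, 67.1),
    ("Exp_1", "Min16"): (96.0, 89.0, 92.4),
    ("Exp_1", "Plus20"): (60.8, 81.9, 69.8),
    ("Exp_2", "Min16"): (95.9, 92.0, 93.9),
    ("Exp_2", "Plus20"): (67.7, 80.9, 73.7),
    ("Exp_3", "Min16"): (95.9, 91.2, 93.5),
    ("Exp_3", "Plus20"): (65.8, 81.4, 72.8),
}

# Collect every cell whose reported F-score deviates from the recomputed one.
mismatches = [
    key
    for key, (precision, recall, reported) in TABLE_4_8.items()
    if abs(f_score(precision, recall) - reported) >= 0.05
]
```

Under this assumption, every reported F-score in the table is consistent with its precision and recall, so `mismatches` remains empty.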