In Chapter 4 we analysed existing taxonomies with respect to their applicability to Expertise Mining and identified a need for automatically constructed topical hierarchies. We proposed a method for constructing a topical hierarchy from text that has several advantages over previous approaches for constructing lexical taxonomies:
• the resulting hierarchies are informative and have a high coverage of domain-specific concepts
• a much larger number of technical terms can be connected in the hierarchy
• the root of the hierarchy is identified automatically
• no human input, in the form of seed terms or upper-level concepts, is required
This method is based on a global generality measure that takes into consideration the co-occurrence of a term with all the other terms in a domain. A graph-pruning algorithm was used to derive a tree-like structure from the dense, noisy graph constructed using the semantic relatedness of terms. We then proposed a novel expertise measure based on the topical hierarchy. Topical hierarchies were evaluated mainly through their contribution to the task of expert finding, by comparing them with hierarchies constructed using hierarchical clustering. Topical hierarchies are more coherent than hierarchical clusters and consistently achieved the best results when applied to expert finding.
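The Python sketch below illustrates the general idea of a co-occurrence-based generality measure followed by graph pruning. It is not the exact measure or pruning algorithm used in this work. As an assumption, the generality of a term t is taken to be the sum of P(t | u) over all terms u it co-occurs with, and pruning keeps, for each term, a single edge to the most strongly related term that is more general. Because every kept edge points to a strictly more general term, the result is a tree, and its root (the most general term) is found without seed terms or other human input.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(docs):
    """Document frequency of each term and of each ordered term pair."""
    df = defaultdict(int)
    co = defaultdict(int)
    for doc in docs:
        terms = set(doc)
        for t in terms:
            df[t] += 1
        for a, b in combinations(sorted(terms), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return dict(df), dict(co)


def generality(df, co):
    """Hypothetical global generality: sum of P(t | u) over all other terms u."""
    gen = {t: 0.0 for t in df}
    for (t, u), n in co.items():
        gen[t] += n / df[u]  # P(t | u) = docs with t and u / docs with u
    return gen


def prune_to_tree(df, co, gen):
    """Keep, for each term, one edge to a more general related term.

    Terms are totally ordered by (generality, name), so every kept edge
    points strictly upwards and the result is a tree rooted at the most
    general term.
    """
    def order(t):
        return (gen[t], t)

    root = max(df, key=order)
    parent = {}
    for t in df:
        if t == root:
            continue
        candidates = [u for u in df
                      if order(u) > order(t) and co.get((t, u), 0) > 0]
        if not candidates:
            parent[t] = root  # no more general neighbour: attach to the root
            continue
        # strongest relatedness P(u | t); ties go to the less general candidate
        parent[t] = max(candidates,
                        key=lambda u: (co[(t, u)] / df[t], -gen[u], u))
    return root, parent


docs = [["learning", "neural network"], ["learning", "svm"],
        ["learning", "neural network", "deep learning"], ["learning"]]
df, co = cooccurrence_counts(docs)
root, parent = prune_to_tree(df, co, generality(df, co))
print(root, parent["deep learning"])  # learning neural network
```

In this toy corpus "deep learning" is equally related to "learning" and "neural network", and the tie-break attaches it to the less general of the two, which produces a deeper, more specific hierarchy.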
Topical hierarchies also proved intuitive enough to be used directly by humans for analysing large corpora, as shown in Section 5.2. There is, however, still room for improvement: for example, a manual analysis showed that topics tend to drift away from the main knowledge area along longer paths. This limitation could be addressed by clustering topics as a pre-processing step. This step is also important when constructing topical hierarchies for more heterogeneous domains, such as the documents of a university, which cover a wide range of topics with little overlap. We did not limit the depth or the width of the hierarchy using manually provided parameters, but such limits could be desirable when constructing hierarchies for human users: because of cognitive limitations, simpler structures are more usable and easier to comprehend when they are designed for end users.
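If limits on depth and width were introduced, one simple option would be to truncate the hierarchy after it has been built. In the sketch below, the hierarchy is assumed to be a child-to-parent map, and `max_depth`, `max_children` and the per-topic `score` (for instance, a frequency count) are hypothetical parameters, not values used in this work.

```python
from collections import defaultdict


def limit_hierarchy(parent, root, score, max_depth, max_children):
    """Truncate a tree (child -> parent map) for display.

    Keeps nodes at most `max_depth` levels below the root and, under each
    node, only the `max_children` children with the highest `score`.
    """
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    kept = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if depth == max_depth:
            continue  # do not expand below the depth limit
        best = sorted(children[node], key=lambda c: (-score[c], c))[:max_children]
        for c in best:
            kept[c] = node
            stack.append((c, depth + 1))
    return kept


parent = {"b": "a", "c": "a", "d": "a", "e": "b", "f": "e"}
score = {"b": 3, "c": 2, "d": 1, "e": 1, "f": 1}
print(limit_hierarchy(parent, "a", score, max_depth=2, max_children=2))
# {'b': 'a', 'c': 'a', 'e': 'b'}
```

Truncating after construction leaves the construction method unchanged; an alternative would be to enforce the limits during pruning, at the cost of coupling the two steps.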
Knowledge structures can be evaluated either through comparison with existing gold standards or through manual evaluation by domain experts. Further studies are needed to demonstrate the applicability of topical hierarchies beyond Expertise Mining.
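Comparison with a gold standard is commonly reported as precision and recall over parent-child pairs. The sketch below assumes both hierarchies are given as child-to-parent maps (the example data is hypothetical). It computes a strict variant over direct edges and a lenient variant over all (descendant, ancestor) pairs, which still gives credit to a term attached to a more distant but correct ancestor.

```python
def ancestors(parent, node):
    """All ancestors of `node` in a child -> parent map (assumes no cycles)."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result


def closure(tree):
    """Every (descendant, ancestor) pair implied by the tree."""
    return {(c, a) for c in tree for a in ancestors(tree, c)}


def pair_scores(pred_pairs, gold_pairs):
    correct = len(pred_pairs & gold_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


def evaluate(predicted, gold):
    """Strict (direct edges) and lenient (ancestor pairs) precision/recall."""
    strict = pair_scores(set(predicted.items()), set(gold.items()))
    lenient = pair_scores(closure(predicted), closure(gold))
    return strict, lenient


predicted = {"svm": "learning", "deep learning": "neural network",
             "neural network": "learning"}
gold = {"svm": "classification", "classification": "learning",
        "deep learning": "neural network", "neural network": "learning"}
print(evaluate(predicted, gold))
```

Here "svm" skips the gold-standard node "classification": the strict scores count that edge as wrong, while the lenient scores accept it because "learning" is still a correct ancestor.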
6.2.3 Expert profiling
Expert profiles provide useful information when searching for an expert, as they put the expertise of a person in the context of other expertise topics. A main motivation of our work was to fully automate the construction of expert profiles by eliminating the need for manually identified expertise topics. This was achieved by constructing expert profiles from automatically extracted expertise topics. Expertise topics extracted from documents authored by a person were ranked based on their overall quality and based
on their relevance to that person. Our approach to expert profiling was evaluated on a dataset of workshop committee members from different fields of Computer Science, under the assumption that each committee member is an expert on all the expertise topics mentioned in the workshop's call for papers.
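Ranking a person's expertise topics by quality and relevance can be sketched as follows. The corpus-level `quality` scores are assumed to be precomputed by the topic extraction step, and relevance is assumed, for illustration only, to be the share of the person's topic mentions that refer to a given topic; the two are simply multiplied. The scoring functions actually used in this work may differ.

```python
from collections import Counter


def rank_profile_topics(person_docs, quality, top_k=None):
    """Rank the topics a person mentions by quality x relevance.

    person_docs: one list of extracted topic mentions per document
    quality:     corpus-level quality score per topic (assumed precomputed)
    """
    mentions = Counter(t for doc in person_docs for t in doc)
    total = sum(mentions.values())
    if total == 0:
        return []
    # relevance = share of the person's topic mentions that refer to t
    scores = {t: quality.get(t, 0.0) * n / total for t, n in mentions.items()}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked if top_k is None else ranked[:top_k]


docs = [["ontology", "semantic web", "ontology"], ["ontology", "rdf"]]
quality = {"ontology": 0.9, "semantic web": 0.8, "rdf": 0.5}
print(rank_profile_topics(docs, quality))  # ['ontology', 'semantic web', 'rdf']
```

The optional `top_k` cut-off is one crude way of keeping a profile concise; it does not address completeness.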
This evaluation dataset has limitations, especially for workshops that aim to bring together researchers from different research areas, where not every committee member is necessarily an expert on every topic in the call. Another limitation is that the resulting profiles are much larger than what would normally be displayed in an expert search system. The main remaining challenge is to construct profiles that are as complete as possible while still being concise. One possible solution is to use a topical hierarchy to select broad topics that summarise expertise, while filtering out more specialised topics. Additionally, further work is required to represent expert profiles in machine-readable formats that are compatible with existing competency standards.
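A minimal sketch of hierarchy-based summarisation, assuming that each profile topic has a score and that the topical hierarchy is available as a child-to-parent map: topics below a chosen level are replaced by their ancestor at that level, their scores are summed, and only the top-ranked broad topics are kept. The `level` and `top_k` parameters and the example data are hypothetical.

```python
from collections import defaultdict


def depth(parent, node):
    """Edges between `node` and the root (0 for the root or unknown topics)."""
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d


def summarise_profile(topic_scores, parent, level, top_k):
    """Lift every topic deeper than `level` to its ancestor at `level`,
    add up the scores of topics lifted to the same ancestor, keep the top_k."""
    broad = defaultdict(float)
    for topic, s in topic_scores.items():
        node = topic
        while depth(parent, node) > level:
            node = parent[node]
        broad[node] += s
    return sorted(broad, key=lambda t: (-broad[t], t))[:top_k]


parent = {"ml": "cs", "nlp": "cs", "svm": "ml",
          "deep learning": "ml", "parsing": "nlp"}
scores = {"svm": 0.4, "deep learning": 0.3, "parsing": 0.5, "nlp": 0.1}
print(summarise_profile(scores, parent, level=1, top_k=2))  # ['ml', 'nlp']
```

Summing scores favours broad topics that cover many specialised ones; averaging or taking the maximum would be equally plausible choices.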
6.2.4 Expert finding
Expert finding has many applications, in industry as well as in research communities. To tackle this problem, we relied mainly on documents associated with a person, investigating several content-based measures of expertise. We used the number of times a person mentions an expertise topic to measure relevance, and the number of distinct documents written about a topic to measure experience. Additionally, we proposed a measure based on a topical hierarchy, which takes into consideration how well a person knows the specialised topics in an area. The method used to combine these scoring functions is rather simplistic, leaving room for improvement through either a linear combination or a more sophisticated learning-to-rank approach.
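The three signals described above, combined linearly as suggested, can be sketched as follows. The log scaling, the equal default weights and the hierarchy signal (here, the fraction of a topic's subtopics that the person mentions) are illustrative assumptions, not the exact measures evaluated in this work; a learning-to-rank approach would instead learn the weights from relevance judgements.

```python
import math
from collections import defaultdict


def descendants(parent, topic):
    """All topics below `topic` in a child -> parent map."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)
    found, stack = set(), [topic]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def expertise_score(person_docs, topic, parent,
                    w_rel=1.0, w_exp=1.0, w_hier=1.0):
    """Hypothetical linear combination of three content-based signals.

    person_docs: one list of topic mentions per document written by the person
    """
    relevance = sum(doc.count(topic) for doc in person_docs)    # mentions
    experience = sum(1 for doc in person_docs if topic in doc)  # documents
    subtopics = descendants(parent, topic)
    mentioned = {t for doc in person_docs for t in doc}
    # share of the topic's more specialised subtopics the person writes about
    coverage = len(subtopics & mentioned) / len(subtopics) if subtopics else 0.0
    return (w_rel * math.log1p(relevance) + w_exp * math.log1p(experience)
            + w_hier * coverage)


def rank_experts(people, topic, parent):
    scores = {p: expertise_score(docs, topic, parent)
              for p, docs in people.items()}
    return sorted(scores, key=lambda p: (-scores[p], p))


parent = {"svm": "ml", "deep learning": "ml", "ml": "cs"}
people = {"alice": [["ml", "svm"], ["ml", "deep learning"]],
          "bob": [["ml", "ml", "ml"]]}
print(rank_experts(people, "ml", parent))  # ['alice', 'bob']
```

Bob mentions `ml` more often, but Alice writes about it in more documents and also covers its subtopics, so she ranks first under these assumed weights.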
We showed that these measures outperform state-of-the-art methods for expert finding in terms of both precision and recall when applied to domain-specific datasets. Nevertheless, the overall results remain low enough to warrant additional sources of information beyond textual documents. One solution is to combine the expertise extracted from the content written by a person with more structured information, such as the number of citations of a document, or details about education and training that can be extracted from CVs.
Another limitation of this work is that our expert ranking approach is static and does not take into consideration the profile of the person doing the search. The