II. THEORETICAL FRAMEWORK
2.2 Theoretical bases
2.2.2 Breastfeeding and its legal protection:
2.2.2.1 Composition of breast milk and types of breastfeeding
Here, we outline a use case that demonstrates how the result of interlinking OpenAIRE with related datasets can support scholarly communication. We imagine such services being integrated into environments for reading or writing scholarly papers. A difficult and time-consuming task for peer reviewers is to get a quick overview of the state of the art of the field covered by the paper or dataset under review. The OpenAIRE LOD itself has information about the subject of a paper or a dataset, which can be linked to subject classification schemes such as the ACM CCS. Furthermore, CiteSeer provides citation graphs of papers. We can thus offer peer reviewers a service that finds papers or datasets similar to the one under review. A critical factor in the process of writing and publishing is the comprehensiveness of the citations in scholarly data.
<SOURCE>
  <ID>source1</ID>
  <ENDPOINT>http://beta.lod.openaire.eu/sparql</ENDPOINT>
  <VAR>?x</VAR>
  <PAGESIZE>10000</PAGESIZE>
  <RESTRICTION>?x a oav:Person</RESTRICTION>
  <PROPERTY>foaf:name AS lowercase RENAME name</PROPERTY>
  <PROPERTY>dcterms:creator/cerif:name AS lowercase->regexreplace("[^A-Za-z0-9]", "") RENAME title</PROPERTY>
</SOURCE>
<TARGET>
  <ID>source2</ID>
  <ENDPOINT>C:\dblp2.nt</ENDPOINT>
  <VAR>?y</VAR>
  <PAGESIZE>-1</PAGESIZE>
  <RESTRICTION>?y rdf:type dblp:Person</RESTRICTION>
  <PROPERTY>dblp:primaryFullPersonName AS lowercase RENAME dname</PROPERTY>
  <PROPERTY>^dblp:authoredBy/dblp:title AS lowercase->regexreplace("[^A-Za-z0-9]", "") RENAME dtitle</PROPERTY>
  <TYPE>NT</TYPE>
</TARGET>
<METRIC>AND(Jaro(x.name, y.dname)|0.85, Levenshtein(x.title, y.dtitle)|0.7)</METRIC>
Listing 5.14: LIMES configuration. A configuration file for interlinking the Person entity, specifying the metrics and metadata used.
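To make the METRIC expression in Listing 5.14 more concrete, the following is a minimal Python sketch of the same idea: two candidate resources are linked only if the Jaro similarity of their names reaches 0.85 and the Levenshtein similarity of their normalized titles reaches 0.7. It is not the LIMES implementation. The function names, the sample values, the use of `>=` for the thresholds, and the normalization of the Levenshtein distance to a similarity (1 minus distance divided by the longer length) are assumptions made for illustration; LIMES may normalize differently.

```python
import re


def normalize(value: str) -> str:
    """Mimic 'lowercase -> regexreplace("[^A-Za-z0-9]", "")' from the listing."""
    return re.sub(r"[^A-Za-z0-9]", "", value.lower())


def jaro(s: str, t: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len(s)):
        if s_matched[i]:
            while not t_matched[k]:
                k += 1
            if s[i] != t[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def levenshtein_distance(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def levenshtein_similarity(s: str, t: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(s, t) / longest


def is_match(x_name: str, x_title: str, y_dname: str, y_dtitle: str,
             name_threshold: float = 0.85, title_threshold: float = 0.7) -> bool:
    """AND of the two thresholded similarities, as in the METRIC element."""
    name_ok = jaro(x_name.lower(), y_dname.lower()) >= name_threshold
    title_ok = levenshtein_similarity(normalize(x_title),
                                      normalize(y_dtitle)) >= title_threshold
    return name_ok and title_ok


# Hypothetical sample values, not taken from OpenAIRE or DBLP.
print(is_match("Jane Doe", "Linked Data: A Survey",
               "jane doe", "Linked data - a survey"))
```

Because both conditions are combined with AND, a pair with identical names but unrelated titles is rejected, which is the purpose of adding the title comparison to the name comparison.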
A service similar to the one for peer reviewers explained above could be offered to authors. Research dynamics could be understood better by analyzing how people who publish on certain topics move in the community, e.g., to other organizations. Having access to the networks of papers, authors and their organizations, and furthermore taking into account the events in which people participate, enables new indicators for measuring the quality and relevance of research that are not just based on counting citations.
To enrich the content of the OpenAIRE dataset, we interlinked concepts from OpenAIRE with the corresponding concepts in four candidate datasets, namely DBLP, DBpedia, ACM and SWDF. The Person entity is defined as oav:Person in the OpenAIRE data schema and as dblp:Person in DBLP.
Listing 5.14 shows the configuration used to compare person entities and their associated properties in OpenAIRE and DBLP. When the tool runs, it uses this configuration file to construct and execute two SPARQL queries, one against the source and one against the target dataset, to retrieve the values of the selected properties, and then applies string similarity matching to them. The result of this interlinking is a set of links in RDF, shown in Listing 5.15, which connect OpenAIRE and DBLP Person entities using the "owl:sameAs" relationship. A similar approach can be followed with a Silk linkage rule for Person interlinking.
<http://lod.openaire.eu/data/result/doajarticles::65803a423ca8b7cc411d97c008b1b4ec> owl:sameAs <http://dblp.org/rec/journals/entropy/ZengeyaBC15>
Listing 5.15: Sample interlinking result. The sameAs relations are constructed based on the LIMES configuration.
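The query construction step described above can be illustrated with a small Python sketch that assembles a SPARQL query from the VAR, RESTRICTION, PROPERTY and PAGESIZE values of Listing 5.14. This is only an illustration of the idea, not how LIMES builds its queries internally: the function `build_query`, the `?v0`, `?v1` variable names, the omission of prefix declarations, and the reading of a PAGESIZE of -1 as "no paging" are assumptions.

```python
from typing import List


def build_query(var: str, restriction: str, properties: List[str],
                page_size: int, offset: int = 0) -> str:
    """Assemble a SELECT query for one side of the link specification."""
    # One value variable per selected property: ?v0, ?v1, ...
    value_vars = [f"?v{i}" for i in range(len(properties))]
    patterns = [f"{restriction} ."]
    patterns += [f"{var} {prop} {val} ."
                 for prop, val in zip(properties, value_vars)]
    body = "\n  ".join(patterns)
    query = f"SELECT DISTINCT {var} {' '.join(value_vars)}\nWHERE {{\n  {body}\n}}"
    # Assumption: a non-positive page size means everything is fetched at once.
    if page_size > 0:
        query += f"\nLIMIT {page_size} OFFSET {offset}"
    return query


source_query = build_query(
    "?x", "?x a oav:Person",
    ["foaf:name", "dcterms:creator/cerif:name"], 10000)
target_query = build_query(
    "?y", "?y rdf:type dblp:Person",
    ["dblp:primaryFullPersonName", "^dblp:authoredBy/dblp:title"], -1)

print(source_query)
print(target_query)
```

The property expressions containing `/` and `^` are passed through unchanged, since SPARQL 1.1 property paths accept sequence and inverse paths in the predicate position.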
Evaluation of Interlinking Tools
To find the common and individual links created by the selected interlinking tools, we wrote a script [5, Appendix C] that compares the results obtained by two tools and returns the number of common links as well as the number of links found by one tool but not by the other. In an experiment on publications from OA and publications from DBLP, LIMES matched 432 entities, more than Silk. Silk and LIMES discovered 358 links in common; 74 links were found by LIMES but not by Silk, and 3 links were found by Silk but not by LIMES.
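The comparison performed by such a script amounts to set intersection and set difference over the two link sets. The sketch below is not the script from [5, Appendix C]; the function name `compare_linksets` and the integer identifiers standing in for links are hypothetical. The synthetic sets are sized so that they reproduce the counts reported above, from which it also follows that Silk found 358 + 3 = 361 links in total.

```python
from typing import Dict, Hashable, Iterable


def compare_linksets(links_a: Iterable[Hashable],
                     links_b: Iterable[Hashable]) -> Dict[str, int]:
    """Count links found by both tools and by only one of them."""
    a, b = set(links_a), set(links_b)
    return {
        "common": len(a & b),   # found by both tools
        "only_a": len(a - b),   # found by the first tool only
        "only_b": len(b - a),   # found by the second tool only
    }


# Synthetic placeholders: each integer stands for one (source, target) link.
common_links = set(range(358))
limes_links = common_links | set(range(1000, 1074))   # 74 LIMES-only links
silk_links = common_links | set(range(2000, 2003))    # 3 Silk-only links

result = compare_linksets(limes_links, silk_links)
print(result, len(limes_links), len(silk_links))
```

In practice each link would be represented as a pair of source and target URIs parsed from the tools' output files, so that identical links compare equal regardless of the order in which they were written.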
In addition to the number of discovered links, the reliability of the obtained links is also important. Thus, to evaluate the quality and reliability of the links obtained via each tool, we created a reference linkset (gold standard) consisting of 100 publication resources selected from OA, for which manual research found 38 links to SWDF. We then ran Silk and LIMES to find only links from these 100 selected OA resources to SWDF and compared their output to the gold standard. We computed precision, recall and F-measure to check the completeness and correctness of the links found; Table 5.8 shows the results. Precision is the ratio of the number of relevant retrieved items to the number of retrieved items, i.e.:
\[ \text{Precision} = \frac{\text{true positive}}{\text{true positive} + \text{false positive}} \]
In our case, this means
\[ \text{Precision} = \frac{\text{Number of created links} - \text{Number of incorrect links}}{\text{Number of created links}} \]
and indicates the correctness of links discovered.
| Interlinking tool | Number of created links | Number of missing links | Number of incorrect discovered links | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| LIMES | 37 | 1 | 0 | 1 | 0.97 | 0.98 |
| Silk | 29 | 9 | 1 | 0.96 | 0.76 | 0.85 |

Table 5.8: Evaluation. The evaluation of the interlinking tools' results against a gold standard.
\[ \text{Recall} = \frac{\text{true positive}}{\text{true positive} + \text{false negative}} \]
In our case, this means
\[ \text{Recall} = \frac{\text{Number of created links} - \text{Number of incorrect links}}{\text{Number of correct links} + \text{Number of missing links}} \]
and indicates the completeness of links discovered. F-measure is a combined measure of accuracy defined as the harmonic mean of precision and recall:
\[ F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \]
The evaluation revealed 9 missing links and one incorrectly discovered link for Silk, and 1 missing link for LIMES. This corresponds to a Precision of 1, a Recall of 0.97 and an F-measure of 0.98 for LIMES, and a Precision of 0.96, a Recall of 0.76 and an F-measure of 0.85 for Silk. Within this small evaluation, the main advantage of LIMES is its execution time. We also took into account prior experience, which shows that LIMES outperforms Silk when dealing with big data. Therefore, because LIMES gave us more relevant, reliable and accurate results than Silk, we chose LIMES for further interlinking OpenAIRE with other datasets.
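The formulas above can be applied directly to the counts in Table 5.8. The following Python sketch does so; the class and function names are hypothetical, and "correct links" is taken to mean created links minus incorrect links, as in the precision formula. The printed values are unrounded to three decimals, so they may differ in the last digit from the figures reported in the text and table, depending on how intermediate values were rounded there.

```python
from dataclasses import dataclass


@dataclass
class LinkCounts:
    created: int    # links produced by the tool
    missing: int    # gold-standard links the tool did not find
    incorrect: int  # produced links that are not in the gold standard


def precision(c: LinkCounts) -> float:
    """(created - incorrect) / created."""
    return (c.created - c.incorrect) / c.created


def recall(c: LinkCounts) -> float:
    """correct / (correct + missing), with correct = created - incorrect."""
    correct = c.created - c.incorrect
    return correct / (correct + c.missing)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Counts taken from Table 5.8.
tools = {
    "LIMES": LinkCounts(created=37, missing=1, incorrect=0),
    "Silk": LinkCounts(created=29, missing=9, incorrect=1),
}

for name, counts in tools.items():
    p, r = precision(counts), recall(counts)
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1(p, r):.3f}")
```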
| Links between | Target dataset | Target instances | Generated links | Sample of generated links | Verified links | Precision |
|---|---|---|---|---|---|---|
| Publication | DBLP | 164890 | 2276 | 150 | 147 | 0.98 |
| Publication | SWDF | 5009 | 432 | 150 | 150 | 1.0 |
| Publication | ACM | 10378 | 1082 | 150 | 136 | 0.9 |
| Person | SWDF | 11184 | 2000 | 200 | 180 | 0.9 |
| Person | DBLP | 932000 | 6852 | 200 | 111 | 0.55 |
| Person | DBpedia | 23373 | 1088 | 200 | 80 | 0.40 |
| Organization | SWDF | 3212 | 866 | 30 | 30 | 1.0 |
| Organization | DBpedia | 3472 | 38 | 30 | 30 | 1.0 |

Table 5.9: Evaluation. Number of links and precision values obtained between OA and DBLP, SWDF, ACM and DBpedia for publications, persons and organizations.
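Assuming that the Precision column of Table 5.9 is the share of sampled links that were verified as correct (an interpretation consistent with the numbers, but not stated explicitly), the values can be recomputed as verified links divided by sample size. The sketch below does this for every row; the variable and function names are hypothetical, and the recomputed values agree with the reported ones to within rounding.

```python
from typing import List, Tuple

# (entity, target dataset, sample size, verified links, reported precision),
# copied from Table 5.9.
ROWS: List[Tuple[str, str, int, int, float]] = [
    ("Publication", "DBLP", 150, 147, 0.98),
    ("Publication", "SWDF", 150, 150, 1.0),
    ("Publication", "ACM", 150, 136, 0.9),
    ("Person", "SWDF", 200, 180, 0.9),
    ("Person", "DBLP", 200, 111, 0.55),
    ("Person", "DBpedia", 200, 80, 0.40),
    ("Organization", "SWDF", 30, 30, 1.0),
    ("Organization", "DBpedia", 30, 30, 1.0),
]


def sample_precision(sample_size: int, verified: int) -> float:
    """Fraction of manually checked links that were confirmed as correct."""
    return verified / sample_size


for entity, target, sample, verified, reported in ROWS:
    computed = sample_precision(sample, verified)
    print(f"{entity:<12} {target:<8} computed={computed:.3f} reported={reported}")
```

Since these figures are estimated from samples of 30 to 200 links, they describe the sampled links only and carry the usual sampling uncertainty when extrapolated to all generated links.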