Quality assessment of linked data sources for named-entity resolution

Carmen Brando, Nathalie Abadie, Francesca Frontini

CRH UMR 8558 CNRS - EHESS, Paris, France

Univ. Paris-Est, LASTIG COGIT, IGN, ENSG, Saint-Mandé, France

Praxiling UMR 5267 CNRS - UPVM3, Université Paul-Valéry Montpellier 3, France

Corresponding Author Email: carmen.brando@ehess.fr, nathalie-f.abadie@ign.fr, francesca.frontini@univ-montp3.fr
Pages: 31-54
DOI: https://doi.org/10.3166/ISI.21.5-6.31-54
Received: N/A | Accepted: N/A | Published: 31 December 2016
Abstract: 

A growing number of Digital Humanities applications rely on Linked Data for the semantic enrichment of digital collections by means of URIs, typically to provide background information about the authors, works of art and historical places mentioned in these collections. In this context, Named Entity Linking (NEL) is the task of automatically assigning the appropriate referent to a named-entity mention tagged in a text. Nevertheless, data sources of the Web of Data still suffer from quality issues that are critical for NEL and for many Digital Humanities applications. This article therefore proposes an empirical study to assess the quality of any Linked Data (LD) set meant to be used as a knowledge base in graph-based NEL. Our methodology addresses state-of-the-art quality aspects from a fitness-for-use perspective. We perform experiments on two French heritage texts and test two types of linking: on the one hand to a general-purpose Linked Data source, and on the other to domain-specific ones. The study assesses the degree to which the different Linked Data sources are suited to serve as a knowledge base for a given NEL use case.

Keywords: 

data quality, named-entity linking, linked data, digital humanities

1. Introduction
2. Measures for evaluating the results of named-entity resolution applications
3. Influence of the quality of Web of Data sources on named-entity resolution applications
4. Measures for assessing the quality of Web of Data sources for named-entity resolution applications
5. Implementation and results
6. Conclusion and perspectives