Benefit from domain ontologies and rule mining to improve truth discovery

Valentina Beretta Sylvie Ranwez Sébastien Harispe Isabelle Mougenot  

LGI2P, IMT Mines Alès, Univ Montpellier, Alès, France 6, avenue de Clavières, F-30 319 Alès, France

UMR 228 Espace-Dev, Université de Montpellier 500, rue JF. Breton, F-34 093 Montpellier cedex 5, France

Corresponding Author Email: prenom.nom@mines-ales.fr; isabelle.mougenot@umontpellier.fr
Page: 373-405
DOI: https://doi.org/10.3166/RIA.32.373-405

OPEN ACCESS

Abstract: 

Data veracity is one of the main issues with web data. Given the proliferation of fake news and the dangers of disinformation, Truth Discovery models can be used to assess this veracity: by analysing the claims that different sources provide about the same real-world entities, they estimate the confidence of each value and the trustworthiness of each source. This treatment is crucial within an automated knowledge extraction process, in particular if the resulting knowledge bases (KBs) are intended to support decision processes. Many studies have been conducted in the Truth Discovery domain; however, to our knowledge, none of them takes into account the a priori knowledge that may exist about a domain (e.g., domain ontologies). This article proposes two ways of reinforcing the computation of value confidence, and thus of source trustworthiness, during this process: the first considers the concept hierarchy, and the second exploits patterns extracted from KBs using association rule learning techniques. Both approaches are validated and tested on benchmarks which, like the source code, are freely available.
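The first idea above, that a claim for a specific value can corroborate a more general value instead of conflicting with it, can be illustrated with a small sketch. It assumes a simple Sums-style (hubs-and-authorities) fact-finder as the base model and a toy child-to-parent hierarchy; the names `PARENT`, `ancestors` and `truth_discovery`, the claims and the iteration count are all hypothetical, and this is not the authors' exact model.

```python
from collections import defaultdict

# Hypothetical toy concept hierarchy (child -> parent), purely illustrative.
PARENT = {"Paris": "France", "Lyon": "France", "France": "Europe"}


def ancestors(value, parent):
    """Strict ancestors of `value`, from nearest to most general."""
    result = []
    while value in parent:
        value = parent[value]
        result.append(value)
    return result


def truth_discovery(claims, parent, use_hierarchy=True, iterations=20):
    """Sums-style fixpoint over (source, entity, value) claims (a sketch).

    Value confidence = sum of the trust of the sources supporting the value.
    Source trust     = sum of the confidence of the values the source claims.
    With use_hierarchy, a claim for value w also supports every claimed value
    of the same entity that is an ancestor (more general concept) of w.
    Both score vectors are rescaled so that their maximum is 1.
    """
    candidates = {(e, v) for _, e, v in claims}
    trust = {s: 1.0 for s, _, _ in claims}
    conf = {}
    for _ in range(iterations):
        # Value confidence from the current source trust.
        raw_conf = defaultdict(float)
        for s, e, w in claims:
            supported = [w] + (ancestors(w, parent) if use_hierarchy else [])
            for v in supported:
                if (e, v) in candidates:  # only values actually claimed
                    raw_conf[(e, v)] += trust[s]
        top = max(raw_conf.values())
        conf = {k: c / top for k, c in raw_conf.items()}

        # Source trust from the confidence of the values it claims directly.
        raw_trust = defaultdict(float)
        for s, e, w in claims:
            raw_trust[s] += conf[(e, w)]
        top = max(raw_trust.values())
        trust = {s: t / top for s, t in raw_trust.items()}
    return conf, trust


claims = [
    ("s1", "birthplace", "Paris"),
    ("s2", "birthplace", "Paris"),
    ("s3", "birthplace", "Lyon"),
    ("s4", "birthplace", "France"),
]
conf_plain, _ = truth_discovery(claims, PARENT, use_hierarchy=False)
conf_hier, _ = truth_discovery(claims, PARENT, use_hierarchy=True)
print(conf_plain)
print(conf_hier)
```

Without the hierarchy, "France" is treated as competing with "Paris" and its confidence decays towards 0; with it, "France" is supported by every claim about Paris and Lyon. In this naive form the most general value ends up with the highest confidence, so choosing the final answer needs an extra criterion (for instance one favouring specificity), which this sketch deliberately leaves out.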

Keywords: 

truth discovery, ontologies, semantic web, value confidence, source trustworthiness, association rule learning, reasoning

1. Introduction
2. State of the art and positioning
3. Problem formalization and description of the proposed approach
4. Evaluation of the method
5. Results
6. Conclusion and future work