Semantic similarity between controlled vocabularies

Mélissa Mary, Lina F. Soualmia, Xavier Gansel

bioMérieux SA, Département développement et intégration, 3 route de Port Michaud, 38390 La Balme Les Grottes, France

LITIS EA 4108 and NormaSTIC CNRS 3638, Normandie Université, Université de Rouen de Normandie, 76000 Rouen, France

LIMICS INSERM UMR_1142, Sorbonne Universités, 75000 Paris, France

Corresponding author email: {melissa.mary, xavier.gansel}@biomerieux.com, Lina.Soualmia@chu-rouen.fr
Pages: 55-83
DOI: https://doi.org/10.3166/ISI.21.5-6.55-83
Published: 31 December 2016
Open access

Abstract: 

The digitization of medical data raises syntactic as well as semantic interoperability challenges between information systems and knowledge organisation systems. Knowledge integration has been widely studied, both for general purposes and in specific domains such as clinical medicine and biology. Because in vitro diagnostics (IVD) is a transdisciplinary domain, it faces the same knowledge integration issues as the clinical and biological fields and needs tools adapted to its multidisciplinary knowledge. In this article we review the state of the art in knowledge integration and linked data, with a specific focus on IVD data. We then evaluate alignments between concepts extracted from two standards that are used in IVD and available online. The methods we propose are based on three lexical semantic similarity measures and one heuristic algorithm. Our results show that lexical measures alone are not effective enough for use in the laboratory domain. However, the alignments obtained with the heuristic approach and filtered along a semantic dimension meet our performance criteria. This strategy is being improved by integrating semantic similarity and refining the lexical parameters of the heuristic approach.

Keywords: 

data integration, ontology alignment, biomedical terminology

1. Introduction
2. Literature review
3. Materials and methods
4. Results and discussion
5. Conclusion