Semi-automatic formalization of a patient/doctor vocabulary for breast cancer

Mike Donald Tapi Nzali, Jérôme Azé, Sandra Bringay, Christian Lavergne, Caroline Mollevi, Thomas Opitz

Université de Montpellier, France

Université Paul Valéry, Montpellier 3, France

Institut Montpelliérain Alexander Grothendieck, France

Laboratoire d’Informatique, de Robotique et de Microélectronique de Montpellier, France

Institut du Cancer Montpellier, Montpellier, France

Biostatistique et Processus Spatiaux, INRA Avignon, France

31 October 2016
Social media is increasingly used by patients and health professionals. Patients are usually laypeople in the medical field, and they use slang, abbreviations, and their own vocabulary in their exchanges. Automatically analyzing texts from social networks therefore requires a specific vocabulary. Using a corpus of messages from social media such as forums and Facebook, we describe the construction of a lexical resource that aligns the vocabulary of patients with that of health professionals. To build this resource and transform it into a SKOS ontology, we use several methods proposed in the literature that take linguistic and statistical aspects into account. This work will improve information retrieval in health forums and will facilitate statistical studies based on information extracted from these forums.


information extraction, social media, statistic-based measure, ontology, patient vocabulary.

1. Introduction
2. Motivation and state of the art
3. Methods
4. Results
5. Formalization of the resource as a SKOS ontology
6. Discussion
7. Conclusion and perspectives

This work was funded by the ANR SIFR project (Semantic Indexing of French Biomedical Data Resources) and by the Institut de Recherche en Santé Publique (http:/ /


