
Identification of product categories from advertising catalogs

Céline Alec | Chantal Reynaud-Delaître | Brigitte Safar | Zied Sellami | Uriel Berdugo

LRI, Univ. Paris-Sud, CNRS, Université Paris-Saclay, Orsay, France

Linagora, 100 Terrasse Boieldieu - Tour Franklin, Paris - La Défense, France

Wepingo, 6 Cour Saint Eloi, Paris, France

Corresponding Author Email:

celine.alec@lri.fr, chantal.reynaud@lri.fr, brigitte.safar@lri.fr, zsellami@linagora.com, uriel.berdugo@wepingo.com


ria30_5_06_alec.pdf

OPEN ACCESS

Abstract:

In this paper, we propose an ontology-based information extraction approach applied to documents from advertising catalogs. These documents are rather sparse product descriptions. The information to be extracted, namely the annotations, concerns the categories and features of the products, which are listed in a domain ontology. Extracting information about a product is therefore an ontology population process, more precisely the population of the concepts representing its categories and features. Because the descriptions are so sparse, fully automatic population is impossible. We propose a two-step approach: (1) a semi-automatic annotation step, which covers a small set of documents; (2) a second step, which annotates all remaining documents fully automatically, using machine learning mechanisms that exploit the results of the first step. The originality of this work lies in its incremental approach to refining the extracted information. The approach has been applied to real data from the domain of toys.

Keywords:

information extraction, ontology population, semantic annotation, B2C application.
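
The two-step approach outlined in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the ontology fragment `ONTOLOGY_TERMS`, the sample descriptions, and the function names are hypothetical, and a small naive Bayes classifier stands in for the machine learning step, whose actual learner the abstract does not specify. In the semi-automatic step, a person would also validate the proposed annotations, which is not modeled here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical ontology fragment: category concept -> trigger terms.
ONTOLOGY_TERMS = {
    "ConstructionToy": {"bricks", "blocks", "build"},
    "Doll": {"doll", "dress"},
}

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    stripped = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in stripped if w]

def annotate_with_ontology(description):
    """Step 1 (automatic part only): propose every category whose
    trigger terms occur in the description."""
    tokens = set(tokenize(description))
    return sorted(c for c, terms in ONTOLOGY_TERMS.items() if tokens & terms)

def train_naive_bayes(labeled_docs):
    """Step 2a: collect per-category word counts from step-1 annotations."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, category in labeled_docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(model, description):
    """Step 2b: pick the most probable category (Laplace-smoothed)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in sorted(doc_counts):
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for tok in tokenize(description):
            if tok in vocab:  # words unseen in training are ignored
                count = word_counts[category][tok] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best

# Step 1: annotate a small seed set with the ontology terms.
seed = [
    "Colorful bricks to build a castle",
    "Wooden blocks for small hands, stack a tower",
    "Soft doll with a pink dress",
    "Doll with long hair and a brush",
]
labeled = []
for text in seed:
    proposals = annotate_with_ontology(text)
    if len(proposals) == 1:  # keep only unambiguous proposals
        labeled.append((text, proposals[0]))

# Step 2: learn from the seed annotations, then annotate a description
# that contains no ontology term at all.
model = train_naive_bayes(labeled)
unmatched = "Stack a tall tower"
predicted = classify(model, unmatched)
```

Here the description "Stack a tall tower" matches no trigger term, so the first step proposes nothing for it, while the classifier trained on the seed annotations can still assign a category from the words it shares with annotated descriptions.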

1. Introduction

2. Working framework

3. State of the art

4. Proposed ontology population approach

5. Evaluation of the approach

6. Conclusion and future work

Acknowledgments

We thank the company Wepingo, which funded this work as part of the PORASO project.

