Using reinforcement learning to continuously improve a document treatment chain

Esther Nicart, Bruno Zanuttini, Bruno Grilhères, Patrick Giroux, Arnaud Saval

Cordon Electronics DS2i, 27000 Val de Reuil, France

Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, France

Airbus Defence and Space, Élancourt, France

31 December 2017



We model a document treatment chain as a Markov Decision Process, and use reinforcement learning to allow the agent to learn to construct and continuously improve custom-made chains "on the fly". We build a platform which enables us to measure the impact on learning of various models, web services, algorithms, parameters, etc. We apply this in an industrial setting, specifically to an open-source document treatment chain which extracts events from massive volumes of web pages and other open-source documents. Our emphasis is on minimising the burden of the human analysts, from whom the agent learns to improve, guided by their feedback on the events extracted. To this end, we investigate different types of feedback, from numerical feedback, which requires a lot of tuning, to partially and even fully qualitative feedback, which is much more intuitive and demands little to no user calibration. We carry out experiments, first with numerical feedback, then demonstrate that intuitive feedback still allows the agent to learn effectively.


artificial intelligence, reinforcement learning, extraction and knowledge management, man-machine interaction, open source intelligence (OSINT)

1. Introduction
2. The WebLab platform
3. Reinforcement learning
4. Continuous improvement via reinforcement learning
5. Experimental framework
6. Measuring the quality of the results
7. Tests with numerical feedback
8. Tests with intuitive feedback
9. Conclusion and perspectives

The authors wish to thank Hugo Gilbert for the fruitful discussions on qualitative feedback, as well as the anonymous reviewers of IC2015 and of the RIA for their constructive comments.

