
Temporal association rules of social signals for the synthesis of behaviors of embodied conversational agents. Application to interpersonal stance

Thomas Janssoone | Chloé Clavel | Kévin Bailly | Gaël Richard

Sorbonne Universités, UPMC Univ Paris 06, CNRS, ISIR, 4 place Jussieu, 75252 Paris, France

LTCI, Télécom ParisTech, Université Paris Saclay, Paris, France

Corresponding Author Email:

prenom.nom@isir.upmc.fr; prenom.nom@telecom-paristech.fr


ria31_5_04_janssoone.pdf

OPEN ACCESS

https://ria.revuesonline.com/accueil.jsp

Abstract:

In the field of Embodied Conversational Agents (ECAs), one of the main challenges is to generate socially believable agents. The long-term objective of the present study is to infer rules for the multimodal generation of agents’ socio-emotional behaviour. In this paper, we introduce the Social Multimodal Association Rules with Timing (SMART) algorithm, which learns these rules from the analysis of a multimodal corpus composed of audio-video recordings of human-human interactions. The proposed methodology consists of applying a sequence mining algorithm to automatically extracted social signals such as prosody, head movements and facial muscle activation. This allows us to infer temporal association rules for behaviour generation. We show that this method can automatically compute temporal association rules that are coherent with prior results in the literature, especially in psychology and sociology. The results of a perceptual evaluation confirm the ability of an agent based on temporal association rules to express a specific stance.

Keywords:

temporal association rules, TITARL, virtual agent, interpersonal stance, social signal processing
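
To make the notion of a temporal association rule in the abstract more concrete, here is a minimal Python sketch. It is not the SMART or TITARL implementation. It assumes a simplified rule of the form "if a body event occurs at time t, a head event occurs within [t + lo, t + hi]", and scores it with a confidence (share of body events followed by the head in the window) and a support (share of head events preceded by the body in the matching window). The function name `rule_stats` and the event labels `smile_on` and `head_nod` are hypothetical, chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def rule_stats(events, body, head, lo, hi):
    """Score the rule: `body` at time t => `head` within [t + lo, t + hi].

    `events` is a list of (timestamp_in_seconds, label) pairs.
    Returns (confidence, support) under the simplified definitions
    assumed in this sketch.
    """
    body_times = sorted(t for t, label in events if label == body)
    head_times = sorted(t for t, label in events if label == head)
    if not body_times or not head_times:
        return 0.0, 0.0

    # Confidence: fraction of body events with at least one head event
    # inside the window [t + lo, t + hi].
    hits = sum(
        1
        for t in body_times
        if bisect_right(head_times, t + hi) > bisect_left(head_times, t + lo)
    )

    # Support: fraction of head events with at least one body event
    # inside the mirrored window [h - hi, h - lo].
    explained = sum(
        1
        for h in head_times
        if bisect_right(body_times, h - lo) > bisect_left(body_times, h - hi)
    )

    return hits / len(body_times), explained / len(head_times)


# Hypothetical symbolic event stream (times in seconds).
events = [
    (0.0, "smile_on"), (0.4, "head_nod"),
    (2.0, "smile_on"), (2.3, "head_nod"),
    (5.0, "smile_on"),
    (7.0, "head_nod"),
]

# Rule: a smile onset is followed by a head nod within one second.
confidence, support = rule_stats(events, "smile_on", "head_nod", 0.0, 1.0)
print(f"confidence={confidence:.2f}, support={support:.2f}")
```

In this toy stream, two of the three smile onsets are followed by a nod within one second, and two of the three nods are preceded by a smile onset, so both scores are 2/3. A sequence mining algorithm would search over many candidate rules and time windows rather than score a single hand-written one.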

1. Introduction

2. State of the art

3. SMART: finding the temporal information that links social signals

4. Validation: studies across different social signals and different time scales

5. Conclusion

Acknowledgments

This work was carried out within the Labex SMART with financial support from the French State, represented by the ANR, under the Investissements d’Avenir programme, reference ANR-11-IDEX-0004-02. The authors also wish to thank the Teralab platform for its help in carrying out this project.

