Modeling rapport for conversations about health with autonomous avatars from video corpus of clinician-client therapy sessions

Abstract. In human face-to-face conversations, non-verbal behaviors (NVBs), such as gaze, facial expressions, gestures, and body postures, can improve communication effectiveness by creating a smooth interaction between the interlocutors, called rapport. During human interactions with embodied conversational agents (ECAs) (a.k.a. virtual humans), a key issue for the success of the interaction is the ability of an ECA to establish and maintain some level of rapport with its human counterpart. This need is particularly important for ECAs that interact in contexts involving socio-emotional content, such as education and entertainment, or in the role of health assistants delivering healthcare interventions, as in the context of this study. Because clinical psychologists are trained in establishing and maintaining rapport, we designed an ECA that learns offline from such an expert which NVBs to display, when to display them, and when not to display them in real time. We describe our data-driven machine learning approach to modeling rapport from a corpus of annotated videos of counseling sessions conducted by a licensed practicing clinical psychologist with role-playing patients. Results of a randomized controlled experiment show that, in its role of delivering a brief screening health intervention, our ECA improved users' attitude, intention to (re-)use the ECA system, perceived enjoyment, perceived sociability, perceived usefulness, social presence, and trust.

Full text available in Amini, Boustani, & Lisetti (2021).