Embodiment and gender interact in alignment to TTS voices

Michelle Cohn, UC Davis, Davis, California, United States
Patrik Jonell, KTH, Stockholm, Sweden
Taylor Kim, UC Davis, Davis, California, United States
Jonas Beskow, KTH, Stockholm, Sweden
Georgia Zellou, UC Davis, Davis, California, United States

Abstract

The current study tests subjects’ vocal alignment toward female and male text-to-speech (TTS) voices presented via three systems: Amazon Echo, Nao, and Furhat. These systems vary in their physical form, ranging from a cylindrical speaker (Echo), to a small robot (Nao), to a human-like robot bust (Furhat). We test whether this cline of personification (cylinder < mini robot < human-like robot bust) predicts patterns of gender-mediated vocal alignment. In addition to comparing multiple systems, this study addresses a confound in many prior vocal alignment studies by using identical voices across the systems. Results show evidence for a cline of personification toward female TTS voices by female shadowers (Echo < Nao < Furhat) and a more categorical effect of device personification for male TTS voices by male shadowers (Echo < Nao, Furhat). These findings are discussed in terms of their implications for models of device-human interaction and theories of computer personification.
