Linguistic Cues to Social Goals in Spoken & Virtual, Private & Broadcast Interactions

This project addresses the problem of automatic analysis of speech and text for linguistic cues to the social goals of participants in multi-party interactions. In particular, the work aims to use data-driven learning in combination with theories of organizational communication (identification and control) to develop algorithms for automatically extracting linguistic cues and using them to identify speaker/author social goals. In order to learn linguistic features that generalize across different forms of speech and text, we work with both synchronous (i.e. live) work group planning interactions (meeting recordings, online chat) and asynchronous broadcast discussions (talk shows, Wikipedia discussion pages). We use the term 'genre' to refer to the different data forms, where genre differences comprise the situational characteristics of a text or speech, including its modality (written vs. spoken), timing (synchronous vs. asynchronous), audience (private vs. broadcast), and formality, as well as more fine-grained characteristics. We leverage a variety of machine-learning strategies to identify turn-level social moves and map these to higher-level social constructs. Key technical challenges include learning robust features from sparse training data, estimating confidences of decisions, and presenting meaningful evidence behind decisions.
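As an illustration of the pipeline described above (classifying turn-level social moves, then aggregating them into speaker-level evidence with per-decision confidences), the following is a minimal sketch in Python. The move labels, toy data, lexical features, classifier choice, and aggregation rule are illustrative assumptions only, not the project's actual models or annotation scheme.

```python
# Illustrative sketch: turn-level social-move classification with confidence
# estimates, aggregated into a per-speaker profile. Labels and data are toy
# assumptions, not the project's annotation scheme.

from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: (turn text, social-move label) pairs. Real data would be
# annotated turns from meetings, chat, talk shows, or discussion pages.
train_turns = [
    ("let's move on to the next agenda item", "topic-control"),
    ("I think we should finish the budget first", "disagreement"),
    ("good point, I agree with that", "agreement"),
    ("could you summarize the action items?", "information-request"),
]
texts, labels = zip(*train_turns)

# Turn-level classifier: lexical n-gram features plus logistic regression,
# whose predicted probabilities serve as per-decision confidences.
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)


def speaker_move_profile(turns):
    """Average each speaker's turn-level move probabilities.

    `turns` is a list of (speaker, text) pairs; the resulting distribution
    over moves stands in for evidence about a higher-level construct such
    as an attempt to control the interaction.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for speaker, text in turns:
        probs = clf.predict_proba([text])[0]
        for move, p in zip(clf.classes_, probs):
            totals[speaker][move] += p
        counts[speaker] += 1
    return {s: {m: v / counts[s] for m, v in moves.items()}
            for s, moves in totals.items()}


meeting = [("A", "let's move on, we're out of time"),
           ("B", "I agree, let's wrap up")]
print(speaker_move_profile(meeting))
```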

SPONSOR: IARPA

AWARD PERIOD: August 2009 - October 2011

TEAM MEMBERS:

PUBLICATIONS:

Works supported in whole or in part by the IARPA SCIL program. All statements of fact, opinion, or conclusion in the papers below are those of the authors and should not be construed as representing the official views or policies of IARPA or the US Government.

RESOURCES: