Linguistic Cues to Social Goals in Spoken & Virtual, Private & Broadcast Interactions
This project addresses the automatic analysis of speech and text for linguistic cues to the social goals of participants in multi-party interactions. In particular, the work aims to combine data-driven learning with theories of organizational communication identification and control to develop algorithms that automatically extract linguistic cues and use them to identify speaker/author:
- Status in terms of authority;
- Moves to associate or disassociate with individuals or a group; and
- Moves designed to influence opinions, beliefs or actions.
To learn linguistic features that generalize across different forms of speech and text, we work with both synchronous (i.e., live) work group planning interactions (meeting recordings, online chat) and asynchronous broadcast discussions (talk shows, Wikipedia discussion pages). We use the term 'genre' to refer to the different data forms, where genre differences comprise the situational characteristics of a text or speech, including its modality (written vs. spoken), timing (synchronous vs. asynchronous), audience (private vs. broadcast), and formality, as well as more fine-grained characteristics. We leverage a variety of machine-learning strategies to identify turn-level social moves and map these to higher-level social constructs. Key technical challenges include learning robust features from sparse training data, estimating the confidence of decisions, and presenting meaningful evidence behind decisions.
SPONSOR: IARPA
AWARD PERIOD: August 2009 - October 2011
TEAM MEMBERS:
- UW Senior staff:
Prof. Mari Ostendorf (EE), Principal Investigator
Prof. Emily Bender (Linguistics)
Prof. Mark Zachry (HCDE)
- UW Graduate Students:
Brian Hutchinson, Bin Zhang, Alex Marin, Wei Wu, Anna Margolis (EE)
Meghan Oxley, Liyi Zhu (Linguistics)
Jonathan Morgan (HCDE)
- Collaborators: SRI International
PUBLICATIONS:
Works supported in whole or in part by the IARPA SCIL program. All statements of fact, opinion or conclusions in the papers below are those of the authors and should not be construed as representing the official views or policies of IARPA or the US Government.
- E. Bender et al., "Annotating social acts: Authority Claims and Alignment Models in Wikipedia Talk Pages," Proc. ACL Workshop on Language in Social Media, to appear, 2011.
- A. Marin, B. Zhang, and M. Ostendorf, "Detecting forum authority claims in online discussions," Proc. ACL Workshop on Language in Social Media, to appear, 2011.
- J. Morgan, M. Oxley, E. Bender, M. Zachry and B. Hutchinson, "Authority claims as identity markers in Wikipedia discussions," Georgetown Roundtable on Languages and Linguistics, March 2011.
- A. Marin et al., "Detecting authority bids in online discussions," Proc. IEEE Workshop on Spoken Language Technology, pp. 49-54, 2010.
- M. Oxley, J.T. Morgan, M. Zachry, and B. Hutchinson, "'What I Know Is . . .': Establishing Credibility on Wikipedia Talk Pages," Proc. WikiSym, 2010.
- B. Zhang, B. Hutchinson, W. Wu, and M. Ostendorf, "Extracting phrase patterns with minimum redundancy for unsupervised speaker role classification," Proc. NAACL HLT, pp. 717-720, 2010.
- B. Hutchinson, B. Zhang, and M. Ostendorf, "Unsupervised broadcast conversation speaker role labeling," Proc. ICASSP, pp. 5322-5325, 2010.
RESOURCES:
- Original Wikipedia discussion data extracted from English Wikipedia dump (tgz)
- Alignment and Authority in Wikipedia Discussions (AAWD) Corpus
- Chat data (all languages) tar.gz
- Social constructs data