University of Pittsburgh
Beyond a Bag of Words | Baekkwan Park, Michael Colaresi, Kevin Greene | 2019 | Peace Economics, Peace Science and Public Policy

Beyond a Bag of Words: Using PULSAR to Extract Judgments on Specific Human Rights at Scale

Baekkwan Park, Kevin Greene and Michael Colaresi

Abstract

Sentiment, judgments and expressed positions are crucial concepts across international relations and the social sciences more generally. Yet contemporary quantitative research has conventionally avoided the most direct and nuanced source of this information: political and social texts. In contrast, qualitative research has long relied on patterns in texts to understand detailed trends in public opinion, social issues, the terms of international alliances, and the positions of politicians. Yet qualitative human reading does not scale to the accelerating mass of digital information now available. Researchers need automated tools that can extract meaningful opinions and judgments from texts. There is thus an emerging opportunity to marry the model-based, inferential focus of quantitative methodology, as exemplified by ideal point models, with high-resolution, qualitative interpretations of language and positions. We suggest that using alternatives to simple bag-of-words (BOW) representations, and refocusing on aspect-sentiment representations of text, will help researchers systematically extract, at scale, both people's judgments and what is being judged. The experimental results below show that our approach, which automates the extraction of aspect and sentiment multiword expression (MWE) pairs, outperforms BOW in classification tasks while providing more interpretable parameters. By connecting expressed sentiment to the aspects being judged, PULSAR (Parsing Unstructured Language into Sentiment-Aspect Representations) also has deep implications for understanding the underlying dimensionality of issue positions and ideal points estimated with text. Our approach to parsing text into aspect-sentiment expressions recovers both the expressive phrases (akin to categorical votes) and the aspects being judged (akin to bills).
Thus, PULSAR, or a future system like it, opens up new avenues for the systematic analysis of high-dimensional opinions and judgments at scale within existing ideal point models.
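To illustrate why a bag of words can lose the link between a judgment and what is being judged, the following is a minimal toy sketch, not PULSAR's implementation. It assumes hand-written aspect and sentiment lexicons (the names `ASPECTS`, `SENTIMENT`, `aspect_sentiment_pairs` and `judgment_row` are all hypothetical) and a naive rule that pairs every aspect phrase with every sentiment phrase in the same sentence; PULSAR itself learns aspect and sentiment MWEs from data. The two example documents have identical BOW counts, yet their aspect-sentiment pairs, read as "votes" (sentiment scores) on "bills" (aspects), point in opposite directions.

```python
from collections import Counter

# Hypothetical hand-written lexicons standing in for learned aspect/sentiment MWEs.
ASPECTS = ("torture", "freedom of assembly", "political imprisonment")
# Hypothetical scoring: -1 = judgment of abuse, +1 = judgment of respect.
SENTIMENT = {"widespread": -1, "routinely violated": -1, "rare": 1, "respected": 1}


def bag_of_words(text):
    """Unordered unigram counts: word order and pairings are discarded."""
    return Counter(text.lower().replace(".", "").split())


def aspect_sentiment_pairs(text):
    """Naive rule: within each sentence, pair every aspect phrase with every sentiment phrase."""
    pairs = []
    for sentence in text.lower().split("."):
        aspects = [a for a in ASPECTS if a in sentence]
        sentiments = [s for s in SENTIMENT if s in sentence]
        pairs.extend((a, s) for a in aspects for s in sentiments)
    return pairs


def judgment_row(text):
    """Aggregate pairs into an aspect -> score row, analogous to one legislator's votes on bills."""
    row = {}
    for aspect, phrase in aspect_sentiment_pairs(text):
        row[aspect] = row.get(aspect, 0) + SENTIMENT[phrase]
    return row


doc_a = "Torture was widespread. Political imprisonment was rare."
doc_b = "Torture was rare. Political imprisonment was widespread."

print(bag_of_words(doc_a) == bag_of_words(doc_b))  # True: BOW cannot tell them apart
print(judgment_row(doc_a))  # {'torture': -1, 'political imprisonment': 1}
print(judgment_row(doc_b))  # {'torture': 1, 'political imprisonment': -1}
```

Stacking such rows across many documents yields a document-by-aspect matrix shaped like a roll-call matrix, which is the kind of input ideal point models expect; the substring matching used here is deliberately simplistic and would mis-fire on real text.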