I really appreciate seeing the principled reasoning for using certain types of classifiers, and doing feature analysis both before and after classification. On this basis alone, this paper seems like a good guide to classifier best practices for the social sciences. Moreover, the discussion section takes care to relate the specific findings to larger theoretical ideas about affective states, like collaborative conversation style, and the relationship between specific features and affective state (e.g., negation use during flirtation may be related to teasing or self-deprecation; the potential distinction between extraversion and assertiveness; the connection between hedging and psychological distancing; what laughter signals at different points in the conversational turn). Thanks, R&al2013!
(1) Data cleanliness: R&al2013 want a really clean dataset to learn from, which is why they start with the highest 10% and lowest 10% of judged stance ratings. We can certainly see the impact of having messier data, based on the quartile experiments. In short, if you use less obvious examples to train, you end up with worse performance. I wonder what would happen if you trained on the cleaner data (say, the top and bottom 10%), but tested on the messier data (top and bottom 25%). Do you think you would still do as poorly, or would you have learned some good general features from the clean dataset that can be applied to the messy dataset? (I’m thinking about this in terms of child-directed speech (CDS) for language acquisition, where CDS is “cleaner” in various respects than messy adult-directed data.)
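To make the train-on-clean, test-on-messy idea concrete, here is a minimal sketch of how the split could be set up. It is not anything R&al2013 did: the function name `extreme_split` and its parameters are hypothetical, and I'm assuming we'd want to keep the training items out of the test set, so the "messy" test items are the ones inside the outer quartiles but outside the outer deciles.

```python
def extreme_split(scores, train_frac=0.10, test_frac=0.25):
    """Return (train, test) lists of (index, label) pairs.

    label is 1 for the high end of the stance ratings and 0 for the low end.
    Hypothetical helper: the fractions and the exclusion of training items
    from the test set are assumptions, not the paper's setup.
    """
    if not 0 < train_frac < test_frac <= 0.5:
        raise ValueError("need 0 < train_frac < test_frac <= 0.5")
    n = len(scores)
    # Indices sorted from lowest to highest rating.
    order = sorted(range(n), key=lambda i: scores[i])
    k_train = int(n * train_frac)
    k_test = int(n * test_frac)

    # "Clean" examples: the most extreme ratings at each end.
    low_train = order[:k_train]
    high_train = order[n - k_train:]
    # "Messy" examples: within the outer quartiles, outside the deciles.
    low_test = order[k_train:k_test]
    high_test = order[n - k_test:n - k_train]

    train = [(i, 0) for i in low_train] + [(i, 1) for i in high_train]
    test = [(i, 0) for i in low_test] + [(i, 1) for i in high_test]
    return train, test
```

Any classifier could then be fit on `train` and scored on `test`; comparing that score against the paper's quartile-trained results would say whether the clean examples teach features that generalize.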
(2) This relates to the point in the main section about how R&al2013 really care about integrating insights from the psychology of the things they’re trying to classify. In the lit review, I appreciated the discussion of the psychological literature related to interpersonal stance (e.g., specifying the different categories of affective states). This demonstrates the authors are aware of the cognitive states underpinning the linguistic expression.
(3) Lexical categories, using 10 LIWC-like categories: I appreciated seeing the reasoning in footnote 1 about how they came up with these, and more importantly, why they modified them the way they did. While I might not agree with leaving the “love” and “hate” categories so basic (why not use WordNet synsets to expand this?), it’s at least a reasonable start. Same comment for the hedge category (which I love seeing in the first place).
(4) Dialog and discourse features: Some of these seem much more complex to extract (e.g., sympathetic negative assessments). The authors went for a simple heuristic regular expression to extract these, but this is presumably only a (reasonable) first-pass attempt. On the other hand, given that they had fewer than 1000 speed-dates, they probably could have done some human annotation of these just to give the feature the best shot of being useful. Then, if it’s useful, they can worry about how to automatically extract it later.
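For a sense of what a first-pass regular expression for sympathetic negative assessments might look like, here is a rough sketch. To be clear, this is a pattern I made up for illustration, not the one from the paper: the phrase list, the name `NEG_ASSESSMENT`, and the helper `count_negative_assessments` are all assumptions.

```python
import re

# Hypothetical pattern, NOT the paper's: matches things like
# "that's terrible", "it must be hard", "that sucks", "oh no".
NEG_ASSESSMENT = re.compile(
    r"\b(?:that|it)(?:['’]s| is| was| must be| sounds?)\s+"
    r"(?:so |really |pretty )?"
    r"(?:awful|terrible|horrible|rough|tough|hard|a bummer)\b"
    r"|\bthat sucks\b|\boh no\b",
    re.IGNORECASE,
)


def count_negative_assessments(turns):
    """Count the turns (strings) containing at least one match."""
    return sum(1 for turn in turns if NEG_ASSESSMENT.search(turn))
```

A heuristic like this will miss plenty of paraphrases and will fire on some non-sympathetic uses, which is exactly why a round of human annotation would give the feature a fairer test.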
(5) It’s so interesting to see the accommodation of function words signifying flirtation. Function words were the original authorship stylistic marker, under the assumption that your use of function words isn’t under your conscious control. I guess the idea would be that function word accommodation also isn’t really under your conscious control, and imitation is the sincerest form of flattery (=~ flirtation)…
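Just to pin down what "function word accommodation" could mean operationally, here is one very simple way to measure it. This is my own simplified stand-in and not necessarily the measure R&al2013 use: the tiny `FUNCTION_WORDS` list is illustrative only, and `accommodation_rate` is a hypothetical helper that asks how often a speaker's reply reuses a function word from the partner's immediately preceding turn.

```python
import re

# Illustrative list only; a real analysis would use a fuller inventory.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "on", "and", "but", "or",
    "i", "you", "it", "that", "this", "is", "was",
}


def tokens(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def accommodation_rate(turns, speaker):
    """Fraction of `speaker`'s replies that echo a partner function word.

    turns: list of (speaker_id, text) pairs in conversational order.
    Only replies to a partner turn containing a function word are counted.
    """
    replies = 0
    echoed = 0
    for (prev_spk, prev_text), (spk, text) in zip(turns, turns[1:]):
        if spk != speaker or prev_spk == speaker:
            continue
        prev_fw = FUNCTION_WORDS.intersection(tokens(prev_text))
        if not prev_fw:
            continue  # nothing for the speaker to accommodate to
        replies += 1
        if prev_fw.intersection(tokens(text)):
            echoed += 1
    return echoed / replies if replies else 0.0
```

A raw rate like this would need a baseline (how often the speaker uses those words anyway) before it could count as evidence of accommodation rather than just frequent function words.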