I really enjoyed seeing another example of a quantitative framework that builds a pipeline between behavioral data and modeling work. The new(er?) twist in Y&al2017 for me is using Bayesian data analysis to do model-fitting after the behavioral data were collected (originally collected to evaluate the unfitted model predictions). It definitely seems like the right thing to do for model validation. More generally, this pipeline approach seems like the way forward for a lot of different language science questions where we can’t easily manipulate the factors we want experimentally. (In fact, you can see some of the trouble here about how to interpret the targeted behavioral manipulations even still.)
More targeted thoughts:
(1) I liked seeing this specific implementation of hedging, which is a catch-all term for a variety of behaviors that soften the content (= skew towards face-saving). It’s notable that the intuition seems sensible (use more negation when you want to face-save), but the point of the model and subsequent behavioral verification is to concretely test that intuition. Just because something’s sensible in theory doesn’t mean it’s true in practice.
A nice example of this for me was the prediction in Figure 2 that more negations occur when the goal is both social and informative (both-goal), rather than just social. Basically, the social-only speaker tells direct white lies, while the informative-only speaker just tells the truth, so neither uses negation as much as the both-goal speaker for negative states.
(2) I think I need to unpack that first equation in the Polite RSA model. I’m not familiar with the semicolon notation: is this the joint utility of the utterance (w), given the state (s) and given the goal weights (epistemic and social)? (This is what shows up in P_S2.) The rest I think I follow: the first term is the epistemic weight * the negative surprisal of L0; the second term is the social weight * the value for those states that are true for L0; the third term is the cost of the utterance (presumably in length, as measured by words).
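To check my reading of the three terms, here is a minimal Python sketch of how I understand the utility. Everything concrete in it is my own assumption rather than something taken from Y&al2017: the literal acceptance numbers in MEANINGS are made up, L0 is assumed to have a uniform prior over states, the value of a state is assumed to be just its rating, and cost is assumed to be a fixed (hypothetical) amount per word.

```python
import math

STATES = [1, 2, 3, 4, 5]  # rating of the poem, 1 (terrible) to 5 (amazing)

# Made-up literal acceptance probabilities (assumption, NOT the paper's data).
MEANINGS = {
    "terrible":     {1: 0.90, 2: 0.50, 3: 0.10, 4: 0.05, 5: 0.05},
    "not terrible": {1: 0.10, 2: 0.50, 3: 0.90, 4: 0.95, 5: 0.95},
    "amazing":      {1: 0.05, 2: 0.05, 3: 0.10, 4: 0.50, 5: 0.90},
}

def literal_listener(utterance):
    """L0: normalize acceptance probabilities over states (uniform prior assumed)."""
    acceptance = MEANINGS[utterance]
    total = sum(acceptance.values())
    return {s: acceptance[s] / total for s in STATES}

def utility(utterance, state, phi_epistemic, phi_social, cost_per_word=0.1):
    """Weighted epistemic term + weighted social term - utterance cost."""
    l0 = literal_listener(utterance)
    # Term 1: negative surprisal of the true state under L0.
    epistemic = math.log(l0[state])
    # Term 2: expected value of the state under L0 (value = rating, assumed).
    social = sum(p * s for s, p in l0.items())
    # Term 3: cost in number of words (hypothetical cost_per_word).
    cost = cost_per_word * len(utterance.split())
    return phi_epistemic * epistemic + phi_social * social - cost
```

With these toy numbers and a true state of 1, an informative-only speaker (phi_epistemic=1, phi_social=0) gets higher utility from "terrible" than from "amazing", while a social-only speaker (phi_epistemic=0, phi_social=1) gets the reverse ordering, which is the white-lie pattern.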
(3) Figure 1: How funny that “it wasn’t terrible” is accepted at near ceiling when the true value is “good” (4 out of 5) or “amazing” (5 out of 5). Is this some kind of sarcasm/curmudgeonly speaker component at work?
(4) For the production experiment, I keep thinking this is the kind of thing where a person’s own social acuity might matter. (That is, if the poem was 2 out of 5, and a tactful vs. direct person is asked what they’d say to make someone else feel good, you might get different responses.) I wonder if they collected self-reports on how tactful vs. direct their participants thought they were (and whether this actually does matter).
I also have vague thoughts that this is the kind of task you could use to indirectly gauge tactfulness vs. directness in neurotypical people (say, for HR purposes) as well as in populations that struggle with non-literal language. Differences in social acuity might explain some of the significant deviations in the center panel of Figure 2 for the low states (1 and 2): participants in the social-only condition used negation (rather than white lies, presumably) much more than predicted. (Though maybe not once the weights and free parameters are inferred, since the fit is pretty great.)
Maybe this social acuity effect comes out in the Bayesian data analysis, which inferred the participant weights. There wasn’t much difference between the inferred weight for the social-only condition and the one for the both-goal condition (0.57 vs. 0.51). Again, I’d be really curious to see if this separated out by participant social acuity.
(5) I found the potential (non?)-difference between “it wasn’t amazing” and “it wasn’t terrible” really interesting. I keep trying to decide if I differ in how I deploy them myself. I think I do — if I’m talking directly to the person, I’ll say “it wasn’t terrible”; if I’m talking to someone else about the first person’s poem, I’ll say “it wasn’t amazing”. I’m trying to ferret out why I have those intuitions, but it probably has something to do with what Y&al2017 discuss at the very end about the speaker’s own face-saving tactics.