Monday, November 20, 2017

Some thoughts on Stevens et al. 2017

It’s really nice to see an RSA model engaging with pretty technical aspects of linguistic theory, as S&al2017 do here. In these kinds of problems, there tend to be a lot of links to follow in the chain of reasoning, and it’s definitely not easy to adequately communicate them in such a limited space. (Side note: I forget how disorienting it can be to not know specific linguistics terms until I try to read them all at once in an abstract without a concrete example. This is a good reminder to those of us who work in more technical areas: Make sure to have concrete examples handy. The same thing is true for walking through the empirical details with the prosodic realizations as S&al2017 have here — I found the concrete examples super-helpful.)

Specific thoughts:

(1) For S&al2017, “information structure” = inferring the QUD (Question Under Discussion) probabilistically from prosodic cues?

(2) I think the technical linguistic material is worth going over, as it connects to the RSA model. For instance, I’m struggling a bit to understand the QUD implications for having incomplete answers vs. having complete answers, especially as it relates to a QUD’s compatibility with a given melody.

For example, when we hear “Masha didn’t run QUICKLY”, the QUD is something like “How did Masha run?”. That’s an example of an incomplete answer. What’s a complete answer version of this scenario, and how does this impact the QUD? Once I get this, then I think it makes complete sense to use the utility function defined in equation (10). 
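To make the complete/incomplete distinction concrete for myself, here’s a tiny sketch of how I’m thinking about it: a QUD partitions the possible worlds, an answer is complete when it narrows things down to a single cell of that partition, and incomplete when it only rules some cells out. This is purely my own toy construal in Python (the worlds, utterances, and denotations are made up for illustration), not S&al2017’s formal setup:

```python
# Toy construal of a QUD as a partition over worlds (my own illustration,
# not S&al2017's formal setup). Worlds record whether Masha ran and how.
worlds = [
    {"ran": True,  "manner": "quickly"},
    {"ran": True,  "manner": "slowly"},
    {"ran": False, "manner": None},
]

# "How did Masha run?" partitions worlds by manner.
def qud_how(world):
    return world["manner"]

# Denotations: the indices of worlds each utterance leaves open (made up here).
utterances = {
    "Masha didn't run QUICKLY": [1, 2],  # rules out only the quickly-world
    "Masha ran QUICKLY":        [0],     # pins down a single world
}

def answer_status(utterance, qud):
    """Complete if every surviving world falls in one QUD cell; otherwise incomplete."""
    cells = {qud(worlds[i]) for i in utterances[utterance]}
    return "complete" if len(cells) == 1 else "incomplete"

for u in utterances:
    print(u, "->", answer_status(u, qud_how))
# "Masha didn't run QUICKLY" -> incomplete (slowly and not-at-all both remain)
# "Masha ran QUICKLY"        -> complete
```

On this construal, the negated-adverb utterance leaves the “How?” question open, which is (I think) why it counts as an incomplete answer; whether that’s exactly how compatibility with a given melody works in the paper is the part I’m unsure about.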

(3) I was struck by S&al2017’s notational trick, where they sidestep the recursive social reasoning loop of literal listener to speaker to pragmatic listener. Here, it’s utility function to speaker to hearer, presumably because they’re trying to deemphasize the social reasoning aspect? Or maybe they just thought it made more sense described this way?
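For reference, the recursion I usually have in mind is the generic literal listener to speaker to pragmatic listener chain, where the speaker’s utility is just the log-probability the literal listener assigns to the intended meaning. Here’s a minimal sketch of that textbook version (the worlds and utterances are toy stand-ins I made up, and I’m not claiming this is S&al2017’s exact formulation):

```python
import math

# Generic textbook-style RSA chain (not necessarily S&al2017's exact equations).
# meanings[u] = set of worlds where utterance u is true; everything here is a toy.
meanings = {
    "quickly": {"ran_quickly"},
    "ran":     {"ran_quickly", "ran_slowly"},
}
worlds = ["ran_quickly", "ran_slowly"]

def literal_listener(u):
    compatible = meanings[u]
    return {w: (1 / len(compatible) if w in compatible else 0.0) for w in worlds}

def speaker(w, alpha=1.0):
    # Speaker utility for u given intended world w is log L0(w | u);
    # choice probabilities come from a softmax with rationality alpha.
    scores = {u: math.exp(alpha * math.log(literal_listener(u)[w]))
              for u in meanings if literal_listener(u)[w] > 0}
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}

def pragmatic_listener(u, alpha=1.0):
    # Bayesian update over worlds (uniform prior) using the speaker model.
    scores = {w: speaker(w, alpha).get(u, 0.0) for w in worlds}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

print(pragmatic_listener("ran", alpha=4.0))
# -> mass shifts toward "ran_slowly", since a speaker who meant "ran_quickly"
#    would likely have said "quickly"
```

If that’s the right way to line things up, then the relabeling might be mostly cosmetic: the literal listener step just gets folded into the utility function.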

(4) About those results:
Figure 2: It’s nice to see modelers investigating the effect of the rationality (softmax) parameter in the speaker function. From the look of Figure 2, speakers need to be pretty darned rational indeed (i.e., really exaggerating endpoint behavior) in order to get any separation in the commitment-certainty predictions.
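Just to unpack for myself what “exaggerating endpoint behavior” means for a softmax speaker (this is a generic illustration with made-up utilities, not the paper’s actual numbers): small utility differences only turn into visible separation in the predictions once the rationality parameter gets cranked up.

```python
import math

def softmax_speaker(utilities, alpha):
    """Generic softmax choice rule; alpha is the rationality parameter."""
    exps = [math.exp(alpha * u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Two utterances whose utilities differ only slightly (toy numbers, not the paper's).
utilities = [0.0, -0.3]
for alpha in (1, 4, 10):
    print(alpha, [round(p, 3) for p in softmax_speaker(utilities, alpha)])
# alpha = 1  -> about [0.574, 0.426]: barely any separation
# alpha = 10 -> about [0.953, 0.047]: near-categorical preference for the better option
```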

Thinking about this intuitively, we should expect the LH Name condition (MASHA didn’t run quickly) to continue to be ambivalent about commitment to Masha running at all. That definitely shows up. I think. (Actually, I wonder if it might have been more helpful to ask participants to rate things on a scale from 1 (No, certainly not) to 7 (Yes, certainly so). That seems like it would make a score of 4 easier to interpret (4 = maybe yes, maybe no). As it is, I’m a little unsure how participants were interpreting the middle of the scale. I would have thought “No, not certain” would be the “maybe yes, maybe no” option, and so we would expect scores of 1. This becomes something of an issue when we come to the quantitative fit of the model results to the experimental results: is the behavioral difference shallow just because of the way humans were asked to give their answers? The way the model probability is calculated in (16) suggests that the model is operating more under the 1 = “no, certainly not” version, if I’m interpreting it correctly: the “certainly yes” option is contrasted with the “certainly not” option.)
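To make that scale worry concrete, here are two toy mappings from a model probability p = P(Masha ran) to a 1-to-7 rating, one for each reading of the scale. These mappings are purely my own assumption for illustration, not how (16) actually links the model to the data:

```python
def rating_bipolar(p):
    # Reading A: 1 = "no, certainly not", 7 = "yes, certainly so", 4 = maybe yes / maybe no.
    return 1 + 6 * p

def rating_unipolar(p):
    # Reading B: 1 = "no, not certain", 7 = "yes, certain"; anything near p = 0.5 sits at 1.
    return 1 + 6 * max(0.0, 2 * p - 1)

for p in (0.5, 0.75, 0.95):
    print(p, round(rating_bipolar(p), 2), round(rating_unipolar(p), 2))
# p = 0.5 maps to 4.0 under reading A but 1.0 under reading B, which is why
# the interpretation of the middle of the scale matters for the quantitative fit.
```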


Clearly, however, we see a shift up in the human responses in Figure 3 for the LH Adverb condition (Masha didn’t run QUICKLY), which does accord with my intuitions. And we get that shift from the model in Figure 2, as long as the rationality parameter is turned way up. (Side note: I’m a little unclear about how to interpret the rationality parameter, though. We always hedge about it in our own simulation results. It seems to be treated as a noise parameter: humans are noisy, so let’s use this to capture some of the messy bits of their behavior. In that case, maybe it doesn’t mean much of anything that it has to be turned up so high here.)
