It’s really cool to see how adding processing considerations to an idealized (i.e., rational) model yields observable behavior. It reminds me of the importance of Marr’s different levels of explanation, where the algorithmic level is where processing considerations often get added (since these affect the algorithm humans use). A lot of work we’ve read about so far has been at the computational level (where, for example, the Rational Speech Act model typically lives). But in the back of my mind, I’m always thinking about what key differences might emerge once we factor in the bottleneck of human cognitive constraints.
Some other thoughts:
(1) Introduction, “As they occur in languages with widely different grammatical structures, we can expect that such an explanation will make reference to general principles of human communication and cognition” - I’m completely sympathetic to this approach, though it strikes me as funny that this is the same empirical fact that generativists use to appeal to innate, language-specific mechanisms (i.e., Universal Grammar). That is, the appearance of a pattern like this across the world’s languages is a signal to generativists that a universal language-specific principle is at work. Of course, as Hahn et al. (2018) note, it could well be that the universal principle has an effect on language (here, as adjective ordering constraints) but in fact the principle itself could be domain-general (e.g., something having to do with memory limitations, etc.).
(2) The Function of Subjective Adjectives: I love seeing how to operationalize intuitions formally -- this is a great example. We have a somewhat squishy notion of subjectivity that gets formalized as judgments whose truth is relative to individuals, which subsequently gets implemented as the listener inferring the speaker’s judgment.
(3) A Model of Adjective Use, where the listener infers a full world state that includes multiple people: This seems equivalent to inferring that the adjective is in fact subjective. Developmentally, that’s definitely a step kids have to figure out, i.e., is adjective A likely to be something everyone agrees on or not?
(4) Communication: Rational Listeners and Speakers, an RSA model with just L0 and S. So this is just a model of why a speaker chooses to say something, rather than how a pragmatic listener (L1) chooses to interpret it? I wonder why we stop at this level rather than going another layer to a pragmatic speaker (S1) who chooses to say something, based on how a pragmatic listener will interpret it. That’s what we have to do when modeling Truth Value Judgment Tasks (TVJTs), for example. Maybe that’s because TVJTs aren’t normal speech events, but instead involve participants judging what they themselves would say?
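Just to make the two-level setup concrete for myself: here’s a minimal sketch of an RSA model with only a literal listener L0 and a speaker S who soft-maximizes informativity. The utterances, states, and semantics are a toy reference game I invented, not anything from the paper.

```python
import numpy as np

# Toy reference game (invented): two utterances, two candidate referents.
utterances = ["big box", "brown box"]
states = ["s1", "s2"]

# Literal semantics: truth[u][s] = 1 if utterance u is true of state s.
# "big box" picks out only s1; "brown box" is true of both.
truth = np.array([[1.0, 0.0],
                  [1.0, 1.0]])

def L0(u_idx, prior=np.array([0.5, 0.5])):
    """Literal listener: P(s | u) proportional to [[u true of s]] * P(s)."""
    scores = truth[u_idx] * prior
    return scores / scores.sum()

def S(s_idx, alpha=1.0):
    """Speaker: P(u | s) proportional to exp(alpha * log L0(s | u))."""
    utilities = np.log(np.array([L0(u)[s_idx] for u in range(len(utterances))]) + 1e-12)
    scores = np.exp(alpha * utilities)
    return scores / scores.sum()

# For s1, the speaker prefers the more informative "big box".
print(S(0))
```

This is the whole pipeline the paper needs for a production model: add an S1 reasoning about an L1 only if you want pragmatic *interpretation*, which is exactly the question above.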
(5) Communication: Rational Listeners and Speakers, where a speaker’s utility function is adjusted (from just basic negative surprisal) because she realizes other people’s judgments may differ from her own: This part where the expected utility isn’t just negative surprisal may be a developmental step kids would have to complete. That is, if kids realize adjectives can be subjective and other people may disagree, then they’d behave like what’s modeled here. On the other hand, if kids don’t realize adjectives can be subjective, they may simply go with negative surprisal.
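One hypothetical way to cash this out, assuming subjectivity works via per-listener standards (thresholds) for an adjective like “big”: the speaker averages negative surprisal over a community of listeners rather than computing it for a single shared semantics. All the numbers below are invented for illustration.

```python
import numpy as np

# Invented community: three listeners with different thresholds for "big".
thresholds = np.array([3.0, 5.0, 7.0])
# Invented referent sizes; uniform prior over referents.
sizes = np.array([2.0, 4.0, 6.0, 8.0])

def L0_prob(target_idx, threshold):
    """A given listener's literal P(target | 'big') under their threshold."""
    true_of = sizes > threshold
    if not true_of[target_idx]:
        return 0.0
    return 1.0 / true_of.sum()

def expected_utility(target_idx):
    """Average log-probability of successful reference across the community."""
    probs = np.array([L0_prob(target_idx, t) for t in thresholds])
    return np.mean(np.log(probs + 1e-12))

# Calling the largest object "big" is safe (every listener agrees it's big),
# so it gets higher expected utility than a borderline referent.
print(expected_utility(3), expected_utility(2))
```

If all thresholds coincided, this would collapse back to plain negative surprisal, which is exactly the contrast between the two developmental stages suggested above.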
(6) Communication: Rational Listeners and Speakers, where the cost is the surprise of utterance u across the community’s language use: This is interesting too -- usually we see cost having to do with individual production costs such as longer utterances being more costly than shorter ones. But of course here, all the utterances are the same length. Instead, what could differ is the frequency of that combination. This seems like a useful aspect to incorporate into speaker models more generally, since frequency in the input can certainly affect ease of production.
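A quick sketch of that frequency-based cost, with invented community counts: cost is surprisal under the community’s distribution over utterances, so the two orders have identical length but different costs.

```python
import math

# Invented counts of each adjective order in community language use.
community_counts = {
    "big brown box": 90,
    "brown big box": 10,
}
total = sum(community_counts.values())

def cost(u):
    """cost(u) = -log P_community(u): rarer orders are costlier to produce."""
    return -math.log(community_counts[u] / total)

print(cost("big brown box") < cost("brown big box"))  # True: frequent order is cheaper
```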
(7) Adding Noise: I love seeing how this explanation works, with words further back being more likely to be deleted before the whole phrase can be interpreted. It’s nicely intuitive that more subjective words would be preferred further back, since they lead to more disagreement across listeners. But I wonder how this story would work for languages where the adjectives come after the noun. In that case, the more subjective adjective is still further from the noun, but this time it’s the one the listener would have heard most recently. So, I think that means the one further in the past would be deleted more often -- in this case, the less subjective one -- and the more subjective one would be likely to survive. And then it becomes weird, because now we get the reverse situation, where the surviving adjective is the one that listeners don’t agree on as much. It seems like this account would predict languages with postnominal adjectives to have the more subjective ones closer to the noun, since those would be more likely to be forgotten. But that’s not what we see.
There’s a specific note about this in the discussion: “Our account seems to make the correct prediction. In such languages, the noun is more likely to be lost when the second (subjective, in this case) adjective is reached.” -- So is the idea that the listener is just left with the two adjectives and no noun? Why does that lead to the correct order of noun-less_subjective_adj-more_subjective_adj, from a communicative standpoint?
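To play with the intuition in (7), here’s a toy version of the noise mechanism for the prenominal case, with invented deletion probabilities (earlier words are more likely lost by the end of the phrase) and invented per-adjective “agreement” scores. Under these assumptions, placing the subjective adjective first -- farther from the noun, and so more likely to be deleted -- yields higher expected agreement:

```python
import numpy as np

# Invented: probability each adjective is deleted by the time the phrase ends.
# The first word heard has decayed longer, so it is more likely to be lost.
delete_prob = [0.3, 0.1]

# Invented: probability that listeners share the speaker's judgment for each
# adjective type (subjective adjectives invite more disagreement).
agreement = {"subjective": 0.5, "objective": 0.9}

def expected_success(order):
    """Expected agreement, marginalizing over which adjectives survive noise."""
    total = 0.0
    for keep0 in (0, 1):
        for keep1 in (0, 1):
            p = ((1 - delete_prob[0]) if keep0 else delete_prob[0]) * \
                ((1 - delete_prob[1]) if keep1 else delete_prob[1])
            kept = [a for a, k in zip(order, (keep0, keep1)) if k]
            # Success = product of agreement over the surviving adjectives.
            s = np.prod([agreement[a] for a in kept]) if kept else 1.0
            total += p * s
    return total

print(expected_success(["subjective", "objective"]),
      expected_success(["objective", "subjective"]))
```

For the postnominal worry above, the question is what replaces `delete_prob` once the noun itself can be the thing that’s lost by the time the second adjective arrives -- which is where the paper’s reply in the discussion would need to be spelled out.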