I really like the clean layout of this approach and its mathematical predictions, even if I sometimes had to re-read some of the pieces to make sense of them. (I suspect this may be due to the length limitations.) In particular, this strikes me as an example of good corpus work that is motivated by interesting theoretical questions and that makes sure to connect the results to the bigger picture of language change, language acquisition (both first and second), and language use.
More specific thoughts:
(1) The abstract mentions smoothing information over discourse to make nouns “more equally predictable in context”; I had some trouble figuring out what this meant. It shows up again in the section discussing grammatical gender, in the context of uncertainty over an utterance. My best guess is that this means that, at any point along the utterance, the probabilities over which noun is coming are closer to uniform?
Possibly related: “While the average uncertainty following the determiners was similar across languages, German determiners supported much greater entropy reduction than their English equivalent.” So does this mean the range of entropy reduction was greater for German, even if on average it all washed out compared to English? And if it does wash out on average, then is the German gender system on determiners helping communicative efficiency in general, compared to English? This seems related to a comment that occurs a bit later: “However, whereas German provided a substantial entropy offset, English provided none at all.” What does an entropy offset refer to?
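To pin down my own reading of "entropy reduction," here is a toy sketch. It assumes the term means H(noun) minus H(noun | determiner), which is my guess and not necessarily the paper's definition. The mini-lexicons, the counts, and the function names are all made up for illustration. The toy is set up so that uncertainty after the determiner is the same in both languages (2 bits), while only the German determiners reduce entropy, because German starts from a larger noun set.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)

def conditional_entropy(pairs):
    """H(noun | determiner) from a Counter of (determiner, noun) counts."""
    total = sum(pairs.values())
    by_det = {}
    for (det, noun), c in pairs.items():
        by_det.setdefault(det, Counter())[noun] += c
    # Weight each determiner's noun entropy by that determiner's probability.
    return sum((sum(nouns.values()) / total) * entropy(nouns)
               for nouns in by_det.values())

def entropy_reduction(pairs):
    """Assumed definition: H(noun) - H(noun | determiner)."""
    nouns = Counter()
    for (_, noun), c in pairs.items():
        nouns[noun] += c
    return entropy(nouns) - conditional_entropy(pairs)

# Hypothetical toy data: every noun occurs once.
english = Counter({("the", n): 1 for n in ["spoon", "fork", "knife", "plate"]})
german = Counter({("der", n): 1 for n in ["Löffel", "Teller", "Tisch", "Stuhl"]})
german.update({("die", n): 1 for n in ["Gabel", "Tasse", "Lampe", "Tür"]})

for name, data in [("English", english), ("German", german)]:
    print(name,
          "H(noun | det) =", conditional_entropy(data),
          "reduction =", entropy_reduction(data))
```

If this reading is right, the "entropy offset" would be the extra bit of noun entropy that the German determiner absorbs: German can afford a more diverse noun set while leaving the listener with the same uncertainty at the noun as in English.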
Also related: I’m trying to understand what’s going on in Figure 1. Are the two blue lines English vs. German? If so, where are the 10.17 and 10.55 coming from? They seem like they refer to different noun frequencies (based on the y axis). Is the idea that the y axis shows how many more nouns could be used with a certain entropy? If so, then the way to read this is that the three dotted lines come from another calculation, but we see the nouniness of their effect on the y axis. And then, the way to interpret that is that more entropy yields more nouns... so decreasing entropy means more predictable, which means fewer nouns... which means lower lexical diversity? Or does having fewer nouns possible mean you get to be more precise, and so when you sum over all contexts, you get more lexical diversity? I think that’s what this comment indicates: “German speakers appear to use the entropy reduction provided by noun class to choose nouns that are more specific, resulting in greater nominal diversity.”
(2) I’m a little surprised by the claim that it’s mostly adult speakers who innovate. All the first language acquisition work I’ve seen would suggest the bottleneck of L1 acquisition is a non-trivial cause of change (which I’m equating to how “innovation” is used here). This may be a bias on my part because of my own work on Old English to Middle English word order change, with the idea that it was caused in no small part by selective filters on first language learning: Pearl & Weinberg 2007.
Pearl, L., & Weinberg, A. (2007). Input filtering in syntactic acquisition: Answers from language change modeling. Language Learning and Development, 3(1), 43-72.
(3) Following up on the idea that adjectives in English reduce noun entropy, can we then get adjective ordering out of this (and link it to perceived subjectivity, à la Scontras et al. 2017)? In particular, is the more subjective adjective, which is further away from the noun, “more discriminative” or “less definite”? (Less definite seems to be in the same vein.) Is perceived subjectivity somehow tied to frequency?
Scontras, G., Degen, J., & Goodman, N. D. (2017). Subjectivity predicts adjective ordering preferences. Open Mind, 1, 53-65.