I really appreciate seeing a clear explanation at the outset about how to cognitively interpret word entropy. The first thing I wonder when I see a cognitive model is what the variables are meant to correspond to in human cognition, and we get that right up front when it comes to discussing entropy (and why we should care about it). Basically, it’s a reflection of a processing cost (where minimizing entropy means minimizing that cost), so we potentially get some explanatory power about why language use looks the way it does, through the lens of entropy.
The main contributions of B&al2017 seem to be about establishing the ground truth about cross-linguistic entropy variation and methodology for assessing entropy -- so before we start worrying about what causes variation in word entropy, let’s first figure out how to assess it really well and then see if there are actually any differences that need explanation. The main finding is that, hey, word entropy doesn’t really differ much across languages. Therefore, whatever entropy indexes cognitively also doesn’t differ much from language to language, which I think would make sense if this is about general human language processing abilities.
The other main finding is summed up this way: Unigram entropies and entropy rates are related -- in fact, you can predict entropy rate from unigram entropy. Here’s where I start to quibble because the interpretation given here doesn’t help me much: “uncertainty-reduction by co-textual information is approximately linear across the languages of the world.” What does this mean exactly? I don’t know how to contextualize that with respect to language processing. To be fair, I think B&al2017 are clear (in section 5.3) that they don’t know how either: “The exact meaning and implications of these constants are topics for future research.”
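To make the two quantities a bit more concrete, here is a minimal sketch of how they could be computed from a tokenized text. It is not B&al2017's estimation procedure: it uses naive plug-in (relative frequency) estimates, and it uses bigram conditional entropy as a crude stand-in for the entropy rate, which in principle conditions on all of the preceding context. The function names and the toy token sequence are made up for illustration. The gap between the two numbers is one simple way to read "uncertainty reduction by co-textual information".

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Plug-in estimate of H(W) in bits, ignoring all context."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def bigram_conditional_entropy(tokens):
    """Plug-in estimate of H(W_i | W_{i-1}) in bits.

    ASSUMPTION: this one-word context is only a rough proxy for the
    entropy rate, which conditions on unboundedly long context.
    """
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    context_counts = Counter(tokens[:-1])  # how often each word occurs as a context
    n = len(pairs)
    h = 0.0
    for (prev, _word), c in pair_counts.items():
        # joint probability of the pair times log of the conditional probability
        h -= (c / n) * math.log2(c / context_counts[prev])
    return h


# Hypothetical toy "text": perfectly alternating, so context removes all uncertainty.
toy_tokens = "a b a b a b a b".split()
h_unigram = unigram_entropy(toy_tokens)
h_context = bigram_conditional_entropy(toy_tokens)
print(h_unigram, h_context, h_unigram - h_context)
```

In this toy case, the unigram entropy is 1 bit and the conditional entropy is 0, so the co-text removes all of the uncertainty. The linearity claim, as I understand it, is that across real languages the context-sensitive number is roughly a linear function of the context-free one.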
Other thoughts:
(1) B&al2017 note that they’ll discuss how the word entropy facts (i.e., the consistency across human languages) result from a trade-off between word learnability and word expressivity. In 6.1, they give us a bit of a sketch, which is nice -- basically this:
unlimited entropy = unlimited expressivity = unpredictable = hard to learn
minimum entropy = no expressivity = hard to communicate
This is the basic language evolution bottleneck, and then languages find a balance, with Kirby and colleagues providing simulations to demonstrate it, or at least to show how compositionality results from these two pressures. But I’d like to think more about how that relates to word entropy. Compositionality = build larger things out of a finite number of combinable pieces. Word entropy = ...what happens when you have that kind of system? But the interesting thing is how little variation there is, so it’s about a very narrow range of entropy resulting from this kind of system. So does any compositional system end up producing this range? (My sense is no, but I don’t know for sure.) If not, then we may have some interesting constraints on what kind of compositional system human languages end up producing.
(2) It’s interesting that orthographic words have been “vindicated” as reasonable units of analysis for describing regularities in language. Certainly there’s a big to-do in the developmental literature about words as a target of early speech segmentation (where the general consensus is “not really”).
(3) B&al2017 note that morphological complexity impacts unigram entropy, which makes sense: more complex words = more word types. Does this mean that for morphologically complex languages (e.g., agglutinative and polysynthetic), it would make more sense to do morpheme entropy? Or maybe morpheme entropy would be a better baseline, period, for cross-linguistic comparison? (This reminds me of the frequent frames literature in development, where there’s a question about whether the frame units ought to be words or morphemes, and how the child would figure out which to use for her language.)
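As a toy illustration of the word-versus-morpheme question, here is a small sketch that computes plug-in unigram entropy over the same made-up token list twice: once with orthographic words as the units, and once with morphemes as the units. The hyphenated segmentation is a hypothetical annotation I'm assuming for the example; it isn't something B&al2017 provide, and real morphological segmentation is much messier.

```python
import math
from collections import Counter


def unigram_entropy(units):
    """Plug-in estimate of unigram entropy in bits over any sequence of units."""
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# ASSUMPTION: hyphens mark morpheme boundaries in this hand-made toy sample.
segmented_tokens = ["walk-ed", "walk-s", "walk-ing", "talk-ed", "talk-s", "talk-ing"]

# Words as units: strip the boundary markers so each token is one type.
words = [tok.replace("-", "") for tok in segmented_tokens]

# Morphemes as units: split each token at its boundaries.
morphemes = [m for tok in segmented_tokens for m in tok.split("-")]

word_entropy = unigram_entropy(words)
morpheme_entropy = unigram_entropy(morphemes)
print(f"word entropy:     {word_entropy:.3f} bits")
print(f"morpheme entropy: {morpheme_entropy:.3f} bits")
```

Here six distinct word types reduce to five morpheme types that get reused across words, so the morpheme-level entropy comes out lower than the word-level entropy. Whether that pattern holds in real corpora for morphologically rich languages is exactly the open question.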