Computational Models of Language (at UC Irvine): Some thoughts on Tal et al. 2021

Wednesday, October 27, 2021

Some thoughts on Tal et al. 2021

This seemed to me like a straightforward application of a measure of redundancy (one that can be applied to whatever level of representation you like) to quantify redundancy in child-directed speech over developmental time. As T&al2021 note, the idea of repetition and redundancy in child-directed speech isn’t new, but this way of measuring it is, and the results certainly accord with current wisdom that (i) repetition in speech is helpful for young children, and (ii) repetition decreases as children get older (and the speech directed at them gets more adult-like). The contributions therefore also seem pretty straightforward: a new, more holistic measure of repetition/redundancy at the lexical level, and the finding that multi-word utterances seem to be the thing that gets repeated less as children get older.

Some other thoughts:

(1) Corpus analysis: For the Providence corpus, with such large samples, I wonder why T&al2021 chose to make only two age bins (12-24 months and 24-36 months). It seems like there would be enough data there to go finer-grained (maybe every two months: 12-14, 14-16, etc.), and especially to zoom in on the gaps in the NewmanRatner corpus between 12 and 24 months.

(2) I had some confusion over the discussion of the NewmanRatner results, regarding the entropy decrease they found with the shuffled word order of Study 2. In particular, I think the explanation for the entropy decrease was that lexical diversity didn’t increase in this sample as children got older. But I didn’t quite follow why this explains the entropy decrease. More specifically, if lexical diversity stays the same, the shuffled word order keeps the same frequencies of individual words over time, so there’s no change in entropy at the lexical level. With shuffled word order, the multi-word sequences are destroyed, so that should increase entropy. How does no change + entropy increase lead to an overall entropy decrease?

Relatedly, T&al2021 say about Study 2 that “the opposite tendencies of lexical- and multi-word repetitiveness in this corpus seem to cancel each other out at 11 months”. This relates to my confusion above. Basically, we have constant lexical diversity, so there’s no change to entropy over time coming from the lexical level. Decreasing multi-word repetitions leads to higher entropy over time. What are the opposite tendencies here? It seems like there’s only one tendency (increasing entropy from the loss of the multi-word repetitions).
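To make the reasoning in (2) concrete, here is a minimal sketch of what I have in mind. To be clear about the assumptions: this is not T&al2021's estimator, just plain plug-in estimates of unigram entropy and bigram conditional entropy, and the toy "corpus" and the function names are made up for illustration. The point is only that shuffling word order leaves word frequencies (and so the lexical-level entropy) untouched, while destroying the multi-word sequences pushes the sequence-level entropy up.

```python
import math
import random
from collections import Counter


def unigram_entropy(tokens):
    """Plug-in Shannon entropy (bits) of the word frequency distribution."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def bigram_conditional_entropy(tokens):
    """Plug-in estimate of H(w_i | w_{i-1}) in bits, from adjacent-pair counts."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])  # every token that has a successor
    total_pairs = len(tokens) - 1
    h = 0.0
    for (prev, _nxt), c in pair_counts.items():
        # weight by P(prev, next), score by P(next | prev)
        h -= (c / total_pairs) * math.log2(c / context_counts[prev])
    return h


# Hypothetical toy sample with a heavily repeated multi-word frame.
toy = "look at the doggy look at the kitty look at the ball".split() * 20

# Same words, same frequencies, but the word order is destroyed.
shuffled = toy[:]
random.Random(0).shuffle(shuffled)

print("unigram entropy:   ", unigram_entropy(toy), unigram_entropy(shuffled))
print("bigram cond. entropy:", bigram_conditional_entropy(toy),
      bigram_conditional_entropy(shuffled))
```

Under these (simplified) assumptions, the first pair of numbers is identical and the second number goes up after shuffling. So if a measure drops in the shuffled condition, that drop would have to come from a change in the word frequencies themselves, which is exactly the part I couldn't reconcile with lexical diversity staying constant.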
