One of the things I quite like about this paper is that it’s a really nice example of what you can do with observational data (like the CDI data). Of course, the standard limitations still apply: caretaker reports are only so accurate, and the measure here is production rather than comprehension, so there’s a time delay between when the child acquires the knowledge and when it shows up in the data.
Also, how nice to see a study with this many subjects! I think a subject pool of this size is more standard in medical studies, but it’s really rare in language acquisition studies. This means that when we find trends, we can be more sure they’re not just a fluke of the sample.
The question from the modeler’s perspective then becomes “What can we do with this?” Certainly this provides an empirical checkpoint in multiple languages for specific details about the developmental trajectory. So, I think this makes it good behavioral data for models of syntactic development (e.g., MOSAIC by Freudenthal & colleagues: Freudenthal et al. 2007, 2009; Variational learning: Yang 2004, Legate & Yang 2007) and models of vocabulary development (e.g., the model of McMurray & colleagues: McMurray 2007, Mitchell & McMurray 2009, McMurray et al. 2012) to try and match their outputs against. Especially good are the differences across languages - these are the kind of nuances that may distinguish models from each other. Perhaps even more interesting would be an attempt to build a joint model that combines promising syntactic development and vocabulary development models, so that you can check whether it reproduces the correlations this large-scale observational study reports.
Some more targeted thoughts:
(1) The methodological advance of wordbank.stanford.edu pleases me no end. I think this kind of aggregation approach is the way forward. Once you can aggregate data sets of this size, you can find things that you can feel more confident about as a scientist. So, the finding that there are age effects on syntax (less so on morphology) and on function words (less so on nouns) is something that people will take notice of.
(2) Analysis 1: I wonder how much of an effect the linguistic properties of these languages have (e.g., Spanish, Norwegian, and Dutch are morphologically much richer than English). It would be nice to see some sort of quantitative measure of the morphological richness, and maybe other potentially relevant cross-linguistic factors. A related thought: Are there any useful/explanatory cross-linguistic differences in the actual Complexity (Morphological & Syntactic) items?
(3) Analysis 2, Figure 4: There’s an interesting difference in early Spanish where predicates lag behind function words until the vocabulary size reaches about 0.4. Presumably this is due to something about the language itself, and the items in the predicates vs. function words categories? It’s notable that Spanish is also the only language where predicates don’t seem to have an age effect coefficient (see Figure 5), so predicate development is totally predictable from the child’s vocabulary development. Also, Figure 5 shows Danish with a big age effect for Nouns. Does this have to do with the particular nouns, I wonder? Or something about Danish nouns in general?
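To make the wish in (2) a bit more concrete, here is a minimal sketch of one very crude morphological richness proxy: the average number of distinct inflected forms per lemma. This is my own illustration, not anything the paper does. The function name `mean_forms_per_lemma` and the toy word lists are made up for the example, and a real measure would need a proper corpus and a more careful definition.

```python
from collections import defaultdict


def mean_forms_per_lemma(pairs):
    """Return the average number of distinct inflected forms per lemma.

    `pairs` is an iterable of (lemma, form) tuples. This is a hypothetical,
    very rough proxy for morphological richness, not the paper's measure.
    """
    forms = defaultdict(set)
    for lemma, form in pairs:
        # Lowercase so that "Walk" and "walk" count as one form.
        forms[lemma].add(form.lower())
    if not forms:
        raise ValueError("no (lemma, form) pairs given")
    return sum(len(s) for s in forms.values()) / len(forms)


# Toy, hand-picked examples (assumption: not taken from the CDI forms or any corpus).
english = [("walk", f) for f in ["walk", "walks", "walked", "walking"]]
spanish = [
    ("caminar", f)
    for f in ["camino", "caminas", "camina", "caminamos", "camináis", "caminan"]
]

print("English:", mean_forms_per_lemma(english))
print("Spanish:", mean_forms_per_lemma(spanish))
```

With a number like this per language (computed over comparable child-directed corpora, say), one could at least check whether the cross-linguistic differences in Analysis 1 line up with morphological richness.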
~~~
References:
Freudenthal, D., Pine, J. M., Aguado‐Orea, J., & Gobet, F. (2007). Modeling the developmental patterning of finiteness marking in English, Dutch, German, and Spanish using MOSAIC. Cognitive Science, 31(2), 311-341.
Freudenthal, D., Pine, J. M., & Gobet, F. (2009). Simulating the referential properties of Dutch, German, and English root infinitives in MOSAIC. Language Learning and Development, 5(1), 1-29.
Legate, J. A., & Yang, C. (2007). Morphosyntactic learning and the development of tense. Language Acquisition, 14(3), 315-344.
McMurray, B. (2007). Defusing the childhood vocabulary explosion. Science, 317(5838), 631.
McMurray, B., Horst, J. S., & Samuelson, L. K. (2012). Word learning emerges from the interaction of online referent selection and slow associative learning. Psychological Review, 119(4), 831.
Mitchell, C., & McMurray, B. (2009). On leveraged learning in lexical acquisition and its relationship to acceleration. Cognitive Science, 33(8), 1503-1523.
Yang, C. D. (2004). Universal Grammar, statistics or both? Trends in Cognitive Sciences, 8(10), 451-456.