One of the things I really enjoyed about this paper was the framing they give to explain why we should care about the emergence of grammatical categories, with respect to the existing debate between (some of) the nativists and (some of) the constructivists. Of course I'm always a fan of clever uses of Bayesian inference to problems in language acquisition, but I sometimes really miss the level of background story that we get here. (So hurrah!)
That being said, I was somewhat surprised to see the conclusion M&al2014 drew with respect to their results that (some kind of) a nativist view wasn't supported. To me, the fact that we see very early rapid development of this grammatical category knowledge is an unexpected thing from the "gradual emergence based on data" story (i.e., constructivist perspective). So, what's causing the rapid development? I know it's not the focus of M&al2014's work here, but positing some kind of additional learning guidance seems necessary to explain these results. And until we have a story for how that guidance would be learned, the "it's innate" answer is a pretty good placeholder. So, for me, that places the results in the nativist side, though maybe not the strict "grammatical categories are innate" version. Maybe I'm being unfair to the constructivist side, though -- would they have an explanation for the rapid, early development?
Another very cool thing was the application of this approach to the Speechome data set. It's been around for awhile, but we don't have a lot of studies that use it and it's such an amazing resource. One of the things I wondered, though, was whether the evaluation metric M&al2014 propose can only work if you have this density of data. It seems like that might be true, given the issues with confidence intervals on the CHILDES datasets. If so, this is different from Yang's metric [Yang 2013] which can be used on much smaller datasets. (My understanding is that as long as you have enough data to form a Zipfian distribution, you have enough for Yang's metric to be applied.)
One thing I didn't quite follow was the argument about why only a developmental analysis is possible, rather than both a developmental and a comparative analysis. I completely understand that adults may have different values for their generalized determiner preferences, but we assume that they realize determiners are a grammatical class. So, given this, whatever range of values adults have is the target state for acquisition, right? And this should allow a comparative analysis between wherever the child is and wherever the adult is. (Unless I'm missing something about this.)
Some more targeted thoughts:
As a completely nit-picky thing that probably doesn't matter, it took me a second to get used to calling grammatical categories syntactic abstractions. I get that they're the basis for (many) syntactic generalizations, but I wouldn't have thought of them as syntactic, per se. (Clearly, this is just a terminology issue, and other researchers that M&al2014 cite definitely have called it syntactic knowledge, too.)
M&al2014 state in the previous work section that Yang's metric is "not well-suited to discovering if a child could be less than fully productive at a given stage of development". I'm not sure I understand why this is so - if the observed overlap in the child's output is less than the expected overlap from a fully productive system, isn't that exactly the indicator of a less than fully productive system?
In the generative model M&al2014 use, they have a latent variable that represents the unrecorded caregiver input (DA), which is assumed to be drawn from the same distribution as the observed caregiver input (dA). I don't follow what this variable contributes, especially if it follows the same distribution as the observed data.
The table just below figure 4: I'm not sure I followed this. What would rich morphology be for English data, for example? And are the values for "Current" the v value inferred for the child? Are the Yang 2013 values calculated based on his expected overlap metric?
I wonder if the reason there were developmental changes found in the Speechome corpus is more about having enough data in the appropriate age range (i.e., < 2 years old). The other corpora had a much wider range of ages, and it could very well be that the ones that included younger-than-2-year-old data had older-age data included in the earliest developmental window investigated.
There's a claim made in the discussion that "no previous analysis has taken into account the input that individual children hear in judging whether their subsequent determiner usage has changed its productivity". I think what M&al2014 intend is something related to the explicit modeling of how much of the productions are imitated chunks, and if so, that seems completely fine (though one could argue that the Yang 2010 manuscript goes into quite some detail modeling this option). However, the way the current sentence reads, it seems a bit odd to say no previous analysis has cared about the input -- certainly Yang's metric can be used to assess productivity in child-directed speech utterances, which are the children's input. This is how a comparative analysis would presumably be made using Yang's metric.
Similarly, there's a claim near the end that the Bayesian analysis "makes inferences regarding developmental change of continuity in a single child possible". While it's true that this can be done with the Bayesian analysis, there seems to be an implicit claim that the other metrics can't do this. But I'm pretty sure it can also be done with the other metrics out there (e.g., Yang's). You basically apply the metric to data at multiple time points, and track the change, just as M&al2014 did here with the Bayesian metric.
Yang, C. 2013. Onotogeny and philogeny of language. 2013. Proceedings of the National Academy of Science, 110 (16). doi:10.1073/pnas.1216803110.