I really appreciate this kind of overview, especially for an acquisition modeling literature I’m not as familiar with. It’s heartening to see similar broad concerns (consensus about what models should be doing), even if I might not always agree with the particulars. What caught my initial attention here is the focus on moving beyond “purely distributional features of the input” — though it turns out this might mean something different to me than to the authors.
For me, “purely distributional” means using only distributional information (rather than being additionally biased to skew the distributions in some way, e.g., by upweighting certain data and downweighting others). Importantly, "purely distributional" can still be information about the distribution of fairly abstract things, like thematic role positions. For M&C2014, based on the intro, it seems like they want it to mean distributions of words, since they specifically point out the “relative lack of semantic information” in current distributional usage-based models. They also contrast a purely distributional version of Perfors et al.’s dative alternation learning model with one that includes “a single semantic feature”. So while I’m happy to see the inclusion of more abstract linguistic features, I would still class the use of the distributions of those features as a purely distributional strategy. (This is part of the general idea that it's not that you're counting, but rather what you're counting.)
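To make the "what you're counting" point concrete, here's a minimal sketch. Everything in it is my own toy illustration rather than anything from M&C2014 or Perfors et al.: the three utterances, the thematic role labels, and the helper name `distribution` are all assumptions. The same counting routine is applied twice, once over words and once over role positions, and neither run involves any extra bias.

```python
from collections import Counter

# Toy, hand-made data (an assumption for illustration, not from M&C2014):
# each utterance is a list of (word, thematic_role) pairs.
utterances = [
    [("Lily", "agent"), ("gave", "verb"), ("Jack", "recipient"), ("the ball", "theme")],
    [("Lily", "agent"), ("gave", "verb"), ("the ball", "theme"), ("to Jack", "recipient")],
    [("Jack", "agent"), ("sent", "verb"), ("Lily", "recipient"), ("a card", "theme")],
]

def distribution(utterances, unit):
    """Relative frequency of each sequence of units across the utterances.

    `unit` picks out what gets counted from each (word, role) pair.
    """
    counts = Counter(tuple(unit(tok) for tok in utt) for utt in utterances)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

# Same counting procedure, different things being counted.
word_dist = distribution(utterances, unit=lambda tok: tok[0])  # word sequences
role_dist = distribution(utterances, unit=lambda tok: tok[1])  # role sequences

for name, dist in [("words", word_dist), ("roles", role_dist)]:
    print(name)
    for seq, p in dist.items():
        print("  ", seq, round(p, 2))
```

In this toy data, every word sequence is unique, while the role sequences collapse two of the three utterances into a single pattern. The learner is doing nothing beyond tracking distributions in either case, which is why I'd still call the second version purely distributional.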
Some additional thoughts:
(1) I like the suggestion to create models that can produce behavioral output that we can compare against children’s behavioral output. (This is under the general heading of “Models should aim to capture aspects of language use”.) That way, we don’t have to spend so much time arguing over the theoretical representation we choose for the model’s internal knowledge: the ultimate checkpoint is whether that representation can generate the observed behavior (i.e., an existence proof). This is exactly the sort of thing we read about last time in the reading group. Of course, as we also saw last time, this is much easier said than done.
(2) One criticism M&C2014 bring up as they discuss the models of semantic role labeling is that there’s a fixed set of predefined semantic roles. Is this really a problem, though? I think there’s evidence for early conceptual roles in infants (something like proto-agent and proto-patient).
Also, later on in the discussion of verb argument structure, M&C2014 describe Chang’s Embodied Construction Grammar model as involving a set of “predefined schemas” that correspond to “actions, objects, and agents”. This doesn’t seem to cause M&C2014 as much consternation — why is it any more usage-based to have predefined conceptual schemas instead of predefined conceptual roles?
(3) I admit, I was somewhat surprised in the future extensions discussion to see “subject-auxiliary inversion” as an example of complex grammatical phenomena. In my head, that’s far more basic than many other things I see in the syntactic development literature, such as raising vs. control verb interpretation, quantifier scope ambiguity, syntactic island constraints, binding relations, negative polarity items, and so on. Related to this, it’s unclear to me how much incorporating “social feedback” that “reflect[s] the semi-supervised nature of the learning task” is going to matter for syntactic knowledge like this. How much feedback do children get (and actually absorb, even if they get it) for these more sophisticated knowledge elements?