I really appreciate the paper’s attempt to leverage a sophisticated language modeling tool (neural networks, or NNs) to help us understand child language acquisition. I love the attempt to see how different input affects acquisition (here, of passivization in English). That said, I’m still struggling to be convinced by what seems to be a major claim of the paper: “neural network language models as theories of acquisition”. I have Feelings (TM) about this, which I talk about below. Short version: I’d like to believe this, but I just don’t yet. So, given that I care about child language acquisition, I don’t know what to do with these results.
Thoughts:
(1) NNs as theories of acquisition: The Feelings (TM).
What’s the theory, exactly? I think it’s probably about the nature of the input, at best. That is, it’s asking what kind of information is in the input signal, and using an NN to extract that information. I feel like we can talk about NNs as measuring the signal available in the input, assuming the powerful learning mechanism of the NN. And so sure, the signal either is or isn’t there. But that’s not a theory of acquisition. That’s more an assessment of the input signal (the kind of assessment a poverty-of-the-stimulus argument rests on). And I’m all in favor of exploring what information is and isn’t available in the input signal. I just feel like that’s not the same as a theory of acquisition, which should speak to how the child uses that information as part of the (acquisitional) intake.
I mean, I really like the idea of zeroing in on the types of input signal that affect generalization behavior (i.e., frequency of active vs. passive, but not actionality/affectedness). But what’s missing for me is an explanation of *why* those things matter or don’t. This is where a computational cognitive model has a leg up on NNs/LLMs: the cognitive model implements an interpretable theory of acquisition. Then, when the intake changes and the generated behavior changes, we can look inside to understand why those changes had the effect they did. That’s a more satisfying theory of acquisition to me.
From section 4, Experiment 1B: Comparing language model and human judgments.
For all the reasons outlined here about how neural networks aren’t human-like (they overgeneralize in ways humans don’t, and they are less data-efficient than humans), I really hesitate to label an NN model a “theory of acquisition”. Again, I’m for it as a tool for measuring information in the input signal, but not as a theory of the acquisition process.
From 5, Experiment 2: Intervening on training data: “To the extent that the model is a reliable cognitive model of human language learning, our interventions…” – Exactly this. This is my issue. I’m struggling to be convinced that these NNs are reliable cognitive models of human language learning. And with that in question in my mind, I don’t know what to take from these results.
About 8.2 Using neural networks as models of human learners
I really appreciate this section’s attempt to justify how NNs can be used as theories of acquisition, but I still have the same concerns as above. At best, if given plausible input data, these models can assess the information available in that input signal. I don’t think they tell us how a child uses that signal, or offer a “theory” (i.e., an explanation) of how acquisition works. I do agree that the ability to manipulate the input signal (or intake) is valuable and hard to achieve in behavioral experiments. But this is where computational *cognitive* models have a leg up: there we can adjust the input/intake to the modeled child however we want, and we’re implementing a theory that in fact models something about the child’s acquisition process.
From 8.2: “...working with neural networks allows for the ability to probe a model’s internal processes to understand which mechanisms are vital to the model’s learning process and form hypotheses about how humans may learn”.
I would love to see this done here. Which internal processes of these NNs link to mechanisms of passivization acquisition?
From 8.2: “Without a clear understanding of the inductive biases of the particular neural network chosen for comparison, we cannot make a fair comparison between these models and our theories of human cognition.”
Yes! Exactly. So, given that we all agree on this point, what do we do with the results here if we’re interested in theories of child language acquisition?
Also, about the input set used (4.2 Training corpus).
If you’re talking about data children have access to, your average pre-teen probably isn’t reading adult-directed Reddit text. At the lexical level, speech directed to young children (under five at the very least, and likely under ten) differs massively in composition from adult-directed text. There may also be structural differences in active vs. passive frequency, depending on the age of the child the speech is directed at (let alone any differences between Reddit posts and actual child-directed speech or texts written for children). So, as an acquisition researcher, what do I do with the fact that a learning model can or can’t extract information about the passive from adult-directed text? Does this tell me about the signal available in the data children actually have access to? I’m just struggling to see how this implementation informs acquisition (let alone acquisition theory, which is what my earlier Feelings were about).
(2) The impact of lexical semantics
I wish lexical semantic hypotheses other than affectedness had been explored here, because affectedness alone is a bit of a straw man. Other verbs can surely passivize, like perception verbs (“see”: Lisa sees penguins / Penguins are seen by Lisa) and subject-experiencer verbs (“love”: Lisa loves penguins / Penguins are loved by Lisa). Yet there’s no action in these, and the theme isn’t affected.
Nguyen & Pearl (2021) give a rundown of some of the nuances of lexical semantics and how they seem to matter (short version: semantic clusters seem to correlate with the acquisition trajectory of passives).
Nguyen, E., & Pearl, L. (2021). The link between lexical semantic features and children’s comprehension of English verbal be-passives. Language Acquisition, 28(4), 433-450.
But anyway, I guess any lexical semantic hypothesis could be investigated this way, and maybe future work in this area can look at more nuanced versions of the lexical semantic hypothesis.
From 3.3 Results: Doesn’t the fact that there was an effect of verb class (i.e., the passive drops for estimation, price, duration, and experiencer-theme verbs differed from those for agent-patient verbs) indicate that there’s an effect of lexical semantics? If so, and an effect of actional (agent-patient) vs. non-actional verbs still didn’t show up, it makes me think the lexical semantic manipulation somehow wasn’t targeting quite the right thing.
From 7, Experiment 2B: Lexical semantics does not significantly affect our models’ acceptability judgments
So, I get that putting the unpassivizable verb in the active sentences that the passivizable verb appeared in will nudge its semantics, but nudge it how? Did all of those active sentences show affectedness? I guess this partly addresses my earlier worry that targeting affectedness alone as the lexical semantic feature wasn’t really fair. Here, who knows what’s being targeted: it might be affectedness, or it might be some other aspect of the actional passivizable verbs. But this brings me back to my broader worry: when we find that this lexical nudge doesn’t affect the model’s passivization performance, what do we do with that result? Does lexical nudging just never work? Is it only this lexical nudge, whatever it actually did, that doesn’t work? Why does this lexical nudge work with some verbs but not others?
(3) About indirect evidence
From 6, Experiment 2A: Frequency significantly affects our models’ acceptability judgments
“...English-speaking [children] hear the passive infrequently in child-directed speech…just four passive utterances that include a by-phrase”.
This is where I think a richer discussion of indirect evidence could be useful (some of it comes up in the general discussion in 8.4, with other passive types). There are other types of passives besides be-passives with by-phrases (e.g., Lisa was annoyed by the claim). For instance, there are passives (or adjectival passives) without by-phrases (e.g., Lisa was annoyed), and get-passives (e.g., Lisa got annoyed). I do agree that the passive is rare compared with the active and with other syntactic constructions, but there are other quantifiable sources of indirect evidence for the be-passive + by-phrase. How much these do or don’t affect generalization behavior seems like an interesting and testable question.