I’m definitely on board with the spirit of these papers. My position: I would love to understand more about how children do what they do when it comes to language acquisition. If that also helps large language models (LLMs) do what they do better, then that’s great too.
Some other specific thoughts, responding to certain ideas in “Bridging the gap”:
(1) I definitely understand that the interactive, social nature of children’s input matters. In particular, the social part in child language acquisition is usually about why certain input has more impact than other input: the input in an interactive, social environment gets absorbed better by kids. But absorption doesn’t seem to be the problem for LLMs; they take in their data just fine. That said, it does seem like the interaction part (i.e., the ability to query) helps ChatGPT.
More generally, it could be that what a certain input quality (e.g., being social and interactive) does for human kids isn’t necessary for an LLM. But we won’t know that until we understand why that input quality helps kids in the first place.
(2) I also understand that multimodal input gives concrete extensions to some concepts, and so helps “ground out” meaning in the real world for kids. I’m less sure how multimodal input would help current AI systems. Is it maybe helpful for bootstrapping (somehow?) the rest of the cognitive system that allows flexible reasoning?
(3) I think there’s a really good point made about needing the apples-to-apples comparison for evaluation. I remember that in earlier evaluations of speech segmentation models, the models were compared against perfect (adult-like) segmentation accuracy, and few of the cognitively plausible ones did all that well. In contrast, when these same models were tested on the segmentation tasks given to infants (which were meant to demonstrate infant segmentation ability), most models did just fine. Now, whether the models accomplished segmentation the way that the infants did is a different question, and one that would also apply to LLMs once we have apples-to-apples comparisons.