I really like the investigation of syntactic bootstrapping in this kind of computational manner. While experimental approaches like the Human Simulation Paradigm (HSP) offer us certain insights about how (usually adult) humans use different kinds of information, they have certain limitations that the computational learner doesn't (such as the researcher knowing exactly what the internal knowledge state is, and how it changes). From my perspective, the HSP with adults (and maybe even with 7-year-olds) is a kind of ideal learner approach, because it asks what inferences can be made with maximal knowledge about (the native) language - so while it clearly involves human processing limitations, it's examining the best that humans could reasonably be expected to do in a task that's similar to what word-learners might be doing. The computational learner is much more limited in the knowledge it has access to a priori, and I think the researchers really tried to give it reasonable approximations of what very young children might know about different language aspects. In addition, as A & P mention, the ability to track the time course of learning is a nice feature (though with some caveats with respect to implementation limitations).
Some more targeted thoughts:
I thought the probabilistic accuracy was a clever measure for taking advantage of the distribution over words that the learner calculates.
As I said above, tracking learning over time is an admirable goal - however, the modeled learner here clearly is only qualitatively doing this, since there's such a spike in performance between 0 and 100 training examples. I'm assuming A & P would say that children's inference procedures are much noisier than this (and so it would take children longer), unless there's evidence that children really do learn the exact correct meaning in under 100 examples (possible, but seems unlikely to me).
I was a little surprised that A & P didn't discuss the difference in Figure 1 between the top and bottom panel with respect to the -LI condition. (This was probably due to the length constraints, but still.) It's a bit mystifying to me how absolute accuracy could be close to the +LI condition while verb improvement is much lower than the +LI condition. I guess this means the baseline for verb improvement was different between the +LI and -LI conditions somehow?
It was indeed interesting to see that having no linguistic information (-LI) was actually beneficial for noun-learning - I would have thought noun-learning would also be helped by linguistic context. A & P speculate that this is because early nouns refer to observable concepts (e.g., concrete objects) and/or the nature of the training corpus made the linguistic context for nouns more ambiguous than for verbs. (The latter reason ties into the linguistic context more.) I wonder if this effect would persist with a different training corpus (after all, there were some assumptions A & P made when constructing this corpus - they seemed reasonable, but there are still different ways to construct the corpus.)
No comments:
Post a Comment