I really like that M&B are looking at a learning problem that would be interesting to both nativists and non-nativists (a lot of the time, it seems like the different sides are talking past each other on what problems they're trying to solve). I also really like that they're exploring a variety of different probabilistic learning models. It does seem that M&B are still approaching the learning problem from a strongly nativist perspective, given the way they've described the actual problem: the learner knows there are two classes of behavior that link syntactic structure to semantic interpretation (raising vs. control), and that there are specific cues the learner should use to figure out which behavior a given verb has (animacy & eventivity). Importantly, only those cues (and their distribution) are relevant. There also seems to be an implicit assumption (at least initially) that unambiguous data are required to distinguish the behavior of any given verb, and the learning problem results because unambiguous data aren't always available (this is a common way learnability problems are framed in a nativist perspective). One thing I wondered while reading this is what would happen if the behavior of these verbs was taken in the context of a larger system - that is, would it possibly be easier to recognize these distinct classes of verbs if other information were deemed relevant besides the two cues M&B look at? I believe they hint at this themselves in the paper - that it might be possible to look at the syntactic distribution of these verbs over all frames, rather than just the ambiguous frame that signals either raising or control (She VERBed to laugh). This doesn't solve the problem of knowing what the different linking rules are between structure and interpretation, but maybe it makes the classification problem (that there are distinct classes of verbs) easier.
Some more targeted thoughts:
- Footnote 2 talks about the issues of homophony, and I can certainly see that tend's meanings are pretty distinct between raising and regular transitive verb. However, happens looks like it means very similar things whether it's raising or regular transitive, so I wonder how children would make this distinction - or if they would at all. If not, then this looks like an additional class of verb that involves mixed behavior.
- The end of section 2 talks about how 3- and 4-year-olds are very sensitive to animacy when they interpret verbs in the ambiguous raising/control frame. I can completely believe that animacy might generally be a cue children use to help them figure out what things should mean (e.g., if a verb takes an agent or not).
- I really like the discussion/caveat that M&B do in the intro of section 4 about biological plausibility.
- I also really liked the discussion of the linear reward penalty (LRP) learner's issues in section 4.2.1. Not having an intermediate state equilibrium is problematic if you need there to be mixed behavior (e.g., something is ambiguous between raising and control). I admit, I was surprised by the saturating accumulator model M&B chose to implement to correct that problem. I had some trouble connecting the various pieces of it to the process in a child's mind - the intuitive mapping didn't work for me the way it does for the LRP learner. For example, the index they talk about right at the end of section 4.2.2 seems fairly ad-hoc and requires children to do abstracting over patterns of frames defined by these different semantic cues.
Discussion board for the reading group based out of UCI.
Friday, November 18, 2011
Tuesday, November 8, 2011
Next time on 11/21: Mitchener & Becker (2011)
Thanks to those of you who were able to join our nicely in-depth discussion of Alishahi & Pyykkonen (2011)'s article on syntactic bootstrapping! I think we figured out some of the details that were glossed over, and these really helped to understand the contribution of the study.
Next time, on Nov 21 (@3pm in SBSG 2221), we'll be looking at an article that examines how a subtle syntactic distinction that has specific semantic implications (called the raising-control distinction) could be learned.
Mitchener, G. & Becker, M. (2011). Computational Models of Learning the Raising-Control Distinction. Research on Language and Computation, 8(2), 169-207.
See you then!
Friday, November 4, 2011
Some thoughts on Alishahi & Pyykkonen (2011)
I really like the investigation of syntactic bootstrapping in this kind of computational manner. While experimental approaches like the Human Simulation Paradigm (HSP) offer us certain insights about how (usually adult) humans use different kinds of information, they have certain limitations that the computational learner doesn't (such as the researcher knowing exactly what the internal knowledge state is, and how it changes). From my perspective, the HSP with adults (and maybe even with 7-year-olds) is a kind of ideal learner approach, because it asks what inferences can be made with maximal knowledge about (the native) language - so while it clearly involves human processing limitations, it's examining the best that humans could reasonably be expected to do in a task that's similar to what word-learners might be doing. The computational learner is much more limited in the knowledge it has access to a priori, and I think the researchers really tried to give it reasonable approximations of what very young children might know about different language aspects. In addition, as A & P mention, the ability to track the time course of learning is a nice feature (though with some caveats with respect to implementation limitations).
Some more targeted thoughts:
I thought the probabilistic accuracy was a clever measure for taking advantage of the distribution over words that the learner calculates.
As I said above, tracking learning over time is an admirable goal - however, the modeled learner here clearly is only qualitatively doing this, since there's such a spike in performance between 0 and 100 training examples. I'm assuming A & P would say that children's inference procedures are much noisier than this (and so it would take children longer), unless there's evidence that children really do learn the exact correct meaning in under 100 examples (possible, but seems unlikely to me).
I was a little surprised that A & P didn't discuss the difference in Figure 1 between the top and bottom panel with respect to the -LI condition. (This was probably due to the length constraints, but still.) It's a bit mystifying to me how absolute accuracy could be close to the +LI condition while verb improvement is much lower than the +LI condition. I guess this means the baseline for verb improvement was different between the +LI and -LI conditions somehow?
It was indeed interesting to see that having no linguistic information (-LI) was actually beneficial for noun-learning - I would have thought noun-learning would also be helped by linguistic context. A & P speculate that this is because early nouns refer to observable concepts (e.g., concrete objects) and/or the nature of the training corpus made the linguistic context for nouns more ambiguous than for verbs. (The latter reason ties into the linguistic context more.) I wonder if this effect would persist with a different training corpus (after all, there were some assumptions A & P made when constructing this corpus - they seemed reasonable, but there are still different ways to construct the corpus.)
Some more targeted thoughts:
I thought the probabilistic accuracy was a clever measure for taking advantage of the distribution over words that the learner calculates.
As I said above, tracking learning over time is an admirable goal - however, the modeled learner here clearly is only qualitatively doing this, since there's such a spike in performance between 0 and 100 training examples. I'm assuming A & P would say that children's inference procedures are much noisier than this (and so it would take children longer), unless there's evidence that children really do learn the exact correct meaning in under 100 examples (possible, but seems unlikely to me).
I was a little surprised that A & P didn't discuss the difference in Figure 1 between the top and bottom panel with respect to the -LI condition. (This was probably due to the length constraints, but still.) It's a bit mystifying to me how absolute accuracy could be close to the +LI condition while verb improvement is much lower than the +LI condition. I guess this means the baseline for verb improvement was different between the +LI and -LI conditions somehow?
It was indeed interesting to see that having no linguistic information (-LI) was actually beneficial for noun-learning - I would have thought noun-learning would also be helped by linguistic context. A & P speculate that this is because early nouns refer to observable concepts (e.g., concrete objects) and/or the nature of the training corpus made the linguistic context for nouns more ambiguous than for verbs. (The latter reason ties into the linguistic context more.) I wonder if this effect would persist with a different training corpus (after all, there were some assumptions A & P made when constructing this corpus - they seemed reasonable, but there are still different ways to construct the corpus.)
Subscribe to:
Posts (Atom)