Monday, May 14, 2012

Some thoughts on Crain & Thornton (2012)

Once again, I'm a fan of these kinds of review articles because they often distill some of the arguments and assumptions that a particular perspective makes.  It's quite clear that the authors come from a linguistic nativist perspective, and they offer a set of phenomena that they think make the case for linguistic nativism very clearly.  This is good for us as modelers because we can see which learning problems lead people to take the linguistic nativist perspective.

I admit that I do find some of their claims a little strong, given the evidence.  This might be because it's a review article, so they're summarizing rather than providing a detailed argument.  However, I did find it a little ironic that they seem to make a particular assumption about what productivity is, and this kind of assumption is precisely what Yang (2010 Ms, 2011) took the usage-based folk to task for (more on this below).  I also think the authors are sometimes a little overzealous in characterizing the weaknesses of the usage-based approach - in particular, they don't seem to want statistical learning to be part of the acquisition story at all.  While I'm perfectly happy to say that statistical learning can't be the whole story (after all, we need a hypothesis space for it to operate over), I don't want to deny its usefulness.

More specific thoughts:

- I was surprised to find a conflation of nature (innate) vs. nurture (derived) with domain-specific vs. domain-general in the opening paragraph.  To me, these are very different dimensions - for example, we could have an innate, domain-general learning process (say, statistical learning) and derived, domain-specific knowledge (say, phonemes).

- I thought this characterization of the usage-based approach was a little unfair: "...child language is expected to match that of adults, more or less". And then later on, "...children only (re)produce linguistic expressions they have experienced in the input..." Maybe on an extreme version, this is true.  But I'm pretty sure the usage-based approach is meant to account for error patterns, too.  And that doesn't "match" adult usage, per se, unless we're talking about a more abstract level of matching.  This again comes up when they say the child "would not be expected to produce utterances that do not reflect the target language", later on in the section about child language vs. adult language.

- I thought the discussion of core vs. periphery was very good.  I think this really is one way the two approaches (linguistic nativist vs. usage-based) significantly differ.  For the usage-based folk, this is not a useful distinction - they expect everything to be accounted for in the same way.  For the linguistic nativist folk, this isn't necessarily true: core phenomena may be learned differently than periphery phenomena.

- I was less impressed by the training study that showed 7-year-olds can't learn structure-independent rules.  At that point in acquisition, it wouldn't surprise me at all if their hypothesis space were highly (insurmountably) biased towards structure-dependent rules, even if they had initially allowed structure-independent rules.  However, the point I think the authors are trying to make here is that statistical learning needs a hypothesis space to operate over, and doesn't necessarily have anything to do with defining that hypothesis space.  (And that, I can agree with.)

- This is the third time this quarter we've seen the structure-dependence of rules problem invoked.  However, it's interesting to me that the fact that there is still a learning problem seems to get glossed over.  That is, let's suppose we know we're only supposed to use structure-dependent rules.  It's still a question of which rule we should pick, given the input data, isn't it?  This is an interesting learning problem, I think.

- The discussion about how children must avoid making overly broad generalizations (given ambiguous data) seems a bit old-fashioned to me.  For example, Bayesian inference is one really easy way to converge on the subset hypothesis, given ambiguous data.  But I think this shows how techniques like Bayesian inference haven't really managed to penetrate the discussions of language acquisition in linguistic nativist circles.
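To make that concrete, here's a toy sketch (my own construction, not anything from the article) of how Bayesian inference with a size-based likelihood favors the subset hypothesis.  The hypothesis names and data are made up for illustration:

```python
# Toy illustration of the "size principle": a smaller hypothesis assigns
# each of its licensed data points higher probability (1/|hypothesis|),
# so ambiguous data gradually favor the subset hypothesis.

def posterior(data, hypotheses, prior):
    """Posterior over hypotheses; each datum's likelihood is
    1/|hypothesis| if the hypothesis licenses it, else 0."""
    scores = {}
    for name, extension in hypotheses.items():
        p = prior[name]
        for d in data:
            p *= (1.0 / len(extension)) if d in extension else 0.0
        scores[name] = p
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

# Hypothetical grammars: the subset licenses {"a"}, the superset {"a", "b"}.
hypotheses = {"subset": {"a"}, "superset": {"a", "b"}}
prior = {"subset": 0.5, "superset": 0.5}

# Every observed datum is ambiguous (consistent with both hypotheses),
# yet the subset hypothesis ends up with nearly all the posterior mass.
print(posterior(["a"] * 5, hypotheses, prior))
```

The point is just that no negative evidence is needed: the superset hypothesis "wastes" probability mass on data that never show up, and the posterior punishes it for that.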

- For the Principle C data, the authors assert that 3-year-olds' knowing the usage of names vs. pronouns indicates knowledge that they couldn't have learned.  But this is an empirical question, I think - what other (and how many other) hypotheses might they have?  What are the relevant data to learn from (utterances with names and pronouns in them?), and how often do these data appear in child-directed speech?

- The conjunction and disjunction stuff is definitely very cool - I get the sense that these kinds of data don't appear that often in children's data, so it again becomes a very interesting question about what kinds of generalizations are reasonable to make, given ambiguous data.  Additionally, it's hard to observe interpretations the way we can observe the forms of utterances - in particular, it's unclear if the child gets the same interpretation the adult intends.  This in general makes semantic acquisition stuff like this a very interesting problem.

- For the passives, I wonder if children's passive knowledge varies by verb semantics.  I could imagine a situation where passives with physical verbs come first (easily observable), then internal state (like heard), and then mental (like thought).  This ties into how observable the data are for each verb type.

- For explaining long-distance wh questions with wh-medial constructions (What do you think what does Cookie Monster like?), I think the authors are a touch hasty in dismissing a juxtaposition account simply because kids don't repeat the full NP (e.g., Which smurf) in the wh-medial position.  It seems like this could be explained by a bit of pragmatic knowledge about pronoun vs. name usage, where kids don't like to say the full name after they've already said it earlier in the utterance (we know this from imitation tasks with young kids around 3 years old, I believe).

- The productivity assumption I mentioned in the intro to this post relates to this wh-medial question issue.  The third argument against juxtaposition is that we should expect to see certain kinds of utterances regularly (like (41)), but we don't observe them that often.  However, before assuming this means that children do not productively use these forms, we probably need to have an objective measure of how often we would expect them to use these forms (probably based on a Zipfian distribution, etc.).
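As a rough sketch of what such an objective measure might look like (this is my own construction in the spirit of Yang's point, not his actual test): under a Zipfian distribution over construction types, we can compute how many distinct types we'd even expect to see attested in a sample of a given size.  The type and token counts below are made-up numbers:

```python
# Under a Zipfian distribution, low-ranked (rare) types are expected to be
# absent from a finite sample, so their absence is weak evidence against
# children using the construction productively.

def zipf_probs(n_types, s=1.0):
    """Zipfian probabilities for ranks 1..n_types with exponent s."""
    z = sum(1.0 / r**s for r in range(1, n_types + 1))
    return [1.0 / (r**s * z) for r in range(1, n_types + 1)]

def expected_attested(n_types, sample_size):
    """Expected number of distinct types attested in a sample,
    assuming tokens are drawn independently from the Zipfian distribution."""
    return sum(1 - (1 - p) ** sample_size for p in zipf_probs(n_types))

# Hypothetical: 50 possible construction types, a 100-token child sample.
# Even under full productivity, a sizable share of types goes unattested.
print(expected_attested(50, 100))
```

So before concluding that children lack a productive rule from the rarity of forms like (41), we'd want to check whether the observed counts fall below what a fully productive grammar would predict for a corpus that size.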

- I love how elegant the continuity hypothesis is.  I'm less convinced by the wh-medial questions as evidence, but they potentially support it.  However, I find the positive polarity stuff (and in particular, the different behavior in English vs. Japanese children, as compared to adults) to be more convincing support for it (the kids have an initial bias that they probably didn't pick up from the adults).  The only issue (for me) with the PPI parameter is that it seems very narrow.  Usually, we try to make parameters for things that connect to a lot of different linguistic phenomena.  Maybe this parameter connects to other logical operators, and not just AND and OR?  On a related note, if it's just tied to AND and OR, what does the parameter really accomplish?  That is, does it reduce the hypothesis space in a useful way?  How many other hypotheses could there be otherwise for interpreting AND and OR?

- Related to the PPI stuff: I was less clear on their story about how children pick that initial bias: "...favor parameter values that generate scope relations that make sentences true in the narrowest range of circumstances...".  This is very abstract indeed - kids are measuring an interpretation by how many hypothetical situations it would be true for.  This really depends on their ability to imagine those other situations and to actively compare them against the current interpretation...
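Here's a toy enumeration (mine, not the authors') of what "true in the narrowest range of circumstances" cashes out to for disjunction under negation - say, two scope readings of something like "The pig didn't eat the apple or the pepper":

```python
# Count the situations (truth assignments to A = "ate the apple",
# B = "ate the pepper") that verify each scope reading.
from itertools import product

def circumstances(reading):
    """Return the set of situations (a, b) where the reading is true."""
    return {(a, b) for a, b in product([True, False], repeat=2) if reading(a, b)}

neg_over_or = lambda a, b: not (a or b)        # "neither" reading
or_over_neg = lambda a, b: (not a) or (not b)  # "not both" reading

narrow = circumstances(neg_over_or)  # true in 1 of 4 situations
wide = circumstances(or_over_neg)    # true in 3 of 4 situations

# The "neither" reading holds in a strict subset of the situations that
# verify the other reading, so it's the reading a learner can retreat
# from on positive evidence alone.
print(len(narrow), len(wide), narrow <= wide)
```

The enumeration itself is trivial; my worry in the bullet above is about whether kids are actually doing something like this comparison across imagined situations, not about whether the subset relation holds.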

Yang, C. (2010 Ms.). Who's Afraid of George Kingsley Zipf? Unpublished Manuscript, University of Pennsylvania.

Yang, C. (2011). A Statistical Test for Grammar. Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, 30-38.
