Wednesday, May 30, 2012

Have a good summer, and see you in the fall!


Thanks so much to everyone who was able to join us for our lively discussion today, and to everyone who's joined us this past academic year!

The CoLa Reading Group will be taking a hiatus this summer, and we'll resume again in the fall quarter.  As always, feel free to send me suggestions of articles you're interested in reading, especially if you happen across something particularly interesting!

Monday, May 28, 2012

Some thoughts on Sonderegger & Niyogi (2010)

I think this paper is a really nice example of how to use real data for language change modeling, and why you would want to.  I like this methodology in particular, where properties of the individual learner are explored and measured by their effects on the population dynamics.  Interestingly, I think this is different from some of the other work I'm familiar with relating language acquisition and language change, since I'm not sure it restricts the learning period to the period of language acquisition, per se.  In particular, the knowledge being modeled - stress patterns of lexical items, possibly based on influence from the rest of the lexicon - is something that seems like it can change after native language acquisition is over.  That is, the learners here don't have to be children (which is something that Pearl & Weinberg (2007) assumed for the knowledge they looked at, and something that work by Lightfoot (1999, 2010) generally assumes).  Based on some of the learning assumptions involved in this paper (e.g., probability matching when given noisy input, using the lexicon to determine the most likely stress pattern), I would say that the modeled learners probably aren't children.  And that's totally fine.  The only caveat is that the explanatory power of learning as the account of the observed changes then weakens a little, simply because other factors may be involved (language contact, synchronic change within the adults of a population, etc.), and these other factors aren't modeled here.  So, when you get the population reproducing the observed behaviors, it's true that this learning behavior on its own could be the explanatory story - but it's also possible that a different learning behavior coupled with these other factors might be the true explanatory story.  I think this is inherently a problem with explanatory models of language change, though - what you provide is an existence proof of a particular theory of how change happens.  So then it's up to people who don't like your particular theory to provide an alternative. ;)
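
To make the individual-learner-to-population-dynamics flavor concrete, here's a minimal sketch of a transmission chain of probability-matching learners.  This is my own toy construction, not the paper's model - the noise rate, sample size, and binary-variant setup are all made up for illustration.

```python
import random

# Hypothetical parameters for illustration (not from the paper)
NOISE = 0.05        # per-token probability of mishearing the variant
N_EXAMPLES = 100    # tokens each learner hears
GENERATIONS = 20

def learn(p_parent):
    """One probability-matching learner: hear noisy tokens, match the rate."""
    heard_a = 0
    for _ in range(N_EXAMPLES):
        produced_a = random.random() < p_parent   # parent produces variant A
        misheard = random.random() < NOISE        # noise flips the percept
        heard_a += produced_a != misheard         # XOR: perceived as variant A
    return heard_a / N_EXAMPLES                   # probability matching

p = 0.9  # initial rate of variant A in the population
for gen in range(1, GENERATIONS + 1):
    p = learn(p)
    print(f"generation {gen:2d}: rate of variant A = {p:.3f}")
```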

More targeted thoughts:

- I was definitely intrigued by the constrained variation observed in the stress patterns of English nouns and verbs together.  Ross' generalization seems to describe it well enough (primary stress for nouns is further to the left than primary stress for verbs), but that doesn't explain where this preference comes from - it certainly seems quite arbitrary.  Presumably, it could be an accident of history that a bunch of the "original" nouns happened to have that pattern while the verbs didn't, and that got passed along through the generations of speakers.  The authors mention something later on about how nouns appear in trochaic-biasing contexts, while verbs appear in iambic-biasing contexts (based on work by Kelly and colleagues).  This again seems like the result of some process, rather than the cause of it.  Maybe it has something to do with the order of verbs and their arguments?  I could imagine that there's some kind of preference for binary feet where stress occurs every other syllable, and then the stress context for nouns vs. verbs comes from that (somehow)...

- The authors mention that falling frequency (rather than low frequency) seems to be the trigger for change to {1,2}.  This means that something could be highly frequent, but because its frequency drops somewhat (maybe drops rapidly?), change is triggered.  That seems odd to me.  Instead, it seems more likely that both falling frequency and low frequency are caused by the same underlying something, and that's the something that triggers change.  (Caveat: I haven't read the work the authors mentioned, so maybe it's laid out more clearly there.)  However, they reiterate it at the end of the paper, in connection with the last model they look at.  (The first sketch after these thoughts shows how the two triggers can come apart.)

- The last model the authors explore (coupling by priors + mistransmission) is the one that does best at matching the desired behaviors, such as changing to {1,2} more often.  I interpreted this model as something like the following: if enough examples are heard, the mistransmission bias encourages mis-hearing in the right direction, given the priors that come from the lexicon on overall stress patterns.  However, the mistransmission also means that the population moves towards the {1,2} pattern more slowly, so only higher frequencies can make it happen the way we want it to (and this is how it differs from the fourth model, which just has coupling by priors).  (The second sketch below is my rough attempt to write this reading down.)
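
First, on the falling-frequency vs. low-frequency point: here's a toy illustration (all numbers made up) of how the two candidate triggers come apart - a word can be frequent but falling, or rare but stable, and the two trigger definitions fire on different words.

```python
# Made-up frequency trajectories for two hypothetical words over five periods
high_but_falling = [500, 400, 300, 250, 220]   # frequent, but falling
steadily_low     = [20, 21, 19, 20, 20]        # rare, but stable

def low_frequency_trigger(trajectory, threshold=50):
    """Fires if the word is currently rare (threshold is arbitrary)."""
    return trajectory[-1] < threshold

def falling_frequency_trigger(trajectory, min_drop=0.3):
    """Fires if frequency has proportionally dropped by more than min_drop."""
    return (trajectory[0] - trajectory[-1]) / trajectory[0] > min_drop

for name, traj in [("high but falling", high_but_falling),
                   ("steadily low", steadily_low)]:
    print(f"{name}: low-freq trigger = {low_frequency_trigger(traj)}, "
          f"falling-freq trigger = {falling_frequency_trigger(traj)}")
```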
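
Second, my rough attempt to write down the coupling-by-priors + mistransmission reading as a simulation.  The update rule and every parameter value here are my own simplification for illustration - they are not the paper's actual equations.

```python
MISHEAR_TO_12 = 0.03    # hypothetical: non-{1,2} tokens misheard as {1,2}
MISHEAR_FROM_12 = 0.01  # hypothetical: {1,2} tokens misheard as non-{1,2}
PRIOR_12 = 0.7          # hypothetical lexicon-derived prior favoring {1,2}
PRIOR_STRENGTH = 20.0   # hypothetical pseudo-count weight on the prior

def next_p(p, n_examples):
    """One generation: asymmetric mistransmission, then prior-coupled learning."""
    # Mistransmission biases perception slightly towards {1,2}
    perceived = p * (1 - MISHEAR_FROM_12) + (1 - p) * MISHEAR_TO_12
    # Coupling by priors: perceived counts are pooled with the lexical prior,
    # so the prior matters more when the word is rarer (low n_examples)
    return ((perceived * n_examples + PRIOR_12 * PRIOR_STRENGTH)
            / (n_examples + PRIOR_STRENGTH))

p = 0.05  # start with {1,2} rare for this word
for gen in range(1, 11):
    p = next_p(p, n_examples=100)
    print(f"generation {gen:2d}: P({{1,2}}) = {p:.3f}")
```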

~~~
References
~~~

Lightfoot, D. (1999). The development of language: Acquisition, change, and evolution. Oxford, England: Blackwell.


Lightfoot, D. (2010). Language acquisition and language change. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 677-684. doi: 10.1002/wcs.39.


Pearl, L. & Weinberg, A. (2007). Input Filtering in Syntactic Acquisition: Answers from Language Change Modeling. Language Learning and Development, 3(1), 43-72.

Wednesday, May 16, 2012

Next time on May 30: Sonderegger & Niyogi (2010)

Thanks to everyone who was able to join our rousing discussion today of Crain & Thornton's (2012) article on syntax acquisition!  Next time on May 30 at 10:30am in SBSG 2221, we'll be looking at an article that examines the interplay of language acquisition and language change, focusing on the role of mistransmission in a dynamical system:

Sonderegger, M. & Niyogi, P. (2010). Combining data and mathematical models of language change. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 1019-1029.
http://www.socsci.uci.edu/~lpearl/colareadinggroup/readings/SondreggerNiyogi2010_DataModelsLangChange.pdf

See you then!

Monday, May 14, 2012

Some thoughts on Crain & Thornton (2012)


Once again, I'm a fan of these kinds of review articles because they often distill some of the arguments and assumptions that a particular perspective makes.  It's quite clear that the authors come from a linguistic nativist perspective, and they offer a set of phenomena that they think make the case for linguistic nativism very clearly.  This is good for us as modelers, because we can look at the learning problems that cause people to take the linguistic nativist perspective.

I admit that I do find some of their claims a little strong, given the evidence.  This might be because it's a review article, so they're summarizing rather than providing a detailed argument.  However, I did find it a little ironic that they seem to make a particular assumption about what productivity is, since this kind of assumption is precisely what Yang (2010 Ms, 2011) took the usage-based folk to task for (more on this below).  I also think the authors are sometimes a little overzealous in characterizing the weaknesses of the usage-based approach - in particular, they don't seem to want statistical learning to be part of the acquisition story at all.  While I'm perfectly happy to say that statistical learning can't be the whole story (after all, we need a hypothesis space for it to operate over), I don't want to deny its usefulness.

More specific thoughts:

- I was surprised to find a conflation of nature (innate) vs. nurture (derived) with domain-specific vs. domain-general in the opening paragraph.  To me, these are very different dimensions - for example, we could have an innate, domain-general learning process (say, statistical learning) and derived, domain-specific knowledge (say, phonemes).

- I thought this characterization of the usage-based approach was a little unfair: "...child language is expected to match that of adults, more or less".  And then later on, "...children only (re)produce linguistic expressions they have experienced in the input...".  Maybe this is true of an extreme version of the approach.  But I'm pretty sure the usage-based approach is meant to account for error patterns, too.  And those don't "match" adult usage, per se, unless we're talking about a more abstract level of matching.  This comes up again when they say the child "would not be expected to produce utterances that do not reflect the target language", later on in the section about child language vs. adult language.

- I thought the discussion of core vs. periphery was very good.  I think this really is one way the two approaches (linguistic nativist vs. usage-based) significantly differ.  For the usage-based folk, this is not a useful distinction - they expect everything to be accounted for in the same way.  For the linguistic nativist folk, this isn't necessarily true: core phenomena may be learned in a different way than periphery phenomena.

- I was less impressed by the training study that showed 7-year-olds can't learn structure-independent rules.  At that point in acquisition, it wouldn't surprise me at all if their hypothesis space was highly (insurmountably) biased towards structure-dependent rules, even if they had initially allowed structure-independent rules.  However, the point I think the authors are trying to make here is that statistical learning needs a hypothesis space to operate over, and doesn't necessarily have anything to do with defining that hypothesis space.  (And that, I can agree with.)

- This is the third time this quarter we've seen the structure-dependence of rules problem invoked.  However, it's interesting to me that the fact that there is still a learning problem seems to get glossed over.  That is, let's suppose we know we're only supposed to use structure-dependent rules.  It's still a question of which rule we should pick, given the input data, isn't it?  This is an interesting learning problem, I think.  (The first sketch at the end of these thoughts illustrates it.)

- The discussion about how children must avoid making overly-broad generalizations (given ambiguous data) seems a bit old-fashioned to me.  Bayesian inference, for example, is one really easy way to learn the subset hypothesis given ambiguous data (the second sketch at the end of these thoughts shows the basic logic).  But I think this shows how techniques like Bayesian inference haven't really managed to penetrate the discussions of language acquisition in linguistic nativist circles.

- For the Principle C data, the authors assert that 3-year-olds' knowledge of name vs. pronoun usage indicates knowledge that they couldn't have learned.  But this is an empirical question, I think - what other (and how many other) hypotheses might children entertain?  What are the relevant data to learn from (utterances with names and pronouns in them?), and how often do these data appear in child-directed speech?

- The conjunction and disjunction stuff is definitely very cool - I get the sense that these kinds of data don't appear that often in children's data, so it again becomes a very interesting question about what kinds of generalizations are reasonable to make, given ambiguous data.  Additionally, it's hard to observe interpretations the way we can observe the forms of utterances - in particular, it's unclear whether the child gets the same interpretation the adult intends.  This in general makes semantic acquisition problems like this very interesting.

- For the passives, I wonder if children's passive knowledge varies by verb semantics.  I could imagine a situation where passives with physical verbs come first (easily observable), then internal-state verbs (like "heard"), and then mental verbs (like "thought").  This ties into how observable the data are for each verb type.

- For explaining long-distance wh-questions with wh-medial constructions ("What do you think what does Cookie Monster like?"), I think the authors are a touch hasty in dismissing a juxtaposition account simply because kids don't repeat the full NP (e.g., "Which smurf") in the wh-medial position.  It seems like this could be explained by a bit of pragmatic knowledge about pronoun vs. name usage, where kids don't like to say the full name after they've already said it earlier in the utterance (we know this from imitation tasks with young kids around 3 years old, I believe).

- The productivity assumption I mentioned in the intro to this post relates to this wh-medial question issue.  The third argument against juxtaposition is that we should expect to see certain kinds of utterances regularly (like (41)), but we don't observe them that often.  However, before assuming this means that children don't productively use these forms, we probably need an objective measure of how often we would expect them to use these forms (probably based on a Zipfian distribution, etc. - the third sketch at the end of these thoughts gives the flavor).

- I love how elegant the continuity hypothesis is.  I'm less convinced by the wh-medial questions as evidence, but they're potentially a support for it.  However, I find the positive polarity stuff (and in particular, the different behavior of English vs. Japanese children, as compared to adults) to be more convincing support, since the kids have an initial bias that they probably didn't pick up from the adults.  The only issue (for me) with the PPI parameter is that it seems very narrow.  Usually, we try to posit parameters for things that connect to a lot of different linguistic phenomena.  Maybe this parameter connects to other logical operators, and not just AND and OR?  On a related note, if it's just tied to AND and OR, what does the parameter really accomplish?  That is, does it reduce the hypothesis space in a useful way?  How many other hypotheses could there be otherwise for interpreting AND and OR?

- Related to the PPI stuff: I was less clear on their story about how children pick that initial bias: "...favor parameter values that generate scope relations that make sentences true in the narrowest range of circumstances...".  This is very abstract indeed - kids would be measuring an interpretation by how many hypothetical situations it would be true in.  That really depends on their ability to imagine those other situations and actively compare them against the current interpretation...
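
First, on the residual structure-dependence learning problem: here's a toy illustration (entirely my own construction) of how the input still has to adjudicate among candidate rules - including among rules that all respect structure.  The two example sentences and the rule inventory are made up for the sketch.

```python
# Each datum: (declarative tokens, index of the linearly-first auxiliary,
# index of the main-clause auxiliary, observed adult question)
data = [
    (["the", "dog", "is", "hungry"], 2, 2,
     ["is", "the", "dog", "hungry"]),
    (["the", "dog", "that", "is", "brown", "can", "swim"], 3, 5,
     ["can", "the", "dog", "that", "is", "brown", "swim"]),
]

def move_aux(tokens, i):
    """Front the auxiliary at position i, removing it from its base position."""
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]

def copy_aux(tokens, i):
    """Front a copy of the auxiliary at position i, leaving the original."""
    return [tokens[i]] + tokens

rules = {
    "move the linearly-first aux (structure-independent)":
        lambda toks, first, main: move_aux(toks, first),
    "move the main-clause aux (structure-dependent)":
        lambda toks, first, main: move_aux(toks, main),
    "copy the main-clause aux (also structure-dependent)":
        lambda toks, first, main: copy_aux(toks, main),
}

for name, rule in rules.items():
    correct = sum(rule(toks, first, main) == question
                  for toks, first, main, question in data)
    print(f"{name}: {correct}/{len(data)} questions predicted")
```

Notice that the third rule respects structure too, but only the input data tell the learner to prefer moving over copying.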
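
Second, the size-principle logic behind learning the subset hypothesis from ambiguous data.  This is a generic Bayesian sketch with made-up hypothesis sizes, not a model of any specific phenomenon from the article: if examples are sampled from whatever the grammar licenses, data consistent with both hypotheses increasingly favor the smaller one.

```python
# Hypothetical hypothesis sizes: the superset licenses everything the subset does
subset_size = 10      # subset hypothesis licenses 10 forms
superset_size = 100   # superset hypothesis licenses 100 forms
prior_subset = prior_superset = 0.5   # flat prior over the two hypotheses

for n_examples in [1, 5, 10, 20]:
    # Likelihood of n examples consistent with both hypotheses, assuming
    # each example is sampled uniformly from a hypothesis's licensed forms
    like_subset = (1 / subset_size) ** n_examples
    like_superset = (1 / superset_size) ** n_examples
    post_subset = (prior_subset * like_subset) / (
        prior_subset * like_subset + prior_superset * like_superset)
    print(f"{n_examples:2d} ambiguous examples: "
          f"P(subset | data) = {post_subset:.6f}")
```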
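
Third, the productivity point: a toy illustration (my own construction, in the spirit of Yang's argument but not his actual test) of how a fully productive construction can still leave many licensed combinations unattested in a finite sample, once usage is Zipfian.

```python
import random

random.seed(1)      # for reproducibility
N_VERBS = 50        # hypothetical number of verbs the construction licenses
SAMPLE_SIZE = 100   # hypothetical number of recorded child uses

# Zipfian distribution over verb ranks: P(rank r) proportional to 1/r
weights = [1 / rank for rank in range(1, N_VERBS + 1)]

# A fully productive child: every use samples a verb from the Zipfian
# distribution, with no verb excluded on principle
sample = random.choices(range(N_VERBS), weights=weights, k=SAMPLE_SIZE)
attested = len(set(sample))
print(f"{attested} of {N_VERBS} licensed verbs attested in "
      f"{SAMPLE_SIZE} uses; {N_VERBS - attested} never observed")
```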

~~~
References:
Yang, C. (2010 Ms.). Who's Afraid of George Kingsley Zipf? Unpublished manuscript, University of Pennsylvania.


Yang, C. (2011). A Statistical Test for Grammar. Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, 30-38.



Wednesday, May 2, 2012

Next time on May 16: Crain & Thornton (2012)

Thanks to everyone who was able to join us for our thoughtful discussion of Bouchard (2012)! Next time on May 16, we'll be reading a survey article on syntactic acquisition that compares two opposing current approaches, and attempts to adjudicate between them.  It's possible that the learning problems discussed can be good targets for computational modeling studies as well.

Crain, S. & Thornton, R. (2012). Syntax acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, doi: 10.1002/wcs.1158.
http://www.socsci.uci.edu/~lpearl/colareadinggroup/readings/CrainThornton2012_SyntaxAcquisition.pdf

See you then!
-Lisa