Computational Models of Language (at UC Irvine): October 2011

Monday, October 17, 2011

Next time: Alishahi & Pyykkonen (2011)

Thanks to those of you who were able to join our nicely in-depth discussion today of Dillon et al. (2011)'s article on applying Bayesian models to phonological acquisition! Next time on 11/7 (@3:30pm in SBSG 2221), we'll be discussing an article that looks at the phenomenon of syntactic bootstrapping, which is the ability to infer a word's meaning, and the abstract structure associated with that word, from the word's syntactic context:

Alishahi, A. & Pyykkonen, P. (2011). The onset of syntactic bootstrapping in word learning: Evidence from a computational study. Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Boston, MA.

See you then!

Friday, October 14, 2011

Some thoughts on Dillon et al. (2011)

I'm really fond of this paper - I love that they're tackling realistic problems (with realistic language data), that they're seriously looking at the state of the art in computational models of those problems, and that they're finding a way to connect linguistic theory (e.g., "There are phonological rules") with this level of concreteness (e.g., "Let's make them linear models operating over acoustic space"). Because of all this, I think their point about the potential issues of two-stage models comes across very clearly. And I love that they can make a model that learns both the phonemes and the relationships between phonetic categories simultaneously. Moreover, the fact that they can do this without trying to learn a lexicon at the same time (like Feldman, Griffiths, & Morgan (2009) do) is impressive to me, since learning the lexicon was the main thing that seemed to lead to good results for Feldman et al. (2009). Notably, they make use of the linguistic context (i.e., whether a uvular consonant follows), which Swingley (2009), in a review of infant phoneme learning, recently suggested looks really helpful for English phonemes.
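
To make the "rules as linear models over acoustic space" idea a bit more concrete, here's a minimal sketch of what such a rule could look like. Everything in it is an assumption for illustration only: the formant values, the shift sizes, and the names PHONEME_MEANS, UVULAR_SHIFT, and expected_surface are made up, and this is not the parameterization Dillon et al. actually use. The point is just that the expected surface realization is the underlying category's mean plus a shift that switches on when a uvular follows.

```python
from typing import Dict, Tuple

Formants = Tuple[float, float]  # (F1, F2) in Hz

# Hypothetical mean formants for three underlying vowel categories.
# These numbers are invented for illustration, not measured Inuktitut data.
PHONEME_MEANS: Dict[str, Formants] = {
    "i": (300.0, 2300.0),
    "u": (320.0, 900.0),
    "a": (700.0, 1300.0),
}

# Hypothetical per-category shift applied when a uvular consonant follows.
# Also invented; a real model would have to learn these from data.
UVULAR_SHIFT: Dict[str, Formants] = {
    "i": (150.0, -400.0),
    "u": (120.0, 100.0),
    "a": (50.0, -200.0),
}


def expected_surface(vowel: str, uvular_follows: bool) -> Formants:
    """Expected (F1, F2) of a vowel: category mean plus a context-dependent shift.

    This is linear in a 0/1 context indicator:
        surface = mean + indicator * shift
    """
    f1, f2 = PHONEME_MEANS[vowel]
    if uvular_follows:
        d1, d2 = UVULAR_SHIFT[vowel]
        return (f1 + d1, f2 + d2)
    return (f1, f2)


if __name__ == "__main__":
    for vowel in PHONEME_MEANS:
        print(vowel, expected_surface(vowel, False), expected_surface(vowel, True))
```

Under this toy setup, three underlying categories produce six clusters of expected surface values, which is the kind of situation where a learner that only clusters the acoustics could end up positing too many categories.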

A few more targeted thoughts:

I really like that they note the three-vowel + allophones system is not just a special weirdness of Inuktitut, but rather something that occurs in a number of different languages. This makes it more important to be able to account for this kind of data, and bolsters support for the single-stage model.
I also thought it was useful to note that the EM approach follows the frequentist tradition. After a moment's reflection, this is clearly true, but it didn't occur to me until they pointed it out.
Because of the nature of the Bayesian model, the more data that come in, the more likely the model is to prefer more categories over fewer (and the explanation they give for this just before the discussion of Expt 1 is entirely sensible). This carries over even to their cool Expt 3 model that learns categories and rules simultaneously (as we can see in Table 6) - the 12000 data point model is much more likely to posit 4 or 5 categories than the 1000 data point model. I'm wondering what this means for actual acquisition. Should we expect that infants learn very quickly and so end up with 3 categories + rules? Or would we expect that infants might go through a stage where they have 4 or 5 categories, and have to recover (maybe based on doing word segmentation/lexical item discovery)?
For the one-stage model in Expt 3, they mention that they build in a bias for complementary distribution - is this an uncontroversial assumption (or easy to derive from innate abilities we know infants do have)? I honestly don't have strong intuitions about this. It'd be great if it was.
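
On the "more data, more categories" point above, here's a small sketch of one ingredient that could push in that direction. It assumes a Chinese restaurant process (CRP) prior over category assignments with concentration parameter alpha. That choice is an assumption made here for illustration; it is not a claim about the exact prior or settings in Dillon et al.'s model. The function name expected_num_categories and the value alpha = 1.0 are likewise hypothetical. Under a CRP, the expected number of occupied categories after n observations is the sum of alpha / (alpha + i) for i from 0 to n - 1, which keeps growing (slowly) with n.

```python
def expected_num_categories(n_points: int, alpha: float) -> float:
    """Expected number of occupied categories under a CRP prior.

    The i-th observation (counting from 0) starts a new category with
    probability alpha / (alpha + i), so the expectation is the sum of
    those probabilities over all n_points observations.
    """
    return sum(alpha / (alpha + i) for i in range(n_points))


if __name__ == "__main__":
    # Data set sizes mentioned for Table 6; alpha = 1.0 is an arbitrary choice.
    for n in (1000, 12000):
        print(n, expected_num_categories(n, alpha=1.0))
```

This only describes the prior. The preferences in Table 6 come from the full posterior, where the likelihood of the acoustic data matters too, so treat this as intuition for the direction of the effect rather than an account of their results.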

References:

Feldman, N., Griffiths, T., & Morgan, J. (2009). Learning phonetic categories by learning a lexicon. Proceedings of the 31st Annual Conference of the Cognitive Science Society.

Swingley, D. (2009). Contributions of infant word learning to language development. Philosophical Transactions of the Royal Society B, 364, 3617-3632.

Monday, October 3, 2011

Thanks to those of you who were able to join our spirited discussion today of Dunbar et al. (2010)'s article on Bayesian reasoning in linguistics! Next time on 10/17 (@3pm in SBSG 2221), we'll be discussing an article by the same crew of authors that models the acquisition of specific phonological phenomena:

Dillon, B., Dunbar, E., & Idsardi, B. (2011 ms). A single stage approach to learning phonological categories: Insights from Inuktitut. University of Maryland, College Park and University of Massachusetts, Amherst.

See you then!
