Friday, October 14, 2011

Some thoughts on Dillon et al. (2011)

I'm really fond of this paper - I love that they're tackling realistic problems (with realistic language data), that they're seriously looking at the state of the art with respect to computational models of it, and that they're finding a way to connect linguistic theory (e.g., "There are phonological rules") with this level of concreteness (e.g., "Let's make them linear models operating over acoustic space").  Because of all this, I think their point about the potential issues of two-stage models comes across very clearly.  And I love that that they can make a model that learns both phonemes and their relationships between phonetic categories simultaneously. Moreover, the fact they can do this without trying to learn a lexicon simultaneously (like Feldman, Griffiths, & Morgan (2009) do) is impressive to me, since that was the main thing that seemed to lead to good results for Feldman et al. (2009).  Notably, they make use of the linguistic context (i.e., does a uvualar consonant follow), which is something Swingley (2009) recently suggested looks really helpful for English phonemes in a review of infant phoneme learning.

A few more targeted thoughts:

  • I really like that they note the three-vowel +allophones system is not just a special weirdness of Inuktitut, but rather something that occurs in a number of different languages.  This makes it more important to be able to account for this kind of data, and bolsters support for the single stage model.
  • I also thought it was useful to note that the EM approach follows the frequentist tradition.  After a moment's reflection, this is clearly true, but it didn't occur to me until they pointed it out.
  • Because of the nature of the Bayesian model, the more data that come in, the more the model is likely to prefer more categories over less (and the explanation they give for this just before the discussion of Expt 1 is entirely sensible).  This carries over even for their cool Expt 3 model that learns categories and rules simultaneously (as we can see in Table 6) - the 12000 data point model is much more likely to posit 4 or 5 categories than the 1000 data point model.  I'm wondering what this means for actual acquisition.  Should we expect that infants learn very quickly and so end up with 3 categories + rules?  Or would we expect that infants might go through a stage where they have 4 or 5 categories, and have to recover (maybe based on doing word segmentation/lexicon item discovery)?
  • For the one-stage model in Expt 3, they mention that they build in a bias for complementary distribution - is this an uncontroversial assumption (or easy to derive from innate abilities we know infants do have)?  I honestly don't have strong intuitions about this.  It'd be great if it was.




References:


Feldman, N., Griffiths, T., and Morgan, J. (2009). Learning phonetic categories by learning a lexicon. Proceedings of the 31st Annual Conference on Cognitive Science.



Swingley, D. (2009). Contributions of infant word learning to language development. Philosophical Transactions of the Royal Society B, 364, 3617-3632.


No comments:

Post a Comment