So I'm definitely a huge fan of work that combines different levels of information when solving acquisition problems, and this is that kind of study. In particular, as Feldman et al. note themselves, they're making explicit an idea that came from Swingley (2009): Maybe identifying phonetic categories from the acoustic signal is easier if you keep word context in mind. Another way of putting this is that infants realize that sounds are part of larger units, and so as they try to solve the problem of identifying their native sounds, they're also trying to solve the problem of what these larger units are. This seems intuitively right to me (I had lots of notes in the margins saying "right!" and "yes!!"), though of course we need to grant that infants realize these larger units exist.
One thing I was surprised about, since I had read an earlier version of this study (Feldman et al. 2009): The learners here actually aren't solving word segmentation at the same time they're learning phonetic categories. For some reason, I had assumed they were - maybe because the idea of identifying the lexicon items in a stream of speech seems similar to word segmentation. But that's not what's going on here. Feldman et al. emphasize that the words are presented with boundaries already in place, so this is a little easier than real life. (It's as if the infants are presented with a list of words, or just isolated words.) Given the nature of the Bayesian model (and especially since one of the co-authors is Sharon Goldwater, who's done work on Bayesian segmentation), I wonder how difficult it would be to actually do word segmentation at the same time. It seems fairly similar to me, since the lexicon model is already in place (geometric word length, Dirichlet process for lexicon item frequency in the corpus, etc.).
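To make the lexicon side of this concrete, here's a minimal sketch of a generative process with just the two ingredients mentioned above: geometric word length and a Dirichlet process (written as a Chinese restaurant process) over lexicon item frequency. Everything else is my own assumption for illustration and not Feldman et al.'s implementation: the function name, the parameters `alpha` and `p_stop`, and the choice to draw each category label uniformly. Note that tokens come out one at a time with boundaries given, so there's no segmentation happening here either.

```python
import random

def sample_lexicon_and_corpus(n_tokens, categories, alpha=1.0, p_stop=0.5, seed=0):
    """Sample word tokens from a CRP over lexicon items (illustrative only).

    alpha  : hypothetical concentration parameter (bigger = more new items)
    p_stop : hypothetical stopping probability for the geometric word length
    """
    rng = random.Random(seed)
    lexicon = []   # lexicon items, each a tuple of phonetic category labels
    counts = []    # how many tokens each lexicon item has produced so far
    corpus = []    # one lexicon index per word token
    for i in range(n_tokens):
        # CRP: create a new item with probability alpha / (i + alpha);
        # otherwise reuse item k with probability counts[k] / (i + alpha).
        if rng.random() < alpha / (i + alpha):
            length = 1
            while rng.random() >= p_stop:  # geometric length, at least 1
                length += 1
            # Assumption: category labels drawn uniformly and independently.
            lexicon.append(tuple(rng.choice(categories) for _ in range(length)))
            counts.append(1)
            corpus.append(len(lexicon) - 1)
        else:
            k = rng.choices(range(len(lexicon)), weights=counts)[0]
            counts[k] += 1
            corpus.append(k)
    return lexicon, counts, corpus

lexicon, counts, corpus = sample_lexicon_and_corpus(200, ["i", "a", "u", "t", "k"])
```

Doing segmentation jointly would mean replacing the "one token at a time" loop with inference over where the boundaries fall in an unsegmented stream, which is the part this sketch leaves out.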
Anyway, on to some more targeted thoughts:
--> I thought the summary of categorization & the links between categorization in language acquisition and categorization in other areas of cognition was really well presented. Similarly, the summary of the previous phonetic category learning models was great - enough detail to know what happened, and how it compares to what Feldman et al. are doing.
--> Regarding the child-directed speech data used, I thought it was really great to see this kind of empirical grounding. I did wonder a bit about which corpora the CHILDES parental frequency count draws from though - since we're looking at processes that happen between 6 and 12 months, we might want to focus on data directed at children of that age. There are plenty of corpora in the American English section of CHILDES with at least some data in this range, so I don't think it would be too hard. The same conversion with the CMU pronouncing dictionary could then be used on those data. (Of course, getting the actual acoustic signal would be best, but I don't know how many CHILDES corpora have this information attached to them. But if we had that, then we could get all the contextual/coarticulatory effects.) On a related note, I wonder how hard it would be to stack a coarticulatory model on top of the existing model, once you had that data. Basically, this would involve hypothesizing different rules, perhaps based on motor constraints (rather than the more abstract rules that we see in phonology, such as those that Dillon et al. (forthcoming) look into in their learning model). Also related, could a phonotactic model of some kind be stacked on top of this? (Blanchard et al. 2010 combine word segmentation & phonotactics.) A word could then be made up of bigrams of phonetic categories, rather than just the unigrams in there now.
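Here's a rough sketch of what the unigram vs. bigram difference amounts to when scoring a word as a sequence of phonetic category labels. This is only an illustration under my own assumptions (add-one smoothing, a `#` boundary symbol, hypothetical function names, a toy training list), not anything taken from Feldman et al. or Blanchard et al. The point is just that a unigram model can't tell permutations of the same categories apart, while a bigram model can.

```python
import math
from collections import Counter

BOUNDARY = "#"  # hypothetical word-boundary symbol

def fit(words):
    """Count category unigrams, bigrams, and bigram contexts over word types."""
    uni, bi, ctx = Counter(), Counter(), Counter()
    for w in words:
        uni.update(w)
        padded = [BOUNDARY, *w, BOUNDARY]
        for a, b in zip(padded, padded[1:]):
            bi[(a, b)] += 1
            ctx[a] += 1
    return uni, bi, ctx

def unigram_logprob(word, uni, n_categories):
    """Log probability with each category drawn independently (add-one smoothed)."""
    total = sum(uni.values())
    return sum(math.log((uni[c] + 1) / (total + n_categories)) for c in word)

def bigram_logprob(word, bi, ctx, n_symbols):
    """Log probability with each category conditioned on the previous one."""
    padded = [BOUNDARY, *word, BOUNDARY]
    return sum(
        math.log((bi[(a, b)] + 1) / (ctx[a] + n_symbols))
        for a, b in zip(padded, padded[1:])
    )

# Toy data: three word tokens over the categories k, a, t, p.
train = [("k", "a", "t"), ("k", "a", "t"), ("t", "a", "p")]
uni, bi, ctx = fit(train)
attested, scrambled = ("k", "a", "t"), ("t", "k", "a")
same_under_unigram = math.isclose(
    unigram_logprob(attested, uni, 4), unigram_logprob(scrambled, uni, 4)
)
prefers_attested = bigram_logprob(attested, bi, ctx, 5) > bigram_logprob(scrambled, bi, ctx, 5)
```

In a stacked model, the bigram score would take the place of the unigram term when a new lexicon item's form is generated, so phonotactically odd sequences would be less likely to be posited as words.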
--> I liked that they used both the number of categories recovered and the pairwise performance measures to gauge model performance. While it seems obvious that we want to learn the categories that match the adult categories, some previous models only checked that the right number of categories was recovered.
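To see why counting categories isn't enough, here's a small sketch of pairwise scoring. I'm assuming the usual pairwise precision/recall/F-score over token pairs (a pair counts as a hit when the model and the adult labels both put the two tokens in the same category); the exact measures in the paper may differ in detail, and the function name and toy labels are mine.

```python
from itertools import combinations

def pairwise_scores(gold, predicted):
    """Pairwise precision, recall, and F-score over all token pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_gold = gold[i] == gold[j]
        same_pred = predicted[i] == predicted[j]
        if same_pred and same_gold:
            tp += 1      # correctly grouped together
        elif same_pred:
            fp += 1      # grouped together, but adults keep them apart
        elif same_gold:
            fn += 1      # kept apart, but adults group them together
    # Convention assumed here: a score is 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

gold = ["i", "i", "e", "e"]
# Two categories recovered (the right number), but the wrong tokens grouped:
right_count_wrong_groups = pairwise_scores(gold, [0, 1, 0, 1])
# One over-broad category that merges both vowels:
merged = pairwise_scores(gold, [0, 0, 0, 0])
```

In the first case the model gets the number of categories right and still scores zero on every pairwise measure, which is exactly the kind of failure a count-only evaluation would miss.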
--> The larger point about the failure of distributional learning on its own reminds me a bit of Gambell & Yang (2006), who essentially were saying that distributional learning works much better in conjunction with additional information (stress information in their case, since they were looking at word segmentation). Feldman et al.'s point is that this additional information can be on a different level of representation, and depending on what you believe about stress w.r.t. word segmentation, Gambell & Yang would be saying the same thing.
--> The discussion of minimal pairs is very interesting (and this was one of the cool ideas from the original Feldman et al. 2009 paper) - minimal pairs can actually harm phonetic category acquisition in the absence of referents. In particular, it's more parsimonious to just have one lexicon item whose vowel varies, and this in turn creates broader vowel categories than we want. So, to succeed, the learner needs to have a fairly weak bias to have a small lexicon - this then leads to splitting minimal pairs into multiple lexicon items, which is actually the correct thing to do. However, we then have to wonder how realistic it is to have such a weak bias for a small lexicon. (Given memory & processing constraints in infants, it might seem more realistic to have a strong bias for a small lexicon.) On a related note, Feldman et al. note later on that information about word referents actually seems to hinder infants' ability to distinguish a minimal pair (citing Stager & Werker 1997). Traditionally, this was explained as something like "word learning is extra hard processing-wise, so infants fail to make the phonetic category distinctions that would separate minimal pairs." Either way, the basic point is that word referent information isn't so helpful. But maybe it's enough for infants to know that words are functionally different, even if the exact word-meaning mapping isn't established? This might be less cognitively taxing for infants, and allow them to use that information to separate minimal pairs. Or instead, maybe we should be looking for evidence that infants are terrible at learning minimal pairs when they're first building their lexicons. Feldman et al. reference some evidence that non-minimal pairs are actually really helpful for category learning (more specifically, minimal pairs embedded in non-minimal pairs).
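Here's a back-of-the-envelope sketch of the "bias for a small lexicon" part of this tradeoff. I'm assuming that bias comes from the Dirichlet process over lexicon items, written as a Chinese restaurant process with a concentration parameter I'll call `alpha` (small `alpha` = strong bias for few items). The sketch only compares prior probabilities of merging vs. splitting a minimal pair's tokens; it ignores the acoustic likelihood and the cost of generating the extra word form, so it is not the full model comparison.

```python
import math

def crp_log_prob(sizes, alpha):
    """Log CRP probability of a partition with the given block sizes.

    P = alpha^K * prod_k (n_k - 1)! / prod_{i=0}^{n-1} (alpha + i)
    """
    n = sum(sizes)
    log_num = len(sizes) * math.log(alpha) + sum(math.lgamma(s) for s in sizes)
    log_den = math.lgamma(alpha + n) - math.lgamma(alpha)
    return log_num - log_den

def log_prior_odds_merge(n1, n2, alpha):
    """Log prior odds of one merged lexicon item vs. two separate items."""
    return crp_log_prob([n1 + n2], alpha) - crp_log_prob([n1, n2], alpha)

# Five tokens of each member of a hypothetical minimal pair:
strong_bias = log_prior_odds_merge(5, 5, alpha=1.0)     # positive: prior favors merging
weak_bias = log_prior_odds_merge(5, 5, alpha=1000.0)    # negative: prior favors splitting
```

Under these assumptions the log odds work out to log((n1+n2-1)!) - log((n1-1)!) - log((n2-1)!) - log(alpha), so the prior's pull toward merging shrinks as `alpha` grows, which is one way to read "a fairly weak bias to have a small lexicon."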
--> I thought the discussion of hierarchical models in general near the end was really nice, and was struck by the statement that "knowledge of sounds is nothing more than a type of general knowledge about words". From a communicative perspective, this seems right - words are the meaningful things, not individual sounds. Moreover, if we translate this statement back over to syntax (since Perfors et al. (2011) used hierarchical models to learn about hierarchical grammars), we get something like "knowledge of hierarchical grammar is nothing more than a type of general knowledge about individual parse tree structures", and that also seems right. Going back to sounds and words, it's just a little odd at first blush to think of sounds as being the higher level of knowledge and words being the lower level of knowledge. But I think Feldman et al. argue for it effectively.
--> I thought this was an excellent statement describing the computational/rational approach: "...identifying which problem [children] are solving can give us clues to the types of strategies that are likely to be used."
Blanchard, D., J. Heinz, & R. Golinkoff. 2010. Modeling the contribution of phonotactic cues to the problem of word segmentation. Journal of Child Language, 37, 487-511.
Dillon, B., E. Dunbar, & W. Idsardi. forthcoming. A single stage approach to learning phonological categories: Insights from Inuktitut. Cognitive Science.
Feldman, N., T. Griffiths, & J. Morgan. 2009. Learning phonetic categories by learning a lexicon. Proceedings of the 31st Annual Conference of the Cognitive Science Society.
Gambell, T. & C. Yang. 2006. Word Segmentation: Quick but not dirty. Manuscript, Yale University.
Perfors, A., J. Tenenbaum, & T. Regier. 2011. The learnability of abstract syntactic principles. Cognition, 118, 306-338.
Stager, C. & J. Werker. 1997. Infants listen for more phonetic detail in speech perception than in word-learning tasks. Nature, 388, 381-382.
Swingley, D. 2009. Contributions of infant word learning to language development. Philosophical Transactions of the Royal Society B, 364, 3617-3632.