Computational Models of Language (at UC Irvine): October 2012

Wednesday, October 24, 2012

Next time on 11/14 @ 2pm in SBSG 2221 = Hsu et al. 2011

Hi everyone,

Thanks to everyone who participated in our thoughtful discussion of Gagliardi et al. (2012)! For our next meeting on Wednesday November 14th @ 2pm in SBSG 2221, we'll be looking at an article that investigates a way to quantify natural language learnability and discusses the impact this has on the debate about the nature of the necessary learning biases for language:

Hsu, A., Chater, N., & Vitanyi, P. 2011. The probabilistic analysis of language acquisition: Theoretical, computational, and experimental analysis. Cognition, 120, 380-390.

http://www.socsci.uci.edu/~lpearl/colareadinggroup/readings/HsuEtAl2011_ProbLang.pdf

See you then!
-Lisa

Monday, October 22, 2012

Some thoughts on Gagliardi et al. (2012)

I thought this was a really lovely Cog Sci paper showcasing how to combine experimental & computational methodologies (and still make it all fit in 6 pages). The authors really tried to give the intuitions behind the modeling aspects, which makes this more accessible to a wider audience. The study does come off as a foundational one, given the many extensions that could be done (involving effects in younger word learners, cross-linguistics applications, etc.), but I think that's a perfectly reasonable approach (again, given the page limitations). I also thought the empirical grounding was really lovely for the computational modeling part, especially as relating to the concept priors. Granted, there are still some idealizations being made (more discussion of this below), but it's nice to see this being taken seriously.

Some more targeted thoughts:

--> One issue concerns the age of the children tested experimentally (4 years old) (and as Gagliardi et al. mention, a future study should look at younger word learners). The reason is that 4-year-olds are fairly good word learners (and have a vocabulary of some size), and presumably have the link between concept and grammatical category (and maybe morphology and grammatical category for the adjectives) firmly established. So it maybe isn't so surprising that grammatical category information is helpful to them. What would be really nice is to know when that link is established, and the interaction between concept formation and recognition/mapping to grammatical categories. I could certainly imagine a bootstrapping process, for instance, and it would be useful to understand that more.

--> The generative model assumes a particular sequence, namely (1) choose the syntactic category, (2) choose the concept, and (3) choose instances of that concept. This seems reasonable for the teaching scenario in the experimental setup, but what might we expect in a more realistic word-learning environment? Would a generative model still have syntactic category first (probably not), or instead have a balance between syntactic environment and concept? Or maybe it would be concept first? And more importantly, how much would this matter? It would presumably change the probabilities that the learner needs to estimate at each point in the generative process.

--> I'd be very interested to see the exact way the Mechanical Turk survey was conducted for classifying things as examples of kinds, properties, or both (and which words were used). Obviously, due to space limitations, this wasn't included here. But I can imagine that many words might easily be described as both kind & concept, if you think carefully enough (or maybe too carefully) about it. Take "cookie", for example (a fairly common child word, I think): It's got both kind (ex: food) and property aspects (ex: sweet) that are fairly salient. So it really matters what examples you give the participants and how you explain the classification you're looking for. And even then, we're getting adult judgments, where child judgments might be more malleable (so maybe we want to try this exercise with children too, if we can).

--> Also, on a related note, the authors make a (reasonable) idealization that the distribution of noun and adjective dimensions in the 30-month-old CDIs are representative of the "larger and more varied set of words" that the child experimental participants know. However, I do wonder about the impact of that assumption, since we are talking about priors (which drive the model to use grammatical category information in a helpful way). It's not too hard to imagine children whose vocabularies skew away from this sample (especially if they're older). Going in the other direction though, if we want to try to extend this to younger word learners, then the CDIs start to become a very good estimate of the nouns and adjectives these children know, so that's very good.

Wednesday, October 10, 2012

Next time on Oct 24 @ 2pm in SBSG 2221 = Gagliardi et al. 2012

Thanks to everyone who participated in our thoughtful discussion of Feldman et al. (2012 Ms)! For our next meeting on Wednesday October 24 @ 2pm in SBSG 2221, we'll be looking at an article that seeks to model learning of word meaning for specific grammatical categories:

Gagliardi, A., E. Bennett, J. Lidz, & N. Feldman. 2012. Children's Inferences in Generalizing Novel Nouns and Adjectives. In N. Miyake, D. Peebles, & R. Cooper (Eds), Proceedings of the 34th Annual Meeting of the Cognitive Science Society, 354-359.

http://www.socsci.uci.edu/~lpearl/colareadinggroup/readings/GagliardiEtAl2012_NounAdjectiveClassification.pdf

See you then!

Monday, October 8, 2012

Some thoughts on Feldman et al. (2012 Ms)

So I'm definitely a huge fan of work that combines different levels of information when solving acquisition problems, and this is that kind of study. In particular, as Feldman et al. note themselves, they're making explicit an idea that came from Swingley (2009): Maybe identifying phonetic categories from the acoustic signal is easier if you keep word context in mind. Another way of putting this is that infants realize that sounds are part of larger units, and so as they try to solve the problem of identifying their native sounds, they're also trying to solve the problem of what these larger units are. This seems intuitively right to me (I had lots of notes in the margins saying "right!" and "yes!!"), though of course we need to grant that infants realize these larger units exist.

One thing I was surprised about, since I had read an earlier version of this study (Feldman et al. 2009): The learners here actually aren't solving word segmentation at the same time they're learning phonetic categories. For some reason, I had assumed they were - maybe because the idea of identifying the lexicon items in a stream of speech seems similar to word segmentation. But that's not what's going on here. Feldman et al. emphasize that the words are presented with boundaries already in place, so this is a little easier than real life. (It's as if the infants are presented with a list of words, or just isolated words.) Given the nature of the Bayesian model (and especially since one of the co-authors is Sharon Goldwater, who's done work on Bayesian segmentation), I wonder how difficult it would be to actually do word segmentation at the same time. It seems fairly similar to me, with the lexicon model already in place (geometric word length, Dirichlet process for lexicon item frequency in the corpus, etc.)

Anyway, on to some more targeted thoughts:

--> I thought the summary of categorization & the links between categorization in language acquisition and categorization in other areas of cognition was really well presented. Similarly, the summary of the previous phonetic category learning models was great - enough detail to know what happened, and how it compares to what Feldman et al. are doing.

--> Regarding the child-directed speech data used, I thought it was really great to see this kind of empirical grounding. I did wonder a bit about which corpora the CHILDES parental frequency count draws from though - since we're looking at processes that happen between 6 and 12 months, we might want to focus on data directed at children of that age. There are plenty of corpora in the American English section of CHILDES with at least some data in this range, so I don't think it would be too hard. The same conversion with the CMU pronouncing dictionary could then be used on those data. (Of course, getting the actual acoustic signal would be best, but I don't know how many CHILDES corpora have this information attached to them. But if we had that, then we could get all the contextual/coarticulatory effects.) On a related note, I wonder how hard it would be to stack a coarticulatory model on top of the existing model, once you had that data. Basically, this would involve hypothesizing different rules, perhaps based on motor constraints (rather than the more abstract rules that we see in phonology, such as those that Dillon et al. (forthcoming) look into in their learning model). Also related, could a phonotactic model of some kind be stacked on top of this? (Blanchard et al. 2011 combine word segmentation & phonotactics.) A word could be made up of bigrams of phonetic categories, rather than just the unigrams in there now.

--> I liked that they used both the number of categories recovered and the pairwise performance measures to gauge model performance. While it seems obvious that we want to learn the categories that match the adult categories, some previous models only checked that the right number of categories were recovered.

--> The larger point about the failure of distributional learning on its own reminds me a bit of Gambell & Yang (2006), who essentially were saying that distributional learning works much better in conjunction with additional information (stress information in their case, since they were looking at word segmentation). Feldman et al.'s point is that this additional information can be on a different level of representation, and depending on what you believe about stress w.r.t. word segmentation, Gambell & Yang would be saying the same thing.

--> The discussion of minimal pairs is very interesting (and this was one of the cool ideas from the original Feldman et al. 2009 paper) - minimal pairs can actually harm phonetic category acquisition in the absence of referents. In particular, it's more parsimonious to just have one lexicon item whose vowel varies, and this in turn creates broader vowel categories than we want. So, to succeed, the learner needs to have a fairly weak bias to have a small lexicon - this then leads to splitting minimal pairs into multiple lexicon items, which is actually the correct thing to do. However, we then have to wonder how realistic it is to have such a weak bias for a small lexicon. (Given memory & processing constraints in infants, it might seem more realistic to have a strong bias for a small lexicon.) On a related note, Feldman et al note later on that information about word referents actually seem to hinder infant ability to distinguish a minimal pair (citing Stager & Werker 1997). Traditionally, this was explained as something like "word learning is extra hard processing-wise, so infants fail to make the phonetic category distinctions that would separate minimal pairs." But the basic point is that word referent information isn't so helpful. But maybe it's enough for infants to know that words are functionally different, even if the exact word-meaning mapping isn't established? This might be less cognitively taxing for infants, and allow them to use that information to separate minimal pairs. Or instead, maybe we should be looking for evidence that infants are terrible at learning minimal pairs when they're first building their lexicons. Feldman et al. reference some evidence that non-minimal pairs are actually really helpful for category learning (more specifically, minimal pairs embedded in non-minimal pairs.)

--> I thought the discussion of hierarchical models in general near the end was really nice, and was struck by the statement that "knowledge of sounds is nothing more than a type of general knowledge about words". From a communicative perspective, this seems right - words are the meaningful things, not individual sounds. Moreover, if we translate this statement back over to syntax since Perfors et al. (2011) used hierarchical models to learn about hierarchical grammars, we get something like "knowledge of hierarchical grammar is nothing more than a type of general knowledge about individual parse tree structures", and that also seems right. Going back to sounds and words, it's just a little odd at first blush to think of sounds as being the higher level of knowledge and words being the lower level of knowledge. But I think Feldman et al. argue for it effectively.

--> I thought this was an excellent statement describing the computational/rational approach: "...identifying which problem [children] are solving can give us clues to the types of strategies that are likely to be used."

~~~
References

Blanchard, D., J. Heinz, & R. Golinkoff. 2010. Modeling the contribution of phonotactic cues to the problem of word segmentation. Journal of Child Language, 27, 487-511.

Dillon, B., E. Dunbar, & W. Idsardi. forthcoming. A single stage approach to learning phonological categories: Insights from Inuktitut. Cognitive Science.

Feldman, N., T. Griffiths, & J. Morgan. 2009. Learning phonetic categories by learning a lexicon. Proceedings of the 31st Annual Conference on Cognitive Science.

Gambell, T. & C. Yang. 2006. Word Segmentation: Quick but not dirty. Manuscript, Yale University.

Perfors, A., J. Tenenbaum, & T. Regier. 2011. The learnability of abstract syntactic principles. Cognition, 118, 306-338.

Stager, C. & J. Werker. 1997. Infants listen for more phonetic detail in speech perception than word-learning taste. Nature, 388, 381-382.

Swingley, D. 2009. Contributions of infant word learning to language development. Philosophical Transactions of the Royal Society B, 364, 3617-3632.

Computational Models of Language (at UC Irvine)