Wednesday, February 25, 2015

Some thoughts on Frank et al. 2013

One of the things I really liked about this paper was the intention to integrate more context into a computational model of acquisition (in this case, implemented as the child using utterance type information). While the particular utterance types may be idealized, it’s an excellent first step to show where this information helps and how the number of utterance types impacts that helpfulness (with utterance type basically serving as a proxy for more preceding context when there’s data sparseness). More generally, this got me thinking about approximation, e.g., approximating more context by the utterance type cue and approximating hierarchical structure with trigrams. We know it’s not the same, but it seems to be good enough, perhaps because it manages to capture the relevant property anyway. (For utterance type approximating more context, this seems to be true: utterance type tells you about category ordering in general, while preceding context gives you specific category-order information for the local environment.)
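To make that approximation concrete, here’s a minimal sketch (mine, not the authors’ implementation) of the BHMM-T idea as I understand it: tag-trigram transitions that are additionally conditioned on the utterance type u, so that dropping u from the context recovers the plain BHMM. The smoothing values and preset category count below are placeholder assumptions.

from collections import defaultdict

ALPHA = 0.1    # Dirichlet smoothing on transitions (assumed value)
BETA = 0.01    # Dirichlet smoothing on emissions (assumed value)
NUM_TAGS = 11  # preset number of categories (assumed value for illustration)

trans_counts = defaultdict(int)  # (u, t2, t1, t) -> count
context_tot = defaultdict(int)   # (u, t2, t1) -> count
emit_counts = defaultdict(int)   # (t, w) -> count
tag_tot = defaultdict(int)       # t -> count
vocab = set()                    # observed word types

def p_transition(t, t1, t2, u):
    # P(t | t-1, t-2, utterance type u), Dirichlet-smoothed.
    # Dropping u from the context gives the plain BHMM transition.
    return ((trans_counts[(u, t2, t1, t)] + ALPHA) /
            (context_tot[(u, t2, t1)] + NUM_TAGS * ALPHA))

def p_emission(w, t):
    # P(w | t), Dirichlet-smoothed over the observed vocabulary.
    return ((emit_counts[(t, w)] + BETA) /
            (tag_tot[t] + max(len(vocab), 1) * BETA))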

The authors also note that the actual utterance types children infer may be based on a number of cues, such as prosody (in particular, pitch contour or intonation). Knowing the ideal number of utterance types might be useful so we know how many classes we’re aiming for, based on these cues. At the very least, the results here suggest fewer may be better. Relatedly: recent experimental work by Geffen & Mintz (2014) suggests 12-month-olds can at least make a binary classification between declaratives and yes/no questions in English in the absence of prosodic contour cues, so there may be other cues infants are able to use besides prosody at the age when early grammatical categorization would be happening.
*Reference: Geffen, S. & Mintz, T. 2014. Can You Believe It? 12-Month-Olds Use Word Order to Distinguish Between Declaratives and Polar Interrogatives. Language Learning and Development. DOI: 10.1080/15475441.2014.951595.
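Just to make the word-order intuition concrete, here’s a toy sketch (mine, not Geffen & Mintz’s stimuli or analysis): English polar questions typically front an auxiliary, so even a prosody-free learner has a surface cue available. The auxiliary list is hypothetical and obviously incomplete.

AUXILIARIES = {"is", "are", "was", "were", "do", "does", "did", "can", "will"}

def guess_type(words):
    # A fronted auxiliary suggests a polar (yes/no) question in English.
    if words and words[0].lower() in AUXILIARIES:
        return "polar-question"
    return "declarative"

print(guess_type("Can you believe it".split()))  # polar-question
print(guess_type("You can believe it".split()))  # declarative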

More specific thoughts:

(1) Introduction, age ranges: “…children who are at the point of learning syntax — at 2-3 years of age”: This is just me being persnickety, but I think the age is closer to 1 if we’re talking about early categorization before the learner has any knowledge of categories (which is the start state of the learner modeled here). I don’t think it matters for the model here or the cues it relies on, but it’s a more general point about these kinds of computational models. If we’re going to model a process where the learner is basically starting from scratch (no prior knowledge of categories), then this learning is going to happen very early and probably won’t persist for very long. Even after a little of this kind of learning, the learner then has some knowledge, which ought to bias future learning (here, future categorization). This brings up the tricky subject of what the output ought to be for such early-stage learning models (which Lawrence, Galia, and I have been worrying about lately). Do we really want adult-level grammatical categories? Maybe not. But what’s acceptable output, and how do we tell if we’ve got it? F&al2013 sensibly compare the inferred categories to the CHILDES-annotated categories, which are based on adult categories. But if this is meant to model early categorization occurring around 12 months and lasting only long enough to bootstrap other categorization, maybe that’s not the output we want.
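On the “how do we tell” question, one common way to score induced categories against gold tags (not necessarily the paper’s exact metric) is many-to-one mapping: map each induced category to its most frequent gold tag, then compute accuracy under that mapping. A quick sketch:

from collections import Counter, defaultdict

def many_to_one(induced, gold):
    # induced, gold: parallel lists of per-token labels.
    by_cluster = defaultdict(Counter)
    for c, g in zip(induced, gold):
        by_cluster[c][g] += 1
    # Each induced category "votes" for its majority gold tag.
    correct = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return correct / len(gold)

Of course, this kind of metric bakes in exactly the assumption I’m worrying about: that the adult-level gold tags are the right target for early-stage output.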

(2) English experiments, 4.1.1. Corpora, methods quibble: “Wh-words are tagged as adverbs…pronouns…or determiners.”  — I wonder why. Wh-words have pretty distinct properties with respect to word order (wh-fronting in English), among other things. It seems like it might have been more useful to cluster wh-words together into their own category.

(3) English experiments, 4.1.2. Inference: Somewhat related to the point above, it’d be a nice extension to not preset the number of categories the learner is meant to identify, and instead infer how many categories are best and what words belong in those categories. (Hello, infinite BHMM…)
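For flavor, here’s a toy sketch of that direction: a Chinese Restaurant Process prior in which the number of categories isn’t preset but grows with the data. The concentration value is an arbitrary assumption, and a real infinite BHMM would fold this prior into the transition and emission likelihoods inside the sampler rather than sampling from it alone.

import random

GAMMA = 1.0          # CRP concentration parameter (assumed value)
category_sizes = []  # category_sizes[k] = tokens currently in category k

def sample_category():
    # Pick an existing category k with prob. proportional to its size,
    # or open a brand-new category with prob. proportional to GAMMA.
    total = sum(category_sizes) + GAMMA
    r = random.uniform(0, total)
    for k, size in enumerate(category_sizes):
        r -= size
        if r < 0:
            category_sizes[k] += 1
            return k
    category_sizes.append(1)  # a new category is born
    return len(category_sizes) - 1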

(4) 5.2, BHMM-E: This is a really nice demonstration of how wrong assumptions hurt. It seems like we often see models that show how assumptions are helpful (because, hey, that’s interesting!), but it’s less often that we see such a clean demonstration of active harm resulting (instead of the assumption just having no effect).

(5) 5.4.3, cross-linguistic variation: “Spanish does not show the same improvement…BHMM-T models do not differ from the baseline BHMM”: It sounds like whether infants heed utterance type as a cue may need to be learned, rather than just being something they automatically do. Though since it doesn’t actually harm (it just doesn’t help), maybe it’s okay for infants to try to use it anyway in Spanish. However, just brainstorming about how infants might learn to pay attention to utterance type: perhaps they could notice word order differences across utterance types (i.e., use various cues to identify utterance types and then see if word order, as defined by specific recognizable words rather than grammatical categories, seems to change). Then, if word order varies, use utterance type information for categorization; if not, then don’t.
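Here’s a rough sketch of that brainstorm, with everything in it (anchor words, threshold) hypothetical: track where a few recognizable words tend to appear in each utterance type, and only condition categorization on utterance type if those positions reliably differ.

from collections import defaultdict

ANCHOR_WORDS = {"you", "it", "can", "do"}  # hypothetical early-recognized words
THRESHOLD = 1.0  # hypothetical: mean positional shift (in words) that "counts"

positions = defaultdict(list)  # (utterance_type, word) -> observed positions

def observe(utterance_type, words):
    # Record where each recognizable anchor word appears in this utterance.
    for i, w in enumerate(words):
        if w in ANCHOR_WORDS:
            positions[(utterance_type, w)].append(i)

def order_varies(type_a, type_b):
    # True if anchor words sit in reliably different positions across the
    # two utterance types; if so, use utterance type for categorization.
    shifts = []
    for w in ANCHOR_WORDS:
        a, b = positions[(type_a, w)], positions[(type_b, w)]
        if a and b:
            shifts.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return bool(shifts) and sum(shifts) / len(shifts) > THRESHOLD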
