Friday, May 16, 2014

Some thoughts on Kol et al. 2014

I completely love that this paper is highlighting the strength of computational models for precisely evaluating theories about language learning strategies (which is an issue near and dear to my heart). As K&al2014 so clearly note, a computational model forces you to implement all the necessary pieces of your theory and can show you where parts are underspecified. And then, when K&al2014 demonstrate the issues with the TBM (the Traceback Method), they can identify what parts seem to be causing the problem and where the theory needs to include additional information/constraints.

On a related note, I love that K&al2014 are worrying about how to evaluate model output — again, an issue I’ve been thinking about a lot lately.  They end up doing something like a bigger picture version of recall and precision — we don’t just want the model to generate all the true utterances (high recall). We want it to also not generate the bad utterances (high precision). And they demonstrate quite clearly that the TBM’s generative power is great…so great that it generates the bad utterances, too (and so has low precision from this perspective). Which is not so good after all.
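To make that concrete, here's a minimal sketch of what utterance-level recall and precision could look like if you treat the model's output and the attested utterances as sets of strings. The sets and function names are mine, purely for illustration; K&al2014's actual evaluation is more involved.

```python
# Toy sketch: recall and precision over sets of utterances.
# "generated" = everything the model can produce,
# "attested"  = the real (grammatical) utterances we want it to cover.
# These sets and names are illustrative, not K&al2014's actual setup.

def recall(generated, attested):
    """Proportion of attested utterances the model can generate."""
    return len(generated & attested) / len(attested)

def precision(generated, attested):
    """Proportion of the model's output that is actually attested."""
    return len(generated & attested) / len(generated)

generated = {"I want juice", "more cookie", "want juice I"}  # overgenerates the last one
attested  = {"I want juice", "more cookie"}

print(recall(generated, attested))     # 1.0   -- covers all the real utterances
print(precision(generated, attested))  # ~0.67 -- but also produces a bad one
```

In these terms, the TBM's problem is that making the units and operations permissive enough to push recall up drags precision down.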

But what was even more interesting to me was their mention of measures like perplexity to test the “quality of the grammars” learned, with the idea that good quality grammars make the real data less perplexing. Though they didn't do it in this paper, I wonder if there's a reasonable way to do that for the learning strategy they discuss. It's not a grammar exactly, but it's definitely a collection of units and operations that can be used to generate an output. So, as long as you have a generative model for how to produce a sequence of words, it seems like you could use a perplexity measure to compare this particular collection of units and operations against something like a context-free grammar (or even just various versions of the TBM learning strategy).
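If you did want to go the perplexity route, something like the following shows the basic mechanics. The toy bigram model here just stands in for whatever generative model a learning strategy defines (TBM frames, a CFG, etc.); the smoothing choice and the data are placeholders of mine, not anything from K&al2014. Lower perplexity on held-out child utterances would mean the learned units and operations make the real data less surprising.

```python
import math
from collections import Counter

def train_bigrams(corpus):
    """Count unigrams and bigrams over utterances padded with boundary symbols."""
    unigrams, bigrams = Counter(), Counter()
    for utterance in corpus:
        words = ["<s>"] + utterance.split() + ["</s>"]
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    return unigrams, bigrams

def perplexity(test, unigrams, bigrams, vocab_size, alpha=1.0):
    """Perplexity of held-out utterances under an add-alpha-smoothed bigram model."""
    log_prob, n = 0.0, 0
    for utterance in test:
        words = ["<s>"] + utterance.split() + ["</s>"]
        for prev, word in zip(words[:-1], words[1:]):
            # add-alpha smoothing so unseen bigrams don't get zero probability
            p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)
            log_prob += math.log2(p)
            n += 1
    return 2 ** (-log_prob / n)

train = ["I want juice", "I want more cookie", "more juice"]
test = ["I want more juice"]
unigrams, bigrams = train_bigrams(train)
vocab = {w for u in train for w in u.split()} | {"</s>"}
print(perplexity(test, unigrams, bigrams, len(vocab)))  # lower = less perplexed by real data
```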

Some more targeted thoughts:

(1) K&al2014 make a point in the introduction that simulations that “specifically implement definitions provided by cognitive models of language acquisition are rare”.  I found this a very odd thing to say: isn't every model an implementation of some theory of a language learning strategy? Maybe the point is more that we have a lot of cognitive theories that don't yet have computational simulations.

(2) There's a certain level of arbitrariness that K&al2014 note for things like how many matching utterances have to occur for a frame to be established (e.g., if it occurs twice, it's established).  Similarly, the preference for choosing consecutive matches over non-consecutive matches takes priority over choosing more frequent matches. It's not clear there are principled reasons for this ordering (at least, not from the description here; in fact, I don't think the consecutive preference is even implemented in the model K&al2014 put together later on). So, in some sense, these are sort of free parameters in the cognitive theory (see the toy sketch after these targeted thoughts).


(3) Something that struck me about having high recall on the child-produced utterances with the TBM model — K&al2014 find that the TBM approach can account for a large majority of the utterances (in the high 80s and sometimes 90s). But what about the rest of them (i.e., those 10 or 20% that aren’t so easily reconstructable)? Is it just a sampling issue (and so having denser data would show that you could construct these utterances too)? Or is it more what the linguistic camp tends to assume, where there are knowledge pieces that aren’t a direct/transparent translation of the input? In general, this reminds me of what different theoretical perspectives focus their efforts on — the usage-based camp (and often the NLP camp for computational linguistics) is interested in what accounts for most of everything out there (which can maybe be thought of as the “easy” stuff), while the UG-based camp is interested in accounting for the “hard” stuff (even though that may be a much smaller part of the data).
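Coming back to the free-parameter point in (2), here's the kind of thing I mean, in toy form: a "frame" below is just an utterance with one word replaced by a slot, and it counts as established once it recurs some threshold number of times. The threshold (and the crude slot representation) is my simplification, not K&al2014's implementation, but it shows how much the resulting inventory hinges on a number the theory doesn't really motivate.

```python
from collections import Counter

FRAME_THRESHOLD = 2  # the arbitrary "occurs twice, it's established" criterion

def candidate_frames(utterance):
    """All one-slot frames obtainable by replacing a single word with X."""
    words = utterance.split()
    for i in range(len(words)):
        yield " ".join(words[:i] + ["X"] + words[i + 1:])

def established_frames(corpus, threshold=FRAME_THRESHOLD):
    """Frames that recur at least `threshold` times across the corpus."""
    counts = Counter(f for u in corpus for f in candidate_frames(u))
    return {frame for frame, n in counts.items() if n >= threshold}

corpus = ["I want juice", "I want cookie", "where is teddy"]
print(established_frames(corpus))               # {'I want X'}
print(established_frames(corpus, threshold=3))  # set() -- same data, different "theory"
```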
