Monday, January 18, 2016

Some thoughts on Gutman et al. 2014

I’m a big fan of G&al2014’s goal of learning the initial knowledge that gets other acquisition processes started. In this case, it’s about learning the basic elements that allow syntactic bootstrapping to start, which itself allows children to learn more abstract word meanings. In CoLaLab, we’ve been looking at this same idea of useful initial knowledge with respect to speech segmentation and early syntactic categorization.

For G&al2014’s work, I find it interesting that they rely on comparison to adult prosodic categories (specifically VN and NP). I wonder if there’s a way to determine if the inferred prosodic categories are “good enough” in some sense, beyond matching VN and NP. For example, maybe the inferred categories can be used directly for syntactic bootstrapping, or maybe they can be used to ease language processing in some measurable way. (As a side note, it also took me a moment to realize “syntactic categorization” for G&al2014 referred to prosodic phrase types rather than the typical syntactic categories like “noun” and “verb”. Just goes to show the importance of defining your terms to avoid confusion.)

I’m also a big fan of models that recognize children use a variety of cues very early on: here, prosody, the semantics of a few familiar words, and edge sensitivity. Of course, it’s also important to understand the contribution of individual sources of information. But it’s really nice to see a more integrated model like this because it’s likely to be a more accurate simulation of what children are actually doing.

Other thoughts:

(1) I really like how this model shows which property of function words (the fact that they occur at prosodic phrase edges) allows children to learn that function words are really great cues — even before they have an official “function word” category like “determiner”.

(2) It’s interesting that the syntactic skeleton (formed via function words and prosodic boundaries) matches adult structure (NP = an apple) in some cases and not so much in others (he’s eating = VN, which isn’t a VP or an NP - it’s actually a non-constituent). I wonder how the recovery/update process works if you end up with a bunch of VN units - that is, what causes you to switch to VP = V NP and treat “he’s eating” as not-really-a-syntactic-unit in “he’s eating an apple”? 

(3) Section 2, Experiment 1: If units are constructed by looking at the initial word, it’s important that there not be too much variety in that first word (unless we want toddlers to end up with a zillion phrasal units). From the details in 2.2.1, it looks like they use the k most frequent words to define k classes of units, with k ranging from 5 to 70. Presumably, this would be something implicit to the learner, based on the learner's cognitive capacity limitations or some such. I also like that this is relying on the most frequent words, since that seems quite plausible as a way to figure out which phrasal types to notice. Related thought: Is it possible to design a model where k itself is inferred? I’m thinking generative non-parametric Bayesian models, for example.
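
To make the "k most frequent words define k classes" idea concrete for myself, here is a minimal sketch. It is not G&al2014's implementation: the function name, the toy phrases, and the choice to count frequency over all words in the input (rather than, say, only phrase-initial words) are my own assumptions for illustration.

```python
from collections import Counter


def label_phrases_by_first_word(phrases, k):
    """Label each phrase by its first word if that word is among the k most
    frequent words in the input; otherwise leave it unlabeled (None).

    `phrases` is a list of phrases, each a list of word strings.
    Assumption: frequency is counted over every word token in the input.
    """
    word_counts = Counter(word for phrase in phrases for word in phrase)
    frequent_words = {word for word, _ in word_counts.most_common(k)}

    labels = []
    for phrase in phrases:
        first_word = phrase[0] if phrase else None
        labels.append(first_word if first_word in frequent_words else None)
    return labels


# Hypothetical toy input: prosodic phrases already segmented into words.
toy_phrases = [
    ["the", "dog"],
    ["he", "is", "eating"],
    ["the", "apple"],
    ["he", "sleeps"],
    ["a", "ball"],
    ["the", "cat"],
]
toy_labels = label_phrases_by_first_word(toy_phrases, k=2)
```

With k fixed in advance like this, the number of phrasal classes is capped no matter how varied the first words are, which is exactly why I'd be curious about a version where k is inferred instead.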


(4) I also found it interesting that they used purity as the evaluation measure for a phrasal category, rather than pairwise precision (PWP). I wonder what benefit purity has over PWP, since footnote 7 explicitly notes they’re related. Is purity easier to interpret for some reason? G&al2014 do calculate recall and precision for the best instances of VN and NP, though (and find that the categories are very precise, even with as few as 10 categories).
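
For comparing the two measures, here is a small sketch using the standard clustering definitions as I understand them: purity is the share of items that carry their cluster's majority gold label, and pairwise precision is the share of same-cluster pairs that also share a gold label. These may not match G&al2014's exact formulations (footnote 7 would be the place to check), and the function names and toy data are mine.

```python
from collections import Counter, defaultdict
from math import comb


def group_gold_by_cluster(clusters, gold):
    """Collect the gold labels of the items assigned to each cluster."""
    groups = defaultdict(list)
    for cluster_id, gold_label in zip(clusters, gold):
        groups[cluster_id].append(gold_label)
    return groups


def purity(clusters, gold):
    """Fraction of items whose gold label is the majority label of their cluster."""
    if not gold:
        return 0.0
    groups = group_gold_by_cluster(clusters, gold)
    majority_total = sum(max(Counter(labels).values()) for labels in groups.values())
    return majority_total / len(gold)


def pairwise_precision(clusters, gold):
    """Fraction of same-cluster item pairs that also share a gold label."""
    groups = group_gold_by_cluster(clusters, gold)
    same_cluster_pairs = 0
    same_gold_pairs = 0
    for labels in groups.values():
        same_cluster_pairs += comb(len(labels), 2)
        same_gold_pairs += sum(comb(n, 2) for n in Counter(labels).values())
    return same_gold_pairs / same_cluster_pairs if same_cluster_pairs else 0.0


# Hypothetical toy data: inferred classes named by first word, gold prosodic types.
toy_clusters = ["the", "the", "the", "the", "he", "he"]
toy_gold = ["NP", "NP", "NP", "VN", "VN", "VN"]
toy_purity = purity(toy_clusters, toy_gold)          # 5/6
toy_pwp = pairwise_precision(toy_clusters, toy_gold)  # 4/7
```

On this toy example a single stray VN item in the "the" class costs purity one item out of six, but costs pairwise precision three pairs out of seven, so purity reads more directly as "how many phrases landed in the right kind of class". That might be part of its appeal.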