Wednesday, February 11, 2015

Some thoughts on Yurovsky & Frank 2014 Ms

One thing I really enjoyed about this paper was the integration of cognitive resource constraints (memory and attention) into an ideal learner model. I may have some quibbles about calling this “algorithmic” vs. “computational” (more on this below), since that distinction for me has to do with the inference process, but the core idea of including these aspects in the learning model seems like a nice step forward.

That being said, I thought the way “attention” was integrated was a bit curious — if I’m understanding correctly, it was part of the speaker’s intentions (I). Is this because the listener focuses her attention on the speaker’s intention to refer to something repeatedly? That’s the best link I could come up with. (More discussion on this below, too.) If so, I could imagine this ability maturing over time, so that early word-learners (~1 year old) have less ability to do this accurately than adults.

Back to more general things: This was also a nice demonstration of how two very different stories of a process can be implementations of a more general approach (as represented by the σ variable). Still, as the authors themselves note, it’s unclear what this particular study shows for either L1 or L2 learning. But it’s a good methodology demonstration, and maybe once more L1 data is available, this model can be applied to tell us something about word learning in toddlers.

More specific comments: 

(1) Introduction, “…both of these algorithmic-level solutions will, in the limit, produce successful word-reference mapping, they will do so at very different rates…may be necessary to posit additional biases and constraints on learners in order for human-scale lexicons to be learned in human-scale time from the input available to children” — This is a very good point, and highlights one important measure of algorithmic-level approaches. That being said, I think the particular approaches being discussed here are really only meant to apply to very early word-learning when almost no words are already known, which may only last a short while. So, the “human-scale lexicon” may be rather small.

(2) Model, p.17: “…the most convenient place to integrate attention is in defining the learner’s beliefs about P(I | O)…[o]ne possibility is to let each object be equally likely to be the intended reference…[a]lternatively, the learner could place all the probability mass on one hypothesized referent…more flexible alternative is to assign some probability mass σ to the hypothesized referent…” — So this is the specific instantiation I alluded to in my comment at the beginning. Since I is meant to be about the speaker’s intentions, it seems like this has to be some kind of theory of mind thing, where the listener assumes the speaker is intending to talk about everything with uniform probability (option one), only one thing all the time (option two), or some things more than others (option three). This seems vaguely odd as a model of listener “attention”, though it may capture assumptions about communicative goals very naturally.
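To make the three options concrete, here’s a minimal sketch of how a listener’s distribution P(I | O) over the objects O in a scene might be defined under each one. This is my own illustration, not the authors’ code; the function and variable names are hypothetical, and the paper’s actual parameterization may differ.

```python
def uniform_attention(objects):
    """Option one: every object in the scene is equally likely
    to be the speaker's intended referent."""
    p = 1.0 / len(objects)
    return {obj: p for obj in objects}


def all_or_none_attention(objects, hypothesized):
    """Option two: all probability mass on the single
    hypothesized referent."""
    return {obj: (1.0 if obj == hypothesized else 0.0)
            for obj in objects}


def graded_attention(objects, hypothesized, sigma):
    """Option three: mass sigma on the hypothesized referent,
    with the remaining (1 - sigma) spread evenly over the other
    objects. (Assumes at least two objects in the scene.)"""
    others = [o for o in objects if o != hypothesized]
    return {obj: (sigma if obj == hypothesized
                  else (1.0 - sigma) / len(others))
            for obj in objects}
```

Note that option three nests the other two: with σ = 1 it reduces to the all-or-none case, and with σ = 1/|O| it reduces to the uniform case, which is presumably why the authors call it the more flexible alternative.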

(3) General Discussion, p.21: “…graded shift in representation was well-described by an ideal learning model, but only when this model was modified to take into account psychological constraints on attention and memory…the shift from a computational to an algorithmic (or, psychological) description was critical” — And this is where my quibbles arise. I completely agree that integrating resource constraints is a great step forward, but I hesitate to say these were integrated at the algorithmic level. The inference process was still MCMC, if I understood correctly, and I don’t think any modification was done to it. So, for me, that’s a way to approximate the optimal inference, and so is a computational-level thing. Maybe this is more “rational process model”, though (one step down from pure computational, but not yet what I'd call algorithmic)? 
