I'm always a fan of learning models that solve different problems simultaneously, leveraging information from one problem to help solve the other (Feldman et al. 2013 and Dillon et al. 2013 are excellent examples of this, IMHO). For Lewis & Frank (L&F), the two problems both relate to word learning: how to pick the referent from a set of referents, and how to pick which concept class that referent belongs to (which they relate to how to generalize the label appropriately). I have to say that I struggled to understand how they incorporated the second problem, though -- it doesn't seem like concept generalization w.r.t. subordinate vs. superordinate classes maps in a straightforward way onto the feature analysis they describe. (More on this below.) I was also a bit puzzled by their assumption about where the uncertainty in learning originates and by the link they draw between what they did and the origin/development of complex concepts (more on these below, too).
On generalization & features: If we take the example in their Figure 1, the features could be something like f1 = "fruit", f2 = "red", and f3 = "apple". They talk about generalization as underspecification of feature values, which feels right. So if f1 is the only important feature, this corresponds nicely to the idea of "fruit" as a superordinate class. But what if we allow f2 to be the important feature? Is "red" the superordinate class of red things? In a sense, I suppose. But this falls outside the noun-referent system they're working in -- "red" spans many referents because it's a property. Maybe this is my misunderstanding in trying to map the whole thing onto subordinate and superordinate classes, as Xu & Tenenbaum 2007 discuss, but that felt like what L&F intended, given the model in Figure 2 that's grounded in Objects at the observable level and the behavioral experiment they actually ran.
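To make my worry concrete, here's a minimal sketch (not L&F's actual implementation; the feature names and referents are hypothetical, loosely based on their Figure 1) of "generalization as underspecification": a concept is a partial assignment of feature values, and leaving a feature unspecified generalizes over it. Underspecifying everything but "kind" gives a subordinate-like class, underspecifying everything but "category" gives a superordinate-like class, but underspecifying everything but "color" gives a class that cuts across the noun-referent taxonomy entirely:

```python
# Referents described by (assumed) features.
referents = {
    "red_apple":   {"kind": "apple",  "color": "red",    "category": "fruit"},
    "green_apple": {"kind": "apple",  "color": "green",  "category": "fruit"},
    "banana":      {"kind": "banana", "color": "yellow", "category": "fruit"},
    "red_car":     {"kind": "car",    "color": "red",    "category": "vehicle"},
}

def extension(concept, referents):
    """Referents consistent with a concept, where a concept is a dict of
    *specified* feature values; absent features are unspecified, i.e.,
    the concept generalizes over them."""
    return {name for name, feats in referents.items()
            if all(feats.get(f) == v for f, v in concept.items())}

subordinate   = {"kind": "apple"}       # only apples
superordinate = {"category": "fruit"}   # generalizes over kind and color
property_like = {"color": "red"}        # a property, not a noun class

print(extension(subordinate, referents))    # both apples
print(extension(superordinate, referents))  # apples and the banana
print(extension(property_like, referents))  # red apple AND red car
```

The last line is exactly the problem: specifying only "color" yields an extension spanning fruit and vehicle, which doesn't correspond to any superordinate noun class.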
On where the uncertainty comes from: L&F mention in the Design of the Model section that the learning model assumes "the speaker could in principle have been mistaken about their referent or misspoken". From a model-building perspective, I understand that this is easy to incorporate and allows graded predictions (which are necessary to match the empirical data), but from a cognitive perspective it seems really weird to me. Do we have reason to believe children assume their speakers are unreliable? I was under the impression that children assume their speakers are reliable by default. Maybe there's a better place to work this uncertainty in -- approximate inference by a sub-optimal learner, or something like that. Also, as a side note, it seems really important to understand how the various concepts/features are weighted by the learner. Maybe that's where uncertainty could be worked in at the computational level.
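For readers who want to see why the speaker-error assumption buys graded predictions, here's a toy sketch (my own construction, not L&F's code; the lexicon and epsilon value are made up) of a noisy-speaker likelihood: with probability 1 - epsilon the speaker labels their intended referent correctly, and with probability epsilon they produce a label at random.

```python
def posterior_over_referents(label, lexicon, referents, epsilon=0.05):
    """P(referent | label) under a noisy-speaker likelihood: mostly
    reliable labeling, with a small epsilon chance of a random label."""
    n_labels = len(set(lexicon.values()))
    scores = {}
    for r in referents:
        correct = 1.0 if lexicon.get(r) == label else 0.0
        # Mixture of a reliable speaker and a random one.
        scores[r] = (1 - epsilon) * correct + epsilon * (1.0 / n_labels)
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()}

lexicon = {"apple": "dax", "banana": "fep"}
post = posterior_over_referents("dax", lexicon, ["apple", "banana"])
print(post)  # "banana" keeps a small amount of probability mass
```

With epsilon = 0, "banana" would get probability exactly 0 and every prediction would be all-or-none; the epsilon term is what smooths the posterior enough to match graded human judgments. My point in the text stands, though: this locates the uncertainty in the speaker rather than in the learner's inference process.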
On the origin/development of concepts: L&F mention in the General Discussion that "the features are themselves concepts that can be considered as primitives in the construction of more complex concepts", and then state that their model "describes how a learner might bootstrap from these primitives to infer more and more complex concepts". This sounds great, but I was unclear on how exactly to do that. Taking the f1, f2, and f3 from above, for example, I get that those are primitive features. So concepts are then things constructed out of some combination of their values (whether specified or unspecified)? And then where does the development come in? Where is the (presumably novel) combination that allows the construction of new features? I understand that these could be the building blocks for such a model, but I didn't see how the current model shows us anything about that.
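Here's a speculative sketch of what "construction" might mean if primitives combine by conjunction -- to be clear, this composition operator is my guess, not anything L&F actually propose, which is precisely the gap I'm pointing at:

```python
def conjoin(c1, c2):
    """Combine two partial feature assignments into a more specific
    concept; fail (return None) on conflicting feature values."""
    merged = dict(c1)
    for f, v in c2.items():
        if f in merged and merged[f] != v:
            return None  # incompatible primitives: no such concept
        merged[f] = v
    return merged

red = {"color": "red"}
fruit = {"category": "fruit"}
print(conjoin(red, fruit))  # a new, more specific "red fruit" concept
print(conjoin({"color": "red"}, {"color": "green"}))  # None
```

Even granting something like this, it only builds more specific combinations of existing primitives -- it still doesn't show where genuinely new features would come from, which is the developmental question the General Discussion gestures at.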
Behavioral experiment implementation: I'm definitely a fan of matching a model to controlled behavioral data, but I wonder about the specific kind of labeling they gave their subjects. It seems like they intended "dax bren nes" to be the label for one of the objects shown (it's just unclear which one -- basically, it might as well be the trisyllabic word "daxbrennes"). This is a bit different from standard cross-situational experiments, where multiple words are given for multiple objects. Given that subjects are tested with that same label, I guess the idea is that it simplifies the learning situation.
Results: I struggled a bit to decipher the results in Figure 5 -- I'm assuming the model predictions are for the different experimental contexts, ordered by human uncertainty about how much to generalize to the superordinate class. Is the lexicon posited by the model just a matter of how many concepts map to "dax bren nes", where concept = referent?
Dillon, B., Dunbar, E., & Idsardi, W. 2013. A single-stage approach to learning phonological categories: Insights from Inuktitut. Cognitive Science, 37, 344-377.
Feldman, N. H., Griffiths, T. L., Goldwater, S., & Morgan, J. L. 2013. A role for the developing lexicon in phonetic category acquisition. Psychological Review, 120(4), 751-778.
Xu, F., & Tenenbaum, J. 2007. Word learning as Bayesian inference. Psychological Review, 114(2), 245-272.