Wednesday, October 19, 2022

Some thoughts on Hitczenko & Feldman 2022

I love seeing work that evaluates an idea against naturalistic data. It’s often the exciting next “proof of concept” once you’ve got an implemented theory that works on idealized data or controlled experimental data.


Some other thoughts:

(1) I completely sympathize with the idea that anything from the broader context might be relevant for discriminating contrastive dimensions. I think the question then becomes how infants decide which contextual factors to pay attention to, out of all the possible ones. Are certain factors simply more salient, or salient because the infant brain has certain perceptual biases, etc.? What’s the hypothesis space of possible contextual features, and how might an infant navigate through it?


(2) Thinking about noise: I wonder how much noise this kind of approach can tolerate. For instance (and this is a point H&F2022 bring up in the discussion), if infants have a fuzzier notion of distributional similarity than Earthmover’s distance/KL divergence/whatever because of their developing learning abilities, can they still catch onto these distributional differences?


H&F2022 also implement some ideas for fuzzier (mis)perception of the input, and those results show this approach can tolerate at least 20% noise in perception. So maybe someone could implement the fuzzier distributional similarity idea in a similar way.
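To make that concrete, here’s a minimal sketch in Python of the kind of comparison I have in mind (definitely not H&F2022’s actual implementation): two made-up duration distributions from different contexts, compared with Earthmover’s distance and a histogram-based KL divergence, with and without a 20% misperception rate. The distribution parameters and the misperception model (swapping a token for a draw from a broad distribution) are purely illustrative assumptions.

```python
# Toy sketch: does a distributional difference between two contexts survive
# a 20% misperception rate? All numbers here are illustrative assumptions.
import numpy as np
from scipy.stats import entropy, wasserstein_distance

rng = np.random.default_rng(0)

def sample_context(mean, sd, n=2000, misperception_rate=0.0):
    """Draw n tokens from a Gaussian; with probability misperception_rate,
    replace a token with a draw from a broad 'misperceived' distribution."""
    tokens = rng.normal(mean, sd, n)
    mask = rng.random(n) < misperception_rate
    tokens[mask] = rng.normal(100, 50, mask.sum())
    return tokens

def kl_from_samples(a, b, bins=50):
    """Smoothed, histogram-based KL divergence between two samples."""
    lo, hi = min(a.min(), b.min()), max(a.max(), b.max())
    pa, _ = np.histogram(a, bins=bins, range=(lo, hi), density=True)
    pb, _ = np.histogram(b, bins=bins, range=(lo, hi), density=True)
    return entropy(pa + 1e-9, pb + 1e-9)

for rate in (0.0, 0.2):
    short_ctx = sample_context(80, 15, misperception_rate=rate)
    long_ctx = sample_context(120, 15, misperception_rate=rate)
    print(f"misperception={rate:.0%}  "
          f"EMD={wasserstein_distance(short_ctx, long_ctx):.2f}  "
          f"KL={kl_from_samples(short_ctx, long_ctx):.2f}")
```

A fuzzier learner could then be modeled by swapping in a coarser comparison than these metrics, which is the part I’d be curious to see implemented.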


Tuesday, October 4, 2022

Some thoughts on Cao et al. 2022

I really like seeing modeling work like this, where a more complex, ideal computation (here, expected information gain, or EIG) can be well-approximated by simpler, more heuristic computations (here, surprisal and KL divergence) when it comes to capturing developmental behavior. Of course, this paper presents a first-pass evaluation on adult behavior, but as the authors note, future work can extend their evaluation to infant looking behavior. I’d definitely like to see how well this approach works for infant data, since I’d be surprised if there weren’t some immaturity (i.e., resource constraints, other biases) at work in the computation itself for infants, compared with adult decision-making. And then the interesting question is how to capture that immaturity. For instance, do the approximations of the computation work even better than the idealized EIG computation? Would even simpler heuristics that don’t approximate EIG as well, but are backward-looking rather than forward-looking, be better?
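To make those three quantities concrete, here’s a toy Beta-Bernoulli sketch (my own illustration, not Cao et al.’s RANCH model): after each observation, it computes the surprisal of that observation, the KL divergence between the updated and previous beliefs, and the expected information gain of one more observation. The Beta prior and the made-up observation sequence are assumptions just for illustration.

```python
# Toy Beta-Bernoulli observer: surprisal, KL belief change, and EIG per trial.
import numpy as np
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a0, b0):
    """KL( Beta(a1, b1) || Beta(a0, b0) ), in nats."""
    return (betaln(a0, b0) - betaln(a1, b1)
            + (a1 - a0) * digamma(a1)
            + (b1 - b0) * digamma(b1)
            + (a0 - a1 + b0 - b1) * digamma(a1 + b1))

a, b = 1.0, 1.0                     # Beta(1,1) prior over a Bernoulli parameter
observations = [1, 1, 0, 1, 1, 1]   # made-up binary observation sequence

for x in observations:
    p1 = a / (a + b)                              # posterior predictive P(x = 1)
    surprisal = -np.log(p1 if x == 1 else 1 - p1)
    a_new, b_new = a + x, b + (1 - x)
    kl_gain = kl_beta(a_new, b_new, a, b)         # belief change from this observation
    # expected information gain of taking one MORE sample, under the new beliefs
    p1_next = a_new / (a_new + b_new)
    eig = (p1_next * kl_beta(a_new + 1, b_new, a_new, b_new)
           + (1 - p1_next) * kl_beta(a_new, b_new + 1, a_new, b_new))
    print(f"x={x}  surprisal={surprisal:.3f}  KL={kl_gain:.3f}  EIG={eig:.3f}")
    a, b = a_new, b_new
```

Even in a toy setup like this, you can eyeball how the backward-looking quantities (surprisal, KL) track the forward-looking EIG, which is the flavor of approximation the paper is after.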


Other specific thoughts:


(1) Noisy perception: It’s really nice to see this worked into a developmental model, since, especially for infants, imperfect representations of stimuli seem like a plausible situation. That is, the “perceptual intake” into the learning system depends on immature knowledge and abilities, and is therefore different from the input signal that’s out there in the world. (To be fair, the perceptual intake for adults is also different from the input signal out there in the world, and adults don’t have immature knowledge and abilities. So children basically have to learn to be adult-like in how they “skew” the input signal.)
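As a toy illustration of the input-vs-intake distinction (my own sketch, with an entirely made-up noise schedule rather than anything from the paper), you could imagine perceptual noise that shrinks as the perceiver matures:

```python
# Toy input-vs-intake sketch: same input signal, different perceptual intake
# depending on an assumed maturity-dependent noise level.
import numpy as np

rng = np.random.default_rng(2)

def perceptual_intake(input_signal, maturity):
    """Encode the input with Gaussian noise whose scale shrinks as maturity
    goes from 0 (infant-like) to 1 (adult-like); the schedule is assumed."""
    noise_sd = 2.0 * (1.0 - maturity) + 0.2
    return input_signal + rng.normal(0.0, noise_sd, size=np.shape(input_signal))

signal = np.linspace(0, 10, 5)   # some input signal out in the world
print("infant-like intake:", perceptual_intake(signal, maturity=0.1))
print("adult-like intake: ", perceptual_intake(signal, maturity=0.9))
```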


(2) The RANCH model involves accumulating noisy samples and choosing what to do at each moment. This sounds like the diffusion model of decision-making from mathematical psych to me. I wonder if RANCH is an implementation of that (and if not, how they differ)?
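For reference, here’s roughly what I mean by that kind of accumulate-then-decide process: a tiny sequential-sampling sketch in the diffusion-model spirit. This is not RANCH, and the drift, noise, and bound values are arbitrary assumptions for illustration.

```python
# Toy sequential-sampling sketch: draw a noisy evidence sample each step,
# accumulate, and stop (e.g., look away) once a bound is crossed.
import numpy as np

rng = np.random.default_rng(1)

def accumulate_to_bound(drift=0.1, noise_sd=1.0, bound=5.0, max_steps=1000):
    """Return the number of noisy samples taken before |evidence| >= bound."""
    evidence = 0.0
    for t in range(1, max_steps + 1):
        evidence += rng.normal(drift, noise_sd)  # one noisy sample per step
        if abs(evidence) >= bound:
            return t
    return max_steps

looking_times = [accumulate_to_bound() for _ in range(500)]
print("mean steps before stopping:", np.mean(looking_times))
```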


(3) What the learner needs to know: A key idea here is that the motivation to sample the input at all is that the learner knows perception is noisy. To me, this is pretty reasonable knowledge to build into a modeled child. It reminds me of Perkins et al. 2022, where the learner knows misperception occurs and so has to learn to filter out erroneous data. Importantly, there the modeled learner doesn’t have to know the specifics beyond that.


Perkins, L., Feldman, N. H., & Lidz, J. (2022). The Power of Ignoring: Filtering Input for Argument Structure Acquisition. Cognitive Science, 46(1), e13080.