I love seeing work that evaluates an idea against naturalistic data. It’s often the exciting next “proof of concept” once you’ve got an implemented theory that works on idealized data or controlled experimental data.
Some other thoughts:
(1) I completely sympathize with the idea that anything from the broader context might be relevant for discriminating contrastive dimensions. I think the question then becomes how infants decide which contextual factors to attend to, out of all the possible ones. Are certain factors simply more salient, period, or are they salient because the infant brain has certain perceptual biases, etc.? What’s the hypothesis space of possible contextual features, and how might an infant navigate that hypothesis space?
(2) Thinking about noise: I wonder how much noise this kind of approach can tolerate. For instance (and this is a point H&F2022 bring up in their discussion), if infants have a fuzzier notion of distributional similarity than Earth Mover’s distance/KL divergence/whatever because their learning abilities are still developing, can they still catch onto these distributional differences?
H&F2022 also implement some versions of fuzzier (mis)perception of the input, and these show that the approach can tolerate at least 20% noise in perception. So maybe someone could implement the fuzzier distributional similarity idea in a similar way.
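To make the noise question a bit more concrete, here's a toy sketch. It is not H&F2022's implementation; I'm not assuming anything about their actual noise model or distance code. It assumes tokens of one sound are binned along a single made-up acoustic dimension, compares two context-conditioned distributions with 1-D Earth Mover's distance and KL divergence, injects a hypothetical 20% "heard as the neighboring bin" misperception, and tries one hypothetical way of making similarity fuzzier: merging adjacent bins before comparing. All function names (`misperceive`, `coarsen`, etc.) and the data are invented for illustration.

```python
import math
import random


def histogram(tokens, n_bins):
    """Normalize a list of bin indices (0..n_bins-1) into a probability distribution."""
    counts = [0] * n_bins
    for t in tokens:
        counts[t] += 1
    return [c / len(tokens) for c in counts]


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q). The eps smoothing keeps bins that are empty in q from
    producing an infinite value (KL is undefined there without it)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def emd_1d(p, q):
    """Earth Mover's distance between two histograms on the same 1-D grid,
    with ground distance 1 between adjacent bins: the sum of |CDF_p - CDF_q|."""
    dist, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)
    return dist


def misperceive(tokens, rate, n_bins, rng):
    """Hypothetical noise model: with probability `rate`, a token is perceived
    as an adjacent bin (one step left or right, clamped to the grid)."""
    noisy = []
    for t in tokens:
        if rng.random() < rate:
            t = min(n_bins - 1, max(0, t + rng.choice((-1, 1))))
        noisy.append(t)
    return noisy


def coarsen(p, factor):
    """Hypothetical 'fuzzier' similarity: merge every `factor` adjacent bins
    before comparing, so small distributional differences wash out.
    Distances computed afterwards are in coarse-bin units."""
    return [sum(p[i:i + factor]) for i in range(0, len(p), factor)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_bins = 10
    # Made-up tokens of one sound in two hypothetical contexts A and B;
    # context B is shifted one bin higher on the acoustic dimension.
    context_a = [rng.choice(range(2, 6)) for _ in range(500)]
    context_b = [rng.choice(range(3, 7)) for _ in range(500)]
    for rate in (0.0, 0.2):
        pa = histogram(misperceive(context_a, rate, n_bins, rng), n_bins)
        pb = histogram(misperceive(context_b, rate, n_bins, rng), n_bins)
        print(
            f"noise={rate:.0%}  EMD={emd_1d(pa, pb):.3f}  "
            f"KL={kl_divergence(pa, pb):.3f}  "
            f"coarse EMD={emd_1d(coarsen(pa, 2), coarsen(pb, 2)):.3f}"
        )
```

One design point worth flagging: KL divergence is extremely sensitive to a bin that has tokens in one context but none in the other (hence the `eps` smoothing), whereas EMD stays bounded and degrades gracefully as the distributions drift. That difference might matter if infants' similarity judgments really are coarse.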