I love how this work uses state-of-the-art “AI” models (i.e., neural networks with distributed representations) to concretely explore ideas about what must be built in to explain infant phonetic category development. The authors are quite clear that this is a computational-level model that tries to capture “evolutionary” vs. “developmental” timelines, with the built-in knowledge established during the evolutionary stage, implemented here as pretraining. To me, this is a beautiful application of the neural network framework that yields easily interpretable results.
Other thoughts:
(1) Pretraining types: Why ambient in Experiment 1 and multilingual in Experiment 2, instead of language-specific sounds? I get that ambient is a reasonable baseline (that did shockingly well), but I’m curious as to why multilingual sounds would be a better initial-state-inducer than American English sounds/Japanese sounds, respectively. Maybe it’s about capturing the “evolutionary aspect” that the sound processing mechanisms are tuned to human speech in general?
Related: Future work might try to pretrain with the kind of sounds that can be heard in utero (I know the low-frequency parts of the acoustic signal, like prosody, make it through, while higher frequencies are largely attenuated). The language-specific versions of that seem like they’d be a great way to investigate prenatal development.
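To make the suggestion concrete, here’s a minimal sketch (my own, not anything from the paper) of how one might construct such a pretraining corpus: low-pass filter speech recordings to approximate the intrauterine signal. The ~400 Hz cutoff and the simple Butterworth filter are simplifying assumptions; the womb’s actual transfer function is more complex.

```python
# Rough simulation of in-utero audio: the womb acts approximately as a
# low-pass filter, so prosody survives while fine spectral detail is lost.
# The 400 Hz cutoff is an assumption loosely based on the fetal-hearing
# literature, not a value taken from the paper under review.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def simulate_in_utero(path_in: str, path_out: str, cutoff_hz: float = 400.0) -> None:
    """Low-pass filter a speech recording to approximate what a fetus hears."""
    rate, audio = wavfile.read(path_in)
    audio = audio.astype(np.float64)
    # 4th-order Butterworth low-pass, applied forward and backward
    # (sosfiltfilt) so the filtering introduces no phase distortion.
    sos = butter(4, cutoff_hz, btype="low", fs=rate, output="sos")
    filtered = sosfiltfilt(sos, audio, axis=0)
    # Clip back into 16-bit range before writing out.
    wavfile.write(path_out, rate, np.clip(filtered, -32768, 32767).astype(np.int16))

# e.g., simulate_in_utero("english_speech.wav", "english_in_utero.wav")
```

One could then run the paper’s pretraining pipeline on the filtered corpus, with language-specific versions (filtered American English vs. filtered Japanese) serving as the prenatal-development comparison.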
(2) “Panel a) of Figure 5 suggests that pretraining in multilingual is sensibly similar to pretraining on ambient speech sounds in terms of initial speech sound discrimination”.
I don’t understand what’s sensible about this. I might expect that pretraining on speech sounds would yield even better initial performance than pretraining on ambient sounds. The authors themselves note this: “Contrary to our initial hypothesis, training on multilingual speech does not yield higher speech sound discrimination capabilities compared to pretraining on ambient sounds.” So maybe the “sensibly” is a typo?
(3) It seems like no version of this model actually shows true loss of discriminability. Instead, the best case (Experiment 3 in the Appendix) shows a slight reduction in discriminability, though performance is still pretty darned good, at >.90 accuracy. So, I’m not sure what to make of the actual results, even though I really appreciate the way the authors set up the acquisition problem space. Maybe the issue is that this is effectively an ideal learner? If so, maybe adding in cognitive constraints would cause the “loss trajectory” to actually become loss of discrimination (accuracy dropping to the 0.50 chance level)?
I think that in this article, “sensibly” is a gallicism for the French adverb “sensiblement”, which means “roughly, approximately” (not “in a way that makes sense”).