General thoughts: I really like this kind of question in acquisition modeling: what does the learner actually have access to early on? A lot of existing work assumes children operate over clean, adult-like syllables. That’s clearly idealized. So the more interesting question is this: what could plausibly function as syllables before phonology is in place—and would those units be good enough for downstream learning?
Räsänen et al. (2018) take a nice step in this direction. Instead of assuming syllables, they derive syllable-like “acoustic chunks” directly from the speech signal using sonority. These aren’t phonological syllables but perceptual units that fall out of general properties of the human auditory system. So now… let’s talk “syllables.”
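To make the idea concrete, here is a toy sketch of sonority-dip segmentation: track a short-time energy envelope (a crude stand-in for a sonority curve) and put chunk boundaries at its local minima. To be clear, this is my illustration of the general principle, not Räsänen et al.’s actual model; the synthetic signal, window size, and boundary rule are all invented for the demo.

```python
import math

def envelope(signal, win=80):
    """Short-time energy, frame by frame (a crude stand-in for sonority)."""
    return [sum(x * x for x in signal[i:i + win]) / win
            for i in range(0, len(signal) - win, win)]

def chunk_boundaries(env):
    """Hypothesized chunk boundaries: local minima of the envelope."""
    return [i for i in range(1, len(env) - 1)
            if env[i] < env[i - 1] and env[i] <= env[i + 1]]

# Toy "speech": three high-sonority pulses (vowel-like rises and falls)
# separated by silent gaps, standing in for a CV.CV.CV stretch.
sr = 8000
signal = []
for _ in range(3):
    for t in range(2000):
        amp = math.sin(math.pi * t / 2000)            # rise-fall amplitude
        signal.append(amp * math.sin(2 * math.pi * 200 * t / sr))
    signal.extend([0.0] * 400)                        # low-sonority gap

env = envelope(signal)
bounds = chunk_boundaries(env)
print(len(bounds))  # → 3, one dip detected per low-sonority gap
```

The point of the toy is just that boundaries emerge from the signal itself, with no phonological knowledge anywhere in the loop.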
What counts as a useful “syllable”?
The core result is that these acoustic chunks align reasonably well with annotated syllable boundaries across languages. That’s encouraging: it suggests learners could extract something syllable-like without prior linguistic knowledge.
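When I say “align reasonably well,” the standard way to quantify that is boundary precision/recall against the annotations, counting a hypothesized boundary as a hit if it lands within some tolerance of an unused reference boundary. A minimal sketch, where the 20 ms tolerance, the greedy matching, and the made-up boundary times are my assumptions, not the paper’s evaluation protocol:

```python
def boundary_f1(hyp, ref, tol=0.02):
    """Boundary F1: greedily match hypothesized to reference boundary
    times (in seconds); a hit is within `tol` of an unmatched reference."""
    matched, hits = set(), 0
    for h in hyp:
        for j, rb in enumerate(ref):
            if j not in matched and abs(h - rb) <= tol:
                matched.add(j)
                hits += 1
                break
    p = hits / len(hyp) if hyp else 0.0
    r = hits / len(ref) if ref else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

ref = [0.10, 0.32, 0.55, 0.80]   # annotated syllable boundaries (made up)
hyp = [0.11, 0.31, 0.61, 0.79]   # acoustically derived boundaries (made up)
score = boundary_f1(hyp, ref)
print(round(score, 2))  # three of four boundaries matched on each side
```

A metric like this tells you how syllable-like the chunks are; it doesn’t yet tell you whether they are good enough to learn from, which is the question below.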
But… do we actually need a match to adult-like syllables at this stage of acquisition? The goal is to use syllables as input to other processes (like word segmentation). So, to me, the relevant question becomes: are these units good enough for the tasks syllables are supposed to support?
I would love to see a downstream test in future work. For example: take these acoustically derived units and feed them into a word segmentation model. Do we still get reasonable performance? Does it degrade relative to idealized syllables? Or is it surprisingly robust—maybe especially for infant-directed speech like that in the Brent corpus?
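Here is what I mean by a downstream test, in miniature: run a stream of unit labels through a toy Saffran-style transitional-probability segmenter and see whether word-like chunks come out. The unit labels and the TP-minimum rule are stand-ins of my own, not anything from the paper:

```python
from collections import Counter

def tp_segment(units):
    """Cut the unit stream at local minima of forward transitional
    probability P(next | current): a toy statistical word segmenter."""
    pairs = Counter(zip(units, units[1:]))
    firsts = Counter(units[:-1])
    tp = [pairs[a, b] / firsts[a] for a, b in zip(units, units[1:])]
    cuts = [i + 1 for i in range(1, len(tp) - 1)
            if tp[i] < tp[i - 1] and tp[i] < tp[i + 1]]
    words, prev = [], 0
    for c in cuts + [len(units)]:
        words.append("".join(units[prev:c]))
        prev = c
    return words

# Pretend each letter is one acoustically derived unit; the "lexicon"
# is two words, badi and gu, concatenated without pauses.
stream = list("badi" "gu" "badi" "badi" "gu")
segments = tp_segment(stream)
print(segments)
```

On this tiny stream the segmenter recovers three of the four word boundaries (the gu-to-badi transition happens to be deterministic here, so no TP dip marks it). That is exactly the kind of behavior a real downstream test would quantify: how much does segmentation accuracy suffer when the input units are noisy acoustic approximations rather than clean syllables?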
For me, that’s the next step: not just approximating syllables, but testing whether those approximations are functionally adequate. This paper lays excellent groundwork for asking that question concretely.