So I admit I found this paper a bit tougher going than the previous one, most likely because of how much information the authors had to fit into a limited space. Once I wrapped my head around what it means for something to be "adapted", though, things started to make more sense.
Some more targeted thoughts:
(1) Given our discussion last time about the syllable as a likely basic unit of representation (given neurological evidence), we had talked about implementing learning models that take the syllable as the basic unit. How similar is what Johnson & Goldwater have done here with their collocation-syllable adaptor grammar to this idea? Clearly, the syllable is one unit of representation that matters in this model, but they also go below the syllable level to include properties of syllables that correspond (roughly) to phonotactic constraints on syllable-hood. Does this mean a learner would have to be able to analyze individual phonemes in order to use this model? If so, what happens if we get rid of any representation below the syllable-level? Is there any place for phonotactic constraints then?
(2) I'd like to look closer at Table 1 to try to understand what actually benefits the learner. Because there are so many conditions, it's a bit hard to pick apart the impact of any one of them. For example, J&G argue that table label resampling improves performance for the models with rich hierarchical structure (like the collocation-syllable model), and point to figure 1 to show this. But looking at the 3rd and 4th entries from the bottom of Table 1, it seems like performance worsens with table label resampling.
(3) The idea of maximum marginal decoding is interesting to me, because it reminds me of the difference between "weakly equivalent" grammars and "strongly equivalent" grammars. Weakly equivalent = the grammars generate the same strings, even if their internal structures differ; strongly equivalent = the same strings and the same structures. It seems here that aggregating over "weakly equivalent" word segmentations leads to better performance.
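Just to make the aggregation idea concrete for myself, here's a toy sketch (not J&G's actual implementation - the sample data, threshold, and function names are all my own assumptions): given several sampled segmentations of one utterance, keep only the word boundaries that a majority of samples agree on. The individual samples disagree on internal structure, but the aggregated answer can still come out better.

```python
from collections import Counter

def marginal_boundaries(samples, threshold=0.5):
    """Given sampled segmentations of one utterance (each a list of words),
    keep the boundary positions that appear in more than `threshold` of the
    samples. A toy stand-in for maximum marginal decoding - the 0.5
    majority threshold is an arbitrary assumption on my part."""
    n = len(samples)
    counts = Counter()
    for words in samples:
        pos = 0
        for w in words[:-1]:          # a boundary falls after every word but the last
            pos += len(w)
            counts[pos] += 1
    return sorted(p for p, c in counts.items() if c / n > threshold)

def apply_boundaries(utterance, boundaries):
    """Cut the unsegmented string at the chosen boundary positions."""
    cuts = [0] + boundaries + [len(utterance)]
    return [utterance[a:b] for a, b in zip(cuts, cuts[1:])]

samples = [["the", "dog", "gy"], ["the", "doggy"], ["the", "doggy"]]
bs = marginal_boundaries(samples)        # only the boundary after "the" survives
print(apply_boundaries("thedoggy", bs))  # ['the', 'doggy']
```

The point of the toy: the first sample's internal analysis ("dog" + "gy") gets outvoted, even though no single sample was privileged.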
Thursday, August 19, 2010
The first meeting of the CoLa Reading group was quite fun - thanks to all who could make it! Next time (Sept 2), we'll be looking at the word segmentation paper of Johnson & Goldwater (2009), which can be downloaded from the CoLa Reading Group website schedule page.
Tuesday, August 17, 2010
As many of you know, I'm very sympathetic to this style of modeling where there's an attempt to use learning algorithms that seem like they might be cognitively plausible. So, short version of my thoughts on this: Yay, algorithmic-level modeling (specifically, the algorithmic level of representation of Marr (1982)) that gets some very promising looking results.
More specific things that occurred to me as I was reading:
- p.2-3: The authors mention how they're not going to be tackling the segmentation of auditory linguistic stimuli (not unreasonable), but that "any word segmentation model could easily be plugged into a system that recognizes phonemes from speech". It's not so clear to me that the phoneme level of representation is right for modeling initial word segmentation, though it's a reasonable first step. Specifically, given what we know of the time course of acquisition, it seems like native language phoneme identification isn't fully online till about 10-12 months - but initial word segmentation is likely happening around 6 months. Given this, it seems more likely that infants may be working with a representation that's more abstract than the raw auditory signal, but less settled than the adult phonemic representation. For example, perhaps allophones might be perceived as separate sounds by the infant at this point in development. Anyway, this isn't a critique of this model in particular - most word seg models I've seen work with phonemes - but it'd be very interesting to see how any of the prominent word seg models would perform on input that's messier than the phonemic representation commonly used.
- p.9, p.14, p.18: The authors emphasize that their target unit of extraction is the phonological word (and their exposition of different definitions of "word" was quite nice, I thought). Unfortunately, they have the problem of only having orthographic word corpora available. I wonder how hard it would be to convert the existing corpus into a phonological word corpus - they say it's a hard and time-consuming process, but perhaps there are some rewrite rules that could do a reasonable approximation? Or maybe it would be useful to note how many "mis-segmentations" of any model are actually viable phonological word segmentations.
- Looking at figure 1 on p.10, and the exposition about the model: I wonder how the model actually chooses the most probable segmentation from all possible segmentations of an utterance. Initially, this is probably very easy because there's nothing in the lexicon. But once the lexicon is populated, it seems like there could be a lot of possibilities to choose from. Maybe some kind of heuristic choice? This part of learning is what the dynamic programming algorithms do in the Bayesian models of Pearl, Goldwater, & Steyvers (2010).
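For what it's worth, the standard dynamic programming trick makes this search tractable even with a big lexicon. Here's a generic Viterbi-style sketch (my own illustration, not the exact algorithm from any of these papers - the lexicon probabilities are made up):

```python
def best_segmentation(utterance, lexicon):
    """Find the highest-probability segmentation of `utterance` under a
    unigram model, where `lexicon` maps word -> probability.
    best[i] holds the probability of the best segmentation of utterance[:i];
    back[i] holds where that segmentation's last word starts."""
    n = len(utterance)
    best = [0.0] * (n + 1)
    best[0] = 1.0
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            w = utterance[j:i]
            if w in lexicon and best[j] * lexicon[w] > best[i]:
                best[i] = best[j] * lexicon[w]
                back[i] = j
    # recover the words by following the backpointers from the end
    words, i = [], n
    while i > 0:
        words.append(utterance[back[i]:i])
        i = back[i]
    return list(reversed(words))

lexicon = {"the": 0.3, "dog": 0.2, "doggy": 0.15, "gy": 0.01}
print(best_segmentation("thedoggy", lexicon))  # ['the', 'doggy']
```

So even though the number of possible segmentations grows exponentially in utterance length, the table only has one cell per position - no heuristic choice needed for finding the single best analysis.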
- p.11 - the second phonotactic constraint: It's probably worth noting that requiring all words to have a syllabic sound means the learner must know beforehand (or somehow be able to derive) what a syllabic sound is. This seems like domain-specific knowledge (i.e., "all sounds with these properties are syllabic", etc.) - is there any way it wouldn't be? Supposing this is definitely domain-specific (though language universal) knowledge, how plausible is it that humans have innate knowledge of the necessary properties for syllable-hood? I know there's some evidence that the syllable is a basic unit of infant perception, so this could be very reasonable after all.
- p.17 - testing the "require syllabic" constraint on its own: The authors explain that the reason a learner with only this constraint fails is because longer words receive the same probability as shorter words. Maybe a slightly more informed version of this learner could assign each phoneme a small constant probability (rather than all unfamiliar words getting the same probability) - it seems like this would allow word length effects to emerge and could lead to better segmentation. Maybe this learner would prefer CV or V words (due to their length + still being syllabic) - which would lead to major oversegmentation. Still, I wonder how bad it would be, since so many English child-directed speech words are monosyllabic anyway.
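The length effect I have in mind is trivial to see with numbers. A toy version (the per-phoneme probability 0.05 is an arbitrary assumption of mine, roughly 1/inventory-size):

```python
def word_prob(word, p=0.05):
    """Toy novel-word model: each phoneme contributes a constant factor p,
    so P(word) = p ** len(word), and shorter words come out more probable.
    p = 0.05 is an arbitrary choice, roughly one over a phoneme inventory."""
    return p ** len(word)

print(word_prob("a"))      # 0.05
print(word_prob("doggy"))  # 0.05 ** 5 = 3.125e-07
```

Each extra phoneme costs another factor of p, so a one-phoneme V word beats a five-phoneme word by several orders of magnitude - which is exactly why I'd expect this learner to chop utterances into lots of tiny syllabic pieces.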
Wednesday, August 4, 2010
So, it looks like one of the best times to meet this summer would be Thursdays at noon. We'll have our first meeting Aug 19th, in SBSG 2200, starting with Blanchard et al. (2010). Hope to see you there! (And even if not, to see your comments here!)