Tuesday, September 14, 2010

Jones, Johnson, & Frank (2010): Some thoughts

I'm definitely in favor of using information from multiple sources for acquisition, so I was intrigued when I saw that word-meaning mapping information was being used to constrain word segmentation. I found the model description very comprehensible. :) A couple of things:


  • p.504, "...most other words don't refer to the topic object...corresponding to a much larger α0 value" - While this was fine at first glance, I started thinking about the nature of the child utterances. Take the example in figure 1: "Is that the pig?" The "Is that the" part would be classified as non-referential by this model, but I could see these being commonly re-used words (and indeed, a commonly reused frame). The same goes for function words in general, like "the" and "is". I wonder what would happen if they allowed α0 to be smaller, so that the non-referential words get more reuse (the first sketch after this list shows how α0 trades reuse against new lexical items). Part of the reason this integrated model seems to do better is that it has pressure at both the segmentation level and the word-meaning mapping level to make fewer lexicon items. Wouldn't forcing more reuse in the non-referential words make that better?

  • On a related note, it seems like the point of using the word-meaning mapping info (and having pressure there to make fewer items) is to correct the undersegmentation that otherwise occurs (see the "book" example on p.508). So maybe if there's too much pressure to make fewer lexical items (say, from forcing more reuse in non-referential words), you get a lot of undersegmentation? I'm not sure that would follow for the "book" example they give, though. Let's suppose you have the following segmentation choices:

    • abook, yourbook, thebook
    • a book, your book, the book


    If you have more pressure to reuse non-referential words, then wouldn't you be even more likely to prefer the second segmentation over the first? (The second sketch after this list plays with exactly this comparison.)

    Also, we know of other ways to fix undersegmentation, e.g., using a bigram ("context") model for segmentation instead of a unigram model. If the model used the bigram assumption, would the word-meaning mapping information still improve segmentation?

  • A smaller nitpick question: I'm not quite sure I understand how accuracy in Table 2 (p.506) can be better for the Joint model when all the other measures are either equal to or worse than for the Gold Seg model. Am I missing something in the discussion of what accuracy is (or maybe what the other measures are getting at)?
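
Since the first two points hinge on what α0 actually does, here's a minimal sketch of the standard Dirichlet-process predictive probability used in Goldwater-style unigram segmentation models, just to make the reuse-versus-new-word tradeoff concrete. The counts and the base probability are made up for illustration; this is not the authors' code or their exact model.

```python
# Toy sketch: DP predictive probability for the next word token.
# P(w | previous tokens) = (count(w) + alpha0 * P0(w)) / (total + alpha0)
# Small alpha0 -> strong reward for reusing already-seen words;
# large alpha0 -> the base distribution P0 dominates, so reuse matters less.

def dp_word_prob(word, counts, total, alpha0, base_prob):
    """Predictive probability of `word` under a DP unigram model."""
    return (counts.get(word, 0) + alpha0 * base_prob) / (total + alpha0)

# Hypothetical counts for a few non-referential words.
counts = {"the": 50, "is": 40, "that": 30}
total = sum(counts.values())

for alpha0 in (1.0, 100.0, 10000.0):
    p = dp_word_prob("the", counts, total, alpha0, base_prob=1e-4)
    print(f"alpha0={alpha0:>8}: P('the') = {p:.6f}")
```

With a very large α0 for the non-referential distribution, the model gets comparatively little reward for reusing "the" or "is", which is what made me wonder about frame reuse above.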

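And here's a second toy sketch for the abook/yourbook/thebook question: it scores the two candidate segmentations with the same CRP-style predictive rule, using a geometric-length, uniform-character base distribution. The base distribution and the α0 values are my own assumptions (roughly in the spirit of Goldwater-style models), not the paper's actual P0 or settings. In this toy, at least, shrinking α0 makes the fully segmented version win by a larger margin, which is why I suspect extra reuse pressure wouldn't cause undersegmentation here.

```python
import math

def base_prob(word, p_stop=0.5, n_chars=26):
    """P0(word): uniform characters with a geometric word-length prior."""
    n = len(word)
    return p_stop * (1.0 - p_stop) ** (n - 1) * (1.0 / n_chars) ** n

def seq_log_prob(tokens, alpha0):
    """Log probability of a token sequence under the sequential CRP/DP rule."""
    counts, total, logp = {}, 0, 0.0
    for w in tokens:
        p = (counts.get(w, 0) + alpha0 * base_prob(w)) / (total + alpha0)
        logp += math.log(p)
        counts[w] = counts.get(w, 0) + 1
        total += 1
    return logp

undersegmented = "abook yourbook thebook".split()
segmented = "a book your book the book".split()

# A positive difference means the fully segmented version is preferred.
for alpha0 in (1.0, 20.0, 500.0):
    diff = seq_log_prob(segmented, alpha0) - seq_log_prob(undersegmented, alpha0)
    print(f"alpha0={alpha0:>6}: log-prob advantage of segmenting = {diff:.1f}")
```

Of course this ignores utterance boundaries and the referential component entirely, so it's only a sanity check on the reuse intuition, not a prediction about the full model.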