Computational Models of Language (at UC Irvine): Some Thoughts on O'Donnell et al. (2011)

I like that this paper is interested in big ideas of knowledge representation (basically, how big are the chunks that we store), and provides what seems like a sensible formalization of the idea that medium-size reusable chunks are probably the way to go. Within the same framework, they also provide formalizations of other ideas for the unit of representation (basically, use the smallest units (full-parsing/generative), use the largest units (full-listing), and use all the units (exemplar)), which is nice for easy comparison purposes. While the intuition that medium-size reusable chunks are best is perhaps unsurprising, I think this gives us a clear quantitative argument for that idea. I do wish we had been given some sense of what exactly these medium-size chunks look like for the two different morphology problems though - at first I thought this was due to space limitation, but the tech report (O'Donnell et al. 2009) version doesn't really show us what these look like either. I wonder how well they match (or don't match) current morphological theories of representation. I know the full-parsing theory is a strong viewpoint for syntax currently, but I don't know how many linguists believe that's really a viable option for morphology. On the flip side, the exemplar-based idea seems like it would make more sense for morphology (where we have a fairly small number of possible combinations), while it seems like that would be a harder sell for syntax (where there can be quite a lot of different parses, especially for longer sentences). Similarly, the full-listing approach seems intractable for syntax. Of course, this only really matters if we think Fragment Grammars apply at multiple levels of linguistic representation (e.g., morphology and syntax). I'm assuming this is what the authors intend, though.

Some more more targeted thoughts:

- Exemplar-based Inference: I can't imagine a world where this would win out, compared to Fragment Grammars (FragGs). At best, it has the same coverage as FragGs, but it has to store a heck of a lot more. Perhaps this is included for completeness in model comparison, particularly since the DOP framework assumes this?

- I thought it was very good to mention other models that have similar properties to FragGs. However, given the descriptions provided, I really wondered how Parsimonious Data-Oriented Parsing differs from FragGs ("...explicitly eschews the all-subtree approach in favor of finding a set of subtrees which best explains the data.") Maybe in the way inference is done?

- In terms of comparing this to our reading from last time (Yang 2010), I wonder what's actually being explained by the inference process behind FragGs. Is this a way to assess which representation is likely to be correct for adult usage? If so, this makes it similar to Yang (2010), as that was an assessment of productivity in child speech. Or is this instead a proposal for how adults actually come to have these medium-size chunks, and so it would be a computational level explanation of the actual process of chunk formation?

- A minor note on the past tense representation: I found it interesting that the rule for past tense formation was explicitly encoded in the "morphological representation". This makes this representation seem much more similar to work by Yang on morphological productivity in the English past tense (e.g., Yang 2005), which talks about predictability of child behavior based on the rules used to form the past tense.

- The derivational morphology section: I admit, I got a bit lost on some of the details here.

How do we take 10,000 "forms" as data, and have that yield 25,000 types and 7.2 million tokens? What are these forms?
I like the P and P* measures, since those seem to correlate somewhat with the idea of precision and recall (P ~= how generalizable is this suffix, P* ~= how many novel words use this suffix). But then, why are we looking for a correlation between them instead of using an F-score? What does it mean in Table 1 to have a correlation for P, for example? Is that P vs. P*? Or P vs something else?
Table 2 left me similarly puzzled - I couldn't decipher this: "...the marginal probability that each suffix occurred first or second in such forms...Table 2 gives the Spearman rank correlation between the (log) ratio of the probability of appearing second to the probability of appearing first with the mean rank statistic..." So if we take a word with two suffixes, s1 and s2, what exactly is being computed? Is it log(prob(s1 in first position & s2 in second position)/prob(s2 in second position & s1 in first position))? And then that's being correlated with the empirical relative ranking of these two suffixes? So we want that probability ratio to be greater than 1, which gives a positive value when you take the log. And then we're trying to correlate that positive number with the mean rank of the two suffixes? Why should this be correlated?

- In the conclusion, the authors talk about how the difference between FragGs and other models is that FragGs care about predictive ability - future novelty vs. future reuse. But I'm not sure I understand how that differs from the computation vs. storage tradeoff (which they advocate replacing with future novelty vs. future reuse) - isn't future novelty based on computation while future reuse is based on storage? If so, this seems like they're restating the tradeoff, but with an emphasis on future usage (i.e., "we care about computation vs. storage because we care about the ability to use language efficiently in the future").

~~~

References

O'Donnell, T., Goodman, N., & Tenenbaum, J. (2009). Fragment Grammars: Exploring Computation and Reuse in Language. Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2009-013.

Yang, C. (2005). On Productivity. Yearbook of Language Variation, 5, 265-302.

Yang, C. (2010 Ms.) Who's Afraid of George Kingsley Zipf? Unpublished Manuscript, Universty of Pennsylvania.

Computational Models of Language (at UC Irvine)

Friday, March 9, 2012

Some Thoughts on O'Donnell et al. (2011)

No comments:

Post a Comment

People who think this blog is awesome

Members