Friday, April 22, 2011

Thoughts on Heinz (2009)

One of the things I really enjoyed about this paper was the clarity of its exposition - I thought Heinz did a wonderful job stepping the reader through the fundamentals and building up to his main point. I also really liked the approach of connecting the hypothesis space to learner bias. While Heinz's focus in this paper is more on learnability in the limit, I still think some very useful connections were made to language acquisition by children. I was also very fond of the way "don't count beyond two" falls out of the notion of neighborhood-distinctness. The question then is how neighborhood-distinctness comes about. Is it a constraint on the learner's abstraction abilities (as when Heinz walks us through learners abstracting over prefixes and suffixes)? Or is it just a more compact representation, and so simpler (a Bayesian bias for simplicity)?
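On neighborhood-distinctness itself: my rough understanding is that a state's neighborhood bundles its incoming symbols, its outgoing symbols, and its start/final status, and an acceptor is neighborhood-distinct when no two states share a neighborhood. A minimal sketch of that check (this follows my own reading of the definition, so the details may not match Heinz's formulation exactly):

```python
# Sketch of a neighborhood-distinctness check. The "neighborhood" here is
# (incoming symbols, outgoing symbols, is-start, is-final); this is my
# reading of the definition and may differ in detail from Heinz's.
def neighborhood(state, arcs, start, finals):
    incoming = frozenset(sym for src, sym, dst in arcs if dst == state)
    outgoing = frozenset(sym for src, sym, dst in arcs if src == state)
    return (incoming, outgoing, state == start, state in finals)

def is_neighborhood_distinct(states, arcs, start, finals):
    nbhds = [neighborhood(q, arcs, start, finals) for q in states]
    return len(nbhds) == len(set(nbhds))

# Toy acceptor for an initial-stress pattern: q0 --stressed--> q1, and q1
# loops on unstressed syllables. Its two states have distinct neighborhoods.
arcs = [("q0", "stressed", "q1"), ("q1", "unstressed", "q1")]
print(is_neighborhood_distinct(["q0", "q1"], arcs, "q0", {"q1"}))  # → True
```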

Some more targeted thoughts:

  • I think one of the claims is that the hypothesis space of the learner would exclude hypotheses allowing stress patterns that don't obey neighborhood-distinctness (ND). It seems like a slightly more relaxed version of this could be beneficial: specifically, the hypothesis space includes stress patterns that don't obey ND, but the learner is strongly biased against them. In order to learn languages that violate ND, the learner has to overcome this bias. So, these non-ND patterns are in the hypothesis space, but their prior (in Bayesian terms) is low. From this, we could predict that non-ND languages take longer for children to learn - we might expect children to follow ND generalizations for some time before the data finally lead them to overcome their initial bias.
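Purely to make the bias idea concrete, here is a minimal Bayesian sketch (all the numbers - the priors and the likelihood ratio - are invented for illustration, not taken from the paper): a learner entertains both an ND grammar and a non-ND grammar but puts a low prior on the latter, so it takes several observations favoring the non-ND grammar before the posterior flips.

```python
# Hypothetical illustration (not from Heinz 2009): a learner whose
# hypothesis space includes non-ND stress patterns, but with a low prior.
prior_nd = 0.95      # prior on the ND-obeying grammar (assumed value)
prior_non_nd = 0.05  # prior on the non-ND grammar (assumed value)

# Suppose each observed word is twice as likely under the non-ND grammar
# (the true grammar of the target language). Assumed likelihood ratio.
likelihood_ratio = 2.0

def posterior_non_nd(n_observations):
    """Posterior probability of the non-ND grammar after n observations."""
    odds = (prior_non_nd / prior_nd) * likelihood_ratio ** n_observations
    return odds / (1 + odds)

# How many observations until the learner favors the non-ND grammar?
n = 0
while posterior_non_nd(n) <= 0.5:
    n += 1
print(n, round(posterior_non_nd(n), 3))  # → 5 0.627
```

The delay before the flip grows as the prior on non-ND grammars shrinks, which is the acquisition prediction in the bullet above: children should look ND-obedient early, then converge on the non-ND pattern with enough data.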

  • A tangent on the point above - I noticed that several of the non-ND examples Heinz mentioned involved words of four or five syllables. While this of course varies from language to language, English child-directed speech contains very few words of four or more syllables. If this is true of languages generally (and we might imagine some Zipfian distribution over word lengths that makes it so), that may be another reason children would end up with ND hypotheses about stress patterns and have to unlearn them (or learn exceptions) later.

  • How translatable are the FBLearner and the FSA representation to human acquisition algorithms and human representations? I can see how this might be meant as a direct correlate, especially given the attention paid to an online variant of the learning algorithm.
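To make the FSA side of this question concrete for myself, here is a toy acceptor (my own sketch, not one of Heinz's machines) for a hypothetical "stress the initial syllable" pattern, with "1" for a stressed syllable and "0" for an unstressed one:

```python
# Toy finite-state acceptor for an initial-stress pattern, just to make the
# FSA representation concrete. "1" = stressed syllable, "0" = unstressed.
# This is my own minimal sketch, not one of Heinz's actual machines.
TRANSITIONS = {
    ("start", "1"): "after_stress",        # the word must begin stressed
    ("after_stress", "0"): "after_stress", # unstressed syllables may follow
}
ACCEPTING = {"after_stress"}

def accepts(word):
    """Return True if the syllable string matches the initial-stress pattern."""
    state = "start"
    for syllable in word:
        state = TRANSITIONS.get((state, syllable))
        if state is None:  # no transition: pattern violated
            return False
    return state in ACCEPTING

print(accepts("100"))  # initial stress only → True
print(accepts("010"))  # stress on the second syllable → False
```

Whether children track anything like states and transitions, or whether the FSA is just a convenient analyst's description of the pattern, is exactly the open question.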
