Computational Models of Language (at UC Irvine): April 2011

Tuesday, April 26, 2011

Next time: Heinz (2010), parts 1 and 2

Thanks to everyone who was able to join us for a very interesting discussion of Heinz's (2009) computational phonology paper! Next time on May 9th, we'll be continuing to look into the enterprise of computational phonology and what it can offer to both theoretical linguistics and language acquisition.

Computational Phonology, Part 1

Computational Phonology, Part 2

See you then!

Friday, April 22, 2011

Thoughts on Heinz (2009)

One of the things I really enjoyed about this paper was the clarity of its description - I thought Heinz did a wonderful job stepping the reader through the fundamentals and building up to his main point. I also really liked the approach of connecting the hypothesis space to learner bias. While Heinz's focus in this paper is more on learnability in the limit, I still think some very useful connections were made to language acquisition by children. I was also very fond of the way "don't count beyond two" falls out of the notion of neighborhood-distinctness. The question then is how neighborhood-distinctness comes about. Is it a constraint on the learner's abstraction abilities (as Heinz walks us through learners abstracting over prefixes and suffixes)? Or is it just a more compact, and therefore simpler, representation (a Bayesian bias for simplicity)?
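To make the neighborhood-distinctness idea a bit more concrete, here is a minimal sketch of how one might check it on a small finite-state acceptor. It assumes my reading of the definition (a state's neighborhood is whether it is a start state, whether it is final, and the sets of symbols on its incoming and outgoing transitions); the paper has the precise version. The function names and the two toy automata are hypothetical and are not taken from Heinz (2009).

```python
def neighborhood(state, starts, finals, transitions):
    """Assumed definition: (is start?, is final?, incoming symbols, outgoing symbols)."""
    incoming = frozenset(sym for src, sym, dst in transitions if dst == state)
    outgoing = frozenset(sym for src, sym, dst in transitions if src == state)
    return (state in starts, state in finals, incoming, outgoing)


def is_neighborhood_distinct(states, starts, finals, transitions):
    """True if no two states share the same neighborhood."""
    seen = set()
    for q in states:
        n = neighborhood(q, starts, finals, transitions)
        if n in seen:
            return False
        seen.add(n)
    return True


# Toy acceptor for exactly two symbols: 0 -a-> 1 -a-> 2 (final).
short_chain = dict(
    states=[0, 1, 2],
    starts={0},
    finals={2},
    transitions=[(0, "a", 1), (1, "a", 2)],
)

# Toy acceptor for exactly four symbols: 0 -a-> 1 -a-> 2 -a-> 3 -a-> 4 (final).
# States 1, 2 and 3 look identical locally, so counting this far breaks ND.
long_chain = dict(
    states=[0, 1, 2, 3, 4],
    starts={0},
    finals={4},
    transitions=[(i, "a", i + 1) for i in range(4)],
)

print(is_neighborhood_distinct(**short_chain))
print(is_neighborhood_distinct(**long_chain))
```

Under these assumptions the short chain is neighborhood-distinct and the long one is not, because its middle states cannot be told apart by their local context. That is the flavor of "don't count beyond two" I have in mind, though a toy like this says nothing about where the constraint comes from.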

Some more targeted thoughts:

I think one of the claims is that the hypothesis space of the learner would exclude hypotheses allowing stress patterns that don't obey neighborhood-distinctness (ND). It seems like a slightly more relaxed version of this could be beneficial: Specifically, the hypothesis space includes stress patterns that don't obey ND, but the learner is strongly biased against these patterns. In order to learn languages that violate ND, the learner has to overcome this bias. So, these non-ND language patterns are in the hypothesis space, but their prior (in Bayesian terms) is low. From this, we could predict that non-ND languages take longer for children to learn - we might expect children to follow ND generalizations for some time before they finally overcome their initial bias based on the data.
A tangent on the point above - I noticed several examples Heinz mentioned (that were non-ND) involved words of four or five syllables. While of course this varies from language to language, English child-directed speech contains very few words of 4+ syllables. If this is something generally true of languages (and we might imagine some Zipfian distribution that makes it so), maybe that's another reason children might end up with ND hypotheses about stress patterns, and have to unlearn them (or learn exceptions).
How translatable are the FBLearner and the FSA representation to human acquisition algorithms and human representations? I can see how they might be meant as direct correlates, especially given the attention paid to an online variant of the learning algorithm.
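To make the low-prior idea from the first point above concrete, here is a minimal sketch of a two-hypothesis Bayesian learner. Everything in it is an assumption for illustration: the prior on the non-ND hypothesis, the two likelihood values, and the simplification that only "diagnostic" words (for example, long words whose stress pattern violates ND) shift the learner's beliefs. None of it comes from Heinz (2009).

```python
import math


def posterior_non_nd(n_diagnostic, prior_non_nd=0.001, lik_nd=0.1, lik_non_nd=0.9):
    """Posterior probability of the non-ND hypothesis after seeing
    n_diagnostic words that fit the non-ND pattern.

    lik_nd:     assumed probability of such a word under the ND hypothesis
                (treated as noise or an exception).
    lik_non_nd: assumed probability of such a word under the non-ND hypothesis.
    Words that both hypotheses predict equally well cancel out and are ignored.
    """
    log_odds = math.log(prior_non_nd / (1.0 - prior_non_nd))
    log_odds += n_diagnostic * math.log(lik_non_nd / lik_nd)
    return 1.0 / (1.0 + math.exp(-log_odds))


def diagnostic_words_needed(prior_non_nd=0.001, lik_nd=0.1, lik_non_nd=0.9):
    """Smallest number of diagnostic words that pushes the posterior
    of the non-ND hypothesis above 0.5."""
    n = 0
    while posterior_non_nd(n, prior_non_nd, lik_nd, lik_non_nd) <= 0.5:
        n += 1
    return n


for prior in (0.2, 0.001):
    print(prior, diagnostic_words_needed(prior_non_nd=prior))
```

The stronger the initial bias against non-ND patterns, the more diagnostic words the learner needs before switching. If diagnostic words are also rare in the input (as the tangent about 4+ syllable words suggests they might be), the delay in terms of total input would be longer still, which is the "children follow ND generalizations for a while" prediction.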

Monday, April 11, 2011

Next time: Heinz (2009)

Thanks to everyone who joined us (virtually or in person) to discuss phonological learning from acoustic data! We definitely had some good suggestions on how to extend the work in Yu (2010) both computationally and experimentally. Next time on April 25th, we'll be turning to a computational study on stress systems by Heinz (2009), which provides a way to connect properties of the learner with typological properties of observed stress systems:

Heinz (2009): On the role of locality in learning stress patterns

See you then!

Friday, April 8, 2011

Thoughts on Yu (2010)

I'm really pleased with the way Yu strives to position the question she's exploring in this paper in the larger framework of language acquisition. Another strength of this paper for me is the empirical basis in acoustic data. I think a fair number of phonological category learning models do this, but you don't see this in as many word segmentation models (which usually assume that learners have abstracted phonemes already). One thing that I was hoping to see but didn't: Yu talks about how it's important to figure out the right set of acoustic features/cues for learners to pay attention to, but then doesn't really offer a way to sort through the potentially infinite number of relevant acoustic features. Her work here focuses more on distinguishing the effectiveness of already proposed features (which is certainly an important thing to do).

Some more targeted thoughts:

On p.5, Yu mentions how the learning problem can be thought of effectively as a poverty of the stimulus (PoS) problem, because the learner has to generalize from a finite set of data points to generalizations that cover infinite sets. I do get this, but it does seem like this might be an easier generalization (in this particular case) to make than some of the problems that are traditionally held up as poverty of the stimulus (say, in syntax). This is because the acoustic data points available might be best fit by a generalization that's close enough to the truth - not every data point appears, but enough appear that are spread out sufficiently. On the other hand, a harder PoS problem would be if the data points that appear are most compatible with a generalization that is in fact the wrong one (here, if the proper ellipse was actually much bigger than the observed data suggest, and only extended along a particular dimension, for example).
On p.6, in footnote 1, we can really see the differences in approach to (morpho)syntax taken by linguistics vs. computational linguistics. I believe it's standard in computational linguistics to assume a probability distribution over whatever units you're working with, which means mapping those units to real values, while in linguistics it's more standard to assume a categorical (discrete) approach. (Though of course there are linguists who adopt a probabilistic approach by default - I just think they're not in the majority in generative linguistic circles.)
On p.12, Yu notes that there are distinctions between adult-directed and child-directed speech, and justifies the decision to use adult-directed speech. While I can certainly understand the practical motivations for doing this, it would be really good to know how different adult-directed speech is from child-directed speech, particularly for the acoustic properties that Yu is interested in with respect to tone. I have the (possibly mistaken) impression that there might be quite significant differences.
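To illustrate the finite-to-infinite generalization from the p.5 point above, here is a minimal sketch that fits an ellipse to a handful of two-dimensional data points and then classifies points it has never seen. It assumes a single Gaussian category over two unnamed acoustic dimensions, uses made-up toy data, and is not Yu's model; the 5.99 threshold is the approximate 95% cutoff for a chi-square distribution with two degrees of freedom.

```python
def fit_ellipse(points):
    """Estimate the mean and (sample) covariance of 2-D points.
    Returns (mean_x, mean_y) and (var_x, cov_xy, var_y)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from the fitted category."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def in_category(point, mean, cov, threshold=5.99):
    """True if the point falls inside the learned ellipse."""
    return mahalanobis_sq(point, mean, cov) <= threshold


# Toy observations (made up for illustration, not real acoustic data).
observed = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
mean, cov = fit_ellipse(observed)

print(in_category((1.5, 0.5), mean, cov))  # unseen point near the data
print(in_category((6.0, 1.0), mean, cov))  # unseen point far along one dimension
```

The five observed points license a decision about infinitely many unseen ones, which is the easy case. The harder PoS case corresponds to something like the second query: if the true category really did extend far along one dimension, the ellipse learned from these well-behaved data would wrongly exclude that point.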
