Computational Models of Language (at UC Irvine): Some thoughts on Morley 2015

I definitely appreciate the detailed thought that went into this paper — Morley uses this deceptively simple case study to highlight how to take complexity in representation and acquisition seriously, and also how to take arguments about Universal Grammar seriously. (Both of these are, of course, near and dear to my heart.) I also loved the appeal to use computational modeling to make linguistic theories explicit. (I am all about that.)

I also liked how she notes the distinction between learning mechanism and hypothesis space constraints in her discussion of how UG might be instantiated — again, something near and dear to my heart. My understanding is that we’ve typically thought about UG as constraints on the hypothesis space (and the particular UG instantiation Morley investigated is this kind of UG constraint). To be fair, I tend to lean this way myself, preferring domain-general mechanisms for navigating the hypothesis space and UG for defining the hypothesis space in some useful way.

Turning to the particular UG instantiation Morley looks at, I do find it interesting that she contrasts the “UG-delimited H Principle” with the “cycle of language change and language acquisition” (Intro). To me, the latter could definitely have a UG component in either the hypothesis space definition or the learning mechanism. So I guess this shows the importance of being precise about exactly which UG claim you’re investigating. If the UG-delimited H Principle isn’t necessary, that only rules out the logical necessity of that particular type of UG component, not of all UG components. (I feel like this is the same point made to some extent in the Ambridge et al. 2014 and Pearl 2014 discussion about identifying/needing UG.)

Some other thoughts:

(1) Case Study:

(a) I love seeing the previous argument for “poverty of the typology implies UG” laid out. Once you see the pieces that lead to the conclusion, it becomes much easier to evaluate each component in its own right.

(b) The hypothetical lexicon items in Table 1 provide a beautiful example of overlapping hypothesis extensions, some of which are in a subset-superset relationship depending on the actual lexical items observed (I’m thinking of the Penultimate grammar vs. the other two, given items 1, 3, and 4 or items 1, 2, and 5). Bayesian Size Principle to the rescue (potentially)!
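To make the Size Principle point concrete, here is a minimal Python sketch. It assumes each observed item is drawn uniformly from a hypothesis's extension, so a hypothesis h that is consistent with n observed items assigns them likelihood (1/|h|)^n, and it assumes a uniform prior. The three extensions (H_small, H_mid, H_large) and the item labels are invented toy values, not the contents of Morley's Table 1; they only mimic a subset-superset configuration.

```python
from fractions import Fraction

# Toy hypothesis extensions: the set of lexical items each (hypothetical)
# grammar is consistent with. H_small is a subset of H_mid, which is a
# subset of H_large. Invented for illustration; not Morley's Table 1.
EXTENSIONS = {
    "H_small": frozenset({"a", "b", "c"}),
    "H_mid": frozenset({"a", "b", "c", "d", "e"}),
    "H_large": frozenset({"a", "b", "c", "d", "e", "f"}),
}


def likelihood(extension, data):
    """Size Principle: (1/|h|)^n if all n observed items are in h, else 0."""
    if not set(data) <= extension:
        return Fraction(0)
    return Fraction(1, len(extension)) ** len(data)


def posterior(data, extensions=EXTENSIONS):
    """Posterior over hypotheses under a uniform prior (exact fractions)."""
    scores = {name: likelihood(ext, data) for name, ext in extensions.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the data")
    return {name: score / total for name, score in scores.items()}


if __name__ == "__main__":
    print(posterior(["a", "b", "c"]))  # H_small gets the most mass
    print(posterior(["a", "b", "d"]))  # H_small is ruled out
```

With data a, b, c, all three hypotheses are consistent, but the smallest one wins (posterior 1000/1341, roughly 0.75) because it makes the observed items least coincidental. Once item d shows up, H_small is ruled out entirely and H_mid takes the lead (216/341).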

(c) For stress grammars, I definitely agree that some sort of threshold for determining whether a rule should be posited is necessary. I’m fond of the Tolerance Principle of Yang (2005) and Legate & Yang (2013) myself; see Pearl, Ho, & Detrano (2014, 2015) for how we implement it for English stress. The basic idea: this principle provides a concrete threshold for deciding which patterns are the productive ones, and the learner can then use those to pick the productive grammar from the available hypotheses. I was delighted to see the Tolerance Principle proposal explicitly discussed in section 5.
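The Tolerance Principle threshold itself is easy to state: a rule that should apply to N items is productive if its number of exceptions e satisfies e ≤ N / ln N. Below is a minimal Python sketch of just that threshold check. The pattern names and counts are hypothetical, and this is not the implementation from Pearl, Ho, & Detrano (2014, 2015); it leaves out whatever choices a real model must make about how items and exceptions are counted.

```python
import math


def tolerance_threshold(n_items):
    """Tolerance Principle threshold: theta_N = N / ln N."""
    if n_items < 2:
        raise ValueError("threshold is undefined for N < 2")
    return n_items / math.log(n_items)


def is_productive(n_items, n_exceptions):
    """A rule over N items is productive iff its exceptions e <= N / ln N."""
    return n_exceptions <= tolerance_threshold(n_items)


def productive_patterns(counts):
    """Keep the patterns whose exception counts fall under the threshold.

    counts maps a (hypothetical) pattern name to a pair
    (number of items the pattern should apply to, number of exceptions).
    """
    return [name for name, (n, e) in counts.items() if is_productive(n, e)]


if __name__ == "__main__":
    toy_counts = {"pattern_A": (100, 20), "pattern_B": (100, 25)}
    print(tolerance_threshold(100))         # about 21.7
    print(productive_patterns(toy_counts))  # ['pattern_A']
```

For N = 100 the threshold is about 21.7, so a pattern can tolerate up to 21 exceptions; the toy pattern_B, with 25, would not be posited as productive.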

(2) The Learner:

(a) It’s interesting that a distribution over lexical item stress patterns is allowed, which would then imply that a distribution over grammars is allowed (this seems right to me intuitively when you have both productive and non-productive patterns that are predictable). Then, the “core” grammar is simply the one with the highest probability. One sticky thing: would this predict variability within a single lexical item? (That is, sometimes an item gets the stress contour from grammar 1 and sometimes it gets the one from grammar 2.) If so, that’s a bit weird, except in cases of code-switching between dialects (for example, maybe American vs. British pronunciation). But is this what Stochastic OT predicts? It sounds like the other frameworks mentioned could be interpreted this way too. I’m most familiar with Yang’s Variational Learning (VL), but I’m not sure the VL framework has been applied to stress patterns on individual lexical items, and perhaps the sticky issue mentioned above is why?

Turning to the general learners described, I think this is roughly what the Variability/Mixture learners would predict, since grammars can just randomly fail to apply to a given lexical item with some probability. This is a bit funny, then, because these are the only two general learners pursued further. The discarded learners predict different-sized subclasses of lexical items within which a given grammar applies absolutely, and that seems much more plausible to me, given my knowledge of English stress. Except the description of the hypotheses given later on in example (5) makes me think this is effectively how the Mixture model is being applied? But then the text beneath (7) clarifies that, no, this hypothesis really does allow the same lexical item to show up with different stress patterns.
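The sticky issue can be made concrete with a small simulation contrasting two ways a distribution over grammars might be used. In the mixture-style version, a grammar is sampled afresh every time an item is produced, so one item can surface with different stress patterns; in the subclass-style version, each item is assigned to a grammar once and that grammar then applies to it absolutely. This is a hypothetical Python sketch: the grammar names G1 and G2, their weights, and their stress outputs are invented, and neither function is meant to reproduce Morley's actual learners.

```python
import random

# Hypothetical grammar probabilities and the stress pattern each grammar
# assigns; invented for illustration only.
GRAMMAR_WEIGHTS = {"G1": 0.7, "G2": 0.3}
STRESS_OUTPUT = {"G1": "penultimate", "G2": "antepenultimate"}


def sample_grammar(rng):
    """Draw one grammar according to GRAMMAR_WEIGHTS."""
    names = list(GRAMMAR_WEIGHTS)
    weights = [GRAMMAR_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights)[0]


def mixture_productions(n, rng):
    """Mixture-style: produce one lexical item n times, resampling the
    grammar on every production, so its stress pattern can vary."""
    return [STRESS_OUTPUT[sample_grammar(rng)] for _ in range(n)]


def subclass_productions(item, n, rng, assignment):
    """Subclass-style: assign each item to one grammar the first time it is
    seen; that grammar then applies to the item absolutely."""
    if item not in assignment:
        assignment[item] = sample_grammar(rng)
    return [STRESS_OUTPUT[assignment[item]]] * n


if __name__ == "__main__":
    rng = random.Random(0)
    print(set(mixture_productions(200, rng)))
    print(set(subclass_productions("item_1", 200, rng, {})))
```

Across 200 productions of one item, the mixture version will almost surely produce both patterns, while the subclass version always produces exactly one: the within-item variability question in miniature.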

(b) It’s really interesting to see the connection between descriptive and explanatory adequacy on the one hand and Bayesian likelihood and prior on the other. I immediately got the descriptive-likelihood link, but goggled for a moment at the explanatory-prior link. Isn’t explanatory adequacy about generalization? Ah, but a prior can be thought of as an extension of items, and so the items included in that extension are the ones the hypothesis would generalize to. Nice!
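Written as Bayes’ rule, the mapping looks like this, with h a hypothesized grammar and d the observed data. The labels under each term just restate the adequacy analogy; they are not a formal claim about how Morley defines either prior or likelihood.

```latex
\underbrace{P(h \mid d)}_{\text{posterior}}
\;\propto\;
\underbrace{P(d \mid h)}_{\text{likelihood} \,\approx\, \text{descriptive adequacy}}
\;\times\;
\underbrace{P(h)}_{\text{prior} \,\approx\, \text{explanatory adequacy}}
```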

(3) Likely Input and a Reasonable Learner: The take-home point seems to be that lexicons that support Gujarati* are rare, but not impossible. I wonder how well these match up to the distributions we see in child-directed speech (CDS). Is CDS more like Degree 4, which seems closest to the Zipfian distribution we tend to see in language at different levels?
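One rough way to pose the CDS question quantitatively is to rank items by frequency and check how close the slope of log frequency against log rank is to the classic Zipfian value of about -1. The Python sketch below uses ordinary least squares on the log-log points, which is a crude estimator (maximum-likelihood fits are generally preferred for careful work). The input counts would have to come from a CDS corpus, and how Morley’s Degree 4 lexicons would map onto such counts is left open here.

```python
import math


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    For an ideal Zipfian list, where frequency is proportional to
    1 / rank**s, the slope is -s (classic Zipf: s close to 1).
    """
    freqs = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two nonzero frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    ideal = [1000 / rank for rank in range(1, 51)]  # perfectly Zipfian, s = 1
    flat = [20] * 50                                 # uniform frequencies
    print(zipf_slope(ideal))  # approximately -1.0
    print(zipf_slope(flat))   # approximately 0.0
```

A slope near -1 on real CDS counts would suggest a Zipf-like input; a much flatter slope would point toward a more uniform lexicon.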

(4) Interpretation of Results: I think Morley makes a really striking point about how much we actually (don’t) know about typological diversity, given the sample available to us (basically, we have data from 0.02% of all the languages). It really makes you (me) rethink making claims based on typology.

References

Ambridge, B., Pine, J. M., & Lieven, E. V. (2014). Child language acquisition: Why universal grammar doesn't help. Language, 90(3), e53-e90.

Pearl, L. (2014). Evaluating learning-strategy components: Being fair (Commentary on Ambridge, Pine, and Lieven). Language, 90(3), e107-e114.

Pearl, L., Ho, T., & Detrano, Z. (2014). More learnable than thou? Testing metrical phonology representations with child-directed speech. Proceedings of the Berkeley Linguistics Society, 398-422.

Pearl, L., Ho, T., & Detrano, Z. (2015, under review). An argument from acquisition: Comparing English metrical stress representations by how learnable they are from child-directed speech. http://ling.auf.net/lingbuzz/002072.

Wednesday, November 25, 2015