Wednesday, November 25, 2015

Some thoughts on Morley 2015

I definitely appreciate the detailed thought that went into this paper — Morley uses this deceptively simple case study to highlight how to take complexity in representation and acquisition seriously, and also how to take arguments about Universal Grammar seriously. (Both of these are, of course, near and dear to my heart.) I also loved the appeal to use computational modeling to make linguistic theories explicit. (I am all about that.)

I also liked how she notes the distinction between learning mechanism and hypothesis space constraints in her discussion of how UG might be instantiated — again, something near and dear to my heart. My understanding is that we’ve typically thought about UG as constraints on the hypothesis space (and the particular UG instantiation Morley investigated is this kind of UG constraint). To be fair, I tend to lean this way myself, preferring domain-general mechanisms for navigating the hypothesis space and UG for defining the hypothesis space in some useful way. 

Turning to the particular UG instantiation Morley looks at, I do find it interesting that she contrasts the “UG-delimited H Principle” with the “cycle of language change and language acquisition” (Intro). To me, the latter could definitely have a UG component in either the hypothesis space definition or the learning mechanism. So I guess it goes to show the importance of being particular about the UG claim you’re investigating. If the UG-delimited H Principle isn’t necessary, that just rules out the logical necessity of that type of UG component rather than all UG components. (I feel like this is the same point made to some extent in the Ambridge et al. 2014 and Pearl 2014 discussion about identifying/needing UG.)

Some other thoughts:
(1) Case Study: 

(a)  I love seeing the previous argument for “poverty of the typology implies UG” laid out. Once you see the pieces that lead to the conclusion, it becomes much easier to evaluate each component in its own right.

(b) The hypothetical lexicon items in Table 1 provide a beautiful example of overlapping hypothesis extensions, some of which are in a subset-superset relationship depending on the actual lexical items observed (I’m thinking of the Penultimate grammar vs. the other two, given items 1, 3, and 4 or items 1, 2, and 5). Bayesian Size Principle to the rescue (potentially)!
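To make the subset-superset intuition concrete, here’s a minimal sketch of the Size Principle’s likelihood term, assuming (as a simplification) that observed items are sampled uniformly from a grammar’s extension. The extension sizes below are made up for illustration, not taken from Morley’s Table 1:

```python
# Bayesian Size Principle sketch: under uniform sampling from a grammar's
# extension, likelihood shrinks as the extension grows, so the smaller
# (subset) grammar wins whenever both grammars fit the observed items.
# (Extension sizes here are hypothetical, for illustration only.)

def size_principle_likelihood(extension_size, n_observations):
    """P(data | grammar) = (1 / |extension|)^n under uniform sampling."""
    return (1.0 / extension_size) ** n_observations

# Suppose the subset grammar covers 3 item types and the superset covers 10,
# and the learner has seen 5 items compatible with both:
subset_like = size_principle_likelihood(3, 5)
superset_like = size_principle_likelihood(10, 5)
assert subset_like > superset_like  # the subset grammar is preferred
```

The gap widens exponentially with each additional observation, which is what lets a probabilistic learner escape the subset problem without explicit negative evidence.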

(c) For stress grammars, I definitely agree that some sort of threshold for determining whether a rule should be posited is necessary. I’m fond of Legate & Yang (2013)/Yang (2005)’s Tolerance Principle myself (see Pearl, Ho, & Detrano 2014, 2015 for how we implement it for English stress). Basic idea: this principle provides a concrete threshold for which patterns are the productive ones; the learner can then use those to pick the productive grammar from the available hypotheses. I was delighted to see the Tolerance Principle proposal explicitly discussed in section 5.
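For concreteness, the Tolerance Principle’s threshold is N / ln N: a rule over N relevant items is productive only if its exceptions number at most that many. A minimal sketch (the item counts below are made up):

```python
import math

def tolerance_threshold(n_items):
    """Yang's Tolerance Principle threshold: a rule over N items
    remains productive with at most N / ln(N) exceptions."""
    return n_items / math.log(n_items)

def is_productive(n_items, n_exceptions):
    """Does a pattern over n_items count as productive given n_exceptions?"""
    return n_exceptions <= tolerance_threshold(n_items)

# A stress pattern over 100 relevant items tolerates ~21.7 exceptions:
print(is_productive(100, 15))  # True
print(is_productive(100, 30))  # False
```

Note that the threshold grows sublinearly, so larger rule classes tolerate proportionally fewer exceptions, which is part of what makes the principle a usefully sharp productivity criterion.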

(2) The Learner

(a) It’s interesting that a distribution over lexical item stress patterns is allowed, which would then imply that a distribution over grammars is allowed (this seems right to me intuitively when you have both productive and non-productive patterns that are predictable). Then, the “core” grammar is simply the one with the highest probability. One sticky thing: Would this predict variability within a single lexical item? (That is, sometimes an item gets the stress contour from grammar 1 and sometimes it gets the one from grammar 2.) If so, that’s a bit weird, except in cases of code-switching across dialects (a possible example: American vs. British pronunciations of the same word). But is this what Stochastic OT predicts? It sounds like the other frameworks mentioned could be interpreted this way too. I’m most familiar with Yang’s Variational Learning (VL), but I’m not sure the VL framework has been applied to stress patterns on individual lexical items, and perhaps the sticky issue mentioned above is why?

Connecting this to the general learners described, I think that’s sort of what the Variability/Mixture learners would predict, since grammars can just randomly fail to apply to a given lexical item with some probability. This is then a bit funny because these are the only two general learners pursued further. The discarded learners predict different-sized subclasses of lexical items within which a given grammar applies absolutely, and that seems much more plausible to me, given my knowledge of English stress. Except the description of the hypotheses given later on in example (5) makes me think this is effectively how the Mixture model is being applied? But then the text beneath (7) clarifies that, no, this hypothesis really does allow the same lexical item to show up with different stress patterns.
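To see why a Mixture-style learner predicts within-item variability, here’s a toy sketch (my own construction, not Morley’s actual model) where each grammar gets a probability and a single lexical item can surface with different stress contours across uses:

```python
import random

# Hypothetical grammar probabilities (illustrative numbers only, not
# estimated from any corpus).
GRAMMARS = {"penultimate": 0.8, "initial": 0.2}

def sample_stress(n_syllables):
    """Sample a grammar, then return (grammar, 0-based stressed syllable)."""
    grammar = random.choices(list(GRAMMARS), weights=list(GRAMMARS.values()))[0]
    stressed = n_syllables - 2 if grammar == "penultimate" else 0
    return grammar, stressed

# The same 3-syllable item can surface with different stress across uses:
random.seed(0)
outcomes = {sample_stress(3) for _ in range(100)}
print(outcomes)  # very likely contains both ('penultimate', 1) and ('initial', 0)
```

This is exactly the sticky prediction: nothing here ties a particular item to a particular grammar, so variability shows up item-internally rather than partitioning the lexicon into subclasses.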

(b) It’s really interesting to see the connection between descriptive and explanatory adequacy and Bayesian likelihood and prior. I immediately got the descriptive-likelihood link, but boggled for a moment at the explanatory-prior link. Isn’t explanatory adequacy about generalization? Ah, but a prior can be thought of in terms of a hypothesis’s extension of items, and the items included in that extension are the ones the hypothesis would generalize to. Nice!
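The mapping can be made explicit with Bayes’ rule: the likelihood scores how well a grammar describes the observed data (descriptive adequacy), while the prior scores how well it would generalize (explanatory adequacy). A toy sketch with made-up numbers:

```python
def unnormalized_posterior(prior, likelihood):
    """Bayes' rule up to the evidence constant: P(h | d) ∝ P(h) * P(d | h)."""
    return prior * likelihood

# Grammar A: fits the data a bit better, but is an unprincipled item list.
# Grammar B: fits a bit worse, but has a compact, generalizing form.
# (All numbers are illustrative only.)
posterior_a = unnormalized_posterior(prior=0.1, likelihood=0.8)  # ≈ 0.08
posterior_b = unnormalized_posterior(prior=0.6, likelihood=0.5)  # ≈ 0.30
assert posterior_b > posterior_a  # explanatory adequacy can win out
```

So a grammar that merely memorizes the data can lose to one that generalizes, once the prior (explanatory) term enters the comparison.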

(3)  Likely Input and a Reasonable Learner: The take-home point seems to be that lexicons that support Gujarati* are rare, but not impossible. I wonder how well these match up to the distributions we see in child-directed speech (CDS)? Is CDS more like Degree 4, which seems closest to the Zipfian distribution we tend to see in language at different levels?

(4) Interpretation of Results: I think Morley makes a really striking point about how much we actually (don’t) know about typological diversity, given the sample available to us (basically, we have 0.02% of all the languages). It really makes you (me) rethink making claims based on typology.


Ambridge, B., Pine, J. M., & Lieven, E. V. (2014). Child language acquisition: Why universal grammar doesn't help. Language, 90(3), e53-e90.

Pearl, L. (2014). Evaluating learning-strategy components: Being fair (Commentary on Ambridge, Pine, and Lieven). Language, 90(3), e107-e114.

Pearl, L., Ho, T., & Detrano, Z. (2014). More learnable than thou? Testing metrical phonology representations with child-directed speech. Proceedings of the Berkeley Linguistics Society, 398-422.

Monday, November 2, 2015

Some thoughts on Pietroski 2015 in press

One of the things that stood out most to me from this article is the importance of the link between structured sequences and intended meanings (e.g., with the eager/easy to please examples). Pietroski is very clear about this point (which makes sense, as it was one of the main criticisms of the Perfors et al. 2011 work that attempted to investigate poverty of the stimulus for the canonical example of complex yes/no questions). Anyway, the idea that comes through is that it’s not enough to just deal with surface strings alone. Presumably it becomes more acceptable if the strings also include latent structure, though, like traces? (Ex: John is easy to please __(John) vs. John is eager (__John) to please.) At that point, some of the meaning is represented in the string directly.

I’m not sure how many syntactic acquisition models deal with the integration of this kind of meaning information, though. For example, my islands model with Jon Sprouse (Pearl & Sprouse 2013) used latent phrasal structure (IP, VP, CP, etc.) to augment the learner’s representation of the input, but was still just trying to assign acceptability (=probability) to structures irrespective of the meanings they had. That is, no meaning component was included. Of course, this is why we focused on islands that were supposed to be solely “syntactic”, unlike, for instance, factive islands that are thought to incorporate semantic components. (Quickie factive island example: *Who do you forget likes this book? vs. Who do you believe likes this book?). Is our approach an exceptional case, though? That is, is it never appropriate to worry only about the “formatives” (i.e., the structures in absence of interpretation)? For instance, what if we think of the learning problem as trying to decide what formative is the appropriate way to express a particular interpretation — isn’t identifying the correct formative alone sufficient in this case? Concrete example: Preferring “Was the hiker who was lost killed in the fire?” over “*Was the hiker who lost was killed in the fire?” with the interpretation of "The hiker who was lost was killed in the fire [ask this]".

Some other thoughts:

(1) My interpretation of the opening quote is that acquisition models (as theories of language learning and/or grammar construction) matter for theories of language representation because they facilitate the clear formulation of deeper representational questions. (Presumably by highlighting more concretely what works and doesn’t work from a learning perspective?) As an acquisition chick who cares about representation, this makes me happy.

(2) For me, the discussion about children’s “vocabulary” that allows them to go from “parochial courses of human experience to particular languages” is another way of talking about the filters children have on how they perceive the input and the inductive biases they have on their hypothesis spaces. This makes perfect sense to me, though I wouldn’t have made the link to the term “vocabulary” before this. Relatedly, the gruesome example walkthrough really highlights for me the importance of inductive biases in the hypothesis space. For example, take the assumption of constancy w.r.t. time for what (most) words mean (so we never get meanings like “green before time t and blue after time t” as possible word meanings, even though they’re logically possible given the bits we build meaning out of). So we get that more exotic example, which gets followed up with more familiar linguistic examples that help drive the point home.
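As a toy illustration (my own sketch, not Pietroski’s), the time-indexed “grue”-style meaning that the constancy bias rules out is trivially easy to define out of the available bits, which is exactly why the bias has to come from the learner rather than from logic:

```python
# A hypothetical "grue"-style predicate: a logically possible word meaning
# that shifts with time, which the constancy bias excludes from the
# child's hypothesis space. (Names and threshold year are my own choices.)

T_THRESHOLD = 2020  # arbitrary switch-over time t

def grue(observation_year):
    """Means 'green' if observed before time t, 'blue' otherwise."""
    return "green" if observation_year < T_THRESHOLD else "blue"

print(grue(2015))  # green
print(grue(2025))  # blue
```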

Pearl, L., & Sprouse, J. (2013). Syntactic islands and learning biases: Combining experimental syntax and computational modeling to investigate the language acquisition problem. Language Acquisition, 20(1), 23-68.

Perfors, A., Tenenbaum, J. B., & Regier, T. (2011). The learnability of abstract syntactic principles. Cognition, 118(3), 306-338.