Monday, May 15, 2017

Thoughts on Yang et al. 2017

I feel like Universal Grammar (UG) was better defined by the end of this exposition (thanks, Y&al2017!), but now I want to have a heart-to-heart about the difference between “hierarchy” and “combination”. Still, I appreciated this convenient synthesis of evidence from the generative grammar tradition, especially as it relates to the kind of considerations I have as an acquisition modeler. 

Specific thoughts:

(1) Hierarchy vs. combination:
Part 2.2: While I’m a fan of hierarchical structures being everywhere in language, I wasn’t sure how connected the newborn n-syllable tasks were to the point about hierarchy. Why does being sensitive to the number of vowels (“vowel centrality”) indicate there must be hierarchical structure? For example, what if newborns hadn’t inferred hierarchy yet, but were simply sensitive to the more acoustically salient cue of vowels — wouldn’t we see the same results, even if all they really perceived was something like V V for “baku” and “alprim”?

Similarly with the babbling examples: How do we know these are hierarchical (vs. say, linear) structures? 

Similarly with the prosodic contour distinctions for the 6- to 12-week-olds: We know they perceive the prosodic contours, but not that they recognize the words and phrases in these languages. (In fact, we assume they don’t — they haven’t really managed reliable speech segmentation yet.) So how does recognizing prosodic contour distinctions over acoustic units relate to the hierarchical structure Merge gives?

My main issue comes down to “combinatorial” vs. “hierarchical”. I think you can make combinations of things without those things being combined hierarchically. So these two terms don’t mean the same thing to me, which is why the evidence in section 2.2 doesn’t seem as compelling about hierarchy (though it is compelling about combination). Contrast this with the section 2.3 examples of syntactic development, where c-command definitely is about hierarchy.


(2) UG: Initially, UG is described as domain-specific principles of language knowledge, without specifying whether these principles are innate. The focus also seems to be on knowledge about language rather than, say, knowledge about how to learn language (= learning mechanism). But then we see UG described as “internal constraints that hold across all linguistic structures”. Though this highlights the innate component, it no longer seems to require that these constraints be just about language. That is, they could be constraints that apply to language as well as other things, e.g., hierarchy, which they talk about as Merge. I’m thinking visual scene parsing is similar, where you have hierarchical chunks. So this would be a vision system version of Merge.

A little later on, we see “Universal Grammar” as the “initial state of language development” that's “determined by our genetic endowment”, which reinforces the innate component, but hedges on whether this is innate knowledge of the structure of language, or innate knowledge about how to learn language. This latter interpretation becomes more salient when they describe UG as infants interpreting parts of the environment as linguistic experience. This seems to be about the perceptual intake, and is less about knowledge of language than knowledge about what could count as language (= learning mechanism). Maybe that’s a broader definition of what it means to be a “principle of language”?

Later on in part 3.2, we get to more canonical UG examples, which are the linguistic parameters. These feel much more obviously language-specific. If they’re meant to be innate (which is how they’re typically talked about), then there we go. 

Side note: I would dearly love to figure out if specific linguistic parameters like these are derivable from other more basic linguistic building blocks. I think this is where the Minimalist Program (MP) and the Principles & Parameters (P&P) representations can meet, with MP providing the core building blocks that generate the P&P variables. I just haven’t seen it explicitly done yet. But it feels very similar to the implicit vs. explicit hypothesis space distinction that Perfors (2012) discusses, where the linguistic parameters are the explicit hypotheses generated from the MP building blocks that are capable of generating all the hypotheses in the implicit hypothesis space.

Perfors, A. (2012). Bayesian models of cognition: what's built in after all? Philosophy Compass, 7(2), 127-138.


(3) Efficient computation: I really like seeing this term here as a core factor, though I’m tempted to make it “efficient enough computation”, especially if we’re going to eventually tie this kind of thing back to evolution.

(4) Rhetorical device danger: Section 3.1 has this statement that I think can get us into hot water later on: “[I]t follows that language learners never witness the whole conjugation table…fully fleshed out, for even a single verb.” Now we’ve just thrown down the gauntlet for some corpus analyst to hunt through a large enough sample and find just one verb whose table is fully fleshed out. It doesn’t affect the main point at all, but it’s the kind of thing that can be easily misunderstood (cf. aux inversion input for arguing against Poverty of the Stimulus).

(5) Section 3.3: “…linguistic principles such as Structure Dependence and the constraint on co-reference [c-command]…are most likely accessible to children innately” — Yes! In the sense that these principles are allowed into the hypothesis space. Accessible is definitely the right (hedgy) word, rather than saying these are the only options period.

(6) Section 3.3, on Bayesian models of indirect negative evidence: “…for this reason, most recent models of indirect negative evidence explicitly disavow claims of psychological realism”. I find this a bit tricksy. Reading it, you might think: “Oh! The issue is that indirect negative evidence isn’t psychologically plausible to use.” But in actuality, the “disavowal” is about whether the inference algorithm used in a computational-level model is psychologically real. As far as I know, there are no claims that the computation being done with that algorithm isn’t psychologically real; rather, they assume humans approximate that computation (which uses indirect negative evidence).

Related is the stated computational "intractability" of using indirect negative evidence: I admit, I find this weird. If we’re happy to posit alternative hypotheses in a subset-superset relationship, why is it so hard to posit predictions from those two hypotheses? The hard part seems to be about defining the hypotheses so explicitly in the first place, and that doesn’t seem to be the part that’s targeted as “psychologically intractable”. If anything, it seems to be the psychologically necessary part. (The description that follows this bit in section 3.3 seems to highlight this, where Y&al2017 talk about the superset grammar existing, even if the default is the subset grammar.)
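To make the subset-superset point concrete, here is a minimal sketch of how indirect negative evidence can fall out of Bayesian inference via the size principle. Everything in it is my own illustrative assumption rather than anything from Y&al2017: the two toy grammars, their sizes (`SUBSET_SIZE`, `SUPERSET_SIZE`), the flat prior, the uniform-generation assumption, and the function name `posterior_subset`. The only point is that once the two hypotheses are explicit, computing their predictions is cheap.

```python
from fractions import Fraction

# Hypothetical toy grammars: the subset grammar generates 2 sentence types,
# the superset grammar generates those same 2 plus 2 more.
SUBSET_SIZE = 2
SUPERSET_SIZE = 4


def posterior_subset(n_observations, prior_subset=Fraction(1, 2)):
    """Posterior probability of the subset grammar after n observations,
    all of which are compatible with BOTH grammars.

    Assumes each grammar generates its sentence types uniformly, so the
    likelihood of a single observation is 1 / (size of the grammar).
    """
    like_subset = Fraction(1, SUBSET_SIZE) ** n_observations
    like_superset = Fraction(1, SUPERSET_SIZE) ** n_observations
    numerator = prior_subset * like_subset
    denominator = numerator + (1 - prior_subset) * like_superset
    return numerator / denominator


if __name__ == "__main__":
    for n in (0, 1, 5, 10):
        print(n, float(posterior_subset(n)))
```

No single data point ever rules out the superset grammar here; it just becomes less probable because the extra sentence types it predicts keep failing to show up.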


(7) Section 4.1, on the importance of empirical details: I really appreciate the pitch to make proposals account for specific empirical details. This is something near and dear to my heart. Don’t just tell me your $beautiful_theory will solve all my language acquisition problems; show me exactly how it solves them, one by one. (Minimalism, I’m looking at you. And to be fair, that’s exactly what the next-to-last sentence of section 4.1 says.)

Monday, May 1, 2017

Thoughts on Han et al. 2016 + Piantadosi & Kidd 2016 + Lidz et al. 2016

As with our previous reading, I really appreciate the clarity with which the arguments are laid out by H&al2016, P&K2016’s reply, and L&al2016’s reply-to-the-reply. I can also see where some confusion is arising in the debates surrounding this — there seems to be genuine ambiguity in the way terminology is used to describe the different perspectives about the source of linguistic knowledge (e.g., what “endogenous” actually refers to — more on this below). I also really like seeing a clear, concrete example of solving an induction problem that involves fairly abstract knowledge, and using knowledge internal to the learner to do so.

Specific thoughts:

(1) Endogenous: 
It’s interesting that the basic distinction drawn in the opening paragraph of H&al2016 is between domain-general vs. language-specific innate mechanisms, which is different than simply endogenous vs. not (that is, it’s a question of which endogenous it is): “…did the data…allow for construction of knowledge through general cognitive mechanisms…or did that experience play more of a triggering role, facilitating the expression of abstract core knowledge…”

I think the reply by P&K2016 hits on an interesting terminology issue. For H&al2016, endogenous means “internal to the child”; in contrast, P&K2016 seem to go with the narrower definition of “genetically specified with no external influence”. This then makes P&K2016 question what to make of parents having different grammars than their kids. For H&al2016, I think the point is simply that something internal to the child, and not solely genetic, is responsible. It’s possible that the internal something developed from a combination of genetics & other data experience, but it’s clearly something that can differ between parents and children. (General point: Just because something’s genetic doesn’t mean it doesn’t interact with the environment to produce the observed result. Concrete example: Height depends on genetics and nutrition.)

This issue of what kind of endogenous knowledge is involved (rather than simply whether the knowledge is endogenous) is also something P&K2016 pick up on in their reply. They specifically bring up domain-general endogenous factors as possibilities (“differences in memory, motivation, or attention”) and note that the “root cause of the variation may not even be linguistic”. This, as far as I can tell, doesn’t go against H&al2016’s original point. So, it seems like P&K2016 are targeting a more specific position than H&al2016 argued in their paper, though H&al2016’s initial introductory wording suggested that more specific position.

I think L&al2016’s reply-to-the-reply reflects the ambiguity in this position — they note that their paper provides evidence for “endogenous linguistic content”. While the basic reading of this is simply “knowledge about language that’s internal” (and so silent about whether the origin of this knowledge is domain-specific or domain-general), I think it’s easy to interpret this as arguing for the origin of that knowledge to also be language-specific. The final paragraph of L&al2016’s reply underscores this interpretation, as they argue against domain-general mechanisms like memory, attention, and executive function being the source of the endogenous linguistic knowledge. And that, of course, is what P&K2016 (and many others) aren’t fond of. 


(2) Empiricism, P&K2016’s closing: What’s a “reasonable version” of empiricism? My (perhaps naive) understanding was that empiricism believes everything is learned and nothing is innate, which I didn’t think anyone believed anymore. I thought that as soon as you believe even one thing is innate (no matter what flavor of innate it is), you’re by definition a nativist. Maybe this is another example of terminology being used differently by the different perspectives.


(3) One of the interesting things about the experiments in H&al2016 is that the experimental stimuli could be the driving force of grammatical choice. That is, there’s a possibility that people did have multiple grammars before the experiment, but selected one during the course of the experiment and then learned it. This is one way that could happen:

(a) When finally presented with data that require a choice in the verb-raising parameter, participants make that choice. 
(b) Primed by the previous choice (which may have involved some internal computation that was effortful and which they don’t want to repeat), participants stick with it throughout the first test session, thereby reinforcing that choice. 
(c) This prior experience is then reactivated in the second test session a month later, and used as a prior in favor of whichever option was previously chosen. 

If this is what happened, then by the act of testing people, we enable the convergence on a single option where there were previously multiple ones - how quantum mechanics of us…
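Steps (a)-(c) can be sketched as a toy model, just to show that the story hangs together. This is purely my own illustration and not anything H&al2016 propose or test: the function name `run_sessions`, the linear reinforcement rule, and the parameter values (`boost`, `retention`, `trials_per_session`) are all hypothetical.

```python
def run_sessions(initial_choice, trials_per_session=10, boost=0.1, retention=0.5):
    """Track the weight on grammar 'A' across two test sessions.

    initial_choice: 'A' or 'B', the option picked at step (a).
    boost: hypothetical per-trial reinforcement toward the chosen grammar.
    retention: hypothetical fraction of the learned bias that survives
        the month between sessions.
    Returns (weight after session 1, prior at session 2, weight after session 2).
    """
    weight_a = 0.5  # before testing: both grammars equally available
    target = 1.0 if initial_choice == "A" else 0.0

    # (a) + (b): the first choice is repeated and reinforced within session 1.
    for _ in range(trials_per_session):
        weight_a += boost * (target - weight_a)
    after_session_1 = weight_a

    # Between sessions: part of the bias fades back toward the neutral 0.5.
    prior_session_2 = 0.5 + retention * (after_session_1 - 0.5)

    # (c): session 2 starts from that biased prior and reinforces it again.
    weight_a = prior_session_2
    for _ in range(trials_per_session):
        weight_a += boost * (target - weight_a)

    return after_session_1, prior_session_2, weight_a


if __name__ == "__main__":
    print(run_sessions("A"))
    print(run_sessions("B"))
```

Under these assumptions, a learner who starts out indifferent ends up looking like they have a single stable grammar, with which one depending only on the first choice made during testing.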