Sunday, December 1, 2019

Some thoughts on Ud Deen & Timyam 2018

I really appreciate seeing a traditional Universal Grammar (UG) approach to learnability spelled out so clearly, especially coupled with clear behavioral data about the phenomenon in question (condition C). It makes it easier to see where I agree vs. where I’m concerned that we’re not being fair to alternative accounts (or even characterizing linguistic nativist vs. non-linguistic nativist accounts accurately). What really struck me, after the learnability discussions at various points (more on this below), is that I really want a computational cognitive model that tries to learn the condition C facts about child behavior in English and Thai. Once we have a concrete model, we can be explicit about what we’re building in that makes the model behave (or not behave) like children in these languages. Then it makes sense to have a discussion about the nature of the built-in stuff.

Specific thoughts:
(1) Intro, the traditional UG learnability approach: The traditional claim is that if we test kids as young as we can (here, that’s Thai four-year-olds, though in English we’ve apparently tested 2.5-year-olds) and they show a certain type of knowledge, we assume that knowledge is innate. For me, this is a good placeholder: we have to explain how kids have this knowledge by that age. That then means a careful analysis of the input and a reasonable investigation of how that knowledge could be derived from more fundamental building blocks (maybe language-specific building blocks, but maybe not). This goal of course depends on how the knowledge of condition C is represented, which is often exactly what shifts in theoretical debates.

(2) Section 2.2, learnability again: “the negative properties of a language are downright impossible to acquire from the input because children never get evidence of what is impossible in the language.” Of course, it all depends on the representation we think children are learning, and what building blocks go into that representation. That’s the approach Pearl & Sprouse (2013) took for syntactic islands (i.e., subjacency constraints), and it turns out a learner can get away with much more general-purpose knowledge, which may or may not be part of UG.


UD&T2018 walk through how the learning process might work for condition C knowledge (and kudos to them for doing an input analysis! It’s much more convincing when you have actual counts of the relevant phenomena in children’s input). They highlight that children basically hear all the viable combinations of pronoun and name with both coindexed and non-coindexed readings, but hear the problematic structure only with the non-coindexed reading. They then ask why a child wouldn’t just assume the coindexed reading is fine for this structure, too. (That assumption would lead to a condition C violation.)

But the flip side is this: if children get fairly strong evidence that all the other structures allow coindexed readings, but never see this structure with one, why would they assume it can be coindexed? This really comes back to expectations and indirect evidence. If you keep expecting something to occur, and it keeps not occurring, you start shifting probability toward the option that it can’t in fact occur. To make this indirect-evidence account work, children would have to expect the coindexed reading to occur and keep seeing it not occur. This doesn’t seem implausible to me, but it would help to have an existence proof. (For instance, what causes them to have this expectation, and what restricts the set of structures they expect to allow coindexing?)
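To make the indirect-evidence logic a bit more concrete, here’s a minimal sketch (not anything UD&T2018 propose) of a Bayesian learner comparing two hypotheses about the problematic structure: H_allow (coindexation is possible) and H_block (it isn’t). The expected rate of coindexed readings under H_allow (`p_coindexed`) and the 50/50 prior are made-up assumptions, and the function name is hypothetical.

```python
# Hypothetical sketch: indirect negative evidence as Bayesian updating.
# H_allow: coindexation is allowed, so the learner expects a coindexed reading
#          with (assumed, made-up) probability p_coindexed on each occurrence.
# H_block: coindexation is blocked, so only non-coindexed readings occur.
# Each observed non-coindexed reading is fully expected under H_block but
# somewhat surprising under H_allow, so H_block gradually gains probability.

def posterior_block(n_noncoindexed: int, p_coindexed: float = 0.3,
                    prior_block: float = 0.5) -> float:
    """Posterior probability of H_block after n non-coindexed observations."""
    prior_allow = 1.0 - prior_block
    like_block = 1.0                                     # P(data | H_block)
    like_allow = (1.0 - p_coindexed) ** n_noncoindexed   # P(data | H_allow)
    evidence = prior_block * like_block + prior_allow * like_allow
    return prior_block * like_block / evidence


if __name__ == "__main__":
    for n in (0, 1, 5, 20):
        print(n, round(posterior_block(n), 3))
```

Setting `p_coindexed` to 0 (the learner has no expectation that a coindexed reading should show up) leaves the posterior stuck at the prior no matter how many examples come in. That’s the point above: absence only counts as evidence if the learner expected the coindexed reading to occur.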

All that said, I’m completely with UD&T2018 that children aren’t unbiased learners. Children clearly have constraints on the hypotheses they entertain. What I’m not sure about is whether those constraints originate as language-specific and innate, or are derived from prior experience. As I mentioned above, it really depends on what building blocks underlie the explicit hypotheses the child is entertaining. It’s perfectly possible for the implicit hypothesis space to be really (even infinitely) large, because of the way building blocks can combine (especially if there’s any kind of recursion). But domain-general biases (e.g., prefer explicit hypotheses that require fewer building blocks) can skew the prior over this implicit hypothesis space in a useful way.
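Here’s one way that kind of simplicity bias could work, as a hedged sketch under illustrative assumptions: an explicit hypothesis is a sequence of k building blocks drawn from an inventory of m blocks, the prior picks k from a geometric distribution, and then picks a sequence of that length uniformly. The parameter names (`m`, `stop_prob`) and functions are hypothetical, not taken from any specific proposal.

```python
# Hypothetical sketch of a domain-general simplicity prior over an
# unbounded hypothesis space built from building blocks.
# Assumptions (illustrative only): a hypothesis is a sequence of k >= 1 blocks
# from an inventory of m blocks; k is geometric with parameter stop_prob, and
# each of the m**k sequences of length k is equally likely given k.
# The space is infinite (k is unbounded), yet the prior still sums to 1 and
# puts more mass on each hypothesis that uses fewer building blocks.

def prior_of_hypothesis(k: int, m: int, stop_prob: float = 0.5) -> float:
    """Prior probability of one specific hypothesis built from k blocks."""
    if k < 1:
        raise ValueError("a hypothesis needs at least one building block")
    p_length = stop_prob * (1.0 - stop_prob) ** (k - 1)  # geometric over k
    return p_length / (m ** k)                            # uniform over sequences


def total_mass_up_to(max_k: int, m: int, stop_prob: float = 0.5) -> float:
    """Total prior mass on all hypotheses with at most max_k blocks."""
    return sum(prior_of_hypothesis(k, m, stop_prob) * m ** k
               for k in range(1, max_k + 1))


if __name__ == "__main__":
    print(prior_of_hypothesis(1, m=5), prior_of_hypothesis(3, m=5))
    print(total_mass_up_to(30, m=5))
```

The design choice that matters is the geometric length prior: it lets the hypothesis space be infinite while still being a proper distribution, and the bias toward fewer building blocks comes for free rather than from anything language-specific.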

I’m also not very convinced about the need for a language-specific Subset Principle: this same preference for a narrower hypothesis, given ambiguous data, falls out of standard Bayesian inference. (Basically, a narrower hypothesis assigns a higher likelihood to an ambiguous data point, one that both hypotheses could generate, than a wider hypothesis does. So boom: we have a preference for the narrower hypothesis that comes from a domain-general learning mechanism.) But okay, if a bias for the narrower hypothesis helps children navigate the condition C acquisition problem, great.
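Here’s a minimal sketch of that Bayesian point (often called the size principle), under made-up assumptions: treat each grammar as a set of structure-reading pairs it can generate, with the narrower (condition-C-obeying) grammar a subset of the wider one, and assume each data point is sampled uniformly from whichever grammar is true. The hypothesis sizes are arbitrary, and the function names are hypothetical.

```python
# Hypothetical sketch of the Bayesian size principle.
# Assumptions (illustrative only): a grammar is a set of structure-reading
# pairs; data points are sampled uniformly from the true grammar; all observed
# points are "ambiguous", i.e., consistent with both the narrow and the wide
# grammar (narrow is a subset of wide).
from fractions import Fraction


def likelihood(hypothesis_size: int, n_points: int) -> Fraction:
    """P(data | h) for n points drawn uniformly from h, all consistent with h."""
    return Fraction(1, hypothesis_size) ** n_points


def posterior_narrow(narrow_size: int, wide_size: int, n_points: int,
                     prior_narrow: Fraction = Fraction(1, 2)) -> Fraction:
    """Posterior on the narrow hypothesis after n ambiguous data points."""
    score_narrow = prior_narrow * likelihood(narrow_size, n_points)
    score_wide = (1 - prior_narrow) * likelihood(wide_size, n_points)
    return score_narrow / (score_narrow + score_wide)


if __name__ == "__main__":
    for n in (0, 1, 3):
        print(n, posterior_narrow(narrow_size=2, wide_size=4, n_points=n))
```

With these toy sizes, the narrow grammar goes from 1/2 (no data) to 2/3 after one ambiguous point and 8/9 after three, with no Subset Principle built in. The preference comes entirely from the uniform-sampling assumption in the likelihood.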

(3) Section 2.4, the nativist vs. non-nativist discussion: It’s illuminating to see the discussion of what would count as “nativist” vs. “non-nativist” here. (Side note: I have a whole discussion about this particular terminology issue in a recent manuscript about Poverty of the Stimulus: Pearl, L. (2019, under review). Poverty of the Stimulus Without Tears. https://ling.auf.net/lingbuzz/004646.)

Using the terms I prefer, UD&T2018 are contrasting a linguistic nativist approach (= their “nativist”), where explicit knowledge of condition C is innately available, with a non-linguistic nativist approach (= their “non-nativist”), where this knowledge is instead derived from the input. Of course, it’s very possible that explicit knowledge of condition C is derived from the input over time while *still* using some innate, language-specific knowledge (maybe something more general-purpose than condition C itself). If so, then we still have a linguistic nativist position, but it’s one that uses the input more than UD&T2018 think linguistic nativist approaches do. What would make an approach non-linguistic nativist is this: the derivation of explicit condition C knowledge from the input never involved language-specific innate knowledge, only domain-general innate knowledge (or mechanisms, like Bayesian inference).

Relatedly, on children initially obeying condition C everywhere (even for bare nominals): this behavior would accord with (relatively) rapid acquisition of explicit condition C knowledge, which initially gets overapplied in Thai. But it’s not clear to me that we can definitively claim linguistic vs. non-linguistic nativist for the necessary knowledge. Again, it all depends on how condition C knowledge is represented and what building blocks kids use (and could track in the input) to construct that knowledge.

Also, with respect to age of acquisition: without an explicit theory of how learning occurs, how do we know that derived approaches to learning condition C couldn’t yield children’s judgment behavior by age four? That is, how do we know that acquisition this way wouldn’t be fast enough? (This is one of my main concerns with claims that if young children know something, they must have been innately endowed with that knowledge.)

(4) Section 4, discussion of Thai children overapplying condition C: UD&T2018 talk about this as children adopting the most restrictive grammar. But honestly, thinking about this in terms of Yang’s Tolerance Principle, couldn’t it be about learning the rule (here: condition C) in the presence of exceptions? Then the grammar is the same, but Thai has exceptions while languages like English don’t. If so, it’s not clear that considerations about more vs. less restrictive grammars apply.
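For reference, the Tolerance Principle threshold itself is easy to state: a rule that applies to N items stays productive as long as its exceptions number at most N / ln N. The sketch below just encodes that inequality. The counts in the usage line are hypothetical, and nothing here says what N and the exceptions would be for condition C in Thai; establishing that would take exactly the kind of input analysis UD&T2018 did.

```python
# Minimal sketch of Yang's Tolerance Principle threshold.
# A rule over N items is productive iff its exceptions e satisfy e <= N / ln N.
# The function name and the example counts are hypothetical.
import math


def tolerates(n_items: int, n_exceptions: int) -> bool:
    """True if a rule over n_items can tolerate n_exceptions and stay productive."""
    if n_items < 2:
        raise ValueError("N / ln N is only meaningful for N >= 2")
    return n_exceptions <= n_items / math.log(n_items)


if __name__ == "__main__":
    # With 100 hypothetical items, the threshold is about 21.7 exceptions.
    print(tolerates(100, 21), tolerates(100, 22))
```

On this view, the interesting question for Thai becomes whether the exceptions to condition C in children’s input fall under or over the threshold at a given point in development, rather than which grammar is more restrictive.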

(5) Section 4, on minimalist approaches to condition C: I really like the idea of considering the building blocks of condition C. But then, isn’t it an open question whether the explicit condition C knowledge constructed from these building blocks has to be innate (and, perhaps more interestingly, whether it’s as constrained as UD&T2018 initially posit)? I’m very willing to believe some building blocks are innate (I’m a nativist, after all), but it seems yet to be determined whether the necessary building blocks are language-specific. And again, even if they are, how much do they constrain the child’s implicit hypothesis space? (This is where it really hit me that I wanted a computational cognitive model of learning condition C facts, and really, a model that replicates both adult and child judgment behavior.)