One thing that stood out to me in this paper was Yang's stance on computational-level vs. algorithmic-level models of syntactic acquisition. Right up front, he establishes his view that algorithmic-level models make the greatest contribution (and this line of discussion continues in section 4, where he seems dismissive of some existing computational-level models). I do have great sympathy for wanting to create algorithmic-level models, but I still believe computational-level models have something to offer. The basic idea for me is this: if you have an ideal learner that can't learn the required knowledge from the available data, this seems like a great starting point for a poverty of the stimulus claim. (It may turn out that some algorithmic-level model doesn't have the same issue, but then you know the "magic" happens in the specific process that algorithmic-level model uses. And maybe that "magic" corresponds to some prior knowledge or innate bias in the learning procedure, etc. At any rate, the ideal learner model has contributed something.)
I also found Yang's discussion of the PAC learnability framework in section 3 enlightening. A couple of comments stood out to me:
- p.6: Yang comments on how to turn infinite grammars finite by ignoring sufficiently long sentences (those that, for example, contain lots of recursion). He notes that few language scientists would find the notion of a finite grammar appealing. On the other hand, I feel like we could have some sympathy for people who believe that arbitrarily long sentences are not really part of the language. Yes, they're part of the language by definition (of what recursion is, for example), but they seem not to be part of the language if we define language as something like "the strings that people could utter in order to communicate". I think Yang's larger point remains, though: the set of strings generated by a grammar of any language is infinite in size.
- In that same paragraph, Yang seems dismissive of the idea that a hypothesis space of probabilistic context-free grammars (PCFGs) is realistic in current model implementations, specifically because the "prior probabilities of these grammars must be assumed". While it may be the case that some models take this approach, I don't think it's necessarily true. If you already have a PCFG, couldn't the prior for the grammar be derived from some combination of the rules' probabilities? (I feel like Hsu & Chater (2010) do something like this with their MDL framework, where the prior is the encoding of the grammar.)
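
To make that last idea concrete, here is a minimal sketch of how a grammar's prior could fall out of its own encoding instead of being stipulated by hand. This is my own toy illustration, not the encoding Hsu & Chater actually use: the uniform per-symbol code, the end-of-rule marker, and the function names `description_length_bits` and `log2_prior` are all assumptions made for the example. The only idea carried over is the MDL one, that the prior is proportional to 2 raised to the negative encoding length, so shorter grammars are a priori more probable.

```python
import math


def description_length_bits(rules):
    """Bits needed to write the grammar down under a naive uniform symbol code.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Assumption: every symbol costs the same number of bits, and each rule is
    terminated by one extra end-of-rule marker.
    """
    symbols = {lhs for lhs, _ in rules} | {s for _, rhs in rules for s in rhs}
    bits_per_symbol = math.log2(len(symbols) + 1)  # +1 for the end-of-rule marker
    # Each rule is written as: lhs, its rhs symbols, then the end marker.
    n_tokens = sum(1 + len(rhs) + 1 for _, rhs in rules)
    return n_tokens * bits_per_symbol


def log2_prior(rules):
    """Unnormalized log2 prior: P(grammar) is proportional to 2 ** -length."""
    return -description_length_bits(rules)


# Two toy grammars: the second adds recursive PP attachment to the first.
small = [("S", ("NP", "VP")), ("NP", ("N",)), ("VP", ("V", "NP"))]
large = small + [("NP", ("NP", "PP")), ("PP", ("P", "NP"))]

print(log2_prior(small), log2_prior(large))
```

Under this toy code the larger grammar gets the lower prior simply because it takes more bits to write down, so nothing about the prior has to be assumed beyond the choice of encoding scheme. (That choice is of course still an assumption, which may be part of Yang's worry.)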