Some more targeted thoughts:
- On p.5, Yu mentions how the learning problem can effectively be thought of as a poverty of the stimulus (PoS) problem, because the learner has to generalize from a finite set of data points to generalizations that cover infinite sets. I do get this, but it does seem like this might be an easier generalization (in this particular case) to make than some of the problems that are traditionally held up as poverty of the stimulus (say, in syntax). This is because the acoustic data points available might be best fit by a generalization that's close enough to the truth - not every data point appears, but enough appear, and they are sufficiently spread out. On the other hand, a harder PoS problem would be one where the data points that appear are most compatible with a generalization that is in fact the wrong one (here, for example, if the true ellipse were actually much bigger than the observed data suggest, and only extended along a particular dimension).
- On p.6, in footnote 1, we can really see the differences in approach to (morpho)syntax taken by linguistics vs. computational linguistics. I believe it's standard in computational linguistics to assume a probability distribution over whatever units you're working with, which has to map them to real values, while in linguistics it's more standard to assume a categorical (discrete) approach. (Though of course there are linguists who adopt a probabilistic approach by default - I just think they're not in the majority in generative linguistic circles.)
- On p.12, Yu notes that there are distinctions between adult-directed and child-directed speech, and justifies the decision to use adult-directed speech. While I can certainly understand the practical motivations for doing this, it would be really good to know how much adult-directed speech differs from child-directed speech, particularly for the acoustic properties that Yu is interested in with respect to tone. I have the (possibly mistaken) impression that there might be quite significant differences.
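
To make the contrast in the first point above a bit more concrete, here is a minimal sketch in Python with NumPy. It is my own toy illustration, not Yu's model: the 2-D Gaussian, its covariance, the sample size, and the cutoff of 1.0 are all assumed values, and `fit_ellipse` is a hypothetical helper. The "easy" learner sees a well-spread sample and recovers roughly the right ellipse; the "hard" learner only sees points from a narrow band along one dimension, so the best fit to the observed data is the wrong generalization.

```python
import numpy as np


def fit_ellipse(points):
    """Fit a Gaussian 'ellipse' to 2-D points: return (mean, covariance)."""
    return points.mean(axis=0), np.cov(points, rowvar=False)


# Assumed ground truth: an ellipse stretched along the first dimension.
rng = np.random.default_rng(0)
true_cov = np.array([[9.0, 0.0],
                     [0.0, 1.0]])
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=5000)

# Easy case: the learner sees a finite but well-spread sample.
_, easy_cov = fit_ellipse(samples)

# Hard case: the learner only sees points near the center along dimension 0,
# so the observed data suggest a much smaller ellipse than the true one.
observed = samples[np.abs(samples[:, 0]) < 1.0]
_, hard_cov = fit_ellipse(observed)

print("true variance along dim 0:     ", true_cov[0, 0])
print("easy estimate (well-spread):   ", easy_cov[0, 0])
print("hard estimate (truncated data):", hard_cov[0, 0])
```

In the easy case the estimated spread along the first dimension should land near the true value, while in the hard case it should come out far too small, even though the fit to the observed points is perfectly good. That second situation is closer to what I have in mind as a traditional PoS problem.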