Tuesday, November 13, 2018

Some thoughts on White et al. 2018

I love seeing syntactic bootstrapping not just as an initial word-learning strategy, but in fact as a continuing source of information (and thus useful for very subtle meaning acquisition). Intuitively, this makes sense since we can learn new words by reading them in context, and as an adult, I think that’s the main way we learn new words. But you don’t see as much work on the acquisition side exploring this idea. Hopefully these behavioral experiments can inform both future cognitive models and future NLP applications.

Other thoughts:

(1) The fact that some verbs have both representational and preferential properties underscores that there’s likely to be a continuum, rather than categorical distinctions. This reminds me of the raising vs control distinction (subject raising: He seemed to laugh; subject control: He wanted to laugh), where there are verbs that seem to allow both syntactic options (e.g., begin: It began to fall (raising) vs. He began to laugh (control)). So, casting the acquisition task as “is this a raising or a control verb?” may actually be an unhelpful idealization — instead of a binary classification, it may be that children are identifying where on the raising-control continuum a verb falls, based on its syntactic usage.

(2) I think what comes out most from the review of semantic and syntactic properties is how everything is about correlations, rather than absolutes. So, we have these semantic and syntactic features, and we have verb classes that involve collections of features with particular values; moreover, there seem to be prototypical examples and less-prototypical examples (where a verb has a bunch of properties, but is exceptional by lacking another that usually clumps together with the first bunch). This means we can very reasonably have a way to make generalizations on the basis of the property clusters that verb classes have, while still allowing exceptions (related verb classes of much smaller size, or connections between shared properties of verb classes, like an overhypothesis about property distributions). I wonder if a Tolerance Principle-style analysis would predict which property clusters people (adults or children) would view as productive, on the basis of their input frequency and specific proposals about the underlying verb class structure.
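To make that last wonder concrete, here's a minimal sketch of a Tolerance Principle-style check, using Yang's threshold of N/ln N tolerated exceptions. The verb counts are made up purely for illustration; a real analysis would plug in input frequencies and a specific verb class structure:

```python
import math

def tolerance_threshold(n: int) -> float:
    """Yang's Tolerance Principle threshold: a rule generalizing over n items
    remains productive if it has at most n / ln(n) exceptions."""
    return n / math.log(n)

def is_productive(n_items: int, n_exceptions: int) -> bool:
    """True if the exception count stays under the tolerance threshold."""
    return n_exceptions <= tolerance_threshold(n_items)

# Hypothetical numbers: a property cluster shared by 50 verbs in the input,
# with 10 verbs that have most of the cluster but lack one property.
print(is_productive(50, 10))  # 50 / ln 50 ≈ 12.8, so 10 exceptions are tolerated → True
```

On this toy calculation, the cluster would still be treated as productive; with 20 exceptional verbs it would not be, since 20 exceeds the ~12.8 threshold.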

(3) Figure 2 is a great visualization for what these verb classes might look like, on the basis of their syntactic frame use. Now, if we could just interpret those first few principal components, we’d have an idea what the high-level properties (= syntactic feature clusters) were… it looks like this is the idea behind the analysis in 3.4.3, where W&al2018 harness the connection between syntactic frames and PCA components.

Side note: Very interesting that bother, amaze, and tell clump together. I wouldn’t have put these three together specifically, but that first component clearly predicts them to be doing the same (negative) thing with respect to that component. Of course, Fig 6 gives a more nuanced view of this.

Also, I love that W&al2018 are able to use their statistical wizardry to interpret their quantitative results and pull out new theoretical proposals for natural semantic classes and the syntactic reflections of these classes. Quantitative theorizing, check!
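To make the PCA idea concrete, here's a toy sketch of pulling components out of a verb-by-frame count matrix. The verbs, frames, and counts below are invented for illustration, not W&al2018's actual data:

```python
import numpy as np

# Hypothetical verb-by-frame counts (rows = verbs, columns = syntactic frames)
verbs = ["think", "know", "want", "tell"]
frames = ["V that S", "V NP to VP", "V whether S"]
X = np.array([
    [90.,  2.,  8.],   # think
    [70.,  1., 29.],   # know
    [ 1., 95.,  4.],   # want
    [40., 30., 30.],   # tell
])

# Normalize each verb's counts to frame proportions, then center the columns
P = X / X.sum(axis=1, keepdims=True)
C = P - P.mean(axis=0)

# PCA via SVD: rows of Vt are the principal components (directions in frame
# space), and U * S gives each verb's coordinates on those components
U, S, Vt = np.linalg.svd(C, full_matrices=False)
scores = U * S

# The loadings in Vt[0] show which frames drive the first component —
# this is the step where one would try to read a component off as a
# high-level syntactic feature cluster
for v, s in zip(verbs, scores[:, 0]):
    print(f"{v:>6}: PC1 = {s:+.2f}")
```

With these toy counts, the first component mostly separates the "V that S" verbs (think, know) from the "V NP to VP" verb (want), which is the kind of interpretable split the 3.4.3 analysis is after.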

(4) Hurrah for learning model targets! If we look for features a verb might have as Table 1 does (rather than set classes, where something must be e.g., representational or preferential but not both, which is a problem for hope), then this becomes a nicely-specified acquisition task to model. That is, given children’s input, can verb classes be formed that have each verb connected with its appropriate property cluster? Moreover, with the similarity judgment data, we can even get a sense of what the adult verb classes look like by clustering the verbs on the basis of their similarity (like in Fig 6).

Another learning-model check would be to put verbs into classes such that the odd-man-out behavioral results are matched, or the similarity judgments are matched. Yet another would be to put verbs into classes that predict which frames each verb prefers or disprefers.

(5) In the general discussion, we see a concrete proposal for the syntactic and semantic features a learner could track, along with necessary links between the two feature types. I wonder if it’s possible to infer the links (e.g., representational-main clause), rather than build them in. This is a version of my standard wonder: “If you think the learner needs explicit knowledge X, is it possible to derive X from more foundational or general-purpose building blocks?”

(6) Typo sadness: That copyediting typo with “Neither 1 nor 1…” in the introduction was tough. It took me a bit to work through the intended meaning, given examples 3-5, but I figured the point was that think doesn’t entail its complement while know does, whether they’re positive or negated uses. Unfortunately, this numbering glitch recurs throughout the first chunk of the paper and in the methods section, where the in-text example numbering got 1-ed out. :(