Monday, October 27, 2014

Some thoughts on Barak et al. 2014

One of the things I really liked about this paper was the additional "verb class" layer, which is of course what allows similarities between verbs to be identified, based on their syntactic structure distributions. This seems like an obvious thing, but I don't think I've seen many incremental models that actually have hierarchy in them (in contrast to ideal learner models operating in batch mode, which often have hierarchical levels in them). So that was great to see. Relatedly, the use of syntactic distributions from other verbs too (not just mental state verbs and communication/perception verbs) feels very much like indirect positive evidence (Pearl & Mis 2014 terminology), where something present in the input is informative, even if it's not specifically about the thing you're trying to learn. It's also nice to see more explicit examples of that. Here, this indirect positive evidence provides a nice means to generalize from communication/perception verbs to mental state verbs.

I also liked the attention spent on the perceptual coding problem (I'm using Lidz & Gagliardi 2014 terminology now) as it relates to mental state verbs, since it definitely seems true that mental state concepts/semantic primitives are going to be harder to extract from the non-linguistic environment, as compared to communication events or perception events.

More specific comments:

(1) Overview of the Model, "The model also includes a component that simulates the difficulty of children attending to the mental content...also simulates this developing attention to mental content as an increasing ability to correctly interpret a scene paired with an SC utterance as having mental semantic properties." -- Did I miss where it was explained how this was instantiated? This seems like exactly the right thing to do, since semantic feature extraction should be noisy early on and get better over time. But how did this get implemented? (Maybe it was in the Barak et al. 2012 reference?)
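One way this could have been implemented (purely my speculation, not necessarily Barak et al.'s actual mechanism) is a probability of correct mental-content interpretation that starts low and rises with exposure. All names and the linear schedule below are my assumptions:

```python
import random

def interpret_scene(t, t_max=10_000, p_floor=0.1):
    """Hypothetical sketch: the learner's chance of correctly
    recognizing the mental content of a scene paired with an SC
    (sentential-complement) utterance grows with exposure t.
    The linear schedule and parameter names are assumptions,
    not Barak et al.'s actual implementation."""
    p_correct = min(1.0, p_floor + (1.0 - p_floor) * t / t_max)
    # With probability p_correct the mental semantic properties are
    # extracted; otherwise the scene is misread as non-mental.
    return "mental" if random.random() < p_correct else "non-mental"
```

Early on (small t), the scene would mostly come out "non-mental"; by t_max it would always be interpreted correctly, which matches the described "increasing ability."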

(2) Learning Constructions of Verb Usages, "...prior probability of cluster P(k) is estimated as the proportion of frames that are in k out of all observed input frames, thus assigning a higher prior to larger clusters representing more frequent constructions." -- This reminds me of adaptor grammars, where both type frequency and token frequency have roles to play (except, if I understand this implementation correctly, it's only token frequency that matters for the constructions, and it's only at the verb class level that type frequency matters, where type = verb).
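If I've read the quoted estimate correctly, the cluster prior is just a token-frequency proportion, something like the following minimal sketch (variable names are mine):

```python
from collections import Counter

def cluster_priors(assignments):
    """Prior P(k) for each construction cluster k, estimated as the
    proportion of observed input frames assigned to k -- so larger
    clusters (more frequent constructions) get higher priors.
    A minimal sketch of the quoted estimate, not the paper's code."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {k: n / total for k, n in counts.items()}

# e.g., 6 frames in a "transitive" cluster and 2 in an "SC" cluster:
# cluster_priors(["transitive"]*6 + ["SC"]*2)
# -> {"transitive": 0.75, "SC": 0.25}
```

Every observed frame token counts here, which is why I read this as token frequency doing the work at the construction level.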

(3) Learning Verb Classes, "...creation of a new class for a given verb distribution if the distribution is not sufficiently similar to any of those represented by the existing verb classes.", and the new class is a uniform distribution over all constructions. This seems like a sensible way to get at the same thing generative models do by having some small amount of probability assigned to creating a new class. I wonder if there are other ways to implement it, though. Maybe something more probabilistic where, after calculating the probabilities of it being in each existing verb class and the new uniform distribution one, the verb is assigned to a class based on that probability distribution. (Basically, something that doesn't use the argmax, but instead samples.)
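The sampling alternative I have in mind would look something like this (a hypothetical sketch, not the paper's implementation; the scores dict would include a "new class" entry backed by the uniform distribution over constructions):

```python
import random

def assign_class(scores, sample=False, rng=random):
    """scores: dict mapping class id -> (unnormalized) probability of
    the verb's frame distribution under that class, including a
    'new class' entry. sample=False reproduces the argmax behavior;
    sample=True draws a class in proportion to the probabilities.
    Hypothetical sketch, not Barak et al.'s actual procedure."""
    if not sample:
        # Deterministic: pick the best-scoring class.
        return max(scores, key=scores.get)
    # Probabilistic: sample a class weighted by its score.
    classes = list(scores)
    weights = [scores[c] for c in classes]
    return rng.choices(classes, weights=weights, k=1)[0]
```

Sampling rather than taking the argmax would occasionally place a verb in a lower-probability class, which might matter for how early misclassifications (and later recovery from them) play out.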

(4) Generation of Input Corpora, "...frequencies are extracted from a manual annotation of a sample of 100 child-directed utterances per verb" -- I understand manual annotation is a pain, but this doesn't seem like all that many per verb. Though I suppose if there are only 4 frames they're looking at, it's not all that bad. That being said, the full range of syntactic frames is surely much larger, so if they were looking at that full range, it seems like they'd want more than 100 samples per verb.

(5) Set-up of Simulations: "...we train our model on a randomly generated input corpus of 10,000 input frames" -- I'd be curious about how this amount of input maps onto the amount of input children normally get to learn these mental state verbs. It actually isn't all that much input. But maybe it doesn't matter for the model, which settles down pretty quickly to its final classifications?

(6) Estimating Event Type Likelihoods: "...each verb entry in our lexicon is represented as a collection of features, including a set of event primitives...think is {state, cogitate, belief, communicate}" -- I'm very curious as to how these are derived, as some of them seem very odd for a child's representation of the semantic content available. (Perhaps automatically derived from existing electronic resources for adult English? And if so, is there a more realistic way to instantiate this representation?)

(7) Experimental Results: "...even for Desire verbs, there is still an initial stage where they are produced mostly in non-mental meaning." -- I wish B&al had had space for an example of this, because I had an imagination fail about what that would be. "Want" used in a non-mental meaning? What would that look like for "want"?

Lidz, J. & Gagliardi, A. 2014, to appear. How Nature Meets Nurture: Universal Grammar and Statistical Learning. Annual Review of Linguistics.

Pearl, L. & Mis, B. 2014. The role of indirect positive evidence in syntactic acquisition: A look at anaphoric one. Manuscript, UCI. [lingbuzz:]
