Computational Models of Language (at UC Irvine): June 2013

Overall, this paper is concerned with the extent to which children possess abstract knowledge of syntax, and more specifically, children’s ability to acquire generalizations about verb alternations. The authors present two models for the purpose of illustrating that information relevant to verb alternations can be acquired through observations of how verbs occur with individual arguments in the input.

My main point of confusion in this article was and still is about the features used to represent the lowest level of abstraction in the models. The types of features used seem to me to already assume a lot of prior abstract syntactic knowledge… The authors state, “We make the assumption that children at this developmental stage can distinguish various syntactic arguments in the input, but may not yet recognize recurring patterns such as transitive and double-object constructions”, but this assumption still does not quite make sense to me. In order to have a feature such as “OBJ”, don’t you have to have some abstract category for objects? Some abstract representation of what it means to be an object? This seems like more than just a general chunking of the input into constituents because for something to be an object, it has to be in a specific relationship with a verb. So how can you have this feature without already having abstract knowledge of the relationship of the object to the verb? If this type of generalized knowledge is not what is meant, maybe it is just the labels given to these features that bothers me. It seems to me that once a learner has figured out what type each constituent is (OBJ, OBJ2, COMP, PP, etc), the problem of learning generalizations of constructions becomes simple – just find all the verbs that have OBJ and OBJ2 after them and put them into a category together. Even after reading this article twice and discussing it with the class, I am still really missing something essential about the logic behind this assumption.

A few points regarding verb argument preferences:

The comparison of the two models in the results for verb argument preferences seems completely unsurprising… Is this not what Model 1 was made to do? If so, then I would not expect any added benefit from Model 2, but it is unclear what the authors’ expectations were regarding this result.
What is the point of comparing two very similar constructions (prepositional dative and benefactive)? The only difference between these two is the preposition used, so being able to distinguish one from the other does not require abstract syntactic knowledge… as far as I can tell, the differences occur at the phonological level and at the semantic level.
I am curious about the fact that both models acquired approximately 20 different constructions… What were these other constructions and why did they only look at the datives?

A few points regarding novel verb generalization:

I found the comparison of the two models in the results for novel verb generalization to be rather difficult to interpret… In particular, I think organizing the graph in a different way could have made it much more visually interpretable – one in which the bars for model 1 and model 2 were side-by-side on the same graph rather than on separate graphs displayed one above the other. I also would have liked some discussion of the significance of the differences discussed – They say that in comparing Model 2 with Model 1, the PD frame is now more likely than the SC frame, although only slightly. Perhaps just because I’m not used to looking at log likelihood graphs, it is unclear to me whether this difference is significant enough to even bother mentioning because it is barely noticeable on the graph.
On the topic of the behavior observed in children, the authors note that high-frequency verbs tend to be biased toward the double-object form. However, children tend to be biased toward the prepositional dative form. But even in the larger corpus, only about half of the verbs are prepositional-biased, and it is suggested that these are low frequency. So, what is a potential explanation for the observed bias in children? Why would they be biased toward the prepositional dative form if is the low-frequency verbs that are biased this way? This doesn’t make intuitive sense if children are doing some sort of pattern-matching. I would expect children to behave like the model – to more closely match the biases of the high-frequency verbs and therefore prefer to generalize to the double-object construction from the prepositional dative. I think that rather than simply running the model on a larger corpus, it would be useful to construct a strong theory for why children might have this bias and then construct a model that is able to test that theory.

I really liked how this paper tackled a really big problem head on. It's inclusion in subsequent works speaks strongly for the interest in this kind of research. I would really like to see more language papers set a high bar like this and establish a framework for achieving it.

My largest concern about this paper is the fact that the authors seemed to feel that human-guided learning can overcome some of the deficits in the model framework. The large drop off in precision (from 90% to 57%) is not surprising as methods such as the Coupled SEAL and Coupled Morphological Classifier are not robust in the face of locally optimal solutions; it is inevitable that as more and more data is added, the fitness will decline, because the models are already anchored to their fit of previous data. Errors will beget errors, and human intervention will only limit this inherent multiplication.

These errors are further compounded by the fact that the framework does not take into account the degree of independence between its various models. Using group and individual model thresholds for decision making is a decent heuristic, but it is unworkable as an architecture because guaranteeing each model's independence is a hard constraint on the number and types of models that can be used. I believe the framework would be better served by combining the underlying information in a proper, hierarchical framework. By including more models that can inform each other, perhaps the necessity of human-supervised learning can be kept to a minimum.

Computational Models of Language (at UC Irvine)

Friday, June 7, 2013

Some thoughts on Parisien and Stevenson 2010

Thursday, June 6, 2013

Some thoughts on Carlson et al. 2010

People who think this blog is awesome

Members