Computational Models of Language (at UC Irvine): Some thoughts on Wellwood et al. 2016

In general, I love seeing this combination of behavioral and modeling work, and I’m also a big fan of the cue reliability vs. cue accessibility approach that Gagliardi’s work tends to have (ex: Gagliardi et al. 2012, 2014). That said, I had some difficulty following the details of the model that worked best (Model 4), and I’d really like to understand it better (more about this below).

Specific thoughts:

(1) The partitive frame: The partitive frame (“Gleeb of the cows are by the barn”) is an excellent syntactic signal for adults that the word is a quantity word (exact number, quantifier like “most”, etc.). So, that would signal the sense of numerosity, either exact or approximate. Based on the Wynn (1992) work, it seems like two-and-a-half-year-olds recognize this numerosity-ness of exact number words. Yet I wonder how prevalent the partitive frame is. I’m sure someone must have done a corpus analysis of child-directed speech (I’m thinking of some of the former students of Barbara Sarnecka here at UCI).

My intuition is that the partitive frame itself isn’t all that common. (Note: W&al2016 mention a corpus analysis by Syrett et al. 2012 of the partitive frame itself that shows this frame isn’t unambiguous for numerosity words, but don’t mention how often the partitive frame itself occurs.) My intuition might be wrong, but if it’s true, I wonder what other cues are available syntactically in order for numerosity to be associated with exact number words so early. Maybe a more general syntactic distribution sort of thing? This may be important, given that the partitive frame isn’t an unambiguous cue to four-year-old children that the meaning is numerosity-focused.

On the other hand, for the behavioral results, W&al2016 are working with four-year-olds who may have more experience with the partitive frame in their input. Certainly, the partitive frame appears to be a very reliable cue to numerosity meanings (at least when the novel word is in the determiner position: [Det position] “gleebest of the…” vs. [Adj position] “The gleebest of the…”).

It was also useful to learn (in the next section) that determiners only refer to quantities cross-linguistically, so it seems to be a mapping that languages use. A lot. (Interesting question: Where would this bias come from? Built-in (i.e., UG) or (always) derivable somehow?)

(2) A role for informativity?: The example in (5) about why we can’t say “heaviest of the animals” (but we can say “most of the animals”) reminds me of Greg Scontras’s work on informativity, where, for example, adjective ordering preferences depend on how much uncertainty there is on the part of the listener (ex: big red boxes vs. *red big boxes, as found in Scontras et al. 2015). I wonder if there’s a useful link there from the developmental perspective about which words get mapped to the determiner position. (Or perhaps, why the link between determiner position and permutation-invariant words like “most” would be established.)

(3) The most of the cows were…: My dialect of English utterly fails to allow 6b (“The most of the cows were by the barn”). I can say something like “The majority of…”, but I just can’t handle “The most of the…”. Hopefully this cue (appearance in the adjectival position of the partitive frame) isn’t too critical a property of superlative acquisition in general. I guess in my case, it’s a cue for ruling out the numerosity meaning and really zeroing in on the quality meaning. So maybe the syntactic cues are more reliable in my dialect than in one that allows “The most of the cows”? (That is, in the experimental stimuli in Table 1, “the gleebest of the cows” isn’t a confounded cue for me. It’s strictly a quality-meaning indicator when the word has the -est morphology.) So, following up on this, I’m not surprised that in Figure 3 the Adjective [“the gleebest cows”] and Confounded [“the gleebest of the cows”] results look alike (i.e., children infer a quality meaning for “gleebest” in these contexts).

(4) Understanding the model variants:

(a) Model 3 (Lexical + Conceptual bias): I couldn’t quite tell from the text, but does joint prior mean there’s equal weight for the lexical and the conceptual prior?

(b) Model 4 (+Perceptual Bias): W&al2016 describe it as “…combining the lexical prior with the intuition that salience impacts how the likelihood, P(d|h), could be encoded with differing reliability for each hypothesis.”

While I deeply appreciate Tables 5 and 7, I really wish we could see the full equation that the alpha, beta, and gamma terms go into. If I’m interpreting the text correctly, these parameters are meant to alter the likelihood calculation. In Models 1-3, the likelihood is 1 and all the work is done in the prior. In Model 4, the likelihood is 1 with some probability depending on alpha and beta (and otherwise it isn’t computed, i.e., it’s 0?). And then Model 4 is a mixture model of all four of the encoding options (A, B, C, and D). So are these weighted somehow when they’re combined into one probability?

I think they may be weighted based on the alpha and beta parameters (“…combined in a mixture model (the sum of all four terms)” = the alpha, beta, gamma, etc. parameters are the weights). My natural inclination is to think “with probability alpha, A happens; with probability beta, B happens; with probability alpha * beta, C happens; with probability 1 - p(A or B or C), D happens”, which would then lead to a simple summation like they describe.

Small thing: Also, based on Figure 5, where did the values in Table 6 come from (alpha = quantity confusion = 0.2, beta = quality confusion = 0.025)?
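
To make my reading of Model 4 in (4b) concrete, here is a minimal Python sketch of the “simple summation” version. It is only my interpretation of the text, not the equation from W&al2016. The function names, the `compatible` table (which hypotheses each encoding leaves standing), and the 50/50 prior are all made up for illustration. The only numbers taken from the paper are the alpha and beta values from Table 6 quoted above.

```python
def mixture_weights(alpha, beta):
    """Weights for encodings A-D under my reading: alpha, beta, alpha*beta, remainder."""
    weights = {"A": alpha, "B": beta, "C": alpha * beta}
    weights["D"] = 1.0 - sum(weights.values())
    if weights["D"] < 0:
        raise ValueError("alpha + beta + alpha*beta exceeds 1, so D would get negative weight")
    return weights


def mixture_likelihood(weights, compatible, hypothesis):
    """P(d|h) as a weighted sum over encodings, where each encoding contributes 1 or 0."""
    return sum(w for enc, w in weights.items() if hypothesis in compatible[enc])


def posterior(prior, weights, compatible):
    """Bayes' rule over the hypotheses, using the mixture likelihood."""
    unnorm = {h: prior[h] * mixture_likelihood(weights, compatible, h) for h in prior}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}


# alpha and beta as quoted from Table 6; everything below them is hypothetical.
weights = mixture_weights(alpha=0.2, beta=0.025)

# Placeholder assumption (NOT from the paper): which hypotheses survive each encoding.
compatible = {
    "A": {"quality"},
    "B": {"quantity"},
    "C": set(),
    "D": {"quantity", "quality"},
}

# Placeholder flat prior, just to show the likelihood doing some of the work.
prior = {"quantity": 0.5, "quality": 0.5}
post = posterior(prior, weights, compatible)

print(weights)
print(post)
```

Two things fall out of writing it this way. First, with alpha = 0.2 and beta = 0.025, options A, B, and C together take up 0.23 of the probability, so D carries the remaining 0.77. Second, this reading only gives a proper mixture when alpha + beta + alpha * beta is at most 1, which is one more reason I’d like to see the actual equation.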

References

Gagliardi, A., Feldman, N. H., & Lidz, J. (2012). When suboptimal behavior is optimal and why: Modeling the acquisition of noun classes in Tsez. In Proceedings of the 34th annual conference of the Cognitive Science Society (pp. 360-365).

Gagliardi, A., & Lidz, J. (2014). Statistical insensitivity in the acquisition of Tsez noun classes. Language, 90(1), 58-89.

Scontras, G., Degen, J., & Goodman, N. D. (2015). Subjectivity predicts adjective ordering preferences. Manuscript, http://web.stanford.edu/~scontras/Gregory_Scontras.html.

Wednesday, June 1, 2016
