Computational Models of Language (at UC Irvine): Some thoughts on Qing & Franke 2014

One of the things I really liked about this article was the attention paid both to formalizing the pragmatic facts and to explaining the intuitions behind the model. (That being said, I probably could have benefited from more of a primer on degree semantics, since I had some trouble following exactly how it was instantiated in the model.) Still, Q&F2014’s point is to demonstrate the utility of certain assumptions for learning about degree adjectives and then to rigorously evaluate them using standard Bayesian methods, and I think they succeeded on that computational-level goal. In general, I suspect the length constraint was a big issue for this paper: so much was packed in that of course many things had to be glossed over. I do wish Q&F had spent a bit more time on the discussion and conclusions, however. I was left wondering exactly what to make of these results as someone who cares about how acquisition works. For instance, what does an individual need to learn (c, theta?) vs. what’s already built in (lambda?)?

Some more targeted thoughts:

(1) p.2, “…to measure how efficient a standard theta for ‘tall’ is for describing basketball players, we calculate on average how likely the speaker will manage to convey the height of a random basketball player by adopting that standard.” This text sounds like the goal is to convey exact height rather than relative height (importantly, is the player in question “tall” relative to the standard theta?). But it seems like relative height would make more sense. (That is, “tall” doesn’t tell you whether the player is 7’1” or 7’2”, but rather that he’s tall compared to other basketball players, as captured by that theta.)

(2) p.2, c: I admit, I struggled to understand how to interpret c specifically. I get the general point that c captures a tradeoff between communicative efficiency and absolute general applicability (side note: which means…? That the adjective always applies, I think?). But what does it mean for communicative efficiency to dominate absolute general applicability (with c close to 0)? That the adjective doesn’t always apply? I guess c is something of a noise factor, more or less. And then there’s a second noise factor, lambda, which is the degree of rationality in an individual.

(3) p.3, Parameters Learning section: c_A is set to range between -1 and 0. Given the interpretations of c we just got on p.2, does this mean Q&F are assuming that the adjectives they investigate (big, dark, tall, full) are generally inapplicable (and so have a higher theta baseline to apply), since c can only be negative if it’s non-zero? It doesn’t seem unreasonable, but if so, this is an assumption they build into the learner. Why not allow it to range from -1 to 1, and allow the learner to assume positive c values are a possibility?

(4) p.6, Conclusion: “Combining the idea of pragmatic reasoning as social cognition…” — Since they’re just looking at individual production in their model (and individual judgments in their experiment), where is the social cognition component? Is it in how the baseline theta is assessed? Something else?

(5) p.6, Conclusion: “…we advanced the hypothesis that the use of gradable adjectives is driven by optimality of descriptive language use.” — What does this mean exactly? How does it contrast with optimal contextual categorization and referential language use? This is definitely a spot where I wish they had had more space to explain, since this seems to get at the issue of how we interpret the results here.
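
Coming back to point (1): here is a toy sketch of the exact-height reading, just to make my worry concrete. It is my own simplification and not Q&F's actual formula. Heights are discrete, the prior is made up, the speaker says "tall" exactly when the height is at least theta (and says nothing otherwise), and a literal listener guesses a height from the prior restricted to whichever side of theta the utterance indicates. The name `expected_success` and all the numbers are hypothetical.

```python
def expected_success(prior, theta):
    """Average chance that a literal listener recovers the exact height.

    prior: dict mapping height (inches) -> probability (assumed to sum to 1).
    theta: the standard; the speaker says "tall" iff height >= theta.
    This is an illustrative toy, not the formula from Q&F 2014.
    """
    below = {h: p for h, p in prior.items() if h < theta}   # speaker stays silent
    above = {h: p for h, p in prior.items() if h >= theta}  # speaker says "tall"
    total = 0.0
    for cell in (below, above):
        mass = sum(cell.values())
        if mass == 0:
            continue  # this utterance is never used for this theta
        # P(true height is h) * P(listener guesses h | utterance)
        total += sum(p * (p / mass) for p in cell.values())
    return total


# Hypothetical prior over basketball players' heights, in inches.
prior = {76: 0.1, 78: 0.2, 80: 0.4, 82: 0.2, 84: 0.1}
scores = {theta: expected_success(prior, theta) for theta in sorted(prior)}

for theta, score in scores.items():
    print(theta, round(score, 3))
```

Under these assumptions the score is the average chance of recovering the exact height, which is what the quoted text seems to describe. A relative-height goal would only require the listener to recover which side of theta the player falls on, and with this speaker that always succeeds, so the two goals really do come apart.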

Monday, November 17, 2014