Wednesday, May 18, 2016

Some thoughts on Moscati & Crain 2014

This paper really highlights to me the impact of pragmatic factors on children’s interpretations, something that I think we have a variety of theories about but maybe not as many formal implementations of (hello, RSA framework potential!). Also, I’m a fan of the idea of the Semantic Subset, though not as a linguistic principle, per se.  I think it could just as easily be the consequence of Bayesian reasoning applied over a linguistically-derived hypothesis space. But the idea that information strength matters is one that seems right to me, given what we know about children’s sensitivities to how we use language to communicate. 
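Since I brought up Bayesian reasoning over a linguistically-derived hypothesis space: here's a quick sketch (my gloss, not M&C2014's formalization) of how a Semantic Subset preference could fall out of the Bayesian size principle. The situation counts are invented for illustration — the point is just that a subset (strong) interpretation gets favored after a handful of observations consistent with both interpretations.

```python
def posterior(sizes, prior, n_observations):
    """Posterior over interpretation hypotheses after n observations,
    each consistent with every hypothesis; likelihood per observation
    is 1/|h| (the size principle), so smaller hypotheses win."""
    unnorm = [p * (1.0 / s) ** n_observations for p, s in zip(prior, sizes)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical numbers: the strong (subset) reading is true in 2
# situation types, the weak (superset) reading in 5.
print(posterior(sizes=[2, 5], prior=[0.5, 0.5], n_observations=0))  # [0.5, 0.5]
print(posterior(sizes=[2, 5], prior=[0.5, 0.5], n_observations=5))  # subset reading dominates
```

So no subset *principle* per se is needed — just a learner tracking which situations the utterances actually occur in.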

That being said, I’m not quite sure how to interpret the specific results here (more details on this below). Something that becomes immediately clear from the learnability discussions in M&C2014 is the need for corpus analysis to get an accurate assessment of what children’s input looks like for all of these semantic elements and their various combinations.

Specific thoughts:

(1) Epistemic modals and scope
John might not come. vs. John cannot come: I get that might not is less certain than cannot, and so an entailment relation holds (cannot come entails might not come). But the scope assignment ambiguity within a single utterance seems super subtle.

(a) Ex: John might not come. 

surface: might >> not: It might be the case that John doesn’t come. (Translation: There’s some non-zero probability of John not coming.)
inverse: not >> might: It’s not the case that John might come. (Translation: There’s 0% probability that John might come.  = John’s definitely not coming.)

Even though the inverse scope option is technically available, do we actually ever entertain that interpretation in English? It feels more to me like “not all” utterances (ex: “Not all horses jumped over the fence”) — technically the inverse scope reading is there (all >> not = “none”), but in practice it’s effectively unambiguous in use (always interpreted as “not all”).

(b) Ex: John cannot come.
surface: can >> not: It can be the case that John doesn’t come. (Translation: There’s some non-zero probability of John not coming.)
inverse: not >> can: It’s not the case that John can come. (Translation: There’s 0% probability that John can come. = John’s definitely not coming.)

Here, we get the opposite feeling about how can is used. It seems like the inverse scope is the only interpretation entertained. (And I think M&C2014 effectively say this in the “Modality and Negation in Child Language” section, when they’re discussing how can’t and might not are used in English.)
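The scope paraphrases in (a) and (b) can be sanity-checked in a toy possible-worlds setup (my own sketch, not the paper's formalism), where a modal base is just the set of worlds the speaker considers live:

```python
# Toy possible-worlds check of the scope readings above.

def might_not(p, worlds):   # surface scope of "might not": might >> not
    return any(not p(w) for w in worlds)

def cannot(p, worlds):      # inverse scope of "cannot": not >> can
    return not any(p(w) for w in worlds)

comes = lambda w: w == "comes"   # worlds labeled by whether John comes

# "John cannot come" entails "John might not come": in every modal
# base where cannot() holds, might_not() holds too.
bases = [["comes"], ["stays"], ["comes", "stays"]]
assert all(might_not(comes, b) for b in bases if cannot(comes, b))
```

The converse fails (with base ["comes", "stays"], might_not is true but cannot is false), which is exactly the information-strength asymmetry the Semantic Subset story cares about.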

I guess the point for M&C2014 is that this is the salient difference between might not and cannot. It’s not surface word order, since that’s the same. Instead, the strongly preferred interpretation differs depending on the modal, and it’s not always the surface scope reading. This is what they discuss as a polarity restriction in the introduction, I think. (Though they talk about might allowing both readings, and I just can’t get the inverse scope one.)

(2) Epistemic modals, negation, and input: Just from an input perspective, I wonder how often English children hear can’t vs. cannot (and then we can compare that to mightn’t vs. might not). My sense is that can’t is much more frequent than cannot, while might not is much more frequent than mightn’t. One possible learning story component: the reason we have a different favored interpretation for cannot is that we first encounter it as a single lexical item can’t, and so treat it differently than an item like might, where we overtly recognize two distinct lexical elements, might and not. Beyond this, assuming children are sensitive to meaning (especially by five years old), I wonder how often they hear can’t (or cannot) used to effectively mean “definitely not” (favored/only interpretation for cannot) vs. might not used to mean “possibly not” (favored/only interpretation for might not).
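This is the kind of quick count I'd want to run over child-directed speech. The transcript string below is an invented stand-in; a real version would iterate over CHILDES child-directed utterances.

```python
# Hedged sketch of the frequency check: count can't vs. cannot and
# might not vs. mightn't. The transcript is made up for illustration.
import re

transcript = """
you can't have that . you cannot go outside .
it might not fit . you can't reach it .
"""

patterns = {
    "can't": r"\bcan't\b",
    "cannot": r"\bcannot\b",
    "might not": r"\bmight not\b",
    "mightn't": r"\bmightn't\b",
}
counts = {form: len(re.findall(pat, transcript))
          for form, pat in patterns.items()}
print(counts)   # in this toy text: can't twice, cannot and might not once, mightn't zero
```

(A real run would also need to normalize the curly vs. straight apostrophes that transcription conventions vary on.)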

(3) Conversational usage:

(a) Byrnes & Duff 1989: Five-year-olds don’t seem to distinguish between “The peanut can’t be under the cup” and “The peanut might not be under the box” when determining the location of the peanut. I wonder how adults did on this task. Basically, it’s a bit odd information-wise to get both statements in a single conversation. As an adult, I had to do a bit of meta-linguistic reasoning to interpret this: “Well, if it might not be under the box, that’s better than ‘can’t’ be under the cup, so it’s more likely to be under the box than the cup. But maybe it’s not under the box at all, because the speaker is expressing doubt that it’s under there.” In a way, it reminds me of some of the findings of Lewis et al. (2012) on children’s interpretations of false belief task utterances as literal statements of belief vs. parenthetical endorsements. (Ex: “Hoggle thinks Sarah is the thief”: literal statement of belief = this is about whether Hoggle is thinking something; parenthetical endorsement = there’s some probability (according to Hoggle) that Sarah is the thief.) Kids hear these kinds of statements as parenthetical endorsements way more than they hear them as literal statements of belief in day-to-day conversation, and so interpret them as parenthetical endorsements in false belief tasks. That is, kids are assuming this is a normal conversation and interpreting the statements as they would be used in normal conversation.

Lewis, S., Lidz, J., & Hacquard, V. (2012, September). The semantics and pragmatics of belief reports in preschoolers. In Semantics and Linguistic Theory (Vol. 22, pp. 247-267).

(b) Similarly, in Experiment 1, I wonder again about conversational usage. In the discussion of children’s responses to the Negative Weak true items like “There might not be a cow in the box” (might >> not: It’s possible there isn’t a cow), many children apparently responded False because “A cow might be in the box.” Conversationally, this seems like a perfectly legitimate response. The tricky part is whether the original assertion is false, per se, rather than simply not the best utterance to have selected for this scenario.

(4) The hidden “only” hypothesis: 

In Experiment 1, M&C2014 found on the Positive True statements (“There is a cow in the box” with the child peeking to see if it’s true) that children were only at ~51.5% accuracy. This is weirdly low, as M&C2014 note. They discuss this as having to do with the particle “also”, suggesting a link to the “only” interpretation, i.e., children were interpreting this as “There is only a cow in the box.” (Side note: M&C2014 talk about this as “There might only be a cow in the box.”, which is odd. I thought the Positive and Negative sentences were just the bare “There is/isn’t an X in the box.”) Anyway, they designed Experiment 2 to address this specific weirdness, which is nice.

In Experiment 2, though, there seems to me to be a potential weirdness with statements like “There might not be only a cow in the box”. Only has its own scopal impact, doesn’t it? Even if might takes scope over the rest, we still have might >> not >> only (= “It’s possible that it’s not the case that there’s only a cow.” = there may be a cow and something else (as discussed later on in examples 44 and 45) = infelicitous in this setup where you can only have one animal = unpredictable behavior from kids). Another interpretation option is might >> only >> not (= “It’s possible that it’s only the case that it’s not a cow.” = may be not-a-cow (and instead be something else) = must be a horse in this setup = desired behavior from kids).

We then find that children in Experiment 2 decrease acceptance of Negative Weak True statements like “There might not be a cow in the box” to 33.3%. So, going with the hidden only story, they’re interpreting this as “It’s not the case that there might be (only) a cow in the box.” Again, we get infelicity if not >> only, since there can only be one animal in the box at a time. But this could be either because of the interpretation above (not >> might >> only) or because of the interpretation might >> not >> only (which is the interpretation that follows surface scope, i.e., not reconstructed). So it’s not clear to me what this rejection by children means.

(5) Discussion clarification: What’s the difference between example 46 = “It is not possible that a cow is in the box” and example 48 = “It is not possible that there is a cow [is] in the box”? Do these not mean the same thing? And I’m afraid I didn’t follow the paragraph after these examples at all, in terms of its discussion of how many situations one vs. the other is true in.

(6) Semantic Subset Principle (SSP) selectivity: It’s interesting to note that M&C2014 say the SSP is only invoked when there are polarity restrictions due to a lexical parameter. So, this is why M&C2014 say it doesn’t apply when the quantifier every is involved (in response to Musolino 2006). This then presupposes that children need to know which words have a lexical parameter related to polarity restrictions and which don’t. How would they know this? Is the idea that they just know that some meanings (like quantifier every) don’t get them while others (like quantifier some) do? Is this triggered by/inferable from the input in some way?

Wednesday, May 4, 2016

Some thoughts on Snedeker & Huang 2016 in press

One of the things I really enjoyed about this book chapter was all the connections I can see for language acquisition modeling. An example of this for me was the discussion about kids’ (lack of) ability to incorporate pragmatic information of various kinds (more detailed comments on this below). Given that some of us in the lab are currently thinking about using the Rational Speech Act model to investigate quantifier scope interpretations in children, the fact that four- and five-year-olds have certain pragmatic deficits is very relevant. 

More generally, the idea that children’s representation of the input — which depends on their processing abilities — matters is exactly right (e.g., see my favorite Lidz & Gagliardi 2015 ref). As acquisition modelers, this is why we need to care about processing. Passives may be present in the input (for example), but that doesn’t mean children recognize them (and the associated morphology). That is, children’s access to the information in the input has an impact, over and above the reliability of that information — and access is exactly what children’s processing determines.
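A back-of-the-envelope way to see the input-vs-intake distinction (all numbers invented for illustration):

```python
# Toy input-vs-intake arithmetic: even if a construction is present
# in the input at some rate, a processing "encoding" filter shrinks
# the evidence the learner actually uses. Rates are hypothetical.
utterances = 10_000
passive_rate = 0.08     # hypothetical proportion of passives in the input
encode_prob = 0.25      # hypothetical chance a passive is recognized as one

heard = int(utterances * passive_rate)   # passives present in the input
intake = int(heard * encode_prob)        # passives the learner registers
print(heard, intake)                     # 800 heard, only 200 taken in
```

So two learners with identical input but different processing abilities can end up with very different effective evidence — which is the Lidz & Gagliardi point.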

More specific thoughts:

18.1: I thought it was interesting that there are some theories of adult sentence processing that actively invoke an approximation of the ideal observer as a reasonable model (ex: the McRae & Matsuki 2013 chapter that SH2016 cite). I suppose this is the foundation of the Rational Speech Act model as well, even though it doesn’t explicitly consider processing as an active process per se.

18.3: Something that generally comes out of this chapter is children’s poorer cognitive control (which is why they perseverate on their first choices). This seems like it could matter a lot in pragmatic contexts where children’s expectations might be violated in some way. They may show non-adult behavior not because they can’t get the correct answer, but rather because they can’t get to the correct answer once they’ve built up a strong enough expectation for a different answer.

18.4: Here we see evidence that five-year-olds aren’t sensitive to the referential context when it comes to disambiguating an ambiguous PP attachment (as in “Put the frog on the napkin in the box”). (And this contrasts with their sensitivity to prosody.) So, not only do they perseverate on their first mistaken interpretation, but they apparently don’t utilize the pragmatic context information that would enable them to get the correct interpretation to begin with (i.e. there are two frogs so saying “the frog” is weird until you know which frog —  therefore “the frog on the napkin” as a unit makes sense in this communicative context). This insensitivity to the pragmatics of “the” makes me wonder how sensitive children are in general to pragmatic inferences that hinge on specific lexical items — we see in section 18.5 that they’re generally not good at scalar implicatures till later, but I think they can get ad-hoc implicatures that aren’t lexically based (Stiller et al. 2015). 

So, if we’re trying to incorporate this kind of pragmatic processing limitation into a model of children’s language understanding (e.g., cripple an adult RSA model appropriately), we may want to pay attention to what the pragmatic inference hinges on. That is, is it word-based or not? And which word is it? Apparently, children are okay if you use “the big glass” when there are two glasses present (Huang & Snedeker 2013). So it’s not just about “the” and referential uniqueness. It’s about “the” with specific linguistic ways of determining referential uniqueness, e.g., with PP attachment. HS2016 mention cue reliability in children’s input as one mitigating factor, with the idea that more reliable cues are what children pick first — and then they presumably perseverate on the results of what those reliable cues tell them.
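To make the "cripple an adult RSA model" idea concrete, here's a minimal RSA sketch in the standard literal-listener / speaker / pragmatic-listener setup. The some/all meanings are my toy stand-ins (not from HS2016): an adult modeled as the pragmatic listener L1 derives the scalar implicature for "some", while a child modeled as the literal listener L0 stays at chance.

```python
ALPHA = 1.0  # speaker rationality parameter

meanings = ["some-not-all", "all"]
utterances = ["some", "all"]
truth = {("some", "some-not-all"): 1, ("some", "all"): 1,
         ("all", "some-not-all"): 0, ("all", "all"): 1}

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}

def L0(u):  # literal listener: truth conditions + uniform prior
    return normalize({m: truth[(u, m)] for m in meanings})

def S1(m):  # speaker: soft-max over informativeness of true utterances
    return normalize({u: (L0(u)[m] ** ALPHA) if truth[(u, m)] else 0.0
                      for u in utterances})

def L1(u):  # pragmatic listener: reasons about the speaker's choice
    return normalize({m: S1(m)[u] for m in meanings})

print(L0("some"))  # child-as-L0: 50/50 between the two meanings
print(L1("some"))  # adult L1: "some-not-all" gets the larger share (~0.75)
```

Crippling the model for a given inference could then mean swapping L1 for L0 (or mixing them), and the word-based vs. non-word-based question becomes a question about which lexical alternatives the child's speaker model actually considers.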

18.6: It was very cool to see evidence of the abstract category of Verb coming from children’s syntactic priming studies. At least by three (according to the Thothathiri & Snedeker 2008 study), the abstract priming effects are just as strong as the within-verb priming effects, which suggests category knowledge that’s transferring from one individual verb to another. To be fair, I’m not entirely sure when the verb-island hypothesis folks expect the category Verb to emerge (they just don’t expect it to be there initially). But by three is already relatively early.

18.7: Again, something that comes to mind for me as an acquisition modeler is how to use the information here to build better models. In particular, if we’re thinking about causes of non-adult behavior in older children, we should look at the top-down information sources children might need to integrate into their interpretations. Children may have less access to this information than adults do (or less ability to utilize it, which may effectively work out to the same thing in a model).


Lidz, J., & Gagliardi, A. (2015). How nature meets nurture: Universal grammar and statistical learning. Annu. Rev. Linguist., 1(1), 333-353.

McRae, K., & Matsuki, K. (2013). Constraint-based models of sentence processing. In R. Van Gompel (Ed.), Sentence Processing (pp. 51-77). New York, NY: Psychology Press. 

Stiller, A. J., Goodman, N. D., & Frank, M. C. (2015). Ad-hoc implicature in preschool children. Language Learning and Development, 11(2), 176-190.