Monday, October 19, 2020

Some thoughts on Ovans et al. 2020

I really enjoy seeing this kind of precise quantitative investigation of children’s input and how it can explain their non-adult-like behavior. This particular case involves language processing, specifically recovering from an incorrect parse, and the upshot is that kids may be doing perfectly sensible things on these test stimuli, given the input they get. This underscores for me how ridiculously hard it is to consider everything when you’re designing behavioral experiments with kids, and the value of quantitative work for teasing apart the viability of possible explanations (here: immature executive function vs. mature inference over the differently-skewed dataset of child-directed speech).


Other specific thoughts:

(1) The importance of the model assumptions: Surprisal is the main metric here, and its precise value of course depends on the language model you’re using. O&al2020 thought the specific verbs (like “put”) were important to separate out in the language model, because of the known lexical restrictions on verb arguments (and therefore possible parses). If they hadn’t done this, they might have gotten very different surprisal values, as the probabilities for “put” parses would have been aggregated with the probabilities for other verbs like “eat” and “hug”.


It’s because of this importance that I had something of a mental hiccup at the beginning of section 3, before I realized that more detail about the exact language model would come later in section 3.2. ;) 


I also want to note that I don’t think it’s crazy to have grammar rules separated out by the verb lexical item, precisely because of how the argument distributions can depend on the verb. But this does mean that you get a lot of duplication in PCFG rules (e.g., VP_eat and VP_drink look pretty similar, but are treated as completely separate). And when there’s duplication, we may miss generalizations.
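
To make the aggregation point in (1) concrete, here is a minimal sketch of how the surprisal of a VP expansion can shift depending on whether rule probabilities are estimated per verb or pooled across verbs. Everything in it is an assumption for illustration: the counts are made up, the frame labels ("NP PP", "NP") are simplified, and none of it is taken from O&al2020’s actual grammar or corpus.

```python
import math

# Hypothetical counts of VP expansions per verb.
# These numbers are invented for illustration; they are NOT from O&al2020.
counts = {
    "put": {"NP PP": 90, "NP": 10},
    "eat": {"NP PP": 10, "NP": 90},
    "hug": {"NP PP": 5, "NP": 95},
}


def surprisal(p):
    """Surprisal in bits of an event with probability p."""
    return -math.log2(p)


def lexicalized_prob(verb, frame):
    """P(frame | verb): rule probability estimated separately for each verb."""
    total = sum(counts[verb].values())
    return counts[verb][frame] / total


def aggregated_prob(frame):
    """P(frame): rule probability estimated by pooling all verbs together."""
    total = sum(sum(c.values()) for c in counts.values())
    return sum(c[frame] for c in counts.values()) / total


lex = surprisal(lexicalized_prob("put", "NP PP"))
agg = surprisal(aggregated_prob("NP PP"))
print(f"lexicalized 'put': {lex:.2f} bits; aggregated: {agg:.2f} bits")
```

With these toy counts, an NP PP frame after “put” is nearly expected under the lexicalized estimate (about 0.15 bits) but noticeably more surprising under the pooled estimate (about 1.51 bits), because “eat” and “hug” drag the pooled probability down. The flip side is the duplication worry: each verb gets its own table, even when two tables look nearly identical.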


(2) Related thought, from section 4.2: “...our calculation of surprisal included a measure of lexical frequency, and for children, each noun token was relatively unexpected” -- I thought only the verbs were lexicalized (and that seems to be what the x-axis labels in Figures 2 and 3 would suggest: put the.1 noun.1 prep.1 the.2 noun.2…). So, where does noun lexical frequency come into this? Why wouldn’t all nouns simply be “noun”? I think I may have misunderstood something in the language model.


(3) O&al2020 find low surprisal at the disambiguating P (e.g., “into”), and interpret that to mean children don’t detect that a reparse is needed. Just to check my understanding: The issue that children have is detecting that they misparsed, given the probability of the word coming next. The explanation O&al2020 give is that children are getting surprised by other things in the sentence (open-class words such as nouns), so the relative strength of the error signal from the disambiguating P slips under their detection radar. That is, lots of things are surprising to kids because they don’t have as much experience with language, so the “you parsed it wrong” surprise is relatively smaller for them than it is for adults. That seems reasonable.


Of course, O&al2020 themselves then note that this is slightly weird, because surprisal then isn’t about parsing disambiguation, even though it’s actually implemented here by summing over possible parses. Except, is this that weird? For the nouns, the parse is simply whether that lexical item can go in that position (caveat: assuming we have the lexical items for nouns and not just the Noun category). That’s a general integration cost, though it’s being classified as a “parse”. If we just think about surprisal as integration, is this explanation really so strange? Integrating open-class words like nouns is harder than integrating closed-class words like determiners and prepositions. So, any integration difficulty that a preposition signals can be overshadowed by the difficulty a noun causes.
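
For reference, here is the usual way surprisal is written when it’s computed by summing over parses. This is a sketch that assumes the standard prefix-probability formulation; the notation is mine, and the exact implementation in O&al2020 may differ in its details.

```latex
S(w_i) \;=\; -\log_2 P(w_i \mid w_1 \dots w_{i-1})
       \;=\; -\log_2 \frac{\sum_{T \in \mathcal{T}(w_1 \dots w_i)} P(T)}
                          {\sum_{T \in \mathcal{T}(w_1 \dots w_{i-1})} P(T)}
```

Here \mathcal{T}(w_1 \dots w_k) stands for the set of (partial) parses consistent with the first k words. Written this way, nothing in the formula distinguishes probability mass lost because a structural analysis was ruled out from mass lost because a lexical item was rare in that position, which fits with reading surprisal as a general integration cost.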