Tuesday, May 26, 2020

Some thoughts on Liu et al 2019

I really appreciate this paper’s goal of concretely testing different accounts of island constraints, and the authors' intuition that the frequency of the lexical items involved may well have something to do with the (un)acceptability of the island structures they look at. This is something near and dear to my heart, since Jon Sprouse and I worked on a different set of island constraints a few years back (Pearl & Sprouse 2013) and found that the lexical items used as complementizers really mattered. 

Pearl, L., & Sprouse, J. (2013). Syntactic islands and learning biases: Combining experimental syntax and computational modeling to investigate the language acquisition problem. Language Acquisition, 20(1), 23-68.

I do think the L&al2019 paper was a little crunched for space, though -- there were several points where I felt like the reasoning flew by too fast for me to follow (more on this below).


Specific thoughts:
(1) Frequency accounts hold that acceptability is based on exposure. This makes total sense to me for lexical-item-based islands. I wonder if I’d satiate on whether and adjunct islands for this reason, given examples like these:

(grammatical that complementizer) “What did J say that M bought __?”
vs.
(ungrammatical *whether) “What did J wonder whether M bought __?”
and
(ungrammatical adjunct *if) “What did J worry if M bought __?”

I feel like satiation studies like this have been done for at least some islands, and they didn’t find satiation. Maybe those were islands that weren’t based on lexical items, like subject islands or complex NP islands?

Relatedly, in the verb-frame frequency account, acceptability depends on verb lexical frequency. I definitely get the idea of this prediction (which is nicely intuitive), but Figure 1c seems to be a specific version of this -- namely, one where manner-of-speaking verbs are always less frequent than factive and bridge verbs. I guess this is anticipating the frequency results that will be found?

(2) Explaining why “know” is an outlier (it’s less acceptable than frequency would predict): L&al2019 argue this is due to a pragmatic factor where using “know” implies the speaker already has knowledge, so it’s weird to ask. I’m not sure I followed the reasoning behind this pragmatic explanation.

Just to spell it out, the empirical fact is that “What did J know that M didn’t like __?” is less acceptable than the (relatively high) frequency of “know CP” predicts it should be. So, the pragmatic explanation is that it’s weird for the speaker of the question to ask this because the speaker already knows the answer (I think). But what does that have to do with J knowing something? 

And this issue of the speaker knowing something is supposed to be mitigated in cleft constructions like “It was the cake that J knew that M didn’t like.” I don’t follow why this is, I’m afraid. This point gets reiterated in the discussion of the Experiment 3 cleft results and I still don’t quite follow it: “a question is a request for knowledge but a question with ‘know’ implies that the speaker already has the knowledge”. Again, I have the same problem: “What did J know that M didn’t like __?” has nothing to do with the speaker knowing something.

(3) Methodology: This is probably me not understanding how to do experiments, but why is it that a Likert scale doesn’t seem right? Is it just that the participants weren’t using the full scale in Experiment 1? And is that so bad if the test items were never really horribly ungrammatical? Or were there “word salad” controls in Experiment 1, where the participants should have given a 1 or 2 rating, but still didn’t?

Aside from this, why does a binary choice fix the problem?

(4) Thinking about island (non-)effects: Here, the lack of an interaction between sentence type and frequency was meant to indicate no island effect. I’m more used to thinking about island effects as the interaction of dependency-length (matrix vs embedded) and presence vs absence of an island structure, so an island shows up as a superadditive interaction of dependency length & island structure (i.e., an island-crossing dependency is an embedded dependency that crosses an island structure, and it’s extra bad). 
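Just to make the superadditive part concrete for myself, here’s a toy differences-in-differences calculation (in Python), with made-up z-scored ratings and condition labels that are only for illustration:

# 2x2 island design: dependency length (matrix vs. embedded) x structure (non-island vs. island).
# Made-up z-scored acceptability ratings for the four conditions.
ratings = {
    ("matrix",   "non-island"):  0.9,   # "Who __ thinks that M bought the cake?"
    ("embedded", "non-island"):  0.6,   # "What did J think that M bought __?"
    ("matrix",   "island"):      0.7,   # "Who __ wonders whether M bought the cake?"
    ("embedded", "island"):     -0.8,   # "What did J wonder whether M bought __?"
}

# Cost of lengthening the dependency (no island structure involved):
length_cost = ratings[("matrix", "non-island")] - ratings[("embedded", "non-island")]   # 0.3
# Cost of having the island structure present (with only a short, matrix dependency):
structure_cost = ratings[("matrix", "non-island")] - ratings[("matrix", "island")]      # 0.2
# How bad the island-crossing condition actually is:
observed_drop = ratings[("matrix", "non-island")] - ratings[("embedded", "island")]     # 1.7

# Superadditivity = the differences-in-differences score:
dd_score = observed_drop - (length_cost + structure_cost)
print(round(dd_score, 2))   # 1.2 > 0: extra badness beyond the two independent costs = island effect

If there were no island effect, the island-crossing condition would just inherit the sum of the two independent costs, and that score would hover around zero.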

Here, the two factors are wh-questions (so, just the presence of a dependency) + which verb lexical item is used. Therefore, an island “structure” should show up as some extra badness when a wh-dependency is embedded in a CP for an “island” lexical item (because that lexical item should have an island structure associated with it). Okay.

But we don’t see that, so there’s no additional structure there. Instead, it’s just that it’s hard to process wh-dependencies with these verbs because they don’t occur that often. Though when I put it like that, this reminds me of the Pearl & Sprouse 2013 island learning story -- islands are bad because there are pieces of structure that are hard to process (because they never occur in the input = lowest frequency possible). 

So, thinking about it like this, these accounts (that is, the L&al2019 account and the Pearl & Sprouse 2013 [P&S2013] account) don’t seem too different after all. It’s just a question of the frequency of what -- here, it’s the verb lexical item in these embedded verb frames; for P&S2013, it was small chunks of the phrasal structure that made up the dependency, some of which were subcategorized by the lexical items in them (like the complementizer).
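To make that “frequency of what” comparison concrete, here’s a rough Python sketch in the spirit of how I remember the P&S2013 scoring working -- a dependency is characterized by the sequence of container nodes it spans, and gets scored by the product of smoothed trigram probabilities over that sequence. The node labels and probabilities below are invented for illustration:

import math

def dependency_score(container_nodes, trigram_prob):
    # Log probability of the dependency = sum of log trigram probabilities
    # over the container-node sequence (tiny floor probability for unseen trigrams).
    seq = ["start"] + container_nodes + ["end"]
    return sum(math.log(trigram_prob.get(tuple(seq[i-2:i+1]), 1e-6))
               for i in range(2, len(seq)))

trigram_prob = {
    ("start", "IP", "VP"):       0.5,
    ("IP", "VP", "CP-that"):     0.1,
    ("IP", "VP", "CP-whether"):  1e-6,   # essentially unattested in the input
    ("VP", "CP-that", "IP"):     0.3,
    ("VP", "CP-whether", "IP"):  0.3,
    ("CP-that", "IP", "VP"):     0.4,
    ("CP-whether", "IP", "VP"):  0.4,
    ("IP", "VP", "end"):         0.6,
}

that_dep    = ["IP", "VP", "CP-that", "IP", "VP"]      # "What did J say that M bought __?"
whether_dep = ["IP", "VP", "CP-whether", "IP", "VP"]   # "What did J wonder whether M bought __?"
print(dependency_score(that_dep, trigram_prob), dependency_score(whether_dep, trigram_prob))

The whether-dependency’s score tanks because one of its structural chunks essentially never occurs in the input, which has the same flavor as “this verb frame is low frequency” -- it’s just frequency over structural pieces rather than over the verb lexical item.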

(5) Expt 2 discussion: I think the point L&al2019 were trying to make about the spurious island effects with Figures 4a vs 4b flew by a little fast for me. Why is log odds (that is, log[p(acceptable)/p(unacceptable)]) better than just p(acceptable) on the y-axis? Because plotting p(acceptable) on the y-axis is apparently what yields the interaction that’s meant to signal an island effect.
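My best guess at the answer, with toy numbers rather than the actual Experiment 2 data: if acceptability effects live on the log-odds scale, then the logistic squashing into [0,1] means a pure main effect in log odds can look like an interaction in raw p(acceptable), just from compression near the ceiling.

import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Toy numbers: a pure main effect in log-odds space --
# forming the wh-question costs 2 log-odds units regardless of verb frequency.
baseline = {"high-frequency verb": 3.0, "low-frequency verb": 1.0}   # declarative baselines
extraction_cost = 2.0

for verb, b in baseline.items():
    drop_in_log_odds = extraction_cost                                # 2.0 for both: no interaction
    drop_in_prob = inv_logit(b) - inv_logit(b - extraction_cost)
    print(verb, drop_in_log_odds, round(drop_in_prob, 2))

# high-frequency verb: p(acceptable) goes ~0.95 -> ~0.73 (a drop of ~0.22)
# low-frequency verb:  p(acceptable) goes ~0.73 -> ~0.27 (a drop of ~0.46)

Same-sized effect in log odds, different-sized effect in raw probability -- and that difference is exactly the kind of interaction that would get read as an island effect if p(acceptable) is on the y-axis.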

(6)  I’m sympathetic to the space limitations of conference papers like this, but the learning story at the end was a little scanty for my taste. More specifically, I’m sympathetic to indirect negative evidence for learning, but it only makes sense when you have a hypothesis space set up, and can compare expectations for different hypotheses. What does that hypothesis space look like here? I think there was a little space to spell it out with a concrete example. 
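Here’s the kind of concrete example I was hoping for, with everything (the rate, the amount of input, even the target dependency) invented by me rather than taken from L&al2019: two hypotheses about wh-dependencies out of “wonder whether” CPs, and the likelihood each one assigns to never seeing such a dependency in the input.

# Toy indirect negative evidence comparison (all numbers invented for illustration).
# H_gram:   the dependency is grammatical, and would occur at some small per-utterance rate.
# H_ungram: the dependency is ungrammatical, so it never occurs.
rate_if_grammatical = 1e-5      # hypothetical rate under H_gram
n_utterances = 1_000_000        # hypothetical amount of input
# Observed: zero occurrences of the dependency in all that input.

likelihood_gram = (1 - rate_if_grammatical) ** n_utterances   # ~4.5e-5
likelihood_ungram = 1.0                                       # zero is exactly what H_ungram predicts

print(likelihood_ungram / likelihood_gram)   # ~22000: a big likelihood ratio favoring H_ungram

The absence of evidence only turns into evidence of ungrammaticality because H_gram made a concrete, countable prediction that went unmet -- which is why the hypothesis space (and what exactly is being counted) needs to be spelled out.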

And eeep, just be wary of saying absence of evidence is evidence of ungrammaticality, unless you’re very careful about what you’re counting.

Tuesday, May 12, 2020

Some thoughts on Futrell et al 2020

I really liked seeing the technique imports from the NLP world (using embeddings, using classifiers) in the service of psychologically motivated theories of adjective ordering. Yes! Good tools are wonderful.

I also love seeing this kind of direct, head-to-head competition between well-defined theories, grounding in a well-defined empirical dataset (complete with separate evaluation set), careful qualitative analysis, and discussion of why certain theories might work out better than others. Hurrah for good science!

Other thoughts:
(1) Integration cost vs information gain (subtle differences): Information gain seems really similar to the integration cost idea, where the size of the set of nouns an adjective could modify is the main thing (as the text notes). Both approaches care about making that entropy gain smaller the further the adjective is from the noun (since that’s less cognitively taxing to deal with). The difference (if I’m reading this correctly) is that information gain also cares about the set of nouns the adjective can’t modify, and uses that in its entropy calculation.
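Here’s a toy calculation of how I’m understanding that difference (the numbers and the particular quantities are my guesses at the flavor of the two accounts, not the paper’s actual formulas):

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy setup: 10 equally likely candidate nouns, and an adjective
# (say, "wooden") that can plausibly modify 3 of them.
n_nouns = 10
n_modifiable = 3

# Integration-cost-flavored quantity: just the size of the modifiable noun set.
integration_cost = n_modifiable

# Information-gain-flavored quantity: entropy reduction over the whole noun
# distribution, which also reflects the nouns the adjective *can't* modify
# (their probability mass drops to ~0 once you hear the adjective).
prior = [1 / n_nouns] * n_nouns
posterior = [1 / n_modifiable] * n_modifiable + [0.0] * (n_nouns - n_modifiable)
information_gain = entropy(prior) - entropy(posterior)

print(integration_cost, round(information_gain, 2))   # 3 and ~1.74 bits

So integration cost only looks at the size of the set the adjective is compatible with, while information gain is computed over the full noun distribution, which is where the nouns the adjective can’t modify come in.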

(2) I really appreciate the two-pronged explanation of (a) the more generally semantic factors (because of improved performance when using the semantic clusters for subjectivity and information gain), and (b) the collocation factor over specific lexical items (because of the improved performance on individual wordforms for PMI). But it’s not clear to me how much information gain is adding above and beyond subjectivity on the semantic factor side. I appreciate the item-based zoom in Table 3, which shows the items that information gain does better on...but it seems like these are wordform-based, not based on general semantic properties. So, the argument that information gain is an important semantic factor is a little tricky for me to follow.
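And just to remind myself what the wordform-level collocation factor looks like, here’s a tiny PMI calculation with invented counts, where PMI(adj, noun) = log2 of the joint probability over the product of the marginals:

import math

# Invented counts from a pretend corpus of adjective-noun pairs.
total_pairs = 1_000_000
unigram = {"wooden": 2_000, "good": 50_000, "spoon": 1_000}
pair_count = {("wooden", "spoon"): 40, ("good", "spoon"): 5}

def pmi(adj, noun):
    p_joint = pair_count[(adj, noun)] / total_pairs
    p_adj = unigram[adj] / total_pairs
    p_noun = unigram[noun] / total_pairs
    return math.log2(p_joint / (p_adj * p_noun))

print(round(pmi("wooden", "spoon"), 2), round(pmi("good", "spoon"), 2))   # ~4.32 vs. ~-3.32

“Wooden” is glued to “spoon” in a way that “good” isn’t, and that’s a fact about these particular wordforms rather than about a general semantic class -- which is why the wordform-based items in Table 3 make me want to see how much of the information gain win is really semantic.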