Tuesday, October 9, 2018

Some thoughts on Linzen & Oseki 2018

I really appreciate L&O2018’s focus on the replicability of linguistic judgments in non-English languages (and especially their calm tone about it). I think the situation where potentially unreliable judgments only come to light during review highlights the utility of something like registered reports, even for theoretical researchers. If someone finds out during the planning stage that the contrasts they thought were so robust actually aren’t, this may help avoid wasted time building theories to account for the data in question (or it may bring in considerations of language variation). [Side note: I have particular sympathy for this issue, having struggled to share an author’s judgments about allowed vs. disallowed interpretations in many a semantics seminar paper in graduate school.]

In theory, aspects of the peer review process are supposed to help cover this, but as L&O2018 note in section 4.1, that’s harder for non-English languages. To help with this, L&O2018 suggest the open review system of section 4.2, along with a crowdsourced database of published acceptability judgments, which sounds incredible. Someone should totally fund the construction of that. As L&O2018 note, this would be especially helpful for less-studied languages that have fewer native speakers.

I’m also completely with L&O2018 on focusing on judgments that aren’t self-evident. But then, who makes the call about what’s self-evident and what’s not? Is it about the subjective confidence of the individual (what’s “obvious to any native speaker”, as noted in section 4)? And if so, what if an individual finds something self-evident, but it’s actually a legitimate point of variation that this individual isn’t aware of, so another individual wouldn’t view it as self-evident at all? I guess this is part of what L&O2018 set out to test, i.e., whether a trained linguist’s subjective confidence is a reliable guide to what’s self-evident. Section 2.2 covers this, with the three-way classification. Even so, I wonder about the facts that are theoretically presupposed because they’re self-evident vs. theoretically meaningful because they’re not. It’d be great if there were some objective, measurable signal that distinguished them, aside from acceptability judgment replications themselves, of course (since the whole point of having such a signal would be to focus replications on the judgments that aren’t self-evident). Mahowald et al. (2016)’s approach of requiring unanimous judgments from 7 people on 7 variants of the data point in question seems like one way to do this: basically, it’s a mini acceptability judgment replication (a rough sketch is below). And it does seem more doable than a full-scale replication study, especially with the crowdsourced judgment platform L&O2018 advocate.
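To make that concrete for myself, here’s a rough sketch in Python of what such a unanimity check might look like. The data format and function name are my own invention for illustration, not anything specified by Mahowald et al. (2016) or L&O2018; I’m assuming each of the 7 raters gives a forced-choice judgment on each of the 7 variants of the contrast.

```python
# Hypothetical sketch of a Mahowald et al. (2016)-style mini-replication:
# a contrast "passes" only if every rater prefers the putatively acceptable
# version for every variant of the item. Data format is invented.

def passes_mini_replication(judgments):
    # judgments: dict mapping variant id -> list of 7 forced-choice responses,
    # each 'good' if that rater preferred the putatively acceptable version.
    return all(
        all(response == 'good' for response in responses)
        for responses in judgments.values()
    )

# Toy example: 7 variants x 7 raters, with one dissenting judgment on variant 3.
toy_judgments = {variant: ['good'] * 7 for variant in range(1, 8)}
toy_judgments[3][4] = 'bad'  # one rater disagrees

print(passes_mini_replication(toy_judgments))  # False: unanimity fails
```

The appeal, as I read it, is that this is a very cheap pass/fail filter: anything that fails unanimity gets flagged for a proper replication, and anything that passes can plausibly be treated as self-evident.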

One more thought: L&O2018 make a striking point about the importance of relative acceptability and about how acceptability isn’t the same as grammaticality, since raw acceptability values can differ so widely across “grammatical” and “ungrammatical” items. For example, if an ungrammatical item has a high acceptability score (e.g., H8’s starred version had a mean score of 6.06 out of 7) and no obvious dialectal variation, how do we interpret that? L&O2018 reasonably hypothesize that this means it’s not actually ungrammatical. But then, is ungrammaticality just a matter of falling below some acceptability threshold? That is, is low acceptability necessary for (or at least highly correlated with) ungrammaticality?
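Just to spell out the two readings I’m contrasting, here’s a toy sketch in Python. Only the 6.06 mean for H8’s starred version comes from L&O2018; the threshold, the minimum gap, and the mean for the unstarred counterpart are all made up for illustration.

```python
# Toy contrast between two ways of reading acceptability scores on a 1-7 scale.
# Only the 6.06 mean for H8's starred version is from L&O2018; everything else
# here (threshold, minimum gap, unstarred mean) is invented.

def ungrammatical_by_threshold(mean_rating, threshold=4.0):
    # Absolute view: call an item ungrammatical iff its raw mean rating
    # falls below some fixed cutoff.
    return mean_rating < threshold

def reliably_worse(starred_mean, unstarred_mean, min_gap=1.0):
    # Relative view: what matters is whether the starred version is noticeably
    # worse than its minimally different unstarred counterpart. (A real analysis
    # would use item-level ratings and a statistical test, not a fixed gap on means.)
    return (unstarred_mean - starred_mean) >= min_gap

h8_starred = 6.06     # mean rating reported in L&O2018
h8_unstarred = 6.4    # hypothetical mean for the unstarred counterpart

print(ungrammatical_by_threshold(h8_starred))    # False: nowhere near any plausible cutoff
print(reliably_worse(h8_starred, h8_unstarred))  # False: no sizable relative contrast either
```

On either reading, an item like H8’s starred version doesn’t look ungrammatical, which is exactly what makes L&O2018’s hypothesis seem reasonable to me.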
