I appreciated seeing up front the traditional argument about economy of representation because it’s often at the heart of theoretical debate. What’s interesting to me is how often something is assumed to be more economical purely on intuition, without any formal way to evaluate how economical it is. So, good on V&al2019 for thinking about this issue explicitly. More generally, when I hear this kind of debate about categorical grammar + external gradience vs. gradience in the grammar, I often wonder how on earth you could tell the difference. V&al2019 approach this from an economy angle, rather than a behavioral fit angle, and show a proof of concept with the SOSP model. That said, it’s interesting to note that the SOSP implementation of a gradient grammar clearly includes both syntactic and semantic features -- and that’s where its ability to handle some of the desiderata comes from.
Other comments:
(1) Model implementation involving the differential equations: If I understand this correctly, the model is using a computational-level way to accomplish the inference about which treelets to choose. (This is because it seems like it requires a full enumeration of the possible structures that can be formed, which is typically a massively parallel process and not something we think humans are doing on a word-by-word processing basis.) Computational-level inference for me is exactly like this: we think this is the inference computation humans are trying to do, but they approximate it somehow. So, we’re not committed to this being the algorithm humans use to accomplish that inference.
That said, the mechanism V&al2019 describe here isn’t an optimal inference mechanism, the way Gibbs sampling is, since it seems to allow the equivalent of probabilistic sampling (where a sub-optimal option can win out in the long run). So, this differs from the way I often see computational-level inference in Bayesian modeling land, because there the goal is to identify the optimal result of the desired computation.
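To make that contrast concrete for myself, here is a toy sketch. It is not V&al2019's implementation: the one-dimensional landscape, the peak heights, the step size, and the noise level are all my own assumptions, and `settle` and `peak_counts` are hypothetical names. It just runs noisy gradient ascent on a harmony function with two peaks, next to what an optimizing procedure would return (always the higher peak).

```python
import math
import random

# Toy 1-D harmony landscape: two Gaussian bumps (all numbers are made up).
PEAKS = [(-1.0, 1.0), (1.0, 0.8)]  # (center, height); the left peak is the global optimum
WIDTH = 0.5

def harmony_gradient(x):
    """Derivative of H(x) = sum over peaks of h * exp(-(x - c)^2 / WIDTH)."""
    return sum(h * math.exp(-((x - c) ** 2) / WIDTH) * (-2.0 * (x - c) / WIDTH)
               for c, h in PEAKS)

def settle(rng, x=0.0, step=0.05, noise_sd=0.3, tol=0.1, max_steps=10_000):
    """Noisy gradient ascent; returns the index of the peak the state ends up nearest."""
    for _ in range(max_steps):
        if min(abs(x - c) for c, _h in PEAKS) < tol:
            break  # close enough to a peak: treat the state as settled
        x += step * harmony_gradient(x) + noise_sd * math.sqrt(step) * rng.gauss(0.0, 1.0)
    return min(range(len(PEAKS)), key=lambda i: abs(x - PEAKS[i][0]))

def peak_counts(n_runs=2000, seed=0):
    """How many runs end at each peak when every run starts midway between them."""
    rng = random.Random(seed)
    counts = [0] * len(PEAKS)
    for _ in range(n_runs):
        counts[settle(rng)] += 1
    return counts

# An optimizing procedure would always return the highest peak.
optimal = max(range(len(PEAKS)), key=lambda i: PEAKS[i][1])
print("optimal peak:", optimal, "| noisy settling counts:", peak_counts())
```

Under these assumed settings, runs that start midway between the peaks don't all end at the higher one, which is the sense in which a sub-optimal option can win out.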
(2) Generating grammaticality vs. acceptability judgments from model output: I usually hear “acceptability” used when grammaticality is one (major) component of the human judgment, but other things feed into it too (like memory constraints, lexical choice, etc.). I originally thought the point of this model was that we’re trying to generate the judgment straight from the grammar, rather than from other factors (so it would be a grammaticality judgment). But maybe the judgment is getting termed acceptability rather than grammaticality because a word’s feature vector also includes semantic features (whatever those look like)?
(3) I appreciate the explanation in the Simulations section about the difference between whether islands and subject islands -- basically, for subject islands, there are no easy lexical alternatives that would let sub-optimal treelets persist and eventually allow a link between the wh-phrase and the gap. But something I want to clear up is the issue of parallel vs. greedy parsing. I had thought that the SOSP approach does greedy parsing because it finds what it considers the (noisy) local maximum option for any given word, and proceeds on from there. So, for whether islands, it picks either the “wonder” option that’s an island or the “wonder” option that gets coerced to “think”, because both of those are somewhat close in how harmonic they are. (For subject islands, there’s only one option -- the island one -- so that’s the one that gets picked.) Given this, how can we talk about the whether island as having two options at all? Is it that on any given parse, we pick one, and it’s just that sometimes it’s the one that allows a dependency to form? That would be fine. We’d just expect to see individual variation from instance to instance, and the aggregate effect would be that D-linked whether islands are better, basically because sometimes they’re okay-ish and sometimes they’re not.
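Here's a small sketch of that last reading, where each parse greedily commits to one option and the gradience only shows up in the aggregate. Everything in it is hypothetical: the harmony values, the noise level, and the names `pick_parse` and `dependency_rate` are my own placeholders, not anything taken from the SOSP model.

```python
import random

# Hypothetical parse options: (label, harmony, allows_dependency).
WHETHER_ISLAND = [
    ("island 'wonder'", 0.90, False),
    ("'wonder' coerced toward 'think'", 0.85, True),
]
SUBJECT_ISLAND = [("island", 0.90, False)]

def pick_parse(options, noise_sd, rng):
    """Greedy choice on one trial: the option with the highest noise-perturbed harmony."""
    return max(options, key=lambda opt: opt[1] + rng.gauss(0.0, noise_sd))

def dependency_rate(options, n_trials=5000, noise_sd=0.1, seed=1):
    """Fraction of trials on which the chosen parse lets the wh-dependency form."""
    rng = random.Random(seed)
    hits = sum(pick_parse(options, noise_sd, rng)[2] for _ in range(n_trials))
    return hits / n_trials

print("whether island:", dependency_rate(WHETHER_ISLAND))
print("subject island:", dependency_rate(SUBJECT_ISLAND))
```

Each individual trial has exactly one winner, so no single parse holds two options at once. The whether-island case comes out better only on average, because the dependency-friendly option wins on some fraction of trials, while the subject-island case never has such an option to pick.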