I really appreciate Potts sketching out how treating vectors of numbers as the core meaning representation could impact semantics more broadly. This is the kind of broader speculation that’s helpful for people trying to see the effects of this key assumption on things they know and love. Moreover, Potts is aware of the current shortcomings of the “DL semantics” approach, but focuses on where it could be a useful tool for semantic theory. (This is my own inclination too, so I’m very sympathetic to this point of view.) Interestingly, I think Berent & Marcus also end up sympathetic to a hybrid approach, despite their concerns about the relationship between symbolic and non-symbolic approaches to language. A key difference seems to be where each commentary focuses: Potts zooms in on semantics, while Berent & Marcus mostly seem to think about phonology and syntax. And previous non-symbolic approaches seem to have left a poor impression on Berent & Marcus.
(1) Potts: The idea that machine learning is equivalent to neural networks still confuses me temporarily. In my head, machine learning is the learning part (so it could be non-neural, like SVMs). Another important component is then feature selection, which would correspond to the embedding into that vector of numbers in Potts’s terminology. I guess this just goes to show how terminology changes over time.
(2) Potts: I totally get the analogy of how to do function application with an n-dimensional array. But how do we know that this concatenation and multiplication by a new matrix (W) yields the correct compositional meaning of two elements? Maybe the idea is that we have to find the right function application for our n-dimensional vectors? Potts basically says this by saying we have to learn the values for W from the data, so we have to use supervised learning to get the right W so that compositional meaning results. Okay. But what guarantee do we have that there is in fact a W for all the compositional meaning we might want? Of course, maybe that’s a problem for current semantic theory’s function application as well.
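To make the question in (2) concrete for myself, here is a minimal sketch of the "concatenate, then multiply by W" idea, with W estimated from examples. Everything in it is an assumption for illustration: the names `compose` and `fit_W`, the dimensionality, the purely linear form (with no nonlinearity applied after W), and the synthetic data are mine, not Potts's. The data are generated from a known W, so a correct W exists by construction; that is exactly the guarantee we don't have for real compositional meanings.

```python
import numpy as np

def compose(a, b, W):
    """Combine two d-dimensional vectors into one d-dimensional vector
    by concatenating them and multiplying by W (shape d x 2d)."""
    return W @ np.concatenate([a, b])

def fit_W(A, B, targets):
    """Least-squares estimate of W from supervised examples.

    A, B:     arrays of shape (n, d), the two input vectors per example
    targets:  array of shape (n, d), the desired composed vector
    """
    X = np.hstack([A, B])                      # shape (n, 2d)
    # Solve X @ W.T ~= targets for W.T in the least-squares sense.
    W_T, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W_T.T                               # shape (d, 2d)

# Hypothetical synthetic data: a "true" W exists by construction.
rng = np.random.default_rng(0)
d, n = 4, 200
W_true = rng.normal(size=(d, 2 * d))
A = rng.normal(size=(n, d))
B = rng.normal(size=(n, d))
targets = np.hstack([A, B]) @ W_true.T

W_hat = fit_W(A, B, targets)
recovered = np.allclose(W_hat, W_true, atol=1e-8)
```

When the targets really are a linear function of the concatenated inputs, the fit recovers W. If they weren't, `fit_W` would still return some W, just one that composes badly, which is one way to see why the existence question matters.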
(3) Potts, on how the dataset used to optimize the system will be a collection of utterances, rather than I-language abstractions: Because of this, it’d be including aspects of both representation and use (like frequency info) together, rather than just the representation part. This isn’t necessarily a bad thing, as long as we don’t explicitly care about the representation part separately. It seems like linguists often do care about this while the NLP community doesn’t. I think Potts’s example with the “A but B” construction highlights this difference nicely. Potts notes that this would make “use phenomena” more natural to study than they currently are under an intensional semantics approach, and I can see this. I just worry about how we derive explanations from a DL approach (i.e., what do we do with the weight matrix once we learn it via supervised machine learning?).
(4) Potts, on how the goal in machine learning is generalization, however that’s accomplished (with compositionality just one way to do this): Maybe compositionality is what humans ended up with due to bottleneck issues during processing and learning over time? This is the kind of stuff Kirby (e.g., Kirby 2017) has modeled with his language evolution simulations.
Kirby, S. (2017). Culture and biology in the origins of linguistic structure. Psychonomic Bulletin & Review, 24(1), 118-137.
(5) Potts, on how having any representation for lexical meaning is better than not: I totally agree with this. A hard-to-interpret vector of numbers encoding helpful aspects about the representation and use of “kitty” is still better than [[kitty]]. It just doesn’t help us explain things in the symbolic terms we use to verbalize them.
(6) Berent & Marcus, on how the algebraic hypothesis assumes an innate capacity to operate on abstract categories: Sure! Hello, Bayesian inference, for example. Yet another reason why I’m always confused when generative folks don’t like Bayesian inference.
(7) Berent & Marcus, “mental operations are structure-sensitive -- they operate only on the form of representations and ignore their meaning”: It seems like this is a syntax-specific view -- surely semantic operations would operate over meaning? Or is this the difference between lexical semantics and higher-order semantics?
(8) Berent & Marcus, on how we could tell if neural networks (NNs) generated algebraic approaches: I’m not sure I quite follow the train of logic presented. If an NN does manage to capture human behavior correctly, why would we assume that it had spontaneously created algebraic representations? Wouldn’t associationists naturally assume that it didn’t have to (unless explicitly proven otherwise)?
(9) Berent & Marcus, on previous connectionist studies: I definitely understand Berent & Marcus’s frustration with previous connectionist networks and their performance, but it seems like there have been vast improvements since 2001. I’d be surprised if you couldn’t make an LSTM of some kind that could capture some of the generalizations Marcus investigated before, provided enough data was supplied. Granted, part of the cool thing about small humans is that they don’t get all of Wikipedia to learn from, and yet can still make broad generalizations.
(10) Berent & Marcus: Kudos to Berent & Marcus for being clear that they don’t actually know for sure the scope of human generalizations in online language processing -- they’ve been assuming humans behave a particular way that current NNs can’t seem to capture, but this is yet to be empirically validated. If humans don’t actually behave that way, then maybe the algebraic commitment needs some adjustment.
(11) Berent & Marcus: It’s a fascinating observation that a resistance to the idea of innate ideas itself might be an innate bias (the Berent et al. 2019 reference). This is the first I’ve heard of this. I always thought the resistance was an Occam’s Razor sort of thing, where building in innate stuff is more complex than not building in innate stuff.