General thoughts: I so appreciate how careful Portelance and colleagues are in interpreting the results of their modeled learners and tying those results back to current theoretical perspectives on the acquisition of function words. (Also, I appreciate the Pearl (2023) citation in that discussion — thank you!)
I also really like the developmental plausibility of the setup. The models aren’t learning language for its own sake. The feedback signal here is about task success, not about linguistic form. As the authors say, language is an auxiliary objective — a tool for accomplishing something else: communicating about the visual world. That feels right. Children are focused on acting and understanding in the world, and language turns out to be an efficient way to do that with other humans. Getting feedback about whether your interpretation of a scene works — but not about whether your internal representation of a connective is correct — seems developmentally plausible.
At a broad level, the paper offers a genuine proof-of-concept: aspects of the meanings of logical connectors and relational terms can be learned from distributions of linguistic and visual information, without prior knowledge of linguistic meaning.
But once we zoom in, things get more interesting.
(1) What does it mean to be “sensitive to alternative expressions”?
A central question in the paper is whether the existence of “alternative expressions” affects acquisition. In the case of and and or, this connects to a classic Gricean idea: hearing or often leads us to infer “not and,” because if and were true, it would have been more informative to say so.
The authors suggest that their models show early evidence of being “sensitive to alternative expressions when interpreting language.” But that can mean at least two different things:
(A) Representations change when alternatives are present in the training distribution.
(B) The system reasons about alternatives during interpretation in a Gricean sense.
The experiments strongly support (A). It’s less clear to me that they establish (B).
When and and or are both present in training, performance shifts. Removing one affects how the other behaves. That shows interaction and competition. But interaction between representations isn't (yet) the same thing as reasoning about alternative utterances in a pragmatic sense.
(2) Truth-conditional geometry matters
One structural feature seems especially important: the geometry of the truth space.
For and and or, we have a nested relation: AND ⊂ OR
When both conjuncts are true, both AND and inclusive OR are true. That creates an overlap region in which two different expressions yield the same answer.
In contrast, for pairs like behind/in front of or more/fewer, the relation is symmetric: neither term entails the other, and they only overlap in contexts where both are false (e.g., the same spatial position, equal numbers). There's no scalar nesting.
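To make that geometry concrete, here's a toy enumeration (my own construction, not the paper's stimuli or code): the worlds where and holds form a strict subset of the worlds where inclusive or holds, while more and fewer never hold together.

```python
from itertools import product

# Toy worlds for the nested pair: every truth-value assignment to two conjuncts.
conjunct_worlds = list(product([False, True], repeat=2))
and_true = {w for w in conjunct_worlds if w[0] and w[1]}
or_true  = {w for w in conjunct_worlds if w[0] or w[1]}
print(and_true < or_true)      # True: AND-worlds are a strict subset of OR-worlds
print(or_true - and_true)      # the one-conjunct-true worlds, where the two diverge

# Toy worlds for a non-nested pair: pairs of counts (a, b) for "more"/"fewer".
count_worlds = list(product(range(3), repeat=2))
more_true  = {w for w in count_worlds if w[0] > w[1]}
fewer_true = {w for w in count_worlds if w[0] < w[1]}
print(more_true & fewer_true)  # empty set: they never overlap where both are true
```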
This difference seems to matter.
If two expressions yield the same answer in many contexts, the modeled learner repeatedly sees:
Same world → two different words → same label
In a distributed learning system, that encourages representational similarity (“representational entanglement”). When those same expressions diverge elsewhere (e.g., one-conjunct-true cases for and vs or), the gradients conflict: from a logical perspective, opposing truth values increase discriminability, but from a distributed learning perspective, opposing labels on similar inputs pull shared parameters in opposing directions.
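To see why overlapping supervision alone could do this, here's a minimal sketch (emphatically not the authors' architecture): a single logistic readout over a naive sum of a random scene vector and a random word embedding. Where the two words share a label, their gradients on the shared weights roughly align; where their labels oppose, the gradients point in roughly opposite directions.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)
d = 64
scene_both_true = rng.normal(size=d)   # toy encoding: both conjuncts hold
scene_one_true  = rng.normal(size=d)   # toy encoding: only one conjunct holds
emb = {"and": rng.normal(size=d), "or": rng.normal(size=d)}  # random word embeddings
readout = rng.normal(size=d) * 0.1     # shared readout weights

def grad_readout(scene, word, label):
    """Cross-entropy gradient w.r.t. the shared readout weights."""
    x = scene + emb[word]              # naive composition of scene and word
    p = sigmoid(readout @ x)
    return (p - label) * x

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Overlap region: "and" and "or" both get label 1, so the gradients roughly align.
print(cos(grad_readout(scene_both_true, "and", 1.0),
          grad_readout(scene_both_true, "or", 1.0)))    # positive

# Divergence region: "and" is false, "or" is true, so the gradients conflict.
print(cos(grad_readout(scene_one_true, "and", 0.0),
          grad_readout(scene_one_true, "or", 1.0)))     # negative
```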
I think this helps explain why and “yes” contexts are fragile when or is present, and why removing or in Experiment 2 stabilizes and performance. It may not require the model to be explicitly reasoning about alternative utterances — overlapping supervision is enough.
(3) What counts as “meaning” here?
In this modeling setup, meaning is operationalized as the pattern of answer behavior across worlds.
So, if two expressions systematically yield the same answer in a subset of contexts, the model may treat them as similar in those contexts. If they diverge elsewhere, it may expect them to diverge consistently. The system is learning statistical mappings between linguistic forms and response patterns, not necessarily structured semantic objects.
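As a toy illustration of that operationalization (my construal, not the authors' code), treat each expression's “meaning” as nothing more than its answer vector across a small set of worlds:

```python
from itertools import product

# All truth-value assignments to two conjuncts serve as the "worlds".
worlds = list(product([False, True], repeat=2))
answers = {
    "and": [p and q for p, q in worlds],
    "or":  [p or q for p, q in worlds],   # inclusive reading
}

# Behavioral similarity: the worlds on which the two answer patterns agree
# carry no signal that could distinguish the forms; only diverging worlds do.
agree = [w for w, a, o in zip(worlds, answers["and"], answers["or"]) if a == o]
print(f"agree in {len(agree)}/{len(worlds)} worlds: {agree}")
# -> agree in 2/4 worlds: [(False, False), (True, True)]
```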
That distinction becomes important when interpreting claims about logical knowledge, which seems like a different type of “meaning” to me.
(4) Implications for logical nativists
From a logical nativist perspective, the burden of proof is not simply showing that statistical learning can approximate correct answers in a constrained domain. The question is whether it yields the structured, compositional representations children appear to have.
So, I’m not sure how worried the logical nativists would be by the findings here.