I really enjoy seeing this kind of computational cognitive model, where the model is not only generating general patterns of behavior (like the ability to get the right interpretation for a novel utterance), but specifically matching a set of child behavioral results. I think it’s easier to believe in the model’s informativity when you see it able to account for a specific set of results. And those results then provide a fair benchmark for future models. (So, yay, good developmental modeling practice!)
(1) It’s always great to show what can be accomplished “from scratch” (as G&al2019 note), though this is probably harder than the child’s actual task. Presumably, by the time children are using syntactic bootstrapping to learn harder lexical items, they already have a lexicon seeded with some concrete noun items. But this is fine for a proof of concept -- basically, if we can get success on the harder task of starting from scratch, then we should also get success when we start with a head start in the lexicon. (Caveat: Unless a concrete noun bias in the early lexicon somehow skews learning in the wrong direction.)
(2) It’s a pity that the Abend et al. 2017 study wasn’t discussed more thoroughly -- that’s another model that uses a CCG representation for the semantics, a loose idea of what meaning elements are available from the scene, and this kind of rational search over possible syntactic rules, given naturalistic input. That model achieves syntactic bootstrapping, along with a variety of other features like one-shot learning, accelerated learning of individual vocabulary items corresponding to specific syntactic categories, and easier learning of nouns (thereby creating a noun bias in early lexicons). A compare & contrast with that Bayesian model would have been really helpful, especially noting what about those learning scenarios was simplified, compared with the one used here.
For instance, “naturalistic” for G&al2019 means utterances which make reference to abstract events and relations. This isn’t what’s normally meant by naturalistic, because these utterances are still idealized (i.e., artificial). That said, these idealized data have more complex pieces in them that make them similar to naturalistic language data. I have no issue with this, per se -- it’s often a very reasonable first step, especially for cognitive models that take a while to run.
(3) Figure 4: It looks like there’s a dependency where meaning depends on syntactic form, but not the other way around -- I guess that’s the linking rule? But I wonder why that direction and not the other. That is, shouldn’t form depend on meaning too, especially if we’re thinking about this as a generative model whose output is the utterance? We start with a meaning and get the language form for it, which suggests the arrow should go from meaning to syntactic form. Certainly, it seems like you need something connecting syntactic type to meaning if you’re going to get syntactic bootstrapping, and I can see from their description of the inference process why it’s helpful to have the meaning depend on the structure: they infer the meaning of a novel verb from its structure via P(m_w | s_w), which only works if the arrow goes from s_w to m_w.
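To make the direction issue concrete: even if the generative arrow ran from meaning to form, you could still recover P(m_w | s_w) by Bayesian inversion -- it’s just an extra step. A toy sketch (the numbers, frame labels, and meaning labels here are all mine, purely for illustration, not the paper’s):

```python
# Toy illustration (my own invented numbers) of the arrow-direction point.
# If the generative arrow ran meaning -> form, recovering a novel verb's
# meaning from its syntactic frame would require Bayes' rule:
#     P(m | s) ∝ P(s | m) * P(m)
# With the arrow drawn form -> meaning (as in Figure 4, on my reading),
# P(m | s) is instead read off the model directly.

p_m = {"action": 0.5, "state": 0.5}            # prior over meanings (assumed)
p_s_given_m = {                                 # likelihood of frames (assumed)
    ("transitive", "action"): 0.8,
    ("transitive", "state"):  0.2,
    ("intransitive", "action"): 0.2,
    ("intransitive", "state"):  0.8,
}

def posterior(s):
    """Invert the meaning -> form direction with Bayes' rule."""
    joint = {m: p_s_given_m[(s, m)] * p_m[m] for m in p_m}
    z = sum(joint.values())                     # normalizing constant P(s)
    return {m: v / z for m, v in joint.items()}

post = posterior("transitive")                  # -> action favored, 0.8 vs 0.2
```

So the s_w → m_w arrow isn’t strictly required for syntactic bootstrapping; it just makes the inference the model cares about a direct read-off rather than an inversion.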
(4) It took me a little bit to understand what was going on in equations 2 and 3, so let me summarize what I think I got here: if we want the probability of a particular meaning (which is composed of several independent predicates), we multiply the probabilities of those predicates together (that’s equation 3). To get the probability of each predicate, we sum over all instances of that predicate that are associated with that syntactic type (that’s equation 2).
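Here’s a minimal sketch of that reading, to check my understanding -- the entry format, weights, and normalization are my own guesses, not the paper’s actual definitions:

```python
# My reading of equations 2 and 3, as a toy weighted lexicon. Entry format,
# category labels, and weights are hypothetical (mine, not the paper's).
from math import prod

# Each lexical entry: (wordform, syntactic_type, meaning_predicates, weight)
lexicon = [
    ("blick", "S\\NP", frozenset({"cause", "move"}), 0.6),
    ("blick", "S\\NP", frozenset({"move"}),          0.3),
    ("blick", "NP",    frozenset({"object"}),        0.1),
]

def p_predicate(pred, syn_type, lexicon):
    """Eq. 2 (my reading): sum the weights of all entries with this
    syntactic type whose meaning contains this predicate, normalized
    by the total weight for that syntactic type."""
    total = sum(w for _, s, _, w in lexicon if s == syn_type)
    hits  = sum(w for _, s, m, w in lexicon if s == syn_type and pred in m)
    return hits / total if total else 0.0

def p_meaning(predicates, syn_type, lexicon):
    """Eq. 3 (my reading): predicates are independent, so the meaning's
    probability is the product of the per-predicate probabilities."""
    return prod(p_predicate(p, syn_type, lexicon) for p in predicates)

p = p_meaning({"cause", "move"}, "S\\NP", lexicon)   # product of two terms
```

If that’s right, the independence assumption in equation 3 is doing real work: a meaning never seen as a whole can still get probability mass as long as its individual predicates have shown up with that syntactic type.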
(5) The learner is constrained to encode only a limited number of entries per word at any time (i.e., only the l highest-weight lexical entries per wordform are retained): I love this ability to constrain the number of entries per word form. It seems exactly right from what I know of the kid word-learning literature, and I wonder how often a limit of two is best. From Figure 7, it looks like 2 is pretty darned good (pretty much overlapping 7, and better than 3 or 5, if I’m reading those colors correctly).
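The pruning constraint itself is simple enough to sketch -- here’s how I picture it, with an entry format of my own invention (the paper’s actual representation is richer):

```python
# Hedged sketch of the top-l pruning constraint as I understand it:
# keep only the l highest-weight lexical entries per wordform.
# The (wordform, entry, weight) tuple format and toy data are mine.
from collections import defaultdict

def prune_lexicon(entries, l=2):
    """Return only the l highest-weight entries for each wordform."""
    by_word = defaultdict(list)
    for word, entry, weight in entries:
        by_word[word].append((entry, weight))
    kept = []
    for word, cands in by_word.items():
        cands.sort(key=lambda ew: ew[1], reverse=True)   # heaviest first
        kept.extend((word, e, w) for e, w in cands[:l])
    return kept

toy = [("dax", "N:obj1", 0.5), ("dax", "N:obj2", 0.9),
       ("dax", "V:act",  0.2), ("fep", "V:move", 0.7)]
pruned = prune_lexicon(toy, l=2)   # "dax" keeps its two heaviest entries
```

With l=2, the low-weight “dax” entry is dropped while both heavier “dax” entries and the sole “fep” entry survive -- which matches the intuition that kids track only a couple of live hypotheses per word at once.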