Tuesday, November 23, 2021

Some thoughts on Bohn et al. 2021

I think it’s really nice to see a developmental RSA model, along with explicit model comparisons. To me, this approach highlights how you can encode specific theories/hypotheses about what exactly is developing via computational cognitive modeling “snapshots” of observable behavior at different ages. Also, we get to see the model-evaluation pipeline often used in adult RSA modeling now used with kids (i.e., the model makes testable predictions that are in fact tested on kids). I also appreciate how careful B&al2021 are about how model parameters link to psychological processes; they emphasize in the general discussion that their model necessarily made idealizations to be able to get anywhere.


Some other thoughts:

(1) It’s interesting to me that B&al2021 talk about children integrating all available information, in contrast to alternative models that ignore some information (and don’t do as well). I’m assuming “all” is relative, because a major part of language development is learning which part of the input signal is relevant. For instance, speaker voice pitch is presumably available information, but I don’t think B&al2021 would consider it relevant for the inference process they’re interested in. But I do get that they’re contrasting the winning model with one that ignores some available relevant information.


(2) The way B&al2021 talk about informativity seems to shift at points. In one sense, they talk about an informative and cooperative speaker, which links to the general RSA framework of speaker utility as maximizing correct listener inference. In another sense, they connect informativity to alpha specifically, which seems like a narrower sense of “informativity”, maybe tied to how far above 1 alpha is (and therefore how deterministic the speaker’s utterance probabilities are).
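As a quick reminder of why alpha works that way (this is the standard RSA speaker with utterance costs omitted, not necessarily B&al2021’s exact parameterization), here’s a minimal Python sketch of how alpha sharpens the speaker distribution:

    import numpy as np

    def speaker(listener_probs, alpha):
        # Standard RSA-style speaker: P_S(u | w) is proportional to exp(alpha * log P_L0(w | u)).
        # listener_probs holds literal-listener probabilities P_L0(w | u) for each candidate utterance u.
        unnorm = np.exp(alpha * np.log(listener_probs))
        return unnorm / unnorm.sum()

    p_L0 = np.array([0.6, 0.4])    # hypothetical literal-listener values for two utterances
    print(speaker(p_L0, alpha=1))  # ~[0.60, 0.40]: just mirrors the literal listener
    print(speaker(p_L0, alpha=4))  # ~[0.84, 0.16]: noticeably more deterministic

With alpha = 1 the speaker simply matches the literal listener; the further above 1 alpha goes, the more probability piles onto the single most informative utterance.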


(3) Methodology, no-word-knowledge variant: Even after reading the methods section, I was still a little fuzzy on how general vocabulary size is estimated and then used in place of specific word familiarity, other than that it’s of course the same value for all objects (rather than differing by word familiarity).


Tuesday, November 9, 2021

Some thoughts on Perfors et al. 2010

I’m reminded how much I enjoy this style of modeling work. There’s a lot going on, but the intuitions and motivations for it made sense to me throughout, and I really appreciated how careful P&al2010 were in both interpreting their modeling results and connecting them to the existing developmental literature.


Some thoughts:

(1) I’m generally a big fan of building less in, but building it in more abstractly. This approach makes the problem of explaining where the built-in stuff comes from easier -- if fewer things have to be built in, there’s less explaining to do about where they came from.


(2) I really appreciate how careful P&al2010 are with their conclusions about the value of having verb classes. It does seem like the model with verb classes (K-L3) captures the age-related decrease in overgeneralization much more strongly than the one with a single verb class (L3) does. But P&al2010 still note that both technically capture the key effects. Qualitative developmental pattern as the official evaluation measure, check! (Something we see a lot in modeling work, because then you don’t have to explain every nuance of the observed behavior; instead you can say the model predicts something that matters a lot for producing that behavior, even if it’s not the only thing that matters.)


(3) Study 3: It might seem strange to try to add more to the Study 2 model, which already seems to capture the known empirical developmental data with just syntactic distribution information. But the thing we always have to remember is that learning any particular thing doesn’t occur in a vacuum -- if there’s useful information in the input and children don’t filter it out for some reason, then they probably do in fact use it, and it’s helpful to see what impact that has on an explanatory model like this. Basically, does the additional information intensify the model-generated patterns or muck them up, especially if it’s noisy? This can tell us whether kids could be using this additional information (or when they’re using it), or whether they maybe should ignore it. This comes back at the end of the results presentation, when P&al2010 mention that having 13 features with only 6 helpful ones ruins the model -- the model can’t ignore the other 7, tries to incorporate them, and gets mucked up. Also, as P&al2010 demonstrate here, this approach can differentiate between model types (i.e., representational theories here: with verb classes vs. without).
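To make that “mucked up” intuition concrete -- and this is just a toy illustration with equally weighted features, not P&al2010’s actual hierarchical Bayesian model -- here’s a sketch of how uninformative features wash out the class signal:

    import numpy as np

    rng = np.random.default_rng(0)

    def class_separation(n_informative, n_irrelevant, n_verbs=20):
        # Toy measure: mean within-class feature agreement minus between-class agreement
        # for two hypothetical verb classes of 10 verbs each.
        labels = np.array([0] * (n_verbs // 2) + [1] * (n_verbs // 2))
        # Informative features track class membership 90% of the time.
        informative = np.where(rng.random((n_verbs, n_informative)) < 0.9,
                               labels[:, None], 1 - labels[:, None])
        # Irrelevant features are coin flips.
        irrelevant = rng.integers(0, 2, size=(n_verbs, n_irrelevant))
        X = np.hstack([informative, irrelevant])
        agreement = (X[:, None, :] == X[None, :, :]).mean(axis=2)
        eye = np.eye(n_verbs, dtype=bool)
        same_class = (labels[:, None] == labels[None, :]) & ~eye
        diff_class = labels[:, None] != labels[None, :]
        return agreement[same_class].mean() - agreement[diff_class].mean()

    print(class_separation(6, 0))  # clear separation between the two classes
    print(class_separation(6, 7))  # adding 7 uninformative features dilutes the signal a lot

A learner that can’t down-weight the uninformative features sees the class structure get blurrier as those features pile up, which is the flavor of failure described above.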


(4) Small implementation thing: In Study 3, when noise is added to the semantic feature correlations so that the appropriate semantic feature only appears 60% of the time, presumably this is implemented across verb instances, rather than with only 60% of the verbs in that class having the feature? Otherwise, if some verbs always had the feature and some never did, I would think the model would probably end up inferring multiple classes for each syntactic type instead of just one, e.g., a PD-only class with the P feature and a PD-only class with no feature.
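For concreteness, here are the two possible readings as a quick Python sketch (my own framing, not the paper’s code; the verb and instance counts are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    n_verbs, n_instances = 10, 50  # hypothetical: 10 verbs in the class, 50 instances each

    # (a) Per-instance noise: every verb shows the feature on ~60% of its instances.
    per_instance = rng.random((n_verbs, n_instances)) < 0.6

    # (b) Per-verb noise: ~60% of verbs always show the feature, the rest never do.
    has_feature = rng.random(n_verbs) < 0.6
    per_verb = np.tile(has_feature[:, None], (1, n_instances))

    print(per_instance.mean(axis=1))  # roughly 0.6 for every verb
    print(per_verb.mean(axis=1))      # exactly 1.0 or 0.0, depending on the verb

Under (a), every verb looks alike (a feature rate of around 0.6), so one class per syntactic type stays plausible; under (b), verbs split into feature-havers and feature-lackers, which is exactly the pressure toward two classes per syntactic type.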