I’m reminded how much I enjoy this style of modeling work. There’s a lot going on, but the intuitions and motivations for it made sense to me throughout, and I really appreciated how careful P&al2010 were in both interpreting their modeling results and connecting them to the existing developmental literature.
Some thoughts:
(1) I’m generally a fan of building less in, but building it in more abstractly. This approach makes the problem of explaining where that built-in stuff comes from easier -- if you have to explain where fewer things came from, you have less explaining to do.
(2) I really appreciate how careful P&al2010 are with their conclusions about the value of having verb classes. It does seem like the model with classes (K-L3) captures the age-related decrease in overgeneralization much more strongly than the one with a single verb class (L3). But, P&al2010 still note that both technically capture the key effects. Qualitative developmental pattern as the official evaluation measure, check! (Something we see a lot in modeling work, because then you don’t have to explain every nuance of the observed behavior; instead you can say the model can predict something that matters a lot for producing that observed behavior, even if it’s not the only thing that matters.)
(3) Study 3: It might seem strange to add more to the Study 2 model, which already seems to capture the known empirical developmental data with just syntactic distribution information. But, the thing we always have to remember is that learning any particular thing doesn’t occur in a vacuum -- if useful information is in the input, and children don’t filter it out for some reason, then they probably do in fact use it, and it’s helpful to see what impact this has on an explanatory model like this. Basically, does the additional information intensify the model-generated patterns or muck them up, especially if it’s noisy? This can tell us whether kids could be using this additional information (or when they’re using it) or maybe should ignore it, for instance. This comes back at the end of the results presentation, when P&al2010 mention that having 13 features with only 6 being helpful ruins the model -- the model can’t ignore the other 7, tries to incorporate them, and gets mucked up. Also, as P&al2010 demonstrate here, this approach could differentiate between different model types (i.e., representational theories here: with verb classes vs. without).
(4) Small implementation thing: In Study 3, when noise is added to the semantic feature correlations, so that the appropriate semantic feature only appears 60% of the time: Presumably this would be implemented across verb instances, rather than only 60% of the verbs in that class having the feature? Otherwise, if some verbs always had the feature and some didn’t, I would think the model would probably end up inferring different classes for each syntactic type instead of just one per syntactic type, e.g., a PD-only class with the P feature and a PD-only class with no feature.
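To make the two readings of "60% of the time" concrete, here is a minimal sketch of what each noise scheme would look like in the generated data. This is my own illustration, not P&al2010's implementation: the function names, the 10 verbs, and the 200 tokens per verb are all made-up values, and the feature is reduced to a single present/absent flag per verb token.

```python
import random

def token_level_noise(n_verbs, tokens_per_verb, p, rng):
    """Reading 1: every token of every verb carries the feature with probability p."""
    return [[rng.random() < p for _ in range(tokens_per_verb)]
            for _ in range(n_verbs)]

def type_level_noise(n_verbs, tokens_per_verb, p, rng):
    """Reading 2: a fraction p of the verbs always carry the feature; the rest never do."""
    n_with = round(p * n_verbs)
    has_feature = [True] * n_with + [False] * (n_verbs - n_with)
    rng.shuffle(has_feature)
    return [[flag] * tokens_per_verb for flag in has_feature]

def per_verb_rates(data):
    """Proportion of each verb's tokens that carry the feature."""
    return [sum(tokens) / len(tokens) for tokens in data]

# Hypothetical sizes, chosen only for illustration.
rng = random.Random(0)
token_rates = per_verb_rates(token_level_noise(10, 200, 0.6, rng))
type_rates = per_verb_rates(type_level_noise(10, 200, 0.6, rng))

print("token-level:", [round(r, 2) for r in token_rates])
print("type-level: ", type_rates)
```

Under the first scheme every verb's rate hovers around 0.6, so the verbs in a syntactic class still look alike. Under the second, the rates are all exactly 0 or 1, which is the kind of clean split I'd expect a class-learning model to pick up on as two separate classes.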