One of the things I quite liked about this paper was the description of the intuitions behind the different model parameters and capacity limitations. As a computational modeler who's seen ideal Bayesian learners before, could I have just as easily decoded this from a standard graphical model representation? Sure. Did I like to have the intuitions laid out for me anyway? You bet. Moreover, if we want these kinds of models to be recognized and used within language research, it's good to know how to explain them like this. On a related note, I also appreciated that Perfors explicitly recognized the potential issues involved in extending her results to actual language learning. As with most models, hers is a simplification, but it may be a useful simplification, and there are probably useful ways to un-simplify it.
It was also good to see the discussion of the relationship between the representations this model used for memory and the existing memory literature. (Given the publication venue, this probably isn't so surprising, but given that my knowledge of memory models is fairly limited, it was helpful to see this spelled out.)
I think the most surprising thing for me was how much memory loss was required before the regularization bias could come into play and the model would show regularization. Do we really think children only remember 10-20% of what they hear? (Maybe they do, though, especially in more realistic scenarios.)
More specific thoughts:
Intro: I found the distinctions made between different "less is more" hypothesis variants to be helpful, in particular the difference between a "starting small" version that imposes explicit restrictions on the input (because of attention, memory, etc.) to identify useful units in the input vs. a general regularization tendency (which may be the byproduct of cognitive limitations, but isn't specifically about ignoring some of the input) which is about "smoothing" the input in some sense.
Section 2.1.2: The particular task Perfors chooses to investigate experimentally is based on previous tasks that have been done with children and adults to test regularization, but I wonder what kind of task it seemed like to the adult subjects. Since the stimuli were presented orally, did the subjects think of each one as a single word that had some internal inconsistency (and so might be treating the variable part as morphology tacked onto a noun) or would they have thought of each one as one consistent word plus a separate determiner-like thing (making this more of a combinatorial syntax task)? I guess it doesn't really matter for the purposes of regularization - if children can regularize syntax (creoles, Nicaraguan Sign Language, Simon), then presumably they regularize morphology (e.g., children's overregularization of the past tense in English, like "goed"), and it's not an unreasonable assumption that the same regularization process would apply to both. Perfors touches again on the issue of how adults perceived the task a little in the discussion (p.40) - she mentions that mutual exclusivity might come into play if adults viewed this as a word learning task, and cause more of a bias for regularization. Whether it's a morphology task or a combinatorial syntax task, I'm not sure I agree with that - mutual exclusivity seems like it would only apply if adults assumed the entire word was the name of the object (as opposed to the determiner-thing being an actual determiner like "the" or "a" or morphology like "-ed" or "-ing"). Because only a piece of the entire "word" would change with each presentation of the object, it doesn't seem like adults would make that assumption.
Section 3.0.6: For the Prior bias, it seems like the prior is constructed from the global frequency of the determiner (based on the CRP). This seems reasonable, but I wonder if it would matter to have a lexical-item-based prior (maybe in addition to the global prior)? I could imagine that the amount of forgotten data for any individual item might be quite high (even if it is low for others) when memory loss is less than 80-90% globally, which might allow the regularization effects to show up without needing to forget 80-90% of all the data.
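To make that last intuition a bit more concrete, here is a minimal sketch of my own (not anything from Perfors' model). It assumes each token of an item is forgotten independently at the global loss rate, and asks how likely it is that a single item still ends up losing most of its data. The function name, the 10 tokens per item, and the 50% global loss rate are all hypothetical numbers chosen for illustration.

```python
from math import comb


def prob_item_loses_at_least(n_tokens: int, loss_rate: float, min_lost: int) -> float:
    """Probability that at least `min_lost` of an item's `n_tokens` tokens are forgotten.

    Assumption (mine, for illustration): every token is forgotten independently
    with probability `loss_rate`, so the number forgotten is binomial.
    """
    return sum(
        comb(n_tokens, k) * loss_rate**k * (1 - loss_rate) ** (n_tokens - k)
        for k in range(min_lost, n_tokens + 1)
    )


# Hypothetical setup: 10 tokens per item, 50% global memory loss.
# How often does one particular item lose 80% or more of its tokens?
p_heavy_loss = prob_item_loses_at_least(n_tokens=10, loss_rate=0.5, min_lost=8)
print(f"P(item loses >= 8 of 10 tokens) = {p_heavy_loss:.4f}")
```

Under these made-up numbers, roughly 5.5% of items would lose 80% or more of their data even though the global loss rate is only 50%. Whether that is enough for an item-based prior to produce regularization is exactly the open question; the sketch only shows that per-item loss can be well above the global rate.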
Section 4: It's an interesting observation that the previous experiments that found regularization effects were conducted over multiple days, where consolidation during sleep would have presumably occurred. Perfors mentions this as a potential memory distortion that occurs not during encoding or retrieval, but rather during memory maintenance. If this is true, running the experiments again with adults, but over multiple days, should presumably allow this effect to show up.