One of the things I really liked about this paper was that it implements a computational model that makes predictions, and then tests those predictions experimentally. It's becoming more of a trend to do both within a single paper, but often it's too involved to describe both parts, and so they end up in separate papers. Fortunately, here we see something concise enough to fit both in, and that's a lovely thing.
I also really liked that R&al investigate the logical problem of language acquisition (LPLA) by targeting one specific instance of that problem that's been held up (or at least used to be held up, as recently as ten years ago) as an easily understood example of the LPLA. I'm definitely sympathetic to R&al's conclusions, but I don't think I believe the implication that this debunks the LPLA. I do believe it's a way to solve it for this particular instantiation, but the LPLA is about induction problems in general -- not just this one, not just subset problems, but all kinds of induction problems. And I do think that induction problems abound in language acquisition.
It was interesting to me how R&al talked about positive and negative evidence -- it almost seemed like they conflated two dimensions that are distinct: positive (something present) vs. negative (something absent), and direct (about that data point) vs. indirect (about related data points). For example, they equate positive evidence with "the reinforcement of successful predictions", but to me, that could be a successful prediction about what's supposed to be there (direct positive evidence) or a successful prediction about what's not supposed to be there (indirect negative evidence). Similarly, prediction error is equated with negative evidence, but a prediction error could be about predicting something should be there but it actually isn't (indirect negative evidence) or about predicting something shouldn't be there but it actually is (direct positive evidence -- and in particular, counterexamples). However, I do agree with their point that indirect negative evidence is a reasonable thing for children to be using, because of children's prediction ability.
Another curious thing for me was that the particular learning story R&al implement forces them to commit to what children's semantic hypothesis space is for a word (since it hinges on selecting the appropriate semantic hypothesis for the word as well as the appropriate morphological form, and using that to make predictions). This seemed problematic, because the semantic hypothesis space is potentially vast, particularly if we're talking about what semantic features are associated with a word. And maybe the point is that their story should work no matter what the semantic hypothesis space is, but that wasn't obviously true to me.
As an alternative, it seemed to me that the same general approach could be taken without having to make that semantic hypothesis space commitment. In particular, suppose the child is merely tracking the morphological forms, and recognizes the +s regular pattern from other plural forms. This causes them to apply this rule to "mouse" too. Children's behavior indicates there's a point where they use both "mice" and "mouses", so this is a morphological hypothesis that allows both forms (H_both). The correct hypothesis only allows "mice" (H_mice), so it's a subset-superset relationship of the hypotheses (H_mice is a subset of H_both). Using Bayesian inference (and the accompanying Size Principle) should produce the same results we see computationally (the learner converges on the H_mice hypothesis over time). It seems like it should also be capable of matching the experimental results: early on, examples of the regular rule indirectly boost the H_both hypothesis more, but later on when children have seen enough suspicious coincidences of "mice" input only, the indirect boost to H_both matters less because H_mice is much more probable.
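The alternative story above can be made concrete with a minimal sketch. This is my own illustration, not R&al's model: it assumes just two morphological hypotheses (H_both, which allows both "mice" and "mouses", and H_mice, which allows only "mice"), a 50/50 prior between them, and the Size Principle likelihood, where each hypothesis spreads its probability uniformly over the forms it allows. Under those assumptions, each "suspicious coincidence" of hearing only "mice" shifts the posterior toward the subset hypothesis H_mice.

```python
def posterior_h_mice(n_mice_observed, prior_h_mice=0.5):
    """Posterior probability of H_mice after observing 'mice' n times.

    Size Principle likelihoods (illustrative assumption):
      H_mice allows 1 form  -> each 'mice' observation has likelihood 1
      H_both allows 2 forms -> each 'mice' observation has likelihood 1/2
    """
    prior_h_both = 1.0 - prior_h_mice
    like_h_mice = 1.0 ** n_mice_observed   # (1/|H_mice|)^n = 1
    like_h_both = 0.5 ** n_mice_observed   # (1/|H_both|)^n
    unnorm_mice = prior_h_mice * like_h_mice
    unnorm_both = prior_h_both * like_h_both
    return unnorm_mice / (unnorm_mice + unnorm_both)

# Early on, H_both is still quite probable (so indirect boosts from the
# regular +s rule matter); with more 'mice'-only input, H_mice dominates.
for n in [0, 1, 5, 10]:
    print(n, round(posterior_h_mice(n), 4))
```

Running this shows the convergence pattern described above: the posterior on H_mice starts at 0.5 and climbs past 0.99 within about ten "mice"-only observations, which is the qualitative trajectory the experimental results would need to match.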
So then, I think the only reason to add on this semantic hypothesis space the way R&al's approach does is if you believe the learning story is necessarily semantic, and therefore must depend on the semantic features.
Some more specific thoughts:
(1) The U-shaped curve of development: R&al talk about the U-shaped curve of development in a way that seemed odd to me. In particular, in section 6 (p.767), they call the fact that "children who have been observed to produce mice in one context may still frequently produce overregularized forms such as mouses in another" a U-shaped trajectory. But this seems to me to just be one piece of the trajectory (the valley of the U, rather than the overall trajectory).
(2) The semantic cues issue comes back in an odd way in section 6.7, where R&al say that the "error rate of unreliable cues" will "help young speakers discriminate the appropriate semantic cues to irregulars" (p.776). What semantic cues would these be? (Aren't the semantics of "mouses" and "mice" the same? The difference is morphological, rather than semantic.)
(3) R&al promote the idea that a useful thing computational approaches to learning do is "discover structure in the data" rather than trying to "second-guess the structure of those data in advance" (section 7.4, p.782). That seems like a fine idea, but I don't think it's actually what they did in this particular computational model. In particular, didn't they predefine the hypothesis space of semantic cues? So yes, structure was discovered, but it was discovered in a hypothesis space that had already been constrained (and this is the main point of modern linguistic nativists, I think -- you need a well-defined hypothesis space to get the right generalizations out).