I think the main innovation of this model (and this style of modeling in general) is to provide a formal representation of how multiple levels of knowledge can be learned simultaneously in a mathematical framework. This is also (unsurprisingly) something that appeals to me very much. Interestingly, this kind of thing seems to be very similar to the idea of linguistic parameters that nativists talk about. A linguistic parameter would most naturally correspond to a hyper-parameter (or hyper-hyper-parameter, etc.), with individual linguistic items providing a way to identify the linguistic parameter value, which then allows generalization to items rarely or not yet seen.
The interesting claim (to me) that the authors make here, of course, is that these hyper-parameters have extremely weak priors and so (perhaps) domain-specific knowledge is not required to set their values. I think this may still leave open the question of how the learner knows what the parameters are, however. In this model, the parameters are domain-general things, but linguistic parameters are often assumed to be domain-specific (e.g., head-directionality, subject-drop). Perhaps the claim would then be that linguistic parameters can be re-imagined as these domain-general parameters, and all the details the domain-specific parameters were originally created to explain would fall out from some interplay between the new domain-general parameters and the data.
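To make the "multiple levels learned at once" idea concrete for myself, here is a minimal sketch. It is not the authors' model: it is a stripped-down beta-binomial toy in which each verb has its own rate of appearing in construction A (vs. B), and a single hyper-parameter alpha says whether verbs in general tend to be all-or-none (small alpha) or mixed (large alpha). The grid of alpha values, the flat prior over that grid, the toy counts, and all function names are my own assumptions for illustration.

```python
import math

def log_beta(a, b):
    # Log of the Beta function, via log-gamma for numerical stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(k, n, alpha):
    # Log probability of one verb's usage sequence (k of n uses in
    # construction A) with the verb's own rate integrated out under a
    # symmetric Beta(alpha, alpha) prior.
    return log_beta(k + alpha, n - k + alpha) - log_beta(alpha, alpha)

def posterior_over_alpha(counts, grid):
    # Flat (weak) prior over the grid, so the posterior is just the
    # normalized likelihood of all verbs' counts under each alpha.
    log_post = [sum(log_marginal(k, n, a) for k, n in counts) for a in grid]
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def predict_next(k, n, counts, grid):
    # Probability that a verb seen k times in A out of n uses appears
    # in A next, averaging over what the other verbs say about alpha.
    post = posterior_over_alpha(counts + [(k, n)], grid)
    return sum(p * (k + a) / (n + 2 * a) for p, a in zip(post, grid))

GRID = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]   # assumed candidate alpha values
consistent = [(10, 10), (0, 10)] * 5     # toy data: every verb uses one construction only
mixed = [(5, 10)] * 10                   # toy data: every verb alternates freely

# A new verb seen exactly once, in construction A.
pred_consistent = predict_next(1, 1, consistent, GRID)
pred_mixed = predict_next(1, 1, mixed, GRID)
print(pred_consistent, pred_mixed)
```

The same single observation of the new verb licenses a strong generalization when the other verbs are all-or-none, and a weak one when the other verbs alternate. That is the sense in which item-level data identify the higher-level value, which then governs items rarely or not yet seen. Note that the sketch still hard-codes what the hyper-parameter is, which is exactly the open question above.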
Some more targeted thoughts:
p.2, dative alternation restrictions: I admit I have great curiosity about how easy it is to saturate on these kinds of restrictions. If they're easily mutable, then they're not necessarily the kind of phenomena that linguists often posit linguistic principles and parameters for. Instead, the alternations that we see would be more a sort of accident of usage, rather than reflecting any deep underlying restrictions of language structure. This idea of "accident of usage" comes up again on p.5, where they mention that the distinctions in usage don't seem to be semantically driven (no completely reliable semantic cues).
p.12, footnote 4: This footnote mentions that the model doesn't involve memory limitations the way humans do, which leads me to my usual question when dealing with rational models: how do we convert this into a process model? Is it straightforward to add memory limitations and other processing effects? And then, once you have this, do the results found with the computational-level model still occur? This gets at the difference between "is it possible to do" (yes, apparently) and "is it possible for humans to do" (unclear).
p.15, related thought to the above one involving the different epochs of training: If this process of running the model's inference engine after every so many data points were taken to its extreme, then it seems we could create an incremental version that does its inference thing after every data point encountered. This would be a first step towards creating a process model, I think.
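Here is a rough sketch of what "inference after every data point" could look like. It is a toy beta-binomial setup, not the authors' model or inference engine. The learner keeps a weight for each candidate hyper-parameter value alpha and, on every observation, multiplies each weight by how well that alpha predicted the observation. The class name, the alpha grid, and the toy verb stream are all my own assumptions.

```python
import math

class IncrementalLearner:
    """Hypothetical learner that updates its beliefs after every data point."""

    def __init__(self, grid):
        self.grid = list(grid)
        self.log_w = [0.0] * len(self.grid)  # flat prior over candidate alphas
        self.counts = {}                     # verb -> (uses in A, total uses)

    def observe(self, verb, in_a):
        k, n = self.counts.get(verb, (0, 0))
        for i, a in enumerate(self.grid):
            # Predictive probability of construction A for this verb under
            # a symmetric Beta(a, a) prior, given the verb's counts so far.
            p_a = (k + a) / (n + 2 * a)
            self.log_w[i] += math.log(p_a if in_a else 1.0 - p_a)
        self.counts[verb] = (k + int(in_a), n + 1)

    def posterior(self):
        top = max(self.log_w)
        weights = [math.exp(x - top) for x in self.log_w]
        total = sum(weights)
        return [w / total for w in weights]

    def predict(self, verb):
        # Probability the verb's next use is in construction A.
        k, n = self.counts.get(verb, (0, 0))
        return sum(p * (k + a) / (n + 2 * a)
                   for p, a in zip(self.posterior(), self.grid))

learner = IncrementalLearner([0.1, 0.5, 1.0, 2.0, 5.0, 10.0])

# Toy stream: ten verbs, each used ten times in only one construction.
for use in range(10):
    for v in range(10):
        learner.observe(f"verb{v}", in_a=(v % 2 == 0))

# A brand-new verb, seen once in construction A.
learner.observe("newverb", in_a=True)
p_new = learner.predict("newverb")
p_unseen = learner.predict("never_seen")
print(p_new, p_unseen)
```

In this toy case the data-point-by-data-point update is exact, so it ends up with the same beliefs a batch learner would have. It also stores exact counts for every verb, so it has no memory limitations at all. It is only the first step; the interesting question is what happens once the counts or the set of candidate values have to be approximated.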
p.23, related to the large opening comment, with respect to the semantic features: This seems like a place where nativists might want to claim an innate bias to heed certain kinds of semantic features over others. (And then they can think about whether the necessary bias is domain-specific or domain-general.)