So, I can start off by saying that there are many things about this paper that warmed the cockles of my heart. First, I love that modeling is highlighted as an explanatory tool. To me, that's one of the best things about computational modeling - the ability to identify an explanation for observed behavior, in addition to being able to produce said behavior. I also love that psychological constraints and biases are incorporated into the model. This is the algorithmic/process-level style of model that I really enjoy working with, since it focuses on the connection between the abstract representation of what's going on and what people are actually doing. Related to both of the above, I was very happy to see how the model made its assumptions concrete and thus isolated (potential) explanatory factors within the model. Now, maybe we don't always agree with how an assumption has been instantiated (see the note on novelty below), but at least we know it's an assumption and we can see that version of it in action. And that is definitely a good thing, in my (admittedly biased) opinion.
Some more specific thoughts:
I found the general behavioral result from Vlach et al. 2008 about the "spacing effect" interesting: learning was better when items were distributed over a period of time, rather than occurring one right after another. This is the opposite of "burstiness", which (I thought) is supposed to facilitate other types of learning (e.g., word segmentation). Maybe the difference has to do with the complexity of the thing being learned, or with what existing framework the learner has for learning it (since I believe the Vlach et al. experiments were with adults)?
I thought the semantic representation of the scene as a collection of features was a nice step towards what the learner's representation is probably like (rather than just individual referent objects). When dealing with novel objects and more mature learners, this seems much more likely to me. On the other hand, I was a little fuzzy on how exactly the features and their feature weights were derived for the novel objects. (It's mentioned briefly in the Input Generation section that each word's true meaning is a vector of semantic features, but I completely missed how those features are selected.)
Novelty: Nematzadeh et al. (N&al) implement novelty as an inverse function of recency. There's something obviously right about this, but I wonder about other definitions of novelty, like something that taps into the overall frequency of an item's appearance (so, novel because it's fairly rare in the input). I'm not sure how this other definition (or a novelty implementation that incorporates both recency and overall frequency) would jibe with the experimental results N&al are trying to explain.
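To make the contrast concrete, here's a minimal sketch of the three options: recency-based novelty, frequency-based novelty, and a mix of the two. All of the function names, the specific functional forms, and the `mix` weight are my own assumptions for illustration. In particular, the recency form below is not N&al's equation (2), just one simple function that stays between 0 and 1.

```python
def recency_novelty(t, t_last):
    """Hypothetical recency-based novelty (NOT the paper's equation).

    Returns 0 right after an exposure and approaches 1 as the gap grows.
    """
    elapsed = t - t_last
    if elapsed < 0:
        raise ValueError("t must not be earlier than t_last")
    return 1.0 - 1.0 / (1.0 + elapsed)


def frequency_novelty(word, counts):
    """Hypothetical frequency-based novelty: rare words count as more novel.

    `counts` maps each word to how often it has appeared so far.
    """
    total = sum(counts.values())
    if total == 0:
        return 1.0  # nothing seen yet, so everything is maximally novel
    return 1.0 - counts.get(word, 0) / total


def combined_novelty(word, t, t_last, counts, mix=0.5):
    """Weighted average of the two notions; `mix` is an assumed free parameter."""
    return (mix * recency_novelty(t, t_last)
            + (1.0 - mix) * frequency_novelty(word, counts))


# Toy input (made up): "dax" is rare overall but was just heard one step ago.
counts = {"dax": 1, "ball": 9}
rare_but_recent = combined_novelty("dax", t=10, t_last=9, counts=counts)
```

In this toy case the two notions pull in opposite directions for "dax" (low novelty by recency, high novelty by frequency), which is exactly the situation where the choice of definition could change what the model predicts.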
Technical side note, related to the above: I had some trouble interpreting equation (2) - is the difference between t and t_last_w a fraction of some kind? Maybe because time is measured in minutes, but the presentation durations are in seconds? Otherwise, novelty could become negative, which seems a bit weird.
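Here's a tiny sketch of why the units matter. I'm assuming, purely for illustration, that novelty is computed as 1 minus the raw time difference; the actual equation (2) may well differ, and the 3-second gap is a made-up value rather than a number from the paper.

```python
def raw_difference_novelty(t, t_last):
    """ASSUMED form for illustration only: 1 minus the raw elapsed time."""
    return 1.0 - (t - t_last)


gap_seconds = 3.0                  # hypothetical gap between two exposures
gap_minutes = gap_seconds / 60.0   # the same gap expressed in minutes

# If time is measured in minutes, a short gap is a small fraction
# and novelty stays between 0 and 1.
novelty_if_minutes = raw_difference_novelty(gap_minutes, 0.0)

# If time is measured in seconds, the same gap exceeds 1
# and novelty goes negative.
novelty_if_seconds = raw_difference_novelty(gap_seconds, 0.0)
```

Under that assumed form, the same gap gives a sensible value in minutes and a negative one in seconds, which is the oddity I was wondering about.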
I was thinking some about the predictions of the model, based on Figure 4 and the discussion following it, where N&al are trying to make the model replicate certain experimental results. I think their model would predict that if learners had longer to learn the simplest condition (2 x 2), i.e., if the presentation duration were longer so that the semantic representations didn't decay so quickly, then that condition should be the one learned best. That is, the "desirable difficulty" benefit is really about memory decay not happening as quickly in the 3 x 3 condition as in the 2 x 2 condition.
I found it incredibly interesting that the behavioral experiment Vlach & Sandhofer 2010 (V&S) conducted just happened to have exactly the right item spacing/ordering/something else to yield the interesting results they found, but other orderings of those same items would be likely to yield different (perhaps less interesting) results. You sort of have to wonder how V&S happened upon just the right order - good experiment piloting, I guess? Though at the end of the discussion section, N&al seem to back off from claiming it's all about the order of item presentation, since none of the obvious variables potentially related to order (average spacing, average time since last presentation, average context familiarity) seemed to correlate with the output scores.