Some more targeted thoughts:
- I think one of the claims is that the learner's hypothesis space would exclude hypotheses allowing stress patterns that don't obey neighborhood-distinctness (ND). A slightly more relaxed version of this could be beneficial: the hypothesis space includes stress patterns that violate ND, but the learner is strongly biased against them, so learning a non-ND language requires overcoming that bias. In Bayesian terms, non-ND patterns are in the hypothesis space, but their prior is low. From this, we could predict that non-ND languages take longer for children to learn - we might expect children to follow ND generalizations for some time before the data finally lead them to overcome their initial bias.
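The soft-bias idea above can be sketched as a toy Bayesian model (my own illustration, not anything from Heinz; the hypothesis names, priors, and likelihoods are all assumptions chosen for illustration): two hypotheses, ND-obeying vs. ND-violating, where the prior strongly favors ND but enough data from a non-ND language eventually flips the posterior.

```python
# Toy model (illustrative assumptions throughout): the learner starts with a
# strong prior for the ND hypothesis; the data come from a non-ND language,
# so each observation is more likely under the non-ND hypothesis.

def posterior(prior_nd, lik_nd, lik_non_nd, n_observations):
    """Posterior probability of the ND hypothesis after n independent
    observations, with per-observation likelihoods under each hypothesis."""
    p_nd = prior_nd * lik_nd ** n_observations
    p_non = (1 - prior_nd) * lik_non_nd ** n_observations
    return p_nd / (p_nd + p_non)

# Prior of 0.99 for ND; each observation is twice as likely under non-ND.
for n in [0, 2, 5, 10]:
    print(n, round(posterior(0.99, 0.4, 0.8, n), 3))
```

The posterior stays near the ND hypothesis for the first few observations and only crosses over with more data, which is the shape of the prediction in the bullet above: early ND-conforming behavior, later unlearning.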
- A tangent on the point above - I noticed that several of the non-ND examples Heinz mentioned involved words of four or five syllables. While this of course varies from language to language, English child-directed speech contains very few words of 4+ syllables. If this is generally true of languages (and we might imagine some Zipfian distribution that makes it so), that may be another reason children end up with ND hypotheses about stress patterns and have to unlearn them (or learn exceptions).
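The Zipfian hand-wave above can be made slightly more concrete with a back-of-the-envelope simulation (entirely my own assumptions, not measurements of child-directed speech): if word token frequency is Zipfian and rarer words tend to be longer, then 4+ syllable words make up only a small share of the tokens a child hears.

```python
# Illustrative assumptions: token frequency follows a Zipf law over 10,000
# word types, and syllable count grows crudely with frequency rank.
ranks = range(1, 10001)
freqs = [1 / r for r in ranks]                    # Zipfian token frequencies
syllables = [min(1 + r // 2000, 5) for r in ranks]  # rarer words are longer

total = sum(freqs)
long_mass = sum(f for f, s in zip(freqs, syllables) if s >= 4)
print(f"token share of 4+ syllable words: {long_mass / total:.1%}")
```

Under these (crude) assumptions, long words end up as a small single-digit percentage of tokens, so the evidence that would distinguish ND from non-ND hypotheses arrives slowly.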
- How translatable are the FBLearner and the FSA representation to human acquisition algorithms and human representations? I can see how this might be intended as a direct correlate, especially given the attention paid to an online variant of the learning algorithm.
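For concreteness, here is a minimal sketch of the kind of FSA representation at issue (my own toy, not the exact machines from the paper): a DFA over stressed ('S') and unstressed ('s') syllables that accepts exactly the initial-stress pattern, i.e. words of the shape S s s ... .

```python
# Toy DFA for an initial-stress language: the first syllable must be
# stressed and all subsequent syllables unstressed. State 2 is a dead state.
INITIAL_STRESS = {
    "start": 0,
    "accept": {1},
    "delta": {(0, "S"): 1, (0, "s"): 2,
              (1, "S"): 2, (1, "s"): 1,
              (2, "S"): 2, (2, "s"): 2},
}

def accepts(fsa, word):
    """Run the DFA over a word given as a list of syllable symbols."""
    state = fsa["start"]
    for syll in word:
        state = fsa["delta"][(state, syll)]
    return state in fsa["accept"]

print(accepts(INITIAL_STRESS, ["S", "s", "s"]))  # True
print(accepts(INITIAL_STRESS, ["s", "S", "s"]))  # False
```

The question in the bullet above is whether anything like these states and transitions is a plausible mental representation, or whether the FSA is just a convenient formal vocabulary for characterizing the learnable class.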