Yang, C. D. (2004). Universal Grammar, statistics or both? Trends in Cognitive Sciences, 8(10), 451-456.
Monday, December 1, 2014
Some thoughts on Richie et al. 2014
One thing I really like about this article is that it provides a nice example of how to make a concrete, empirically grounded computational model to test an intuitive theory about a particular phenomenon. In this case, it’s the impact of multiple speakers on lexicon emergence (in particular, the many-to-one vs. many-to-many dynamic). While I do have one or two questions about the model implementation, it was generally pretty straightforward to understand and nicely intuitive in its own right — and so for me, this is an excellent demonstration of how to use modeling in an informative way. On a related note, while the authors certainly packed both experimental and modeling pieces into the paper, it actually didn’t feel all that rushed (but perhaps this is because I’m already fairly familiar with the model).
Some more targeted thoughts:
p.185, Introduction: “The disconnect between experimental and computational approaches is a general concern for research on collective and cooperative behavior” — I think this is really the biggest concern I always have for models of general language evolution. At least for the sign lexicon emergence, we actually have examples of it happening “in the wild”, so we can ground the models that way (in the input representation, transmission, output evaluation, etc.). But this becomes far harder (if not impossible) to do for the evolution of language in the human species. What are reasonable initial conditions? What is the end result supposed to look like anyway? Ack. And that doesn’t even begin to get into the problem of idealization in evolutionary modeling (what to idealize, is it reasonable to do that, and so on). So for my empirical heart, one of the main selling points of this modeling study is the availability of the empirical data, and the attempt to work it into the model in a meaningful way.
p.185, Introduction: “A probabilistic model of language learning…situated in a social setting appears to capture the observed trends of conventionalization” — One of the things I’m wondering is how much the particulars of the probabilistic learning model matter. Could you be a rational Bayesian type, for example, rather than a reinforcement learner, and get the same results? In some sense, I hope this would be true, since the basic intuition that many-to-many is better for convergence seems sensible. But would the irrelevance of population dynamics persist, or not? Since that’s one of the less intuitive results (perhaps due to the small population size, perhaps due to something else), I wonder.
p.186, 2.1.4 Coding: “…we coded every gesture individually for its Conceptual Component” — On a purely practical note, I wonder how these were delineated. Is it some recognizable unit in time (so the horns of the cow would occur before the milking action of the cow, and that’s how you decide they’re two meaning pieces)? Is it something spatial? Something else, like a gestalt of different features? I guess I’ve been thinking about the simultaneous articulation aspects of signed languages like ASL, and this struck me as something that could be determined by human perceptual bias (which could be interesting in its own right).
p.190, 3.2: For the adjustment of p using the Linear-Reward-Penalty scheme, is the idea that each Conceptual Component’s (CC’s) probability is adjusted, one at a time? I’m trying to map this to what I recall of previous uses of Yang’s learning model (e.g., Yang 2004), where the vector would hold the probabilities of different grammar parameters, and the learner actually can’t tell which parameter is responsible for successful analysis of the observed data point. In that case, all parameter probabilities are either rewarded or punished together, based on the success (or failure) of input analysis. Here, since the use (or not) of a given CC is observed, you don’t have to worry about that. Instead, each one can be rewarded or punished based on its own observed use (or not). So, in some sense, this is simpler than previous uses of the learning model, precisely because this part of learning is observed.
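To make my reading of this concrete, here is a minimal sketch of a per-CC linear reward-penalty update. It rests on my assumptions rather than the paper's code: the function names, the dictionary representation, and the learning rate `gamma` are all hypothetical, and I'm assuming each CC's probability is nudged toward 1 when that CC is observed in the gesture and toward 0 when it is not.

```python
def update_cc_probability(p, observed, gamma=0.01):
    """One linear reward-penalty step for a single CC's probability.

    gamma is a hypothetical learning rate, not a value from the paper.
    """
    if observed:
        # Reward: move p toward 1 by a fraction gamma of the remaining distance.
        return p + gamma * (1.0 - p)
    # Penalty: shrink p toward 0 by a fraction gamma.
    return (1.0 - gamma) * p


def update_cc_vector(probs, used_ccs, gamma=0.01):
    """Update every CC independently, based on whether it was observed.

    probs maps a CC label to its current probability (assumed representation);
    used_ccs is the set of CC labels observed in the current gesture.
    """
    return {
        cc: update_cc_probability(p, cc in used_ccs, gamma)
        for cc, p in probs.items()
    }


# Example: a listener observes a "cow" gesture that used HORNS but not MILKING.
cow = {"HORNS": 0.5, "MILKING": 0.5}
cow = update_cc_vector(cow, {"HORNS"}, gamma=0.1)
```

The point of the sketch is just that no credit assignment problem arises: because each CC's presence or absence is directly observed, each probability gets its own reward or penalty, unlike the grammar-parameter case where the whole vector moves together.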
p.191, 3.5: “…we run the simulations over 2 million instances of communications” — So for the homesigners with the many-to-one setup, this can easily be interpreted as 2 million homesigner-nonhomesigner interactions. For the deaf population simulation, is this just 2 million deaf-deaf communication instances in total, spread among the 10 possible pairs in a population of 5? Or does each of the 10 pairs get 2 million interactions? The former seems fairer for comparison with the homesigner population, but the latter is also a sensible way to instantiate it. If it’s the latter, then the overall frequency of interactions within the population might be what’s driving faster convergence.
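The difference between the two readings is easy to put numbers on. This is only back-of-the-envelope arithmetic under my own assumptions (a hypothetical helper, with interactions spread evenly over pairs in the first reading), not anything taken from the paper's implementation.

```python
from itertools import combinations


def interaction_budget(n_agents=5, n_interactions=2_000_000):
    """Compare two readings of '2 million instances of communications'.

    Reading A: 2 million interactions in total, shared across all pairs.
    Reading B: 2 million interactions for each pair.
    """
    pairs = list(combinations(range(n_agents), 2))  # unordered pairs of agents
    n_pairs = len(pairs)
    return {
        "pairs": n_pairs,
        # Reading A, assuming interactions are spread evenly over pairs.
        "reading_a_per_pair": n_interactions / n_pairs,
        "reading_a_total": n_interactions,
        # Reading B: every pair gets the full budget.
        "reading_b_per_pair": n_interactions,
        "reading_b_total": n_interactions * n_pairs,
    }


budget = interaction_budget()
```

With 5 agents there are 10 pairs, so the first reading gives each pair 200,000 interactions, while the second gives the population 20 million interactions overall, ten times what the homesigner setup gets.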