Friday, April 27, 2018

Some thoughts on Kirby 2017 + Adger 2017 + Bowling 2017

I love seeing an articulated approach to studying language evolution from a computational perspective, and appreciate that Kirby addressed some of my concerns with this approach head on (whether or not I found the answers satisfactory). Interestingly, I’m not so bothered by the issues that bother Adger. I also quite appreciate Bowling’s point about gene-culture interactions when it comes to explaining the origins of complex “phenotypes” like language.


Other thoughts:


(1) Kirby 2017


(a) Given his focus on cultural heredity, does Kirby fall on the “language evolved to enable communication, not complex thought” side of the spectrum? He seems to want to distinguish his viewpoint from the language-for-communication side, though, emphasizing “...cultural transmission once certain biological prerequisites are in place”. I guess it depends on what he thinks the biological prerequisites are? His final claim is that it’s linked to self-domestication (which yields tendencies towards signal copying and sharing), so that again seems more on the language-for-communication side.


(b) According to Kirby, language design features all seem to lead to systematicity, which is the property of being compactly representable, a la language grammars. This is a pretty key notion in language acquisition, where children seem biased to look for systematicity (i.e., generalizations) in the data they encounter, such as language data. That bias seems to come into play when Kirby talks about systematicity arising from pressures of language learning and language use.


Kirby also indicates that only human languages have systematicity, which makes the studies about meaningful combinations in other animal systems (e.g., some primate calls involving social hierarchies, birdsong involving the rearrangement of components) interesting as a comparison. Presumably, Kirby would say the non-human systematicity is very poor compared to human language systematicity?


(c) Iterated learning: Kirby notes that compositionality emerges over time in simulated agents when all individuals are initialized with random form-referent pairs (e.g., Brighton 2002). But what else is going on in these simulations? What external/internal pressures are there to cause any change at all? That seems important.
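
To make my own question concrete, here’s a toy sketch of the kind of loop Kirby describes -- entirely my invention (the meanings, syllables, and naive learner are not Brighton’s actual model). The one pressure built in is the transmission bottleneck: each learner only observes a subset of the form-referent pairs, has to generalize to the rest, and then its generalizations become the next learner’s data:

```python
import random

# Toy iterated-learning loop (my own sketch, not Brighton 2002's actual model).
SHAPES = ["circle", "square", "triangle"]
COLORS = ["red", "blue", "green"]
MEANINGS = [(s, c) for s in SHAPES for c in COLORS]
SYLLABLES = list("abcdefgh")

def random_language():
    """Generation 0: an arbitrary (holistic) signal for every meaning."""
    return {m: "".join(random.choices(SYLLABLES, k=3)) for m in MEANINGS}

def learn(observed):
    """A deliberately naive learner: memorize what was seen; for unseen meanings,
    reuse a signal from an observed meaning that shares a shape or color."""
    language = {}
    for meaning in MEANINGS:
        if meaning in observed:
            language[meaning] = observed[meaning]
        else:
            neighbors = [sig for m, sig in observed.items()
                         if m[0] == meaning[0] or m[1] == meaning[1]]
            language[meaning] = (random.choice(neighbors) if neighbors
                                 else "".join(random.choices(SYLLABLES, k=3)))
    return language

def iterate(generations=20, bottleneck=4):
    """The pressure: each generation only observes `bottleneck` of the 9 meanings."""
    language = random_language()
    for _ in range(generations):
        observed = dict(random.sample(list(language.items()), bottleneck))
        language = learn(observed)
    return language

print(iterate())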


(d) I thought it was interesting to see the connection to poverty of the stimulus (i.e., the data being underspecified with respect to the hypothesis speakers used to generate those data). In particular, because the data are compatible with multiple hypotheses, learners can land on a hypothesis that’s different from what the speaker used to generate those data and that probably aligns better with the learner’s already-existing internal biases. Then, this learner grows up and becomes a speaker generating data for the next generation, now using the hypothesis that’s better aligned with learner biases to actually generate the data new learners observe. So, ambiguity in the input signal allows the emergence of language structure that’s easy to pick up precisely because it aligns with learner biases. That’s why little kids end up being so darned good at learning the same language structure from ambiguous data.


(e) I laughed just a little at the reference to Gold (1967) as having anything to say about human language acquisition. If there’s anything I learned from Johnson (2004) -- a phenomenal paper -- it’s that whenever you cite the computational learnability results of Gold (1967) as having anything to do with human language acquisition, you’re almost invariably wrong.


Johnson, K. (2004). What does Gold’s Theorem show about language acquisition? In Proceedings from the Annual Meeting of the Chicago Linguistic Society (Vol. 40, No. 2, pp. 261-277).


(f) While I appreciate the walk-through of Bayesian reasoning with respect to the likelihood and prior, I cringed just a little at equating the prior with the learner’s biology. It all depends on how you set the model up (and what kind of learning you’re modeling). All “prior” means is prior -- it was in place before the current learning started. That may be because it was there innately (courtesy of biology) or because the learner derived it from prior experience. That said, hats off to Kirby for promoting the Bayesian modeling approach as an example of modeling that you can easily interpret theoretically. I couldn’t agree more with that part.
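
For instance, here’s a tiny sketch (mine, not Kirby’s, with made-up word-order hypotheses and a made-up likelihood) of why the math is agnostic about where the prior comes from -- a hand-specified “innate” prior and a prior that is itself the posterior from earlier experience get plugged into exactly the same update:

```python
# Tiny illustration (my own, not Kirby's model): the math doesn't care where
# the prior came from. "Prior" just means "in place before this data arrived."

def posterior(prior, likelihood, data):
    """Compute P(h | data) for each hypothesis h, given a prior over hypotheses
    and a likelihood function P(data | h)."""
    unnormalized = {h: prior[h] * likelihood(data, h) for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Two toy hypotheses about word order in the input, with a made-up likelihood.
def likelihood(data, h):
    match = sum(1 for utterance in data if utterance == h)
    return (0.9 ** match) * (0.1 ** (len(data) - match))

# Option 1: the prior is "innate" (hand-specified).
innate_prior = {"SOV": 0.6, "SVO": 0.4}

# Option 2: the prior is the posterior from an earlier batch of experience.
earlier_data = ["SVO", "SVO", "SOV"]
experience_prior = posterior({"SOV": 0.5, "SVO": 0.5}, likelihood, earlier_data)

# Either way, the update rule is identical:
new_data = ["SVO", "SVO"]
print(posterior(innate_prior, likelihood, new_data))
print(posterior(experience_prior, likelihood, new_data))
```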


(g) In terms of interpreting Figure 3, I definitely understand the main point that the size of the prior bias towards regularity (aaaaa languages) doesn’t seem to affect the results at all. But it looks like between 15 and 22% of all languages at the end of learning are this type of language, with near 0% distributed across the other four options shown (aaab, aabb, etc.). Where did the other 78-85% go? Maybe the x-axis language instances are samples from the entire population of languages ((a-e)^5), and the remaining 78-85% is distributed in teeny tiny amounts across all those possibilities? If so, the aaaaa language is simply the one with a relative majority?
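
A quick back-of-the-envelope check of that reading (assuming the language space really is every five-letter string over {a, b, c, d, e}, which is my assumption about the figure, not something Kirby states this way):

```python
# Back-of-the-envelope check on my reading of Figure 3.
num_languages = 5 ** 5                        # 3125 possible five-letter languages
mass_on_aaaaa = 0.18                          # roughly the 15-22% range in the figure
leftover_per_language = (1 - mass_on_aaaaa) / (num_languages - 1)

print(num_languages)                          # 3125
print(leftover_per_language)                  # ~0.00026, i.e., teeny tiny amounts
print(mass_on_aaaaa / leftover_per_language)  # aaaaa is ~680x more probable than any other
```

If that reading is right, ~18% on a single language versus ~0.03% on each of the other 3124 really would be a clear relative majority.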


(h) I really like the point that human languages may take on certain properties not because they’re hard-coded into humans, but because humans have a tendency (no matter how slight/weak) towards them. Any little push can make a difference when you have a bunch of ambiguous data to learn from. (This hooks back into the Poverty of the Stimulus ideas from before, and how that contributes to the emergence of languages that can be learned by children from the data they encounter.) That said, I got a little lost with this statement: “the language faculty may contain domain-specific constraints only if they are weak, and strong constraints only if they are domain general.” I understand the part about constraints being weak, whether they’re about language (domain-specific) or cognition generally. But where does the distinction between domain-specific vs. domain-general come from, and where do strong constraints come from at all, based on these results? Maybe this has to do with the domain-general simplicity bias Kirby comes back to at the end, which he makes a case for as a strong innate bias that does a lot of work for us?
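
To convince myself the “any little push” idea works mechanically, here’s a bare-bones iterated-Bayesian-learning chain -- my own sketch of the general recipe, not Kirby’s actual models; the hypotheses, noise level, and bottleneck size are all made up. Two hypotheses, a prior bias of just 51/49, learners who pick the maximum-a-posteriori hypothesis from a tiny sample of noisy data, and then produce data for the next learner:

```python
import random

# A tiny iterated-Bayesian-learning chain (my sketch of the general idea, not
# Kirby's actual model): a weak prior bias plus MAP learners plus a small
# bottleneck can yield a strong preference at the population level.

HYPOTHESES = ["regular", "irregular"]
PRIOR = {"regular": 0.51, "irregular": 0.49}   # the "slight push"
NOISE = 0.1                                    # production noise
BOTTLENECK = 2                                 # utterances each learner observes

def produce(hypothesis, n=BOTTLENECK):
    """Speaker produces n utterances, each matching their hypothesis w.p. 1 - NOISE."""
    other = "irregular" if hypothesis == "regular" else "regular"
    return [hypothesis if random.random() > NOISE else other for _ in range(n)]

def map_learn(data):
    """Learner picks the hypothesis with the highest posterior probability."""
    def score(h):
        lik = 1.0
        for d in data:
            lik *= (1 - NOISE) if d == h else NOISE
        return PRIOR[h] * lik
    return max(HYPOTHESES, key=score)

def chain(generations=10000):
    h = random.choice(HYPOTHESES)
    regular_count = 0
    for _ in range(generations):
        h = map_learn(produce(h))
        regular_count += (h == "regular")
    return regular_count / generations

print(chain())   # roughly 0.95: a 51/49 prior bias becomes a ~95/5 population pattern
```

The weak bias only ever matters when the sampled data happen to be ambiguous, but over enough generations those are exactly the moments when the language drifts, so the 51/49 push ends up looking like a near-universal.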


(i) In terms of iterated learning in the laboratory, I’m always a little skeptical about what to conclude from these studies. Certainly, we can see the amplification of slight biases in the case of transmission/learning bottlenecks. But in terms of where those biases come from, and whether they’re representative of pre-verbal primate biases... I’m not sure how we tell. Especially when we consider artificial language learning, we have to consider what language-specific and domain-general biases adult humans have developed by already knowing a human language. For example, does compositionality emerge in the lab because it has to under those conditions, or because the humans involved already had compositionality in their hypothesis space because of their experience with their native languages? To be fair, Kirby explicitly acknowledges this issue, and then sets up a simulation with a simplicity bias built in that’s capable of generating the behavioral results from humans. But of course, the simplicity bias is expressed with respect to certain structural preferences (concise context-free transducers). How different from compositionality is this structural preference? This comes up again for me when Kirby notes all the different ways a “simplicity” bias can be cashed out linguistically. So, simple with respect to what seems to matter -- that is, how the learner knows to define simplicity a particular way.
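
On the “simple with respect to what” point, here’s a toy description-length comparison -- again my own illustration (Kirby’s models measure simplicity over concise context-free transducers, which this does not implement). The same two miniature languages come out “simpler” or “more complex” depending on which metric you pick:

```python
# Toy description-length comparison (my own illustration, not Kirby's metric).
# The point: "simple" depends entirely on how you choose to measure it.

MEANINGS = [(s, c) for s in ["circle", "square", "triangle"]
                   for c in ["red", "blue", "green"]]

# A holistic language: one arbitrary word per meaning.
holistic = {m: w for m, w in zip(MEANINGS, ["bo", "ki", "ta", "mu", "re",
                                            "zo", "fa", "ne", "lu"])}

# A compositional language: shape word + color word.
shape_words = {"circle": "bo", "square": "ki", "triangle": "ta"}
color_words = {"red": "mu", "blue": "re", "green": "zo"}
compositional = {(s, c): shape_words[s] + color_words[c] for s, c in MEANINGS}

def grammar_size(shape_vocab, color_vocab):
    """Metric 1: number of stored vocabulary entries."""
    return len(shape_vocab) + len(color_vocab)

def lexicon_size(language):
    """Metric 2: total characters needed to list every form."""
    return sum(len(w) for w in language.values())

# Under metric 1, compositional wins (6 entries vs. 9 memorized forms);
# under metric 2, holistic wins (18 characters vs. 36).
print(grammar_size(shape_words, color_words), len(holistic))
print(lexicon_size(compositional), lexicon_size(holistic))
```

Which is exactly why how the learner defines simplicity seems to do the real work.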


What I find more convincing is the comparison with recently-created signed languages like Nicaraguan Sign Language (NSL), in terms of the systematicity that emerges. It seems that whatever cognitive bias is at play for the cohorts of NSLers might also be at play in the experimental participants learning the mini-signed languages. Therefore, you can at least get the same outcome from both people who don’t already know a language (NSLers) and people who do (experimental participants).

(2) Adger 2017


(a) I do think it’s very fair for Adger to note the other sociocultural factors that affect transmission of language through a population, such as invasion, intermarriage, etc. This comes back to how all models idealize, and the real question is whether they’ve idealized something important away.


(b) Also, Adger makes a fair point about how “cultural” in Kirby’s terms is really more about transmission pressures in a population that can be formally modeled, rather than what a sociologist might naturally refer to as culture and/or cultural transmission.


(c) I’m not sure I agree that the NSLers showing the emergence of regularity so quickly is an argument against iterated learning. It seems to me that this is indeed a valid case of the transmission & learning bottleneck at the heart of Kirby’s iterated learning approach. The fact that it occurs over years in the NSLers instead of over generations doesn’t really matter, I don’t think.


(d) Adger notes that rapid emergence of certain structures involves “specific cognitive structures” that must also be present. I don’t see this as being incompatible with Kirby’s suggestion of a domain-general simplicity bias. That’s a specific cognitive structure, after all, assuming you count biases as structures.


(e) Adger also brings up certain examples of language not being very simple in an obvious way in order to argue against Kirby’s simplicity bias. But to me, the question is always simple with respect to what? Maybe the seemingly convoluted language structure we see is in fact a simple solution given the other constraints at work (available building blocks like a preference for hierarchical structure, frequent data in the input, etc.). It’s also not obvious to me that seeing hierarchical structure appear over and over again is incompatible with Kirby’s proposal for weak biases leading to highly prevalent patterns. That is, why couldn’t a slight bias for hierarchical structure make the hierarchical structure version the simplest answer to a variety of language structure problems?

(3) Bowling 2017


(a) I really like Bowling’s point that there are bidirectional effects between DNA and the environment (e.g., culture). For me, this makes a nice link with the environmental/cultural factors of transmission and learning by individuals that Kirby’s approach highlights and the biological underpinnings of those abilities. For example, could the evolution of language have reinforced a biologically-based bias for simplicity? That is, could the iterated learning process have made individuals with a predisposition for simplicity more prevalent in the human population? That doesn’t seem far-fetched to me.

(b) “...even though Kirby’s Bayesian models falsely separate genes from learning” - This doesn’t seem like a fair characterization to me. All that Bayesian models do is separate out what was there before you started learning from what you’re currently learning. They don’t specify where the previous stuff came from (i.e., genes vs. environment vs. environment+genes, etc.).