Monday, October 19, 2015

Some thoughts on Braginsky et al. 2015

One of the things I quite like about this paper is that it’s a really nice example of what you can do with observational data (like the CDI data). Of course, the standard limitations still apply: caretaker reports have limited accuracy, and the measure taps production (in this case) rather than comprehension, so we’re seeing a time delay relative to when the child actually acquires the knowledge, etc.

Also, how nice to see a study with this many subjects! I think this size subject pool is more standard in medical studies, but it’s really rare to see samples this large in language acquisition studies. This means that when we find trends, we can be more confident they’re not just a fluke of the sample.

The question from the modeler’s perspective then becomes “What can we do with this?” Certainly this provides an empirical checkpoint in multiple languages for specific details of the developmental trajectory. So, I think this makes it good behavioral data for models of syntactic development (e.g., MOSAIC by Freudenthal & colleagues: Freudenthal et al. 2007, 2009; variational learning: Yang 2004, Legate & Yang 2007) and models of vocabulary development (e.g., the model of McMurray & colleagues: McMurray 2007, Mitchell & McMurray 2009, McMurray et al. 2012) to try to match their outputs against. Especially valuable are the differences across languages: these are the kinds of nuances that may distinguish models from each other. Perhaps even more interesting would be an attempt to build a joint model that combines promising syntactic development and vocabulary development models, so that you can look for the correlational patterns this large-scale observational study provides.

Some more targeted thoughts:
(1) The methodological advance pleases me no end. I think this kind of aggregation approach is the way forward: once you can aggregate data sets of this size, you can find things that you can feel more confident about as a scientist. So, the finding that there are age effects on syntax (less so on morphology) and on function words (less so on nouns) is something people will take notice of.

(2) Analysis 1: I wonder how much of an effect the linguistic properties of these languages have (e.g., Spanish, Norwegian, and Dutch are morphologically much richer than English). It would be nice to see some sort of quantitative measure of morphological richness, and maybe of other potentially relevant cross-linguistic factors. A related thought: are there any useful/explanatory cross-linguistic differences in the actual items in the Morphological & Syntactic Complexity sections?

(3) Analysis 2, Figure 4: There’s an interesting difference in early Spanish, where predicates lag behind function words until vocabulary size ≈ 0.4. Presumably this is due to the language itself, and to the items in the predicates vs. function words categories? It’s notable that Spanish is also the only language where predicates don’t seem to have an age effect coefficient (see Figure 5), so predicate development is totally predictable from the child’s vocabulary development. Also, Figure 5 shows Danish with a big age effect for Nouns. Does this have to do with the particular nouns, I wonder? Or with Danish nouns in general?


Freudenthal, D., Pine, J. M., Aguado-Orea, J., & Gobet, F. (2007). Modeling the developmental patterning of finiteness marking in English, Dutch, German, and Spanish using MOSAIC. Cognitive Science, 31(2), 311-341.

Freudenthal, D., Pine, J. M., & Gobet, F. (2009). Simulating the referential properties of Dutch, German, and English root infinitives in MOSAIC. Language Learning and Development, 5(1), 1-29.

Legate, J. A., & Yang, C. (2007). Morphosyntactic learning and the development of tense. Language Acquisition, 14(3), 315-344.

McMurray, B. (2007). Defusing the childhood vocabulary explosion. Science, 317(5838), 631.

McMurray, B., Horst, J. S., & Samuelson, L. K. (2012). Word learning emerges from the interaction of online referent selection and slow associative learning. Psychological Review, 119(4), 831.

Mitchell, C., & McMurray, B. (2009). On leveraged learning in lexical acquisition and its relationship to acceleration. Cognitive Science, 33(8), 1503-1523.

Yang, C. D. (2004). Universal Grammar, statistics or both? Trends in Cognitive Sciences, 8(10), 451-456.

Monday, October 5, 2015

Tenure-track Assistant Professor, Language Science @ UCI

The Program in Language Science at the University of California, Irvine (UCI) is seeking applicants for a tenure-track assistant professor faculty position. We seek candidates who combine a strong background in theoretical linguistics and a research focus in one of its sub-areas with computational, psycholinguistic, neurolinguistic, or logical approaches.
The successful candidate will interact with a dynamic and growing community in language, speech, and hearing sciences within the Program, the Center for Language Science, the Department of Cognitive Sciences, the Department of Logic and the Philosophy of Science, the Center for the Advancement of Logic, its Philosophy, History, and Applications, the Center for Cognitive Neuroscience & Engineering, and the Center for Hearing Research. Individuals whose interests mesh with those of the current faculty and who will contribute to the university's active role in interdisciplinary research and teaching initiatives will be given preference.
Interested candidates should apply online with a cover letter indicating primary research and teaching interests, a CV, three recent publications, three letters of recommendation, and a statement on past and/or potential contributions to diversity, equity, and inclusion.
Application review will commence on November 20, 2015, and continue until the position is filled.
The University of California, Irvine is an Equal Opportunity/Affirmative Action Employer advancing inclusive excellence. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, age, protected veteran status, or other protected categories covered by the UC nondiscrimination policy.

Some thoughts on Meylan & Griffiths 2015

I really enjoyed seeing this extension of a reasonable existing word-learning model (which was focused on concrete nouns) to something that tries to capture more of the complexity of word meaning learning. I admit I was surprised to find out that the extension was on the semantics side (compositional meanings) rather than some sort of syntactic bootstrapping (using surrounding word contexts), especially given their opening example. Given the extensive syntactic bootstrapping experimental literature, I think a really cool extension would be to incorporate the idea that words appearing in similar distributional contexts have similar meanings. Maybe this requires a more sophisticated “meaning” hypothesis space, though? 

I also appreciated seeing the empirical predictions resulting from their model (good modeling practices, check!). More specifically, they talk about why their model does better with a staged input representation, and suggest that learning from one, then two, then three words would lead to the same result as learning from three, then two, then one word (which is not so intuitive, and therefore an interesting prediction). To be honest, however, I didn’t quite follow the nitty-gritty details of why that should be, so that’s worth hashing out together.

More specific thoughts:
(1) The learners here have the assumption that a word refers to a subset of world-states, and that subset presumably could be quite large (infinite, even) if we’re talking about all possible combinations of objects, properties, actions, etc. So this means the learner needs to have some restrictions on the possible components of the world-states. I think that’s pretty reasonable — we know from experimental studies that children have conceptual biases, and so probably also have equivalent perceptual biases that filter down the set of possible world-states in the hypothesis space.

(2) The “wag” example walk-through: I’m not sure I understand exactly how the likelihood works here. “Wag” refers to side-to-side motion. If the learner thinks “wag” refers to side-to-side motion + filled/black shading, this is described as being “consistent with the observed data”.  But what about the instances of “wag” occurring with non-filled items (du ri wag, pu ri wag) - these aren’t consistent with that hypothesis. So shouldn’t the likelihood of generating those data, given this hypothesis, be 0? M&G2015 also note for this case that “the likelihood is relatively low in that the hypothesis picks out a larger number of world-states”. But isn’t side-to-side+black/filled compatible with fewer world-states than side-to-side alone?
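To make my confusion concrete, here’s a minimal sketch of the size-principle likelihood as I understand it. The world-states, the hypotheses, and the strong-sampling assumption here are my own toy reconstruction, not M&G2015’s actual model:

```python
# Hypothetical sketch of a size-principle likelihood for word learning.
# World-states, hypotheses, and sampling assumption are my own toy example.
from fractions import Fraction

# Toy world-states: each is a (motion, shading) pair.
world_states = [(m, s) for m in ("side-to-side", "up-down")
                       for s in ("filled", "unfilled")]

def likelihood(data, hypothesis):
    """P(data | hypothesis) under strong sampling: each observation is
    drawn uniformly from the world-states the hypothesis picks out;
    any observation outside the hypothesis's extension gives probability 0."""
    extension = [w for w in world_states if hypothesis(w)]
    p = Fraction(1)
    for d in data:
        if d not in extension:
            return Fraction(0)  # inconsistent observation
        p *= Fraction(1, len(extension))
    return p

# "wag" observed with both filled and unfilled side-to-side motion.
data = [("side-to-side", "filled"), ("side-to-side", "unfilled")]

broad  = lambda w: w[0] == "side-to-side"                       # motion only
narrow = lambda w: w[0] == "side-to-side" and w[1] == "filled"  # motion + shading

print(likelihood(data, broad))   # (1/2)^2 = 1/4
print(likelihood(data, narrow))  # 0: the unfilled instance is inconsistent
```

Under this reading, the narrower (motion + shading) hypothesis assigns the observed data probability 0 because of the unfilled instances, while the broader hypothesis gets (1/2)^2; a hypothesis with a larger extension spreads its probability mass thinner, which is the usual size-principle behavior. That’s exactly why the quoted passage puzzles me.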

(3) I like the incorporation of memory noise (which makes this simulation more cognitively plausible). Certainly the unintentional swapping of a word is one way to implement memory noise that doesn’t require messing with the guts of the Bayesian model (it’s basically an update to the input the model gets). I wonder what would happen if we messed with the internal knowledge representation instead (or in addition) and let the learned mappings degrade over time. I could imagine implementing that as some kind of fuzzy sampling of the probabilities associated with the mappings between word and world-state.
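To sketch the contrast I have in mind between input-side and representation-side noise (the function names, rates, and toy lexicon are my own invention, not the paper’s implementation):

```python
# Two hypothetical loci for memory noise; all names and rates are
# illustrative assumptions, not M&G2015's implementation.
import random

random.seed(0)  # reproducible toy run

def swap_noise(utterance, lexicon, p_swap=0.1):
    """Input-side noise: each word may be unintentionally swapped for a
    random other word before the learner sees it (the paper's style of noise)."""
    out = []
    for w in utterance:
        if random.random() < p_swap:
            out.append(random.choice([x for x in lexicon if x != w]))
        else:
            out.append(w)
    return out

def decay(mappings, rate=0.05):
    """Representation-side noise: learned word-to-meaning probabilities
    drift back toward uniform over time (fuzzy degradation)."""
    uniform = 1.0 / len(mappings)
    return {w: (1 - rate) * p + rate * uniform for w, p in mappings.items()}

mappings = {"wag": 0.7, "blick": 0.2, "dax": 0.1}
print(swap_noise(["wag", "blick"], ["wag", "blick", "dax"]))
print(decay(mappings))  # strong mappings weaken, weak ones strengthen slightly
```

The nice property of the decay version is that it degrades the learned distribution gracefully (it stays a proper probability distribution) rather than corrupting individual observations, so the two noise types should make distinguishable predictions.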

(4) Figure 3, with the adult artificial language learning results from Kersten & Earles 2001: Adults are best at object or path mapping, and are much worse at manner mapping. My guess is that this has to do with the English bias for encoding manner-of-motion in verbs rather than direction-of-motion (which happens to be the opposite of the Spanish bias). So, these results could come from a transfer effect from the English L1: in essence, due to their L1 bias, it doesn’t occur to the English subjects to encode the manner as a separate word from the verb-y/action-y type word. Given what we know about the development of these language-specific verb biases, this may not be present in the same way in children learning their initial language (e.g., there’s some evidence that all kids come predisposed for direction-of-motion encoding: Maguire et al. 2010). At any rate, it seems easy enough to build in a salience bias for one type of world-state - just weight the prior accordingly. At the moment, the model doesn’t show the same manner deficit, so this could be an empirically-grounded bias to add to the model to account for those behavioral results.

Maguire, M. J., Hirsh-Pasek, K., Golinkoff, R. M., Imai, M., Haryu, E., Vanegas, S., Okada, H., Pulverman, R., & Sanchez-Davis, B. (2010). A developmental shift from similar to language-specific strategies in verb acquisition: A comparison of English, Spanish, and Japanese. Cognition, 114(3), 299-319. 
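Building in that salience bias might look something like the following sketch, where the prior is weighted by how salient each world-state component is to the learner. The components and weights are purely illustrative assumptions on my part:

```python
# A sketch of a salience-weighted prior; components and weights are
# illustrative assumptions, not values from M&G2015.
salience = {"object": 1.0, "path": 1.0, "manner": 0.3}  # assumed English-L1-style bias

# Toy hypotheses: (world-state component, candidate meaning).
hypotheses = [("object", "round-thing"),
              ("path", "downward"),
              ("manner", "bouncing")]

def prior(hyp):
    """Unnormalized prior, proportional to the salience of the
    world-state component the hypothesis is about."""
    component, _ = hyp
    return salience[component]

z = sum(prior(h) for h in hypotheses)
normalized = {h: prior(h) / z for h in hypotheses}
print(normalized)  # manner hypotheses start out down-weighted
```

Since the prior multiplies into the posterior on every update, a down-weighted manner component would need more evidence to win, which is roughly the deficit the adult subjects show.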

(5) Also Figure 3: I’m not sure what to make of the model comparison with human behavior. I agree that there’s a qualitative match with respect to improvement for staged exposure over full exposure. Other than that? Maybe the percent correct roughly matches, if averaged, for eta = 0.25. I guess the real question is how well the model is supposed to match the adult behavior. (That is, maybe I’m being too exacting in my expectations for the model’s output behavior, given what it has built into it.)

(6)  Simulation 3 setup: I didn’t quite follow this. Is the idea that the utterance is paired with four world-states, and the learner assumes the utterance refers to one of them? If so, what does this map to in a realistic acquisition scenario?  Having more conceptual mappings possible? In general, I think the page limit forced the authors to cut the description of this simulation short, which makes it tricky to understand.