Computational Models of Language (at UC Irvine): Some thoughts on Lloyd-Kelly et al. 2016

I really appreciate this paper as a first attempt to provide a linking story between model representations and infant behavior (in this case, turning probabilities associated with chunked representations into actual infant listening times, using things like time-sensitive trace decay). This highlights how the details of the experimental procedure matter, such as how often syllables are uttered, how long between habituation and test phases, and how long between the individual test stimuli during the test phase. In theory, this would also include all the non-linguistic processes that go into generating observable behavior, like motor control, attention, and memory, though LK&al2016 focused on memory for this first-pass attempt. (I should note that I think including some mechanism for attention would really help them out in future modeling attempts — more on this below.)

Some additional thoughts:

(1) It might be useful to go over some of the details of the CHREST model discussed in the “Participant Modelling” section, and embodied in Figure 1. While the basic division into long-term memory, short-term memory, and a phonological loop makes good sense, I want to make sure I’m clear on the distinction between discriminating, familiarizing, and a node being finished. For instance, why does a “finished” node cause something new to be created?

Relatedly, based on Figure 1, it seems like there’s a built-in primacy effect with respect to inserting a new node. For example, when pa-go is encountered in “pa-go-ti”, but only pa-do exists, the first thing that happens is “go” is added on its own as a primitive. My interpretation: If you get something new, you only manage to grab a piece of it. Primacy biases make you grab the first piece you don’t recognize. (An alternative might be a recency bias, where you grab the last thing, due to phonological loop decay. So, in pa-go-ti, you grab “ti” first.)
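To make the primacy vs. recency contrast concrete, here is a toy sketch. It is not CHREST's actual node-insertion procedure; the function name `first_piece_learned` and the flat set of known primitives are my own simplifying assumptions, and all it does is pick which single unfamiliar syllable gets grabbed first under each bias.

```python
def first_piece_learned(syllables, known, bias="primacy"):
    """Return the one unfamiliar syllable a toy learner would add first.

    Hypothetical helper, not part of CHREST: 'known' is a flat set of
    already-learned primitives, and 'bias' is "primacy" or "recency".
    """
    unfamiliar = [s for s in syllables if s not in known]
    if not unfamiliar:
        return None  # nothing new to learn from this input
    # Primacy grabs the earliest unfamiliar piece; recency grabs the latest.
    return unfamiliar[0] if bias == "primacy" else unfamiliar[-1]


known = {"pa", "do"}          # the learner has only seen pa-do so far
word = ["pa", "go", "ti"]     # incoming "pa-go-ti"

primacy_pick = first_piece_learned(word, known, "primacy")
recency_pick = first_piece_learned(word, known, "recency")
print(primacy_pick, recency_pick)
```

Under the primacy setting the learner adds “go” first, matching my reading of Figure 1; under the recency alternative it would add “ti” first.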

(2) I think it’s very handy how the learner ignores incoming requests during the search, retrieval, and updating process. The upshot is that the learner can’t learn new things while it’s still updating old things, which intuitively feels right. Also, it’s nice from a model fit perspective to have three distinct timing variables to tweak in order to match human behavior (though this also gets into issues of maybe being able to overfit with that many degrees of freedom).
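Here is a minimal sketch of the “busy learner” idea, assuming a simple millisecond clock. The class `BusyLearner` and its method names are hypothetical and are not taken from the CHREST code; the timing values are just two of the settings discussed in this post.

```python
class BusyLearner:
    """Toy learner that ignores requests while an earlier update is running."""

    def __init__(self, discrimination_ms, familiarisation_ms):
        # Time cost of each kind of long-term memory update (assumed fixed).
        self.costs = {
            "discriminate": discrimination_ms,
            "familiarise": familiarisation_ms,
        }
        self.busy_until = 0   # clock time at which the learner is free again
        self.accepted = []    # log of requests that were actually processed

    def request(self, now_ms, operation, item):
        """Try to start an update at time now_ms; return True if accepted."""
        if now_ms < self.busy_until:
            return False      # still updating: the incoming request is dropped
        self.busy_until = now_ms + self.costs[operation]
        self.accepted.append((now_ms, operation, item))
        return True


learner = BusyLearner(discrimination_ms=10000, familiarisation_ms=2000)
first = learner.request(0, "discriminate", "go")        # accepted, busy until 10000
dropped = learner.request(5000, "familiarise", "ti")    # arrives mid-update
later = learner.request(10000, "familiarise", "ti")     # learner is free again
print(first, dropped, later)
```

The longer the discrimination and familiarization times, the more input gets dropped, which is how these parameters end up shaping what the learner has stored by test time.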

(3) I really appreciated the empirical grounding based on children’s sensory auditory memory strength for the phonological store (=600ms). However, I then got a bit confused as to why they were testing out other values for this (800ms and 1000ms) in their simulations. Perhaps because 600ms was only a guess?

This then relates to the interpretation of Figure 2. It looks like the least variable performance comes from a short phonological store trace decay (600ms), though the r^2 is also low (but then, so is the RMSE, which is a good thing). If we take this as “this is the best”, then we might interpret this as quick forgetting mattering more than the other memory retrieval aspects encoded by familiarization and discrimination time.
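Since r^2 and RMSE can pull in different directions, here is a small sketch of how that happens. The numbers are made up purely for illustration (they are not from LK&al2016), and I am assuming r^2 means the squared Pearson correlation, which may not be exactly how the paper computes it.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation (an assumed definition of r^2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Hypothetical values, for illustration only.
observed = [6.0, 7.0, 8.0, 9.0]
close_but_flat = [7.4, 7.6, 7.4, 7.6]      # near the data, barely tracks the trend
offset_but_tracking = [16.0, 17.0, 18.0, 19.0]  # tracks the trend, far too high

for name, pred in [("close", close_but_flat), ("offset", offset_but_tracking)]:
    print(name, round(r_squared(observed, pred), 3), round(rmse(observed, pred), 3))
```

The first set of predictions has the lower RMSE and the lower r^2, while the second has a perfect r^2 and a large RMSE, so which parameter setting counts as “best” depends on which measure you prioritize.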

On the other hand, if we focus on the highest r^2 and lowest RMSE values, then we get these combinations as being best:

800ms phon decay + 10000ms discrimination + 1000-1500ms familiarization

1000ms phon decay + 9000ms discrimination + 2000ms familiarization

Importantly, the 600ms phon decay isn’t even in there. If we take these at face value, then the question is how to interpret it. Perhaps it narrows down the set of possible values for these different memory components in infants. In that case, maybe an 8-month-old phonological store trace decay is closer to a 1 or 2-year-old's, which is 1000-2000ms, rather than 600ms…

…except LK&al2016’s conclusion section seems to take the opposite tack: “…the data obtained in this paper would lend credence to the proposal that the trace decay time of the phonological store is around 600ms for very young infants.” I think I missed how they get there from their results, especially the connection to the digit span findings cited from Gathercole & Adams (1993). It seems super important, given how LK&al2016 think it’s the biggest finding of their paper.

(4) LK&al2016 find a qualitative match to infant looking times (Figure 3), but they note that they’re getting longer times for everything. As LK&al2016 themselves note: “infants appear to become bored much more quickly than the model”. It seems like this indicates a natural role for attention in future models. Interestingly, this is something LK&al2016 didn’t explicitly mention in describing future adaptations of the model in the conclusion. On the plus side, it doesn’t seem like it would be hard to build attention into the listening time calculation (e.g., just subtract some amount from the total looking time, based on some parameter connected to how much time has passed).
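A minimal sketch of that kind of attention adjustment might look like the following. This is my own guess at one simple form, not anything proposed in LK&al2016: the function name, the linear penalty, and the `boredom_rate` parameter are all hypothetical, and the numbers are placeholders.

```python
def attended_listening_time(model_time_ms, elapsed_ms, boredom_rate=0.25):
    """Shorten a model listening time by a boredom penalty.

    Assumed form: the penalty grows linearly with how much time has passed
    in the experiment (elapsed_ms), scaled by a free parameter boredom_rate.
    The result is clamped so it never goes below zero.
    """
    penalty = boredom_rate * elapsed_ms
    return max(0.0, model_time_ms - penalty)


# Placeholder values: an 8000ms model listening time, 20000ms into the session.
adjusted = attended_listening_time(8000, 20000, boredom_rate=0.25)
floored = attended_listening_time(1000, 20000, boredom_rate=0.25)
print(adjusted, floored)
```

Of course, this adds yet another free parameter to fit, so the overfitting worry from point (2) applies here too.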

Monday, November 14, 2016
