Monday, October 13, 2014

Some thoughts on Lidz & Gagliardi 2014

My Bayesian-inclined brain really had a fun time trying to translate everything in this acquisition model into Bayesian terms, and I think it actually lends itself quite well to this -- model specification, model variables, inference, likelihood, etc. I'm almost wondering if it's worth doing this mapping explicitly in another paper (maybe for a different target audience, like a general cognitive science crowd). I think it'd make it easier to understand the nuances L&G highlight, since these nuances track so well with different aspects of Bayesian modeling. (More on this below.)

That being said, it took me a bit to wrap my head around the distinction between perceptual and acquisitional intake, probably because of that mapping I kept trying to do to the Bayesian terminology. I think in the end I sorted out exactly what each meant, but this is worth talking about more, since they do (clearly) mean different things. What I ended up with: perceptual intake is what can be reliably extracted from the input, while acquisitional intake is the subset relevant to the model variables (and of course the model/hypothesis space that defines those variables needs to already be specified).
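To make my reading concrete, here's a toy sketch in Python (all the cue names, data, and functions are my own invented illustration, not anything from L&G): the perceptual intake filters the input down to what the learner can reliably encode, and the acquisitional intake then keeps only the pieces that bear on the model variables.

```python
# Toy sketch of input -> perceptual intake -> acquisitional intake.
# Everything here (cue names, data) is hypothetical illustration, not from L&G.

environment_input = [
    {"cues": {"phonological", "syntactic", "pragmatic"}},
    {"cues": {"phonological", "semantic"}},
]

# Suppose the model variables only care about these cue types.
MODEL_VARIABLES = {"phonological", "syntactic"}

def perceptual_intake(utterances):
    """What the learner can reliably extract from the input."""
    # Pretend pragmatic cues can't yet be reliably encoded by this learner.
    return [{"cues": u["cues"] - {"pragmatic"}} for u in utterances]

def acquisitional_intake(percepts, model_variables):
    """The subset of the perceptual intake that bears on the model variables."""
    return [p for p in percepts if p["cues"] & model_variables]

percepts = perceptual_intake(environment_input)
usable = acquisitional_intake(percepts, MODEL_VARIABLES)
print(len(percepts), len(usable))
```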

Related to this: It definitely seems like prior knowledge is involved in implementing both intake types, but the nature of that prior knowledge is up for grabs. For example, if a learner is biased to weight cues differently for the acquisitional intake, does that come from prior experience about the reliability of these cues for forming generalizations, or is it specified in something like Universal Grammar, irrespective of how useful these cues have been previously? Either seems possible. To differentiate them, I guess you'd want to do what L&G are doing here, where you try to find situations where the information use doesn't match the information reliability, since that's something that wouldn't be expected from derived prior knowledge. (Of course, then you need a very good idea of what exactly the child's prior experience was like, so that you could tell what they perceived the information reliability to be.)

One other general comment: I loved how careful L&G were to highlight when empirical evidence doesn't distinguish between theoretical viewpoints. So helpful. It really underscores why these theoretical viewpoints have persisted in the face of all the empirical data we now have available.

More specific comments:

(1) The mapping to Bayesian terms that I was able to make:
-- Universal Grammar = hypothesis space/model specification
Motivation:
(a) Abstract: "Universal Grammar provides representations that support deductions that fall outside of experience...these representations define the evidence the learners use..." -- Which makes sense, because if the model is specified, the relevant data are also specified (anything that impacts the model variables is relevant).
(b) p.6, "The UG component identifies the class of representations that shape the nature of human grammatical systems".

-- Perceptual Intake = parts of the input that could impact model variables
Motivation:
p.10, "contain[s]...information relevant to making inferences"

-- Acquisitional Intake = parts of the input that do impact model variables

-- Inference engine = likelihood?
Motivation:
(a) p.10, "...makes predictions about what the learner should expect to find in the environment"...presumably, given a particular hypothesis. So, this is basically a set of likelihoods (P(D | H)) for all the Hs in the hypothesis space (defined by UG, for example).
...except
(b) p.21, "...the inference engine, which selects specified features of that representation (the acquisitional intake) to derive conclusions about grammatical representations". This makes it sound like the inference engine is the one selecting the model variables, which doesn't sound like likelihood at all. Unless inference is over the model variables, which are already defined for each H.


-- Updated Grammar, deductive consequences = posterior over hypotheses
Motivation:
p.30, "...inferential, using distributional evidence to license conclusions about the abstract representations underlying language"
Even though L&G distinguish between inferential and deductive aspects, I feel like they're still talking about the hypothesis space. The inferential part is selecting the hypothesis (using the posterior), and the deductive consequences part is all the model variables that are connected to that hypothesis. (A toy version of this whole mapping is sketched just below.)
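To tie the mapping together, here's a minimal Bayesian-update sketch (all the hypotheses, data points, and probabilities are numbers I made up for illustration): UG supplies the hypothesis space, the acquisitional intake supplies the data D, the inference engine supplies the likelihoods P(D | H), and the posterior over hypotheses is the updated grammar.

```python
# Minimal Bayesian update over a UG-defined hypothesis space.
# All hypotheses, data points, and probabilities are invented for illustration.

hypotheses = ["H1", "H2"]            # defined by UG / the model specification
prior = {"H1": 0.5, "H2": 0.5}

# Likelihoods P(d | H): the inference engine's "predictions about what the
# learner should expect to find in the environment" under each hypothesis.
likelihood = {
    ("d1", "H1"): 0.8, ("d1", "H2"): 0.3,
    ("d2", "H1"): 0.6, ("d2", "H2"): 0.4,
}

data = ["d1", "d2"]                  # the acquisitional intake

# Posterior P(H | D) is proportional to P(H) * product of P(d | H) over d.
unnormalized = {}
for h in hypotheses:
    p = prior[h]
    for d in data:
        p *= likelihood[(d, h)]
    unnormalized[h] = p
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}
print(posterior)                     # the "updated grammar" piece of the mapping
```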


(2) The difference regarding inference: p.4, "On the input-driven view, abstract linguistic representations are arrived at by a process of generalization across specific cases...", and this is interpreted as "not inference" (in contrast to the knowledge-driven tradition). But a process of "generalization across specific cases" certainly sounds a lot like inference, because something has to determine exactly how that generalization is constrained (even if the constraints are non-linguistic, like economy). So I'm not sure it's fair to say the input-driven approach doesn't use inference, per se. Instead, it sounds like the distinction L&G want is about how that inference is constrained (input-driven: non-linguistic constraints; knowledge-driven: linguistic hypothesis space).


(3) Similarly, I also feel it's not quite fair to divide the world into "nothing like the input" (knowledge-driven) vs. "just like the input, only compressed" (input-driven) (p.5). Instead, it seems like this is more of a continuum, and some representations can be "not obviously" like the input, and yet still be derived from it. The key is knowing exactly what the derivation process is -- for example, for the knowledge-driven approach, the representations could be viewed as similar to the input at an abstract level, even if the surface representation looks very different.


(4) p.6, "...the statistical sensitivities of the learner are sometimes distinct from ideal-observer measures of informativity...reveal the role learners play in selecting relevant input to drive learning." So if the learner has additional constraints (say, on how the perceptual intake is implemented), couldn't these be incorporated into the learner assumptions that make up an ideal learner model? That is, if we're not talking about constraints based on cognitive resources, but instead about learner biases, couldn't we build an ideal-observer model that has those biases? (Or maybe the point is that perceptual intake only comes from constraints on cognitive resources?)
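For instance, here's one (purely hypothetical) way a learner bias could be folded into an ideal-observer-style model: treat the bias as a weight on how much each cue's likelihood counts, with weight 1.0 meaning the cue is used at its full informativity and 0.0 meaning it's ignored entirely.

```python
# Sketch: folding a learner bias into an ideal-observer-style cue combination.
# The weighting scheme and all numbers are hypothetical illustration.
import math

def weighted_log_likelihood(cue_loglikes, weights):
    """Weight 1.0 = cue used at full informativity; 0.0 = cue ignored."""
    return sum(w * ll for ll, w in zip(cue_loglikes, weights))

# Log-likelihoods of the data under some hypothesis, per cue:
# phonological cue first, semantic cue second.
cue_loglikes = [math.log(0.8), math.log(0.6)]

ideal = weighted_log_likelihood(cue_loglikes, [1.0, 1.0])
biased = weighted_log_likelihood(cue_loglikes, [1.0, 0.2])  # down-weight semantics
print(ideal, biased)
```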


(5) p.8, "...it must come from a projection beyond their experience". I think we have to be really careful about claiming this -- maybe "direct experience" is better, since even things you derive are based on some kind of experience, unless you assume everything about them is innate. But the basic point is that some previously-learned or innately-known stuff may matter for how the current direct experience is utilized.


(6) p.9, (referring to distribution of pronouns & interpretations), "...we are aware of no proposals outside the knowledge-driven tradition". Hello, modeling call! (Whether for the knowledge-driven theory, or other theories.)


(7) p.9, "...most work in generative linguistics has been the specification of these representations". I think some of the ire this approach has drawn from the non-generative community could be mitigated by considering which of these representations could be derived (and importantly, from what). It seems like relatively few generative researchers (particularly ones who don't work much on acquisition) think about the origin of these representations. But since some of them can get quite complex, it rubs some people the wrong way to call them all innate. It really doesn't have to be that way, though -- some might be innate, true, but some of these specifications might be built up from other, simpler innate components and/or derived from prior experience.


(8) p.15, "...predicted that the age of acquisition of a grammar with tense is a function of the degree to which the input unambiguously supports that kind of grammar..." And this highlights the importance of what counts as unambiguous data (which is basically data where the likelihood P(D | H) is 0 for all but the correct H). And this clearly depends on the model variables involved in all the different Hs (which presumably should be the same across hypotheses?).
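Here's a tiny illustration of why unambiguous data is so powerful (the hypotheses and all numbers are invented): a data point with likelihood 0 under every hypothesis but one collapses the posterior onto that hypothesis in a single update, while ambiguous data points leave the posterior untouched.

```python
# Toy illustration of unambiguous data: P(d | H) = 0 for all but the correct H.
# Hypotheses and numbers are invented for illustration.

prior = {"tense": 0.5, "no_tense": 0.5}
likelihood = {
    ("ambiguous",   "tense"): 0.5, ("ambiguous",   "no_tense"): 0.5,
    ("unambiguous", "tense"): 0.5, ("unambiguous", "no_tense"): 0.0,
}

def update(belief, datum):
    unnorm = {h: belief[h] * likelihood[(datum, h)] for h in belief}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

belief = update(prior, "ambiguous")      # no change: still 50/50
belief = update(belief, "unambiguous")   # collapses onto "tense"
print(belief)                            # {'tense': 1.0, 'no_tense': 0.0}
```

So on this view, the rate of unambiguous data in the input directly predicts how quickly the posterior settles, which is exactly the age-of-acquisition prediction.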


(9) p.25, "...preference for using phonological information over semantic information likely reflects perceptual intake in the initial stages of noun class learning". So this could easily be a derived bias, but I would think we would still call it "knowledge-driven" -- it's just that derived knowledge, rather than innate knowledge, is what caused it.


(10) Section 6, the Kannada empirical facts -- So interesting! Every time I see this, I always have a quiet moment of goggling. It seems like such an interesting challenge to figure out what these facts could be learned from. Something about binding? Something about goal-prominence? I feel like the top of p.35 has a parameter-style proposal linking possession constructions and these ditransitive facts, which would then be model variables. The Viau & Lidz 2011 proposal that cares about what kinds of NPs appear in different positions also seems like another model variable. Of course, these are very specific pieces of knowledge about model variables...but still, will this actually work (like, can we implement a model that uses these variables and run it)? And if it does, can the more specific model variables be derived from other aspects of the input, or do you really have to know about those specific model variables in advance?


(11) Future Issues, p.47: Yes.  All of these. Because modeling. (Especially 5, but really, all of them.)


