Wednesday, October 29, 2014

Next time on 11/19/14 @ 10:30am in SBSG 2221 = Qing & Franke 2014

Thanks to everyone who was able to join us for our incisive and informative discussion of Barak et al. 2014!  For our next CoLa reading group meeting on Wednesday November 19 at 10:30am in SBSG 2221, we'll be looking at an article that investigates the acquisition of gradable adjectives like "tall", using a Bayesian approach that incorporates pragmatic reasoning.

Qing, C. & Franke, M. 2014. Meaning and Use of Gradable Adjectives: Formal Modeling Meets Empirical Data. Proceedings of the Cognitive Science Society.


See you then!




Monday, October 27, 2014

Some thoughts on Barak et al. 2014

One of the things I really liked about this paper was the additional "verb class" layer, which is of course what allows similarities between verbs to be identified, based on their syntactic structure distributions. This seems like an obvious thing to include, but I don't think I've seen too many incremental models that actually have hierarchy in them (in contrast to ideal learner models operating in batch mode, which often have hierarchical levels in them). So that was great to see. Relatedly, the use of syntactic distributions from other verbs too (not just mental state verbs and communication/perception verbs) feels very much like indirect positive evidence (Pearl & Mis 2014 terminology), where something present in the input is informative, even if it's not specifically about the thing you're trying to learn. It's also nice to see more explicit examples of that. Here, this indirect positive evidence provides a nice means to generalize from communication/perception verbs to mental state verbs.

I also liked the attention spent on the perceptual coding problem (I'm using Lidz & Gagliardi 2014 terminology now) as it relates to mental state verbs, since it definitely seems true that mental state concepts/semantic primitives are going to be harder to extract from the non-linguistic environment, as compared to communication events or perception events.



More specific comments:

(1) Overview of the Model, "The model also includes a component that simulates the difficulty of children attending to the mental content...also simulates this developing attention to mental content as an increasing ability to correctly interpret a scene paired with an SC utterance as having mental semantic properties." -- Did I miss where it was explained how this was instantiated? This seems like exactly the right thing to do, since semantic feature extraction should be noisy early on and get better over time. But how did this get implemented? (Maybe it was in the Barak et al. 2012 reference?)
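For concreteness, here's one way I could imagine instantiating something like that (this is purely my own toy sketch in Python, not necessarily what Barak et al. actually did): the probability of correctly interpreting a scene+SC-utterance pair as having mental semantic properties starts low and grows with exposure, and otherwise the learner falls back to an action/physical interpretation.

import random

def interpret_scene(exposures, growth_rate=0.005):
    """Toy sketch (my assumption, not Barak et al.'s actual implementation):
    the probability of correctly extracting mental semantic properties from a
    scene paired with a sentential-complement (SC) utterance increases with
    the number of relevant exposures seen so far."""
    p_mental = 1 - (1 - growth_rate) ** exposures  # grows toward 1 over time
    if random.random() < p_mental:
        return "mental"   # scene interpreted with mental content
    return "action"       # fallback: physical/action interpretation

# Early vs. late in learning:
print(interpret_scene(exposures=10))    # mostly "action" early on
print(interpret_scene(exposures=1000))  # mostly "mental" later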

(2) Learning Constructions of Verb Usages, "...prior probability of cluster P(k) is estimated as the proportion of frames that are in k out of all observed input frames, thus assigning a higher prior to larger clusters representing more frequent constructions." -- This reminds me of adaptor grammars, where both type frequency and token frequency have roles to play (except, if I understand this implementation correctly, it's only token frequency that matters for the constructions, and it's only at the verb class level that type frequency matters, where type = verb).
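Just to make that token-frequency reading concrete, here's a minimal sketch (the cluster names and data are hypothetical, mine not theirs): P(k) is estimated from how many frame tokens have been assigned to cluster k, so a construction seen often gets a higher prior regardless of how many distinct verbs it came from.

from collections import Counter

def cluster_priors(frame_assignments):
    """Estimate P(k) as the proportion of observed frame tokens assigned to
    cluster k (token frequency only -- a toy sketch of my reading of the paper)."""
    counts = Counter(frame_assignments)
    total = len(frame_assignments)
    return {k: n / total for k, n in counts.items()}

# Hypothetical assignments of input frames to construction clusters:
assignments = ["SC-frame", "SC-frame", "transitive", "SC-frame", "intransitive"]
print(cluster_priors(assignments))
# {'SC-frame': 0.6, 'transitive': 0.2, 'intransitive': 0.2}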

(3) Learning Verb Classes, "...creation of a new class for a given verb distribution if the distribution is not sufficiently similar to any of those represented by the existing verb classes.", and the new class is a uniform distribution over all constructions. This seems like a sensible way to get at the same thing generative models do by assigning some small amount of probability to creating a new class. I wonder if there are other ways to implement it, though. Maybe something more probabilistic where, after calculating the probability of the verb belonging to each existing verb class and to the new (uniform-distribution) class, the verb is assigned to a class by sampling from that probability distribution. (Basically, something that doesn't use the argmax, but instead samples -- a minimal sketch of what I mean is below.)
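Here's that sampling alternative spelled out (the class names and probabilities are made up for illustration; the real ones would come from the similarity computation in the paper):

import random

def assign_verb_class(class_probs, sample=True):
    """Assign a verb to a class given P(class | verb's construction distribution).
    With sample=False this is the argmax behavior; with sample=True the verb is
    assigned in proportion to the probabilities (the alternative I have in mind)."""
    if sample:
        classes = list(class_probs)
        weights = list(class_probs.values())
        return random.choices(classes, weights=weights, k=1)[0]
    return max(class_probs, key=class_probs.get)

# Hypothetical probabilities, including the "new class" (uniform) option:
probs = {"communication-class": 0.55, "perception-class": 0.30, "new-class": 0.15}
print(assign_verb_class(probs, sample=False))  # always "communication-class"
print(assign_verb_class(probs, sample=True))   # usually that, but not always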

(4) Generation of Input Corpora, "...frequencies are extracted from a manual annotation of a sample of 100 child-directed utterances per verb" -- I understand manual annotation is a pain, but it does seem like this isn't all that many per verb. Though I suppose if there are only 4 frames they're looking at, it's not all that bad.  That being said, the range of syntactic frames is surely much more than that, so if they were looking at the full range, it seems like they'd want to have more than 100 samples per verb.

(5) Set-up of Simulations: "...we train our model on a randomly generated input corpus of 10,000 input frames" -- I'd be curious about how this amount of input maps onto the amount of input children normally get to learn these mental state verbs. It actually isn't all that much input. But maybe it doesn't matter for the model, which settles down pretty quickly to its final classifications?

(6) Estimating Event Type Likelihoods: "...each verb entry in our lexicon is represented as a collection of features, including a set of event primitives...think is {state, cogitate, belief, communicate}" -- I'm very curious as to how these are derived, as some of them seem very odd for a child's representation of the semantic content available. (Perhaps automatically derived from existing electronic resources for adult English? And if so, is there a more realistic way to instantiate this representation?)

(7) Experimental Results: "...even for Desire verbs, there is still an initial stage where they are produced mostly in non-mental meaning." -- I wish B&al had had space for an example of this, because I had an imagination fail about what that would be. "Want" used in a non-mental meaning? What would that even be for "want"?


References:
Lidz, J. & Gagliardi, A. 2014 to appear. How Nature Meets Nurture: Universal Grammar and Statistical Learning. Annual Review of Linguistics.

Pearl, L. & Mis, B. 2014. The role of indirect positive evidence in syntactic acquisition: A look at anaphoric one. Manuscript, UCI. [lingbuzz: http://ling.auf.net/lingbuzz/001922]

Wednesday, October 15, 2014

Next time on 10/29/14 @ 10:30am in SBSG 2221 = Barak et al. 2014

Thanks to everyone who was able to join us for our invigorating discussion of Lidz & Gagliardi 2014!  For our next meeting on Wednesday October 29 at 10:30am in SBSG 2221, we'll be looking at an article that investigates the acquisition of a particular subset of lexical items, known as mental state verbs (like "want", "wish", "think", "know"). This computational modeling study focuses on different syntactic information that children could be leveraging.

Barak, L., Fazly, A., & Stevenson, S. 2014. Gradual Acquisition of Mental State Meaning: A Computational Investigation. Proceedings of the Cognitive Science Society.


http://www.socsci.uci.edu/~lpearl/colareadinggroup/readings/BarakEtAl2014_LearningMentalStateVerbs.pdf


See you then!

Monday, October 13, 2014

Some thoughts on Lidz & Gagliardi 2014

My Bayesian-inclined brain really had a fun time trying to translate everything in this acquisition model into Bayesian terms, and I think it actually lends itself quite well to this -- model specification, model variables, inference, likelihood, etc. I'm almost wondering if it's worth doing this explicitly in another paper for this model (maybe for a different target audience, like a general cognitive sciences crowd). I think it'd make it easier to understand the nuances L&G highlight, since these nuances track so well with different aspects of Bayesian modeling. (More on this below.)

That being said, it took me a bit to wrap my head around the distinction between perceptual and acquisitional intake, probably because of that mapping I kept trying to do to the Bayesian terminology. I think in the end I sorted out exactly what each meant, but this is worth talking about more since they do (clearly) mean different things.  What I ended up with: perceptual intake is what can be reliably extracted from the input, while acquisitional intake is the subset of that which is relevant for the model variables (and of course the model/hypothesis space that defines those variables needs to already be specified).
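A toy way of putting that distinction, just to check my own understanding (entirely my sketch, with made-up cue names, not anything from L&G):

def perceptual_intake(input_data, extractable):
    """What can be reliably extracted from the input, given the learner's
    current perceptual/parsing abilities (my toy gloss of L&G's term)."""
    return [d for d in input_data if d in extractable]

def acquisitional_intake(perceived, model_variables):
    """The subset of the perceptual intake that actually bears on the model
    variables defined by the (already-specified) hypothesis space."""
    return [d for d in perceived if any(v in d for v in model_variables)]

# Hypothetical cues present in the raw input:
raw_input = ["phonological-cue", "semantic-cue", "prosodic-cue", "rare-cue"]
perceived = perceptual_intake(raw_input,
                              extractable={"phonological-cue", "semantic-cue", "prosodic-cue"})
relevant = acquisitional_intake(perceived,
                                model_variables={"phonological", "semantic"})
print(relevant)  # ['phonological-cue', 'semantic-cue']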

Related to this: It definitely seems like prior knowledge is involved to implement both intake types, but the nature of that prior knowledge is up for grabs. For example, if a learner is biased to weight cues differently for the acquisitional intake, does that come from prior experience about the reliability of these cues for forming generalizations, or is it specified in something like Universal Grammar, irrespective of how useful these cues have been previously? Either seems possible. To differentiate them, I guess you'd want to do what L&G are doing here, where you try to find situations where the information use doesn't map to the information reliability, since that's something that wouldn't be expected from derived prior knowledge. (Of course, then you have to have a very good idea about what exactly the child's prior experience was like, so that you could tell what they perceived the information reliability to be.)

One other general comment: I loved how careful L&G were to highlight when empirical evidence doesn't distinguish between theoretical viewpoints. So helpful. It really underscores why these theoretical viewpoints have persisted in the face of all the empirical data we now have available.

More specific comments:

(1) The mapping to Bayesian terms that I was able to make:
-- Universal Grammar = hypothesis space/model specification
Motivation:
(a) Abstract: "Universal Grammar provides representations that support deductions that fall outside of experience...these representations define the evidence the learners use..." -- Which makes sense, because if the model is specified, the relevant data are also specified (anything that impacts the model variables is relevant).
(b) p.6, "The UG component identifies the class of representations that shape the nature of human grammatical systems".

-- Perceptual Intake = parts of the input that could impact model variables
Motivation:
p.10, "contain[s]...information relevant to making inferences"

-- Acquisitional Intake = parts of the input that do impact model variables

-- Inference engine = likelihood?
Motivation:
(a) p.10, "...makes predictions about what the learner should expect to find in the environment"...presumably, given a particular hypothesis. So, this is basically a set of likelihoods (P(D | H)) for all the Hs in the hypothesis space (defined by UG, for example).
...except
(b) p.21, "...the inference engine, which selects specified features of that representation (the acquisitional intake) to derive conclusions about grammatical representations". This makes it sound like the inference engine is the one selecting the model variables, which doesn't sound like likelihood at all. Unless inference is over the model variables, which are already defined for each H.


-- Updated Grammar, deductive consequences = posterior over hypotheses
Motivation:
p.30, "...inferential, using distributional evidence to license conclusions about the abstract representations underlying language"
Even though L&G distinguish between inferential and deductive aspects, I feel like they're still talking about the hypothesis space. The inferential part is selecting the hypothesis (using the posterior) and the deductive consequences part is all the model variables that are connected to that hypothesis.
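Putting the whole mapping into the standard Bayesian update, as I'm reading it (this is my gloss, not L&G's own formalization):

P(H | D_acq) ∝ P(D_acq | H) × P(H)

where H ranges over the hypotheses/model specifications defined by UG, P(H) is the prior over those hypotheses, P(D_acq | H) is what the inference engine computes (the likelihood), D_acq is the acquisitional intake (a subset of the perceptual intake, which is itself a subset of the raw input), and the posterior P(H | D_acq) together with its deductive consequences is the updated grammar.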


(2) The difference about inference: p.4, "On the input-driven view, abstract linguistic representations are arrived at by a process of generalization across specific cases...", and this is interpreted as "not inference" (in contrast to the knowledge-driven tradition). But a process of "generalization across specific cases" certainly sounds a lot like inference, because something has to determine exactly how that generalization is constrained (even if it's non-linguistic constraints like economy or something). So I'm not sure it's fair to say the input-driven approach doesn't use inference, per se. Instead, it sounds like the distinction L&G want is about how that inference is constrained (input-driven: non-linguistic constraints;  knowledge-driven: linguistic hypothesis space).


(3) Similarly, I also feel it's not quite fair to divide the world into "nothing like the input" (knowledge-driven) vs. "just like the input, only compressed" (input-driven) (p.5). Instead, it seems like this is more of a continuum, and some representations can be "not obviously" like the input, and yet still be derived from it. The key is knowing exactly what the derivation process is -- for example, for the knowledge-driven approach, the representations could be viewed as similar to the input at an abstract level, even if the surface representation looks very different.


(4) p.6, "...the statistical sensitivities of the learner are sometimes distinct from ideal-observer measures of informativity...reveal the role learners play in selecting relevant input to drive learning."  So if the learner has additional constraints (say, on how the perceptual intake is implemented), could these be incorporated into the learner assumptions that would make up an ideal learner model? That is, if we're not talking about constraints that are based on cognitive resources but are instead talking about learner biases, couldn't we build an ideal-observer model that has those biases? (Or maybe the point is that perceptual intake only comes from constraints on cognitive resources?)


(5) p.8, "...it must come from a projection beyond their experience". I think we have to be really careful about claiming this -- maybe "direct experience" is better, since even things you derive are based on some kind of experience, unless you assume everything about them is innate. But the basic point is that some previously-learned or innately-known stuff may matter for how the current direct experience is utilized.


(6) p.9, (referring to distribution of pronouns & interpretations), "...we are aware of no proposals outside the knowledge-driven tradition". Hello, modeling call! (Whether for the knowledge-driven theory, or other theories.)


(7) p.9, "...most work in generative linguistics has been the specification of these representations". I think some of the ire this approach has inspired from the non-generative community could be mitigated by considering which of these representations could be derived (and importantly, from what). It seems like relatively few generative researchers (particularly ones who don't work much on acquisition) think about the origin of these representations. But since some of them can get quite complex, it rubs some people the wrong way to call them all innate. But really, it doesn't have to be that way -- some might be innate, true, but some of these specifications might be built up from other simpler innate components and/or derived from prior experience.


(8) p.15, "...predicted that the age of acquisition of a grammar with tense is a function of the degree to which the input unambiguously supports that kind of grammar..." And this highlights the importance of what counts as unambiguous data (which is basically data where likelihood p(D | H) is 0 for all but the correct H). And this clearly depends on the model variables involved in all the different Hs (which should be the same??).
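To spell that parenthetical out a little: if a data point d is unambiguous for H*, then p(d | H) = 0 for every competing H ≠ H*, so after updating, P(H* | d) = 1 no matter what the prior was (as long as the prior on H* wasn't 0). That's what makes unambiguous data so powerful, and it's also why the choice of model variables matters so much for deciding what counts as unambiguous in the first place.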


(9) p.25, "...preference for using phonological information over semantic information likely reflects perceptual intake in the initial stages of noun class learning". So this could easily be a derived bias, but I would think we would still call it "knowledge-driven" -- it's just that it's derived knowledge, rather than innate knowledge that caused it.


(10) section 6, Kannada empirical facts -- So interesting! Every time I see this, I always have a quiet moment of goggling. It seems like such an interesting challenge to figure out what these facts could be learned from. Something about binding? Something about goal-prominence? I feel like the top of p.35 has a parameter-style proposal linking possession constructions and these ditransitive facts, which would then be model variables. The Viau & Lidz 2011 proposal that cares about what kind of NPs are in different places also seems like another model variable. Of course, these are very specific pieces of knowledge about model variables...but still, will this actually work (like, can we implement a model that uses these variables and run it)? And if it does, can the more specific model variables be derived from other aspects of the input, or do you really have to know about those specific model variables?


(11) Future Issues, p.47: Yes.  All of these. Because modeling. (Especially 5, but really, all of them.)



Friday, October 3, 2014

Next time on 10/15/14 @ 10:30am in SBSG 2221 = Lidz & Gagliardi 2014

Hi everyone,

It looks like a good collective time to meet will be Wednesdays at 10:30am for this quarter, so that's what we'll plan on.  Our first meeting will be on October 15, and our complete schedule is available on the webpage at 



On October 15, we'll be looking at a review article that discusses a particular learning model drawing on language-specific and domain-general knowledge to explain the process of acquisition. For modelers, it's especially useful to consider the specific implementations proposed, as these are theoretically and empirically motivated learning strategies that we can investigate via computational modeling.

Lidz, J. & Gagliardi, A. 2014 to appear. How Nature Meets Nurture: Universal Grammar and Statistical Learning. Annual Review of Linguistics.

http://www.socsci.uci.edu/~lpearl/colareadinggroup/readings/LidzGagliardi2014ToAppear_UGStats.pdf

See you on October 15!