Computational Models of Language (at UC Irvine): April 2014

Monday, April 14, 2014

Next time on 5/5/14 @ 2:15pm in SBSG 2221 = Orita et al. 2013

Thanks to everyone who was able to join us for our invigorating discussion of the Han et al. 2013 manuscript! Next time on May 5 @ 2:15pm in SBSG 2221, we'll be looking at an article that presents a Bayesian learning model for pronoun acquisition, with a special focus on the role of discourse information:

Orita, N., McKeown, R., Feldman, N. H., Lidz, J., & Boyd-Graber, J. 2013. Discovering Pronoun Categories using Discourse Information. Proceedings of the Cognitive Science Society.

http://www.socsci.uci.edu/~lpearl/colareadinggroup/readings/OritaEtAl2013_Pronouns.pdf

See you on May 5!

Thursday, April 10, 2014

Some thoughts on the Han et al. 2013 Manuscript

One of the things I greatly enjoyed about this paper is that it really takes a tricky learning issue seriously: What happens if you don't get any indicative data about a certain hypothesis space (in this case, defined as a set of possible grammars related to verb-raising)? Do humans just remain permanently ambivalent (which is a rational thing to do, and what I think any Bayesian model would do), or do they pick one (somehow)? The super-tricky thing in testing this, of course, is how you find something that humans have no input about and actually ascertain what grammar they picked. If there's no input (i.e., no relevant example in the language) that distinguishes between the grammar options, how do you tell?

And that's how we find ourselves in some fairly complex syntax and semantics involving quantifiers and negation in Korean, and their relationship to verb-raising. I did find myself somewhat surprised by the (apparent) simplicity of the test sentences (e.g., the equivalent of "The man did not wash every car in front of his house"). Because the sentences are so simple, I'm surprised they wouldn't occur at all in the input with the appropriate disambiguating contexts (i.e., the subset of these sentences that occur in a neg>every-compatible context, like the man washing 2 out of 3 of the cars in the above example). Maybe this is more about their relative sparseness, with the idea that while they may appear, they're so infrequent that they're just not noticeable by a human learner during the lifespan. But that starts to become a tricky argument when you get to adults -- you might think that adults encountering examples like these over time would eventually learn from them. (You might even argue that this happened between session one and session two for the adults that were tested one month apart: they learned (or solidified learning) from the examples in the first session and held onto that newly gained knowledge for the second session.)

One reason this matters is that there's a big difference between no data and sparse data for a Bayesian model. Nothing can be learned from no data, but something (even if it's only a very slight bias) can be learned from sparse data, assuming the learner pays attention to those data when they occur. For this reason, I'd be interested in some kind of corpus analysis of realistic Korean input to see how often these types of negation + object quantifier sentences occur (with the appropriate disambiguating context ideally, but at the very least, how often the sentence structure itself occurs). If they really don't occur at all, this is interesting, since we have the idea described in the paper that humans are picking a hypothesis even in the absence of input (and then we have to investigate why). If these sentences do occur, but only very sparsely, this is still interesting, but speaks more to the frequency thresholds at which learning occurs.
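
To make the no-data vs. sparse-data contrast concrete, here is a minimal sketch of a two-hypothesis Bayesian learner. Everything in it is my own assumption for illustration and not anything from the Han et al. manuscript: the function name, the 50/50 prior, and especially the likelihoods (0.6 vs. 0.4), which just stand in for "each disambiguating example is somewhat more expected under grammar A than under grammar B."

```python
def posterior_grammar_a(n_examples, prior_a=0.5, lik_a=0.6, lik_b=0.4):
    """Posterior probability of grammar A after n_examples data points.

    Assumes (hypothetically) that every observed example is one that
    grammar A predicts with probability lik_a and grammar B predicts
    with probability lik_b, and that examples are independent.
    """
    # Unnormalized posterior weight for each grammar: prior * likelihood^n.
    weight_a = prior_a * lik_a ** n_examples
    weight_b = (1.0 - prior_a) * lik_b ** n_examples
    # Normalize so the two grammars' probabilities sum to 1.
    return weight_a / (weight_a + weight_b)


if __name__ == "__main__":
    # 0 examples = no data; 1-3 examples = sparse data.
    for n in (0, 1, 2, 3):
        print(n, round(posterior_grammar_a(n), 3))
```

With zero examples the posterior is just the prior, so this learner stays exactly as undecided as it started. With even one or two examples it shifts toward grammar A, and how far it shifts depends entirely on the assumed likelihoods. That's the sense in which a corpus count matters: "never" and "almost never" lead this kind of learner to different places.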

Thursday, April 3, 2014

Next time on 4/14/14 @ 2:15pm in SBSG 2221 = Han et al. 2013 Manuscript

Hi everyone,

It looks like a good collective time to meet will be Mondays at 2:15pm for this quarter, so that's what we'll plan on. Our first meeting will be on April 14, and our complete schedule is available on the webpage at

http://www.socsci.uci.edu/~lpearl/colareadinggroup/schedule.html

On April 14, we'll be looking at an article that investigates how learners generalize in the absence of input that distinguishes between two hypotheses. This is an experimental paper that makes explicit links to the learning process and may provide fruitful data for acquisition modeling work.

Han, C., Lidz, J., & Musolino, J. 2013. Grammar Selection in the Absence of Evidence: Korean Scope and Verb-Raising Revisited. Manuscript, University of Maryland College Park and Rutgers University. Please do not cite without permission from the authors.

http://www.socsci.uci.edu/~lpearl/colareadinggroup/readings/HanLidzMusolino2013Manu_KoreanVRAcqGramSelect.pdf

See you on April 14!
