Friday, February 28, 2014

Next time on 3/14/14 @ 3:20pm in SBSG 2221 = Goodman & Stuhlmüller 2013

Thanks to everyone who was able to join us for our exciting discussion of Levinson 2013 & the Legate et al. 2013 reply manuscript! We had some particularly interesting thoughts about potential recursion in non-syntactic domains, such as theory of mind, which relates to our article for Friday March 14 at 3:20pm in SBSG 2221: Goodman & Stuhlmüller 2013. In particular, G&S2013 discuss a rational model of how listeners interpret utterances, where listeners model the speaker's process of selecting utterances:

Goodman, N. & Stuhlmüller, A. 2013. Knowledge and Implicature: Modeling Language Understanding as Social Cognition. Topics in Cognitive Science, 5, 173-184.
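The recursive "listener models the speaker modeling a listener" idea can be sketched very compactly. Below is a minimal toy version of this style of model (the domain, function names, and parameter choices are my own illustration for a classic scalar-implicature case, not G&S2013's actual model or code):

```python
# Toy rational-speech-act-style sketch: a pragmatic listener reasons
# about a speaker who reasons about a literal listener.

# Worlds: how many of 3 apples are red; utterances and literal truth.
worlds = [0, 1, 2, 3]
utterances = ["none", "some", "all"]

def literal(u, w):
    return {"none": w == 0, "some": w > 0, "all": w == 3}[u]

def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

def L0(u):
    # Literal listener: uniform over worlds where u is true.
    return normalize({w: 1.0 if literal(u, w) else 0.0 for w in worlds})

def S1(w):
    # Speaker: prefers utterances that lead the literal listener to w.
    return normalize({u: L0(u)[w] for u in utterances})

def L1(u):
    # Pragmatic listener: Bayesian inversion of the speaker's choices.
    return normalize({w: S1(w)[u] for w in worlds})

# "some" gets pragmatically strengthened: L1 shifts probability away
# from w = 3, since a speaker in that world would have said "all".
print(L1("some"))
```

Even this one level of embedded reasoning reproduces the "some but not all" implicature, which is the kind of recursion-outside-syntax our discussion kept circling around.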


See you then!


Wednesday, February 26, 2014

Some thoughts on Levinson 2013 + Legate, Pesetsky, & Yang 2013 reply (manuscript)

One of the take-away points I had from Levinson 2013 [L13] was the idea that center-embedding is not a structural option specific to syntax, since the same structural option shows up in dialogue. My impression was that L13 wanted to use this to argue that this particular type of recursion is not language-specific: dialogue uses language to communicate information, and it's the information communicated (via the speech acts) that's center-embedded. (At least, that's how I'm interpreting "speech acts" as "actions in linguistic clothing".) I'm not quite sure I believe that, since I would classify speech acts as a type of linguistic knowledge (specifically, how to translate an intention into the specific linguistic form required to convey it). But suppose we classify this kind of knowledge as not really linguistic, per se -- then wouldn't the interesting question be how unique this type of structural option is to human communication systems, since that relates to questions about the Faculty of Language (broad or narrow)? And presumably, this links back to whether non-human animals can learn these syntactic structures (doesn't seem to be true, as far as we know) or these types of embedded interactions (also doesn't seem to be true, I think).

As a general caveat, I should note that while I followed the simpler examples of center embedding in dialogue, I was much less clear about the more complex examples that involved multiple center-embeddings and cross-serial dependencies (for example, deciding that something was an embedded question rather than a serial follow-up, like in example (14), the middle of (16), some of the embeddings in (17)). This may be due to my very light background in pragmatics and dialogue analysis, however. Still, it seemed that Legate, Pesetsky, & Yang 2013 [LPY13] had similar reservations about some of these dialogue dependencies.

LPY13 also had very strong reactions to both the syntactic and computational claims in L13, in addition to these issues about how to assign structure to discourse. I was quite sympathetic to (and convinced by) LPY13's syntactic and computational objections as a whole, from the cross-linguistic frequency of embedding to the non-centrality of center embedding for recursion to the non-debate about whether natural languages are regular. They also brought out a very interesting point about the restrictions on center embedding in speech acts (example (13)), which seemed to match some of the restrictions observed in syntax. If these restrictions are real, and we see them in both linguistic (syntax) and potentially non-linguistic (speech act) areas, then maybe this is nice evidence for a domain-general restriction on processing this kind of structure. (And so maybe we should be looking for it seriously elsewhere too.)
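The formal point under all of this is worth making concrete: matching each "opener" to its closer in center-embedded order requires an unbounded counter (a stack), which is exactly what takes a dependency pattern out of the regular (finite-state) class. Here's a tiny recognizer of my own construction (not from either paper) for the schematic pattern a^n b^n -- think Q1 Q2 A2 A1 in L13's dialogue examples:

```python
# Toy recognizer for center-embedded dependencies (a^n b^n).
# A finite-state machine has finitely many states, so it cannot track
# an unbounded nesting depth; this recognizer needs a counter.

def accepts_center_embedded(s):
    """Accept strings of 'a's followed by an equal number of 'b's."""
    depth = 0
    seen_closer = False
    for ch in s:
        if ch == "a":
            if seen_closer:      # an opener after a closer: not nested
                return False
            depth += 1
        elif ch == "b":
            seen_closer = True
            depth -= 1
            if depth < 0:        # more closers than openers so far
                return False
        else:
            return False
    return depth == 0

print(accepts_center_embedded("aabb"))  # True: Q1 Q2 A2 A1 (nested)
print(accepts_center_embedded("abab"))  # False: serial Q1 A1 Q2 A2
```

Note that serial follow-ups ("abab") are rejected as instances of this nested pattern, which is precisely why distinguishing genuine embedding from serial sequencing in the dialogue transcripts matters so much for the argument.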



More specific comments:

L13: There's a comment in section 4 about whether it's more complex to treat English as a large system of simple rules or a small system of complex rules. Isn't this exactly the kind of thing that rational inference gets at (e.g., Perfors, Tenenbaum, & Regier 2011 find that a context-free grammar works better than a regular or linear grammar on child-directed English speech -- as LPY13 note)? With respect to recursion, L13 cites the Perfors et al. 2010 study, which LPY13 correctly note isn't about regular vs. non-regular languages. Instead, that study finds, perhaps surprisingly, that a mixture of recursive and non-recursive context-free rules fits best -- better than all-recursive or all-non-recursive rules -- despite this seeming to duplicate a number of rules.
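A toy numerical illustration of why the mixture can win (my own construction, not Perfors et al.'s actual model or grammars): when shallow strings dominate the corpus, a dedicated non-recursive rule lets the grammar concentrate probability mass on them, so the corpus likelihood goes up even though the recursive rule could already generate those strings.

```python
import math

def loglik(probs, corpus):
    # Log-likelihood of a corpus of (string -> count) under string probs.
    return sum(count * math.log(probs[s]) for s, count in corpus.items())

# Shallow strings are far more frequent, as in child-directed speech.
corpus = {"ab": 70, "aabb": 25, "aaabbb": 5}

# Grammar A (purely recursive): S -> ab (0.5) | a S b (0.5).
p_a = {"ab": 0.5, "aabb": 0.25, "aaabbb": 0.125}

# Grammar B (mixture): adds a "duplicate" non-recursive rule for aabb:
# S -> ab (0.7) | aabb (0.2) | a S b (0.1).
p_b = {"ab": 0.7, "aabb": 0.2 + 0.1 * 0.7, "aaabbb": 0.1 * (0.2 + 0.1 * 0.7)}

print(loglik(p_a, corpus))  # purely recursive grammar
print(loglik(p_b, corpus))  # mixture grammar fits this corpus better
```

The probabilities in grammar B are hand-picked for the illustration, but the shape of the result is the point: redundancy in the rule set can be the rational choice once you score grammars against realistic frequency distributions.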

L13: Section 6, using the transformation from pidgin to creole as evidence for syntactic embedding coming from other capacities like joint action abilities: It's true that one of the hallmarks of pidgins vs. creoles is the added syntactic complexity, which (broadly speaking) seems to come from children learning the pidgin, adding syntactic structure to it that's regular and predictable, and ending up with something that has the same syntactic complexity as any other language. I'm not sure I understand why this tells us anything about where the syntactic complexity is coming from, other than something internal to the children (since they obviously aren't getting it from the pidgin in any direct way). Is it that these children are talking to each other, and it's the dialogue that provides a model for the embedded structures, for example?

LPY13: I'm not quite sure I agree with the objection LPY13 raise about whether dialogue embeddings represent structures (p.9). I agree that there don't seem to be very many restrictions, certainly when compared to syntactic structure. But just because there are multiple licit options doesn't mean there isn't a structure corresponding to each of them. It may be exactly this: there are multiple possible structures that we allow in dialogue. So maybe this is really more of an issue about how we tell structure is present (as opposed to linear "beads on a string", for example).

Friday, February 14, 2014

Next time on 2/28/14 @ 3:20pm in SBSG 2221 = Levinson 2013 + Legate et al. 2013 Manuscript

Thanks to everyone who was able to join us for our delightfully thoughtful discussion of the Omaki & Lidz 2013 manuscript! Next time on Friday February 28 at 3:20pm in SBSG 2221, we'll be looking at an article that argues that recursion is a central part of cognition, even if it's curiously restricted in the realm of syntax (Levinson 2013).


Levinson, S. 2013. Recursion in pragmatics. Language, 89(1), 149-162.


In addition, we'll read a reply to that article that critically examines the assumptions underlying that argument (Legate et al. 2013 Manuscript).

Legate, J., Pesetsky, D., & Yang, C. 2013. Recursive Misrepresentations: A Reply to Levinson (2013). Revised version to appear in Language. Please do not cite without permission from Julie Legate.

Wednesday, February 12, 2014

Some thoughts on the Omaki & Lidz 2013 Manuscript

There are many things that made me happy about this manuscript as a modeler, not the least of which is the callout to modelers about what ought to be included in their models of language acquisition (hurrah for experimentally-motivated guidance!). For example, there's good reason to believe that a "noise parameter" that simply distorts the input in some way can be replaced by a more targeted perceptual intake noise parameter that distorts the input in particular ways. Also, I love how explicit O&L are about the observed vs. latent variables in their view of the acquisition process -- it makes me want to draw plate diagrams. And of course, I'm a huge fan of the distinction between input and intake.

Another thing that struck me was the effects incrementality could have. For example, it could cause prioritization of working memory constraints over repair costs, especially when repair is costly, because the data's coming at you now and you have to do something about it. This is discussed in light of the parser and syntax, but I'm wondering how it translates to other types of linguistic knowledge (and perhaps more basic things like word segmentation, lexical acquisition, and grammatical categorization). If this is about working memory constraints, we might expect it to apply whenever the child's "processor" (however that's instantiated for each of these tasks) gets overloaded. So, at the beginning of word segmentation, it's all about making your first guess and sticking to it (perhaps leading to snowball effects of mis-segmentation, as you use your mis-segmentations to segment other words). But maybe later, when you have more of a handle on word segmentation, it's easier to revise bad guesses (which is one way to recover from mis-segmentations, aside from continuing experience).

This relates to the cost of revision in areas besides syntax. In some sense, you might expect that cost is very much tied to how hard it is to construct the representation in the first place. For syntax (and the related sentential semantics), that can continue to be hard for a really long time, because these structures are so complex. And as you get better at it, it gets faster, so revision gets less costly. But looking at word segmentation, is constructing the "representation" ever that hard? (I'm trying to think what the "representation" would be, other than the identification of the lexical item, which seems pretty basic assuming you've abstracted to the phonemic level.) If not, then maybe word segmentation revision might be less costly, and so the swing from being revision-averse to revision-friendly might happen sooner for this task than in other tasks.
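The "snowball" idea above is easy to make concrete. Here's a toy sketch of my own (not O&L's model, or any actual segmentation model from the literature): a greedy incremental segmenter that commits to its first match and recycles whatever it has (mis)segmented before as its lexicon.

```python
# Toy greedy incremental segmenter: commits to the longest lexicon
# match left-to-right, and adds any unmatched leftover to the lexicon,
# so early mis-segmentations propagate into later parses.

def greedy_segment(utterance, lexicon):
    words, i = [], 0
    while i < len(utterance):
        match = None
        for j in range(len(utterance), i, -1):  # longest match first
            if utterance[i:j] in lexicon:
                match = utterance[i:j]
                break
        if match is None:
            match = utterance[i:]   # commit to the leftover as a "word"
            lexicon.add(match)      # ...and it pollutes the lexicon
        words.append(match)
        i += len(match)
    return words

lexicon = {"the", "thedog"}  # one early mis-segmentation in the lexicon
print(greedy_segment("thedogran", lexicon))  # -> ['thedog', 'ran']
```

Because "thedog" is already in the lexicon, the segmenter never considers "the" + "dog" + "ran", and it now adds "ran" on the strength of that bad parse; with a revision mechanism (re-scoring past commitments), the error could be caught instead of compounded.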

Some more targeted thoughts:

(i) One thing about the lovely schematic in Figure 1: I can definitely get behind the perceptual intake feeding the language acquisition device (LAD) and (eventually) feeding the action encoding, but I'm wondering why it's squished together with "linguistic representations". I would have imagined that perceptual intake directly feeds the LAD, and the LAD feeds the linguistic representation (which then feeds the action encoding). Is the idea that there's a transparent mapping between perceptual intake and linguistic representations, so separating them is unnecessary? And if so, where's the place for acquisitional intake (talked about in footnote 1 on p.7), which seems like it might come between perceptual intake and LAD?

(ii) I found it a bit funny that footnote 2 refers to the learning problem as "inference-under-uncertainty" rather than the more familiar "poverty of the stimulus" (PoS). Maybe PoS has too many other associations with it, and O&L just wanted to sidestep any misconceptions arising from the term? (In which case, probably a shrewd move.)

(iii) In trying to understand the relationship between vocabulary size and knowledge of pronoun interpretation (principle C), O&L note that children who had faster lexical access were not faster at computing principle C, so it's not simply that children who could access meaning faster were then able to do the overall computation faster. This means that the hypothesis that "more vocabulary" equals "better at dealing with word meaning", which equals "better at doing computations that require word meaning as input" can't be what's going on. So do we have any idea what the link between vocabulary size and principle C computation actually is? Is vocabulary size the result of some kind of knowledge or ability that would happen after initial lexical access, and so would be useful for computing principle C too? One thought that occurred to me was that someone who's good at extracting sentential level meaning (i.e., because their computations over words happen faster) might find it easier to learn new words in the first place. This then could lead to a larger vocabulary size. So, this underlying ability to compute meaning over utterances (including using principle C) could cause a larger vocabulary, rather than knowing lots of words causing faster computation.

(iv) I totally love the U-shaped development of filler-gap knowledge in the Gagliardi et al. (submitted) study. It's nice to see an example of this qualitative behavior in a realm besides morphology. The explanation seems similar, too -- a more sophisticated view of the input causes errors, which take some time to recover from. But the initial simplified view leads to surface behavior that seems right, even if the underlying representation isn't right at that point. Hence, the U-shaped performance curve. Same for the White et al. 2011 study -- U-shaped learning in syntactic bootstrapping for the win.

(v) I really liked the note on p.45 in the conclusion about how the input vs. intake distinction could really matter for L2 acquisition. It's nice to see some explicit ideas about what the skew is and why it might be occurring. (Basically, this feels like a more explicit form of the "less is more" hypothesis, where adult processing distorts the input in predictable ways.)