Discussion board for the reading group based out of UCI.
Tuesday, December 6, 2011
Schedule for Winter 2012 available
The schedule of readings for winter 2012 is now available! We'll be looking at a variety of topics again, including word segmentation, morphology, and linguistic productivity.
Friday, November 18, 2011
Some thoughts on Mitchener & Becker (2011)
I really like that M&B are looking at a learning problem that would be interesting to both nativists and non-nativists (a lot of the time, it seems like the different sides are talking past each other on what problems they're trying to solve). I also really like that they're exploring a variety of different probabilistic learning models. It does seem that M&B are still approaching the learning problem from a strongly nativist perspective, given the way they've described the actual problem: the learner knows there are two classes of behavior that link syntactic structure to semantic interpretation (raising vs. control), and that there are specific cues the learner should use to figure out which behavior a given verb has (animacy & eventivity). Importantly, only those cues (and their distribution) are relevant. There also seems to be an implicit assumption (at least initially) that unambiguous data are required to distinguish the behavior of any given verb, and the learning problem results because unambiguous data aren't always available (this is a common way learnability problems are framed in a nativist perspective). One thing I wondered while reading this is what would happen if the behavior of these verbs was taken in the context of a larger system - that is, would it possibly be easier to recognize these distinct classes of verbs if other information were deemed relevant besides the two cues M&B look at? I believe they hint at this themselves in the paper - that it might be possible to look at the syntactic distribution of these verbs over all frames, rather than just the ambiguous frame that signals either raising or control (She VERBed to laugh). This doesn't solve the problem of knowing what the different linking rules are between structure and interpretation, but maybe it makes the classification problem (that there are distinct classes of verbs) easier.
Some more targeted thoughts:
- Footnote 2 talks about the issues of homophony, and I can certainly see that tend's meanings are pretty distinct between its raising and regular transitive uses. However, happens looks like it means very similar things whether it's raising or regular transitive, so I wonder how children would make this distinction - or if they would at all. If not, then this looks like an additional class of verb that involves mixed behavior.
- The end of section 2 talks about how 3- and 4-year-olds are very sensitive to animacy when they interpret verbs in the ambiguous raising/control frame. I can completely believe that animacy might generally be a cue children use to help them figure out what things should mean (e.g., if a verb takes an agent or not).
- I really like the discussion/caveat that M&B do in the intro of section 4 about biological plausibility.
- I also really liked the discussion of the linear reward penalty (LRP) learner's issues in section 4.2.1. Not having an intermediate state equilibrium is problematic if you need there to be mixed behavior (e.g., something is ambiguous between raising and control). I admit, I was surprised by the saturating accumulator model M&B chose to implement to correct that problem. I had some trouble connecting the various pieces of it to the process in a child's mind - the intuitive mapping didn't work for me the way it does for the LRP learner. For example, the index they talk about right at the end of section 4.2.2 seems fairly ad-hoc and requires children to do abstracting over patterns of frames defined by these different semantic cues.
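To make the LRP learner's update concrete, here is a minimal sketch of a textbook two-hypothesis linear reward-penalty scheme. It is not M&B's implementation: the function names, the learning rate gamma, the starting value of 0.5, and the success probabilities passed to simulate are all assumptions made for illustration.

```python
import random


def lrp_update(p, chose_first, success, gamma=0.05):
    """One linear reward-penalty step for a two-hypothesis learner.

    p is the probability of choosing hypothesis 1 (say, "raising");
    1 - p is the probability of choosing hypothesis 2 (say, "control").
    gamma is a hypothetical learning rate.
    """
    # Rewarding hypothesis 1 and penalizing hypothesis 2 both move p toward 1;
    # the two opposite outcomes move p toward 0.
    toward_first = (chose_first and success) or (not chose_first and not success)
    if toward_first:
        return p + gamma * (1.0 - p)
    return (1.0 - gamma) * p


def simulate(prob_success_first, prob_success_second, steps=1000, gamma=0.05, seed=0):
    """Run the learner on a stream where each hypothesis succeeds with a fixed probability."""
    rng = random.Random(seed)
    p = 0.5  # assumed neutral starting point
    for _ in range(steps):
        chose_first = rng.random() < p
        prob_success = prob_success_first if chose_first else prob_success_second
        success = rng.random() < prob_success
        p = lrp_update(p, chose_first, success, gamma)
    return p
```

Calling simulate with different success probabilities for the two hypotheses is one way to explore how such a learner behaves when a verb's data are mixed rather than cleanly one class or the other.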
Tuesday, November 8, 2011
Next time on 11/21: Mitchener & Becker (2011)
Thanks to those of you who were able to join our nicely in-depth discussion of Alishahi & Pyykkonen (2011)'s article on syntactic bootstrapping! I think we figured out some of the details that were glossed over, and these really helped to understand the contribution of the study.
Next time, on Nov 21 (@3pm in SBSG 2221), we'll be looking at an article that examines how a subtle syntactic distinction that has specific semantic implications (called the raising-control distinction) could be learned.
Mitchener, G. & Becker, M. (2011). Computational Models of Learning the Raising-Control Distinction. Research on Language and Computation, 8(2), 169-207.
See you then!
Friday, November 4, 2011
Some thoughts on Alishahi & Pyykkonen (2011)
I really like the investigation of syntactic bootstrapping in this kind of computational manner. While experimental approaches like the Human Simulation Paradigm (HSP) offer us certain insights about how (usually adult) humans use different kinds of information, they have certain limitations that the computational learner doesn't (such as the researcher knowing exactly what the internal knowledge state is, and how it changes). From my perspective, the HSP with adults (and maybe even with 7-year-olds) is a kind of ideal learner approach, because it asks what inferences can be made with maximal knowledge about (the native) language - so while it clearly involves human processing limitations, it's examining the best that humans could reasonably be expected to do in a task that's similar to what word-learners might be doing. The computational learner is much more limited in the knowledge it has access to a priori, and I think the researchers really tried to give it reasonable approximations of what very young children might know about different language aspects. In addition, as A & P mention, the ability to track the time course of learning is a nice feature (though with some caveats with respect to implementation limitations).
Some more targeted thoughts:
I thought the probabilistic accuracy was a clever measure for taking advantage of the distribution over words that the learner calculates.
As I said above, tracking learning over time is an admirable goal - however, the modeled learner here clearly is only qualitatively doing this, since there's such a spike in performance between 0 and 100 training examples. I'm assuming A & P would say that children's inference procedures are much noisier than this (and so it would take children longer), unless there's evidence that children really do learn the exact correct meaning in under 100 examples (possible, but seems unlikely to me).
I was a little surprised that A & P didn't discuss the difference in Figure 1 between the top and bottom panel with respect to the -LI condition. (This was probably due to the length constraints, but still.) It's a bit mystifying to me how absolute accuracy could be close to the +LI condition while verb improvement is much lower than the +LI condition. I guess this means the baseline for verb improvement was different between the +LI and -LI conditions somehow?
It was indeed interesting to see that having no linguistic information (-LI) was actually beneficial for noun-learning - I would have thought noun-learning would also be helped by linguistic context. A & P speculate that this is because early nouns refer to observable concepts (e.g., concrete objects) and/or the nature of the training corpus made the linguistic context for nouns more ambiguous than for verbs. (The latter reason ties into the linguistic context more.) I wonder if this effect would persist with a different training corpus (after all, there were some assumptions A & P made when constructing this corpus - they seemed reasonable, but there are still different ways to construct the corpus.)
Monday, October 17, 2011
Next time: Alishahi & Pyykkonen (2011)
Thanks to those of you who were able to join our nicely in-depth discussion today of Dillon et al. (2011)'s article on applying Bayesian models to phonological acquisition! Next time on 11/7 (@3:30pm in SBSG 2221), we'll be discussing an article that looks at the phenomenon of syntactic bootstrapping, which is the ability to infer word meaning and abstract structure associated with that word from the syntactic context of the word:
Alishahi, A. & Pyykkonen, P. (2011). The onset of syntactic bootstrapping in word learning: Evidence from a computational study. Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Boston, MA.
See you then!
Friday, October 14, 2011
Some thoughts on Dillon et al. (2011)
I'm really fond of this paper - I love that they're tackling realistic problems (with realistic language data), that they're seriously looking at the state of the art with respect to computational models of it, and that they're finding a way to connect linguistic theory (e.g., "There are phonological rules") with this level of concreteness (e.g., "Let's make them linear models operating over acoustic space"). Because of all this, I think their point about the potential issues of two-stage models comes across very clearly. And I love that they can make a model that learns both phonemes and the relationships between phonetic categories simultaneously. Moreover, the fact they can do this without trying to learn a lexicon simultaneously (like Feldman, Griffiths, & Morgan (2009) do) is impressive to me, since that was the main thing that seemed to lead to good results for Feldman et al. (2009). Notably, they make use of the linguistic context (i.e., does a uvular consonant follow), which is something Swingley (2009) recently suggested looks really helpful for English phonemes in a review of infant phoneme learning.
A few more targeted thoughts:
- I really like that they note the three-vowel +allophones system is not just a special weirdness of Inuktitut, but rather something that occurs in a number of different languages. This makes it more important to be able to account for this kind of data, and bolsters support for the single stage model.
- I also thought it was useful to note that the EM approach follows the frequentist tradition. After a moment's reflection, this is clearly true, but it didn't occur to me until they pointed it out.
- Because of the nature of the Bayesian model, the more data that come in, the more the model is likely to prefer more categories over fewer (and the explanation they give for this just before the discussion of Expt 1 is entirely sensible). This carries over even for their cool Expt 3 model that learns categories and rules simultaneously (as we can see in Table 6) - the 12000 data point model is much more likely to posit 4 or 5 categories than the 1000 data point model. I'm wondering what this means for actual acquisition. Should we expect that infants learn very quickly and so end up with 3 categories + rules? Or would we expect that infants might go through a stage where they have 4 or 5 categories, and have to recover (maybe based on doing word segmentation/lexicon item discovery)?
- For the one-stage model in Expt 3, they mention that they build in a bias for complementary distribution - is this an uncontroversial assumption (or easy to derive from innate abilities we know infants do have)? I honestly don't have strong intuitions about this. It'd be great if it was.
References:
Feldman, N., Griffiths, T., and Morgan, J. (2009). Learning phonetic categories by learning a lexicon. Proceedings of the 31st Annual Conference on Cognitive Science.
Swingley, D. (2009). Contributions of infant word learning to language development. Philosophical Transactions of the Royal Society B, 364, 3617-3632.
Monday, October 3, 2011
Thanks to those of you who were able to join our spirited discussion today of Dunbar et al. (2010)'s article on Bayesian reasoning in linguistics! Next time on 10/17 (@3pm in SBSG 2221), we'll be discussing an article by the same crew of authors that models the acquisition of specific phonological phenomena:
Dillon, B., Dunbar, E., & Idsardi, W. (2011 ms). A single stage approach to learning phonological categories: Insights from Inuktitut. University of Maryland, College Park and University of Massachusetts, Amherst.
See you then!
Friday, September 30, 2011
Some thoughts on Dunbar et al. (2010)
This is probably one of the more linguistically technical articles we've read in the group to date, but I think that even if the linguistic details weren't as accessible to someone without a linguistic background, there was still a very good, basic point made about the simplicity of abstract structures, given principles of Bayesian reasoning. On the one hand, this might seem surprising since adding another layer of representation might seem de facto more complex; on the other hand, there's something clearly simpler about having three basic units of representation instead of six (for instance).
Some more targeted thoughts:
p.7: The particular example they discuss involving phonemes (specifically, three with derivational rules vs. six with no need for derivational rules) - this reminds me of Perfors et al. (2010), where they were looking at recursion in language, also from a Bayesian perspective. In that case, the decision was between a non-recursive grammar, a partially-recursive grammar, and a fully recursive grammar. The outcome turned out to be that for different structures (subject embedding vs. object embedding), different grammars fit the data best, with one of the winners being the partially-recursive grammar. In essence, this is a "direct store + some computation" approach. For the phoneme example in Dunbar et al., it seems like the choices are between "direct store of six" vs. "store three + some computation", and the "some computation" option ends up being the best. (Related note on p.30: I agree that it would be nice to have formal theoretical debates take place at this level when discussing learnability, rather than relying on intuitions of whether computation or direct storage is more complex/costly.)
p.9: Just a quick note about their justification of looking for a theoretically optimal solution (using the ideal learner paradigm, essentially) - I do agree that this has a place in acquisition studies. Basically, if you formulate a problem (and accompanying hypothesis space), and then find that this problem is unsolvable by an ideal learner, this is a clue that something is not right - maybe it's the hypothesis space, maybe it's a missing learning bias on how to use the data, etc.
p.14: Another main message of the authors: "Probability theory...is simply a way...of formalizing reasoning under uncertainty." I get the impression that this is to persuade readers who aren't normally very fond of probability.
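To make the reasoning behind the three-versus-six comparison concrete, here is the generic form of Bayesian model comparison. This is a sketch of the standard setup, not the specific computation in Dunbar et al.; the symbols G_3 (three phonemes plus derivational rules), G_6 (six phonemes, no rules), and D (the observed data) are labels introduced here for illustration.

```latex
\[
P(G \mid D) \;\propto\; P(D \mid G)\, P(G)
\]
\[
\frac{P(G_3 \mid D)}{P(G_6 \mid D)}
  \;=\; \frac{P(D \mid G_3)}{P(D \mid G_6)} \times \frac{P(G_3)}{P(G_6)}
\]
```

The first ratio on the right measures how well each grammar fits the data, and the second measures prior preference (for instance, a preference for fewer basic units). On this view, whether an extra layer of representation counts as "more complex" is something the two ratios settle together rather than something decided by intuition.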
Monday, September 26, 2011
Welcome back!
The CoLa Reading Group will be holding its meetings on Mondays at 3pm in SBSG 2221 this quarter. We'll meet four times during the quarter, approximately every other week (schedule available here). Our first meeting will be held this coming Monday October 3rd, when we'll be looking at Dunbar, Dillon, & Idsardi (2010), who examine the utility of abstract linguistic representations viewed from a Bayesian perspective:
Dunbar, E., Dillon, B., & Idsardi, W. (2010 ms) A Bayesian Evaluation of the Cost of Abstractness. University of Maryland, College Park and University of Massachusetts, Amherst.
And remember: Even if you aren't able to come to the meeting in person, you're always welcome (and encouraged) to post on the reading group discussion board here!
See you next Monday!
Monday, May 23, 2011
Thanks and see you at the end of the summer!
Thanks to everyone who was able to join us for our discussion of Clark & Lappin's (2011) article - as usual, we had quite the rousing debate about various points! This concludes the reading group activities for the spring quarter. We'll be picking up again at the end of the summer, around late August. Have a good break!
Friday, May 20, 2011
Thoughts on Clark & Lappin (2011)
I was really pleased with the overall approach of this paper, particularly how it discussed integrating a probabilistic component into learnability theory, the emphasis on the importance of tractable cognition, and the mention about how it's important to identify efficient algorithms for acquisition even if you already know the hypothesis space, such as in the principles & parameters framework (that last bit is particularly near and dear to my academic heart). It really seems like C&L take a solid psychological perspective, even though they're often dealing with idealized scenarios. To me, this seems to echo the original intuitions of generative grammarians - something like "we realize things are more complicated, but we can get a long way by making sensible simplifications".
Some more targeted thoughts:
- In the introductory bit on p.3, I was surprised at the continuing use of "strong language specific learning biases" (in contrast to domain-general learning biases). Maybe this is because that's the kind of language-specific biases nativists often claim, but to me, any innate domain-specific bias would be part of Universal Grammar (UG), whether it's strong or not.
- (p.4) I thought the separation between the learnability of a particular grammar formalism and the learnability of a class of natural languages was very nice. It does seem like sometimes the motivation for UG learning biases comes from assuming a particular representation that's being learned, rather than accounting for empirical coverage of the data.
- (p.5) It seemed odd to me to say that it's not necessary to incorporate a characterization of the hypothesis space into the learner, but rather that the "design of the learner" will limit the hypothesis space. Is the difference that the hypothesis space is explicitly built in in the first case while in the second case, it's implicit (via some other constraints on how the learner learns)?
- (describing Gold's results and the PAC framework) I thought the step-through of the various learnability results was remarkably clear. I've seen a number of different attempts to do just the basic Gold one about identifiability in the limit from positive evidence, and this is definitely one of the best. In particular, C&L really take care to point out the limitations of each of the results (as they apply to acquisition) simply and concisely.
- (p.11) Mostly, I just found myself saying "Yes!" enthusiastically at the end of section 3, where C&L talk about how learnability connects to acquisition.
- (p.13) I also appreciated the explicit connection being made to current probabilistic techniques, such as MDL and Bayesian inference.
- (p.20) When talking about efficient learning, C&L say "...the core issues in learning concern efficient inference from probabilistic data and assumptions". It's the "assumptions" part that I think the focus of the debate is on - assumptions about the data, assumptions about what's generating the data, something else? What kind of assumptions are these and how/why does the learner make them?
- (p.22) I admit great curiosity in the Distributional Lattice Grammars, since they apparently have empirical coverage, a syntax-semantics interface, and good learnability properties. This really underscores how the representation we think children are trying to learn will determine what they need to learn it. Maybe this is something to read about in more detail in the fall...
- Section 7 (starting on p.25): After all the emphasis on targeting learnability research to be more informative to acquisition, I was a bit surprised to see the concluding discussion about machine learning (ML) methods (especially the supervised ML methods). While it's true that unsupervised learning is much closer to the acquisition problem, it seems like this loses the point about tractable cognition (i.e., use strategies humans could use).
Monday, May 9, 2011
Next time: Clark & Lappin (2011)
Thanks to everyone who was able to join us for our spirited discussion of Heinz (2010)'s computational phonology papers! I think we really brought up some excellent points and made some interesting connections between computational phonology and cognition.
Next time on May 23rd, we'll be looking into another study on formal mathematical representations of grammar inference, and how they connect to human language learning:
Clark & Lappin (2011)
See you then!
Friday, May 6, 2011
Thoughts on Heinz (2010a + b)
Again, I was surprised by how fast I read through these two papers - Heinz definitely knows how to explain abstract concepts in very comprehensible ways. One thing I did notice about these papers was that there were parts that whirled by almost a little too quickly, so that instead of giving background for someone not already in the know about computational phonology, they felt more like a brief literature review for someone already familiar with the relevant concepts. Still, I liked that Heinz was pretty up front about the goal of computational phonology - identifying the shape of the human phonological system (its universal properties, etc.). This definitely feels like a cognitive science-oriented approach, even if the specifics sometimes seem a little disconnected from what we might normally think of as the psychology of language.
Some more specific thoughts:
- The discussion of finding a theory with the right balance between restrictiveness and expressiveness reminded me very much of Bayesian inference (find the hypothesis that has the best balance between simplicity and fit).
- My inner theoretical computer science geek was pretty happy about the discussion of problems and algorithms and tractability, and the like. When discussing determinism, though, I do think there's some wiggle room with respect to non-deterministic processes (i.e., those that guess when unsure). A number of acquisition models incorporate some aspect of probabilistically-informed guessing, with reasonable success.
- I thought the outline of phonological problems in particular (on p.9 of the first paper) neatly described a number of different interesting questions. I think the recognition problem is something like what psycholinguists would call parsing, while the phonotactic learning problem is what psycholinguists would generally call acquisition.
- I believe Heinz mentions that transducers aren't necessarily the mental representation of grammars, but a lot of the conclusions he mentions seem predicated on that being true in order for the conclusions to have psychological relevance. That is, if the mental representations of grammar aren't something like the transducers discussed here, how informative is it to know that a surface form can be computed in so many steps, etc.? Or maybe there's still a way to translate that kind of conclusion, even if transducers aren't similar to the grammar representation?
- The fact that two grammar formalisms (SPE and 2LP) are functionally the same is an interesting conclusion. What should then choose between them, besides personal preference? Ease of acquisition maybe?
- I really liked the discussion distinguishing simulations from demonstrations. I think that pretty much all of my recent models seem to fall more under the demonstration category.
Tuesday, April 26, 2011
Next time: Heinz (2010), parts 1 and 2
Thanks to everyone who was able to join us for a very interesting discussion of Heinz's (2009) computational phonology paper! Next time on May 9th, we'll be continuing to look into the enterprise of computational phonology and what it can offer to both theoretical linguistics and language acquisition.
Computational Phonology, Part 1
Computational Phonology, Part 2
See you then!
Friday, April 22, 2011
Thoughts on Heinz (2009)
One of the things I really enjoyed about this paper was the clarity of its description - I thought Heinz did a wonderful job stepping the reader through the fundamentals and building up his main point. I also really like the approach of connecting the hypothesis space to learner bias. While Heinz's focus in this paper is more on learnability in the limit, I still think some very useful connections were made to language acquisition by children. I was also very fond of the way "don't count beyond two" falls out of the notion of neighborhood-distinctness. Then, the question is how neighborhood-distinctness comes about. Is it a constraint on the learner's abstraction abilities (as Heinz walks us through learners abstracting over prefixes and suffixes)? Is it just a more compact representation, and so that's simpler (Bayesian bias for simplicity)?
Some more targeted thoughts:
- I think one of the claims is that the hypothesis space of the learner would exclude hypotheses allowing stress patterns that don't obey neighborhood-distinctness (ND). It seems like a slightly more relaxed version of this could be beneficial: Specifically, the hypothesis space includes stress patterns that don't obey ND, but the learner is strongly biased against these patterns. In order to learn languages that violate ND, the learner has to overcome this bias. So, these non-ND language patterns are in the hypothesis space, but their prior (in Bayesian terms) is low. From this, we could predict that non-ND languages take longer for children to learn - we might expect children to follow ND generalizations for some time before they finally overcome their initial bias based on the data.
- A tangent on the point above - I noticed several examples Heinz mentioned (that were non-ND) involved words of four or five syllables. While of course this varies from language to language, English child-directed speech contains very few words of 4+ syllables. If this is something generally true of languages (and we might imagine some Zipfian distribution that makes it so), maybe that's another reason children might end up with ND hypotheses about stress patterns, and have to unlearn them (or learn exceptions).
- How translatable are the FBLearner and the FSA representation to human acquisition algorithms and human representations? I can see how this might be meant as a direct correlate, especially given the attention paid to an online variant of the learning algorithm.
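To make the "low prior on non-ND patterns" idea from the first two bullets a bit more concrete, here is a minimal sketch of a two-hypothesis Bayesian learner. Everything in it is an assumption for illustration rather than anything from Heinz (2009): the function name `data_needed`, the two hypotheses, the noise level, and the idea that only a small fraction of the input (say, the rare longer words) is informative about an ND violation.

```python
import math

def data_needed(prior_non_nd, informative_rate=0.05, noise=0.001, max_items=100_000):
    """Return how many input items are seen before the posterior on the
    (hypothetical) non-ND hypothesis first exceeds 0.5, or None if it never does.

    Assumptions of this toy model:
      - the input really comes from a non-ND language;
      - a fraction `informative_rate` of items show the ND-violating pattern;
      - the ND hypothesis can only explain such items as noise.
    """
    # Log-odds of non-ND vs. ND, starting from the prior.
    log_odds = math.log(prior_non_nd) - math.log(1.0 - prior_non_nd)
    # Log likelihood ratios (non-ND : ND) for the two kinds of input item.
    llr_informative = math.log(informative_rate) - math.log(noise)
    llr_other = math.log(1.0 - informative_rate) - math.log(1.0 - noise)
    # Use a fixed, evenly spaced input stream: every period-th item is informative.
    period = round(1.0 / informative_rate)
    for n in range(1, max_items + 1):
        log_odds += llr_informative if n % period == 0 else llr_other
        if log_odds > 0.0:
            return n
    return None

# A stronger bias against non-ND (smaller prior) delays the switch,
# and so does making the informative data rarer.
mild_bias = data_needed(0.01)
strong_bias = data_needed(1e-6)
rare_evidence = data_needed(0.01, informative_rate=0.01)
print(mild_bias, strong_bias, rare_evidence)
```

Under these assumptions, the learner behaves as if it held an ND grammar for a while and only later switches, and the delay grows both with the strength of the bias and with the rarity of the informative words. That is the qualitative prediction in the bullets above; the particular numbers mean nothing.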
Monday, April 11, 2011
Next time: Heinz (2009)
Thanks to everyone who joined us (virtually or in person) to discuss phonological learning from acoustic data! We definitely had some good suggestions on how to extend the work in Yu (2010) both computationally and experimentally. Next time on April 25th, we'll be turning to a computational study on stress systems by Heinz (2009), which provides a way to connect properties of the learner with typological properties of observed stress systems:
Heinz (2009): On the role of locality in learning stress patterns
See you then!
Friday, April 8, 2011
Thoughts on Yu (2010)
I'm really pleased with the way Yu strives to position the question she's exploring in this paper in the larger framework of language acquisition. Another strength of this paper for me is the empirical basis in acoustic data. I think a fair number of phonological category learning models do this, but you don't see this in as many word segmentation models (which usually assume that learners have abstracted phonemes already). One thing that I was hoping to see but didn't: Yu talks about how it's important to figure out the right set of acoustic features/cues for learners to pay attention to, but then doesn't really offer a way to sort through the potentially infinite number of relevant acoustic features. Her work here focuses more on distinguishing the effectiveness of already proposed features (which is certainly an important thing to do).
Some more targeted thoughts:
- On p.5, Yu mentions how the learning problem can be thought of effectively as a poverty of the stimulus (PoS) problem, because the learner has to generalize from a finite set to generalizations that cover infinite sets. I do get this, but it does seem like this might be an easier generalization (in this particular case) to make than some of the problems that are traditionally held up as poverty of the stimulus (say, in syntax). This is because the acoustic data points available might be best fit by a generalization that's close enough to the truth - not every data point appears, but enough appear that are spread out sufficiently. On the other hand, a harder PoS problem would be if the data points that appear are most compatible with a generalization that is in fact the wrong one (here, if the proper ellipse was actually much much bigger than the observed data suggest, and only extended along a particular dimension, for example).
- On p.6, in footnote 1, we can really see the differences in approach to (morpho)syntax taken by linguistics vs. computational linguistics. I believe it's standard in computational linguistics to assume a probabilistic distribution over whatever units you're working with, which has to map to real values, while in linguistics it's more standard to assume a categorical (discrete) approach. (Though of course there are linguists who adopt a probabilistic approach by default - I just think they're not in the majority in generative linguistic circles.)
- On p.12, where Yu notes that there are distinctions between adult-directed and child-directed speech, and justifies the decision to use adult-directed speech: While I can certainly understand the practical motivations for doing this, it would be really good to know how different adult-directed speech is compared to child-directed speech, particularly for the acoustic properties that Yu is interested in with respect to tone. I have the (possibly mistaken) impression that there might be quite significant differences.
Monday, March 14, 2011
CoLa Schedule for Spring 2011 now up!
Thanks to everyone who was able to join us winter quarter! We definitely covered some very interesting material and had some invigorating discussions. The schedule is now tentatively set for Spring 2011, with a new time that seems likely to accommodate more people's schedules (based on the feedback I received):
Every other Monday, beginning April 11th
Time: 2:15pm - 3:30pm
Location: SSL 427
The readings we'll be looking at this quarter focus on computational models of phonological learning (stress, tone, phonological categories, etc.), as well as some computational learning theory papers that attempt to connect with the cognitive process of language learning. More details can be found at the schedule page, with links to both the readings and related reference materials.
Monday, March 7, 2011
Thoughts on Progovac (2010)
I definitely am sympathetic to Progovac's introductory point about the mismatch between the units of syntax and the units of neuroscience currently, and so I like her idea of trying to identify syntactic primitives that might map more naturally to what we know about biology. And I'm also sympathetic to the small clause as syntactic primitive, since it could form the basis of other structures (transitive, intransitive, etc.) and so there's a nice evolution from less complex to more complex that goes along with it. That being said, I'm not quite sure I see the connection between small clauses and brain biology.
Some additional thoughts:
- On p.239, where she discusses exocentric verbal compounds like daredevil as proto-syntax evidence that transitivity emerged later than intransitivity: Clearly, many of these seem to be modern words. Is the idea that they persist because they make use of ancient processing structures in the brain, while other kinds of exocentric compounds that could exist (but don't) don't use these same structures?
- On p.244, where she discusses Serbian unaccusative clauses as evidence for the small clause as the basic primitive: Serbian appears to have the same intransitive order as English (Subject Verb) - should we expect to see more small clauses with unaccusative verbs like "Come November" in English? Since we don't (as footnote 16 seems to indicate), why don't we? Is this just an accident of history that these don't persist, even though they're based on the small clause structure that's processed by the ancient brain structures?
- p.245, where the claim is made that formulaic speech like idioms is processed by more ancient structures in the brain: I can certainly see why idioms would be processed more like one big word, but why should they be processed by more ancient brain structures? Is it that they lack more syntactic structure and so are more equivalent to the minimal syntactic structure that small clauses exhibit?
- p.248: I like the discussion of subjacency restrictions as deriving from fossilized syntactic structures, since subjacency has been notoriously tricky to properly characterize. With respect to acquisition, would it be that children have to learn which constructions are mapped to these different kinds of fossil structures? Or would it be that by their very nature (semantic? syntactic?), these constructions are processed differently in the brain?
- p.249, footnote 19: I understand that there's very little evidence (if any) to draw on about the actual evolution that led to the structures we see today in languages, but the story about how noticing the non-dominant hand during bimanual toolmaking is linked to (noticing?) the topic of a sentence seems like a bit of a stretch.
Wednesday, February 23, 2011
Next time: Progovac (2010)
Thanks to everyone who was able to join us for a spirited discussion of the linguistic evolution perspective in Chater & Christiansen (2010)! Next time on March 9th, we'll be reading a paper on linguistic evolution by Progovac (2010) that is likely to have a more nativist perspective:
Progovac (2010)
Note: We'll be meeting slightly earlier on March 9th - starting at 1pm, instead of 1:15pm. We'll use the same location as this past meeting: SSL 427.
Monday, February 21, 2011
Thoughts on Chater & Christiansen (2010)
I think one of the main things that stuck with me throughout this paper was the assumption made about UG: namely, that whatever is in it is, by definition, arbitrary. I'll certainly grant that this is one way the knowledge of UG has been characterized sometimes, but I'm not sure I agree that it's a necessary property. So, if we find out that innate, language-specific biases have their origins in, say, pragmatic/communication biases, does that make them not UG? To my mind, it doesn't. But I think it does for the authors - and that seems to be one of their main claims for saying it couldn't possibly have evolved. So they strike out arbitrary, innate, domain-specific knowledge - does that mean that all innate, domain-specific knowledge is ruled out? It seems like they want to claim that (and instead attribute everything to innate, domain-general processes), but I don't think I agree.
That being said, I do sympathize with the position that it seems more likely that language adapted to the brain, rather than the brain adapting to language. It seems reasonable to me that language evolution is too fast for the processes of gene evolution to catch up with, i.e., language is too much of a "moving target".
A few things that also stuck with me:
p. 1135, where they say that aspects of language that are difficult to learn or process will be rapidly stamped out: Given this, should we expect all persistent gaps (e.g., Russian inflectional paradigms, or why you can't say "Who did you see who did that?" to mean "Who did you see do that?" in English, even though you can do the equivalent in German) to be more difficult in some way? I suppose it's possible, but it seems less plausible to me.
p.1143: I love that they're looking at binding phenomena, because it's true that this has traditionally been an example held up by UG proponents as something that is very likely to be a part of UG. I'm not quite sure that their story about dependency resolution (how clauses get "closed off") would work for all environments where we use regular pronouns instead of reflexive ones, though. However, I think they're satisfied to show at least a few connections between these syntax principles and other non-syntax constraints - they say something to the effect of "This doesn't account for everything, but since no one can account for anything, this is as good as anything else." While it's true that no story accounts for everything yet, I suspect the syntactic accounts might go a bit further than the account the authors have sketched here.
Wednesday, February 9, 2011
Next time: Chater & Christiansen (2010)
Thanks to all who were able to join us for our exciting discussion of the articles on language change and language acquisition! Next time on Feb 23, we'll be continuing to look at the intersection of these two fields, with a recent article by Chater & Christiansen:
Language Acquisition Meets Language Evolution
Note: We will be changing our location to SSL 427.
Monday, February 7, 2011
Thoughts on Hruschka et al. (2009) & Lightfoot (2010)
So the Hruschka et al. (2009) article seems to be targeting a wider audience, and is obviously a collaborative effort of many researchers from different backgrounds. One of the aspects I really thought was interesting was the stuff in Box 1, where they talk about how acquisition biases that are too weak to appear in psycholinguistic experiments might show up when learning persists over generations. This is the iterated learning paradigm stuff, which I think is very interesting indeed, and I would love to think about how to apply this to more sophisticated linguistic stuff (like the phenomena that Lightfoot (2010) talks about, as opposed to something like verb-object vs. object-verb word order). They do mention a study by Daland et al. that examines the persistence of lexical gaps, and this starts to get into more complex territory, I think. But can something like that be applied to the persistence of island constraints in syntax, for example?
Something that struck me as a little funny in the Hruschka et al. article: they claim to present a framework for understanding language change, as exemplified in figure 1. But to me, all that looks like they're saying is "Yes, all these things are important." Maybe that's in response to someone with a perspective like Lightfoot's, which I guess Hruschka et al. might call variationist?
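As a way of thinking through the Box 1 idea, here is a toy iterated-learning chain. It is only a sketch under assumptions of my own, not the model Hruschka et al. describe: there are two hypothetical variants A and B, each learner hears `n` utterances from a single teacher, utterances are misleading with probability `err`, and each learner simply picks whichever variant has the higher posterior. The names `p_learn_a` and `share_of_a` and all the numbers are made up for illustration.

```python
from math import comb

def p_learn_a(teacher_is_a, prior_a=0.55, n=2, err=0.4):
    """Probability that a learner ends up with variant A after hearing
    n utterances from a teacher who uses A (True) or B (False)."""
    # Chance that any single utterance from this teacher looks like A.
    p_a_form = (1 - err) if teacher_is_a else err
    total = 0.0
    for k in range(n + 1):  # k = number of A-like utterances heard
        # Unnormalized posteriors for each variant given k A-like forms.
        post_a = prior_a * (1 - err) ** k * err ** (n - k)
        post_b = (1 - prior_a) * err ** k * (1 - err) ** (n - k)
        if post_a > post_b:  # learner picks the more probable variant
            total += comb(n, k) * p_a_form ** k * (1 - p_a_form) ** (n - k)
    return total

def share_of_a(generations, start=0.5):
    """Expected share of A speakers after the given number of generations,
    starting from a population in which `start` of the teachers use A."""
    share = start
    for _ in range(generations):
        share = share * p_learn_a(True) + (1 - share) * p_learn_a(False)
    return share

long_run = share_of_a(50)
print(round(long_run, 3))
```

In this toy setup the prior only tips the learner's choice when the data are evenly split, yet the long-run share of A in the chain settles well above the 0.55 prior. Whether anything like this scales up to island constraints is exactly the open question.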
Another part of the Hruschka et al. article discussed agent-based approaches that aren't explicitly tied to empirical data. Granted, empirical data for language change (especially historical language change) isn't that easy to come by, but I do feel a bit skeptical about models that aren't grounded that way. On the other hand, this kind of agent-based modeling is very common in mathematical and game theoretic approaches to cognitive phenomena.
Turning to Lightfoot (2010), clearly Lightfoot is coming from one very specific perspective that endorses innate domain-specific knowledge in the form of cues. One thing that was very interesting to me was the possibility that cues could lead a child to a grammar that isn't actually the optimal grammar for the current data set. This reminds me very much of current theories of English metrical phonology - it turns out that child-directed and adult-directed speech input are more compatible with some non-English grammar variants than they are with the official English grammar. This could mean that either the ideas about the English grammar are wrong, or that something like a cues-based learner is causing these mismatches. (On the other hand, you would expect these mismatches not to persist over time - the cue-based learning is the instigator of language change in Lightfoot's view.)
Something else that occurred to me when reading Lightfoot: let's suppose that it's true that children need something like principle (4) [Something can be deleted if it is (in) the complement of an adjacent, overt word.]. This has wide coverage and explains some of the puzzling phenomena that Lightfoot brings up. Could this be selected from the hypothesis space (or maybe inferred from the data somehow) exactly because it applied to lots of data, including these phenomena? That seems to jibe with the idea of indirect evidence a la Perfors, Tenenbaum, & Regier, as well as classic ideas of how parameters are supposed to be set (one parameter is connected to lots of different data types).
Wednesday, January 26, 2011
Next time: Hruschka et al. (2009) & Lightfoot (2010)
Thanks to all who were able to join us this time for a vigorous discussion of Lidz (2010)! Next time on Feb 9, we'll be discussing two shorter articles dealing with models of language change, and how they (might) inform us about properties of language acquisition:
Hruschka et al. (2009)
Lightfoot (2010)
Monday, January 24, 2011
Thoughts on Lidz (2010)
Something that was extremely interesting to me was the set of linguistic phenomena that Lidz talked about in the later sections of the review - these are things that really look hard to learn from the data, and yet which children (and sometimes extremely young children) seem to have very clear intuitions about. These are far more complex than the standard poverty of the stimulus examples that are often tackled in the current modeling literature, like structure dependence and anaphoric one. They also make clear what domain-specific information that doesn't seem derivable from the input looks like, e.g., the connections between the observable features in the data and the underlying linguistic objects that express these features. Lidz clearly believes that there's a place for statistical learning (and in fact, a great need for statistical learning) even in a world that has Universal Grammar, since the learner has to track various input features. What Universal Grammar does is provide something like an attentional focus - which features are significant, etc. Moreover, UG also tells how these features connect to the abstract entities, rather than just saying "hey, look at these features - they're probably important for something" (unless we think it's obvious how some of the features map to the abstract structural knowledge).
Some more targeted thoughts:
- The way Lidz used analysis by synthesis on p.203 seemed much more in line with what I thought it was compared to some of the definitions of AxS we saw in last time's reading.
- I thought the 18-month-old constituent knowledge was interesting, but it wasn't as convincing to me as some of the later examples since infants at that age do have experience with constituents moving in their native language - so it's not like they haven't ever seen constituents moving before. In contrast, the interpretations connected with the Spanish & Kannada dative alternations are not something children seem to encounter very much at all in the input. The same goes for the generic vs. existential interpretations on bare nouns.
Wednesday, January 12, 2011
Next time: Lidz (2010)
Thanks to all who were able to join our vigorous discussion of analysis-by-synthesis at this past meeting of the reading group. Next time, we'll have a look at a recent article by Lidz (2010) (which can be found here) that provides a nativist perspective on statistical learning in language acquisition.
Monday, January 10, 2011
Thoughts on Bever & Poeppel (2010)
So this is definitely a bit of a topic shift from our previous readings, but I think it's very useful to see how the analysis-by-synthesis (AxS in Bever & Poeppel's terms) idea is being thought about by people who care very much about the biological underpinnings of language (they are writing for Biolinguistics, after all). The part that spoke most to me was the syntactic section - for one thing, the demonstration of how difficult the mapping can be from syntactic form to meaning (example (4)) was very clear. A main question that arose was whether grammatical derivations are computed as part of the processes of comprehension - this has strong connections to syntactic theory, which views grammatical derivations as the basic operation that has to happen.
Something else of note, which I'd like to think more about, is how to connect the description of AxS that they have here to how we've been thinking about it in the Bayesian models we've looked at already. It seemed like some of the distinctions B&P were making were almost orthogonal to how I normally think of a Bayesian model. For instance, in example 9, they deal with the distinction between habits accumulated via induction over experiences (which they seem to liken to pattern recognition) and novel computation (which is presumably the generative part). Mapping this to the Bayesian reasoning stuff we've been looking at, the habits accumulated from induction work pretty well as the general inference from data; would the novel computation then be what happens when the Bayesian learner tries to deal with novel data? But B&P seem to be thinking of this as a process (or processes) that apply during language comprehension, so is all of it happening when the novel data is encountered during comprehension?
Later, in section 7, where they move to AxS in visual perception (and in particular visual object recognition), they again seem to have the division into the quick heuristic/pattern recognition and the slower computation. Would we view this as a two-step inference process when the novel data point is encountered? Step 1: update prior based on quick heuristic ==> new posterior; Step 2: update new posterior based on slower computation ==> newer posterior. The same sort of question would apply when we talk about AxS for sentential meaning (section 8.3): the quick heuristic is the literal meaning, while the slower computation is the pragmatic knowledge.
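The two-step question above can be written out as a small worked example. This is just a sketch of sequential Bayesian updating, not anything B&P propose: the hypothesis names, the likelihood values, and the assumption that the quick heuristic and the slower computation contribute independent evidence are all mine.

```python
def update(belief, likelihood):
    """One Bayesian update: multiply each hypothesis's probability by its
    likelihood, then renormalize so the result sums to 1."""
    unnormalized = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical interpretations of a novel data point, with made-up numbers.
prior = {"reading_A": 0.5, "reading_B": 0.5}
quick_heuristic = {"reading_A": 0.8, "reading_B": 0.2}   # fast pattern match favors A
slow_computation = {"reading_A": 0.1, "reading_B": 0.9}  # slower analysis favors B

# Step 1: prior + quick heuristic ==> new posterior.
after_heuristic = update(prior, quick_heuristic)
# Step 2: new posterior + slower computation ==> newer posterior.
after_both = update(after_heuristic, slow_computation)

# Same two pieces of evidence applied in the opposite order.
other_order = update(update(prior, slow_computation), quick_heuristic)

print(after_heuristic)
print(after_both)
```

One thing this makes clear: if the two sources really are independent evidence, the final posterior is the same whichever one is applied first, so the "quick then slow" ordering would matter only for the intermediate state (what the comprehender believes before the slower computation finishes), not for where the inference ends up.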
They do touch on applying this to acquisition at the very end of the article - how the child builds up statistical generalizations over time, with the example of learning the canonical sentence form. The interesting part is that the novel computation is triggered by noticing that there's not a one-to-one mapping from form to meaning all the time. Does this make any clear acquisition trajectory predictions? It seems like it might have something to say about children's understanding of this mapping, and how many non-canonical forms they've seen. But maybe something more concrete can be said?
Monday, January 3, 2011
Meetings for Winter 2011 Quarter
After looking at people's availability, it looks like the best time to meet for this quarter is, somewhat surprisingly, the same time as last quarter: Wednesdays, from 1:15pm to 2:30pm in SBSG 2341. My apologies to anyone who is unable to make it at this time this quarter. Hopefully you'll be able to make it for next quarter. But, in the meantime, always remember that you're welcome to post on the discussion board here.
This quarter, we'll be looking at a couple of different topics, with the idea of getting a view of some of the current perspectives in the field:
Jan 12: Analysis by Synthesis (a common assumption in generative models)
Jan 26: Nativist views on statistical learning
Feb 9 & Feb 23: Language acquisition + Language change
Mar 9: Language evolution, with respect to syntax
The selected readings are posted on the home website's schedule page.
As always, feel free to let me know of any topics or particular readings you would be interested in for future CoLa reading group meetings.
Looking forward to seeing you on January 12!