Wednesday, April 29, 2015

Some thoughts on Heinz 2015 book chapter, parts 6-9

Continuing on from last time, where we read up through the discussion of constraints on strings, Heinz’s 2015 book chapter now gets into the constraints on maps between the underlying form and the observable form of a phonological string. As before, I found the more leisurely walk-through of the different ideas (complete with illustrative figures) quite accessible. The only gap in that respect for me as a non-phonologist was understanding what an opaque map is, since Heinz mentions that opaque maps raise potential issues for the computational approach here. Some quick googling pulled up examples, but a brief concrete one in the chapter itself would have been helpful.

On a more contentful note, I found the compare and contrast with the optimality approach quite interesting. We have this great setup for some logically possible maps that are derivationally simple (e.g., “Sour Grapes”), and yet we find these maps unattested. Optimality has to add machinery to take care of this, while the computational ontology Heinz presents neatly separates them out. Boom. Simple.

So then this leads me (as an acquisition person) to wonder what we can do with this learning-wise. Let’s say the set of phonological maps that occur in human language is captured by a certain type of relationship (input strictly local [ISL]) — there are some exceptions currently, but let’s say those get sorted out. Then we also have computational learnability results about how to learn these types of maps in the limit. Can I, as an acquisition modeler, then do something with those algorithms? Or do I need to develop other algorithms, based on those, that do the same thing, only within plausible time limits?
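To make “ISL map” concrete for myself, here’s a toy sketch in Python (my own illustration, not from the chapter): word-final obstruent devoicing (e.g., /bad/ → [bat]) treated as an ISL function, where each output segment is determined by a bounded window of the input — here, the current segment plus whether the next symbol is the word boundary.

```python
# A toy input-strictly-local (ISL) map: each output symbol depends only on
# a bounded window of the *input* string. Hypothetical example: word-final
# obstruent devoicing, an ISL-2 style map (current segment + right boundary).

DEVOICE = {"b": "p", "d": "t", "g": "k", "z": "s", "v": "f"}

def final_devoicing(underlying: str) -> str:
    """Map an underlying form to its surface form, devoicing a final obstruent."""
    surface = []
    padded = underlying + "#"  # '#' marks the right word boundary
    for i, seg in enumerate(underlying):
        nxt = padded[i + 1]
        if nxt == "#" and seg in DEVOICE:
            surface.append(DEVOICE[seg])   # window sees: segment + boundary
        else:
            surface.append(seg)            # elsewhere, the map is the identity
    return "".join(surface)

print(final_devoicing("bad"))   # -> bat
print(final_devoicing("badu"))  # -> badu (no devoicing word-internally)
```

The point of the sketch is just the locality: nothing in the computation ever looks more than one input symbol ahead, which is what makes the map (and, plausibly, its learning problem) so constrained.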

And let’s make this even more concrete, actually — suppose there is a set of maps capturing English phonology that we think children learn by a certain age. Suppose we do the kind of analysis Heinz suggests and discover all these maps are ISL. What kind of learning algorithms should I model to see if children could learn the right maps from English child-directed data? Are the existing learnability algorithms the ones? Or do I need to adapt them somehow? Or is it more that they serve to show learning is possible, while bearing no resemblance to the algorithms kids would actually use? Given Heinz’s comment at the end of part 5 about the link between algorithm and representation, I feel like the existing algorithms should be related to the ones kids approximate if that kind of link is there.

A few other thoughts: 

(1) Heinz points out the interesting dichotomy between tone maps and segment maps, where the tone maps allow more complex relationships. He mentions that this has been used to argue for modularity (where tones are in one module and segments are in another, presumably), and that could very well be. What it also shows is that there isn’t just one restriction on complexity in general — a more restrictive one holds for segment maps and a less restrictive one for tone maps. Why? Two thoughts: (1) Maybe the less restrictive one is the general abstract restriction, and something special happens for segments that further restricts it. This fits with the modularity explanation above. But (2) maybe it’s just chance that we haven’t found segment maps that violate the stricter restriction. If so, we wouldn’t need the modularity explanation, since the difference between segment maps and tone maps would be, in effect, a sampling error (more samples, if we had them, would show segment maps that don’t obey the extra restriction). Caveat: I’m not sure how plausible this second idea is, given how many segment maps we have access to.

(2) I’m still not sure how much faith I have in the artificial language learning experiments that are meant to show that humans can’t learn certain types of generalizations/rules/mappings. I definitely believe that the subjects struggled to learn certain ones in the experiment while finding others easy to learn. But how much of that is effectively an L1 transfer effect? That is, the easy-to-learn ones are the ones in your native language, so (abstractly) you already have a bunch of experience with those and no experience with the hard-to-learn kind. To be fair, I’m not sure how you could factor out the L1 transfer effect — no matter what you do with adults (or even kids), if it’s a language thing, they’ve already had exposure from their native language.

(3) Something for NLP applications (maybe): Section 6.4 notes, “The simplest maps are Markovian on the input or the output (ISL, LOSL, and ROSL), and very many phonological transformations belong to these classes.” This makes me think that the simpler representations NLP applications tend to use for speech recognition and production (e.g., various forms of Hidden Markov Models, I think) may not be so far off from the truth, if this approach is correct.
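As a toy illustration of that connection (my own sketch, not from the chapter or any actual NLP system): a bigram model — the simplest Markovian model — scores a string using only adjacent pairs, which is exactly the locality an SL-2 grammar enforces. Any string containing an unattested adjacent pair gets probability zero, mirroring a strictly local ban.

```python
# A minimal sketch of why Markovian models fit strictly local patterns:
# a bigram (first-order Markov) model conditions only on the previous
# symbol, so its zero-probability strings are exactly those containing
# a locally banned (here: unattested) adjacent pair. Corpus is made up.
from collections import defaultdict

def train_bigram(corpus):
    """Estimate P(next | current) from boundary-padded training strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for word in corpus:
        padded = "#" + word + "#"
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def prob(model, word):
    """Probability of a word; 0.0 if it contains an unattested bigram."""
    p = 1.0
    padded = "#" + word + "#"
    for a, b in zip(padded, padded[1:]):
        p *= model.get(a, {}).get(b, 0.0)
    return p

model = train_bigram(["aba", "ab", "ba"])
# "aa" never occurred as an adjacent pair, so any word containing it
# gets probability 0 — the probabilistic analogue of an SL-2 *aa ban.
```

A real HMM adds hidden states on top of this, but the bounded-history character is the same, which is presumably why such models do as well as they do on phonological material.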

Wednesday, April 15, 2015

Some thoughts on Heinz 2015 book chapter, parts 1-5

For me, this was a very accessible introduction to a lot of the formal terminology and distinctions that computational learnability research trades in. (For instance, I think this may be the first time I really understood why we would be excited that generalizations would be strictly local or strictly piecewise.) From an acquisition point of view, I was very into some particular ideas/approaches:

(1) the distinction between an intensional description (i.e., theoretical constructs that compactly capture the data) and an extension (i.e., the actual pattern of data), along with the analogy to the finite means (intensional description) that accounts for the infinite use (extension). If there’s a reasonable way to talk about the extension, we get a true atheoretical description of the empirical data, which seems like an excellent jumping off point for describing the target state of acquisition.

(2) the approach of defining the implicit hypothesis space, i.e., the fundamental pieces that explicit hypothesis spaces (or generalizations) are built from. This feels very similar to the old-school Principles & Parameters approach to language acquisition (specifically, the Principles part, if we’re talking about the things that don’t vary). It also jibes well with some recent thoughts in the Bayesian inference sphere (e.g., see Perfors 2012 on implicit vs. explicit hypothesis spaces).

Perfors, A. 2012. Bayesian Models of Cognition: What's Built in After All? Philosophy Compass, 7(2), 127–138.

(3) that tie-in between the nature of phonological generalizations, the algorithms that can learn those generalizations, and why this might support those generalizations as actual human mental representations. In particular, “Constraints on phonological well-formedness are SL and SP because people learn phonology in the way suggested by these algorithms.” (End of section 5.2.1)

When I first read this, it seemed odd to me — we’re saying something like: “Look! Human language makes only these kinds of generalizations, because there are constraints! And hey, these are the algorithms that can learn those constrained generalizations! Therefore, the reason these constraints exist is because these algorithms are the ones people use!” It felt as if a step were missing at first glance: we use the constrained generalizations as a basis for positing certain learning algorithms, and then we turn that on its head immediately and say that those algorithms *are* the ones humans use and that’s the basis for the constrained generalizations we see. 

But when I looked at it again (and again), I realized that this did actually make sense to me. The way we got to the story may have been a little roundabout, but the basic story of “these constraints on representations exist because human brains learn things in a specific way” is very sensible (and it’s picked up again in 5.4: “…human learners generalize in particular ways—and the ways they generalize yield exactly these classes”). And what this does is provide a concrete example of exactly which constraints and which specific learning procedures we’re talking about for phonology.

(4) There’s a nice little typology connection at the end of section 5.1, based on these formal complexity ontologies: “…the widely-attested constraints are the formally simple ones, where the measure of complexity is determined according to these hierarchies”. Thinking back to links with acquisition, would this be because the human brain is sensitive to the complexity levels (however that might be instantiated)? If so, the prevalence of less complex constraints is due to how easy they are to learn with a human brain. (Or any brain?)
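Circling back to (1) and (3): to fix the ideas for myself, here’s a toy Python sketch (my own simplification, not the chapter’s actual algorithm) of string-extension-style learning for an SL-2 grammar. The learner just records which adjacent pairs occur in the positive data; the resulting finite set of attested bigrams is the intensional description, and the infinitely many strings it licenses are the extension.

```python
# Toy string-extension-style learner for a strictly 2-local (SL-2) grammar:
# record the attested adjacent pairs (with word boundaries) from positive
# data; a new word is well-formed iff all of its adjacent pairs are attested.
# The finite bigram set is the "finite means"; the licensed string set is
# the "infinite use".

def learn_sl2(positive_data):
    """Collect the attested bigrams from boundary-padded positive examples."""
    grammar = set()
    for word in positive_data:
        padded = "#" + word + "#"
        grammar.update(zip(padded, padded[1:]))
    return grammar

def generates(grammar, word):
    """A word is well-formed iff every adjacent pair is attested."""
    padded = "#" + word + "#"
    return all(pair in grammar for pair in zip(padded, padded[1:]))

g = learn_sl2(["aba", "bab"])
print(generates(g, "ababa"))  # True: generalizes beyond the finite data
print(generates(g, "aab"))    # False: 'aa' was never attested
```

What I find striking is how trivially this learner generalizes correctly: because the hypothesis space is restricted to SL-2 grammars in advance, simply memorizing local pieces of the data is already the right kind of generalization — which is presumably the force of the claim that the constraints exist because learners generalize this way.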

Friday, April 3, 2015

Next time on 4/17/15 @ 12pm in SBSG 2221 = Heinz 2015 book chapter, parts 1-5

It looks like a good collective time to meet this quarter will be Fridays at 12pm, so that's what we'll plan on. Our first meeting will be on April 17, and our complete schedule is available on the webpage.

On April 17, we'll be discussing the first part of a book chapter (parts 1-5) on computational phonology that focuses on the kinds of generalizations and computations that seem to occur in this linguistic domain. This can be very useful for us to think about as modelers if we want to understand the hypothesis spaces learners have for making phonological generalizations.

Heinz, J. 2015 Manuscript. The computational nature of phonological generalizations. University of Delaware. Please do not cite without permission from Jeff Heinz.

See you on April 17!