Monday, February 29, 2016

Some thoughts on Goldberg & Boyd 2015

I definitely appreciated G&B2015’s clarification of how precisely statistical preemption and categorization are meant to work for learning about a-adjectives (or at least, one concrete implementation of it). In particular, statistical preemption is likened to blocking, which means the learner needs to have an explicit set of alternatives over which to form expectations. For a-adjectives, the relevant alternatives could be something like “the sleeping boy” vs. “the asleep boy”. If both are possible, then “the asleep boy” should appear sometimes (i.e., with some probability). When it doesn’t, this is because it’s blocked. Clearly, we could easily implement this with Bayesian inference (or, as G&B2015 point out themselves, with simple error-driven learning), provided we have the right hypothesis space.

For example, H1 = only “the sleeping boy” is allowed, while H2 = “the sleeping boy” and “the asleep boy” are both allowed. H1 will win over H2 in a very short amount of time as long as children hear lots of non-a-adjective equivalents (like "sleeping") in this syntactic construction. The real trick is making sure these are the hypotheses under consideration. For example, there seems to be another reasonable way to think about the hypothesis space, based on the relative clause vs. attributive syntactic usage. H1 = only “the boy who is asleep”; H2 = “the asleep boy” and “the boy who is asleep”. Here, we really need to see instances of relative-clause usage to drive us towards H1.
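
To make the first H1 vs. H2 comparison concrete, here is a minimal sketch of the Bayesian version. It is my own toy illustration, not anything from G&B2015: the function name, the equal prior, and the assumption that H2 would produce “the asleep boy” half the time are all made up for the example. The learner hears n attributive uses, every one of them the non-a-adjective form, and we ask how much belief ends up on H1.

```python
import math

def posterior_h1(n_observations, p_asleep_under_h2=0.5, prior_h1=0.5):
    """Posterior probability of H1 after hearing n attributive uses,
    all of them the non-a-adjective form ("the sleeping boy").

    Hypothetical parameters (assumptions for illustration only):
      p_asleep_under_h2: how often H2 expects "the asleep boy" (0 < p < 1)
      prior_h1: prior belief in H1 (0 < prior < 1)
    """
    # H1 allows only "the sleeping boy", so each observation has probability 1.
    log_like_h1 = 0.0
    # H2 allows both forms, so each "sleeping" observation has probability 1 - p.
    log_like_h2 = n_observations * math.log(1.0 - p_asleep_under_h2)

    # Unnormalized log posteriors: log prior + log likelihood.
    log_post_h1 = math.log(prior_h1) + log_like_h1
    log_post_h2 = math.log(1.0 - prior_h1) + log_like_h2

    # Normalize in a numerically stable way.
    m = max(log_post_h1, log_post_h2)
    z = math.exp(log_post_h1 - m) + math.exp(log_post_h2 - m)
    return math.exp(log_post_h1 - m) / z

if __name__ == "__main__":
    for n in (0, 1, 5, 10, 20):
        print(n, round(posterior_h1(n), 4))
```

Under these assumed settings, the belief in H1 climbs quickly as the unexpected absences pile up, which is the blocking intuition. The same code would serve for the relative-clause hypothesis space; only what counts as an observation changes.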

It makes me think about the more general issue of determining the hypothesis space that statistical preemption (or Bayesian inference, etc.) is supposed to operate over. G&B2015 explicitly note this themselves in the beginning of section 5, and talk more about hypothesis space construction in 5.2. For the a-adjective learning story G&B2015 promote, I would think some sort of recognition of the semantic similarity of words and the syntactic environments is the basis of the hypothesis space generation.

Some other thoughts:
(1) Section 1: I thought it was an interesting point about “afraid” being sucked into the a-adjective class even though it lacks the morphological property (aspectual “a-” prefix + free morpheme, the way we see with “asleep”, “ablaze”, “alone”, etc.). This is presumably because the relevant distributional properties categorize it with the other a-adjectives? (That is, it’s “close enough”, given the other properties it has.)

(2) Section 2: Just as a note about the description of the experimental tasks, I wonder why they didn’t use novel a-adjectives that matched the morphological segmentation properties that the real a-adjectives and alternatives have, i.e., asleep and sleepy, so ablim and blimmy (instead of chammy).

(3) Section 3: G&B2015 note that Yang’s survey of child-directed speech didn’t find a-adjectives being used in relative clauses (i.e., the relevant syntactic distribution cue). So, this is a problem if you think you need to see relative clause usage to learn something about a-adjectives. But, as mentioned above (and also in Yang 2015), I think that’s only one way to learn about them. There are other options, based on semantic equivalents (“sleeping”, “sleepy”, etc. vs. “asleep”) or similarity to other linguistic categories (e.g., the Yang 2015 approach with locative particles).

(4) Section 4: I really appreciate the explicit discussion of how the distributional similarity-based classification would need to work for the locative particles-strategy to pan out (i.e., Table 1). It’s the next logical step once we have Yang’s proposal about using locative particles in the first place.

(5) Section 4: I admit a bit of trepidation about the conclusion that the available distributional evidence for locative particles is insufficient to lump them together with a-adjectives. It’s the sort of thing where we have to remember that children are learning a system of knowledge, and so while the “right”-type adverb modification may not be a slam dunk for distinguishing a-adjectives from non-a-adjectives, I do wonder if the collection of syntactic distribution properties (e.g., probability of coordination with PPs, etc.) would cause children to lump a-adjectives together with locative particles and prepositional phrases and, importantly, not with non-a-adjectives. Or perhaps, more generally, the distributional information might cause children to just separate out a-adjectives, and note that they have some overlap with locative particles/PPs and also with regular non-a-adjectives.

Side note: This is the sort of thing ideal learner models are fantastic at telling us: is the information sufficient to draw conclusion x? In this case, the conclusion would be that non-a-adjectives go together, given the various syntactic distribution cues available. G&B2015 touch on this kind of model at the beginning of section 5.2, mentioning the Perfors et al. 2010 work.


(6) Section 5: I was delighted to see the Hao (2015) study, which gets us the developmental trajectory for a-adjective categorization (or at least, how a-adjectives project onto syntactic distribution). Ten years old is really old for most acquisition stuff. So, this accords with the evidence being pretty scanty (or at least, children taking a while until they can recognize that the evidence is there, and then make use of it).
