Thursday, January 13, 2022

Some thoughts on Hu et al. 2021

It’s a nice change of pace for me to take a look at pragmatic modeling work more from the engineering/NLP side of the world (rather than the purely cognitive side), as I think this paper does. That said, I wonder if some of the specific techniques used here, such as the training of the initial context-free lexicon, might be useful for thinking about how humans represent meaning (especially meaning that feeds into pragmatic reasoning).


I admit, I also would have benefited from the authors having more space to explain their approach in a few places (more on this below). For instance, I get the intuition of self-supervised vs. regular supervised learning, but the specific implementation of the self-supervised approach (in particular, why it counts as self-supervised) was a little hard for me to follow.


Specific thoughts:

(1) H&al2021 describe a two-step learning process, where the first step is learning a lexicon without “contextual supervision”. It sounds like this is a “context-free” lexicon, like the L0 level of RSA, which typically involves only the semantic representation. Though, to be honest, I do wonder how “context-free” the basic semantic representations actually are (e.g., they may incorporate the linguistic contexts words appear in). But I suppose the main distinction is that no intentions or social information are involved.
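To anchor my own intuition, here’s a minimal sketch of what a context-free L0 could look like in standard RSA terms (the toy lexicon, referent set, and names are all my own invention, not from H&al2021):

```python
import numpy as np

# Toy "context-free" lexicon: rows = utterances, columns = candidate referents.
# Entries encode semantic fit only -- no speaker intentions, no social context.
LEXICON = np.array([
    # ref0  ref1  ref2
    [1.0,   1.0,  0.0],   # "blue"
    [0.0,   1.0,  1.0],   # "dark"
    [0.0,   0.0,  1.0],   # "teal"
])
PRIOR = np.ones(3) / 3    # uniform prior over referents

def literal_listener(lexicon, prior):
    """L0(m | u) proportional to [[u]](m) * P(m): semantics plus a prior, nothing pragmatic."""
    scores = lexicon * prior                        # broadcast prior over referent columns
    return scores / scores.sum(axis=1, keepdims=True)

L0 = literal_listener(LEXICON, PRIOR)               # each row is a distribution over referents
```

The point of the sketch is just that nothing in this object depends on who is speaking to whom, or why.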


The second step is to learn “pragmatic policies” by optimizing an appropriate objective function without “human supervision”. I initially took this to mean unsupervised learning, but then H&al2021 clarified (e.g., in section 3) that they instead meant that certain types of information provided by humans aren’t included during training, which is useful from an engineering perspective because that kind of data can be costly to get. And so the learning gets the label “self-supervised”, from the standpoint of that withheld information.
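If that reading is right, the second step could look something like the standard RSA recursion run on top of the trained L0, where the only “supervision” comes from the agents’ own outputs rather than from additional human labels. A hedged sketch continuing the toy example above (alpha, the cost term, and these function names are my placeholders, not the paper’s actual objective):

```python
def pragmatic_speaker(L0, alpha=1.0, cost=None):
    """S1(u | m) proportional to exp(alpha * log L0(m | u) - cost(u)).
    The training signal here is L0's own behavior -- no new human data."""
    if cost is None:
        cost = np.zeros(L0.shape[0])
    utility = alpha * np.log(L0 + 1e-12) - cost[:, None]
    scores = np.exp(utility)
    return scores / scores.sum(axis=0, keepdims=True)    # normalize over utterances

def pragmatic_listener(S, prior):
    """L1(m | u) proportional to S(u | m) * P(m)."""
    scores = S * prior
    return scores / scores.sum(axis=1, keepdims=True)     # normalize over referents

S1 = pragmatic_speaker(L0)            # uses L0 and PRIOR from the sketch above
L1 = pragmatic_listener(S1, PRIOR)
```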


(2) Section 4.3, on the self-supervised learning (SSL) pragmatic agents.


For the AM model that the RSA implementations use, H&al2021 say they train the base-level agents with full contextual supervision and then “enrich” them with subsequent AM steps. I think I need this unpacked more. I follow what it means to train agents with full contextual supervision: in particular, include the contexts provided by the color triples. But I don’t understand what enriching the agents with AM steps afterwards means. How is that separate/different from the initial training process? Is the initial training not done via AM optimization? For the GD model, we see a similar process, with pragmatic enrichment done via GD steps rather than AM steps. It seems important to understand this, since the distinction is what gets the approach classified as self-supervised rather than fully supervised.
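Here’s my best guess at the distinction, sketched very loosely (this is my interpretation, not the paper’s actual procedure): the base agents get fit to data once, and the “enrichment” then alternates best-response updates between speaker and listener with no new data entering the loop, whereas the GD variant would instead take gradient steps on some internal objective. Reusing the toy functions from the sketches above:

```python
def am_enrich(L_base, prior, alpha=1.0, n_steps=5):
    """A guess at 'enrichment': alternate closed-form best responses between
    speaker and listener, starting from an already-trained base listener.
    No human-labeled data is consumed inside this loop -- which is presumably
    what earns the 'self-supervised' label."""
    L = L_base
    for _ in range(n_steps):
        S = pragmatic_speaker(L, alpha=alpha)    # speaker best-responds to current listener
        L = pragmatic_listener(S, prior)         # listener best-responds to new speaker
    return S, L

S_enriched, L_enriched = am_enrich(L0, PRIOR)
```

If that picture is roughly right, then the “initial training” is ordinary supervised fitting and the AM/GD steps are a separate post-hoc optimization; but that’s exactly the part I’d want the paper to spell out.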


(3) For the GD approach, the listener model can train an utterance encoder and a color context encoder. But why wouldn’t a listener use decoders, since listening can intuitively be thought of as decoding? I guess decoding is just the inverse of encoding, so maybe it’s translatable?
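My guess (and it is only a guess) is that no decoder is needed because the listener’s job in this task is to pick one of a small set of candidate colors, not to generate anything: two encoders plus a similarity score already yield a distribution over referents. A hypothetical sketch of that kind of architecture, not the paper’s actual model:

```python
import torch
import torch.nn as nn

class EncoderOnlyListener(nn.Module):
    """Hypothetical listener built entirely from encoders: embed the utterance
    and each candidate color, then score candidates by similarity."""
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.utt_emb = nn.Embedding(vocab_size, hidden)   # toy utterance encoder (mean of token embeddings)
        self.color_enc = nn.Linear(3, hidden)             # encode a color given as an (R, G, B) triple

    def forward(self, utt_ids, colors):
        u = self.utt_emb(utt_ids).mean(dim=0)      # (hidden,) summary of the utterance
        c = self.color_enc(colors)                 # (num_candidates, hidden)
        scores = c @ u                             # one similarity score per candidate
        return torch.log_softmax(scores, dim=0)    # log P(referent | utterance, context)

listener = EncoderOnlyListener(vocab_size=100)
log_probs = listener(torch.tensor([5, 17]), torch.rand(3, 3))   # 2-token utterance, 3 candidate colors
```

In that sense, “decoding” the speaker’s intent reduces to classification over the candidates in the context, which is why encoders alone suffice.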


(4) I think I’m unclear on what “ground truth” is in Figure 2a, and why we’re interested in it if humans themselves sometimes don’t match it. I would have thought the ground truth would be what humans do for this kind of pragmatic language use.
