I really like the approach this paper takes, where insights from humor research are operationalized using existing formal metrics from language understanding. It’s the kind of approach I think is very fruitful for computational research because it demonstrates the utility of bothering to formalize things — in this case, the outcome is surprisingly good prediction of degrees of funniness and greater explanatory power about why exactly something is funny.
As an organizational note, I love the tactic the authors take here of basically saying “we’ll explain the details of this bit in the next section”, which is an implicit way of saying “here’s the intuition, and feel free to skip the details in the next section if you’re not as interested in the implementation.” For me, one big challenge of writing up modeling results of this kind is deciding how much detail to include when you’re explaining how everything works. It’s tricky because of how wide an audience you’re attempting to interest. Put too little computational detail in and everyone’s irritated at your hand-waving; put too much in and readers get derailed from your main point. So this presentation style may be a format worth trying.
A few more specific thoughts:
(1) I mostly followed the details of the ambiguity and distinctiveness calculations (ambiguity is about the entropy of the meaning distribution; distinctiveness is about the KL divergence between the distributions of the indicator variables f for each meaning). However, I think it’s worth pondering more carefully how the part described at the end of Section 2.2, which feeds into the (R)elatedness calculation, works. If we’re getting relatedness scores between pairs of words, R(w_i, h), where h is the homophone and w_i is another word from the utterance, then how do we combine those to get the R(w_i, m) that shows up in Equation 8? For example, where does the free parameter r (which was empirically fit to people’s funniness judgments and captures a word’s own relatedness) show up?
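To make the two quantities concrete, here is a minimal sketch of entropy-based ambiguity and KL-based distinctiveness. The numbers and the symmetrized-KL choice are my own toy assumptions for illustration, not values or exact formulas from the paper:

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a distribution over meanings: ambiguity."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl(p, q):
    """KL divergence D(p || q) between two distributions over context words."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy posterior over the homophone's two meanings given the sentence
# (hypothetical numbers, not from the paper).
meaning_posterior = [0.6, 0.4]
ambiguity = entropy(meaning_posterior)

# Hypothetical normalized distributions of which context words support each
# meaning (loosely, the indicator variables f, averaged and normalized).
support_m1 = [0.7, 0.2, 0.1]
support_m2 = [0.1, 0.3, 0.6]

# One way to get a symmetric distinctiveness score: average the two KLs.
distinctiveness = 0.5 * (kl(support_m1, support_m2) + kl(support_m2, support_m1))
```

A balanced posterior (high entropy) plus sharply non-overlapping support distributions (high KL) is exactly the combination the model predicts to be funniest.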
(2) I really like that this model is able to pull out exactly which words correspond to which meanings. This feature reminds me of topic models, where each word is generated by a specific topic. Here, a word is generated by a specific meaning (or at least, I think that’s what Figure 1 shows, with the idea that m could be a variety of meanings).
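The topic-model analogy can be sketched very simply: each word gets credited to whichever latent meaning makes it most likely. The word-likelihood table below is entirely invented for illustration, not the paper’s actual distributions:

```python
# Hypothetical meaning-conditional word probabilities, analogous to a topic
# model's per-topic word distributions (toy numbers).
likelihood = {
    "hair":   {"bald": 0.030, "bold": 0.001},
    "daring": {"bald": 0.001, "bold": 0.025},
    "the":    {"bald": 0.050, "bold": 0.050},
}

def assign_meanings(words, likelihood):
    """Map each word to its most likely generating meaning (argmax)."""
    return {w: max(likelihood[w], key=likelihood[w].get) for w in words}

assignments = assign_meanings(["hair", "daring"], likelihood)
# "hair" is credited to the "bald" meaning, "daring" to the "bold" one.
```

In the actual model this assignment presumably falls out of the posterior over the indicator variables f rather than a hard argmax, but the per-word attribution is the same idea.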
(3) I always find it funny in computational linguistics research that the general language statistics portion (here, in the generative model) can be captured effectively by a trigram model. The linguist in me revolts, but the engineer in me shrugs and thinks if it works, then it’s clearly good enough. The pun classification results here are simply another example of why computational linguistics often uses trigram models to approximate language structure and pretty much finds them adequate for most purposes. Maybe for puns (and other joke types) that involve structural ambiguity rather than phonological ambiguity, we’d need something more than trigrams, though.
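For readers who haven’t built one, a trigram model really is this simple — count continuations of two-word contexts and normalize. This is a generic maximum-likelihood sketch with an invented two-sentence corpus, not the paper’s training setup (which would also need smoothing for unseen contexts):

```python
from collections import defaultdict

def train_trigrams(corpus):
    """Count trigram continuations: (w1, w2) -> {w3: count}."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in corpus:
        toks = ["<s>", "<s>"] + sent + ["</s>"]  # pad so every word has a context
        for a, b, c in zip(toks, toks[1:], toks[2:]):
            counts[(a, b)][c] += 1
    return counts

def trigram_prob(counts, a, b, c):
    """MLE estimate of P(c | a, b); 0 if the context was never seen."""
    ctx = counts.get((a, b), {})
    total = sum(ctx.values())
    return ctx.get(c, 0) / total if total else 0.0

# Toy corpus (invented sentences, not the paper's data).
corpus = [["the", "magician", "got", "so", "mad"],
          ["the", "magician", "pulled", "a", "trick"]]
counts = train_trigrams(corpus)
p = trigram_prob(counts, "the", "magician", "got")  # 0.5 in this toy corpus
```

The structural-ambiguity point stands: a model conditioning on only the two preceding words has no way to represent an attachment or bracketing ambiguity that spans a whole clause.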