That’s What She Said: Double Entendre Identification

Abstract:

Humor identification is a hard natural language understanding problem. We identify a subproblem — the “that’s what she said” problem — with two distinguishing characteristics: (1) use of nouns that are euphemisms for sexually explicit nouns and (2) structure common in the erotic domain. We address this problem in a classification approach that includes features that model those two characteristics. Experiments on web data demonstrate that our approach improves precision by 12% over baseline techniques that use only word-based features.

1 Introduction:

[...] To our knowledge, related research has not studied the task of identifying double entendres in text or speech. The task is complex and would require both deep semantic and cultural understanding to recognize the vast array of double entendres. We focus on a subtask of double entendre identification: TWSS recognition. We say a sentence is a TWSS if it is funny to follow that sentence with “that’s what she said”. We frame the problem of TWSS recognition as a type of metaphor identification.

We define three functions to measure how closely related a noun, an adjective, and a verb phrase are to the erotica domain.

1. The noun sexiness function NS(n) is a real-valued measure of the maximum similarity a noun n ∉ SN has to each of the nouns ∈ SN−. For each noun, let the adjective count vector be the vector of the absolute frequencies of each adjective that modifies the noun in the union of the erotica and the Brown corpora. We define NS(n) to be the maximum cosine similarity, over each noun ∈ SN−, using term frequency-inverse document frequency (tf-idf) weights of the nouns’ adjective count vectors. [...] Example nouns with high NS are “rod” and “meat”.

2. The adjective sexiness function AS(a) is a real-valued measure of how likely an adjective a is to modify a noun ∈ SN. We define AS(a) to be the relative frequency of a in sentences in the erotica corpus that contain at least one noun ∈ SN. Example adjectives with high AS are “hot” and “wet”.
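
For readers who want to see the two definitions above in concrete form, here is a minimal Python sketch. It is not the paper's code. The function names, the input shapes (a dict of per-noun adjective Counters, tokenized sentences), and the specific tf-idf variant (raw count times log(N/df)) are all assumptions made for illustration. The excerpt also does not say how "relative frequency" is normalized, so the sketch assumes it means occurrences of the adjective divided by the total token count of the qualifying sentences.

```python
import math
from collections import Counter


def tfidf_vectors(adj_counts):
    """Turn raw adjective counts into tf-idf weighted vectors.

    adj_counts: dict mapping noun -> Counter of adjective -> count.
    Assumed weighting: tf * log(N / df), where each noun is one "document".
    """
    n_nouns = len(adj_counts)
    df = Counter()  # number of nouns each adjective modifies at least once
    for counts in adj_counts.values():
        df.update(counts.keys())
    return {
        noun: {adj: tf * math.log(n_nouns / df[adj]) for adj, tf in counts.items()}
        for noun, counts in adj_counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def noun_sexiness(noun, target_nouns, vectors):
    """NS(n): maximum cosine similarity between n and any noun in SN-."""
    return max(cosine(vectors[noun], vectors[t]) for t in target_nouns)


def adjective_sexiness(adj, sentences, explicit_nouns):
    """AS(a): relative frequency of adj in sentences containing a noun in SN.

    sentences: list of token lists from the erotica corpus.
    Assumed normalization: count of adj / total tokens in those sentences.
    """
    relevant = [s for s in sentences if explicit_nouns & set(s)]
    total = sum(len(s) for s in relevant)
    if total == 0:
        return 0.0
    return sum(s.count(adj) for s in relevant) / total
```

To use it, build the vectors once with `tfidf_vectors` over every noun of interest (the SN− nouns included), then call `noun_sexiness` for each candidate noun. A real pipeline would also need a parser or tagger to decide which adjective modifies which noun, and this sketch leaves that step out.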

9 Responses:

  1. TJIC says:

    > The noun sexiness function

    My quest for the perfect band name is finally over.

  2. Morrisa says:

    This evening I said "Mucus, but you barely know us" to Miranda, who is trying very, very hard to acquire humor as fast as her bright, seven-year-old brain possibly can. She cocked her head quizzically and replied, "No, that one isn't good. Myook is nonsense, not a real verb. 'Poker face? But you barely know 'er!' is a much better one. Keep trying, though, mom. Those are funny, when they work."

    I have created a monster.

  3. Jason! says:

    When I read: "In this paper, we assume all test instances are from nonerotic domains and leave the classification of erotic and nonerotic contexts to future work."

    I thought: "But that's the really hard part!"

    And then I thought... well, yeah, you know what I thought.

  4. Fluff says:

    Also discussed on HN & Quora today: http://news.ycombinator.com/item?id=2491487 - although I couldn't find this PDF linked from there!

  5. phuzz says:

    I'm glad someone is studying this, after all, we wouldn't want our future machine overlords to not pick up on our innuendos would we?

    • Editer says:

      Heh, you said "in you end-o".

    • Art Delano says:

      So much for being able to sneak coded information past Skynet. Our only hope now is embedding messages within logical paradoxes and hope that if it attempts to parse them it will shake, emit smoke and explode.

      • DFB says:

        Trying to engender contradictions in information technology equipment is what did in the dinosaurs. We need to port Asimov's third law to Javascript.

  6. phuzz says:

    Well, at least we'll be able to flirt with our robots.
    er, hypothetically?