NPR Puzzle for August 10, 2014: solving with Python!

August 14, 2014 by Alex | 8 Comments

Here’s this past week’s NPR puzzle:

Name a well-known movie of the past — two words, seven letters in total. These seven letters can be rearranged to spell the name of an animal plus the sound it makes. What animal is it?

Brute-forcing this puzzle is easy but ineffective. We can get a list of movies from IMDB and see which of the seven-letter ones can anagram into two words. But you don’t want to do this, because it will force you to sift through a ton of results. Can we do better? (Yes, we can.)

We don’t want to use *all* movies, and *all* words if we can avoid it. Can we get a list of “well-known” movies, animals, or animal sounds? The first is the easiest; instead of downloading the movies.list file from the IMDB link above, download the ratings.list file and only consider movies that have above a certain threshold of votes. I actually rated the movies on a scale of 1-100 using this method, so that the cutoffs make more sense. Now, how about the list of animals and animal sounds? Well, thanks to Python/NLTK and a terrific hint from Stack Overflow, we can actually get the members of any category. Check this out:

from nltk.corpus import wordnet as wn
def get_category_members(name):
    members = set()
    synsets = wn.synsets(name)
    for synset in synsets:
        members = members.union(set([w for s in synset.closure(lambda s:s.hyponyms(),depth=10) for w in s.lemma_names]))
    return members

Easy, right? If you type get_category_members('animal') it will give you everything that Wordnet thinks is an animal.

Animal sounds are a little harder to come by. Typing get_category_members('animal_sounds') gets you the empty set, so we have to be a little more creative. Let’s check out the hypernyms for a few animal sounds to give us an idea of what to do.

sounds = ('meow','moo','baa')
for s in sounds:
    syn = wn.synsets(s)
    for x in syn:
        print x, x.hypernyms()

Synset('meow.n.01') [Synset('cry.n.05')]
Synset('meow.v.01') [Synset('utter.v.02')]
Synset('moo.n.01') [Synset('cry.n.05')]
Synset('moo.v.01') [Synset('utter.v.02')]
Synset('baa.n.01') [Synset('cry.n.05')]
Synset('bleat.v.02') [Synset('utter.v.02')]

Now we’re on to something! Each of these of course can be a noun or a verb, but they all fall under the umbrella of “cry” or “utter.” So if we take category members of those two words — well, it’ll give us too much, but that’s okay. At least we should have all the relevant animal sounds in there.

Put it all together and you get something like this:

import re
from nltk.corpus import wordnet as wn

def is_good_movie(m):
    '''
    Count a movie as "good" if it only has 7 letters and one space
    '''
    m2 = re.sub(r'[^A-Za-z]+','',m)
    if len(m2) == 7 and m.count(' ') == 1:
        return True
    else:
        return False
        
def get_category_members(name):
    '''
    Use NLTK to get members of a category
    '''
    members = set()
    synsets = wn.synsets(name)
    for synset in synsets:
        members = members.union(set([w for s in synset.closure(lambda s:s.hyponyms(),depth=10) for w in s.lemma_names]))
    return members

def w1_in_w2(w1,w2):
    '''
    Return leftover letters if all the letters from w1 are in w2
    Return False otherwise
    '''
    for c in w1:
        if c not in w2:
            return False
        else:
            pos = w2.find(c)
            w2 = w2[:pos] + w2[pos+1:]
    return w2

# Get a set of movies
movies = []
with open('RankedMovies.txt','rb') as fid:
    for x in fid.readlines():
        x = x.rstrip('\r\n')
        movie, score = x.split('\t')
        score = int(score)
        if score >= 90:
            movie = movie[:-7] # Remove the year
            if is_good_movie(movie):
                movies.append(movie)

# Let's get a list of things that might be sounds
sounds_tmp = set()
for x in ('utter','cry'):
    sounds_tmp = sounds_tmp.union(get_category_members(x))
sounds = frozenset([z for z in sounds_tmp if len(z) < 7])
# Dictionary to help find anagrams
sorted_to_word = dict()
for word in sounds:
    s = ''.join(sorted(word))
    try:
        sorted_to_word[s].append(word)
    except KeyError:
        sorted_to_word[s] = [word]

# Get a set of animals
animals_tmp = get_category_members('animal')
animals = frozenset([y for y in animals_tmp if len(y) < 7])

# Go through and give results
for movie in movies:
    for animal in animals:
        leftover = w1_in_w2(animal,movie.lower())
        if leftover:
            l = leftover.replace(' ','')
            s = ''.join(sorted(l))    
            try:
                new_words = sorted_to_word[s]
            except KeyError:
                new_words = [] 
            for n in new_words:
                print "%s => %s %s" % (movie, animal, n)

Which produces the following result:

Das Boot => toad sob
Red Dawn => wren add
Red Dawn => wren add
Big Stan => bat sign
Big Stan => bat sing
La Bamba => lamb baa
Rio Lobo => loir boo
Dear God => dog read
Rapa Nui => arui pan
Grey Owl => grey low

So there you have it. In about a second you get only 10 results and it’s easy to pick out the right one. I’m stoked about this new method to find members of a category; I sure wish I had it when I was working on the challenge to name a tree whose letters can be rearranged to spell two herbs or spices. It took me forever to get good lists for trees and herbs and spices, but with this approach the problem is completely trivialized.

Box Relatives

Thoughts about puzzles, math, coding, and miscellaneous

NPR Puzzle for August 10, 2014: solving with Python!

8 Comments

Leave a Reply Cancel reply