Here’s this past week’s NPR puzzle:
Name a well-known movie of the past — two words, seven letters in total. These seven letters can be rearranged to spell the name of an animal plus the sound it makes. What animal is it?
Brute-forcing this puzzle is easy but ineffective. We can get a list of movies from IMDB and see which of the seven-letter ones can anagram into two words. But you don’t want to do this, because it will force you to sift through a ton of results. Can we do better? (Yes, we can.)
We don’t want to use *all* movies, and *all* words if we can avoid it. Can we get a list of “well-known” movies, animals, or animal sounds? The first is the easiest; instead of downloading the movies.list file from the IMDB link above, download the ratings.list file and only consider movies that have above a certain threshold of votes. I actually rated the movies on a scale of 1-100 using this method, so that the cutoffs make more sense. Now, how about the list of animals and animal sounds? Well, thanks to Python/NLTK and a terrific hint from Stack Overflow, we can actually get the members of any category. Check this out:
from nltk.corpus import wordnet as wn def get_category_members(name): members = set() synsets = wn.synsets(name) for synset in synsets: members = members.union(set([w for s in synset.closure(lambda s:s.hyponyms(),depth=10) for w in s.lemma_names])) return members
Easy, right? If you type get_category_members('animal')
it will give you everything that Wordnet thinks is an animal.
Animal sounds are a little harder to come by. Typing get_category_members('animal_sounds')
gets you the empty set, so we have to be a little more creative. Let’s check out the hypernyms for a few animal sounds to give us an idea of what to do.
sounds = ('meow','moo','baa') for s in sounds: syn = wn.synsets(s) for x in syn: print x, x.hypernyms()
Synset('meow.n.01') [Synset('cry.n.05')] Synset('meow.v.01') [Synset('utter.v.02')] Synset('moo.n.01') [Synset('cry.n.05')] Synset('moo.v.01') [Synset('utter.v.02')] Synset('baa.n.01') [Synset('cry.n.05')] Synset('bleat.v.02') [Synset('utter.v.02')]
Now we’re on to something! Each of these of course can be a noun or a verb, but they all fall under the umbrella of “cry” or “utter.” So if we take category members of those two words — well, it’ll give us too much, but that’s okay. At least we should have all the relevant animal sounds in there.
Put it all together and you get something like this:
import re from nltk.corpus import wordnet as wn def is_good_movie(m): ''' Count a movie as "good" if it only has 7 letters and one space ''' m2 = re.sub(r'[^A-Za-z]+','',m) if len(m2) == 7 and m.count(' ') == 1: return True else: return False def get_category_members(name): ''' Use NLTK to get members of a category ''' members = set() synsets = wn.synsets(name) for synset in synsets: members = members.union(set([w for s in synset.closure(lambda s:s.hyponyms(),depth=10) for w in s.lemma_names])) return members def w1_in_w2(w1,w2): ''' Return leftover letters if all the letters from w1 are in w2 Return False otherwise ''' for c in w1: if c not in w2: return False else: pos = w2.find(c) w2 = w2[:pos] + w2[pos+1:] return w2 # Get a set of movies movies = [] with open('RankedMovies.txt','rb') as fid: for x in fid.readlines(): x = x.rstrip('\r\n') movie, score = x.split('\t') score = int(score) if score >= 90: movie = movie[:-7] # Remove the year if is_good_movie(movie): movies.append(movie) # Let's get a list of things that might be sounds sounds_tmp = set() for x in ('utter','cry'): sounds_tmp = sounds_tmp.union(get_category_members(x)) sounds = frozenset([z for z in sounds_tmp if len(z) < 7]) # Dictionary to help find anagrams sorted_to_word = dict() for word in sounds: s = ''.join(sorted(word)) try: sorted_to_word[s].append(word) except KeyError: sorted_to_word[s] = [word] # Get a set of animals animals_tmp = get_category_members('animal') animals = frozenset([y for y in animals_tmp if len(y) < 7]) # Go through and give results for movie in movies: for animal in animals: leftover = w1_in_w2(animal,movie.lower()) if leftover: l = leftover.replace(' ','') s = ''.join(sorted(l)) try: new_words = sorted_to_word[s] except KeyError: new_words = [] for n in new_words: print "%s => %s %s" % (movie, animal, n)
Which produces the following result:
Das Boot => toad sob Red Dawn => wren add Red Dawn => wren add Big Stan => bat sign Big Stan => bat sing La Bamba => lamb baa Rio Lobo => loir boo Dear God => dog read Rapa Nui => arui pan Grey Owl => grey low
So there you have it. In about a second you get only 10 results and it’s easy to pick out the right one. I’m stoked about this new method to find members of a category; I sure wish I had it when I was working on the challenge to name a tree whose letters can be rearranged to spell two herbs or spices. It took me forever to get good lists for trees and herbs and spices, but with this approach the problem is completely trivialized.
Pingback: NPR Puzzle for 2015-05-17: Solving With Python | Box Relatives