{"id":356,"date":"2014-08-14T14:00:33","date_gmt":"2014-08-14T21:00:33","guid":{"rendered":"http:\/\/alexboisvert.com\/musings\/?p=356"},"modified":"2014-08-14T14:09:49","modified_gmt":"2014-08-14T21:09:49","slug":"npr-puzzle-for-august-10-2014-solving-with-python","status":"publish","type":"post","link":"https:\/\/alexboisvert.com\/musings\/2014\/08\/14\/npr-puzzle-for-august-10-2014-solving-with-python\/","title":{"rendered":"NPR Puzzle for August 10, 2014: solving with Python!"},"content":{"rendered":"<p>Here&#8217;s this past week&#8217;s NPR puzzle:<\/p>\n<blockquote><p>Name a well-known movie of the past \u2014 two words, seven letters in total. These seven letters can be rearranged to spell the name of an animal plus the sound it makes. What animal is it?<\/p><\/blockquote>\n<p>Brute-forcing this puzzle is easy but ineffective.  We can get a <a href=\"ftp:\/\/ftp.fu-berlin.de\/pub\/misc\/movies\/database\/\">list of movies from IMDB<\/a> and see which of the seven-letter ones can anagram into two words.  But you don&#8217;t want to do this, because it will force you to sift through a ton of results.  Can we do better? (Yes, we can.)<br \/>\n<!--more--><br \/>\nWe don&#8217;t want to use *all* movies, and *all* words if we can avoid it.  Can we get a list of &#8220;well-known&#8221; movies, animals, or animal sounds?  The first is the easiest; instead of downloading the movies.list file from the IMDB link above, download the ratings.list file and only consider movies that have above a certain threshold of votes.  I actually rated the movies on a scale of 1-100 using this method, so that the cutoffs make more sense.  Now, how about the list of animals and animal sounds?  Well, thanks to Python\/NLTK and <a href=\"http:\/\/stackoverflow.com\/questions\/15330725\/how-to-get-all-the-hyponyms-of-a-word-synset-in-python-nltk-and-wordnet\">a terrific hint from Stack Overflow<\/a>, we can actually get the members of any category.  Check this out:<br \/>\n[python]<br \/>\nfrom nltk.corpus import wordnet as wn<br \/>\ndef get_category_members(name):<br \/>\n    members = set()<br \/>\n    synsets = wn.synsets(name)<br \/>\n    for synset in synsets:<br \/>\n        members = members.union(set([w for s in synset.closure(lambda s:s.hyponyms(),depth=10) for w in s.lemma_names]))<br \/>\n    return members<br \/>\n[\/python]<br \/>\nEasy, right?  If you type <code>get_category_members('animal')<\/code> it will give you everything that Wordnet thinks is an animal.<\/p>\n<p>Animal sounds are a little harder to come by.  Typing <code>get_category_members('animal_sounds')<\/code> gets you the empty set, so we have to be a little more creative.  Let&#8217;s check out the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Hyponymy_and_hypernymy\">hypernyms<\/a> for a few animal sounds to give us an idea of what to do.<br \/>\n[python]<br \/>\nsounds = (&#8216;meow&#8217;,&#8217;moo&#8217;,&#8217;baa&#8217;)<br \/>\nfor s in sounds:<br \/>\n    syn = wn.synsets(s)<br \/>\n    for x in syn:<br \/>\n        print x, x.hypernyms()<br \/>\n[\/python]<\/p>\n<pre>\r\nSynset('meow.n.01') [Synset('cry.n.05')]\r\nSynset('meow.v.01') [Synset('utter.v.02')]\r\nSynset('moo.n.01') [Synset('cry.n.05')]\r\nSynset('moo.v.01') [Synset('utter.v.02')]\r\nSynset('baa.n.01') [Synset('cry.n.05')]\r\nSynset('bleat.v.02') [Synset('utter.v.02')]\r\n<\/pre>\n<p>Now we&#8217;re on to something!  Each of these of course can be a noun or a verb, but they all fall under the umbrella of &#8220;cry&#8221; or &#8220;utter.&#8221;  So if we take category members of those two words &#8212; well, it&#8217;ll give us too much, but that&#8217;s okay.  At least we should have all the relevant animal sounds in there.<\/p>\n<p>Put it all together and you get something like this:<br \/>\n[python]<br \/>\nimport re<br \/>\nfrom nltk.corpus import wordnet as wn<\/p>\n<p>def is_good_movie(m):<br \/>\n    &#8221;&#8217;<br \/>\n    Count a movie as &#8220;good&#8221; if it only has 7 letters and one space<br \/>\n    &#8221;&#8217;<br \/>\n    m2 = re.sub(r'[^A-Za-z]+&#8217;,&#8221;,m)<br \/>\n    if len(m2) == 7 and m.count(&#8216; &#8216;) == 1:<br \/>\n        return True<br \/>\n    else:<br \/>\n        return False<\/p>\n<p>def get_category_members(name):<br \/>\n    &#8221;&#8217;<br \/>\n    Use NLTK to get members of a category<br \/>\n    &#8221;&#8217;<br \/>\n    members = set()<br \/>\n    synsets = wn.synsets(name)<br \/>\n    for synset in synsets:<br \/>\n        members = members.union(set([w for s in synset.closure(lambda s:s.hyponyms(),depth=10) for w in s.lemma_names]))<br \/>\n    return members<\/p>\n<p>def w1_in_w2(w1,w2):<br \/>\n    &#8221;&#8217;<br \/>\n    Return leftover letters if all the letters from w1 are in w2<br \/>\n    Return False otherwise<br \/>\n    &#8221;&#8217;<br \/>\n    for c in w1:<br \/>\n        if c not in w2:<br \/>\n            return False<br \/>\n        else:<br \/>\n            pos = w2.find(c)<br \/>\n            w2 = w2[:pos] + w2[pos+1:]<br \/>\n    return w2<\/p>\n<p># Get a set of movies<br \/>\nmovies = []<br \/>\nwith open(&#8216;RankedMovies.txt&#8217;,&#8217;rb&#8217;) as fid:<br \/>\n    for x in fid.readlines():<br \/>\n        x = x.rstrip(&#8216;\\r\\n&#8217;)<br \/>\n        movie, score = x.split(&#8216;\\t&#8217;)<br \/>\n        score = int(score)<br \/>\n        if score >= 90:<br \/>\n            movie = movie[:-7] # Remove the year<br \/>\n            if is_good_movie(movie):<br \/>\n                movies.append(movie)<\/p>\n<p># Let&#8217;s get a list of things that might be sounds<br \/>\nsounds_tmp = set()<br \/>\nfor x in (&#8216;utter&#8217;,&#8217;cry&#8217;):<br \/>\n    sounds_tmp = sounds_tmp.union(get_category_members(x))<br \/>\nsounds = frozenset([z for z in sounds_tmp if len(z) < 7])\n# Dictionary to help find anagrams\nsorted_to_word = dict()\nfor word in sounds:\n    s = ''.join(sorted(word))\n    try:\n        sorted_to_word[s].append(word)\n    except KeyError:\n        sorted_to_word[s] = [word]\n\n# Get a set of animals\nanimals_tmp = get_category_members('animal')\nanimals = frozenset([y for y in animals_tmp if len(y) < 7])\n\n# Go through and give results\nfor movie in movies:\n    for animal in animals:\n        leftover = w1_in_w2(animal,movie.lower())\n        if leftover:\n            l = leftover.replace(' ','')\n            s = ''.join(sorted(l))    \n            try:\n                new_words = sorted_to_word[s]\n            except KeyError:\n                new_words = [] \n            for n in new_words:\n                print \"%s => %s %s&#8221; % (movie, animal, n)<\/p>\n<p>[\/python]<br \/>\nWhich produces the following result:<\/p>\n<pre>\r\nDas Boot => toad sob\r\nRed Dawn => wren add\r\nRed Dawn => wren add\r\nBig Stan => bat sign\r\nBig Stan => bat sing\r\nLa Bamba => lamb baa\r\nRio Lobo => loir boo\r\nDear God => dog read\r\nRapa Nui => arui pan\r\nGrey Owl => grey low\r\n<\/pre>\n<p>So there you have it.  In about a second you get only 10 results and it&#8217;s easy to pick out the right one.  I&#8217;m stoked about this new method to find members of a category; I sure wish I had it when <a href=\"http:\/\/www.npr.org\/2013\/11\/24\/246933624\/we-plant-the-seed-you-pick-the-tree\">I was working on the challenge to name a tree whose letters can be rearranged to spell two herbs or spices.<\/a>  It took me forever to get good lists for trees and herbs and spices, but with this approach <a href=\"http:\/\/pastebin.com\/XGF23ZaS\">the problem is completely trivialized<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s this past week&#8217;s NPR puzzle: Name a well-known movie of the past \u2014 two words, seven letters in total. These seven letters can be rearranged to spell the name of an animal plus the sound it makes. What animal &hellip; <a href=\"https:\/\/alexboisvert.com\/musings\/2014\/08\/14\/npr-puzzle-for-august-10-2014-solving-with-python\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,5],"tags":[],"class_list":["post-356","post","type-post","status-publish","format-standard","hentry","category-coding","category-puzzles"],"_links":{"self":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts\/356","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/comments?post=356"}],"version-history":[{"count":8,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts\/356\/revisions"}],"predecessor-version":[{"id":364,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts\/356\/revisions\/364"}],"wp:attachment":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/media?parent=356"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/categories?post=356"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/tags?post=356"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}