{"id":404,"date":"2015-05-21T13:00:14","date_gmt":"2015-05-21T20:00:14","guid":{"rendered":"http:\/\/alexboisvert.com\/musings\/?p=404"},"modified":"2015-05-21T13:37:14","modified_gmt":"2015-05-21T20:37:14","slug":"npr-puzzle-for-2015-05-17-solving-with-python","status":"publish","type":"post","link":"https:\/\/alexboisvert.com\/musings\/2015\/05\/21\/npr-puzzle-for-2015-05-17-solving-with-python\/","title":{"rendered":"NPR Puzzle for 2015-05-17: Solving With Python"},"content":{"rendered":"<p>Here&#8217;s <a href=\"http:\/\/www.npr.org\/2015\/05\/17\/407090307\/a-puzzle-that-takes-you-around-the-globe\" target=\"_blank\">this week&#8217;s NPR puzzle<\/a>:<\/p>\n<blockquote><p>Name a country with at least three consonants. These are the same consonants, in the same order, as in the name of a language spoken by millions of people worldwide. The country and the place where the language is principally spoken are in different parts of the globe. What country and what language are these?<\/p><\/blockquote>\n<p>Let&#8217;s see what the NLTK can do for us here.<br \/>\n<!--more--><br \/>\nAs usual, the goal is to do this in pure Python, without any external help.  Getting a list of countries is super easy:<br \/>\n[python]<br \/>\nfrom nltk.corpus import gazetteers<br \/>\ncountries = set([country for filename in ('isocountries.txt', 'countries.txt') for country in gazetteers.words(filename)])<br \/>\n[\/python]<\/p>\n<p>Getting a list of languages?  Well, let&#8217;s use WordNet for that.  
First, let&#8217;s look at where the word &#8220;Swahili&#8221; falls in WordNet by printing its hypernyms:<br \/>\n[python]<br \/>\nfrom nltk.corpus import wordnet as wn<br \/>\nsynsets = wn.synsets('swahili')<br \/>\nsynset = synsets[0]<br \/>\nwhile synset:<br \/>\n    print(synset.lemma_names())<br \/>\n    synsets = synset.hypernyms()<br \/>\n    if synsets:<br \/>\n        synset = synsets[0]<br \/>\n    else:<br \/>\n        synset = None<br \/>\n[\/python]<\/p>\n<pre>\r\n[u'Swahili']\r\n[u'Bantu', u'Bantoid_language']\r\n[u'Niger-Congo']\r\n[u'Niger-Kordofanian', u'Niger-Kordofanian_language']\r\n[u'natural_language', u'tongue']\r\n[u'language', u'linguistic_communication']\r\n[u'communication']\r\n[u'abstraction', u'abstract_entity']\r\n[u'entity']\r\n<\/pre>\n<p>All right, looks like taking all the hyponyms of &#8220;natural_language&#8221; will work nicely.  We&#8217;ll get some things we don&#8217;t need, namely language families like &#8220;Niger-Kordofanian&#8221;, but that&#8217;s all right; we&#8217;ll just remove them with the eyeball test.<\/p>\n<p>Now that we&#8217;re ready to go, we&#8217;ll apply <a href=\"http:\/\/alexboisvert.com\/musings\/2014\/08\/14\/npr-puzzle-for-august-10-2014-solving-with-python\/\">the trick we used before to get members of a category<\/a> and we&#8217;re off:<\/p>\n<p>[python]<br \/>\nfrom nltk.corpus import wordnet as wn<br \/>\nimport re<br \/>\nfrom nltk.corpus import gazetteers<br \/>\nfrom collections import defaultdict<\/p>\n<p>def just_consonants(w):<br \/>\n    '''<br \/>\n    Lowercase the word and strip its vowels (a, e, i, o, u and y)<br \/>\n    '''<br \/>\n    w = w.lower()<br \/>\n    return re.sub(r'[aeiouy]+', '', w)<\/p>\n<p>def get_category_members(name):<br \/>\n    '''<br \/>\n    Use NLTK to get members of a category<br \/>\n    '''<br \/>\n    members = set()<br \/>\n    synsets = wn.synsets(name)<br \/>\n    for synset in synsets:<br \/>\n        members = members.union(set([w for s in 
synset.closure(lambda s: s.hyponyms(), depth=10) for w in s.lemma_names()]))<br \/>\n    return members<\/p>\n<p>##################<br \/>\n# Get a list of languages<br \/>\nlanguages = get_category_members('natural_language')<br \/>\n# Make a dictionary of consonantcy -> language<br \/>\nlang_dict = defaultdict(list)<br \/>\nfor w in languages:<br \/>\n    lang_dict[just_consonants(w)].append(w)<\/p>\n<p># Get a list of countries<br \/>\ncountries = set([country for filename in ('isocountries.txt', 'countries.txt') for country in gazetteers.words(filename)])<br \/>\ncountry_dict = defaultdict(list)<br \/>\nfor w in countries:<br \/>\n    country_dict[just_consonants(w)].append(w)<br \/>\ncons_country_set = frozenset(country_dict.keys())<\/p>\n<p># Print every consonantcy of length 3+ shared by a country and a language<br \/>\ncounter = 1<br \/>\nfor consonantcy in lang_dict:<br \/>\n    if consonantcy in cons_country_set and len(consonantcy) >= 3:<br \/>\n        print('%d %s %s' % (counter, country_dict[consonantcy], lang_dict[consonantcy]))<br \/>\n        counter += 1<br \/>\n[\/python]<\/p>\n<pre>\r\n1 [u'Somalia'] [u'Somali']\r\n2 [u'Turkey'] [u'Turki']\r\n3 [u'Uganda'] [u'Gondi']\r\n4 [u'Chad'] [u'Chad']\r\n5 [u'America'] [u'Maraco']\r\n6 [u'Tonga'] [u'Tonga']\r\n7 [u'Lebanon'] [u'Albanian']\r\n8 [u'Nepal'] [u'Nepali']\r\n9 [u'Slovenia'] [u'Slovene']\r\n10 [u'Slovakia'] [u'Slovak']\r\n11 [u'Armenia', u'Romania'] [u'Romany']\r\n12 [u'Azerbaijan'] [u'Azerbaijani']\r\n13 [u'Germany'] [u'German']\r\n14 [u'Malawi'] [u'Mulwi']\r\n15 [u'Malta'] [u'Yamaltu', u'Malto', u'Malti']\r\n16 [u'Greece'] [u'Ugric']\r\n17 [u'China'] [u'Chin']\r\n18 [u'Ukraine'] [u'Korean', u'Karen']\r\n<\/pre>\n<p>Well, what do you know.  You could argue, I guess, for #3 or #16, but far and away the best answers are #7 (Lebanon and Albanian) and #18 (Ukraine and Korean, the first language listed, anyway).  Nice puzzle!  
And it once again goes to show the power of using NLTK to get members of a category.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s this week&#8217;s NPR puzzle: Name a country with at least three consonants. These are the same consonants, in the same order, as in the name of a language spoken by millions of people worldwide. The country and the place &hellip; <a href=\"https:\/\/alexboisvert.com\/musings\/2015\/05\/21\/npr-puzzle-for-2015-05-17-solving-with-python\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,5],"tags":[],"class_list":["post-404","post","type-post","status-publish","format-standard","hentry","category-coding","category-puzzles"],"_links":{"self":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts\/404","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/comments?post=404"}],"version-history":[{"count":5,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts\/404\/revisions"}],"predecessor-version":[{"id":410,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts\/404\/revisions\/410"}],"wp:attachment":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/media?parent=404"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/categories?post=404"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/tags?post=404"}],"
curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}