Box Relatives

Thoughts about puzzles, math, coding, and miscellaneous

NPR Puzzle for 2015-05-17: Solving With Python

Here’s this week’s NPR puzzle:

Name a country with at least three consonants. These are the same consonants, in the same order, as in the name of a language spoken by millions of people worldwide. The country and the place where the language is principally spoken are in different parts of the globe. What country and what language are these?

Let’s see what the NLTK can do for us here.

As usual, the goal is to do this in Python with NLTK alone, without any external help. Getting a list of countries is super easy:

from nltk.corpus import gazetteers
countries = set([country for filename in ('isocountries.txt','countries.txt') for country in gazetteers.words(filename)])

Getting a list of languages? Well, let’s use WordNet for that. First, let’s look at where the word “Swahili” falls in WordNet by printing its chain of hypernyms:

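from nltk.corpus import wordnet as wn

# Start at the first sense of "Swahili" and follow the first hypernym each time
synsets = wn.synsets('swahili')
synset = synsets[0]
while synset:
    print(synset.lemma_names())
    synsets = synset.hypernyms()
    if synsets:
        synset = synsets[0]
    else:
        # No more hypernyms: we have reached the root, so stop
        synset = None

This prints:
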
[u'Swahili']
[u'Bantu', u'Bantoid_language']
[u'Niger-Congo']
[u'Niger-Kordofanian', u'Niger-Kordofanian_language']
[u'natural_language', u'tongue']
[u'language', u'linguistic_communication']
[u'communication']
[u'abstraction', u'abstract_entity']
[u'entity']

All right, looks like taking all the hyponyms of “natural_language” will work nicely. We’ll get some things we don’t need — namely language families like “Niger-Kordofanian” — but it’s all right, we’ll just remove them with the eyeball test.
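To see what "taking all the hyponyms" amounts to without needing WordNet, here is a minimal sketch on a toy hierarchy. The `TOY_HYPONYMS` dictionary and the `collect_hyponyms` helper are hypothetical names made up for illustration; the entries are just the hypernym chain printed above, turned upside down. It is not how NLTK does it internally, only the same idea in miniature.

```python
# Toy parent -> children map built from the hypernym chain printed above.
# Illustrative data only; this is not WordNet.
TOY_HYPONYMS = {
    'natural_language': ['Niger-Kordofanian'],
    'Niger-Kordofanian': ['Niger-Congo'],
    'Niger-Congo': ['Bantu'],
    'Bantu': ['Swahili'],
}

def collect_hyponyms(name, max_depth=10):
    """Return every descendant of `name`, going at most `max_depth` levels down."""
    found = set()
    frontier = [name]
    for _ in range(max_depth):
        next_frontier = []
        for parent in frontier:
            for child in TOY_HYPONYMS.get(parent, []):
                if child not in found:
                    found.add(child)
                    next_frontier.append(child)
        if not next_frontier:
            break  # nothing new at this level, so we are done
        frontier = next_frontier
    return found

members = collect_hyponyms('natural_language')
print(sorted(members))
```

Note that the result mixes families ("Niger-Congo", "Bantu") with an actual language ("Swahili"), which is exactly why the eyeball test is needed afterwards.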

Now that we’re ready to go, we’ll apply the trick we used before to get members of a category and we’re off:

from nltk.corpus import wordnet as wn
import re
from nltk.corpus import gazetteers
from collections import defaultdict

def just_consonants(w):
    '''
    Lowercase w and strip the vowels (y counts as a vowel here).
    Anything else, such as spaces or hyphens, is left in place.
    '''
    w = w.lower()
    return re.sub(r'[aeiouy]+','',w)
         
def get_category_members(name):
    '''
    Use NLTK to get members of a category
    '''
    members = set()
    synsets = wn.synsets(name)
    for synset in synsets:
        members = members.union(set([w for s in synset.closure(lambda s:s.hyponyms(),depth=10) for w in s.lemma_names()]))
    return members
 
##################
# Get a list of languages
languages = get_category_members('natural_language')
# Make a dictionary of consonantcy -> language
lang_dict = defaultdict(list)
for w in languages:
    lang_dict[just_consonants(w)].append(w)

# Get a list of countries
countries = set([country for filename in ('isocountries.txt','countries.txt') for country in gazetteers.words(filename)])
country_dict = defaultdict(list)
for w in countries:
    country_dict[just_consonants(w)].append(w)
cons_country_set = frozenset(country_dict.keys())

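# Print every consonant string (length 3 or more) shared by a country and a language
counter = 1
for consonantcy in lang_dict:
    if consonantcy in cons_country_set and len(consonantcy) >= 3:
        print('%d %r %r' % (counter, country_dict[consonantcy], lang_dict[consonantcy]))
        counter += 1

This prints:
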
1 [u'Somalia'] [u'Somali']
2 [u'Turkey'] [u'Turki']
3 [u'Uganda'] [u'Gondi']
4 [u'Chad'] [u'Chad']
5 [u'America'] [u'Maraco']
6 [u'Tonga'] [u'Tonga']
7 [u'Lebanon'] [u'Albanian']
8 [u'Nepal'] [u'Nepali']
9 [u'Slovenia'] [u'Slovene']
10 [u'Slovakia'] [u'Slovak']
11 [u'Armenia', u'Romania'] [u'Romany']
12 [u'Azerbaijan'] [u'Azerbaijani']
13 [u'Germany'] [u'German']
14 [u'Malawi'] [u'Mulwi']
15 [u'Malta'] [u'Yamaltu', u'Malto', u'Malti']
16 [u'Greece'] [u'Ugric']
17 [u'China'] [u'Chin']
18 [u'Ukraine'] [u'Korean', u'Karen']

Well, what do you know. You could argue, I guess, for #3 (Uganda / Gondi) or #16 (Greece / Ugric), but far and away the best answers are #7 (Lebanon / Albanian) and the first part of #18 (Ukraine / Korean). Nice puzzle! And it once again goes to show the power of using NLTK to get members of a category.
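As a quick sanity check that needs no corpus at all, the sketch below runs the same vowel-stripping idea over the pairs called out above. `consonant_key` is a hypothetical variant of `just_consonants` that also throws away non-letters such as spaces, hyphens and underscores. That stricter behaviour is an assumption of this sketch, not what the script above does.

```python
import re

def consonant_key(name):
    """Lowercase `name` and keep only consonant letters (y is treated as a vowel)."""
    return re.sub(r'[^bcdfghjklmnpqrstvwxz]', '', name.lower())

# (country, language) pairs taken from the output above
pairs = [('Uganda', 'Gondi'), ('Lebanon', 'Albanian'),
         ('Greece', 'Ugric'), ('Ukraine', 'Korean')]

for country, language in pairs:
    key = consonant_key(country)
    # The last column is True when both names reduce to the same consonants
    print(country, language, key, key == consonant_key(language))
```

Because non-letters are dropped, a multi-word or hyphenated name reduces to its consonants only, so the `len(...) >= 3` test would count real consonants rather than stray spaces or hyphens.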
