November 5, 2012
by Alex
Joon recently posted the following on Twitter:
wordplay #puzzler: think of two unrelated phrases, both with enumeration 3, 4. the 3-letter words are synonyms; so are the 4-letter words.
Well, I had just recently downloaded the unbelievably awesome Natural Language Toolkit (NLTK) for Python, so I thought I’d test it out on this problem. Here’s what I came up with:
```python
#!/usr/bin/env python3
import re
import sys

from nltk.corpus import wordnet as wn

myfile = sys.argv[1]
word1_length = int(sys.argv[2])
word2_length = int(sys.argv[3])


def get_synonyms(word, length):
    '''
    Gets all synonyms of `word` of length `length`
    '''
    syns = []
    for synset in wn.synsets(word):
        # In NLTK 3, lemma_names() is a method, not an attribute
        for w in synset.lemma_names():
            if len(w) == length and w != word and w not in syns:
                syns.append(w)
    return syns


# Read the phrase list, one phrase per line
with open(myfile, 'r', encoding='utf-8') as fid:
    phrases = [x.rstrip('\r\n') for x in fid]

# Slim this down a bit: keep only lowercase two-word phrases of the
# requested lengths (e.g. xxx_yyyy for lengths 3 and 4)
my_pattern = r'^[a-z]{%i}_[a-z]{%i}$' % (word1_length, word2_length)
my_phrases = [x for x in phrases if re.match(my_pattern, x)]
phrase_set = set(my_phrases)  # set for fast membership tests

# Go through and check
for p in my_phrases:
    w1 = p[:word1_length]
    w2 = p[word1_length + 1:]
    # Get synonyms of the appropriate length
    l1 = get_synonyms(w1, word1_length)
    l2 = get_synonyms(w2, word2_length)
    # Try every combination of synonyms and keep the ones that are real phrases
    for t1 in l1:
        for t2 in l2:
            test_phrase = t1 + '_' + t2
            if test_phrase in phrase_set:
                print(p + ' -> ' + test_phrase)
```
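To use it, you will need a fairly comprehensive list of phrases, one per line, with the words of each phrase joined by an underscore; I used the data dump from Wiktionary here. You will also need NLTK's WordNet data, which you can fetch with nltk.download('wordnet'). Then just run it as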
puzzler.py enwiki.txt 3 4
and it will spit out:
bad_lots -> big_deal
bad_lots -> big_band
bad_mind -> big_head
big_head -> bad_mind
bum_rush -> rat_race
had_best -> get_well
hit_home -> off_base
off_base -> hit_home
rat_race -> bum_rush
(Yes, WordNet thinks that “big” and “bad” are synonyms.)
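Several matches show up twice, once in each direction (bad_mind -> big_head and big_head -> bad_mind, for instance), because the script treats every phrase as a starting point. Below is a minimal sketch of reporting each pair only once by storing it as a sorted tuple. To keep it self-contained it swaps WordNet for a tiny hand-made synonym table; TOY_SYNONYMS and find_pairs are hypothetical names, and the table is illustrative, not WordNet's actual data.

```python
from itertools import product

# Hypothetical stand-in for WordNet: word -> set of synonyms (toy data only)
TOY_SYNONYMS = {
    "bad": {"big"}, "big": {"bad"},
    "mind": {"head"}, "head": {"mind"},
    "hit": {"off"}, "off": {"hit"},
    "home": {"base"}, "base": {"home"},
}


def get_synonyms(word, length):
    """Synonyms of `word` with exactly `length` letters, from the toy table."""
    return sorted(w for w in TOY_SYNONYMS.get(word, ())
                  if len(w) == length and w != word)


def find_pairs(phrases, len1, len2):
    """Return each matching pair of phrases once, as a sorted tuple."""
    phrase_set = set(phrases)
    pairs = set()
    for p in phrases:
        w1, w2 = p[:len1], p[len1 + 1:]
        for t1, t2 in product(get_synonyms(w1, len1), get_synonyms(w2, len2)):
            candidate = t1 + "_" + t2
            if candidate in phrase_set:
                # Sorting collapses (a, b) and (b, a) into a single entry
                pairs.add(tuple(sorted((p, candidate))))
    return sorted(pairs)


phrases = ["bad_mind", "big_head", "hit_home", "off_base", "big_deal"]
for a, b in find_pairs(phrases, 3, 4):
    print(a, "<->", b)
```

With this toy list it prints bad_mind <-> big_head and hit_home <-> off_base, each only once; big_deal has no partner because "deal" is not in the toy table.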
You may notice that Joon’s intended answer isn’t among the results — one of those phrases wasn’t on Wiktionary … until I added it (it will probably be in the next data dump).
Anyway! The code also lets you work with other word lengths. Is there anything good there? Well, I kind of like “light switch” and “short-change” (5, 6). “Well liquors” and “good spirits” (4, 7) is a good one too. Anything else? Feel free to experiment with the code and let me know in the comments.
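The script handles one pair of word lengths per run. If you want to survey many enumerations at once, one option is to bucket the whole phrase list by enumeration in a single pass and then run the synonym check on each bucket. Here is a minimal sketch of that grouping step; group_by_enumeration is a hypothetical helper, and the short list stands in for the real Wiktionary dump.

```python
import re
from collections import defaultdict

# Exactly two lowercase words joined by a single underscore, e.g. "light_switch"
TWO_WORD = re.compile(r"^([a-z]+)_([a-z]+)$")


def group_by_enumeration(phrases):
    """Bucket two-word phrases by their enumeration, e.g. (5, 6)."""
    groups = defaultdict(list)
    for p in phrases:
        m = TWO_WORD.match(p)
        if m:
            groups[(len(m.group(1)), len(m.group(2)))].append(p)
    return dict(groups)


phrases = ["light_switch", "well_liquors", "good_spirits", "rat_race",
           "bum_rush", "Big_Apple", "three_word_phrase"]
groups = group_by_enumeration(phrases)
for enum, members in sorted(groups.items()):
    print(enum, members)
```

As in the original pattern, capitalized entries and phrases with more than two words are skipped, so Big_Apple and three_word_phrase land in no bucket.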