Box Relatives

Thoughts about puzzles, math, coding, and miscellaneous

The newish Wikipedia Regex Search


We’ve had the Wikipedia Regex Search live for a while at Crossword Nexus, and this past week we made a few changes:

  • The search is much, much faster now. It is also easier on our servers, so making that change was win-win.
  • We’ve removed the Wiktionary search. The ranking algorithm we have for Wikipedia works great, but not so much for Wiktionary. We’ll need a new way of ranking common words … any suggestions?
  • We’ve removed the word length limit option and restricted results somewhat. We want to encourage people to download the list and run their searches offline if they want more.
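For anyone who does go the offline route, here is a minimal sketch of what a ranked regex search over a downloaded list might look like. The file layout (one `title<TAB>score` pair per line), the function name `search_ranked`, and the sample scores are all assumptions made up for illustration; the actual download may be laid out differently.

```python
import re

def search_ranked(lines, pattern, limit=20):
    """Return up to `limit` titles that fully match `pattern`, highest score first.

    Assumes each line looks like "title<TAB>score" (a hypothetical layout).
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for line in lines:
        title, _, score = line.rstrip("\n").partition("\t")
        # Skip lines with no score column; keep only whole-title matches.
        if score and regex.fullmatch(title):
            hits.append((int(score), title))
    # Sort by descending score, breaking ties alphabetically.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in hits[:limit]]

# Made-up sample data; the scores are placeholders, not real counts.
sample = [
    "pub quiz\t12\n",
    "pub crawl\t40\n",
    "public house\t95\n",
    "quiz show\t30\n",
]
print(search_ranked(sample, r"pub \w+"))
```

With a real file you would pass an open file handle instead of `sample`, since the function only needs something it can iterate over line by line.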

Now if you’re wondering why you should use this thing, let me give you a concrete example of why it’s better than OneLook. Let’s use it to solve the most recent NPR Sunday Puzzle:

There is a politician today, sometimes known by his or her full three-word name, whose initials are also the initials of a popular chain of restaurants. Who is the politician and what’s the restaurant?

If you’re like me, you very quickly came up with Hillary Rodham Clinton as the politician. Let’s face it, there just aren’t that many nationally known politicians, period, much less ones who go by three-word names. But I couldn’t come up with the restaurant chain, so I asked OneLook …
[Image: onelook]

OMG, over 300 entries. I’m not going to sort through all that. Thankfully, OneLook has a “common words and phrases” restriction; let’s see what that gives us …

[Image: onelook2]

Well, that’s less than helpful. How about on Crossword Nexus? What do we get there?

[Image: xn]
Well, that’s better. Ordering the results the way we do is much better than a simple alphabetical ordering. And 20 results is plenty when the thing you’re looking for is in the top 5.
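If you want to try the same kind of query yourself, the puzzle boils down to "three words starting with H, R, and C". Here is a rough sketch of building that pattern in Python. The helper names `initials_pattern` and `matches_initials` and the short title list are hypothetical, and this is not necessarily the exact query syntax the site expects.

```python
import re

def initials_pattern(initials):
    """Build a regex matching one word per initial, separated by whitespace."""
    parts = [re.escape(ch) + r"\w*" for ch in initials]
    return re.compile(r"\s+".join(parts), re.IGNORECASE)

def matches_initials(titles, initials):
    """Keep only the titles whose words start with the given initials, in order."""
    pattern = initials_pattern(initials)
    return [title for title in titles if pattern.fullmatch(title)]

# A tiny made-up list of titles, just to exercise the pattern.
titles = [
    "Hillary Rodham Clinton",
    "Hard Rock Cafe",
    "Hot Rod",
    "Human Rights Campaign",
    "Red Hot Chili Peppers",
]
for title in matches_initials(titles, "HRC"):
    print(title)
```

A pattern like this only does the filtering. Getting the answer near the top of the list still depends on how the matches are ordered.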

Anyway, feel free to use this thing. And if you find anything cool with it, I would love to hear about it.

2 Comments

  1. Wiktionary ranking: frequency of appearance in some corpus … maybe Wikipedia?

  2. So, yes, but I’m not sure this would be any better than number of inlinks. Basically, what I’m worried about is that an entry like “pub quiz”, which is great, would look terrible under these metrics.

    (And instead of making my own frequency list, I would definitely use this guy’s: http://norvig.com/mayzner.html)
