Box Relatives

Thoughts about puzzles, math, coding, and miscellaneous

Wikipedia Regex Search Updated


The news first: I’ve updated the Wikipedia Regex Search to include Wiktionary in its results. The Wikipedia results have also been updated to be current as of November 1st.

Now the problem: to test it out, I attempted to solve the most recent Matt Gaffney Contest using the search, but it didn’t turn anything up. Why? Because “Oracle of Omaha” isn’t a full-fledged Wikipedia page, just a redirect, and I exclude redirects from my results.

So what’s the fix here? The obvious fix is to include redirects in my results, but I can’t just include all of them wholesale. Just look at all the pages that redirect to “Condoleezza Rice” to see why. No thanks.

So is there a way to be more judicious about choosing which redirects to use? There must be; after all, Onelook seems to handle it just fine. For now, I’m thinking of comparing each redirect to a list of known “good” results, maybe from my clue database or the collaborative word list. If a redirect appears in one of those, I could include it and give it the same score as the page it redirects to. (Incidentally, “Oracle of Omaha” is in my clue database but not the collaborative word list, so I’ll have to add it.)
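Here is a minimal sketch of that idea in Python. It is not the search tool's actual code: the `Redirect` class, the `normalize` and `select_redirects` helpers, the word-list entries, and the scores are all hypothetical, made up for illustration. It assumes redirects are available as (title, target) pairs and that every full page already has a score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redirect:
    title: str   # the redirect's own title, e.g. "Oracle of Omaha"
    target: str  # the full page it redirects to


def normalize(title: str) -> str:
    """Lowercase and keep only letters and digits, so that
    "Oracle of Omaha" and "ORACLE OF OMAHA" compare equal."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def select_redirects(redirects, known_good, page_scores):
    """Return {redirect title: score} for redirects found in known_good.

    Each kept redirect inherits the score of its target page.
    Redirects whose target has no score are skipped.
    """
    good = {normalize(entry) for entry in known_good}
    selected = {}
    for r in redirects:
        if normalize(r.title) in good and r.target in page_scores:
            selected[r.title] = page_scores[r.target]
    return selected


# Illustrative data only (assumptions, not real database contents).
redirects = [
    Redirect("Oracle of Omaha", "Warren Buffett"),
    Redirect("Condoleeza Rice", "Condoleezza Rice"),  # misspelling-style redirect
]
known_good = ["ORACLE OF OMAHA", "WARREN BUFFETT"]  # stand-in for a word list
page_scores = {"Warren Buffett": 80, "Condoleezza Rice": 90}  # made-up scores

result = select_redirects(redirects, known_good, page_scores)
print(result)  # {'Oracle of Omaha': 80}
```

The misspelled redirect is dropped because nothing in the known-good list matches it, while “Oracle of Omaha” is kept and picks up its target's score. Normalizing both sides matters because word lists tend to store entries in a different case and spacing than Wikipedia titles.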

Is there another way to determine which redirects to use? I’d love to hear suggestions. Anything I can do to improve my tool would be great.

3 Comments

  1. Did you try excluding redirects tagged with things like {{R from short name}} or in category [[Category:Unprintworthy redirects]]? Don’t know if this exactly matches what you need, but at first glance the unprintworthy thing would seem to be close.

  2. I didn’t even know about that! I will definitely look into that for the next iteration. Thanks!

  3. Pingback: More on ranking Wikipedia pages | Box Relatives
