Box Relatives

Thoughts about puzzles, math, coding, and miscellaneous

FillBot Jr. Update

| 3 Comments

For more on this project, see the FillBot Jr. category

As I mentioned in the comments of the last post, FillBot Jr. has successfully solved a crossword! Now admittedly, it was low-hanging fruit — a Monday Newsday by Gail Grabowski — but I’m happy to have that result as a sort of “proof of concept.” So far the algorithm is ultra-simple: it looks for the answer it has the most confidence in and fills it in, ignoring any potential problems it may have down the road.

Buoyed by this result, I decided to tackle the following day’s puzzle. And it did great — filling in the entire puzzle except for one blank: the crossing of:
NO_E [Not any] and
MI_E [Not yours]
Wait, really? That’s what tripped up my bot? What gives? This is especially infuriating because I have both of those EXACT CLUES in my database.

Well, as you may recall, I am leaning heavily on MySQL’s full-text search which, for a given clue, quickly scours my database finding similar clues and even gives them a numerical value according to how well the clue matches. Well, it turns out that this has a big limitation in the form of stopwords — words that aren’t indexed by MySQL. And, you guessed it — “not”, “any”, and “yours” are on that list. I’m not really sure how to get around this; like I said, I am relying very heavily on this capability of MySQL. Maybe once I add some logic for “checking the crossings” this problem will be mitigated.

In any case, I’m happy with the results so far. Once I add some cross-checking logic, I think I might just be left with optimization improvements. And if it turns out to be decent, I’d love to add it to Crossword Nexus to allow users to upload .puz files to see how the bot handles them.

Thoughts?

3 Comments

  1. I don’t fully understand all of this, but I’m fascinated…

  2. Congratulations on the successful trials.

    My initial thought to your “cross-checking logic” problem was a subroutine that would search for all possible letters that satisfy a given intersection. If the clue database does not offer any support (due to stopwords, e.g.) then the algorithm might simply pick the letter that produces the highest fill score, assuming that score values for crossword entries are available. This is not unlike the “play constructor rather than read all the clues” strategy used by some human solvers in speed competitions. Of course, for the NO_E/MI_E intersection, I would expect M and maybe T to have better fill scores than N, so that fill algorithm would not be universally successful.

    I’d be curious to see how constructors would react to the widespread development of crossword-solving algorithms. I can see the benefit in having a computer assess the overall difficulty of a puzzle; even better would be an assessment of the consistency of difficulty within a puzzle. I’m sure that your algorithm could modified to identify sections of a grid that call for easier or more difficult clues based on the types of entries and crossings.

  3. Dan — I think I need to do a better job of explaining the algorithm. Maybe I’ll try to that next time I post. Essentially, it tries to mimic what a solver would do. At least, that’s the goal.

    Todd — I hadn’t thought for a second what this might be useful for. It was just a little challenge for myself. But I really like the idea of judging the difficulty level of a puzzle. Once I’m done I will have to think about that.

Leave a Reply

Required fields are marked *.


This site uses Akismet to reduce spam. Learn how your comment data is processed.