{"id":184,"date":"2012-04-09T04:23:26","date_gmt":"2012-04-09T04:23:26","guid":{"rendered":"http:\/\/alexboisvert.com\/musings\/?p=184"},"modified":"2012-04-13T15:22:18","modified_gmt":"2012-04-13T15:22:18","slug":"fillbot-jr-initial-results","status":"publish","type":"post","link":"https:\/\/alexboisvert.com\/musings\/2012\/04\/09\/fillbot-jr-initial-results\/","title":{"rendered":"FillBot Jr.: Initial Results"},"content":{"rendered":"<p>If you&#8217;re a regular on this blog, you may remember that I <a href=\"http:\/\/alexboisvert.com\/musings\/2012\/03\/25\/paging-dr-fill\/\">floated the idea of making a crossword-filling algorithm<\/a> just to see how hard it would be.  I wasn&#8217;t sure I&#8217;d ever get around to making it but today I was sick and bedridden and bored (yay?) so I spent a few hours coding something up.  If you&#8217;d like to see more or less what I&#8217;ve done and how it performs on <a href=\"http:\/\/www.brendanemmettquigley.com\/2012\/04\/crossword-424-themeless-monday-results-of-the-ben-tausig-sings-contest.html\">a sample themeless puzzle<\/a>, follow me to the rest of this post.<\/p>\n<p><!--more--><\/p>\n<p>Here&#8217;s the basics of it: MySQL has a terrific full-text match based on the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Tf-idf\">tf*idf algorithm<\/a> which I&#8217;m using to find good fits for clue\/entry pairs.  For instance, if I have a four-letter word for &#8220;Fried Cajun side&#8221; you can run the MySQL query<br \/>\n[SQL]<br \/>\nSELECT Entry,<br \/>\nMATCH (<br \/>\nClue<br \/>\n)<br \/>\nAGAINST (<br \/>\n&#8216;Fried Cajun Side&#8217;<br \/>\n) AS Score<br \/>\nFROM `Clues`<br \/>\nWHERE MATCH (<br \/>\nClue<br \/>\n)<br \/>\nAGAINST (<br \/>\n&#8216;Fried Cajun Side&#8217;<br \/>\n)<br \/>\nAND Entry LIKE &#8216;____&#8217;<br \/>\nLIMIT 0 , 10<br \/>\n[\/SQL]<br \/>\nand get the following results:<\/p>\n<pre>\r\nEntry \tScore\r\nSLAW \t15.20439338684082\r\nOKRA \t14.877264976501465\r\nOKRA \t9.801589965820312\r\nOKRA \t9.801589965820312\r\nOKRA \t9.801589965820312\r\nOKRA \t9.801589965820312\r\nOKRA \t9.801589965820312\r\nOKRA \t9.801589965820312\r\nROUX \t9.69262981414795\r\nOKRA \t9.69262981414795\r\n<\/pre>\n<p>(OKRA appears several times because it has matched with several different clues.)  Of course, if you knew even one of the letters, this process would immediately give you the correct answer (OKRA).<\/p>\n<p>At each step of the algorithm, I&#8217;m taking these results and adding the &#8220;Score&#8221; to the difference between the top entry and the second-ranked entry.  The entry with the highest sum gets added to the grid and the process is restarted.  For now I&#8217;m not doing any backtracking; I just want to see how this algorithm will do simply marching ahead and filling in entries.<\/p>\n<p>Ready?  Let&#8217;s step through with it on <a href=\"ttp:\/\/www.brendanemmettquigley.com\/2012\/04\/crossword-424-themeless-monday-results-of-the-ben-tausig-sings-contest.html\">Brendan&#8217;s grid<\/a> and see how it does.<\/p>\n<p><strong><i>Step 1: 49 ACROSS: Electrical resistance symbol &#8212; OMEGA<\/i><\/strong><br \/>\nWe&#8217;re off to a good start.  This clue was an exact match for one I had in my database, and was a much better match than the second-best possibility (which turned out to be SAILS, for whatever reason).<\/p>\n<p><strong><i>Step 2: 10 DOWN: 1961 Charlton Heston epic &#8212; ELCID<br \/>\nStep 3: 39 DOWN: Renault vehicle marketed to the U.S. &#8212; LECAR<br \/>\nStep 4: 33 DOWN: North Carolina town or college &#8212; ELON<\/i><\/strong><br \/>\nThese are all stone-cold gimmes for anyone who does a lot of crosswords, so it&#8217;s nice to see the algorithm doing well on them too.  Nice to see it reject DUKE for 33-Down; the database&#8217;s clue of &#8220;North Carolina college town&#8221; was a great match.<\/p>\n<p><strong><i>Step 5: 57 ACROSS: ___ Center (home of the Nets and Devils) &#8212; IZOD<br \/>\nStep 6: 1 ACROSS: Model airplane material &#8212; BALSA<\/i><\/strong><br \/>\nI was shocked that the algorithm got the IZOD Center so quickly, I guess because I didn&#8217;t know this little bit of trivia.  And now that I&#8217;ve learned it, it&#8217;s already out of date, since those teams have moved to the Prudential Center.  Oh well.<\/p>\n<p><strong><i>Step 7: 6 DOWN: Immune system stuff &#8212; SHREDBLOODCELLS<\/i><\/strong><br \/>\nWell, we had to mess up eventually.  A human would never ever have entered this into the grid, but the bot sees the clue in its database that reads [Attack the immune system?] and thinks it&#8217;s got a good match.  Maybe I need some logic for when the clue in the database has a question mark?  This seems especially important for long entries.  (And incidentally, I don&#8217;t have WHITEBLOODCELLS in my database, which is shocking, considering it&#8217;s a totally legit 15-letter answer.  I&#8217;ll have to data-mine Brendan&#8217;s puzzle after I&#8217;m done here to add it.)<\/p>\n<p><strong><i>Step 8: 59 ACROSS: Pummels &#8212; ROUTS<\/i><\/strong><br \/>\nAh, shoot, we&#8217;ve messed up again even though this one doesn&#8217;t look quite so bad.  The intended answer was PELTS.  There&#8217;s only one five-letter entry in my database with the word &#8220;pummels&#8221; in it, so it gets the green light here.<\/p>\n<p><strong><i>Step 9: 35 DOWN: Clothing joint? &#8212; SEAM<br \/>\nStep 10: 8 DOWN: Miso soup enhancer &#8212; MSG<br \/>\nStep 11: 60 ACROSS: Choosing word &#8212; EENY<\/i><\/strong><br \/>\nYeah, it&#8217;s doing fine on these, but let me show the grid at this point:<\/p>\n<pre>\r\nB A L S A . . S _ M . _ E _ _\r\n_ _ _ _ _ . _ H _ S . _ L _ _\r\n_ _ _ _ _ . _ R _ G _ _ C _ _\r\n_ _ _ _ _ _ _ E _ . _ _ I _ _\r\n. . . _ _ _ _ D _ _ _ _ D _ _\r\n_ _ _ _ _ _ _ B . _ _ _ . . .\r\n_ _ _ _ . . _ L _ _ _ _ _ _ E\r\n_ _ _ . . S _ O _ _ . . _ _ L\r\n_ _ _ _ L E _ O _ . . _ _ _ O\r\n. . . _ E A . D _ _ _ _ _ _ N\r\n_ _ _ _ C M _ C _ _ _ _ . . .\r\nO M E G A . _ E _ _ _ _ _ _ _\r\n_ _ _ _ R _ _ L _ . _ _ _ _ _\r\nI Z O D . _ _ L _ . R O U T S\r\nE E N Y . _ _ S . . _ _ _ _ _\r\n<\/pre>\n<p>Anything jump out at you? Something sure jumps out at me: we&#8217;re hardly crossing any words at all.  We absolutely HAVE to use the information we&#8217;ve got in terms of the crossing words or we&#8217;ll never get anywhere.  The algorithm isn&#8217;t doing that, and definitely needs a push in that direction.<\/p>\n<p>Okay, look at the above grid one more time.  What answer can you put in next, even without looking at the clues?  I see an obvious one: _M_ZE <b>has to<\/b> be AMAZE, right?  Let&#8217;s see what the bot fills in next.<\/p>\n<p><strong><i>Step 12: 27 ACROSS: Chicago-based insurance company that sponsors Manchester United&#8217;s uniforms &#8212; RIO<\/i><\/strong><br \/>\nOkay &#8230; that&#8217;s wrong, first of all, and why didn&#8217;t you fill in AMAZE?<\/p>\n<p>So &#8230; this needs work.  Any thoughts on how to make this better?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you&#8217;re a regular on this blog, you may remember that I floated the idea of making a crossword-filling algorithm just to see how hard it would be. I wasn&#8217;t sure I&#8217;d ever get around to making it but today &hellip; <a href=\"https:\/\/alexboisvert.com\/musings\/2012\/04\/09\/fillbot-jr-initial-results\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,11,5],"tags":[],"class_list":["post-184","post","type-post","status-publish","format-standard","hentry","category-coding","category-fillbot-jr","category-puzzles"],"_links":{"self":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts\/184","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/comments?post=184"}],"version-history":[{"count":11,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts\/184\/revisions"}],"predecessor-version":[{"id":196,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/posts\/184\/revisions\/196"}],"wp:attachment":[{"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/media?parent=184"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/categories?post=184"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/alexboisvert.com\/musings\/wp-json\/wp\/v2\/tags?post=184"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}