<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Box Relatives</title>
	<atom:link href="http://alexboisvert.com/musings/feed/" rel="self" type="application/rss+xml" />
	<link>http://alexboisvert.com/musings</link>
	<description>Thoughts about puzzles, math, coding, and miscellaneous</description>
	<lastBuildDate>Wed, 16 May 2012 16:08:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Crosswords LA 2012 puzzles!</title>
		<link>http://alexboisvert.com/musings/2012/05/16/crosswords-la-2012-puzzles/</link>
		<comments>http://alexboisvert.com/musings/2012/05/16/crosswords-la-2012-puzzles/#comments</comments>
		<pubDate>Wed, 16 May 2012 16:08:43 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=200</guid>
		<description><![CDATA[The Crosswords LA 2012 puzzles are here! Get them while they&#8217;re hot! CROSSWORDS LA 2012 Puzzles should be e-mailed to you almost immediately. If you don&#8217;t get them right away, check your spam folder. If you still don&#8217;t see them, &#8230; <a href="http://alexboisvert.com/musings/2012/05/16/crosswords-la-2012-puzzles/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The Crosswords LA 2012 puzzles are here!  Get them while they&#8217;re hot!<br />
<center><br />
<small><strong>CROSSWORDS LA 2012</strong></small></p>
<form action="https://www.paypal.com/cgi-bin/webscr" method="post">
<input type="hidden" name="cmd" value="_s-xclick">
<input type="hidden" name="hosted_button_id" value="VBXJ8LVZ3BCEG">
<input type="image" src="https://www.paypalobjects.com/en_US/i/btn/btn_buynowCC_LG.gif" border="0" name="submit" alt="PayPal - The safer, easier way to pay online!">
	<img alt="" border="0" src="https://www.paypalobjects.com/en_US/i/scr/pixel.gif" width="1" height="1"><br />
</form>
<p></center><br />
Puzzles should be e-mailed to you almost immediately.  If you don&#8217;t get them right away, check your spam folder.  If you still don&#8217;t see them, <script type="text/javascript" src="/javascript/riddle.js"></script> me and I&#8217;ll send them to you myself (once I verify your purchase).</p>
<p>If you want to help promote Crosswords LA (and why wouldn&#8217;t you?  All proceeds go to a <a href="http://readingtokids.org/Home/main.php">very worthy cause</a>) please consider adding the Paypal button to your own site.  Here&#8217;s the code:</p>
<pre>
&lt;form action=&quot;https://www.paypal.com/cgi-bin/webscr&quot; method=&quot;post&quot;&gt;
	&lt;input type=&quot;hidden&quot; name=&quot;cmd&quot; value=&quot;_s-xclick&quot;&gt;
	&lt;input type=&quot;hidden&quot; name=&quot;hosted_button_id&quot; value=&quot;VBXJ8LVZ3BCEG&quot;&gt;
	&lt;input type=&quot;image&quot; src=&quot;https://www.paypalobjects.com/en_US/i/btn/btn_buynowCC_LG.gif&quot; border=&quot;0&quot; name=&quot;submit&quot; alt=&quot;PayPal - The safer, easier way to pay online!&quot;&gt;
	&lt;img alt=&quot;&quot; border=&quot;0&quot; src=&quot;https://www.paypalobjects.com/en_US/i/scr/pixel.gif&quot; width=&quot;1&quot; height=&quot;1&quot;&gt;
&lt;/form&gt;
</pre>
<p>You may want to add the &#8220;check your spam folder&#8221; caveat.  Thanks so much, and enjoy the puzzles!</p>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/05/16/crosswords-la-2012-puzzles/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FillBot Jr. Update</title>
		<link>http://alexboisvert.com/musings/2012/04/13/fillbot-jr-update/</link>
		<comments>http://alexboisvert.com/musings/2012/04/13/fillbot-jr-update/#comments</comments>
		<pubDate>Fri, 13 Apr 2012 15:41:32 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[FillBot Jr.]]></category>
		<category><![CDATA[puzzles]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=198</guid>
		<description><![CDATA[For more on this project, see the FillBot Jr. category As I mentioned in the comments of the last post, FillBot Jr. has successfully solved a crossword! Now admittedly, it was low-hanging fruit &#8212; a Monday Newsday by Gail Grabowski &#8230; <a href="http://alexboisvert.com/musings/2012/04/13/fillbot-jr-update/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><em>For more on this project, <a href="http://alexboisvert.com/musings/category/fillbot-jr/">see the FillBot Jr. category</a></em></p>
<p>As I mentioned in the comments of the last post, FillBot Jr. has successfully solved a crossword!  Now admittedly, it was low-hanging fruit &#8212; a <a href="http://www.brainsonly.com/servlets-newsday-crossword/newsdaycrosswordPDF?pm=pdf&#038;puzzle=1112122&#038;data=%3CNAME%3E111212%3C%2FNAME%3E%3CTYPE%3E2%3C%2FTYPE%3E">Monday Newsday by Gail Grabowski</a> &#8212; but I&#8217;m happy to have that result as a sort of &#8220;proof of concept.&#8221;  So far the algorithm is ultra-simple: it looks for the answer it has the most confidence in and fills it in, ignoring any potential problems it may have down the road.</p>
<p>Buoyed by this result, I decided to tackle <a href="http://www.brainsonly.com/servlets-newsday-crossword/newsdaycrosswordPDF?pm=pdf&#038;puzzle=1112122&#038;data=%3CNAME%3E111213%3C%2FNAME%3E%3CTYPE%3E2%3C%2FTYPE%3E">the following day&#8217;s puzzle</a>.  And it did great &#8212; filling in the entire puzzle except for one blank: the crossing of:<br />
NO_E [Not any] and<br />
MI_E [Not yours]<br />
Wait, really?  <strong>That&#8217;s</strong> what tripped up my bot?  What gives?  This is especially infuriating because I have both of those EXACT CLUES in my database.</p>
<p>As you may recall, I am leaning heavily on <a href="http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html">MySQL&#8217;s full-text search</a> which, for a given clue, quickly scours my database for similar clues and even scores each one according to how well it matches.  It turns out that this has a big limitation in the form of <a href="http://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html">stopwords</a>: words that MySQL doesn&#8217;t index.  And, you guessed it, &#8220;not&#8221;, &#8220;any&#8221;, and &#8220;yours&#8221; are on that list, so a clue made up entirely of stopwords gives the search nothing to match on.  I&#8217;m not really sure how to get around this; like I said, I am relying very heavily on this capability of MySQL.  Maybe once I add some logic for &#8220;checking the crossings&#8221; this problem will be mitigated.</p>
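<p>To make the failure concrete, here is a minimal Python sketch of what a stopword-filtering index does to these clues.  The stopword set is only the three words named above, <code>min_len=4</code> mirrors what I understand to be MySQL&#8217;s default minimum indexed word length for MyISAM full-text indexes (<code>ft_min_word_len</code>), and the exact-match fallback is just one possible workaround, not something FillBot Jr. does today.</p>

```python
import re

# Only the three stopwords named in the post; MySQL's real list is much longer.
STOPWORDS = {"not", "any", "yours"}


def indexable_terms(clue, stopwords=STOPWORDS, min_len=4):
    """Return the words of a clue that a stopword-filtering index would keep.

    min_len is an assumption mirroring MySQL's ft_min_word_len default.
    """
    words = re.findall(r"[a-z']+", clue.lower())
    return [w for w in words if w not in stopwords and len(w) >= min_len]


def needs_exact_match(clue):
    """True when nothing in the clue survives filtering, so a full-text
    search has no terms to work with and a plain string comparison on the
    clue is the only option left (hypothetical fallback)."""
    return not indexable_terms(clue)


for clue in ["Not any", "Not yours", "Fried Cajun side"]:
    print(clue, indexable_terms(clue), needs_exact_match(clue))
```

<p>A check like this could run before the full-text query, so that all-stopword clues get routed to an exact comparison instead.</p>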
<p>In any case, I&#8217;m happy with the results so far.  Once I add some cross-checking logic, I think I might just be left with optimization improvements.  And if it turns out to be decent, I&#8217;d love to add it to Crossword Nexus to allow users to upload .puz files to see how the bot handles them.</p>
<p>Thoughts?</p>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/04/13/fillbot-jr-update/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>FillBot Jr.: Initial Results</title>
		<link>http://alexboisvert.com/musings/2012/04/09/fillbot-jr-initial-results/</link>
		<comments>http://alexboisvert.com/musings/2012/04/09/fillbot-jr-initial-results/#comments</comments>
		<pubDate>Mon, 09 Apr 2012 04:23:26 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[FillBot Jr.]]></category>
		<category><![CDATA[puzzles]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=184</guid>
		<description><![CDATA[If you&#8217;re a regular on this blog, you may remember that I floated the idea of making a crossword-filling algorithm just to see how hard it would be. I wasn&#8217;t sure I&#8217;d ever get around to making it but today &#8230; <a href="http://alexboisvert.com/musings/2012/04/09/fillbot-jr-initial-results/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>If you&#8217;re a regular on this blog, you may remember that I <a href="http://alexboisvert.com/musings/2012/03/25/paging-dr-fill/">floated the idea of making a crossword-filling algorithm</a> just to see how hard it would be.  I wasn&#8217;t sure I&#8217;d ever get around to making it but today I was sick and bedridden and bored (yay?) so I spent a few hours coding something up.  If you&#8217;d like to see more or less what I&#8217;ve done and how it performs on <a href="http://www.brendanemmettquigley.com/2012/04/crossword-424-themeless-monday-results-of-the-ben-tausig-sings-contest.html">a sample themeless puzzle</a>, follow me to the rest of this post.</p>
<p><span id="more-184"></span></p>
<p>Here are the basics of it: MySQL has a terrific full-text match based on the <a href="http://en.wikipedia.org/wiki/Tf-idf">tf*idf algorithm</a> which I&#8217;m using to find good fits for clue/entry pairs.  For instance, if I need a four-letter word for &#8220;Fried Cajun side&#8221; I can run the MySQL query</p>
<pre>
SELECT Entry,
MATCH (
Clue
)
AGAINST (
'Fried Cajun Side'
) AS Score
FROM `Clues`
WHERE MATCH (
Clue
)
AGAINST (
'Fried Cajun Side'
)
AND Entry LIKE '____'
LIMIT 0 , 10
</pre>
<p>and get the following results:</p>
<pre>
Entry 	Score
SLAW 	15.20439338684082
OKRA 	14.877264976501465
OKRA 	9.801589965820312
OKRA 	9.801589965820312
OKRA 	9.801589965820312
OKRA 	9.801589965820312
OKRA 	9.801589965820312
OKRA 	9.801589965820312
ROUX 	9.69262981414795
OKRA 	9.69262981414795
</pre>
<p>(OKRA appears several times because it has matched with several different clues.)  Of course, if you knew even one of the letters, this process would immediately give you the correct answer (OKRA).</p>
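<p>Here is a rough Python sketch of both points: collapsing the repeated rows to one best score per entry, and filtering by a known letter.  The rows are a shortened copy of the table above; the function names and the <code>_</code> pattern syntax are mine, chosen for illustration.</p>

```python
import re

# (entry, score) rows copied from the result table, with repeats trimmed.
rows = [
    ("SLAW", 15.20439338684082),
    ("OKRA", 14.877264976501465),
    ("OKRA", 9.801589965820312),
    ("ROUX", 9.69262981414795),
    ("OKRA", 9.69262981414795),
]


def best_per_entry(rows):
    """Keep the highest score seen for each entry, best first."""
    best = {}
    for entry, score in rows:
        best[entry] = max(score, best.get(entry, score))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


def fits(entry, pattern):
    """pattern uses '_' for an unknown letter, e.g. 'O___'."""
    return re.fullmatch(pattern.replace("_", "."), entry) is not None


ranked = best_per_entry(rows)
print(ranked)
print([entry for entry, _ in ranked if fits(entry, "O___")])
```

<p>With no letters known, SLAW edges out OKRA; with any single letter of OKRA in place, OKRA is the only entry left.</p>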
<p>At each step of the algorithm, I&#8217;m taking these results and adding the top entry&#8217;s &#8220;Score&#8221; to the gap between the top entry and the second-ranked entry, so that a clear winner counts for more than a close call.  The entry with the highest sum gets added to the grid and the process is restarted.  For now I&#8217;m not doing any backtracking; I just want to see how this algorithm will do simply marching ahead and filling in entries.</p>
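<p>In Python, that selection rule might look roughly like the sketch below.  It assumes duplicate entries have already been collapsed to one score each and treats a missing runner-up as a score of 0; the slot names and the second slot&#8217;s scores are made up for illustration.</p>

```python
def confidence(scores):
    """scores: candidate scores for one slot, in any order.

    Returns top score + (top score - runner-up score). With a single
    candidate the runner-up is treated as 0 (an assumption).
    """
    ordered = sorted(scores, reverse=True)
    if not ordered:
        return 0.0
    top = ordered[0]
    runner_up = ordered[1] if len(ordered) > 1 else 0.0
    return top + (top - runner_up)


def pick_next(slot_scores):
    """slot_scores maps a slot name to {entry: score}; returns (slot, entry)."""
    best_slot = max(slot_scores, key=lambda s: confidence(slot_scores[s].values()))
    entries = slot_scores[best_slot]
    return best_slot, max(entries, key=entries.get)


slots = {
    # Rounded scores from the table above.
    "fried-cajun-side": {"SLAW": 15.20, "OKRA": 14.88, "ROUX": 9.69},
    # Made-up scores for a slot with one dominant candidate.
    "hypothetical-slot": {"OMEGA": 12.0, "SAILS": 3.0},
}
print(pick_next(slots))
```

<p>Even though SLAW has the highest raw score, the near-tie with OKRA keeps its sum low, so the slot with the runaway favorite gets filled first.</p>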
<p>Ready?  Let&#8217;s step through it on <a href="http://www.brendanemmettquigley.com/2012/04/crossword-424-themeless-monday-results-of-the-ben-tausig-sings-contest.html">Brendan&#8217;s grid</a> and see how it does.</p>
<p><strong><i>Step 1: 49 ACROSS: Electrical resistance symbol &#8212; OMEGA</i></strong><br />
We&#8217;re off to a good start.  This clue was an exact match for one I had in my database, and was a much better match than the second-best possibility (which turned out to be SAILS, for whatever reason).</p>
<p><strong><i>Step 2: 10 DOWN: 1961 Charlton Heston epic &#8212; ELCID<br />
Step 3: 39 DOWN: Renault vehicle marketed to the U.S. &#8212; LECAR<br />
Step 4: 33 DOWN: North Carolina town or college &#8212; ELON</i></strong><br />
These are all stone-cold gimmes for anyone who does a lot of crosswords, so it&#8217;s nice to see the algorithm doing well on them too.  Nice to see it reject DUKE for 33-Down; the database&#8217;s clue of &#8220;North Carolina college town&#8221; was a great match.</p>
<p><strong><i>Step 5: 57 ACROSS: ___ Center (home of the Nets and Devils) &#8212; IZOD<br />
Step 6: 1 ACROSS: Model airplane material &#8212; BALSA</i></strong><br />
I was shocked that the algorithm got the IZOD Center so quickly, I guess because I didn&#8217;t know this little bit of trivia.  And now that I&#8217;ve learned it, it&#8217;s already out of date, since those teams have moved to the Prudential Center.  Oh well.</p>
<p><strong><i>Step 7: 6 DOWN: Immune system stuff &#8212; SHREDBLOODCELLS</i></strong><br />
Well, we had to mess up eventually.  A human would never ever have entered this into the grid, but the bot sees the clue in its database that reads [Attack the immune system?] and thinks it&#8217;s got a good match.  Maybe I need some logic for when the clue in the database has a question mark?  This seems especially important for long entries.  (And incidentally, I don&#8217;t have WHITEBLOODCELLS in my database, which is shocking, considering it&#8217;s a totally legit 15-letter answer.  I&#8217;ll have to data-mine Brendan&#8217;s puzzle after I&#8217;m done here to add it.)</p>
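<p>One possible shape for that question-mark logic, as a hedged Python sketch: scale the match score down when exactly one of the two clues ends in a question mark.  The function name, the example scores, and the penalty factor of 0.5 are all invented for illustration, not tuned values.</p>

```python
def adjusted_score(query_clue, db_clue, score, penalty=0.5):
    """Scale a match score down when exactly one of the two clues ends in '?'.

    penalty is a made-up factor for illustration, not a tuned value.
    """
    query_is_wordplay = query_clue.rstrip().endswith("?")
    db_is_wordplay = db_clue.rstrip().endswith("?")
    if query_is_wordplay != db_is_wordplay:
        return score * penalty
    return score


# Straight clue matched against a wordplay clue: penalized.
print(adjusted_score("Immune system stuff", "Attack the immune system?", 10.0))
# Both clues are wordplay: left alone.
print(adjusted_score("Clothing joint?", "Clothing joint?", 10.0))
```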
<p><strong><i>Step 8: 59 ACROSS: Pummels &#8212; ROUTS</i></strong><br />
Ah, shoot, we&#8217;ve messed up again even though this one doesn&#8217;t look quite so bad.  The intended answer was PELTS.  There&#8217;s only one five-letter entry in my database with the word &#8220;pummels&#8221; in it, so it gets the green light here.</p>
<p><strong><i>Step 9: 35 DOWN: Clothing joint? &#8212; SEAM<br />
Step 10: 8 DOWN: Miso soup enhancer &#8212; MSG<br />
Step 11: 60 ACROSS: Choosing word &#8212; EENY</i></strong><br />
Yeah, it&#8217;s doing fine on these, but let me show the grid at this point:</p>
<pre>
B A L S A . . S _ M . _ E _ _
_ _ _ _ _ . _ H _ S . _ L _ _
_ _ _ _ _ . _ R _ G _ _ C _ _
_ _ _ _ _ _ _ E _ . _ _ I _ _
. . . _ _ _ _ D _ _ _ _ D _ _
_ _ _ _ _ _ _ B . _ _ _ . . .
_ _ _ _ . . _ L _ _ _ _ _ _ E
_ _ _ . . S _ O _ _ . . _ _ L
_ _ _ _ L E _ O _ . . _ _ _ O
. . . _ E A . D _ _ _ _ _ _ N
_ _ _ _ C M _ C _ _ _ _ . . .
O M E G A . _ E _ _ _ _ _ _ _
_ _ _ _ R _ _ L _ . _ _ _ _ _
I Z O D . _ _ L _ . R O U T S
E E N Y . _ _ S . . _ _ _ _ _
</pre>
<p>Anything jump out at you? Something sure jumps out at me: we&#8217;re hardly crossing any words at all.  We absolutely HAVE to use the information we&#8217;ve got in terms of the crossing words or we&#8217;ll never get anywhere.  The algorithm isn&#8217;t doing that, and definitely needs a push in that direction.</p>
<p>Okay, look at the above grid one more time.  What answer can you put in next, even without looking at the clues?  I see an obvious one: _M_ZE <b>has to</b> be AMAZE, right?  Let&#8217;s see what the bot fills in next.</p>
<p><strong><i>Step 12: 27 ACROSS: Chicago-based insurance company that sponsors Manchester United&#8217;s uniforms &#8212; RIO</i></strong><br />
Okay &#8230; that&#8217;s wrong, first of all, and why didn&#8217;t you fill in AMAZE?</p>
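<p>For what it&#8217;s worth, here is a rough Python sketch of the crossing check the bot is missing: read the letters already in a down slot and keep only the words that fit.  The grid is the one shown above; the word list is a tiny hypothetical stand-in for a real entry database.</p>

```python
import re

GRID = """
B A L S A . . S _ M . _ E _ _
_ _ _ _ _ . _ H _ S . _ L _ _
_ _ _ _ _ . _ R _ G _ _ C _ _
_ _ _ _ _ _ _ E _ . _ _ I _ _
. . . _ _ _ _ D _ _ _ _ D _ _
_ _ _ _ _ _ _ B . _ _ _ . . .
_ _ _ _ . . _ L _ _ _ _ _ _ E
_ _ _ . . S _ O _ _ . . _ _ L
_ _ _ _ L E _ O _ . . _ _ _ O
. . . _ E A . D _ _ _ _ _ _ N
_ _ _ _ C M _ C _ _ _ _ . . .
O M E G A . _ E _ _ _ _ _ _ _
_ _ _ _ R _ _ L _ . _ _ _ _ _
I Z O D . _ _ L _ . R O U T S
E E N Y . _ _ S . . _ _ _ _ _
"""
ROWS = [line.split() for line in GRID.strip().splitlines()]

# Hypothetical mini word list; a real run would query the entry database.
WORDS = ["AMAZE", "OMEGA", "ROUTS", "BALSA", "LECAR"]


def down_pattern(rows, row, col):
    """Collect the cells of the down slot starting at (row, col), 0-indexed."""
    letters = []
    while row < len(rows) and rows[row][col] != ".":
        letters.append(rows[row][col])
        row += 1
    return "".join(letters)


def candidates(pattern, words):
    """Words matching the pattern, where '_' stands for an unknown letter."""
    regex = re.compile(pattern.replace("_", "."))
    return [word for word in words if regex.fullmatch(word)]


pattern = down_pattern(ROWS, 10, 1)
print(pattern, candidates(pattern, WORDS))
```

<p>Feeding a pattern like this into the clue query (much like the <code>LIKE '____'</code> filter earlier) would let the filled-in letters do some of the work.</p>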
<p>So &#8230; this needs work.  Any thoughts on how to make this better?</p>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/04/09/fillbot-jr-initial-results/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Paging Dr. Fill</title>
		<link>http://alexboisvert.com/musings/2012/03/25/paging-dr-fill/</link>
		<comments>http://alexboisvert.com/musings/2012/03/25/paging-dr-fill/#comments</comments>
		<pubDate>Sun, 25 Mar 2012 22:02:22 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[FillBot Jr.]]></category>
		<category><![CDATA[puzzles]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=181</guid>
		<description><![CDATA[I&#8217;m sure most of my readers have heard of Dr. Fill, the Matt Ginsberg solving computer program that competed in this past ACPT. Now Matt is an artificial intelligence expert, so I&#8217;m sure Dr. Fill does about as well as &#8230; <a href="http://alexboisvert.com/musings/2012/03/25/paging-dr-fill/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m sure most of my readers have heard of Dr. Fill, Matt Ginsberg&#8217;s crossword-solving computer program that competed in this past ACPT. Now, Matt is an artificial intelligence expert, so I&#8217;m sure Dr. Fill does about as well as a computer program could at solving crosswords. My question is: how hard would it be to write a computer program that would give about 80% of the solving capability of Dr. Fill? And to make it especially easy on ourselves, let&#8217;s presume we already have a large database of crossword clues and entries, and a relatively fast, effective way of ranking entries with a clue and a letter pattern (e.g. given [Melodic passages] and the letter pattern ?R???? it would return ARIOSI with score 19 and ARIOSO/ARIOSE with score 10). Since I have this setup already, I thought this might be a good starting point. <img src='http://alexboisvert.com/musings/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>What would the algorithm look like from this point? Here are my initial thoughts.</p>
<p><span id="more-181"></span></p>
<p>I think it would need to be a recursive/iterative algorithm. At each step we have a partially filled-in grid. The algorithm would:</p>
<ul>
<li>Go to each position in the grid and apply the above process to get possibilities for that position.  If the database returns no matches, fall back to every entry that fits the letter pattern and assign each of them a low score.</li>
<li>Take the highest-scoring match and place it in the grid.</li>
<li>Repeat with the new grid.</li>
</ul>
<p>If we get to a point where there are no possibilities we have to backtrack.  That subroutine would go to (I guess) the most recently filled-in answer, remove it and go to the next possibility.  Meanwhile, once the grid is completed we stop, unless there&#8217;s some way to &#8220;check&#8221; the grid.</p>
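<p>The steps above, plus the backtracking, can be sketched in Python roughly as follows.  This is a toy under stated assumptions: slots are lists of cell coordinates, candidate lists with scores are supplied up front rather than queried, and the two-slot example with its scores is invented to force one backtrack.</p>

```python
def fits(word, cells, grid):
    """True if word has the right length and agrees with letters already placed."""
    if len(word) != len(cells):
        return False
    return all(grid.get(cell, letter) == letter for cell, letter in zip(cells, word))


def solve(slots, candidates, grid=None):
    """Greedy fill with backtracking.

    slots: {name: [cell, ...]}; candidates: {name: [(word, score), ...]}.
    Returns a {cell: letter} dict, or None if no consistent fill exists.
    Slots completed entirely by crossings are not re-checked (no "check" step).
    """
    grid = {} if grid is None else grid
    open_slots = [name for name in slots if any(cell not in grid for cell in slots[name])]
    if not open_slots:
        return grid  # every cell is filled, so we stop
    options = [
        (score, name, word)
        for name in open_slots
        for word, score in candidates[name]
        if fits(word, slots[name], grid)
    ]
    # Try the highest-scoring placement first, then the next possibility, and so on.
    for score, name, word in sorted(options, reverse=True):
        new_grid = dict(grid)
        new_grid.update(zip(slots[name], word))
        result = solve(slots, candidates, new_grid)
        if result is not None:
            return result
    return None  # dead end: the caller moves on to its next possibility


# Two slots sharing the top-left cell; scores are hypothetical.
SLOTS = {"A": [(0, 0), (0, 1), (0, 2)], "D": [(0, 0), (1, 0), (2, 0)]}
CANDIDATES = {"A": [("CAT", 9), ("OAT", 5)], "D": [("OWL", 8)]}
result = solve(SLOTS, CANDIDATES)
print("".join(result[cell] for cell in SLOTS["A"]), "".join(result[cell] for cell in SLOTS["D"]))
```

<p>Here the top-scoring CAT goes in first, OWL then has nowhere to go, and the routine backs up and ends with OAT crossing OWL.</p>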
<p>Is there a better way to do this, or some tweaks to the above I can add?  I kind of want to do this to see how well it would perform.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/03/25/paging-dr-fill/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Viewing your permissions on SQL Server</title>
		<link>http://alexboisvert.com/musings/2012/03/20/viewing-your-permissions-on-sql-server/</link>
		<comments>http://alexboisvert.com/musings/2012/03/20/viewing-your-permissions-on-sql-server/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 17:04:48 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[coding]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=176</guid>
		<description><![CDATA[I wanted to see my permissions on SQL Server and Googling it was proving very hard. So I&#8217;m adding this here in case someone might find it useful. This will look at all the databases on the current server and &#8230; <a href="http://alexboisvert.com/musings/2012/03/20/viewing-your-permissions-on-sql-server/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I wanted to see my permissions on SQL Server and Googling it was proving very hard.  So I&#8217;m adding this here in case someone might find it useful.  This will look at all the databases on the current server and list all of your permissions for each.  As a bonus, it will list the current server as well.</p>
<pre>
-- Name of the local server (server_id 0 in sys.servers)
DECLARE @myserver sysname;
SET @myserver = (
        SELECT
        	s.name
        FROM
        	sys.servers s
        WHERE
        	s.server_id = 0
    );

-- One row per database-level permission the current user holds in each database
SELECT
	@myserver as 'Server'
,   d.name AS 'Database'
,   fbp.permission_name AS Permission
FROM
	(
	    SELECT
	    	'DATABASE' AS mytype
	    ,   *
	    FROM
	    	sys.databases
	) d
       JOIN sys.fn_builtin_permissions(null) fbp
            ON  d.mytype = fbp.class_desc
WHERE
	HAS_PERMS_BY_NAME(QUOTENAME(d.name) , 'database' , fbp.permission_name) = 1
ORDER BY
	d.name
</pre>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/03/20/viewing-your-permissions-on-sql-server/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Wiki Ranking II &#8211; Lessons for next time</title>
		<link>http://alexboisvert.com/musings/2012/03/16/wiki-ranking-ii-lessons-for-next-time/</link>
		<comments>http://alexboisvert.com/musings/2012/03/16/wiki-ranking-ii-lessons-for-next-time/#comments</comments>
		<pubDate>Fri, 16 Mar 2012 04:17:05 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[coding]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=166</guid>
		<description><![CDATA[All right, the new Wikipedia sort is live on CrosswordNexus.com and this time I am allowing users to download the original list to play with offline. Now that I&#8217;ve played with it a bit, I have some ideas for next &#8230; <a href="http://alexboisvert.com/musings/2012/03/16/wiki-ranking-ii-lessons-for-next-time/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>All right, the new Wikipedia sort <a href="http://crosswordnexus.com/wiki.php">is live on CrosswordNexus.com</a> and this time I am allowing users to download the original list to play with offline. Now that I&#8217;ve played with it a bit, I have some ideas for next time that I&#8217;m going to gather here. If you have some ideas too, feel free to chip in.</p>
<p>First observation: I don&#8217;t like the distribution of scores. I actually made a histogram of the distribution which you can see here:</p>
<div id="attachment_167" class="wp-caption aligncenter" style="width: 630px"><a href="http://alexboisvert.com/musings/wp-content/uploads/2012/03/WikiHistogram.png"><img class="size-large wp-image-167" title="WikiHistogram" src="http://alexboisvert.com/musings/wp-content/uploads/2012/03/WikiHistogram-1024x509.png" alt="Histogram of Wikipedia rankings" width="620" height="308" /></a><p class="wp-caption-text">Click to embiggen</p></div>
<p><span id="more-166"></span></p>
<p>Looks okay, right? Except &#8230; the vast majority of the useful entries are clustered in the 97-100 range. That&#8217;s way too tight a range. Also, no one will care at all about pretty much anything scored 50 or below. That&#8217;s way too loose a range for the junk. So next time I need to tailor the histogram to make the bars on the right smaller and the bars on the left bigger. Of course, this will test my sorting algorithm &#8212; if it ranks things wrong there will be a bigger gap.</p>
<p>Second: Some things that we don&#8217;t want get too high a ranking. <a href="http://alexboisvert.com/musings/2012/03/08/ranking-wikipedia-pages/">I mentioned in the last post</a> that Wikipedia seems to have a bias toward geographical sites. But it also has a bias toward things Americans tend not to care about &#8230; notably, soccer. Romario, Ruud van Nistelrooy and Guus Hiddink all rate 100 on the site, and there&#8217;s no way they will ever appear in an American crossword.</p>
<p>Is there a fix for this? Not really &#8230; unless we cheat a little bit. One bit of information I didn&#8217;t use when making the current rankings is the list of categories assigned to each article. So I can simply add some logic in the code that says something like &#8220;If the article has a category that starts with &#8216;Cities&#8217;, multiply the page length by 0.8, and if it has a category that ends with &#8216;footballers&#8217;, multiply the page length by 0.5.&#8221; The intent is to bring these articles to a length that an American would assign to them if he were making the page. Will it hurt some legitimate articles? Of course. But it will help weed out some of the chaff as well.</p>
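<p>As a rough Python sketch of that category rule (the 0.8 and 0.5 multipliers are the ones floated above; applying both when both match, and the example category names, are my assumptions):</p>

```python
# (test, multiplier) pairs taken from the rule of thumb in the post.
RULES = [
    (lambda category: category.startswith("Cities"), 0.8),
    (lambda category: category.endswith("footballers"), 0.5),
]


def adjusted_length(page_length, categories):
    """Scale page_length once for each rule that any category triggers."""
    length = float(page_length)
    for matches, multiplier in RULES:
        if any(matches(category) for category in categories):
            length *= multiplier
    return length


# Hypothetical category lists for three articles of equal raw length.
print(adjusted_length(10000, ["Cities in France"]))
print(adjusted_length(10000, ["Dutch footballers", "1976 births"]))
print(adjusted_length(10000, ["American musicians"]))
```

<p>Keeping the rules in a list makes it easy to add or retune multipliers without touching the ranking code itself.</p>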
<p>Thoughts? Have you been playing with it any?</p>
<p>P.S. My son is doing very well. His cancer is in remission but he is at a high risk of infection, so we are being very careful. And right now he is in my arms and GO TO SLEEP WHY AREN&#8217;T YOU GOING TO SLEEP?!?</p>
<p>P.P.S. Have fun at the ACPT everyone!</p>
<p>P.P.P.S. &#8220;Weed out the chaff&#8221; is a terrible mixed metaphor.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/03/16/wiki-ranking-ii-lessons-for-next-time/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ranking Wikipedia Pages</title>
		<link>http://alexboisvert.com/musings/2012/03/08/ranking-wikipedia-pages/</link>
		<comments>http://alexboisvert.com/musings/2012/03/08/ranking-wikipedia-pages/#comments</comments>
		<pubDate>Thu, 08 Mar 2012 04:34:48 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[coding]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=156</guid>
		<description><![CDATA[The most interesting part of my Crossword Nexus website is the Wikipedia Regex search, and the most interesting part of that is the ordering of results. I didn&#8217;t want it to return results alphabetically &#8212; I wanted it to return &#8230; <a href="http://alexboisvert.com/musings/2012/03/08/ranking-wikipedia-pages/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The most interesting part of my <a href="http://crosswordnexus.com">Crossword Nexus</a> website is the Wikipedia Regex search, and the most interesting part of that is the ordering of results.  I didn&#8217;t want it to return results alphabetically &#8212; I wanted it to return results based on relevance.  But how can you automatically determine if a Wikipedia article is relevant?  Well, the method I implemented was ordering by inlinks.  The more links there were to an article, the more interesting it should be, right?</p>
<p>For the most part, this works pretty well.  But let&#8217;s ask the site for the best results of the form <strong>??E?L??</strong>.</p>
<p>Before we go on, try to think of some good crossword entries that would fit this pattern, preferably ones with Wikipedia pages.<br />
<span id="more-156"></span></p>
<p>Let&#8217;s see &#8230; there&#8217;s Kremlin and gremlin, Sheila E., The Blob, shellac &#8230; quite a few options.  What are the top 10 results returned from Crossword Nexus?</p>
<p><a href="http://en.wikipedia.org/wiki/Clenleu">Clenleu</a><br />
<a href="http://en.wikipedia.org/wiki/Breilly">Breilly</a><br />
<a href="http://en.wikipedia.org/wiki/Kremlin">Kremlin</a><br />
<a href="http://en.wikipedia.org/wiki/Fresles">Fresles</a><br />
<a href="http://en.wikipedia.org/wiki/Creully">Creully</a><br />
<a href="http://en.wikipedia.org/wiki/Treclun">Treclun</a><br />
<a href="http://en.wikipedia.org/wiki/Bresles">Bresles</a><br />
<a href="http://en.wikipedia.org/wiki/Rieulay">Rieulay</a><br />
<a href="http://en.wikipedia.org/wiki/Clesles">Clesles</a><br />
<a href="http://en.wikipedia.org/wiki/Treslon">Treslon</a></p>
<p>That&#8217;s &#8230; extremely ugly.  Wait, what the heck are those things even?  Communes in France, are you kidding me?  Is something messed up?</p>
<p>No, nothing&#8217;s messed up.  Take a look at <a href="http://en.wikipedia.org/wiki/Clenleu">Clenleu</a>&#8217;s page.  See all those links at the bottom (click on &#8220;Show&#8221; on the &#8220;Communes of the Pas-de-Calais department&#8221; tab)?  Yeah, all of those pages in turn link back to Clenleu.  In fact, <a href="http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Clenleu&#038;limit=500">about 900 pages</a> in all link to Clenleu, which kills the results.</p>
<p>Luckily, there&#8217;s an easy way around this.  If we only count links in the article text itself, we can skip all of those useless links.  If we do that, Clenleu only has (drum roll please) 3 inlinks.  So that takes care of that.  Here are the new top 10 results with this method:</p>
<pre>
Kremlin 396
Siedlce 239
Heerlen 226
Shellac 160
Sheila E. 139
Peebles 137
Ixelles 133
Breclav 88
Feedlot 82
Preslav 72
</pre>
<p>Ahhhh, better.  Still no &#8220;gremlin&#8221; or &#8220;The Blob&#8221; but those finished 12th and 13th respectively.  So that&#8217;s it, right?  We&#8217;ve got our new ranking metric?</p>
<p>Well, hold on, why stop there?  There might be other good possibilities, too.  Let&#8217;s look at length of article, number of languages an article is translated into, and recency of last edit.  Maybe these have some value too.</p>
<p>Length of article:</p>
<pre>
MHealth 55963
The Play 21931
Tien len 21263
Gremlin 17843
The Kliq 17477
Shellac 16519
Heerlen 16194
Sheila E. 14344
Eden Log 13981
Ixelles 13020
</pre>
<p>Uh, MHealth?  I guess <a href="http://en.wikipedia.org/wiki/MHealth">it&#8217;s a thing.</a>  People like to write about it, at least.  &#8220;The Play&#8221; didn&#8217;t mean anything to me until I noticed it was referring to the &#8220;The band is on the field!&#8221; play.  This is pretty good, too.</p>
<p>Next up, number of translations:</p>
<pre>
Kremlin 38
Apelles 34
Heerlen 31
Ixelles 29
Sterlet 27
Shellac 26
Siedlce 25
Buellas 25
Preslav 24
Usellus 23
</pre>
<p>This &#8230; is not so great.  I fear that things translated into many languages might be mostly geographical sites.</p>
<p>I have high hopes for recency of last edit.  Let&#8217;s take a look:</p>
<pre>
O'Neills 1325721157
Kvevlax 1325693146
Alex Lee 1325693087
The Kliq 1325642247
Sheila E. 1325640534
Buellas 1325616464
The Flow 1325616163
Poe's law 1325612070
Rieulay 1325611112
Uxelles 1325605593
</pre>
<p>Oh, no, this is the worst one yet.  And what is so important about Buellas that it&#8217;s translated into so many languages and updated so frequently?</p>
<p>Based on these results, I&#8217;m thinking of going with a metric that&#8217;s 70% inlinks and 30% article size.  But what do you think?  Should I exclude the other two completely?  I&#8217;d like to get a little feedback before going live.  Thanks!</p>
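<p>For anyone who wants to play along, here is one way a 70/30 blend could be computed in Python.  Inlinks and article length live on very different scales, so this sketch divides each by the largest value in the candidate set before weighting; that normalization is my assumption for illustration, and a different scaling will give a different order.  The numbers are the ones listed above for the four entries that appear in both tables.</p>

```python
def combined_scores(pages, w_inlinks=0.7, w_length=0.3):
    """pages maps title -> (inlinks, length); returns title -> blended score.

    Each metric is divided by its maximum over the candidate set first
    (an assumed normalization, not a documented one).
    """
    max_inlinks = max(inlinks for inlinks, _ in pages.values()) or 1
    max_length = max(length for _, length in pages.values()) or 1
    return {
        title: w_inlinks * inlinks / max_inlinks + w_length * length / max_length
        for title, (inlinks, length) in pages.items()
    }


# (inlinks, article length) pairs from the two lists above.
pages = {
    "Shellac": (160, 16519),
    "Heerlen": (226, 16194),
    "Sheila E.": (139, 14344),
    "Ixelles": (133, 13020),
}
scores = combined_scores(pages)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```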
<p><strong>UPDATE (3/7/2012 9:31 PM)</strong> In case you were curious how the 70/30 split would look, here are the first few:</p>
<pre>
Sheila E.
Shellac
Heerlen
Siedlce
Ixelles
Gremlin
Apelles
Kremlin
Peebles
Feedlot
The Play
The Bled
EHealth
Preslav
The Blob
Breclav
Abe clan
Wheelie
</pre>
<p>That&#8217;s pretty good.  Cities and communes and the like are simply overrated by Wikipedia, so I&#8217;m not sure we could ever get rid of things like Heerlen and Siedlce and Ixelles.  Thoughts?</p>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/03/08/ranking-wikipedia-pages/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Life changes</title>
		<link>http://alexboisvert.com/musings/2012/02/15/life-changes/</link>
		<comments>http://alexboisvert.com/musings/2012/02/15/life-changes/#comments</comments>
		<pubDate>Wed, 15 Feb 2012 05:12:02 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[miscellaneous]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=153</guid>
		<description><![CDATA[This past Wednesday my two-year-old son was diagnosed with leukemia. At this age the chances of a cure are surprisingly high and the doctors have been extremely pleased with his progress so far. Still, it is an incredibly scary &#8230; <a href="http://alexboisvert.com/musings/2012/02/15/life-changes/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><iframe width="420" height="315" src="http://www.youtube.com/embed/1Mq14yyBH60" frameborder="0" allowfullscreen></iframe></p>
<p>This past Wednesday my two-year-old son was diagnosed with leukemia. At this age the chances of a cure are surprisingly high and the doctors have been extremely pleased with his progress so far. Still, it is an incredibly scary process which is draining in so many ways.</p>
<p>I have been hesitant to write about this, but I now think it is necessary to do so. It doesn&#8217;t hurt to raise a little awareness about cancer. And writing this now will allow me to write more about it in the coming weeks, months and years as his treatment progresses.  I&#8217;m sure I will have lots to say on the subject.</p>
<p>Many people have asked how they can help. I would ask you to consider giving blood or (especially) platelets. A timely transfusion of platelets early on in the process may have saved my son&#8217;s life. If you are eligible and can find the time, please consider donating.</p>
<p>Also: I don&#8217;t know what this will do to my schedule quite yet, but I am guessing that a lot of non-essential activities will fall by the wayside. Don&#8217;t expect much from me on Twitter or here for a little while. I also expect <a href="http://twitter.com/fakewillshortz" title="@FakeWillShortz">@FakeWillShortz</a> to slow down his rate of tweeting, though he&#8217;ll be taking on some new writers soon.</p>
<p>Thanks for reading.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/02/15/life-changes/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Crossword Butler</title>
		<link>http://alexboisvert.com/musings/2012/01/23/crossword-butler/</link>
		<comments>http://alexboisvert.com/musings/2012/01/23/crossword-butler/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 04:56:34 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[puzzles]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=150</guid>
		<description><![CDATA[I am pleased to announce the launch of Crossword Butler v. 2.0 (let&#8217;s call it beta 1). Those of you who remember the original Crossword Butler will notice some differences from the old version, namely: 1. It is now an &#8230; <a href="http://alexboisvert.com/musings/2012/01/23/crossword-butler/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I am pleased to announce the launch of Crossword Butler v. 2.0 (let&#8217;s call it beta 1).  Those of you who remember the original Crossword Butler will notice some differences from the old version, namely:</p>
<p>1. It is now an online interface, accessible at <a href="http://crosswordbutler.com">http://crosswordbutler.com</a><br />
2. It doesn&#8217;t blindly get all available crosswords; only the ones from providers who have given their permission<br />
3. Most interestingly (I think), it gives independent crossword constructors a site to host their puzzles and distribute them easily for free or for profit.</p>
<p>Right now the site is essentially a souped-up puzzle pointers page that, in addition to most of the standard links, also links to puzzles like Matt Gaffney&#8217;s weekly contest and Andrew Ries&#8217;s weekly Rows Garden puzzle.  But the fun part will begin when independent constructors start using the site to host their puzzles.  Right now there isn&#8217;t a one-stop shopping destination for independent puzzlemakers, so constructors have to set up PayPal accounts, websites, e-mail lists, etc. to be able to distribute their work.  Crossword Butler will take care of all of that; all a constructor has to do is upload the puzzles to the site (through a simple interface).  If the puzzle is paid, I will take a (small) cut; if the puzzle is free, I&#8217;ll happily host it free of charge.</p>
<p>Feel free to <a href="http://crosswordbutler.com">become a member of the site</a> and poke around.  If you have suggestions, comments, or questions, leave a comment below!</p>
<p>Thanks for reading.  This is my grand experiment this year &#8212; hope it works.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/01/23/crossword-butler/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Ben Sheets vs. Barry Zito</title>
		<link>http://alexboisvert.com/musings/2012/01/20/ben-sheets-vs-barry-zito/</link>
		<comments>http://alexboisvert.com/musings/2012/01/20/ben-sheets-vs-barry-zito/#comments</comments>
		<pubDate>Fri, 20 Jan 2012 18:01:55 +0000</pubDate>
		<dc:creator>Alex</dc:creator>
				<category><![CDATA[baseball]]></category>

		<guid isPermaLink="false">http://alexboisvert.com/musings/?p=147</guid>
		<description><![CDATA[When I read Moneyball a while back, it was already a few years old, and already seemed it (a whole chapter devoted to Scott Hatteberg?!) I don&#8217;t remember all of it, but one part in particular stood out for me &#8230; <a href="http://alexboisvert.com/musings/2012/01/20/ben-sheets-vs-barry-zito/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When I read Moneyball a while back, it was already a few years old, and already seemed it (a whole chapter devoted to Scott Hatteberg?!)  I don&#8217;t remember all of it, but one part in particular stood out for me at the time:</p>
<blockquote><p>&#8220;The hardest thing,&#8221; says Billy [Beane], &#8220;is there is a certain pride, or lack of pride, required to do this right.  You take a guy high no one else likes and it makes you uncomfortable.  But I mean, really, who gives a f**k where guys are taken?  Remember Zito?  Everyone said we were nuts to take Zito with the ninth pick of the draft.  And we <em>knew</em> everyone was going to say that.  One f**king month later it&#8217;s clear we kicked everyone&#8217;s ass.&#8221;</p>
<p>&#8230;</p>
<p>A lot of people in the room have forgotten that the scouting department hadn&#8217;t wanted to take Barry Zito because Barry Zito threw an 88-mph fastball.  They preferred a flamethrower named Ben Sheets.  &#8220;Billy [Beane] made us take Zito,&#8221; Bogie [a scout] later confesses.</p></blockquote>
<p>Well, now that both guys&#8217; careers are essentially over, we can ask the question: who had the better career, Zito or Sheets?  Did Billy really kick everyone&#8217;s ass with that pick?  Or were the scouts right?</p>
<p><span id="more-147"></span></p>
<p>I have chosen to measure the careers by fWAR (FanGraphs Wins Above Replacement) because I think that&#8217;s probably the best way.  (If you want to do it some other way, be my guest, but bear with me for this part.)  Now, before I reveal who has accumulated more WAR over his career, what do you think the answer will be, based on what you know about both careers?</p>
<p>Personally, my guess was Zito.  I didn&#8217;t follow either guy especially closely during their careers, but I recall that while Sheets probably had the higher peak, Zito was much, much more durable, and he wasn&#8217;t so bad himself, even winning a Cy Young award along the way.  Sure, he&#8217;s been awful since joining the Giants, but he&#8217;s still been above replacement level.  So I would have guessed Zito would have accumulated more WAR, maybe about 15% more.</p>
<p>Well, here&#8217;s what FanGraphs says &#8212; Zito: 30.8 WAR.  Sheets: 31.7.</p>
<p>So, yeah.  Sheets&#8217;s peak was WAY higher than Zito&#8217;s.  And it actually gets better &#8212; the FANS prediction for Zito this year is 0.9 WAR, which means he and Sheets would be exactly even, at 31.7 WAR apiece, at year&#8217;s end.</p>
<p>&#8220;But wait,&#8221; you might say.  &#8220;It&#8217;s not fair to compare their entire careers.  The A&#8217;s would only have had control over their pick for six years, before he hit free agency.  Who accumulated more WAR over those first six years?&#8221;  Good point, theoretical reader.  Let&#8217;s take a look: during Zito&#8217;s time with the A&#8217;s, he accumulated 24.2 WAR.  Meanwhile, in Sheets&#8217;s first six years with the Brewers* he accumulated 24.5 WAR.</p>
<p><em>* I&#8217;m not sure when he would have first hit free agency &#8212; the Brewers bought out at least one year of it with a contract extension.  This number might actually be higher.</em></p>
<p>So to recap: the A&#8217;s didn&#8217;t exactly &#8220;kick everyone&#8217;s ass&#8221; with this pick.  If anything, it was a wash.  But there is an amusing postscript to the story &#8212; the A&#8217;s did end up signing Ben Sheets to a one-year contract for $10 million at the end of his career.  He gave them 0.6 WAR for their efforts.  Guess they just picked him up at the wrong time.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexboisvert.com/musings/2012/01/20/ben-sheets-vs-barry-zito/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

