<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dragons in the Algorithm</title>
	<atom:link href="http://mcherm.com/feed" rel="self" type="application/rss+xml" />
	<link>http://mcherm.com</link>
	<description>Adventures in Programming</description>
	<lastBuildDate>Tue, 08 Jun 2010 18:54:48 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Viewing a dependency tree in Maven</title>
		<link>http://mcherm.com/permalinks/1/viewing-a-dependency-tree-in-maven</link>
		<comments>http://mcherm.com/permalinks/1/viewing-a-dependency-tree-in-maven#comments</comments>
		<pubDate>Tue, 08 Jun 2010 18:54:48 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=477</guid>
		<description><![CDATA[To find out which dependencies you are getting and where they come from, run &#8220;mvn dependency:tree&#8221;. To send the output to a file, use &#8220;mvn dependency:tree -DoutputFile=file&#8221;.
]]></description>
			<content:encoded><![CDATA[<p>To find out which dependencies you are getting and where they come from, run &#8220;mvn dependency:tree&#8221;. To send the output to a file, use &#8220;mvn dependency:tree -DoutputFile=file&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/viewing-a-dependency-tree-in-maven/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Election Guide, May 2010</title>
		<link>http://mcherm.com/permalinks/1/election-guide-may-2010</link>
		<comments>http://mcherm.com/permalinks/1/election-guide-may-2010#comments</comments>
		<pubDate>Mon, 17 May 2010 02:03:17 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Politics]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=470</guid>
		<description><![CDATA[This coming Tuesday, we have primary elections. I have been doing my research on the candidates for the various races &#8212; all primary elections, and I am registered as a Democrat. I will summarize the results of that research here along with my endorsements and intended votes.
Race #1: Governor:
Dan Onorato
Dan is the front-runner for the [...]]]></description>
			<content:encoded><![CDATA[<p>This coming Tuesday, we have primary elections. I have been doing my research on the candidates for the various races &#8212; all primary elections, and I am registered as a Democrat. I will summarize the results of that research here along with my endorsements and intended votes.<span id="more-470"></span></p>
<p>Race #1: Governor:</p>
<p style="padding-left: 30px;"><a href="http://www.voteonorato.com/">Dan Onorato</a></p>
<p style="padding-left: 60px;">Dan is the front-runner for the governor&#8217;s race: polls suggest that he has all but clinched the Democratic nomination. One of his strongest advantages is that polls show him doing well against the likely Republican nominee. Dan is a former county executive who appears, based on his position papers and newspaper articles, to be a reasonable, competent man whom I would be happy to support for governor.</p>
<p style="padding-left: 30px;"><a href="http://joehoeffel2010.com/">Joe Hoeffel</a></p>
<p style="padding-left: 60px;">Joe Hoeffel is an unabashed liberal (sorry: conservatives have made that a bad word, so now they call them &#8220;progressives&#8221;). His publicly taken positions include support for gay marriage and imposing a progressive income tax. What can I say&#8230; I liked most all of his positions.</p>
<p style="padding-left: 30px;"><a href="http://www.williams4governor.com/">Anthony Williams</a></p>
<p style="padding-left: 60px;">I advise anyone NOT to vote for Mr. Williams. It appears to me that he came from nowhere with significant amounts of funding from what was basically a single rich source. And his major issue appears to be school vouchers: moving money out of public schools when students go to private schools.</p>
<p style="padding-left: 30px;"><a href="http://www.jackwagner.org/">Jack Wagner</a></p>
<p style="padding-left: 60px;">I learned much less about Mr. Wagner in my research. He presented himself rather well in the debate, but did not (in my mind) distinguish himself.</p>
<p style="padding-left: 30px;">I will be voting for <strong>*Joe Hoeffel*</strong>. I agree with his positions on basically every issue for which he had a position paper &#8212; including some which were rather controversial. And since it appears that Dan Onorato has the nomination sewn up, I feel I can vote my heart rather than trying to vote strategically.</p>
<p>Race #2: Lieutenant Governor:</p>
<p style="padding-left: 30px;"><a href="http://www.saidel2010.com/">Jonathan Saidel</a></p>
<p style="padding-left: 60px;">Jonathan Saidel, former Philadelphia comptroller, has political experience, money and Democratic endorsements.</p>
<p style="padding-left: 30px;"><a href="http://www.scottconklin.net/">Scott Conklin</a></p>
<p style="padding-left: 60px;">Scott Conklin is a state House representative from a rural area. He seems generally to be thought well of, and he secured the <a href="http://www.philly.com/inquirer/opinion/20100510_Editorial__Worthy_candidates.html">endorsement</a> of the Philadelphia Inquirer.</p>
<p style="padding-left: 30px;"><a href="http://www.smithribnerforlieutenantgovernor.com/">Doris Smith-Ribner</a></p>
<p style="padding-left: 60px;">Doris Smith-Ribner is a former judge who is running for the office. Her campaign seems less &#8220;professional&#8221; than the others.</p>
<p style="padding-left: 30px;"><a href="http://www.pittsburghcitypaper.ws/gyrobase/Content?oid=oid%3A78823">This article</a>, from the Pittsburgh City Paper, seemed to lay it out best: Smith-Ribner is not a serious contender, Conklin may be up-and-coming, but Saidel is tried-and-true and will be an asset to the ticket. I&#8217;m planning to vote for <strong>*Jonathan Saidel*</strong>, but wouldn&#8217;t be disappointed if Conklin won.</p>
<p>Race #3: US Senator: This may be the most important race on the ballot!</p>
<p style="padding-left: 30px;"><a href="http://www.specter2010.com/">Arlen Specter</a></p>
<p style="padding-left: 60px;">Look, if you don&#8217;t know who Arlen Specter is, then you shouldn&#8217;t be voting. He has been a Senator from PA for <em>30 years!</em> Of course, he was Republican for all but the last year or so. Here&#8217;s the deal: I always respected Specter as a member of the &#8220;other side&#8221; who was willing to be reasonable and to cross party lines. He says that he changed parties to have a chance of getting re-elected (I believe this) and he also says that the Republican party moved to the right rather than he moving to the left (I believe this too). And I strongly want to encourage moderates to leave the Republicans and join the Democratic &#8220;big tent&#8221;. So I decided long ago to support Arlen Specter.</p>
<p style="padding-left: 30px;"><a href="http://joesestak.com/Home/Home.html">Joe Sestak</a></p>
<p style="padding-left: 60px;">If you are not from my district, you may not know Joe Sestak. He&#8217;s a former 3-star Navy Admiral who went into politics in the US House from my district. He has his flaws (he&#8217;s more conservative on some issues than I would like, and he seems to have a bossy personal style with his subordinates), but he is very smart, very reasonable, and rather charismatic. He represents the up-and-coming future of the Democratic party.</p>
<div style="padding-left: 30px;">This was the hardest decision of all. Like I said, I decided long ago to support Arlen Specter. But over time I began to have second thoughts about this decision. When both Specter AND Sestak engaged in some nasty negative campaigning it didn&#8217;t help <em>either</em> one&#8217;s case (note to candidates: if one of you had kept out of the mud, you would have earned my vote). But <strong>six years</strong> is a very long time: too long, I decided finally, to give to someone just as a prize for switching parties. No one should be &#8220;entitled&#8221; to a senate seat because they&#8217;ve been there for a long time. So I am planning to vote for <strong>*Joe Sestak*</strong> &#8211; I really do think he represents the future.</div>
<p>All other races in my district are uncontested. I&#8217;ll happily vote for anyone brave enough to put their name on the ballot.</p>
<p><strong>Late Breaking Correction:</strong></p>
<p>I just got back from the polls, and it turns out there is another contested item on the ballot: selection of 5 men and 5 women for the &#8220;democratic committee&#8221; (what&#8217;s that?). There are just 5 women running, but 8 men. I didn&#8217;t do any research on this, so you&#8217;re on your own. (The Democratic Party has official endorsements.)</p>
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/election-guide-may-2010/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Petitioning the FCC on Net Neutrality</title>
		<link>http://mcherm.com/permalinks/1/petitioning-the-fcc-on-net-neutrality</link>
		<comments>http://mcherm.com/permalinks/1/petitioning-the-fcc-on-net-neutrality#comments</comments>
		<pubDate>Sat, 03 Apr 2010 22:46:42 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Politics]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=465</guid>
		<description><![CDATA[I sent the following message to the FCC, which is currently accepting public input prior to promulgating new rules on &#8220;Net Neutrality&#8221;.
To the FCC:
Some form of &#8220;Net Neutrality&#8221; is essential, and it is up to the governing agencies to determine what form and how.
The general principle holds that government should regulate as little as possible [...]]]></description>
			<content:encoded><![CDATA[<p>I sent the following message to the FCC, which is currently accepting public input prior to promulgating new rules on &#8220;Net Neutrality&#8221;.<span id="more-465"></span></p>
<blockquote><p>To the FCC:</p>
<p>Some form of &#8220;Net Neutrality&#8221; is essential, and it is up to the governing agencies to determine what form and how.</p>
<p>The general principle holds that government should regulate as little as possible in order to allow citizens to innovate. However, there are some places where that principle must give way to other important principles. The building and deploying of a worldwide network is a vital public good, but because of very strong network effects (please pardon the pun) the network itself will tend to be run by a very small number of large corporations.</p>
<p>Soon, that network will transport nearly all communications in this country. It will be (already is!) vital to citizens&#8217; ability to communicate: to speak with each other, to associate freely, to petition their government, and so forth. It will also be (already is!) vital to business: many businesses exist solely on the internet (YouTube, online retailers, eBay sellers, iPhone applications) and even more physical industries rely on the internet for communications.</p>
<p>The important point is this: the small number of companies that are responsible for the infrastructure of the internet do not have the ability to completely silence someone: if AT&#038;T or Verizon were to cut off internet access for anyone who took a particular political position the resulting hue and cry would lead to legislation to prevent such behavior. But they WILL have &#8212; in fact, already DO have &#8212; the ability to have a more subtle effect.</p>
<p>Companies could, with very plausible technical excuses, provide a small degree of preference for one customer over another. Perhaps they would deliver video slightly faster (so it works in realtime) for a major studio who paid them but not for the independent film producer. Perhaps they would block certain protocols that allowed anonymous communication while allowing others (this has already happened). The point is that the fundamental ability to communicate is too important to be allowed in the hands of a small number of companies, no matter how well intentioned.</p>
<p>This, then, is one of those rare situations where minimal government regulation IS called for. Rules are needed which will still permit companies to manage their traffic, but will prohibit them from discriminating on the basis of the origin, destination, format, or content of that traffic. And now is the time for such rules to be put in place.</p>
<p>Michael Chermside
</p></blockquote>
<p>If you feel similarly, I would encourage you to <a href="https://secure.freepress.net/site/Advocacy?cmd=display&#038;page=UserAction&#038;id=439">send your own message</a>. But hurry. The last day on which they will accept public comment is April 8th.</p>
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/petitioning-the-fcc-on-net-neutrality/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Logging APIs &#8211; Evaluating Options</title>
		<link>http://mcherm.com/permalinks/1/logging-apis-evaluating-options</link>
		<comments>http://mcherm.com/permalinks/1/logging-apis-evaluating-options#comments</comments>
		<pubDate>Tue, 09 Feb 2010 13:30:58 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=381</guid>
		<description><![CDATA[In my previous post, I defined a number of different features that logging libraries could have. This time, I will evaluate some Java libraries based on those features. I&#8217;ll start by ranking these according to how important I think they are, at least for my purposes.

Severity &#8211; mandatory: no logging system should be without this
Tree [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://mcherm.com/permalinks/1/logging-apis-feature-list">my previous post</a>, I defined a number of different features that logging libraries could have. This time, I will evaluate some Java libraries based on those features. <span id="more-381"></span>I&#8217;ll start by ranking these according to how important I think they are, at least for my purposes.</p>
<ol>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#severity">Severity</a> &#8211; <em>mandatory</em>: no logging system should be without this</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#tree">Tree of Log Topics</a> &#8211; <em>mandatory</em>: no logging system should be without this</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#configurable">Configurable</a> &#8211; <em>vital</em>: configuring log levels at runtime is something we use often</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#rotating">Rotating Log Files</a> &#8211; <em>vital</em>: our log files would be too big for the OS without this</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#lineformat">Configurable Log Line Format</a> &#8211; <em>vital</em>: it is unlikely that the off-the-shelf fields would be the ones we want to use</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#exceptions">Logging of Exceptions</a> &#8211; <em>vital</em>: getting stack traces from logs is one of our most productive debugging techniques</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#delayed">Delayed String Construction</a> &#8211; <em>vital</em>: I consider this to be a very undervalued feature. Without it, software <em>will</em> be slowed significantly and will also be less readable.</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#locations">Log to Multiple Locations</a> &#8211; <em>desirable</em>: sometimes this is handy. We used to use it, but at the moment we don&#8217;t.</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#directed">Configure where Logs are Directed</a> &#8211; <em>desirable</em>: this, too, we have used in the past but are not using right now.</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#standard">Standard or Widely Used</a> &#8211; <em>desirable</em>: in the Java world, only Log4J (the most widely used library) and java.util logging (which is in the standard library) qualify.</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list">Unique Messages</a> &#8211; <em>desirable</em>: as suggested in the comments, an ability to identify each log usage uniquely (probably by source file and line number) would be handy.</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#fallbacks">Sensible Fallbacks</a> &#8211; <em>desirable</em>: it&#8217;s nice that the library works OK when your config fails, because it helps in debugging the config problem.</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#queued">Logging Queued to Avoid Delays</a> &#8211; <em>nice extra</em>: although this seems like it would be a very useful feature, I do not know of any serious production logging tools that implement it.</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#threadlocal">Threadlocal Context Data</a> &#8211; <em>nice extra</em>: theoretically, this is extremely useful. In practice, people usually live without it and find the data by reading back through the log.</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#filtering">Log Filtering</a> &#8211; <em>nice extra</em>: if this were easy, we would use it to filter out SSNs and passwords from our logs, but we can live without it.</li>
<li><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#internationalization">Internationalization</a> &#8211; <em>undesirable</em>: I recommend against ever using this. Use logging ONLY for developers, NOT for end users; all developers should speak the same language.</li>
</ol>
<p>Next, I will assemble a list of different logging libraries in Java to be evaluated.</p>
<ul>
<li><strong><a href="http://logging.apache.org/log4j/">Log4J</a></strong>: Log4J by Apache is the most widely used logging framework in Java.</li>
<li><strong><a href="http://java.sun.com/javase/6/docs/api/index.html?java/util/logging/package-summary.html">Java util logging</a></strong>: Rather than adopting Log4J as the standard for logging in Java, Sun chose to clone it, creating something almost-but-not-quite the same as a standard Java library.</li>
<li><strong><a href="http://commons.apache.org/logging/">Commons Logging/Log4J</a></strong>: Commons Logging is a wrapper from Apache which is designed for use in libraries. It simply delegates to an underlying logging framework. The purpose is so a library can be configured to use the same logging system as the rest of the application. I will consider Commons Logging backed by Log4J.</li>
<li><strong><a href="http://www.slf4j.org/">SLF4J/Log4J</a></strong>: SLF4J is a project begun by the original author of Log4J. It is intended to provide a better API for calling into a logging framework, and it can connect to different logging back ends. I will consider SLF4J backed by Log4J.</li>
</ul>
<p>There are others (<a href="http://logback.qos.ch/">logback</a>, <a href="http://jlo.jzonic.org/">jLo</a>, and <a href="http://www.java-logging.com/">many others</a>), but I am fairly confident that one of these 4 will be the final choice, so at this point I am going to perform a full analysis just on these 4.</p>
<table border="1">
<tr align="center">
<th>Feature</th>
<th></th>
<th>Log4J</th>
<th>JavaUtil</th>
<th>Commons<br/>/Log4J</th>
<th>SLF4J/Log4J</th>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#severity">Severity</a></td>
<td>mandatory</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#tree">Tree of Log Topics</a></td>
<td>mandatory</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#configurable">Configurable</a></td>
<td>vital</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#rotating">Rotating Log Files</a></td>
<td>vital</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#lineformat">Configurable Log Line Format</a></td>
<td>vital</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#exceptions">Logging of Exceptions</a></td>
<td>vital</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#delayed">Delayed String Construction</a></td>
<td>vital</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#locations">Log to Multiple Locations</a></td>
<td>desirable</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#directed">Configure where Logs are Directed</a></td>
<td>desirable</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#standard">Standard or Widely Used</a></td>
<td>desirable</td>
<td>Yes</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list">Unique Messages</a></td>
<td>desirable</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#fallbacks">Sensible Fallbacks</a></td>
<td>desirable</td>
<td>Meh</td>
<td>Meh</td>
<td>Meh</td>
<td>Meh</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#queued">Logging Queued to Avoid Delays</a></td>
<td>nice extra</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#threadlocal">Threadlocal Context Data</a></td>
<td>nice extra</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#filtering">Log Filtering</a></td>
<td>nice extra</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
<tr align="center">
<td align="left"><a href="http://mcherm.com/permalinks/1/logging-apis-feature-list#internationalization">Internationalization</a></td>
<td>undesirable</td>
<td>No</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
</table>
<hr/>
<p><a href="http://www.flickr.com/photos/melodysk/3035450347/"><img src="http://mcherm.com/blog/wp-content/uploads/2010/01/woodpile.jpg" alt="Woodpile" title="woodpile" width="500" height="333" class="aligncenter size-full wp-image-434" /></a></p>
<p>So, after considering all of these options, I have concluded that my preference is to use SLF4J as an interface, with the implementation from Log4J. Most of the options I have considered have more or less the same features. The deciding factors are (1) Threadlocal storage (MDC) is useful and not present in java.util.logging, and (2) The API for SLF4J provides an elegant solution to delay string construction, which is rather important for performance.</p>
<p>Conveniently, SLF4J publishes a tool for automatically converting existing java.util.logging and Log4J code to SLF4J, so the conversion should be relatively painless.</p>
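<p>To make the delayed string construction point concrete, here is a minimal, self-contained sketch of the idea behind placeholder-style calls such as SLF4J&#8217;s <code>logger.debug("Reticulated: {}", splines)</code>. It is only an illustration under stated assumptions: <code>MiniLogger</code>, <code>SplineList</code> and <code>toStringCalls</code> are hypothetical names invented for this example, and the code is not how SLF4J or Log4J is actually implemented. The point it tries to show is that the argument&#8217;s <code>toString()</code> is only called when the message will really be written.</p>

```java
public class DelayedStringDemo {

    // Counts how often the (pretend) expensive toString() actually runs.
    static int toStringCalls = 0;

    // Hypothetical object whose string form is costly to build.
    static class SplineList {
        @Override
        public String toString() {
            toStringCalls++;
            return "SplineList[42 splines]";
        }
    }

    // Hypothetical miniature logger: a stand-in for the idea, not a real library.
    static class MiniLogger {
        private final boolean debugEnabled;
        private final StringBuilder output = new StringBuilder();

        MiniLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        // The message is assembled only if debug logging is enabled.
        void debug(String template, Object arg) {
            if (!debugEnabled) {
                return; // arg.toString() is never called on this path
            }
            output.append(template.replace("{}", String.valueOf(arg))).append('\n');
        }

        String output() {
            return output.toString();
        }
    }

    public static void main(String[] args) {
        SplineList splines = new SplineList();

        // Debug disabled: the call is cheap because no string is built.
        MiniLogger quiet = new MiniLogger(false);
        quiet.debug("Reticulated: {}", splines);
        System.out.println("toString calls with debug off: " + toStringCalls);

        // Debug enabled: the string is built exactly once, inside the logger.
        MiniLogger verbose = new MiniLogger(true);
        verbose.debug("Reticulated: {}", splines);
        System.out.println("toString calls with debug on: " + toStringCalls);
        System.out.print(verbose.output());
    }
}
```

<p>Compare this with a call like <code>logger.debug("Reticulated: " + splines)</code>, where the concatenation (and therefore <code>toString()</code>) happens before the logger is even invoked, whether or not the message is ever written. Avoiding that cost without wrapping every call in an explicit level check is what makes the placeholder style attractive.</p>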
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/logging-apis-evaluating-options/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Logging APIs &#8211; Feature List</title>
		<link>http://mcherm.com/permalinks/1/logging-apis-feature-list</link>
		<comments>http://mcherm.com/permalinks/1/logging-apis-feature-list#comments</comments>
		<pubDate>Mon, 01 Feb 2010 19:42:15 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=388</guid>
		<description><![CDATA[Logging is not the world&#8217;s most interesting computing problem, but it is important, and it&#8217;s been on my mind lately because people have been pointing out that my company&#8217;s use of logging is currently a bit of a mess and ought to be cleaned up. Specifically, I&#8217;ve been thinking about the API that is used [...]]]></description>
			<content:encoded><![CDATA[<p>Logging is not the world&#8217;s most interesting computing problem, but it <em>is</em> important, and it&#8217;s been on my mind lately because people have been pointing out that my company&#8217;s use of logging is currently a bit of a mess and ought to be cleaned up.<span id="more-388"></span> Specifically, I&#8217;ve been thinking about the API that is used to invoke logging and also a <em>little</em> bit about the plumbing that is responsible for <em>doing something</em> with the log messages. What I would like to do here is a bit too big for one entry, so I will split it in two. In this, the first part, I will simply catalog some features that a logging API can have, then in the second part I will use that list to evaluate a few Java libraries. The catalog of features would apply to any language, but I will consistently use Java as an example.</p>
<p>The ultimate goal in logging is to write out some messages that are solely for use in debugging and managing a system &#8212; output that is a side effect rather than the primary goal of the program. The simplest version of logging has always been this:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">SplineList splineList <span style="color: #339933;">=</span> reticulateSplines<span style="color: #009900;">&#40;</span>rawSplines<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span>splineList<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>The simple &#8220;print&#8221; statement is a universal form of logging (I can name only a couple of languages that don&#8217;t have some basic syntax to support it) and I have used it far more times than I would choose to admit. Any logging framework should be evaluated in terms of what it <em>adds</em> to this capability. Here are the features I have been able to think of:</p>
<ul>
<li><strong><a name="severity">Severity</a></strong> &#8211; the ability to mark some log messages as being more important than others. There is no need to go overboard: even the biggest systems I have seen have done quite well with six or seven different levels, but it is nice to distinguish between printing out the value of a variable for debugging and reporting a potentially fatal application error. Severity in logging looks like this:

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">logger.<span style="color: #006633;">log</span><span style="color: #009900;">&#40;</span>Level.<span style="color: #006633;">WARNING</span>, <span style="color: #0000ff;">&quot;Spline reticulation returned 0 results.&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

</li>
<li><strong><a name="tree">Tree of Log Topics</a></strong> &#8211; in large or medium-sized programs, no one is interested in every single log message (at least not all at the same time). The usual solution is to group messages into &#8220;topics&#8221;, and the best libraries organize these topics into trees. So you may write some logs to the topic &#8220;security.login&#8221;, others to &#8220;validation.TransferMoney&#8221;, and a library may write to &#8220;org.apache.comcat.util&#8221;. Logging levels can then be set per subtree, for example to view even DEBUG level messages within &#8220;security.*&#8221; but only FATAL messages in &#8220;org.apache.*&#8221;. Often a sensible default is to name the log topic after the fully qualified name (package and class) of the class that is writing the log message. Use of log topics looks like this:

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">private</span> <span style="color: #000000; font-weight: bold;">static</span> Logger logger <span style="color: #339933;">=</span> Logger.<span style="color: #006633;">getLogger</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;com.mcherm.sampleapp.ThisClass&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

</li>
<li><strong><a name="configurable">Configurable</a></strong> &#8211; the choice of what to log, at what severity, and to what files (or other kinds of output) should not be embedded in code. It should be configurable, certainly at deploy time and perhaps at runtime as well. Configuration looks something like this:

<div class="wp_syntax"><div class="code"><pre class="properties" style="font-family:monospace;"><span style="color: #000080; font-weight:bold;">log4j.appender.LOGFILE</span><span style="color: #000000;">=</span><span style="color: #008000; font-weight:bold;">org.apache.log4j.DailyRollingFileAppender</span>
<span style="color: #000080; font-weight:bold;">log4j.appender.LOGFILE.layout</span><span style="color: #000000;">=</span><span style="color: #008000; font-weight:bold;">org.apache.log4j.PatternLayout</span>
<span style="color: #000080; font-weight:bold;">log4j.appender.LOGFILE.layout.ConversionPattern</span><span style="color: #000000;">=</span><span style="color: #008000; font-weight:bold;">%d %p <span style="">&#91;</span>%t<span style="">&#93;</span> <span style="">&#91;</span>%X<span style="">&#123;</span>sesstag<span style="">&#125;</span><span style="">&#93;</span> <span style="">&#91;</span>%X<span style="">&#123;</span>pmsessid<span style="">&#125;</span><span style="">&#93;</span> <span style="">&#91;</span>%c<span style="">&#93;</span> - &lt;%m&gt;%n</span>
<span style="color: #000080; font-weight:bold;">log4j.appender.LOGFILE.DatePattern</span><span style="color: #000000;">=</span><span style="color: #008000; font-weight:bold;">'.'yyyy-MM-dd</span></pre></div></div>

</li>
<li><strong><a name="rotating">Rotating log files</a></strong> &#8211; When logs are written to a file (by far the most common approach), the files can very easily get too big to use. A useful feature is the ability to set a max size for the files. After the max size has been reached, subsequent writes go to a new file, and the old file is <em>left in place until later</em>. Rotating log files lead to directory listings like this:

<div class="wp_syntax"><div class="code"><pre class="txt" style="font-family:monospace;">12/10/2009  03:24 PM         1,710,360 loanapp.0.log
12/09/2009  11:41 AM                 0 loanapp.0.log.lck
12/09/2009  11:40 AM           189,992 loanapp.1.log
12/08/2009  06:06 PM         1,366,039 loanapp.2.log
12/07/2009  03:08 PM         8,314,456 loanapp.3.log
11/24/2009  01:34 PM           410,931 loanapp.4.log
11/24/2009  11:34 AM           506,066 loanapp.5.log</pre></div></div>

<p><div id="attachment_413" class="wp-caption aligncenter" style="width: 392px"><strong><a href="http://www.flickr.com/photos/davidkohlmeyer/4012196059/"><img class="size-full wp-image-413" title="log_rolling" src="http://mcherm.com/blog/wp-content/uploads/2010/01/log_rolling.jpg" alt="" width="382" height="280" /></a></strong><p class="wp-caption-text">The other kind of &quot;Log Rolling&quot;</p></div>
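<p>As a minimal sketch of size-based rotation, here is how it might look with java.util.logging, whose FileHandler happens to produce file names like the ones in the listing above (the listing itself does not say which library wrote it). The class name, the 1 KB limit and the five-file count are illustrative assumptions only:</p>

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class RotatingLogDemo {
    /**
     * Writes messageCount lines through a size-limited FileHandler in a fresh
     * temp directory and returns how many ".log" files exist afterwards.
     */
    static int writeAndCountFiles(int messageCount) {
        try {
            File dir = Files.createTempDirectory("rotating-log-demo").toFile();
            // %g is the generation number: 0 is the newest file, higher numbers are older.
            String pattern = dir.getAbsolutePath() + "/loanapp.%g.log";
            // Roll over after roughly 1024 bytes and keep at most 5 files (illustrative values).
            FileHandler handler = new FileHandler(pattern, 1024, 5, false);
            handler.setFormatter(new SimpleFormatter());

            Logger logger = Logger.getLogger("rotating.demo");
            logger.setUseParentHandlers(false); // keep the demo output off the console
            logger.addHandler(handler);
            for (int i = 0; i < messageCount; i++) {
                logger.info("Reticulated spline number " + i);
            }
            handler.close(); // flushes and removes the .lck lock file
            logger.removeHandler(handler);

            String[] logFiles = dir.list((d, name) -> name.endsWith(".log"));
            return logFiles.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("log files on disk: " + writeAndCountFiles(200));
    }
}
```

<p>Once the file count is reached, the oldest file is discarded on each rollover, so disk usage stays bounded.</p>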
</li>
<li><strong><a name="lineformat">Configurable Log Line Format</a></strong> &#8211; The key content of a log message is typically some string that was provided when the call was made to the logger. But there is a lot of other information that may be available: the exact time the log call was made, the logger topic that was used, the place in the code the call was made from, the thread that was in use&#8230; and lots of others if you keep thinking about it. A useful feature of some loggers is the ability to configure just what information gets written out along with that basic message. Typical output from a logger may look like this:

<div class="wp_syntax"><div class="code"><pre class="txt" style="font-family:monospace;">* 2009-12-29T15:56:55.591-05:00 [22] com.company.util.security FINER (com.company.util.security.OneTimeCredential.validateCredential)
clearCred = 00 00 01 25 DC 3A 23 8E 00 00 00 4E
&nbsp;
* 2009-12-29T15:56:55.591-05:00 [22] com.company.dg.util.OperatorImpl FINEST (com.company.dg.util.OperatorImpl.getOperator)
Returning systemOperator = web
&nbsp;
* 2009-12-29T15:56:55.591-05:00 [22] com.company.util.ApplicationPropertiesImpl FINEST (com.company.util.ApplicationPropertiesImpl.getProperty)
Property Name(AuthenHelper.custDatabase) has value (cust).
&nbsp;
* 2009-12-29T15:56:55.591-05:00 [22] com.company.util.ApplicationPropertiesImpl FINEST (com.company.util.ApplicationPropertiesImpl.getProperty)
Property Name(AuthenHelper.getPartyId) has value (pkg_dg_party_login.sp_getPartyIdByString).</pre></div></div>

<p>Each entry has a timestamp, the thread number in square brackets, the logger name, the severity, the function logging it, and then the log message string. There are various other pieces of metadata that could have been recorded instead of or in addition to the timestamp, thread number and so forth, and this can all be controlled from the logging config file.</li>
<li><strong><a name="exceptions">Logging of Exceptions</a></strong> &#8211; Most logging has been described in terms of the string to be logged, but there is no reason why it necessarily has to be a string. In theory, a logging framework could support producing sensible logs from other kinds of objects that are passed in to it. In practice, I have only seen one such object which is useful (but that one is quite helpful) &#8211; Exceptions. So it is useful if a logging framework allows the user to specify an exception, along with the string, when that is appropriate, and includes the stack trace from that exception in the data recorded for the log.</li>
<li><strong><a name="delayed">Delayed String Construction</a></strong> &#8211; The content for a log message <em>can</em> be just a simple string:

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">logger.<span style="color: #006633;">warn</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Spline reticulation returned 0 results.&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>but it is far more common for it to be a string constructed with data available at runtime:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">logger.<span style="color: #006633;">warn</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Spline reticulation returned &quot;</span> <span style="color: #339933;">+</span> numResults <span style="color: #339933;">+</span> <span style="color: #0000ff;">&quot; results.&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Also, it is not at all unusual for there to be significant amounts of work in constructing the string:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">logger.<span style="color: #006633;">warn</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Splines returned: &quot;</span> <span style="color: #339933;">+</span> splineList.<span style="color: #006633;">toString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>

<p>Because Java evaluates method arguments eagerly (as do most languages other than Haskell), the work to generate the string to be logged will be done <strong>EACH AND EVERY TIME</strong> the log statement is encountered, <strong>EVEN IF LOGGING IS DISABLED</strong>. So even if the log level is set so that these messages won&#8217;t be printed, the examples above would still perform all the work to construct the string. This overhead can be enormous: I personally worked on a project where we sped up the entire application by a factor of 3 just by avoiding the creation of log lines for statements that would not be logged.</p>
<p>There are three ways to avoid this overhead. One is to specify the log level at compile time (lots of logging systems for C do this), but then it cannot be changed without recompiling. Another is to manually perform the (efficient) check of logging level before performing the work:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">if</span> <span style="color: #009900;">&#40;</span>logger.<span style="color: #006633;">isDebugEnabled</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    logger.<span style="color: #006633;">debug</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Splines returned: &quot;</span> <span style="color: #339933;">+</span> splineList.<span style="color: #006633;">toString</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>but this requires burying each logging statement inside of an <em>if</em>. The final option is what I call &#8220;Delayed String Construction&#8221;, or sometimes &#8220;parameterized log messages&#8221; and that is to have library level support for delaying the construction:</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;">logger.<span style="color: #006633;">debug</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Splines returned: {}&quot;</span>, splineList<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span></pre></div></div>
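<p>The {} placeholder style shown above is what libraries such as SLF4J offer. To keep a runnable sketch free of external dependencies, the version below swaps in a different mechanism with the same effect: the lambda-accepting overloads of java.util.logging (Java 8 and later). The SplineList class and its call counter are hypothetical, added only to make the saved work visible:</p>

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogDemo {
    /** Hypothetical stand-in for an object whose toString() is expensive. */
    static class SplineList {
        int toStringCalls = 0;

        @Override
        public String toString() {
            toStringCalls++;
            return "[spline-1, spline-2, spline-3]";
        }
    }

    /**
     * Logs one FINE message eagerly and one lazily, with the logger set to the
     * given level, and returns { eager toString() calls, lazy toString() calls }.
     */
    static int[] countToStringCalls(Level loggerLevel) {
        Logger logger = Logger.getLogger("lazy.demo");
        logger.setUseParentHandlers(false); // keep the demo output off the console
        logger.setLevel(loggerLevel);

        SplineList eager = new SplineList();
        // The string is concatenated before fine() is even called.
        logger.fine("Splines returned: " + eager);

        SplineList lazy = new SplineList();
        // The lambda body only runs if FINE is enabled for this logger.
        logger.fine(() -> "Splines returned: " + lazy);

        return new int[] { eager.toStringCalls, lazy.toStringCalls };
    }

    public static void main(String[] args) {
        int[] calls = countToStringCalls(Level.WARNING);
        System.out.println("eager: " + calls[0] + ", lazy: " + calls[1]);
    }
}
```

<p>With the logger set to WARNING, the eager call still pays for toString() while the lazy one does not. With the logger set to FINE, both pay.</p>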

</li>
<li><strong><a name="locations">Log to Multiple Locations</a></strong> &#8211; Sometimes you want certain log entries to go to more than one file. For instance, you might want to send all logs to a single master log file, but <em>also</em> send logs about login errors to a separate security log which gets reviewed by a different group of people.</li>
<li><strong><a name="directed">Configure Where Logs are Directed</a></strong> &#8211; Usually you want logs to go to a file, but other times you want something more exotic like storing logs from multiple servers on a single remote location, sending logs to a messaging system, or writing logs to a database. This is particularly useful in conjunction with logging to multiple locations. So one useful feature, called &#8220;Appenders&#8221; in some systems, is the ability to provide a custom piece of code which processes the log entry: it could write it to a file OR do something else more exotic.</li>
<li><strong><a name="standard">Standard, or Widely Used</a></strong> &#8211; It is a big advantage if the logging system you use is widely used by others. One reason is the usual: standardized or widely used products tend to get improved over time while more obscure products tend to be forgotten and lose support. But in logging there is an additional advantage: if a library you are using or the server that is hosting your code also uses the same logging facility, then it may be possible (even easy) to commingle the log messages with your application log messages, which can be extremely useful when debugging issues which cross library boundaries.</li>
<li><strong><a name="fallbacks">Sensible Fallbacks</a></strong> &#8211; As previous features have mentioned, logging can involve all kinds of complicated configuration and setup. In the real world, these will sometimes fail. Perhaps your logging configuration file cannot be found, or the log directory cannot be written to, or the logging database password fails &#8212; and the most common of these errors is to have no configuration at all. Different logging frameworks have different fallback behavior in the face of problems like this: some will disable all logging, others will fall back on a more primitive form of logging such as writing all logs to stdout if the logging is not configured. It is desirable for a logging system to have sensible fallback behavior.</li>
<li><strong><a name="queued">Logging Queued to Avoid Delays</a></strong> &#8211; In nearly all cases, logging is not the most important thing that your application is doing. The primary goal of the program is to reticulate splines, respond to user keystrokes, respond to web requests, or whatever it is that your application does; logging is a secondary concern. Given that, it is a real shame that writing logs slows down the application. It is quite common that the work done in response to a request can be quick and simple enough (retrieving some data that is already cached) that it is FAR less than the time required to open a file on disk and write even a single line of logging. One solution (with a few drawbacks) is that instead of actually writing the log immediately, logs that are to be written get put onto a queue, and they are written out by a background thread. The big advantage is performance: your program can quickly stash a few strings in memory and get on with its work faster; the log entry gets written later, when there are spare CPU cycles available. It can even save time overall by batching writes. There are a few disadvantages to be aware of: it will increase memory requirements (but typically by an insignificant amount); the queue typically needs a max size, and you need a policy for what to do when logs are being created faster than they can be written: typically the choices are either to discard some of the logs or to slow down the application. And it makes it difficult to debug issues that actually lead to crashes (because the logs don&#8217;t get written due to the crash). But significantly faster application performance is often worth these costs.</li>
<li><strong><a name="threadlocal">Threadlocal Context Data</a></strong> &#8211; Imagine building a web application, which takes requests from logged-in users. And consider logs which are written from deep inside the spline reticulation code. For tracking down certain kinds of bugs, it would be nice to know which user&#8217;s data contained the spline that was being reticulated. But we can&#8217;t just add that to the log message, because the spline reticulation code isn&#8217;t passed the name of the user. Some function 7 levels up the stack may have known it, but that doesn&#8217;t help to pass it into the log message. So a useful feature would be to have a command we could call at the point where we first identify the user submitting this request. This command could tell the logging system the name of the user and the logging system could store it in threadlocal storage. Then all future logs written from this thread could be annotated with &#8220;user=JSmith&#8221;, until this service call is completed.
<p>Two versions of this feature are offered by Log4J: &#8220;Nested Diagnostic Context&#8221;, in which multiple pieces of data can be kept in a stack, and &#8220;Mapped Diagnostic Context&#8221; in which multiple pieces of data are kept in a map.</li>
<li><strong><a name="filtering">Log Filtering</a></strong> &#8211; Not all applications will have this requirement, but at large institutions with a concern for security, there are some things which ought not be written into a log. For instance, there may be a policy that passwords are not recorded, or that customer social security numbers never be stored in an unencrypted fashion. A fairly rare, and often imperfect but still useful, feature is the ability to provide some sort of filter which will recognize these kinds of data and remove them before the log data is recorded to disk. The difficult part, of course, is in recognizing the data.</li>
<li><strong><a name="internationalization">Internationalization</a></strong> &#8211; Many loggers have support for writing messages in Unicode (particularly in languages like Java where the fundamental String type supports Unicode). But some go further and provide support for loading the messages from a language-specific file and may have features like localization of dates and other information appearing in log messages.</li>
</ul>
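<p>To make the &#8220;Logging of Exceptions&#8221; item from the list above concrete, here is a small sketch using java.util.logging, where the exception is passed alongside the message and travels with the log record. The LastRecordHandler class is a hypothetical helper written for this demo (not part of any library); it simply remembers the last record so we can inspect what the framework kept:</p>

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class ExceptionLogDemo {
    /** Demo-only handler that remembers the most recent record it was given. */
    static class LastRecordHandler extends Handler {
        LogRecord last;

        @Override
        public void publish(LogRecord record) {
            last = record;
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Triggers a parse failure, logs it with its exception, and returns the captured record. */
    static LogRecord logFailure() {
        Logger logger = Logger.getLogger("exception.demo");
        logger.setUseParentHandlers(false); // keep the demo output off the console
        LastRecordHandler handler = new LastRecordHandler();
        logger.addHandler(handler);
        try {
            Integer.parseInt("not a number");
        } catch (NumberFormatException e) {
            // The exception is a separate argument, not flattened into the message string.
            logger.log(Level.WARNING, "Could not parse spline count", e);
        } finally {
            logger.removeHandler(handler);
        }
        return handler.last;
    }

    public static void main(String[] args) {
        LogRecord record = logFailure();
        System.out.println(record.getMessage());
        System.out.println("thrown: " + record.getThrown().getClass().getName());
        System.out.println("stack frames kept: " + record.getThrown().getStackTrace().length);
    }
}
```

<p>Because the record carries the Throwable itself, a formatter can decide later whether and how to print the stack trace.</p>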
<p>So that&#8217;s my list of features; see <a href="http://mcherm.com/permalinks/1/logging-apis-evaluating-options">my next entry</a> where I use this list of features to evaluate some Java logging libraries.</p>
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/logging-apis-feature-list/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Password in Pieces</title>
		<link>http://mcherm.com/permalinks/1/password-in-pieces</link>
		<comments>http://mcherm.com/permalinks/1/password-in-pieces#comments</comments>
		<pubDate>Sat, 05 Dec 2009 04:08:13 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=338</guid>
		<description><![CDATA[I came across the following question on reddit:
My bank on the online banking login instead of having a password field it presents you with 3 password fields 1 character each where it asks you for 3 characters from your password, chosen randomly. E.g. the 2nd, 4th and 7th.
I wanted to respond to this, because not [...]]]></description>
			<content:encoded><![CDATA[<p>I came across the following <a href="http://www.reddit.com/r/programming/comments/ab470/have_guys_seen_this_password_input_scheme_does_it/">question</a> on <a href="http://www.reddit.com/">reddit</a>:</p>
<blockquote><p>My bank on the online banking login instead of having a password field it presents you with 3 password fields 1 character each where it asks you for 3 characters from your password, chosen randomly. E.g. the 2nd, 4th and 7th.</p></blockquote>
<p>I wanted to respond to this, because not only is it an incredibly misguided attempt at security which seriously <em>weakens</em> actual security, it also sounds familiar. Because a few months ago my employer considered doing something just like this. Let me recount the story<span id="more-338"></span>:</p>
<p>I work for a bank, so we care a LOT about security. Customers call into our call center and to identify themselves they get connected to the IVR (interactive voice response unit&#8230; telephone system) to enter their PIN (a 4 to 8 digit passcode). An important feature is that we cut out the phone reps from hearing this&#8230; because we want your password to be a secret EVEN FROM OURSELVES. All of this is good security design.</p>
<p>We opened up a new call center in Hawaii, and they had some problems. Apparently the phone system we were using had a time limit when transferring a call &#8212; if it wasn&#8217;t picked up by the remote phone switch within a few milliseconds then it was disconnected. The ping time between the Hawaii call center and our east-coast data center was just a little too long and many of the calls were being disconnected when they were transferred to the IVR to enter the PIN.</p>
<p>The first solution that they thought of was to stop using the IVR to enter PINs. Instead, the idea was to create a system where the phone reps asked the customers for certain digits out of their PIN (just as described in enanoretozon&#8217;s reddit question). The reps would type these in and then the customer could log in. Apparently, this was the standard practice at our German subsidiary, and had somehow become blessed as the official corporate-wide best practice.</p>
<p>Well, it may be an official &#8220;best practice&#8221;, but it&#8217;s still a very bad idea, for two reasons. The first reason should be completely obvious if you just try it. First, say your phone number out loud. Now say the 3rd, 6th, and 4th digits of it. For most normal people, the second will take many times longer, and be much harder, even though it is only 3 digits. There is always a tradeoff between security and usability (We could provide <em>perfect</em> security if we never allowed anyone to take their money out of the bank. Of course, usability would have dropped to zero.), and entering random digits has SUCH poor usability that it is not worth it.</p>
<div id="attachment_345" class="wp-caption alignright" style="width: 250px"><a href="http://www.flickr.com/photos/incognito_rico/32359790/"><img class="size-full wp-image-345" title="3D6" src="http://mcherm.com/blog/wp-content/uploads/2009/12/3D6.jpg" alt="Random Digits" width="240" height="180" /></a><p class="wp-caption-text">Random Digits</p></div>
<p>Besides that, it is <em>also</em> less secure. There are, if you consider it, multiple different kinds of attacks that we need to protect against. One kind, certainly, is attacks by unscrupulous bank employees who might misuse a customer&#8217;s login credentials. But another <em>far more likely</em> attack is a third party who wants to steal from a customer&#8217;s account.</p>
<p>Such an attacker, if they didn&#8217;t know the customer&#8217;s PIN, would have to guess. To prevent repeated guessing, we will temporarily lock out a customer&#8217;s account after a certain number of incorrect login attempts. But a clever attacker would just try <em>different</em> customers, making just one guess for each one.</p>
<p style="text-align: center;"><a href="http://moultano.blogspot.com/2009/11/google-can-generate-your-equations-for.html"><img class="aligncenter" src="http://chart.apis.google.com/chart?cht=tx&amp;chf=bg,s,FFFFFF00&amp;chco=000000&amp;chl=E%28tries\:needed%29=10^{digits}" alt="Equation" /></a></p>
<p>With (for instance) a 6-digit PIN, the expected number of guesses before the attacker got one right is around 1,000,000. Long before an attacker managed to try even a small fraction of 1,000,000 guesses, we would have noticed what they were doing and put a stop to it. But if we only ask for 3 particular digits out of the PIN, then the attacker only needs to try about 1,000 times before they are expected to guess correctly. There is a good chance that we would catch that, but (particularly if they <a href="http://www.spoofcard.com/">spoof their phone number</a>) we might not.</p>
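<p>The arithmetic behind the formula above can be checked with a few lines of Java. This is only a sketch, and it assumes that PIN digits are uniformly random and that each guess is an independent attempt against a different customer; the class and method names are made up for illustration:</p>

```java
public class PinGuessOdds {
    /**
     * Expected number of single guesses before one succeeds, when each guess
     * must match the given number of decimal digits: 10^digits.
     */
    static long expectedTries(int digits) {
        long tries = 1;
        for (int i = 0; i < digits; i++) {
            tries *= 10;
        }
        return tries;
    }

    public static void main(String[] args) {
        long fullPin = expectedTries(6);     // customer enters the whole 6-digit PIN
        long threeDigits = expectedTries(3); // rep asks for only 3 of the digits
        System.out.println("Full 6-digit PIN:   " + fullPin);     // prints 1000000
        System.out.println("3 requested digits: " + threeDigits); // prints 1000
        System.out.println("Attack is easier by a factor of " + (fullPin / threeDigits)); // prints 1000
    }
}
```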
<p>So we traded better defense against a rare attack (we don&#8217;t hire a lot of employees who commit bank fraud) for <em>much</em> worse defense against a common attack (we detect and stop attempted attacks of various sorts every single week!). It is NOT an improvement.</p>
<p>So&#8230; after these points were raised, we chose <em>not</em> to implement our German counterpart&#8217;s policy. What did we do instead?</p>
<p>Very simple: we fixed the phone system so it could transfer calls properly.</p>
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/password-in-pieces/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Raising the limit on IDs processed</title>
		<link>http://mcherm.com/permalinks/1/raising-the-limit</link>
		<comments>http://mcherm.com/permalinks/1/raising-the-limit#comments</comments>
		<pubDate>Fri, 13 Nov 2009 13:35:37 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=324</guid>
		<description><![CDATA[It is a fairly simple screen for entering &#8220;mass alerts&#8221;. There are (omitting some irrelevant details) just two fields: one in which the user enters the text of an alert, and the other in which they enter a list of customer-ids specifying who we should show the alert to. This is normally pasted in from [...]]]></description>
			<content:encoded><![CDATA[<p>It is a fairly simple screen for entering &#8220;mass alerts&#8221;. There are (omitting some irrelevant details) just two fields: one in which the user enters the text of an alert, and the other in which they enter a list of customer-ids specifying who we should show the alert to. This is normally pasted in from a spreadsheet by the users who are setting up new alert messages.</p>
<p>The feature that we need to implement (or &#8220;story&#8221; in <a href="http://www.scrumalliance.org/pages/what_is_scrum">Scrum</a> parlance) is an increase in the maximum number of customers that can be set at once. You see, there is a &#8220;feature&#8221; that limits the number of IDs that can be set at one time to about 200. (&#8220;About&#8221; 200 because most IDs are 9 digits long and they are separated by whitespace; the actual limit is 2000 characters, enforced in JavaScript as the field is input.) So when they need to set an alert on 600 IDs, they run through the screen 3 times. When they have 2.5 million IDs to update they open up a &#8220;story&#8221; for the development team.</p>
<p>I think we asked someone why it was limited to 200 IDs. No one is quite sure, but it&#8217;s probably to avoid overtaxing the database query or running a middleware service that takes too long&#8230; something like that. &#8220;Sure,&#8221; we say, &#8220;we can increase the limit.&#8221; We figure maybe we&#8217;ll group it in chunks of 200 and call it in a loop or something. We schedule it to be worked on in this month&#8217;s &#8220;sprint&#8221;.</p>
<p>A couple of man-days of effort go into building it. Some testing determines that (on much less powerful dev hardware) a single call can easily handle thousands of IDs without running into timeout issues &#8212; more than that, actually, as we left a factor of 4 or 5 for safety. So the front end breaks the list into chunks of that size. We thought we&#8217;d build it to handle unlimited capacity, but there&#8217;s an <a href="http://www.stopie6.com/">IE6 bug</a> (yes, our corporate overlords require the use of an obsolete broken browser) that limits us to about 60,000 IDs.</p>
<div id="attachment_330" class="wp-caption alignleft" style="width: 510px"><img class="size-full wp-image-330" title="Corporate Overlords" src="http://mcherm.com/blog/wp-content/uploads/2009/11/CorporateOverlords.jpg" alt="Our Corporate Overlords" width="500" height="330" /><p class="wp-caption-text">Our Corporate Overlords</p></div>
<p>So we have completed the feature and the business can now enter more than 50x as many IDs at a time. But that&#8217;s not <em>quite</em> the end of the story. Because as part of regression testing, our QA staff does some exhaustive testing of the screen, and they discover that there apparently isn&#8217;t a limit on the size of the <em>other</em> field, the one that contains the alert message. We check the database table for the appropriate max message length, and it turns out to be exactly 2000 characters.</p>
<p>Wait&#8230; I think I&#8217;ve heard that number before.</p>
<p>Apparently, whoever built this page in the very first place accidentally limited the length of the wrong field. There never <strong>was</strong> a reason for a limit on the number of IDs processed at once&#8230; the limit came entirely because of a bug. Yet we&#8217;ve been living with this absurd limitation for several years, simply because no one ever questioned the limit. (Or if they did question it, they got some vague answer like &#8220;I assume it&#8217;s for performance reasons.&#8221;)</p>
<p>I&#8217;m sure there is some lesson we should draw from this experience&#8230; I&#8217;ll leave it to you to figure out what the lesson is.</p>
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/raising-the-limit/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Upgrading GWT/AppEngine to v1.6+</title>
		<link>http://mcherm.com/permalinks/1/upgrading-gwtappengine-to-v1-6</link>
		<comments>http://mcherm.com/permalinks/1/upgrading-gwtappengine-to-v1-6#comments</comments>
		<pubDate>Mon, 09 Nov 2009 05:38:39 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=314</guid>
		<description><![CDATA[I had a project using Google Web Toolkit (GWT) and App Engine. It was developed in Eclipse (which I don&#8217;t like much, mostly because I don&#8217;t know how to use it very well) because Google recommends this and provides support in the form of Eclipse plugins for working with these tools.
Well, they released a new [...]]]></description>
			<content:encoded><![CDATA[<p>I had a project using Google Web Toolkit (GWT) and App Engine. It was developed in <a href="http://www.eclipse.org/">Eclipse</a> (which I don&#8217;t like much, mostly because I don&#8217;t know how to use it very well) because Google recommends this and provides support in the form of Eclipse plugins for working with these tools.</p>
<p>Well, they released a new version and I hit the &#8220;upgrade&#8221; button. After that, my project didn&#8217;t work anymore. I tried for a day to resolve it and I just couldn&#8217;t understand anything. Finally I &#8220;solved&#8221; it by uninstalling Eclipse and reinstalling it, then following the tutorial steps to create a brand new project and copying in my old files one-by-one. Another full day lost (I can only work a couple of hours per day on hobby projects).</p>
<p>Surely they wouldn&#8217;t do it again, right? So I carefully saved everything and held my breath the <em>next</em> time Google released an upgrade. It promptly broke everything like last time. Only this time I solved it differently: I uninstalled Eclipse and did NOT reinstall it.</p>
<p>The big difference is that in the intervening month JetBrains had <a href="http://blogs.jetbrains.com/idea/2009/10/intellij-idea-open-sourced/">announced</a> that a slightly-impaired version of <a href="http://www.jetbrains.com/idea/">IntelliJ IDEA</a> would be available for free. The stripped down version <a href="http://www.jetbrains.com/idea/nextversion/editions_comparison_matrix.html">doesn&#8217;t have support</a> for GWT and App Engine (which the paid version <em>does</em> have), but it&#8217;s something I can use. At work, I use IntelliJ (properly paid for) but it&#8217;s awfully expensive to pay for my own copy at home. (Can&#8217;t use the same copy because that would disturb the corporate bean-counters, even though it is allowed by the license.) The stripped down version is fine if I can run from the command line.</p>
<p>There are <a href="http://code.google.com/webtoolkit/tutorials/1.6/create.html">instructions</a> for running GWT via ant. And there are <a href="http://code.google.com/webtoolkit/tutorials/1.6/appengine.html">instructions</a> for adding support for App Engine. But they are broken in (what I think is) exactly the same way that the Eclipse plugin is broken. Details from <a href="http://groups.google.com/group/google-appengine-java/browse_thread/thread/df660675d21c64f0">a forum posting</a> led me to realize the problem was that it now needs a &#8220;javaagent&#8221; specified. A &#8220;javaagent&#8221; is <a href="http://java.sun.com/javase/6/docs/api/java/lang/instrument/package-summary.html">some sort of a</a> pre-processor that runs before main() &#8212; apparently introduced with Java 1.5.</p>
<p>So after following Google&#8217;s instructions, I now add the following: In my &lt;hosted&gt; target, along with the other &lt;jvmarg&gt; elements, I add a new one which looks like this:</p>

<div class="wp_syntax"><div class="code"><pre class="xml" style="font-family:monospace;"><span style="color: #009900;"><span style="color: #000000; font-weight: bold;">&lt;jvmarg</span> <span style="color: #000066;">value</span>=<span style="color: #ff0000;">&quot;-javaagent:${appengine.sdk}/lib/agent/appengine-agent.jar&quot;</span><span style="color: #000000; font-weight: bold;">/&gt;</span></span></pre></div></div>

<p>After that, I can build it using ant. I&#8217;ll also need to use the command line for deploys; that looks like this:</p>

<div class="wp_syntax"><div class="code"><pre class="text" style="font-family:monospace;">&quot;C:\Program Files\appengine-java-sdk-1.2.6\bin\appcfg&quot; update war</pre></div></div>

<p>And now it works again.</p>
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/upgrading-gwtappengine-to-v1-6/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Estimate Units</title>
		<link>http://mcherm.com/permalinks/1/estimate-units</link>
		<comments>http://mcherm.com/permalinks/1/estimate-units#comments</comments>
		<pubDate>Fri, 13 Feb 2009 18:44:57 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=268</guid>
		<description><![CDATA[When you estimate tasks, should the estimates be done in hours, or in days?
As I see it, the big advantage of estimating in hours is that if you THINK in hours, you tend to get a more accurate estimate. There are lots of development tasks which will seem like they should take &#8220;no more than [...]]]></description>
			<content:encoded><![CDATA[<p>When you estimate tasks, should the estimates be done in hours, or in days?</p>
<p>As I see it, the big advantage of estimating in hours is that if you THINK in hours, you tend to get a more accurate estimate. There are lots of development tasks which will seem like they should take &#8220;no more than 2 days&#8221;, but if you think about all the individual steps (I have to create the page and the new service. And the stored procedure. And I&#8217;ll have to get a security review and a code review. And I have to remember to do the unit tests. Oh yes, and save time for bug fixes), the total comes out a bit bigger.</p>
<p>As I see it, the big advantage of estimating in days is that it&#8217;s quicker and simpler. If your team is sitting there arguing whether a task is 3 hours or 4 hours, then you&#8217;re wasting time &#8212; after all, development estimates are never THAT accurate anyway: we always need to allow for the unexpected.</p>
<p>Considering these, I could be persuaded to do it either way. What is NOT useful is to think how many days it will take, multiply by the number of hours per day, then spend time arguing about whether it is one more or one less than this number.</p>
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/estimate-units/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>An Exception to Every Rule</title>
		<link>http://mcherm.com/permalinks/1/an-exception-to-every-rule</link>
		<comments>http://mcherm.com/permalinks/1/an-exception-to-every-rule#comments</comments>
		<pubDate>Wed, 31 Dec 2008 21:54:31 +0000</pubDate>
		<dc:creator>mcherm</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://mcherm.com/?p=85</guid>
		<description><![CDATA[I like automated code scanners, really I do. They can scan your code either before or after you check it in and review it for code formatting, memory errors, or even potential security problems. They can prevent lots of foolish errors and unnecessary inconsistencies.
But there is one catch: the tools are &#8220;dumb&#8221;, and there always [...]]]></description>
			<content:encoded><![CDATA[<p>I like automated code scanners, really I do. They can scan your code either before or after you check it in and review it for <a href="http://checkstyle.sourceforge.net/">code formatting</a>, <a title="Valgrind, Purify, and others" href="http://phaseit.net/claird/comp.software.testing/mem_test.html">memory errors</a>, or even <a title="Ounce" href="http://www.ouncelabs.com/">potential security problems</a>. They can prevent lots of foolish errors and unnecessary inconsistencies.</p>
<p>But there is one catch<span id="more-85"></span>: the tools are &#8220;dumb&#8221;, and there always needs to be a way for a knowledgeable human to override them. Usually it&#8217;s a special comment that tells the tool not to report a particular infraction. The override is used in that rare special case where there is a good reason why the rule does not apply.</p>
<p>Now, I know a certain portion of my readers are already forming objections, certain that THEIR pet peeve is the one case to which there should never be an exception. For instance, rules requiring consistent indentation throughout the project are good, but suppose some of your source files are from an external source? It&#8217;s not wise to reformat them just to placate the code scanner &#8212; that will make it harder to merge in changes when the next revision is released.</p>
<p>Or for another example, the capitalization rule that Java instance variables always begin with a lowercase letter seems very good, but when my team created an XML binding mechanism that mapped XML fields (which are case sensitive and often begin with a capital letter) to instance variables, being consistent with the XML file was more important than following the standard Java conventions.</p>
<p>I can still hear someone arguing. Some reader out there is still complaining that there is a special reason (security? correctness checks?) why some rules must be absolute. Frankly, I think this reader is just a control freak and they should learn to <a href="http://www.noop.nl/2008/12/i-follow-my-rules-you-follow-yours.html">let programmers act like the professionals they are</a>, but in order to convince you, I&#8217;m going to tell a story.</p>
<div id="attachment_253" class="wp-caption alignright" style="width: 314px"><img class="size-full wp-image-253" title="Larry Osterman" src="http://mcherm.com/blog/wp-content/uploads/2008/12/larryosterman.jpg" alt="Larry Osterman" width="304" height="229" /><p class="wp-caption-text">Larry Osterman</p></div>
<p>This is the story of a programming rule so VERY absolute that before you read the tale you&#8217;ll agree that there is NO possible reason to violate this rule. Yet after an automated code scanner discovered the violation, the catastrophic result was that for 2 years, all cryptography originating from a major operating system was completely insecure.</p>
<p>So&#8230; on with the story. (And many thanks to <a href="http://blogs.msdn.com/larryosterman/archive/2008/05/13/more-proof-that-crypto-should-be-left-to-the-experts.aspx">Larry Osterman</a>, pictured here, from whom I first learned this sordid tale.)</p>
<hr /><p>The programming rule that I am sure you will agree with is that one should never read from uninitialized memory. In many modern programming languages this is difficult or impossible (Java, for instance, guarantees instance variables are pre-initialized to 0), but in C it is so easy that people constantly do it by mistake. Since there is NEVER a reason to read from uninitialized memory (which could contain any random junk), surely we can expunge this behavior from our code, right?</p>
<p>Now, Linus Torvalds and his compatriots manage the source code for the kernel of the Linux operating system, but a usable system requires much more &#8212; a whole set of systems and applications which are tested and configured to work together. This is called a &#8220;Linux distribution&#8221; and there are several <a title="Major Linux Distributions" href="http://distrowatch.com/dwres.php?resource=major">major distributions</a> in wide use: SUSE, Fedora, and Mandriva among others.  The single most popular today is Ubuntu, which is based on Debian.</p>
<p>The folks at Debian collect code from a number of different projects and carefully review it, test for compatibility, and then certify the resulting distribution and provide a place to download the finished product. One of those components is OpenSSL, an open-source implementation of some cryptography libraries that provide support for basic functions like secure random number generation. And one of the tests that the Debian folks perform is to analyze the code with Valgrind, an automated code scanner that detects memory leaks and similar problems.</p>
<p>Valgrind detected that there was a problem in one of the OpenSSL routines that generated random numbers &#8212; it accessed uninitialized memory. <a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=363516">Not knowing how to suppress the warning</a>, someone at Debian decided to &#8220;fix&#8221; the code.</p>
<p>What they didn&#8217;t realize is that the OpenSSL developers had known exactly what they were doing. The crux of a cryptographically secure random number generator is an entropy collector. You see, the problem with generating &#8220;random&#8221; numbers on a computer is that nothing on the computer is truly random. You can mix and mash bits with the fanciest hash function in the world, but if the seed you start it off with is just the current time off the clock and an attacker can guess that (all but the last few digits are awfully easy to guess), then the attacker can repeat the same process and determine what &#8220;random&#8221; key was chosen, thus completely cracking your security.</p>
<p>So a <em>cryptographic</em> random number generator (as opposed to a garden variety RNG) goes to great lengths to collect &#8220;entropy&#8221;. It may start with the time off the clock, but it mixes in the number of milliseconds between key presses on the keyboard. And it mixes in the process ID of the OpenSSL process, <a href="http://www.openssl.org/docs/crypto/RAND_egd.html">data traveling over the network socket</a>, <a href="http://www.grc.com/sn/SN-146.htm">micro-timing of the hard-drive motor and of mouse movements</a>, and anything else it can use as a source of &#8220;randomness&#8221;. When the OpenSSL developers first created the data structure for the entropy pool, they <em>intentionally</em> left the memory uninitialized, because it&#8217;s yet another source of randomness.</p>
<p><img class="alignleft size-full wp-image-255" style="margin: 3px;" title="debianlogo-100" src="http://mcherm.com/blog/wp-content/uploads/2008/12/openlogo-100.png" alt="debianlogo-100" width="100" height="123" />Unfortunately, a Debian maintainer didn&#8217;t realize this. And they also made an error when changing the code: they accidentally commented out a line in such a way that all sources of entropy other than the process ID were never actually mixed in. So there was NO randomness except the process ID (which is very easily guessed).</p>
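<p>To see why a process ID alone is such a weak seed, here is a minimal Python sketch. It is not OpenSSL&#8217;s code: the function names are made up for illustration, the &#8220;key&#8221; is just a SHA-256 hash of the PID, and the upper bound of 32768 is an assumption based on a common Linux default. An attacker simply tries every possible PID until the derived key matches.</p>

```python
import hashlib

# Assumption for this sketch: PIDs fall in 1..32767 (a common Linux default).
MAX_PID = 32768


def toy_key_from_pid(pid):
    """Hypothetical stand-in for key generation whose only entropy is the PID."""
    return hashlib.sha256(str(pid).encode("ascii")).hexdigest()


def recover_pid(observed_key):
    """Try every possible PID until the derived key matches the observed one."""
    for candidate in range(1, MAX_PID):
        if toy_key_from_pid(candidate) == observed_key:
            return candidate
    return None


# The "victim" generates a key; the attacker sees only the key, not the PID.
victim_key = toy_key_from_pid(4242)
print(recover_pid(victim_key))
```

<p>With only a few tens of thousands of candidates, the whole search finishes almost instantly, which is why every key generated this way has to be treated as compromised.</p>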
<div id="attachment_258" class="wp-caption alignright" style="width: 206px"><img class="size-full wp-image-258" title="ubuntulogo1" src="http://mcherm.com/blog/wp-content/uploads/2008/12/ubuntulogo1.png" alt="Ubuntu" width="196" height="196" /><p class="wp-caption-text">Ubuntu</p></div>
<p>Then Debian was released with this bug, and Ubuntu picked it up and distributed it further. And 2 years went by. During those 2 years, everything done using the RNG on Debian or Ubuntu Linux was insecure because the keys were <a href="http://www.metasploit.com/users/hdm/tools/debian-openssl/">guessable</a>. <a href="http://mag.entropy.be/blog/2008/05/13/how-badly-debianubunutu-openssl-is-fscked-up/">Everything!</a> Any SSL connection made from such a machine. Any secure certificate signed by such a machine. And no one noticed for two whole years.</p>
<p>So the moral of the story is, don&#8217;t behave like that ignominious Debian developer and change code that you don&#8217;t understand. But also realize that for <em>any</em> supposedly-universal rule, there is some special case exception. In almost all circumstances, the rule is good, but it is still wise to provide some way for the experts who <em>do</em> need to violate the rule to declare that they are doing so on purpose (preferably with a required explanation of why). Otherwise, someone who <em>doesn&#8217;t</em> know what they are doing is likely to break things.</p>
<p style="align:center"><a title="Dilbert.com" href="http://dilbert.com/strips/comic/2001-10-25/"><img src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/2000/300/2318/2318.strip.gif" border="0" alt="Dilbert.com" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://mcherm.com/permalinks/1/an-exception-to-every-rule/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

