2009 November : Dragons in the Algorithm


Raising the limit on IDs processed

It is a fairly simple screen for entering “mass alerts”. There are (omitting some irrelevant details) just two fields: one in which the user enters the text of an alert, and the other in which they enter a list of customer IDs specifying who should see the alert. The list is normally pasted in from a spreadsheet by the users who set up new alert messages.

The feature that we need to implement (or “story” in Scrum parlance) is an increase in the maximum number of customers that can be set at once. You see, there is a “feature” that limits the number of IDs that can be set at one time to about 200. (“About” 200 because most IDs are 9 digits long and they are separated by whitespace; the actual limit is 2000 characters, enforced in JavaScript as the field is input.) So when they need to set an alert on 600 IDs, they run through the screen 3 times. When they have 2.5 million IDs to update they open up a “story” for the development team.

I think we asked someone why it was limited to 200 IDs. No one is quite sure, but it’s probably to avoid overtaxing the database query or running a middleware service that takes too long… something like that. “Sure,” we say, “we can increase the limit.” We figure maybe we’ll group it in chunks of 200 and call it in a loop or something. We schedule it to be worked on in this month’s “sprint”.

A couple of man-days of effort go into building it. Some testing determines that (on much less powerful dev hardware) a single call can easily handle thousands of IDs without running into timeout issues (more than that, actually, as we left a factor of 4 or 5 for safety). So the front end breaks the list into chunks of that size. We thought we’d build it to handle unlimited capacity, but there’s an IE6 bug (yes, our corporate overlords require the use of an obsolete, broken browser) that limits us to about 60,000 IDs.

Our Corporate Overlords
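
The chunking step described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code from the actual screen: the class name IdChunker, its method names, and whatever chunk size you pass in are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: the names here are illustrative, not taken from the real screen.
final class IdChunker {

    /** Splits whitespace-separated IDs (for example, pasted from a spreadsheet) into a list. */
    static List<String> parseIds(String pasted) {
        String trimmed = pasted.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Groups the IDs into consecutive chunks of at most chunkSize entries each. */
    static List<List<String>> chunk(List<String> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, ids.size());
            // Copy the sublist so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(ids.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk would then go out as its own call. With the old limit of roughly 200 IDs per call, a list of 600 IDs comes out as three chunks, which is exactly the three passes the users used to make by hand.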

So we have completed the feature and the business can now enter more than 50x as many IDs at a time. But that’s not quite the end of the story. Because as part of regression testing, our QA staff does some exhaustive testing of the screen, and they discover that there apparently isn’t a limit on the size of the other field, the one that contains the alert message. We check the database table for the appropriate max message length, and it turns out to be exactly 2000 characters.

Wait… I think I’ve heard that number before.

Apparently, whoever built this page in the very first place accidentally limited the length of the wrong field. There never was a reason for a limit on the number of IDs processed at once… the limit came entirely because of a bug. Yet we’ve been living with this absurd limitation for several years, simply because no one ever questioned the limit. (Or if they did question it, they got some vague answer like “I assume it’s for performance reasons.”)

I’m sure there is some lesson we should draw from this experience… I’ll leave it to you to figure out what the lesson is.

Upgrading GWT/AppEngine to v1.6+

I had a project using Google Web Toolkit (GWT) and App Engine. It was developed in Eclipse (which I don’t like much, mostly because I don’t know how to use it very well) because Google recommends this and provides support in the form of Eclipse plugins for working with these tools.

Well, they released a new version and I hit the “upgrade” button. After that, my project didn’t work anymore. I tried for a day to resolve it and I just couldn’t understand anything. Finally I “solved” it by uninstalling Eclipse and reinstalling it, then following the tutorial steps to create a brand new project and copying in my old files one-by-one. Another full day lost (I can only work a couple of hours per day on hobby projects).

Surely they wouldn’t do it again, right? So I carefully saved everything and held my breath the next time Google released an upgrade. It promptly broke everything like last time. Only this time I solved it differently: I uninstalled Eclipse and did NOT reinstall it.

The big difference is that in the intervening month JetBrains had announced that a slightly-impaired version of IntelliJ IDEA would be available for free. The stripped down version doesn’t have support for GWT and App Engine (which the paid version does have), but it’s something I can use. At work, I use IntelliJ (properly paid for) but it’s awfully expensive to pay for my own copy at home. (Can’t use the same copy because that would disturb the corporate bean-counters, even though it is allowed by the license.) The stripped down version is fine if I can run from the command line.

There are instructions for running GWT via ant. And there are instructions for adding support for App Engine. But they are broken in (what I think is) exactly the same way that the Eclipse plugin is broken. Details from a forum posting led me to realize the problem was that it now needs a “javaagent” specified. A “javaagent” is a jar containing a class whose premain() method the JVM runs before main(); the mechanism was introduced with Java 1.5.
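
For anyone who hasn’t met the mechanism before, here is a minimal sketch of what an agent class looks like. It is a toy for illustration only: the class name HelloAgent and the jar name hello-agent.jar are made up, and this says nothing about what the App Engine agent actually does internally.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class for illustration; this is NOT the App Engine agent.
final class HelloAgent {

    // Set by premain so that later code can tell the agent ran first.
    static volatile boolean loaded = false;

    /**
     * The JVM calls this before the application's main() when it is started
     * with -javaagent:hello-agent.jar (a made-up jar name), provided the jar's
     * manifest contains the line "Premain-Class: HelloAgent".
     */
    public static void premain(String agentArgs, Instrumentation inst) {
        loaded = true;
        System.out.println("agent loaded before main(), args=" + agentArgs);
    }
}
```

A real agent would typically use the Instrumentation argument to register class transformers; this one only records that it ran. Either way, the agent is attached by passing a -javaagent:path/to/agent.jar argument to the JVM.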

So after following Google’s instructions, I now add the following: In my <hosted> target, along with the other <jvmarg> elements, I add a new one which looks like this:

<jvmarg value="-javaagent:${appengine.sdk}/lib/agent/appengine-agent.jar"/>

After that, I can build it using ant. I’ll also need to use the command line for deploys; the command looks like this:

"C:\Program Files\appengine-java-sdk-1.2.6\bin\appcfg" update war

And now it works again.

[Here there should be links to more entries, but WordPress is a pain and I can't make it work.]