Using the Legal System To Access Customer Data

A week or so ago Microsoft dug into a customer’s Hotmail account in order to track down some information about code that had been stolen from Microsoft. Their terms and conditions specifically allowed them to do this, but despite that they received a fair amount of criticism.

With this announcement they have decided to change their policy. Now they will only access private customer data in response to a law enforcement request — if a similar situation arises, they will ask law enforcement to investigate (by asking them to provide the data).

This is an excellent decision. Our legal system may not be perfect, but it has all kinds of checks and balances built in to prevent abuses and to balance individual’s rights against the public need to perform investigations. Rather than inventing their own “legal system” for adjudicating such things fairly, Microsoft is taking advantage of the existing system our society has built up over centuries. Other “cloud providers” (for that is exactly what web mail is) should adopt the same policy.

Reasons Why My Code Style is Wrong

I have had people tell me things like “You should never throw an exception to return a value in an unusual case, exceptions are only supposed to be used for error conditions.” And I HATE it when people say things like this.

There are a couple of audiences at play when we write code. There is the machine that needs to compile and then execute the code. If the code won’t compile or if it runs slowly, wastes memory, or produces incorrect results then we have absolutely failed. The other audience is future readers of the code — other developers who will need to read and maintain the code or even ourselves who will come back months later and say “Oh my God, what was I thinking!” when we read it.

But communicating with these two audiences (the computer and the reader) is all we are doing — we are not playing a game by some arbitrary set of rules. There are no “software police” who will come along and arrest us if we throw an exception, use a goto, fail to start our service name with a verb, or use a URI for our REST API that is inconsistent with some other part of the ontology. (Actually, there may be software police, but if so it is only because they have chosen to self-appoint themselves to that role.) Doing these things might make the results incorrect (for the computer) or difficult to read (for the developer) and that would be bad, but only because it was incorrect or difficult-to-understand, not because it broke some arbitrary rule.

So I would much rather hear someone say one of these things to me:

  • “Don’t throw an exception to return a value in this case because creating exceptions is slow, the situation occurs somewhat frequently, and it will reduce the performance of this function which is used in a tight loop and is therefore performance critical.”
  • “Don’t throw an exception to return a value in this case because there is error-handling code that will log the exception as an error before it is caught by the handler, thus producing spurious error reports.”
  • “Don’t throw an exception to return a value in this case because it will hide the fact that the function actually returns a value and that will be confusing to someone not familiar with it.”
  • “Don’t throw an exception to return a value because the other code in this module never does that and it will be inconsistent and therefore difficult for readers of the code.

In other words: It’s great that you’re telling me a different (or better) way to write some code, but don’t tell me to do it because that’s “the right way” (appeal to authority). Instead, tie it back to an actual benefit like correctness or readability.

Book Review: Learning jQuery Deferreds

I don’t often write book reviews here, but in this case I have a connection to the book. My friend Terry Jones was one of the authors of a new O’Reilly programming book (you know, the ones with the animal pictures on the covers) which is titled:

Learning jQuery Deferreds
Taming Callback Hell with Deferreds and Promises

I offered to read and provide feedback on a pre-print version of the book (the publishing process all happens on PDFs these days) and I can say it was a great read. In fact, I have the following review of the book:

Concurrent or parallel programming is hard – REALLY hard. Like quantum mechanics, it is one of the few areas where the mark of a true expert is that they admit to NOT clearly understanding the subject.

The “deferred” is an object pattern for handling one piece of the complexity of concurrent code. It helps to bridge the gap between writing things in a linear format as if for a single-threaded computer, and writing a series of triggers that go off when events occur. Those two models are not really compatible, and that can make it quite confusing.

The jQuery library offers a “deferred” object which is deceptively simple: just a handfull of methods. It could all be explained completely in about a page of text (and IS explained that way if you read the docs). But no one who was not already an expert in the use of the “deferred” pattern could possibly use it correctly.

And that is where this book comes in. The text slowly explains what the class offers and how it functions, along with the reasons why each design detail is important. And then it presents a series of exercises of increasing complexity — all reasonable real-world examples, by the end of which you will fully understand how concurrency can be tamed (partly) with the deferred class. I am a reasonably skilled programmer (20 years experience, at least 15 with concurrent programming) and I found the pace to be about right: everything explained VERY clearly with examples (which is exactly what you want for a tricky subject no matter HOW well you know it).

If you’ve been using jQuery deferreds for a couple of years now you should probably skip this book — by this point you may be an expert. But for everyone else who thinks they might be using them, this is a great little tutorial and I recommend it highly.

CAPTCHAs

CAPCHAs are those odd little boxes that show some badly malformed letters and numbers and ask you to type them in. The idea is to check whether you are a human.

The problem is that CAPCHAs are pretty difficult for humans. And they’re fairly easy for computers. There are the simple work-arounds (like paying to break CAPCHAs on Mechanical Turk). And there are the high-tech solutions where you simply build a computer that can solve them. My biggest concern though is the new kind of CAPTCHA that people have begun using. I find it to be a real problem, and it, too, can be worked around by anyone who is sufficiently motivated, but it is becoming a disturbingly common new way of identifying real humans:

Log In With Facebook

 

Version Control… for Servers

I wanted to pass on an excellent idea that I read from Martin Fowler‘s Blog. He calls it Immutable Servers, but I claim, if you think about it properly, it is merely the application of version control to systems administration.

Everyone understands just how much version control has transformed the development of software code. It enables developers make changes freely, rolling back changes if they need to. It enables them to look back in history and find out how things stood at any point in time, what was changed on a certain date, or when a given change was introduced. And with advanced usage, it allows “branching”, where one can experiment with a group of changes for a long time (while still working on the original branch as well) then merge them together later.server_versions

These features aren’t just for code. They are great for text documents that get edited frequently. They are a great idea for file systems. And system administrators are familiar with the idea of keeping all of their system administration scripts in a version control system. But some things are extremely difficult to put under version control. Databases are notoriously difficult to version (although Capital One 360 manages it). And servers, being pieces of physical hardware, are impossible to check into Git.

Except that they’re not. Servers are not pieces of physical hardware anymore… they were until the last decade, but in recent years that has changed. The vast majority of the servers in our data center either are run or can be run on virtual servers. The current buzzword is “cloud computing”, but whatever you call it, we have the technology to spin up and deploy servers from a template in a matter of minutes. (The fact that it takes weeks to get a server set up for your project has nothing to do with technical problems… that’s just our own failure to take full advantage of the technology that we own.)

So, given that the servers are probably running on a virtual machine anyway, it’s a good idea to keep a virtual machine template with the correct configuration (for quickly restoring the machine). Of course, if you do this you will need to update the template every time you make a significant configuration change. Updating the image doesn’t necessarily mean you launch a virtual machine each time, make the change, then save a new image — you can use tools like Puppet or Chef as part of the image deployment process so often it is just a matter of editing a configuration file.

For the final step, Martin Fowler proposes that you take this to its logical conclusion. If every change needs to be made on the real server AND on the template, why not simplify your workflow (and make it more reliable at the same time) by making the changes directly to the image and deploying a new copy each time. You never change the production server, just roll out a new one each time. This sounds crazy to anyone who hasn’t yet drunk the “cloud computing” cool-aid, to anyone for whom creating a new instance of a server takes more than a couple of minutes, but if you DO have an environment that flexible, then you might get all the benefits of version control but for servers. Netflix is one example of a major company that has taken this approach quite successfully.

 

I dream of Satoshi Nakamoto

bitcoin_license_plate“Satoshi Nakamoto” is the alias of the anonymous person who invented and published the protocol for Bitcoin. So far, no one knows for sure who it is, although attempts have been made to unmask the person (or people) by an analysis of their writing style and similar indicators. Now, in a blogpost, Sergio Demian Lerner has found a way to recognize coins mined by the same computer and has picked out the distinctive pattern of a certain individual who began mining almost from block one and continued mining at a consistent rate with regular restarts for a long time, without spending any of those coins.

This, he says, is Satoshi, and I applaud Sergio for this clever way to recognize an individual miner. Like Sergio, I am pleased that Satoshi’s fortune in Bitcoins is now apparently worth around $100 million USD. But Sergio also suggests that he expects this will lead to the unmasking of Satoshi once others track this to a Bitcoin somewhere which HAS been spent. (Bitcoin has many advantages, but it is NOT fully anonymous: in fact,  anyone can track a payment back to see which (anonymous) account it came from previously.)

I hope he is wrong about the unmasking. I prefer to imagine that Satoshi Nakamoto is living and working a normal job, still haunting cryptography boards in the evenings and on weekends, and occasionally checking the news to see how that Bitcoin thing is progressing. I imagine that someday, many years from now, when she dies her husband will open that envelope she left in the safe-deposit-box and it will contain a hard drive and stack of papers labeled “Now that I am gone, please publish this for the world to read.”

Okay, it’s just a romantic dream, but I’m hanging onto it as long as I can.

How NOT to do technical recruiting: Sunil Kumar of Panzer Solutions

So, “Sunil Kumar” of Panzer Solutions wrote to me a ten days ago offering a position. Normally, I appreciate hearing from recruiters. As it happens, I have no interest in a new job; I am happy with my current position and have plenty of new challenges there recently. But it is nice to hear the signs that my industry is doing well, and keeping up contacts with recruiters in my area is a good idea.

But Mr. Kumar didn’t write me about a position commiserate with my specific skills, he wrote to tell me “We have more than 100 W2 working currently with successful hit.” (That’s not quite English, but it’s fairly close.) There are recruiters who work hard to match up a particular applicant with a position where their skills and their career/environment preferences are a good fit. When I am doing the hiring (and just to note, Capital One is hiring right now in the Wilmington area), I love working with these recruiters: they bring me just 3 resumes and I end up wanting to bring in all 3 for further interviews. That’s a much more pleasant experience than digging through a stack of resumes most of whom can’t pass the FizzBuzz test.

Mr. Kumar is in a different category altogether: he clearly thinks recruiting is a numbers game: if he just sends enough applicant names to enough open positions then he’ll be successful. He won’t be, because he’s not adding value. So I politely wrote back to Mr. Kumar explaining this and asking that he not send me “blind mailing” style job offers. A week later I have received TWO other emails from Mr. Kumar stating that “Panzer Solutions is looking to hire 10-20 New H1b’s and OPT EAD’s in coming one month.” (Still, not quite English.) Besides being a violation of federal employment law (I’m not a lawyer, but I was under the impression that companies were not permitted to favor H1B holders over citizens), this is no better than spam, either for the recipient (me) or the employer to whom the names are offered.

So I am Naming and Shaming Mr. Sunil Kumar of Panzer Solutions, and I will never do business with him or his company. Here’s hoping this article jumps to the top of the search rankings for those names so that others will recognize their uselessness sooner and Panzer and Mr. Kumar can quickly go out of business and leave space for better recruiters who actually make the hiring process easier, not harder.

Things my next phone should have

After looking at the iPhone 5, I see people saying that phones already have everything they need… nothing new will happen. That’s completely absurd. There are tons of little things, for instance I want a browser that can do everything the PC browsers can do. But there are also HUGE changes needed. Here are some things that I want for my phone:

Story Points Aren’t Accurate – That’s Why They’re Good

Ben Northrop wrote to complain that story points are not accurate. They don’t (always) map linearly to hours spent, so adding up story points over a large project won’t accurately give hours for the project. In the spirit of expressing controversial opinions, I will agree, and explain why I think that’s a good thing.

I believe that story points serve as a “rough” estimate. In the teams I work with, story point estimates are made quickly (a few minutes to be sure we understand the story, then quickly discuss and reach a consensus estimate). They are quantized (must round off to some Fibonacci number) which means that any given estimate is necessarily imperfect.

As such, they provide a cheap (didn’t take long to generate) but rough (not perfectly accurate) estimate, and they have to be respected as such. Story point estimates would not be useful to answer questions like “Will this project deliver in October or November?”, but they ARE useful for questions like “Would this be a 3-month project or a 1 year project?” For some purposes, a more precise estimate is needed, and then it may be necessary to invest a few hours to a few weeks to perform detailed work to generate a more precise estimate. However, I think that such situations are rare: people *want* perfect estimates ahead of time but rarely *need* them. Also I think that people are usually fooling themselves: most (usually waterfall) projects with precise up-front estimates later discover that those estimates are not accurate.

One of the strengths of story points is that everyone (including the customer) REALIZES that they are rough and don’t correspond to a precise delivery date — something that can be difficult to explain for estimates expressed in hours.

Constant Crawl Design – Part 4

Suppose you wanted to build a tool for anonymously capturing the websites that a user visited and keeping a record of the public sites while keeping the users completely anonymous so their browsing history could not be determined. One of the most difficult challenges would be finding a way to decide whether a site was “public” and to do so without keeping any record (not even on the user’s own machine) of the sites visited or even tying together the different sites by one ID (even an anonymous one). [More...]