2008-12-17 | Filed Under Programming
Suppose you are setting up your database table, and you want to create a column to store an email address. How many characters should you allow in the field?
In Oracle (what I use most often), the correct type to use for an email is VARCHAR2(x) for some value of x. Oracle stores VARCHAR2 in an space-efficient manner — essentially it stores a null-terminated string. So if you declare the field to be a VARCHAR2(600), then it won’t take 600 bytes to store “JoeSmith@example.com”. It will take 21 or 42 bytes (I forget whether it stores characters in 8 bits or 16), one (or 2?) for each character plus one for the null terminator.
Given that, we don’t mind creating a really large field if that’s appropriate: it won’t take up extra space and will prevent someone’s long email address from being cut off. So my first instinct is to figure out the maximum legal length of an email address, then use this length.
Unfortunately, that turns out to be trickier than I had thought. Email was one of the first things standardized on the internet (actually, before there was an internet!) so the specifications have gone through several revisions. I believe the current specification would be RFC 5322. That does not define any length limit on email addresses, but it DOES define a maximum line length.
To get from that to email addresses, we need to divert to discuss what an email address is. The official definition says that an email address looks something like this: “Michael Chermside <email@example.com>”. However, in common parlance, people use the term “email address” to refer to just the “firstname.lastname@example.org” portion. Since we are storing email addresses entered by customers we’ll assume we are only storing the latter portion.
With that restriction, email addresses cannot contain whitespace. And RFC 5322 says header fields can only be line-wrapped at whitespace, so you can’t line wrap within an “email address” (of the sort we’ll be storing). So start with a max line length of 1000 characters, subtract 2 characters for the “\r\n” and 3 more for “TO:” and we can say that an email address over 995 characters could never receive email.
It was fun finding a theoretical limit, but to size my actual database column, I will use a much more practical approach. I really only know of one service on the internet that specializes in creating extremely long email addresses. (Yes, you can find anything on the internet.) That is the euphoniously named “http://www.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.com/“. Basically, somebody realized that although there is no length limit on domain names, there IS a limit of 64 characters in the component before the “.com” . So he registered a maximum-lenght name and started offering webmail with very long email addresses. The longest username they allow is 32 characters, with the @ and .com that comes to a nice round 100 characters.
So I will declare my email column to be VARCHAR2(100).