Threadsafe Java Servlets
2008-09-23 | Filed Under Programming
Web servers are inherently threaded applications: their primary purpose is to serve up a website or web application to a large number of users. Essentially all of the frameworks for creating web applications, such as Java’s “servlet” specification and all of the structure built on top of it, provide built-in support for handling queries from different users simultaneously, and they make it possible for these threads to operate “safely” (without data corruption) so long as a few basic rules are followed (“Don’t store anything in the servlet instance variables.”, and “Don’t access anything stored in static variables unless it is threadsafe.”).
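The first rule can be illustrated without any servlet API at all. The sketch below is only an illustration: `UnsafeGreeter` and `SafeGreeter` are hypothetical stand-ins for a servlet, assuming (as is typical) that the container shares one instance among all request threads. It replays one unlucky interleaving by hand in a single thread, so the outcome is deterministic.

```java
// Illustrative stand-ins for a servlet; all class and method names are hypothetical.
class InstanceFieldDemo {

    // Breaks the rule: per-request state lives in a field shared by all requests.
    static final class UnsafeGreeter {
        private String currentUser;

        void readRequest(String user) {
            currentUser = user;
        }

        String writeResponse() {
            return "Hello, " + currentUser;
        }
    }

    // Follows the rule: per-request state stays in parameters and local variables.
    static final class SafeGreeter {
        String handle(String user) {
            String greeting = "Hello, " + user;
            return greeting;
        }
    }

    // Replays an interleaving in which Bob's request arrives between the two
    // halves of Alice's request, both using the same shared instance.
    static String aliceResponseFromUnsafe() {
        UnsafeGreeter shared = new UnsafeGreeter();
        shared.readRequest("alice");   // thread 1 starts Alice's request
        shared.readRequest("bob");     // thread 2 starts Bob's request
        return shared.writeResponse(); // thread 1 finishes Alice's request
    }

    public static void main(String[] args) {
        System.out.println(aliceResponseFromUnsafe());         // prints "Hello, bob"
        System.out.println(new SafeGreeter().handle("alice")); // prints "Hello, alice"
    }
}
```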
However, threading issues for web servers are not limited to the fact that there are multiple simultaneous users: it is also possible to be processing multiple HTTP requests from a single user at the same time. In fact, this happens all the time with requests for images. Most browsers today download images 4 at a time (or more) after the basic page has loaded. Fortunately, downloading images tends to be a read-only operation with little opportunity for threading problems. In “the old days” (in our industry that probably means last year) users would mostly view one page, wait while it finished loading, then follow a link or submit a form to a new page. So images were the only thing happening “simultaneously” for a given user, and there was little danger of data corruption.
The advent of AJAX and rich internet applications has changed all that. Modern websites don’t consist of a series of pages and forms; they consist of an interactive environment where the user modifies controls which communicate with the server. Think of Yahoo Mail, Google Maps, or NetVibes. And that means that the threading issues which formerly caused very little grief are now poised to become a major issue.
What, you may ask, is so special about multiple requests from the same user? If safely handling multiple simultaneous requests from different users is a solved problem, why would safely handling multiple requests from the same user be any different? Well, two threads can run safely as long as they do not share any objects. Avoiding instance variables in the servlet (or implementing SingleThreadModel, but that’s not as good a solution) prevents one possible kind of shared object. Avoiding static variables (except possibly those designed for threadsafe use) prevents another. And the servlet specification says that the container is responsible for ensuring that the other objects available to the servlet are safe to use… except for one: the session.
The session (usually the HttpSession) is used to carry around data from one call to the next within a user’s session. (This is needed because the HTTP protocol is inherently stateless, but rich internet applications are quite stateful — the state is stored in the session). Some calls (like those downloading an image) won’t use the session, and those have no threading issues. But many calls will read some information from the session and write other information. If multiple threads are reading from and writing to the same object and that object is not specially designed to be threadsafe (and HttpSession is not), then it is a recipe for threading disasters.
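To make the read-then-write hazard concrete, here is a minimal sketch. It assumes a plain `HashMap` as a stand-in for the session's attributes and a hypothetical `clickCount` attribute, neither of which comes from the servlet API. Rather than racing real threads, it replays one bad interleaving step by step, so the lost update shows up every time.

```java
import java.util.HashMap;
import java.util.Map;

class LostUpdateDemo {

    static int interleavedIncrements() {
        // Hypothetical stand-in for one user's HttpSession attributes.
        Map<String, Object> session = new HashMap<>();
        session.put("clickCount", 0);

        // Request A and request B both read the current value...
        int seenByA = (Integer) session.get("clickCount");
        int seenByB = (Integer) session.get("clickCount");

        // ...and then both write back "what I saw, plus one".
        session.put("clickCount", seenByA + 1);
        session.put("clickCount", seenByB + 1);

        return (Integer) session.get("clickCount");
    }

    public static void main(String[] args) {
        // Two increments were requested, but only one is recorded.
        System.out.println(interleavedIncrements()); // prints 1
    }
}
```

With real threads the same interleaving happens only occasionally, which is exactly what makes it so hard to reproduce in a test case.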
So what is the solution? I will try to describe some common “solutions” and why they are not satisfactory for the project I am starting on at the moment. Then (in a future post) I hope to sketch out a more complex but more capable solution which does meet my current needs.
The most common solution, the one used by the vast majority of Java web application programmers, is called “pretend it doesn’t matter”. Whether through ignorance (probably), or a belief that “it won’t matter anyway”, they simply don’t do anything about the threading behavior of the session. And most of the time, they get away with it – after all, data corruption due to threading behavior is a rare problem, occurring in an indeterminate fashion that is usually impossible to replicate in a test case. So the program “nearly always” works, and the few glitches are just ignored. For me, this approach (though tempting) is simply too dangerous: you should never rely on undefined behavior, particularly not if you work for a bank: people tend to get peeved when their bank makes small, random data errors.
Another solution is simply to avoid storing any data in the user’s Session. While this would work, it rather undermines the usefulness of the display tier. For most non-trivial web applications we don’t want to have to re-fetch data from the database on each and every HTTP request — we need to cache various pieces of information in the Session.
The next simplest solution is to obtain a lock on the user’s Session (or some standard object in it) before reading from or writing to it. The advantage of this approach is that it is straightforward, and that it is guaranteed to avoid concurrent access. There is even support for it in several major frameworks, such as the synchronizeOnSession property on the AbstractController in Spring MVC. The disadvantage is that different HTTP requests from the same user are handled one-at-a-time by the server. This means that we don’t get to take full advantage of the massive multi-threaded server that is serving our application: sure, all that hardware allows us to support hundreds of simultaneous users, but it won’t speed up the experience for any one user. It also means that if we issue several requests in quick succession, they will “line up” at the server… I have seen cases where one busy page “locked up” the server with a whole series of requests so that even if the user navigated away to a different page there was still a noticeable delay while the server performed the already-queued requests. And finally, it would prevent one thing which I certainly want, which is for certain long-running requests to go on in the background while the user’s interactions continue. So this solution, while simple and elegant, won’t meet my needs.
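A rough sketch of the lock-the-whole-session approach, again assuming a plain `HashMap` in place of the real session and a hypothetical `clickCount` attribute. Every simulated request takes the same lock, so no update is lost, but the requests also run strictly one at a time.

```java
import java.util.HashMap;
import java.util.Map;

class SessionLockDemo {

    // Every request takes the one session-wide lock before touching any attribute.
    static void handleRequest(Map<String, Object> session) {
        synchronized (session) {
            Integer count = (Integer) session.get("clickCount");
            session.put("clickCount", count == null ? 1 : count + 1);
        }
    }

    // Simulates `threads` concurrent requests from one user, each repeated `perThread` times.
    static int runRequests(int threads, int perThread) {
        Map<String, Object> session = new HashMap<>(); // stand-in for the session
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    handleRequest(session);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (session) {
            return (Integer) session.get("clickCount");
        }
    }

    public static void main(String[] args) {
        System.out.println(runRequests(8, 1000)); // prints 8000
    }
}
```

The count is always exact, which is the attraction. The cost is that a slow request holding the lock delays every other request from that user.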
The next obvious idea is that instead of locking on the entire session, you can just lock on specific bits of data. So if the session contains variables “loginSuccessful”, “userName”, “cachedUserData”, and “cachedUserPreferences” (the latter two being complex structures in their own right), then instead of obtaining a lock on the entire Session before reading or writing any of the variables, you could instead have a policy of getting a lock on an individual object before accessing it. Unfortunately, this approach is fraught with problems. One is the race condition for object creation: if two threads both check for a “cachedUserData” and find that it is missing, they might both try to load and save it simultaneously. This can be avoided (somewhat awkwardly) by having a separate lock object for every possible variable and creating all of the lock objects at Session initialization.
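The per-variable lock idea might look something like the sketch below. `UserSession`, its fields, and the pretend database load are all hypothetical. The point is only that the lock object is created together with the session, so two requests that both find “cachedUserData” missing cannot both load it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class PerAttributeLockDemo {

    // Hypothetical stand-in for one user's session.
    static final class UserSession {
        // A concurrent map, because different attributes are guarded by different locks.
        final Map<String, Object> attributes = new ConcurrentHashMap<>();
        // Created with the session, so it exists before any request needs it.
        final Object cachedUserDataLock = new Object();
        // Counts the pretend database loads so the effect is observable.
        final AtomicInteger loads = new AtomicInteger();
    }

    // Pretend database fetch.
    static String loadUserData(UserSession session) {
        session.loads.incrementAndGet();
        return "user data";
    }

    // The check and the load happen under the attribute's own lock.
    static String getCachedUserData(UserSession session) {
        synchronized (session.cachedUserDataLock) {
            String data = (String) session.attributes.get("cachedUserData");
            if (data == null) {
                data = loadUserData(session);
                session.attributes.put("cachedUserData", data);
            }
            return data;
        }
    }

    // Runs several simulated requests at once and reports how many loads happened.
    static int loadsAfterConcurrentRequests(int threads) {
        UserSession session = new UserSession();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> getCachedUserData(session));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return session.loads.get();
    }

    public static void main(String[] args) {
        System.out.println(loadsAfterConcurrentRequests(16)); // prints 1
    }
}
```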
Unfortunately, there is a more serious problem: that of deadlock. Suppose two different HTTP requests both need to access the “userName” and the “cachedUserData”. If one locks the “userName” while the other locks the “cachedUserData”, and then each tries to obtain the other lock, they will deadlock, and neither can continue (nor can any other thread using these locks!). The only solution that I know of for this problem is to always obtain all locks in a fixed order… but that is particularly difficult: in the general case it requires global knowledge of ALL code in the entire application.
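As a sketch of the fixed-order rule, the hypothetical helper below (`withLocks` is an invented name) sorts the requested attribute names and always acquires their locks alphabetically. Two requests that ask for the same pair of locks in opposite orders therefore cannot deadlock. It only works if every caller goes through the helper, which is exactly the global-knowledge burden described above.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class OrderedLockDemo {

    // One lock per attribute name; in a real application this would live with the session.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    // Acquires the named locks in alphabetical order, runs the action, then releases them.
    void withLocks(Runnable action, String... names) {
        String[] ordered = names.clone();
        Arrays.sort(ordered);
        int acquired = 0;
        try {
            for (String name : ordered) {
                locks.computeIfAbsent(name, n -> new ReentrantLock()).lock();
                acquired++;
            }
            action.run();
        } finally {
            // Release in reverse order of acquisition.
            for (int i = acquired - 1; i >= 0; i--) {
                locks.get(ordered[i]).unlock();
            }
        }
    }

    // Two simulated requests name the same locks in opposite orders.
    static int runOpposingRequests(int iterations) {
        OrderedLockDemo session = new OrderedLockDemo();
        int[] counter = {0};
        Runnable bump = () -> counter[0]++;
        Thread first = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                session.withLocks(bump, "userName", "cachedUserData");
            }
        });
        Thread second = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                session.withLocks(bump, "cachedUserData", "userName");
            }
        });
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(runOpposingRequests(10000)); // prints 20000
    }
}
```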
And weighed against these tricky threading difficulties, the benefits of locking on specific bits of data are not very impressive. There will often be a single piece of data (“cachedUserPreferences” perhaps) that is used by nearly every command – in which case it reduces to being essentially a lock on the session.
So how CAN one solve this dilemma? I am hoping to come up with a solution by using immutable (copy-on-write) data structures. But this is long enough already: that will have to be a topic for a future essay.
Comments
6 Responses to “Threadsafe Java Servlets”
It should be noted the HttpSession is not inherently ThreadSafe or not – it’s just an interface that dictates what a servlet HttpSession should provide, not ‘how’ it should provide it.
It is certainly possible to provide an implementation within your given container that supports the threading safety that you are looking for. I would even venture to say that in some of the more popular servlet containers it is more than likely to already have been dealt with as an issue in the default implementation of the session.
There are also well known ways to deal with the ‘double’ object creation issues that you outline. I would be interested in talking to you about the problem you are attempting to solve so as to understand if you may be solving a problem that has already been dealt with.
Just my $0.02.
Joe
Very nice analysis. The biggest problem with concurrency is that people don’t think about it sufficiently, leading to all those grand issues with mutable state, locking etc.
My personal favorite solution (for the moment) is the one undertaken by Clojure: immutable, persistent (*not* copy-on-write!) data structures referenced by shared mutable refs with concurrency semantics. Unfortunately, to get the full impact of Clojure’s references, you really need a transactional memory model, which isn’t exactly something you can retrofit onto a language. Still, “altering” an immutable data structure and storing the result in a shared mutable field is a lot better than actually modifying a single, shared mutable data structure. Once you add simple Mutex locking to the shared mutable variable, then you’re good to go!
As an aside, persistent data structures are a *lot* better than copy-on-write. They’re both fully-immutable, but copy-on-write is hideously inefficient for write operations since the entire structure must be copied. Persistent data structures share structure between new and old versions. With a fully-persistent data structure (like singly-linked lists), you don’t need to copy anything on write, just make the new head point its tail to the old list. More complicated data structures (like red-black trees) usually end up copying a little more, but almost never the entire structure.
Optimally, a fully-immutable data structure should preserve all of the performance guarantees of its mutable counterpart. This is usually hard to achieve, but when it can be done the results are extremely compelling.
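The structure sharing described for singly-linked lists can be sketched in a few lines of Java. `PersistentList` is a hypothetical minimal class written for illustration, not Clojure's implementation: prepending allocates one new node whose tail is the old list, and the old list is left untouched.

```java
class PersistentListDemo {

    // Minimal immutable singly-linked list (illustrative only).
    static final class PersistentList<T> {
        final T head;
        final PersistentList<T> tail; // null marks the end of the list
        final int size;

        PersistentList(T head, PersistentList<T> tail) {
            this.head = head;
            this.tail = tail;
            this.size = (tail == null ? 0 : tail.size) + 1;
        }

        // "Writing" creates one new node and shares the whole existing list as its tail.
        PersistentList<T> prepend(T value) {
            return new PersistentList<T>(value, this);
        }
    }

    public static void main(String[] args) {
        PersistentList<String> older = new PersistentList<String>("b", null).prepend("a");
        PersistentList<String> newer = older.prepend("z");
        System.out.println(older.size);          // prints 2
        System.out.println(newer.size);          // prints 3
        System.out.println(newer.tail == older); // prints true
    }
}
```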
Joe:
I would love to hear about a servlet container that provides some kind of support *other* than the ability to process the requests one-at-a-time for a user (same as locking on the Session). I was under the impression that neither Tomcat, JBoss, WebSphere nor WebLogic offered that. Am I wrong about one of these, or do you know of another container which does?
As for the “double object creation” issue, I’ll stop by your desk sometime; I’d be interested to hear your ideas.
Daniel:
Ah… apparently I misused the term “copy on write”. I was actually envisioning an immutable data structure held by reference in the (mutable) Session, just as you describe. This works great for threads that do reading (they can work with the copy that was valid when the thread began, even if subsequent updates occur). But I’m still trying to figure out the details for threads that do writes. I don’t quite understand your statement, “Once you add simple Mutex locking to the shared mutable variable, then you’re good to go!” but hopefully I’ll figure it out.
I _do_ wonder why there aren’t more published solutions out there for this problem.
When I searched around on the web for other descriptions of this problem and perhaps for people who have solutions (other than locking on the session) I found almost nothing that talked about it. It sometimes makes you wonder… am I imagining things? Is the problem real but everyone in the world is ignoring it, or is it that I am just confused about the issues?
Well, it turns out that I’m not the only one. Joe Campbell pointed me toward a well-written article by Brian Goetz that describes this problem in detail. Brian is one of the world’s top experts on Java threading: among other things he is the author of “Java Concurrency in Practice”. So I am not confused… there IS a real issue here, and 99% of all web applications are broken.
“This can be avoided (somewhat awkwardly) by having a separate lock object for every possible variable and creating all of the lock objects at Session initialization.”
For this problem, you could either declare those objects volatile, or, perhaps even better, make them Atomic. Check out the docs on atomic variables here: http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/atomic/package-summary.html
This’ll allow you to read and write variables threadsafely, and also to compare and update them in a single atomic step. That matters because otherwise the threading system may decide to switch to another thread in between a comparison and an update of a variable.
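One way these pieces might fit together is sketched below, under assumptions of my own: a hypothetical immutable `Preferences` object is held in an `AtomicReference`, and writers retry with `compareAndSet` until no other thread has swapped the reference in between. Note that `volatile` on its own only makes reads and writes visible across threads. It does not make a read-modify-write sequence atomic, which is what the compare-and-set loop provides.

```java
import java.util.concurrent.atomic.AtomicReference;

class AtomicSnapshotDemo {

    // Immutable value object (hypothetical): "changing" it produces a new instance.
    static final class Preferences {
        final String theme;
        final int fontSize;

        Preferences(String theme, int fontSize) {
            this.theme = theme;
            this.fontSize = fontSize;
        }

        Preferences withFontSize(int newFontSize) {
            return new Preferences(theme, newFontSize);
        }
    }

    // Retries until no other thread replaced the snapshot between our read and our write.
    static void growFont(AtomicReference<Preferences> ref) {
        while (true) {
            Preferences current = ref.get();
            Preferences updated = current.withFontSize(current.fontSize + 1);
            if (ref.compareAndSet(current, updated)) {
                return;
            }
        }
    }

    // Runs concurrent writers against one shared reference and returns the final font size.
    static int fontSizeAfterConcurrentUpdates(int threads, int perThread) {
        AtomicReference<Preferences> ref =
                new AtomicReference<>(new Preferences("dark", 10));
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    growFont(ref);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return ref.get().fontSize;
    }

    public static void main(String[] args) {
        System.out.println(fontSizeAfterConcurrentUpdates(4, 1000)); // prints 4010
    }
}
```

Readers never block: a thread that called `ref.get()` keeps a consistent snapshot even if writers replace the reference afterwards. This only covers updates to a single reference, so it does not by itself coordinate changes across several session attributes.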