The Beauties and Pitfalls of ThreadLocals

ThreadLocal must be one of the Java features, which not many developers have used. It doesn't happen so often, that you stand in front of a problem, for which ThreadLocal is the best solution. In the current game I am developing, I have however decided to use a ThreadLocal to store per-thread slash per-request context information. Actually this is one of the use cases that Brian Goetz wrote about in his great article about ThreadLocals.

Other applications for ThreadLocal include storing or accumulating per-thread context information for later retrieval.


This game uses the Spring Framework and the Servlet which is handling all requests, is a Spring Bean. When the request comes in, the Servlet will delegate handling of the request data to something that we call a RpcHandler. Once the appropriate RpcHandler is selected by the Servlet, the request has to go through 3 stages and each stage is mapped to a method in the RpcHandler.


// stage 1
public void readRequest(final DatatypeInput in) {

}

// stage 2
public void performOperation() {

}

// stage 3
public void writeResponse(final DatatypeOutput out) {

}


First the request data is read, then the RpcHandler is performing it's main operation and finally some response data might be written. In other games developers have decided to create a new instance of the RpcHandler subclass for each request. This makes it easy as the per-request context information can be stored in private fields of the RpcHandler. The negative part is that this approach is using more memory as there will be 1 instance of RpcHandler per request. This might sound acceptable if the RpcHandler was simple and lightweight but often they contain further object-trees. Because our game was Spring based, I was striving for a solution were we only had 1 instance per RpcHandler sublass. Each instance is injected as a Spring Bean into the Servlet. Since the RpcHandler interface could not be changed, so that per-request context information was given as a parameter to the 3 methods above, I needed to use a ThreadLocal.

First I create a wrapper class for the context information called Message. The RpcHandler would then contain a private field of type ThreadLocal encapsulating a Message on a per-request basis. Even though ServletContainers like Jetty or Tomcat use a thread pool were Threads are recycled, this would still be okay as one Thread is used to service an entire request (actually this might be different in Servlet 3.0 where requests can be suspended and I'am assuming Threads are given back to the pool?). Quoting Brian Goetz:

ThreadLocal is also useful in servlet-based applications or any multithreaded server application in which the unit of work is an entire request, because then a single thread will be used during the entire course of handling the request.


Since usually each Spring Bean does only exist as 1 instance per ApplicationContext (unless specified otherwise), I decided to have a non-static ThreadLocal field which is sort of different from what the JavaDoc class comment suggests:

ThreadLocal instances are typically private static fields in classes that wish to associate state with a thread.


Here is an example of a typical RpcHandler subclass in the game:


public class SaveChangesHandler extends AbstractRpcHandler {

@Override
public void readRequest(final DatatypeInput in) {
// extract Message from DatatypeInput
setMessage(...);
}

@Override
public void performOperation() throws DAOException, RpcFailureException {
final Message message = getMessage();
// use Message to update the User
userService.save(message.getUser(), ...);
}

@Override
public void writeResponse(final DatatypeOutput out) throws IOException {
final Message message = getMessage();
// use Message to write something back to repsonse
out.write(message.getUser(), ...)
}

}


The AbstractRpcHandler class looks like this:


public abstract class AbstractRpcHandler extends RpcHandler<GameDaoSupport> {

private ThreadLocal<Message> threadLocalMessage;

public Message getMessage() {
return threadLocalMessage.get();
}

public void setMessage(final Message message) {
this.threadLocalMessage = new ThreadLocal<Message>();
this.threadLocalMessage.set(message);

// add a cleanup callback which will be
// executed from the ServletRequestCleanupFilter
ServletRequestCleanupService.addCleanupCallback(
new ServletRequestThreadLocalCleanupCallback(
this.threadLocalMessage
)
);
}
}


This worked well for some time but sometimes I could see the following errors in the logs:


2011-07-26 13:50:50 ERROR c.p.s.RpcServlet:handleRequest threw exception:
java.lang.NullPointerException: null
at com.foo.save.SaveChangesHandler.writeResponse(SaveChangesHandler.java:21)


Apparently the Message in writeResponse(..) was null, which was a bit strange because it was not null when performOperation(..) was invoked by the RpcServlet in stage 2. It took a bit but I found at least one problem with the code in AbstractRpcHandler:


public void setMessage(final Message message) {
this.threadLocalMessage = new ThreadLocal<Message>();
this.threadLocalMessage.set(message);

// add a cleanup callback which will be
// executed from the ServletRequestCleanupFilter
ServletRequestCleanupService.addCleanupCallback(
new ServletRequestThreadLocalCleanupCallback(
this.threadLocalMessage
)
);
}


The intention was to update the thread-local Message for the current Thread. Instead the code was reassigning the shared field to a new ThreadLocal instance and the previously assigned ThreadLocal instance was probably garbage collected. Here is a visualization of the problem flow:


Thread1 → setMessage(m1)
assigning new ThreadLocal1
ThreadLocal1.set(m1)
Thread1 → getMessage
ThreadLocal1.get → m1
Thread2 → setMessage(m2)
assigning new ThreadLocal2
ThreadLocal2.set(m2)
Thread1 → getMessage
ThreadLocal2.get → null *bang*


Took a while to spot the problem. I can really recommend to mark you static or non-static ThreadLocals as final, so this problem could not have occurred. I changed the AbstractRpcHandler to the following code and the errors disappeared.


public abstract class AbstractRpcHandler extends RpcHandler<GameDaoSupport> {

private final ThreadLocal<Message> threadLocalMessage = new ThreadLocal<Message>();

public Message getMessage() {
return threadLocalMessage.get();
}

public void setMessage(final Message message) {
this.threadLocalMessage.set(message);

// add a cleanup callback which will be
// executed from the ServletRequestCleanupFilter
ServletRequestCleanupService.addCleanupCallback(
new ServletRequestThreadLocalCleanupCallback(
this.threadLocalMessage
)
);
}
}


On a side note, I found it was a good practice to cleanup the ThreadLocals after the request was finished. Not really because this represented a memory problem as the number of Threads was limited by the Jetty thread pool. Rather I felt safer, if the recycled Thread serving a new Request would not see the Message of the previously handled HTTP Request. To do the cleanup using a Servlet Filter, I stole some code from the Apache Jetspeed 2 project (ServletRequestCleanupFilter, ServletRequestCleanupService, ServletRequestThreadLocalCleanupCallback and ServletRequestCleanupCallback).

Interviewed at Google Pt.2

Some people asked me why I wasn't publishing the second part of my Google interview experience. Well I didn't care so much about Google and job interviews in general after I moved to Norway. Anyway, I finished the whole blog post back in September at the airport in Zürich. So you might as well read the rest.

So lunch was great. Google has his own canteen with dedicated cooking staff etc. There is a salad bar, free drinks, desert and 3 dishes to choose from every day. Not bad. The canteen was packed when we went. It must have been 200 people or so.

Anyway, after lunch the next interview was again with two people, of which one was talking and the other one listening. This time I had two coding exercises in the interview. The first one, was to come up with an implementation of a class, which you could send a function or a block of executable code to, that should then be executed after 34 seconds. The second exercise was to suggest a unit test for a function producing random floats values between 0 and 1. For the first task I picked Java as concrete language and suggested using a ThreadPoolExecutor, Callable and Thread.sleep. The interviewer asked me, how we could bring down the test execution time, as we did not want to wait 34 seconds every time the test runs. I suggested having the time span configurable, so that it would fire after the configured amount of seconds. A better answer would have been to refactor the code, so that it was given an instance of a class representing a clock. I knew the clock test pattern but I was not able to pull it out in this situation - too bad. As for second task, testing random float values, I suggested creating a test set and group/count the values using some array indices. The test validation should only allow a certain degree of deviation. The interviewer then asked me how to calculate the deviation based on the input size but it was out of my league and I could not answer. For the correct answer you need some good knowledge in statistics I suppose. This was probably the worst interview I gave this day. It was also a bit harder for me because the interviewer had a very strong French accent. I had to ask him to repeat a couple of times, which might have been annoying for him. This is particular maddening because a co-worker I worked with for 3 years was French, so I should have been used to French-English.

The next interviewer was from Poland. He was a very cool guy and very friendly. We talked a bit about Apache Cassandra, which I had mentioned in my CV, since he knew one of the Facebook developers working with Cassandra. The coding exercise he had for me was about a graph problem. He gave me a class data-structure holding a list of File objects. The Files were supposed to have references to each other, pretty much like edges in a graph. I was given the task to print out the Files in a specific order. Unfortunately I understood him wrong and suggested a standard DFS traversal. But he was interested in a very specific order, which required a modified DFS approach. It was a nice interview anyways and he told me I was actually the first guy who had implemented a DFS using a stack and that he liked it. Well, I thought using a Stack for DFS was the default way, strange.

Time went by and after 45 minutes I found myself again in a very test-orientated session. The first 20 minutes or so, I was asked a lot of test related questions like "How would you do performance testing and let the development team know about the results?". My gut feeling was that I didn't give very great answers to these questions. Basically in my previous companies we never had an organization, where there was a unit of Software test engineers supporting the main developers. It is often the developer itself who does the performance testing and has the results at hand.. At least at this point, I realized I should have prepared myself more for test related questions since this was a Software Engineer in Test position after all. In my preparations I spent most time focusing on solving technical questions. But okay you never know this in advance. The final coding exercise in this interview was to come up with a suggestion on how to verify a file backup (directory structure) so that it matches a given source directory. I suggested to create an abstraction for each physical file or directory which would store the filename and a checksum. Then you could do a traversal of each tree, building up two linked lists. Finally you start removing the first item in each list and compare the items. If they don't match, it is either the directory structure that was wrong or something with the files. As the last step, I had to implement this in a language of my choice. I used again Java.

After a long day, the final interview started. A very nice girl, who worked with the Google search as a Software engineer, started to ask questions about my current job position. Surprise, this was actually the only time someone showed interest in what I was currently doing. I spoke about our system, the problems which we had and have. She asked me questions like "What happens if the database fails in this case?" and so on. I think she was testing my overall understanding of how different parts of a software system affect each other. The second part of the interview was performed as a code review. She had printed a function in Java and I my task was to review it. It was easy enough to find most of the potential issues like using static members (testability), not validating input, thread safety, going over the index bound of an array etc. I mixed up the explanation for bitwise XOR with AND, embarrassing but hey you are nervous. After doing the code review I had to rewrite a recursive function call into a iterative one, which I somehow managed. When the time was up, we said goodbye and she guided me out of the Zürich office building.

So what do I think about the interview? I think it was a great experience. I met nice people and they showed me some of my weaknesses. The preparations I had done upfront helped me only to some extend. There were less technical questions, no behavioral questions and definitely no questions about me or Google's history. A better preparation would have been to read some books on Software testing and less books about algorithms and data-structures.

So much for the second part. Here is something I am adding to the text I wrote 9 months ago. Never add technologies to your CV which you only know superficial. This can get you into embarrassing situations. Less is more. I have seen CV's for candidates who have worked with everything that exists out there - not good. For a Google interview, I think it is a plus if you know more than one language. Back then I was a Java only guy more or less. It's great if you can Python, C or C++. Some of my whiteboard examples where too Java specific, i.e. when using something like ThreadPoolExecutor take a step back and talk more about concepts than JDK classes. Don't expect every interviewer to know the language you are good at. Finally, expect to be disappointed if you fail. This can hit you hard, especially if you are used to getting a job offer every time you been to an interview.

Fluent Flexunit tests

As some of you might know, I work for a company which is doing Facebook games. Currently almost all of our Facebook games use Adobe Flash a the client and Java as the backend technology. There are different ways to have a Flex client communicate with a Java server application. For instance BlazeDS, which I really like. Unfortunately we are not using BlazeDS for the games I have been involved with. Instead we are using a framework, also based on a binary format similar to AMF, that someone at my company has written some time ago. There are some things about this framework which I don't like but overall it is performing really good. Anyway, last weekend I started to read a great book called Flex on Java, which was published by Manning almost half a year ago.

I am not the best Action Script developer, in fact I had never written a single line of Action Script when I started at my current company. The language is quite similar to JavaScript however, so Java developers are able to learn Action Script rather fast. From the first chapters in the book I could learn some stuff about Flex that I didn't knew before. Also it woke my interest in asynchronous testing using Flexunit. Similar to the Webservice example from the book, our unit-tests are also using asynchronous service calls. Well these tests are not really unit but integration tests. The first thing that is done is to start a embedded jetty using Maven and the maven-jetty-plugin:


<build>
<plugins>
<plugin>
<groupId>org.mortbay.jetty</groupId>
<artifactId>maven-jetty-plugin</artifactId>
<version>6.1.26</version>
<configuration>
<stopKey>foo</stopKey>
<stopPort>9999</stopPort>
<contextPath>/</contextPath>
<webApp>
${settings.localRepository}/package/goes/here/1.0-SNAPSHOT/game-1.0-SNAPSHOT.war
</webApp>
</configuration>
<executions>
<execution>
<id>start-jetty</id>
<phase>process-test-classes</phase>
<goals>
<goal>deploy-war</goal>
</goals>
<configuration>
<daemon>true</daemon>
</configuration>
</execution>
<execution>
<id>stopconfigurationReport-jetty</id>
<phase>prepare-package</phase>
<goals>
<goal>stop</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>


It is a bit dirty that we are starting a WAR file straight out of the Maven repository using an absolute file path. Not sure there is a way to tell the maven-jetty-plugin to start a WAR file of a Maven dependency instead. This does the job for now.

When embedded Jetty is running, our Flexunit integration tests will fire a sequence of service calls against the server. The sequence starts with resetting the test user, then some calls altering the player data and finally a call to load the test user back from the server. For the test user loaded at the end of the test, we run some Flexunit assert statements to validate that we are in the expected state. Since each of the service calls in the test sequence must be fired asynchronously, we use Async.handleEvent to wait until a certain Event has occurred otherwise we fail the test. Async.handleEvent has a Function argument called eventHandler, which will be the Function that is invoked if the Event has been fired successfully. Given that your service call sequence has a few steps, the test quickly mutates into a chain of Function calls of private Functions within the same test class. If you are coming from a JUnit or TestNG background, these sort of tests are not very attractive. For another developer, just from reading the code, it might be really hard to follow the test flow. I found myself adding comments all over the place, documenting where the test is continuing and what we were testing.

Here is a typical test which I made up to give you an example:


public class AlterPlayerTest {

private static const TIMEOUT : int = 2000;

private static const INITIALIZED : String = "initialized";
private static const SAVED : String = "saved";
private static const LOADED : String = "loaded";
private static const FAILED : String = "failed";

private var eventDispatcher : EventDispatcher = new EventDispatcher();
private var rpcClient : RpcClient;
private var user : User = null;

public function AlterPlayerTest() { }

[Test(async)]
public function testAlteringUser() : void {
this.rpcClient = new RpcClient(ConnectionParameters.getValues());

// make sure the user is created on the server side
this.rpcClient.init(dispatchInitializeEvent, dispatchFailEvent);
Async.handleEvent(this, this.eventDispatcher, INITIALIZED, alterUser, TIMEOUT); // flow continues in alterUser function
}

private function alterUser(evt : Event, passThru : Object) : void {
var user : User = new User();
user.id = new NetworkId(NetworkId.FACEBOOK, "189891015686187");

// make sure all objects are gone
var removeAll : RemoveAllObjects = new RemoveAllObjects();

var changeBatch : ChangeBatch = new ChangeBatch(new ArrayCollection([ removeAll ]).source);

this.rpcClient.saveUserUsingChangeBatch(dispatchSavedEvent, dispatchFailEvent, user, changeBatch);
Async.handleEvent(this, this.eventDispatcher, SAVED, reloadChangedUser, TIMEOUT); // flow continues in the reloadChangedUser function
}

private function reloadChangedUser(evt : Event, passThru : Object) : void {
// load back the user and verify that the RemoveAllObjects has been applied
this.rpcClient.load(dispatchUserLoadedEvent, dispatchFailEvent);
Async.handleEvent(this, this, LOADED, assertChanges, TIMEOUT);
}

private function assertChanges(evt : Event, passThru : Object) : void {
var user : User = this.user;
Assert.assertNotNull("user should not be null", user);

var objects : WorldObjects = this.user.worldObjects;
Assert.assertTrue("user should not have any objects left but had " + objects.length, objects.length == 0);
}

private function dispatchInitializeEvent(... args) : void {
dispatchEvent(new Event(INITIALIZED));
}

private function dispatchSavedEvent(results : Array) : void {
dispatchEvent(new Event(SAVED));
}

private function dispatchUserLoadedEvent(user : User) : void {
this.user = user;
dispatchEvent(new Event(LOADED));
}

private function dispatchFailEvent() : void {
dispatchEvent(new Event("Failed"));
}

}


In the AlterPlayerTest the service call sequence has contains only 3 service calls. The initial Flexunit test method is called testAlteringUser from which rpcClient.init is invoked. The test will break if no INITIALIZED Event is fired within 2 seconds. If the INITIALIZED Event is dispatched properly, the test flow continues with the alterUser function in which the service call rpcClient.saveUserUsingChangeBatch is made. Here the test expects the SAVED Event or it will fail. Being successful, the test will finally continue in reloadChangedUser, where the last service call rpcClient.load is made. Here the test user is retrieved from the Java server backend and again, a LOADED Event is expected. At the end the test is asserting that the test user has been altered.

While this test might be fine, we often have test scenarios where different game players interact with each other and the service call sequence is 10+ calls long, which makes the test really hard to read.

Fortunately today I found a nice framework called Fluint Sequences which has been contributed and become part of Flexunit. As the name suggests, use this if you want to write a fluent sequence of service calls within your test case. So here is a rewrite of the test above using Fluint Sequences:


public class AlterPlayerSequentiallyTest {

private static const TIMEOUT : int = 2000;

private static const INITIALIZED : String = "initialized";
private static const SAVED : String = "saved";
private static const LOADED : String = "loaded";
private static const FAILED : String = "failed";

private var eventDispatcher : EventDispatcher = new EventDispatcher();

public function AlterPlayerSequentiallyTest() { }

[Test(async)]
public function testAlteringUser() : void {
var rpcClient = new RpcClient(ConnectionParameters.getValues());

var user : User = new User();
user.id = new NetworkId(NetworkId.FACEBOOK, "189891015686187");

var passThroughData : Object = new Object();
passThroughData.usr = null;

var sequence:SequenceRunner = new SequenceRunner(this);

// make sure the user is created on the server side
sequence.addStep(sequenceCaller(rpcClient.init,
[function() : void {
eventDispatcher.dispatchEvent(new Event(INITIALIZED));
}, dispatchFailEvent]
)
);
sequence.addStep(new SequenceWaiter(this.eventDispatcher, INITIALIZED, TIMEOUT));

// make sure all objects are gone
sequence.addStep(sequenceCaller(rpcClient.saveUserUsingChangeBatch,
[function() : void {
eventDispatcher.dispatchEvent(new Event(SAVED));
}, dispatchFailEvent, user, new ChangeBatch(new ArrayCollection([ new RemoveAllObjects() ]).source)]
)
);
sequence.addStep(new SequenceWaiter(this.eventDispatcher, SAVED, TIMEOUT));

// reload user and store in passThroughData
sequence.addStep(sequenceCaller(rpcClient.load,
[function(user : User) : void {
passThroughData.usr = user;
eventDispatcher.dispatchEvent(new Event(LOADED));
}, dispatchFailEvent]
)
);
sequence.addStep(new SequenceWaiter(this.eventDispatcher, LOADED, TIMEOUT));

// assert changes
sequence.addAssertHandler(assertChanges, passThroughData);

// start the test
sequence.run();
}

private function sequenceCaller(fnc : Function, args : Array) {
return new SequenceCaller(
this.eventDispatcher,
fnc,
args
)
}

private function assertChanges(evt : Event, passThroughData : Object) : void {
var user : User = passThroughData.usr;
Assert.assertNotNull("user should not be null", user);

var objects : WorldObjects = this.user.worldObjects;
Assert.assertTrue("user should not have any objects left but had " + objects.length, objects.length == 0);
}

private function dispatchFailEvent() : void {
dispatchEvent(new Event("Failed"));
}
}


This test is not at all shorter than the first one but I think it is much easier to read and understand. Also some private member fields could be changed to local variables. The bread and butter of this test is the SequenceRunner, which acts as a container for steps which should be executed in order. No step will be executed until you invoke SequenceRunner.run() however. The first step is a SequenceCaller wrapping the rpcClient.init call followed by a SequenceWaiter step which waits the specified time for the INITIALIZED Event to occur. I do the same thing for altering and reloading the user. Finally I define and execute a Function to contain my assertions.