Maybe I should have used a Lock here

Java 5 added some really nice classes in the java.util.concurrent package. For instance there is the ConcurrentMap interface which allows you to add items to a Map if they are not already contained. The code you would normally write if the ConcurrentMap didn't exist, needs to do this as a atomic check-then-act sequence, if the Map is shared between Threads. With the ConcurrentMap interface you get all of this for free using the putIfAbscent method.

I shoot myself in the foot today, with a small piece of code, which one of my unit tests was executing from a large number of Threads. The test was failing randomly, like 5% of the times. If you see tests randomly failing, it is often indicating a Date problem or a concurrency problem. Here is the class under test. Can you spot the problem?

In order to understand what the class does, you need to know what a DynamicProperty is. This is a class wrapping a value which comes from a remote source, i.e. over the Network. So instead of having fixed System properties, you would ask the remote source for the value. Then the value is cached for a couple of seconds and if expired you refresh the value. Now the ChangeAwareDynamicProperty is doing the same thing but additionally react to value changes. So every time the value is changed in the remote source, there is a costly operation that the ChangeAwareDynamicProperty needs to do. This code is not shown as it is not relevant.

What is the ConcurrentMap for? Obviously the costly operation should only be done once per value change right? So a simple approach is to use locking. Let only 1 Thread at a time go into a critical section where the current value is compared to the new value. If a change is detected, run the costly operation. This would totally work but the throughput would be horrible. Especially when a Thread detects a value change while holding the Lock, running the costly operation would block all other Threads for a while. So my idea was to use a ConcurrentMap instead of a Lock. If Threads detect a value change, they try to put the new value into a ConcurrentMap. By definition, only one Thread can put the new value into the Map. For this Thread the putIfAbscent call will return null. This Updater Thread will then run the costly operation, while other Threads will still return the previous value. Once the Updater Thread is finished, it will update the current value and remove the new current value from the ConcurrentMap. This is to prevent the Map from growing eternally. Sounds straightforward or?

Well obviously there was a problem, as the unit test was failing every now and then. Every time the test was failing, I could see that it was trying to run the costly operation twice for the same changed value, as indicated by the following log statement:

I started to believe, that the putIfAbsent method was buggy and returned null even if the value was already present in the Map. I asked another co-worker to check my code, to see if he could spot a problem. After a few minutes we realized that the problem was in my code - as to be expected. Like I said, I don't want the Map to grow forever. So the Updater Thread is removing the new changed value after the costly operation is finished. The problem is that another Thread could be waiting for the CPU in line 14. This is the line that invokes putIfAbscent. So once the Updater Thread is done, and the waiting Thread gets active, it will actually do exactly the same work again. Not good!

Our immediate solution was to not remove the Map entry after the Update Thread is finished. What we do instead is to remove the old value from the Map before reassigning the new changed value into currentVersion. As the Map will never contain more than 1 entry, it is always possible that a costly operation will be run again, even if a value has already been handled. This change only fixes the problem, that a single value change can trigger a consecutive execution of the costly operation.