Java Code Splitter

After doing just Python for almost one year, I am back on the JVM with some recent Scala projects. In one of the projects I had the chance to try Akka for the first time, which is an amazing library. In one of my Actors, and I think this is quite a common use case, I needed to run some initialization logic based on the Actor's constructor arguments. During construction, the Actor would initialize an object that was expensive to create. This object would then be re-used in the receive method of the Actor.

I knew that Actor instances were shared, i.e. multiple calls to the receive method would be made on the same Actor object. So being new to Akka, I was afraid of having shared mutable state within my Actor, and I was looking for a better way to do the initialization than just having a mutable field. This is when I found out about the FSM (Finite State Machine) trait. It is a perfect way to model initialization. I created two States for my Actor (if you want to do initialization in multiple Actors, it's a good idea to keep the common states, data holders and initialization messages in a separate object).

Individual states and data holders are then created in the individual Actors. The parent (supervisor) would then create the Actor and send it an Initialize message, which would in turn create the expensive object. The Actor would then move itself to the next state and be ready to receive further messages.
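A minimal sketch of this setup with the classic Akka FSM trait is shown below. The names InitProtocol, Empty, Ready, ExpensiveResource and Worker are hypothetical, chosen for illustration; only the New and Initialized states and the Initialize message come from the description above.

```scala
import akka.actor.FSM

// Shared protocol: states, data holders and the initialization message.
// Kept in one object so several Actors could reuse it.
object InitProtocol {
  sealed trait State
  case object New extends State
  case object Initialized extends State

  sealed trait Data
  case object Empty extends Data
  final case class Ready(resource: ExpensiveResource) extends Data

  case object Initialize
}

// Hypothetical stand-in for the object that is expensive to create.
final class ExpensiveResource(config: String) {
  def handle(msg: String): String = s"[$config] $msg"
}

class Worker(config: String) extends FSM[InitProtocol.State, InitProtocol.Data] {
  import InitProtocol._

  // Every fresh instance starts uninitialized.
  startWith(New, Empty)

  when(New) {
    // Build the expensive object from the constructor argument,
    // then move to the state that handles regular messages.
    case Event(Initialize, _) =>
      goto(Initialized) using Ready(new ExpensiveResource(config))
  }

  when(Initialized) {
    case Event(msg: String, Ready(resource)) =>
      sender() ! resource.handle(msg)
      stay()
  }

  initialize()
}
```

A parent would create the Actor with something like `context.actorOf(Props(new Worker("cfg")))` and then send it `InitProtocol.Initialize` before any other message. Messages that arrive while the Actor is still in the New state are not handled in this sketch.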

While this is a very nice way to model initialization, one big problem became apparent: restarts. As soon as my Actor failed with an exception in the Initialized state, the parent Actor would restart it in the New state. This made the Actor pretty much unusable. One potential solution is probably the lifecycle methods. I could have overridden the postRestart method in my Actor, where I have access to the constructor arguments, to send an initialization message to myself. But against my gut feeling I decided to use a mutable field instead.
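The postRestart idea could look roughly like the sketch below. It was not the route taken here, and the names RestartDemo and SelfInitializingWorker, as well as the String standing in for the expensive object, are assumptions made for illustration.

```scala
import akka.actor.FSM

object RestartDemo {
  sealed trait State
  case object New extends State
  case object Initialized extends State
  case object Initialize
}

// The state data is an Option holding a String that stands in
// for the expensive object.
class SelfInitializingWorker(config: String)
    extends FSM[RestartDemo.State, Option[String]] {
  import RestartDemo._

  startWith(New, None)

  when(New) {
    case Event(Initialize, _) =>
      goto(Initialized) using Some(s"expensive($config)")
  }

  when(Initialized) {
    case Event(msg: String, Some(resource)) =>
      sender() ! s"$resource handled $msg"
      stay()
  }

  // Runs on the fresh instance after a restart. The constructor
  // arguments are in scope again, so the Actor can re-trigger its
  // own initialization instead of waiting for the parent.
  override def postRestart(reason: Throwable): Unit = {
    super.postRestart(reason)
    self ! Initialize
  }

  initialize()
}
```

One caveat: the message sent to self is queued behind whatever is already in the mailbox, so messages that were waiting at the time of the restart would still be seen in the New state. Handling them would need something extra, for example stashing.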

As I learned later, even though multiple threads share the same Actor instance, Akka guarantees that only a single thread at a time handles a message in the receive method. So now I set the mutable field to None (Option type), and when the first message arrives the field is initialized to a Some. This works fine but raises some interesting questions. Since Akka uses Dispatchers (thread pools), subsequent messages in an Actor are most likely handled by different threads. In Java, changes to fields of shared objects made in one thread are not always visible to other threads (unless the field is volatile, or the modification is done in a synchronized block or in a section guarded by a Lock). Apparently this is not a problem for Akka, whose documentation states the actor subsequent processing rule: processing of one message happens before processing of the next message by the same actor.
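The mutable field approach can be sketched as follows. LazyWorker and ExpensiveResource are hypothetical names, and the String messages are only there to keep the example small.

```scala
import akka.actor.Actor

// Hypothetical stand-in for the object that is expensive to create.
final class ExpensiveResource(config: String) {
  def handle(msg: String): String = s"[$config] $msg"
}

class LazyWorker(config: String) extends Actor {
  // Only read and written while a message is being processed,
  // and Akka processes one message at a time per Actor.
  private var resource: Option[ExpensiveResource] = None

  // Creates the expensive object on first use and caches it.
  private def resourceOrCreate(): ExpensiveResource =
    resource.getOrElse {
      val created = new ExpensiveResource(config)
      resource = Some(created)
      created
    }

  def receive: Receive = {
    case msg: String =>
      sender() ! resourceOrCreate().handle(msg)
  }
}
```

A restart creates a new Actor instance, so the field starts out as None again and the object is simply rebuilt when the next message arrives. No extra initialization message is needed.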

In layman’s terms this means that changes to internal fields of the actor are visible when the next message is processed by that actor and you don’t need to make the fields volatile.

Unfortunately it is not explained further how Akka achieves this. The visibility question DOES come up in Akka if Actors contain fields whose contents are modified when receiving a message (e.g. a val holding an ArrayBuffer to which elements are added and removed in the receive method). In that case, how does Akka make sure that those changes are seen by other threads when the next message arrives? In my application at least, I had one issue which seemed to be a visibility problem. Unfortunately I haven't yet been able to isolate and reproduce it in a unit test :( What I have so far (some parts need to be added).

I still have to fill the gap and do the HTTP POST. What I have seen is a print indicating that a smaller batch has been pushed out, which can ultimately only be a visibility issue. My guess is that the culprit is either my asynchronous POST using the Dispatch library or the way clear() is implemented in the ArrayBuffer class. I am investigating further. For now, this change got rid of the problem for me.
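One way an asynchronous POST could end up sending a smaller batch, sketched under assumptions: if the live ArrayBuffer is handed to code that runs later on another thread, a clear() in the Actor can empty it before it is read. The sketch below uses plain Scala Futures rather than Akka and Dispatch. The names postAsync and gate are hypothetical stand-ins, and this is not necessarily the cause or the fix in the application described here.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object BatchSnapshotDemo {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Stand-in for an asynchronous HTTP POST: it looks at the batch only
  // once `start` completes, and reports how many elements it saw.
  def postAsync(batch: scala.collection.Seq[String], start: Future[Unit]): Future[Int] =
    start.map(_ => batch.size)

  def run(): (Int, Int) = {
    val buffer = ArrayBuffer("a", "b", "c")
    val gate = Promise[Unit]()

    val live = postAsync(buffer, gate.future)          // shares the mutable buffer
    val copied = postAsync(buffer.toList, gate.future) // gets an immutable snapshot

    buffer.clear()   // the sender moves on to the next batch
    gate.success(()) // only now do both "POSTs" read their batch

    (Await.result(live, 5.seconds), Await.result(copied, 5.seconds))
  }

  def main(args: Array[String]): Unit = {
    val (live, copied) = run()
    println(s"live buffer: $live element(s), snapshot: $copied element(s)")
  }
}
```

Here the call that shares the buffer sees 0 elements, while the one that received `buffer.toList` still sees all 3. Handing an immutable copy to asynchronous code keeps the batch independent of later changes to the buffer.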

Akka and parameterized mutable Actor state