Two Scala Serialization Examples

In the last two days I’ve been looking into ways to serialize and deserialize some Scala objects. I tested a few suggestions that were mentioned on this post on Stackoverflow. As a reference for myself (and because sometimes it is hard to find good examples) I am adding two examples for Scala Pickling and Twitter Chill. Let’s have a basic SBT project first.

Since I work with the Battlefield franchise let’s create some domain classes that we are going to serialize and deserialize.

The first candidate will be Scala Pickling. The following code pickles a List of 3000 random WeaponAccessory instances.

Unfortunately the code doesn't even compile. Scala Pickling relies on macros and other advanced compiler features, and compiling Pickling.scala fails. Also, people are encouraged to depend on a SNAPSHOT version, which means your build always picks up the latest, possibly unstable, changes. When I wrote this blog post I hit this issue. Verdict: scala-pickling is very easy to use and works great for very simple stuff. As soon as your object graph gets a bit more complicated you will hit weird errors. Another problem is the lack of a non-SNAPSHOT version.

The second test candidate was Twitter Chill, which is based on Kryo. chill-scala adds some Scala-specific extensions. Your SBT project should depend on chill directly, which contains the code in chill-scala (which isn’t published separately). Even though they don’t have Scala examples in their GitHub documentation, and I got some cryptic errors at first when doing stuff wrong, I have to say this is an awesome library that works great! Also the authors reply fast on Twitter. Verdict: highly recommended!

SBT and faster RPM packaging

We do a lot of Scala coding nowadays and I am trying to introduce SBT as the build tool for all our new Scala projects. When we deploy these applications to Amazon EC2 nodes, we use Chef Solo and the Instance User Data feature to install an RPM file. We don’t use custom AMIs. The RPM file is hosted in S3 and made available as a package via this yum plugin. Each time we build our project via our continuous integration server (Bamboo), a new RPM package is created and uploaded to S3.

It became more and more of a problem that building that particular application in Bamboo took a long time. The build plan ran for more than 10 minutes. So yesterday I spent some time making it build a bit faster.

First of all I have to say it is pretty lame that the SBT plugin has been broken in Bamboo since version 4.4.3 and that no one at Atlassian has shown any interest in fixing it since August 2013! I tried to fix the Bamboo plugin myself, but Atlassian has some non-public Maven repositories so I couldn’t even build it. Given that the top four Java/Scala build tools are Ant, Maven, Gradle and SBT, you could also say that Bamboo is currently about 25% broken. Anyway, a workaround is to use the Script Task in a Job and run SBT from there, which is what we do currently.

When I looked at our build there were basically two steps which took a long time. First we were creating a big one-jar (sometimes also called an uber-jar). This is a single jar file that contains all compiled classes from all dependencies as well as our own classes. To create the uber-jar we used the sbt-assembly plugin, which can run for a while if you have a lot of dependencies. But you don’t actually need a single big jar file, as you can add an entire directory of jars to the Java classpath when starting an application. So I switched to a plugin called sbt-pack, which dumps the jar files of all managed dependencies into a folder under target along with your project jar. This folder is then used later when building the RPM. Not using the sbt-assembly plugin to create a single uber-jar already saved us about 2 minutes of build time.

The second change addressed the creation of the actual RPM package. Previously we were using SBT native packager to assemble the RPM file. Unfortunately it was also not running very fast. Another big issue in Bamboo was that sbt-native-packager logs some output to standard error. This failed the build because Bamboo scans the build log for errors. (Our hack around this issue was to write an SBT task that logs 250 lines of “Build Successful” into the Bamboo log. What a mess.) Today the RPM is built using fpm. On your Bamboo server you need to install fpm, which is a Ruby Gem (gem install fpm). Then install Python and the fabric library.

And here is how we use fabric and fpm. In the root of your Scala project create a folder called build. Inside this folder store the following file:

You probably want to adapt projectname, packagename and the fpm settings to match your own project. To invoke the script during a build, create a Script task in Bamboo that executes: fab -f build/ build. When the script is executed from Bamboo it looks for a file called version.txt in the build folder. The file version.txt needs to be created upfront via SBT to propagate the project version to the Python script. This is what the custom rpmPrepare task does.

The rpmPrepare task reuses a SettingKey called branchName which contains the name of the branch in GitHub. The name of the RPM package will contain the branch name, so that you can build multiple branches of the same project in Bamboo in parallel without having to worry about version clashes. The branchName setting in SBT is read from either a system property or an environment variable called “branchName”. This variable is set from Bamboo. Each build plan in Bamboo is made of individual tasks, and for each task you can set individual environment variables. So just add -DbranchName=${} and Bamboo will feed the GitHub branch name into the task.
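As a rough illustration of the packaging step described above, here is a minimal Python sketch of how a build script might read version.txt and assemble an fpm command line. It is not the original fabfile: the helper names, the way the branch is appended to the package name, the /opt/<package> prefix and the target/pack input folder are all assumptions made for this example. Only the fpm flags themselves (-s, -t, -n, -v, --prefix, -C) are standard fpm options.

```python
from pathlib import Path


def read_version(build_dir):
    """Read the project version that SBT wrote to build/version.txt."""
    return (Path(build_dir) / "version.txt").read_text().strip()


def rpm_name(package, branch):
    """Combine package and branch name (hypothetical naming scheme).

    Slashes are replaced so that branches like 'feature/x' stay usable
    as part of a package name.
    """
    return "{}-{}".format(package, branch.replace("/", "-"))


def fpm_command(package, branch, version, pack_dir):
    """Build the fpm argument list; the caller decides how to run it."""
    return [
        "fpm",
        "-s", "dir",                    # input: a plain directory
        "-t", "rpm",                    # output: an RPM package
        "-n", rpm_name(package, branch),
        "-v", version,
        "--prefix", "/opt/" + package,  # assumed install location
        "-C", pack_dir,                 # change into the folder of jars first
        ".",
    ]
```

A fabric task (or plain subprocess.run) could then execute the list returned by fpm_command("projectname", branch, read_version("build"), "target/pack") from whatever working directory should receive the RPM.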

So after running the Python script you will have the RPM file in the WORK_DIR folder. For running Java command-line applications we use Supervisor. Here is an example of how to invoke a Main class, given that the RPM installs your project in /opt/projectname.

Publishing from SBT to Nexus

I am pretty new to SBT. Yesterday, for the first time, we wanted to publish the jar artifact of an in-house utility library to our private Nexus repository. This is an internal Nexus repository which we use mostly in Java projects built with Maven. While the task of publishing an artifact from SBT is well documented, it did not work right away. We hit some problems. We found answers to some of them on Stack Overflow, but some things we needed to figure out ourselves.

To prepare your build in SBT basically do these things. Add values for the publishTo Setting and the credentials Task. I recommend using a credentials file not under version control for obvious reasons. The first thing you want to verify is that you are using the correct “realm” value, which can be either a property in the credentials file or the first argument to the constructor of the Credentials class. Use curl to figure out the correct value as explained here. Send a POST to the Nexus repository which you want to publish to without any authentication arguments. For us this was the call.

Look for the WWW-Authenticate header and use the realm value. I think the default is “Sonatype Nexus Repository Manager”.
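To make the realm lookup concrete, here is a small Python sketch that pulls the realm out of a WWW-Authenticate header value once you have copied it from the curl output. The function name is made up for this example, and the header string in the usage note simply reuses the default realm mentioned above.

```python
import re


def parse_realm(www_authenticate):
    """Return the realm from a WWW-Authenticate header value, or None."""
    match = re.search(r'realm="([^"]*)"', www_authenticate, flags=re.IGNORECASE)
    return match.group(1) if match else None
```

For a header value such as BASIC realm="Sonatype Nexus Repository Manager", parse_realm returns the text between the quotes, which is the string the credentials need to match exactly.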

This was a step in the right direction but we still got the following error in SBT:

Not super useful, but more info is actually available in the Nexus log files. Make sure you set the log level to DEBUG via the Nexus admin GUI first, then tail nexus.log while you try to publish from SBT. Here is some output from nexus.log, basically saying that SBT did not send a value for username and password as part of the Basic Authentication.

And I was using the following build.sbt file:

After running a few tests, I figured out that the second argument to the sbt.Credentials class should only be the host and must not include the port – doh! After fixing this, everything works just fine. Another thing you want to check via the Nexus admin GUI is the Access Settings of your repository. For “Deployment Policy” we have set it to “Allow Redeploy”.
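The host-without-port rule is easy to get wrong when copying values from the repository URL. As a hedged sketch, the Python below derives the bare host from a URL and renders the four properties (realm, host, user, password) that an SBT credentials file contains. The function names and the example URL are hypothetical and only serve to illustrate the point.

```python
from urllib.parse import urlsplit


def credentials_host(repo_url):
    """Return only the host name of the repository URL, without the port."""
    return urlsplit(repo_url).hostname


def credentials_file(realm, repo_url, user, password):
    """Render the contents of an SBT-style credentials properties file."""
    lines = [
        "realm=" + realm,
        "host=" + credentials_host(repo_url),  # host only, never host:port
        "user=" + user,
        "password=" + password,
    ]
    return "\n".join(lines) + "\n"
```

For a made-up URL like http://nexus.example.com:8081/nexus/content/repositories/releases, credentials_host yields nexus.example.com, which is the form the second Credentials argument expects.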

Dynamic Type System Trouble

This week I really learned to appreciate my Java compiler. I learned it the hard way – by not using it. In the last game that we released (Battlefield 4) I have implemented a feature for our players which suggests 3 game items to progress on, i.e. a weapon to unlock, an assignment that should be finished etc. Our internal name for this feature is “Suggestions”. A player would not only see these 3 items but also see his own progress towards reaching the suggestion goal of each item. The code that calculates the 3 items has become quite complex since there are a lot of different item types that we can pick from and we need to match each player individually. The code is written in Python, my favorite language at this point, which uses a dynamic type system.

The “Suggestions” feature was tested thoroughly and worked quite well in production. I implemented some additional functionality on top. Players now also had the opportunity to manually pick individual items so they could see their progress in the game and on our companion website Battlelog. Unfortunately, after a few weeks players complained about strange problems. These players would see completely random items being suggested to them, with the progression being totally off. In some cases, players got items suggested that they had already completed or unlocked. These errors happened completely at random, and we were not able to reproduce them in any of our test systems. But it was happening mostly to players that played the game a lot. So I started to investigate.

No unit test was broken, and a long code review did not surface any problems either. Fortunately we have very short release cycles. So I added some additional logging to this functionality, which was released to production earlier this week. This finally got me something! I could see that in some rare cases the function which calculates the suggested items of a player returns not just 3 but more: 4, 5, 6, sometimes 9 items! I am posting a ridiculously simplified version of the code below. Try to spot the problem.

I should also tell you that an instance of the SuggestionService is shared. The service is used in an application which uses gevent. There are many Greenlets (lightweight threads) which call the suggest method simultaneously. Ring ring – multithreading issue! The problem is in Line 10, where two parentheses are missing. Instead of creating an instance of the ProgressSuggestions class every time the suggest method is called, the code gets a reference to the ProgressSuggestions class itself and assigns it to a variable called progress. Then, on the first invocation, it dynamically adds a suggestions class field to that class. This is something that would neither be possible nor compile in a statically typed language like Java. All Greenlets modify the same class object, so players’ suggestions can overwrite each other. The simple fix is to create an instance of the ProgressSuggestions class as was intended. I am surprised that this bug could live so long. In a truly multithreaded application this would have affected many more players. Greenlets are only semi-parallel. They must yield at a bad time to trigger this problem. Here is the correct version.
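To show the effect in isolation, here is a small self-contained sketch of the failure mode. It is not the production code: the function names are invented, and plain generators stand in for Greenlets so that the example runs without gevent, with each yield marking a point where a Greenlet switch could happen.

```python
class ProgressSuggestions:
    """Holds the items suggested to one player."""


def suggest_buggy(items):
    progress = ProgressSuggestions       # BUG: class object, no parentheses
    progress.suggestions = []            # becomes a class attribute shared by all
    for item in items:
        yield                            # simulated Greenlet switch
        progress.suggestions.append(item)
    return list(progress.suggestions)


def suggest_fixed(items):
    progress = ProgressSuggestions()     # a fresh instance per call
    progress.suggestions = []
    for item in items:
        yield
        progress.suggestions.append(item)
    return list(progress.suggestions)


def run_interleaved(tasks):
    """Step all generators round-robin and collect their return values."""
    results = [None] * len(tasks)
    pending = list(enumerate(tasks))
    while pending:
        still_running = []
        for index, task in pending:
            try:
                next(task)
                still_running.append((index, task))
            except StopIteration as stop:
                results[index] = stop.value
        pending = still_running
    return results


player_a = ["a1", "a2", "a3"]
player_b = ["b1", "b2", "b3"]
buggy = run_interleaved([suggest_buggy(player_a), suggest_buggy(player_b)])
fixed = run_interleaved([suggest_fixed(player_a), suggest_fixed(player_b)])
```

With this interleaving, the buggy variant hands the first player five items and the second player six, mixed from both players, while the fixed variant returns exactly three items per player.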