Riak, Protocol Buffers and Java in the Mix

Today I used Riak for the first time, in order to compare it with Cassandra. I set up a Maven quickstart project to talk to Riak from Java using Google Protocol Buffers. The Riak clients come in two flavours: HTTP or Protocol Buffers. The latter is supposed to be a bit faster, which is no surprise. HTTP is a text-based protocol with headers on every request, so there is some overhead involved compared to a compact binary protocol on a plain TCP socket. However, if performance is not your biggest concern, I would almost always go for HTTP because it is easier to use in client applications. Anyway, let's continue with the Protocol Buffers version (some people claim it is 10x faster).

If you are running Ubuntu like me, follow the install instructions using the binary packages. This actually installs Riak Search, which includes Riak core. What the guide does not tell you is how to start the database server afterwards. It is not too hard:


/etc/init.d/riaksearch start


To test whether it is up and running, execute:


curl http://localhost:8098/stats

curl -v http://127.0.0.1:8098/riak/test


Continue to set up a vanilla Java project using Maven:


mvn archetype:generate


Pick the maven-archetype-quickstart archetype (number 15 in my list, the number may differ in yours) and fill in the remaining prompts. Next, make sure that you have the compiler for Protocol Buffers installed:


protoc --version


If it is not installed, run sudo apt-get install protobuf-compiler.

Next we need the riakclient.proto file, from which we will generate the Java classes for the Riak protocol messages. Before you download it, create two new folders which we need later. In the Maven project structure, create src/main/proto. This is the default location for all your .proto files. Furthermore, create src/main/java-gen, which we will use as the output folder for the generated Protocol Buffers files. Now download riakclient.proto and put it into src/main/proto.

The protobuf compiler needs to be told which Java package to generate the source files into. Open riakclient.proto and add the java_package option before the first message block:


...

** 24 - RpbMapRedResp{1,}
**
*/
option java_package = "my.package.riak.client";

// Error response - may be generated for any Req
message RpbErrorResp {
    required bytes errmsg = 1;
    required uint32 errcode = 2;
}

...


Ideally, you want to generate all source code based on the Protocol Buffers files every time you build your project. This is where the maven-protoc-plugin comes in handy. In order to use it, the pom.xml file needs to be tweaked. Add the plugin repository hosting the plugin:


<pluginRepositories>
    <pluginRepository>
        <id>dtrott</id>
        <url>http://maven.davidtrott.com/repository</url>
    </pluginRepository>
</pluginRepositories>


Then hook the plugin into the build so that it runs in the generate-sources phase, before your own code is compiled:


<plugin>
    <groupId>com.google.protobuf.tools</groupId>
    <artifactId>maven-protoc-plugin</artifactId>
    <configuration>
        <outputDirectory>src/main/java-gen/</outputDirectory>
    </configuration>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
            </goals>
            <phase>generate-sources</phase>
        </execution>
    </executions>
</plugin>


With this configuration, the maven-protoc-plugin uses the protoc compiler installed on your system. This means that the Protocol Buffers runtime dependency in your project (protobuf-java) must have the same version as that compiler. So run protoc --version from the command line and set this version in your Maven dependencies section. If you don't do this, you might see compile errors like this:


Riakclient.java:[118,51] boolean cannot be dereferenced


Also note the outputDirectory defined in the plugin above. Unfortunately, the maven-protoc-plugin cleans this directory before generating sources. This is why you cannot generate the protobuf sources into the same folder as your regular Java source files: they would be deleted on every build.

Finally we are ready to write some Java code to run against Riak. Here is an example of a TestNG unit test that invokes the List Buckets operation:


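import static org.testng.Assert.assertNotNull;
import static org.testng.Assert.assertTrue;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.util.Arrays;

import org.apache.commons.io.IOUtils;
import org.testng.annotations.Test;

/**
 * List all of the buckets using protocol buffers. You need to have Riak
 * up and running on port 8087.
 *
 * @author reiks, Jan 5, 2011
 */
@Test(groups = "all", sequential = true)
public class ListBucketsTest {

    public void testListBuckets() throws IOException {
        final Socket socket = new Socket("localhost", 8087);
        DataOutputStream dout = null;
        DataInputStream din = null;

        try {
            dout = new DataOutputStream(
                new BufferedOutputStream(
                    socket.getOutputStream(), 1024 * 200
                )
            );

            din = new DataInputStream(
                new BufferedInputStream(
                    socket.getInputStream(), 1024 * 200
                )
            );

            // Request frame: 4-byte length (here only the code byte), then the message code
            dout.writeInt(1);
            dout.write(15); // 15 - RpbListBucketsReq
            dout.flush();

            final byte[] bytes = getData(16, din); // 16 - RpbListBucketsResp
            assertNotNull(bytes);
            assertTrue(bytes.length > 0);

            System.out.println(Arrays.toString(bytes));

        } finally {
            IOUtils.closeQuietly(dout);
            IOUtils.closeQuietly(din);
            socket.close();
        }
    }

    /**
     * Reads one response frame and returns its payload,
     * or null if the frame contains nothing but the message code.
     */
    byte[] getData(final int expectedCode, final DataInputStream din) throws IOException {
        int len = din.readInt();      // length of message code + payload
        int returnCode = din.read();  // message code of the response

        byte[] data = null;
        if (len > 1) {
            data = new byte[len - 1];
            din.readFully(data);
        }

        if (expectedCode != returnCode) {
            throw new IOException("Unexpected (" + returnCode + "), expected " + expectedCode);
        }

        return data;
    }
}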
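
The test above only prints the raw response bytes. With the generated sources on the classpath, you would normally hand them to the generated message class (its static parseFrom(byte[]) method). To see what is actually inside the payload, here is a minimal hand-rolled sketch. It assumes that RpbListBucketsResp consists only of a repeated bytes field with number 1 holding the bucket names (check your copy of riakclient.proto) and that the names are UTF-8 text. BucketListDecoder is a hypothetical helper name, not part of any Riak library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decodes a payload made up only of "repeated bytes" entries in field 1. */
final class BucketListDecoder {

    // Protobuf tag for field number 1 with wire type 2 (length-delimited): (1 << 3) | 2
    private static final int FIELD_1_LENGTH_DELIMITED = 0x0A;

    static List<String> decode(byte[] payload) {
        List<String> buckets = new ArrayList<>();
        int pos = 0;
        while (pos < payload.length) {
            int tag = payload[pos++] & 0xFF;
            if (tag != FIELD_1_LENGTH_DELIMITED) {
                throw new IllegalArgumentException("Unexpected tag " + tag + " at offset " + (pos - 1));
            }
            // The entry length is a varint: 7 bits per byte, high bit set means "more bytes follow"
            int len = 0;
            int shift = 0;
            while (true) {
                if (pos >= payload.length || shift > 28) {
                    throw new IllegalArgumentException("Malformed varint");
                }
                int b = payload[pos++] & 0xFF;
                len |= (b & 0x7F) << shift;
                if ((b & 0x80) == 0) {
                    break;
                }
                shift += 7;
            }
            if (len < 0 || len > payload.length - pos) {
                throw new IllegalArgumentException("Entry length " + len + " exceeds payload");
            }
            buckets.add(new String(payload, pos, len, StandardCharsets.UTF_8));
            pos += len;
        }
        return buckets;
    }
}
```

In the test you could then print BucketListDecoder.decode(bytes) instead of Arrays.toString(bytes). For anything beyond a quick look, the generated classes are the better choice.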


Double check that you run against port 8087, which is the port for Protocol Buffers communication. For a while I was using port 8098 from the curl examples above and got totally strange results, until I figured out that this port is actually for HTTP communication. Stupid me!

As you can see, this code is rather low-level. Not only do you need to know the different message codes, which you can find in the riakclient.proto file, you also need to get the java.io stuff right. In a perfect world, this code should be hidden behind a facade. This is exactly what the riak-java-pb-client library does: it lets you write application code against Riak in a convenient way and hides the low-level details. A similar facade library for Cassandra is Hector. Unfortunately, at the time of writing, the riak-java-pb-client library is only available as source files on GitHub, and there is no way to build it out of the box. I have sent the maintainers a version of their code which is restructured and can be built with Maven. With a bit of luck, you can git clone and build riak-java-pb-client in a few days.
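
To illustrate the first thing such a facade has to hide, here is a minimal sketch of the message framing used in the test above: a 4-byte big-endian length that counts the message code plus the payload, then one byte for the message code, then the payload itself. RiakFrame and its methods are hypothetical names made up for this example, not the API of riak-java-pb-client or any other library.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: one frame of the Riak Protocol Buffers wire format. */
final class RiakFrame {

    // Message codes as listed in riakclient.proto
    static final int LIST_BUCKETS_REQ = 15;
    static final int LIST_BUCKETS_RESP = 16;

    final int code;
    final byte[] payload;

    RiakFrame(int code, byte[] payload) {
        this.code = code;
        this.payload = payload;
    }

    /** Builds the bytes to send: length (code byte + payload), code, payload. */
    static byte[] encode(int code, byte[] payload) {
        // ByteBuffer is big-endian by default, like DataOutputStream.writeInt
        ByteBuffer buf = ByteBuffer.allocate(4 + 1 + payload.length);
        buf.putInt(payload.length + 1);
        buf.put((byte) code);
        buf.put(payload);
        return buf.array();
    }

    /** Parses exactly one complete frame. */
    static RiakFrame decode(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        int len = buf.getInt();
        if (len < 1 || len != buf.remaining()) {
            throw new IllegalArgumentException("Frame length " + len
                    + " does not match the " + buf.remaining() + " bytes that follow");
        }
        int code = buf.get() & 0xFF;
        byte[] payload = new byte[len - 1];
        buf.get(payload);
        return new RiakFrame(code, payload);
    }
}
```

With this, the request from the test becomes a single dout.write(RiakFrame.encode(RiakFrame.LIST_BUCKETS_REQ, new byte[0])), which produces the same five bytes as writeInt(1) followed by write(15). A real facade would also map each message code to its generated message class and handle RpbErrorResp.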

3 comments:

Andy Gross said…

As of a few weeks ago, the Java PB client is part of the official Riak Java client, which builds with Maven, and can be found here:

https://github.com/basho/riak-java-client

- Andy Gross

Reikje said…

@Andy: in that case they should update the documentation because the description says: "# Overview # This Java-based Riak client uses Commons HttpClient to perform HTTP requests."

Reikje said…

Good news. krestenkrab has accepted my changes to riak-java-pb-client so whenever you check out the project next time, you can build it using Maven.