Java Code Splitter

Jetty7, Spring, Testing and Classloading

I am currently working on a project where Jetty is configured in a Spring context that is started for unit tests. I kept hitting: "Context attribute is not of type WebApplicationContext". However, in the debugger I could see it was given an XmlWebApplicationContext - WTF? I figured it must be some classloading issue. Maybe something that differs between Jetty 6 and 7? In previous Spring projects I have always used Jetty 6. This was the first time I tried Jetty 7 - which is still not final, I think.

Anyway, what you want to do is this in your Spring applicationContext.xml file:

Previously I never had to use the parentLoaderPriority property, but this made my tests work.

One (unrelated) question mark remains though. This is a Maven-based project and the deploy artifact is a WAR file. There was one unit test that manually started a Jetty server in-process to do some testing. My initial idea was to use the maven-jetty-plugin to start Jetty from the Maven command line before the test suite runs. However, in that case it would not have been possible to run this test in isolation in the IDE. I decided to make the test a Spring-powered unit test and have the Spring context start up Jetty. Unfortunately this required an existing WAR file, correctly set in the war property of the WebAppContext (see above). Maven builds the WAR file after running the tests. To work around this, I told Maven to construct the WAR file before the tests are run:

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-war-plugin</artifactId>
                <executions>
                    <execution>
                        <id>war-it</id>
                        <phase>generate-test-resources</phase>
                        <goals>
                            <goal>war</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

Has anyone else solved this scenario differently?

Spring Insight and Google Speed Tracer

I am sitting in a session here at Disruptive Code called "From Zero to Cloud", presented by Adam Skogman from SpringSource. The session was actually split in the middle, with lunch in between. While the talk itself did not give me much of a big WOW effect, simply because I have used the Amazon cloud services before, Adam mentioned a very cool tool called Spring Insight. He spoke about Spring Insight for just under a minute. It sounded like an extension to the Spring tc Server, which makes it possible to do performance analysis for web applications. It can measure and display information about the execution time of an HTTP request, a query execution in the database or even a single Spring bean invocation. Spring Insight can furthermore be integrated with Google Speed Tracer, which is apparently a Chrome plugin that I did not know of until a couple of minutes ago.

Spring Insight is a very interesting project for me. A while ago a former colleague and I had a similar idea for an open-source project. We were planning to implement a language-independent framework to measure the execution time of web applications. In Java, the idea was to implement this using Annotations and Servlet Filters. In PHP, we were planning to get the same outcome using explicit method calls, which the PHP developer would have to add to the code manually. While the application was executing, it would write a report file which users could then upload online to visualize the collected data. It is this part that Google Speed Tracer is doing now in the setup together with Spring Insight. Unfortunately we never had the time to work on our project, so it ended up just being an idea.
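
To make the Java half of that idea a bit more concrete, here is a minimal sketch. It is neither Spring Insight's API nor code from our project (which never got written). The @Timed annotation, GreetingService and TimingSketch are names made up purely for illustration, and a JDK dynamic proxy stands in for the Servlet Filter so that the example has no dependencies.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical marker: methods carrying it get their execution time recorded. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Timed {}

/** Hypothetical service interface, used only for this illustration. */
interface GreetingService {
    @Timed
    String greet(String name);

    String untimed();
}

final class TimingSketch {
    /** Collected report lines; the real idea was to write these to a report file. */
    static final List<String> REPORT = new ArrayList<>();

    /** Wraps target in a JDK dynamic proxy that times calls to @Timed interface methods. */
    @SuppressWarnings("unchecked")
    static <T> T timed(Class<T> type, T target) {
        return (T) Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (proxy, method, args) -> {
                    if (!method.isAnnotationPresent(Timed.class)) {
                        return invoke(method, target, args);
                    }
                    long start = System.nanoTime();
                    try {
                        return invoke(method, target, args);
                    } finally {
                        long elapsed = System.nanoTime() - start;
                        REPORT.add(method.getName() + " took " + elapsed + " ns");
                    }
                });
    }

    /** Calls the real method and unwraps reflection's exception wrapper. */
    private static Object invoke(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }

    public static void main(String[] args) {
        GreetingService plain = new GreetingService() {
            @Override public String greet(String name) { return "Hej " + name; }
            @Override public String untimed() { return "n/a"; }
        };
        GreetingService service = timed(GreetingService.class, plain);
        System.out.println(service.greet("Stockholm")); // prints Hej Stockholm
        service.untimed();                               // not annotated, so not recorded
        REPORT.forEach(System.out::println);             // one line, for greet only
    }
}
```

In a web application the same measurement would more naturally sit in a Servlet Filter around the whole request, with the annotation used for finer-grained timings inside it.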

Anyway, I googled around a bit. The biggest point of criticism for Spring Insight is apparently that it is tightly coupled to SpringSource's tc Server. Even though there are millions of Spring Framework powered applications, only a fraction of them are running in Spring tc Server. From some quick research, it seems like the main reason Spring Insight needs Spring tc Server is its use of custom container deployers to enable the functionality. The good news is that someone else has already started to work on a third-party library to enable Spring Insight functionality without forcing the user to go with Spring tc Server. The library is called spring4speedtracer. Even though it is not as powerful as the original Spring Insight - for instance, JDBC query execution times cannot be measured - it is a promising project. When I get home, I will have a closer look at this library; maybe I can contribute to the project (and put my own project idea to rest for good).

Auto Secondary Indexes in Cassandra

Twenty minutes ago, Eric Evans' talk about Cassandra ended at Disruptive Code. In the first 25 minutes or so, I was quite disappointed because it seemed to be exactly the same presentation that I saw in June at the Berlin Buzzwords conference. Even the funny Bigtable-Dynamo lovechild slide was still there, though I believe the laughter was greater in Berlin than it was in Stockholm. Well, I guess it's not so easy to get a Swede laughing.

Anyway, what I realised during Eric's presentation was that he had already added some stuff from the next Cassandra release, 0.7. First of all, every time he showed configuration, he had an excerpt from a cassandra.yaml file. For instance, this snippet from his timeseries example:


# conf/cassandra.yaml
keyspaces:
  - name: Sites
    column_families:
      - name: Stats
        compare_with: LongType

new yaml configuration in Cassandra

Apparently, as of version 0.7, the cassandra.yaml file is replacing the storage-conf.xml file. I have not really come into contact with YAML; I believe it is common in the Ruby world. Another very cool feature is the addition of secondary indexes to Cassandra. In previous versions, Cassandra did not have indexes out of the box. To mimic the behavior of a secondary index, you could create another Column Family (I believe it was called). This new Column Family would then be sorted differently and contain a key to the "original" entry. As an example, imagine having a Column Family to store addresses. To be able to search by city, you could create another Column Family called "byCity" with two properties, "city" and "address key". Every time you insert or update an address, your code has to alter the byCity Column Family as well.
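
The do-it-yourself index described above could look roughly like this. Plain Java maps stand in for the two Column Families, so this is not Cassandra client code, and AddressStore and its method names are hypothetical. The point is only the bookkeeping: every insert or update of an address also has to touch byCity.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for an address Column Family plus a hand-maintained byCity index. */
final class AddressStore {
    // "Addresses": address key -> columns of that row.
    private final Map<String, Map<String, String>> addresses = new HashMap<>();
    // "byCity": city -> sorted set of address keys pointing back at the original rows.
    private final Map<String, Set<String>> byCity = new TreeMap<>();

    /** Inserts or updates an address and keeps the byCity index in step. */
    void put(String key, String street, String city) {
        Map<String, String> previous = addresses.get(key);
        if (previous != null) {
            // On update, drop the stale index entry before adding the new one.
            removeFromIndex(previous.get("city"), key);
        }
        Map<String, String> row = new HashMap<>();
        row.put("street", street);
        row.put("city", city);
        addresses.put(key, row);
        byCity.computeIfAbsent(city, c -> new TreeSet<>()).add(key);
    }

    /** Looks up address keys through the secondary access path. */
    Set<String> keysByCity(String city) {
        return Collections.unmodifiableSet(byCity.getOrDefault(city, Collections.emptySet()));
    }

    private void removeFromIndex(String city, String key) {
        Set<String> keys = byCity.get(city);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) {
                byCity.remove(city);
            }
        }
    }

    public static void main(String[] args) {
        AddressStore store = new AddressStore();
        store.put("addr-1", "Drottninggatan 1", "Stockholm");
        store.put("addr-2", "Unter den Linden 5", "Berlin");
        store.put("addr-1", "Kurfuerstendamm 10", "Berlin"); // addr-1 moves to Berlin
        System.out.println(store.keysByCity("Berlin"));      // prints [addr-1, addr-2]
        System.out.println(store.keysByCity("Stockholm"));   // prints []
    }
}
```

Forgetting the removeFromIndex step on update is exactly the kind of bug that makes this approach tedious: the index would keep pointing at addresses that have moved.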

It looks like Cassandra will do this for you from version 0.7 on. There are two new per-column settings called index_name and index_type. If I understood Eric correctly, adding these to your configuration will create an inverted index for you, which can be used as a secondary access path. I think this is a very nice, yet largely undocumented, feature. No clue when version 0.7 is going to be released, but I hope it will be very soon, because we are only weeks away from starting a very big Cassandra project in my company.

Disruptive Code Party

Seriously, who needs JavaOne when you can go partying with Disruptive Code people at Gröna Lund? Okay, the weather might be a bit nicer in California :)

Day 1 was great, I think. Great sessions, especially the ones about HTML5. Two things I did not like: Adam Skogman's "Designing For NoSQL" talk started earlier than was printed on the badges or written on disruptivecode.com = missed it :( Also, Wi-Fi quality in the Big Hall was really bad. Nevertheless great stuff! Looking forward to day two.

Choosing the wrong track

A problem that keeps following me at conferences is picking the wrong talks. Often a session sounds nicer on the agenda than it is in reality. For the first track I selected PayPal over the HTML5 session. I was hoping to get some insights into the PayPal API. How to use it? How to integrate it, with some practical examples? The session, however, turned out to be not that detailed. It felt more like a sales talk. By the way, PayPal is one of the main sponsors of Disruptive Code. The fact that the organisers put up all conference tweets during the talk, and that things seemed to be really exciting over at the HTML5 track, made it worse.

The only good thing I could take away from this talk is some ideas for possibly integrating payment into my projects. The first of the two PayPal speakers gave some interesting project examples, e.g. a person-to-person money transfer application, a game where you can buy ammo, or crowd-sourcing (paying users for uploading pictures, entering recipes etc.). There is also a portal called PayPal X, where developers can develop applications and tools on top of the PayPal API. Similar to the iPhone, these applications have to be approved by PayPal before they are available to everyone.

So it is likely I will embed PayPal in my applications in the future. This session just did not really show me how. The only slide with code on it wasn't much help either, and it certainly was not PHP code like the speakers said.

HTML5 Web Workers and Geolocation

After having missed the first session about upcoming HTML5 features, I decided to go to Peter Lubbers' talk "HTML5 Web Sockets, Web Workers and Geolocation Unleashed". Peter is the author of the recently published Apress book "Pro HTML5 Programming". He works for a company called Kaazing, which I think is based in the Netherlands. Some time ago, my boss forwarded me a mail from Kaazing. It was about a Web Sockets presentation which they wanted to hold at our office, since game clients are one of the primary use cases where Web Sockets can come in handy. I have to admit that back then I did not want to meet them, primarily because I do not work with client products. In addition to that, I thought the technology was immature, but really I just did not know much about it back then and did not care.

Anyway, Peter's talk covered three of the most interesting APIs around HTML5 - Web Workers, Geolocation and Web Sockets. These APIs were initially part of the HTML5 spec but have since been moved into their own specifications. The idea behind all three is to make life simpler for developers. With the current generation of browsers and HTML4, developers have to come up with complicated hacks or use plugins in order to mimic bi-directional communication. What HTML5 is aiming for is to support this natively in the browser instead of building something similar on a bad foundation.

Web Workers are a new feature which brings back UI responsiveness while long-running, heavy JavaScript is executed. It enables background processing of such scripts while the user can still use the browser. Peter had an excellent example of this and hopefully I can put it up here on my blog later on. In his example he had a web form with two buttons. Each of the buttons would fire a script that keeps the browser busy for 10 seconds. One button would do this without a Web Worker, the other one with. After clicking the first button, it was impossible to use the dropdown element or even open a new tab in Firefox. All of this worked with Web Workers. Currently this feature is available in Firefox, Chrome, Safari and Opera but not in Internet Explorer. Actually it would be fun to write a web application using Web Workers where you print a message to the IE users like "if you cannot navigate right now, consider switching to another browser". Web Workers are an incentive that comes for free if you use anything other than Internet Explorer.

The second API Peter talked about was Geolocation. Again, Geolocation is only supported in some browsers right now. You want to check out caniuse.com to see if your browser supports it. Also html5test.com is a great site to check HTML5 compatibility. Geolocation is the name for native support of user location inside a browser. There are really only two methods that the API supports, getCurrentPosition() and watchPosition(). The first one does a one-time call, the latter constantly receives the location of the user. This of course makes much more sense on mobile devices than from within a fixed network. What the two calls return is simply longitude, latitude and accuracy. It is up to the developer how to use this information within the web application, e.g. by displaying it using the Google Maps API. Along with the API calls, you can also request additional metadata (like altitude, heading, speed), but if your browser cannot provide it you might end up getting null instead. Looking under the hood, Geolocation is implemented by the browser vendors using an external location service. The browser asks this service for the location and returns it to the user. I was thinking, the watchPosition() method from the API is probably a candidate where you want to use a Web Worker, unless it is already implemented on top of a Web Worker. Have to find this out.

Would like to blog more about Web Sockets but the next session about CSS3 has already started...

Conference kickoff

For me, it's rather hard to get annoyed on public transport in Stockholm. However, this morning was different. It looked like all the kindergartens and school classes in Sweden were out on a field trip. The Pendeltåg was packed, and so were the buses. I took Bus 69 from the central station to the technical museum, where the conference is taking place. I have never been in this area of Stockholm before, even though we have lived here for more than 2 years now. Wondering why they claim the Tekniska Museet is on Djurgården? This is totally the wrong island. Anyway, the museum is quite an amazing place. It's like Tom Tits Experiment, with the limitation that you are not allowed to touch and try the stuff.

A bunch of people exited the bus with me and headed for the conference. Someone handed me my badge at registration without checking my ID. Last name misspelled; did I screw this up in the blogger pass registration form? Actually it was good that my last name was spelled wrong. My company had booked a Disruptive Code ticket for me before I got elected as a blogger. I promised the ticket to someone else in the office since I didn't know that the badges came with a name. To make a long story short, you will meet two Reik Schatz at the conference now, one with the last name spelled right.

Conference is about to start now, let's see how it goes. Could already grab Eric Evans over a coffee.

How to select a data-store

I have probably mentioned this a couple of times: my company is currently in the process of selecting a new data-store which is supposed to replace MySQL for some applications that produce a lot of data. Cassandra is right now the hottest candidate and favored by my development team and also by the system administrators. Since replacing the database is obviously something bigger, we had to present this to some of the managers. They seemed to be interested as well and decided to make this a venture. This means we are getting hardware, people, resources etc. but it also means we have to follow a certain venture process that starts with a pre-study. To make it even more complicated (or bureaucratic) there is even a pre-study for the pre-study. Actually it is not as bad as it sounds. One of the architects called for a meeting today. The purpose was to find all the open questions that we have to answer in the pre-study document. It is this document which will then be presented to the people who decided on the venture.

The meeting was actually very productive. Here is some stuff that is worth thinking about if you are in a similar situation.

Alternatives: you will almost certainly get this question. What are the alternatives? Why have you selected product X? I was thinking that maybe we could come up with some sort of comparison matrix from which it will be obvious why we favor Cassandra (otherwise we probably have to ask ourselves why).

Product: The product you are evaluating. What tools does it have to offer? Which changes are required for the development and the test environments? For instance, many of our testers look up and compare results directly in the database using MySQL Query Browser. What can we give them instead? Also an interesting aspect is competence. How do you build it up? Are there any training courses you can take part in, or is there a company from which you can buy professional support? In the case of open-source software, how active is the community and how certain is it that the product is not gonna die within two years?

Operations: What hardware setup do we need? How do we back up and restore? How do you monitor the system and what do you monitor? What are the requirements on high-availability and scalability? Talking about scalability: can we add and remove nodes on the fly and how long does it take to replicate / re-balance our data onto these nodes? Do we have a disaster recovery strategy?

Impact: The most interesting part for me. How do we have to adapt existing applications in order to integrate with Cassandra? Which client library is preferable? How flexible is the new data-store when it comes to changes? For instance, we always have big-time trouble if we need to alter our MySQL schemas without downtime - a process for which Facebook even has a special name: Online Schema Change. Another interesting question: how can you effectively unit test an application using Cassandra? I guess running an in-process Cassandra, which starts up inside a unit test on a single node, will not catch all errors. Is it realistic to believe that we can come up with a comprehensive list of use-cases to describe how the data is used in our system? Such a list would greatly influence our data-model. Since there really is only one physical way to store data, how do we handle alternative access? Do we retain a bunch of MySQL indexes to support Cassandra, or do we also store these secondary access paths in the main data-store?

I do not remember all the open questions we came up with during the meeting. This is a good start at least. I hope we can create a great pre-study document to get everyone in the company on board.

Blogging at Disruptive Code

Good news everyone - I have been selected as a blogger for the Disruptive Code conference in Stockholm in September. Two weeks ago, a co-worker of mine sent me a link to the dcode website. I had not heard of this conference before, but the topics as well as the location seemed very cool. My company is currently evaluating a long-term storage prototype based on Cassandra as a replacement for our not-so-long-term MySQL solution. My main interest is therefore the NoSQL sessions and talks about Apache Cassandra. Nice to see that Eric Evans from Rackspace decided to come visit Scandinavia and talk about Cassandra insights. I saw him earlier this year in Berlin at the Berlin Buzzwords conference. Looking forward to two exciting days on the Djurgården island.