Parsing Tomcat Access Log for 404 Errors

Yesterday I set up a new dedicated server for a couple of domains, with Apache and mod_proxy running in front of Tomcat. It was pretty easy to set up. These are quite old domains that I had not worked with in a while, so I wanted to know whether the websites were producing a lot of 404 errors. I came up with a nice-looking Linux command, something I remembered from my current job.

This assumes that access logging is enabled in your Tomcat server.xml configuration, probably like this:

<Valve className="org.apache.catalina.valves.FastCommonAccessLogValve" directory="/etc/tomcats/logs" prefix="access." suffix=".log" pattern="common" resolveHosts="false" />

The prefix and suffix could be different, of course, but that does not matter. This creates a daily log file such as access.2009-09-24.log. To get a quick overview and spot 404 errors fast, run this command:

cat access.2009-09-24.log | cut -f 7,8,9 -d \ | sort | uniq -c | sort -gr

Here are the details. First, cat prints the file contents, which are piped through the cut command. -f 7,8,9 -d \ tells cut that you want fields 7, 8 and 9 and that the delimiter is a single space. The backslash escapes the space so that the shell passes it to cut as the delimiter instead of treating it as a separator between arguments; the pipe that follows simply ends the argument. Writing -d ' ' is equivalent and easier to read. With the common pattern, these three fields are the requested path, the protocol and the status code. sort then orders the lines alphabetically, which is needed because uniq only collapses adjacent duplicates. The next stage, uniq -c, eliminates the duplicates and prefixes each unique line with its count. Finally, sort -gr sorts numerically by that count in reverse order, so the highest number comes first. Here is some sample output:

6 /includes/css/schufafreie.css HTTP/1.1" 200
6 /images/spacer.gif HTTP/1.1" 200
6 /images/linksline.gif HTTP/1.1" 200
6 /images/banner_oben.jpg HTTP/1.1" 200
6 /favicon.ico HTTP/1.1" 404
5 /includes/js/schufafreie.js HTTP/1.1" 200
5 /images/pfeil_r_grau.gif HTTP/1.1" 200
5 / HTTP/1.1" 200
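
If you only care about the 404s, the same idea can be narrowed down with awk. This is just a minimal sketch, assuming the common pattern from above, where the requested path is field 7 and the status code is field 9. The function name count_404 is made up for illustration and is not part of Tomcat or any tool.

```shell
# Hypothetical helper: count 404 responses per requested path.
# Assumes the "common" access log pattern, split on whitespace:
#   field 7 = requested path, field 9 = HTTP status code.
count_404() {
    # Keep only lines with status 404 and print the path,
    # then count identical paths and list the most frequent first.
    awk '$9 == 404 { print $7 }' "$1" | sort | uniq -c | sort -nr
}

# Usage (log file name taken from the example above):
#   count_404 access.2009-09-24.log
```

I used sort -nr here instead of sort -gr; for plain integer counts both give the same order. If your Valve uses a different pattern, the field numbers will most likely need adjusting.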