access_log flowchart?

Apache, probably just like every other web server, keeps very detailed logs by default: every single HTTP request, the IP, the exact time, the hostname they accessed, the exact path, the HTTP version, the returned status code, the returned file size, the referring page, the client’s browser…
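To make those fields concrete, here is a rough sketch of pulling them out of a single log line in Python. It assumes Apache's stock "combined" log format, which does not include the hostname field (that takes a custom LogFormat), and the sample line, IP, and names like `parse_line` are made up for illustration.

```python
import re

# Assumed format: Apache "combined"
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

# Hypothetical sample entry (documentation-range IP, invented paths).
sample = (
    '203.0.113.7 - - [10/Oct/2008:13:55:36 -0700] '
    '"GET /about.html HTTP/1.1" 200 2326 '
    '"http://example.com/" "Mozilla/5.0"'
)
entry = parse_line(sample)
```

Every value comes back as a string, so the status code and size would still need converting if you wanted to do arithmetic on them.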

grep makes it trivial to find all access_log entries for a given visitor. (And grep + ssh + NFS makes it easy to find all access_log entries across a load-balanced cluster without centralized logging set up.) But it’s kind of like asking someone, “What route did you take to get to this party?” and getting an answer that involves every single turn taken, the precise distances, and every landmark encountered. I don’t care about every HTTP request made. What I want is a flowchart. They first arrived on the site from such-and-such a referrer, accessing a certain URL on our site. They accessed a handful of files as a result: CSS, static images, and, ultimately, another page.

If you view the referrer and the request as a parent-child relationship, it seems like it should be pretty easy to graph. Something like Graphviz is probably perfectly suited for that type of data. It shouldn’t be so hard. So why hasn’t someone much smarter than me written a slick little application that would do this ten times better? Is there not that much of a demand? I find that hard to believe.
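For what it's worth, the parent-child idea can be roughed out in a few dozen lines of Python. This is only a sketch under some assumptions: logs in Apache's "combined" format, one visitor identified by IP alone, and made-up names (`edges_for`, `to_dot`, the `blog.example` host, the sample lines). It emits Graphviz DOT text rather than calling Graphviz itself.

```python
import re
from urllib.parse import urlsplit

# Only the fields needed here: client IP, requested path, referrer.
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "(?P<referrer>[^"]*)"'
)

def edges_for(lines, ip, site_host):
    """Collect unique (referrer, request) pairs for one client IP."""
    edges = []
    for line in lines:
        m = LINE.match(line)
        if not m or m["ip"] != ip:
            continue
        ref = m["referrer"]
        parts = urlsplit(ref)
        if ref in ("", "-"):
            parent = "(direct)"          # no referrer was sent
        elif parts.netloc == site_host:
            parent = parts.path or "/"   # on-site referrer: keep the path only
        else:
            parent = ref                 # off-site referrer: keep the full URL
        edge = (parent, m["path"])
        if edge not in edges:
            edges.append(edge)
    return edges

def to_dot(edges):
    """Render the edges as a Graphviz DOT digraph."""
    def quote(name):
        return '"%s"' % name.replace('"', '\\"')
    out = ["digraph visit {"]
    for parent, child in edges:
        out.append("  %s -> %s;" % (quote(parent), quote(child)))
    out.append("}")
    return "\n".join(out)

# Hypothetical log lines for illustration.
sample_lines = [
    '203.0.113.7 - - [10/Oct/2008:13:55:36 -0700] "GET /post.html HTTP/1.1" 200 5120 "http://search.example/?q=logs" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2008:13:55:37 -0700] "GET /style.css HTTP/1.1" 200 812 "http://blog.example/post.html" "Mozilla/5.0"',
    '198.51.100.2 - - [10/Oct/2008:13:55:38 -0700] "GET /other.html HTTP/1.1" 200 300 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2008:13:56:02 -0700] "GET /about.html HTTP/1.1" 200 2326 "http://blog.example/post.html" "Mozilla/5.0"',
]
edges = edges_for(sample_lines, "203.0.113.7", "blog.example")
dot = to_dot(edges)
```

Saving `dot` to a file and running it through Graphviz's `dot -Tpng` should draw the picture. The obvious weak spot is treating one IP as one visitor, which falls apart behind NAT or a proxy, so a real tool would want something better for grouping requests into visits.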
