Philip Cristiano -

Tagging in Jekyll

Posted on 2012-12-31

Categories in Jekyll have annoyed me for a while because of the URLs they generate. The path would be something like /tag1/tag2/year/month/day/title, which works only as long as you don't change the categories used. Since tags are also an option and don't have the same issue, I've switched. I followed this post about tagging archive pages in Jekyll, which made it rather painless.

Graphing Influence

Posted on 2012-12-30

I just pushed a Python package, Klout-to-Graphite, that makes it easy to graph your Klout score within Graphite.

This started with a few minutes after lunch at SeatGeek where we were checking various Klout scores. Since I tend to graph... everything... I quickly set up a cron script to start collecting the metrics for Graphite.

To run it:

Ideally this is run from cron; we run it every 30 minutes. Over the course of 2 weeks there have already been a few rank changes and large jumps due to adding new social networks to Klout.

14 Days of Klout
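For readers who have not fed Graphite by hand, here is a minimal sketch of what a cron-driven collector has to do in the end: write one line per data point in Graphite's plaintext protocol ("path value timestamp") to a Carbon listener. This is not the code from the package above. The metric path, the score, and the host and port defaults are assumptions for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp):
    """Build one plaintext-protocol line: path, value, and Unix timestamp, newline-terminated."""
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(line, host="localhost", port=2003):
    """Send one line to a Carbon plaintext listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric path and score, for illustration only.
    line = format_metric("klout.example_user.score", 42.5, time.time())
    print(line, end="")
    # send_metric(line)  # uncomment when a Carbon listener is reachable
```

A crontab schedule of `*/30 * * * *` would run such a script every 30 minutes.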

Resque Metrics with StatsD

Posted on 2012-12-31

A recent task of mine was to add some metric collection to a Rails application at SeatGeek. One of the main components (and critical if there was a problem) is the set of Resque background workers. There is actually a Resque plugin (abandoned, maintained) that will collect stats. The gem sadly is not maintained, so I forked the maintained repo in order to provide a stable source. I use the commit hash to make sure I get the version I expect, but if the repository we used disappeared that would cause problems, so a fork solves that. My fork doesn't change much except for some of the paths used for the metrics. At some point I may clean up the README and package my first gem.

Monitoring Service Health Check Duration

Posted on 2012-12-16

A recent metric I've started paying attention to was the duration of the health check for services behind HAProxy. This is reported in the admin interface CSV and can easily be added to your metric systems. This is what a few nodes started doing yesterday:

Check Duration
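As a rough sketch of where this number comes from: the HAProxy stats CSV has a header row starting with "# " and a check_duration column (in milliseconds) for server rows. The snippet below pulls that column out of CSV text. The sample data, proxy name, and server names are hypothetical, and the real output has many more columns.

```python
import csv
import io

# Hypothetical, heavily trimmed sample of the stats CSV; real output has many more columns.
SAMPLE_CSV = """# pxname,svname,status,check_status,check_duration
api,FRONTEND,OPEN,,
api,web-1,UP,L7OK,48
api,web-2,UP,L7OK,310
api,BACKEND,UP,,
"""


def check_durations(csv_text):
    """Return {(proxy, server): check duration in ms} for rows that report one."""
    # The header row starts with "# ", which is stripped so DictReader sees clean names.
    reader = csv.DictReader(io.StringIO(csv_text.lstrip("# ")))
    durations = {}
    for row in reader:
        value = row.get("check_duration")
        if value:  # FRONTEND and BACKEND rows leave this column empty
            durations[(row["pxname"], row["svname"])] = int(value)
    return durations


if __name__ == "__main__":
    for (proxy, server), ms in sorted(check_durations(SAMPLE_CSV).items()):
        print(f"{proxy}.{server}.check_duration {ms}")
```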

This service usually hits the 50ms range for health checks, although it started getting much worse. The service is written in Tornado, although it has a few blocking calls. Non-blocking IO should allow the health checks to respond very quickly, as in this case the check returns a static response.

The root cause of the problem is that calls to MongoDB in a particular handler were taking longer than before and held back other handlers, as the call is currently a blocking operation. If the HAProxy health checks pass a threshold, HAProxy will remove the nodes from the pool. That is a good precaution, although in our case it can cause flickering if MongoDB takes longer than expected.

I did receive alerts, thanks to per-service health check alerting with Graphite Pager.

We are using Diamond at SeatGeek, which easily collects metrics from HAProxy. Check duration is (by default) stored at servers.HAPROXY-SERVER.haproxy.BACKEND.HOST-SERVER.check_duration. The metric we alert on is the moving median for each server regardless of the HAProxy server: aliasByNode(movingMedian(groupByNode(servers.*.haproxy.*.*.check_duration,3,"averageSeries"),10),0).
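To make the expression above a little more concrete, here is a small pure-Python sketch of its two main steps: averaging series that share the same path component at node index 3 (counting from 0, the BACKEND position in the path above), then taking a moving median. It is an illustration under assumptions, not Graphite's implementation: the data and host names are made up, the window is 3 instead of 10 to keep the example short, and Graphite's handling of window alignment and even-length windows may differ. The aliasByNode step only renames the series and is left out.

```python
from collections import defaultdict
from statistics import median


def group_average(series_by_path, node):
    """Average, point by point, all series sharing the same path component at index `node`."""
    groups = defaultdict(list)
    for path, values in series_by_path.items():
        groups[path.split(".")[node]].append(values)
    return {
        key: [sum(points) / len(points) for points in zip(*members)]
        for key, members in groups.items()
    }


def moving_median(values, window):
    """Median of the last `window` points, emitted once a full window is available."""
    return [median(values[i - window:i]) for i in range(window, len(values) + 1)]


if __name__ == "__main__":
    # Hypothetical data: the same backend host as seen from two HAProxy servers.
    data = {
        "servers.lb1.haproxy.api.web-1.check_duration": [50, 52, 400, 51, 49],
        "servers.lb2.haproxy.api.web-1.check_duration": [54, 50, 420, 53, 51],
    }
    averaged = group_average(data, 3)
    print({name: moving_median(vals, 3) for name, vals in averaged.items()})
```

The median is the reason a single slow check does not page anyone: in the sample data the one spike around 400ms never becomes the middle value of its window, so the smoothed series stays near 50ms.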