More on RabbitMQ Priorities

Posted on

With a single process consuming from multiple queues the prefetch count could be a good enough solution to balancing the work from each queue.

After you have set up priorites with multiple queues you still need to consume from them. You could setup separate processes for each queue or a single process that consumes from multiple queues.

I usually set consumers to a prefetch count of 10, it works well enough and latency isn't much of a concern. When consuming from multiple queues setting each queue to the same prefetch count will give you a fair distribution of work to that consumer.

What I finally took the time to try this week was changing the prefetch counts based on priority. In my case we had 2 queues, high and low priority. The higher priority was based on user actions and we wanted to happen quickly. There was only 1 set of processes consuming from both queues and had the same prefetch counts. Since the messages are sent to the consumer ahead of time there were 20 messages for each process. Adjusting the low priority queue to a prefetch of 2 meant that there would be 12 items sent to the consumer, still plenty of work. These 12 items are put into a single queue in the library, no work needed in your code, and will give a 5/1 distribution of work in the consumer.

With the adjusted prefetch counts we are able to control which portion of the work we wind up doing when queues start to backup. In this case you have to sacrifice latency to do it, the higher priority queue may give more work to a busy consumer when others could be empty. In practice for us this did not matter, we set the prefetch on the high priority queue to 10 anyway.

This has the nice property that low priority items are still processed while high priority items exist and will be consumed at the highest rate as soon as the high priority items are drained. With more than 2 queues this technique may be cause more latency than you would like but it has be working well and required no code changes. I was planning on making a locking mechanism, and if you didn't want any low priority work in progress while there was high priority work you would still need to, but I don't think one will be needed anytime soon.


Conference Going

Posted on

I just returned from a week of conferences, first Monitorama and then Emerging Technologies for the Enterprise.

This was the first Monitorama event, held in Boston, and was a great chance to meet the people behind a lot of the software/blogs I follow. The first day was a single-track set of talks regarding open source monitoring and the second day a hackathon to help improve the state of open source monitoring. I contributed a bit to correct a small pain point I had. I didn't enter it into the judging partially because I believed it to be a rather simple hack and partially because I didn't want to have to rush to get back to Amtrak for my train back to NY. I was pleasantly surprised to see food always available including plenty of healthy bits.

PhillyETE was definitely a big change from Monitorama, more people, more talks, not as great of food. Part of the benefit for me to go to PhillyETE is the trip to Philadelphia to see my family, especially as an Easter trip. The best part of the conference had to be seeing the push for Clojure as an enterprise language, (and slightly less interest to me, Scala). Given that they are JVM languages they fit in very well and can work side by side for evaluation. I tried Scala for a bit since I wanted to try out Akka but ran into JVM memory issues during the Play "Hello, World!" tutorial which really soured me towards Scala.

Basho's sponsorship of Monitorama also helped convince me (with a 25% discount) to attend Ricon East. That may be the end of my conference going this year, I haven't decided yet about Surge.


Graphite Pager - v0.0.6 - Links to Documentation

Posted on

I've released version v0.0.6 of Graphite Pager my tool for alerting based on Graphite metrics.

The change for this release was to add links to documentation for each alert. Currently the format of the URL is {docs_url}/{alert name}#{alert legend name} where the docs_url is specified in the YAML config and the rest is based on the alert that is triggering.

While people at work haven't jumped to create metrics and alerts for various things this will at least make it easier for them to know why this alert was created and how to fix the problem. Right now I have only documented a few alerts and will do so as existing alerts fire. If anyone needs alerts made I will make sure the wiki page exists ahead of time.


Provisioning AMQP

Posted on

AMQP clients will allow you to declare your exchanges, queues, and bindings at the consumer level but that can cause problems as you use it more. You may get to the point that you will have to grep for all the declare methods in your code or run into problems trying to migrate to a new broker.

An alternative is to have consumers and producers take only the name of a queue or exchange and handle the rest outside of the application. This allows you to see and change in one place the configuration for all of your applications. When you need to provision a new broker it is done in a few seconds instead having to migrate some consumers, then all producers, then the rest of the consumers.

I've started writing and using Declare AMQP so that I can provision everything within Chef. It only supports the features I'm using but is very simple.

The migration is now much simpler as provisioning the server once is enough to make it ready for all applications. When I need to change exchanges or bindings I don't have to update any code. There is still the need to know which applications publish which routing key, but not a huge concern.

This has helped out as well configuring queues with specific priorities for the same type of tasks. Each application can be started with a queue to listen to and the configuration for both the broker and applications remains in one place.


Prioritizing Emails with RabbitMQ

Posted on

After you move a few tasks to the background with RabbitMQ you may realize that you eventually need to support different priorities for the same type of tasks, sending bulk email after you send transactional email. RabbitMQ doesn't have priorities so you wind up having to use separate queues for each priority.

You should already have a worker that can send the email, just now you need setup RabbitMQ with priorities.

The main exchange you use, email, should be declared either topic or direct and will take all of the messages you intend to send but when declared you should include an alternate exchange of email-undeliverable that is declared as a fanout exchange. Now you just need a default queue bound to the default routing key for the email exchange and also bound to email-undeliverable. Now every email your try to send that doesn't have a specifically prioritized queue will be routed to the default queue.

All you need now is to start your workers consuming from each queue you create.


Example Tornado AMQP Client with Pika

Posted on

I've used AMQP for a couple of years now but never used Pika in production. Recently I've been using Haigha in my AMQP Dispatcher project but needed a client for Tornado, which Pika supports. There is an example in the Pika docs of using the Tornado Connection but it doesn't provide as usable an interface as I'd like.

I wrote an client for internal use that handles the conditions I needed by default (including callbacks with the result of RabbitMQ publish confirmations) and after talking with a previous coworker put it into a gist.

It doesn't handle some things (like publish a content type with the encoded json) and could have some better names but it may be of use to more people.


MongoDB Lock Percentage in Graphite

Posted on

While investigating our MongoDB lock ratio I was asking around to see what common lock percentages were among those who I know use MongoDB. I discovered that despite having similar setups to what we are using that they didn't know how to get the lock percentage from Graphite.

Using Diamond you can easily get all of the MongoDB server status metrics into Graphite but the globalLock.ratio is a bit misleading in that it is based on the total uptime of Mongo, which could be a while, and not on recent usage patterns. And in 2.2 it disappears anyway!

The metrics that are included though that help are globalLock.totalTime and globalLock.lockTime which can be used to find the lock ratio/percentage over whatever sampling period you use.

The percentage winds up being scale(divideSeries(derivative(servers.MONGOHOSTNAME.MongoDBCollector.globalLock.lockTime),derivative(servers.MONGOHOSTNAME.MongoDBCollector.globalLock.totalTime)),100). You can remove the scale function to get the ratio. This doesn't work with globbing in Graphite though. You can scale the lockTime though to be able to get a globbable lock ratio for all of your Mongo servers, the exact value will depend on the sampling period.


SQL-to-Graphite

Posted on

I released a package SQL-to-Graphite that aims to easily save the results of SQL queries into Graphite.

We use this and similar scripts (I'm going to move over to using this) at work in order to collect global metrics about our systems. I typically count any table that has a status column and the average/max age of any records that should be updated periodically.

I made this package once I hit the second repository where I would have to write a script to do this. It should be compatible with any database supported by SQLAlchemy.

After installing (pip install sql-to-graphite) you can run the sql-to-graphite command.

With a file like:

And start getting metrics into Graphite!


Autodetecting Your RSS Feed in the Browser

Posted on

I noticed recently that my site didn't have the Syndication Icon icon anymore. I'm not sure when I lost it but to add it back I just added the link field that lets browsers know where my Atom feed is. Simple enough to add <link type="application/atom+xml" rel="alternate" href="/atom.xml"/> to the head of the page.


Resque Metrics with StatsD

Posted on

A recent task of mine was to add some metric collection to a Rails application at SeatGeek. One of the main components (and critical if there was a problem) is the set of Resque background workers. There is actually a Resque Plugin (abandoned, maintained that will collect stats. The gem sadly is not maintained so I forked the maintained repo in order to provide a stable source. I use the commit hash to make sure I get the version but if the repository we used disappears that would cause problems, so a fork solves that.My fork doesn't change much except for some of the paths used for the metrics. At some point I may clean up the README and package my first gem.


Tagging in Jekyll

Posted on

Categories in Jekyll have annoyed me for a while because of the URLs generated. The path would be something like /tag1/tag2/year/month/day/title which works so long as you don't change the categories used. Since tags are also an option and don't have the same issue I've switched. I followed this post about tagging archive pages in Jekyll that made it rather painless.


Graphing Influence

Posted on

I just pushed a Python package for Klout-to-Graphite that will easily allow you to graph your Klout within Graphite.

This started with a few minutes after lunch at SeatGeek where we were checking various Klout scores. Since I tend to graph... everything... I quickly setup a cron script to start collecting the metrics for Graphite.

To run it:

Ideally this is run in cron, we use 30 minutes. Over the course of 2 weeks there is already a few rank changes and large jumps due to adding new social networks to Klout.

14 Days of Kout


Monitoring Service Health Check Duration

Posted on

A recent metric I've started paying attention to was the duration of the health check for services behind HAProxy. This is reported in the admin interface CSV and can easily be added to your metric systems. This is what a few nodes started doing yesterday:

Check Duration

This service can usually hits the 50ms range for health checks although it started getting much worse. The service is actually written in Tornado although has a few blocking calls that are used. Non-blocking IO should allow the health checks to be very quick to respond as in this case it returns a static response.

The root cause for the problem is that calls to MongoDB in a particular handler were taking longer than before and will hold back other handlers as it is currently a blocking operation. If the HAProxy health checks pass a threshold it will remove the nodes from the pool, a good precaution, although in our case can cause flickering if MongoDB takes longer than expected.

I did receive alerts thanks to alerting of per-service health checks with Graphite Pager.

We are using Diamond at SeatGeek which easily collects metrics from HAProxy. Check duration is (by default) stored at servers.HAPROXY-SERVER.haproxy.BACKEND.HOST-SERVER.check_duration. The metric we alert on is the moving median for each server regardless of the HAProxy server aliasByNode(movingMedian(groupByNode(servers.*.haproxy.*.*.check_duration,3,"averageSeries"),10),0).


Graphite Pager - An Easy Way to Send Alerts From Graphite

Posted on

I've started working on a project making it easy to send alerts from Graphite. Previously at AWeber we had this problem as well but used Nagios (not that easy) which wasn't a great experience. Given that PagerDuty can handle the notification part (Yay APIs!) all that was left was reading Graphite's rawData output and triggering an alert.

Right now I'm testing it at SeatGeek running it on Heroku, the example of how to set up Graphite Pager on Heroku is small and straightforward. It has already helped detect a few problems before our other monitoring tools and (eventually) can alert on actual business metrics!

The alert format looks like:

Pretty Simple. It supports globbing with unique alerts for each metric. Graphite Pager can't determine a disappearing host from the glob, maybe in the future, but will set the alert for all metrics returned.


New Job

Posted on

Friday August 17th was my last day at AWeber. I've accepted a position at SeatGeek as a Web Engineer.

AWeber was a great opportunity that turned me from a programmer into a developer. I wish everyone there the best of luck.

Now that I've left I'm upset that most of what I worked on there was not open source so I will no longer be able to use the things that I've built. Towards the end of my time there I was trying to put more projects on Github that could have been useful to someone. Hopefully at SeatGeek I will have the change to make some larger contributions!


Kanban What

Posted on

During Philly Tech Week I gave a talk about Kanban. This was my first time speaking professionally and I have since volunteered to give a talk at Philly Coders

The official blurb for the Kanban talk

When faced with the challenges of managing a growing email marketing software and 40-person development team, AWeber turned to the project management system Kanban. In this presentation with Ethan McCreadie and Philip Cristiano, they shared AWeber's journey into Kanban, how it functions within AWeber's team structure, and the advantages and disadvantages other companies should take into consideration before implementing the system.


Free Shirts (Focus on Quality)

Posted on

If you want someone to wear your shirt (or promote your company through any free merch) then you might as well spend a little more and give away something great! In the case of shirts, I am a huge fan of the companies that give away American Apparel shirts instead of something crappy.

Companies that I've seen give away American Apparel shirts include:

Even when the company is a competitor to my employer (MailChimp), I'm likely to wear the shirt.

Companies that at least make a comfortable shirt without crappy graphics may still be worn after I'm through the American Apparel shirts.

Like in software, even if you are giving away something for free, you should focus on quality.


Twitter Bootstrap

Posted on

I've switched over to using Twitter's Bootstrap library. It was about 30 minutes to setup but I'll have to look around for the best way to use set the active tab in the navbar.


Brightcove's Diamond and Contributing to Projects

Posted on

I have spent a significant amount of time recently looking into monitoring and metrics collection. At work we have Nagios and Cacti currently and are looking at other options. After setting up Ganglia we decided to give Graphite a try. There is a script to send data from Ganglia to Graphite although the whole system gets to be more complex than I'd like. The chain winds up being: monitored server -> Ganglia -> Ganglia parser -> Graphite.

Looking at the Graphite Tools page I learned about Diamond which can collect metrics using a Python agent and send them directly to Graphite. It has some very useful collectors already and the big benefit, makes it very easy to add a new collector!

In about an hour I was able to make a collector for MongoDB and submit a pull request. The changeset was accepted making it my first contribution to an existing open-source project! I plan on creating more collectors for software that we are using as work. Next up is likely to be RabbitMQ.

And if you haven't, try Graphite!


Back to Jekyll

Posted on

I've switched back to Jekyll! This time it's hosted on my Linode account. After checking my analytics for my domain while it was hosted with Flavors.me I realized it wasn't worth $20 a year for as few hits it was getting when I was paying for this server. It did serve as a good place for recruiters to contact me. We'll see if that still happens here.