Readers of this little blog may know I’ve spent some time to have a munin setup that was tweaked to be optimized. But I’ve reached a point where my knowledge did not suffice to balance the design problems of munin. This is not a rant, I just say that when your infrastructure reaches a certain size munin reaches its limits. The pull based model, the graph generation (don’t tell me about the CGI graph, this thing never worked as expected) overloaded my management box. Talking with other peoples brought collectd to my attention so I gave it a try.
Collectd has many nice features : 10 seconds precision (versus 5 minutes), written in C for performance, multicast support (even if I don’t use it), and you can even create collectd relays. There are some packages for many different targets, even for my OpenWRT based access points ! Configuration can easily be automated (flat files, superior), a mandatory point to me.
Of course collectd comes without an UI but that’s no big deal : there are many around. As I said in the previous post, I use visage.
For the load generated a picture says a thousand words (click to enlarge) :
Next step is working on the IO congestion caused by so many RRD updates, collectd wiki has many tips about this, this will probably be fixed in a couple of hours.