Tuesday, September 14th, 2010 | Author:

eth0I recently switched from puppet daemon to mcollective commander : it kicks the “stuck in outer space puppet daemon” feature out of the way and brings me nice features (as load control). To do so I deployed the puppetd agent over my boxes.

As most of the sysadmins say : “if it’s not monitored, it doesn’t exist” I had placed a script in cron based on the puppetlast script to report by mail once a day which hosts had not checked in during the last 30 minutes. This method had 2 serious flaws : it runs only once a day (I hate being flooded by mails) and test machines keep nagging you until you remove the yaml files on the puppetmaster. Talking with Volcane on #mcollective I discovered that the agent was able to report when the client last run, so I decided to use this to check my puppet runs freshness with a nagios plugin.

Good bye cron job, say hello nagios.

Category: Code, NetAdmin, Puppet, SysAdmin  | Tags: , ,  | Comments off
Wednesday, September 08th, 2010 | Author:

eth0 Most people that work with puppet use a VCS : subversion, git, CVS, mercurial… Pick yours. My company uses subversion and each cmmit to the repository needs to be pulled by the master. Since I have two masters, I also want them to be synchronized. Once again it’s mcollective that comes to the rescue. I wrote a very simple agent (a 5 minutes work, to be improved) that can update a specified path. Grab it here. Once it is deployed you can use a post commit hook that calls it.

Example of mine :

#!/usr/local/bin/ruby
 
require 'mcollective'
include MCollective::RPC
 
mc = rpcclient("svnagent")
mc.progress = false
mc.class_filter "puppet::master"
mc.update(:path => "/etc/puppet")

The agent will only be called on machines being puppet masters by using the class filter.

Category: Code, Puppet, SysAdmin  | Tags: , ,  | 2 Comments
Wednesday, August 18th, 2010 | Author:

RubyQuick post : installing a mongo server is good, monitoring it is better. I wrote a very basic check that ensure your mongo DB is healthy. Grab it on my github.

Category: Code, SysAdmin, Tech  | Tags: , ,  | Comments off
Wednesday, August 18th, 2010 | Author:

eth0 In my last post I’ve been talking about mcollective (check the new website) and mongo and how awesome it is.

I’ve mentioned the docs available in the wiki but I’ve been too fast on this point; it’s not yet in the wiki (only some parts) so here is a little “guide” about setting the pieces work together. All comments are welcome.

  • Deploy meta.rb on all your  nodes. It will make the metadata available to other nodes.
  • Add the following lines to your server.cfg file :
registration = Meta
registerinterval = 300
  • Install a mongoDB server (debian ships it in squeeze)
  • Deploy the mongo registration agent on one node (don’t be like me, do not start deploying it on all nodes !). It will “suck” metadata from nodes, and insert it to the mongo database.
  • Add the following lines to your server.cfg on the host with the registration agent :
plugin.registration.mongohost = mongo.mycorp.net
plugin.registration.mongodb = puppet
plugin.registration.collection = nodes
  • Connect to your mongoDB and enjoy :
$ mongo mongo.mycorp.net/puppet
MongoDB shell version: 1.6.0
connecting to: mongo.mycorp.net/puppet
> db.nodes.find().count()
59
  • You’re done !
Category: BOFH Life, SysAdmin, Tech  | Tags: , , ,  | Comments off
Thursday, August 12th, 2010 | Author:

eth0I run a bunch of box under OpenBSD at $WORK and I wanted to be able to run mcollective on these too. Unfortunately, there were no package available for this OS. So I took time and with some help from landry@ I was able to build a port. It has been integrated into the github repo quickly. But to be able to use mcollective you also need to have the ruby stomp connector. As I don’t like using gems and prefer them packages, I also built a port for it. You can grab it in my github repository here.

R.I. Pienaar also added some cool mongo stuff to mcollective (see the project wiki for more details) and I built 2 more ports for the bson & mongo gems.

These packages have been tested under OpenBSD 4.7 (i386, but they are no_arch ones, ruby related), let me know if they work for you.

Category: BOFH Life, SysAdmin, Tech  | Tags: , , ,  | Comments off
Wednesday, June 23rd, 2010 | Author:

eth0If you read this blog you may know I daily use puppet at $WORK. Puppet is made to maintain configuration on machines, but not for one shot actions. For a couple of month I used to work with fabric for this but It had a few drawbacks, mainly because you need to maintain the list of hosts you want to act on and that it hates dead hosts, even if ghantoos dropped a link showing how to get rid of this. So I replaced it with mcollective : not exactly the same (it uses an agent instead of SSH) but it dynamicaly knows which hosts are up, allows to filter on facts (from facter, the puppet companion).

You could think that needing an agent is a drawback but it allows much complex actions, fine grained logic. Moreover it does not require much work to be installed if you already have puppet running. If you have a high number of machines you gain scalability with real parallel actions : it does not take longer to run on 5 machines than on 100.

One more point is the active development : R.I. Pienaar released 2 versions recently, improving access control, adding DDL to create interfaces easily. You can give your unprivileged staff some power through a web interface in less than 100 lines of code.

A tool for sysadmins that appreciate the devops spirit and want to do more in less time !

I’ll be publishing my agent and related tools on my github account.

Category: BOFH Life, SysAdmin, Tech  | Tags: , , ,  | Comments off
Thursday, May 20th, 2010 | Author:

eth0Readers of this little blog may know I’ve spent some time to have a munin setup that was tweaked to be optimized. But I’ve reached a point where my knowledge did not suffice to balance the design problems of munin. This is not a rant, I just say that when your infrastructure reaches a certain size munin reaches its limits. The pull based model, the graph generation (don’t tell me about the CGI graph, this thing never worked as expected) overloaded my management box. Talking with other peoples brought collectd to my attention so I gave it a try.

Collectd has many nice features : 10 seconds precision (versus 5 minutes), written in C for performance, multicast support (even if I don’t use it), and you can even create collectd relays. There are some packages for many different targets, even for my OpenWRT based access points ! Configuration can easily be automated (flat files, superior), a mandatory point to me.

Of course collectd comes without an UI but that’s no big deal : there are many around. As I said in the previous post, I use visage.

For the load generated a picture says a thousand words (click to enlarge) :

goodbyemunin

Next step is working on the IO congestion caused by so many RRD updates, collectd wiki has many tips about this, this will probably be fixed in a couple of hours.

Tuesday, May 18th, 2010 | Author:

eth0Some friends told me for a while about collectd, why I should look at it, why munin is so painful and so on. If you’ve been reading my posts you know I have tweaked a little my $WORK munin install to make it faster and lighter. But I finally took time to explore collectd, and I regret to not have done this before. It has so many pros that I decided to implement it in parallel with munin (because I can’t afford being blind on metrics). But collectd comes without an UI : it “only” collectds data, but that’s not a problem. There are various web interfaces and after giving a look to a bunch of them I fell in love with Lindsay Holmwood‘s Visage.

This piece of software is definitely cool : all graphs are rendered live in your browser in SVG. Yes ! Realtime graphs, no need for crappy flash s***, zoom. It is based on sinatra, haml and some JS libraries (I won’t talk about this, my JS foo is deeper than the Mariana Trench). But it lacked some features : it’s OK when you have a few hosts but when the hosts list starts being loooong then the interface needs some improvements. So I forked it on github and implemented (some parts of) what I needed. My github fork has host grouping & per host profiles. Check this out and enjoy Visage !

Now working on sets of graphs :)

PS : <3 Guigui2

Category: BOFH Life, Code, NetAdmin, SysAdmin, Tech  | Tags: , , ,  | Comments off
Wednesday, April 14th, 2010 | Author:

eth0I already blogged about my experiments with mcollective & xen but I had something a little bigger in my mind. A friend had sent me a video showing some vmware neat features (DRS mainly) with VMs migrating through hypervisors automatically.

So I wrote a “proof of concept” of what you can do with an awesome tool like mcollective. The setup of this funny game is the following :

  • 1 box used a iSCSI target that serves volumes to the world
  • 2 xen hypervisors (lenny packages) using open-iscsi iSCSI initiator to connect to the target. VMs are stored in LVM, nothing fancy

The 3 boxens are connected on a 100Mb network and the hypervisors have an additionnal gigabit network card with a crossover cable to link them (yes, this is a lab setup). You can find a live migration howto here.

For the mcollective part I used my Xen agent (slightly modified from the previous post to support migration), which is based on my xen gem. The client is the largest part of the work but it’s still less than 200 lines of code. It can (and will) be improved because all the config is hardcoded. It would also deserve a little DSL to be able to handle more “logic” than “if load is superior to foo” but as I said before, it’s a proof of concept.

Let’s see it in action :

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    873.5
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  78838.0
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.3
test3                                       20   256     1     r-----     11.9

test3 is a VM that is “artificially” loaded, as is the machine “hypervisor3” (to trigger migration)

[mordor:~] ./mc-xen-balancer
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.11 load and 3 slice(s) running
[+] added test1 on hypervisor3 with 0 CPU time (registered 18.4 as a reference)
[+] added test2 on hypervisor3 with 0 CPU time (registered 19.4 as a reference)
[+] added test3 on hypervisor3 with 0 CPU time (registered 18.3 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.5 CPU time eaten (registered 19.8 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.16 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.7 CPU time eaten (registered 21.5 as a reference)
[+] hypervisor3 has 3 threshold overload
[+] Time to see if we can migrate a VM from hypervisor3
[+] VM key : hypervisor3-test3
[+] Time consumed in a run (interval is 30s) : 1.7
[+] hypervisor2 is a candidate for being a host (step 1 : max VMs)
[+] hypervisor2 is a candidate for being a host (step 2 : max load)
trying to migrate test3 from hypervisor3 to hypervisor2 (10.0.0.2)
Successfully migrated test3 !

Let’s see our hypervisors :

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    878.9
test3                                       25   256     1     -b----      1.1
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  79079.3
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.4

A little word about configuration options :

  • interval : the poll time in seconds.  this should not be too low, let the machine some time and avoid load peeks to distort the logic.
  • load_threshold : where you consider the machine load is too high and that it is time to move some stuff away (tampered with max_over, see below)
  • daemonize : not used yet
  • max_over : maximum time (in minutes) where load should be superior to the limit. When reached, it’s time, really. Don’t set it too low and at least 2*interval or sampling will not be efficient
  • debug : well….
  • max_vm_per_host : the maximum VMs a host can handle. If a host already hit this limit it will not be candidate for receiving a VM
  • max_load_candidate : same thing as above, but for the load
  • host_mapping : a simple CSV file to handle non-DNS destinations (typically my crossover cable address have no DNS entries)

What is left to do :

  • Add some barriers to avoid migration madness to let load go down after a migration or to avoid migrating a VM permanently
  • Add a DSL to insert some more logic
  • Write a real client, not a big fat loop

Enjoy the tool !

Files :

Tuesday, April 06th, 2010 | Author:

eth0Another cool project I keep an eye on for some weeks is “the marionette collective“, aka mcollective. This project is leaded & develloped by R.I. Pienaar, one of the most active people in the puppet world too.

Mcollective is an framework for distributed sysadmin. It relies on a messaging framework and has many features included : flexibility, speed, easy to understand.

Some time ago, I had wrote a tool called “whosyourdaddy” to help me (and my memory as big as a goldfish one) to find on which Xen dom0 a Xen domU was living. It worked fine, expect the fact that is was not dynamic : if a VM was migrated  from a dom0 to another, I had to update the CMDB. Not really reliable (if an update fails the CMDB is no more accurate) and I didn’t want to have to embed this constraint in the Xen logic. So I decided to try out to write my own mcollective agent and here it is ! It is built on top of a (very) small ruby module for xen and has it own client.

You can find on which dom0 a domU resides :

master1:~# ./mc-xen -a find --domu test
hypervisor2              : Absent
hypervisor1              : Absent
master1:~# ./mc-xen -a find --domu domu2
hypervisor2              : Present
hypervisor1              : Absent

Or list your domUs :

master1:~# ./mc-xen -a list
hypervisor2              
 domu2

hypervisor1              
 no domU running

Download the agent & the client