Archive for the Category » BOFH Life «

Friday, March 11th, 2011 | Author: eth0

This post follows my previous one, dealing with the reuse of Chef providers in mcollective. In the comments Adam Jacob made an interesting remark, and when I wrote my second agent, to manage packages, I saw it would be a piece of cake to write a really generic agent, due to the nature of Chef resources (and the way they are invoked).

So, this is a generic Chef resource mcollective agent, with the associated example client code. It deserves a little explanation anyway: it is not meant to be used from a command-line invocation. Why? Because I push quite “complex” data as the resourceactions parameter. The only way I found to make this work from the command line is to use eval on the argument, which is in no way acceptable. Anyway, I hope some people will find this useful.
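
To give an idea of what “invoking a Chef resource directly” looks like, here is a minimal sketch (not the agent code itself; the client.rb path and the run context constructor are assumptions, and the exact API differs between Chef versions):

require 'chef'

Chef::Config.from_file('/etc/chef/client.rb')   # assumed path

# A resource only needs a node and a run context to be usable on its own.
node        = Chef::Node.new
run_context = Chef::RunContext.new(node, {})    # some Chef versions also expect an event dispatcher

# Build any resource type, set its attributes, then run the requested actions.
pkg = Chef::Resource::Package.new('htop', run_context)
[:install].each { |action| pkg.run_action(action) }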

Category: BOFH Life, Code, SysAdmin, Tech | Comments off
Tuesday, March 08th, 2011 | Author: eth0

It has been quite calm here for a couple of months. I have switched jobs, which explains why I had less time to post. I now work at Fotolia, and I switched from Puppet to Chef (no troll intended, I still think Puppet is a great tool, please read this).

However, a tool I still have is the awesome mcollective. Unfortunately, the most used agents (package, service) rely on Puppet providers to do their work. Fortunately, open source is here, so I wrote a (basic) service agent that uses Chef providers to start, stop or restart a service. It still needs some polish for the status part (oh, the ugly hardcoded path) but I was quite excited to share this. Freshly pushed to GitHub!

Thanks to Jordan Sissel for minstrel, an awesome debug tool, to the Opscode team for the help on the provider, and to R.I. Pienaar for mcollective (and the support).

Category: BOFH Life, Code, SysAdmin, Tech | 3 Comments
Wednesday, August 18th, 2010 | Author: eth0

In my last post I talked about mcollective (check the new website) and MongoDB, and how awesome the combination is.

I mentioned the docs available in the wiki, but I was a bit too fast on that point; it is not all in the wiki yet (only some parts), so here is a little “guide” to making the pieces work together. All comments are welcome.

  • Deploy meta.rb on all your nodes. It will make the metadata available to the other nodes.
  • Add the following lines to your server.cfg file :
registration = Meta
registerinterval = 300
  • Install a MongoDB server (Debian ships it in squeeze)
  • Deploy the mongo registration agent on one node (don’t be like me, do not start by deploying it on all nodes!). It will “suck” metadata from the nodes and insert it into the mongo database.
  • Add the following lines to your server.cfg on the host with the registration agent :
plugin.registration.mongohost = mongo.mycorp.net
plugin.registration.mongodb = puppet
plugin.registration.collection = nodes
  • Connect to your MongoDB and enjoy:
$ mongo mongo.mycorp.net/puppet
MongoDB shell version: 1.6.0
connecting to: mongo.mycorp.net/puppet
> db.nodes.find().count()
59
  • You’re done! A couple of example queries are shown below.
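
Once the nodes have registered, the data can be queried like any other collection. For example (the exact document layout depends on the registration agent, so the field names here are assumptions to adapt to your setup):

> db.nodes.findOne()
> db.nodes.find({"facts.lsbdistcodename" : "squeeze"}).count()
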
Category: BOFH Life, SysAdmin, Tech | Comments off
Thursday, August 12th, 2010 | Author: eth0

I run a bunch of boxes under OpenBSD at $WORK and I wanted to be able to run mcollective on these too. Unfortunately, there was no package available for this OS. So I took the time and, with some help from landry@, I was able to build a port. It was quickly integrated into the GitHub repo. But to be able to use mcollective you also need the Ruby stomp connector. As I don’t like using gems and prefer packages, I also built a port for it. You can grab it from my GitHub repository here.

R.I. Pienaar also added some cool mongo stuff to mcollective (see the project wiki for more details) and I built 2 more ports for the bson & mongo gems.

These packages have been tested under OpenBSD 4.7 (i386, but they are no_arch, being Ruby-only); let me know if they work for you.

Category: BOFH Life, SysAdmin, Tech | Comments off
Wednesday, June 23rd, 2010 | Author: eth0

If you read this blog you may know I use Puppet daily at $WORK. Puppet is made to maintain configuration on machines, not for one-shot actions. For a couple of months I worked with Fabric for this, but it had a few drawbacks, mainly that you need to maintain the list of hosts you want to act on and that it hates dead hosts, even if ghantoos dropped a link showing how to get around this. So I replaced it with mcollective: not exactly the same (it uses an agent instead of SSH), but it dynamically knows which hosts are up and allows filtering on facts (from Facter, the Puppet companion).
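
For example, every mcollective client command accepts fact filters; something along these lines (flag name from memory of the mcollective of that era, so double-check against your version):

# only hosts whose Facter reports this fact will answer (hypothetical fact/value)
mc-ping --with-fact operatingsystem=Debian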

You could think that needing an agent is a drawback, but it allows much more complex actions and fine-grained logic. Moreover, it does not require much work to install if you already have Puppet running. And if you have a large number of machines you gain scalability through truly parallel actions: it does not take longer to run on 100 machines than on 5.

One more point is the active development: R.I. Pienaar released two versions recently, improving access control and adding a DDL to create interfaces easily. You can give your unprivileged staff some power through a web interface in less than 100 lines of code.

A tool for sysadmins who appreciate the devops spirit and want to do more in less time!

I’ll be publishing my agent and related tools on my github account.

Category: BOFH Life, SysAdmin, Tech | Comments off
Thursday, May 20th, 2010 | Author: eth0

Readers of this little blog may know I’ve spent some time tweaking and optimizing my Munin setup. But I reached a point where my knowledge could no longer compensate for Munin’s design problems. This is not a rant; I’m just saying that when your infrastructure reaches a certain size, Munin reaches its limits. The pull-based model and the graph generation (don’t tell me about CGI graphing, that thing never worked as expected) overloaded my management box. Talking with other people brought collectd to my attention, so I gave it a try.

Collectd has many nice features: 10-second precision (versus 5 minutes), it is written in C for performance, it has multicast support (even if I don’t use it), and you can even create collectd relays. There are packages for many different targets, even for my OpenWRT-based access points! Configuration can easily be automated (flat files are superior), a mandatory point for me.
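
To illustrate how small the configuration stays, a minimal collectd.conf could look like this (the plugin choice and collector hostname are placeholders, not my actual config):

# /etc/collectd/collectd.conf (sketch)
# 10-second precision, as mentioned above
Interval 10

LoadPlugin cpu
LoadPlugin load
LoadPlugin interface

# ship values to a central collector instead of writing them locally
LoadPlugin network
<Plugin network>
  Server "collector.mycorp.net" "25826"
</Plugin>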

Of course collectd comes without a UI, but that’s no big deal: there are many around. As I said in the previous post, I use Visage.

For the load generated, a picture says a thousand words (click to enlarge):

goodbyemunin

The next step is working on the I/O congestion caused by so many RRD updates; the collectd wiki has many tips about this, so it will probably be fixed in a couple of hours.
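
(For reference, the usual tip is to let the rrdtool plugin batch its writes; the option names below should be double-checked against your collectd version:)

<Plugin rrdtool>
  DataDir "/var/lib/collectd/rrd"
  # keep values in memory and flush them in batches instead of one write per update
  CacheTimeout 120
  CacheFlush 900
</Plugin>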

Tuesday, May 18th, 2010 | Author: eth0

Some friends had been telling me about collectd for a while: why I should look at it, why Munin is so painful, and so on. If you’ve been reading my posts you know I have tweaked my $WORK Munin install a little to make it faster and lighter. But I finally took the time to explore collectd, and I regret not having done so before. It has so many pros that I decided to run it in parallel with Munin (because I can’t afford to be blind on metrics). But collectd comes without a UI: it “only” collects data. That’s not a problem, though: there are various web interfaces, and after looking at a bunch of them I fell in love with Lindsay Holmwood’s Visage.

This piece of software is definitely cool: all graphs are rendered live in your browser in SVG. Yes! Realtime graphs, zoom, no need for crappy Flash s***. It is based on Sinatra, Haml and some JS libraries (I won’t talk about those, my JS foo is buried deeper than the Mariana Trench). But it lacked some features: it’s OK when you have a few hosts, but when the host list starts getting loooong the interface needs some improvements. So I forked it on GitHub and implemented (some parts of) what I needed. My fork has host grouping & per-host profiles. Check it out and enjoy Visage!

Now working on sets of graphs :)

PS : <3 Guigui2

Category: BOFH Life, Code, NetAdmin, SysAdmin, Tech | Comments off
Wednesday, April 14th, 2010 | Author: eth0

I already blogged about my experiments with mcollective & Xen, but I had something a little bigger in mind. A friend had sent me a video showing some neat VMware features (mainly DRS), with VMs migrating between hypervisors automatically.

So I wrote a “proof of concept” of what you can do with an awesome tool like mcollective. The setup of this funny game is the following:

  • 1 box used as an iSCSI target that serves volumes to the world
  • 2 Xen hypervisors (lenny packages) using the open-iscsi initiator to connect to the target. VMs are stored on LVM, nothing fancy

The 3 boxes are connected on a 100Mb network, and the hypervisors have an additional gigabit network card with a crossover cable linking them (yes, this is a lab setup). You can find a live migration howto here.
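
For reference, live migration essentially boils down to enabling the relocation server in xend-config.sxp on both hypervisors; roughly (illustrative values, see the howto linked above for the real thing):

# /etc/xen/xend-config.sxp (excerpt, illustrative)
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '')
(xend-relocation-hosts-allow '^10\\.0\\.0\\.[12]$')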

For the mcollective part I used my Xen agent (slightly modified from the previous post to support migration), which is based on my xen gem. The client is the largest part of the work, but it’s still less than 200 lines of code. It can (and will) be improved, because all the config is hardcoded. It would also deserve a little DSL to handle more “logic” than “if load is higher than foo”, but as I said before, it’s a proof of concept.

Let’s see it in action :

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    873.5
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  78838.0
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.3
test3                                       20   256     1     r-----     11.9

test3 is a VM that is “artificially” loaded, as is the machine “hypervisor3” (to trigger migration).

[mordor:~] ./mc-xen-balancer
[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.11 load and 3 slice(s) running
[+] added test1 on hypervisor3 with 0 CPU time (registered 18.4 as a reference)
[+] added test2 on hypervisor3 with 0 CPU time (registered 19.4 as a reference)
[+] added test3 on hypervisor3 with 0 CPU time (registered 18.3 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.0 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.5 CPU time eaten (registered 19.8 as a reference)
[+] sleeping for 30 seconds

[+] hypervisor2 : 0.16 load and 0 slice(s) running
[+] init/reset load counter for hypervisor2
[+] hypervisor2 has no slices consuming CPU time
[+] hypervisor3 : 1.33 load and 3 slice(s) running
[+] updated test1 on hypervisor3 with 0.0 CPU time eaten (registered 18.4 as a reference)
[+] updated test2 on hypervisor3 with 0.0 CPU time eaten (registered 19.4 as a reference)
[+] updated test3 on hypervisor3 with 1.7 CPU time eaten (registered 21.5 as a reference)
[+] hypervisor3 has 3 threshold overload
[+] Time to see if we can migrate a VM from hypervisor3
[+] VM key : hypervisor3-test3
[+] Time consumed in a run (interval is 30s) : 1.7
[+] hypervisor2 is a candidate for being a host (step 1 : max VMs)
[+] hypervisor2 is a candidate for being a host (step 2 : max load)
trying to migrate test3 from hypervisor3 to hypervisor2 (10.0.0.2)
Successfully migrated test3 !

Let’s see our hypervisors :

hypervisor2:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   233     2     r-----    878.9
test3                                       25   256     1     -b----      1.1
hypervisor3:~# xm list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0   232     2     r-----  79079.3
test1                                        6   256     1     -b----     18.4
test2                                        4   256     1     -b----     19.4

A few words about the configuration options:

  • interval : the poll time in seconds. This should not be too low; give the machines some time, and avoid letting load peaks distort the logic.
  • load_threshold : the load above which you consider the machine too loaded and it is time to move some stuff away (tempered by max_over, see below)
  • daemonize : not used yet
  • max_over : the maximum time (in minutes) the load may stay above the limit. When it is reached, it really is time. Don’t set it too low, and make it at least 2*interval or the sampling will not be effective
  • debug : well….
  • max_vm_per_host : the maximum number of VMs a host can handle. If a host has already hit this limit it will not be a candidate for receiving a VM
  • max_load_candidate : the same thing as above, but for the load (both candidate checks are illustrated in the sketch after this list)
  • host_mapping : a simple CSV file to handle non-DNS destinations (typically my crossover cable addresses have no DNS entries)
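
To make the “step 1 / step 2” lines in the output above concrete, the candidate selection boils down to something like this (a hypothetical illustration; names and data structures are assumptions, not the actual client code):

# pick hosts that can still take a VM, honouring max_vm_per_host and max_load_candidate
def candidate_hosts(hypervisors, config)
  hypervisors.select do |name, stats|
    stats[:vm_count] < config[:max_vm_per_host] &&   # step 1 : max VMs
      stats[:load] < config[:max_load_candidate]     # step 2 : max load
  end
end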

What is left to do:

  • Add some barriers to avoid migration madness: let the load settle after a migration, and avoid endlessly migrating the same VM back and forth
  • Add a DSL to insert some more logic
  • Write a real client, not a big fat loop

Enjoy the tool!

Files :

Tuesday, March 02nd, 2010 | Author: kermit

I’ve been lazy about maintaining my servers recently and decided to start playing with Puppet reports. I started with something simple that helps me find on which machines my manifests have failures.

So here’s some quick and dirty code that goes through Puppet’s reportdir and points out neglected machines.

#!/usr/bin/env ruby
 
require 'puppet'
require 'find'
require 'yaml'
require 'optparse'
 
Puppet[:config] = "/etc/puppet/puppet.conf"
Puppet.parse_config
 
def most_recent_file(path)
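	# report files are named after a timestamp, so the lexically greatest basename is the most recent run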
	reports = []
	Find.find(path) { |file|
		if File.file? file
			reports << File.basename(file,".yaml")
		end
	}
	reports.sort!.reverse!
	return path+"/"+reports[0].to_s+".yaml"
end
 
 
def scan_dir(path, debug=false)
	Find.find(path) { |entry|
		if entry != path # don't scan the basedir
			if File.directory? entry
				report = most_recent_file(entry)
				scan_file(report, debug)
			end
		end
	}
end
 
 
def scan_file(filename, debug=false)
	notify_on_field = [:failed]
 
	# debug
	if debug then  puts "scanning " + filename end
 
	fp=open(filename,"r")
	YAML::load_documents(fp) { |report|
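		# each row of the "resources" metric looks like [name, label, value]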
		report.metrics["resources"].values.each { |value|
			if (notify_on_field.include? value[0]) and (value[2] > 0) then
				puts "#{report.host} has #{value[2]} #{value[0]} resource(s)"
				if debug then
					puts "log message(s) :"
					report.logs.each { |log|
						puts log.message
					}
				end
			end
		}
	}	
end
 
options = {}
 
optparse = OptionParser.new { |opts|
	opts.banner = "Usage : report_check.rb"
 
	options[:debug]=false
	opts.on("-d", "--debug", "runs in debug mode") do |debug|
		options[:debug]=true
	end
 
	opts.on("-h", "--help", "Displays this help") do
		puts opts
		exit
	end
 
}
 
optparse.parse!
 
scan_dir(Puppet[:reportdir], options[:debug])

Friday, February 26th, 2010 | Author: kermit

On my Solaris machines at $WORK I use iMil‘s pkgin to install additional software. But until today, I had to do it by hand, on every machine… Not really what I like doing after a little more than a year of using Puppet. So I wrote a provider to manage packages with pkgin. It was very informative about Puppet internals, and I learned more about my favorite config management system.

Enough talking, here is the file: pkgin.rb (a rough sketch of what such a provider looks like follows after the manifest example below).

Example of use in a manifest:

class foo {
    package { "bla":
        ensure => installed,
        provider => pkgin
    }
}
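
For the curious, the overall shape of such a provider is roughly the following. This is a trimmed-down sketch, not the actual pkgin.rb; the method names follow Puppet's package provider API, and the pkgin invocations are from memory:

# Minimal sketch of a pkgin package provider (not the real pkgin.rb).
Puppet::Type.type(:package).provide(:pkgin) do
    desc "Package management using pkgin (sketch)"

    commands :pkgin => "pkgin"

    def install
        pkgin "-y", "install", @resource[:name]
    end

    def uninstall
        pkgin "-y", "remove", @resource[:name]
    end

    def query
        # return a hash describing the installed package, or nil if it is absent
        pkgin("list").split("\n").each do |line|
            if line =~ /^#{Regexp.escape(@resource[:name])}-(\S+)/
                return { :name => @resource[:name], :ensure => $1 }
            end
        end
        nil
    end
end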