
Thursday, April 3rd, 2008

This post follows this one and is a translation of what I wrote here.

There are no MIBs for ZFS (yet), and in particular none for checking for failed disks. This is quite annoying. Fortunately, Solaris has had the Fault Manager (fmd) since Solaris 10.

Quick FMd

fmd in three commands:

  • fmadm faulty: lists the problems (and their UUIDs)
  • fmadm repair [UUID]: marks the problem as repaired
  • fmdump: dumps the list of problems, including repaired ones

Installing SNMPd

pkgadd -d [package_directory] SUNWsmcmd SUNWsmmgr SUNWsmagt

Run snmpconf (with the -i switch) to set up the daemon's behaviour easily.

And of course:

svcadm enable sma

SNMPd & FMd

Add to /etc/sma/snmp/snmpd.conf:

dlmod sunFM /usr/lib/fm/amd64/libfmd_snmp.so.1

This activates the SNMP module for fmd.

Then restart sma:

svcadm restart sma

Please note that the path is architecture-dependent (64-bit x86 here).
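Since the path changes with the architecture, here is a minimal sketch that prints the dlmod line for a given ISA name. Only the amd64 path comes from this post. The sparcv9 and 32-bit paths are my assumptions, and fm_snmp_dlmod_line is a made-up helper name, so check with ls that the library really exists before using the result.

```shell
# Print the snmpd.conf line for a given ISA name (hypothetical helper).
# Only the amd64 path is taken from the post; the sparcv9 and 32-bit
# paths are assumptions and should be verified on the target machine.
fm_snmp_dlmod_line() {
    case "$1" in
        amd64)   echo "dlmod sunFM /usr/lib/fm/amd64/libfmd_snmp.so.1" ;;
        sparcv9) echo "dlmod sunFM /usr/lib/fm/sparcv9/libfmd_snmp.so.1" ;;
        *)       echo "dlmod sunFM /usr/lib/fm/libfmd_snmp.so.1" ;;
    esac
}

# Possible use on Solaris, where isainfo -k prints the kernel ISA name:
#   fm_snmp_dlmod_line "$(isainfo -k)"
```

The function only prints the line, so nothing is written to snmpd.conf until you have looked at it.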

Crash test it

# prepare a file based zfs pool
mkdir crash
cd crash
# Files must be at least 64M
dd if=/dev/zero of=pool1 bs=1024k count=64
dd if=/dev/zero of=pool2 bs=1024k count=64
dd if=/dev/zero of=pool3 bs=1024k count=64
# create the pool
sudo zpool create crashtest raidz /home/nico/crash/pool1 /home/nico/crash/pool2 /home/nico/crash/pool3
# break it
rm pool3
# scrub it (to be sure that the system sees the failure)
sudo zpool scrub crashtest
# check that fmd does its job
sudo fmadm faulty

Now, let's see what information we get with SNMP:

snmptable -v2c -c public 127.0.0.1 SUN-FM-MIB::sunFmProblemTable

| sunFmProblemUUID | sunFmProblemCode | sunFmProblemURL | sunFmProblemDiagEngine | sunFmProblemDiagTime | sunFmProblemSuspectCount |
| "96397f16-1cea-463b-e9db-de989cd42e81" | ? | ? | ? | ? | ? |

The module exports four tables: sunFmProblemTable, sunFmFaultEventTable, sunFmModuleTable and sunFmResourceTable.

As snmptable only shows "?" for most columns, the easiest way to read the values is snmpwalk:

snmpwalk -c public -v 2c 127.0.0.1 SUN-FM-MIB::sunFmProblemTable
SUN-FM-MIB::sunFmProblemUUID."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: "96397f16-1cea-463b-e9db-de989cd42e81"
SUN-FM-MIB::sunFmProblemCode."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: ZFS-8000-D3
SUN-FM-MIB::sunFmProblemURL."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: http://sun.com/msg/ZFS-8000-D3
SUN-FM-MIB::sunFmProblemDiagEngine."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: fmd:///module/zfs-diagnosis
SUN-FM-MIB::sunFmProblemDiagTime."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: 2008-2-21,12:31:2.0,+1:0
SUN-FM-MIB::sunFmProblemSuspectCount."96397f16-1cea-463b-e9db-de989cd42e81" = Gauge32: 1
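To pull just the useful bits out of that output, something like the following sketch could work. It reads snmpwalk output on stdin and prints one "UUID code" pair per problem. It assumes the real output uses plain ASCII double quotes around the UUID index, and fm_problem_codes is a name I made up for the example.

```shell
# Read snmpwalk output for sunFmProblemTable on stdin and print
# "<UUID> <message code>" for every sunFmProblemCode row.
# Assumes the UUID index is wrapped in plain ASCII double quotes.
fm_problem_codes() {
    awk -F'"' '/sunFmProblemCode\./ {
        code = $3                      # text after the closing quote
        sub(/^.*STRING: /, "", code)   # keep only the value itself
        print $2, code                 # $2 is the UUID between the quotes
    }'
}

# Possible use:
#   snmpwalk -v 2c -c public 127.0.0.1 SUN-FM-MIB::sunFmProblemTable | fm_problem_codes
```

With the output shown above, this should give a single line holding the UUID followed by ZFS-8000-D3.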

Nagios integration

See this post.
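For the impatient, here is a rough idea of what such a check can look like. This is not the script from the linked post, only a minimal sketch under a few assumptions: it counts the sunFmProblemUUID rows in snmpwalk output read from stdin and maps the result to the usual Nagios exit codes (0 for OK, 2 for CRITICAL). The function name check_fm_problems is hypothetical.

```shell
# Read snmpwalk output for sunFmProblemTable on stdin and report a
# Nagios-style status: return 0 (OK) when no problem row is present,
# return 2 (CRITICAL) otherwise. Hypothetical helper, not a real plugin.
check_fm_problems() {
    # grep -c exits non-zero when it counts 0 matches, hence the "|| true"
    count=$(grep -c 'sunFmProblemUUID\.' || true)
    if [ "$count" -eq 0 ]; then
        echo "OK - no fmd problems reported"
        return 0
    fi
    echo "CRITICAL - $count fmd problem(s) reported"
    return 2
}

# Possible use:
#   snmpwalk -v 2c -c public 127.0.0.1 SUN-FM-MIB::sunFmProblemTable | check_fm_problems
```

Note that an unreachable agent also produces zero rows, so a real check should look at the exit status of snmpwalk as well and return 3 (UNKNOWN) in that case.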

See also

All this stuff is based upon this excellent post.