Nico
Creator of this small website
Apr 3, 2008 2 min read

SNMP + FMd: English version

This post follows this one and is a translation of what I wrote here.

There are no MIBs for ZFS (yet), and in particular none for checking failed disks. This is quite annoying. Fortunately, Solaris has had the Fault Manager (fmd) since Solaris 10.

Quick FMd

fmd in 3 commands:

  • fmadm faulty: lists the current problems (and their UUIDs)
  • fmadm repair [UUID]: marks the problem as repaired
  • fmdump: dumps the list of problems, including repaired ones

Installing SNMPd

pkgadd -d [packages-directory] SUNWsmcmd SUNWsmmgr SUNWsmagt

Run snmpconf (with the -i switch) to easily set up the behaviour of the daemon.

And of course:

svcadm enable sma

SNMPd & FMd

Add the following line to /etc/sma/snmp/snmpd.conf:

dlmod sunFM /usr/lib/fm/amd64/libfmd_snmp.so.1

This activates the SNMP module for fmd.

Then restart sma:

svcadm restart sma

Please note that the module path is architecture dependent (64-bit x86 here).
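Since the path depends on the architecture, here is a minimal sketch of how the dlmod path could be chosen. The helper name fmd_snmp_module_path is hypothetical; it takes an ISA name such as the one printed by `isainfo -k`. Only the amd64 path comes from this post; the sparcv9 and 32-bit locations are assumptions to check on your own system.

```shell
# Hypothetical helper: print the fmd SNMP module path for a given ISA name
# (the kind of string `isainfo -k` prints, e.g. amd64 or sparcv9).
fmd_snmp_module_path() {
    case "$1" in
        # 64-bit ISAs: module lives in a per-ISA subdirectory.
        # amd64 is the path used in this post; sparcv9 is an assumption.
        amd64|sparcv9)
            echo "/usr/lib/fm/$1/libfmd_snmp.so.1"
            ;;
        # Anything else: assumed 32-bit location, without a subdirectory.
        *)
            echo "/usr/lib/fm/libfmd_snmp.so.1"
            ;;
    esac
}
```

On a Solaris box, the snmpd.conf line could then be produced with: echo "dlmod sunFM $(fmd_snmp_module_path "$(isainfo -k)")". The module has to match the way snmpd itself was built, so check that the printed file actually exists before restarting sma.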

Crash test it

# prepare a file based zfs pool

mkdir crash

cd crash

# Files must be at least 64M

dd if=/dev/zero of=pool1 bs=1024k count=64

dd if=/dev/zero of=pool2 bs=1024k count=64

dd if=/dev/zero of=pool3 bs=1024k count=64

# create the pool

sudo zpool create crashtest raidz /home/nico/crash/pool1 /home/nico/crash/pool2 /home/nico/crash/pool3

# break it

rm pool3

# scrub it (to be sure that the system sees the failure)

sudo zpool scrub crashtest

# check that fmd does its job

sudo fmadm faulty

Now, let's see what information we get with SNMP:

snmptable -v2c -c public 127.0.0.1 SUN-FM-MIB::sunFmProblemTable

| sunFmProblemUUID | sunFmProblemCode | sunFmProblemURL | sunFmProblemDiagEngine | sunFmProblemDiagTime | sunFmProblemSuspectCount |

| "96397f16-1cea-463b-e9db-de989cd42e81" | ? | ? | ? | ? | ? |

The module exports 4 tables: sunFmProblemTable, sunFmFaultEventTable, sunFmModuleTable and sunFmResourceTable.

Since snmptable shows most columns as "?", the easiest way to read the values is to use snmpwalk:

snmpwalk -c public -v 2c 127.0.0.1 SUN-FM-MIB::sunFmProblemTable

SUN-FM-MIB::sunFmProblemUUID."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: "96397f16-1cea-463b-e9db-de989cd42e81"

SUN-FM-MIB::sunFmProblemCode."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: ZFS-8000-D3

SUN-FM-MIB::sunFmProblemURL."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: http://sun.com/msg/ZFS-8000-D3

SUN-FM-MIB::sunFmProblemDiagEngine."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: fmd:///module/zfs-diagnosis

SUN-FM-MIB::sunFmProblemDiagTime."96397f16-1cea-463b-e9db-de989cd42e81" = STRING: 2008-2-21,12:31:2.0,+1:0

SUN-FM-MIB::sunFmProblemSuspectCount."96397f16-1cea-463b-e9db-de989cd42e81" = Gauge32: 1
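The snmpwalk output is easy to post-process. Here is a rough sketch of a filter that keeps only the UUID and the problem code of each entry. The function name fm_problem_codes is made up for this example, and it assumes the output looks like the lines shown in this post (straight double quotes around the UUID, " = STRING: " before the value).

```shell
# Hypothetical filter: reads snmpwalk output for sunFmProblemTable on stdin
# and prints one "UUID CODE" pair per sunFmProblemCode line.
fm_problem_codes() {
    awk -F' = STRING: ' '/sunFmProblemCode\./ {
        uuid = $1
        # Strip everything up to and including the opening quote of the index.
        sub(/^.*sunFmProblemCode\."/, "", uuid)
        # Strip the closing quote of the index.
        sub(/"$/, "", uuid)
        print uuid, $2
    }'
}
```

It would be used as: snmpwalk -c public -v 2c 127.0.0.1 SUN-FM-MIB::sunFmProblemTable | fm_problem_codes. The problem code (ZFS-8000-D3 here) is the part of the sun.com/msg URL that describes the fault.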

Nagios integration

See this post.
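To give an idea of what such a check can look like, here is a minimal sketch, not the plugin from the linked post. The function name check_fm_problems is hypothetical. It counts the sunFmProblemUUID rows in the snmpwalk output and uses the standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL).

```shell
# Hypothetical Nagios-style check: reads snmpwalk output for
# sunFmProblemTable on stdin and reports how many problems fmd knows about.
check_fm_problems() {
    # One sunFmProblemUUID row per problem; "n + 0" prints 0 on empty input.
    count=$(awk '/sunFmProblemUUID\./ { n++ } END { print n + 0 }')
    if [ "$count" -eq 0 ]; then
        echo "FMD OK - no problems reported"
        return 0    # Nagios OK
    fi
    echo "FMD CRITICAL - $count problem(s) reported"
    return 2        # Nagios CRITICAL
}
```

It would be fed with: snmpwalk -c public -v 2c 127.0.0.1 SUN-FM-MIB::sunFmProblemTable | check_fm_problems. This sketch cannot tell an empty table from a failed snmpwalk, so a real plugin should also test the exit status of snmpwalk and return 3 (UNKNOWN) in that case.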

See also

All this stuff is based upon this excellent post.