I’ve been updating my Nagios configuration to keep a closer eye on my server hardware, including temperatures, fan speeds and hard drives, and I wanted to use S.M.A.R.T. to watch the drives on a 3ware RAID controller for signs of imminent failure.
I already monitor the status of the RAID array itself using my nagios_3ware_raid_check Nagios plugin (which I previously blogged about), but I wanted SMART monitoring to give me early warning of drive trouble, rather than simply finding out once a drive has failed.
After installing smartmontools, I edited /etc/smartd.conf, commenting out the default DEVICESCAN directive (which scans for devices automatically) and listing the devices explicitly, as follows:
# Monitor the drives on our RAID array; schedule self tests for Sundays.
/dev/twa0 -d 3ware,0 -a -s L/../../7/02
/dev/twa0 -d 3ware,1 -a -s L/../../7/04
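The above monitors both drives of a RAID-1 mirrored pair on a 3ware controller card, addressed as ports 0 and 1 through -d 3ware,N. The -a option enables smartd’s standard set of checks, and the -s option schedules a long self-test every Sunday (day 7), starting during the 2am hour for the first drive and the 4am hour for the second, so the two tests don’t begin at the same time.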
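To make the -s patterns less cryptic, here is a small Python sketch of how I understand the matching to work: smartd builds a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 for Monday through 7 for Sunday, and hour) and tests it against the regular expression. The function names schedule_key and is_due are my own inventions for illustration, and treating the pattern as a full match is an assumption rather than a copy of smartd's logic.

```python
import re
from datetime import datetime


def schedule_key(test_type: str, when: datetime) -> str:
    """Build the T/MM/DD/d/HH string that a -s pattern is compared with."""
    # isoweekday() returns 1 for Monday through 7 for Sunday,
    # which lines up with the day-of-week field in the pattern.
    return f"{test_type}/{when:%m}/{when:%d}/{when.isoweekday()}/{when:%H}"


def is_due(pattern: str, test_type: str, when: datetime) -> bool:
    """Return True if the pattern matches the whole key (an assumption)."""
    return re.fullmatch(pattern, schedule_key(test_type, when)) is not None


# 7 January 2024 was a Sunday; 8 January 2024 was a Monday.
sunday_2am = datetime(2024, 1, 7, 2, 30)
sunday_4am = datetime(2024, 1, 7, 4, 30)
monday_2am = datetime(2024, 1, 8, 2, 30)

print(schedule_key("L", sunday_2am))             # L/01/07/7/02
print(is_due("L/../../7/02", "L", sunday_2am))   # True
print(is_due("L/../../7/02", "L", monday_2am))   # False
print(is_due("L/../../7/04", "L", sunday_4am))   # True
```

The ".." fields are plain regex wildcards for any two-character month or day, which is why only the day-of-week and hour fields restrict when the test runs.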
I’m still looking for a good way to monitor this via Nagios, though; the (poorly-named) check_ide_smart plugin can’t monitor drives behind other interfaces, as far as I can see. I found a couple of Perl scripts such as Check-SMART-status-modified, but they had issues.
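In the meantime, something along these lines might serve as a stopgap. This is only a rough sketch, not a tested plugin: it runs smartctl -H -d 3ware,N against the controller device and maps the overall-health line to the standard Nagios exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). The names classify, check_port and main are hypothetical, and I'm assuming smartctl is on the PATH and that the Nagios user is allowed to run it (it normally needs root).

```python
import subprocess
from typing import List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(smartctl_output: str) -> Tuple[int, str]:
    """Map the text printed by 'smartctl -H' to a Nagios status and message."""
    for line in smartctl_output.splitlines():
        # ATA drives report e.g.
        # "SMART overall-health self-assessment test result: PASSED"
        if "overall-health self-assessment" in line and ":" in line:
            result = line.split(":", 1)[1].strip()
            if result == "PASSED":
                return OK, "SMART OK - health: PASSED"
            return CRITICAL, f"SMART CRITICAL - health: {result}"
    return UNKNOWN, "SMART UNKNOWN - no health line in smartctl output"


def check_port(device: str, port: int) -> Tuple[int, str]:
    """Run smartctl for one port on a 3ware controller and classify the result."""
    cmd = ["smartctl", "-H", "-d", f"3ware,{port}", device]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"SMART UNKNOWN - could not run smartctl: {exc}"
    return classify(proc.stdout)


def main(argv: List[str]) -> int:
    """Hypothetical entry point: argv is [script, device, port]."""
    device = argv[1] if len(argv) > 1 else "/dev/twa0"
    port = int(argv[2]) if len(argv) > 2 else 0
    code, message = check_port(device, port)
    print(message)
    return code
```

To use it as an actual check, the script would need to end with something like raise SystemExit(main(sys.argv)) (plus import sys), and Nagios would need one service definition per port, for example ports 0 and 1 on /dev/twa0 for the mirrored pair above. It only looks at the overall health verdict, so it is no substitute for smartd's attribute and self-test monitoring.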