
Mon Stuff

Introduction

It is a matter of personal opinion, of course, but for me MON has always been the monitoring package that is easiest to extend and adapt to custom applications. In particular, I have found mon alerts and monitors easier to write than their equivalents for BigBrother, NetSaint and other server monitoring packages. The code attached to this article is a good example of how little effort extending mon takes. Some of the monitors are replacements for software that ships with the controller card or is installed by default on Linux. Some sysadmins prefer to keep the vendor monitoring and end up with every system being monitored by a different commercial package. Others prefer to replace all vendor monitoring with something that plugs into their own monitoring system. I belong to the second group.

Obvious question - why are these alerts not contributed upstream? Obvious answer - the ones that I consider relevant went upstream long ago. The rest are either trivial or depend on vendor-specific utilities and will at best make the contrib section.

Monitor Scripts

Most of the code in the following subsections is trivial: five minutes' worth of coding or less.

SmartRaid Status

This is a monitoring module for older SmartRaid arrays (the cpqarray driver) using hpucli. You need hpucli installed from the HP web site. Beware that the layout of the installation is very strange (it installs under /opt by default and tries to "hide" some directories). Actually, strange is too polite. The packager at HP was smoking some really hard stuff when laying out the directory tree. Nothing bad in that, but he was not sharing either. I usually hack it to go under /usr/local where it belongs.

The HP utilities must be invoked as root. As a result the mon script must be setuid root. While I have added some checks for bogus input, it has not been properly audited by any means. It must be installed so that only root and the group mon runs under (daemon in the listing below) can read and execute it.

-r-sr-x---  1 root daemon 1542 2005-12-28 15:53 raid.monitor.smartraid
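The listing above corresponds to mode 4550. As a minimal sketch (it is not part of the attached scripts, and the function name is hypothetical), a check along these lines could confirm after installation that the setuid bit is present and that "other" users have no access at all:

```python
import stat

# Mode of the listing shown above: -r-sr-x---
EXPECTED_MODE = 0o4550

def is_locked_down(mode):
    """Return True if the mode has setuid set and grants nothing to 'other'."""
    has_setuid = bool(mode & stat.S_ISUID)
    other_bits = mode & stat.S_IRWXO
    return has_setuid and other_bits == 0
```

It would typically be called as is_locked_down(os.stat(path).st_mode) on the installed monitor. The same check applies to the other setuid scripts on this page.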

Besides having to run as root, hpucli takes a long time to load. It should not be run very often (at most once every 10 minutes).

3Ware Status

This is a monitoring module for 3Ware SATA arrays (it may work on older PATA arrays as well). You need tw_cli installed from the 3Ware web site. The 3Ware utilities are not fully compatible across kernels, drivers and firmware. Download the latest one and, if it complains, work your way down the release tree until you hit one that works. Similarly, any kernel upgrade may break the 3Ware utilities, so they should be retested after every kernel upgrade.

In addition, all older 3Ware utilities use SCSI and /proc interfaces which are no longer supported in the newer 2.6 kernels. Any 2.6 kernel after 2.6.9 will use only the newer interface supported in release 9 of the 3Ware utilities. This is based on the changelog and I still need to retest it properly, but 2.6.7 worked with the 7.x tools, while 2.6.13 and 2.6.14 definitely do not.

This is OK for 9xxx controllers, but 8xxx controllers do not have support for the commands essential for rebuilding as of Linux 2.6.14 and 3Ware CLI version 9.3.0.3 (note - the latest tw_cli fixes at least some of these conditions, so upgrading is a good idea). At the same time the older release 7.x utilities no longer work. The telltale sign is this error message when trying to rebuild an array:

Error: (CLI:022) Invalid operation(s) for the specified controller

In a case like this the only thing that can be done is to reboot the machine and rebuild the array from the 3Ware BIOS. In addition, failure conditions are not reported correctly using the old tw_cli info c0 syntax. As a result this mon script was recently rewritten for the new 9.x tool syntax, as required when running on 2.6.9+ kernels with version 9.3.0.x of the CLI. It should also work on FreeBSD with the 9.3.0.x tools (9xxx controllers).

Like the HP SmartRaid utilities, the 3Ware CLI must be invoked as root. As a result the mon script must be setuid root. While I have added some checks for bogus input, it has not been properly audited by any means. It must be installed so that only root and the group mon runs under (daemon in the listing below) can read and execute it.

-r-sr-x---  1 root daemon 1286 2005-12-28 15:53  raid.monitor.3ware

Linux Software RAID Status

Linux software RAID monitoring script. It does not need any special permissions because /proc/mdstat is world readable on Linux. It will trigger a trap only if a RAID array has an element in the F (failed) state. It will not generate a trap if the number of elements is less than expected, nor will it generate alerts while the array is being rebuilt. It provides the same functionality as mdadm in monitoring mode, but via mon.

raid.monitor.linuxswraid
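The check described above can be sketched roughly as follows. This is not the attached raid.monitor.linuxswraid, only a minimal Python illustration with hypothetical function names. It assumes that failed members appear in /proc/mdstat with an (F) suffix, as in "sdb1[1](F)", and that mon treats a non-zero exit status as failure and the first line of output as the summary.

```python
import re

# Matches a member device followed by the failed marker, e.g. "sdb1[1](F)".
FAILED_MEMBER = re.compile(r"(\S+)\[\d+\]\(F\)")

# Array lines look like "md0 : active raid1 sdb1[1](F) sda1[0]".
ARRAY_LINE = re.compile(r"^(md\S*)\s*:\s*(.*)$")

def failed_members(mdstat_text):
    """Map each array name to the list of its members flagged (F)."""
    failures = {}
    for line in mdstat_text.splitlines():
        match = ARRAY_LINE.match(line)
        if not match:
            continue
        failed = FAILED_MEMBER.findall(match.group(2))
        if failed:
            failures[match.group(1)] = failed
    return failures

def summary_line(failures):
    """One-line summary for mon: the names of the affected arrays."""
    return " ".join(sorted(failures))
```

A wrapper would read /proc/mdstat, print summary_line() followed by any details, and exit with status 1 when failed_members() returns a non-empty result. Like the monitor described above, the sketch ignores degraded arrays and arrays that are being rebuilt.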

NUT UPS alerts

This is a monitor for tracking UPS status via NUT. It will alert on power loss, overload and over-temperature. See the script itself for how to use it.

ups.monitor

This monitor has also proved most useful for tracking environmental conditions (including temperature) and load. Temperature sensors on UPSes are considerably more reliable than temperature sensors on motherboards. They also reflect the temperature of the entire server room rather than that of an individual machine.
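As a rough illustration of the kind of check involved (this is not the attached ups.monitor), the sketch below parses "name: value" lines as printed by the NUT upsc client and flags the conditions mentioned above. The function names and the 35 C limit are hypothetical. OB (on battery) and OVER (overloaded) are NUT status flags, and ups.temperature is only reported by some UPS models.

```python
# Flags in the NUT "ups.status" variable that this sketch alerts on.
ALERT_FLAGS = {
    "OB": "on battery (power loss)",
    "OVER": "overloaded",
}

def parse_upsc(text):
    """Parse 'name: value' lines into a dict, splitting on the first colon."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values

def ups_problems(values, max_temp_c=35.0):
    """Return a list of problem descriptions; an empty list means healthy."""
    problems = []
    for flag in values.get("ups.status", "").split():
        if flag in ALERT_FLAGS:
            problems.append(ALERT_FLAGS[flag])
    temp = values.get("ups.temperature")
    if temp is not None:
        try:
            if float(temp) > max_temp_c:
                problems.append(f"temperature {temp} C above limit")
        except ValueError:
            problems.append(f"unreadable temperature {temp!r}")
    return problems
```

A monitor built on this would exit non-zero and print the problems whenever ups_problems() returns a non-empty list.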

Environmental Alerts

There are a number of possible inputs for environmental alerts on a Linux system. None of them is reliable, and using most of them requires a lot of system-specific tweaking, searching the Internet for fellow sufferers, and trial and error. The most common methods are:

lm-sensors

This is the most popular method. It relies on a combination of i2c (and other) devices and a userland component to read them. Unfortunately nearly every mainboard manufacturer has wired these sensors differently (if at all). As a result the configuration file has to be altered, labels and limits moved around, and some of the less reliable sensors disabled outright. Once this has been done it is possible to use them. Still, as reading them produces false alerts on some systems, it is necessary to configure a reasonable minimum duration for the alerts (or a repeat count). Otherwise there will be quite a few false positives. As the checking is done by /usr/bin/sensors, the monitor itself is trivial.

sensors.monitor
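To show how little is left for the monitor to do, here is a minimal sketch (not the attached sensors.monitor) that scans the text printed by /usr/bin/sensors for readings marked ALARM. It assumes the limits have already been set correctly in the lm-sensors configuration, so that sensors itself appends ALARM to an out-of-range reading. The function name is hypothetical.

```python
def sensor_alarms(sensors_text):
    """Return the labels of readings that the sensors output marks ALARM."""
    alarms = []
    for line in sensors_text.splitlines():
        # Reading lines have the form "label: value (limits) [ALARM]".
        label, sep, reading = line.partition(":")
        if sep and "ALARM" in reading:
            alarms.append(label.strip())
    return alarms
```

The monitor would run sensors, pass its output to sensor_alarms(), and exit non-zero if the returned list is not empty.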

/proc/acpi/thermal_zone reporting

Most non-laptop systems do not bother updating these. Of all the systems I have tried so far, the only ones that are reasonably reliable are Via EPIA boards. They require a startup script to echo something reasonable into /proc/acpi/thermal_zone/THRM/polling_frequency.
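Reading such a zone takes very little code. The sketch below is only an illustration under stated assumptions: it expects the contents of the temperature file that sits next to polling_frequency (for example /proc/acpi/thermal_zone/THRM/temperature) to look like "temperature:   45 C", and the function names and the 60 C limit are hypothetical.

```python
import re

def parse_acpi_temperature(text):
    """Extract whole degrees Celsius from text like 'temperature:   45 C'."""
    match = re.search(r"temperature:\s*(-?\d+)\s*C", text)
    return int(match.group(1)) if match else None

def zone_too_hot(text, limit_c=60):
    """True if the zone is above the limit or could not be read at all."""
    temp = parse_acpi_temperature(text)
    # An unreadable zone is treated as a failure rather than silently ignored.
    return temp is None or temp > limit_c
```

Treating an unparsable file as a failure is a deliberate choice here, since a zone that stops reporting is itself worth an alert.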

Alerts

Mon does not ship with a good SMS alert, and the IXO/TAP and other paging protocols are no longer available from most telecommunications providers. Similarly, mon lacks some other alerts people often find useful (network broadcast to Windoze workstations, etc). The following alerts fill these gaps.

SMSTools based SMS Alert

First of all, why an SMS alert at all? There are "cheap" Internet based SMS services out there so there is no need to do ancient things like driving a modem off the serial port. Fair enough, so how can one use these to alert that the Internet link has gone down?

Here is an alert script for mon which uses smstools. It needs to be installed setuid, owned by uucp:daemon.

-r-sr-x---  1 uucp daemon 2266 2004-12-28 15:53  sms.alert

The serial port used by the alert needs to be chowned to uucp:daemon as well if hotplug has not done it for us.
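For orientation, smstools picks up outgoing messages as small text files: a To: header, a blank line, then the message body. The sketch below is not the attached alert, only a minimal illustration with a hypothetical function name. The spool directory is passed in as a parameter because its location depends on the smsd configuration (/var/spool/sms/outgoing is assumed to be a common default), and the 160 character limit is the usual single-SMS size.

```python
import os
import tempfile

def queue_sms(outgoing_dir, number, text, max_len=160):
    """Write one message file for smsd: To: header, blank line, body."""
    # Collapse whitespace so the alert fits on one line, then truncate.
    body = " ".join(text.split())[:max_len]
    # mkstemp gives a unique name; the file is readable by its owner only,
    # so this assumes smsd runs as the same user as the alert.
    fd, path = tempfile.mkstemp(prefix="mon.", dir=outgoing_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"To: {number}\n\n{body}\n")
    return path
```

A mon alert built on this would read the summary line from standard input and call queue_sms() once per recipient.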

-- AntonIvanov - 08 Nov 2008

Topic attachments
Attachment                Size   Date                 Who          Comment
monitor.temperature       1.5 K  20 Dec 2009 - 20:44  AntonIvanov  Temperature Monitor Script
raid.monitor.3ware        1.2 K  20 Dec 2009 - 20:44  AntonIvanov  3Ware Monitor script
raid.monitor.linuxswraid  0.6 K  20 Dec 2009 - 21:01  AntonIvanov  Linux Software Raid
raid.monitor.smartraid    1.4 K  20 Dec 2009 - 20:45  AntonIvanov  Smartraid Monitor script
sensors.monitor           0.4 K  20 Dec 2009 - 20:46  AntonIvanov  LM Sensors Generic monitor script
sms-new.alert             1.8 K  20 Dec 2009 - 20:46  AntonIvanov  SMS Alert
ups.monitor               1.8 K  20 Dec 2009 - 20:46  AntonIvanov  UPS Monitor script
Topic revision: r2 - 20 Dec 2009 - 21:06:08 - AntonIvanov

