Creating a Sensu Check
I recently gave a lightning talk at DevOpsMtl about creating a Sensu check. Sensu is a monitoring framework that lets you build your monitoring solution exactly how you need it.
Despite tons of checks being available already (see the ‘plugins’ directory of the sensu-community-plugins repo), you may run into situations where you need to build your own. Thankfully, it’s pretty easy.
What are Sensu checks?
Sensu checks are run periodically to check anything from basic system state (CPU, memory, disk usage, load, etc.) to the state of software you’re running (PostgreSQL, MySQL, Elasticsearch, Redis, etc.). You can also go further and use Sensu to monitor the state of external systems you depend on, even if you don’t have direct control over them. That way you can at least initiate a failover (e.g. switch to Mailgun if SendGrid is down).
The doc on checks is pretty comprehensive. This post is meant as a primer.
So, what’s a Sensu check? Let’s start with how they’re configured:
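As a rough sketch, here is what a check definition could look like, built as a Ruby hash and printed as the JSON that Sensu's configuration files use. The check name, script path, thresholds, subscription and interval are all assumptions for illustration, not values from a real setup.

```ruby
require 'json'

# Hypothetical check definition: every value below is an illustrative
# assumption (check name, script path, -w/-c thresholds, subscription).
CHECK_DEFINITION = {
  'checks' => {
    'check_ram' => {
      # The executable command run on the monitored server.
      'command'     => '/etc/sensu/plugins/check-ram.rb -w 20 -c 10',
      # Which subscriptions (groups of nodes) should run the check.
      'subscribers' => ['base'],
      # How often to run it, in seconds.
      'interval'    => 60
    }
  }
}.freeze

# Print the definition in JSON form.
puts JSON.pretty_generate(CHECK_DEFINITION)
```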
The command parameter is simply an executable command that will be run on the server. Here we have an executable Ruby script (properly hash-banged), invoked with a few parameters.
It doesn’t have to be Ruby; it can be Bash, Python, a compiled program, anything. Heck, if you feel like it, you could cram a whole Bash one-liner in your command parameter and not need to upload a script to the monitored node.
The API
The script must behave a certain way for Sensu to be able to make sense of it. Before I go any further, I’ll just mention that there are two kinds of checks: standard checks and metric checks. In this post I’ll cover only the first; metric checks are covered in the Sensu documentation.
For those familiar with Nagios, standard Sensu checks are compatible with Nagios checks. So if you know of a Nagios check that does what you need, you can stop right here and go grab that. Otherwise, let’s continue.
The exit status of a Sensu check should be:
- 0: ok
- 1: warning
- 2: critical
- 3 or more: unknown
A Sensu check optionally outputs text describing the state to stdout or stderr. Ideally, your check should output at least one line, but it can be more.
Example outputs of check-ram.rb:

| Exit Status | Output |
| --- | --- |
| 0 | CheckRAM OK: 65% free RAM left |
| 1 | CheckRAM WARNING: 40% free RAM left |
| 2 | CheckRAM CRITICAL: 13% free RAM left |
| 3 | CheckRAM UNKNOWN: invalid percentage |
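To make the convention concrete, here is a small sketch that maps an exit status to its state name and builds an output line in the same shape as the check-ram.rb examples above. The method names status_name and output_line are made up for this post; they aren't part of Sensu.

```ruby
# Map an exit status to a state name, following the list above.
def status_name(exit_status)
  case exit_status
  when 0 then 'OK'
  when 1 then 'WARNING'
  when 2 then 'CRITICAL'
  else 'UNKNOWN' # 3 or more
  end
end

# Build a line shaped like "<CheckName> <STATE>: <message>".
def output_line(check_name, exit_status, message)
  "#{check_name} #{status_name(exit_status)}: #{message}"
end

puts output_line('CheckRAM', 0, '65% free RAM left')
puts output_line('CheckRAM', 2, '13% free RAM left')
```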
Dumb check example
Here’s an example of a dummy Bash check that monitors the value of $RANDOM. Anything over 22000 is ok, over 11000 is a warning and below that is critical.
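Here’s a minimal sketch of that logic, written in Ruby instead of Bash. Ruby has no $RANDOM, so rand(32_768) stands in for it (Bash’s $RANDOM ranges from 0 to 32767). The CheckRandom name and the message wording are made up for illustration.

```ruby
# Thresholds from the description above.
OK_ABOVE = 22_000
WARNING_ABOVE = 11_000

# Return [exit_status, output_line] for a given value.
def classify_random(value)
  if value > OK_ABOVE
    [0, "CheckRandom OK: #{value} is above #{OK_ABOVE}"]
  elsif value > WARNING_ABOVE
    [1, "CheckRandom WARNING: #{value} is above #{WARNING_ABOVE}"]
  else
    [2, "CheckRandom CRITICAL: #{value} is #{WARNING_ABOVE} or below"]
  end
end

# rand(32_768) stands in for Bash's $RANDOM.
status, line = classify_random(rand(32_768))
puts line
# A real check script would finish with: exit(status)
```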
Ruby is nicer
If you prefer using Ruby, the Sensu folks created a Ruby gem you can start from: sensu-plugin. It installs like any other gem (gem install sensu-plugin).
Here’s a simplified version of the check-ram.rb plugin:
You inherit from Sensu::Plugin::Check::CLI, then define a run method and you’re off to the races.
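The measurement that goes inside run is ordinary Ruby. As a rough sketch using only the standard library, the following parses text in the format of Linux’s /proc/meminfo and compares the result to thresholds. This is an assumption about how one might measure free RAM, not necessarily how check-ram.rb does it, and the method names and sample numbers are made up.

```ruby
# Parse /proc/meminfo-style text and return the percentage of free RAM.
# Assumes "MemTotal:" and "MemFree:" lines, both in the same unit.
def free_ram_percent(meminfo_text)
  values = meminfo_text.scan(/^(\w+):\s+(\d+)/).to_h
  total = values.fetch('MemTotal').to_i
  free = values.fetch('MemFree').to_i
  (free * 100.0 / total).round
end

# Turn a percentage into an exit status; less free RAM is worse.
def ram_status(percent_free, warn:, crit:)
  return 2 if percent_free < crit
  return 1 if percent_free < warn

  0
end

# Sample input so the sketch runs anywhere; on Linux you could pass
# File.read('/proc/meminfo') instead.
SAMPLE_MEMINFO = <<~MEMINFO
  MemTotal:        8000000 kB
  MemFree:         5200000 kB
MEMINFO

percent = free_ram_percent(SAMPLE_MEMINFO)
puts "#{percent}% free RAM left (status #{ram_status(percent, warn: 50, crit: 20)})"
```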
CLI parameters
The option method lets you declare how your script should be invoked. Here are a few things you can specify:
- default value
- required arguments
- short and long argument names (-v or --version)
- a description for the CLI help
- a proc to run on the argument. Commonly used to call .to_i or .to_f. All values you get in the config hash are strings otherwise.
- whether it is a boolean argument (this is the only case where you don’t get a string).
Using option also has the nice side effect of producing a useful --help for your script.
The workings of option will sound familiar to fans of Thor, but in our case, option is a feature provided by Opscode’s mixlib-cli gem. See that project for the full documentation of option.
The instance method config is also created by mixlib-cli. It returns a hash of all resolved CLI parameters (manually specified value, default, and so on). Because config is an instance method, it’s available inside run or in any other method you care to define, without needing to pass the hash around.
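If you want to play with these ideas without installing anything, Ruby’s built-in OptionParser covers similar ground. To be clear, this is a stand-in for illustration and not mixlib-cli’s option API; the parse_config name, the flags and the defaults are all assumptions.

```ruby
require 'optparse'

# Parse check arguments into a config hash with Ruby's OptionParser.
# Stand-in for illustration only; this is not mixlib-cli.
def parse_config(argv)
  config = { warn: 50, crit: 20, verbose: false } # default values

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: check-ram.rb [options]'
    # Short and long names, Integer conversion, and a --help description.
    opts.on('-w', '--warn PERCENT', Integer, 'Warn below PERCENT free RAM') do |value|
      config[:warn] = value
    end
    opts.on('-c', '--crit PERCENT', Integer, 'Critical below PERCENT free RAM') do |value|
      config[:crit] = value
    end
    # A boolean flag takes no value.
    opts.on('-v', '--verbose', 'Print extra detail') do
      config[:verbose] = true
    end
  end

  parser.parse(argv) # parse (unlike parse!) leaves argv untouched
  config
end

p parse_config(%w[-w 40 --crit 10])
```

Without the Integer conversion, the values would arrive as strings, which is the same reason the proc is so commonly used with option.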
Reporting the state
So you have defined your params and you know how to access them in your code. The sensu-plugin gem also gives you a few helper methods for reporting the observed state in a consistent manner.
The basics are:
- ok(message)
- warning(message)
- critical(message)
- unknown(message)
Each of those corresponds exactly to the API we defined at the top. One thing to know is that calling any of those stops the execution of the script right away.
The check-ram.rb example uses a slightly different approach. Since the message is always exactly the same, it calls message(message) once, and then calls ok() (or one of the other helpers) without a message.
If you look back at the code for check-ram.rb, you’ll notice that the message reported is “65% free RAM left”. The sensu-plugin gem automatically normalizes the output with the check name and status: “CheckRAM OK: 65% free RAM left”.
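To see how these pieces fit together, here is a tiny stand-in that mimics the behaviour described above: a stored message, helpers that print a normalized line, and an immediate exit. It is not the sensu-plugin implementation; MiniCheck is a hypothetical class written only for this sketch, and the thresholds are assumptions.

```ruby
# A minimal imitation of the reporting helpers. NOT the real gem.
class MiniCheck
  EXIT_CODES = { 'OK' => 0, 'WARNING' => 1, 'CRITICAL' => 2, 'UNKNOWN' => 3 }.freeze

  # Store a message for a later ok/warning/critical/unknown call.
  def message(text)
    @message = text
  end

  # Define ok, warning, critical and unknown in one go.
  EXIT_CODES.each do |status, code|
    define_method(status.downcase) do |text = nil|
      # Normalize the output with the class name and status.
      puts "#{self.class.name} #{status}: #{text || @message}"
      exit(code) # stops the script right away
    end
  end
end

class CheckRAM < MiniCheck
  def run(percent_free, warn: 50, crit: 20)
    message "#{percent_free}% free RAM left"
    critical if percent_free < crit
    warning if percent_free < warn
    ok
  end
end
```

With this stand-in, CheckRAM.new.run(65) prints “CheckRAM OK: 65% free RAM left” and exits with status 0. Because each helper exits, nothing after the first matching call runs.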
Testing your check
Since Sensu checks are simple executables, now you only need to run your check manually in all appropriate scenarios to validate that your check works as expected.
No one’s created an automated test harness yet for Sensu checks. So manual testing will have to do for now :-)
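In the meantime, a few lines of Ruby can at least script those manual runs. This is only a sketch using the standard library’s Open3: it runs a command, then checks the exit status and the output. The run_check helper and the fake check are made up for the demonstration; you would pass your own check’s path and arguments instead.

```ruby
require 'open3'
require 'rbconfig'

# Run a check command and return [exit_status, combined stdout/stderr].
def run_check(*command)
  output, status = Open3.capture2e(*command)
  [status.exitstatus, output]
end

# Fake check used only for this demonstration: one line, exit status 1.
fake_check = 'puts "CheckFake WARNING: demo"; exit 1'
status, output = run_check(RbConfig.ruby, '-e', fake_check)

raise "expected WARNING (1), got #{status}" unless status == 1
raise 'unexpected output' unless output.start_with?('CheckFake WARNING')

puts 'check behaved as expected'
```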
Last step
The last step of creating a Sensu check is to visit the sensu-community-plugins repo, create a pull request and share your check with the world. You’re probably not the only one who needs to monitor whatever you just created a check for, so why not give back to the community?