NAME

monit - system for monitoring programs


SYNOPSIS

monit [options] {arguments}


DESCRIPTION

monit is a utility for monitoring and managing daemons or similar programs running on a Unix system; monit will start specified programs if they are not running and restart programs not responding.

The monit utility can run in a daemon mode to repeatedly poll one or more programs at a specified interval.


GENERAL OPERATION

The behavior of monit is controlled by command-line options and a run control file, ~/.monitrc, the syntax of which we describe in a later section. Command-line options override .monitrc declarations.

The following options are recognized by monit. It's recommended that you set the log and daemon options in the .monitrc control file.

General Options and Arguments

-c file Use this control file

-l logfile Print log information to this file.

-p pidfile Use this lock file in daemon mode.

-d n Run as a daemon once per n seconds

-I Run from init (do not run in background)

-g Set group name for start, stop, restart and status

-v Verbose mode, work noisy (diagnostic output)

-V Print version number and patchlevel

-h Print a help text

In addition to the options above, monit can be started with one of the following action arguments; monit will then execute the action and exit without transforming itself into a daemon.

start Start all programs listed in the control file. If the group option is set, only start the programs in the named group.

start name Start the named program. The name must exist in the monitrc file, after a check keyword. See also the MONIT HTTPD section below.

stop Stop all programs listed in the control file. If the group option is set, only stop the programs in the named group.

stop name Stop the named program. The name must exist in the monitrc file, after a check keyword. See also the MONIT HTTPD section below.

restart Stop and start all programs. If the group option is set, only restart the programs in the named group.

restart name Restart the named program. The name must exist in the monitrc file, after a check keyword. See also the MONIT HTTPD section below.

status Print status information for each program. If the group option is set, only print the status for the named group.

quit Kill monit daemon process

validate Check all programs and start the ones not running. In addition, if the control file states that a program listens on a port and monit cannot connect to that port, monit restarts the program. This action is also the default behavior when monit runs in daemon mode.


LOGGING

monit will log status and error messages to a log file. If syslog is given as a value for the -l option (or the statement 'set logfile syslog' is found in the control file), monit will use the syslog system daemon for logging messages. To turn off logging, simply do not set the logfile in the control file (and of course, do not use the -l switch).


DAEMON MODE

The -d interval option runs monit in daemon mode. You must specify a numeric argument which is a polling interval in seconds.

In daemon mode, monit puts itself in the background and runs continuously, monitoring each specified program and then sleeping for the given polling interval.

Simply invoking

  monit -d 300

will poll all programs described in your ~/.monitrc file every 5 minutes.

It is possible to set a polling interval in your ~/.monitrc file by saying 'set daemon n', where n is an integer number of seconds. If you do this, monit will always start in daemon mode (as long as no action arguments are given).

Only one daemon process is permitted per user; in daemon mode, monit makes a per-user lockfile to guarantee this.

Calling monit with a daemon in the background sends a wakeup signal to the daemon, forcing it to check programs immediately.

The quit argument will kill a running daemon process instead of waking it up.

If you touch or change the .monitrc file while monit is running in daemon mode, this will be detected at the beginning of the next poll cycle. When a changed .monitrc is detected, monit rereads it and reinitializes itself. Note also that if you break the .monitrc file's syntax, the monit daemon will exit after logging the appropriate error message.

monit lock file

monit uses a lock file to prevent concurrent runs in daemon mode. That is, only one monit daemon is permitted per user. The lock file contains the process id (pid) of the currently running monit daemon. If monit is run by the root user, the location of the lock file is either /var/run/monit.pid or /etc/monit.pid, depending on the operating system. For a non-root user the location of the lock file is $HOME/.monit.pid. The lock file is removed when a monit daemon is stopped.
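
As an illustration of how such a lock file can be used, the following Python sketch reads the pid from a lock file and checks whether that process still exists. It is not taken from monit's source; the function name daemon_is_running is hypothetical, and the only assumption about the file is the one stated above, namely that it contains the pid of the daemon.

```python
import os


def daemon_is_running(lockfile):
    """Return True if the lock file names a live process.

    Hypothetical helper for illustration; not monit's implementation.
    """
    try:
        with open(lockfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False              # no lock file, or no valid pid in it
    if pid <= 0:
        return False              # not a usable process id
    try:
        os.kill(pid, 0)           # signal 0 only tests for existence
    except ProcessLookupError:
        return False              # stale lock file, process is gone
    except PermissionError:
        return True               # process exists but belongs to another user
    return True
```

A script could call, for example, daemon_is_running(os.path.expanduser("~/.monit.pid")) before deciding whether to start a new daemon.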

Normally it's not necessary to consider the location of the lock file, but in certain very special situations you may need to control it. For instance, if you run monit as root on two different machines that share the same file system, you will need to change the location of the monit lock file via the -p option. You can also set the location of the lock file on a more permanent basis via this global set-statement in a monitrc control file (keywords are in capitals):

   SET PIDFILE {pidfile}

For instance, set pidfile /run/monit.pid.


INIT SUPPORT

Monit can be run and controlled from init. If monit crashes, init will respawn a new monit process.

You can use either the 'set init' statement in monit's configuration file or use the -I option from the command line. Here's a sample /etc/inittab entry for monit:

  # Run monit in standard runlevels
  mo:2345:respawn:/usr/local/sbin/monit -Ic /etc/monitrc

After you have modified init's configuration file, you can run the following command to re-examine /etc/inittab and start monit:

  telinit q 

For systems without telinit:

  kill -1 1

If you run monit from init, make sure that you do not start monit in your startup scripts as well.


GROUP SUPPORT

Program entries in the control file, monitrc, can be grouped together by the group statement. The syntax is simply (keyword in capital):

  GROUP groupname

With this statement it is possible to group similar program entries together and manage them as a whole. Monit provides functions to start, stop and restart a group of programs, like so:

To start a group of programs:

  monit -g <groupname> start

To stop a group of programs:

  monit -g <groupname> stop

To restart a group of programs:

  monit -g <groupname> restart

Show the status of a program group:

  monit -g <groupname> status


MONITORING MODE SELECTION

Monit supports three monitoring modes per service: active, passive and manual. See also the example section below for usage of the mode statement.

In active mode, monit will monitor a service and in case of problems monit will act and raise alerts, start, stop or restart the service. Active mode is the default mode.

In passive mode, monit will passively monitor a service and specifically not try to fix a problem, but it will still raise alerts in case of a problem.

For clustered environments there is also a manual mode. In this mode, monit will enter active mode only if a service was started under monit's control, for example by:

  monit start sybase

or

  monit -g database start (for service group)

If the service wasn't started by monit or was stopped for example by:

  monit stop sybase

or

  monit -g database stop (for service group)

monit will not monitor the service at all. This makes it possible to configure services in monitrc and start them with monit only when they should run. This could be used for building simple failsafe clusters, for instance by using the heartbeat system (http://linux-ha.org/) to watch the health of nodes and, if one machine fails, start services on a secondary node.

Appropriate scripts that can call monit to start/stop specific services are needed on both nodes - typical usage:

  FILE                    DESCRIPTION
  -----------------------------------
  /etc/inittab            starts monit
  /etc/rcS.d/S41heartbeat execute "monit start heartbeat"
  /etc/init.d/monit-node1 execute "monit -g node1 start"
  /etc/init.d/monit-node2 execute "monit -g node2 start"

This way heartbeat can easily control the cluster state. If one node fails, heartbeat will start monit-xxxxx on the running node, and monit is instructed to start the services of the failing node and monitor them.


ALERT MESSAGES

monit will raise an email alert if:

 o A program timed out
 o A program was restarted
 o A program was stopped
 o A timestamp test didn't pass
 o A resource statement matched (see also the section RESOURCE
   TESTING below)
 o A checksum error occurred (see also the section MD5 CHECKSUM
   below)

More than one alert statement can be used in a process entry. This means that you can send different emails to different addresses. The full syntax for the alert statement is as follows (keywords are in capital):

 ALERT mail-address [{events}] [MAIL-FORMAT {mail-format}]

Simply using:

 alert foo@bar

will send a default email alert to the address foo@bar whenever a timeout, restart, checksum, resource, stop or timestamp error occurs.

If you only want an alert message sent when a certain event occurs, for example a timeout or a program restart, append the event to the alert statement:

 alert foo@bar only on { timeout } or
 alert foo@bar { timeout }

(only and on are noise keywords, ignored by monit)

or

 alert foo@bar { restart }

The same applies for a checksum error

 alert foo@bar { checksum }

It is also possible to combine events and send mail to different email addresses like:

 alert foo@bar { restart, timeout, resource } 
 alert security@bar on { checksum, stop }
 alert manager@bar

This will send an alert message to foo@bar when a timeout, resource or restart occurs and a message to security@bar if a checksum error or stop occurs. And finally, a message to manager@bar whenever any error event occurs.


The following alert-statement:

 alert foo@bar { timeout
                 restart
                 checksum
                 resource
                 stop
                 timestamp }

is equivalent to:

 alert foo@bar

which, as stated above, will send a message when a timeout, restart, checksum, resource, stop or timestamp error occurs. (If the postfix variant is used, note that the curly braces are mandatory.)

A restart alert is also sent if monit fails to execute a start or stop program for an entry. It is therefore strongly advised that at least one alert statement registers interest in restart alerts.

monit will provide a default mail message layout that is short and to the point. Here's an example of a standard alert mail sent by monit:

 From: monit@tildeslash.com
 Subject: monit alert -- apache restarted
 To: hauk@tildeslash.com
 Date: Tue, 28 May 2002 20:42:30 +0200
 Program apache restarted
        Date: Tue May 28 20:42:30 2002
        Host: www.tildeslash.com
 Your faithful employee,
 monit

If you want to, you can change the format of this message with the optional mail-format statement. The syntax for this statement is as follows:

 mail-format {
      from: monit@localhost
   subject: apache $EVENT at $DATE
   message: Monit restarted $PROGRAM at $DATE on $HOST. 
     Your joke for today is:
     Things You Do Not Want Your System Administrator to Say:
        * Ooops.
        * Wow!! Look at this ...
        * Hey!! The Suns don't do this.
        * Terminated??!
        * What software license?
        * Well, it's doing something ...
        * Wow! ... That seemed fast ...
        * Where's the DIR command?
        * Why is my "rm" taking so long?
        * System coming down in 0 min ...
 }

Where the keyword from: is the email address monit should pretend it is sending from. It does not have to be a real mail address, but it must be a properly formatted mail address of the form name@domain. The keyword subject: is for the email subject line. The subject must fit on one line. The message: keyword denotes the mail body. If used, this keyword should always be the last in a mail-format statement. The mail body can be as long as you want but must not contain the '}' character.

All of these format keywords are optional but you must provide at least one. Thus if you only want to change the from address monit is using you can do:

 alert foo@bar with mail-format { from: bofh@xyzzy.no }

From the previous example you will notice that four special variables were used. If used, they will be substituted into the text with a special value:

$EVENT A string describing the event that occurred. The values are fixed and are ``restarted'', ``timed out'', ``stopped'' and ``checksum error''.

$PROGRAM The program entry name in monitrc

$DATE The current time and date (C time style).

$HOST The name of the host monit is running on
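
The substitution of these variables can be pictured with the following Python sketch. It is only an illustration of the idea, not monit's code: the function name expand is hypothetical, and the use of Python's string.Template is an assumption that happens to match the $NAME style shown above.

```python
import string


def expand(template, event, program, date, host):
    """Replace $EVENT, $PROGRAM, $DATE and $HOST in a mail-format text.

    Hypothetical helper; unknown $NAMES are left untouched.
    """
    values = {
        "EVENT": event,
        "PROGRAM": program,
        "DATE": date,
        "HOST": host,
    }
    return string.Template(template).safe_substitute(values)
```

For example, expanding the subject line "apache $EVENT at $DATE" with the event ``restarted'' yields a subject such as "apache restarted at Tue May 28 20:42:30 2002".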

Setting a global mail format

Finally, it is possible to set a standard mail format with the following global set-statement (keywords are in capital):


 SET MAIL-FORMAT {mail-format}

A format set with this statement will apply to every alert statement that does not have its own specified mail-format. This statement is most useful for setting a default from address for messages sent by monit, like so:

 set mail-format { from: monit@foo.bar.no }


PROGRAM TIMEOUT

monit provides a program timeout mechanism for situations where a program simply refuses to start or respond over a longer period. In cases like this, and particularly if monit's poll cycle is short, monit will simply increase the machine load by trying to restart the program.

The timeout mechanism monit provides is based on two variables, i.e. the number of times the program has been started and the number of poll-cycles. For example, if a program had x restarts within y poll-cycles (where x <= y) then monit will timeout and not (re)start the program on the next cycle. It's a good idea to use the alert statement in conjunction with timeout, so if a timeout occurs monit will send an alert notification. A legal (but verbose) way to write a timeout statement for a program entry in the control file is:

 timeout if 3 restarts within 3 cycles

The shorthand version is:

 timeout(3,3)

Where the first number is the number of program restarts and the second is the number of poll-cycles. If the number of cycles was reached without a timeout, the program start-counter is reset to zero. This provides some granularity to catch exceptional cases and do a program timeout, but to let occasional program restarts happen without having an accumulated timeout.

If you use timeout (it's optional), then be sure to add an alert statement to notify the responsible administrator. Such as:


 timeout(3, 5) and alert bofh@foo.bar on { timeout }

To have monit check the program again after a timeout, run 'monit start program' from the command line. This will remove the timeout lock in the daemon and make the daemon start and check the program again.
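
The bookkeeping described in this section can be sketched in Python as follows. This is a simplified model under stated assumptions, not monit's implementation: the class name TimeoutGuard and its methods are hypothetical, and the sketch assumes that cycles are counted in consecutive windows of the configured length.

```python
class TimeoutGuard:
    """Model of timeout(restarts, cycles); hypothetical, for illustration."""

    def __init__(self, max_restarts, max_cycles):
        self.max_restarts = max_restarts
        self.max_cycles = max_cycles
        self.restarts = 0
        self.cycles = 0
        self.timed_out = False

    def poll(self, restarted):
        """Record one poll cycle; return True once the program timed out."""
        if self.timed_out:
            return True           # locked until reset() is called
        self.cycles += 1
        if restarted:
            self.restarts += 1
        if self.restarts >= self.max_restarts:
            self.timed_out = True
        elif self.cycles >= self.max_cycles:
            # Window ended without a timeout: start counting afresh.
            self.restarts = 0
            self.cycles = 0
        return self.timed_out

    def reset(self):
        """Remove the timeout lock, as 'monit start program' does."""
        self.restarts = 0
        self.cycles = 0
        self.timed_out = False
```

With TimeoutGuard(3, 5), three restarts inside five polls set the lock, while a single restart in a window of five polls is forgotten when the window ends.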


RESOURCE TESTING

Monit can examine how much of the system's resources a service or the system is using.

Depending on this indicator, services can be stopped or restarted and alerts can be generated. Thus it is possible to make use of idle systems and to spare systems under high load.

The full syntax for the resource-statements used for resource testing is as follows (keywords are in capital and optional statements in [brackets]),

 resource operator value [cycles] action

resource is a choice of ``CPUUSAGE'', ``MEMUSAGE'', ``MEMKBYTE'', ``LOADAVG([1min|5min|15min])'':

CPUUSAGE is the CPU usage of the process and its children in percent. This resource value is a floating point number, for instance 60.0.

MEMUSAGE is the memory usage of the process in parts of hundred (percent). This resource value is also a floating point number.

MEMKBYTE is the memory amount of the process in KiB (1024 byte). This resource value is an integer number.

LOADAVG([1min|5min|15min]) refers to the system's load average. The load average is the number of processes in the system run queue averaged over the specified time period. This resource value is again a floating point number.

operator is a choice of ``<'',``>'',``!='',``=='' in c notation, ``gt'', ``lt'', ``eq'', ``ne'' in shell sh notation and ``greater'', ``less'', ``equal'', ``notequal'' in human readable form (if not specified, default is EQUAL).

cycles is the maximum number of cycles the expression above has to be true in order to start an action. If cycles is omitted then it is set to one.

action is a choice of ``ALERT'', ``RESTART'', ``STOP'':

ALERT sends the user a resource alert in case the maximum number of cycles has been reached.

RESTART restarts the service in case the maximum number of cycles has been reached.

STOP stops the service in case the maximum number of cycles has been reached. If monit stops a service it will not be checked by monit anymore nor restarted again later. You must explicitly start it again from the web interface or from the console, like: 'monit start apache', if you want the monit daemon to monitor the service again.

To calculate the cycles, a counter is raised whenever the expression above is true and it is lowered whenever it is false (but not below 0). All counters are reset in case of a restart.

In order to check that the CPU usage of a service is not going beyond 50% for five cycles before restarting it, the following expression could be used:

 if cpuusage is greater than 50.0 for 5 cycles then restart

Or the short version without noise keywords:

 cpuusage > 50.0 5 restart
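
The cycle counter described above can be modelled with a few lines of Python. This is a sketch under the assumptions stated in the text (raise on true, lower on false, never below 0); the class name CycleCounter is hypothetical and the code is not taken from monit.

```python
class CycleCounter:
    """Counts poll cycles for one resource statement (illustrative only)."""

    def __init__(self, cycles=1):
        self.cycles = cycles      # omitted cycles defaults to one
        self.count = 0

    def update(self, condition_true):
        """Feed one poll result; return True when the action should run."""
        if condition_true:
            self.count += 1
        else:
            self.count = max(0, self.count - 1)   # never below 0
        return self.count >= self.cycles

    def reset(self):
        """Counters are reset when the service is restarted."""
        self.count = 0


def restart_points(samples, limit=50.0, cycles=5):
    """Return, per CPU sample, whether 'cpuusage > limit' would trigger."""
    counter = CycleCounter(cycles)
    return [counter.update(value > limit) for value in samples]
```

Note that in this model a single sample below the limit lowers the counter by one rather than clearing it, so a service that is over the limit most of the time still reaches the action eventually.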

See also the example section below.


TIMESTAMP TESTING

Monit can watch the timestamp of any file or directory associated with a program.

The full syntax for the timestamp statement is as follows (keywords are in capital and optional statements in [brackets]):

 TIMESTAMP object [operator] value [unit] action

object is a path to the associated file or directory to watch.

operator is a choice of ``<'',``>'',``!='',``=='' in c notation, ``GT'', ``LT'', ``EQ'', ``NE'' in shell sh notation and ``GREATER'', ``LESS'', ``EQUAL'', ``NOTEQUAL'' in human readable form (if not specified, default is EQUAL).

value is a time watermark.

unit is either ``SECOND'', ``MINUTE'', ``HOUR'' or ``DAY'' (it is also possible to use ``SECONDS'', ``MINUTES'', ``HOURS'', or ``DAYS'').

action is a choice of ``ALERT'', ``RESTART'', ``STOP'':

 o ALERT sends the user a timestamp alert.
 o RESTART restarts the service.
 o STOP stops the service. If monit stops a service it will not
   be checked by monit anymore nor restarted again later. You
   must explicitly start it again from the web interface or from
   the console, like: 'monit start apache', if you want the monit
   daemon to monitor the service again.

The timestamp statement is useful for monitoring systems that report their state by changing the timestamp of certain state files. For instance, the iPlanet Messaging server stored process updates the timestamp of:

 o stored.ckp
 o stored.lcu
 o stored.per

whenever it runs tasks and if a task failed, the system keeps the timestamp.

To report stored problems you can use the following statements:

 if timestamp "/msg-foo/config/stored.ckp" > 1 minute then alert
 if timestamp "/msg-foo/config/stored.lcu" > 5 minutes then alert
 if timestamp "/msg-foo/config/stored.per" > 1 hour then alert

or the equivalent less verbose form:

 timestamp "/msg-foo/config/stored.ckp" > 60 alert
 timestamp "/msg-foo/config/stored.lcu" > 300 alert
 timestamp "/msg-foo/config/stored.per" > 3600 alert

As mentioned above, you can also use the timestamp statement for monitoring directories for changes. If files are added or removed to/from a directory, its timestamp is changed:

 if timestamp "/foo/directory" > 1 hour then alert

or

 if timestamp "/foo/secure/directory" < 1 hour then alert

The following example is a neat trick for restarting a process after a certain time. Sometimes this is a necessary workaround for some products, until the vendor fixes a problem:

 if timestamp "/var/run/crappy_program.pid" > 7 days then restart
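
The comparison behind a statement such as 'timestamp object > value unit' can be sketched in Python as shown here. The sketch assumes that the timestamp in question is the modification time reported by stat(2), which this manual does not state explicitly; the names UNIT_SECONDS, timestamp_age and older_than are hypothetical.

```python
import os
import time

# Seconds per unit; plural forms are mapped to the singular below.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def timestamp_age(path, now=None):
    """Age of a file or directory in seconds, based on its mtime."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime


def older_than(path, value, unit="second", now=None):
    """True if the timestamp of path is more than value units old."""
    key = unit.lower().rstrip("s")        # MINUTES -> minute
    return timestamp_age(path, now) > value * UNIT_SECONDS[key]
```

With these helpers, older_than("/var/run/crappy_program.pid", 7, "days") corresponds to the condition of the restart example above.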


CONNECTION TESTING

Monit is able to perform connection testing via networked ports and via unix sockets.

If a program listens on one or more sockets, monit can connect to the port (using either tcp or udp) and verify that the program will accept a connection and that it is possible to read and write to the socket. If a connection is not accepted or if there is a problem with the socket I/O, monit will assume that something is wrong and restart the program. Additionally, if monit is compiled with openssl support, SSL-based network services can be tested, too.
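
The basic idea of a tcp connection test can be shown with a short Python sketch. It only checks that a connection is accepted and does not attempt the read/write or protocol tests described in this section; the function name port_accepts and the default timeout are assumptions made for the example.

```python
import socket


def port_accepts(host, port, timeout=5.0):
    """Return True if a tcp connection to host:port is accepted.

    Illustrative helper only; monit's own test does more than this.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False              # refused, unreachable or timed out
```

A monitoring loop could treat a False result from port_accepts("localhost", 80) as the signal to restart the program.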

The full syntax for the port-statement used for connection testing is as follows (keywords are in capital and optional statements in [brackets]) for networked ports,

 [HOST hostname] PORT number 
         [TYPE {TCP|UDP|TCPSSL [CERTMD5 md5sum]}] 
         [PROTO(COL) {name} [REQUEST {"/path"}]]

or for unix sockets,

 UNIX(SOCKET) path [TYPE {TCP|UDP}] [PROTO(COL) {name} 
  [REQUEST {"/path"}]]

To have monit check a port connection use the following statement:

  port 80

In this case the machine in question is assumed to be localhost and monit will open a tcp connection to localhost at port 80. Monit uses tcp by default; if you want to connect with udp, you can specify this after the port-statement:

 port 53 use type udp ('use' is a noise keyword)

The TCPSSL statement accepts optionally the md5 sum of the server's certificate. The md5 sum is matched against the one delivered by the server. In case they do not match the connection test fails.

In case a server is listening to a unix socket called /var/run/mysocket, the following statement can be used:

 unix /var/run/mysocket

If your machine answers for several virtual hosts you can prefix the port statement with a host-statement like so:

 host www.sol.no     port 80
 host shop.sol.no    port 443
 host kvasir.sol.no  port 80
 host 10.2.3.4       port 80

And as mentioned above, if you do not specify a host-statement, localhost is assumed.

Finally, monit also knows how to speak some of the more popular Internet protocols. So, besides testing for connections, monit can also speak with the server in question to verify that the server works. For example, the following is used to test a http server:


 host www.tildeslash.com port 80 protocol http

At the moment monit knows how to speak HTTP, SMTP, FTP, POP, IMAP, NNTP and SSH. If you have compiled monit with ssl support, monit can also speak HTTPS, FTPS, POPS and IMAPS.

Some protocols also support a request statement. This statement can be used to ask the server for a special document entity.

Currently only the HTTP protocol module supports the request statement, such as:

 host www.myhost.com port 80 protocol http 
   request "/data/show.php?a=b&c=d"

The request should contain a URI string specifying a document from the http server. The string will be url encoded by monit before it sends the request to the http server, so it's okay to use url unsafe characters in the request.

If the request statement isn't specified, the default web server page will be requested.

It is of course possible to mix networked ports and unix sockets checks for a service.

See also the example section below.


MONIT HTTPD

If specified in the control file, monit will start a monit daemon with http support. From a Browser you can then start and stop programs as well as view the status of each program. Also, if monit logs to its own file, you can view the content of this logfile in a Browser.

The control file statement for starting a monit daemon with http support is a global set-statement:

  set httpd port 2812

And you can use this URL, http://localhost:2812/, to access the daemon from a browser.

The port number, in this case 2812, can be any number that you are allowed to bind to.

If you have compiled monit with openssl support, you can also start the httpd server with ssl support, using the following expression:

  set httpd port 2812 
        ssl enable
        pemfile /etc/certs/monit.pem

And you can use this URL, https://localhost:2812/, to access the monit web server over an ssl encrypted connection.

The pemfile, in the example above, holds the private key and the certificate. This file should be stored in a safe place on the filesystem and should have strict permissions, that is, no more than 0700. For more information on how to generate this file, please consult README.ssl.

In addition, if you want to check for client certificates you can use the CLIENTPEMFILE statement. In that case a connecting client has to have a sufficient key and certificate in order to connect. This file also needs to have all necessary CA certificates. A configuration could look like:


  set httpd port 2812 
        ssl enable
        pemfile /etc/certs/monit.pem
        clientpemfile /etc/certs/monit-client.pem

By default self signed certificates are not allowed. In case you need to use them it has to be allowed explicitly with the ALLOWSELFCERTIFICATION statement.

If you only want the http server to accept connection requests on one host address, you can specify the bind address either as an IP number string or as a hostname. In the following example we bind the http server to the loopback device. In other words, the http server will only be reachable from localhost:

  set httpd port 2812 and use the address 127.0.0.1

or

  set httpd port 2812 and use the address localhost

or with ssl

  set httpd port 2812 
      ssl enable
      address localhost 
      pemfile /var/certs/monit.pem

If you do not use the ADDRESS statement the http server will accept connections on any/all local addresses.

If you remove the httpd statement from the config file, monit will stop the httpd server on its next cycle. Likewise if you change the port number, monit will restart the http server using the new specified port number.

The status page displayed by the monit web server is automatically refreshed with the same poll time set for the monit daemon.

Note:

You must start a monit daemon with http support if you want to be able to use the following console commands.

 'monit stop'
 'monit start program' 
 'monit stop program' 
 'monit restart program' 
 'monit -g groupname start' 
 'monit -g groupname stop' 
 'monit -g groupname restart'

If a monit daemon is running in the background, we will ask the daemon (via the HTTP protocol) to execute the above commands. That is, the daemon is requested to start and stop the programs. This ensures that a daemon will not restart a program that you requested to stop and that (any) timeout lock will be removed from a program when you start it.

Monit HTTPD Authentication

monit supports two types of authentication schemes for connecting to the httpd server. The schemes can be used together or separately. You must choose at least one.

Host allow list

The http server maintains an access-control list of hosts allowed to connect to the server. You can add as many hosts as you want to, but only hosts with a valid domain name or IP address are allowed. If you specify a host that does not resolve, monit will write an error message to the console and not start.

The http server will query a nameserver to check any hosts connecting to the server. If a host (client) is trying to connect to the server, but cannot be found in the access list or cannot be resolved, the server will shut down the connection to the client promptly.

Control file example:

  set httpd port 2812
      allow localhost
      allow my.other.work.machine.com
      allow 10.1.1.1

Basic Authentication

This authentication scheme is HTTP specific and described in more detail in RFC 2617.

In short: a server challenges a client (e.g. a Browser) to send authentication information (username and password) and, if accepted, the server will allow the client access to the requested document.

The biggest weakness of Basic Authentication is that the username and password are sent in clear-text (i.e. base64 encoded) over the network. It is therefore recommended that you do not use this authentication method unless you run the monit http server with ssl support. With ssl support it is completely safe to use Basic Authentication since all http data, including Basic Authentication headers, will be encrypted.
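
To see why base64 encoding offers no protection, consider this Python sketch, which builds a Basic Authentication header value as described in RFC 2617 and then recovers the credentials from it. The function names are hypothetical; only the standard base64 module is used.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an 'Authorization' header for Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def decode_basic_auth(header):
    """Recover (username, password) from a Basic auth header value."""
    scheme, token = header.split(" ", 1)
    if scheme != "Basic":
        raise ValueError("not a Basic authentication header")
    username, password = base64.b64decode(token).decode("utf-8").split(":", 1)
    return username, password
```

Anyone who can observe the header on the network can call the equivalent of decode_basic_auth, which is why an ssl protected connection is recommended.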

monit will use Basic Authentication if an allow statement contains a username and password separated with a single ':' character, like so; allow username:password. The username and password must be written in clear-text. Only one username and password pair is supported.

If you use this method together with a host list, then only clients from the listed hosts will be allowed to connect to the monit http server and each client will be asked to provide a username and password.

Example:

  set httpd port 2812
      allow localhost
      allow my.other.work.machine.com
      allow 10.1.1.1
      allow hauk:monit

If you only want to use Basic Authentication, then just provide the one line with username and password, like:

  set httpd port 2812
      allow hauk:monit

If you use Basic Authentication it is a good idea to make the control file (~/.monitrc) readable and writable only by the user running monit, because the password is written in clear-text. (Use this command: /bin/chmod 600 ~/.monitrc). In fact, since monit version 3.0, monit will complain and exit if the control file is readable by others.


MD5 CHECKSUM

If specified in the control file, monit will compute a md5 checksum for programs. The checksum is used to verify that a program does not change. If a program was changed, monit will send an (optional) alert notification, log an alert message and not check the process anymore. The web interface will also show a checksum warning.

The rationale for this feature is security: monit should not start a possibly cracked program or script.

The full syntax for the checksum-statement is as follows: (keywords are in capital)

 CHECKSUM [file [EXPECT checksum] ]+

A legal (but verbose) way to write a checksum statement for a process entry in the control file is:

 checksum the /usr/bin/httpd program

The shorthand version is just:

 checksum /usr/bin/httpd

Several files can be used in a checksum statement:

 checksum /usr/apache/bin/httpd /usr/apache-ssl/bin/httpsd

or on a line by itself:

 checksum /usr/apache/bin/httpd
 checksum /usr/apache-ssl/bin/httpsd

You can add as many 'checksum file' statements as you want. As described above, if the checksum for a file changes, monit will log a warning, issue an alert message and not check the associated process anymore.

The expect statement is optional and used to specify a md5 string monit should expect when testing a file's checksum. If this statement is used monit will not compute an initial checksum for the file, as in the examples above, but instead use the string you submit. For example:

 checksum /usr/bin/httpd expect 8f7f419955cefa0b33a2ba316cba3659

or verbose style;

 checksum /usr/bin/httpd and 
   expect the sum 4e5309d1956f003bcdff168748bea647

You can, for example, use the GNU utility md5sum to create a checksum string for a file and then use this string in the expect-statement.
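
The check itself amounts to computing an md5 digest and comparing it with the expected string. The Python sketch below illustrates this using the standard hashlib module; the function names md5_of and checksum_ok are hypothetical and the code is not monit's own.

```python
import hashlib


def md5_of(path):
    """Return the md5 checksum of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so that large binaries need little memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_ok(path, expected):
    """True if the file still matches the expected md5 string."""
    return md5_of(path) == expected.strip().lower()
```

The string returned by md5_of has the same form as the one printed by md5sum and could be pasted into an expect-statement.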


DEPENDENCY CHECKING

If specified in the control file, monit can do dependency checking before starting or stopping processes. The syntax for the depend statement is simply:

 DEPENDS [process]+

Where process is a process entry name, for instance apache. You may add more than one process name or use more than one depend statement in a check entry.

Processes specified in a depends statement will be checked during stop/start operations. If a process is stopped, monit will first stop any processes that depend on it. Likewise, if a process is started, monit will first stop any processes that depend on it and, after it is started, start all depending processes again.

Consider the following common server setup:

   (a) WEB-SERVER -> (b) APPLICATION-SERVER -> (c) DATABASE

You can set dependencies so that the web-server depends on the application server, which must be running before the web-server starts, and the application server depends on the database server. See also the example section below for examples using the depend statement.

Here we describe how monit will function with the above dependencies:

If no servers are running
monit will start the servers in the following order: c, b, a.

If all servers are running
When you run 'monit stop' this is the stop order: a, b, c. If you run 'monit stop c' then a and b are stopped because they depend on c and finally c is stopped.

If a does not run
When monit runs it will start a.

If b does not run
When monit runs it will first stop a then start b and finally start a again.

If c does not run
When monit runs it will first stop a and b then start c and finally start b then a.

If the control file contains a depend loop
A depend loop is, for example, a->b and b->a, or a->b->c->a.

When monit starts, it checks for any such loops and will complain and exit if one is found. It will also exit with a complaint if a depend statement names a process that does not exist in the control file.
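
The ordering rules and the loop check described in this section can be sketched in Python as a depth-first walk over the depend statements. This is an illustration under stated assumptions, not monit's implementation: the function names start_order and stop_order, and the dictionary that maps each process name to the names it depends on, are invented for the example.

```python
def start_order(depends):
    """Return process names ordered so that dependencies start first.

    `depends` maps each process name to the names it depends on.
    Raises ValueError on an unknown name or on a depend loop.
    """
    order = []
    state = {}  # name -> "visiting" or "done"

    def visit(name, trail):
        if name not in depends:
            raise ValueError("unknown process: %s" % name)
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("depend loop: %s" % "->".join(trail + [name]))
        state[name] = "visiting"
        for dep in depends[name]:
            visit(dep, trail + [name])
        state[name] = "done"
        order.append(name)

    for name in depends:
        visit(name, [])
    return order


def stop_order(depends, target=None):
    """Return the stop order for everything, or for `target` together
    with every process that depends on it."""
    full = list(reversed(start_order(depends)))
    if target is None:
        return full
    if target not in depends:
        raise ValueError("unknown process: %s" % target)
    # Collect the target and its dependants, directly or indirectly.
    affected = {target}
    changed = True
    while changed:
        changed = False
        for name, deps in depends.items():
            if name not in affected and affected.intersection(deps):
                affected.add(name)
                changed = True
    return [name for name in full if name in affected]


if __name__ == "__main__":
    tiers = {"a": ["b"], "b": ["c"], "c": []}
    print(start_order(tiers))      # ['c', 'b', 'a']
    print(stop_order(tiers))       # ['a', 'b', 'c']
    print(stop_order(tiers, "c"))  # ['a', 'b', 'c']
```

With the three-tier setup above, the sketch reproduces the start order c, b, a and the stop order a, b, c. A table such as {"a": ["b"], "b": ["a"]} raises an error that names the loop.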


THE RUN CONTROL FILE

The preferred way to set up monit is to write a .monitrc file in your home directory. When there is a conflict between the command-line arguments and the arguments in this file, the command-line arguments take precedence. To protect the security of your control file and passwords, the control file must have permissions no more than 0700 (u=rwx,g=,o=); monit will complain and exit otherwise.
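
The permission rule can be checked before starting monit. The following Python sketch tests that no group or other permission bits are set, which is one reading of "no more than 0700". The function name control_file_is_private is hypothetical, and the exact test monit performs may differ.

```python
import os
import stat


def control_file_is_private(path):
    """True if neither group nor others have any access to `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 selects the group and other permission bits; all of them
    # must be clear for the mode to be no more than 0700.
    return (mode & 0o077) == 0
```

A file with mode 0600 or 0700 passes this check, while a file with mode 0644 does not.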

Run Control Syntax

Comments begin with a '#' and extend through the end of the line. Otherwise the file consists of a series of program entries or global option statements in a free-format, token-oriented syntax.

There are three kinds of tokens: grammar keywords, numbers (i.e. decimal digit sequences) and strings. Strings can be either quoted or unquoted. A quoted string is bounded by double quotes and may contain whitespace (and quoted digits are treated as a string). An unquoted string is any whitespace-delimited token, containing characters and/or numbers.

Each program entry consists of the keyword `check', followed by a unique descriptive name for the program, which is in turn followed by the path to the program's pidfile. A check entry can have a number of optional statements. These statements are described below and in the example section.

You can use noise keywords like 'if', `and', `with(in)', `has', `using', 'use', 'on(ly)' and `program' anywhere in an entry to make it resemble English. They're ignored, but can make entries much easier to read at a glance. The punctuation characters ';' ',' and '=' are also ignored. Keywords are case insensitive.
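
To make the lexical rules concrete, here is a small Python sketch of a tokenizer that follows them: it drops comments, the ignored punctuation and the noise keywords listed in this manual, and separates quoted strings, numbers and other words. It is an approximation written for this page, not monit's parser. It does not handle the parentheses and braces used by some statements, and it assumes that a '#' inside a quoted string does not start a comment.

```python
import re

# Noise keywords as listed in this manual, with the optional
# endings such as on(ly) and with(in) spelled out.
NOISE = {
    "if", "is", "are", "on", "only", "with", "within", "and", "has",
    "using", "use", "the", "sum", "restarts", "program", "programs",
    "cycle", "cycles", "than", "then", "for",
}

# A quoted string, a comment, an unquoted word, or ignored punctuation.
TOKEN = re.compile(r'"([^"]*)"|#[^\n]*|([^\s;,="#]+)|[;,=]')


def tokenize(text):
    """Return a list of (kind, value) pairs for a control file text."""
    tokens = []
    for match in TOKEN.finditer(text):
        quoted, word = match.group(1), match.group(2)
        if quoted is not None:
            # Quoted digits stay strings, as the manual describes.
            tokens.append(("quoted", quoted))
        elif word is None:
            continue  # a comment or ignored punctuation
        elif word.lower() in NOISE:
            continue  # keywords are case insensitive
        elif re.fullmatch(r"[0-9]+", word):
            tokens.append(("number", int(word)))
        else:
            tokens.append(("word", word))
    return tokens
```

Under these assumptions, the two lines `check resin with pidfile /usr/local/resin/srun.pid` and `check resin pidfile /usr/local/resin/srun.pid` yield the same token list.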

 Here are the legal global keywords:
 Keyword         Function
 -----------------------------------------------------------
 set daemon      Set a background poll interval in seconds
 set init        Set monit to run from init
 set logfile     Name of a file to dump error- and status-
                 messages to. If syslog is specified as the 
                 file, monit will utilize the syslog daemon
                 to log messages.
 set mailserver  The mailserver used for sending alert
                 notifications. If the mailserver is not 
                 defined, monit will try to use 'localhost' 
                 as the smtp-server for sending mail.
 set mail-format Set a global mail format for all alert
                 messages emitted by monit.
 set httpd port  Activates monit http server at the given 
                 portnumber.
 ssl enable      Enables ssl support for the httpd server.
                 It requires the use of the pemfile statement.
 ssl disable     Disables ssl support for the httpd server.
                 It is equal to omitting any ssl statement.
 pemfile         Set the pemfile to be used with ssl.
 clientpemfile   Set the pemfile to be used when client's
                 certificates should be checked too.
 address         If specified, the http server will only
                 accept connect requests to this address.
                 This statement is an optional part of the
                 set httpd statement.
 allow           Specifies a host or IP address allowed to
                 connect to the http server. Can also specify
                 a username and password allowed to connect
                 to the server. More than one allow statement
                 is allowed. This statement is also an
                 optional part of the set httpd statement.

 Here are the legal program entry keywords:
 Keyword         Function
 ------------------------------------------------------------
 check           Starts an entry and must be followed by a 
                 descriptive name for the program.
 pidfile         Specify the program's pidfile. Every
                 program must create a pidfile with its
                 current process id.
 group           Specify a groupname for a program entry.
 start           The program for starting the specified 
                 process. Full path is required. This 
                 statement is optional.
 stop            The program for stopping the specified 
                 process -- full path is required. This 
                 statement is optional.
 host            The hostname or IP address to test the port
                 at. This keyword can only be used together
                 with a port statement.
 port            Specify a TCP/IP service port number which 
                 the program is listening on. This statement
                 is also optional. If this statement is not
                 prefixed with a host-statement, localhost is
                 used as the hostname to test the port at.
 type            Specifies the type of socket monit should 
                 use when testing a connection to the port.
                 If the type keyword is omitted, tcp is 
                 used. This keyword must be followed by 
                 either tcp or udp.
 tcp             Specifies that monit should use a TCP 
                 socket type (stream) when testing the port.
 tcpssl          Specifies that monit should use a TCP
                 socket type (stream) which embeds an ssl
                 connection when testing the port.
 udp             Specifies that monit should use a UDP socket
                 type (datagram) when testing the port.
 certmd5         The md5 sum of a certificate an ssl-enabled
                 server has to deliver.
 proto(col)      This keyword specifies the type of service
                 found at the port. monit knows at the moment
                 how to speak HTTP, SMTP, FTP, POP and IMAP.
                 You're welcome to write new protocol test
                 modules. If no protocol is specified monit
                 will use a default test which in most cases
                 is good enough.
 request         Specifies a server request and must come
                 after the protocol keyword mentioned above.
                  - for http it can contain a URI and an
                    optional query string.
                  - other protocols don't support this
                    statement yet.
 unix(socket)    Specifies a unix socket file and is used like
                 the port statement above to test a Unix
                 domain network socket connection.
 timeout         Define program timeout. Must be followed by
                 two numbers. The first number is the max number
                 of restarts for the program. The second number
                 is the cycle interval to test restarts.
                 This statement is optional.
 alert           Specifies an email address for notification
                 if a checksum, timeout, restart, stop or
                 timestamp event occurs. Alert can also be
                 postfixed, to only send a message for certain
                 events. See the examples above. More than one
                 alert statement is allowed in an entry. This
                 statement is also optional.
 mail-format     Specifies a mail format for an alert message 
                 This statement is an optional part of the
                 alert statement.
 checksum        Specify that monit should verify a checksum
                 for associated files.
                 More than one checksum statement is allowed.
 expect          Specifies a checksum string (md5) monit 
                 should use when testing the checksum. This
                 statement is an optional part of the 
                 checksum statement.
 timestamp       Specifies expected timestamp for given object
                 and optional action. More than one timestamp
                 statement is allowed.
 every           Validate this entry only at every n poll 
                 cycle. Useful in daemon mode when the
                 poll-cycle is short and the program takes
                 some time to start. 
 mode            Must be followed either by the keyword active,
                 passive or manual. If active, monit will restart
                 the program if it is not running (this is the
                 default behaviour). If passive, monit will not
                 (re)start the program if it is not running - it
                 will only monitor and send alerts (resource
                 related restart and stop options are ignored
                 in this mode also). If manual, monit will enter
                 active mode only if a service was started under
                 monit's control otherwise the service isn't
                 monitored.
 cpuusage        Must be followed by a compare operator, a
                 floating point number, optionally a maximum
                 number of cycles and an action. This statement
                 is used to check the cpu usage in percent of a
                 process with its children over a number of
                 cycles. If the compare expression matches then
                 the action restart, alert or stop is activated.
 memusage        The equivalent to cpuusage for memory of a 
                 process (w/o children!). The syntax is the same
                 as above.
 memkbyte        The equivalent to memusage but with amounts 
                 in Kb instead of percentages.
 loadavg         Must be followed by [1min,5min,15min] in (), a
                 compare operator, a floating point number,
                 optionally a maximum number of cycles and an
                 action. This statement is used to check the
                 system load average over a number of cycles. If
                 the compare expression matches then the action
                 start, alert or stop is activated.
 depends (on)    Must be followed by the name of a process this
                 process depends on to run before it starts.

Here's the complete list of reserved keywords used by monit:

set, daemon, logfile, syslog, address, httpd, ssl, enable, disable, pemfile, allow, check, init, pidfile, group, start, stop, port(number), unix(socket), type, proto(col), tcp, tcpssl, udp, alert, mail-format, restart, timeout, checksum, resource, expect, mailserver, every, mode, active, passive, manual, depends, host, default, http, ftp, smtp, pop, nntp, imap, ssh, request, cpuusage, memusage, memkbyte, loadavg, timestamp, second(s), minute(s), hour(s) and day(s).

And here is a complete list of noise keywords ignored by monit:

if, is, are, on(ly), with(in), and, has, using, use, the, sum, restarts, program(s), cycle(s), than, then, for.

Note: If the start or stop programs are shell scripts, then the script must begin with #! and the remainder of the first line must specify an interpreter for the program. E.g. #!/bin/sh

It's possible to write scripts directly into the start and stop entries by using a string of shell commands. For example:

 start: "/bin/sh -c { echo $$ > pidfile; exec program }"
 stop:  "/bin/sh -c { kill -s SIGTERM `cat pidfile` }"


CONFIGURATION EXAMPLES

The simplest form is just the check statement. In this example we check to see if the server is running and log a message if not:

 check resin with pidfile /usr/local/resin/srun.pid

To have monit start the server if it's not running, add a start statement:

 check resin with pidfile /usr/local/resin/srun.pid
   start program = "/usr/local/resin/bin/srun.sh start"

Here's a more advanced example for monitoring an apache web-server listening on the default port numbers for HTTP and HTTPS. In this example monit will restart apache if it's not accepting connections at these port numbers. The method monit uses for a process restart is to first execute the stop program, wait for 10 seconds (to give the program time to terminate) and then execute the start program.

 check apache with pidfile /var/run/httpd.pid
   start program = "/etc/init.d/httpd start"
   stop program  = "/etc/init.d/httpd stop"
   port 80   
   port 443

In this example we use udp for connection testing to check if the 'named' service is running and also use timeout and alert:

 check named with pidfile /var/run/named.pid
   start program = "/etc/init.d/named start"
   stop program  = "/etc/init.d/named stop"
   port 53 use type udp
   timeout (3,5) 
   alert bofh@norid.no

The following example illustrates how to check if the service 'sophie' is answering connections on its unix domain socket:

 check sophie with pidfile /var/run/sophie.pid
   start program = "/etc/init.d/sophie start"
   stop  program = "/etc/init.d/sophie stop"
   unix /var/run/sophie

In this example we check an apache web-server running on localhost that answers for several IP-based virtual hosts or vhosts, hence the host statement before port:

 check apache with pidfile /var/run/httpd.pid
   start program = "/etc/init.d/httpd start"
   stop program  = "/etc/init.d/httpd stop"
   host www.sol.no          port 80
   host shop.sol.no         port 443
   host chat.sol.no         port 80
   host www.tildeslash.com  port 80

In the following example we ask monit to compute and verify the checksum for the underlying apache binary used by the start and stop programs.

 check apache with pidfile /var/run/httpd.pid
   start program = "/etc/init.d/httpd start"
   stop program  = "/etc/init.d/httpd stop"
   host www.tildeslash.com  port 80
   checksum /usr/local/apache/bin/httpd

Some servers are slow starters, for example Java-based application servers. If we want to keep the poll cycle short (i.e. < 60 seconds) but allow some programs to take their time to start, the every statement is handy:

 check dynamo with pidfile /etc/dynamo.pid
   start program = "/etc/init.d/dynamo start"
   stop program  = "/etc/init.d/dynamo stop"
   port 8840
   every 2 cycle

Here is an example where we group together two database entries. The mode statement is also illustrated in the first entry and has the effect that monit will not try to (re)start this program if it is not running:

 check sybase with pidfile /var/run/sybase.pid
   start program = "/etc/init.d/sybase start"
   stop program  = "/etc/init.d/sybase stop"
   mode passive
   group database
 check oracle with pidfile /var/run/oracle.pid
   start program = "/etc/init.d/oracle start"
   stop program  = "/etc/init.d/oracle stop"
   mode active # Not necessary really, since this is the default
   port 9001
   alert bofh@foo.bar
   group database

Here is an example showing the use of the resource checks. monit will send an alert when the CPU usage of the http daemon and its child processes rises above 60% for two cycles. The daemon is restarted when its CPU usage is over 80% for five cycles, and stopped when its memory usage is over 100Mb for five cycles or the 5-minute load average is above 10 for 8 cycles:

 check apache with pidfile /var/run/httpd.pid
   start program = "/etc/init.d/httpd start"
   stop program  = "/etc/init.d/httpd stop"
   if cpuusage > 60.0 for 2 cycles then alert
   if cpuusage > 80.0 for 5 cycles then restart
   if memkbyte > 100000 for 5 cycles then stop
   if loadavg(5min) greater than 10.0 for 8 cycles then stop
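
One way to picture the "for N cycles" part of a resource check is as a counter that is updated on every poll. The Python sketch below rests on an assumption this manual does not spell out, namely that the matches must come from consecutive polls and that one non-matching poll resets the count. The class name CycleRule is hypothetical and only the '>' comparison is modelled.

```python
class CycleRule:
    """Fire once a value has exceeded `limit` for `cycles` polls in a row."""

    def __init__(self, limit, cycles, action):
        self.limit = limit
        self.cycles = cycles
        self.action = action
        self.matches = 0  # consecutive polls above the limit

    def poll(self, value):
        """Record one measurement; return the action if the rule fires."""
        if value > self.limit:
            self.matches += 1
        else:
            # Assumed behaviour: one quiet poll resets the count.
            self.matches = 0
        if self.matches >= self.cycles:
            self.matches = 0
            return self.action
        return None
```

In this model, CycleRule(60.0, 2, "alert") returns "alert" on the second of two successive polls above 60.0 and returns None otherwise.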

In this example we demonstrate usage of the extended alert statement:

 check apache with pidfile /var/run/httpd.pid
   start program = "/etc/init.d/httpd start"
   stop program  = "/etc/init.d/httpd stop"
   host www.tildeslash.com  port 80
   checksum /usr/local/apache/bin/httpd
   alert security@bar on {checksum}
   alert admin@bar on {restart, timeout} 
     with mail-format { 
              from:     bofh@$HOST
              subject:  apache $EVENT
              message:  This event occurred on $HOST at $DATE.
              Your faithful employee,
              monit
     }
   timeout (3, 5) 
   group server

In this example, we demonstrate usage of the depend statement. In this case, we want to start oracle and apache. However, we've set up apache to use oracle as a backend, and if oracle is restarted, apache must be restarted as well.

 check apache with pidfile /var/run/httpd.pid
   start program = "/etc/init.d/httpd start"
   stop  program = "/etc/init.d/httpd stop"
   depends on oracle
 check oracle with pidfile /var/run/oracle.pid
   start program = "/etc/init.d/oracle start"
   stop program  = "/etc/init.d/oracle stop"
   port 9001

In the next example, we have a standard three-tier setup with apache, an application server and a back-end database server. Dependencies are set up so that, should oracle be restarted, both apache and the weblogic application server are also restarted.

 check apache with pidfile /var/run/httpd.pid
   start program = "/etc/init.d/httpd start"
   stop  program = "/etc/init.d/httpd stop"
   depends on weblogic
 check weblogic with pidfile /var/run/weblogic.pid
   start program = "/etc/init.d/weblogic start"
   stop  program = "/etc/init.d/weblogic stop"
   depends on oracle
 check oracle with pidfile /var/run/oracle.pid
   start program = "/etc/init.d/oracle start"
   stop program  = "/etc/init.d/oracle stop"
   port 9001

Next, we have two programs, oracle-import and oracle-export, that need to be restarted if oracle is restarted, but are independent of each other.

 check oracle with pidfile /var/run/oracle.pid
   start program = "/etc/init.d/oracle start"
   stop program  = "/etc/init.d/oracle stop"
   port 9001
 check oracle-import with pidfile /var/run/oracle-import.pid
   start program = "/etc/init.d/oracle-import start"
   stop  program = "/etc/init.d/oracle-import stop"
   depends on oracle
 check oracle-export with pidfile /var/run/oracle-export.pid
   start program = "/etc/init.d/oracle-export start"
   stop  program = "/etc/init.d/oracle-export stop"
   depends on oracle

Finally, an example with all statements:

 check apache with pidfile /var/run/httpd.pid
   group server
   start program = "/etc/init.d/httpd start"
   stop program  = "/etc/init.d/httpd stop"
   checksum /usr/local/apache/bin/httpd
    and expect the sum 8f7f419955cefa0b33a2ba316cba3659
   host www.sol.no     port 80  type tcp protocol http
   host kvasir.sol.no  port 80  type tcp protocol http 
      and use the request "/login.cgi"
   host shop.sol.no port 443 type tcpssl proto http
   timeout (2,3) 
   if cpuusage is greater than 60.0 for 2 cycles then alert
   if cpuusage > 80.0 for 5 cycles then restart
   if memkbyte > 100000 then stop
   if timestamp "/usr/local/apache/logs/httpd.pid" > 7 days 
       then restart
   alert foo@bar on { checksum } 
   alert bofh@bar with mail-format {from: monit@foo.bar.no}
   every 2 cycles
   mode active
   depends on weblogic

Note: only the check and pidfile statements are mandatory; the other statements are optional, and the order of the optional statements is not important.


FILES

~/.monitrc Default run control file

./monitrc If the control file is not found in the default location, and the current working directory contains a monitrc file, this file is used instead.

/etc/monitrc If the control file is not found in either of the previous two locations, and /etc contains a monitrc file, this file will be used instead.

~/.monitrc.pid Lock file to help prevent concurrent runs (non-root mode).

/var/run/monit.pid Lock file to help prevent concurrent runs (root mode, Linux systems).

/etc/monit.pid Lock file to help prevent concurrent runs (root mode, systems without /var/run).
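
The lookup order for the control file listed above can be expressed as a short Python sketch. The function name find_control_file and its parameters are invented for this illustration. The directories are passed in as arguments so that the logic can be tried out without touching the real locations.

```python
import os


def find_control_file(home, cwd, etc="/etc"):
    """Return the first control file found in lookup order, or None."""
    candidates = [
        os.path.join(home, ".monitrc"),  # ~/.monitrc
        os.path.join(cwd, "monitrc"),    # ./monitrc
        os.path.join(etc, "monitrc"),    # /etc/monitrc
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

A call such as find_control_file(os.path.expanduser("~"), os.getcwd()) mirrors the default search, while the -c option bypasses it by naming the file directly.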


SIGNALS

If a monit daemon is running, SIGUSR1 wakes it up from its sleep phase and forces a poll of all processes. SIGTERM will gracefully terminate a monit daemon. This signal is sent to a monit daemon if monit is started with the quit action argument. Sending a SIGHUP signal to a running monit daemon will force the daemon to reload itself; specifically, it will close and reopen its log files.

Running monit in foreground while a background monit daemon is running will wake up the daemon.
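
For completeness, here is a hedged Python sketch of sending these signals from a script. It assumes that the lock file named in the FILES section holds the daemon's process id as plain text. The labels "wakeup", "quit" and "reload" and the function names are made up for this example.

```python
import os
import signal

# The labels on the left are hypothetical names used only in this sketch.
SIGNALS = {
    "wakeup": signal.SIGUSR1,  # force a poll of all processes
    "quit": signal.SIGTERM,    # terminate the daemon gracefully
    "reload": signal.SIGHUP,   # close and reopen log files
}


def read_pid(pidfile):
    """Return the process id stored in `pidfile` (assumed plain text)."""
    with open(pidfile) as handle:
        return int(handle.read().strip())


def signal_daemon(pidfile, what):
    """Send the signal labelled `what` to the process named in `pidfile`."""
    os.kill(read_pid(pidfile), SIGNALS[what])
```

For a daemon running as root on a Linux system, signal_daemon("/var/run/monit.pid", "wakeup") would then have the same effect as sending SIGUSR1 by hand.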


NOTES

This is a very silent program. Use the -v switch if you want to see what monit is doing, and tail -f the logfile.

The syntax (and parser) of the control file is inspired by Eric S. Raymond et al.'s excellent fetchmail program. Some portions of this man page are also inspired by the same authors' work.


AUTHORS

Jan-Henrik Haukeland <hauk@tildeslash.com>, Martin Pala <martin.pala@hq.iol.cz>, Christian Hopp <chopp@iei.tu-clausthal.de>, Rory Toma <rory@digeo.com>, Thomas Oppel <oppel@kbis.de>

See also http://www.tildeslash.com/monit/who.html


COPYRIGHT

Copyright (C) 2000-2002 by Contributors to the monit codebase. All Rights Reserved. This product is distributed in the hope that it will be useful, but WITHOUT any warranty; without even the implied warranty of MERCHANTABILITY or FITNESS for a particular purpose.


SEE ALSO

GNU text utilities; md5sum(1); openssl(1)