The Remote Uptime (RU) Project
History
I started using Unix systems in approximately 1984 at Colorado State University. At that time, Unix was BSD or SysV. TCP/IP was new, as was networking computers in general. Needless to say, I found Unix to be really cool. Just to date myself: in 1984, the Big Computer on Campus was a CDC Cyber, and the primary computer that ran the Computer Science Department was a Digital VAX 11/750 (not even a 780!). The computing facilities were HP 9000/300 series workstations/servers with a blazing 16 MHz 68020 CPU, 8 megabytes of memory, and HP-IB storage. These were not powerful machines.
When assignments were due, the systems got horribly busy. The computing system assigned to a particular class (if only one) got so slow that there could literally be seconds between typing a keystroke and seeing it on the screen. Because of this, the idea of using an idle system from another class became attractive. I would find a non-busy system, do my editing there, rcp/rlogin to the primary host to do my compiles, and go back to editing.
As part of this, I found the 'ruptime'
command. It was very handy for figuring out what systems were on
the network and how busy those systems were.
After I left college, I eventually became a Systems
Administrator (SysAdmin). I liked the 'ruptime' functionality for
watching the systems in my environment. However, as soon as I
turned on the ruptime function, I found that the program had a few
major flaws:
- The 'rwhod' daemon that collected the information used
broadcasts, which meant that it didn't work across subnets.
- Each client had to run rwhod to show up in the 'ruptime' list. This meant that every so often (5 seconds?) each client sent a broadcast declaring its current status, which produced a HUGE number of broadcasts on any network larger than 10-15 clients.
- rwhod stored the information for each client locally, so each client kept its own copy of the rwho status. This was a large amount of wasted storage, since I was the only one looking at the data, and I was looking from a single workstation.
- Maintaining the rwho daemon and its storage on every client was a time-waster.
Due to the problems above, I ended up turning off
rwhod on all of the clients. However, I still wanted
something similar so that I could quickly see system status.
RU is Born
Around this time, I asked a co-worker of mine, Dave Farnham, whether he had a way to implement something similar to ruptime, but without the nasty edges. Dave spent a weekend (he's a wiz) and came up with an RPC-based system that operated in a client-server mode. Because it used RPC, the solution was very portable and was easily moved to pretty much any Unix. The initial version reported only system uptime, using an RPC call from the client to a centralized server that ran a collection daemon written in Perl. This initial version appeared in the 1994-1995 timeframe.
Over the next year or two, Dave added history reporting, system load graphing, and availability reporting. Around this time, Dave left for a position at another company.
I then took over maintenance of the ru source. Over time, I've added a number of features, to the point that I consider it a mature, easy-to-use system status reporting program.
I find it very handy to simply type 'ru'
periodically to see what is going on in the environment. ru is
very useful for detecting 'just not right' server problems, such as
runaway jobs, high loads, etc.
What is RU?
In the screenshot above, the program 'ru' has been executed against a larger environment of servers. You can immediately see each system's name, uptime, load average, and number of logged-in users. In addition, the coloring of the text conveys a number of additional items. The server 'handies1' was booted only 14 minutes ago; it is shown in yellow to draw attention to the fact that it has booted within the last 24 hours.
Server 'jaguar' has a * in front of it, which indicates that the system hasn't reported in two minutes. In this case it actually means that the clock on jaguar is out of sync with the master server: jaguar's clock differs from the server's by more than two minutes. If you deploy ru, you need to synchronize your system clocks with NTP (or something similar). It's a good idea to synchronize your clocks for security purposes anyway.
The list below covers many (but not all) of the features of the ru system:
- Clients run a daemon called 'rucd' (ru client daemon). This client sends a single RPC packet to the central server 'ruserver' once per minute reporting (see the sketch after this list for where these numbers come from on Linux):
  - System uptime
  - Number of users on the system
  - Load average of the system
  - On Linux only, system paging and CPU % used
- The central server 'ruserver' runs 3 daemons:
  - rusd - collects the client reports
  - ruds - processes client requests for information
  - rucd - the ru client daemon (so the server reports its own status)
- The central server runs a nightly cron job to clean up the daily information and to build the availability reports.
- On the central system, a program called 'avail' can be run to document system availability.
- A Perl program, 'ru', can be run from any host with network reachability to the server to report on the data collected by ru.
- The system runs itself. It needs no interaction, and cleans up after itself.
- The system will use gnuplot (if installed) to plot system(s) load average over a 24-hour period, which is handy for comparing loading information over time. This works by default when run on a Linux system with gnuplot (a sketch follows this list).
- The system has groups to allow segmentation of the clients by (the operating system and architecture segmentations are implicit):
  - Operating system
  - Architecture
  - Arbitrary group (firewall, production, lab, etc.)
- The system runs on every Unix I've found with minimal changes. The following are known to work:
  - HP-UX 10 and 11
  - Linux
  - SunOS and Solaris
  - Irix
  - OS X
  - Linux on S390
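As a rough illustration of the first item above, the snippet below sketches where a rucd-style client could pull its numbers from on Linux. This is not the actual rucd source; the file locations and the use of who(1) are assumptions for illustration, and the real client packages these values into an RPC call to 'ruserver' rather than printing them.

    #!/usr/bin/perl
    # Hypothetical sketch only: gather the values a rucd-style client reports.
    # The real rucd sends these to 'ruserver' via RPC once per minute.
    use strict;
    use warnings;

    # Uptime in seconds, from /proc/uptime (Linux-specific).
    open(my $up, '<', '/proc/uptime') or die "cannot read /proc/uptime: $!\n";
    my ($uptime) = split ' ', <$up>;
    close($up);

    # 1-minute load average, from /proc/loadavg (Linux-specific).
    open(my $la, '<', '/proc/loadavg') or die "cannot read /proc/loadavg: $!\n";
    my ($load1) = split ' ', <$la>;
    close($la);

    # Number of logged-in users: who(1) prints one line per login session.
    my @sessions = `who`;
    my $users    = scalar @sessions;

    printf "uptime=%ds load=%s users=%d\n", $uptime, $load1, $users;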
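Similarly, the load-plotting feature mentioned above can be pictured with the sketch below, which pipes a small script to gnuplot for a day's worth of samples. The data file name and its two-column "HH:MM load" layout are assumptions for illustration, not ru's actual storage format.

    #!/usr/bin/perl
    # Hypothetical sketch: plot one day of load-average samples with gnuplot.
    # Assumes a text file of "HH:MM load" pairs; ru's real data layout may differ.
    use strict;
    use warnings;

    my $datafile = shift @ARGV or die "usage: $0 loadavg.dat\n";

    open(my $gp, '|-', 'gnuplot -persist') or die "cannot start gnuplot: $!\n";
    print $gp <<"EOF";
    set xdata time
    set timefmt "%H:%M"
    set format x "%H:%M"
    set title "Load average over 24 hours"
    set ylabel "load average"
    plot "$datafile" using 1:2 with lines title "load"
    EOF
    close($gp);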
Installation
Detailed instructions are found in the download tarball, but the list below gives a simple overview of the installation:
- Install the server daemons and create the data directories.
- Add startup scripts to the server.
- Start the services.
- Add a DNS alias of 'ruserver' for your server.
- Add a single line to the inittab of each client to start rucd on boot (see the example entry after this list).
- Start the client via 'init q'.
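As an example, the inittab entry could look something like the line below. The id field, run levels, and install path shown here are assumptions; use whatever matches your environment and the path given in the tarball's instructions.

    ru:2345:respawn:/usr/local/ru/bin/rucd

After adding the line, 'init q' tells init to re-read /etc/inittab and start rucd without a reboot.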
Note that the current solution is geared toward a particular production environment. It should be easy to redefine the flags any way you like with a text editor.
If you have questions, see the forums, or contact me at shaw at fmsoft dot com.
Futures
I'd like to change a few things about how ru works to make it a truly no-brainer to install and manage:
- Change the server configuration to look up the client group
automatically so that changing the client group can be centralized.
- Switch from RPC to a dedicated socket via Perl (major rewrite).
Thanks to SourceForge.net for hosting this
project.