Dual-Hosted WHT ICS

Gary Mitchell

Revision date 2005-11-02 09:55


Purpose of this Document

These notes describe how to change between the live and backup WHT Instrument Control System (ICS). They are intended for Duty Engineers (DEs) and Systems Administrators. Observers could use them, but in the event of a problem they are expected to contact the DE first. The DE would diagnose the problem and might conclude that a change of ICS would help. The "Technical notes" sections need only be read by UNIX Systems Administrators who may need to resolve problems with the system.

Overview

The WHT ICS is a mission-critical system. To avoid losing observing time through a hardware failure of the computer, the job of the ICS can be done by either of two identical computers.

Components of system

Computers
At the time of writing the two computers used are lpss89 and lpss90. Except for brief periods these two hosts are identical, and neither is the "normal" host. When the hosts are not identical (for example because of operating system software development or a hardware enhancement), the difference and the preferred hostname will be written on the whiteboard in the WHT control room.

Both these computers use the same copy of the observing system applications.

At any one time one (and only one) of these computers can be used as the WHT ICS. Whichever one is the WHT ICS responds on the network not only to its own hostname but also to the hostname "taurus". The hostname "whtics" is a synonym for taurus.
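A script can check whether the machine it runs on currently holds the taurus name. The Python sketch below is an illustration only and assumes (these notes do not confirm it) that taurus resolves to an IPv4 address configured only on the live host. It tries to bind a socket to that address, which the operating system refuses when the address is not configured locally. The function name holds_address is hypothetical.

```python
import socket


def holds_address(name: str) -> bool:
    """Return True if the IPv4 address that `name` resolves to is configured
    on one of this machine's network interfaces.

    Assumption: the alias (e.g. "taurus") resolves to an address that only
    the live ICS has configured.
    """
    try:
        addr = socket.gethostbyname(name)  # resolve via hosts file / name service
    except socket.gaierror:
        return False  # the name cannot be resolved at all
    # Binding to an address that is not local fails (EADDRNOTAVAIL).
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((addr, 0))  # port 0: let the OS pick any free port
        except OSError:
            return False
    return True
```

On either host, holds_address("taurus") would then return True only on the live WHT ICS, under the stated assumption.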

Keyboard and Mouse
Although there are two computers there is only one keyboard and one mouse. These connect to a "Keyboard/Video/Mouse" (KVM) switch located below the monitors. A simple push button forwards the keyboard/mouse connection to either lpss89 or lpss90. Please note that the push button does not change the video inputs.

Technical notes

The KVM switch cannot be used to switch the video connections because the WHT ICS uses 3 monitors, whereas the switch is designed only for simple systems in which each computer has one monitor.

Monitors
There are 3 monitors, referred to as screens A, B and C.

The 3 monitors do not behave identically.

Each monitor has 4 video inputs, numbered 1 to 4:

  1. analog (blue plug)
  2. DVI-D (white plug)
  3. S-Video (not used at ING)
  4. composite video (not used at ING)

Technical notes

Screen A is managed by a single graphics card. Screens B and C are paired together and operated by another graphics card.

Determining which machine is the current live WHT ICS

Use any of the following methods:

How to switch between the alternate hosts

A reboot of the live WHT ICS is all that is needed for the alternate computer to take on the role of the WHT ICS. As soon as the backup ICS detects that the live ICS has stopped working, it takes on the role of the WHT ICS automatically. The keyboard/mouse and the monitors then need to be switched to the new ICS, as described in the rest of this section.

Detecting the absence of what was the live ICS and completing the transition to the new ICS takes about 45 seconds. If the live WHT ICS is not shut down cleanly but fails because of a hardware error (e.g. a failed power supply), a similar process takes place but it is slower, at about 90 seconds.

In the example below we switch from hostA to hostB.

Example: check that the mouse and keyboard are interacting with the screen you are viewing.

Frequently Asked Questions

I want to reboot the ICS but I want to continue to use the computer I was using before. How do I do that?

Reboot the current ICS. The ICS role will shift to the alternate host. Then reboot that host, and the ICS role will shift back to the original one.

I can't drag the application window between screens A and B. Why?

Dragging between screens is only supported when the screens are combined in one of two modes. Hardware mode is efficient but, with the current graphics cards, is limited to 2 screens. Software mode is much less efficient and may have bugs that make some ING GUI displays fail to update or otherwise not behave as expected, so it is not used. Screen A is driven by a different graphics card from screens B and C, so it cannot be joined to them in hardware mode.

If the application is not on the screen I want and I can't drag it there, what can I do?

Exit the application. Move the mouse to the screen you want. From a menu or terminal on that screen, restart the application.

Technical Notes

These technical notes are for use by UNIX Systems Administrators or Software Developers working with services associated with the WHT ICS.

Mechanisms

The specification required:

The solution is to have a dual-hosted SCSI disk to store the volatile information, and to associate the hostname taurus with whichever SPARC host has ownership of the disk.
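The ownership rule (the host that owns the dual-hosted disk is the one that answers as taurus) can be pictured as a small decision taken on each check. The Python sketch below is a hypothetical model for illustration, not the logic of the ING scripts: the Owner states, the peer_alive flag and the action names are all assumptions. It covers the two takeover cases the notes describe, a clean release of the disk by the other host and a failure of the other host.

```python
from enum import Enum


class Owner(Enum):
    """Who currently holds the dual-hosted SCSI disk (hypothetical model)."""
    THIS_HOST = "this host"
    OTHER_HOST = "other host"
    NOBODY = "nobody"


def decide(owner: Owner, peer_alive: bool) -> str:
    """Return the action a host takes on one check of the shared disk.

    "serve"   - keep the disk, the taurus name and the dependent services
    "acquire" - take the disk, then claim taurus and start dependent services
    "standby" - do nothing; the other host is the live ICS
    """
    if owner is Owner.THIS_HOST:
        return "serve"
    if owner is Owner.NOBODY:
        # The other host released the disk cleanly (e.g. during a reboot).
        return "acquire"
    # The other host owns the disk: take over only if it has stopped responding.
    return "standby" if peer_alive else "acquire"
```

Keeping the decision separate from the actions makes it easy to see that a host never takes the disk while a live peer still owns it.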

Software scripts

These are derived from the scripts used in the ING's NFS file servers.
/etc/init.d/INGdiskset_yellow
A simple Bourne shell script which invokes a bash script.
/etc/init.d/INGdiskset_yellow.bash
A bash shell script which attempts to take/release the dual-hosted SCSI disk at startup and shutdown.
/etc/init.d/INGdependent_disk_services
A script to start services which can only be run by a host which has ownership of the dual-hosted SCSI disk. Similarly it stops these same services when the host releases the dual-hosted SCSI disk.
/etc/yellow/init.d/INGpostmaster
The postmaster task can only be allowed to run if the host has the disk. As such this script was removed from /etc/init.d and placed in this sub-directory where it is invoked by /etc/init.d/INGdependent_disk_services as necessary.
/etc/init.d/INGwatch_keyboard
The KVM switch forwards the USB keyboard/mouse to only one host. If a host boots while it is not selected on the KVM switch, it does not start Xsun at boot time. This script watches for the appearance of a keyboard and then starts Xsun.
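The behaviour of /etc/init.d/INGwatch_keyboard amounts to a polling loop. The Python sketch below is an illustration only: the real script is a shell script whose device check and Xsun command line are not given in these notes, so both are passed in as hypothetical callables, and the 5-second poll interval is an assumption.

```python
import time
from typing import Callable


def wait_then_start(keyboard_present: Callable[[], bool],
                    start_x: Callable[[], None],
                    poll_seconds: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until a keyboard is seen, then start the X server once.

    keyboard_present and start_x stand in for whatever device check and
    Xsun start-up command the real script uses. Returns the number of
    polls made.
    """
    polls = 0
    while True:
        polls += 1
        if keyboard_present():
            start_x()  # start X exactly once, as soon as a keyboard appears
            return polls
        sleep(poll_seconds)  # wait before checking again
```

Passing sleep in as a parameter lets the loop be exercised without real delays.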

root cron jobs

/etc/opt/bin/test_taurus >> /var/log/test_taurus.log
every 2 minutes
this script watches ownership of the dual-hosted disk and takes action to acquire the disk if the other host releases the disk or fails.
/etc/opt/bin/check_shared_metadevices > /var/log/check_shared_metadevices.log
07 08,09,10,11,12,13,14,15,16,17,20 * * 1-5
i.e. hourly at 7 minutes past the hour from 08:07 to 17:07, plus 20:07, Monday to Friday
watches for damage to the mirrored disk set built on the dual-hosted SCSI disks and sends an alert as necessary.
/etc/opt/bin/backup_whticspgpd >> /var/log/backup_whticspgpd.log
27 11 * * * /etc/opt/bin/backup_whticspgpd
i.e. daily at 11:27; makes rolling on-line backup copies of the Postgres database
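To check what a crontab schedule such as the check_shared_metadevices entry means, its time fields can be expanded mechanically. The Python sketch below is an illustration, not an ING tool; expand_field is a hypothetical helper that handles only the crontab syntax used in these entries (numbers, ranges, comma lists and "*"), not step values.

```python
from typing import List


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one crontab time field into a sorted list of values.

    Handles "*", single numbers, "a-b" ranges and comma-separated lists
    of those; step values such as "*/2" are deliberately not handled.
    """
    if field == "*":
        return list(range(low, high + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            start, end = (int(x) for x in part.split("-"))
            values.update(range(start, end + 1))
        else:
            values.add(int(part))
    if not all(low <= v <= high for v in values):
        raise ValueError(f"{field!r} is outside {low}-{high}")
    return sorted(values)


# Fields: minute, hour, day-of-month, month, day-of-week (0 = Sunday)
minute, hour, dom, month, dow = "07 08,09,10,11,12,13,14,15,16,17,20 * * 1-5".split()
times = [f"{h:02d}:{m:02d}"
         for h in expand_field(hour, 0, 23)
         for m in expand_field(minute, 0, 59)]
print(times, expand_field(dow, 0, 6))
```

Applied to that entry it lists 11 run times per day and the weekdays 1 to 5.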