Dual-Hosted WHT ICS
Gary Mitchell
Revision date 2005-11-02 09:55
These notes describe how to change between the live and backup WHT
Instrument Control System (ICS).
The expected audience of these notes includes Duty Engineers (DEs) and
Systems Administrators.
While observers could use these notes, it is expected that in the event
of a problem they would contact the DE in the first instance. The DE
would diagnose the problem and may conclude that a change of ICS would
be helpful.
The section "technical notes" need only be read by UNIX Systems Administrators
who may need to resolve problems with the system.
The WHT ICS is a mission-critical system. To avoid losing observing
time to a hardware failure of the computer, the job of the ICS can be
done by either of two identical computers.
- Computers
- At the time of writing the two computers used are lpss89 and lpss90.
Except for brief periods these two hosts are identical. There
is no "normal" host. When the hosts are not identical (for example
because of operating system software development or hardware
enhancement) the difference and the preferred hostname will be on
the whiteboard in the WHT control room.
Both these computers use the same copy of the observing
system applications.
At any one time one (and only one) of these computers can be used as
the WHT ICS. Whichever one is the WHT ICS responds on the network
not only to its own hostname but also to the hostname "taurus".
The hostname "whtics" is a synonym for taurus.
- Keyboard and Mouse
- Although there are two computers there is only one
keyboard and one mouse.
These connect to a "Keyboard/Video/Mouse" (KVM) switch.
The KVM switch is located below the monitors.
A simple push-button forwards the connection from the keyboard/mouse to
either lpss89 or lpss90.
Please note that the push button does not change the video inputs.
Technical notes
The KVM switch cannot be used to switch video connections because
the WHT ICS uses 3 monitors and the switch is designed only for
simple systems where each computer has one monitor.
- Monitors
- There are 3 monitors
- A on the left
- B in the middle
- C on the right
The 3 monitors do not behave identically.
- Application windows can be dragged between monitors B and C.
- Application windows on screen A are limited to screen A.
Each monitor has 4 video inputs, numbered 1 to 4:
- 1: analog (blue plug)
- 2: DVI-D (white plug)
- 3: s-video (not used at ING)
- 4: composite video (not used at ING)
Technical notes
Screen A is managed by a single graphics card.
Screens B and C are paired together and operated by another graphics
card.
To identify which computer is the live WHT ICS, use any of the following methods:
- use telnet to log in as the WHT observer to host "taurus".
Look at the prompt: this identifies the host which is the
current live ICS.
In the example below the ICS is identified as lpss89, as shown by
the prompt on the last line.
user@somehost> telnet taurus
Trying 161.72.6.56...
Connected to taurus.roque.ing.iac.es.
Escape character is '^]'.
SunOS 5.9
login: whtobs
Password:
Last login: Wed Nov 2 09:13:50 from fornax.ing.iac.
Sun Microsystems Inc. SunOS 5.9 Generic May 2002
whtobs@lpss89>
- use telnet to log in as any authorised user to host "taurus"
and issue the command hostname to identify the host.
In the example below the ICS is identified as lpss90, as shown by
the output of the hostname command.
user@somehost> telnet taurus
Trying 161.72.6.56...
Connected to taurus.roque.ing.iac.es.
Escape character is '^]'.
SunOS 5.9
login: guest01
Password:
Last login: Wed Nov 2 09:13:50 from fornax.ing.iac.
Sun Microsystems Inc. SunOS 5.9 Generic May 2002
guest01> hostname
lpss90
- Look at the login screen. When the computer is the live WHT ICS the
login screen says
"Only Observer should use taurus"
and when it is the backup ICS it says
"Backup ICS: All ING Users can use lpssNN"
Because the login screen could have been waiting for user input since
before the computer took on the role of the WHT ICS, you should first
request an up-to-date login screen by selecting "Options" with the
mouse and then "Reset Login Screen" from the pop-up menu.
A reboot of the live WHT ICS is all that is needed for the alternate
computer to take on the role of the WHT ICS.
As soon as the backup ICS detects that the live ICS has stopped working
it takes on the role of the WHT ICS automatically.
The keyboard/mouse and monitors then need to be connected to the new ICS.
See the rest of this section.
The detection of the absence of what was the live ICS and
the transition to the new ICS takes about 45 seconds.
If the live WHT ICS is not shut down cleanly but fails due to a
hardware error (eg a failed power supply) a similar process takes place
but it is slower, at about 90 seconds.
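The timings above suggest a simple way to confirm from another machine that a switch has completed: poll the hostname "taurus" until it answers again. The following is a minimal sketch and is not one of the ING scripts. The function name wait_for_ics and the probe command passed to it are assumptions; any single-shot reachability check available on the local system can be used.

```shell
# wait_for_ics PROBE [TIMEOUT] [INTERVAL]
# Hypothetical helper, not one of the ING scripts.
# Runs PROBE repeatedly until it succeeds or TIMEOUT seconds have passed.
# Prints the approximate number of seconds waited and returns 0 on success;
# returns 1 if PROBE never succeeded.
wait_for_ics() {
    probe=$1
    timeout=${2:-120}    # default leaves room for the ~90 second hardware-failure case
    interval=${3:-5}
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        # PROBE is deliberately unquoted so that "cmd arg" is split into words.
        if $probe >/dev/null 2>&1; then
            echo "$elapsed"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 1
}
```

For example, on a system whose ping accepts a count option, `wait_for_ics "ping -c 1 taurus" 120 5` would wait up to about two minutes. The reported time counts only the sleeps, so it underestimates the wait when the probe itself is slow.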
In the example below we switch from hostA to hostB.
Example
- To initiate the switch of the ICS to the other computer:
- Either
- Use the shut account to shut down or reboot hostA
- Or
- Use the computer's power switch to power down hostA
and optionally start hostA again after a minute.
The power switch is a round button centred on the front panel. The front
of the computer has a red panel.
- To connect the keyboard, mouse and video to the new WHT ICS
- Keyboard and mouse
- On the Keyboard/Mouse switch press the host selection button.
You should have a pair of lights (red, green) on the KVM switch
either at position 1 or position 2.
- Monitors
- On each of the three monitors (A, B, C) change the video input.
The video input for each monitor will be either 1 or 2. Video inputs
3 and 4 are not used at the ING.
Only two combinations are valid:
- A1, B2, C1
- A2, B1, C2
There is a label on monitor A to remind you which combinations
are valid for each computer.
Check the mouse and keyboard are interacting with
the screen you are viewing.
- At the login screen look at the welcoming text
above the space where you type the username.
It should say
"Only Observer should use taurus"
If it does not say this, refresh the login screen by selecting
"Options" and then "Reset Login Screen".
If it still does not say "Only Observer should use taurus", check that
the keyboard, mouse and video are connected to the correct computer.
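The checks in this procedure can be summarised in a few lines of shell. This is only an illustrative sketch: the function names valid_inputs and role_from_banner are hypothetical. These notes do not say which of the two monitor combinations belongs to which computer (that is on the label on monitor A), so the first function only checks that a combination is one of the two valid ones.

```shell
# valid_inputs A B C
# Hypothetical helper. A, B and C are the video input numbers selected on
# monitors A, B and C. Returns 0 only for the two valid combinations
# A1,B2,C1 and A2,B1,C2.
valid_inputs() {
    case "$1,$2,$3" in
        1,2,1|2,1,2) return 0 ;;
        *)           return 1 ;;
    esac
}

# role_from_banner TEXT
# Hypothetical helper. Maps the welcoming text of the login screen to the
# role of the computer that displays it.
role_from_banner() {
    case $1 in
        *"Only Observer should use taurus"*) echo live ;;
        *"Backup ICS"*)                      echo backup ;;
        *)                                   echo unknown; return 1 ;;
    esac
}
```

A mixed combination such as `valid_inputs 1 1 2` fails, which corresponds to monitors showing a mixture of the two computers.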
- I want to reboot the ICS but I want to continue to use
the computer I was using before. How do I do that?
- Reboot the old ICS. The ICS will shift to the alternate host.
Then reboot that host. The ICS will shift back to the original.
- I can't drag the application window between screen A and B.
Why?
- Dragging between screens is only supported under one or other
of the following conditions:
- when the two screens are on the same graphics card and managed as a
single screen in hardware by that graphics card
- when the computer manages the two screens using special software
simulation ("Xinerama" mode)
Hardware mode is efficient but limited to 2 screens for the current
graphics cards. Software mode is a lot less efficient and may have a
few bugs which make some ING GUI displays fail to update or otherwise
behave quite as expected.
Software mode is therefore not used.
- If the application is not in the window I want and I can't
drag the application to the desired window what can I do?
- Exit the application. Pass the mouse to the window you want.
From a menu or terminal in that window re-start the
application.
These technical notes are for use by UNIX Systems Administrators
or Software Developers working with services associated with
the WHT ICS.
Mechanisms
The specification required
- that one or other host takes on the role of the WHT ICS
- that the other (backup) host be on-line for test purposes
- that volatile information be preserved when failing over to the
backup system.
The solution is to have a dual-hosted SCSI disk to store the volatile
information and to associate the hostname taurus with whichever SPARC
host has ownership of the disk.
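The rule "whoever owns the disk is taurus" can be sketched as follows. This is not the ING implementation, which is not reproduced in these notes. All function names are hypothetical, the hooks are empty placeholders for the site's own disk, naming and service commands, and the order of claiming the name before starting services is an assumption.

```shell
# Placeholder hooks (hypothetical). Real versions would wrap the site's own
# commands, which these notes do not list.
take_disk()      { return 1; }   # should succeed only if disk ownership is acquired
release_disk()   { :; }
claim_name()     { :; }          # start answering to the hostname "taurus"
start_services() { :; }          # services that need the shared disk
stop_services()  { :; }

# become_live: try to take the disk. Only the owner of the disk takes the
# name and starts the dependent services; otherwise the host stays backup.
become_live() {
    if take_disk; then
        claim_name
        start_services
        echo live
    else
        echo backup
    fi
}

# become_backup: stop the dependent services before giving up the disk.
become_backup() {
    stop_services
    release_disk
    echo backup
}
```

With the placeholder hooks as written, `become_live` prints "backup" because take_disk always fails; replacing take_disk with a function that succeeds makes it print "live".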
Software scripts
These are derived from the scripts used in the ING's NFS file servers.
- /etc/init.d/INGdiskset_yellow
- A simple Bourne shell script which invokes a bash script.
- /etc/init.d/INGdiskset_yellow.bash
- A bash shell script which attempts to take/release the dual-hosted SCSI disk at startup and shutdown.
- /etc/init.d/INGdependent_disk_services
- A script to start services which can only be run by a host which has ownership
of the dual-hosted SCSI disk.
Similarly it stops these same services when the host releases the dual-hosted SCSI disk.
- /etc/yellow/init.d/INGpostmaster
- The postmaster task can only be allowed to run if the host has the
disk. As such this script was removed from /etc/init.d and placed in this
sub-directory where it is invoked by
/etc/init.d/INGdependent_disk_services as necessary.
- /etc/init.d/INGwatch_keyboard
- The KVM switch only forwards the USB keyboard/mouse to one host.
If a host boots when it is not selected then at boot time it decides not
to invoke Xsun. This script watches for the appearance of a keyboard
and then starts Xsun.
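The idea behind INGwatch_keyboard, waiting for a device to appear and only then starting something, can be sketched as a small polling loop. This is an assumption about the general approach, not a copy of the real script: the function name wait_for_path is hypothetical, and these notes do not say how the real script detects the keyboard or how it starts Xsun.

```shell
# wait_for_path PATH [INTERVAL] [MAX_TRIES]
# Hypothetical helper. Polls until PATH exists.
# MAX_TRIES of 0 (the default) means wait indefinitely.
# Returns 0 once PATH exists, 1 if MAX_TRIES checks all failed.
wait_for_path() {
    path=$1
    interval=${2:-5}
    max_tries=${3:-0}
    tries=0
    while [ ! -e "$path" ]; do
        tries=$((tries + 1))
        if [ "$max_tries" -gt 0 ] && [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

A caller would pass whatever device node represents the keyboard on that system and start the X server only when the function returns 0.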
root cron jobs
- /etc/opt/bin/test_taurus >> /var/log/test_taurus.log
- every 2 minutes
This script watches ownership of the dual-hosted disk and
takes action to acquire the disk if the other host releases the disk
or fails.
- /etc/opt/bin/check_shared_metadevices > /var/log/check_shared_metadevices.log
- 07 08,09,10,11,12,13,14,15,16,17,20 * * 1-5
ie hourly during working hours
Watches for damage to the mirrored disk set built on the dual-hosted
SCSI disks and sends an alert as necessary.
- /etc/opt/bin/backup_whticspgpd >> /var/log/backup_whticspgpd.log
- 27 11 * * * /etc/opt/bin/backup_whticspgpd
ie daily
Makes rolling on-line backup copies of the PostgreSQL database.
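A "rolling" backup usually means keeping a fixed number of recent copies and discarding the oldest each time. The sketch below shows one common way to do that rotation in shell. It is an illustration only: the function name rotate_backups, the numeric suffix scheme and the number of copies kept are assumptions, and the contents of backup_whticspgpd are not given in these notes. The step that produces the new dump is left out for the same reason.

```shell
# rotate_backups FILE [KEEP]
# Hypothetical helper. Shifts FILE.1 to FILE.2, FILE.2 to FILE.3 and so on,
# overwriting FILE.KEEP (the oldest copy), then moves FILE to FILE.1.
# After the call FILE no longer exists, ready for a fresh dump to be written.
rotate_backups() {
    file=$1
    keep=${2:-7}
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -f "$file.$prev" ]; then
            mv -f "$file.$prev" "$file.$i"
        fi
        i=$prev
    done
    if [ -f "$file" ]; then
        mv -f "$file" "$file.1"
    fi
}
```

Called once per run before the new dump is written, this keeps at most KEEP older copies alongside the newest one.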