Routine SPARC Software Maintenance

Gary F Mitchell 24-Jan-2003

Summary

The sparcs need to be rebooted from time to time to allow routine operating system maintenance. This can be done by scripts, and it is proposed to do the Astronomy SLO sparcs between 00:00 and 04:00 on the first Thursday of the month. Other SLO desktops will also be included such as


				2003

	 Jan			Feb		       Mar
 S  M Tu  W Th  F  S    S  M Tu  W Th  F  S    S  M Tu  W Th  F  S
          1  2  3  4                      1                      1
 5  6  7  8  9 10 11    2  3  4  5  6  7  8    2  3  4  5  6  7  8
12 13 14 15 16 17 18    9 10 11 12 13 14 15    9 10 11 12 13 14 15
19 20 21 22 23 24 25   16 17 18 19 20 21 22   16 17 18 19 20 21 22
26 27 28 29 30 31      23 24 25 26 27 28      23 24 25 26 27 28 29
                                              30 31
	 Apr			May		       Jun
 S  M Tu  W Th  F  S    S  M Tu  W Th  F  S    S  M Tu  W Th  F  S
       1  2  3  4  5                1  2  3    1  2  3  4  5  6  7
 6  7  8  9 10 11 12    4  5  6  7  8  9 10    8  9 10 11 12 13 14
13 14 15 16 17 18 19   11 12 13 14 15 16 17   15 16 17 18 19 20 21
20 21 22 23 24 25 26   18 19 20 21 22 23 24   22 23 24 25 26 27 28
27 28 29 30            25 26 27 28 29 30 31   29 30

	 Jul			Aug		       Sep
 S  M Tu  W Th  F  S    S  M Tu  W Th  F  S    S  M Tu  W Th  F  S
       1  2  3  4  5                   1  2       1  2  3  4  5  6
 6  7  8  9 10 11 12    3  4  5  6  7  8  9    7  8  9 10 11 12 13
13 14 15 16 17 18 19   10 11 12 13 14 15 16   14 15 16 17 18 19 20
20 21 22 23 24 25 26   17 18 19 20 21 22 23   21 22 23 24 25 26 27
27 28 29 30 31         24 25 26 27 28 29 30   28 29 30
                       31
	 Oct			Nov		       Dec
 S  M Tu  W Th  F  S    S  M Tu  W Th  F  S    S  M Tu  W Th  F  S
          1  2  3  4                      1       1  2  3  4  5  6
 5  6  7  8  9 10 11    2  3  4  5  6  7  8    7  8  9 10 11 12 13
12 13 14 15 16 17 18    9 10 11 12 13 14 15   14 15 16 17 18 19 20
19 20 21 22 23 24 25   16 17 18 19 20 21 22   21 22 23 24 25 26 27
26 27 28 29 30 31      23 24 25 26 27 28 29   28 29 30 31
                       30

While the reboots will be scattered over 3 hours or so, each sparc will be patching for only about 10 to 20 minutes. During this time logins are disabled but other services continue.
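As a cross-check of the calendar above, the "first Thursday of the month" rule can be turned into explicit dates. The following is only a minimal sketch and not part of the maintenance scripts themselves; the function name first_thursday is made up for illustration.

```python
from datetime import date, timedelta

THURSDAY = 3  # date.weekday() counts Monday as 0, so Thursday is 3


def first_thursday(year, month):
    """Return the date of the first Thursday of the given month."""
    first = date(year, month, 1)
    # Days to move forward from the 1st to reach a Thursday (0 to 6).
    offset = (THURSDAY - first.weekday()) % 7
    return first + timedelta(days=offset)


# Proposed maintenance mornings for 2003, one per month.
maintenance_dates = [first_thursday(2003, month) for month in range(1, 13)]

for d in maintenance_dates:
    print(d.isoformat())
```

For 2003 this gives 2 January, 6 February, 6 March and so on, matching the Th column of the calendar.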

Details

To begin with, the operating system of the sparcs at the ING is continuously evolving.

Apart from the major releases (Solaris 8, Solaris 9), the changes arrive as patches, which are applied more or less continuously throughout the year. In the past some (but never all) of them could be applied to a live system. Some patches require a reboot afterwards for them to take effect.

Why it is necessary

Concerns about security mean that I have introduced measures which make the sparcs very secure - but these same measures now require that the sparcs be rebooted after patching to make them secure again. To continue to maintain the operating system, therefore, a program of scheduled reboots needs to be introduced.

Reboots are inconvenient - but this is nothing compared to the inconvenience of a sparc which has been compromised for lack of a security patch.

Finally the ING cannot jeopardise operations by permitting the continued operation of unsecured desktops. Users must realise that all computers in the ING domains must be secure or there is no security.

The reboot / patch / reboot sequence

The sequence is reboot, patch, reboot.

How long does all this take? Depending on the volume of work to be done, it could be between 10 and 20 minutes.

Apart from the boot/reboot events (1 minute each), the sparc continues to serve any data areas associated with it (e.g. /data/djl). Only logins are disabled. Anyone attempting to log in is denied access but shown a message explaining why the host is unavailable, something like this:

"patching begins in 4 minutes  - please use another host"

When patching is underway it changes to this:

 "Sorry - patching in progress
  Number of patches to apply: 12
  patching began at 02:10"

Practicalities

With about 80 sparcs at the ING, we can't use this procedure for them all. NFS file servers and observing systems are not included. This is about desktops, SLO sparcs, and sparcs which the CFG has for its own specialised purposes.

(Others obviously still need maintenance done - but that is scheduled entirely differently).

Not all SLO sparcs can be done simultaneously, so they are split over 4 hours. The patches are applied more quickly if the patch server is not overloaded by all ING sparcs requesting patch data simultaneously. If users wish, each sparc could have a little sticker to remind them when their desktop is unavailable - or they can simply write their own reminder.

"This sparc reboots for maintenance on 
first Thursday morning of the month"

Remember that this is nothing new - O/S maintenance has been going on with about the same frequency over the last 2 years. All that is new is the requirement to reboot to do it - and to minimise disruption to SLO desktop users I propose to do this in the small hours of the morning.

The reboot schedule

This information last audited 24-Jan-2003

hostname   reboot time   data areas
lpss25     01:07
lpss26     02:07         jba,jht,guest16
lpss32     02:07         naw,knapen
lpss33     03:07         greimel,sanchez
lpss34     01:07         bgarcia,nom,pms
lpss35     02:07         rcorradi
lpss36     03:07         djl,jma
lpss37     01:07         azurita,cp
lpss38     02:07         cc,rlc,sjst,stp
lpss42     03:37         jholt
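
To see how the schedule above spreads the load on the patch server, the hosts can be grouped by reboot time. This is only an illustrative sketch: the hostnames and times are copied from the schedule, while the names SCHEDULE and hosts_by_slot are made up for the example.

```python
from collections import defaultdict

# Hostname -> reboot time, copied from the schedule above.
SCHEDULE = {
    "lpss25": "01:07",
    "lpss26": "02:07",
    "lpss32": "02:07",
    "lpss33": "03:07",
    "lpss34": "01:07",
    "lpss35": "02:07",
    "lpss36": "03:07",
    "lpss37": "01:07",
    "lpss38": "02:07",
    "lpss42": "03:37",
}


def hosts_by_slot(schedule):
    """Group hostnames by reboot time, with slots in chronological order."""
    slots = defaultdict(list)
    for host, slot in sorted(schedule.items()):
        slots[slot].append(host)
    # HH:MM strings sort chronologically, so a plain sort is sufficient.
    return dict(sorted(slots.items()))


for slot, hosts in hosts_by_slot(SCHEDULE).items():
    print(f"{slot}  {len(hosts)} host(s): {', '.join(hosts)}")
```

With the schedule as listed, no more than four sparcs request patch data in the same slot.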

Administrator notes

These scheduled reboots may be suspended by commenting out the root cron job
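
How the root cron job is actually written is not stated here. One point worth noting is that standard cron cannot express "first Thursday of the month" directly: when both the day-of-month and day-of-week fields are restricted, the job runs if either one matches. A common workaround is to run the job every Thursday and let the script exit unless the date falls in the first seven days of the month. The following is a minimal sketch of such a guard, assuming a hypothetical helper named is_first_thursday; it is not the script in use at the ING.

```python
from datetime import date


def is_first_thursday(today):
    """True if 'today' is the first Thursday of its month."""
    # date.weekday() counts Monday as 0, so Thursday is 3.
    # The first Thursday always falls on day 1 to 7 of the month.
    return today.weekday() == 3 and today.day <= 7


if __name__ == "__main__":
    if is_first_thursday(date.today()):
        print("first Thursday: maintenance would run")
    else:
        print("not the first Thursday: nothing to do")
```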