Computing Facilities Group
Prioritised purchase list for CFG composed by GFM.
Version of 06-Feb-1997

These items exclude those requested for other sections; they concern equipment for use within the CFG or of benefit to users across several sections.

Summary
  1. Three Cisco routers at £12,500 each + other network components. TOTAL £53,000. To properly implement a fast (100 Mb/s) and secure network on site.
  2. Network components. TOTAL £16,000. To implement only a fast (100 Mb/s) network on site. This is a cut-down version of the previous bid.
  3. UltraSPARC 1 compute server + DAT drive. TOTAL £8,000. To establish a high-availability service at the Roque and SLO.
  4. Bulk storage backup device: two DLT stacks at £4,500 each. TOTAL £9,000. To match backup capacity to disk capacity at the Roque and SLO.
  5. Professional modem service: dial-up Shiva LanRover Plus with 8 modem cards. TOTAL £6,000. To have a reliable, fast, high-capacity modem service.
  6. Upgrade of four old sparc IPXs to the equivalent of a sparc10 using a Cycle Computers upgrade board at £1,500 each. TOTAL £6,000. To bring four on-site sparcs up to date.

This information is available as a web page. For an up-to-date copy please view wish97/cfg.htm

Priority / Brief description and cost / Case for purchase
1 Three Cisco routers at £12,500 each = £37,500 + other network components
TOTAL £53,000
To properly implement a fast (100Mb/s) and secure network on site.

The management information system is located at SLO, but unlike other PPARC sites we need to allow access to the MIS from outside its physical location - namely from the Roque. This introduces a severe risk to the core business of the observatory. To properly address this, the network topology on site has to be re-designed to create a secure subnet. A design has been composed by Tony/Basil/Gary, but it needs TWO routers (one in the Residencia, the other in the WHT) and a third as a spare for the above.

If we get the routers
  • There would be a 100 Mb/s link between the WHT and INT.
  • There would be an upgrade path (purchase of ATM boards for the Cisco routers) to have ATM all the way from the WHT to the Residencia, the IAC and Madrid. In a mail message of 06-Feb-1997, Diego at the IAC asked whether the ING would make use of an upgrade of the Residencia-to-IAC link from the current 2 Mb/s capacity to 34 Mb/s (and ultimately 155 Mb/s). To view the message please see the reference on the web-page version of this document.
  • The established network would operate in a secure manner which would be acceptable to PPARC's security advisors for the MIS project.
  • The telescope computers would be secure from hacking by anyone who is not on site, at the SLO, or at another specific trusted site (such as selected hosts at RGO Cambridge).
If we do NOT get the routers
  • There would be only a 10 Mb/s link between the WHT and INT.
  • One day the MIS will be hacked from off-site, probably via a computer at the Roque which is trusted by the MIS.
  • The telescope systems will be hacked from off-site, with a potentially devastating loss of observing time.
In either case, to prevent a recurrence it would be necessary to isolate the site by crudely pulling the plug on the Residencia link. Communications between the SLO and the Roque site would also need to be interrupted until the mechanism of the break-in was understood. While most systems could be re-constituted from backups taken before the first intrusion, tracing and understanding the hacking is no trivial task. It could take weeks, or even require the services of a specialised consultancy firm. One of the recommendations of any such consultancy will be to get the site(s) back up by buying three routers...
1B Network Components
TOTAL £16,000
To implement only a fast (100Mb/s) network on site.

This is essentially a cut-down version of the "fast and secure network" ie without any routers - and without any security.

If we get the network components
  • There would be a 100 Mb/s link between the WHT and INT.
If we do NOT get the network components
  • The link between the WHT and INT would remain unchanged (10 Mb/s).
2 UltraSPARC 1 Compute Server + DAT drive
TOTAL £8,000
To establish a high-availability service at Roque and SLO.

There is a strategic weakness in the role of the ultrasparcs both at the Roque and at SLO. The principle is that for every sparc in a role crucial to operations there must be equivalent hardware executing a non-crucial role, so that in the event of a hardware failure the machine can be replaced without waiting for a repair. The Solaris file server is in the process of being built on an ultrasparc 1. The only other ultrasparc is at sea level (lpss20). The sparc lpss20 is a compute server, and normally this would mean it occupies a non-crucial role, because if it were unavailable users could log in to any of the remaining (slower) sparcs in the same cluster. However, lpss20 is the only host for SLO, and to remove it would disable SLO sparc services.

If we had one more ultrasparc 1
  • It would provide a powerful computing resource at the Roque for all users.
  • It would serve as a spare for the Roque sparc fileserver.
  • It would serve as a spare for the (only) SLO host supporting logins.
If we do NOT get another ultrasparc 1
  • the on-site Solaris cluster will be laughably under-powered for anything other than the observer accounts, which are the only accounts with login access to each telescope's data-collection and quick-look data-reduction sparcs. For general compute serving only feeble sparc IPXs are available (but see item 5 below). This state will continue until the last of the SunOS sparcs can be updated to Solaris, when at last a sparc10 will be available. This could take months.
  • there will be total loss of sparc services at the Roque if ever the Roque ultrasparc fails.
  • there will be total loss of sparc services at SLO if ever the SLO ultrasparc fails.
  • we cannot further develop the RAID array. The RAID array has 8 disks and a capacity for 20, but we cannot sensibly make further use of this resource because doing so would add emphasis to a resource dependent on an ultrasparc for which there is no spare.
3 Bulk storage backup device
Two DLT stacks at £4,500 each
TOTAL £9,000
To match backup capacity to disk capacity at Roque and SLO.

Over the last 3 years the disk capacity associated with the sparcs has increased enormously, and with ever larger chips and the WFC array imminent, usage will continue to increase. Despite this, the CFG still has the use of just one DAT drive, running in uncompressed mode on lpss1, to back up all the site sparcs. The SLO facility similarly has just one modest DAT drive on its file server. Using DAT drives in this way is no longer practical. For example, a full backup of just lpss1/2/6/7 - which represents half of the Roque disk capacity - takes 7 working days.

The principal problem is that the system administrator must change the tapes every 3 hours; instead of writing the theoretical maximum of 8 tapes a day, in practice only 2 tapes can be written each working day.

Once all sparcs are established on the Solaris cluster and all are being backed up, the "full" backups which should be done once a month will take two weeks to complete!

Two drives is the minimum because, if either were to develop a hardware failure, at least one drive capable of restoring a disk must remain available.
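The tape-change arithmetic above reduces to a short calculation. A minimal sketch in Python: the 3-hour tape time, the 2-tapes-per-day rate and the 7-working-day figure come from the text, while the function and variable names are illustrative.

```python
# Backup-window arithmetic for the DAT bottleneck described above.
# Figures from the text: one DAT tape fills in ~3 hours; the
# administrator manages only 2 tape changes per working day against
# a theoretical 8 per 24-hour day; a full backup of lpss1/2/6/7
# takes 7 working days.

TAPE_HOURS = 3
THEORETICAL_TAPES_PER_DAY = 24 // TAPE_HOURS   # 8 tapes, if changed round the clock
ACTUAL_TAPES_PER_DAY = 2                       # what a working day allows

def full_backup_days(total_tapes: int, tapes_per_day: int) -> float:
    """Working days needed to write `total_tapes` backup tapes."""
    return total_tapes / tapes_per_day

# 7 working days at 2 tapes/day implies ~14 tapes for half the Roque capacity.
tapes_for_half_site = ACTUAL_TAPES_PER_DAY * 7

assert full_backup_days(tapes_for_half_site, ACTUAL_TAPES_PER_DAY) == 7.0
# Unattended, continuous writing (as a multi-tape DLT stack allows)
# would cut the same job to under 2 days:
assert full_backup_days(tapes_for_half_site, THEORETICAL_TAPES_PER_DAY) == 1.75
```

The same ratio explains why the monthly "full" backups scale so badly once every sparc joins the cluster: the tape count grows while the 2-tapes-per-day limit stays fixed.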

If the DLT system is purchased
  • the larger capacity of the tape media and the fact that there are several available will allow backups to be done day and night continuously.
  • the frequency of backups can be increased and the use of full and intermediate backups relative to incremental backups can be increased such that recovery from a disk failure could be much quicker.
  • a DAT drive will be freed by each substitution for general use.
  • the system administrator can do something more useful than change tapes.
If the DLT system is NOT purchased
  • the DAT drives used for system backup will be permanently tied up
  • data on disk is vulnerable to irrecoverable loss during the 7-day period each month while monthly backups are done. When this period gets even longer we will be unable to back up data at a rate matching its creation.
4 Professional Modem Service
Dial-up Shiva LanRover Plus with 8 modem cards
TOTAL £6,000
To have a reliable fast high capacity modem service.

The modem service offered over the last two years is pitiful. The modems employed are cheap models designed for occasional use by domestic PC users. They regularly burn out or develop other faults. These cheap modems have a useful life of less than 6 months, and we never have two models the same. Furthermore, after being tediously configured they can lose that configuration when powered off. PGS spends an inordinate amount of his time configuring them, testing them, sending them back for repair and shopping for replacements. They depend on other devices for dial-back. The modems have been the source of numerous complaints.

The lack of a reliable modem service can impede staff who are called out at home from accessing the system remotely to diagnose faults.

If the modem server is purchased
  • there will be added productivity from those users working from home.
  • it will be practical for selected staff to have PCs at home so that, when called, they can log in and diagnose faults.
If the modem server is NOT purchased
  • be prepared for another year of complaints about the pitiful and useless modem service.
  • staff called out at home in the evening will need to drive to the SLO in Santa Cruz to diagnose faults they could otherwise have handled remotely.
5 Upgrade of four old sparc IPXs to the equivalent of a sparc10
using a Cycle Computers upgrade board. Each upgrade is £1,500.
TOTAL £6,000
To bring four on-site sparcs up to date.

There are four sparc IPXs available - the single largest group of sparcs of the same model. They are only 4 years old, but their performance of 21 specmarks is pitiful relative to a sparc10 (53 specmarks) or an ultrasparc (215 specmarks). While they were the snazzy machines of their day, they are now viewed with contempt by staff and visiting observers. Currently they account for 3 of the 4 sparcs offered as the Roque SunOS cluster.

There is a relatively cheap route to give these sparcs, with their large screens and peripheral devices, a new lease of life. For details see the web page for CFG job #243 - reference http://www.ing.iac.es/~cfg/job_dir/job243.htm

If the CPU upgrades are purchased
  • the site Solaris cluster will get off to a good start with 3 x 60 = 180 specmarks of compute services.
  • we will have sufficient SCSI interfaces to make use of the large number of SCSI (as opposed to fast SCSI) disks of modest (2Gb and 4Gb) capacity.
If the CPU upgrades are NOT purchased
  • the site Solaris cluster will get off to a dismal start with 3 x 20 = 60 specmarks of compute services.
  • while we can put SCSI devices on the IPXs in the Solaris cluster, they would not effectively contribute to file serving because the speed at which they could be served is so poor. The huge cumulative investment in SCSI devices would not be realised to its full potential.
  • when Sun eventually drops support for the sun4c architecture we will have to upgrade these sparcs then, instead of now, or throw them out. The upgrade from Cycle Computers changes the architecture to that of the sparc5 and sparc10 machines, ie sun4m. Support for that architecture should continue for more years - and on past experience we can expect our computers to still be in use long after they were sold.
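The specmark totals in the two cases above reduce to a three-line comparison. A minimal sketch in Python, using the per-machine figures quoted in the bullets; the variable names are illustrative:

```python
# Compute-service totals for the initial three-machine Solaris cluster,
# using the per-machine specmark figures quoted in the bullets above.
MACHINES = 3
IPX_SPECMARKS = 20        # unmodified sparc IPX (the "3 x 20" case)
UPGRADED_SPECMARKS = 60   # IPX with the Cycle Computers board (the "3 x 60" case)

without_upgrade = MACHINES * IPX_SPECMARKS      # 60 specmarks in total
with_upgrade = MACHINES * UPGRADED_SPECMARKS    # 180 specmarks in total
speedup = with_upgrade / without_upgrade        # three times the compute service

assert (without_upgrade, with_upgrade, speedup) == (60, 180, 3.0)
```

In other words, for £6,000 the cluster's aggregate compute service triples, which is the whole case for doing the upgrade now rather than when sun4c support ends.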