This document puts the case for purchasing an UltraSPARC for the Roque site in the current financial year.
It is inevitable that we progress from SunOS to Solaris. The sea-level sparcs and some operational systems, either in use (INT DAS, WHT JOSE) or imminent (JKT DAS), use Solaris. These machines cannot make full use of applications on site where these applications are available only for SunOS. There is an excess burden in attempting to maintain two software collections concurrently. The administration of these machines will be more efficient when those members of the CFG who have been trained in system administration under Solaris can put this training to use.
The UltraSPARC contributes to the Solaris upgrade because the biggest single hurdle to starting Solaris on site has been the absence of a free and powerful sparc on which to found the Solaris sparc cluster. The sparc has to be "free" (or at least easily freed) because installing Solaris and a respectable number of applications requires at least a week's continuous work, during which the hardware is unavailable for whatever purpose it may originally have had. The sparc has to be powerful because its file-serving performance can be rate-limiting for the other sparcs which use its services. It is true that there are a number of sparcs on site. The function of each of these is described in a table. From this table it can be seen that the "powerful" sparcs (defined as sparc 5, 10 or 20 or UltraSPARC) are fully committed to the SunOS cluster (lpss1 - file and compute server; lpss8 - test focal station; lpss10 and lpss12 - INT and JKT DAS systems). See below for a plan of how the Solaris upgrade could get underway quickly once an UltraSPARC is purchased.
The Management Information System (MIS) runs on a sparc 10. A number of measures have been taken to maximise the availability of this system, including disk mirroring. A further measure would be to buy another S-bus card so that the system could continue to run after failure of one of the pair of cards. However, the weak point is that if we had a hardware failure and wanted to recover access to the MIS we would need to borrow a Sun. Not just any Sun sparcstation would do. If we want to be able to continue from a simple restore of a backup tape then the replacement needs to be of the same architecture as the original machine - this means the sun4m architecture of a sparc 5, 10 or 20. Similarly, the INT and JKT DAS systems depend on sun4m machines. If these machines had a fault, how could observing continue? The current plan is to borrow the WHT test focal station sparc (lpss8) pending repair of the original. This is unsatisfactory. A sparc should be available which has no special commitments, i.e. a sparc which is one of several compute servers. If it has to be borrowed for use as a spare then users might have to make do with reduced performance (if the other compute servers are inferior), but at least all SERVICES remain intact. This is ultimately what matters in ensuring that observing time is not lost. For these reasons the so-called library sparc should be left free of any mission-critical role.
Buying another sparc5 to act as a file server is a possibility, and it would fit into the pool of sun4m machines for which the library sparc could substitute in an emergency. However, the purchase of a sparc5 represents poor value for money: a sparc5 and an UltraSPARC are similar in price but differ significantly in performance.
Why do they need improving at all? This is set out under the following three headings.
In late 1995 the decision was made to discontinue the use of VAXes on site as general user and data reduction facilities. The performance and price of sparcstations, together with the increasing dominance of unix-based applications, had by then made VAXes obsolete. The saving on VAX hardware maintenance is about 25000 pta per VAX per month, or 125 pounds per VAX per month. During 1996 users have moved to the site sparcstations. The performance as perceived by users has steadily decreased. This is not because the sparcs are ill-configured. It is simply that lpss1 has too much to do. A recent mail message circulated by PCTR pointed out this overloading. Users have expressed their dissatisfaction with the performance. For example
The performance of the current sparc cluster in 1997 is cause for concern. The use of larger chips, the higher throughput of the INT and JKT DAS systems and especially the commissioning of the INT prime focus mosaic camera will mean that more and larger data images have to be manipulated. It is not sufficient to dismiss these as work to be handled by the powerful sparcs associated with these projects. For example, tape replication, verification or provision through anonymous ftp will be done on the general computing sparcs, and they have to be capable of doing so. Work on quality and the research done by staff using these images similarly cannot be confined to "observing sparcs". The observing systems obviously have to remain free for their prime function during the night and for quality and maintenance work during the day. It contradicts the philosophy of having a "general computing service" at the observatory if you can do anything you want with it except process astronomical images originating at that observatory. What is true of the general compute servers is also true of the observing sparcs such as lpss3. Again, users have expressed dissatisfaction with the so-called quick-look reduction software.
The site provides a service to visiting observers. These people know of the existence of UltraSPARCs and will quite probably have used one. It is an objective of the ING to provide facilities such that observers can begin the process of data reduction during the course of their observing run. This is justified by the ability it gives observers to adapt their use of night time by analysing previous observations as they go. Observers judge this facility by comparing reduction times with those achievable at their home institutes (or other observatories). Our own staff are also aware of the existence of better and faster computers appearing on the market.
The ING does not have, and probably never will have, the budget to buy leading-edge computing as it comes on the market. Typically new devices come on the market highly priced and decline in price over 2 years. After 2 years they may remain fixed in price, although superseded by newer models, to discourage purchase of obsolete models no longer in mass production. For example, in October 1996 quotes for a sparc5 were similar to those for a newer UltraSPARC 1.
Please follow this reference
Urrgh - pretty much impossible. Found the Solaris cluster on the library sparc (a sparc5).