OBS-ARCH-1

Architecture

for the

ING Observing System

Guy Rixon

Issue 1.1; 1997-11-02




Royal Greenwich Observatory,
Madingley Road,
Cambridge CB3 0HJ

Telephone (01223) 374000
Fax (01223) 374700
Internet gtr@ast.cam.ac.uk

1 Introduction

1.1 Purpose of the document

A proposed architecture is presented for the ING observing system on the WHT, INT and JKT. Aspects of the architecture that are common to all telescopes and instruments are discussed in detail.

The current issue was produced in a hurry for use at the INGRID strategy meeting of October 1997; it is incomplete. Later issues of this document will extend the decomposition of the programs that are to be reused for several instruments and will provide component descriptions for these programs. The definitions of the communications interfaces will be made more rigorous. The set of functions that each sub-system must provide will be reviewed and may be extended.

It is intended to supplement this document with a text for each instrument covered by the architecture. The latter documents will define the instrument-specific details and variations from the generic architecture.

1.2 Scope of the system

The Observing System (hereinafter just `the system') is defined to be the collection of computers and software that is needed for routine observing and maintenance of the telescopes. It includes the TCS, ICS and DAS together with the client programs and user interfaces that give access to the real-time services. In terms of hardware, the system includes, for each telescope, the system computer, the telescope computer, the autoguider computer, the data-junction, the detector controllers (mainly CCD controllers) and the instrument-control microcomputers (mainly EPICS systems, 4MSs and MMSes).

The observing system excludes off-line aids to observing (e.g. those used to prepare finding charts). Most data-reduction programs are excluded; the heads of automatic reduction-pipelines are on the boundary of the system. Where a particular instrument needs reduction to be done as part of the data-acquisition cycle, then that reduction software is part of the observing system.

1.3 References

[1] Software requirements for the DAS servers
ING/RGO document INS-DAS-8 by Guy Rixon.

[2] A transaction protocol based on CAD-CAR records
ING/RGO document SOF-EPICS-1 by Guy Rixon.

2 System overview

The observing system is a three-layer client-server installation. In the fundamental layer, which is distributed across a variety of processors, the analogue inputs and outputs of the system are handled and the real-time control of the equipment is performed. The layer nearest the observer consists of mimic diagrams, GUIs, image displays and command terminals. The intermediate layer, sometimes known as the central intelligence or CIA, co-ordinates the users' demands on the real-time services. The CIA contains the `transient clients' already seen at the INT: small network-aware programs that sequence one user command across multiple real-time services. The CIA and UI layers run on the system computer, a SPARCstation running Solaris.

This scheme is a continuation of the architecture used successfully on the INT and JKT in the DRAMA systems produced in 1995 to 1997. The differences from that pilot scheme are in implementation: for new work in the real-time layer, EPICS replaces the failed DRAMA technology and the troublesome DAS and CCD controllers are replaced by the data junction and SDSU devices. Most other equipment made for the INT will continue in service, although some of the common facilities, such as the message log or `talker', will be reworked to support EPICS and to provide better service. The observing log will be devolved from the observing system and merged with the engineering archive, where decent relational technology is deployed.

This incremental approach protects the major investment made by ING in the years 1995 to 1997. The DRAMA programs in the ICS, which use proven hardware and don't rely on DRAMA's broken networking scheme, are generally considered acceptable and can be refined gradually for little extra cost. EPICS, for new work, I take as a given requirement. The advantages to control engineering of using EPICS are great enough that it must be accommodated by the computing arrangements.

The transient clients have been criticised as unwieldy and unreliable in the current implementation. I suggest that their benefits still outweigh their disadvantages, as I will now explain.

There seems to be a real need for the system to be three-layered, with the astronomical logic held in the middle layer. The VAX-ADAM system on the WHT was originally built as a two-layer system with the astronomical rules either missing or present as scripts running in the command interpreter ICL. This arrangement was widely reviled as hard to use, slow, and hard to maintain. In its later years, parts of the VAX-ADAM system were reworked on the three-layer model. The AAO have found that they need a three-layer scheme in both ADAM and DRAMA, and the business-computing industry now prefers three-layer installations with `business objects' in the middle.

The observing system is inherently parallel. Many mechanisms of a spectrograph can be positioned at the same time; commands can come simultaneously from the observer, the TO or from supporting engineers. However, the system is not separable: it has been a guiding principle since the opening of the WHT that all facilities should be controllable from a single point. Hence, the middle layer of the system has to co-ordinate many overlapping, asynchronous transactions on disparate real-time sub-systems. This co-ordination can be managed by multiplexing I/O in a single program; by using threads to keep track of transactions; or by using separate processes to contain particular user commands. The first approach was used at the WHT in the UES CD-task, the C-tasks for data acquisition and in the Autofib C-task. These are all more or less successful, but the complexity of the tasks is very great and there is no prospect of covering the whole system for one instrument in a single task. The pure multi-threaded approach was considered for the INT in 1994 but is not supported by either DRAMA or EPICS. The final approach is used in the transient clients. Viewed against this background, I believe that the transient clients are still the best way of integrating our DRAMA and EPICS assets.

The observing system has the Unix shell (C shell at present) as its command-line and scripting interface. This is done to eliminate the cost of providing a custom command-interpreter; to reduce the risk of fielding a badly-broken interface (like ICL in its earlier days); to capitalize on industry-standard shell-programming skills; and to present to guest observers an interface that they may have learned elsewhere. The binding of all user commands into client programs allows any alternative shell or command-interpreter to be used: Tcl and PERL scripts have been used successfully at the INT.

Tk, programmed in Tcl, is recommended as the means to produce GUIs, with the assumption that a custom GUI will be programmed for each desired application. Tcl has been integrated well with DRAMA and the successful GUIs for the INT use this approach. In future, other GUI technologies may become more attractive. The proposed arrangements in the CIA allow these to be used with our proprietary messaging protocols.

3 System context



Because the observing system includes all the computing hardware, the flows across its boundaries are mainly analogue.

Mech_signals denotes encoder and switch readings, motor demands etc. Throughout the system, each instance of this flow is handled by a local processor and the details of the flow can be considered in the design of that processor.

Raw_pixel indicates the primary purpose of the observing system: it turns readings from detectors into FITS files.

Text, Graphics and mouse_actions are the interaction of the users with standard terminal equipment. The system is required to provide a consistent set of controls to each of the three users shown. The system provides terminals with X-windows capability for one observer and one TO. The engineer user can log into the system computer from any X-windows terminal and should be able to take full control subject to verbal negotiation with the observer and TO, and subject to normal interlocking. Engineering access is controlled by the password system of the system computer.

The FITS_files, which are the primary product of the system, go to the archive system for recording on CD-ROM. The observing_log and the session_transcript, a log of the commands given during a night's observing, are also sent to the archive.

4 System design

4.1 Design method

The system is described using data-flow diagrams in the notation of Yourdon. No control-flow is noted; because the parts of the system are loosely coupled, it is simpler to represent control messages as data flows.

This model is generic: it does not describe exactly the set of programs to run a particular instrument, nor does it show the union of all programs for all known instruments. Instead, I have tried to show the necessary types of programs and interconnections to cover the facilities needed in all the applications to instruments:

- Commands to mechanisms and status returned from the commands.

- State displays for the mechanisms.

- Interlocking of commands.

- Textual messages and alarms sent to the users.

- Prompting for information.

- Acquisition and storage of data.

The decomposition is not entirely hierarchical: figures 101..106 expand the parts of the data-transforms in figure 0 that are relevant to each of the categories in the list above.

4.2 Design decomposition



Figure 0 shows the three-layer model alluded to in section 2. The top two layers are resident on the system computer, while the real-time-service layer is distributed.

Figure 3 shows the diversity in the real-time service layer.



The autoguider is a major revision of the autoguider produced by RGO in the re-engineering program of 1995-1997. In this sub-system, the basic architecture of a VxWorks system driving a Phase-II CCD controller can be retained. However, the DRAMA interface to the autoguider is unlikely to reach the standard of reliability needed by ING and must be replaced. EPICS is the most obvious technology to replace DRAMA because it will be used elsewhere in the system. However, EPICS provides no support for the control of the autoguider CCDC nor for the autoguider's computation; EPICS simply provides the interface for communications. In light of this, it may be preferable to devise a special, tailored messaging system for the autoguider. If EPICS is used, the autoguider's images for display must be sent by a different route as EPICS cannot transport bulk data. A further possible upgrade of the autoguider is to replace the phase-II CCDC with an SDSU CCDC.

The TCS is the RGO product developed from the original on the WHT during the re-engineering programme of 1995-97. This is based on an Alpha workstation running OpenVMS. The command-and-status interface to the TCS is networked DRAMA with a D-task inside the TCS and running on the telescope computer. DRAMA connections from Solaris to VMS are well suited to ING's needs and have been found to be highly reliable.

The autoguider sends guide packets to the TCS on a serial link. This interface is a legacy, has caused some problems and is extremely difficult to verify. The cost of reworking the link should be determined and compared to the likely cost of extra staff-hours in maintenance for the current version.

The combination of instrument MMS and instrument D-task is a legacy from the Perkin-Elmer ADAM systems and the DRAMA system on the INT and JKT. For each of the MMSes, we have, or are building, a DRAMA D-task to run on the system computer, driving the MMS via a dedicated serial line. This arrangement has been proven in service to be generally sound and no-one has suggested a reason for retiring the MMSes. Communication between the MMSes and the D-tasks for IDS and IDS' A&G box seems to work well, and communications from the central intelligence to the D-tasks are satisfactory (DRAMA seems to work fairly well when no network is involved). These sub-systems should be kept in service with minor improvements to the quality of the DRAMA interface.

The combination of instrument 4MS and instrument D-task is one way to migrate instruments from the WHT's VAX-ADAM system into the new architecture. The ADAM D-tasks are not suitable for re-use, even in the short term. They cannot be ported to the Unix system computer and they use a shared-memory noticeboard for the Mech_state flows, which cannot cross the LAN from VAX to SPARCstation. The best possibility here is to replace the ADAM D-tasks with DRAMA D-tasks following the model for the INT MMSes. This is risky: the new D-tasks would have to use the utility-network protocol and this has caused problems when tried with CCD controllers (both the science and autoguider CCDCs) at the INT. Before writing these D-tasks we should answer the questions `is the utilnet implementation in the re-used 4MSes correct to the specification?' and `do we have resources to re-implement a utilnet client-library for Solaris?'. (The existing implementation has been shown to lose communication indefinitely for some practical sequences of messages.)

The alternative to the 4MSes is to build instrument EPICS sub-systems in VME computers. This programme is already underway for the WHT. Since these sub-systems can be linked by Ethernet, can run the standard transaction interface described below and can be customized arbitrarily with C code, they are rightly the new standard for instrument control at ING. Notably, EPICS databases that offer a standard interface can be linked to the CIA without the aid of D-tasks on the system computer.

The data junction is the new detector-control and data-acquisition sub-system developed in concept in October 1997. It is a VME rack containing a processing unit for each detector controller that it supports; the detector controllers are SDSU units or phase-II CCDCs. The processing unit contains a CPU card, running an EPICS database dedicated to the detector head, an SDSU interface card and a memory card; the unit drives the detector controller and reconstructs in memory the image-data sent by the controller. The data junction has a single dedicated processor to control access to its observation-data disks. Networking arrangements for the data junction are not yet finalized, but it is intended to have a LAN connection to each of the processing units for command and control and a separate LAN segment, at 100 Mb/s, for data export. The output of the data junction is FITS files. The external data needed for the FITS header arrives preformatted in FITS-packet files.

Although the real-time sub-systems are of several different kinds, they are expected to provide logically consistent services to the central intelligence and user-interface layers:

- Instructions are sent down to the RT sub-system in obey transactions.

- Actions in obey transactions can be aborted by kick transactions.

- State variables (the mech_state flows) in the sub-system can be monitored by other programs. This is a `push' service to avoid the need for programs to poll the sub-system state.

- The sub-system can send textual messages to the user via the standard facility discussed below. The sub-system does not require its own console or other UI to run.

- The sub-system either supports a standard command to archive its state in a FITS-header packet written to disk as specified in INS-DAS-8 [1], or its state can be read from the mech_state flows.

- There is a standard command [TBD] by which the central intelligence can cause the sub-system to initialize itself and a standard form of enquiry [also TBD] by which the central intelligence can find out if the sub-system is ready to do astronomy. These features are lacking in the current DRAMA systems and their absence weakens the start-up operation very badly.

- Wherever possible, interaction between sub-systems is through the CIA, not by direct links in hardware or on the LAN. This makes development, testing and maintenance much easier, and the principle should only be reversed where very high performance in communications is essential.

- Data flows such as mech_signals and raw_pixel across the boundary of the observing system are never shared between sub-systems. Each such low-level signal is handled by one specific real-time facility.

- The external signals are never left unattended. There must always be some resident task that reacts to the signals and which can raise alarms through the messaging facility described below. In particular, the sub-systems must look after their hardware between user commands.
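
To make the first two points concrete, the following is a minimal sketch in C of the logical shape of the obey/kick interface that a sub-system is expected to present. It deliberately abstracts away the DRAMA and EPICS message transports (the CAD/CAR mechanism of [2] is not shown), and the names obey_move, kick_move and mech_state are illustrative only, not part of any existing library.

    /* Sketch of the logical obey/kick interface expected of a real-time
     * sub-system.  The transport (DRAMA messages or EPICS CAD/CAR
     * records) is not shown; all names here are illustrative.           */

    #include <stdio.h>

    typedef enum { IDLE, MOVING, ABORTED } mech_state_t;

    static mech_state_t mech_state = IDLE;    /* published for monitoring */

    /* An obey transaction: start the action, return completion status.   */
    static int obey_move(double demand)
    {
        mech_state = MOVING;                   /* monitors see the change  */
        printf("moving mechanism to %.3f\n", demand);
        /* ... drive the hardware; poll for completion or for an abort ... */
        if (mech_state == ABORTED)
            return 1;                          /* action ended with error  */
        mech_state = IDLE;
        return 0;                              /* action completed         */
    }

    /* A kick transaction: abort the action started by the obey.          */
    static void kick_move(void)
    {
        if (mech_state == MOVING)
            mech_state = ABORTED;
    }

    int main(void)
    {
        kick_move();                 /* no action in progress: harmless    */
        return obey_move(42.0);      /* exit status reports the obey       */
    }

In the real sub-system the two handlers would be invoked by the message system rather than by main(), but the division of responsibilities is the same.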



Figure 101 shows the involvement of the UI and CIA layers in the execution of commands. The UIs can be GUIs of any type or Unix shells; the shells provide the scripting capability. In each case, the command is started when the UI forks a client program and passes it an argument array, and ends when the client program exits and returns status. These are the simplest, most common arrangements for IPC in a Unix or POSIX-2-compliant computer and they free the UI from any need to implement communications with the RT layer. Hence, any combination of UI building tools can be used, and programs such as the shells can be used without adaptation.
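
A minimal sketch of this arrangement, using only POSIX calls, is given below; the client name inst_move and its arguments are hypothetical, standing for any command-line client in the CIA.

    /* Sketch of a UI starting a client program: fork, exec with an
     * argument array, wait for the exit status.  POSIX calls only; the
     * client name "inst_move" and its arguments are hypothetical.        */

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        char *argv[] = { "inst_move", "slit", "120.0", NULL };
        pid_t pid = fork();

        if (pid == 0) {                      /* child: become the client  */
            execvp(argv[0], argv);
            perror("execvp");                /* only reached on failure   */
            _exit(127);
        } else if (pid > 0) {                /* parent: the UI            */
            int status;
            waitpid(pid, &status, 0);
            if (WIFEXITED(status))
                printf("client exited with status %d\n", WEXITSTATUS(status));
        } else {
            perror("fork");
            return 1;
        }
        return 0;
    }

A shell does exactly the same thing on behalf of the observer, which is why no special command interpreter is needed.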

The advantage of using a standard shell for a command line should not be underestimated. The two custom command-interpreters used so far by ING (Adamcl and ICL) have not been very successful. Writing a special command-shell is expensive and it is onerous for guest observers to learn a non-standard command syntax.

If a shell, say csh, is used to run client programs in the background, then some separate provision needs to be made for that program's prompting and text output. This is dealt with later in the design.

Figure 102 shows the flow of state information to drive mimic displays. There are two possibilities:

- The UI `speaks the language' of the real-time sub-systems and monitors their state variables directly: the right-hand branch of the diagram. This gives the highest possible throughput and the fewest processes. It is desirable for an engineering display for one sub-system but less of an advantage for an astronomical display that spans many sub-systems.

- Simple filter programs monitor each RT sub-system, preprocess the results and pass them on to the UI: this is the left-hand branch of the diagram. The filter programs are launched by the UI and pass back their results on standard Unix pipes. This arrangement is slower and has extra processes but it protects the UI from the complications of the message systems.
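
A sketch of such a filter program is shown below. Where the real filter would subscribe to DRAMA parameters or EPICS channel-access monitors, this sketch simply fabricates a value; the point is only the shape of the interface: one preprocessed line per update on standard output, read by the UI over a pipe. The variable name is invented for the example.

    /* Sketch of a state filter: monitor a sub-system variable, preprocess
     * it and pass one line per update to the UI on standard output.  The
     * subscription to DRAMA/EPICS is replaced here by a dummy read.       */

    #include <stdio.h>
    #include <unistd.h>

    /* Stand-in for a DRAMA parameter monitor or EPICS channel-access get. */
    static double read_slit_position(void)
    {
        static double pos = 118.0;
        return pos += 0.5;                   /* pretend it is moving        */
    }

    int main(void)
    {
        int i;
        for (i = 0; i < 5; i++) {
            /* Preprocessing: convert to the units the mimic wants.        */
            printf("slit_position_mm %.2f\n", read_slit_position());
            fflush(stdout);                  /* push the line down the pipe */
            sleep(1);
        }
        return 0;
    }

The UI launches the filter and reads its output over a pipe (popen() in C, or the equivalent in Tcl), redrawing the mimic each time a line arrives.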

The `directly-coupled' approach has been used so far in the DRAMA systems. It works well with clients and D-tasks both on the system computer, and between the telescope computer and the system computer. The method works less well from the autoguider computer to the system computer.

The `filtered' approach would be needed if a GUI builder were used that could not accommodate the DRAMA and EPICS message systems. It may be needed for UIs hand-coded in Tcl that talk to both DRAMA and EPICS.





Command interlocking is shown in figure 103. This interlocking is present to reduce or eliminate the loss of data or observing time due to inappropriate user commands, and the interlocking mechanism is not forceful enough to ensure safety of users or equipment in extreme situations. Because of this, the RT sub-systems must provide safety interlocks for the hardware they control.

Reasoning on the set of allowable commands is concentrated in an interlock manager in the CIA. In general, the RT sub-systems do not know enough about each other to determine the locks, and the GUIs become over-complex, unreliable and difficult to code and maintain if they manage the locks. The lock manager, which has not yet been designed in detail, is meant to be a simple engine driven by a table for the instrument of choice. It is hoped that the tables will be easier to verify than large ladders of logic.

At any time, a command may be allowed, vetoed, or warned against. Since the interlocking maintains efficiency, not safety, commands are not normally forbidden. For example, a science exposure when the telescope is not tracking is normally a mistake but could just be a special observing mode such as a trail along the spectrograph slit; a warning would be issued instead of a veto.
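
A minimal sketch of the table-driven engine envisaged here is given below. It assumes a per-instrument table that maps a command name and an observed system condition to one of the three verdicts; the table contents, the condition flags and the command name are hypothetical examples, since the lock manager has not yet been designed in detail.

    /* Sketch of a table-driven interlock manager.  Each table row maps a
     * command plus an observed condition to a verdict; the table shown
     * and the condition flags are hypothetical examples.                  */

    #include <stdio.h>
    #include <string.h>

    typedef enum { ALLOW, WARN, VETO } verdict_t;

    struct lock_rule {
        const char *command;
        int         condition;       /* index into the condition flags    */
        verdict_t   verdict;         /* applied when the condition holds  */
    };

    enum { TEL_NOT_TRACKING, CCD_READING_OUT, N_CONDITIONS };

    /* Example table for one instrument. */
    static const struct lock_rule table[] = {
        { "run", TEL_NOT_TRACKING, WARN },  /* could be a deliberate trail */
        { "run", CCD_READING_OUT,  VETO }   /* would corrupt the readout   */
    };

    static verdict_t check(const char *command, const int flags[])
    {
        size_t i;
        verdict_t v = ALLOW;
        for (i = 0; i < sizeof table / sizeof table[0]; i++)
            if (strcmp(table[i].command, command) == 0
                    && flags[table[i].condition]
                    && table[i].verdict > v)
                v = table[i].verdict;       /* keep the strongest verdict  */
        return v;
    }

    int main(void)
    {
        int flags[N_CONDITIONS] = { 1, 0 }; /* telescope not tracking      */
        printf("verdict for 'run': %d\n", check("run", flags)); /* 1 = WARN */
        return 0;
    }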

When a client wants to start an operation, it asks the lock manager for permission. If the lock manager returns a veto, the client ends with an error status. If the lock manager returns a warning, the client can negotiate with the user for permission to continue, probably prompting for a yes/no answer.

GUIs conventionally `grey out' buttons and menu items for commands that are unavailable. This effect can be achieved if the GUI monitors the state of the interlocks in the lock manager for the appropriate commands.

The lock manager is expected to monitor the state variables in the RT sub-systems to detect situations that lock commands. The exact transactions will vary between sub-systems but it would be easiest if the designers of sub-systems provided special variables for the key states. For example, the EPICS database for a science CCD should have a state variable that indicates `sub-system not initialized', `idle', `clearing', `integrating', or `reading out'. An AO sub-system might provide a boolean variable that is set to true when the sub-system can accommodate a telescope movement.
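
In C terms, the kind of key-state variables suggested here might be declared as follows; these are hypothetical declarations to show the idea, not existing record definitions.

    /* Hypothetical key-state variables a sub-system might publish for the
     * lock manager to monitor.                                             */

    typedef enum {
        CCD_NOT_INITIALIZED,
        CCD_IDLE,
        CCD_CLEARING,
        CCD_INTEGRATING,
        CCD_READING_OUT
    } ccd_state_t;

    int ao_can_accept_telescope_move;  /* boolean, set by the AO sub-system */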



Figure 104 shows the data flows involved in presenting textual messages and alarms to the user. This is a fairly complex facility aimed at presenting and logging the messages consistently, as with the talker programs in the current ING DRAMA system.

The core of the mechanism is the message-log task in the CIA. This task formats incoming messages (sets time-stamps, urgency keys and a note of the origin of each message) and writes them each as single lines of text to a disk file.

Each user of the system runs a message display which tracks the contents of the file in real time (by polling with a period of 1 to 5 seconds). The display filters the messages according to their marked urgency: alarms are presented as dialogue boxes, and routine messages can be suppressed.
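
A sketch of the polling side is given below. It assumes one formatted message per line with the urgency level as the first field; the log-file name and the field layout are assumptions, since the format has not yet been published.

    /* Sketch of a message display polling the log file.  Assumes one
     * message per line with the urgency level as the first field; the
     * file name and layout are assumptions, not the defined format.       */

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        long offset = 0;
        char line[512];

        for (;;) {
            FILE *f = fopen("/obsdata/msg/observing.log", "r");
            if (f != NULL) {
                fseek(f, offset, SEEK_SET);       /* read only new lines   */
                while (fgets(line, sizeof line, f) != NULL) {
                    if (strncmp(line, "ALARM", 5) == 0)
                        printf("** dialogue box ** %s", line);
                    else if (strncmp(line, "ROUTINE", 7) != 0)
                        fputs(line, stdout);      /* routine traffic hidden */
                }
                offset = ftell(f);                /* remember how far we got */
                fclose(f);
            }
            sleep(2);                             /* poll every 1-5 seconds  */
        }
    }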

UIs, clients and RT servers can all pass messages for display. Most messages from the RT servers are error messages of some kind and they normally go first to the clients; in this way if a client handles an error silently, it can suppress the error message. Major alarms and errors that occur outside the context of a user command are passed directly from the RT server to the message logger. GUIs can, of course, display messages themselves. However, this duplicates code and means that some message traffic doesn't get logged, so UIs should not normally do this.

The mechanism by which the messages are passed to the message logging task is a set of POSIX named pipes, one for each urgency level. The details of this mechanism should be published later in November 1997. RT servers outside the system computer can't use a named pipe (which is restricted to local tasks and doesn't work over NFS), so some communications agent is implied inside the message-log bubble to transcribe DRAMA or EPICS messages into the pipe.
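
The sketch below shows the named-pipe end of such a logger for a single urgency level; in practice there would be one pipe per level, and the pipe and log-file names are assumptions pending the document promised for November 1997.

    /* Sketch of the message-log task: read messages from a POSIX named
     * pipe (one pipe per urgency level; only one is shown), stamp them
     * and append them as single lines to the log file.  In the real task
     * the origin of each message would also be recorded.  Pipe and file
     * names are assumptions.                                              */

    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <sys/types.h>
    #include <sys/stat.h>

    int main(void)
    {
        char msg[256];
        FILE *pipe_in, *log;

        mkfifo("/tmp/ing_msg_alarm", 0666);           /* created once       */

        pipe_in = fopen("/tmp/ing_msg_alarm", "r");   /* blocks for a writer */
        log     = fopen("/obsdata/msg/observing.log", "a");
        if (pipe_in == NULL || log == NULL)
            return 1;

        while (fgets(msg, sizeof msg, pipe_in) != NULL) {
            time_t now = time(NULL);
            char stamp[32];
            strftime(stamp, sizeof stamp, "%Y-%m-%dT%H:%M:%S", gmtime(&now));
            msg[strcspn(msg, "\n")] = '\0';
            fprintf(log, "ALARM %s %s\n", stamp, msg); /* one line per message */
            fflush(log);
        }
        return 0;
    }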

If the message log is broken or absent, all clients, UIs and servers should be able to output the messages by some other means. Writing to standard error would do.

Figure 105 shows the arrangements for prompting. GUIs can write their own prompts. Clients cannot take input from the shell if they are running in the shell's background, and they are not X-windows programs in their own right. Instead, they prompt by spawning a small graphical utility that displays a dialogue box and returns the answer on its standard output stream.
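
A sketch of how a client might do this with popen() follows; the utility name ing_ask and the question text are hypothetical.

    /* Sketch of a client prompting the user: spawn a small graphical
     * utility that shows a dialogue box and read the answer back from its
     * standard output.  The utility name "ing_ask" is hypothetical.       */

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char answer[64];
        FILE *p = popen("ing_ask 'Telescope is not tracking: expose anyway?'",
                        "r");

        if (p == NULL)
            return 1;
        if (fgets(answer, sizeof answer, p) == NULL)
            answer[0] = '\0';
        pclose(p);

        if (strncmp(answer, "yes", 3) == 0) {
            /* ... carry on with the command ... */
            return 0;
        }
        return 1;                     /* user declined: exit with an error  */
    }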





Figure 106 shows the interaction between layers and RT sub-systems during data acquisition.

The acquisition, processing and filing of images happens within the data junction (shown as `DAS' in the current diagram) and need not be considered in this global architecture. Pixels go in and FITS files come out when an acquisition is triggered by commands from a client.

Each FITS file needs packets of FITS descriptors from elsewhere in the system and these are generated under control of the packet compiler, a resident task in the CIA. Each DRAMA task in the system is commanded by the compiler to generate one packet and writes this directly to disk. This mechanism was adopted to reduce the load on the DRAMA message system. For the EPICS sub-systems, it has been proposed that the compiler runs a filter program to parse the sub-system's state variables and write the packet. This method would also work for the DRAMA sub-systems and, given that a new packet compiler must be produced for the new approach, it may be worth standardizing. In favour of standardization, it is much easier to maintain a filter program than a code fragment embedded in a server, and the DRAMA transaction handling in the current packet compiler isn't particularly robust. Against standardizing, the existing D-tasks have been tested and produce acceptable packets already.
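
As an illustration of what such a filter might produce, the sketch below writes a handful of 80-character FITS header cards to a packet file; the keywords, values and file name are invented for the example, and the real filter would obtain the values by reading the sub-system's state variables.

    /* Sketch of a FITS-packet filter: take a sub-system's state (faked
     * here) and write it as 80-character FITS header cards to a packet
     * file.  Keywords, values and the file name are illustrative only.    */

    #include <stdio.h>

    static void card(FILE *f, const char *key, const char *value,
                     const char *comment)
    {
        char buf[81];
        /* 8-char keyword, "= ", 20-char value, " / ", 47-char comment = 80 */
        snprintf(buf, sizeof buf, "%-8s= %20s / %-47s", key, value, comment);
        fprintf(f, "%-80s", buf);            /* pad every card to 80 chars  */
    }

    int main(void)
    {
        FILE *f = fopen("ids_packet.fits", "w");
        if (f == NULL)
            return 1;

        card(f, "SLITWID", "1.50", "slit width (arcsec)");
        card(f, "GRATING", "'R1200R  '", "grating in use");
        card(f, "CENWAVE", "6562.8", "central wavelength (Angstrom)");

        fclose(f);
        return 0;
    }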

To get the images displayed as they are produced, I propose that the FITS files be read back from disk by a display client. That client would be permanently resident in the CIA and would process the files for use by the standard IRAF image displays such as Ximtool. That is, the display client would be an automatic version of the IRAF display command. To reduce the delay in displaying, the display client should monitor a state variable in the data junction that indicates when a file is ready for display.
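
A sketch of such a resident display client follows. In place of the monitor on the data junction's `file ready' state variable, this sketch simply polls the observation-data directory for a new FITS file; the paths and the display_fits command (standing for whatever drives the IRAF display) are assumptions.

    /* Sketch of the resident display client.  In place of a monitor on the
     * data junction's "file ready" variable, this sketch polls a directory
     * listing for a new FITS file and hands it to the display.  The paths
     * and the "display_fits" command are assumptions.                      */

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char last[256] = "";
        char newest[256];
        char cmd[512];

        for (;;) {
            FILE *p = popen("ls -t /obsdata/ints/*.fits 2>/dev/null | head -1",
                            "r");
            if (p != NULL) {
                if (fgets(newest, sizeof newest, p) != NULL) {
                    newest[strcspn(newest, "\n")] = '\0';
                    if (newest[0] != '\0' && strcmp(newest, last) != 0) {
                        strcpy(last, newest);
                        /* Hand the new file to the display (command assumed). */
                        snprintf(cmd, sizeof cmd, "display_fits %s", newest);
                        system(cmd);
                    }
                }
                pclose(p);
            }
            sleep(5);   /* a true state-variable monitor would remove this lag */
        }
    }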