NAOMI
Nasmyth Adaptive Optics for Multi-purpose Instrumentation
The Real Time Control System Programmer's Guide
wht-naomi-24
Version: 0.3, 24 December 2002
Authors: Richard Myers (1), Stephen Goodsell (2)
1. University of Durham, Department of Physics
2. Isaac Newton Group of Telescopes
This document describes how to programme the real-time control system of NAOMI, how the existing real-time programmes are structured and how they can be modified.
A separate document “NAOMI Hardware Reference Manual”
provides a description of the real-time control electronics rack.
A separate document “The GP Messaging Library” describes the general purpose asynchronous messaging system used to configure and monitor the real-time system.
A separate document “Real Time Control System User’s Guide” describes the use of the low-level, engineering-level command-line access methods, including setting up and executing software configurations as well as manipulating and monitoring a running system.
A separate document “Naomi Engineering and Control Program: TopGui” provides a description of higher-level control, monitoring and sequencing of real-time processes.
NAOMI’s real-time software is derived from Durham University’s Electra software and many of its files and nomenclature use the word Electra. Most of the underlying design of the software and much of the programming is by David Buscher. Other design features were evolved in discussion with Andy Vick. Craige Bevil, David Buscher, Nigel Dipper, Peter Doel, Szilveszter Juhos, Patrick Morris, Richard Myers and Andy Vick evolved the workstation client-side software. (The staff of the UKATC produced the Mechanism and WFS control software but this software has its own documentation and this Guide does not describe it further.)
Parts of this Guide borrow freely from David Buscher’s documentation.
NAOMI stands for Nasmyth Adaptive Optics for Multi-Purpose Instrumentation. It is the Adaptive Optics (AO) system on the William Herschel Telescope (WHT).
The purpose of this document is to enable maintenance and development of the NAOMI real-time control system. An understanding of the purposes and the basic construction and operation of NAOMI is assumed. An introduction to astronomical adaptive optics is available in John W. Hardy’s book Adaptive Optics for Astronomical Telescopes (OUP). Introductory information on NAOMI’s overall design requirements may be found at http://aig-www.dur.ac.uk/fix/projects/projects_index.html. The accompanying Technical Description is useful but describes the design rather than the final system. An updated general description of the as-built NAOMI system is in preparation by A. J. Longmore (UK ATC at ROE) and R. M. Myers (Durham) at the time of writing this document.
This Guide includes a brief history of the real-time system software, sufficient only to understand the nature of the collected code. It describes the Revision Control system used, how to set up as a developer and gain access to the source code, and where the key files are located in the software directory tree.
Subsequent sections describe the hardware architecture of the real-time control system (briefly) and the corresponding software architecture. The structure of the principal programmes is then explained. The method of building new programmes and generating new real-time configurations is then described, followed by brief descriptions of the nature of the connections to workstation client processes.
The final sections document the software files in detail, and briefly introduce some ways in which the software might develop.
AO | Adaptive Optics | partial removal of the effects of atmospheric turbulence on image quality
BSP | Bulk Synchronisation Parallelism | real-time programming methodology used in NAOMI (University of Oxford)
C40/C44 | Texas Instruments DSP | DSP optimised for interprocessor communications
CCD | Charge Coupled Device | light-sensitive detector
DM | Deformable Mirror | wavefront phase corrector
DSP | Digital Signal Processor | processor optimised for signal processing
Electra | Durham University AO system, software and DM | see History section below
EPM | Electra Process Monitor | process variable database portion of the NAOMI Sequencer
FSM | Fast Steering Mirror | wavefront tip-tilt corrector
GHRIL | Ground-based High Resolution Imaging Laboratory | Nasmyth platform of the WHT
GP | General Purpose message protocol | asynchronous messaging system used to/from and between C40s
INGRID | ING infrared camera | can be used as a science camera for NAOMI
ISR | Interrupt Service Routine | code executed on processor interrupt
LoveTrain | Synchronisation packet | synchronous messaging system used between C40s in the WFS and DM rings during ISR execution
OASIS | Optical spectrograph | integral field spectrograph to be used with NAOMI
Python | High-level language | scripting language for NAOMI
Ring Leader | C40 brokering transaction IDs | C40 used to coordinate parameter block transactions within a C40 ring
Sequencer | Process launcher/monitor | processes used by NAOMI supervisory software
WFS | Wavefront Sensor | optical system for wavefront phase distortion measurement
Code is indicated by this type format.
Future expansion notes in the main text are indicated in this type format.
NAOMI’s real-time software is derived from Durham University’s Electra software and many of its files and nomenclature use the word Electra.
The Electra AO system’s ThermoTrex 228-degree of freedom mirror, with its internal figure sensing system, was adopted for the NAOMI system in 1996. This followed a PPARC review recommendation and a decision by the NAOMI consortium (then ING, Durham, RGO, ROE). A Memorandum of Understanding between ING and the University of Durham describes the transfer and reciprocal arrangements.
The Electra mirror requires unique control system features because of its internal feedback capability. The design of Electra’s real-time control software predated the inception of NAOMI and is essentially a superset of NAOMI’s requirements in terms of the flexibility of both the underlying architecture and the visualisation system. On the other hand, it interfaces to different WFS and FSM hardware and does not cover NAOMI’s operational requirements. Rather than write wholly new software on this scale, the Electra software was adopted as a whole and new interfaces and a unified engineering GUI (TopGui) were added within the Electra software structure. New documentation requirements were added. This document is part of that process.
The NAOMI mechanism and WFS control software is new and was written at UK ATC. It does not need to be within the Electra software structure and, indeed, is not.
The higher-level coordination software (the Sequencer and TopGui) necessarily forms part of the Electra structure because of its need to communicate intensively with the real-time control system (e.g. for WFS image display).
The adoption of the Electra software package as a whole has resulted in the availability of many little-used library routines and applications. Several of these applications (WFSAlign, MirrorMimic, DataDiag) were of great use during NAOMI’s commissioning phase but have now been superseded by TopGui. In all these cases it is not proposed to provide extensive documentation of these facilities and it is anticipated that they will be removed from the supported release in due course.
The Electra software system supports development in two languages: C and Python (with Numeric extensions). Python provides both a scripting facility and a rapid development high-level language that is also efficient in execution.
This section is freely adapted from David Buscher’s Quick Guide to Electra Software Development. It describes the method for developing Electra workstation and c40 code in general. It also describes the location of the principal source code for the real-time control system parts of Electra. The details of how to build, maintain, modify and enhance this particular code are deferred to subsequent sections.
Electra uses the BCS baseline control system to allow multiple developers to work on one body of code. Both source code and documentation are stored in one central directory tree, called a baseline. Software developers maintain their own private copies of this baseline, called staging areas, and the BCS system takes care of synchronising the private copies with the baseline. The Electra developer needs to be aware of three copies of the source tree, which are rooted in
/software/Electra_src_tree
$STAGING (usually $HOME/Electra) and
/software/Electra
The first of these is the baseline, which contains the ‘master’ copies of the source files. No object files or executables are ever built in this tree, and manipulation of this tree is mostly indirect via BCS commands.
The second tree, $STAGING, is a private staging area for a given user. It can have any name but we have given it the name $HOME/Electra for concreteness here. Normally its subdirectories contain pointers (symlinks) to read-only files in the corresponding subdirectories of /software/Electra_src_tree. When these files are staged real writable copies replace the symlinks. The user can then build modified versions of executables in their staging areas without affecting installed executables and other users. The RCS revision control files within each directory are common to all users, however, and locks are used to prevent two users staging the same file at the same time.
The third (partial) tree is a public staging area. It is like the private staging area, but is used for sharing built versions of the code, e.g. libraries etc.
The normal development process is to develop and test code in the private staging area, and once it has been tested to install the built versions in
/software/Electra
and the source code in
/software/Electra_src_tree.
The /software/Electra/bin directory contains executables, the /software/Electra/lib directory contains object libraries and configuration files for the built code. Likewise the /software/Electra/include directory contains common include files which are shared between packages.
The files in /software/Electra/{bin,lib,include} are the
stable versions of these files, while those under the developers' private trees are developmental versions. Compiler search paths for include files and library files may search the user’s private directories first, and then the relevant /software/Electra directory. In this way, a developer first picks up the versions of the code s/he is currently developing in preference to the stable version. Clearly though, once development of a package has been completed, the private directories should be cleaned out. This allows the latest version to be picked up from /software/Electra, which is helpful if someone else later updates the code. Similar comments apply to the order of bin directories in the shell PATH variable.
There are several scripts to aid the development process. These scripts are installed in the /software/Electra/bin directory, and the originals are in /software/Electra_src_tree/tools (and can themselves be accessed from, and staged into, $STAGING/tools). The functions of the two most commonly used scripts are as follows:
bcs_mkdirs | Sets up the directory tree for a developer
bcs_publish | Checks in source files which have been checked out and updates the baseline copies of the files. Used to “publish” any updated versions of the source code in a given tree/subtree.
The /software/Electra_src_tree/config directory contains files which are included by makefiles to configure the make process. These files can be accessed from $STAGING/config.
The /software/Electra_src_tree/scripts directory contains startup scripts for setting up the development process and also for starting daemons which are used at runtime. These files can be accessed from $STAGING/scripts.
Historical note: the /software/Electra_src_tree/docs directory tree contains some early documentation about the Electra system. It is mostly in the form of LaTeX source files. A makefile in these directories converts these files to HTML and installs them in an HTML tree.
All the rest of the subdirectories of /software/Electra_src_tree, and the corresponding directories of $STAGING/ are C and Python source code trees. These trees contain by convention subdirectories libsrc and appsrc to hold code for building libraries and executables respectively.
Here is a recipe for how to develop a new package. Skip the stages you have already done as necessary.
1. Set up your login files. Edit your .cshrc file to include the lines

setenv BASELINE /software/Electra_src_tree
setenv STAGING $HOME/Electra
source /software/Electra/bin/setup.csh

Then log in again or re-source your .cshrc. You will obviously have to adjust this process if you do not use csh or tcsh.
2. Create a private staging area using

mkdir $STAGING

Then type bcs_mkdirs. This command should be used any time someone else has created new directories in the baseline.
3. Make a new subdirectory if a totally new package is being made. Type

cd $STAGING
mkdirhier myPackage/appsrc
cd myPackage/appsrc
register_file Makefile
cp ../../c40Comms/appsrc/Makefile .

This will create the directory and make a mirror copy in the baseline. The example shows stealing a makefile from another directory as the starting point; this should be edited to suit the package being built.
4. Register any other new files with the BCS system, e.g.

register_file mysrc.c mysrc.h

Edit the files in this area, then compile and test them.
5. Once the files at least compile, the source files can be checked into the RCS system for version control:

bcs ci -l mysrc.c mysrc.h
6. When the files compile and have been tested, install the built files in the (public) baseline directory and put the latest versions of the source files into the baseline:

gmake install
bcs_publish .
7. There is a (little-used) process to make a release:

cd ~/Electra
bcs_tag_tree myPackage

This will tag all the RCS files in the myPackage tree with the tag myPackage1. The next time you do this the files will be tagged myPackage2, and so on.
The c40 DSP processors are described further below. For now it suffices to note that they are the main processors for actual real-time operations.
By convention, c40 executables are denoted by the .x40 suffix and the object files are denoted by the .o40 suffix. This allows two versions of a given program, one running on c40s and one on workstations, to be built in the same directory.
In a directory where c40 programs are to be built, the makefile, like all Electra makefiles, should include the line
include $(STAGING)/config/Electra.mk
A target build line might look something like
c40Echo.x40 : c40Echo.o40
$(C40_CC) $(C40_LDFLAGS) -o $@ $^ \
-L$$STAGING/lib -L/software/Electra/lib \
-lGPmsg.lib -lUtil.lib
The c40_cc program is a front-end to the TI C40 compiler that makes it appear much more like a standard Unix compiler.
The library files shown in the make recipe provide GP messaging and utility functions (currently only the exception-handling facilities). The c40_cc compiler front end automatically includes the C run-time system.
After a user account has been set up for the NAOMI (Electra) RealTime software, the STAGING environment variable will be defined (generally as /home/user/Electra). The setup procedure should also have produced a subdirectory (amongst others) called:
${STAGING}/RealTime/ - base directory for real-time source code
This directory in turn has the following subdirectories:
appsrc/ - real-time workstation client applications (low level engineering level)
pythonModules/ - workstation python client support libraries
libsrc/ - c40 libraries
WFS/ - c40 WFS ring code
StrainGauge/ - c40 Strain Gauge ring code
The following files in the RealTime/WFS directory are the ones which normally need to be edited to alter real-time WFS processing behaviour or to add new control parameters.
AlgNaomiInterleave.c | Contains the WFS centroid estimation algorithm
AlgMartini.c | Contains the WFS reconstructor and tip-tilt-piston calibration
NaomiGenericBSP.c | Hosts AlgNaomiInterleave. This is the main function of NaomiGenericBSP.x40, which loads onto all the C44s in the WFS ring apart from the mirror CPU (GP number 4).
NaomiMirrorBSP.c | Hosts AlgMartini. This is the main function of NaomiMirrorBSP.x40, which loads onto the mirror CPU (GP number 4) of the WFS ring.
Makefile | makefile (uses ../Makefile)
The corresponding files in the RealTime/StrainGauge directory are:

AlgSGadc.c | Strain Gauge reading, calibration and servo algorithm
AlgSGmirror.c | Mirror output algorithm
AlgSGtimer.c | Strain Gauge ring timer algorithm
SGBSP2.c | Hosts all the above algorithms. This is the main function of SGBSP2.x40, which loads onto all the C44s in the strain gauge ring.
Makefile | makefile (uses ../Makefile)
The ${STAGING}/SharedInclude directory contains ‘master’ python files which are used to generate C and python include files. These ensure that workstation and C40 programmes have the same correspondence between C enum types (and python strings) and command-identifying numbers in GP communication packets.
ParameterBlocks.py - is used to identify new parameter block transactions
Makefile – gmake include will regenerate the include files and install them
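As an illustration of the master-file idea, here is a minimal sketch (with hypothetical parameter block names, not the actual NAOMI identifiers) of how one Python definition can generate both a C enum and a matching Python lookup table, so that workstation and C40 code agree on command-identifying numbers:

```python
# Hypothetical master definition in the style of ParameterBlocks.py.
# The names below are illustrative only.
PARAMETER_BLOCKS = [
    "PB_CENTROID_GAINS",
    "PB_RECONSTRUCTOR",
    "PB_SERVO_GAINS",
]

def make_c_enum(names, enum_name="ParameterBlockID"):
    """Emit a C enum giving each name a fixed integer value."""
    lines = ["typedef enum {"]
    for i, name in enumerate(names):
        lines.append("    %s = %d," % (name, i))
    lines.append("} %s;" % enum_name)
    return "\n".join(lines)

def make_python_table(names):
    """Emit a Python dict mapping each name string to the same value."""
    entries = ", ".join("'%s': %d" % (n, i) for i, n in enumerate(names))
    return "parameterBlockIDs = {%s}" % entries

c_header = make_c_enum(PARAMETER_BLOCKS)   # written to a C include file
py_module = make_python_table(PARAMETER_BLOCKS)  # written to a Python module
```

Because both outputs are generated from the same list, a C enum value and its Python string can never drift out of step.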
The required NAOMI real-time processing functions are:
The Texas Instruments TMS320C40 Digital Signal Processor (DSP) was selected as the main processor for implementation of the real-time control system for NAOMI. It has the following features:
Note that the TMS320C44 is actually used in the NAOMI control system. This differs from the c40 by having 4 instead of 6 communications ports. A full c40 is used for the diagnostics processor.
The figure below shows the connections of the c44 communications for NAOMI. Interprocessor connections and external connections are indicated, as are the assigned GP numbers for each processor.
16 of the c40s have their communications ports interconnected so as to form two rings of eight processors each. One ring is used for the WFS algorithms and one ring for the Strain Gauge processing. The figure below shows how the c40 rings connect to the external sensing, actuation and diagnostics systems.
Together with the software architecture described below, the ring structures enable the real-time processing requirements to be fulfilled with fewer DSPs than with an equivalent farm architecture. The development overheads are also reduced.
The WFS ring contains 8 c44 processors (GP numbers 1 to 8). Two are connected via communications ports and interface cards to the data output from the “Steward” parallel output ports of the SDSU controllers for the WFS CCDs. The interface cards, produced by Durham, buffer the incoming data and respond to synchronisation and control signals from a further communications port, which carries commands from the c40s. P. Clark describes the interface cards in the NAOMI Hardware Reference Manual.
The 8 processors of the WFS ring are hosted on a single VME card: a Blue Wave Systems (UK) DBV44 card. This card is a motherboard which hosts 4 TIM processor daughtercards, each of which carries 2 C44s. The DBV44 card routes communications ports to the VME P2 bus as well as to the front panel, and it is the P2 ports which carry the WFS interfaces. Front panel connections are used for the communications ports that link the WFS ring to the strain gauge ring (one port) and to the diagnostic CPU.
The DBV44 has Link Interface Adapters (LIAs) which connect some of the communications ports to the VME bus, where they appear as memory mapped registers. The four LIAs are used to provide communications with the VME host processor (see below) and thence, via Ethernet to the outside network.
The Strain Gauge (SG) processing ring is designed to be as similar as possible, in both hardware and software, to the WFS ring (above). It too consists of 8 C44 processors (GP numbers 11 to 18) hosted on a DBV44 VME card. In this case the sensor data come from the strain gauge Analogue to Digital Convertors (ADCs). The ADCs are contained in 3 VME cards produced by Pentland, UK, which are specialised in that they have C40 communications port outputs. There are 256 16-bit ADC channels altogether, and each of them is capable of sampling at 85 kHz and delivering the data via a c40 communications port. The configuration of these ADC cards is done via the VME bus from the c40Comms process on the VME host processor (see below). They are configured to deliver the digitised data via 6 communications ports to 6 processors (the ADC CPUs) of the strain gauge ring: 2 ports carry 64 strain gauge channels each and 4 ports carry 32 strain gauge channels each. The readout order is a little complicated and downloadable tables are used by the c40s to reorder the data. The connection to the processors is organised so that the 2 64-channel ports could have half of their respective data transferred to unloaded neighbour C44s in order to achieve a better load balance of 32 channels per processor (future upgrade).
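The table-driven reordering can be sketched as follows. This is a toy model with invented values; the real downloadable tables and channel counts are as described above:

```python
# Sketch of lookup-table reordering of ADC samples.
# table[i] gives the strain gauge channel number for the sample
# arriving at position i in the readout stream.

def reorder(raw_samples, table):
    """Place the sample arriving at position i into slot table[i]."""
    ordered = [0] * len(raw_samples)
    for i, sample in enumerate(raw_samples):
        ordered[table[i]] = sample
    return ordered

# Toy example: 4 channels read out in interleaved order 0, 2, 1, 3.
table = [0, 2, 1, 3]      # arrival position -> channel number
raw = [10, 30, 20, 40]    # samples in arrival order
assert reorder(raw, table) == [10, 20, 30, 40]
```

Because the table is downloadable, a change in ADC readout wiring needs only a new table, not new c40 code.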
An output port from one of the C44s (the timer CPU) is connected to a Durham interface card, which produces a trigger signal for the ADCs. Therefore the SG ring must provide its own source of interrupts, as it is responsible for the conversion trigger to the ADCs. This is in distinction to the WFS ring where the SDSU controllers free-run and provide the interrupts.
The output from the SG ring is carried by a panel front communications port to a Durham DAC interface card and thence to the Durham DAC rack. A synchronisation signal is carried to the same interface on a separate communications port. There are 256 13-bit DAC channels of which 228 are used by the DM. Two DAC channels (30 and 31) are used for the FSM and are connected to the Zeiss (Jena) driver rack. The DM analogue signals go to the Durham drive amplifier rack.
Like the WFS ring, the SG ring uses its LIAs to communicate with the VME host processor. Likewise it has a panel front communications port connection with the diagnostic CPU.
The diagnostic CPU is a C40 (GP number 9) located on a Blue Wave Systems (UK) DBV46 VME card. The CPU is one of two fitted as standard to the DBV46. The other CPU (GP number 10) is spare capacity. The DBV46 can also host TIM cards carrying additional processors but none have been fitted.
The DBV46 has dual-ported memory, which is read/write accessible both from the C40s and from the VME32 bus. It is this channel that is used for downloading diagnostics data to the VME host.
Unfortunately the DBV46 has no LIA connections and therefore must be booted indirectly from one of the WFS ring CPUs via its front panel communications port connection.
The current VME host is a Force VME card carrying a SPARC 5V processor. It runs the Sun Solaris operating system and can therefore mediate between the C40s and the external network. It has its own disk and is an autonomous computer. A console may be attached if required via an RS232 connection. Normal communication goes via a 10BT Ethernet port.
For mostly historical reasons this processor currently carries the C40 cross-development software. The reason is that one of the C40 development tools, the DB40 debugger, must run on this computer in order to access the JTAG scan chain via the VME bus. This debugger is only used very rarely now and there is no fundamental reason why this computer should continue to host the other C40 cross-development tools.
The current Force card will probably be replaced with an updated one based on an UltraSPARC processor. Such a card will be fitted with a 100BT Ethernet port.
The control and monitoring processes of the remainder of the NAOMI control system may be distributed anywhere on the Internet, at least in theory. Some of them have been ported to SGI and Linux hardware, for example. In practice they are run on the NAOMI workstation, navis, a dual processor UltraSPARC system.
The software architecture fulfils the overall requirements given in the hardware architecture section (above) and within the constraints imposed by the selected hardware (above). It is important to also bear in mind the history of the design (above), i.e., that the NAOMI software is derived from the Electra software and that the Electra requirements were in some respects a superset of those for NAOMI. Some software features are therefore not strictly traceable to NAOMI’s requirements. These features generally take the form of additional flexibility and diagnostic capabilities. An example of additional flexibility is the ability to perform synchronised switching of interrupt processing algorithms as well as just run-time parameters.
One important example of an Electra-specific feature is a code design that allows the WFS data from a given CCD to be transmitted as separate quadrants to more than one CPU. This is indeed partially exploited by NAOMI in its support for more than one WFS CCD. Also some per-quadrant processing is really necessary because of the quadrant dependence of the bias and the (interleaved) read order. It would therefore be difficult to untangle in retrospect how much of the per-quadrant processing is really superfluous to NAOMI’s requirements.
The software User Requirements Document for NAOMI as a whole lists various direct requirements for real-time processing and indirect ones for control, status display and visualisation. These are effectively covered by the requirements given in the above hardware architecture section and the Electra capabilities. Note, however, that some requirements, for example, 3D diagnostics, though met in general by Electra’s capabilities, have not yet been commissioned for NAOMI use due to time constraints and their low priority in practice.
In developing a parallel real-time processing system, a very major issue is controlling the development time required to generate a stable structure and to subsequently maintain and enhance it. Following studies of parallel operating systems and an experimental evaluation of one proprietary language, it was decided to adopt selected aspects of the Bulk Synchronisation Parallelism (BSP) methodology developed by Oxford University, UK. The aims of this methodology are ease of development and stability. Its adoption has been successful. Consequently there is no proprietary code within the real-time software. The aspects of BSP not adopted have normally been omitted for obvious reasons: e.g., random process placement would not work with fixed interface connection nodes.
The BSP methodology requires that all processes wait to communicate until a global synchronisation step. All processes wait for this step to complete regardless of whether they have anything to transmit or receive. The step is not complete until all interprocess communications have finished. The global progress of the parallel processing system is therefore divided into supersteps by these synchronisation barriers. The figure below illustrates the general idea.
The advantage of all this is that it makes parallel software development much easier in practice. Each process ‘knows’ that if it is in superstep n then every other process must be in superstep n too. It also ‘knows’ that the data available to every other process is precisely that which is dictated by its being in superstep n. The developer can safely program the message passing of any process at any stage knowing the states of all other receiving and transmitting processors. Furthermore, if an error were reported by process X whilst in superstep m and this were thought to be due to the activities of another process, then the developer can ascertain, simply by inspecting code, the program and data states of all other processes within the system at the time of the error to within the resolution of one superstep: they will all be at their respective superstep m. Without such a scheme, complex and time-dependent webs of interprocess dependency can develop.
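The superstep discipline can be illustrated with ordinary Python threads standing in for the ring processors. This is a sketch of the concept only; the real implementation runs in the C44 ISRs:

```python
# Minimal BSP sketch: no thread may enter superstep n+1 until every
# thread has completed superstep n, enforced by a global barrier.
import threading

N_PROCS = 4
N_SUPERSTEPS = 3
barrier = threading.Barrier(N_PROCS)
trace = []
lock = threading.Lock()

def worker(pid):
    for step in range(N_SUPERSTEPS):
        with lock:
            trace.append((step, pid))   # local compute phase
        barrier.wait()                  # global synchronisation step

threads = [threading.Thread(target=worker, args=(p,)) for p in range(N_PROCS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Because of the barriers, every superstep-n entry in the trace
# precedes every superstep-(n+1) entry, whatever the scheduling.
steps = [s for s, _ in trace]
assert steps == sorted(steps)
```

The assertion at the end is exactly the property described above: a process in superstep n can rely on every other process also being in superstep n.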
In practice, the BSP methodology is applied separately to the two rings of C44 processors, and then, in fact, only to the software within their Interrupt Service Routines (ISRs). The use of ISRs for the time-critical processes is pretty well dictated by the requirement that diagnostic/visualisation/logging activity should have no effect on the timing or dataflow within the latency-critical processing. We do not have multi-ported memory on each CPU so there must be C44 involvement in diagnostic/visualisation/logging activity. If this activity is not to affect time-critical processing then it follows that it must be interrupted by signals indicating the arrival of new real-time data. In the case of the WFS ring, the signal is the arrival of a new frame of WFS data. In the case of the SG ring, the signal is the arrival of newly digitised strain gauge data from the Pentland ADCs. There is of course the issue of processor interrupt latency but the effects of that are effectively eliminated by the use of DMA transfers for the interrupting WFS and SGS data, combined with the “cowcatcher” system (below).
The diagnostics/visualisation activity effectively forms a background activity and clearly needs to use an interruptible, and therefore asynchronous, communications system to exchange data with external workstation processes. For example, a WFS processor sending pixel data to a workstation GUI for display must be interrupted mid-message by the arrival of new pixel data coming from the CCD controllers. The physical medium for this background communication is the network of interconnecting communications ports. An asynchronous message-passing protocol is therefore required. This is called GP, for General Purpose, and is described in detail in David Buscher’s document “The GP Messaging Library”. It is called “General Purpose” because it is also used for transmission of command and status information to/from the C40/C44 processors. Such a protocol is clearly well suited to retrieving status data and can also be used for sending commands to the C44s, provided there is some means of synchronising the actual changes in real-time parameter states on different processors. This requirement is fulfilled by the transaction system, which is described below.
GP message packets originating from a C40/C44 processor are forwarded from processor to processor according to a destination address embedded within the packet. Other internal data fields, as described below, further identify their contents. Diagnostic data packets arriving at the diagnostic C40 (GP number 9) are placed in a shared memory buffer, ready for transmission via the VME bus to the VME host. Other packets typically travel via LIAs to the VME host. Either way, the server process, c40Comms, running on the VME host, embeds the GP packets within appropriately addressed TCP/IP packets for onward transmission via the internet. The c40Comms process also performs a reverse procedure, extracting GP packets from TCP/IP wrapping and forwarding them via LIAs into the C40/C44 network.
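The per-hop forwarding decision can be sketched as follows. The field names follow the GPmsgHdr declaration given later in this Guide, but the routing rule and the hop limit shown here are illustrative, not the actual NAOMI values:

```python
# Schematic sketch of GP packet handling at one CPU: consume packets
# addressed to this CPU, otherwise increment hopCount and forward.
# A hop limit traps packets which can never reach their destination.

MAX_HOPS = 32   # illustrative limit only

def route(packet, my_gp_number, next_hop):
    """Return 'consume', 'drop', or 'forwarded' (via next_hop)."""
    if packet["destAddress"] == my_gp_number:
        return "consume"
    packet["hopCount"] += 1
    if packet["hopCount"] > MAX_HOPS:
        return "drop"        # trapped: destination unreachable
    next_hop(packet)         # hand on to the next processor
    return "forwarded"
```

A real C40 additionally distinguishes workstation addresses (CPU number 0), which are always forwarded towards the c40Comms process, as described in the GP section below.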
The ISR code cannot use the GP system directly for interprocessor communications. Firstly, this is because GP is not synchronous and would not meet the processing latency requirement. Secondly, in the context of an ISR the GP system will have been interrupted by the ISR code, and such a protocol could hardly be made re-entrant without a significant loss of efficiency. A second, synchronous, interprocessor communications protocol, one which embodies the Barrier Synchronisation idea, is therefore required for use by the ISRs. The unit of this protocol is called a LoveTrain©, primarily for memorability.
LoveTrains use the ring of interconnections of the C44s to broadcast information between them. Each processor in the ring sends its own output information for broadcast (if any) to its downstream neighbour and then copies information from its upstream neighbour to downstream. The processor’s output information is therefore copied right around the ring to the processor immediately upstream, which does not copy it further (i.e., information does not return, redundantly, to its origin). The LoveTrain implements the BarrierSynch because communication is global and cannot finish until all processes have started communicating. The expected quantity of data to be sourced, copied and removed by each processor is coordinated near the beginning of each ISR. This is done by an initial, special BarrierSynch which uses the first LoveTrain after the cowcatcher to broadcast the anticipated number and size of LoveTrain contributions from each processor. The final BarrierSynch of each ISR is also special. It contains the StopMessage, which is used to decide if RealTime processing should cease at the current interrupt. The StopMessage also implements the transaction system, whereby real-time parameter changes, which have been scheduled on several processors, all become effective at the next ISR.
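A toy model of the LoveTrain broadcast (pure Python, ignoring the physical communications ports and DMA) shows how each processor's contribution reaches every other processor without redundantly returning to its origin:

```python
# Ring broadcast sketch: in each relay slot every processor receives
# one contribution from its upstream neighbour and (conceptually)
# copies it downstream. After n-1 slots each processor has seen all
# n-1 other contributions; nothing is ever copied back to its origin.

def lovetrain(contributions):
    n = len(contributions)
    # Each processor starts knowing only its own contribution.
    known = [{i: contributions[i]} for i in range(n)]
    for hop in range(n - 1):
        for i in range(n):
            # The item arriving at processor i in this slot originated
            # hop+1 positions upstream in the ring.
            origin = (i - 1 - hop) % n
            known[i][origin] = contributions[origin]
    return known

result = lovetrain(["a", "b", "c", "d"])
# Every processor ends up knowing all four contributions.
assert all(len(k) == 4 for k in result)
```

Note that `origin` never equals `i` inside the loop, which models the rule that information is removed by the processor immediately upstream of its source rather than travelling full circle.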
The communication port links used by the ISRs to transmit LoveTrains can also be used by the GP system outside of the ISRs. This is achieved by insisting that the two protocols travel in different directions on the bi-directional links. This is possible because each link direction has its own FIFO buffer system.
The General Purpose Message system (GP) allows asynchronous communications between any two C40/C44s and between a C40/C44 and an external processor. This is achieved by a simple message passing and forwarding system within the C40 network and by wrapping the GP messages in TCP/IP packets (accessed using the DTM library) for transmission via the internet.
David Buscher’s “The GP Messaging Library” deals with the GP system in detail. Here we summarise a few key features.
GP messages consist of a fixed-length header and a variable-length body. The header fields include a length and source and destination address fields. The body carries most of the actual data and can be up to 3300 32-bit words long. This is the C declaration for the header structure:

    typedef struct {
        int32  length;       /* Length of the entire packet in words - 1 */
        uint32 protocolID;   /* Synchronisation word */
        uint32 destAddress;  /* Destination machine + port address */
        int32  hopCount;     /* Incremented on every hop - used to trap
                                messages which never reach their destination */
        int32  command;      /* Command/acknowledge verb */
        int32  sequenceID;   /* Unique message tag for multiple messages
                                of the same type */
        uint32 replyAddress; /* Reply-to machine + port address */
        int32  arg1;         /* Optional command argument - pads to 8 words */
    } GPmsgHdr;
The replyAddress field is used so that a reply can be sent if appropriate. The 32-bit address fields combine a CPU number with a port number. For a C40/C44 address, the port number is always 0, whilst the CPU number is the GP number assigned to the CPU at boot time. The C40s/C44s in NAOMI are numbered 1 to 18, with the c40Comms process on the VME host having the special number 0.
For a workstation process address, the CPU number is always 0, so a C40/C44 routing such a packet simply forwards it to the c40Comms process on the VME host, which is the correct behaviour. The port number is non-zero in this case and identifies to c40Comms the final destination of the packet on the internet. The c40Comms process maintains a table translating port numbers to DTM ports. DTM ports are identified by strings, which include both an IP address and an IP port number or symbolic name. When a client workstation process opens its communications with c40Comms, its first action is to request (via a GP command) that c40Comms make an association between the client’s DTM port for GP replies and a new GP port number. The client sends the DTM port name as an ‘argument’ to the GP command and receives the generated port number as a reply. It can then use this port number as a reply address in the header of subsequent GP commands directed to C40 CPUs. The GP commands from workstation clients are wrapped in DTM packets addressed to c40Comms, which then extracts the GP command and forwards it via an LIA to the C40 network. The reverse process happens to a reply GP message from the C40s to a workstation client: in this case c40Comms wraps the GP message in a DTM packet addressed to the DTM port corresponding to the GP port number in the destination field of the GP packet. It is also possible for a workstation client to request a mapping from a port number to a DTM port attached to a “third party” workstation process. This is intended for directing diagnostic GP packets to a display or logging process. In practice, however, many processes request their own diagnostic packets.
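As an illustration of the addressing scheme, the packing and unpacking of the two sub-fields might look like the following. The 16/16 bit split assumed here is purely for illustration; the actual field widths are defined in the GP headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing of a GP 32-bit address into a CPU number and a
 * port number.  A 16-bit/16-bit split is assumed for illustration only. */
#define GP_ADDRESS(cpu, port) (((uint32_t)(cpu) << 16) | ((uint32_t)(port) & 0xFFFFu))
#define GP_CPU(addr)          ((uint32_t)(addr) >> 16)
#define GP_PORT(addr)         ((uint32_t)(addr) & 0xFFFFu)
```

Under this scheme a C40/C44 destination always has a zero port field, while a workstation destination has CPU number 0 and a non-zero port number that c40Comms translates to a DTM port.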
GP command headers contain a command field, which, like the destination and reply addresses, is also divided into two sub-fields. In this case the fields are a command class and an actual command identifier. The class field identifies to a standard C40 program which GP callback function it should invoke in order to further identify and process all commands of that class. An example is found in the file
${STAGING}/SharedInclude/c40RealTimeCommands.py
This contains the python statements which define the python strings associated with a new class and its commands. The installation process uses these statements to generate a C include file which contains enum statements that make the same association.
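For illustration, the generated include file might contain an enum along these lines. The numeric values below are invented for the example; the real ones come from the python definitions.

```c
#include <assert.h>

/* Hypothetical fragment of a generated include file: the install process
 * turns the python string definitions into matching C enum constants.
 * The values shown here are illustrative, not the real NAOMI set. */
enum {
    RT_CLASS = 5            /* command class identifier */
};

enum {
    RT_INIT  = 1,           /* commands within RT_CLASS */
    RT_START = 2,
    RT_STOP  = 3
};
```

Keeping the python file as the single source of both the python strings and the C enums guarantees that workstation clients and C40 programs always agree on command numbering.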
The Data Transfer Mechanism (DTM), developed by the NCSA, is the system that is used for message passing. It is layered on the TCP socket libraries and provides a relatively simple message passing API for use between (potentially remote) processes. The DTM API allows the destination port to be named either via an IP address/port (which can be in symbolic or numeric form) or using a name which is not known to the TCP/IP DNS system. Such names are translated using a Nameserver process, which also allows names to be registered (by association with IP address/ports). One Nameserver process can serve several machines. All the machines which share the Nameserver must “know” the IP/port address on which it can be contacted.
DTM communication always begins with a freeform text header message followed by a number of binary data messages. The header generally serves to describe the format and contents of the data packets. Although the header is freeform in a sense, its contents must normally be parseable in order to achieve automatic data description. Several simple conventions, together with code for header interpretation, are distributed with DTM and some of these are employed within NAOMI. GP messages are very straightforwardly copied into DTM data packets as described above. DTM is also used for communication between workstation applications and display tools, and these re-use some of the DTM header conventions. New-style commands are sent over DTM as python embedded into the data packet, with “meta data” in the header describing the reply and acknowledge addresses. The Sequencer processes use this system. The Sequencer and associated EPM (Electra Process Monitor) are used to coordinate control of NAOMI systems at the high level (for example using TopGui).
The following C statements form a typical main C40 programme configured to use the GP system.
GPsetup(status);
GPaddCallback(RT_CLASS, &RTcallback, status);
GPaddCallback(MIRROR_CLASS, &MirrorCallback, status);
GPmainLoop(status);
The first statement sets up the GP system whilst the next two associate call-back C functions with command classes. Finally the programme enters the GPmainLoop where it remains until the CPUs are reset. The RT_CLASS contains the special commands which configure ISRs, and the transaction system whereby Parameter Block data may be communicated between ISRs and the GP system. The status structure pointer is used to track errors. By convention each function will return immediately if an error status has been set.
Basic C40 programs are normally set up as simple message-driven programs. The actions of the GPmainLoop consist of waiting for a message, decoding the header, performing whatever action the header specifies, and returning a reply or error message. The program then loops back and looks for another message.
The convention is that all messages sent to port PORT_BOOT are handled by calling the BootServices function. This is essential for the messaging system to function properly.
All other commands are handled in the appropriate callback. The callback contains a switch statement which decodes the command verb and performs the appropriate action. This processing frequently involves extracting additional argument data from the message buffer. By convention, all commands put the reply in the same message buffer used to receive the message, and indicate the length of the reply by adjusting the header appropriately. At the end of the switch statement, control is returned by the callback to GPmainLoop and any reply in the message buffer is sent back to the workstation, with the header.arg1 value set to zero to indicate success, and the header.command value incremented by GP_ACK to indicate which command is being acknowledged.
If an error is encountered, using the ExcRaise macro will store an error message in the status structure (defined in exception.h). At the end of the switch statement, if an exception has been raised, a reply with a non-zero header.arg1 value is sent to indicate an error. The reply contains the error message in its body, in case the receiving program wants to print it out. In addition, the error message is sent to the C40_STDERR port. If an error message printing process is listening on that port, it will print it for the user. (A fprintf(stderr,..) statement on the C40s will also cause a message to be printed by any such error-logging processes.)
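A minimal sketch of a callback following these conventions is given below. The stub types and constant values stand in for the real GP and Exc headers (their actual shapes are assumptions here); the real ExcRaise macro also records the file and line of the error.

```c
#include <assert.h>
#include <stddef.h>

/* Stub types standing in for the real GP/Exc headers (shapes assumed) */
typedef struct { int command, arg1, length; } GPmsgHdr;
typedef struct { GPmsgHdr header; int body[16]; } GPmsgBuffer;
typedef struct { int raised; const char *msg; } ExcStatus;
#define ExcRaise(st, m) ((st)->raised = 1, (st)->msg = (m))

enum { MIRROR_SET_FLAT = 1 };   /* illustrative command verb */

/* Callback convention: decode the verb, act, and leave any reply in the
 * same buffer; GPmainLoop then sends it with arg1 = 0 and the command
 * incremented by GP_ACK, or reports the error if an exception was raised. */
static void MirrorCallback(GPmsgBuffer *buffer, ExcStatus *status)
{
    if (status->raised) return;          /* convention: bail on prior error */
    switch (buffer->header.command) {
    case MIRROR_SET_FLAT:
        /* decode arguments from buffer->body, act on them, then adjust
         * the header to indicate the reply length */
        buffer->header.length = 0;       /* reply carries no body */
        break;
    default:
        ExcRaise(status, "unknown MIRROR_CLASS command");
        break;
    }
}
```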
Interrupt Service Routines (ISRs) perform all latency-critical data processing and are the only context in which a Barrier Synchronisation can take place. They are therefore the only context in which the components of a LoveTrain can be transmitted, copied and received. ISRs are normally invoked by an external interrupt signal. In the case of the WFS ring this is ultimately caused by the arrival of a new frame of WFS data; in the case of the SG ring the ultimate cause is a timer interrupt. However, even though these are the ultimate causes of ring-wide interrupts, most of the CPUs in the ring are actually interrupted by communications port traffic from the neighbouring (upstream) CPU in the ring. This is because only the timer CPU is actually interrupted in the SG ring case, and only the WFS frame-reception CPU or CPUs are interrupted in the WFS ring case. The interrupted CPUs then begin to send data to their neighbours, which in turn begin to send data to their neighbours, and so on. The CPUs are armed to interrupt on receipt of communications port activity, so this has the desired effect of eventually putting all of the ring CPUs into their ISRs. However, if we relied upon the first processed data output from the interrupted CPUs to accomplish this, we would incur an additional wait whilst each interrupted CPU actually arrived in its ISR code. This delay, the interrupt latency, is in part a hardware delay caused by the processor saving the interrupted code context on the stack, and in part a software delay caused by the execution of the ISRwrapper assembly language code, which prepares for execution of a C function and ensures that certain global data may be accessed from the ISR. Incurring these delays serially, as each CPU in the ring was interrupted in turn, would be most undesirable.
The deleterious effects of most of the interrupt latency are overcome by arranging for a high degree of concurrency. Each interrupted CPU, whatever the cause of the interrupt, immediately sends to its downstream neighbour a single word on their connecting communications port. This has the effect of interrupting the neighbour promptly so that it will almost certainly have completed its interrupt latency before useful data could be sent to it. Because these interrupting words precede the first LoveTrains carrying operational data they are dubbed “cowcatchers” for memorability. The interrupt latency of the WFS ring is concurrent with the arrival and decoding of the WFS header and the processing of the first row of centroid data. The interrupt latency of the SG ring is concurrent with the conversion time of the SG ADCs. The ADC conversion is initiated by the SG timer CPU immediately on its (timer) interrupt.
The LOG(val) macro records a programme file, programme line number, frame number, time, and an integer argument val in a circulating buffer. This may be retrieved at any time and is useful for tracing code execution and for obtaining time-synchronised data samples simply. The command RT PrintDebugLog nCPU will print logging data from CPU (GP) number nCPU. Care must be taken when examining these logs as the circular buffer will typically contain output from several ISR interrupt frames and will have recycled and overlapped to a seemingly arbitrary point at the time of download and printing. A careful examination of the frame numbers and times for the lines in the log output will reveal which is the most recent log record.
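A minimal sketch of such a circulating log buffer follows. The field set and buffer depth are assumptions; the real macro also records the programme file name and a timestamp.

```c
#include <assert.h>

/* Sketch of a LOG-style circulating trace buffer.  Depth and record
 * fields are illustrative; the real macro also stores file and time. */
enum { LOG_DEPTH = 8 };
typedef struct { int line, frame, val; } LogRecord;

static LogRecord logBuf[LOG_DEPTH];
static int logHead = 0;
static int currentFrame = 0;     /* stand-in for the ISR frame counter */

static void log_record(int line, int frame, int val)
{
    logBuf[logHead] = (LogRecord){ line, frame, val };
    logHead = (logHead + 1) % LOG_DEPTH;   /* recycles: old entries overwritten */
}

#define LOG(v) log_record(__LINE__, currentFrame, (v))
```

Because the head pointer wraps, a downloaded buffer generally contains records from several frames with the oldest and newest entries adjacent at an arbitrary index, which is why the frame numbers and times must be inspected to find the most recent record.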
When an error is detected within an ISR the standard action is to invoke the C macro Panic(). This causes the following actions:
Because of this last action, no other CPUs in the same ring can pass further BarrierSynchs, because no LoveTrain can pass the Panic’ed CPU. Ring-wide real-time activity therefore halts at this point. Note, however, that the panicking CPU has not crashed. It can still perform GP processing in its foreground task, including displaying its Panic status message, normally in response to the WFS Status and SG Status commands for the respective C40 rings.
Barrier Synchronisation is both the method by which the supersteps of ISRs within a CPU ring are synchronised and also the method by which they communicate. Each CPU which participates in a Barrier Synchronisation sources a certain amount of data to its downstream neighbour, copies a certain amount of data from its upstream neighbour to its downstream neighbour, and sinks a certain amount of data that has already been all the way around the ring and would, if it were copied further, be returning (inefficiently) to its originator. This ring-wide movement of data collectively constitutes a LoveTrain.
For efficient operation, careful checking that the expected quantity of data arrives and does not cause a timeout or overrun is not carried out for every LoveTrain. Instead each CPU publishes a plan of its subsequent LoveTrain activity at the start of the ISR. The exchange of plans is carried out with careful checking. Each CPU then compares all received plans with its own intentions and, if they are inconsistent, panics.
A typical plan is defined by the following C statement:
    static BarrierSyncPlan myPlanWFS[] = {
        { 5, NUM_SUBAP_X*2*NUM_CCD_PORT, NUM_SUBAP_X*2*NUM_CCD_PORT, 0 },
        { 1, 3, 3*NUM_RING_CPU, 0 },
        { -1, -1, -1, -1 }
    };
Each line of the plan specifies a set of similar BarrierSyncs that the CPU intends to perform in turn. The first field specifies the number of similar BarrierSyncs and the second field specifies the number of 32-bit words which the CPU plans to source (i.e., to add to the LoveTrain) at each of these BarrierSyncs. The third field contains the total number of 32-bit words it anticipates there to be in each LoveTrain. The final field is always initially zero but is set during a successful exchange of plans. The final line of the plan simply signals that this is the end of the plan, but the penultimate line is more interesting. This always specifies the exchange of a StopMessage, which is always the last LoveTrain to be exchanged in each ISR. It establishes if real time activity is scheduled to stop on completion of the current frame, or if a change of real-time parameters or algorithm has been scheduled ring-wide at the next interrupt. This last function implements a key part of the Transaction system (see below).
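The consistency rule behind the plan exchange can be sketched as follows: for each plan line, the words sourced by all CPUs around the ring should sum to the total that every CPU expects in the LoveTrain, and the repeat counts should agree. This is an illustrative check only; the real verification inside BSPbegin is more elaborate.

```c
#include <assert.h>

/* A BarrierSyncPlan line: {count, wordsSourced, totalWords, flag}
 * (field order follows the myPlanWFS example above). */
typedef struct { int count, source, total, flag; } BarrierSyncPlan;

/* Hedged sketch of the ring-wide consistency check.  'plans' holds
 * nLines plan lines for each of nCPU CPUs, laid out CPU-major.
 * Returns 0 on agreement and -1 where the real code would Panic(). */
static int check_plans(const BarrierSyncPlan *plans, int nCPU, int nLines)
{
    for (int line = 0; line < nLines; line++) {
        int sum = 0;
        for (int cpu = 0; cpu < nCPU; cpu++)
            sum += plans[cpu * nLines + line].source;
        for (int cpu = 0; cpu < nCPU; cpu++) {
            const BarrierSyncPlan *p = &plans[cpu * nLines + line];
            if (p->total != sum || p->count != plans[line].count)
                return -1;   /* inconsistent plans: Panic() */
        }
    }
    return 0;
}
```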
The plan is exchanged using the C function

    BSPbegin(myPlanWFS, ISRglobals.status);

where the plan array forms the first argument. The BarrierSyncs themselves are carried out by the following function:

    BarrierSync(iSuperStep, (int *)centroid,
                NUM_SUBAP_X * 2 * NUM_CCD_PORT,
                0);
The first argument is the superstep counter (which is incremented by the invoking program after the BarrierSync), the second is a pointer to the LoveTrain buffer, the third is the number of words to source and the fourth is the number of words to copy.
The StopMessage is exchanged and acted on by the following code fragment:
    /* Transfer stop frame info around the ring - barrier synchronisation */
    if (StopMessage(iSuperStep)) Panic();
    iSuperStep++;

    /* Do algorithm shuffle */
    if (SwapAlgorithm()) Panic();
    LOG(0);
Note that some older ISR code uses a direct call to BarrierSync to deal with the StopMessage.
Parameter block transactions implement the synchronised changing of real-time parameters, and even algorithms, ring-wide. Essentially a set of changes can be queued up in advance on each CPU in a ring and then made active simultaneously on a particular interrupt. The queuing of changes is accomplished by the GP commands of the RT_CLASS and can be activated by either C or Python workstation programs, but is most elegantly implemented in Python, where a single function call can carry out a very complex ring-wide transaction.
In order to be available to the standard function RTCallback, which processes the RT_CLASS commands, each C40 program must include a code fragment along the following lines in its main() function:
    /* Table to hold the set of available algorithms. Used in RTcallback()
     * to define the real-time algorithms available in this executable.
     */
    extern struct AlgorithmMethods SGstarterMethods;
    extern struct AlgorithmMethods SGmirrorMethods;
    extern struct AlgorithmMethods SGadcMethods;
    extern struct AlgorithmMethods SGtimerMethods;

    const struct AlgorithmMethods *algorithmMethods[] = {
        &SGstarterMethods,
        &SGmirrorMethods,
        &SGadcMethods,
        &SGtimerMethods,
        NULL /* Required to mark the end of the table */
    };
The standard array of pointers to AlgorithmMethods structures, algorithmMethods, establishes a global record of available algorithms and their associated parameter manipulation functions. This is available to RTcallback which is then able to invoke particular methods (functions) according to RT_CLASS command parameters. The methods themselves are defined externally to the main program and are in fact, most conveniently, defined in the same program files as the ISR algorithms themselves. Consider the following example from an algorithm C file (the names of these files conventionally begin with Alg):
    static void ISR(void);          /* The interrupt service routine */
    static Algorithm *Create(const Algorithm *, ExcStatus *);
                                    /* Create/copy an algorithm instance */
    static void Destroy(Algorithm *, ExcStatus *);
                                    /* Release resources used by an instance */
    static void SetParameters(Algorithm *, int32, int32 *, int32, ExcStatus *);
                                    /* Set/alter the parameters of an instance */
    static void GetParameters(const Algorithm *, int32, GPmsgBuffer *, ExcStatus *);
                                    /* Return an instance parameter set in a
                                       binary format */
    static void PrintParameters(const Algorithm *, int32, ExcStatus *);
                                    /* Print parameter set values to stderr */

    struct AlgorithmMethods SGadcMethods = {
        ALG_SG_ADC,
        "$Id: AlgSGadc.c,v 1.21 2000/05/24 10:11:59 rmm Exp $",
        &ISR,
        &Create, &Destroy,
        &SetParameters,
        &GetParameters, &PrintParameters
    };
Note that the AlgorithmMethods structure definition includes the following:
The functioning of the algorithm methods themselves is dealt with in the next section below. The principal commands of the RT_CLASS are summarised below:
RT_INIT – initialises the real-time transaction system
RT_STATUS – enquires about the status of the ISRs and transaction system
RT_START – starts interrupt processing
RT_STOP – stops interrupt processing
RT_BEGIN_TRANSACTION – obtains a transactionID from the ringleader
RT_END_TRANSACTION – instructs the ringleader to release a transaction
RT_BREAK_TRANSACTION – aborts and unlocks a transaction setup
RT_SET_PARAMS – invokes SetParameters for a named algorithm ID
RT_GET_PARAMS – invokes GetParameters for a named algorithm ID
RT_PRINT_PARAMS – invokes PrintParameters for a named algorithm ID
RT_GET_ALGORITHM_VERSION – gets the algorithm RCS ID
Workstation programs may invoke these commands either directly with the GP rpc (remote procedure call) function, which is available in both the C and Python versions of the workstation GP support libraries, or, in the case of Python workstation programs, they will probably choose to invoke them via the GPtransaction library.
The actual mechanism of a transaction is as follows:
The ParameterBlock section IDs identify a particular parameter for replacement or retrieval. Many are actually arrays of variables. The section IDs are defined in Electra/SharedInclude/ParameterBlocks.py and are summarised below. A PB_SG_ prefix indicates an SG ring CPU parameter block section; all others are WFS ring CPU parameter block sections.
PB_FLAT | unused
PB_WFS_OFFSET | RAL WFS offset
PB_SEG_GAIN | WFS segment TT gains (X,Y)
PB_DECIMATE | WFS centroid and pixel diagnostic decimation values
PB_RECON_GAIN | unused
PB_LOCK_DAC | open/close WFS loop
PB_CCD_TEST | WFS test mode
PB_FRAME_DELAY | WFS test mode rate
PB_SG_TIMER_INTERVAL | SG sample interval (in 66ns clocks)
PB_SG_DEMAND_PORT | identify demand communications port for SG ring
PB_SG_TIMER_TRIGGER_PORT | identify timer trigger communications port
PB_SG_MIRROR_DATA_PORT | mirror data output port
PB_SG_MIRROR_SYNC_PORT | mirror sync output strobe port
PB_SG_ADC_PORT | SG ADC data input port
PB_SG_ADC_BLOCK_SIZE | size of SG ADC data block for this SG CPU
PB_SG_ADC_CAL_GAIN | SG ADC calibration gain vector
PB_SG_ADC_CAL_OFFSET | SG ADC offset gain vector
PB_SG_ADC_SERVO_GAIN | SG servo loop gain
PB_SG_PASS_THROUGH | mirror actuator flags controlling SG loop
PB_SG_ADC_REORDER_TABLE | vector indicating SG ADC channel order
PB_SG_INITIAL_DEMAND | mirror demand received from WFS ring
PB_SG_DAC_REORDER_TABLE | table for reordering LoveTrain actuator values
PB_SG_WAVEFORM | test actuator waveform
PB_SG_CAPTURE | SG rapid sample diagnostic mode control
PB_SG_SNAPSHOT | SG synchronised sample diagnostic mode control
PB_SG_ACCUM_ZERO_HOLD | SG open/closed loop (normally use PASS_THROUGH)
PB_SG_DIAGNOSTIC |
PB_CENTROID_WEIGHT | X and Y pixel weights for centroiding
PB_SG_TABLE_OFFSET | identify block of SG ADC data
PB_MATRIX | Reconstructor matrix
PB_MATRIX1 | Reconstructor section
PB_MATRIX2 | Reconstructor section
PB_MATRIX3 | Reconstructor section
PB_MATRIX4 | Reconstructor section
PB_TT_FLAT | Zero level for global tip-tilt
PB_TT_GAIN | Gain matrix for global tip-tilt
PB_SG_ADC_SNAPSHOT_DECIMATE | frame-wise decimation for SG ADC snapshot (all channels) diagnostics
PB_SG_ADC_CAPTURE_DECIMATE | buffer-wise decimation for SG waveform (single continuously-sampled channel) diagnostics
PB_SG_DAC_SNAPSHOT_DECIMATE | frame-wise decimation for SG DAC snapshot (all channels) diagnostics
PB_XYZ_TO_ABC | Geometry matrix for segment XYZ algorithm
PB_I_AM_A_DUMMY | Set into dummy interrupt mode
PB_PIXEL_TIMEOUT | Timeout for waiting for pixels (in clock ticks)
PB_FLUX_MEMORY | Decay constant for flux low-pass filter - zero is no memory, unity is no learning
PB_SG_SYNCHRONISE | set synchronisation to WFS ring demands
PB_SG_FEEDFORWARD | send new demand deltas direct to the mirror
PB_SEGMENT_TILT_LIMIT | Limit of tilt during closed-loop operation
PB_BACKGROUND_WEIGHT | Pixel weights for background estimation
PB_QUICK_PISTONS | Compute pistons immediately after the tilts
PB_CENTROID_BIAS | Bias for x, y and flux sums
PB_LONG_WFS_OFFSET | Full-frame WFS offsets for use with NAOMI
PB_LONG_CENTROID_BIAS | Full-frame WFS biases for use with NAOMI
PB_SDSU_CURRENT_CAM | Sets current UNSYNCHED loop controlling camera - for use with NAOMI
PB_ACCEPT_FRAMES | Sets WFS CPU to accept SDSU frames
PB_ROUTE_CENTROIDS | Sets WFS CPU to route SDSU centroids to mirror
PB_ROUTE_PIXEL_DIAGS | Sets WFS CPU to route SDSU pixel diagnostics
PB_ROUTE_CENTROID_DIAGS | Sets WFS CPU to route SDSU (direct) centroid diagnostics
PB_LONG_CENTROID_BIAS_4x4 | 4x4 WFS bias for use with NAOMI
PB_LONG_CENTROID_BIAS_2x2 | 2x2 WFS bias for use with NAOMI
PB_CENTROID_WEIGHT_4x4 | 4x4 X and Y and flux pixel weights for centroiding
PB_BACKGROUND_WEIGHT_4x4 | 4x4 pixel weights for background estimation
PB_CENTROID_WEIGHT_2x2 | 2x2 X and Y and flux pixel weights for centroiding
PB_BACKGROUND_WEIGHT_2x2 | 2x2 pixel weights for background estimation
PB_SDSU_STATUS | WFS interface status (Get only)
PB_HEADER_TIMEOUT | Timeout for waiting for SDSU header (in clock ticks)
PB_BACKGROUND_FLUX_MEMORY | Flux memory specific to background calculation
PB_SURROGATE_WFS_APP | Set a WFS mode to simulate from full frame data
PB_TT_ONLY_GAIN | Gain for use in tip-tilt only (app 10/doublet) mode
PB_TT_ONLY_LIMIT | Limit for use in tip-tilt only (app 10/doublet) mode
PB_SELECT_APP10 | Assume SDSU App 10 is in use rather than App 8
The Create function of each algorithm performs certain activities by convention when it is invoked. It allocates space for a parameter block structure which will contain all the parameter data (and some of the operational data) that the algorithm will use. Where rapid write-access to data is required for particular parameters or operational buffers, it will only allocate space for a pointer in the parameters structure itself and will instead allocate the actual memory in on-chip RAM. This is a limited resource in the C40 architecture but provides faster access than off-chip RAM. The Create function can ascertain whether or not it is the first invocation for this type of algorithm. If it is the first then it will initialise the parameter block with default values. If it is not the first invocation then it will copy the values from the last invocation into the newly allocated structure. In this way, the SetParameters function need modify only the selected parameters of an existing configuration and does not have to upload a full set of data each time a change is required.
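An illustrative sketch of this convention follows. The structure contents and defaults are invented for the example, and the real Create also takes an ExcStatus argument and allocates fast buffers in on-chip RAM rather than from the ordinary heap.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative parameter block and algorithm instance (fields invented) */
typedef struct { float servoGain; int timerInterval; } SGParameters;
typedef struct { SGParameters *parameters; } Algorithm;

/* Sketch of the Create convention: install defaults on the first
 * invocation, otherwise inherit the previous instance's values so
 * SetParameters only ever needs to upload the values that change. */
static Algorithm *Create(const Algorithm *previous)
{
    Algorithm *alg = malloc(sizeof *alg);
    alg->parameters = malloc(sizeof *alg->parameters);
    if (previous == NULL) {
        /* first invocation for this algorithm type: default values */
        alg->parameters->servoGain = 0.0f;
        alg->parameters->timerInterval = 1;
    } else {
        /* later invocations copy the existing configuration */
        *alg->parameters = *previous->parameters;
    }
    return alg;
}
```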
The SetParameters function identifies which parameter block section is actually being manipulated using a switch statement. Normally a number of additional arguments are then decoded from the GP body. The precise number and nature of these arguments depends on the Parameter Block section which is being manipulated.
Floating point arguments may be unpacked in situ from the GP message body following this example code fragment:

    floatArg = (float *) arg;
    afrieee(floatArg, SG_UNPACKED_DEMAND_BUF_SIZE);
The GetParameters function also uses a switch statement to identify which ParameterBlock section is being addressed. It then formats a buffer body to deliver the requested information. Again the number and nature of the elements in the body depends on the ParameterBlock section in question.
Floating point arguments may be packed for GP transmission as follows:

    floatArg = (float *) buffer->body;
    for (i = 0; i < SG_UNPACKED_DEMAND_BUF_SIZE; i++) {
        floatArg[i] = parameters->ADCcalGain[i];
    }
    atoieee(floatArg, SG_UNPACKED_DEMAND_BUF_SIZE);
There are a number of other support libraries and macro collections which support the writing of C40 programmes. Important examples are the Exc exception handling system and the AIO asynchronous I/O system, which manipulates the on-chip DMA engines.
Two algorithms are involved in WFS ring processing: AlgNaomiInterleave, which deals with SDSU WFS frame reception and runs on all but one of the WFS ring CPUs, and AlgMartini, which takes the centroid data produced by AlgNaomiInterleave and derives drive signals for the segmented mirror and the fast steering mirror.
AlgNaomiInterleave.c contains the ISR itself, its associated methods such as Create, SetParameters and GetParameters, and numerous helper functions invoked from the ISR. Some of these helper functions are tagged for inline compilation for speed of execution. The file begins with the inclusion of the various required include files for the utility c40 and real-time support libraries (in Electra/RealTime/libsrc), definitions of working constants, and the declaration of the parameter block structure and AlgorithmMethods.
The ISR function of AlgNaomiInterleave has several possible modes of operation. The most fundamental condition determining which mode to execute is whether or not the processor is attached to a wavefront sensor readout, i.e., whether it is a “WFS CPU”. If not, then its role in the current implementation is simply to pass on data: it is a “slacker” representing spare processing capacity. (The distributed reconstructor under development as AlgParallelSISO is designed to exploit this capacity.) The processor determines whether it is attached to a WFS output port using the WFS.in structure which is set up before the RT system is started using ordinary GP commands rather than transactions. The reason for this is simply that the WFS interrupts are driven by the reception of WFS frames and the configuration of the WFS system must precede the use of the transaction system, which depends on interrupts.
The attachment of a WFS channel to a CPU does not necessarily mean that it will always be desirable for that CPU to process frames. Consider, for example, the use of only one of the two NAOMI WFS CCDs when both are reading out in synched mode. This is important in system alignment, when it can be desirable to switch rapidly between both cameras. In this case it is necessary to switch off frame processing on one of the CPUs even though a WFS is attached and data frames are arriving. This switching is achieved using the parameter acceptFrames.
When a WFS CPU is receiving and processing frames there are a number of options available for routing output data and diagnostics. Using the routeCentroids parameter it is possible to control whether or not output centroid data are passed from this CPU to the mirror control CPU (running AlgMartini). Similarly, routePixelDiags and routeCentroidDiags control whether or not the CPU sends diagnostic packets containing pixel and centroid data respectively to the diagnostic CPU. Note that it is not usual to send centroid diagnostic data from a WFS CPU; rather it is the mirror CPU in the WFS ring which is normally responsible for this. The routing of centroid data to the mirror CPU and the routing of centroid diagnostics are therefore coupled in normal operation. The facility to decouple them using the data routing parameters enables a configuration where one WFS CCD is responsible for closing the DM control loop whilst the other provides pixel and centroid diagnostics for alignment. The purpose of this configuration is to allow the WFS CCDs to be mutually aligned with the DM control loop closed.
There are two other special configurations available in AlgNaomiInterleave. Firstly, dummy mode allows a CPU to generate interrupts without WFS frames actually arriving (or even the WFS being switched on or attached). This is accomplished using a c40 timer to trigger interrupts instead of a WFS communications port. Dummy mode is controlled by the parameter IamAdummy. The second configuration is surrogation, whereby the WFS actually provides frames in NAOMI SDSU mode (‘application’) 1 but the CPU reformats the data to simulate some other mode and then processes the data as if it were actually provided in that format. The purpose of this configuration is to debug the processing of data for various WFS modes.
With one exception the processing of the data from the various modes (‘applications’) of the SDSU WFS is dynamically determined, that is to say that the WFS mode of each individual WFS data frame is determined from self-describing header data and the processing algorithm is adjusted accordingly. The exception is mode 10 which shares the same application-identifying bit pattern as mode 8 and so must currently be enabled by the previous transmission of a parameter selectApp10. A future development might be to determine this distinction using auxiliary header data (row/column) instead.
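The mode-selection logic implied here can be sketched as follows. The bit pattern value and function name are hypothetical; only the mode 8/mode 10 ambiguity and the selectApp10 parameter come from the text above.

```c
#include <assert.h>

/* The mode 8 and mode 10 SDSU applications share the same
 * application-identifying bit pattern in the frame header, so the
 * distinction must come from the selectApp10 parameter.  The pattern
 * value here is an assumption for illustration. */
enum { APP_BITS_8_OR_10 = 8 };

static int wfs_application(int headerAppBits, int selectApp10)
{
    if (headerAppBits == APP_BITS_8_OR_10)
        return selectApp10 ? 10 : 8;   /* ambiguous pattern: parameter decides */
    return headerAppBits;              /* other modes are self-describing */
}
```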
The following sequence describes the detail of the operation of AlgNaomiInterleave essentially in its conventional processing mode, whilst passing comments about the other configurations. The detailed description of the helper functions is deferred to later discussion.
    if (ISRglobals.iFrame == 0) RunWallClock();

    /* New frame number */
    parameters->wfsFrame = ISRglobals.iFrame++;
    LOG(0);
    if (!ExcOK(ISRglobals.status)) Panic();

    /* Record start of frame time */
    start = C40ticksStart();

    /* Keep track of inter-frame interval */
    parameters->frameInterval = start - lastStart;
    lastStart = start;

    /* Set up heap variables */
    parameters = ISRglobals.currentAlgorithm->parameters;
    centroid = parameters->centroid;

    /* Unblock comports so that messages and wakeups can occur */
    GPunblock(ISRglobals.status);

    /* Start timer for timeouts etc */
    RunTimer(ISRtimer, 100);

    /* Remove wakeup (cowcatcher) word from comport */
    if (CowCatcherRemove()) Panic();
The algorithm AlgMartini runs on only one CPU, which is in the WFS ring. This is known as the mirror CPU, although in fact it interfaces to the SG ring and not directly to the mirror. Its ISR is responsible for receiving centroid LoveTrains from the WFS CPUs. It maintains servo loops controlling the mirror segment tip-tilts and conducts a reconstruction of the mirror pistons using a matrix. The XYZ (tip-tilt-piston) segment commands are converted to ABC (triaxial actuator) format, added to a ‘flat’ buffer containing the base actuator commands, and transmitted to the SG ring. Various helper functions are invoked to accomplish these procedures, and their description is postponed until later.
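The XYZ-to-ABC conversion and flat addition can be sketched as follows. The actuator geometry (three actuators at 120 degrees on a unit radius) is an invented example; the real system uses its own calibrated conversion.

```python
import math

# Assumed actuator angular positions on each segment (illustrative only).
ANGLES = [0.0, 2 * math.pi / 3, 4 * math.pi / 3]

def xyz_to_abc(x_tilt, y_tilt, piston, radius=1.0):
    """Convert a segment tip-tilt-piston (XYZ) command to triaxial actuator
    (ABC) commands: each actuator sees the piston plus the local tilt height."""
    return [piston + radius * (x_tilt * math.cos(a) + y_tilt * math.sin(a))
            for a in ANGLES]

def add_flat(abc, flat):
    """Add the base ('flat') actuator commands before output to the SG ring."""
    return [c + f for c, f in zip(abc, flat)]
```

With this geometry a pure piston command moves all three actuators equally, and a pure tilt command sums to zero across the actuators, as expected.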
AlgMartini is contained in the file AlgMartini.c, which also contains declarations, the ISR itself, the helper functions, and the other algorithm methods. This is the same situation as for other algorithms.
The following sequence summarises the operation of the ISR of AlgMartini:
The Strain Gauge (SG) ring runs a single compiled c40 programme on all 8 CPUs within the ring: SGBSP2.x40. This programme therefore contains all the algorithms required for operation on the various CPUs: AlgSGstarter, AlgSGtimer, AlgSGmirror and AlgSGadc. The source code for each algorithm is contained in a .c file of the same name and the same organisational convention as AlgNaomiInterleave on the WFS ring is followed: each file contains a parameter block definition, an ISR, and the code for its associated methods.
Linking all the algorithms into a single monolith is inefficient in terms of space because each CPU runs only one algorithm. A possible future extension is to produce separate programs for each CPU function, as in the case of the WFS ring. Note, however, that AlgSGstarter is required in the initialisation phase, and also that a further future rebalancing of the SG processing load may well involve increasing commonality of the algorithms running on different CPUs.
The basic function of the SG ring is to accept the demand output from the WFS ring and compare it repeatedly to sample data from the Strain Gauge ADCs. On the basis of this comparison the final output demand to the mirror control electronics is adjusted using a servo algorithm such that the demand and ADC values become identical. For this purpose the ADC values are subjected to a calibrated transformation to nominal mirror DAC units. Mirror DAC values are represented by 13-bit unsigned integers and the raw Strain Gauge ADC values are 16-bit unsigned integers.
The algorithm AlgSGstarter is a minimal ISR with essentially placeholder methods. It executes no BarrierSyncs apart from the Stop Message. Its purpose is to allow the main algorithms to be configured using parameter block transactions. The main algorithms can therefore swap in with a full set of parameters at the first frame. It is actually a little more complicated than this because the main algorithms then perform one ‘frame’ with the ADCs being triggered but no data actually being read. This is to deal with an artefact of the ADC initialisation whereby data are not produced after the first trigger.
The simple AlgSGtimer algorithm acts as a ‘slacker’ during data processing, simply copying the LoveTrains containing the demand and output data. Its main purpose is to schedule the timer interrupt which will cause the next ISR invocation. Once invoked, the timer algorithm automatically wakes up its neighbour by transmitting the cowcatcher in its ISR wrapper (as do all algorithms). Shortly after invocation the ISR sends a trigger pulse to the ADC trigger module. The re-scheduling of the interrupt takes place after transmission of the Stop Message. The parameter interruptInterval is used to control the timer interval.
AlgSGadc is the main algorithm of the SG ring and runs on the six CPUs that have ADC data inputs. The structure of its ISR is as follows:
The ISR function of the AlgSGmirror algorithm is responsible for transmitting output data to the mirror electronics. The ISR processing follows the following sequence:
GNU gmake is used to build the programs from the RealTime directory.
RealTime configurations are specified in the RealTime/pythonModules/RTconfig.py python language source file.
GP transactions must be conducted with respect to named algorithms. As software is developed, however, it is sometimes necessary to change the names of the algorithms allocated to a given processor and/or function. This is inevitable where two or more former algorithms are being merged, to implement load-balancing for example. It is also desirable where functional development or differing supported hardware configurations have led to genuinely different algorithms supporting differing sets of parameter block transactions. At the same time it is, of course, desirable to be able to write workstation support libraries which will perform certain generic functions with any running algorithm that supports that function, rather than having a different library for every set of algorithm names. Such a generic function then needs to be able to identify which algorithm is executing, whether it supports the desired function and, if so, what procedure is required to invoke it. The RTconfig system provides the capability to do this in a centralised fashion.
RTconfig introduces the idea of named software configurations. Tables in RTconfig.py hold the names of the executable programme files that must be loaded on each CPU in order to run a given configuration. It also holds the basic WFS and DM configurations required to initialise each configuration. A named configuration can be booted simply by passing its name to the c40Run utility which then passes on the name to Run functions within WFSlib and SGlib. They in turn use the configuration name to look up the required c40 programme filenames and configuration tables using the services provided by RTconfig.
Software configurations therefore provide at once a means of being able to retrieve older c40 programmes without having to alter and recompile libraries, and also a means of loading different programmes to support different attached hardware. RTconfig provides some additional support for this latter application by introducing a hardware-software compatibility checking system. It works like this: at boot time the Run function can check compatibility using the HWRTcompatible(conf) function in RTconfig where conf is the name of the software configuration that is being booted. RTconfig retrieves basic information on the attached hardware that is recorded in the file /software/Electra/save/RealTime/<host>.HWconfig where <host> is replaced by the name of the c40 host computer, e.g., aocontrol1. It uses this, together with internal compatibility tables, to assess whether or not the proposed hardware/software combination is compatible.
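The compatibility check can be sketched as a table lookup. The table contents and the hardware-record strings below are invented examples; the real record is read from the <host>.HWconfig file named above.

```python
# Hypothetical software configuration -> supported hardware descriptions.
COMPAT = {
    "Naomi":   {"SDSU-WFS", "SDSU-WFS+SG"},
    "Electra": {"EEV-WFS"},
}

def hw_rt_compatible(conf, hw_record):
    """Return True if software configuration `conf` may run on the hardware
    described by `hw_record` (normally recovered from the host's
    .HWconfig file at boot time)."""
    return hw_record in COMPAT.get(conf, set())
```

An unknown configuration name is conservatively reported as incompatible in this sketch.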
After a software configuration has been successfully booted its name is recorded and may be retrieved. RTconfig then uses the concept of functional names for algorithms to allow workstation processes to find out the real names of the algorithms that have been booted for the recorded software configuration name. For example, the functional name “generic” translates to “ALG_NAOMI_INTERLEAVE” under the “Naomi” software configuration but to “ALG_GENERIC” under the “Electra” configuration. A workstation support process can therefore deal transparently with either algorithm using the “generic” functional name.
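The functional-name translation amounts to a simple lookup. The two entries shown come directly from the text; the table layout is an assumed sketch of what RTconfig provides.

```python
# (software configuration, functional name) -> real algorithm name.
FUNCTIONAL_NAMES = {
    ("Naomi",   "generic"): "ALG_NAOMI_INTERLEAVE",
    ("Electra", "generic"): "ALG_GENERIC",
}

def algorithm_name(configuration, functional_name):
    """Map a functional algorithm name to the real algorithm name booted
    under the recorded software configuration."""
    return FUNCTIONAL_NAMES[(configuration, functional_name)]
```

A generic workstation library would call this with the recorded configuration name rather than hard-coding either algorithm name.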
There are various ways in which the implementation of RTconfig might be enhanced: (a) it currently stores the software configuration name in a file, whereas it might be better to have it stored in the EPM database; (b) it implements a form of “inheritance” between configurations that might be better achieved using the object-oriented programming features of the python language.
Python and C language versions of the GP and DTM libraries are available for workstation programmes to communicate with the c40 processors and with each other.
The most common way of sending normal GP commands is the python GP.rpc remote procedure call. This takes three arguments: the CPU (GP) number, the command-identifying string, and a list of parameters to the command. For transactions from python procedures the GPtransaction.Transact call provides a convenient mechanism. It takes a single argument which is a list of transaction elements. Each element of this list is itself a list with the following fields: a (GP) CPU number, the name of the algorithm (see RT configurations above), the name of the parameter block section, and a list of parameters. For retrieving parameter block sections the python function GPtransaction.GetParam is useful. It has only two arguments: the (GP) CPU number and the name of the parameter block section.
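The transaction element structure described above can be shown as a sketch. Only the list shape follows the text; the CPU number, algorithm, section name and parameter values are invented examples.

```python
def make_transaction(elements):
    """Check that each element has the shape [cpu, algorithm, section, params]
    expected by GPtransaction.Transact."""
    for cpu, algorithm, section, params in elements:
        if not (isinstance(cpu, int) and isinstance(params, list)):
            raise ValueError("bad transaction element")
    return elements

# One example element: CPU 4, the interleave algorithm, a hypothetical
# 'routing' parameter block section, and its parameter values.
txn = make_transaction([
    [4, "ALG_NAOMI_INTERLEAVE", "routing", [1, 0, 0]],
])
# A real call would then be GPtransaction.Transact(txn).
```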
Here is a minimal C program to talk to the C40s.
#include <signal.h>
#include <stdlib.h>
#include "exception.h"
#include "packet.h"
#include "c40Commands.h"
int main(int argc, char **argv)
{
GPmsgBuffer *buffer;
ExcStatus *status = ExcStatusNew(1);
GPsetup(status);
buffer = GPallocBuffer(status);
if (!ExcOK(status)) goto end;
buffer->header.command = DO_PING;
/* header-only message: the length argument is assumed to cover just the header */
GPsendMsg(buffer, sizeof(buffer->header), GP_PACK_ADDRESS(4, PORT_BOOT), status);
buffer = GPgetMsg(1000, status);
if (!ExcOK(status)) goto end;
if(buffer->header.command == DO_PING + GP_ACK
&& buffer->header.arg1 == 0)
printf("Ping succeeded\n");
else
printf("Ping failed\n");
GPshutdown();
exit(0);
end:
ExcToFile(status, stderr);
GPshutdown();
exit(1);
}
The program goes through the following steps:
It initialises the status variable, which holds the exception state and error messages. The argument of 1 to ExcStatusNew() causes the program to exit with an error if the system is out of memory.
It initialises the GP system with GPsetup()
It allocates a buffer and puts a command into the header
It sends the command to the appropriate CPU, in this case number 4.
It then waits for a response with a 1000 msec timeout
It checks to see if the response is correct. By convention, the C40 acknowledges all commands by incrementing the initial command number by the value GP_ACK (1000). The status is, by convention, returned in header.arg1 and is zero on success.
It shuts down the GP system. This is important to free up ports on the c40Comms server.
It deals with any error messages by calling ExcToFile, which simply prints out any messages from any of the called subroutines. The reason the error messages are not printed in the subroutines is that the subroutines have no knowledge whether the calling program is connected to a terminal, or whether the program sends the error messages to a higher-level client.
Also, higher-level subroutines in the call chain may decide that an exception is not serious and call ExcClear() to clear the error condition without printing any messages.
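The acknowledgement convention from the steps above can be stated as a standalone check. GP_ACK and the header fields follow the text; the arguments here are plain stand-ins for the fields of a GPmsgBuffer.

```python
GP_ACK = 1000   # the C40 adds this to the command number when acknowledging

def ping_succeeded(sent_command, reply_command, reply_arg1):
    """A reply acknowledges `sent_command` if its command field equals the
    original command plus GP_ACK and its status (header.arg1) is zero."""
    return reply_command == sent_command + GP_ACK and reply_arg1 == 0
```

This mirrors the test made in the C example after GPgetMsg returns.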
WFSlib.py provides general access to WFS ring functions to python workstation processes working below the EPM level. At this level only the minimal interlocking provided by the GP transaction system is operational and it is therefore appropriate for engineering level functions. Command line access to many of the functions is available via the WFS command which uses the WFStest.py library to access the functions in WFSlib. These commands are dealt with in the c40 Users Guide.
SGlib.py provides general access to SG ring functions to python workstation processes working below the EPM level. At this level only the minimal interlocking provided by the GP transaction system is operational and it is therefore appropriate for engineering level functions. Command line access to many of the functions is available via the SG command which uses the SGtest.py library to access the functions in SGlib. These commands are dealt with in the c40 Users Guide.
DeformableMirror.py provides python access to deformable mirror manipulation functions at the sub-EPM (non-interlocked) level.
ReconLib.py provides python access to reconstructor matrix manipulation functions at the sub-EPM (non-interlocked) level.
GPdiag.py provides python access to the c40 diagnostics system.
Python c40 support libraries exist above the EPM level and provide potentially fully-interlocked access to the c40s. It is these libraries that are used by top-level processes such as TopGui and the superscripts.
In addition to TopGui there are a number of workstation programmes which access the c40s. These include FisbaGui, which deals with initial mirror flattening using the interferometer; WhiteLightProcedure, which deals with full flattening using the WFS; and IngridAlign, which deals with the removal of non-common-path errors in the optical path to the science camera.
[insert SJG material]
Implementation of 4x4 subaperture modes
Implementation of synched WFS modes
Improved load-balancing on the SG c40s
Investigation of optimal read/write timing on the SG c40s
[Many other lower priorities]
WFSAlign Non-EPM C programme (GUI) for WFS alignment and visualisation
MirrorMimic Non-EPM C programme (GUI) for DM control
RTengGui python non-EPM Real-time control GUI
DataDiag Non-EPM C diagnostic launcher
[Many other systems to catalogue]