INS-DAS-20

Communicating regions of interest to SDSU controllers

Issue 2.1, written on 26th November 1999

Guy Rixon, ING (gtr@ast.cam.ac.uk)

Purpose of this document

The background to the design of the window table specified in INS-DAS-18 is given. Discarded approaches are listed and some trade-offs are described. Anybody who wants to understand INS-DAS-18 should read this document first.

This document is not in itself part of the UltraDAS architecture.

Changes to the document

Issue 1.1, 1999-09-29:

The original document.

Issue 2.1, 1999-11-26:

The discussion of multi-channel cameras was updated. All six solutions from issue 1.1 are now known to be inadequate; a new solution is identified as the standard for UltraDAS.

References

[1]: Interfacing the DAS computer to SDSU Detector-controllers

The problem

The user requirements for UltraDAS state that a detector readout may be windowed. There may be up to 10 windows, each of which may be of any size or shape subject to a few constraints:

all windows are single rectangles;
no window overlaps the edge of the camera;
windows may overlap in x or in y but not in both.
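
The three constraints above can be checked mechanically. The following is a minimal sketch, not part of the UltraDAS code: the names Window and check_windows, the 1-indexed coordinates and the (x0, y0, nx, ny) representation are all assumptions made for illustration.

```python
from typing import NamedTuple

class Window(NamedTuple):
    """Hypothetical window record: first column, first row (1-indexed), size."""
    x0: int
    y0: int
    nx: int
    ny: int

def _overlap(a0, an, b0, bn):
    """True if the 1-D ranges [a0, a0+an) and [b0, b0+bn) intersect."""
    return a0 < b0 + bn and b0 < a0 + an

def check_windows(windows, width, height, max_windows=10):
    """Return a list of constraint violations (empty if the set is valid)."""
    errors = []
    if len(windows) > max_windows:
        errors.append(f"{len(windows)} windows requested, maximum is {max_windows}")
    for i, w in enumerate(windows, start=1):
        if w.nx <= 0 or w.ny <= 0:
            errors.append(f"window {i} has no extent")
        elif (w.x0 < 1 or w.y0 < 1
              or w.x0 + w.nx - 1 > width or w.y0 + w.ny - 1 > height):
            errors.append(f"window {i} overlaps the edge of the camera")
    for i, a in enumerate(windows, start=1):
        for j, b in enumerate(windows[i:], start=i + 1):
            # Overlap in x alone or in y alone is allowed; both together is not.
            if _overlap(a.x0, a.nx, b.x0, b.nx) and _overlap(a.y0, a.ny, b.y0, b.ny):
                errors.append(f"windows {i} and {j} overlap in both x and y")
    return errors

# Two windows that overlap in y but not in x are acceptable.
print(check_windows([Window(500, 21, 100, 4008), Window(1500, 21, 100, 4008)],
                    width=2148, height=4028))
```

The rectangle constraint is implicit in the representation: each window is a single rectangle by construction.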

On a multi-channel camera (e.g. a mosaic of CCDs or an IR detector with independent readout in four quadrants) the windows may appear in the sub-raster corresponding to any of the readout channels. In the most general case, a window that is continuous on one chip may span the boundary of more than one sub-raster and hence there may be sub-windows on each channel.

Hence, the first problem is how to achieve a set of up to 10 windows in the frame read out on one channel. The same solution is then applied to each channel of a camera.

This is a harder problem than has previously been solved by any user of SDSU controllers. The possible overlapping of windows in the y direction is what makes it hard, as the controller cannot then fully read out one window before starting on the next.

Solutions considered and discarded

Thanks are due to Peter Moore for explaining the work already done on this problem.

Any readout on a CCD can be described by a sequence of four basic operations:

skip one row;
clock one row into the serial register;
skip one column in a row;
read one pixel in a row.

A simple list of command codes (or DSP instructions to jump to a subroutine for each of the four operations) could be made for any pattern of windows. However, this would require storage for at least one instruction per pixel. An SDSU controller has nowhere near enough RAM for this.

For a given pattern of windows, it is fairly straightforward to write an assembly-code routine to describe the readout in terms of loops over sequences of the four basic instructions above. However, the number of possible patterns for n windows is a multiple of n² and is hence far too high to provide fixed subroutines for each option.

A very general program could start from the definitions of the individual windows and emit sequences of basic readout instructions, perhaps a detector row at a time, to the SDSU video board. This is deemed too difficult to do in the controller: firstly, because the program would be complex, subtle and extremely hard to express in assembly code; secondly, because the program code might grow to be too large for the controller's memory; and thirdly, because the DSP on the SDSU timing board is committed to real-time clocking operations during a readout and is not available to do the geometry calculations.

If the host computer (the DAS SPARCstation) did the geometry calculations, it could, in principle, upload a detector-row's worth of readout instructions at a time to the controller. This approach fails, partly because of the limited bandwidth on the uplink lead (the time to upload the instructions would be at least 0.0025s for a full row of an EEV42 CCD described in 24 bits per readout instruction), but more importantly because the DAS SPARCstation does not have a good real-time response. The time between readout operations would vary wildly and the data quality would be degraded.

Window table: an intermediate form

The table is the representation of the readout pattern sent from the server program to the controller program. It is an intermediate representation between the bare parameters of the windows and the DSP machine-code that works the readout pattern. This intermediate code is intended to be short enough to hold in DSP memory and simple enough to parse inside the controller program.

The problem becomes tractable if the readout is described in terms of strips of pixels (all to be read out or all to be skipped) that are contiguous in x in a given row, and blocks of identically-patterned rows that are contiguous in y.

In a system that allows up to n windows, any given row can have up to n strips of pixels to read out and n+1 strips of pixels to skip. If the count of pixels in each strip is represented by one word of controller memory, then the row is described by 2n+1 words. To make the table easier to parse in the controller, the windows are ordered in increasing order of the x coordinate of their leftmost column.

The description of a row can be grown into the description of a block by adding one word to hold the repeat count.

To help in working out when to use a row-skip operation, it is helpful to put a one-word flag in each block-description, after the repeat count and before the description of the first strip. The flag is set to one for a skipped row and zero for a row that is going to be read out. This makes the size of each block-description 2n+3 words.

The same n-windowed system can have up to 2n+1 blocks in the most-complicated case: one block before the first window, one block inside each window and one block after each one. Partly overlapping any pair of the n windows in y does not change the number of distinct blocks; however, making the y-range of one window a sub-set of another reduces the number of blocks by 2.

Thus, the storage required to represent a readout of n windows is 4n² + 8n + 3 words. A storage requirement of 483 words for 10 windows is high, but not impossibly so for a controller with around 16Kwords of application space.
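
The arithmetic behind this figure is the product of the two counts derived above: 2n+1 blocks, each of 2n+3 words. A short sketch (the function name table_words is an assumption for illustration) reproduces the sizes quoted in this document:

```python
def table_words(n):
    """Words needed for a window table sized for up to n windows."""
    strips_per_row = 2 * n + 1               # n strips read, n+1 strips skipped
    words_per_block = strips_per_row + 2     # plus repeat count and row-skip flag
    blocks = 2 * n + 1                       # worst-case number of blocks
    return blocks * words_per_block          # = 4n^2 + 8n + 3

for n in (2, 4, 10):
    print(n, table_words(n))
# 2 windows -> 35 words, 4 windows -> 99 words, 10 windows -> 483 words
```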

The table of strips and blocks is compiled by the camera's server-program on the host SPARCstation and uploaded to the detector controller whenever the camera is initialized, or the pattern of windows is changed.

For simplicity, the table should be of a fixed size for a given maximum number of windows. That is, the table does not grow or shrink when the user activates a different number of windows. Windows that the user is not using (those set in the server program to zero extent in one or both dimensions) still have columns in the table. They generate strips of zero pixels skipped and zero pixels read.

As an example, consider a readout format for a CCD of 2048 by 4028 pixels. The CCD is on a spectrograph, with spectral dispersion along the y axis. The observer wants two narrow windows, each of 100 pixels in x, covering the full spectral range except for 20 rows of poor-quality pixels at the low-y end. The windows start at x=500 and x=1500 respectively, and the observer has arbitrarily used windows 8 and 5 to define them. The table is (sized for up to 10 windows):

  20 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   0   0   0   0    0
4008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 499 100 900 100  549
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   0   0   0   0    0
   ...

(The all-zero last line is repeated to fill the table out to its fixed length of 2n+1 = 21 lines, so it appears 19 times in all.) The first block is 20 row-skips (the second word of the line is the row-skip flag). The second block is the remaining 4008 rows of the detector (including overscan), in which the pattern is "skip 499, read 100, skip 900, read 100, skip remaining 549". Windows 1, 2, 3, 4, 6, 7, 9 and 10 (which the observer hasn't set) have a low x-value of zero and hence are sorted to the front of the row description; they generate "skip zero, read zero" pairs. Windows 5 and 8 (which are set) are sorted to the back. Window 8 has been sorted in front of window 5 because it extends to lower x (hence there is no fixed relationship between window numbers and column numbers in the table). The third and subsequent lines are filled with zeros as all rows of the readout have already been described.
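
One possible way for the server program to build such a table is sketched below. It is an illustration under stated assumptions, not the UltraDAS implementation: the function name compile_window_table, the (x0, y0, nx, ny) tuples with 1-indexed coordinates, and the row length of 2148 pixels (inferred from the example row, whose strips sum to 2148) are all hypothetical. A window that is set but does not cover a given block is assumed to contribute a "skip zero, read zero" pair in that block.

```python
def compile_window_table(windows, n_max, width, height):
    """Build the table of blocks and strips for up to n_max windows.

    windows: list of n_max tuples (x0, y0, nx, ny); unused windows have
    zero extent, e.g. (0, 0, 0, 0). Assumes the windows obey the
    constraints (inside the camera, no overlap in both x and y).
    """
    assert len(windows) == n_max
    active = [w for w in windows if w[2] > 0 and w[3] > 0]

    # Rows at which the pattern of strips can change.
    edges = {1, height + 1}
    for x0, y0, nx, ny in active:
        edges.update((y0, y0 + ny))
    edges = sorted(edges)

    # Columns of the table follow the windows sorted by leftmost column.
    by_x = sorted(windows, key=lambda w: w[0])

    table = []
    for ya, yb in zip(edges, edges[1:]):
        in_block = [w for w in active if w[1] <= ya and yb <= w[1] + w[3]]
        if not in_block:
            # No window touches these rows: one row-skip block.
            table.append([yb - ya, 1] + [0] * (2 * n_max + 1))
            continue
        row = [yb - ya, 0]
        cursor = 1                      # next column still to be handled
        for w in by_x:
            if w in in_block:
                row += [w[0] - cursor, w[2]]   # skip up to the window, read it
                cursor = w[0] + w[2]
            else:
                row += [0, 0]                  # skip zero, read zero
        row.append(width - cursor + 1)         # skip the rest of the row
        table.append(row)

    # Pad to the fixed size of 2n+1 blocks with all-zero lines.
    while len(table) < 2 * n_max + 1:
        table.append([0] * (2 * n_max + 3))
    return table

windows = [(0, 0, 0, 0)] * 10
windows[8 - 1] = (500, 21, 100, 4008)    # window 8
windows[5 - 1] = (1500, 21, 100, 4008)   # window 5
for line in compile_window_table(windows, 10, width=2148, height=4028)[:3]:
    print(line)
```

With these inputs the first three lines match the example table above.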

Some detector programs may not leave enough memory free for a table of 10 windows. In these cases, a smaller number of windows may be sufficient; 4 windows (as allowed in the Data-Cell DAS) can be described in 99 words using this new system. Two windows (as in the old Perkin-Elmer DAS) can be described in 35 words. If this approach is to be allowed, even as an option on future cameras, then the host program will have to ask the controller program for the size of the table each time it sets up a window pattern.

Compiling the table

The table format is designed to be not only compact but also easy for the controller program to parse. Using the advantageous command-set of the Motorola 56000-series DSP, the table can be parsed in a single pass.

Here, I list one possible way of compiling the table. I believe that the code produced is as compact as can be obtained from a "just-in-time" compiler without optimization.

The machine-code representation uses two levels of the hardware do-loops which are part of the architecture of Motorola DSPs. The two loops are nested.

The inner loops each contain a jump instruction leading to a subroutine that reads or skips one pixel. The outer loop contains 2n+1 of the inner loops, one for each possible strip in the table. The repeat counts for the inner loops are the numbers in the body of the table. The repeat counts for the outer loops are the numbers at the start of each line of the table.
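
The nested-loop structure can be modelled on the host to check a table before it is uploaded. The sketch below is a Python model of the loop semantics only, not DSP code; the function name simulate_readout is an assumption, and the operation names follow the subroutine names used in the walk-through in this section.

```python
from collections import Counter

def simulate_readout(table):
    """Count the basic operations that a window table describes."""
    ops = Counter()
    for block in table:
        repeat, row_skip, strips = block[0], block[1], block[2:]
        if repeat == 0:
            break                       # a zero repeat count ends the table
        for _ in range(repeat):         # outer do-loop: rows in the block
            if row_skip:
                ops["rowskip"] += 1
                continue
            ops["rowread"] += 1
            for i, count in enumerate(strips):
                # Strips alternate skip, read, skip, ... along the row.
                # Each count stands for one inner do-loop of that length.
                ops["pixelskip" if i % 2 == 0 else "pixelread"] += count
    return ops

example = [
    [20, 1] + [0] * 21,
    [4008, 0] + [0] * 16 + [499, 100, 900, 100, 549],
    [0] * 23,
]
print(simulate_readout(example))
```

For the example table this gives 20 row-skips, 4008 rows clocked into the serial register and 4008 × 200 = 801600 pixels read.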

Consider again the example table given above:

  20 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   0   0   0   0    0
4008 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 499 100 900 100  549
   0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0   0   0   0   0    0
   ...

The parsing goes as follows. I have shown the assembly code equivalent to the machine code that the parser would plant. In the do instructions, the notation +i indicates an address i higher than the location of the do itself: this is the address of the first instruction after the loop.

First line of table, first datum: repeat count of 20 for the instructions in this line. Plant

   do +66, 20

First line, second word: row-skip flag is on. Plant

   jmp rowskip

rowskip

Line 1, words 3..23: because the rows are skipped, there can't be any pixel-skip or pixel-read instructions in this block, so ignore the rest of the line. Plant nop instructions to fill the line out to the length assumed in the loop: we need 65 of them.

Second line, first word: a repeat count of 4008. Plant

   do +66, 4008

Second line, second word: row-skip flag is off. Plant

   jmp rowread

rowread

Second line, words 3..18: pixel count of zero. Plant

   do +2, 0
   nop
   nop

after

nop

Second line, word 19: 499 pixels to skip. Plant

   do +2, 499
   jmp pixelskip
   nop

pixelskip

Second line word 20: 100 pixels to read out. Plant

   do +2, 100
   jmp pixelread
   nop

pixelread

Second line, word 21: 900 pixels to skip. Plant

   do +2, 900
   jmp pixelskip
   nop

Second line, word 22: 100 pixels to read out. Plant

   do +2, 100
   jmp pixelread
   nop

Second line, word 23: 549 pixels to skip. Plant

   do +2, 549
   jmp pixelskip
   nop

End of second line. Plant a nop to separate the end of the current outer-loop from the do at the start of the next.

Line 3, first word: repeat count of zero. This ends the parsing. Fill the rest of the code space with nop. The code space holds 2n+1 sequences of 66 words each and we have filled two of the sequences, so we need 1,254 nop instructions.

The total size of the code is 1,386 words, and is independent of the size of the chip.
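
The word counts can be reproduced as follows. This is one reading of the walk-through, stated as an assumption: each strip costs three words (do, jmp, nop), and each block adds an outer do, a row-level jmp and a separating nop. The function name planted_code_words is hypothetical.

```python
def planted_code_words(n, blocks_used=0):
    """Return (words per block, total words, nop padding for unused blocks)."""
    strips = 2 * n + 1
    words_per_strip = 3                       # do, jmp, nop
    # outer do + row-level jmp + strips + separating nop
    words_per_block = 1 + 1 + strips * words_per_strip + 1
    blocks = 2 * n + 1
    total = blocks * words_per_block
    padding = (blocks - blocks_used) * words_per_block
    return words_per_block, total, padding

print(planted_code_words(10, blocks_used=2))
# (66, 1386, 1254)
```

The total depends only on the maximum number of windows, which is why it is independent of the size of the chip.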

The code would be much shorter if the DSP's rep instruction could be used to loop over pixels in a strip. This is not possible, as rep cannot repeat a jump instruction.

Readouts need to be interruptible as they are sometimes aborted. The outer loop for each block needs to include a check of a flag with a possible jump out of the readout code.

Multi-channel cameras

Most modern detectors have more than one output; many ING cameras have, or are planned to have, more than one readout channel per detector controller. Not all multi-channel detectors consist in mosaics of identical units: the INT wide-field camera has one chip rotated 90° from the camera frame; INGRID has four quadrants each with the output at different corners; an EEV42 CCD can be read out in two mirror-image halves.

If there are n windows on the camera and m channels, and if the windows are allowed to overlap the boundaries of the frames on each readout channel, then there can be a total of mn sub-windows on the camera with n in each readout table.

This generality makes it hard to implement windows. An SDSU detector-controller has no built-in multi-tasking, but runs a single sequence of code during readout. The readout tables for each channel have to be compiled together. Simplistically, one imagines the inner loops of the code discussed above being expanded to have one jmp instruction per channel.

The problem is harder than that. Where the pattern of windows is different on the frame attached to each readout channel, the pattern of blocks in the readout tables differs too. Each of the mn windows on the camera can generate blocks in each readout table, making the readout tables 2mn+1 blocks long. There is a danger of running out of memory.

There is a further problem. The DAS computer expects to divide the pixel stream by readout channel. In a full-frame readout, one pixel is sent in turn on each channel and the DAS can identify the pixels by their position in the stream. In a window pattern, the interleaving is not necessarily uniform.

UltraDAS cannot afford to have different windowing code optimized for many cameras and for many patterns of windows; a single, general solution is required. There are seven apparent solutions, of which only one survives closer inspection.

1. No windows are allowed on multi-channel devices.
2. Only one window is allowed and it must be entirely within the frame of one readout channel.
3. A window on one channel appears at the same point in the raster on all other channels; the controller has only one readout table. The DAS keeps all these windows in the output files. However, the windows cannot overlap, so there are complex restrictions on how multiple windows can be placed.
4. Windows can be placed anywhere; the controller inserts dummy (zero-valued) pixels where necessary to preserve the interleaving. This has the desired astronomical result. The speed of output can be improved by switching off completely channels that include no windows. However, the readout tables become very complex. Both the number of blocks of detector rows and the number of strips of pixels per block are scaled up by roughly a factor of m, the number of channels. This increases the memory requirement by a factor of m², and a 64-fold increase in memory for an eight-channel camera is untenable.
5. All pixels are tagged with their channel number. The controller interleaves readouts as in the previous method, but does not need to send dummy pixels as the DAS can accept any sequence of interleaving. The data volume on the connection to the DAS is increased by ~50%.
6. Window readouts are not interleaved. The controller reads out each channel in turn. Where there are no windows in the frame of a channel, that channel is not read out at all. This method saves to disk only the pixels that the observer actually asked for. The performance is optimal when there is one window entirely in the frame of one channel. Otherwise, the readout time is longer than it could be for optimized code. For the worst case of a window in the same place in each readout frame, the readout time is degraded by a factor equal to the number of channels. This method is fairly straightforward to implement, and requires only the detector storage stated in the main discussion above.
7. Clocking operations are identical on all channels. Wherever the controller reads a pixel on one channel, it must read all the channels in sequence; the controller may only skip a row or column where that row or column intersects no window on any channel. Clearly, the controller produces many "ghost" pixels where windows are not placed symmetrically on the channels, and the DAS has to be able to detect and discard the ghosts. In the worst case, a camera with A amplifiers is generating a factor of A more pixels than will actually be retained by the DAS. However, this has a minimal effect on the duration of the readout because the charge in a given pixel on each amplifier is integrated and extracted in parallel; the time to transmit the pixel value to the DAS, which is done in sequence for each amplifier, is small compared with the time to digitise a set of A pixels.

Methods 4, 5, 6 and 7 satisfy the user requirements on the possible patterns of windows. Methods 1, 2 and 3 do not.

Method 4 is a specialization of method 7 in that the controller sets the value of all ghost pixels to zero. The memory cost for this refinement is prohibitive in large, mosaicked cameras and method 4 is unsuitable as a standard algorithm.

Method 5 is not supported by SDSU's protocol for transmitting pixels from the controller to the DAS, so it cannot be used with standard SDSU products.

Method 6 is thought to be feasible with controllers of optical CCDs, although this assertion has not been proven in practice. However, method 6 does not work for the IR camera INGRID; it requires the INGRID controller to use more code for reading amplifiers than will fit in the controller's memory. Method 6 cannot be the standard algorithm.

Method 7 is thought to be feasible on INGRID: it requires the least-possible volume of readout code. The method will also work on cameras where the readout channels cannot be clocked independently, such as the INT WFC and single CCDs with two or more amplifiers.

Hence, the standard windowing solution for UltraDAS is the seventh method listed above. For any camera, the DAS downloads to the controller one window table and requires the controller to apply the table equally to all amplifiers of the camera.

The controller must read the amplifiers in the same sequence that it would read them if windowing were not applied; this sequence must be fixed before the application code for the controller is installed in the observing system, but may be changed after the code for the DAS is installed. That is, the order of readout is dictated to the DAS by a configuration file that also defines the file of object code that the DAS downloads to the controller to work the readout. The controller may never change the sequence of interleaving by omitting one or more channels. If the user wants to ignore a channel, the data from this channel must be sent to the DAS and then discarded.
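
On the DAS side, the fixed interleaving is what makes ghost pixels identifiable: the channel of each pixel follows from its position in the stream. This document does not specify how the DAS detects ghosts, so the following is only a hypothetical sketch; the function name split_channels, the list of read positions and the per-channel sets of wanted positions are all assumptions made for illustration.

```python
def split_channels(stream, positions, wanted, n_channels):
    """De-interleave a pixel stream and discard ghost pixels.

    stream:    flat list of pixel values as received from the controller.
    positions: (x, y) raster positions read, in readout order; the same
               list applies to every channel because the table is shared.
    wanted:    wanted[c] is the set of positions that fall inside a window
               the observer asked for on channel c.
    """
    assert len(stream) == len(positions) * n_channels
    kept = [dict() for _ in range(n_channels)]
    for i, value in enumerate(stream):
        pos = positions[i // n_channels]   # all channels share each position
        chan = i % n_channels              # fixed order of interleaving
        if pos in wanted[chan]:
            kept[chan][pos] = value        # a real pixel: keep it
        # otherwise it is a ghost pixel and is dropped
    return kept

# Two channels, two positions; each channel wants a different position.
kept = split_channels(stream=[10, 20, 11, 21],
                      positions=[(1, 1), (2, 1)],
                      wanted=[{(1, 1)}, {(2, 1)}],
                      n_channels=2)
print(kept)
# [{(1, 1): 10}, {(2, 1): 21}]
```

Ignoring a channel altogether then amounts to giving it an empty set of wanted positions: its pixels are still sent and are all discarded.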