NET-IP-1
Issue 1.1; 31st October 1994
The recommendations in this document are intended for the projects INT Interim DAS and INT Prime-focus Camera. Constraints on protocols are suggested such that the protocols are made suitable for controlling a real-time client-server system programmed in C. The scheme presented assumes TCP connections through Berkeley sockets. However, the techniques could be applied to any reliable byte-stream connections.
1.2 The meaning of `protocol'
`Network protocol' has many levels of meaning. Most commonly, it is used to mean `the means by which data-packets are transferred between computers'. In this document, I ignore the means of transport and examine the contents of the transmissions. Specifically, I describe the messages that application programs must send to the network interface.
In order to make this division, one requires that the underlying transport is reliable (i.e. that it will not routinely lose or corrupt messages) and that messages emerge from a network connection in the order in which they entered. For example, TCP is a reliable protocol but UDP is not: applications that use UDP must explicitly acknowledge each datagram received and must retransmit any that are not acknowledged. I assume TCP or an equivalent transport-protocol throughout.
To ease the construction and parsing of messages, the interface to a network connection should allow reading and writing of single bytes or of character strings. Most do: Berkeley sockets are the most common example. The socket interface is also suitable for inter-process communication within one computer and the protocols described here can be used with or without a network.
1.3 Terminology
ETX the ASCII character for end-of-text (0x03).
Mechanism (in this document) an addressable unit of the hardware, controlled by a server program.
NUL the ASCII null character (0x00), which terminates strings in C.
Packet (in this document) a message between programs, to be parsed as one unit.
Sentence (in the context of network packets) a character string within a packet, delimited by NUL characters.
Socket an interface from an application program to a network-transport protocol (typically to TCP).
SI the ASCII character for shift-in (0x0F).
SO the ASCII character for shift-out (0x0E).
STX the ASCII character for start-of-text (0x02).
TCP Transmission Control Protocol, a network-transport protocol.
UDP User Datagram Protocol, a network-transport protocol.
Three classes of messages are needed.
• Parameter messages, typically sent from server to client, pass information about the current state of a `mechanism'.
• Command messages, sent from client to server, request that the state of a `mechanism' be changed.
• Status messages, sent from server to client, indicate whether a particular command was accepted, whether it is still in progress and whether it was successful.
I am using `mechanism' to indicate a thing controlled by a server, some part of an instrument that needs to be addressed as a unit.
It is helpful to keep these categories separate, even if this requires more messages for a given action.
If a mechanism is commanded to move, the server should send back both status messages (to show how the command is progressing) and parameter messages (to show the state of the mechanism).
All the state information that the client needs should be given explicitly in the parameter messages; requiring the client to infer state from a sequence of messages is unsound practice.
Parameter messages should be sent to all connected clients whenever a mechanism changes state, including clients that have not requested the move. This is vital: the `intelligence' of the client programs is defeated if their knowledge of the system state is out-of-date. Status messages need only go to the client that sent the corresponding command. It may be useful, in some protocols, to copy status messages to all connected clients.
The stipulations above imply that clients remain connected between commands and receive parameter messages for all relevant state-changes, relevance being something that varies from client to client. A newly-connected client will need to `catch up' on the current state and so must receive parameter messages for all mechanisms. Programs working this way are called `connection-oriented' (or `connectionful') in contrast to many common Unix network-tools which are `connectionless'.
It complicates the servers to restrict the messages that go to each type of client, and it complicates the client if sets of messages must be explicitly turned on at the time of connection. A good compromise (provided there is sufficient bandwidth) is to send parameter messages to all clients on each state change and to send a full set of parameter messages to each client as it connects. The clients can then discard the messages they don't need.
When a client sends a sequence of the same command to a server, there may be a need to identify the status messages explicitly. This is most conveniently done if the client includes a serial number or similar tag - arbitrary, but meaningful to the client - in the command and the server repeats the tag in the status message(s). Unless the tag includes the name of the client, which is assumed to be unique, it would then be unsafe to echo the status messages to all connected clients: two clients might have commands outstanding with the same tag.
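A minimal C sketch of a tag that carries the client's name, so that status packets could be echoed safely to every client. The tag format `<client>/<serial>` and the functions `make_tag()` and `tag_is_mine()` are hypothetical; the examples in this note use bare serial numbers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical tag format "<client>/<serial>", e.g. "obs/12345".
   Assumes client names are unique and contain no '/'. */

/* Write a tag into buf. Returns 0 on success, -1 if it would not fit. */
int make_tag(char *buf, size_t size, const char *client, unsigned long serial)
{
    int n = snprintf(buf, size, "%s/%lu", client, serial);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Return 1 if tag was issued by this client, i.e. starts with "<client>/". */
int tag_is_mine(const char *tag, const char *client)
{
    size_t len = strlen(client);
    return strncmp(tag, client, len) == 0 && tag[len] == '/';
}
```

A client receiving an echoed status packet would ignore it unless `tag_is_mine()` is true, so two clients that both use serial number 1 cannot confuse each other.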
Most commands have associated parameters. A client might preset the parameters for a command with a parameter message and then send a one-word command message, but this invites odd behaviour if other events occur between the two messages. A better scheme is to include all parameters in the command message. As a result of this, the protocol must provide the servers with a way to pick parameters out of a list.
Debugging a distributed system is much easier if the packets contain ASCII text (i.e. all numbers are written out as formatted ASCII). This also avoids the problem of number-representation on different computers. Occasional bursts of `embedded' binary-numbers are possible by prior agreement between client and server, but should be delimited by agreed framing-characters (not NUL, STX, ETX or newline: see below). The ASCII characters SI (shift in) and SO (shift out) might be the best.
Commands typically contain several scalar parameters. For example, a command to set a CCD window might contain the window number, the origin and the size: five integers. For the command to be parseable, the protocol must define a parameter-separator character. Parameters that are character strings (e.g. titles of observations, error messages, comments to the observing log) may include spaces and newline characters, so these are not suitable separators. From the point of view of a C program, NUL is a good separator, since it breaks the packet neatly into separate strings: sscanf() can then operate on one parameter at a time. As an extension of this concept, a parameter may be a vector of n numbers, each number within the vector separated by spaces; sscanf() can parse the vector in one call. I call a null-terminated string within a packet a `sentence'. Two consecutive NULs can be used to indicate an empty sentence. To make the packet parseable, the reading program must see a NUL at the end of the last sentence. However, this character need not be transmitted in the packet: the last NUL may be added by the C function that reads the packet from the socket. The specification of each protocol should make clear how this point is to be handled.
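The following C sketch shows this parsing scheme, assuming that STX and ETX have already been stripped and that the final NUL is not transmitted (so the reader always supplies it). The names `split_sentences()`, `parse_window()` and `MAX_SENTENCES`, and the layout of the window vector (number, x and y origin, x and y size), are illustrative assumptions rather than part of any defined protocol.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_SENTENCES 32            /* hypothetical limit for this sketch */

/* Split a packet body of len bytes into NUL-terminated sentences.
   buf must have room for len + 1 bytes: the final NUL, which the
   sender is assumed to omit, is added here. Returns the number of
   sentences stored in sent[], or -1 if there are more than max. */
int split_sentences(char *buf, size_t len, char *sent[], int max)
{
    int n = 0;
    char *start = buf;

    buf[len] = '\0';                /* supply the final NUL */
    for (size_t i = 0; i <= len; i++) {
        if (buf[i] == '\0') {
            if (n == max)
                return -1;
            sent[n++] = start;      /* "" when two NULs are adjacent */
            start = &buf[i + 1];
        }
    }
    return n;
}

/* Parse a space-separated vector "number x0 y0 nx ny" in one call.
   Returns 1 if all five integers were read. */
int parse_window(const char *s, int w[5])
{
    return sscanf(s, "%d %d %d %d %d",
                  &w[0], &w[1], &w[2], &w[3], &w[4]) == 5;
}
```

For example, a body `.windowø12ø1 100 200 512 256` splits into three sentences, and `parse_window()` reads the five integers of the third with a single `sscanf()` call.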
Framing characters are needed to mark the beginning and end of a packet; STX and ETX seem to be the most suitable. A program that reads a packet with an unknown number of sentences from a socket cannot detect the end of the packet from the parameter-separators; it must wait until it sees ETX. STX should be on the front of each packet to demonstrate to the reading program that no characters have been missed and also to allow retransmission: if STX is received before ETX, the packet is being retransmitted.
A receiving program cannot know how long a packet will be until ETX arrives and so cannot allocate storage of the correct size. This can be overcome in two ways: (a) set a reasonable maximum size on all packets (512 bytes seems sensible); (b) give each packet a short, fixed-length section which states the length of the remainder. In either case, the fixed or maximum length should be the same for all protocols in a system so that common software can be used.
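A sketch of option (a) in C, written as a byte-at-a-time state machine. Because it keeps its state between calls, it can be fed with whatever a non-blocking read on the socket returns. The 512-byte limit follows the suggestion above and is assumed here to exclude STX and ETX; `struct framer` and `framer_feed()` are hypothetical names.

```c
#include <stddef.h>

#define STX 0x02
#define ETX 0x03
#define MAX_PACKET 512              /* assumed to exclude STX and ETX */

struct framer {
    char   body[MAX_PACKET + 1];    /* +1 leaves room for a final NUL */
    size_t len;                     /* bytes collected since STX */
    int    in_packet;               /* nonzero between STX and ETX */
};

/* Feed one received byte. Returns 1 when a complete packet body is in
   f->body[0 .. f->len - 1], -1 if the packet overflowed and was dropped,
   and 0 otherwise. Bytes outside STX ... ETX are ignored. */
int framer_feed(struct framer *f, unsigned char c)
{
    if (c == STX) {                 /* start, or restart on retransmission */
        f->len = 0;
        f->in_packet = 1;
        return 0;
    }
    if (!f->in_packet)
        return 0;                   /* noise between packets */
    if (c == ETX) {
        f->in_packet = 0;
        return 1;
    }
    if (f->len == MAX_PACKET) {     /* too long: discard until next STX */
        f->in_packet = 0;
        return -1;
    }
    f->body[f->len++] = (char)c;
    return 0;
}
```

A second STX before ETX discards the partial packet, which is the retransmission rule given for STX; an over-long packet is dropped and the framer waits for the next STX. The spare byte in `body` lets the caller append the final NUL before splitting the packet into sentences.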
With STX/ETX framing, there is no need for all of a packet to arrive at one time: the receiving program can block on its socket until the end of the packet arrives. However, this requires that the reading program either be completely blocked (which may not be acceptable in a server) or that a separate thread or process be allocated to read the packet.
It should be made as easy as possible for a reading program to find out what type of packet has arrived. Ideally, the first sentence of each packet should contain the packet name (typically the name of the mechanism concerned) plus a single character (`punctuation') to distinguish between command, status and parameter packets. It may be easier to parse the message if the punctuation character is the first in the packet after STX.
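A minimal C sketch of dispatching on the punctuation character. The four characters used here ('.' queued command, '!' preemptive command, ':' status, '=' parameter) are taken from the shutter example in this note; the enum and `classify()` are illustrative names.

```c
enum packet_kind { PK_UNKNOWN, PK_COMMAND, PK_PREEMPT, PK_STATUS, PK_PARAM };

/* Classify a packet from its first sentence, e.g. ".expose".
   On success *name points at the packet name that follows the
   punctuation character; for PK_UNKNOWN it is left unchanged. */
enum packet_kind classify(const char *first, const char **name)
{
    enum packet_kind kind;

    switch (first[0]) {
    case '.': kind = PK_COMMAND; break;   /* command, queued */
    case '!': kind = PK_PREEMPT; break;   /* command, preemptive */
    case ':': kind = PK_STATUS;  break;
    case '=': kind = PK_PARAM;   break;
    default:  return PK_UNKNOWN;
    }
    *name = first + 1;
    return kind;
}
```

Putting the punctuation first means one `switch` on a single byte selects the handler, before any string comparison on the mechanism name.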
A protocol that could control this shutter is listed below. For each packet, I have listed the exact bytes of some examples, omitting STX and ETX. The symbol ø represents the character NUL.
Command: take (queue) an exposure
.exposeø12345ø7.25
The first sentence identifies this as an expose packet; the leading full stop indicates that it is a command to be obeyed when the shutter is ready for it (i.e. not while another exposure is in progress). The serial number of the command is set by the client to 12345 and an exposure time of 7.25 seconds is requested.
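To illustrate the framing on the sending side, the following C sketch assembles a packet such as the expose command above from its sentences. It omits the NUL after the last sentence, as the examples do; `build_packet()` is a hypothetical helper, and the caller would pass the resulting bytes to `write()` or `send()` on the socket.

```c
#include <stddef.h>
#include <string.h>

#define STX 0x02
#define ETX 0x03

/* Build STX s0 NUL s1 NUL ... s(n-1) ETX into out. No NUL follows the
   last sentence. Returns the number of bytes written, or 0 if out is
   too small to hold the whole packet. */
size_t build_packet(char *out, size_t size, const char *sent[], int n)
{
    size_t len = 0;

    if (size < 2)
        return 0;                       /* need at least STX and ETX */
    out[len++] = STX;
    for (int i = 0; i < n; i++) {
        size_t sl = strlen(sent[i]);
        if (len + sl + 1 > size)        /* text plus separator or ETX */
            return 0;
        memcpy(out + len, sent[i], sl);
        len += sl;
        if (i < n - 1)
            out[len++] = '\0';          /* sentence separator */
    }
    out[len++] = ETX;
    return len;
}
```

For the sentences `.expose`, `12345` and `7.25` this produces 20 bytes: STX, the three sentences separated by two NULs, and ETX.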
Status: exposure command
:exposeø12345øactive
:exposeø12345øcomplete
:exposeø12345øfailedø256øShutter jammed?ø257øTimeout on shutter drive.
In these examples, the leading colon indicates a status packet. The second sentence is the serial number of the command. The third sentence is the state, from the set pending, active, complete, failed and rejected. Notice that the first character in this sentence is unique to a particular state. The other sentences express error conditions and are arranged as condition-code/text-string pairs. No condition codes or error reports are present unless the state is failed. There can be many sets of error reports in the packet, with the most general coming first. A status packet is sent when the command is accepted (state pending), when the server starts to open the shutter (active) and when the shutter is closed and stationary after the exposure (complete).
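A C sketch of how a client might decode a status packet once it has been split into sentences. It relies on the first letter of the state being unique, assumes the tag is a plain decimal serial number, and rejects error reports on any state other than failed. `struct status` and `parse_status()` are assumed names, not part of the protocol definition.

```c
#include <stdlib.h>

enum cmd_state { ST_PENDING, ST_ACTIVE, ST_COMPLETE, ST_FAILED, ST_REJECTED };

struct status {
    const char     *name;     /* e.g. "expose", after the ':' */
    long            serial;   /* tag echoed from the command */
    enum cmd_state  state;
    int             nerrors;  /* number of condition-code/text pairs */
    char * const   *errors;   /* errors[2*i] = code, errors[2*i+1] = text */
};

/* Interpret the sentences of a status packet.
   Returns 0 on success, -1 if the packet is malformed. */
int parse_status(char * const sent[], int n, struct status *st)
{
    char *end;

    if (n < 3 || sent[0][0] != ':')
        return -1;
    st->name = sent[0] + 1;
    st->serial = strtol(sent[1], &end, 10);
    if (end == sent[1] || *end != '\0')
        return -1;                     /* tag is not a plain number */
    switch (sent[2][0]) {              /* first letter identifies the state */
    case 'p': st->state = ST_PENDING;  break;
    case 'a': st->state = ST_ACTIVE;   break;
    case 'c': st->state = ST_COMPLETE; break;
    case 'f': st->state = ST_FAILED;   break;
    case 'r': st->state = ST_REJECTED; break;
    default:  return -1;
    }
    if ((n - 3) % 2 != 0)
        return -1;                     /* error reports come in pairs */
    st->nerrors = (n - 3) / 2;
    if (st->nerrors > 0 && st->state != ST_FAILED)
        return -1;                     /* only failed carries reports */
    st->errors = sent + 3;
    return 0;
}
```

Because the most general report comes first, a UI might show `errors[1]` as the headline and the remaining texts as detail.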
Command: open/close the shutter
!openø12346
!closeø12347
These commands are marked as preemptive by the leading exclamation-mark: they are to over-ride and cancel any other commands queued on the shutter. The alternative forms .open and .close are not preemptive. A preemptive open clears the exposure time but a normal open does not. Preemptive close effectively aborts an exposure. The expose command also has a preemptive form.
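One way a server might queue shutter commands is sketched below in C. It assumes that the command at the head of the queue is the one in progress, so a preemptive command cancels it as well as those waiting, as happens to the observer's exposure at event 13 of the two-client example. The queue depth, name length, `struct cmd_queue` and `enqueue()` are hypothetical.

```c
#include <string.h>

#define QMAX 16                        /* hypothetical queue depth */

struct cmd {
    char name[16];                     /* e.g. "expose", without punctuation */
    long serial;                       /* client's tag */
};

struct cmd_queue {
    struct cmd item[QMAX];             /* item[0] is the command in progress */
    int n;
};

/* Add a command whose first sentence is `first` (".expose", "!close").
   '!' cancels every command on the mechanism, including the one in
   progress, copying them to cancelled[] so the server can report each
   as failed; '.' joins the end of the queue. Returns the number of
   commands cancelled, or -1 if the command is refused. */
int enqueue(struct cmd_queue *q, const char *first, long serial,
            struct cmd cancelled[QMAX])
{
    int ncancel = 0;
    struct cmd c;

    if (first[0] != '.' && first[0] != '!')
        return -1;                     /* not a command packet */
    if (strlen(first + 1) >= sizeof c.name)
        return -1;                     /* name too long for this sketch */
    strcpy(c.name, first + 1);
    c.serial = serial;

    if (first[0] == '!') {             /* preemptive: flush the queue */
        ncancel = q->n;
        memcpy(cancelled, q->item, (size_t)ncancel * sizeof q->item[0]);
        q->n = 0;
    } else if (q->n == QMAX) {
        return -1;                     /* queue full */
    }
    q->item[q->n++] = c;
    return ncancel;
}
```

The server would then send a failed status, with a condition code meaning `override', for each entry copied to `cancelled`.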
Status: open and close commands
:openø12346øactive
:closeø12347øfailedø256øShutter jammed?ø257øTimeout on shutter drive.
These messages have the same syntax as the status packets for the expose command. In fact, all status packets can use this syntax.
Parameters: shutter state
=shutterøclosedø0.00ø7.243
=shutterøopenø180.0ø3.121
=shutterømovingø272.5ø0.000
One packet can carry all the state information for the shutter. The leading equals-sign distinguishes a parameter packet. The second sentence gives the overall state of the shutter and the third gives its angular position. The final sentence indicates the exposed time. This is set to zero when the shutter opens, and retains its value when the shutter closes. A parameter packet is sent whenever the general state changes and once per second while the shutter is open or moving.
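A client-side decoder for the shutter parameter packet might look like the following C sketch, applied to the sentences of the packet. The angle convention (0 degrees closed, 180 degrees fully open) comes from the worked example; `struct shutter`, `to_double()` and `parse_shutter()` are illustrative names.

```c
#include <stdlib.h>
#include <string.h>

enum shutter_state { SH_CLOSED, SH_OPEN, SH_MOVING };

struct shutter {
    enum shutter_state state;
    double angle;      /* degrees; 0 = closed, 180 = fully open */
    double exposed;    /* seconds exposed since the shutter last opened */
};

/* Convert a whole sentence to a double; returns 1 on success. */
static int to_double(const char *s, double *out)
{
    char *end;
    *out = strtod(s, &end);
    return end != s && *end == '\0';
}

/* Decode "=shutter" <state> <angle> <exposed>.
   Returns 0 on success, -1 if the packet is malformed. */
int parse_shutter(char * const sent[], int n, struct shutter *sh)
{
    if (n != 4 || strcmp(sent[0], "=shutter") != 0)
        return -1;
    if (strcmp(sent[1], "closed") == 0)
        sh->state = SH_CLOSED;
    else if (strcmp(sent[1], "open") == 0)
        sh->state = SH_OPEN;
    else if (strcmp(sent[1], "moving") == 0)
        sh->state = SH_MOVING;
    else
        return -1;
    if (!to_double(sent[2], &sh->angle) || !to_double(sent[3], &sh->exposed))
        return -1;
    return 0;
}
```

Since each parameter packet carries the complete state, the client simply overwrites its copy of `struct shutter` on every packet and never has to infer state from a sequence of messages.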
An expose command might progress like this.
Event  Client            Server
1      .exposeø1ø2.5
2                        :exposeø1øactive
3                        =shutterømovingø260.0ø0.000
4                        =shutterøopenø180.0ø0.203
5                        =shutterøopenø180.0ø1.203
6                        =shutterøopenø180.0ø2.203
7                        =shutterømovingø102.2ø2.509
8                        =shutterøclosedø0.0ø2.509
9                        :exposeø1øcomplete

1. The client requests a 2.5-second exposure.
2. The shutter server accepts command 1 and tells the client.
3. The shutter starts to move. When the parameter packet is issued, it has moved round from zero degrees (its closed position) to 260.0 degrees and the exposed time has been reset to zero.
4. The shutter stops at its fully-open position (180.0 degrees) and the server reports this. The exposure starts when the edge of the clear sector crosses the camera at about 200 degrees. Hence, the exposure-time is slightly greater than zero.
5. The exposure time has increased by one second and the server reports it.
6. As for 5.
7. The exposure has finished and the shutter has started to close. By the time the server detects this, the shutter has moved round to 102.2 degrees and the camera is covered. The exposure time has attained its final value, 2.509 seconds.
8. The shutter is now fully closed and has stopped moving. The exposure time is still shown as the length of the last exposure.
9. The client is informed that command 1 is complete.
If two clients try to drive the shutter at the same time, things are more complex. Consider the interplay between an observer and an engineer:
Event  Observer          Engineer  Server
1                        .closeø1
2                                  (to Eng.) :closeø1øactive
3                                  (to both) =shutterømovingø80.0ø102.230
4      .exposeø1ø1000.0
5                                  (to Obs.) :exposeø1øpending
6                                  (to both) =shutterøclosedø0.0ø102.245
7                                  (to Eng.) :closeø1øcomplete
8                                  (to Obs.) :exposeø1øactive
9                                  (to both) =shutterømovingø265.0ø0.000
10                                 (to both) =shutterøopenø180.0ø0.210
11                       !closeø2
12                                 (to Eng.) :closeø2øactive
13                                 (to Obs.) :exposeø1øfailedø128øOverride
14                                 (to both) =shutterømovingø62.5ø0.325
15                                 (to both) =shutterøclosedø0.0ø0.325
16                                 (to Eng.) :closeø2øcomplete

1. The engineer orders the shutter closed.
2. The server accepts the engineer's command.
3. The shutter starts to close; both clients are informed.
4. While the shutter is moving, the observer requests an exposure.
5. The server queues the observer's command because the shutter is still moving.
6. The shutter halts in the closed position. All parties are informed.
7. The engineer learns that his command was successful.
8. The observer is told that her expose command is now starting.
9. Both clients learn that the shutter is moving towards a new exposure.
10. Both clients learn that the exposure has started.
11. The engineer sees that the shutter is open again, assumes that the previous close command failed and sends another. Because he thinks something is wrong, he uses a preemptive command.
12. The server accepts the close for immediate execution because it's in the preemptive form. The observer isn't told this.
13. The observer is told that her exposure failed and told the reason (over-ridden by another user). The actual error-message is probably a little longer than shown here, but the final text may well be produced in the observer's UI and need not be part of the protocol.
14. Both clients see the shutter moving closed.
15. Both clients see that the shutter has stopped in the closed position.
16. The engineer (only) is told that the close command has completed successfully.
17. The observer loses her temper and goes looking for the engineer. Fortunately, he's in Cambridge, connected over the Internet, and she can't find him.
This parable illustrates some correct behaviour and a design weakness. At events 4-8 the system handles an access conflict correctly and probably without the observer noticing that anything is wrong. At events 11-16 the exposure is lost only because the engineer mis-uses a privileged, low-level command; if he had used the non-preemptive form, the exposure would not have been affected. When the unresolvable conflict happens, the observer learns what has gone wrong and how (the error information could be extended to include the address of the overriding client). At all times, the mimic displays on the two UIs will have correct information.
If the tagging of commands were extended to include the client's name, the respective UIs could indicate which commands were in progress for each station. This might avoid the conflict.
Appendix A. Document history
Issue 1.1 31/10/94 First issue.