SOF-EPICS-1
A Transaction Protocol
based on CAD-CAR records
Guy Rixon
Issue 1.1; 1997-06-26
Royal Greenwich Observatory,
Madingley Road,
Cambridge CB3 0HJ
Telephone (01223) 374000
Fax (01223) 374700
Internet gtr@ast.cam.ac.uk
Document history
Issue 1.1 1997-06-26 Issued for review.
1 Introduction
1.1 Purpose of this document
A communications protocol is presented for review. The material below is a design for the protocol itself, a suggested API for use in transient-client programs and a list of constraints placed by the protocol on the design of EPICS databases.
1.2 Scope of the software
The EPICS CAD and CAR records produced for the Gemini project define an interface to an EPICS server but do not define fully the protocol that a client must use to access the server. This design extends the description of the record behaviour in reference [1] to include
• extended and corrected state machines for the CAD and CAR records;
• a matching state machine for the client;
• the set and ordering of signals between the client and server;
• necessary interconnections of the CAD and CAR records;
• suggested APIs for a library implementing the client-side state machine and the communications with the server.
This text is not greatly concerned with the implementation of actions within the EPICS database. An outline of the connections of the CAD-CAR pair to other records is suggested.
1.3 Glossary
action means the activity in the server started by a command. An action may be ephemeral (e.g. registering a new exposure time) or it may last a long time (e.g. executing the exposure).
CAD stands for Command Action Directive, the record by which a command is put into the server.
CAR stands for Command Action Response, the record by which the progress and outcome of an action is reported.
command means, in the context of this protocol, a request to the server to start or stop an action. The command and the action it invokes are not synonymous. It is possible for the server to reject a command without starting an action.
kick is used in the ADAM/DRAMA sense of a command that requests the stopping or modification of an existing action.
libcia is the library of support functions for client programs. It is the logical place to put the code that implements the client side of the new protocol.
obey is used in the ADAM/DRAMA sense of a command that requests the starting of an action.
transaction is used to express the client's view of an exchange of messages. An obey transaction starts with a command to the server and ends with the server's notification that the associated action has finished. A kick transaction starts and ends with the processing of the command. In each case, giving the command involves several separate accesses to the server, so the client's transaction is more complex than the atomic transactions described in the channel-access manual [2].
1.4 References
[1] Gemini Record Reference Manual
Gemini Controls-group report SPE-C-G007/02 by B. Goodrich and A. J. Foster.
[2] EPICS R3.12 Channel Access Reference Manual
by J. O. Hill (Los Alamos National Laboratory).
[3] Remote Procedure Calls for DRAMA clients
ING/RGO document
OBS-RPC-1 by G. T. Rixon.
2 System overview
EPICS is a crucial part of the computing architecture for ING telescopes. It needs to be integrated into the general client-server system used at the telescopes and run alongside non-EPICS (ADAM, DRAMA, FORTH) installations.
It is intended that the system of separate client-programs for each command be extended to all sub-systems on all telescopes. The means of communications with EPICS databases must be captured in the support library, libcia [3], for these programs. Using a standard library for the interface implies a standard way of organizing transactions.
2.1 Transaction types
Following the model of ADAM and DRAMA, there are five types of transaction:
get transactions retrieve one datum from the server. One possible form of get retrieves a data structure or a list of data in a single transaction. The transaction ends when the client receives the data.
put transactions are the reverse of gets.
monitor transactions are like repeated gets. The client registers a monitor on a datum (or set of data) with the server and is sent the current value(s) and the new value(s) at each change. The transaction continues indefinitely until the client terminates it.
obey transactions send a command to the server. The command is checked by the server and, if accepted, starts an action in the server. The action may be short-lived, such as setting an exposure time, or long-lived, such as executing the exposure. The transaction ends when the action ends, or when the command is rejected, or when the command fails to start, whichever happens first. The command in an obey can have an arbitrary set of parameters.
kick transactions send a command that asks the server to modify the action started by a concurrent obey. Typically, the kick is used to abort the action but it may require it to continue in a different course (e.g. changing the exposure time of a running exposure). A kick transaction ends when its command is accepted or rejected; kicks do not start actions. Kicks may have parameters.
This set of transaction types has proved entirely suitable for ING's purposes and should be retained for new work. Where EPICS sub-systems work alongside DRAMA servers, the DRAMA-style transactions have to be supported so it is very helpful to provide a common interface for the clients.
2.2 Transactions supported by channel access
Channel access, the lower-level communications protocol for EPICS, supports get, put and monitor transactions directly. Each transaction applies to a single field of a single record: it communicates a scalar value. This support is less flexible than in DRAMA (where get, put and monitor transactions can apply to data structures) but is sufficient. No higher-level protocol is proposed for these transactions.
Channel access has no concept of obey or kick transactions. These cannot be expressed as manipulation of a single field of one record as several data are involved: the action name; the obey/kick directive; the list of parameters. The higher-level protocol to do obey and kick is the main subject of this design.
In a distributed system, it is possible for the client to lose contact with the server machine part-way through a transaction. Channel access has mechanisms to detect such disconnections, so get, put and monitor transactions, and the lower-level components of obeys and kicks, need not be left hanging.
2.3 Transaction support from CAD and CAR records
The CAD and CAR record-types developed for the Gemini project provide a base on which to build obey and kick transactions. The higher-level protocol is composed of gets, puts and monitors of fields in CAD and CAR records; the transaction does not need to address any other records in the database.
A CAD record handles the submission and validation of a command and the starting of the associated action. Fields are provided in which the client can put up to 20 scalar parameters for the command. To use the CAD, a client drives the record's internal state-machine through a number of stages by setting the record's DIR field to the following values:
CLEAR erase the remains of previous commands;
MARK latch in the command's parameters;
PRESET validate the command and its parameters;
START start the action;
STOP intervene in an action started by a previous transaction.
START is used in obey transactions and STOP for kicks. CLEAR is only needed for recovery from errors. The CAD has fields to report the success or failure of the command, but does not report on the action that the command starts.
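The order of directives in an error-free transaction can be sketched as follows. This is only an illustration in Python, not part of the design: the names Directive and directive_sequence are hypothetical, the integer codes are those listed for CAD_DIR in the data dictionary, and the kick is assumed to use the same mark and preset steps as the obey.

```python
from enum import IntEnum


class Directive(IntEnum):
    """Values of the CAD record's DIR field (encoding from the data dictionary)."""
    MARK = 0
    CLEAR = 1
    PRESET = 2
    START = 3
    STOP = 4


def directive_sequence(transaction_type):
    """Return the directives a client applies, in order, when nothing goes wrong.

    Hypothetical helper: an obey finishes with START, a kick with STOP.
    CLEAR is left out because it is only needed for recovery from errors.
    """
    final = {"obey": Directive.START, "kick": Directive.STOP}[transaction_type]
    return [Directive.MARK, Directive.PRESET, final]


print([d.name for d in directive_sequence("obey")])
print([d.name for d in directive_sequence("kick")])
```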
A CAR record reports the state of an action. It also implements a state machine, the output of which is the indication to the client of when the action completes. There are additional fields showing an error code and error message.
In the Gemini usage, arbitrary numbers of CADs and CARs can be combined in a single operation using a third type of record: apply. ING's usage is simpler.
• For each CAD there is exactly one CAR.
• For each CAD-CAR pair, there is one action.
• Apply records are not used.
The public interfaces of the CAD and CAR records, described in reference 1, do not define a unique transaction protocol. Most of the protocol is fixed by the state machines in the records and by the input and output fields, but there are significant ambiguities concerning coordination of the CAD and CAR, concurrent transactions and behaviour after transactions fail. The design model in section 3 tries to resolve these uncertainties.
3 System design
3.1 Design method
A client and a server with one CAD-CAR pair are modelled using Hatley and Pirbhai's method. The client is shown executing both an obey transaction and a kick on the pair. This is a model of the essence of an implementation of the protocol; it is intended to give a more complete description of the situation than a bare list of signals and timings.
A data dictionary is given at the end of the model. It defines formally the usage of the data and control flows, although it does not go into details of primitive data types except where it is useful to record the encoding dictated by the CAD and CAR records. In the Hatley-Pirbhai notation, all literal values are denoted by string constants but this need not be carried over into the coding. Thus "TRUE" and "FALSE" may be implemented by the most natural means and the model does not imply that strings should be used in the final code.
Some points of notation in the logic charts deserve explanation. In the state-event matrices, the initial state is the one listed in the row below the column headings and the machine executes the actions in the top-leftmost cell before passing into the initial state. In the cells of these matrices, the operations are denoted by Action / next state. Where an action is given with no following slash, the transition is to the same state. Where there is an action and a slash but no following state-name, the state machine terminates. In the process-activation tables, and in the state-event matrices, Enable (E) means that a transform is turned on and stays turned on until further notice. Trigger (T) means that the transform runs instantaneously producing its outputs once. Activate (A) means that the transform is enabled but implicitly disabled again at the next state transition.
3.2 Decomposition of the model
3.2.1 Connections between client and server
Since we are interested only in the exchanges between client and server, no context diagram is needed and no flows go to or from the outside world.
Figure 1 shows the data and control flows between the client and server. Each of the flows represents one field of the CAD-CAR pair.
The CA-repeater task is started automatically when the client makes the first channel-access connection. It informs the client if the connection with the server fails.
3.2.2 Arrangements in the EPICS database
The server program shown in figure 1.2 consists of the single CAD-CAR pair, an Action object representing the actual processing of the action and some time-out arrangements to do with the CAD record. Normally, Action would have other inputs and outputs concerned with the process controlled, but these are not shown because they do not affect the execution of the transaction.
The connections between CAD, Action and CAR are important: note how both CAD and Action supply CAR's inputs. It will be seen that CAD and Action send these data at different points in the transaction.
3.2.2.1 CAD time-out
The time-out logic is shown in figure 1.2-s1 below. It runs off the various forward links of the CAD record and resets (clears) the CAD if a start or stop directive does not follow a mark or preset directive within 30 seconds. This feature could be regarded as optional but I strongly recommend that it become standard for all ING servers. Without a time-out or similar reset, it is possible for the CAD record to be hung up by an uncompleted transaction.
NAME:
1.2.4
TITLE:
CAD_timeout
INPUT/OUTPUT:
CAD_DIR : control_out
BODY:
After 30 seconds set CAD_DIR to CLEAR.
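The time-out rule can be sketched in Python as below. This is a minimal illustration rather than database code: the class CadTimeout and its methods are hypothetical, time is passed in explicitly so that the logic can be tested, and it is assumed that each mark or preset directive restarts the 30-second interval.

```python
class CadTimeout:
    """Hypothetical watchdog: clears a CAD left marked or preset for too long."""

    LIMIT_SECONDS = 30.0

    def __init__(self):
        self.armed_at = None  # time of the pending MARK/PRESET, or None

    def on_directive(self, directive, now):
        """Record a directive seen by the CAD at time `now` (seconds)."""
        if directive in ("MARK", "PRESET"):
            self.armed_at = now      # (re)start the 30-second interval
        else:                        # START, STOP or CLEAR ends the wait
            self.armed_at = None

    def poll(self, now):
        """Return "CLEAR" if the CAD should be reset at time `now`, else None."""
        if self.armed_at is not None and now - self.armed_at >= self.LIMIT_SECONDS:
            self.armed_at = None
            return "CLEAR"
        return None
```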
3.2.2.2 CAR record
The CAR record, shown in figure 1.2.2, is very simple internally. It copies input flows to output flows when it detects state changes in its IVAL field.
Notably, there are some state transitions that a CAR record will not register. Authors of databases should study figure 1.2.2-s1 and make sure that they do not present improper values for CAR:IVAL otherwise the information will not be propagated to the client.
CAR includes a PAUSED state but most EPICS workers consider this unnecessary. ING databases should not report the PAUSED state.
NAME:
1.2.2.1
TITLE:
Report_action_result
INPUT/OUTPUT:
CAR_VAL : data_out
OERR : data_out
OMSS : data_out
BODY:
Copy CAR:IERR to CAR:OERR.
Copy CAR:IMSS to CAR:OMSS.
Propagate new values to all monitoring programs.
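The copying behaviour can be sketched in Python as below. The class CarRecord is a hypothetical stand-in and callbacks stand in for channel-access monitors. The sketch does not model the transitions that the real record refuses to register (figure 1.2.2-s1).

```python
class CarRecord:
    """Hypothetical model of a CAR record's input-to-output copying."""

    def __init__(self):
        self.ival, self.ierr, self.imss = "IDLE", 0, ""
        self.val, self.oerr, self.omss = "IDLE", 0, ""
        self.monitors = []  # callbacks taking (val, oerr, omss)

    def process(self):
        """Copy inputs to outputs if IVAL has changed since the last pass."""
        if self.ival == self.val:
            return False                 # no state change: nothing to report
        self.oerr = self.ierr            # copy CAR:IERR to CAR:OERR
        self.omss = self.imss            # copy CAR:IMSS to CAR:OMSS
        self.val = self.ival
        for notify in self.monitors:     # propagate to monitoring programs
            notify(self.val, self.oerr, self.omss)
        return True
```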
3.2.2.3 CAD record
The CAD record of figure 1.2.1 dictates most of the form of the transaction. Its core is the state machine shown in figures 1.2.1-s1 and 1.2.1-s2.
The CAD state-machine, which is described in reference 1, has the idle state between commands and one of its two active states while a command is being processed. After the command starts or modifies an action the state-machine goes back to idle; that is, it has no state to express the progress of the action, that information being available from the CAR record.
At appropriate points in the transaction, the CAD record activates its two data transforms Validate and Start_action. The former is implemented explicitly in the CAD as an application-specific subroutine to validate the command. A failure of validation prevents the CAD from entering the PRESET state and thus stops it starting or modifying the action. The latter transform combines aspects of the CAD implementation - generation of a pass/fail verdict on the starting of the command - with aspects of the record's connection to its CAR partner. Crucially, the CAD is required to set the CAR record into its BUSY state when starting the command: this makes it substantially easier for the client to detect the end of the action.
Figure 1.2.1-s1 (above) has two states - PRESETTING and STARTING - not mentioned in reference 1. These extra states reflect the different state-changes depending on the status return from Validate and Start_action. The current issue of reference 1 ignores (incorrectly) the CAD record's behaviour on error.
The transaction generates output signals both to the client - the MARK field of the CAD - and to other records - the various forward links. These are detailed in figure 1.2.1-s2.
NAME:
1.2.1.1
TITLE:
Validate
INPUT/OUTPUT:
Command_rejected : control_out
CAD_AtoT : data_out
CAD_VAL : data_out
CAD_MESS : data_out
CAD_AtoT : data_in
BODY:
Validate the command parameters AtoT.
If the parameters are valid:
Copy the parameters to their output.
Set CAD_VAL to zero.
Set Command_rejected to FALSE.
Otherwise:
Set CAD_VAL to an error code.
Set Command_rejected to TRUE.
NAME:
1.2.1.2
TITLE:
Start_action
INPUT/OUTPUT:
Command_failed : control_out
CAR_IVAL : data_out
CAR_MESS : data_out
CAR_IERR : data_out
CAD_VAL : data_out
CAD_MESS : data_out
BODY:
Set CAR_IVAL to BUSY.
Set CAR_MESS to no message.
Set CAR_IERR to OK.
If anything goes wrong:
Set CAD_VAL and CAD_MESS to report the error.
Set Command_failed to TRUE.
Otherwise:
Set Command_failed to FALSE.
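The CAD cycle and its two transforms can be sketched in Python as below. This is a simplified illustration under several assumptions: the class CadRecord is hypothetical, a dictionary stands in for the inputs of the CAR record, directives given out of order are ignored, a rejected command leaves the record marked until it is cleared, and the Command_failed path and the transient PRESETTING and STARTING states are not modelled.

```python
class CadRecord:
    """Hypothetical model of the CAD cycle: IDLE -> MARKED -> PRESET -> IDLE."""

    def __init__(self, validate, car):
        self.validate = validate   # callable(params) -> (status code, message)
        self.car = car             # dict standing in for the CAR inputs
        self.mark, self.val, self.mess = "IDLE", 0, ""
        self.params = []

    def directive(self, name, params=None):
        """Apply one directive and return the resulting MARK state."""
        if name == "CLEAR":
            self.mark = "IDLE"
        elif name == "MARK" and self.mark == "IDLE":
            self.params, self.mark = list(params or []), "MARKED"
        elif name == "PRESET" and self.mark == "MARKED":
            self.val, self.mess = self.validate(self.params)
            if self.val == 0:              # Command_rejected is FALSE
                self.mark = "PRESET"
        elif name in ("START", "STOP") and self.mark == "PRESET":
            if name == "START":
                # Start_action: the CAR must show BUSY as the action starts.
                self.car.update(IVAL="BUSY", IMSS="", IERR=0)
            self.mark = "IDLE"             # the CAR, not the CAD, tracks progress
        return self.mark
```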
3.2.2.4 Records implementing the action
The form of these records will be very dependent on the application. The only important point is that the outputs to the CAR record are produced correctly. The action has to respect the state machine in the CAR and must not try to force invalid state-transitions.
NAME:
1.2.3
TITLE:
Action
INPUT/OUTPUT:
CAR_IVAL : data_out
CAR_IERR : data_out
CAR_IMSS : data_out
CAD_AtoT : data_in
CAD_FLINK : control_in
BODY:
This transform represents the records or code that
executes the action. The action is assumed to begin by
reading in CAD_AtoT when CAD_FLINK is asserted.
At all times, the data CAR_IVAL, CAR_IERR and CAR_IMSS
must be kept up-to-date to reflect the state of the action.
Some constraints:
CAR_IVAL should not be set to PAUSED.
CAR_IVAL should be set to BUSY until the action ends.
If the action ends successfully, CAR_IVAL should be set to
IDLE, CAR_IERR should be set to OK and CAR_IMSS should
not be set.
If the action fails, CAR_IVAL should be set to ERR,
CAR_IERR to an error code and CAR_IMSS to a message
describing the error.
CAR_IVAL should be set after setting CAR_IERR and
CAR_IMSS (i.e. the CAR record should not be invited to
change state until the error information is set).
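The ordering constraint at the end of an action can be sketched in Python as below. The function finish_action is hypothetical and a dictionary stands in for the input fields of the CAR record; the point shown is only that the state is written last.

```python
def finish_action(car_inputs, error_code=0, message=""):
    """Report the end of an action through the CAR inputs (a plain dict here).

    Hypothetical helper. The error fields are written first and IVAL last,
    so the CAR is not invited to change state before the error data are set.
    """
    if error_code == 0:
        car_inputs["IERR"] = 0            # OK; no message is set on success
        car_inputs["IVAL"] = "IDLE"
    else:
        car_inputs["IERR"] = error_code
        car_inputs["IMSS"] = message
        car_inputs["IVAL"] = "ERR"        # written last
    return car_inputs
```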
3.2.3 The client program
For the purposes of this model, the client consists of an application that invokes both an obey and a kick transaction on the example CAD-CAR pair in the server. Each transaction is represented by a data-transform that is started by passing a parameter list and ends by returning a result (a status code plus possible error-messages).
The two types of transaction are equivalent at this level except that the kick transaction has no connections to the CAR record.
By necessity, these transactions (and any others in the same client) are quasi-parallel. The model assumes no particular ordering of the messages for transactions with different CAD-CAR pairs, but see the discussion below of concurrent access to a single command.
The means by which the transaction objects connect to the server is not shown. The channel-access facilities dictate that the client will get a `channel identifier' for each field of each record that it wants to read or write. These identifiers can either be acquired en masse before the start of the first transaction or each transaction can get its own channels as it is activated. This choice is very much a detail of implementation and does not seem to affect the outcome.
The control flow CA_disconnection into each transaction is significant. This represents the channel-access repeater telling the client that contact with a server has been lost. Note that the signal goes to each individual transaction which must handle the error: the application code is not expected to trap the signal and abort the transactions.
3.2.3.1 The obey transaction
Figure 1.1.2 shows the processing needed for an obey. The core of the work is the control processing shown in figures 1.1.2-s1 to 1.1.2-s6, which implements a state machine that reflects the two state machines running in the CAD and CAR records. The client machine drives the CAD record's machine through its cycle by setting the CAD's DIR field successively to the appropriate directives. The client then monitors the CAR record to detect the end of the action, at which point the transaction is finished and the results are reported.
Several areas of this state machine bear closer examination: the exit conditions, the handling of the state of the action, the disconnection logic and the treatment of concurrent commands.
The state-transition logic, expressed in figure 1.1.2-s3, is intended to exit from the state machine - i.e. to end the transaction - either when the action finishes or at the first sign of an error. The CAD record can signal two types of error: command rejected (by the validation subroutine) and command failed to start. Both errors mean that the action will never start for this command and the transaction is dead. A failure in validation is routine: it happens if the command parameters are wrong or if the CAD record is aware of some interlocking condition in the server that invalidates the command. The command failure is expected to be very rare: it represents some programming error in the EPICS database. Significantly, the command failure can only be sensed by the client once validation is achieved. It is also possible for the action to start and then fail; this is distinct from the command failure (it is sensed by the client via the CAR record, not via the CAD). On exiting the state machine from any of the states prior to running, the CAD record is cleared back to its idle state to make it available to other commands.
The final exit path from the client's state machine is taken if the server fails to respond: the logic for this is shown in figures 1.1.2-s5 and 1.1.2-s6. The Timeout condition can occur in two ways: if the CA repeater sends the CA_disconnection signal, or through a time-out noted by the client itself. The latter mechanism covers possible programming errors in the server that leave it unable to complete the transaction; most errors in service are likely to be outright disconnections. Since the time to complete the action is not known by the client, no timeout can be applied to this part of the transaction. Hence, misbehaving CAR records can hang the transaction, but misbehaviour by the CAD cannot.
The logic for detecting the end of the action relies on the CAR record being in the BUSY state as soon as the CAD record reports that the action is started. This is critical: if there is a tiny gap when the CAD reports `action started' and the CAR reports `idle' then the client may conclude that the action has already finished (it is a race condition). This is the reason for the cross-connection of CAD and CAR noted above in the description of the server. Other arrangements are possible, but this one moves the responsibility from the client code to the server where it should be easier to code and where the level of test coverage is expected to be higher.
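The race can be illustrated with a short Python sketch. The function end_of_action_index is hypothetical; it applies the simple client rule "the action has ended when the CAR is no longer busy" to the sequence of CAR states seen by the client from the moment the CAD reports that the action has started.

```python
def end_of_action_index(car_states):
    """Return the index at which the client concludes the action has ended.

    Hypothetical helper. `car_states` is the sequence of CAR VAL values seen
    by the client after the CAD reports the start. PAUSED is treated as BUSY.
    Returns None if the action is still running at the end of the sequence.
    """
    for index, state in enumerate(car_states):
        if state not in ("BUSY", "PAUSED"):
            return index
    return None


# CAR already BUSY when the start is reported: the end is detected correctly.
print(end_of_action_index(["BUSY", "BUSY", "IDLE"]))
# CAR still IDLE when the start is reported: the client stops too early.
print(end_of_action_index(["IDLE", "BUSY", "IDLE"]))
```

With the cross-connection in place the first sequence is the only kind the client can see, so the simple rule is sufficient.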
The logic for concurrent commands is in figures 1.1.2-s1 and 1.1.2-s3. Concurrent commands are not allowed, but concurrent actions are. The nature of the CAD record excludes concurrent commands (there is no command queue) and the record must be in its idle state before the client may mark it. However, if a transaction times out (after 30 seconds) while waiting for the CAD record to become free, the blocked client will clear the CAD back to idle as it abandons its transaction.
The treatment of concurrent actions is not specified by the transaction protocol but is determined by the construction of the database. Three outcomes are possible.
1. The second action is not allowed. To achieve this, the validation subroutine in the CAD record has to detect the attempted concurrency and reject the second command.
2. The second action over-rides the first silently. This will happen by default if no extra links are put into the database. The action started by the second command replaces that started by the first; when this new action ends, both transactions end with its final status.
3. The second action explicitly replaces the first. This is a variation on case 2 in which the first action ends, possibly with some warning condition-code, as soon as the second action starts. To achieve this the CAD record has to force the CAR record back to the state IDLE (or ERR if a warning is to be given) before setting the CAR to BUSY and signalling the start of the new action.
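Outcome 1 might be arranged as in the following Python sketch of a validation subroutine for an obey command. It is an illustration only: the function validate_exclusive and the error code 1 are hypothetical, and the subroutine is assumed to be able to read the current state of the partner CAR record.

```python
def validate_exclusive(params, car_val):
    """Reject a new obey command while the action is still running.

    Hypothetical validation routine returning (status code, message).
    A status of zero means that the command is accepted.
    """
    if car_val in ("BUSY", "PAUSED"):
        return 1, "action already running"   # code 1 is an arbitrary example
    return 0, ""
```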
NAME:
1.1.2.1
TITLE:
Set_parameters
INPUT/OUTPUT:
CAD_AtoT : data_out
Parameter_list : data_in
BODY:
Set the items in the parameter list into the
command-parameter fields A to T of the CAD record.
Then set the DIR field to "MARK".
NAME:
1.1.2.2
TITLE:
Report_command_result
INPUT/OUTPUT:
Transaction_result : data_out
CAD_MESS : data_in
CAD_VAL : data_in
BODY:
Read the MESS and VAL fields of the CAD record.
Express the success or failure in Transaction_result.
NAME:
1.1.2.3
TITLE:
CAD_timeout
INPUT/OUTPUT:
Timeout : control_out
BODY:
Set Timeout to FALSE.
After 30 seconds, set Timeout to TRUE.
NAME:
1.1.2.4
TITLE:
Report_action_result
INPUT/OUTPUT:
CAR_OERR : data_in
CAR_OMSS : data_in
Transaction_result : data_out
BODY:
Read the OERR and OMSS fields of the CAR record.
If OERR = 0:
Report success in Transaction_result.
Otherwise:
Report the action error in Transaction_result.
NAME:
1.1.2.5
TITLE:
Report_no_response
INPUT/OUTPUT:
Transaction_result : data_out
BODY:
Report lack of response from the server as an error in
Transaction_result.
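The main path of the obey transaction can be sketched in Python as below. This is a simplified illustration, not the proposed library code: FakeServer, its field names such as "CAD.MARK", the function obey and the status codes -1 and 1 are all hypothetical. Time-outs, channel-access disconnection and the command-failed path are omitted, and a busy CAD is reported at once instead of being waited for.

```python
class FakeServer:
    """Hypothetical stand-in for one CAD-CAR pair reached over channel access."""

    def __init__(self, valid=True, action_error=0):
        self.valid, self.action_error = valid, action_error
        self.params = []
        self.fields = {"CAD.MARK": "IDLE", "CAD.VAL": 0, "CAD.MESS": "",
                       "CAR.VAL": "IDLE", "CAR.OERR": 0, "CAR.OMSS": ""}

    def get(self, field):
        return self.fields[field]

    def put_params(self, params):
        self.params = list(params)

    def put_dir(self, directive):
        f = self.fields
        if directive == "CLEAR":
            f["CAD.MARK"] = "IDLE"
        elif directive == "MARK":
            f["CAD.MARK"] = "MARKED"
        elif directive == "PRESET":
            if self.valid:
                f["CAD.VAL"], f["CAD.MARK"] = 0, "PRESET"
            else:
                f["CAD.VAL"], f["CAD.MESS"] = 1, "bad parameters"
        elif directive == "START":
            # The CAD sets the CAR to BUSY as it starts the action.
            f["CAD.MARK"], f["CAR.VAL"] = "IDLE", "BUSY"

    def car_monitor(self):
        """Yield CAR VAL updates; the fake action ends after one BUSY report."""
        f = self.fields
        yield f["CAR.VAL"]
        f["CAR.OERR"] = self.action_error
        f["CAR.OMSS"] = "action failed" if self.action_error else ""
        f["CAR.VAL"] = "ERR" if self.action_error else "IDLE"
        yield f["CAR.VAL"]


def obey(server, params):
    """Run one obey transaction; return (status code, message)."""
    if server.get("CAD.MARK") != "IDLE":
        return -1, "command not available"   # a real client would wait first
    server.put_params(params)                # Set_parameters, then MARK
    server.put_dir("MARK")
    server.put_dir("PRESET")
    if server.get("CAD.MARK") != "PRESET":   # the command was rejected
        result = (server.get("CAD.VAL"), server.get("CAD.MESS"))
        server.put_dir("CLEAR")              # free the CAD for other commands
        return result
    server.put_dir("START")
    for state in server.car_monitor():       # wait for the CAR to leave BUSY
        if state not in ("BUSY", "PAUSED"):
            break
    return server.get("CAR.OERR"), server.get("CAR.OMSS")
```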
3.2.3.2 The kick transaction
The kick transaction is deliberately made as similar to the obey as possible. Figure 1.1.3 shows the internal flows of command and data for the kick and should be compared to figure 1.1.2 for the obey. The subsequent figures show the logic and processing for the kick where it differs from the obey; transforms that are the same for kick and obey are not detailed.
The primary characteristics of the kick are:
• it applies a STOP directive instead of a START;
• it has no effect if the target action is not running;
• it does not connect with a CAR record;
• it does not monitor the action it affects, but ends as soon as the STOP directive is registered by the CAD.
The effect of the kick on the action is often to end the action but it may have other effects or no effect at all, depending on the state of the server; in the latter case, the server must complete the transaction even if it does not touch the application. All these responses use the same transaction code; only the client and server applications make the distinction.
Kick transactions may fail validation and be rejected by the server; or they may fail because the CAD record fails to deliver the command; or they may time out. Otherwise they succeed; the kick can never report an error in the action.
The protocol does not define a standard response for the case where a kick is issued to an action that is not running. The authors of specific applications may choose to have the server reject these kicks, but experience with DRAMA suggests that this does no service to the client and the useless kicks should typically be treated as successful.
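The kick can be sketched in Python in the same style. This is again only an illustration: FakeCad, kick and the status code 1 are hypothetical, time-outs and the command-failed path are omitted, and the kick is assumed to use the same mark and preset steps as the obey.

```python
class FakeCad:
    """Hypothetical stand-in for the CAD record addressed by a kick."""

    def __init__(self, valid=True):
        self.mark, self.val, self.mess, self.valid = "IDLE", 0, "", valid
        self.stops = 0                       # STOP directives delivered

    def put_dir(self, directive):
        if directive == "CLEAR":
            self.mark = "IDLE"
        elif directive == "MARK":
            self.mark = "MARKED"
        elif directive == "PRESET":
            if self.valid:
                self.val, self.mark = 0, "PRESET"
            else:
                self.val, self.mess = 1, "kick rejected"
        elif directive == "STOP" and self.mark == "PRESET":
            self.stops += 1
            self.mark = "IDLE"


def kick(cad):
    """Run one kick transaction; return (status code, message)."""
    cad.put_dir("MARK")
    cad.put_dir("PRESET")
    if cad.mark != "PRESET":                 # the kick was rejected
        result = (cad.val, cad.mess)
        cad.put_dir("CLEAR")
        return result
    cad.put_dir("STOP")
    return 0, ""    # ends once STOP is registered; no CAR record is involved
```

Because the transaction never reads a CAR record, the same code serves whether the target action is running or not; any distinction is left to the server application.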
3.2.4 Data dictionary
Action_running (control flow, del) =
["TRUE" | "FALSE"].
*
Indicates whether an action is in progress. This mirrors
the state of the VAL field in the CAR record: "BUSY"
and "PAUSED" imply activity; the other states of the
VAL field do not.
*
CA_disconnection (control flow, del) =
["TRUE" | "FALSE"].
*
Indicates if the connection from the client to the
server program has been broken. Channel access provides
a reliable indication of this.
If FALSE, this datum does not guarantee that a particular
record in the server is responding correctly.
*
CAD_AtoT (data flow, cel) =
.
*
A parameter of the transaction. The CAD record may have
2, 4, 8 or 20 of these fields called A, B, C, D...
*
CAD_DIR (control flow, del) =
["MARK" | "CLEAR" | "PRESET" | "START" | "STOP"].
*
The DIR (directive) field of the CAD record.
The values are encoded as integers:
0 = MARK
1 = CLEAR
2 = PRESET
3 = START
4 = STOP
*
CAD_MARK (control flow, del) =
["IDLE" | "MARKED" | "PRESET" ].
*
The MARK field of the CAD record. It shows the
state of the validation process for the transaction.
The value is encoded as an integer:
0 = IDLE
1 = MARKED
3 = PRESET
IDLE means that the command is available to a new
transaction.
MARKED means that a transaction has set some of the fields of
the record but the command has not yet been validated.
PRESET means that the command has been validated but
has not yet been started.
*
CAD_marked (control flow) =
["TRUE" | "FALSE"].
*
True when the CAD record has executed a MARK directive.
*
CAD_MESS (data flow, cel) =
.
*
The MESS field of the CAD record. It is a string that
holds an error message.
*
CAD_MLNK (control flow, del) =
["TRUE" | "FALSE"].
*
The MLNK field of the CAD record. This forward link is
asserted when the CAD is marked.
*
CAD_preset (control flow) =
["TRUE" | "FALSE"].
*
Shows when the CAD record has executed a PRESET directive.
*
CAD_SPLNK (control flow, del) =
["TRUE" | "FALSE"].
*
The SPLNK field of the CAD record. This forward link
is asserted when the CAD executes a STOP directive.
*
CAD_started (control flow) =
["TRUE" | "FALSE"].
*
Shows when the CAD record has executed a START directive.
*
CAD_STLNK (control flow, del) =
["TRUE" | "FALSE"].
*
The STLNK field of the CAD record. This forward link
is asserted when the CAD executes a START directive.
*
CAD_stopped (control flow, del) =
["TRUE" | "FALSE"].
*
True when the CAD record has executed a STOP directive.
*
CAD_VAL (data/control flow, del) =
["OK" | "OTHER"].
*
The VAL field of the CAD record. It records the
status of the command. Normally, any errors reported
here mean failure of validation but it is also possible
to get codes that indicate failure of the EPICS database
itself.
The status is encoded as a 32-bit integer. OK is zero
and the token OTHER above stands for all possible non-zero
codes, each of which indicates a specific error. The
set of codes varies from command to command and is not
closed.
*
CAR_IERR (data flow, cel) =
.
*
The error code input to the CAR record. It is a 32-bit
integer.
*
CAR_IMESS (data flow, cel) =
.
*
The status message input to the CAR record. It is
an ASCII string.
*
CAR_IVAL (control flow, del) =
["IDLE" | "PAUSED" | "BUSY" | "ERR"].
*
The IVAL field of the CAR record, showing possible states
of an action.
The current transaction model does not use PAUSED.
Servers should not express that state and clients
should treat it as BUSY.
The states are passed between records, and between the
server and the client, as numbers:
1: IDLE
2: PAUSED
3: BUSY
4: ERR
An action begins when the state goes to BUSY and ends when
the state next changes from BUSY to anything else.
*
CAR_OERR (data flow, cel) =
.
*
The OERR field of the CAR record. This carries the status
of the transaction.
*
CAR_OMSS (data flow, cel) =
.
*
The OMSS field of the CAR record. It is a string holding
a message describing errors in the action.
*
CAR_VAL (control flow, del) =
["IDLE" | "PAUSED" | "BUSY" | "ERR"].
*
The VAL field of the CAR record, showing possible states
of an action.
The current transaction model does not use PAUSED.
Servers should not express that state and clients
should treat it as BUSY.
The states are passed between records, and between the
server and the client, as numbers:
1: IDLE
2: PAUSED
3: BUSY
4: ERR
An action begins when the state goes to BUSY and ends when
the state next changes from BUSY to anything else.
*
Clear_CAD (control flow, del) =
["TRUE" | "FALSE"].
*
A command, internal to the client-side transaction code,
to clear (reset) the CAD record.
*
Command_accepted (control flow, del) =
["TRUE" | "FALSE"].
*
The result of validation in the CAD record.
*
Command_available (control flow, del) =
["TRUE" | "FALSE"].
*
Indicates whether the command is free to be preset by
the current transaction. A reading of "FALSE" indicates
that the command is being preset by a parallel transaction;
this situation is assumed to be short-lived.
The command is available even if the associated action is
active. Non-availability of the action is registered
elsewhere in the system.
This datum matches the state of the CAD record as shown
in its MARK field. The command is available only when
MARK = 0.
*
Command_failed (control flow, del) =
["TRUE" | "FALSE"].
*
Indicates whether a command that was previously accepted
was started correctly. The TRUE value indicates some
failure, presumably a programming error, in the EPICS
database. The FALSE value applies before the command
is validated and afterwards, while the action associated with
the command is executing. Failure of the action does not
set this datum to TRUE.
*
Command_marked (control flow, del) =
["TRUE" | "FALSE"].
*
Indicates whether a command has been `marked' in the
CAD record: i.e. have the parameters (if any) been set and
is the command ready to validate?
*
Command_rejected (control flow, del) =
["TRUE" | "FALSE"].
*
The result of command validation in the CAD record.
*
Mark_CAD (control flow, del) =
["TRUE" | "FALSE"].
*
A command, internal to the client-side transaction code,
to send a MARK directive to the CAD record.
*
No_response (control flow, del) =
["TRUE" | "FALSE"].
*
Indicates when the CAD-CAR pair fail to hold up their
side of the transaction. This can be because of
a failure of communications with the EPICS database,
because some other transaction has left the CAD record
marked for a long period, or because there are programming
errors in the EPICS database.
*
Parameter_list (data flow, pel) =
.
*
The set of parameters to be passed with the current
command. CAD records allow up to 20 scalar parameters
to be passed. Each parameter is passed as a string.
*
Preset_CAD (control flow, del) =
["TRUE" | "FALSE"].
*
A command, internal to the client-side transaction code,
to preset (validate) the CAD record.
*
Start_CAD (control flow, del) =
["TRUE" | "FALSE"].
*
A command, internal to the client-side transaction code,
to start the command in the CAD record.
*
Stop_CAD (control flow, del) =
["TRUE" | "FALSE"].
*
A command, internal to the Kick transaction,
to send a STOP directive to the CAD record.
*
Timeout (control flow, del) =
["TRUE" | "FALSE"].
*
Indicates that something has not happened in its allotted
span of time.
*
Transaction_result (data flow, pel) =
.
*
A status return expressing the final status of the
transaction. The return includes an error code and
may include error messages.
*
Transaction_state (control flow, del) =
["ACTIVE" | "ABANDONED" | "LOST" | "ENDED"].
*
States of the transaction that affect the reports to the
client program. The states of the state machine itself
are different.
ACTIVE means that the transaction engine is negotiating
with the CAD record or waiting for the CAR record to
show the end of the action.
ABANDONED means that the transaction failed without
starting an action: either the command was rejected
or the CAD record failed when starting the command.
LOST means that the transaction failed because contact
was lost with the CAD-CAR pair. The lack of response may
occur because the server mis-handled the transaction
protocol and does not necessarily imply a network failure
or server crash.
*
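The relationship between the control flows defined above and Transaction_state can be sketched in C. This is only an illustration under stated assumptions: the type and function names are hypothetical, the order in which the flags are tested is a guess, and the meaning given to ENDED (the action has left the BUSY state) is inferred from the definition of an action rather than stated in the dictionary.

```c
/* Hypothetical names throughout; none of these are part of libcia. */
typedef enum {
    TXN_ACTIVE,     /* negotiating with the CAD or waiting on the CAR     */
    TXN_ABANDONED,  /* failed without starting an action                  */
    TXN_LOST,       /* contact with the CAD-CAR pair was lost             */
    TXN_ENDED       /* assumption: the action has left the BUSY state     */
} txnState_t;

typedef struct {
    int commandRejected;  /* Command_rejected                             */
    int commandFailed;    /* Command_failed                               */
    int noResponse;       /* No_response                                  */
    int actionEnded;      /* assumption: CAR changed from BUSY to another state */
} txnFlags_t;

/* Map the control flows onto the state reported to the client program.
 * The priority of the tests is an assumption, not taken from the design. */
txnState_t txnClassify(const txnFlags_t *f)
{
    if (f->noResponse)
        return TXN_LOST;
    if (f->commandRejected || f->commandFailed)
        return TXN_ABANDONED;
    if (f->actionEnded)
        return TXN_ENDED;
    return TXN_ACTIVE;
}
```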
4 Notes for implementation
4.1 Summary of constraints on database design
For any EPICS database participating in the transaction protocol, the designer must abide by these rules.
1. CAD and CAR records must be used in pairs without apply records.
2. Each CAD should be prepared to process both START and STOP directives.
3. Each CAD-CAR pair must be interconnected such that the CAR shows busy at the time that the CAD shows that the START or STOP directive was accepted.
4. CAR records should not be caused to show the PAUSED state.
5. The database designer must choose the way of handling concurrent actions from the possibilities listed in section 3.2.3.1.
6. The database designer must designate the proper response to kicks on idle actions, choosing from the possibilities listed in section 3.2.3.2.
4.2 Naming conventions and locating the servers
Channel access hides the process of locating a particular database on the network by assuming that all databases have unique names. This feature is useful (one does not need to change any client software if a database is moved to a new IP address) but it complicates the business of running more than one copy of a database. We will need duplicate databases for engineering work and because we expect to deploy the same server software at three telescopes on the same LAN.
To solve this, I propose that the name of each database used on IACnet embed the telescope name in the format telescope-database-type. For example:
INT-AG      INT autoguider
JKT-AG      JKT autoguider
ENG1-AG     spare autoguider rack for engineering
WHT-EEV20   EEV20 CCDC at WHT (EEV20 is a unique sub-system but still takes the standard prefixes)
The functions in libcia should prepend the telescope name automatically to given server-names, taking the value of the $TELESCOPE environment variable.
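The prefixing rule can be sketched as follows. This is a minimal illustration, not the libcia implementation: the function name qualifyServerName and its error convention are hypothetical, and it assumes that an unset or empty TELESCOPE variable should be treated as an error.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libcia): writes "<TELESCOPE>-<serverName>"
 * into out. Returns 0 on success, or -1 if TELESCOPE is unset or empty or if
 * the result does not fit in outLen bytes. */
int qualifyServerName(const char *serverName, char *out, size_t outLen)
{
    const char *telescope = getenv("TELESCOPE");
    int n;

    if (telescope == NULL || telescope[0] == '\0')
        return -1;

    n = snprintf(out, outLen, "%s-%s", telescope, serverName);
    if (n < 0 || (size_t)n >= outLen)
        return -1;              /* encoding error or truncated name */

    return 0;
}
```

With TELESCOPE set to INT, a call with serverName "AG" would yield the database name INT-AG from the examples above.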
4.3 Suggested API for clients
It is intended that the transaction protocol be used by the type of transient-client programs that currently use DRAMA transactions; EPICS and DRAMA transactions will probably be used side by side in some clients. This suggests strongly that the EPICS transaction-protocol should be given a similar API to the RPC-like arrangements in libcia, which are described in reference 3.
I suggest that the current, DRAMA-specific API be left unchanged and a parallel API be built which can handle both EPICS and DRAMA. In all the relevant type and function names, the term RPC can be replaced by TCB for Transaction Control Block.
The basic TCB type will be ciaTcb_t. I do not define its structure here except to note that it will probably be a union of structures sharing public fields as follows:
union ciaTcb_s {
int variant;
ciaBool_t ready;
ciaBool_t active;
ciaBool_t finished;
StatusType status;
SdsIdType argsOut;
...
In this type, the Boolean flags have the meanings listed in reference 3. The variant field is a flag to distinguish the type of structure and would take values CIA_DRAMA, CIA_EPICS etc.
The argsOut field is the handle of an SDS structure in which the values fetched by get and monitor transactions are encoded.
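One way to arrange a union of structures that share public fields is sketched below. It is an illustration under assumptions, not the proposed definition: the typedefs for StatusType, SdsIdType and ciaBool_t are stand-ins for the real DRAMA and libcia types, the numeric values of CIA_DRAMA and CIA_EPICS are invented, the private members of each variant are hypothetical, and the anonymous structure requires a C11 compiler.

```c
/* Stand-ins for the real DRAMA/libcia types (assumptions). */
typedef long StatusType;
typedef long SdsIdType;
typedef int  ciaBool_t;

enum { CIA_DRAMA = 1, CIA_EPICS = 2 };  /* values are assumptions */

/* Public fields, repeated at the head of every variant so that they form
 * a common initial sequence and can be read through any union member. */
#define CIA_TCB_PUBLIC    \
    int        variant;   \
    ciaBool_t  ready;     \
    ciaBool_t  active;    \
    ciaBool_t  finished;  \
    StatusType status;    \
    SdsIdType  argsOut;

/* Hypothetical private parts of each variant. */
struct ciaTcbDrama_s { CIA_TCB_PUBLIC void *path; };
struct ciaTcbEpics_s { CIA_TCB_PUBLIC int cadChannel; int carChannel; };

typedef union ciaTcb_s {
    struct { CIA_TCB_PUBLIC };      /* anonymous: allows tcb.ready etc. */
    struct ciaTcbDrama_s drama;
    struct ciaTcbEpics_s epics;
} ciaTcb_t;

/* Example of dispatching on the variant flag. */
const char *ciaTcbVariantName(const ciaTcb_t *tcb)
{
    switch (tcb->variant) {
    case CIA_DRAMA: return "DRAMA";
    case CIA_EPICS: return "EPICS";
    default:        return "unknown";
    }
}
```

With this layout a client can write tcb.ready or tcb.finished without knowing which kind of transaction the TCB describes, while the library uses tcb.variant to select the private part.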
This is the suggested list of function calls. I omit those that apply only to DRAMA transactions.
ciaTcbObey( const char* serverName,
const char* actionName,
ciaTcb_t* tcb,
StatusType* status )
ciaTcbKick( const char* serverName,
const char* actionName,
ciaTcb_t* tcb,
StatusType* status )
ciaTcbGet( const char* serverName,
const char* paramName,
ciaTcb_t* tcb,
StatusType* status )
ciaTcbPut( const char* serverName,
const char* paramName,
ciaTcb_t* tcb,
StatusType* status )
ciaTcbMonitor( const char* serverName,
const char* paramName,
ciaTcb_t* tcb,
StatusType* status )
These calls initialize the TCBs for the five transaction types. In EPICS transactions, serverName identifies the database but not its location: channel access handles the location of the server machine. The actionName argument identifies the CAD-CAR pair, and paramName identifies a record and field to interrogate. StatusType is a 32-bit condition code in the DRAMA convention. The effective server name passed to channel access is serverName with the value of the environment variable TELESCOPE and a hyphen prepended.
ciaTcbArgs( StatusType* status,
ciaTcb_t* tcb,
const char* format,
... )
This adds an argument list to a TCB. Characters in the format argument specify the type and number of the following arguments, each of which is encoded and made ready to send to the server when the transaction starts. The arguments will be assigned to the CAD record's fields A to T in the order given. The general form of this command allows SDS structures to be passed to DRAMA transactions but this is not allowed for EPICS transactions.
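For an EPICS transaction, the encoding step might look like the sketch below. Everything here is an assumption made for illustration: the names cadArgs_t, cadEncodeArgs and cadFieldLetter are hypothetical, only the format character "l" (long) appears in this document so "d" (double) and "s" (string) are invented, and the 40-character field length is assumed. The sketch relies only on the stated facts that a CAD record takes up to 20 scalar parameters, each passed as a string, in fields A to T.

```c
#include <stdarg.h>
#include <stdio.h>

#define CAD_MAX_ARGS 20   /* CAD input fields A to T                 */
#define CAD_ARG_LEN  40   /* assumed maximum string length per field */

/* Hypothetical holder for the encoded argument list. */
typedef struct {
    int  count;
    char field[CAD_MAX_ARGS][CAD_ARG_LEN];
} cadArgs_t;

/* Letter of the CAD field that receives argument number i (0-based). */
char cadFieldLetter(int i)
{
    return (char)('A' + i);
}

/* Encode the trailing arguments as strings, one per format character.
 * Returns 0 on success; -1 on an unknown format character, more than
 * 20 arguments, or a value too long for its field. */
int cadEncodeArgs(cadArgs_t *args, const char *format, ...)
{
    va_list ap;
    const char *f;
    int rc = 0;

    args->count = 0;
    va_start(ap, format);
    for (f = format; *f != '\0' && rc == 0; f++) {
        char *slot;
        int n;

        if (args->count >= CAD_MAX_ARGS) {
            rc = -1;                    /* no field left after T */
            break;
        }
        slot = args->field[args->count];
        switch (*f) {
        case 'l': n = snprintf(slot, CAD_ARG_LEN, "%ld", va_arg(ap, long));        break;
        case 'd': n = snprintf(slot, CAD_ARG_LEN, "%g", va_arg(ap, double));       break;
        case 's': n = snprintf(slot, CAD_ARG_LEN, "%s", va_arg(ap, const char *)); break;
        default:  n = -1;                                                          break;
        }
        if (n < 0 || n >= CAD_ARG_LEN)
            rc = -1;
        else
            args->count++;
    }
    va_end(ap);
    return rc;
}
```

In this sketch the first encoded argument would be written to field A, the second to field B, and so on, when the transaction starts.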
ciaTcb_t* ciaTcbExecute( StatusType* status,
int nTcbs,
... )
This executes one or more transactions in parallel: the trailing arguments are a list of nTcbs structures of type ciaTcb_t, each passed by reference. The transactions can be any mix of DRAMA and EPICS. The call returns when a get, put, obey or kick transaction completes or fails (including failure by disconnection), or when a monitor transaction has a value to return. The call will not activate any transaction which has ready set to false in its TCB. The active, finished and status fields will be updated in the participating TCBs on return from the call. The function result is the address of the TCB which changed state.
ciaTcbReuse( ciaTcb_t* tcb,
StatusType* status )
This resets the ready, active, finished, status and argsOut fields of the TCB, such that the transaction can be executed a second time if the TCB is passed to ciaTcbExecute().
The following code-fragment shows the parallel use of three obey transactions on two servers. The FIELD action on the autoguider server is made to follow the two probe movements by suppressing its ready flag.
ciaTcb_t probeX, probeY, field;
StatusType status = 0;
long xPos, yPos;

/* Set up all three TCBs before starting any transaction. */
ciaTcbObey( "AGB", "PROBE-X", &probeX, &status );
xPos = 200;
ciaTcbArgs( &status, &probeX, "l", xPos );
ciaTcbObey( "AGB", "PROBE-Y", &probeY, &status );
yPos = 400;
ciaTcbArgs( &status, &probeY, "l", yPos );
ciaTcbObey( "AG", "FIELD", &field, &status );

/* Hold back FIELD until both probe movements have finished. */
field.ready = FALSE;
do {
    /* Returns each time one of the transactions changes state. */
    (void)ciaTcbExecute( &status, 3, &probeX, &probeY, &field );
    field.ready = (probeX.finished && probeY.finished);
} while( !field.finished && status == 0 );
Here, all three TCBs are set up before any transactions are started. Setting up the TCBs obtains the necessary channels to the databases. The two probe transactions default to ready-to-run; the FIELD transaction is explicitly marked as not ready. Each pass through the do loop starts all ready, non-active, non-finished transactions, and there is a new pass each time an event occurs that could make a transaction ready.
This style of client-building has been scaled successfully to cover more than 15 (DRAMA) transactions run in parallel in the clients for the INT WFC. There seems to be no obvious reason why it should not work as well with EPICS.