SOF-BOM-1
Issue 1.1; 5th December 1995
Royal Greenwich Observatory,
Madingley Road,
Cambridge CB3 0HJ
Telephone (01223) 374000
Fax (01223) 374700
Internet gtr@ast.cam.ac.uk
The Bill-of-materials language (BOM) and its processing utility (`bom') are described. Instructions are given on the phrasing of bills of materials and on running bom. The syntax of the language is defined formally. The descriptions are written for software engineers who need to automate the configuration control for all or part of a large software system.
Storing and building a system generally implies some standard layout of directories for source code, build areas, binaries, libraries and so on. BOM does not require any particular directory structure and directory trees are not discussed here. See [1] for details of a candidate for a standard directory tree.
1.2 Scope of the software
The Bill-of-materials language provides a way to specify how software components are built into an operational software system: that is, the system's configuration. BOM presumes the use of some repository from which successive versions of each component can be retrieved and the use of the standard utility `make'. The utility bom automates the use of these two facilities such that all or part of a system can be built with one command.
1.3 Glossary
assembly           a unit of software built from one or more components.
BOM                the Bill-of-materials language.
bill of materials  any specification written in BOM.
bom                the program that interprets BOM.
component          the smallest unit of software that can be stored in a repository.
RCS                GNU's revision-control system: a repository manager.
repository         a place in which versions of components are stored.
SCCS               source-code-control system: a repository manager.
1.4 References
[1] Controls for INT PFC: Configurations Management. ING/RGO document INT-PF-3.
Many publications on configuration-control are phrased in terms of units; these units are sometimes implied to be something smaller than components. BOM regards a single C function, or its equivalent in other languages, as a unit. Where there is only one function in a file of code, that file is both a unit and a component. Where a file contains several functions, the functions are units but the file is the component; the units are not accounted for separately and BOM does not recognize them.
Most software needs to be compiled before it can be installed and used. BOM designates a file created by compilation to be an assembly. Assemblies are typically programs, but can also be, e.g., object libraries. It is convenient to regard a built system as consisting entirely of assemblies even though some of the files in that system may be components that needed no compilation: shell scripts for example. To accommodate this view, a file may be designated as both a component and an assembly.
Each component has a version number; this is central to configuration control. BOM is designed to work mainly with SCCS in which the versions are Dewey-decimal strings. The first version of a component entered into a repository is designated 1.1. Subsequent changes require that either the leading or trailing number in the designation be changed to create a new version. SCCS also allows branches in the line of descent designated by four-part Dewey-decimal strings, e.g. 1.1.1.1. BOM allows this notation; all version comparisons in BOM are done by matching strings.
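The version notation and the string-matching rule can be illustrated with a minimal Python sketch. The function names and the regular expression are assumptions made for illustration and are not taken from bom itself; the pattern covers only the two-part and four-part numeric forms described above.

```python
import re

# Two-part ("1.1") or four-part ("1.1.1.1") Dewey-decimal designations.
# This pattern is an assumption based on the description in the text.
VERSION_PATTERN = re.compile(r"\d+\.\d+(?:\.\d+\.\d+)?")


def is_valid_version(version: str) -> bool:
    """Return True if the whole string is a two- or four-part designation."""
    return VERSION_PATTERN.fullmatch(version) is not None


def versions_match(found: str, wanted: str) -> bool:
    """Compare versions as plain strings; no numeric ordering is implied."""
    return found == wanted
```

Because the comparison is on strings, a designation such as 1.10 is simply different from 1.1; neither is treated as newer or older than the other.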
Assemblies also have version numbers, using the same notation as the components. This allows a system or sub-system to be defined uniquely in terms of versions of its assemblies and those assembly-versions to be defined uniquely in terms of their components. Using BOM, an assembly is built under the control of a particular bill of materials: that is, a script in the BOM language. The version of an assembly is the same as the version of the bill that assembled it; this point is crucial to preserve the uniqueness of the definition of the version. Several assemblies may be built from one bill, but they then share the same assembly version.
A statement is identified by one of the keywords Component, Assembly, Tag, Makefile or Bill; these must occur in lower case with an initial capital. The statement need not start in the first column of the line, but there must be nothing but white-space between the start of the line and the start of the statement's keyword.
Comment lines are any characters on a line that doesn't contain a syntactically-valid statement. Any characters following a valid statement are also treated as a comment. Comments are entirely ignored.
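The keyword rule can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, and the sketch checks just the leading keyword, whereas the text also requires the rest of the line to form a syntactically valid statement before it counts as one.

```python
import re

KEYWORDS = ("Component", "Assembly", "Tag", "Makefile", "Bill")

# Optional leading white-space, then a keyword in its exact capitalisation,
# followed by white-space or the end of the line.
STATEMENT_START = re.compile(r"^\s*(" + "|".join(KEYWORDS) + r")(?=\s|$)")


def statement_keyword(line: str):
    """Return the keyword that opens the line, or None for a comment line."""
    found = STATEMENT_START.match(line)
    return found.group(1) if found else None
```

Under this sketch a line such as `# Component xyz.c 2.6` is a comment, because something other than white-space precedes the keyword, and `component xyz.c 2.6` is a comment because the keyword lacks its initial capital.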
3.1 Component statements
Component statements are the core of BOM. Each bill should include one component statement for each component in play, even if the other facilities of BOM aren't used.
A component statement ensures the presence in the current directory of a specific version of one component. If a version of the component already exists in the current directory, that file is first checked to see if it is the specified version. If the version is wrong, or if there is no version of the component in the working directory, the component is recovered from the specified repository. The syntax is:
    Component component-id version repository-type repository-location
where the tokens are separated by white-space. Component-id is the name of the component: i.e. its file-name. Version is the version-designation, in Dewey-decimal format, as described in Section 2. Repository-type and repository-location are optional, but must appear as a pair where they are used; the type must be sccs or copy and the location is a path-name identifying the repository.
An SCCS repository is always a directory called SCCS. Express the location of this directory by naming its parent directory. Components found in SCCS repositories are extracted via SCCS commands at the version specified in the component statement.
For a repository of type `copy', the location must be a directory holding only one version of the component; the component is copied from the repository to the working directory.
If no repository is specified for a component, the repository type is assumed to be SCCS and the location is as given by the environment variable PROJECTDIR. If PROJECTDIR isn't set, the build will fail.
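The defaulting rules for the repository can be summarised in a short Python sketch. The function name, its arguments and the choice of exceptions are hypothetical; only the rules themselves (type and location as a pair, sccs or copy, fallback to PROJECTDIR) come from the text.

```python
import os


def resolve_repository(repo_type=None, location=None, environ=None):
    """Return (type, location) for one component statement."""
    env = os.environ if environ is None else environ
    # The type and location are optional but must appear as a pair.
    if (repo_type is None) != (location is None):
        raise ValueError("repository type and location must appear together")
    if repo_type is None:
        # No repository given: assume SCCS at the place named by PROJECTDIR.
        if "PROJECTDIR" not in env:
            raise LookupError("PROJECTDIR is not set; the build would fail")
        return "sccs", env["PROJECTDIR"]
    if repo_type not in ("sccs", "copy"):
        raise ValueError("repository type must be sccs or copy")
    return repo_type, location
```

Passing the environment as an argument is a design choice that keeps the sketch easy to test; by default it consults the real environment.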
This fragment shows how component statements are used.
    Component sccdServerMain.c 2.4
    Component sccdUnet.c 3.4
    Component debug.h 1.6 sccs /ing/test
    Component extensions.c 4.5 copy /home/gtr/sccd
    Component messy.c -.- copy /imported/foreign/rubbish
The first two components are stored in some general SCCS repository (at ING, probably /ing/src) for the system, as defined by PROJECTDIR. The include-file, debug.h, is drawn from a different repository. The file extensions.c is copied from a developer's home directory (this is a practice of last resort), but its version is specified. The last component originates outside the system and has no formal version; the single, current copy is kept in /imported/foreign/rubbish.
3.2 Tag statements
Assemblies, which are usually object code, are given machine-readable version-tags by compiling in special object modules. A tag statement generates the source for these modules in either C or PERL. The syntax is
    Tag assembly-id version language file
where language must be c or perl.
Tag statements have no effect unless the tag file is built into the assembly by make (or by manual compilation). For example, the tag statement
    Tag example 504.8 c
matches the make fragment
    example : example.o TAG.c
            touch TAG.c
            cc -g -c -o TAG.o TAG.c
            cc -o example example.o TAG.o
            rm TAG.c TAG.o
The touch statement makes an empty tag-file if example.TAG.c doesn't exist when make is run: this allows the makefile to be used without bom. Removing the tag file after use ensures that it can't be picked up when building another assembly in the same directory.
3.3 Assembly statements
Assembly statements have no direct effect on the building of a system. They are provided as an automated check for anomalies in a complicated configuration.
An assembly statement checks the version of one copy of the named assembly. The syntax is
    Assembly assembly-id version location
where assembly-id is the name in the assembly's version-tag. The location, which is optional, defines the file holding the assembly. If no location is given, the assembly is assumed to be a file in the working directory whose name is the same as assembly-id.
There are two main ways to use an assembly statement. The statement
    Assembly libcia 4.2 /ing/s1.1/lib/libcia.a
when executed near the start of a bill, checks that the version of libcia installed by some other part of the system-build is the expected version, v4.2. The statement
    Assembly mylib 1.2 mylib.a
which might appear at the end of the bill that assembles mylib.a, checks that v1.2 of the assembly has been successfully built.
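The handling of the optional location in an assembly statement can be sketched in Python. The function name is hypothetical and the parsing is deliberately naive: it splits on white-space and treats a fourth token, if present, as the location, which is an assumption about how trailing comment text is distinguished from a location.

```python
def parse_assembly(statement: str):
    """Split an assembly statement into (assembly-id, version, location)."""
    fields = statement.split()
    if len(fields) < 3 or fields[0] != "Assembly":
        raise ValueError("not an assembly statement: " + statement)
    assembly_id, version = fields[1], fields[2]
    # With no location, the assembly is a file named after its identifier
    # in the working directory.
    location = fields[3] if len(fields) > 3 else assembly_id
    return assembly_id, version, location
```

For the first example above this yields the identifier libcia, the version 4.2 and the library's full path; for a statement with no location it yields the identifier itself as the file-name.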
3.4 Makefile statements
A makefile statement executes make to build the specified target of the named makefile. The syntax is
    Makefile makefile-name make-target
The makefile itself, and any components it uses, must have been fetched by component statements preceding the makefile statement.
3.5 Bill statements
A bill statement causes the named bill to be processed in a subsidiary copy of bom. The syntax is
    Bill bill-file
The named bill, which is a component, must previously have been acquired through a component statement. This construct can be used recursively.
The most obvious use of Bill statements is to build an entire system from one starting point. The system is partitioned into assemblies (or related groups of assemblies) and one bill is written for each assembly (or each group). A master bill for the system collects a specific version of each of the other bills (and hence defines versions of those assemblies) and then executes them in turn. The version of the system is defined as the version of the master bill.
3.6 Indirection in file-names
Bills should never include absolute path-names unless the totality of the path is a standard agreed between all sites needing to use the software in question. In general, it is desirable to be able to remount directory trees at different points, and path-names can never be entirely standard.
A statement like
    Component xyz.c 2.6 sccs /export/lpss10/ing/src
will fail if the repository /ing/src/SCCS is moved from /export/lpss10. There is an immediate danger that bills will become specific to the installation details of one site. BOM avoids this by allowing indirection via environment variables. If the line above is changed to
    Component xyz.c 2.6 sccs ${ING}/src
and ING is set to /export/lpss10/ing, then the two statements are equivalent but the second form is more portable. The braces in the reference to the environment variable are necessary in the general case.
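The effect of the indirection can be illustrated in Python using the standard-library function os.path.expandvars. This is only a sketch of the idea: the function name expand_location is hypothetical, and bom's own expansion may differ in detail from what Python does.

```python
import os


def expand_location(location: str) -> str:
    """Expand ${NAME} references using the current environment."""
    return os.path.expandvars(location)


os.environ["ING"] = "/export/lpss10/ing"  # example value from the text
print(expand_location("${ING}/src"))       # prints /export/lpss10/ing/src
```

The braces matter when the variable name runs straight into other text: without them, `$INGsrc` would be read as a reference to a variable called INGsrc rather than to ING followed by `src`.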
3.7 Tagging components
Assemblies have their versions written in during the build using make and tag-files but the version of a component must be added to the source by its author. All components should be tagged in a way that allows bom to check them during a build and so that the developers can identify a version of a component when it is outside its repository.
The trigger-pattern
@(#)
introduces a version tag in a source or object file; the tag extends to the next new-line character. BOM requires a tag syntax as follows:
    @(#) Component component-id version
and is insensitive to words following the version. The keyword `Component' must be in lower case with an initial capital.
For any language, the tag may be embedded in a comment, e.g.
    /* @(#) Component xyz.c 23.6 created by gtr on 4/12/95 */
and this is particularly valuable for non-compiled languages where the original text of the component is carried forward into the installed system. For compiled languages, it is convenient to propagate the tag into the object code, which can't happen if the tag is a comment. In the general case, the tag can be made into a string constant, e.g.
    char xyz_c[] = "@(#) Component xyz.c 23.6";
It may be necessary to be careful with the name of the variable. If every component includes a global variable called `version', then no programs will link cleanly! Declaring the variable as local to the component (the `static' keyword in C) may help. Some compilers support a pre-processor construct for identifying components. For example, SunSoft C allows the form
    # ident "@(#) Component xyz 23.6"
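A check for component tags of the kind described in this section can be sketched in Python. The function name and the regular expression are assumptions made for illustration; in particular, the sketch accepts only numeric Dewey-decimal versions and is not a description of how bom itself scans files.

```python
import re

# The "@(#)" trigger, the keyword "Component", then the component-id and a
# Dewey-decimal version, all on one line. Anything after the version is
# ignored. Bytes are used so that object files can be scanned as well.
TAG_PATTERN = re.compile(
    rb"@\(#\)[ \t]*Component[ \t]+(\S+)[ \t]+([0-9]+(?:\.[0-9]+)+)"
)


def find_component_tags(data: bytes):
    """Return the (component-id, version) pairs found in the data."""
    return [
        (m.group(1).decode("ascii", "replace"),
         m.group(2).decode("ascii", "replace"))
        for m in TAG_PATTERN.finditer(data)
    ]
```

The same function finds a tag whether it sits in a comment or in a string constant, since in both cases the bytes following the trigger-pattern are the same.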
The commands
    bom abc.bom xyz.bom
and
    cat abc.bom xyz.bom | bom
are equivalent. The extension `.bom' on the names of the bills is a suggested convention and is not enforced by bom. However, the extensions must be given explicitly if used: bom will not automatically associate abc with the bill abc.bom.
bom recognizes command-line switches which suppress certain kinds of statement in the current set of bills.
-c suppresses checking and extraction of components.
-a suppresses checking of assemblies.
-t suppresses generation of tag files.
-m suppresses invocation of make.
-b suppresses processing of subsidiary bills.
All facilities are enabled by default.
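The switch semantics (everything enabled unless its switch is given) can be modelled with Python's argparse module. This is a sketch of the behaviour described above, not bom's own option handling; the program name bom-sketch and the attribute names are hypothetical, and whether bom accepts combined switches such as -mt is not stated in the text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose switches each disable one kind of statement."""
    parser = argparse.ArgumentParser(prog="bom-sketch")
    switches = [
        ("-c", "components", "suppress checking and extraction of components"),
        ("-a", "assemblies", "suppress checking of assemblies"),
        ("-t", "tags", "suppress generation of tag files"),
        ("-m", "make", "suppress invocation of make"),
        ("-b", "bills", "suppress processing of subsidiary bills"),
    ]
    for flag, dest, text in switches:
        # store_false: the facility stays enabled unless its switch is given.
        parser.add_argument(flag, dest=dest, action="store_false", help=text)
    parser.add_argument("bill_files", nargs="*", help="bills to process")
    return parser


options = build_parser().parse_args(["-m", "-t", "abc.bom"])
# prints: False False True ['abc.bom']
print(options.make, options.tags, options.components, options.bill_files)
```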
Any desired indirection of file-names must be set up before running bom by setting environment variables. In the current version (v4) of bom, all file-name substitutions are allowed that are supported by the user's default shell.
This means that the ${NAME} construct should always work for the environment variable NAME and that the construct ~gtr for user gtr's home directory will work with csh and its derivatives but not for all other shells.
Bom does not halt when it finds errors. Barring fatal crashes, it will continue to the end of its current set of bills. It is always necessary to check the output from bom to find out if the build proceeded as planned.
It is sensible to keep a log of the work that bom does by redirecting its output:
    bom xyz.bom >& xyz.bom.log
Bom's commentary is written on standard output but the output from most of the things that bom calls appears on standard error so it is necessary to redirect both streams.
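The `>&` form above is csh syntax; a Bourne-style shell would normally use `> xyz.bom.log 2>&1` for the same effect. Where a build is driven from a script rather than typed at a shell, the same combined log can be produced as in this Python sketch. The function name run_with_log is hypothetical.

```python
import subprocess


def run_with_log(command, log_path):
    """Run a command, sending standard output and standard error to one log.

    Returns the command's exit status.
    """
    with open(log_path, "w") as log:
        # stderr=subprocess.STDOUT merges the two streams into the log file.
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

A call such as `run_with_log(["bom", "xyz.bom"], "xyz.bom.log")` would then correspond to the command line shown above. Since bom does not halt on errors, the log still has to be read afterwards; the exit status alone should not be taken as proof that the build went as planned.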
Appendix A. Document history
Issue 1.1 05/12/95 Original document.
Appendix B. BOM syntax
The syntax is given in Backus-Naur form. Parentheses indicate grouping of tokens and do not appear literally in the BOM language. A superscripted plus sign indicates one or more repetitions of a token or group of tokens; superscripted asterisk indicates zero or more repetitions. The token Sp indicates a white-space character and the token Ch indicates a non-white-space printing character. BOL indicates beginning-of-line and EOL indicates end-of-line. Bracketed sequences are optional.
Statement := BOL Sp* (Component|Assembly|Makefile|Tag|Bill) (Sp|Ch)* EOL
Component := "Component" Sp+ Component-id Sp+ Version [Sp+ Repository]
Assembly := "Assembly" Sp+ Assembly-id Sp+ Version [Sp+ File]
Tag := "Tag" Sp+ Assembly-id Sp+ Version Sp+ ("c"|"perl") [Sp+ File]
Makefile := "Makefile" Sp+ File Sp+ Make-target
Bill := "Bill" Sp+ File
Repository := ("sccs"|"copy") Sp+ File
Component-id := File (but with no directory specification)
Assembly-id := Ch+
Make-target := Ch+
File is a Unix file-name. Expansions of constructs inside File vary according to the shell in use.
Appendix C. Changes since v3
V4.2 of bom introduced the form of the language described above. V3 differed in the following ways.
The repository type field was not used. All components were taken from SCCS repositories.
The location field for components was interpreted differently. bom expected to see the fully-resolved path-name of the `SCCS history file' (e.g. /ing/src/SCCS/s.xyz.c) for the component concerned.
Indirection of file-names was not supported.
The file specifier in Tag statements was not recognized. All tags were written as TAG.c or TAG.pl.
The location field for assemblies was assumed to be the directory holding the assembly. The assembly was assumed to be a file whose name was the same as the assembly identifier. The new form allows, for example, files /ing/s0.0/lib/libcia.a and /ing/s0.0/lib/libcia.o to share the assembly version `libcia 4.2'.