A Short Description of PHENIX Run Control

Table of Contents

Introduction
OOA of the Run Control
1. Information Model
2. State Models
OOD of the Run Control
What Happens at the Startup Time of the Run Control?
What Happens at Download Time of the Run Control?
What Happens at StartRun/StopRun Time?

Introduction

We give here a short description of the run control software, how it is designed and translated into code as well as how it works internally when the user issues a certain command (e.g. when the user starts a run).

It is useful to keep in mind that this project started using an OO methodology (Shlaer-Mellor). Although it evolved quite a bit away from being a pure Shlaer-Mellor project, it nevertheless still has a lot of concepts that survived the course of time. Therefore we follow this OO methodology in describing in the first section the layout of the software (OOA of the run control). We first describe the classes and how they interrelate (using an information model or ERD [entity relationship diagram]). After that we say a few words about the state models of the few active objects in the run control.

In the second section we briefly describe how the model has been translated into C++ code and how it relates to the rest of the ONCS software.

In the following three sections we briefly describe what really happens inside the run control when it is started, when the user issues the download command, and when a run is started.

Finally it should be kept in mind that the PHENIX ONCS project is far from being completed. Many things will be changed as the detector is completed and all four arms become operational....

OOA of the Run Control

Information Model

The figure below shows the information model of the run control (i.e. the main objects of the run control and how they relate to each other). The figure is followed by a brief description of the main objects.

Partition: The partition is the central object of the run control. The partition allocates hardware through its 1:M relationship with the granule object. Almost all user commands are directed to the partition object, which issues the configure, download and start/stop run commands to the process stages, which in turn control the process units (see below).
Granule: This object corresponds to an allocated granule. Notice how the 1:1 relationship between the granule and the GTM boards is the starting point of the ownership chain between the various hardware components of the PHENIX data acquisition chain (GTMs, FEMs, DCMs, ...).
Process Unit: Generic supertype of a component of the data acquisition chain. This can be a hardware component (e.g. an FEM) or a software component (e.g. a DD event pool or a data logger).
Process Unit Design: Generic object that encapsulates a given type of hardware. E.g. there is one instance for the GTM but three for the DCM (fe1, fe2 and fifth DSP).
Process Stage: Generic object that controls a bunch of process units. E.g. all DCMs and partition modules of a given partition are controlled by one instance of the Process Stage object (called DCMstage). The following stages are normally active in the run control: GTMstage, FEMstage, DCMstage, EventBuilderStage, EventTransportStage, LoggingStage. There is a natural hierarchy in the sense that the partition object only talks to the process stages, which in turn talk to their process units (see section on state models below) and report back to the partition.
Run: Object that encapsulates a given run. Only one run is active at a given time for a given run control (1:1 relationship). All parameters of the run object are given by the object run_parameters.

State Models

Currently the active objects (objects with a state machine) in the run control are the partition, the process stage and the run. Their state machines are described next.

State Model for the Partition Object

The figure above shows the state model of the partition object, which is created in the configuring state at the startup time of the run control. A configure command can be either a command such as "allocate granule XYZ" or "GTM.DC.W modebitfile ./GTM.DC.W.gtm", which would set the configurable parameter modebitfile for the process unit GTM.DC.W.

Once the user is finished configuring, he/she issues the download command. All the partition does at that point is to inform all active process stages to start downloading. The partition then waits until all process stages have reported a successful download (or the occurrence of an error). A timeout mechanism guarantees that the error state is reached if one or more process stages do not reply within a given time window.
Once all process stages have reported a success, the partition arrives in the Ready state, from where the user can either issue a new configure command or the start_run command.

A subtlety should be mentioned here: we normally talk about the download that brings the data acquisition chain into the ready state. As a matter of fact this is done in two steps. The first step is the "initialise" procedure, which is normally only done once during the lifetime of a partition. Then there is the proper download, which can be done several times during the lifetime of the partition. As an example: the GTM objects in VxWorks get created and reset at the beginning when the run control starts up (initialisation phase). However the modebitfile can be loaded many times over the lifetime of a partition (e.g. download the modebitfile, take 1000 events, end the run, download another modebitfile, take another 1000 events, etc.). Therefore loading the modebitfile is done during the download phase of the run control.
However the user typically never sees that there is the additional step of the initialisation done the first time "download" is executed.

Finally it should be noted that the partition has no information about the process stages or the process units. All it does is to pass the user commands to the process stages and wait for their reply.
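The partition state model described above can be sketched as a small state machine. This is only an illustration, not the actual run control code: the state names, the class interface and the counter of outstanding process stages are assumptions, as is the choice to accept the download command only from the configuring state.

```cpp
#include <cassert>

// State names are assumptions; the original state diagram is not shown here.
enum class PartitionState { Configuring, Downloading, Ready, Running, Error };

class Partition {
public:
    // active_stages: number of process stages that must report back.
    explicit Partition(int active_stages)
        : stages_(active_stages), pending_(0),
          state_(PartitionState::Configuring) {
        assert(active_stages > 0);
    }

    PartitionState state() const { return state_; }

    // A configure command is accepted while configuring or ready and
    // always leaves the partition in the configuring state.
    bool configure() {
        if (state_ != PartitionState::Configuring &&
            state_ != PartitionState::Ready) return false;
        state_ = PartitionState::Configuring;
        return true;
    }

    // Informs all active stages (not modelled here) and starts waiting.
    bool download() {
        if (state_ != PartitionState::Configuring) return false;
        pending_ = stages_;
        state_ = PartitionState::Downloading;
        return true;
    }

    // One process stage reports the outcome of its download.
    void stage_reported(bool success) {
        if (state_ != PartitionState::Downloading) return;
        if (!success) { state_ = PartitionState::Error; return; }
        if (--pending_ == 0) state_ = PartitionState::Ready;
    }

    // The timeout window expired before all stages replied.
    void timeout() {
        if (state_ == PartitionState::Downloading)
            state_ = PartitionState::Error;
    }

    bool start_run() {
        if (state_ != PartitionState::Ready) return false;
        state_ = PartitionState::Running;
        return true;
    }

private:
    int stages_;
    int pending_;
    PartitionState state_;
};
```

Note that the sketch never looks inside a process stage: it only counts replies, which mirrors the statement that the partition has no information about the stages or their units.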

State Model for the Process Stage Object

The process stage object is really only a container object for a bunch of process units. As such it has no clue what the process units really are. This enables a uniform state model for all instances of the process stage, i.e. the behaviour of this object is independent of what it controls.

The process stage normally waits for a command of the form "initiate state transition to ready", which really means "bring all your process units into the desired state X" (say X = ready). Upon arrival of such a command, the process stage informs all its process units that they should start the process which will eventually get them into the ready state. Note that the process stage has no clue about the detailed operations that the process units have to perform. All that information is encapsulated in the process units themselves. All the process stage does is to call a virtual member function of the process unit (e.g. initialise(), download(), start_run() or stop_run()).

The process stage then waits until it has received the information from all process units that they have arrived in the desired state. If that happens within a given timeout window, the partition object gets notified. Otherwise the process stage drops into the error state.

There is the possibility that a process unit could report a state change ("pu done" in the figure above) outside the above described sequence. That situation is handled in the state "update process unit state".

Finally a word about the relation between the process stage and the process unit. The process stage is the active object; the process units are not, but the grand summary of their state is an attribute of the process stage. However the state of the process units does not play a role in the state transitions of the process stage. The process stage per se is only interested in bringing the process units from an initial state X to a final state Y.
The subtypes of the process units (e.g. GTM or DCM objects) have all the relevant information about what exactly has to be done, and when, during the download/start/stop run sequence.
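The relation between the process stage and its process units can be sketched as follows. This is a minimal illustration under stated assumptions: the state names, the class SlowUnit with its hardware_done() member, and the methods initiate(), unit_reported() and timeout() are hypothetical; only the virtual member functions initialise(), download(), start_run() and stop_run() are named in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UnitState { Created, Initialised, Ready, Running, Error };

// Generic supertype; subtypes override the virtual member functions.
class ProcessUnit {
public:
    virtual ~ProcessUnit() {}
    virtual void initialise() { state_ = UnitState::Initialised; }
    virtual void download()   { state_ = UnitState::Ready; }
    virtual void start_run()  { state_ = UnitState::Running; }
    virtual void stop_run()   { state_ = UnitState::Ready; }
    UnitState state() const { return state_; }
protected:
    UnitState state_ = UnitState::Created;
};

// Hypothetical subtype whose download completes only when the hardware
// reports back (modelled by an explicit call to hardware_done()).
class SlowUnit : public ProcessUnit {
public:
    void download() override { /* command sent, reply still pending */ }
    void hardware_done() { state_ = UnitState::Ready; }
};

enum class StageState { Idle, Waiting, Done, Error };

class ProcessStage {
public:
    typedef void (ProcessUnit::*Action)();

    void add(ProcessUnit* unit) {
        assert(unit != nullptr);
        units_.push_back(unit);
    }
    StageState state() const { return state_; }

    // Calls the same virtual member function on every unit without
    // knowing what kind of unit it is, then waits for the target state.
    void initiate(Action action, UnitState target) {
        target_ = target;
        state_ = StageState::Waiting;
        for (std::size_t i = 0; i < units_.size(); ++i) (units_[i]->*action)();
        update();
    }

    // A unit reported a state change ("pu done").
    void unit_reported() { if (state_ == StageState::Waiting) update(); }

    // The timeout window expired before all units arrived.
    void timeout() {
        if (state_ == StageState::Waiting) state_ = StageState::Error;
    }

private:
    void update() {
        bool all_arrived = true;
        for (std::size_t i = 0; i < units_.size(); ++i) {
            if (units_[i]->state() == UnitState::Error) {
                state_ = StageState::Error;
                return;
            }
            if (units_[i]->state() != target_) all_arrived = false;
        }
        if (all_arrived) state_ = StageState::Done;
    }

    std::vector<ProcessUnit*> units_;
    UnitState target_ = UnitState::Created;
    StageState state_ = StageState::Idle;
};
```

A call such as stage.initiate(&ProcessUnit::download, UnitState::Ready) dispatches through the virtual function table, so the stage code stays identical for every kind of unit it contains.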

State Model for the Run Object

The state model of the run object is very simple: it is created, waits until it gets the "started" command from the run control and drops into the running state. Once in the running state, it performs a number of checks at regular intervals, such as updating how many events have been taken and checking whether the event limit has been reached (if the user has specified a limit).
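A minimal sketch of such a run object is given below. The interface is an assumption made for illustration: the member names, the convention that a limit of 0 means "no limit", and the idea that the periodic check is handed the current event count are not taken from the actual code.

```cpp
#include <cassert>

enum class RunState { Created, Running };

class Run {
public:
    // event_limit == 0 stands for "no limit specified" (an assumption).
    explicit Run(unsigned long event_limit = 0)
        : limit_(event_limit), events_(0), state_(RunState::Created) {}

    RunState state() const { return state_; }
    unsigned long events() const { return events_; }

    // The "started" command from the run control.
    void started() { state_ = RunState::Running; }

    // Periodic check, called at regular intervals; ignored unless running.
    void check(unsigned long events_taken) {
        if (state_ != RunState::Running) return;
        assert(events_taken >= events_);  // the event count never decreases
        events_ = events_taken;
    }

    // True once a user-specified limit has been reached.
    bool limit_reached() const {
        return limit_ != 0 && events_ >= limit_;
    }

private:
    unsigned long limit_;
    unsigned long events_;
    RunState state_;
};
```

What the run control does once limit_reached() returns true (for example ending the run) is deliberately left out, since the text does not specify it.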

OOD of the Run Control

How is the above described run control model mapped into C++ code? The following rules have been applied:

Every copy of the run control becomes one CORBA server: This means that if e.g. three partitions take data simultaneously, three CORBA servers of the run control are running and there are three partition objects in the ONCS system (one per CORBA server).
How does this CORBA run control server receive the commands?: The run control task accepts commands from standard input, from an ASCII file containing a list of commands, or through CORBA. The client side of the CORBA server can be individual commands, a script or a graphical user interface.
Which objects are CORBA compliant?: The objects with a state model (partition, process stage and run object) are CORBA compliant. To be more precise: these objects of the run control have a CORBA compliant proxy object. This choice was made to isolate the CORBA compliant code (which is really the ORBIX compliant code) in one object and separate it from the actual run control object.
How are the objects mapped into C++ classes?: Every object in the run control model is translated into a C++ class. Each class inherits from a template that organizes the objects by their ASCII name (template genNamedObj).
Relationships between objects are translated into adding pointers to the corresponding instance (for a to-one relationship) or an STL list of pointers (for a to-many relationship). [Note that this will change as we move some of the objects into the database].
What is the relation between process unit and the corresponding real world object?: Every real world object (DCM, DD pool, etc.) has a proxy object in the run control server that inherits from process_unit. The process unit has the information about the allowed states (initialised, downloading, ready, etc.) and defines virtual default member functions for initialise, download, start/stop run. In addition there is the virtual member function update_pu(), which is called every time a real world object (e.g. a DCM object in VxWorks) reports a state change. update_pu() then decides whether the chain of operations has come to an end (i.e. ready state reached) or whether further commands have to be sent from the run control to the hardware component.
Is there a difference between process units that represent hardware objects and the ones that represent software objects?: No, there is no difference. From the run control's point of view they are just process units.
How does the run control know which objects belong to which granule?: Every granule currently has one ASCII configuration file. It is foreseen to put this granule specific information into the database in the future.
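The mapping rules above (a base template that organizes objects by name, a pointer for a to-one relationship, an STL list of pointers for a to-many relationship) can be sketched as follows. The template NamedObj is a hypothetical stand-in written for this example; the interface of the real genNamedObj template is not described in this document and may well differ. The member names and the partition name "P1" are likewise assumptions.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>

// Hypothetical stand-in for genNamedObj: one registry by name per class T.
template <class T>
class NamedObj {
public:
    explicit NamedObj(const std::string& name) : name_(name) {
        assert(registry().count(name) == 0);  // names are unique per class
        registry()[name] = this;
    }
    virtual ~NamedObj() { registry().erase(name_); }

    NamedObj(const NamedObj&) = delete;             // a copy would clash
    NamedObj& operator=(const NamedObj&) = delete;  // with the registry

    const std::string& name() const { return name_; }

    // Looks up an instance of T by name; returns a null pointer if unknown.
    static T* find(const std::string& name) {
        typename std::map<std::string, NamedObj*>::iterator it =
            registry().find(name);
        return it == registry().end() ? nullptr
                                      : static_cast<T*>(it->second);
    }

private:
    static std::map<std::string, NamedObj*>& registry() {
        static std::map<std::string, NamedObj*> objects;
        return objects;
    }
    std::string name_;
};

class Granule;

class Partition : public NamedObj<Partition> {
public:
    explicit Partition(const std::string& name) : NamedObj<Partition>(name) {}
    std::list<Granule*> granules;  // to-many: STL list of pointers
};

class Granule : public NamedObj<Granule> {
public:
    Granule(const std::string& name, Partition* owner)
        : NamedObj<Granule>(name), partition(owner) {
        if (owner != nullptr) owner->granules.push_back(this);
    }
    Partition* partition;          // to-one: plain pointer
};
```

With these definitions, Granule::find("DC.W") returns the granule registered under that name, and the granules list of the owning partition realizes the 1:M relationship of the information model.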

What Happens at the Startup Time of the Run Control?

The following operations take place when the user starts the run control (e.g. by typing daq_boot.sh Id DC.W+DC.E+PBSC.W).

A check is performed that it is legal to start this copy of the run control (i.e. that there are not too many copies already running; in the future it will also be checked that the list of granules is in agreement with what has been loaded into the GL1).
A generic configuration file ($RC_HW_CONF/generic.pcf) is parsed and the generic (non-granule specific) objects are constructed (i.e. the partition, the process stages, the granules, the process unit designs,...).
If the environment variable $RC_INIT is set and is pointing to a readable file, run control commands are read from that file and executed.
The second command-line argument (i.e. the list of granules to be allocated) is interpreted. The ASCII configuration file of every granule found is parsed and the granule specific objects are constructed.
The run control arrives in the configure state and accepts commands from the outside. If the environment variable $RC_CONFIG is set and pointing to a readable file, run control commands are read from that file and executed.
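Two of the startup steps above lend themselves to a short sketch: interpreting the granule list and reading run control commands from a file named by an environment variable. The function names are hypothetical, and two details are assumptions inferred from the example rather than stated facts: that granule names are separated by '+' and that the command file holds one command per line.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Splits a granule list such as "DC.W+DC.E+PBSC.W" at the '+' signs.
std::vector<std::string> split_granules(const std::string& list) {
    std::vector<std::string> names;
    std::istringstream in(list);
    std::string name;
    while (std::getline(in, name, '+')) {
        if (!name.empty()) names.push_back(name);
    }
    return names;
}

// Returns the non-empty lines of the file named by the given environment
// variable, or an empty list if the variable is unset or the file cannot
// be opened for reading.
std::vector<std::string> commands_from_env(const char* variable) {
    assert(variable != nullptr);
    std::vector<std::string> commands;
    const char* path = std::getenv(variable);
    if (path == nullptr) return commands;
    std::ifstream file(path);
    if (!file) return commands;
    std::string line;
    while (std::getline(file, line)) {
        if (!line.empty()) commands.push_back(line);
    }
    return commands;
}
```

In this sketch the startup code would call commands_from_env("RC_INIT") after the generic objects are built and commands_from_env("RC_CONFIG") once the configure state is reached, executing each returned line as a run control command.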

What Happens at Download Time of the Run Control?

The following operations take place when the user issues the download command.

If this is the first time download is issued, the initialisation procedure is started. All process stages are informed to bring their process units into the initialised state.
The process stages in turn invoke the virtual member function on their process units, which send the CORBA commands to the event notifier, which in turn passes them to the CORBA objects in the outside world. As these objects execute the commands, they report the status back to the proxy object in the run control. If an error is reported, the process stage drops into the error state. If success is reported, update_pu() is called on the process unit, until the process unit reports that it has reached the initialised state.
The process stage reports back to the partition that the initialised state has been reached when all process units have reached the initialised state.
The same step is repeated for the download command sequence. (Remember that initialisation is done only once at the first download). The partition is now in the ready state from where a run can be started.
The user can also issue another configuration command, which will bring the partition object back into the configuring state. Note that a download command has to be issued before a new run can be started. The download command has a parameter that restricts the download to those components whose configuration has changed. This can significantly reduce the download time.
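The selective download mentioned in the last step can be illustrated with a simple "changed" flag per component. This is a sketch of the general idea only: the class Component, the function download_components() and the flag are assumptions, and the document does not say how the run control actually tracks configuration changes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

class Component {
public:
    explicit Component(const std::string& name)
        : name_(name), changed_(true), downloads_(0) {}

    // Any configure command marks the component as needing a download.
    void set_parameter(const std::string& key, const std::string& value) {
        parameters_[key] = value;
        changed_ = true;
    }

    // Stand-in for the real download; clears the flag.
    void download() { ++downloads_; changed_ = false; }

    bool changed() const { return changed_; }
    int downloads() const { return downloads_; }
    const std::string& name() const { return name_; }

private:
    std::string name_;
    std::map<std::string, std::string> parameters_;
    bool changed_;
    int downloads_;
};

// Downloads all components, or only those whose configuration changed.
// Returns the number of components that were actually downloaded.
std::size_t download_components(std::vector<Component*>& components,
                                bool only_changed) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < components.size(); ++i) {
        Component* c = components[i];
        assert(c != nullptr);
        if (only_changed && !c->changed()) continue;
        c->download();
        ++count;
    }
    return count;
}
```

After a first full download, changing a single parameter (say the modebitfile of one GTM) leaves only that component flagged, so a second download with only_changed set touches one component instead of all of them.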
How is the interdependency between GTMs and FEMs resolved?
Very simple: the initialise member function of the FEM object only sets the state attribute to initialising. When the GTM object has performed the reset and resetGLINK, it starts the actual Arcnet initialise command.
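The GTM/FEM ordering described above, together with the role of update_pu() in stepping through a chain of commands, can be sketched like this. The class names, the command strings and the direct call from the GTM proxy to the FEM proxy are assumptions made for illustration; the command log replaces the CORBA calls to the real world objects. The sketch also assumes that both units are told to initialise before the first report arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class UnitState { Created, Initialising, Initialised };

// Records the commands that would be sent to the real world objects.
typedef std::vector<std::string> CommandLog;

class FemUnit {
public:
    explicit FemUnit(CommandLog& log)
        : log_(log), state_(UnitState::Created) {}

    // Only marks the FEM as initialising; no command is sent yet.
    void initialise() { state_ = UnitState::Initialising; }

    // Called by the owning GTM once its own resets are done.
    void start_arcnet_initialise() {
        assert(state_ == UnitState::Initialising);
        log_.push_back("FEM:arcnet_initialise");
    }

    // Called when the real FEM reports that the command is done.
    void update_pu() {
        if (state_ == UnitState::Initialising) state_ = UnitState::Initialised;
    }

    UnitState state() const { return state_; }

private:
    CommandLog& log_;
    UnitState state_;
};

class GtmUnit {
public:
    GtmUnit(CommandLog& log, FemUnit& fem)
        : log_(log), fem_(fem), step_(0), state_(UnitState::Created) {}

    void initialise() {
        state_ = UnitState::Initialising;
        step_ = 0;
        send_next();
    }

    // Called each time the real GTM reports a command as done: either
    // sends the next command or ends the chain and releases the FEM.
    void update_pu() {
        if (state_ != UnitState::Initialising) return;
        if (step_ < 2) { send_next(); return; }
        state_ = UnitState::Initialised;
        fem_.start_arcnet_initialise();
    }

    UnitState state() const { return state_; }

private:
    void send_next() {
        static const char* const commands[] = {"GTM:reset", "GTM:resetGLINK"};
        log_.push_back(commands[step_++]);
    }

    CommandLog& log_;
    FemUnit& fem_;
    std::size_t step_;
    UnitState state_;
};
```

The FEM therefore sits in the initialising state until the GTM has finished reset and resetGLINK, at which point the Arcnet initialise command is issued.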

What Happens at StartRun/StopRun Time?

The following operations take place when the user issues the startRun/stopRun command:

Partition creates a run object.
Partition tells all process stages to bring their process units into the running state.
Process stages invoke the virtual member function start_run() on their process units. Process units send out the CORBA commands to their real world counterparts.
Real world counterparts send a done event back to their run control proxies, which either send another command or, when all commands have been sent, inform the process stage that they are now running.
Process stage waits until all its process units are in the running state. Once this has happened, it informs the partition.
Partition waits until all process stages report that they are running. Once this has happened, it checks whether the environment variable $RC_SOR is set and pointing to a readable file; if so, run control commands are read from that file and executed.
The partition tells the run object that the run has started and the GTM VME busy is lowered on all allocated granules. Data taking starts.

The end run sequence is almost identical. Note also that all the sequences (initialisation, download, start/stop run) are pretty much the same!