REANA


Overview

REANA is a reproducible research data analysis platform developed at CERN. It is being considered for Analysis Preservation in PHENIX because of the features described below.

REANA workflows can be represented as Directed Acyclic Graphs (DAGs), which is reflected in the YAML workflow specification; the Common Workflow Language (CWL) is one of the supported workflow languages. Each computational component of a workflow may require a separate and distinct Docker container, although individual steps can be as simple as a shell command writing a comment to a log file, in which case a dedicated container would be redundant.
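
To make the structure described above more concrete, here is a minimal sketch of a workflow definition, written to a file from the shell. It assumes the simple "serial" workflow type used by the REANA demo examples, where steps run one after another (a linear chain being the simplest DAG). The file name, step names, container image and commands are illustrative placeholders, not part of any PHENIX analysis.

```shell
# Write a two-step serial workflow definition.
# All names, the image and the commands below are illustrative placeholders.
cat > my_workflow_file.yaml <<'EOF'
workflow:
  type: serial
  specification:
    steps:
      - name: hello
        environment: 'docker.io/library/python:3.11-slim'
        commands:
          - mkdir -p results && echo "hello from step one" > results/log.txt
      - name: summarize
        environment: 'docker.io/library/python:3.11-slim'
        commands:
          - wc -l results/log.txt > results/summary.txt
outputs:
  files:
    - results/summary.txt
EOF
```

Each step names the container image it runs in (environment) and the shell commands to execute there. Once the client is installed, such a file can be checked with reana-client validate -f my_workflow_file.yaml before it is submitted.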

Executing workflows in REANA requires a properly configured REANA cluster. One such cluster is available to CERN users, and there are instances at other institutions. A test instance is currently being evaluated at BNL; it is available on the internal BNL network only. Access to a REANA cluster is controlled by its administrators, who grant access tokens to qualified users. The user interacts with a cluster over HTTPS, either through the Web GUI, which gives a quick overview of workflows in various stages of execution, or through the CLI client, which gives full access to all REANA functions. Because the client is a command-line tool, interaction with the system can also be automated by scripting.

Getting Started

To access a REANA cluster the user must be issued an access token by its administrators (the procedure is specific to each institution hosting a REANA facility and typically involves visiting a dedicated Web page). The REANA client must also be installed on the user’s machine. It is a Python-based tool, so it is best installed in a “virtual environment”:

# create new virtual environment
virtualenv ~/.virtualenvs/reana
source ~/.virtualenvs/reana/bin/activate
# install reana-client into the active virtual environment (sudo is not needed here)
pip install reana-client

The “activate” step must be repeated in every new shell/window used for interacting with REANA. An SSH tunnel is required to access the REANA cluster at BNL. Assuming a token has been obtained and an SSH tunnel established on port 30443, a test session might look like this:

# set REANA environment variables for the client
export REANA_SERVER_URL=https://localhost:30443
export REANA_ACCESS_TOKEN=________ # user's REANA token
# clone and run a simple analysis example
git clone https://github.com/reanahub/reana-demo-root6-roofit
cd reana-demo-root6-roofit
reana-client run -w root6-roofit
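
The SSH tunnel assumed in the session above could be opened along the following lines before the client is used. This is only a sketch: the gateway, the internal host name and the remote port are hypothetical placeholders, and the real values have to come from the BNL administrators. Only the local port 30443 is taken from the text.

```shell
# Hypothetical endpoints: replace with the values provided by the administrators.
GATEWAY="user@gateway.example.org"        # SSH gateway reachable from outside (placeholder)
REANA_HOST="reana.internal.example.org"   # REANA server on the internal network (placeholder)
REANA_PORT=443                            # HTTPS port of the REANA server (assumed)

open_reana_tunnel() {
    # -f: go to the background after authentication
    # -N: do not run a remote command, only forward ports
    # -L: forward local port 30443 to the REANA server through the gateway
    ssh -f -N -L "30443:${REANA_HOST}:${REANA_PORT}" "$GATEWAY"
}
```

After calling open_reana_tunnel, and with the two REANA environment variables exported, reana-client ping is a quick way to confirm that the server is reachable through the tunnel.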

By default the client reads the workflow definition from the file reana.yaml in the current folder. The -w option (“workflow”) simply defines the handle/name by which this workflow will be known to the system; the name can be anything. To specify a different workflow definition file and a different name one might use something like

reana-client run -f my_workflow_file.yaml -w my_custom_workflow_name

The progress of REANA workflows can be tracked in the Web-based GUI provided by each cluster or via the CLI client, reana-client. Likewise, output files generated by the workflows (including the example above) can be downloaded via both the GUI and the CLI. A workflow that is no longer useful can be deleted from the REANA system:

reana-client delete -w my_custom_workflow_name
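
Tracking a workflow from the CLI can also be scripted. The helper below is a minimal sketch, not part of reana-client: it polls reana-client status until the workflow reaches a final state. It assumes that the status output contains the state as a plain word such as "finished" or "failed"; the function name and the polling interval are illustrative.

```shell
# Poll a workflow until it reaches a final state.
# Returns 0 if it finished, 1 if it failed/stopped/was deleted,
# 2 if the status query itself failed.
# Assumption: the output of "reana-client status" contains the state as a
# word such as "finished" or "failed". A workflow whose name contains one
# of these words would confuse this simple match.
wait_for_workflow() {
    name=$1
    interval=${2:-30}   # seconds between polls (illustrative default)
    while :; do
        status=$(reana-client status -w "$name") || return 2
        case $status in
            *finished*) return 0 ;;
            *failed*|*stopped*|*deleted*) return 1 ;;
        esac
        sleep "$interval"
    done
}
```

A typical use would be wait_for_workflow my_custom_workflow_name 60, followed on success by reana-client ls -w my_custom_workflow_name to list the workspace files and reana-client download -w my_custom_workflow_name -o ./results to fetch the outputs.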

Useful Options

The (many) commands available in the client are listed by its --help option. The commands also accept options, mostly for overriding default values; some of the more useful ones are listed here:

-w name of the workflow
-t access token
-f file (default is "reana.yaml")
-o path to the directory where the files are to be downloaded

Caveats

One of the options available in the definition of a REANA workflow is directories. It is not mandatory and serves as a helper when the contents of a whole directory should be staged to the workspace of a running REANA process. This can have unintended consequences. For example, an attempt to stage a massive AFS folder, or any other large or high-latency file system, may generate a lot of network traffic on the submitting host and make the whole process take an unreasonably long time. Issues with storage quotas on the REANA cluster are also possible, so caution must be exercised.
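
One way to exercise that caution is to check the size of a directory before listing it under directories. The function below is a sketch of such a pre-flight check; its name and the size limit are hypothetical and not part of REANA. Note that du has to walk the whole tree, so on a slow file system the check itself may take a while.

```shell
# Refuse to stage a directory that is larger than a given limit.
# Usage: check_stage_size DIR MAX_KB
# Returns 0 if DIR is within the limit, 1 if it is too large,
# 2 if its size could not be determined.
check_stage_size() {
    dir=$1
    max_kb=$2
    # du -sk prints the total size in kilobytes followed by the path
    size_kb=$(du -sk "$dir" 2>/dev/null | awk 'NR==1 {print $1}')
    [ -n "$size_kb" ] || return 2
    if [ "$size_kb" -gt "$max_kb" ]; then
        echo "refusing to stage $dir: ${size_kb} KB exceeds ${max_kb} KB" >&2
        return 1
    fi
    echo "$dir: ${size_kb} KB, within limit"
}
```

For example, check_stage_size ./data 500000 succeeds only if ./data holds no more than roughly 500 MB, so it can guard a submission: check_stage_size ./data 500000 && reana-client run -w my_custom_workflow_name.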