REANA is a reproducible research data analysis platform developed at CERN. It is being considered for Analysis Preservation in PHENIX because of the following features:
REANA workflows can be represented as Directed Acyclic Graphs (DAGs), which is reflected in the YAML-based workflow specification; supported workflow languages include the Common Workflow Language (CWL). Each computational step of a workflow may require its own separate Docker container, although individual steps can be as simple as a shell command writing a comment to a log file, in which case a dedicated container would be redundant.
Execution of workflows in REANA takes place on machines configured as a REANA cluster. One such cluster is available to CERN users, and there are instances at other institutions. There is also a REANA cluster being evaluated at BNL. It is available on the internal BNL network only.
The cluster is separate from the rest of the computing facility in the sense that it does not share any file systems with the other machines at BNL. Interaction with the REANA cluster is possible only through its network interface (HTTPS). A Web GUI is available for quick monitoring of the user’s workflows in their various stages of execution.
Full access to all REANA functions is possible with the CLI client. Importantly, the client is used to stage in the input data, configuration and any other required data products. The client is also used to stage out the outputs of completed workflows, i.e. to bring these data to a desired location. As stated above, REANA machines, unlike the interactive or batch worker nodes at BNL, do not have direct access to shared file systems, so the client software must be used to move data to and from the REANA machines.
Access to the REANA cluster at BNL is controlled by the administrators. If approved, the user will be able to obtain an access token which will then be used by the client software (as a properly set environment variable) to authenticate to the cluster.
To initiate this process, the user must first apply for an account by choosing the “sign up” option in the Web interface. At BNL, a valid SDCC account is a prerequisite for approval. Once the account is created, the user can log in and obtain their REANA access token in the profile section in the upper right corner of the REANA web page.
The client software must be installed on the user’s machine. It is Python-based; at the time of writing, Python 3.6 or higher is recommended. The installation is often done inside a Python “virtual environment”. If the “virtualenv” tool is available, the following example will work:
# Assuming the user runs "bash" on their personal machine:
# Create a new virtual environment
cd
mkdir .virtualenvs    # the exact name of this folder is unimportant
virtualenv ~/.virtualenvs/reana    # the "reana" folder can be named differently as well, as long as its use is consistent
source ~/.virtualenvs/reana/bin/activate    # a self-contained Python environment is now available
# Install reana-client
pip install reana-client    # installation takes place within the virtual environment
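The example above assumes that a suitable Python interpreter is already on the PATH. Since Python 3.6 or higher is recommended, the interpreter could be checked before the environment is created. The bash sketch below is one way to do this; the helper name python_ok is hypothetical.

```bash
# Hypothetical helper: succeed only if the given interpreter exists
# and reports a Python version of 3.6 or higher.
python_ok() {
    local py=${1:-python3}
    # The interpreter must be found on PATH (or be an executable path)
    command -v "$py" >/dev/null 2>&1 || return 1
    # Let Python compare its own version tuple and report via exit status
    "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'
}

if python_ok python3; then
    echo "python3 is recent enough for reana-client"
else
    echo "python3 is missing or older than 3.6" >&2
fi
```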
The “activate” step will be necessary every time a new shell or window is opened for interacting with REANA. When it is no longer needed, the virtual environment can be deactivated. A full session will look something like this:
# Enter the virtual environment
source ~/.virtualenvs/reana/bin/activate
# ... REANA commands here ...
# Leave the virtual environment
deactivate
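To avoid typing the full source command in every new window, a short function could be added to ~/.bashrc. This is a minimal bash sketch; the function name reana_on and the override variable REANA_VENV_DIR are hypothetical and not part of REANA.

```bash
# Hypothetical helper for ~/.bashrc: activate the REANA client environment.
# REANA_VENV_DIR is an illustrative override, not a REANA setting.
reana_on() {
    local venv=${REANA_VENV_DIR:-$HOME/.virtualenvs/reana}
    if [ ! -f "$venv/bin/activate" ]; then
        echo "No virtual environment found at $venv; create it first" >&2
        return 1
    fi
    # Source the activation script into the current shell
    . "$venv/bin/activate"
}
```

With this in place, typing reana_on replaces the source command; deactivate works as before.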
Alternatively, if “virtualenv” is not available (as is the case on the interactive SDCC nodes), a slightly different method may be used:
# Typically tcsh is used on "rcas" nodes, so this example is for tcsh
# First, add the custom SDCC location for Python 3 to PATH
setenv PATH /u0b/software/jupyter/python/3.8.0/bin:$PATH
cd
# The exact name of the .virtualenvs folder is unimportant
mkdir .virtualenvs
python3 -m venv .virtualenvs/reana
source .virtualenvs/reana/bin/activate.csh
pip install reana-client
# Make tcsh pick up the newly installed reana-client command
rehash
After the installation is finished, it is a good idea to check whether the client is functional, for example:
# Check if it's alive - should print a help screen
reana-client --help
The client has very helpful nested help screens, i.e. each command also accepts the “--help” option, which prints help information specific to that command. For example:
reana-client download --help
# ...will output help information specific to the "download" command
An SSH tunnel is required to access the REANA cluster at BNL from an outside location.
# Establish an SSH tunnel
ssh -L 30443:kubmaster01.sdcc.bnl.gov:30443 ssh.sdcc.bnl.gov
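The tunnel can also be started in the background, and its presence checked before the client is used. In the bash sketch below, -f and -N are standard OpenSSH options (go to the background after authentication; forward ports without running a remote command), while the function names open_tunnel and port_open are hypothetical. The port check uses bash's /dev/tcp pseudo-device and is intended for local ports, where a refused connection returns immediately.

```bash
# Hypothetical helper: start the tunnel from the example above in the background.
open_tunnel() {
    ssh -f -N -L 30443:kubmaster01.sdcc.bnl.gov:30443 ssh.sdcc.bnl.gov
}

# Hypothetical helper: succeed if something accepts TCP connections on HOST:PORT.
port_open() {
    local host=${1:-localhost} port=${2:-30443}
    # Opening /dev/tcp/HOST/PORT in bash attempts a TCP connection;
    # the child bash exits non-zero if the connection is refused.
    bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

if port_open localhost 30443; then
    echo "Something is listening on localhost:30443 (tunnel appears to be up)"
else
    echo "Nothing is listening on localhost:30443; is the SSH tunnel running?" >&2
fi
```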
Assuming a token has been obtained and an SSH tunnel has been established on port 30443, a test session might look like this:
# Assuming the user is running bash:
# Set REANA environment variables for the client
export REANA_SERVER_URL=https://localhost:30443
export REANA_ACCESS_TOKEN=________    # user's REANA token
# If running tcsh, use "setenv NAME value" instead of "export NAME=value"
#
# Clone and run a standard simple analysis example
git clone https://github.com/reanahub/reana-demo-root6-roofit
cd reana-demo-root6-roofit
reana-client run -w root6-roofit
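Before submitting workflows, it may help to confirm that the setup is complete. The bash sketch below is one possible check; the function name reana_preflight is hypothetical, and its final step relies on the client's ping command, which contacts the configured server.

```bash
# Hypothetical helper: check the client setup before submitting workflows.
reana_preflight() {
    # Both variables from the example above must be set and non-empty
    if [ -z "${REANA_SERVER_URL:-}" ] || [ -z "${REANA_ACCESS_TOKEN:-}" ]; then
        echo "Set REANA_SERVER_URL and REANA_ACCESS_TOKEN first" >&2
        return 1
    fi
    # The client must be on PATH, i.e. the virtual environment must be active
    if ! command -v reana-client >/dev/null 2>&1; then
        echo "reana-client not found; activate the virtual environment" >&2
        return 1
    fi
    # Contact the server using the configured URL and token
    reana-client ping
}
```

A typical use would be reana_preflight && reana-client run -w root6-roofit.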
The status of the workflow can now be checked using the REANA Web UI. Assuming the user established an SSH tunnel as explained above, this is done by pointing the browser to
When working within the BNL perimeter, i.e. on interactive nodes such as the “rcas” machines, the procedure for using the client is exactly the same; however, the server URL for the client must point directly at the cluster instead of at the SSH tunnel:
# If running bash:
export REANA_SERVER_URL=https://kubmaster01.sdcc.bnl.gov:30443
#
# If running tcsh:
setenv REANA_SERVER_URL https://kubmaster01.sdcc.bnl.gov:30443
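Users who work both inside and outside the BNL perimeter could let a small bash function pick the URL. This is only a sketch built on an explicit assumption: host names ending in .bnl.gov are treated as inside the perimeter, and any other host is assumed to reach the cluster through the SSH tunnel on port 30443. The function name reana_set_server is hypothetical.

```bash
# Hypothetical helper: choose REANA_SERVER_URL from the host name.
# ASSUMPTION: *.bnl.gov hosts are inside the BNL perimeter; all other
# hosts are expected to use an SSH tunnel on localhost:30443.
reana_set_server() {
    local host=${1:-$(hostname -f 2>/dev/null || hostname)}
    case "$host" in
        *.bnl.gov) REANA_SERVER_URL=https://kubmaster01.sdcc.bnl.gov:30443 ;;
        *)         REANA_SERVER_URL=https://localhost:30443 ;;
    esac
    export REANA_SERVER_URL
    echo "REANA_SERVER_URL=$REANA_SERVER_URL"
}
```

Called without arguments it uses the current host name; passing a name explicitly is useful for testing the rule.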
By default, REANA reads the workflow definition from the file reana.yaml found in the current folder.
This and other default behaviors can be changed if needed by using options:
-f tells the client to read the workflow definition from a user-specified file instead of the default reana.yaml.
-w allows the user to assign custom names to workflows.
reana-client run -f my_workflow_file.yaml -w my_custom_workflow_name
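For a first test of -f and -w without cloning a demo repository, a small workflow file can be written by hand. The bash sketch below assumes REANA's serial workflow type; the parameter message, the ubuntu:20.04 image and the output path are arbitrary illustrative choices. The single step is just a shell command, and the quoted 'EOF' keeps ${message} from being expanded by the local shell so that REANA can substitute it.

```bash
# Write a minimal serial workflow definition under a non-default file name
cat > my_workflow_file.yaml <<'EOF'
inputs:
  parameters:
    message: "Hello from REANA"
workflow:
  type: serial
  specification:
    steps:
      - environment: 'docker.io/library/ubuntu:20.04'
        commands:
          - mkdir -p results && echo "${message}" > results/greeting.txt
outputs:
  files:
    - results/greeting.txt
EOF

# With the client set up as described above, the file can be checked and submitted:
#   reana-client validate -f my_workflow_file.yaml
#   reana-client run -f my_workflow_file.yaml -w my_custom_workflow_name
```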
Progress of REANA workflows can be tracked in the Web-based GUI provided by each cluster or via the CLI, reana-client. Likewise, output files generated by the workflows (including the example above) can be downloaded both via the GUI and the CLI. If a workflow is no longer useful, it can be deleted from the REANA system:
reana-client delete -w my_custom_workflow_name
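Progress can also be followed from a script. The bash sketch below assumes that the default output of reana-client status contains the workflow status as a plain word such as running, finished or failed; the function name reana_wait and the 30-second polling interval are illustrative choices.

```bash
# Hypothetical helper: poll the status of workflow WF until it ends.
# Returns 0 if the workflow finished, 1 if it failed, was stopped or
# deleted, or if the status could not be retrieved.
reana_wait() {
    local wf=$1 interval=${2:-30} out
    while :; do
        out=$(reana-client status -w "$wf") || return 1
        # grep -w matches whole words only
        if printf '%s\n' "$out" | grep -qw 'finished'; then
            return 0
        fi
        if printf '%s\n' "$out" | grep -Eqw 'failed|stopped|deleted'; then
            return 1
        fi
        sleep "$interval"
    done
}
```

For example: reana-client run -w my_custom_workflow_name && reana_wait my_custom_workflow_name.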
The list of the (many) commands that can be used with the client is easily displayed with the --help option. There are also other options; some of the more useful ones are listed here (mostly overriding default values):
-w   name of the workflow
-t   access token
-f   file (default is "reana.yaml")
-o   path to the directory where the files are to be downloaded (for the "download" command)
One of the available options in the definition of a REANA workflow is
This option is not mandatory and serves as a helper in cases where the contents of a whole directory should be staged to the workspace of a running REANA process. This can have unintended consequences: for example, an attempt to stage a massive AFS folder, or a folder on some other file system with inherent latency or of a large size, may result in a lot of network traffic on the submitting host and make the whole process take an unreasonably long time. Issues with storage quotas on the REANA cluster are also possible. Caution must be exercised.
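One way to exercise this caution is to measure a directory before listing it for staging. The bash sketch below is a minimal example; the helper name stage_size_ok and the 500 MB default limit are arbitrary assumptions, not REANA settings, and du itself may be slow on a high-latency file system such as AFS.

```bash
# Hypothetical helper: refuse to stage DIR if it uses more than MAX_MB megabytes.
stage_size_ok() {
    local dir=$1 max_mb=${2:-500} kb
    if [ ! -d "$dir" ]; then
        echo "$dir is not a directory" >&2
        return 1
    fi
    # du -sk reports disk usage in KiB (works with GNU and BSD du)
    kb=$(du -sk "$dir" | cut -f1)
    if [ "$kb" -gt $((max_mb * 1024)) ]; then
        echo "$dir uses ${kb} KiB, more than the ${max_mb} MB limit" >&2
        return 1
    fi
    echo "$dir uses ${kb} KiB; OK to stage"
}
```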
If the user decides to delete a workflow, the respective workspace (i.e. the sandbox storage where jobs were run) won’t be deleted by default, which may cause confusion. To delete both the workflow and the data associated with it, the corresponding command line options must be used, for example:
reana-client delete --include-workspace -w my_workflow.1
To remove a workflow from the system completely, including any traces in the Web interface, the “--include-records” option should be added.
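Downloading the results and then removing the workflow together with its workspace can be combined into one step. The bash sketch below uses only the options shown in this section (-w, -o, --include-workspace); the function name reana_harvest is hypothetical. If no file names are passed, the client's own default selection of files applies (see reana-client download --help).

```bash
# Hypothetical helper: download results of WORKFLOW into OUTDIR, then delete
# the workflow and its workspace. Extra arguments are passed on as file names.
reana_harvest() {
    if [ $# -lt 2 ]; then
        echo "usage: reana_harvest WORKFLOW OUTDIR [FILE...]" >&2
        return 2
    fi
    local wf=$1 outdir=$2
    shift 2
    # Create the target directory before downloading into it
    mkdir -p "$outdir" || return 1
    # -o sets the download directory; remaining arguments are file names
    reana-client download -w "$wf" -o "$outdir" "$@" || return 1
    # Delete the workflow and its workspace only after a successful download
    reana-client delete --include-workspace -w "$wf"
}
```

For example, reana_harvest my_workflow.1 ~/reana-results/my_workflow.1. Adding --include-records to the delete line would also remove the workflow from the Web interface.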