Last updated 2020-11-25 for Arvados 2.1.
Feature | Arvados |
---|---|
Documentation | ✅ |
How-Tos | ✅ |
Install guides | ✅ |
GUI | ✅ |
CLI | ✅ |
Demo | ✅ |
Local install | ⚠️ |
Cluster | ✅ |
Cloud | ⚠️ |
Complex setup | ⚠️ |
Complex use | ✅ |
CWL version | v1.2 |
🚧 - Work-in-progress
✅ - Support
❌ - No support
⚠️ - Complicated
Arvados
Arvados is a server-based workflow engine that can also manage workflows, containers and data for multiple users.
The server can be accessed programmatically through the Arvados REST API, which is wrapped by language bindings for Python, Go, R, Ruby, Java and Perl.
The command line tool arv can be used for communicating with the Arvados server, and the arvados-cwl-runner provides a cwl-runner-compliant command line for executing CWL workflows in one go.
The Arvados server also provides a Web interface, the Arvados Workbench.
Arvados' maker Curii is one of the major contributors to the Common Workflow Language standards, and Arvados 2.1 supports the latest CWL specification 1.2.
Features
Arvados is perhaps the most feature-rich workflow engine for executing CWL, with multiple options for access. It has co-evolved with CWL as its native workflow format, supporting almost all CWL features.
GUI
- GUI? Yes
The Arvados Workbench provides a Web-based interface for running workflows on Arvados.
It is also possible to create and modify CWL workflows in a graphical editor using the Arvados Composer, which is based on the standalone Rabix Composer.
However, it is more common to edit CWL workflows locally and interact with the Arvados server using the command line tool.
CLI
- CLI? Yes
The arv command line tool provides shell access to a remote Arvados server: it can upload CWL workflows and their container images, run workflows, and stage/unstage their inputs and outputs.
The arvados-cwl-runner provides a cwl-runner interface compatible with cwltool, allowing remote workflow execution as if it were local.
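As a rough illustration of the cwl-runner style invocation described above, the following Python sketch assembles an arvados-cwl-runner command line without executing it. The helper name `cwl_runner_command`, the file names `workflow.cwl` and `inputs.yml`, and the choice of options are assumptions made for this example; check `arvados-cwl-runner --help` on your installation for the options it actually supports.

```python
import shlex


def cwl_runner_command(workflow, job_inputs, project_uuid=None, wait=True):
    """Assemble an arvados-cwl-runner command line as a list of arguments.

    Hypothetical helper for illustration only; it builds the command and
    does not run it.
    """
    command = ["arvados-cwl-runner"]
    if project_uuid:
        # Assumed option: project that should own the submitted run
        command += ["--project-uuid", project_uuid]
    if not wait:
        # Assumed option: submit the workflow and return immediately
        command.append("--no-wait")
    # As with cwltool, the workflow comes first, then the input object
    command += [workflow, job_inputs]
    return command


# Placeholder file names; replace with your own workflow and inputs
command = cwl_runner_command("workflow.cwl", "inputs.yml", wait=False)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where the Arvados client tools are installed and configured for a server.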
API
- API? Yes
All features of Arvados can be accessed through the Arvados REST API, which is also wrapped by language bindings for Python, Go, R, Ruby, Java and Perl.
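To make the REST access concrete, here is a minimal sketch that uses only the Python standard library rather than the official Python SDK. It assumes the API is served over HTTPS under `/arvados/v1/`, that a token is accepted in an `Authorization: Bearer` header, and that the host and token are supplied by the caller (for instance from the `ARVADOS_API_HOST` and `ARVADOS_API_TOKEN` environment variables). The function names are hypothetical, and the endpoint path should be checked against the API reference for your Arvados version.

```python
import json
import urllib.request


def build_request(host, token, path="/arvados/v1/users/current"):
    """Build (but do not send) an authenticated GET request.

    Hypothetical helper; the default path asks the server which user
    the token belongs to.
    """
    url = f"https://{host}{path}"
    headers = {
        "Authorization": f"Bearer {token}",  # assumed token scheme
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage against a real server (not executed here):
#   import os
#   request = build_request(os.environ["ARVADOS_API_HOST"],
#                           os.environ["ARVADOS_API_TOKEN"])
#   print(fetch_json(request))
```

In practice the language bindings take care of authentication, paging and retries, so a hand-built request like this is mainly useful for understanding what the SDKs do on the wire.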
Demo
- Demo? Yes
Arvados can be tried in the Arvados Playground or installed locally for evaluation purposes using Arvados-in-a-box.
The playground includes a pre-computed run of the tutorial Processing Whole Genome Sequences which can be followed step-by-step to get to know the Arvados Workbench.
Installation options
Because Arvados uses a client/server architecture, it can be complex to install. Its multiple installation options vary in setup difficulty, features and customization options.
Hosted cloud installs and subscriptions of Arvados are provided by its main developer Curii, including the free Arvados Playground which can be tried for evaluation purposes before installing Arvados.
Local install
Arvados supports two main ways to install the server on a local machine.
- Arvados-in-a-box
- Single host install
For pure ease of install, Arvados-in-a-box would be the recommended way forward for a test and development environment, with progression to the single-host install for a production environment.
Arvados-in-a-box
Arvados-in-a-box is a Docker-based distribution that uses a Docker container to run an instance of Arvados on a single machine. It requires that the user has root privileges and Docker installed, but is relatively straightforward to run using the dedicated arvbox command line tool:
$ git clone https://github.com/arvados/arvados.git
$ cd arvados/tools/arvbox/bin
$ ./arvbox start localdemo
$ ./arvbox adduser demouser demo@example.com
Note that many CWL workflows use DockerRequirement, but running Docker-in-Docker requires --privileged mode (effectively giving Arvados root access) or experimental rootless Docker-in-Docker.
The Arvados-in-a-box approach is intended for demonstration and testing purposes, not for production use.
Single host Arvados
The single host install of Arvados uses SaltStack to install and configure Arvados as individual components on the server; as such, it requires an installation and knowledge of Salt.
For local testing, this option is perhaps better suited to those who plan to use the cluster install of Arvados (using SaltStack) later. There is also the option of a completely manual install of Arvados, although the Arvados documentation itself notes that this is complex.
Cluster/Cloud install
Arvados can be configured for use on a cluster in three main ways:
- Using SaltStack to set up and configure a cluster
- Setting up Arvados to work with a Kubernetes cluster
- Manual installation of Arvados and the cluster.
Salt/Vagrant
Arvados can be installed in a virtual machine using a combination of Vagrant (for building the virtual machine) and SaltStack (for software install and configuration management).
Following the instructions for Salt you can choose to install using one of three options:
- Use Vagrant to install Arvados in a virtual machine
- Arvados on a single host
- Arvados across multiple hosts
Kubernetes
Installing Arvados on Kubernetes is documented for Minikube and Google Kubernetes Engine, both using Helm.
Manual install
Installing Arvados manually on a Linux server is the most time-consuming option, but allows you to pick and choose Arvados components and rely more on the distribution's own software, for instance the PostgreSQL database.
The prerequisites list all required software, after which package repositories for CentOS 7 or Debian/Ubuntu provide the Arvados modules.
Supported distributions as of Arvados 2.1:
- CentOS 7 (by implication also RHEL 7)
- Debian 10 buster
- Debian 9 stretch
- Ubuntu 18.04 bionic
- Ubuntu 16.04 xenial
Cloud storage
It is possible to configure Arvados' Keepstore storage module to save data on a file system, S3 or Azure blobs.
The working directories of Arvados for storing workflow definitions are also accessible as git repositories.
Compute nodes
While it is possible to execute workflows locally on the Arvados head node, it is recommended to configure compute nodes that can execute the individual tools from the steps of the CWL workflow. These tools are run from Docker containers.
Cloud: AWS/Azure
Arvados has helpers to build compute node images for Azure and Amazon AWS.
Connecting Arvados to worker nodes on the cloud requires extensive configuration.
Cluster: Slurm
Clusters using the Slurm workload manager can be used as compute nodes by Arvados.
The compute nodes in the cluster must be prepared for Arvados, e.g. by installing Docker.
Documentation
Extensive documentation for Arvados is available on https://doc.arvados.org/, including:
- Arvados User Guide
- Arvados installation
- Arvados SDKs
- Arvados CLI
- Arvados API
- Arvados Admin
- Arvados Architecture
The tutorial Processing Whole Genome Sequences explores step-by-step the setup and execution of a real-life bioinformatics pipeline using the Arvados Workbench GUI and the arv command line tool.
Support options: Community/Enterprise
The Arvados Community provides support and collaboration for the open source edition of Arvados. Enterprise support is available from Curii.
Pros/cons
Arvados can be a good choice in these situations:
- Multiple users of a single compute architecture
- Productionizing a relatively fixed workflow
- API integration is desired (e.g. to build custom Web Apps)
- Long-running workflow service
Arvados can be difficult in these situations:
- Workflows change often (as they need to be uploaded to Arvados)
- Tools change often (as container images need to be pre-loaded)
- Single user (too heavyweight for install on a laptop)
- Training situations (although Arvados-in-a-box can be easily launched)