Last updated 2020-11-25 for Arvados 2.1.

Feature Arvados
Documentation
How-Tos
Install guides
GUI
CLI
Demo
Local install ⚠️
Cluster
Cloud ⚠️
Complex setup ⚠️
Complex use
CWL version v1.2

🚧 - Work-in-progress
✅ - Support
❌ - No support
⚠️ - Complicated

Arvados

Arvados is a server-based workflow engine that can also manage workflows, containers and data for multiple users.

The server can be accessed programmatically through the Arvados REST API, which is wrapped by language bindings for Python, Go, R, Ruby, Java and Perl.

The command line tool arv can be used for communicating with the Arvados server, and the arvados-cwl-runner provides a cwl-runner compliant command line for executing CWL workflows in one go.

The Arvados server also provides a Web interface, the Arvados Workbench.

Arvados' maker Curii is one of the major contributors to the Common Workflow Language standards, and Arvados 2.1 supports the latest CWL specification 1.2.

Features

Arvados is perhaps the most feature-rich workflow engine for executing CWL, with multiple options for access. It has co-evolved with CWL as its native workflow format, supporting almost all CWL features.

GUI

  • GUI? Yes

The Arvados Workbench provides a Web-based interface for running workflows on Arvados.

It is also possible to create/modify CWL workflows in a graphical editor using the Arvados Composer, which is based on the standalone Rabix Composer.

[Screenshot: Arvados Playground]

However, it is more common to edit CWL workflows locally and interact with the Arvados server using the command line tool.

CLI

  • CLI? Yes

The arv command line tool provides shell access to a remote Arvados server: it can upload CWL workflows and their container images, run workflows, and stage/unstage their inputs and outputs.

The arvados-cwl-runner provides a cwl-runner interface compatible with cwltool, allowing remote workflow execution as if it were local.
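
As a rough illustration of how such a run might be scripted, the sketch below only assembles an arvados-cwl-runner command line; it does not execute anything. The helper name cwl_runner_command and the file names workflow.cwl and inputs.yml are hypothetical. The --project-uuid and --no-wait options are assumed to behave as in recent arvados-cwl-runner releases and should be checked against arvados-cwl-runner --help for your version.

```python
def cwl_runner_command(workflow, inputs=None, project_uuid=None, wait=True):
    """Assemble (but do not run) an arvados-cwl-runner command line.

    Hypothetical helper for illustration; the option names are assumptions
    to verify against `arvados-cwl-runner --help`.
    """
    cmd = ["arvados-cwl-runner"]
    if project_uuid:
        # Assumed option: the Arvados project that should own the run.
        cmd.append(f"--project-uuid={project_uuid}")
    if not wait:
        # Assumed option: submit the workflow and return without waiting.
        cmd.append("--no-wait")
    # As with cwltool, the workflow document comes first, then the input object.
    cmd.append(workflow)
    if inputs:
        cmd.append(inputs)
    return cmd
```

The returned list can be handed to subprocess.run(cmd, check=True) on a machine where arvados-cwl-runner is installed and the Arvados client settings are configured.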

API

  • API? Yes

All features of Arvados can be accessed through the Arvados REST API, which is also wrapped by language bindings for Python, Go, R, Ruby, Java and Perl.
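
To give a feel for the REST API without relying on any of the language bindings, here is a minimal sketch that builds, but does not send, an authenticated request using only the Python standard library. The helper name build_arvados_request is hypothetical. The /arvados/v1/ path prefix, the Bearer token header and the ARVADOS_API_HOST / ARVADOS_API_TOKEN environment variables follow the conventions of the Arvados API documentation as far as we know; treat them as assumptions to confirm for your installation.

```python
import os
import urllib.request


def build_arvados_request(resource, host=None, token=None):
    """Build (but do not send) a GET request for an Arvados API resource.

    Hypothetical helper for illustration only.
    """
    # Fall back to the environment variables conventionally used by the
    # Arvados client tools (assumed here, not verified against a server).
    host = host or os.environ.get("ARVADOS_API_HOST", "")
    token = token or os.environ.get("ARVADOS_API_TOKEN", "")
    if not host or not token:
        raise ValueError("an API host and an API token are required")
    # Assumed URL layout: https://<host>/arvados/v1/<resource>
    url = f"https://{host}/arvados/v1/{resource.strip('/')}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )
```

For example, build_arvados_request("collections") prepares a listing request for collections; passing the result to urllib.request.urlopen would send it to the server. In practice the official language bindings are likely the more convenient route, since they handle configuration and response parsing.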

Demo

  • Demo? Yes

Arvados can be tried in the Arvados Playground or installed locally for evaluation purposes using Arvados-in-a-box.

The playground includes a pre-computed run of the tutorial Processing Whole Genome Sequences which can be followed step-by-step to get to know the Arvados Workbench.

Installation options

As a client/server architecture, Arvados can be complex to install. Its multiple installation options have varying degrees of setup difficulty, features and customization options.

Hosted cloud installs and subscriptions of Arvados are provided by its main developer Curii, including the free Arvados Playground which can be tried for evaluation purposes before installing Arvados.

Local install

Arvados supports two main ways to install the server on a local machine.

  • Arvados-in-a-box
  • Single host install

For pure ease of install, Arvados-in-a-box would be the recommended way forward for a test and development environment, with progression to the single-host install for a production environment.

Arvados-in-a-box

Arvados-in-a-box is a Docker-based distribution that uses a Docker container to run an instance of Arvados on a single machine. It requires that the user has root privileges and Docker installed, but is relatively straightforward to run using the dedicated arvbox command line tool:

$ git clone https://github.com/arvados/arvados.git
$ cd arvados/tools/arvbox/bin
$ ./arvbox start localdemo
$ ./arvbox adduser demouser demo@example.com

Note that many CWL workflows use DockerRequirement, but running Docker-in-Docker requires --privileged mode (effectively giving Arvados root access) or experimental rootless Docker-in-Docker.

The Arvados-in-a-box approach is intended for demonstration and testing purposes, not for production use.

Single host Arvados

The single host install of Arvados uses SaltStack to install and configure Arvados as individual components on the server; as such, it requires an installation and knowledge of Salt.

For local testing, this is perhaps better suited to those who plan to use the cluster install of Arvados (using SaltStack) later. There is also the option of a completely manual install of Arvados, although the Arvados developers themselves note that this is complex.

Cluster/Cloud install

Arvados can be configured for use on a cluster in 3 main ways:

  • Using SaltStack to set up and configure a cluster
  • Setting up Arvados to work with a Kubernetes cluster
  • Manual installation of Arvados and the cluster.

Salt/Vagrant

Arvados can be installed in virtual machines using a combination of Vagrant (for building the virtual machine) and SaltStack (for software install and configuration management).

Following the instructions for Salt you can choose to install using one of three options:

Kubernetes

Arvados install on Kubernetes is documented for Minikube and Google Kubernetes Engine, both using Helm.

Manual install

Installing Arvados manually on a Linux server is the most time-consuming option, but it allows you to pick and choose Arvados components and rely more on the distribution's own software, for instance the PostgreSQL database.

The prerequisites list all required software, after which the package repositories for CentOS 7 or Debian/Ubuntu provide the Arvados modules.

Supported distributions as of Arvados 2.1:

  • CentOS 7 (by implication also RHEL 7)
  • Debian 10 buster
  • Debian 9 stretch
  • Ubuntu 18.04 bionic
  • Ubuntu 16.04 xenial

Cloud storage

It is possible to configure Arvados' Keepstore storage module to save data on a file system, in S3 or in Azure blobs.

The working directories of Arvados for storing workflow definitions are also accessible as git repositories.

Compute nodes

While it is possible to execute workflows locally on the Arvados head node, it is recommended to configure compute nodes that can execute the individual tools from the steps of the CWL workflow. These tools are run in Docker containers.

Cloud: AWS/Azure

Arvados has helpers to build compute node images for Azure and Amazon AWS.

Connecting Arvados to worker nodes on the cloud requires extensive configuration.

Cluster: Slurm

Clusters using the Slurm workload manager can be used as compute nodes by Arvados.

The compute nodes in the cluster must be prepared for Arvados, e.g. by installing Docker.

Documentation

Extensive documentation for Arvados is available on https://doc.arvados.org/, including:

The tutorial Processing Whole Genome Sequences explores step-by-step the setup and execution of a real-life bioinformatics pipeline using the Arvados Workbench GUI and the arv command line tool.

Support options: Community/Enterprise

The Arvados Community provides support and collaboration for the open source edition of Arvados. Enterprise support is available from Curii.

Pros/cons

Arvados can be a good choice in these situations:

  • Multiple users of a single compute architecture
  • Productionizing a relatively fixed workflow
  • API integration is desired (e.g. to build custom Web Apps)
  • Long-running workflow service

Arvados can be difficult in these situations:

  • Workflows change often (as they need to be uploaded to Arvados)
  • Tools change often (as container images need to be pre-loaded)
  • Single user (too heavyweight for install on a laptop)
  • Training situations (although Arvados-in-a-box can be easily launched)