docs/pipelines_yaml_files.md from sanger/limber

<!--
# @markup markdown
# @title Pipelines yaml files
-->

# Pipelines yaml files

There are a number of `*.yml` files located in `config/pipelines/`. These
configure the flow of plate purposes through a {Pipeline}. Limber automatically
loads all `.yml` files within this directory into {PipelineList}.
Filenames, and the grouping of pipelines within files, have no functional
relevance; they exist purely for organizational reasons.

Loading of yaml files is handled by {ConfigLoader::PipelinesLoader} which
loads all files, detects potential duplicates, and populates the {PipelineList}.

> **TIP**
> It is suggested that you create a new file for each new 'pipeline'. In most
> cases this file will actually contain a handful of internal 'pipelines'
> reflecting branches, or different stages of the process.

## An example file

This is an example yaml file configuring a WGS (whole genome sequencing)
pipeline.

```yaml
---
WGS: # Top of the pipeline (Library Prep)
  filters:
    request_type_key:
      - limber_wgs
      - limber_lcmb
      - limber_rnaa
    library_type: Standard
  library_pass: LB Lib PCR-XP
  relationships:
    LB Cherrypick: LB Shear
    LB Shear: LB Post Shear
    LB Post Shear: LB End Prep
    LB End Prep: LB Lib PCR
    LB Lib PCR: LB Lib PCR-XP
WGS MX: # Bottom of the pipeline (Pooling and normalization)
  filters:
    request_type_key:
      - limber_multiplexing
  relationships:
    LB Lib PCR-XP: LB Lib Pool
    LB Lib Pool: LB Lib Pool Norm
```

The rest of the document describes the structure of this file, and what each of the keys does.

## Top level

Each file is a `.yml` file located in `config/pipelines`. It contains the
configuration for one or more {Pipeline pipelines}.

The top level structure consists of a series of keys, uniquely identifying each
pipeline. Keys need to be unique across _all_ pipelines, not just those within
the same file. Limber will detect duplicate keys, and will raise an exception
on boot.
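
The duplicate check can be pictured with a minimal sketch. This is not Limber's actual loader: `merge_pipeline_files` is a hypothetical helper, and the file names and contents are invented for illustration. It merges the parsed contents of several files and raises as soon as a pipeline key is seen twice.

```ruby
require 'yaml'

# Hypothetical helper, not Limber's real implementation: merges the parsed
# contents of several pipeline files and raises on a repeated pipeline key.
def merge_pipeline_files(files)
  files.each_with_object({}) do |(filename, yaml_text), pipelines|
    YAML.safe_load(yaml_text, aliases: true).each do |name, config|
      raise ArgumentError, "Duplicate pipeline '#{name}' in #{filename}" if pipelines.key?(name)

      pipelines[name] = config
    end
  end
end

# Invented file contents: both files define a pipeline called 'WGS'.
files = {
  'wgs.yml' => "WGS:\n  relationships:\n    LB Cherrypick: LB Shear\n",
  'other.yml' => "WGS:\n  relationships:\n    LB Shear: LB Post Shear\n"
}

begin
  merge_pipeline_files(files)
rescue ArgumentError => e
  puts e.message # prints: Duplicate pipeline 'WGS' in other.yml
end
```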

The key will be used to set the {Pipeline#name}. This is exposed in the
pipelines overview page, and may get shown to the user in future.

The values in turn are used to describe each {Pipeline}. The valid options are detailed in the Pipeline section below.

### Pipeline

Each pipeline configures a name, high-level behaviour and a list of
relationships. As discussed above, the key is a unique value, which gets used
to set the pipeline's name.

@see Pipeline for the Ruby objects generated by this configuration.

```yaml
WGS: # Top of the pipeline (Library Prep)
  filters:
    request_type_key:
      - limber_wgs
      - limber_lcmb
      - limber_rnaa
    library_type: Standard
  library_pass: LB Lib PCR-XP
  relationships:
    LB Cherrypick: LB Shear
    LB Shear: LB Post Shear
    LB Post Shear: LB End Prep
    LB End Prep: LB Lib PCR
    LB Lib PCR: LB Lib PCR-XP
```

The other keys are detailed below.

#### pipeline_group

This groups several Limber pipelines together that are part of the same real world pipeline.

For instance, 'Heron-384 Tailed A V2' and 'Heron-384 Tailed B V2' - the split here is purely for technical reasons, to allow branching. In reality, they are both part of the Heron pipeline.

Another example is when there are separate Limber pipelines for sequential stages. For instance, 'pWGS-384' (the library prep part) and 'pWGS-384 MX' (the multiplexing part). In reality, these are both part of the same pipeline, so they both have the pipeline group 'pWGS-384'.

The pipeline group is used in the 'Work in progress' pages and the 'Pipelines overview' page.
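
As a rough illustration of how the groups relate to pipeline names, the sketch below parses a stripped-down config and collects pipeline names by their `pipeline_group`. The 'pWGS-384' group comes from the text above; the group value 'Heron-384 V2' is an assumption made for this example, and the other keys (`filters`, `relationships`) are left out for brevity.

```ruby
require 'yaml'

# Illustrative config only. 'Heron-384 V2' is an assumed group name.
config = YAML.safe_load(<<~CONFIG)
  pWGS-384:
    pipeline_group: pWGS-384
  pWGS-384 MX:
    pipeline_group: pWGS-384
  Heron-384 Tailed A V2:
    pipeline_group: Heron-384 V2
  Heron-384 Tailed B V2:
    pipeline_group: Heron-384 V2
CONFIG

# Collect the pipeline names that share each group.
groups = config.group_by { |_name, options| options['pipeline_group'] }
               .transform_values { |pairs| pairs.map(&:first) }

groups.each { |group, names| puts "#{group}: #{names.join(', ')}" }
```

A page that lists work by real-world pipeline could then show one entry per group rather than one per Limber pipeline.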

#### filters

Filters are the way in which a pipeline works out if it is in progress. They
consist of a series of keys, and their acceptable values. Keys should be
attributes on {Sequencescape::Api::V2::Request request} (e.g. `library_type`),
whereas values are either an array of acceptable values, or a single acceptable
value.

```yaml
filters:
  request_type_key:
    - limber_wgs
    - limber_lcmb
    - limber_rnaa
  library_type: Standard
```

This indicates that the pipeline can be used for requests with a request type of 'limber_wgs', 'limber_lcmb' or 'limber_rnaa', and a library type of 'Standard'.

The most common keys to filter on are `request_type_key` and `library_type`.

All filters must be fulfilled for a pipeline to be considered valid.
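
The matching rule can be sketched as follows. This is an assumption-laden illustration rather than Limber's code: `matches_filters?` is a hypothetical method, and a plain hash stands in for the request object.

```ruby
# Hypothetical matcher: `filters` is the parsed `filters` section of a
# pipeline, and `request` is a plain hash standing in for a request object.
def matches_filters?(filters, request)
  filters.all? do |attribute, allowed|
    # Array() lets a single value and a list of values be handled alike.
    Array(allowed).include?(request[attribute])
  end
end

filters = {
  'request_type_key' => %w[limber_wgs limber_lcmb limber_rnaa],
  'library_type' => 'Standard'
}

wgs_request = { 'request_type_key' => 'limber_lcmb', 'library_type' => 'Standard' }
other_request = { 'request_type_key' => 'limber_wgs', 'library_type' => 'Other' }

puts matches_filters?(filters, wgs_request)   # true: every filter is satisfied
puts matches_filters?(filters, other_request) # false: library_type does not match
```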

For branching pipelines with identical filters, you are strongly encouraged to
use yaml anchors to share the filter between pipelines. See the relationships
section below for more details, and an example.

#### library_pass

`library_pass` indicates the plate purposes for which the LIMS should suggest the
'Charge and Pass Libraries' option. The values should be strings matching purpose names specified in `config/purposes/*.yml`.

It can be a string if library pass should be suggested at a single step:

```yaml
library_pass: LB Lib PCR-XP
```

Or an array, if there are multiple points at which a library can be passed:

```yaml
library_pass:
  - LB Cap Lib PCR-XP
  - LB Cap Lib Pool
```

> **TIP**
> library_pass usually occurs on the last plate of the pipeline, immediately
> prior to multiplexing and normalization. This is the point at which the
> pipeline transitions from the library creation request (eg. limber_wgs)
> to the multiplexing request (eg. limber_multiplexing). You'll see this
> reflected in the example above, with the 'WGS' and 'WGS MX' pipelines.
>
> This split ensures that customers can request re-pools of existing libraries,
> without incurring further charges for library creation.
>
> It is common, although not necessary, to specify both library_creation and
> multiplexing sections of a pipeline in the same file.
>
> library_pass is not specified for the final tube in the WGS MX pipeline
> because:
>
> - The behaviour is already handled by passing the tube itself
> - Multiplexing is not charged for, and rarely failed, so an explicit
>   step is unnecessary and confusing.

#### relationships

The `relationships` key is a hash representing transitions from parent labware to
child labware. Both keys and values are strings matching purpose names specified
in `config/purposes/*.yml`.

```yaml
relationships:
  LB Cherrypick: LB Shear
  LB Shear: LB Post Shear
  LB Post Shear: LB End Prep
  LB End Prep: LB Lib PCR
  LB Lib PCR: LB Lib PCR-XP
```

The above shows a transition from 'LB Cherrypick' to 'LB Shear', 'LB Shear' to 'LB Post Shear' and so on.
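
To make the parent-to-child reading concrete, here is a small sketch that follows the links from a starting purpose. `purpose_chain` is a hypothetical helper written for this example and is not part of Limber.

```ruby
# Hypothetical helper: follows parent -> child links from a starting purpose
# until it reaches a purpose that has no child in this pipeline.
def purpose_chain(relationships, start)
  chain = [start]
  while (child = relationships[chain.last])
    break if chain.include?(child) # guard against accidental cycles

    chain << child
  end
  chain
end

relationships = {
  'LB Cherrypick' => 'LB Shear',
  'LB Shear' => 'LB Post Shear',
  'LB Post Shear' => 'LB End Prep',
  'LB End Prep' => 'LB Lib PCR',
  'LB Lib PCR' => 'LB Lib PCR-XP'
}

puts purpose_chain(relationships, 'LB Cherrypick').join(' -> ')
# prints: LB Cherrypick -> LB Shear -> LB Post Shear -> LB End Prep -> LB Lib PCR -> LB Lib PCR-XP
```

Because each parent key can appear only once in a hash, a chain like this can never fork.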

> **TIP**
> In most Limber pipelines, the final multiplex library tube is created
> upfront by the limber_multiplexing request. This allows the SSRs to access
> the sequencing requests easily prior to the completion of library creation,
> allowing for the addition or removal of requests. A side effect of this is
> that any Limber pipelines using the standard limber_multiplexing request
> share the final tube purpose, 'LB Lib Pool Norm'. This is defined in:
> {file:config/purposes/final_tube.yml}

It should be noted that because the above structure is a hash, it is not possible
to reflect a branching pipeline. Instead, each branch of the pipeline can be
represented by a separate pipeline within the same file.

For example, the Heron pipeline has A and B forks, representing the PCR 1 and
PCR 2 routes.

> **TIP**
> Note the use of `&heron_filters` and `*heron_filters` in the example below.
> This allows a filter to be shared between two branches of the pipeline.
> You are *strongly* encouraged to use this approach when dealing with branched
> pipelines with identical filters. In the past there have been several
> occasions where failure to follow this pattern has resulted in a library type
> only getting added to one branch of the pipeline by mistake.

```yaml
---
Heron-384 A: # Heron 384-well pipeline specific to PCR 1 plate
  filters: &heron_filters
    request_type_key: limber_heron
    library_type: PCR amplicon ligated adapters 384
  library_pass: LHR-384 Lib PCR
  relationships:
    LHR-384 RT: LHR-384 PCR 1
    LHR-384 PCR 1: LHR-384 cDNA
    LHR-384 cDNA: LHR-384 XP
    LHR-384 XP: LHR-384 End Prep
    LHR-384 End Prep: LHR-384 AL Lib
    LHR-384 AL Lib: LHR-384 Lib PCR
Heron-384 B: # Heron 384-well pipeline specific to PCR 2 plate (uses above relationships after cDNA plate)
  filters: *heron_filters
  relationships:
    LHR-384 RT: LHR-384 PCR 2
    LHR-384 PCR 2: LHR-384 cDNA
```