QutBioacoustics/baw-workers

View on GitHub
README.md

Summary

Maintainability
Test Coverage
# Baw::Workers

Bioacoustics Workbench workers. Provides workers and file storage. 

Workers can process various long-running or intensive tasks. 
File storage provides helper methods for calculating paths to original and cached files.

[![Build Status](https://travis-ci.org/QutBioacoustics/baw-workers.png?branch=master)](https://travis-ci.org/QutBioacoustics/baw-workers)
[![Dependency Status](https://gemnasium.com/QutBioacoustics/baw-workers.png)](https://gemnasium.com/QutBioacoustics/baw-workers)
[![Code Climate](https://codeclimate.com/github/QutBioacoustics/baw-workers.png)](https://codeclimate.com/github/QutBioacoustics/baw-workers)
[![Test Coverage](https://codeclimate.com/github/QutBioacoustics/baw-workers/badges/coverage.svg)](https://codeclimate.com/github/QutBioacoustics/baw-workers)
[![Documentation Status](http://inch-ci.org/github/QutBioacoustics/baw-workers.png?branch=master)](http://inch-ci.org/github/QutBioacoustics/baw-workers)
[![Documentation](https://img.shields.io/badge/docs-rdoc.info-blue.svg)](http://www.rubydoc.info/github/QutBioacoustics/baw-workers)

## Installation

Add this line to your application's Gemfile:

    gem 'baw-workers', git: 'https://github.com/QutBioacoustics/baw-workers.git'

or clone the repository to the current directory:

    git clone https://github.com/QutBioacoustics/baw-workers.git

And then execute:

    $ bundle install

## Actions

This project provides four actions. Actions are classes that implement a potentially long-running process.

### Analysis

Runs analysers over audio files. This action analyses an entire single audio file.

 1. Resque jobs can be queued from [baw-server](https://github.com/QutBioacoustics/baw-server) and processed later by a Resque dequeue worker.
 1. A directory can be analysed manually by providing the settings for a single audio file in yaml format for the the `analysis_config_file` parameter.

### Audio Check

Runs checks on original audio recording files. This action checks an entire single audio file.

 - Gets audio files to check from a csv file in a specific format by specifying `csv_file`.

### Harvest

Harvests audio files to be accessible by [baw-server](https://github.com/QutBioacoustics/baw-server) via the file storage system. 

 - The harvester will recognise valid audio files in two ways: file name in a recognised format, and optionally a directory config file. Depending on the file name format used, a directory config file may or may not be required.
 - Audio files can be harvested by specifying the parameter `harvest_dir` and the `config_file_name` in the settings file.

### Media

Cuts audio files and generates spectrograms.

 -  Resque jobs can be queued on demand from [baw-server](https://github.com/QutBioacoustics/baw-server)
and processed later by a Resque dequeue worker.

## Dependencies

You may need to install some additional tools for working with audio and images, and for processing long-running tasks.
See [baw-audio-tools](https://github.com/QutBioacoustics/baw-audio-tools) for more information.

## File Storage

There are classes for working with file storage paths:

 - original audio files
 - caches for 
    - cut audio
    - generated spectrograms
    - analysis results. 

## Running Workers

A `worker` runs an `action`. Actions are simply a process to follow. Actions can get input from the settings file when run standalone and/or from Resque jobs (when running as a Resque dequeue worker).

Then answer these questions:

 - which environment? (e.g. staging, production, development, test)
 - which worker? (e.g. audio_check, harvester, media, analysis)
 - which processing model? (e.g. standalone, resque)

Based on the answers to these questions

 - pick an existing config file (and check that the settings match the file name),
 - or create a new config file based on an existing one, named in a similar way.

Once you've got your config file, then a worker can be started.
Workers are run using rake tasks. A list of the available rake tasks can be obtained 
by running this command in the `baw-workers` cloned directory or the directory containing your `Gemfile`:

    bundle exec rake -T

There are two steps that a worker can run:

 - preliminary processing (step 1), which can finish by adding a job to a Resque queue or continuing directly to step 2.
 - final processing (step 2), which can start by reserving a job from a Resque queue or directly receiving data from step 1.

There are three ways to run a worker:

 - standalone: this will run steps 1 and 2 sequentially in the same process
 - Resque enqueue: this will run step 1 and enqueue a job using Resque
 - Resque dequeue: this will reserve a job using Resque and run step 2

### Configuration Hints

Some things to check and look out for when creating and modifying worker config files.

#### `settings.resque.queues_to_process`

This setting is only needed when running a Resque dequeue worker. 
It specifies a priority array of the Resque queues to reserve jobs from.
The jobs in a queue specify the action class that will be used to process that job.

#### `settings.resque.connection`

The connection settings are passed directly to Resque to configure the Redis connection.

#### `settings.resque.namespace`

The [Redis namespace](https://github.com/resque/resque). This should usually be left as 'resque'.

#### `settings.resque.background_pid_file`

Specify a `background_pid_file` to have a Resque dequeue worker run in the background.
The `output_log_file` and `error_log_file` settings will only be used when a Resque dequeue worker is running in the background.

#### `settings.actions`

Each action has some settings specific to that action.
An action is the actual processing that job arguments will be used to carry out.

Every action has a `queue` setting.
The `queue` is the name of the queue the action will add jobs to when running as a Resque enqueue worker.
See the Actions section below for more information about action-specific settings.

#### `settings.endpoints` and `settings.api`

These settings must match the equivalent [baw-server](https://github.com/QutBioacoustics/baw-server) settings.

#### `log_level` settings

Each `log_level` setting is independent of the others.

### Examples for running a worker

Replace `'settings_file'` with the full path to the settings file to use for the worker.
Other parameters are described in the `Actions` section below.

#### Standalone

    bundle exec rake baw:analysis:standalone:from_files[settings_file,analysis_config_file]  # Analyse audio files directly
    bundle exec rake baw:audio_check:standalone:from_csv[settings_file,csv_file]             # Enqueue audio recording file checks from a csv file to be processed directly
    bundle exec rake baw:harvest:standalone:from_files[settings_file,harvest_dir]            # Harvest audio files directly
    # media action can only be run as a Resque dequeue worker

#### Resque enqueue

    bundle exec rake baw:analysis:resque:from_files[settings_file,analysis_config_file]      # Enqueue files to analyse using Resque
    bundle exec rake baw:audio_check:resque:from_csv[settings_file,csv_file]                 # Enqueue audio recording file checks from a csv file to be processed using Resque worker
    bundle exec rake baw:harvest:resque:from_files[settings_file,harvest_dir]                # Enqueue files to harvest using Resque
    # media action can only be run as a Resque dequeue worker
    
#### Resque dequeue

A Resque dequeue worker can process any queue with any type of job.

    bundle exec rake baw:worker:current[settings_file]                                       # List running workers
    bundle exec rake baw:worker:run[settings_file]                                           # Run a resque:work with the specified settings file
    bundle exec rake baw:worker:setup[settings_file]                                         # Run a resque:work with the specified settings file
    bundle exec rake baw:worker:stop_all[settings_file]                                      # Quit running workers

## Contributing

1. [Fork this repo](https://github.com/QutBioacoustics/baw-workers/fork)
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new [pull request](https://github.com/QutBioacoustics/baw-workers/compare)