lulibrary/preservation

View on GitHub
README.md

Summary

Maintainability
Test Coverage
# Preservation

Extraction from the Pure Research Information System and transformation for
loading by Archivematica.

Includes transfer preparation, reporting and disk space management.

## Status

[![Gem Version](https://badge.fury.io/rb/preservation.svg)](https://badge.fury.io/rb/preservation)
[![Build Status](https://semaphoreci.com/api/v1/aalbinclark/preservation/branches/master/badge.svg)](https://semaphoreci.com/aalbinclark/preservation)
[![Code Climate](https://codeclimate.com/github/lulibrary/preservation/badges/gpa.svg)](https://codeclimate.com/github/lulibrary/preservation)
[![Dependency Status](https://www.versioneye.com/user/projects/5899e0d11e07ae0043969771/badge.svg?style=flat-square)](https://www.versioneye.com/user/projects/5899e0d11e07ae0043969771)
[![GitPitch](https://gitpitch.com/assets/badge.svg)](https://gitpitch.com/lulibrary/preservation)

## Installation

Add this line to your application's Gemfile:

    gem 'preservation'

And then execute:

    $ bundle

Or install it yourself as:

    $ gem install preservation

## Usage

### Configuration

Configure Preservation. If ```log_path``` is omitted, logging (standard library)
writes to STDOUT.

```ruby
Preservation.configure do |config|
  config.db_path     = ENV['ARCHIVEMATICA_DB_PATH']
  config.ingest_path = ENV['ARCHIVEMATICA_INGEST_PATH']
  config.log_path    = ENV['PRESERVATION_LOG_PATH']
end
```

Create a hash for passing to a transfer.

```ruby
# Pure host with authentication.
config = {
  url:      ENV['PURE_URL'],
  username: ENV['PURE_USERNAME'],
  password: ENV['PURE_PASSWORD']
}
```

```ruby
# Pure host without authentication.
config = {
  url: ENV['PURE_URL']
}
```

### Transfer

Configure a transfer to retrieve data from a Pure host.

```ruby
transfer = Preservation::Transfer::Dataset.new config
```

#### Single

If necessary, fetch the metadata, prepare a directory in the ingest path and
populate it with the files and JSON description file.

```ruby
transfer.prepare uuid: 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'
```

#### Batch

For multiple Pure datasets, if necessary, fetch the metadata, prepare a
directory in the ingest path and populate it with the files and JSON description
file.

A maximum of 10 will be prepared using the doi_short directory naming scheme.
Each dataset will only be prepared if 20 days have elapsed since the metadata
record was last modified.

```ruby
transfer.prepare_batch max: 10,
                       dir_scheme: :doi_short,
                       delay: 20
```

#### Directory name

The following are permitted values for the dir_scheme parameter:

```ruby
:uuid_title
:title_uuid
:date_uuid_title
:date_title_uuid
:date_time_uuid
:date_time_title
:date_time_uuid_title
:date_time_title_uuid
:uuid
:doi
:doi_short
```

#### Load directory

A transfer-ready directory, with a name built according to the directory scheme
specified, in this case doi_short. This particular example has only one file
Ebola_data_Jun15.zip in the dataset.

```
.
├── 10.17635-lancaster-researchdata-6
│   ├── Ebola_data_Jun15.zip
│   └── metadata
│       └── metadata.json
```

metadata.json:

```json
[
  {
    "filename": "objects/Ebola_data_Jun15.zip",
    "dc.title": "Ebolavirus evolution 2013-2015",
    "dc.description": "Data used for analysis of selection and evolutionary rate in Zaire Ebolavirus variant Makona",
    "dcterms.created": "2015-06-04",
    "dcterms.available": "2015-06-04",
    "dc.publisher": "Lancaster University",
    "dc.identifier": "http://dx.doi.org/10.17635/lancaster/researchdata/6",
    "dcterms.spatial": [
      "Guinea, Sierra Leone, Liberia"
    ],
    "dc.creator": [
      "Gatherer, Derek"
    ],
    "dc.contributor": [
      "Robertson, David",
      "Lovell, Simon"
    ],
    "dc.subject": [
      "Ebolavirus",
      "evolution",
      "phylogenetics",
      "virulence",
      "Filoviridae",
      "positive selection"
    ],
    "dcterms.license": "CC BY",
    "dc.relation": [
      "http://dx.doi.org/10.1136/ebmed-2014-110127",
      "http://dx.doi.org/10.1099/vir.0.067199-0"
    ]
  }
]
```

### Storage

Free up disk space for completed transfers. Can be done at any time.

```ruby
Preservation::Storage.cleanup
```

### Report

Can be used for scheduled monitoring of transfers.

```ruby
Preservation::Report::Transfer.exception
```

Formatted as JSON:

```json
{
  "pending": {
    "count": 3,
    "data": [
      {
        "path": "10.17635-lancaster-researchdata-72",
        "path_timestamp": "2016-09-29 12:08:58 +0100"
      },
      {
        "path": "10.17635-lancaster-researchdata-74",
        "path_timestamp": "2016-09-29 12:08:59 +0100"
      },
      {
        "path": "10.17635-lancaster-researchdata-75",
        "path_timestamp": "2016-09-29 12:09:00 +0100"
      }
    ]
  },
  "current": {
    "path": "10.17635-lancaster-researchdata-90",
    "unit_type": "ingest",
    "status": "PROCESSING",
    "current": 1,
    "id": 91,
    "uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
    "path_timestamp": "2016-09-28 17:09:33 +0100"
  },
  "failed": {
    "count": 0
  },
  "incomplete": {
    "count": 1,
    "data": [
      {
         "path": "10.17635-lancaster-researchdata-90",
         "unit_type": "ingest",
         "status": "PROCESSING",
         "current": 1,
         "id": 91,
         "uuid": "ebf048c3-0ca8-409c-94cf-ab3e5d97e901",
         "path_timestamp": "2016-09-28 17:09:33 +0100"
      }
    ]
  },
  "complete": {
    "count": 78
  }
}
```