# Dataset microservice

[![Build Status](https://travis-ci.com/resource-watch/dataset.svg?branch=dev)](https://travis-ci.com/resource-watch/dataset)
[![Test Coverage](https://api.codeclimate.com/v1/badges/6e90d8ae68d28c916a5c/test_coverage)](https://codeclimate.com/github/resource-watch/dataset/test_coverage)

## Dependencies

The Dataset microservice is built using [Node.js](https://nodejs.org/en/), and can be executed either natively or using Docker, each of which has its own set of requirements.

Native execution requires:

- [Node.js](https://nodejs.org/en/)
- [Yarn](https://yarnpkg.com/)
- [MongoDB](https://www.mongodb.com/)

Execution using Docker requires:

- [Docker](https://www.docker.com/)
- [Docker Compose](https://docs.docker.com/compose/)

Dependencies on other microservices:

- [Carto adapter](https://github.com/resource-watch/rw-adapter-carto/)
- [ArcGIS adapter](https://github.com/resource-watch/adapter-arcgis)
- [Google Earth Engine adapter](https://github.com/resource-watch/adapter-earth-engine)
- [BigQuery adapter](https://github.com/resource-watch/adapter-bigquery)
- [NEX-GDDP adapter](https://github.com/Vizzuality/prep-nexgddp)
- [Graph client](https://github.com/resource-watch/graph-client)
- [Geostore](https://github.com/gfw-api/gfw-geostore-api)
- [Layer](https://github.com/resource-watch/layer)
- [Metadata](https://github.com/resource-watch/rw_metadata)
- [Task Async](https://github.com/resource-watch/task-executor)
- [Vocabulary](https://github.com/resource-watch/vocabulary-tag/)
- [Widget](https://github.com/resource-watch/widget)
- [Control Tower](https://github.com/resource-watch/control-tower)

## Getting started

Start by cloning the repository from GitHub to your execution environment:

```
git clone https://github.com/resource-watch/dataset.git && cd dataset
```

After that, follow one of the instructions below:

### Using native execution

1 - Set up your environment variables. See `dev.env.sample` for a list of the variables to set; they are described in detail in [this section](#environment-variables). Native execution does NOT load the contents of the `dev.env` file, so you need to define those values another way, for example by exporting them in your shell.

2 - Install the Node.js dependencies using Yarn:
```
yarn
```

3 - Start the application server:
```
yarn start
```

The endpoints provided by this microservice should now be available through Control Tower's URL.

### Using Docker

1 - Create your `dev.env` file and fill in your configuration. An example `dev.env.sample` file is available in the project root, and the variables are described in [this section](#environment-variables).

2 - Execute the following command to run the Dataset microservice:

```
./dataset.sh develop
```

The endpoints provided by this microservice should now be available through Control Tower's URL.

## Testing

There are two ways to run the included tests:

### Using native execution

Follow the instructions above for setting up the runtime environment for native execution, then run:
```
yarn test
```

### Using Docker

Follow the instructions above for setting up the runtime environment for Docker execution, then run:
```
./dataset.sh test
```

## Configuration

### Environment variables

- PORT => TCP port on which the service listens
- NODE_PATH => relative path to the source code. Should be `app/src`
- MICROSERVICE_TOKEN => JWT token the microservice uses to authenticate its requests to other microservices through Control Tower
- S3_ACCESS_KEY_ID => AWS access key ID for S3
- S3_SECRET_ACCESS_KEY => AWS secret access key for S3
- MONGO_PORT_27017_TCP_ADDR => IP address or hostname of the MongoDB server

You can optionally set other variables; see [this file](config/custom-environment-variables.json) for an extended list.
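Because native execution does not load `dev.env`, a missing variable may only surface once the service is running. The following is a minimal TypeScript sketch, not part of this repository, that checks the variables listed above before startup. The helper names `findMissingEnvVars` and `hasExpectedNodePath`, and the assumption that every listed variable must be non-empty, are illustrative only.

```typescript
// Hypothetical startup check; not part of the Dataset microservice codebase.
// Assumes every variable listed in this README must be set to a non-empty value.
const REQUIRED_ENV_VARS: readonly string[] = [
  "PORT",
  "NODE_PATH",
  "MICROSERVICE_TOKEN",
  "S3_ACCESS_KEY_ID",
  "S3_SECRET_ACCESS_KEY",
  "MONGO_PORT_27017_TCP_ADDR",
];

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank, in list order.
function findMissingEnvVars(
  env: Env,
  required: readonly string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// The README states that NODE_PATH should be `app/src`.
function hasExpectedNodePath(env: Env): boolean {
  return env["NODE_PATH"] === "app/src";
}
```

For instance, calling `findMissingEnvVars(process.env)` before starting the server, and exiting with an error when the result is non-empty, would surface configuration problems early.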

## Documentation

### Dataset model

| Property Name                | Value              | Description                                     | Required      | Notes                                                                                                                                                                  |
|---------------------------|-----------------|----------------------------------------------|------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| name                         | string             | The name of the dataset                         | Yes           |                                                                                                                                                                        |
| slug                         | string             | Slug-like string of the dataset name            | No            | Autogenerated                                                                                                                                                          |
| type                         | string             | The type of the dataset                         | No            |                                                                                                                                                                        |
| application                  | string             | The application the dataset belongs to          | No            | Autogenerated                                                                                                                                                          |
| dataPath                     | string             | Path of the data in document datasets           | No            |                                                                                                                                                                        |
| attributesPath               | string             | Attributes to import                            | No            |                                                                                                                                                                        |
| connectorType                | string             | The type of connector of the dataset            | Yes           | Valid connectorType values: ["rest", "document", "wms"]                                                                                                                |
| provider                     | string             | The dataset connector provider                  | Yes           | Valid provider values: rest -> ["cartodb", "featureservice", "gee", "bigquery", "rasdaman", "nexgddp"]; document -> ["csv", "json", "tsv", "xml"]; wms -> ["wms"]      |
| userId                       | string             | The userId of the owner of the dataset          | No            |                                                                                                                                                                        |
| connectorUrl                 | string             | A valid url where the data is stored            | No            | Required when the dataset connectorType is rest.                                                                                                                       |
| tableName                    | string             | The name of the actual or generated table.      | No            | Autogenerated                                                                                                                                                          |
| status                       | string             |                                                 | No            |                                                                                                                                                                        |
| overwrite                    | boolean            |                                                 | No            |                                                                                                                                                                        |
| errorMessage                 | string             |                                                 | No            |                                                                                                                                                                        |
| published                    | boolean            |                                                 | No            |                                                                                                                                                                        |
| sandbox                      | boolean            |                                                 | No            |                                                                                                                                                                        |
| env                          | string             |                                                 | No            |                                                                                                                                                                        |
| geoInfo                      | boolean            |                                                 | No            |                                                                                                                                                                        |
| protected                    | boolean            |                                                 | No            |                                                                                                                                                                        |
| taskId                       | string             |                                                 | No            |                                                                                                                                                                        |
| subscribable                 | nested object      |                                                 | No            |                                                                                                                                                                        |
| legend                       | nested object      |                                                 |               |                                                                                                                                                                        |
| legend.lat                   | string             |                                                 | No            |                                                                                                                                                                        |
| legend.long                  | string             |                                                 | No            |                                                                                                                                                                        |
| legend.date                  | list               |                                                 | No            | List of string values                                                                                                                                                  |
| legend.region                | list               |                                                 | No            | List of string values                                                                                                                                                  |
| legend.country               | list               |                                                 | No            | List of string values                                                                                                                                                  |
| legend.nested                | list               |                                                 | No            | List of string values                                                                                                                                                  |
| clonedHost                   | nested object      |                                                 |               |                                                                                                                                                                        |
| clonedHost.hostProvider      | string             |                                                 | No            | Autogenerated                                                                                                                                                          |
| clonedHost.hostUrl           | string             |                                                 | No            | Autogenerated                                                                                                                                                          |
| clonedHost.hostId            | string             |                                                 | No            | Autogenerated                                                                                                                                                          |
| clonedHost.hostType          | string             |                                                 | No            | Autogenerated                                                                                                                                                          |
| clonedHost.hostPath          | string             |                                                 | No            | Autogenerated                                                                                                                                                          |
| createdAt                    | string             |                                                 | No            | Date value                                                                                                                                                             |
| updatedAt                    | string             |                                                 | No            | Date value                                                                                                                                                             |
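The constraints in the table above (required fields, the valid `provider` values for each `connectorType`, and `connectorUrl` being required for `rest` datasets) can be summarized as a small TypeScript check. This is a hypothetical sketch for illustration, not the service's actual validator: the names `DatasetInput`, `isConnectorType` and `validateDataset` are invented, and the real API may enforce additional rules.

```typescript
// Hypothetical sketch of the connector rules from the dataset model table.
// Not the service's actual validation code; the real API may apply more rules.
type ConnectorType = "rest" | "document" | "wms";

// Valid `provider` values for each `connectorType`, as listed in the table.
const PROVIDERS_BY_CONNECTOR: Record<ConnectorType, readonly string[]> = {
  rest: ["cartodb", "featureservice", "gee", "bigquery", "rasdaman", "nexgddp"],
  document: ["csv", "json", "tsv", "xml"],
  wms: ["wms"],
};

// Subset of the dataset model fields involved in these rules.
interface DatasetInput {
  name?: string;
  connectorType?: string;
  provider?: string;
  connectorUrl?: string;
}

function isConnectorType(value: string): value is ConnectorType {
  return Object.prototype.hasOwnProperty.call(PROVIDERS_BY_CONNECTOR, value);
}

// Returns validation error messages; an empty array means the input passes.
function validateDataset(input: DatasetInput): string[] {
  const errors: string[] = [];
  const { name, connectorType, provider, connectorUrl } = input;

  if (!name) errors.push("name is required");

  if (!connectorType) {
    errors.push("connectorType is required");
  } else if (!isConnectorType(connectorType)) {
    errors.push(`invalid connectorType: ${connectorType}`);
  }

  if (!provider) {
    errors.push("provider is required");
  } else if (
    connectorType &&
    isConnectorType(connectorType) &&
    PROVIDERS_BY_CONNECTOR[connectorType].indexOf(provider) === -1
  ) {
    errors.push(`provider "${provider}" is not valid for connectorType "${connectorType}"`);
  }

  // Per the table, connectorUrl is required for rest datasets.
  if (connectorType === "rest" && !connectorUrl) {
    errors.push("connectorUrl is required when connectorType is rest");
  }
  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a client report every issue with a payload in one pass.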

### Dataset Endpoints

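- `GET /v1/dataset`: list datasets
- `POST /v1/dataset`: create a dataset
- `GET /v1/dataset/:dataset`: get a single dataset
- `PATCH /v1/dataset/:dataset`: update a dataset
- `DELETE /v1/dataset/:dataset`: delete a dataset
- `POST /v1/dataset/find-by-ids`: find datasets by a list of IDs
- `POST /v1/dataset/upload`: upload a data file
- `GET /v1/dataset/:dataset/clone`: clone a dataset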
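As a sketch of how a client might address the dataset endpoints, the TypeScript below only assembles request descriptions (method, URL, headers, body) and makes no network calls. Everything beyond the endpoint paths is an assumption: the base URL stands for your Control Tower URL, authentication is assumed to use an `Authorization: Bearer <token>` header, and the `{ ids: [...] }` body for `find-by-ids` is a guess at the payload shape. The helper names are hypothetical.

```typescript
// Hypothetical request builder for the Dataset endpoints; it performs no I/O.
// Assumptions: requests go through a Control Tower base URL, authenticated
// calls send `Authorization: Bearer <token>`, and find-by-ids takes { ids }.
type HttpMethod = "GET" | "POST" | "PATCH" | "DELETE";

interface RequestSpec {
  method: HttpMethod;
  url: string;
  headers: Record<string, string>;
  body: string | undefined;
}

function buildRequest(
  baseUrl: string,
  method: HttpMethod,
  path: string,
  token?: string,
  payload?: unknown,
): RequestSpec {
  const headers: Record<string, string> = {};
  if (token) headers["Authorization"] = `Bearer ${token}`;

  let body: string | undefined;
  if (payload !== undefined) {
    headers["Content-Type"] = "application/json";
    body = JSON.stringify(payload);
  }

  // Drop trailing slashes from the base URL so it joins cleanly with the path.
  const url = `${baseUrl.replace(/\/+$/, "")}${path}`;
  return { method, url, headers, body };
}

// GET /v1/dataset/:dataset
function getDataset(baseUrl: string, datasetId: string): RequestSpec {
  return buildRequest(baseUrl, "GET", `/v1/dataset/${encodeURIComponent(datasetId)}`);
}

// POST /v1/dataset/find-by-ids (payload shape assumed)
function findByIds(baseUrl: string, ids: string[]): RequestSpec {
  return buildRequest(baseUrl, "POST", "/v1/dataset/find-by-ids", undefined, { ids });
}
```

In Node.js 18 or later, a returned spec could be sent with the global `fetch`, e.g. `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })`.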

### Swagger

[Check out the swagger docs](https://editor.swagger.io/?url=https://raw.githubusercontent.com/GPSDD/dataset/develop/app/microservice/swagger.json)

### Legacy

At some point, the service included blockchain validation functionality. As it was not used, it was removed for code simplicity in https://github.com/resource-watch/dataset/pull/97.
Use that link to restore it if needed.