rethunk
-------------
[![Build Status](https://secure.travis-ci.org/mbroadst/rethunk.svg?branch=master)](https://travis-ci.org/mbroadst/rethunk)
[![Dependency Status](https://david-dm.org/mbroadst/rethunk.svg)](https://david-dm.org/mbroadst/rethunk)
[![Test Coverage](https://codeclimate.com/github/mbroadst/rethunk/badges/coverage.svg)](https://codeclimate.com/github/mbroadst/rethunk)

A Node.js driver for RethinkDB with more advanced features.

### Install

```
npm install rethunk
```

### Quick start

Rethunk uses almost the same API as the official driver. Please refer to
the [official driver's documentation](http://www.rethinkdb.com/api/javascript/)
for all the ReQL methods (the methods used to build the query).


The main differences are:

- You need to execute the module when you import it:

```js
var r = require('rethunk')();
// With the official driver:
// var r = require('rethinkdb');
```

- Connections are managed by the driver with an efficient connection pool.
Once you have imported the driver, you can immediately run queries;
you don't need to call `r.connect` or pass a connection to `run`.

```js
var r = require('rethunk')();
r.table('users').get('orphee@gmail.com').run().then(function(user) {
  // ...
}).error(handleError)
```

- Cursors are coerced to arrays by default:

```js
var r = require('rethunk')();
r.table('data').run().then(function(result) {
  assert(Array.isArray(result)) // true
  // With the official driver you need to call
  // result.toArray().then(function(result2) {
  //   assert(Array.isArray(result2))
  // })
});
```

#### Drop in

You can replace the official driver with rethunk by replacing:

```js
var r = require('rethinkdb');
```

With:

```js
var r = require('rethunk')({
  pool: false,
  cursor: true
});
```

If you want to take advantage of the connection pool, refer to the next section.


#### From the official driver

To switch from the official driver to rethunk and get the most out of it,
here are the few things to do:

1. Change the way to import the driver.

  ```js
  var r = require('rethinkdb');
  ```

  To:

  ```js
  var r = require('rethunk')();
  // Or if you do not connect to the default local instance:
  // var r = require('rethunk')({servers: [{host: ..., port: ...}]});
  ```

2. Remove everything related to a connection:

  ```js
  r.connect({host: ..., port: ...}).then(function(connection) {
    connection.on('error', handleError);
    query.run(connection).then(function(result) {
      // console.log(result);
      connection.close();
    });
  });
  ```

  Becomes:

  ```js
  query.run().then(function(result) {
    // console.log(result);
  });
  ```

3. Remove the methods related to the cursor. This typically involves
removing `toArray`:

  ```js
  r.table('data').run(connection).then(function(cursor) {
    cursor.toArray().then(function(result) {
      // console.log(result);
    });
  });
  ```

  Becomes:

  ```js
  r.table('data').run().then(function(result) {
    // console.log(result);
  });
  ```


#### Using TLS Connections

_Note_: Support for a TLS proxy is experimental.

RethinkDB does not support TLS connections to the server yet, but in case you want
to run it over an untrusted network and need encryption, you can easily run a TLS proxy
on your server with:

```js
var tls = require('tls');
var net = require('net');
var tlsOpts = {
  key: '', // Your private key
  cert: '' // Public certificate
};
tls.createServer(tlsOpts, function (encryptedConnection) {
  var rethinkdbConn = net.connect({
    host: 'localhost',
    port: 28015
  });
  encryptedConnection.pipe(rethinkdbConn).pipe(encryptedConnection);
}).listen(29015);
```

And then safely connect to it with the `ssl` option:

```js
var r = require('rethunk')({
  port: 29015,
  host: 'place-with-no-firewall.com',
  ssl: true
});
```

`ssl` may also be an object that will be passed as the `options` argument to
[`tls.connect`](http://nodejs.org/api/tls.html#tls_tls_connect_options_callback).
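
As a rough sketch of the object form, the options below assume the proxy's certificate is signed by a private certificate authority. `ca` and `rejectUnauthorized` are standard `tls.connect` options; the PEM text is a placeholder, and the host and port are simply reused from the example above.

```js
// Placeholder PEM text: in practice, load the CA certificate that signed
// the proxy's certificate (for example with fs.readFileSync).
var caCertificate = '-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----';

var options = {
  host: 'place-with-no-firewall.com',
  port: 29015,
  ssl: {
    ca: caCertificate,        // trust this CA when verifying the proxy
    rejectUnauthorized: true  // refuse servers whose certificate fails verification
  }
};

// The options object would then be passed when importing the driver:
// var r = require('rethunk')(options);
```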


### New features and differences

rethunk ships with a few interesting features.


#### Importing the driver

When you import the driver, as soon as you execute the module, you will create
a default connection pool (except if you pass `{pool: false}`). The options you
can pass are:

- `db`: `<string>` - The default database to use if none is mentioned.
- `discovery`: `<boolean>` - When true, the driver will regularly pull data from the table `server_status` to
keep a list of updated hosts, default `false`
- `pool`: `<boolean>` - Set it to `false`, if you do not want to use a connection pool.
- `buffer`: `<number>` - Minimum number of connections available in the pool, default `50`
- `max`: `<number>` - Maximum number of connections available in the pool, default `1000`
- `timeout`: `<number>` - The number of seconds for a connection to be opened, default `20`
- `timeoutError`: `<number>` - Wait time before reconnecting in case of an error (in ms), default 1000
- `timeoutGb`: `<number>` - How long the pool keeps a connection that hasn't been used (in ms), default 60\*60\*1000
- `maxExponent`: `<number>` - The maximum timeout before trying to reconnect is 2^maxExponent x timeoutError, default 6 (~60 seconds for the longest wait)
- `silent`: `<boolean>` - When `true`, errors are not logged with `console.error`, default `false`
- `servers`: an array of objects `{ host: <string>, port: <number> }` representing RethinkDB nodes to connect to.
- `optionalRun`: `<boolean>` - if `false`, yielding a query will not run it, default `true`

In case of a single instance, you can directly pass `host` and `port` in the top level parameters.

Examples:
```js
// connect to localhost:28015, and let the driver find other instances
var r = require('rethunk')({
    discovery: true
});

// connect to and only to localhost:28015
var r = require('rethunk')();

// Do not create a connection pool
var r = require('rethunk')({pool: false});

// Connect to a cluster seeding from `192.168.0.100`, `192.168.0.101`, `192.168.0.102`
var r = require('rethunk')({
    servers: [
        {host: '192.168.0.100', port: 28015},
        {host: '192.168.0.101', port: 28015},
        {host: '192.168.0.102', port: 28015},
    ]
});

// Connect to a cluster containing `192.168.0.100`, `192.168.0.101`, `192.168.0.102` and
// use a maximum of 3000 connections and try to keep 300 connections available at all times.
var r = require('rethunk')({
    servers: [
        {host: '192.168.0.100', port: 28015},
        {host: '192.168.0.101', port: 28015},
        {host: '192.168.0.102', port: 28015},
    ],
    buffer: 300,
    max: 3000
});
```

You can also pass `{cursor: true}` if you want to retrieve RethinkDB streams as cursors
and not arrays by default.

_Note_: The option `{stream: true}` that asynchronously returns a stream is deprecated. Use `toStream` instead.

_Note_: The option `{optionalRun: false}` will disable the optional run for all instances of the driver.

#### Connection pool

As mentioned before, `rethunk` has a connection pool and manages all the connections
itself. The connection pool is initialized as soon as you execute the module.

You should never have to worry about connections in rethunk. Connections are created
as they are needed, and in case of a host failure, the pool will try to open connections with an
exponential back off algorithm.
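
To give a feel for the back off schedule, here is a minimal sketch. It assumes the wait starts at `timeoutError` and doubles after each failed attempt until it reaches the documented cap of 2^maxExponent x timeoutError. Only the cap comes from the options above; the starting point, the doubling, and the helper name `reconnectDelay` are assumptions made for illustration.

```js
// Hypothetical helper, not part of rethunk's API.
// attempt: number of consecutive failed attempts so far (0 for the first retry)
function reconnectDelay(attempt, timeoutError, maxExponent) {
  var exponent = Math.min(attempt, maxExponent); // cap the exponent
  return Math.pow(2, exponent) * timeoutError;   // delay in ms
}

// With the defaults (timeoutError = 1000 ms, maxExponent = 6),
// the delay grows from 1000 ms and stays at 64000 ms once capped.
for (var attempt = 0; attempt <= 7; attempt++) {
  console.log(attempt, reconnectDelay(attempt, 1000, 6));
}
```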

The driver executes one query per connection. Now that [rethinkdb/rethinkdb#3296](https://github.com/rethinkdb/rethinkdb/issues/3296)
is solved, this behavior may be changed in the future.

Because the connection pool will keep some connections available, a script will not
terminate. If you have finished executing your queries and want your Node.js script
to exit, you need to drain the pool with:

```js
r.getPoolMaster().drain();
```

The pool master by default will log all errors/new states on `stderr`. If you do not
want to pollute `stderr`, pass `silent: true` when you import the driver. You can retrieve the
logs by binding a listener for the `log` event on the pool master.

```js
r.getPoolMaster().on('log', console.log);
```

##### Advanced details about the pool

The pool is composed of a `PoolMaster` that retrieves connections for `n` pools where `n` is the number of
servers the driver is connected to. Each pool is connected to a unique host.

To access the pool master, you can call the method `r.getPoolMaster()`.

The pool emits a few events:
- `draining`: when `drain` is called
- `queueing`: when a query is added/removed from the queue (queries waiting for a connection), the size of the queue is provided
- `size`: when the number of connections changes, the number of connections is provided
- `available-size`: when the number of available connections changes, the number of available connections is provided
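
A small sketch of how these events could be consumed is shown below. The helper `trackPoolStats` is hypothetical, and a plain `EventEmitter` stands in for the pool so the snippet runs without a database. With rethunk you would pass whichever object emits these events; the list above does not say whether that is the pool master or each pool returned by `getPools()`.

```js
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: records the latest value reported by each pool event.
function trackPoolStats(pool) {
  var stats = {queued: 0, size: 0, available: 0, draining: false};
  pool.on('draining', function() { stats.draining = true; });
  pool.on('queueing', function(n) { stats.queued = n; });
  pool.on('size', function(n) { stats.size = n; });
  pool.on('available-size', function(n) { stats.available = n; });
  return stats;
}

// Stand-in emitter simulating a pool with 5 connections,
// 3 of them idle, and 2 queries waiting in the queue.
var fakePool = new EventEmitter();
var stats = trackPoolStats(fakePool);
fakePool.emit('size', 5);
fakePool.emit('available-size', 3);
fakePool.emit('queueing', 2);
console.log(stats);
```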

You can get the number of connections (opened or being opened):
```js
r.getPoolMaster().getLength();
```

You can also get the number of available connections (idle connections, without
a query running on them).

```js
r.getPoolMaster().getAvailableLength();
```

You can also drain the pool as mentioned earlier with:

```js
r.getPoolMaster().drain();
```

You can access all the pools with:
```js
r.getPoolMaster().getPools();
```

The pool master emits the `healthy` event when its state changes. Its state is defined as:
- healthy when at least one pool is healthy: Queries can be immediately executed or will be queued.
- not healthy when no pool is healthy: Queries will immediately fail.

A pool is healthy if it has at least one available connection, or if it was just
created and opening a connection hasn't failed.

```js
r.getPoolMaster().on('healthy', function(healthy) {
  if (healthy === true) {
    console.log('We can run queries.');
  }
  else {
    console.log('No queries can be run.');
  }
});
```
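
The health rule described above can be written down as two small predicates. This is only a sketch of the rule: the property names `availableConnections`, `justCreated` and `openFailed` are hypothetical summaries of a pool's state and are not fields exposed by rethunk.

```js
// Hypothetical shape: each pool is summarized by the facts the rule uses.
function isPoolHealthy(pool) {
  var hasAvailable = pool.availableConnections > 0;
  var freshAndUntested = pool.justCreated === true && pool.openFailed !== true;
  return hasAvailable || freshAndUntested;
}

// The pool master is healthy as soon as one of its pools is healthy.
function isMasterHealthy(pools) {
  return pools.some(isPoolHealthy);
}

console.log(isMasterHealthy([
  {availableConnections: 0, justCreated: false, openFailed: true},
  {availableConnections: 2, justCreated: false, openFailed: false}
]));
```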


##### Note about connections

If you do not wish to use rethunk's connection pool, you can implement your own. The
connections created with rethunk emit a "release" event when they receive an
error, an atom, or the end (or full) sequence.

A connection can also emit a "timeout" event if the underlying connection times out.
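
As a starting point for a custom pool, the sketch below reuses a connection once it has emitted "release" and discards it on "timeout". `TinyPool` is a hypothetical name, and plain `EventEmitter` objects stand in for connections so the snippet runs on its own; a real pool would create rethunk connections in the factory and would also need limits and error handling.

```js
var EventEmitter = require('events').EventEmitter;

// Hypothetical minimal pool: hands out idle connections and takes them
// back when they emit "release".
function TinyPool(createConnection) {
  this.idle = [];
  this.createConnection = createConnection;
}

TinyPool.prototype.acquire = function() {
  if (this.idle.length > 0) return this.idle.pop();
  var self = this;
  var connection = this.createConnection();
  connection.on('release', function() {
    // Guard against the same connection being queued twice.
    if (self.idle.indexOf(connection) === -1) self.idle.push(connection);
  });
  connection.on('timeout', function() {
    // Forget timed-out connections so they are never handed out again.
    connection.removeAllListeners('release');
    var index = self.idle.indexOf(connection);
    if (index !== -1) self.idle.splice(index, 1);
  });
  return connection;
};

// Stand-in connection factory so the sketch runs without a database.
var pool = new TinyPool(function() { return new EventEmitter(); });
var first = pool.acquire();
first.emit('release');                 // query finished: connection is idle again
console.log(pool.acquire() === first); // the idle connection is reused
```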


#### Arrays by default, not cursors

rethunk automatically coerces cursors to arrays. If you need a raw cursor,
you can call the `run` command with the option `{cursor: true}` or import the
driver with `{cursor: true}`.

```js
r.expr([1, 2, 3]).run().then(function(result) {
  console.log(JSON.stringify(result)) // print [1, 2, 3]
})
```

```js
r.expr([1, 2, 3]).run({cursor: true}).then(function(cursor) {
  cursor.toArray().then(function(result) {
    console.log(JSON.stringify(result)) // print [1, 2, 3]
  });
})
```

__Note__: If a query returns a cursor, the connection will not be
released until the cursor has fetched everything or has been closed.


#### Readable streams

[Readable streams](http://nodejs.org/api/stream.html#stream_class_stream_readable) can be
synchronously returned with the `toStream([connection])` method.

```js
var fs = require('fs');
var file = fs.createWriteStream('file.txt');

var r = require('rethunk')();
r.table('users').toStream()
  .on('error', console.log)
  .pipe(file)
  .on('error', console.log)
  .on('finish', function() {
    r.getPoolMaster().drain();
  });
```

_Note:_ The stream will emit an error if you provide it with a single value (streams, arrays
and grouped data work fine).

_Note:_ `null` values are currently dropped from streams.

#### Writable and Transform streams

You can create a [Writable](http://nodejs.org/api/stream.html#stream_class_stream_writable)
or [Transform](http://nodejs.org/api/stream.html#stream_class_stream_transform) stream by
calling `toStream([connection, ]{writable: true})` or
`toStream([connection, ]{transform: true})` on a table.

By default, a transform stream will return the saved documents. You can return the primary
key of the new document by passing the option `format: 'primaryKey'`.

This is a convenient way to load a file into your database.

```js
var fs = require('fs');
var r = require('rethunk')();

var file = fs.createReadStream('users.json');
var table = r.table('users').toStream({writable: true});

// `transformer` is not defined here: it would be a Transform stream that
// splits the input per line and calls JSON.parse on each line
file.pipe(transformer)
    .pipe(table)
    .on('finish', function() {
        console.log('Done');
        r.getPoolMaster().drain();
    });
```
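
The `transformer` used above is left undefined. One possible implementation, assuming the input file contains one JSON document per line, is sketched below with Node's standard `stream` and `string_decoder` modules. The name `createLineParser` is made up for this example.

```js
var Transform = require('stream').Transform;
var StringDecoder = require('string_decoder').StringDecoder;

// Hypothetical helper: turns newline-delimited JSON text into parsed objects.
function createLineParser() {
  var decoder = new StringDecoder('utf8'); // keeps multi-byte characters intact
  var pending = '';                        // incomplete last line of the previous chunk

  function pushParsed(stream, lines) {
    lines.forEach(function(line) {
      if (line.trim() !== '') stream.push(JSON.parse(line)); // skip blank lines
    });
  }

  return new Transform({
    readableObjectMode: true, // bytes in, objects out
    transform: function(chunk, encoding, done) {
      var lines = (pending + decoder.write(chunk)).split('\n');
      pending = lines.pop(); // may be a partial line; keep it for the next chunk
      var error = null;
      try { pushParsed(this, lines); } catch (err) { error = err; }
      done(error);
    },
    flush: function(done) {
      // Parse whatever is left when the input ends without a final newline.
      var error = null;
      try { pushParsed(this, [pending + decoder.end()]); } catch (err) { error = err; }
      done(error);
    }
  });
}
```

With this helper, `file.pipe(createLineParser()).pipe(table)` would play the role of `file.pipe(transformer).pipe(table)`. A line that is not valid JSON makes the stream emit an `error` event.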


#### Optional `run` with `yield`

The `then` and `catch` methods are implemented on a `Term`, which is returned by any method
such as `filter`, `update` etc. They are shortcuts for `this.run().then(callback)` and
`this.run().catch(callback)`.

This means that you can `yield` any query without calling `run`.

```js
var assert = require('assert');
var bluebird = require('bluebird');
var r = require('rethunk')();

bluebird.coroutine(function*() {
  try {
    var result = yield r.table('users').get('orphee@gmail.com').update({name: 'Michel'});
    assert.equal(result.errors, 0);
  } catch(err) {
    console.log(err);
  }
})(); // coroutine() returns a function; call it to start the generator
```

_Note_: You have to start Node >= 0.11 with the `--harmony` flag.
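
The mechanism can be illustrated without the driver. In the sketch below, `FakeTerm` is a hypothetical stand-in (not rethunk's `Term` class) whose `then` and `catch` delegate to `run()`, which is enough for anything that resolves thenables to trigger the query. It assumes a Node version with a native `Promise`.

```js
// Hypothetical stand-in for a query term; not rethunk's actual Term class.
function FakeTerm(value) {
  this.value = value;
  this.runCount = 0;
}

FakeTerm.prototype.run = function() {
  this.runCount++;
  return Promise.resolve(this.value); // a real term would send the query here
};

// The shortcuts described above: then/catch simply delegate to run().
FakeTerm.prototype.then = function(onFulfilled, onRejected) {
  return this.run().then(onFulfilled, onRejected);
};
FakeTerm.prototype.catch = function(onRejected) {
  return this.run().catch(onRejected);
};

var term = new FakeTerm({errors: 0});

// Promise.resolve (like `yield` in a coroutine) treats the term as a thenable:
// it calls term.then, which in turn calls run().
Promise.resolve(term).then(function(result) {
  console.log(result.errors, term.runCount);
});
```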


#### Global default values

You can set the maximum nesting level and maximum array length on all your queries with:

```js
r.setNestingLevel(<number>)
```

```js
r.setArrayLimit(<number>)
```

#### Undefined values

rethunk will ignore the keys/values where the value is `undefined` instead
of throwing an error like the official driver.
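
The effect is comparable to removing such keys before the document is sent. The helper below is a hypothetical illustration of that behavior, not rethunk's internal code; it only looks inside plain objects and arrays.

```js
// Hypothetical helper illustrating the effect; not rethunk's internal code.
function dropUndefined(value) {
  if (Array.isArray(value)) return value.map(dropUndefined);
  // Only plain objects are copied; dates and other objects are left as they are.
  if (Object.prototype.toString.call(value) === '[object Object]') {
    var result = {};
    Object.keys(value).forEach(function(key) {
      if (value[key] !== undefined) result[key] = dropUndefined(value[key]);
    });
    return result;
  }
  return value;
}

var doc = {name: 'Michel', age: undefined, address: {city: 'Paris', zip: undefined}};
console.log(dropUndefined(doc));
```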


#### Better errors


##### Backtraces

If your query fails, the driver will return an error with a backtrace; your query
will be printed and the broken part will be highlighted.

Backtraces in rethunk are tested and properly formatted. Typically, long backtraces
are split on multiple lines and if the driver cannot serialize the query,
it will provide a better location of the error.


##### Arity errors

The server may return confusing error messages when the wrong number
of arguments is provided (See [rethinkdb/rethinkdb#2463](https://github.com/rethinkdb/rethinkdb/issues/2463) to track progress).
rethunk tries to make up for it by catching errors before sending
the query to the server if possible.
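
A client-side arity check of this kind could look like the sketch below. The function names, the stub `get` method, and the wording of the message are all illustrative assumptions; rethunk's actual checks and error messages may differ.

```js
// Hypothetical client-side check; names and messages are illustrative only.
function checkArity(method, args, min, max) {
  if (args.length < min || args.length > max) {
    var expected = (min === max) ? String(min) : 'between ' + min + ' and ' + max;
    throw new Error('`' + method + '` takes ' + expected +
      ' argument(s), ' + args.length + ' provided.');
  }
}

// Stub query method: validates its arguments before building anything.
function get(primaryKey) {
  checkArity('get', arguments, 1, 1); // fail before anything is sent to the server
  return {method: 'get', key: primaryKey};
}

try {
  get(); // missing primary key
} catch (err) {
  console.log(err.message);
}
```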


#### Performance

The tree representation of the query is built step by step and stored, which avoids
recomputing it if the query is re-run.
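
The idea can be sketched with a toy query builder in which every term keeps the tree built so far. The `Term` constructor, the `'ADD'` node format, and the `run` method below are hypothetical and much simpler than what the driver does.

```js
// Hypothetical mini query builder: every term stores the tree built so far.
function Term(tree) {
  this.tree = tree; // computed once, when the term is created
}

Term.prototype.add = function(value) {
  // Extends the stored tree by one node instead of rebuilding it from scratch.
  return new Term(['ADD', [this.tree, value]]);
};

Term.prototype.run = function() {
  // Running (or re-running) only reads this.tree; nothing is rebuilt here.
  return JSON.stringify(this.tree);
};

var base = new Term(1).add(2);
var query = base.add(3);
console.log(query.run());
console.log(query.tree[1][0] === base.tree); // the sub-tree is shared, not copied
```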

The code was partially optimized for v8, and is written in pure JavaScript, which avoids
errors like [issue #2839](https://github.com/rethinkdb/rethinkdb/issues/2839).


### Run tests

Update `test/config.js` if your RethinkDB instance doesn't run on the default parameters.

Make sure you run a version of Node that supports generators and run:
```
npm test
```

Longer tests for the pool:

```
mocha --harmony-generators long_test/discovery.js -t 50000
mocha --harmony-generators long_test/static.js -t 50000
```

### FAQ

- __Why rethunk?__

  rethunk was built as an experiment for promises and a connection pool. Its
  purpose was to test new features and improve the official driver. Today,
  rethunk still tries to make the developer experience as pleasant as possible -
  like with the recent support for Node.js streams.

  Some features like promises have been back ported to the official driver, some like
  the connection pool and streams are on their way.


- __Is it stable?__

  Yes. rethunk is used by quite a few people. The driver is also used by `thinky`,
  and has been and is still being tested in the wild.


- __Does it work with io.js?__

  All the tests pass with io.js so yes.


- __Is rethunk going to become the JavaScript official driver?__

  Not (yet?), maybe :)

  Completely replacing the driver requires some work:
  - Integrate the driver in the RethinkDB test suite.
  - Support HTTP connections.
  - Roll back some defaults, like the coercion of cursors to arrays.


- __Can I contribute?__

  Feel free to send a pull request. If you want to implement a new feature, please open
  an issue first, especially if it is not backward compatible.