.. _aggregation-pipeline:

********************
Aggregation Pipeline
********************

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol


Mongoid exposes `MongoDB's aggregation pipeline
<https://www.mongodb.com/docs/manual/core/aggregation-pipeline/>`_,
which processes documents through a sequence of stages and returns the
results. The aggregation pipeline is a superset of the deprecated
:ref:`map/reduce framework <map-reduce>` functionality.


Basic Usage
===========

.. _aggregation-pipeline-example-multiple-collections:

Querying Across Multiple Collections
````````````````````````````````````

The aggregation pipeline may be used for queries involving multiple
referenced associations at the same time:

.. code-block:: ruby

  class Band
    include Mongoid::Document
    has_many :tours
    has_many :awards
    field :name, type: String
  end

  class Tour
    include Mongoid::Document
    belongs_to :band
    field :year, type: Integer
  end

  class Award
    include Mongoid::Document
    belongs_to :band
    field :name, type: String
  end

To retrieve bands that have toured since 2000 and have at least one award,
one could do the following:

.. code-block:: ruby

  band_ids = Band.collection.aggregate([
    { '$lookup' => {
      from: 'tours',
      localField: '_id',
      foreignField: 'band_id',
      as: 'tours',
    } },
    { '$lookup' => {
      from: 'awards',
      localField: '_id',
      foreignField: 'band_id',
      as: 'awards',
    } },
    { '$match' => {
      'tours.year' => {'$gte' => 2000},
      'awards._id' => {'$exists' => true},
    } },
    {'$project' => {_id: 1}},
  ])
  # Each result is a raw document of the form {"_id" => ...}; extract the ids.
  bands = Band.find(band_ids.map { |doc| doc['_id'] })

Because the aggregation pipeline is implemented by the MongoDB Ruby driver
rather than by Mongoid, it returns raw ``BSON::Document`` objects, not
``Mongoid::Document`` model instances. The example above projects only
the ``_id`` field, which is then used to load the full models. An alternative
is to skip that projection and work with the raw fields directly, which avoids
passing the (potentially large) list of document ids to Mongoid in a second
query.
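
As a sketch of the raw-field alternative, the pipeline below keeps the band
``name`` instead of projecting only ``_id`` and reads it straight from the
returned documents. The methods ``band_names_pipeline`` and ``band_names`` are
hypothetical names introduced for illustration, and ``collection`` is assumed
to be a ``Mongo::Collection`` such as ``Band.collection``:

.. code-block:: ruby

  # Hypothetical helper: same lookups and match as the example above, but
  # the final stage keeps the band name rather than only the _id.
  def band_names_pipeline
    [
      { '$lookup' => {
        from: 'tours',
        localField: '_id',
        foreignField: 'band_id',
        as: 'tours',
      } },
      { '$lookup' => {
        from: 'awards',
        localField: '_id',
        foreignField: 'band_id',
        as: 'awards',
      } },
      { '$match' => {
        'tours.year' => { '$gte' => 2000 },
        'awards._id' => { '$exists' => true },
      } },
      { '$project' => { _id: 0, name: 1 } },
    ]
  end

  # Hypothetical helper. Assumes collection responds to #aggregate and
  # yields hash-like documents, as a Mongo::Collection does.
  def band_names(collection)
    collection.aggregate(band_names_pipeline).map { |doc| doc['name'] }
  end

With the models above, ``band_names(Band.collection)`` would return an array
of name strings without issuing a second query.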


.. _aggregation-pipeline-builder-dsl:

Builder DSL
===========

Mongoid provides limited support for constructing the aggregation pipeline
itself using a high-level DSL. The following aggregation pipeline operators
are supported:

- `$group <https://mongodb.com/docs/manual/reference/operator/aggregation/group/>`_
- `$project <https://mongodb.com/docs/manual/reference/operator/aggregation/project/>`_
- `$unwind <https://mongodb.com/docs/manual/reference/operator/aggregation/unwind/>`_

To construct a pipeline, call the corresponding aggregation pipeline methods
on a ``Criteria`` instance. Each call appends a stage to the ``pipeline``
attribute of the ``Criteria`` instance. To execute the pipeline, pass the
``pipeline`` attribute value to the ``Collection#aggregate`` method.

For example, given the following models:

.. code-block:: ruby

  class Tour
    include Mongoid::Document

    embeds_many :participants

    field :name, type: String
    field :states, type: Array
  end

  class Participant
    include Mongoid::Document

    embedded_in :tour

    field :name, type: String
  end

We can find out which states a participant visited:

.. code-block:: ruby

  criteria = Tour.where('participants.name' => 'Serenity').
    unwind(:states).
    group(_id: 'states', :states.add_to_set => '$states').
    project(_id: 0, states: 1)

  pp criteria.pipeline
  # => [{"$match"=>{"participants.name"=>"Serenity"}},
  #     {"$unwind"=>"$states"},
  #     {"$group"=>{"_id"=>"states", "states"=>{"$addToSet"=>"$states"}}},
  #     {"$project"=>{"_id"=>0, "states"=>1}}]

  Tour.collection.aggregate(criteria.pipeline).to_a
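
To make the effect of each stage concrete, the following plain-Ruby sketch
mimics the pipeline on in-memory hashes. It only illustrates what the stages
compute, not how MongoDB executes them; the method name ``states_visited`` and
the sample data are made up for this illustration:

.. code-block:: ruby

  # Hypothetical in-memory imitation of the pipeline above.
  def states_visited(tours, participant_name)
    # $match: keep tours that have a participant with the given name.
    matched = tours.select do |tour|
      tour['participants'].any? { |p| p['name'] == participant_name }
    end

    # $unwind: emit one document per element of the states array.
    unwound = matched.flat_map do |tour|
      tour['states'].map { |state| tour.merge('states' => state) }
    end

    # $group: the constant _id 'states' puts every document into a single
    # group; $addToSet collects the distinct states.
    grouped = unwound.group_by { 'states' }.map do |id, docs|
      { '_id' => id, 'states' => docs.map { |doc| doc['states'] }.uniq }
    end

    # $project: drop _id and keep only states.
    grouped.map { |doc| { 'states' => doc['states'] } }
  end

  # Made-up sample data standing in for documents in the tours collection.
  tours = [
    { 'participants' => [{ 'name' => 'Serenity' }], 'states' => %w[CA NY] },
    { 'participants' => [{ 'name' => 'Serenity' }, { 'name' => 'Mal' }],
      'states' => %w[NY TX] },
    { 'participants' => [{ 'name' => 'Mal' }], 'states' => %w[WA] },
  ]

  # One document whose states array contains CA, NY and TX.
  pp states_visited(tours, 'Serenity')

Note that MongoDB does not guarantee the order of elements produced by
``$addToSet``, so the order of the states in a real result may differ.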


group
`````

The ``group`` method adds a `$group aggregation pipeline stage
<https://mongodb.com/docs/manual/reference/operator/aggregation/group/>`_.

The field expressions support Mongoid symbol-operator syntax:

.. code-block:: ruby

    criteria = Tour.all.group(_id: 'states', :states.add_to_set => '$states')
    criteria.pipeline
    # => [{"$group"=>{"_id"=>"states", "states"=>{"$addToSet"=>"$states"}}}]

Alternatively, standard MongoDB aggregation pipeline syntax may be used:

.. code-block:: ruby

    criteria = Tour.all.group(_id: 'states', states: {'$addToSet' => '$states'})


project
```````

The ``project`` method adds a `$project aggregation pipeline stage
<https://mongodb.com/docs/manual/reference/operator/aggregation/project/>`_.

The argument should be a Hash specifying the projection:

.. code-block:: ruby

    criteria = Tour.all.project(_id: 0, states: 1)
    criteria.pipeline
    # => [{"$project"=>{"_id"=>0, "states"=>1}}]


.. _unwind-dsl:

unwind
``````

The ``unwind`` method adds an `$unwind aggregation pipeline stage
<https://mongodb.com/docs/manual/reference/operator/aggregation/unwind/>`_.

The argument can be a field name, given as a symbol or a string, or a Hash
or ``BSON::Document`` instance that is used as the ``$unwind`` stage document:

.. code-block:: ruby

    criteria = Tour.all.unwind(:states)
    criteria = Tour.all.unwind('states')
    criteria.pipeline
    # => [{"$unwind"=>"$states"}]

    criteria = Tour.all.unwind(path: '$states')
    criteria.pipeline
    # => [{"$unwind"=>{:path=>"$states"}}]
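
The Hash form is useful when the stage needs options beyond the path. The
sketch below builds such a Hash with a hypothetical helper, ``unwind_options``
(not part of Mongoid); ``preserveNullAndEmptyArrays`` and
``includeArrayIndex`` are options of MongoDB's ``$unwind`` stage:

.. code-block:: ruby

    # Hypothetical helper that builds a Hash argument for unwind.
    def unwind_options(field, preserve_empty: false, index_field: nil)
      # The path must be a field path, i.e. the field name prefixed with "$".
      stage = { path: "$#{field}" }
      # Keep documents whose array is missing, null or empty.
      stage[:preserveNullAndEmptyArrays] = true if preserve_empty
      # Name of the output field that receives the element's array index.
      stage[:includeArrayIndex] = index_field.to_s if index_field
      stage
    end

Assuming Mongoid passes the Hash through unchanged, as the output above
suggests, ``Tour.all.unwind(unwind_options(:states, preserve_empty: true))``
would add an ``$unwind`` stage that also keeps tours without any states.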