.. _map-reduce:

**********
Map/Reduce
**********

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol


Mongoid provides a DSL around MongoDB's map/reduce framework, for performing
custom map/reduce jobs or simple aggregations.

.. note::

  The map-reduce operation is deprecated.
  The :ref:`aggregation framework <aggregation-pipeline>` provides better
  performance and usability than map-reduce operations, and should be
  preferred for new development.

Execution
---------

To perform a map/reduce, call ``map_reduce`` on a model class or a criteria
and pass it the map and reduce JavaScript functions as strings.

.. code-block:: ruby

  map = %Q{
    function() {
      emit(this.name, { likes: this.likes });
    }
  }

  reduce = %Q{
    function(key, values) {
      var result = { likes: 0 };
      values.forEach(function(value) {
        result.likes += value.likes;
      });
      return result;
    }
  }

  Band.where(:likes.gt => 100).map_reduce(map, reduce).out(inline: 1)
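
To make the data flow of the map and reduce functions above easier to follow,
here is a rough pure-Ruby sketch of the same computation. It does not use
Mongoid or MongoDB; the ``bands`` array is hypothetical sample data standing in
for ``Band`` documents, and the grouping step only approximates what the
server does.

.. code-block:: ruby

  # Hypothetical sample data standing in for Band documents.
  bands = [
    { name: "Tool", likes: 150 },
    { name: "Tool", likes: 120 },
    { name: "Placebo", likes: 110 },
    { name: "Rush", likes: 50 },
  ]

  # Equivalent of the criteria: Band.where(:likes.gt => 100)
  selected = bands.select { |band| band[:likes] > 100 }

  # Map phase: one [key, value] pair per document, like emit() does.
  emitted = selected.map { |band| [band[:name], { likes: band[:likes] }] }

  # Group the emitted pairs by key, then fold each group into one value,
  # mirroring the summing logic of the reduce function.
  results = emitted.group_by(&:first).map do |key, pairs|
    total = pairs.reduce({ likes: 0 }) do |result, (_key, value)|
      { likes: result[:likes] + value[:likes] }
    end
    { "_id" => key, "value" => total }
  end

With this sample data, ``results`` holds one entry per band name: 270 likes
for "Tool" and 110 for "Placebo". "Rush" is excluded by the filter.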

Like criteria, map/reduce calls are lazily evaluated: nothing hits the
database until you iterate over the results or call a method on the wrapper
that requires a database query.

.. code-block:: ruby

  Band.map_reduce(map, reduce).out(replace: "mr-results").each do |document|
    p document # { "_id" => "Tool", "value" => { "likes" => 200 }}
  end
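
The idea of lazy evaluation can be sketched with a toy class. ``LazyJob`` is a
hypothetical stand-in written for illustration only; it is not Mongoid's
implementation, and the block passed to it merely simulates a database call.

.. code-block:: ruby

  # Toy stand-in for a lazily evaluated job; not Mongoid's implementation.
  class LazyJob
    include Enumerable

    attr_reader :executions

    def initialize(&work)
      @work = work      # deferred work, simulating the database call
      @executions = 0   # number of times the work has actually run
    end

    # Runs the deferred work; nothing happens before this is called.
    def execute
      @executions += 1
      @work.call
    end

    # Iterating forces execution.
    def each(&block)
      execute.each(&block)
    end
  end

  job = LazyJob.new { [{ "_id" => "Tool", "value" => { "likes" => 200 } }] }
  before = job.executions                        # 0: building the job ran nothing
  ids = job.map { |document| document["_id"] }   # iteration triggers the work
  after = job.executions                         # 1: one execution so far

In this sketch, every further iteration runs the work again, since the toy
class stores no results.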

The only required option for a map/reduce is where to output the results.
If you do not provide it, an error is raised. Valid options to ``#out`` are:

- ``inline: 1``: Don't store the output in a collection.
- ``replace: "name"``: Store in a collection with the
  provided name, and overwrite any documents that exist in it.
- ``merge: "name"``: Store in a collection with the
  provided name, and merge the results with the existing documents.
- ``reduce: "name"``: Store in a collection with the
  provided name, and apply the reduce function to combine new results with
  existing documents that have the same key.
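
The difference between ``replace``, ``merge`` and ``reduce`` can be sketched
in plain Ruby by modelling the output collection as a hash keyed by ``_id``.
This is only an approximation for illustration: the ``existing`` and ``fresh``
hashes and the ``sum_likes`` lambda are hypothetical, and no database is
involved.

.. code-block:: ruby

  # Hypothetical prior contents of the output collection, keyed by _id.
  existing = { "Tool" => { likes: 200 }, "Rush" => { likes: 90 } }

  # Hypothetical results of a new map/reduce run.
  fresh = { "Tool" => { likes: 50 }, "Placebo" => { likes: 120 } }

  # Same summing logic as the reduce function shown earlier.
  sum_likes = ->(values) { { likes: values.sum { |value| value[:likes] } } }

  # replace: the collection ends up holding only the new results.
  replaced = fresh.dup

  # merge: new results overwrite existing documents with the same key;
  # other existing documents are kept.
  merged = existing.merge(fresh)

  # reduce: documents with the same key are combined by the reduce logic.
  reduced = existing.merge(fresh) do |_key, old_value, new_value|
    sum_likes.call([old_value, new_value])
  end

In this sketch "Tool" ends up with 50 likes under ``replace`` and ``merge``,
but 250 under ``reduce``; "Rush" survives only under ``merge`` and ``reduce``.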

Raw Results
-----------

Results of Map/Reduce execution can be retrieved via the ``execute`` method
or its aliases ``raw`` and ``results``:

.. code-block:: ruby

  mr = Band.where(:likes.gt => 100).map_reduce(map, reduce).out(inline: 1)

  mr.execute
  # => {"results"=>[{"_id"=>"Tool", "value"=>{"likes"=>200.0}}],
  #     "timeMillis"=>14,
  #     "counts"=>{"input"=>4, "emit"=>4, "reduce"=>1, "output"=>1},
  #     "ok"=>1.0,
  #     "$clusterTime"=>{"clusterTime"=>#<BSON::Timestamp:0x00005633c2c2ad20 @seconds=1590105400, @increment=1>, "signature"=>{"hash"=><BSON::Binary:0x12240 type=generic data=0x0000000000000000...>, "keyId"=>0}},
  #     "operationTime"=>#<BSON::Timestamp:0x00005633c2c2aaf0 @seconds=1590105400, @increment=1>}


Statistics
----------

MongoDB server versions 4.2 and earlier provide Map/Reduce execution
statistics. As of MongoDB 4.4, Map/Reduce is implemented via the aggregation
pipeline and the statistics described in this section are not available.

The following methods are provided on the ``MapReduce`` object:

- ``counts``: Number of documents read, emitted, reduced and output through
  the pipeline.
- ``input``, ``emitted``, ``reduced``, ``output``: individual count methods.
  Note that ``emitted`` and ``reduced`` methods are named differently from
  hash keys in ``counts``.
- ``time``: The time, in milliseconds, that the Map/Reduce pipeline took to
  execute.

The following code illustrates retrieving the statistics:

.. code-block:: ruby

  mr = Band.where(:likes.gt => 100).map_reduce(map, reduce).out(inline: 1)

  mr.counts
  # => {"input"=>4, "emit"=>4, "reduce"=>1, "output"=>1}

  mr.input
  # => 4

  mr.emitted
  # => 4

  mr.reduced
  # => 1

  mr.output
  # => 1

  mr.time
  # => 14

.. note::

  Mongoid does not store the results of execution, so each statistics method
  invocation re-executes the Map/Reduce pipeline. If you need multiple
  statistics, consider calling ``execute`` once and reading the statistics
  from the raw results.
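
As a minimal sketch of that approach, the statistics can be read from the raw
results hash with ordinary hash lookups. The ``raw`` hash below is a trimmed
copy of the sample output shown under Raw Results, used here so the snippet
runs without a database; in an application it would presumably come from a
single ``mr.execute`` call.

.. code-block:: ruby

  # Trimmed copy of the sample raw results shown earlier. In an application
  # this would be the return value of one `mr.execute` call (assumption).
  raw = {
    "results" => [{ "_id" => "Tool", "value" => { "likes" => 200.0 } }],
    "timeMillis" => 14,
    "counts" => { "input" => 4, "emit" => 4, "reduce" => 1, "output" => 1 },
    "ok" => 1.0,
  }

  counts  = raw.fetch("counts")
  input   = counts["input"]
  emitted = counts["emit"]     # the hash key is "emit", not "emitted"
  reduced = counts["reduce"]   # the hash key is "reduce", not "reduced"
  output  = counts["output"]
  time    = raw["timeMillis"]  # execution time in milliseconds

Reading all values from one hash this way avoids running the pipeline once
per statistic.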