.. _map-reduce:
**********
Map/Reduce
**********
.. default-domain:: mongodb
.. contents:: On this page
:local:
:backlinks: none
:depth: 2
:class: singlecol
Mongoid provides a DSL around MongoDB's map/reduce framework, for performing
custom map/reduce jobs or simple aggregations.
.. note::
The map-reduce operation is deprecated.
The :ref:`aggregation framework <aggregation-pipeline>` provides better
performance and usability than map-reduce operations, and should be
preferred for new development.
Execution
---------
To perform a map/reduce, call ``map_reduce`` on a model class or on a
criteria and pass it the map and reduce JavaScript functions as strings.
.. code-block:: ruby
map = %Q{
function() {
emit(this.name, { likes: this.likes });
}
}
reduce = %Q{
function(key, values) {
var result = { likes: 0 };
values.forEach(function(value) {
result.likes += value.likes;
});
return result;
}
}
Band.where(:likes.gt => 100).map_reduce(map, reduce).out(inline: 1)
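To make the data flow concrete, the following pure-Ruby sketch mimics what the JavaScript functions above compute, without touching MongoDB. The sample documents and variable names are hypothetical, and the criteria filter is omitted; it illustrates the emit, group and reduce steps rather than how the server implements them.

```ruby
# Hypothetical sample documents standing in for the Band collection.
bands = [
  { name: "Tool", likes: 120 },
  { name: "Tool", likes: 80 },
  { name: "Rush", likes: 150 }
]

# Map step: emit one [key, value] pair per document,
# like emit(this.name, { likes: this.likes }).
emitted = bands.map { |band| [band[:name], { likes: band[:likes] }] }

# Group the emitted values by key.
grouped = emitted.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

# Reduce step: fold each group of values into a single value per key.
results = grouped.map do |key, values|
  reduced = values.reduce({ likes: 0 }) do |result, value|
    { likes: result[:likes] + value[:likes] }
  end
  { "_id" => key, "value" => reduced }
end

p results
```

Each element of ``results`` has the same ``_id``/``value`` shape as the documents a map/reduce job returns.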
Like criteria, map/reduce calls are lazily evaluated: nothing is sent to the
database until you iterate over the results or call a method on the wrapper
that requires a database round trip.
.. code-block:: ruby
Band.map_reduce(map, reduce).out(replace: "mr-results").each do |document|
p document # { "_id" => "Tool", "value" => { "likes" => 200 }}
end
The only required setting for a map/reduce is where to output the results.
If you do not specify the output, an error is raised.
Valid options to ``#out`` are:
- ``inline: 1``: Don't store the output in a collection.
- ``replace: "name"``: Store in a collection with the
provided name, and overwrite any documents that exist in it.
- ``merge: "name"``: Store in a collection with the
provided name, and merge the results with the existing documents.
- ``reduce: "name"``: Store in a collection with the
provided name, and reduce all existing results in that collection.
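The difference between the three collection output modes can be sketched with plain Ruby hashes. This is an in-memory illustration only: ``existing``, ``fresh`` and ``reducer`` are hypothetical stand-ins for the output collection, the new job results and the reduce function, not part of the Mongoid API.

```ruby
# Hypothetical in-memory stand-in for an output collection, keyed by _id.
existing = { "Tool" => { likes: 200 }, "Rush" => { likes: 150 } }

# Hypothetical results of a new map/reduce run.
fresh = { "Tool" => { likes: 50 }, "Muse" => { likes: 300 } }

# The same logic as the JavaScript reduce function, expressed in Ruby.
reducer = ->(values) { { likes: values.sum { |value| value[:likes] } } }

# replace: the previous contents of the collection are discarded.
replaced = fresh.dup

# merge: new results overwrite documents with the same key; others are kept.
merged = existing.merge(fresh)

# reduce: documents sharing a key are combined by the reduce function.
reduced = existing.merge(fresh) { |_key, old, new| reducer.call([old, new]) }

p replaced
p merged
p reduced
```

In this sketch only the ``reduce`` mode accumulates the likes for "Tool" across runs; ``merge`` keeps the latest value and ``replace`` additionally drops "Rush".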
Raw Results
-----------
Results of Map/Reduce execution can be retrieved via the ``execute`` method
or its aliases ``raw`` and ``results``:
.. code-block:: ruby
mr = Band.where(:likes.gt => 100).map_reduce(map, reduce).out(inline: 1)
mr.execute
# => {"results"=>[{"_id"=>"Tool", "value"=>{"likes"=>200.0}}],
#     "timeMillis"=>14,
#     "counts"=>{"input"=>4, "emit"=>4, "reduce"=>1, "output"=>1},
#     "ok"=>1.0,
#     "$clusterTime"=>{"clusterTime"=>#<BSON::Timestamp:0x00005633c2c2ad20 @seconds=1590105400, @increment=1>, "signature"=>{"hash"=><BSON::Binary:0x12240 type=generic data=0x0000000000000000...>, "keyId"=>0}},
#     "operationTime"=>#<BSON::Timestamp:0x00005633c2c2aaf0 @seconds=1590105400, @increment=1>}
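As a minimal sketch of working with the raw results, the following snippet turns the ``results`` array into a name-to-likes lookup. The ``raw`` hash here is a hand-written, trimmed copy of the output shown above; in an application it would presumably come from a call to ``execute``. The ``likes_by_name`` variable is a hypothetical name used for illustration.

```ruby
# Hand-written hash shaped like the raw result above (trimmed for brevity).
raw = {
  "results" => [{ "_id" => "Tool", "value" => { "likes" => 200.0 } }],
  "timeMillis" => 14,
  "counts" => { "input" => 4, "emit" => 4, "reduce" => 1, "output" => 1 },
  "ok" => 1.0
}

# Fail early if the server did not report success.
raise "map/reduce failed" unless raw["ok"] == 1.0

# Build a { name => likes } lookup from the result documents.
likes_by_name = raw["results"].to_h do |doc|
  [doc["_id"], doc["value"]["likes"]]
end

p likes_by_name
```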
Statistics
----------
MongoDB server versions 4.2 and earlier provide Map/Reduce execution
statistics. As of MongoDB 4.4, Map/Reduce is implemented via the aggregation
pipeline and the statistics described in this section are not available.
The following methods are provided on the ``MapReduce`` object:
- ``counts``: Number of documents read, emitted, reduced and output through
the pipeline.
- ``input``, ``emitted``, ``reduced``, ``output``: individual count methods.
Note that ``emitted`` and ``reduced`` methods are named differently from
hash keys in ``counts``.
- ``time``: The time, in milliseconds, that the Map/Reduce pipeline took to execute.
The following code illustrates retrieving the statistics:
.. code-block:: ruby
mr = Band.where(:likes.gt => 100).map_reduce(map, reduce).out(inline: 1)
mr.counts
# => {"input"=>4, "emit"=>4, "reduce"=>1, "output"=>1}
mr.input
# => 4
mr.emitted
# => 4
mr.reduced
# => 1
mr.output
# => 1
mr.time
# => 14
.. note::
Each statistics method invocation re-executes the Map/Reduce pipeline,
because Mongoid does not store the results of execution. If you need
multiple statistics, consider calling the ``execute`` method once and
reading the statistics from the raw results.
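One way to follow this advice is sketched below: all statistics are read from a single raw result instead of calling each statistics method separately. The ``raw`` hash is a hand-written example shaped like the output of ``execute`` on MongoDB 4.2 and earlier, and the ``stats`` hash with its keys is a hypothetical structure chosen to mirror the method names above.

```ruby
# Hand-written hash shaped like a raw result from MongoDB 4.2 or earlier.
# In an application this would presumably be the return value of mr.execute.
raw = {
  "results" => [{ "_id" => "Tool", "value" => { "likes" => 200.0 } }],
  "timeMillis" => 14,
  "counts" => { "input" => 4, "emit" => 4, "reduce" => 1, "output" => 1 },
  "ok" => 1.0
}

# Default to an empty hash in case the server did not return counts.
counts = raw.fetch("counts", {})

# Note the key names: the hash uses "emit" and "reduce", while the
# corresponding statistics methods are named emitted and reduced.
stats = {
  input: counts["input"],
  emitted: counts["emit"],
  reduced: counts["reduce"],
  output: counts["output"],
  time: raw["timeMillis"]
}

p stats
```

Because every value comes from the same hash, the statistics are consistent with each other and the pipeline runs only once.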