***********
Collections
***********

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

MongoDB stores documents in collections. If a collection does not
exist, MongoDB creates the collection when you first insert a
document in that collection.

You can also explicitly create a collection with various options,
such as the maximum size or the document validation rules.

Time Series Collections
```````````````````````

Time series collections were added in MongoDB 5.0. For details, see the
`time series collections documentation <https://www.mongodb.com/docs/manual/core/timeseries-collections/>`_
in the MongoDB manual.

Time series collections efficiently store sequences of measurements over a
period of time. Time series data is any data that is collected over time and is
uniquely identified by one or more unchanging parameters. The unchanging
parameters that identify your time series data are generally your data source's
metadata.

Creating a Time Series Collection
---------------------------------

To create a time series collection, you must explicitly create the
collection with the time series options:

.. code-block:: ruby

  opts = {
    time_series: {
      timeField: "timestamp",
      metaField: "metadata",
      granularity: "hours"
    },
    expire_after: 604800
  }

  db['weather', opts].create

When creating a time series collection, specify the following options:

.. list-table::
   :header-rows: 1
   :widths: 40 80

   * - Field
     - Description
   * - ``time_series[:timeField]``
     - Required. The name of the field which contains the date in each time series document.
   * - ``time_series[:metaField]``
     - Optional. The name of the field which contains metadata in each time series document. The metadata in the specified field should be data that is used to label a unique series of documents. The metadata should rarely, if ever, change.
   * - ``time_series[:granularity]``
     - Optional. Possible values are "seconds", "minutes", and "hours". By default, MongoDB sets the granularity to "seconds" for high-frequency ingestion.
   * - ``expire_after``
     - Optional. The number of seconds after which documents in the time series collection expire. The driver sends this value to the server as ``expireAfterSeconds``. MongoDB deletes expired documents automatically.

See the MongoDB `docs <https://www.mongodb.com/docs/manual/core/timeseries-collections/#create-a-time-series-collection>`_
for more information about time series collection options.
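
As a rough illustration of how these options fit together, the following
sketch builds the options hash in plain Ruby and rejects an unsupported
granularity before the hash reaches the driver. The helper name
``time_series_options`` and the ``GRANULARITIES`` constant are hypothetical
and not part of the driver; only the option keys come from the example above.

.. code-block:: ruby

  # Hypothetical helper (not part of the driver): builds the options hash
  # passed to db[name, opts].create for a time series collection.
  GRANULARITIES = %w[seconds minutes hours].freeze

  def time_series_options(time_field:, meta_field: nil, granularity: nil, expire_after: nil)
    if granularity && !GRANULARITIES.include?(granularity)
      raise ArgumentError, "granularity must be one of #{GRANULARITIES.join(', ')}"
    end

    # timeField is required; the other keys are added only when given.
    time_series = { timeField: time_field }
    time_series[:metaField] = meta_field if meta_field
    time_series[:granularity] = granularity if granularity

    opts = { time_series: time_series }
    opts[:expire_after] = expire_after if expire_after
    opts
  end

  one_week = 7 * 24 * 60 * 60 # 604800 seconds
  opts = time_series_options(
    time_field: "timestamp",
    meta_field: "metadata",
    granularity: "hours",
    expire_after: one_week
  )

The resulting ``opts`` hash has the same shape as the one in the example
above and could be passed to ``db['weather', opts].create``.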

Inserting into a Time Series Collection
---------------------------------------

Inserting into a time series collection is similar to inserting into a regular collection:

.. code-block:: ruby

  db['weather'].insert_many([
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 18, 0, 0, 0),
        temp: 12
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 18, 4, 0, 0),
        temp: 11
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 18, 8, 0, 0),
        temp: 11
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 18, 12, 0, 0),
        temp: 12
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 18, 16, 0, 0),
        temp: 16
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 18, 20, 0, 0),
        temp: 15
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 19, 0, 0, 0),
        temp: 13
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 19, 4, 0, 0),
        temp: 12
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 19, 8, 0, 0),
        temp: 11
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 19, 12, 0, 0),
        temp: 12
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 19, 16, 0, 0),
        temp: 17
    },
    {
        metadata: { sensorId: 5578, type: "temperature" },
        timestamp: Time.utc(2021, 5, 19, 20, 0, 0),
        temp: 12
    }
  ])
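
Because the readings above are evenly spaced, the same twelve documents can
also be generated rather than written out by hand. The following is a minimal
sketch in plain Ruby; the names ``temps``, ``start``, ``four_hours``, and
``documents`` are illustrative, and the final ``insert_many`` call is left as
a comment because it needs a running server.

.. code-block:: ruby

  # Temperatures from the example above, one reading every four hours.
  temps = [12, 11, 11, 12, 16, 15, 13, 12, 11, 12, 17, 12]
  start = Time.utc(2021, 5, 18, 0, 0, 0)
  four_hours = 4 * 60 * 60 # seconds

  # Build one document per reading; Time + Integer adds seconds.
  documents = temps.each_with_index.map do |temp, i|
    {
      metadata: { sensorId: 5578, type: "temperature" },
      timestamp: start + i * four_hours,
      temp: temp
    }
  end

  # db['weather'].insert_many(documents)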


Querying a Time Series Collection
---------------------------------

Querying a time series collection is also similar to querying a regular collection:

.. code-block:: ruby

  weather = db['weather']
  weather.find(timestamp: Time.utc(2021, 5, 18, 0, 0, 0)).first

The result of this query:

.. code-block:: ruby

  {
    "timestamp" => 2021-05-18 00:00:00 UTC,
    "metadata" => {
      "sensorId" => 5578,
      "type" => "temperature"
    },
    "temp" => 12,
    "_id" => BSON::ObjectId('624dfb87d1327a60aeb048d2')
  }


Using the Aggregation Pipeline on a Time Series Collection
----------------------------------------------------------

You can also run an aggregation pipeline on a time series collection:

.. code-block:: ruby

  weather.aggregate([
    {
      "$project": {
        date: {
          "$dateToParts": { date: "$timestamp" }
        },
        temp: 1
      }
    },
    {
      "$group": {
        _id: {
          date: {
            year: "$date.year",
            month: "$date.month",
            day: "$date.day"
          }
        },
        avgTmp: { "$avg": "$temp" }
      }
    }
  ]).to_a

The example aggregation pipeline groups all documents by the date of the
measurement and then returns the average of all temperature measurements
that day:

.. code-block:: ruby

  [{
    "_id" => {
      "date" => {
        "year" => 2021,
        "month" => 5,
        "day" => 18
      }
    },
    "avgTmp" => 12.833333333333334
  },
  {
    "_id" => {
      "date" => {
        "year" => 2021,
        "month" => 5,
        "day" => 19
      }
    },
    "avgTmp" => 12.833333333333334
  }]
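
To make the grouping step concrete, the sketch below reproduces the same
calculation in plain Ruby on in-memory hashes. It is only an illustration of
what the ``$project`` and ``$group`` stages compute, not how the server
evaluates the pipeline; the ``readings`` and ``daily_averages`` names are
illustrative.

.. code-block:: ruby

  # Rebuild the twelve readings from the insert example (every four hours).
  temps = [12, 11, 11, 12, 16, 15, 13, 12, 11, 12, 17, 12]
  start = Time.utc(2021, 5, 18)
  readings = temps.each_with_index.map do |temp, i|
    { timestamp: start + i * 4 * 60 * 60, temp: temp }
  end

  # Group by calendar day, then average the temperatures in each group.
  daily_averages = readings
    .group_by { |r| [r[:timestamp].year, r[:timestamp].month, r[:timestamp].day] }
    .map do |(year, month, day), group|
      {
        "_id" => { "date" => { "year" => year, "month" => month, "day" => day } },
        "avgTmp" => group.sum { |r| r[:temp] } / group.size.to_f
      }
    end

Each day has six readings that sum to 77, which is why both days produce the
same average of 77 / 6.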

See the MongoDB documentation on `time series collections <https://www.mongodb.com/docs/manual/core/timeseries-collections/#time-series-collections>`_
for more information.

Capped Collections
``````````````````

Capped collections have maximum size or document counts that prevent
them from growing beyond maximum thresholds. All capped collections must
specify a maximum size and may also specify a maximum document count.
MongoDB removes older documents if a collection reaches the maximum size
limit before it reaches the maximum document count.

To create a :manual:`capped collection</core/capped-collections/>`, use
the ``capped: true`` option along with a ``size`` in bytes.

.. code-block:: ruby

  client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'music')
  collection = client[:artists, capped: true, size: 10000]
  collection.create
  collection.capped? # => true
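
The eviction behavior can be pictured with a toy in-memory model. The
``CappedBuffer`` class below is hypothetical and only approximates the idea:
it measures document size as JSON length, which is not how the server
accounts for storage, and its ``max`` keyword is simply this sketch's name
for the document limit.

.. code-block:: ruby

  require 'json'

  # Toy model of capped-collection eviction; not the server's implementation.
  class CappedBuffer
    attr_reader :docs

    def initialize(size:, max: nil)
      @size = size # byte budget (approximated with JSON length)
      @max = max   # optional document-count limit
      @docs = []
    end

    # Append a document, then drop the oldest ones until within both limits.
    def insert(doc)
      @docs << doc
      @docs.shift while over_limit?
      self
    end

    private

    def over_limit?
      return false if @docs.empty?

      bytes = @docs.sum { |d| JSON.generate(d).bytesize }
      bytes > @size || (!@max.nil? && @docs.size > @max)
    end
  end

  buffer = CappedBuffer.new(size: 10_000, max: 3)
  5.times { |i| buffer.insert("n" => i) }
  buffer.docs.map { |d| d["n"] } # => [2, 3, 4]

In this model the oldest documents are removed first, whichever limit is hit.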

Convert an Existing Collection to Capped
````````````````````````````````````````

To convert an existing collection from non-capped to capped, use
the ``convertToCapped`` command.

.. code-block:: ruby

  client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'music')
  db = client.database
  db.command({ 'convertToCapped' => 'artists', 'size' => 10000 })


Document Validation
```````````````````

If you're using MongoDB version 3.2 or later, you can use
:manual:`document validation</core/document-validation/>`.
Collections with validations compare each inserted or updated
document against the criteria specified in the validator option.
Depending on the ``validationLevel`` and ``validationAction``, MongoDB
either returns a warning, or refuses to insert or update the document
if it fails to meet the specified criteria.

The following example creates a ``contacts`` collection with a validator
that requires inserted or updated documents to match at least one of the
following three conditions:

- the ``phone`` field is a string
- the ``email`` field ends with ``@mongodb.com``
- the ``status`` field is either ``Unknown`` or ``Incomplete``

.. code-block:: ruby

  client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'test')
  client[:contacts,
    {
      'validator' => {
        '$or' => [
          { 'phone' => { '$type' => "string" } },
          { 'email' => { '$regex' => /@mongodb\.com$/ } },
          { 'status' => { '$in' => [ "Unknown", "Incomplete" ] } }
        ]
      }
    }
  ].create
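
To see which documents such a validator would accept, the sketch below
mirrors the three conditions as a plain Ruby predicate. The method name
``passes_validator?`` is hypothetical, and server-side matching covers more
cases (array fields, for example), so treat this as an approximation rather
than the server's logic.

.. code-block:: ruby

  # Approximates the $or validator above: true if any one condition holds.
  def passes_validator?(doc)
    doc["phone"].is_a?(String) ||
      (doc["email"].is_a?(String) && doc["email"].match?(/@mongodb\.com$/)) ||
      ["Unknown", "Incomplete"].include?(doc["status"])
  end

  passes_validator?("phone" => "555-0100")               # => true
  passes_validator?("email" => "ada@mongodb.com")        # => true
  passes_validator?("status" => "Incomplete")            # => true
  passes_validator?("name" => "Ada", "phone" => 5550100) # => false

The last document fails because its ``phone`` value is an integer and it has
neither a matching ``email`` nor an accepted ``status``.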

Add Validation to an Existing Collection
````````````````````````````````````````

To add document validation criteria to an existing collection, use the
``collMod`` command. The example below adds a validator to the
``contacts`` collection that requires documents to contain an ``age``
field whose value is a number.

.. code-block:: ruby

  client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'test')
  db = client.database
  db.command({ 'collMod' => 'contacts',
               'validator' =>
                 { 'age' =>
                   { '$type' => "number" }
                 }
             })
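
The ``collMod`` command also accepts the ``validationLevel`` and
``validationAction`` fields mentioned earlier. As a sketch, the hash below
assembles such a command as a plain Ruby ``Hash`` so that it can be inspected
before it is sent. The values ``moderate`` and ``warn`` are chosen only for
illustration, and the ``coll_mod`` variable name is hypothetical.

.. code-block:: ruby

  # Command document for db.command. The command name must be the first
  # key; Ruby hashes preserve insertion order.
  coll_mod = {
    'collMod' => 'contacts',
    'validator' => { 'age' => { '$type' => 'number' } },
    # moderate: do not validate updates to documents that were already invalid
    'validationLevel' => 'moderate',
    # warn: log violations instead of rejecting the write
    'validationAction' => 'warn'
  }

  # db.command(coll_mod)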

Listing Collections
```````````````````

Use the ``collections`` or ``collection_names`` method on a database
object to list collections:

.. code-block:: ruby

  client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'music')
  database = client.database

  database.collections      # Returns an array of Collection objects.
  database.collection_names # Returns an array of collection names as strings.

Dropping Collections
````````````````````

To drop a collection, call ``drop`` on the collection object.

.. code-block:: ruby

  client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => 'music')
  artists = client[:artists]
  artists.drop