mongodb/mongo-ruby-driver

View on GitHub
docs/reference/collations.txt

Summary

Maintainability
Test Coverage
**********
Collations
**********

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

Overview
========

.. versionadded:: 3.4

Collations are sets of rules for how to compare strings, typically in a
particular natural language.

For example, in Canadian French, the last accent in a given word
determines the sorting order.

Consider the following French words:

.. code-block:: none

   cote < coté < côte < côté

The sort order using the Canadian French collation would result in
the following:

.. code-block:: none

   cote < côte < coté < côté

If collation is unspecified, MongoDB uses the simple binary comparison for
strings.  As such, the sort order of the words would be:

.. code-block:: none

    cote < coté < côte < côté

Usage
=====

You can specify a default collation for collections and indexes when
they are created, or specify a collation for CRUD operations and
aggregations. For operations that support collation, MongoDB uses the
collection's default collation unless the operation specifies a
different collation.

Collation Parameters
~~~~~~~~~~~~~~~~~~~~

.. code-block:: ruby

   'collation' => {
      'locale' => <string>,
      'caseLevel' => <bool>,
      'caseFirst' => <string>,
      'strength' => <int>,
      'numericOrdering' => <bool>,
      'alternate' => <string>,
      'maxVariable' => <string>,
      'normalization' => <bool>,
      'backwards' => <bool>
   }

The only required parameter is ``locale``, which the server parses as
an `ICU format locale ID <http://userguide.icu-project.org/locale>`_.
For example, set ``locale`` to ``en_US`` to represent US English
or ``fr_CA`` to represent Canadian French.

For a complete description of the available parameters, see the
:manual:`MongoDB manual entry</reference/collation>`.

.. _collation-on-collection:

Assign a Default Collation to a Collection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following example creates a new collection
called ``contacts`` on the ``test`` database and assigns a default
collation with the ``fr_CA`` locale. Specifying a collation when you
create the collection ensures that all operations involving a query
that are run against the
``contacts`` collection use the ``fr_CA`` collation, unless the query
specifies another collation. Any indexes on the new collection also
inherit the default collation, unless the creation command specifies
another collation.

.. code-block:: ruby

  client = Mongo::Client.new([ "127.0.0.1:27017" ], :database => "test")
  client[:contacts, { "collation" => { "locale" => "fr_CA" } } ].create

.. _collation-on-index:

Assign a Collation to an Index
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To specify a collation for an index, use the ``collation``
option when you create the index.

The following example creates an index on the ``name``
field of the ``address_book`` collection, with the ``unique`` parameter
enabled and a default collation with ``locale`` set to ``en_US``.

.. code-block:: ruby

  client = Mongo::Client.new([ "127.0.0.1:27017" ], :database => "test")
  client[:address_book].indexes.create_one( { "first_name" => 1 },
                                          "unique" => true,
                                          "collation" => { "locale" => "en_US" }
                                         )

To use this index, make sure your queries also specify the same
collation. The following query uses the above index:

.. code-block:: ruby

  client[:address_book].find({"first_name" : "Adam" },
                              "collation" => { "locale" => "en_US" })

The following queries do **NOT** use the index. The first query uses no
collation, and the second uses a collation with a different ``strength``
value than the collation on the index.

.. code-block:: ruby

  client[:address_book].find({"first_name" : "Adam" })

  client[:address_book].find({"first_name" : "Adam" },
                              "collation" => { "locale" => "en_US", "strength" => 2 })

Operations that Support Collation
=================================

All reading, updating, and deleting methods support collation. Some
examples are listed below.

``find()`` and ``sort()``
~~~~~~~~~~~~~~~~~~~~~~~~~

Individual queries can specify a collation to use when matching
and sorting results. The following query and sort operation uses
a German collation with the ``locale`` parameter set to ``de``.

.. code-block:: ruby

  client = Mongo::Client.new([ "127.0.0.1:27017" ], :database => "test")
  docs = client[:contacts].find({ "city" => "New York" },
    { "collation" => { "locale" => "de" } }).sort( "name" => 1 )

``find_one_and_update()``
~~~~~~~~~~~~~~~~~~~~~~~~~

A collection called ``names`` contains the following documents:

.. code-block:: javascript

  { "_id" : 1, "first_name" : "Hans" }
  { "_id" : 2, "first_name" : "Gunter" }
  { "_id" : 3, "first_name" : "Günter" }
  { "_id" : 4, "first_name" : "Jürgen" }

The following ``find_one_and_update`` operation on the collection
does not specify a collation.

.. code-block:: ruby

   client = Mongo::Client.new([ "127.0.0.1:27017" ], :database => "test")
   doc = client[:names].find_one_and_update( {"first_name" => { "$lt" => "Gunter" }},
     { "$set" => { "verified" => true } })

Because ``Gunter`` is lexically first in the collection,
the above operation returns no results and updates no documents.

Consider the same ``find_one_and_update`` operation but with the
collation specified.  The locale is set to ``de@collation=phonebook``.

.. note::

   Some locales have a ``collation=phonebook`` option available for
   use with languages which sort proper nouns differently from other
   words. According to the ``de@collation=phonebook`` collation,
   characters with umlauts come before the same characters without
   umlauts.

.. code-block:: ruby

   client = Mongo::Client.new([ "127.0.0.1:27017" ], :database => "test")
   doc = client[:names].find_one_and_update( { "first_name" => { "$lt" => "Gunter" } },
     { "$set" => { "verified" => true } }, { "collation" => { "locale" => "de@collation=phonebook" },
     :return_document => :after } )

The operation returns the following updated document:

.. code-block:: javascript

   { "_id" => 3, "first_name" => "Günter", "verified" => true }

``find_one_and_delete()``
~~~~~~~~~~~~~~~~~~~~~~~~~

Set the ``numericOrdering`` collation parameter to ``true``
to compare numeric string by their numeric values.

The collection ``numbers`` contains the following documents:

.. code-block:: javascript

   { "_id" : 1, "a" : "16" }
   { "_id" : 2, "a" : "84" }
   { "_id" : 3, "a" : "179" }

The following example matches the first document in which field ``a``
has a numeric value greater than 100 and deletes it.

.. code-block:: ruby

   docs = numbers.find_one_and_delete({ "a" => { "$gt" => "100" } },
     { "collation" => { "locale" => "en", "numericOrdering" => true  } })

After the above operation, the following documents remain in the
collection:

.. code-block:: javascript

   { "_id" : 1, "a" : "16" }
   { "_id" : 2, "a" : "84" }

If you perform the same operation without collation, the server deletes
the first document it finds in which the lexical value of ``a`` is
greater than ``"100"``.

.. code-block:: ruby

   numbers = client[:numbers]
   docs = numbers.find_one_and_delete({ "a" => { "$gt" => "100" } })

After the above operation the document in which ``a`` was equal to
``"16"`` has been deleted, and the following documents remain in the
collection:

.. code-block:: javascript

   { "_id" : 2, "a" : "84" }
   { "_id" : 3, "a" : "179" }

``delete_many()``
~~~~~~~~~~~~~~~~~

You can use collations with all the various bulk operations which
exist in the Ruby driver.

The collection ``recipes`` contains the following documents:

.. code-block:: javascript

   { "_id" : 1, "dish" : "veggie empanadas", "cuisine" : "Spanish" }
   { "_id" : 2, "dish" : "beef bourgignon", "cuisine" : "French" }
   { "_id" : 3, "dish" : "chicken molé", "cuisine" : "Mexican" }
   { "_id" : 4, "dish" : "chicken paillard", "cuisine" : "french" }
   { "_id" : 5, "dish" : "pozole verde", "cuisine" : "Mexican" }

Setting the ``strength`` parameter of the collation document to ``1``
or ``2`` causes the server to disregard case in the query filter. The
following example uses a case-insensitive query filter
to delete all records in which the ``cuisine`` field matches
``French``.

.. code-block:: ruby

   client = Mongo::Client.new([ "127.0.0.1:27017" ], :database => "test")
   recipes = client[:recipes]
   docs = recipes.delete_many({ "cuisine" => "French" },
                                "collation" => { "locale" => "en_US", "strength" => 1 })

After the above operation runs, the documents with ``_id`` values of
``2`` and ``4`` are deleted from the collection.

Aggregation
~~~~~~~~~~~

To use collation with an aggregation operation, specify a collation in
the aggregation options.

The following aggregation example uses a collection called ``names``
and groups the ``first_name`` field together, counts the total
number of results in each group, and sorts the
results by German phonebook order.

.. code-block:: ruby

  aggregation = names.aggregate(
    [
      {
        "$group" => { "_id" => "$first_name", "name_count" => { "$sum" => 1 } }
      },
      {
          "$sort" => { "_id" => 1 }
      },

    ], { "collation" => { "locale" => "de@collation=phonebook" } }
  )

  aggregation.each do |doc|
    #=> Yields a BSON::Document.
  end