README.md
# ActiveRecord::HierarchicalQuery
[![Code Climate](https://codeclimate.com/github/take-five/activerecord-hierarchical_query.png)](https://codeclimate.com/github/take-five/activerecord-hierarchical_query)
[![Coverage Status](https://coveralls.io/repos/take-five/activerecord-hierarchical_query/badge.png)](https://coveralls.io/r/take-five/activerecord-hierarchical_query)
[![Gem Version](https://badge.fury.io/rb/activerecord-hierarchical_query.png)](http://badge.fury.io/rb/activerecord-hierarchical_query)
Create hierarchical queries using simple DSL, recursively
traverse trees using single SQL query.
If a table contains hierarchical data, then you can select rows
in hierarchical order using hierarchical query builder.
## Requirements
- ActiveRecord >= 5.0, < 8
- PostgreSQL >= 8.4
- Postgres Gem >= 0.21, < 1.5
Note that though PostgresSQL 8.4 and up should work, this library
is tested on PostgresSQL 15.1.
## In a nutshell
### Traverse trees
Let's say you've got an ActiveRecord model `Category` that related to itself:
```ruby
class Category < ActiveRecord::Base
belongs_to :parent, class_name: 'Category'
has_many :children, foreign_key: :parent_id, class_name: 'Category'
end
# Table definition
# create_table :categories do |t|
# t.integer :parent_id
# t.string :name
# end
```
### Traverse descendants
```ruby
Category.join_recursive do |query|
query.start_with(parent_id: nil)
.connect_by(id: :parent_id)
.order_siblings(:name)
end # returns ActiveRecord::Relation instance
```
### Traverse ancestors
```ruby
Category.join_recursive do |query|
query.start_with(id: 42)
.connect_by(parent_id: :id)
end
```
### Show breadcrumbs using single SQL query
```ruby
records = Category.join_recursive do |query|
query
# assume that deepest node has depth=0
.start_with(id: 42) { select('0 depth') }
# for each ancestor decrease depth by 1, do not apply
# following expression to first level of hierarchy
.select(query.prior[:depth] - 1, start_with: false)
.connect_by(parent_id: :id)
end.order('depth ASC')
# returns a regular ActiveRecord::Relation instance
# so methods like `pluck` all work as expected.
crumbs = records.pluck(:name).join(' / ')
```
## Installation
Add this line to your application's Gemfile:
```ruby
gem 'activerecord-hierarchical_query'
```
And then execute:
$ bundle
Or install it yourself as:
$ gem install activerecord-hierarchical_query
You'll then need to require the gem:
```ruby
require 'active_record/hierarchical_query'
```
Alternatively, the require can be placed in the `Gemfile`:
```ruby
gem 'activerecord-hierarchical_query', require: 'active_record/hierarchical_query'
```
## Usage
Let's say you've got an ActiveRecord model `Category` with
attributes `id`, `parent_id` and `name`. You can traverse nodes
recursively starting from root rows connected by `parent_id`
column ordered by `name`:
```ruby
Category.join_recursive do
start_with(parent_id: nil).
connect_by(id: :parent_id).
order_siblings(:name)
end
```
Hierarchical queries consist of these important clauses:
- **START WITH** clause
This clause specifies the root row(s) of the hierarchy.
- **CONNECT BY** clause
This clause specifies relationship between parent rows and child rows of the hierarchy.
- **ORDER SIBLINGS** clause
This clause specifies an order of rows in which they appear on each hierarchy level.
These terms are borrowed from [Oracle hierarchical queries syntax](http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries003.htm).
Hierarchical queries are processed as follows:
- First, root rows are selected -- those rows that satisfy `START WITH` condition in
order specified by `ORDER SIBLINGS` clause. In example above it's specified by
statements `query.start_with(parent_id: nil)` and `query.order_siblings(:name)`.
- Second, child rows for each root rows are selected. Each child row must satisfy
condition specified by `CONNECT BY` clause with respect to one of the root rows
(`query.connect_by(id: :parent_id)` in example above). Order of child rows is
also specified by `ORDER SIBLINGS` clause.
- Successive generations of child rows are selected with respect to `CONNECT BY` clause.
First the children of each row selected in step 2 selected, then the children of those
children and so on.
### START WITH
This clause is specified by `start_with` method:
```ruby
Category.join_recursive { start_with(parent_id: nil) }
Category.join_recursive { start_with { where(parent_id: nil) } }
Category.join_recursive { start_with { |root_rows| root_rows.where(parent_id: nil) } }
```
All of these statements are equivalent.
### CONNECT BY
This clause is necessary and specified by `connect_by` method:
```ruby
# join parent table ID columns and child table PARENT_ID column
Category.join_recursive { connect_by(id: :parent_id) }
# you can use block to build complex JOIN conditions
Category.join_recursive do
connect_by do |parent_table, child_table|
parent_table[:id].eq child_table[:parent_id]
end
end
```
### ORDER SIBLINGS
You can specify order in which rows on each hierarchy level should appear:
```ruby
Category.join_recursive { order_siblings(:name) }
# you can reverse order
Category.join_recursive { order_siblings(name: :desc) }
# arbitrary strings and Arel nodes are allowed also
Category.join_recursive { order_siblings('name ASC') }
Category.join_recursive { |query| query.order_siblings(query.table[:name].asc) }
```
### WHERE conditions
You can filter rows on each hierarchy level by applying `WHERE` conditions:
```ruby
Category.join_recursive do
connect_by(id: :parent_id).where('name LIKE ?', 'ruby %')
end
```
You can even refer to parent table, just don't forget to include
columns in `SELECT` clause!
```ruby
Category.join_recursive do |query|
query.connect_by(id: :parent_id)
.select(:name).
.where(query.prior[:name].matches('ruby %'))
end
```
Or, if Arel semantics does not fit your needs:
```ruby
Category.join_recursive do |query|
query.connect_by(id: :parent_id)
.where("#{query.prior.name}.name LIKE ?", 'ruby %')
end
```
### NOCYCLE
Recursive query will loop if hierarchy contains cycles (your
graph is not acyclic). `NOCYCLE` clause, which is turned off by
default, could prevent it.
Loop example:
```ruby
node_1 = Category.create
node_2 = Category.create(parent: node_1)
node_1.parent = node_2
node_1.save
```
`node_1` and `node_2` now link to each other, so the following
query will not terminate:
```ruby
Category.join_recursive do |query|
query.connect_by(id: :parent_id)
.start_with(id: node_1.id)
end
```
`#nocycle` method will prevent endless loop:
```ruby
Category.join_recursive do |query|
query.connect_by(id: :parent_id)
.start_with(id: node_1.id)
.nocycle
end
```
## DISTINCT
By default, the union term in the Common Table Expression uses a
`UNION ALL`. If you want to `SELECT DISTINCT` CTE values, add a
query option for `distinct`:
```ruby
Category.join_recursive do |query|
query.connect_by(id: :parent_id)
.start_with(id: node_1.id)
.distinct
end
```
If you want to join CTE terms by `UNION DISTINCT`, pass an option
to `join_recursive`:
```ruby
Category.join_recursive(union_type: :distinct) do |query|
query.connect_by(id: :parent_id)
.start_with(id: node_1.id)
end
```
## Generated SQL queries
Under the hood this extensions builds `INNER JOIN` to recursive subquery.
For example, this piece of code
```ruby
Category.join_recursive do |query|
query.start_with(parent_id: nil) { select('0 LEVEL') }
.connect_by(id: :parent_id)
.select(:depth)
.select(query.prior[:LEVEL] + 1, start_with: false)
.where(query.prior[:depth].lteq(5))
.order_siblings(:position)
.nocycle
end
```
Would generate following SQL:
```sql
SELECT "categories".*
FROM "categories" INNER JOIN (
WITH RECURSIVE "categories__recursive" AS (
SELECT depth,
0 LEVEL,
"categories"."id",
"categories"."parent_id",
ARRAY["categories"."position"] AS __order_column,
ARRAY["categories"."id"] AS __path
FROM "categories"
WHERE "categories"."parent_id" IS NULL
UNION ALL
SELECT "categories"."depth",
"categories__recursive"."LEVEL" + 1,
"categories"."id",
"categories"."parent_id",
"categories__recursive"."__order_column" || "categories"."position",
"categories__recursive"."__path" || "categories"."id"
FROM "categories" INNER JOIN
"categories__recursive" ON "categories__recursive"."id" = "categories"."parent_id"
WHERE ("categories__recursive"."depth" <= 5) AND
NOT ("categories"."id" = ANY("categories__recursive"."__path"))
)
SELECT "categories__recursive".* FROM "categories__recursive"
) AS "categories__recursive" ON "categories"."id" = "categories__recursive"."id"
ORDER BY "categories__recursive"."__order_column" ASC
```
If you want to use a `LEFT OUTER JOIN` instead of an `INNER JOIN`,
add a query option for `outer_join_hierarchical`. This
option allows the query to return non-hierarchical entries:
```ruby
.join_recursive(outer_join_hierarchical: true)
```
If, when joining the recursive view to the main table, you want
to change the foreign_key on the recursive view from the primary
key of the main table to another column:
```ruby
.join_recursive(foreign_key: another_column)
```
## Related resources
- [About hierarchical queries (Wikipedia)](http://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL)
- [Hierarchical queries in Oracle](http://docs.oracle.com/cd/B19306_01/server.102/b14200/queries003.htm)
- [Recursive queries in PostgreSQL](http://www.postgresql.org/docs/9.3/static/queries-with.html)
- [Using Recursive SQL with ActiveRecord trees](http://hashrocket.com/blog/posts/recursive-sql-in-activerecord)
## Contributing
Read through the short
[contributing guide](https://github.com/take-five/activerecord-hierarchical_query/blob/master/CONTRIBUTING.md).