docs/modules/ROOT/pages/node_pattern.adoc
= Node Pattern
Node pattern is a DSL to help find specific nodes in the Abstract Syntax Tree
using a simple string.
It reminds the simplicity of regular expressions but used to find specific
nodes of Ruby code.
== History
The Node Pattern was introduced by https://github.com/alexdowad[Alex Dowad]
and solves a problem that RuboCop contributors were facing for a long time:
* Ability to declaratively define rules for node search, matching, and capture.
The code below belongs to https://www.rubydoc.info/gems/rubocop/RuboCop/Cop/Style/ArrayJoin[Style/ArrayJoin]
cop and it's in favor of `Array#join` over `Array#*`. Then it tries to find
code like `%w(one two three) * ", "` and suggest to use `#join` instead.
It can also be an array of integers, and the code doesn't check it. However,
it checks if the argument sent is a string.
[source,ruby]
----
def on_send(node)
receiver_node, method_name, *arg_nodes = *node
return unless receiver_node && receiver_node.array_type? &&
method_name == :* && arg_nodes.first.str_type?
add_offense(node, location: :selector)
end
----
This code was replaced in the cop defining a new matcher that does the same as the code above:
[source,ruby]
----
def_node_matcher :join_candidate?, '(send $array :* $str)'
----
And the `on_send` method is simplified to a method usage:
[source,ruby]
----
def on_send(node)
join_candidate?(node) { add_offense(node, location: :selector) }
end
----
== Ruby Abstract Syntax Tree (AST)
Parser translates Ruby source code to a tree structure represented in text.
A simple integer literal like `1` is represented by `(int 1)` in the AST.
A method call with two integer literals:
[source,ruby]
----
foo(1, 2)
----
is represented with:
[source]
----
(send nil :foo
(int 1)
(int 2)
)
----
Every node is represented with a sequence.
The first element is the node type.
Other elements are the children. They are optionally present and depend on the node type.
E.g.:
* `nil` is just `(nil)`
* `1` is `(int 1)`
* `[1]` is `(array (int 1))`
* `[1, 2]` is `(array (int 1) (int 2))`
* `foo` is `(send nil :foo)`
* `foo(1)` is `(send nil :foo (int 1))`
=== Getting the AST representation
==== From the command-line with `ruby-parse`
[source,sh]
----
$ ruby-parse --legacy -e 'foo(1)'
(send nil :foo
(int 1))
----
NOTE: Use the `--legacy` `ruby-parse` flag to get https://github.com/whitequark/parser/#usage[the same AST that RuboCop AST returns].
There are several differences, e.g. without `--legacy`, `foo(a: 1)` would return `kwargs`, and with `--legacy` it returns `hash`.
==== From REPL
[source,ruby]
----
> puts RuboCop::AST::ProcessedSource.new('foo(1)', RUBY_VERSION.to_f).ast.to_s
(send nil :foo
(int 1))
----
== Basic Node Pattern Structure
The simplest Node Pattern would match just the node type.
E.g. the `int` node pattern would match the `(int 1)` AST (literal `1` in Ruby code).
More sophisticated node patterns match more than one child.
== `(` and `)` to Match Elements
Several matchers surrounded by parentheses would match a node with elements each matching a corresponding matcher, order-dependently.
Ruby code with an array with two integer literals, `[1, 2]` represented in AST as `(array (int 1) (int 2))` could be matched with `(array int int)` node pattern.
For a literal integer, e.g. `1` Ruby code represented by `(int 1)` in AST:
* `int` node pattern will match exactly the node, looking only the node type
* `(int 1)` node pattern will match precisely the node
* `(int 2)` node pattern will not match
== `(` and `)` for Nested Matching
Ruby code with a method call with two integer literals as arguments, `foo(1, 2)` represented in AST as `(send nil :foo (int 1) (int 2))` could be matched with `(send nil? :foo int int)` node pattern.
To match just those method calls where the first argument is a literal `1`, use `(send nil? :foo (int 1) int)`.
Any child that is a node can be a target for nested matching.
== `_` for any single node
`_` will check if there's something present in the specific position, no matter the
value:
* `(int _)` will match any number
* `(int _ _)` will not match because `int` types have just one child that
contains the value.
== `+...+` for several subsequent nodes
Where `_` matches any single node, `+...+` matches any number of nodes.
Say for example you want to find instances of calls to the method `sum` with any
number of arguments, be it `sum(1, 2)` or `sum(1, 2, 3, n)`.
First, let's check how it looks like in the AST:
[source,sh]
----
$ ruby-parse -e 'sum(1, 2)'
(send nil :sum
(int 1)
(int 2))
----
Or with more children:
[source,sh]
----
$ ruby-parse -e 'sum(1, 2, 3, n)'
(send nil :sum
(int 1)
(int 2)
(int 3)
(send nil :n))
----
The following expression would only match a call with 2 arguments:
----
(send nil? :sum _ _)
----
Instead, the following expression will any number of arguments (and thus both examples above):
----
(send nil? :sum ...)
----
Note that `+...+` can be appear anywhere in a sequence, for example `+(send nil? :sum ... int)+`
would no longer match the second example, as the last argument is not an integer.
Nesting `+...+` is also supported; the only limitation is that `+...+` and
other "variable length" patterns can only appear once within a sequence.
For example `+(send ... :sum ...)+` is not supported.
== `*`, `+`, `?` for repetitions
Another way to handle a variable number of nodes is by using `*`, `+`, `?` to signify
a particular pattern should match any number of times, at least once and at most once respectively.
Following on the previous example, to find sums of integer literals, we could use:
----
(send nil? :sum int*)
----
This would match our first example `sum(1, 2)` but not the other `sum(1, 2, 3, n)`
This pattern would also match a call to `sum` without any argument, which might not be desirable.
Using `+` would insure that only sums with at least one argument would be matched.
----
(send nil? :sum int+)
----
The `?` can limit the match only 0 or 1 nodes.
The following example would match any sum of three integer literals
optionally followed by a method call:
----
(send nil? :sum int int int send ?)
----
Note that we have to put a space between `send` and `?`,
since `send?` would be considered as a predicate (described below).
== `<>` for match in any order
You may not care about the exact order of the nodes you want to match.
In this case you can put the nodes without brackets:
----
(send nil? :sum <(int 2) int>)
----
This will match our first example (`sum(1, 2)`).
It won't match our second example though, as it specifies that there must be
exactly two arguments to the method call `sum`.
You can add `+...+` before the closing bracket to allow for additional parameters:
----
(send nil? :sum <(int 2) int ...>)
----
This will match both our examples, but not `sum(1.0, 2)` or `sum(2)`,
since the first node in the brackets is found, but not the second (`int`).
== `{}` for "OR" (union)
Lets make it a bit more complex and introduce floats:
[source,sh]
----
$ ruby-parse -e '1'
(int 1)
$ ruby-parse -e '1.0'
(float 1.0)
----
* `({int | float} _)` - int or float types, no matter the value
Branches of the union can contain more than one term:
* `(array {int int | range})` - matches an array with two integers or a single range element
If all the branches have a single term, you can omit the `|`, so `{int | float}` can be
simplified to `{int float}`.
When checking for symbols or string, you can use regexp literals for a similar effect:
[source,sh]
----
(send _ /to_s|inspect/) # => matches calls to `to_s` or `inspect`
----
== `[]` for "AND"
Imagine you want to check if the number is `odd?` and also positive numbers:
`(int [odd? positive?])` - is an int and the value should be odd and positive.
NOTE: Refer to <<Predicate methods>> to see how `odd?` works.
== `!` for Negation
Node pattern `(send nil? :sum !int _)` would match a `sum` call where the first argument is *not* a literal integer.
E.g.:
* it will match `sum(2.0, 3)`, as the first argument is of a `float` type
* it will not match `sum(2, 3)`, as the first argument is of an `int` type
NOTE: Negation operator works with other node pattern syntax elements, `{}`, `[]`, `()`, `$`, but only with those that target a single element. E.g. `$!(int 1)`, `!{false nil}`, `![#positive? #even?]` will work, while `!{int int | sym}`, `!{int int | sym sym}`, and any use of `<>` won't.
== `$` for captures
You can capture elements or nodes along with your search, prefixing the expression
with `$`. For example, in a tuple like `(int 1)`, you can capture the value using `(int $_)`.
You can also capture multiple things like:
----
(${int float} $_)
----
The tuple can be entirely captured using the `$` before the open parens:
----
$({int float} _)
----
Or remove the parens and match directly from node head:
----
${int float}
----
All variable length patterns (`+...+`, `*`, `+`, `?`, `<>`) are captured as arrays.
The following pattern will have two captures, both arrays:
----
(send nil? $int+ (send $...))
----
== `^` for parent
One may use the `^` character to check against a parent.
For example, the following pattern would find any node with two children and
with a parent that is a hash:
----
(^hash _key $_value)
----
It is possible to use `^` somewhere else than the head of a sequence; in that
case it is relative to that child (i.e. the current node). One case also use
multiple `^` to go up multiple levels.
For example, the previous example is basically the same as:
----
(pair ^^hash $_value)
----
== ``` for descendants
The ``` character can be used to search a node and all its descendants.
For example if looking for a `return` statement anywhere within a method definition,
we can write:
----
(def _method_name _args `return)
----
This would match both of these methods `foo` and `bar`, even though
these `return` for `foo` and `bar` are not at the same level.
----
def foo # (def :foo
return 42 # (args)
end # (return
# (int 42)))
def bar # (def :bar
return 42 if foo # (args)
nil # (begin
end # (if
# (send nil :foo)
# (return
# (int 42)) nil)
# (nil)))
----
== Predicate methods
Words which end with a `?` are predicate methods, are called on the target
to see if it matches any Ruby method which the matched object supports can be
used.
Example:
* `int_type?` can be used herein replacement of `(int _)`.
And refactoring the expression to allow both int or float types:
* `{int_type? float_type?}` can be used herein replacement of `({int float} _)`
You can also use it at the node level, asking for each child:
* `(int odd?)` will match only with odd numbers, asking it to the current
number.
== `#` to call functions
Sometimes, we want to add extra logic. Let's imagine we're searching for
prime numbers, so we have a method to detect it:
[source,ruby]
----
def prime?(n)
if n <= 1
false
elsif n == 2
true
else
(2..n/2).none? { |i| n % i == 0 }
end
end
----
We can use the `#prime?` function directly in the expression:
----
(int #prime?)
----
You may call a method on a constant too. Let's say you define:
[source,ruby]
----
module Util
def self.palindrome?(str)
str == str.reverse
end
end
----
You can refer to it like this:
----
(str #Util.palindrome?)
----
== Arguments for predicate and function calls
Arguments can be passed to predicates and function calls, like literals, parameters:
[source,ruby]
----
def divisible_by?(value, divisor)
value % divisor == 0
end
----
Example patterns using this function:
----
(int #divisible_by?(42))
(send (int _value) :+ (int #divisible_by?(_value))
----
The arguments can be pattern themselves, in which case a matcher responding to `===` will be passed. This makes patterns composable:
```ruby
def_node_matcher :global_const?, '(const {nil? cbase} %1)'
def_node_matcher :class_creator, '(send #global_const?({:Class :Module}) :new ...)'
```
== Using node matcher macros
The RuboCop base includes two useful methods to use the node pattern with Ruby in a
simple way. You can use the macros to define methods. The basics are
https://www.rubydoc.info/gems/rubocop-ast/RuboCop/AST/NodePattern/Macros#def_node_matcher-instance_method[def_node_matcher]
and https://www.rubydoc.info/gems/rubocop-ast/RuboCop/AST/NodePattern/Macros#def_node_search-instance_method[def_node_search].
When you define a pattern, it creates a method that accepts a node and tries to match.
Lets create an example where we're trying to find the symbols `user` and
`current_user` in expressions like: `user: current_user` or
`current_user: User.first`, so the objective here is pick all keys:
[source,sh]
----
$ ruby-parse -e ':current_user'
(sym :current_user)
$ ruby-parse -e ':user'
(sym :user)
$ ruby-parse -e '{ user: current_user }'
(hash
(pair
(sym :user)
(send nil :current_user)))
----
Our minimal matcher can get it in the simple node `sym`:
[source,ruby]
----
def_node_matcher :user_symbol?, '(sym {:current_user :user})'
----
=== Composing complex expressions with multiple matchers
Now let's go deeply combining the previous expression and also match if the
current symbol is being called from an initialization method, like:
[source,sh]
----
$ ruby-parse --legacy -e 'Comment.new(user: current_user)'
(send
(const nil :Comment) :new
(hash
(pair
(sym :user)
(send nil :current_user))))
----
And we can also reuse this and check if it's a constructor:
[source,ruby]
----
def_node_matcher :initializing_with_user?, <<~PATTERN
(send _ :new (hash (pair #user_symbol?)))
PATTERN
----
== `%` for arguments
Arguments can be passed to matchers, either as external method arguments,
or to be used to compare elements. An example of method argument:
[source,ruby]
----
def multiple_of?(n, factor)
n % factor == 0
end
def_node_matcher :int_node_multiple?, '(int #multiple_of?(%1))'
# ...
int_node_multiple?(node, 10) # => true if node is an 'int' node with a multiple of 10
----
Arguments can be used to match nodes directly:
[source,ruby]
----
def_node_matcher :has_sensitive_data?, '(hash <(pair (_ %1) $_) ...>)'
# ...
has_sensitive_data?(node, :password) # => true if node is a hash with a key +:password+
# matching uses ===, so to match strings or symbols, 'pass' or 'password' one can:
has_sensitive_data?(node, /^pass(word)?$/i)
# one can also pass lambdas...
has_sensitive_data?(node, ->(key) { # return true or false depending on key })
----
NOTE: `Array#===` will never match a single node element (so don't pass arrays),
but `Set#===` is an alias to `Set#include?` (Ruby 2.5+ only), and so can be
very useful to match within many possible literals / Nodes.
== `%param_name` for named parameters
Arguments can be passed as named parameters. They will be matched using `===`
(see `%` above).
Contrary to positional arguments, defaults values can be passed to
`def_node_matcher` and `def_node_search`:
[source,ruby]
----
def_node_matcher :interesting_call?, '(send _ %method ...)',
method: Set[:transform_values, :transform_keys,
:transform_values!, :transform_keys!,
:to_h].freeze
# Usage:
interesting_call?(node) # use the default methods
interesting_call?(node, method: /^transform/) # match anything starting with 'transform'
----
Named parameters as arguments to custom methods are also supported.
== `CONST` or `%CONST` for constants
Constants can be included in patterns. They will be matched using `===`, so
+Regexp+ / +Set+ / +Proc+ can be used in addition to literals and +Nodes+:
[source,ruby]
----
SOME_CALLS = Set[:transform_values, :transform_keys,
:transform_values!, :transform_keys!,
:to_h].freeze
def_node_matcher :interesting_call?, '(send _ SOME_CALLS ...)'
----
Constants as arguments to custom methods are also supported.
== Comments
You may have comments in node patterns at the end of lines
by preceding them with `'# '`:
[source,ruby]
----
def_node_matcher :complex_stuff, <<~PATTERN
(send
{#global_const?(:Kernel) nil?} # check for explicit call like Kernel.p too
{:p :pp} # let's consider `pp` also
$... # capture all arguments
)
PATTERN
----
== `nil` or `nil?`
Take a special attention to nil behavior:
[source,sh]
----
$ ruby-parse -e 'nil'
(nil)
----
In this case, the `nil` implicit matches with expressions like: `nil`, `(nil)`, or `nil_type?`.
But, nil is also used to represent a call from `nothing` from a simple method call:
[source,sh]
----
$ ruby-parse -e 'method'
(send nil :method)
----
Then, for such case you can use the predicate `nil?`. And the code can be
matched with an expression like:
----
(send nil? :method)
----
== More resources
Curious about how it works?
Check more details in the
https://www.rubydoc.info/gems/rubocop-ast/RuboCop/AST/NodePattern[documentation]
or browse the https://github.com/rubocop/rubocop-ast/blob/master/lib/rubocop/ast/node_pattern.rb[source code]
directly. It's easy to read and hack on.
The https://github.com/rubocop/rubocop-ast/blob/master/spec/rubocop/ast/node_pattern_spec.rb[specs]
are also very useful to comprehend each feature.