{%
laika.versioned = true
laika.title = "`combinator`"
parsley.tabname = "Error Message Combinators (parsley.errors.combinator)"
laika.site.metadata.description = "This page describes how to generate informative errors."
%}

# Error Message Combinators
Aside from the failures generated by character consumption, `parsley` has
many combinators both for generating failures unconditionally and for
augmenting existing errors with more information. These are found within
the `parsley.errors.combinator` module.

@:callout(info)
*The Scaladoc for this page can be found at [`parsley.errors.combinator`](@:api(parsley.errors.combinator$)).*
@:@

## Failure Combinators
Normally, failures can be generated by `empty`, `satisfy`, `string`, and
`notFollowedBy`, as well as their derivatives. However, those do not capture the
full variety of "unexpected" parts of error messages. In the table below, `empty`
corresponds to `empty(0)` (these are both found in `parsley.Parsley`). The
*named* items are produced by `unexpected` combinators, and wider carets of
*empty* items can be obtained by passing wider values to `empty`.

| Caret |  *empty*   | *raw/eof* |      *named*       |
|-------|------------|-----------|--------------------|
| `0`   | `empty(0)` | n/a       | `unexpected(0, _)` |
| `1`   | `empty(1)` | `satisfy` | `unexpected(1, _)` |
| `n`   | `empty(n)` | `string`  | `unexpected(n, _)` |

### The `unexpected` Combinator
The `unexpected` combinator fails immediately, but produces a given name as
the unexpected component of the error message with a caret as wide as the
given integer. For instance:

```scala mdoc:to-string
import parsley.character.char
import parsley.errors.combinator.unexpected

unexpected(3, "foo").parse("abcd")
(char('a') | unexpected("not an a")).parse("baa")
```

There are a few things to note about the above examples:

* Just using `unexpected` alone does not introduce any other components, like
  expected items, to the error
* When the caret width is unspecified, it will adapt to whatever the
  caret would have been for the error message
* The *named* items resulting from the combinator *dominate* other kinds of
  item, so that `char('a')`'s natural *raw* unexpected item (the `b` at the
  start of the input) disappears

### The `fail` Combinator
In contrast to the `unexpected` combinator, which produces *vanilla* errors, the
`fail` combinator produces *specialised* errors, which suppress all other
components of an error in favour of some specific messages.

```scala mdoc:to-string
import parsley.character.string
import parsley.errors.combinator.fail

fail(2, "msg1", "msg2", "msg3").parse("abc")
(fail(1, "msg1") | fail(2, "msg2") | fail("msg3")).parse("abc")
(fail("msg") | string("abc")).parse("xyz")
(fail(1, "msg") | string("abc")).parse("xyz")
```

Notice that if a caret width is specified, it will override any other
carets from other combinators, like `string`. When no caret width is
specified, the caret adapts to the other errors. The `fail` combinator also
suppresses other error messages, and merges with other `fail`s as if all the
messages were generated by one `fail`.

## Error Enrichment
Other than the freestanding combinators, some combinators are enabled
by importing `parsley.errors.combinator.ErrorMethods`. Some of these
are involved with augmenting error messages with additional information.
These are discussed below.

@:callout(info)
None of the combinators in this section have any effect on `fail` or its
derivatives.
@:@

### The `label` Combinator
When combinators that read characters fail, they produce "expected" components
in error messages:

```scala mdoc:to-string
import parsley.character.{char, string, satisfy}

char('a').parse("b")
string("abc").parse("xyz")
satisfy(_.isDigit).parse("a")
```

Notice that the `satisfy` combinator cannot produce an expected item because
nothing is known about the function passed in. The other two produce *raw*
expected items. The `label` combinator can be used to replace these and generate
*named* items. This is employed by `parsley.character` for its more specific
parsers:

```scala mdoc:to-string
import parsley.errors.combinator.ErrorMethods

val digit = satisfy(_.isDigit).label("digit")
digit.parse("a")
```

The `label` combinator above has added the label `digit` to the parser. If
there was an existing label there, it would have been replaced.
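
The replacement behaviour can be sketched as follows, assuming a second, hypothetical name `number` layered on top of the labelled `digit`:

```scala
import parsley.character.satisfy
import parsley.errors.combinator.ErrorMethods

val digit = satisfy(_.isDigit).label("digit")
// the outer label replaces the inner one, so only "number" is expected
val number = digit.label("number")
val result = number.parse("a")
```

Only the outermost label that applies at the error's position should survive in the message.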

@:callout(error)
A `label` combinator cannot be provided with `""`. In other libraries, this may
represent hiding; in `parsley`, however, the `hide` combinator is distinct.
@:@

A `label` combinator, along with other combinators, only applies if the
error message properly lines up with the point the input was at when it
entered the combinator - otherwise, the label may be inaccurate. For example:

```scala mdoc:to-string
val twoDigits = (digit *> digit).label("two digits")
twoDigits.parse("a")
twoDigits.parse("1a")
```

### The `explain` Combinator
The `explain` combinator allows for the addition of further lines of error
message, providing more high-level reasons for the error or explanations about
a syntactic construct. It behaves similarly to `label` in that it will only
apply when the position of the error message matches the offset that the combinator entered at.

```scala mdoc:to-string
import parsley.errors.combinator.ErrorMethods

digit.explain("a digit is needed, for some reason").parse("a")
```

@:callout(error)
An `explain` combinator cannot be provided with `""`.
@:@
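
The position rule for `explain` can be sketched in the same way as for `label`. The parser `twoDigits` and its reason text below are illustrative only:

```scala
import parsley.character.satisfy
import parsley.errors.combinator.ErrorMethods

val digit = satisfy(_.isDigit).label("digit")
val twoDigits = (digit *> digit).explain("exactly two digits are required")

// the error occurs at the offset where `explain` was entered: the reason applies
val atEntry = twoDigits.parse("a")
// the error occurs one character deeper: the reason is not attached
val deeper = twoDigits.parse("1a")
```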

### The `hide` Combinator
Sometimes, a parser should not appear in an error message. A good example is
whitespace, which is *almost* never the solution to any parsing problem, and
would otherwise distract from the rest of the error content. The `hide` combinator
can be used to suppress a parser from appearing in the rest of a message:

```scala mdoc:to-string
import parsley.errors.combinator.ErrorMethods

(char('a') | digit.hide).parse("b")
```
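
For the whitespace case mentioned above, a minimal sketch might look like the following. It assumes `parsley.character.whitespaces` as the whitespace parser, and the parser `p` is purely illustrative:

```scala
import parsley.character.{char, whitespaces}
import parsley.errors.combinator.ErrorMethods

// optional whitespace between the two characters, kept out of error messages
val p = char('a') *> whitespaces.hide *> char('b')
val result = p.parse("a c")
```

With the whitespace hidden, the failure at `c` should mention only what is useful to the user, rather than also suggesting more whitespace.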

## Error Adjustment Combinators
The previous combinators on this page have been geared towards adding richer
information to parse errors. The combinators in this section are instead used to
adjust the existing information, mostly relating to position, to ensure the
error remains specific.

### The `amend` Combinator
The `amend` combinator can adjust the position of an error message so that it
occurs at an earlier position. This means that it can be affected by other
combinators like `label` and `explain`. This is a precision tool, designed
for fine-tuning error messages.

```scala mdoc:to-string
import parsley.errors.combinator.amend

amend(digit *> char('a')).parse("9b")
```

Notice that the above error makes no sense. This is why `amend` is a precision
tool: it should ideally be used in conjunction with other combinators. For instance:

```scala mdoc:silent
import parsley.syntax.character.charLift
import parsley.combinator.choice
import parsley.character.{noneOf, stringOfMany}

val escapeChar = choice('n'.as('\n'), 't'.as('\t'), '\"', '\\')
val strLetter =
    noneOf('\"', '\\').label("string char") | ('\\' ~> escapeChar).label("escape char")
val strLit = '\"' ~> stringOfMany(strLetter) <~ '\"'
```
```scala mdoc:to-string
strLit.parse("\"\\b\"")
```

In the above error, it is not *entirely* clear why the presented characters
are expected. Perhaps it would be better to highlight a correct escape
character instead? The `amend` combinator can be used in this case to pull
the error back and rectify it:

```scala mdoc:silent:nest
val strLetter = noneOf('\"', '\\').label("string char") |
                amend('\\' ~> escapeChar).label("escape char")
```
```scala mdoc:invisible
val strLit = '\"' ~> stringOfMany(strLetter) <~ '\"'
```
```scala mdoc:to-string
strLit.parse("\"\\b\"")
```

While the `amend` has pulled the error back, and thanks to the `label` the
error is still sensible, it could be improved by widening the caret and
providing an explanation:

```scala mdoc:silent:nest
import parsley.Parsley.empty
val escapeChar = choice('n'.as('\n'), 't'.as('\t'), '\"', '\\') | empty(2)
val strLetter = noneOf('\"', '\\').label("string char") |
                amend('\\' ~> escapeChar)
                  .label("escape char")
                  .explain("escape characters are \\n, \\t, \\\", or \\\\")
```
```scala mdoc:invisible
val strLit = '\"' ~> stringOfMany(strLetter) <~ '\"'
```
```scala mdoc:to-string
strLit.parse("\"\\b\"")
```

Note that an `unexpected` could also have been used instead of `empty` to good effect.
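
As a sketch of that alternative, the variant below swaps `empty(2)` for an `unexpected` of the same width. The item name `"invalid escape sequence"` is an assumption chosen for illustration:

```scala
import parsley.character.{noneOf, stringOfMany}
import parsley.combinator.choice
import parsley.errors.combinator.{amend, unexpected, ErrorMethods}
import parsley.syntax.character.charLift

// a named unexpected item, two characters wide, replaces the empty one
val escapeChar = choice('n'.as('\n'), 't'.as('\t'), '\"', '\\') |
                 unexpected(2, "invalid escape sequence")
val strLetter = noneOf('\"', '\\').label("string char") |
                amend('\\' ~> escapeChar).label("escape char")
val strLit = '\"' ~> stringOfMany(strLetter) <~ '\"'

val result = strLit.parse("\"\\b\"")
```

Since *named* items dominate other kinds of unexpected item, the error should now describe the problem itself.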

### The `entrench` and `dislodge` Combinators
The `amend` combinator will indiscriminately adjust error messages
so that they occur earlier. However, sometimes only errors from some
parts of a parser should be repositioned. The `entrench` combinator
protects errors from within its scope from being amended, and
`dislodge` undoes that protection.

This can be useful if you want an error to be able to *dominate* another one, and then be amended
afterwards, without affecting the original error. This normally has the following pattern:

```scala
val p = amendThenDislodge(1) {
    entrench(q) | r
}
```

In this example, `r` is expected to produce errors deeper than `q`'s. Once `r`'s error has dominated
and discarded `q`'s message, it should be reset to an earlier point. On the other hand, `q` is protected
from the initial amendment, but is then free to be amended again after the `dislodge` has removed the protection.
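
The protection that `entrench` offers can also be seen in isolation. The following is a minimal sketch with two hypothetical parsers that differ only in whether the failing part is entrenched:

```scala
import parsley.character.{char, digit}
import parsley.errors.combinator.{amend, entrench}

// without protection, the failure of `digit` is pulled back to where `amend` began
val unprotected = amend(char('a') *> char('b') *> digit)
// the entrenched part keeps its own position, despite the enclosing `amend`
val guarded = amend(char('a') *> entrench(char('b') *> digit))

val moved = unprotected.parse("abx") // reported at column 1
val kept = guarded.parse("abx")      // reported at column 3
```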

### The `markAsToken` Combinator
The `markAsToken` combinator will assign the "lexical" property to any error messages that happen
within its scope at a *deeper* position than the one the combinator began at. This property is passed
on to the `unexpectedToken` method of the `ErrorBuilder`: more about this in [lexical extraction][Token Extraction in `ErrorBuilder`].
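
The scoping rule can be sketched with a hypothetical token parser. How the property changes the rendered message depends on the `ErrorBuilder` in use, so no output is shown here:

```scala
import parsley.character.{digit, string}
import parsley.errors.combinator.markAsToken

// hypothetical token: "0x" followed by a single digit
val token = markAsToken(string("0x") *> digit)

// fails after "0x", deeper than where the combinator began: marked as lexical
val deeper = token.parse("0xg")
// fails at the position where the combinator began: not marked
val atEntry = token.parse("1")
```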