knipknap/Gelatin

View on GitHub
docs/syntax.rst

Summary

Maintainability
Test Coverage
Gelatin Syntax
==============

The following functions and types may be used within a Gelatin syntax.

Types
-----

STRING
^^^^^^

A string is any series of characters, delimited by the :code:`'` character.
Escaping is done using the backslash character. Examples::

    'test me'
    'test \'escaped\' strings'

VARNAME
^^^^^^^

VARNAMEs are variable names. They may contain the following set of
characters::

    [a-z0-9_]

NODE
^^^^

The output that is generated by Gelatin is represented by a tree
consisting of nodes. The NODE type is used to describe a single node in
a tree. It is a URL notated string consisting of the node name,
optionally followed by attributes. Examples for NODE include:

::

    .
    element
    element?attribute1="foo"
    element?attribute1="foo"&attribute2="foo"

PATH
^^^^

A PATH addresses a node in the tree. Addressing is relative to the
currently selected node. A PATH is a string with the following syntax::

    NODE[/NODE[/NODE]...]'

Examples::

    .
    ./child
    parent/element?attribute="foo"
    parent/child1?name="foo"/child?attribute="foobar"

REGEX
^^^^^

This type describes a Python regular expression. The expression MUST NOT
extract any subgroups. In other words, when using bracket expressions,
always use ``(?:)``. Example::

    /^(test|foo|bar)$/         # invalid!
    /^(?:test|foo|bar)$/       # valid

If you are trying to extract a substring, use a match statement with
multiple fields instead.

Statements
----------

define VARNAME STRING|REGEX|VARNAME
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`define` statements assign a value to a variable. Examples::

    define my_test /(?:foo|bar)/
    define my_test2 'foobar'
    define my_test3 my_test2

match STRING|REGEX|VARNAME ...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Match statements are lists of tokens that are applied against the
current input document. They parse the input stream by matching at the
current position. On a match, the matching string is consumed from the
input such that the next match statement may be applied. In other words,
the current position in the document is advanced only when a match is
found.

A match statement must be followed by an indented block. In this block,
each matching token may be accessed using the `$X` variables, where `X` is
the number of the match, starting with `$0`.

Examples::

    define digit /[0-9]/
    define number /[0-9]+/

    grammar input:
        match 'foobar':
            do.say('Match was: $0!')
        match 'foo' 'bar' /[\r\n]/:
            do.say('Match was: $0!')
        match 'foobar' digit /\s+/ number /[\r\n]/:
            do.say('Matches: $1 and $3')

You may also use multiple matches resulting in a logical OR::

    match 'foo' '[0-9]' /[\r\n]/
        | 'bar' /[a-z]/ /[\r\n]/
        | 'foobar' /[A-Z]/ /[\r\n]/:
        do.say('Match was: $1!')

imatch STRING|REGEX|VARNAME ...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`imatch` statements are like `match` statements, except that matching is
case-insensitive.

when STRING|REGEX|VARNAME ...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

`when` statements are like `match` statements, with the difference
that upon a match, the string is not consumed from the input stream.
In other words, the current position in the document is not advanced,
even when a match is found.
`when` statements are generally used in places where you want to "bail
out" of a grammar without consuming the token.

Example::

    grammar user:
        match 'Name:' /\s+/ /\S+/ /\n/:
            do.say('Name was: $2!')
        when 'User:':
            do.return()

    grammar input:
        match 'User:' /\s+/ /\S+/ /\n/:
            out.enter('user/name', '$2')
            user()

skip STRING|REGEX|VARNAME
^^^^^^^^^^^^^^^^^^^^^^^^^

`skip` statements are like `match` statements without any actions.
They also do not support lists of tokens, but only one single
expression.

Example::

    grammar user:
        skip /#.*?[\r\n]+/
        match 'Name: ' /\s+/ /\n/:
            do.say('Name was: $2!')
        when 'User:':
            do.return()

Output Generating Functions
---------------------------

out.create(PATH[, STRING])
^^^^^^^^^^^^^^^^^^^^^^^^^^

Creates the leaf node (and attributes) in the given path, regardless
of whether or not it already exists. In other words, using this
function twice will lead to duplicates.
If the given path contains multiple elements, the parent nodes are
only created if the do not yet exist.
If the STRING argument is given, the new node is also assigned the
string as data. In other words, the following function call::

    out.create('parent/child?name="test"', 'hello world')

leads to the following XML output::

    <parent>
        <child name="test">hello world</child>
    </parent>

Using the same call again, like so::

    out.create('parent/child?name="test"', 'hello world')
    out.create('parent/child?name="test"', 'hello world')

the resulting XML would look like this::

    <parent>
        <child name="test">hello world</child>
        <child name="test">hello world</child>
    </parent>

out.replace(PATH[, STRING])
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Like out.create(), but replaces the nodes in the given path if they
already exist.

out.add(PATH[, STRING])
^^^^^^^^^^^^^^^^^^^^^^^

Like out.create(), but appends the string to the text of the existing
node if it already exists.

out.add_attribute(PATH, NAME, STRING)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Adds the attribute with the given name and value to the node with the
given path.

out.open(PATH[, STRING])
^^^^^^^^^^^^^^^^^^^^^^^^

Like out.create(), but also selects the addressed node, such that the
PATH of all subsequent function calls is relative to the selected node
until the end of the match block is reached.

out.enter(PATH[, STRING])
^^^^^^^^^^^^^^^^^^^^^^^^^

Like out.open(), but only creates the nodes in the given path if they do
not already exist.

out.enqueue_before(REGEX, PATH[, STRING])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Like out.add(), but is not immediately executed. Instead, it is executed
as soon as the given regular expression matches the input, regardless of
the grammar in which the match occurs.

out.enqueue_after(REGEX, PATH[, STRING])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Like out.enqueue_before(), but is executed after the given regular
expression matches the input and the next match statement was processed.

out.enqueue_on_add(REGEX, PATH[, STRING])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Like out.enqueue_before(), but is executed after the given regular
expression matches the input and the next node is added to the output.

out.clear_queue()
^^^^^^^^^^^^^^^^^

Removes any items from the queue that were previously queued using the
out.enqueue_*() functions.

out.set_root_name(STRING)
^^^^^^^^^^^^^^^^^^^^^^^^^

Specifies the name of the root tag, if the output is XML.
Has no effect on JSON and YAML output.

Control Functions
-----------------

do.skip()
^^^^^^^^^

Skip the current match and jump back to the top of the current grammar
block.

do.next()
^^^^^^^^^

| Skip the current match and continue with the next match statement
  without jumping back to the top of the current grammar block.
| This function is rarely used and probably not what you want. Instead,
  use do.skip() in almost all cases, unless it is for some
  performance-specific hacks.

do.return()
^^^^^^^^^^^

Immediately leave the current grammar block and return to the calling
function. When used at the top level (i.e. in the `input` grammar), stop
parsing.

do.say(STRING)
^^^^^^^^^^^^^^

Prints the given string to stdout, with additional debug information.

do.fail(STRING)
^^^^^^^^^^^^^^^

Like do.say(), but immediately terminates with an error.