Legend Regular Grammar

The regular grammar defines the basic language elements, i.e. the tokens, as classes of character sequences such as numbers, identifiers, operators and strings.

Each rule defining such a class of character sequences has the following structure:
<Class Type> < [Member group :] Class Identifier [![+|-] Next group to activate]> :: <Regular Expression>

We distinguish six types of classes:

let
Helper class, used to define the more complex tokens
They do not belong to the language definition.
com
Comments
They do not belong to the language definition.
tok
Tokens
They represent the regular grammar of the language definition.
ign
Character sequences which should be ignored, i.e. skipped by the scanner
They do not belong to the language definition.
ind
(De)indent tokens
Indent and dedent events will be forwarded to the parser.
Otherwise these character sequences will be skipped by the scanner.
lan
Embedded language tokens
These are special token classes which have been introduced in order to integrate embedded languages.
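
The rule structure and the six class types above can be sketched as a small parser for rule headers. This is only an illustration under stated assumptions: identifiers are taken to be plain word characters, whitespace around ':' and '!' is taken to be optional, and the names RuleHeader and parse_rule as well as the sample rules are hypothetical, not part of the grammar notation itself.

```python
import re
from typing import NamedTuple, Optional

# The six class types listed in the legend.
CLASS_TYPES = {"let", "com", "tok", "ign", "ind", "lan"}


class RuleHeader(NamedTuple):
    class_type: str
    member_group: Optional[str]   # optional "Member group :" prefix
    identifier: str               # class identifier
    group_op: Optional[str]       # optional '+' or '-' after '!'
    next_group: Optional[str]     # optional "Next group to activate"
    expression: str               # raw regular expression text after '::'


# Assumption: identifiers consist of word characters only.
_RULE = re.compile(
    r"""^\s*(?P<type>[a-z]{3})\s+            # class type
        (?:(?P<group>\w+)\s*:(?!:)\s*)?      # [Member group :]
        (?P<ident>\w+)\s*                    # Class Identifier
        (?:!\s*(?P<op>[+-])?\s*(?P<next>\w+)\s*)?   # [![+|-] Next group]
        ::\s*(?P<expr>.+?)\s*$               # :: Regular Expression
    """,
    re.VERBOSE,
)


def parse_rule(line: str) -> RuleHeader:
    """Split one rule line into the parts named in the rule structure."""
    m = _RULE.match(line)
    if m is None:
        raise ValueError(f"not a rule: {line!r}")
    if m["type"] not in CLASS_TYPES:
        raise ValueError(f"unknown class type: {m['type']!r}")
    return RuleHeader(m["type"], m["group"], m["ident"],
                      m["op"], m["next"], m["expr"])


# Hypothetical sample rules, written only to exercise the sketch.
simple = parse_rule("tok Ide :: Letter { Letter | Digit }")
grouped = parse_rule("tok Text : Str !+ Code :: Letter +")
```

The sketch only separates the header fields; it does not interpret the regular expression, and it records the '+' or '-' after '!' without assigning it a meaning.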

A regular expression specifies the character sequences belonging to the class. Such a description usually consists of the following elements and operators:

Expression1 Expression2 ... ExpressionN

Expression1 | Expression2 | ... | ExpressionN

Expression1 - Expression2 - ... - ExpressionN

[ Expression ]

{ Expression } or Expression *

Expression +

Expression N

Expression Minimum , Maximum

Expression Minimum ,

( Expression )

Expression / 'QuotientCharacterset' or "QuotientSequence"

< LeftParenthesis > InnerExpression < RightParenthesis >

<= PatternPrefix > 'PatternCharacterset' < PatternSuffix >

<? PatternPrefix > Pattern token identifier < PatternSuffix >

Class identifier

Abbreviation for the corresponding regular expression

"String"

Literal: string / character sequence

'Characterset'

Literal: characterset ( 1 .. )

Case-insensitive character classes can be specified with an [I] after the class identifier.
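
As a rough analogy only, the two literal forms and the [I] marker can be related to Python's re module. The mapping below is an assumption made for illustration: it reads "String" as an exact character sequence and 'Characterset' as "any one character from a non-empty set", it ignores whatever escape syntax the grammar notation itself supports, and the helper names are hypothetical.

```python
import re


def string_literal(text: str) -> str:
    """Analogue of "String": match exactly this character sequence."""
    return re.escape(text)


def charset_literal(chars: str) -> str:
    """Analogue of 'Characterset': match one character out of the set."""
    if not chars:
        # Assumption: a character set holds at least one character.
        raise ValueError("a character set needs at least one character")
    return "[" + "".join(re.escape(c) for c in chars) + "]"


def compile_class(pattern: str, ignore_case: bool = False) -> re.Pattern:
    """Compile a class; ignore_case plays the role of the [I] marker."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(pattern, flags)


keyword = compile_class(string_literal("begin"))
keyword_i = compile_class(string_literal("begin"), ignore_case=True)
abc = compile_class(charset_literal("abc"))
```

Under this reading, keyword accepts only "begin", keyword_i also accepts "BEGIN" or "Begin", and abc accepts exactly one of the characters a, b or c.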