Legend Regular Grammar
The regular grammar defines the basic language elements, i.e. the tokens, as
classes of character sequences such as numbers, identifiers, operators
and strings.
Each rule defining such a class of character
sequences has the following structure:
  <Class Type> <[Member group :] Class Identifier [![+|-] Next group to activate]> :: <Regular Expression>
Square brackets mark optional parts.
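To make the parts of a rule explicit, here is a minimal sketch of how a tool might hold one rule in memory. The class `RegularRule`, its field names and the validation checks are hypothetical and not part of the grammar notation. The only constraints encoded are the ones the template shows: the class type is one of the six types this section distinguishes, and the `+`/`-` sign can only appear together with a next group. What `+` and `-` mean is not specified here, so the sketch does not interpret them.

```python
from dataclasses import dataclass
from typing import Optional

# The six class types this section distinguishes.
CLASS_TYPES = {"let", "com", "tok", "ign", "ind", "lan"}


@dataclass(frozen=True)
class RegularRule:
    """Hypothetical in-memory model of one regular grammar rule."""
    class_type: str                      # <Class Type>, e.g. "tok"
    identifier: str                      # Class Identifier
    regex: str                           # <Regular Expression>, kept as source text
    member_group: Optional[str] = None   # optional [Member group :]
    next_group: Optional[str] = None     # optional "Next group to activate"
    sign: Optional[str] = None           # optional '+' or '-' after '!'

    def __post_init__(self) -> None:
        if self.class_type not in CLASS_TYPES:
            raise ValueError(f"unknown class type: {self.class_type!r}")
        if self.sign not in (None, "+", "-"):
            raise ValueError("sign must be '+', '-' or None")
        if self.sign is not None and self.next_group is None:
            # In the template the sign sits inside the same optional
            # bracket as the next group, so it cannot appear alone.
            raise ValueError("a sign requires a next group")
```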
We distinguish six types of classes:
- let
  Helper classes, used to define more complex tokens.
  They do not belong to the language definition.
- com
  Comments.
  They do not belong to the language definition.
- tok
  Tokens.
  They represent the regular grammar of the language definition.
- ign
  Character sequences that are ignored, i.e. skipped by the scanner.
  They do not belong to the language definition.
- ind
  (De)indent tokens.
  The scanner forwards the resulting indent and dedent events to the parser,
  but skips the character sequences themselves.
- lan
  Embedded language tokens.
  These are special token classes introduced to integrate embedded languages.
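The `ind` class turns layout into events for the parser. How the tool computes these events is not described here; the following is a minimal sketch of the common stack-based technique (the one Python's own tokenizer uses), assuming indentation is measured in spaces and blank lines produce no events. The function `indent_events` and the event names `INDENT`, `DEDENT` and `LINE` are hypothetical.

```python
from typing import Iterable, List, Tuple

Token = Tuple[str, str]  # (kind, text)


def indent_events(lines: Iterable[str]) -> List[Token]:
    """Turn leading spaces into INDENT/DEDENT events (hypothetical sketch).

    The whitespace itself is dropped; only the events and the line
    contents are emitted.
    """
    stack = [0]                    # indentation widths of the open blocks
    out: List[Token] = []
    for line in lines:
        body = line.lstrip(" ")
        if not body.strip():
            continue               # blank line: no (de)indent event
        width = len(line) - len(body)
        if width > stack[-1]:
            stack.append(width)    # deeper: open a new block
            out.append(("INDENT", ""))
        while width < stack[-1]:
            stack.pop()            # shallower: close blocks until aligned
            out.append(("DEDENT", ""))
        if width != stack[-1]:
            raise SyntaxError(f"inconsistent dedent to column {width}")
        out.append(("LINE", body.rstrip("\n")))
    # Close all blocks that are still open at the end of the input.
    out.extend(("DEDENT", "") for _ in stack[1:])
    return out
```

For example, `indent_events(["a\n", "  b\n", "d\n"])` yields a `LINE` for `a`, an `INDENT`, a `LINE` for `b`, a `DEDENT` and a `LINE` for `d`.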
A regular expression specifies the character sequences belonging to a class.
Such an expression usually consists of the following elements and operators:
- Expression1 Expression2 ... ExpressionN
  Concatenation of partial expressions
- Expression1 | Expression2 | ... | ExpressionN
  Union of partial expressions (alternatives)
- Expression1 - Expression2 - ... - ExpressionN
  Difference of partial expressions
- [ Expression ]
  Optional partial expression
- { Expression } or Expression *
  Iteration of a partial expression (0 ..)
- Expression +
  Iteration of a partial expression (1 ..)
- Expression N
  Limited iteration of a partial expression (exactly N times)
- Expression Minimum , Maximum
  Limited iteration of a partial expression (min .. max)
- Expression Minimum ,
  Limited iteration of a partial expression (min ..)
- ( Expression )
  Grouping of a partial expression (subexpression)
- Expression / 'QuotientCharacterset' or "QuotientSequence"
  Quotient expression
- < LeftParenthesis > InnerExpression < RightParenthesis >
  Non-regular Dyck expression (balanced nesting of the delimiters)
- <= PatternPrefix > 'PatternCharacterset' < PatternSuffix >
  Start pattern expression
- <? PatternPrefix > Pattern token identifier < PatternSuffix >
  End pattern expression
- Class identifier
  Abbreviation for the regular expression that defines that class
- "String"
  Literal: string / character sequence
- 'Characterset'
  Literal: character set (1 ..)
Case-insensitive character classes can be specified by writing [I] after the
class identifier.
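The regular operators can be related to a familiar notation by translating them into Python `re` patterns. The following is a minimal sketch; the combinator names (`lit`, `chars`, `seq`, `alt`, `opt`, `star`, `plus`, `exactly`, `between`, `at_least`, `followed_by`, `difference`) are hypothetical. Three points are assumptions: a character-set literal is taken to match a single character of the set; the quotient is taken to behave like trailing context in lex (`r/s`), i.e. a lookahead that is not part of the match; and because `re` has no difference operator, `difference` is only a whole-string predicate, not a composable pattern. Case-insensitive classes correspond to `re.IGNORECASE`.

```python
import re
from typing import Callable

# Each combinator returns a Python `re` pattern string. Operands are wrapped
# in non-capturing groups so that precedence cannot leak between operators.

def lit(s: str) -> str:                        # "String"
    return re.escape(s)

def chars(s: str) -> str:                      # 'Characterset' (one character, assumed)
    return "[" + "".join(re.escape(c) for c in s) + "]"

def seq(*es: str) -> str:                      # Expression1 Expression2 ... ExpressionN
    return "".join(f"(?:{e})" for e in es)

def alt(*es: str) -> str:                      # Expression1 | Expression2 | ... | ExpressionN
    return "|".join(f"(?:{e})" for e in es)

def opt(e: str) -> str:                        # [ Expression ]
    return f"(?:{e})?"

def star(e: str) -> str:                       # { Expression } or Expression *
    return f"(?:{e})*"

def plus(e: str) -> str:                       # Expression +
    return f"(?:{e})+"

def exactly(e: str, n: int) -> str:            # Expression N
    return f"(?:{e}){{{n}}}"

def between(e: str, lo: int, hi: int) -> str:  # Expression Minimum , Maximum
    return f"(?:{e}){{{lo},{hi}}}"

def at_least(e: str, lo: int) -> str:          # Expression Minimum ,
    return f"(?:{e}){{{lo},}}"

def followed_by(e: str, quotient: str) -> str:
    # Expression / Quotient, ASSUMING lex-style trailing context:
    # the quotient must follow, but is not part of the matched text.
    return f"(?:{e})(?={quotient})"

def difference(a: str, *others: str, ignore_case: bool = False) -> Callable[[str], bool]:
    # Expression1 - Expression2 - ...: tests a whole string for membership
    # in the first expression and in none of the others.
    flags = re.IGNORECASE if ignore_case else 0
    first = re.compile(a, flags)
    rest = [re.compile(o, flags) for o in others]
    return lambda s: first.fullmatch(s) is not None and all(
        p.fullmatch(s) is None for p in rest
    )

# Example: identifiers made of a letter followed by letters or digits.
letter = chars("abcdefghijklmnopqrstuvwxyz")
digit = chars("0123456789")
ident = seq(letter, star(alt(letter, digit)))
```

With these definitions, `difference(ident, alt(lit("if"), lit("else")))` accepts `index` but rejects `if`, which is the kind of "identifier minus keywords" class the difference operator is typically used for.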
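The Dyck expression is called non-regular because matching nested delimiters requires counting the nesting depth, which no finite automaton can do. The following is a minimal sketch of such a matcher, assuming the left and right delimiters are fixed strings and the inner expression admits arbitrary characters; the function `match_dyck` is hypothetical. A typical use is nested comments such as `/* a /* b */ c */`.

```python
from typing import Optional


def match_dyck(text: str, start: int, left: str, right: str) -> Optional[int]:
    """Return the index just past a balanced left ... right span at `start`.

    Returns None if no left delimiter starts at `start` or if the input
    ends before the nesting is closed.
    """
    if not text.startswith(left, start):
        return None
    depth = 0                      # current nesting depth
    i = start
    while i < len(text):
        if text.startswith(left, i):
            depth += 1             # opening delimiter: one level deeper
            i += len(left)
        elif text.startswith(right, i):
            depth -= 1             # closing delimiter: one level up
            i += len(right)
            if depth == 0:
                return i           # the outermost span is closed
        else:
            i += 1                 # inner text: any character
    return None                    # unbalanced: input ended inside the span
```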