Next Previous Contents

2. A walk-through applying Styx

2.1 The language definition

Both the regular as well as the context free grammar of the language is combined into one source. Keyword tokens need not be defined separately within the lexical grammar, but are instead extracted from the context free part of the definition. All grammatical information is contained in one file with the extension ".sty".

To give a small example how this looks like, see the calculator language below.

; [calc.sty] Grammar "Calculator"

Language calc

Regular Grammar

  ign Ign         = ' \n\r'          ; "white" characters
  tok Tok         = '()+-*/'         ; one character tokens
  tok Int         = ('0'..'9')+      ; Integer
  tok Wrd         = "end"

Context Free Grammar

start Cmd
:exp: Exp
:end: "end"

let Exp  :ign0: Exp1
:add : Exp  "+" Exp1
:sub : Exp  "-" Exp1

let Exp1 :ign0: Exp2
:mlt : Exp1 "*" Exp2
:div : Exp1 "/" Exp2

let Exp2
:neg : "-" Exp2
:ign0: "(" Exp ")"
:int : Int

Notes on the source above.

2.2 The derived depth grammar / term algebra

Applying Styx, we derive the follow depth grammar (transformed to abstract types) from calc.sty

; [calc.abs] Types of 'calc' Terms

LANGUAGE calc

TOKENS

  Int

TYPES

  calc    = Start_Cmd(Cmd)

  Cmd     = end;
            exp(Exp)

  Exp     = mlt(Exp, Exp);
            int(Int);
            neg(Exp);
            sub(Exp, Exp);
            div(Exp, Exp);
            add(Exp, Exp)

Some notes apply to this:

2.3 Testing the language definition

Even without writing down a single line of C code, one can already test the language. With the following test string given in a file, we can test both the scanner and the parser separately, yielding the following results.

1+2*(3-4)/5

calc-example:000001:001 Int     : 1
calc-example:000001:002 Tok     : +
calc-example:000001:003 Int     : 2
calc-example:000001:004 Tok     : *
calc-example:000001:005 Tok     : (
calc-example:000001:006 Int     : 3
calc-example:000001:007 Tok     : -
calc-example:000001:008 Int     : 4
calc-example:000001:009 Tok     : )
calc-example:000001:010 Tok     : /
calc-example:000001:011 Int     : 5

Derivation Tree from Source : calc-example

[calc.Start_Cmd (1,1)
 [Cmd.exp (1,1)
  [Exp.add (1,1)
   [Exp.ign0 (1,1)
    [Exp1.ign0 (1,1)
     [Exp2.int (1,1)
      [Int (1,1) "1"]]]]
   [Keyword (1,2) "+"]
   [Exp1.div (1,3)
    [Exp1.mlt (1,3)
     [Exp1.ign0 (1,3)
      [Exp2.int (1,3)
       [Int (1,3) "2"]]]
     [Keyword (1,4) "*"]
     [Exp2.ign0 (1,5)
      [Keyword (1,5) "("]
      [Exp.sub (1,6)
       [Exp.ign0 (1,6)
        [Exp1.ign0 (1,6)
         [Exp2.int (1,6)
          [Int (1,6) "3"]]]]
       [Keyword (1,7) "-"]
       [Exp1.ign0 (1,8)
        [Exp2.int (1,8)
         [Int (1,8) "4"]]]]
      [Keyword (1,9) ")"]]]
    [Keyword (1,10) "/"]
    [Exp2.int (1,11)
     [Int (1,11) "5"]]]]]]

As one can see from the parser test result, full source information is maintained. Not only that the keywords are preserved, but also the starting positions of both the productions and the terminals are kept in the derivation tree for later reference to the source, may be for diagnostics, may be for other purposes.

Note that the source tree shown by the parser test is the internal representation, which is bound to the concrete surface grammar as specified in the calc.sty file. One may have access to this representation, of course, but usually, the compiler writer will give preference to the abstract (depth) grammar as given by the (generated) calc.abs above.

2.4 The C language interface

Styx provides a proper C language interface for this abstract grammar, by means of a mapping convention. As soon one knows the mapping by heart, the C interface (header) file is of few use. One will typically prefer working with the .abs file for reference purposes. This becomes clear when having a look at the C interface file below, which is much longer then the (content-identical) .abs file:

/* ------------------------------------------------------------------------ */
/*                                                                          */
/* [calc_int.h]               Language Interface                            */
/*                                                                          */
/* ------------------------------------------------------------------------ */

/* File generated by 'ctoh'. Don't change manually. */

#ifndef calc_int_INCL
#define calc_int_INCL


#include "ptm.h"
#include "gls.h"


#ifdef __cplusplus
extern "C" {
#endif


/* --------------------- symbol objects - init & quit --------------------- */

void calc_initSymbols();               /*                                   */
void calc_quitSymbols();               /*                                   */

/* -------------------------- Types & Constants --------------------------- */

AbstractType( calc );

AbstractType( calcCmd  );
AbstractType( calcExp  );

/* --------------------------- Access to Tokens --------------------------- */

c_bool Tcalc_Int(GLS_Tok x);           /*                                   */

/* --------------------------- Access to Terms ---------------------------- */

c_bool calc_calc(PT_Term x, calc* x1);   /*                                 */
c_bool calc_Cmd(PT_Term x, calcCmd* x1); /*                                 */
c_bool calc_Exp(PT_Term x, calcExp* x1); /*                                 */

/* --------------------------------- calc --------------------------------- */

c_bool calc_Start_Cmd(calc x, calcCmd* x1)
#define calc_Start_0   calc_Start_Cmd
;


/* --------------------------------- Cmd ---------------------------------- */

c_bool calcCmd_end(calcCmd x);              /*                              */
c_bool calcCmd_exp(calcCmd x, calcExp* x1); /*                              */

/* --------------------------------- Exp ---------------------------------- */

c_bool calcExp_mlt(calcExp x, calcExp* x1, calcExp* x2); /*                 */
c_bool calcExp_int(calcExp x, GLS_Tok* x1);              /*                 */
c_bool calcExp_neg(calcExp x, calcExp* x1);              /*                 */
c_bool calcExp_sub(calcExp x, calcExp* x1, calcExp* x2); /*                 */
c_bool calcExp_div(calcExp x, calcExp* x1, calcExp* x2); /*                 */
c_bool calcExp_add(calcExp x, calcExp* x1, calcExp* x2); /*                 */


#ifdef __cplusplus
}
#endif

#endif

The interface will not be explained in full length in this walk-through, instead only two relevant sections will be highlighted:

To each variant within the discriminated union of the Exp terms, one destructor (in notions of algebra, not of C++) is provided. The naming convention of them is "LanguageType_Variant". Their first argument is the term to rip apart. The remaining arguments are variables for the parts of the decomposition. The result of the functions is a boolean value that becomes true if the destructor applies to the considered term, i.e. if we have used it on the right variant.

2.5 Using the interface

With this preparation we can look at the source of the evaluator. This is pretty straight forward. Speaking in notions of abstract algebra, the following C function is the canonical evaluation homomorphism on Exps. It maps the type of Exps to C integers by assigning a corresponding C function of the C integer algebra to every function of the term algebra of Exps. Additionally, since an integer literal is also provided in the language, the integer denotation is mapped onto it's meaning.

int evalExp(calcExp ex)
{ calcExp x1; calcExp x2; GLS_Tok x3;
  if (calcExp_mlt(ex, &x1, &x2)) return evalExp(x1) * evalExp(x2);
  if (calcExp_div(ex, &x1, &x2)) return evalExp(x1) / evalExp(x2);
  if (calcExp_add(ex, &x1, &x2)) return evalExp(x1) + evalExp(x2);
  if (calcExp_sub(ex, &x1, &x2)) return evalExp(x1) - evalExp(x2);
  if (calcExp_neg(ex, &x2))      return             - evalExp(x2);
  if (calcExp_int(ex, &x3))      return atoi(GLS_Tok_string(x3));
  BUG;
}

Together with a few lines to initiate and apply the Styx parser, the above function forms the desired calculator.

Please note that the full advantage of Styx over yacc is not expressed by this walk-trough. One can do this trivial example likely efficient with yacc's semantic actions. Note especially that the above evaluator is in contrary to yacc not hooked into the parser, but instead applied on the result of the parsing process. When the term was evaluated by evalExp the parser has already been gone, having done it's purpose by leaving a term derived from the source.

The advantage of Styx's design immediately becomes apparent as soon as one adds functions to the example grammar and defines an interpreter on top of it. Using Styx one would then find everything prepared as needed, while lex/yacc would require to do all the things that Styx offers implicitly. Perhaps we will extend the example appropriately in the next version of the document. Meanwhile we leave it as an exercise to the reader. (Hint: Use the symbols and finite maps described below to move along easily.)


Next Previous Contents