This is to become the section about Example02. For convenience, we first include the related README here, literally
Here we have the first non-trivial example of a Styx application. It is a somewhat complete little programming language approximately of the complexity of LISP, called PL0 as usual.
The example demonstrates to use of the derivation tree as a source representation beyond parsing. Here, we use it to keep the definitions of functions available for execution.
Additionally, full static and dynamic semantics of the language is implemented to introduce the use of the "Map" and "symbol" data type together with other handy library routines as the tree iterator and the PT_error routine.
An (atypical) use of the pretty printing abilities is also provided.
The profiling webbed into the example gives an impression of the efficiency of the whole library material. Note that the interpreter in this example is not optimized for speed.
With a total of about 250 lines of C code and 100 lines for the grammar, which took about 4 hours to be written from scratch including debugging, this example also shows how efficient a compiler/interpreter author can be with Styx.
Execute the example program 'testpl0.pl0' by 'pl0 testpl0.pl0' or adjust the path in the first line of 'testpl0.pl0', set the executable flag and call it directly.
Prepare the reader for a lengthy chapter introducing some compiler writing methodology, too
Following, we dissect the concrete grammar of the language;
; [pl0.sty] Grammar "pl0" - a toy language Language pl0
There is not much worth to notice, here. Comparing with the calculator grammar above, the newly introduced tokens are identifiers and comments.
The comments will not become visible in the abstract derivation tree as indicated by the preceding "com" but well kept in the concrete one. Note the hexadecimal denotation within the comments production, which restricts the comments to 7 bit characters.
Regular Grammar ign Ign = ' \n\r' ; "white" characters tok Tok = ',<=()+-*/' ; one character tokens tok Int = ('0'..'9')+ ; Integer tok Ide = ('a' .. 'z')+ ; Identifier and Keywords com Com = "#" {"\20" .. "\7e"} ; Comments
In the beginning of the grammar the overall structure of a PL0 program is defined to be a sequence of function definitions followed by a sequence of expressions to "run".
The function definitions introduced then, simply gives expressions a name and some arguments to be substituted.
Context Free Grammar start Program :pgm: Dfns Runs let Dfn :fun: "fun" Ide "(" Args ")" "=" Exp let Run :run: "run" Exp
Again referring back to the calculator example, find the expressions be extended by some predicates (les, equ), an if-then-else construction (if), a function call (app) and by variables (ide).
Notice again that we use "ign0" predictions to indicate the binding strength of the operators of the object language. We have put these into the same line as the defined nonterminal symbol to emphasize them.
let Exp :ign0: Exp1 :if : "if" Exp1 "then" Exp "else" Exp let Exp1 :ign0: Exp2 :les : Exp2 "<" Exp2 :equ : Exp2 "=" Exp2 let Exp2 :ign0: Exp3 :add : Exp2 "+" Exp3 :sub : Exp2 "-" Exp3 let Exp3 :ign0: Exp4 :mlt : Exp3 "*" Exp4 :div : Exp3 "/" Exp4 let Exp4 :neg : "-" Exp4 ; Unary minus :ign0: "(" Exp ")" :int : Int ; Literal :var : Ide ; Variable :app : Ide "(" Exps ")" ; Application
The grammar finally ends with the syntax of several lists that were previously used. Notice the occurrence of "cons" and "nil" productions, which hint the grammar abstractor.
; Lists let Args :nil : :cons: Ide Args0 let Args0 :nil : :cons: "," Ide Args0 let Exps :nil : :cons: Exp Exps0 let Exps0 :nil : :cons: "," Exp Exps0 let Dfns :nil : :cons: Dfn Dfns let Runs :nil : :cons: Run Runs
In this notation the above list productions became unnecessary.
start Program :pgm: [ Dfn ... ] [ Run ... ] let Dfn :fun: "fun" Ide "(" [ Ide "," ... ] ")" "=" Exp let Run :run: "run" Exp let Exp :ign0: Exp1 :if : "if" Exp1 "then" Exp "else" Exp let Exp1 :ign0: Exp2 :les : Exp2 "<" Exp2 :equ : Exp2 "=" Exp2 let Exp2 :ign0: Exp3 :add : Exp2 "+" Exp3 :sub : Exp2 "-" Exp3 let Exp3 :ign0: Exp4 :mlt : Exp3 "*" Exp4 :div : Exp3 "/" Exp4 let Exp4 :neg : "-" Exp4 ; Unary minus :ign0: "(" Exp ")" :int : Int ; Literal :var : Ide ; Variable :app : Ide "(" [ Exp "," ... ] ")" ; Application
Having applied the ' styx' program as indicated above, we yield the following abstract grammar.
The most noteworthy fact is, that it is much shorter and ways more handy then the concrete one from which it originates.
Two highlights are to be emphasized. First, like in the calculator grammar, the surface property of binding strength of the operators has been removed. As a result, we gain only a single, handy expression type. The second effect is that the list productions could have been removed completely, leaving only the trailing asterisk ("*") as a list type operator or indicator. Styx is well able to derive lists of lists of any degree, so you are not bound to possible inabilities of the tool here.
/* ------------------------------------------------------------------------ */ /* */ /* [pl0.abs] Abstract Grammar */ /* */ /* ------------------------------------------------------------------------ */ LANGUAGE pl0 TOKENS Int, Ide TYPES pl0 = Start_Program(Program) Program = pgm(Dfn*, Run*) Dfn = fun(Ide, Ide*, Exp) Run = run(Exp) Exp = if(Exp, Exp, Exp); div(Exp, Exp); var(Ide); equ(Exp, Exp); neg(Exp); app(Ide, Exp*); mlt(Exp, Exp); int(Int); les(Exp, Exp); sub(Exp, Exp); add(Exp, Exp)
As an addition to the earlier described development task, we have to admit, that is was not completely right with regard to the use of the abstract grammar, at least during design.
In fact, we design the abstract grammar first, only sketching the surface grammar. The design of a proper concrete grammar is typically a production step by itself. Similar, compare writing a document versus typesetting it.
While a nice depth grammar makes the design handy to the compiler author, a proper surface grammar help much to make the language usable for their users.
A grammar does not help much without a "typical" example. Here is the one from the Example02:
#!/p/bing/pl0 # [test.pl0] A PL0 example "program" # first we define a few operation the hard way. fun add(a,b) = if a = 0 then b else 1 + add(a-1,b) fun times(a,b) = if a = 0 then 0 else add(times(a-1,b),b) fun fact(n) = if n = 0 then 1 else times(n, fact(n-1)) fun profile() = fact(6) # now try the evaluator with primitive ground expressions run 1 run 1+3 run 2*7-1 # now try the evaluator using functions run add(0,3) run add(1,3) run add(7,3) run times(7,3) # following an example for profiling. # It may take a moment to compute, but # executes 2839 function calls and # evaluates a total of 23347 expressions. # on a fast machine, you might want to # increase the argument slightly to gain # a visible effect. Then notice that the # interpreter is not yet optimized for # speed. run profile()
Although the intended meaning of the programs should already be intuitively clear from the preceding language example, we cannot seriously continue without explicifying it at least descriptive.
Hereby we have carefully to distinguish between the Static Semantic, which defines the wellformedness of PL0 programs in a sense, that they can be compiled without problems, and the Dynamic Semantic, the actual meaning of the program when executed.
Reading the article the first time, one might find this section nothing but lengthy and self evident. Be welcome to skip it, but keep in mind to come back to it after having scanned the implementation of the language, since this chapter is nothing but a pre-formalized version of the program to come.
Additionally, it provides a little of the development methodology, we use when designing a language and an interpreter, though mostly provided in the form of an example.
In programming languages, the static semantic typically deals with scope and type rules. Since PL0 is a typeless language (all data will be integer), we have only few to care for:
Requirements like the following are often called "scope" rules. They define the textual range within which a defining identifier can excludes other with the same name. Scope rules guarantee the existence of a proper mapping to be associated with the scope that allows to find a unique definition for that name.
While the scope of the function names is "global" and the names cannot be reused within the whole program, the scope of the arguments is "local", i.e. they can be reused in another function (as we happily do in our example program).
Note that function and variables name are identified in different scopes and their applied occurrences are syntactically so disjunctive, that one can name a variable like a function without provoking possible conflicts. We have not exploit this opportunity in the example, though.
With properly scoped definitions in hand, we can assign applied occurrences to their definitions. The textual region within an identifier can find its definition is sometimes called the "reach". As soon as we have types that contain names (like records or structures), the reach can become hard to determine and is then only recursively definable with the type checker. Our little example is not so difficult, though, and we have only two simple rules:
As a consequence of the above rule, function names can be reached in the "run" expressions of the program, while the use of variable names is completely prohibited there.
Another consequence is, that function definitions do not have to precede their application textually, they can in fact come in any suiting order.
This last requirement is a sort of preview of type checking. Usually, applying an introduced identifier has consequences in the context of its application. We have only a single such fact in PL0:
Though we could easily drop this requirement, for instance by putting default values in omitted arguments and dropping superfluous ones, we choose to request that the proper amount of arguments is in fact passed.
All these conditions to be asserted for a PL0 program to become well-formed have not use for themselves. Instead they are properties that will be used as preconditions (given) before we can come to dynamic semantic, to the meaning of the program itself.
This means that all these properties can safely be assumed and used when defining the dynamic semantic and later actually running a well-formed program.
This is very convenient, since from now on, we do not have to be concerned anymore whether a name is defined or not or if we have enough arguments for the calls. This part is done.
After all those preparations, we can finally define the meaning of everything. Again we do this textually, mechanically passing the productions of the abstract grammar.
The evaluation of an expression always (modulo overflows, division by zero and endless recursion) yields an integer value and the meaning of an expression depends on its production:
Having the semantics defined, writing the interpreter is more or less a direct translation of the English text to C using the terms of the Styx library. So most of this chapter is to introduce the right words and to describe some of the concepts of the Styx library.
All the program fragments below come from the file 'pl0.c' of the Example02 example. One may want to scan through this file to see how these parts fit together into a single program.
simply example how to deal with list, symbol and maps.
static MAP(symbol, pl0Dfn) collectFunctions(pl0Program src, bool emitErrors) /* collect global definitions, emit duplicate errors if required */ { GLS_Lst(pl0Dfn) dfns; GLS_Lst(pl0Dfn) dit; MAP(symbol, pl0Dfn) glo = MAP_newPrimMap(); // global environment bug0( pl0Program_pgm(src,&dfns,_), "program expected"); GLS_FORALL(dit,dfns) { GLS_Tok fid; pl0Dfn dfn = GLS_FIRST(pl0Dfn,dit); bug0( pl0Dfn_fun(dfn, &fid,_,_), "expecting fun Dfn"); if (MAP_defined(glo,GLS_Tok_symbol(fid))) { if (emitErrors) PT_error(fid,"Function '%s' is already defined",GLS_Tok_string(fid)); } else MAP_define(glo,GLS_Tok_symbol(fid),dfn); } return glo; }
simply example how to use the meta-term system to traverse the derivation tree.
static void StaticSemantic(pl0Program src) /* Collect definitions and validate scoping rules */ { PT_Itr it; pl0Dfn d; pl0Exp e; MAP(symbol, pl0Dfn) glo; // global environment. MAP(symbol, void) local; // local environment, a set really. // // Pass 1 // - function names are unique // : collect them in 'glo' for later use // glo = collectFunctions(src,True); // // Pass 2 // - applied function occurences are defined ... // - ... and have the right arity // - formal parameter names are unique // - applied identfiers refer to formal parameters // local = NULL; // only to make gcc happy PT_FORALL(it,src) { PT_Term t = PT_termIT(it); if (PT_stateIT(it) == PT_PRAE && pl0_Dfn(t,&d) ) // start of function definition { GLS_Lst(GLS_Tok) fpl; GLS_Lst(GLS_Tok) fpit; bug0( pl0Dfn_fun(d, _,&fpl,_), "expecting fun Dfn"); local = MAP_newPrimMap(); // create local environment GLS_FORALL(fpit,fpl) { GLS_Tok fp = GLS_FIRST(GLS_Tok,fpit); if (MAP_defined(local,GLS_Tok_symbol(fp))) PT_error(fp,"Parameter '%s' is already defined",GLS_Tok_string(fp)); else MAP_define(local,GLS_Tok_symbol(fp),_); } } if (PT_stateIT(it) == PT_POST && pl0_Dfn(t,&d) ) // end of function definition { MAP_freeMap(local); // drop local environment } if (PT_stateIT(it) == PT_PRAE && pl0_Exp(t,&e) ) // found expression { GLS_Tok fid; GLS_Tok vid; GLS_Lst(pl0Exp) apl; if (pl0Exp_app(e, &fid, &apl)) // applied function { // check for defined occurence if (MAP_defined(glo,GLS_Tok_symbol(fid))) { GLS_Lst(GLS_Tok) fpl; bug0( pl0Dfn_fun( MAP_apply(pl0Dfn,glo,GLS_Tok_symbol(fid)), _,&fpl,_), "fun expected"); // check for matching arity if (GLS_Lst_length(fpl) != GLS_Lst_length(apl)) PT_error(e,"arity error"); } else PT_error(e,"undefined function '%s'",GLS_Tok_string(fid)); } if (pl0Exp_var(e, &vid)) // applied variable { // check for defined occurrence if (!MAP_defined(local,GLS_Tok_symbol(vid))) PT_error(vid,"Undefined variable '%s'",GLS_Tok_string(vid)); } } } MAP_freeMap(glo); }
This "machine" part does the actually interpreter task.
static int calls; // profiling function calls static int evals; // profiling evaluated expression
The function 'eval' uses the generated pl0 language interface to evaluate a pl0 expression.
First the expression type was determined by applying the appropriate destructor. Then dependent on the type the proper operation was applied to the recursively evaluated subexpressions.
The values of variables and actual function parameters are taken from the local context whereas function definitions will be looked up in the global context.
For the profiling task the function tracks the number of function calls and evaluated expression.
static int eval(pl0Exp ex, MAP(symbol,pl0Dfn) glo, MAP(symbol,int) loc) /* a standard expression evaluator */ { pl0Exp ex1, ex2, ex3; GLS_Tok tok; GLS_Lst(pl0Exp) exps; evals++; // profile if( pl0Exp_equ(ex, &ex1,&ex2) ) return eval(ex1,glo,loc) == eval(ex2,glo,loc); else if( pl0Exp_les(ex, &ex1,&ex2) ) return eval(ex1,glo,loc) < eval(ex2,glo,loc); else if( pl0Exp_div(ex, &ex1,&ex2) ) return eval(ex1,glo,loc) / eval(ex2,glo,loc); else if( pl0Exp_mlt(ex, &ex1,&ex2) ) return eval(ex1,glo,loc) * eval(ex2,glo,loc); else if( pl0Exp_sub(ex, &ex1,&ex2) ) return eval(ex1,glo,loc) - eval(ex2,glo,loc); else if( pl0Exp_add(ex, &ex1,&ex2) ) return eval(ex1,glo,loc) + eval(ex2,glo,loc); else if( pl0Exp_neg(ex, &ex1) ) return - eval(ex1,glo,loc); else if( pl0Exp_int(ex, &tok) ) return atoi(GLS_Tok_string(tok)); else if( pl0Exp_var(ex, &tok) ) return MAP_apply(int,loc,GLS_Tok_symbol(tok)); else if( pl0Exp_if(ex, &ex1,&ex2,&ex3) ) return eval(eval(ex1,glo,loc)?ex2:ex3,glo,loc); else if( pl0Exp_app(ex, &tok,&exps) ) { int res; GLS_Lst(GLS_Tok) fpit, fpl; pl0Exp body; MAP(symbol,int) newloc = MAP_newPrimMap(); pl0Dfn dfn = MAP_apply(pl0Dfn,glo,GLS_Tok_symbol(tok)); bug0( pl0Dfn_fun( dfn, _, &fpl, &body), "function expected"); calls++; // profile // evaluate actual parameter list creating new local environment GLS_FORALL(fpit,fpl) { GLS_Tok fp = GLS_FIRST(GLS_Tok,fpit); pl0Exp ap = GLS_FIRST(pl0Exp,exps); MAP_define(newloc,GLS_Tok_symbol(fp),eval(ap,glo,loc)); exps = GLS_REST(pl0Exp,exps); } res = eval(body,glo,newloc); // recursively evaluate function body MAP_freeMap(newloc); // free new local environment return res; } else { PT_error(ex,"unrecognized expression type"); return 0; // fault, but we continue anyway. } }
The main function 'DynamicSemantic' executes the given pl0 program.
First all function definitions will be collected and used as global context.
In the following loop all "runable" expressions will be evaluated and printed with the help of the functions 'eval' and ppExp.
For the convenient iteration of term lists the generic language interface provides the macro 'GLS_FORALL(ListIteratorVariable,ListVariable)'.
static void DynamicSemantic(pl0Program src) /* semantic of the program: evaluate and print each "run" expression */ { GLS_Lst(pl0Run) runs; GLS_Lst(pl0Run) runit; MAP(symbol,pl0Dfn) glo = collectFunctions(src,False); // global environment MAP(symbol,int) loc = MAP_newPrimMap(); // empty local environment bug0( pl0Program_pgm(src,_,&runs), "program expected"); GLS_FORALL(runit,runs) { pl0Exp exp; pl0Run run = GLS_FIRST(pl0Run,runit); bug0( pl0Run_run(run, &exp), "expecting run Run"); calls = 0; evals = 0; // init execution profile printf("running: "); ppExp(exp); // pretty print expression printf(" = %d",eval(exp,glo,loc)); // calculate and print result printf(" [%d calls, %d expressions evaluated]\n",calls,evals); } MAP_freeMap(loc); MAP_freeMap(glo); }
static void ppExp(pl0Exp exp) /* somewhat misused pretty printer */ /* This is only for demonstration purposes, so we don't care to get the */ /* parser table and initialize things here over and over. We do not even */ /* reformat. See [stypp.c] for how to do it the right way. For diagnostic */ /* purposes, one will certainly prefer the PT_print routine. */ { PLR_Tab plr = PLR_get_pl0(); // Get parser table PTP_init(plr); // Init Pretty Printer PTP_pp(exp,stdout); // slightly abused PTP_quit(); // Done Pretty Printer PLR_delTab(plr); // Free parser table }
/* ------------------------------------------------------------------------ */ /* */ /* [pl0.c] PL0 Interpreter */ /* */ /* Copyright (c) 2000 by Doelle, Manns */ /* ------------------------------------------------------------------------ */ #include "stdosx.h" // General Definitions (for gcc) #include "ptm_gen.h" // General Parsing Routines #include "ptm_pp.h" // Pretty Printer #include "gls.h" // General Language Services #include "hmap.h" // Datatype: Finite Maps #include "symbols.h" // Datatype: Symbols #include "pl0_int.h" // grammar interface #include "pl0_lim.h" // scanner table #include "pl0_pim.h" // parser table /* Auxiluary Functions ----------------------------------------------------- */ /* Static Semantics -------------------------------------------------------- */ /* Dynamic Semantic -------------------------------------------------------- */ /* Main Program ------------------------------------------------------------ */ void PL0(string fileid) /* initialize and get source */ { Scn_T scn; Scn_Stream cstream; // scanner table & configuration PLR_Tab plr; PT_Cfg PCfg; // parser table & configuration PT_Term srcterm; // the source term // // init modules // MAP_init(); initSymbols(); pl0_initSymbols(); // // Parse the source file // Scn_get_pl0(&scn); // Get scanner table cstream = Stream_file(scn,"",fileid,""); // Open source file plr = PLR_get_pl0(); // Get parser table PCfg = PT_init(plr,cstream); // Create parser srcterm = PT_PARSE(PCfg,"Program"); // Parse PT_setErrorCnt(PT_synErrorCnt(PCfg)); // Save error count PT_quit(PCfg); // Free parser Stream_close(cstream); // Close source stream Stream_free(cstream); // Free source stream Scn_free(scn); // Free scanner table PLR_delTab(plr); // Free parser table // // done parsing, proceed if no syntax errors // if (PT_errorCnt() == 0) { pl0Program src; // get tree for start symbol bug0( pl0_Start_Program((pl0)srcterm,&src), "Program expected"); // check & execute program StaticSemantic(src); if (PT_errorCnt() == 0) DynamicSemantic(src); } if (PT_errorCnt() > 0) { fprintf(stderr,"Total %d errors.\n",PT_errorCnt()); STD_ERREXIT; } // // release allocated objects // PT_delT(srcterm); pl0_quitSymbols(); freeSymbols(); MAP_quit(); } int main(int argc, string argv[]) { if( argc > 1 ) PL0(argv[1]); else fprintf(stderr,"missing source\n"); BUG_CORE; // check for object lefts over return 0; }