A Solution to a Scheme Project
(expression compiler)
june 27, 2000

About the solution

Upon reading student projects, we found some common defects: We tried to avoid the above defects in our solution.

Note that if a client (customer) buys an expression evaluator, he may not wish to pay the price for a potatoe evaluator! In other words, our customer will only submit expressions to his  function, and does not want the eval-expression function to check whether the expression is really an expression. This is especially bad with a recursive function!

On the other hand, the client will not complain if he is given, as a free bonus, an additional function such as:

(define (eval-anything anything)
  (if (expression? anything)
      (eval-expression anything)
      (error "This is not an expression!")))
that will not deteriorate his eval-expression function.

Back to top.

Expressions

We implement expressions as if it were a data type. The type hierarchy was suggested by the grammar: Note that we do not define any expression constructor, since we do not need to build expressions. The use of defined accessors in lieu of ad hoc car, cadr and caddr functions makes the code more readable, at very little cost (in terms of execution speed). In fact, there would be absolutely no speed loss if we had implemented accessors as plain synonymous, e.g.,
(define operator cadr)
or as macros, such as
(define-macro (operator compound-expression)
  `(cadr ,compound-expression))
provided that were using a Scheme compiler. We preferred
(define (operator compound-expression)
  (cadr compound-expression))
because it carries more information (the type of the argument), like the macro, and because it is compatible with the trace function, unlike the macro. We were not really concerned with speed.

Evaluation of an expression (see eval-expression function) is implemented using only one test to distinguish between the case of an identifier from the case of a compound expression. Code factorization is achieved by the identifier->function function, which turns each of the four possible operators into a Scheme function.

Note the use of the general purpose key-value function (defined in environments.scm file) which retrieves the value of an identifier in a given environment, more generally, of a key in an association list. This function was defined in a separate file for the sake of modularity, because it is also needed for evaluating machine instructions.

Source file: expressions.scm.

Back to top.

Machine instructions

Here too, we associate a datatype with the notion of instruction. We do not associate a datatype with the notion of a program, because a program is just a list of instructions, and the list datatype (with car and cdr accessors) is already provided in LISP. Note that some languages feature parameterized types, such as "list[instruction]": this is not the case of Scheme.

One could object that since an instruction also is a list, it is no more necessary to exhibit an instruction datatype than a program datatype. This is untrue, because an instruction is a list of fixed length, with heterogeneous contents, whereas a program is a list of variable length, with homogeneous contents (each element is an instruction).

For an instruction, we define three accessors and a constructor, the latter because the compiler needs it in order to produce instructions.

Note the modular approach to program evaluation: eval-program uses eval-instruction.

Code factorization is achieved by means of the code->function function which turns each of the possible machine codes (ld, add, sub, mult, div) into an appropriate function of type [number, number -> number]. This allows a uniform treatment of each instruction in eval-instruction: no test is needed (not even for ld)!

Source file: machine.scm.

Back to top.

Compiler

The main function, compile-reg, performs only one test to determine the nature of the expression: identifier or compound-expression. Code factorization is achieved by means of the operator->code converter.

The "backquote" notation is used in compile-reg function to produce a list of instructions: note that backquotes are not restricted to macros. This is just a matter of taste however: a combination of append and list would be just as readable.

Source file: compiler.scm.

Back to top.

Imperative machine code evaluator

The subject was not very explicit about the exec macro-expansion, with one example only. However, the example suggested that the expansion consist of three parts:
  1. a series of define forms to set up the environment;
  2. a series of define forms to initialize some registers with the () value (otherwise subsequent ld expansions might crash at execution time);
  3. the program itself.
We found that this suggestion was not good enough. Assume that the environment defines some registers, not just memories. Note that the subject gave an example of such an environment:
((A . 12) (B . 24) (C . 36) (R2001 . 367))
Then part 2 of the expansion would destroy the contents of registers initialized by part 1.

So we decided to remove part 2, and let each ld instruction macroexpand into a define form. For instance

(ld R5 C)
macroexpands into
(define R5 C)
instead of
(set! R5 C)   .
Another difficulty is caused by possible double definitions in some environment: only the first definition is meaningful. We designed the trim-environment function to keep only first definitions of identifiers.

Source file: exec.scm.

Back to top.