You can think of macros as a system of compile-time transformations and automatic code generation governed by a set of rules. They can be used to automate manipulations performed on similar data types and code fragments, to add syntax shortcuts to the language, and to optimize some computations or make them safer by moving them from runtime to compile time.
The idea of performing simple inline operations on code comes from preprocessor macros, which many languages (especially C and C++) have contained since the early days of compiler design. We follow them in the direction of much more powerful and, at the same time, more secure (type-safe) solutions, such as Haskell Template Meta-programming.
Basically, every macro is a function that takes some fragments of code as parameters and returns some other code. At the highest level of abstraction it doesn't matter whether the parameters are type definitions, function calls or just a sequence of assignments. The most important fact is that they are not ordinary objects (e.g. instances of defined types, like integers), but their internal representation in the compiler (i.e. syntax trees).
These functions are defined in a program just like any other functions. They are written in ordinary Nemerle syntax; the only difference is the structure of the data they operate on (we provide special ways to parse and generate syntax trees).
Macros, once defined, can be used to process parts of the code. This is done by calling them with blocks of code as parameters. In most cases this operation is indistinguishable from an ordinary function call, so a programmer using macros won't be confused by unfamiliar syntax. The main concept of our design is to make the use of macros as transparent as possible. From the user's point of view, it is not important whether particular parameters are passed to an ordinary function or to one that processes them at compile time and inserts some new code in their place.
Writing a macro is as simple as writing an ordinary function, except that it is preceded by the keyword macro. This tells the compiler how to use the defined method (i.e. to run it at compile time in every place where it is used).
Macros can take zero (if we just want to generate new code) or more parameters. Each parameter is an element of the language grammar, so its type is limited to the set of defined syntax objects. The same holds for the return value of a macro.
Example:
    macro generate_expression ()
    {
      compute_some_expression ();
    }
This example macro doesn't take any parameters, and it is used in code by simply writing generate_expression ();. The most important thing is the difference between generate_expression and compute_some_expression: the first is a function executed by the compiler during compilation, while the latter is just an ordinary function that must return the syntax tree of an expression (which is then returned by generate_expression and inserted into the program's code).
The definition of the function compute_some_expression might look like this:
    compute_some_expression () : Expr
    {
      if (debug_on)
        <[ System.Console.WriteLine ("Hello, I'm debug message") ]>
      else
        <[ () ]>
    }
The examples above show a macro that conditionally inlines an expression printing a message. It's not very useful yet, but it introduces the idea of compile-time computation and also some new syntax used only when writing macros and functions that operate on syntax trees. Here we have used the <[ ... ]> constructor to build the syntax tree of an expression (e.g. '()').
<[ ... ]> is used for both construction and decomposition of syntax trees. These operations are similar to quotation of code. Simply put, everything written inside <[ ... ]> stands for its own syntax tree. It can be any valid Nemerle code, so the programmer doesn't have to learn the compiler's internal representation of syntax trees.
    macro print_date (at_compile_time)
    {
      match (at_compile_time) {
        | <[ true ]> => print_compilation_time ()
        | _ => <[ WriteLine (DateTime.Now.ToString ()) ]>
      }
    }
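To make the decomposition concrete, here is a small sketch of how the print_date macro above might be called. The two call sites are hypothetical; the comments only restate what the match in the macro body does with each argument.

```nemerle
// Hypothetical call sites of the print_date macro defined above.

// The argument's syntax tree matches <[ true ]>, so the macro calls
// print_compilation_time () at compile time and inserts the syntax
// tree returned by that function.
print_date (true);

// Any other argument falls through to the default case, so the call
// is replaced by WriteLine (DateTime.Now.ToString ()), which reads
// the clock when the program runs.
print_date (false);
```

In both cases the decision is made by the compiler; no test of the argument remains in the generated code.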
Quotation alone allows only constant expressions to be used, which is insufficient for most tasks. For example, to write the function print_compilation_time we must be able to create an expression based on a value known at compile time. In the next sections we introduce the rest of the macro syntax for operating on general syntax trees.
When we want to decompose a large piece of code (or, more precisely, its syntax tree), we must bind its smaller parts to variables. Then we can process them recursively or just use them in an arbitrary way to construct the result.
We can operate on entire subexpressions by writing $( ... ) or $ID inside the quotation operator <[ ... ]>. This binds the value of ID, or of the parenthesized expression, to the part of the syntax tree described by the corresponding quotation.
    macro for (init, cond, change, body)
    {
      <[
        $init;
        def loop () : void {
          if ($cond) { $body; $change; loop() }
          else ()
        };
        loop ()
      ]>
    }
The above macro defines a function for, which is similar to the loop known from C. It can be used like this:
    for (mutable i <- 0, i < 10, i <- i + 1, printf ("%d", i))
Later we show how to extend the language syntax so that the syntax of for is exactly the same as in C.
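To illustrate the splicing, the following sketch shows roughly what the call to for shown above would be replaced with, obtained by substituting the four arguments for $init, $cond, $change and $body in the quotation. It is a hand-made expansion for illustration, not actual compiler output.

```nemerle
// Hand-written expansion of the example call (illustration only).
mutable i <- 0;                        // $init
def loop () : void {
  if (i < 10) {                        // $cond
    printf ("%d", i);                  // $body
    i <- i + 1;                        // $change
    loop ()
  }
  else ()
};
loop ()
```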
Sometimes quoted expressions contain literals (strings, integers, etc.) and we want to operate on their values, not on their syntax trees. This is possible because they are constant expressions, so their runtime value is known at compile time.
Let's consider the previously used function print_compilation_time.
    print_compilation_time () : Expr
    {
      <[ System.Console.WriteLine ($(DateTime.Now.ToString () : string)) ]>
    }
Here we see a new extension of the splicing syntax, in which we create the syntax tree of a string literal from a known value. This is done by adding : string inside the $(...) construct. One can think of it as enforcing the type of the spliced expression to be a literal (similar to ordinary Nemerle type enforcement), but as a matter of fact something more is happening here: a real value is lifted to its representation as the syntax tree of a literal.
Other types of literals (int, bool, float, char) are treated in the same way. This notation can also be used in pattern matching, which lets us match constant values in expressions.
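As a sketch of matching and lifting literals together, the hypothetical macro twice below doubles an integer literal at compile time and falls back to generating an addition for any other expression. The macro name and its behaviour are assumptions made for illustration; only the $(... : int) notation comes from the description above.

```nemerle
// Hypothetical macro, written in the notation used in this document.
macro twice (e)
{
  match (e) {
    // e is an integer literal: x is bound to its value, the product is
    // computed by the compiler and lifted back to an integer literal
    | <[ $(x : int) ]> => <[ $(x * 2 : int) ]>
    // any other expression: generate code that adds it to itself at
    // runtime (note that e is then evaluated twice)
    | _ => <[ $e + $e ]>
  }
}
```

With this definition, twice (21) would be replaced by a single literal, while twice (f ()) would be replaced by f () + f ().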
There is also a similar scheme for splicing and matching variables with a given name. $(v : var) denotes a variable whose name is equal to the value of v (which is of type string).
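A minimal sketch of the variable form, assuming a hypothetical helper function reset_variable that is called from a macro in the same way as compute_some_expression earlier:

```nemerle
// Hypothetical helper: builds the syntax tree of an assignment to the
// variable whose name is given as a compile-time string.
reset_variable (name : string) : Expr
{
  <[ $(name : var) <- 0 ]>
}
```

Under this assumption, reset_variable ("counter") would return the syntax tree of counter <- 0.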
Now that we have written the for macro, we would like the compiler to understand some changes to its syntax, in particular the C-like notation
    for (mutable i <- 0; i < n; i <- i + 1) {
      printf ("%d\n", i);
      sum <- sum + i;
    }
In order to do that, we have to define which tokens and grammar elements may form a call of the for macro. We do that by changing its header to
    macro for (init, cond, change, body)
    syntax ("for", "(", init, ";", cond, ";", change, ")", body)
The syntax keyword is used here to define the list of elements forming the syntax of a macro call. The first token must always be a unique identifier (from now on it is treated as a special keyword triggering the parsing of the defined sequence). It is followed by tokens composed of operators or identifiers passed as string literals, or by names of the macro's parameters. Every parameter must occur exactly once.
Parsing of a syntax rule is straightforward: tokens from the input program must match those from the definition, and parameters are parsed according to their type. The default type of a parameter is Expr, which is just an ordinary expression (consult the Nemerle grammar in reference.html). All allowed parameter types will be described in the extended version of the reference manual corresponding to macros.
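Putting the pieces together, the complete definition would presumably combine the new header with the body of the for macro given earlier. The sketch below simply joins the two fragments from this text; nothing else is added.

```nemerle
// The for macro with its syntax extension: header from this section,
// body unchanged from the earlier definition.
macro for (init, cond, change, body)
syntax ("for", "(", init, ";", cond, ";", change, ")", body)
{
  <[
    $init;
    def loop () : void {
      if ($cond) { $body; $change; loop () }
      else ()
    };
    loop ()
  ]>
}
```

With this header, the C-like loop shown above is parsed token by token against the syntax list, and the four parameters are bound to the expressions found between the fixed tokens.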