Infinity - An extensible programming language

Get the Infinity compiler (along with an example file and some basic libraries) and the LLVM interpreter here and unzip the archives into a folder of your choice.
Open a command prompt, switch to the folder with the content of the archives and type

Open the file example.ic and scroll all the way down. There you'll find the main function of the example program which will call the sum function defined above. All of the syntax used to express the main and the sum function can be found in the very same file, starting at the top (along with various rules that are just implemented for testing). Especially the rules for the while and the plus minus assign statements are of interest, since they introduce completely new type check and code generation semantics (but keep in mind that everything is still in development!).

Using the Compiler

Programming with Infinity

NOTE: the Infinity language and the available compiler are in active development and thus are subject to frequent changes. Therefore the informations provided here represent only the current state of development and may change in the future!

Basics

To define a new rule, at least the name of the rule and its structure has to be given. The structure of a rule is described in an EBNF-like syntax, providing elements for sequences, choices, options, repetitions, exceptions, groups, rule references and tokens. The following example defines a simple rule called print rule, containing only a sequence of a keyword, a colon and a string or a number (ignore the top level statement stuff for now; string and number are predefined token rules, see below):


rule<top level statement>:
    my syntax rule = "my syntax", ":", ( string | number );

To use this rule, simply type in the next lines:


my syntax: 42
my syntax: "foo"

To check the parsed syntax tree of this file, invoke the compiler with the -tree command line option. It will result in the following structure:


translation unit
|---<repetition> : statements
    |---my syntax rule
        |---[my syntax]
        |---[:]
        |---42
    |---my syntax rule
        |---[my syntax]
        |---[:]
        |---"foo"

Each line in the tree represents a node in the resulting syntax tree. The top node is the parsed translation unit (also a predefined rule), which consists of a collection (repetition) of top level statements. The child nodes of the repetition are the two parsed syntax rules that have been specified at the beginning of the parsed file. Each of them consists of three further nodes: the two parsed tokens (my syntax and the colon) and - according to the given choice - a number or a string token.

The following structure elements are available at the moment:

Available Structure Elements


Element              Syntax             Description

Choice               X | Y | Z          matches any of the given structures
Sequence             X, Y, Z            matches the given structures in the given order (has precedence over Choice)
Exception            X - Y              matches the given structure X, but not Y
Option               [ X ]              matches the given structure, if possible
Repetition           { X }              matches the given structure repeatedly
Group                ( X )              matches the given structure; used to override precedences, e.g.: X, ( Y | Z )
Rule reference       rule name          matches the given syntax rule or any defined syntax rule below the given one in the hierarchy (see below)
Token                "token string"     matches the given token

Whitespaces and newlines will be ignored during the matching (in the future, some mechanism will be implemented to instruct the parser to pay attention to newlines and whitespaces). Exceptions have precedence over sequences, which in turn have precedence over choices.

Structure Properties

Each syntax structure within a rule definition can be given some properties to either identify the corresponding parsed syntax tree node later on during the type check and code generation or to give the matching algorithm some additional informations for the rule matching process. Structure properties are given as identifier-value-pair in angle brackets after each structure; for some properties, the value might be omitted. Consider the following example for structure properties:


rule:
    simple variable declaration = identifier<name="type name">, "element", { identifier<name="var name"> }<name="var names", separator=",", min=1>;

Here, the two identifiers are given a property "name" with a value according to the semantic meaning of that identifier. Additionaly, the repetition structure has been given three properties, two of them being unique to repetitions. The following properties are currently available:


Property        Value Type        Description

name            string            gives a syntax structure a name to identifier it later on
separator       string            [only repetitions] defines the separator token required between elements of the repetition
min             number            [only repetitions] minimum count of elements to be found for a repetition to match
max             number            [only repetitions] maximum count of elements to be found for a repetition to match

Rule Hierarchy

Defined syntax rules are ordered hierarchically, much like classes in object oriented programming languages.

Inheritance

Each defined rule can be given a parent rule from which it inherits various informations (multiple inheritance may follow in the future). Moreover, if a rule is sub-rule to another rule, the sub-rule can be matched at each position in the code where actually the parent rule is expected. Let's have a look at an example for this behavior. Supposing a given rule my rule


rule:
    my rule = "parent";

a child rule can be defined as follows:


rule<my rule>:
    my child rule = "child";

Supposing a third rule my top level statement


rule<top level statement>:
    my top level statement = "parse", ":", my rule;

the following lines would both be valid code, since any rule in the rule hierarchy equal to or below my rule may occur after the colon:


parse: parent
parse: child

The corresponding syntax tree will look as follows, showing that the correct rules have been parsed:


translation unit
|---<repetition> : statements
    |---my top level statement
        |---[parse]
        |---[:]
        |---my rule
            |---[parent]
    |---my top level statement
        |---[parse]
        |---[:]
        |---my child rule
            |---[child]

Abstract Rules

To extend the idea of rule inheritance even further, it is also possible to define so-called abstract rule. An abstract rule is basically a rule without a syntax body. Thus it can never be parsed, but it can be referenced in other rule definitions. Abstract rules can be derived from other abstract rules, but - in contrast to e.g. Java - also from normal rules. Consider the following example (it should be self-explanatory by now):


rule:
    my abstract rule;
	
rule<my abstract rule>:
    my first abstract rule implementation = "this";
	
rule<my abstract rule>:
    my second abstract rule implementation = "that";
	
rule<top level statement>:
    my rule = "parse", ":", my abstract rule;
	
parse: this
parse: that

Required Children

Each rule and abstract rule can define a set of required children which have to be provided by each rule implementing this rule. Required children can be defined as follows:


rule:
    my rule or abstract rule;

    requires:
        foo = identifier;
        bars = { identifier<name="bar"> };
        baz;

Each rule that wants to inherit from my rule or abstract rule has to implement the three children foo, bar and baz with at least the required structure (if any is given). A rule has to implement the required children of all of its parent rules. The name of each child has to be the name of the implementing syntax structure [in the future, nameless children might be allowed]. A valid rule derived from my rule or abstract rule might look as follows:


rule<my rule or abstract rule>:
    my derived rule = identifier<name="foo">, { identifier<name="bar">, some other referenced rule }<name="bars", separator=",", min=1>, ";"<name="baz">;

Semantics and Code Generation

The definition of new syntax rules alone is not sufficient to have a full-fledged compiler; it is also required to give them some form of meaning. Infinity supports three ways to define semantical meaning and code generation logic for a rule or abstract rule, but all tree boil down to the same idea. All three mechanisms apply both to rules and to abstract rules; thus, mentioning of the term "rule" also implies "abstract rules" in the following chapters.

Functionality Inheritance

The easiest way to achieve this goal is by inheritance. Each rule can define its functionality which will be passed on to any derived rule. Thus, existing rules can be used to provide new syntax for already known functionality. This mechanism is the key for defining any syntax with Infinity since the core language comes only with syntax rules for the new rule definitions and some abstract rules for basic functionality (see Predefined Abstract Rules and Predefined Rules). The following example makes use of some of this predefined basic abstract rules to define a simple function definition, a return statement and a sum expression (rule parents ending in "base" are predefined rules; "reduced expression" is also one which reduces its parsed syntax tree to its only child, is possible):


rule<variable declaration base>:
    common variable = { identifier<name="variable name"> }<name="variable names", min=1, separator=",">, "element", type specifier<name="variable type">;

rule<function base>:
    function = identifier<name="functionName">, 
               ":",
               "(",
               { variable declaration base<name="parameter"> }<name="parameters", separator=",">, 
               ")",
               "->",
               type specifier<name="returnType">, 
               ":",
               statement<name="functionBody">;
	
rule<void type specifier base>:
    void type specifier = "(", ")";

rule<simple type specifier base>:
    simple type specifier = identifier<name="typeName">;

rule<number expression base>:
    number expression = number<name="number">;

rule<var ref base>:
    varRef = identifier<name="ref">;

rule<reduced expression>:
    literal = varRef | number expression base;
	
rule<add base>:
    add = literal<name="expr1">, { ( "+", literal<name="expr2"> ) }<name="children">;
		
rule<return base>:
    return = "return", [ add<name="value"> ], ";";

With this simple set of rule definitions, the following code can be written ("N" is a predefined natural type):


f: ( n1, n2 element N ) -> N:
    return n1 + n2 + 1;

This function is fully usable and can be called with two natural parameters, returning the incremented sum of them. All the functionality required to achieve this is provided by the used (in this case: predefined) rules. The predefined rules even provide proper type checking, thus the code


g: () -> Z:
    return n1;

will lead to some error messages like the following:


Error: .\web.ic (75|10): Unknown type specifier: 'Z'
Error: .\web.ic (76|12): Unknown identifier: 'n1'

Functionality Definition

The functionality methods for a rule can not only be inherited from parent rules but can also be defined by the programmer itself. A set of methods can be given for each rule to express its meaning; each of the available methods can be added independently to a rule, overwriting potential methods inherited from parent rules. Each available method inherently defines a parameter "this" which will refer to the parsed syntax tree node described by the containing rule definition and which may be used to retrieve the child nodes of this node. Additionally, a parameter "parser" will be defined that references a parser object which in turn provides some functionality required later on. Below is a list of the methods (refered to as "rule methods" in the following) supported by the Infinity language.

Note: All further examples will use a basic language that comes with the Infinity compiler as a library and which should be self-explanatory. It will be syntax-highlighted for convenience.

postparse

This function will be called for each node in a translation unit after the unit has been parsed completely. It will be called first for the children of a node, than for the node itself. The function requires a syntax tree node as return value, which will replace the originally parsed syntax tree node (and its children) with the returned one (returning a null value of the original node itsel will not change anything). The method allows to transform a syntax structure into another which possibly has already some functionality attached to it; thus it is the second possibility after function inheritance to define the meaning of a rule. Consider the following example, which will transform a parsed do-while statement into its corresponding while statement (consider the used rules as given) [Beware: at its current state of development, the compiler is lacking the support for the new expression, so the following code won't work at the moment]:


rule<statement>:
    do while statement = "do", ":", statement<name="body">, "while", ":", expression<name="condition">, ";";

    postparse:
        compoundStatementNode, whileStatementNode element SyntaxTreeNode;

        compoundStatementNode <- new SyntaxTreeNode( "compound statement" );
        whileStatementNode <- new SyntaxTreeNode( "while statement" );

        whileStatementNode.addChild( this.getChild( "condition" ) );
        whileStatementNode.addChild( this.getChild( "body" ) );

        compoundStatementNode.addChild( this.getChild( "body" ).clone() );
        compoundStatementNode.addChild( whileStatementNode );
		
        return compoundStatementNode;

Although the postparse function will be called for the parsed nodes only after the whole translation unit has been parsed, there is one exception to this rule: the postparse function (and all further functions) will be immediately called for all parsed rule methods after the rule has been parsed to allow the functions to be used instantly.

check

The check function can be used to compute the type a syntax tree node (e.g. an expression) would evaluate to, hence it requires a type as return value ("type" as in "type information"; not to be confused with a declared type like a class). The type of a node can be computed either during the check function call or during a precheck call and stored within the node for later use (see Information Storage). An important thing to note is that the check function will not be called for all nodes automatically; instead the programmer has to call it manually for the child of a node for which he wants to know the type. A typical use of the check function would look as follows (the "parser" parameter has to be passed on to the check function at the moment):


rule:
    my add = expression<name="lvalue">, "+", expression<name="rvalue">;

    check:
        lvalueNode, rvalueNode element SyntaxTreeNode;
        lvalue, rvalue element Type;

        lvalueNode <- this.getChild( "lvalue" );
        rvalueNode <- this.getChild( "rvalue" );

        lvalue <- lvalueNode.check( parser );
        rvalue <- rvalueNode.check( parser );

        if lvalue = rvalue:
            return lvalue;
        else:
            parser.emitError( "Non-matching types in 'add'", this.getPosition() );
            return null;

There exists a default implementation for the check functions, which will simply call the check function of all children of a node and then return a void type. The default function can be defined as follows:


rule:
    var decls = "decls", ":", { variable declaration }<min=1>;

    check = default

postcheck

The postcheck function is similar to the postparse function with the only difference that it will be called after the code has been checked. Code transformations at this point are possible (e.g. if a rule shall use the code generation functionality of another rule but requires specific type checking) but have to be handled with care, since the generated code will not be checked again [in future, a second precheck-check run might be inserted to counter this drawback].

generate

The generate function finally can be used to generate concrete code for a syntax rule; to generate executable code, the Infinity compiler uses the LLVM Compiler Infrastructure. The LLVM framework comes with a code generator that enables the user to easily create code in an intermediate language (the so-called LLVM IR or LLVM intermediate representation). This code generator can be utilized through an easy-to-use API; all the functions of this API are also available within the Infinity code (further informations about using the API can be found on the LLVM homepage). To generate code using the LLVM API, the generate functions have to return the LLVM values generated by the API. Like the check functions, the generate functions of the children of a node have to be called manually. A simple example for the code generation, which will - when executed - generate a simple null-terminated LLVM string, would look as follows:


rule:
    magic string = "magic string";

    generate:
        return LLVMConstString( "foo", 3, 0 );

Similarly to the check function, a default function can be defined for the generate function, which will simply call the generate function of all children of a node and then return a null value. The definition of the default function is the same as for the check function.

Passes

Passes are used to gather various information about the program before the actual type checking occurs. This is espacially important in languages where the order of statements (e.g. the order of function declarations within a class) shall be of no importance. A pass will be run on all nodes of the parsed syntax tree, unless specified otherwise. For a pass to have an effect on a specific syntax tree node, the corresponding rule must provide a method with an appropriate implementation or the pass must provide a default method for all nodes.

Definition

Passes are implemented similar to rules by giving them a name:


pass {
    my pass;
}

Implementation

If a syntax tree node generated by a rule shall be evaluated by the pass, the rule has to provide an implementation for this pass. This can be done in the following way:


rule:
    my rule = identifier<name="name">;

    implement pass: my pass = {
        printf( this.getChild( "name" ).toString() );
    }

Or, if the pass implementation consists only of a single statement:


rule:
    my rule = identifier<name="name">;

    implement pass: my pass = printf( this.getChild( "name" ).toString() );

Restrictions

Passes can be restricted to certain rules, so that they will only be executed on nodes which are generated by one of this rules or sub-rules. Restrictions can be given as follows:


rule: my restriction rule 1;
rule: my restriction rule 2;

pass {
    my restricted pass;
	
    restrict to: my restriction rule 1, my restriction rule 2;
}

Default Implementation

Each pass can define a default implementation, which will be executed for all nodes that do not specify a pass implementation on their own. Since the default implementation will be executed for all nodes the pass will be executed on, it should be used with care (and in most cases only for restricted passes). The default implementation can be given as follows (the this variable will point to the node for which the pass will be executed):


pass {
    my default pass;
	
    default implementation = printf( this.toString() );
}

Order

Passes can specify the order in which they are executed. This is done by listing all the passes that should be executed before and after a certain pass. If a configuration of before and after specifications leads to a non-resolvable order of passes, the compiler will emit an error. The before and after passes can be specified as follows, leading to a configuration, where pass 4 is always executed before pass 1, but after pass 2 and pass 3; the order of pass 2 and 3 is at random:


pass { my pass 1; }
pass { my pass 2; }
pass { my pass 3; }

pass {
    my pass 4;

    before: my pass 1;
    after: my pass 2, my pass 3;
}

Propagation

Passes can be propagated through the syntrax tree in two manners, by either running the pass first on the children of a node and then on the node itself or by running it first on the node itself and then on the children. Passes can be specified to use one of this propagation modes. Additionally, a third mode can be specified, which indicates that no propagation shall be done; as soon as a pass has been run on a node (i.e. the nodes pass implementation or the pass' default implementation have been triggered), it will not be propagated to nodes below that node in the tree. The propagation mode of a pass can be specified as follows:


pass { 
    my pass 1;

    propagation: children last;
}

pass { 
    my pass 2;
    
    propagation: children first;
}

pass { 
    my pass 3;
    
    propagation: none;
}

Available Passes

Currently, the following passes are available by default:


pass {
    collect types;

    propagation: children first;
}

pass {
    check variables;

    propagation: children first;
    after: collect types;
}

pass {
    check functions;

    propagation: children first;
    after: check variables;
}

Self-extending capabilities of Infinity

It is important to keep in mind that rules, which have previously been defined, can also be used to define the functionality of new rules. Consider the following code and note how the just defined rules if and return are used to define the functionality of the new rule my choice (consider the member function call and comparison operation as already defined in just the same way):


rule<if base>:
    if = "if", expression<name="condition">, ":", statement<name="ifBody">, [ "else", ":", statement<name="elseBody"> ];

rule<return base>:
    return statement = "return", expression<name="value">, ";";	
	
rule<statement>:
    my choice = "choose", "(", ( "left"<name="left"> | "right" ), ")", identifier<name="A">, "|", identifier<name="B">;

    postparse:
        if this.getChild( "left" ) /= null:
            return this.getChild( "A" );
        else:
            return this.getChild( "B" );

Information Storage

[to be added]

Error reporting

[to be added]

Predefined Abstract Rules

[to be added]
[See the example file for the usage of the currently available abstract rules.]

Predefined Rules

[to be added] [See the example file for the usage of the currently available rules.]

Predefined Types

[to be added] [See the example file for the usage of the currently available types.]