Antlr tutorial: Hello Antlr

Domain-specific languages (DSLs) are great tools for communicating with non-programmers. This group typically includes business users who would like to configure a system or a rule using a fluent, near-natural language. It also includes people like my 8-year-old neighbor, who knows absolutely nothing about programming. He would love to tell the computer how to perform a small series of operations without delving into the specifics. Coincidentally, I have been reading up on approaches to building DSLs and was introduced to ANTLR.

Enter ANTLR:

What my neighbor needs is an English-like grammar. Input written in this grammar needs to be parsed into something meaningful at runtime, and every time the grammar changes, the parser needs to change too. ANTLR is a ‘parser generator’: once a grammar is defined, ANTLR can generate a lexer and a parser for it. The lexer splits any input that adheres to the grammar into tokens, and the parser makes sense of these tokens.

A GUI by the name ANTLRWorks helps you write the grammar and generate its corresponding lexer and parser code. Let's take a look at an example that asks the computer if a number A is ‘bigger than’ or ‘smaller than’ a number B.

Before you delve into the code, it might be a good idea to download the ANTLR library and the related ANTLRWorks language workbench. The grammar syntax and the org.antlr.runtime classes used below are those of ANTLR 3.

The grammar:

Here is the grammar presented in full. Let's analyze it one piece at a time. (Some parts of the grammar are based on examples from ANTLR. You will find it easier to delve into the documentation that way.)

grammar SimpleCalc;
 
@header {
  package com.example.antlr;
}
 
 
 
@members {
    private String answer = "";
 
    private void setAns(String num1, String num2)
    {
        Integer a = Integer.valueOf(num1);
        Integer b = Integer.valueOf(num2);
        if(a>b)
        {
            answer = "yes";
        }
        else
        {
            answer = "no";
        }
    }
}
 
 @lexer::header {
  package com.example.antlr;
}
 
/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
 
is    returns [String expr]:    'IS' 
                (
                a=NUMBER 'BIGGERTHAN' b=NUMBER {setAns($a.text,$b.text); } 
                | 
                a=NUMBER 'SMALLERTHAN' b=NUMBER {setAns($b.text,$a.text); }
                )
                {$expr=answer;};
 
/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
 
NUMBER    : (DIGIT)+ ;
 
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+     { $channel = HIDDEN; } ;
 
fragment DIGIT    : '0'..'9' ;

Grammar definition:

grammar SimpleCalc;
 
@header {
  package com.example.antlr;
}

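This defines the name of the grammar and, through @header, the package under which the parser code is generated. The lexer gets its package from a separate @lexer::header section (see the full grammar above), so lexer and parser code can be generated under different packages.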

Custom Java code:

@members {
    private String answer = "";
 
    private void setAns(String num1, String num2)
    {
        Integer a = Integer.valueOf(num1);
        Integer b = Integer.valueOf(num2);
        if(a>b)
        {
            answer = "yes";
        }
        else
        {
            answer = "no";
        }
    }
}

This is a piece of Java code. It is embedded into the generated parser and can be called when a rule is executed. We maintain a private String by the name ‘answer’. When the rule that compares two numbers is matched, the setAns() method is called with the two numbers, and it sets ‘answer’ to ‘yes’ if the first is greater than the second and to ‘no’ otherwise.

Lexer rules:

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
 
NUMBER    : (DIGIT)+ ;
 
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+     { $channel = HIDDEN; } ;
 
fragment DIGIT    : '0'..'9' ;

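In the lexer section, we describe the tokens that the parser can make sense of. A DIGIT ranges from 0 to 9; because it is marked as a fragment, it never becomes a token on its own and only serves as a building block for other lexer rules. When the lexer encounters one or more DIGITs, it emits a NUMBER token. We also tell ANTLR that whitespace needs to be ignored: setting $channel = HIDDEN sends those tokens to a hidden channel that the parser does not see.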
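To make the lexer's job more concrete, here is a small hand-written sketch in plain Java of what these rules amount to. It is not the code ANTLR generates, and the class name LexerSketch and the "NUMBER(...)" token notation are made up for illustration. As a simplification, any run of upper-case letters is treated as a keyword, whereas the real grammar only knows the literals 'IS', 'BIGGERTHAN' and 'SMALLERTHAN'.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of the lexer rules; NOT the code ANTLR generates.
class LexerSketch {

    // fragment DIGIT : '0'..'9'
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // WHITESPACE : '\t' | ' ' | '\r' | '\n' | '\u000C'  ('\f' is '\u000C')
    private static boolean isWhitespace(char c) {
        return c == '\t' || c == ' ' || c == '\r' || c == '\n' || c == '\f';
    }

    // Turns the input into a list of token descriptions, dropping whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (isWhitespace(c)) {
                i++; // skipped here; ANTLR would route it to the hidden channel
            } else if (isDigit(c)) {
                // NUMBER : (DIGIT)+
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("NUMBER(" + input.substring(start, i) + ")");
            } else if (c >= 'A' && c <= 'Z') {
                // Simplification: any run of upper-case letters counts as a keyword.
                while (i < input.length()
                        && input.charAt(i) >= 'A' && input.charAt(i) <= 'Z') {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [IS, NUMBER(19), SMALLERTHAN, NUMBER(20)]
        System.out.println(tokenize("IS 19 SMALLERTHAN 20"));
    }
}
```

The parser never sees the spaces; it only receives the keyword and NUMBER tokens, in order.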

Parser rules:

/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
 
is    returns [String expr]:    'IS' 
                (
                a=NUMBER 'BIGGERTHAN' b=NUMBER {setAns($a.text,$b.text); } 
                | 
                a=NUMBER 'SMALLERTHAN' b=NUMBER {setAns($b.text,$a.text); }
                )
                {$expr=answer;};

The parser section of the grammar defines an ‘is’ rule that returns a String. The pattern to match is ‘IS’ NUMBER (‘BIGGERTHAN’ | ‘SMALLERTHAN’) NUMBER. The two matched NUMBER tokens are labeled a and b, and their text is available as $a.text and $b.text. The code inside the {} braces is embedded into the parser, just like the code inside @members. As such, this code can call the methods in @members, and it is executed when the corresponding alternative is matched.

The match ‘IS NUMBER BIGGERTHAN NUMBER’ results in the call setAns($a.text,$b.text);. When the match is for the ‘SMALLERTHAN’ input, the numbers are simply reversed in the method call. Because setAns() uses a strict comparison, two equal numbers yield ‘no’ for either keyword.
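The behavior of the ‘is’ rule can be approximated by hand in a few lines of plain Java. This is only a sketch of the rule's logic, not what ANTLR generates: the class name IsRuleSketch is hypothetical, setAns() returns its result instead of storing it in a field, and the input is split on whitespace rather than tokenized by a lexer.

```java
// Hand-written stand-in for the 'is' rule; NOT the code ANTLR generates.
class IsRuleSketch {

    // Same comparison as setAns() in @members, but returning the answer
    // directly: "yes" when num1 > num2, otherwise "no".
    static String setAns(String num1, String num2) {
        int a = Integer.parseInt(num1);
        int b = Integer.parseInt(num2);
        return a > b ? "yes" : "no";
    }

    // Expects: IS NUMBER (BIGGERTHAN | SMALLERTHAN) NUMBER
    static String is(String input) {
        // Simplification: split on whitespace instead of running a real lexer.
        String[] t = input.trim().split("\\s+");
        if (t.length != 4 || !t[0].equals("IS")) {
            throw new IllegalArgumentException("Expected: IS <number> <keyword> <number>");
        }
        if (!t[1].matches("[0-9]+") || !t[3].matches("[0-9]+")) {
            throw new IllegalArgumentException("Both operands must be NUMBERs");
        }
        switch (t[2]) {
            case "BIGGERTHAN":
                return setAns(t[1], t[3]);   // a > b ?
            case "SMALLERTHAN":
                return setAns(t[3], t[1]);   // operands swapped: b > a ?
            default:
                throw new IllegalArgumentException("Unknown keyword: " + t[2]);
        }
    }

    public static void main(String[] args) {
        System.out.println(is("IS 19 SMALLERTHAN 20")); // prints yes
        System.out.println(is("IS 19 BIGGERTHAN 20"));  // prints no
        System.out.println(is("IS 5 SMALLERTHAN 5"));   // prints no (strict comparison)
    }
}
```

Swapping the operands lets one comparison method serve both keywords, which is the same trick the grammar uses.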

Running the pieces:

Now that the grammar is in place, it can be debugged in ANTLRWorks, or the code can be generated and ported to a Java project. Debugging in ANTLRWorks reveals interesting internal details. One can visualize how ANTLR goes about generating a parse tree for our grammar or what the syntax tree for the ‘is’ rule looks like.

ANTLRWorks:

Syntax tree:

The code-gen route allows one to execute the parser for a user defined input.

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
 
public class CalcTest
{
    public static void main(String... args)
    {
        new CalcTest().go();
    }
 
    public void go()
    {
        try
        {
            SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRStringStream("IS 19 SMALLERTHAN 20"));
            CommonTokenStream tokens = new CommonTokenStream(lex);
            SimpleCalcParser parser = new SimpleCalcParser(tokens);
            String eval = parser.is();
            System.out.println(eval);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}

Modifying bits of the program above would allow the input to be obtained from the console. So my neighbor types in ‘IS 20 BIGGERTHAN 19’ and is happy with the result. Now he wants me to make a language that will print multiplication tables. His homework is due this week :D
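One possible shape for that console modification is sketched below. The class name ConsoleLoopSketch, the ‘quit’ command and the placeholder evaluator are assumptions made for illustration. The evaluator is passed in as a function so that the sketch compiles without the ANTLR jar; in the real program it would wrap the lexer and parser calls from CalcTest.go(), including handling of the RecognitionException that parser.is() can throw.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.util.function.Function;

// Sketch of a read-evaluate-print loop around an arbitrary evaluator.
class ConsoleLoopSketch {

    // Reads one statement per line, evaluates it, and prints the result.
    // Blank lines are skipped; "quit" (a made-up command) or end of input stops the loop.
    static void run(BufferedReader in, PrintStream out, Function<String, String> evaluate) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.equalsIgnoreCase("quit")) {
                    break;
                }
                out.println(evaluate.apply(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder evaluator: it only echoes the statement.
        // Replace it with a function that builds SimpleCalcLexer and
        // SimpleCalcParser for the line and returns the result of parser.is().
        Function<String, String> placeholder = statement -> "would parse: " + statement;
        run(new BufferedReader(new InputStreamReader(System.in)), System.out, placeholder);
    }
}
```

Keeping the loop separate from the evaluator means the same loop keeps working if the grammar grows beyond the ‘is’ rule.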





  1. jbb
    February 21st, 2011 at 11:01 | #1

    The problem with ANTLR is that everything is mixed, from grammar to Java code.
    I prefer SableCC: nothing is mixed.
    1. You define a grammar, a pure grammar without any Java code.
    2. SableCC generates the lexer/parser and Java visitor code.
    3. You subclass the visitor, and here, and only here, you put your Java code.

    This is a clearer approach. ANTLR looks powerful but bloated, while SableCC is still powerful but not bloated.
    http://sablecc.org/

  2. March 18th, 2011 at 08:41 | #2

    @jbb
    Interesting. I have never used SableCC and I have no clue about its adoption in the outside world. ANTLR is pretty well adopted.

  3. usharp
    April 8th, 2011 at 17:09 | #3

    I was hunting for a good ANTLR example, and I finally found it here.

    It really helped me today to understand ANTLR properly.

    I had been trying to understand it for 3 months, and no other example made it this clear.

    It would be great if a C# version of the same example were available.

    Thx

  4. DEEPAK
    November 17th, 2012 at 12:31 | #4

    Kindly tell me how to install ANTLR in Eclipse. I also have no idea how to check whether it is installed correctly.
