This is the second post in this series about building a function grapher. In the first post I described the anatomy and stages of building. This part will provide a solution for parsing the formula in which a function is described. For example:

\( f(x) = sin(5x)*7^x \) or \( f(x) = e^{cos(x^{sin(x^2)})} \)

These examples were rendered using a \( \LaTeX \) rendering engine. The first one is represented by: sin(5x)*7^x and the second is represented by: e^{cos(x^{sin(x^2)})}. These strings are sent to a LaTeX rendering engine and returned as a gif image. The function grapher job is not that different. It takes a string, converts it into a workable data structure, the expression is evaluated for some points and the set of values is rendered on a canvas element. So instead of rendering the math symbols we evaluate the expression.

Before diving into the implementation, lets define what formulas we are supporting:

  • Functions are of one variable, x.
  • Composed of the following binary operations: Multiplication (*), Subtraction (-), Addition (+), Raising to power (^).
  • Negation is allowed:  -.
  • We allow all rational numbers and two constants: pi and e (Euler constant).
  • It is possible to use special functions like: cos, sin and log.
  • Parenthesis are allowed to bound a certain expression in order to break precedence order.

Parsing

As required in the first post, the parser should be implemented to use only client-side resources. I will use a LALR. The acronym translates to Look Ahead Left to Right which means that the input is parsed from left to right and looks ahead in order to understand what token it is parsing. I started by checking out JS/CC library. The library accepts a grammar in Bachus-Naur form styled syntax which is really intuitive and easy to learn. Manual and syntax specification are detailed in the library’s documentation. After playing around with it a bit I found its API is not mature enough and eventually decided to continue with ANTLR library which is much more mature. ANTLR did not support JavaScript output up until version 3.1 and currently in version 3.3 it does the job, although the JavaScript output is not fully supported in the UI. Like JS/CC it takes a ENBF grammar as an input and outputs a JavaScript lexer/parser code. The API is very good and based on the Java implementation. There are many languages for which ANTLR can generate lexer/parser code to, consult its website for current supported languages. ANTLR also ships with a Java based UI enviorment for writing and testing the grammars. This is not a post about ANTLR and if you wish to learn more about it you can consult its excellent documentation.

After some googling I came across the following http://www.codeproject.com/KB/recipes/sota_expression_evaluator.aspx. The article presents a grammar which is very close to what a function grapher needs. In addition to numeric math expressions the grammar in the article also supports equalities, inequalities, logical expression and some scripting. I have modified the grammar to suit the needs of a function grapher. Here it is:

grammar Formula;
options
{
	language=JavaScript;
	output=AST;
	ASTLabelType=CommonTree;
 
}
 
tokens
{
	NEGATE;
}
 
expression
	: 	additiveExpression EOF!
	;
 
additiveExpression
	:	multiplicativeExpression ( (PLUS|MINUS)^ multiplicativeExpression )*
	;
 
PLUS	:	'+';
MINUS	:	'-';
 
multiplicativeExpression
	:	powerExpression ( (MULT|DIV)^ powerExpression )*
	;
 
MULT	:	'*';
DIV	:	'/';
 
powerExpression
	:	unaryExpression ( POW^ unaryExpression )*
	;
 
POW	:	'^';
 
unaryExpression
	:	primaryExpression
    	|	MINUS primaryExpression -> ^(NEGATE primaryExpression)
   	;
 
primaryExpression
	:	'('! additiveExpression ')'!
	|	value
	;
 
value
	: 	INTEGER
	|	FLOAT
	|	VARIABLE
	| 	CONST
	|	function
	;
 
VARIABLE
	:	'x'
	;
 
CONST   :	'pi'
	|	'e'
	;
 
INTEGER
	:	('0'..'9')+
	;
 
FLOAT
	:	('0'..'9')* '.' ('0'..'9')+
	;
 
function
	:	IDENT '(' ( additiveExpression (',' additiveExpression)* )? ')' -> ^(IDENT additiveExpression*)
	;
 
IDENT
	:	('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
	;
WS
	:  (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;}
	;

Lets take a step back and break the grammar into smaller pieces. First we have the grammar declaration:

grammar Formula;

This tells ANTLT that the grammar’s name is Formula (the file name must be Formula.g). Next we have the options section:

options
{
	language=JavaScript;
	output=AST;
	ASTLabelType=CommonTree;
 
}

Under options we define that the output language should be javascript and that the output of the parser is AST = Abstract Syntax Tree. It means that we want the output to be a tree. It will be easier to evaluate the function using a tree structure.

The grammar has two parts that are merged in this grammar file: lexicon and rules. The lexer uses the lexicon and the parser uses the rules. The lexer takes the input string and parse it to lexemes (terminals in a grammar language) e.g. INTEGER, FLOAT, MINUS. For example:

VARIABLE
	:	'x'
	;
 
CONST   :	'pi'
	|	'e'
	;
 
INTEGER
	:	('0'..'9')+
	;

All lexemes names are in upper case and rule definitions are in lower case. After the lexer creates a lexemes stream from the input it passes the stream to the parser which builds the syntax tree. This process will be more clear once we get to the code.

ANTLR must have a start rule. It is defined by a template which ends with EOF which is a built in lexeme used to denote the end of the input.

 expression
	: 	additiveExpression EOF!
	;

We also don’t want the EOF to be a node in our final tree so we exlude it by placing a ! symbol after it. We can also rewrite rules like we did for the unary expression:

 unaryExpression
	:	primaryExpression
    	|	MINUS primaryExpression -> ^(NEGATE primaryExpression)
   	;

An unary expression has two options, it is either a primaryExpression or a negated primaryExpression. If it is the second we don’t want to have MINUS in the tree, this will be reserved for a binary operation. So we can rewrite the rule using the above syntax. Note that NEGATE has no lexeme definition, it is declared it the tokens { … } section at beginning of the file. The rest of the grammar is pretty straight forward.

After generating code for the grammar we get two files FormulaParser.js and FormulaLexer.js. We need them both. I found a small bug in the javascript code generator. When generating code with ANTLRWorks (the UI for ANTLR) for this grammar we have two lines which are not necessary and raises an error.

set7=input.LT(1);
set7=this.input.LT(1);

The first line is reduntent and we get an error saying that “input is undefined” – delete the first line. There is another line just like that with 4 instead of 7 – delete that line also. Other than that I did not find any problem with the code.

User Code

Finally we get to the user code. In order to use our grammar the following should be included:

  1. antlr3-all-min.js – The runtime framework for ANTLR javascript output. You can download it from http://code.google.com/p/antlr-javascript/.
  2. FormulaLexer.js – The generated lexer of our grammar.
  3. FormulaParser.js – The generated parser of our grammar.

Once everything is setup we can write the following code:

var input = "5*x+7";
var cstream = new org.antlr.runtime.ANTLRStringStream(bla);
var lexer = new FormulaLexer(cstream);
var tstream = new org.antlr.runtime.CommonTokenStream(lexer);
var parser = new FormulaParser(tstream);
var foo = parser.expression();

First we initialize a stream from the input. Passing it on to the lexer and from there to the parser. Nothing special about it. In order to parse the input we call to parser.expression which is the start rule for our grammar. Each rule has a corresponding function, so we can use only a part of the grammar each time. Now foo is the parsed tree object. For the purposes of the function grapher, I am converting the librarie’s structure to a much simpler one:

function convertToObject(node, parser) {
      var current =
      {
           text: node.getToken().getText(),
           type: node.getToken().getType(),
           typeName: parser.getTokenNames()[node.getToken().getType()],
           children: new Array()
      };
 
      //Add children
      if (node.getChildCount() > 0) {
            var children = node.getChildren();
            for (i in children) {
                    current.children.push(convertToObject(children[i], parser));
            }
       }
 
      return current;
}

Using that structure I have created a test page for the grammar using prettyPrint library which visualize javascript/JSON objects with an interactive interface. You can play here.

Resources

ANTLR Javascript Target