Monday, May 16, 2011

Embedded scripting language in ActionScript

Update 7/16/2011 I have posted code for this.

I have recently been working on various projects both for myself and others but one thing which I have been focused on is creating a scripting language for the game I am working on. A lot of my entries so far have been about spin-off projects relating to the work I am doing for my ActionScript game and this is no different.

I knew that I wanted to create a scripting language that resembled C in it's structure. Ultimately, the idea was to have a function based language, however, where functions are written out in a style similar to C but there are special calls that can be made throughout the script and functions are either called by triggers on the map (handled by the game engine) or specially, such as scripts to run automatically at such points as level start, level end, et cetera.

The actual format of the scripting language is being handled separately. I have an ANTLR grammar that I am working on to parse the actual code and compile it and I will maybe post about that in the future, but this post documents handling and interpreting the intermediate representation which is what my compiler will eventually output from the C-like input.

My intermediate representation is an abstract syntax tree represented in XML. An abstract syntax tree is a representation of computational instructions and decisions in tree form that makes traversing source code much easier for a machine than directly parsing a more human readable sort of source code such as C. This XML representation is what my ActionScript interpreter actually parses.

You may wonder what the point is of having ActionScript once again parse code that has been translated ("compiled" in a sense) already from source code. Isn't this just making the code go through hoops without really doing anything? The reasoning is that the human readable source code (C style) needs to be error checked and compiled into a definitive tree structure of statements to execute. This tree structure happens to store well in XML format as well as ActionScript happens to parse XML through a built in library with ease. Ultimately, the compiler I am going to write creates an abstract syntax tree from a C like source code structure and saves the tree as XML. This abstract syntax tree is easier to digest than raw source code so parsing it is simply using an XML library to traverse the tree structure the file has as opposed to parsing a C style file which involves sorting out expressions and actually making decisions on how to handle a much looser syntax.

The actual ActionScript interpreter doesn't actually handle code "live", so to speak, from XML. This would be extraordinarily slow for operation handling. The interpreter actually contains a whole bunch of parts, one actual "interpreter" which loads and parses the XML file into memory and a virtual machine class that handles the execution of statements and other calls. Examples are the best way to illustrate how everything works.

void function startGame() {
 displayMessage("Display me!");
 for (int i = 0; i < 10; i++) {
  displayMessage(i);
 }
}

This becomes (in XML):

<script>
 <scriptFunction type="void" id="startGame">
  <statements>
   <statement type="displayMessage">
    <param id="1">
     <atom type="string"><![CDATA[Display me!]]></atom>
    </param>
   </statement>
   <statement type="for">
    <param id="1">
     <statement type="variableDeclaration">
      <param id="1">
       int
      </param>
      <param id="2">
       <atom type="variable">i</atom>
      </param>
     </statement>
     <statement type="setVariable">
      <param id="1">
       <atom type="variable">i</atom>
      </param>
      <param id="2">
       <atom type="int">0</atom>
      </param>
     </statement>
    </param>
    <param id="2">
     <statement type="compare">
      <param id="1">
       less_than
      </param>
      <left>
       <atom type="variable">i</atom>
      </left>
      <right>
       <atom type="int">10</atom>
      </right>
     </statement>
    </param>
    <param id="3">
     <statement type="setVariable">
      <param id="1">
       <atom type="variable">i</atom>
      </param>
      <param id="2">
       <statement type="arithmetic">
        <param id="1">
         add
        </param>
        <left>
         <atom type="variable">i</atom>
        </left>
        <right>
         <atom type="int">1</atom>
        </right>
       </statement>
      </param>
     </statement>
    </param>
    <statements>
     <statement type="displayMessage">
      <param id="1">
       <atom type="variable">i</atom>
      </param>
     </statement>
    </statements>
   </statement>
  </statements>
 </scriptFunction>
</script>

Update 5/23/2011 It should be noted that in the above XML I represent i++ and other increments as directly equivalent to i += 1;. This is a mistake because increment and decrement should be atomic and their action changes based on being a pre or post operation. i++ is different from ++i; the latter increments i and then returns it whereas the former returns i and then increments it. This means that instead of translating it into a simpler form, it simply would maintain the increment or decrement operator as a function in the AST, such as:

<statement type="increment">
 <param id="1">
  post
 </param>
 <param id="2">
  <atom type="variable">i</atom>
 </param>
</statement>

post is the type of increment, either post or pre.

The XML looks bloated and that is because it is a much more definitive way of stating what the C code says intuitively. As you can see, assignment and variable declaration as in int i = 0 becomes two separate statements (a variable declaration and variable assignment). This is an example of how the XML is more definitive; once again it really is just a tree of statements to handle.

The XML is loaded into the ActionScript interpreter and parsed into memory as ActionScript Array of statements. An ActionScript Array can be associative and also may hold values of varying types (as opposed to a C primitive array). The top level of the array contains all of the functions The representation of the function defined above looks like the following (note that this does not include the top level of the array which is an associative array of all functions which would contain only the above function plus some meta data for this example):

|-0 [statement]    #Top level statement 1
|--0 displayMessage   #Statement id ("displayMessage" statement)
|--1 [atom]    #Parameter 1 for statement, which is atom
|---0 String    #This atom is of type String
|---1 "Display me!"   #Atom value is String literal: "Display me!"
|-1 [statement]    #Top level statement 2
|--0 for    #Statement id ("for" statement)
|--1 [param]    #Parameter 1 of "for" statement
|---0 [statement]   #Statement 1 of parameter 1 of "for" statement
|----0 variableDeclaration  #Statement id ("variableDeclaration")
|----1 int    #Parameter 1; this says we are declaring an "int"
|----2 $var_i    #Parameter 2; Name of variable w/ $var_ prefix
|---1 [statement]   #Statement 2 of parameter 1 of "for" statement
|----0 setVariable   #Statement id ("setVariable" statement)
|----1 $var_i    #Parameter 1; this says we want to set "$var_i"
|----2 [atom]    #Parameter 2; atom to set variable to
|-----0 int    #This atom is of type int
|-----1 0    #Atom value is 0
|--2 [param]    #Parameter 2 of "for" statement
|---0 compare_left_less_than_right #Statement id
|---1 [atom]
|----0 variable    #Atom is a variable
|----1 $var_i    #Atom is $var_i
|---2 [atom]
|----0 int    #Atom is an int
|----1 10    #Atom has value of 10
|--3 [param]    #Parameter 3 of "for" statement
|---0 [statement]   #Statement 1 of parameter 3 of "for" statement
|----0 setVariable   #Statement id
|----1 $var_i    #Set variable $var_i
|----2 [param]    #Value to set $var_i to
|-----0 add    #Statement id (add)
|-----1 [atom]    #Param 1 is atom
|------0 variable
|------1 $var_i
|-----2 [atom]    #Param 2 is atom
|------0 int
|------1 1
|--4 [statements]   #Statements to execute inside of loop
|---0 [statement]   #First statement to execute
|----0 displayMessage   #Statement id is "displayMessage"
|----1 [atom]    #Param 1 is an atom
|-----0 variable
|-----1 $var_i

If we start of the top and execute each statement, going as deep as possible before moving down the list, we are essentially running depth-first through a tree. This is essentially what ends up happening except that all the statement IDs are wired up to special functions written in the virtual machine class. This means that a for loop, for example, is not executed by calling an ActionScript "for", but by calling a function that actually executes the first parameter and then runs a while loop that depends on the second parameter and inside of the while loop, loops through all statements the "for" loop is supposed to execute and then, finally, executes the third parameter.

/* Imagine
for (statement[1]; statement[2]; statement[3]) {
 statement[4][0];
 statement[4][1];
 ...
 statement[4][n];
}
*/
private function handle_for(statement:Array):void {
 handleParameter(statement[1]);
 incrementScope();
 while (handleParameter(statement[2])) {
  for (var i:uint = 0; i < statement[4].length; i++) {
   handleStatement(statement[4][i]);
  }
  handleParameter(statement[3]);
 }
 decrementScope();
}

This is a trimmed down version of the code. As you can see, the "for" loop is simulated by using a while loop and diving depth first into all of the parameters fed to the "for" loop. handleParameter() handles parameters by processing statements/atoms/expressions what-have-you and returning results, if necessary. In this example, it is black box just to demonstrate the principle behind how a statement handler works, specifically one that needs to replicate functionality such as looping and doesn't just call something elsewhere in the code like a displayMessage() might.

You may notice incrementScope() and decrementScope(). This is the next important part of the virtual machine: keeping track of variables. Variables don't just need to be handled in terms of storing data, however. They also need to be scoped properly; in other words, entering a new statement block increases the scope so all new variables enter the new scope and leaving a block pops that scope. I have chosen to make for loops contain their own scope (although variables in the for loop parameters are outside of the internal for loop scope and in it's parent instead).

I store my variables in associative arrays (built into ActionScript) and these associative arrays live on a stack of scopes or collections of variables. I also separately store variable type in a similar fashion. The first scope is the global scope because it always exists on the stack. Anytime a new scope is entered, a scope is pushed on the stack and all variable declarations automatically declare within the top-most scope. When a scope is left, it is popped off of the stack (it's contents are lost but worthless at that point anyway). However, variables can be called from parents scopes so all variable declaration checking, getting and setting involves looping through the scopes from top most to bottom most looking for the variable. This means that a variable is looked for in the top most scope but if it isn't found it is looked for in every other scope up until the global at which point if it isn't found, the appropriate errors or return values are used to declare a variable does not exist.

So far the interpreter is fairly young, but hopefully soon I will have an ANTLR based compiler and even more functionality in the virtual machine on ActionScript and the scripting language will be suitable for my game.

2 comments:

  1. I like ANTLRs but I prefer BOUFFALANTS :D

    ReplyDelete
  2. Thank you for this article and I have a question concerning your future articles about cloud technologies - any plans to write about virtual data rooms designed for business, or only personal clouds? Thank you in advance for your reply!

    ReplyDelete