Thursday, July 10, 2008

The state of the advanced compiler

First a disclaimer:
When I say that I will blog regularly, obviously you should not trust me!
I have realized that I don't want to get better at blogging regularly, since I kind of think it's boring and diverts my focus from the more important stuff, the code. But this does not mean that I will not blog more frequently in the future, I might do that, all I am saying is that I will never promise to do more blogging.
Even if I am not blogging on it, the advanced Jython compiler is making progress, just not as fast as I would like it to... So the current state of things in the advanced branch of Jython is that I have pluggable compilers working, so that I can switch which compiler Jython should use at any given time. This enables me to test things more easily, and to evolve things more gradualy.
I am still revising my ideas about the intermediate representation of code, and it is still mostly on paper. My current thinking is that perhaps the intermediate representation should be less advanced than I first had planed. I will look more at how PyPy does this, and then let the requirements of the rest of the compiler drive the need for the IR.
An important change in diraction was made this week. This came from discussion with my mentor, Jim Baker, and from greater insight into how PyPy and Psyco works, form listening to the brilliant people behind these projects at EuroPython. I had originally intended the advanced compiler to do most of it's work up front, and get rid of PyFunctionTable from the Jython code base. The change in direction is the realization that this might not be the best aproach. A better aproach would be to have the advanced compiler work as a JIT optimizer, optimizing on actual observed types, which will probably give us a greater performance boost. This also goes well with the idea that I have always intended of having more specialized code object representations for different kinds of code.
The way responsibilities will be divided in between object kinds in the call chain is:
  • Code objects contain the actual code body that gets executed by something.
  • Function objects contain references to the code objects. This starts out with a single reference to a general code object, then as time progresses, the function gets hit by different types, which will trigger an invocation on the advanced compiler that will create a specialized version of code, that will also be stored in the function for use when that particular signature is encountered in the future.
    The functions also contain the environment for use in the code. This consists of the closure variables of the function, and the global scope used in the function.
  • Frame objects provide the introspection capabilities into running code as needed by the locals() and globals() functions, and pdb and similar tools. There should be a couple of different frame implementations:
    • Full frames. These contain all of function state. Closure variables, locals, the lot. These are typically used with TableCode implementations of code.
    • Generator frames. These are actually divided into two objects. One generator state object, that (as the name suggests) contain the state of the generator. This is most of what a regular frame contains, except the previous frame in the call stack. The other object is an object that supports contains the previous frame in the call stack, and wraps generator state object to provide the frame interface.
    • Lazy frames. These are frames that contains almost nothing. Instead they query the running code for their state. I hope to be able to have them access the actual JVM stack frames for this, in which case they will be really interesting.
    The function object should be responsible for handling the life cycle of the frame objects, but I have not entierly worked out if the creation of the frame object should be up to the code object or the function object. The code object will know exactly wich implementation to choose, but then again, we might want to have different function implementations as well, so it might make sense to the entire responsibility of frames to functions.
  • Global scope objects. The global scope could be a regular dictionary. But if we have a special type for globals (that of course supports the dictionary interface) we can have code objects observing the globals for changes to allow some more aggressive optimizations, such as replacing Python loops (over range) with Java loops (with an int counter), so that the JVM JIT can perform all of its loop unrolling magic on it.
  • Class objects. These are always created at run time, unlike regular JVM classes which are created at compile time. Since classes are defined by what the locals dictionary looks like when the class definition body terminates it is quite hard to determine the actual class, as created at "define time", will look like. Although in most cases we can statically determine what most of the class will look like.
  • Class setup objects. These are to class objects what code objects are to. These contain the code body that defines a class but also a pre-compiled JVM class that contains what the compiler has determined the interface of the class to be.
    Both class objects and class setup object are fairly far into the future though, and will not be part of the initial release of the advanced compiler. They might in fact never be, if I find that there is a better way of doing things before I get there.
The other interesting change that the advanced compiler project will introduces, after these specialized call path objects are of course the actual optimizations that they enable:
  • Type specializations.
    • Primitive type unpacking. This is of course the first, most simple, and most basic type specialization. When we detect that a specific object (often) is of a primitive number type and used extensively, we can generate a version where the number is unpacked from the containing object, and the operations on the number are compiled as primitive number operations.
    • Direct invocation of object types. When we detect more coplicated object oriented types we can determine the actual type of the object and find the actual method body for the method we are invoking and insert direct linkage to that body instead of going through the dispatch overhead.
  • Inlining of builtins. When we detect that a highly used builtin function is extensively used we can inline the body of that builtin, or invoke the builtin function directly without dispatch overhead.
  • Expression inlining. Some expressions, in particular generator expressions, imply a lot of over head, since they generate hidden functions, that are only invoked localy, quite often at the same place as they are defined. In this case we can immediatley inline the actual action of the expression. So that for exampel a Python expression such as this:
    string = ", ".join(str(x) for x in range(14, 23))
    Could be transformed into the following Java snippet:
    StringBuilder _builder = new StringBuilder();
    for (int _i = 14; _i < 22; _i++) {
    _builder.append(", ");
    string = _builder.toString();

    Or in this particular case, perhaps even constant folded...
  • Loop transformation. As in the case above the loop over the Python iterator range can be transformed to a regular JVM loop. This might just be a special case of builtin inlining though.
The combination of the abstraction and indirection in between the function object and code object and these optimizations are exactly the powerful tool that we need to be able to do really interesting optimistic optimizations while maintaining the ability to back out of such decisions, should they prove to have been too optimistic. All in all providing Jython with a fair improvement in execution speed.

So that's a summery on what the current work in progress is. If have i look into my magic 8-ball and try and predict the future I would say that one interesting idea would be to have the compiler be able to persist the optimized code versions so that the next time the program is executed, the previous optimizations are already there. This would in fact be a good way of supporting the "test driven oprimizations" that you might have heard Jim and/or me rant about. So there is defenetly cool stuff going on. I cant wait to write the code, and get it out there, which is why I hereby terminate this blogging session in favour of some serious hacking!

No comments: