Monday, July 14, 2008

My JVM wishlist, pt. 1 - Interface injection

If you've been following what's going on in the JVM-languages community you have probably stumbled across John Rose's blog. One of his entries was about interface injection. In short, interface injection is the ability to add an interface at runtime to a class that was not compiled as implementing that interface.
Interfaces are injected in one of three situations:
  • When a method of the interface is invoked on an object.
  • When an instanceof check for the interface is performed on an object.
  • When the interface is queried for via reflection (I can see this working with Class#isAssignableFrom, but I have my doubts when it comes to Class#getInterfaces, although I'm sure someone smart will be able to solve this without having to know about all injectable interfaces beforehand).
When any of these occurs, a class can either already implement the interface (in the regular way), or a special static injection method on the interface class is invoked. It is up to this injection method to determine whether the given class can implement the interface. If it determines that the class can, any missing methods have to be supplied at that time. John suggests that these methods be supplied as method handles. Since method handles, according to the EDR of the InvokeDynamic proposal (JSR 292), can be curried, this makes it possible to attach an extra state object to the returned method handle, or to return a different implementation of the interface depending on the class it is injected into. Once the injection method of a specific interface has been invoked for a specific class, it will never be invoked for that class again: once a class has been found not to implement an interface, that information is final, and once an implementation of an interface has been supplied, it can never be changed. This is important since it allows the VM to perform all of its optimizations, such as inlining, as before.
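To make the mechanism more concrete, here is a sketch of what an injectable interface might look like. Everything in it is guesswork on my part: the name and shape of the injection method are my own assumptions, the method handle API is modeled loosely on the java.dyn package from the JSR 292 EDR, and static methods on interfaces are not even legal Java today, so this would clearly require new VM and language support:

    import java.dyn.MethodHandle;   // JSR 292 EDR package; hypothetical as used here
    import java.dyn.MethodHandles;
    import java.dyn.MethodType;

    public interface Sequence {
        Object itemAt(int index);

        // Hypothetical static injection method, invoked by the VM at most
        // once per class, the first time it needs to know whether that class
        // implements Sequence. Returning null means "never implements"; a
        // non-null result permanently supplies the missing methods as handles.
        static MethodHandle[] injectInto(Class<?> target) throws Exception {
            if (java.util.List.class.isAssignableFrom(target)) {
                // Implement itemAt in terms of List.get; the handle could
                // also be curried with extra per-class state.
                MethodHandle get = MethodHandles.lookup().findVirtual(
                        java.util.List.class, "get",
                        MethodType.methodType(Object.class, int.class));
                return new MethodHandle[] { get };
            }
            return null; // final decision: Sequence is never injected into target
        }
    }

Even in this sketch the important properties are visible: the decision is made per class, it is made at most once, and the supplied method handles are permanent, which is what keeps the VM free to optimize.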

What can this be used for?
As a language implementer on the Java platform I think interface injection would be a blessing. In fact I think it is the one thing that would simplify the implementation of languages on the JVM the most. Any language probably has a base type (as Java has java.lang.Object and Jython has org.python.core.PyObject); let's be unbiased and call it "MyLangObject" for the sake of the continued discussion. There are two things that make the Java platform great:
  1. There are a lot of really good toolkits and libraries implemented for the Java platform.
  2. There are a lot of great languages for the Java platform in which even more great libraries and toolkits will be developed.
Therefore, if you are implementing a language for the Java platform you will want to interact with all of these libraries and toolkits. The problem is that most of them haven't been designed with your language in mind, and they shouldn't be. If MyLangObject were an injectable interface, all you would need to do to integrate with an object from another language would be to interact with it through the MyLangObject interface, and the injection mechanism would take care of the rest.
The injection mechanism could even be used with the classes within your language. Instead of having a base class supply the default implementations of the methods of MyLangObject, you could let the injection method return the default implementation for your methods.
Or why not use interface injection to support function invocation with different argument counts? Each function in your language would implement a set of call methods, one for each argument count it can be invoked with. Your language would then have a set of injectable Callable interfaces, one for each argument count that any function in the language can be invoked with, each with a single call method taking the appropriate number of arguments. These interfaces could be generated at runtime if your language supports runtime code generation. The default implementation of the call method in each Callable interface would of course raise an exception, since a function obviously doesn't support a given argument count if it doesn't implement the corresponding method.
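As a rough sketch of what that could look like (the Callable names and shapes below are my own invention, not part of any proposal):

    // One injectable interface per argument count; these names are my own
    // invention, not part of any proposal.
    interface Callable0 {
        Object call();
    }

    interface Callable1 {
        Object call(Object arg0);
    }

    interface Callable2 {
        Object call(Object arg0, Object arg1);
    }

    // A two-argument function would implement Callable2 directly; injecting
    // Callable0 or Callable1 into its class would supply a default call body
    // that raises the language's wrong-number-of-arguments error.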
Interface injection really does provide a huge set of opportunities.

How could interface injection implement a Meta Object Protocol?
There is a great project initiated by Attila Szegedi of the JVM-languages community to create a common meta object protocol (MOP) for all languages on the JVM. With interface injection this would be a simple task.
  1. Let all objects in your language (that support the MOP) implement the (probably not injectable) interface java.dyn.SupportsMetaObjectProtocol, an interface with only one method:
    java.dyn.MetaObjectProtocol getMetaObjectProtocol();
    This would return the implementation of the java.dyn.MetaObjectProtocol interface for your particular language.
  2. The java.dyn.MetaObjectProtocol interface contains methods for getting method handles for each dynamic language construct that the community has agreed is a good common construct, such as getters and setters for subscript operations. These method handles would come from the actual implementation of these constructs in your particular language, and would therefore benefit from every imaginable optimization you have cooked up for your language (a rough sketch of both interfaces follows this list).
  3. When the main interface of my language is being injected into a class from your language, it finds that your class implements java.dyn.SupportsMetaObjectProtocol and uses that to get the method handles for all dynamic language constructs supported by my language, rebinding them to the method names used in my language.
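Here is how I imagine the two interfaces; the names come from the discussion above, but the method list on MetaObjectProtocol is purely my guess at what an agreed-upon set might contain:

    import java.dyn.MethodHandle; // JSR 292 EDR package

    // Imagined to live in java.dyn, as in the discussion above.
    interface SupportsMetaObjectProtocol {
        MetaObjectProtocol getMetaObjectProtocol();
    }

    interface MetaObjectProtocol {
        // Each method returns a handle backed by the actual implementation
        // of that construct in the language behind this MOP instance.
        MethodHandle attributeGetter(String name);
        MethodHandle attributeSetter(String name);
        MethodHandle subscriptGetter();
        MethodHandle subscriptSetter();
    }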
And just like that, interface injection has been used to implement a common ground for all languages on the Java platform with absolutely no overhead. I'm not saying that this is the way to implement a meta object protocol for the Java platform; I am just suggesting one way to do it. Someone a lot smarter than me might come up with a much better implementation.

To sum things up: I can't wait until the JVM supports interface injection.

Edit: this post has also been re-posted on Javalobby.

Thursday, July 10, 2008

The state of the advanced compiler

First a disclaimer:
When I say that I will blog regularly, obviously you should not trust me!
I have realized that I don't want to get better at blogging regularly, since I kind of think it's boring and it diverts my focus from the more important stuff, the code. But this does not mean that I will not blog more frequently in the future; I might do that. All I am saying is that I will never promise to do more blogging.
Even if I am not blogging about it, the advanced Jython compiler is making progress, just not as fast as I would like it to... The current state of things in the advanced branch of Jython is that I have pluggable compilers working, so that I can switch which compiler Jython should use at any given time. This enables me to test things more easily, and to evolve things more gradually.
I am still revising my ideas about the intermediate representation of code, and it is still mostly on paper. My current thinking is that the intermediate representation should perhaps be less advanced than I had first planned. I will look more at how PyPy does this, and then let the requirements of the rest of the compiler drive the design of the IR.
An important change in direction was made this week. This came from discussions with my mentor, Jim Baker, and from greater insight into how PyPy and Psyco work, gained from listening to the brilliant people behind these projects at EuroPython. I had originally intended the advanced compiler to do most of its work up front, and to get rid of PyFunctionTable from the Jython code base. The change in direction is the realization that this might not be the best approach. A better approach would be to have the advanced compiler work as a JIT optimizer, optimizing on actually observed types, which will probably give us a greater performance boost. This also goes well with an idea I have always had: having more specialized code object representations for different kinds of code.
The way responsibilities will be divided between the object kinds in the call chain is:
  • Code objects contain the actual code body that gets executed by something.
  • Function objects contain references to the code objects. A function starts out with a single reference to a general code object; then, as time progresses, the function gets hit by different types, which triggers an invocation of the advanced compiler that creates a specialized version of the code. That version is also stored in the function, for use when that particular signature is encountered in the future (see the sketch after this list).
    The functions also contain the environment for use in the code. This consists of the closure variables of the function, and the global scope used in the function.
  • Frame objects provide the introspection capabilities into running code as needed by the locals() and globals() functions, and pdb and similar tools. There should be a couple of different frame implementations:
    • Full frames. These contain all of the function state: closure variables, locals, the lot. These are typically used with TableCode implementations of code.
    • Generator frames. These are actually divided into two objects. One is a generator state object that (as the name suggests) contains the state of the generator; this is most of what a regular frame contains, except the previous frame in the call stack. The other object contains the previous frame in the call stack and wraps the generator state object to provide the frame interface.
    • Lazy frames. These are frames that contain almost nothing. Instead they query the running code for their state. I hope to be able to have them access the actual JVM stack frames for this, in which case they will be really interesting.
    The function object should be responsible for handling the life cycle of the frame objects, but I have not entirely worked out whether the creation of the frame object should be up to the code object or the function object. The code object will know exactly which implementation to choose, but then again, we might want to have different function implementations as well, so it might make sense to move the entire responsibility for frames to the functions.
  • Global scope objects. The global scope could be a regular dictionary. But if we have a special type for globals (that of course supports the dictionary interface) we can have code objects observing the globals for changes to allow some more aggressive optimizations, such as replacing Python loops (over range) with Java loops (with an int counter), so that the JVM JIT can perform all of its loop unrolling magic on it.
  • Class objects. These are always created at run time, unlike regular JVM classes, which are created at compile time. Since classes are defined by what the locals dictionary looks like when the class definition body terminates, it is quite hard to determine what the actual class, as created at "define time", will look like. Although in most cases we can statically determine what most of the class will look like.
  • Class setup objects. These are to class objects what code objects are to functions. They contain the code body that defines a class, but also a pre-compiled JVM class that contains what the compiler has determined the interface of the class to be.
    Both class objects and class setup objects are fairly far into the future though, and will not be part of the initial release of the advanced compiler. They might in fact never be, if I find that there is a better way of doing things before I get there.
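To illustrate the function/code split described in the list above, here is a minimal sketch. All names are mine; PyCode and PyFunction merely stand in for whatever the real Jython types end up being, and the string signature key is deliberately naive:

    import java.util.HashMap;
    import java.util.Map;

    interface PyCode {
        Object call(Object[] args); // the actual code body that gets executed
    }

    class PyFunction {
        // Always-correct general version, compiled up front.
        private final PyCode generalCode;
        // Specialized versions, keyed by the argument type signature that
        // triggered their compilation.
        private final Map<String, PyCode> specialized = new HashMap<String, PyCode>();

        PyFunction(PyCode generalCode) {
            this.generalCode = generalCode;
        }

        Object call(Object[] args) {
            PyCode code = specialized.get(signatureOf(args));
            if (code != null) {
                return code.call(args); // fast path for a signature seen before
            }
            // In the real design, repeated hits with the same signature would
            // eventually invoke the advanced compiler here and store the
            // resulting specialized code object in the map.
            return generalCode.call(args);
        }

        private static String signatureOf(Object[] args) {
            StringBuilder signature = new StringBuilder();
            for (Object arg : args) {
                signature.append(arg == null ? "null" : arg.getClass().getName());
                signature.append(';');
            }
            return signature.toString();
        }
    }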
The other interesting change that the advanced compiler project will introduce, beyond these specialized call path objects, is of course the actual optimizations that they enable:
  • Type specializations.
    • Primitive type unpacking. This is the first, most simple, and most basic type specialization. When we detect that a specific object is (often) of a primitive number type and used extensively, we can generate a version where the number is unpacked from the containing object, and the operations on the number are compiled as primitive number operations.
    • Direct invocation of object types. When we detect more complicated object-oriented types, we can determine the actual type of the object, find the actual method body for the method we are invoking, and insert a direct linkage to that body instead of going through the dispatch overhead.
  • Inlining of builtins. When we detect that a builtin function is extensively used, we can inline the body of that builtin, or invoke the builtin function directly without dispatch overhead.
  • Expression inlining. Some expressions, in particular generator expressions, imply a lot of overhead, since they generate hidden functions that are only invoked locally, quite often at the same place as they are defined. In this case we can inline the actual action of the expression immediately, so that for example a Python expression such as this:
    string = ", ".join(str(x) for x in range(14, 23))
    Could be transformed into the following Java snippet:
    StringBuilder _builder = new StringBuilder();
    for (int _i = 14; _i < 22; _i++) {
        _builder.append(_i);
        _builder.append(", ");
    }
    _builder.append(22);
    string = _builder.toString();

    Or in this particular case, perhaps even constant folded...
  • Loop transformation. As in the case above, the loop over the Python range iterator can be transformed into a regular JVM loop. This might just be a special case of builtin inlining though.
The combination of the abstraction and indirection between the function object and the code object, and the optimizations these enable, is exactly the powerful tool we need to be able to do really interesting optimistic optimizations while maintaining the ability to back out of such decisions, should they prove to have been too optimistic. All in all, this should provide Jython with a fair improvement in execution speed.

So that's a summary of the current work in progress. If I look into my magic 8-ball and try to predict the future, I would say that one interesting idea would be to have the compiler persist the optimized code versions, so that the next time the program is executed, the previous optimizations are already there. This would in fact be a good way of supporting the "test driven optimizations" that you might have heard Jim and/or me rant about. So there is definitely cool stuff going on. I can't wait to write the code and get it out there, which is why I hereby terminate this blogging session in favour of some serious hacking!

EuroPython 2008

So I've spent three days at EuroPython 2008 in Vilnius, Lithuania with my colleagues Jim and Frank, Ted from Sun, and of course a lot of other people in the European Python community. Most notably we've spent a fair amount of time talking to the PyPy group. Since they are mostly based in Europe they didn't have a large presence at PyCon.
Jim and I did two talks together. The first one was a tutorial about various ways of manipulating and compiling code in Python, and how to do that while staying compatible across Python implementations. The second was a presentation about the internals of Jython, showing how similar Jython is to CPython internally, and walking through how you go about supporting some of the cool features of Python. We also managed to sneak in some slides about our ongoing work with the advanced compiler and where that will take us (more on that in a later post). From where I was sitting it seemed people were interested in what we were talking about, and I think our presentations were fairly well received.
Yesterday we had a meeting with the PyPy group resulting in a list of tasks on which our two teams are to collaborate. I think this was very interesting and I believe both sides will get substantial benefits from this effort. It is also my hope that this list will not be complete, but that we will find more interesting tasks to collaborate around after the completion of these tasks.
The most important task for us at the moment is to get ctypes ported to Jython and the JVM. This is important for the PyPy team as well, since it will make their JVM back end more complete. The way we outlined it, the implementation would be a JNI bridge to _rawffi, the part of ctypes that is implemented in C, and would then use the same Python-level implementation of the API as PyPy does. Another way of doing it would of course be to use JNA, but I actually think the JNI path might be more maintainable in this case, since PyPy still needs the C version of _rawffi for pypy-c.
Personally I am very interested in the PyPy JIT, and I think that my work on the advanced Jython compiler could be very useful for the PyPy team when they start their effort on a JVM back end of their JIT. I also think that I can use a lot of the ideas they have implemented in their JIT project in the advanced compiler.
I will not go into the entire list of collaboration points, since the PyPy blog post linked above does a good job there, but I would also like to mention the effort of sharing test cases, which I think is highly important.
At the moment Jim and I are in the PyPy sprint room here at EP, and we just have some blogging to do before we get our hands dirty with code.

Tuesday, July 08, 2008

Simple stuff with import hooks in Python

Yesterday Jim and I had a tutorial session at EuroPython about dynamic compilation in Python. We brought up the topic of import hooks (PEP 302), since we have successfully used them as an opportunity to create code dynamically. The code example we demonstrated for that was one of the actual import hooks that we had used in a Jython setup. Even if it was no more than 80 lines of code, it might not have been the most accessible example. A few people asked me afterwards if I had a simpler example, so by public request, here is a simple meta_path hook that prevents the user from importing certain modules.
import sys

class Restriction(object):
    __forbidden = set()

    @classmethod
    def add(cls, module_name):
        cls.__forbidden.add(module_name)

    def find_module(self, module_name, package_path):
        if package_path:
            return None
        if module_name in self.__forbidden:
            return self

    def load_module(self, module_name):
        raise ImportError("Restricted")

sys.meta_path.append(Restriction())
add = Restriction.add
del Restriction
If we walk through this class from the top, we first of all have a set containing the names of the modules that we will not allow the user to import, and a method for adding modules to that set.
The next method, find_module, is the method invoked by the import subsystem. It is responsible for locating the requested module and returning an object capable of loading it. The arguments passed to find_module are the fully qualified module name and, if the module is a submodule of a package, the path where that package was found. If the module cannot be found, one should either return None or raise an ImportError; this is a handy way to delegate to the next import hook or the default import behavior. If find_module returns a loader object, on the other hand, no other import mechanism will be attempted, which of course is useful in this case. So in this implementation we return self if the module name is found to be one of the modules that we want to prevent the user from importing.
The loader object should implement the method load_module, which takes only one argument, the fully qualified module name, and returns the corresponding module or raises an ImportError if the import fails. In this case we know that load_module is only ever invoked when we want the import to fail, so we always raise an ImportError.
There really isn't much more to it. It should be noted however that this isn't a good way to implement security restrictions in Python, since it is possible for any user code to remove the import hook from sys.meta_path, but I still think it makes for a good introductory example to import hooks.
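For example, assuming the code above is saved as restriction.py (the file name is of course arbitrary), a quick session might look like this:

import restriction

# Forbid imports of the socket module from here on.
restriction.add("socket")

try:
    import socket
except ImportError:
    print "imports of socket are now forbidden"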

Happy hacking!