Monday, December 22, 2008

On Iterator composition in Java

In my last blog post I mentioned that I would like to see library support for manipulating iterators included in the JDK. I for one, and I know that more people than me, have a set of iterator manipulation classes that I bring along from project to project, usually by copy/paste. This post is an elaboration on what the things are that I think should be included in the JDK, all communicated through code (The actual implementation is not as interesting as the API, and thus left as an exercise for the reader):

package java.util;

import java.util.iter.*;

public final class Iterators {
public <E> Iterable<E> empty() { ... }
public <S,T> Iterable<T> convert(Iterable<S> iterable, Converter<S,T> converter) { ... }
public <T> Iterable<T> upCast(Iterable<? extends T> iterable) { ... }
public <T> Iterable<T> downCast(Class<T> target, Iterable<?> iterable) { ... }
public <T> Iterable<T> append(Iterable<T>... iterables) { ... }
}

// These classes should preferably be reused from somewhere.
package java.util.iter;

public interface Converter<S,T> {
T convert(S source);
}

public interface Filter<E> {
boolean accept(E element);
}

And some sample uses:

package org.thobe.example;

import java.util.Iterators;
import java.util.iter.*;

class UsesIteratorComposition {
private class Something {}
private class Source extends Something {}
private class Target { Target(Source s) {} }

Iterable<Source> source = Iterators.empty();

void conversions() {
Iterable<Target> target = Iterators.convert(source, new Converter<Source,Target>() {
Target convert(Source source) {
return new Target(source);
}
});
}

void casts() {
Iterable<Something> something = Iterators.upCast(source);
Iterable<Source> back = Iterators.downCast(Source.class, something);
}

void append() {
Iterable<Source> source = Iterators.append(source, new ArrayList<Source>(){
{
add(new Source());
add(new Source());
}
});
}
}

Wednesday, December 17, 2008

On Double Checked Locking in Java

You have probably all heard/seen the “Double-Checked Locking is Broken” declaration. And if you haven't I'll just tell you that before the improved memory model of Java 5, double checked locking didn't work as expected. Even in Java 5 and later there are things you need to consider to get it to work properly, and even more things to consider if you want it to perform well.

Joshua Bloch suggests in his book Effective Java, and keeps repeating in several presentations that there is one way to implement double checked locking and that this code snippet should be copied to every place where it's needed. I disagree. Not with the part about there only being one way of doing it, but with the copying part. In my eyes this is perfect use case for an abstract class:

package java.util.concurrent;

public abstract class LazyLoaded<T> {
private volatile T value;
// Get the value, guard the loading by double checked locking
public final T getValue() {
T result = value;
if (result == null) { // First check
synchronized (this) { // Lock
result = value;
if (result == null) { // Second check
result = loadValue();
value = result;
}
}
}
return result;
}
protected abstract T loadValue();
}

This class would then be used like this:

package org.thobe.example;

import java.util.concurrent.LazyLoaded;

class UsesDoubleCheckedLocking {
// The generics even makes it read well:
private final LazyLoaded<Something>() {
@Override protected Something loadValue() {
return new Something();
}
};
// ... your code will probably do something useful here ...
}

Just expressing my opinion. I hope someone important reads this and makes sure that it gets included in Java7. And while you're at it, make sure we get some standardized ways of composing Iterable/Iterators as well...

Friday, December 05, 2008

About types in Neo4j

When I first started using Neo4j I wondered "Why is there a RelationshipType and no NodeType?", and as more people are introduced to Neo4j I find that this is a quite common question. And of course an obvious question in the strongly typed single inheritance world of Java. This post was inspired by a discussion on the Neo4j mailing list.

The answer to the question is to be found in the name RelationshipType, it means no more than than the name implies. In particular it does not mean data type. What you are wishing for when you ask for a node type is a way of specifying what properties a node or relationship has, based on its type. Neo4j does not provied any mechanism for this at the core layer. There is a meta model component available that gives you data types for nodes and relationships with verification and all of that if you would like it, but you need to explicitly turn it on.

So what are RelationshipTypes then? A relationship type is a navigational feature of Neo4j. It is used to implement what is known in graph theory as edge-labeled multigraphs. This feature makes it a whole lot easier to navigate through a graph that represents application data. Adding similar labels to nodes would not provide any navigational benefit, which is why Neo4j does not implement such a feature.

It is well worth noticing the RelationshipType can be used to implement data types for both relationships and nodes. In a way that allows nodes to have multiple (union) data types.
The way that you implement this in Neo4j is that the data type of a node is determined by the RelationshipType of the relationship is was reached through. A n1 has reached through a relationship r1 with the RelationshipType Ta is said to have the node data type Da, while the same node n1 reached through another relationship r2 with the RelationshipType Tb is said to have the node data type Db. Although it is the same node the application logic will treat it completely different, accessing different properties (with or without overlap). This is a very powerful feature, and not as easy to implement using a label for the nodes to determine the data type of the node.

The design guide for Neo4j recommends that domain model objects are implemented by wrapping nodes and relationships, and that the wrapper class is determined by the type of the traversed relationship as mentioned above. To make this even more simple the Neo4j community have developed a few components that automate this process. Most notably there is neo-weaver that automatically implements an interface or abstract class by the means of getting and setting properties on a node/relationship or by traversing relationships.

Happy Hacking!

Thursday, November 20, 2008

Call frame introspection on the JVM

One of the pain points of implementing Python on top of the JVM, and in my opinion the worst pain point, is the call frame introspection of Python, and how it is abused throughout Python frameworks. There is in fact a library function sys._getframe() that returns the frame of the current function, or a deeper frame in the call stack if passed an integer representing the depth.

For Jython this mean that we always have to keep a heap allocated call frame object around for each function invocation, and do all access to local variables through this object. The performance implications of this is absolutely terrible. So any improvement to call frame management would greatly improve performance.

Thinking about how to better implement call frame objects in Jython I thought “the JVM already manages call frames, and using a debugger I am able to get access to all the aspects of the JVM call frame that I am interested in to construct a Python frame.”

My first idea on how to implement such a thing was to get under the hood of the JVM and add these capabilities there. Said and done, I checked out OpenJDK and started to patch it to my needs. This actually got me to a working prototype, apart from the fact that I failed to tell the garbage collector that I created extra references to some objects, so they ended up not being equal to null but in every other sense behaving like null. I never tracked down where the problem in my code was, since I found another solution. While I was working on this I found code in the JVM that did things that were surprisingly similar to what I was doing. All of this code was located in a subsystem called JVMTI. I looked further into the documentation of the JVM Tooling Interface, and found that it had been around since JVMv1.5 (commonly referred to as Java 5). This was great news! If I could create a JVM frame introspection library using JVMTI I would have a library that works for all the JVM versions that we target for the next release of Jython.

It took a while before I actually started implementing the frame introspection library, there were other tasks with higher priority, I read up the documentation of JVMTI and there was also the issue with the build process for JNI-libraries being much more painful than for the nice Java stuff I'm used to, since JNI is C code. The last problem I solved over a weekend two weeks ago by implementing a generic make file and an ant task that feeds make with the required parameters of that makefile. This took me a full weekend, since my makefile skills were never that great to begin with, and even rustier than my C skills. I got it to work for my Mac though, and have yet to test it on other platforms (Linux will probably be tested and working before the weekend). This ant task is hosted on Kenai: https://kenai.com/svn/jvm-frame-introspect~svn/jnitask/.

Armed with a good way of building JNI libraries I met up with Charles Nutter in Malmö, where he was speaking at the Øredev conference, for three days. We managed to get the library working while hacking at various coffee shops in Malmö, it should still be polished a bit but I've published it in my personal svn repository on Kenai: https://kenai.com/svn/jvm-frame-introspect~svn/frame/ for everyone who wants to take a look. Trying it out should be as simple as:
svn co https://kenai.com/svn/jvm-frame-introspect~svn/frame/ javaframe
cd javaframe
ant test

The expected output is a few printouts of stack frame content.

The call frame introspection library gives you access to:

  • Access to one call frame at a given depth.
  • Access to the entire stack of call frames for the current thread.
  • Access the stack of call frames (without local variables) for any set of threads, or all threads, in the JVM.
  • Get the reflected method object that the frame is a representation for.
  • Local variables (if this information is added to the class file by javac)
    • The number of local variables.
    • The names of the local variables.
    • The signatures of the local variables.
    • The values of the (currently live) local variables.
    • Locals may be accessed either by name or by index.
      There is also a getThis() method as a shorthand for getting the first local variable, or null if the method is static.
  • Get the current value of the instruction pointer.
  • Get the current line number (if this information is added to the class file by javac).

Update 2008-11-23: I've added support for getting stack traces of call frame snapshots from multiple threads in the JVM. The JVMTI guarantees that the stack traces of all threads are captured at the exact same point of execution. There are method calls for getting traces for a given set of threads or for all JVM threads. These methods do not provide access to the local variables in the frames, since there is no way (apart from suspending the threads) to guarantee that the frame depth is the same at the point of capturing the stack trace as at the point of acquiring the local variables.
I've also made sure that the build scripts work under Linux as well as Mac OS X, and added licensing information (I use the MIT License for this). Also, by request, cleaned up the paragraphs of this entry.

It would be great to get input from my peers on what more information you would like to access from the call frames. The Java bytecode of the method perhaps?
Charles and I also talked about what else we can use JVMTI for, and he was quite enthusiastic about the ability to walk the entire Java object graph, for implementing the objectspace feature of Ruby. One idea would be to write a library that brings the entire functionality of JVMTI to the Java level. The only problem with this would be that a lot of the JVMTI operations don't make much sense unless they are invoked in conjunction with other operations, and that many of the operations are not callback safe, meaning that we cannot allow the execution of arbitrary Java code in between the JVMTI operations. But it should be possible to create an abstraction with a more Java-esque API that performs multiple JVMTI operations at the JNI level.

Another interesting aspect of using JVMTI from Java code is that we can prototype a lot of the functionality that we want the JVM to expose to us directly, and thereby vote with code on what we want the JVM of the future to look like.

I hope you will find this useful!

Update: I have move this project to Kenai, the links have been updated accordingly.

Thursday, September 11, 2008

Meta post: Querying languages versus APIs

This is a meta post about the differences between having a querying language and an API for accessing a database. I want to analyze the pros and cons of both approaches. I believe there are benefits of both models, and that they both have weaknesses.
These are my initial ideas on what to discuss in the post:
  • An API provides a type safe way of operating with the data model.
  • An API cannot expose query injection weaknesses.
  • Queries to in a query languages are handled completely internally in the engine, and can therefore be optimized more easily.
Please comment with references on querying languages and your opinion on the pros and cons of the two approaches.

Meta post: Introducing the concept of meta posts

I am introducing a new kind of post in this journal. Meta posts. I have a list of posts that I want to write, each of these are going to be long, and take a while to write. Therefore I thought it would be easier for me, and lead to better posts, if I could get comments on them before I write them. That way I can find out what aspects of the topic to discuss as well as get some references that might be useful for the post.

These meta posts will carry the tag meta post and I will greatly appreciate any comment on these posts. One of the key features of the meta posts is that I will keep them short, which is why this post ends here.

Monday, July 14, 2008

My JVM wishlist, pt. 1 - Interface injection

If you've been following what's going on in the JVM-languages community you have probably stumbled across John Roses blog. One of his entries was about interface injection. In short wordings interface injection is the ability to at runtime add an interface to a class that was not precompiled as implementing that interface.
Interfaces are injected at one of 3 situations:
  • When a method of the interface is invoked on an object.
  • When an instanceof check for the interface is performed on an object.
  • When the interface is queried for via reflection (I can see this working with Class#isAssignableFrom, but I have my doubts when it comes to Class#getInterfaces, although I'm sure someone smart will be able to solve this without having to know about all injectable interfaces beforehand).
When any of these occur a class can either have the interface implemented already (in the regular way) or a special static injection method on the interface class is invoked. It is up to this injection method to determine if the given class can implement the interface or not. If it determines that the class can implement the interface any missing methods have to be supplied at that time. John suggests that these methods are to be supplied as method handles. Since method handles, according to the EDR of the InvokeDynamic proposal (JSR 292), can be curried this would make it possible to attach an extra state object to the returned method handle, or to return a different implementation of the interface depending on the class they are injected to. Once an injection method of a specific interfaced has been invoked for a specific class, the injection method will never be invoked for that class again, meaning that once a class has been found to not implement an interface, that information will be final, and once an implementation of an interface has been supplied, this implementation can never be changed. This is important since it will allow the VM to perform all optimizations, such as inlining, as before.

What can this be used for?
As a language implementer on the Java platform I think interface injection would be a blessing. In fact I think it is the one thing that would simplify the implementation of languages on the JVM the most. Any language probably has a base interface (as Java has java.lang.Object and Jython has org.python.core.PyObject), let's be unbiased and call it "MyLangObject" for the sake of the continued discussion. There are two things that make the Java platform great:
  1. There are a lot of really good toolkits an libraries implemented for the Java platform.
  2. There are a lot of great languages for the Java platform in which even more great libraries and toolkits will be developed.
Therefore, if you are implementing a language for the Java platform you would want to interact with all of these libraries and toolkits. Problem is that most of them haven't been designed with your language in mind, and they shouldn't be. If MyLangObject was an injectable interface all you would need to do to be able to integrate with any object from another language would be to just interact with it through the MyLangObject interface, and the injection mechanism would take care of the rest.
The injection mechanism could even be used with the classes within your language. Instead of having a base class supplying the default implementation of the methods of MyLangObject you could let the injection method return the default implementation for your methods.
Or why not use interface injection to support function invocation with different argument counts. Each function in your language would implement a set of call methods, one for each argument count it can be invoked with. Your language would then have a set of injectable Callable interfaces one for each argument count that any function in your language can be invoked with, each with only one call method, with the appropriate number of arguments. These interfaces could be generated at runtime if your language supports runtime code generation. The default implementation of the call method in each Callable interface would of course raise an exception, since the function obviously doesn't support that argument count if it doesn't implement the appropriate method.
Interface injection really does provide a huge set of opportunities.

How could interface injection implement a Meta Object Protocol?
There is a great project initialized by Attila Szegedi of the JVM-languages comminuty to create a common meta object protocol (MOP) for all languages on the JVM. With interface injection this would be a simple task.
  1. Let all objects in your language (that supports the MOP) implement the (probably not injectable) interface java.dyn.SupportsMetaObjectProtocol. An interface with only one method:
    java.dyn.MetaObjectProtocol getMetaObjectProtocol();
    This would return the implementation of the java.dyn.MetaObjectProtocol interface for your particular language.
  2. The java.dyn.MetaObjectProtocol contains methods for getting method handles for each dynamic language construct that the community have agreed to be a good common construct, such as getters and setters for subscript operations. These method handles would come from the actual implementation of them for your particular language, and would therefore benefit from every imaginable optimization you have cooked up for your language.
  3. When the main interface of my language is being injected into a class from your language it finds that your class implements java.dyn.SupportsMetaObjectProtocol and uses that to get the method handles for all dynamic language constructs supported by my language, rebinding them them to the method names used in my language.
And as simple as that interface injection has been used to implement a common ground for all languages on the Java platform with absolutely no overhead. I'm not saying that this is the way to implement a meta object protocol for the Java platform, I am just suggesting one way to do it, someone a lot smarter than me might come up with a much better implementation.

To sum things up: I can't wait until the JVM supports interface injection.

Edit: this post has also been re-posted on Javalobby.