Friday, August 07, 2009

Java integration in future Jython

I've seen a lot of discussion about Java integration in Jython lately, so I thought it was time for an actual Jython developer (myself) to share some ideas on how it could be improved. At the same time I'd like to propose some changes that could make the different Python implementations more unified, and could even lead to a common Java integration API in all of them.

The most basic part of the Java integration in Jython is the ability to import and use Java classes. Other Python implementations cannot do this in the same way, so it fundamentally breaks compatibility. I therefore propose that we remove this functionality as it is in Jython today (!). Instead we should look at how IronPython enables using CLR (.NET) classes. In IronPython you first need to import clr before you can access any of the CLR types. Other languages on the JVM do the same, for example JRuby, where you need to require 'java' before using any Java libraries. I propose we require something similar in Jython, and what better package to require you to import than java?

An observation: The java package in Java does not contain any classes, only sub-packages. Furthermore, all the sub-packages of the java package follow the Java naming conventions, i.e. they all start with a lowercase letter. This gives us a namespace to play with: anything under the java package that starts with an uppercase letter.

What happens when you import java? The java Python module is a "magic module" that registers a Python import hook. This import hook then enables you to import real Java packages and classes. In Jython, many of the built-in libraries will of course import java, which means that the hook will be enabled by default in Jython. But writing code that is compatible across Python implementations would now be possible, simply by ensuring that you import java before importing any Java packages.
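Below is a rough sketch of such an opt-in hook. It is written against Python 3's importlib machinery, which postdates this post, and it is not anything Jython actually ships. The _FAKE_JVM dictionary is a hypothetical stand-in for asking a real JVM which packages and classes exist. install() is what importing the java module would hypothetically do. The example uses javax.swing so that the hook does not collide with the java module itself.

```python
import importlib.abc
import importlib.machinery
import sys

# Hypothetical stand-in for a running JVM: Java package name -> classes it contains.
_FAKE_JVM = {
    "javax.swing": {"JFrame": type("JFrame", (), {}), "JButton": type("JButton", (), {})},
}


class JavaPackageFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolves imports of Java packages; only active once install() has run."""

    def find_spec(self, fullname, path, target=None):
        # Claim a name if it is a known Java package or a parent of one.
        if any(pkg == fullname or pkg.startswith(fullname + ".") for pkg in _FAKE_JVM):
            return importlib.machinery.ModuleSpec(fullname, self, is_package=True)
        return None  # let the normal Python import machinery handle it

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # Expose the package's classes as module attributes.
        for name, cls in _FAKE_JVM.get(module.__name__, {}).items():
            setattr(module, name, cls)


def install():
    """What `import java` would do: register the hook exactly once."""
    if not any(isinstance(f, JavaPackageFinder) for f in sys.meta_path):
        sys.meta_path.insert(0, JavaPackageFinder())
```

Until install() has run, import javax.swing fails with ModuleNotFoundError. Afterwards the finder claims both javax and javax.swing.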

The content of the java module

Most if not all of what is needed to utilize Java classes from Python code is provided by the import hook that the java module registers when it is loaded. This means that the content of the java module needs to deal with the other direction of the interfacing: defining and implementing APIs in Python that Java code can utilize. I propose that the Python java module contain the following:

JavaClass
A class decorator that exposes the decorated class as a class that can be accessed from Java. Accepts a package keyword argument that specifies the Java package to define the class in; if omitted, the package is derived from the __module__ attribute of the class.
Possibly JavaClass should also be the Python type of imported Java classes.
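A minimal sketch of how the package derivation might work. It assumes the decorator only records metadata. The attribute names __java_package__ and __java_name__ are hypothetical, and a real implementation would also have to generate the Java class itself.

```python
def JavaClass(cls=None, *, package=None):
    """Hypothetical java.JavaClass: usable as @JavaClass or @JavaClass(package=...)."""

    def decorate(target):
        # Fall back to the Python module name when no package is given.
        java_package = package if package is not None else target.__module__
        target.__java_package__ = java_package
        target.__java_name__ = f"{java_package}.{target.__name__}"
        return target

    if cls is None:
        return decorate  # called with arguments: @JavaClass(package=...)
    return decorate(cls)  # used bare: @JavaClass


@JavaClass
class Greeter:
    pass


@JavaClass(package="com.example.greeting")
class Farewell:
    pass
```

Python module names are already dotted, so a class defined in mypkg.util would land in the Java package mypkg.util.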
Field
An object for defining Java fields in classes. Takes a single argument, the type of the field. Example usage:
@java.JavaClass
class WithAField:
    data = java.Field(java.lang.String)
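One possible shape for Field, sketched as a Python descriptor. It uses __set_name__ (Python 3.6+, a mechanism that did not exist when this was written) so that the field learns its own name. A plain Python type (str) stands in for java.lang.String. The runtime type check is an assumption about how Java's static field types might be mirrored.

```python
class Field:
    """Hypothetical java.Field: a typed attribute that knows its own name."""

    def __init__(self, java_type):
        self.java_type = java_type
        self.name = None

    def __set_name__(self, owner, name):
        # Python 3.6+ calls this while creating the owning class.
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # class access returns the Field itself
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        # Mimic Java's static typing with a runtime check (None plays the role of null).
        if value is not None and not isinstance(value, self.java_type):
            raise TypeError(f"{self.name} must be {self.java_type.__name__}, "
                            f"got {type(value).__name__}")
        instance.__dict__[self.name] = value


class WithAField:
    data = Field(str)  # str stands in for java.lang.String
```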
Array
An object for defining Java arrays. This is used to define Java array types. Examples:
  • Array[java.Primitive.int] corresponds to the Java type int[]
  • Array[java.lang.String] corresponds to the Java type java.lang.String[]
  • Array corresponds to the Java type java.lang.Object[]
  • Array[Array[java.lang.String]] corresponds to the Java type java.lang.String[][]
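A sketch of how subscripting could build these array types. It assumes that the bare Array object itself represents java.lang.Object[] and that Java type names are plain strings. ArrayType and java_name are hypothetical names.

```python
class ArrayType:
    """Hypothetical java.Array: subscripting builds a new array type."""

    def __init__(self, component="java.lang.Object"):
        self.component = component  # element type: a Java type name or another ArrayType

    def __getitem__(self, component):
        return ArrayType(component)

    @property
    def java_name(self):
        inner = self.component.java_name if isinstance(self.component, ArrayType) else self.component
        return inner + "[]"

    def __repr__(self):
        return f"<Java array type {self.java_name}>"


# The bare object stands for java.lang.Object[].
Array = ArrayType()

print(Array["int"].java_name)                      # int[]
print(Array["java.lang.String"].java_name)         # java.lang.String[]
print(Array.java_name)                             # java.lang.Object[]
print(Array[Array["java.lang.String"]].java_name)  # java.lang.String[][]
```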
Access
A set of Java access definition decorators. Contains:
  • Access.public
  • Access.package - this needs to be explicitly available since it does not make sense as the default in Python code.
  • Access.protected
  • Access.module - for the new access modifier in the upcoming module system (a.k.a. Project Jigsaw) for Java.
  • Access.private
  • Either the default access modifier should be public, or the absence of an access modifier decorator should mean that the method is not exposed in the Java class at all. This needs further discussion.
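A small sketch of the access decorators. It assumes they only tag the function with a __java_access__ attribute, the property this post proposes for access modifiers. The helper java_access and its default parameter are hypothetical. The default lets either policy for undecorated methods be expressed.

```python
from types import SimpleNamespace


def _modifier(keyword):
    """Build a decorator that tags a function with a Java access level."""
    def decorator(func):
        func.__java_access__ = keyword
        return func
    decorator.__name__ = keyword
    return decorator


# Hypothetical java.Access namespace; "package" marks Java's package-private access,
# which has no keyword in Java itself.
Access = SimpleNamespace(
    public=_modifier("public"),
    package=_modifier("package"),
    protected=_modifier("protected"),
    module=_modifier("module"),
    private=_modifier("private"),
)


def java_access(func, default=None):
    """Access level for func, or `default` when no decorator was applied."""
    return getattr(func, "__java_access__", default)


class Service:
    @Access.public
    def run(self):
        return "running"

    def helper(self):  # no decorator: how to expose this is the open question
        return "helping"
```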
Primitive
The set of primitive types in Java:
  • Primitive.void
  • Primitive.boolean
  • Primitive.byte
  • Primitive.char
  • Primitive.short
  • Primitive.int
  • Primitive.long
  • Primitive.float
  • Primitive.double
  • These can be used as type parameters for Array but not for generic types (since primitives are not allowed as generic type parameters in Java).
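A sketch of Primitive as a standard Python enum. The values are the JVM's type descriptor characters; for example, int[] has the descriptor [I. A code generator would need these anyway. generic_type_argument is a hypothetical helper showing how subscripting a generic type could reject primitives.

```python
import enum


class Primitive(enum.Enum):
    """Java primitive types, valued by their JVM type descriptor character."""
    void = "V"
    boolean = "Z"
    byte = "B"
    char = "C"
    short = "S"
    int = "I"
    long = "J"
    float = "F"
    double = "D"


def generic_type_argument(arg):
    """Hypothetical check run when a generic Java type is subscripted."""
    if isinstance(arg, Primitive):
        raise TypeError(
            f"primitive type {arg.name} cannot be a generic type argument; "
            "use its wrapper class instead"
        )
    return arg


# A code generator could build array descriptors from the values.
print("[" + Primitive.int.value)  # [I
```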
Overload
Used to implement (and define) overloaded methods, that is, several methods with the same name but different type signatures. Example usage:
@java.JavaClass
class WithOverloadedMethod:
    @java.Access.public
    def method(self, value:java.lang.String) -> java.util.List[java.lang.String]:
        ...
    @java.Overload(method)
    @java.Access.public
    def method(self, value:java.lang.Integer) -> java.lang.String:
        ...
    @java.Overload(method)
    @java.Access.public
    def method(self, value:java.lang.Iterable[java.lang.String]) -> java.Primitive.void:
        ...
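A rough sketch of how Overload could collect the variants on the Python side. Note that the semantics here are swapped in. Java picks an overload at compile time from static types. This hypothetical OverloadedMethod instead dispatches at call time, using isinstance checks against the annotations and taking the first match. Python types (str, int, list) stand in for the Java types.

```python
import inspect


class OverloadedMethod:
    """Hypothetical container for Java-style overloads that share one name."""

    def __init__(self, first):
        self.variants = [first]
        self.__name__ = first.__name__

    def __get__(self, instance, owner):
        if instance is None:
            return self
        # Bind to the instance, like a normal method.
        return lambda *args: self.dispatch(instance, *args)

    def dispatch(self, instance, *args):
        # First variant whose annotated parameter types accept the arguments wins.
        for func in self.variants:
            params = list(inspect.signature(func).parameters.values())[1:]  # skip self
            if len(params) == len(args) and all(
                p.annotation is inspect.Parameter.empty or isinstance(a, p.annotation)
                for p, a in zip(params, args)
            ):
                return func(instance, *args)
        raise TypeError(f"no overload of {self.__name__} accepts {args!r}")


def Overload(previous):
    """Hypothetical java.Overload: add the decorated function as another variant."""
    def decorator(func):
        group = previous if isinstance(previous, OverloadedMethod) else OverloadedMethod(previous)
        group.variants.append(func)
        return group
    return decorator


class WithOverloadedMethod:
    # str, int and list stand in for java.lang.String, java.lang.Integer, java.lang.Iterable.
    def method(self, value: str):
        return "String overload"

    @Overload(method)
    def method(self, value: int):
        return "Integer overload"

    @Overload(method)
    def method(self, value: list):
        return "Iterable overload"
```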

Java classes and interfaces, when imported, are Pythonized in such a way that they can be used as bases for Python classes. Generics are specified by subscripting the generic Java class. Java annotations are Pythonized in a way that turns them into decorators that add a special attribute to the decorated element: __java_annotations__. Annotations on imported Java classes and methods would also be exposed through the __java_annotations__ property for consistency. Access modifiers would similarly add a __java_access__ property to the object they decorate.
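A sketch of how an imported annotation type might be turned into a decorator. It assumes __java_annotations__ is a tuple of (annotation name, element values) pairs. PythonizedAnnotation and with_values are hypothetical names, since the proposal does not specify how annotation element values would be passed.

```python
class PythonizedAnnotation:
    """Hypothetical Python view of an imported Java annotation type."""

    def __init__(self, java_name, **values):
        self.java_name = java_name
        self.values = values

    def with_values(self, **values):
        # Annotation elements, e.g. @Named("primary") in Java.
        return PythonizedAnnotation(self.java_name, **values)

    def __call__(self, target):
        # Prepend so the tuple lists annotations in source order
        # (decorators are applied bottom-up).
        existing = getattr(target, "__java_annotations__", ())
        target.__java_annotations__ = ((self.java_name, dict(self.values)),) + existing
        return target


Inject = PythonizedAnnotation("javax.inject.Inject")
Named = PythonizedAnnotation("javax.inject.Named")


class Repository:
    @Inject
    @Named.with_values(value="primary")
    def set_datasource(self, datasource):
        self.datasource = datasource
```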

Kay Schluehr also suggested allowing decorators on assignments, to be able to support annotations on fields. I don't really have an opinion on this. Since I don't think fields should be exported in any public API anyway, it's a bit useless. For the cases where fields are used (such as dependency injection systems), I think it suffices to have it all in the same assignment: dependency = javax.inject.Inject(java.Access.private(java.Field(JavaClassIDependOn))). The name will be extracted as "dependency" when the class is processed by the JavaClass class decorator. But if others find assignment decorators useful, I am not opposed to them. If assignment decorators are added to Python, it might be worth considering a slightly different signature for these decorator functions, so that the name of the target variable is passed as a parameter as well. Then my example could look like this:

@java.JavaClass
class WithInjectedDependency:
    @javax.inject.Inject # This is JSR 330 by the way
    @java.Access.private
    @java.Field
    dependency = JavaClassIDependOn
    # could expand to: dependency = javax.inject.Inject(
    #     "dependency", java.Access.private(
    #         "dependency", java.Field(
    #             "dependency", JavaClassIDependOn)))
    # or to the same thing as above, depending on how
    # assignment decorators were implemented...
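The single-assignment form already works with today's syntax, as long as every wrapper returns the same Field object. The class decorator can then find it and record the attribute name. Below is a minimal sketch, where private and inject are hypothetical stand-ins for java.Access.private and javax.inject.Inject.

```python
class Field:
    """Hypothetical java.Field that also carries access and annotation metadata."""

    def __init__(self, java_type):
        self.java_type = java_type
        self.name = None          # filled in by JavaClass
        self.access = "package"   # hypothetical default
        self.annotations = []


def private(field):
    """Stands in for java.Access.private applied to a field."""
    field.access = "private"
    return field


def inject(field):
    """Stands in for the javax.inject.Inject annotation applied to a field."""
    field.annotations.append("javax.inject.Inject")
    return field


def JavaClass(cls):
    """Collect Field objects and record the attribute name each was assigned to."""
    fields = {}
    for name, value in vars(cls).items():
        if isinstance(value, Field):
            value.name = name
            fields[name] = value
    cls.__java_fields__ = fields
    return cls


class JavaClassIDependOn:
    pass


@JavaClass
class WithInjectedDependency:
    dependency = inject(private(Field(JavaClassIDependOn)))
```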

When defining methods in Java integration classes, we use Python 3 function annotations to define the method signatures. These can be omitted, in which case the types default to java.lang.Object. It is important that we support exposing classes that don't have any Java integration added to them from Jython, since we want to enable importing existing Python libraries into Java projects and using them without having to port them. These classes will not have the JavaClass decorator applied to them. Instead, Jython will apply it automatically at the point when the Python class first needs to be exposed to Java. This is not something that the java module needs to deal with, since it doesn't fit with other Python implementations.

Outstanding issues

There are still a few Java integration issues that I have not dealt with, because I have not found a solution that I feel good about yet.

Defining Java interfaces
Is this something we need to be able to do? If so, the proper approach is probably to add a JavaInterface decorator to the java module, similar to the JavaClass decorator.
Defining Java enums
This might be something that we want to support. I can think of two options for how to declare the class. Either we add a JavaEnum decorator to the java module, or we add special case treatment for when a class extends java.lang.Enum (I am leaning towards this approach). Then we need to have some way to define the enum instances. Perhaps something like this:
@java.JavaClass
class MyEnum(java.lang.Enum):
    ONE = java.EnumInstance(1)
    TWO = java.EnumInstance(2, True)
    THREE = java.EnumInstance(3, True)
    FOUR = java.EnumInstance(4)
    def __init__(self, number, is_prime=False):
        self.number = number
        self.is_prime = is_prime
    def __str__(self):
        return self.name()
    class SEVENTEEN(java.EnumInstance):
        """This is an enum instance with specialized behavior.
        Will extend MyEnum, but there will only be one instance."""
        def __init__(self):
            """This class gets automatically instantiated
            by the __metaclass__ of Enum."""
            self.number = 17
            self.is_prime = True
        def __str__(self):
            return "The most random number there is."
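For comparison, Python's standard enum module (Python 3.4+) already supports a similar shape, because tuple values are unpacked into __init__. This is an analogy, not the proposed java.EnumInstance API. The specialized SEVENTEEN instance is approximated with a check in __str__ rather than a per-instance subclass.

```python
import enum


class MyEnum(enum.Enum):
    # Each tuple is unpacked into __init__, like the java.EnumInstance arguments.
    ONE = (1,)
    TWO = (2, True)
    THREE = (3, True)
    FOUR = (4,)
    SEVENTEEN = (17, True)

    def __init__(self, number, is_prime=False):
        self.number = number
        self.is_prime = is_prime

    def __str__(self):
        # Specialized behavior for one instance; Java would use a constant-specific body.
        if self is MyEnum.SEVENTEEN:
            return "The most random number there is."
        return self.name
```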
Defining generic types
I have discussed how to specify type parameters for generic types, but how would you define a generic Java type in Python? How about something like this:
@java.JavaClass
class GenericClass:
    T = java.TypeParameter() # default is "extends=java.lang.Object"
    C = java.TypeParameter(extends=java.util.concurrent.Callable)
This gets complicated when wanting to support self references in the type parameters, but the same is true for implemented interfaces, such as:
class Something implements Comparable<Something> {
    // ...
}
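Python's typing module, which postdates this post, offers a close analogy to java.TypeParameter. TypeVar takes an optional bound, and string forward references handle self-references. The sketch below uses it. Comparable here is a hypothetical stand-in for java.lang.Comparable. The type parameters are only checked by static type checkers, not at runtime.

```python
from collections.abc import Callable
from typing import Generic, TypeVar

T = TypeVar("T")                  # like java.TypeParameter(): no bound
C = TypeVar("C", bound=Callable)  # like java.TypeParameter(extends=...Callable)


class GenericClass(Generic[T, C]):
    def __init__(self, value: T, task: C) -> None:
        self.value = value
        self.task = task


S = TypeVar("S")


class Comparable(Generic[S]):
    """Stands in for java.lang.Comparable<S>."""

    def compare_to(self, other: S) -> int:
        raise NotImplementedError


# Self-reference via a string forward reference, like
# `class Something implements Comparable<Something>` in Java.
class Something(Comparable["Something"]):
    def __init__(self, rank: int) -> None:
        self.rank = rank

    def compare_to(self, other: "Something") -> int:
        return (self.rank > other.rank) - (self.rank < other.rank)
```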
Defining Java annotations
I have dealt with supporting the use of Java annotations, but what about defining them? I highly doubt that defining Java annotations in Python is going to be useful, but I prefer not to underestimate what developers might want to do. I do, however, think we could get far without the ability to define Java annotations in Python. But if we were to support it, what would it look like? Defining the class would probably be a lot like how enums are defined: either by special-casing java.lang.annotation.Annotation or by providing a special java.Annotation decorator.
@java.JavaInterface
class MyAnnotation(java.lang.annotation.Annotation):
    name = java.AnnotationParameter(java.lang.String)
    description = java.AnnotationParameter(java.lang.String, default="")

java for other Python implementations

I mentioned that requiring the user to explicitly import java to make use of Java classes would make it possible for other Python implementations to support the same Java integration API. So what would the default implementation of the java module look like? There is a very nice standardized API for integrating with Java from other programming languages: JNI. The default java module would simply implement the same functionality as the Jython counterpart by interacting with JNI through ctypes. Since ctypes is, or soon will be, supported by all the major Python implementations (Jython support is under development), the java integration module would work across all of them without additional effort. That alone is a major advantage over JPype and JCC (the two major Java integration modules for CPython today).
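The first step of such a module can be sketched with ctypes: build the JavaVMInitArgs structure and call JNI_CreateJavaVM from the JNI Invocation API. The structure layouts follow jni.h. make_init_args and create_jvm are hypothetical names. The path to libjvm is platform specific and must be supplied by the caller. Calling Java methods afterwards would go through the JNIEnv function table, which is not shown.

```python
import ctypes

JNI_VERSION_1_6 = 0x00010006  # from jni.h
JNI_OK = 0


class JavaVMOption(ctypes.Structure):
    # typedef struct { char *optionString; void *extraInfo; } JavaVMOption;
    _fields_ = [("optionString", ctypes.c_char_p), ("extraInfo", ctypes.c_void_p)]


class JavaVMInitArgs(ctypes.Structure):
    # typedef struct { jint version; jint nOptions; JavaVMOption *options;
    #                  jboolean ignoreUnrecognized; } JavaVMInitArgs;
    _fields_ = [
        ("version", ctypes.c_int32),
        ("nOptions", ctypes.c_int32),
        ("options", ctypes.POINTER(JavaVMOption)),
        ("ignoreUnrecognized", ctypes.c_ubyte),
    ]


def make_init_args(options):
    """Build JavaVMInitArgs for e.g. ["-Djava.class.path=..."].

    Returns the struct plus the objects that must stay alive while it is used.
    """
    encoded = [opt.encode() for opt in options]
    array = (JavaVMOption * len(encoded))(*(JavaVMOption(s, None) for s in encoded))
    args = JavaVMInitArgs(
        JNI_VERSION_1_6,
        len(encoded),
        ctypes.cast(array, ctypes.POINTER(JavaVMOption)),
        0,  # JNI_FALSE: fail on unrecognized options
    )
    return args, (array, encoded)


def create_jvm(libjvm_path, options):
    """Start a JVM in this process; returns opaque JavaVM* and JNIEnv* pointers."""
    libjvm = ctypes.CDLL(libjvm_path)
    create = libjvm.JNI_CreateJavaVM
    create.restype = ctypes.c_int32
    create.argtypes = [
        ctypes.POINTER(ctypes.c_void_p),   # JavaVM **pvm
        ctypes.POINTER(ctypes.c_void_p),   # void **penv
        ctypes.POINTER(JavaVMInitArgs),    # void *args
    ]
    jvm, env = ctypes.c_void_p(), ctypes.c_void_p()
    args, _keepalive = make_init_args(options)
    status = create(ctypes.byref(jvm), ctypes.byref(env), ctypes.byref(args))
    if status != JNI_OK:
        raise OSError(f"JNI_CreateJavaVM failed with status {status}")
    return jvm, env
```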

Integration from the Java perspective

I have not given as much thought to the area of utilizing Python code from Java. Still this is one of the most important tasks for Jython to fulfill. This section is therefore just going to be some ideas of what I want to be able to do.

Use Python for application scripting
This is possible today, and quite a simple case, but I still think that it can be improved. Specifically, the problem with Jython today is that there is no good API for doing so. Or, to be frank, there is hardly an API at all. This is being improved upon, though. The next update of Jython will include an updated implementation of the Java Scripting API (JSR 223), and the next release will introduce a first draft of a proper Jython API, something that we will support long term after a few iterations, and that you can build your applications against.
Use Jython to implement parts of your application
We want to be able to write polyglot applications where parts are implemented in Python. This is more than just scripting the application; applications generally work without scripts. We want to be able to write the implementation of parts of an application in Python with Jython. This is possible today, but a bit awkward without an official Jython API. This is being worked on in a separate project called PlyJy, where we are experimenting with an API for creating object factories for Jython. Jython object factories are objects that call into a Python module, instantiate a Python class, conform it to a Java interface and return it. So far this project is looking good, and there is a good possibility that it will be included in the Jython API.
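The three steps of an object factory can be sketched in plain Python, with a runtime-checkable Protocol standing in for the Java interface. This is not PlyJy's API, which is a Java API. ObjectFactory and Readable are hypothetical, and a Protocol check only verifies that the required methods exist.

```python
import importlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class Readable(Protocol):
    """Stands in for a Java interface the created object must satisfy."""

    def read(self) -> str: ...


class ObjectFactory:
    """Hypothetical Jython-style object factory, sketched on the Python side."""

    def __init__(self, module_name, class_name, interface):
        self.module_name = module_name
        self.class_name = class_name
        self.interface = interface

    def create(self, *args):
        module = importlib.import_module(self.module_name)  # 1. call into the module
        instance = getattr(module, self.class_name)(*args)  # 2. instantiate the class
        if not isinstance(instance, self.interface):         # 3. check it conforms
            raise TypeError(f"{self.class_name} does not implement {self.interface.__name__}")
        return instance
```

For example, ObjectFactory("io", "StringIO", Readable).create("hello") returns a StringIO whose read() gives "hello".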
Directly link (Java) applications to Python code
This is where things start to get advanced. It would be nice if you could write a library in Python (or import an existing one) and link your Java code directly with the classes and functions defined in that library. This would require Jython to generate Java proxies: actual Java classes whose methods have the real signatures, with proper constructors and everything else you need to use them like any other Java code, while hiding away the dynamic aspects that make it Python. This could be done either through a compilation step, where a Jython proxy compiler generates the proxies that the Java code can link with, or through a ClassLoader that loads a Python module, inspects its content, and automatically generates the required proxies. With the ClassLoader approach, javac would need to know about the ClassLoader and use it to load signatures from Python code. This is of course where the Java integration decorators described above fit in.
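Here is a toy version of the proxy compiler idea: read a Python class's annotations and emit Java source for a matching class. It assumes Java type names are written as string annotations and that unannotated types default to java.lang.Object. java_proxy_source is hypothetical. The generated method bodies are placeholders where a real proxy would call into the Python implementation.

```python
import inspect

OBJECT = "java.lang.Object"


def _java_type(annotation):
    # Unannotated parameters and return values default to java.lang.Object.
    return OBJECT if annotation is inspect.Parameter.empty else str(annotation)


def java_proxy_source(cls, package):
    """Emit Java source for a proxy class with one method per public Python method."""
    lines = [f"package {package};", "", f"public class {cls.__name__} {{"]
    for name, func in vars(cls).items():
        if name.startswith("_") or not inspect.isfunction(func):
            continue
        sig = inspect.signature(func)
        params = list(sig.parameters.values())[1:]  # drop self
        args = ", ".join(f"{_java_type(p.annotation)} {p.name}" for p in params)
        lines.append(f"    public {_java_type(sig.return_annotation)} {name}({args}) {{")
        lines.append("        // A real proxy would call into the Python implementation here.")
        lines.append("        throw new UnsupportedOperationException();")
        lines.append("    }")
    lines.append("}")
    return "\n".join(lines)


class Greeter:
    def greet(self, name: "java.lang.String") -> "java.lang.String":
        return "Hello, " + name

    def ping(self):
        return None


print(java_proxy_source(Greeter, "com.example"))
```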

What do you think?

I would love to get feedback on these ideas. Either through comments to this entry, via Twitter or on the Jython-dev mailing list.

Please note that the ideas presented in this blog post are my own and do not reflect any current effort in the Jython project.

6 comments:

Stu said...

This sounds great.. a common java api for all pythons would mean more eyes on the code.. the other integration ideas sound good too :)

The decorators look good too (although time would tell)

Also looking forward to ctypes working in jython, this is good stuff and will hopefully unlock some more libraries.

Dino said...

In IronPython "import clr" isn't actually required for all access to .NET functionality. You can still do "import System" or pick your .NET namespace of choice and get those namespaces w/o doing import clr. And "import System" will actually have the same result as import clr: .NET members of shared types (int, str, object, long, complex, etc...) will become available on the Python objects. So before one of these imports you get an error doing object().ToString(). Afterwards it works.

But we'll probably need to change that as 3.x rolls on: the importer will be written in Python and we won't be able to mark the calling module when we resolve one of these imports. And furthermore we'll need to use a real import hook for resolving the .NET namespaces and types. So we'll probably end up making "import clr" work like from __future__ imports and recognize it and only it at compile time.

I'd personally lean towards always having the loader hook present and having an option which disables it. The odd thing about doing it when you encounter "import java" is that it'll globally alter other modules - so it seems like it may as well be a global option from the start.

We've added a new __clrtype__ feature in 2.6 for defining .NET classes from Python. But it's just a very primitive tool requiring the user to actually create the .NET type via Python code. We hope to plumb the Python side out later or have some ambitious user come up w/ something really cool. After we find the right API it'll probably end up like yours as a bunch of helpers in the clr module.

kayschluehr said...

"I'd personally lean towards always having the loader hook present and having an option which disables it."

I'm not sure a zero-effort integration of Jython/IronPython code with CPython is even a realistic goal.

I don't think system configuration should be handled at module level, but with a command line option instead. This would require a bit of an architectural supplement which would have to be PEP-ed. I'd suggest writing one if the problem was really pressing.

Otherwise a root package might define an "import java" statement which can be omitted for Jython which has different preferences/defaults than CPython or IronPython.

kayschluehr said...

"Kay Schluehr also suggested allowing decorators on assignments, to be able to support annotations on fields. I don't really have an opinion on this. Since I don't think fields should be exported in any public API anyway it's a bit useless"

The purpose of adding all the boilerplate to Python classes is to lift the Python interface to Java, providing a complete description. This doesn't mean that the fields have to be public. Annotating private fields is very common in Java; just look at examples using persistence annotations in JEE. Those are unrelated to DI but provide information to the ORM.

About Python DSL syntax for defining Java interfaces / annotations in Python. This is useless since the most succinct way to define them is to state the interfaces directly in Java. The definitions might be inlined in Python and compiled dynamically. One doesn't have to leave the script.

Stu said...

Decorators or not, I'd quite like to see a Java module that worked in CPython.