Tuesday, December 15, 2009

Seamless Neo4j integration in Django

About a year ago I gave a presentation at Devoxx where I showed off how easy it is to use any Java library with Django in Jython. The library I demonstrated this with was of course Neo4j. I had written some code for using Neo4j to define models for Django, and that code is now ready for you to use.

The integration between Django and Neo4j is implemented in the model layer. Since Neo4j does not have a SQL engine, it would not have been efficient or practical to implement the support as a database layer for Django. Google took the same approach when they integrated BigTable with Django for App Engine. This means that your code will need some minor modifications compared to using PostgreSQL or MySQL. Just as with BigTable on App Engine, you have to use a special library for defining your models when working with Neo4j, but the model definitions are very similar to those of Django's built-in ORM. With persistence systems that integrate at the database layer the only difference is in configuration, but that requires the database to fit the mold of a SQL database.
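To illustrate what integrating at the model layer means, here is a minimal sketch. It is not the actual neo4j.py code: FakeNode and every other name below are assumptions made for illustration, and a plain dict stands in for a Neo4j node. The point is only that a property on the model reads and writes the node's key/value store directly, so no SQL is generated anywhere.

```python
class FakeNode(object):
    """Hypothetical stand-in for a graph node: a bag of key/value properties."""

    def __init__(self):
        self.properties = {}


class Property(object):
    """Descriptor that maps a model attribute onto a node property."""

    def __init__(self):
        self.name = None  # filled in by __set_name__ when the class is created

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not on an instance
        return instance.node.properties.get(self.name)

    def __set__(self, instance, value):
        # Write straight to the node; there is no table or query involved.
        instance.node.properties[self.name] = value


class NodeModel(object):
    """Each model instance wraps exactly one node."""

    def __init__(self, **values):
        self.node = FakeNode()
        for key, value in values.items():
            setattr(self, key, value)


class Person(NodeModel):
    first_name = Property()
    last_name = Property()


sally = Person(first_name="Sally", last_name="Getitdone")
```

After this runs, sally.first_name is read back from sally.node.properties, which is the only place the value is stored.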

Why the **** has this taken a year to finish?

Short answer: The cat ate my source code.

A mess of symlinks, which stemmed from the fact that Jython didn't have good support for setuptools when I started writing this code, actually led to the complete loss of my source code. But to be honest, the code wasn't that good anyway. I wanted to add support for Django's administration interface, and I knew that undertaking would require a complete rewrite of my code. That rewrite is now done, which will make it possible for me to support the administrative interface of Django in the next release. So why not until now, a year after the first prototype? I was working on other things; it's that simple.

Getting started

While the demonstration I gave a year ago was geared towards Jython, since that was the topic of the presentation, the Python bindings for Neo4j work equally well with CPython. Neo4j and Django are all you need: the Python bindings for Neo4j come with a Django integration layer built in as of the most recent revisions in the repository. The source distribution also contains a few sample applications that demonstrate how the integration works. The Django integration is still in a very early stage of development, but the base is pretty solid, so new features should be much easier to add now. Since the state is pre-alpha, installation from source is the only option at the moment. Let me walk you through how to get things up and running:

  • Set up and activate a virtualenv for your development. This isn't strictly necessary, but it's nice to know that you will not destroy your system Python installation if you mess up. Since we got Jython to support virtualenv I use it for everything. If you use CPython your virtualenv will contain a python executable, and if you use Jython it will contain a jython executable. I will refer to either simply as python from here on; substitute jython if you, like me, prefer that implementation.
  • If you are using CPython: Install JPype, which is currently a dependency for accessing the JVM-based core of Neo4j from CPython:
    $ unzip JPype-0.5.4.1.zip
    $ cd JPype-0.5.4.1
    $ python setup.py install
    
  • Check out the source code for the Python bindings for Neo4j, and install it:
    $ svn co https://svn.neo4j.org/components/neo4j.py/trunk neo4j-python
    $ cd neo4j-python
    $ python setup.py install
    
  • Install Django:
    $ easy_install django
    
  • Create a new Django project:
    $ django-admin.py startproject neo4django
    
  • Create a new app in your Django project:
    $ python neo4django/manage.py startapp business
    
  • Configure Django to use Neo4j by adding the following settings to your settings.py:
    NEO4J_RESOURCE_URI = '/var/neo4j/neo4django'
    # NEO4J_RESOURCE_URI should be the path to where
    #    you want to store the Neo4j database.
    
    NEO4J_OPTIONS = {
        # this is optional and can be used to specify
        # extra startup parameters for Neo4j, such as
        # the classpath to load Neo4j from.
    }
    
    You can ignore the default Django configurations for RDBMS connections if you only plan to use Neo4j, but if you want to use Django's built-in admin interface (not supported with Neo4j quite yet) or authentication module, you will need to configure an RDBMS connection as well.
  • You are now ready to create your first Neo4j-backed domain objects for your Django application by editing business/models.py. Let's create a simple model for companies with owners and employees:
    from neo4j.model import django_model as model
    
    class Person(model.NodeModel):
        first_name = model.Property()
        last_name = model.Property()
        def __unicode__(self):
            return u"%s %s" % (self.first_name, self.last_name)
    
    class Company(model.NodeModel):
        name = model.Property(indexed=True)
        owners = model.Relationship(Person,
            type=model.Outgoing.OWNED_BY,
            related_name="owns",
        )
        employees = model.Relationship(Person,
            type=model.Incoming.WORKS_AT,
            related_name="employer",
            related_single=True, # Only allow Persons to work at one Company
        )
        def __unicode__(self):
            return self.name
    
  • That's it, you've created your first Django domain model using Neo4j. Let's try it out:
    $ python neo4django/manage.py shell
    >>> from neo4django.business import models
    >>> seven_eleven = models.Company.objects.create(name="Seven Eleven")
    >>> seven_eleven.employees.add(
    ...     models.Person.objects.create(
    ...         first_name="Sally", last_name="Getitdone"),
    ...     models.Person.objects.create(
    ...         first_name="John", last_name="Workerbee"))
    >>> seven_eleven.save() # store the newly created relationships
    >>> people = list(seven_eleven.employees.all())
    >>> someone = people[0]
    >>> print someone, "works at", someone.employer
    

Notice how the model objects are compatible with model objects created using Django's built-in ORM, which makes it easy to port your existing applications to a Neo4j backend: all you need to change is the model definitions. For more examples, see the example directory in the repository: https://svn.neo4j.org/components/neo4j.py/trunk/src/examples/python/.
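To make the direction of the WORKS_AT relationship in the Company model easier to picture, here is a small in-memory sketch. It does not use Neo4j or the neo4j.py API at all; Graph, add_employee and the rest are hypothetical names. The employees of a company are the start nodes of its incoming WORKS_AT edges, and the employer of a person is found by following that same edge in the other direction. How the real library enforces related_single=True is not something I describe in this post, so raising ValueError here is purely an assumption.

```python
class Graph(object):
    """Tiny in-memory stand-in for a graph database."""

    def __init__(self):
        self.edges = []  # list of (start, relationship type, end)

    def relate(self, start, rel_type, end):
        self.edges.append((start, rel_type, end))

    def outgoing(self, node, rel_type):
        return [e for (s, t, e) in self.edges if s is node and t == rel_type]

    def incoming(self, node, rel_type):
        return [s for (s, t, e) in self.edges if e is node and t == rel_type]


class Person(object):
    def __init__(self, graph, name):
        self.graph = graph
        self.name = name

    @property
    def employer(self):
        # Follow the WORKS_AT edge from this person to a company, if any.
        companies = self.graph.outgoing(self, "WORKS_AT")
        return companies[0] if companies else None


class Company(object):
    def __init__(self, graph, name):
        self.graph = graph
        self.name = name

    def add_employee(self, person):
        # Assumed behaviour for related_single=True: at most one employer.
        if person.employer is not None:
            raise ValueError("%s already has an employer" % person.name)
        self.graph.relate(person, "WORKS_AT", self)

    def employees(self):
        # Employees are the start nodes of incoming WORKS_AT edges.
        return self.graph.incoming(self, "WORKS_AT")


graph = Graph()
seven_eleven = Company(graph, "Seven Eleven")
sally = Person(graph, "Sally Getitdone")
seven_eleven.add_employee(sally)
```

A single stored edge serves both seven_eleven.employees() and sally.employer, which is the same idea that related_name expresses in the model definition.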

Future evolution

There is still more work to be done. As this is the first release, there are likely to be bugs, and I know about a few things (mainly involving querying) that I have not implemented support for yet. I also have a list of (slightly bigger) features that I am going to add. To keep you interested, I'll list them with a brief explanation:

  • Add support for the Django admin interface. You should be able to manage your Neo4j entities in the Django administration interface, just as you manage ORM entities. To do this I need to dig further into the internals of the admin source code, to find out what it expects from the model objects to be able to pick up on them and manage them. The hardest part with this is that the admin system has a policy of silent failure, meaning that it will not tell me how my code violates its expectations.
  • Add support for Relationship models. Currently you can only assign properties to nodes in the domain modeling API, but you should be able to have entities represented by relationships as well. The way you will do this is by extending the Relationship class.
  • Add a few basic property types. I will add support for creating your own property types by extending the Property class (this is implemented already, but not tested, so if it works it's only by accident). I will also add a few basic subtypes of Property, a datetime type at the very least. I will also add support for choosing what kind of index to use with each indexed property; in the case of datetime, for example, a Timeline index seems quite natural. Supporting enumerated values for Properties is also planned, i.e. limiting the allowed values to a fixed set.
  • Tap into the power of Neo4j by adding support for methods that perform arbitrary operations on the graph (such as traversals) and automatically convert the returned nodes to entity objects. I think this will be a really cool and powerful feature, but I have not worked out the details of the API yet.
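As a rough idea of what an enumerated property type could look like, here is a sketch in plain Python. It is not part of neo4j.py, and none of it reflects a settled API: the Property base class below is a simplified stand-in written for this example, and EnumProperty and its validate hook are hypothetical names.

```python
class Property(object):
    """Simplified stand-in for a property base class (not the real one)."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        # Subclasses customise validate() to restrict or convert values.
        instance.__dict__[self.name] = self.validate(value)

    def validate(self, value):
        return value


class EnumProperty(Property):
    """Hypothetical property that only accepts a fixed set of values."""

    def __init__(self, *allowed):
        self.allowed = frozenset(allowed)

    def validate(self, value):
        if value not in self.allowed:
            raise ValueError("%r is not one of %s"
                             % (value, sorted(self.allowed)))
        return value


class Employee(object):
    role = EnumProperty("clerk", "manager")


employee = Employee()
employee.role = "clerk"      # accepted
try:
    employee.role = "ceo"    # rejected, the stored value stays "clerk"
    rejected = False
except ValueError:
    rejected = True
```

A datetime property could use the same validate hook to convert values instead of rejecting them.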

Report any bugs you encounter either to the Neo4j bug tracker or on the Neo4j mailing list. Suggestions for improvements and other ideas are also welcome on the mailing list, to me personally, or why not as a comment on this blog.

Happy Hacking