Tuesday, December 15, 2009

Seamless Neo4j integration in Django

About a year ago I gave a presentation at Devoxx where I showed how easy it is to use any Java library with Django in Jython. The library I demonstrated this with was, of course, Neo4j. I had written some code for using Neo4j to define models for Django, and now it is ready to be released for you to use.

The integration between Django and Neo4j is implemented in the model layer. Since Neo4j does not have a SQL engine, it would not have been efficient or practical to implement the support as a database layer for Django. Google took the same approach when they integrated BigTable with Django for App Engine. This means that your code will need some minor modifications compared to using PostgreSQL or MySQL. Just as with BigTable on App Engine, you will have to use a special library for defining your models when working with Neo4j, but the model definitions are very similar to those of Django's built-in ORM. With persistence systems that integrate at the database layer the only difference is in the configuration, but that requires the database to fit the mold of a SQL database.
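To make "integrating at the model layer" a bit more concrete, here is a minimal, self-contained sketch. It is not the neo4j.py implementation: `NodeStore`, `NodeManager` and `GraphModel` are hypothetical names, and the in-memory dict stands in for the actual Neo4j graph database. The point it illustrates is that the model's manager creates and looks up nodes directly, so no SQL is ever generated and Django's database backends are never involved.

```python
import itertools


class NodeStore(object):
    """Hypothetical stand-in for a graph database: node id -> label and properties."""

    def __init__(self):
        self.nodes = {}
        self._ids = itertools.count(1)

    def create_node(self, label, properties):
        node_id = next(self._ids)
        self.nodes[node_id] = {"label": label, "props": dict(properties)}
        return node_id


class GraphModel(object):
    """Hypothetical base class: attribute reads fall through to the node's properties."""

    def __init__(self, node_id, store):
        self._node_id = node_id
        self._store = store

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._store.nodes[self._node_id]["props"][name]
        except KeyError:
            raise AttributeError(name)


class NodeManager(object):
    """Model-layer manager: talks to the node store directly instead of emitting SQL."""

    def __init__(self, model, store):
        self.model = model
        self.store = store

    def create(self, **properties):
        node_id = self.store.create_node(self.model.__name__, properties)
        return self.model(node_id, self.store)

    def filter(self, **criteria):
        matches = []
        for node_id in sorted(self.store.nodes):
            node = self.store.nodes[node_id]
            if node["label"] != self.model.__name__:
                continue  # node belongs to another model
            if all(node["props"].get(k) == v for k, v in criteria.items()):
                matches.append(self.model(node_id, self.store))
        return matches


store = NodeStore()


class Company(GraphModel):
    pass


Company.objects = NodeManager(Company, store)

seven_eleven = Company.objects.create(name="Seven Eleven")
print(seven_eleven.name)  # Seven Eleven
```

A database-layer integration would instead have to translate Django's generated SQL-oriented queries into graph operations, which is exactly the mismatch described above.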

Why the **** has this taken a year to finish?

Short answer: The cat ate my source code.

A mess of symlinks, stemming from the fact that Jython didn't have good support for setuptools when I started writing this code, actually led to the complete loss of my source code. But to be honest, the code wasn't that good anyway. I wanted to add support for Django's administration interface, and I knew that undertaking would require a complete rewrite of my code. The rewrite is now done, and it will make it possible for me to support the Django administration interface in the next release. So why did it take until now, a year after the first prototype? I was working on other things; it's that simple.

Getting started

While the demonstration I gave a year ago was geared towards Jython, since that was the topic of the presentation, the Python bindings for Neo4j work equally well with CPython. Neo4j and Django are all you need: as of the most recent revisions in the repository, the Python bindings for Neo4j come with a Django integration layer built in. The source distribution also contains a few sample applications that demonstrate how the integration works. The Django integration is still at a very early stage of development, but the base is pretty solid, so new features should be much easier to add now. Since the code is pre-alpha, installing from source is the only option at the moment. Let me walk you through how to get things up and running:

  • Set up and activate a virtualenv for your development. This isn't strictly necessary, but it's nice to know that you will not destroy your system Python installation if you mess up. Since we got Jython to support virtualenv, I use it for everything. If you use CPython your virtualenv will contain a python executable, and if you use Jython it will contain a jython executable. I will refer to either simply as python from here on; substitute jython for it if you, like me, prefer that implementation.
  • If you are using CPython, install JPype; it is currently required for accessing the JVM-based core of Neo4j from CPython:
    $ unzip JPype-0.5.4.1.zip
    $ cd JPype-0.5.4.1
    $ python setup.py install
    
  • Check out the source code for the Python bindings for Neo4j, and install it:
    $ svn co https://svn.neo4j.org/components/neo4j.py/trunk neo4j-python
    $ cd neo4j-python
    $ python setup.py install
    
  • Install Django:
    $ easy_install django
    
  • Create a new Django project:
    $ django-admin.py startproject neo4django
    
  • Create a new app in your Django project:
    $ python neo4django/manage.py startapp business
    
  • Configure Django to use Neo4j by adding the following settings to your settings.py:
    NEO4J_RESOURCE_URI = '/var/neo4j/neo4django'
    # NEO4J_RESOURCE_URI should be the path to where
    #    you want to store the Neo4j database.
    
    NEO4J_OPTIONS = {
        # this is optional and can be used to specify
        # extra startup parameters for Neo4j, such as
        # the classpath to load Neo4j from.
    }
    
    You can ignore the default Django settings for RDBMS connections if you only plan to use Neo4j, but if you want to use Django's built-in admin interface (not supported with Neo4j quite yet) or its authentication module, you will still need to configure a relational database.
  • You are now ready to create your first Neo4j backed domain objects for your Django application, by editing business/models.py. Let's create a simple model for companies with owners and employees:
    from neo4j.model import django_model as model
    
    class Person(model.NodeModel):
        first_name = model.Property()
        last_name = model.Property()
        def __unicode__(self):
            return u"%s %s" % (self.first_name, self.last_name)
    
    class Company(model.NodeModel):
        name = model.Property(indexed=True)
        owners = model.Relationship(Person,
            type=model.Outgoing.OWNED_BY,
            related_name="owns",
        )
        employees = model.Relationship(Person,
            type=model.Incoming.WORKS_AT,
            related_name="employer",
            related_single=True, # Only allow Persons to work at one Company
        )
        def __unicode__(self):
            return self.name
    
  • That's it: you've created your first Django domain model using Neo4j. Let's try it out:
    $ python neo4django/manage.py shell
    >>> from neo4django.business import models
    >>> seven_eleven = models.Company.objects.create(name="Seven Eleven")
    >>> seven_eleven.employees.add(
    ...     models.Person.objects.create(
    ...         first_name="Sally", last_name="Getitdone"),
    ...     models.Person.objects.create(
    ...         first_name="John", last_name="Workerbee"))
    >>> seven_eleven.save() # store the newly created relationships
    >>> people = list(seven_eleven.employees.all())
    >>> someone = people[0]
    >>> print someone, "works at", someone.employer
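The relationship declarations are where the model definitions differ most from the relational ORM. The following is a hypothetical, in-memory sketch of the graph the shell session builds, not the library's code. It assumes that `Incoming.WORKS_AT` on `Company.employees` means each employee is stored as a `Person -WORKS_AT-> Company` edge, that the reverse accessor `employer` (the `related_name`) follows the same edge from the other end, and that `related_single=True` amounts to refusing a second WORKS_AT edge from the same person. The helper names `Graph`, `add_employee`, `employees` and `employer` are illustrative only.

```python
class Graph(object):
    """Hypothetical in-memory graph: a list of (start, type, end) edges."""

    def __init__(self):
        self.edges = []

    def relate(self, start, rel_type, end):
        self.edges.append((start, rel_type, end))

    def outgoing(self, node, rel_type):
        return [e for s, t, e in self.edges if s == node and t == rel_type]

    def incoming(self, node, rel_type):
        return [s for s, t, e in self.edges if e == node and t == rel_type]


graph = Graph()


def add_employee(company, person):
    # Incoming.WORKS_AT on Company: the edge points from the Person to the Company.
    # related_single=True: a Person may have at most one outgoing WORKS_AT edge.
    if graph.outgoing(person, "WORKS_AT"):
        raise ValueError("%s already has an employer" % person)
    graph.relate(person, "WORKS_AT", company)


def employees(company):
    # Company.employees: everyone with a WORKS_AT edge into this company.
    return graph.incoming(company, "WORKS_AT")


def employer(person):
    # Reverse accessor (related_name="employer"), single-valued.
    companies = graph.outgoing(person, "WORKS_AT")
    return companies[0] if companies else None


add_employee("Seven Eleven", "Sally Getitdone")
add_employee("Seven Eleven", "John Workerbee")
print(employees("Seven Eleven"))    # ['Sally Getitdone', 'John Workerbee']
print(employer("Sally Getitdone"))  # Seven Eleven
```

Because both accessors read the same edge, there is no separate join table to keep in sync, which is the main practical difference from a many-to-one foreign key in the relational ORM.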
    

Notice that the model objects are compatible with model objects created using Django's built-in ORM. This makes it easy to port your existing applications to a Neo4j backend: all you need to change are the model definitions. For more examples, see the examples directory in the repository: https://svn.neo4j.org/components/neo4j.py/trunk/src/examples/python/.
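The NEO4J_RESOURCE_URI setting from the configuration step has to point at a directory that the Django process can actually write to; one of the comments below describes a "Could not create data source" failure caused by keeping the database files in a system folder. A small pre-flight check such as the following can catch that before Neo4j starts. `check_neo4j_store` is a hypothetical helper that uses only the standard library; it is not part of neo4j.py or Django.

```python
import os


def check_neo4j_store(path):
    """Return a list of problems with a NEO4J_RESOURCE_URI-style path (hypothetical helper).

    An empty list means the directory exists, or can be created, and is writable.
    """
    problems = []
    path = os.path.abspath(path)
    # Walk up to the closest existing ancestor: that is where files will be created.
    existing = path
    while not os.path.exists(existing):
        parent = os.path.dirname(existing)
        if parent == existing:  # reached the filesystem root
            break
        existing = parent
    if not os.path.isdir(existing):
        problems.append("%s exists but is not a directory" % existing)
    elif not os.access(existing, os.W_OK | os.X_OK):
        problems.append("%s is not writable by this process" % existing)
    return problems
```

One possible use, under the assumption that failing fast at startup is acceptable, is to call it at the end of settings.py with NEO4J_RESOURCE_URI and raise an error if the returned list is not empty. Note that os.access reports permissions for the current user, so run the check as the same user that will serve the application.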

Future evolution

There is still more work to be done. As this is the first release, there are likely to be bugs, and I know of a few things (mainly involving querying) that I have not implemented support for yet. I also have a list of (slightly bigger) features that I am going to add. To keep you interested, here they are with a brief explanation:

  • Add support for the Django admin interface. You should be able to manage your Neo4j entities in the Django administration interface, just as you manage ORM entities. To do this I need to dig further into the internals of the admin source code to find out what it expects from model objects in order to pick up on them and manage them. The hardest part of this is that the admin system has a policy of silent failure, meaning that it will not tell me how my code violates its expectations.
  • Add support for relationship models. Currently the domain modeling API only lets you assign properties to nodes, but you should be able to have entities represented by relationships as well. You will do this by extending the Relationship class.
  • Add a few basic property types. I will add support for creating your own property types by extending the Property class (this is already implemented, but not tested, so if it works it's only by accident). I will also add a few basic subtypes of Property, a datetime type at the very least. I will also add support for choosing what kind of index to use with each indexed property; in the case of datetime, for example, a Timeline index seems quite natural. Supporting enumerated values for properties, i.e. limiting a property to an enumerated set of allowed values, is also planned.
  • Tap into the power of Neo4j by adding support for methods that perform arbitrary operations on the graph (such as traversals), with the returned nodes automatically converted to entity objects. I think this will be a really cool and powerful feature, but I have not worked out the details of the API yet.
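The planned enumerated-value properties could behave roughly as follows. This is a standalone sketch built on a plain Python descriptor, not the neo4j.py Property class: `EnumProperty`, its `name` and `choices` arguments, and the `_props` dict standing in for a node's property store are all hypothetical. It shows only the essential behaviour, rejecting any value outside the allowed set at assignment time.

```python
class EnumProperty(object):
    """Hypothetical property that only accepts values from a fixed set."""

    def __init__(self, name, choices):
        self.name = name                  # key used in the node's property store
        self.choices = frozenset(choices)

    def __get__(self, obj, owner):
        if obj is None:                   # accessed on the class itself
            return self
        return obj._props.get(self.name)

    def __set__(self, obj, value):
        if value not in self.choices:
            raise ValueError("%r is not one of %s"
                             % (value, ", ".join(sorted(self.choices))))
        obj._props[self.name] = value


class Company(object):
    status = EnumProperty("status", ["active", "acquired", "bankrupt"])

    def __init__(self):
        self._props = {}                  # stands in for a Neo4j node's properties


company = Company()
company.status = "active"       # accepted
# company.status = "closed"     # would raise ValueError
```

Validating in `__set__` means an invalid value never reaches the store, rather than being caught only when the entity is saved.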

Report any bugs you encounter to either the Neo4j bug tracker or the Neo4j mailing list. Suggestions for improvements and other ideas are also welcome on the mailing list, to me personally, or why not as a comment on this blog.

Happy Hacking

11 comments:

Unknown said...

I wonder if your work for neo4j integration with Django could also be translated into integration with ZODB, a somewhat similar database for Python objects.

Tobias said...

@afd: Yes, the same approach (integrating in the model-layer rather than the db-layer) would be the easiest way for most databases where SQL is not the most efficient means of access. ZODB certainly falls into that category.

Fragilista said...

great news, gotta try it out!

Waldemar Kornewald said...

I'm not sure if this is compatible with Neo4J, but we're working on non-relational DB support for Django's ORM. Join our project, here:
http://bitbucket.org/wkornewald/django-nonrel-multidb/

The wiki links to more information like a discussion group, blog, etc.

Sergey M. Nikitin said...

Thanks a lot! I'll try to use it )

Sergey M. Nikitin said...

What system configuration do you have? I ran into some problems at the compile step for "neo4j-python"

(distutils.errors.DistutilsError: Setup script exited with error: command 'gcc' failed with exit status 1)

tom said...

Waldemar, that link is broken...

Tobias, are there any updates to the django code?

http://pppeoplepppowered.blogspot.com/2010/08/django-and-neo4j.html

Tom

Waldemar Kornewald said...

@tom: thanks for notifying me. The project has been rewritten with a different approach which can be found here:

http://www.allbuttonspressed.com/projects/django-nonrel

R S said...

Hi Tobias,
I've tried installing Neo4Django in a couple of different environments and am not having much luck. Has there been any development in the code? I am curious to see if I can use it on my webhost (Dreamhost), but I will keep working to see if I can get it running on my own server first.

Thank you!

isen said...

Hi Tobias,

I am desperate to get this working and would appreciate it if you could help me.

Everything goes fine for me: installation, creating the project and starting the app. But the problem comes when I try to create a Company object (i.e. seven_eleven = models.Company.objects.create(name="Seven Eleven")).

I get this error:

  File "/sw/lib/python2.6/site-packages/django/db/models/manager.py", line 138, in create
    return self.get_query_set().create(**kwargs)
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/model/django_model/__init__.py", line 362, in create
    obj.save(force_insert=True)
  File "/sw/lib/python2.6/site-packages/django/db/models/base.py", line 460, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/_util.py", line 47, in __get__
    method, graphdb = self.descr_get(obj, cls)
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/_util.py", line 41, in descr_get
    graphdb = self.accessor.__get__(obj, cls)
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/model/django_model/__init__.py", line 58, in __get__
    return DjangoNeo.neo
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/model/django_model/__init__.py", line 124, in neo
    return self.__setup_neo()
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/model/django_model/__init__.py", line 136, in __setup_neo
    self.__neo = NeoService(resource_uri, **options)
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/__init__.py", line 522, in __new__
    neo = core.load_neo(resource_uri, params)
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/_core.py", line 332, in load_neo
    return load_neo(resource_uri, parameters)
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/_core.py", line 225, in load_neo
    return GraphDatabase(resource_uri, settings, config, log)
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/_core.py", line 230, in __init__
    neo = backend.load_neo(resource_uri, settings)
  File "/sw/lib/python2.6/site-packages/Neo4j.py-0.1_SNAPSHOT-py2.6.egg/neo4j/_backend/__init__.py", line 74, in load_neo
    return impl(resource_uri, implementation.make_map(settings))
  File "/sw/lib/python2.6/site-packages/jpype/_jclass.py", line 79, in _javaInit
    self.__javaobject__ = self.__class__.__javaclass__.newClassInstance(*args)
java.lang.RuntimeExceptionPyRaisable: org.neo4j.kernel.impl.transaction.TransactionFailureException: Could not create data source [nioneodb], see nested exception for cause of error

Can you please tell me what is wrong.

Thanks in advance

isen said...

Never mind, I fixed it:

First of all, my problem was that I was keeping my Neo4j database files in a system folder, so the files could not be saved because of permission problems.

Another, and the most annoying, thing is the bug in [..]/site-packages/django/db/models/base.py: on line 460, Model.save has gained another argument in Django 1.2, named using. So I removed using=using on line 460, and now it works like a charm.

Now I have to figure out how to display Neo4j database info on a web page, since I am a beginner in Django as well :)
Cheers.
Isen