Friday, July 31, 2009

"Social networking" killed productivity

Twitter has become work. Not merely acceptable at work, i.e. something that is not frowned upon doing at work, but actual work: something you are required to do at work. At least this is the case if you are involved somewhere where the development team is the marketing team, like a startup or an open source project. For the record, my involvement in Neo4j qualifies on both counts, and Jython is most certainly an open source project, and quite a high-profile one as well.

In order to stay on top of things in this situation you easily find yourself with push-based Twitter notifications, or at least reading a lot of material on a regular basis. I, for example, get about 150 to 200 tweets per day from the people I follow. Combine this with the expectation to stay on top of email (yet again you go for push), and you've got a constant stream of interruptions, and that really kills productivity.

Just the other day I read the Life Offline posts by Aaron Swartz, and found that I very much recognize myself in how he describes the problems of constant online presence. It would be wonderful if I, like he was able to do, could take a long stretch of time away from being connected, but I don't think that is possible, at least not now or in the near future. The problem remains, though: I am not being productive, and some things don't get done in time. That is a problem.

I've tried shifting my email and Twitter use so that I only process these things once per day, but it still takes two hours or more of my day simply to process the incoming stream. By processing I mean:

  • Read all the email and define actions.
  • Read all tweets, open tabs for links that seem interesting, skim those pages and define actions.
  • Read feeds and define actions.

That takes two hours. Then I still have to perform the actions that I have defined, which can take the rest of the day.

I noticed about twelve years ago already how destructive online communities and social networks can be, and how much time they consume. I have therefore tried to stay away from them, which is why I don't use my Facebook account. But when social networking has become part of work it is much harder to avoid. Twitter is also particularly difficult to ignore because of how hugely influential it is: it is the de facto way to find out about new things and interesting articles.

I am starting to believe that Donald Knuth made a wise decision in not having an email address. As he points out, email is for people who need to be on top of things, and he does not have an email address because he no longer has to be on top of things. I will agree with that: Donald Knuth has contributed a lot to the field of computer science, but he is definitely not on top of things anymore. So how do you cope with being on top of things while still being productive? Is it even possible? I would love to get any insight into the secrets that I am obviously unaware of.

Wednesday, July 15, 2009

Improving performance in Jython

About two weeks ago I published a writeup of my findings on the performance of synchronization primitives in Jython, based on my presentation at JavaOne. During the presentation I said that these performance issues were something I was going to work on and improve, and indeed I did. I cannot take full credit for this; Jim Baker played a substantial part in the work as well. The end result is still something I'm very proud of, since we managed to improve the performance of this benchmark by as much as 50 times.

The benchmarks

The comparisons were performed based on the execution of this benchmark script invoked with:

  • JAVA_HOME=$JAVA_6_HOME jython synchbench.py
  • JAVA_HOME=$JAVA_6_HOME jython -J-server synchbench.py
# -*- coding: utf-8 -*-
from __future__ import with_statement, division

from java.lang.System import nanoTime
from java.util.concurrent import Executors, Callable
from java.util.concurrent.atomic import AtomicInteger

from functools import wraps
from threading import Lock

def adder(a, b):
    return a+b


count = 0
def counting_adder(a, b):
    global count
    count += 1 # NOT SYNCHRONIZED!
    return a+b


lock = Lock()
sync_count = 0
def synchronized_counting_adder(a, b):
    global sync_count
    with lock:
        sync_count += 1
    return a+b


atomic_count = AtomicInteger()
def atomic_counting_adder(a,b):
    atomic_count.incrementAndGet()
    return a+b


class Task(Callable):
    def __init__(self, func):
        self.call = func

def callit(function):
    @Task
    @wraps(function)
    def callable():
        timings = []
        for x in xrange(5):
            start = nanoTime()
            for x in xrange(10000):
                function(5,10)
            timings.append((nanoTime() - start)/1000000.0)
        return min(timings)
    return callable

def timeit(function):
    futures = []
    for i in xrange(40):
        futures.append(pool.submit(function))
    sum = 0
    for future in futures:
        sum += future.get()
    print sum

all = (adder,counting_adder,synchronized_counting_adder,atomic_counting_adder)
all = [callit(f) for f in all]

WARMUP = 20000
print "<WARMUP>"
for function in all:
    function.call()
for function in all:
    for x in xrange(WARMUP):
        function.call()
print "</WARMUP>"

pool = Executors.newFixedThreadPool(3)

for function in all:
    print
    print function.call.__name__
    timeit(function)
pool.shutdown()

glob = list(globals())
for name in glob:
    if name.endswith('count'):
        print name, globals()[name]

And the JRuby equivalent for comparison:

require 'java'
import java.lang.System
import java.util.concurrent.Executors
require 'thread'

def adder(a,b)
  a+b
end

class Counting
  def initialize
    @count = 0
  end
  def count
    @count
  end
  def adder(a,b)
    @count = @count + 1
    a+b
  end
end

class Synchronized
  def initialize
    @mutex = Mutex.new
    @count = 0
  end
  def count
    @count
  end
  def adder(a,b)
    @mutex.synchronize {
      @count = @count + 1
    }
    a + b
  end
end

counting = Counting.new
synchronized = Synchronized.new

puts "<WARMUP>"
10.times do
  10000.times do
    adder 5, 10
    counting.adder 5, 10
    synchronized.adder 5, 10
  end
end
puts "</WARMUP>"

class Body
  def initialize
    @pool = Executors.newFixedThreadPool(3)
  end
  def timeit(name)
    puts
    puts name
    result = []
    40.times do
      result << @pool.submit do
        times = []
        5.times do
          t = System.nanoTime
          10000.times do
            yield
          end
          times << (System.nanoTime - t) / 1000000.0
        end
        times.min
      end
    end
    result.each {|future| puts future.get()}
  end
  def done
    @pool.shutdown
  end
end

body = Body.new

body.timeit("adder") {adder 5, 10}
body.timeit("counting adder") {counting.adder 5, 10}
body.timeit("synchronized adder") {synchronized.adder 5, 10}

body.done

Where we started

A week ago the performance of this Jython benchmark was bad. Compared to the equivalent code in JRuby, Jython required over 10 times as much time to complete.

When I analyzed the code that Jython and JRuby generated and executed, I came to the conclusion that the reason Jython performed so badly was that the call path from the running code to the actual lock/unlock instructions introduced too much overhead for the JVM to have any chance at analyzing and optimizing the lock. I published this analysis in my writeup on the problem. It would of course be possible to lower this overhead by importing and utilizing the pure Java classes for synchronization instead of using the Jython threading module, but we like how the with-statement reads for synchronization:

with lock:
    counter += 1
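
It would, for example, be possible to write something along these lines instead (a sketch of that alternative, not code from the benchmark; ReentrantLock and its lock()/unlock() methods come straight from java.util.concurrent.locks, but the counter is only an illustration):

# Sketch: driving a Java lock directly from Jython instead of threading.Lock
from java.util.concurrent.locks import ReentrantLock

jlock = ReentrantLock()
counter = 0

def increment():
    global counter
    jlock.lock()
    try:
        counter += 1  # the protected region, paired with lock()/unlock() by hand
    finally:
        jlock.unlock()

It works, and it bypasses the threading module's call path, but it does not read nearly as well as the with-statement.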

Getting better

Based on my analysis of how the with-statement compiles, and the way that this introduces overhead, I worked out the following redesign of the with-statement context manager interaction that would allow us to get closer to the metal, while remaining compatible with PEP 343 (a rough sketch in Python follows the list):

  • When entering the with-block we transform the object that constitutes the context manager into a ContextManager object.
  • If the object that constitutes the context manager implements the ContextManager interface it is simply returned. This is where context managers written in Java get their huge benefit by getting really close to the metal.
  • Otherwise a default implementation of the ContextManager is returned. This object is created by retrieving the __exit__ method and invoking the __enter__ method of the context manager object.
  • The compiled code of the with-statement then only invokes the __enter__ and __exit__ methods of the returned ContextManager object.
  • This has the added benefit that even for context managers written in pure Python the ContextManager could be optimized and cached when we implement call site caching.
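
Here is a rough Python sketch of the resolution step described above; the names are illustrative only and make no claim to match the actual Jython internals:

class ContextManager(object):
    # The interface that the compiled with-statement code talks to directly.
    def __enter__(self):
        raise NotImplementedError
    def __exit__(self, exc_type, exc_value, traceback):
        raise NotImplementedError

class DefaultContextManager(ContextManager):
    # Fallback wrapper for ordinary Python context managers.
    def __init__(self, obj):
        # Retrieve __exit__ and invoke __enter__ up front, as described above.
        self._exit = obj.__exit__
        self._entered = obj.__enter__()
    def __enter__(self):
        return self._entered
    def __exit__(self, exc_type, exc_value, traceback):
        return self._exit(exc_type, exc_value, traceback)

def as_context_manager(obj):
    # Context managers written in Java can implement the interface directly
    # and are returned as-is; everything else is wrapped in the default one.
    if isinstance(obj, ContextManager):
        return obj
    return DefaultContextManager(obj)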

This specification was easily implemented by Jim, who could then rewrite the threading module in Java to let the lock implementation benefit directly from the rewritten with-statement, getting the actual code really close to the locking and unlocking. The results were immediate and beyond expectation.

Not only did we improve performance, we passed the performance of the JRuby equivalent! Even using the client compiler, with no warm-up, we perform almost two times better than JRuby. Turn on the server compiler, let the JIT warm up and perform all its compilation, and we end up with a speedup of slightly more than 50 times.

A disclaimer is appropriate here. For the first benchmark (before this optimization) I didn't have time to wait for a full warmup, both because the benchmark was so incredibly slow at that point, and because I was running the benchmarks quite late before the presentation and couldn't leave them running overnight. Instead I turned down the compilation threshold of the HotSpot server compiler and ran just a few warmup iterations. It is possible that the JVM could have optimized the previous code slightly better given (a lot) more time. The actual speedup might be closer to the speedup from the first code to the new code using the client compiler and no warmup, but that is still a speedup of almost 20 times, which I'm still very proud of. There is also the possibility that I didn't run or implement the JRuby version in the best possible way, meaning there might be ways of making it run faster that I don't know about. The new figures are still very nice, much nicer than the old ones for sure.

The current state of performance of Jython synchronization primitives

It is also interesting to see how the current implementation compares to the other versions in Jython that I included in my presentation:

Without synchronization the code runs about three times as fast as with synchronization, but the counter does not return the correct result here due to race conditions. It's interesting from the point of view of analyzing the overhead added by synchronization but not for an actual implementation. Two times overhead is quite good in my opinion. What is more interesting to see is that the fastest version from the presentation, the one using AtomicInteger, is now suffering from the overhead of reflection required for the method invocations compared to the synchronized version. In a system with more hardware threads (commonly referred to as "cores") the implementation based on AtomicInteger could still be faster though.

Where do we proceed from here?

Now that we have proven that it is possible to get a nice speedup from this redesign of the code paths, the next step is to provide the same kind of optimization for code written in pure Python. Providing a better version of contextlib.contextmanager that exploits these faster code paths should be the easiest way to improve context managers written in Python (a minimal example of such a context manager follows below). Then there is of course a wide range of other areas in Python where performance could be improved through the same kind of thorough analysis. I don't know at this point what we will focus on next, but you can look forward to many more performance improvements in Jython in the time to come.
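
For reference, here is a minimal example (not part of the benchmark) of the kind of pure-Python context manager, built with contextlib.contextmanager, that such an improved contextlib would speed up:

from __future__ import with_statement
from contextlib import contextmanager
from threading import Lock

guard = Lock()
count = [0]

@contextmanager
def counted():
    # runs up to the yield on __enter__, and past it on __exit__
    with guard:
        count[0] += 1
    yield count[0]

with counted() as seen:
    print seen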

Monday, July 06, 2009

My view of EuroPython 2009

I was not very enthusiastic about going to EuroPython this year. I don't like it when all I do is fly in, give a talk and fly out again, but that was all I could afford to do for EuroPython. I go to a few conferences a year, and all of them are filed under expenses for me, since I don't have an employer that pays for my conference trips. With EuroPython in Birmingham, and the UK not being the cheapest of countries, there was no chance for me to afford staying more than two days.

My plane landed Tuesday at lunch time, and my talk was the first one after the afternoon keynote. I arrived at the venue with about two and a half hours to take care of registration and payment. I had requested not to pay when I registered online and was told that the best solution would be to pay when I arrived; my reason was of course that I didn't have any money when the online registration was required, since freelance open source development does not pay every month. The only problem was that when I arrived there was no one there to take care of registration; it was only open in the morning. I tried asking a few guys in staff t-shirts, but they were not very helpful and seemed like they just wanted me to go away. I decided to wait until the next day with my registration and went to see the keynote.

It was nice to get to see Cory Doctorow in person. His keynote was about the copyright war and why it matters to us as developers. Scary stuff. The world that the media industry is forcing on us is anything but pleasant. He could even provide examples of actual cases where large music publishers have forced open source developers working on projects they didn't like to change profession or face millions of dollars in fines. Not on the basis of the software being illegal (it wasn't), but with a lawsuit for copyright infringement over downloaded MP3 files. He also talked about how the media industry seems to prefer the entire internet to be illegal, along with open source software altogether, something that Reinout van Rees has a more complete summary of.

My presentation on what would make Jython a better Python for the JVM felt like the best presentation I have given to date. The material was an updated version of the talk I gave at PyCon earlier this year. Experience really does make you a better speaker, and this was the 11th presentation of my speaking career so far (in less than two years). It feels really good to stand in front of people and talk when a large number of them are nodding and smiling at what you are saying throughout most of the presentation. The audience was great here too; they asked a lot of good questions, both during the presentation and afterwards. I didn't go through all of my slides, but I covered all of my material. Some things came more naturally with other slides for this crowd, so I altered the talk slightly while giving it. This was a scenario I had prepared for: I was prepared to use the slides either way, and with this audience I felt more comfortable doing it this way. The only negative aspect of the presentation was the room; it was laid out in such a way that if I stood by my computer I would block parts of the projected screen. I had to switch to my remote control and step away from the computer to allow everyone to see the material, which meant that I could not read my speaker notes on my screen. Fortunately I knew my material well enough.

The rest of the day I went around to a few different talks in the venue, still without a badge. Then the rest of the Jython team who were there and I rounded off the day at a Chinese restaurant with Michael Foord and some fifteen other attendees, followed by a pint at an outdoor bar before Frank and I headed to our hotel. The food and beer were great, but the conference venue and the hotel were not. As I mentioned, the venue had a problem with the room I presented in, and being in the basement of the Birmingham Conservatoire it was dark and small, with corridors to all the presentation rooms. Not an ideal place for a conference. The hotel (Premier Inn) was probably the worst I've experienced at a conference so far. When we checked in they were out of twin rooms even though that was what we had booked, so we had to share a queen size bed. There were not enough towels, no AC, breakfast was not included, and only one of our key cards worked. When showering one could choose between a scalding hot drizzle, a freezing cold shower, or loudly howling pipes. All of this at a rate of £65 per night.

The next morning I made a new attempt at registering. This failed again, this time because my registration had not been paid. At this point I felt rather put off by the conference; I had paid for the plane and the hotel, and was expected to pay for entrance to a conference where my talk had been the only rewarding thing. I sent an e-mail to the organizers expressing my disappointment and asking them whether they wanted my money and what I should do in order to give it to them. In the response I got they said that if I was not happy then maybe I should not have proposed a talk. How was I supposed to know that I was going to be unhappy about the conference before I got there? EuroPython 2008 in Vilnius was great; I had no reason to expect it not to be good in Birmingham in 2009. It took until the afternoon before I was able to track down the organizers, pay for my registration and get my badge. I did not get a t-shirt, since they had made a mess of them and were unable to find a shirt in the size I had ordered.

Now that I was finally registered and all, I felt that I had earned the EuroPython dinner that night. This was a very nice addition to the conference, a purely social event with the other attendees. After dinner and a pint at a nice pub I got about four hours of sleep before I headed to the airport. My plane took off at 6:30 am, so I got to the train station at 4 am to catch a sufficiently early train to the airport, only to find out that the first train of the day left at 5:30. I had to run when I arrived at the airport, with only 20 minutes until the final boarding call. With checked-in luggage I would never have made it, but since I like to travel light and only had a small backpack I made it to the gate just after priority boarding had finished. It still would have been much better if the flight had departed an hour or so later, because I had to wait 160 minutes for the bus after I landed in Sweden.

Despite all the problems I think it was worth my time and money to go to EuroPython; the community is great, I love the Python people! But I don't think I'll attend next year unless I get the entire conference paid for. When it's not in Birmingham anymore I'll be more interested in attending again. There were also a number of interesting talks, which I will now summarize.

Frank Wierzbicki's talk about web frameworks and Jython
Frank's talk was right after mine, in the same room. Frank has also become a much better speaker over the past year, and I enjoyed seeing all the web frameworks that we now support with Jython. And the fact stands that for normal applications Jython performs about the same as CPython, which is nice for all of these web frameworks. I tend to forget this, since I spend most of my time looking at the places where Jython performance should improve.
Mark Shannon's talk about his Python VM "HotPy"
Python is now starting to see the phenomenon that Charles Nutter has been talking about in the Ruby community: a number of new implementations and VMs are popping up, claiming better performance on a lot of things for different reasons. HotPy is a research Python VM that builds on a research VM toolkit by Mark Shannon from the University of Glasgow. It is not complete, but it has a good approach to optimization: optimizing the code yields better results than compiling it.
Christian Tismer's talk about the status of Psyco
This was really impressive to me. Psyco V2 is being released this weekend, a heroic upgrading effort done by Christian alone. Everything about Psyco V2 is impressive. It yields enormous performance boosts to Python, to the point where Christian has started replacing builtin functions that are written in C in CPython with corresponding versions written in Python to improve performance. Properties get about a 100-times speedup with Psyco. I need to look at the source for Psyco and find out which of these optimization techniques we can apply to Jython, and perhaps even to the JVM.
Thursday afternoon keynote about Bletchley Park
Sue Black of the Saving Bletchley Park project and Simon Greenish gave a keynote presentation about Bletchley Park and how they are using social media to attract more visitors. The presentation featured an actual working Enigma machine on stage.
Thursday evening keynote by Sir Charles Antony Richard Hoare
Tony Hoare talked about the difference between science and engineering. It was not the most entertaining talk; I had hoped for some interesting and motivating story about what had been driving his career. It would for example have been great to hear how he came up with the idea for quicksort, or about what interesting projects he is working on now. I did like his conclusion that computer science and software engineering are still allowed to be imperfect, being much younger sciences than for example physics or biology, both of which were just studies of simple observable phenomena when they were at the age that computer science is today. His vision was that at some point in the future software engineering will be the most reliable form of engineering, since software never degrades. I like this vision, but I agree that computer science has to mature and evolve before we can develop zero-fault software. Finally, his response during Q&A to the question "Is there research that can help marketing come up with specifications so that we engineers can build the software?" was very entertaining: "That's the engineer's job. Marketing doesn't understand programming. Neither does marketing understand the customers."