Thursday, November 19, 2009

Another Pair of Unladen Swallow Optimizations

Today a patch of mine was committed to Unladen Swallow. In past weeks I've described some of the optimizations that have gone into Unladen Swallow; specifically, I looked at removing the allocation of an argument tuple for C functions. One of the "on the horizon" items I mentioned was extending this to functions with variable arity (that is, functions whose number of arguments can differ from call to call). This has now been implemented for functions that take a finite range of argument counts (that is, they don't take *args; they just have a few arguments with defaults). This support was used to optimize a number of builtin functions, for example dict.get, list.pop, and getattr.
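
Here is a small Python model of an arity-range calling convention. It is only a sketch under assumptions. The real change lives in Unladen Swallow's C code, and the names ArgRangeMethod, call_arg_range, getattr_impl, and MISSING are hypothetical and chosen for illustration, not taken from the patch. The idea it shows: the callee declares a minimum and maximum argument count, the caller checks the count once, and each argument travels in its own slot, with absent optional arguments marked by a sentinel (a NULL pointer in C).

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical sentinel standing in for a NULL argument slot in C.
MISSING = object()


@dataclass(frozen=True)
class ArgRangeMethod:
    """Hypothetical model of a builtin accepting min_args..max_args positional args."""
    name: str
    func: Callable[..., Any]
    min_args: int
    max_args: int


def call_arg_range(meth: ArgRangeMethod, *args: Any) -> Any:
    """Check the argument count once, then pass each argument in its own slot.

    In C this is the step that avoids packing the arguments into a fresh
    tuple; here *args is only a modeling convenience.
    """
    n = len(args)
    if not meth.min_args <= n <= meth.max_args:
        raise TypeError(
            f"{meth.name}() takes {meth.min_args} to {meth.max_args} "
            f"arguments ({n} given)"
        )
    # Unused optional slots are filled with MISSING, like NULL pointers in C.
    slots = args + (MISSING,) * (meth.max_args - n)
    return meth.func(*slots)


def getattr_impl(obj: Any, name: str, default: Any) -> Any:
    # A default was supplied only if its slot is not MISSING.
    if default is MISSING:
        return getattr(obj, name)
    return getattr(obj, name, default)


GETATTR = ArgRangeMethod("getattr", getattr_impl, min_args=2, max_args=3)
```

For example, call_arg_range(GETATTR, 1, "real") behaves like getattr(1, "real"), while passing a single argument raises TypeError before the implementation ever runs.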

However, a number of functions still hadn't been updated to use this support. I initially started porting any functions I came across, but it wasn't a totally mechanical translation, so I decided to do a little profiling to better direct my efforts. I started by using the cProfile module to see which functions were called most frequently in Unladen Swallow's Django template benchmark. Imagine my surprise when I saw that unicode.encode was called over 300,000 times! A quick look at that function showed that it was a perfect candidate for this optimization: it was declared as METH_VARARGS, but in fact its argument count fell within a finite range. After about a dozen lines of code to change the argument parsing, I ran the benchmark again, comparing it against a control build of Unladen Swallow, and it showed a consistent 3-6% speedup on the Django benchmark. Not bad for 30 minutes of work.
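
For anyone who wants to try this kind of profiling, here is a minimal Python 3 sketch using the standard cProfile and pstats modules. The render_items function is a hypothetical workload, not the Django benchmark, and on Python 3 the method shows up as str.encode rather than unicode.encode. Sorting by "ncalls" puts the most frequently called functions at the top, which is the kind of view that surfaces a hot builtin like encode.

```python
import cProfile
import io
import pstats


def render_items(n: int) -> list:
    # Hypothetical stand-in for a template render that encodes many strings.
    return [("item %d" % i).encode("utf-8") for i in range(n)]


def most_called_report(func, *args, limit: int = 5) -> str:
    """Profile func(*args) and return the top `limit` functions by call count."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("ncalls").print_stats(limit)
    return stream.getvalue()


print(most_called_report(render_items, 1000))
```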

Another optimization I want to look at, which hasn't landed yet, is one to optimize various operators. Right now Unladen Swallow records data about the types seen in the interpreter loop, but for many operators this data isn't actually used. What this patch does is check, at JIT compilation time, whether an operator site is monomorphic (that is, only one pair of types has ever been seen there). If it is, and the pair is one of the few we have optimizations for (int + int, list[int], and float - float, for example), then optimized code is emitted. This optimized code checks that both arguments have the expected types; if they do, the fast path is executed, otherwise the VM bails back to the interpreter (various literature has shown that compiling a single optimized path is better than compiling both the fast and slow paths). For simple algorithmic code this optimization can show huge improvements.
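
Below is a toy Python model of that mechanism: collect type feedback while interpreting, specialize a site only if it is monomorphic and a fast path exists for that type pair, guard the specialized code with an exact type check, and bail back to the interpreter when the guard fails. It is a sketch under assumptions. OperatorSite and FAST_PATHS are hypothetical names, and each "fast path" is just a Python callable standing in for the machine code Unladen Swallow emits, so the model shows the control flow, not the speedup.

```python
import operator
from typing import Any, Callable, Dict, Optional, Set, Tuple

TypePair = Tuple[type, type]

# Hypothetical table of type pairs with a specialized implementation.
# In a real JIT each entry would be emitted machine code, not a Python callable.
FAST_PATHS: Dict[TypePair, Callable[[Any, Any], Any]] = {
    (int, int): operator.add,
    (float, float): operator.add,
}


class OperatorSite:
    """Toy model of one binary-operator site (e.g. `a + b`) in a function."""

    def __init__(self, generic: Callable[[Any, Any], Any]) -> None:
        self.generic = generic               # fully general, slow operation
        self.seen: Set[TypePair] = set()     # type feedback from the interpreter
        self.expected: Optional[TypePair] = None
        self.fast: Optional[Callable[[Any, Any], Any]] = None
        self.bailouts = 0

    def interpret(self, a: Any, b: Any) -> Any:
        # Interpreter path: record the operand types, then do the generic operation.
        self.seen.add((type(a), type(b)))
        return self.generic(a, b)

    def compile(self) -> None:
        # Specialize only if exactly one type pair was seen and a fast path exists.
        if len(self.seen) == 1:
            pair = next(iter(self.seen))
            if pair in FAST_PATHS:
                self.expected = pair
                self.fast = FAST_PATHS[pair]

    def run_compiled(self, a: Any, b: Any) -> Any:
        # Guard: exact type match, so e.g. a bool operand fails an (int, int) guard.
        if self.fast is not None and (type(a), type(b)) == self.expected:
            return self.fast(a, b)
        # Guard failed (or nothing was specialized): bail back to the interpreter
        # rather than compiling a slow path alongside the fast one.
        self.bailouts += 1
        return self.interpret(a, b)
```

A polymorphic site, such as one that has seen both (int, int) and (float, float), is never specialized in this model and always takes the interpreter path.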

The PyPy project recently blogged about the results of running some benchmarks from the Computer Language Shootout on PyPy, Unladen Swallow, and CPython. In these benchmarks Unladen Swallow showed that for highly algorithmic code (read: mathy) it could use some work; hopefully patches like this one can improve the situation markedly. Once this patch lands I'm going to rerun these benchmarks to see how much Unladen Swallow improves, and I'm also going to add some of the macro benchmarks Unladen Swallow uses to see how it compares with PyPy on those. Either way, seeing the improvements PyPy and Unladen Swallow have made over CPython gives me tremendous hope for the future.

8 comments:

  1. Awesome stuff. I'm really enjoying the performance race between the bird and the pie. These things are really pushing the pony forward.

    Magical.

  2. You made me happy! Thanks for all your work on Python!

  3. Personally, I'd be very interested in seeing benchmarks, both macro- and micro-, comparing CPython, Unladen Swallow and PyPy. Please, bring them on!

  4. Hopefully we can see some feedback between PyPy and Unladen Swallow, with some of the optimisations going into both.
    It would be great if we can end up with two super fast pythons at the end of this :)


    On a completely different note, having started iPhone development, I'd love to see Unladen Swallow appear there, as Objective-C is pretty ugly.

  5. "Another optimization I want to look at, which hasn't landed yet"... I'm not sure what that means. Has it been submitted, or is it something you're thinking about doing, or...?

    I'm very interested in "mathy" code, so I'm very keen on these kinds of optimizations being done!

  6. BTW, thanks for contributing to the Unladen Swallow project -- it's well appreciated. We're using it now for a bit of the back-end stuff at http://www.flyfi.com.

  7. Is PyPy really alive (after quitting LLVM)?

    In contrast, LLVM has a lot of gasoline in it. I'm buying popcorn to watch this movie :)

  8. @vak: PyPy is more than alive and has shown tremendous progress over the last few months. Follow its status blog, which is where the linked article comes from!
