Lazy Pythonista

Sunday, November 1, 2009

Another month of blogging?

Last year I started this blog during the November, blog every day for a month month. This year I'm hoping to repeat the feat, blogging every single day this month. Today's post is a bit light on content, but I'm hoping to give a preview of what I'm going to be blogging about. This month I'm hopping to blog about advanced Django techniques, Comet (and other HTTP push technology), and I'm probably going to be adding some more political content to the usual technical content of this blog.

Lastly, I'm hoping to move this blog either to another hosted provider, or something on my own servers, but somewhere where I can get a tad bit more control.

In any event here's to a productive month of bloggging!

Saturday, October 10, 2009

Optimising compilers are there so that you can be a better programmer

In a discussion on the Django developers mailing list I recently commented that the performance impact of having logging infrastructure, in the case where the user doesn't want the logging, could essentially be disregarded because Unladen Swallow (and PyPy) are bringing us a proper optimising (Just in Time) compiler that would essentially remove that consideration. Shortly thereafter someone asked me if I really thought it was the job of the interpreter/compiler to make us not think about performance. And my answer is: the job of a compiler is to let me program with best practices and not suffer performance consequences for doing things the right way.

Let us consider the most common compiler optimisations. A relatively simple one is function inlining, in the case where including the body of the function would be more efficient than actually calling it, a compiler can simply move the functions body into its caller. However, we can actually do this optimisation in our own code. We could rewrite:


    def times_2(x):
        return x * 2
    
    def do_some_stuff(i):
        for x in i:
            # stuff
            z = times_2(x)
        # more stuff

as:


    def do_some_stuff(i):
        for x in i:
            # stuff
            z = x * 2
        # more stuff

And this is a trivial change to make. However in the case where ``times_2`` is slightly less trivial, and is used a lot in our codebase it would be exceptionally more programming practice to repeat this logic all over the place, what if we needed to change it down the road? Then we'd have to review our entire codebase to make sure we changed it everywhere. Needless to say that would suck. However, we don't want to give up the performance gain from inlining this function either. So here it's the job of the compiler to make sure functions are inlined when possible, that way we get the best possible performance, as well as allowing us to maintain our clean codebase.

Another common compiler optimisation is to transform multiplications by powers of 2 into binary shifts. Thus ``x * 2`` becomes ``x << 1``. However, humans don't usually think in terms of binary operations, and it makes our codebase less readable therefore. Once again, however, it does give better performance. This is another case where best practices are in conflict with efficient code. However, once again, our compiler is capable of solving this for us. When ``x`` is known to be an integer it is trivial for a compiler to make this transformation for us, once again allowing us to have both the performance we want, and the good programming practices we need.

A final optimisation we will consider is constant propagation. Many program have constants that are used throughout the codebase. These are often simple global variables. However, once again, inlining them into methods that use them could provide a significant benefit, by not requiring the code to making a lookup in the global scope whenever they are used. But we really don't want to do that by hand, as it makes our code less readable ("Why are we multiplying this value by this random float?", "You mean pi?", "Oh."), and makes it more difficult to update down the road. Once again our compiler is capable of saving the day, when it can detect a value is a constant it can propagate it throughout the code.

So does all of this mean we should never have to think about writing optimal code, the compiler can solve all problems for us? The answer to this is a resounding no. A compiler isn't going to rewrite your insertion sort into Tim sort, nor is it going to fix the fact that you do 700 SQL queries to render your homepage. What the compiler can do is allow you to maintain good programming practices.

So what does this mean for logging in Django? Fundamentally it means that we shouldn't be concerned with possible overhead from calls that do nothing (in the case where we don't care about the logging) since a good compiler will be able to eliminate those for us. In the case where we actually do want to do something (say write the log to a file) the overhead is unavoidable, we have to open a file and write to it, there's no way to optimise it out.

Friday, August 14, 2009

Django-filter 0.5 released!

I've just tagged and uploaded Django-filter 0.5 to PyPi. The biggest change this release brings is that the package name has been changed from `filter` to `django_filters` in order to avoid conflicts with the Python builtin `filter`. Other changes included the addition of an `initial` argument on Filters, as well as the addition of an `AllValuesFilter` which is a `ChoiceFilter` who's choices are any values currently in the DB for that field. Despite the change in package name I will not be changing the name of the project due to the overhead in moving the repository (Github doesn't set up redirects when you change a project's name) and the PyPi package. I hope everyone enjoys this new release, as a lot of it's improvements have come out of my usage on Django-filter in piano-man.

As for what the future holds several people have indicated their interest in the inclusion of django-filter in Django itself as a contrib package, and for usage in the Admin as a new implementation of the `list_filter` option that is more flexible. Because of this my next work is probably going to be on implementing a custom `ModelAdmin` class that uses `FilterSets` for filtering.

You can find the latest release on PyPi and Github.

Sunday, July 12, 2009

pyvcs .2 released

Hot on the heels of our .1 release (it's only been a week!) I'm pleased to announce the .2 release of pyvcs. This release brings with it lots of new goodies. Most prominent among these are the newly-added Subversion and Bazaar backends. There are also several bug fixes to the code browsing features of the Mercurial backend. This release can be found at:

PyPi: http://pypi.python.org/pypi/pyvcs
Github: http://github.com/alex/pyvcs/

If you find any bugs with this release please report them at:

http://github.com/alex/pyvcs/issues

Thanks for all the contributions to this release, almost everything you see in this release is because community members contributed to it, hardly any new code in here is originally written by Justin or I.

We are hoping to have some exciting announcements for piano-man coming up in the next couple of weeks.

Enjoy.

Sunday, July 5, 2009

Announcing pyvcs, django-vcs, and piano-man

Justin Lilly and I have just released pyvcs, a lightweight abstraction layer on top of multiple version control systems, and django-vcs, a Django application leveraging pyvcs in order to provide a web interface to version control systems. At this point pyvcs exists exclusively to serve the needs of django-vcs, although it is separate and completely usable on its own. Django-vcs has a robust feature set, including listing recent commits, pretty diff rendering of commits, and code browsing. It also supports multiple projects. Both pyvcs and django-vcs currently support Git and Mercurial, although adding support for a new backend is as simple as implementing four methods and we'd love to be able to support additional VCS like Subversion or Bazaar. Django-vcs comes with some starter templates (as well as CSS to support the pretty diff rendering).

It goes without saying that we'd like to thank the authors of the VCS themselves, in addition we'd like to thank the authors of Dulwich, for providing a pure Python implementation of the Git protocol, as well as the Pocoo guys, for pygments, the syntax highlighting library for Python, as well as the pretty diff rendering which we lifted out of the lodgeit pastbin application.

Having announced what we have already, we'll now be looking towards the future. As such Justin and I plan to be starting a new Django project "piano-man". Piano-man, a reference to Billy Joel, follows the Django tradition of naming things after musicians (although we've branched out a bit, leaving the realm of Jazz in favour of Rock 'n Roll). Piano-man will be a complete web based project management system, similar to Trac or Redmine. There are a number of logistical details that we still need to sort out, such as whether this will be a part of Pinax as a "code edition" or whether it will be a separate project entirely, like Satchmo and Reviewboard.

Some people are inevitably asking why we've chosen to start a new project, instead of working to improve one of the existing ones I alluded to. The reason is, after hearing coworkers complain about poor Git support in Trac (even with external plugins), and friends complain about the need to modify Redmine just to support branches in Git properly I'd become convinced it couldn't possibly be that hard, and I think Justin and I have proven that it isn't. All the work you see in pyvcs and django-vcs took 48 hours to complete, with both of us working evenings and a little bit during the day on these projects.

You can find both django-vcs and pyvcs on PyPi as well on Github under my account (http://github.com/alex/), both are released under the BSD license. We hope you enjoy both these projects and find them helpful, and we'd appreciate and contributions, just file bugs on Github. I'll have another blog post in a few days outlining the plan for piano-man once Justin and I work out the logistics. Enjoy.

Edit: I seem to have forgotten all the relevant links, here they are

Sorry about that.

Thursday, June 4, 2009

A response to "Python sucks"

I recently saw an article on Programming Reddit, titled, "Python sucks: Why Python is not my favourite programming language". I personally like Python quite a lot (see my blog title), but I figured I might read an interesting critique, probably from a very different point of view from mine. Unfortunately that is the opposite of what I found. The post was, at best, a horribly misinformed inaccurate critique, and at worst an intentionally dishonest, misleading, farce. The post can be found here. I felt the need to respond to it precisely because it is so lacking in facts, and reading it one can get impressions that are completely incorrect, and I am hoping I can correct some of these.

The post's initial statements about iterating over a file are accurate. However, he then goes on to say Python supports closures (which is true), and follows this with a piece of code that has absolutely nothing to do with closures, it is actually a callable object (or as C++ calls them, functors). The authors seems to take issue with these (though he doesn't explain why), ignoring the fact that Python has complete support for actual closures, not just callable objects.

The author then claims that Python has many other such arbitrary rules, using as an example the "yield" keyword. The author appears to be claiming the behavior of the yield keyword is arbitrary and poorly defined, however it's very unclear what his point actually is, or what the source of his complaints is. My only response can be to say that the "yield" keyword *always* turns the function it's used in into a generator, that is to say it returns an iterable that lazy evaluates the function, pausing each time it reaches the yield statement, and returning that object.

The author claims that many of the arbitrary decisions in Python are a result of Guido's insistence on a specific programming style, using as an example crippled lambdas. It is generally accepted that in Python lambda is just syntactic sugar for defining a function within any context (which the author completely ignores in his discussion of closure). To say that lambdas are crippled is to ignore the fact that absolutely nothing is rendered impossible by this, except for unreadable one liners.

The author's final complaint is directed at Python's C-API. This is possibly his least accurate critique. The author compares what is necessary to use a C library from within various programming languages. He shows that in Python all you have to do is import the library like you would for normal Python code. However, he goes on to say that for this to work you need to write lots of C boilerplate, and says that in other programming languages (showing examples from Haskell and PLT Scheme) this boiler plate is unnecessary. However, this is a completely disingenuous comparison. This is because what he is showing for Haskell and Scheme is their foreign function interface, not any actual language level integration. To do what he shows in Python is perfectly possible using the included ctypes library. I'm not familiar with the C-API of either Haskell or PLT Scheme, however I imagine that in order to work seamlessly and have the APIs appear the same as in code in those languages it is still necessary to write boiler plate so that the interpreter can recognize them.

In conclusion that blog post was a critique completely devoid of value, not worth the bytes that are used to store it. This is not to say there aren't any valid criticisms of Python, there are many, as evidenced by any number of recent blog posts discussing "5 things they hate about technology X", where technology X is something the author likes, because no technology is perfect, however no such honest critique was present here.

Tuesday, May 5, 2009

EuroDjangoCon 2009

EuroDjangoCon 2009 is still going strong, but I wanted to share the materials from my talk as quickly as possible. My slides are on Slide Share:

Forms, Getting Your Money's Worth

View more presentations from kingkilr.

And the first examples code follows:


from django.forms.util import ErrorList
from django.utils.datastructures import SortedDict

def multiple_form_factory(form_classes, form_order=None):
   if form_order:
       form_classes = SortedDict([(prefix, form_classes[prefix]) for prefix in
           form_order])
   else:
       form_classes = SortedDict(form_classes)
   return type('MultipleForm', (MultipleFormBase,), {'form_classes': form_classes})

class MultipleFormBase(object):
   def __init__(self, data=None, files=None, auto_id='id_%s', prefix=None,
       initial=None, error_class=ErrorList, label_suffix=':',
       empty_permitted=False):
       if prefix is None:
           prefix = ''
       self.forms = [form_class(data, files, auto_id, prefix + form_prefix,
           initial[i], error_class, label_suffix, empty_permitted) for
           i, (form_prefix, form_class) in enumerate(self.form_classes.iteritems())]

   def __unicode__(self):
       return self.as_table()

   def __iter__(self):
       for form in self.forms:
           for field in form:
               yield field

   def is_valid(self):
       return all(form.is_valid() for form in self.forms)

   def as_table(self):
       return '\n'.join(form.as_table() for form in self.forms)

   def as_ul(self):
       return '\n'.join(form.as_ul() for form in self.forms)

   def as_p(self):
       return '\n'.join(form.as_p() for form in self.forms)

   def is_multipart(self):
       return any(form.is_multipart() for form in self.forms)

   def save(self, commit=True):
       return tuple(form.save(commit) for form in self.forms)
   save.alters_data = True

EuroDjangoCon has been a blast thus far and after the conference I'll do a blogpost that does it justice.