Sunday, December 28, 2008

Building a Read Only Field in Django

One commonly requested feature in Django is to have a field on a form (or in the admin) that is read only. This may become a Django 1.1 feature for the admin exclusively, since that's the level at which it makes sense; a form is by definition for inputting data, not displaying it. However, it is still possible to do this with Django now, and it doesn't even take very much code. As I've said, doing it in this manner (as a form field) isn't particularly intuitive or sensible, but it is possible.

The first thing we need to examine is how we would want to use this, for our purposes we'll use this just like we would a normal field on a form:

from django import forms
from django.contrib.auth.models import User

class UserForm(forms.ModelForm):
    email = ReadOnlyField()

    class Meta:
        model = User
        fields = ['email', 'username']


So we need to write a field. Our field will actually need to be a subclass of FileField, which at first glance makes absolutely no sense; our field isn't taking files, it isn't taking any data at all. However, FileFields receive the initial data for their clean() method, which other fields don't, and we need this behavior for our field to work:

class ReadOnlyField(forms.FileField):
    widget = ReadOnlyWidget

    def __init__(self, widget=None, label=None, initial=None, help_text=None):
        forms.Field.__init__(self, label=label, initial=initial,
            help_text=help_text, widget=widget)

    def clean(self, value, initial):
        self.widget.initial = initial
        return initial


As you can see, in the clean() method we exploit this feature to give our widget the initial value, which it wouldn't normally have access to at render time.

Now we write our ReadOnlyWidget:

from django.forms.util import flatatt

class ReadOnlyWidget(forms.Widget):
    def render(self, name, value, attrs):
        final_attrs = self.build_attrs(attrs, name=name)
        if hasattr(self, 'initial'):
            value = self.initial
        return "<p%s>%s</p>" % (flatatt(final_attrs), value or '')

    def _has_changed(self, initial, data):
        return False


Our widget simply renders the initial value in a p tag, instead of as an input tag. We also override the _has_changed method to always return False; this is used in formsets to avoid resaving data that hasn't changed, and since our widget can't change data, it obviously won't have changed.
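
To see the pieces working together, here's a quick, untested sketch of the whole thing in use (the user and email values are hypothetical); even if someone tampers with the POST data, clean() hands back the initial value:

user = User.objects.create(username="alice", email="alice@example.com")
form = UserForm(data={"email": "evil@example.com", "username": "alice"},
    instance=user)
assert form.is_valid()
user = form.save()
# The submitted email was ignored; clean() returned the initial value.
assert user.email == "alice@example.com"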

And that's all there is to it, less than 25 lines of code in all. As I said earlier, this is a fairly poor architecture and I wouldn't recommend it, but it does work, and it serves as proof that Django will let you do just about anything if you give it a try.

Thursday, December 25, 2008

Building a Function Templatetag

One of the complaints often lobbed against Django's templating system is that there is no way to create functions. This is intentional; Django's template language is not meant to be a full programming language. However, if one wants to, it is entirely possible to build templatetags that let a user create, and call, functions in the template language.


To get started we need to build a tag to create functions, first we need our parsing function:

from django import template

register = template.Library()

@register.tag
def function(parser, token):
    arglist = token.split_contents()[1:]
    func_name = arglist.pop(0)
    nodelist = parser.parse(('endfunction',))
    parser.delete_first_token()
    return FunctionNode(func_name, arglist, nodelist)


The format for our tag is going to be {% function func_name arglist... %}, so we parse this out of the token. We take everything after the tag name itself, make the first piece the function name, and treat the rest as the argument list. Next we parse until the {% endfunction %} tag. Finally we return a FunctionNode. Now we need to actually build this node:


class FunctionNode(template.Node):
    def __init__(self, func_name, arglist, nodelist):
        self.func_name = func_name
        self.nodelist = nodelist
        self.arglist = arglist

    def render(self, context):
        if '_functions' not in context:
            context['_functions'] = {}
        context['_functions'][self.func_name] = (self.arglist, self.nodelist)
        return ''


Our __init__ method just stores the data on the instance. Our render method stores the argument list and the nodes that make up the function in the context; this gives us all the information we need to call and render our functions. Now we need the actual mechanism for calling them:


@register.tag
def call(parser, token):
    arglist = token.split_contents()[1:]
    func_name = arglist.pop(0)
    return CallNode(func_name, arglist)


Like our function tag, we parse out the name of the function, and then the argument list. And now we need to render the result of calling it:

class CallNode(template.Node):
    def __init__(self, func_name, arglist):
        self.func_name = func_name
        self.arglist = arglist

    def render(self, context):
        arglist, nodelist = context['_functions'][self.func_name]
        c = template.Context(dict(zip(arglist,
            [template.Variable(x).resolve(context) for x in self.arglist])))
        return nodelist.render(c)


render gets the arglist and nodelist out of the context, then builds a new context for the call by zipping the variable names from the function definition together with the resolved values from the call site.

Now we can create functions by doing:

{% function f arg %}
{{ arg }}
{% endfunction %}


And call them by doing:

{% call f some_var %}
{% call f some_other_var %}
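
To see it end to end, here's a quick, untested sketch of rendering these tags from Python (it assumes the tags above live in a tag library named functions, a hypothetical name):

from django.template import Template, Context

t = Template("{% load functions %}"
    "{% function greet name %}Hello, {{ name }}!{% endfunction %}"
    "{% call greet user %}")
# The function tag renders to nothing; the call tag renders the body.
print t.render(Context({'user': 'Alex'}))  # Hello, Alex!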


Hopefully this has given you a useful insight into how to build powerful templatetags in Django's template language. One possible improvement the reader may want to make is to have the function tag actually register a templatetag out of the function definition, so that it can then be used like a normal templatetag. As a slight warning, I haven't tested this with a wide range of data, so if there are any issues please report them.

Many Thanks to Webfaction

One of my recent projects has been to build a website for someone so that they could put their portfolio online. Building the site with Django was, of course, a snap. However, I would like to take a moment to thank Webfaction for the painless deployment. It's a small website, so shared hosting is perfect for it, and Webfaction made it a breeze. Setting up a PostgreSQL database was a snap. Getting Django installed and my code on the server was easy. My one pain point was of my own creation (I broke the PythonPath setting in the WSGI file Webfaction provides), but even with that, deployment didn't take more than an hour. They even made it easy to use the ultra-lightweight Nginx for my static files.

All said, Webfaction was exactly what I needed from a host, and they delivered. For the curious parties you can see the site here.

Monday, December 15, 2008

PyCon '09, Here I come!

This past year I attended PyCon 2008 in Chicago, which was a tremendous conference. I had a chance to meet people I knew from the community, listen to some amazing talks, meet new people, and get to sprint. As a result of this tremendous experience I decided for this year to submit a talk proposal. I found out just a few minutes ago that my proposal has been accepted.

I proposed a panel on "Object Relational Mapper Philosophies and Design Decisions". This panel will look at the design decisions each of several ORMs made, and the philosophies behind them, both with respect to their public APIs and their internal code design. Participating in the panel will be:
  • Jacob Kaplan-Moss, representing Django
  • Ian Bicking, representing SQLObject
  • Mike Bayer, representing SQLAlchemy
  • Guido van Rossum, representing Google App Engine
  • Dr. Massimo Di Pierro, representing web2py
I'm tremendously honored to be able to moderate a panel at PyCon, especially with these five individuals. They are all incredibly smart, and they each bring a different insight and perspective to this panel.

PyCon is a great conference and I would encourage anyone who can to attend.

Friday, December 5, 2008

Playing with Polymorphism in Django

One of the most common requests from people using inheritance in Django is to have a queryset from the base class return instances of the derived models, instead of instances of the base class, as you might see with polymorphism in other languages. This is a leaky abstraction over the fact that our Python classes actually represent rows in separate database tables. Django itself doesn't do this, because it would require expensive joins across all derived tables, which the user probably doesn't want in all situations. For now, however, we can create a function that, given an instance of the base class, returns an instance of the appropriate subclass; be aware that this will perform up to k queries, where k is the number of subclasses we have.

First let's set up some test models to work with:

from django.db import models

class Place(models.Model):
    name = models.CharField(max_length=50)

    def __unicode__(self):
        return u"%s the place" % self.name


class Restaurant(Place):
    serves_pizza = models.BooleanField()

    def __unicode__(self):
        return u"%s the restaurant" % self.name


class Bar(Place):
    serves_wings = models.BooleanField()

    def __unicode__(self):
        return u"%s the bar" % self.name



These are some fairly simple models that represent a common inheritance pattern. Now what we want is to be able to get an instance of the correct subclass for a given Place. To do this we'll create a mixin class, so that we can reuse it with other models:

class InheritanceMixIn(object):
    def get_object(self):
        ...

class Place(models.Model, InheritanceMixIn):
    ...


So what do we need to do in our get_object method? Basically, we need to loop over each of the subclasses, try to get the corresponding attribute, and return it if it's there; if none of them are there, we should just return ourself. We start by looping over the fields:

class InheritanceMixIn(object):
    def get_object(self):
        for f in self._meta.get_all_field_names():
            field = self._meta.get_field_by_name(f)[0]


_meta is where Django stores much of a model's internal data, so we get all of the field names, which includes the names of the reverse descriptors that related models provide (for the models above, that means not just id and name, but also restaurant and bar). Then we get the actual field for each of these names. Now that we have each of the fields, we need to test whether it is one of the reverse descriptors for the subclasses:

from django.db.models.related import RelatedObject

class InheritanceMixIn(object):
    def get_object(self):
        for f in self._meta.get_all_field_names():
            field = self._meta.get_field_by_name(f)[0]
            if isinstance(field, RelatedObject) and field.field.primary_key:
                ...


We first test whether the field is a RelatedObject, and if it is, we see whether the field on the other model is a primary key, which it will be if it's a subclass (or, technically, any one to one that is a primary key). Lastly we need to find the name of that attribute on our model and try to return it:

class InheritanceMixIn(object):
    def get_object(self):
        for f in self._meta.get_all_field_names():
            field = self._meta.get_field_by_name(f)[0]
            if isinstance(field, RelatedObject) and field.field.primary_key:
                try:
                    return getattr(self, field.get_accessor_name())
                except field.model.DoesNotExist:
                    pass
        return self


We try to return the attribute, and if it raises a DoesNotExist exception we move on to the next one; if none of them return anything, we just return ourself.
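
Here's a quick, untested sketch of it in action (assuming Place mixes in InheritanceMixIn as shown above; the names are made up, and the ordering of the queryset isn't guaranteed):

Restaurant.objects.create(name="Lucia's", serves_pizza=True)
Bar.objects.create(name="Duffy's", serves_wings=True)
Place.objects.create(name="The Park")

# Each base class row resolves to its most derived class.
for place in Place.objects.all():
    print place.get_object()
# Lucia's the restaurant
# Duffy's the bar
# The Park the place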

And that's all it takes. This won't be super efficient, since for a queryset of n objects it will execute O(n*k) queries, given k subclasses. Ticket 7270 deals with allowing select_related() to work across reverse one to one relations as well, which would let one optimise this, since the subclasses would already have been fetched from the database.

Tuesday, December 2, 2008

A month in review

Today is the last day of blogging every day for a month. It was harder than I expected, but I ultimately made it, which also surprised me. For my last post (of the month) I figured I'd share some stats about the blog from the month:
  • About 10k visitors and 13k page views
  • My 3 most popular posts were "What Python learned from Economics", "How the heck do Django Models Work", and "A Timeline View in Django"
  • My top 3 referrers were reddit, ycombinator, and the Django project site.
  • Top search keyword was "django models"
  • 2/3 of my viewers were using Firefox
  • Under 3% IE(yay)
  • 28% were Linux users

I don't think I'll ever blog this frequently again (at least not until next November), but I hope to continue writing fairly frequently, and I have at least two more posts planned for December.

Monday, December 1, 2008

Fixing up our identity mapper

The past two days we've been looking at building an identity mapper in Django. Today we're going to implement some of the improvements I mentioned yesterday.

The first improvement is to have it execute the query as usual and just cache the results, to prevent needing to execute additional queries. This means changing the __iter__ method on our queryset class:

def __iter__(self):
    for obj in self.iterator():
        try:
            yield get_from_cache(self.model, obj.pk)
        except KeyError:
            cache_instance(obj)
            yield obj


Now we just iterate over self.iterator(), which is a slightly lower level interface to a queryset's iteration; it bypasses the result caching that normally occurs (this means that, for now at least, if we iterate over our queryset twice we actually execute two queries, whereas Django would normally do just one). Overall, though, this is a big win, since before, if an item wasn't in the cache, we would do an extra query for it.
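
You can see that caveat for yourself with a quick, untested sketch (it assumes DEBUG = True so that connection.queries is populated, and an Article model using our caching manager; both names are hypothetical):

from django.db import connection, reset_queries

reset_queries()
qs = Article.objects.all()
list(qs)
list(qs)
# Plain Django would cache the results and run 1 query; our __iter__
# bypasses that cache, so this prints 2.
print len(connection.queries)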

The next improvement I proposed was to use Django's built in caching interfaces. However, this won't work, because the built in locmem cache backend pickles and unpickles everything on the way into and out of the cache, so we'd end up with different objects (which defeats the point of an identity map).
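
A quick, untested sketch of why that matters (assuming the locmem backend is configured, and some model Article, a hypothetical name):

from django.core.cache import cache

obj = Article.objects.get(pk=1)
cache.set('article-1', obj)
# The cache hands back an unpickled copy, not the same object.
assert cache.get('article-1') is not obj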

The last improvement we can make is to have this work on related objects for which we already know the primary key. The obvious route is to start hacking on django.db.models.fields.related; however, as I've mentioned in a previous post, that area of the code is a bit complex. If we know a little about how this query is executed, we can do the optimisation in a far simpler way: as it turns out, the related object descriptor simply does the query using the default manager's get method, so we can special case this occurrence. We also have to make a slight change to our manager, as by default a manager won't be used on related object queries:

from django.db.models import Manager
from django.db.models.query import QuerySet

class CachingManager(Manager):
    use_for_related_fields = True

    def get_query_set(self):
        return CachingQuerySet(self.model)

class CachingQuerySet(QuerySet):
    ...
    def get(self, *args, **kwargs):
        if len(kwargs) == 1:
            k = kwargs.keys()[0]
            if k in ('pk', 'pk__exact', '%s' % self.model._meta.pk.attname,
                '%s__exact' % self.model._meta.pk.attname):
                try:
                    return get_from_cache(self.model, kwargs[k])
                except KeyError:
                    pass
        clone = self.filter(*args, **kwargs)
        objs = list(clone[:2])
        if len(objs) == 1:
            return objs[0]
        if not objs:
            raise self.model.DoesNotExist("%s matching query does not exist."
                % self.model._meta.object_name)
        raise self.model.MultipleObjectsReturned("get() returned more than one %s -- it returned %s! Lookup parameters were %s"
            % (self.model._meta.object_name, len(objs), kwargs))


As you can see, we add one line to the manager and a few lines to the beginning of the get() method. Basically, if there is only one kwarg to get(), and it is a query on the primary key of the model, we try to return our cached instance; otherwise we fall back to executing the query.
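
Here's a quick, untested sketch of the effect (Article is a hypothetical model whose default manager is our CachingManager; get_from_cache and cache_instance are the helpers from the earlier posts in this series):

class Article(models.Model):
    title = models.CharField(max_length=50)

    objects = CachingManager()

a = Article.objects.get(pk=1)
b = Article.objects.get(pk=1)
# The second get() never hits the database; it comes straight out of
# the identity map, so it is the very same object.
assert a is b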

And with this we've improved the efficiency of our identity map. There are almost certainly more places to optimise, but we now have an identity map in very few lines of code.

A Few More Thoughts on the Identity Mapper

It's late, and my flight was delayed for several hours, so today is going to be another quick post. With that noted, here are a few thoughts on the identity mapper:
  • We can optimize it to execute fewer queries by having it run the query as usual, and then use the primary key to check the cache, falling back to caching the instance we already have.
  • As Doug points out in the comments, there are built in caching utilities in Django we should probably be taking advantage of. The only qualification is that whatever cache we use needs to be in memory and in process.
  • The cache is actually going to be more efficient than I originally thought. On review of the source, the default manager is used for some related queries, so our manager will actually be used for those.
  • The next place to optimize will be single related objects (foreign keys and one to ones). That's because we already have their primary key, so we can check for them in the cache without executing any SQL queries.

And lastly, a small note: as you may have noticed, I've been doing the National Blog Every Day for a Month Month; since I started two days late, I'm going to be continuing on for another two days.