Engineering at Otovo 🏗


Sometimes, old design decisions come back to bite you. This is one of those tales.

A project I’m working on had a Django model similar to this:

class Municipality(models.Model):
    code = models.CharField(max_length=2, primary_key=True)
    name = models.CharField(max_length=100)

and it was used by other models such as:

class ZipCode(models.Model):
    code = models.CharField(max_length=4, primary_key=True)
    municipality = models.ForeignKey(Municipality)

And all was good, until we needed to expand the Municipality model to support different countries. A single unique field holding only the code, which may collide across countries, was no longer enough.

For all the modern parts of the code base we use UUIDs as the primary key, so we wanted to migrate the primary key of Municipality to a UUID, while retaining all the relations.

As of September 2017, Django does not support migrating primary keys in any nice manner, so we’re on our own here.

We tried a lot of magic solutions, but we always got stuck with the migrations system not being able to detect and handle the changes nicely.

After some research and quite a bit of trial and error, we settled on the following approach. It has some drawbacks I’ll get back to, but works reasonably well.

A quick reminder on how the world looks from a database perspective. When you define a ForeignKey field in Django, it creates a database column of the same type as the primary key of the referenced model on the referring model, and adds the foreign key constraints. So in the example above, we have two tables (in pseudo SQL):

CREATE TABLE municipality (
   code varchar(2) PRIMARY KEY NOT NULL,
   name varchar(100)
);

CREATE TABLE zipcode (
   code varchar(4) PRIMARY KEY NOT NULL,
   municipality_id varchar(2) NOT NULL REFERENCES municipality (code)
);
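To see this layout for yourself, here is a small sketch that builds the two tables and asks the database what the foreign key column looks like. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in; the post does not depend on any particular backend, and other databases expose the same information through their own catalog tables.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for the real backend.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE municipality (
        code varchar(2) PRIMARY KEY NOT NULL,
        name varchar(100)
    );
    CREATE TABLE zipcode (
        code varchar(4) PRIMARY KEY NOT NULL,
        municipality_id varchar(2) NOT NULL REFERENCES municipality (code)
    );
""")

# PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
column_types = {
    row[1]: row[2] for row in conn.execute("PRAGMA table_info(zipcode)")
}

# PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
fk = conn.execute("PRAGMA foreign_key_list(zipcode)").fetchone()
referenced_table, from_column, to_column = fk[2], fk[3], fk[4]

print(column_types["municipality_id"])
print(referenced_table, from_column, to_column)
```

The referring column has the same type as the primary key it points at, which is why changing that primary key's type means touching every referring table too.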

So, we need to:

  1. Break the foreign key constraints.
  2. Alter the root model.
  3. Map the new primary key ids to the old ones.
  4. Re-apply the foreign keys to it.

We start by breaking the foreign keys in the referring models.

class ZipCode(models.Model):
    code = ...  # Same as before
    municipality = models.CharField(max_length=2)  # Foreign key removed here

python manage.py makemigrations -n break_zipcode_muni_foreignkey

Now that the Municipality model is free from any external referring models, we can go to work on it.

Start by adding the new id field:

import uuid

class Municipality(models.Model):
    id = models.UUIDField(default=uuid.uuid4)  # New field; code is still the primary key
    # code and name are the same as before

python manage.py makemigrations -n add_id_field_to_muni

For some reason, using uuid.uuid4 as the default in the migration didn’t work in my case, so I added a step to the generated migration that creates a new unique id for every row:

import uuid

def create_ids(apps, schema_editor):
    # Use the historical model, not a direct import of the current one.
    Municipality = apps.get_model('myapp', 'Municipality')
    for m in Municipality.objects.all():
        m.id = uuid.uuid4()
        m.save()

# ...

operations = [
    migrations.AddField(...),
    migrations.RunPython(
        code=create_ids,
        reverse_code=migrations.RunPython.noop,
    ),
]
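A likely explanation for the default not working (an assumption on my part, I haven't verified it against this exact setup): when a column is added to a table that already has rows, the default is resolved once and that single value is written to every existing row. The sketch below mimics that with plain dictionaries. The helper names are made up for the illustration and are not Django APIs.

```python
import uuid


def add_column_with_default(rows, default):
    """Mimic adding a column: resolve the default once, copy it to every row."""
    value = default() if callable(default) else default
    return [dict(row, id=value) for row in rows]


def backfill_per_row(rows):
    """Mimic the create_ids data migration: one fresh UUID per row."""
    return [dict(row, id=uuid.uuid4()) for row in rows]


# Made-up sample rows standing in for existing Municipality records.
rows = [{"code": "01"}, {"code": "02"}, {"code": "03"}]

shared = add_column_with_default(rows, uuid.uuid4)
unique = backfill_per_row(rows)

print(len({row["id"] for row in shared}))  # 1: every row got the same UUID
print(len({row["id"] for row in unique}))  # 3: each row got its own UUID
```

With identical values in every row, promoting id to a primary key in the next step would fail, so the per-row backfill is needed either way.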

Now we have a UUID id field on Municipality, and we should be able to switch the primary key around:

class Municipality(models.Model):
    id = models.UUIDField(default=uuid.uuid4, primary_key=True)  # primary_key added
    code = models.CharField(max_length=2, unique=True)  # primary_key replaced with unique

Create the migration, and make sure that the AlterField operation for code is run before the AlterField on id so that we never have two primary keys at the same time. We’ve added primary_key=True to the id field and unique=True to the code field. Removing primary_key from code also removes its uniqueness guarantee, and we still want to enforce that constraint for now.

Congratulations, we now have a new UUID primary key. But we still need to repair the models whose foreign keys we broke.

Let’s start by creating an empty migration:

python manage.py makemigrations --empty -n fix_zipcode_fk_to_muni_uuid myapp

Open the file, and let us begin:

def match(apps, schema_editor):
    ZipCode = apps.get_model('myapp', 'ZipCode')
    Municipality = apps.get_model('myapp', 'Municipality')
    for zip_code in ZipCode.objects.all():
        # municipality still holds the old code; store the matching new UUID
        zip_code.temp_muni = Municipality.objects.get(code=zip_code.municipality).pk
        zip_code.save()

# ...
operations = [
    migrations.AddField(
        model_name='zipcode',
        name='temp_muni',
        field=models.UUIDField(null=True),
    ),
    migrations.RunPython(
        code=match,
        reverse_code=migrations.RunPython.noop,
    ),
    migrations.RemoveField(model_name='zipcode', name='municipality'),
    migrations.RenameField(
        model_name='zipcode', old_name='temp_muni', new_name='municipality'),
    migrations.AlterField(
        model_name='zipcode',
        name='municipality',
        field=models.ForeignKey(
            on_delete=django.db.models.deletion.PROTECT,
            to='myapp.Municipality'),
    ),
]

Let us go through the steps here.

  1. Add a temporary field for storing the UUID of Municipality that we want to connect to. We don’t make it a ForeignKey field just yet, as Django gets confused about the naming later on.
  2. Run the match function to look up each new id by the old code, and store it in the temporary field temp_muni.
  3. Remove the old municipality field.
  4. Rename the temporary field to municipality.
  5. Finally migrate the type of municipality to a foreign key to create all the database constraints we need.
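The match step in the list above does one database lookup per zip code, which can get slow on large tables. A possible variation, sketched here under my own assumptions, is to build a code-to-UUID dictionary once and reuse it. The function names build_lookup and assign_temp_muni are hypothetical, and the rows are plain stand-in objects with made-up data so that the example runs without Django.

```python
import uuid
from types import SimpleNamespace


def build_lookup(municipalities):
    """Map each old municipality code to its new UUID primary key."""
    return {m.code: m.id for m in municipalities}


def assign_temp_muni(zip_codes, lookup):
    """Fill temp_muni from the lookup; return the zip codes with no match."""
    unmatched = []
    for zip_code in zip_codes:
        new_id = lookup.get(zip_code.municipality)
        if new_id is None:
            unmatched.append(zip_code.code)
        else:
            zip_code.temp_muni = new_id
    return unmatched


# Stand-in rows (made-up sample data, not real model instances).
municipalities = [
    SimpleNamespace(code="01", id=uuid.uuid4()),
    SimpleNamespace(code="02", id=uuid.uuid4()),
]
zip_codes = [
    SimpleNamespace(code="1000", municipality="01", temp_muni=None),
    SimpleNamespace(code="2000", municipality="02", temp_muni=None),
    SimpleNamespace(code="9999", municipality="99", temp_muni=None),
]

lookup = build_lookup(municipalities)
unmatched = assign_temp_muni(zip_codes, lookup)
print(unmatched)  # ['9999']
```

Inside the real migration you would pass Municipality.objects.all() and ZipCode.objects.all() instead of the stand-in lists and save each zip code afterwards. Returning the unmatched codes instead of raising makes it easy to abort with a readable message before the later steps add the constraints.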

And there you go. All done.

There are some downsides here. Since we split the work into several migration files, we leave ourselves vulnerable if any of the later migrations fail. This will probably leave the application in a pretty unworkable state, so make sure to test the migrations quite a bit. You can reduce the risk by hand-editing all the steps into one migration, but if you have references from multiple different apps, then you need to do the breaking and restoring in separate steps anyway.

Logging / Debugging

You’ll most likely end up with some SQL errors during the process of creating these, so a nice trick I like to do is to create a simple logging migration operation.

def log(message):
    def fake_op(apps, schema_editor):
        print(message)
    return fake_op


# ...

operations = [
    migrations.RunPython(log('Step 1')),
    migrations.AlterField(...),
    migrations.RunPython(log('Step 2')),
    # ...
]

This allows you to see where in the process the migration fails.
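If you also want to know which step is slow, the same trick can be extended a little. This is only a sketch: make_logger and its emit parameter are names I made up for the illustration, and the timing is simply the wall-clock gap between two logging operations.

```python
import time


def make_logger(emit=print):
    """Return a log(message) factory that numbers steps and times the gaps."""
    state = {"step": 0, "last": None}

    def log(message):
        def fake_op(apps, schema_editor):
            # Runs when the migration reaches this operation, not at import time.
            now = time.monotonic()
            elapsed = 0.0 if state["last"] is None else now - state["last"]
            state["step"] += 1
            state["last"] = now
            emit(f"[{state['step']}] {message} (+{elapsed:.2f}s)")
        return fake_op

    return log


# Stand-alone demonstration: call the operations the way RunPython would.
log = make_logger()
for op in [log("Add temp field"), log("Match ids")]:
    op(None, None)
```

In a migration you would create log = make_logger() at module level and wrap each call as migrations.RunPython(log('Step 1'), migrations.RunPython.noop). Passing the noop as reverse_code keeps the migration reversible, which a RunPython without reverse_code is not.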

To see what SQL Django creates for a given migration, run python manage.py sqlmigrate <appname> <migration_number>. This is super useful for checking whether operations are run in the order that you expect.