Wednesday 19 March 2008

Composite Primary Keys - Yuuurrghhh!

Well, nearly.

You have seen me write how I think that Composite Primary Keys are above all else a BAD THING(tm).

Well, I think they still are, as Primary Keys.

However, there is no reason not to have a unique combination of keys in a table. This I have never had an issue with.

So why?

Well, look at our tagging example. We don't want to assign a tag to an entity_type twice in our frequency table, at this point we just want to increment the frequency count. Here we hit a snag.
When we send info to our model, we don't know the id_tag_frequency, only the id_tag and the id_entity_type. This isn't a unique primary key.

However, not to worry. We just need a constraint to ensure that the combination here is unique (tags will be assigned to multiple entity_types, and entity_types will have multiple tags).

If we do this, then you can do a query lookup to return just one row.

So why not make this the Primary Key?

Well, if you do this, then you could lose out on simple coding like this

if (unique a + b) and not primary key { fetch primary key from database or create new entry }

but, more importantly, if later on you need to reference the table row in another table (which is the benefit of relational databases, isn't it?) you only need the single primary key put in the new table, instead of all the parts which make up the composite key. This maintains a DRY principle within the database, because, should you need to change part of the composite uniqueness in table 1, you don't need to do it in table 2 as well.

So, never worry about needing a unique combination of fields, that is great, but always have a unique single field primary key on the row. This should help future proof your database when you need new tables, and will make it much easier to create new programs, as you have now given people the option of searching via the composite unique key, or the single primary key.

An important aside to this. If you use an iscurrent boolean for your rows. It can be difficult to make this part of the unique composite key (because only 1 will be current), but do ensure that your code makes only 1 current for the entity it relates to. This will ensure that anyone querying the database will always be able to find the current one. I have had this problem working with a legacy database before, due to the fact that someone had put the constraint:

entity + status + date = unique composite primary key

because date (especially if stored as datetime) will generally give you a guaranteed unique combination, new rows were getting stored as iscurrent = 1, but all other rows for that entity and status were not getting set to iscurrent = 0. Whilst you could always use the following sql

select entity, status, date from table order by date

and select the one with the most recent date, you lose the benefit of just asking

select entity, status, date from table where iscurrent = 1

if that is all you want. And again, this future proofs development of new applications, as if the table is very large (and in bioinformatics data storage, they often get very exceptionally large) then developers will need to rely on the field names to determine how to use the table.

So, if you have an iscurrent, keep it up to date, otherwise, DON'T USE IT.

Rant over. Normal service resumes whenever we can determine what is normal anyway.

No comments: