Monday 17 March 2008

Are Clouds Real?

This is a bit of a summary of what we decided in the production of our database tables for tags.

Very simply, we wanted to be able to tag different entities with tags. We also want to be able
to track the tags used for an entity type (frequency, use for that entity type), so that we
could lookup that, and offer suggestions based on the usage already of that tag - i.e. a cloud
of tags for that entity_type (see scrumptious tagging for some application of that).

So, the first is simply to have a table of tags

id_tag | tag
-------------------

then we have a table of runs

id_run | other info about run
_________________________________

and then a join table

id_tag_run | id_run | id_tag
_________________________________

OK, so that is quite easy. But, we don't have any opportunity to look at any information on this,
particularly our 'clouds' of tags.

We looked at the possibility of a cloud table

id_cloud | ids_of_tags_in_cloud

But how do we know the entity (remember, we want to tag many different things)

In this case, we have an entity_type table

id_entity_type | description | iscurrent
_______________________________________________

From our models, we can get them to work out their entity_type, as our models are named
xxx::model:: where entity type is the name of the table. This is just a simple
method. So, we make sure the description matches. This gives us an id_entity_type.

id_cloud | ids_of_tags_in_cloud | id_entity_type
____________________________________________________

However, now we nothing of the frequencies the tags are used for an entity. We also need to
keep appending the tag id's to a field, which could make it difficult to represent each
individual tag in the model.

At this point, our discussion came about to what clouds actually are. (at this point the
science teacher in me said a collection of water vapour). We wondered about abstracting the
cloud, and try to collect the frequencies of a tag against the entity type

So, out with the cloud table (for now?) and in with the following

id_tag_frequency | id_entity_type | id_tag | frequency
_________________________________________________________

This is looking good. We can store the frequencies that a tag has been used on an entity,
and we also get an easy lookup for all tags that have been saved against an entity, which means
that we can abstract our cloud with an sql statement like this

cloud = select e.description as entity_type, t.tag as tag, frequency
from entity_type e, tag_frequency tf, tag t
where e.description = ?
and e.id_entity_type = tf.id_entity_type
and tf.id_tag = t.id_tag;

So, do we need any more information. It is useful to know who first saved a tag for an entity, and a date.
We can add this to our tag_ join table

id_tag_run | id_run | id_tag | id_user | date
_______________________________________________________

So what we have

tag table:
id_tag | tag

's table:
id_ | info about entity

tag_ tables: (one for each entity)
id_tag_ | id_ | id_tag | id_user | date

entity_type dictionary table:
id_entity_type | description | iscurrent(optional)

tag_frequency:
id_tag_frequency | id_entity_type | id_tag | frequency

and clouds are purely abstract entities created from the data in the tables. We just need to use to code
to (sql or within your program if another language) to calculate the frequency.

So, in the Clearpress MVC framework (or in Rails) we create a model for each of tag* tables with the accessors
for the table and hey presto, a setup for tagging any entities in our database. You could even tag a tag :)

No comments: