Sunday 6 September 2009

Backending Catalyst

I am starting to look at Catalyst as a way to display some of the quality control results coming out of our analysis pipeline.
ClearPress is a nice MVC webapp builder, but it is quite a lightweight framework and uses Class::Accessor as its object base. We would like to move towards using Moose-based objects, and need a way to integrate these into Catalyst.
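To give a flavour of the difference, here is a minimal sketch (the class and attribute names are made up for illustration, and this is not code from either framework):

package QC::Check::Old;    # Class::Accessor style, as per the ClearPress object base
use strict;
use warnings;
use base 'Class::Accessor';
__PACKAGE__->mk_accessors(qw(id_run pass));

package QC::Check::New;    # Moose style, which is where we want to be
use Moose;
has 'id_run' => ( is => 'ro', isa => 'Int', required => 1 );
has 'pass'   => ( is => 'rw', isa => 'Bool' );
no Moose;

1;

The Moose version buys us type constraints, required attributes and roles, which we want to share across our other applications too.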

I am currently working my way through the latest Catalyst book (Diment & Trout), but before it arrived I found we had the following book on our Safari subscription: Catalyst: Accelerating Perl Web Application Development: Design, develop, test and deploy applications with the open-source Catalyst MVC framework, by Jonathan Rockway.

Now, I had been through the Tutorial on CPAN, but I couldn't find anything there about using a filesystem as a source for the model (did I miss something?). Luckily, this book had a section on doing exactly that.

Firstly, why do we have QC data in a filesystem?

When we run the pipeline, it all happens on a staging area that we write everything to, and only then do we copy our data into long-term archival databases. The QC data is no exception, but we only want to archive the final agreed data. Bioinformaticians never seem to be happy with a first pass that fails if there is any chance it could be improved (e.g. a new test pipeline version - could rerunning just squeeze out 2% more?). As such, we want to view the data in exactly the same way from the filesystem as from the database, because we don't want it stored until the last possible moment.

What have we done for this?

My team have been producing Moose objects which are responsible for:

1) Producing the data
2) Storing it in JSON files (MooseX::Storage) - see the sketch after this list
3) Reading those JSON files back in (MooseX::Storage) to re-instantiate the object
4) Saving to a database (Fey)
5) Re-instantiating from the database
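
To give a flavour of points 2 and 3, here is a minimal sketch of the storage side (the class name, attributes and file paths are made up for illustration), assuming MooseX::Storage's JSON format and File IO roles:

package QC::Result;
use Moose;
use MooseX::Storage;

# Serialise / de-serialise this object as JSON on disk
with Storage( format => 'JSON', io => 'File' );

has 'id_run'   => ( is => 'ro', isa => 'Int', required => 1 );
has 'position' => ( is => 'ro', isa => 'Int', required => 1 );
has 'pass'     => ( is => 'rw', isa => 'Bool' );

no Moose;
__PACKAGE__->meta->make_immutable;

1;

Writing out and re-instantiating then looks something like this:

my $result = QC::Result->new( id_run => 1234, position => 3, pass => 1 );
$result->store('/staging/run1234/qc/result_3.json');               # point 2
my $copy = QC::Result->load('/staging/run1234/qc/result_3.json');  # point 3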

I've been working with iterations of these objects, using the files, but I want the objects to just sort this out themselves - I shouldn't need to know where the data has come from, and these objects should be usable in (in fact are being written for) other applications.

Catalyst very much guides you towards using a database, and seems to prefer DBIx::Class for this, so I needed a way of getting the Model to provide the correct objects, which are not generated directly by the Catalyst helpers.

What did I do?

So, in the above book, I found the section 'Implementing a FileSystem model'. This shows how to create a Backend, which takes us out of the ordinary Model style: a call to the Model returns this Backend object instead. We then use the Backend object to contain the logic for obtaining the objects from somewhere outside of the Catalyst application, de-coupling the data models from the app and therefore increasing flexibility and maintainability. As I said, these objects are actually being written within another application project.
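
For anyone wondering what the glue looks like, this is roughly the shape of it (a sketch only, with hypothetical package names, config key and backend methods - it is not the book's code): the Model class overrides COMPONENT so that a call to $c->model('QCStore') returns the external backend object rather than anything Catalyst has generated itself.

package QCApp::Model::QCStore;
use strict;
use warnings;
use base 'Catalyst::Model';

use QC::Store::FileSystem;    # the backend class, written in the other project

__PACKAGE__->config( path => '/staging/qc' );

# Build and hand back the backend object itself, so controllers get the
# external object directly when they ask for this Model.
sub COMPONENT {
    my ( $class, $app, $args ) = @_;
    my %config = ( %{ $class->config || {} }, %{ $args || {} } );
    return QC::Store::FileSystem->new( path => $config{path} );
}

1;

A controller action then only asks the Model for what it needs to stash for the view, without caring whether the data came from a JSON file or the database (results_for_run is a made-up backend method):

sub run_results : Local {
    my ( $self, $c, $id_run ) = @_;
    $c->stash->{results} = [ $c->model('QCStore')->results_for_run($id_run) ];
}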

This has been an interesting venture, which has enabled me to write a web application that concentrates only on the logic for the view and leaves the data handling completely to someone else. We should be production-ready with the application within the week, displaying data for the users quickly and simply.

What's the betting someone asks if we can regenerate the data for all previous runs? I won't be betting against it, that's for sure.