Saturday, 31 October 2009

10 years at The Sanger - Have I finally achieved my career goal?

Friday the 30th was my last working day of year 10 at the Sanger Institute. I have worked in 4 teams whilst there, and before I officially started I had already spent 2.5 months there temping in the glasswash team. Also, in January, I hit the momentous age of 35, which, according to the biblical reference to threescore years and ten being the life of a man, makes me middle-aged.

So I wonder whether my career has finally become what I really wanted it to be. NaNoWriMo (http://www.nanowrimo.org/) runs over the next month, and I have decided that I might just spend it putting my life so far down on paper. I don't claim it will be great (I'm fairly average at most things), but I think it will be good to recap and look back: from a boy from a middle-class background who wanted to be a bricklayer when playing with Lego aged 5, and a computer programmer aged 13 (but couldn't find anything at school to help him), to being a teacher, a scientist on the Human Genome Project and a Perl developer.

Who knows, maybe it will interest people, maybe it won't. But a recap of my life can't be a bad thing, and maybe it will just make me think: life really isn't too bad, you know.

Sunday, 25 October 2009

Course on Moose and Catalyst

I work at the Wellcome Trust Sanger Institute. In my team, New Pipeline Development, we have identified a need for some training in Moose and Catalyst. If anyone has any course recommendations, I'd be interested in hearing about them so that I can propose them to my team and our training co-ordinator.

The course should be aimed at experienced Perl developers who have only a small amount of experience with Moose and Catalyst (as anyone who reads my blog will probably have noticed).

Replies as comments or by email would be appreciated. Many thanks in advance.

Wednesday, 21 October 2009

Does anyone else do this?

I've been away on holiday, and while going back over the comments left on my blog, I found this one:

"Is that a new fashion to use q{string} instead of 'string'?"

My response:

"Just a habit I have gotten into since using Perl::Critic and PBP.

As '' and ' ' are not allowed by 'the rulez', and you should use q{} and q{ } (and their double-quote equivalents) instead, I have got into the habit of just using them straight off."

Is anyone else doing this as a matter of course, or is it just me? Going through a number of recent books, I notice that ' and " are used in most of the code examples.
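For anyone who hasn't met the quote-like operators, here is a minimal sketch showing that q{} and qq{} produce exactly the same strings as single and double quotes, so the habit is purely one of style. The two forms flagged in the comments are, as far as I know, the ones covered by the Perl::Critic policies ValuesAndExpressions::ProhibitEmptyQuotes and ValuesAndExpressions::ProhibitNoisyQuotes.

```perl
use strict;
use warnings;

my $name = q{world};

# q{} behaves like single quotes: no interpolation.
my $single   = 'Hello, $name';
my $q_single = q{Hello, $name};

# qq{} behaves like double quotes: variables are interpolated.
my $double    = "Hello, $name";
my $qq_double = qq{Hello, $name};

# The two forms Perl::Critic complains about, and their q{} replacements.
my $empty = q{};     # instead of ''
my $space = q{ };    # instead of ' '

print $single eq $q_single  ? "q{} matches ''\n"    : "q{} differs\n";
print $double eq $qq_double ? "qq{} matches \"\"\n" : "qq{} differs\n";
```

The brace delimiters also nest, so q{} strings containing balanced braces need no escaping.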

Monday, 12 October 2009

How to put your Perl Iron Man pic on Blogspot

Go to your dashboard and, under the chosen blog, select Layout.

Under "Add and arrange page elements", select one of the two "Add a gadget" links, depending on whether you want the badge in your sidebar or at the bottom of the page.

Select the HTML/JavaScript gadget.

Give it a title.

In Content, type an HTML img tag whose src attribute is the following URL:

http://ironman.enlightenedperl.org/munger/mybadge/male/setitesuk.png

replacing setitesuk with your own username or nickname (male can be switched to female).

Click Save and, hey presto, the badge will appear the next time you load your blog.

Note: I am sure this duplicates a post from somewhere else. It is partly for my own reference and partly to help anyone else who can't find the details easily (apart from mst's blog post about the URL for the image).
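To make the URL pattern from the steps above explicit, here is a small sketch that builds the complete img tag for a given username. The function name and the alt text are my own additions; only the URL layout comes from this post.

```perl
use strict;
use warnings;

# Builds the Iron Man badge <img> tag for a user, following the URL pattern
# given in the post. $gender is q{male} (the default) or q{female}.
sub ironman_badge_tag {
    my ($username, $gender) = @_;
    $gender //= q{male};
    die qq{gender must be male or female\n}
        if $gender ne q{male} && $gender ne q{female};
    my $url = sprintf q{http://ironman.enlightenedperl.org/munger/mybadge/%s/%s.png},
        $gender, $username;
    return qq{<img src="$url" alt="Perl Iron Man badge" />};
}

my $tag = ironman_badge_tag(q{setitesuk});
print "$tag\n";
```

Paste the printed tag into the gadget's Content box.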

Sunday, 11 October 2009

Is it easier to be specific?

I'm reassessing some code, trying to refactor it into a reusable role. Simple, you might think. But is it?

It is very easy to write code that performs a task on a specific item, or in a specific way: from variable names that refer to something tangible, to methods designed to act on a path that is unique to your production setup. But how do you make it more reusable?

1 - Variable and method names

I was taught to make my variable/method names mean something. This makes the code more readable.

my $donut = q{jam donut};
...
eat($donut);

instead of

my $d = q{jam donut};
...
eat($d);

This is a trivial example, but the principle is there.

However, you (read: I) can take it too far. Directories are one such case. In the analysis pipeline, a step called GERALD produces a directory called GERALD-date.

In the code, we put this into $gerald_dir. Sounds reasonable. Everywhere I read $gerald_dir, I know exactly what it represents.

However, here is the problem: what happens when the step and its directory are renamed Harold? The principle is the same and the same files are there, but suddenly the variable name is wrong, and simply searching the filesystem for something like GERALD won't find the directory any more. At this point you are probably screaming at me: give it a semantic name, and document what it represents!

Exactly, but that is easy with internal local variable names, and not so easy with publicly exposed method names. Suddenly I need the role to include a deprecation cycle for the replacement method names, aaahhh!
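A deprecation cycle for a public method can be as simple as keeping the old name as a thin wrapper that warns and delegates. This is a minimal sketch, assuming a hypothetical replacement name analysis_dir (only gerald_dir comes from the post) and a plain class standing in for whatever consumes the role; in a Moose role the two subs would look the same.

```perl
use strict;
use warnings;

package Hypothetical::RunFolder;   # stands in for a class consuming the role
use Carp qw(carp);

sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}

# Replacement, semantically named public method.
sub analysis_dir {
    my ($self) = @_;
    return $self->{analysis_dir};
}

# Old, step-specific name kept for one deprecation cycle: it still works,
# but warns the caller and delegates to the replacement.
sub gerald_dir {
    my ($self, @args) = @_;
    carp q{gerald_dir is deprecated, use analysis_dir instead};
    return $self->analysis_dir(@args);
}

package main;

my $run = Hypothetical::RunFolder->new(analysis_dir => q{/hypothetical/run/GERALD-date});
print $run->analysis_dir, "\n";
```

Using carp rather than warn reports the caller's line, which tells users exactly which call to update before the old name is removed.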

2 - Application/locally specific

Most people would argue that locally specific logic should live as far up as possible, keeping it out of the generic processing logic wherever possible. No arguments there.

But how do you determine which logic is app-specific and which is generic?

Obviously, naming conventions are app-specific. Or are they? Many things might need to know how to construct a filename in a particular way.

OK, then what about directory structure? Again, you may find many apps wanting to access the Recalibrated data directory. They all need to know how to get there.

This is, as you can see, quite a grey area, and one where, I am finding, the refactor into a generic role still needs to be a little more specific than might first be thought. You can't get to a directory without at least some knowledge of where it is likely to be. You can't open a file without at least some knowledge of how it is named.

To find a row in a database, you need some way of constructing a query to get it...
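One possible way to split the two, under my own assumptions: the generic code knows how to search for directories but takes the naming convention as a parameter, and the application supplies it. This is simple dependency injection rather than the role from the post; the function name, pattern and directory names are all hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Generic part (could live in a shared role): knows how to look under a run
# folder for directories, but not what they are called.
sub find_data_dirs {
    my ($root, $name_pattern) = @_;
    opendir my $dh, $root or die "Cannot open $root: $!\n";
    my @dirs = grep { -d "$root/$_" && m/$name_pattern/xms } readdir $dh;
    closedir $dh;
    return sort @dirs;
}

# App-specific part: the naming convention is supplied by the application,
# so renaming the step only touches this one (hypothetical) pattern.
my $analysis_dir_pattern = qr/\A(?:GERALD|Harold)-/xms;

# Demo against a throwaway directory tree with hypothetical names.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/$_" or die "mkdir $_: $!\n"
    for qw(GERALD-2009-10-11 Harold-2009-10-12 Recalibrated);

my @found = find_data_dirs($root, $analysis_dir_pattern);
print "@found\n";
```

The generic function still assumes the data sits directly under one root, which is exactly the kind of residual specificity described above.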

Paul Weller said "No one ever said it was gonna be easy", and I wouldn't want it any other way. But remember: if you want your code to be reusable, try to make names semantic but not specific, and document what they represent.

Also, make sure you document the assumptions your processing relies on, and why. At least then, no-one can say you didn't tell them.

That is my plan at least (where I can) from now on.

Saturday, 10 October 2009

Just a bit of info

Whilst I don't have the code here to show this, here is an interesting thing we found this week regarding a Moose attribute.

has q{attr} => (isa => q{Str}, is => q{rw});

This makes $obj->attr() both reader and writer.

Now, the manual mentions that if you specify an accessor option such as writer yourself, it overrides the one Moose would otherwise create, but what interested me was the error message.

has q{attr} => (isa => q{Str}, is => q{rw}, writer => q{_set_attr});

The error message you get here, if you try to use

$obj->attr($some_string);

is that you cannot assign a value to a read-only accessor. I was expecting something different, since the attribute is declared with 'is => q{rw}'.

Using 'is => q{ro}' still allows you to use your private writer to set the value.

Personally, I always use 'is => q{ro}' unless I want the attribute name itself to be the writer, but it is mildly interesting that setting a writer effectively turns a declaration of rw into ro. Does this make it more or less confusing for the user?

Looking at it from two different perspectives:

Public attribute setting:

Someone inspecting the code for this attribute would see rw, which tells them: hey, you can set this. Hopefully they would look for a writer option before diving in and using the attribute name itself. So I can see the value of declaring rw as a hint.

Private attribute setting:

Someone inspecting the code would see rw and might assume that they should be able to set this attribute. However, the writer should only be used inside the class, so here rw hints that the value can, and perhaps should, be overwritten if needed, but only by the class itself. Of course, as in any situation, let the buyer beware: this functionality may change.
Declaring ro, on the other hand, gives a clear hint that outside the class, if you didn't set it on construction, you really shouldn't be thinking of doing so.

I'll leave it to someone else to determine a best practice; it was just an interesting thing to discover. I shall stick with declaring 'is => q{ro}, writer => q{_set_attr}', since most of the time only the class itself should be setting the value anyway.
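A minimal sketch reproducing the behaviour described above, assuming a reasonably recent Moose from CPAN. The class names are hypothetical; the two attribute declarations are the ones from this post.

```perl
use strict;
use warnings;

package Hypothetical::RwWithWriter;
use Moose;
# rw plus an explicit writer: Moose turns attr() into a reader only.
has q{attr} => (isa => q{Str}, is => q{rw}, writer => q{_set_attr});
__PACKAGE__->meta->make_immutable;

package Hypothetical::RoWithWriter;
use Moose;
# ro plus a private writer: the declaration I prefer.
has q{attr} => (isa => q{Str}, is => q{ro}, writer => q{_set_attr});
__PACKAGE__->meta->make_immutable;

package main;

my $rw = Hypothetical::RwWithWriter->new(attr => q{initial});
my $rw_died  = !eval { $rw->attr(q{changed}); 1 };
my $rw_error = $@;
my ($first_line) = split m/\n/xms, "$rw_error";
print $rw_died ? "attr(\$value) died: $first_line\n" : "attr(\$value) worked\n";

my $ro = Hypothetical::RoWithWriter->new(attr => q{initial});
$ro->_set_attr(q{changed});
print q{ro attribute via private writer: }, $ro->attr, "\n";
```

Both classes end up with the same pair of methods, a reader called attr and a writer called _set_attr; only the declaration differs.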

Monday, 5 October 2009

The Definitive Guide to Catalyst

I have just finished reading The Definitive Guide to Catalyst by Diment and Trout (Apress). As a Catalyst newbie, all I can say is: buy and read this book.

It has lots of helpful advice and guidance, from creating your first application to extending Catalyst, and lots in between.

It also takes a good look at DBIx::Class, Moose and the Reaction framework, and gives a good explanation of the MVC pattern.

I think this book is going to prove a valuable addition to our team's bookshelf, and be referenced for some time to come.

Pluggable pipelines - the return

On Friday, we deployed v5.0 of the new pluggable-style analysis pipeline. This release now includes the pluggable analysis pipeline, archival file creation, the QC pipeline and the archival pipeline.

With this new setup, we can add (or remove) steps easily within a few hours (usually minutes, in fact, but then we need to write those tests).

If you look back at my previous post, the style is described there, and we have found that it works really well. It keeps individual application code separate from the pipeline code that runs it, and loosely couples the individual apps so that they run more smoothly.

An additional bonus, which came without even trying, is a significant improvement in run time. Because we now queue every step as a separate job, instead of as batched makefiles, LSF runs each one as it finds space. It no longer needs to find, or earmark, 8 processors with the maximum memory allowance, just one or two with less memory, so the jobs can be scheduled more efficiently. Data passes through faster (assuming the dependency trees pan out correctly), and so users get their data more quickly. Fantastic, and even better since we didn't plan for it.
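The post doesn't show how steps are submitted, so this is only a sketch of the idea, assuming each pipeline step becomes its own bsub job with small resource requests and a done() dependency on the steps before it. The step names, commands, run id and memory figures are hypothetical. -J, -n, -M, -R and -w are standard bsub options, but whether -M is read as KB or MB depends on the site's LSF configuration. The sketch only prints the commands rather than submitting them.

```perl
use strict;
use warnings;

# Hypothetical pipeline steps: each is its own small LSF job with modest
# resource requests, plus the names of the steps it must wait for.
my @steps = (
    { name => q{basecall}, cmd => q{run_basecall}, cpus => 2, mem_mb => 2000, after => [] },
    { name => q{align},    cmd => q{run_align},    cpus => 1, mem_mb => 3000, after => [q{basecall}] },
    { name => q{qc},       cmd => q{run_qc},       cpus => 1, mem_mb => 1000, after => [q{basecall}] },
    { name => q{archive},  cmd => q{run_archive},  cpus => 1, mem_mb => 1000, after => [q{align}, q{qc}] },
);

# Builds a bsub argument list for one step. Dependencies use LSF's
# done(job_name) conditions, so a step only starts once its inputs exist.
sub bsub_command {
    my ($run_id, $step) = @_;
    my @cmd = (
        q{bsub},
        q{-J}, "$run_id.$step->{name}",
        q{-n}, $step->{cpus},
        q{-M}, $step->{mem_mb},                  # units are site-configurable
        q{-R}, "rusage[mem=$step->{mem_mb}]",
    );
    if (my @deps = @{ $step->{after} }) {
        push @cmd, q{-w}, join q{ && }, map { "done($run_id.$_)" } @deps;
    }
    push @cmd, $step->{cmd};
    return @cmd;
}

for my $step (@steps) {
    my @cmd = bsub_command(q{run_1234}, $step);
    # Dry run: show the command. To submit for real, system(@cmd) passes
    # the arguments without going through a shell.
    print join(q{ }, map { $_ =~ m/[\s()&\[\]]/ ? qq{'$_'} : $_ } @cmd), "\n";
}
```

Because align and qc only wait for basecall, LSF is free to run them side by side on whichever small slots come free, which is where the run-time gain comes from.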

No time to stop yet, though: we have lots more features to add. But since we have successfully moved to this more agile structure, I don't expect them to take long to deploy.

Here's to agile structures and development.

Saturday, 3 October 2009

Role reversal

So, in our further investigations into using Moose, I have started to look at using Roles to produce reusable code instead of base classes.

The Moose::Manual pages on CPAN explain how to set them up, but what we initially found confusing was how much a role should do. So our first investigations just led us to write ordinary Moose objects to perform the functions, rather than importing them into the consuming objects.

Moving further on, we found ourselves repeating code, because we didn't want to import objects that mostly do other things. So what is the solution?

I'm now taking another look at roles to see whether they might be the solution, but with one key difference: keep each role doing as specific a thing as possible.

An example of this is that we often need an object to have methods exposing a run id, a short run name and a run_folder. We do have an object that can provide these, along with other functions, but it is not really reusable in some situations. So I have just written a very small role, run::short_info, which can be consumed to import just these methods.

The result: a small amount of (what should be) very reusable code across most, if not all, of our frameworks.
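A minimal sketch of what such a small role might look like, assuming Moose is installed. The package names, method names and the run folder format (091003_IL12_3456) are all hypothetical; the real run::short_info may work quite differently.

```perl
use strict;
use warnings;

package Hypothetical::Role::Run::ShortInfo;   # loosely modelled on run::short_info
use Moose::Role;

# The consumer only has to supply the run folder name; the rest is derived.
has q{run_folder} => (isa => q{Str}, is => q{ro}, required => 1);
has q{id_run} => (isa => q{Int}, is => q{ro}, lazy => 1, builder => q{_build_id_run});
has q{short_run_name} =>
    (isa => q{Str}, is => q{ro}, lazy => 1, builder => q{_build_short_run_name});

# Hypothetical folder format: date_instrument_runid, e.g. 091003_IL12_3456.
sub _build_id_run {
    my ($self) = @_;
    my ($id) = $self->run_folder =~ m/_(\d+)\z/xms
        or die q{Cannot parse a run id from } . $self->run_folder . qq{\n};
    return $id;
}

# Hypothetical short name: instrument plus run id, e.g. IL12_3456.
sub _build_short_run_name {
    my ($self) = @_;
    my (undef, $instrument, $id) = split m/_/xms, $self->run_folder;
    return join q{_}, $instrument, $id;
}

package Hypothetical::Analysis::Step;   # any class that needs just these methods
use Moose;
with q{Hypothetical::Role::Run::ShortInfo};
__PACKAGE__->meta->make_immutable;

package main;

my $step = Hypothetical::Analysis::Step->new(run_folder => q{091003_IL12_3456});
printf "%s -> id %d, short name %s\n",
    $step->run_folder, $step->id_run, $step->short_run_name;
```

The consuming class gains the three methods and nothing else, which is the point of keeping the role this narrow.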

The next one I am writing is run::path_info, which will hopefully provide just the next level of features that many, but not all, consumers need.

This feels very much like the right way to go now, and I hope that we will end up with a good amount of reusable code, which will increase flexibility and keep the code low-maintenance (well, let's keep our fingers crossed).

Now I just have to hope that someone fixes the requires attribute/method issue in Moose::Role soon.
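The post doesn't spell out the problem, so this assumes it is the ordering issue documented in Moose::Manual::Roles: an attribute accessor only satisfies a role's requires if the consuming class declares the attribute with has before calling with. A minimal sketch of the workaround, with hypothetical package and method names and a hypothetical directory layout:

```perl
use strict;
use warnings;

package Hypothetical::Role::Run::PathInfo;   # loosely modelled on run::path_info
use Moose::Role;

# Whatever consumes this role must provide runfolder_path.
requires q{runfolder_path};

sub recalibrated_path {
    my ($self) = @_;
    return $self->runfolder_path . q{/Recalibrated};   # hypothetical layout
}

package Hypothetical::GoodConsumer;
use Moose;

# has() runs at runtime, so the accessor must exist before with() checks
# the role's requires list: declare the attribute first.
has q{runfolder_path} => (isa => q{Str}, is => q{ro}, required => 1);
with q{Hypothetical::Role::Run::PathInfo};
__PACKAGE__->meta->make_immutable;

package main;

my $good = Hypothetical::GoodConsumer->new(runfolder_path => q{/hypothetical/091003_IL12_3456});
print $good->recalibrated_path, "\n";

# The opposite order fails when the role is applied. A string eval is used
# here only so that the failure can be caught and shown.
my $wrong_order_ok = eval q{
    package Hypothetical::BadConsumer;
    use Moose;
    with q{Hypothetical::Role::Run::PathInfo};   # too early: no accessor yet
    has q{runfolder_path} => (isa => q{Str}, is => q{ro});
    1;
};
my $wrong_order_error = $@;
print $wrong_order_ok ? "wrong order worked\n" : "wrong order failed, as expected\n";
```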