Tuesday 29 December 2009

Pragmatic Thinking and Learning

I have just finished reading Pragmatic Thinking and Learning - Refactor your Wetware by Andy Hunt. This is a very interesting book, which is well worth a read by anyone who is interested in:

a) Different ways of learning new skills
b) The way the mind works
c) The whole left side/right side debate
d) A change from an ordinary software development textbook

As I was reading it, a few things came to light that I did 'right' and 'wrong' last year.

Learning Moose and Erlang.

Last year I learnt Moose, and didn't learn Erlang. I set out to do both, but didn't manage it. Why? I am capable. I managed to learn to use OO Perl, Ruby and Rails in 3 months, so why not Moose and Erlang?

Simple, I had no opportunity to play with Erlang, and I had no defined targets (I want to write in Erlang using concurrency by ).

However, with Moose, it was completely different. I wanted to write a pluggable pipeline system using a modern OO Perl system which others would be able to use easily. I had time to play as I learnt the basics, and then started building the structure of the framework as I learnt more.

Allowing time and scribbling.

I started to turn away from my screen and think a bit more. I also scribbled more. I created more scrap paper this year than the previous year. I roughly mapped out where the code went. This worked. Thinking back to when I set up the QC application and database, I just got on and did it. That took a lot longer to achieve than it should have (I got a stern warning from my boss's boss about that!) whereas the pipeline was in theory tougher, but took less than a quarter of the time in the end to get to its initial start.

Both of these positives are described in the book as using R-mode to lead L-mode. Since I saw some of this in action last year, I am going to try to use it to guide me forward next year. I really need to learn some C. If there is likely to be space, I would still like to learn Erlang. However, the key is to be able to set a reason for doing so. I will.

So, first off C. Erlang isn't required for my job at this time, so I'll leave it for now.

My target for C: finish going through the C book I have, and then take a C program that my pluggable pipeline sets off, try to work out how it all works, and see if I can refactor it. In the process of this, I should also try to write a small program to sort a list entered by the user. This should probably be my first mini-target once I have completed the book.
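As a rough idea of what that first mini-target might look like, here is a minimal sketch. It assumes the list is whitespace-separated integers typed on one line. The function names (compare_ints, parse_ints, sort_user_list) are my own inventions for illustration, not anything from the book or the pipeline.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Comparison callback for qsort: orders two ints ascending. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow that x - y can cause */
}

/*
 * Parse whitespace-separated integers from `line` into `out` (at most `max`),
 * stopping at the first token that is not a number or does not fit in an int.
 * Returns how many values were stored.
 */
size_t parse_ints(const char *line, int *out, size_t max)
{
    size_t count = 0;

    assert(line != NULL && out != NULL);
    while (count < max) {
        char *end;
        long value;

        errno = 0;
        value = strtol(line, &end, 10);
        if (end == line) {
            break;              /* nothing numeric left to read */
        }
        if (errno == ERANGE || value < INT_MIN || value > INT_MAX) {
            break;              /* out of range for an int */
        }
        out[count++] = (int)value;
        line = end;             /* carry on after the number just read */
    }
    return count;
}

/* Parse one line of user input and sort the numbers found in it. */
size_t sort_user_list(const char *line, int *out, size_t max)
{
    size_t count = parse_ints(line, out, max);

    qsort(out, count, sizeof out[0], compare_ints);
    return count;
}
```

To turn this into a complete program, a main function could read one line from stdin with fgets, pass it to sort_user_list, and print the sorted values with printf. I have used the standard library's qsort here rather than writing a sort by hand, which would be the more instructive exercise once the book is finished.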

There are lots of other things in this book to follow. I think I probably need to reread it. I'll not write more about it here now (I have a 4 year old desperate to assemble a Lego Star Wars toy with me) but I shall try to keep my blogging more up to date, and note when something from this book has helped guide me.

Friday 18 December 2009

Another year passes...

So, another work year has passed. With my 10th anniversary at The Sanger Institute, 2 years officially as a software developer, and a change of team leader, it's been a pretty full year.

So, what have I achieved?

Well, first off, I decided to split up a project. This helped us a lot, as we have been able to be more agile in changes and bug fixes when dealing with parts of our pipelines and code.

Following the change of leader in our team, we started to spend some time looking at new technologies. I started to take a look at message queues. See Stomping on the Rabbit and Pt2.
This was my first time actually, properly failing at something. It felt good. I got lots of help from monadic and their team down in London relating to RabbitMQ, and we tried ActiveMQ as well. But the reason we failed it: after attempting a comprehensive test suite to cover cases where servers go down, we decided that (at this time) it wasn't stable enough for passing around the code as reliably as we hoped. I spent about 3 weeks looking at this, but we believe that was time well spent. (Another team started to employ it directly, but continually had problems with their poller of a queue.)

Around the same time I also took a look at Erlang, but since there didn't seem to be much opportunity to apply it at work, I stopped for the time being.

I then deployed the badminton ladder I was working on for the Sports and Social Club. See Playing Badminton through Clearpress. This has gone down well, although I do need to make some improvements to the interface.

We had been deliberating about making the analysis pipeline pluggable, so I took this task on. I also used it as a chance to take a good look at Moose as an O-O framework. I have pretty much spent the rest of the year doing this.

This has had me really excited, and is probably the thing which has kept me most interested in my job for the last 6 months. It also proved that I didn't waste my time splitting up the project earlier in the year.

After much discussion of whether to relaunch scripts, or load everything to LSF in one go, we went for something that does as much as it can, then submits jobs sequentially to LSF, but with code being as decoupled as possible. You can see the style we took in the PowerPoint presentation Pluggable Pipelines. It has the power of Moose behind it, utilising Roles, and I even developed my own CPAN extensions, MooseX::File_or_DB::Storage and MooseX::AttributeCloner.

With this pluggable pipeline setup, of course it is now multiple times easier to add new stuff, particularly QC testing. So, we need a way to display the results. The new boss is keen on Catalyst, so I decided now was the time to look at this. Whilst this project has moved on to another team member, I gave it a bit of a start, initially to see how it would deal with getting stuff from a file server rather than a database (see 'Backending Catalyst'). A couple of book sources later and I had a working app up. In fact, really, the biggest challenge was trying to apply some CSS that I stole from two Rails apps.

I passed my 10 year anniversary without a hitch, it seems. It is funny to find that I have worked somewhere for 10 years. Although saying that, I have done a number of different roles here, from washing glassware to sequencing DNA, to instituting the widespread use of Transposon Insertion Libraries in finishing, to Quality Control, to Software Development.

And then the slope to Christmas. The pipeline system's aim to be flexible has now been pushed to the limit, with virtually unreported changes to the 3rd party pipeline which it has had to keep up with. Whilst the code unavoidably has to change, we have spent most of our time discussing the planned changes to cope, and testing them, rather than coding.

I've also had to come to the decision that I think Class::Std is well and truly dead. RIP. You helped me become a better O-O programmer, and you will be remembered with fondness, but Moose has just dealt you too many blows.

Overall, it has been quite a productive agile year. I'm taking a look at the C language and trying to lobby to get a course run at work on Moose and Catalyst. Hopefully, these will be big things for me next year. I look forward to it.

Merry Christmas, and a Happy New Year to all.

Saturday 12 December 2009

Speedy does it.

To extend the flexibility of the pipeline, I developed MooseX::AttributeCloner (see CPAN) to pass around variables set on the original command line.

However, this has led to the need to redevelop some further loading scripts that had been originally written with Class::Std and Class::Accessor. Switching them to Moose has had a two-fold effect that I have been rather happy with.

1) Less code. I had written Roles for the pipeline (but made sure I left them as potentially 'common code'), and consuming these ditched a substantial amount of code. Woohoo!

2) Faster. The loading of the data runs exceptionally fast now. The dropping of the need to generate extra objects (since code was refactored into consumed roles) has increased the speed of data lookups. I haven't truly benchmarked the loading of the data, but something that was taking a couple of minutes to run through completely, now takes mere seconds. In fact, the logs don't show any change in the timestamp used in each of the print statements.

The longer of the two refactored scripts had a reduction of nearly half the code. The shorter, about 45%. The slow parts of the long script (conversion to XML and database loading) still need some refactoring of how it loads (dropping the XML mid-stage), but it seems that some improvements have certainly been achieved with the switch to Moose.

Another factor which has helped to speed this up is that, if the pipeline has already told the scripts lots of information (such as file system paths, filenames, etc.), then the code doesn't need to work it out again. This is a definite advantage of the combination of utilising MooseX::Getopt and MooseX::AttributeCloner.

Saturday 5 December 2009

Always change the branch!

Again, this is one of those blog posts which shows some numptiness by me.

Why did some things break, when they were working yesterday?

We have a lot of file passing to do, and unfortunately, a third party analysis pipeline keeps changing file locations and filenames, so we have to remain agile to cope with the third party changes.

So, we discover this happening and quickly fix our pipeline to cope with a filename change. Deploy and go, everything is working as expected.

A new release of the pipeline is merged from the development branch into trunk, and then deployed. But suddenly, the files aren't being found as expected. What has happened?

Quite simply, I didn't make a corresponding change in the development branch for the original bugfix. 4 hours of developer time were spent trying to track this down, all because I broke my own rule.

Always change the branch/backport trunk to ensure that the bug is fixed everywhere.

Sometimes, just trying to be too agile (sorry, read: trying to fix a bug fast) just causes the bug to get reimplemented further down the line. Take your time, Brown! A little caution and doing it right will make it better.