Thursday, 18 December 2008

Running Alone

John in my team at work has just pointed me at a great little CPAN module for ensuring that a script can only have one copy running at a time.

In our team, we have a lot of potentially slow running processes, which are often run as cron jobs. Ordinarily, you would pick cron start times
that leave enough time for any previous run of the script to finish. But it is quite common for us to want the next run to start as soon as possible, just not
while the previous run is still going.

One method previously suggested was to touch a file in /tmp and then delete it when finished, wrapping the cron caller in a command that does not run
if the touched file is present. However, what happens if /tmp is cleared out while the script is still running (which has happened to me!)?

So, John found Sys::RunAlone. If your script is structured to run a main method, and you put __END__ or __DATA__ at the bottom of it,
then the module locks that section when the script starts, and a second copy exits if it finds the section already locked. This means you could have your
cron job running every 30 minutes and not worry whether the script takes 5 minutes in a cycle or 45 minutes.

A definite benefit.

A big thanks to Elizabeth Mattijsen for writing this very useful module.

(A quick example of this running is below. If you try to run the same script in another terminal before the first has finished its sleep, you will get
the expected 'already running' error.)

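#!/usr/bin/perl -wT
use strict;
use warnings;
use Sys::RunAlone;    # locks the __END__ section below; a second copy exits

main();

sub main {
    warn 'Hello ...';
    go_to_sleep(30);    # 30 seconds is an arbitrary choice for the demo
    warn 'world!';
    return;
}

sub go_to_sleep {
    my $sleepytime = shift;
    sleep $sleepytime;
    return;
}

__END__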
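Sys::RunAlone is a Perl module, but the mechanism behind this kind of guard is easy to sketch in Ruby too, which also crops up on this blog: take a non-blocking exclusive lock on a file tied to the script, and give up if another process already holds it. This is a minimal sketch, assuming a Unix-like system where Ruby's File#flock maps onto flock(2); run_alone is a made-up helper, not part of Sys::RunAlone or any gem. Because the lock belongs to the open file handle rather than to a marker file in /tmp, it goes away when the process exits, and there is nothing for a /tmp clear-out to delete.

```ruby
# Hypothetical helper: run the given block only if no other process holds
# an exclusive flock on lock_path (by default, this script file itself).
def run_alone(lock_path = __FILE__)
  File.open(lock_path, "r") do |handle|
    # LOCK_NB makes flock return false at once instead of waiting for the lock.
    unless handle.flock(File::LOCK_EX | File::LOCK_NB)
      warn "#{File.basename(lock_path)} is already running, exiting"
      return false
    end
    yield
    true
  end # leaving File.open's block closes the handle, which releases the lock
end
```

Called at the bottom of a cron script as something like exit(1) unless run_alone { main }, where main is your script's own entry point, a second copy started while the first is still running prints a warning and exits instead of doing the work twice.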



Data Formats

Slides: Data Formats (on SlideShare; tags: csv, tsv)

I gave this presentation to foomongers to start off a discussion on different data transfer formats.

Saturday, 18 October 2008

Disk is cheap?

I work at the Wellcome Trust Sanger Institute, where we do a lot of processing of raw sequence data, particularly from the Next Gen sequencing machines. In any one month we can have 320 TB of data sitting around on our sequencing farm. The run data then gets tarballed up and moved off into repositories.

Currently, Gouying and I have been working on a MySQL database to store the QC data from the runs, so that we can get rid of the tarballs (each around 60 GB) and allow easier access to the data. Note that this is not the DNA sequence itself, which is stored elsewhere. The expected annual insert into this is likely to be around 50 TB at current Illumina machine levels (and we are planning to get more machines!).

Roger and others have been working on storing the images for the runs longer term. This has meant moving them into an Oracle database, which means that we should be able to keep them for perhaps a year, rather than around 3 weeks. That is around 100 (tiles) * 36 (cycles) * 2 (run pair) * 8 (lanes on a chip) = 57,600 TIFF images per run.

Speaking with Ed at work, we discussed what was mentioned at our last Town Meeting. Ed used to work for IBM, and he talked me through a bit of how processors have developed and why they are now likely to be at the limit of their capability, hence the move to multi-core machines. He therefore raised a good question at the Town Meeting: is it cheaper to store the data from the runs long term, or just rerun the sequencing?

At the moment, it costs 2% of the cost of a run to keep the data on disk, so that answers the question. Or does it?

Torvus Linalds has been quoted as saying that disk is cheap, and yes, upgrading your desktop/laptop to 4x the memory is pretty cheap. However, you still need to ensure you have space for it, and one thing is certainly the case: space isn't cheap. All those disks have a physical footprint (and we could very soon run out of space in the data centre).

They also have to be managed, and people aren't cheap. We have a very good systems team who are on 24-hour call-out, but that costs money.

So, it is very much a trade-off. Resequencing is very expensive at the moment, and storing data is cheap, but if next-gen sequencing costs come down, then it may well make more sense just to store a plate of DNA and resequence it every time you want data from it, rather than storing 4 TB+ of data long term.

This may be a long way off, but if the $1000 genome becomes a reality, then I think that it may change. Is this a good thing? We shall see.

Wednesday, 8 October 2008

Ajax - Web 2.0 Primer course.

So last week I went on a four-day course to find out more about using Ajax to make Web 2.0 sites. The course was actually two four-day courses squeezed into one four-day course.

For me, the first course was pretty simple; it was supposedly the prerequisite course for the Ajax primer. We covered the JavaScript basics, but in order to move on, the course instructor assumed knowledge of HTML and CSS. For me this was OK, but some people didn't seem too comfortable with using CSS selectors instead of inline markup. However, this did 'get' us through the course in a day.

We then jumped to the Ajax course. As I have blogged before, earlier this year I bought Pragmatic Ajax: A Web 2.0 Primer. Most of the second course was actually covered in this book (although I did learn more about some best practices). We went from making a cross-browser compatible XHR (XMLHttpRequest) object, to starting our own library, and then we did some examples using jQuery, Prototype and Scriptaculous.

The course was from Learning Tree, and the instructor was very knowledgeable, showing us some other books that we might want to read, primarily Douglas Crockford's JavaScript: The Good Parts (O'Reilly) and John Resig's Pro JavaScript Techniques (Apress).

The thing that interested me most was probably the thing that had the least time spent on it. For my toy app, YADB, I wanted to be able to drag and drop cards from either a card list or an inventory list into a deck container. We were shown how to do this using Scriptaculous (quite fortunate, considering YADB is written in Ruby on Rails).

Work-wise, it was good to be shown exactly how to black-box functions, and how to pass JSON and objects around.

Two disappointing features of the course: the example code snippets did not follow what 'Best Practices Guy' was saying, sometimes even on the same slide, and some of the computer exercises didn't match the code in the book.

Also, through no fault of the instructor, somewhere along the line nobody had explained to Learning Tree that most people on my site primarily use Perl,
so some of us were a bit stuck when it came to modifying the server-side code, as we had little or no experience of the three languages they had chosen (JSP, PHP and .NET).

Overall, a good course, that could have been better, but worth going on.

I must contact Learning Tree about doing the two module tests for them, though.

An aside to this: why is everything PHP? I had a couple of email links come through about web app jobs coming up, and every one of them wants PHP. I could be a bit biased, but certainly a lot of people I know think that PHP is probably one of the worst technologies to come along for a while. One thing that has been blogged on Perl Buzz is that you can't file a bug report against anything older than the current release, which seems a bit pointless to me. However, I am not going to go into it now; I think I just clearly need to learn it. Better hit the bookshelves!

Sunday, 3 August 2008


So on Friday I attended the second BarCamb, at the Wellcome Trust Genome Campus. For those not in the know, a barcamp is an unconference: people turn up with stuff to talk about, and a plan for the day is organised over coffee and biscuits at the start.

It was a very good, interesting day. Many of the faces we saw last year came again, and gave either updates or new talks, and we had some new people as well, including a chap who was using the internet to help the Neighbourhood Watch in his village.

Simon Ford brought MBED with him again, showing some more of the exciting stuff he had been doing with it (including something at a 24hr hackathon, which uses a social idea to move packages to their destination).

Unfortunately, I can't remember many people's names, but it was great to talk with so many of you, and I was rather surprised that my quick, unplanned 10-minute talk on 'It's Too Much Information for ME!' generated a lot of little discussions with me. It's nice to see that either other people have had the same problem, or that people realise you end up needing to program around other people's lack of foresight, or their rushing of development code into production. My talk followed quite nicely from Nava Whiteford's talk on the Swift Analysis pipeline, and Matthew Astley's ad-hoc talk/discussion on Panic Driven Development.

I have some photos which I have uploaded to Facebook. Please feel free to have a look and tag yourself (or anyone else you know) in any of them.



Friday, 18 July 2008

Perl::Critic - new release

So a new version of Perl::Critic has been released, and all I want to say is: what a faff.

Some key new features:

1) You must check the return value of every eval statement; don't rely on $EVAL_ERROR/$@

Now, this is a good thing(tm) but it does have some pitfalls, such as where you might be evalling a transactional commit.

eval {
    $transaction_state and $dbh->commit();
} or do {
    # ...some 'croak' code
};

The problem here is that the eval doesn't die if $transaction_state is 0 (as it could be inside a much larger transaction), but the eval block then returns 0, a false value, which fires off your 'croak' code anyway.

So, to get round this, you need to return a true value after the statement:

eval {
    $transaction_state and $dbh->commit();
    1;    # always return true, so 'or do' only fires if something died
} or do {
    # ...some 'croak' code
};

A bit of a faff, but it is better code for it.

2) Named constants for all numbers other than 0, 1 and 2

So, as it could be difficult to understand what a number represents when reading the code, you now need to use Readonly to declare any such number as a named variable at the top of your code.

So now instead of

$percentage = $x * 100/$y;

You need

use Readonly;
Readonly our $PERCENTAGE_MAKER => 100;
$percentage = $x * $PERCENTAGE_MAKER/$y;

This might make sense for odd numbers floating around, but it also applies to array indices. So if you want the 5th element of an array, instead of requesting

$wanted = $array[4];

You now need

use Readonly;
Readonly our $FIFTH_ARRAY_ELEMENT => 4;
$wanted = $array[$FIFTH_ARRAY_ELEMENT];

Now, I admit it is often bad to pick out many individual elements from an array by specific index, but this will seriously clutter code where it may be necessary. For a couple of lines in my code, I have now used ## no critic at the end.

I also wonder why specifically 0,1 and 2 get let off. If the problem is that you don't know if 60 means

i) minutes in an hour
ii) seconds in a minute
iii) degrees in the angle of an equilateral triangle

then 0 could mean off, false, nothing, 0
1 could mean on, true, positive, 1
2 could mean wheels on a bicycle, eyes on a face, hands, 2

These are just some examples. I imagine 0, 1 and 2 get let off because they are so heavily used for certain idioms that it would be almost impossible to change them (especially 1, as the return value of a Perl module).

3) Declaring a variable, which you never (appear to) use

This is annoying if, like me, you use Class::Std to create inside-out objects, as you need to declare a hash for your attributes, but this hash is never referred to again in the code.

Now, I understand the principle that if you don't use a variable, you shouldn't declare it, but in this case you are using it, just via the accessor methods Class::Std creates. However, much like the rule on numbers, the policy isn't looking at the context in which you use the declared variable. Again, in this case I have had to wrap these in a ## no critic {} ## use critic in order to stop it failing.


So, as I said at the beginning, all I want to say is: what a faff. The reason is as follows:

We are using Perl::Critic extensively here in New Pipeline Development, and it does force our hand towards good coding practice, but I can't help but wonder if a couple of these new standards are just a little too overzealous, causing some things to be over-criticised (i.e. a faff).

As with many things, it will take time to get used to programming in advance of critic'ing the code, but I think some of these new features need a little tweaking.

Friday, 20 June 2008

Class::Std or Blessed Hash

Objects, Objects, Objects

Everything is objects these days, well, certainly in the world of agile, well-structured, extensible, easy-to-maintain bioinformatics software.

Even Perl 6 is aiming to be OO, probably because so many of the modules on CPAN at least expose an OO layer, if they are not purely OO.

When I started programming in PERL, I was writing straightforward top-to-bottom scripts.

I then moved on to using and producing code in modules, but just exporting the subroutines into the script that used it, for simple code reuse.

Last summer, I was finally taught, through hands-on development, exactly how OO works and is used. I got a bit confused, but at least I had none of the confusion of

$him = Person->new({args});
$her = $him->new({args});

Which implies a relationship which 'is not there'.

Last summer I also discovered Class::Std, which I think is probably my favourite CPAN module of all time. Why?

Well, this is the thing. PERL is not an OO language, and it isn't slower because of it. I also learnt Ruby on Rails (as I mentioned in a previous post), and Ruby is slower because everything is an object. That clearly sets the two languages apart.

Now, that isn't the thing that bugs me about OO. In fact, I have learned to embrace PERL OO, and enjoy programming in it. But what does bug me is that the vast majority of PERL OO breaks encapsulation, because most objects are just hashes. You have a new constructor, which blesses a hash reference into the package. So, when all is said and done, whilst good packages have accessor methods written to expose the data stored within the object, you can just access a lot of it via a key.

$him->eye_colour() is equivalent to $him->{eye_colour}

and this encourages lazy programming, because the other advantage is that you can just say 'I need to store some data, what should I do with it, as Person doesn't have an address accessor'

Now, presumably Person does have something that links it to Address. Perhaps Address and Person both have an id_person accessor. But you can cheat. If you want to grab address now, and cache it for later, just do

$person->{address} = $address->house_and_street();

Then you can drop the address object, and the person now knows exactly where they live.

However, this is dangerous, because

1) Have you overwritten something already stored under the key address?
2) What if they move whilst the person object is still in memory? You now have two places to correct the data.

Why, I hear you cry - I won't do that with my program. No, but someone else will (or you will forget).

Solution: use Class::Std;

Class::Std enforces encapsulation. You still get a blessed object, but this time it is a blessed SCALAR reference, which can't have keys.

You then declare in each package what accessors you want the object to have, and so force people to use only those accessors. You don't have to worry about AUTOLOAD somewhere in the history of used modules, as Class::Std handles creating your accessors. You don't even need a new constructor, although you can add a BUILD method which will run at construction.

So in my example

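package Person;
use Class::Std;
{
    # init_arg lets new() set it; :get/:set generate get_eye_colour/set_eye_colour
    my %eye_colour_of :ATTR( :init_arg<eye_colour> :get<eye_colour> :set<eye_colour> );
}

package main;
my $him = Person->new({ eye_colour => 'blue' });
print $him->get_eye_colour(), "\n";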

Job done. Less code for initial construction than blessing via new, and you cannot be tempted to throw the address onto the person when it is being used, as

print $him;    # Person=SCALAR(0x9f2c68)

So, unless you specify in the code (documented and tested, of course) that you want an accessor which allows this object to store the address, it can't be done, and your later code is more robust for it.
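Ruby objects don't have the hash-key back door in the first place, but the inside-out trick that Class::Std relies on can still be sketched in Ruby to show the mechanism: the object itself holds no data, and each attribute lives in a hash owned by the class, keyed by the object. InsideOutPerson and its accessors are made-up names, and this illustrates the idea rather than porting Class::Std.

```ruby
# Hypothetical sketch of the inside-out idea: the object carries no data of
# its own; attribute values live in a class-owned hash keyed by the object.
class InsideOutPerson
  # compare_by_identity keys on the object itself, not on #hash/#eql?.
  # NOTE: nothing here ever removes entries, so they live as long as the
  # program does; a real implementation needs cleanup when objects die.
  EYE_COLOUR_OF = {}.compare_by_identity

  def initialize(eye_colour)
    EYE_COLOUR_OF[self] = eye_colour
  end

  # The only public way in or out is through these declared accessors.
  def eye_colour
    EYE_COLOUR_OF[self]
  end

  def eye_colour=(colour)
    EYE_COLOUR_OF[self] = colour
  end
end
```

The price of the technique is bookkeeping: something has to remove an object's entries when the object goes away, which this sketch deliberately leaves out.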

Now, where am I going with all of this?

Well, I use Clearpress to form a base for my PERL web apps in my current role. It is a good, solid platform which I have mentioned before, and I am very happy to work within it. However, I am writing an API to use the services it provides. Clearpress doesn't use Class::Std. My API does. This is no problem, as they talk via LWP::UserAgent requests, but it is quite confusing as they live in the same project in Subversion. And my big problem is that I am programming both at the same time. This is bad news, as I have been trying to use features of one type of object with the other. It hasn't really made a significant difference, as the package name reminds me which I should be using, but it is weird getting the error when you try to cheat and use a key to cache some info in the Class::Std object, as it is a scalar.

So, from this, I am going to finish the project in the way I have started it, but I think from now on there is one golden rule:

Use only one type of object, and just ensure you enforce encapsulation by the way you program - don't get lazy.

Now, to convince my boss to refactor Clearpress into Class::Std...

Wednesday, 7 May 2008

To scroll within or out

So an interesting question arose out of my work recently, both in NPG and in a personal project: how to scroll tables.

In NPG, we have many tables of data, from run information, to search results, to instrument information. These are generated from templates and render to the browser.

In V:YaDB, I have the same issue, as well over 2000 cards is quite a lot to scroll through, but again it is templated.

Following good practice and web standards, we are all using

<table>
<thead>...</thead>
<tfoot>...</tfoot>
<tbody>...</tbody>
</table>

or at least we should be. Most modern browsers insert <tbody> anyway, but putting it in gives you some additional elements to target with CSS, and ensures you and future developers of your project know what is going where (and for anyone not familiar with this, <tfoot> should come before <tbody>, although a footer is optional).

Anyway, with this in mind, it should be easy to just put the following CSS in to make the body part of your table scroll, leaving the head fixed to keep the column headers visible

tbody {height:300px;overflow:auto;overflow-x:hidden;}

However, this only works in Firefox. What is going on there? I spent some time this weekend on IE7 and Safari and can confirm that it works in neither.

So, back to square one. Searching Google found a lot of real hacks, from serious amounts of JavaScript to running different CSS files dependent on browser (including separate versions for IE6 and IE7). Madness.

The idea I came across that I liked the most, though, was rendering the table twice. This means it is only really suitable for tables that are quick to render, but it is possible.

Now, there are two options here.

1) Using declared fixed-width columns, produce one table which only has the thead part of your table. Then, immediately beneath it, produce the table with the data in. This option copes better with bigger tables, because the data only has to render once, but you have no flexibility should you need to add a new column of data.

2) In a fixed-height div, render two copies of the table, absolutely positioned over each other, and wrap each in its own div with an id. Use z-index:1; for the one you want to scroll and z-index:2; for the head, fix the height of the head div so that only the head row is shown, and set overflow:hidden; on it. Set the height of the scrolling table's div to the height of the outer div, and set overflow:auto; overflow-x:hidden;

(With option 2 you could, obviously, also fix the width, and for both options show the overflow-x.)

Also, with both, you need to declare a spacer column, which will ensure room for your scrollbar.

I personally prefer option 2. I managed to create some CSS that produces a nice effect: because there are two tables, I could style the header without worrying about it affecting the rest of the data, and I revealed only the bottom border to give a ruled effect.

The downside to option 2 is that anyone with CSS turned off will see two copies of your table, but hey, no-one should be turning off CSS or JavaScript in their browser, and if you know someone who does, 'send the boyz round to ave a wurd'.

I think that I am going to expand scrumptious to include JavaScript and CSS effects, and this will be the first CSS effect in it. I'll let you know when the SourceForge SVN trunk is updated.

In the meantime, start lobbying your local MP today to get the simplest option supported in your favourite browser, or, if your favourite is already Firefox/Iceweasel, then at least in IE and Safari.

Please note: I have nothing against Opera, Camino or any other browsers out there, I just don't use them on a regular basis.

Wednesday, 30 April 2008

History Meme

So I got tagged to do this by my boss, Roger, and a lot of my work colleagues are doing it. So here is the result from my MacBook:

history|awk '{print $2}'|sort|uniq -c|sort -rn|head

167 prove
62 cd
55 svn
45 cover
42 ls
32 rake
28 mate
11 make
8 ./bin/apachectl
7 sudo
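For anyone who finds the pipeline cryptic, here is the same counting written out in Ruby: take the second field of each history line, tally it, sort by count and keep the top ten. A minimal sketch; top_commands is a made-up name, and it assumes bash's history output format of a line number followed by the command.

```ruby
# Hypothetical re-implementation of:
#   history | awk '{print $2}' | sort | uniq -c | sort -rn | head
# Returns [[command, count], ...] for the `limit` most frequent commands.
def top_commands(lines, limit = 10)
  counts = Hash.new(0)
  lines.each do |line|
    # bash's history prints "  NUM  command args..."; field 2 is the command.
    command = line.split[1]
    counts[command] += 1 if command
  end
  # Highest count first; ties broken alphabetically so the output is stable.
  counts.sort_by { |cmd, count| [-count, cmd] }.first(limit)
end
```

Saved as top_commands.rb, it can be fed the same input as the pipeline, e.g. history | ruby -r ./top_commands -e 'top_commands($stdin.each_line).each { |c, n| printf("%7d %s\n", n, c) }'. Ties may come out in a different order from sort -rn.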

So I am testing quite a lot on my machine (prove and cover), because I am always aiming for test-driven development. cd is obvious. svn is vitally important.

I am surprised rake turns up more than make, though.

More service soon.


Monday, 7 April 2008

Javascript for all

So on Friday, as a gentle way of trying to get back into work mode after the Rails course had finished, I started
by trying to refactor out a lot of javascript from the templates.

I have just bought 'Pragmatic Ajax (A Web 2.0 Primer)' from The Pragmatic Programmers. It is a very interesting read, and
inspired me to 'get the code out of the view'

It's fair to say that we have been quite lax, simply putting <script> tags in with fairly specialised JavaScript
functions, which don't really need any variables passed to them (as the function ends up with the paths and div ids hard-coded).

Well, I managed to refactor out most of the functions that we had written and, by passing a few more variables to them, reduce the number of functions (or make them more generic for future use). I even discovered a slight problem with my scrumptious.js which I need to tweak and document.

The great thing is that we have now reduced the code in the views. This makes the views easier to read and keep up to date.

There are still a few functions which I should be able to refactor, but I need to find out a couple of extra things first.

I've only got through the first 3 chapters of 'Pragmatic Ajax' so far, but it has already explained things I hadn't picked up from learning some Ajax through Ruby on Rails. Chapter 1 explains what Ajax is, Chapter 2 shows you how to develop Ajaxian Maps (a Google Maps clone), and then it starts to go into the nitty-gritty details of Ajax and client-side JavaScript.

However, so far the JavaScript examples have all been written in the HTML head, rather than in a separate .js file. I imagine (hope) that this will change with a best-practice suggestion. I'm also hoping it will show a bit on testing JavaScript, which is something that I haven't done so far.

My experience of programming books has led me to find that the Pragmatic Programmers books are a great way of finding the information in an easy-to-read style. So far, Pragmatic Ajax is a good book and hasn't let me down in its style or (most importantly) content.

Thursday, 3 April 2008

Advancing with Rails course - Day 4 pt 3

So, the final bit of the course has been looking at integration testing.

objective is to go through the processes to a conclusion, i.e.

login >>
attempt to bid on an auction you started >>

login >>
bid on auction as highest bidder >>

login >>
bid on auction >>

and so on

integration tests can cross controllers, which is why they are the next level up from functional tests. They may contain more than one or two asserts, as
the steps are linked, and this is useful to ensure a test doesn't bother trying steps it can't even get to

often a very good place to refactor heavily

integration testing routes

def test_show_route
  # auction_path(1) is the named route helper; it produces "/auctions/1"
  assert_recognizes({ :controller => "auctions",
                      :action     => "show",
                      :id         => "1" }, auction_path(1))
end

def test_generate_route
  assert_generates("/auctions/3", :controller => "auctions",
                                  :action     => "show",
                                  :id         => "3")
end

also can assert_routing, which checks both recognition and generation in one go

you can get back responses to look at when RJS is rendered, using app in script/console

irb>> app.get("/auctions/destroy/1")
>> 200
irb>> app.response.body
(output of rjs file)

rcov tool

coverage of code with tests

autotest (part of ZenTest)

runs in another terminal window, and every time you make and save a change to a file, it works out which tests are affected and reruns those tests.

Advancing with Rails course - Day 4 pt 2


How to install and generate a plugin

guest095:yadb ajb$ script/plugin install
+ ./ChangeLog
+ ./lib/annotate_models.rb
+ ./tasks/annotate_models_tasks.rake
guest095:yadb ajb$ ls vendor/plugins/
annotate_models/ nested_has_many_through/
guest095:yadb ajb$ ls vendor/plugins/annotate_models/
ChangeLog README lib/ tasks/
guest095:yadb ajb$ ls vendor/plugins/annotate_models/tasks/annotate_models_tasks.rake
guest095:yadb ajb$

adds schema info to the top of your model files

guest095:yadb ajb$ rake annotate_models
(in /Users/ajb/dev/vtes/yadb)
Annotating Card
Annotating CardDiscipline
Annotating Clan
Unable to annotate Clan: Could not find table 'clans'
Annotating CostType
Annotating Deck
Annotating Discipline
Annotating Minion


# == Schema Information
# Schema version: 6
# Table name: cards
# id :integer not null, primary key
# name :string(255)
# text :string(255)
# requirements :string(255)
# cost :integer
# cost_type_id :integer
# minion_id :integer
# deck_id :integer
# discipline_id :integer

class Card < ActiveRecord::Base
  belongs_to :deck
  belongs_to :cost_type
  belongs_to :minion
  has_many :card_disciplines
  has_many :disciplines, :through => :card_disciplines
  belongs_to :vampire, :class_name => "Minion", :conditions => " = 'vampire'"
  belongs_to :werewolf, :class_name => "Minion", :conditions => " = 'werewolf'"

  validates_presence_of :name
  validates_uniqueness_of :name
  validates_presence_of :deck_id
end


writing a plugin

guest095:yadb ajb$ script/generate plugin nice_error_fields
create vendor/plugins/nice_error_fields/lib
create vendor/plugins/nice_error_fields/tasks
create vendor/plugins/nice_error_fields/test
create vendor/plugins/nice_error_fields/README
create vendor/plugins/nice_error_fields/MIT-LICENSE
create vendor/plugins/nice_error_fields/Rakefile
create vendor/plugins/nice_error_fields/init.rb
create vendor/plugins/nice_error_fields/install.rb
create vendor/plugins/nice_error_fields/uninstall.rb
create vendor/plugins/nice_error_fields/lib/nice_error_fields.rb
create vendor/plugins/nice_error_fields/tasks/nice_error_fields_tasks.rake
create vendor/plugins/nice_error_fields/test/nice_error_fields_test.rb

guest095:yadb ajb$ cd vendor/plugins/nice_error_fields/
guest095:nice_error_fields ajb$ ls
MIT-LICENSE Rakefile install.rb tasks uninstall.rb
README init.rb lib test

init.rb needs the require 'nice_error_fields' line,
and the plugin's lib directory is added to the load path

the init.rb files are all read (so no need to create an initializer) and all the lib paths are added

Advancing with Rails course - Day 4 pt 1

Forms first off: a few idioms that have grown up around REST.

if you make the new action render the edit template, the form knows to POST to create when it is rendered for a new record,
and knows that when it is called from edit it should PUT to update

ActiveRecord can query objects via the new_record? method (@thing.new_record?), which means the form can tell whether it has a new object, and so RESTfully determine whether to create or update

David doesn't like this, as it doesn't feel very elegant; it is over-economising. Use a partial if there is serious overlap, and then have different templates, which can have extra bits.

(This was originally a plugin simply_helpful, now core and only works with REST)



You can

script/generate controller admin/things

in routes

map.namespace :admin do |admin|
  admin.resources :things
end

this gives a set of RESTful routes such as new_admin_thing, with a
parallel tree of views in app/views/admin/things/

Then onto some more Ruby

Procs and Callbacks

Ruby: proc (function) objects

anonymous functions which are themselves objects (and can be passed around, etc)

Proc can remember a local variable within itself, rather than it going out of scope and being garbage collected

guest095:yadb ajb$ irb
>> Proc.new {}
=> #<Proc:0x00000000@(irb):1>
>> lambda {}
=> #<Proc:0x00000000@(irb):2>
>> proc {}
=> #<Proc:0x00000000@(irb):3>

lambda and proc are synonyms (in Ruby 1.8), but different from Proc.new
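The notes stop short of saying how Proc.new differs. A small sketch of the two differences I'm sure of, argument checking and the behaviour of return; the variable and method names are made up for illustration. (In Ruby 1.8, proc behaves like lambda; from Ruby 1.9 it behaves like Proc.new.)

```ruby
# 1) Argument checking: a lambda is strict, a Proc.new proc is lenient.
strict  = lambda   { |a, b| [a, b] }
lenient = Proc.new { |a, b| [a, b] }

# 2) return: inside a lambda it returns from the lambda itself; inside a
#    Proc.new proc it returns from the method the proc was created in.
def lambda_return
  inner = lambda { return :from_lambda }
  inner.call
  :after_lambda # reached, because the lambda's return only left the lambda
end

def proc_return
  inner = Proc.new { return :from_proc }
  inner.call
  :after_proc # never reached: the proc's return left proc_return
end
```

So lenient.call(1) quietly fills the missing argument with nil, while strict.call(1) raises ArgumentError.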

code blocks can be captured within a def method

p = lambda {|x| puts x * 10}
array = [1,2,3,4,5,]

def convert(n, &block) # special & argument syntax captures the code block as a Proc object
  block.call(n)
end

puts convert(10) { |x| x * 30 } # prints 300

def convert(n)
  if block_given?
    yield(n)
  else
    n * 2
  end
end

block_given? and yield are good unless you need to objectify the block for some reason

it is a closure on the variables that exist around where it is created
>> y = 1; [1,2,3].each {|x| puts x * 10; puts y; y += 1 }
10
1
20
2
30
3
=> [1, 2, 3]
>> puts y
4
=> nil

but leaves alone ones created after its creation

>> class Counter
>> def self.create(n=0, inc=1)
>> return Proc.new { n += inc; n - inc }
>> end
>> end
=> nil
>> c = Counter.create
=> #<Proc:0x0059f330@(irb):3>
>> puts c.call
0
=> nil
>> puts c.call
1
=> nil
>> n = 222
=> 222
>> c = Counter.create
=> #<Proc:0x0059f330@(irb):3>
>> puts c.call
0
=> nil
>> puts c.call
1
=> nil
>> c = Counter.create(5,5)
=> #<Proc:0x0059f330@(irb):3>
>> puts c.call
5
=> nil
>> puts c.call
10
=> nil
>> puts n
222
=> nil

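you can write code blocks with {} or do - end, but they are not interchangeable: {} binds to the nearest method call, do - end to the outermost one

>> puts [1,2,3].map {|x| x * 10 }
10
20
30
=> nil
>> puts [1,2,3].map do |x| x * 10 end
=> nil

(in the second case the block goes to puts, which ignores it, so map never sees it)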

you can write a method to complain about the presence (or absence) of a block, but not every Ruby method will do this

>> def m; raise "A block!" if block_given?; end
=> nil
>> m
=> nil
>> m {}
RuntimeError: A block!
from (irb):21:in `m'
from (irb):23

>> m do |x| x*10 end
RuntimeError: A block!
from (irb):27:in `m'
from (irb):28

>> def m; raise "A block!" if !block_given?; end
=> nil
>> m {}
=> nil
>> m
RuntimeError: A block!
from (irb):24:in `m'
from (irb):26

Built in callbacks
Modules: - included
Classes: - inherited

>> module M
>> def talk; puts "Hi!"; end
>> end
=> nil
>> M.new
NoMethodError: undefined method `new' for M:Module
from (irb):4

you can't instantiate a module, but they are good for mixins

you can nest classes inside modules

>> module Violin
>> class String; end
>> end
=> nil

modules can give their methods to a class, via include

?> class Person
>> include M
>> end
=> Person
>> andy = Person.new
=> #<Person:0x59b71c>
>> andy.talk
Hi!
=> nil

Classes - inherited

>> class Furniture
>> def self.inherited(c)
>> puts "#{self} has been inherited by class #{c}"
>> end
>> end
=> nil
?> class Chair < Furniture
>> end
Furniture has been inherited by class Chair
=> nil
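The list above gives included as the module counterpart of inherited, but only inherited is demonstrated. A minimal sketch of the module hook; Talker, Speaker and the includers list are made-up names:

```ruby
module Talker
  # Ruby calls this hook every time Talker is mixed into a class or module.
  def self.included(base)
    includers << base
  end

  # Record of every class that has included Talker (stored on the module).
  def self.includers
    @includers ||= []
  end

  def talk
    "Hi!"
  end
end

class Speaker
  include Talker # triggers Talker.included(Speaker)
end
```

After the class definition, Talker.includers is [Speaker], and Speaker.new.talk returns "Hi!".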


?> module N
>> def walk; puts "I am walking!"; end
>> end
=> nil
>> Person.ancestors
=> [Person, M, Object, Kernel]
>> andy.walk
NoMethodError: undefined method `walk' for #<Person:0x59b71c>
from (irb):29
>> andy.extend(N)
=> #<Person:0x59b71c>
>> andy.walk
I am walking!
=> nil

use extend via a module if you want to replace core methods as this is a low impact way

>> module M;def shout;puts "HI!!!!";end;end
=> nil
>> class C;end
=> nil
>> C.extend(M)
=> C
>> C.shout
HI!!!!
=> nil

Classes are objects, so extend adds M's methods as singleton methods on the class object itself

equivalent to

class << C; include M; end

extend gives class methods and include gives instance methods, although David wasn't really sure what that would ultimately mean
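Side by side, the difference is easier to see. A small sketch with made-up class names: include puts Shouter's methods on instances, while extend puts them on the receiver's singleton class, which is what the class << C; include M; end form above spells out.

```ruby
module Shouter
  def shout
    "HI!!!!"
  end
end

# include: Shouter's methods become instance methods of the class.
class IncludesShouter
  include Shouter
end

# extend: Shouter's methods become singleton methods of the receiver,
# here the class object itself, so they behave like class methods.
class ExtendsShouter
  extend Shouter
end
```

So IncludesShouter.new.shout and ExtendsShouter.shout both work, but IncludesShouter.shout and ExtendsShouter.new.shout raise NoMethodError.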

Wednesday, 2 April 2008

Advancing with Rails course - Day 3 - pt3

Final blog for day 3. (I'll probably end up summarising a lot of this in a presentation to foomongers, so I'll post the slides if I do.) However, for now:

We started to look at REST.

Just a few notes, in no particular structure. A good resource to read is Roy T. Fielding's dissertation (

Representational State Transfer

request sequence (slide 198)

by default

link_to => GET
form_for => POST

link_to with movie_path => GET
link_to using edit_movie_path => GET

As browsers don't issue PUT and DELETE requests, Rails cheats:

to get PUT:
form_for with :url => movie_path(@movie) and :html => {:method => "put"}
you get => method="post", plus input type="hidden" name="_method" value="put"

to get DELETE:
link_to with item_path(@item), :method => "delete"
you get => DELETE (but sent via a JavaScript-built form rather than a plain link, so that spiders don't follow it)
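Roughly, the server side only has to look for that hidden field on POST requests. A minimal sketch of the idea; the `effective_verb` helper is hypothetical, not Rails' own code:

```ruby
# Hypothetical helper: work out which HTTP verb a request really means.
# Browsers only send GET and POST, so PUT and DELETE arrive as a POST
# carrying a hidden "_method" parameter.
def effective_verb(http_method, params)
  verb = http_method.to_s.upcase
  override = params["_method"].to_s.upcase
  if verb == "POST" && %w[PUT DELETE].include?(override)
    override
  else
    verb
  end
end

p effective_verb("post", "_method" => "put")     # "PUT"
p effective_verb("post", "_method" => "delete")  # "DELETE"
p effective_verb("get",  "_method" => "delete")  # "GET": only POST is overridden
p effective_verb("post", {})                     # "POST"
```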

REST and CRUD have no inherent connection, but when brought into Rails they meet up,
because the framework has deliberately orchestrated it.

And then, finally for the day, on to some Ajax stuff.


An Ajax request happens within the overall request cycle:

link_to_remote "Click me", :url => {....}
-> goes to server, does stuff C/A -> sends back to client

Typically it is not a whole page cycle, as the server only sends back a snippet/fragment that you want to drop into your document.

link_to_remote "Click me", :url => {:action => 'click_me'}, :update => "mydiv"

so it drops the response into the element with id mydiv

def click_me
render :partial => "clicked"
end

renders the partial and drops it in

link_to_remote "Click me", :url => {:action => 'click_me', :div => "mydiv"}

As there is no :update, the response is treated as JavaScript to run:

def click_me
@div = params[:div]
@del = Auction.delete(params[:id])
end

which hands off to click_me.rjs:

page.alert("Destroy operation failed")
page.visual_effect :shrink, @div, :duration =>1
page.delay(2) do
page.alert("That auction is history!")
end

page generates the JavaScript that is sent back to do what you want

form_remote_for :item, :url => {:action => :update}, :update => "mydiv"

flash normally waits for the next request cycle; flash.now[:notice] will make it come through in the Ajax response there and then, rather than waiting for the page refresh

There is a request.xhr? method, which tells you whether this is an Ajax request, so you can fork on the request type.

name.html.erb will take preference over name.rjs, so either don't give them the same name,
or render the RJS directly in the controller:

render :update do |page|
# RJS calls on page go here
end

which would bypass that issue

you can put rjs directly in the view to get javascript directly in the page

Ajax requests from link_to_remote are POST by default, so if you are requesting an action only allowed via GET, PUT or DELETE, you need to state the :method explicitly in the link_to_remote call.

Advancing with Rails course - Day 3 - pt2

Rails Routing

The routing system does 2 things:

1) recognises and interprets URLs
- goal: to determine the controller and action, and store other values in the params hash

2) generates URLs and path strings in your views
- as arguments to link_to, form_for, etc
- for redirection in controllers
- anywhere you need a path or URL in your actual code

top-level route

map.root :controller => "cards"

link_to "top", root_path

map.connect ':controller/:action/:id'

The pattern shows what you want your URL to look like.

The routing system only makes sense to itself: only the routing system knows what a URL means, because it generated it.

The routing system doesn't know how to resolve URLs for another application.

The model doesn't know which controller is manipulating it.

To add a CSS class to a link_to:

<%= link_to "log in", {:controller => 'this', :action => 'that'}, :class => "blah" %>

Hard-coded routes:

map.connect 'help',
:controller => "leagues",
:action => "assist"

This resolves /help to the specific path leagues/assist,
and only /help will resolve that way.

map.connect 'help/:controller', :action => "assist"

This takes the controller from the path, so /help/wibble resolves to wibble/assist.

Extra wildcards can also be added and are matched positionally, then stored in params.
:id is a special case, though, as it is allowed to be nil, whereas any other wildcard needs to be specified,
or no route will be found (unless you use globbing).

You can constrain the wildcards with pattern matching, etc., and have many of these, but order is important
(routes higher up the routes.rb file will match first).
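To see why order matters, here is a toy recognizer. It is hypothetical and not how Rails implements routing: it tries patterns top to bottom, fills wildcards into a params hash, and lets a trailing :id be missing:

```ruby
# Hypothetical route table, in the order routes.rb would list them.
# Each entry is a URL pattern plus default params.
ROUTES = [
  ["help",                    { "controller" => "leagues", "action" => "assist" }],
  ["help/:controller",        { "action" => "assist" }],
  [":controller/:action/:id", {}]
]

# Try each route in turn; the first pattern that matches wins.
def recognize(path)
  segments = path.split("/").reject(&:empty?)
  ROUTES.each do |pattern, defaults|
    parts = pattern.split("/")
    # A trailing :id may be left off; every other component must be present.
    optional = parts.last == ":id" ? 1 : 0
    next unless segments.size.between?(parts.size - optional, parts.size)

    params = defaults.dup
    matched = parts.each_with_index.all? do |part, i|
      if part.start_with?(":")
        params[part[1..-1]] = segments[i]   # wildcard: store in params (may be nil for :id)
        true
      else
        part == segments[i]                 # literal text must match exactly
      end
    end
    return params if matched
  end
  nil # no route found: Rails would raise a routing error here
end

p recognize("/help")          # leagues controller, assist action
p recognize("/help/wibble")   # wibble/assist: the second route wins over the third
p recognize("/items/show")    # :id is allowed to be nil
```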

You may need to catch a routing error.

Current and default values:

Missing components get defaults:

:controller and :action from the current ones
:id, :topic, etc. from the params hash

Defaulting gets turned off after the first component you specify.

named routes:

map.<name> "name",
:controller => :x,
:action => :y

then allows
redirect_to name_url
link_to "text", name_path

map.vampires 'vampire/:name/:capacity',
:controller => :cards,
:action => :vampires

<%= link_to "#{vampire} with capacity #{capacity}",
vampires_path :name => vampire, :capacity => capacity %>

<%= link_to "#{vampire} with capacity #{capacity}",
vampires_path(vampire, capacity) %>
(the arguments fill the URL's wildcards in order)
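A rough sketch of what such a generated helper has to do, accepting either named or positional values. The `vampires_path` method and its pattern constant below are hypothetical stand-ins, not the code Rails generates, and they do no URL escaping:

```ruby
# Hypothetical stand-in for the helper that map.vampires would generate.
VAMPIRES_PATTERN = "vampire/:name/:capacity"

def vampires_path(*args)
  keys = VAMPIRES_PATTERN.scan(/:(\w+)/).flatten.map(&:to_sym)   # [:name, :capacity]
  # Accept either a hash of named values or positional values in pattern order.
  values = args.first.is_a?(Hash) ? args.first : Hash[keys.zip(args)]
  "/" + VAMPIRES_PATTERN.gsub(/:(\w+)/) { values[$1.to_sym].to_s }
end

p vampires_path(:name => "Anson", :capacity => 8)  # "/vampire/Anson/8"
p vampires_path("Anson", 8)                        # "/vampire/Anson/8"
```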

Named routes and CRUD

These can be expressive, and they encourage good action names.

Cluster your thinking around the operation (borrowing a book is creating a loan, renewing it is an update, returning it is a destroy/delete, viewing all loans is a read).

Advancing with Rails course - Day 3 - pt1

protect_from_forgery -> ensures that POSTs are only acted on if they come from your application

The session hash is now stored as a cookie on the client side;
it is typical to put a user_id in it

before_filters: any return value counts as true unless it is false, and processing stops if a filter returns false,
so explicitly return true
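A sketch of the halting rule described above, with a hypothetical run_before_filters helper and lambdas standing in for filters. It shows why an accidental false (say, from a boolean assignment on the last line) stops the action, while nil does not:

```ruby
# Hypothetical filter runner: call each before filter in order and stop at
# the first one that returns exactly false. nil, or any other value,
# lets the chain continue.
def run_before_filters(filters)
  filters.each do |filter|
    return :halted if filter.call == false
  end
  :action_ran
end

p run_before_filters([-> { true }, -> { nil }])     # :action_ran
p run_before_filters([-> { true }, -> { false }])   # :halted
```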

attr_accessible and attr_protected

In the model:

attr_protected :admin, :hashed_password
blacklist of attributes which can't be set by mass assignment

attr_accessible :price, :size
whitelist of attributes which can be set by mass assignment

However, once attr_accessible is declared, any attribute not in it can't be updated by mass assignment.

It needs maintaining by hand, but it is pretty cheap.
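A plain-Ruby sketch of the whitelist idea, assuming a hypothetical Item class whose `attributes=` mimics mass assignment (ActiveRecord's real implementation differs):

```ruby
# Hypothetical model mimicking attr_accessible :price, :size.
class Item
  ACCESSIBLE = %w[price size]

  attr_accessor :price, :size, :admin

  # Mass assignment (as in Item.new(params[:item])): anything not on the
  # whitelist is silently skipped rather than assigned.
  # Direct assignment (item.admin = true) is not filtered.
  def attributes=(hash)
    hash.each do |key, value|
      next unless ACCESSIBLE.include?(key.to_s)
      send("#{key}=", value)
    end
  end
end

item = Item.new
item.attributes = { "price" => 10, "size" => "large", "admin" => true }
p item.price   # 10
p item.admin   # nil: admin is not whitelisted, so it was ignored
```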


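Use ? placeholders, so that untrusted input is quoted rather than pasted straight into the SQL:
item = Item.find(:first, :conditions => ["material = ?", untrusted_input])

If you don't know how the database represents booleans, let the placeholder deal with it:
User.find(:first, :conditions => ["admin = ?", true])

For an IN clause:
User.find(:first, :conditions => ["email in (?)", ["a@c", "b@d"]])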

User.find(:first, :conditions => ["created_at < ?",])
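The placeholders work because each value is quoted before it is substituted into the SQL. A rough, hypothetical sketch of that substitution; the quoting rules here are simplified assumptions (for example 't'/'f' booleans), and real database adapters differ:

```ruby
# Simplified, hypothetical quoting: real database adapters have their own rules.
def quote(value)
  case value
  when nil     then "NULL"
  when true    then "'t'"                      # assumption: 't'/'f' booleans
  when false   then "'f'"
  when Numeric then value.to_s
  when Array   then value.map { |v| quote(v) }.join(",")
  else "'" + value.to_s.gsub("'", "''") + "'"  # double any embedded quotes
  end
end

# Replace each ? in the SQL fragment with the next quoted value.
def sanitize_conditions(conditions)
  sql, *values = conditions
  sql.gsub("?") { quote(values.shift) }
end

untrusted_input = "x' OR '1'='1"
puts sanitize_conditions(["material = ?", untrusted_input])
# material = 'x'' OR ''1''=''1'
puts sanitize_conditions(["email in (?)", ["a@c", "b@d"]])
# email in ('a@c','b@d')
```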

method h

<%=h %>

Use it when there is a chance that the value could contain HTML: it escapes the HTML, so nobody can inject a <script> or <img>.
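In Rails, h is ERB::Util's html_escape, so the same escaping can be seen outside a view with Ruby's standard ERB library. The hostile comment below is made up for illustration:

```ruby
require "erb"

# A hostile value someone might type into a form field.
comment = '<script>alert("gotcha")</script>'

escaped = ERB::Util.html_escape(comment)
puts escaped
# &lt;script&gt;alert(&quot;gotcha&quot;)&lt;/script&gt;
```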

Advancing with Rails course - Day 2

April 1st

So, a lot to blog about from day 2. Don't feel you need to read it all, and some may hardly be a revelation to most, but part of the purpose of this blog is also for my own record of what occurred and what I learnt, so you'll have to bear with me.

We did a lot of working within our own apps today, doing little bits of lecture, and then seeing if it can be applied.
I seem to have picked a good little app to work on, as it does forms, and needs associations, which was some of what we worked on today. (Looking through my notes again, please note that some of these are my interpretations of the discussions and what notes I grabbed, so may not be the whole truth, or even everything David said)

The first lecture bit started on the "Request Cycle".

This was hardly a revelation to me, but it was nice to see it formalised. Basically, all Rails apps work on the idea of a request cycle.

A user clicks somewhere (to launch the application, a hyperlink, sends a form...)
Mongrel sets off a dispatcher, which goes to the routing system (load-shared across several Mongrels, if your server is set up this way)
then calls controller/action/params[x]

i.e. items/show/1
-> contr/act/params[:id]

It then takes items and builds it up into items_controller.rb, and looks for the ItemsController class.

This is now just all in the hands of the ruby interpreter, the server is just waiting for a response, which it then sends back to the user.

The server then cares nothing about what happens until it receives another request from the user. Quite obvious, but it explains why new instances of the objects are created each time, whereas within a GUI app, the program loads once, and then there are very few times that a new instance of an object would get created. However, the GUI application expects further user input (even if it is just a kill command), whereas the Rails app doesn't. Even if you request a form, the Rails app doesn't actually expect you to fill it in and send it back.

So what is happening during the Ruby interpreter phase?

Firstly it creates an instance (object) of the controller:

controller = ItemsController.new

and then looks for an action of the given name on that instance,

with params[:id] available to it, if given.
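A much-simplified sketch of that dispatch step. ItemsController here is a hypothetical stand-in, and passing params to new is an assumption for illustration (Rails sets params up differently):

```ruby
# Hypothetical controller standing in for a real Rails one.
class ItemsController
  attr_reader :params

  def initialize(params)
    @params = params
  end

  def show
    "showing item #{params[:id]}"
  end
end

# Hypothetical dispatcher: turn "items/show/1" into a file name, a class,
# a fresh controller instance, and a call to the named action.
def dispatch(path)
  name, action, id = path.split("/").reject(&:empty?)
  file = "#{name}_controller.rb"                                      # items_controller.rb
  class_name = name.split("_").map(&:capitalize).join + "Controller"  # ItemsController
  controller = Object.const_get(class_name).new(:id => id)            # new object every request
  [file, controller.send(action)]
end

p dispatch("/items/show/1")   # ["items_controller.rb", "showing item 1"]
```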

By default, an action renders the view with the same name, but you can use the render method to render another template. ActiveRecord objects also know how to render themselves as XML by default:

render :xml => @item

Or you could use a redirect, but this starts a new request cycle (it sends a 302 to the user/client, which then goes back to Mongrel), so the item id needs to be passed again (and any other params that need to pass through).

(update doesn't typically have an update.html.erb, as you usually want a redirect or to render the form again)

render reuses the object with its changed values (and errors), whereas a redirect loads a fresh instance from the database, so the old values propagate through to the rendered template and the user's edits are lost.

before_filter command

before_filter :set_item,
:only => ['edit', 'update', 'show']
This causes the set_item method to run before the stated actions.

This is good, because it enforces consistent use of the correct variable, and then, should you end up with a blank action (i.e. def show; end) because set_item does everything, Rails will jump straight through and render the show.html.erb file, without even needing def show; end to be there at all.

Now, instance variables are how the controller talks to the views, but here the instance variables do not belong to a single self, as

self in the controller is a controller object
self in the view is an ActionView::Base object

When Rails hands off to the view, it walks through the controller's instance variables and copies each one onto the ActionView::Base object,
so they don't technically share them.
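A minimal sketch of that hand-off, with hypothetical FakeController and FakeView classes. It shows that the view gets its own copies of the variables, although both sides still point at the same objects until one is reassigned:

```ruby
# Hypothetical controller and view classes standing in for Rails' own.
class FakeController
  def show
    @item = "Blue Widget"
    @year = 2008
  end
end

class FakeView
  def render
    "#{@item} (#{@year})"
  end
end

controller = FakeController.new
controller.show

# Copy every instance variable from the controller onto a separate view object.
view = FakeView.new
controller.instance_variables.each do |name|
  view.instance_variable_set(name, controller.instance_variable_get(name))
end

puts view.render   # Blue Widget (2008)

# Reassigning the view's variable doesn't touch the controller's.
view.instance_variable_set(:@year, 2009)
puts controller.instance_variable_get(:@year)   # 2008
```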


Before lunch we looked at non-default associations. This got very interesting for my app YADB, as we found a plugin which allowed nested has_many :through associations.

belongs_to :auction (gives methods from Auction)
belongs_to :bidder, :class_name => "User" (the association is called bidder, but the class is User; there is no Bidder class)

self.bidder =

has_many :bids, :foreign_key => "bidder_id" (sets a different foreign key, as otherwise it would look for user_id)
has_many :auctions_held, :class_name => "Auction", :foreign_key => "seller_id"
(combination of the previous two)
has_many :auctions_bid_on, :through => :bids, (there is no AuctionsBidOn class)
:source => :auction, (however you got there, this names the association on Bid to follow)
:uniq => true (removes duplicates; this creates a many-to-many relationship, but with a useful join table with its own model)

You can define singleton methods on the association (an association extension):

has_many :bids do
def average_interval
i = 0.0
inject {|a,b| i += b.created_at - a.created_at; b }
i / (size - 1)
end
end
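The same calculation over a plain array, to show how the block works: inject hands each bid on as the next a, so every step adds the gap between one bid and the one before it. Bid here is a hypothetical Struct with only a created_at field, and at least two bids are assumed:

```ruby
# Hypothetical stand-in for a Bid record: all we need is created_at.
Bid = Struct.new(:created_at)

# Same logic as the association extension, over a plain array.
def average_interval(bids)
  i = 0.0
  bids.inject { |a, b| i += b.created_at - a.created_at; b }
  i / (bids.size - 1)
end

start = Time.at(0)
bids = [Bid.new(start), Bid.new(start + 10), Bid.new(start + 40)]
p average_interval(bids)   # 20.0 (gaps of 10s and 30s)
```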

A bit on inject:

It is a method on Enumerable.

[1,2,3,4].inject do |a,b|
a + b
end

The 1st time, a is 1 and b is 2.
Each time after that, a is the result of the code block and b is the next element.

1: a = 1, b = 2
2: a = 3, b = 3
3: a = 6, b = 4
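You can confirm that trace by recording what the block sees on each pass (the steps array is just for illustration):

```ruby
steps = []
total = [1, 2, 3, 4].inject do |a, b|
  steps << [a, b]   # record what the block sees on each pass
  a + b             # this value becomes the next a
end

p steps  # [[1, 2], [3, 3], [6, 4]]
p total  # 10
```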

It is also used to build hashes (these two are equivalent):

[1,2,3,4,5].inject({}) do |hash,e| hash[e] = e * 10; hash; end
[1,2,3,4,5].inject({}) do |hash,e| hash.update(e => e * 10); end

I had a problem trying to link through 2 tables, with the following idea:

class Disciplines < ActiveRecord::Base
has_many :card_disciplines
has_many :cards, :through => :card_disciplines
has_many :vampires,
:through => :cards,
:source => :minion, :conditions => " = 'vampire'"

Now this isn't in 2.0.2, but there had been a lot of discussion about it. A patch has been submitted and the writers have created the following plugin


which works. David mentioned that the discussion about this has reached a point where he thought it had been incorporated, and may be in edge rails, but clearly didn't make the cut to 2.0.2

I hope that it makes it in, as when I first attempted YADB with 1.2, I was looking for some links like this. There is some discussion about how this hits the database, but it keeps to the Rails way, instead of me writing a SQL statement myself, and it still only generated a single SQL statement. What more do you want?


We then looked at errors and validation.

ActiveRecord examines the object to decide if it is valid. If it fails, then it never even attempts to save the information in the database.

validates_size_of :name, :minimum => 5
validates_size_of :name, :maximum => 50
-> these must be on separate lines, as each call takes only one size option (or use :within => 5..50 in a single call)

You can create your own validations by using validate

def validate
errors.add("name", "That's impossible") if name =~ /\d/
end

All ActiveRecord objects have an errors attribute. If the errors attribute is empty, then the object is valid.
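A plain-Ruby imitation of that flow, with a hypothetical Card class: validations fill an errors list, and save gives up before touching the database if the list is not empty. ActiveRecord's real Errors object is richer than this array of pairs:

```ruby
# Hypothetical, plain-Ruby imitation of what ActiveRecord does on save.
class Card
  attr_reader :name, :errors

  def initialize(name)
    @name = name
    @errors = []
  end

  # Mirrors validates_size_of :name, :minimum => 5 / :maximum => 50
  # plus the custom validate method above.
  def valid?
    @errors = []
    @errors << [:name, "is too short"] if name.size < 5
    @errors << [:name, "is too long"]  if name.size > 50
    @errors << [:name, "That's impossible"] if name =~ /\d/
    @errors.empty?
  end

  def save
    return false unless valid?   # invalid objects never reach the database
    true                         # stand-in for the real INSERT or UPDATE
  end
end

card = Card.new("Ann1")
p card.save     # false
p card.errors   # [[:name, "is too short"], [:name, "That's impossible"]]
```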

You can also trap errors raised when the database has constraints that disallow the data from being saved, but do this in the controller, as there you are handling an unexpected, serious error rather than routine protection of data integrity.


This then led on to forms and processing, starting with a bit on the difference between each and map.

each vs map

each loops over 1 element at a time, and its return value is the original object

x = [1,2,3,4]
y = x.each {|e| puts e*10}
y.equal?(x) => true

map returns a mapping (an accumulator of the results of the block)

newx = x.map {|e| e * 10 }
newx => [10,20,30,40]

What goes on when you display and process forms

<% form_tag :action => 'update', :id => @item.id do %>
<p> name: <%= text_field 'item', 'name' %></p>
<p> year: <%= text_field 'item', 'year' %></p>
<% end %>
text_field prepopulates these from @item when the form is generated

then, for each first argument


and then


fields_for lets you switch to a different object inside a form_for block, overriding the default object set by form_for:

<% form_for 'auction', :url => {:action => 'create'} do |f| %>

<p>Title: <%= f.text_field "title" %></p>
<% fields_for 'item' do |fi| %>
<p>Description: <%= fi.text_field 'description' %></p>
<% end %>
<%= submit_tag %>
<% end %>



Fields with errors get wrapped in a <div class="fieldWithErrors"></div>, which (as a div is a block element) causes the box to jump down beneath the title.

This means you could style it.

You can also change it: Rails calls a proc, which is a function, and you can replace it.

Lazy version (in environment.rb; it should really be an initializer):

ActionView::Base.field_error_proc = Proc.new {|html_tag, instance|
"<p>Andy's placeholder</p>"
}

This is an executable object that I can call again and again when I want it.

Look on google with 'field_error_proc'
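A plain-Ruby sketch of the idea, with a hypothetical FieldRenderer class standing in for ActionView::Base: the wrapper is just a proc stored in a class-level setting, so swapping the proc changes every field with errors. Switching to a span (an inline element) is one assumed way to stop the layout jumping:

```ruby
# Hypothetical stand-in for ActionView::Base and its field_error_proc setting.
class FieldRenderer
  class << self
    attr_accessor :field_error_proc
  end

  # Default: wrap the field in a div, which is a block element.
  self.field_error_proc = proc { |html_tag, _instance| %(<div class="fieldWithErrors">#{html_tag}</div>) }

  def self.render(html_tag, has_errors)
    has_errors ? field_error_proc.call(html_tag, nil) : html_tag
  end
end

field = %(<input name="item[name]" />)
puts FieldRenderer.render(field, true)
# <div class="fieldWithErrors"><input name="item[name]" /></div>

# Swap in a different proc: a span is inline, so the layout no longer jumps.
FieldRenderer.field_error_proc = proc { |html_tag, _instance| %(<span class="fieldWithErrors">#{html_tag}</span>) }
puts FieldRenderer.render(field, true)
# <span class="fieldWithErrors"><input name="item[name]" /></span>
```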


The final lecture session was our daily dose of Ruby. We were looking at singleton methods. This was interesting, as we looked at the singleton class, and a bit at how an object looks to find whether it has a method.

The method lookup path

1) The object's singleton class
- modules mixed into singleton class
2) The object's class
- modules mixed into object's class
3) The object's superclass
- modules mixed into object's superclass

repeat 3 as needed until you reach
a) Object, and then Kernel (the module mixed into Object)
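A small demonstration of that lookup order. Walking, Flying, Animal and Human are hypothetical names for illustration:

```ruby
module Walking
  def move; "walking (from Walking, mixed into Human)"; end
end

module Flying
  def move; "flying (from Flying, on andy's singleton class)"; end
end

class Animal
  def move; "moving (from Animal, the superclass)"; end
end

class Human < Animal
  include Walking
end

andy = Human.new
p Human.ancestors.take(3)   # [Human, Walking, Animal]
p andy.move                 # found in Walking before Animal is reached
andy.extend(Flying)         # mixes Flying into andy's singleton class
p andy.move                 # now found in the singleton class's module first
p Human.new.move            # other Humans are unaffected
```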

To open the singleton class definition for self

class << self

This is a very frequent idiom for class methods, and means that if you have lots of

def self.method

then you can save writing 'self.' each time with

class << self
def method
end
def another_singleton_method
end
end
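Both spellings end up in the same place, as this sketch with a hypothetical Deck class shows:

```ruby
class Deck
  # Two ways to write class methods; both become singleton methods on Deck.
  def self.build
    "built"
  end

  class << self
    def shuffle_all
      "shuffled"
    end
  end
end

p Deck.build                           # "built"
p Deck.shuffle_all                     # "shuffled"
p Deck.singleton_methods(false).sort   # [:build, :shuffle_all]
```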

To demonstrate this, I'll just dump here the irb that I did whilst trying this, and that will be all from day 2. Looking forward to day 3.

guest095:~ ajb$ irb
>> class Person; attr_accessor :name; end
=> nil
>> andy = Person.new
=> #<Person:0x5a2abc>
>> class << andy
>> def talk
>> puts "Hi"
>> end
>> end
=> nil
>> andy.talk
Hi
=> nil
>> aclass = Person
=> Person
>> aclass.methods
=> ["to_yaml_style", "inspect", "private_class_method", "const_missing", "clone", "method", "public_methods", "public_instance_methods", "yaml_as", "instance_variable_defined?", "method_defined?", "superclass", "equal?", "freeze", "included_modules", "const_get", "to_yaml_properties", "methods", "respond_to?", "module_eval", "class_variables", "dup", "instance_variables", "protected_instance_methods", "to_yaml", "__id__", "public_method_defined?", "eql?", "object_id", "require", "const_set", "id", "send", "singleton_methods", "taguri", "class_eval", "taint", "require_gem", "instance_variable_get", "frozen?", "yaml_tag_class_name", "taguri=", "include?", "private_instance_methods", "instance_of?", "__send__", "private_method_defined?", "to_a", "name", "yaml_tag_read_class", "autoload", "type", "new", "<", "instance_eval", "gem", "protected_methods", "<=>", "display", "==", ">", "===", "instance_method", "instance_variable_set", "extend", "kind_of?", "protected_method_defined?", "const_defined?", ">=", "ancestors", "to_s", "<=", "public_class_method", "allocate", "class", "hash", "private_methods", "=~", "tainted?", "instance_methods", "class_variable_defined?", "untaint", "nil?", "constants", "is_a?", "yaml_tag_subclasses?", "autoload?"]
>> dc = class << andy; self; end
=> #<Class:#<Person:0x5a2abc>>
>> dc
=> #<Class:#<Person:0x5a2abc>>
>> dc.instance_methods.sort
=> ["==", "===", "=~", "__id__", "__send__", "class", "clone", "display", "dup", "eql?", "equal?", "extend", "freeze", "frozen?", "gem", "hash", "id", "inspect", "instance_eval", "instance_of?", "instance_variable_defined?", "instance_variable_get", "instance_variable_set", "instance_variables", "is_a?", "kind_of?", "method", "methods", "name", "name=", "nil?", "object_id", "private_methods", "protected_methods", "public_methods", "require", "require_gem", "respond_to?", "send", "singleton_methods", "taguri", "taguri=", "taint", "tainted?", "talk", "to_a", "to_s", "to_yaml", "to_yaml_properties", "to_yaml_style", "type", "untaint"]
>> dc.instance_methods(false)
=> ["name", "talk", "name="]

Tuesday, 1 April 2008

Advancing with Rails course - Day 1

Mar 31 was the first of our 4 day Ruby on Rails course at work. It is being presented by David Black, author of Ruby for Rails, and Director of Ruby Power and Light LLC.

We have essentially managed to catch him between the European Ruby conference and Scotland on Rails.

The day was very much an introductory day, with a few little tidbits. We set up our own little Sandbox project to work within. I have opted to start with something I have been looking to do for some time, which is a VTES CCG deck-building program. (VTES is a vampire-based collectable card game, which is where the Vampire Software comes from. It will be something I'll get into sourceforge soon!)

David also explained a bit more about Ruby objects and Rails functional testing. The former was a good tidy-up of what I already knew/had worked out. The latter was a good formalisation of what they are trying to achieve, as whilst I had done some functional testing in Rails before, most was concentrated on the unit testing. (A note here: functional testing can mean different things in different frameworks. Some prefer to think of functional testing in Rails as just an extension of unit testing specifically for controllers, as in other frameworks, functional tests are what Rails calls integration tests.)

At this point, David also gave us a reason for NOT using scaffold generation to generate model-backed controllers and views. Personally, I like this approach (I think that code which writes code is a very good thing), but he says that as you get further into Rails, you end up removing more from scaffold-generated code than auto-generation saved you writing, and as such he prefers to suggest that you learn to write it from scratch. With my limited experience, I'll take a rain-check on making a confirmed decision, but I will (for now) stick with auto-generation when it comes to model-backed controllers.

One thing I did discover here, though, is that in Rails 2.0.2, if you have generated your model first and then run script/generate scaffold , it stops when it finds a migration with the same name as the one it wants to generate. You actually need to do one of the following:

1) script/generate scaffold --skip-migration
2) not do script/generate model first
3) rename the migration file and class within it before running

This was annoying, as before, when I did this, it used to ask if I wanted to replace it; now it generates everything up to the point where it barfs, but nothing after it. Oh well, I suppose it is something I can get used to. Just think I'll need to read up on the new stuff for 2.0.2 sooner than I expected. :)

We also touched upon the new Initializers, which look very useful for keeping track of requirements. I hope we might look at those again.

The final thing to talk about here was how to implement Composite Primary Keys. Firstly - why? ActiveRecord is never going to support them.

Secondly - However, we have a big legacy database using them extensively, so David showed us how to implement Dr. Nic's composite_primary_keys gem (requiring it via an Initializer!). I won't explain it all here, but the scratchpad attempt I made of it worked well. (to install it though - sudo gem install composite_primary_keys)

And finally, speaking of legacy databases: in Rails, 'type' is a reserved column name used for Single Table Inheritance, and ActiveRecord expects it to hold a class name. We have fallen foul of this with our legacy database, but fall foul of it no more. In the model put:

set_inheritance_column :nothing

and hey presto! you can use the 'type' column in your legacy schema as planned.

That will do for now. Let's see what the next day brings.

Wednesday, 19 March 2008

Composite Primary Keys - Yuuurrghhh!

Well, nearly.

You have seen me write how I think that Composite Primary Keys are above all else a BAD THING(tm).

Well, I think they still are, as Primary Keys.

However, there is no reason not to have a unique combination of keys in a table. This I have never had an issue with.

So why?

Well, look at our tagging example. We don't want to assign a tag to an entity_type twice in our frequency table, at this point we just want to increment the frequency count. Here we hit a snag.
When we send info to our model, we don't know the id_tag_frequency, only the id_tag and the id_entity_type. This isn't a unique primary key.

However, not to worry. We just need a constraint to ensure that the combination here is unique (tags will be assigned to multiple entity_types, and entity_types will have multiple tags).

If we do this, then you can do a query lookup to return just one row.

So why not make this the Primary Key?

Well, if you do this, then you could lose out on simple coding like this

if (unique a + b) and not primary key { fetch primary key from database or create new entry }

but, more importantly, if later on you need to reference the table row in another table (which is the benefit of relational databases, isn't it?) you only need the single primary key put in the new table, instead of all the parts which make up the composite key. This maintains a DRY principle within the database, because, should you need to change part of the composite uniqueness in table 1, you don't need to do it in table 2 as well.

So, never worry about needing a unique combination of fields, that is great, but always have a unique single field primary key on the row. This should help future proof your database when you need new tables, and will make it much easier to create new programs, as you have now given people the option of searching via the composite unique key, or the single primary key.
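The find-or-create logic can be sketched in Ruby with an in-memory stand-in for the tag_frequency table (TagFrequencyTable and its methods are hypothetical). The unique (id_entity_type, id_tag) pair is only used to look up the single-column surrogate key, which is what other tables would reference:

```ruby
# Hypothetical in-memory stand-in for the tag_frequency table.
class TagFrequencyTable
  Row = Struct.new(:id_tag_frequency, :id_entity_type, :id_tag, :frequency)

  def initialize
    @rows = {}      # surrogate primary key => row
    @unique = {}    # [id_entity_type, id_tag] => surrogate key (the unique constraint)
    @next_id = 1
  end

  # Find the row for this (entity_type, tag) pair, creating it if needed,
  # then bump its frequency. Returns the single-column primary key.
  def record(id_entity_type, id_tag)
    key = [id_entity_type, id_tag]
    id = @unique[key] ||= begin
      new_id = @next_id
      @next_id += 1
      @rows[new_id] = Row.new(new_id, id_entity_type, id_tag, 0)
      new_id
    end
    @rows[id].frequency += 1
    id
  end

  def find(id)
    @rows[id]
  end
end

table = TagFrequencyTable.new
first  = table.record(1, 7)   # creates the row
again  = table.record(1, 7)   # same pair: same primary key, frequency 2
other  = table.record(2, 7)   # same tag, different entity type: new row
p table.find(first).frequency # 2
```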

An important aside to this: if you use an iscurrent boolean for your rows, it can be difficult to make it part of the unique composite key (because only 1 will be current), but do ensure that your code makes only 1 row current for the entity it relates to. This will ensure that anyone querying the database will always be able to find the current one. I have had this problem working with a legacy database before, due to the fact that someone had put the constraint:

entity + status + date = unique composite primary key

because date (especially if stored as datetime) will almost always give you a unique combination, new rows were getting stored as iscurrent = 1, but all other rows for that entity and status were not getting set to iscurrent = 0. Whilst you could always use the following SQL

select entity, status, date from table order by date

and select the one with the most recent date, you lose the benefit of just asking

select entity, status, date from table where iscurrent = 1

if that is all you want. And again, this future-proofs development of new applications, as if the table is very large (and in bioinformatics data storage, they often get exceptionally large), developers will need to rely on the field names to determine how to use the table.

So, if you have an iscurrent, keep it up to date, otherwise, DON'T USE IT.
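A minimal sketch of keeping iscurrent honest, using a hypothetical add_status helper over in-memory rows (the entities and dates are made up): every insert demotes the entity's previous current row, so exactly one row per entity is current:

```ruby
# Hypothetical row for a table with an iscurrent flag.
StatusRow = Struct.new(:entity, :status, :date, :iscurrent)

# Add a new current status for an entity, demoting whatever was current
# before, so exactly one row per entity has iscurrent = 1.
def add_status(rows, entity, status, date)
  rows.each { |row| row.iscurrent = 0 if row.entity == entity }
  rows << StatusRow.new(entity, status, date, 1)
end

rows = []
add_status(rows, "run 42", "pending",  Time.at(0))
add_status(rows, "run 42", "complete", Time.at(60))
add_status(rows, "run 43", "pending",  Time.at(90))

current = rows.select { |row| row.iscurrent == 1 }
p current.map { |row| [row.entity, row.status] }
# [["run 42", "complete"], ["run 43", "pending"]]
```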

Rant over. Normal service resumes whenever we can determine what is normal anyway.

Tuesday, 18 March 2008

Define Existence

Not an existential question this one, but an interesting 'bug'.

I have never used if (exists $var->{key}) {}

I don't know why, but I just have never needed to. In fact, I can only just barely remember being told about it on a course.

I have always used if (defined $var->{key}) {}

What is the difference?

To quote the Camel

A variable can only be defined if it exists, however, the reverse is not necessarily true.

So, if during an initialisation step, you do

$model->{primary_key} = [result of some sql query for primary key]

but there is no primary key result as the sql statement returned no results, then

(exists $model->{primary_key}) == true


(defined $model->{primary_key}) == false

as $model->{primary_key} is undef
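For comparison with the Rails posts, Ruby hashes have exactly the same distinction: a key can be present while its value is nil. A minimal sketch, where the model hash is hypothetical:

```ruby
# Hypothetical model whose primary-key lookup returned no row.
model = { :primary_key => nil }

p model.key?(:primary_key)    # true:  like Perl's exists
p !model[:primary_key].nil?   # false: like Perl's defined
p model.key?(:name)           # false: never set at all
```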

Just something I found whilst trying to bug-fix why my save kept trying to go to update (and then croaking because of, you guessed it, no primary key) when it should have been going to create.

I suppose I should have known this, but when I have always, in the past, used defined to tell me if it both exists and is defined, then it's an easy thing to miss.


Monday, 17 March 2008

Are Clouds Real?

This is a bit of a summary of what we decided in the production of our database tables for tags.

Very simply, we wanted to be able to tag different entities with tags. We also want to be able
to track the tags used for an entity type (frequency, use for that entity type), so that we
could lookup that, and offer suggestions based on the usage already of that tag - i.e. a cloud
of tags for that entity_type (see scrumptious tagging for some application of that).

So, the first is simply to have a table of tags

id_tag | tag

then we have a table of runs

id_run | other info about run

and then a join table

id_tag_run | id_run | id_tag

OK, so that is quite easy. But, we don't have any opportunity to look at any information on this,
particularly our 'clouds' of tags.

We looked at the possibility of a cloud table

id_cloud | ids_of_tags_in_cloud

But how do we know the entity (remember, we want to tag many different things)

In this case, we have an entity_type table

id_entity_type | description | iscurrent

From our models, we can get them to work out their entity_type, as our models are named
xxx::model:: where entity type is the name of the table. This is just a simple
method. So, we make sure the description matches. This gives us an id_entity_type.

id_cloud | ids_of_tags_in_cloud | id_entity_type

However, now we know nothing of how frequently the tags are used for an entity. We also need to
keep appending the tag ids to a field, which could make it difficult to represent each
individual tag in the model.

At this point, our discussion came about to what clouds actually are. (at this point the
science teacher in me said a collection of water vapour). We wondered about abstracting the
cloud, and trying to collect the frequencies of a tag against the entity type.

So, out with the cloud table (for now?) and in with the following

id_tag_frequency | id_entity_type | id_tag | frequency

This is looking good. We can store the frequencies that a tag has been used on an entity,
and we also get an easy lookup for all tags that have been saved against an entity, which means
that we can abstract our cloud with an sql statement like this

cloud = select e.description as entity_type, t.tag as tag, frequency
from entity_type e, tag_frequency tf, tag t
where e.description = ?
and e.id_entity_type = tf.id_entity_type
and tf.id_tag = t.id_tag;

So, do we need any more information. It is useful to know who first saved a tag for an entity, and a date.
We can add this to our tag_run join table:

id_tag_run | id_run | id_tag | id_user | date

So what we have

tag table:
id_tag | tag

<entity>'s table:
id_<entity> | info about entity

tag_<entity> tables: (one for each entity)
id_tag_<entity> | id_<entity> | id_tag | id_user | date

entity_type dictionary table:
id_entity_type | description | iscurrent(optional)

tag_frequency table:
id_tag_frequency | id_entity_type | id_tag | frequency

and clouds are purely abstract entities created from the data in the tables. We just need code
(SQL, or within your program in another language) to calculate the frequencies.
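Calculating a cloud in the program rather than in SQL might look like this minimal sketch. The in-memory hashes stand in for the tables, the sample ids and tags are made up, and the ordering by frequency is an added assumption:

```ruby
# Hypothetical in-memory copies of the tables above.
entity_types  = { 1 => "run", 2 => "sample" }                  # id_entity_type => description
tags          = { 10 => "failed", 11 => "rerun", 12 => "qc" }  # id_tag => tag
tag_frequency = [                                              # rows of the tag_frequency table
  { :id_entity_type => 1, :id_tag => 10, :frequency => 3 },
  { :id_entity_type => 1, :id_tag => 11, :frequency => 1 },
  { :id_entity_type => 2, :id_tag => 12, :frequency => 7 }
]

# The same join as the SQL above, most-used tags first.
def cloud(description, entity_types, tags, tag_frequency)
  id = entity_types.key(description)
  tag_frequency.select { |tf| tf[:id_entity_type] == id }
               .map { |tf| [tags[tf[:id_tag]], tf[:frequency]] }
               .sort_by { |_tag, frequency| -frequency }
end

p cloud("run", entity_types, tags, tag_frequency)   # [["failed", 3], ["rerun", 1]]
```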

So, in the Clearpress MVC framework (or in Rails) we create a model for each of the tag* tables, with the accessors
for the table, and hey presto: a setup for tagging any entities in our database. You could even tag a tag :)

Wednesday, 12 March 2008

How many tests does it take to change a lightbulb?

I gave this talk at lunchtime today during our Wednesday FooMongers meeting to probably the largest number of people who have attended. It was well received.

It focused on why we should test, and how to go about testing, mostly test-driven development wise.

Rob is putting a video up on YouTube tonight of the talk, so I'll post the link tomorrow.



Slides for a presentation on testing given to foomongers at the EBI/Sanger Insts 2008/03/12

SlideShare Link

Tuesday, 11 March 2008

Scrumptious Tagging

So, as well as writing full annotations for runs, we needed the ability to tag runs (and other entities) with keywords. As we are writing a web-based application, this is very web2.0.

For examples of tagging, look at, facebook or flickr.

We wanted a style very similar to However, with the range of autocomplete/autofill javascript out there in the open source community, I couldn't find anything that mimics the way works.

In a nutshell:

Display tags already given to a run (I'll use this, but replace with entity).
Click to Add tags
Change display to an entry field which has the tags in it, a cloud of all tags which have already been assigned to runs, which you can click to add or remove from the entry field, and they highlight if already in the entry field,
and the clever bit - if the user starts typing into the field, it comes up with suggestions in another revealed field of tags from the cloud, sorted by the frequency the tags have been used, showing the ten most common for the current suggestions, changing as the user types more letters.
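The suggestion step in the last item can be sketched in Ruby; the real code is JavaScript, and the suggest method and sample counts below are hypothetical. It keeps tags that start with what has been typed, ranks them by how often they have been used, and returns the top ten:

```ruby
# Hypothetical ranking of suggestions from tag => usage count data.
def suggest(prefix, frequencies, limit = 10)
  frequencies.select { |tag, _count| tag.start_with?(prefix) }
             .sort_by { |tag, count| [-count, tag] }   # most used first, then alphabetical
             .first(limit)
             .map(&:first)
end

frequencies = { "run" => 5, "rerun" => 9, "rubbish" => 2, "rust" => 5 }
p suggest("ru", frequencies)    # ["run", "rust", "rubbish"]
p suggest("r", frequencies, 2)  # ["rerun", "run"]
```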

(You may have noticed here that we are trying to get the users to essentially try to use the same tags again and again, rather than stick in a hyphen, etc)

My boss already had a two-function JavaScript to toggle the tags in and out from the cloud, and highlight them (with some CSS) if they are already in the text field, but I needed to do the rest. This I managed to sum up in 4 more functions.

With my boss's permission, I have included his 2 functions (crediting him, of course) and I have put the JavaScript on sourceforge ( You can just check out the current svn trunk to obtain the code.

Monday, 25 February 2008

Mocking out to retrieve an email

So, we are using MIME::Lite to generate emails from our scripts/modules. The problem I came across in
testing was:

How do I test the $msg->send() to see that the email I have generated is what was expected?

In Ruby on Rails, the test framework automatically appends outgoing emails to an array, which
you can then test against.

I want something like this in Perl!

Well, it turns out that I can.

After the module using MIME::Lite has been loaded and my object created, do the following before running a method that calls $msg->send(). ($model is an object of the class containing the method that calls $msg->send().)

my $sub = sub {
    my $msg = shift;
    # keep a copy of the rendered email instead of sending it
    push @{$model->{emails}}, $msg->as_string;
};
MIME::Lite->send('sub', $sub);

Now, if you run the method containing $msg->send(), you can access an array in which each element contains the MIME::Lite output of an email generated (and no emails will have been sent, so you won't be spamming people at random :).

If your model method only creates a MIME::Lite object, and you are testing what a script might do with it, an alternative $sub is:

my $sub = sub {
    my $msg = shift;
    # hand the rendered email back to the caller instead of sending it
    return $msg->as_string();
};
MIME::Lite->send('sub', $sub);

Then you can happily do:

my $email = $msg->send();

and test $email to your heart's content, including sending it through MIME::Parser to obtain the headers, attachments, body, etc.

I hope this may be of help for someone.


Friday, 8 February 2008

Contacting Authors and creating SQL

So yesterday I was having a problem. I wanted to send an email using MIME::Lite from Perl,
but in such a way that auto-responders wouldn't reply to it.

I trawled through the RFC to see what it suggested. It is a lot of dry documentation, but I found three useful things:

1) Subjects of automatic replies should have autoreply in them
2) The Auto-Submitted header should be set
3) Auto-responders should ignore anything that comes with a Precedence: list header

The first two I could cope with programmatically (if (x) { ignore it in some way }).
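Here is a minimal sketch of how those checks could look on incoming mail. `looks_automatic` is a hypothetical helper; it assumes the header values have already been extracted (for example from MIME::Parser's output) into a hash keyed by lower-cased header name, and treating `bulk` and `junk` like `list` is my own assumption rather than something from the post.

```perl
use strict;
use warnings;

# Hypothetical helper applying the three rules to an incoming message.
# Assumes headers were already pulled into a hash with lower-cased names.
sub looks_automatic {
    my ($headers) = @_;
    my $subject        = $headers->{'subject'}        // q{};
    my $auto_submitted = $headers->{'auto-submitted'} // q{};
    my $precedence     = $headers->{'precedence'}     // q{};

    # 1) "autoreply" in the subject (also allowing "auto-reply"/"auto reply")
    return 1 if $subject =~ /auto[- ]?reply/i;

    # 2) Auto-Submitted set to anything except "no", which marks a human sender
    return 1 if $auto_submitted =~ /\S/ && $auto_submitted !~ /^\s*no\b/i;

    # 3) Precedence: list (bulk and junk treated the same: an assumption)
    return 1 if $precedence =~ /^\s*(?:list|bulk|junk)\s*$/i;

    return 0;
}

print looks_automatic( { subject => 'AutoReply: Out of Office' } ), "\n";                   # 1
print looks_automatic( { subject => 'Re: run notes', 'auto-submitted' => 'no' } ), "\n";    # 0
```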

The third needed me to change my outgoing mail. So, off to the documentation for MIME::Lite to find out how to set the Precedence header.

Unfortunately, I couldn't find this, and just using MIME::Lite->new(Precedence => 'list'); wasn't working. Give up I did not! I mailed the very nice man who wrote MIME::Lite to ask whether a) it could do this, or b) it would be added in a future release.

Bingo, he mailed me back. Simply do MIME::Lite->new('Precedence:' => 'list'); Hey presto, it worked - brilliant. (My biggest thanks to eryq for his quick, informative response!)

In fact, he mentioned that you can set any header MIME::Lite doesn't know about in this way: give the field name a trailing colon and it is output verbatim. A very useful thing to know indeed.

So, one quick change and a test later, and we have this in svn!

On to the next issue - generating advanced queries in SQL programmatically.

So, I have been set the task this sprint of writing an advanced query page. I have taken this slowly and steadily, as I want to get it right! The first task is just to get it generating multi-table queries automatically, including ones that go through join tables, as a single SELECT statement.

Easy or hard? That is the question.

The way I have looked at it is that I, the wonderful software guru, know the database tables. After all, why wouldn't I? In a search model, I created four new methods:

1) advanced_search - this is to generate the SQL, perform it and return the results
2) search_for - this returns a hash of all the required fields and corresponding tables based on what the user has requested.
3) search_conditions - this returns a hash of the fields being selected on, and the corresponding tables for those. Here, I can add extra WHERE clauses to group together more than one field lookup if needed (which it was for loader)
4) table_links - this returns a hash of all tables that have foreign keys, and the table each foreign key points to (all keys have the format "id_<table>").

With this and some looped code, I have managed to generate some rather complex single SELECT statements in the format of

SELECT DISTINCT tablea.id_tablea, tableb.id_tableb, tablec.comments
FROM tablea, tableb, tablec, tabled
WHERE tablea.position IN (x,y,z)
AND tablea.id_tabled = tabled.id_tabled
AND tabled.id_tablec = tablec.id_tablec
AND tablec.comments LIKE '%good%'
AND tablec.id_tableb = tableb.id_tableb

I have chosen to put DISTINCT in as a default, as sometimes, when x,y,z are all chosen, it can return multiple rows that are the same.
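As a minimal sketch of how looped code like this might work (my own illustration, not the search model's actual methods), the hypothetical `build_select` below takes a table_links-style hash, works out which tables the columns and conditions need, and walks the foreign keys breadth-first so that intermediate tables such as tabled are pulled in automatically. It also assumes values are bound through `?` placeholders rather than pasted into the SQL.

```perl
use strict;
use warnings;

# Hypothetical foreign-key map in the style of table_links: each table lists
# the tables it holds an id_<table> foreign key for.
my %table_links = (
    tablea => ['tabled'],
    tabled => ['tablec'],
    tablec => ['tableb'],
);

# Build one SELECT DISTINCT from requested columns ("table.column") and
# conditions ([table, SQL fragment with ? placeholders, bind values...]).
sub build_select {
    my ( $columns, $conditions, $links ) = @_;

    # Undirected adjacency list; each edge carries its join clause.
    my %adjacent;
    for my $owner ( sort keys %{$links} ) {
        for my $target ( @{ $links->{$owner} } ) {
            my $join = "$owner.id_$target = $target.id_$target";
            push @{ $adjacent{$owner} },  [ $target, $join ];
            push @{ $adjacent{$target} }, [ $owner,  $join ];
        }
    }

    # Tables named by the columns and by the conditions, in order.
    my @wanted = (
        ( map { ( split /\./, $_ )[0] } @{$columns} ),
        ( map { $_->[0] } @{$conditions} ),
    );

    my @tables   = ( $wanted[0] );
    my %in_query = ( $wanted[0] => 1 );
    my @joins;

    for my $goal (@wanted) {
        next if $in_query{$goal};

        # Breadth-first search outwards from the tables already in the query.
        my %came_from = map { ( $_ => undef ) } @tables;
        my @queue     = @tables;
        while ( @queue && !exists $came_from{$goal} ) {
            my $table = shift @queue;
            for my $edge ( @{ $adjacent{$table} || [] } ) {
                my ( $next, $join ) = @{$edge};
                next if exists $came_from{$next};
                $came_from{$next} = [ $table, $join ];
                push @queue, $next;
            }
        }
        die "No foreign-key path to $goal\n" unless exists $came_from{$goal};

        # Walk back along the path, adding intermediate tables and joins.
        my $step = $goal;
        while ( defined $came_from{$step} ) {
            my ( $previous, $join ) = @{ $came_from{$step} };
            push @tables, $step;
            push @joins,  $join;
            $in_query{$step} = 1;
            $step = $previous;
        }
    }

    my @where = ( ( map { $_->[1] } @{$conditions} ), @joins );
    my @binds = map { @{$_}[ 2 .. $#{$_} ] } @{$conditions};
    my $sql = 'SELECT DISTINCT ' . join( ', ', @{$columns} )
        . "\nFROM " . join( ', ', @tables );
    $sql .= "\nWHERE " . join( "\nAND ", @where ) if @where;
    return ( $sql, @binds );
}

my ( $sql, @binds ) = build_select(
    [ 'tablea.id_tablea', 'tableb.id_tableb', 'tablec.comments' ],
    [   [ 'tablea', 'tablea.position IN (?,?,?)', 'x', 'y', 'z' ],
        [ 'tablec', 'tablec.comments LIKE ?', '%good%' ],
    ],
    \%table_links,
);
print "$sql\n", join( ', ', @binds ), "\n";
```

With DBI, `$sql` would go to `$dbh->prepare` and `@binds` to `$sth->execute`. The joins here come after the user's conditions rather than interleaved as in the query above; the database doesn't mind the order.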

This seems to be going quite fast; however, my ever-faithful friend Test::Perl::Critic says that the advanced_search method has a high complexity score, so it definitely needs a bit of refactoring.

However, it seems to be going fairly well. So, refactoring can wait until next week.

Bye for now

Tuesday, 5 February 2008

It's been a (rather busy) while

So, a little while since I last posted. However, much has been done.

We had another release last week, parts of which have needed some tweaking since, ready for the next release. GD doesn't help make pages load quickly! However, a bit of Ajax later, we have a page which loads quickly and lets the graphs be opened on the page afterwards.


Also, I have been experimenting with a script to parse a MIME-format email, store the body as an annotation and take the id from the subject.

Fairly easy. I am using MIME::Lite and MIME::Parser to do the hard work, and the application's API to contact the main models, which in turn handle saving the annotation. You've got to love well-constructed and well-documented APIs.

I have written extensive tests for the main part of the code. I wrote most of it within a module, so that it could be tested easily and reused, with a small script to handle receiving the email and, if anything croaks, doing something with the croak (an email back, unless the message is an Out of Office reply - why do people need to use them? I think we should all start spamming anyone who uses them immediately, so that perhaps they won't in future*).
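The module-plus-small-script split can be sketched like this. Both functions are hypothetical stand-ins (the real code goes through MIME::Parser and the application's API): the "module" croaks when it can't find an id in the subject, and the "script" traps that with eval and only replies when the incoming mail wasn't automatic.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in for the module code: pull a numeric id out of the
# subject and "store" the body against it, croaking if there is no id.
sub store_annotation {
    my ( $subject, $body ) = @_;
    my ($id) = $subject =~ /(\d+)/
        or croak "No id found in subject '$subject'";
    return "stored annotation for $id";
}

# Hypothetical script logic: run the module code inside eval, and turn a
# croak into a reply to the sender unless the incoming mail was automatic.
sub handle_incoming {
    my ( $subject, $body, $is_automatic ) = @_;
    my $result = eval { store_annotation( $subject, $body ) };
    return $result if defined $result;
    return 'error ignored: automatic reply' if $is_automatic;
    return "reply to sender: $@";
}

print handle_incoming( 'Run 42 notes', 'Looks good' ), "\n";
print handle_incoming( 'Out of Office', 'Away until Monday', 1 ), "\n";
```

Keeping the logic in subroutines like these is what makes it testable: a test file can call them directly without sending or receiving any mail.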

Anyway, I've done a fair bit over the past couple of weeks towards new features, or improving existing ones. The learning curve keeps going up. I am now also looking at dynamically creating SQL in order to do advanced searching. The problem, however, is how to store the foreign key pattern across the database. I need to look into whether the models can provide the information they possess, or whether I need to store it in a hash (as at present).

Must dash for now. More soon.

*This is not a serious suggestion. I do not condone spamming in any way. A polite request should be used instead.

Wednesday, 23 January 2008

Adding a graph

So, hands up all those who have 'use(d) GD;'?

Now that includes me!

So, I needed to show a table of numbers in a bar chart format. Quite simply, I had an arrayref of hashrefs, each with the keys date and percentage.

First off, GD::Graph needs the data as an arrayref containing two arrayrefs: the first holding the x-axis values (in this case the dates) and the second the percentages. ($cmap is an in-house object that works out appropriate colours to use.)


use GD::Graph::bars;

my $plot = [[],[]];
foreach my $href (@{$arrayref}) {
    push @{ $plot->[0] }, $href->{date};         # x-axis values
    push @{ $plot->[1] }, $href->{percentage};   # bar heights
}

my $graph = GD::Graph::bars->new(1000,400);
$graph->set(
    'dclrs' => [
        map {
            # in-house $cmap lookup for each colour name (not shown here)
        } (qw(red purple orange blue green yellow magenta cyan))],
    'fgclr' => 'black',
    'boxclr' => 'white',
    'accentclr' => 'black',
    'shadowclr' => 'black',
    'y_long_ticks' => 1,
    'x_label' => 'Date',
    'y_label' => 'Percentage Activity',
) or die $graph->error;
return $graph->plot($plot)->png();

See, I said it was easy. :)

So, of course, now that I have said that, everyone wants their data in a graph format. Luckily, GD provides lots of graph formats. I just wish we could pass the data in as an array of hashrefs, rather than an array of arrays, which I personally think:

1) is messier
2) relies on the end user knowing the order the arrays should be in
3) sends no additional information for legends (should this be needed), whereas with hashes the key could be the legend.
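Until then, a small wrapper can do the conversion so callers only ever deal with hashrefs. This is a hypothetical helper of my own, not part of GD::Graph, and the rows are made-up sample data; the series key names are returned alongside the columns so they could be passed to GD::Graph's `set_legend`.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an arrayref of hashrefs into the
# arrayref-of-arrayrefs layout that GD::Graph's plot() expects.
# $x_key becomes the x-axis; each series key becomes one data set.
sub hashes_to_columns {
    my ( $rows, $x_key, @series_keys ) = @_;
    my @columns;
    for my $key ( $x_key, @series_keys ) {
        push @columns, [ map { $_->{$key} } @{$rows} ];
    }
    return ( \@columns, \@series_keys );
}

# Made-up sample rows in the same shape as the data above.
my $rows = [
    { date => '2008-01-21', percentage => 40 },
    { date => '2008-01-22', percentage => 55 },
    { date => '2008-01-23', percentage => 72 },
];
my ( $plot, $legend ) = hashes_to_columns( $rows, 'date', 'percentage' );
print "@{ $plot->[0] }\n";   # 2008-01-21 2008-01-22 2008-01-23
print "@{ $plot->[1] }\n";   # 40 55 72
```

The caller names the keys once, so the array order is decided in one place; `$plot` can then go straight to `$graph->plot($plot)`.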

Oh well. I shouldn't complain really. I didn't have to write it. Don't you just love the CPAN!

(GD is by Lincoln D. Stein and GD::Graph by Martien Verbruggen, and both are searchable on CPAN. There is lots of additional stuff, written by others, to link them into Template::Toolkit, etc. Have a look!)

FooMongers today. Don't yet know what we'll discuss.


Friday, 18 January 2008

Release of Acme::Hardware::Light::Bulb

OK, so it isn't the biggest thing ever, but Acme::Hardware::Light::Bulb is now in SourceForge svn.

If you browse the SVN repository, you can see a file trunk.tar, which should download the tarball of the trunk. Otherwise, you could just do an svn co of the trunk.

I take no responsibility for anything that happens when you use these modules!

The test coverage, though, is 97.2%, and all the code meets the standards enforced
by Test::Perl::Critic.

You will need Carp and Class::Std.

The modules are

Acme/Hardware/Light/ - which generates a Light Bulb object
Acme/Hardware/Light/Bulb/ - which generates a Change Light Bulb object, with the Light Bulb class as its base class.

Anyway. Yesterday I was moving methods from the View to the Model, as that is where they belong, and today was tests. Coverage of two modules went from 48% to 97%. Hurrah!

Only 3 modules below 50% now, and total is 86% over the main project. We are getting there.

Next week - creating a bar chart on a web page!


Thursday, 17 January 2008

Sprint ended

So, after a frantic week of finishing off the bits, and being handed an extra feature to add in this release,
we deployed yesterday to massive applause.

Well, OK, so that was a bit of a lie. We deployed. I then needed to fix a bug which hadn't shown up in my
test version of the database, which then crashed the server, which then needed another fix.

However, it is done, and we added many new features. Some are ready for when we need to change which group people belong to and the permissions of that group. Some are already in use. Some need a further change in the next sprint to move from a table to a bar chart.

On other things, I was writing Acme::Hardware::Light::Bulb and Acme::Hardware::Light::Bulb::Change, since at PerlMongers last week, Michael and I couldn't find any modules relating to changing a light bulb.

We started it last week, and then I finished it over the course of the week. Complete with 96.7% test coverage, fully passing Test::Perl::Critic, and with POD/distribution tests.

I tried to tarball it and mail it around, but for some reason the tar files wouldn't unpack.

I then went to upload to sourceforge via SVN, and ...

... accidentally deleted the whole directory. AAHHH!

The very thing svn/cvs is supposed to help prevent, I did whilst trying to put it in there. It was my own stupidity, though: I did a mv instead of a cp within my dev area, and then deleting that moved directory to try again did me in. Note to self: NEVER mv a whole directory.

Anyway, I have resurrected the modules and all the critic stuff. I just need to rewrite the tests. However, it is all available on SourceForge should you wish to check it out.

Once completed, I will look to adding it to CPAN.

Enough for now. Back to my main job.


Wednesday, 9 January 2008

1 week left of the sprint to 7.0

So, 1 week left of the current sprint. Deployment of 7.0 is scheduled for next Wednesday.

Increasing a loader's level was not difficult. I put a submit button to "promote" into the user's profile, with the next level as a hidden field. Then, in the tt2 template, a wrapper ensures that the button isn't generated if the viewer is the user him/herself (we don't want users promoting themselves), or if the user is already at "Gold" standard.
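The promotion rule boils down to two small decisions: what the next level is, and whether the button should appear at all. The sketch below expresses them in Perl rather than in the tt2 template; `next_level` and `show_promote_button` are hypothetical names, and treating a loader with no level yet as heading for Bronze is my assumption.

```perl
use strict;
use warnings;

# Levels in order, as described in the post.
my @LEVELS = qw(Bronze Silver Gold);

# Hypothetical helper: the level a "promote" button should submit as its
# hidden field, or undef if there is nowhere further to go.
sub next_level {
    my ($current) = @_;
    return $LEVELS[0] unless defined $current;    # assumption: no level yet
    for my $i ( 0 .. $#LEVELS - 1 ) {
        return $LEVELS[ $i + 1 ] if $LEVELS[$i] eq $current;
    }
    return;    # already Gold (or an unknown level)
}

# Mirrors the template wrapper: no button on your own profile, none at Gold.
sub show_promote_button {
    my ( $viewer_id, $loader_id, $current ) = @_;
    return 0 if $viewer_id == $loader_id;
    return defined next_level($current) ? 1 : 0;
}

print next_level('Silver'), "\n";                 # Gold
print show_promote_button( 1, 7, 'Gold' ), "\n";  # 0
```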

So, on to writing tests to exercise the code. We aren't doing too much test-driven development (SHOCK HORROR!), but rather writing the tests afterwards and using Devel::Cover to ensure coverage is as full as possible. (When FireWatir is improved, I may start looking at it for testing the web interface, but at the moment it is just Test::More unit tests.)

It is still quite amazing how quickly time goes when writing tests. It isn't the sexiest of topics, but it is good to know that the code we are writing can be checked, in case something down the line changes it.

I am opting to write a new test unit for each of my additions, rather than adding the tests to previous units. This may mean some additional time overhead when running the tests, but I have found it easier, when they fail, to see what the problem is from the name of the unit (which is reported at the end of the make test output) as well as from the name of the test itself (assuming you have verbose switched on).

So, over the last 3 working days, I have enabled a loader to be assigned a level, and tested that the level can be found for a user who is in the loading group, and that it is displayed in the right places.

Just need to tidy up a couple of extra options and we should be ready for deployment.

Bye for now

Thursday, 3 January 2008

Database manipulation

So, the time has come to start looking at the SQL in the models.

Simple premise - loaders are tested and move up through the grades to be either Bronze, Silver or Gold.

Now a loader is a user, but a user isn't necessarily a loader. A loader is a user who also belongs to "Loading" group, but he/she can also be in other groups.

So I decided, and got my boss's agreement, that we should add the level to the joining table (user2usergroup) that links id_user and id_usergroup.

This makes it fun, though, as now, programmatically, where do I fetch the level from?

In model user2usergroup - but how do I know the user id? I only want the one!
In model usergroup - but there are many members of the group!
In model user - but the user can be in many groups.

I have opted (today) to go for model user, specifying the id of the usergroup, and writing my SQL to select from the joining table using the known $self->id_user().

It works and, with the templating system, selects the correct medal icon for the level.

Tomorrow, it's on to a new page to assign a level to a loader (assuming the forecast snowfall doesn't hinder me). I do now have the whole thing downloaded onto my MacBook, so I can do some work without an internet connection. (I'd just need to hope that my littl'n will let me, though.)

That's all for today.

Wednesday, 2 January 2008

A new year, a new start

Well, here we are. A new year. This is looking to be the most exciting for me yet, as it's the first one in a new home, and as a fully fledged programmer.

Also, Perl 5.10 is now out there. I don't know what that will entail where I work at the moment, but some
of the new features look good.

Just used Devel::Cover for the first time today. It produces some good results, and has set us up with a plan for a few release cycles from now (if the feature requests ever slow down): better coverage. We did have very good coverage for a while, but things slip when the feature work gets prioritised above the tests. Still, our coverage is very good, with just some room for improvement.

Also, I have discovered the joys (or lack of them) of ssh-ing files into databases over wireless connections. My Mac is still tied up doing this after 4 hours. Fun and games all round, then. I think I'll be leaving that to do its stuff overnight.

Programming today has been little, but necessary. We have agreed a release schedule to try to stick to, to ensure good releases, delivered when expected. Agile development with Scrum! It's the way forward.
Luckily my boss doesn't need to sell this to me, as someone did that last summer anyway. I think it's good, and for any of you out there not doing it: why not? Read some of The Pragmatic Programmer (Hunt and Thomas) and Extreme Programming Explained (Beck) if you need more convincing.

Anyway, end of the day for me now. Back tomorrow.

Info on Perl 5.10 and Devel::Cover can be found via the CPAN.