Monday, 12 July 2010

A small journey in Benchmarking

20ish lines of verbose code involving hashes, arrays and grouping, in comparison to a magic piece of regex which does in 3 lines the same thing.


use Benchmark q{:all};

my @lsf_indices = ( 1000,1001,1002,1003,1006,3000,3300,3301,3302,3303,3304,3305,3306,3998,3999,4000,4001,4002);

my %methods = (

regex => sub {
my $array_string = join q{,}, @lsf_indices;
$array_string =~ s/\b(\d+)(,((??{$+ + 1}))\b)+/$1-$+/g;
$array_string = q{[} . $array_string . q{]};
},
verbose => sub {
my ( $previous, $current_working_index );
my %consecutive;

foreach my $index ( @lsf_indices ) {

if ( $previous && ( $index == $previous + 1 ) ) {
push @{ $consecutive{ $current_working_index } }, $index;
$previous = $index;
} else {
$previous = $index;
$current_working_index = $index;
push @{ $consecutive{ $current_working_index } }, $index;
}

}

my @array;
foreach my $index ( sort { $a <=> $b } keys %consecutive ) {

if ( scalar @{ $consecutive{$index} } == 1 ) {

push @array, qq{$consecutive{$index}->[0]};

} else {

my $last = pop @{ $consecutive{$index} };
my $first = shift @{ $consecutive{$index} };
push @array, $first . q{-} . $last;

}

}

my $array_string = q{[} . ( join q{,}, @array ) . q{]};
},

);

cmpthese( 30_000, \%methods);
timethese( 30_000, \%methods);


Result


home$ ./benchmark.pl
Rate regex verbose
regex 6048/s -- -72%
verbose 21898/s 262% --
Benchmark: timing 30000 iterations of regex, verbose...
regex: 5 wallclock secs ( 4.96 usr + 0.01 sys = 4.97 CPU) @ 6036.22/s (n=30000)
verbose: 1 wallclock secs ( 1.37 usr + 0.00 sys = 1.37 CPU) @ 21897.81/s (n=30000)


Verbose lines of code is around 5 times faster. Happy :) Code I can read, and speed benefits as well.

Admittedly, a bit of an exercise, since this isn't really a bottleneck. ;)

4 comments:

Unknown said...

It makes me wonder if there'd be scope in Perl for declaring a function with two implementation bodies; a short simple but possibly slow one, and an optimised hand-written nightmare of code that is more performant. Build a Test module that can switch between them, and write lots of unit tests to assert equivalence of both implementations. Then a casual reader of the code can look at that slow-but-simple one and say "ah OK, I can see what that's doing", but the unit tests assert equivalence of the optimised case.

Illusori said...

Clearer and faster (or would be clearer if I could use pre/code markup...):

single_loop => sub
{
my ( $start_run, $end_run, $ret );

$ret = '';
foreach my $entry ( @lsf_indices )
{
if( defined( $end_run ) )
{
if( $entry == $end_run + 1 )
{
$end_run = $entry;
}
else
{
$ret .= '-' . $end_run if $start_run != $end_run;
$ret .= ',' . $entry;
$start_run = $end_run = $entry;
}
}
else
{
$ret .= $entry;
$start_run = $end_run = $entry;
}
}
$ret .= '-' . $end_run if $start_run != $end_run;
$ret = '[' . $ret . ']';
},

Rate regex verbose single_loop
regex 1649/s -- -68% -91%
verbose 5199/s 215% -- -71%
single_loop 17647/s 970% 239% --

Looping twice rather than once, and losing the sort-order so you need to resort is a big inefficiency.

Unknown said...

Paul, Sounds liek quite a good idea, but I think from my point of view, there would be increased code maintenance, which would make it prohibitive.

Illusori, Thanks for that. I thought there must be a way to do it in one loop, and it's just as clear and even faster, brilliant. I hope you don't mind, but I'm going to stick that in my code.

Illusori said...

Feel free, I wouldn't have posted it if I wanted to keep it to myself. :)

My version isn't that optimal either, there's somewhat more fiddling around with the strings than there needs to be: with a little work you should be able to get it down to a single $ret .= line at the end of each run without much loss in clarity.