Vampire Software Development: A small journey in Benchmarking

Monday, 12 July 2010

A small journey in Benchmarking

About 20 lines of verbose code involving hashes, arrays and grouping, compared with a magic piece of regex that does the same thing in 3 lines. Both collapse runs of consecutive indices into ranges, so 1000,1001,1002,1003 becomes 1000-1003.


use Benchmark q{:all};

my @lsf_indices = ( 1000,1001,1002,1003,1006,3000,3300,3301,3302,3303,3304,3305,3306,3998,3999,4000,4001,4002);

my %methods = (

  regex => sub {
    my $array_string = join q{,}, @lsf_indices;
    $array_string =~ s/\b(\d+)(,((??{$+ + 1}))\b)+/$1-$+/g;
    $array_string = q{[} . $array_string . q{]};
  },
  verbose => sub {
      my ( $previous, $current_working_index );
      my %consecutive;

       foreach my $index ( @lsf_indices ) {

         if ( defined $previous && ( $index == $previous + 1 ) ) {
           push @{ $consecutive{ $current_working_index } }, $index;
           $previous = $index;
         } else {
           $previous = $index;
           $current_working_index = $index;
           push @{ $consecutive{ $current_working_index } }, $index;
         }

       }

       my @array;
       foreach my $index ( sort { $a <=> $b } keys %consecutive ) {

         if ( scalar @{ $consecutive{$index} } == 1 ) {

           push @array, qq{$consecutive{$index}->[0]};

         } else {

           my $last  = pop   @{ $consecutive{$index} };
           my $first = shift @{ $consecutive{$index} };
           push @array, $first . q{-} . $last;

         }

       }

       my $array_string = q{[} . ( join q{,}, @array ) . q{]};
     },

);

cmpthese(  30_000, \%methods);
timethese( 30_000, \%methods);

Result


home$ ./benchmark.pl 
           Rate   regex verbose
regex    6048/s      --    -72%
verbose 21898/s    262%      --
Benchmark: timing 30000 iterations of regex, verbose...
     regex:  5 wallclock secs ( 4.96 usr +  0.01 sys =  4.97 CPU) @ 6036.22/s (n=30000)
   verbose:  1 wallclock secs ( 1.37 usr +  0.00 sys =  1.37 CPU) @ 21897.81/s (n=30000)

The verbose code is around 3.6 times faster (21898/s against 6048/s). Happy :) Code I can read, and speed benefits as well.

Admittedly, a bit of an exercise, since this isn't really a bottleneck. ;)
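
The benchmark only times the subs and never shows what they return, so here is a minimal sketch that prints the string for the sample indices. It follows the grouping approach of the verbose sub, but the sub name `collapse_ranges` and the argument-passing style are my own assumptions for illustration, not part of the original script. It tests `defined $previous` so that an index of 0 can take part in a run.

```perl
use strict;
use warnings;

# Hypothetical helper: same grouping idea as the "verbose" sub, but the
# indices are passed in and the finished string is returned.
sub collapse_ranges {
    my @indices = @_;
    my ( $previous, $run_start, %runs );

    foreach my $index (@indices) {
        # Start a new run unless this index directly follows the previous one.
        $run_start = $index
            unless defined $previous && $index == $previous + 1;
        push @{ $runs{$run_start} }, $index;
        $previous = $index;
    }

    my @parts;
    foreach my $start ( sort { $a <=> $b } keys %runs ) {
        my $end = $runs{$start}->[-1];    # last index collected for this run
        push @parts, $start == $end ? $start : "$start-$end";
    }
    return q{[} . join( q{,}, @parts ) . q{]};
}

my @lsf_indices = ( 1000 .. 1003, 1006, 3000, 3300 .. 3306, 3998 .. 4002 );
print collapse_ranges(@lsf_indices), "\n";
# prints [1000-1003,1006,3000,3300-3306,3998-4002]
```

Comparing each benchmarked sub's return value against a known string like this before timing it is a cheap way to make sure the candidates are actually doing the same job.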

4 comments:

Unknown said...: It makes me wonder if there'd be scope in Perl for declaring a function with two implementation bodies; a short simple but possibly slow one, and an optimised hand-written nightmare of code that is more performant. Build a Test module that can switch between them, and write lots of unit tests to assert equivalence of both implementations. Then a casual reader of the code can look at that slow-but-simple one and say "ah OK, I can see what that's doing", but the unit tests assert equivalence of the optimised case.; 12 July 2010 at 09:53
Illusori said...: Clearer and faster (or would be clearer if I could use pre/code markup...):

single_loop => sub
{
    my ( $start_run, $end_run, $ret );

    $ret = '';
    foreach my $entry ( @lsf_indices )
    {
        if( defined( $end_run ) )
        {
            if( $entry == $end_run + 1 )
            {
                $end_run = $entry;
            }
            else
            {
                $ret .= '-' . $end_run if $start_run != $end_run;
                $ret .= ',' . $entry;
                $start_run = $end_run = $entry;
            }
        }
        else
        {
            $ret .= $entry;
            $start_run = $end_run = $entry;
        }
    }
    $ret .= '-' . $end_run if $start_run != $end_run;
    $ret = '[' . $ret . ']';
},

               Rate regex verbose single_loop
regex        1649/s    --    -68%        -91%
verbose      5199/s  215%      --        -71%
single_loop 17647/s  970%    239%          --

Looping twice rather than once, and losing the sort order so that you need to re-sort, is a big inefficiency.; 12 July 2010 at 09:55
Unknown said...: Paul, sounds like quite a good idea, but from my point of view the increased code maintenance would make it prohibitive.

Illusori, Thanks for that. I thought there must be a way to do it in one loop, and it's just as clear and even faster, brilliant. I hope you don't mind, but I'm going to stick that in my code.; 13 July 2010 at 00:34
Illusori said...: Feel free, I wouldn't have posted it if I wanted to keep it to myself. :)

My version isn't that optimal either, there's somewhat more fiddling around with the strings than there needs to be: with a little work you should be able to get it down to a single $ret .= line at the end of each run without much loss in clarity.; 13 July 2010 at 02:37
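
One way the "single $ret .= line at the end of each run" idea might look is sketched below. This is only my reading of the suggestion, not Illusori's code: the sub name `collapse_single_append` and the trailing `undef` sentinel are assumptions made for the example, and it has not been benchmarked against the versions above.

```perl
use strict;
use warnings;

# Hypothetical variant of the single-loop approach: $ret is appended to
# in exactly one place, when a run has just ended.
sub collapse_single_append {
    my @indices = @_;
    my ( $start_run, $end_run );
    my $ret = q{};

    # The trailing undef is a sentinel, so the final run is flushed by
    # the same code path as every other run.
    foreach my $entry ( @indices, undef ) {
        if ( defined $end_run && defined $entry && $entry == $end_run + 1 ) {
            $end_run = $entry;    # still inside the current run
            next;
        }
        if ( defined $end_run ) {
            # A run has ended: add a separator if needed, then the run.
            $ret .= ( length($ret) ? q{,} : q{} )
                . ( $start_run == $end_run ? $start_run : "$start_run-$end_run" );
        }
        $start_run = $end_run = $entry;    # begin the next run (or finish)
    }
    return q{[} . $ret . q{]};
}

print collapse_single_append(
    1000 .. 1003, 1006, 3000, 3300 .. 3306, 3998 .. 4002
), "\n";
# prints [1000-1003,1006,3000,3300-3306,3998-4002]
```

The sentinel costs one extra loop iteration but removes the duplicated flush after the loop, which is where the extra string fiddling in the single_loop version comes from.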
