Perl cache modules performance

January 4th, 2010

There are a lot of cache modules available on the great CPAN already. The newest kid on the block is CHI, a Moose based intelligent and flexible caching solution with a very sane API and good design, separating the driver backends from the caching logic as much as possible.

The results show, that Cache::FastMmap is by far the most efficient implementation. Cache::FastMmap (just like memory based caches) is limited to the local host by design, whereas the Memcached based caches allow distributed caching, that can be accessed from various hosts. A distributed cache is a more flexible solution for scaling your application and caching requirements. With CHI using a subcache is possible, and the L1 cache subcache implementation is exactly what can be used to combine a fast local cache with a slower but persistent “across application restarts) cache.

It seems that CHI has a still room for improvement in terms of efficiency, which probably is caused by using Moose and not doing any XS optimizations yet. But still the CHI::L1 combination of memory and memcached is quite efficient, when dealing with a high number of cache reads compared to writes (see 1:100 ratio in the test below).

Of course this benchmark does not mimic any real world scenario (and most notably not yours), but should give some overview of what overhead the caching layer itself poses. Keys used for storing are always 36 character UUID strings. The values used for caching are separated into small, medium and large datasets. Small values are actually random binary UUIDs (16 bytes), the medium dataset the same but with 10 times longer (160 bytes) values, and the large dataset with 100 times the UUID length (1600 bytes). The tests how set/get ratios related always use the medium dataset (160 byte values) and the used ratios are 1:1 (you should probably not do any caching in that situation anyway), 1:10 and 1:100.

The script used to do the benchmarks is attached and is intended to be run with prove -v bench_cache.pl (it’s actually a test useing Test::More and not cleaned up). But if you find some obvious mistakes in generating the benchmarks I would be interested to know.

The script used to generate these results is available here: bench_cache.pl (and is being tuned to allow some graphing of the results)

    The abbriviations used below are listed here:

  • CHI:Mc:lIP … CHI::Driver::Memcached::libmemcached over IP
  • CHI:Mc:l … CHI::Driver::Memcached::libmemcached over Socket
  • CHI:L1 … CHI::Driver::Memcached::libmemcached (IP) with CHI::Driver::Memory L1 cache
  • CHI:FMmap … CHI::Driver::FastMmap
  • CHI:Mem … CHI::Driver::Memory (with max_size set)
  • C:Mc:lIP … Cache::Memcached::libmemcached over IP
  • C:Mc:l … Cache::Memcached::libmemcached over Socket
  • C:FMmap … Cache::FastMmap

And here are the results being generated on a Dell D830 dual-core laptop using perl-5.10.1 of Debian testing:

Benchmarking caches with ratio 1:10 and small values

              Rate CHI:Mc:lIP CHI:Mc:l CHI:L1 CHI:FMmap CHI:Mem C:Mc:lIP C:Mc:l C:FMmap
CHI:Mc:lIP  5107/s         --      -2%   -23%      -26%    -43%     -43%   -49%    -56%
CHI:Mc:l    5219/s         2%       --   -22%      -25%    -41%     -42%   -48%    -55%
CHI:L1      6669/s        31%      28%     --       -4%    -25%     -26%   -34%    -42%
CHI:FMmap   6920/s        36%      33%     4%        --    -22%     -23%   -31%    -40%
CHI:Mem     8885/s        74%      70%    33%       28%      --      -1%   -12%    -23%
C:Mc:lIP    8986/s        76%      72%    35%       30%      1%       --   -11%    -22%
C:Mc:l     10087/s        98%      93%    51%       46%     14%      12%     --    -12%
C:FMmap    11498/s       125%     120%    72%       66%     29%      28%    14%      --


Benchmarking caches with ratio 1:10 and medium values

              Rate CHI:Mc:lIP CHI:Mc:l CHI:L1 CHI:FMmap C:Mc:lIP CHI:Mem C:Mc:l C:FMmap
CHI:Mc:lIP  4628/s         --     -10%   -30%      -30%     -46%    -47%   -55%    -59%
CHI:Mc:l    5140/s        11%       --   -23%      -23%     -40%    -41%   -50%    -54%
CHI:L1      6639/s        43%      29%     --       -0%     -23%    -23%   -35%    -41%
CHI:FMmap   6643/s        44%      29%     0%        --     -23%    -23%   -35%    -41%
C:Mc:lIP    8615/s        86%      68%    30%       30%       --     -1%   -15%    -23%
CHI:Mem     8661/s        87%      69%    30%       30%       1%      --   -15%    -23%
C:Mc:l     10188/s       120%      98%    53%       53%      18%     18%     --     -9%
C:FMmap    11201/s       142%     118%    69%       69%      30%     29%    10%      --


Benchmarking caches with ratio 1:10 and large values

              Rate CHI:Mc:lIP CHI:Mc:l CHI:FMmap CHI:L1 CHI:Mem C:Mc:lIP C:Mc:l C:FMmap
CHI:Mc:lIP  4139/s         --      -5%      -28%   -28%    -45%     -47%   -56%    -60%
CHI:Mc:l    4380/s         6%       --      -24%   -24%    -42%     -44%   -54%    -57%
CHI:FMmap   5731/s        38%      31%        --    -1%    -24%     -26%   -40%    -44%
CHI:L1      5777/s        40%      32%        1%     --    -23%     -26%   -39%    -44%
CHI:Mem     7501/s        81%      71%       31%    30%      --      -4%   -21%    -27%
C:Mc:lIP    7779/s        88%      78%       36%    35%      4%       --   -18%    -24%
C:Mc:l      9484/s       129%     117%       65%    64%     26%      22%     --     -7%
C:FMmap    10230/s       147%     134%       78%    77%     36%      32%     8%      --


Benchmarking caches with ratio 1:1 and medium values

              Rate CHI:L1 CHI:Mem CHI:Mc:lIP CHI:Mc:l CHI:FMmap C:FMmap C:Mc:lIP C:Mc:l
CHI:L1      2192/s     --    -42%       -42%     -47%      -54%    -59%     -73%   -79%
CHI:Mem     3787/s    73%      --        -0%      -9%      -21%    -29%     -53%   -64%
CHI:Mc:lIP  3806/s    74%      0%         --      -8%      -20%    -29%     -53%   -64%
CHI:Mc:l    4155/s    90%     10%         9%       --      -13%    -23%     -48%   -60%
CHI:FMmap   4781/s   118%     26%        26%      15%        --    -11%     -41%   -54%
C:FMmap     5368/s   145%     42%        41%      29%       12%      --     -33%   -49%
C:Mc:lIP    8043/s   267%    112%       111%      94%       68%     50%       --   -23%
C:Mc:l     10441/s   376%    176%       174%     151%      118%     94%      30%     --


Benchmarking caches with ratio 1:10 and medium values

              Rate CHI:Mc:lIP CHI:Mc:l CHI:L1 CHI:FMmap CHI:Mem C:Mc:lIP C:Mc:l C:FMmap
CHI:Mc:lIP  4630/s         --      -7%   -28%      -30%    -45%     -48%   -55%    -59%
CHI:Mc:l    4953/s         7%       --   -23%      -25%    -42%     -44%   -52%    -56%
CHI:L1      6408/s        38%      29%     --       -3%    -25%     -27%   -37%    -43%
CHI:FMmap   6604/s        43%      33%     3%        --    -22%     -25%   -35%    -42%
CHI:Mem     8493/s        83%      71%    33%       29%      --      -4%   -17%    -25%
C:Mc:lIP    8834/s        91%      78%    38%       34%      4%       --   -14%    -22%
C:Mc:l     10218/s       121%     106%    59%       55%     20%      16%     --    -10%
C:FMmap    11298/s       144%     128%    76%       71%     33%      28%    11%      --


Benchmarking caches with ratio 1:100 and medium values

              Rate CHI:Mc:lIP CHI:Mc:l CHI:FMmap CHI:L1 C:Mc:lIP CHI:Mem C:Mc:l C:FMmap
CHI:Mc:lIP  4626/s         --     -10%      -34%   -44%     -47%    -53%   -56%    -64%
CHI:Mc:l    5141/s        11%       --      -27%   -38%     -42%    -48%   -51%    -60%
CHI:FMmap   7004/s        51%      36%        --   -15%     -20%    -30%   -33%    -45%
CHI:L1      8279/s        79%      61%       18%     --      -6%    -17%   -21%    -36%
C:Mc:lIP    8799/s        90%      71%       26%     6%       --    -12%   -16%    -32%
CHI:Mem     9943/s       115%      93%       42%    20%      13%      --    -6%    -23%
C:Mc:l     10525/s       128%     105%       50%    27%      20%      6%     --    -18%
C:FMmap    12849/s       178%     150%       83%    55%      46%     29%    22%      --

These results were producted with Perl 5.10.1, here is perl -V output for reference:

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
   
  Platform:
    osname=linux, osvers=2.6.31.6-dsa-ia32, archname=i486-linux-gnu-thread-multi
    uname='linux murphy 2.6.31.6-dsa-ia32 #1 smp tue nov 10 09:21:59 cet 2009 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.3.4', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib /usr/lib64
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.10.1.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1
    gnulibc_version='2.10.1'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector'


Characteristics of this binary (from libperl): 
  Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV
                        PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP USE_ITHREADS
                        USE_LARGE_FILES USE_PERLIO USE_REENTRANT_API
  Built under linux
  Compiled at Nov 21 2009 22:39:09
  @INC:
    /etc/perl
    /usr/local/lib/perl/5.10.1
    /usr/local/share/perl/5.10.1
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.10
    /usr/share/perl/5.10
    /usr/local/lib/site_perl

Add benchmarking tests to MooseX::Log::Log4perl to verify overhead

May 20th, 2009

After a while I had the chance to get back to MooseX::Log::Log4perl, which is Role (based on Moose) that can be easily reused in classes requiring logging functionality.

While it is really simple to use, I still found myself often directly using the default logger approach by creating a class variable and using that. So instead of:

use Moose;
with MooseX::Log::Log4perl;
sub whatever {
    my $self = shift;
    $self->log->debug("Here I am") if $self->log->is_debug;
}

mostly the direct logger was used in the classes.

   
use Log::Log4perl;
use vars qw($log);
$log = Log::Log4perl->get_logger(__PACKAGE__);
sub whatever {
    my $self = shift;
    $log->debug("Here I am") if $log->is_debug;
}

One reason was that during that time I optimized for speed and found a hotspot to be the additional method call for the “log” method. As perl has some overhead in calling functions, this still holds true to some extend, so that’s why I added a benchmarking test to the testsuite of MooseX::Log::Log4perl.

So if you have the chance, I’d like to see if in your test environment still the performance limits (keep overhead lower than 5% compared to using Log::Log4perl directly). To run the test simply get the sources and run the test.

cpan> look MooseX::Log::Log4perl
shell# TEST_MAINT=1 prove -l -v t/99_bench.t
t/99bench.t .. 
1..6
ok 1 - Bench instance for MooseX::Log::Log4perl isa BenchMooseXLogLog4perl
ok 2 - Bench instance for Log::Log4perl isa BenchLogLog4perl
                     Rate MooseX-L4p log MooseX-L4p logger Log4perl method Log4perl direct
MooseX-L4p log    21235/s             --               -0%             -4%             -6%
MooseX-L4p logger 21273/s             0%                --             -4%             -6%
Log4perl method   22102/s             4%                4%              --             -2%
Log4perl direct   22535/s             6%                6%              2%              --
...

If all tests pass you stayed within the limits (around 95% compared to using Log4perl directly). I’d like to see your results. So please comment on it and add the comparison table to it.

Thoughts on parametrized roles for Moose

July 15th, 2008

Moose is your favorite meta-object system in your favorite language of choice. And you already have a lot of excellent concepts that extend the basic usage of object-oriented programming, like having simple means to override parts of methods (around, before, after) and of course roles.

Roles are great, I even used them for my logging needs in MooseX::Log::Log4perl as stated earlier. After getting getting feedback by Micheal Schilli to add an easier interface for simple logging needs I had two choices to accomplish that:

  • Add another role (and reuse the initial role) like it is implemented currently in MooseX::Log::Log4perl::Easy:

    package MooseX::Log::Log4perl::Easy;
    use Moose::Role; ### Make it a moose role
    with 'MooseX::Log::Log4perl'; ### Reuse the base role with its attributes and methods
    sub log_fatal { my $self = shift; $self->logger->fatal(@_); }
    sub log_error { my $self = shift; $self->logger->error(@_); }
    ...
  • Use a method alias by using import to have a function returning the correct role to use with with. This is exactly what MooseX::Storage does to allow parametrized loading of moose roles to save some typing for lazy people and improving readability.

    use Moose;
    use MooseX::Storage; ### You have to use it to allow import to provide you the Storage alias
    with Storage('format' => 'JSON', 'io' => 'File'); ### Use the function to return the correct roles

The first approch clutters my module package a little, also requiring more documentation and hinting for people to find the module (more of a problem for lazy people like me).
The downside of the second approach is, that you have to use MooseX::Storage first, to have the Storage function exported. And it does not really look like the standard way of adding a role to the object, which is usually defined using quoted string like with 'MooseX::Log::Log4perl'; (note the quotes here).
A solution to that problem might be adding another keyword function to moose e.g. called role that makes use of some import magic and returns the correct role packages to load, also calling role initialization method, that allows to do some role tricks.

use Moose;
with role 'My::Role'; ### no magic here same as: with 'My::Role';
with role 'MooseX::Log::Log4perl' => ':easy'; ### pass a param
### or even
with role 'MooseX::Log::Log4perl', prefix => 'mylog_'; ### pass the param hash/pair to a role

The role keyword would return the correct role method to load, and additionally allow the role to initialize and use the prefix parameter to do some additional initialization, even if that would mean only setting a “_role_param attribute” that could be used later in default coderefs. Using this, would mean that you cannot use multiple roles with the with role keywords, but that’s ok for me, since perl users are used to that anyway, it’s the same for use.

Bringing logging to Moose with MooseX::Log::Log4perl

July 13th, 2008

Finally after some playing around and discovering the main concepts behind moose and failing to find a logging role using my favorite logging system log4perl I sat down an did a little coding (it is really just a few lines) and uploaded to CPAN as MooseX::Log::Log4perl.

As I already received some valuable feedback by Michael Schilli of Log4perl fame, the interface might change a little (staying backwards compatible) to be easier to use for small projects by using something like with MooseX::Log::Log4perl qw(:easy) to give you methods for log_error, log_warn, log_debug, … directly on your class instance.

If your interested see the open RT ticket 37655.

This module was actually written as part of my other project called QuikPlan, a course scheduling and management web application.

Quikplan – Ein neues Catalyst Perl / jquery javascript Projekt

July 4th, 2008

Mein neues Projekt Quikplan geht an den Start. Gemeinsam mit Chris entsteht eine Kursplanungssoftware die speziell auf die Anforderungen einer Kletterhalle (oder ähnlichem) getrimmt ist.

Als Backend dient Catalyst Perl mit diversen weiteren Modulen (Moose, DBIx::Class, …) getrennt über ein JSON basierendes API. Das Frontend ist eine Web 2.0 Applikation basierend auf jQuery und einen eigens dafür erstellten Framework für einfaches und konsistentes GUI Handling.

Die ersten Erfahrungen mit Moose sind sehr vielversprechend und die erste Klasse (ein TransactionManager als Zwischenlayer zu den DBIC/DBI Transaktionen) ist bereits erstellt.

Für mehr Informationen: www.quikplan.at bzw. ein Mail auf die Reise schicken (info A T quikplan D O T at).

primary