Perl cache modules performance
Random January 4th, 2010
There are a lot of cache modules available on the great CPAN already. The newest kid on the block is CHI, a Moose based intelligent and flexible caching solution with a very sane API and good design, separating the driver backends from the caching logic as much as possible.
The results show, that Cache::FastMmap is by far the most efficient implementation. Cache::FastMmap (just like memory based caches) is limited to the local host by design, whereas the Memcached based caches allow distributed caching, that can be accessed from various hosts. A distributed cache is a more flexible solution for scaling your application and caching requirements. With CHI using a subcache is possible, and the L1 cache subcache implementation is exactly what can be used to combine a fast local cache with a slower but persistent “across application restarts) cache.
It seems that CHI has a still room for improvement in terms of efficiency, which probably is caused by using Moose and not doing any XS optimizations yet. But still the CHI::L1 combination of memory and memcached is quite efficient, when dealing with a high number of cache reads compared to writes (see 1:100 ratio in the test below).
Of course this benchmark does not mimic any real world scenario (and most notably not yours), but should give some overview of what overhead the caching layer itself poses. Keys used for storing are always 36 character UUID strings. The values used for caching are separated into small, medium and large datasets. Small values are actually random binary UUIDs (16 bytes), the medium dataset the same but with 10 times longer (160 bytes) values, and the large dataset with 100 times the UUID length (1600 bytes). The tests how set/get ratios related always use the medium dataset (160 byte values) and the used ratios are 1:1 (you should probably not do any caching in that situation anyway), 1:10 and 1:100.
The script used to do the benchmarks is attached and is intended to be run with prove -v bench_cache.pl (it’s actually a test useing Test::More and not cleaned up). But if you find some obvious mistakes in generating the benchmarks I would be interested to know.
The script used to generate these results is available here: bench_cache.pl (and is being tuned to allow some graphing of the results)
- The abbriviations used below are listed here:
- CHI:Mc:lIP … CHI::Driver::Memcached::libmemcached over IP
- CHI:Mc:l … CHI::Driver::Memcached::libmemcached over Socket
- CHI:L1 … CHI::Driver::Memcached::libmemcached (IP) with CHI::Driver::Memory L1 cache
- CHI:FMmap … CHI::Driver::FastMmap
- CHI:Mem … CHI::Driver::Memory (with max_size set)
- C:Mc:lIP … Cache::Memcached::libmemcached over IP
- C:Mc:l … Cache::Memcached::libmemcached over Socket
- C:FMmap … Cache::FastMmap
And here are the results being generated on a Dell D830 dual-core laptop using perl-5.10.1 of Debian testing:
Benchmarking caches with ratio 1:10 and small values
Rate CHI:Mc:lIP CHI:Mc:l CHI:L1 CHI:FMmap CHI:Mem C:Mc:lIP C:Mc:l C:FMmap CHI:Mc:lIP 5107/s -- -2% -23% -26% -43% -43% -49% -56% CHI:Mc:l 5219/s 2% -- -22% -25% -41% -42% -48% -55% CHI:L1 6669/s 31% 28% -- -4% -25% -26% -34% -42% CHI:FMmap 6920/s 36% 33% 4% -- -22% -23% -31% -40% CHI:Mem 8885/s 74% 70% 33% 28% -- -1% -12% -23% C:Mc:lIP 8986/s 76% 72% 35% 30% 1% -- -11% -22% C:Mc:l 10087/s 98% 93% 51% 46% 14% 12% -- -12% C:FMmap 11498/s 125% 120% 72% 66% 29% 28% 14% --
Benchmarking caches with ratio 1:10 and medium values
Rate CHI:Mc:lIP CHI:Mc:l CHI:L1 CHI:FMmap C:Mc:lIP CHI:Mem C:Mc:l C:FMmap CHI:Mc:lIP 4628/s -- -10% -30% -30% -46% -47% -55% -59% CHI:Mc:l 5140/s 11% -- -23% -23% -40% -41% -50% -54% CHI:L1 6639/s 43% 29% -- -0% -23% -23% -35% -41% CHI:FMmap 6643/s 44% 29% 0% -- -23% -23% -35% -41% C:Mc:lIP 8615/s 86% 68% 30% 30% -- -1% -15% -23% CHI:Mem 8661/s 87% 69% 30% 30% 1% -- -15% -23% C:Mc:l 10188/s 120% 98% 53% 53% 18% 18% -- -9% C:FMmap 11201/s 142% 118% 69% 69% 30% 29% 10% --
Benchmarking caches with ratio 1:10 and large values
Rate CHI:Mc:lIP CHI:Mc:l CHI:FMmap CHI:L1 CHI:Mem C:Mc:lIP C:Mc:l C:FMmap CHI:Mc:lIP 4139/s -- -5% -28% -28% -45% -47% -56% -60% CHI:Mc:l 4380/s 6% -- -24% -24% -42% -44% -54% -57% CHI:FMmap 5731/s 38% 31% -- -1% -24% -26% -40% -44% CHI:L1 5777/s 40% 32% 1% -- -23% -26% -39% -44% CHI:Mem 7501/s 81% 71% 31% 30% -- -4% -21% -27% C:Mc:lIP 7779/s 88% 78% 36% 35% 4% -- -18% -24% C:Mc:l 9484/s 129% 117% 65% 64% 26% 22% -- -7% C:FMmap 10230/s 147% 134% 78% 77% 36% 32% 8% --
Benchmarking caches with ratio 1:1 and medium values
Rate CHI:L1 CHI:Mem CHI:Mc:lIP CHI:Mc:l CHI:FMmap C:FMmap C:Mc:lIP C:Mc:l CHI:L1 2192/s -- -42% -42% -47% -54% -59% -73% -79% CHI:Mem 3787/s 73% -- -0% -9% -21% -29% -53% -64% CHI:Mc:lIP 3806/s 74% 0% -- -8% -20% -29% -53% -64% CHI:Mc:l 4155/s 90% 10% 9% -- -13% -23% -48% -60% CHI:FMmap 4781/s 118% 26% 26% 15% -- -11% -41% -54% C:FMmap 5368/s 145% 42% 41% 29% 12% -- -33% -49% C:Mc:lIP 8043/s 267% 112% 111% 94% 68% 50% -- -23% C:Mc:l 10441/s 376% 176% 174% 151% 118% 94% 30% --
Benchmarking caches with ratio 1:10 and medium values
Rate CHI:Mc:lIP CHI:Mc:l CHI:L1 CHI:FMmap CHI:Mem C:Mc:lIP C:Mc:l C:FMmap CHI:Mc:lIP 4630/s -- -7% -28% -30% -45% -48% -55% -59% CHI:Mc:l 4953/s 7% -- -23% -25% -42% -44% -52% -56% CHI:L1 6408/s 38% 29% -- -3% -25% -27% -37% -43% CHI:FMmap 6604/s 43% 33% 3% -- -22% -25% -35% -42% CHI:Mem 8493/s 83% 71% 33% 29% -- -4% -17% -25% C:Mc:lIP 8834/s 91% 78% 38% 34% 4% -- -14% -22% C:Mc:l 10218/s 121% 106% 59% 55% 20% 16% -- -10% C:FMmap 11298/s 144% 128% 76% 71% 33% 28% 11% --
Benchmarking caches with ratio 1:100 and medium values
Rate CHI:Mc:lIP CHI:Mc:l CHI:FMmap CHI:L1 C:Mc:lIP CHI:Mem C:Mc:l C:FMmap CHI:Mc:lIP 4626/s -- -10% -34% -44% -47% -53% -56% -64% CHI:Mc:l 5141/s 11% -- -27% -38% -42% -48% -51% -60% CHI:FMmap 7004/s 51% 36% -- -15% -20% -30% -33% -45% CHI:L1 8279/s 79% 61% 18% -- -6% -17% -21% -36% C:Mc:lIP 8799/s 90% 71% 26% 6% -- -12% -16% -32% CHI:Mem 9943/s 115% 93% 42% 20% 13% -- -6% -23% C:Mc:l 10525/s 128% 105% 50% 27% 20% 6% -- -18% C:FMmap 12849/s 178% 150% 83% 55% 46% 29% 22% --
These results were producted with Perl 5.10.1, here is perl -V output for reference:
Summary of my perl5 (revision 5 version 10 subversion 1) configuration: Platform: osname=linux, osvers=2.6.31.6-dsa-ia32, archname=i486-linux-gnu-thread-multi uname='linux murphy 2.6.31.6-dsa-ia32 #1 smp tue nov 10 09:21:59 cet 2009 i686 gnulinux ' config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.10.1 -Dsitearch=/usr/local/lib/perl/5.10.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.10.1 -Dd_dosuid -des' hint=recommended, useposix=true, d_sigaction=define useithreads=define, usemultiplicity=define useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=undef, use64bitall=undef, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64', optimize='-O2 -g', cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include' ccversion='', gccversion='4.3.4', gccosandvers='' intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12 ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=4, prototype=define Linker and Libraries: ld='cc', ldflags =' -fstack-protector -L/usr/local/lib' libpth=/usr/local/lib /lib /usr/lib /usr/lib64 libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt perllibs=-ldl -lm -lpthread -lc -lcrypt libc=/lib/libc-2.10.1.so, so=so, useshrplib=true, libperl=libperl.so.5.10.1 gnulibc_version='2.10.1' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E' cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib -fstack-protector' Characteristics of this binary (from libperl): Compile-time options: MULTIPLICITY PERL_DONT_CREATE_GVSV PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP USE_ITHREADS USE_LARGE_FILES USE_PERLIO USE_REENTRANT_API Built under linux Compiled at Nov 21 2009 22:39:09 @INC: /etc/perl /usr/local/lib/perl/5.10.1 /usr/local/share/perl/5.10.1 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.10 /usr/share/perl/5.10 /usr/local/lib/site_perl
- Category: Bits & Bytes A-Tags: Moose, Perl
- Comments Off on Perl cache modules performance