Description of Benchmarks
The benchmarks described below are portable R6RS top-level programs. None define any libraries, since the R6RS does not specify any way for a portable program to define libraries. Each description of a benchmark begins with a link to its source code, omitting the part that is shared by all of these benchmarks.
All of these benchmarks, their inputs, and the Unix script used to run them are available online and can be downloaded via anonymous svn checkout.
The timings for Ikarus include just-in-time compilation.
The timings for PLT Scheme include just-in-time assembly
but do not include ahead-of-time compilation to byte code.
The Larceny and Petit Larceny timings do not include
ahead-of-time compilation to machine code.
Compilation time will be excluded for all implementations
that offer a documented process for separate compilation.
This pseudo-benchmark is an aggregate statistic that shows the geometric mean over all benchmarks.
Where other benchmarks display timings in seconds,
the numerical scores for the geometric mean show the
(geometric) average ratio of the system's time to
the fastest system's time.
An average ratio of 1.0 is the lowest possible, and
can be achieved only by a system that is fastest on
every benchmark.
To discourage implementors from pursuing speed at the
expense of compatibility, systems that yield incorrect
results or cannot run a benchmark are arbitrarily
considered to be ten times as slow on that benchmark
as the slowest system that yields correct results.
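As an illustration of the arithmetic (a sketch only, not the suite's actual scoring code), the score for one system could be computed from its list of per-benchmark time ratios like this:

    (import (rnrs))

    ;; Each ratio is one system's time divided by the fastest
    ;; system's time on a single benchmark; penalized runs enter
    ;; the list as ratios computed from the inflated times.
    (define (geometric-mean ratios)
      (exp (/ (fold-left + 0 (map log ratios))
              (length ratios))))

    (geometric-mean '(1.0 2.0 4.0))  ; => 2.0 (up to rounding)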
Gabriel Benchmarks
Browsing a database, a Gabriel benchmark, 1000 iterations.
[May be a test of string->symbol and/or symbol->string.]
Symbolic differentiation, a Gabriel benchmark,
ten million iterations.
Table-driven symbolic differentiation, a Gabriel benchmark,
ten million iterations.
Uses hashtables and association lists instead of the original
benchmark's property lists.
Destructive list operations, a Gabriel benchmark, 1000 iterations
of a 600x50 problem.
Divides 1000 by 2 using lists as a unary notation for integers,
a Gabriel benchmark, one million iterations.
This benchmark tests null?, cons, car, cdr, and little else.
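A sketch of the usual formulation (illustrative; the Gabriel benchmark's code may differ in details): the integer n is represented by a list of length n, and halving walks the list two pairs at a time.

    (import (rnrs))

    ;; Build a list of length n, i.e. n in unary notation.
    (define (create-n n)
      (do ((n n (- n 1))
           (a '() (cons '() a)))
          ((= n 0) a)))

    ;; Halve an even unary number by keeping every other pair.
    (define (iterative-div2 l)
      (do ((l l (cddr l))
           (a '() (cons (car l) a)))
          ((null? l) a)))

    (length (iterative-div2 (create-n 1000)))  ; => 500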
This benchmark is the same as diviter except that it uses deep recursion instead of iteration.
Combinatorial search of a state space, a Gabriel benchmark, 500 iterations.
A test of arrays and classical compiler optimizations.
This benchmark was originally written in Pascal by Forrest Baskett.
Another combinatorial search similar to puzzle, a Gabriel benchmark, 50 iterations.
A triply recursive integer function related to the Takeuchi function,
a Gabriel benchmark.
10 iterations of (tak 32 16 8).
A test of non-tail calls and arithmetic.
[Historical note:
The Symbolics 3600 performed 1 iteration of (tak 18 12 6)
in 0.43 seconds using generic arithmetic.
On our test machine, Larceny runs that benchmark in 0.00083 seconds.
That's 500 times as fast.]
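For reference, the function being timed is the conventional definition below (the benchmark program itself adds the iteration harness):

    (import (rnrs))

    (define (tak x y z)
      (if (not (< y x))
          z
          (tak (tak (- x 1) y z)
               (tak (- y 1) z x)
               (tak (- z 1) x y))))

    (tak 18 12 6)  ; => 7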
The tak:32:16:8 benchmark using lists to represent integers, a Gabriel benchmark (with different arguments), 2 iterations.
The takl benchmark contains a peculiar boolean expression. Rewriting that expression into a
more readable idiom allows some compilers to generate better
code for it.
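The expression in question is the list-shortening comparison; in the usual formulation it looks something like the first definition below, and the second is the more readable equivalent (both are sketches, not the benchmark's exact code):

    (import (rnrs))

    ;; The peculiar form, close to the original Common Lisp:
    (define (shorterp x y)
      (and (not (null? y))
           (or (null? x)
               (shorterp (cdr x) (cdr y)))))

    ;; A more readable equivalent:
    (define (shorterp/cond x y)
      (cond ((null? y) #f)
            ((null? x) #t)
            (else (shorterp/cond (cdr x) (cdr y)))))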
The tak:32:16:8 benchmark in continuation-passing style, 5 iterations.
A test of closure creation.
The tak:32:16:8 benchmark in continuation-capturing style, 1 iteration.
A test of call-with-current-continuation.
[Larceny's code for call-with-current-continuation is now written in C, and most of its time on this benchmark is spent crossing the Scheme/C barrier.]
Numerical Benchmarks
Doubly recursive computation of the 40th Fibonacci number (102334155), using (< n 2) to terminate the recursion; 1 iteration.
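The code being timed is essentially the classic doubly recursive definition:

    (import (rnrs))

    (define (fib n)
      (if (< n 2)
          n
          (+ (fib (- n 1)) (fib (- n 2)))))

    (fib 40)  ; => 102334155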
A version of fib that uses first-class continuations;
written by Kent Dybvig. Calculates the 30th Fibonacci number
(832040) 10 times.
Calculation of the 35th Fibonacci number using inexact numbers;
10 iterations.
A test of floating point arithmetic.
Uses essentially the same code as the fib benchmark.
Sums the integers from 0 to 10000, 100000 iterations.
Sums the integers from 0 to 1e6, 250 iterations.
A test of floating point arithmetic.
Uses essentially the same code as the sum benchmark.
Fast Fourier Transform on 65536 real-valued points, 50 iterations.
A test of floating point arithmetic.
Generation of a Mandelbrot set, 1000 iterations on a problem of
size 75.
A test of floating point arithmetic on reals.
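The kernel of such a benchmark is an escape-time iteration; a sketch using only real (flonum) arithmetic (illustrative, not the benchmark's exact code):

    (import (rnrs))

    ;; Escape-time count for the point cr + ci*i.
    (define (iterations cr ci max-count)
      (let loop ((zr 0.0) (zi 0.0) (c 0))
        (cond ((= c max-count) c)
              ((> (+ (* zr zr) (* zi zi)) 4.0) c)
              (else (loop (+ (- (* zr zr) (* zi zi)) cr)
                          (+ (* 2.0 zr zi) ci)
                          (+ c 1))))))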
Same as the mbrot benchmark, but using complex instead of real arithmetic.
Generation of a Mandelbrot set, 1000 iterations on a problem of
size 75.
A test of floating point arithmetic.
Determination of a nucleic acid's spatial structure, 50 iterations.
A test of floating point arithmetic, and a real program.
Testing to see whether a point is contained within a 2-dimensional
polygon, 500000 iterations (with 12 tests per iteration).
A test of floating point arithmetic.
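The name suggests W. Randolph Franklin's PNPOLY routine, which counts how many polygon edges a horizontal ray from the point crosses; the point is inside when the count is odd. A sketch of that technique (the benchmark's actual code may differ in details):

    (import (rnrs))

    ;; xs and ys are vectors holding the vertex coordinates.
    (define (point-in-polygon? xs ys px py)
      (let ((n (vector-length xs)))
        (let loop ((i 0) (j (- n 1)) (inside #f))
          (if (= i n)
              inside
              (let ((xi (vector-ref xs i)) (yi (vector-ref ys i))
                    (xj (vector-ref xs j)) (yj (vector-ref ys j)))
                (loop (+ i 1)
                      i
                      ;; Flip the flag when edge j->i straddles the
                      ;; ray and the crossing lies right of the point.
                      (if (and (not (eq? (> yi py) (> yj py)))
                               (< px (+ xi (/ (* (- xj xi) (- py yi))
                                              (- yj yi)))))
                          (not inside)
                          inside)))))))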
Kernighan and Van Wyk Benchmarks
Brian W. Kernighan and Christopher J. Van Wyk wrote a set of small benchmarks to compare the performance of several scripting languages, including
C and Scheme. Marc Feeley and I modified some of these benchmarks
to correct bugs and to increase the number of iterations.
When I translated them into R6RS Scheme, I rewrote most of
them into slightly more idiomatic Scheme.
A version of the Ackermann function, with arguments 3 and 12.
One iteration.
This benchmark allocates, initializes, and copies some fairly
large one-dimensional arrays. 100 iterations on a problem
size of one million.
This tests string-append and substring, and very little else. 10 iterations on a problem size of 500000.
This benchmark reads and sums 100,000 floating point numbers
ten times. It is primarily a test of floating point input.
This file-copying benchmark is a simple test of character i/o.
It copies the King James Bible 25 times.
Same as cat except that it uses UTF-8 transcoding instead of Latin-1.
Same as cat except that it uses UTF-16 transcoding instead of Latin-1.
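In R6RS terms, the three variants differ only in the codec supplied to make-transcoder when the ports are opened; a sketch (the file name is illustrative):

    (import (rnrs))

    ;; Open a textual input port over filename with the given codec.
    (define (open-input-with codec filename)
      (open-file-input-port filename
                            (file-options)
                            (buffer-mode block)
                            (make-transcoder codec)))

    ;; (open-input-with (latin-1-codec) "kjv.txt")  ; as in cat
    ;; (open-input-with (utf-8-codec)   "kjv.txt")  ; the UTF-8 variant
    ;; (open-input-with (utf-16-codec)  "kjv.txt")  ; the UTF-16 variant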
This benchmark performs considerable character i/o.
It prints the King James Bible verse by verse, in reverse
order of the verses, ten times.
Another character i/o benchmark.
It counts the number of words in the King James Bible
25 times.
More Input/Output Benchmarks
This synthetic benchmark tests the read procedure on all 1-character inputs and on all 2-character inputs that begin with #\a.
Since most such inputs are illegal, this is largely a test
of R6RS exception handling.
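A sketch of the pattern being exercised (illustrative, not the benchmark's exact code): each candidate string is read from a string port inside a guard form, so that lexical violations are caught rather than aborting the run.

    (import (rnrs))

    ;; Read one datum from s, yielding the symbol error when the
    ;; input is illegal.
    (define (try-read s)
      (guard (exn (#t 'error))
        (read (open-string-input-port s))))

    (try-read "#\\a")  ; => the character a
    (try-read "(")     ; => error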
Reads nboyer.sch 2500 times using Latin-1 transcoding.
Reads nboyer.sch 2500 times using UTF-8 transcoding.
Reads nboyer.sch 2500 times using UTF-16 transcoding.
Other Benchmarks
Uses eq? hashtables to find the words that occur most frequently in the King James Bible.
Uses symbol-hash hashtables to find the words that occur most frequently in the King James Bible.
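The kernel of both benchmarks is a counting update; a sketch using an eq? hashtable keyed by symbols (illustrative; the benchmarks' actual code may differ):

    (import (rnrs))

    (define counts (make-eq-hashtable))

    ;; Increment the count for word w, an interned symbol.
    (define (count-word! w)
      (hashtable-update! counts w (lambda (n) (+ n 1)) 0))

The symbol-hash variant would instead create its table with (make-hashtable symbol-hash eq?).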
A compiler kernel that looks as though it were written by Marc Feeley.
1000 iterations on a 47-line input.
[Although Larceny/IA32 is able to run this benchmark,
Larceny/SPARC cannot compile it due to its assumption
that stack frames are smaller than 4096 bytes.]
A type checker written by Jim Miller, 200 iterations.
Dynamic type inference, self-applied, 200 iterations.
Written by Fritz Henglein.
A real program.
Earley's parsing algorithm, parsing a 15-symbol input according to one
of the simplest ambiguous grammars, 1 iteration.
A real program, applied to toy data whose exponential behavior
leads to a peak heap size of half a gigabyte or more.
This program was provided by Andrew Wright, but we don't know much
about it, and would appreciate more information.
This higher order program creates closures almost as often as it
performs non-tail procedure calls.
One iteration on a problem of size 7.
Another program that was provided by Andrew Wright,
though it may have been written by Jim Miller.
It enumerates the order-preserving maps between finite lattices.
10 iterations.
Another program that was provided by Andrew Wright.
Computes maximal matrices; similar to some puzzle programs.
1000 iterations on a problem of size 5.
Constructs a maze on a hexagonal grid, 5000 iterations.
Written by Olin Shivers.
Constructs a maze on a rectangular grid using purely functional style,
5000 iterations on a problem of size 11.
Written by Marc Feeley.
Computes the number of solutions to the 13-queens problem,
10 times.
Computes the number of paraffins that have 23 carbon atoms,
5 times.
Parses the nboyer benchmark 1000 times using a scanner and parser generated by Will Clinger's LexGen and ParseGen.
Partial evaluation of Scheme code, 1000 iterations.
Written by Marc Feeley.
A bignum-intensive benchmark that calculates digits of pi.
Computes the primes less than 1000, 5000 times, using
a list-based Sieve of Eratosthenes.
Written by Eric Mohr.
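A list-based sieve along these lines can be written as follows (a sketch; the benchmark's actual code may differ):

    (import (rnrs))

    ;; The list (m m+1 ... n).
    (define (interval m n)
      (if (> m n) '() (cons m (interval (+ m 1) n))))

    ;; Keep the head of the list and strike out its multiples.
    (define (sieve l)
      (if (null? l)
          '()
          (let ((p (car l)))
            (cons p
                  (sieve (filter (lambda (x) (not (zero? (mod x p))))
                                 (cdr l)))))))

    (sieve (interval 2 999))  ; => the primes less than 1000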
This is a quicksort benchmark.
(That isn't as obvious as it sounds. The quicksort
benchmark distributed with Gambit is a bignum benchmark,
not a quicksort benchmark. See the comments in
the code.)
Sorts a vector of 10000 random integers 2500 times.
Written by Lars Hansen, and restored to its original glory
by Will Clinger.
Ray tracing a simple scene, 20 iterations.
A test of floating point arithmetic.
This program is translated from the Common Lisp code in
Example 9.8 of Paul Graham's book on ANSI Common Lisp.
A Scheme interpreter evaluating a merge sort of 30 strings,
100000 iterations.
Written by Marc Feeley.
Simplex algorithm, one million iterations.
A test of floating point arithmetic, and a real program.
Scheme to LaTeX processor, 100 iterations.
A test of file i/o and probably much else.
Part of a real program written by Dorai Sitaram.
Garbage Collection Benchmarks
An updated and exponentially scalable version of the boyer benchmark. The nboyer benchmark's data structures are considerably more appropriate than those used in the original boyer benchmark.
These timings are for 1 iteration on
a problem of size 4.
A test of lists, vectors, and garbage collection.
A version of nboyer that has been tuned (by Henry
Baker) to reduce storage allocation, making it less of a garbage collection
benchmark and more of a compiler benchmark. Only 4 lines of code were
changed, and another 7 lines of code were added.
These timings are for 1 iteration on a problem of size 5.
This program was written to mimic the phase structure that has been
conjectured for a class of application programs for which garbage
collection may represent a significant fraction of the execution time.
This benchmark warms up by allocating and then dropping a large
binary tree.
Then it allocates a large permanent tree and a permanent array
of floating point numbers.
Then it allocates considerable tree storage in seven phases,
increasing the tree size in each phase but keeping the total
storage allocation approximately the same for each phase.
Each phase is divided into two subphases. The first subphase allocates
trees top-down using side effects, while the second subphase allocates
trees bottom-up without using side effects.
This benchmark was written in Java by John Ellis and Pete Kovac,
modified by Hans Boehm, and translated into Scheme, Standard ML,
C++, and C by William Clinger.
The timings shown are for 1 iteration on problem size 20.
The mperm20:9:2:1 benchmark is a severe test of storage allocation and garbage collection.
At the end of each of the 20 iterations, the oldest half of the
live storage becomes garbage.
This benchmark is particularly difficult for generational garbage
collectors, since it violates their assumption that young objects
have a shorter future life expectancy than older objects.
The perm9 benchmark distributed with Gambit does not have that property.
A synthetic garbage collection benchmark written by David Detlefs
and translated to Scheme by Will Clinger and Lars Hansen.
Synthetic Benchmarks for R6RS
The R6RS adds several new features that are not tested by
older benchmarks because they were not a standard part of
Scheme. Most of the following synthetic benchmarks were
derived from Larceny's test suites for these features.
This benchmark tests the R6RS equal? predicate on some fairly large structures of various shapes.
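One of the new requirements being exercised is that R6RS equal? must terminate even on cyclic data. A minimal illustration (not the benchmark itself):

    (import (rnrs) (rnrs mutable-pairs))

    ;; A circular list (a b c a b c ...).
    (define (circular-abc)
      (let ((x (list 'a 'b 'c)))
        (set-cdr! (cddr x) x)
        x))

    (equal? (circular-abc) (circular-abc))  ; => #t, and it terminates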
This benchmark runs all of the Unicode 5.0.0 tests
of string normalization.
This benchmark tests conversions between bytevectors
and Unicode.
This benchmark tests the list-sort procedure. Otherwise it is the same as the vecsort benchmark.
This benchmark tests the vector-sort procedure. Otherwise it is the same as the listsort benchmark.
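Both procedures come from the (rnrs sorting) library; minimal usage looks like this (illustrative):

    (import (rnrs))

    (list-sort < '(3 1 2))          ; => (1 2 3)
    (vector-sort < (vector 3 1 2))  ; => #(1 2 3)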
This is a synthetic benchmark designed to stress hashtables.
Last updated 20 July 2008.