1. Overview
2. Tagged Words
3. Basic Structure Layouts
4. Numbers
5. Procedures, code, and constants
6. Extensions
Larceny bases all its data types on a few basic representations. Type tags accessible to Scheme code distinguish between actual data types which are mapped onto the basic representations.
There are six basic representations: fixnums, immediates, and pointers
to pairs, vector-like structures, bytevector-like structures, and procedures.
Each representation consists of a tagged word and, in the case
of pointer types, heap-allocated memory.
A tagged word has a 2 or 3 bit type tag in the low order bits:
Immediates are further distinguished by more bits in the low-order byte;
they are roughly divided into constants (booleans, empty list, characters,
and miscellaneous system constants) and headers:
The bits marked
The bits marked
Miscellaneous constants include
Bignums (large exact integers) are bytevector-like with the sign in
the first two bytes (0 for positive, 1 for negative), followed by a
digit count (two bytes) and then base-$2^{32}$ digits in the next
words, with the least significant word first; each word is stored
big-endian (figure 2). While bignums cannot be 0 in user code, system
code sometimes creates bignums of value 0 in an internal calculation.
A bignum with value 0 is distinguished by a digitcount of 0; the sign
is immaterial.
The rationale for keeping a digit count which is different from the vector
length is that while (in particular) the multiplication routine must create
a vector whose length is the sum of the digit counts, some of the leading
digits may be 0, and hence we can have a lower digit count without having
to reallocate memory or use a temporary buffer.
Bignums can also be considered with a different logical layout: Each
32-bit digit can be interpreted as two 16-bit digits, also in
big-endian fashion within the word; interpreted this way, the bignum
gets a funny access pattern (figure 3). The digit count is still the
number of 32-bit digits used; see above discussion for sign and
bignums of value 0.
Ratnums (exact rationals), shown in figure 4, are vector-like, with
the first word of the vector being the numerator as a Scheme object
(fixnum or bignum), and the second word being the denominator (greater
than 1), also a Scheme object.
Rectnums (exact complexes), shown in figure 5, look like ratnums,
except that the first word is the real part (an integer or ratnum) and
the second word is the imaginary part (ditto). Both parts are exact
reals, and the imaginary part is nonzero.
Flonums (inexact reals) are bytevector-like. The first word is unused,
and the two next words contain the double in IEEE double precision
format. The rationale for the unused word is this: since everything in
the system is aligned on an 8-byte boundary, one word will be wasted
anyway. By putting it before the number rather than after, the number
becomes 8-byte aligned, and we can use a ``load double'' instruction
to load it. (Figure 6.)
Compnums (inexact complexes) are bytevector-like. The first word is
unused (see the description of the flonum for a rationale). The two
next words contain the real part. The two last words contain the
imaginary part. (Figure 7.) Both parts are IEEE double precision
numbers.
2. Tagged Words
xxxx xxxx xxxx xxxx xxxx xxxx xxxx xx00 fixnum
xxxx xxxx xxxx xxxx xxxx xxxx xxxx xx10 immediate
pppp pppp pppp pppp pppp pppp pppp p001 pointer to pair
pppp pppp pppp pppp pppp pppp pppp p011 pointer to vector struct
pppp pppp pppp pppp pppp pppp pppp p101 pointer to bytevector struct
pppp pppp pppp pppp pppp pppp pppp p111 pointer to procedure struct
0000 0000 0000 0000 0000 0000 0000 0010 #f
0000 0000 0000 0000 0000 0000 0000 0110 #t
0000 0000 0000 0000 0000 0000 0000 1010 empty list
xxxx xxxx xxxx xxxx xxxx xxxx 0001 0110 miscellaneous
0000 0000 cccc cccc 0000 0000 0010 0110 character
0sss ssss ssss ssss ssss ssss 100x xx10 reserved header
0sss ssss ssss ssss ssss ssss 101x xx10 vector-like header
0sss ssss ssss ssss ssss ssss 110x xx10 bytevector-like header
0sss ssss ssss ssss ssss ssss 1111 1110 procedure header
xxx
in the low byte of a header are fields
for type tags: they are used by Scheme code to distinguish between
different data types mapped onto the basic structures. The type tags
can be manipulated with the typetag
and
typetag-set!
primitives.
s
in a header contain the size of the data
structure in bytes, not including the header word, and not including
any padding. For vector-like structures and procedures, the two low
bits of the length field will be 0.
#!unspecified,
,
#!undefined
, and #!eof
.
3. Basic Structure Layouts
The structure layouts are listed below. When it is stated that a
pointer of a particular kind points to a particular location, the
meaning is "the pointer with its tag stripped off".
A header word may under no circumstances be stored in the data area of a
pair, vector-like structure, or procedure structure.
4. Numbers
Fixnums (small exact integers) are unboxed and kept in the high 30
bits of a word, with the two low bits always 0 (figure 1).
+------------------------------+--+
| fixnum |00|
+------------------------------+--+
Figure 1: fixnum.
+------------------------+--------+
| length | hdrtag |
+------------------------+--------+
| sign | digitcount |
+---------------------------------+
| lsd |
+---------------------------------+
...
Figure 2: Bignum with 32-bit bigits.
+------------------------+--------+
| length | hdrtag |
+------------------------+--------+
| sign | digitcount |
+---------------------------------+
| nlsd | lsd |
+---------------------------------+
...
Figure 3: Bignum with 16-bit bigits.
+------------------------+--------+
| vectorlength | hdrtag |
+------------------------+--------+
| numerator |
+---------------------------------+
| denominator |
+---------------------------------+
Figure 4: ratnum.
+------------------------+--------+
| vectorlength | hdrtag |
+------------------------+--------+
| real-part |
+---------------------------------+
| imag-part |
+---------------------------------+
Figure 5: Rectnum.
+------------------------+--------+
| length | hdrtag |
+------------------------+--------+
| unused |
+---------------------------------+
| IEEE double precision |
| |
+---------------------------------+
Figure 6: Flonum.
+------------------------+--------+
| length | hdrtag |
+------------------------+--------+
| unused |
+---------------------------------+
| IEEE double precision |
| (real part) |
+---------------------------------+
| IEEE double precision |
| (imaginary part) |
+---------------------------------+
Figure 8: Compnum.
5. Procedures, code, and constants
As stated earlier, a procedure holds tagged words. Larceny's
procedures are very simple and serve a dual role as closures and ribs in
the static environment.
A procedure structure contains a bytevector pointer in slot 0 (it points
to the code vector), a vector pointer in slot 1 (it points to the
constant vector), a static link (procedure pointer) or #f in slot 2, and
saved register values in slots 3 and up.
A code vector is a plain byte vector; there is no way to distinguish the
two without context.
A constant vector is a plain vector. Data held by a constant vector is
immutable with one exception: a constant vector holds pointers to all
global cells which are referenced by the procedure owning the constant
vector. These cells may be assigned to. In the current implementation,
global cells are pairs where the first element holds the value and the
second element optionally holds some documentation about the variable
(currently its name).
6. Extensions
Larceny provides one new data type. A bytevector is vector
that holds bytes: exact integers in the range 0..255.
$Id: note2-repr.html,v 1.2 1998/11/25 14:38:17 lth Exp $
larceny@ccs.neu.edu