LongVectors {base} | R Documentation
Vectors of 2^31 or more elements were added in R 3.0.0.
Prior to R 3.0.0, all vectors in R were restricted to at most 2^31 - 1 elements and could be indexed by integer vectors.
Currently all atomic (raw, logical, integer, numeric, complex, character) vectors, lists and expressions can be much longer on 64-bit platforms: such vectors are referred to as ‘long vectors’ and have a slightly different internal structure. In theory they can contain up to 2^52 elements, but address space limits of current CPUs and OSes will be much smaller. Such objects will have a length that is expressed as a double, and can be indexed by double vectors.
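As a small illustration of the length and indexing types described above, the following sketch uses only base R and deliberately avoids allocating a long vector, which would need many gigabytes of memory. The variable names are illustrative only.

```r
# Largest value an R integer can hold: 2^31 - 1.
int_max <- .Machine$integer.max

# An ordinary (short) vector reports its length as an integer.
x <- c(10, 20, 30)
short_len_type <- typeof(length(x))   # "integer"

# A count of 2^31 no longer fits in an integer, so it has to be a double.
n_long <- 2^31
fits_in_integer <- n_long <= int_max  # FALSE

# Double indices are accepted on any vector, which is how positions
# beyond 2^31 - 1 are addressed in a long vector.
idx <- 3            # a double, not 3L
third <- x[idx]     # 30
```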
Arrays (including matrices) can be based on long vectors provided each of their dimensions is at most 2^31 - 1: thus there are no 1-dimensional long arrays.
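The dimension rule can be checked before attempting an allocation. The sketch below is a minimal example; needs_long_vector is a hypothetical helper, not part of base R.

```r
# Hypothetical helper: would an nrow x ncol matrix be backed by a long vector?
# Dimensions are converted to double first, because integer multiplication
# overflows to NA (with a warning) beyond 2^31 - 1.
needs_long_vector <- function(nrow, ncol) {
  int_max <- .Machine$integer.max
  stopifnot(nrow <= int_max, ncol <= int_max)  # each dimension must fit
  as.double(nrow) * as.double(ncol) > int_max
}

big   <- needs_long_vector(50000L, 50000L)  # TRUE: 2.5e9 elements
small <- needs_long_vector(40000L, 40000L)  # FALSE: 1.6e9 elements
```

A 1-dimensional array has a single dimension equal to its length, so the limit of 2^31 - 1 per dimension rules out a long 1-dimensional array.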
R code typically only needs minor changes to work with long vectors, maybe only checking that as.integer is not used unnecessarily for e.g. lengths. However, compiled code typically needs quite extensive changes. Note that the .C and .Fortran interfaces do not accept long vectors, so .Call (or similar) has to be used.
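The following sketch shows the kind of R-level change meant here: keeping length arithmetic in doubles rather than coercing to integer. chunk_starts is a hypothetical helper written for this example, and no long vector is actually allocated.

```r
# Hypothetical helper: start positions of consecutive chunks of `size`
# elements covering positions 1..n. All arithmetic is done in doubles, so
# it keeps working when n exceeds 2^31 - 1.
chunk_starts <- function(n, size) {
  n_chunks <- ceiling(n / size)
  1 + size * (seq_len(n_chunks) - 1)
}

n <- 3e9                       # a length only a long vector could have
starts <- chunk_starts(n, 1e9)

# Coercing such a length to integer yields NA (with a warning), which is
# the kind of unnecessary as.integer() call to look for.
bad <- suppressWarnings(as.integer(n))
```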
Because of the storage requirements (a minimum of 64 bytes per character string), character vectors are only going to be usable if they have a small number of distinct elements, and even then factors will be more efficient (4 bytes per element rather than 8). So it is expected that most of the usage of long vectors will be integer vectors (including factors) and numeric vectors.
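To put the per-element figures quoted above in perspective, this rough calculation applies them to the smallest possible long vector. It counts only the per-element storage (8 bytes for a character vector, 4 bytes for a factor) and ignores the strings themselves, the factor levels and object headers.

```r
n <- 2^31                          # smallest long-vector length

gib <- 2^30                        # bytes per GiB
char_pointers_gib <- n * 8 / gib   # 16 GiB of per-element pointers
factor_codes_gib  <- n * 4 / gib   # 8 GiB of integer codes
```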
It is now possible to use m x n matrices with more than 2 billion elements. Whether matrix algebra (including %*%, crossprod, svd, qr, solve and eigen) will actually work is somewhat implementation dependent, including the Fortran compiler used and if an external BLAS or LAPACK is used. An efficient parallel BLAS implementation will often be important to obtain usable performance. For example on one particular platform chol on a 47,000 square matrix took about 5 hours with the internal BLAS, 21 minutes using an optimized BLAS on one core, and 2 minutes using an optimized BLAS on 16 cores.
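The figures in this example can be checked with a little arithmetic. The sketch below assumes the matrix holds doubles (8 bytes per element), which the text does not state explicitly, and treats the quoted timings as approximate.

```r
n <- 47000
elements <- n^2                               # 2.209e9 elements
is_long  <- elements > .Machine$integer.max   # TRUE: needs a long vector
size_gib <- elements * 8 / 2^30               # roughly 16.5 GiB (assuming doubles)

# Approximate timings quoted in the text, in minutes.
t_internal <- 5 * 60
t_opt_1    <- 21
t_opt_16   <- 2

speedup_opt   <- t_internal / t_opt_1    # about 14x from the optimized BLAS
speedup_par   <- t_opt_1 / t_opt_16      # about 10x from using 16 cores
speedup_total <- t_internal / t_opt_16   # about 150x overall
```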