memCompress
In-memory Compression and Decompression
Description
In-memory compression or decompression for raw vectors.
Usage
memCompress(from, type = c("gzip", "bzip2", "xz", "none"))
memDecompress(from,
              type = c("unknown", "gzip", "bzip2", "xz", "none"),
              asChar = FALSE)
Arguments
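from | A raw vector. For memCompress, a character vector will be converted to a raw vector with character strings separated by "\n". |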
type | character string, the type of compression. May be abbreviated to a single letter, defaults to the first of the alternatives. |
asChar | logical: should the result be converted to a character string? NB: character strings have a limit of 2^31 - 1 bytes, so raw vectors should be used for large inputs. |
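As a minimal sketch of the type and asChar arguments (the variable names msg, z1, z2, as_raw and as_char are illustrative only), the following compresses a short string using an abbreviated type and then decompresses it both as a raw vector and as a character string:

```r
msg <- "compress me, compress me, compress me"

# "g" is matched to "gzip", so both calls request the same compression
z1 <- memCompress(msg, "g")
z2 <- memCompress(msg, "gzip")
identical(z1, z2)

# Default result is a raw vector; asChar = TRUE converts it to a string
as_raw  <- memDecompress(z1, "g")
as_char <- memDecompress(z1, "g", asChar = TRUE)
is.raw(as_raw)
identical(rawToChar(as_raw), as_char)
identical(as_char, msg)
```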
Details
type = "none" passes the input through unchanged, but may be useful if type is a variable.
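One way type = "none" can be useful is when the same code path has to handle every type, including no compression. The sketch below (the names x, type and z are illustrative) loops over all four types and checks the round trip for each:

```r
x <- charToRaw(paste(rep("abc", 100), collapse = ""))

for (type in c("none", "gzip", "bzip2", "xz")) {
  z <- memCompress(x, type)
  # With type = "none" the raw vector comes back unchanged
  if (type == "none") stopifnot(identical(z, x))
  # Decompressing with the same type recovers the input
  stopifnot(identical(memDecompress(z, type), x))
}
```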
type = "unknown" attempts to detect the type of compression applied (if any): this will always succeed for bzip2 compression, and will succeed for other forms if there is a suitable header. It will auto-detect the ‘magic’ header ("\x1f\x8b") added to files by the gzip program (and to files written by gzfile), but memCompress does not add such a header. (It supports RFC 1950 format, sometimes known as ‘zlib’ format, for compression and decompression and RFC 1952 for decompression only.)
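The absence of the gzip ‘magic’ header in the output of memCompress can be checked directly. In this sketch (variable names are illustrative) the first two bytes are compared with "\x1f\x8b", and the type is then given explicitly for decompression rather than relying on auto-detection:

```r
x <- charToRaw(paste(rep("hello world", 50), collapse = " "))
z <- memCompress(x, "gzip")

# First two bytes of the compressed stream
z[1:2]

# They are not the gzip file magic 1f 8b
has_gzip_magic <- identical(z[1:2], as.raw(c(0x1f, 0x8b)))
has_gzip_magic

# Naming the type explicitly avoids depending on header detection
identical(memDecompress(z, "gzip"), x)
```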
gzip compression uses whatever is the default compression level of the underlying library (usually 6).
bzip2 compression always adds a header ("BZh"). The underlying library only supports in-memory (de)compression of up to 2^31 - 1 elements. Compression is equivalent to bzip2 -9 (the default).
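Because the bzip2 header is always present, the default type = "unknown" of memDecompress can be relied on for such data. A small sketch, with illustrative variable names:

```r
x <- charToRaw(paste(rep("some repetitive text", 40), collapse = "\n"))
b <- memCompress(x, "bzip2")

# The compressed stream starts with the three-byte header
header <- rawToChar(b[1:3])
header

# No type given: the header is used to detect bzip2
identical(memDecompress(b), x)
```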
Compressing with type = "xz" is equivalent to compressing a file with xz -9e (including adding the ‘magic’ header): decompression should cope with the contents of any file compressed by xz version 4.999 and later, as well as by some versions of lzma. There are other versions, in particular ‘raw’ streams, that are not currently handled.
All the types of compression can expand the input: for "gzip" and "bzip2" the maximum expansion is known and so memCompress can always allocate sufficient space. For "xz" it is possible (but extremely unlikely) that compression will fail if the output would have been too large.
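Expansion is easiest to see on a very small input, where the fixed overhead of each format outweighs any saving. The sketch below (tiny and sizes are illustrative names) compares compressed lengths with the 10-byte input:

```r
tiny <- as.raw(1:10)

# Compressed length in bytes for each type
sizes <- vapply(c("gzip", "bzip2", "xz"),
                function(type) length(memCompress(tiny, type)),
                integer(1))
sizes

# Every compressed form is longer than the 10-byte input
all(sizes > length(tiny))
```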
Value
A raw vector or a character string (if asChar = TRUE).
See Also
extSoftVersion for the versions of the zlib, bzip2 and xz libraries in use.
https://en.wikipedia.org/wiki/Data_compression for background on data compression, https://zlib.net/, https://en.wikipedia.org/wiki/Gzip, http://www.bzip.org/, https://en.wikipedia.org/wiki/Bzip2, https://tukaani.org/xz/ and https://en.wikipedia.org/wiki/Xz for references about the particular schemes used.
Examples
txt <- readLines(file.path(R.home("doc"), "COPYING"))
sum(nchar(txt))
txt.gz <- memCompress(txt, "g")
length(txt.gz)
txt2 <- strsplit(memDecompress(txt.gz, "g", asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt2))
txt.bz2 <- memCompress(txt, "b")
length(txt.bz2)
## can auto-detect bzip2:
txt3 <- strsplit(memDecompress(txt.bz2, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))
## xz compression is only worthwhile for large objects
txt.xz <- memCompress(txt, "x")
length(txt.xz)
txt3 <- strsplit(memDecompress(txt.xz, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))
Copyright (©) 1999–2012 R Foundation for Statistical Computing.
Licensed under the GNU General Public License.