binary
Module
binary
Module Summary
Library for handling binary data.
Description
This module contains functions for manipulating byte-oriented binaries. Although the majority of functions could be provided using bit-syntax, the functions in this library are highly optimized and are expected to either execute faster or consume less memory, or both, than a counterpart written in pure Erlang.
The module is provided according to Erlang Enhancement Proposal (EEP) 31.
The library handles byte-oriented data. For bitstrings that are not binaries (that is, bitstrings that do not contain whole octets of bits), a badarg
exception is raised by every function in this module.
Data Types
cp()
Opaque data type representing a compiled search pattern. Guaranteed to be a tuple()
to allow programs to distinguish it from non-precompiled search patterns.
part() = {Start :: integer() >= 0, Length :: integer()}
A representation of a part (or range) in a binary. Start
is a zero-based offset into a binary()
and Length
is the length of that part. As input to functions in this module, a reverse part specification is allowed, constructed with a negative Length
, so that the part of the binary begins at Start
+ Length
and is -Length
long. This is useful for referencing the last N
bytes of a binary as {size(Binary), -N}
. The functions in this module always return part()
s with positive Length
.
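The reverse part specification described above can be sketched as follows. The module name part_demo and the helper last_bytes/2 are hypothetical and not part of this module; only binary:part/2 and byte_size/1 are standard calls.

```erlang
-module(part_demo).
-export([last_bytes/2]).

%% Hypothetical helper (not part of the binary module).
%% Returns the last N bytes of Bin by using a reverse part():
%% the part begins at byte_size(Bin) - N and is N bytes long.
last_bytes(Bin, N) when is_binary(Bin), is_integer(N), N >= 0 ->
    binary:part(Bin, {byte_size(Bin), -N}).
```

Under these assumptions, part_demo:last_bytes(<<"abcdef">>, 2) gives the same result as binary:part(<<"abcdef">>, {4, 2}), namely <<"ef">>.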
Exports
at(Subject, Pos) -> byte()
Returns the byte at position Pos
(zero-based) in binary Subject
as an integer. If Pos
>= byte_size(Subject)
, a badarg
exception is raised.
bin_to_list(Subject) -> [byte()]
Same as bin_to_list(Subject, {0,byte_size(Subject)})
.
bin_to_list(Subject, PosLen) -> [byte()]
Converts Subject
to a list of byte()
s, each representing the value of one byte. part()
denotes which part of the binary()
to convert.
Example:
1> binary:bin_to_list(<<"erlang">>, {1,3}).
"rla"
%% or [114,108,97] in list notation.
If PosLen
in any way references outside the binary, a badarg
exception is raised.
bin_to_list(Subject, Pos, Len) -> [byte()]
Same as bin_to_list(Subject, {Pos, Len})
.
compile_pattern(Pattern) -> cp()
Builds an internal structure representing a compilation of a search pattern, later to be used in functions match/3
, matches/3
, split/3
, or replace/4
. The cp()
returned is guaranteed to be a tuple()
to allow programs to distinguish it from non-precompiled search patterns.
When a list of binaries is specified, it denotes a set of alternative binaries to search for. For example, if [<<"functional">>,<<"programming">>]
is specified as Pattern
, this means either <<"functional">>
or <<"programming">>
. The pattern is a set of alternatives; when only a single binary is specified, the set has only one element. The order of alternatives in a pattern is not significant.
The list of binaries used for search alternatives must be flat and proper.
If Pattern
is not a binary or a flat proper list of binaries with length > 0, a badarg
exception is raised.
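A minimal sketch of reusing one compiled pattern across several searches follows. The module name cp_demo and the helper find_all/1 are hypothetical and chosen only for illustration; the pattern is the one used in the description above.

```erlang
-module(cp_demo).
-export([find_all/1]).

%% Hypothetical helper (not part of the binary module).
%% Compiles the set of alternatives once, then reuses the
%% resulting cp() for every subject in the list.
find_all(Subjects) ->
    CP = binary:compile_pattern([<<"functional">>, <<"programming">>]),
    [binary:match(Subject, CP) || Subject <- Subjects].
```

For example, cp_demo:find_all([<<"erlang is functional">>, <<"programming">>, <<"ocaml">>]) returns [{10,10},{0,11},nomatch].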
copy(Subject) -> binary()
Same as copy(Subject, 1)
.
copy(Subject, N) -> binary()
Creates a binary with the content of Subject
duplicated N
times.
This function always creates a new binary, even if N = 1
. By using copy/1
on a binary referencing a larger binary, one can free up the larger binary for garbage collection.
Deliberately copying a single binary to avoid referencing a larger binary can, instead of freeing up the larger binary for later garbage collection, create much more binary data than needed. Sharing binary data is usually good. Deliberate copying is a good idea only in special cases, when small parts reference large binaries and the large binaries are no longer used in any process.
If N
< 0
, a badarg
exception is raised.
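The two uses of copying described above can be sketched as follows. The module name copy_demo and the helpers repeat/0 and detach/1 are hypothetical; the 4-byte head is an arbitrary choice for illustration.

```erlang
-module(copy_demo).
-export([repeat/0, detach/1]).

%% Duplicates the content of a binary three times.
repeat() ->
    binary:copy(<<"ab">>, 3).

%% Hypothetical helper: extracts the first 4 bytes of Large and
%% copies them, so that the returned binary is a new binary and
%% not a part of Large.
detach(Large) when byte_size(Large) >= 4 ->
    <<Head:4/binary, _/binary>> = Large,
    binary:copy(Head).
```

Here copy_demo:repeat() returns <<"ababab">>, and copy_demo:detach(<<"abcdefgh">>) returns <<"abcd">>.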
decode_unsigned(Subject) -> Unsigned
Same as decode_unsigned(Subject, big)
.
decode_unsigned(Subject, Endianness) -> Unsigned
Converts the binary digit representation, in big endian or little endian, of a positive integer in Subject
to an Erlang integer()
.
Example:
1> binary:decode_unsigned(<<169,138,199>>,big).
11111111
encode_unsigned(Unsigned) -> binary()
Same as encode_unsigned(Unsigned, big)
.
encode_unsigned(Unsigned, Endianness) -> binary()
Converts a positive integer to the smallest possible representation in a binary digit representation, either big endian or little endian.
Example:
1> binary:encode_unsigned(11111111, big).
<<169,138,199>>
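To illustrate how the endianness argument affects the byte order, a small round-trip sketch follows. The module name unsigned_demo and the helper roundtrip/2 are hypothetical.

```erlang
-module(unsigned_demo).
-export([roundtrip/2]).

%% Hypothetical helper (not part of the binary module).
%% Encodes N with the given endianness (big or little) and
%% decodes the result again with the same endianness.
roundtrip(N, Endianness) ->
    Bin = binary:encode_unsigned(N, Endianness),
    {Bin, binary:decode_unsigned(Bin, Endianness)}.
```

For the value 258 (16#0102), unsigned_demo:roundtrip(258, big) returns {<<1,2>>,258} and unsigned_demo:roundtrip(258, little) returns {<<2,1>>,258}.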
first(Subject) -> byte()
Returns the first byte of binary Subject
as an integer. If the size of Subject
is zero, a badarg
exception is raised.
last(Subject) -> byte()
Returns the last byte of binary Subject
as an integer. If the size of Subject
is zero, a badarg
exception is raised.
list_to_bin(ByteList) -> binary()
Works exactly as erlang:list_to_binary/1
, added for completeness.
longest_common_prefix(Binaries) -> integer() >= 0
Returns the length of the longest common prefix of the binaries in list Binaries
.
Example:
1> binary:longest_common_prefix([<<"erlang">>, <<"ergonomy">>]).
2
2> binary:longest_common_prefix([<<"erlang">>, <<"perl">>]).
0
If Binaries
is not a flat list of binaries, a badarg
exception is raised.
longest_common_suffix(Binaries) -> integer() >= 0
Returns the length of the longest common suffix of the binaries in list Binaries
.
Example:
1> binary:longest_common_suffix([<<"erlang">>, <<"fang">>]).
3
2> binary:longest_common_suffix([<<"erlang">>, <<"perl">>]).
0
If Binaries
is not a flat list of binaries, a badarg
exception is raised.
match(Subject, Pattern) -> Found | nomatch
Same as match(Subject, Pattern, [])
.
match(Subject, Pattern, Options) -> Found | nomatch
Searches for the first occurrence of Pattern
in Subject
and returns the position and length.
The function returns {Pos, Length}
for the binary in Pattern
, starting at the lowest position in Subject
.
Example:
1> binary:match(<<"abcde">>, [<<"bcde">>, <<"cd">>],[]).
{1,4}
Even though <<"cd">>
ends before <<"bcde">>
, <<"bcde">>
begins first and is therefore the first match. If two overlapping matches begin at the same position, the longest is returned.
Summary of the options:
- {scope, {Start, Length}}
Only the specified part is searched. Return values still have offsets from the beginning of
Subject
. A negative Length
is allowed as described in section Data Types in this manual.
If none of the strings in Pattern
is found, the atom nomatch
is returned.
For a description of Pattern
, see function compile_pattern/1
.
If {scope, {Start,Length}}
is specified in the options such that Start
> size of Subject
, Start
+ Length
< 0 or Start
+ Length
> size of Subject
, a badarg
exception is raised.
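The effect of option scope, including a reverse part specification, can be sketched as follows. The module name scope_demo and the function run/0 are hypothetical; the subject and pattern are chosen only for illustration.

```erlang
-module(scope_demo).
-export([run/0]).

%% Searches the same subject with and without a scope.
%% Offsets in the results are always relative to the start
%% of Subject, not to the start of the scope.
run() ->
    Subject = <<"abcabc">>,
    [binary:match(Subject, <<"bc">>, []),                  % whole subject
     binary:match(Subject, <<"bc">>, [{scope, {2, 4}}]),   % bytes 2..5 only
     binary:match(Subject, <<"bc">>, [{scope, {6, -4}}]),  % same bytes, reverse part
     binary:match(Subject, <<"x">>, [])].                  % pattern not present
```

Calling scope_demo:run() returns [{1,2},{4,2},{4,2},nomatch].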
matches(Subject, Pattern) -> Found
Same as matches(Subject, Pattern, [])
.
matches(Subject, Pattern, Options) -> Found
As match/2
, but Subject
is searched until exhausted and a list of all non-overlapping parts matching Pattern
is returned (in order).
The first and longest match is preferred to a shorter, which is illustrated by the following example:
1> binary:matches(<<"abcde">>, [<<"bcde">>,<<"bc">>,<<"de">>],[]).
[{1,4}]
The result shows that <<"bcde">> is selected instead of the shorter match <<"bc">> (which would have given rise to one more match, <<"de">>). This corresponds to the behavior of POSIX regular expressions (and programs like awk), but is not consistent with alternative matches in re
(and Perl), where instead lexical ordering in the search pattern selects which string matches.
If none of the strings in a pattern is found, an empty list is returned.
For a description of Pattern
, see compile_pattern/1
. For a description of available options, see match/3
.
If {scope, {Start,Length}}
is specified in the options such that Start
> size of Subject
, Start + Length
< 0 or Start + Length
is > size of Subject
, a badarg
exception is raised.
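As a small illustration of the non-overlapping results, the following sketch collects only the start positions of all matches. The module name matches_demo and the helper positions/2 are hypothetical.

```erlang
-module(matches_demo).
-export([positions/2]).

%% Hypothetical helper (not part of the binary module).
%% Returns the start offset of every non-overlapping match
%% of Pattern in Subject, in order.
positions(Subject, Pattern) ->
    [Pos || {Pos, _Len} <- binary:matches(Subject, Pattern)].
```

For example, matches_demo:positions(<<"abcabc">>, <<"bc">>) returns [1,4], and matches_demo:positions(<<"aaaa">>, <<"aa">>) returns [0,2] because the matches do not overlap.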
part(Subject, PosLen) -> binary()
Extracts the part of binary Subject
described by PosLen
.
A negative length can be used to extract bytes at the end of a binary:
1> Bin = <<1,2,3,4,5,6,7,8,9,10>>.
2> binary:part(Bin, {byte_size(Bin), -5}).
<<6,7,8,9,10>>
If PosLen
in any way references outside the binary, a badarg
exception is raised.
part(Subject, Pos, Len) -> binary()
Same as part(Subject, {Pos, Len})
.
referenced_byte_size(Binary) -> integer() >= 0
If a binary references a larger binary (often described as being a subbinary), it can be useful to get the size of the referenced binary. This function can be used in a program to trigger the use of copy/1
. By copying a binary, one can dereference the original, possibly large, binary that a smaller binary is a reference to.
Example:
store(Binary, GBSet) ->
    NewBin =
        case binary:referenced_byte_size(Binary) of
            Large when Large > 2 * byte_size(Binary) ->
                binary:copy(Binary);
            _ ->
                Binary
        end,
    gb_sets:insert(NewBin,GBSet).
In this example, we chose to copy the binary content before inserting it in gb_sets:set()
if it references a binary more than twice the size of the data we want to keep. Of course, different programs call for different rules about when to copy.
Binary sharing occurs whenever binaries are taken apart. This is the fundamental reason why binaries are fast: decomposition can always be done with O(1) complexity. In rare circumstances, however, this data sharing is undesirable, which is why this function together with copy/1
can be useful when optimizing for memory use.
Example of binary sharing:
1> A = binary:copy(<<1>>, 100).
<<1,1,1,1,1 ...
2> byte_size(A).
100
3> binary:referenced_byte_size(A).
100
4> <<_:10/binary,B:10/binary,_/binary>> = A.
<<1,1,1,1,1 ...
5> byte_size(B).
10
6> binary:referenced_byte_size(B).
100
Binary data is shared among processes. If another process still references the larger binary, copying the part this process uses only consumes more memory and does not free up the larger binary for garbage collection. Use intrusive functions of this kind with extreme care and only if a real problem is detected.
replace(Subject, Pattern, Replacement) -> Result
Same as replace(Subject, Pattern, Replacement,[])
.
replace(Subject, Pattern, Replacement, Options) -> Result
Each position in InsPos is an integer() =< byte_size(Replacement)
Constructs a new binary by replacing the parts in Subject
matching Pattern
with the content of Replacement
.
If the matching subpart of Subject
giving rise to the replacement is to be inserted in the result, option {insert_replaced, InsPos}
inserts the matching part into Replacement
at the specified position (or positions) before inserting Replacement
into Subject
.
Example:
1> binary:replace(<<"abcde">>,<<"b">>,<<"[]">>, [{insert_replaced,1}]).
<<"a[b]cde">>
2> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,1}]).
<<"a[b]c[d]e">>
3> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[]">>,[global,{insert_replaced,[1,1]}]).
<<"a[bb]c[dd]e">>
4> binary:replace(<<"abcde">>,[<<"b">>,<<"d">>],<<"[-]">>,[global,{insert_replaced,[1,2]}]).
<<"a[b-b]c[d-d]e">>
If any position specified in InsPos
> size of the replacement binary, a badarg
exception is raised.
Options global
and {scope, part()}
work as for split/3
. The return type is always a binary()
.
For a description of Pattern
, see compile_pattern/1
.
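The difference between replacing only the first match and replacing all matches with option global can be sketched as follows. The module name replace_demo and the function run/0 are hypothetical.

```erlang
-module(replace_demo).
-export([run/0]).

%% Replaces "-" with "+" in the same subject, first without
%% options (only the first match is replaced) and then with
%% option global (every match is replaced).
run() ->
    Subject = <<"a-b-c">>,
    {binary:replace(Subject, <<"-">>, <<"+">>),
     binary:replace(Subject, <<"-">>, <<"+">>, [global])}.
```

Calling replace_demo:run() returns {<<"a+b-c">>,<<"a+b+c">>}.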
split(Subject, Pattern) -> Parts
Same as split(Subject, Pattern, [])
.
split(Subject, Pattern, Options) -> Parts
Splits Subject
into a list of binaries based on Pattern
. If option global
is not specified, only the first occurrence of Pattern
in Subject
gives rise to a split.
The parts of Pattern
found in Subject
are not included in the result.
Example:
1> binary:split(<<1,255,4,0,0,0,2,3>>, [<<0,0,0>>,<<2>>],[]).
[<<1,255,4>>, <<2,3>>]
2> binary:split(<<0,1,0,0,4,255,255,9>>, [<<0,0>>, <<255,255>>],[global]).
[<<0,1>>,<<4>>,<<9>>]
Summary of options:
- {scope, part()}
Works as in match/3 and matches/3. Notice that this only defines the scope of the search for matching strings; it does not cut the binary before splitting. The bytes before and after the scope are kept in the result. See the example below.
- trim
Removes trailing empty parts of the result (as does trim in re:split/3).
- trim_all
Removes all empty parts of the result.
- global
Repeats the split until Subject is exhausted. Conceptually, option global makes split work on the positions returned by matches/3, while it normally works on the position returned by match/3.
Example of the difference between a scope and taking the binary apart before splitting:
1> binary:split(<<"banana">>, [<<"a">>],[{scope,{2,3}}]).
[<<"ban">>,<<"na">>]
2> binary:split(binary:part(<<"banana">>,{2,3}), [<<"a">>],[]).
[<<"n">>,<<"n">>]
The return type is always a list of binaries that are all referencing Subject
. This means that the data in Subject
is not copied to new binaries, and that Subject
cannot be garbage collected until the results of the split are no longer referenced.
For a description of Pattern
, see compile_pattern/1
.
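The effect of options trim and trim_all, combined with global, can be sketched as follows. The module name split_demo and the function run/0 are hypothetical; the comma-separated subject is chosen only for illustration.

```erlang
-module(split_demo).
-export([run/0]).

%% Splits the same subject on "," with three option sets:
%% global alone keeps all empty parts, trim drops only the
%% trailing empty parts, and trim_all drops every empty part.
run() ->
    Subject = <<"a,,b,,">>,
    {binary:split(Subject, <<",">>, [global]),
     binary:split(Subject, <<",">>, [global, trim]),
     binary:split(Subject, <<",">>, [global, trim_all])}.
```

Calling split_demo:run() returns {[<<"a">>,<<>>,<<"b">>,<<>>,<<>>], [<<"a">>,<<>>,<<"b">>], [<<"a">>,<<"b">>]}.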
© 2010–2017 Ericsson AB
Licensed under the Apache License, Version 2.0.