character literal
Syntax
' c-char ' | (1) | |
u8 ' c-char ' | (2) | (since C++17) |
u ' c-char ' | (3) | (since C++11) |
U ' c-char ' | (4) | (since C++11) |
L ' c-char ' | (5) | |
' c-char-sequence ' | (6) |
where
- c-char is either
  - a character from the source character set minus single-quote ('), backslash (\), or the newline character,
  - escape sequence, as defined in escape sequences
  - universal character name, as defined in escape sequences
- c-char-sequence is a sequence of two or more c-chars.
1) narrow character literal or ordinary character literal, e.g. 'a' or '\n' or '\13'. Such a literal has type char and a value equal to the representation of c-char in the execution character set. If c-char is not representable as a single byte in the execution character set, the literal has type int and an implementation-defined value.
2) UTF-8 character literal, e.g. u8'a'. Such a literal has type char (until C++20) or char8_t (since C++20) and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-8 code unit (that is, c-char is in the range 0x0-0x7F, inclusive). If c-char is not representable with a single UTF-8 code unit, the program is ill-formed.
3) UTF-16 character literal, e.g. u'貓', but not u'🍌' (u'\U0001f34c'). Such a literal has type char16_t and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-16 code unit (that is, c-char is in the range 0x0-0xFFFF, inclusive). If c-char is not representable with a single UTF-16 code unit, the program is ill-formed.
4) UTF-32 character literal, e.g. U'貓' or U'🍌'. Such a literal has type char32_t and a value equal to the ISO 10646 code point value of c-char.
5) wide character literal, e.g. L'β' or L'貓'. Such a literal has type wchar_t and a value equal to the value of c-char in the execution wide character set. If c-char is not representable in the execution wide character set (e.g. a non-BMP value on Windows where wchar_t is 16-bit), the value of the literal is implementation-defined.
6) Multicharacter literal, e.g. 'AB', has type int and an implementation-defined value.
Notes
Multicharacter literals were inherited by C from the B programming language. Although this is not specified by the C or C++ standard, most compilers implement multicharacter literals as specified in B: the value of each char in the literal initializes successive bytes of the resulting integer, in big-endian zero-padded right-adjusted order, e.g. the value of '\1' is 0x00000001 and the value of '\1\2\3\4' is 0x01020304.
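The byte packing described above can be sketched as follows. This is only an illustration: pack4 and four_chars are hypothetical names introduced here, not standard functions, and the value of a multicharacter literal remains implementation-defined, so the literal may not match the helper on every compiler. GCC and Clang also emit a -Wmultichar warning for such literals by default.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: packs four byte values in big-endian,
// right-adjusted order, mirroring the B-style rule in the note above.
constexpr std::uint32_t pack4(unsigned char b0, unsigned char b1,
                              unsigned char b2, unsigned char b3) {
    return (static_cast<std::uint32_t>(b0) << 24) |
           (static_cast<std::uint32_t>(b1) << 16) |
           (static_cast<std::uint32_t>(b2) << 8)  |
            static_cast<std::uint32_t>(b3);
}

// A multicharacter literal has type int.
static_assert(std::is_same<decltype('AB'), int>::value,
              "multicharacter literals have type int");

// Hypothetical wrapper returning the literal's value on the current compiler.
// Implementation-defined: compare it against pack4(1, 2, 3, 4) to see
// whether the compiler follows the B-style packing.
constexpr int four_chars() { return '\1\2\3\4'; }
```

Fewer than four characters correspond to leading zero bytes, so pack4(0, 0, 0, 1) models the single-character case '\1'.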
In C, character constants such as 'a' or '\n' have type int, rather than char.
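The types and values listed above can be checked at compile time. The sketch below assumes a compiler with at least C++11 support. The u8 check is guarded because the type of a UTF-8 character literal depends on the language version, and fits_utf8_unit and fits_utf16_unit are hypothetical helpers that only restate the single-code-unit ranges given in items 2 and 3.

```cpp
#include <type_traits>

// Ordinary character literal: type char in C++ (it would be int in C).
static_assert(std::is_same<decltype('a'), char>::value, "'a' is a char");
static_assert(sizeof('a') == 1, "sizeof(char) is 1 by definition");

// UTF-16, UTF-32 and wide character literals.
static_assert(std::is_same<decltype(u'a'), char16_t>::value, "u'a' is char16_t");
static_assert(std::is_same<decltype(U'a'), char32_t>::value, "U'a' is char32_t");
static_assert(std::is_same<decltype(L'a'), wchar_t>::value, "L'a' is wchar_t");

// UTF-8 character literal: char8_t where available, char in C++17.
#if defined(__cpp_char8_t)
static_assert(std::is_same<decltype(u8'a'), char8_t>::value, "u8'a' is char8_t");
#elif __cplusplus >= 201703L
static_assert(std::is_same<decltype(u8'a'), char>::value, "u8'a' is char");
#endif

// The value of a UTF-16 or UTF-32 literal is the code point itself.
static_assert(u'\u8C93' == 0x8C93, "U+8C93 fits in one UTF-16 code unit");
static_assert(U'\U0001F34C' == 0x1F34C, "U+1F34C needs UTF-32");

// Hypothetical helpers restating the single-code-unit ranges.
constexpr bool fits_utf8_unit(char32_t c)  { return c <= 0x7F; }
constexpr bool fits_utf16_unit(char32_t c) { return c <= 0xFFFF; }
```

With these helpers, fits_utf16_unit(U'\U0001F34C') is false, which corresponds to u'\U0001f34c' being ill-formed.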
See also
user-defined literals | literals with user-defined suffix (C++11) |
© cppreference.com
Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.
http://en.cppreference.com/w/cpp/language/character_literal