Hi Keith,
Thank you for posting this.
I noticed that the newer drafts of C23
(N2912 onwards, I think) have replaced the term "trap representation"
with "non-value representation":
If so, what happens to the 254 trap representations that GCC and Clang reserve for `_Bool`? Assuming a width of 1, each of those 254 object
representations represents a value in `_Bool`'s domain (the half whose
value bit is 1 represents the value `true`, while the other half whose
value bit is 0 represents the value `false`), so they cannot be thought
of as non-value representations (since a non-value representation must
be an object representation that **does not** represent a value of the
object type).
On 2025-01-17, m137 <learningcpp1@gmail.com> wrote:
Hi Keith,
Thank you for posting this.
When, where? No attribution; referenced article is expired from this
Eternal September server, which has decently long retention times.
I noticed that the newer drafts of C23
(N2912 onwards, I think) have replaced the term "trap representation"
with "non-value representation":
That is correct. Probably because "trap representation" insinuates
that such a representation *must* produce a trap, or else the
implementation has no right to specify such a representation.
Implementations are not obliged to produce traps in relation to
non-value representations. Since the behaviors in question are
undefined, they may do so.
If so, what happens to the 254 trap representations that GCC and Clang
reserve for `_Bool`? Assuming a width of 1, each of those 254 object
GCC and Clang specify trap representations for _Bool? Where is this
found in their documentation?
representations represents a value in `_Bool`'s domain (the half whose
value bit is 1 represents the value `true`, while the other half whose
value bit is 0 represents the value `false`), so they cannot be thought
of as non-value representations (since a non-value representation must
be an object representation that **does not** represent a value of the
object type).
In an integer type, it is indeed possible for the padding bits to be
nonzero, without changing the value given by the value bits.
However, how that works is not specified; it's up to an implementation,
and doesn't have to be documented.
An implementation could say that the padding bits don't mean anything;
they can have any value whatsoever and so the situation is as you
say: the bool representations with a 0 in the value bit are all false,
and those with a 1 are all true.
However, an implementation can also say that certain patterns of
bits are non-value representations.
One example given is the possibility of parity bits. Suppose some
integer type has one padding bit which behaves as a parity bit. Then
suppose whenever that bit has incorrect parity, the representation is
deemed a non-value representation.
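A minimal sketch of the parity idea, assuming a made-up 8-bit integer type with 7 value bits (bits 0..6) and bit 7 as an even-parity padding bit. The layout and the names `has_even_parity`, `is_non_value` and `value_of` are hypothetical; no real implementation is being described.

```c
#include <stdbool.h>

/* Hypothetical layout: bits 0..6 are value bits, bit 7 is a padding
   bit holding even parity over the whole byte. */

/* Returns true if the low 8 bits contain an even number of 1 bits. */
static bool has_even_parity(unsigned char rep)
{
    unsigned ones = 0;
    for (unsigned bit = 0; bit < 8; bit++)
        ones += (rep >> bit) & 1u;
    return (ones % 2u) == 0;
}

/* Under the assumed rule, a byte with wrong parity is a non-value
   representation; reading it as the type would be undefined. */
static bool is_non_value(unsigned char rep)
{
    return !has_even_parity(rep);
}

/* Value encoded by a valid representation: the padding bit is ignored. */
static unsigned value_of(unsigned char rep)
{
    return rep & 0x7Fu;
}
```

With this rule, 0x81 is a valid representation of the value 1, while 0x01 (same value bits, wrong parity) is a non-value representation.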
With regard to bool (say, one implemented in 8 bits), an implementation
can assert that if there is a nonzero value in any padding bit, the
result is a non-value representation. Then, only 0 and 1 are valid;
all other byte codes are non-value representations.
Implementations determine their own rules for how configurations of
padding bits may, on their own, or in interaction with configurations
of value bits, give rise to non-value representations.
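The strict rule for an 8-bit bool could be sketched like this, assuming bit 0 is the single value bit and bits 1..7 are padding. The classification works on plain bytes, so it never reads a bool object; the names are hypothetical.

```c
#include <stdbool.h>

/* Assumed layout for illustration: bit 0 is the value bit,
   bits 1..7 are padding bits. */
enum { VALUE_BIT_MASK = 0x01, PADDING_MASK = 0xFE };

/* Strict rule: any nonzero padding bit makes the byte a non-value
   representation, leaving only 0x00 and 0x01 valid. */
static bool strict_is_non_value(unsigned char rep)
{
    return (rep & PADDING_MASK) != 0;
}

/* Counts how many of the 256 byte patterns the strict rule rejects. */
static int strict_non_value_count(void)
{
    int count = 0;
    for (int rep = 0; rep <= 0xFF; rep++)
        if (strict_is_non_value((unsigned char)rep))
            count++;
    return count;
}
```

Under this rule `strict_non_value_count()` comes out as 254, which is where the figure in the question comes from.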
Hi Keith,
Thank you for posting this.
I noticed that the newer drafts of C23
(N2912 onwards, I think) have replaced the term "trap representation"
with "non-value representation":
- **Trap representation** was last defined in
[N2731 3.19.4(1)](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf#page=)
as "an object representation that need not represent a value of the
object type."
- **Non-value representation** is most recently defined in
[N3435 3.26(1)](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3435.pdf#page=23)
as "an object representation that does not represent a value of the
object type."
The definition of non-value representation rules out object
representations that represent a value of the object type from
being non-value representations. So it seems to be stricter than
the definition of trap representation, which does not seem to rule
out such object representations from being trap representations.
Is this interpretation correct?
If so, what happens to the 254 trap representations that GCC and
Clang reserve for `_Bool`?
Assuming a width of 1, each of those 254 object representations
represents a value in `_Bool`'s domain (the half whose
value bit is 1 represents the value `true`, while the other half whose
value bit is 0 represents the value `false`), so they cannot be thought
of as non-value representations (since a non-value representation must
be an object representation that **does not** represent a value of the
object type).
I've been stuck on this for quite some time, so would be grateful for
any guidance you could provide.
learningcpp1@gmail.com (m137) writes:
Hi Keith,
Thank you for posting this.
The message being referred to is one I posted Sun 2021-05-23, with
Message-ID <87tums515a.fsf@nosuchdomain.example.com>. It's visible on
Google Groups at <https://groups.google.com/g/comp.lang.c/c/4FUlV_XkmXg/m/OG8WeUCfAwAJ>.
As others have suggested, please include attribution information when
posting a followup. You don't need to quote the entire message,
but provide at least some context, particularly when the parent
message is old.
This is an update to that message.
I noticed that the newer drafts of C23
(N2912 onwards, I think) have replaced the term "trap representation"
with "non-value representation":
- **Trap representation** was last defined in [N2731 3.19.4(1)]
(https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf#page=)
as "an object representation that need not represent a value of the
object type."
- **Non-value representation** is most recently defined in
[N3435 3.26(1)]
(https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3435.pdf#page=23)
as "an object representation that does not represent a value of the
object type."
The definition of non-value representation rules out object
representations that represent a value of the object type from
being non-value representations. So it seems to be stricter than
the definition of trap representation, which does not seem to rule
out such object representations from being trap representations.
Is this interpretation correct?
I don't believe so. As far as I can tell, a "non-value
representation" (C23 and later) is exactly the same thing as a
"trap representation" (C17 and earlier). The older term was
probably considered unclear, since it could imply that a trap is
required. In fact, reading an object with a trap/non-value
representation has undefined behavior, which can include yielding
the value you might have expected.
If so, what happens to the 254 trap representations that GCC and
Clang reserve for `_Bool`?
I see no evidence in gcc's documentation that gcc treats
representations other than 0 or 1 as trap/non-value representations.
I see only two references to "trap representation", one for signed
integer types (saying that there are no trap representations) and
one regarding type-punning via unions. There are no relevant
references to "padding bits".
I'm less familiar with clang's documentation, but I see no reference
to "trap representation" or "non-value representation".
We can get some information about this by running a test program.
See below.
Assuming a width of 1, each of those 254
object representations represents a value in `_Bool`'s domain (the
half whose value bit is 1 represents the value `true`, while the
other half whose value bit is 0 represents the value `false`), so
they cannot be thought of as non-value representations (since a
non-value representation must be an object representation that
**does not** represent a value of the object type).
Reading an object with a non-value representation has undefined
behavior. If the observed value happens to be a valid value of
the object's type, that's still consistent with undefined
behavior. *Everything* is consistent with undefined behavior.
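Since reading the object as a bool is the step that may be undefined, one way to stay on safe ground is to look only at its bytes. A minimal sketch, assuming the caller passes a pointer to `sizeof(bool)` bytes; the helper name `holds_canonical_bool` is made up:

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the bytes at p match the representation the
   compiler itself stores for false or for true. Only the bytes are
   examined (memcmp works on unsigned char); nothing is read as bool. */
static bool holds_canonical_bool(const void *p)
{
    const bool f = false;
    const bool t = true;
    return memcmp(p, &f, sizeof(bool)) == 0 ||
           memcmp(p, &t, sizeof(bool)) == 0;
}
```

A check like this says nothing about whether other byte patterns are non-value representations; it only tells you the bytes are ones whose meaning is not in doubt.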
I've been stuck on this for quite some time, so would be grateful
for any guidance you could provide.
Editions of the C standard earlier than C23 were not entirely
clear about the representation of _Bool. (C90 does not have _Bool
or bool. C99 through C17 have _Bool as a keyword, with bool as
a macro defined in <stdbool.h>. C23 has bool as a keyword, with
_Bool as an alternate spelling.)
In C99 and later, _Bool/bool is required to be an unsigned integer
type large enough to hold the values 0 and 1. Its size must be at
least CHAR_BIT bits (which is at least 8). The *rank* of _Bool is
less than the rank of all other standard integer types.
The rank implies that the range of values is a subset of the
range of values of any other unsigned integer type. The rank does
*not* imply anything about relative sizes. unsigned char has a
higher rank than bool, but bool could have additional padding bits
making sizeof(bool)>1. (Probably no implementation does this.)
unsigned char has no padding bits.
C11 implies that _Bool can have more than one value bit, which
means it could represent values greater than 1 (but nothing outside
the range 0..UCHAR_MAX).
C23 (I'm using the N3096 draft) tightens the requirements, saying
that bool has exactly one value bit and (sizeof(bool)*CHAR_BIT)-1
padding bits -- again implying that sizeof(bool) might be greater
than 1, but forbidding values greater than 1.
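The arithmetic can be written down directly. This is a small sketch: `BOOL_WIDTH` is the C23 macro from <limits.h>; for earlier editions the code simply assumes a width of 1, which is typical but not guaranteed. `SKETCH_BOOL_WIDTH` and `bool_padding_bits` are names invented here.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Width of bool in value bits. C23 defines BOOL_WIDTH; before C23 this
   sketch ASSUMES a width of 1. */
#ifdef BOOL_WIDTH
#define SKETCH_BOOL_WIDTH BOOL_WIDTH
#else
#define SKETCH_BOOL_WIDTH 1
#endif

/* Number of padding bits in a bool object:
   total bits in the object minus the value bits. */
static size_t bool_padding_bits(void)
{
    return sizeof(bool) * CHAR_BIT - SKETCH_BOOL_WIDTH;
}
```

On an implementation with sizeof(bool) == 1 and CHAR_BIT == 8 this gives 7 padding bits.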
Typically in C17 and earlier, and always in C23, _Bool/bool will
have exactly 1 value bit and CHAR_BIT-1 padding bits. Padding bits
do not contribute to the value of an object (so 0 and 1 are the
only possible values), but non-zero padding bits *may or may not*
create trap/non-value representations. (A gratuitously exotic
implementation might use a representation other than 00000001 for
true, but 00000000 is guaranteed to be a representation for 0/false.)
As far as I can tell, the standard is silent on whether a bool object
with non-zero padding bits is a trap/non-value representation or not.
I wrote a test program to explore how bool is treated. It uses
memcpy to set the representation of a bool object and then prints
the value of that object. Source is at the bottom of this message.
If bool has no non-value representations, then the values of the
CHAR_BIT-1 padding bits must be ignored when reading a bool object,
and the value of such an object is determined only by its single
value bit, 0 or 1. If it does have non-value representations,
then reading such an object has undefined behavior.
With gcc 14.2.0, with "-std=c23", all-zeros is treated as false
when used in a condition and all other representations are treated
as true. Converting the value of a bool object to another integer
type yields the value of its full 8-bit representation. If a bool
object holds a representation other than 00000000 or 00000001,
it compares equal to both `true` and `false`.
This implies that bool has 1 value bit and 7 padding bits (as
required by C23) and that it has 2 value representations and 254
trap representations. The observed behavior for the non-value
representations is the result of undefined behavior. (gcc -std=c23
sets __STDC_VERSION__ to 202000L, not 202311L. The documentation
acknowledges that support for C23 is experimental and incomplete.)
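Purely as an illustration of how output like this can be self-consistent, here is a model that works on plain bytes. It assumes the generated code trusts the byte to be 0 or 1: a truth test checks for nonzero, a conversion copies the byte, `== true` reuses the byte, and `== false` flips bit 0. That is a guess at one plausible explanation, not something taken from gcc's documentation or source, and all the names are hypothetical.

```c
#include <stdbool.h>

/* Model of code that trusts the byte to be 0 or 1. ASSUMPTION: this is
   one plausible explanation for the observed output, not a description
   of what gcc actually emits. */

/* Truth test: any nonzero byte is taken as true. */
static bool model_truthy(unsigned char rep) { return rep != 0; }

/* Conversion to another integer type: the byte is copied unchanged. */
static unsigned model_convert(unsigned char rep) { return rep; }

/* b == true: for a byte known to be 0 or 1 this is just b itself. */
static bool model_eq_true(unsigned char rep) { return rep != 0; }

/* b == false: for a byte known to be 0 or 1 this is b with bit 0
   flipped, then tested for nonzero. */
static bool model_eq_false(unsigned char rep) { return (rep ^ 1u) != 0; }
```

For the bytes 0 and 1 the model behaves like ordinary bool. For a byte such as 2, both `model_eq_true` and `model_eq_false` return true, matching the "equal to both" result.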
With clang 19.1.4, with "-std=c23", the behavior is consistent
with bool having no non-value representations. The 7 padding bits
do not contribute to the value of a bool object. Any bool object
with 0 as the low-order bit is treated as false in a condition and
yields 0 when converted to another integer type. Any bool object
with 1 as the low-order bit is treated as true, and yields 1 when
converted to another integer type. I presume the intent is for bool
to have 256 value representations and no non-value representations
(with the padding bits ignored as required), but it's also consistent
with bool having non-value representations and the observed behavior
being undefined. It's not possible to determine with a test program
whether the output is the result of undefined behavior or not.
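For contrast, the "padding bits are ignored" reading can be modelled on plain bytes as follows. This only restates the observed behavior under the assumption that bit 0 is the value bit; it is not a claim about clang's internals, and the function names are made up.

```c
#include <stdbool.h>

/* ASSUMED layout: bit 0 is the value bit, bits 1..7 are padding. */

/* Truth value under the lenient reading: only bit 0 matters. */
static bool low_bit_truthy(unsigned char rep)
{
    return (rep & 1u) != 0;
}

/* Conversion to another integer type yields exactly 0 or 1. */
static unsigned low_bit_convert(unsigned char rep)
{
    return rep & 1u;
}
```

Under this reading 0xFE behaves as false and 0xFF as true, so all 256 byte patterns act as value representations.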
As far as I can tell, the question of whether bool has non-value
representations is unspecified but not implementation-defined,
meaning that an implementation is not required to document its
choice.
#include <stdio.h>
#include <string.h>
#include <limits.h>
#if __STDC_VERSION__ < 202311L
#include <stdbool.h>
#endif

/* Note: the memcpy calls below copy 1 byte, so this program assumes
   sizeof (bool) == 1. */
int main(void) {
    printf("__STDC_VERSION__ = %ldL\n", __STDC_VERSION__);
#if __STDC_VERSION__ < 202311L
    puts("Older than C23, using <stdbool.h>");
#else
    puts("C23 or later, using bool directly");
#endif
    printf("sizeof (unsigned char) = %zu, sizeof (bool) = %zu\n",
           sizeof (unsigned char), sizeof (bool));

    /* Show how the implementation represents false and true. */
    const bool no = false;
    const bool yes = true;
    unsigned char uc;
    memcpy(&uc, &no, 1);
    printf("false is represented as %d\n", (int)uc);
    memcpy(&uc, &yes, 1);
    printf("true is represented as %d\n", (int)uc);

    /* Store every possible byte value into a bool and read it back.
       Reading b has undefined behavior if the byte is a non-value
       representation. */
    for (int i = 0; i <= UCHAR_MAX; i ++) {
        const unsigned char uc = i;
        bool b;
        memcpy(&b, &uc, 1);
        const unsigned char value = b;
        printf("uc = 0x%02x b = 0x%02x b is %s, b%sfalse, b%strue\n",
               (unsigned)uc,
               (unsigned)value,
               b ? "truthy" : "falsy ",
               b == false ? "==" : "!=",
               b == true ? "==" : "!=");
    }
}
learningcpp1@gmail.com (m137) writes:
Hi Keith,
Thank you for posting this.
Normally followup postings include a reference of some sort to the
article being replied to.
I noticed that the newer drafts of C23
(N2912 onwards, I think) have replaced the term "trap representation"
with "non-value representation":
- **Trap representation** was last defined in
[N2731 3.19.4(1)](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf#page=)
as "an object representation that need not represent a value of the
object type."
- **Non-value representation** is most recently defined in
[N3435 3.26(1)](https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3435.pdf#page=23)
as "an object representation that does not represent a value of the
object type."
The definition of non-value representation rules out object
representations that represent a value of the object type from being
non-value representations. So it seems to be stricter than the
definition of trap representation, which does not seem to rule out such
object representations from being trap representations. Is this
interpretation correct?
No. Except for using a different name, there is no difference
between "trap representation" and "non-value representation".
Let's assume 8-bit chars, and also that the width of _Bool is 1
(which is optional before C23 and required in C23). Here is what
can be said about the 256 states of a _Bool object.
1. All zero bits must be a legal value for 0.
2. There must be at least one combination of bits that is a legal
value for 1 (and since it must be distinct from the all-zero
value for 0, must have at least one bit set to 1).
3. The remaining 254 possible combinations of bit settings can be
any mixture of legal values and trap representations, which are also
known as non-value representations starting in C23.
4. Considering the set of legal value bit settings, there must be at
least one bit position that is 0 in all cases where the value is
0, and is 1 in all cases where the value is 1.
5. Accessing any representation corresponding to a legal value has
well-defined behavior, and yields 0 or 1 depending on the setting of
the bit (or bits) mentioned in #4.
6. Accessing any trap/non-value representation is undefined behavior
and might do anything at all. It might appear to work. It might
work in some cases but not others. It might yield a value that is
neither 0 nor 1. It might abort the program. It might cause the
computer the program is running on to run a different operating
system (of course this outcome isn't very likely, but as far as the
C standard is concerned it cannot be ruled out).
Does this answer all your questions?
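Points 1, 2 and 4 above can be turned into a mechanical check. The following is a sketch for a hypothetical 8-bit bool: a 256-entry table says, for each byte pattern, whether it is a non-value representation or a representation of 0 or 1, and `mapping_is_allowed` tests the table against those three points. The type and function names are invented for illustration.

```c
#include <stdbool.h>

/* Classification of one byte pattern of a hypothetical 8-bit bool. */
enum rep_kind { REP_NON_VALUE = -1, REP_FALSE = 0, REP_TRUE = 1 };

/* Checks a 256-entry classification table against points 1, 2 and 4. */
static bool mapping_is_allowed(const enum rep_kind map[256])
{
    /* Point 1: all bits zero must represent the value 0. */
    if (map[0] != REP_FALSE)
        return false;

    /* Point 2: at least one pattern must represent the value 1. */
    bool has_true = false;
    for (int rep = 0; rep < 256; rep++)
        if (map[rep] == REP_TRUE)
            has_true = true;
    if (!has_true)
        return false;

    /* Point 4: some bit position must be 0 in every pattern for 0 and
       1 in every pattern for 1. */
    for (int bit = 0; bit < 8; bit++) {
        bool consistent = true;
        for (int rep = 0; rep < 256; rep++) {
            int bit_set = (rep >> bit) & 1;
            if (map[rep] == REP_FALSE && bit_set) consistent = false;
            if (map[rep] == REP_TRUE && !bit_set) consistent = false;
        }
        if (consistent)
            return true;
    }
    return false;
}

/* Strict table: only 0x00 and 0x01 are values, the rest are not. */
static void fill_strict(enum rep_kind map[256])
{
    for (int rep = 0; rep < 256; rep++)
        map[rep] = REP_NON_VALUE;
    map[0] = REP_FALSE;
    map[1] = REP_TRUE;
}

/* Lenient table: bit 0 alone decides, no non-value representations. */
static void fill_low_bit(enum rep_kind map[256])
{
    for (int rep = 0; rep < 256; rep++)
        map[rep] = (rep & 1) ? REP_TRUE : REP_FALSE;
}
```

Both the strict table (2 values, 254 non-value representations) and the lenient table (256 value representations) pass the check, which is the "any mixture" allowed by point 3. A table that marks 0x03 as a representation of 0 while 0x01 represents 1 fails, because no single bit position then distinguishes the two values.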
It is not documented (see this thread for GCC:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88662). But I think it can
be inferred from the code snippets in Keith's OP and most recent post.
GCC seems to treat all object representations of `_Bool` other than
0b00000000 and 0b00000001 as trap/non-value representations.
I am not sure about Clang, but compiling the last snippet in this
article:
https://www.trust-in-soft.com/resources/blogs/2016-06-16-trap-representations-and-padding-bits
with Clang 19.1.0 and options "-std=c23 -O3 -pedantic" seems to show that
Clang treats `_Bool` as having 254 non-value representations (see here:
https://gcc.godbolt.org/z/4jK9d69P8).
On Fri, 17 Jan 2025 18:39:38 +0000, Tim Rentsch wrote:[...]
Hi Tim,
Sorry for the confusion, I am new to the platform and hadn't realised
that I needed to quote Keith's post in my reply.
Let's assume 8-bit chars, and also that the width of _Bool is 1
(which is optional before C23 and required in C23). Here is what
can be said about the 256 states of a _Bool object.
1. All zero bits must be a legal value for 0.
2. There must be at least one combination of bits that is a legal
value for 1 (and since it must be distinct from the all-zero
value for 0, must have at least one bit set to 1).
3. The remaining 254 possible combinations of bit settings can be
any mixture of legal values and trap representations, which are also
known as non-value representations starting in C23.
4. Considering the set of legal value bit settings, there must be at
least one bit position that is 0 in all cases where the value is
0, and is 1 in all cases where the value is 1.
5. Accessing any representation corresponding to a legal value has
well-defined behavior, and yields 0 or 1 depending on the setting of
the bit (or bits) mentioned in #4.
6. Accessing any trap/non-value representation is undefined behavior
and might do anything at all. It might appear to work. It might
work in some cases but not others. It might yield a value that is
neither 0 or 1. It might abort the program. It might cause the
computer the program is running on to run a different operating
system (of course this outcome isn't very likely, but as far as the
C standard is concerned it cannot be ruled out).
Does this answer all your questions?
Yes, thank you for taking the time to reply, I really appreciate it.
Just to clarify, since padding bits do not count towards the value
being represented, in point (2) above, it would have to be the value
bit specifically that is set to 1; and similarly in point (4), the bit
position that is being referred to is the value bit. Is this correct?
On Fri, 17 Jan 2025 21:34:53 +0000, Keith Thompson wrote:
The message being referred to is one I posted Sun 2021-05-23, with
Message-ID <87tums515a.fsf@nosuchdomain.example.com>. It's visible on
Google Groups at
<https://groups.google.com/g/comp.lang.c/c/4FUlV_XkmXg/m/OG8WeUCfAwAJ>.
As others have suggested, please include attribution information when
posting a followup. You don't need to quote the entire message,
but provide at least some context, particularly when the parent
message is old.
Hi Keith,
Sorry for the confusion, I am new to the platform and had not realised
that I needed to quote your post in my reply.
You don't need to (in the sense that the world would end if you don't do
it), but it makes things easier for your readers if you do.