N3096 is the last public draft of the upcoming C23 standard.
N3096 J.2 says:
The behavior is undefined in the following circumstances:
[...]
(11) The value of an object with automatic storage duration is
used while the object has an indeterminate representation
(6.2.4, 6.7.10, 6.8).
I'll use an `int` object in my example.
Reading an object that holds a non-value representation has undefined behavior, but not all integer types have non-value representations
-- and if an implementation has certain characteristics, we can
reliably infer that int has no non-value representations (called
"trap representations" in C99, C11, and C17).
Consider this program:
```
#include <limits.h>
int main(void) {
    int foo;
    if (sizeof (int) == 4 &&
        CHAR_BIT == 8 &&
        INT_MAX == 2147483647 &&
        INT_MIN == -INT_MAX-1)
    {
        int bar = foo;
    }
}
```
If the condition is true (as it is for many real-world
implementations), then int has no padding bits and no trap
representations. The object `foo` has an indeterminate representation
when it's used to initialize `bar`. Since it cannot have a non-value representation, it has an unspecified value.
If J.2(11) is correct, then the use of the value results in undefined behavior.
But Annex J is non-normative, and as far as I can tell there is no
normative text in the standard that says the behavior is undefined.
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
N3096 is the last public draft of the upcoming C23 standard.
N3096 J.2 says:
The behavior is undefined in the following circumstances:
[...]
(11) The value of an object with automatic storage duration is
used while the object has an indeterminate representation
(6.2.4, 6.7.10, 6.8).
I'll use an `int` object in my example.
Reading an object that holds a non-value representation has undefined
behavior, but not all integer types have non-value representations
-- and if an implementation has certain characteristics, we can
reliably infer that int has no non-value representations (called
"trap representations" in C99, C11, and C17).
Consider this program:
```
#include <limits.h>
int main(void) {
    int foo;
    if (sizeof (int) == 4 &&
        CHAR_BIT == 8 &&
        INT_MAX == 2147483647 &&
        INT_MIN == -INT_MAX-1)
    {
        int bar = foo;
    }
}
```
If the condition is true (as it is for many real-world
implementations), then int has no padding bits and no trap
representations. The object `foo` has an indeterminate representation
when it's used to initialize `bar`. Since it cannot have a non-value
representation, it has an unspecified value.
If J.2(11) is correct, then the use of the value results in undefined
behavior.
But Annex J is non-normative, and as far as I can tell there is no
normative text in the standard that says the behavior is undefined.
6.3.2.1 p2:
"[...] If the lvalue designates an object of automatic storage
duration that could have been declared with the register storage class
(never had its address taken), and that object is uninitialized (not
declared with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined."
seems to cover it. The restriction on not having its address taken
seems odd.
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
N3096 is the last public draft of the upcoming C23 standard.
N3096 J.2 says:
The behavior is undefined in the following circumstances:
[...]
(11) The value of an object with automatic storage duration is
used while the object has an indeterminate representation
(6.2.4, 6.7.10, 6.8).
I'll use an `int` object in my example.
Reading an object that holds a non-value representation has undefined
behavior, but not all integer types have non-value representations
-- and if an implementation has certain characteristics, we can
reliably infer that int has no non-value representations (called
"trap representations" in C99, C11, and C17).
Consider this program:
```
#include <limits.h>
int main(void) {
    int foo;
    if (sizeof (int) == 4 &&
        CHAR_BIT == 8 &&
        INT_MAX == 2147483647 &&
        INT_MIN == -INT_MAX-1)
    {
        int bar = foo;
    }
}
```
If the condition is true (as it is for many real-world
implementations), then int has no padding bits and no trap
representations. The object `foo` has an indeterminate representation
when it's used to initialize `bar`. Since it cannot have a non-value
representation, it has an unspecified value.
If J.2(11) is correct, then the use of the value results in undefined
behavior.
But Annex J is non-normative, and as far as I can tell there is no
normative text in the standard that says the behavior is undefined.
6.3.2.1 p2:
"[...] If the lvalue designates an object of automatic storage
duration that could have been declared with the register storage class
(never had its address taken), and that object is uninitialized (not
declared with an initializer and no assignment to it has been
performed prior to use), the behavior is undefined."
seems to cover it. The restriction on not having its address taken
seems odd.
Good find.
That sentence was added in C11 (it doesn't appear in C99 or in
N1256, which consists of C99 plus the three Technical Corrigenda)
in response to DR #338. Since the wording in Annex J goes back to
C99 in its current form, and to C90 in a slightly different form,
that can't be what Annex J is referring to. And the statement
in Annex J is more general, so we can't quite use 6.3.2.1p2 as a
retroactive justification.
https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm
Yes, that restriction does seem strange. It was inspired by the
IA64 (Itanium) architecture, which has an extra trap bit in each
CPU register (NaT, "not a thing"). The "could have been declared
with the register storage class" wording is there because the IA64
NaT bit exists only in CPU registers, not in memory.
An object with automatic storage duration might be stored in an IA64
CPU register. If the object is not initialized, the register's
NaT bit would be set. Any attempt to read it would cause a trap.
Writing it would clear the NaT bit.
Which means that a hypothetical CPU with something like a NaT bit
on each word of memory (iAPX 432? i960?) might cause a trap in
circumstances not covered by that wording -- but it *is* covered
by the wording in Annex J.
(Normally, an object whose address is taken can still be stored in
a CPU register for part of its lifetime. The effect is to forbid
certain optimizations on IA64-like systems.)
It's tempting to conclude that reading an uninitialized automatic
object whose address is taken is *not* undefined behavior (https://en.wikipedia.org/wiki/Exception_that_proves_the_rule),
but the standard doesn't say so.
C90's Annex G (renamed to Annex J in later editions) says:
The behavior in the following circumstances is undefined:
[...]
- The value of an uninitialized object that has automatic storage
duration is used before a value is assigned (6.5.7).
6.5.7 discusses initialization, but doesn't say that reading an
uninitialized object has undefined behavior, so the issue is an old one.
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
There are three relevant clauses in Annex J, and I think we should keep
them all in mind. Sadly, they are not numbered (until C23) so I've
given them 'UB' numbers taken from the similar wording in C23.
— The value of an object with automatic storage duration is used while
it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]
— A trap representation is read by an lvalue expression that does not
have character type (6.2.6.1). [UB-12]
— An lvalue designating an object of automatic storage duration that
could have been declared with the register storage class is used in
a context that requires the value of the designated object, but the
object is uninitialized. (6.3.2.1). [UB-20]
An object with automatic storage duration might be stored in an IA64
CPU register. If the object is not initialized, the register's
NaT bit would be set. Any attempt to read it would cause a trap.
Writing it would clear the NaT bit.
Which means that a hypothetical CPU with something like a NaT bit
on each word of memory (iAPX 432? i960?) might cause a trap in
circumstances not covered by that wording -- but it *is* covered
by the wording in Annex J.
It's covered by UB-12 and that's backed up by normative text,
specifically paragraph 5 of the section cited in UB-12.
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
There are three relevant clauses in Annex J, and I think we should keep
them all in mind. Sadly, they are not numbered (until C23) so I've
given them 'UB' numbers taken from the similar wording in C23.
— The value of an object with automatic storage duration is used while
it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]
— A trap representation is read by an lvalue expression that does not
have character type (6.2.6.1). [UB-12]
— An lvalue designating an object of automatic storage duration that
could have been declared with the register storage class is used in
a context that requires the value of the designated object, but the
object is uninitialized. (6.3.2.1). [UB-20]
An object with automatic storage duration might be stored in an IA64
CPU register. If the object is not initialized, the register's
NaT bit would be set. Any attempt to read it would cause a trap.
Writing it would clear the NaT bit.
Which means that a hypothetical CPU with something like a NaT bit
on each word of memory (iAPX 432? i960?) might cause a trap in
circumstances not covered by that wording -- but it *is* covered
by the wording in Annex J.
It's covered by UB-12 and that's backed up by normative text,
specifically paragraph 5 of the section cited in UB-12.
I don't think so. A "non-value representation" (formerly a "trap
representation") is determined by the bits making up the
representation of an object. For an integer type, such a
representation can occur only if the type has padding bits. The
IA64 NaT bit is not part of the representation; it's neither a
value bit nor a padding bit.
For a 64-bit integer type, given CHAR_BIT==8, its *representation*
is defined as a set of 8 bytes that can be copied into an object of
type `unsigned char[8]`. The NaT bit does not contribute to the
size of the object.
I think the right way for C to permit NaT-like bits is, as Kaz
suggested, to define "indeterminate value" in terms of provenance,
not just the bits that make up its current representation.
An automatic object with no initialization, or a malloc()ed object,
starts with an indeterminate value, and accessing that value
(other than as an array of characters) has undefined behavior.
(This is a proposal, not what the standard currently says.)
IA64 happens to have a way of (partially) representing that
provenance in hardware, outside the object in question. Other or
future architectures might do a more complete job.
[...]
On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
6.3.2.1 p2:
"[...] If the lvalue designates an object of automatic storage
duration that could have been declared with the register storage
class (never had its address taken), and that object is
uninitialized (not declared with an initializer and no assignment
to it has been performed prior to use), the behavior is undefined."
seems to cover it. The restriction on not having its address taken
seems odd.
Wording like that looks like someone's solo documentation effort,
not peer-reviewed by an expert committee.
That looks as if the intent is to allow some diagnoses of uses of
uninitialized variables, while discouraging others.
However, it doesn't seem a good idea to be constraining
implementations in how clever they can be in identifying
an erroneous situation.
On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
N3096 is the last public draft of the upcoming C23 standard.
N3096 J.2 says:
The behavior is undefined in the following circumstances:
[...]
(11) The value of an object with automatic storage duration is
used while the object has an indeterminate representation
(6.2.4, 6.7.10, 6.8).
Personally, I think that the root cause of this whole issue is
the defective definition of indeterminate value.
On Saturday, July 22, 2023 at 8:40:42 AM UTC+2, Kaz Kylheku wrote:
On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
6.3.2.1 p2:
"[...] If the lvalue designates an object of automatic storage
duration that could have been declared with the register storage
class (never had its address taken), and that object is
uninitialized (not declared with an initializer and no
assignment to it has been performed prior to use), the behavior
is undefined."
seems to cover it. The restriction on not having its address
taken seems odd.
[...]
I personally like this rule (but I am speaking about me. there is
no full consensus about the exact interpretation of the standard
nor about what it should say). I will try to explain why. [...]
On 2023-07-21 19:42, Kaz Kylheku wrote:
On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
N3096 is the last public draft of the upcoming C23 standard.
N3096 J.2 says:
The behavior is undefined in the following circumstances:
[...]
(11) The value of an object with automatic storage duration is
used while the object has an indeterminate representation
(6.2.4, 6.7.10, 6.8).
Personally, I think that the root cause of this whole issue is
the defective definition of indeterminate value.
The problem is much deeper than that. It all boils down to the
obsession in the official C community to abuse the concept of
"undefined" to cover everything from "arbitrary natural semantics
of the hardware" to "optimizing away code unexpectedly". [...]
Repeating the question stated in the Subject line:
Does reading an uninitialized object [always] have undefined
behavior?
Background: Annex J part 2 says (in various phrasings in
different revisions of the C standard, with the one below
being taken from C90):
The value of an uninitialized object that has automatic
storage duration is used before a value is assigned [is
undefined behavior] (6.5.7)
Remembering that Annex J is informative rather than normative,
is this statement right even for a type that has no trap
representations? To ask that question another way, is this
statement always right or is it just a (perhaps useful)
approximation?
[400+ lines deleted]
Summary: my reading is that accessing an object that has not
been explicitly stored into since its declaration was evaluated
is necessarily undefined behavior in C90, but not necessarily
undefined behavior in C99 and C11 (and AFAIAA also in C17 and
the upcoming C23). My reasoning is given in detail above.
Postscript: this commentary has taken much longer to write than
I thought it would, for the most part because I made an early
decision to be systematic and thorough. I hope the effort has
helped the readers gain confidence in the explanations and
conclusions stated. I may return to the deferred topic about
pointer types but have no plans at present about when that might
be.
Tim Rentsch <tr.1...@z991.linuxsc.com> writes:
Repeating the question stated in the Subject line:
Does reading an uninitialized object [always] have undefined
behavior?
Background: Annex J part 2 says (in various phrasings in
different revisions of the C standard, with the one below
being taken from C90):
The value of an uninitialized object that has automatic
storage duration is used before a value is assigned [is
undefined behavior] (6.5.7)
Remembering that Annex J is informative rather than normative,
is this statement right even for a type that has no trap
representations? To ask that question another way, is this
statement always right or is it just a (perhaps useful)
approximation?
[400+ lines deleted]
Summary: my reading is that accessing an object that has not
been explicitly stored into since its declaration was evaluated
is necessarily undefined behavior in C90, but not necessarily
undefined behavior in C99 and C11 (and AFAIAA also in C17 and
the upcoming C23). My reasoning is given in detail above.
I personally agree with this analysis and also about the need to
fix J.2. Pointers seem to fit into this scheme if you think about
the valid [...]
Postscript: this commentary has taken much longer to write than
I thought it would, for the most part because I made an early
decision to be systematic and thorough. I hope the effort has
helped the readers gain confidence in the explanations and
conclusions stated. I may return to the deferred topic about
pointer types but have no plans at present about when that might
be.
Thank you for taking the time to write that.
I'd like to offer a brief summary of the points you made. Please let me
know if my summary is incorrect.
- An "indeterminate value" is by definition either an "unspecified
value" or a "trap representation".
- In C90 (which did not yet define all these terms), accessing the value
of an uninitialized object explicitly has undefined behavior.
- In C99 and later, J.2 (which is *not* normative) states that using the value of an object with automatic storage duration while it is
indeterminate has undefined behavior. This implies that:
    int main(void) {
        int n;
        n;
    }
has undefined behavior, even if int has no trap representations.
- Statements in J.2 *should* be supported by normative text.
- There is no normative text in any post-C90 edition of the C
standard that supports the claim that reading an uninitialized
int object actually has undefined behavior if it does not hold
a trap representation. (Pointers raise other issues, which I'll
ignore for now.)
- The cited statement in J.2 is incorrect, or at least imprecise.
I agree with you on all the above points.
There is one point on which I think we disagree. It is a matter
of opinion, not of fact. You wrote:
Remembering that Annex J is informative rather than normative,
is this statement right even for a type that has no trap
representations? To ask that question another way, is this
statement always right or is it just a (perhaps useful)
approximation?
The statement in N1570 J.2 is:
The behavior is undefined in the following circumstances:
[...]
- The value of an object with automatic storage duration is used
while it is indeterminate (6.2.4, 6.7.9, 6.8).
I get the impression that you're not particularly bothered by the
fact that the statement in J.2 is merely an "approximation". In my
opinion, the statement in J.2 is simply incorrect, and should be
fixed. (That's unlikely to be possible at this stage of the C23
process.) The fact that Annex J is, to quote the standard's
foreword, "for information only", is not an excuse to ignore
factual errors. Readers of the standard rely on the informative
annexes to provide correct information. This particular text is not
just a "(perhaps useful) approximation"; it is actively misleading.
I'm not criticizing the author of the standard for making this
mistake. Stuff happens. It was likely a result of an oversight
during the transition from C90 to C99.
Keith Thompson <Keith.S.T...@gmail.com> writes:
I think the right way for C to permit NaT-like bits is, as Kaz
suggested, to define "indeterminate value" in terms of provenance,
not just the bits that make up its current representation. [...]
This idea is fundamentally wrong. NaT bits are associated with
particular areas of memory, which is to say objects. The point
of provenance is that non-viability is associated with /values/,
not with objects. Once an area of memory acquires an object
representation, the NaT bit or NaT bits for that memory are set
to zero, end of story. Note also that NaT bits are independent
of what type is used to access an object - if the NaT bit is set
then any access is illegal, no matter what type is used to do the
access. By contrast, provenance is used in situations where
non-viability is associated with values, not with objects. But
values are always type dependent; a pointer object that holds
a value that has been passed to free() is "indeterminate" when
accessed as a pointer type, but perfectly okay to access as an
unsigned char type. The two kinds of situations are essentially
different, and the theoretical models used to characterize the
rules in the two kinds of situations should therefore be
correspondingly essentially different.
On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:
Keith Thompson <Keith.S.T...@gmail.com> writes:
I think the right way for C to permit NaT-like bits is, as Kaz
suggested, to define "indeterminate value" in terms of provenance,
not just the bits that make up its current representation. [...]
This idea is fundamentally wrong. NaT bits are associated with
particular areas of memory, which is to say objects. The point
of provenance is that non-viability is associated with /values/,
not with objects. Once an area of memory acquires an object
representation, the NaT bit or NaT bits for that memory are set
to zero, end of story. Note also that NaT bits are independent
of what type is used to access an object - if the NaT bit is set
then any access is illegal, no matter what type is used to do the
access. By contrast, provenance is used in situations where
non-viability is associated with values, not with objects. But
values are always type dependent; a pointer object that holds
a value that has been passed to free() is "indeterminate" when
accessed as a pointer type, but perfectly okay to access as an
unsigned char type. The two kinds of situations are essentially
different, and the theoretical models used to characterize the
rules in the two kinds of situations should therefore be
correspondingly essentially different.
One could still consider the idea that "indeterminate" is an
abstract property that yields UB during read even for types
that do not have trap representations. There is no wording
in the C standard to support this, but I would not call this
idea "fundamentally wrong". You are right that this is different
to pointer provenance, which is about values. What it would
have in common with pointer provenance is that there is hidden
state in the abstract machine associated with memory that
is not part of the representation. With effective types there
is another example of this.
Martin Uecker <ma.u...@gmail.com> writes:
On Sunday, August 13, 2023 at 2:00:45?AM UTC+2, Tim Rentsch wrote:
Keith Thompson <Keith.S.T...@gmail.com> writes:
I think the right way for C to permit NaT-like bits is, as Kaz
suggested, to define "indeterminate value" in terms of provenance,
not just the bits that make up its current representation. [...]
This idea is fundamentally wrong. NaT bits are associated with
particular areas of memory, which is to say objects. The point
of provenance is that non-viability is associated with /values/,
not with objects. Once an area of memory acquires an object
representation, the NaT bit or NaT bits for that memory are set
to zero, end of story. Note also that NaT bits are independent
of what type is used to access an object - if the NaT bit is set
then any access is illegal, no matter what type is used to do the
access. By contrast, provenance is used in situations where
non-viability is associated with values, not with objects. But
values are always type dependent; a pointer object that holds
a value that has been passed to free() is "indeterminate" when
accessed as a pointer type, but perfectly okay to access as an
unsigned char type. The two kinds of situations are essentially
different, and the theoretical models used to characterize the
rules in the two kinds of situations should therefore be
correspondingly essentially different.
One could still consider the idea that "indeterminate" is an
abstract property that yields UB during read even for types
that do not have trap representations. There is no wording
in the C standard to support this, but I would not call this
idea "fundamentally wrong". You are right that this is different
to pointer provenance, which is about values. What it would
have in common with pointer provenance is that there is hidden
state in the abstract machine associated with memory that
is not part of the representation. With effective types there
is another example of this.
My preceding comments were meant to be only about NaT bits (or
NaT-like bits) and provenance. There is an inherent mismatch
between the two, as I have tried to explain. It is only the idea
that provenance would provide a good foundation for defining the
semantics of "NaT everywhere" that I am saying is fundamentally
wrong.
I understand that you want to consider a broader topic, and that,
in the realm of that broader topic, something like provenance
could have a role to play. I think it is worth responding to
that thesis, and am expecting to do so in a separate reply (or
new thread?) although probably not right away.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Repeating the question stated in the Subject line:[400+ lines deleted]
Does reading an uninitialized object [always] have undefined
behavior?
Background: Annex J part 2 says (in various phrasings in
different revisions of the C standard, with the one below
being taken from C90):
The value of an uninitialized object that has automatic
storage duration is used before a value is assigned [is
undefined behavior] (6.5.7)
Remembering that Annex J is informative rather than normative,
is this statement right even for a type that has no trap
representations? To ask that question another way, is this
statement always right or is it just a (perhaps useful)
approximation?
Summary: my reading is that accessing an object that has not
been explicitly stored into since its declaration was evaluated
is necessarily undefined behavior in C90, but not necessarily
undefined behavior in C99 and C11 (and AFAIAA also in C17 and
the upcoming C23). My reasoning is given in detail above.
Postscript: this commentary has taken much longer to write than
I thought it would, for the most part because I made an early
decision to be systematic and thorough. I hope the effort has
helped the readers gain confidence in the explanations and
conclusions stated. I may return to the deferred topic about
pointer types but have no plans at present about when that might
be.
Thank you for taking the time to write that.
I'd like to offer a brief summary of the points you made. Please let me
know if my summary is incorrect.
- An "indeterminate value" is by definition either an "unspecified
value" or a "trap representation".
- In C90 (which did not yet define all these terms), accessing the value
of an uninitialized object explicitly has undefined behavior.
- In C99 and later, J.2 (which is *not* normative) states that using the
value of an object with automatic storage duration while it is
indeterminate has undefined behavior. This implies that:
```
int main(void) {
    int n;
    n;
}
```
has undefined behavior, even if int has no trap representations.
- Statements in J.2 *should* be supported by normative text.
- There is no normative text in any post-C90 edition of the C
standard that supports the claim that reading an uninitialized
int object actually has undefined behavior if it does not hold
a trap representation. (Pointers raise other issues, which I'll
ignore for now.)
- The cited statement in J.2 is incorrect, or at least imprecise.
I agree with you on all the above points.
There is one point on which I think we disagree. It is a matter
of opinion, not of fact. You wrote:
Remembering that Annex J is informative rather than normative,
is this statement right even for a type that has no trap
representations? To ask that question another way, is this
statement always right or is it just a (perhaps useful)
approximation?
The statement in N1570 J.2 is:
The behavior is undefined in the following circumstances:
[...]
- The value of an object with automatic storage duration is used
while it is indeterminate (6.2.4, 6.7.9, 6.8).
I get the impression that you're not particularly bothered by the fact
that the statement in J.2 is merely an "approximation". In my opinion,
the statement in J.2 is simply incorrect, and should be fixed. (That's unlikely to be possible at this stage of the C23 process.) The fact
that Annex J is, to quote the standard's foreword, "for information
only", is not an excuse to ignore factual errors. Readers of the
standard rely on the informative annexes to provide correct information.
This particular text is not just a "(perhaps useful) approximation"; it
is actively misleading.
I'm not criticizing the author of the standard for making this mistake.
Stuff happens. It was likely a result of an oversight during the
transition from C90 to C99.
On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
Repeating the question stated in the Subject line:
Does reading an uninitialized object [always] have undefined
behavior?
Thank you for taking the time to write that.
I'm not criticizing the author of the standard for making this mistake.
Stuff happens. It was likely a result of an oversight during the
transition from C90 to C99.
[Supersede attempt to reduce quoted material.]
I would be in favor of a formal model of what "uninitialized" means
which could be summarized as below.
Implementors wishing to develop tooling to catch uses of uninitialized
data can refer to the model; if their tooling diagnoses only
what the model deems undefined, then the tool can be integrated
into a conforming implementation.
- Certain objects are uninitialized, like auto variables without
an initializer, or new bytes coming from malloc or realloc.
- Undefined behavior occurs when an uninitialized value is used
to make a control-flow decision, or when it is output, or otherwise
passed to the host environment.
Kaz Kylheku <864-117-4973@kylheku.com> writes:
On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
Repeating the question stated in the Subject line:
Does reading an uninitialized object [always] have undefined
behavior?
Thank you for taking the time to write that.
I'm not criticizing the author of the standard for making this mistake.
Stuff happens. It was likely a result of an oversight during the
transition from C90 to C99.
[Supersede attempt to reduce quoted material.]
I would be in favor of a formal model of what "uninitialized" means
which could be summarized as below.
Implementors wishing to develop tooling to catch uses of uninitialized
data can refer to the model; if their tooling diagnoses only
what the model deems undefined, then the tool can be integrated
into a conforming implementation.
- Certain objects are uninitialized, like auto variables without
an initializer, or new bytes coming from malloc or realloc.
- What is undefined behavior is when an uninitialized value is used
to make a control-flow decision, or when it is output, or otherwise
passed to the host environment.
Why restrict it to those particular uses, rather than saying that any
attempt to read an uninitialized value has undefined behavior?
For example, something like:
```
{
    int uninit;
    int copy = uninit + 1;
}
```
might cause a hardware trap on some systems (for example Itanium if
uninit is stored in a register and the NaT bit is set).
On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
Martin Uecker <ma.u...@gmail.com> writes:
One could still consider the idea that "indeterminate" is an
abstract property that yields UB during read even for types
that do not have trap representations. There is no wording
in the C standard to support this, but I would not call this
idea "fundamentally wrong". You are right that this is different
to provenance provenance which is about values. What it would
have in common with pointer provenance is that there is hidden
state in the abstract machine associated with memory that
is not part of the representation. With effective types there
is another example of this.
I understand that you want to consider a broader topic, and that,
in the realm of that broader topic, something like provenance
could have a role to play. I think it is worth responding to
that thesis, and am expecting to do so in a separate reply (or
new thread?) although probably not right away.
I would love to hear your comments, because some people
want to have such an abstract notion of "indeterminate" and
some already believe that this is how the standard should
be understood today.
Martin Uecker <ma.uecker@gmail.com> writes:
[some unrelated passages removed]
On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
Martin Uecker <ma.u...@gmail.com> writes:
[...]
One could still consider the idea that "indeterminate" is an
abstract property that yields UB during read even for types
that do not have trap representations. There is no wording
in the C standard to support this, but I would not call this
idea "fundamentally wrong". You are right that this is different
to provenance provenance which is about values. What it would
have in common with pointer provenance is that there is hidden
state in the abstract machine associated with memory that
is not part of the representation. With effective types there
is another example of this.
I understand that you want to consider a broader topic, and that,
in the realm of that broader topic, something like provenance
could have a role to play. I think it is worth responding to
that thesis, and am expecting to do so in a separate reply (or
new thread?) although probably not right away.
I would love to hear your comments, because some people
want to have such an abstract notion of "indeterminate" and
some already believe that this is how the standard should
be understood today.
I've been thinking about this, and am close (I think) to having
something to say in response. Before I do that, though, let me
ask this: what problem or problems are motivating the question?
What problems do you (or "some people") want to solve? I don't
want just examples here; I'm hoping to get a full list.
On 2023-08-17, Tim Rentsch <tr.1...@z991.linuxsc.com> wrote:
[...]
I've been thinking about this, and am close (I think) to having
something to say in response. Before I do that, though, let me
ask this: what problem or problems are motivating the question?
What problems do you (or "some people") want to solve? I don't
want just examples here; I'm hoping to get a full list.
I do not agree with the idea that "absence of UB = safe".
I'm all about the diagnosis. Even on machines in which all
representations are values, and therefore safe, a program whose
external effect or output depends on uninitialized data, and is
therefore nondeterministic (a bad form of nondeterminism), is a
repugnant program.
I'd like to have clear rules which allow an implementation to
go to great depths to diagnose all such situations, while
remaining conforming. (The language agrees that those situations
are erroneous, granting the tools license to diagnose.)
At the same time, certain situations in which uninitialized data
are used in ways that don't have a visible effect would be a
nuisance if they generated diagnostics, the primary example being
the copying of objects.
I would like it so that memcpy isn't magic. I want it so that the
programmer can write a bytewise memcpy which doesn't violate the
rules even if it moves uninitialized data.
I would like a model of uninitialized data which usefully lends
itself to different depths with different trade-offs, like
complexity of analysis and use of run-time resources. Limits
should be imposed by implementations (what cases they want to
diagnose) rather than by the model.
On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
I'd like to have clear rules which allow an implementation to
go to great depths to diagnose all such situations, while
remaining conforming.
An implementation does not need a license from the standard
to diagnose anything. I can already diagnose whatever seems
useful and this does not affect conformance at all.
a program whose external effect or output depends on
uninitialized data [...] is a repugnant program.
I would expect a debugger to output the memory as it is seen.
[...] the primary example being the copying of objects.
Yes, I think for C this is rather important.
I would like a model of uninitialized data which usefully lends
itself to different depths with different trade-offs [...]
Tools can already do complex analysis and track down use of
uninitialized variables. But with respect to conformance, I think
the current standard has very good rules: memcpy/memcmp
and similar code works as expected. Locally, where a compiler
can be expected to give good diagnostics via static analysis,
the use of uninitialized variables is UB. But this does not
spread via pointers elsewhere, where useful diagnostics
are unlikely and optimizer-induced problems based on UB
might be far more difficult to debug.
Kaz Kylheku <864-117-4973@kylheku.com> writes:
I'm all about the diagnosis. Even on machines in which all
representations are values, and therefore safe, a program whose
external effect or output depends on uninitialized data, and is
therefore nondeterministic (a bad form of nondeterministic), is a
repugnant program.
I'd like to have clear rules which allow an implementation to
go to great depths to diagnose all such situations, while remaining
conforming. (The language agrees that those situations are
erroneous, granting the tools license to diagnose.)
The C standard allows compilers to do whatever analysis they
want and to issue diagnostics for whatever conditions or
circumstances they choose.
What you want is orthogonal to what is being discussed.
On 2023-08-19, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
The C standard allows compilers to do whatever analysis they
want and to issue diagnostics for whatever conditions or
circumstances they choose.
And stop translating? If some use of an uninitialized object
isn't undefined, and you make the diagnostic a fatal error,
then you don't have a conforming compiler at that point.
[also]
If the program hasn't invoked undefined behavior, I don't think
it's conforming to inject gratuitous diagnostics [..or..]
to arbitrarily terminate the program. [...]
On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote:
On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote:
On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
An implementation does not need a license from the standard
to diagnose anything. I can already diagnose whatever seems
useful and this does not affect conformance at all.
That's true about diagnostics at translation time. It's not clear
about those that happen at run time and are indistinguishable from
the program's output on stdout or stderr.
The observable behavior has to stay the same, so yes, it could
not output to stdout or stderr. But there is nothing stopping it
to log debugging information somewhere else, where it could
be accessed.
Also, it might be desirable for it to be conforming to terminate the
program if it has run afoul of the rules.
Yes, this is one main reason to make certain things UB. But
then it can have false positives and needs to be backward
compatible, which limits what is possible.
I would like a model of uninitialized data which usefully lends itself
to different depths with different trade-offs, like complexity of
analysis and use of run-time resources. Limits should be imposed by
implementations (what cases they want to diagnose) rather than by the
model.
Tools can already do complex analysis and track down use of
uninitialized variables. But with respect to conformance, I think
the current standard has very good rules: memcpy/memcmp
and similar code works as expected. Locally, where a compiler
can be expected to give good diagnostics via static analysis,
the use of uninitialized variables is UB. But this does not
spread via pointers elsewhere, where useful diagnostics
are unlikely and optimizer-induced problems based on UB
might be far more difficult to debug.
Dynamic instrumentation and tracking makes it possible
for that information to follow pointer data flows, globally
in the program.
E.g. under the Valgrind tool, if one module passes an unitialized
object into another, and that other one relies on it to make
a conditional branch, it will be diagnosed. You can get the
backtrace of where that object was created as well as where
the use took place.
And valgrind exists and is a useful tool (I use it myself)
despite not everything it diagnoses is UB. But it also has
false positives, so using the same rules for deciding what
should be UB in the standard as valgrind uses seems difficult.
Also note that if the output of a program relies on
unspecified values, then it is already not strictly conforming
even when the behavior itself is not undefined. So if an
implementation is smart enough to see this, it could already
reject the program.
Making the mere use of unspecified values in conditional
branches UB seems problematic. E.g. you could not
compute a hash over data structures with padding and
then compare it later to see whether something has
changed (taking false positives into account). This seems
similar to memcpy / memcmp but involves conditions,
and such techniques would become non-conforming.
Martin
On 8/19/23 4:36 AM, Martin Uecker wrote:
[long quoted exchange elided]
E.g. you could not compute a hash over data structures with
padding and then compare it later to see whether something has
changed.
My understanding is that there is no requirement that the values
of the padding bytes remain constant over time. I can't imagine a
case where they will just change at an arbitrary time, but setting
a member of the structure to a value (even if it is the same value
it had) might easily affect the value of the padding bytes, so the
hash changes.
The C standard specifies when they can change:
Sure, writing to the object may change the padding and then the
[...]
On Thursday, August 17, 2023 at 8:13:07 AM UTC+2, Tim Rentsch wrote:
Martin Uecker <ma.u...@gmail.com> writes:
[some unrelated passages removed]
On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
Martin Uecker <ma.u...@gmail.com> writes:
[...]
One could still consider the idea that "indeterminate" is an
abstract property that yields UB during read even for types
that do not have trap representations. There is no wording
in the C standard to support this, but I would not call this
idea "fundamentally wrong". You are right that this is different
to provenance provenance which is about values. What it would
have in common with pointer provenance is that there is hidden
state in the abstract machine associated with memory that
is not part of the representation. With effective types there
is another example of this.
I understand that you want to consider a broader topic, and that,
in the realm of that broader topic, something like provenance
could have a role to play. I think it is worth responding to
that thesis, and am expecting to do so in a separate reply (or
new thread?) although probably not right away.
I would love to hear your comments, because some people
want to have such an abstract notion of "indeterminate" and
some already believe that this is how the standard should
be understood today.
I've been thinking about this, and am close (I think) to having
something to say in response. Before I do that, though, let me
ask this: what problem or problems are motivating the question?
What problems do you (or "some people") want to solve? I don't
want just examples here; I'm hoping to get a full list.
There are essentially two main interests driving this. First,
there is some interest to precisely formulate the semantics for C.
The provenance proposal came out of this.
Second, there is the issue of safety problems caused by
uninitialized reads, together with compiler support for zero
initialization etc. So there are various people who want to
change the semantics for uninitialized variables completely
in the interest of safety.
So far, there was no consensus in WG14 that the rules should
be changed or what the new rules should be.
Sometimes people use compiler options
to turn off, for example, so-called "strict aliasing", and of course
the C standard allows us to do that. But compilers aren't required
to provide such an option, and if they do the option may not do
exactly what we expect it to do, because there is no standard
specification for it. The C standard should define officially
sanctioned mechanisms -- as for example standard #pragma's -- to
give standard-defined semantics to certain constructs of undefined
behavior that resemble, eg, -fno-strict-aliasing.
The second problem is basically The Law of Unintended Consequences
smashing into The Law of Least Astonishment. As compiler writers
have gotten more and more clever at exploiting the implications of
"undefined behavior", we see more and more cases of code that looks reasonable being turned into mush by overly clever "optimizing"
compilers. There is obviously something wrong with the way this
trend is going -- ever more clever "optimizations", followed by ever
more arcane compiler options to work around the problems caused by
the too-clever compilers. This problem must be addressed by the C
standard, for if it is not the ecosystem will transform into a
confused state that is exactly what the C standard was put in place
to avoid. (I do have some ideas about how to address this issue,
but I want to make sure everyone appreciates the extent of the
problem before we start talking about solutions.)
Before leaving the sub-topic of undefined behavior, let me mention
two success stories. The first is 'restrict': the performance
implications are local, the choice is under control of the program
(and programmer), and the default choice is to play safe. Good
show.
On Sat, 26 Aug 2023 19:25:55 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Sometimes people use compiler options to turn off, for example,
so-called "strict aliasing", and of course the C standard allows
us to do that. But compilers aren't required to provide such an
option, and if they do the option may not do exactly what we
expect it to do, because there is no standard specification for
it. The C standard should define officially sanctioned
mechanisms -- as for example standard #pragma's -- to give
standard-defined semantics to certain constructs of undefined
behavior that resemble, eg, -fno-strict-aliasing.
Surely the starting point for this should be the documentation of
the compilers to specify precisely what -fno-strict-aliasing does.
[...]
The second problem is basically The Law of Unintended Consequences
smashing into The Law of Least Astonishment. As compiler writers
have gotten more and more clever at exploiting the implications of
"undefined behavior", we see more and more cases of code that looks
reasonable being turned into mush by overly clever "optimizing"
compilers. There is obviously something wrong with the way this
trend is going -- ever more clever "optimizations", followed by
ever more arcane compiler options to work around the problems
caused by the too-clever compilers. This problem must be addressed
by the C standard, for if it is not the ecosystem will transform
into a confused state that is exactly what the C standard was put
in place to avoid. (I do have some ideas about how to address this
issue, but I want to make sure everyone appreciates the extent of
the problem before we start talking about solutions.)
Without specific examples, it's impossible to comment on this.
[...]
For example it has been pointed out on comp.lang.c that it's
impossible to write a malloc() implementation in conforming
C. This is certainly a weakness which should be addressed with
some appropriate #pragma.
Before leaving the sub-topic of undefined behavior, let me mention
two success stories. The first is 'restrict': the performance
implications are local, the choice is under control of the program
(and programmer), and the default choice is to play safe. Good
show.
From my point of view, restrict is not a success because the
specification of restrict is the one part of the C1999 standard I
have given up trying to understand. I understand the underlying
idea but the specifics elude me. [...]
Spiros Bousbouras <spibou@gmail.com> writes:
On Sat, 26 Aug 2023 19:25:55 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Sometimes people use compiler options to turn off, for example,
so-called "strict aliasing", and of course the C standard allows
us to do that. But compilers aren't required to provide such an
option, and if they do the option may not do exactly what we
expect it to do, because there is no standard specification for
it. The C standard should define officially sanctioned
mechanisms -- as for example standard #pragma's -- to give
standard-defined semantics to certain constructs of undefined
behavior that resemble, eg, -fno-strict-aliasing.
Surely the starting point for this should be the documentation of
the compilers to specify precisely what -fno-strict-aliasing does.
[...]
Not at all. It's easy to write a specification that says what we
want to do, along similar lines to what is said in the footnote
about union member access in section 6.5.2.3
If the member used to access the contents of a union object
is not the same as the member last used to store a value in
the object, the appropriate part of the object representation
of the value is reinterpreted as an object representation in
the new type as described in 6.2.6 (a process sometimes called
"type punning"). This might be a trap representation.
That behavior should be the default, for all accesses. For cases
where a developer wants to give permission to the compiler to
optimize based on cross-type non-interference assumptions, there
should be a #pragma to do something similar to what effective type
rules do now. The effective type rules are in need of re-writing
anyway, and making type punning be the default doesn't break any
programs, because compilers are already free to ignore the
implications of violating effective type conditions.
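To make the optimization stakes concrete, here is a sketch (the function and its name are mine, not from the thread) of the kind of code where the current effective type rules permit a cross-type non-interference assumption:

```c
/* Under the effective-type rules a compiler may assume that *ip
   and *fp do not alias (int vs. float), so it can keep *ip in a
   register across the store to *fp.  With type punning as the
   default -- or under -fno-strict-aliasing -- it could not make
   that assumption and would have to reload *ip. */
int read_after_store(int *ip, float *fp) {
    int a = *ip;
    *fp = 1.0f;          /* may be assumed not to modify *ip */
    return a + *ip;
}
```

Called with pointers to distinct objects, both interpretations agree; they differ only when the pointers alias.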
For example it has been pointed out on comp.lang.c that it's
impossible to write a malloc() implementation in conforming
C. This is certainly a weakness which should be addressed with
some appropriate #pragma.
There isn't any reason to think malloc() should be writable in
completely portable C. That's the point of putting malloc() in
the system library in the first place. By the way, with type
punning semantics mentioned above being the default, and with the
alignment features added in C11, I think it is possible to write
malloc() in portable C without needing any additional language
changes. But even if it isn't that is no cause for concern; one
of the principal reasons for having a system library is to
provide functionality that the core language cannot express (or
cannot express conveniently).
Before leaving the sub-topic of undefined behavior, let me mention
two success stories. The first is 'restrict': the performance
implications are local, the choice is under control of the program
(and programmer), and the default choice is to play safe. Good
show.
From my point of view, restrict is not a success because the
specification of restrict is the one part of the C1999 standard I
have given up trying to understand. I understand the underlying
idea but the specifics elude me. [...]
I agree the formal definition of restrict is rather daunting. In
practice though I think using restrict with confidence is not
overly difficult. My working model for restrict is something
like this:
1. Use restrict only in the declarations of function
parameters.
2. For a declaration like const T *restrict foo ,
the compiler may assume that any objects that can be
accessed through 'foo' will not be modified.
3. For a declaration like T *restrict bas ,
the compiler may assume that any changes to objects
that can be accessed through 'bas' will be done
using 'bas' or a pointer value derived from 'bas'
(and in particular that no changes will happen
other than through 'bas' or 'bas'-derived pointer
values).
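Rule 3 can be illustrated with a standard textbook-style example (a sketch; the function name is mine):

```c
#include <stddef.h>

/* With restrict on both parameters, the compiler may assume no
   element of dst is also an element of src, so it is free to
   reorder or vectorize the loads and stores.  Calling this with
   overlapping arrays violates the restrict contract and makes
   the behavior undefined. */
void scale_copy(size_t n, double *restrict dst,
                const double *restrict src, double k) {
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```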
Is this summary description helpful?
On Tue, 29 Aug 2023 04:35:40 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Spiros Bousbouras <spibou@gmail.com> writes:
On Sat, 26 Aug 2023 19:25:55 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Sometimes people use compiler options to turn off, for example,
so-called "strict aliasing", and of course the C standard allows
us to do that. But compilers aren't required to provide such an
option, and if they do the option may not do exactly what we
expect it to do, because there is no standard specification for
it. The C standard should define officially sanctioned
mechanisms -- as for example standard #pragma's -- to give
standard-defined semantics to certain constructs of undefined
behavior that resemble, eg, -fno-strict-aliasing.
Surely the starting point for this should be the documentation of
the compilers to specify precisely what -fno-strict-aliasing does.
[...]
Not at all. It's easy to write a specification that says what we
want to do, along similar lines to what is said in the footnote
about union member access in section 6.5.2.3
If the member used to access the contents of a union object
is not the same as the member last used to store a value in
the object, the appropriate part of the object representation
of the value is reinterpreted as an object representation in
the new type as described in 6.2.6 (a process sometimes called
"type punning"). This might be a trap representation.
Works for me but it would be good to know that this is how compiler
writers actually understand -fno-strict-aliasing. [...]
For example it has been pointed out on comp.lang.c that it's
impossible to write a malloc() implementation in conforming
C. This is certainly a weakness which should be addressed with
some appropriate #pragma.
There isn't any reason to think malloc() should be writable in
completely portable C. That's the point of putting malloc() in
the system library in the first place. By the way, with type
punning semantics mentioned above being the default, and with the
alignment features added in C11, I think it is possible to write
malloc() in portable C without needing any additional language
changes. But even if it isn't that is no cause for concern; one
of the principal reasons for having a system library is to
provide functionality that the core language cannot express (or
cannot express conveniently).
One might want to experiment with different allocation algorithms
and it seems to me that this sort of thing is within the "remit" of
C. So ideally one should be able to write it in C [...]
From my point of view, restrict is not a success because the
specification of restrict is the one part of the C1999 standard I
have given up trying to understand. I understand the underlying
idea but the specifics elude me. [...]
I agree the formal definition of restrict is rather daunting. In
practice though I think using restrict with confidence is not
overly difficult. My working model for restrict is something
like this:
1. Use restrict only in the declarations of function
parameters.
2. For a declaration like const T *restrict foo ,
the compiler may assume that any objects that can be
accessed through 'foo' will not be modified.
Wouldn't that also be the case with just const T * foo ?
3. For a declaration like T *restrict bas ,
the compiler may assume that any changes to objects
that can be accessed through 'bas' will be done
using 'bas' or a pointer value derived from 'bas'
(and in particular that no changes will happen
other than through 'bas' or 'bas'-derived pointer
values).
Is this summary description helpful?
It seems clear enough but, as I've said, I don't have any use
for restrict anyway and it's not worth it for me to expend the
additional mental effort to confirm that my code obeys the
additional restrictions of restrict. [...]
Spiros Bousbouras <spibou@gmail.com> writes:
On Tue, 29 Aug 2023 04:35:40 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Spiros Bousbouras <spibou@gmail.com> writes:
Not at all. It's easy to write a specification that says what we
want to do, along similar lines to what is said in the footnote
about union member access in section 6.5.2.3
If the member used to access the contents of a union object
is not the same as the member last used to store a value in
the object, the appropriate part of the object representation
of the value is reinterpreted as an object representation in
the new type as described in 6.2.6 (a process sometimes called
"type punning"). This might be a trap representation.
Works for me but it would be good to know that this is how compiler
writers actually understand -fno-strict-aliasing. [...]
No, it wouldn't. Implementations follow the C standard, not
the other way around. Looking at what implementations do for
the -fno-strict-aliasing flag is worse than a waste of time.
There isn't any reason to think malloc() should be writable in
completely portable C. That's the point of putting malloc() in
the system library in the first place. By the way, with type
punning semantics mentioned above being the default, and with the
alignment features added in C11, I think it is possible to write
malloc() in portable C without needing any additional language
changes. But even if it isn't that is no cause for concern; one
of the principal reasons for having a system library is to
provide functionality that the core language cannot express (or
cannot express conveniently).
One might want to experiment with different allocation algorithms
and it seems to me that this sort of thing is within the "remit" of
C. So ideally one should be able to write it in C [...]
You're conflating writing something in C and writing something
in completely portable C. It's already possible to do these
things writing in C.
On Wed, 30 Aug 2023 17:40:52 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
You're conflating writing something in C and writing something
in completely portable C. It's already possible to do these
things writing in C.
I wrote
One might want to experiment with different allocation
algorithms and it seems to me that this sort of thing is
within the "remit" of C. So ideally one should be able to
write it in C and prove , starting from the standard or
precise specifications in compiler documentation , that it
works correctly. I don't necessarily mean prove the
correctness of the whole code but certain key parts.
This doesn't conflate anything. One can do the writing, but
can one do the proving, or something close?
There are essentially two main interests driving this. First,
there is some interest to precisely formulate the semantics for
C. The provenance proposal came out of this.
Second, there is the issue of safety problems caused by
uninitialized reads, together with compiler support for zero
initialization etc. So there are various people who want to
change the semantics for uninitialized variables completely
in the interest of safety.
So far, there was no consensus in WG14 that the rules should
be changed or what the new rules should be.
Martin Uecker <ma.uecker@gmail.com> writes:
[...]
There are essentially two main interests driving this. First,
there is some interest to precisely formulate the semantics for
C. The provenance proposal came out of this.
Second, there is the issue of safety problems caused by
uninitialized reads, together with compiler support for zero
initialization etc. So there are various people who want to
change the semantics for uninitialized variables completely
in the interest of safety.
So far, there was no consensus in WG14 that the rules should
be changed or what the new rules should be.
I have a second reply here, which I hope will come closer to
being relevant to the issues of interest.
What I think is being looked for is a way to describe the
language semantics in areas such as cross-type interference and
what is meant when an uninitialized object is read. I thought
about this question both while I was writing the longer earlier
reply and then more deeply afterwards.
What I think is most important is that these areas in particular
are not about language semantics in the same way as, for example,
array indexing. Rather they are about what transformations a
compiler is allowed to do in the presence of various combinations
of program constructs. That difference means the C standard
should express the rules in a way that more directly reflects
what's going on. More specifically, the standard should say or
explain what can be done, not by describing language semantics
(which is indirect), but explicitly in terms of what compiler
transformations are allowed (which is direct). Note that there
is precedent for this idea, in how the C standard talks about
looping constructs and when they may be assumed to terminate.
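That precedent can be made concrete (a sketch; the function and its name are mine). C11 6.8.5p6 says an iteration statement whose controlling expression is not a constant expression, and whose body performs no I/O, volatile accesses, or synchronization or atomic operations, may be assumed by the implementation to terminate -- a rule stated directly as a permitted compiler assumption rather than as language semantics:

```c
/* The controlling expression is not constant and the body has no
   I/O, volatile access, or synchronization, so the implementation
   may assume the loop terminates -- even though no one has proved
   the Collatz conjecture. */
static unsigned collatz_steps(unsigned n) {
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2) ? 3 * n + 1 : n / 2;
        steps++;
    }
    return steps;
}
```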
To give an example, take uninitialized objects, either automatic
variables without an initializer, or memory allocated by malloc or
added by realloc. The most natural semantics for such situations
is to say that newly "created" memory gets an unspecified object
representation at the start of its lifetime. (Yes I know that C
in its current form lets automatic objects be "uninitialized"
whenever their declaration points are reached, but let's ignore
that for now.) Now suppose a program has a read access where it
is easy to deduce that the object being read is still in the
"unspecified object representation" initial state. To simplify
the discussion, suppose the type of the access is a pointer type,
and so is known to have trap representations (the name is changed
in the C23 draft, but the idea is what's important).
What is a compiler allowed to do in such circumstances? One thing
it might reasonably be allowed to do is to cause the program to be
terminated if it ever reaches such an access. Or there might be
an option to initialize the pointer to NULL. Or, if a suitable
compiler option were invoked, the construct might be flagged with
a fatal error (or of course a warning). There are all sorts of
actions a developer might want the compiler to take, and a
compiler could offer many of those options, as choices selected
under control of command line switches (or equivalent). I think a
few points are worth making.
One, there must be some sort of default action that all compilers
have to support. The default action in this case might be to
issue a non-fatal diagnostic.
Two, there must be a way for the developer to tell the compiler to
"proceed blindly" - saying, in effect, I accept that the compiled
code might misbehave, but let me take that risk, and generate code
like it's going to work. (In other words, for the read access, go
ahead and load whatever unspecified object representation happens
to be there.) A "proceed blindly" choice probably shouldn't be
the default, but it must be available.
Three, the consequence must never be "undefined behavior", unless
there is an explicit stipulation to that effect. The stipulation
might take the form of a #pragma, or a compiler option, or a code
decoration using "attribute" (whatever the syntax for such things
is).
I know my comments here are somewhat sketchy, but hopefully a
general sense of the ideas gets across. The suggestions should at
least serve to stimulate further discussion.
As another example, I have speed-critical code that relies on running
on two's-complement machines with wraparound on signed integer
overflow, and that code is very clear and explicit in doing so, but
there is no C90 notation to tell all ISO C implementations that this
is the intention; thus it is explicit only in comments, not in the
tokens passed to the C compiler.
Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
As another example, I have speed-critical code that relies on running
on two's-complement machines with wraparound on signed integer
overflow, and that code is very clear and explicit in doing so, but
there is no C90 notation to tell all ISO C implementations that this
is the intention; thus it is explicit only in comments, not in the
tokens passed to the C compiler.
You can tell the compiler you want 2s complement by using the intN_t
types if you can find one that suits your portability requirements.
And can you not use unsigned arithmetic, re-interpreting as signed for
those places where it matters? The "overflow" can only happen in
the arithmetic, not in the re-interpretation.
I know this is a deviation from the topic, so feel free to ignore if you don't want to get into it.
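Ben's suggestion can be sketched like this (the function name is mine; the final conversion back to int is implementation-defined in C90/C99 rather than undefined, and wraps on the two's-complement implementations the original code targets):

```c
#include <limits.h>

/* Wrapping signed addition written with unsigned arithmetic.
   The addition itself cannot overflow -- unsigned arithmetic is
   defined to wrap modulo 2^N -- and only the conversion back to
   int depends on the implementation (two's-complement wraparound
   on virtually all current compilers; guaranteed in C23). */
static int wrap_add(int a, int b) {
    return (int)((unsigned)a + (unsigned)b);
}
```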
On 2023-09-07 18:19, Ben Bacarisse wrote:
Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
As another example, I have speed-critical code that relies on running
on two's-complement machines with wraparound on signed integer
overflow, and that code is very clear and explicit in doing so, but
there is no C90 notation to tell all ISO C implementations that this
is the intention; thus it is explicit only in comments, not in the
tokens passed to the C compiler.
You can tell the compiler you want 2s complement by using the intN_t
types if you can find one that suits your portability requirements.
And can you not use unsigned arithmetic, re-interpreting as signed for
those places where it matters? The "overflow" can only happen in
the arithmetic, not in the re-interpretation.
I know this is a deviation from the topic, so feel free to ignore if you
don't want to get into it.
The code in question has as an explicit design condition that the
compiler implements signed versions with wraparound for each unsigned
integer type.
The code cannot rely on the intN_t types because they were not part of
C90 and thus do not exist as separate types in some targeted
compilers.
Excessive casting where directly using the desired type seems possible
is highly counter-intuitive, and thus it is inherently wrong for an
optimizer to presume the right to mangle code using types such as
"int", "short int", "long int", and "signed char".