Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:
case (INT_MAX + 1) * 0:
must be diagnosed at compile time.
gcc disagrees with you.
What makes you think so?
[...]
I'm skipping this and proceeding on to the original question.
Why?
gcc is not authoritative.
I didn't want to get into an argument
about whether gcc is conforming, or which version of gcc was used,
or any similar distractions.
The C standard /is/ authoritative,
and I thought it would save time to cut to the chase.
[snip]
I'd like to know whether you still think you were right. If so,
I'd like to see your explanation. If not, an admission that you
made a mistake would be appreciated. But I expect neither from you.
I'd like to know why you ignored my explanation, based directly on
text from the C standard, about why an implementation is allowed to
process the code in question, without giving a diagnostic, and
still be conforming. An explanation that Dan Cross agreed with,
even if he may not like the consequences.
In investigating this question, I have run compilations using
multiple versions of gcc, on two different platforms. I have looked
carefully through the gcc man page. I have also run compilations
using multiple versions of clang, on two different platforms. After
doing all that, I ran compilations using godbolt, so I could check
the latest, or maybe almost latest, versions of gcc and clang. All
the different versions of gcc and clang that I have tried support my
hypothesis that gcc (and now also clang) interpret the C standard so
as to conclude that conforming to the C standard need not require a
diagnostic for situations like the code under discussion.
I'd like to ask you to do two things. First, read through the
reasoning given in my previous post, try to assess whether that
reasoning is sound, and post the results of your contemplations.
Second, look again at the question of whether gcc (and also clang,
if you're up to it) support the hypothesis that a conforming
implementation need not give a diagnostic for code like that under
discussion. See if you can find a way of framing the question that
supports my statement, rather than simply looking for one that
supports your preconceived ideas. Post the results of your
investigations, both what other experiments you tried, and what your
assessment is of the results you got.
Do these two things and I will endeavor to explain my views on the
questions you have raised here, if such explanations are still
needed after your further examinations and comments.
[SNIP]
I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.
The actual text of the standard implies that 42 is not an expression.
I rely on the obvious intent to conclude that it is.
Now it is you who is changing the subject. Besides not being on
point to the question being considered, it's a silly argument, and I
would hope you are smart enough to realize that. However, if you do
what I have asked in the previous paragraph, I can try to explain
why I think your views on this unrelated matter are wrongheaded.
My example is this:
constexpr int A = ~0U;
The type of the rhs is `int` and the value is not representable
In article <1100g0e$1lt8i$1@kst.eternal-september.org>,[...]
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.
Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.
It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.
I have yet to find or be shown a way in which the standard
actually guarantees that.
There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied. We're well past that; now it's
a vehicle for compiler writers to make benchmarks faster, but is
(generally) hostile to programmers. A lot of hay is made about
it in this group, but at the core, it's just (ironically) not
well-defined.
I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.
Exactly. It could also emit the string, "GOODBYE WORLD."
This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.
Ok, so in that case, would we say that "`foo` has undefined
behavior?" The qualification, "...if called" seems superfluous,
and I don't see anything in the standard that explicitly
disagrees.
UB can time-travel, however. Because it's undefined, the
compiler is free to assume that it never executes, or that it
always executes.
So any program that produces no output at all is strictly
conforming? Then what about this?
#include <limits.h>
int
zero(void)
{
return (INT_MAX + 1) * 0;
}
int
main(void)
{
(void)zero();
return 0;
}
This program produces no output, yet clearly executes a function
that contains an expression that induces undefined behavior when
evaluated. I suppose an argument could be made that it _might_
generate output due to UB, as UB imposes no requirements not to
do so, so perhaps the _absence_ of output depends on UB.
In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere close as
precise as it should be.
I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
Note that in a context that requires a constant expression, overflow is
a constraint violation. For example, a case label like:
case (INT_MAX + 1) * 0:
must be diagnosed at compile time.
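For contrast, here is a minimal sketch of the same shape of case label written with unsigned operands. Unsigned arithmetic wraps modulo UINT_MAX + 1, so the question of overflow in a constant expression does not arise there. The function name `is_zero_label` is made up for illustration and does not come from the thread.

```c
#include <limits.h>

/* Hypothetical function: the case label is an integer constant
   expression of type unsigned int. UINT_MAX + 1u wraps to 0u, and
   0u * 0u is 0u, so the label is simply 0. */
int is_zero_label(unsigned int v)
{
    switch (v) {
    case (UINT_MAX + 1u) * 0u:
        return 1;
    default:
        return 0;
    }
}
```

The disputed signed form, `case (INT_MAX + 1) * 0:`, differs only in the operand types, which is what makes it a useful test of how an implementation reads the constraint on constant expressions.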
gcc disagrees with you.
What makes you think so?
[...]
I'm skipping this and proceeding on to the original question.
Why?
gcc is not authoritative. I didn't want to get into an argument
about whether gcc is conforming, or which version of gcc was used,
or any similar distractions. The C standard /is/ authoritative,
and I thought it would save time to cut to the chase.
You made a statement, "gcc disagrees with you". I demonstrated,
in text that you snipped, that gcc does in fact agree with me.
No, you didn't.
You were wrong.
No, I wasn't. Your testing was faulty.
I don't know the basis of your error, so I asked.
Or maybe I'm missing something, and you had a valid point that I
didn't understand.
I'm offended that you think I have an obligation to remedy your
habit of lazy thinking, especially when as here the answer was
staring you right in the face, and you simply ignored it.
You're not required to answer my question, which I think was
an extremely reasonable one, but quoting it and then explicitly
refusing to answer it is pointlessly rude.
I wasn't refusing to answer. What I was doing was trying to
answer the original question, and answer it in a way that wouldn't
get lost in pointless bickering. Silly me.
I'd like to know whether you still think you were right. If so,
I'd like to see your explanation. If not, an admission that you
made a mistake would be appreciated. But I expect neither from you.
I'd like to know why you ignored my explanation, based directly on
text from the C standard, about why an implementation is allowed to
process the code in question, without giving a diagnostic, and
still be conforming. An explanation that Dan Cross agreed with,
even if he may not like the consequences.
In investigating this question, I have run compilations using
multiple versions of gcc, on two different platforms. I have looked
carefully through the gcc man page. I have also run compilations
using multiple versions of clang, on two different platforms. After
doing all that, I ran compilations using godbolt, so I could check
the latest, or maybe almost latest, versions of gcc and clang. All
the different versions of gcc and clang that I have tried support my
hypothesis that gcc (and now also clang) interpret the C standard so
as to conclude that conforming to the C standard need not require a
diagnostic for situations like the code under discussion.
I'd like to ask you to do two things. First, read through the
reasoning given in my previous post, try to assess whether that
reasoning is sound, and post the results of your contemplations.
Second, look again at the question of whether gcc (and also clang,
if you're up to it) support the hypothesis that a conforming
implementation need not give a diagnostic for code like that under
discussion. See if you can find a way of framing the question that
supports my statement, rather than simply looking for one that
supports your preconceived ideas. Post the results of your
investigations, both what other experiments you tried, and what your
assessment is of the results you got.
Do these two things and I will endeavor to explain my views on the
questions you have raised here, if such explanations are still
needed after your further examinations and comments.
[SNIP]
I see no basis for this belief. My conclusions are based on what
the C standard actually says, rather than guesses about some
unstated "intentions". I think you would do well to reach your
conclusions based more on the actual text of the C standard, and
less on your interpretation of what the text was "intended" to
mean.
The actual text of the standard implies that 42 is not an expression.
I rely on the obvious intent to conclude that it is.
Now it is you who is changing the subject. Besides not being on
point to the question being considered, it's a silly argument, and I
would hope you are smart enough to realize that. However, if you do
what I have asked in the previous paragraph, I can try to explain
why I think your views on this unrelated matter are wrongheaded.
The actual text of the standard implies that 42 is not an expression.
I rely on the obvious intent to conclude that it is.
In article <1100g0e$1lt8i$1@kst.eternal-september.org>,[...]
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.
Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.
I disagree. That's not a sensible interpretation of what the
standard says.
A call to foo() would have undefined behavior if it occurred.
There is no call to foo().
Similarly:
int a = ..., b = ...;
int c;
if (b != 0) {
c = a / b;
}
else {
c = 0;
}
A division by zero would have undefined behavior if it occurred,
but it never occurs. A compiler cannot reject the above code
because of UB that never happens.
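A minimal sketch of the same guard as a reusable function. The name `checked_div` and its bool-plus-out-parameter shape are assumptions made for illustration. It also refuses INT_MIN / -1, whose quotient is not representable in int; in both refused cases the division is never evaluated, so the undefined behavior never occurs.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a / b in *out and returns true only when
   the division is well defined. The undefined cases are never evaluated. */
bool checked_div(int a, int b, int *out)
{
    if (b == 0) {
        return false;               /* division by zero would be UB */
    }
    if (a == INT_MIN && b == -1) {
        return false;               /* quotient not representable in int */
    }
    *out = a / b;
    return true;
}
```

A caller would test the return value before using `*out`, much as the if/else above assigns 0 in the guarded branch.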
[...]
It returns a status of 0 from main and does nothing else.
A conforming implementation *must* generate code that implements
that behavior.
I have yet to find or be shown a way in which the standard
actually guarantees that.
How does the standard guarantee *anything*?
This strictly conforming program:
int main(void) { return 0; }
when executed returns a status of 0 from main and does nothing else.
Adding an uncalled function to the same source file doesn't change
that.
[...]
There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied. We're well past that; now it's
a vehicle for compiler writers to make benchmarks faster, but is
(generally) hostile to programmers. A lot of hay is made about
it in this group, but at the core, it's just (ironically) not
well-defined.
The standard doesn't say what UB is meant for. It says what UB
*is*, and what constructs lead to it (by omission in some cases).
Any optimization tricks played by compiler implementers must be
based on that specification.
[...]
I agree. printf("hello, world\n") must write that string to standard
output, which may be a file or an interactive device. Just what
that means is unspecified or implementation-defined. It might be
printed in EBCDIC or incised into clay tablets. Closing stdout,
which occurs when main() terminates, might involve firing the tablet
or emitting control sequences for a screen reader.
Exactly. It could also emit the string, "GOODBYE WORLD."
No, it couldn't. It must emit "hello, world\n" in some form.
It must emit the character 'h' as represented in the execution
character set, followed by 'e', and so on.
[...]
This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.
It's not UB if it's never called. Behavior that doesn't happen is
not behavior.
I did not presuppose that the program is strictly conforming.
I read the source code and determined that it meets the standard's
definition of a strictly conforming program.
[...]
Ok, so in that case, would we say that "`foo` has undefined
behavior?" The qualification, "...if called" seems superfluous,
and I don't see anything in the standard that explicitly
disagrees.
The qualification "if called" is the whole point.
[...]
UB can time-travel, however. Because it's undefined, the
compiler is free to assume that it never executes, or that it
always executes.
"UB can time-travel" is perhaps an oversimplification. An example is
a bug that occurred in the Linux kernel, something like:
void func(int *ptr) {
do_something_with(*ptr);
if (ptr != NULL) {
blah();
}
}
The compiler, on seeing the expression `*ptr`, assumed that `ptr` is
not null, and elided the test on the following line.
But even assuming that's valid, a compiler absolutely cannot assume that
an instance of UB always executes when, according to the semantics of the
program, it provably never executes.
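A sketch of the ordering that avoids the problem, under the assumption that a null pointer is a legitimate input. The helper name `read_or_default` is hypothetical. The null test comes before any dereference, so there is no earlier `*ptr` from which a compiler could infer that `ptr` is non-null.

```c
#include <stddef.h>

/* Hypothetical helper: returns *ptr, or fallback when ptr is null.
   The pointer is tested before it is ever dereferenced. */
int read_or_default(const int *ptr, int fallback)
{
    if (ptr == NULL) {
        return fallback;
    }
    return *ptr;
}
```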
[...]
So any program that produces no output at all is strictly
conforming? Then what about this?
#include <limits.h>
int
zero(void)
{
return (INT_MAX + 1) * 0;
}
int
main(void)
{
(void)zero();
return 0;
}
That's an interesting point. A more terse example:
#include <limits.h>
int main(void) {
int unused = INT_MAX + 1;
}
This program produces no output, yet clearly executes a function
that contains an expression that induces undefined behavior when
evaluated. I suppose an argument could be made that it _might_
generate output due to UB, as UB imposes no requirements not to
do so, so perhaps the _absence_ of output depends on UB.
The program clearly has undefined behavior when executed, but no
output depends on that undefined behavior. In my humble opinion,
this demonstrates a flaw in the standard's definition of "strictly
conforming program". (As a programmer: Don't do that.)
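One way to follow the "don't do that" advice is to test for overflow before performing the addition. This is only a sketch; the name `checked_add` and its interface are assumptions for illustration. The comparisons against INT_MAX - b and INT_MIN - b cannot themselves overflow, because each subtraction moves toward zero.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true only if
   the sum is representable in int. The overflowing addition is never
   evaluated. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

With this, `checked_add(INT_MAX, 1, &sum)` reports failure instead of evaluating `INT_MAX + 1`.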
[...]
In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere close as
precise as it should be.
I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.
"There are only two kinds of languages: the ones people complain
about and the ones nobody uses."
In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just because
it can't prove that there will be no UB. If it could, every signed
or floating-point arithmetic operation with unknown operand values
would grant the same permission.
But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.
In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.
So there are two things that are at play here.
First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.
Indeed, I would go so far as to suggest that _most_ instances of
UB are detected and used (by the translator) during translation.
So to say that, "this program doesn't have UB because the
statement that contains UB is never executed" doesn't make a lot
of sense to me. It would be closer to being correct if one said
"this program is unaffected by UB since the expression that has
UB is never evaluated when the program executes": again, in this
case (as, I suspect, in most cases) the UB simply _is_: the
expression `INT_MAX + 1` does not become well-defined just
because it is never executed.
Second, there's this notion that the standard is just
underspecified with respect to these matters, specifically, it
does not _prohibit_ a translation from implementing an emulator
for the abstract machine that evaluates code at translation
time. Indeed, I suspect that _most_ compilers do something
largely analogous to that; that's how they detect UB so that
they can take advantage of it when optimizing. But if that's
the case, then nothing prohibits them from relieving themselves
of their obligation to follow the standard once they observe
that some bit of code has UB.
A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.
Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.
Is it? I am unable to locate where the standard _actually says
that it is_. That is my whole point.
And yet the standard does not say that. That is an
interpretation; I assume it is universally shared, but if we
want to limit ourselves to what the standard _actually says_ it
is woefully underspecified in this regard.
There was, once, a view that was almost universally shared that
UB was meant for things that could not be precisely described
because hardware was too varied.
This is circular reasoning. You're saying that something that
is provably UB in this program cannot prevent that program from
being strictly conforming because the program is strictly
conforming.
This presupposes that the program is strictly conforming, but
in the limit, the standard can be interpreted in such a way that
if any statement in the program is provably UB (as this one is)
then the program cannot be said to be strictly conforming.
In my ideal world, C would be rigorously defined with a precise
operational semantics. That would be accompanied by an
explanatory document that presented those semantics in lay
terms in prose, similar to the standard now, for those who did
not want to drive Coq or something similar. But at least we'd
have something definitive to define the language, so that when
there was apparent ambiguity, we had some objective metric by
which to judge. The C standard, as written, is nowhere close as
precise as it should be.
I do not think that this will ever happen: not only would it be
very difficult to produce (as you noted elsethread), I think the
compiler writers would rebel if they felt that their UB hands
were tied by a formal specification.
In article <11075os$3fm4u$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <1100g0e$1lt8i$1@kst.eternal-september.org>,[...]
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.
Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.
I disagree. That's not a sensible interpretation of what the
standard says.
I agree it's not sensible. But sadly, the standard does not
seem to explicitly prohibit it, either. This is the point: we
necessarily rely on a "reasonable interpretation" of the
standard to be able to usefully write C code. An adversarial
interpretation is not sensible, but it appears that such is
possible given the standard as written. This is a danger with a
language that is not formally specified.
In article <865x3yd21n.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <86ik81cfk5.fsf_-_@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 2026-06-01 00:54, Keith Thompson wrote:
[...]
Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
required to do so, and (INT_MAX + 1) * 0 still has undefined
behavior. Undefined behavior is determined by the rules of the
abstract machine *without* any adjustments permitted by the as-if
rule.
This is something I really don't get in the actual C-logic...
Using constants that can be determined at compile time is UB here,
despite the '* 0' mathematically indicating an IMO clear semantics,
but using variables is only UB possibly at runtime? [...]
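To make the distinction concrete, here is a sketch that computes the same mathematical value without any signed overflow, by widening to long long before adding. The function name is made up, and the static assertion records the assumption that long long is wider than int on the implementation at hand.

```c
#include <limits.h>

/* Assumption: long long can represent every int sum. This holds on
   common implementations but is not guaranteed in general. */
_Static_assert(LLONG_MAX / 2 >= INT_MAX, "long long must be wider than int");

/* Hypothetical function: (a + b) * 0, with the sum evaluated in
   long long so that no int overflow can occur. */
long long widened_sum_times_zero(int a, int b)
{
    return ((long long)a + b) * 0;
}
```

The result is always 0, as the mathematics suggests; the difference from `(INT_MAX + 1) * 0` is only that every intermediate value here is representable in its type.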
There's an important distinction to make here. Consider this
program:
#include <limits.h>
int
foo(){
int zero = (INT_MAX+1)*0;
return zero;
}
int
main(){
return 0;
}
This program does not transgress the bounds of undefined behavior.
To clarify, the comments in my posting were meant to be read as
saying the given text is the entire program, and that it is strictly
conforming with respect to conforming hosted implementations.
(Incidentally, given the rules for freestanding implementations, I'm
not sure that it is even possible for any program to be strictly
conforming with respect to conforming freestanding implementations.
In any case my statements were meant only in the context of hosted
implementations.)
Ok.
[snip]
Perhaps you mean that this is irrelevant because `foo` is not
invoked, but I see no reason why that need be the case in e.g.
a freestanding environment.
I explained the context of my previous statements above. Sorry for
not saying that in the original message.
In a hosted environment, I don't
think anything explicitly prevents `foo` from being called after
`main` returns (though I can't imagine that would happen in real
life; it would be weird if it did).
The semantics described in the ISO C standard don't admit that
possibility.
Could you please point to where it says this, in the C standard?
I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.
In article <86y0gp82pd.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
I'd like to know why you ignored my explanation, based directly on
text from the C standard, about why an implementation is allowed to
process the code in question, without giving a diagnostic, and
still be conforming. An explanation that Dan Cross agreed with,
even if he may not like the consequences.
I am mystified as to why you are bringing my name into this, and
why you think "I may not like the consequences", or even what
that means. In any event, you are evidently laboring under some
assumption about what I think about this matter that is probably
incorrect.
But as it happens, I think I can see how your interpretation may
be valid: if, as a result of UB, the expression evaluates to "0"
(or 12 or something similar) that _is_ representable, then
there _is no constraint violation_ and so no diagnostic is
required.
I do not believe that that is the intent. But it _is_
conformant with the text of the standard.
In article <11075os$3fm4u$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <1100g0e$1lt8i$1@kst.eternal-september.org>,[...]
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
A naive compiler that performs no optimizations would generate
code for foo() that attempts to compute (INT_MAX+1)*0 step by
step, without recognizing the overflow, and that code would never
be executed.
Sure. But a far more sophisticated translator (and I would
argue a nefarious one) could emulate that code, decide it was
UB, and immediately fail translation with an error.
I disagree. That's not a sensible interpretation of what the
standard says.
I agree it's not sensible. But sadly, the standard does not
seem to explicitly prohibit it, either. This is the point: we
necessarily rely on a "reasonable interpretation" of the
standard to be able to usefully write C code. An adversarial
interpretation is not sensible, but it appears that such is
possible given the standard as written. This is a danger with a
language that is not formally specified.
I started to compose a followup, but I found that I was mostly
repeating things I've already written.
I see no semantic difference between code in a function that's never
called and code that simply isn't in the program. Neither allows
an implementation to reject a strictly conforming program -- and
yes, the program we've been discussing is as strictly conforming as
`int main(void){}`.
There's nothing special about functions as units of a program
subject to undefined behavior. These two programs are semantically
equivalent:
void foo(void) { do_something(); }
int main(void) { foo(); }
and
int main(void) { do_something(); }
A simpler demonstration program might be:
#include <limits.h>
int main(void) {
return 0;
INT_MAX+1;
}
I assert that it is strictly conforming.
The permission for UB to result in terminating a translation
isn't even in normative text. It's in a non-normative note,
which in principle means that it should be derivable from the
normative text of the standard. (I'm not entirely sure it can be.)
It certainly doesn't override the requirement that a conforming
hosted implementation shall accept any strictly conforming program.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <86y0gp82pd.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
I'd like to know why you ignored my explanation, based directly on
text from the C standard, about why an implementation is allowed to
process the code in question, without giving a diagnostic, and
still be conforming. An explanation that Dan Cross agreed with,
even if he may not like the consequences.
I am mystified as to why you are bringing my name into this, and
why you think "I may not like the consequences", or even what
that means. In any event, you are evidently laboring under some
assumption about what I think about this matter that is probably
incorrect.
In a response to another posting of mine, you wrote this:
But as it happens, I think I can see how your interpretation may
be valid: if, as a result of UB, the expression evaluates to "0"
(or 12 or something similar) that _is_ representable, then
there _is no constraint violation_ and so no diagnostic is
required.
I do not believe that that is the intent. But it _is_
conformant with the text of the standard.
I based my statement that begins "An explanation that Dan Cross
agreed with, ..." on those two paragraphs.
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[snip]
I cannot find anything that says that arbitrary code cannot run
after `main()` returns, and I don't see how that could possibly
be true.
The logic here is backwards. The C standard is prescriptive: it
says what _does_ happen, not what _doesn't_ happen. If one wants
to establish that some "action" takes place, it is necessary to
find a passage, or passages, in the C standard that, if all are
taken together, shows that the "action" occurs, or at least that it
can occur. The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur. If there is no such chain of reasoning, naming
the pertinent passages in the C standard, to establish a possible
call, then there is no possible call. In other words the burden of
proof for a claim that some action could occur rests on whoever is
making the claim; there is no need to look for something in the C
standard that says something cannot occur.
[...]
I've discussed this particular glitch before, but it's been a while.
N3220 6.5.1 says:
An *expression* is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof.
I believe the wording is unchanged from C90 up to the latest C202y
draft. Since the word "expression" is in italics, this is the
standard's definition of the word.
This is a flawed definition. The terms "operator" and "operand"
are defined in 6.4.6:
*punctuator: one of
[ ] ( )
[snip]
A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to
be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof) in
which case it is known as an *operator* (other forms of operator also
exist in some contexts). An *operand* is an entity on which an
operator acts.
Consider this expression statement:
42;
Is `42` an expression? Clearly it's intended to be, but there is no operator, and therefore there is no operand, so it doesn't meet the standard's definition of the word "expression".
[...]
The fact that the standard's definition of "expression" is flawed is
not much of a problem in practice. Virtually everyone, implementers
and programmers, assumes the obvious intent. Nobody believes that
`42` isn't an expression. But it is my strongly held opinion that
the wording should be improved in a future edition of the standard.
I think it should say something to the effect that the meaning
of the term "expression" is defined by the grammar. The current
wording that claims to be the definition of the term could, with
a few tweaks, still be turned into a valid normative statement
*about* expressions.
I have a similar issue with the standard's definition of "value":
"precise meaning of the contents of an object when interpreted as
having a specific type". It's obvious that the result of evaluating
a non-void expression (such as the infamous `42`) is a "value",
but the definition implies that a "value" can only be the meaning
of the contents of an object. Nobody is actually misled by the
current definition, but it should be improved.
On 2026-06-08 23:05, Keith Thompson wrote:
[...]
I've discussed this particular glitch before, but it's been a while.
N3220 6.5.1 says:
An *expression* is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof.
I believe the wording is unchanged from C90 up to the latest C202y
draft. Since the word "expression" is in italics, this is the
standard's definition of the word.
This is a flawed definition. The terms "operator" and "operand"
are defined in 6.4.6:
*punctuator: one of
[ ] ( )
[snip]
A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to
be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof) in
which case it is known as an *operator* (other forms of operator also
exist in some contexts). An *operand* is an entity on which an
operator acts.
Consider this expression statement:
42;
Is `42` an expression? Clearly it's intended to be, but there is no
operator, and therefore there is no operand, so it doesn't meet the
standard's definition of the word "expression".
Above you used the term "expression statement", and then compare the
"42" to an "expression".
I know from my earlier C-days that '42;' is a valid statement, and so
the term "expression statement" makes sense to me.
I know from various languages' syntax definitions that a number like
'42' is a sensible form for an expression (and no operators required).
It's also depending on the context. Where expressions may be written
(and where not) depends on the concrete language; syntactically and
also semantically.
Usually I'd expect above "expression-statement" to serve some purpose, semantically. I don't recall that in "C" such an expression-statement
would serve any purpose. (Or that they'd show any observable behavior,
if that term fits the C-parlance better?)
Or do these stand-alone values (the "expression-statement") have some practically useful semantics?
In other languages such stand-alone values serve a purpose; e.g. they
may determine the result value of a block that can then be used in an
outer context; but in "C" such constructs are obviously not possible.
What purpose serve such stand-alone numbers in places where statements
are expected?
[...]
Unfortunately, the C standard is simply not a precise, formal
document. This is well-known, and it's hardly C's fault: indeed
most of the applications of formalized descriptions of PL
semantics to practical programming languages postdates C's
invention; Dana Scott didn't introduce the term, "operational
semantics" until 1970, and it didn't start to make a serious
impact on languages until later.
[...]
On 09/06/2026 14:17, Janis Papanagnou wrote:
On 2026-06-08 23:05, Keith Thompson wrote:
[...]
I've discussed this particular glitch before, but it's been a while.
N3220 6.5.1 says:
An *expression* is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof.
I believe the wording is unchanged from C90 up to the latest C202y
draft. Since the word "expression" is in italics, this is the
standard's definition of the word.
This is a flawed definition. The terms "operator" and "operand"
are defined in 6.4.6:
*punctuator: one of
[ ] ( )
[snip]
A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to
be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof) in
which case it is known as an *operator* (other forms of operator also
exist in some contexts). An *operand* is an entity on which an
operator acts.
Consider this expression statement:
42;
Is `42` an expression? Clearly it's intended to be, but there is no
operator, and therefore there is no operand, so it doesn't meet the
standard's definition of the word "expression".
Above you used the term "expression statement", and then compare the
"42" to an "expression".
I know from my earlier C-days that '42;' is a valid statement, and so
the term "expression statement" makes sense to me.
I know from various languages' syntax definitions that a number like
'42' is a sensible form for an expression (and no operators required).
It's also depending on the context. Where expressions may be written
(and where not) depends on the concrete language; syntactically and
also semantically.
Usually I'd expect above "expression-statement" to serve some purpose,
semantically. I don't recall that in "C" such an expression-statement
would serve any purpose. (Or that they'd show any observable behavior,
if that term fits the C-parlance better?)
Or do these stand-alone values (the "expression-statement") have some
practically useful semantics?
In other languages such stand-alone values serve a purpose; e.g. they
may determine the result value of a block that can then be used in an
outer context; but in "C" such constructs are obviously not possible.
What purpose serve such stand-alone numbers in places where statements
are expected?
I think it is just difficult for the syntax to ban certain expressions
and not others. How would you express that in the grammar?
If you ramp up the warnings, then you'll get messages like 'statement
with no effect' or 'computed value not used', since sometimes there are side-effects that are needed:
f() + g();
f() and g() both do something, but nothing is done with their sum.
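To make the point about `f() + g();` concrete, here is a minimal sketch. The counters `f_calls` and `g_calls` and the wrapper `run_discarded_sum()` are hypothetical names added only to make the side effects observable; they are not from the post. With warnings enabled, a compiler will typically flag the statement as having an unused value.

```c
/* Hypothetical counters, used only to make the side effects visible. */
static int f_calls = 0;
static int g_calls = 0;

static int f(void) { ++f_calls; return 1; }
static int g(void) { ++g_calls; return 2; }

/* Runs the expression statement from the post and returns the total
   number of calls made so far. */
int run_discarded_sum(void)
{
    f() + g();   /* both calls are evaluated; their sum is discarded */
    return f_calls + g_calls;
}
```

The order in which `f()` and `g()` are called is unspecified, but both calls happen, so the statement cannot simply be dropped.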
[...]
On 2026-06-09 15:53, Bart wrote:
On 09/06/2026 14:17, Janis Papanagnou wrote:
On 2026-06-08 23:05, Keith Thompson wrote:
[...]
I've discussed this particular glitch before, but it's been a while.
N3220 6.5.1 says:
An *expression* is a sequence of operators and operands that
specifies computation of a value, or that designates an object
or a function, or that generates side effects, or that performs
a combination thereof.
I believe the wording is unchanged from C90 up to the latest C202y
draft. Since the word "expression" is in italics, this is the
standard's definition of the word.
This is a flawed definition. The terms "operator" and "operand"
are defined in 6.4.6:
*punctuator: one of
[ ] ( )
[snip]
A punctuator is a symbol that has independent syntactic and semantic
significance. Depending on context, it may specify an operation to
be performed (which in turn may yield a value or a function
designator, produce a side effect, or some combination thereof) in
which case it is known as an *operator* (other forms of operator also
exist in some contexts). An *operand* is an entity on which an
operator acts.
Consider this expression statement:
42;
Is `42` an expression? Clearly it's intended to be, but there is no
operator, and therefore there is no operand, so it doesn't meet the
standard's definition of the word "expression".
Above you used the term "expression statement", and then compare the
"42" to an "expression".
I know from my earlier C-days that '42;' is a valid statement, and so
the term "expression statement" makes sense to me.
I know from various languages' syntax definitions that a number like
'42' is a sensible form for an expression (and no operators required).
It's also depending on the context. Where expressions may be written
(and where not) depends on the concrete language; syntactically and
also semantically.
Usually I'd expect above "expression-statement" to serve some purpose,
semantically. I don't recall that in "C" such an expression-statement
would serve any purpose. (Or that they'd show any observable behavior,
if that term fits the C-parlance better?)
Or do these stand-alone values (the "expression-statement") have some
practically useful semantics?
In other languages such stand-alone values serve a purpose; e.g. they
may determine the result value of a block that can then be used in an
outer context; but in "C" such constructs are obviously not possible.
What purpose serve such stand-alone numbers in places where statements
are expected?
I think it is just difficult for the syntax to ban certain expressions
and not others. How would you express that in the grammar?
Well, I'd do that as it's done in other languages.
Define _statements_ and define _expressions_.
And define expressions
in contexts where a sensible operational semantics can be defined (as
in mathematical formulas, actual function parameter lists, etc.), but
not in places where statements are expected.
If you ramp up the warnings, then you'll get messages like 'statement
with no effect' or 'computed value not used', since sometimes there
are side-effects that are needed:
f() + g();
f() and g() both do something, but nothing is done with their sum.
Right. And I wouldn't allow a mathematical formula where the results
are calculated but not used, here an expression, as a statement.
But your example may indeed lead to the actual answer to my question;
when writing just
f();
There's no distinction of procedures and functions in "C". One cannot
tell whether that f() is a "procedure" (i.e. a function with no return
value, or one with return value but the call just relying on the side effects). In "C" any value of f() just gets discarded in this context.
That of course doesn't mean that it could be handled by the compilers
and sensibly defined by the language, depending on how f() is actually defined. After all, 'f();' is not the same case as '42;'.
But okay, we're talking about "C" here - so own design preferences are
anyway irrelevant here.
Janis
[...]
On 6/9/26 15:53, Bart wrote:
f() + g();
f() and g() both do something, but nothing is done with their sum.
I've just one question : why did you waste your life time
with a lot of non-sense questions ?
In article <1107rk3$3ldg4$1@kst.eternal-september.org>,[...]
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
The permission for UB to result in terminating a translation
isn't even in normative text. It's in a non-normative note,
which in principle means that it should be derivable from the
normative text of the standard. (I'm not entirely sure it can be.)
That specific instance is not, no; that's in a note as you point
out. I believe deriving it from the normative text is based on
UB imposing no requirement at all on the implementation.
It certainly doesn't override the requirement that a conforming
hosted implementation shall accept any strictly conforming program.
...assuming the program is strictly conforming.
I have arrived at the same place you are with your "42 is not an
expression" example. The wording of the standard could be
improved to avoid things like this.
Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:
```
#include <stdio.h>
void foo(void);
int
main(void)
{
for (;;);
}
void
foo(void)
{
printf("never called\n");
}
```
The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.
Dan Cross <cross@spitfire.i.gajendra.net> wrote:...
In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.
So there are two things that are at play here.
First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.
"42" is an expression of type "int", and so is 'printf("Hello\n")'.[...]
How (and why) would a language distinguish between them and allow one
but not the other?
The committee has decided otherwise. The committee's resolution to DR
109 said:
"A conforming implementation must not fail to translate a strictly
conforming program simply because some possible execution of that
program would result in undefined behavior. Because foo might never be
called, the example given must be successfully translated by a
conforming implementation."
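A minimal sketch in the spirit of that resolution, not the DR's own example: the function names `never_called` and `run_program` are hypothetical, and `run_program()` stands in for the program's `main()`. The only undefined behavior sits in a function that no execution reaches, so the translation unit should still be translated.

```c
/* Undefined behavior if ever called: integer division by zero. */
int never_called(void)
{
    int zero = 0;
    return 1 / zero;
}

/* Stand-in for main(): it does not call never_called(), so no
   execution of the program reaches the division. */
int run_program(void)
{
    return 0;
}
```

A compiler may warn about `never_called()`, or replace its body with a trap, but under the quoted resolution it should not refuse to translate the program.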
David Brown <david.brown@hesbynett.no> writes:
[...]
"42" is an expression of type "int", and so is 'printf("Hello\n")'.[...]
How (and why) would a language distinguish between them and allow one
but not the other?
Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.
In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.
For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored). In Ada, an error in the
equivalent Put_Line("Hello, world") raises an exception, which
can't easily be ignored.
Both approaches are valid.
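For the C side of that comparison, a minimal sketch of what checking the result looks like. The wrapper name `greet_checked` and its 0 / -1 return convention are assumptions made for illustration.

```c
#include <stdio.h>

/* Prints a greeting. Returns 0 on success, or -1 if printf reported an
   output error by returning a negative value. */
int greet_checked(void)
{
    int n = printf("Hello, world\n");
    return (n < 0) ? -1 : 0;
}
```

Nothing in the language forces a caller to look at either return value, which is the contrast with an exception being raised.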
On 10/06/2026 00:34, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
"42" is an expression of type "int", and so is 'printf("Hello\n")'.[...]
How (and why) would a language distinguish between them and allow one
but not the other?
Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.
I don't know enough about Ada to be sure, but Pascal does not do this -
see below.
In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.
Sure. But the key factor there is that "printf", or its equivalent
(such as "writeln", if I remember my Pascal correctly - it's been a
while) are /procedures/. A "print" function in Pascal that returned the number of characters printed would be a function, used in an expression,
not a procedure used in a statement.
The rough equivalent of the distinction between Pascal procedures and functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to distinguish between void and non-void like this. What cannot easily be done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).
In C, an expression statement "expr;" causes the expression to be
evaluated as a void expression for its side effects (§6.8.4p2).
You
can, arguably, say that C also requires all statements to be of "void"
type, just like Pascal - but the cast-to-void is done implicitly to
treat "expr;" as "(void) expr;".
For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored). In Ada, an error in the
equivalent Put_Line("Hello, world") raises an exception, which
can't easily be ignored.
Both approaches are valid.
Indeed they are.
It is also fine for a language to distinguish between "pure" functions
and functions/procedures with side-effects and/or functions/procedures
with observable behaviour. (A "pure procedure" would not do anything.)
As far as I remember, Pascal does not make that distinction.
On 10/06/2026 08:04, David Brown wrote:
On 10/06/2026 00:34, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
"42" is an expression of type "int", and so is 'printf("Hello\n")'.[...]
How (and why) would a language distinguish between them and allow one
but not the other?
Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.
I don't know enough about Ada to be sure, but Pascal does not do this
- see below.
In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.
Sure. But the key factor there is that "printf", or its equivalent
(such as "writeln", if I remember my Pascal correctly - it's been a
while) are /procedures/. A "print" function in Pascal that returned
the number of characters printed would be a function, used in an
expression, not a procedure used in a statement.
The rough equivalent of the distinction between Pascal procedures and
functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to
distinguish between void and non-void like this. What cannot easily
be done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).
In C, an expression statement "expr;" causes the expression to be
evaluated as a void expression for its side effects (§6.8.4p2).
In C201x draft. 6.8.4p2 is about selection statements.
You can, arguably, say that C also requires all statements to be of
"void" type, just like Pascal - but the cast-to-void is done
implicitly to treat "expr;" as "(void) expr;".
That's not quite the same thing. If I write:
int a;
a;
then gcc -Wall will report a warning. But write it as (void)a, then it doesn't.
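A minimal sketch of the two forms side by side. The function name `discard_demo` is hypothetical, and `a` is initialized here (unlike in the fragment above) so that reading it is unproblematic. The comments describe typical gcc -Wall behavior; other compilers and option sets may differ.

```c
/* Contrasts a bare expression statement with an explicit cast to void. */
int discard_demo(void)
{
    int a = 42;
    a;          /* expression statement: typically "statement with no effect" */
    (void)a;    /* explicit discard: same effect, typically no warning */
    return a;
}
```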
While this is awkward to express in a language's grammar, it can choose
to list the kinds of expressions that /are/ allowed to be statements,
rather than leave it to the whim of an implementation. (The ones that
aren't allowed would be a much bigger, unlimited set.)
For example:
E(...); // function call
++E; // increment
E = E; // assignment (and compound assignment)
E is any expression term. Here, the call/increment/assignment is the top-level AST node.
(I do this in my stuff, and there I can override the restriction using 'eval': eval a + b, which turns it into an allowed form.
Mainly this is for convenience of testing, but it was also used to
ensure an expression ended up in the primary register for subsequent
inline assembly.)
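The rule described above can be sketched as a check on the kind of the top-level AST node. This is only an illustration for a toy language: the enum `node_kind`, its members and `allowed_as_statement()` are all hypothetical names, not taken from any real compiler.

```c
#include <stdbool.h>

/* Hypothetical top-level AST node kinds for a toy language. */
enum node_kind {
    NODE_CALL,             /* E(...)          */
    NODE_INCREMENT,        /* ++E             */
    NODE_ASSIGN,           /* E = E           */
    NODE_COMPOUND_ASSIGN,  /* E += E, etc.    */
    NODE_CONSTANT,         /* 42              */
    NODE_NAME,             /* a               */
    NODE_BINARY_OP         /* a + b           */
};

/* Returns true if an expression whose top-level node has this kind may
   stand alone as a statement under the whitelist approach. */
bool allowed_as_statement(enum node_kind top)
{
    switch (top) {
    case NODE_CALL:
    case NODE_INCREMENT:
    case NODE_ASSIGN:
    case NODE_COMPOUND_ASSIGN:
        return true;
    default:
        return false;
    }
}
```

Listing the permitted kinds keeps the set finite; everything else, such as `42;` or `a + b;`, is rejected by default.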
For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored). In Ada, an error in the
equivalent Put_Line("Hello, world") raises an exception, which
can't easily be ignored.
Both approaches are valid.
Indeed they are.
Distinguishing between function and procedure is incredibly rare in
modern languages. There the preoccupation seems to be to unify
everything: everything is a function, even if-statements and loops.
Every function is a closure, etc. I do not consider that useful.
It is also fine for a language to distinguish between "pure" functions
and functions/procedures with side-effects and/or functions/procedures
with observable behaviour. (A "pure procedure" would not do
anything.) As far as I remember, Pascal does not make that distinction.
This goes the other way and is a better idea!
David Brown <david.brown@hesbynett.no> writes:
On 10/06/2026 00:34, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
"42" is an expression of type "int", and so is 'printf("Hello\n")'.[...]
How (and why) would a language distinguish between them and allow one
but not the other?
Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.
I don't know enough about Ada to be sure, but Pascal does not do this
- see below.
You seem to disagree with me, but then you describe most of what
I wrote. I'm not sure where you disagree, or where our signals
got crossed.
Ada and Pascal don't have expression statements. The Pascal
(writeln(...)) and Ada (Put_Line(...)) constructs most similar
to C's printf("Hello\n") are procedure calls. 42 can't be made into
a statement by adding a semicolon. Neither can any function call.
But a procedure call can. That's how and why Pascal and Ada allow
one but not the other. (And both languages deliberately make it
awkward to ignore the value returned by a function.)
In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.
Sure. But the key factor there is that "printf", or its equivalent
(such as "writeln", if I remember my Pascal correctly - it's been a
while) are /procedures/. A "print" function in Pascal that returned
the number of characters printed would be a function, used in an
expression, not a procedure used in a statement.
Right, and a Pascal function that prints its argument and returns an
integer value could not be used by itself as a statement.
The rough equivalent of the distinction between Pascal procedures and
functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to
distinguish between void and non-void like this. What cannot easily
be done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).
Right. Which is why the I/O and similar subroutines that you'd want to
use as statements are procedures, not functions.
James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]
The committee has decided otherwise. The committee's resolution to DR
109 said:
"A conforming implementation must not fail to translate a strictly
conforming program simply because some possible execution of that
program would result in undefined behavior. Because foo might never be
called, the example given must be successfully translated by a
conforming implementation."
https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_109.html
[...]
[...]
Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:
```
#include <stdio.h>
void foo(void);
int
main(void)
{
for (;;);
}
void
foo(void)
{
printf("never called\n");
}
```
The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.
That doesn't make sense to me. Do you have a citation to this incident,
and is it relevant to C?
There is a special rule in C about implementations being allowed
to assume that an infinite loop terminates (N3220 6.8.6.1p4),
but (a) it wouldn't apply to this case, and (b) even if it did,
it wouldn't imply that an implicit call to foo would be permitted.
I can imagine an argument that the program has undefined behavior
and therefore it could print "never called" or "nasal demons",
but I'd have to see the argument.
In article <110a34q$b2kq$2@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:
```
#include <stdio.h>
void foo(void);
int
main(void)
{
for (;;);
}
void
foo(void)
{
printf("never called\n");
}
```
The result of which, when run, was to print the text "never
called" and exit. That compiler was conformant with the text
of the standard.
That doesn't make sense to me. Do you have a citation to this incident,
Yes: https://godbolt.org/z/d1WP4KP99
There was such an outcry when this was discovered that the C++
standard was modified to add a note explicitly allowing,
"trivial infinite loops, which cannot be removed or reordered." https://eel.is/c++draft/intro.progress
That change is commit 29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e (https://github.com/cplusplus/draft/commit/29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e)
in response to P2809: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2809r3.html
and is it relevant to C?
Here's a C version with the same behavior:
```
term% cat weird.c
#include <stdio.h>
int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}
void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```
There is a special rule in C about implementations being allowed
to assume that an infinite loop terminates (N3220 6.8.6.1p4),
The program above meets the criteria in sec 6.8.6.1 para 4 that
allows an implementation to assume that the loop terminates.
Godbolt link: https://godbolt.org/z/q46o5cYGM
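Why the loop in weird.c never terminates on its own: unsigned arithmetic wraps modulo a power of two, so a counter that starts at 0 and steps by 2 stays even forever and never equals 1. The sketch below checks this by walking one full cycle. It uses a 16-bit counter purely to keep the walk short (an assumption; the program above uses `unsigned int`), and the function name `counter_reaches_one` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simulates `for (k = 0; k != 1; k += 2)` with a 16-bit unsigned counter.
   Returns true if k ever equals 1 before wrapping back to its starting
   value of 0, and false if a full cycle completes without seeing 1. */
bool counter_reaches_one(void)
{
    uint16_t k = 0;
    do {
        if (k == 1)
            return true;
        k = (uint16_t)(k + 2u);   /* wraps modulo 2^16 */
    } while (k != 0);
    return false;
}
```

The same parity argument holds for any unsigned width, which is why the termination assumption is what lets the compiler treat code after the loop as reachable.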
but (a) it wouldn't apply to this case, and (b) even if it did,
it wouldn't imply that an implicit call to foo would be permitted.
I can imagine an argument that the program has undefined behavior
and therefore it could print "never called" or "nasal demons",
but I'd have to see the argument.
Regehr alluded to this with his taxonomy of undefined functions.
For a function that is always undefined (a "Type 3" function), a
compiler is under no obligation to even produce a return
instruction for it, and the behavior of a call to such a
function is totally undefined. Nothing stops it from cascading
into whatever the linker happens to put after it.
Therefore, given UB, it is not necessary to have a reference to
some function in a program's source text in order for it to be
executed.
Replying to myself here, but...this is another example of weird[SNIP]
behavior:
```
term% cat boo.c
#include <limits.h>
int
monstartup(void)
{
return INT_MAX + 1;
}
int
main(void)
{
return 0;
}
```
(I admit that I am cheating a bit, but I claim that this program
is strictly conforming.)
In article <86tsrc8d0b.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur.
Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:
[...]
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <86tsrc8d0b.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
The C standard doesn't need to say that, for example, a
function x() other than main(), whose name is never referenced,
will never be called. If someone wants to establish that x() could
be called, there needs to be a chain of reasoning going through the
semantic descriptions given in the C standard, to show that a call
to x() could occur.
Actually, no, a reference to a function is not necessary. A
couple of years ago, a well-publicized issue in a C++ compiler
was something along the lines of this:
[...]
This is comp.lang.c. My comments were only about C, and not
about C++. But of course you already knew that.
I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:
```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^ what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```
[...]
Replying to myself here, but...this is another example of weird[SNIP]
behavior:
```
term% cat boo.c
#include <limits.h>
int
monstartup(void)
{
return INT_MAX + 1;
}
int
main(void)
{
return 0;
}
(I admit that I am cheating a bit, but I claim that this program
is strictly conforming.)
I agree that the program is strictly conforming.
I don't know the details, but I think "monstartup" is a special name,
and that the program would behave as expected if a different name
were used. Since "monstartup" is not reserved, an implementation
that visibly treats it specially is not conforming.
Right. ("for (;;);" in the original program does not.)
Note that the C++ special rule applies only when the condition is
equivalent to a constant `true` and the body of the loop is empty.
An implementation can "assume" that any other loop will eventually
finish.
The rule in C is (6.8.6.1p4):
An iteration statement may be assumed by the implementation
to terminate if its controlling expression is not a constant
expression, and none of the following operations are performed
in its body, controlling expression or (in the case of a for
statement) its expression-3
— input/output operations
— accessing a volatile object
— synchronization or atomic operations.
`for (;;)` is treated as having a constant controlling expression.
This covers more cases than the C++ rule.
I dislike it for most of the same reasons. It should be phrased
in terms of the permitted behavior of a program, not what an
implementation is allowed to "assume".
In addition to that, I dislike the whole idea. I think it's
intended to enable optimizations, but it means that for this
contrived program:
#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}
the implementation is allowed to "assume" that the loop eventually
terminates. It's not clear what permissions the implementation is being
given if the assumption is violated. I think the program could legally
print "never reached", but if violating the assumption implies undefined
behavior it could do anything.
A programmer could easily write a program similar to the above
and think that the meaning is perfectly clear, have it behave very
differently because of one obscure subclause in the standard.
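For a loop that really is meant to spin, the quoted rule itself lists the ways out: a constant controlling expression, or an access to a volatile object in the loop. The following is only a sketch of the volatile variant; `spin_until` and `stop_after` are made-up names, and the counter is there solely so that the example finishes.

```c
#include <stdbool.h>

/* Hypothetical flag. It is volatile, so every test in the loop below
   is an access to a volatile object. */
static volatile bool keep_going = true;

/* Spins until keep_going becomes false and returns the number of
   iterations. Because the controlling expression reads a volatile
   object, the permission to assume termination does not apply. */
static unsigned spin_until(unsigned stop_after)
{
    unsigned iterations = 0;

    keep_going = true;
    while (keep_going) {
        if (++iterations >= stop_after)
            keep_going = false;
    }
    return iterations;
}
```

Calling `spin_until(5)` runs the body five times. Without the `volatile` qualifier the same loop would fall under the "may be assumed to terminate" wording again.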
David Brown <david.brown@hesbynett.no> writes:
[...]
In C, an expression statement "expr;" causes the expression to be
evaluated as a void expression for its side effects (§6.8.4p2). You
can, arguably, say that C also requires all statements to be of "void"
type, just like Pascal - but the cast-to-void is done implicitly to
treat "expr;" as "(void) expr;".
In an expression statement, the expression is "evaluated as a void
expression for its side effects". I think that's equivalent to
converting (not casting!) it to void, but the standard doesn't describe
it that way.
6.3.2.2: "If an expression of any other type [other than void]
is evaluated as a void expression, its value or designator is
discarded."
But statements have no type.
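A small sketch of the point about expression statements, with the discarded values noted in comments. The function name `expression_statements` is invented for illustration.

```c
#include <stdio.h>

/* Returns 1 if the side effects of the expression statements below
   all took place. */
static int expression_statements(void)
{
    int n = 0;

    n = 41;                   /* value of the assignment is discarded */
    n++;                      /* value of n++ (41) is discarded, side effect kept */
    printf("Hello\n");        /* int result is discarded implicitly */
    (void)printf("Hello\n");  /* same effect, with the discard written as a cast */
    return n == 42;
}
```

The last two statements behave identically; the explicit cast only documents that ignoring the result is intentional.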
On 10/06/2026 23:47, Keith Thompson wrote:
Right. ("for (;;);" in the original program does not.)
Note that the C++ special rule applies only when the condition is
equivalent to a constant `true` and the body of the loop is empty.
An implementation can "assume" that any other loop will eventually
finish.
The rule in C is (6.8.6.1p4):
An iteration statement may be assumed by the implementation
to terminate if its controlling expression is not a constant
expression, and none of the following operations are performed
in its body, controlling expression or (in the case of a for
statement) its expression-3
— input/output operations
— accessing a volatile object
— synchronization or atomic operations.
`for (;;)` is treated as having a constant controlling expression.
This covers more cases than the C++ rule.
I dislike it for most of the same reasons. It should be phrased
in terms of the permitted behavior of a program, not what an
implementation is allowed to "assume".
In addition to that, I dislike the whole idea. I think it's
intended to enable optimizations, but it means that for this
contrived program:
#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}
the implementation is allowed to "assume" that the loop eventually
terminates. It's not clear what permissions the implementation is being
given if the assumption is violated. I think the program could legally
print "never reached", but if violating the assumption implies undefined
behavior it could do anything.
A programmer could easily write a program similar to the above
and think that the meaning is perfectly clear, have it behave very
differently because of one obscure subclause in the standard.
The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."
The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such as
the compiler seeing that the "keep_going" variable is not volatile and
its value is never used, so assignments to it can be elided, or moving
other things outside the loop body).
A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result of
bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.
Equally, I don't think it is likely that compilers will often be able to
use this rule to improve code generation - it would only help in a
situation where the loop's controlling expression is too complicated for
the compiler to be sure that it will terminate, but where the loop body
ends up effectively empty. I doubt if that turns up often in real code
either.
So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the "modern
compilers are evil" crowd, I really do not see it as an issue in
practice. There are many other mistakes programmers can make, or UB
that they hit accidentally - this is a drop in the ocean IMHO.
[...]
I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:
```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```
I see the same behavior.
The following largely repeats what I've written previously in
this thread.
Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:
An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...
means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.
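Why the loop in what.c never terminates can be checked on a narrower type: starting from 0 and adding 2, an unsigned counter only ever holds even values, so it never equals 1. The sketch below uses `unsigned char` purely to keep the full cycle short; `cycle_length` is a made-up helper, and the same argument carries over to `unsigned int`.

```c
#include <limits.h>

/* Counts how many steps of "k += 2" it takes for k to come back to 0,
   and records whether k was ever equal to 1 along the way. */
static unsigned cycle_length(int *saw_one)
{
    unsigned char k = 0;
    unsigned steps = 0;

    *saw_one = 0;
    do {
        if (k == 1)
            *saw_one = 1;
        k += 2;              /* wraps modulo UCHAR_MAX + 1 */
        steps++;
    } while (k != 0);
    return steps;
}
```

The counter returns to 0 after `(UCHAR_MAX + 1) / 2` steps without ever passing through 1, which is exactly the situation of `k != 1` in the original loop.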
Of course since the behavior is undefined, *anything* could happen.
I don't know what happened inside clang (or the minds of its
maintainers) that caused it to generate code that executes a
statement in the body of a function that's never called, but that's
just one of the infinitely many allowed behaviors. A quick look at the
generated code indicates that there's no x86-64 "retq" instruction
for either main() or hello(), and apparently control falls through
from the end of main() to the body of hello(). That seems weird.
0$, it applies the rules of sec 6.8.6 para 4, assumes that the loop must terminate, and therefore can be removed, and
It might just be a bug (but not one that, as far as I can tell,
violates the C standard).
A function whose body contains a construct that would have undefined
behavior if the function were called (not the case here) does not
cause undefined behavior if there are no calls to the function.
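A sketch of that last point, modelled on boo.c but with a name that no implementation is likely to treat specially. `boo` and `answer` are hypothetical names; nothing in the sketch calls `boo`.

```c
#include <limits.h>

/* Would overflow a signed int, and so have undefined behavior,
   if it were ever called. It is never called here. */
int boo(void)
{
    int big = INT_MAX;
    return big + 1;
}

/* The only function that is actually executed; its behavior is
   well defined regardless of what boo() would do. */
static int answer(void)
{
    return 42;
}
```

A program that calls only `answer` has no undefined behavior, even though the translation unit contains `boo`.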
In article <110dm6p$17r3s$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 10/06/2026 23:47, Keith Thompson wrote:
Right. ("for (;;);" in the original program does not.)
Note that the C++ special rule applies only when the condition is
equivalent to a constant `true` and the body of the loop is empty.
An implementation can "assume" that any other loop will eventually
finish.
The rule in C is (6.8.6.1p4):
An iteration statement may be assumed by the implementation
to terminate if its controlling expression is not a constant
expression, and none of the following operations are performed
in its body, controlling expression or (in the case of a for
statement) its expression-3
— input/output operations
— accessing a volatile object
— synchronization or atomic operations.
`for (;;)` is treated as having a constant controlling expression.
This covers more cases than the C++ rule.
I dislike it for most of the same reasons. It should be phrased
in terms of the permitted behavior of a program, not what an
implementation is allowed to "assume".
In addition to that, I dislike the whole idea. I think it's
intended to enable optimizations, but it means that for this
contrived program:
#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}
the implementation is allowed to "assume" that the loop eventually
terminates. It's not clear what permissions the implementation is being
given if the assumption is violated. I think the program could legally
print "never reached", but if violating the assumption implies undefined
behavior it could do anything.
A programmer could easily write a program similar to the above
and think that the meaning is perfectly clear, have it behave very
differently because of one obscure subclause in the standard.
The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."
The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such as
the compiler seeing that the "keep_going" variable is not volatile and
its value is never used, so assignments to it can be elided, or moving
other things outside the loop body).
I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,
#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif
...
for (int i = 0; i < n; i++) {
if (DOTHING) {
// Something complex here...
}
}
If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.
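A complete version of that fragment might look like the sketch below. `expensive_check` and `count_checked` are invented stand-ins for "something complex"; whether a given compiler actually removes the loop when `DEBUG` is undefined is not something the sketch checks.

```c
#include <stdbool.h>

#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif

/* Hypothetical stand-in for the complex work done in debug builds. */
static int expensive_check(int i)
{
    return i % 3 == 0;
}

/* Returns how many values in [0, n) passed the check. With DEBUG
   undefined the body is dead code and the result is always 0. */
static int count_checked(int n)
{
    int hits = 0;

    for (int i = 0; i < n; i++) {
        if (DOTHING) {
            hits += expensive_check(i);
        }
    }
    return hits;
}
```

With `-DDEBUG`, `count_checked(10)` counts 0, 3, 6 and 9 and returns 4; without it the function returns 0 and the loop has no observable effect.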
A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result of
bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.
Equally, I don't think it is likely that compilers will often be able to
use this rule to improve code generation - it would only help in a
situation where the loop's controlling expression is too complicated for
the compiler to be sure that it will terminate, but where the loop body
ends up effectively empty. I doubt if that turns up often in real code
either.
So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the "modern
compilers are evil" crowd, I really do not see it as an issue in
practice. There are many other mistakes programmers can make, or UB
that they hit accidentally - this is a drop in the ocean IMHO.
As I understand it, primarily by reading the C++ problem report,
which covers both C and C++ for background, the idea is to
guarantee forward progress for programs that make use of
threads: consider cooperatively-scheduled green threads; a
programmer who inadvertently creates an infinite loop shouldn't
be able to starve all threads for access to the CPU.
Personally, I don't think C should be in the business of doing
such things. But it is what it is.
- Dan C.
[...]
I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.
On 2026-06-09 03:25, Waldek Hebisch wrote:
[...]
Interesting views. - Thanks.
I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.
I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal
specifications.
On 10/06/2026 23:47, Keith Thompson wrote:
[...]
#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}
[...]
[...]
The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such as
the compiler seeing that the "keep_going" variable is not volatile and
its value is never used, so assignments to it can be elided, or moving
other things outside the loop body).
A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result of
bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.
[...]
So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the "modern
compilers are evil" crowd, I really do not see it as an issue in
practice. There are many other mistakes programmers can make, or UB
that they hit accidentally - this is a drop in the ocean IMHO.
[...]
Here's a C version with the same behavior:
```
term% cat weird.c
#include <stdio.h>
int
main(void)
{
for (unsigned int k = 0; k != 1; k += 2)
;
return 0;
}
void
hello(void)
{
printf("Hello, World!\n");
}
term% clang --version
clang version 22.1.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
term% ./weird
Hello, World!
term%
```
[...]
In article <110eht5$1naub$5@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.
I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal
specifications.
One hopes that a formal specification (that's a term of art, and
implies something that's mathematically precise) would be
accompanied by a commentary for more casual reading. However,
the truly precise, formal specification would be considered
definitive.
I think the odds of this ever happening for C are slim to none,
but it would be useful.
On 2026-06-09 03:25, Waldek Hebisch wrote:
[...]
Interesting views. - Thanks.
I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.
I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal
specifications.
David Brown <david.brown@hesbynett.no> writes:
[...]
"42" is an expression of type "int", and so is 'printf("Hello\n")'.
How (and why) would a language distinguish between them and allow one
but not the other?
Ada, Pascal, and similar languages do exactly this, for what many
people consider to be good reasons.
In both languages, functions and procedures are distinct. Functions
return values; procedures do not. An expression cannot be turned
into a statement just by adding a semicolon. A function call is
an expression. A procedure call is a statement, not an expression.
An assignment is a statement, not an expression.
For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).
[...]
[...]
The rough equivalent of the distinction between Pascal procedures and
functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to
distinguish between void and non-void like this. What cannot easily be
done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).
[...]
It is also fine for a language to distinguish between "pure" functions
and functions/procedures with side-effects and/or functions/procedures
with observable behaviour. (A "pure procedure" would not do anything.)
As far as I remember, Pascal does not make that distinction.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-06-09 03:25, Waldek Hebisch wrote:
[...]
Interesting views. - Thanks.
I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.
I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That not necessarily affects their acceptance will of more formal
specifications.
You snipped most of what I wrote. I certainly would prefer a standard
that is less lawyerish and more mathematical, say written in a similar
way to the Pascal standard. But there is a _big_ gap between normal
mathematical text and formal mathematical text (and let me note that
the Pascal standard is less formal than normal mathematics). Normal
mathematical text depends on human understanding to disambiguate
and bridge small inconsistencies. A formal one has parts which
are there only because the authors were not able to avoid
ambiguity in a simpler way. And once things are written in a way
that is well fit to formalism they tend to be much less
understandable to the uninitiated.
On 2026-06-10 00:34, Keith Thompson wrote:...
For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).
Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)
The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."
The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).
A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result
of bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.
On 2026-06-11 14:12, Janis Papanagnou wrote:
On 2026-06-10 00:34, Keith Thompson wrote:...
For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).
Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)
Hope is nice. I hope, in particular, that you're aware that there are
no guarantees on that matter?
I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,
#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif
...
for (int i = 0; i < n; i++) {
if (DOTHING) {
// Something complex here...
}
}
If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.
As I understand it, primarily by reading the C++ problem report,
which covers both C and C++ for background, the idea is to
guarantee forward progress for programs that make use of
threads: consider cooperatively-scheduled green threads; a
programmer who inadvertently creates an infinite loop shouldn't
be able to starve all threads for access to the CPU.
Personally, I don't think C should be in the business of doing
such things. But it is what it is.
On 2026-06-11 21:13, James Kuyper wrote:
On 2026-06-11 14:12, Janis Papanagnou wrote:
On 2026-06-10 00:34, Keith Thompson wrote:...
For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).
Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)
Hope is nice. I hope, in particular, that you're aware that there are
no guarantees on that matter?
Oh, actually I indeed thought that printing a constant string would not
create any error that would then be indicated by printf's return value.
[...]
I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,
#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif
...
for (int i = 0; i < n; i++) {
if (DOTHING) {
// Something complex here...
}
}
If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.
I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of
The manual page also notes for the cases where printf returns -1:
[snip error list]
[snip opengroup-links]
In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:
```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```
I see the same behavior.
The following largely repeats what I've written previously in
this thread.
Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:
An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...
means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.
I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).
In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:
```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```
I see the same behavior.
The following largely repeats what I've written previously in
this thread.
Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:
An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...
means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.
I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).
Why do you think the behavior is unspecified rather than undefined?
Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)
What are the "two or more possibilities" in this case?
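For comparison, a textbook case of unspecified behavior where the "two or more possibilities" are easy to name is the order in which function arguments are evaluated. The sketch below is only an illustration of the definition, not of the loop rule; all names in it are invented.

```c
/* Records the order in which first() and second() were called. */
static int trace[2];
static int ntrace;

static int first(void)  { trace[ntrace++] = 1; return 10; }
static int second(void) { trace[ntrace++] = 2; return 20; }

static int add(int a, int b) { return a + b; }

/* The order of the two calls in the argument list is unspecified:
   either order is allowed and the implementation need not document
   its choice. The sum is 30 either way. */
static int sum_in_some_order(void)
{
    ntrace = 0;
    return add(first(), second());
}
```

After the call, `trace` holds either {1, 2} or {2, 1}. Both outcomes are conforming, which is the kind of enumerated choice the definition describes.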
On 2026-06-11 21:13, James Kuyper wrote:
On 2026-06-11 14:12, Janis Papanagnou wrote:
On 2026-06-10 00:34, Keith Thompson wrote:...
For I/O, the equivalent of printf is a procedure. In C,
printf("Hello, world\n") returns a negative result to denote an
error (and that value is often ignored).
Erm, I hope that above printf() call does not create an error, but
returns the number of characters in the printed text. ;-)
Hope is nice. I hope, in particular, that you're aware that there are
no guarantees on that matter?
Oh, actually I indeed thought that printing a constant string would not
create any error that would then be indicated by printf's return value.
I'd indeed also expected that, say, printing a string value with a '%d'
specifier would produce an error, but I saw that it doesn't; while the
compiler creates just a warning, execution provides some random output
and a _non-negative_ string-length value as printf's return value. Not
exactly what I'd expect from a language.
Concerning the "guarantees" that you're asking for I sadly have to say
that I meanwhile expect nothing sensible at all any more from "C". ;-)
But to be more serious again...
The man-page is very unspecific on that; 'man 3 printf' says:
"If an output error is encountered, a negative value is returned."
Now of course an error can occur with that simple 'printf' above, for
example, by issuing an 'fclose (stdout);' before the 'printf (...);'
But what can I as a C-programmer derive from that; how would one act
on that. (That's just rhetorical.)
Obviously (because of that?) I've never seen anyone test such a call
by, say,
int rc = printf("Hello, world\n");
if (rc < 0) {
/* umm.. */
}
Are you - plural, all CLC audience - writing such code with 'printf()',
honestly? - Same question with 'int rc = fclose (...);' - what can one
do about that, then? (Write a logfile entry, maybe? - and then?)
But yes, I'm aware of negative OS function or library function output.--
Our rules (back in my C/C++ days) suggested to catch any sensible and possible error indications to quickly localize any potential issues.
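One way to act on the question above, sketched under assumptions: rather than branching on every printf() result, a program can check the write, flush the stream, and test the stream's error indicator at one point. The helper name say_hello is hypothetical and not from any post in this thread; fprintf, fflush and ferror are the standard <stdio.h> functions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a greeting to `out` and report whether the
 * output actually went through. Returns 0 on success, -1 on failure. */
int say_hello(FILE *out)
{
    assert(out != NULL);                 /* precondition */
    if (fprintf(out, "Hello, world\n") < 0)
        return -1;                       /* the formatted write failed */
    if (fflush(out) == EOF)
        return -1;                       /* buffered data could not be written */
    return ferror(out) ? -1 : 0;         /* error indicator set on the stream */
}
```

A caller might use the result to pick the exit status, for example `return say_hello(stdout) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;` (with <stdlib.h> included), which is about as much as a small program can usefully do with such an error.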
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:[...]
I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of
...unless 'i' or 'n' is modified in the body of
In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
In article <110cre9$13aa9$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
I see you did not read the other messages in the (sub)thread,
but ok, here it is again, in C:
```
term% cat what.c
#include <stdio.h>
int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
void hello(void) { printf("Hello, World!\n"); }
term% clang --version | sed 1q
clang version 22.1.6
term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
what.c:2:58: warning: for loop has empty body [-Wempty-body]
2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
| ^
what.c:2:58: note: put the semicolon on a separate line to silence this warning
1 warning generated.
term% ./what
Hello, World!
term%
```
I see the same behavior.
The following largely repeats what I've written previously in
this thread.
Apparently the authors of clang decided that this statement in N3220
6.8.6.p4:
An iteration statement may be assumed by the implementation to
terminate if its controlling expression is not a constant
expression, ...
means that a program that violates that assumption has undefined
behavior. I intensely dislike both the rule and the way it's stated,
but I agree that the conclusion that the behavior is undefined is
a reasonable one.
I think the behavior is technically "unspecified" in the sense of
the C standard, but yes, this is the important bit. The
controlling expression is not constant, and the loop doesn't meet
any of the other criteria set forth in sec 6.8.6 para 4;
therefore, the translator may assume it terminates (it is
unspecified whether or not it does; either behavior is correct.
GCC, for example, appears not to make the same assumption).
Why do you think the behavior is unspecified rather than undefined?
Unspecified behavior is defined as: "behavior, that results from
the use of an unspecified value, or other behavior upon which
this document provides two or more possibilities and imposes
no further requirements on which is chosen in any instance".
(Implementation-defined behavior differs from unspecified behavior
in that the implementation must document how the choice is made.)
What are the "two or more possibilities" in this case?
The two choices are that the implementation may assume the loop
terminates, or it may not, but it doesn't say which. I don't
think that the language permits it to be UB. But I could be
wrong. It's a bit of a distinction without a difference as far
as the outcome is concerned.
No, those are not the two choices. An assumption made by an
implementation is not behavior ("external appearance or action").
An implementation might invoke some behavior as a result of some
assumption.
If a loop doesn't terminate and the implementation assumes that
it does, the standard says nothing about the resulting behavior.
It doesn't provide two or more options for the actual behavior.
That's classic UB.
We've seen cases here where the actual behavior is falling through
into a function that's never called. That's certainly not a
possibility provided by the standard.
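For readers wondering why the loop in what.c can never exit on its own: unsigned arithmetic wraps modulo a power of two, so a counter that starts at 0 and steps by 2 stays even forever and never equals 1. The sketch below checks this by brute force. It uses unsigned char as a stand-in for unsigned int so that one full wrap-around is cheap to enumerate; that substitution and the function name are assumptions made for illustration, not part of the original program.

```c
#include <assert.h>
#include <stdbool.h>

/* Walk every value k takes in `for (k = 0; k != 1; k += 2)`, with k
 * narrowed to unsigned char. Returns true if k ever equals 1, and
 * false once k has wrapped all the way back to 0. */
bool loop_ever_reaches_one(void)
{
    unsigned char k = 0;
    do {
        if (k == 1)
            return true;
        assert(k % 2 == 0);              /* invariant: k stays even */
        k = (unsigned char)(k + 2);      /* wraps modulo UCHAR_MAX + 1 */
    } while (k != 0);
    return false;
}
```

Because the function returns false, the exit condition `k != 1` is never satisfied, which is what makes the original loop a genuine violation of the "may be assumed to terminate" permission rather than merely a slow loop.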
[...]
I suspect the original intent is as you said, to support removal
of "dead" loops where the body has been optimized away, or
excised using conditional compilation. Something like,
#ifdef DEBUG
#define DOTHING true
#else
#define DOTHING false
#endif
...
for (int i = 0; i < n; i++) {
if (DOTHING) {
// Something complex here...
}
}
If `DEBUG` is not defined in the preprocessor, the compiler has
license to elide the entire loop as part of dead code
elimination.
I think I see what you mean, but in this particular case the loop
can be proven to terminate unless `i` is modified in the body of
the loop, and a compiler can elide the entire loop anyway.
[...]
As I understand it, primarily by reading the C++ problem report,
which covers both C and C++ for background, the idea is to
guarantee forward progress for programs that make use of
threads: consider cooperatively-scheduled green threads; a
programmer who inadvertently creates an infinite loop shouldn't
be able to starve all threads for access to the CPU.
Personally, I don't think C should be in the business of doing
such things. But it is what it is.
I agree.
David Brown <david.brown@hesbynett.no> writes:
[...]
The idea of all this is given in a footnote in the C standards - "This
is intended to allow compiler transformations such as removal of empty
loops even when termination cannot be proven."
The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).
A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely? In my
experience, infinite loops are generally very clearly written - either
as "for (;;)" loops or "while (true)" loops - or they are the result
of bugs in the code that accidentally run forever. If the loop is
accidentally infinite, the programmer will already be expecting it to
run the code after the loop.
How about a loop that has a non-constant condition, but that is
not expected to terminate in normal usage?
while (! something_really_bad_happened()) {
sleep(1);
}
self_destruct();
A compiler could "assume" that the loop terminates, even if
something_really_bad never happens, and that assumption could result in
a call to self_destruct(). There are probably better ways to do that,
but it's straightforward code with seemingly obvious semantics that
an implementation is permitted to make unwarranted assumptions about.
[...]
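A possible way to keep a wait loop like the one above outside the permission, offered as a sketch rather than as anyone's recommendation in this thread: as I read the C11 wording (6.8.5p6), the assumption only covers loops that perform no input/output, access no volatile objects, and perform no synchronization or atomic operations. Polling an atomic flag in the controlling expression therefore should take the loop out of that category. The flag and both function names below are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag, set by whatever detects the failure.
 * It has static storage duration, so it starts out false. */
static atomic_bool something_bad;

void report_something_bad(void)
{
    atomic_store(&something_bad, true);
}

/* Spin until the flag is set. Every evaluation of the controlling
 * expression performs an atomic load. */
void wait_until_something_bad(void)
{
    while (!atomic_load(&something_bad)) {
        /* real code would sleep or yield here */
    }
    assert(atomic_load(&something_bad)); /* postcondition: flag was seen */
}
```

If another thread or a handler calls report_something_bad(), the waiter returns; if nothing ever does, the loop is intended to spin forever, and under the reading above an implementation may not assume otherwise.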
On 2026-06-11 18:30, Waldek Hebisch wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-06-09 03:25, Waldek Hebisch wrote:
[...]
Interesting views. - Thanks.
I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.
I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That does not necessarily affect their willingness to accept more formal
specifications.
You snipped most of what I wrote.
Yes, because I acknowledged it by my above on-line remark already
(and I didn't want to waste space unnecessarily). (No offense!)
I intended to comment just on the one paragraph above, with its
assumption that it may be an inherent problem to programmers.
To elaborate only a bit more...
There's folks who have problems with "lawyer's speech" standards.
There's folks who have problems with formal mathematical standards.
But, as to my observation, there's *no* strict or natural hierarchy
that one would imply the other.
You said: "They already struggle with current standard text."
as if there would be a strict "one implies the other" fact; there
isn't one, or to be more cautious, "there isn't necessarily one".
(I used the wording "necessarily" already in my original comment.)
I certainly would prefer standard
that is less lawyerish and more mathematical, say written in similar
way to Pascal standard. But there is a _big_ gap between normal
mathematical text and a formal mathematical text (and let me note that
Pascal standard is less formal than normal mathematics).
I agree.
Normal
mathematical text depends on human understanding to disambiguate
and bridge small inconsistencies. Formal one has parts which
are there only because authors were not able to avoid
ambiguity in simpler way. And once things are written in a way
that is well fit to formalism they tend to be much less
understandable to uninitiated.
(I'll leave that uncommented. - I've said all I intended to say.)
Janis
On 2026-06-11 08:56, David Brown wrote:
On 10/06/2026 23:47, Keith Thompson wrote:
[...]
#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}
[...]
[...]
The loop might originally have contained source code, but become empty
through pre-processing, or from other compiler transformations (such
as the compiler seeing that the "keep_going" variable is not volatile
and its value is never used, so assignments to it can be elided, or
moving other things outside the loop body).
A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?
I think we should not make any assumptions about the "creativity" of a programmer ("C" or else). - Semantics should be well defined, and then
clear to the programmer.
In my experience, infinite loops are generally very clearly written -
either as "for (;;)" loops or "while (true)" loops - or they are the
result of bugs in the code that accidentally run forever. If the loop
is accidentally infinite, the programmer will already be expecting it
to run the code after the loop.
[...]
So while I agree that this kind of thing can lead to curiosities and
behaviour that seems counter-intuitive, and is popular with the
"modern compilers are evil" crowd, I really do not see it as an issue
in practice. There are many other mistakes programmers can make, or
UB that they hit accidentally - this is a drop in the ocean IMHO.
Languages shall be sensibly and clearly defined. For bad designs (or
bad standards) the language or standard should be blamed, and not the
critics badly and inappropriately despised as ''"modern compilers are
evil" crowd''. - Programmers are at the final end of the "food chain".
And there's a lot of horrible pits in the C-language where programmers
"made the mistake" to fall in; don't blame them, neither the ones who silently suffer nor the ones who shout out.
On 2026-06-10 09:04, David Brown wrote:
[...]
The rough equivalent of the distinction between Pascal procedures and
functions is that procedures are like C functions that have "void"
return type. It's fine (and not at all a bad idea) for a language to
distinguish between void and non-void like this. What cannot easily
be done in a clear and consistent way is to distinguish between two
expressions of type "int" (or any other general non-void type).
Here I cannot follow you. - The C-compiler can analyze code to do optimizations and even (as so often stated) "assume" things about
the intent concerning UB and optimization but cannot value facts
about types and context? - If so, then it sounds rather arbitrary.
[...]
It is also fine for a language to distinguish between "pure" functions
and functions/procedures with side-effects and/or functions/procedures
with observable behaviour. (A "pure procedure" would not do anything.)
By "would not do anything" you probably mean that it would not have side-effects on/with relatively global entities in the program?
As far as I remember, Pascal does not make that distinction.
Pascal functions and procedures can affect and be affected by global entities. Predefined functions and procedures can have side effects
also unrelated to global entities in the program (e.g. print effect).
A procedure/function not affecting the global (or surrounding stack) environment could likely be identified. But here we're anyway talking
about the (clean!) return-interface of functions (as opposed to the procedures).
On 11/06/2026 17:34, Janis Papanagnou wrote:
On 2026-06-11 08:56, David Brown wrote:
On 10/06/2026 23:47, Keith Thompson wrote:
[...]
#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}
[...]
[...]
The loop might originally have contained source code, but become
empty through pre-processing, or from other compiler
transformations (such as the compiler seeing that the "keep_going"
variable is not volatile and its value is never used, so
assignments to it can be elided, or moving other things outside the
loop body).
A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?
I think we should not make any assumptions about the "creativity" of
a programmer ("C" or else). - Semantics should be well defined, and
then clear to the programmer.
I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not
be known to all C programmers, but I think they are only relevant in a
very small number of cases.
David Brown <david.brown@hesbynett.no> writes:
On 11/06/2026 17:34, Janis Papanagnou wrote:
On 2026-06-11 08:56, David Brown wrote:
On 10/06/2026 23:47, Keith Thompson wrote:
[...]
#include <stdio.h>
int main(void) {
bool keep_going = true;
while (keep_going) {
keep_going = true;
}
puts("never reached");
}
[...]
[...]
The loop might originally have contained source code, but become
empty through pre-processing, or from other compiler
transformations (such as the compiler seeing that the "keep_going"
variable is not volatile and its value is never used, so
assignments to it can be elided, or moving other things outside the
loop body).
A programmer /could/ write the "keep_going" loop you gave, and
mistakenly believe it to be infinite. But is it likely?
I think we should not make any assumptions about the "creativity" of
a programmer ("C" or else). - Semantics should be well defined, and
then clear to the programmer.
I think the semantics of this "loops can be assumed to terminate" are
clearly defined in the standard. I agree that the details might not
be known to all C programmers, but I think they are only relevant in a
very small number of cases.
I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
is specified in terms of what an implementation may "assume", not in
terms of the semantics of the program. One can conclude that this
means that the program has undefined behavior if the assumption is
violated, but that's not directly stated. I don't know how many C
programmers know the standard well enough to reach that conclusion.
I'm not even 100% sure it's accurate.
The permission was added in C11 with little fanfare. It's not
mentioned in the list of major changes in the C11 Foreword.
The cases where it applies may be rarer than I had assumed, but
it at least has the potential to break existing code that was well
defined in C99.
The rationale is to provide more opportunities for optimization,
but it's not at all clear (at least to me) that it's particularly
successful. If cases where it can cause problems are rare, then
presumably cases where it's actually useful are rare. (That may
be an oversimplification.)
[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.
I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.
cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
Might like to have a look at the video
"Garbage In, Garbage Out, Arguing about Undefined Behavior
with Nasal Demons" (2016) by Chandler Carruth.
IIRC it essentially takes the point of your friend, but maybe adds
some explanations. At 15' in, it discusses the suggestion to
"define all the behavior". It's for C++, but I think some of it
might apply to C as well. At 24' come some examples.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-06-11 18:30, Waldek Hebisch wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 2026-06-09 03:25, Waldek Hebisch wrote:
[...]
Interesting views. - Thanks.
I think biggest trouble is normal programmers. They already
struggle with current standard text. More formal presentation
could alienate even folks who now are able to explain standard
rules to other programmers.
I'm not sure what "normal programmers" are. From own experience
I can just say that there's a difference between what's "formal"
in a "lawyer's speeches and texts" sense and what's formal in a
mathematical sense. - The C-Standard as had been quoted here is
more of a lawyer's text, with its inherent property of not being
formally (in a mathematical sense) accurate (despite their tries;
in both areas, law and programming language, respectively). It's
thus not necessarily a problem if we'd have a more [mathematical]
formal standard. - Programmers, as I see it, need definite texts.
And rejection of the "lawyer's" sort of texts is not surprising.
That does not necessarily affect their willingness to accept more formal
specifications.
You snipped most of what I wrote.
Yes, because I acknowledged it by my above on-line remark already
(and I didn't want to waste space unnecessarily). (No offense!)
I intended to comment just on the one paragraph above, with its
assumption that it may be an inherent problem to programmers.
But this paragraph was closely linked to the text above. Dan Cross
wanted formal semantics and my paragraph was responding to this.
I think that lawyerish style of current C standard is mostly inertia,
and making standard more mathematical would improve it. But giving
formal semantics in the standard would mean a significantly bigger
change.
I think biggest trouble is normal programmers.
They already struggle with current standard text.
[...]
On 11/06/2026 20:29, Janis Papanagnou wrote:
[...]
I think this thread is getting difficult to follow - there is a lot of wandering and vagueness (mostly from me, I must admit). So I am not
sure if it is worth pursuing further.
[...]
In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.
Eh...I think those people have a point.
Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.
But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.
Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.
Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.
I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.
That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.
I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.
The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.
And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.
As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.
- Dan C.
On 13/06/2026 14:02, Dan Cross wrote:
In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.
Eh...I think those people have a point.
Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.
I think it is important for tools to be helpful, and it's fine to
complain if a tool is being directly unhelpful - or ask for improvements
when you think it could be better.
But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.
It can certainly happen, yes. And I fully sympathise on these few
occasions when changes to the standard have meant that code that
previously had defined behaviour, now has different or undefined
behaviour. (However, I think that for some kinds of code, programmers
could be better at specifying exactly what standards their code
requires, and the standards they use when compiling code.)
But it is important to realise that if you write code with UB, it is
/your/ mistake - not the mistake of the compiler developers, or the
mistake of the standards authors. Compiler vendors can (and do!) try to
help programmers find their mistakes - experience shows, however, that
many programmers reach first for bug report forms or complaints in
forums before compiler tools like sanitisers or even enabling warnings
on their builds.
Programming in C is a cooperative effort - including the standards
authors, the compiler vendors, and the C programmers. Each group can
try to help the others, but each is ultimately responsible for their own
part.
Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.
I think Regehr has made some good points in his writings, but I do not
agree with him on everything.
As a programmer, I am a fan of the concept of UB. I am quite happy with
the idea that operations have a pre-condition, and that if there is no
"right answer" for a given input, I should not provide that input. I
prefer that signed integer arithmetic overflow is UB, and do not want it
to be wrapping or have some other semantics - to me, it is far clearer
that way. If I have UB in my code, it's a bug - no different from any
other bug I might make.
It is the case that in C, there are some kinds of UB that can be quite
subtle. However, you rarely need to risk meeting them. Yes, there are
pitfalls - don't go near them, and they don't matter.
However, it is unfortunately the case that sometimes avoiding UB can be
costly in performance terms. An example would be if you have need of
type-punning - perhaps you have a float in memory and you want to access
it as a uint32_t for some reason. Casting a float * to a uint32_t *
and using that new pointer is UB. Some compilers will nonetheless
generate the code you want after such a cast. Some compilers might not,
depending on details of the rest of the surrounding code, because it is
UB. A non-UB solution would be to use memcpy(), or a type-punning
union. For highly optimising compilers, that's fine - the code
generated by gcc or clang for a memcpy() here is likely to be as
efficient as you could get - directly reading the float from memory to
an integer register. For other compilers, however, you might get a call
to a memcpy() library function in an external DLL, taking orders of
magnitude more cycles. What is the poor programmer to do? Write code
that is portable and correct, but very slow with some implementations?
Write code that "cheats" and is efficient on some implementations but
might not give the desired results on others? Use pre-processor
monstrosities to detect different compilers and adapt accordingly? That
is what I see as the biggest issue resulting from compiler optimisation
based on UB. I don't know what the "best" answer here is.
Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.
I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.
I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.
I am happy knowing that I cannot divide by 0,
or find the square root of a negative number (in the real
domain).
I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,
and that I cannot call a function with a different number or
type of parameters than its definition.
I have a great deal of difficulty seeing how things could be
any different, other than in a managed language with significant
overhead from run-time checks - and that goes against the
"aggressively optimised" requirement.
Having "well-defined semantics" does not mean the language should accept
anything that happens to fit the syntax and grammar rules, or that all
functions and operations should give a defined result for all inputs.
It means that the set of valid inputs is clearly defined, along with the
outputs and effects you get when the inputs are valid.
(There are plenty of points in the C standards where the wording could
make the semantics clearer, or where the range of input values could
easily have been larger - I am not suggesting C is as well-defined as it
could reasonably be.)
That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.
Communication between the separate parties is always an issue, and it is
easy for it to be a one-way street with a language standards committee
dictating the rules with little attention to feedback, then compiler
vendors following these rules without listening to the users.
A challenge here, perhaps, is that users are a very diverse group. How
much should compiler vendors cater for those that put a lot of effort
into correctness and want top efficiency, or those that are less
knowledgeable about the language but want to avoid the consequences of
their mistakes? What about those working with old code written for
different compilers with different unwritten rules? It is not easy to
please everyone.
I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.
The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.
I think that is a good way to handle the situation. In my projects, I
do not normally upgrade or change toolchains. While I think the risk of
UB is small in my own code, small does not mean non-existent. And for
my work, generated code that behaves correctly in terms of C semantics
but has different execution times or code size might also be an issue -
so changes in toolchains mean a lot of extra testing and qualification.
In addition, for some microcontrollers the toolchains have relatively
small user bases and consequently higher risks of unknown bugs in the
toolchains themselves. Sometimes there are also implementation-specific
features that change between versions (though that is less of an issue
these days).
And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.
Being dead does not absolve you of the responsibility - the person that
wrote the code with UB is the person who wrote the code with the UB,
just like any other bugs. That person wrote the code with the error.
It might not be fair to hold it against them - there are a great many
possible reasons why it was not their fault (typically management is
more at fault than the coders!). And placing blame is rarely a useful
exercise - usually it does not matter where the bugs came from, only
that they are there and need to be fixed or worked around.
As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.
Agreed.
I'm not a huge fan of Carruth.
In article <110k0mp$329k6$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
On 13/06/2026 14:02, Dan Cross wrote:
In article <110ghmv$21vi3$1@dont-email.me>,
David Brown <david.brown@hesbynett.no> wrote:
[snip]
As for my '"modern compilers are evil" crowd' comment, there are people
(not anyone involved in this discussion) who really do fall into that
camp. I've seen people who are experienced and respected developers
make all sorts of accusations to compiler developers, claiming they are
only interested in high scores on synthetic benchmarks and directly
insulting their motivations and integrity, blaming them for "breaking"
their code that relied on the effects of some kinds of UB. It is always
frustrating when you have code that works fine with one compiler
version, but using another compiler results in failure due to UB in your
code - especially if writing correct code gives inefficient results with
the first compiler. And it's fine to say you'd be happier if a
particular thing that is UB in C were not UB - but it is unreasonable to
blame compiler developers for implementing the language as it is defined.
Eh...I think those people have a point.
Note, I don't think that "modern compilers are evil" (I mean,
wow, that's a strong word) and I certainly do not think it is
appropriate to malign the people who write them personally over
what one does with code.
I think it is important for tools to be helpful, and it's fine to
complain if a tool is being directly unhelpful - or ask for improvements
when you think it could be better.
Yes.
But I _do_ think it is fair to say that UB is very easy to fall
into in C, that programs that have worked correctly (insofar as
their intended behavior as written) for years can suddenly fail
because latent UB is treated differently in a point revision of
a compiler, and that that (as you point out) can be incredibly
frustrating for the authors.
It can certainly happen, yes. And I fully sympathise on these few
occasions when changes to the standard have meant that code that
previously had defined behaviour, now has different or undefined
behaviour. (However, I think that for some kinds of code, programmers
could be better at specifying exactly what standards their code
requires, and the standards they use when compiling code.)
But it is important to realise that if you write code with UB, it is
/your/ mistake - not the mistake of the compiler developers, or the
mistake of the standards authors. Compiler vendors can (and do!) try to
help programmers find their mistakes - experience shows, however, that
many programmers reach first for bug report forms or complaints in
forums before compiler tools like sanitisers or even enabling warnings
on their builds.
Programming in C is a cooperative effort - including the standards
authors, the compiler vendors, and the C programmers. Each group can
try to help the others, but each is ultimately responsible for their own
part.
Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1989; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.
"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.
Even once it was a part of C, the concept was communicated
poorly.
Some people seem to delight in this, believing precision in
interpreting the standard in abstruse ways is an expression of
deep technical expertise; but it really is not.
Yes, UB is created by programmers. However, in large systems,
it may be that it was created inadvertently; someone makes a
change that subtly invalidates some invariant that an unknown
caller far away in the code base (or in another one that relies
on the change via an indirect dependency) depends on, and now
you've got UB; locally, everything appears correct; but it's the
combination where the UB manifests.
Regehr called out a dichotomy with UB: programmers using a
language hate it; compiler writers love it.
I think Regehr has made some good points in his writings, but I do not
agree with him on everything.
As a programmer, I am a fan of the concept of UB. I am quite happy with
the idea that operations have a pre-condition, and that if there is no
"right answer" for a given input, I should not provide that input. I
prefer that signed integer arithmetic overflow is UB, and do not want it
to be wrapping or have some other semantics - to me, it is far clearer
that way. If I have UB in my code, it's a bug - no different from any
other bug I might make.
This example makes little sense to me. If you don't want
integer overflow, then don't overflow; the techniques for
avoiding it are pretty well known. But why is it specifically
better that it is UB, rather than trapping in debug
builds, or having IB semantics based on the underlying machine?
It seems to me that the burden on the programmer is the same.
It is the case that in C, there are some kinds of UB that can be quite
subtle. However, you rarely need to risk meeting them. Yes, there are
pitfalls - don't go near them, and they don't matter.
I disagree. I think almost all non-trivial programs have UB to
a greater or lesser extent, whether they intend to or not.
However, it is unfortunately the case that sometimes avoiding UB can be
costly in performance terms. An example would be if you have need of
type-punning - perhaps you have a float in memory and you want to access
it as a uint32_t for some reason. Casting a float * to a uint32_t *
and using that new pointer is UB. Some compilers will nonetheless
generate the code you want after such a cast. Some compilers might not,
depending on details of the rest of the surrounding code, because it is
UB. A non-UB solution would be to use memcpy(), or a type-punning
union. For highly optimising compilers, that's fine - the code
generated by gcc or clang for a memcpy() here is likely to be as
efficient as you could get - directly reading the float from memory to
an integer register. For other compilers, however, you might get a call
to a memcpy() library function in an external DLL, taking orders of
magnitude more cycles. What is the poor programmer to do? Write code
that is portable and correct, but very slow with some implementations?
Write code that "cheats" and is efficient on some implementations but
might not give the desired results on others? Use pre-processor
monstrosities to detect different compilers and adapt accordingly? That
is what I see as the biggest issue resulting from compiler optimisation
based on UB. I don't know what the "best" answer here is.
This is kind of my point. If you need a fast way to convert
Here's my own vignette: I was chatting with a friend who works
on LLVM and clang some time ago. I said, "I don't want UB" and
he replied, "no, you really do." I asked him what he meant and
he responded that I wanted a compiler that is capable of
optimizing my program; "sure, but I still don't want UB." We
went on for a bit, and it became clear that he saw UB as _the_
vehicle for unlocking optimization.
I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.
I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.
UB is literally the opposite of well-defined.
I am happy knowing that I cannot divide by 0,
Yup. That should be a trap.
or find the square root of a negative number (in the real
domain).
Yup. That should be a trap.
I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,
Yup. That should be a trap (if you want wrapping semantics, you
should request it explicitly).
and that I cannot call a function with a different number or
type of parameters than its definition.
Yup. That should be a compile-time error.
I have a great deal of difficulty seeing how things could be
any different, other than in a managed language with significant
overhead from run-time checks - and that goes against the
"aggressively optimised" requirement.
There are existence proofs of other languages that can, and do,
do these things, and do them well. I hate to keep beating this
drum, but I think Rust does well here: in safe Rust, UB is a
compile-time error; in *unsafe* Rust, there are tools to help
find where programmers violate the language's invariants.
Having "well-defined semantics" does not mean the language should accept
anything that happens to fit the syntax and grammar rules, or that all
functions and operations should give a defined result for all inputs.
I never said that it did.
It means that the set of valid inputs is clearly defined, along with the
outputs and effects you get when the inputs are valid.
So I was the one who said "well-defined semantics" and I had a
specific meaning in mind. Your definition is incomplete with
respect to that meaning: in addition to what you said, invalid
inputs should be rejected, either as a compile time error, or by
generating an exception or panic at runtime. If you want to
live dangerously and turn the runtime checks off for performance
reasons, then you get 2's complement behavior for integers or
whatever the machine does for the others.
(There are plenty of points in the C standards where the wording could
make the semantics clearer, or where the range of input values could
easily have been larger - I am not suggesting C is as well-defined as it
could reasonably be.)
It's not just that it's nowhere close to being as well-defined
as it should be, it's because the language as defined permits
behavior that varies far too widely, specifically because of UB.
Consider one of the examples you gave: signed integer overflow.
The standard doesn't say that you _can't_ add two numbers
together if you overflow, it just says that if you do, the
language imposes no requirements on the resulting behavior. It
may trap, it may elide the addition entirely, or it may do it
and let the result be whatever the underlying machine does.
That is, the _language_ does not say that it's a bug; it says
that it's not going to say anything about it at all.
This is one reason the committee is trying to rein some of this
in.
That, I think, is the tension: there was a fundamental breakdown
in communication between the users of the language, and those
defining and implementing it. My subjective sense is that in
the past few years things are getting somewhat better, but it is
hard to evolve something as critical and widely used as C.
Communication between the separate parties is always an issue, and it is
easy for it to be a one-way street with a language standards committee
dictating the rules with little attention to feedback, then compiler
vendors following these rules without listening to the users.
A challenge here, perhaps, is that users are a very diverse group. How
much should compiler vendors cater for those that put a lot of effort
into correctness and want top efficiency, or those that are less
knowledgeable about the language but want to avoid the consequences of
their mistakes? What about those working with old code written for
different compilers with different unwritten rules? It is not easy to
please everyone.
I think that's simplistic; not many programmers actively want to
"avoid the consequences of their mistakes." Do you really
believe that they do? If so, why?
Conversely, there *is* this kind of machismo attitude among many
C programmers that it requires a superior intellect to truly
understand this language, and those who do not (or who make any
mistake in their understanding) are simply unworthy. I have
repeatedly observed this over many decades now, and when I see
it, I think that it is odious.
My experience is that most programmers are highly intelligent,
capable people. They are not wrong to want behavior they can
rely on, particularly when things are not obvious, as they
often are not. They also want a language that requires a less
lawyerly reading to understand its semantics; that could go the
way of formality (my preferred approach) or just clearer
exposition. Either would be preferable to the current state.
In fairness, I think the current members of the committee
recognize this.
I am not in any way saying that critics of aspects of C (the language,
the standards, or compiler implementations) should be dismissed or
despised - merely that the example of loop elimination leading to UB and
unexpected results is regularly used as "evidence" by those that hold
extreme positions about C, despite it being very unrealistic for the
issue to cause problems in real coding practice.
The kernel I am working on has about 5 million lines of code.
That code has been evolving for 40 years; some of it predates
the ISO standards and even the ANSI standard. It has been
updated for newer compilers, sure, but in some places the
treatment is surface-level: using ISO-style function prototypes
and definition syntax, for example. But deep problems remain in
parts, and constraints on engineering resources couple with
economic and business pressures so that it's not going to get
cleaned up any time soon. I'm sure there is UB in it; in fact,
I know there is. But them's the breaks; and yet, customers are
using it in production. Because of this, upgrading toolchains
is laborious and complex, and takes a lot of time, and new
compilers are (rightly) viewed with suspicion. That is not a
great situation, but I don't think anyone is angry at the
compiler people over it.
I think that is a good way to handle the situation. In my projects, I
do not normally upgrade or change toolchains. While I think the risk of
UB is small in my own code, small does not mean non-existent. And for
my work, generated code that behaves correctly in terms of C semantics
but has different execution times or code size might also be an issue -
so changes in toolchains mean a lot of extra testing and qualification.
Obviously in a production setting tools should be tested and
qualified. But the danger posed by UB adds unacceptable risk on
large projects, and the burden for updating a toolchain is too
high. That is as much an indictment of the language as of any
particular project.
As a counter example, there was the Harvey project, which was a
fork of Plan 9 where the Plan 9 C dialect was replaced with ISO
C; we accounted for this by having CI build with 6 separate
compilers; this flushed out a lot of bugs.
I am surprised that more projects do not adopt canary CI builds
against newer toolchains.
In addition, for some microcontrollers the toolchains have relatively
small user bases and consequently higher risks of unknown bugs in the
toolchains themselves. Sometimes there are also implementation-specific
features that change between versions (though that is less of an issue
these days).
Fun fact: part of the reason Google got involved in clang and
LLVM development was because the vendor toolchain for a
particular microcontroller used in Android phones was buggy and
would crash (that is, the compiler itself crashed). The
solution was not to live with it; it was to build a better
toolchain.
Google could afford to do that; I recognize not many
organizations can.
And just as it's not acceptable to blame compiler writers for
implementing the language as it is defined, it's not really
acceptable to blame programmers either; some of the people who
put the UB there are (literally) dead, and there's just not
enough time in the day to go clean it all up. I wish there was
more compassion for that.
Being dead does not absolve you of the responsibility - the person that
wrote the code with UB is the person who wrote the code with the UB,
just like any other bugs. That person wrote the code with the error.
See above. Those people may well have written the code before C
was standardized and before UB as we know it now existed. Also,
by definition UB is not an error.
It might not be fair to hold it against them - there are a great many
possible reasons why it was not their fault (typically management is
more at fault than the coders!). And placing blame is rarely a useful
exercise - usually it does not matter where the bugs came from, only
that they are there and need to be fixed or worked around.
Exactly. The footguns hiding in C code that has worked
perfectly for decades, dating back to before the standards
existed, are legion. Caveat emptor.
_Or_ the code may have been written with careful regard for the
standard, but something _else_ may have been changed that now
leads to exposure to UB. For example, perhaps code was written
that multiplies two numbers, `a*b`; `a` is known to be `unsigned int`
when written, but `b` is a signed int. But maybe that is hidden
behind a typedef; some time in the future, the typedef is
changed so that `a` is now `unsigned short`; perhaps someone
realized that the domain values never exceed 16 bits and by
changing the definition some critical structure now fits in a
single cache line. But also now the type promotion rules kick
in so that `a*b` happens with the factors as `signed int` and
there exist values of `a` and `b` where `a*b` overflows: UB.
The code had no UB; the change was elsewhere; no one saw this
because the tests all passed and everything looked ok; then
someone upgrades the compiler and now things break.
Whose fault is that?
And no, this is not contrived; this is exactly the sort of thing
that happens on large, long-lived projects.
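A minimal sketch of the scenario described above, assuming a platform where `int` is 32 bits and `unsigned short` is 16 bits. The names `factor_t`, `scale_risky` and `scale_safe` are hypothetical and invented only for illustration; they do not come from any real code base.

```c
/* Hypothetical typedef. Assume it used to be `unsigned int` and was
   later narrowed so that some structure fits in a cache line. */
typedef unsigned short factor_t;

/* Written when factor_t was unsigned int: b was converted to unsigned
   and the product wrapped, which is well defined. After the typedef
   change, a is promoted to (signed) int, so a*b is a signed
   multiplication that can overflow, which is undefined behaviour. */
unsigned int scale_risky(factor_t a, int b)
{
    return a * b;
}

/* One possible repair: force the arithmetic into unsigned int so the
   result no longer depends on what factor_t happens to be. */
unsigned int scale_safe(factor_t a, int b)
{
    return (unsigned int)a * (unsigned int)b;
}
```

With both arguments equal to 65535, `scale_safe` stays within `unsigned int`, while `scale_risky` multiplies two `int` values whose product exceeds `INT_MAX` on such a platform.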
As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.
Agreed.
...but be careful blaming the programmer.
cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
I'm not a huge fan of Carruth.
(Text after "| " below was generated by a chatbot asked to explain
narrow contracts and the reduction of efficiency by defining UB.)
(Let me guess: You are not a huge fan of chatbots either!
Ok, that was easy.)
Chandler talked about how narrow contracts allow optimizations.
| - Wide Contract: The function guarantees to handle all possible inputs
| gracefully, usually by returning an error code or throwing an
| exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
|
| - Narrow Contract: The function only guarantees correct behavior if
| the caller meets specific preconditions. If the preconditions are
| violated, the behavior is undefined.
|
| When is it appropriate to have a narrow contract? Always, when
| performance, memory footprint, or direct hardware control are
| paramount. In operating system kernels, embedded systems, real-time
| applications, and high-performance computing, the overhead of
| validating every pointer, checking every array bound, and verifying
| every integer range is unacceptable.
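To make the two contract styles concrete, here is a small sketch in C. The function names `sum_narrow` and `sum_wide` and the `OK` code are hypothetical; only `ERR_NULL_PTR` is taken from the quoted text.

```c
#include <stddef.h>

enum { OK = 0, ERR_NULL_PTR = -1 };

/* Narrow contract: the caller must pass a non-null pointer to at least
   n readable ints. Violating that precondition is undefined behaviour. */
long sum_narrow(const int *p, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Wide contract: a null pointer is detected and reported as an error.
   Even here, nothing can check that p really points to n valid ints. */
int sum_wide(const int *p, size_t n, long *out)
{
    if (p == NULL || out == NULL)
        return ERR_NULL_PTR;
    *out = sum_narrow(p, n);
    return OK;
}
```

The wide version pays for a test on every call; the narrow version moves that responsibility to the caller.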
UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen.
Making this UB is an admission of the blindingly obvious - there is no correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.
I am happy knowing that I cannot divide by 0,
Yup. That should be a trap.
For some programs, yes. For others, no.
I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for
it.
David Brown <david.brown@hesbynett.no> writes:
[...]
UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen.
No, it means that the implementation can make that choice (or allow you
to make that choice). A conforming compiler could generate code on the assumption that signed overflow never happens, and not give the
programmer any options.
[...]
Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.
Trapping or raising/throwing an exception on overflow would also be an admission of the blindingly obvious.
And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.
Of course those kinds of checks are not in the "spirit of C".
[...]
I am happy knowing that I cannot divide by 0,
Yup. That should be a trap.
For some programs, yes. For others, no.
What's the difference between these programs?
[...]
I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for
it.
I don't want to pay the price of checking for syntax errors when I know
my code is syntactically correct. But I never know that, because I'm fallible.
I admit that's not a very strong argument. There are real differences between compile-time and run-time checks.
On 15/06/2026 00:55, Keith Thompson wrote:<snip>
David Brown <david.brown@hesbynett.no> writes:
[...]
Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.
Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.
It is obvious - to me, anyway - that signed overflow is a mistake in the code. It is trying to do something that cannot be done. What is the single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.
Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.
And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.
Sure - but in practice having strict overflow checks would significantly reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.
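A sketch of the "a + b - a" point, assuming GCC or Clang: `__builtin_add_overflow` and `__builtin_sub_overflow` are compiler extensions, not standard C, and the function names `plain` and `strict` are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* With signed overflow undefined, a compiler is allowed to simplify
   this to `return b;`. */
int plain(int a, int b)
{
    return a + b - a;
}

/* Strictly checked evaluation: reports failure as soon as an
   intermediate step overflows, even though the mathematical result
   (b) is always representable. */
bool strict(int a, int b, int *out)
{
    int t;
    if (__builtin_add_overflow(a, b, &t))
        return false;
    if (__builtin_sub_overflow(t, a, out))
        return false;
    return true;
}
```

For `a == INT_MAX` and `b == 1`, `strict` fails on the intermediate sum, so a compiler required to preserve that failure could not reduce the expression to `b`.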
The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.
David Brown <david.brown@hesbynett.no> wrote:
On 15/06/2026 00:55, Keith Thompson wrote:<snip>
David Brown <david.brown@hesbynett.no> writes:
[...]
Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.
Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.
It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.
Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.
And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.
Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.
IMO reasonable and easy definition is: computation either delivers
mathematically correct result or traps, and it is not allowed to
trap in cases where naive bottom-up evaluation does not trap.
In more formal way optimization is not allowed to introduce
stronger precondition, but may weaken it.
<snip>
The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.
What is value of certification required for some software? If
programmer did good job then program will work correctly.
Trap give assurance that programmer indeed correctly handled
tricky problem.
And once you know that computation works
according to math rules other forms of verification are easier.
You also seem to have bias to real time control: if you need
value just at given moment, then it is hard to do something
reasonable. But at least in some control areas there is
notion of "safe state", for example working heavy machine
is dangerous, stopped one usually is considered safe. If
there is safe state, then anything not expected by program
should trigger transition to safe state.
In general computation, if you need correct value and have some
time there are options which may involve re-doing computation at
higher precision, which may get rid of occasional overflows
and divisions by zero due to overflow. Division by zero may
be due to bad input data, traps allow identification of
such data (doing it in other way may be computationally quite
expensive).
ram@zedat.fu-berlin.de (Stefan Ram) writes:
cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
I'm not a huge fan of Carruth.
(Text after "| " below was generated by a chatbot asked to explain
narrow contracts and the reduction of efficiency by defining UB.)
(Let me guess: You are not a huge fan of chatbots either!
Ok, that was easy.)
Chandler talked about how narrow contracts allow optimizations.
| - Wide Contract: The function guarantees to handle all possible inputs
| gracefully, usually by returning an error code or throwing an
| exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
|
| - Narrow Contract: The function only guarantees correct behavior if
| the caller meets specific preconditions. If the preconditions are
| violated, the behavior is undefined.
|
| When is it appropriate to have a narrow contract? Always, when
| performance, memory footprint, or direct hardware control are
| paramount. In operating system kernels, embedded systems, real-time
| applications, and high-performance computing, the overhead of
| validating every pointer, checking every array bound, and verifying
| every integer range is unacceptable.
I have a recollection that a version of IBM's MVS operating
system did, indeed, validate input and output arguments to kernel
functions.
Indeed, Google says it was called MVS/SP and later MVS/XA (extended addressing).
On 15/06/2026 12:43, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 15/06/2026 00:55, Keith Thompson wrote:<snip>
David Brown <david.brown@hesbynett.no> writes:
[...]
Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.
Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.
It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.
Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.
And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.
Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.
IMO reasonable and easy definition is: computation either delivers
mathematically correct result or traps, and it is not allowed to
trap in cases where naive bottom-up evaluation does not trap.
In more formal way optimization is not allowed to introduce
stronger precondition, but may weaken it.
It is always the case that an implementation can weaken preconditions
and strengthen postconditions and remain correct - though it might then
be less efficient than you expect. But if you are /requiring/ a weaker precondition and /requiring/ a strong postcondition - such as by
insisting on traps on overflow - you are changing the function or
operation specification, and it is not necessarily a good thing.
In C, the integer addition operation "c = a + b;" has a precondition :
(a + b) <= INT_MAX, (a + b) >= INT_MIN
It has the postcondition :
c == a + b
Saying that it must trap if there is overflow weakens the precondition
to any "a" and "b", but makes the postcondition much more complicated.
It means it is no longer true that the result of an addition operation
is the sum of the operands.
Addition is no longer a "pure" function -
now it has side-effects that are completely unpredictable at the site of use. Programmers can no longer rely on the timing of the operation,
stack usage, interaction with other code, or even that the operation
ever finishes.
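The precondition stated above can be tested before the addition, in standard C, without the test itself overflowing. This is only a sketch; `add_in_range` and `checked_add` are hypothetical helper names.

```c
#include <limits.h>
#include <stdbool.h>

/* True when INT_MIN <= a + b <= INT_MAX, computed so that the
   comparison itself can never overflow. */
bool add_in_range(int a, int b)
{
    if (b > 0)
        return a <= INT_MAX - b;
    return a >= INT_MIN - b;
}

/* Performs the addition only when the precondition holds, so the
   postcondition *c == a + b is true whenever this returns true. */
bool checked_add(int a, int b, int *c)
{
    if (!add_in_range(a, b))
        return false;
    *c = a + b;
    return true;
}
```

Here the caller decides what a failed precondition means at the point of use, rather than having the operation itself trap.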
If your code is correct, and overflow never happens, then this is all a
big disadvantage in terms of understanding and analysing the code. And
it does not in any way reduce the effort needed to be sure that your
inputs are appropriate for getting the desired results of the operation.
Trapping like this can certainly be useful for debugging. But as a
general feature it gives a false sense of security, complicates
mathematical analysis, introduces massive additional possible code path choices which are either real or almost certainly untested in practice,
or not real (because the compiler can see they are not taken) and untestable.
That is not qualitatively worse than "who knows what will
happen" UB, but it is not significantly better.
<snip>
The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.
What is value of certification required for some software? If
programmer did good job then program will work correctly.
Yes.
Trap give assurance that programmer indeed correctly handled
tricky problem.
No, it certainly does not. And one of the reasons to dislike traps is
that it makes people think like that. A trap can only happen if the programmer did /not/ handle the problem correctly.
And I expect that if
the programmer is able to write an appropriate specific trap handler for
the failing expression (rather than a program-global "crash with error message" handler), then he/she would be able to avoid the problem in the first place.
Sometimes, of course, you are trying to write code that has some input
which is supposed to be correct, but you are not sure - and you can't
change the calling code. How you handle that situation will depend on
the program and the situation. But I don't see trapping as "correct handling" unless the whole program is written with the expectation of
traps for error handling. You might, however, end up deciding that
trapping is the least bad option.
And once you know that computation works
according to math rules other forms of verification are easier.
You also seem to have bias to real time control: if you need
value just at given moment, then it is hard to do something
reasonable. But at least in some control areas there is
notion of "safe state", for example working heavy machine
is dangerous, stopped one usually is considered safe. If
there is safe state, then anything not expected by program
should trigger transition to safe state.
I think if you are /not/ concerned with high efficiency in the code,
then you should be seriously questioning the choice of C as the language
in the first place. And even if you use C, there are often things you
can do to avoid having problems in the first place. The obvious one for integer overflow is to make more use of bigger types.
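As a small illustration of the "bigger types" approach, here is a sketch that multiplies two 32-bit values in 64-bit arithmetic. The function name `mul_wide` is hypothetical.

```c
#include <stdint.h>

/* The cast widens a before the multiplication, so the product of any
   two int32_t values is computed in int64_t and cannot overflow. */
int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```

The largest possible magnitude, `INT32_MIN * INT32_MIN`, is 2^62, which fits comfortably in `int64_t`.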
In general computation, if you need correct value and have some
time there are options which may involve re-doing computation at
higher precision, which may get rid of occasional overflows
and divisions by zero due to overflow. Division by zero may
be due to bad input data, traps allow identification of
such data (doing it in other way may be computationally quite
expensive).
[snip]
Maybe there is scope for compilers to have better options for handling
old code, other than the usual "Use -O0 to avoid optimising on UB"
solution. You could come a long way with a "treat all variables as
volatile" flag, for example.
[snip]
That can certainly happen. But that's just bugs in the code. I don't
see why UB should be considered as something special here.
People making changes to existing code sometimes misunderstand things, or
accidentally break something that worked before. That's life as a
programmer, and there are techniques to reduce the risk - code reviews,
linters, testing regimes, etc. Nothing gives 100% guarantees, and
everything has to weigh risks, consequences, costs and resources. UB is
not special here.
UB means precisely that I can choose trapping, or IB, or optimising on
the assumption it does not happen. If signed integer overflow were
defined as wrapping, then compilers could not put in traps to catch the
errors because as far as the language is concerned, they are not errors.
If they are defined as causing traps, then that's the semantics -
compilers could not optimise code assuming overflow does not happen,
unless it can prove there is no overflow.
And making it defined behaviour gives programmers the mistaken idea that
they don't need to avoid overflow because there is no UB.
Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and it
allows tools to help programmers avoid these mistakes, and it allows
compilers to give programmers the most efficient results from known good
code rather than adding unnecessary run-time checks that are never
triggered.
[snip]
(I think you missed a bit of your answer here?)
[snip]
I realized that we were not speaking the same language _at all_.
He and I both wanted a language where we could write programs
that yield efficient object code. He saw UB as essential for
that; but what I want is a language with well-defined semantics
that can be aggressively optimized.
I too want a language with well-defined semantics that can be
aggressively optimised. But I do not see UB as a hindrance to that.
UB is literally the opposite of well-defined.
I want good definitions of things that should be defined. Things that
cannot have good definitions, are fine left undefined. A language
standard should not be trying to define the behaviour of /everything/.
I am happy knowing that I cannot divide by 0,
Yup. That should be a trap.
For some programs, yes. For others, no.
or find the square root of a negative number (in the real
domain).
Yup. That should be a trap.
For some programs, yes. For others, no.
I am happy knowing that I cannot add two ints if their sum
overflows the range of their type,
Yup. That should be a trap (if you want wrapping semantics, you
should request it explicitly).
I agree that wrapping semantics should be something you have to ask for.
(As an aside, I think it is a mistake for languages to have types that
have wrapping semantics - it's the operations that should wrap, not the
types. Zig gets it right by distinguishing between "x + y" and "x +% y".)
I don't want to pay the price for checks, traps, and limited
re-arrangements and optimisations when I know my expressions don't
overflow. But I am also happy to be able to get a trap when I ask for it.
But I think it is equally bad to give things a definition simply to be
able to say there is no UB.
It is, IMHO, entirely /wrong/ of a language
to define integer overflow as wrapping simply so that it is not UB. I
do not see a guaranteed incorrect result that likely has catastrophic
consequences in a program as being better than UB.
(I believe Rust
defines integer overflow as trapping in "debug" mode and wrapping in
"release" mode, which I think is a horrendous idea.)
So I was the one who said "well-defined semantics" and I had a
specific meaning in mind. Your definition is incomplete with
respect to that meaning: in addition to what you said, invalid
inputs should be rejected, either as a compile time error, or by
generating an exception or panic at runtime. If you want to
live dangerously and turn the runtime checks off for performance
reasons, then you get 2's complement behavior for integers or
whatever the machine does for the others.
I am all in favour of compile-time checks and rejecting code with errors
(not just UB) as soon as possible. The "perfect" language is one where
you really can follow the old Ada saying - if you can make it compile,
it's ready to ship.
I don't live dangerously by not having run-time checks on integer
overflows. I make sure my code does not have them, so checks are
unnecessary. For some of my code, if it "panicked" somewhere in
calculations, that would be a disaster - when you have code controlling
power electronics, a sudden stop can mean short-circuits and components
releasing their magic grey smoke.
Thinking that run-time checks will save you from UB is wishful thinking.
How are you going to have run-time checks that a pointer parameter
points to a valid object of the right type?
You can check for a
null-pointer, but that's about it. Some things that are potential UB in
C are inherent in the type of language - checking for such problems (at
compile-time or run-time) needs a language that has a different way of
handling objects and pointers so that you cannot have arbitrary pointers
to arbitrary objects.
C is not a language suitable for such run-time or compile-time checks -
it is a language for getting the highest efficiency because the
programmer takes responsibility for getting things right.
You are
correct that large programs normally have bugs (of which UB is just one
class) - the risk of bugs goes up with the size of the code base. The
corollary is that C is not a language suitable for large programs.
Rust, I think, reduces the risk of some kinds of bugs. So does C++,
when used carefully. Most code, however, is best written in languages
where these issues cannot occur - or at least where checks can be done
without a measurable impact. For example, if you use Python, you never
have integer overflow, and you never have invalid pointers.
[snip]
Consider one of the examples you gave: signed integer overflow.
The standard doesn't say that you _can't_ add two numbers
together if you overflow, it just says that if you do, the
language imposes no requirements on the resulting behavior. It
may trap, it may elide the addition entirely, or it may do it
and let the result be whatever the underlying machine does.
That is, the _language_ does not say that it's a bug; it says
that it's not going to say anything about it at all.
I'd be happy for the C standard to say that signed integer overflow is a
bug, or that code is not allowed to overflow its integer arithmetic. I
would not be happy if it said compilers must trap on the bug or handle
it in some specific way - what happens when a bug is reached is still
UB. And if the wording of the standard were changed to call it a "bug"
rather than "UB", it would make absolutely zero difference to the way I
write my code.
[snip]
In my field, people usually put a lot of effort into writing code simply
and clearly. You avoid mistakes not by being "clever", but by being
meticulous and careful. I don't think successful C programming requires
greater intellect, knowledge or experience compared to other programming
languages - but it /does/ require an appropriate attitude. You are
working with sharp knives - pay attention to what you are doing, and
you'll be fine.
My experience is that most programmers are highly intelligent,
capable people. They are not wrong to want behavior they can
rely on, particularly when things are not obvious, as they
often are not. They also want a language that requires a less
lawyerly read of to understand its semantics; that could go the
way of formality (my preferred approach) or just clearer
exposition. Either would be preferable to the current state.
I was avoiding signed integer overflow long before I had read any C
standards or even knew about the term "UB". Programming in C does not
need a lawyer knowledge of the language. It is just like programming in
any other programming language - use features that you know are correct,
and if you want to do something and don't know how to do so correctly,
look it up.
[snip]
Exactly. The footguns hiding in C code that has worked
perfectly for decades, dating back to before the standards
existed, are legion. Caveat emptor.
_Or_ the code may have been written with careful regard for the
standard, but something _else_ may have been changed that now
leads to exposure to UB. For example, perhaps code was written
that multiplies two numbers, `a*b`; `a` is known to be `unsigned int`
when written, but `b` is a signed int. But maybe that is hidden
behind a typedef; some time in the future, the typedef is
changed so that `a` is now `unsigned short`; perhaps someone
realized that the domain values never exceed 16 bits and by
changing the definition some critical structure now fits in a
single cache line. But also now the type promotion rules kick
in so that `a*b` happens with the factors as `signed int` and
there exist values of `a` and `b` where `a*b` overflows: UB.
The code had no UB; the change was elsewhere; no one saw this
because the tests all passed and everything looked ok; then
someone upgrades the compiler and now things break.
Whose fault is that?
There's no simple answer here.
But one thing is clear to me - "UB" is irrelevant here (and in many of
your points). It would not matter if everything had fully defined
behaviour. The point is that something is changed in one part of the
code that has unexpected consequences in another part of the code. Who
cares if there is UB or not? The issue is that the code does not work
as intended or expected. UB can provide situations where you have
unexpected bugs - but so can all sorts of other things.
And no, this is not contrived; this is exactly the sort of thing
that happens on large, long-lived projects.
As said earlier, C is what it is. I suspect that it will
continue to make incremental improvements, but we're basically
stuck with what we have.
Agreed.
...but be careful blaming the programmer.
Or the language, or the tools.
Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.
"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.
On 14/06/2026 16:33, Dan Cross wrote:
...
Here's the problem that I have with this line of reasoning. C
is a language that has considerable history; there was a large
body of C code written before the first standard was ever
created, in 1988; C was a teenager. And it took many years for
decent quality ANSI C compilers to be ubiquitous. C could
legally drink by then.
"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard. That
means that there is --- still --- a large body of software that
has "UB" that was put there before UB existed as a thing
programmers needed to worry about in C.
"undefined behavior", defined as "behavior ... for which this
international standard imposes no requirements", was introduced by the
first standard. However, before there was a standard there was K&R C,
the closest thing they had to a standard. And though the phrase
"undefined behavior" was not in use, there was "behavior for which K&R C
imposes no requirements". In fact, there was a great deal more of it,
since K&R C was not written as carefully and precisely as the first
standard, so it left a great deal more behavior that was "undefined by
omission of any relevant definition" than there was in the first standard.
David Brown <david.brown@hesbynett.no> wrote:
On 15/06/2026 12:43, Waldek Hebisch wrote:
David Brown <david.brown@hesbynett.no> wrote:
On 15/06/2026 00:55, Keith Thompson wrote:<snip>
David Brown <david.brown@hesbynett.no> writes:
[...]
Making this UB is an admission of the blindingly obvious - there is no
correct answer when signed integer overflow occurs. It tells
programmers that it is a mistake to let your arithmetic overflow, and
it allows tools to help programmers avoid these mistakes, and it
allows compilers to give programmers the most efficient results from
known good code rather than adding unnecessary run-time checks that
are never triggered.
Trapping or raising/throwing an exception on overflow would also be an
admission of the blindingly obvious.
It is obvious - to me, anyway - that signed overflow is a mistake in the
code. It is trying to do something that cannot be done. What is the
single-digit sum of 5 and 8? There is no answer. The answer is not 3,
or 9. Putting your hand in the air and asking the teacher for help
might be appropriate sometimes, but it is not a correct answer.
Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a >>>> problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not >>>> the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in >>>> every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.
And a sufficiently clever compiler
can omit some (not all) checks in cases where it can be statically
proved that overflow doesn't occur, and/or hoist some checks out of
loops.
Sure - but in practice having strict overflow checks would significantly
reduce optimisation and re-arrangement possibilities, as well as having
to include the checks themselves. You might allow non-strict checks in
some manner (thus allowing optimisations like "a + b - a" reducing to
just "b"), but I think that might be hard to specify and would reduce
the debugging help of the checks.
IMO a reasonable and easy definition is: a computation either delivers
the mathematically correct result or traps, and it is not allowed to
trap in cases where naive bottom-up evaluation does not trap.
More formally, an optimization is not allowed to introduce a
stronger precondition, but may weaken it.
It is always the case that an implementation can weaken preconditions
and strengthen postconditions and remain correct - though it might then
be less efficient than you expect. But if you are /requiring/ a weaker
precondition and /requiring/ a strong postcondition - such as by
insisting on traps on overflow - you are changing the function or
operation specification, and it is not necessarily a good thing.
In C, the integer addition operation "c = a + b;" has a precondition :
(a + b) <= INT_MAX, (a + b) >= INT_MIN
It has the postcondition :
c == a + b
Saying that it must trap if there is overflow weakens the precondition
to any "a" and "b", but makes the postcondition much more complicated.
No. The precondition is the same. The postcondition has an additional
term: "computation finished with no traps".
It means it is no longer true that the result of an addition operation
is the sum of the operands.
The opposite of that: no traps means that regardless of precondition
the result of an addition operation is the sum of the operands.
Addition is no longer a "pure" function -
now it has side-effects that are completely unpredictable at the site of
use. Programmers can no longer rely on the timing of the operation,
stack usage, interaction with other code, or even that the operation
ever finishes.
The difference is that without traps programmers do not know if
arithmetic operations give correct result.
With traps they do
not know if program will successfully finish, but if it
finishes they know that arithmetic gave correct results.
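A minimal sketch of the two views in code, for readers following along. The helper names add_overflows and trapping_add are invented here and come from no post in this thread. The first function tests the precondition stated above without itself overflowing; the second either returns the mathematically correct sum or terminates, which is roughly the "finished with no traps" guarantee being argued for.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* True when a + b would fall outside [INT_MIN, INT_MAX].
   The comparison is arranged so that the test itself never overflows. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Hypothetical "trapping" addition: returns the correct sum, or calls
   abort() when the precondition does not hold. */
int trapping_add(int a, int b)
{
    if (add_overflows(a, b))
        abort();
    return a + b;
}
```

Whether the explicit check costs anything in practice depends on the compiler and the target; this sketch makes no claim either way.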
If your code is correct, and overflow never happens, then this is all a
big disadvantage in terms of understanding and analysing the code. And
it does not in any way reduce the effort needed to be sure that your
inputs are appropriate for getting the desired results of the operation.
One needs to use correct formulas, there is no way around that.
Without traps the programmer must analyse the ranges of all intermediate
expressions. That is tedious and error-prone.
People work
around that by activating traps during testing, but it is
quite hard to find worst case values, so errors may be
easily missed during testing. Having traps active during
production runs means that you may discover problem. You
apparently think that ignoring possible problems at
runtime is good thing.
For simple programs you may analyze
it well enough to be sure that nothing bad happens at
runtime, but in general computing we use a lot of "interesting"
programs which are too complex to analyse. We hope that
they will run OK, but have no proof. Sometimes hope is
based on statistical tests and on low probability input
program may fail. Traps are useful to make sure that
wrong results will not propagate further.
Trapping like this can certainly be useful for debugging. But as a
general feature it gives a false sense of security, complicates
mathematical analysis, introduces massive additional possible code path
choices which are either real or almost certainly untested in practice,
or not real (because the compiler can see they are not taken) and
untestable.
You get extra code paths only if you attempt to handle traps.
Trapping of overflows gives you assurance that in computation that
you did and which finished with no traps there were no errors of
certain kind (that is wrong results due to overflow). That is
really not different than insistence on static types.
Neither
assures you of no bugs, but each tells you that some bugs
did not happen. Of course, trapping at runtime is less
satisfactory than compile-time checking, but tight a priori
bounds on ranges are notoriously hard to obtain, so trapping
is the best we can have for high performance software with
current state of art.
That is not qualitatively worse than "who knows what will
happen" UB, but it is not significantly better.
<snip>
The correct way to handle the situation is to avoid it - be sure that
you are not dividing by zero in the first place. Identify and handle
the problem where it occurs - when this zero is created, or the
circumstances leading to that point - rather than trying to do a
post-mortem after the failed division. And if you are doing that, then
what benefit is there in having trapping for division by zero? It
becomes just a waste of effort.
What is value of certification required for some software? If
programmer did good job then program will work correctly.
Yes.
Traps give assurance that the programmer indeed correctly handled
a tricky problem.
No, it certainly does not. And one of the reasons to dislike traps is
that it makes people think like that. A trap can only happen if the
programmer did /not/ handle the problem correctly.
Yes.
And I expect that if
the programmer is able to write an appropriate specific trap handler for
the failing expression (rather than a program-global "crash with error
message" handler), then he/she would be able to avoid the problem in the
first place.
Rather non-specific trap handler could work as "redo the computation
in arbitrary precision". If problem (like division by zero) persists,
then there is logic bug, otherwise it means that precision was
inadequate and problem is resolved.
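A rough sketch of that "redo at higher precision" idea, under stated assumptions. The function name scale32 is invented for illustration, 64-bit arithmetic stands in for true arbitrary precision, and __builtin_mul_overflow is a GCC/Clang extension rather than ISO C. The fast path runs in 32 bits; only when the intermediate product overflows is the computation repeated in a wider type. If the problem persists (division by zero, or a result that does not fit), the caller is told so.

```c
#include <assert.h>
#include <stdint.h>

/* Computes a * b / c into *out.
   Returns 0 on success, -1 if the problem persists after widening
   (c == 0, or the quotient does not fit in int32_t). */
int scale32(int32_t a, int32_t b, int32_t c, int32_t *out)
{
    int32_t prod32;

    if (c == 0)
        return -1;              /* a logic or input error: widening cannot help */

    /* Fast path: GCC/Clang builtin reports whether a * b overflowed. */
    if (!__builtin_mul_overflow(a, b, &prod32)) {
        if (prod32 == INT32_MIN && c == -1)
            return -1;          /* quotient 2^31 is not representable */
        *out = prod32 / c;
        return 0;
    }

    /* Slow path: the product of two 32-bit values always fits in 64 bits. */
    int64_t q = ((int64_t)a * (int64_t)b) / c;
    if (q < INT32_MIN || q > INT32_MAX)
        return -1;              /* precision was not the only problem */
    *out = (int32_t)q;
    return 0;
}
```

In real code one would likely just compute in 64 bits from the start; the two-path shape is kept only to mirror the handler described above.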
However, you should think about such traps similarly to the parity errors
that can be signaled by some hardware. There is a low but nonzero
probability that such an error can occur. A parity check gives you a
reasonable chance to detect it.
Handling is at least as problematic
as with overflow. Absence of traps gives you less info: no
overflow traps means no overflow, while no parity traps means that
parity was correct, but the intent of a parity check is to discover bit
errors, and those are possible even with correct parity. So, do you
think that parity checks inside MCUs are useless?
Sometimes, of course, you are trying to write code that has some input
which is supposed to be correct, but you are not sure - and you can't
change the calling code. How you handle that situation will depend on
the program and the situation. But I don't see trapping as "correct
handling" unless the whole program is written with the expectation of
traps for error handling. You might, however, end up deciding that
trapping is the least bad option.
And once you know that computation works
according to math rules other forms of verification are easier.
You also seem to have bias to real time control: if you need
value just at given moment, then it is hard to do something
reasonable. But at least in some control areas there is
notion of "safe state", for example a working heavy machine
is dangerous, a stopped one usually is considered safe. If
there is safe state, then anything not expected by program
should trigger transition to safe state.
I think if you are /not/ concerned with high efficiency in the code,
Well, if efficiency does not matter traps can be implemented as
a software layer above the language. Or one can use arbitrary
precision arithmetic. Traps matter when efficiency matters,
so they should be implemented in place giving best efficiency,
at best in CPU and if that is not possible then in optimizing
compiler.
then you should be seriously questioning the choice of C as the language
in the first place. And even if you use C, there are often things you
can do to avoid having problems in the first place. The obvious one for
integer overflow is to make more use of bigger types.
Which may be best choice if efficiency is not important. But
some calculations require surprisingly large accuracy to avoid
overflow. Worse, in vast majority of cases lower accuracy
may be adequate, so there is pressure to use "sufficient"
accuracy overlooking special cases.
In general computation, if you need a correct value and have some
time there are options which may involve re-doing the computation at
higher precision, which may get rid of occasional overflows
and divisions by zero due to overflow. Division by zero may
be due to bad input data; traps allow identification of
such data (doing it in another way may be computationally quite
expensive).
One might also define data structures for control and status
registers using bitfield structs.
e.g. for the SATA UAHC_GLB_OOBR register:
union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
#if __BYTE_ORDER == __BIG_ENDIAN
        uint32_t we    : 1; /**< R/W/H - Write enable. */
        uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
        uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
        uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
        uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
#else
        uint32_t cimax : 8;
        uint32_t cimin : 8;
        uint32_t cwmax : 8;
        uint32_t cwmin : 7;
        uint32_t we    : 1;
#endif
    } s;
};
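A hedged sketch of how such a definition might be checked and used. The union is repeated so the fragment compiles on its own; the byte-order test uses the GCC/Clang predefined macros __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ instead of <endian.h>, and the helper oobr_pack is invented for this example. Bit-field layout is implementation-defined, so the static_assert only catches a compiler that does not fit the five fields into one 32-bit unit.

```c
#include <assert.h>
#include <stdint.h>

union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        uint32_t we    : 1;
        uint32_t cwmin : 7;
        uint32_t cwmax : 8;
        uint32_t cimin : 8;
        uint32_t cimax : 8;
#else
        uint32_t cimax : 8;
        uint32_t cimin : 8;
        uint32_t cwmax : 8;
        uint32_t cwmin : 7;
        uint32_t we    : 1;
#endif
    } s;
};

/* Fail the build if the fields do not pack into exactly 32 bits. */
static_assert(sizeof(union UAHC_GBL_OOBR) == sizeof(uint32_t),
              "OOBR register image must be exactly 32 bits");

/* Hypothetical helper: build a raw register image from field values.
   Each value is masked to the width of its field. */
uint32_t oobr_pack(unsigned we, unsigned cwmin, unsigned cwmax,
                   unsigned cimin, unsigned cimax)
{
    union UAHC_GBL_OOBR r;

    r.u = 0;
    r.s.we    = we    & 0x01u;
    r.s.cwmin = cwmin & 0x7Fu;
    r.s.cwmax = cwmax & 0xFFu;
    r.s.cimin = cimin & 0xFFu;
    r.s.cimax = cimax & 0xFFu;
    return r.u;
}
```

The numeric value of the packed word is deliberately not shown, since which bits each field occupies depends on the implementation and ABI.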
Dan Cross <cross@spitfire.i.gajendra.net> wrote:
In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just
because it can't prove that there will be no UB. If it could,
every signed or floating-point arithmetic operation with unknown
operand values would grant the same permission.
But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.
In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.
So there are two things that are at play here.
First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.
I think that this paragraph (and several others in this post and
other posts) represents a fundamental misunderstanding. This may
be due to the way the C standard is written. AFAIK the Extended Pascal
standard (once you translate the terminology) states the same things as
C about UB, but in a clearer way. Some relevant parts below:
: 3.1 Dynamic-violation
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected up to,
: but not beyond, execution of the declaration, definition, or
: statement that exhibits (see clause 6) the dynamic-violation.
: 3.2 Error
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected.
...
: 5.1 Processors
...
: e) be able to determine whether or not the program violates any
: requirements of this International Standard, where such a
: violation is not designated an error or dynamic-violation,
...
: 5.2 Programs
...
: b) if it conforms at level 1, use only those features of the
: language specified in clause 6;
UB in C standard corresponds with 'error' in Pascal standard. [...]
I think that lawyerish style of current C standard is mostly
inertia,
and making standard more mathematical would improve it.
But giving formal semantic in the standard would mean
significantly bigger change.
antispam@fricas.org (Waldek Hebisch) writes:
[...]
I think that lawyerish style of current C standard is mostly
inertia,
I wouldn't use a term like lawyerish to describe the text in the
ISO C standard. Can you explain what quality you mean to ascribe
to "lawyerish" writing in the C standard without using any term
related to lawyering or legal documents?
and making standard more mathematical would improve it.
Could you elaborate on that statement? In what ways would giving
a more mathematical treatment of C semantics improve the quality
of the ISO C document? How would doing that advance the stated
purposes or goals of the C standard?
But giving formal semantic in the standard would mean
significantly bigger change.
Due to the nature of C, I believe it is effectively impossible to
give a formal mathematical definition of the semantics of C. Do
you think such a thing is feasible or practicable? If so can you
explain the reasoning behind your thinking?
antispam@fricas.org (Waldek Hebisch) writes:
Dan Cross <cross@spitfire.i.gajendra.net> wrote:
In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
and in fact
it *won't* occur during execution because foo() isn't called.
A compiler can't generate code with arbitrary behavior just
because it can't prove that there will be no UB. If it could,
every signed or floating-point arithmetic operation with unknown
operand values would grant the same permission.
But that's not the situation here. The situation is that the
compiler can prove that something _is_ UB.
In the program quoted at the top of this post, the UB occurs in
a function foo() that's never called. A compiler can replace the
body of foo() with a trap, and it can certainly warn about the UB,
but I don't believe it can reject the entire program. A clever
compiler could prove that the UB never occurs.
So there are two things that are at play here.
First, this notion that UB is _only_ a runtime matter. The text
of the standard contradicting that aside, if a translator can
detect that the behavior of a construct is provably undefined if
executed, then it seems axiomatic that UB is clearly something
that plays a role at translation time, as well.
I think that this paragraph (and several others in this post and
other posts) represents a fundamental misunderstanding. This may
be due to the way the C standard is written. AFAIK the Extended Pascal
standard (once you translate the terminology) states the same things as
C about UB, but in a clearer way. Some relevant parts below:
: 3.1 Dynamic-violation
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected up to,
: but not beyond, execution of the declaration, definition, or
: statement that exhibits (see clause 6) the dynamic-violation.
: 3.2 Error
: A violation by a program of the requirements of this International
: Standard that a processor is permitted to leave undetected.
...
: 5.1 Processors
...
: e) be able to determine whether or not the program violates any
: requirements of this International Standard, where such a
: violation is not designated an error or dynamic-violation,
...
: 5.2 Programs
...
: b) if it conforms at level 1, use only those features of the
: language specified in clause 6;
UB in C standard corresponds with 'error' in Pascal standard. [...]
Does it? In C a syntax error is undefined behavior, but it
requires a diagnostic. (I don't mean to single out just syntax
errors; there are other examples.)
scott@slp53.sl.home (Scott Lurndal) writes:
One might also define data structures for control and status
registers using bitfield structs.
Yeah. This kind of application (among others) I consider one of
the motivating forces behind bitfields.
[Some whitespace trimming done in the excerpt below.]
e.g. for the SATA UAHC_GLB_OOBR register:
union UAHC_GBL_OOBR {
uint32_t u;
struct UAHC_GBL_OOBR_s {
#if __BYTE_ORDER == __BIG_ENDIAN
uint32_t we : 1; /**< R/W/H - Write enable. */
uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
#else
uint32_t cimax : 8;
uint32_t cimin : 8;
uint32_t cwmax : 8;
uint32_t cwmin : 7;
uint32_t we : 1;
#endif
} s;
};
To me it seems kind of goofy to use uint32_t for the bitfields type.
I would just use unsigned, which is just as sure to work as intended,
isn't it?
On 21/06/2026 23:13, Tim Rentsch wrote:
scott@slp53.sl.home (Scott Lurndal) writes:
One might also define data structures for control and status
registers using bitfield structs.
Yeah. This kind of application (among others) I consider one of
the motivating forces behind bitfields.
[Some whitespace trimming done in the excerpt below.]
e.g. for the SATA UAHC_GLB_OOBR register:
union UAHC_GBL_OOBR {
uint32_t u;
struct UAHC_GBL_OOBR_s {
#if __BYTE_ORDER == __BIG_ENDIAN
uint32_t we : 1; /**< R/W/H - Write enable. */
uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
#else
uint32_t cimax : 8;
uint32_t cimin : 8;
uint32_t cwmax : 8;
uint32_t cwmin : 7;
uint32_t we : 1;
#endif
} s;
};
To me it seems kind of goofy to use uint32_t for the bitfields type.
I would just use unsigned, which is just as sure to work as intended,
isn't it?
Size-specific types are almost always the best choice for situations
like this.
When you are using bitfields simply as a way to pack small bits of
data more efficiently, you use whatever style of type fits best with
your needs - consistency with the rest of the code, making the sizes
independent of the target, making the sizes adjust according to the
target, maximal portability across compilers and standards version -
whatever you like.
But when you are using them to fit to an existing externally defined
structure, fixed-size types are a big advantage (for the whole struct,
not just the bitfields). It is easier to see that the structure is
correct because you are explicit about the sizes. Types like
"uint32_t" have the advantage that they are not portable to targets
that can't support them - as it is likely that you would need to write
such code somewhat differently for it to work on a machine that does
not have such types, causing a compile-time error is useful.
And when the structures represent hardware registers, such as here,
you have additional motivation - these registers are typically
accessed with volatile accesses, and you often want to be sure of the
exact size of the accesses. That is always up to the implementation,
but the norm is that when your bitfields are of a given size,
generated volatile accesses for them use that matching size.
So "uint32_t" says /precisely/ what the code author wants to say for
the type. "unsigned" does not. "uint32_t" is appropriate regardless
of the target and the choice of standard integer sizes - "unsigned" is
not.
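To illustrate the point about access size, here is a small sketch using a made-up register. The type ctrl_reg, its fields, and the function ctrl_set_mode are all hypothetical and exist only for this example. The register is read once through a volatile uint32_t pointer, the fields are changed in a local copy, and the result is written back once, so the only volatile accesses in the source are whole 32-bit ones. How a given compiler actually emits them remains up to the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit control register, invented for this sketch. */
union ctrl_reg {
    uint32_t u;
    struct {
        uint32_t enable : 1;
        uint32_t mode   : 3;
        uint32_t rsvd   : 28;
    } s;
};

static_assert(sizeof(union ctrl_reg) == sizeof(uint32_t),
              "ctrl_reg must be exactly 32 bits");

/* Read-modify-write of the mode field through one volatile read and
   one volatile write of the full register. */
void ctrl_set_mode(volatile uint32_t *reg, unsigned mode)
{
    union ctrl_reg tmp;

    tmp.u = *reg;               /* the only volatile read */
    tmp.s.mode = mode & 0x7u;   /* edit a non-volatile local copy */
    *reg = tmp.u;               /* the only volatile write */
}
```

In a test, an ordinary uint32_t variable can stand in for the hardware register, since a pointer to it converts implicitly to volatile uint32_t *.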
scott@slp53.sl.home (Scott Lurndal) writes:
One might also define data structures for control and status
registers using bitfield structs.
Yeah. This kind of application (among others) I consider one of
the motivating forces behind bitfields.
[Some whitespace trimming done in the excerpt below.]
e.g. for the SATA UAHC_GLB_OOBR register:
union UAHC_GBL_OOBR {
uint32_t u;
struct UAHC_GBL_OOBR_s {
#if __BYTE_ORDER == __BIG_ENDIAN
uint32_t we : 1; /**< R/W/H - Write enable. */
uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
#else
uint32_t cimax : 8;
uint32_t cimin : 8;
uint32_t cwmax : 8;
uint32_t cwmin : 7;
uint32_t we : 1;
#endif
} s;
};
To me it seems kind of goofy to use uint32_t for the bitfields type.
I would just use unsigned, which is just as sure to work as intended,
isn't it?
[snip]
uint32_t x;
says precisely that x is 32 bits, unsigned, with no padding bits. But
uint32_t bf : 1;
is meaningfully different from
unsigned bf : 1;
only because in most implementations (and ABIs), the underlying type of
a bit field affects the layout of the entire structure.
I accept that this is the case, but it's never made any sense to me, and
there's no hint of it in the C standard.
For example, if I write:
uint64_t bf : 1;
then the containing struct is typically at least 64 bits, even though
those other 63 bits aren't part of the bit field and other members can
be allocated within them.
It would make a lot more sense *to me* if an N-bit bit field were simply
N bits.
(And of course int, signed int, unsigned int, and bool are the only
portable types for bitfields -- but if you're using bit fields, it's
likely that portability isn't your only priority.)
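A quick way to see the effect on a particular implementation is to print the sizes. This is only a sketch: the struct names narrow and wide are invented here, a uint64_t bit-field is itself an implementation-defined extension, and the numbers printed depend on the compiler and ABI, so none are shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One 1-bit field each; only the declared type of the field differs. */
struct narrow { unsigned bf : 1; };
struct wide   { uint64_t bf : 1; };  /* non-portable bit-field type */

/* Report how much storage the implementation gives each struct. */
void print_bitfield_sizes(void)
{
    printf("sizeof(struct narrow) = %zu\n", sizeof(struct narrow));
    printf("sizeof(struct wide)   = %zu\n", sizeof(struct wide));
}
```

On the common ABIs described above, the second struct would be expected to come out at least as large as the first, but nothing in the C standard requires that.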
David Brown <david.brown@hesbynett.no> writes:
On 21/06/2026 23:13, Tim Rentsch wrote:
scott@slp53.sl.home (Scott Lurndal) writes:
One might also define data structures for control and status
registers using bitfield structs.
Yeah. This kind of application (among others) I consider one of
the motivating forces behind bitfields.
[Some whitespace trimming done in the excerpt below.]
e.g. for the SATA UAHC_GLB_OOBR register:
union UAHC_GBL_OOBR {
uint32_t u;
struct UAHC_GBL_OOBR_s {
#if __BYTE_ORDER == __BIG_ENDIAN
uint32_t we : 1; /**< R/W/H - Write enable. */
uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
#else
uint32_t cimax : 8;
uint32_t cimin : 8;
uint32_t cwmax : 8;
uint32_t cwmin : 7;
uint32_t we : 1;
#endif
} s;
};
To me it seems kind of goofy to use uint32_t for the bitfields type.
I would just use unsigned, which is just as sure to work as intended,
isn't it?
Size-specific types are almost always the best choice for situations
like this.
When you are using bitfields simply as a way to pack small bits of
data more efficiently, you use whatever style of type fits best with
your needs - consistency with the rest of the code, making the sizes
independent of the target, making the sizes adjust according to the
target, maximal portability across compilers and standards version -
whatever you like.
But when you are using them to fit to an existing externally defined
structure, fixed-size types are a big advantage (for the whole struct,
not just the bitfields). It is easier to see that the structure is
correct because you are explicit about the sizes. Types like
"uint32_t" have the advantage that they are not portable to targets
that can't support them - as it is likely that you would need to write
such code somewhat differently for it to work on a machine that does
not have such types, causing a compile-time error is useful.
And when the structures represent hardware registers, such as here,
you have additional motivation - these registers are typically
accessed with volatile accesses, and you often want to be sure of the
exact size of the accesses. That is always up to the implementation,
but the norm is that when your bitfields are of a given size,
generated volatile accesses for them use that matching size.
So "uint32_t" says /precisely/ what the code author wants to say for
the type. "unsigned" does not. "uint32_t" is appropriate regardless
of the target and the choice of standard integer sizes - "unsigned" is
not.
uint32_t x;
says precisely that x is 32 bits, unsigned, with no padding bits. But
uint32_t bf : 1;
is meaningfully different from
unsigned bf : 1;
only because in most implementations (and ABIs), the underlying type of
a bit field affects the layout of the entire structure.
I accept that this is the case, but it's never made any sense to me, and there's no hint of it in the C standard.
For example, if I write:
uint64_t bf : 1;
then the containing struct is typically at least 64 bits, even though
those other 63 bits aren't part of the bit field and other members can
be allocated within them.
It would make a lot more sense *to me* if an N-bit bit field were simply
N bits.
(And of course int, signed int, unsigned int, and bool are the only
portable types for bitfields -- but if you're using bit fields, it's
likely that portability isn't your only priority.)
scott@slp53.sl.home (Scott Lurndal) writes:
One might also define data structures for control and status
registers using bitfield structs.
Yeah. This kind of application (among others) I consider one of
the motivating forces behind bitfields.
[Some whitespace trimming done in the excerpt below.]
e.g. for the SATA UAHC_GLB_OOBR register:
union UAHC_GBL_OOBR {
uint32_t u;
struct UAHC_GBL_OOBR_s {
#if __BYTE_ORDER == __BIG_ENDIAN
uint32_t we : 1; /**< R/W/H - Write enable. */
uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
#else
uint32_t cimax : 8;
uint32_t cimin : 8;
uint32_t cwmax : 8;
uint32_t cwmin : 7;
uint32_t we : 1;
#endif
} s;
};
To me it seems kind of goofy to use uint32_t for the bitfields type.
I would just use unsigned, which is just as sure to work as intended,
isn't it?
In article <86h5mv8umk.fsf@linuxsc.com>,
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
scott@slp53.sl.home (Scott Lurndal) writes:
One might also define data structures for control and status
registers using bitfield structs.
Yeah. This kind of application (among others) I consider one of
the motivating forces behind bitfields.
[Some whitespace trimming done in the excerpt below.]
e.g. for the SATA UAHC_GLB_OOBR register:
union UAHC_GBL_OOBR {
uint32_t u;
struct UAHC_GBL_OOBR_s {
#if __BYTE_ORDER == __BIG_ENDIAN
uint32_t we : 1; /**< R/W/H - Write enable. */
uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
#else
uint32_t cimax : 8;
uint32_t cimin : 8;
uint32_t cwmax : 8;
uint32_t cwmin : 7;
uint32_t we : 1;
#endif
} s;
};
To me it seems kind of goofy to use uint32_t for the bitfields type.
I would just use unsigned, which is just as sure to work as intended,
isn't it?
No. There are issues of alignment and padding one must consider
when using bitfields to model hardware registers, particularly
if (say) a device driver is meant to be shared across ISAs.
There are other hardware registers in our implementation of the SATA
controller that are defined as 64-bit registers, for those we use
uint64_t (rather than relying on 'unsigned long' for 64-bit linux
or 'unsigned long long' for 32-bit OS - and this code was designed
to be compiled for both 32-bit and 64-bit targets originally).
On 15/06/2026 00:55, Keith Thompson wrote:
[...]
Throwing some kind of exception or trap can definitely be helpful at
times. And I agree that it would make it obvious that there has been a
problem detected. But throwing exceptions or traps can cause more
problems (the Ariane 5 failure was caused by the exception handler, not
the overflow fault). That does not mean it is better to ignore
overflows - it means there is no appropriate action that is suitable in
every situation. I am far from convinced that there is even a
reasonable choice of default action that could be usefully made.
(I don't expect the complete investigation report on the Ariane 5
[...]
On 15/06/2026 19:57, Waldek Hebisch wrote:
[...]
No, ignoring problems is never a good thing. Writing code that doesn't
run the risk of problems is a good thing.
And I can agree that sometimes leaving traps enabled in released code
can be helpful - there are situations where you can't practically
remove the risk of overflows, and it is better to crash out reliably
than risk running on with faulty data. It is, however, also the case
that sometimes traps will cause far more problems than incorrect data
would. (Noting that UB does not guarantee "incorrect data" - it can do
anything. Wrapping semantics, or unspecified value semantics, would
do that.)
[...]
On 2026-06-16 10:10, David Brown wrote:
On 15/06/2026 19:57, Waldek Hebisch wrote:
[...]
No, ignoring problems is never a good thing. Writing code that
doesn't run the risk of problems is a good thing.
Sure.
And I can agree that sometimes leaving traps enabled in released code
can be helpful - there are situations where you can't practically
remove the risk of overflows, and it is better to crash out reliably
than risk running on with faulty data. It is, however, also the case
that sometimes traps will cause far more problems than incorrect data
would. (Noting that UB does not guarantee "incorrect data" - it can do
anything. Wrapping semantics, or unspecified value semantics, would
do that.)
Hmm.. - not sure what you mean (and imply with) "crash out reliably".
Having been engaged in server systems software development a crash
had never been an accepted option. And that's certainly also true
with life-critical applications and costly operations (upthread you
had mentioned Ariane 5). You should always avoid crashes and catch exceptions.
and that depends on the actual application case; report it, retry it,
retry with alternative methods or adapted conditions, emulate the
result, estimate it, ask supervisor process, switch devices, etc.
I'm well aware that wrong data may also be bad, be it from wrong
algorithms, a technical overflow situation, unreliable data sources,
or unreliable processing (not excluding effects of UB).
I'm really not sure whether to consider "not handling an exception"
better or worse than "not handling data errors"; usually you don't
want either. So both should be prevented (if possible) or acted upon
(if getting a notice about it).
Janis
[...]
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
Oh, actually I indeed thought that printing a constant string would not
create any error that would then be indicated by printf's return value.
Linux has a device called "/dev/full". It acts like it has no data
on input, and like it's full on output. You can redirect a program's
stdout to /dev/full. It's useful for testing, and much easier than
finding a writable filesystem with no remaining space. (/dev/null
accepts and discards as much input as you send to it.)
[...]
I'd indeed also expected that, say, printing a string value with a '%d'
specifier would produce an error, but I saw that it doesn't; while the
compiler creates just a warning, execution provides some random output
and a _non-negative_ string-length value as printf's return value. Not
exactly what I'd expect from a language.
Calling printf with a mismatch between the format string and
an argument has undefined behavior. Some compilers will warn
about this in most cases, but in general the format string is not
necessarily known at compile time.
No diagnostic or other error indication is required.
[...]
Obviously (because of that?) I've never seen anyone test such a call
by, say,
int rc = printf("Hello, world\n");
if (rc < 0) {
/* umm.. */
}
Quick-and-dirty programs like the classic "hello, world" often don't
bother to check. The above could print an error message to stderr and
call exit(EXIT_FAILURE). Even if stdout and stderr both produce errors,
the caller should be able to detect the error status. (I've configured
my shell to print a message when a program dies with an error status.)
But most production programs don't just blindly print stuff to stdout.
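A minimal sketch of what such a check might look like, assuming a hypothetical helper named emit_line (not from the posts above). It treats both a negative fprintf result and a failed fflush as errors, since with buffered output the write error may only show up at flush time.

```c
#include <stdio.h>

/* Hypothetical helper: print one line to `out` and report whether the
   output actually went through. Returns 0 on success, -1 on failure. */
int emit_line(FILE *out, const char *text)
{
    if (fprintf(out, "%s\n", text) < 0)
        return -1;          /* formatting or write error */
    if (fflush(out) == EOF)
        return -1;          /* buffered data could not be written */
    return 0;
}
```

A caller could respond to a nonzero result by writing a message to stderr and calling exit(EXIT_FAILURE). On Linux, redirecting stdout to /dev/full should exercise the failure path.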
[...]
Are you - plural, all CLC audience - writing such code with 'printf()',
honestly? - Same question with 'int rc = fclose (...);' - what can one
do about that, then? (Write a logfile entry, maybe? - and then?)
Write the error message to stderr, optionally log it somewhere,
and exit with an error code.
On 2026-06-11 18:37, Janis Papanagnou wrote:
[...]
I'd indeed also expected that, say, printing a string value with a '%d'
specifier would produce an error, but I saw that it doesn't; while the
compiler creates just a warning, execution provides some random output
and a _non-negative_ string-length value as printf's return value. Not
exactly what I'd expect from a language.
On some systems I've used, it would try to interpret the pointer to the
string as an int, and print the result. On others, it would expect the
int to be stored in one register, whereas the pointer was stored in a
different register, and as a result it would print whatever value was
last stored in the first register. These were natural outcomes for those
implementations; had the C standard imposed any conflicting requirements
on the behavior, it would have complicated those implementations.
[...]
Obviously (because of that?) I've never seen anyone test such a call
by, say,
int rc = printf("Hello, world\n");
if (rc < 0) {
/* umm.. */
}
Are you - plural, all CLC audience - writing such code with 'printf()',
honestly? - Same question with 'int rc = fclose (...);' - what can one
do about that, then? (Write a logfile entry, maybe? - and then?)
For most of the programs I ever wrote, a single check for ferror(file)
at the end of the program, resulting in exit(EXIT_FAILURE) being called,
would be acceptable. That approach relies on the fact that the error
flag is sticky. Because I made a habit of such checks, we caught a
problem when a disk overflowed before we'd wasted hours "writing" data
to nowhere. If I had sent a message to a log file, it would have been
blocked by the same problem, which is why I used the exit status to
report the problem.
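The single end-of-program check described above can be sketched as follows. The function names write_lines and final_status are hypothetical; the sketch only assumes the standard behavior that a stream's error indicator stays set once a write has failed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write n lines without checking each call; any failure is remembered
   in the stream's sticky error indicator. */
void write_lines(FILE *out, int n)
{
    for (int i = 0; i < n; i++)
        fprintf(out, "record %d\n", i);
}

/* One check at the end: flush pending data, then consult ferror.
   Returns EXIT_SUCCESS or EXIT_FAILURE, suitable as an exit status. */
int final_status(FILE *out)
{
    int flush_failed = (fflush(out) == EOF);
    return (flush_failed || ferror(out)) ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

A program could end with exit(final_status(stdout)), so that the failure is reported through the exit status rather than through another write that might be blocked by the same problem.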
[...]
On 2026-06-12 02:41, Keith Thompson wrote:
[...]
Calling printf with a mismatch between the format string and
an argument has undefined behavior. Some compilers will warn
about this in most cases, but in general the format string is not
necessarily known at compile time.
Well, yes. But therefore I imagined that at runtime an rc<0
could have indicated such a mismatch.
uint32_t x;
says precisely that x is 32 bits, unsigned, with no padding bits.
But
uint32_t bf : 1;
is meaningfully different from
unsigned bf : 1;
only because in most implementations (and ABIs), the underlying type
of a bit field affects the layout of the entire structure.
I accept that this is the case, but it's never made any sense to me,
and there's no hint of it in the C standard.
For example, if I write:
uint64_t bf : 1;
then the containing struct is typically at least 64 bits, even
though those other 63 bits aren't part of the bit field and other
members can be allocated within them.
It would make a lot more sense *to me* if an N-bit bit field were
simply N bits.
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 2026-06-12 02:41, Keith Thompson wrote:
[...]
Calling printf with a mismatch between the format string and an
argument has undefined behavior. Some compilers will warn about
this in most cases, but in general the format string is not
necessarily known at compile time.
Well, yes. But therefore I imagined that at runtime an rc<0
could have indicated such a mismatch.
That would be nice, but it's just one of the infinitely many
possible results of undefined behavior.
For example, this program:
#include <stdio.h>
int main(void) {
const int result = printf("%ld\n", 0.3);
printf("printf returned %d\n", result);
}
on my system prints:
140732048673560
printf returned 16
gcc and clang warn about the format string. tcc doesn't.
The (first) printf call was apparently successful because printf
has no way to know that the argument was of an incorrect type
(types don't really exist at run time).
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
But
uint32_t bf : 1;
is meaningfully different from
unsigned bf : 1;
only because in most implementations (and ABIs), the underlying type
of a bit field affects the layout of the entire structure.
I accept that this is the case, but it's never made any sense to me,
and there's no hint of it in the C standard.
I think saying there is not even a hint is an overstatement. The C
standard says that an implementation "may allocate any addressable
storage unit large enough to hold a bit-field." It shouldn't be a
surprise that how much storage is allocated depends on the type of
the bit-field member. For example, a bit-field of type 'unsigned'
might very well choose a larger storage unit than what is chosen
for a bit-field of type '_Bool'. It seems obvious that the type of
a bit-field might affect what size and layout is chosen.
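A small sketch that lets one observe this on a given implementation. The struct and function names are hypothetical, and the sizes printed are implementation-defined, so no particular output is promised.

```c
#include <stdio.h>

struct flag_bool     { _Bool    f : 1; };  /* 1-bit field declared with _Bool */
struct flag_unsigned { unsigned f : 1; };  /* same width, declared with unsigned */

/* Print the sizes this implementation chose for the two structs. */
void show_bitfield_sizes(void)
{
    printf("sizeof(struct flag_bool)     = %zu\n", sizeof(struct flag_bool));
    printf("sizeof(struct flag_unsigned) = %zu\n", sizeof(struct flag_unsigned));
}
```

On many common ABIs the second struct comes out larger than the first, but that is an observation about those ABIs, not something the C standard requires.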
For example, if I write:
uint64_t bf : 1;
then the containing struct is typically at least 64 bits, even
though those other 63 bits aren't part of the bit field and other
members can be allocated within them.
It would make a lot more sense *to me* if an N-bit bit field were
simply N bits.
Two problems with that. One, it seems to be in conflict with what
the C standard says about 0-width bit-fields.
Two, the C standard
explicitly allows allocating bit-fields using a high-to-low order or
a low-to-high order (implementation-defined choice). Presumably
this freedom is given to accommodate both big- and little-endian
platforms. The idea that an N-bit bit-field should simply be N bits
doesn't work in big-endian environments. It seems better to allow
little-endian implementations to choose a size that matches what a
big-endian implementation would use, rather than insisting that they
be different.
big-endian implementation would use, rather than insisting that they
be different.
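To make the point about 0-width bit-fields concrete, here is a hedged sketch. The standard says a zero-width bit-field means no further bit-field is packed into the unit holding the previous one, which presupposes that bit-fields live in storage units. The struct and function names are hypothetical.

```c
#include <stddef.h>

struct adjacent {
    unsigned a : 1;
    unsigned b : 1;   /* may share a storage unit with a */
};

struct separated {
    unsigned a : 1;
    unsigned   : 0;   /* zero width: nothing more is packed into a's unit */
    unsigned b : 1;   /* placed in a different storage unit */
};

/* Report the implementation-defined sizes of the two layouts. */
size_t size_adjacent(void)  { return sizeof(struct adjacent); }
size_t size_separated(void) { return sizeof(struct separated); }
```

On typical implementations size_separated() is larger than size_adjacent(), though the exact values depend on the storage unit the implementation picks.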
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
For example, this program:
#include <stdio.h>
int main(void) {
const int result = printf("%ld\n", 0.3);
printf("printf returned %d\n", result);
}
on my system prints:
140732048673560
printf returned 16
gcc and clang warn about the format string. tcc doesn't.
The (first) printf call was apparently successful because printf
has no way to know that the argument was of an incorrect type
(types don't really exist at run time).
printf() could know if an argument were of an incorrect type, if
an implementation chose to do so.
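As a purely illustrative sketch of how type information could be made available, the following uses C11 _Generic to attach a tag to an argument at compile time and compares it with what a conversion specification expects. All names here (arg_tag, TAG_OF, spec_matches) are hypothetical; this is not how any particular printf implementation works, and only three specifications are recognized.

```c
#include <string.h>

/* Hypothetical tags describing an argument's static type. */
enum arg_tag { TAG_INT, TAG_LONG, TAG_DOUBLE, TAG_OTHER };

/* _Generic (C11) selects a tag from the argument's type at compile time. */
#define TAG_OF(x) _Generic((x), \
    int:    TAG_INT,            \
    long:   TAG_LONG,           \
    double: TAG_DOUBLE,         \
    default: TAG_OTHER)

/* Returns 1 if the single conversion specification `spec` (for example
   "%ld") expects an argument with the given tag, 0 otherwise. */
int spec_matches(const char *spec, enum arg_tag tag)
{
    if (strcmp(spec, "%d") == 0)  return tag == TAG_INT;
    if (strcmp(spec, "%ld") == 0) return tag == TAG_LONG;
    if (strcmp(spec, "%f") == 0)  return tag == TAG_DOUBLE;
    return 0;
}
```

A wrapper could evaluate spec_matches("%ld", TAG_OF(0.3)) before printing and return a negative value on a mismatch, which is the kind of rc<0 result wished for upthread.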
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
[...]
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
[...]
It would make a lot more sense *to me* if an N-bit bit field were
simply N bits.
Two, the C standard
explicitly allows allocating bit-fields using a high-to-low order or
a low-to-high order (implementation-defined choice). Presumably
this freedom is given to accommodate both big- and little-endian
platforms. The idea that an N-bit bit-field should simply be N bits
doesn't work in big-endian environments. It seems better to allow
little-endian implementations to choose a size that matches what a
big-endian implementation would use, rather than insisting that they
be different.
I honestly don't understand your point here. How does making
N-bit bit-fields N bits not work in a big-endian environment?
Can you elaborate? Of course endianness can affect how bit-fields
are allocated within a "storage unit".
"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard.
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
[...]
antispam@fricas.org (Waldek Hebisch) writes:
UB in C standard corresponds with 'error' in Pascal standard. [...]
Does it? In C a syntax error is undefined behavior, but it
requires a diagnostic. (I don't mean to single out just syntax
errors; there are other examples.)
I mean typical UB, especially cases that people complain about.
[...]
cross@spitfire.i.gajendra.net (Dan Cross) writes:
[...]
"Undefined Behavior", in C, in the manner usually discussed in
this newsgroup, was introduced with the first standard.
The term but not the concept, which was there since the
early days of C -- at least since K&R in 1978, and very
likely earlier (I haven't reviewed any of the earlier
descriptions of the language).