• Re: this girl calls c ugly

    From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 8 17:33:42 2026
    From Newsgroup: comp.lang.c

    In article <86y0gp82pd.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Note that in a context that requires a constant expression, overflow is
    a constraint violation. For example, a case label like:

    case (INT_MAX + 1) * 0:

    must be diagnosed at compile time.

    gcc disagrees with you.

    What makes you think so?

    [...]

    I'm skipping this and proceeding on to the original question.

    Why?

    gcc is not authoritative.

    You, Tim, wrote the words, "gcc disagrees with you."

    If you didn't want to bring GCC into it, because it is not
    authoritative (which is true), then why did you mention it in
    the first place?

    I didn't want to get into an argument
    about whether gcc is conforming, or which version of gcc was used,
    or any similar distractions.

    You opened that door and walked through it.

    The C standard /is/ authoritative,
    and I thought it would save time to cut to the chase.

    Then you should have done that from the start, and not mentioned
    GCC.

    [snip]
    I'd like to know whether you still think you were right. If so,
    I'd like to see your explanation. If not, an admission that you
    made a mistake would be appreciated. But I expect neither from you.

    I'd like to know why you ignored my explanation, based directly on
    text from the C standard, about why an implementation is allowed to
    process the code in question, without giving a diagnostic, and
    still be conforming. An explanation that Dan Cross agreed with,
    even if he may not like the consequences.

    I am mystified as to why you are bringing my name into this, and
    why you think "I may not like the consequences", or even what
    that means. In any event, you are evidently laboring under some
    assumption about what I think about this matter that is probably
    incorrect.

    Because I am not you, I cannot know this for a fact, let alone
    why it may be. Regardless, I suggest you don't do that, or at a
    minimum seek clarity from the referent of your assumptions,
    before making claims about what they may think.

    In investigating this question, I have run compilations using
    multiple versions of gcc, on two different platforms. I have looked
    carefully through the gcc man page. I have also run compilations
    using multiple versions of clang, on two different platforms. After
    doing all that, I ran compilations using godbolt, so I could check
    the latest, or maybe almost latest, versions of gcc and clang. All
    the different versions of gcc and clang that I have tried support my
    hypothesis that gcc (and now also clang) interpret the C standard so
    as to conclude that conforming to the C standard need not require a
    diagnostic for situations like the code under discussion.

    It appears that you are appealing to a certain kind of semantic
    precision, that is itself based on a number of assumptions that
    are unstated, but that are implicit in your writing. Further,
    you give every indication of believing that a reader should
    simply intuitively know.

    In fact, both GCC and clang (the versions I tried on the
    platforms I tried on) emit a diagnostic for the code under
    consideration. Your assertion appears to be that that is
    unrelated to the constraint in section 6.6 para 4, which seems
    accurate.

    But you did not say that: instead, you just made a vague
    statement that "gcc disagrees with you." That's not useful, and
    no one can reasonably know what you meant unless you elaborated
    on it.

    When it was pointed out to you that in fact GCC generates a
    diagnostic, you had an opportunity to clarify that it was not in
    response to the aforementioned constraint violation. You chose
    not to do so, and instead arrogantly accused others of
    laziness and a lack of willingness to understand.

    Insisting that your readers adhere to some arbitrary level of
    semantic precision you seem to fancy yourself expressing is not
    actually a sign of true expertise. Real expertise is most
    readily demonstrated through effective communication.

    I'd like to ask you to do two things. First, read through the
    reasoning given in my previous post, try to assess whether that
    reasoning is sound, and post the results of your contemplations.

    Second, look again at the question of whether gcc (and also clang,
    if you're up to it) support the hypothesis that a conforming
    implementation need not give a diagnostic for code like that under
    discussion. See if you can find a way of framing the question that
    supports my statement, rather than simply looking for one that
    supports your preconceived ideas. Post the results of your
    investigations, both what other experiments you tried, and what your
    assessment is of the results you got.

    Do these two things and I will endeavor to explain my views on the
    questions you have raised here, if such explanations are still
    needed after your further examinations and comments.

    It is rather cavalier to make imperative statements to others
    regarding how they must spend their time.

    [SNIP]

    I see no basis for this belief. My conclusions are based on what
    the C standard actually says, rather than guesses about some
    unstated "intentions". I think you would do well to reach your
    conclusions based more on the actual text of the C standard, and
    less on your interpretation of what the text was "intended" to
    mean.

    The actual text of the standard implies that 42 is not an expression.
    I rely on the obvious intent to conclude that it is.

    Now it is you who is changing the subject. Besides not being on
    point to the question being considered, it's a silly argument, and I
    would hope you are smart enough to realize that. However, if you do
    what I have asked in the previous paragraph, I can try to explain
    why I think your views on this unrelated matter are wrongheaded.

    Is it a silly argument?

    Perhaps Keith has some reason for suggesting that such an
    interpretation is valid. I'm not aware of what that might
    be, but I suspect you are not, either. But without even knowing
    what the argument is, how would you know?

    You are the one admonishing others to look at the letter of the
    standard ("My conclusions are based on what the C standard
    actually says..."), yet here you dismiss as "a silly argument",
    a thing brought up by someone who has demonstrated that they
    generally know what they're talking about, and you have done so
    without even bothering to ask what they might be referring to.

    In fact, I think this fits a pattern of behavior I observe from
    you fairly consistently. You decide on an interpretation,
    declare it correct, and appear to scoff at anyone else who does
    not immediately share that interpretation as being "lazy" or
    worse.

    Ironically, you yourself do not do well when you are shown to be
    wrong about something; cf your bizarre statement about Rust not
    being strongly typed. This does not do well for your
    credibility; everyone makes mistakes now and again, and you are
    no different, but your seeming inability to admit to it when it
    is obvious decreases faith in your interpretations when they are
    not obvious.

    You would do well to express more humility, and consider how
    others might perceive you based on the way you talk to them.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 8 17:37:52 2026
    From Newsgroup: comp.lang.c

    In article <1106d97$huo$1@reader1.panix.com>,
    Dan Cross <cross@spitfire.i.gajendra.net> wrote:
    My example is this:

    constexpr int A = ~0U;

    The type of the rhs is `int` and the value is not representable

    *sigh* "The type of the rhs is `unsigned int` and the value is
    not representable in a `signed int`."

    Perhaps,

    constexpr int A = (unsigned int)INT_MAX + 1;

    ...is an even better example.
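
    For what it's worth, the arithmetic behind that example can be
    checked separately from the question of diagnostics. This is only a
    sketch: `fits_in_int` is a hypothetical helper name, and the
    `static_assert` records my assumption that `unsigned int` has a
    larger maximum than `int`, which the standard does not strictly
    promise. It says nothing about what a C23 compiler must do with the
    `constexpr` declaration itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumption of this sketch, not a guarantee of the standard. */
static_assert(UINT_MAX > (unsigned int)INT_MAX,
              "sketch assumes UINT_MAX exceeds INT_MAX");

/* Hypothetical helper: true if the unsigned value v is representable
   as an int, i.e. lies in the range [0, INT_MAX]. */
bool fits_in_int(unsigned int v)
{
    return v <= (unsigned int)INT_MAX;
}
```

    Under that assumption, both `(unsigned int)INT_MAX + 1` and `~0U`
    are computed without overflow in `unsigned int`, and
    `fits_in_int` returns false for each of them.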

    - Dan C.

  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Jun 8 12:39:04 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    [...]
    A naive compiler that performs no optimizations would generate
    code for foo() that attempts to compute (INT_MAX+1)*0 step by
    step, without recognizing the overflow, and that code would never
    be executed.

    Sure. But a far more sophisticated translator (and I would
    argue a nefarious one) could emulate that code, decide it was
    UB, and immediately fail translation with an error.

    I disagree. That's not a sensible interpretation of what the
    standard says.

    A call to foo() would have undefined behavior if it occurred. There
    is no call to foo().

    Similarly:

    int a = ..., b = ...;
    int c;
    if (b != 0) {
        c = a / b;
    }
    else {
        c = 0;
    }

    A division by zero would have undefined behavior if it occurred,
    but it never occurs. A compiler cannot reject the above code
    because of UB that never happens.
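
    Wrapped in a function, the fragment might look like the sketch
    below. The name `guarded_div` is hypothetical, and the `assert` is
    my addition: it documents that `INT_MIN / -1` is a separate
    overflow case that the zero test alone does not cover.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical wrapper around the fragment above: the division is only
   evaluated when b is nonzero, so division by zero never occurs. */
int guarded_div(int a, int b)
{
    int c;
    if (b != 0) {
        /* Caller's responsibility in this sketch: the quotient must be
           representable, which rules out INT_MIN / -1. */
        assert(!(a == INT_MIN && b == -1));
        c = a / b;
    }
    else {
        c = 0;
    }
    return c;
}
```

    For example, `guarded_div(7, 0)` takes the `else` branch and yields
    0, while `guarded_div(6, 3)` yields 2.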

    [...]

    It returns a status of 0 from main and does nothing else.
    A conforming implementation *must* generate code that implements
    that behavior.

    I have yet to find or be shown a way in which the standard
    actually guarantees that.

    How does the standard guarantee *anything*?

    This strictly conforming program:

    int main(void) { return 0; }

    when executed returns a status of 0 from main and does nothing else.
    Adding an uncalled function to the same source file doesn't change
    that.

    [...]

    There was, once, a view that was almost universally shared that
    UB was meant for things that could not be precisely described
    because hardware was too varied. We're well past that; now it's
    a vehicle for compiler writers to make benchmarks faster, but is
    (generally) hostile to programmers. A lot of hay is made about
    it in this group, but at the core, it's just (ironically) not
    well-defined.

    The standard does not say what UB is meant for. It says what UB
    *is*, and what constructs lead to it (by omission in some cases).
    Any optimization tricks played by compiler implementers must be
    based on that specification.

    [...]

    I agree. printf("hello, world\n") must write that string to standard
    output, which may be a file or an interactive device. Just what
    that means is unspecified or implementation-defined. It might be
    printed in EBCDIC or incised into clay tablets. Closing stdout,
    which occurs when main() terminates, might involve firing the tablet
    or emitting control sequences for a screen reader.

    Exactly. It could also emit the string, "GOODBYE WORLD."

    No, it couldn't. It must emit "hello, world\n" in some form.
    It must emit the character 'h' as represented in the execution
    character set, followed by 'e', and so on.

    [...]

    This presupposes that the program is strictly conforming, but
    in the limit, the standard can be interpreted in such a way that
    if any statement in the program is provably UB (as this one is)
    then the program cannot be said to be strictly conforming.

    It's not UB if it's never called. Behavior that doesn't happen is
    not behavior.

    I did not presuppose that the program is strictly conforming.
    I read the source code and determined that it meets the standard's
    definition of a strictly conforming program.

    [...]

    Ok, so in that case, would we say that "`foo` has undefined
    behavior?" The qualification, "...if called" seems superfluous,
    and I don't see anything in the standard that explicitly
    disagrees.

    The qualification "if called" is the whole point.

    [...]

    UB can time-travel, however. Because it's undefined, the
    compiler is free to assume that it never executes, or that it
    always executes.

    "UB can time-travel" is perhaps an oversimplification. An example is
    a bug that occurred in the Linux kernel, something like:

    void func(int *ptr) {
        do_something_with(*ptr);
        if (ptr != NULL) {
            blah();
        }
    }

    The compiler, on seeing the expression `*ptr`, assumed that `ptr` is
    not null, and elided the test on the following line.

    But even assuming that's valid, a compiler absolutely cannot assume that
    an instance of UB always executes when, according to the semantics of the
    program, it provably never executes.
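
    The usual way to avoid that class of bug is to test the pointer
    before the first dereference. A minimal sketch, with `value_or` as
    a hypothetical name rather than anything from the kernel code:

```c
#include <stddef.h>

/* Hypothetical helper: test the pointer before dereferencing it, so
   there is no earlier `*ptr` from which a compiler could infer that
   ptr is non-null. */
int value_or(const int *ptr, int fallback)
{
    if (ptr == NULL) {
        return fallback;
    }
    return *ptr;
}
```

    With the check first, the null case is handled by defined code, so
    the test cannot be treated as redundant.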

    [...]

    So any program that produces no output at all is strictly
    conforming? Then what about this?

    #include <limits.h>

    int
    zero(void)
    {
        return (INT_MAX + 1) * 0;
    }

    int
    main(void)
    {
        (void)zero();
        return 0;
    }

    That's an interesting point. A more terse example:

    #include <limits.h>
    int main(void) {
        int unused = INT_MAX + 1;
    }

    This program produces no output, yet clearly executes a function
    that contains an expression that induces undefined behavior when
    evaluated. I suppose an argument could be made that it _might_
    generate output due to UB, as UB imposes no requirements not to
    do so, so perhaps the _absence_ of output depends on UB.

    The program clearly has undefined behavior when executed, but no
    output depends on that undefined behavior. In my humble opinion,
    this demonstrates a flaw in the standard's definition of "strictly
    conforming program". (As a programmer: Don't do that.)
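
    One way to follow that advice is to test for overflow before doing
    the addition. The sketch below is an illustration only;
    `checked_add` is a hypothetical name, not a standard function.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store a + b in *out and return true when the
   sum is representable as an int; otherwise return false without
   evaluating the overflowing addition. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

    Here `checked_add(INT_MAX, 1, &r)` returns false and leaves `r`
    untouched, so the overflowing `INT_MAX + 1` is never evaluated.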

    [...]

    In my ideal world, C would be rigorously defined with a precise
    operational semantics. That would be accompanied by an
    explanatory document that presented those semantics in lay
    terms in prose, similar to the standard now, for those who did
    not want to drive Coq or something similar. But at least we'd
    have something definitive to define the language, so that when
    there was apparent ambiguity, we had some objective metric by
    which to judge. The C standard, as written, is nowhere near as
    precise as it should be.

    I do not think that this will ever happen: not only would it be
    very difficult to produce (as you noted elsethread), I think the
    compiler writers would rebel if they felt that their UB hands
    were tied by a formal specification.

    "There are only two kinds of languages: the ones people complain
    about and the ones nobody uses."
    -- Bjarne Stroustrup
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Jun 8 13:40:56 2026
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    Note that in a context that requires a constant expression, overflow is
    a constraint violation. For example, a case label like:

    case (INT_MAX + 1) * 0:

    must be diagnosed at compile time.

    gcc disagrees with you.

    What makes you think so?

    [...]

    I'm skipping this and proceeding on to the original question.

    What question? I made a statement.

    Why?

    gcc is not authoritative. I didn't want to get into an argument
    about whether gcc is conforming, or which version of gcc was used,
    or any similar distractions. The C standard /is/ authoritative,
    and I thought it would save time to cut to the chase.

    I never said gcc is authoritative. *You* brought gcc into the
    discussion.

    It is a fact that gcc issues a diagnostic for that case label.
    It is a fact that it's a non-fatal warning with "-pedantic" and a
    fatal error with "-pedantic-errors", which implies, as I understand
    it, that the authors of gcc believe that the diagnostic is required
    by the standard.

    You made a statement, "gcc disagrees with you". I demonstrated,
    in text that you snipped, that gcc does in fact agree with me.

    No, you didn't.

    Yes, I did.

    You were wrong.

    No, I wasn't. Your testing was faulty.

    Yes, you were. My testing was not faulty.

    What exactly did you mean by "gcc disagrees with you"? I
    think it's sufficiently obvious that gcc does not have opinions,
    so you presumably were speaking figuratively in some sense.
    Do you not see the same diagnostic I saw?

    I don't know the basis of your error, so I asked.
    Or maybe I'm missing something, and you had a valid point that I
    didn't understand.

    I'm offended that you think I have an obligation to remedy your
    habit of lazy thinking, especially when as here the answer was
    staring you right in the face, and you simply ignored it.

    OK. I'm offended by your superior attitude. I'm offended by your
    refusal to consider that you might have made a mistake. I'm offended
    by your refusal to explain what you meant by an unclear statement
    after I repeatedly ask you to do so. I'm offended by your apparent
    assumption that if the rest of us just *think really hard*, we'll
    inevitably agree with you.

    You're not required to answer my question, which I think was
    an extremely reasonable one, but quoting it and then explicitly
    refusing to answer it is pointlessly rude.

    I wasn't refusing to answer. What I was doing was trying to
    answer the original question, and answer it in a way that wouldn't
    get lost in pointless bickering. Silly me.

    I'm assuming that by "the original question", you're referring to my
    *statement* that a diagnostic is required for the above case label.
    If you have some other "original question" in mind, please specify
    it. Please do not insult me by assuming that I'll know exactly
    what you mean if I just reread what you wrote and think hard enough.

    If you were trying to answer the "original question", you failed.
    You expressed your supposed disagreement by asserting, without
    further explanation, that gcc disagrees with me -- when, in fact,
    it does not, and when gcc's behavior is not directly relevant to
    the original statement anyway (since, as you correctly point out,
    gcc is not authoritative).

    I'd like to know whether you still think you were right. If so,
    I'd like to see your explanation. If not, an admission that you
    made a mistake would be appreciated. But I expect neither from you.

    I'd like to know why you ignored my explanation, based directly on
    text from the C standard, about why an implementation is allowed to
    process the code in question, without giving a diagnostic, and
    still be conforming. An explanation that Dan Cross agreed with,
    even if he may not like the consequences.

    That explanation is not relevant to your claim that gcc disagrees
    with me, which is what I asked you about.

    In investigating this question, I have run compilations using
    multiple versions of gcc, on two different platforms. I have looked
    carefully through the gcc man page. I have also run compilations
    using multiple versions of clang, on two different platforms. After
    doing all that, I ran compilations using godbolt, so I could check
    the latest, or maybe almost latest, versions of gcc and clang. All
    the different versions of gcc and clang that I have tried support my
    hypothesis that gcc (and now also clang) interpret the C standard so
    as to conclude that conforming to the C standard need not require a
    diagnostic for situations like the code under discussion.

    You've told us what you concluded from your compilations using godbolt.
    You haven't told us what those compilations actually told you.

    On the off chance that you're willing to answer a straightforward
    question:

    Here's one result I got on my system:

    $ gcc16 --version | head -n 1
    gcc16 (GCC) 16.1.0
    $ cat c.c
    #include <limits.h>
    int main(void) {
        switch(0) {
            case (INT_MAX + 1) * 0:
                break;
        }
    }
    $ gcc16 -std=c23 -pedantic-errors -c c.c
    c.c: In function ‘main’:
    c.c:4:23: warning: integer overflow in expression of type ‘int’ results in ‘-2147483648’ [-Woverflow]
        4 |         case (INT_MAX + 1) * 0:
          |                       ^
    c.c:4:9: error: overflow in constant expression [-Woverflow]
        4 |         case (INT_MAX + 1) * 0:
          |         ^~~~
    $

    gcc emitted a fatal error message on that case label. Have you
    seen any version of gcc, either on your system or on godbolt,
    *not* issue a fatal error message when invoked on that source with
    "-std=cNN -pedantic-errors" (NN=23, or any valid value you like)?
    If so, have you seen it not at least issue a warning?

    If not, what is the basis for your claim that gcc disagrees with me?

    It's conceivable that what you meant is that gcc happens to issue
    a diagnostic, but is not required to. If so, then (a) that's
    sufficiently subtle that any reasonable person would have explained
    that point, and (b) given that gcc produces a diagnostic, I see no
    basis to assume that gcc "thinks" it's not required to do so.

    I'd like to ask you to do two things. First, read through the
    reasoning given in my previous post, try to assess whether that
    reasoning is sound, and post the results of your contemplations.
    Second, look again at the question of whether gcc (and also clang,
    if you're up to it) support the hypothesis that a conforming
    implementation need not give a diagnostic for code like that under
    discussion. See if you can find a way of framing the question that
    supports my statement, rather than simply looking for one that
    supports your preconceived ideas. Post the results of your
    investigations, both what other experiments you tried, and what your
    assessment is of the results you got.

    You made a very simple claim, that gcc disagrees with me. I'm asking
    you about *that statement*. Do you still assert that gcc disagrees
    with me? (That is not a question about the C standard.)

    Do these two things and I will endeavor to explain my views on the
    questions you have raised here, if such explanations are still
    needed after your further examinations and comments.

    [SNIP]

    I see no basis for this belief. My conclusions are based on what
    the C standard actually says, rather than guesses about some
    unstated "intentions". I think you would do well to reach your
    conclusions based more on the actual text of the C standard, and
    less on your interpretation of what the text was "intended" to
    mean.

    The actual text of the standard implies that 42 is not an expression.
    I rely on the obvious intent to conclude that it is.

    Now it is you who is changing the subject. Besides not being on
    point to the question being considered, it's a silly argument, and I
    would hope you are smart enough to realize that. However, if you do
    what I have asked in the previous paragraph, I can try to explain
    why I think your views on this unrelated matter are wrongheaded.

    Please be less condescending.

    Leaving gcc aside, my original statement was that a case label like:

    case (INT_MAX + 1) * 0:

    is a constraint violation (and therefore that it requires a diagnostic).
    It's possible that I'm mistaken on that point. The constraint I claim
    it violates is that "Each constant expression shall evaluate to a
    constant that is in the range of representable values for its type."

    We could have discussed that much more briefly if you hadn't dragged
    gcc into it.

    I acknowledge that it can also be reasonably argued that the
    expression as a whole *can*, for a particular implementation, yield
    a result of 0, and therefore that a diagnostic is not required *for
    such an implementation*.

    The committee response to C90 DR #031 contradicts that argument:

    case (INT_MAX*4)/4: is a constraint violation.
    When subclause 6.4 says on page 55, lines 11-12:

    Each constant expression shall evaluate to a constant that is in the
    range of representable values for its type.

    the Committee's judgement of the intent is that the
    ``representable'' requirement applies to each subexpression of
    a constant expression, as shown in the third example. A constant
    expression is meant as defined by the syntax rules.

    My judgement of the intent agrees with the Committee's, and, as
    far as I can tell, with gcc's.
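
    If I read the DR response correctly, the practical consequence is
    that the order of operations matters even when the final value is
    in range. The sketch below is my own illustration, with
    `ROUNDED_DOWN` and `is_rounded_down` as hypothetical names: it
    divides before multiplying, so no subexpression leaves the range
    of `int`.

```c
#include <limits.h>

/* Both forms are mathematically in range, but only the second keeps
   every subexpression representable as an int:
     (INT_MAX * 4) / 4   the multiplication overflows first
     (INT_MAX / 4) * 4   divide first, so nothing overflows */
enum { ROUNDED_DOWN = (INT_MAX / 4) * 4 };

/* Hypothetical function using the constant as a case label. */
int is_rounded_down(int x)
{
    switch (x) {
    case ROUNDED_DOWN:
        return 1;
    default:
        return 0;
    }
}
```

    The two forms are not interchangeable in value, of course; the
    point is only that the second one contains no overflowing
    subexpression.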

    (I do think that the wording in the standard could and should be
    improved.)
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Jun 8 14:05:06 2026
    From Newsgroup: comp.lang.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    The actual text of the standard implies that 42 is not an expression.
    I rely on the obvious intent to conclude that it is.

    I made the above statement to demonstrate that just following the exact
    wording of the standard, without thinking about the (sometimes unclear)
    intent behind it, can lead to absurd results.

    I've discussed this particular glitch before, but it's been a while.

    N3220 6.5.1 says:

    An *expression* is a sequence of operators and operands that
    specifies computation of a value, or that designates an object
    or a function, or that generates side effects, or that performs
    a combination thereof.

    I believe the wording is unchanged from C90 up to the latest C202y
    draft. Since the word "expression" is in italics, this is the
    standard's definition of the word.

    This is a flawed definition. The terms "operator" and "operand"
    are defined in 6.4.6:

    *punctuator: one of
    [ ] ( )
    [snip]

    A punctuator is a symbol that has independent syntactic and semantic
    significance. Depending on context, it may specify an operation to
    be performed (which in turn may yield a value or a function
    designator, produce a side effect, or some combination thereof) in
    which case it is known as an *operator* (other forms of operator also
    exist in some contexts). An *operand* is an entity on which an
    operator acts.

    Consider this expression statement:

    42;

    Is `42` an expression? Clearly it's intended to be, but there is no
    operator, and therefore there is no operand, so it doesn't meet the
    standard's definition of the word "expression".

    For that matter, consider:

    (void)0;

    It's "obvious" that `(void)0` is an expression. It consists of one
    operator `(void)` and one operand `0` (I'll ignore the fact that
    the definition uses plurals for both), but it does not specify
    computation of a value, or designate an object or a function,
    or generate side effects, or perform a combination thereof.

    The fact that the standard's definition of "expression" is flawed is
    not much of a problem in practice. Virtually everyone, implementers
    and programmers, assumes the obvious intent. Nobody believes that
    `42` isn't an expression. But it is my strongly held opinion that
    the wording should be improved in a future edition of the standard.

    I think it should say something to the effect that the meaning
    of the term "expression" is defined by the grammar. The current
    wording that claims to be the definition of the term could, with
    a few tweaks, still be turned into a valid normative statement
    *about* expressions.

    I have a similar issue with the standard's definition of "value":
    "precise meaning of the contents of an object when interpreted as
    having a specific type". It's obvious that the result of evaluating
    a non-void expression (such as the infamous `42`) is a "value",
    but the definition implies that a "value" can only be the meaning
    of the contents of an object. Nobody is actually misled by the
    current definition, but it should be improved.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 8 23:15:48 2026
    From Newsgroup: comp.lang.c

    In article <11075os$3fm4u$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    [...]
    A naive compiler that performs no optimizations would generate
    code for foo() that attempts to compute (INT_MAX+1)*0 step by
    step, without recognizing the overflow, and that code would never
    be executed.

    Sure. But a far more sophisticated translator (and I would
    argue a nefarious one) could emulate that code, decide it was
    UB, and immediately fail translation with an error.

    I disagree. That's not a sensible interpretation of what the
    standard says.

    I agree it's not sensible. But sadly, the standard does not
    seem to explicitly prohibit it, either. This is the point: we
    necessarily rely on a "reasonable interpretation" of the
    standard to be able to usefully write C code. An adversarial
    interpretation is not sensible, but it appears that such is
    possible given the standard as written. This is a danger with a
    language that is not formally specified.

    A call to foo() would have undefined behavior if it occurred.

    What I'm really trying to get at is that the behavior of
    `int zero = (INT_MAX + 1)*0;` is undefined in all cases. There
    is no input for which it is valid at all. It is qualitatively
    different than other examples where UB cannot be detected
    _except_ at runtime.

    In particular, it does not become defined just because it's in a
    function that is not called; the behavior is UB on its face. It
    is utterly meaningless as far as C is concerned; it is what
    Regehr calls a "Type 3" function in his taxonomy at
    https://blog.regehr.org/archives/213: it literally has no
    definition.

    There
    is no call to foo().

    What I am further saying is that I do not see where the C
    standard puts additional constraints on an implementation so
    that it _must_ accept a program with such a construct in it, as
    sensible as that may otherwise be (I actually don't think that
    is very sensible, but that's my opinion). The specific wording
    of the standard appears to allow a compiler to halt translation
    if it observes that expression, whether it's in a function that
    is called or not.

    I readily concede that I may be wrong. But the arguments I have
    heard opposing this interpretation are not well-supported by the
    text. I would be happy if someone could provide such an
    argument that did not ultimately rely on either intuition or
    assumptions about reasonable behavior, but so far, none have
    been proffered.

    Similarly:

    int a = ..., b = ...;
    int c;
    if (b != 0) {
    c = a / b;
    }
    else {
    c = 0;
    }

    A division by zero would have undefined behavior if it occurred,
    but it never occurs. A compiler cannot reject the above code
    because of UB that never happens.

    This I also agree with. But assuming this is in some function
    that is otherwise well-defined, this is what Regehr calls a
    "Type-1" function: there is no input for which it is undefined.

    In this regard, it is qualitatively different than the `foo`
    example that is the subject of this thread. I suggest that that
    qualitative difference actually matters.
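
    A sketch of what a function defined for every input might look
    like, assuming that is the property being described. Note that
    guarding against `b == 0` alone is not quite enough for arbitrary
    `int` operands: `INT_MIN / -1` also overflows. The name
    `safe_div` and the saturating result are hypothetical choices
    made for illustration.

    ```c
    #include <limits.h>

    /* Hypothetical helper: has defined behavior for every pair of int
       inputs.  Division by zero yields 0, as in the quoted snippet. */
    static int safe_div(int a, int b)
    {
        if (b == 0) {
            return 0;
        }
        if (a == INT_MIN && b == -1) {
            return INT_MAX; /* quotient not representable; saturate
                               (an arbitrary choice for this sketch) */
        }
        return a / b;       /* every remaining case is well-defined */
    }
    ```

    By contrast, no guard of this kind can rescue `(INT_MAX + 1)*0`,
    because the operands are constants and always overflow.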

    [...]

    It returns a status of 0 from main and does nothing else.
    A conforming implementation *must* generate code that implements
    that behavior.

    I have yet to find or be shown a way in which the standard
    actually guarantees that.

    How does the standard guarantee *anything*?

    The thrust of what I have been driving at is that the standard
    actually guarantees a lot less than people take for granted.

    This strictly conforming program:

    int main(void) { return 0; }

    when executed returns a status of 0 from main and does nothing else.

    Actually, does it? It also implicitly closes the standard
    input, output, and error streams. That could have side effects.

    Adding an uncalled function to the same source file doesn't change
    that.

    But it's not _just_ an uncalled function. It's an uncalled
    function that is manifestly gibberish because there is no input
    for which that expression is well-defined.

    I have not found evidence that the standard explicitly prohibits
    a pathological compiler from doing something unexpected in that
    case. An adversarial read of the standard could allow a
    compiler to treat this in a manner similar to a syntax error.

    [...]

    There was, once, a view that was almost universally shared that
    UB was meant for things that could not be precisely described
    because hardware was too varied. We're well past that; now it's
    a vehicle for compiler writers to make benchmarks faster, but is
    (generally) hostile to programmers. A lot of hay is made about
    it in this group, but at the core, it's just (ironically) not
    well-defined.

    The standard does not say what UB is meant for. It says what UB
    *is*, and what constructs lead to it (by omission in some cases).
    Any optimization tricks played by compiler implementers must be
    based on that specification.

    Yes. Just so. And it also says that anything not explicitly
    stated in the standard is UB.

    As we all know, the definition of UB in the standard is,
    "behavior, upon use of a nonportable or erroneous program
    construct or of erroneous data, for which this document imposes
    no requirements."

    Behavior is defined as, "external appearance or action". Note
    that this does not explicitly state that "behavior" is only
    applicable during execution, and we know that the standard, as
    written today, says that some behaviors are "undefined" _at
    translation time_. I cannot find something forbidding an
    implementation from interpreting "external appearance or action"
    to refer to the success or failure of translation and production
    of an associated artifact. Translation phase 7 then says that,
    after all of the preprocessing and so forth, "the resulting
    tokens are syntactically and semantically analyzed and
    translated as a translation unit." As written, a compiler could
    certainly detect that that expression, whether executed or not,
    is UB.

    Indeed, sec 3.5.3 para 2, "Note 1 to entry", explicitly mentions
    terminating translation as one of a few sample "undefined
    behaviors". It doesn't say that the compiler _has_ to do that,
    but does not say that it _must not_, either.

    Sec 3.5.3 para 4 ("Note 3 to entry") is the closest I see to
    mandating the interpretation you and Rentsch have taken, but
    that is specific to _execution time_, not _translation time_,
    and the latter is not outright banned from responding to UB: the
    text of the standard imposes no requirements in this context.
    Dare I say that the translation-time behavior is undefined?

    [...]

    I agree. printf("hello, world\n") must write that string to standard
    output, which may be a file or an interactive device. Just what
    that means is unspecified or implementation-defined. It might be
    printed in EBCDIC or incised into clay tablets. Closing stdout,
    which occurs when main() terminates, might involve firing the tablet
    or emitting control sequences for a screen reader.

    Exactly. It could also emit the string, "GOODBYE WORLD."

    No, it couldn't. It must emit "hello, world\n" in some form.
    It must emit the character 'h' as represented in the execution
    character set, followed by 'e', and so on.

    I didn't say that it wouldn't; I was referring specifically to
    the behavior on closing stdout. You are right, it must emit
    something corresponding to, "hello, world\n"; but what it does
    after that is up to the implementation. We agree that it could
    emit a terminal reset sequence; there is no reason that sequence
    couldn't be, "GOODBYE WORLD." It'd be a weird one, but it's not
    impossible.

    [...]

    This presupposes that the program is strictly conforming, but
    in the limit, the standard can be interpreted in such a way that
    if any statement in the program is provably UB (as this one is)
    then the program cannot be said to be strictly conforming.

    It's not UB if it's never called. Behavior that doesn't happen is
    not behavior.

    See above. The standard simply does not say that. The standard
    merely says that behavior is something that manifests as
    "external appearance or action." Translation is certainly an
    action with an "external appearance" and nothing says that
    behavior _during translation_ is any less "behavior" than
    behavior during execution. In fact, the standard explicitly
    mentions undefined behavior and translation.

    I did not presuppose that the program is strictly conforming.

    Well, you kinda did: you said that the program is strictly
    conforming, and then said that it must be accepted because it is
    strictly conforming. That acceptance is predicated on it being
    strictly conforming.

    I read the source code and determined that it meets the standard's
    definition of a strictly conforming program.

    I have presented what I think is an equally valid, alternative
    reading of the text of the standard where that does not hold.

    That reading is, admittedly, adversarial. That does not mean it
    is wrong. I am saying that this is a weakness of the standard,
    not a good interpretation.

    40 years ago, people thought that the idea of a post-modern
    compiler time-travelling in the pursuit of optimization when UB
    is detected during translation was an adversarial read of the
    standard. And yet, here we are.

    [...]

    Ok, so in that case, would we say that "`foo` has undefined
    behavior?" The qualification, "...if called" seems superfluous,
    and I don't see anything in the standard that explicitly
    disagrees.

    The qualification "if called" is the whole point.

    Except it's not. The behavior of that expression is simply
    undefined; whether executed or not, there's no way it _could_ be
    defined.

    [...]

    UB can time-travel, however. Because it's undefined, the
    compiler is free to assume that it never executes, or that it
    always executes.

    "UB can time-travel" is perhaps an oversimplification.

    An example is
    a bug that occurred in the Linux kernel, something like:

    void func(int *ptr) {
    do_something_with(*ptr);
    if (ptr != NULL) {
    blah();
    }
    }

    The compiler, on seeing the expression `*ptr`, assumed that `ptr` is
    not null, and elided the test on the following line.

    But even assuming that's valid, a compiler absolutely cannot assume that
    an instance of UB always executes when, according to the semantics of the
    program, it provably never executes.

    Time travel is a term of art, here. I posted this elsewhere in
    the thread, and I think he does a much better job explaining it
    than I can:
    https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633

    Reading a bit more, I think that C23 sec 3.5.3 para 4 appears
    to be trying to rein that in. Hope springs eternal.
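
    For the kernel-style example quoted above, the usual remedy is to
    test the pointer before the first dereference, so that there is
    nothing for the compiler to infer from an earlier `*ptr`. A
    minimal sketch; the function name and the -1 sentinel are
    hypothetical.

    ```c
    #include <stddef.h>

    /* Hypothetical stand-in for the quoted function: the null test
       comes before any dereference, so the check cannot be elided on
       the grounds that ptr was already dereferenced. */
    static int read_or_default(const int *ptr)
    {
        if (ptr == NULL) {
            return -1;      /* sentinel chosen only for this sketch */
        }
        return *ptr;        /* reached only when ptr is non-null */
    }
    ```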

    [...]

    So any program that produces no output at all is strictly
    conforming? Then what about this?

    #include <limits.h>

    int
    zero(void)
    {
    return (INT_MAX + 1) * 0;
    }

    int
    main(void)
    {
    (void)zero();
    return 0;
    }

    That's an interesting point. A more terse example:

    #include <limits.h>
    int main(void) {
    int unused = INT_MAX + 1;
    }

    Sure. Or consider this program:

    ```
    #include <limits.h>

    int
    foo(int a)
    {
    extern int int_max;
    int_max = INT_MAX + 1;
    return int_max;
    }

    int
    main(void)
    {
    return 0;
    }
    ```

    Suppose that no definition for `int_max` is provided; is this a
    strictly conforming program? Consider section 6.9.1, which
    describes external definitions. The relevant paragraph is 5,
    which reads in part, "If an identifier declared with external
    linkage is used in an expression somewhere in the entire program
    there shall be exactly one external definition for the
    identifier; otherwise, there shall be no more than one."

    But as has been argued, `int_max` is not actually _used_, since
    `foo` is never called. If that holds, then this ought to be
    accepted by a conforming implementation. Yet this fails to
    build with both gcc and clang; clearly both consider `int_max`
    to be "used". Ok, so what about this?

    #include <limits.h>

    int
    foo(int a)
    {
    extern int int_max;
    if ((INT_MAX + 1)*0) {
    int_max = INT_MAX + 1;
    }
    return 0;
    }

    int
    main(void)
    {
    return 0;
    }

    This _does_ build.

    So it appears that, at least for `gcc` and `clang`, merely not
    calling `foo` is insufficient.

    This program produces no output, yet clearly executes a function
    that contains an expression that induces undefined behavior when
    evaluated. I suppose an argument could be made that it _might_
    generate output due to UB, as UB imposes no requirements not to
    do so, so perhaps the _absence_ of output depends on UB.

    The program clearly has undefined behavior when executed, but no
    output depends on that undefined behavior. In my humble opinion,
    this demonstrates a flaw in the standard's definition of "strictly
    conforming program". (As a programmer: Don't do that.)

    That's kind of what I'm saying. Though this interpretation
    hinges on whether the absence of output can be defined as output
    in some sense; in this case, the compiler could emit code that
    says, "this program has UB", and I think that would be fine with
    respect to the standard.

    But the standard says that an implementation can stop
    translating a program if it detects UB, and nothing appears to
    limit that to functions that have been called from `main`.

    [...]

    In my ideal world, C would be rigorously defined with a precise
    operational semantics. That would be accompanied by an
    explanatory document that presented those semantics in lay
    terms in prose, similar to the standard now, for those who did
    not want to drive Coq or something similar. But at least we'd
    have something definitive to define the language, so that when
    there was apparent ambiguity, we had some objective metric by
    which to judge. The C standard, as written, is nowhere near as
    precise as it should be.

    I do not think that this will ever happen: not only would it be
    very difficult to produce (as you noted elsethread), I think the
    compiler writers would rebel if they felt that their UB hands
    were tied by a formal specification.

    "There are only two kinds of languages: the ones people complain
    about and the ones nobody uses."

    Yup.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Tue Jun 9 01:25:04 2026
    From Newsgroup: comp.lang.c

    Dan Cross <cross@spitfire.i.gajendra.net> wrote:
    In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    and in fact
    it *won't* occur during execution because foo() isn't called.
    A compiler can't generate code with arbitrary behavior just because
    it can't prove that there will be no UB. If it could, every signed
    or floating-point arithmetic operation with unknown operand values
    would grant the same permission.

    But that's not the situation here. The situation is that the
    compiler can prove that something _is_ UB.

    In the program quoted at the top of this post, the UB occurs in
    a function foo() that's never called. A compiler can replace the
    body of foo() with a trap, and it can certainly warn about the UB,
    but I don't believe it can reject the entire program. A clever
    compiler could prove that the UB never occurs.

    So there are two things that are at play here.

    First, this notion that UB is _only_ a runtime matter. The text
    of the standard contradicting that aside, if a translator can
    detect that the behavior of a construct is provably undefined if
    executed, then it seems axiomatic that UB is clearly something
    that plays a role at translation time, as well.

    I think that this paragraph (and several others in this post and
    other posts) represents a fundamental misunderstanding. This may
    be due to the way the C standard is written. AFAIK the Extended Pascal
    standard (once you translate the terminology) states the same things as
    C about UB, but in a clearer way. Some relevant parts below:

    : 3.1 Dynamic-violation
    : A violation by a program of the requirements of this International
    : Standard that a processor is permitted to leave undetected up to,
    : but not beyond, execution of the declaration, definition, or
    : statement that exhibits (see clause 6) the dynamic-violation.

    : 3.2 Error
    : A violation by a program of the requirements of this International
    : Standard that a processor is permitted to leave undetected.
    ...
    : 5.1 Processors
    ...
    : e) be able to determine whether or not the program violates any
    : requirements of this International Standard, where such a violation is
    : not designated an error or dynamic-violation,
    ...

    : 5.2 Programs
    ...
    : b) if it conforms at level 1, use only those features of the language
    : specified in clause 6;

    UB in the C standard corresponds to 'error' in the Pascal standard. And
    (by the clause above) a program is allowed to use only defined features;
    trying to use something that has no definition (undefined by
    omission of a definition) is automatically an error.

    Overflow in arithmetic in Pascal is an error, as is accessing the
    wrong variant of a variant record. Due to this, accessing a variable
    using the wrong type is an error in Pascal.

    Since valid programs shall contain no errors (as defined above),
    a Pascal compiler may optimize assuming that the user program contains
    no errors. This is the same as a C compiler optimizing on the assumption
    that there is no undefined behaviour in the program.

    Of course, C is different language than Pascal and in particular
    C contains more "dangerous" constructs that may lead to
    undefined behaviour.

    However, the fundamental thing remain: detecting undefined behaviour
    ("errors") at compile time in general is hard, and compilers are
    not obliged to do so. But they may optimize trusting that program
    contains no undefined behaviour ("no error").

    Indeed, I would go so far as to suggest that _most_ instances of
    UB are detected and used (by the translator) during translation.

    I think it is different: the compiler _assumes_ no undefined behaviour
    and optimizes accordingly. But when there is undefined behaviour,
    then program behaves in unexpected way at runtime. Also, when
    you assume a false thing, then you can logically derive anything
    from it, so there is no limit to possible damage.

    So to say that, "this program doesn't have UB because the
    statement that contains UB is never executed" doesn't make a lot
    of sense to me. It would be closer to being correct if one said
    "this program is unaffected by UB since the expression that has
    UB is never evaluated when the program executes": again, in this
    case (as, I suspect, in most cases) the UB simply _is_: the
    expression `INT_MAX + 1` does not become well-defined just
    because it is never executed.

    Well, what is interesting to users is runtime behaviour of programs
    and undefined behaviour usually is runtime thing (as troubles that
    can be easily detected at compile time are usually constraint
    violations which should be detected at compile time). Fact that
    some undefined behaviour can be detected at compile time does
    not change this. And AFAICS there was a very deliberate decision to
    allow programs which contain code that would be undefined behaviour
    if executed, but are considered OK if such code is not executed.

    BTW: Pascal wording is different but Pascal standard contains
    identical provision and Pascal validation suite contains explicit
    tests of this sort.

    Second, there's this notion that the standard is just
    underspecified with respect to these matters, specifically, it
    does not _prohibit_ a translator from implementing an emulator
    for the abstract machine that evaluates code at translation
    time. Indeed, I suspect that _most_ compilers do something
    largely analogous to that; that's how they detect UB so that
    they can take advantage of it when optimizing. But if that's
    the case, then nothing prohibits them from relieving themselves
    of their obligation to follow the standard once they observe
    that some bit of code has UB.

    As I wrote, this is different. Compilers routinely compute some
    constant expressions at compile time. Constant here meaning that
    expression does not depend on runtime values. Compilers track
    ranges of variables. But this is done using the assumption that
    there is no undefined behaviour. For example, in the loop:

    for(int i = 1; i > 0; i++) {
    ...
    }

    absent assignments to i in the loop body, the compiler may infer that
    'i > 0' and skip the test. If however there is undefined behaviour, then
    the compiler may infer any nonsense. This may look as if the compiler
    detected undefined behaviour, but compilers typically do not check
    consistency of inferences. In fact, intermediate things and
    useless facts are quickly discarded, so detecting undefined
    behaviour via inconsistency of inferred facts would significantly
    increase memory use and probably also compile time.
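
    To make the loop example concrete: a sketch of a variant in which
    `i` is never incremented past INT_MAX, so the `i > 0` test stays
    meaningful and no inference from overflow is available to the
    compiler. The function name `count_up_from` is hypothetical.

    ```c
    #include <limits.h>

    /* Hypothetical helper: counts loop iterations starting at `start`,
       stopping at INT_MAX instead of incrementing past it. */
    static long long count_up_from(int start)
    {
        long long steps = 0;
        for (int i = start; i > 0; ) {
            steps++;
            if (i == INT_MAX) {
                break;      /* i + 1 would overflow; stop here */
            }
            i++;            /* safe: i < INT_MAX at this point */
        }
        return steps;
    }
    ```

    In the original loop, by contrast, the only way for `i > 0` to
    become false is through signed overflow, which is why the
    compiler is entitled to drop the test.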

    A naive compiler that performs no optimizations would generate
    code for foo() that attempts to compute (INT_MAX+1)*0 step by
    step, without recognizing the overflow, and that code would never
    be executed.

    Sure. But a far more sophisticated translator (and I would
    argue a nefarious one) could emulate that code, decide it was
    UB, and immediately fail translation with an error.

    As already noted, the C standard explicitly forbids such behaviour.

    BTW: There were past discussions of the same and other people
    quoted the relevant passage, which is quite explicit. I am not
    going to search for it, but it is in the standard.

    Is it? I am unable to locate where the standard _actually says
    that it is_. That is my whole point.

    Sorry, I looked in the place given by other people, but I do not
    remember the exact location. I would say that once you find the right
    place and read it carefully it is pretty clear.

    And yet the standard does not say that. That is an
    interpretation; I assume it is universally shared, but if we
    want to limit ourselves to what the standard _actually says_ it
    is woefully underspecified in this regard.

    There was, once, a view that was almost universally shared that
    UB was meant for things that could not be precisely described
    because hardware was too varied.

    Originally C was defined by a single implementation which was not
    doing much optimisation. But clearly, starting from the first
    C standard, undefined behaviour had the same meaning as a Pascal
    error: permission for compilers to optimize on the assumption that
    it does not happen. The issue was well understood in the seventies.
    Already in the late sixties Fortran compilers could do interesting
    optimizations, not expected by naive users. In the seventies the
    majority (or at least the most influential) view was that it is the
    programmer's responsibility to obey language rules and that the
    compiler should optimize on the assumption that the rules are obeyed.
    C reflects this point of view.

    One can discuss if such point of view is valid now, but C
    is a product of such thinking.

    This is circular reasoning. You're saying that something that
    is provably UB in this program cannot prevent that program from
    being strictly conforming because the program is strictly
    conforming.

    What you wrote above is similar to the standard's wording, except that
    the standard formulates it much better, closer to "code that would
    cause undefined behaviour if executed does not prevent an otherwise
    strictly conforming program from being strictly conforming".

    In the past there were discussions about when an implementation can reject
    a program. I do not remember what the conclusion was in the case
    when an implementation can prove that a program must cause undefined
    behaviour, but the program otherwise violates no constraints. Probably
    it can reject it, but I am not sure. But if there is any possibility
    that the program may execute without undefined behaviour (including
    when it contains code that would cause undefined behaviour if executed),
    then the implementation should accept such a program.

    This presupposes that the program is strictly conforming, but
    in the limit, the standard can be interpreted in such a way that
    if any statement in the program is provably UB (as this one is)
    then the program cannot be said to be strictly conforming.

    As I wrote, there is a quite explicit statement in the standard
    which says the opposite of what you wrote above: the mere presence of
    code that would cause undefined behaviour if executed does not
    make a program non-conforming.

    In my ideal world, C would be rigorously defined with a precise
    operational semantics. That would be accompanied by an
    explanatory document that presented those semantics in lay
    terms in prose, similar to the standard now, for those who did
    not want to drive Coq or something similar. But at least we'd
    have something definitive to define the language, so that when
    there was apparent ambiguity, we had some objective metric by
    which to judge. The C standard, as written, is nowhere near as
    precise as it should be.

    I do not think that this will ever happen: not only would it be
    very difficult to produce (as you noted elsethread), I think the
    compiler writers would rebel if they felt that their UB hands
    were tied by a formal specification.

    I do not think operational semantics is the best way to define C.
    Naive operational semantics would define too many things, so
    one would need serious work to define what should be defined
    and leave undefined what should be undefined.

    My personal favorite is axiomatic semantics. IMO it is quite well
    adapted to defining programming languages. A substantial part of
    Pascal was given axiomatic semantics in the Alagic and Arbib book.
    The Pascal standard does not explicitly use axiomatic semantics, but
    my impression was that with manageable effort it could be rewritten
    using axiomatic semantics. The modern C standard is bigger, partly due
    to the library, partly because C has many more operators. But the
    problem seems to be mostly the quantity of needed text.

    I think that compiler writers would welcome axiomatic semantics;
    it would make their work simpler. More precisely, compiler
    writers must now themselves translate the standard text into a
    formulation similar to axiomatic semantics. Having official semantics
    would make their work simpler. It would prevent implementing
    some optimizations based on misunderstanding of the standard,
    but if such optimizations were deemed worthy they could
    implement them as a nonstandard thing (like -ffast-math now)
    and lobby for a change in the standard.

    I think the biggest trouble is normal programmers. They already
    struggle with the current standard text. A more formal presentation
    could alienate even folks who are now able to explain the standard's
    rules to other programmers.
    --
    Waldek Hebisch
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Jun 8 18:51:58 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <11075os$3fm4u$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    [...]
    A naive compiler that performs no optimizations would generate
    code for foo() that attempts to compute (INT_MAX+1)*0 step by
    step, without recognizing the overflow, and that code would never
    be executed.

    Sure. But a far more sophisticated translator (and I would
    argue a nefarious one) could emulate that code, decide it was
    UB, and immediately fail translation with an error.

    I disagree. That's not a sensible interpretation of what the
    standard says.

    I agree it's not sensible. But sadly, the standard does not
    seem to explicitly prohibit it, either. This is the point: we
    necessarily rely on a "reasonable interpretation" of the
    standard to be able to usefully write C code. An adversarial
    interpretation is not sensible, but it appears that such is
    possible given the standard as written. This is a danger with a
    language that is not formally specified.

    I started to compose a followup, but I found that I was mostly
    repeating things I've already written.

    I see no semantic difference between code in a function that's never
    called and code that simply isn't in the program. Neither allows
    an implementation to reject a strictly conforming program -- and
    yes, the program we've been discussing is as strictly conforming as
    `int main(void){}`.

    There's nothing special about functions as units of a program
    subject to undefined behavior. These two programs are semantically
    equivalent:
    void foo(void) { do_something(); }
    int main(void) { foo(); }
    and
    int main(void) { do_something(); }

    A simpler demonstration program might be:

    #include <limits.h>
    int main(void) {
    return 0;
    INT_MAX+1;
    }

    I assert that it is strictly conforming.

    The permission for UB to result in terminating a translation
    isn't even in normative text. It's in a non-normative note,
    which in principle means that it should be derivable from the
    normative text of the standard. (I'm not entirely sure it can be.)
    It certainly doesn't override the requirement that a conforming
    hosted implementation shall accept any strictly conforming program.

    [...]
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jun 8 23:05:24 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    In article <865x3yd21n.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    In article <86ik81cfk5.fsf_-_@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 2026-06-01 00:54, Keith Thompson wrote:

    [...]

    Yes, a compiler can reduce (a + b) * 0 to just 0. But it's not
    required to do so, and (INT_MAX + 1) * 0 still has undefined
    behavior. Undefined behavior is determined by the rules of the
    abstract machine *without* any adjustments permitted by the as-if
    rule.

    This is something I really don't get in the actual C-logic...

    Using constants that can be determined at compile time is UB here,
    despite the '* 0' mathematically indicating an IMO clear semantics,
    but using variables is only UB possibly at runtime? [...]
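
    One way to see that the rule concerns the signed addition itself,
    and not the `* 0`, is to compare with unsigned arithmetic, where
    wraparound is defined. A minimal sketch; the function names are
    hypothetical.

    ```c
    #include <limits.h>

    /* Unsigned arithmetic is reduced modulo UINT_MAX + 1, so this sum
       is well-defined and wraps around to 0. */
    static unsigned wrapped_sum(void)
    {
        return UINT_MAX + 1u;
    }

    /* The unsigned analogue of (INT_MAX + 1) * 0: every step has a
       defined result, so the whole expression is simply 0. */
    static unsigned wrapped_times_zero(void)
    {
        return (UINT_MAX + 1u) * 0u;
    }
    ```

    The signed expression `(INT_MAX + 1) * 0` has no such guarantee,
    because its first step already overflows.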

    There's an important distinction to make here. Consider this
    program:

    #include <limits.h>

    int
    foo(){
    int zero = (INT_MAX+1)*0;
    return zero;
    }

    int
    main(){
    return 0;
    }

    This program does not transgress the bounds of undefined behavior.

    To clarify, the comments in my posting were meant to be read as
    saying the given text is the entire program, and that it is strictly
    conforming with respect to conforming hosted implementations.
    (Incidentally, given the rules for freestanding implementations, I'm
    not sure that it is even possible for any program to be strictly
    conforming with respect to conforming freestanding implementations.
    In any case my statements were meant only in the context of hosted
    implementations.)

    Ok.

    [snip]
    Perhaps you mean that this is irrelevant because `foo` is not
    invoked, but I see no reason why that need be the case in e.g.
    a freestanding environment.

    I explained the context of my previous statements above. Sorry for
    not saying that in the original message.

    In a hosted environment, I don't
    think anything explicitly prevents `foo` from being called after
    `main` returns (though I can't imagine that would happen in real
    life; it would be weird if it did).

    The semantics described in the ISO C standard don't admit that
    possibility.

    I have read through much of what has been said in the subthread
    following this posting. I expect I will not be responding to much
    of it; my overall sense is that the discussion is mostly confused.
    I would like to say one thing here, and see if that helps things.

    Could you please point to where it says this, in the C standard?

    I cannot find anything that says that arbitrary code cannot run
    after `main()` returns, and I don't see how that could possibly
    be true.

    The logic here is backwards. The C standard is prescriptive: it
    says what _does_ happen, not what _doesn't_ happen. If one wants
    to establish that some "action" takes place, it is necessary to
    find a passage, or passages, in the C standard that, if all are
    taken together, shows that the "action" occurs, or at least that it
    can occur. The C standard doesn't need to say that, for example, a
    function x() other than main(), whose name is never referenced,
    will never be called. If someone wants to establish that x() could
    be called, there needs to be a chain of reasoning going through the
    semantic descriptions given in the C standard, to show that a call
    to x() could occur. If there is no such chain of reasoning, naming
    the pertinent passages in the C standard, to establish a possible
    call, then there is no possible call. In other words the burden of
    proof for a claim that some action could occur rests on whoever is
    making the claim; there is no need to look for something in the C
    standard that says something cannot occur.
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Jun 9 00:54:08 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    In article <86y0gp82pd.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    [...]

    I'd like to know why you ignored my explanation, based directly on
    text from the C standard, about why an implementation is allowed to
    process the code in question, without giving a diagnostic, and
    still be conforming. An explanation that Dan Cross agreed with,
    even if he may not like the consequences.

    I am mystified as to why you are bringing my name into this, and
    why you think "I may not like the consequences", or even what
    that means. In any event, you are evidently laboring under some
    assumption about what I think about this matter that is probably
    incorrect.

    In a response to another posting of mine, you wrote this:

    But as it happens, I think I can see how your interpretation may
    be valid: if, as a result of UB, the expression evaluates to "0"
    (or 12 or something similar) that _is_ representable, then
    there _is no constraint violation_ and so no diagnostic is
    required.

    I do not believe that that is the intent. But it _is_
    conformant with the text of the standard.

    I based my statement that begins "An explanation that Dan Cross
    agreed with, ..." on those two paragraphs.
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Tue Jun 9 09:46:01 2026
    From Newsgroup: comp.lang.c

    In article <1107rk3$3ldg4$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <11075os$3fm4u$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    [...]
    A naive compiler that performs no optimizations would generate
    code for foo() that attempts to compute (INT_MAX+1)*0 step by
    step, without recognizing the overflow, and that code would never
    be executed.

    Sure. But a far more sophisticated translator (and I would
    argue a nefarious one) could emulate that code, decide it was
    UB, and immediately fail translation with an error.

    I disagree. That's not a sensible interpretation of what the
    standard says.

    I agree it's not sensible. But sadly, the standard does not
    seem to explicitly prohibit it, either. This is the point: we
    necessarily rely on a "reasonable interpretation" of the
    standard to be able to usefully write C code. An adversarial
    interpretation is not sensible, but it appears that such is
    possible given the standard as written. This is a danger with a
    language that is not formally specified.

    I started to compose a followup, but I found that I was mostly
    repeating things I've already written.

    Yeah, I feel we're going around in circles, here.

    I see no semantic difference between code in a function that's never
    called and code that simply isn't in the program. Neither allows
    an implementation to reject a strictly conforming program -- and
    yes, the program we've been discussing is as strictly conforming as
    `int main(void){}`.

    That's the crux of the issue. I'm not convinced that it is. I
    can see an argument for it (and it's a pretty strong one) but I
    can see an argument against, and the standard as written is
    underspecified in my opinion. Really, that's it.

    There's nothing special about functions as units of a program
    subject to undefined behavior. These two programs are semantically
    equivalent:
    void foo(void) { do_something(); }
    int main(void) { foo(); }
    and
    int main(void) { do_something(); }

    A simpler demonstration program might be:

    #include <limits.h>
    int main(void) {
        return 0;
        INT_MAX+1;
    }

    I assert that it is strictly conforming.

    The permission for UB to result in terminating a translation
    isn't even in normative text. It's in a non-normative note,
    which in principle means that it should be derivable from the
    normative text of the standard. (I'm not entirely sure it can be.)

    That specific instance is not, no; that's in a note as you point
    out. I believe deriving it from the normative text is based on
    UB imposing no requirement at all on the implementation.

    It certainly doesn't override the requirement that a conforming
    hosted implementation shall accept any strictly conforming program.

    ...assuming the program is strictly conforming.

    I have arrived at the same place you are with your "42 is not an
    expression" example. The wording of the standard could be
    improved to avoid things like this.

    - Dan C.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Tue Jun 9 10:08:09 2026
    From Newsgroup: comp.lang.c

    In article <86pl2087z3.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    In article <86y0gp82pd.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    [...]

    I'd like to know why you ignored my explanation, based directly on
    text from the C standard, about why an implementation is allowed to
    process the code in question, without giving a diagnostic, and
    still be conforming. An explanation that Dan Cross agreed with,
    even if he may not like the consequences.

    I am mystified as to why you are bringing my name into this, and
    why you think "I may not like the consequences", or even what
    that means. In any event, you are evidently laboring under some
    assumption about what I think about this matter that is probably
    incorrect.

    In a response to another posting of mine, you wrote this:

    But as it happens, I think I can see how your interpretation may
    be valid: if, as a result of UB, the expression evaluates to "0"
    (or 12 or something similar) that _is_ representable, then
    there _is no constraint violation_ and so no diagnostic is
    required.

    I do not believe that that is the intent. But it _is_
    conformant with the text of the standard.

    I based my statement that begins "An explanation that Dan Cross
    agreed with, ..." on those two paragraphs.

    Nothing in those two paragraphs asserts that I am unhappy with
    the consequences; I neither like nor dislike the "consequences."
    I simply don't think that was the intent of people who wrote the
    standard.

    Before asserting a subjective interpretation of what someone
    else feels about a thing, you should seek to clarify if what you
    intend to say is accurate. Better yet, just don't do it. And
    of course, what I think about the matter is irrelevant to what
    you wrote to Keith, which I found sufficiently distasteful that
    I rather wish you hadn't mentioned my name in it at all.

    The rest of my earlier response stands.

    - Dan C.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Tue Jun 9 10:19:21 2026
    From Newsgroup: comp.lang.c

    In article <86tsrc8d0b.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [snip]
    I cannot find anything that says that arbitrary code cannot run
    after `main()` returns, and I don't see how that could possibly
    be true.

    The logic here is backwards. The C standard is prescriptive: it
    says what _does_ happen, not what _doesn't_ happen.

    The definition of undefined behavior in the standard says that
    it _imposes no requirements._ It is explicit that it says it
    mandates neither "what _does_ happen" nor "what _doesn't_
    happen."

    If one wants
    to establish that some "action" takes place, it is necessary to
    find a passage, or passages, in the C standard that, if all are
    taken together, shows that the "action" occurs, or at least that it
    can occur.

    So you're saying that the proverbial nasal demons quip about UB
    is incorrect, since it's not prescribed by the standard. Thanks
    for clarifying that.

    The C standard doesn't need to say that, for example, a
    function x() other than main(), whose name is never referenced,
    will never be called. If someone wants to establish that x() could
    be called, there needs to be a chain of reasoning going through the
    semantic descriptions given in the C standard, to show that a call
    to x() could occur.

    Actually, no, a reference to a function is not necessary. A
    well-publicized issue in a C++ compiler a couple of years ago
    was something along the lines of this:

    ```
    #include <stdio.h>
    void foo(void);
    int
    main(void)
    {
        for (;;);
    }

    void
    foo(void)
    {
        printf("never called\n");
    }
    ```

    The result of which, when run, was to print the text "never
    called" and exit. That compiler was conformant with the text
    of the standard.

    If there is no such chain of reasoning, naming
    the pertinent passages in the C standard, to establish a possible
    call, then there is no possible call. In other words the burden of
    proof for a claim that some action could occur rests on whoever is
    making the claim; there is no need to look for something in the C
    standard that says something cannot occur.

    See above.

    - Dan C.

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Jun 9 15:17:29 2026
    From Newsgroup: comp.lang.c

    On 2026-06-08 23:05, Keith Thompson wrote:
    [...]
    I've discussed this particular glitch before, but it's been a while.

    N3220 6.5.1 says:

    An *expression* is a sequence of operators and operands that
    specifies computation of a value, or that designates an object
    or a function, or that generates side effects, or that performs
    a combination thereof.

    I believe the wording is unchanged from C90 up to the latest C202y
    draft. Since the word "expression" is in italics, this is the
    standard's definition of the word.

    This is a flawed definition. The terms "operator" and "operand"
    are defined in 6.4.6:

    *punctuator: one of
    [ ] ( )
    [snip]

    A punctuator is a symbol that has independent syntactic and semantic
    significance. Depending on context, it may specify an operation to
    be performed (which in turn may yield a value or a function
    designator, produce a side effect, or some combination thereof) in
    which case it is known as an *operator* (other forms of operator also
    exist in some contexts). An *operand* is an entity on which an
    operator acts.

    Consider this expression statement:

    42;

    Is `42` an expression? Clearly it's intended to be, but there is no operator, and therefore there is no operand, so it doesn't meet the standard's definition of the word "expression".

    Above you used the term "expression statement", and then compare the
    "42" to an "expression".

    I know from my earlier C-days that '42;' is a valid statement, and so
    the term "expression statement" makes sense to me.

    I know from various languages' syntax definitions that a number like
    '42' is a sensible form for an expression (and no operators required).
    It's also depending on the context. Where expressions may be written
    (and where not) depends on the concrete language; syntactically and
    also semantically.

    Usually I'd expect above "expression-statement" to serve some purpose, semantically. I don't recall that in "C" such an expression-statement
    would serve any purpose. (Or that they'd show any observable behavior,
    if that term fits the C-parlance better?)

    Or do these stand-alone values (the "expression-statement") have some practically useful semantics?

    In other languages such stand-alone values serve a purpose; e.g. they
    may determine the result value of a block that can then be used in an
    outer context; but in "C" such constructs are obviously not possible.

    What purpose serve such stand-alone numbers in places where statements
    are expected?

    [...]

    The fact that the standard's definition of "expression" is flawed is
    not much of a problem in practice. Virtually everyone, implementers
    and programmers, assumes the obvious intent. Nobody believes that
    `42` isn't an expression. But it is my strongly held opinion that
    the wording should be improved in a future edition of the standard.

    I think it should say something to the effect that the meaning
    of the term "expression" is defined by the grammar. The current
    wording that claims to be the definition of the term could, with
    a few tweaks, still be turned into a valid normative statement
    *about* expressions.

    I have a similar issue with the standard's definition of "value":
    "precise meaning of the contents of an object when interpreted as
    having a specific type". It's obvious that the result of evaluating
    a non-void expression (such as the infamous `42`) is a "value",
    but the definition implies that a "value" can only be the meaning
    of the contents of an object. Nobody is actually misled by the
    current definition, but it should be improved.

    Janis

  • From Bart@bc@freeuk.com to comp.lang.c on Tue Jun 9 14:53:35 2026
    From Newsgroup: comp.lang.c

    On 09/06/2026 14:17, Janis Papanagnou wrote:
    On 2026-06-08 23:05, Keith Thompson wrote:
    [...]
    I've discussed this particular glitch before, but it's been a while.

    N3220 6.5.1 says:

         An *expression* is a sequence of operators and operands that
         specifies computation of a value, or that designates an object
         or a function, or that generates side effects, or that performs
         a combination thereof.

    I believe the wording is unchanged from C90 up to the latest C202y
    draft.  Since the word "expression" is in italics, this is the
    standard's definition of the word.

    This is a flawed definition.  The terms "operator" and "operand"
    are defined in 6.4.6:

         *punctuator: one of
             [ ] ( )
         [snip]
         A punctuator is a symbol that has independent syntactic and semantic
         significance. Depending on context, it may specify an operation to
         be performed (which in turn may yield a value or a function
         designator, produce a side effect, or some combination thereof) in
         which case it is known as an *operator* (other forms of operator also
         exist in some contexts). An *operand* is an entity on which an
         operator acts.

    Consider this expression statement:

         42;

    Is `42` an expression?  Clearly it's intended to be, but there is no
    operator, and therefore there is no operand, so it doesn't meet the
    standard's definition of the word "expression".

    Above you used the term "expression statement", and then compare the
    "42" to an "expression".

    I know from my earlier C-days that '42;' is a valid statement, and so
    the term "expression statement" makes sense to me.

    I know from various languages' syntax definitions that a number like
    '42' is a sensible form for an expression (and no operators required).
    It's also depending on the context. Where expressions may be written
    (and where not) depends on the concrete language; syntactically and
    also semantically.

    Usually I'd expect above "expression-statement" to serve some purpose, semantically. I don't recall that in "C" such an expression-statement
    would serve any purpose. (Or that they'd show any observable behavior,
    if that term fits the C-parlance better?)

    Or do these stand-alone values (the "expression-statement") have some practically useful semantics?

    In other languages such stand-alone values serve a purpose; e.g. they
    may determine the result value of a block that can then be used in an
    outer context; but in "C" such constructs are obviously not possible.

    What purpose serve such stand-alone numbers in places where statements
    are expected?

    I think it is just difficult for the syntax to ban certain expressions
    and not others. How would you express that in the grammar?

    If you ramp up the warnings, then you'll get messages like 'statement
    with no effect' or 'computed value not used', since sometimes there are side-effects that are needed:

    f() + g();

    f() and g() both do something, but nothing is done with their sum.
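
    As a minimal sketch of that case (the counter and the bodies of f()
    and g() are assumptions made up purely for illustration): both calls
    run for their side effects, and the sum is computed and then
    discarded. The order in which f() and g() are called is unspecified,
    so the sketch only counts calls.

    ```c
    /* Hypothetical counter so that the side effects are visible. */
    static int calls = 0;

    static int f(void) { calls += 1; return 10; }
    static int g(void) { calls += 1; return 20; }

    /* Runs the expression statement and reports how many calls happened. */
    int run_discarded_sum(void)
    {
        calls = 0;
        f() + g();   /* both functions are called; their sum is thrown away */
        return calls;
    }
    ```

    Calling run_discarded_sum() returns 2. With warnings enabled, gcc
    and clang will typically flag the `f() + g();` line as a computed
    value that is not used, which is the kind of message described above.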

    In my projects, such standalone expressions are always a hard error. The
    main exceptions include (using C syntax):

    f();
    ++a;
    a = b;

    These are expressions that can return values, but that can sensibly be
    used standalone too. (I don't support value-returning compound assignments.)

    (I first introduced this check because in the past, if I'd been writing
    some C, I might write 'a = b' instead of 'a := b'. The first does
    nothing (compares then discards result), but it is not what I'd intended.)

    Anyway, I don't have it as a syntax violation either.
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Jun 9 16:05:03 2026
    From Newsgroup: comp.lang.c

    On 2026-06-08 14:41, Dan Cross wrote:
    [...]

    Unfortunately, the C standard is simply not a precise, formal
    document. This is well-known, and it's hardly C's fault: indeed
    most of the applications of formalized descriptions of PL
    semantics to practical programming languages postdate C's
    invention; Dana Scott didn't introduce the term, "operational
    semantics" until 1970, and it didn't start to make a serious
    impact on languages until later.

    Disclaimer: I haven't read Dana Scott's source that you refer to.
    Myself I've heard that term at university during the early 1980's.
    In 1970 my "knowledge" about computers was on Star-Trek level only.

    I just want to point out Algol 68's formal specification (pre-1970).

    And provide this quote on "Operational Semantic" (from Wikipedia):
    "The concept of operational semantics was used for the first time
    in defining the semantics of Algol 68."

    But Algol 68 was certainly outstanding here, concerning its formal
    specification, compared to most other languages back in those days.

    Janis

    [...]

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Jun 9 16:30:23 2026
    From Newsgroup: comp.lang.c

    On 2026-06-09 15:53, Bart wrote:
    On 09/06/2026 14:17, Janis Papanagnou wrote:
    On 2026-06-08 23:05, Keith Thompson wrote:
    [...]
    I've discussed this particular glitch before, but it's been a while.

    N3220 6.5.1 says:

         An *expression* is a sequence of operators and operands that
         specifies computation of a value, or that designates an object
         or a function, or that generates side effects, or that performs
         a combination thereof.

    I believe the wording is unchanged from C90 up to the latest C202y
    draft.  Since the word "expression" is in italics, this is the
    standard's definition of the word.

    This is a flawed definition.  The terms "operator" and "operand"
    are defined in 6.4.6:

         *punctuator: one of
             [ ] ( )
         [snip]
         A punctuator is a symbol that has independent syntactic and semantic
         significance. Depending on context, it may specify an operation to
         be performed (which in turn may yield a value or a function
         designator, produce a side effect, or some combination thereof) in
         which case it is known as an *operator* (other forms of operator also
         exist in some contexts). An *operand* is an entity on which an
         operator acts.

    Consider this expression statement:

         42;

    Is `42` an expression?  Clearly it's intended to be, but there is no
    operator, and therefore there is no operand, so it doesn't meet the
    standard's definition of the word "expression".

    Above you used the term "expression statement", and then compare the
    "42" to an "expression".

    I know from my earlier C-days that '42;' is a valid statement, and so
    the term "expression statement" makes sense to me.

    I know from various languages' syntax definitions that a number like
    '42' is a sensible form for an expression (and no operators required).
    It's also depending on the context. Where expressions may be written
    (and where not) depends on the concrete language; syntactically and
    also semantically.

    Usually I'd expect above "expression-statement" to serve some purpose,
    semantically. I don't recall that in "C" such an expression-statement
    would serve any purpose. (Or that they'd show any observable behavior,
    if that term fits the C-parlance better?)

    Or do these stand-alone values (the "expression-statement") have some
    practically useful semantics?

    In other languages such stand-alone values serve a purpose; e.g. they
    may determine the result value of a block that can then be used in an
    outer context; but in "C" such constructs are obviously not possible.

    What purpose serve such stand-alone numbers in places where statements
    are expected?

    I think it is just difficult for the syntax to ban certain expressions
    and not others. How would you express that in the grammar?

    Well, I'd do that as it's done in other languages.

    Define _statements_ and define _expressions_. And define expressions
    in contexts where a sensible operational semantics can be defined (as
    in mathematical formulas, actual function parameter lists, etc.), but
    not in places where statements are expected.


    If you ramp up the warnings, then you'll get messages like 'statement
    with no effect' or 'computed value not used', since sometimes there are side-effects that are needed:

       f() + g();

    f() and g() both do something, but nothing is done with their sum.

    Right. And I wouldn't allow a mathematical formula where the results
    are calculated but not used, here an expression, as a statement.

    But your example may indeed lead to the actual answer to my question;
    when writing just

    f();

    There's no distinction of procedures and functions in "C". One cannot
    tell whether that f() is a "procedure" (i.e. a function with no return
    value, or one with return value but the call just relying on the side
    effects). In "C" any value of f() just gets discarded in this context.

    That of course doesn't mean that it couldn't be handled by the compilers
    and sensibly defined by the language, depending on how f() is actually
    defined. After all, 'f();' is not the same case as '42;'.

    But okay, we're talking about "C" here - so own design preferences are
    anyway irrelevant here.

    Janis

    [...]

  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Jun 9 17:13:10 2026
    From Newsgroup: comp.lang.c

    On 09/06/2026 16:30, Janis Papanagnou wrote:
    On 2026-06-09 15:53, Bart wrote:
    On 09/06/2026 14:17, Janis Papanagnou wrote:
    On 2026-06-08 23:05, Keith Thompson wrote:
    [...]
    I've discussed this particular glitch before, but it's been a while.

    N3220 6.5.1 says:

         An *expression* is a sequence of operators and operands that
         specifies computation of a value, or that designates an object
         or a function, or that generates side effects, or that performs
         a combination thereof.

    I believe the wording is unchanged from C90 up to the latest C202y
    draft.  Since the word "expression" is in italics, this is the
    standard's definition of the word.

    This is a flawed definition.  The terms "operator" and "operand"
    are defined in 6.4.6:

         *punctuator: one of
             [ ] ( )
         [snip]
         A punctuator is a symbol that has independent syntactic and semantic
         significance. Depending on context, it may specify an operation to
         be performed (which in turn may yield a value or a function
         designator, produce a side effect, or some combination thereof) in
         which case it is known as an *operator* (other forms of operator also
         exist in some contexts). An *operand* is an entity on which an
         operator acts.

    Consider this expression statement:

         42;

    Is `42` an expression?  Clearly it's intended to be, but there is no
    operator, and therefore there is no operand, so it doesn't meet the
    standard's definition of the word "expression".

    Above you used the term "expression statement", and then compare the
    "42" to an "expression".

    I know from my earlier C-days that '42;' is a valid statement, and so
    the term "expression statement" makes sense to me.

    I know from various languages' syntax definitions that a number like
    '42' is a sensible form for an expression (and no operators required).
    It's also depending on the context. Where expressions may be written
    (and where not) depends on the concrete language; syntactically and
    also semantically.

    Usually I'd expect above "expression-statement" to serve some purpose,
    semantically. I don't recall that in "C" such an expression-statement
    would serve any purpose. (Or that they'd show any observable behavior,
    if that term fits the C-parlance better?)


    I don't see why you would expect that. Statements do not have to have observable behaviour - indeed, I don't think any statements in C have observable behaviour in themselves. A "statement" in C is basically
    something that does not produce a value - "return", "if ...", "for...",
    or it is an "expression statement". Expression statements are the most
    common type of statement, I would guess (without having calculated statistics.)

    Expressions do not have to have observable behaviour. "x = y + z;" is a perfectly good expression statement, but has no observable behaviour
    (unless x, y or z are volatile). Most statements, and most expressions,
    do not have observable behaviour. (Again, I have no statistics, but I
    think this would be the solid majority of statements and expressions.)

    Of course most statements and expressions /contribute/ to later
    observable behaviour - such as printing out the result of a calculation.
    Otherwise they are not much use (and compilers can eliminate or reduce
    them, if the compiler is sure that there is no effect on observable behaviour).

    Or do these stand-alone values (the "expression-statement") have some
    practically useful semantics?

    In other languages such stand-alone values serve a purpose; e.g. they
    may determine the result value of a block that can then be used in an
    outer context; but in "C" such constructs are obviously not possible.

    What purpose serve such stand-alone numbers in places where statements
    are expected?

    I think it is just difficult for the syntax to ban certain expressions
    and not others. How would you express that in the grammar?

    Agreed.

    "42" is an expression of type "int", and so is 'printf("Hello\n")'. How
    (and why) would a language distinguish between them and allow one but
    not the other?


    Well, I'd do that as it's done in other languages.

    Define _statements_ and define _expressions_.

    C defines statements and expressions. One type of statement is the "expression statement", consisting of an expression followed by a
    semi-colon. The expression is optional - if it is missing, you have a
    null statement.
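
    A small sketch of the null statement in use (count_chars is a
    hypothetical name chosen for this illustration, not a library
    function): all the work is done in the loop header, so the loop body
    is just a semicolon.

    ```c
    #include <stddef.h>

    /* Counts characters up to the terminating '\0'. */
    size_t count_chars(const char *s)
    {
        size_t n;
        for (n = 0; s[n] != '\0'; ++n)
            ;   /* null statement: an expression statement with no expression */
        return n;
    }
    ```

    Here count_chars("abc") yields 3 and count_chars("") yields 0.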

    And define expressions
    in contexts where a sensible operational semantics can be defined (as
    in mathematical formulas, actual function parameter lists, etc.), but
    not in places where statements are expected.


    So where would "printf" fit in this picture? A printf call gives a
    result - it is an expression. It also has side-effects and observable behaviour. "while (false) ;" is a valid statement, with no
    side-effects. The distinction you want to make does not exist in C.
    (And I don't think C is special in that regard.)
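
    To make that concrete, here is a minimal sketch (greet and
    greet_and_ignore are hypothetical names used only for this
    illustration). The same printf call appears once with its result
    used and once as a plain expression statement with the result
    dropped.

    ```c
    #include <stdio.h>

    /* Prints a greeting and passes on printf's result, which is the number
       of characters written (or a negative value on an output error). */
    int greet(void)
    {
        return printf("Hello\n");
    }

    /* The same call as an expression statement: the result is discarded. */
    void greet_and_ignore(void)
    {
        printf("Hello\n");
    }
    ```

    When the output succeeds, greet() returns 6, one for each character
    of "Hello" plus the newline. Both functions are equally valid C; the
    grammar does not distinguish them.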


    If you ramp up the warnings, then you'll get messages like 'statement
    with no effect' or 'computed value not used', since sometimes there
    are side-effects that are needed:

        f() + g();

    f() and g() both do something, but nothing is done with their sum.

    Right. And I wouldn't allow a mathematical formula where the results
    are calculated but not used, here an expression, as a statement.

    If the definitions of "f" and "g" are not visible to the compiler at the
    time, how could the compiler know that they have no side-effects? Lots
    of operators have side-effects - if you want to allow "x = y;" but
    disallow "x + y;" you are going to have to have a lot of special cases
    and extra grammar, syntax or constraint rules. It is better to do as C
    does, and allow expression statements in the language and let compilers
    and other tools help developers spot their mistakes.
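
    One common convention for working with such tools, sketched here
    with a hypothetical counter and a hypothetical function bump(), is
    to cast to void when a value is discarded on purpose. The cast
    changes nothing about what is evaluated; many compilers simply treat
    it as a hint that the discard is intentional.

    ```c
    /* Hypothetical state so that the side effect is visible. */
    static int counter = 0;

    /* Hypothetical function with both a side effect and a result. */
    static int bump(void) { return ++counter; }

    int discard_on_purpose(void)
    {
        counter = 0;
        (void)bump();   /* result deliberately ignored; side effect remains */
        bump();         /* a bare call is an ordinary expression statement */
        return counter;
    }
    ```

    Calling discard_on_purpose() returns 2, since both calls to bump()
    take effect regardless of what happens to their results.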


    But your example may indeed lead to the actual answer to my question;
    when writing just

      f();

    There's no distinction of procedures and functions in "C". One cannot
    tell whether that f() is a "procedure" (i.e. a function with no return
    value, or one with return value but the call just relying on the side effects). In "C" any value of f() just gets discarded in this context.


    Yes.

    It is certainly possible for a language to distinguish between "pure functions" and functions/procedures with side-effects. (C actually lets
    you do that, with the [[reproducible]] and [[unsequenced]] attributes in
    C23, or compiler extensions before C23.) These can aid compiler static
    error checking and optimisation, but do not affect the grammar of the language.

    That of course doesn't mean that it couldn't be handled by the compilers
    and sensibly defined by the language, depending on how f() is actually defined. After all, 'f();' is not the same case as '42;'.

    But okay, we're talking about "C" here - so own design preferences are
    anyway irrelevant here.

    Janis

    [...]


  • From tTh@tth@none.invalid to comp.lang.c on Tue Jun 9 19:27:50 2026
    From Newsgroup: comp.lang.c

    On 6/9/26 15:53, Bart wrote:

       f() + g();

    f() and g() both do something, but nothing is done with their sum.

    I've just one question: why did you waste your lifetime
    with a lot of nonsense questions?
    --
    ** **
    * tTh des Bourtoulots *
    * http://maison.tth.netlib.re/ *
    ** **
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Bart@bc@freeuk.com to comp.lang.c on Tue Jun 9 19:19:07 2026
    From Newsgroup: comp.lang.c

    On 09/06/2026 18:27, tTh wrote:
    On 6/9/26 15:53, Bart wrote:

        f() + g();

    f() and g() both do something, but nothing is done with their sum.

      I've just one question: why did you waste your lifetime
      with a lot of nonsense questions?


    I didn't ask any question.

    You, on the other hand, did.

    I take it that you don't understand what is being discussed, and why. In
    that case you're wasting /your/ time posting.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Jun 9 15:07:54 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <1107rk3$3ldg4$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    [...]
    The permission for UB to result in terminating a translation
    isn't even in normative text. It's in a non-normative note,
    which in principle means that it should be derivable from the
    normative text of the standard. (I'm not entirely sure it can be.)

    That specific instance is not, no; that's in a note as you point
    out. I believe deriving it from the normative text is based on
    UB imposing no requirement at all on the implementation.

    No, the standard imposes no requirements on the *behavior*.
    It still imposes requirements on the implementation.

    The requirements imposed on an implementation are of a different
    kind than the requirements imposed on a running program.
    (An implementation might not even be written in C.)

    For example, if a program dies with a segfault, it's likely due to
    the program having undefined behavior. If a compiler dies with a
    segfault, it's always a bug in the compiler (though the standard
    doesn't say this).

    If, as I suggest, the word "behavior" ("external appearance or
    action") refers only to the behavior of a running program, then I
    don't see how the non-normative permission to terminate a translation
    follows from any normative text.

    One possible argument is the statement in Section 4 that "A
    *conforming hosted implementation* shall accept any strictly
    conforming program", which *might* imply that a conforming hosted
    implementation is permitted to reject (not accept) any program that
    is not strictly conforming. I'm not comfortable with that argument.

    It certainly doesn't override the requirement that a conforming
    hosted implementation shall accept any strictly conforming program.

    ...assuming the program is strictly conforming.

    Or deriving the fact that a program is strictly conforming by reading
    the program and the definition of "strictly conforming program".

    I have arrived at the same place you are with your "42 is not an
    expression" example. The wording of the standard could be
    improved to avoid things like this.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Jun 9 15:12:42 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    Actually, no, a reference to a function is not necessary. A
    well-publicized issue in a C++ compiler a couple of years ago
    was something along the lines of this:

    ```
    #include <stdio.h>
    void foo(void);
    int
    main(void)
    {
        for (;;);
    }

    void
    foo(void)
    {
        printf("never called\n");
    }
    ```

    The result of which, when run, was to print the text "never
    called" and exit. That compiler was conformant with the text
    of the standard.
    [...]

    That doesn't make sense to me. Do you have a citation to this incident,
    and is it relevant to C?

    There is a special rule in C about implementations being allowed
    to assume that an infinite loop terminates (N3220 6.8.6.1p4),
    but (a) it wouldn't apply to this case, and (b) even if it did,
    it wouldn't imply that an implicit call to foo would be permitted.
    I can imagine an argument that the program has undefined behavior
    and therefore it could print "never called" or "nasal demons",
    but I'd have to see the argument.
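
    For contrast, a minimal sketch of the kind of loop the cited rule does
    cover (collatz_steps is a made-up name for this example). The
    controlling expression `n != 1` is not a constant expression and the
    body performs no I/O, volatile accesses, or atomic operations, so an
    implementation may assume the loop terminates. In `for (;;);` the
    omitted controlling expression is replaced by a nonzero constant, which
    is why the rule would not apply there.

    ```c
    /* Number of Collatz steps needed to reach 1 from n (n >= 1). */
    unsigned collatz_steps(unsigned long n)
    {
        unsigned steps = 0;

        /* Non-constant controlling expression, no side effects visible
           outside the function: termination may be assumed. */
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            steps++;
        }
        return steps;
    }
    ```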
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Jun 9 15:22:06 2026
    From Newsgroup: comp.lang.c

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    [...]
    Above you used the term "expression statement", and then compare the
    "42" to an "expression".

    I know from my earlier C-days that '42;' is a valid statement, and so
    the term "expression statement" makes sense to me.

    Sorry, I thought that would be clear enough.

    Syntactically, an expression-statement is an optional expression
    followed by a semicolon (N3220 6.8.4, glossing over an irrelevant
    detail). I merely used it as an easy way to establish a context in
    which 42 is obviously a full expression (defined as "an expression
    that is not part of another expression, nor part of a declarator
    or abstract declarator").

    An expression-statement where the expression has no side effects
    is not useful, but it's permitted. C tends not to ban things just
    because they're not useful. `42;` is useful only to illustrate
    the point I was making about expressions.

    Since a function call is an expression, this is an expression-statement:

    printf("hello, world\n");
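
    A minimal sketch putting a few expression-statements side by side (the
    function name is invented for this example). All of them are permitted;
    only some have an effect, and a compiler may warn about the ones that
    do not.

    ```c
    #include <stdio.h>

    int expression_statements(void)
    {
        int n = 0;

        42;                         /* value discarded, no side effects */
        n + 1;                      /* likewise; n is unchanged */
        n = 5;                      /* assignment */
        n++;                        /* increment */
        printf("hello, world\n");   /* the int result is discarded */

        return n;                   /* 6 */
    }
    ```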

    [...]

    To be clear, I have zero doubt that 42 is an expression. My concern
    is that the C standard's English definition of "expression" doesn't
    quite say so. I advocate improving the wording so it expresses
    the obvious and universally agreed intent.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Tue Jun 9 18:29:38 2026
    From Newsgroup: comp.lang.c

    On 2026-06-08 21:25, Waldek Hebisch wrote:
    Dan Cross <cross@spitfire.i.gajendra.net> wrote:
    In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    ...
    In the program quoted at the top of this post, the UB occurs in
    a function foo() that's never called. A compiler can replace the
    body of foo() with a trap, and it can certainly warn about the UB,
    but I don't believe it can reject the entire program. A clever
    compiler could prove that the UB never occurs.

    So there are two things that are at play here.

    First, this notion that UB is _only_ a runtime matter. The text
    of the standard contradicting that aside, if a translator can
    detect that the behavior of a construct is provably undefined if
    executed, then it seems axiomatic that UB is clearly something
    that plays a role at translation time, as well.

    The committee has decided otherwise. The committee's resolution to DR
    109 said:

    "A conforming implementation must not fail to translate a strictly
    conforming program simply because some possible execution of that
    program would result in undefined behavior. Because foo might never be
    called, the example given must be successfully translated by a
    conforming implementation."

    The module in question defined a function with a line that contained
    the expression-statement

    1/0;

    and that statement was absolutely guaranteed to be executed if the
    function was called. However, since the module did not contain any calls
    to that function, the committee ruled that an implementation was not
    allowed to refuse to translate it.

    If linked to another module that contained a call to that function,
    whether or not the implementation could refuse translation depends upon
    what could be said about the call:

    1. If the call to that function was guaranteed to be executed upon
    starting the program, the implementation may refuse translation.

    2. If the call to that function was guaranteed to never be executed, the
    undefined behavior associated with 1/0 has no effect.

    3. If the call to that function might or might not be executed, the
    undefined behavior associated with 1/0 cannot have effect until
    execution of that call becomes inevitable.
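
    The third case can be sketched roughly as follows. This is an
    illustration under assumptions, not the code from DR 109: the names foo
    and maybe_call_foo and the flag parameter are invented here. Compilers
    typically warn about the division but still translate the module.

    ```c
    /* Executing the expression-statement below is undefined behaviour,
       but only if foo() is actually called. */
    int foo(void)
    {
        1 / 0;
        return 0;
    }

    /* Case 3: the call may or may not be executed, depending on flag. */
    int maybe_call_foo(int flag)
    {
        if (flag)
            return foo();
        return 1;
    }
    ```

    Calling maybe_call_foo(0) never reaches the division, so that execution
    has fully defined behaviour.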
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Jun 9 15:34:06 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    "42" is an expression of type "int", and so is 'printf("Hello\n")'.
    How (and why) would a language distinguish between them and allow one
    but not the other?
    [...]

    Ada, Pascal, and similar languages do exactly this, for what many
    people consider to be good reasons.

    In both languages, functions and procedures are distinct. Functions
    return values; procedures do not. An expression cannot be turned
    into a statement just by adding a semicolon. A function call is
    an expression. A procedure call is a statement, not an expression.
    An assignment is a statement, not an expression.

    For I/O, the equivalent of printf is a procedure. In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored). In Ada, an error in the
    equivalent Put_Line("Hello, world") raises an exception, which
    can't easily be ignored.
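
    On the C side, checking that result has to be done by hand. A minimal
    sketch, where say_hello and its 0/-1 return convention are assumptions
    made for this example:

    ```c
    #include <stdio.h>

    /* Returns 0 on success, -1 if printf reported an output error. */
    int say_hello(void)
    {
        if (printf("Hello, world\n") < 0)
            return -1;
        return 0;
    }
    ```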

    Both approaches are valid.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Jun 9 16:01:14 2026
    From Newsgroup: comp.lang.c

    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    [...]
    The committee has decided otherwise. The committee's resolution to DR
    109 said:

    "A conforming implementation must not fail to translate a strictly
    conforming program simply because some possible execution of that
    program would result in undefined behavior. Because foo might never be
    called, the example given must be successfully translated by a
    conforming implementation."

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_109.html

    [...]
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Wed Jun 10 09:04:26 2026
    From Newsgroup: comp.lang.c

    On 10/06/2026 00:34, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    "42" is an expression of type "int", and so is 'printf("Hello\n")'.
    How (and why) would a language distinguish between them and allow one
    but not the other?
    [...]

    Ada, Pascal, and similar languages do exactly this, for what many
    people consider to be good reasons.


    I don't know enough about Ada to be sure, but Pascal does not do this -
    see below.

    In both languages, functions and procedures are distinct. Functions
    return values; procedures do not. An expression cannot be turned
    into a statement just by adding a semicolon. A function call is
    an expression. A procedure call is a statement, not an expression.
    An assignment is a statement, not an expression.

    Sure. But the key factor there is that "printf", or its equivalent
    (such as "writeln", if I remember my Pascal correctly - it's been a
    while) are /procedures/. A "print" function in Pascal that returned the
    number of characters printed would be a function, used in an expression,
    not a procedure used in a statement.

    The rough equivalent of the distinction between Pascal procedures and
    functions is that procedures are like C functions that have "void"
    return type. It's fine (and not at all a bad idea) for a language to
    distinguish between void and non-void like this. What cannot easily be
    done in a clear and consistent way is to distinguish between two
    expressions of type "int" (or any other general non-void type).

    In C, an expression statement "expr;" causes the expression to be
    evaluated as a void expression for its side effects (§6.8.4p2). You
    can, arguably, say that C also requires all statements to be of "void"
    type, just like Pascal - but the cast-to-void is done implicitly to
    treat "expr;" as "(void) expr;".


    For I/O, the equivalent of printf is a procedure. In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored). In Ada, an error in the
    equivalent Put_Line("Hello, world") raises an exception, which
    can't easily be ignored.

    Both approaches are valid.


    Indeed they are.

    It is also fine for a language to distinguish between "pure" functions
    and functions/procedures with side-effects and/or functions/procedures
    with observable behaviour. (A "pure procedure" would not do anything.)
    As far as I remember, Pascal does not make that distinction.


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Bart@bc@freeuk.com to comp.lang.c on Wed Jun 10 11:10:29 2026
    From Newsgroup: comp.lang.c

    On 10/06/2026 08:04, David Brown wrote:
    On 10/06/2026 00:34, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    "42" is an expression of type "int", and so is 'printf("Hello\n")'.
    How (and why) would a language distinguish between them and allow one
    but not the other?
    [...]

    Ada, Pascal, and similar languages do exactly this, for what many
    people consider to be good reasons.


    I don't know enough about Ada to be sure, but Pascal does not do this -
    see below.

    In both languages, functions and procedures are distinct.  Functions
    return values; procedures do not.  An expression cannot be turned
    into a statement just by adding a semicolon.  A function call is
    an expression.  A procedure call is a statement, not an expression.
    An assignment is a statement, not an expression.

    Sure.  But the key factor there is that "printf", or its equivalent
    (such as "writeln", if I remember my Pascal correctly - it's been a
    while) are /procedures/.  A "print" function in Pascal that returned the
    number of characters printed would be a function, used in an expression,
    not a procedure used in a statement.

    The rough equivalent of the distinction between Pascal procedures and
    functions is that procedures are like C functions that have "void"
    return type.  It's fine (and not at all a bad idea) for a language to
    distinguish between void and non-void like this.  What cannot easily be
    done in a clear and consistent way is to distinguish between two
    expressions of type "int" (or any other general non-void type).

    In C, an expression statement "expr;" causes the expression to be
    evaluated as a void expression for its side effects (§6.8.4p2).

    In the C201x draft, 6.8.4p2 is about selection statements.

    You can, arguably, say that C also requires all statements to be of
    "void" type, just like Pascal - but the cast-to-void is done implicitly
    to treat "expr;" as "(void) expr;".

    That's not quite the same thing. If I write:

    int a;
    a;

    then gcc -Wall will report a warning. But write it as (void)a, then it
    doesn't.

    While this is awkward to express in a language's grammar, it can choose
    to list the kinds of expressions that /are/ allowed to be statements,
    rather than leave it to the whim of an implementation. (The ones that
    aren't allowed would be a much bigger, unlimited set.)

    For example:

    E(...); // function call
    ++E; // increment
    E = E; // assignment (and compound assignment)

    E is any expression term. Here, the call/increment/assignment is the
    top-level AST node.

    (I do this in my stuff, and there I can override the restriction using
    'eval': eval a + b, which turns it into an allowed form.

    Mainly this is for convenience of testing, but it was also used to
    ensure an expression ended up in the primary register for subsequent
    inline assembly.)


    For I/O, the equivalent of printf is a procedure.  In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored).  In Ada, an error in the
    equivalent Put_Line("Hello, world") raises an exception, which
    can't easily be ignored.

    Both approaches are valid.


    Indeed they are.

    Distinguishing between function and procedure is incredibly rare in
    modern languages. There the preoccupation seems to be to unify
    everything: everything is a function, even if-statements and loops.
    Every function is a closure, etc. I do not consider that useful.

    It is also fine for a language to distinguish between "pure" functions
    and functions/procedures with side-effects and/or functions/procedures
    with observable behaviour.  (A "pure procedure" would not do anything.)
    As far as I remember, Pascal does not make that distinction.


    This goes the other way and is a better idea!

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Jun 10 03:17:41 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> writes:
    On 10/06/2026 00:34, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    "42" is an expression of type "int", and so is 'printf("Hello\n")'.
    How (and why) would a language distinguish between them and allow one
    but not the other?
    [...]
    Ada, Pascal, and similar languages do exactly this, for what many
    people consider to be good reasons.

    I don't know enough about Ada to be sure, but Pascal does not do this
    - see below.

    You seem to disagree with me, but then you describe most of what
    I wrote. I'm not sure where you disagree, or where our signals
    got crossed.

    Ada and Pascal don't have expression statements. The Pascal
    (writeln(...)) and Ada (Put_Line(...)) constructs most similar
    to C's printf("Hello\n") are procedure calls. 42 can't be made into
    a statement by adding a semicolon. Neither can any function call.
    But a procedure call can. That's how and why Pascal and Ada allow
    one but not the other. (And both languages deliberately make it
    awkward to ignore the value returned by a function.)

    In both languages, functions and procedures are distinct. Functions
    return values; procedures do not. An expression cannot be turned
    into a statement just by adding a semicolon. A function call is
    an expression. A procedure call is a statement, not an expression.
    An assignment is a statement, not an expression.

    Sure. But the key factor there is that "printf", or its equivalent
    (such as "writeln", if I remember my Pascal correctly - it's been a
    while) are /procedures/. A "print" function in Pascal that returned
    the number of characters printed would be a function, used in an
    expression, not a procedure used in a statement.

    Right, and a Pascal function that prints its argument and returns an
    integer value could not be used by itself as a statement.

    The rough equivalent of the distinction between Pascal procedures and
    functions is that procedures are like C functions that have "void"
    return type. It's fine (and not at all a bad idea) for a language to
    distinguish between void and non-void like this. What cannot easily
    be done in a clear and consistent way is to distinguish between two
    expressions of type "int" (or any other general non-void type).

    Right. Which is why the I/O and similar subroutines that you'd want to
    use as statements are procedures, not functions.

    [...]
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Wed Jun 10 13:29:13 2026
    From Newsgroup: comp.lang.c

    On 10/06/2026 12:10, Bart wrote:
    On 10/06/2026 08:04, David Brown wrote:
    On 10/06/2026 00:34, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    "42" is an expression of type "int", and so is 'printf("Hello\n")'.
    How (and why) would a language distinguish between them and allow one
    but not the other?
    [...]

    Ada, Pascal, and similar languages do exactly this, for what many
    people consider to be good reasons.


    I don't know enough about Ada to be sure, but Pascal does not do this
    - see below.

    In both languages, functions and procedures are distinct.  Functions
    return values; procedures do not.  An expression cannot be turned
    into a statement just by adding a semicolon.  A function call is
    an expression.  A procedure call is a statement, not an expression.
    An assignment is a statement, not an expression.

    Sure.  But the key factor there is that "printf", or its equivalent
    (such as "writeln", if I remember my Pascal correctly - it's been a
    while) are /procedures/.  A "print" function in Pascal that returned
    the number of characters printed would be a function, used in an
    expression, not a procedure used in a statement.

    The rough equivalent of the distinction between Pascal procedures and
    functions is that procedures are like C functions that have "void"
    return type.  It's fine (and not at all a bad idea) for a language to
    distinguish between void and non-void like this.  What cannot easily
    be done in a clear and consistent way is to distinguish between two
    expressions of type "int" (or any other general non-void type).

    In C, an expression statement "expr;" causes the expression to be
    evaluated as a void expression for its side effects (§6.8.4p2).

    In C201x draft. 6.8.4p2 is about selection statements.


    C23 is the latest C standard, so that was what I was using (n3220.pdf).
    It is unfortunate that C23 has slightly different numbers for some
    sections - the standards authors have previously managed a higher
    consistency between versions. Section 6.8.3p2 is the number for C11 (as
    you have probably found already).

      You can, arguably, say that C also requires all statements to be of
    "void" type, just like Pascal - but the cast-to-void is done
    implicitly to treat "expr;" as "(void) expr;".

    That's not quite the same thing. If I write:

       int a;

    (Just to be clear that we agree - "int a;" is a declaration, not a
    statement, expression, or expression statement.)

       a;

    then gcc -Wall will report a warning. But write it as (void)a, then it
    doesn't.

    Yes. But that's a matter of warnings and conventional idioms, not the C
    language. "a;" and "(void) a;" both mean the same thing in the C
    language. gcc, like many compilers, has warnings on unused variables
    and parameters, and set-but-unused variables, as these are often the
    result of mistakes in the code. And tools that have such warnings have
    ways to mark intentionally unused variables and parameters - such as
    __attribute__((unused)) or C23's "[[maybe_unused]]". A common idiom
    is that casting an expression or variable to void tells the compiler
    that you know the variable or parameter is unused, and only evaluated
    for its side-effects (if any).
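
    A minimal sketch of both ways of marking an intentionally unused
    parameter. The callback names and signature are invented for this
    example, and the attribute form is a GCC/Clang extension rather than
    standard C.

    ```c
    /* Callback with a fixed signature; this implementation does not need
       the context pointer. */
    int on_event(int code, void *context)
    {
        (void)context;   /* deliberately unused */
        return code * 2;
    }

    /* Same intent, using the GCC/Clang attribute instead of a cast. */
    int on_event_attr(int code, __attribute__((unused)) void *context)
    {
        return code * 2;
    }
    ```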


    While this is awkward to express in a language's grammar, it can choose
    to list the kinds of expressions that /are/ allowed to be statements,
    rather than leave it to the whim of an implementation. (The ones that
    aren't allowed would be a much bigger, unlimited set.)


    Yes, a language could do that. In C, the language chooses to allow
    expressions of any type - that's the simplest to express!

    For example:

       E(...);      // function call
       ++E;         // increment
       E = E;       // assignment (and compound assignment)

    E is any expression term. Here, the call/increment/assignment is the
    top-level AST node.

    (I do this in my stuff, and there I can override the restriction using
    'eval': eval a + b, which turns it into an allowed form.

    Mainly this is for convenience of testing, but it was also used to
    ensure an expression ended up in the primary register for subsequent
    inline assembly.)

    A better choice for a language that wanted to restrict the kinds of
    expressions that can be used as statements would be to do as Pascal does
    - allow only what C would consider "void" expressions as statements, and
    make things like assignment void expressions. Saying that "x = 1" is an
    expression of type "int" that can be used as a statement while "x + 1"
    is an expression of type "int" that cannot be used as a statement would
    likely require significant complication in the language rules to work
    well. Saying that "x = 1" is a void expression and can therefore be
    used as a statement, while "x + 1" is a non-void expression and can
    therefore not be used as a statement, is simple and clear. The cost -
    or the benefit, depending on your viewpoint and preferences - is that it
    is no longer possible to write "x = y = 1" or "while (x = read())...".
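
    For reference, a minimal sketch of the two C idioms that rely on
    assignment being an expression with a value. The sample table and the
    read_sample helper are hypothetical stand-ins for a real input source.

    ```c
    /* Hypothetical input source: hands out the values of a small table,
       with 0 marking the end. */
    static const int samples[] = { 3, 4, 5, 0 };
    static int pos;

    static int read_sample(void)
    {
        return samples[pos++];
    }

    int sum_samples(void)
    {
        int x, total;

        pos = 0;                            /* restart from the first sample */
        x = total = 0;                      /* chained assignment */
        while ((x = read_sample()) != 0)    /* assignment used as a value */
            total += x;
        return total;
    }
    ```

    In a language where assignment is a void expression, both marked lines
    would have to be split into separate statements.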




    For I/O, the equivalent of printf is a procedure.  In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored).  In Ada, an error in the
    equivalent Put_Line("Hello, world") raises an exception, which
    can't easily be ignored.

    Both approaches are valid.


    Indeed they are.

    Distinguishing between function and procedure is incredibly rare in
    modern languages. There the preoccupation seems to be to unify
    everything: everything is a function, even if-statements and loops.
    Every function is a closure, etc. I do not consider that useful.


    Fair enough. There are pros and cons to any such choices.

    It is also fine for a language to distinguish between "pure" functions
    and functions/procedures with side-effects and/or functions/procedures
    with observable behaviour.  (A "pure procedure" would not do
    anything.) As far as I remember, Pascal does not make that distinction.


    This goes the other way and is a better idea!


    I personally think the "purity" of a function/procedure is a more
    important distinction than whether or not it evaluates to a non-void.
    But it is hard to see how it would work well in a compiled imperative
    language - an emphasis on pure functions is more the domain of
    functional programming languages. But a discussion on that would be more
    for comp.lang.misc than comp.lang.c.


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Wed Jun 10 13:43:01 2026
    From Newsgroup: comp.lang.c

    On 10/06/2026 12:17, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 10/06/2026 00:34, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    "42" is an expression of type "int", and so is 'printf("Hello\n")'.
    How (and why) would a language distinguish between them and allow one
    but not the other?
    [...]
    Ada, Pascal, and similar languages do exactly this, for what many
    people consider to be good reasons.

    I don't know enough about Ada to be sure, but Pascal does not do this
    - see below.

    You seem to disagree with me, but then you describe most of what
    I wrote. I'm not sure where you disagree, or where our signals
    got crossed.


    It was most likely a misunderstanding or misinterpretation of what you
    wrote - or what I wrote in the earlier post. We agree on how Pascal
    (and, AFAIUI, Ada) work, and we can let that stand as a clarification
    rather than risk yet another endless thread on the details of exactly
    what words were used.

    Ada and Pascal don't have expression statements. The Pascal
    (writeln(...)) and Ada (Put_Line(...)) constructs most similar
    to C's printf("Hello\n") are procedure calls. 42 can't be made into
    a statement by adding a semicolon. Neither can any function call.
    But a procedure call can. That's how and why Pascal and Ada allow
    one but not the other. (And both languages deliberately make it
    awkward to ignore the value returned by a function.)

    Agreed.


    In both languages, functions and procedures are distinct. Functions
    return values; procedures do not. An expression cannot be turned
    into a statement just by adding a semicolon. A function call is
    an expression. A procedure call is a statement, not an expression.
    An assignment is a statement, not an expression.

    Sure. But the key factor there is that "printf", or its equivalent
    (such as "writeln", if I remember my Pascal correctly - it's been a
    while) are /procedures/. A "print" function in Pascal that returned
    the number of characters printed would be a function, used in an
    expression, not a procedure used in a statement.

    Right, and a Pascal function that prints its argument and returns an
    integer value could not be used by itself as a statement.

    Agreed.


    The rough equivalent of the distinction between Pascal procedures and
    functions is that procedures are like C functions that have "void"
    return type. It's fine (and not at all a bad idea) for a language to
    distinguish between void and non-void like this. What cannot easily
    be done in a clear and consistent way is to distinguish between two
    expressions of type "int" (or any other general non-void type).

    Right. Which is why the I/O and similar subroutines that you'd want to
    use as statements are procedures, not functions.


    That is often the case, but is certainly not required by the language.
    Even standard functions can have side-effects (like "random"), though
    idiomatic Pascal typically uses procedures where the results are
    obtained by passing result variables by reference.


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 12:36:28 2026
    From Newsgroup: comp.lang.c

    In article <110a5vr$b2kq$5@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    [...]
    The committee has decided otherwise. The committee's resolution to DR
    109 said:

    "A conforming implementation must not fail to translate a strictly
    conforming program simply because some possible execution of that
    program would result in undefined behavior. Because foo might never be
    called, the example given must be successfully translated by a
    conforming implementation."

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_109.html

    [...]

    That does appear to settle the matter definitively, thanks.

    Ok, I was wrong and I concede that the program we've been
    discussing is strictly conforming, regardless of however
    antagonistic a reader of the standard may be.
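
    A minimal sketch of the kind of program the DR 109 resolution talks
    about (this is not the example from the DR itself, and `increment`
    and `never_called` are made-up names): one function would overflow
    if it were ever called, but no execution of the program calls it.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdio.h>

    /* Well defined for every x except INT_MAX. */
    static int increment(int x)
    {
        return x + 1;
    }

    /* Calling this would overflow a signed int, which is undefined
       behavior. Nothing in the program calls it. */
    int never_called(void)
    {
        return increment(INT_MAX);
    }

    int main(void)
    {
        int r = increment(41);   /* no overflow on this path */

        assert(r == 42);
        printf("%d\n", r);
        return 0;
    }
    ```

    Going by the committee response quoted above, the mere presence of
    `never_called` is not grounds for refusing to translate the program.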

    - Dan C.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 14:37:01 2026
    From Newsgroup: comp.lang.c

    In article <110a34q$b2kq$2@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    Actually, no, a reference to a function is not necessary. A
    well-publicized issue in a C++ compiler a couple of years ago
    was something along the lines of this:

    ```
    #include <stdio.h>
    void foo(void);
    int
    main(void)
    {
    for (;;);
    }

    void
    foo(void)
    {
    printf("never called\n");
    }
    ```

    The result of which, when run, was to print the text "never
    called" and exit. That compiler was conformant with the text
    of the standard.
    [...]

    That doesn't make sense to me. Do you have a citation to this incident,

    Yes: https://godbolt.org/z/d1WP4KP99

    There was such an outcry when this was discovered that the C++
    standard was modified to add a note explicitly allowing,
    "trivial infinite loops, which cannot be removed or reordered."
    https://eel.is/c++draft/intro.progress

    That change is commit 29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e (https://github.com/cplusplus/draft/commit/29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e)
    in response to P2809: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2809r3.html

    and is it relevant to C?

    Here's a C version with the same behavior:

    ```
    term% cat weird.c
    #include <stdio.h>

    int
    main(void)
    {
    for (unsigned int k = 0; k != 1; k += 2)
    ;
    return 0;
    }

    void
    hello(void)
    {
    printf("Hello, World!\n");
    }
    term% clang --version
    clang version 22.1.6
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
    term% ./weird
    Hello, World!
    term%
    ```

    There is a special rule in C about implementations being allowed
    to assume that an infinite loop terminates (N3220 6.8.6.1p4),

    The program above meets the criteria in sec 6.8.6.1 para 4 that
    allows an implementation to assume that the loop terminates.
    Godbolt link: https://godbolt.org/z/q46o5cYGM
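
    To see why the loop in `weird.c` can never actually terminate, note
    that unsigned arithmetic wraps modulo 2^N, so `k` starts at 0 and
    stays even forever; it never equals 1. A small sketch that checks
    this on a 16-bit analogue, so that it finishes quickly (`uint16_t`
    is assumed to be available, and `visits_one` is a made-up name):

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Step k by 2 until it wraps back to 0; report whether it was ever
       equal to 1 along the way, and how many steps the wrap took. */
    static bool visits_one(unsigned long *steps)
    {
        uint16_t k = 0;
        bool seen = false;

        *steps = 0;
        do {
            k = (uint16_t)(k + 2);   /* wraps modulo 65536 */
            ++*steps;
            if (k == 1)
                seen = true;
        } while (k != 0);
        return seen;
    }

    int main(void)
    {
        unsigned long steps;
        bool seen = visits_one(&steps);

        assert(!seen);
        printf("back at 0 after %lu steps; saw 1: %s\n",
               steps, seen ? "yes" : "no");
        return 0;
    }
    ```

    The same argument applies to `unsigned int`, only with more steps
    before the wrap.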

    but (a) it wouldn't apply to this case, and (b) even if it did,
    it wouldn't imply that an implicit call to foo would be permitted.
    I can imagine an argument that the program has undefined behavior
    and therefore it could print "never called" or "nasal demons",
    but I'd have to see the argument.

    Regehr alluded to this with his taxonomy of undefined functions.
    For a function that is always undefined (a "Type 3" function), a
    compiler is under no obligation to even produce a return
    instruction for it, and the behavior of a call to such a
    function is totally undefined. Nothing stops it from cascading
    into whatever the linker happens to put after it.

    Therefore, given UB, it is not necessary to have a reference to
    some function in a program's source text in order for it to be
    executed.

    - Dan C.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 18:30:53 2026
    From Newsgroup: comp.lang.c

    In article <110bsqd$9ab$1@reader1.panix.com>,
    Dan Cross <cross@spitfire.i.gajendra.net> wrote:
    In article <110a34q$b2kq$2@kst.eternal-september.org>,
    [snip]
    Here's a C version with the same behavior:

    ```
    term% cat weird.c
    #include <stdio.h>

    int
    main(void)
    {
    for (unsigned int k = 0; k != 1; k += 2)
    ;
    return 0;
    }

    void
    hello(void)
    {
    printf("Hello, World!\n");
    }
    term% clang --version
    clang version 22.1.6
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
    term% ./weird
    Hello, World!
    term%
    ```

    Replying to myself here, but...this is another example of weird
    behavior:

    ```
    term% cat boo.c
    #include <limits.h>

    int
    monstartup(void)
    {
    return INT_MAX + 1;
    }

    int
    main(void)
    {
    return 0;
    }
    term% clang --version | sed 1q
    FreeBSD clang version 19.1.7 (https://github.com/llvm/llvm-project.git llvmorg-19.1.7-0-gcd708029e0b2)
    term% clang -Wall -Wextra -pedantic -pedantic-errors -pg -fsanitize=undefined -o boo boo.c
    boo.c:6:17: warning: overflow in expression; result is -2'147'483'648 with type 'int' [-Winteger-overflow]
    6 | return INT_MAX + 1;
    | ~~~~~~~~^~~
    1 warning generated.
    term% ./boo
    boo.c:6:17: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
    SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior boo.c:6:17
    term%
    ```

    (I admit that I am cheating a bit, but I claim that this program
    is strictly conforming.)
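
    For comparison, a sketch of how the addition in `monstartup` could
    be written so that the overflow is detected rather than executed.
    `checked_add` is a hypothetical helper, not a standard function
    (C23's `<stdckdint.h>` offers `ckd_add` for this purpose, but it is
    not used here since library support varies):

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Store a + b in *sum and return true if the result fits in an int;
       return false, leaving *sum untouched, if it would overflow. */
    static bool checked_add(int a, int b, int *sum)
    {
        if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
            return false;
        *sum = a + b;
        return true;
    }

    int main(void)
    {
        int r = 0;

        assert(checked_add(1, 2, &r) && r == 3);
        if (!checked_add(INT_MAX, 1, &r))
            puts("INT_MAX + 1 does not fit in an int");
        return 0;
    }
    ```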

    - Dan C.

  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Jun 10 14:08:52 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    In C, an expression statement "expr;" causes the expression to be
    evaluated as a void expression for its side effects (§6.8.4p2). You
    can, arguably, say that C also requires all statements to be of "void"
    type, just like Pascal - but the cast-to-void is done implicitly to
    treat "expr;" as "(void) expr;".
    [...]

    In an expression statement, the expression is "evaluated as a void
    expression for its side effects". I think that's equivalent to
    converting (not casting!) it to void, but the standard doesn't describe
    it that way.

    6.3.2.2: "If an expression of any other type [other than void]
    is evaluated as a void expression, its value or designator is
    discarded."

    But statements have no type.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Jun 10 14:47:10 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <110a34q$b2kq$2@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    Actually, no, a reference to a function is not necessary. A
    well-publicized issue in a C++ compiler a couple of years ago
    was something along the lines of this:

    ```
    #include <stdio.h>
    void foo(void);
    int
    main(void)
    {
    for (;;);
    }

    void
    foo(void)
    {
    printf("never called\n");
    }
    ```

    The result of which, when run, was to print the text "never
    called" and exit. That compiler was conformant with the text
    of the standard.
    [...]

    That doesn't make sense to me. Do you have a citation to this incident,

    Yes: https://godbolt.org/z/d1WP4KP99

    There was such an outcry when this was discovered that the C++
    standard was modified to add a note explicitly allowing,
    "trivial infinite loops, which cannot be removed or reordered."
    https://eel.is/c++draft/intro.progress

    That change is commit 29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e (https://github.com/cplusplus/draft/commit/29fcc1c1fab7277d96bbd2ccd37b0c14dfd75a0e)
    in response to P2809: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2809r3.html

    So the reason the behavior was conforming was that the behavior of
    the infinite loop is undefined. I dislike the way the C++ standard
    expresses this. It says "The implementation *may assume* that any
    thread will eventually do one of the following" (emphasis added).
    More on that later in the context of the similar C rule.

    and is it relevant to C?

    Here's a C version with the same behavior:

    ```
    term% cat weird.c
    #include <stdio.h>

    int
    main(void)
    {
    for (unsigned int k = 0; k != 1; k += 2)
    ;
    return 0;
    }

    void
    hello(void)
    {
    printf("Hello, World!\n");
    }
    term% clang --version
    clang version 22.1.6
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
    term% ./weird
    Hello, World!
    term%
    ```

    There is a special rule in C about implementations being allowed
    to assume that an infinite loop terminates (N3220 6.8.6.1p4),

    The program above meets the criteria in sec 6.8.6.1 para 4 that
    allows an implementation to assume that the loop terminates.
    Godbolt link: https://godbolt.org/z/q46o5cYGM

    Right. ("for (;;);" in the original program does not.)

    Note that the C++ special rule applies only when the condition is
    equivalent to a constant `true` and the body of the loop is empty.
    An implementation can "assume" that any other loop will eventually
    finish.

    The rule in C is (6.8.6.1p4):

    An iteration statement may be assumed by the implementation
    to terminate if its controlling expression is not a constant
    expression, and none of the following operations are performed
    in its body, controlling expression or (in the case of a for
    statement) its expression-3
    — input/output operations
    — accessing a volatile object
    — synchronization or atomic operations.

    `for (;;)` is treated as having a constant controlling expression.
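
    A rough sketch of how the quoted rule sorts some loops, on one
    reading of the text. The function names are invented for
    illustration, and whether any given compiler acts on the permission
    is a separate question:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Controlling expression is not constant, and the loop does no I/O,
       volatile access, or atomic/synchronization operations: the
       implementation may assume this loop terminates. (It does.) */
    static unsigned halvings(unsigned n)
    {
        unsigned steps = 0;

        while (n != 0) {
            n /= 2;
            steps++;
        }
        return steps;
    }

    /* The controlling expression reads a volatile object, so the
       assumption is not available for this loop. */
    static unsigned spins_until_set(const volatile int *flag)
    {
        unsigned spins = 0;

        while (!*flag)
            spins++;
        return spins;
    }

    /* Constant controlling expression (for (;;) counts as one): also
       outside the rule. The loop ends only through the return. */
    static unsigned first_multiple_of_7_from(unsigned n)
    {
        for (;;) {
            if (n % 7 == 0)
                return n;
            n++;
        }
    }

    int main(void)
    {
        int ready = 1;   /* already set, so the polling loop exits at once */

        assert(halvings(8) == 4);
        assert(spins_until_set(&ready) == 0);
        printf("%u\n", first_multiple_of_7_from(10));
        return 0;
    }
    ```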

    This covers more cases than the C++ rule.

    I dislike it for most of the same reasons. It should be phrased
    in terms of the permitted behavior of a program, not what an
    implementation is allowed to "assume".

    In addition to that, I dislike the whole idea. I think it's
    intended to enable optimizations, but it means that for this
    contrived program:

    #include <stdio.h>
    int main(void) {
        bool keep_going = true;
        while (keep_going) {
            keep_going = true;
        }
        puts("never reached");
    }

    the implementation is allowed to "assume" that the loop eventually
    terminates. It's not clear what permissions the implementation is being
    given if the assumption is violated. I think the program could legally
    print "never reached", but if violating the assumption implies undefined
    behavior it could do anything.

    A programmer could easily write a program similar to the above
    and think that the meaning is perfectly clear, only to have it behave
    very differently because of one obscure subclause in the standard.

    but (a) it wouldn't apply to this case, and (b) even if it did,
    it wouldn't imply that an implicit call to foo would be permitted.
    I can imagine an argument that the program has undefined behavior
    and therefore it could print "never called" or "nasal demons",
    but I'd have to see the argument.

    Regehr alluded to this with his taxonomy of undefined functions.
    For a function that is always undefined (a "Type 3" function), a
    compiler is under no obligation to even produce a return
    instruction for it, and the behavior of a call to such a
    function is totally undefined. Nothing stops it from cascading
    into whatever the linker happens to put after it.

    Therefore, given UB, it is not necessary to have a reference to
    some function in a program's source text in order for it to be
    executed.

    Of course. Given UB, anything can happen. There's nothing special
    about a function that's never called in that context. It just
    happens to be the way it showed up in the C++ incident.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Jun 10 14:55:00 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    Replying to myself here, but...this is another example of weird
    behavior:

    ```
    term% cat boo.c
    #include <limits.h>

    int
    monstartup(void)
    {
    return INT_MAX + 1;
    }

    int
    main(void)
    {
    return 0;
    }
    ```
    [SNIP]
    (I admit that I am cheating a bit, but I claim that this program
    is strictly conforming.)

    I agree that the program is strictly conforming.

    I don't know the details, but I think "monstartup" is a special name,
    and that the program would behave as expected if a different name
    were used. Since "monstartup" is not reserved, an implementation
    that visibly treats it specially is not conforming.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Wed Jun 10 15:11:46 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    In article <86tsrc8d0b.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    [...]
    The C standard doesn't need to say that, for example, a
    function x() other than main(), whose name is never referenced,
    will never be called. If someone wants to establish that x() could
    be called, there needs to be a chain of reasoning going through the
    semantic descriptions given in the C standard, to show that a call
    to x() could occur.

    Actually, no, a reference to a function is not necessary. A
    well-publicized issue in a C++ compiler a couple of years ago
    was something along the lines of this:
    [...]

    This is comp.lang.c. My comments were only about C, and not
    about C++. But of course you already knew that.
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 22:44:26 2026
    From Newsgroup: comp.lang.c

    In article <86ldcm82ql.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    In article <86tsrc8d0b.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    [...]
    The C standard doesn't need to say that, for example, a
    function x() other than main(), whose name is never referenced,
    will never be called. If someone wants to establish that x() could
    be called, there needs to be a chain of reasoning going through the
    semantic descriptions given in the C standard, to show that a call
    to x() could occur.

    Actually, no, a reference to a function is not necessary. A
    well-publicized issue in a C++ compiler a couple of years ago
    was something along the lines of this:
    [...]

    This is comp.lang.c. My comments were only about C, and not
    about C++. But of course you already knew that.

    I see you did not read the other messages in the (sub)thread,
    but ok, here it is again, in C:

    ```
    term% cat what.c
    #include <stdio.h>
    int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    void hello(void) { printf("Hello, World!\n"); }
    term% clang --version | sed 1q
    clang version 22.1.6
    term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
    what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    | ^
    what.c:2:58: note: put the semicolon on a separate line to silence this warning
    1 warning generated.
    term% ./what
    Hello, World!
    term%
    ```

    - Dan C.

  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Wed Jun 10 16:19:34 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I see you did not read the other messages in the (sub)thread,
    but ok, here it is again, in C:

    ```
    term% cat what.c
    #include <stdio.h>
    int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    void hello(void) { printf("Hello, World!\n"); }
    term% clang --version | sed 1q
    clang version 22.1.6
    term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
    what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    | ^
    what.c:2:58: note: put the semicolon on a separate line to silence this warning
    1 warning generated.
    term% ./what
    Hello, World!
    term%
    ```

    I see the same behavior.

    The following largely repeats what I've written previously in
    this thread.

    Apparently the authors of clang decided that this statement in N3220
    6.8.6.1p4:

    An iteration statement may be assumed by the implementation to
    terminate if its controlling expression is not a constant
    expression, ...

    means that a program that violates that assumption has undefined
    behavior. I intensely dislike both the rule and the way it's stated,
    but I agree that the conclusion that the behavior is undefined is
    a reasonable one.

    Of course since the behavior is undefined, *anything* could happen.
    I don't know what happened inside clang (or the minds of its
    maintainers) that caused it to generate code that executes a
    statement in the body of a function that's never called, but that's
    just one of the infinitely many allowed behaviors. A quick look at the
    generated code indicates that there's no x86-64 "retq" instruction
    for either main() or hello(), and apparently control falls through
    from the end of main() to the body of hello(). That seems weird.

    It might just be a bug (but not one that, as far as I can tell,
    violates the C standard).

    A function whose body contains a construct that would have undefined
    behavior if the function were called (not the case here) does not
    cause undefined behavior if there are no calls to the function.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Wed Jun 10 23:32:47 2026
    From Newsgroup: comp.lang.c

    In article <110cmfk$116qm$3@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    Replying to myself here, but...this is another example of weird
    behavior:

    ```
    term% cat boo.c
    #include <limits.h>

    int
    monstartup(void)
    {
    return INT_MAX + 1;
    }

    int
    main(void)
    {
    return 0;
    }
    ```
    [SNIP]
    (I admit that I am cheating a bit, but I claim that this program
    is strictly conforming.)

    I agree that the program is strictly conforming.

    I don't know the details, but I think "monstartup" is a special name,
    and that the program would behave as expected if a different name
    were used. Since "monstartup" is not reserved, an implementation
    that visibly treats it specially is not conforming.

    That's why it's cheating: `monstartup` is a function called from
    the C runtime when using the `gprof` profiler, before `main` is
    called, and I just happen to know that the csu code will call a
    function by that name if compiled with profiling enabled. Thus,
    this program can tickle the UB in `monstartup` in some weird
    configurations. This is outside of the domain of strictly
    defined C, but it is the sort of thing that happens in the real
    world. Caveat emptor.
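
    One way to sidestep this kind of accidental clash is to give helper
    functions internal linkage, so that their names are not visible to
    the startup code and libraries the program is linked with. A minimal
    sketch, on the assumption (not verified for any particular
    toolchain) that the runtime finds `monstartup` through ordinary
    external linkage:

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdio.h>

    /* static: internal linkage, so this definition is not a candidate
       for references to monstartup made from other translation units. */
    static int monstartup(void)
    {
        return INT_MAX;   /* no overflow here */
    }

    int main(void)
    {
        assert(monstartup() == INT_MAX);
        puts("done");
        return 0;
    }
    ```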

    - Dan C.

  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Thu Jun 11 08:56:25 2026
    From Newsgroup: comp.lang.c

    On 10/06/2026 23:47, Keith Thompson wrote:

    Right. ("for (;;);" in the original program does not.)

    Note that the C++ special rule applies only when the condition is
    equivalent to a constant `true` and the body of the loop is empty.
    An implementation can "assume" that any other loop will eventually
    finish.

    The rule in C is (6.8.6.1p4):

    An iteration statement may be assumed by the implementation
    to terminate if its controlling expression is not a constant
    expression, and none of the following operations are performed
    in its body, controlling expression or (in the case of a for
    statement) its expression-3
    — input/output operations
    — accessing a volatile object
    — synchronization or atomic operations.

    `for (;;)` is treated as having a constant controlling expression.

    This covers more cases than the C++ rule.

    I dislike it for most of the same reasons. It should be phrased
    in terms of the permitted behavior of a program, not what an
    implementation is allowed to "assume".

    In addition to that, I dislike the whole idea. I think it's
    intended to enable optimizations, but it means that for this
    contrived program:

    #include <stdio.h>
    int main(void) {
    bool keep_going = true;
    while (keep_going) {
    keep_going = true;
    }
    puts("never reached");
    }

    the implementation is allowed to "assume" that the loop eventually
    terminates. It's not clear what permissions the implementation is being
    given if the assumption is violated. I think the program could legally
    print "never reached", but if violating the assumption implies undefined
    behavior it could do anything.

    A programmer could easily write a program similar to the above
    and think that the meaning is perfectly clear, only to have it behave
    very differently because of one obscure subclause in the standard.


    The idea of all this is given in a footnote in the C standards - "This
    is intended to allow compiler transformations such as removal of empty
    loops even when termination cannot be proven."

    The loop might originally have contained source code, but become empty
    through pre-processing, or from other compiler transformations (such as
    the compiler seeing that the "keep_going" variable is not volatile and
    its value is never used, so assignments to it can be elided, or moving
    other things outside the loop body).

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite. But is it likely? In my
    experience, infinite loops are generally very clearly written - either
    as "for (;;)" loops or "while (true)" loops - or they are the result of
    bugs in the code that accidentally run forever. If the loop is
    accidentally infinite, the programmer will already be expecting it to
    run the code after the loop.

    Equally, I don't think it is likely that compilers will often be able to
    use this rule to improve code generation - it would only help in a
    situation where the loop's controlling expression is too complicated for
    the compiler to be sure that it will terminate, but where the loop body
    ends up effectively empty. I doubt if that turns up often in real code
    either.

    So while I agree that this kind of thing can lead to curiosities and
    behaviour that seems counter-intuitive, and is popular with the "modern
    compilers are evil" crowd, I really do not see it as an issue in
    practice. There are many other mistakes programmers can make, or UB
    that they hit accidentally - this is a drop in the ocean IMHO.



  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Thu Jun 11 09:10:29 2026
    From Newsgroup: comp.lang.c

    On 10/06/2026 23:08, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    In C, an expression statement "expr;" causes the expression to be
    evaluated as a void expression for its side effects (§6.8.4p2). You
    can, arguably, say that C also requires all statements to be of "void"
    type, just like Pascal - but the cast-to-void is done implicitly to
    treat "expr;" as "(void) expr;".
    [...]

    In an expression statement, the expression is "evaluated as a void
    expression for its side effects". I think that's equivalent to
    converting (not casting!) it to void, but the standard doesn't describe
    it that way.

    Agreed (I also agree on the correction of terminology).


    6.3.2.2: "If an expression of any other type [other than void]
    is evaluated as a void expression, its value or designator is
    discarded."

    But statements have no type.


    Correct.

    I did not mean to suggest that statements in C actually have a type, and
    that their type is "void". It was a philosophical wandering - I was not
    trying to stay true to the grammar and terminology of either the C or
    Pascal language standards.

    What I meant was that if you were to think that statements /did/ have
    type void, the resulting language would be basically the same. It gives
    a way to think about C and Pascal that shows that though they appear to
    have a different model of statements and expressions, they are
    fundamentally similar - the distinction being that C has an explicit
    conversion to void when non-void expressions are used in a statement
    context.
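
    A small C illustration of the point under discussion: a call to a
    non-void function may stand alone as an expression statement, in
    which case its value is discarded, and an explicit `(void)` cast
    spells the same thing out. `bump` is a made-up function:

    ```c
    #include <assert.h>
    #include <stdio.h>

    static int calls;

    /* Has a side effect and also returns a value. */
    static int bump(void)
    {
        return ++calls;
    }

    int main(void)
    {
        int before = calls;

        bump();          /* expression statement: the int result is discarded */
        (void) bump();   /* explicit cast to void: same effect */

        assert(calls == before + 2);
        printf("bump was called %d times\n", calls - before);
        return 0;
    }
    ```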


  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Thu Jun 11 11:38:35 2026
    From Newsgroup: comp.lang.c

    In article <110dm6p$17r3s$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 10/06/2026 23:47, Keith Thompson wrote:

    Right. ("for (;;);" in the original program does not.)

    Note that the C++ special rule applies only when the condition is
    equivalent to a constant `true` and the body of the loop is empty.
    An implementation can "assume" that any other loop will eventually
    finish.

    The rule in C is (6.8.6.1p4):

    An iteration statement may be assumed by the implementation
    to terminate if its controlling expression is not a constant
    expression, and none of the following operations are performed
    in its body, controlling expression or (in the case of a for
    statement) its expression-3
    — input/output operations
    — accessing a volatile object
    — synchronization or atomic operations.

    `for (;;)` is treated as having a constant controlling expression.

    This covers more cases than the C++ rule.

    I dislike it for most of the same reasons. It should be phrased
    in terms of the permitted behavior of a program, not what an
    implementation is allowed to "assume".

    In addition to that, I dislike the whole idea. I think it's
    intended to enable optimizations, but it means that for this
    contrived program:

    #include <stdio.h>
    int main(void) {
    bool keep_going = true;
    while (keep_going) {
    keep_going = true;
    }
    puts("never reached");
    }

    the implementation is allowed to "assume" that the loop eventually
    terminates. It's not clear what permissions the implementation is being
    given if the assumption is violated. I think the program could legally
    print "never reached", but if violating the assumption implies undefined
    behavior it could do anything.

    A programmer could easily write a program similar to the above
    and think that the meaning is perfectly clear, only to have it behave
    very differently because of one obscure subclause in the standard.

    The idea of all this is given in a footnote in the C standards - "This
    is intended to allow compiler transformations such as removal of empty
    loops even when termination cannot be proven."

    The loop might originally have contained source code, but become empty
    through pre-processing, or from other compiler transformations (such as
    the compiler seeing that the "keep_going" variable is not volatile and
    its value is never used, so assignments to it can be elided, or moving
    other things outside the loop body).

    I suspect the original intent is as you said, to support removal
    of "dead" loops where the body has been optimized away, or
    excised using conditional compilation. Something like,

    #ifdef DEBUG
    #define DOTHING true
    #else
    #define DOTHING false
    #endif

    ...
    for (int i = 0; i < n; i++) {
        if (DOTHING) {
            // Something complex here...
        }
    }

    If `DEBUG` is not defined in the preprocessor, the compiler has
    license to elide the entire loop as part of dead code
    elimination.
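
    A self-contained version of the fragment above, as a sketch.
    `count_negatives` and its check are stand-ins for the "something
    complex". In this particular loop termination is easy to prove, so
    it shows the dead-code side of the argument rather than the case
    where termination cannot be proven:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    #ifdef DEBUG
    #define DOTHING true
    #else
    #define DOTHING false
    #endif

    /* Counts negative entries, but only in DEBUG builds. Without DEBUG
       the body is dead, and the whole loop can be removed. */
    static int count_negatives(const int *v, int n)
    {
        int bad = 0;

        for (int i = 0; i < n; i++) {
            if (DOTHING) {
                if (v[i] < 0)
                    bad++;
            }
        }
        return bad;
    }

    int main(void)
    {
        const int v[] = { 3, -1, 4, -1, 5 };
        int bad = count_negatives(v, 5);

        assert(bad == (DOTHING ? 2 : 0));
        printf("%d\n", bad);
        return 0;
    }
    ```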

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite. But is it likely? In my
    experience, infinite loops are generally very clearly written - either
    as "for (;;)" loops or "while (true)" loops - or they are the result of
    bugs in the code that accidentally run forever. If the loop is
    accidentally infinite, the programmer will already be expecting it to
    run the code after the loop.

    Equally, I don't think it is likely that compilers will often be able to
    use this rule to improve code generation - it would only help in a
    situation where the loop's controlling expression is too complicated for
    the compiler to be sure that it will terminate, but where the loop body
    ends up effectively empty. I doubt if that turns up often in real code
    either.

    So while I agree that this kind of thing can lead to curiosities and
    behaviour that seems counter-intuitive, and is popular with the "modern
    compilers are evil" crowd, I really do not see it as an issue in
    practice. There are many other mistakes programmers can make, or UB
    that they hit accidentally - this is a drop in the ocean IMHO.

    As I understand it, primarily by reading the C++ problem report,
    which covers both C and C++ for background, the idea is to
    guarantee forward progress for programs that make use of
    threads: consider cooperatively-scheduled green threads; a
    programmer who inadvertently creates an infinite loop shouldn't
    be able to starve all threads for access to the CPU.

    Personally, I don't think C should be in the business of doing
    such things. But it is what it is.

    - Dan C.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Thu Jun 11 11:50:04 2026
    From Newsgroup: comp.lang.c

    In article <110cre9$13aa9$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I see you did not read the other messages in the (sub)thread,
    but ok, here it is again, in C:

    ```
    term% cat what.c
    #include <stdio.h>
    int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    void hello(void) { printf("Hello, World!\n"); }
    term% clang --version | sed 1q
    clang version 22.1.6
    term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
    what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    | ^
    what.c:2:58: note: put the semicolon on a separate line to silence this warning
    1 warning generated.
    term% ./what
    Hello, World!
    term%
    ```

    I see the same behavior.

    The following largely repeats what I've written previously in
    this thread.

    Apparently the authors of clang decided that this statement in N3220
    6.8.6.p4:

    An iteration statement may be assumed by the implementation to
    terminate if its controlling expression is not a constant
    expression, ...

    means that a program that violates that assumption has undefined
    behavior. I intensely dislike both the rule and the way it's stated,
    but I agree that the conclusion that the behavior is undefined is
    a reasonable one.

    I think the behavior is technically "unspecified" in the sense of
    the C standard, but yes, this is the important bit. The
    controlling expression is not constant, and the loop doesn't meet
    any of the other criteria set forth in sec 6.8.6 para 4;
    therefore, the translator may assume it terminates (it is
    unspecified whether or not it does; either behavior is correct.
    GCC, for example, appears not to make the same assumption).

    Of course since the behavior is undefined, *anything* could happen.
    I don't know what happened inside clang (or the minds of its
    maintainers) that caused it to generate code that executes a
    statement in the body of a function that's never called, but that's
    just one of the infinitely many allowed behaviors. A quick look at the
    generated code indicates that there's no x86-64 "retq" instruction
    for either main() or hello(), and apparently control falls through
    from the end of main() to the body of hello(). That seems weird.

    Here's a slightly better version of `what.c` (that removes the
    annoying "loop has empty body, move the semicolon to the next line"
    warning):

    ```
    #include <stdio.h>
    int main(void) { unsigned int k = 0; while (k != 1) k += 2; return 0; }
    void hello(void) { printf("Hello, World!\n"); }
    ```

    I think the reasoning goes something like this: in optimization
    phase $n$, the compiler determines that `k` can never be 1, and
    thus the loop does not terminate, and therefore, `return 0;` is
    inaccessible, so it's removed. Then, in phase $n + k$, for
    $k > 0$, it applies the rules of sec 6.8.6 para 4, assumes that
    the loop must terminate, and therefore can be removed, and
    removes it. The `return` is already gone. So what you're left
    with is a label that just cascades into whatever is next in
    object code; that just happens to be `hello`.
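
    The first step of that reasoning can be checked by hand. The sketch
    below is not part of the test case above: the 8-bit counter and the
    name `reaches_one_u8` are assumptions, chosen so that the loop
    finishes quickly. A counter that starts at 0 and steps by 2 only
    visits even values, so it wraps back to 0 without ever being 1; the
    same parity argument holds for `unsigned int`.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical helper: walk an 8-bit counter from 0 in steps of 2 until
       it wraps back to 0, and report whether it was ever equal to 1. */
    static int reaches_one_u8(void)
    {
        uint8_t k = 0;
        do {
            if (k == 1)
                return 1;
            k = (uint8_t)(k + 2);   /* wraps modulo 256, like unsigned arithmetic */
        } while (k != 0);
        return 0;
    }

    int main(void)
    {
        /* Only even values are visited, so 1 is never seen. */
        assert(reaches_one_u8() == 0);
        printf("reaches 1: %d\n", reaches_one_u8());
        return 0;
    }
    ```

    Unlike the loop in `what.c`, this one terminates, because it stops
    when the counter wraps back to its starting value.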

    It might just be a bug (but not one that, as far as I can tell,
    violates the C standard).

    It's known. It was known when first reported a couple of years
    ago in the C++ context, and I suspect they know about it now. I
    can ask someone who works on LLVM. I suspect the reasoning will
    be that this is important to guarantee forward progress, and
    that they can't solve the halting problem, therefore such loops
    can be removed. If that causes your program to do something
    weird, then, well, don't do that.

    A function whose body contains a construct that would have undefined
    behavior if the function were called (not the case here) does not
    cause undefined behavior if there are no calls to the function.

    True, but irrelevant to the point I was making, which is that UB
    can induce a "call" to a function, even without a reference to
    it appearing in the source text.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Thu Jun 11 14:05:28 2026
    From Newsgroup: comp.lang.c

    On 11/06/2026 13:38, Dan Cross wrote:
    In article <110dm6p$17r3s$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 10/06/2026 23:47, Keith Thompson wrote:

    Right. ("for (;;);" in the original program does not.)

    Note that the C++ special rule applies only when the condition is
    equivalent to a constant `true` and the body of the loop is empty.
    An implementation can "assume" that any other loop will eventually
    finish.

    The rule in C is (6.8.6.1p4):

    An iteration statement may be assumed by the implementation
    to terminate if its controlling expression is not a constant
    expression, and none of the following operations are performed
    in its body, controlling expression or (in the case of a for
    statement) its expression-3
    — input/output operations
    — accessing a volatile object
    — synchronization or atomic operations.

    `for (;;)` is treated as having a constant controlling expression.

    This covers more cases than the C++ rule.

    I dislike it for most of the same reasons. It should be phrased
    in terms of the permitted behavior of a program, not what an
    implementation is allowed to "assume".

    In addition to that, I dislike the whole idea. I think it's
    intended to enable optimizations, but it means that for this
    contrived program:

    #include <stdio.h>
    int main(void) {
        bool keep_going = true;
        while (keep_going) {
            keep_going = true;
        }
        puts("never reached");
    }

    the implementation is allowed to "assume" that the loop eventually
    terminates. It's not clear what permissions the implementation is being
    given if the assumption is violated. I think the program could legally
    print "never reached", but if violating the assumption implies undefined
    behavior it could do anything.

    A programmer could easily write a program similar to the above
    and think that the meaning is perfectly clear, only to have it behave
    very differently because of one obscure subclause in the standard.

    The idea of all this is given in a footnote in the C standards - "This
    is intended to allow compiler transformations such as removal of empty
    loops even when termination cannot be proven."

    The loop might originally have contained source code, but become empty
    through pre-processing, or from other compiler transformations (such as
    the compiler seeing that the "keep_going" variable is not volatile and
    its value is never used, so assignments to it can be elided, or moving
    other things outside the loop body).

    I suspect the original intent is as you said, to support removal
    of "dead" loops where the body has been optimized away, or
    excised using conditional compilation. Something like,

    #ifdef DEBUG
    #define DOTHING true
    #else
    #define DOTHING false
    #endif

    ...
    for (int i = 0; i < n; i++) {
        if (DOTHING) {
            // Something complex here...
        }
    }

    If `DEBUG` is not defined in the preprocessor, the compiler has
    license to elide the entire loop as part of dead code
    elimination.


    I don't know about "original intent" - I was quoting a footnote in the C
    standard, but I have not done any research like reading through the
    rationale documents.

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite. But is it likely? In my
    experience, infinite loops are generally very clearly written - either
    as "for (;;)" loops or "while (true)" loops - or they are the result of
    bugs in the code that accidentally run forever. If the loop is
    accidentally infinite, the programmer will already be expecting it to
    run the code after the loop.

    Equally, I don't think it is likely that compilers will often be able to
    use this rule to improve code generation - it would only help in a
    situation where the loop's controlling expression is too complicated for
    the compiler to be sure that it will terminate, but where the loop body
    ends up effectively empty. I doubt if that turns up often in real code
    either.
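
    A sketch of the kind of loop meant here, using a hypothetical
    `collatz_steps` helper (the name and the choice of the Collatz
    iteration are assumptions for illustration): nobody has proven that
    the loop ends for every input, so a compiler cannot prove it either.
    If a caller ignored the count, the body would have no observable
    effect, and that is the case where the rule would let the loop be
    removed. Here the count is used, so the loop has to stay.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical example: count Collatz steps until n reaches 1.
       Precondition: n >= 1 (n == 0 would never reach 1). */
    static unsigned collatz_steps(unsigned long long n)
    {
        unsigned steps = 0;
        while (n != 1) {            /* not a constant expression */
            n = (n % 2 != 0) ? 3 * n + 1 : n / 2;
            steps++;
        }
        return steps;
    }

    int main(void)
    {
        /* 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 is eight steps. */
        assert(collatz_steps(6) == 8);
        printf("27 takes %u steps\n", collatz_steps(27));
        return 0;
    }
    ```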

    So while I agree that this kind of thing can lead to curiosities and
    behaviour that seems counter-intuitive, and is popular with the "modern
    compilers are evil" crowd, I really do not see it as an issue in
    practice. There are many other mistakes programmers can make, or UB
    that they hit accidentally - this is a drop in the ocean IMHO.

    As I understand it, primarily by reading the C++ problem report,
    which covers both C and C++ for background, the idea is to
    guarantee forward progress for programs that make use of
    threads: consider cooperatively-scheduled green threads; a
    programmer who inadvertently creates an infinite loop shouldn't
    be able to starve all threads for access to the CPU.

    Personally, I don't think C should be in the business of doing
    such things. But it is what it is.

    - Dan C.


    I agree there. It is up to programmers to write useful programs - I
    don't think it makes sense for a language standard to say that programs
    have to either do something observable, or get out of the way and don't
    block something else from being useful. But I have difficulty seeing
    that this rule in the C standards would make much real-world difference
    one way or the other.


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 16:49:09 2026
    From Newsgroup: comp.lang.c

    On 2026-06-09 03:25, Waldek Hebisch wrote:
    [...]

    Interesting views. - Thanks.


    I think biggest trouble is normal programmers. They already
    struggle with current standard text. More formal presentation
    could alienate even folks who now are able to explain standard
    rules to other programmers.

    I'm not sure what "normal programmers" are. From own experience
    I can just say that there's a difference between what's "formal"
    in a "lawyer's speeches and texts" sense and what's formal in a
    mathematical sense. - The C-Standard as had been quoted here is
    more of a lawyer's text, with its inherent property of not being
    formally (in a mathematical sense) accurate (despite their tries;
    in both areas, law and programming language, respectively). It's
    thus not necessarily a problem if we'd have a more [mathematical]
    formal standard. - Programmers, as I see it, need definite texts.
    And rejection of the "lawyer's" sort of texts is not surprising.
    That not necessarily affects their acceptance will of more formal specifications.

    Janis

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Thu Jun 11 15:20:01 2026
    From Newsgroup: comp.lang.c

    In article <110eht5$1naub$5@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 2026-06-09 03:25, Waldek Hebisch wrote:
    [...]

    Interesting views. - Thanks.


    I think biggest trouble is normal programmers. They already
    struggle with current standard text. More formal presentation
    could alienate even folks who now are able to explain standard
    rules to other programmers.

    I'm not sure what "normal programmers" are. From own experience
    I can just say that there's a difference between what's "formal"
    in a "lawyer's speeches and texts" sense and what's formal in a
    mathematical sense. - The C-Standard as had been quoted here is
    more of a lawyer's text, with its inherent property of not being
    formally (in a mathematical sense) accurate (despite their tries;
    in both areas, law and programming language, respectively). It's
    thus not necessarily a problem if we'd have a more [mathematical]
    formal standard. - Programmers, as I see it, need definite texts.
    And rejection of the "lawyer's" sort of texts is not surprising.
    That not necessarily affects their acceptance will of more formal
    specifications.

    One hopes that a formal specification (that's a term of art, and
    implies something that's mathematically precise) would be
    accompanied by a commentary for more casual reading. However,
    the truly precise, formal specification would be considered
    definitive.

    I think the odds of this ever happening for C are slim to none,
    but it would be useful.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 17:34:35 2026
    From Newsgroup: comp.lang.c

    On 2026-06-11 08:56, David Brown wrote:
    On 10/06/2026 23:47, Keith Thompson wrote:
    [...]

    #include <stdio.h>
    int main(void) {
         bool keep_going = true;
         while (keep_going) {
             keep_going = true;
         }
         puts("never reached");
    }

    [...]

    [...]

    The loop might originally have contained source code, but become empty through pre-processing, or from other compiler transformations (such as
    the compiler seeing that the "keep_going" variable is not volatile and
    its value is never used, so assignments to it can be elided, or moving
    other things outside the loop body).

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite.  But is it likely?

    I think we should not make any assumptions about the "creativity" of a programmer ("C" or else). - Semantics should be well defined, and then
    clear to the programmer.

    In my
    experience, infinite loops are generally very clearly written - either
    as "for (;;)" loops or "while (true)" loops - or they are the result of
    bugs in the code that accidentally run forever.  If the loop is accidentally infinite, the programmer will already be expecting it to
    run the code after the loop.

    [...]

    So while I agree that this kind of thing can lead to curiosities and
    behaviour that seems counter-intuitive, and is popular with the "modern
    compilers are evil" crowd, I really do not see it as an issue in
    practice.  There are many other mistakes programmers can make, or UB
    that they hit accidentally - this is a drop in the ocean IMHO.

    Languages shall be sensibly and clearly defined. For bad designs (or
    bad standards) the language or standard should be blamed, and not the
    critics badly and inappropriately despised as ''"modern compilers are
    evil" crowd''. - Programmers are at the final end of the "food chain".
    And there's a lot of horrible pits in the C-language where programmers
    "made the mistake" to fall in; don't blame them, neither the ones who
    silently suffer nor the ones who shout out.

    Janis

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 17:45:31 2026
    From Newsgroup: comp.lang.c

    On 2026-06-10 16:37, Dan Cross wrote:
    [...]
    Here's a C version with the same behavior:

    ```
    term% cat weird.c
    #include <stdio.h>

    int
    main(void)
    {
        for (unsigned int k = 0; k != 1; k += 2)
            ;
        return 0;
    }

    void
    hello(void)
    {
        printf("Hello, World!\n");
    }
    term% clang --version
    clang version 22.1.6
    Target: x86_64-pc-linux-gnu
    Thread model: posix
    InstalledDir: /usr/bin
    term% clang -Wall -pedantic -O1 -std=c23 -o weird weird.c
    term% ./weird
    Hello, World!
    term%
    ```

    Wow, that's really fascinating! (In a bad sense.)

    And (in clang) just an effect of the '-O1' (as I notice).

    I may have missed the "programming language design" wisdom of the
    past decades. Back then we had the conception that "optimization"
    is a method to transform a program to a _functionally equivalent_
    code (one that is faster, requires less memory, or some such).

    Janis

    [...]

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 18:08:39 2026
    From Newsgroup: comp.lang.c

    On 2026-06-11 17:20, Dan Cross wrote:
    In article <110eht5$1naub$5@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:


    I think biggest trouble is normal programmers. They already
    struggle with current standard text. More formal presentation
    could alienate even folks who now are able to explain standard
    rules to other programmers.

    I'm not sure what "normal programmers" are. From own experience
    I can just say that there's a difference between what's "formal"
    in a "lawyer's speeches and texts" sense and what's formal in a
    mathematical sense. - The C-Standard as had been quoted here is
    more of a lawyer's text, with its inherent property of not being
    formally (in a mathematical sense) accurate (despite their tries;
    in both areas, law and programming language, respectively). It's
    thus not necessarily a problem if we'd have a more [mathematical]
    formal standard. - Programmers, as I see it, need definite texts.
    And rejection of the "lawyer's" sort of texts is not surprising.
    That not necessarily affects their acceptance will of more formal
    specifications.

    One hopes that a formal specification (that's a term of art, and
    implies something that's mathematically precise) would be
    accompanied by a commentary for more casual reading.

    Commentaries generally make sense, and they are one possibility
    to serve the needs also of programmers. But a more formal text
    would also help the authors of textbooks to provide a clearer
    description for those programmers that are repelled by standards
    papers.

    However,
    the truly precise, formal specification would be considered
    definitive.

    Yes. (That's what I intended to express.)


    I think the odds of this ever happening for C are slim to none,
    but it would be useful.

    I agree. (And I don't wait for that; I'm taking "C" as it is.)

    Janis

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Thu Jun 11 16:30:45 2026
    From Newsgroup: comp.lang.c

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 2026-06-09 03:25, Waldek Hebisch wrote:
    [...]

    Interesting views. - Thanks.


    I think biggest trouble is normal programmers. They already
    struggle with current standard text. More formal presentation
    could alienate even folks who now are able to explain standard
    rules to other programmers.

    I'm not sure what "normal programmers" are. From own experience
    I can just say that there's a difference between what's "formal"
    in a "lawyer's speeches and texts" sense and what's formal in a
    mathematical sense. - The C-Standard as had been quoted here is
    more of a lawyer's text, with its inherent property of not being
    formally (in a mathematical sense) accurate (despite their tries;
    in both areas, law and programming language, respectively). It's
    thus not necessarily a problem if we'd have a more [mathematical]
    formal standard. - Programmers, as I see it, need definite texts.
    And rejection of the "lawyer's" sort of texts is not surprising.
    That not necessarily affects their acceptance will of more formal specifications.

    You snipped most of what I wrote. I certainly would prefer standard
    that is less lawyerish and more mathematical, say written in similar
    way to Pascal standard. But there is a _big_ gap between normal
    mathematical text and a formal mathematical text (and let me note that
    Pascal standard is less formal than normal mathematics). Normal
    mathematical text depends on human understanding to disambiguate
    and bridge small inconsistencies. Formal one has parts which
    are there only because authors were not able to avoid
    ambiguity in simpler way. And once things are written in a way
    that is well fit to formalism they tend to be much less
    understandable to uninitiated.
    --
    Waldek Hebisch
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 20:12:32 2026
    From Newsgroup: comp.lang.c

    On 2026-06-10 00:34, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    "42" is an expression of type "int", and so is 'printf("Hello\n")'.
    How (and why) would a language distinguish between them and allow one
    but not the other?
    [...]

    Ada, Pascal, and similar languages do exactly this, for what many
    people consider to be good reasons.

    Right.

    What I'm not sure about is the predominance of "these" or "those"
    languages. - Is that clear distinction of procedures and function
    the typical case, or are the "C-derived" languages predominant and
    languages with a clear distinction (meanwhile?) just outliers?

    There's of course also other languages that distinguish procedures
    from functions "only" by the 'void' "return type", but are anyway
    able to diagnose the appropriate context and emit error messages
    when inappropriately used.


    In both languages, functions and procedures are distinct. Functions
    return values; procedures do not. An expression cannot be turned
    into a statement just by adding a semicolon. A function call is
    an expression. A procedure call is a statement, not an expression.
    An assignment is a statement, not an expression.

    For I/O, the equivalent of printf is a procedure. In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored).

    Erm, I hope that above printf() call does not create an error, but
    returns the number of characters in the printed text. ;-)

    Janis

    [...]

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 20:29:16 2026
    From Newsgroup: comp.lang.c

    On 2026-06-10 09:04, David Brown wrote:
    [...]

    The rough equivalent of the distinction between Pascal procedures and
    functions is that procedures are like C functions that have "void"
    return type.  It's fine (and not at all a bad idea) for a language to
    distinguish between void and non-void like this.  What cannot easily be
    done in a clear and consistent way is to distinguish between two
    expressions of type "int" (or any other general non-void type).

    Here I cannot follow you. - The C-compiler can analyze code to do
    optimizations and even (as so often stated) "assume" things about
    the intent concerning UB and optimization but cannot value facts
    about types and context? - If so, then it sounds rather arbitrary.

    [...]

    It is also fine for a language to distinguish between "pure" functions
    and functions/procedures with side-effects and/or functions/procedures
    with observable behaviour.  (A "pure procedure" would not do anything.)

    By "would not do anything" you probably mean that it would not have side-effects on/with relatively global entities in the program?

    As far as I remember, Pascal does not make that distinction.

    Pascal functions and procedures can affect and be affected by global
    entities. Predefined functions and procedures can have side effects
    also unrelated to global entities in the program (e.g. print effect).
    A procedure/function not affecting the global (or surrounding stack)
    environment could likely be identified. But here we're anyway talking
    about the (clean!) return-interface of functions (as opposed to the
    procedures).

    Janis

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Thu Jun 11 20:52:30 2026
    From Newsgroup: comp.lang.c

    On 2026-06-11 18:30, Waldek Hebisch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 2026-06-09 03:25, Waldek Hebisch wrote:
    [...]

    Interesting views. - Thanks.


    I think biggest trouble is normal programmers. They already
    struggle with current standard text. More formal presentation
    could alienate even folks who now are able to explain standard
    rules to other programmers.

    I'm not sure what "normal programmers" are. From own experience
    I can just say that there's a difference between what's "formal"
    in a "lawyer's speeches and texts" sense and what's formal in a
    mathematical sense. - The C-Standard as had been quoted here is
    more of a lawyer's text, with its inherent property of not being
    formally (in a mathematical sense) accurate (despite their tries;
    in both areas, law and programming language, respectively). It's
    thus not necessarily a problem if we'd have a more [mathematical]
    formal standard. - Programmers, as I see it, need definite texts.
    And rejection of the "lawyer's" sort of texts is not surprising.
    That not necessarily affects their acceptance will of more formal
    specifications.

    You snipped most of what I wrote.

    Yes, because I acknowledged it by my above one-line remark already
    (and I didn't want to waste space unnecessarily). (No offense!)

    I intended to comment just on the one paragraph above, with its
    assumption that it may be an inherent problem to programmers.

    To elaborate only a bit more...
    There's folks who have problems with "lawyer's speech" standards.
    There's folks who have problems with formal mathematical standards.
    But, as to my observation, there's *no* strict or natural hierarchy
    that one would imply the other.

    You said: "They already struggle with current standard text."
    as if there would be a strict "one implies the other" fact; there
    isn't one, or to be more cautious, "there isn't necessarily one".
    (I used the wording "necessarily" already in my original comment.)

    I certainly would prefer standard
    that is less lawyerish and more mathematical, say written in similar
    way to Pascal standard. But there is a _big_ gap between normal
    mathematical text and a formal mathematical text (and let me note that
    Pascal standard is less formal than normal mathematics).

    I agree.

    Normal
    mathematical text depends on human understanding to disambiguate
    and bridge small inconsistencies. Formal one has parts which
    are there only because authors were not able to avoid
    ambiguity in simpler way. And once things are written in a way
    that is well fit to formalism they tend to be much less
    understandable to uninitiated.

    (I'll leave that uncommented. - I've said all I intended to say.)

    Janis

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Thu Jun 11 15:13:09 2026
    From Newsgroup: comp.lang.c

    On 2026-06-11 14:12, Janis Papanagnou wrote:
    On 2026-06-10 00:34, Keith Thompson wrote:
    ...
    For I/O, the equivalent of printf is a procedure. In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored).

    Erm, I hope that above printf() call does not create an error, but
    returns the number of characters in the printed text. ;-)

    Hope is nice. I hope, in particular, that you're aware that there are
    no guarantees on that matter?
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 13:29:10 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    The idea of all this is given in a footnote in the C standards - "This
    is intended to allow compiler transformations such as removal of empty
    loops even when termination cannot be proven."

    The loop might originally have contained source code, but become empty through pre-processing, or from other compiler transformations (such
    as the compiler seeing that the "keep_going" variable is not volatile
    and its value is never used, so assignments to it can be elided, or
    moving other things outside the loop body).

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite. But is it likely? In my
    experience, infinite loops are generally very clearly written - either
    as "for (;;)" loops or "while (true)" loops - or they are the result
    of bugs in the code that accidentally run forever. If the loop is accidentally infinite, the programmer will already be expecting it to
    run the code after the loop.

    How about a loop that has a non-constant condition, but that is
    not expected to terminate in normal usage?

    while (! something_really_bad_happened()) {
        sleep(1);
    }
    self_destruct();

    A compiler could "assume" that the loop terminates, even if
    something_really_bad never happens, and that assumption could result in
    a call to self_destruct(). There are probably better ways to do that,
    but it's straightforward code with seemingly obvious semantics that
    an implementation is permitted to make unwarranted assumptions about.
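
    One of the "better ways" might look like the sketch below. It is
    only an illustration under stated assumptions: the flag
    `something_really_bad`, the helper `wait_for_trouble`, and the
    self-raised flag (so that the demo returns) are all made up, and
    `<stdatomic.h>` is an optional part of the standard. The point is
    that the controlling expression performs an atomic operation, which
    the rule quoted earlier in the thread lists as one of the things
    that stops an implementation from assuming termination.

    ```c
    #include <assert.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical flag; in real code another thread would set it. */
    static atomic_bool something_really_bad = false;

    /* Polls the flag and returns the number of polls that saw it clear.
       For this demo the function clears the flag on entry and raises it
       itself after max_polls polls (max_polls must be at least 1). */
    static unsigned wait_for_trouble(unsigned max_polls)
    {
        unsigned polls = 0;
        atomic_store(&something_really_bad, false);
        while (!atomic_load(&something_really_bad)) {   /* atomic operation */
            if (++polls == max_polls)
                atomic_store(&something_really_bad, true);
        }
        return polls;
    }

    int main(void)
    {
        assert(wait_for_trouble(3) == 3);
        puts("flag observed; the code after the loop runs only now");
        return 0;
    }
    ```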

    [...]
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Jun 12 00:37:03 2026
    From Newsgroup: comp.lang.c

    On 2026-06-11 21:13, James Kuyper wrote:
    On 2026-06-11 14:12, Janis Papanagnou wrote:
    On 2026-06-10 00:34, Keith Thompson wrote:
    ...
    For I/O, the equivalent of printf is a procedure. In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored).

    Erm, I hope that above printf() call does not create an error, but
    returns the number of characters in the printed text. ;-)

    Hope is nice. I hope, in particular, that you're aware that there are
    no guarantees on that matter?

    Oh, actually I indeed thought that printing a constant string would not
    create any error that would then be indicated by printf's return value.

    I'd indeed also expected that, say, printing a string value with a '%d'
    specifier would produce an error, but I saw that it doesn't; while the
    compiler creates just a warning, execution provides some random output
    and a _non-negative_ string-length value as printf's return value. Not
    exactly what I'd expect from a language.

    Concerning the "guarantees" that you're asking for I sadly have to say
    that I meanwhile expect nothing sensible at all any more from "C". ;-)

    But to be more serious again...

    The man-page is very unspecific on that; 'man 3 printf' says:
    "If an output error is encountered, a negative value is returned."

    Now of course an error can occur with that simple 'printf' above, for
    example, by issuing an 'fclose (stdout);' before the 'printf (...);'
    But what can I as a C-programmer derive from that; how would one act
    on that. (That's just rhetorical.)

    Obviously (because of that?) I've never seen anyone test such a call
    by, say,

    int rc = printf("Hello, world\n");
    if (rc < 0) {
        /* umm.. */
    }

    Are you - plural, all CLC audience - writing such code with 'printf()', honestly? - Same question with 'int rc = fclose (...);' - what can one
    do about that, then? (Write a logfile entry, maybe? - and then?)
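
    For what it's worth, one possible answer to "and then?" is sketched
    below. The helper name `print_checked` and the policy (flush, report
    on stderr, exit with a failure status) are assumptions, not an
    established convention. It also shows the non-error case: the return
    value is the number of characters written.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helper: print s to out and flush; return the number of
       characters written, or -1 if printing or flushing reported an error. */
    static int print_checked(FILE *out, const char *s)
    {
        int rc = fprintf(out, "%s", s);
        if (rc < 0)
            return -1;
        if (fflush(out) == EOF)
            return -1;
        return rc;
    }

    int main(void)
    {
        int rc = print_checked(stdout, "Hello, world\n");
        if (rc < 0) {
            /* stdout is unusable; report on stderr and fail the program. */
            fputs("error: cannot write to stdout\n", stderr);
            return EXIT_FAILURE;
        }
        assert(rc == 13);   /* 13 characters, including the newline */
        return EXIT_SUCCESS;
    }
    ```

    The flush matters because a buffered stream may accept the text and
    only report the failure later, when the buffer is actually written.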

    But yes, I'm aware of negative OS function or library function output.

    Our rules (back in my C/C++ days) suggested to catch any sensible and
    possible error indications to quickly localize any potential issues.

    Janis

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 15:38:41 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I suspect the original intent is as you said, to support removal
    of "dead" loops where the body has been optimized away, or
    excised using conditional compilation. Something like,

    #ifdef DEBUG
    #define DOTHING true
    #else
    #define DOTHING false
    #endif

    ...
    for (int i = 0; i < n; i++) {
        if (DOTHING) {
            // Something complex here...
        }
    }

    If `DEBUG` is not defined in the preprocessor, the compiler has
    license to elide the entire loop as part of dead code
    elimination.

    I think I see what you mean, but in this particular case the loop
    can be proven to terminate unless `i` is modified in the body of
    the loop, and a compiler can elide the entire loop anyway.

    [...]

    As I understand it, primarily by reading the C++ problem report,
    which covers both C and C++ for background, the idea is to
    guarantee forward progress for programs that make use of
    threads: consider cooperatively-scheduled green threads; a
    programmer who inadvertently creates an infinite loop shouldn't
    be able to starve all threads for access to the CPU.

    Personally, I don't think C should be in the business of doing
    such things. But it is what it is.

    I agree.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Thu Jun 11 23:05:17 2026
    From Newsgroup: comp.lang.c

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 2026-06-11 21:13, James Kuyper wrote:
    On 2026-06-11 14:12, Janis Papanagnou wrote:
    On 2026-06-10 00:34, Keith Thompson wrote:
    ...
    For I/O, the equivalent of printf is a procedure. In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored).

    Erm, I hope that above printf() call does not create an error, but
    returns the number of characters in the printed text. ;-)

    Hope is nice. I hope, in particular, that you're aware that there are
    no guarantees on that matter?

    Oh, actually I indeed thought that printing a constant string would not
    create any error that would then be indicated by printf's return value.

    The manual page also notes for the cases where printf returns -1:

    For the conditions under which [CX] [Option Start] dprintf(), [Option End] fprintf(),
    and printf() fail and may fail, refer to fputc() or fputwc().

    In addition, all forms of fprintf() shall fail if:

    [EILSEQ]
    [CX] [Option Start] A wide-character code that does not correspond to a valid character has been detected. [Option End]
    [EOVERFLOW]
    [CX] [Option Start] The value to be returned is greater than {INT_MAX}. [Option End]

    [CX] [Option Start] The asprintf() function shall fail if:

    [ENOMEM]
    Insufficient storage space is available.

    The dprintf() function may fail if:

    [EBADF]
    The fildes argument is not a valid file descriptor.

    [Option End]

    The [CX] [Option Start] dprintf(), [Option End] fprintf(), and printf() functions may fail if:

    [ENOMEM]
    [CX] [Option Start] Insufficient storage space is available. [Option End]

    The fputc(3) errors:

    ERRORS

    The fputc() function shall fail if either the stream is unbuffered or the stream's buffer needs to be flushed, and:

    [EAGAIN]
    [CX] [Option Start] The O_NONBLOCK flag is set for the file descriptor underlying stream and the thread would be delayed in the write operation. [Option End]
    [EBADF]
    [CX] [Option Start] The file descriptor underlying stream is not a valid file descriptor open for writing. [Option End]
    [EFBIG]
    [CX] [Option Start] An attempt was made to write to a file that exceeds the maximum file size. [Option End]
    [EFBIG]
    [CX] [Option Start] An attempt was made to write to a file that exceeds the file size limit of the process.
    [Option End] [XSI] [Option Start] A SIGXFSZ signal shall also be generated for the thread. [Option End]
    [EFBIG]
    [CX] [Option Start] The file is a regular file and an attempt was made to write at or beyond the offset maximum. [Option End]
    [EINTR]
    [CX] [Option Start] The write operation was terminated due to the receipt of a signal, and no data was transferred. [Option End]
    [EIO]
    [CX] [Option Start] A physical I/O error has occurred, or the process is a member of a background process group attempting to write to its controlling terminal, TOSTOP is set, the calling thread is not blocking SIGTTOU, the process is not ignoring SIGTTOU, and the process group of the process is orphaned. This error may also be returned under implementation-defined conditions. [Option End]
    [ENOSPC]
    [CX] [Option Start] There was no free space remaining on the device containing the file. [Option End]
    [EPIPE]
    [CX] [Option Start] An attempt is made to write to a pipe or FIFO that is not open for reading by any process. A SIGPIPE signal shall also be sent to the thread. [Option End]


    The fputc() function may fail if:

    [ENOMEM]
    [CX] [Option Start] Insufficient storage space is available. [Option End]
    [ENXIO]
    [CX] [Option Start] A request was made of a nonexistent device, or the request was outside the capabilities of the device. [Option End]


    The '[CX] [Option Start]' ... '[Option End]' tags mark functionality that is an extension to the ISO C standard.

    https://pubs.opengroup.org/onlinepubs/9799919799/functions/fputc.html
    https://pubs.opengroup.org/onlinepubs/9799919799/functions/printf.html
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Thu Jun 11 23:07:00 2026
    From Newsgroup: comp.lang.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I suspect the original intent is as you said, to support removal
    of "dead" loops where the body has been optimized away, or
    excised using conditional compilation. Something like,

    #ifdef DEBUG
    #define DOTHING true
    #else
    #define DOTHING false
    #endif

    ...
    for (int i = 0; i < n; i++) {
    if (DOTHING) {
    // Something complex here...
    }
    }

    If `DEBUG` is not defined in the preprocessor, the compiler has
    license to elide the entire loop as part of dead code
    elimination.

    I think I see what you mean, but in this particular case the loop
    can be proven to terminate unless `i` is modified in the body of

    ...unless 'i' or 'n' is modified in the body of
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Fri Jun 12 01:18:17 2026
    From Newsgroup: comp.lang.c

    On 2026-06-12 01:05, Scott Lurndal wrote:

    The manual page also notes for the cases where printf returns -1:

    The man page on my Linux doesn't. :-(

    [snip error list]

    Thanks for the error list.

    [snip opengroup-links]

    Yeah, and these links; always useful to look up these resources.

    Janis

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 16:28:47 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <110cre9$13aa9$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I see you did not read the other messages in the (sub)thread,
    but ok, here it is again, in C:

    ```
    term% cat what.c
    #include <stdio.h>
    int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    void hello(void) { printf("Hello, World!\n"); }
    term% clang --version | sed 1q
    clang version 22.1.6
    term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
    what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    | ^
    what.c:2:58: note: put the semicolon on a separate line to silence this warning
    1 warning generated.
    term% ./what
    Hello, World!
    term%
    ```

    I see the same behavior.

    The following largely repeats what I've written previously in
    this thread.

    Apparently the authors of clang decided that this statement in N3220
    6.8.6.p4:

    An iteration statement may be assumed by the implementation to
    terminate if its controlling expression is not a constant
    expression, ...

    means that a program that violates that assumption has undefined
    behavior. I intensely dislike both the rule and the way it's stated,
    but I agree that the conclusion that the behavior is undefined is
    a reasonable one.

    I think the behavior is technically "unspecified" in the sense of
    the C standard, but yes, this is the important bit. The
    controlling expression is not constant, and the loop doesn't meet
    any of the other criteria set forth in sec 6.8.6 para 4;
    therefore, the translator may assume it terminates (it is
    unspecified whether or not it does; either behavior is correct.
    GCC, for example, appears not to make the same assumption).

    Why do you think the behavior is unspecified rather than undefined?

    Unspecified behavior is defined as: "behavior, that results from
    the use of an unspecified value, or other behavior upon which
    this document provides two or more possibilities and imposes
    no further requirements on which is chosen in any instance".
    (Implementation-defined behavior differs from unspecified behavior
    in that the implementation must document how the choice is made.)

    What are the "two or more possibilities" in this case?

    [SNIP]
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Thu Jun 11 23:46:14 2026
    From Newsgroup: comp.lang.c

    In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <110cre9$13aa9$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I see you did not read the other messages in the (sub)thread,
    but ok, here it is again, in C:

    ```
    term% cat what.c
    #include <stdio.h>
    int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    void hello(void) { printf("Hello, World!\n"); }
    term% clang --version | sed 1q
    clang version 22.1.6
    term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
    what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    | ^
    what.c:2:58: note: put the semicolon on a separate line to silence this warning
    1 warning generated.
    term% ./what
    Hello, World!
    term%
    ```

    I see the same behavior.

    The following largely repeats what I've written previously in
    this thread.

    Apparently the authors of clang decided that this statement in N3220
    6.8.6.p4:

    An iteration statement may be assumed by the implementation to
    terminate if its controlling expression is not a constant
    expression, ...

    means that a program that violates that assumption has undefined
    behavior. I intensely dislike both the rule and the way it's stated,
    but I agree that the conclusion that the behavior is undefined is
    a reasonable one.

    I think the behavior is technically "unspecified" in the sense of
    the C standard, but yes, this is the important bit. The
    controlling expression is not constant, and the loop doesn't meet
    any of the other criteria set forth in sec 6.8.6 para 4;
    therefore, the translator may assume it terminates (it is
    unspecified whether or not it does; either behavior is correct.
    GCC, for example, appears not to make the same assumption).

    Why do you think the behavior is unspecified rather than undefined?

    Unspecified behavior is defined as: "behavior, that results from
    the use of an unspecified value, or other behavior upon which
    this document provides two or more possibilities and imposes
    no further requirements on which is chosen in any instance".
    (Implementation-defined behavior differs from unspecified behavior
    in that the implementation must document how the choice is made.)

    What are the "two or more possibilities" in this case?

    The two choices are that the implementation may assume the loop
    terminates, or it may not, but it doesn't say which. I don't
    think that the language permits it to be UB. But I could be
    wrong. It's a bit of a distinction without a difference as far
    as the outcome is concerned.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 17:41:38 2026
    From Newsgroup: comp.lang.c

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 2026-06-11 21:13, James Kuyper wrote:
    On 2026-06-11 14:12, Janis Papanagnou wrote:
    On 2026-06-10 00:34, Keith Thompson wrote:
    ...
    For I/O, the equivalent of printf is a procedure. In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored).

    Erm, I hope that above printf() call does not create an error, but
    returns the number of characters in the printed text. ;-)

    Hope is nice. I hope, in particular, that you're aware that there are
    no guarantees on that matter?

    Oh, actually I indeed thought that printing a constant string would not
    create any error that would then be indicated by printf's return value.

    Linux has a device called "/dev/full". It acts like it has no data
    on input, and like it's full on output. You can redirect a program's
    stdout to /dev/full. It's useful for testing, and much easier than
    finding a writable filesystem with no remaining space. (/dev/null
    accepts and discards as much input as you send to it.)

    On my system, a small write to /dev/full will typically succeed, since
    the output is buffered rather than being immediately sent to the
    file. It fails with ENOSPC after about 4 kbytes.

    If I use fopen() to open /dev/full, then write to it, then fclose()
    it, the fclose() fails. Since files are implicitly closed when
    main() finishes, this is likely to go undetected.
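
    A minimal sketch of that effect, assuming a Linux system that
    provides /dev/full; the helper name `write_text_checked` is made up
    for illustration. Because a small write may only land in the stdio
    buffer, the failure can first show up at fclose(), so its result is
    checked along with that of fputs():

    ```c
    #include <stdio.h>

    /* Hypothetical helper: write `text` to `path`.  Returns 0 on success
       and -1 if opening, writing, or closing the file fails. */
    int write_text_checked(const char *path, const char *text)
    {
        FILE *fp = fopen(path, "w");
        if (fp == NULL)
            return -1;

        /* With a buffered stream this may "succeed" even on a full device. */
        int write_failed = (fputs(text, fp) == EOF);

        /* fclose() flushes the buffer; a deferred write error surfaces here. */
        int close_failed = (fclose(fp) == EOF);

        return (write_failed || close_failed) ? -1 : 0;
    }
    ```

    Calling write_text_checked("/dev/full", "Hello, world\n") would be
    expected to report failure on such a system, even though the fputs()
    call by itself looks fine.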

    A common pattern is "some_program > some_file", which redirects
    stdout to a file but leaves stderr going to the default (typically
    the tty).

    I'd indeed also expected that, say, printing a string value with a '%d'
    specifier would produce an error, but I saw that it doesn't; while the
    compiler creates just a warning, execution provides some random output
    and a _non-negative_ string-length value as printf's return value. Not
    exactly what I'd expect from a language.

    Calling printf with a mismatch between the format string and
    an argument has undefined behavior. Some compilers will warn
    about this in most cases, but in general the format string is not
    necessarily known at compile time. No diagnostic or other error
    indication is required.

    Concerning the "guarantees" that you're asking for I sadly have to say
    that I meanwhile expect nothing sensible at all any more from "C". ;-)

    But to be more serious again...

    The man-page is very unspecific on that; 'man 3 printf' says:
    "If an output error is encountered, a negative value is returned."

    Now of course an error can occur with that simple 'printf' above, for
    example, by issuing an 'fclose (stdout);' before the 'printf (...);'
    But what can I as a C-programmer derive from that; how would one act
    on that. (That's just rhetorical.)

    Obviously (because of that?) I've never seen anyone test such a call
    by, say,

    int rc = printf("Hello, world\n");
    if (rc < 0) {
    /* umm.. */
    }

    Quick-and-dirty programs like the classic "hello, world" often don't
    bother to check. The above could print an error message to stderr and
    call exit(EXIT_FAILURE). Even if stdout and stderr both produce errors,
    the caller should be able to detect the error status. (I've configured
    my shell to print a message when a program dies with an error status.)

    But most production programs don't just blindly print stuff to stdout.

    For example, GNU coreutils "cat" and "echo" both print "write error:
    No space left on device" on stderr and exit with a status of 1 when
    output is redirected to /dev/full -- if the output is big enough.
    I haven't checked the source, but they must be explicitly checking
    the result of both whatever output routine(s) they use and the
    fclose(), or perhaps doing some fancy system-specific stuff that
    has the same effect.

    Are you - plural, all CLC audience - writing such code with 'printf()',
    honestly? - Same question with 'int rc = fclose (...);' - what can one
    do about that, then? (Write a logfile entry, maybe? - and then?)

    Write the error message to stderr, optionally log it somewhere,
    and exit with an error code.
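
    As a sketch of that pattern (the helper name `emit_line` is
    hypothetical, and the flush is included on the assumption that the
    caller wants to know the data left the buffer):

    ```c
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helper: print one line to `out` and report whether it
       really got there.  Returns EXIT_SUCCESS or EXIT_FAILURE. */
    int emit_line(FILE *out, const char *msg)
    {
        /* fprintf returns a negative value on an output error; fflush
           catches errors that buffering would otherwise defer. */
        if (fprintf(out, "%s\n", msg) < 0 || fflush(out) == EOF) {
            /* stderr may be broken too, so the return value is what the
               caller should rely on. */
            fprintf(stderr, "write error\n");
            return EXIT_FAILURE;
        }
        return EXIT_SUCCESS;
    }
    ```

    A main() could then end with `return emit_line(stdout, "Hello, world");`
    so that the invoking shell sees a nonzero status on failure.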

    But yes, I'm aware of negative OS function or library function output.

    Our rules (back in my C/C++ days) suggested to catch any sensible and
    possible error indications to quickly localize any potential issues.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Thu Jun 11 20:41:49 2026
    From Newsgroup: comp.lang.c

    On 2026-06-11 18:37, Janis Papanagnou wrote:
    On 2026-06-11 21:13, James Kuyper wrote:
    On 2026-06-11 14:12, Janis Papanagnou wrote:
    On 2026-06-10 00:34, Keith Thompson wrote:
    ...
    For I/O, the equivalent of printf is a procedure. In C,
    printf("Hello, world\n") returns a negative result to denote an
    error (and that value is often ignored).

    Erm, I hope that above printf() call does not create an error, but
    returns the number of characters in the printed text. ;-)

    Hope is nice. I hope, in particular, that you're aware that there are
    no guarantees on that matter?

    Oh, actually I indeed thought that printing a constant string would not
    create any error that would then be indicated by printf's return value.

    Every I/O function has a way of reporting failure, because every one is
    capable of failing. That's because, if nothing else, hardware problems
    could prevent I/O from happening. How much attention you need to pay to
    that possibility depends upon the context.

    I'd indeed also expected that, say, printing a string value with a '%d'
    specifier would produce an error, but I saw that it doesn't; while the
    compiler creates just a warning, execution provides some random output
    and a _non-negative_ string-length value as printf's return value. Not
    exactly what I'd expect from a language.

    On some systems I've used, it would try to interpret the pointer to the
    string as an int, and print the result. On others, it would expect the
    int to be stored in one register, whereas the pointer was stored in a
    different register, and as a result it would print whatever value was
    last stored in the first register. These were natural outcomes for those
    implementations; had the C standard imposed any conflicting requirements
    on the behavior, it would have complicated those implementations.

    ...
    Now of course an error can occur with that simple 'printf' above, for
    example, by issuing an 'fclose (stdout);' before the 'printf (...);'
    But what can I as a C-programmer derive from that; how would one act
    on that. (That's just rhetorical.)

    Obviously (because of that?) I've never seen anyone test such a call
    by, say,

    int rc = printf("Hello, world\n");
    if (rc < 0) {
    /* umm.. */
    }

    Are you - plural, all CLC audience - writing such code with 'printf()',
    honestly? - Same question with 'int rc = fclose (...);' - what can one
    do about that, then? (Write a logfile entry, maybe? - and then?)

    For most of the programs I ever wrote, a single check for ferror(file)
    at the end of the program, resulting in exit(EXIT_FAILURE) being called,
    would be acceptable. That approach relies on the fact that the error
    flag is sticky. Because I made a habit of such checks, we caught a
    problem when a disk overflowed before we'd wasted hours "writing" data
    to nowhere. If I had sent a message to a log file, it would have been
    blocked by the same problem, which is why I used the exit status to
    report the problem.
    But I was never involved in writing interactive programs, where I
    suspect that would not be acceptable.
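
    A sketch of such an end-of-run check, under the assumption that
    flushing first is acceptable; the name `stream_had_error` is
    hypothetical. It relies on the error indicator being sticky:

    ```c
    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical helper: true if any earlier write on `fp` failed, or
       if flushing the data still sitting in the buffer fails now. */
    bool stream_had_error(FILE *fp)
    {
        bool flush_failed = (fflush(fp) == EOF);

        /* The error indicator stays set until clearerr() or similar, so
           one check at the end covers the writes made before it. */
        return flush_failed || ferror(fp) != 0;
    }
    ```

    At the end of main(), something like
    `if (stream_had_error(stdout)) exit(EXIT_FAILURE);` (with <stdlib.h>
    included) would match the approach described above.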
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 17:43:52 2026
    From Newsgroup: comp.lang.c

    scott@slp53.sl.home (Scott Lurndal) writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    I think I see what you mean, but in this particular case the loop
    can be proven to terminate unless `i` is modified in the body of

    ...unless 'i' or 'n' is modified in the body of

    Touché.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Thu Jun 11 18:29:54 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <110cre9$13aa9$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I see you did not read the other messages in the (sub)thread,
    but ok, here it is again, in C:

    ```
    term% cat what.c
    #include <stdio.h>
    int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    void hello(void) { printf("Hello, World!\n"); }
    term% clang --version | sed 1q
    clang version 22.1.6
    term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
    what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    | ^
    what.c:2:58: note: put the semicolon on a separate line to silence this warning
    1 warning generated.
    term% ./what
    Hello, World!
    term%
    ```

    I see the same behavior.

    The following largely repeats what I've written previously in
    this thread.

    Apparently the authors of clang decided that this statement in N3220
    6.8.6.p4:

    An iteration statement may be assumed by the implementation to
    terminate if its controlling expression is not a constant
    expression, ...

    means that a program that violates that assumption has undefined
    behavior. I intensely dislike both the rule and the way it's stated,
    but I agree that the conclusion that the behavior is undefined is
    a reasonable one.

    I think the behavior is technically "unspecified" in the sense of
    the C standard, but yes, this is the important bit. The
    controlling expression is not constant, and the loop doesn't meet
    any of the other criteria set forth in sec 6.8.6 para 4;
    therefore, the translator may assume it terminates (it is
    unspecified whether or not it does; either behavior is correct.
    GCC, for example, appears not to make the same assumption).

    Why do you think the behavior is unspecified rather than undefined?

    Unspecified behavior is defined as: "behavior, that results from
    the use of an unspecified value, or other behavior upon which
    this document provides two or more possibilities and imposes
    no further requirements on which is chosen in any instance".
    (Implementation-defined behavior differs from unspecified behavior
    in that the implementation must document how the choice is made.)

    What are the "two or more possibilities" in this case?

    The two choices are that the implementation may assume the loop
    terminates, or it may not, but it doesn't say which. I don't
    think that the language permits it to be UB. But I could be
    wrong. It's a bit of a distinction without a difference as far
    as the outcome is concerned.

    No, those are not the two choices. An assumption made by an
    implementation is not behavior ("external appearance or action").
    An implementation might invoke some behavior as a result of some
    assumption.

    If a loop doesn't terminate and the implementation assumes that
    it does, the standard says nothing about the resulting behavior.
    It doesn't provide two or more options for the actual behavior.
    That's classic UB.

    We've seen cases here where the actual behavior is falling through
    into a function that's never called. That's certainly not a
    possibility provided by the standard.
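
    To make the distinction concrete, here is a sketch based on my
    reading of the rule (the function names are hypothetical). The first
    loop has a non-constant controlling expression and performs no I/O,
    volatile access, atomic operation or synchronization, so an
    implementation may assume it terminates. The second reads a volatile
    object in its controlling expression, so the assumption does not
    apply to it:

    ```c
    #include <stdbool.h>

    /* May be assumed to terminate.  For an odd `target` it never does,
       which is the case under discussion. */
    unsigned steps_to(unsigned target)
    {
        unsigned steps = 0;
        for (unsigned k = 0; k != target; k += 2)
            steps++;
        return steps;
    }

    /* Accesses a volatile object in the controlling expression, so the
       implementation may not assume termination; it has to keep
       re-reading the flag. */
    void wait_until_set(volatile bool *flag)
    {
        while (!*flag) {
            /* spin */
        }
    }
    ```

    A loop whose controlling expression is a constant expression, such
    as `for (;;)` or `while (true)`, is likewise outside the assumption.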
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Fri Jun 12 01:54:09 2026
    From Newsgroup: comp.lang.c

    In article <110fnem$1s3nm$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <110cre9$13aa9$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I see you did not read the other messages in the (sub)thread,
    but ok, here it is again, in C:

    ```
    term% cat what.c
    #include <stdio.h>
    int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    void hello(void) { printf("Hello, World!\n"); }
    term% clang --version | sed 1q
    clang version 22.1.6
    term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
    what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    | ^
    what.c:2:58: note: put the semicolon on a separate line to silence this warning
    1 warning generated.
    term% ./what
    Hello, World!
    term%
    ```

    I see the same behavior.

    The following largely repeats what I've written previously in
    this thread.

    Apparently the authors of clang decided that this statement in N3220
    6.8.6.p4:

    An iteration statement may be assumed by the implementation to
    terminate if its controlling expression is not a constant
    expression, ...

    means that a program that violates that assumption has undefined
    behavior. I intensely dislike both the rule and the way it's stated,
    but I agree that the conclusion that the behavior is undefined is
    a reasonable one.

    I think the behavior is technically "unspecified" in the sense of
    the C standard, but yes, this is the important bit. The
    controlling expression is not constant, and the loop doesn't meet
    any of the other criteria set forth in sec 6.8.6 para 4;
    therefore, the translator may assume it terminates (it is
    unspecified whether or not it does; either behavior is correct.
    GCC, for example, appears not to make the same assumption).

    Why do you think the behavior is unspecified rather than undefined?

    Unspecified behavior is defined as: "behavior, that results from
    the use of an unspecified value, or other behavior upon which
    this document provides two or more possibilities and imposes
    no further requirements on which is chosen in any instance".
    (Implementation-defined behavior differs from unspecified behavior
    in that the implementation must document how the choice is made.)

    What are the "two or more possibilities" in this case?

    The two choices are that the implementation may assume the loop
    terminates, or it may not, but it doesn't say which. I don't
    think that the language permits it to be UB. But I could be
    wrong. It's a bit of a distinction without a difference as far
    as the outcome is concerned.

    No, those are not the two choices. An assumption made by an
    implementation is not behavior ("external appearance or action").
    An implementation might invoke some behavior as a result of some
    assumption.

    If a loop doesn't terminate and the implementation assumes that
    it does, the standard says nothing about the resulting behavior.
    It doesn't provide two or more options for the actual behavior.
    That's classic UB.

    We've seen cases here where the actual behavior is falling through
    into a function that's never called. That's certainly not a
    possibility provided by the standard.

    Ok, fair point.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Fri Jun 12 02:02:51 2026
    From Newsgroup: comp.lang.c

    In article <110fddl$1pooi$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I suspect the original intent is as you said, to support removal
    of "dead" loops where the body has been optimized away, or
    excised using conditional compilation. Something like,

    #ifdef DEBUG
    #define DOTHING true
    #else
    #define DOTHING false
    #endif

    ...
    for (int i = 0; i < n; i++) {
    if (DOTHING) {
    // Something complex here...
    }
    }

    If `DEBUG` is not defined in the preprocessor, the compiler has
    license to elide the entire loop as part of dead code
    elimination.

    I think I see what you mean, but in this particular case the loop
    can be proven to terminate unless `i` is modified in the body of
    the loop, and a compiler can elide the entire loop anyway.

    Yes. Scott alluded to the rest; what if the actual body had set
    the exit condition for the loop, and had been optimized away?

    For example, given `DOTHING` as above:

    for (int i = 0; i < n; ) {
    if (DOTHING) {
    // Something complex here...
    i++;
    }
    }

    Here, as before, the compiler is allowed to assume that the loop
    _would_ terminate, and thus elide it, as before. Of course, it
    is not forced to _guarantee_ that happens because it can't solve
    the halting problem.

    [...]

    As I understand it, primarily by reading the C++ problem report,
    which covers both C and C++ for background, the idea is to
    guarantee forward progress for programs that make use of
    threads: consider cooperatively-scheduled green threads; a
    programmer who inadvertently creates an infinite loop shouldn't
    be able to starve all threads for access to the CPU.

    Personally, I don't think C should be in the business of doing
    such things. But it is what it is.

    I agree.

    Yup.

    It is one of the reasons C is no longer my favorite language.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Fri Jun 12 02:08:45 2026
    From Newsgroup: comp.lang.c

    In article <110f5qm$1nfih$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    The idea of all this is given in a footnote in the C standards - "This
    is intended to allow compiler transformations such as removal of empty
    loops even when termination cannot be proven."

    The loop might originally have contained source code, but become empty
    through pre-processing, or from other compiler transformations (such
    as the compiler seeing that the "keep_going" variable is not volatile
    and its value is never used, so assignments to it can be elided, or
    moving other things outside the loop body).

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite. But is it likely? In my
    experience, infinite loops are generally very clearly written - either
    as "for (;;)" loops or "while (true)" loops - or they are the result
    of bugs in the code that accidentally run forever. If the loop is
    accidentally infinite, the programmer will already be expecting it to
    run the code after the loop.

    How about a loop that has a non-constant condition, but that is
    not expected to terminate in normal usage?

    while (! something_really_bad_happened()) {
        sleep(1);
    }
    self_destruct();

    A compiler could "assume" that the loop terminates, even if
    something_really_bad never happens, and that assumption could result in
    a call to self_destruct(). There are probably better ways to do that,
    but it's straightforward code with seemingly obvious semantics that
    an implementation is permitted to make unwarranted assumptions about.

    [...]

    I think, given the names, that this would _likely_ not meet the
    criteria in 6.8.6 para 4. What would the criteria be for
    `something_really_bad_happened` to return `true`? It would
    almost certainly involve something that is listed as a requirement
    for the compiler to prove could not happen in order to assume
    the loop terminates; as written, the "assume it terminates"
    pretty much only allows empty loop bodies, or bodies that just
    do simple calculations. I guess it's possible, but I'm having a
    hard time imagining that `something_really_bad_happened`
    wouldn't do IO or access a volatile or do an atomic operation or
    something.
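
To make that concrete: a minimal sketch, assuming `something_really_bad_happened` is implemented as a read of a flag that a signal handler sets. The flag name `bad_flag`, the handler `on_signal`, the wrapper `poll_until_bad`, and the choice of `SIGINT` are all assumptions for illustration, not anything from the original example. Because the controlling expression accesses a volatile object, this is not a loop the implementation may assume to terminate.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag: volatile sig_atomic_t so a handler may set it. */
static volatile sig_atomic_t bad_flag = 0;

static void on_signal(int sig) {
    (void)sig;
    bad_flag = 1;
}

/* Reads a volatile object, so a loop controlled by this call falls
   outside the "may be assumed to terminate" permission. */
static bool something_really_bad_happened(void) {
    return bad_flag != 0;
}

/* Polls until the flag is set and returns the number of polls.
   To keep the sketch finite it raises the signal itself on the
   third poll; a real program would wait for an external event. */
int poll_until_bad(void) {
    int polls = 0;
    bad_flag = 0;
    void (*previous)(int) = signal(SIGINT, on_signal);
    assert(previous != SIG_ERR);
    while (!something_really_bad_happened()) {
        polls++;
        if (polls == 3) {
            raise(SIGINT);      /* the handler runs before raise() returns */
        }
    }
    signal(SIGINT, previous);   /* restore the earlier disposition */
    return polls;
}
```

Calling `poll_until_bad()` returns 3 here, since the sketch raises the signal itself on the third poll.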

    - Dan C.

  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Fri Jun 12 02:20:11 2026
    From Newsgroup: comp.lang.c

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 2026-06-11 18:30, Waldek Hebisch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 2026-06-09 03:25, Waldek Hebisch wrote:
    [...]

    Interesting views. - Thanks.


    I think the biggest trouble is normal programmers. They already
    struggle with the current standard text. A more formal presentation
    could alienate even folks who are now able to explain the standard's
    rules to other programmers.

    I'm not sure what "normal programmers" are. From my own experience
    I can just say that there's a difference between what's "formal"
    in a "lawyer's speeches and texts" sense and what's formal in a
    mathematical sense. - The C-Standard as has been quoted here is
    more of a lawyer's text, with its inherent property of not being
    formally (in a mathematical sense) accurate (despite their tries;
    in both areas, law and programming language, respectively). It's
    thus not necessarily a problem if we'd have a more [mathematical]
    formal standard. - Programmers, as I see it, need definite texts.
    And rejection of the "lawyer's" sort of texts is not surprising.
    That does not necessarily affect their willingness to accept more
    formal specifications.

    You snipped most of what I wrote.

    Yes, because I acknowledged it by my above one-line remark already
    (and I didn't want to waste space unnecessarily). (No offense!)

    I intended to comment just on the one paragraph above, with its
    assumption that it may be an inherent problem to programmers.

    But this paragraph was closely linked to the text above. Dan Cross
    wanted formal semantics and my paragraph was responding to this.
    I think that the lawyerish style of the current C standard is mostly
    inertia, and making the standard more mathematical would improve it.
    But giving formal semantics in the standard would mean a significantly
    bigger change.

    To elaborate only a bit more...
    There's folks who have problems with "lawyer's speech" standards.
    There's folks who have problems with formal mathematical standards.
    But, as to my observation, there's *no* strict or natural hierarchy
    that one would imply the other.

    You said: "They already struggle with current standard text."
    as if there would be a strict "one implies the other" fact; there
    isn't one, or to be more cautious, "there isn't necessarily one".
    (I used the wording "necessarily" already in my original comment.)

    I certainly would prefer a standard
    that is less lawyerish and more mathematical, say written in a similar
    way to the Pascal standard. But there is a _big_ gap between normal
    mathematical text and a formal mathematical text (and let me note that
    the Pascal standard is less formal than normal mathematics).

    I agree.

    Normal
    mathematical text depends on human understanding to disambiguate
    and bridge small inconsistencies. A formal one has parts which
    are there only because the authors were not able to avoid
    ambiguity in a simpler way. And once things are written in a way
    that is well fit to formalism they tend to be much less
    understandable to the uninitiated.

    (I'll leave that uncommented. - I've said all I intended to say.)

    Janis

    --
    Waldek Hebisch
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Jun 12 10:58:07 2026
    From Newsgroup: comp.lang.c

    On 11/06/2026 17:34, Janis Papanagnou wrote:
    On 2026-06-11 08:56, David Brown wrote:
    On 10/06/2026 23:47, Keith Thompson wrote:
    [...]

    #include <stdio.h>
    int main(void) {
         bool keep_going = true;
         while (keep_going) {
             keep_going = true;
         }
         puts("never reached");
    }

    [...]

    [...]

    The loop might originally have contained source code, but become empty
    through pre-processing, or from other compiler transformations (such
    as the compiler seeing that the "keep_going" variable is not volatile
    and its value is never used, so assignments to it can be elided, or
    moving other things outside the loop body).

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite.  But is it likely?

    I think we should not make any assumptions about the "creativity" of a
    programmer ("C" or else). - Semantics should be well defined, and then
    clear to the programmer.

    I think the semantics of this "loops can be assumed to terminate" are
    clearly defined in the standard. I agree that the details might not be
    known to all C programmers, but I think they are only relevant in a very
    small number of cases.


    In my experience, infinite loops are generally very clearly written -
    either as "for (;;)" loops or "while (true)" loops - or they are the
    result of bugs in the code that accidentally run forever.  If the loop
    is accidentally infinite, the programmer will already be expecting it
    to run the code after the loop.
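
As an illustration of the "clearly written" style: a minimal sketch of a loop with an omitted controlling expression that leaves via `break`. In a `for` statement an omitted controlling expression is replaced by a nonzero constant, so the "may be assumed to terminate" permission, which requires a non-constant controlling expression, does not apply to it. The function name `first_power_of_two_at_least` and the input bound are assumptions for the example; the bound assumes `int` has at least 32 bits.

```c
#include <assert.h>

/* Returns the smallest power of two that is >= n.
   The bound on n keeps p *= 2 from overflowing (assumes 32-bit int). */
int first_power_of_two_at_least(int n) {
    assert(n >= 1 && n <= (1 << 30));
    int p = 1;
    for (;;) {                  /* explicit "forever" loop */
        if (p >= n) {
            break;              /* the only exit */
        }
        p *= 2;
    }
    return p;
}
```

For example, `first_power_of_two_at_least(5)` returns 8 and `first_power_of_two_at_least(8)` returns 8.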

    [...]

    So while I agree that this kind of thing can lead to curiosities and
    behaviour that seems counter-intuitive, and is popular with the
    "modern compilers are evil" crowd, I really do not see it as an issue
    in practice.  There are many other mistakes programmers can make, or
    UB that they hit accidentally - this is a drop in the ocean IMHO.

    Languages shall be sensibly and clearly defined. For bad designs (or
    bad standards) the language or standard should be blamed, and not the
    critics badly and inappropriately despised as ''"modern compilers are
    evil" crowd''. - Programmers are at the final end of the "food chain".
    And there's a lot of horrible pits in the C-language where programmers
    "made the mistake" to fall in; don't blame them, neither the ones who
    silently suffer nor the ones who shout out.


    I agree that standards should be clear, and standards documents should
    be held accountable if they are not. There's no doubt that the C
    standards are not perfect (Keith's "42 is not an expression" is an
    example of that).

    But it is less obvious that the language should be blamed for bad
    design. As a wise man here said, "C is what it is". The reasons for
    design decisions might be lost to history, inappropriate for a modern
    language, or forced for compatibility reasons - but the language stands
    with the rules it has. I don't know of anyone who uses a mainstream
    programming language for serious work and does not think at least some
    of its design decisions are bad - "bad" is highly subjective, depending
    on both the programmer and the type of work they do. Just like for any
    programming language, if you are programming in C, then you need to be
    aware of the pitfalls of C or steer well clear of where pitfalls might be.

    Ultimately, programming languages are subject to the equivalent of
    market forces - the choice of language to use for a particular task is a
    matter of weighing up what you think are the good and bad points for
    available alternatives. As the incumbent in many situations, C of
    course has an unfair advantage - but with enough incentive, people move
    to other languages with their own benefits, disadvantages, and "bad"
    design decisions. This is a slow process, but it is the only way forward.


    As for my '"modern compilers are evil" crowd' comment, there are people
    (not anyone involved in this discussion) who really do fall into that
    camp. I've seen people who are experienced and respected developers
    make all sorts of accusations to compiler developers, claiming they are
    only interested in high scores on synthetic benchmarks and directly
    insulting their motivations and integrity, blaming them for "breaking"
    their code that relied on the effects of some kinds of UB. It is always
    frustrating when you have code that works fine with one compiler
    version, but using another compiler results in failure due to UB in your
    code - especially if writing correct code gives inefficient results with
    the first compiler. And it's fine to say you'd be happier if a
    particular thing that is UB in C were not UB - but it is unreasonable to
    blame compiler developers for implementing the language as it is defined.

    I am not in any way saying that critics of aspects of C (the language,
    the standards, or compiler implementations) should be dismissed or
    despised - merely that the example of loop elimination leading to UB and
    unexpected results is regularly used as "evidence" by those that hold
    extreme positions about C, despite it being very unrealistic for the
    issue to cause problems in real coding practice.


    It is always best if compilers are able to warn you about problems in
    your code - such as UB - and avoid surprising results. But I don't
    think it is practical to expect them to catch everything, and too many
    warnings will flood you with false positives. (gcc used to have a
    warning for when code was elided - as the compiler got stronger and
    gained more optimisations, the warning was dropped because eliding code
    happened far too often to warn about.)


  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Jun 12 11:02:24 2026
    From Newsgroup: comp.lang.c

    On 11/06/2026 22:29, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    The idea of all this is given in a footnote in the C standards - "This
    is intended to allow compiler transformations such as removal of empty
    loops even when termination cannot be proven."

    The loop might originally have contained source code, but become empty
    through pre-processing, or from other compiler transformations (such
    as the compiler seeing that the "keep_going" variable is not volatile
    and its value is never used, so assignments to it can be elided, or
    moving other things outside the loop body).

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite. But is it likely? In my
    experience, infinite loops are generally very clearly written - either
    as "for (;;)" loops or "while (true)" loops - or they are the result
    of bugs in the code that accidentally run forever. If the loop is
    accidentally infinite, the programmer will already be expecting it to
    run the code after the loop.

    How about a loop that has a non-constant condition, but that is
    not expected to terminate in normal usage?

    while (! something_really_bad_happened()) {
        sleep(1);
    }
    self_destruct();

    A compiler could "assume" that the loop terminates, even if
    something_really_bad never happens, and that assumption could result in
    a call to self_destruct(). There are probably better ways to do that,
    but it's straightforward code with seemingly obvious semantics that
    an implementation is permitted to make unwarranted assumptions about.


    The compiler can only assume that if it knows that the controlling
    expression - the call to "something_really_bad_happened()" - does not
    contain any IO operations, volatile accesses or atomic operations.
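
The third excluded category, atomic operations, can be sketched in a few lines. This is a minimal example assuming a hypothetical `stop_requested` flag and made-up helper names `keep_waiting` and `wait_with_budget`; the loop sets the flag itself after a fixed budget only so that the sketch finishes. It assumes the implementation provides `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag; static storage, so it starts out false. */
static atomic_bool stop_requested;

/* The controlling expression below calls this, and it performs an
   atomic load, so the enclosing loop may not be assumed to terminate. */
bool keep_waiting(void) {
    return !atomic_load(&stop_requested);
}

/* Spins until the flag is set and returns the number of spins.
   The flag is set by the loop itself once the budget is used up. */
int wait_with_budget(int budget) {
    assert(budget > 0);
    atomic_store(&stop_requested, false);
    int spins = 0;
    while (keep_waiting()) {
        spins++;
        if (spins == budget) {
            atomic_store(&stop_requested, true);
        }
    }
    return spins;
}
```

In a real program the store would come from another thread; here `wait_with_budget(4)` simply returns 4.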


  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Jun 12 11:37:39 2026
    From Newsgroup: comp.lang.c

    On 12/06/2026 01:46, Dan Cross wrote:
    In article <110fgbi$1qf9f$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <110cre9$13aa9$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    [...]
    I see you did not read the other messages in the (sub)thread,
    but ok, here it is again, in C:

    ```
    term% cat what.c
    #include <stdio.h>
    int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    void hello(void) { printf("Hello, World!\n"); }
    term% clang --version | sed 1q
    clang version 22.1.6
    term% clang -Wall -pedantic -pedantic-errors -O1 -std=c23 -o what what.c
    what.c:2:58: warning: for loop has empty body [-Wempty-body]
    2 | int main(void) { for (unsigned int k = 0; k != 1; k += 2); return 0; }
    | ^
    what.c:2:58: note: put the semicolon on a separate line to silence this warning
    1 warning generated.
    term% ./what
    Hello, World!
    term%
    ```
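
Why the loop above can never reach `k == 1`: unsigned arithmetic wraps modulo a power of two, so starting from 0 and adding 2 only ever produces even values. A minimal sketch that checks this exhaustively, using `uint8_t` as a small stand-in for `unsigned int` (an assumption made so the full cycle is only 128 steps); the function name `stepping_by_two_reaches` is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Walks k = 0, 2, 4, ... in 8-bit unsigned arithmetic until k wraps
   back to 0. Reports whether k ever equalled target and stores the
   number of steps taken in *steps_out. */
bool stepping_by_two_reaches(uint8_t target, unsigned *steps_out) {
    assert(steps_out != NULL);
    uint8_t k = 0;
    unsigned steps = 0;
    bool seen = false;
    do {
        if (k == target) {
            seen = true;
        }
        k = (uint8_t)(k + 2);   /* wraps modulo 256 */
        steps++;
    } while (k != 0);
    *steps_out = steps;
    return seen;
}
```

With `target` equal to 1 the function returns false after a full cycle of 128 steps; any even target, such as 254, is reached.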

    I see the same behavior.

    The following largely repeats what I've written previously in
    this thread.

    Apparently the authors of clang decided that this statement in N3220
    6.8.6.p4:

    An iteration statement may be assumed by the implementation to
    terminate if its controlling expression is not a constant
    expression, ...

    means that a program that violates that assumption has undefined
    behavior. I intensely dislike both the rule and the way it's stated,
    but I agree that the conclusion that the behavior is undefined is
    a reasonable one.

    I think the behavior is technically "unspecified" in the sense of
    the C standard, but yes, this is the important bit. The
    controlling expression is not constant, and the loop doesn't meet
    any of the other criteria set forth in sec 6.8.6 para 4;
    therefore, the translator may assume it terminates (it is
    unspecified whether or not it does; either behavior is correct.
    GCC, for example, appears not to make the same assumption).

    Why do you think the behavior is unspecified rather than undefined?

    Unspecified behavior is defined as: "behavior, that results from
    the use of an unspecified value, or other behavior upon which
    this document provides two or more possibilities and imposes
    no further requirements on which is chosen in any instance".
    (Implementation-defined behavior differs from unspecified behavior
    in that the implementation must document how the choice is made.)

    What are the "two or more possibilities" in this case?

    The two choices are that the implementation may assume the loop
    terminates, or it may not, but it doesn't say which. I don't
    think that the language permits it to be UB. But I could be
    wrong. It's a bit of a distinction without a difference as far
    as the outcome is concerned.

    - Dan C.


    I think perhaps there are both undefined and unspecified aspects here.

    The implementation may assume the loop terminates - that means, to me,
    that there are no requirements for what happens if the loop does not
    terminate. Not terminating would be UB.

    However, I don't support clang's reasoning after that in this case. As
    I see it, a compiler can reason that the loop terminates and then
    executes "return 0;" because the non-terminating situation is UB and
    cannot occur. Thus it can skip the loop and go straight to "return 0;".
    Alternatively, it can reason that the non-terminating situation is UB
    and we don't care what happens if it does not terminate - so "return 0;"
    would be fine in that case too, simplifying the generated code.

    But it seems that clang is reasoning that it can assume the loop
    terminates, and it can prove that the loop does not terminate, and this
    contradiction means that anything is allowed (including skipping all
    code generation). The code has two conflicting semantics - it is an
    infinite loop, and it is a terminating loop. I think the standards say
    that the compiler /may/ consider the terminating loop interpretation as
    correct, thus giving just "return 0;", or it may choose not to consider
    that it terminates, and generate an infinite loop. Clang appears to
    think that it can pick both options at once, which would give
    contradictory behaviour, and therefore jump straight to UB.

    I would say that the best behaviour for a compiler here would be to give
    a warning, then it should pick one or the other defined behaviours.
    (gcc picks the infinite loop, but does not give any warning.) I cannot
    say for sure that clang's behaviour is incorrect - but it is certainly
    very unhelpful and poor quality of implementation.

    (I also think that it makes sense for compilers to use the "ud2" or
    similar "undefined behaviour" trap instructions in cases where they know
    an execution path is definitely UB and doing so does not affect the
    efficiency of non-UB paths.)

  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Fri Jun 12 12:55:36 2026
    From Newsgroup: comp.lang.c

    On 11/06/2026 20:29, Janis Papanagnou wrote:
    On 2026-06-10 09:04, David Brown wrote:
    [...]

    The rough equivalent of the distinction between Pascal procedures and
    functions is that procedures are like C functions that have "void"
    return type.  It's fine (and not at all a bad idea) for a language to
    distinguish between void and non-void like this.  What cannot easily
    be done in a clear and consistent way is to distinguish between two
    expressions of type "int" (or any other general non-void type).

    Here I cannot follow you. - The C-compiler can analyze code to do
    optimizations and even (as so often stated) "assume" things about
    the intent concerning UB and optimization but cannot value facts
    about types and context? - If so, then it sounds rather arbitrary.


    I think this thread is getting difficult to follow - there is a lot of
    wandering and vagueness (mostly from me, I must admit). So I am not
    sure if it is worth pursuing further.

    However, what I am trying to say is that it is easy for a programming
    language design to make a distinction between "things that result in a
    value of a type like int" and "things that do not result in a value" -
    and then the language could decide that the former are "expressions"
    that cannot stand alone, and the latter are "statements". It is much
    harder for a language description to say that /some/ "things that result
    in a value of a type like int" can be used as "statements", while others
    can be used only as "expressions" and not stand-alone statements.

    [...]

    It is also fine for a language to distinguish between "pure" functions
    and functions/procedures with side-effects and/or functions/procedures
    with observable behaviour.  (A "pure procedure" would not do anything.)

    By "would not do anything" you probably mean that it would not have
    side-effects on/with relatively global entities in the program?

    Yes.

    A "pure" function is one whose output depends entirely on its input
    parameters, and has no side-effects. (Some details of the definition
    may be varied, such as the ability to read global data that never
    changes after the first call. Perhaps memoizing might also be allowed.)
    If you don't use the value of a call to a pure function - or if the
    pure function does not return a value - then it can't do anything useful.
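
A small C illustration of that definition, as a sketch only: `square` is pure in the sense described above, while `next_id` is not, because it keeps hidden state between calls. Both names, and `purity_self_check`, are made up for the example.

```c
#include <assert.h>

/* Pure: the result depends only on the argument and nothing else is
   read or modified. (Small inputs assumed, so x * x cannot overflow.) */
int square(int x) {
    return x * x;
}

/* Not pure: reads and updates a hidden counter, so two calls with
   the same (empty) argument list return different values. */
int next_id(void) {
    static int counter = 0;
    counter++;
    return counter;
}

/* Equal arguments give equal results for the pure function only. */
void purity_self_check(void) {
    assert(square(4) == square(4));
    int first = next_id();
    int second = next_id();
    assert(second == first + 1);
}
```

A call such as `square(4);` whose result is discarded can be dropped without changing the program, which is the point made above; dropping a call to `next_id();` would change the values later callers see.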


    As far as I remember, Pascal does not make that distinction.

    Pascal functions and procedures can affect and be affected by global
    entities. Predefined functions and procedures can have side effects
    also unrelated to global entities in the program (e.g. print effect).
    A procedure/function not affecting the global (or surrounding stack)
    environment could likely be identified. But here we're anyway talking
    about the (clean!) return-interface of functions (as opposed to the
    procedures).


    That all agrees with what I thought.



  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Fri Jun 12 12:27:00 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> writes:
    On 11/06/2026 17:34, Janis Papanagnou wrote:
    On 2026-06-11 08:56, David Brown wrote:
    On 10/06/2026 23:47, Keith Thompson wrote:
    [...]

    #include <stdio.h>
    int main(void) {
         bool keep_going = true;
         while (keep_going) {
             keep_going = true;
         }
         puts("never reached");
    }

    [...]

    [...]

    The loop might originally have contained source code, but become
    empty through pre-processing, or from other compiler
    transformations (such as the compiler seeing that the "keep_going"
    variable is not volatile and its value is never used, so
    assignments to it can be elided, or moving other things outside the
    loop body).

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite.  But is it likely?

    I think we should not make any assumptions about the "creativity" of
    a programmer ("C" or else). - Semantics should be well defined, and
    then clear to the programmer.

    I think the semantics of this "loops can be assumed to terminate" are
    clearly defined in the standard. I agree that the details might not
    be known to all C programmers, but I think they are only relevant in a
    very small number of cases.

    I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
    is specified in terms of what an implementation may "assume", not in
    terms of the semantics of the program. One can conclude that this
    means that the program has undefined behavior if the assumption is
    violated, but that's not directly stated. I don't know how many C
    programmers know the standard well enough to reach that conclusion.
    I'm not even 100% sure it's accurate.

    The permission was added in C11 with little fanfare. It's not
    mentioned in the list of major changes in the C11 Foreword.
    The cases where it applies may be rarer than I had assumed, but
    it at least has the potential to break existing code that was well
    defined in C99.

    The rationale is to provide more opportunities for optimization,
    but it's not at all clear (at least to me) that it's particularly
    successful. If cases where it can cause problems are rare, then
    presumably cases where it's actually useful are rare. (That may
    be an oversimplification.)
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jun 13 12:36:15 2026
    From Newsgroup: comp.lang.c

    On 12/06/2026 21:27, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 11/06/2026 17:34, Janis Papanagnou wrote:
    On 2026-06-11 08:56, David Brown wrote:
    On 10/06/2026 23:47, Keith Thompson wrote:
    [...]

    #include <stdio.h>
    int main(void) {
         bool keep_going = true;
         while (keep_going) {
             keep_going = true;
         }
         puts("never reached");
    }

    [...]

    [...]

    The loop might originally have contained source code, but become
    empty through pre-processing, or from other compiler
    transformations (such as the compiler seeing that the "keep_going"
    variable is not volatile and its value is never used, so
    assignments to it can be elided, or moving other things outside the
    loop body).

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite.  But is it likely?

    I think we should not make any assumptions about the "creativity" of
    a programmer ("C" or else). - Semantics should be well defined, and
    then clear to the programmer.

    I think the semantics of this "loops can be assumed to terminate" are
    clearly defined in the standard. I agree that the details might not
    be known to all C programmers, but I think they are only relevant in a
    very small number of cases.

    I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
    is specified in terms of what an implementation may "assume", not in
    terms of the semantics of the program. One can conclude that this
    means that the program has undefined behavior if the assumption is
    violated, but that's not directly stated. I don't know how many C
    programmers know the standard well enough to reach that conclusion.
    I'm not even 100% sure it's accurate.

    The permission was added in C11 with little fanfare. It's not
    mentioned in the list of major changes in the C11 Foreword.
    The cases where it applies may be rarer than I had assumed, but
    it at least has the potential to break existing code that was well
    defined in C99.

    The rationale is to provide more opportunities for optimization,
    but it's not at all clear (at least to me) that it's particularly
    successful. If cases where it can cause problems are rare, then
    presumably cases where it's actually useful are rare. (That may
    be an oversimplification.)


    I agree on that last point. I doubt if any code would suffer if the
    paragraph were removed entirely from the standard. And while I also
    don't think much real-world code is at risk of problems from its
    inclusion in the standard, as long as there is some risk of problems
    with existing correct code, or some risk of confusion or
    misunderstanding on the part of programmers reading the standard, then
    it would be better if that paragraph had not been added to the standard
    at all.



  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sat Jun 13 12:02:24 2026
    From Newsgroup: comp.lang.c

    In article <110ghmv$21vi3$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    [snip]
    As for my '"modern compilers are evil" crowd' comment, there are people
    (not anyone involved in this discussion) who really do fall into that
    camp. I've seen people who are experienced and respected developers
    make all sorts of accusations to compiler developers, claiming they are
    only interested in high scores on synthetic benchmarks and directly
    insulting their motivations and integrity, blaming them for "breaking"
    their code that relied on the effects of some kinds of UB. It is always
    frustrating when you have code that works fine with one compiler
    version, but using another compiler results in failure due to UB in your
    code - especially if writing correct code gives inefficient results with
    the first compiler. And it's fine to say you'd be happier if a
    particular thing that is UB in C were not UB - but it is unreasonable to
    blame compiler developers for implementing the language as it is defined.

    Eh...I think those people have a point.

    Note, I don't think that "modern compilers are evil" (I mean,
    wow, that's a strong word) and I certainly do not think it is
    appropriate to malign the people who write them personally over
    what one does with code.

    But I _do_ think it is fair to say that UB is very easy to fall
    into in C, that programs that have worked correctly (insofar as
    their intended behavior as written) for years can suddenly fail
    because latent UB is treated differently in a point revision of
    a compiler, and that that (as you point out) can be incredibly
    frustrating for the authors.

    Regehr called out a dichotomy with UB: programmers using a
    language hate it; compiler writers love it.

    Here's my own vignette: I was chatting with a friend who works
    on LLVM and clang some time ago. I said, "I don't want UB" and
    he replied, "no, you really do." I asked him what he meant and
    he responded that I wanted a compiler that is capable of
    optimizing my program; "sure, but I still don't want UB." We
    went on for a bit, and it became clear that he saw UB as _the_
    vehicle for unlocking optimization.

    I realized that we were not speaking the same language _at all_.
    He and I both wanted a language where we could write programs
    that yield efficient object code. He saw UB as essential for
    that; but what I want is a language with well-defined semantics
    that can be aggressively optimized.

    That, I think, is the tension: there was a fundamental breakdown
    in communication between the users of the language, and those
    defining and implementing it. My subjective sense is that in
    the past few years things are getting somewhat better, but it is
    hard to evolve something as critical and widely used as C.

    I am not in any way saying that critics of aspects of C (the language,
    the standards, or compiler implementations) should be dismissed or
    despised - merely that the example of loop elimination leading to UB and
    unexpected results is regularly used as "evidence" by those that hold
    extreme positions about C, despite it being very unrealistic for the
    issue to cause problems in real coding practice.

    The kernel I am working on has about 5 million lines of code.
    That code has been evolving for 40 years; some of it predates
    the ISO standards and even the ANSI standard. It has been
    updated for newer compilers, sure, but in some places the
    treatment is surface-level: using ISO-style function prototypes
    and definition syntax, for example. But deep problems remain in
    parts, and constraints on engineering resources couple with
    economic and business pressures so that it's not going to get
    cleaned up any time soon. I'm sure there is UB in it; in fact,
    I know there is. But them's the breaks; and yet, customers are
    using it in production. Because of this, upgrading toolchains
    is laborious and complex, and takes a lot of time, and new
    compilers are (rightly) viewed with suspicion. That is not a
    great situation, but I don't think anyone is angry at the
    compiler people over it.

    And just as it's not acceptable to blame compiler writers for
    implementing the language as it is defined, it's not really
    acceptable to blame programmers either; some of the people who
    put the UB there are (literally) dead, and there's just not
    enough time in the day to go clean it all up. I wish there was
    more compassion for that.

    As said earlier, C is what it is. I suspect that it will
    continue to make incremental improvements, but we're basically
    stuck with what we have.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sat Jun 13 12:03:47 2026
    From Newsgroup: comp.lang.c

    In article <110hmi7$2e85g$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 11/06/2026 17:34, Janis Papanagnou wrote:
    On 2026-06-11 08:56, David Brown wrote:
    On 10/06/2026 23:47, Keith Thompson wrote:
    [...]

    #include <stdio.h>
    int main(void) {
         bool keep_going = true;
         while (keep_going) {
             keep_going = true;
         }
         puts("never reached");
    }

    [...]

    [...]

    The loop might originally have contained source code, but become
    empty through pre-processing, or from other compiler
    transformations (such as the compiler seeing that the "keep_going"
    variable is not volatile and its value is never used, so
    assignments to it can be elided, or moving other things outside the
    loop body).

    A programmer /could/ write the "keep_going" loop you gave, and
    mistakenly believe it to be infinite.  But is it likely?

    I think we should not make any assumptions about the "creativity" of
    a programmer ("C" or else). - Semantics should be well defined, and
    then clear to the programmer.

    I think the semantics of this "loops can be assumed to terminate" are
    clearly defined in the standard. I agree that the details might not
    be known to all C programmers, but I think they are only relevant in a
    very small number of cases.

    I disagree that the semantics are clearly defined. N3220 6.8.6.1p4
    is specified in terms of what an implementation may "assume", not in
    terms of the semantics of the program. One can conclude that this
    means that the program has undefined behavior if the assumption is
    violated, but that's not directly stated. I don't know how many C
    programmers know the standard well enough to reach that conclusion.
    I'm not even 100% sure it's accurate.
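
    A minimal sketch of the kind of loop this permission seems to cover,
    assuming the usual reading of the C11 text: the controlling expression
    is not a constant expression, and the loop performs no I/O, volatile
    access, synchronization, or atomic operation. The function name
    collatz_reaches_one is hypothetical and not from the thread.

```c
#include <stdbool.h>

/* Hypothetical example: the loop has a non-constant controlling
 * expression and no I/O, volatile, or atomic accesses, so an
 * implementation may assume that it terminates.  A compiler is then
 * allowed to reduce the whole function to "return true", even for
 * n == 0, where the loop as written would never exit. */
bool collatz_reaches_one(unsigned long n)
{
    while (n != 1) {
        /* Unsigned arithmetic wraps, so this step has no UB of its own. */
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
    }
    return true;
}
```

    By contrast, a loop written as while (1) or for (;;) has a constant
    controlling expression and so falls outside the permission.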

    The permission was added in C11 with little fanfare. It's not
    mentioned in the list of major changes in the C11 Foreword.
    The cases where it applies may be rarer than I had assumed, but
    it at least has the potential to break existing code that was well
    defined in C99.

    Another example of something that was previously well-defined
    and is now UB, I guess. :-/

    The rationale is to provide more opportunities for optimization,
    but it's not at all clear (at least to me) that it's particularly
    successful. If cases where it can cause problems are rare, then
    presumably cases where it's actually useful are rare. (That may
    be an oversimplification.)

    I'm not sure that's the rationale: rather, it's to guarantee
    forward progress. Again, that's not really the language's
    purview.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.c on Sat Jun 13 12:13:13 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
    Here's my own vignette: I was chatting with a friend who works
    on LLVM and clang some time ago. I said, "I don't want UB" and
    he replied, "no, you really do." I asked him what he meant and

    Might like to have a look at the video

    "Garbage In, Garbage Out, Arguing about Undefined Behavior
    with Nasal Demons" (2016) by Chandler Carruth.

    IIRC it essentially takes the point of your friend, but maybe adds
    some explanations. At 15' in, it discusses the suggestion to
    "define all the behavior". It's for C++, but I think some of it
    might apply to C as well. At 24' come some examples.


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sat Jun 13 12:44:29 2026
    From Newsgroup: comp.lang.c

    In article <video-20260613131240@ram.dialup.fu-berlin.de>,
    Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
    Here's my own vignette: I was chatting with a friend who works
    on LLVM and clang some time ago. I said, "I don't want UB" and
    he replied, "no, you really do." I asked him what he meant and

    Might like to have a look at the video

    "Garbage In, Garbage Out, Arguing about Undefined Behavior
    with Nasal Demons" (2016) by Chandler Carruth.

    IIRC it essentially takes the point of your friend, but maybe adds
    some explanations. At 15' in, it discusses the suggestion to
    "define all the behavior". It's for C++, but I think some of it
    might apply to C as well. At 24' come some examples.

    I'm not a huge fan of Carruth.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Jun 13 14:57:52 2026
    From Newsgroup: comp.lang.c

    On 2026-06-12 04:20, Waldek Hebisch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 2026-06-11 18:30, Waldek Hebisch wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 2026-06-09 03:25, Waldek Hebisch wrote:
    [...]

    Interesting views. - Thanks.


    I think biggest trouble is normal programmers. They already
    struggle with current standard text. More formal presentation
    could alienate even folks who now are able to explain standard
    rules to other programmers.

    I'm not sure what "normal programmers" are. From own experience
    I can just say that there's a difference between what's "formal"
    in a "lawyer's speeches and texts" sense and what's formal in a
    mathematical sense. - The C-Standard as had been quoted here is
    more of a lawyer's text, with its inherent property of not being
    formally (in a mathematical sense) accurate (despite their tries;
    in both areas, law and programming language, respectively). It's
    thus not necessarily a problem if we'd have a more [mathematical]
    formal standard. - Programmers, as I see it, need definite texts.
    And rejection of the "lawyer's" sort of texts is not surprising.
    That does not necessarily affect their willingness to accept more
    formal specifications.

    You snipped most of what I wrote.

    Yes, because I acknowledged it by my above one-line remark already
    (and I didn't want to waste space unnecessarily). (No offense!)

    I intended to comment just on the one paragraph above, with its
    assumption that it may be an inherent problem to programmers.

    But this paragraph was closely linked to the text above. Dan Cross
    wanted formal semantics and my paragraph was responding to this.
    I think that lawyerish style of current C standard is mostly inertia,
    and making standard more mathematical would improve it. But giving
    formal semantic in the standard would mean significantly bigger
    change.

    Yes, you said that, and I had acknowledged that; meanwhile twice.

    I'm not sure why you persistently insist on any relation to your
    previous text when all what *I* wanted to comment on in your post
    was just _one aspect_ in your last paragraph, which was:

    I think biggest trouble is normal programmers.
    They already struggle with current standard text.

    And I expressed that I refute that view and I explained my view.

    If you think your statement about "normal programmers" (whatever
    you imply with "normal") is correct and my perception with people
    is in any way wrong we can discuss that.

    (On your other text I see nothing that we'd need to discuss.)

    Janis

    [...]

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Jun 13 15:01:38 2026
    From Newsgroup: comp.lang.c

    On 2026-06-12 12:55, David Brown wrote:
    On 11/06/2026 20:29, Janis Papanagnou wrote:
    [...]

    I think this thread is getting difficult to follow - there is a lot of
    wandering and vagueness (mostly from me, I must admit).  So I am not
    sure if it is worth pursuing further.

    I agree, and I appreciate your post to clarify some things. - Thanks.

    Janis

    [...]

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sat Jun 13 18:32:24 2026
    From Newsgroup: comp.lang.c

    On 13/06/2026 14:02, Dan Cross wrote:
    In article <110ghmv$21vi3$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    [snip]
    As for my '"modern compilers are evil" crowd' comment, there are people
    (not anyone involved in this discussion) who really do fall into that
    camp. I've seen people who are experienced and respected developers
    make all sorts of accusations to compiler developers, claiming they are
    only interested in high scores on synthetic benchmarks and directly
    insulting their motivations and integrity, blaming them for "breaking"
    their code that relied on the effects of some kinds of UB. It is always
    frustrating when you have code that works fine with one compiler
    version, but using another compiler results in failure due to UB in your
    code - especially if writing correct code gives inefficient results with
    the first compiler. And it's fine to say you'd be happier if a
    particular thing that is UB in C were not UB - but it is unreasonable to
    blame compiler developers for implementing the language as it is defined.

    Eh...I think those people have a point.

    Note, I don't think that "modern compilers are evil" (I mean,
    wow, that's a strong word) and I certainly do not think it is
    appropriate to malign the people who write them personally over
    what one does with code.

    I think it is important for tools to be helpful, and it's fine to
    complain if a tool is being directly unhelpful - or ask for improvements
    when you think it could be better.


    But I _do_ think it is fair to say that UB is very easy to fall
    into in C, that programs that have worked correctly (insofar as
    their intended behavior as written) for years can suddenly fail
    because latent UB is treated differently in a point revision of
    a compiler, and that that (as you point out) can be incredibly
    frustrating for the authors.

    It can certainly happen, yes. And I fully sympathise on these few
    occasions when changes to the standard have meant that code that
    previously had defined behaviour, now has different or undefined
    behaviour. (However, I think that for some kinds of code, programmers
    could be better at specifying exactly what standards their code
    requires, and the standards they use when compiling code.)

    But it is important to realise that if you write code with UB, it is
    /your/ mistake - not the mistake of the compiler developers, or the
    mistake of the standards authors. Compiler vendors can (and do!) try to
    help programmers find their mistakes - experience shows, however, that
    many programmers reach first for bug report forms or complaints in
    forums before compiler tools like sanitisers or even enabling warnings
    on their builds.

    Programming in C is a cooperative effort - including the standards
    authors, the compiler vendors, and the C programmers. Each group can
    try to help the others, but each is ultimately responsible for their own
    part.


    Regehr called out a dichotomy with UB: programmers using a
    language hate it; compiler writers love it.

    I think Regehr has made some good points in his writings, but I do not
    agree with him on everything.

    As a programmer, I am a fan of the concept of UB. I am quite happy with
    the idea that operations have a pre-condition, and that if there is no
    "right answer" for a given input, I should not provide that input. I
    prefer that signed integer arithmetic overflow is UB, and do not want it
    to be wrapping or have some other semantics - to me, it is far clearer
    that way. If I have UB in my code, it's a bug - no different from any
    other bug I might make.

    It is the case that in C, there are some kinds of UB that can be quite
    subtle. However, you rarely need to risk meeting them. Yes, there are pitfalls - don't go near them, and they don't matter.

    However, it is unfortunately the case that sometimes avoiding UB can be
    costly in performance terms. An example would be if you have need of
    type-punning - perhaps you have a float in memory and you want to access
    it as a uint32_t for some reason. Casting a float * to a uint32_t *
    and using that new pointer is UB. Some compilers will nonetheless
    generate the code you want after such a cast. Some compilers might not,
    depending on details of the rest of the surrounding code, because it is
    UB. A non-UB solution would be to use memcpy(), or a type-punning
    union. For highly optimising compilers, that's fine - the code
    generated by gcc or clang for a memcpy() here is likely to be as
    efficient as you could get - directly reading the float from memory to
    an integer register. For other compilers, however, you might get a call
    to a memcpy() library function in an external DLL, taking orders of
    magnitude more cycles. What is the poor programmer to do? Write code
    that is portable and correct, but very slow with some implementations?
    Write code that "cheats" and is efficient on some implementations but
    might not give the desired results on others? Use pre-processor
    monstrosities to detect different compilers and adapt accordingly? That
    is what I see as the biggest issue resulting from compiler optimisation
    based on UB. I don't know what the "best" answer here is.
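
    The two non-UB options mentioned above can be sketched as follows. This
    is only an illustration: the helper names float_bits and
    float_bits_union are hypothetical, and the code assumes that float and
    uint32_t have the same size, which the standard does not guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide, as on most current platforms. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy the object representation instead of casting float * to
 * uint32_t * and dereferencing the result. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Alternative: pun through a union.  Reading a member other than the
 * one last stored reinterprets the stored bytes in C. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t u; } pun = { .f = f };
    return pun.u;
}
```

    On an implementation that uses IEEE 754 binary32 for float, both
    helpers return 0x3F800000 for 1.0f. Whether the memcpy() call is
    compiled to a plain register move or to a library call depends on the
    compiler and its options, which is the concern raised above.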


    Here's my own vignette: I was chatting with a friend who works
    on LLVM and clang some time ago. I said, "I don't want UB" and
    he replied, "no, you really do." I asked him what he meant and
    he responded that I wanted a compiler that is capable of
    optimizing my program; "sure, but I still don't want UB." We
    went on for a bit, and it became clear that he saw UB as _the_
    vehicle for unlocking optimization.

    I realized that we were not speaking the same language _at all_.
    He and I both wanted a language where we could write programs
    that yield efficient object code. He saw UB as essential for
    that; but what I want is a language with well-defined semantics
    that can be aggressively optimized.

    I too want a language with well-defined semantics that can be
    aggressively optimised. But I do not see UB as a hindrance to that. I am
    happy knowing that I cannot divide by 0, or find the square root of a
    negative number (in the real domain). I am happy knowing that I cannot
    add two ints if their sum overflows the range of their type, and that I
    cannot call a function with a different number or type of parameters
    than its definition. I have a great deal of difficulty seeing how
    things could be any different, other than in a managed language with
    significant overhead from run-time checks - and that goes against the
    "aggressively optimised" requirement.

    Having "well-defined semantics" does not mean the language should accept
    anything that happens to fit the syntax and grammar rules, or that all
    functions and operations should give a defined result for all inputs.
    It means that the set of valid inputs is clearly defined, along with the
    outputs and effects you get when the inputs are valid.

    (There are plenty of points in the C standards where the wording could
    make the semantics clearer, or where the range of input values could
    easily have been larger - I am not suggesting C is as well-defined as it
    could reasonably be.)


    That, I think, is the tension: there was a fundamental breakdown
    in communication between the users of the language, and those
    defining and implementing it. My subjective sense is that in
    the past few years things are getting somewhat better, but it is
    hard to evolve something as critical and widely used as C.


    Communication between the separate parties is always an issue, and it is
    easy for it to be a one-way street with a language standards committee
    dictating the rules with little attention to feedback, then compiler
    vendors following these rules without listening to the users.

    A challenge here, perhaps, is that users are a very diverse group. How
    much should compiler vendors cater for those that put a lot of effort
    into correctness and want top efficiency, or those that are less
    knowledgeable about the language but want to avoid the consequences of
    their mistakes? What about those working with old code written for
    different compilers with different unwritten rules? It is not easy to
    please everyone.


    I am not in any way saying that critics of aspects of C (the language,
    the standards, or compiler implementations) should be dismissed or
    despised - merely that the example of loop elimination leading to UB and
    unexpected results is regularly used as "evidence" by those that hold
    extreme positions about C, despite it being very unrealistic for the
    issue to cause problems in real coding practice.

    The kernel I am working on has about 5 million lines of code.
    That code has been evolving for 40 years; some of it predates
    the ISO standards and even the ANSI standard. It has been
    updated for newer compilers, sure, but in some places the
    treatment is surface-level: using ISO-style function prototypes
    and definition syntax, for example. But deep problems remain in
    parts, and constraints on engineering resources couple with
    economic and business pressures so that it's not going to get
    cleaned up any time soon. I'm sure there is UB in it; in fact,
    I know there is. But them's the breaks; and yet, customers are
    using it in production. Because of this, upgrading toolchains
    is laborious and complex, and takes a lot of time, and new
    compilers are (rightly) viewed with suspicion. That is not a
    great situation, but I don't think anyone is angry at the
    compiler people over it.

    I think that is a good way to handle the situation. In my projects, I
    do not normally upgrade or change toolchains. While I think the risk of
    UB is small in my own code, small does not mean non-existent. And for
    my work, generated code that behaves correctly in terms of C semantics
    but has different execution times or code size might also be an issue -
    so changes in toolchains mean a lot of extra testing and qualification.
    In addition, for some microcontrollers the toolchains have relatively
    small user bases and consequently higher risks of unknown bugs in the
    toolchains themselves. Sometimes there are also implementation-specific
    features that change between versions (though that is less of an issue
    these days).


    And just as it's not acceptable to blame compiler writers for
    implementing the language as it is defined, it's not really
    acceptable to blame programmers either; some of the people who
    put the UB there are (literally) dead, and there's just not
    enough time in the day to go clean it all up. I wish there was
    more compassion for that.


    Being dead does not absolve you of the responsibility - the person that
    wrote the code with UB is the person who wrote the code with the UB,
    just like any other bugs. That person wrote the code with the error.
    It might not be fair to hold it against them - there are a great many
    possible reasons why it was not their fault (typically management is
    more at fault than the coders!). And placing blame is rarely a useful
    exercise - usually it does not matter where the bugs came from, only
    that they are there and need to be fixed or worked around.

    As said earlier, C is what it is. I suspect that it will
    continue to make incremental improvements, but we're basically
    stuck with what we have.

    - Dan C.


    Agreed.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Sun Jun 14 14:33:33 2026
    From Newsgroup: comp.lang.c

    In article <110k0mp$329k6$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 13/06/2026 14:02, Dan Cross wrote:
    In article <110ghmv$21vi3$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    [snip]
    As for my '"modern compilers are evil" crowd' comment, there are people
    (not anyone involved in this discussion) who really do fall into that
    camp. I've seen people who are experienced and respected developers
    make all sorts of accusations to compiler developers, claiming they are
    only interested in high scores on synthetic benchmarks and directly
    insulting their motivations and integrity, blaming them for "breaking"
    their code that relied on the effects of some kinds of UB. It is always
    frustrating when you have code that works fine with one compiler
    version, but using another compiler results in failure due to UB in your
    code - especially if writing correct code gives inefficient results with
    the first compiler. And it's fine to say you'd be happier if a
    particular thing that is UB in C were not UB - but it is unreasonable to
    blame compiler developers for implementing the language as it is defined.

    Eh...I think those people have a point.

    Note, I don't think that "modern compilers are evil" (I mean,
    wow, that's a strong word) and I certainly do not think it is
    appropriate to malign the people who write them personally over
    what one does with code.

    I think it is important for tools to be helpful, and it's fine to
    complain if a tool is being directly unhelpful - or ask for improvements
    when you think it could be better.

    Yes.

    But I _do_ think it is fair to say that UB is very easy to fall
    into in C, that programs that have worked correctly (insofar as
    their intended behavior as written) for years can suddenly fail
    because latent UB is treated differently in a point revision of
    a compiler, and that that (as you point out) can be incredibly
    frustrating for the authors.

    It can certainly happen, yes. And I fully sympathise on these few
    occasions when changes to the standard have meant that code that
    previously had defined behaviour, now has different or undefined
    behaviour. (However, I think that for some kinds of code, programmers
    could be better at specifying exactly what standards their code
    requires, and the standards they use when compiling code.)

    But it is important to realise that if you write code with UB, it is
    /your/ mistake - not the mistake of the compiler developers, or the
    mistake of the standards authors. Compiler vendors can (and do!) try to
    help programmers find their mistakes - experience shows, however, that
    many programmers reach first for bug report forms or complaints in
    forums before compiler tools like sanitisers or even enabling warnings
    on their builds.

    Programming in C is a cooperative effort - including the standards
    authors, the compiler vendors, and the C programmers. Each group can
    try to help the others, but each is ultimately responsible for their own
    part.

    Here's the problem that I have with this line of reasoning. C
    is a language that has considerable history; there was a large
    body of C code written before the first standard was ever
    created, in 1988; C was a teenager. And it took many years for
    decent quality ANSI C compilers to be ubiquitous. C could
    legally drink by then.

    "Undefined Behavior", in C, in the manner usually discussed in
    this newsgroup, was introduced with the first standard. That
    means that there is --- still --- a large body of software that
    has "UB" that was put there before UB existed as a thing
    programmers needed to worry about in C.

    Even once it was a part of C, the concept was communicated
    poorly.

    Some people seem to delight in this, believing precision in
    interpreting the standard in abstruse ways is an expression of
    deep technical expertise; but it really is not.

    Yes, UB is created by programmers. However, in large systems,
    it may be that it was created inadvertently; someone makes a
    change that subtly invalidates some invariant that an unknown
    caller far away in the code base (or in another one that relies
    on the change via an indirect dependency) depends on, and now
    you've got UB;
    locally, everything appears correct; but it's the combination
    where the UB manifests.

    Regehr called out a dichotomy with UB: programmers using a
    language hate it; compiler writers love it.

    I think Regehr has made some good points in his writings, but I do not
    agree with him on everything.

    As a programmer, I am a fan of the concept of UB. I am quite happy with
    the idea that operations have a pre-condition, and that if there is no
    "right answer" for a given input, I should not provide that input. I
    prefer that signed integer arithmetic overflow is UB, and do not want it
    to be wrapping or have some other semantics - to me, it is far clearer
    that way. If I have UB in my code, it's a bug - no different from any
    other bug I might make.

    This example makes little sense to me. If you don't want
    integer overflow, then don't overflow; the techniques for
    avoiding it are pretty well known. But why is it specifically
    better that it is UB, rather than trapping in debug
    builds, or having IB semantics based on the underlying machine?
    It seems to me that the burden on the programmer is the same.
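
    One of the well-known techniques is to test the operands against the
    limits before adding, so that the overflowing expression is never
    evaluated. A minimal sketch follows; the name add_checked is
    hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *sum and returns true when the result fits in an
 * int.  Returns false, without performing the addition, when the
 * mathematical result would be out of range. */
bool add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *sum = a + b;
    return true;
}
```

    The caller carries the same burden whichever way the language defines
    overflow: it has to decide what to do when the function returns false.
    C23 also provides ckd_add() in <stdckdint.h> for the same purpose,
    where the implementation supports it.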

    It is the case that in C, there are some kinds of UB that can be quite
    subtle. However, you rarely need to risk meeting them. Yes, there are
    pitfalls - don't go near them, and they don't matter.

    I disagree. I think almost all non-trivial programs have UB to
    a greater or lesser extent, whether they intend to or not.

    However, it is unfortunately the case that sometimes avoiding UB can be
    costly in performance terms. An example would be if you have need of
    type-punning - perhaps you have a float in memory and you want to access
    it as a uint32_t for some reason. Casting a float * to a uint32_t *
    and using that new pointer is UB. Some compilers will nonetheless
    generate the code you want after such a cast. Some compilers might not,
    depending on details of the rest of the surrounding code, because it is
    UB. A non-UB solution would be to use memcpy(), or a type-punning
    union. For highly optimising compilers, that's fine - the code
    generated by gcc or clang for a memcpy() here is likely to be as
    efficient as you could get - directly reading the float from memory to
    an integer register. For other compilers, however, you might get a call
    to a memcpy() library function in an external DLL, taking orders of
    magnitude more cycles. What is the poor programmer to do? Write code
    that is portable and correct, but very slow with some implementations?
    Write code that "cheats" and is efficient on some implementations but
    might not give the desired results on others? Use pre-processor
    monstrosities to detect different compilers and adapt accordingly? That
    is what I see as the biggest issue resulting from compiler optimisation
    based on UB. I don't know what the "best" answer here is.

    This is kind of my point. If you need a fast way to convery

    Here's my own vignette: I was chatting with a friend who works
    on LLVM and clang some time ago. I said, "I don't want UB" and
    he replied, "no, you really do." I asked him what he meant and
    he responded that I wanted a compiler that is capable of
    optimizing my program; "sure, but I still don't want UB." We
    went on for a bit, and it became clear that he saw UB as _the_
    vehicle for unlocking optimization.

    I realized that we were not speaking the same language _at all_.
    He and I both wanted a language where we could write programs
    that yield efficient object code. He saw UB as essential for
    that; but what I want is a language with well-defined semantics
    that can be aggressively optimized.

    I too want a language with well-defined semantics that can be
    aggressively optimised. But I do not see UB as a hindrance to that.

    UB is literally the opposite of well-defined.

    I am happy knowing that I cannot divide by 0,

    Yup. That should be a trap.

    or find the square root of a negative number (in the real
    domain).

    Yup. That should be a trap.

    I am happy knowing that I cannot add two ints if their sum
    overflows the range of their type,

    Yup. That should be a trap (if you want wrapping semantics, you
    should request it explicitly).

    and that I cannot call a function with a different number or
    type of parameters than its definition.

    Yup. That should be a compile-time error.

    I have a great deal of difficulty seeing how things could be
    any different, other than in a managed language with significant
    overhead from run-time checks - and that goes against the
    "aggressively optimised" requirement.

    There are existence proofs of other languages that can, and do,
    do these things, and do them well. I hate to keep beating this
    drum, but I think Rust does well here: in safe Rust, UB is a
    compile-time error; in *unsafe* Rust, there are tools to help
    find where programmers violate the language's invariants.

    Having "well-defined semantics" does not mean the language should accept
    anything that happens to fit the syntax and grammar rules, or that all
    functions and operations should give a defined result for all inputs.

    I never said that it did.

    It means that the set of valid inputs is clearly defined, along with the
    outputs and effects you get when the inputs are valid.

    So I was the one who said "well-defined semantics" and I had a
    specific meaning in mind. Your definition is incomplete with
    respect to that meaning: in addition to what you said, invalid
    inputs should be rejected, either as a compile time error, or by
    generating an exception or panic at runtime. If you want to
    live dangerously and turn the runtime checks off for performance
    reasons, then you get 2's complement behavior for integers or
    whatever the machine does for the others.
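
    In C as it stands, that policy can only be approximated by hand. A
    minimal sketch for integer division, assuming abort() is an acceptable
    stand-in for a trap; the name div_or_trap is hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: divides a by b, and calls abort() for the two
 * inputs on which int division is undefined in C: a zero divisor, and
 * INT_MIN / -1, whose quotient does not fit in an int. */
int div_or_trap(int a, int b)
{
    if (b == 0 || (a == INT_MIN && b == -1)) {
        abort();    /* reject the invalid input at run time */
    }
    return a / b;
}
```

    For signed overflow, GCC and Clang offer a comparable choice as
    compiler options: -ftrapv asks for a trap on overflow, and -fwrapv
    asks for wrapping two's complement results.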

    (There are plenty of points in the C standards where the wording could
    make the semantics clearer, or where the range of input values could
    easily have been larger - I am not suggesting C is as well-defined as it
    could reasonably be.)

    It's not just that it's nowhere close to being as well-defined
    as it should be, it's because the language as defined permits
    behavior that varies far too widely, specifically because of UB.

    Consider one of the examples you gave: signed integer overflow.
    The standard doesn't say that you _can't_ add two numbers
    together if you overflow, it just says that if you do, the
    language imposes no requirements on the resulting behavior. It
    may trap, it may elide the addition entirely, or it may do it
    and let the result be whatever the underlying machine does.

    That is, the _language_ does not say that it's a bug; it says
    that it's not going to say anything about it at all.

    This is one reason the committee is trying to rein some of this
    in.

    That, I think, is the tension: there was a fundamental breakdown
    in communication between the users of the language, and those
    defining and implementing it. My subjective sense is that in
    the past few years things are getting somewhat better, but it is
    hard to evolve something as critical and widely used as C.

    Communication between the separate parties is always an issue, and it is
    easy for it to be a one-way street with a language standards committee
    dictating the rules with little attention to feedback, then compiler
    vendors following these rules without listening to the users.

    A challenge here, perhaps, is that users are a very diverse group. How
    much should compiler vendors cater for those that put a lot of effort
    into correctness and want top efficiency, or those that are less
    knowledgeable about the language but want to avoid the consequences of
    their mistakes? What about those working with old code written for
    different compilers with different unwritten rules? It is not easy to
    please everyone.

    I think that's simplistic; not many programmers actively want to
    "avoid the consequences of their mistakes." Do you really
    believe that they do? If so, why?

    Conversely, there *is* this kind of machismo attitude among many
    C programmers that it requires a superior intellect to truly
    understand this language, and those who do not (or who make any
    mistake in their understanding) are simply unworthy. I have
    repeatedly observed this over many decades now, and when I see
    it, I think that it is odious.

    My experience is that most programmers are highly intelligent,
    capable people. They are not wrong to want behavior they can
    rely on, particularly when things are not obvious, as they
    often are not. They also want a language that requires a less
    lawyerly reading to understand its semantics; that could go the
    way of formality (my preferred approach) or just clearer
    exposition. Either would be preferable to the current state.

    In fairness, I think the current members of the committee
    recognize this.

    I am not in any way saying that critics of aspects of C (the language,
    the standards, or compiler implementations) should be dismissed or
    despised - merely that the example of loop elimination leading to UB and
    unexpected results is regularly used as "evidence" by those that hold
    extreme positions about C, despite it being very unrealistic for the
    issue to cause problems in real coding practice.

    The kernel I am working on has about 5 million lines of code.
    That code has been evolving for 40 years; some of it predates
    the ISO standards and even the ANSI standard. It has been
    updated for newer compilers, sure, but in some places the
    treatment is surface-level: using ISO-style function prototypes
    and definition syntax, for example. But deep problems remain in
    parts, and constraints on engineering resources couple with
    economic and business pressures so that it's not going to get
    cleaned up any time soon. I'm sure there is UB in it; in fact,
    I know there is. But them's the breaks; and yet, customers are
    using it in production. Because of this, upgrading toolchains
    is laborious and complex, and takes a lot of time, and new
    compilers are (rightly) viewed with suspicion. That is not a
    great situation, but I don't think anyone is angry at the
    compiler people over it.

    I think that is a good way to handle the situation. In my projects, I
    do not normally upgrade or change toolchains. While I think the risk of
    UB is small in my own code, small does not mean non-existent. And for
    my work, generated code that behaves correctly in terms of C semantics
    but has different execution times or code size might also be an issue -
    so changes in toolchains mean a lot of extra testing and qualification.

    Obviously in a production setting tools should be tested and
    qualified. But the danger posed by UB adds unacceptable risk on
    large projects, and the burden for updating a toolchain is too
    high. That is as much an indictment of the language as of any
    particular project.

    As a counter example, there was the Harvey project, which was a
    fork of Plan 9 where the Plan 9 C dialect was replaced with ISO
    C; we accounted for this by having CI build with 6 separate
    compilers; this flushed out a lot of bugs.

    I am surprised that more projects do not adopt canary CI builds
    against newer toolchains.

    In addition, for some microcontrollers the toolchains have relatively
    small user bases and consequently higher risks of unknown bugs in the
    toolchains themselves. Sometimes there are also implementation-specific
    features that change between versions (though that is less of an issue
    these days).

    Fun fact: part of the reason Google got involved in clang and
    LLVM development was because the vendor toolchain for a
    particular microcontroller used in Android phones was buggy and
    would crash (that is, the compiler itself crashed). The
    solution was not to live with it; it was to build a better
    toolchain.

    Google could afford to do that; I recognize not many
    organizations can.

    And just as it's not acceptable to blame compiler writers for
    implementing the language as it is defined, it's not really
    acceptable to blame programmers either; some of the people who
    put the UB there are (literally) dead, and there's just not
    enough time in the day to go clean it all up. I wish there was
    more compassion for that.

    Being dead does not absolve you of the responsibility - the person that
    wrote the code with UB is the person who wrote the code with the UB,
    just like any other bugs. That person wrote the code with the error.

    See above. Those people may well have written the code before C
    was standardized and before UB as we know it now existed. Also,
    by definition UB is not an error.

    It might not be fair to hold it against them - there are a great many
    possible reasons why it was not their fault (typically management is
    more at fault than the coders!). And placing blame is rarely a useful
    exercise - usually it does not matter where the bugs came from, only
    that they are there and need to be fixed or worked around.

    Exactly. The footguns hiding in C code that has worked
    perfectly for decades, dating back to before the standards
    existed, are legion. Caveat emptor.

    _Or_ the code may have been written with careful regard for the
    standard, but something _else_ may have been changed that now
    leads to exposure to UB. For example, perhaps code was written
    that multiplies two numbers, `a*b`; `a` is known to be `unsigned int`
    when written, but `b` is a signed int. But maybe that is hidden
    behind a typedef; some time in the future, the typedef is
    changed so that `a` is now `unsigned short`; perhaps someone
    realized that the domain values never exceed 16 bits and by
    changing the definition some critical structure now fits in a
    single cache line. But also now the type promotion rules kick
    in so that `a*b` happens with the factors as `signed int` and
    there exist values of `a` and `b` where `a*b` overflows: UB.
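
    (A minimal sketch of the scenario just described, assuming a
    platform where `int` is 32 bits and wider than `unsigned short`.
    The names `factor_t` and `scale` are hypothetical, not from any
    real code base. The risky form is left in a comment so that
    nothing here executes UB; the workaround forces the multiplication
    back into `unsigned int`, where wraparound is defined.)

```c
/* Hypothetical typedef: originally `unsigned int`, later narrowed. */
typedef unsigned short factor_t;

/*
 * Risky form, shown only as a comment:
 *     unsigned int scale(factor_t a, int b) { return a * b; }
 * When factor_t is narrower than int, `a` is promoted to signed int,
 * so the multiplication is signed and can overflow.
 */

/* Casting both operands keeps the arithmetic unsigned whatever
   factor_t happens to be, so the product wraps instead of overflowing. */
static unsigned int scale(factor_t a, int b)
{
    return (unsigned int)a * (unsigned int)b;
}
```

    With a 32-bit `unsigned int`, `scale(65535, 65535)` gives
    4294836225; written as plain `a * b`, the same operands would
    exceed `INT_MAX` on that platform.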

    The code had no UB; the change was elsewhere; no one saw this
    because the tests all passed and everything looked ok; then
    someone upgrades the compiler and now things break.

    Whose fault is that?

    And no, this is not contrived; this is exactly the sort of thing
    that happens on large, long-lived projects.

    As said earlier, C is what it is. I suspect that it will
    continue to make incremental improvements, but we're basically
    stuck with what we have.

    Agreed.

    ...but be careful blaming the programmer.

    - Dan C.

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.c on Sun Jun 14 17:22:22 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
    I'm not a huge fan of Carruth.

    (Text after "| " below was generated by a chatbot asked to explain
    narrow contracts and the reduction of efficiency by defining UB.)

    (Let me guess: You are not a huge fan of chatbots either!
    Ok, that was easy.)

    Chandler talked about how narrow contracts allow optimizations.

    | - Wide Contract: The function guarantees to handle all possible inputs
    | gracefully, usually by returning an error code or throwing an
    | exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
    |
    | - Narrow Contract: The function only guarantees correct behavior if
    | the caller meets specific preconditions. If the preconditions are
    | violated, the behavior is undefined.
    |
    | When is it appropriate to have a narrow contract? Always, when
    | performance, memory footprint, or direct hardware control are
    | paramount. In operating system kernels, embedded systems, real-time
    | applications, and high-performance computing, the overhead of
    | validating every pointer, checking every array bound, and verifying
    | every integer range is unacceptable. C assumes the programmer is
    | competent and knows the state of their own data. Narrow contracts
    | shift the burden of correctness from runtime execution to compile-time
    | reasoning and programmer discipline.
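
    (To make the two terms concrete, here is a small sketch in C. It is
    an illustration under stated assumptions, not anything from the
    talk: `length_wide`, `length_narrow` and `OK` are hypothetical
    names, and `ERR_NULL_PTR` is borrowed from the quoted text.)

```c
#include <stddef.h>

enum { OK = 0, ERR_NULL_PTR = -1 };

/* Wide contract: every input is handled; bad pointers yield an error code. */
static int length_wide(const char *s, size_t *out)
{
    if (s == NULL || out == NULL)
        return ERR_NULL_PTR;
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    *out = n;
    return OK;
}

/* Narrow contract: the caller must pass a valid NUL-terminated string.
   Passing NULL violates the precondition and the behaviour is undefined. */
static size_t length_narrow(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

    The narrow version has no branch for the null case; whether that
    saving matters depends on how often the function is called and
    whether the caller has already established the precondition.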

    Chandler also explained how defining UB for certain operations
    would require less efficient code to be generated.

    | The hardware: Some architectures silently wrap on overflow, some trap
    | and halt the CPU, and some have no concept of the operation at all.
    | Forcing a single, defined behavior (like "always wrap around") would
    | require compilers to insert expensive emulation code on architectures
    | that don't support it natively, destroying C's "trust the hardware"
    | philosophy.
    |
    | Or, consider a loop:
    |
    | for (int i = 0; i < n; i++) {
    | arr[i] = 0;
    | }
    |
    | If out-of-bounds array access had defined behavior, the compiler would
    | have to insert a bounds check ("if (i >= array_length)") on every single
    | iteration. Because out-of-bounds access is UB, the compiler can assume
    | n is always within bounds. This allows it to unroll the loop,
    | vectorize it using SIMD instructions, and process 8 or 16 elements per
    | CPU cycle, yielding massive performance gains.

    Well, there are some tests that can be taken out of loops (as
    in Java), but other tests can't.
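
    (A sketch of the first kind of test, assuming the array length is
    known to the function; `zero_prefix` and its parameters are
    hypothetical names. Because the loop bound is fixed before the
    loop starts, one range check in front of the loop covers every
    iteration.)

```c
#include <stddef.h>

/* Returns 0 on success, -1 if n exceeds the array length. */
static int zero_prefix(int *arr, size_t len, size_t n)
{
    if (n > len)        /* one check, made once, outside the loop */
        return -1;
    for (size_t i = 0; i < n; i++)
        arr[i] = 0;     /* i < n <= len, so every access is in bounds */
    return 0;
}
```

    A test that depends on a value computed inside the loop, such as
    an index loaded from another array on each iteration, cannot be
    moved out this way.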


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Sun Jun 14 22:02:50 2026
    From Newsgroup: comp.lang.c

    On 14/06/2026 16:33, Dan Cross wrote:
    In article <110k0mp$329k6$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    On 13/06/2026 14:02, Dan Cross wrote:
    In article <110ghmv$21vi3$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    [snip]
    As for my '"modern compilers are evil" crowd' comment, there are people
    (not anyone involved in this discussion) who really do fall into that
    camp. I've seen people who are experienced and respected developers
    make all sorts of accusations to compiler developers, claiming they are
    only interested in high scores on synthetic benchmarks and directly
    insulting their motivations and integrity, blaming them for "breaking"
    their code that relied on the effects of some kinds of UB. It is always
    frustrating when you have code that works fine with one compiler
    version, but using another compiler results in failure due to UB in your
    code - especially if writing correct code gives inefficient results with
    the first compiler. And it's fine to say you'd be happier if a
    particular thing that is UB in C were not UB - but it is unreasonable to
    blame compiler developers for implementing the language as it is defined.
    Eh...I think those people have a point.

    Note, I don't think that "modern compilers are evil" (I mean,
    wow, that's a strong word) and I certainly do not think it is
    appropriate to malign the people who write them personally over
    what one does with code.

    I think it is important for tools to be helpful, and it's fine to
    complain if a tool is being directly unhelpful - or ask for improvements
    when you think it could be better.

    Yes.

    But I _do_ think it is fair to say that UB is very easy to fall
    into in C, that programs that have worked correctly (insofar as
    their intended behavior as written) for years can suddenly fail
    because latent UB is treated differently in a point revision of
    a compiler, and that that (as you point out) can be incredibly
    frustrating for the authors.

    It can certainly happen, yes. And I fully sympathise on these few
    occasions when changes to the standard have meant that code that
    previously had defined behaviour, now has different or undefined
    behaviour. (However, I think that for some kinds of code, programmers
    could be better at specifying exactly what standards their code
    requires, and the standards they use when compiling code.)

    But it is important to realise that if you write code with UB, it is
    /your/ mistake - not the mistake of the compiler developers, or the
    mistake of the standards authors. Compiler vendors can (and do!) try to
    help programmers find their mistakes - experience shows, however, that
    many programmers reach first for bug report forms or complaints in
    forums before compiler tools like sanitisers or even enabling warnings
    on their builds.

    Programming in C is a cooperative effort - including the standards
    authors, the compiler vendors, and the C programmers. Each group can
    try to help the others, but each is ultimately responsible for their own
    part.

    Here's the problem that I have with this line of reasoning. C
    is a language that has considerable history; there was a large
    body of C code written before the first standard was ever
    created, in 1988; C was a teenager. And it took many years for
    decent quality ANSI C compilers to be ubiquitous. C could
    legally drink by then.

    "Undefined Behavior", in C, in the manner usually discussed in
    this newsgroup, was introduced with the first standard. That
    means that there is --- still --- a large body of software that
    has "UB" that was put there before UB existed as a thing
    programmers needed to worry about in C.

    Even once it was a part of C, the concept was communicated
    poorly.


    It is certainly the case that C code has been written for a long time.
    And it is certainly the case that some C code was written long ago, and
    is still used on systems today. But I think it is important to keep in
    mind that the solid majority of C code is relatively recent. Very
    little pre-C90 code is ever compiled with modern tools. Code that is
    old and still in use is important code, but modern code and modern tools should not be kept back because of it.

    Maybe there is scope for compilers to have better options for handling
    old code, other than the usual "Use -O0 to avoid optimising on UB"
    solution. You could come a long way with a "treat all variables as
    volatile" flag, for example.

    Some people seem to delight in this, believing precision in
    interpreting the standard in abstruse ways is an expression of
    deep technical expertise; but it really is not.


    Agreed.

    Yes, UB is created by programmers. However, in large systems,
    it may be that it was created inadvertently; someone makes a
    change that subtly invalidates some invariant that an unknown
    caller far away in the code base (or in another one that relies
    on the change via an indirect dependency) depends on, and now
    you've got UB; locally, everything appears correct; but it's the
    combination where the UB manifests.


    That can certainly happen. But that's just bugs in the code. I don't
    see why UB should be considered as something special here. People
    making changes to existing code sometimes misunderstand things, or accidentally break something that worked before. That's life as a
    programmer, and there are techniques to reduce the risk - code reviews, linters, testing regimes, etc. Nothing gives 100% guarantees, and
    everything has to weigh risks, consequences, costs and resources. UB is
    not special here.

    Regehr called out a dichotomy with UB: programmers using a
    language hate it; compiler writers love it.

    I think Regehr has made some good points in his writings, but I do not
    agree with him on everything.

    As a programmer, I am a fan of the concept of UB. I am quite happy with
    the idea that operations have a pre-condition, and that if there is no
    "right answer" for a given input, I should not provide that input. I
    prefer that signed integer arithmetic overflow is UB, and do not want it
    to be wrapping or have some other semantics - to me, it is far clearer
    that way. If I have UB in my code, it's a bug - no different from any
    other bug I might make.

    This example makes little sense to me. If you don't want
    integer overflow, then don't overflow; the techniques for
    avoiding it are pretty well known. But why is it specifically
    better that it is UB, rather than trapping in debug
    builds, or having IB semantics based on the underlying machine?
    It seems to me that the burden on the programmer is the same.
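
    (One of the well-known techniques, sketched in portable C: test
    the operands against the limits before adding, so the addition is
    only performed when it cannot overflow. `add_checked` is a
    hypothetical name, not a standard function.)

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false (leaving
   *out untouched) if the sum would not fit in an int. */
static bool add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;   /* cannot overflow: both limits were checked above */
    return true;
}
```

    The comparisons themselves never overflow, because `INT_MAX - b`
    is only evaluated for positive `b` and `INT_MIN - b` only for
    negative `b`.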


    UB means precisely that I can choose trapping, or IB, or optimising on
    the assumption it does not happen. If signed integer overflow were
    defined as wrapping, then compilers could not put in traps to catch the
    errors because as far as the language is concerned, they are not errors.
    If they are defined as causing traps, then that's the semantics -
    compilers could not optimise code assuming overflow does not happen,
    unless it can prove there is no overflow.

    And making it defined behaviour gives programmers the mistaken idea that
    they don't need to avoid overflow because there is no UB.

    Making this UB is an admission of the blindingly obvious - there is no
    correct answer when signed integer overflow occurs. It tells
    programmers that it is a mistake to let your arithmetic overflow, and it allows tools to help programmers avoid these mistakes, and it allows
    compilers to give programmers the most efficient results from known good
    code rather than adding unnecessary run-time checks that are never
    triggered.

    It is the case that in C, there are some kinds of UB that can be quite
    subtle. However, you rarely need to risk meeting them. Yes, there are
    pitfalls - don't go near them, and they don't matter.

    I disagree. I think almost all non-trivial programs have UB to
    a greater or lesser extent, whether they intend to or not.

    However, it is unfortunately the case that sometimes avoiding UB can be
    costly in performance terms. An example would be if you have need of
    type-punning - perhaps you have a float in memory and you want to access
    it as an uint32_t for some reason. Casting a float * to an uint32_t *
    and using that new pointer is UB. Some compilers will nonetheless
    generate the code you want after such a cast. Some compilers might not,
    depending on details of the rest of the surrounding code, because it is
    UB. A non-UB solution would be to use memcpy(), or a type-punning
    union. For highly optimising compilers, that's fine - the code
    generated by gcc or clang for a memcpy() here is likely to be as
    efficient as you could get - directly reading the float from memory to
    an integer register. For other compilers, however, you might get a call
    to a memcpy() library function in an external DLL, taking orders of
    magnitude more cycles. What is the poor programmer to do? Write code
    that is portable and correct, but very slow with some implementations?
    Write code that "cheats" and is efficient on some implementations but
    might not give the desired results on others? Use pre-processor
    monstrosities to detect different compilers and adapt accordingly? That
    is what I see as the biggest issue resulting from compiler optimisation
    based on UB. I don't know what the "best" answer here is.
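
    (For reference, the memcpy() form mentioned above looks roughly
    like this. It is a sketch that assumes `float` is 32 bits wide;
    `float_bits` is a hypothetical name. Whether the copy compiles
    down to a single register move depends on the compiler and its
    options, as discussed.)

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Returns the object representation of f as a 32-bit unsigned integer,
   without accessing the float through an incompatible pointer type. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

    On an implementation using IEEE 754 binary32 for `float`,
    `float_bits(1.0f)` is 0x3F800000.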

    This is kind of my point. If you need a fast way to convery


    (I think you missed a bit of your answer here?)

    Here's my own vignette: I was chatting with a friend who works
    on LLVM and clang some time ago. I said, "I don't want UB" and
    he replied, "no, you really do." I asked him what he meant and
    he responded that I wanted a compiler that is capable of
    optimizing my program; "sure, but I still don't want UB." We
    went on for a bit, and it became clear that he saw UB as _the_
    vehicle for unlocking optimization.

    I realized that we were not speaking the same language _at all_.
    He and I both wanted a language where we could write programs
    that yield efficient object code. He saw UB as essential for
    that; but what I want is a language with well-defined semantics
    that can be aggressively optimized.

    I too want a language with well-defined semantics that can be
    aggressively optimised. But I do not see UB as a hindrance to that.

    UB is literally the opposite of well-defined.


    I want good definitions of things that should be defined. Things that
    cannot have good definitions, are fine left undefined. A language
    standard should not be trying to define the behaviour of /everything/.

    I am happy knowing that I cannot divide by 0,

    Yup. That should be a trap.

    For some programs, yes. For others, no.


    or find the square root of a negative number (in the real
    domain).

    Yup. That should be a trap.


    For some programs, yes. For others, no.

    I am happy knowing that I cannot add two ints if their sum
    overflows the range of their type,

    Yup. That should be a trap (if you want wrapping semantics, you
    should request it explicitly).

    I agree that wrapping semantics should be something you have to ask for.
    (As an aside, I think it is a mistake for languages to have types that
    have wrapping semantics - it's the operations that should wrap, not the
    types. Zig gets it right by distinguishing between "x + y" and "x +% y".)

    I don't want to pay the price for checks, traps, and limited
    re-arrangements and optimisations when I know my expressions don't
    overflow. But I am also happy to be able to get a trap when I ask for it.
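
    (In C, the nearest equivalent to an explicitly wrapping operation
    is to do the arithmetic in the corresponding unsigned type. A
    sketch, with the hypothetical name `wrapping_add_i32`: the
    unsigned addition wraps by definition, and the conversion back to
    `int32_t` is implementation-defined rather than undefined. GCC
    documents that conversion as reduction modulo 2^32; other
    compilers should be checked.)

```c
#include <stdint.h>

/* Two's complement wrapping addition, requested explicitly at the
   call site instead of being a property of the type. */
static int32_t wrapping_add_i32(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

    Ordinary `a + b` on `int32_t` keeps its usual rules, so only the
    call sites that ask for wrapping get it.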


    and that I cannot call a function with a different number or
    type of parameters than its definition.

    Yup. That should be a compile-time error.


    There I agree entirely. The build model of compiling units to separate
    object files without any information beyond symbol names made sense 50
    years ago - we should be doing far better now. (We /can/ do far better,
    but it requires conventions in the way you write your C code and the
    options used when compiling or linting the program.)

    I have a great deal of difficulty seeing how things could be
    any different, other than in a managed language with significant
    overhead from run-time checks - and that goes against the
    "aggressively optimised" requirement.

    There are existence proofs of other languages that can, and do,
    do these things, and do them well. I hate to keep beating this
    drum, but I think Rust does well here: in safe Rust, UB is a
    compile-time error; in *unsafe* Rust, there are tools to help
    find where programmers violate the language's invariants.


    Certainly it is possible to eliminate a number of things that are UB in
    C. UB that is not necessary, or not useful, is a bad thing in a language.

    But I think it is equally bad to give things a definition simply to be
    able to say there is no UB. It is, IMHO, entirely /wrong/ of a language
    to define integer overflow as wrapping simply so that it is not UB. I
    do not see a guaranteed incorrect result that likely has catastrophic consequences in a program as being better than UB. (I believe Rust
    defines integer overflow as trapping in "debug" mode and wrapping in
    "release" mode, which I think is a horrendous idea.)

    Having "well-defined semantics" does not mean the language should accept
    anything that happens to fit the syntax and grammar rules, or that all
    functions and operations should give a defined result for all inputs.

    I never said that it did.

    I didn't say you said it did :-)


    It means that the set of valid inputs is clearly defined, along with the
    outputs and effects you get when the inputs are valid.

    So I was the one who said "well-defined semantics" and I had a
    specific meaning in mind. Your definition is incomplete with
    respect to that meaning: in addition to what you said, invalid
    inputs should be rejected, either as a compile time error, or by
    generating an exception or panic at runtime. If you want to
    live dangerously and turn the runtime checks off for performance
    reasons, then you get 2's complement behavior for integers or
    whatever the machine does for the others.


    I am all in favour of compile-time checks and rejecting code with errors
    (not just UB) as soon as possible. The "perfect" language is one where
    you really can follow the old Ada saying - if you can make it compile,
    it's ready to ship.

    I don't live dangerously by not having run-time checks on integer
    overflows. I make sure my code does not have them, so checks are
    unnecessary. For some of my code, if it "panicked" somewhere in
    calculations, that would be a disaster - when you have code controlling
    power electronics, a sudden stop can mean short-circuits and components releasing their magic grey smoke.

    Thinking that run-time checks will save you from UB is wishful thinking.
    How are you going to have run-time checks that a pointer parameter
    points to a valid object of the right type? You can check for a
    null-pointer, but that's about it. Some things that are potential UB in
    C are inherent in the type of language - checking for such problems (at compile-time or run-time) needs a language that has a different way of handling objects and pointers so that you cannot have arbitrary pointers
    to arbitrary objects.

    C is not a language suitable for such run-time or compile-time checks -
    it is a language for getting the highest efficiency because the
    programmer takes responsibility for getting things right. You are
    correct that large programs normally have bugs (of which UB is just one
    class) - the risk of bugs goes up with the size of the code base. The corollary is that C is not a language suitable for large programs.

    Rust, I think, reduces the risk of some kinds of bugs. So does C++,
    when used carefully. Most code, however, is best written in languages
    where these issues cannot occur - or at least where checks can be done
    without a measurable impact. For example, if you use Python, you never
    have integer overflow, and you never have invalid pointers.


    (There are plenty of points in the C standards where the wording could
    make the semantics clearer, or where the range of input values could
    easily have been larger - I am not suggesting C is as well-defined as it
    could reasonably be.)

    It's not just that it's nowhere close to being as well-defined
    as it should be, it's because the language as defined permits
    behavior that varies far too widely, specifically because of UB.

    Consider one of the examples you gave: signed integer overflow.
    The standard doesn't say that you _can't_ add two numbers
    together if you overflow, it just says that if you do, the
    language imposes no requirements on the resulting behavior. It
    may trap, it may elide the addition entirely, or it may do it
    and let the result be whatever the underlying machine does.

    That is, the _language_ does not say that it's a bug; it says
    that it's not going to say anything about it at all.


    I'd be happy for the C standard to say that signed integer overflow is a
    bug, or that code is not allowed to overflow its integer arithmetic. I
    would not be happy if it said compilers must trap on the bug or handle
    it in some specific way - what happens when a bug is reached is still
    UB. And if the wording of the standard were changed to call it a "bug"
    rather than "UB", it would make absolutely zero difference to the way I
    write my code.

    This is one reason the committee is trying to rein some of this
    in.

    That, I think, is the tension: there was a fundamental breakdown
    in communication between the users of the language, and those
    defining and implementing it. My subjective sense is that in
    the past few years things are getting somewhat better, but it is
    hard to evolve something as critical and widely used as C.

    Communication between the separate parties is always an issue, and it is
    easy for it to be a one-way street with a language standards committee
    dictating the rules with little attention to feedback, then compiler
    vendors following these rules without listening to the users.

    A challenge here, perhaps, is that users are a very diverse group. How
    much should compiler vendors cater for those that put a lot of effort
    into correctness and want top efficiency, or those that are less
    knowledgeable about the language but want to avoid the consequences of
    their mistakes? What about those working with old code written for
    different compilers with different unwritten rules? It is not easy to
    please everyone.

    I think that's simplistic; not many programmers actively want to
    "avoid the consequences of their mistakes." Do you really
    believe that they do? If so, why?

    It was badly worded - I meant that programmers do not want mistakes that
    they might make to lead to additional problems. We can all appreciate
    and expect that if we make a mistake in code with an incorrect
    calculation, that will give incorrect output, or perhaps a crash in the program. But we hope that it will not lead to corruption of a
    filesystem, or an exploitable security hole - something out of
    proportion with the mistake.


    Conversely, there *is* this kind of machismo attitude among many
    C programmers that it requires a superior intellect to truly
    understand this language, and those who do not (or who make any
    mistake in their understanding) are simply unworthy. I have
    repeatedly observed this over many decades now, and when I see
    it, I think that it is odious.

    In my field, people usually put a lot of effort into writing code simply
    and clearly. You avoid mistakes not by being "clever", but by being
    meticulous and careful. I don't think successful C programming requires
    greater intellect, knowledge or experience compared to other programming
    languages - but it /does/ require an appropriate attitude. You are
    working with sharp knives - pay attention to what you are doing, and
    you'll be fine.


    My experience is that most programmers are highly intelligent,
    capable people. They are not wrong to want behavior they can
    rely on, particularly when things are not obvious, as they
    often are not. They also want a language that requires a less
    lawyerly reading to understand its semantics; that could go the
    way of formality (my preferred approach) or just clearer
    exposition. Either would be preferable to the current state.


    I was avoiding signed integer overflow long before I had read any C
    standards or even knew about the term "UB". Programming in C does not
    need a lawyer's knowledge of the language. It is just like programming in
    any other programming language - use features that you know are correct,
    and if you want to do something and don't know how to do so correctly,
    look it up.

    In fairness, I think the current members of the committee
    recognize this.

    I am not in any way saying that critics of aspects of C (the language,
    the standards, or compiler implementations) should be dismissed or
    despised - merely that the example of loop elimination leading to UB and
    unexpected results is regularly used as "evidence" by those that hold
    extreme positions about C, despite it being very unrealistic for the
    issue to cause problems in real coding practice.

    The kernel I am working on has about 5 million lines of code.
    That code has been evolving for 40 years; some of it predates
    the ISO standards and even the ANSI standard. It has been
    updated for newer compilers, sure, but in some places the
    treatment is surface-level: using ISO-style function prototypes
    and definition syntax, for example. But deep problems remain in
    parts, and constraints on engineering resources couple with
    economic and business pressures so that it's not going to get
    cleaned up any time soon. I'm sure there is UB in it; in fact,
    I know there is. But them's the breaks; and yet, customers are
    using it in production. Because of this, upgrading toolchains
    is laborious and complex, and takes a lot of time, and new
    compilers are (rightly) viewed with suspicion. That is not a
    great situation, but I don't think anyone is angry at the
    compiler people over it.

    I think that is a good way to handle the situation. In my projects, I
    do not normally upgrade or change toolchains. While I think the risk of
    UB is small in my own code, small does not mean non-existent. And for
    my work, generated code that behaves correctly in terms of C semantics
    but has different execution times or code size might also be an issue -
    so changes in toolchains mean a lot of extra testing and qualification.

    Obviously in a production setting tools should be tested and
    qualified. But the danger posed by UB adds unacceptable risk on
    large projects, and the burden for updating a toolchain is too
    high. That is as much an indictment of the language as of any
    particular project.

    As a counter example, there was the Harvey project, which was a
    fork of Plan 9 where the Plan 9 C dialect was replaced with ISO
    C; we accounted for this by having CI build with 6 separate
    compilers; this flushed out a lot of bugs.

    I am surprised that more projects do not adopt canary CI builds
    against newer toolchains.

    In addition, for some microcontrollers the toolchains have relatively
    small user bases and consequently higher risks of unknown bugs in the
    toolchains themselves. Sometimes there are also implementation-specific
    features that change between versions (though that is less of an issue
    these days).

    Fun fact: part of the reason Google got involved in clang and
    LLVM development was because the vendor toolchain for a
    particular microcontroller used in Android phones was buggy and
    would crash (that is, the compiler itself crashed). The
    solution was not to live with it; it was to build a better
    toolchain.


    Buggy toolchains are always a pain. (So is buggy hardware -
    microcontrollers and cpus have their errors too.)

    Google could afford to do that; I recognize not many
    organizations can.

    Unfortunately that's true.


    And just as it's not acceptable to blame compiler writers for
    implementing the language as it is defined, it's not really
    acceptable to blame programmers either; some of the people who
    put the UB there are (literally) dead, and there's just not
    enough time in the day to go clean it all up. I wish there was
    more compassion for that.

    Being dead does not absolve you of the responsibility - the person that
    wrote the code with UB is the person who wrote the code with the UB,
    just like any other bugs. That person wrote the code with the error.

    See above. Those people may well have written the code before C
    was standardized and before UB as we know it now existed. Also,
    by definition UB is not an error.

    It might not be fair to hold it against them - there are a great many
    possible reasons why it was not their fault (typically management is
    more at fault than the coders!). And placing blame is rarely a useful
    exercise - usually it does not matter where the bugs came from, only
    that they are there and need to be fixed or worked around.

    Exactly. The footguns hiding in C code that has worked
    perfectly for decades, dating back to before the standards
    existed, are legion. Caveat emptor.

    _Or_ the code may have been written with careful regard for the
    standard, but something _else_ may have been changed that now
    leads to exposure to UB. For example, perhaps code was written
    that multiplies two numbers, `a*b`; `a` was known to be `unsigned int`
    when written, but `b` is a signed int. But maybe that is hidden
    behind a typedef; some time in the future, the typedef is
    changed so that `a` is now `unsigned short`; perhaps someone
    realized that the domain values never exceed 16 bits and by
    changing the definition some critical structure now fits in a
    single cache line. But also now the type promotion rules kick
    in so that `a*b` happens with the factors as `signed int` and
    there exist values of `a` and `b` where `a*b` overflows: UB.
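
    The promotion hazard described above can be sketched as follows. This is
    only an illustration, assuming a platform where `int` is wider than
    `unsigned short` (common, but not guaranteed); the typedef names
    `width_v1` and `width_v2` and both helper functions are hypothetical.
    C11 `_Generic` reports the type of the product without evaluating it, so
    nothing here actually overflows.

```c
/* Hypothetical typedef before and after the "harmless" size reduction. */
typedef unsigned int   width_v1;
typedef unsigned short width_v2;

/* Classify the type of an expression; _Generic does not evaluate it. */
#define IS_SIGNED_INT(expr) _Generic((expr), int: 1, default: 0)

/* Before the change: b is converted to unsigned int, so a*b is unsigned
   arithmetic, which wraps and has no UB. Returns 0. */
int product_is_signed_v1(void)
{
    width_v1 a = 1;
    int b = 1;
    return IS_SIGNED_INT(a * b);
}

/* After the change: a is promoted to int (when int can hold every
   unsigned short value), so a*b is signed and overflow would be UB. */
int product_is_signed_v2(void)
{
    width_v2 a = 1;
    int b = 1;
    return IS_SIGNED_INT(a * b);
}
```

    The expression `a*b` is spelled identically in both functions; only the
    typedef differs, which is why a review of the multiplication site alone
    would not catch the change.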

    The code had no UB; the change was elsewhere; no one saw this
    because the tests all passed and everything looked ok; then
    someone upgrades the compiler and now things break.

    Whose fault is that?

    There's no simple answer here.

    But one thing is clear to me - "UB" is irrelevant here (and in many of
    your points). It would not matter if everything had fully defined
    behaviour. The point is that something is changed in one part of the
    code that has unexpected consequences in another part of the code. Who
    cares if there is UB or not? The issue is that the code does not work
    as intended or expected. UB can provide situations where you have
    unexpected bugs - but so can all sorts of other things.


    And no, this is not contrived; this is exactly the sort of thing
    that happens on large, long-lived projects.

    As said earlier, C is what it is. I suspect that it will
    continue to make incremental improvements, but we're basically
    stuck with what we have.

    Agreed.

    ...but be careful blaming the programmer.

    Or the language, or the tools.


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Sun Jun 14 21:24:20 2026
    From Newsgroup: comp.lang.c

    ram@zedat.fu-berlin.de (Stefan Ram) writes:
    cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
    I'm not a huge fan of Carruth.

    (Text after "| " below was generated by a chatbot asked to explain
    narrow contracts and the reduction of efficiency by defining UB.)

    (Let me guess: You are not a huge fan of chatbots either!
    Ok, that was easy.)

    Chandler talked about how narrow contracts allow optimizations.

    | - Wide Contract: The function guarantees to handle all possible inputs
    | gracefully, usually by returning an error code or throwing an
    | exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
    |
    | - Narrow Contract: The function only guarantees correct behavior if
    | the caller meets specific preconditions. If the preconditions are
    | violated, the behavior is undefined.
    |
    | When is it appropriate to have a narrow contract? Always, when
    | performance, memory footprint, or direct hardware control are
    | paramount. In operating system kernels, embedded systems, real-time
    | applications, and high-performance computing, the overhead of
    | validating every pointer, checking every array bound, and verifying
    | every integer range is unacceptable.
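
    The distinction quoted above can be sketched in C. This is a minimal
    illustration, not anything from Carruth's talk: the function names and
    the `ERR_OK` code are hypothetical, and `ERR_NULL_PTR` is borrowed from
    the quoted example.

```c
#include <stddef.h>

enum { ERR_OK = 0, ERR_NULL_PTR = -1 };

/* Wide contract: every input is handled; a null pointer yields an
   error code instead of undefined behaviour. */
int sum_wide(const int *values, size_t count, long long *out)
{
    if (values == NULL || out == NULL)
        return ERR_NULL_PTR;
    long long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    *out = total;
    return ERR_OK;
}

/* Narrow contract: the caller must pass a valid array of `count`
   elements. Nothing is checked; violating the precondition is UB. */
long long sum_narrow(const int *values, size_t count)
{
    long long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

    The narrow version has no branch for the null case and no error path
    for the caller to test, which is the saving the quoted text refers to.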

    I have a recollection that a version of IBM's MVS operating
    system did, indeed, validate input and output arguments to kernel
    functions.

    Indeed, Google says it was called MVS/SP and later MVS/XA (Extended Architecture).
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sun Jun 14 15:55:09 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> writes:
    [...]
    UB means precisely that I can choose trapping, or IB, or optimising on
    the assumption it does not happen.

    No, it means that the implementation can make that choice (or allow you
    to make that choice). A conforming compiler could generate code on the
    assumption that signed overflow never happens, and not give the
    programmer any options.

    [...]

    Making this UB is an admission of the blindingly obvious - there is no
    correct answer when signed integer overflow occurs. It tells
    programmers that it is a mistake to let your arithmetic overflow, and
    it allows tools to help programmers avoid these mistakes, and it
    allows compilers to give programmers the most efficient results from
    known good code rather than adding unnecessary run-time checks that
    are never triggered.

    Trapping or raising/throwing an exception on overflow would also be an
    admission of the blindingly obvious. And a sufficiently clever compiler
    can omit some (not all) checks in cases where it can be statically
    proved that overflow doesn't occur, and/or hoist some checks out of
    loops.

    Of course those kinds of checks are not in the "spirit of C".

    [...]

    I am happy knowing that I cannot divide by 0,
    Yup. That should be a trap.

    For some programs, yes. For others, no.

    What's the difference between these programs?

    [...]

    I don't want to pay the price for checks, traps, and limited
    re-arrangements and optimisations when I know my expressions don't
    overflow. But I am also happy to be able to get a trap when I ask for
    it.

    I don't want to pay the price of checking for syntax errors when I know
    my code is syntactically correct. But I never know that, because I'm
    fallible.

    I admit that's not a very strong argument. There are real differences
    between compile-time and run-time checks.

    [...]
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jun 15 10:09:56 2026
    From Newsgroup: comp.lang.c

    On 15/06/2026 00:55, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    UB means precisely that I can choose trapping, or IB, or optimising on
    the assumption it does not happen.

    No, it means that the implementation can make that choice (or allow you
    to make that choice). A conforming compiler could generate code on the
    assumption that signed overflow never happens, and not give the
    programmer any options.

    Sure. But if it were not UB, then a conforming implementation could not
    make such choices or give me such choices. UB does not mean that I
    definitely have such choices (as my poor wording implies), but that
    implementations are able to give me the choice.

    If the standards had said integer overflow was IB, then that puts limits
    on what the compiler can do - and therefore on what it can do to help
    the programmer. Exactly what options it had would depend on the wording
    of the standard, such as whether it required an "implementation-defined
    value" or, like narrowing conversions to signed integer types, "either
    the result is implementation-defined or an implementation-defined signal
    is raised". However, even in that latter case I think it would be more
    confusing for a lot of programmers - many programmers, quite reasonably,
    have an intuition that "UB" means "don't do this" or "this is not legal
    in C". They also have the intuition that "IB" means "this works
    according to the underlying hardware". If the standards had said
    integer overflow was IB, most programmers would immediately assume that
    meant wrapping behaviour.

    More interesting, I think, is the possible future "erroneous behaviour"
    marker. My understanding is that it lets the compiler have traps or
    other run-time detection, or provide unspecified values, while making it
    clear that erroneous behaviour is a result of software bugs.



    [...]

    Making this UB is an admission of the blindingly obvious - there is no
    correct answer when signed integer overflow occurs. It tells
    programmers that it is a mistake to let your arithmetic overflow, and
    it allows tools to help programmers avoid these mistakes, and it
    allows compilers to give programmers the most efficient results from
    known good code rather than adding unnecessary run-time checks that
    are never triggered.

    Trapping or raising/throwing an exception on overflow would also be an
    admission of the blindingly obvious.

    It is obvious - to me, anyway - that signed overflow is a mistake in the
    code. It is trying to do something that cannot be done. What is the
    single-digit sum of 5 and 8? There is no answer. The answer is not 3,
    or 9. Putting your hand in the air and asking the teacher for help
    might be appropriate sometimes, but it is not a correct answer.

    Throwing some kind of exception or trap can definitely be helpful at
    times. And I agree that it would make it obvious that there has been a
    problem detected. But throwing exceptions or traps can cause more
    problems (the Ariane 5 failure was caused by the exception handler, not
    the overflow fault). That does not mean it is better to ignore
    overflows - it means there is no appropriate action that is suitable in
    every situation. I am far from convinced that there is even a
    reasonable choice of default action that could be usefully made.


    And a sufficiently clever compiler
    can omit some (not all) checks in cases where it can be statically
    proved that overflow doesn't occur, and/or hoist some checks out of
    loops.

    Sure - but in practice having strict overflow checks would significantly
    reduce optimisation and re-arrangement possibilities, as well as having
    to include the checks themselves. You might allow non-strict checks in
    some manner (thus allowing optimisations like "a + b - a" reducing to
    just "b"), but I think that might be hard to specify and would reduce
    the debugging help of the checks.
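
    The "a + b - a" case can be made concrete with a small sketch. The
    helper names are hypothetical, and the checks are written by hand rather
    than with any compiler builtin. With a check at every step, the
    expression fails for `a == INT_MAX`, `b == 1` even though the simplified
    result `b` is perfectly representable, so a compiler bound to strict
    checking could not simplify it without changing observable behaviour.

```c
#include <limits.h>
#include <stdbool.h>

/* Strict check: fails whenever the mathematical sum is out of range.
   The range tests themselves never overflow. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *result = a + b;
    return true;
}

/* Strict check for subtraction, same idea. */
bool checked_sub(int a, int b, int *result)
{
    if ((b < 0 && a > INT_MAX + b) || (b > 0 && a < INT_MIN + b))
        return false;
    *result = a - b;
    return true;
}

/* Naive bottom-up evaluation of a + b - a with a check at each step. */
bool strict_a_plus_b_minus_a(int a, int b, int *result)
{
    int sum;
    return checked_add(a, b, &sum) && checked_sub(sum, a, result);
}
```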


    Of course those kinds of checks are not in the "spirit of C".


    Indeed.

    And if we want to move away from the "spirit of C", then I think we
    should move away from the /language/ of C. In C, people do not expect
    exceptions or sudden jumps from their code - they expect that if there
    is checking for errors, it is explicit in the code. In many other
    languages, there is a much clearer understanding that lots of things can
    fail and cause immediate exits from the function - and code is
    (hopefully!) written to handle that.

    [...]

    I am happy knowing that I cannot divide by 0,
    Yup. That should be a trap.

    For some programs, yes. For others, no.

    What's the difference between these programs?

    There are disadvantages in having a trap. It can (depending on
    hardware) mean extra code to detect the zero - usually that run-time
    cost is negligible, but sometimes it is not. It will mean extra code to
    handle the exception - again, often but not always negligible. Those
    costs apply even if the programmer has made sure that division by zero
    never occurs. And if a trap is thrown, what then? I think that a
    programmer that is careful enough to see that a division expression
    might throw, and handle the trap or exception appropriately, is going to
    be careful enough to avoid the problem in the first place. So the trap
    is going to be unexpected and handled badly. A badly handled division
    by zero exception left the USS Yorktown dead in the water for three hours.

    Is it better /not/ to trap? There is no general rule. If you have
    tried to divide by zero, something has gone wrong before the division,
    and there are no good answers to what will go wrong afterwards.
    Sometimes it is possible to do damage limitation - sometimes not.

    The correct way to handle the situation is to avoid it - be sure that
    you are not dividing by zero in the first place. Identify and handle
    the problem where it occurs - when this zero is created, or the
    circumstances leading to that point - rather than trying to do a
    post-mortem after the failed division. And if you are doing that, then
    what benefit is there in having trapping for division by zero? It
    becomes just a waste of effort.
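
    As a minimal sketch of handling the problem where the zero is created
    rather than at the division: the function below (a hypothetical
    `mean_of`, not taken from any real code base) rejects an empty sample
    set up front, so the division that follows can never see a zero divisor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Validate at the boundary: an empty sample set is rejected before any
   arithmetic happens, so the later division needs no check or trap. */
bool mean_of(const int *samples, size_t count, int *mean)
{
    if (samples == NULL || mean == NULL || count == 0)
        return false;

    long long total = 0;            /* wide accumulator for the sum */
    for (size_t i = 0; i < count; i++)
        total += samples[i];

    /* count != 0 here, and the mean of int values always fits in int. */
    *mean = (int)(total / (long long)count);
    return true;
}
```

    The caller learns about the bad input at the point where it can still
    do something sensible about it, instead of in a handler far from the
    cause.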

    (There are other ways of handling such things, like the use of NaNs in
    floating point, or extending your integers with some kind of "invalid"
    indicators.)
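
    One way to picture integers extended with an "invalid" indicator is the
    sketch below. The `maybe_int` type and its functions are hypothetical
    names for illustration; the idea is simply that an invalid operand
    propagates through later operations, much as a NaN does in floating
    point, and is inspected once at the end.

```c
#include <limits.h>
#include <stdbool.h>

typedef struct {
    int  value;
    bool valid;   /* false plays the role of NaN */
} maybe_int;

maybe_int maybe_of(int v)
{
    maybe_int m = { v, true };
    return m;
}

maybe_int maybe_invalid(void)
{
    maybe_int m = { 0, false };
    return m;
}

/* Addition: invalid in, invalid out; overflow also yields invalid. */
maybe_int maybe_add(maybe_int a, maybe_int b)
{
    if (!a.valid || !b.valid)
        return maybe_invalid();
    if ((b.value > 0 && a.value > INT_MAX - b.value) ||
        (b.value < 0 && a.value < INT_MIN - b.value))
        return maybe_invalid();
    return maybe_of(a.value + b.value);
}

/* Division: a zero divisor or INT_MIN / -1 yields invalid. */
maybe_int maybe_div(maybe_int a, maybe_int b)
{
    if (!a.valid || !b.valid || b.value == 0)
        return maybe_invalid();
    if (a.value == INT_MIN && b.value == -1)
        return maybe_invalid();
    return maybe_of(a.value / b.value);
}
```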


    [...]

    I don't want to pay the price for checks, traps, and limited
    re-arrangements and optimisations when I know my expressions don't
    overflow. But I am also happy to be able to get a trap when I ask for
    it.

    I don't want to pay the price of checking for syntax errors when I know
    my code is syntactically correct. But I never know that, because I'm
    fallible.


    Checking for syntax errors is cheap - PC computing power is, in this
    context, pretty much free and unlimited. If I am using a target
    environment where run-time resources are plentiful, I would not be using
    C in the first place.

    I admit that's not a very strong argument. There are real differences
    between compile-time and run-time checks.


    Perhaps I work in a field where that difference is more extreme than for
    many programmers, and I thus feel it more than most.



    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 15 10:43:25 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> wrote:
    On 15/06/2026 00:55, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    <snip>
    [...]
    Making this UB is an admission of the blindingly obvious - there is no
    correct answer when signed integer overflow occurs. It tells
    programmers that it is a mistake to let your arithmetic overflow, and
    it allows tools to help programmers avoid these mistakes, and it
    allows compilers to give programmers the most efficient results from
    known good code rather than adding unnecessary run-time checks that
    are never triggered.

    Trapping or raising/throwing an exception on overflow would also be an
    admission of the blindingly obvious.

    It is obvious - to me, anyway - that signed overflow is a mistake in the
    code. It is trying to do something that cannot be done. What is the
    single-digit sum of 5 and 8? There is no answer. The answer is not 3,
    or 9. Putting your hand in the air and asking the teacher for help
    might be appropriate sometimes, but it is not a correct answer.

    Throwing some kind of exception or trap can definitely be helpful at
    times. And I agree that it would make it obvious that there has been a
    problem detected. But throwing exceptions or traps can cause more
    problems (the Ariane 5 failure was caused by the exception handler, not
    the overflow fault). That does not mean it is better to ignore
    overflows - it means there is no appropriate action that is suitable in
    every situation. I am far from convinced that there is even a
    reasonable choice of default action that could be usefully made.


    And a sufficiently clever compiler
    can omit some (not all) checks in cases where it can be statically
    proved that overflow doesn't occur, and/or hoist some checks out of
    loops.

    Sure - but in practice having strict overflow checks would significantly
    reduce optimisation and re-arrangement possibilities, as well as having
    to include the checks themselves. You might allow non-strict checks in
    some manner (thus allowing optimisations like "a + b - a" reducing to
    just "b"), but I think that might be hard to specify and would reduce
    the debugging help of the checks.

    IMO a reasonable and easy definition is: a computation either delivers
    the mathematically correct result or traps, and it is not allowed to
    trap in cases where naive bottom-up evaluation does not trap.
    More formally, an optimization is not allowed to introduce
    a stronger precondition, but may weaken it.

    <snip>

    The correct way to handle the situation is to avoid it - be sure that
    you are not dividing by zero in the first place. Identify and handle
    the problem where it occurs - when this zero is created, or the
    circumstances leading to that point - rather than trying to do a
    post-mortem after the failed division. And if you are doing that, then
    what benefit is there in having trapping for division by zero? It
    becomes just a waste of effort.

    What is the value of the certification required for some software? If
    the programmer did a good job then the program will work correctly.
    A trap gives assurance that the programmer indeed correctly handled
    a tricky problem. And once you know that computation works
    according to math rules other forms of verification are easier.

    You also seem to have a bias towards real-time control: if you need
    a value just at a given moment, then it is hard to do something
    reasonable. But at least in some control areas there is the
    notion of a "safe state": for example, a working heavy machine
    is dangerous, while a stopped one is usually considered safe. If
    there is a safe state, then anything not expected by the program
    should trigger a transition to the safe state.
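
    The "safe state" idea might look like the sketch below. Everything in
    it is hypothetical (the `controller` type, the state names, the use of a
    division as the computation that can go wrong); it only illustrates the
    pattern that an unexpected condition latches the machine into a stopped
    state instead of producing some arbitrary output.

```c
#include <limits.h>

typedef enum { STATE_RUNNING, STATE_SAFE } machine_state;

typedef struct {
    machine_state state;
    int           setpoint;   /* 0 means "stopped" in this sketch */
} controller;

/* One control step. Any input the program did not anticipate moves the
   controller to the safe state, where it stays until reset elsewhere. */
void controller_step(controller *c, int measured, int scale)
{
    if (c->state == STATE_SAFE)
        return;                                   /* latched */

    if (scale == 0 || (measured == INT_MIN && scale == -1)) {
        c->state    = STATE_SAFE;                 /* unexpected input */
        c->setpoint = 0;
        return;
    }
    c->setpoint = measured / scale;
}
```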

    In general computation, if you need a correct value and have some
    time, there are options which may involve re-doing the computation at
    higher precision, which may get rid of occasional overflows
    and divisions by zero due to overflow. Division by zero may
    be due to bad input data; traps allow identification of
    such data (doing it in another way may be computationally quite
    expensive).
    --
    Waldek Hebisch
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jun 15 16:01:32 2026
    From Newsgroup: comp.lang.c

    On 15/06/2026 12:43, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 15/06/2026 00:55, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    <snip>
    [...]
    Making this UB is an admission of the blindingly obvious - there is no
    correct answer when signed integer overflow occurs. It tells
    programmers that it is a mistake to let your arithmetic overflow, and
    it allows tools to help programmers avoid these mistakes, and it
    allows compilers to give programmers the most efficient results from
    known good code rather than adding unnecessary run-time checks that
    are never triggered.

    Trapping or raising/throwing an exception on overflow would also be an
    admission of the blindingly obvious.

    It is obvious - to me, anyway - that signed overflow is a mistake in the
    code. It is trying to do something that cannot be done. What is the
    single-digit sum of 5 and 8? There is no answer. The answer is not 3,
    or 9. Putting your hand in the air and asking the teacher for help
    might be appropriate sometimes, but it is not a correct answer.

    Throwing some kind of exception or trap can definitely be helpful at
    times. And I agree that it would make it obvious that there has been a
    problem detected. But throwing exceptions or traps can cause more
    problems (the Ariane 5 failure was caused by the exception handler, not
    the overflow fault). That does not mean it is better to ignore
    overflows - it means there is no appropriate action that is suitable in
    every situation. I am far from convinced that there is even a
    reasonable choice of default action that could be usefully made.


    And a sufficiently clever compiler
    can omit some (not all) checks in cases where it can be statically
    proved that overflow doesn't occur, and/or hoist some checks out of
    loops.

    Sure - but in practice having strict overflow checks would significantly
    reduce optimisation and re-arrangement possibilities, as well as having
    to include the checks themselves. You might allow non-strict checks in
    some manner (thus allowing optimisations like "a + b - a" reducing to
    just "b"), but I think that might be hard to specify and would reduce
    the debugging help of the checks.

    IMO a reasonable and easy definition is: a computation either delivers
    the mathematically correct result or traps, and it is not allowed to
    trap in cases where naive bottom-up evaluation does not trap.
    More formally, an optimization is not allowed to introduce
    a stronger precondition, but may weaken it.


    It is always the case that an implementation can weaken preconditions
    and strengthen postconditions and remain correct - though it might then
    be less efficient than you expect. But if you are /requiring/ a weaker
    precondition and /requiring/ a strong postcondition - such as by
    insisting on traps on overflow - you are changing the function or
    operation specification, and it is not necessarily a good thing.

    In C, the integer addition operation "c = a + b;" has a precondition :

    (a + b) <= INT_MAX, (a + b) >= INT_MIN

    It has the postcondition :

    c == a + b

    Saying that it must trap if there is overflow weakens the precondition
    to any "a" and "b", but makes the postcondition much more complicated.
    It means it is no longer true that the result of an addition operation
    is the sum of the operands. Addition is no longer a "pure" function -
    now it has side-effects that are completely unpredictable at the site of
    use. Programmers can no longer rely on the timing of the operation,
    stack usage, interaction with other code, or even that the operation
    ever finishes.
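
    The precondition and postcondition above can be written down directly.
    This is a sketch with hypothetical function names; the range test is
    arranged so that it never overflows itself, and `assert` is used only to
    document the narrow contract in debug builds (it disappears under
    `NDEBUG`).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* True exactly when the mathematical sum a + b lies in
   [INT_MIN, INT_MAX]. The subtraction on the right cannot overflow
   because of the sign test that selects it. */
bool add_precondition_holds(int a, int b)
{
    return (b >= 0) ? (a <= INT_MAX - b) : (a >= INT_MIN - b);
}

/* Narrow-contract addition: the caller guarantees the precondition.
   Postcondition: the result equals the mathematical sum. */
int add(int a, int b)
{
    assert(add_precondition_holds(a, b));
    return a + b;
}
```

    With the precondition established by the caller, `add` stays a pure
    function of its operands; a trapping variant would instead need its
    failure path described as part of the postcondition.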

    If your code is correct, and overflow never happens, then this is all a
    big disadvantage in terms of understanding and analysing the code. And
    it does not in any way reduce the effort needed to be sure that your
    inputs are appropriate for getting the desired results of the operation.

    Trapping like this can certainly be useful for debugging. But as a
    general feature it gives a false sense of security, complicates
    mathematical analysis, introduces massive additional possible code path
    choices which are either real or almost certainly untested in practice,
    or not real (because the compiler can see they are not taken) and
    untestable. That is not qualitatively worse than "who knows what will
    happen" UB, but it is not significantly better.


    <snip>

    The correct way to handle the situation is to avoid it - be sure that
    you are not dividing by zero in the first place. Identify and handle
    the problem where it occurs - when this zero is created, or the
    circumstances leading to that point - rather than trying to do a
    post-mortem after the failed division. And if you are doing that, then
    what benefit is there in having trapping for division by zero? It
    becomes just a waste of effort.

    What is the value of the certification required for some software? If
    the programmer did a good job then the program will work correctly.

    Yes.

    A trap gives assurance that the programmer indeed correctly handled
    a tricky problem.

    No, it certainly does not. And one of the reasons to dislike traps is
    that it makes people think like that. A trap can only happen if the
    programmer did /not/ handle the problem correctly. And I expect that if
    the programmer is able to write an appropriate specific trap handler for
    the failing expression (rather than a program-global "crash with error
    message" handler), then he/she would be able to avoid the problem in the
    first place.

    Sometimes, of course, you are trying to write code that has some input
    which is supposed to be correct, but you are not sure - and you can't
    change the calling code. How you handle that situation will depend on
    the program and the situation. But I don't see trapping as "correct
    handling" unless the whole program is written with the expectation of
    traps for error handling. You might, however, end up deciding that
    trapping is the least bad option.


    And once you know that computation works
    according to math rules other forms of verification are easier.

    You also seem to have a bias towards real-time control: if you need
    a value just at a given moment, then it is hard to do something
    reasonable. But at least in some control areas there is the
    notion of a "safe state": for example, a working heavy machine
    is dangerous, while a stopped one is usually considered safe. If
    there is a safe state, then anything not expected by the program
    should trigger a transition to the safe state.

    I think if you are /not/ concerned with high efficiency in the code,
    then you should be seriously questioning the choice of C as the language
    in the first place. And even if you use C, there are often things you
    can do to avoid having problems in the first place. The obvious one for
    integer overflow is to make more use of bigger types.
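
    A minimal sketch of the "bigger types" approach, using the fixed-width
    types from `<stdint.h>`; the function names are hypothetical. The
    product of any two 32-bit values has magnitude at most 2^62, so it
    always fits in 64 bits and the multiplication has no overflow case at
    all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multiply two 32-bit values in 64-bit arithmetic; cannot overflow. */
int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}

/* Decide separately whether a wide result may be narrowed again. */
bool fits_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

    The range question does not disappear, but it moves to one explicit
    narrowing step instead of being spread over every intermediate
    operation.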


    In general computation, if you need a correct value and have some
    time, there are options which may involve re-doing the computation at
    higher precision, which may get rid of occasional overflows
    and divisions by zero due to overflow. Division by zero may
    be due to bad input data; traps allow identification of
    such data (doing it in another way may be computationally quite
    expensive).


    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 15 17:52:09 2026
    From Newsgroup: comp.lang.c

    In article <8_EXR.112952$Mm3.81340@fx33.iad>,
    Scott Lurndal <slp53@pacbell.net> wrote:
    ram@zedat.fu-berlin.de (Stefan Ram) writes:
    cross@spitfire.i.gajendra.net (Dan Cross) wrote or quoted:
    I'm not a huge fan of Carruth.

    (Text after "| " below was generated by a chatbot asked to explain
    narrow contracts and the reduction of efficiency by defining UB.)

    (Let me guess: You are not a huge fan of chatbots either!
    Ok, that was easy.)

    Chandler talked about how narrow contracts allow optimizations.

    | - Wide Contract: The function guarantees to handle all possible inputs
    | gracefully, usually by returning an error code or throwing an
    | exception. (e.g., "If the pointer is null, return ERR_NULL_PTR").
    |
    | - Narrow Contract: The function only guarantees correct behavior if
    | the caller meets specific preconditions. If the preconditions are
    | violated, the behavior is undefined.
    |
    | When is it appropriate to have a narrow contract? Always, when
    | performance, memory footprint, or direct hardware control are
    | paramount. In operating system kernels, embedded systems, real-time
    | applications, and high-performance computing, the overhead of
    | validating every pointer, checking every array bound, and verifying
    | every integer range is unacceptable.

    I have a recollection that a version of IBM's MVS operating
    system did, indeed, validate input and output arguments to kernel
    functions.

    Indeed, Google says it was called MVS/SP and later MVS/XA (Extended
    Architecture).

    The Midori folks at Microsoft added bounds checking to all array
    accesses in M# (the safe language they wrote Midori in). They
    expected performance to be awful; when they provided it, the
    overhead was pretty much undetectable: the cost was in the
    noise.

    - Dan C.

  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 15 17:57:31 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> wrote:
    On 15/06/2026 12:43, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 15/06/2026 00:55, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    <snip>
    [...]
    Making this UB is an admission of the blindingly obvious - there is no
    correct answer when signed integer overflow occurs. It tells
    programmers that it is a mistake to let your arithmetic overflow, and
    it allows tools to help programmers avoid these mistakes, and it
    allows compilers to give programmers the most efficient results from
    known good code rather than adding unnecessary run-time checks that
    are never triggered.

    Trapping or raising/throwing an exception on overflow would also be an
    admission of the blindingly obvious.

    It is obvious - to me, anyway - that signed overflow is a mistake in the
    code. It is trying to do something that cannot be done. What is the
    single-digit sum of 5 and 8? There is no answer. The answer is not 3,
    or 9. Putting your hand in the air and asking the teacher for help
    might be appropriate sometimes, but it is not a correct answer.

    Throwing some kind of exception or trap can definitely be helpful at
    times. And I agree that it would make it obvious that there has been a
    problem detected. But throwing exceptions or traps can cause more
    problems (the Ariane 5 failure was caused by the exception handler, not
    the overflow fault). That does not mean it is better to ignore
    overflows - it means there is no appropriate action that is suitable in
    every situation. I am far from convinced that there is even a
    reasonable choice of default action that could be usefully made.


    And a sufficiently clever compiler
    can omit some (not all) checks in cases where it can be statically
    proved that overflow doesn't occur, and/or hoist some checks out of
    loops.

    Sure - but in practice having strict overflow checks would significantly
    reduce optimisation and re-arrangement possibilities, as well as having
    to include the checks themselves. You might allow non-strict checks in
    some manner (thus allowing optimisations like "a + b - a" reducing to
    just "b"), but I think that might be hard to specify and would reduce
    the debugging help of the checks.

    IMO a reasonable and easy definition is: a computation either delivers
    the mathematically correct result or traps, and it is not allowed to
    trap in cases where naive bottom-up evaluation does not trap.
    More formally, optimization is not allowed to introduce
    a stronger precondition, but may weaken it.


    It is always the case that an implementation can weaken preconditions
    and strengthen postconditions and remain correct - though it might then
    be less efficient than you expect. But if you are /requiring/ a weaker
    precondition and /requiring/ a strong postcondition - such as by
    insisting on traps on overflow - you are changing the function or
    operation specification, and it is not necessarily a good thing.

    In C, the integer addition operation "c = a + b;" has a precondition :

    (a + b) <= INT_MAX, (a + b) >= INT_MIN

    It has the postcondition :

    c == a + b

    Saying that it must trap if there is overflow weakens the precondition
    to any "a" and "b", but makes the postcondition much more complicated.

    No. Precondition is the same. Postcondition has additional term
    "computation finished with no traps".
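    As a minimal sketch of the two positions above: the precondition on
    "c = a + b" can be written as a test that never overflows itself, and
    a trapping addition can be layered on top of it. The names
    add_precondition_holds and trapping_add are hypothetical, and abort()
    only stands in for whatever trap mechanism an implementation might
    actually provide.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Precondition of "c = a + b": the mathematical sum fits in an int.
   The test is arranged so that it never performs an overflowing
   operation itself. */
static bool add_precondition_holds(int a, int b)
{
    if (b > 0)
        return a <= INT_MAX - b;
    if (b < 0)
        return a >= INT_MIN - b;
    return true;
}

/* Hypothetical trapping addition: abort() stands in for a trap.
   If the function returns at all, the result is the true sum. */
static int trapping_add(int a, int b)
{
    if (!add_precondition_holds(a, b))
        abort();
    return a + b;
}
```

    With this shape, a caller that gets a value back from trapping_add
    knows the postcondition c == a + b holds; what it no longer knows is
    whether the call returns.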

    It means it is no longer true that the result of an addition operation
    is the sum of the operands.

    The opposite of that: no traps means that regardless of precondition
    the result of an addition operation is the sum of the operands.

    Addition is no longer a "pure" function -
    now it has side-effects that are completely unpredictable at the site of
    use. Programmers can no longer rely on the timing of the operation,
    stack usage, interaction with other code, or even that the operation
    ever finishes.

    The difference is that without traps programmers do not know if
    arithmetic operations give correct result. With traps they do
    not know if program will successfully finish, but if it
    finishes they know that arithmetic gave correct results.

    If your code is correct, and overflow never happens, then this is all a
    big disadvantage in terms of understanding and analysing the code. And
    it does not in any way reduce the effort needed to be sure that your
    inputs are appropriate for getting the desired results of the operation.

    One needs to use correct formulas, there is no way around that.
    Without traps the programmer must analyse the ranges of all intermediate
    expressions. That is tedious and error prone. People work
    around that by activating traps during testing, but it is
    quite hard to find worst case values, so errors may be
    easily missed during testing. Having traps active during
    production runs means that you may discover problem. You
    apparently think that ignoring possible problems at
    runtime is good thing. For simple programs you may analyze
    it well enough to be sure that nothing bad happens at
    runtime, but in general computing we use a lot of "interesting"
    programs which are too complex to analyse. We hope that
    they will run OK, but have no proof. Sometimes hope is
    based on statistical tests and on low probability input
    program may fail. Traps are useful to make sure that
    wrong results will not propagate further.

    Trapping like this can certainly be useful for debugging. But as a
    general feature it gives a false sense of security, complicates
    mathematical analysis, introduces massive additional possible code path
    choices which are either real and almost certainly untested in practice,
    or not real (because the compiler can see they are not taken) and
    untestable.

    You get extra code paths only if you attempt to handle traps.
    Trapping of overflows gives you assurance that in computation that
    you did and which finished with no traps there were no errors of
    certain kind (that is wrong results due to overflow). That is
    really not different than insistence on static types. Neither
    assures you of no bugs, but each tells you that some bugs
    did not happen. Of course, trapping at runtime is less
    satisfactory than compile time checking, but tight a priori
    bounds on ranges are notoriously hard to obtain, so trapping
    is the best we can have for high performance software with
    current state of art.

    That is not qualitatively worse than "who knows what will
    happen" UB, but it is not significantly better.


    <snip>

    The correct way to handle the situation is to avoid it - be sure that
    you are not dividing by zero in the first place. Identify and handle
    the problem where it occurs - when this zero is created, or the
    circumstances leading to that point - rather than trying to do a
    post-mortem after the failed division. And if you are doing that, then
    what benefit is there in having trapping for division by zero? It
    becomes just a waste of effort.

    What is value of certification required for some software? If
    programmer did good job then program will work correctly.

    Yes.

    Traps give assurance that the programmer indeed correctly handled
    a tricky problem.

    No, it certainly does not. And one of the reasons to dislike traps is
    that it makes people think like that. A trap can only happen if the
    programmer did /not/ handle the problem correctly.

    Yes.

    And I expect that if
    the programmer is able to write an appropriate specific trap handler for
    the failing expression (rather than a program-global "crash with error
    message" handler), then he/she would be able to avoid the problem in the
    first place.

    Rather non-specific trap handler could work as "redo the computation
    in arbitrary precision". If problem (like division by zero) persists,
    then there is logic bug, otherwise it means that precision was
    inadequate and problem is resolved.
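    A rough sketch of that kind of non-specific handler, under loud
    assumptions: a 64-bit redo stands in for true arbitrary precision,
    the computation is just a + b - c, and the names sum_status and
    add_then_sub are hypothetical. The fast path works in 32 bits and
    checks before each step; the fallback redoes the whole expression in
    a wider type and only reports failure if the true result still does
    not fit.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SUM_FAST, SUM_REDONE, SUM_FAILED } sum_status;

/* True if x + y would overflow int32_t; the test itself cannot overflow. */
static bool add_overflows(int32_t x, int32_t y)
{
    return (y > 0) ? (x > INT32_MAX - y) : (x < INT32_MIN - y);
}

/* True if x - y would overflow int32_t; the test itself cannot overflow. */
static bool sub_overflows(int32_t x, int32_t y)
{
    return (y < 0) ? (x > INT32_MAX + y) : (x < INT32_MIN + y);
}

/* Computes a + b - c. The narrow path is tried first; if an
   intermediate would overflow, the computation is redone in 64 bits
   (a stand-in for "arbitrary precision" in this sketch). */
static sum_status add_then_sub(int32_t a, int32_t b, int32_t c, int32_t *out)
{
    if (!add_overflows(a, b)) {
        int32_t s = a + b;
        if (!sub_overflows(s, c)) {
            *out = s - c;
            return SUM_FAST;
        }
    }

    /* "Trap handler": redo at higher precision. */
    int64_t wide = (int64_t)a + b - c;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return SUM_FAILED;   /* problem persists: not a precision issue */

    *out = (int32_t)wide;
    return SUM_REDONE;       /* only the intermediate was too large */
}
```

    For example, add_then_sub(INT32_MAX, 1, 1, &r) overflows in the
    intermediate sum but has a representable final value, so the redo
    resolves it; add_then_sub(INT32_MAX, 1, 0, &r) does not fit even in
    the wider computation and is reported as a failure.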

    However, you should think about such traps similarly to a parity error
    which can be signaled by some hardware. There is a low but nonzero
    probability that such an error can occur. A parity check gives you
    a reasonable chance to detect it. Handling is at least as problematic
    as with overflow. Absence of traps gives you less info: no
    overflow traps means no overflow, no parity traps means that
    parity was correct, but the intent of a parity check is to discover bit
    errors and they are possible even with correct parity. So, do you
    think that parity checks inside MCUs are useless?

    Sometimes, of course, you are trying to write code that has some input
    which is supposed to be correct, but you are not sure - and you can't
    change the calling code. How you handle that situation will depend on
    the program and the situation. But I don't see trapping as "correct
    handling" unless the whole program is written with the expectation of
    traps for error handling. You might, however, end up deciding that
    trapping is the least bad option.


    And once you know that computation works
    according to math rules other forms of verification are easier.

    You also seem to have bias to real time control: if you need
    value just at given moment, then it is hard to do something
    reasonable. But at least in some control areas there is
    notion of "safe state", for example working heavy machine
    is dangerous, stopped one usually is considered safe. If
    there is safe state, then anything not expected by program
    should trigger transition to safe state.

    I think if you are /not/ concerned with high efficiency in the code,

    Well, if efficiency does not matter traps can be implemented as
    a software layer above the language. Or one can use arbitrary
    precision arithmetic. Traps matter when efficiency matters,
    so they should be implemented in place giving best efficiency,
    at best in CPU and if that is not possible then in optimizing
    compiler.

    then you should be seriously questioning the choice of C as the language
    in the first place. And even if you use C, there are often things you
    can do to avoid having problems in the first place. The obvious one for
    integer overflow is to make more use of bigger types.

    Which may be best choice if efficiency is not important. But
    some calculations require surprisingly large accuracy to avoid
    overflow. Worse, in vast majority of cases lower accuracy
    may be adequate, so there is pressure to use "sufficient"
    accuracy overlooking special cases.

    In general computation, if you need a correct value and have some
    time there are options which may involve re-doing the computation at
    higher precision, which may get rid of occasional overflows
    and divisions by zero due to overflow. Division by zero may
    be due to bad input data; traps allow identification of
    such data (doing it in another way may be computationally quite
    expensive).


    --
    Waldek Hebisch
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 15 19:26:16 2026
    From Newsgroup: comp.lang.c

    Prefatory: I think we're largely in agreement; I'll just add a
    few notes, but snip most of the rest.

    In article <110n1db$3sbck$1@dont-email.me>,
    David Brown <david.brown@hesbynett.no> wrote:
    [snip]
    Maybe there is scope for compilers to have better options for handling
    old code, other than the usual "Use -O0 to avoid optimising on UB"
    solution. You could come a long way with a "treat all variables as
    volatile" flag, for example.

    The problem is, the language doesn't make any guarantees here,
    and the compilers get to decide. If you're lucky, the compiler
    gives you some control via flags or pragmas or something, but
    if you're not lucky, it doesn't and the guarantees you can rely
    on are just too weak.

    [snip]
    That can certainly happen. But that's just bugs in the code. I don't
    see why UB should be considered as something special here.

    Because unlike many bugs, which are clearly bugs, UB is just the
    absence of defined behavior. So what a program does when it
    executes can change in subtle ways with no changes to the code,
    only changes to the compiler or how it is invoked.

    People making changes to existing code sometimes misunderstand things, or
    accidentally break something that worked before. That's life as a
    programmer, and there are techniques to reduce the risk - code reviews,
    linters, testing regimes, etc. Nothing gives 100% guarantees, and
    everything has to weigh risks, consequences, costs and resources. UB is
    not special here.

    Yes. My point with this line is that UB doesn't show up because
    programmers are just careless, and "just write code more
    carefully" doesn't scale any better than, "have you tried just
    writing code without bugs?"

    UB means precisely that I can choose trapping, or IB, or optimising on
    the assumption it does not happen. If signed integer overflow were
    defined as wrapping, then compilers could not put in traps to catch the
    errors because as far as the language is concerned, they are not errors.

    If "you" means the compiler, then sure. If "you" means the
    programmer, then you are lucky if you get to choose that, but
    it is not guaranteed that you will have that kind of flexibility
    available.

    If they are defined as causing traps, then that's the semantics -
    compilers could not optimise code assuming overflow does not happen,
    unless it can prove there is no overflow.

    And making it defined behaviour gives programmers the mistaken idea that
    they don't need to avoid overflow because there is no UB.

    Making this UB is an admission of the blindingly obvious - there is no
    correct answer when signed integer overflow occurs. It tells
    programmers that it is a mistake to let your arithmetic overflow, and it
    allows tools to help programmers avoid these mistakes, and it allows
    compilers to give programmers the most efficient results from known good
    code rather than adding unnecessary run-time checks that are never
    triggered.

    But it doesn't say that. It says, "no guarantees; whatever
    happens happens."

    This is the thing: the correct answer is whatever the language
    defines it to be. The language could say, "this is an error" or
    it could say, "we do whatever the hardware does." But making it
    UB isn't a statement of anything. UB is a refusal to make a
    statement.

    [snip]
    (I think you missed a bit of your answer here?)

    (I did, but I was just going to say something about memcpy; it
    wasn't that interesting. :-/)

    [snip]
    I realized that we were not speaking the same language _at all_.
    He and I both wanted a language where we could write programs
    that yield efficient object code. He saw UB as essential for
    that; but what I want is a language with well-defined semantics
    that can be aggressively optimized.

    I too want a language with well-defined semantics that can be
    aggressively optimised. But I do not see UB as a hindrance to that.

    UB is literally the opposite of well-defined.

    I want good definitions of things that should be defined. Things that
    cannot have good definitions, are fine left undefined. A language
    standard should not be trying to define the behaviour of /everything/.

    I accept that there will be some number of things that one
    cannot reasonably define when creating a programming language.
    But that set should be small;

    I am happy knowing that I cannot divide by 0,

    Yup. That should be a trap.

    For some programs, yes. For others, no.

    No. I don't accept that division by zero is ever acceptable in
    a real program. What purpose would be served by _not_ trapping?
    Most hardware will do it anyway.

    or find the square root of a negative number (in the real
    domain).

    Yup. That should be a trap.

    For some programs, yes. For others, no.

    Same as above. If you want a NaN to be a possibility, you should
    use an operation that lets you get that, `unchecked_sqrt()` or
    something.

    I am happy knowing that I cannot add two ints if their sum
    overflows the range of their type,

    Yup. That should be a trap (if you want wrapping semantics, you
    should request it explicitly).

    I agree that wrapping semantics should be something you have to ask for.
    (As an aside, I think it is a mistake for languages to have types that
    have wrapping semantics - it's the operations that should wrap, not the
    types. Zig gets it right by distinguishing between "x + y" and "x +% y".)

    Yes. Rust has this as well, in `.wrapping_add()` et al.
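    For comparison, the same "wrapping is an operation, not a type" idea
    can be sketched in C as an explicit function. This is only an
    illustration: wrapping_add is a hypothetical name, and the code
    assumes a two's complement int with UINT_MAX == 2 * INT_MAX + 1. The
    sum is formed in unsigned arithmetic, where wraparound is defined,
    and then mapped back to int without relying on an out-of-range
    conversion.

```c
#include <limits.h>

/* Explicitly requested wrapping addition on int.
   Assumes two's complement and UINT_MAX == 2 * INT_MAX + 1. */
static int wrapping_add(int a, int b)
{
    /* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1. */
    unsigned int u = (unsigned int)a + (unsigned int)b;

    /* Values up to INT_MAX convert to int unchanged. */
    if (u <= (unsigned int)INT_MAX)
        return (int)u;

    /* Larger values represent negative numbers: u - (UINT_MAX + 1),
       computed so that every intermediate stays within int range. */
    return -(int)(UINT_MAX - u) - 1;
}
```

    Ordinary "a + b" keeps its usual meaning elsewhere in the program;
    only call sites that name wrapping_add ask for modular behaviour.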

    I don't want to pay the price for checks, traps, and limited
    re-arrangements and optimisations when I know my expressions don't
    overflow. But I am also happy to be able to get a trap when I ask for it.

    Then the language should give you the ability to explicitly ask
    for the unchecked versions of those operations.

    But I think it is equally bad to give things a definition simply to be
    able to say there is no UB.

    I'm not suggesting that one should do that. What I'm saying is
    that it is possible to conceive of a language that lets you
    write robust, complex programs with strong guarantees about the
    behavior of code, without UB. That doesn't mean that the
    language is devoid of all notions of undefined behavior, but
    rather that unless you ask for it, using UB is an error.

    It is, IMHO, entirely /wrong/ of a language
    to define integer overflow as wrapping simply so that it is not UB. I
    do not see a guaranteed incorrect result that likely has catastrophic
    consequences in a program as being better than UB.

    We've discussed this before, and I understand your perspective
    on it, but I feel it necessary to reiterate that I do not share
    that perspective.

    Defining arithmetic to be modular is perfectly acceptable. It
    is not "wrong". Defining arithmetic on explicitly sized types
    to use 2's complement semantics similarly. C defined arithmetic
    overflow for signed types to be UB because when it was
    standardized, machines existed that had different behavior and
    representations for signed types. Why didn't they make it IB?
    I don't know.

    The world is different now.

    (I believe Rust
    defines integer overflow as trapping in "debug" mode and wrapping in
    "release" mode, which I think is a horrendous idea.)

    I agree that's kind of a wart. It's basically what you get with
    UB in C.

    In my opinion, the right call is providing an `unchecked_add`
    and forcing the caller to wrap that in an `unsafe` block, while
    normal `+` is always checked unless the compiler can deduce that
    overflow cannot happen.

    https://doc.rust-lang.org/std/primitive.u32.html#method.unchecked_add

    So I was the one who said "well-defined semantics" and I had a
    specific meaning in mind. Your definition is incomplete with
    respect to that meaning: in addition to what you said, invalid
    inputs should be rejected, either as a compile time error, or by
    generating an exception or panic at runtime. If you want to
    live dangerously and turn the runtime checks off for performance
    reasons, then you get 2's complement behavior for integers or
    whatever the machine does for the others.

    I am all in favour of compile-time checks and rejecting code with errors
    (not just UB) as soon as possible. The "perfect" language is one where
    you really can follow the old Ada saying - if you can make it compile,
    it's ready to ship.

    I don't live dangerously by not having run-time checks on integer
    overflows. I make sure my code does not have them, so checks are
    unnecessary. For some of my code, if it "panicked" somewhere in
    calculations, that would be a disaster - when you have code controlling
    power electronics, a sudden stop can mean short-circuits and components
    releasing their magic grey smoke.

    This doesn't follow. If you have validated that the code cannot
    overflow, and you are confident in that, then the code won't
    panic due to overflow. So arguing against the validation seems
    superfluous.

    And of course if the compiler can validate that your code is
    free of overflow (perhaps by examining your checks) then it
    needn't insert the checks, so there is no runtime overhead.

    Thinking that run-time checks will save you from UB is wishful thinking.
    How are you going to have run-time checks that a pointer parameter
    points to a valid object of the right type?

    In strongly-typed languages with non-nullable references and
    lifetimes as a first-class property of an object, the compiler
    does that for you, statically, at compile-time.

    You can check for a
    null-pointer, but that's about it. Some things that are potential UB in
    C are inherent in the type of language - checking for such problems (at
    compile-time or run-time) needs a language that has a different way of
    handling objects and pointers so that you cannot have arbitrary pointers
    to arbitrary objects.

    C is not a language suitable for such run-time or compile-time checks -

    I agree.

    it is a language for getting the highest efficiency because the
    programmer takes responsibility for getting things right.

    Paradoxically, this is not true. Consider pointers: because
    they can be invalid, they have to be checked before dereference.
    Contrast to non-nullable references in e.g. Rust; since their
    mere existence implies that they refer to a valid object, they
    do not need to be checked for nullity, misalignment, etc. Thus,
    the better-defined language with stronger guarantees can afford
    opportunities for optimization that don't exist in the
    lower-level language riddled with UB.

    You are
    correct that large programs normally have bugs (of which UB is just one
    class) - the risk of bugs goes up with the size of the code base. The
    corollary is that C is not a language suitable for large programs.

    Sadly, I now agree.

    Rust, I think, reduces the risk of some kinds of bugs. So does C++,
    when used carefully. Most code, however, is best written in languages
    where these issues cannot occur - or at least where checks can be done
    without a measurable impact. For example, if you use Python, you never
    have integer overflow, and you never have invalid pointers.

    If you use Rust, and restrict yourself as far as practical to
    the safe subset, you never have invalid pointers, either. Nor
    do you have uninitialized variables, or double-frees, or data
    races. Entire categories of problems --- and their expensive
    runtime checks --- are simply eliminated.

    [snip]
    Consider one of the examples you gave: signed integer overflow.
    The standard doesn't say that you _can't_ add two numbers
    together if you overflow, it just says that if you do, the
    language imposes no requirements on the resulting behavior. It
    may trap, it may elide the addition entirely, or it may do it
    and let the result be whatever the underlying machine does.

    That is, the _language_ does not say that it's a bug; it says
    that it's not going to say anything about it at all.

    I'd be happy for the C standard to say that signed integer overflow is a
    bug, or that code is not allowed to overflow its integer arithmetic. I
    would not be happy if it said compilers must trap on the bug or handle
    it in some specific way - what happens when a bug is reached is still
    UB. And if the wording of the standard were changed to call it a "bug"
    rather than "UB", it would make absolutely zero difference to the way I
    write my code.

    This is an example of two people who are not sharing a
    vocabulary around UB. I have no real commentary on that; I just
    think it is interesting.

    [snip]
    In my field, people usually put a lot of effort into writing code simply
    and clearly. You avoid mistakes not by being "clever", but by being
    meticulous and careful. I don't think successful C programming requires
    greater intellect, knowledge or experience compared to other programming
    languages - but it /does/ require an appropriate attitude. You are
    working with sharp knives - pay attention to what you are doing, and
    you'll be fine.

    50 years of experience shows us that that simply isn't true.
    "Pay attention" and "be careful" just don't work.

    My experience is that most programmers are highly intelligent,
    capable people. They are not wrong to want behavior they can
    rely on, particularly when things are not obvious, as they
    often are not. They also want a language that requires a less
    lawyerly reading to understand its semantics; that could go the
    way of formality (my preferred approach) or just clearer
    exposition. Either would be preferable to the current state.

    I was avoiding signed integer overflow long before I had read any C
    standards or even knew about the term "UB". Programming in C does not
    need a lawyer knowledge of the language. It is just like programming in
    any other programming language - use features that you know are correct,
    and if you want to do something and don't know how to do so correctly,
    look it up.

    Right. But the issue is that the source of truth, the standard,
    is ambiguous in places and opaque in others. Sussing out the
    true semantics of a thing can mean cross-referencing half a dozen
    different places, and this newsgroup sees cases where people who
    are clearly intelligent, and who have an aptitude for
    programming in C, can disagree on the specific meaning of things
    in the standard.

    Frankly, I think much of that is a waste of time. Let's have
    better definitions, and more rigorous exposition.

    [snip]
    Exactly. The footguns hiding in C code that has worked
    perfectly for decades, dating back to before the standards
    existed, are legion. Caveat emptor.

    _Or_ the code may have been written with careful regard for the
    standard, but something _else_ may have been changed that now
    leads to exposure to UB. For example, perhaps code was written
    that multiplies two numbers, `a*b`; `a` is known to be `unsigned int`
    when written, but `b` is a signed int. But maybe that is hidden
    behind a typedef; some time in the future, the typedef is
    changed so that `a` is now `unsigned short`; perhaps someone
    realized that the domain values never exceed 16 bits and by
    changing the definition some critical structure now fits in a
    single cache line. But also now the type promotion rules kick
    in so that `a*b` happens with the factors as `signed int` and
    there exist values of `a` and `b` where `a*b` overflows: UB.

    The code had no UB; the change was elsewhere; no one saw this
    because the tests all passed and everything looked ok; then
    someone upgrades the compiler and now things break.
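    The typedef scenario can be made concrete with a small sketch. It
    assumes a C11 compiler and a platform where int is wider than
    unsigned short (for instance 32-bit int and 16-bit short); the names
    factor_before, factor_after and IS_SIGNED_INT_EXPR are hypothetical.
    _Generic inspects the type of `a*b` without evaluating it, so no
    overflow is ever executed here.

```c
#include <limits.h>

/* 1 if the expression has type int, 0 if unsigned int, -1 otherwise.
   _Generic only looks at the type; the operand is not evaluated. */
#define IS_SIGNED_INT_EXPR(e) \
    _Generic((e), int: 1, unsigned int: 0, default: -1)

typedef unsigned int   factor_before;  /* hypothetical original typedef */
typedef unsigned short factor_after;   /* hypothetical narrowed typedef */

/* With the original typedef, b is converted to unsigned int and the
   multiplication is unsigned: it wraps, but it is not UB. */
static int product_type_before(void)
{
    factor_before a = 2;
    int b = 3;
    return IS_SIGNED_INT_EXPR(a * b);
}

/* With the narrowed typedef, a is promoted to int, so the same source
   expression is now a signed multiplication that can overflow, e.g.
   for a == 65535 and b == 65535 when int is 32 bits. */
static int product_type_after(void)
{
    factor_after a = 2;
    int b = 3;
    return IS_SIGNED_INT_EXPR(a * b);
}
```

    Nothing at the multiplication site changed between the two
    functions; only the typedef did, which is what makes the exposure
    easy to miss in review.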

    Whose fault is that?

    There's no simple answer here.

    But one thing is clear to me - "UB" is irrelevant here (and in many of
    your points). It would not matter if everything had fully defined
    behaviour. The point is that something is changed in one part of the
    code that has unexpected consequences in another part of the code. Who
    cares if there is UB or not? The issue is that the code does not work
    as intended or expected. UB can provide situations where you have
    unexpected bugs - but so can all sorts of other things.

    UB is the essential characteristic here. With a better defined
    language, these issues are either compile-time failures, or they
    become immediately apparent during testing. In the face of
    C-style UB, however, they become spooky action at a distance;
    the realized effect of the change may not manifest as a bug for
    many years.

    And no, this is not contrived; this is exactly the sort of thing
    that happens on large, long-lived projects.

    As said earlier, C is what it is. I suspect that it will
    continue to make incremental improvements, but we're basically
    stuck with what we have.

    Agreed.

    ...but be careful blaming the programmer.

    Or the language, or the tools.

    I push back on both of these.

    There's an old saw that goes, "a good craftsman never blames his
    tools." (I dislike it, but that's how it usually goes.)

    But there's an unstated corollary: a good craftsman also
    maintains and carefully selects the tools for the job at hand.
    You don't smooth a rough-cut board with a screwdriver, nor do
    you turn a bolt with a hammer. And you don't use a chainsaw
    without a guard.

    - Dan C.

  • From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Mon Jun 15 21:59:14 2026
    From Newsgroup: comp.lang.c

    On 14/06/2026 16:33, Dan Cross wrote:
    ...
    Here's the problem that I have with this line of reasoning. C
    is a language that has considerable history; there was a large
    body of C code written before the first standard was ever
    created, in 1988; C was a teenager. And it took many years for
    decent quality ANSI C compilers to be ubiquitous. C could
    legally drink by then.

    "Undefined Behavior", in C, in the manner usually discussed in
    this newsgroup, was introduced with the first standard. That
    means that there is --- still --- a large body of software that
    has "UB" that was put there before UB existed as a thing
    programmers needed to worry about in C.

    "undefined behavior", defined as "behavior ... for which this
    international standard imposes no requirements", was introduced by the
    first standard. However, before there was a standard there was K&R C,
    the closest thing they had to a standard. And though the phrase
    "undefined behavior" was not in use, there was "behavior for which
    K&R C imposes no requirements". In fact, there was a great deal more
    of it, since K&R C was not written as carefully and precisely as the
    first standard, so it left a great deal more behavior that was
    "undefined by omission of any relevant definition" than there was in
    the first standard.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Tue Jun 16 04:59:38 2026
    From Newsgroup: comp.lang.c

    In article <110qali$3q27m$1@dont-email.me>,
    James Kuyper <jameskuyper@alumni.caltech.edu> wrote:
    On 14/06/2026 16:33, Dan Cross wrote:
    ...
    Here's the problem that I have with this line of reasoning. C
    is a language that has considerable history; there was a large
    body of C code written before the first standard was ever
    created, in 1988; C was a teenager. And it took many years for
    decent quality ANSI C compilers to be ubiquitous. C could
    legally drink by then.

    "Undefined Behavior", in C, in the manner usually discussed in
    this newsgroup, was introduced with the first standard. That
    means that there is --- still --- a large body of software that
    has "UB" that was put there before UB existed as a thing
    programmers needed to worry about in C.

    "undefined behavior", defined as "behavior ... for which this
    international standard imposes no requirements", was introduced by the
    first standard. However, before there was a standard there was K&R C,
    the closest thing they had to a standard. And though the phrase
    "undefined behavior" was not in use, there was "behavior for which
    K&R C imposes no requirements". In fact, there was a great deal more
    of it, since K&R C was not written as carefully and precisely as the
    first standard, so it left a great deal more behavior that was
    "undefined by omission of any relevant definition" than there was in
    the first standard.

    I am guessing that there was supposed to be a point in there
    somewhere, but I can't find it.

    - Dan C.

  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Tue Jun 16 10:10:21 2026
    From Newsgroup: comp.lang.c

    On 15/06/2026 19:57, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 15/06/2026 12:43, Waldek Hebisch wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    On 15/06/2026 00:55, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    <snip>
    [...]
    Making this UB is an admission of the blindingly obvious - there is no
    correct answer when signed integer overflow occurs. It tells
    programmers that it is a mistake to let your arithmetic overflow, and
    it allows tools to help programmers avoid these mistakes, and it
    allows compilers to give programmers the most efficient results from
    known good code rather than adding unnecessary run-time checks that
    are never triggered.

    Trapping or raising/throwing an exception on overflow would also be an
    admission of the blindingly obvious.

    It is obvious - to me, anyway - that signed overflow is a mistake in the
    code. It is trying to do something that cannot be done. What is the
    single-digit sum of 5 and 8? There is no answer. The answer is not 3,
    or 9. Putting your hand in the air and asking the teacher for help
    might be appropriate sometimes, but it is not a correct answer.

    Throwing some kind of exception or trap can definitely be helpful at
    times. And I agree that it would make it obvious that there has been a
    problem detected. But throwing exceptions or traps can cause more
    problems (the Ariane 5 failure was caused by the exception handler, not
    the overflow fault). That does not mean it is better to ignore
    overflows - it means there is no appropriate action that is suitable in
    every situation. I am far from convinced that there is even a
    reasonable choice of default action that could be usefully made.


    And a sufficiently clever compiler
    can omit some (not all) checks in cases where it can be statically
    proved that overflow doesn't occur, and/or hoist some checks out of
    loops.

    Sure - but in practice having strict overflow checks would significantly
    reduce optimisation and re-arrangement possibilities, as well as having
    to include the checks themselves. You might allow non-strict checks in
    some manner (thus allowing optimisations like "a + b - a" reducing to
    just "b"), but I think that might be hard to specify and would reduce
    the debugging help of the checks.

    IMO reasonable and easy definition is: computation either delivers
    mathematically correct result or traps, and it is not allowed to
    trap in cases where naive bottom-up evaluation does not trap.
    In more formal way optimization is not allowed to introduce
    stronger precondition, but may weaken it.


    It is always the case that an implementation can weaken preconditions
    and strengthen postconditions and remain correct - though it might then
    be less efficient than you expect. But if you are /requiring/ a weaker
    precondition and /requiring/ a strong postcondition - such as by
    insisting on traps on overflow - you are changing the function or
    operation specification, and it is not necessarily a good thing.

    In C, the integer addition operation "c = a + b;" has a precondition :

    (a + b) <= INT_MAX, (a + b) >= INT_MIN

    It has the postcondition :

    c == a + b

    Saying that it must trap if there is overflow weakens the precondition
    to any "a" and "b", but makes the postcondition much more complicated.

    No. Precondition is the same. Postcondition has additional term
    "computation finished with no traps".

    That's back where we started, with no defined behaviour if "a + b" is
    too big - that is the specification for normal C addition. When you say
    that addition should either deliver the correct result for suitable "a"
    and "b", and trap for other values, you now have an operation that
    accepts any "a" and "b", and has a postcondition that includes traps.
    You have changed the function, and changed its specification,
    pre-conditions and post-conditions.
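
    To make the two specifications concrete, here is a minimal sketch in C.
    The names add_overflows, add_unchecked and add_trapping are hypothetical,
    invented for this illustration; nothing here is taken from the standard
    or from any compiler's trapping option. It assumes only <limits.h> and
    uses abort() as a stand-in for whatever a real trap would do.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* True when the mathematical sum a + b does not fit in an int.
   The test is arranged so that it never overflows itself. */
static bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Ordinary C addition: the caller owns the precondition.
   The assert only documents it and disappears under NDEBUG. */
static int add_unchecked(int a, int b)
{
    assert(!add_overflows(a, b));
    return a + b;
}

/* "Trapping" addition: accepts any a and b, but may not return.
   abort() is an assumption standing in for a real trap. */
static int add_trapping(int a, int b)
{
    if (add_overflows(a, b))
        abort();
    return a + b;
}
```

    With add_unchecked the contract is "in-range operands in, sum out".
    With add_trapping the domain is every pair of ints, and the
    postcondition has two branches: either the sum is returned, or
    control never comes back to the caller.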


    It means it is no longer true that the result of an addition operation
    is the sum of the operands.

    Opposite of that: no traps means that regardless of precondition
    the result of an addition operation is the sum of the operands.


    Your change means that either the result is no traps and a correct sum,
    /or/ it is a trap and no valid sum (what you get returned as the "sum"
    will depend on how you define all this).

    I think, perhaps, what you mean here is that if you do something like
    "x = a + b;", and the execution makes it through the addition and does
    the assignment, then "x" is guaranteed to be equal to the sum of "a" and
    "b". That is fair enough - without such guarantees, traps, exceptions,
    etc., would be a completely useless concept.

    Addition is no longer a "pure" function -
    now it has side-effects that are completely unpredictable at the site of
    use. Programmers can no longer rely on the timing of the operation,
    stack usage, interaction with other code, or even that the operation
    ever finishes.

    The difference is that without traps programmers do not know if
    arithmetic operations give correct result.

    They do know - if the code is written correctly. They know the result
    is correct because they know they have fulfilled the pre-conditions. It
    is the caller code that has the responsibility to make sure the
    pre-conditions hold.

    If the programmer does not know if the pre-conditions will hold before
    the call, then they don't know what their code will do. And that is not
    a good situation to be in - the possibility of some unknown jump to
    somewhere else in the code does not make it better.

    Note that all of this is different from run-time failures that might
    occur in the normal course of the program, outside of the knowledge or
    control of the calling code. C++ exceptions, or C error return codes,
    are fine for things like a "read file" function in the case when the
    file does not exist. That is not the result of a bug in the code.
    (Well, it might be, but it doesn't have to be.) It is an expected
    situation that can be handled.

    Traps on UB are unexpected situations resulting from bugs in code. They
    can be helpful for fault-finding, and may have some uses in damage
    limitation.

    With traps they do
    not know if program will successfully finish, but if it
    finishes they know that arithmetic gave correct results.


    This is achievable in a controlled manner, without traps.

    If your code is correct, and overflow never happens, then this is all a
    big disadvantage in terms of understanding and analysing the code. And
    it does not in any way reduce the effort needed to be sure that your
    inputs are appropriate for getting the desired results of the operation.

    One needs to use correct formulas, there is no way around that.
    Without traps programmer must analyse ranges of all intermediate
    expressions. That is tedious and error prone.

    Then do a better job of it - or find ways that are not as tedious.

    The main reasons for getting integer overflow are :

    1. Using unsanitised input.

    2. Using types that are too small.

    3. Not having a clear idea of what kinds of values you are dealing with,
    and what you are doing with them.

    The way to avoid 1 is obvious. The way to avoid 2 is obvious (except in
    the very rare situations where 64-bit integers are not big enough). The
    way to avoid 3 is obvious. (Sometimes the details of implementing these
    fixes are not minor, but the principle is clear.)
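
    As a rough illustration of points 1 and 2, the sketch below validates
    a textual input against an explicit range and then does the arithmetic
    in a wider type. The function names parse_in_range and scale_percent
    are hypothetical, and the "percentage" use case is an assumption made
    up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Point 1: sanitise input. Accept only a complete decimal integer
   that lies in [lo, hi]; reject everything else. */
static bool parse_in_range(const char *text, long lo, long hi, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno == ERANGE || end == text || *end != '\0')
        return false;               /* out of range for long, empty, or trailing junk */
    if (v < lo || v > hi)
        return false;               /* outside the range the caller can handle */
    *out = v;
    return true;
}

/* Point 2: use a type that is big enough. The product of two
   int32_t values always fits in int64_t, so this cannot overflow. */
static int64_t scale_percent(int32_t value, int32_t percent)
{
    return (int64_t)value * percent / 100;
}
```

    Once the input is known to lie in a small range, the range of every
    later intermediate result follows from it, which is the kind of
    reasoning point 3 asks for.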

    People work
    around that by activating traps during testing, but it is
    quite hard to find worst case values, so errors may be
    easily missed during testing. Having traps active during
    production runs means that you may discover problem. You
    apparently think that ignoring possible problems at
    runtime is good thing.

    No, ignoring problems is never a good thing. Writing code that doesn't
    run the risk of problems is a good thing.

    And I can agree that sometimes leaving traps enabled in released code
    can be helpful - there are situations where you can't practically remove
    the risk of overflows, and it is better to crash out reliably than risk
    running on with faulty data. It is, however, also the case that
    sometimes traps will cause far more problems than incorrect data would.
    (Noting that UB does not guarantee "incorrect data" - it can do
    anything. Wrapping semantics, or unspecified value semantics, would do
    that.)


    For simple programs you may analyze
    it well enough to be sure that nothing bad happens at
    runtime, but in general computing we use a lot of "interesting"
    programs which are too complex to analyse. We hope that
    they will run OK, but have no proof. Sometimes hope is
    based on statistical tests and on low probability input
    program may fail. Traps are useful to make sure that
    wrong results will not propagate further.


    This is why you break your code down into manageable and understandable
    parts - functions, classes (for some languages), modules / translation
    units, files, directories, libraries. Yes, there can be interactions
    that can be very difficult to test well - testing is not easy.

    Code over a certain size is likely to contain bugs - programmers are
    rarely infallible, and even when they are ( :-) ), the customer
    specifying the program is not.

    But we are talking here about a specific class of bugs - UB that can be
    detected by trap options in code generation or cpu hardware, which
    basically means integer overflows, divide by 0, dereferencing null
    pointers, and shift by inappropriate amounts. Those bugs are avoidable
    - I really do not see them as a concern. Trapping won't help all the
    other bugs - buffer overflows, unterminated strings, index out of range,
    misunderstanding the specifications, mixing up parameter order in
    function calls, data races, logical errors, memory resource ownership
    mixups, and everything else.

    So your traps on arithmetic overflow are crippling the efficiency of
    calculations (and efficiency of calculations is a big reason for picking
    C in the first place) to give unexpected crashes when easily preventable
    mistakes occur - while doing nothing to aid the big risks.


    Trapping like this can certainly be useful for debugging. But as a
    general feature it gives a false sense of security, complicates
    mathematical analysis, introduces massive additional possible code path
    choices which are either real or almost certainly untested in practice,
    or not real (because the compiler can see they are not taken) and
    untestable.

    You get extra code paths only if you attempt to handle traps.

    Unhandled traps are also a code path.

    Trapping of overflows gives you assurance that in computation that
    you did and which finished with no traps there were no errors of
    certain kind (that is wrong results due to overflow). That is
    really not different than insistence on static types.

    They are not remotely the same - the distinction between compile-time
    and runtime is critical.

    Neither
    assures you of no bugs, but each tells you that some bugs
    did not happen. Of course, trapping at runtime is less
    satisfactory than compile time checking, but tight a priori
    bounds on ranges are notoriously hard to obtain, so trapping
    is the best we can have for high performance software with
    current state of art.

    That is not qualitatively worse than "who knows what will
    happen" UB, but it is not significantly better.


    <snip>

    The correct way to handle the situation is to avoid it - be sure that
    you are not dividing by zero in the first place. Identify and handle
    the problem where it occurs - when this zero is created, or the
    circumstances leading to that point - rather than trying to do a
    post-mortem after the failed division. And if you are doing that, then
    what benefit is there in having trapping for division by zero? It
    becomes just a waste of effort.

    What is value of certification required for some software? If
    programmer did good job then program will work correctly.

    Yes.

    Trap give assurance that programmer indeed correctly handled
    tricky problem.

    No, it certainly does not. And one of the reasons to dislike traps is
    that it makes people think like that. A trap can only happen if the
    programmer did /not/ handle the problem correctly.

    Yes.

    And I expect that if
    the programmer is able to write an appropriate specific trap handler for
    the failing expression (rather than a program-global "crash with error
    message" handler), then he/she would be able to avoid the problem in the
    first place.

    Rather non-specific trap handler could work as "redo the computation
    in arbitrary precision". If problem (like division by zero) persists,
    then there is logic bug, otherwise it means that precision was
    inadequate and problem is resolved.
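
    The "redo at higher precision" idea can be approximated in portable C
    without any trap machinery. The sketch below is only that: it replaces
    the trap with an explicit overflow check and replaces arbitrary
    precision with int64_t, both of which are substitutions made for the
    example. The names sum_i32, sum_i64 and sum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fast path: accumulate in int32_t. Returns false (instead of
   trapping) as soon as the next addition would overflow. */
static bool sum_i32(const int32_t *v, size_t n, int32_t *out)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if ((v[i] > 0 && acc > INT32_MAX - v[i]) ||
            (v[i] < 0 && acc < INT32_MIN - v[i]))
            return false;
        acc += v[i];
    }
    *out = acc;
    return true;
}

/* Slow path: redo the whole computation in a wider type.
   Cannot overflow for fewer than 2^32 elements. */
static int64_t sum_i64(const int32_t *v, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Try the narrow computation first; fall back only when it fails. */
static int64_t sum(const int32_t *v, size_t n)
{
    int32_t fast;
    if (sum_i32(v, n, &fast))
        return fast;
    return sum_i64(v, n);
}
```

    The fallback is chosen at one well-defined place in the caller, so the
    control flow stays visible in the source rather than living in a
    program-wide handler.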


    If you are talking here about using traps as a testing and debugging
    aid, helping the developer spot problems and improve their code, then I
    agree - that's a good thing.

    If you are talking about some kind of automatic handling, then that is
    totally out of scope for a language like C. It would be much more
    appropriate to use a higher level managed language and higher level
    arithmetic (like support for arbitrary precision integers) in the first
    place.


    However, you should think about such traps similarly to parity error
    which can be signaled by some hardware. There is low but nonzero
    probability that such error can occur. Parity check gives you
    reasonable chance to detect it.

    That's not an unreasonable comparison. Parity checks used to be popular
    - they are almost non-existent in communication protocols now. You
    either have something that you know works correctly, or you use much
    better methods - multiple ECC bits, CRCs, FEC, or whatever, according to
    the balance of cost, error rates, consequences of data loss, etc.

    Handling is at least as problematic
    as with overflow. Absence of traps gives you less info: no
    overflow traps mean no overflow, no parity traps means that
    parity was correct, but intent of parity check is to discover bit
    error and they are possible even with correct parity. So, do you
    think that parity check inside MCU-s are useless?

    Yes, for the most part. A parity check is almost always either
    unnecessary, or not nearly enough.


    Sometimes, of course, you are trying to write code that has some input
    which is supposed to be correct, but you are not sure - and you can't
    change the calling code. How you handle that situation will depend on
    the program and the situation. But I don't see trapping as "correct
    handling" unless the whole program is written with the expectation of
    traps for error handling. You might, however, end up deciding that
    trapping is the least bad option.


    And once you know that computation works
    according to math rules other forms of verification are easier.

    You also seem to have bias to real time control: if you need
    value just at given moment, then it is hard to do something
    reasonable. But at least in some control areas there is
    notion of "safe state", for example working heavy machine
    is dangerous, stopped one usually is considered safe. If
    there is safe state, then anything not expected by program
    should trigger transition to safe state.

    I think if you are /not/ concerned with high efficiency in the code,

    Well, if efficiency does not matter traps can be implemented as
    a software layer above the language. Or one can use arbitrary
    precision arithmetic. Traps matter when efficiency matters,
    so they should be implemented in place giving best efficiency,
    at best in CPU and if that is not possible then in optimizing
    compiler.

    then you should be seriously questioning the choice of C as the language
    in the first place. And even if you use C, there are often things you
    can do to avoid having problems in the first place. The obvious one for
    integer overflow is to make more use of bigger types.

    Which may be best choice if efficiency is not important. But
    some calculations require surprisingly large accuracy to avoid
    overflow. Worse, in vast majority of cases lower accuracy
    may be adequate, so there is pressure to use "sufficient"
    accuracy overlooking special cases.

    In general computation, if you need correct value and have some
    time there are options which may involve re-doing computation at
    higher precision, which may get rid of occasional overflows
    and divisions by zero due to overflow. Division by zero may
    be due to bad input data, traps allow identification of
    such data (doing it in other way may be computationally quite
    expensive).




  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jun 21 14:13:23 2026
    From Newsgroup: comp.lang.c

    scott@slp53.sl.home (Scott Lurndal) writes:

    One might also define data structures for control and status
    registers using bitfield structs.

    Yeah. This kind of application (among others) I consider one of
    the motivating forces behind bitfields.

    [Some whitespace trimming done in the excerpt below.]

    e.g. for the SATA UAHC_GBL_OOBR register:

    union UAHC_GBL_OOBR {
        uint32_t u;
        struct UAHC_GBL_OOBR_s {
    #if __BYTE_ORDER == __BIG_ENDIAN
            uint32_t we : 1; /**< R/W/H - Write enable. */
            uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
            uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
            uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
            uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
    #else
            uint32_t cimax : 8;
            uint32_t cimin : 8;
            uint32_t cwmax : 8;
            uint32_t cwmin : 7;
            uint32_t we : 1;
    #endif
        } s;
    };

    To me it seems kind of goofy to use uint32_t for the bitfields type.
    I would just use unsigned, which is just as sure to work as intended,
    isn't it?
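
    Whichever type is used, the placement of bit-fields within the storage
    unit is implementation-defined, which is why the quoted union needs
    two orderings. For comparison, here is a sketch of a different
    technique, masks and shifts on a plain uint32_t. The OOBR_* macros
    and the oobr_get/oobr_set helpers are hypothetical names, and the bit
    positions are an assumption: they follow the little-endian branch of
    the union above, read as allocating fields from the least significant
    bit upward.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field descriptors; bit positions are assumed, not
   taken from any hardware manual. */
#define OOBR_CIMAX_SHIFT 0u
#define OOBR_CIMAX_MASK  UINT32_C(0xFF)
#define OOBR_CIMIN_SHIFT 8u
#define OOBR_CIMIN_MASK  UINT32_C(0xFF)
#define OOBR_CWMAX_SHIFT 16u
#define OOBR_CWMAX_MASK  UINT32_C(0xFF)
#define OOBR_CWMIN_SHIFT 24u
#define OOBR_CWMIN_MASK  UINT32_C(0x7F)
#define OOBR_WE_SHIFT    31u
#define OOBR_WE_MASK     UINT32_C(0x1)

/* Extract one field from a register image. */
static uint32_t oobr_get(uint32_t reg, unsigned shift, uint32_t mask)
{
    return (reg >> shift) & mask;
}

/* Return a copy of the register image with one field replaced.
   The value must already fit in the field. */
static uint32_t oobr_set(uint32_t reg, unsigned shift, uint32_t mask,
                         uint32_t value)
{
    assert((value & ~mask) == 0);
    return (reg & ~(mask << shift)) | (value << shift);
}
```

    This version does not depend on byte order or on how the compiler
    allocates bit-fields, at the cost of being wordier at each use.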

    (Personal note: I tried sending an email to you at the address in
    your news posting, but my mailer complained about the address. If
    it's okay could I ask you to send me an email at the address in my
    news posting? Whatever you decide, thanks.)
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jun 21 14:26:08 2026
    From Newsgroup: comp.lang.c

    antispam@fricas.org (Waldek Hebisch) writes:

    Dan Cross <cross@spitfire.i.gajendra.net> wrote:

    In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    and in fact
    it *won't* occur during execution because foo() isn't called.
    A compiler can't generate code with arbitrary behavior just
    because it can't prove that there will be no UB. If it could,
    every signed or floating-point arithmetic operation with unknown
    operand values would grant the same permission.

    But that's not the situation here. The situation is that the
    compiler can prove that something _is_ UB.

    In the program quoted at the top of this post, the UB occurs in
    a function foo() that's never called. A compiler can replace the
    body of foo() with a trap, and it can certainly warn about the UB,
    but I don't believe it can reject the entire program. A clever
    compiler could prove that the UB never occurs.

    So there are two things that are at play here.

    First, this notion that UB is _only_ a runtime matter. The text
    of the standard contradicting that aside, if a translator can
    detect that the behavior of a construct is provably undefined if
    executed, then it seems axiomatic that UB is clearly something
    that plays a role at translation time, as well.

    I think that this paragraph (and several others in this post and
    other posts) represent fundamental misunderstanding. This may
    be due to the way C standard is written. AFAIK Extended Pascal
    standard (once you translate terminology) states the same things as
    C about UB, but in clearer way. Some relevant parts below:

    : 3.1 Dynamic-violation
    : A violation by a program of the requirements of this International
    : Standard that a processor is permitted to leave undetected up to,
    : but not beyond, execution of the declaration, definition, or
    : statement that exhibits (see clause 6) the dynamic-violation.

    : 3.2 Error
    : A violation by a program of the requirements of this International
    : Standard that a processor is permitted to leave undetected.
    ...
    : 5.1 Processors
    ...
    : e) be able to determine whether or not the program violates any
    : requirements of this International Standard, where such a
    : violation is not designated an error or dynamic-violation,
    ...

    : 5.2 Programs
    ...
    : b) if it conforms at level 1, use only those features of the
    : language specified in clause 6;

    UB in C standard corresponds with 'error' in Pascal standard. [...]

    Does it? In C a syntax error is undefined behavior, but it
    requires a diagnostic. (I don't mean to single out just syntax
    errors; there are other examples.)
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jun 21 15:26:35 2026
    From Newsgroup: comp.lang.c

    antispam@fricas.org (Waldek Hebisch) writes:

    [...]

    I think that lawyerish style of current C standard is mostly
    inertia,

    I wouldn't use a term like lawyerish to describe the text in the
    ISO C standard. Can you explain what quality you mean to ascribe
    to "lawyerish" writing in the C standard without using any term
    related to lawyering or legal documents?

    and making standard more mathematical would improve it.

    Could you elaborate on that statement? In what ways would giving
    a more mathematical treatment of C semantics improve the quality
    of the ISO C document? How would doing that advance the stated
    purposes or goals of the C standard?

    But giving formal semantic in the standard would mean
    significantly bigger change.

    Due to the nature of C, I believe it is effectively impossible to
    give a formal mathematical definition of the semantics of C. Do
    you think such a thing is feasible or practicable? If so can you
    explain the reasoning behind your thinking?
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 22 03:40:56 2026
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:

    [...]

    I think that lawyerish style of current C standard is mostly
    inertia,

    I wouldn't use a term like lawyerish to describe the text in the
    ISO C standard. Can you explain what quality you mean to ascribe
    to "lawyerish" writing in the C standard without using any term
    related to lawyering or legal documents?

    Sorry no, I can not. My point is that you need to treat
    C standard almost like legal document and I can not explain
    this without using proper terminology.

    and making standard more mathematical would improve it.

    Could you elaborate on that statement? In what ways would giving
    a more mathematical treatment of C semantics improve the quality
    of the ISO C document? How would doing that advance the stated
    purposes or goals of the C standard?

    There are many aspects of mathematical treatment. One is care
    about terminology, namely that terms are either reasonably
    clearly marked as "primitive" (and assumed to be understood
    by readers) or are precisely defined. Related is that
    words can be taken as written, without needing to look at
    intent or similar legal style arguments. You may think that
    C standard already possesses such properties, but recent
    example, that is definition of expression, nicely illustrates
    current problems. With mathematical treatment expression
    would be part of C program derived from corresponding
    grammar rule and that would resolve the problem. In the
    past in this group there were several discussions about
    various parts of C standard, and there were cases where
    standard wording looked genuinely confusing. I am not
    prepared to dig into those discussions, but my impression
    was that in some cases mathematical treatment would make
    things clearer.

    But giving formal semantic in the standard would mean
    significantly bigger change.

    Due to the nature of C, I believe it is effectively impossible to
    give a formal mathematical definition of the semantics of C. Do
    you think such a thing is feasible or practicable? If so can you
    explain the reasoning behind your thinking?

    I think that this is possible given dedicated team of qualified
    people doing the work. I do not know if it is practically
    possible to assemble needed team. I already mentioned axiomatic
    semantics. There is C grammar and we need to assign semantics
    to various production rules. We do this assigning preconditions
    and postconditions to the rules. In much simpler cases this
    was done. C is bigger language and rules are more complicated,
    but that for me looks like quantitative problem, that is there is
    more work and result will be bigger. Clearly, this would
    require buy-in from the standard body. Namely, formalization
    is likely to uncover many unclear places in C standard and
    ensuring that formalization matches the standard would require
    resolution by the standard body. It is quite possible that
    standard body would refuse to cooperate. To explain this more,
    let me mention past discussion about Extended Pascal in a
    different forum. I was looking at types of constants, but in
    specific case rules looked contradictory, so I asked a
    question. One response was from former committee member (this
    was several years after Pascal standard was ratified), he
    basically said that type of constants does not matter. Which
    was mostly true, but my reason for asking the question was
    that validity of programs depended on types of constants.
    Something similar may happen during formalization:
    formalization may discover unclear places in C standard
    which C committee considers irrelevant in practice and
    refuses to clarify.

    BTW: Authors of some tools already need and have formal
    semantics for language rather close to C. Namely,
    CompCert compiler is matching conditions in source
    code with machine code and for that it needs reasonably
    good approximation to formal semantics of language implemented
    by C compiler (more precisely gcc). Microsoft developed
    formal checking tools and that too needs formal semantics.
    But since goals are different neither give semantics of
    standard C.
    --
    Waldek Hebisch
  • From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Jun 22 03:56:24 2026
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    antispam@fricas.org (Waldek Hebisch) writes:

    Dan Cross <cross@spitfire.i.gajendra.net> wrote:

    In article <1100g0e$1lt8i$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    and in fact
    it *won't* occur during execution because foo() isn't called.
    A compiler can't generate code with arbitrary behavior just
    because it can't prove that there will be no UB. If it could,
    every signed or floating-point arithmetic operation with unknown
    operand values would grant the same permission.

    But that's not the situation here. The situation is that the
    compiler can prove that something _is_ UB.

    In the program quoted at the top of this post, the UB occurs in
    a function foo() that's never called. A compiler can replace the
    body of foo() with a trap, and it can certainly warn about the UB,
    but I don't believe it can reject the entire program. A clever
    compiler could prove that the UB never occurs.

    So there are two things that are at play here.

    First, this notion that UB is _only_ a runtime matter. The text
    of the standard contradicting that aside, if a translator can
    detect that the behavior of a construct is provably undefined if
    executed, then it seems axiomatic that UB is clearly something
    that plays a role at translation time, as well.

    I think that this paragraph (and several others in this post and
    other posts) represent fundamental misunderstanding. This may
    be due to the way C standard is written. AFAIK Extended Pascal
    standard (once you translate terminology) states the same things as
    C about UB, but in clearer way. Some relevant parts below:

    : 3.1 Dynamic-violation
    : A violation by a program of the requirements of this International
    : Standard that a processor is permitted to leave undetected up to,
    : but not beyond, execution of the declaration, definition, or
    : statement that exhibits (see clause 6) the dynamic-violation.

    : 3.2 Error
    : A violation by a program of the requirements of this International
    : Standard that a processor is permitted to leave undetected.
    ...
    : 5.1 Processors
    ...
    : e) be able to determine whether or not the program violates any
    : requirements of this International Standard, where such a
    : violation is not designated an error or dynamic-violation,
    ...

    : 5.2 Programs
    ...
    : b) if it conforms at level 1, use only those features of the
    : language specified in clause 6;

    UB in C standard corresponds with 'error' in Pascal standard. [...]

    Does it? In C a syntax error is undefined behavior, but it
    requires a diagnostic. (I don't mean to single out just syntax
    errors; there are other examples.)

    I mean typical UB, especially the cases that people complain about.
    It does not help that C uses the same term in a few other cases,
    which are really different.
    --
    Waldek Hebisch
  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jun 22 08:58:02 2026
    From Newsgroup: comp.lang.c

    On 21/06/2026 23:13, Tim Rentsch wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:

    One might also define data structures for control and status
    registers using bitfield structs.

    Yeah. This kind of application (among others) I consider one of
    the motivating forces behind bitfields.

    [Some whitespace trimming done in the excerpt below.]

    e.g. for the SATA UAHC_GLB_OOBR register:

    union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
    #if __BYTE_ORDER == __BIG_ENDIAN
    uint32_t we : 1; /**< R/W/H - Write enable. */
    uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
    uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
    uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
    uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
    #else
    uint32_t cimax : 8;
    uint32_t cimin : 8;
    uint32_t cwmax : 8;
    uint32_t cwmin : 7;
    uint32_t we : 1;
    #endif
    } s;
    };

    To me it seems kind of goofy to use uint32_t for the bitfields type.
    I would just use unsigned, which is just as sure to work as intended,
    isn't it?


    Size-specific types are almost always the best choice for situations
    like this.

    When you are using bitfields simply as a way to pack small bits of data
    more efficiently, you use whatever style of type fits best with your
    needs - consistency with the rest of the code, making the sizes
    independent of the target, making the sizes adjust according to the
    target, maximal portability across compilers and standards version -
    whatever you like.

    But when you are using them to fit to an existing externally defined
    structure, fixed-size types are a big advantage (for the whole struct,
    not just the bitfields). It is easier to see that the structure is
    correct because you are explicit about the sizes. Types like "uint32_t"
    have the advantage that they are not portable to targets that can't
    support them - as it is likely that you would need to write such code
    somewhat differently for it to work on a machine that does not have such
    types, causing a compile-time error is useful.

    And when the structures represent hardware registers, such as here, you
    have additional motivation - these registers are typically accessed with
    volatile accesses, and you often want to be sure of the exact size of
    the accesses. That is always up to the implementation, but the norm is
    that when your bitfields are of a given size, generated volatile
    accesses for them use that matching size.

    So "uint32_t" says /precisely/ what the code author wants to say for the
    type. "unsigned" does not. "uint32_t" is appropriate regardless of the
    target and the choice of standard integer sizes - "unsigned" is not.
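
    A minimal sketch of the "compile-time error is useful" point, assuming
    a C11 compiler. The union name and field names here are hypothetical
    and not taken from any real register; the second assertion checks an
    implementation-defined property, so a failure is the intended signal
    that the layout needs attention on that target.

```c
#include <assert.h>   /* static_assert (a keyword in C23) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uint32_t: not declared on targets lacking an exact 32-bit type */

/* Hypothetical 32-bit register image; the field names are illustrative only. */
union reg32_image {
    uint32_t u;
    struct {
        uint32_t low  : 16;
        uint32_t mid  : 15;
        uint32_t flag : 1;
    } s;
};

/* uint32_t has no padding bits, so its object size must be exactly 32 bits. */
static_assert(sizeof(uint32_t) * CHAR_BIT == 32,
              "uint32_t must occupy exactly 32 bits");

/* Implementation-defined: fails to compile if the fields do not pack into one unit. */
static_assert(sizeof(union reg32_image) == sizeof(uint32_t),
              "bit-fields must pack into a single 32-bit unit");

/* Reports the size chosen by the implementation, in bytes. */
size_t reg32_image_size(void)
{
    return sizeof(union reg32_image);
}
```

    If the target has no uint32_t, or packs the fields differently, the
    translation unit is rejected instead of silently producing a different
    layout.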


  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Jun 22 03:35:24 2026
    From Newsgroup: comp.lang.c

    David Brown <david.brown@hesbynett.no> writes:
    On 21/06/2026 23:13, Tim Rentsch wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    One might also define data structures for control and status
    registers using bitfield structs.
    Yeah. This kind of application (among others) I consider one of
    the motivating forces behind bitfields.
    [Some whitespace trimming done in the excerpt below.]

    e.g. for the SATA UAHC_GLB_OOBR register:

    union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
    #if __BYTE_ORDER == __BIG_ENDIAN
    uint32_t we : 1; /**< R/W/H - Write enable. */
    uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
    uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
    uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
    uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
    #else
    uint32_t cimax : 8;
    uint32_t cimin : 8;
    uint32_t cwmax : 8;
    uint32_t cwmin : 7;
    uint32_t we : 1;
    #endif
    } s;
    };
    To me it seems kind of goofy to use uint32_t for the bitfields type.
    I would just use unsigned, which is just as sure to work as intended,
    isn't it?


    Size-specific types are almost always the best choice for situations
    like this.

    When you are using bitfields simply as a way to pack small bits of
    data more efficiently, you use whatever style of type fits best with
    your needs - consistency with the rest of the code, making the sizes independent of the target, making the sizes adjust according to the
    target, maximal portability across compilers and standards version -
    whatever you like.

    But when you are using them to fit to an existing externally defined structure, fixed-size types are a big advantage (for the whole struct,
    not just the bitfields). It is easier to see that the structure is
    correct because you are explicit about the sizes. Types like
    "uint32_t" have the advantage that they are not portable to targets
    that can't support them - as it is likely that you would need to write
    such code somewhat differently for it to work on a machine that does
    not have such types, causing a compile-time error is useful.

    And when the structures represent hardware registers, such as here,
    you have additional motivation - these registers are typically
    accessed with volatile accesses, and you often want to be sure of the
    exact size of the accesses. That is always up to the implementation,
    but the norm is that when your bitfields are of a given size,
    generated volatile accesses for them use that matching size.

    So "uint32_t" says /precisely/ what the code author wants to say for
    the type. "unsigned" does not. "uint32_t" is appropriate regardless
    of the target and the choice of standard integer sizes - "unsigned" is
    not.

    uint32_t x;
    says precisely that x is 32 bits, unsigned, with no padding bits. But
    uint32_t bf : 1;
    is meaningfully different from
    unsigned bf : 1;
    only because in most implementations (and ABIs), the underlying type of
    a bit field affects the layout of the entire structure.

    I accept that this is the case, but it's never made any sense to me, and
    there's no hint of it in the C standard.

    For example, if I write:
    uint64_t bf : 1;
    then the containing struct is typically at least 64 bits, even though
    those other 63 bits aren't part of the bit field and other members can
    be allocated within them.

    It would make a lot more sense *to me* if an N-bit bit field were simply
    N bits.

    (And of course int, signed int, unsigned int, and bool are the only
    portable types for bitfields -- but if you're using bit fields, it's
    likely that portability isn't your only priority.)
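
    A small sketch of the layout effect described above. The struct names
    are hypothetical, and every size involved is implementation-defined,
    so the helpers only report what the compiler chose rather than assume
    particular numbers. Using uint64_t as a bit-field type is itself an
    implementation-defined extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Same one-bit field, two different declared types. */
struct narrow_unit { unsigned bf : 1; };   /* portable bit-field type */
struct wide_unit   { uint64_t bf : 1; };   /* implementation-defined bit-field type */

/* A wide one-bit field followed by an ordinary member, which many ABIs
   place inside the otherwise unused part of the 64-bit unit. */
struct wide_plus_char { uint64_t bf : 1; char c; };

/* Sizes in bytes as chosen by the implementation. */
size_t narrow_unit_size(void)    { return sizeof(struct narrow_unit); }
size_t wide_unit_size(void)      { return sizeof(struct wide_unit); }
size_t wide_plus_char_size(void) { return sizeof(struct wide_plus_char); }
```

    On many mainstream ABIs the wide variant is as large as a uint64_t
    while the narrow one is as large as an unsigned int, but nothing in
    ISO C requires that.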
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 22 10:45:06 2026
    From Newsgroup: comp.lang.c

    In article <86h5mv8umk.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:

    One might also define data structures for control and status
    registers using bitfield structs.

    Yeah. This kind of application (among others) I consider one of
    the motivating forces behind bitfields.

    [Some whitespace trimming done in the excerpt below.]

    e.g. for the SATA UAHC_GLB_OOBR register:

    union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
    #if __BYTE_ORDER == __BIG_ENDIAN
    uint32_t we : 1; /**< R/W/H - Write enable. */
    uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
    uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
    uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
    uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
    #else
    uint32_t cimax : 8;
    uint32_t cimin : 8;
    uint32_t cwmax : 8;
    uint32_t cwmin : 7;
    uint32_t we : 1;
    #endif
    } s;
    };

    To me it seems kind of goofy to use uint32_t for the bitfields type.
    I would just use unsigned, which is just as sure to work as intended,
    isn't it?

    No. There are issues of alignment and padding one must consider
    when using bitfields to model hardware registers, particularly
    if (say) a device driver is meant to be shared across ISAs.

    Using the exact width types really does make a difference; it's
    IB what those properties are, though we're usually at the mercy
    of the target platform's ABI anyway at that point.

    - Dan C.

  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 22 10:50:10 2026
    From Newsgroup: comp.lang.c

    In article <111b35d$1duuq$1@kst.eternal-september.org>,
    Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    [snip]
    uint32_t x;
    says precisely that x is 32 bits, unsigned, with no padding bits. But
    uint32_t bf : 1;
    is meaningfully different from
    unsigned bf : 1;
    only because in most implementations (and ABIs), the underlying type of
    a bit field affects the layout of the entire structure.

    I accept that this is the case, but it's never made any sense to me, and
    there's no hint of it in the C standard.

    For example, if I write:
    uint64_t bf : 1;
    then the containing struct is typically at least 64 bits, even though
    those other 63 bits aren't part of the bit field and other members can
    be allocated within them.

    It would make a lot more sense *to me* if an N-bit bit field were simply
    N bits.

    If dealing with, e.g., hardware, then the author should probably
    constrain things so that bitfields occupy the full width of the
    underlying type. E.g.,

    uint64_t bt:1;
    uint64_t reserved:63;

    And so forth.
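
    As a sketch of that idea, assuming a C11 compiler that accepts uint64_t
    as a bit-field type: the struct name ctrl64 and the accessor are
    hypothetical, and the static_assert turns the intended size into a
    compile-time check instead of an assumption.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>

/* The two fields together account for all 64 bits of the declared type. */
struct ctrl64 {
    uint64_t bt       : 1;
    uint64_t reserved : 63;   /* explicit filler: 1 + 63 == 64 */
};

/* Implementation-defined packing; compilation fails if the struct is not 64 bits. */
static_assert(sizeof(struct ctrl64) == sizeof(uint64_t),
              "ctrl64 must be exactly as large as uint64_t");

/* Returns the value of the single flag bit (0 or 1). */
uint64_t ctrl64_bt(struct ctrl64 c)
{
    return c.bt;
}
```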

    (And of course int, signed int, unsigned int, and bool are the only
    portable types for bitfields -- but if you're using bit fields, it's
    likely that portability isn't your only priority.)

    It may be, but you'll be programming against an ABI (or set of
    ABIs) or similar external standards that give you stronger
    guarantees than ISO C, at that point.

    - Dan C.

  • From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Jun 22 12:59:27 2026
    From Newsgroup: comp.lang.c

    On 22/06/2026 12:35, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    On 21/06/2026 23:13, Tim Rentsch wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:
    One might also define data structures for control and status
    registers using bitfield structs.
    Yeah. This kind of application (among others) I consider one of
    the motivating forces behind bitfields.
    [Some whitespace trimming done in the excerpt below.]

    e.g. for the SATA UAHC_GLB_OOBR register:

    union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
    #if __BYTE_ORDER == __BIG_ENDIAN
    uint32_t we : 1; /**< R/W/H - Write enable. */
    uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
    uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
    uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
    uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
    #else
    uint32_t cimax : 8;
    uint32_t cimin : 8;
    uint32_t cwmax : 8;
    uint32_t cwmin : 7;
    uint32_t we : 1;
    #endif
    } s;
    };
    To me it seems kind of goofy to use uint32_t for the bitfields type.
    I would just use unsigned, which is just as sure to work as intended,
    isn't it?


    Size-specific types are almost always the best choice for situations
    like this.

    When you are using bitfields simply as a way to pack small bits of
    data more efficiently, you use whatever style of type fits best with
    your needs - consistency with the rest of the code, making the sizes
    independent of the target, making the sizes adjust according to the
    target, maximal portability across compilers and standards version -
    whatever you like.

    But when you are using them to fit to an existing externally defined
    structure, fixed-size types are a big advantage (for the whole struct,
    not just the bitfields). It is easier to see that the structure is
    correct because you are explicit about the sizes. Types like
    "uint32_t" have the advantage that they are not portable to targets
    that can't support them - as it is likely that you would need to write
    such code somewhat differently for it to work on a machine that does
    not have such types, causing a compile-time error is useful.

    And when the structures represent hardware registers, such as here,
    you have additional motivation - these registers are typically
    accessed with volatile accesses, and you often want to be sure of the
    exact size of the accesses. That is always up to the implementation,
    but the norm is that when your bitfields are of a given size,
    generated volatile accesses for them use that matching size.

    So "uint32_t" says /precisely/ what the code author wants to say for
    the type. "unsigned" does not. "uint32_t" is appropriate regardless
    of the target and the choice of standard integer sizes - "unsigned" is
    not.

    uint32_t x;
    says precisely that x is 32 bits, unsigned, with no padding bits. But
    uint32_t bf : 1;
    is meaningfully different from
    unsigned bf : 1;
    only because in most implementations (and ABIs), the underlying type of
    a bit field affects the layout of the entire structure.

    I accept that this is the case, but it's never made any sense to me, and
    there's no hint of it in the C standard.

    For example, if I write:
    uint64_t bf : 1;
    then the containing struct is typically at least 64 bits, even though
    those other 63 bits aren't part of the bit field and other members can
    be allocated within them.

    It would make a lot more sense *to me* if an N-bit bit field were simply
    N bits.

    There is sense in that, yes, but as I said the access type is important
    too. The struct Scott gave would not be the same if it used uint8_t
    instead of uint32_t for the bit-fields, even though there would be no
    difference in the alignments or paddings (on "normal" CPUs, rather
    than a DS9000). For hardware registers, access size is often critical -
    it is not like accessing RAM. And while the choice of access size is
    implementation-defined, the size of the type used for the bit-field is
    the most common way to determine that (for volatile accesses).

    If C had a different way of specifying access sizes, then it might be a
    bit different - perhaps _BitInt types would be the best choices for
    bit-field types.
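
    One common way to keep the access size explicit today is to go through
    a volatile pointer to the full-width type and edit an ordinary local
    copy. This is only a sketch: the helper names are hypothetical, and
    the standard still leaves what counts as a volatile access to the
    implementation, although mainstream compilers emit one 32-bit load or
    store for each of these.

```c
#include <stdint.h>

/* Hypothetical accessors: each call performs one volatile 32-bit access. */
static inline uint32_t reg_read32(const volatile uint32_t *reg)
{
    return *reg;
}

static inline void reg_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
}

/* Read-modify-write of bits [7:0]: one full-width read, one full-width write. */
static inline void reg_set_low_byte(volatile uint32_t *reg, uint8_t value)
{
    uint32_t tmp = reg_read32(reg);           /* whole register in one read  */
    tmp = (tmp & ~UINT32_C(0xFF)) | value;    /* edit a non-volatile copy    */
    reg_write32(reg, tmp);                    /* whole register in one write */
}
```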


    (And of course int, signed int, unsigned int, and bool are the only
    portable types for bitfields -- but if you're using bit fields, it's
    likely that portability isn't your only priority.)


  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Mon Jun 22 15:04:52 2026
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    scott@slp53.sl.home (Scott Lurndal) writes:

    One might also define data structures for control and status
    registers using bitfield structs.

    Yeah. This kind of application (among others) I consider one of
    the motivating forces behind bitfields.

    [Some whitespace trimming done in the excerpt below.]

    e.g. for the SATA UAHC_GLB_OOBR register:

    union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
    #if __BYTE_ORDER == __BIG_ENDIAN
    uint32_t we : 1; /**< R/W/H - Write enable. */
    uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
    uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
    uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
    uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
    #else
    uint32_t cimax : 8;
    uint32_t cimin : 8;
    uint32_t cwmax : 8;
    uint32_t cwmin : 7;
    uint32_t we : 1;
    #endif
    } s;
    };

    To me it seems kind of goofy to use uint32_t for the bitfields type.
    I would just use unsigned, which is just as sure to work as intended,
    isn't it?

    The SATA hardware register is defined as a 32-bit register in the
    SATA specification. Therefore we explicitly declare it as such.

    There are other hardware registers in our implementation of the SATA
    controller that are defined as 64-bit registers, for those we use
    uint64_t (rather than relying on 'unsigned long' for 64-bit linux
    or 'unsigned long long' for 32-bit OS - and this code was designed
    to be compiled for both 32-bit and 64-bit targets originally).


  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Mon Jun 22 15:23:40 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:
    In article <86h5mv8umk.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    scott@slp53.sl.home (Scott Lurndal) writes:

    One might also define data structures for control and status
    registers using bitfield structs.

    Yeah. This kind of application (among others) I consider one of
    the motivating forces behind bitfields.

    [Some whitespace trimming done in the excerpt below.]

    e.g. for the SATA UAHC_GLB_OOBR register:

    union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
    #if __BYTE_ORDER == __BIG_ENDIAN
    uint32_t we : 1; /**< R/W/H - Write enable. */
    uint32_t cwmin : 7; /**< R/W/H - COMWAKE minimum value [...] */
    uint32_t cwmax : 8; /**< R/W/H - COMWAKE maximum value [...] */
    uint32_t cimin : 8; /**< R/W/H - COMINIT minimum value [...] */
    uint32_t cimax : 8; /**< R/W/H - COMINIT maximum value [...] */
    #else
    uint32_t cimax : 8;
    uint32_t cimin : 8;
    uint32_t cwmax : 8;
    uint32_t cwmin : 7;
    uint32_t we : 1;
    #endif
    } s;
    };

    To me it seems kind of goofy to use uint32_t for the bitfields type.
    I would just use unsigned, which is just as sure to work as intended,
    isn't it?

    No. There are issues of alignment and padding one must consider
    when using bitfields to model hardware registers, particularly
    if (say) a device driver is meant to be shared across ISAs.

    That's a good choice of verb (model).

    As it happens, the primary use of this data structure is not
    to handle direct accesses to the hardware registers, but rather
    to model them in a simulation. So when the simulated CPU
    accesses the register, after determining the target address
    is assigned to the SATA controller GBL_OOB register, the
    SATA device model code (which hosts the register) will access
    the bitfields individually by name when implementing the
    semantics of a store to that register by the simulated CPU
    (which will typically be running the linux SATA driver).

    Far more maintainable and readable than manipulating the bit fields
    with shift and mask operations.

    e.g.

    if (gbl_oobr.s.we) { /* Writes are enabled */
    /* do it */
    }

    is better in all respects than

    if (gbl_oobr & 1) /* LE */
    or
    if (gbl_oobr & (1 << 31)) /* BE */
    or even
    if (gbl_oobr & (1 << WRITE_ENABLE_BIT_OFFSET))


    Of course the data structure can also be used by a real
    hardware device driver, with the caveat that the contents
    of the hardware register is loaded explicitly into the '.u'
    member by the driver before accessing the bitfields.
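
    A sketch of that usage, with the raw value loaded into '.u' and the
    field then read by name. It makes two assumptions that are not in the
    original code: the compiler-predefined __BYTE_ORDER__ macro (GCC and
    Clang) replaces __BYTE_ORDER, and bit-fields are allocated the way
    those compilers do it, starting from the least significant bit on
    little-endian targets. Under that assumption 'we' lands in bit 31 in
    both branches. The mask name and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

union UAHC_GBL_OOBR {
    uint32_t u;
    struct UAHC_GBL_OOBR_s {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
        uint32_t we    : 1;   /* Write enable. */
        uint32_t cwmin : 7;
        uint32_t cwmax : 8;
        uint32_t cimin : 8;
        uint32_t cimax : 8;
#else
        uint32_t cimax : 8;
        uint32_t cimin : 8;
        uint32_t cwmax : 8;
        uint32_t cwmin : 7;
        uint32_t we    : 1;   /* Write enable. */
#endif
    } s;
};

/* Hypothetical mask; the unsigned operand keeps the shift well defined. */
#define OOBR_WE_MASK (UINT32_C(1) << 31)

/* Load the raw register value into .u, then test the field by name. */
bool oobr_we_by_field(uint32_t raw)
{
    union UAHC_GBL_OOBR r;
    r.u = raw;
    return r.s.we;
}

/* The shift-and-mask equivalent, for comparison. */
bool oobr_we_by_mask(uint32_t raw)
{
    return (raw & OOBR_WE_MASK) != 0;
}
```

    Both helpers should agree for every input on such an implementation;
    comparing them in a unit test is a cheap way to catch a compiler that
    allocates the fields differently.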
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Mon Jun 22 13:02:27 2026
    From Newsgroup: comp.lang.c

    scott@slp53.sl.home (Scott Lurndal) writes:
    [...]
    There are other hardware registers in our implementation of the SATA
    controller that are defined as 64-bit registers, for those we use
    uint64_t (rather than relying on 'unsigned long' for 64-bit linux
    or 'unsigned long long' for 32-bit OS - and this code was designed
    to be compiled for both 32-bit and 64-bit targets originally).

    You could have used unsigned long long for both. I agree that using
    uint64_t is better if you specifically need 64 bits.
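
    The difference can be written down as compile-time checks. A minimal
    sketch, assuming C11; the helper names are hypothetical. The first
    assertion holds on every conforming implementation, the second on
    every implementation that provides uint64_t at all.

```c
#include <assert.h>   /* static_assert */
#include <limits.h>   /* CHAR_BIT, ULLONG_MAX */
#include <stdint.h>   /* uint64_t, UINT64_MAX */

/* unsigned long long is guaranteed to be at least 64 bits wide... */
static_assert(ULLONG_MAX >= UINT64_MAX,
              "unsigned long long holds every uint64_t value");

/* ...while uint64_t, where it exists, is exactly 64 bits with no padding. */
static_assert(sizeof(uint64_t) * CHAR_BIT == 64,
              "uint64_t occupies exactly 64 bits");

/* Storage size in bits of the types that vary between 32- and 64-bit targets. */
unsigned ulong_storage_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}

unsigned ullong_storage_bits(void)
{
    return (unsigned)(sizeof(unsigned long long) * CHAR_BIT);
}
```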
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Jun 24 16:58:36 2026
    From Newsgroup: comp.lang.c

    I'm entering this sub-thread late and yet I haven't finished reading
    all posts. - While I lately noticed some convergence of opinions and
    facts this post appears to me to mostly fall back again; but I won't
    re-open the discussion. I just want to comment on a single exposition.

    On 2026-06-15 10:09, David Brown wrote:
    On 15/06/2026 00:55, Keith Thompson wrote:
    [...]
    [...]

    Throwing some kind of exception or trap can definitely be helpful at
    times. And I agree that it would make it obvious that there has been a
    problem detected. But throwing exceptions or traps can cause more
    problems (the Ariane 5 failure was caused by the exception handler, not
    the overflow fault). That does not mean it is better to ignore
    overflows - it means there is no appropriate action that is suitable in
    every situation. I am far from convinced that there is even a
    reasonable choice of default action that could be usefully made.
    (I don't expect the complete investigation report on the Ariane 5
    incident to be represented or explained, but picking a few facts is
    not only an oversimplification here, it leads to a misrepresentation
    of the case and to inappropriate reasoning and conclusions.)

    Throwing an exception in a system that should have a well-defined and
    safe behavior is of course stupid. Exceptions are there to catch them
    and handle them with appropriate actions to mitigate or fix any issue.

    Not UB, but well defined software and well defined system behavior is
    the key! That should be not only in aviation and life-critical systems
    but (ideally) also in "ordinary" software development with used tools.

    The problem with the Ariane 5 was a sequence and combination of events.

    But the _primary cause_ had not been the [technical] interrupt. It was
    the fact that the *requirements* (the flight trajectories) changed from
    Ariane 4 to Ariane 5 and that they didn't adjust the system accordingly
    but just re-used formerly designed system components unchanged.

    (This actually resembles more the case that Dan narrated: using old
    software systems with new tools, which "unexpectedly" fail in a new
    compiler environment because of a component in another place that was
    just "invisible" at the place where the problem got triggered.)

    In retrospect it is clear that all the software components with their
    contracts should have been double-checked against the (new) Ariane 5
    requirements - that hadn't been done and that was the problem source!

    (There's a reason why they use Ada and not "C" in such areas; the
    rocket might otherwise have exploded on the launching-ramp already. ;-)

    Janis

    [...]

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Jun 24 17:45:22 2026
    From Newsgroup: comp.lang.c

    On 2026-06-16 10:10, David Brown wrote:
    On 15/06/2026 19:57, Waldek Hebisch wrote:
    [...]

    No, ignoring problems is never a good thing.  Writing code that doesn't
    run the risk of problems is a good thing.

    Sure.


    And I can agree that sometimes leaving traps enabled in released code
    can be helpful - there are situations where you can't practically remove
    the risk of overflows, and it is better to crash out reliably than risk
    running on with faulty data. It is, however, also the case that
    sometimes traps will cause far more problems than incorrect data would.
    (Noting that UB does not guarantee "incorrect data" - it can do
    anything. Wrapping semantics, or unspecified value semantics, would do
    that.)

    Hmm.. - not sure what you mean (and imply with) "crash out reliably".

    Having been engaged in server systems software development a crash
    had never been an accepted option. And that's certainly also true
    with life-critical applications and costly operations (upthread you
    had mentioned Ariane 5). You should always avoid crashes and catch
    exceptions. The point is what you can then do with that information,
    and that depends on the actual application case; report it, retry it,
    retry with alternative methods or adapted conditions, emulate the
    result, estimate it, ask supervisor process, switch devices, etc.

    I'm well aware that wrong data may also be bad, be it from wrong
    algorithms, a technical overflow situation, unreliable data sources,
    or unreliable processing (not excluding effects of UB).

    I'm really not sure whether to consider "not handling an exception"
    better or worse than "not handling data errors"; usually you don't
    want either. So both should be prevented (if possible) or acted upon
    (if getting a notice about it).
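
    In C terms, "preventing" an overflow and "acting upon" it can look
    like the sketch below. The function names are hypothetical, and
    saturation is just one possible policy among the ones listed above
    (report, retry, estimate, and so on). The check is done before the
    addition, so no overflowing expression is ever evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *sum and returns true, or returns false and leaves
   *sum untouched when the mathematical result does not fit in an int. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* would overflow: report instead of computing */
    *sum = a + b;
    return true;
}

/* One possible reaction to the reported condition: clamp to the nearest limit. */
int add_or_saturate(int a, int b)
{
    int sum;
    if (checked_add(a, b, &sum))
        return sum;
    return (b > 0) ? INT_MAX : INT_MIN;
}
```

    The caller of checked_add() gets a status instead of a trap or a
    silently wrong value, and decides what is appropriate for its own
    application.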

    Janis

    [...]

  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Wed Jun 24 12:27:47 2026
    From Newsgroup: comp.lang.c

    On 6/24/2026 8:45 AM, Janis Papanagnou wrote:
    On 2026-06-16 10:10, David Brown wrote:
    On 15/06/2026 19:57, Waldek Hebisch wrote:
    [...]

    No, ignoring problems is never a good thing.  Writing code that
    doesn't run the risk of problems is a good thing.

    Sure.


    And I can agree that sometimes leaving traps enabled in released code
    can be helpful - there are situations where you can't practically
    remove the risk of overflows, and it is better to crash out reliably
    than risk running on with faulty data.  It is, however, also the case
    that sometimes traps will cause far more problems than incorrect data
    would. (Noting that UB does not guarantee "incorrect data" - it can do
    anything.  Wrapping semantics, or unspecified value semantics, would
    do that.)

    Hmm.. - not sure what you mean (and imply with) "crash out reliably".

    Having been engaged in server systems software development a crash
    had never been an accepted option. And that's certainly also true
    with life-critical applications and costly operations (upthread you
    had mentioned Ariane 5). You should always avoid crashes and catch exceptions.

    Right. Also, fwiw, I had a calibration system for my server framework
    that would artificially crash a system while keeping logs. On reboot, it
    read the results and recalibrated itself.



    The point is what you can then do with that information,
    and that depends on the actual application case; report it, retry it,
    retry with alternative methods or adapted conditions, emulate the
    result, estimate it, ask supervisor process, switch devices, etc.

    I'm well aware that wrong data may also be bad, be it from wrong
    algorithms, a technical overflow situation, unreliable data sources,
    or unreliable processing (not excluding effects of UB).

    I'm really not sure whether to consider "not handling an exception"
    better or worse than "not handling data errors"; usually you don't
    want either. So both should be prevented (if possible) or acted upon
    (if getting a notice about it).

    Janis

    [...]


  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Jun 28 02:49:30 2026
    From Newsgroup: comp.lang.c

    On 2026-06-12 02:41, Keith Thompson wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    Oh, actually I indeed thought that printing a constant string would not
    create any error that would then be indicated by printf's return value.

    Linux has a device called "/dev/full". It acts like it has no data
    on input, and like it's full on output. You can redirect a program's
    stdout to /dev/full. It's useful for testing, and much easier than
    finding a writable filesystem with no remaining space. (/dev/null
    accepts and discards as much input as you send to it.)

    I've never stumbled across /dev/full before. Thanks for that hint.

    [...]

    I'd indeed also expected that, say, printing a string value with a '%d'
    specifier would produce an error, but I saw that it doesn't; while the
    compiler creates just a warning, execution provides some random output
    and a _non-negative_ string-length value as printf's return value. Not
    exactly what I'd expect from a language.

    Calling printf with a mismatch between the format string and
    an argument has undefined behavior. Some compilers will warn
    about this in most cases, but in general the format string is not
    necessarily known at compile time.

    Well, yes. But therefore I imagined that at runtime an rc<0 could have
    indicated such a mismatch.

    (BTW, only after my post did I notice that in my example scenario
    ("printing a string value with a '%d'") the string is actually passed
    as a pointer value, so its type is similar to an integer, which might
    make it easier to spot at compile time than at runtime? Anyway, this
    is something I'd like to be detected.)
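
    For what can be detected today, here is a sketch with matching
    conversions plus a printf-like wrapper that asks the compiler to check
    its callers. The function and macro names are hypothetical; the
    'format' attribute is a GCC/Clang extension, not ISO C, and only helps
    when the format string is a literal visible at the call site.

```c
#include <stdarg.h>
#include <stdio.h>

/* GCC/Clang extension: have the compiler check arguments against the format. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

int log_line(FILE *out, const char *fmt, ...) PRINTF_LIKE(2, 3);

/* Forwards to vfprintf; returns its result (negative on an output error). */
int log_line(FILE *out, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int rc = vfprintf(out, fmt, ap);
    va_end(ap);
    return rc;
}

/* Matching conversions: %s for the characters, %p for the address itself. */
int describe_string(char *buf, size_t size, const char *s)
{
    return snprintf(buf, size, "%s @ %p", s, (void *)s);
}
```

    With the attribute in place, a call such as log_line(stderr, "%d\n",
    "text") draws the same warning a direct printf call would. The return
    value still says nothing about a mismatch, since that remains
    undefined behavior.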

    No diagnostic or other error indication is required.

    Well.

    [...]

    Obviously (because of that?) I've never seen anyone test such a call
    by, say,

    int rc = printf("Hello, world\n");
    if (rc < 0) {
    /* umm.. */
    }

    Quick-and-dirty programs like the classic "hello, world" often don't
    bother to check. The above could print an error message to stderr and
    call exit(EXIT_FAILURE). Even if stdout and stderr both produce errors,
    the caller should be able to detect the error status. (I've configured
    my shell to print a message when a program dies with an error status.)

    But most production programs don't just blindly print stuff to stdout.
    [...]

    Are you - plural, all CLC audience - writing such code with 'printf()',
    honestly? - Same question with 'int rc = fclose (...);' - what can one
    do about that, then? (Write a logfile entry, maybe? - and then?)

    Write the error message to stderr, optionally log it somewhere,
    and exit with an error code.
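For the fclose question, a minimal sketch of one way to do it. The helper name `close_checked` is hypothetical, and the sketch only reports the failure and hands the decision (exit, log, retry) back to the caller.

```c
#include <stdio.h>

/* Hypothetical helper: closes `f` and reports a failure on stderr.
   `name` is only used in the message. Returns 0 on success, -1 on
   failure, leaving the decision to exit or carry on to the caller. */
int close_checked(FILE *f, const char *name)
{
    if (fclose(f) == EOF) {
        fprintf(stderr, "error closing %s\n", name);
        return -1;
    }
    return 0;
}
```

A short-lived tool might call exit(EXIT_FAILURE) when this returns -1; a long-running server might instead log the event and continue.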

    Just note that generally a terminal might not be connected. As I wrote,
    logging was what we've done, so I'm with you here. An exit, OTOH, was in
    our server applications not an option.

    Janis

    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Jun 28 03:16:15 2026
    From Newsgroup: comp.lang.c

    On 2026-06-12 02:41, James Kuyper wrote:
    On 2026-06-11 18:37, Janis Papanagnou wrote:
    [...]
    I'd indeed also expected that, say, printing a string value with a '%d'
    specifier would produce an error, but I saw that it doesn't; while the
    compiler creates just a warning, execution provides some random output
    and a _non-negative_ string-length value as printf's return value. Not
    exactly what I'd expect from a language.

    On some systems I've used, it would try to interpret the pointer to the
    string as an int, and print the result.

    Right. That occurred to me only after I had sent my post.

    On others, it would expect the int to be stored in one register,
    whereas the pointer was stored in a different register, and as a
    result it would print whatever value was last stored in the first
    register. These were natural outcomes for those implementations; had
    the C standard imposed any conflicting requirements on the behavior,
    it would have complicated those implementations.

    [...]

    Obviously (because of that?) I've never seen anyone test such a call
    by, say,

    int rc = printf("Hello, world\n");
    if (rc < 0) {
        /* umm.. */
    }

    Are you - plural, all CLC audience - writing such code with 'printf()',
    honestly? - Same question with 'int rc = fclose (...);' - what can one
    do about that, then? (Write a logfile entry, maybe? - and then?)

    For most of the programs I ever wrote, a single check for ferror(file)
    at the end of the program, resulting in exit(EXIT_FAILURE) being
    called, would be acceptable.

    Hmm.. - I don't recall ever having used ferror().

    Personally it seems to me that continuing I/O once an error gets
    flagged is not something I'd have done with an easy conscience.
    So I'd not have dared to interrogate that state only once at the
    end of the program. But then, instead of regularly calling ferror,
    checking the RC should suffice?

    (I think I mentioned already that usually we inspected the RC at
    the place of generation for all functions where we felt it matters
    and where we can act on such events.)

    That approach relies on the fact that the error
    flag is sticky. Because I made a habit of such checks, we caught a
    problem when a disk overflowed before we'd wasted hours "writing" data
    to nowhere. If I had sent a message to a log file, it would have been
    blocked by the same problem, which is why I used the exit status to
    report the problem.
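The sticky-flag approach described above could be sketched roughly as follows. The helper name `write_lines` and the line format are invented for the example; the point is only that individual calls go unchecked and the stream's error indicator is consulted once at the end.

```c
#include <stdio.h>

/* Hypothetical helper: writes `count` numbered lines without checking
   each call, then checks the stream's sticky error indicator once.
   Returns 0 if all output went through, -1 otherwise. */
int write_lines(FILE *out, int count)
{
    for (int i = 0; i < count; i++)
        fprintf(out, "line %d\n", i);   /* return value deliberately ignored */

    /* Flush first so that errors in buffered data are also recorded. */
    if (fflush(out) == EOF || ferror(out))
        return -1;
    return 0;
}
```

A main() could end with `return write_lines(stdout, n) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;` so that the exit status carries the result even when no log file can be written.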

    Okay. In our cases we had no problems with disk overflows; our main
    operational concern was communication of data, less storage of huge
    amounts of temporaries or payload data. For logfiles we had our own
    framework with IIRC a "tandem approach" (alternating between two or
    more logfiles, each with a fixed maximum number of log-entries, and
    reused when it's its turn); so disk space was well under control.

    Janis

    [...]
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sat Jun 27 21:00:39 2026
    From Newsgroup: comp.lang.c

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 2026-06-12 02:41, Keith Thompson wrote:
    [...]
    Calling printf with a mismatch between the format string and
    an argument has undefined behavior. Some compilers will warn
    about this in most cases, but in general the format string is not
    necessarily known at compile time.

    Well, yes. But therefore I imagined that at runtime an rc<0 could have
    indicated such a mismatch.

    That would be nice, but it's just one of the infinitely many possible
    results of undefined behavior.

    For example, this program:

    #include <stdio.h>
    int main(void) {
        const int result = printf("%ld\n", 0.3);
        printf("printf returned %d\n", result);
    }

    on my system prints:

    140732048673560
    printf returned 16

    gcc and clang warn about the format string. tcc doesn't.

    The (first) printf call was apparently successful because printf
    has no way to know that the argument was of an incorrect type
    (types don't really exist at run time).
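For contrast with the mismatched call, a small sketch of the defined case, where the argument type matches the conversion. The helper name `format_double` is hypothetical, and snprintf is used instead of printf only so that the result can be inspected in a buffer.

```c
#include <stdio.h>

/* Hypothetical helper: formats a double with the matching %f conversion.
   Returns what snprintf returns: the number of characters the full
   result needs (excluding the terminating null), or a negative value
   on an encoding error. */
int format_double(char *buf, size_t size, double value)
{
    return snprintf(buf, size, "%f", value);
}
```

With a 32-byte buffer and the value 0.3 in the default "C" locale, this stores "0.300000" and returns 8. Here the return value means something because the behavior is defined.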
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jun 28 09:42:15 2026
    From Newsgroup: comp.lang.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    [...]

    uint32_t x;
    says precisely that x is 32 bits, unsigned, with no padding bits.

    Actually it says a little bit more, but never mind that.

    But
    uint32_t bf : 1;
    is meaningfully different from
    unsigned bf : 1;

    only because in most implementations (and ABIs), the underlying type
    of a bit field affects the layout of the entire structure.

    I would say this differently. The two member declarations shown
    might be meaningfully different, depending on the implementation:
    they >can< be different, but they don't have to be, and indeed on
    many implementations they are exactly the same.

    I accept that this is the case, but it's never made any sense to me,
    and there's no hint of it in the C standard.

    I think saying there is not even a hint is an overstatement. The C
    standard says that an implementation "may allocate any addressable
    storage unit large enough to hold a bit-field." It shouldn't be a
    surprise that how much storage is allocated depends on the type of
    the bit-field member. For example, a bit-field of type 'unsigned'
    might very well choose a larger storage unit than what is chosen
    for a bit-field of type '_Bool'. It seems obvious that the type of
    a bit-field might affect what size and layout is chosen.

    For example, if I write:
    uint64_t bf : 1;

    then the containing struct is typically at least 64 bits, even
    though those other 63 bits aren't part of the bit field and other
    members can be allocated within them.

    It would make a lot more sense *to me* if an N-bit bit field were
    simply N bits.

    Two problems with that. One, it seems to be in conflict with what
    the C standard says about 0-width bit-fields. Two, the C standard
    explicitly allows allocating bit-fields using a high-to-low order or
    a low-to-high order (implementation-defined choice). Presumably
    this freedom is given to accommodate both big- and little-endian
    platforms. The idea that an N-bit bit-field should simply be N bits
    doesn't work in big-endian environments. It seems better to allow
    little-endian implementations to choose a size that matches what a
    big-endian implementation would use, rather than insisting that they
    be different.
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Jun 28 09:52:36 2026
    From Newsgroup: comp.lang.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

    On 2026-06-12 02:41, Keith Thompson wrote:

    [...]

    Calling printf with a mismatch between the format string and an
    argument has undefined behavior. Some compilers will warn about
    this in most cases, but in general the format string is not
    necessarily known at compile time.

    Well, yes. But therefore I imagined that at runtime an rc<0
    could have indicated such a mismatch.

    That would be nice, but it's just one of the infinitely many
    possible results of undefined behavior.

    For example, this program:

    #include <stdio.h>
    int main(void) {
        const int result = printf("%ld\n", 0.3);
        printf("printf returned %d\n", result);
    }

    on my system prints:

    140732048673560
    printf returned 16

    gcc and clang warn about the format string. tcc doesn't.

    The (first) printf call was apparently successful because printf
    has no way to know that the argument was of an incorrect type
    (types don't really exist at run time).

    printf() could know if an argument were of an incorrect type, if
    an implementation chose to do so.
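As an illustration of what such checking could look like in principle, here is a sketch in which type information travels alongside the arguments as explicit tags. Everything in it (the `arg_tag` enum, `check_format`, the small set of supported conversions) is hypothetical and is not how any particular implementation works.

```c
#include <stddef.h>

/* Hypothetical type tags passed alongside the actual arguments. */
enum arg_tag { TAG_INT, TAG_LONG, TAG_DOUBLE, TAG_STRING };

/* Returns 0 if every conversion in fmt matches the corresponding tag,
   -1 on a mismatch, an unsupported conversion, or a wrong argument
   count. Only %d, %ld, %f, %s and %% are recognized in this sketch. */
int check_format(const char *fmt, const enum arg_tag *tags, size_t ntags)
{
    size_t used = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        enum arg_tag want;

        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                       /* literal percent sign */

        if (*p == 'd')
            want = TAG_INT;
        else if (*p == 'l' && p[1] == 'd') {
            want = TAG_LONG;
            p++;                            /* skip the length modifier */
        } else if (*p == 'f')
            want = TAG_DOUBLE;
        else if (*p == 's')
            want = TAG_STRING;
        else
            return -1;                      /* unsupported or truncated */

        if (used >= ntags || tags[used] != want)
            return -1;
        used++;
    }
    return used == ntags ? 0 : -1;
}
```

With this scheme, a "%ld" conversion paired with a TAG_DOUBLE argument is rejected before any formatting happens, at the cost of passing extra data on every call.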
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sun Jun 28 18:06:31 2026
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    But
    uint32_t bf : 1;
    is meaningfully different from
    unsigned bf : 1;

    only because in most implementations (and ABIs), the underlying type
    of a bit field affects the layout of the entire structure.
    [...]
    I accept that this is the case, but it's never made any sense to me,
    and there's no hint of it in the C standard.

    I think saying there is not even a hint is an overstatement. The C
    standard says that an implementation "may allocate any addressable
    storage unit large enough to hold a bit-field." It shouldn't be a
    surprise that how much storage is allocated depends on the type of
    the bit-field member. For example, a bit-field of type 'unsigned'
    might very well choose a larger storage unit than what is chosen
    for a bit-field of type '_Bool'. It seems obvious that the type of
    a bit-field might affect what size and layout is chosen.

    I'm sure it seems obvious to you. As I said, it's not at all
    obvious to me.

    Prior to C99, C didn't even require compilers to support bit-field types
    other than int, unsigned int, and signed int. The declared type might
    typically be used only to determine the signedness of the bit-field
    (though I *think* most compilers permitted other types).

    Implementations are certainly not *required* to use the declared
    type of a bit-field as a factor in deciding how to allocate it,
    or how to allocate the rest of the structure. Allocating just one
    byte for an isolated 1-bit bit-field of any declared type would
    be conforming. A conforming compiler could use the declared type
    only to determine the signedness and the maximum allowed width of
    a bit-field (and its conversion behavior in the case of bool).

    For example, if I write:
    uint64_t bf : 1;

    then the containing struct is typically at least 64 bits, even
    though those other 63 bits aren't part of the bit field and other
    members can be allocated within them.

    It would make a lot more sense *to me* if an N-bit bit field were
    simply N bits.

    Two problems with that. One, it seems to be in conflict with what
    the C standard says about 0-width bit-fields.

    0-width bit-fields are obviously a special case.
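For readers who haven't used them, a short sketch of the special case in question. The struct and function names are made up; the sizes are implementation-defined, so the sketch only computes a difference rather than promising particular numbers.

```c
#include <stddef.h>

/* Two 1-bit fields that may share one storage unit. */
struct packed_bits {
    unsigned a : 1;
    unsigned b : 1;
};

/* The unnamed zero-width bit-field says that no further bit-field is
   packed into the unit holding `a`, so `b` starts in a new unit. */
struct split_bits {
    unsigned a : 1;
    unsigned   : 0;
    unsigned b : 1;
};

/* Hypothetical helper: extra bytes the split layout costs here,
   or 0 if the two layouts happen to have the same size. */
size_t split_overhead(void)
{
    size_t packed = sizeof(struct packed_bits);
    size_t split = sizeof(struct split_bits);
    return split > packed ? split - packed : 0;
}
```

On implementations that use an unsigned-sized storage unit, one would expect the split layout to be larger, but nothing in the sketch depends on that.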

    Two, the C standard
    explicitly allows allocating bit-fields using a high-to-low order or
    a low-to-high order (implementation-defined choice). Presumably
    this freedom is given to accommodate both big- and little-endian
    platforms. The idea that an N-bit bit-field should simply be N bits
    doesn't work in big-endian environments. It seems better to allow
    little-endian implementations to choose a size that matches what a
    big-endian implementation would use, rather than insisting that they
    be different.

    I honestly don't understand your point here. How does making
    N-bit bit-fields N bits not work in a big-endian environment?
    Can you elaborate? Of course endianness can affect how bit-fields
    are allocated within a "storage unit".
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sun Jun 28 18:28:46 2026
    From Newsgroup: comp.lang.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    For example, this program:

    #include <stdio.h>
    int main(void) {
        const int result = printf("%ld\n", 0.3);
        printf("printf returned %d\n", result);
    }

    on my system prints:

    140732048673560
    printf returned 16

    gcc and clang warn about the format string. tcc doesn't.

    The (first) printf call was apparently successful because printf
    has no way to know that the argument was of an incorrect type
    (types don't really exist at run time).

    printf() could know if an argument were of an incorrect type, if
    an implementation chose to do so.

    Sure. I did write "on my system", where printf has no way to know
    that the argument was of an incorrect type.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Sun Jun 28 20:20:43 2026
    From Newsgroup: comp.lang.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    It would make a lot more sense *to me* if an N-bit bit field were
    simply N bits.
    [...]
    Two, the C standard
    explicitly allows allocating bit-fields using a high-to-low order or
    a low-to-high order (implementation-defined choice). Presumably
    this freedom is given to accommodate both big- and little-endian
    platforms. The idea that an N-bit bit-field should simply be N bits
    doesn't work in big-endian environments. It seems better to allow
    little-endian implementations to choose a size that matches what a
    big-endian implementation would use, rather than insisting that they
    be different.

    I honestly don't understand your point here. How does making
    N-bit bit-fields N bits not work in a big-endian environment?
    Can you elaborate? Of course endianness can affect how bit-fields
    are allocated within a "storage unit".

    Perhaps you read more than I intended into my statement about N-bit
    bit-fields being "simply N bits".

    Thinking about this a bit more.

    As of C90, "A bit-field shall have a type that is a qualified or
    unqualified version of one of int, unsigned int, or signed int."
    The "shall" is outside a constraint, so an implementation could allow
    bit-fields of other types without triggering a required diagnostic,
    and many implementations did so.

    C99 added _Bool bit-fields, and explicitly allowed "some other
    implementation-defined type". C23 allows bit-fields of bit-precise
    integer types; I'll avoid thinking about that for now.

    Implementations commonly use the declared type of a bit-field to
    affect the layout, not necessarily of the bit-field itself, but
    of the containing structure. Given that the standard doesn't
    require support for types other than bool and the int types (and
    now bit-precise integer types), the idea that `short bf:1` and
    `long bf:1` have different semantics is not, as far as I can tell,
    implied by anything in the standard.

    I understand that implementations *can* allow other integer types
    in bit-field declarations, and that they can use the declared type
    in implementation-defined ways.

    One possible approach would be to use the declared type only to
    determine the signedness of the bit-field (and its conversion
    behavior in the case of bool), and the upper bound for the number
    of bits (`int bf:33` is a constraint violation if int is 32 bits).
    In this relatively simple approach, there's no point in defining
    a bit-field with one of the char or short types.
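The part of that approach that the standard does pin down (signedness and bool conversion) can be sketched as below. The struct and helper names are hypothetical, and the comment on the signed field assumes a two's complement implementation.

```c
#include <stdbool.h>

struct flags {
    signed int   s : 1;   /* two's complement: holds 0 and -1 */
    unsigned int u : 1;   /* holds 0 and 1; stores wrap modulo 2 */
    bool         b : 1;   /* any nonzero value is stored as 1 */
};

/* Hypothetical helpers: store a value in one field and read it back. */
unsigned store_unsigned(unsigned v)
{
    struct flags f = {0};
    f.u = v;              /* converted as to a 1-bit unsigned type */
    return f.u;
}

int store_bool(int v)
{
    struct flags f = {0};
    f.b = v;              /* converted to bool first: 0 or 1 */
    return f.b;
}

int store_minus_one(void)
{
    struct flags f = {0};
    f.s = -1;             /* in range for a 1-bit signed field */
    return f.s;
}
```

Storing 3 in the unsigned field reads back as 1, while storing 2 in the bool field also reads back as 1, for different reasons: the first is modulo reduction, the second is conversion to bool.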

    Using gcc on Linux, if I define a 1-bit bit-field with a 64-bit type,
    that forces the containing structure to be at least 64 bits -- but
    not by reserving a 64-bit region to hold the bit-field. If I define
    a struct containing a 1-bit unsigned long long bit-field followed by
    a 1-byte ordinary member, the second member is at a 1-byte offset.

    I had gotten the impression that the behavior is imposed by ABIs,
    but my copy of the "System V Application Binary Interface AMD64
    Architecture Processor Supplement" just says:

    - bit-fields are allocated from right to left
    - bit-fields must be contained in a storage unit appropriate for
    its declared type
    - bit-fields may share a storage unit with other struct / union
    members

    which doesn't seem to be enough to specify the behavior I see
    (and I find it annoyingly vague).

    Is there a document (ABI, compiler document, whatever) that specifies
    the (odd, to me) behavior I'm seeing?

    Here's a test program:

    #include <stdio.h>
    #include <stddef.h>
    int main(void) {
        struct s1 { unsigned char      bf:1; unsigned char c; };
        struct s2 { unsigned short     bf:1; unsigned char c; };
        struct s3 { unsigned int       bf:1; unsigned char c; };
        struct s4 { unsigned long      bf:1; unsigned char c; };
        struct s5 { unsigned long long bf:1; unsigned char c; };

        printf("%-18s %-4s %-6s %s\n",
               "type", "size", "offset", "struct-size");

        printf("%-18s %-4zu %-6zu %-1zu\n",
               "unsigned char",
               sizeof (unsigned char),
               offsetof(struct s1, c),
               sizeof (struct s1));
        printf("%-18s %-4zu %-6zu %-1zu\n",
               "unsigned short",
               sizeof (unsigned short),
               offsetof(struct s2, c),
               sizeof (struct s2));
        printf("%-18s %-4zu %-6zu %-1zu\n",
               "unsigned int",
               sizeof (unsigned int),
               offsetof(struct s3, c),
               sizeof (struct s3));
        printf("%-18s %-4zu %-6zu %-1zu\n",
               "unsigned long",
               sizeof (unsigned long),
               offsetof(struct s4, c),
               sizeof (struct s4));
        printf("%-18s %-4zu %-6zu %-1zu\n",
               "unsigned long long",
               sizeof (unsigned long long),
               offsetof(struct s5, c),
               sizeof (struct s5));
    }

    and its output on my system (Ubuntu, x86_64):

    type               size offset struct-size
    unsigned char      1    1      2
    unsigned short     2    1      2
    unsigned int       4    1      4
    unsigned long      8    1      8
    unsigned long long 8    1      8

    Again, the declared type of a bit-field doesn't affect how the
    bit-field itself is allocated. It does affect the size of the
    containing struct, but it doesn't prevent other members from being
    allocated within that space.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jun 29 05:41:10 2026
    From Newsgroup: comp.lang.c

    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    "Undefined Behavior", in C, in the manner usually discussed in
    this newsgroup, was introduced with the first standard.

    The term but not the concept, which was there since the
    early days of C -- at least since K&R in 1978, and very
    likely earlier (I haven't reviewed any of the earlier
    descriptions of the language).
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jun 29 06:27:13 2026
    From Newsgroup: comp.lang.c

    antispam@fricas.org (Waldek Hebisch) writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    antispam@fricas.org (Waldek Hebisch) writes:
    [...]
    UB in C standard corresponds with 'error' in Pascal standard. [...]

    Does it? In C a syntax error is undefined behavior, but it
    requires a diagnostic. (I don't mean to single out just syntax
    errors; there are other examples.)

    I mean typical UB, especially cases that people complain about.
    [...]

    Then you should say what you mean, rather than leaving it
    for other people to guess.
  • From cross@cross@spitfire.i.gajendra.net (Dan Cross) to comp.lang.c on Mon Jun 29 15:23:31 2026
    From Newsgroup: comp.lang.c

    In article <86mrwd7c49.fsf@linuxsc.com>,
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    cross@spitfire.i.gajendra.net (Dan Cross) writes:

    [...]

    "Undefined Behavior", in C, in the manner usually discussed in
    this newsgroup, was introduced with the first standard.

    The term but not the concept, which was there since the
    early days of C -- at least since K&R in 1978, and very
    likely earlier (I haven't reviewed any of the earlier
    descriptions of the language).

    How much time elapsed before your response?

    If you cannot respond in a timely manner (read: within a week),
    then please do not respond at all.

    That said, not really. I've read K&R, both editions, and the
    first really doesn't define a concept that gives such supreme
    latitude to the compiler. They merely acknowledged that there
    existed things for which they could not give a good behavioral
    definition. The way that UB is defined and used in 2026 was
    absent in K&R in 1978.

    - Dan C.
