• Does reading an uninitialized object have undefined behavior?

    From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Thu Jul 20 22:16:01 2023
    From Newsgroup: comp.std.c

    N3096 is the last public draft of the upcoming C23 standard.

    N3096 J.2 says:

    The behavior is undefined in the following circumstances:
    [...]
    (11) The value of an object with automatic storage duration is
    used while the object has an indeterminate representation
    (6.2.4, 6.7.10, 6.8).

    I'll use an `int` object in my example.

    Reading an object that holds a non-value representation has undefined
    behavior, but not all integer types have non-value representations
    -- and if an implementation has certain characteristics, we can
    reliably infer that int has no non-value representations (called
    "trap representations" in C99, C11, and C17).

    Consider this program:
    ```
    #include <limits.h>
    int main(void) {
        int foo;                /* deliberately left uninitialized */
        if (sizeof (int) == 4 &&
            CHAR_BIT == 8 &&
            INT_MAX == 2147483647 &&
            INT_MIN == -INT_MAX-1)
        {
            int bar = foo;      /* uses foo's indeterminate representation */
        }
    }
    ```

    If the condition is true (as it is for many real-world
    implementations), then int has no padding bits and no trap
    representations. The object `foo` has an indeterminate representation
    when it's used to initialize `bar`. Since it cannot have a non-value representation, it has an unspecified value.
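
    The padding-bit half of that inference can be checked mechanically.
    What follows is only a sketch: the helper names `int_value_bits` and
    `int_padding_bits` are hypothetical, not anything from the standard.
    It counts the value bits of `int` from INT_MAX, adds one sign bit,
    and compares the total with the object's width in bits.

    ```c
    #include <assert.h>
    #include <limits.h>

    /* Hypothetical helper: number of value bits in int. INT_MAX always
       has the form 2^N - 1, so counting its bits yields N. */
    static int int_value_bits(void) {
        int n = 0;
        for (unsigned int m = INT_MAX; m != 0; m >>= 1)
            n++;
        return n;
    }

    /* Hypothetical helper: padding = object width - value bits - sign bit. */
    static int int_padding_bits(void) {
        int width = (int)(sizeof(int) * CHAR_BIT);
        int padding = width - int_value_bits() - 1;
        assert(padding >= 0);
        return padding;
    }
    ```

    A result of zero, together with INT_MIN == -INT_MAX-1, means every
    bit pattern of `int` represents a value.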

    If J.2(11) is correct, then the use of the value results in undefined
    behavior.

    But Annex J is non-normative, and as far as I can tell there is no
    normative text in the standard that says the behavior is undefined.

    6.2.4 discusses storage duration.

    6.7.10 discusses initialization; p11 implies that the representation of
    `foo` is indeterminate. It does not say what happens when the value is
    used.

    6.8 discusses statements and blocks, and repeats that "the
    representation of objects without an initializer becomes
    indeterminate".

    None of these discuss what happens when the value of an object with
    an indeterminate representation is used -- nor does any other text
    I found by searching the standard for "indeterminate representation".

    I see no relevant changes between C11 and C23 (except that C23 changes
    the term "trap representation" to "non-value representation").

    I suggest there are three possible resolutions:

    1. J.2(11) is correct and I've missed something (always a possibility,
    but so far nobody in comp.lang.c has come up with anything).

    2. J.2(11) reflects the intent, and normative text somewhere else
    in the standard needs to be updated or added to make it clear
    that using the value of an object with automatic storage duration
    while the object has an indeterminate representation has undefined
    behavior.

    3. J.2(11) is incorrect and needs to be modified or deleted.
    (This would also imply that compilers may not perform certain
    optimizations. I have no idea whether any compilers would actually
    be affected.)

    I'm going to post this to comp.std.c and email it to the C23 editors.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Will write code for food.
    void Void(void) { Void(); } /* The recursive call of the void */
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.std.c on Fri Jul 21 16:33:53 2023
    From Newsgroup: comp.std.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    N3096 is the last public draft of the upcoming C23 standard.

    N3096 J.2 says:

    The behavior is undefined in the following circumstances:
    [...]
    (11) The value of an object with automatic storage duration is
    used while the object has an indeterminate representation
    (6.2.4, 6.7.10, 6.8).

    I'll use an `int` object in my example.

    Reading an object that holds a non-value representation has undefined
    behavior, but not all integer types have non-value representations
    -- and if an implementation has certain characteristics, we can
    reliably infer that int has no non-value representations (called
    "trap representations" in C99, C11, and C17).

    Consider this program:
    ```
    #include <limits.h>
    int main(void) {
        int foo;                /* deliberately left uninitialized */
        if (sizeof (int) == 4 &&
            CHAR_BIT == 8 &&
            INT_MAX == 2147483647 &&
            INT_MIN == -INT_MAX-1)
        {
            int bar = foo;      /* uses foo's indeterminate representation */
        }
    }
    ```

    If the condition is true (as it is for many real-world
    implementations), then int has no padding bits and no trap
    representations. The object `foo` has an indeterminate representation
    when it's used to initialize `bar`. Since it cannot have a non-value representation, it has an unspecified value.

    If J.2(11) is correct, then the use of the value results in undefined behavior.

    But Annex J is non-normative, and as far as I can tell there is no
    normative text in the standard that says the behavior is undefined.

    6.3.2.1 p2:

    "[...] If the lvalue designates an object of automatic storage
    duration that could have been declared with the register storage class
    (never had its address taken), and that object is uninitialized (not
    declared with an initializer and no assignment to it has been
    performed prior to use), the behavior is undefined."

    seems to cover it. The restriction on not having its address taken
    seems odd.
    --
    Ben.
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.std.c on Fri Jul 21 17:42:23 2023
    From Newsgroup: comp.std.c

    On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    N3096 is the last public draft of the upcoming C23 standard.

    N3096 J.2 says:

    The behavior is undefined in the following circumstances:
    [...]
    (11) The value of an object with automatic storage duration is
    used while the object has an indeterminate representation
    (6.2.4, 6.7.10, 6.8).

    Personally, I think that the root cause of this whole issue is
    the defective definition of indeterminate value.

    Indeterminacy must be an abstract concept that is not encoded
    in the bits of the object; it is a matter of provenance.

    An indeterminate integer could have a valid bit pattern,
    such as all zero, yet the implementation should be free to terminate
    with a diagnostic (or behave in other ways) when it is accessed.

    It should not be possible to tell whether an object is indeterminate
    by looking at its bits.

    An implementation can track this with metadata. Translation-time
    flow analysis can catch some uses of uninitialized objects;
    that's how we get classic uninitialized variable warnings.

    An implementation can track uninitialized bits at run time with
    hidden metadata. The Valgrind debugging tool does this; for
    every bit, whose value is necessarily always 0 or 1, it tracks
    whether the bit is initialized.
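
    The idea of hidden per-bit metadata can be sketched as a toy model.
    Everything here is an assumption made for illustration (the type
    `shadowed_u32` and the `shadow_*` helpers are hypothetical); it is
    not how Valgrind is implemented, only the same general idea of
    keeping "is this bit initialized?" outside the value's own bits.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Toy model: a 32-bit value plus a hidden mask of initialized bits.
       The mask is metadata; it is not part of the value's bit pattern. */
    typedef struct {
        uint32_t bits;     /* always some concrete pattern of 0s and 1s */
        uint32_t defined;  /* 1 = bit initialized, 0 = indeterminate    */
    } shadowed_u32;

    /* An uninitialized object: arbitrary bits, nothing marked defined. */
    static shadowed_u32 shadow_uninit(uint32_t garbage) {
        return (shadowed_u32){ .bits = garbage, .defined = 0 };
    }

    /* A store makes every bit defined. */
    static shadowed_u32 shadow_store(uint32_t v) {
        return (shadowed_u32){ .bits = v, .defined = UINT32_MAX };
    }

    /* AND: a result bit is defined if both inputs are defined there,
       or if either input has a defined 0 there. */
    static shadowed_u32 shadow_and(shadowed_u32 a, shadowed_u32 b) {
        uint32_t zero_a = a.defined & ~a.bits;
        uint32_t zero_b = b.defined & ~b.bits;
        return (shadowed_u32){
            .bits = a.bits & b.bits,
            .defined = (a.defined & b.defined) | zero_a | zero_b
        };
    }

    /* A checked read: fails if any bit is still indeterminate. */
    static bool shadow_read(shadowed_u32 x, uint32_t *out) {
        assert(out != NULL);
        if (x.defined != UINT32_MAX)
            return false;
        *out = x.bits;
        return true;
    }
    ```

    Note that `shadow_uninit(0)` holds the perfectly ordinary pattern
    all-bits-zero and is still rejected by `shadow_read`: nothing in the
    value's own bits reveals the indeterminacy.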

    That poor definition of indeterminate value should go.

    Otherwise the standard is contradicting itself and doing
    silly things like asserting that using an indeterminate value
    is undefined behavior if it is a local variable with automatic
    storage.

    A reasonable definition of indeterminate might be:

    indeterminate

    an abstract status indicating that a value is invalid,
    irrespective of the content of the bits which constitute
    that value.

    An improperly obtained value is indeterminate(1).

    A previously valid value may lapse into indeterminate status.(2)

    Any use of an indeterminate value is undefined behavior.

    --
    (1) For example, a value obtained accessing an uninitialized
    object defined in automatic storage, or in an uninitialized
    region of memory obtained from malloc

    (2) For example, a pointer to an object becomes indeterminate
    if that object is deallocated.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Fri Jul 21 11:56:00 2023
    From Newsgroup: comp.std.c

    Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    N3096 is the last public draft of the upcoming C23 standard.

    N3096 J.2 says:

    The behavior is undefined in the following circumstances:
    [...]
    (11) The value of an object with automatic storage duration is
    used while the object has an indeterminate representation
    (6.2.4, 6.7.10, 6.8).

    I'll use an `int` object in my example.

    Reading an object that holds a non-value representation has undefined
    behavior, but not all integer types have non-value representations
    -- and if an implementation has certain characteristics, we can
    reliably infer that int has no non-value representations (called
    "trap representations" in C99, C11, and C17).

    Consider this program:
    ```
    #include <limits.h>
    int main(void) {
        int foo;                /* deliberately left uninitialized */
        if (sizeof (int) == 4 &&
            CHAR_BIT == 8 &&
            INT_MAX == 2147483647 &&
            INT_MIN == -INT_MAX-1)
        {
            int bar = foo;      /* uses foo's indeterminate representation */
        }
    }
    ```

    If the condition is true (as it is for many real-world
    implementations), then int has no padding bits and no trap
    representations. The object `foo` has an indeterminate representation
    when it's used to initialize `bar`. Since it cannot have a non-value
    representation, it has an unspecified value.

    If J.2(11) is correct, then the use of the value results in undefined
    behavior.

    But Annex J is non-normative, and as far as I can tell there is no
    normative text in the standard that says the behavior is undefined.

    6.3.2.1 p2:

    "[...] If the lvalue designates an object of automatic storage
    duration that could have been declared with the register storage class
    (never had its address taken), and that object is uninitialized (not
    declared with an initializer and no assignment to it has been
    performed prior to use), the behavior is undefined."

    seems to cover it. The restriction on not having its address taken
    seems odd.

    Good find.

    That sentence was added in C11 (it doesn't appear in C99 or in
    N1256, which consists of C99 plus the three Technical Corrigenda)
    in response to DR #338. Since the wording in Annex J goes back to
    C99 in its current form, and to C90 in a slightly different form,
    that can't be what Annex J is referring to. And the statement
    in Annex J is more general, so we can't quite use 6.3.2.1p2 as a
    retroactive justification.

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm

    Yes, that restriction does seem strange. It was inspired by the
    IA64 (Itanium) architecture, which has an extra trap bit in each
    CPU register (NaT, "not a thing"). The "could have been declared
    with the register storage class" wording is there because the IA64
    NaT bit exists only in CPU registers, not in memory.

    An object with automatic storage duration might be stored in an IA64
    CPU register. If the object is not initialized, the register's
    NaT bit would be set. Any attempt to read it would cause a trap.
    Writing it would clear the NaT bit.

    Which means that a hypothetical CPU with something like a NaT bit
    on each word of memory (iAPX 432? i960?) might cause a trap in
    circumstances not covered by that wording -- but it *is* covered
    by the wording in Annex J.

    (Normally, an object whose address is taken can still be stored in
    a CPU register for part of its lifetime. The effect is to forbid
    certain optimizations on IA64-like systems.)

    It's tempting to conclude that reading an uninitialized automatic
    object whose address is taken is *not* undefined behavior (https://en.wikipedia.org/wiki/Exception_that_proves_the_rule),
    but the standard doesn't say so.

    C90's Annex G (renamed to Annex J in later editions) says:

    The behavior in the following circumstances is undefined:
    [...]
    - The value of an uninitialized object that has automatic storage
    duration is used before a value is assigned (6.5.7).

    6.5.7 discusses initialization, but doesn't say that reading an
    uninitialized object has undefined behavior, so the issue is an old one.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Will write code for food.
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.std.c on Fri Jul 21 20:54:36 2023
    From Newsgroup: comp.std.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    N3096 is the last public draft of the upcoming C23 standard.

    N3096 J.2 says:

    The behavior is undefined in the following circumstances:
    [...]
    (11) The value of an object with automatic storage duration is
    used while the object has an indeterminate representation
    (6.2.4, 6.7.10, 6.8).

    I'll use an `int` object in my example.

    Reading an object that holds a non-value representation has undefined
    behavior, but not all integer types have non-value representations
    -- and if an implementation has certain characteristics, we can
    reliably infer that int has no non-value representations (called
    "trap representations" in C99, C11, and C17).

    Consider this program:
    ```
    #include <limits.h>
    int main(void) {
        int foo;                /* deliberately left uninitialized */
        if (sizeof (int) == 4 &&
            CHAR_BIT == 8 &&
            INT_MAX == 2147483647 &&
            INT_MIN == -INT_MAX-1)
        {
            int bar = foo;      /* uses foo's indeterminate representation */
        }
    }
    ```

    If the condition is true (as it is for many real-world
    implementations), then int has no padding bits and no trap
    representations. The object `foo` has an indeterminate representation
    when it's used to initialize `bar`. Since it cannot have a non-value
    representation, it has an unspecified value.

    If J.2(11) is correct, then the use of the value results in undefined
    behavior.

    But Annex J is non-normative, and as far as I can tell there is no
    normative text in the standard that says the behavior is undefined.

    6.3.2.1 p2:

    "[...] If the lvalue designates an object of automatic storage
    duration that could have been declared with the register storage class
    (never had its address taken), and that object is uninitialized (not
    declared with an initializer and no assignment to it has been
    performed prior to use), the behavior is undefined."

    seems to cover it. The restriction on not having its address taken
    seems odd.

    Good find.

    That sentence was added in C11 (it doesn't appear in C99 or in
    N1256, which consists of C99 plus the three Technical Corrigenda)
    in response to DR #338. Since the wording in Annex J goes back to
    C99 in its current form, and to C90 in a slightly different form,
    that can't be what Annex J is referring to. And the statement
    in Annex J is more general, so we can't quite use 6.3.2.1p2 as a
    retroactive justification.

    Thanks for looking into the history. I was going to do that when I had
    some time.

    There are three relevant clauses in Annex J, and I think we should keep
    them all in mind. Sadly, they are not numbered (until C23) so I've
    given them 'UB' numbers taken from the similar wording in C23.

    — The value of an object with automatic storage duration is used while
    it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]

    — A trap representation is read by an lvalue expression that does not
    have character type (6.2.6.1). [UB-12]

    — An lvalue designating an object of automatic storage duration that
    could have been declared with the register storage class is used in
    a context that requires the value of the designated object, but the
    object is uninitialized. (6.3.2.1). [UB-20]

    Clearly, UB-20 is explained by the quote I posted, but UB-11 (the one we
    are talking about) is there as well and, as you say, can't be fully
    explained by that normative quote.

    https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm

    Yes, that restriction does seem strange. It was inspired by the
    IA64 (Itanium) architecture, which has an extra trap bit in each
    CPU register (NaT, "not a thing"). The "could have been declared
    with the register storage class" wording is there because the IA64
    NaT bit exists only in CPU registers, not in memory.

    Thanks. I wondered if it might have been some hardware consideration...

    An object with automatic storage duration might be stored in an IA64
    CPU register. If the object is not initialized, the register's
    NaT bit would be set. Any attempt to read it would cause a trap.
    Writing it would clear the NaT bit.

    Which means that a hypothetical CPU with something like a NaT bit
    on each word of memory (iAPX 432? i960?) might cause a trap in
    circumstances not covered by that wording -- but it *is* covered
    by the wording in Annex J.

    It's covered by UB-12 and that's backed up by normative text,
    specifically paragraph 5 of the section cited in UB-12.

    (Normally, an object whose address is taken can still be stored in
    a CPU register for part of its lifetime. The effect is to forbid
    certain optimizations on IA64-like systems.)

    It's tempting to conclude that reading an uninitialized automatic
    object whose address is taken is *not* undefined behavior (https://en.wikipedia.org/wiki/Exception_that_proves_the_rule),
    but the standard doesn't say so.

    But it doesn't say that it is UB either, does it? That case is excluded
    in 6.3.2.1 p2, but there's nothing else covering it but the non-normative
    Annex J.

    C90's Annex G (renamed to Annex J in later editions) says:

    The behavior in the following circumstances is undefined:
    [...]
    - The value of an uninitialized object that has automatic storage
    duration is used before a value is assigned (6.5.7).

    6.5.7 discusses initialization, but doesn't say that reading an
    uninitialized object has undefined behavior, so the issue is an old one.
    --
    Ben.
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Fri Jul 21 14:26:20 2023
    From Newsgroup: comp.std.c

    Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    There are three relevant clauses in Annex J, and I think we should keep
    them all in mind. Sadly, they are not numbered (until C23) so I've
    given them 'UB' numbers taken from the similar wording in C23.

    — The value of an object with automatic storage duration is used while
    it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]

    — A trap representation is read by an lvalue expression that does not
    have character type (6.2.6.1). [UB-12]

    — An lvalue designating an object of automatic storage duration that
    could have been declared with the register storage class is used in
    a context that requires the value of the designated object, but the
    object is uninitialized. (6.3.2.1). [UB-20]
    [...]
    An object with automatic storage duration might be stored in an IA64
    CPU register. If the object is not initialized, the register's
    NaT bit would be set. Any attempt to read it would cause a trap.
    Writing it would clear the NaT bit.

    Which means that a hypothetical CPU with something like a NaT bit
    on each word of memory (iAPX 432? i960?) might cause a trap in
    circumstances not covered by that wording -- but it *is* covered
    by the wording in Annex J.

    It's covered by UB-12 and that's backed up by normative text,
    specifically paragraph 5 of the section cited in UB-12.

    I don't think so. A "non-value representation" (formerly a "trap
    representation") is determined by the bits making up the representation
    of an object. For an integer type, such a representation can occur only
    if the type has padding bits. The IA64 NaT bit is not part of the
    representation; it's neither a value bit nor a padding bit.

    For a 64-bit integer type, given CHAR_BIT==8, its *representation* is
    defined as a set of 8 bytes that can be copied into an object of type
    `unsigned char[8]`. The NaT bit does not contribute to the size of the
    object.
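
    As an illustration of what "representation" means here, the
    following sketch copies a 64-bit integer's bytes into an
    `unsigned char` array and back. The helper name
    `roundtrip_representation` is hypothetical, and `uint64_t` is
    assumed to be available (it is an optional type in the standard).

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdint.h>
    #include <string.h>

    /* Copy the object representation of x into a byte array and back.
       Only the sizeof(x) bytes of the object take part; nothing outside
       the object (such as a register's NaT bit) is involved. */
    static uint64_t roundtrip_representation(uint64_t x) {
        unsigned char buf[sizeof x];
        uint64_t y;

        memcpy(buf, &x, sizeof x);   /* object -> unsigned char[] */
        memcpy(&y, buf, sizeof y);   /* unsigned char[] -> object */

        /* Same bytes, hence the same representation and value. */
        assert(memcmp(&x, &y, sizeof x) == 0);
        return y;
    }
    ```

    With CHAR_BIT == 8 the array has exactly 8 elements, matching the
    `unsigned char[8]` described above.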

    I think the right way for C to permit NaT-like bits is, as Kaz
    suggested, to define "indeterminate value" in terms of provenance,
    not just the bits that make up its current representation.
    An automatic object with no initialization, or a malloc()ed object,
    starts with an indeterminate value, and accessing that value
    (other than as an array of characters) has undefined behavior.
    (This is a proposal, not what the standard currently says.)
    IA64 happens to have a way of (partially) representing that
    provenance in hardware, outside the object in question. Other or
    future architectures might do a more complete job.

    [...]
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Will write code for food.
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.std.c on Fri Jul 21 23:39:42 2023
    From Newsgroup: comp.std.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    [...]
    There are three relevant clauses in Annex J, and I think we should keep
    them all in mind. Sadly, they are not numbered (until C23) so I've
    given them 'UB' numbers taken from the similar wording in C23.

    — The value of an object with automatic storage duration is used while
    it is indeterminate (6.2.4, 6.7.9, 6.8). [UB-11]

    — A trap representation is read by an lvalue expression that does not
    have character type (6.2.6.1). [UB-12]

    — An lvalue designating an object of automatic storage duration that
    could have been declared with the register storage class is used in
    a context that requires the value of the designated object, but the
    object is uninitialized. (6.3.2.1). [UB-20]
    [...]
    An object with automatic storage duration might be stored in an IA64
    CPU register. If the object is not initialized, the register's
    NaT bit would be set. Any attempt to read it would cause a trap.
    Writing it would clear the NaT bit.

    Which means that a hypothetical CPU with something like a NaT bit
    on each word of memory (iAPX 432? i960?) might cause a trap in
    circumstances not covered by that wording -- but it *is* covered
    by the wording in Annex J.

    It's covered by UB-12 and that's backed up by normative text,
    specifically paragraph 5 of the section cited in UB-12.

    I don't think so. A "non-value representation" (formerly a "trap
    representation") is determined by the bits making up the representation
    of an object. For an integer type, such a representation can occur only
    if the type has padding bits. The IA64 NaT bit is not part of the
    representation; it's neither a value bit nor a padding bit.

    For a 64-bit integer type, given CHAR_BIT==8, its *representation* is
    defined as a set of 8 bytes that can be copied into an object of type
    `unsigned char[8]`. The NaT bit does not contribute to the size of the
    object.

    Ah, right. I thought you were including it as a padding bit.

    I think the right way for C to permit NaT-like bits is, as Kaz
    suggested, to define "indeterminate value" in terms of provenance,
    not just the bits that make up its current representation.
    An automatic object with no initialization, or a malloc()ed object,
    starts with an indeterminate value, and accessing that value
    (other than as an array of characters) has undefined behavior.
    (This is a proposal, not what the standard currently says.)
    IA64 happens to have a way of (partially) representing that
    provenance in hardware, outside the object in question. Other or
    future architectures might do a more complete job.

    [...]

    That would work.
    --
    Ben.
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.std.c on Sat Jul 22 06:40:39 2023
    From Newsgroup: comp.std.c

    On 2023-07-21, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    6.3.2.1 p2:

    "[...] If the lvalue designates an object of automatic storage
    duration that could have been declared with the register storage class
    (never had its address taken), and that object is uninitialized (not
    declared with an initializer and no assignment to it has been
    performed prior to use), the behavior is undefined."

    seems to cover it. The restriction on not having its address taken
    seems odd.

    Wording like that looks like someone's solo documentation effort,
    not peer-reviewed by an expert committee.

    That looks as if the intent is to allow some diagnoses of uses of
    uninitialized variables, while discouraging others.

    However, it doesn't seem a good idea to be constraining
    implementations in how clever they can be in identifying
    an erroneous situation.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Martin Uecker@ma.uecker@gmail.com to comp.std.c on Sat Jul 22 06:03:53 2023
    From Newsgroup: comp.std.c

    On Saturday, July 22, 2023 at 8:40:42 AM UTC+2, Kaz Kylheku wrote:
    On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
    6.3.2.1 p2:

    "[...] If the lvalue designates an object of automatic storage
    duration that could have been declared with the register storage class
    (never had its address taken), and that object is uninitialized (not
    declared with an initializer and no assignment to it has been
    performed prior to use), the behavior is undefined."

    seems to cover it. The restriction on not having its address taken
    seems odd.

    Wording like that looks like someone's solo documentation effort,
    not peer-reviewed by an expert committee.

    That looks as if the intent is to allow some diagnoses of uses of
    uninitialized variables, while discouraging others.

    However, it doesn't seem a good idea to be constraining
    implementations in how clever they can be in identifying
    an erroneous situation.

    I personally like this rule (but I am speaking only for myself; there
    is no full consensus about the exact interpretation of the standard
    nor about what it should say). I will try to explain why.

    In C, we can also access objects using character pointers. This
    should work in all cases, even for non-value (trap) representations,
    and is used a lot in practice to copy uninitialized or partially
    initialized objects. If one makes all reads of objects with
    indeterminate representation have undefined behavior, then
    this would not work anymore.

    If one wants to allow this (and a lot of real-world programs rely
    on this), then one has to invent rules for how this works with an
    abstract (provenance-based) notion of indeterminate values.
    This turns out to be difficult.

    But if we keep this rule, it becomes very simple: On the one
    hand, all reads of uninitialized automatic variables whose
    address is not taken are undefined behavior. This is the most
    useful behavior for detecting bugs and/or optimization.

    On the other hand, taking an address and working with a character
    pointer to copy or manipulate an object is always defined; one
    simply gets unspecified representation bytes (which may be
    a non-value representation for some type, and it is UB to
    read them using an lvalue of this type). So low-level operations
    with partially initialized objects work as expected without having
    to introduce complicated rules.

    It will cost a tiny bit of optimization opportunities, but avoid
    a lot of trouble.
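
    A small sketch of the low-level pattern described above, under the
    reading of the standard given in this post (which, as noted, is not
    a matter of full consensus). The struct `record` and the function
    `copy_partial_and_get_id` are hypothetical names used only for
    illustration.

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct record {
        int id;
        int scratch;   /* deliberately never written below */
    };

    /* Copy a partially initialized automatic object byte by byte.
       The address of src is taken, so the 6.3.2.1p2 wording about
       objects that "could have been declared with the register storage
       class" does not apply; the loop just moves unspecified bytes. */
    static int copy_partial_and_get_id(int id) {
        struct record src;              /* automatic, no initializer */
        struct record dst;
        src.id = id;                    /* src.scratch stays unwritten */

        const unsigned char *from = (const unsigned char *)&src;
        unsigned char *to = (unsigned char *)&dst;
        for (size_t i = 0; i < sizeof src; i++)
            to[i] = from[i];            /* character-type access only */

        /* Only the initialized member is read through its own type. */
        assert(dst.id == src.id);
        return dst.id;
    }
    ```

    Reading `dst.scratch` as an `int` afterwards is the part that
    remains questionable; the byte copy itself is the case argued above
    to be defined.
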
    Martin
  • From Jakob Bohm@jb-usenet@wisemo.com.invalid to comp.std.c on Mon Jul 24 07:53:59 2023
    From Newsgroup: comp.std.c

    On 2023-07-21 19:42, Kaz Kylheku wrote:
    On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    N3096 is the last public draft of the upcoming C23 standard.

    N3096 J.2 says:

    The behavior is undefined in the following circumstances:
    [...]
    (11) The value of an object with automatic storage duration is
    used while the object has an indeterminate representation
    (6.2.4, 6.7.10, 6.8).

    Personally, I think that the root cause of this whole issue is
    the defective definition of indeterminate value.


    The problem is much deeper than that. It all boils down to the
    obsession in the official C community to abuse the concept of
    "undefined" to cover everything from "arbitrary natural semantics
    of the hardware" to "optimizing away code unexpectedly". It would
    be highly beneficial for a cleanup in C30 or even a corrective TR to
    split up the concept into explicit cases that vary for each
    situation. For example, runtime error reporting should be very
    different from optimizing away code that may encounter runtime
    errors on different hardware than the one it is actually run on.

    From a simplified conceptual machine model that resembles a modern
    von Neumann architecture with only floating point types having
    actual trap representations, a lot of rules that have at various
    times been rephrased using the word "undefined" seem utterly absurd,
    and applying the current meaning of "undefined" back to the
    actual machines that inspired them will tend to cause even more absurdities.

    For example, the ability of the IA64 CPUs to raise an actual trap
    exception in response to reading an uninitialized register is very
    different from aggressively optimizing away code that might use an
    unknown stray value, especially with the aggressive optimization
    settings required by the IA64 Explicitly Parallel design.


    Some of the things that "undefined" in the current text could map
    to:

    - anyof(A,B,C) = An implementation specific and possibly uncontrolled
    choice between A, B and C (with no others permitted).
    - Continuing as if nothing happened
    - Aborting execution, possibly with an error indication.
    - raise(X) where X is specified in the standard.
    - An implementation specific value to be listed in the
    implementation documentation.
    - A standard specified value.
    - Executing machine code at a specified memory address in accordance
    with the actual machine behavior (This is common for calling
    a function pointer that isn't set to a C function of proper type).
    - Causing the code to be eliminated (think assume(0);)
    - Reserved for future standardization in future editions.
    - Reserved for standardization in other ISO documents (such as POSIX
    or C++).
    - Reserved for implementation specific behavior to be listed in the
    implementation documentation.

    For example, the effect of calling assert() with a false value is
    "anyof(continuing as if nothing, abort with error)", with it being
    implementation defined how to force either choice (many
    implementations will use the status of the DEBUG define).


    There should also be a way for limits.h (one of the few headers
    required in free-standing implementations) to specify via new
    standard defines if the implementation conforms to common sets
    of implementation specific behaviors such as "twos complement int
    with wraparound", "ones complement int with wraparound", "sign
    and magnitude int with wraparound", "unsigned with wraparound",
    "IEEE nnnn floating point with/without overflow exceptions",
    "negative int division by positive int rounds towards zero"
    (and the other possibilities for division special cases) etc. etc.
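
    Some of these properties can already be checked at compile time with
    what <limits.h> provides today. The following is only a sketch,
    assuming a C11 or later compiler (for static_assert); it covers
    representation and division, but nothing in it can detect whether
    signed overflow wraps around. The function name is hypothetical.

    ```c
    #include <assert.h>   /* static_assert (C11) */
    #include <limits.h>

    /* Only two's complement has a most negative int one below -INT_MAX. */
    static_assert(INT_MIN == -INT_MAX - 1, "int is not two's complement");

    /* Unsigned arithmetic is required to wrap modulo UINT_MAX + 1. */
    static_assert(0u - 1u == UINT_MAX, "unsigned int does not wrap");

    /* Since C99, integer division truncates toward zero. */
    static_assert(-7 / 2 == -3 && -7 % 2 == -1, "division does not truncate");

    /* The same two's complement test as a run-time query. */
    int int_is_twos_complement(void)
    {
        return INT_MIN == -INT_MAX - 1;
    }
    ```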


    Enjoy

    Jakob
    --
    Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
    Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
    This public discussion message is non-binding and may contain errors.
    WiseMo - Remote Service Management for PCs, Phones and Embedded
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Tue Jul 25 21:53:06 2023
    From Newsgroup: comp.std.c

    Martin Uecker <ma.uecker@gmail.com> writes:

    On Saturday, July 22, 2023 at 8:40:42 AM UTC+2, Kaz Kylheku wrote:

    On 2023-07-21, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:

    6.3.2.1 p2:

    "[...] If the lvalue designates an object of automatic storage
    duration that could have been declared with the register storage
    class (never had its address taken), and that object is
    uninitialized (not declared with an initializer and no
    assignment to it has been performed prior to use), the behavior
    is undefined."

    seems to cover it. The restriction on not having its address
    taken seems odd.

    [...]

    I personally like this rule (but I am speaking only for myself; there is
    no full consensus about the exact interpretation of the standard
    nor about what it should say). I will try to explain why. [...]

    It's a good rule. I agree with your comments. I guess it's
    possible the wording could be improved, but compared to other
    parts of the C standard the clarity of this passage is closer to
    the top than it is to the bottom.
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Tue Jul 25 21:57:20 2023
    From Newsgroup: comp.std.c

    Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

    On 2023-07-21 19:42, Kaz Kylheku wrote:

    On 2023-07-21, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:

    N3096 is the last public draft of the upcoming C23 standard.

    N3096 J.2 says:

    The behavior is undefined in the following circumstances:
    [...]
    (11) The value of an object with automatic storage duration is
    used while the object has an indeterminate representation
    (6.2.4, 6.7.10, 6.8).

    Personally, I think that the root cause of this whole issue is
    the defective definition of indeterminate value.

    The problem is much deeper than that. It all boils down to the
    obsession in the official C community to abuse the concept of
    "undefined" to cover everything from "arbitrary natural semantics
    of the hardware" to "optimizing away code unexpectedly" . [...]

    This discussion looks interesting but it seems better that
    there be a separate thread to take it up.
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Thu Aug 3 13:13:26 2023
    From Newsgroup: comp.std.c

    Repeating the question stated in the Subject line:

    Does reading an uninitialized object [always] have undefined
    behavior?

    Background: Annex J part 2 says (in various phrasings in
    different revisions of the C standard, with the one below
    being taken from C90):

    The value of an uninitialized object that has automatic
    storage duration is used before a value is assigned [is
    undefined behavior] (6.5.7)

    Remembering that Annex J is informative rather than normative,
    is this statement right even for a type that has no trap
    representations? To ask that question another way, is this
    statement always right or is it just a (perhaps useful)
    approximation?

    I think this question can be answered convincingly by reviewing
    the subject's history in each revision of the ISO C standard.


    We start in C90.

    In C90 reading the value of an uninitialized object is always
    undefined behavior (and that includes malloc()ed storage as well
    as automatic storage duration objects). The C90 standard says,
    in 6.5.7:

    If an object that has automatic storage duration is not
    initialized explicitly, its value is indeterminate.

    and in 7.10.3.3:

    The malloc function allocates space for an object whose size
    is specified by size and whose value is indeterminate.

    The term "indeterminate" is not defined in C90, but accessing
    storage that is indeterminate is explicitly undefined behavior.
    Indeed such uses are part of the /definition/ of undefined
    behavior - C90 says in 3.16 (which is an entry in Definitions):

    undefined behavior: Behavior, upon use of a nonportable or
    erroneous program construct, of erroneous data, or of
    indeterminately valued objects, for which this International
    Standard imposes no requirements.

    So for C90 we have a clear answer: always undefined behavior for
    accessing any uninitialized object.

    Unfortunately the C90 scheme has some serious issues. There is
    no exception for reading using a character type. More seriously,
    although C90 gives some situations that cause values to be
    indeterminate, it doesn't say anything about making them /not/
    be indeterminate. We can guess (but only guess) that assigning
    a value to the object as a whole removes indeterminate-ness, but
    what about these cases (and other similar ones):

    int x;
    *(char*)&x = 0;
    // is the value of x now indeterminate or not?

    struct { int x, y; } s;
    s.x = 0;
    // is the value of s now indeterminate or not?

    Again, we can make guesses about what these answers should be,
    but the C90 standard doesn't say. Clearly C90 has some
    significant deficiencies.


    Next we look at C99.

    (Actually, before we do that, I should mention that C90 was
    amended and corrected in 1994, 1995, and 1996, by the three
    intermediate documents ISO/IEC 9899/COR1, ISO/IEC 9899/AMD1, and
    ISO/IEC 9899/COR2. As far as I am aware these revisions have no
    bearing on the matter at hand.)

    The C99 standard represents a substantial revision and expansion
    of the C90 standard. The relationship between uninitialized
    memory and undefined behavior is nearly completely rewritten, and
    also made more concrete. There's lots to look at here. Starting
    at the top, the definition of undefined behavior is revised not
    to give any mention of indeterminately valued objects. Here is
    section 3.4.3 paragraph 1:

    undefined behavior
    behavior, upon use of a nonportable or erroneous program
    construct or of erroneous data, for which this International
    Standard imposes no requirements

    (Incidentally the section and paragraph references given in this
    part of the discussion are relative to the ISO N1256 document.)

    The next most prominent change is that "indeterminate value" is
    explicitly defined, in section 3.17.2 paragraph 1:

    indeterminate value
    either an unspecified value or a trap representation

    This definition makes use of two new terms, "unspecified value"
    and "trap representation", that were not used in C90. The term
    unspecified value is defined immediately following, in 3.17.3 p1:

    unspecified value
    valid value of the relevant type where this International
    Standard imposes no requirements on which value is chosen in
    any instance

    There is also an informative note in p2:

    NOTE An unspecified value cannot be a trap representation.

    The term "trap representation" is defined in 6.2.6.1 p5:

    Certain object representations need not represent a value of
    the object type. If the stored value of an object has such a
    representation and is read by an lvalue expression that does
    not have character type, the behavior is undefined. If such
    a representation is produced by a side effect that modifies
    all or any part of the object by an lvalue expression that
    does not have character type, the behavior is undefined.41)
    Such a representation is called a /trap representation/.

    The slant characters around "trap representation" indicate
    italics, which the C standard uses to denote a term being
    defined. Also there is a '41)' footnote reference

    41) Thus, an automatic variable can be initialized to a trap
    representation without causing undefined behavior, but the
    value of the variable cannot be used until a proper value is
    stored in it.

    which underscores the non-undefined-behavior aspect of using
    character types to change the object representation (and hence
    the value) of an object.
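
    As an illustration of the character-type exception, here is a small
    sketch that inspects an object representation one byte at a time
    through unsigned char. An access of this kind is outside the undefined
    behavior described in 6.2.6.1 p5, whatever the bytes mean for the
    object's own type. The function name is invented for this example.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Count the 1 bits in the object representation of any object,
       reading it only through lvalues of type unsigned char. */
    size_t representation_popcount(const void *obj, size_t size)
    {
        const unsigned char *bytes = obj;
        size_t count = 0;
        for (size_t i = 0; i < size; i++)
            for (unsigned char b = bytes[i]; b != 0; b >>= 1)
                count += b & 1u;
        return count;
    }
    ```

    For example, representation_popcount(&x, sizeof x) examines every byte
    of x, padding included, without performing an lvalue conversion of x
    using its own type.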

    The C99 text doesn't use the term "trap representation" very
    often. There are several cases where certain types are ruled out
    from having trap representations; a few cases where a result
    /might be/ a trap representation; and a case involving integer
    types where there is an implementation-defined choice as to
    whether a specific combination of value bits is a valid value or
    a trap representation. Also, in Annex J part 2, the list of
    undefined behaviors, there are these summary items:

    A trap representation is read by an lvalue expression that
    does not have character type (6.2.6.1).

    A trap representation is produced by a side effect that
    modifies any part of the object using an lvalue expression
    that does not have character type (6.2.6.1).

    which of course correspond directly to what is said in the
    definition of trap representation. Based on various passages in
    section 6.2.6, which describes the representation of types, we
    can deduce that for some integer types all bit combinations must
    be a valid value, and so no trap representations are possible for
    those types. Such types always include 'unsigned char', and may
    also include other integer types depending on the size of the
    type, the value of CHAR_BIT, and the values given in <limits.h>
    for the range of the type in question. (More concretely, if the
    set of distinct values for type T has 2**(sizeof(T)*CHAR_BIT)
    elements, then all object representations are valid values, and
    thus type T cannot have any trap representations.)
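
    The counting argument can be sketched in code. This is a sufficient
    test only, under the reasoning given above: it reports success when
    the number of distinct values of the type equals the number of bit
    patterns the object can hold. The function names are hypothetical.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stddef.h>

    /* UINT_MAX has the form 2**N - 1, so counting its bits gives N, the
       number of value bits. No padding bits means N fills the object. */
    int uint_has_no_padding(void)
    {
        size_t bits = 0;
        for (unsigned int m = UINT_MAX; m != 0; m >>= 1)
            bits++;
        return bits == sizeof(unsigned int) * CHAR_BIT;
    }

    /* For int: one sign bit plus the value bits of INT_MAX must fill the
       object, and INT_MIN must use the one remaining bit pattern. */
    int int_has_no_trap_representations(void)
    {
        size_t bits = 1;                      /* the sign bit */
        for (int m = INT_MAX; m != 0; m >>= 1)
            bits++;
        return bits == sizeof(int) * CHAR_BIT && INT_MIN == -INT_MAX - 1;
    }
    ```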

    There are three points worth mentioning regarding unspecified
    values and trap representations. One is that unspecified values
    are always valid values, and never by themselves cause undefined
    behavior. Two is that the distinction between an unspecified
    value and a trap representation depends on the type used to
    access the object. Three is that, once we know the type of an
    access, whether a given object holds a valid value or a trap
    representation depends only on the bits and bytes that make up
    the object representation of the object, and in particular not on
    any hidden "magic" state associated with the object. (There is
    one case though that deserves a closer look, which is explained
    further on.)

    The rule for trap representations is simple and clear: any
    access of an object whose object representation is a trap
    representation of the access's type is undefined behavior, and
    this consequence is accurately portrayed in Annex J part 2.

    Having settled the question for trap representations, how about
    indeterminate values?

    Ruling out the definition and an entry in the index, the term
    "indeterminate value" (or values plural) appears in just six
    places in the C99 standard: three in informative passages
    (usually examples), and three normative passages, those being
    6.7.8 paragraph 9 (about unnamed members), 6.8 paragraph 3 (about
    declarations for objects with automatic storage duration), and
    7.20.3.4 paragraph 2 (about bytes added by a call to realloc()).
    The sentence in 6.8 paragraph 3 deserves quoting:

    The initializers of objects that have automatic storage
    duration, and the variable length array declarators of
    ordinary identifiers with block scope, are evaluated and the
    values are stored in the objects (including storing an
    indeterminate value in objects without an initializer) each
    time the declaration is reached in the order of execution, as
    if it were a statement, and within each declaration in the
    order that declarators appear.

    Section 7 has many places where the word "indeterminate" appears
    without being followed by "value". I think most of these can be
    safely skipped over, but the description of malloc() deserves
    quoting (it is 7.20.3.3 paragraph 2):

    The malloc function allocates space for an object whose size
    is specified by size and whose value is indeterminate.

    Presumably the sentence here is meant to express the same idea
    as the parallel passage describing the results from realloc(),
    which says (in 7.20.3.4 paragraph 2):

    Any bytes in the new object beyond the size of the old object
    have indeterminate values.

    The word "indeterminate" without being followed by "value"
    is used in just six other places in the standard: five in the
    main body (all of which are part of normative text), plus one
    entry in Annex J part 2 (which is of course informative). The
    normative uses may be seen to be in two categories, as follows.

    Four of the five normative uses are basically restatements of the
    long sentence from 6.8 paragraph 3; they are in 6.2.4 paragraph 5
    (two uses) and paragraph 6, and 6.7.8 paragraph 10. Here are
    excerpts showing these four occurrences (all of which refer to
    objects with automatic storage duration):

    The initial value of the object is indeterminate.

    [if an object had no initializer] the value becomes
    indeterminate each time the declaration is reached.

    The initial value of the object is indeterminate.

    If an object that has automatic storage duration is not
    initialized explicitly, its value is indeterminate.

    Although these passages use different phrasing, it seems clear
    they are meant to mirror the parenthetical phrase in 6.8 p3,
    "storing an indeterminate value in objects without an
    initializer"; presumably the difference in phrasing simply
    reflects the styles of the respective sections: 6.8 gives an
    imperative description, whereas 6.2.4 and 6.7 tend to be more
    declarative in style. (The last of these excerpts matches
    word-for-word with the analogous sentence in C90.) That the C99
    standard considers these five passages as expressing the same
    idea can be seen by them all being referenced in a single entry
    given in Annex J part 2:

    The value of an object with automatic storage duration is
    used while it is indeterminate (6.2.4, 6.7.8, 6.8).

    Compare this text with the corresponding entry in C90. One
    reason for the difference is that in C99, unlike in C90, an
    object can become "unassigned" after it is first assigned (which
    is a consequence in C99 of being able to mix declarations and
    statements). So rather than say "before a value is assigned"
    the C99 standard says "while it is indeterminate".

    The one other place where the word "indeterminate" is used
    without being followed by "value" is in 6.2.4 paragraph 2:

    The value of a pointer becomes indeterminate when the object
    it points to reaches the end of its lifetime.

    (The analogous sentence in C90 says basically the same but using
    different phrasing, partly because C90 doesn't have any explicit
    definition of "lifetime", which of course C99 does.)

    There is a corresponding entry for this passage in Annex J part 2
    (and which actually doesn't use the word indeterminate):

    The value of a pointer to an object whose lifetime has ended
    is used (6.2.4).

    There is a subtle but important difference between this rule and
    the other passages mentioned above. In all of the other cases
    there is a specific object being referenced. In the rule here,
    we aren't talking about a particular object, nor even just one
    object necessarily (there could be many), but possibly about
    values that aren't in an object at all. Consider this code
    fragment:

    char *p = malloc( 1 );
    char *q = p + (free(p),0);

    It seems clear that the second line is meant to be undefined
    behavior /even if the (leftmost) access of p has already taken
    place before the call to free() is done/. It isn't an access to
    an object (whether indeterminate or not) that is causing the
    problem. Rather, it is the use of a value -- valid at the time
    the value was obtained -- that has been rendered /invalid/
    between the time the value was loaded from p and the time the
    value is used in a '+' operation.

    Of course, we all understand what's really going on here. In
    real computer hardware, the bits of a pointer value don't
    magically change when a free() is done (or when an object goes
    out of scope and its lifetime ends, etc). Instead, the bits stay
    the same, but whether the bits are meaningful or not (or whether
    they have the same meaning as before) depends on the state of the
    "memory system" as a whole. The term "memory system" is in
    quotes because it is meant to include not just state in the
    actual hardware but also assumptions made by the compiled code;
    a pointer to memory in a departed stack frame may be perfectly
    fine as far as the hardware is concerned, but it violates an
    assumption made by the compiler that the associated memory may
    be (or already have been) reused for another purpose.
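
    For contrast with the fragment above, here is a sketch in which every
    use of the pointer value is finished before free() is called, so no
    stale value is ever used. The function name and the choice of a
    one-byte allocation are purely illustrative.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Returns an offset computed from p while the allocation is still
       alive, or -1 if the allocation failed. */
    long offset_before_free(void)
    {
        char *p = malloc(1);
        if (p == NULL)
            return -1;
        char *q = p + 1;           /* one past the end: valid while *p lives */
        ptrdiff_t offset = q - p;  /* last use of the values of p and q */
        free(p);                   /* the values of p and q are now indeterminate */
        p = q = NULL;              /* storing a new value is still permitted */
        return (long)offset;
    }
    ```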

    One problem with this understanding is that it isn't amenable to
    being expressed in the language of the abstract machine. So C99
    glosses over the problem by saying "the value of a pointer
    becomes indeterminate when ...", disregards what the definition
    of "indeterminate value" says, and then pretends (in Annex J.2)
    that using any such value is undefined behavior. The text in the
    standard is very clear: reading a trap representation is always
    undefined behavior (unless accessed using a character type).
    There is nothing in the normative text of the standard that says
    accessing an indeterminate value is undefined behavior. In fact,
    if we take the text of the standard at its word, /every/ object
    has an indeterminate value, because every object representation
    is either a valid value or a trap representation.

    If we ignore pointer types we have an answer to our question:
    any type that has no trap representations never causes undefined
    behavior by being accessed. Then why does the entry in Annex J.2
    give a blanket statement that any use is undefined behavior? A
    reasonable guess is that entries in Annex J are meant to provide
    useful shorthands without necessarily being completely accurate
    (consider for example that the exception for access done using a
    character type is not mentioned in the Annex J.2 entry -- a clear
    omission).

    There is more to say about pointer types. Considering how long
    this memo is already it seems better to defer that to a separate
    posting.


    Next we look at C11.

    With respect to the question being considered, the C11 standard
    is almost exactly the same as the C99 standard. There are two
    differences. First, there is a cosmetic change in that the term
    "trap representation" is given a summary definition in section
    3.19.4; the paragraph in 6.2.6 where "trap representation" was
    previously defined in C99 is unchanged except that in C11 there
    are no italics.

    The second difference is not a revision but an addition. In
    section 6.3.2.1 paragraph 2, talking about lvalue conversion, one
    sentence has been added at the end of the paragraph:

    If the lvalue designates an object of automatic storage
    duration that could have been declared with the register
    storage class (never had its address taken), and that object
    is uninitialized (not declared with an initializer and no
    assignment to it has been performed prior to use), the
    behavior is undefined.

    Naturally there is a corresponding entry that has been added to
    Annex J.2:

    An lvalue designating an object of automatic storage
    duration that could have been declared with the register
    storage class is used in a context that requires the value
    of the designated object, but the object is uninitialized.
    (6.3.2.1).

    The motivation for this new rule reportedly reflects hardware
    behavior, on some more recent chips, for some stack-allocated
    variables. The added text has several points worth noting.

    One, the rule adds a specific, narrow case of undefined behavior
    that is simple and clearly delineated.

    Two, it does not use the term "indeterminate" or "indeterminate
    value". Instead the rule is written in terms of initialization
    and assignment. By avoiding "indeterminate", it avoids any
    uncertainty about whether undefined behavior must result from
    using an indeterminate value.

    Three, it provides indirect evidence that use of an indeterminate
    value is not necessarily undefined behavior, because if it were
    then this new rule would not be necessary.

    Four, the condition of undefined behavior is expressed using
    imperative phrasing: what matters is what has been done, or not
    done, to the object in question. This choice makes this rule a
    supplement, not a replacement, for 6.8 p3 et al. Consider this
    example function definition:

    double
    example( double in ){
        unsigned yet = 0;
      redux: ;
        double d;
        if( !yet ){
            d = in;
            yet++;
            goto redux;
        }
        return d;
    }

    The use of 'd' in 'return d;' might give undefined behavior,
    because 'd' may have a trap representation under 6.8 p3. But
    the code doesn't violate the conditions of 6.3.2.1 p2, because
    an assignment has been done before the lvalue conversion in the
    final statement; the intervening evaluation of 'double d;'
    doesn't change that. Note also that the clause in 6.8 p3 for
    such declarations, "storing an indeterminate value in objects
    without an initializer", does not interfere with the application
    of the rule in 6.3.2.1 p2, because that rule is written in terms
    of assignment, and not in terms of storing a value (which may
    have been done because of the parenthetical phrase in 6.8 p3).


    After C11

    I have not taken the time to review the C17 standard or the
    C23 draft standard while researching the topic here. I see that
    some changes have been made (such as "non-value representation"
    for "trap representation"), but to the best of my knowledge none
    of the key passages are substantively different. I may check on
    that later (but no promises on when or whether).


    Summary: my reading is that accessing an object that has not
    been explicitly stored into since its declaration was evaluated
    is necessarily undefined behavior in C90, but not necessarily
    undefined behavior in C99 and C11 (and AFAIAA also in C17 and
    the upcoming C23). My reasoning is given in detail above.


    Postscript: this commentary has taken much longer to write than
    I thought it would, for the most part because I made an early
    decision to be systematic and thorough. I hope the effort has
    helped the readers gain confidence in the explanations and
    conclusions stated. I may return to the deferred topic about
    pointer types but have no plans at present about when that might
    be.
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Thu Aug 3 15:20:14 2023
    From Newsgroup: comp.std.c

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Repeating the question stated in the Subject line:

    Does reading an uninitialized object [always] have undefined
    behavior?

    Background: Annex J part 2 says (in various phrasings in
    different revisions of the C standard, with the one below
    being taken from C90):

    The value of an uninitialized object that has automatic
    storage duration is used before a value is assigned [is
    undefined behavior] (6.5.7)

    Remembering that Annex J is informative rather than normative,
    is this statement right even for a type that has no trap
    representations? To ask that question another way, is this
    statement always right or is it just a (perhaps useful)
    approximation?
    [400+ lines deleted]
    Summary: my reading is that accessing an object that has not
    been explicitly stored into since its declaration was evaluated
    is necessarily undefined behavior in C90, but not necessarily
    undefined behavior in C99 and C11 (and AFAIAA also in C17 and
    the upcoming C23). My reasoning is given in detail above.


    Postscript: this commentary has taken much longer to write than
    I thought it would, for the most part because I made an early
    decision to be systematic and thorough. I hope the effort has
    helped the readers gain confidence in the explanations and
    conclusions stated. I may return to the deferred topic about
    pointer types but have no plans at present about when that might
    be.

    Thank you for taking the time to write that.

    I'd like to offer a brief summary of the points you made. Please let me
    know if my summary is incorrect.

    - An "indeterminate value" is by definition either an "unspecified
    value" or a "trap representation".

    - In C90 (which did not yet define all these terms), accessing the value
    of an uninitialized object explicitly has undefined behavior.

    - In C99 and later, J.2 (which is *not* normative) states that using the
    value of an object with automatic storage duration while it is
    indeterminate has undefined behavior. This implies that:
        int main(void) {
            int n;
            n;
        }
    has undefined behavior, even if int has no trap representations.

    - Statements in J.2 *should* be supported by normative text.

    - There is no normative text in any post-C90 edition of the C
    standard that supports the claim that reading an uninitialized
    int object actually has undefined behavior if it does not hold
    a trap representation. (Pointers raise other issues, which I'll
    ignore for now.)

    - The cited statement in J.2 is incorrect, or at least imprecise.

    I agree with you on all the above points.

    There is one point on which I think we disagree. It is a matter
    of opinion, not of fact. You wrote:

    Remembering that Annex J is informative rather than normative,
    is this statement right even for a type that has no trap
    representations? To ask that question another way, is this
    statement always right or is it just a (perhaps useful)
    approximation?

    The statement in N1570 J.2 is:

    The behavior is undefined in the following circumstances:
    [...]
    - The value of an object with automatic storage duration is used
    while it is indeterminate (6.2.4, 6.7.9, 6.8).

    I get the impression that you're not particularly bothered by the fact
    that the statement in J.2 is merely an "approximation". In my opinion,
    the statement in J.2 is simply incorrect, and should be fixed. (That's
    unlikely to be possible at this stage of the C23 process.) The fact
    that Annex J is, to quote the standard's foreword, "for information
    only", is not an excuse to ignore factual errors. Readers of the
    standard rely on the informative annexes to provide correct information.
    This particular text is not just a "(perhaps useful) approximation"; it
    is actively misleading.

    I'm not criticizing the author of the standard for making this mistake.
    Stuff happens. It was likely a result of an oversight during the
    transition from C90 to C99.
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Will write code for food.
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Martin Uecker@ma.uecker@gmail.com to comp.std.c on Sat Aug 5 01:15:46 2023
    From Newsgroup: comp.std.c

    On Friday, August 4, 2023 at 12:20:25 AM UTC+2, Keith Thompson wrote:
    Tim Rentsch <tr.1...@z991.linuxsc.com> writes:
    Repeating the question stated in the Subject line:

    Does reading an uninitialized object [always] have undefined
    behavior?

    Background: Annex J part 2 says (in various phrasings in
    different revisions of the C standard, with the one below
    being taken from C90):

    The value of an uninitialized object that has automatic
    storage duration is used before a value is assigned [is
    undefined behavior] (6.5.7)

    Remembering that Annex J is informative rather than normative,
    is this statement right even for a type that has no trap
    representations? To ask that question another way, is this
    statement always right or is it just a (perhaps useful)
    approximation?
    [400+ lines deleted]
    Summary: my reading is that accessing an object that has not
    been explicitly stored into since its declaration was evaluated
    is necessarily undefined behavior in C90, but not necessarily
    undefined behavior in C99 and C11 (and AFAIAA also in C17 and
    the upcoming C23). My reasoning is given in detail above.


    Postscript: this commentary has taken much longer to write than
    I thought it would, for the most part because I made an early
    decision to be systematic and thorough. I hope the effort has
    helped the readers gain confidence in the explanations and
    conclusions stated. I may return to the deferred topic about
    pointer types but have no plans at present about when that might
    be.
    Thank you for taking the time to write that.

    I'd like to offer a brief summary of the points you made. Please let me
    know if my summary is incorrect.

    - An "indeterminate value" is by definition either an "unspecified
    value" or a "trap representation".

    - In C90 (which did not yet define all these terms), accessing the value
    of an uninitialized object explicitly has undefined behavior.

    - In C99 and later, J.2 (which is *not* normative) states that using the value of an object with automatic storage duration while it is
    indeterminate has undefined behavior. This implies that:
    int main(void) {
    int n;
    n;
    }
    has undefined behavior, even if int has no trap representations.

    - Statements in J.2 *should* be supported by normative text.

    - There is no normative text in any post-C90 edition of the C
    standard that supports the claim that reading an uninitialized
    int object actually has undefined behavior if it does not hold
    a trap representation. (Pointers raise other issues, which I'll
    ignore for now.)

    - The cited statement in J.2 is incorrect, or at least imprecise.

    I agree with you on all the above points.

    There is one point on which I think we disagree. It is a matter
    of opinion, not of fact. You wrote:

    Remembering that Annex J is informative rather than normative,
    is this statement right even for a type that has no trap
    representations? To ask that question another way, is this
    statement always right or is it just a (perhaps useful)
    approximation?
    The statement in N1570 J.2 is:
    The behavior is undefined in the following circumstances:
    [...]
    - The value of an object with automatic storage duration is used
    while it is indeterminate (6.2.4, 6.7.9, 6.8).

    I get the impression that you're not particularly bothered by the fact
    that the statement in J.2 is merely an "approximation". In my opinion,
    the statement in J.2 is simply incorrect, and should be fixed. (That's
    unlikely to be possible at this stage of the C23 process.) The fact
    that Annex J is, to quote the standard's foreword, "for information
    only", is not an excuse to ignore factual errors. Readers of the
    standard rely on the informative annexes to provide correct information.
    This particular text is not just a "(perhaps useful) approximation"; it
    is actively misleading.

    I'm not criticizing the author of the standard for making this mistake.
    Stuff happens. It was likely a result of an oversight during the
    transition from C90 to C99.

    I personally agree with this analysis and also about the need to fix
    J.2. Pointers seem to fit into this scheme if you think about the valid
    addresses of objects + null pointers as the set of valid values
    for a pointer. Any representation not corresponding to such an
    address is then a non-value representation.
    But note that there are many people who believe that "indeterminate"
    should be understood as an abstract property propagated similar
    to pointer provenance that can be an abstract non-value
    representation even for types which do not have room for such
    representations.
    For C23 the rules stay the same. We changed the term "trap representation"
    to "non-value representation" because people were often confused.
    A non-value representation is UB in lvalue conversion but this does
    not necessarily imply a trap. On the other hand, a trap might be
    defined behavior caused by a valid value of a type.
    The term "indeterminate value" was changed to "indeterminate
    representation" because the wording "an indeterminate value is
    either an unspecified value or a trap representation" does not
    make much sense because value and representation are different
    things. Also some compilers and also C++ have indeterminate
    values with different semantics, which caused confusion, i.e.
    in C++ you can copy indeterminate values from an uninitialized
    object to another and this is not UB. In C you either directly
    have UB or you copy an unspecified value which is valid, so
    there are no indeterminate values as such.
    Martin
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Sat Aug 12 17:00:40 2023
    From Newsgroup: comp.std.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    I think the right way for C to permit NaT-like bits is, as Kaz
    suggested, to define "indeterminate value" in terms of provenance,
    not just the bits that make up its current representation. [...]

    This idea is fundamentally wrong. NaT bits are associated with
    particular areas of memory, which is to say objects. The point
    of provenance is that non-viability is associated with /values/,
    not with objects. Once an area of memory acquires an object
    representation, the NaT bit or NaT bits for that memory are set
    to zero, end of story. Note also that NaT bits are independent
    of what type is used to access an object - if the NaT bit is set
    then any access is illegal, no matter what type is used to do the
    access. By contrast, provenance is used in situations where
    non-viability is associated with values, not with objects. But
    values are always type dependent; a pointer object that holds
    a value that has been passed to free() is "indeterminate" when
    accessed as a pointer type, but perfectly okay to access as an
    unsigned char type. The two kinds of situations are essentially
    different, and the theoretical models used to characterize the
    rules in the two kinds of situations should therefore be
    correspondingly essentially different.
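
    The type dependence described above can be illustrated with a short sketch. It assumes nothing beyond the standard library; the function name `inspect_freed_pointer_bytes` is hypothetical. After free(), the pointer object is never read as a pointer again; only its bytes are read, through unsigned char.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical helper: frees a pointer, then reads the bytes of the
       pointer OBJECT through unsigned char. Returns how many bytes were
       read, or 0 if the allocation failed. */
    static size_t inspect_freed_pointer_bytes(void)
    {
        int *p = malloc(sizeof *p);
        if (p == NULL)
            return 0;
        free(p);   /* from here on, the value of p is indeterminate as an int * */

        /* &p is the address of the pointer variable itself, which is still
           alive; reading its representation as unsigned char is permitted. */
        const unsigned char *bytes = (const unsigned char *)&p;
        unsigned int sum = 0;
        size_t count = 0;
        for (size_t i = 0; i < sizeof p; i++) {
            sum += bytes[i];
            count++;
        }
        (void)sum;   /* the byte values themselves are not meaningful here */
        return count;
    }

    int main(void)
    {
        size_t n = inspect_freed_pointer_bytes();
        assert(n == 0 || n == sizeof(int *));
        printf("bytes of the freed pointer object read: %zu\n", n);
        return 0;
    }
    ```

    Nothing in the sketch compares, dereferences, or copies `p` as an `int *` after the call to free(), which is the kind of access the paragraph above describes as "indeterminate".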
  • From Martin Uecker@ma.uecker@gmail.com to comp.std.c on Sun Aug 13 23:41:06 2023
    From Newsgroup: comp.std.c

    On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:
    Keith Thompson <Keith.S.T...@gmail.com> writes:

    I think the right way for C to permit NaT-like bits is, as Kaz
    suggested, to define "indeterminate value" in terms of provenance,
    not just the bits that make up its current representation. [...]

    This idea is fundamentally wrong. NaT bits are associated with
    particular areas of memory, which is to say objects. The point
    of provenance is that non-viability is associated with /values/,
    not with objects. Once an area of memory acquires an object
    representation, the NaT bit or NaT bits for that memory are set
    to zero, end of story. Note also that NaT bits are independent
    of what type is used to access an object - if the NaT bit is set
    then any access is illegal, no matter what type is used to do the
    access. By contrast, provenance is used in situations where
    non-viability is associated with values, not with objects. But
    values are always type dependent; a pointer object that holds
    a value that has been passed to free() is "indeterminate" when
    accessed as a pointer type, but perfectly okay to access as an
    unsigned char type. The two kinds of situations are essentially
    different, and the theoretical models used to characterize the
    rules in the two kinds of situations should therefore be
    correspondingly essentially different.
    One could still consider the idea that "indeterminate" is an
    abstract property that yields UB during read even for types
    that do not have trap representations. There is no wording
    in the C standard to support this, but I would not call this
    idea "fundamentally wrong". You are right that this is different
    to provenance, which is about values. What it would
    have in common with pointer provenance is that there is hidden
    state in the abstract machine associated with memory that
    is not part of the representation. With effective types there
    is another example of this.
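
    A minimal sketch of the effective-type point, assuming only the standard library (the function name `effective_type_demo` is made up): the same allocated bytes may be read as int or as float depending on the last store, a fact that is recorded nowhere in the bytes themselves.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical demo: returns 1 on success, -1 if allocation fails. */
    static int effective_type_demo(void)
    {
        size_t size = sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);
        void *mem = malloc(size);      /* allocated storage has no declared type */
        if (mem == NULL)
            return -1;

        *(int *)mem = 42;              /* the store makes the effective type int */
        int as_int = *(int *)mem;      /* fine: read matches the effective type */

        *(float *)mem = 1.5f;          /* a new store changes it to float */
        float as_float = *(float *)mem;   /* fine for the same reason */

        /* Reading *(int *)mem at this point would be undefined, although
           the bytes in memory carry no marker that says so. */

        unsigned char first = *(unsigned char *)mem;  /* character access is always allowed */
        (void)first;

        free(mem);
        return as_int == 42 && as_float == 1.5f;
    }

    int main(void)
    {
        int ok = effective_type_demo();
        assert(ok == 1 || ok == -1);
        printf("effective type demo result: %d\n", ok);
        return 0;
    }
    ```
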
    Martin
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Tue Aug 15 21:06:37 2023
    From Newsgroup: comp.std.c

    Martin Uecker <ma.uecker@gmail.com> writes:

    On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:

    Keith Thompson <Keith.S.T...@gmail.com> writes:

    I think the right way for C to permit NaT-like bits is, as Kaz
    suggested, to define "indeterminate value" in terms of provenance,
    not just the bits that make up its current representation. [...]

    This idea is fundamentally wrong. NaT bits are associated with
    particular areas of memory, which is to say objects. The point
    of provenance is that non-viability is associated with /values/,
    not with objects. Once an area of memory acquires an object
    representation, the NaT bit or NaT bits for that memory are set
    to zero, end of story. Note also that NaT bits are independent
    of what type is used to access an object - if the NaT bit is set
    then any access is illegal, no matter what type is used to do the
    access. By contrast, provenance is used in situations where
    non-viability is associated with values, not with objects. But
    values are always type dependent; a pointer object that holds
    a value that has been passed to free() is "indeterminate" when
    accessed as a pointer type, but perfectly okay to access as an
    unsigned char type. The two kinds of situations are essentially
    different, and the theoretical models used to characterize the
    rules in the two kinds of situations should therefore be
    correspondingly essentially different.

    One could still consider the idea that "indeterminate" is an
    abstract property that yields UB during read even for types
    that do not have trap representations. There is no wording
    in the C standard to support this, but I would not call this
    idea "fundamentally wrong". You are right that this is different
    to provenance, which is about values. What it would
    have in common with pointer provenance is that there is hidden
    state in the abstract machine associated with memory that
    is not part of the representation. With effective types there
    is another example of this.

    My preceding comments were meant to be only about NaT bits (or
    NaT-like bits) and provenance. There is an inherent mismatch
    between the two, as I have tried to explain. It is only the idea
    that provenance would provide a good foundation for defining the
    semantics of "NaT everywhere" that I am saying is fundamentally
    wrong.

    I understand that you want to consider a broader topic, and that,
    in the realm of that broader topic, something like provenance
    could have a role to play. I think it is worth responding to
    that thesis, and am expecting to do so in a separate reply (or
    new thread?) although probably not right away.
  • From Martin Uecker@ma.uecker@gmail.com to comp.std.c on Tue Aug 15 22:40:37 2023
    From Newsgroup: comp.std.c

    On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
    Martin Uecker <ma.u...@gmail.com> writes:
    On Sunday, August 13, 2023 at 2:00:45 AM UTC+2, Tim Rentsch wrote:

    Keith Thompson <Keith.S.T...@gmail.com> writes:

    I think the right way for C to permit NaT-like bits is, as Kaz
    suggested, to define "indeterminate value" in terms of provenance,
    not just the bits that make up its current representation. [...]

    This idea is fundamentally wrong. NaT bits are associated with
    particular areas of memory, which is to say objects. The point
    of provenance is that non-viability is associated with /values/,
    not with objects. Once an area of memory acquires an object
    representation, the NaT bit or NaT bits for that memory are set
    to zero, end of story. Note also that NaT bits are independent
    of what type is used to access an object - if the NaT bit is set
    then any access is illegal, no matter what type is used to do the
    access. By contrast, provenance is used in situations where
    non-viability is associated with values, not with objects. But
    values are always type dependent; a pointer object that holds
    a value that has been passed to free() is "indeterminate" when
    accessed as a pointer type, but perfectly okay to access as an
    unsigned char type. The two kinds of situations are essentially
    different, and the theoretical models used to characterize the
    rules in the two kinds of situations should therefore be
    correspondingly essentially different.

    One could still consider the idea that "indeterminate" is an
    abstract property that yields UB during read even for types
    that do not have trap representations. There is no wording
    in the C standard to support this, but I would not call this
    idea "fundamentally wrong". You are right that this is different
    to provenance, which is about values. What it would
    have in common with pointer provenance is that there is hidden
    state in the abstract machine associated with memory that
    is not part of the representation. With effective types there
    is another example of this.
    My preceding comments were meant to be only about NaT bits (or
    NaT-like bits) and provenance. There is an inherent mismatch
    between the two, as I have tried to explain. It is only the idea
    that provenance would provide a good foundation for defining the
    semantics of "NaT everywhere" that I am saying is fundamentally
    wrong.

    I understand that you want to consider a broader topic, and that,
    in the realm of that broader topic, something like provenance
    could have a role to play. I think it is worth responding to
    that thesis, and am expecting to do so in a separate reply (or
    new thread?) although probably not right away.
    I would love to hear your comments, because some people
    want to have such an abstract notion of "indeterminate" and
    some already believe that this is how the standard should
    be understood today.
    Martin
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Wed Aug 16 09:19:10 2023
    From Newsgroup: comp.std.c

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    Repeating the question stated in the Subject line:

    Does reading an uninitialized object [always] have undefined
    behavior?

    Background: Annex J part 2 says (in various phrasings in
    different revisions of the C standard, with the one below
    being taken from C90):

    The value of an uninitialized object that has automatic
    storage duration is used before a value is assigned [is
    undefined behavior] (6.5.7)

    Remembering that Annex J is informative rather than normative,
    is this statement right even for a type that has no trap
    representations? To ask that question another way, is this
    statement always right or is it just a (perhaps useful)
    approximation?

    [400+ lines deleted]

    Summary: my reading is that accessing an object that has not
    been explicitly stored into since its declaration was evaluated
    is necessarily undefined behavior in C90, but not necessarily
    undefined behavior in C99 and C11 (and AFAIAA also in C17 and
    the upcoming C23). My reasoning is given in detail above.


    Postscript: this commentary has taken much longer to write than
    I thought it would, for the most part because I made an early
    decision to be systematic and thorough. I hope the effort has
    helped the readers gain confidence in the explanations and
    conclusions stated. I may return to the deferred topic about
    pointer types but have no plans at present about when that might
    be.

    Thank you for taking the time to write that.

    It's nice to be appreciated. Thank you.

    I'd like to offer a brief summary of the points you made. Please let me
    know if my summary is incorrect.

    Excellent. I am writing a reaction directly after each item.

    - An "indeterminate value" is by definition either an "unspecified
    value" or a "trap representation".

    Yes.

    - In C90 (which did not yet define all these terms), accessing the value
    of an uninitialized object explicitly has undefined behavior.

    C90 made "use [...] of indeterminately valued objects" part of the
    definition of undefined behavior. To connect the dots we need to
    know that "If an object that has automatic storage duration is not
    initialized explicitly, its value is indeterminate." These two
    normative items are combined into one in J.2: "The value of an
    uninitialized object that has automatic storage duration is used
    before a value is assigned".

    - In C99 and later, J.2 (which is *not* normative) states that using the
    value of an object with automatic storage duration while it is
    indeterminate has undefined behavior. This implies that:
    int main(void) {
    int n;
    n;
    }
    has undefined behavior, even if int has no trap representations.

    For the J.2 summary, yes. I don't think I gave the implied
    conclusion, but I agree with you that the J.2 entry does seem to
    imply this.

    - Statements in J.2 *should* be supported by normative text.

    I don't think I said this at all. At least for now I offer
    no opinion on this recommendation.

    - There is no normative text in any post-C90 edition of the C
    standard that supports the claim that reading an uninitialized
    int object actually has undefined behavior if it does not hold
    a trap representation. (Pointers raise other issues, which I'll
    ignore for now.)

    Yes, with a very minor correction that it is C99 and later, because
    I haven't looked at the editions of the C standard after C90 but
    before C99.

    - The cited statement in J.2 is incorrect, or at least imprecise.

    I don't think I said this exactly. I did say or at least imply
    that the quoted entry in J.2 is not completely accurate. Certainly
    it allows conclusions that are not supported by normative text, and
    looked at from that point of view it is "wrong".

    I agree with you on all the above points.

    There is one point on which I think we disagree. It is a matter
    of opinion, not of fact. You wrote:

    Remembering that Annex J is informative rather than normative,
    is this statement right even for a type that has no trap
    representations? To ask that question another way, is this
    statement always right or is it just a (perhaps useful)
    approximation?

    The statement in N1570 J.2 is:

    The behavior is undefined in the following circumstances:
    [...]
    - The value of an object with automatic storage duration is used
    while it is indeterminate (6.2.4, 6.7.9, 6.8).

    I get the impression that you're not particularly bothered by the fact
    that the statement in J.2 is merely an "approximation". In my opinion,
    the statement in J.2 is simply incorrect, and should be fixed. (That's
    unlikely to be possible at this stage of the C23 process.) The fact
    that Annex J is, to quote the standard's foreword, "for information
    only", is not an excuse to ignore factual errors. Readers of the
    standard rely on the informative annexes to provide correct information.
    This particular text is not just a "(perhaps useful) approximation"; it
    is actively misleading.

    Like I said before, for now I offer no opinion on this question. I
    wouldn't mind if a footnote were added to help mitigate the problem.

    I'm not criticizing the author of the standard for making this mistake.
    Stuff happens. It was likely a result of an oversight during the
    transition from C90 to C99.

    After reading the various standards carefully, I believe the wording
    in the J.2 entry was not just an oversight. I suspect there is
    something deeper going on. In neither case, however, does it prompt
    any specific reaction (ie, in myself) as to what to do about it (if
    anything).
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Wed Aug 16 11:11:41 2023
    From Newsgroup: comp.std.c

    Kaz Kylheku <864-117-4973@kylheku.com> writes:

    On 2023-07-21, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:

    6.3.2.1 p2:

    "[...] If the lvalue designates an object of automatic storage
    duration that could have been declared with the register storage class
    (never had its address taken), and that object is uninitialized (not
    declared with an initializer and no assignment to it has been
    performed prior to use), the behavior is undefined."

    seems to cover it. The restriction on not having its address taken
    seems odd.

    Wording like that looks like someone's solo documentation effort,
    not peer-reviewed by an expert committee.

    That looks as if the intent is to allow some diagnoses of uses of
    uninitialized variables, while discouraging others.

    That isn't at all what this passage is about.
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.std.c on Wed Aug 16 19:51:40 2023
    From Newsgroup: comp.std.c

    On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Repeating the question stated in the Subject line:

    Does reading an uninitialized object [always] have undefined
    behavior?

    Background: Annex J part 2 says (in various phrasings in
    different revisions of the C standard, with the one below
    being taken from C90):

    The value of an uninitialized object that has automatic
    storage duration is used before a value is assigned [is
    undefined behavior] (6.5.7)

    Remembering that Annex J is informative rather than normative,
    is this statement right even for a type that has no trap
    representations? To ask that question another way, is this
    statement always right or is it just a (perhaps useful)
    approximation?
    [400+ lines deleted]
    Summary: my reading is that accessing an object that has not
    been explicitly stored into since its declaration was evaluated
    is necessarily undefined behavior in C90, but not necessarily
    undefined behavior in C99 and C11 (and AFAIAA also in C17 and
    the upcoming C23). My reasoning is given in detail above.


    Postscript: this commentary has taken much longer to write than
    I thought it would, for the most part because I made an early
    decision to be systematic and thorough. I hope the effort has
    helped the readers gain confidence in the explanations and
    conclusions stated. I may return to the deferred topic about
    pointer types but have no plans at present about when that might
    be.

    Thank you for taking the time to write that.

    I'd like to offer a brief summary of the points you made. Please let me
    know if my summary is incorrect.

    - An "indeterminate value" is by definition either an "unspecified
    value" or a "trap representation".

    - In C90 (which did not yet define all these terms), accessing the value
    of an uninitialized object explicitly has undefined behavior.

    - In C99 and later, J.2 (which is *not* normative) states that using the
    value of an object with automatic storage duration while it is
    indeterminate has undefined behavior. This implies that:
    int main(void) {
    int n;
    n;
    }
    has undefined behavior, even if int has no trap representations.

    - Statements in J.2 *should* be supported by normative text.

    - There is no normative text in any post-C90 edition of the C
    standard that supports the claim that reading an uninitialized
    int object actually has undefined behavior if it does not hold
    a trap representation. (Pointers raise other issues, which I'll
    ignore for now.)

    - The cited statement in J.2 is incorrect, or at least imprecise.

    I agree with you on all the above points.

    There is one point on which I think we disagree. It is a matter
    of opinion, not of fact. You wrote:

    Remembering that Annex J is informative rather than normative,
    is this statement right even for a type that has no trap
    representations? To ask that question another way, is this
    statement always right or is it just a (perhaps useful)
    approximation?

    The statement in N1570 J.2 is:

    The behavior is undefined in the following circumstances:
    [...]
    - The value of an object with automatic storage duration is used
    while it is indeterminate (6.2.4, 6.7.9, 6.8).

    I get the impression that you're not particularly bothered by the fact
    that the statement in J.2 is merely an "approximation". In my opinion,
    the statement in J.2 is simply incorrect, and should be fixed. (That's
    unlikely to be possible at this stage of the C23 process.) The fact
    that Annex J is, to quote the standard's foreword, "for information
    only", is not an excuse to ignore factual errors. Readers of the
    standard rely on the informative annexes to provide correct information.
    This particular text is not just a "(perhaps useful) approximation"; it
    is actively misleading.

    I'm not criticizing the author of the standard for making this mistake.
    Stuff happens. It was likely a result of an oversight during the
    transition from C90 to C99.

    I would be in favor of a formal model of what "uninitialized" means
    which could be summarized as below.

    Implementors wishing to develop tooling to catch uses of uninitialized
    data can refer to the model; if their tooling diagnoses only
    what the model deems undefined, then the tool can be integrated
    into a conforming implementation.

    - Certain objects are uninitialized, like auto variables without
    an initializer, or new bytes coming from malloc or realloc.

    - What is undefined behavior is when an uninitialized value is used
    to make a control-flow decision, or when it is output, or otherwise
    passed to the host environment.

    - The formal model defines "uninitialized" in terms of there being,
    in the abstract semantics, a "shadow value" corresponding to every
    byte of a value, and that shadow value indicates whether the
    corresponding byte is initialized or not.

    - Shadow values propagate across copies, accesses and calculations.

    - No special exception is needed for unsigned, other than that
    it doesn't have trap representations.

    - This would be undefined:

    {
        int uninited;
        int *p = &uninited;
        int v = *(unsigned char *)p;

        if (v) { /* ... */ }   // undefined here

        printf("%d\n", v);     // undefined
    }

    No special blessing is required for unsigned char to access
    the object. The resulting value keeps carrying the shadow byte
    which indicates that it is uninitialized, and so when it is output,
    or used for a control flow decision, the behavior is undefined.

    memcpy can be written without outputting the bytes being copied,
    and without allowing their values to control flow.

    If a structure is copied with memcpy, and has uninitialized padding,
    the shadow value model says that the destination object now
    has uninitialized padding.

    - When a value is obtained by accessing an object which has one
    or more uninitialized bytes, the corresponding bytes of the
    value are uninitialized.

    - When a calculation has any operands that have one or more
    uninitialized bytes, all bytes of the resulting value
    are uninitialized.

    E.g. if there is an int *p, which is used to access a value *p,
    where the low-order byte is initialized, then the low order
    byte of *p is initialized; the other bytes are uninitialized.
    But in the value *p + 0, the entire value is uninitialized.
    Implementations following the model don't have to track individual
    bits or bytes through calculations. This could apply to type
    conversions. E.g. if *p is of type unsigned char, and
    refers to an uninitialized byte, then the entire promoted
    int (or possibly unsigned int) value is uninitialized:
    all four bytes (or however many) of it.
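
    The propagation rules listed above can be made concrete with a small simulation. This is only a sketch of the proposal, not anything found in the standard or in an existing tool. The type `shadow_uint` and every function name are hypothetical, and treating a single uninitialized byte as enough to make a use undefined is an assumption about how the model would be applied.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define NBYTES sizeof(unsigned int)

    /* Hypothetical simulation type: a value plus one shadow flag per byte.
       unsigned int keeps the simulator's own arithmetic well defined. */
    typedef struct {
        unsigned int value;
        bool init[NBYTES];   /* init[i] is true if byte i is initialized */
    } shadow_uint;

    /* An object with no initializer: every shadow flag is false. */
    static shadow_uint shadow_uninit(void)
    {
        shadow_uint s = { 0 };   /* placeholder value, all flags false */
        return s;
    }

    /* A constant: every byte is initialized. */
    static shadow_uint shadow_const(unsigned int v)
    {
        shadow_uint s = { 0 };
        s.value = v;
        for (size_t i = 0; i < NBYTES; i++)
            s.init[i] = true;
        return s;
    }

    static bool shadow_fully_init(shadow_uint s)
    {
        for (size_t i = 0; i < NBYTES; i++)
            if (!s.init[i])
                return false;
        return true;
    }

    /* Copy or access: each byte keeps its own shadow flag. */
    static shadow_uint shadow_copy(shadow_uint src)
    {
        return src;
    }

    /* Calculation: one uninitialized operand byte poisons the whole result. */
    static shadow_uint shadow_add(shadow_uint a, shadow_uint b)
    {
        if (shadow_fully_init(a) && shadow_fully_init(b))
            return shadow_const(a.value + b.value);
        return shadow_uninit();
    }

    /* Branching on a value or outputting it is defined only if no byte of
       it is uninitialized (an assumption of this sketch). */
    static bool shadow_use_is_defined(shadow_uint s)
    {
        return shadow_fully_init(s);
    }

    int main(void)
    {
        shadow_uint p = shadow_uninit();
        p.init[0] = true;   /* only one byte initialized, as in the *p example */

        shadow_uint copied = shadow_copy(p);
        shadow_uint summed = shadow_add(p, shadow_const(0));
        shadow_uint five = shadow_add(shadow_const(2), shadow_const(3));

        assert(shadow_use_is_defined(five));
        printf("copy keeps byte 0 initialized: %d\n", copied.init[0]);
        printf("*p + 0 keeps byte 0 initialized: %d\n", summed.init[0]);
        printf("2 + 3 may be branched on: %d\n", shadow_use_is_defined(five));
        return 0;
    }
    ```

    The copy preserves the per-byte flags, while the addition discards them, so an implementation following this sketch never has to track individual bytes through arithmetic.
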
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.std.c on Wed Aug 16 20:03:54 2023
    From Newsgroup: comp.std.c

    On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Repeating the question stated in the Subject line:

    Does reading an uninitialized object [always] have undefined
    behavior?

    Thank you for taking the time to write that.
    [ ... ]
    I'm not criticizing the author of the standard for making this mistake.
    Stuff happens. It was likely a result of an oversight during the
    transition from C90 to C99.

    [Supersede attempt to reduce quoted material.]

    I would be in favor of a formal model of what "uninitialized" means
    which could be summarized as below.

    Implementors wishing to develop tooling to catch uses of uninitialized
    data can refer to the model; if their tooling diagnoses only
    what the model deems undefined, then the tool can be integrated
    into a conforming implementation.

    - Certain objects are uninitialized, like auto variables without
    an initializer, or new bytes coming from malloc or realloc.

    - What is undefined behavior is when an uninitialized value is used
    to make a control-flow decision, or when it is output, or otherwise
    passed to the host environment.

    - The formal model defines "uninitialized" in terms of there being,
    in the abstract semantics, a "shadow value" corresponding to every
    byte of a value, and that shadow value indicates whether the
    corresponding byte is initialized or not.

    - Shadow values propagate across copies, accesses and calculations.

    - No special exception is needed for unsigned, other than that
    it doesn't have trap representations.

    - This would be undefined:

    {
        int uninited;
        int *p = &uninited;
        int v = *(unsigned char *)p;

        if (v) { /* ... */ }   // undefined here

        printf("%d\n", v);     // undefined
    }

    No special blessing is required for unsigned char to access
    the object. The resulting value keeps carrying the shadow byte
    which indicates that it is uninitialized, and so when it is output,
    or used for a control flow decision, the behavior is undefined.

    memcpy can be written without outputting the bytes being copied,
    and without allowing their values to control flow.

    If a structure is copied with memcpy, and has uninitialized padding,
    the shadow value model says that the destination object now
    has uninitialized padding.

    - When a value is obtained by accessing an object which has one
    or more uninitialized bytes, the corresponding bytes of the
    value are uninitialized.

    - When a calculation has any operands that have one or more
    uninitialized bytes, all bytes of the resulting value
    are uninitialized.

    E.g. if there is an int *p, which is used to access a value *p,
    where the low-order byte is initialized, then the low order
    byte of *p is initialized; the other bytes are uninitialized.
    But in the value *p + 0, the entire value is uninitialized.
    Implementations following the model don't have to track individual
    bits or bytes through calculations. This could apply to type
    conversions. E.g. if *p is of type unsigned char, and
    refers to an uninitialized byte, then the entire promoted
    int (or possibly unsigned int) value is uninitialized:
    all four bytes (or however many) of it.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.std.c on Wed Aug 16 13:43:30 2023
    From Newsgroup: comp.std.c

    Kaz Kylheku <864-117-4973@kylheku.com> writes:
    On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Repeating the question stated in the Subject line:

    Does reading an uninitialized object [always] have undefined
    behavior?

    Thank you for taking the time to write that.
    [ ... ]
    I'm not criticizing the author of the standard for making this mistake.
    Stuff happens. It was likely a result of an oversight during the
    transition from C90 to C99.

    [Supersede attempt to reduce quoted material.]

    I would be in favor of a formal model of what "uninitialized" means
    which could be summarized as below.

    Implementors wishing to develop tooling to catch uses of uninitialized
    data can refer to the model; if their tooling diagnoses only
    what the model deems undefined, then the tool can be integrated
    into a conforming implementation.

    - Certain objects are uninitialized, like auto variables without
    an initializer, or new bytes coming from malloc or realloc.

    - What is undefined behavior is when an uninitialized value is used
    to make a control-flow decision, or when it is output, or otherwise
    passed to the host environment.

    Why restrict it to those particular uses, rather than saying that any
    attempt to read an uninitialized value has undefined behavior?

    For example, something like:
        {
            int uninit;
            int copy = uninit + 1;
        }
    might cause a hardware trap on some systems (for example Itanium if
    uninit is stored in a register and the NaT bit is set).

    [...]
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.std.c on Wed Aug 16 21:08:19 2023
    From Newsgroup: comp.std.c

    On 2023-08-16, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    > Kaz Kylheku <864-117-4973@kylheku.com> writes:
    >> On 2023-08-03, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    >>> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    >>>> Repeating the question stated in the Subject line:
    >>>>
    >>>> Does reading an uninitialized object [always] have undefined
    >>>> behavior?
    >>>
    >>> Thank you for taking the time to write that.
    >>> [ ... ]
    >>> I'm not criticizing the author of the standard for making this mistake.
    >>> Stuff happens. It was likely a result of an oversight during the
    >>> transition from C90 to C99.
    >>
    >> [Supersede attempt to reduce quoted material.]
    >>
    >> I would be in favor of a formal model of what "uninitialized" means
    >> which could be summarized as below.
    >>
    >> Implementors wishing to develop tooling to catch uses of uninitialized
    >> data can refer to the model; if their tooling diagnoses only
    >> what the model deems undefined, then the tool can be integrated
    >> into a conforming implementation.
    >>
    >> - Certain objects are uninitialized, like auto variables without
    >>   an initializer, or new bytes coming from malloc or realloc.
    >>
    >> - What is undefined behavior is when an uninitialized value is used
    >>   to make a control-flow decision, or when it is output, or otherwise
    >>   passed to the host environment.
    >
    > Why restrict it to those particular uses, rather than saying that any
    > attempt to read an uninitialized value has undefined behavior?

    Because that then brings back complications like

    - unsigned char access has to be exempt

    - what happens if we copy through intermediate values:

        int ch = *src++;  // *src is uninitialized, therefore so is ch
        *dst++ = ch;      // ch is uninitialized and not unsigned char

      Does the second access, to ch, count as an uninitialized read?

    - structures: when a struct that has uninitialized padding is
      accessed, what happens? We need a rule like: if those bytes
      are accessed, they are accessed as if by unsigned char.

    The idea of trapping only control flow decisions or output is inspired
    by Valgrind.

    Valgrind does not "spaz out" just because an uninitialized value is
    accessed, because it would result in useless false positives.

    Not all of the reasoning applies to C; part of it is that Valgrind is
    working with machine code, with no source language knowledge. The basic
    idea makes sense though.

    Valgrind usefully finds uninitialized data bugs, while allowing you to
    write your own memcpy which can copy a structure full of uninitialized
    bytes: and it does so without knowing anything about unsigned char.

    We could make the rule that only visible behavior depending on
    an uninitialized byte is undefined; the rule about control flows
    makes it a bit tighter, while allowing the copying of uninited
    data.
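
    To make the copying case concrete, here is a sketch of a hand-written
    bytewise copy of the kind such a rule is meant to allow. The names
    my_memcpy, struct padded and copy_padded are made up for the example,
    and whether padding actually exists after tag depends on the ABI:

    ```c
    #include <stddef.h>

    /* A hand-written bytewise copy. It reads and writes only through
       unsigned char, which has no trap (non-value) representations, and
       it never branches on or outputs the bytes it moves. */
    void *my_memcpy(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--)          /* the loop depends on n, not on the data */
            *d++ = *s++;
        return dst;
    }

    struct padded {
        char tag;            /* padding bytes typically follow here */
        int value;
    };

    /* Copies the whole object representation, including any padding
       bytes that were never written. */
    void copy_padded(struct padded *to, const struct padded *from)
    {
        my_memcpy(to, from, sizeof *to);
    }
    ```

    Under the proposed rule nothing here is undefined: the uninitialized
    padding is moved, but no control-flow decision or output depends on it.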

    > For example, something like:
    >     {
    >         int uninit;
    >         int copy = uninit + 1;
    >     }
    > might cause a hardware trap on some systems (for example Itanium if
    > uninit is stored in a register and the NaT bit is set).

    Right, so the model above doesn't speak to traps. We still have those.

    You can copy an object using unsigned char not because it's specially
    blessed for access (other than in regard to aliasing rules), but because
    it has no trap representation.

    On a machine without traps, the above code would just result
    in copy being uninitialized.

    If that value isn't printed, or used in if, or switch, then it
    doesn't matter.

    If the type int has trap representations, then it's undefined on that
    implementation; it's basically just a matter of luck whether uninit is a
    trap or a value, so it has to be regarded as undefined.

    I believe that the model can be used to implement useful diagnostics
    even without realizing the actual shadow bytes. A subset of the
    bugs can be diagnosed within a lexical scope, like uses of
    uninitialized auto locals. When the compiler is doing data flow
    analysis, it just propagates that uninited info around the program
    graph. If an uninited data flow reaches certain nodes in the program
    graph, like where control decisions are made or certain functions
    are called that are known to pass the datum to the host environment,
    then it can diagnose.
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Wed Aug 16 23:13:03 2023
    From Newsgroup: comp.std.c

    Martin Uecker <ma.uecker@gmail.com> writes:

    [some unrelated passages removed]

    > On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
    >
    >> Martin Uecker <ma.u...@gmail.com> writes:
    >>
    >> [...]
    >>
    >>> One could still consider the idea that "indeterminate" is an
    >>> abstract property that yields UB during read even for types
    >>> that do not have trap representations. There is no wording
    >>> in the C standard to support this, but I would not call this
    >>> idea "fundamentally wrong". You are right that this is different
    >>> to pointer provenance which is about values. What it would
    >>> have in common with pointer provenance is that there is hidden
    >>> state in the abstract machine associated with memory that
    >>> is not part of the representation. With effective types there
    >>> is another example of this.
    >>
    >> I understand that you want to consider a broader topic, and that,
    >> in the realm of that broader topic, something like provenance
    >> could have a role to play. I think it is worth responding to
    >> that thesis, and am expecting to do so in a separate reply (or
    >> new thread?) although probably not right away.
    >
    > I would love to hear your comments, because some people
    > want to have such an abstract notion of "indeterminate" and
    > some already believe that this is how the standard should
    > be understood already today.

    I've been thinking about this, and am close (I think) to having
    something to say in response. Before I do that, though, let me
    ask this: what problem or problems are motivating the question?
    What problems do you (or "some people") want to solve? I don't
    want just examples here; I'm hoping to get a full list.
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.std.c on Thu Aug 17 07:08:45 2023
    From Newsgroup: comp.std.c

    On 2023-08-17, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    > Martin Uecker <ma.uecker@gmail.com> writes:
    >
    > [some unrelated passages removed]
    >
    >> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
    >>
    >>> Martin Uecker <ma.u...@gmail.com> writes:
    >>>
    >>> [...]
    >>>
    >>>> One could still consider the idea that "indeterminate" is an
    >>>> abstract property that yields UB during read even for types
    >>>> that do not have trap representations. There is no wording
    >>>> in the C standard to support this, but I would not call this
    >>>> idea "fundamentally wrong". You are right that this is different
    >>>> to pointer provenance which is about values. What it would
    >>>> have in common with pointer provenance is that there is hidden
    >>>> state in the abstract machine associated with memory that
    >>>> is not part of the representation. With effective types there
    >>>> is another example of this.
    >>>
    >>> I understand that you want to consider a broader topic, and that,
    >>> in the realm of that broader topic, something like provenance
    >>> could have a role to play. I think it is worth responding to
    >>> that thesis, and am expecting to do so in a separate reply (or
    >>> new thread?) although probably not right away.
    >>
    >> I would love to hear your comments, because some people
    >> want to have such an abstract notion of "indeterminate" and
    >> some already believe that this is how the standard should
    >> be understood already today.
    >
    > I've been thinking about this, and am close (I think) to having
    > something to say in response. Before I do that, though, let me
    > ask this: what problem or problems are motivating the question?
    > What problems do you (or "some people") want to solve? I don't
    > want just examples here; I'm hoping to get a full list.

    I'm all about the diagnosis. Even on machines in which all
    representations are values, and therefore safe, a program whose external
    effect or output depends on uninitialized data, and is therefore
    nondeterministic (a bad form of nondeterministic), is a repugnant
    program.

    I'd like to have clear rules which allow an implementation to
    go to great depths to diagnose all such situations, while
    remaining conforming. (The language agrees that those situations
    are erroneous, granting the tools license to diagnose.)

    At the same time, certain situations in which uninitialized data are
    used in ways that don't have a visible effect would be a nuisance if they
    generated diagnostics, the primary example being the copying of objects.
    I would like it so that memcpy isn't magic. I want it so that the
    programmer can write a bytewise memcpy which doesn't violate the
    rules even if it moves uninitialized data.

    I would like a model of uninitialized data which usefully lends itself
    to different depths with different trade-offs, like complexity of
    analysis and use of run-time resources. Limits should be imposed by
    implementations (what cases they want to diagnose) rather than by the
    model.
  • From Martin Uecker@ma.uecker@gmail.com to comp.std.c on Fri Aug 18 12:44:11 2023
    From Newsgroup: comp.std.c

    On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
    > On 2023-08-17, Tim Rentsch <tr.1...@z991.linuxsc.com> wrote:
    >> Martin Uecker <ma.u...@gmail.com> writes:
    >>
    >> [some unrelated passages removed]
    >>
    >>> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
    >>>
    >>>> Martin Uecker <ma.u...@gmail.com> writes:
    >>>>
    >>>> [...]
    >>>>
    >>>>> One could still consider the idea that "indeterminate" is an
    >>>>> abstract property that yields UB during read even for types
    >>>>> that do not have trap representations. There is no wording
    >>>>> in the C standard to support this, but I would not call this
    >>>>> idea "fundamentally wrong". You are right that this is different
    >>>>> to pointer provenance which is about values. What it would
    >>>>> have in common with pointer provenance is that there is hidden
    >>>>> state in the abstract machine associated with memory that
    >>>>> is not part of the representation. With effective types there
    >>>>> is another example of this.
    >>>>
    >>>> I understand that you want to consider a broader topic, and that,
    >>>> in the realm of that broader topic, something like provenance
    >>>> could have a role to play. I think it is worth responding to
    >>>> that thesis, and am expecting to do so in a separate reply (or
    >>>> new thread?) although probably not right away.
    >>>
    >>> I would love to hear your comments, because some people
    >>> want to have such an abstract notion of "indeterminate" and
    >>> some already believe that this is how the standard should
    >>> be understood already today.
    >>
    >> I've been thinking about this, and am close (I think) to having
    >> something to say in response. Before I do that, though, let me
    >> ask this: what problem or problems are motivating the question?
    >> What problems do you (or "some people") want to solve? I don't
    >> want just examples here; I'm hoping to get a full list.
    > I'm all about the diagnosis. Even on machines in which all
    > representations are values, and therefore safe,

    I do not agree with the idea that "absence of UB = safe".

    > a program whose external effect or output depends on uninitialized
    > data, and is therefore nondeterministic (a bad form of
    > nondeterministic), is a repugnant program.

    I would expect a debugger to output the memory as it is seen
    by the CPU. But yes, it would not be a strictly conforming program.

    > I'd like to have clear rules which allow an implementation to
    > go to great depths to diagnose all such situations, while
    > remaining conforming. (The language agrees that those situations
    > are erroneous, granting the tools license to diagnose.)

    An implementation does not need a license from the standard
    to diagnose anything. I can already diagnose whatever seems
    useful and this does not affect conformance at all.
    But it becomes easier to usefully diagnose behavior which is
    undefined, because then one can expect that in portable C it
    is not used intentionally.

    > At the same time, certain situations in which uninitialized data are
    > used in ways that don't have a visible effect would be a nuisance if
    > they generated diagnostics, the primary example being the copying of
    > objects. I would like it so that memcpy isn't magic. I want it so that
    > the programmer can write a bytewise memcpy which doesn't violate the
    > rules even if it moves uninitialized data.

    Yes, I think for C this is rather important.

    > I would like a model of uninitialized data which usefully lends itself
    > to different depths with different trade-offs, like complexity of
    > analysis and use of run-time resources. Limits should be imposed by
    > implementations (what cases they want to diagnose) rather than by the
    > model.

    Tools can already do complex analysis and track down use of
    uninitialized variables. But with respect to conformance, I think
    the current standard has very good rules: memcpy/memcmp
    and similar code works as expected. Locally, where a compiler
    can be expected to give good diagnostics via static analysis,
    the use of uninitialized variables is UB. But this does not
    spread via pointers elsewhere, where useful diagnostics
    are unlikely and optimizer-induced problems based on UB
    might be far more difficult to debug.

    Martin
  • From Martin Uecker@ma.uecker@gmail.com to comp.std.c on Fri Aug 18 12:52:42 2023
    From Newsgroup: comp.std.c

    On Thursday, August 17, 2023 at 8:13:07 AM UTC+2, Tim Rentsch wrote:
    > Martin Uecker <ma.u...@gmail.com> writes:
    >
    > [some unrelated passages removed]
    >
    >> On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:
    >>
    >>> Martin Uecker <ma.u...@gmail.com> writes:
    >>> [...]
    >>>> One could still consider the idea that "indeterminate" is an
    >>>> abstract property that yields UB during read even for types
    >>>> that do not have trap representations. There is no wording
    >>>> in the C standard to support this, but I would not call this
    >>>> idea "fundamentally wrong". You are right that this is different
    >>>> to pointer provenance which is about values. What it would
    >>>> have in common with pointer provenance is that there is hidden
    >>>> state in the abstract machine associated with memory that
    >>>> is not part of the representation. With effective types there
    >>>> is another example of this.
    >>>
    >>> I understand that you want to consider a broader topic, and that,
    >>> in the realm of that broader topic, something like provenance
    >>> could have a role to play. I think it is worth responding to
    >>> that thesis, and am expecting to do so in a separate reply (or
    >>> new thread?) although probably not right away.
    >>
    >> I would love to hear your comments, because some people
    >> want to have such an abstract notion of "indeterminate" and
    >> some already believe that this is how the standard should
    >> be understood already today.
    >
    > I've been thinking about this, and am close (I think) to having
    > something to say in response. Before I do that, though, let me
    > ask this: what problem or problems are motivating the question?
    > What problems do you (or "some people") want to solve? I don't
    > want just examples here; I'm hoping to get a full list.

    There are essentially two main interests driving this. First, there
    is some interest in precisely formulating the semantics for C.
    The provenance proposal came out of this.

    Second, there is the issue of safety problems caused by
    uninitialized reads, together with compiler support for zero
    initialization etc. So there are various people who want to
    change the semantics for uninitialized variables completely
    in the interest of safety.

    So far, there has been no consensus in WG14 that the rules should
    be changed or what the new rules should be.

    Martin
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Fri Aug 18 20:20:05 2023
    From Newsgroup: comp.std.c

    Kaz Kylheku <864-117-4973@kylheku.com> writes:

    > I'm all about the diagnosis. Even on machines in which all
    > representations are values, and therefore safe, a program whose
    > external effect or output depends on uninitialized data, and is
    > therefore nondeterministic (a bad form of nondeterministic), is a
    > repugnant program.
    >
    > I'd like to have clear rules which allow an implementation to go
    > to great depths to diagnose all such situations, while remaining
    > conforming. (The language agrees that those situations are
    > erroneous, granting the tools license to diagnose.)

    The C standard allows compilers to do whatever analysis they
    want and to issue diagnostics for whatever conditions or
    circumstances they choose. What you want is orthogonal to
    what is being discussed.
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.std.c on Sat Aug 19 05:04:06 2023
    From Newsgroup: comp.std.c

    On 2023-08-18, Martin Uecker <ma.uecker@gmail.com> wrote:
    > On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
    > An implementation does not need a license from the standard
    > to diagnose anything. I can already diagnose whatever seems
    > useful and this does not affect conformance at all.

    That's true about diagnostics at translation time. It's not clear
    about those that happen at run time and are indistinguishable from the
    program's output on stdout or stderr.

    Also, it might be desirable for it to be conforming to terminate the
    program if it has run afoul of the rules.

    >> I would like a model of uninitialized data which usefully lends itself
    >> to different depths with different trade-offs, like complexity of
    >> analysis and use of run-time resources. Limits should be imposed by
    >> implementations (what cases they want to diagnose) rather than by the
    >> model.
    >
    > Tools can already do complex analysis and track down use of
    > uninitialized variables. But with respect to conformance, I think
    > the current standard has very good rules: memcpy/memcmp
    > and similar code works as expected. Locally, where a compiler
    > can be expected to give good diagnostics via static analysis,
    > the use of uninitialized variables is UB. But this does not
    > spread via pointers elsewhere, where useful diagnostics
    > are unlikely and optimizer-induced problems based on UB
    > might be far more difficult to debug.

    Dynamic instrumentation and tracking makes it possible
    for that information to follow pointer data flows, globally
    in the program.

    E.g. under the Valgrind tool, if one module passes an uninitialized
    object into another, and that other one relies on it to make
    a conditional branch, it will be diagnosed. You can get the
    backtrace of where that object was created as well as where
    the use took place.
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.std.c on Sat Aug 19 05:23:29 2023
    From Newsgroup: comp.std.c

    On 2023-08-19, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    > Kaz Kylheku <864-117-4973@kylheku.com> writes:
    >
    >> I'm all about the diagnosis. Even on machines in which all
    >> representations are values, and therefore safe, a program whose
    >> external effect or output depends on uninitialized data, and is
    >> therefore nondeterministic (a bad form of nondeterministic), is a
    >> repugnant program.
    >>
    >> I'd like to have clear rules which allow an implementation to go
    >> to great depths to diagnose all such situations, while remaining
    >> conforming. (The language agrees that those situations are
    >> erroneous, granting the tools license to diagnose.)
    >
    > The C standard allows compilers to do whatever analysis they
    > want and to issue diagnostics for whatever conditions or
    > circumstances they choose.

    And stop translating? If some use of an uninitialized object
    isn't undefined, and you make the diagnostic a fatal error,
    then you don't have a conforming compiler at that point.

    > What you want is orthogonal to what is being discussed.

    I'm mainly concerned about run-time.

    If the program hasn't invoked undefined behavior, I don't think it's
    conforming to inject gratuitous diagnostics into the program's run-time,
    such that they appear as if they were its output on stderr or stdout.
    Those diagnostics have to go to some special debug port.

    Also, it's not conforming to arbitrarily terminate the program. (Other
    than in some weaselly language lawyering way, by declaring that it
    has exceeded an implementation limit or something.)
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Fri Aug 18 22:56:34 2023
    From Newsgroup: comp.std.c

    Kaz Kylheku <864-117-4973@kylheku.com> writes:

    > On 2023-08-19, Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    [...]

    >> The C standard allows compilers to do whatever analysis they
    >> want and to issue diagnostics for whatever conditions or
    >> circumstances they choose.
    >
    > And stop translating? If some use of an uninitialized object
    > isn't undefined, and you make the diagnostic a fatal error,
    > then you don't have a conforming compiler at that point.

    [also]

    > If the program hasn't invoked undefined behavior, I don't think
    > it's conforming to inject gratuitous diagnostics [..or..]
    > to arbitrarily terminate the program. [...]

    You need to learn how to say what you mean. Your earlier
    posting didn't say anything about failing to compile
    or altering program behavior. If you can't learn how
    to say what you mean then there is roughly a 1e-29 percent
    chance that you'll get what you want.
  • From Martin Uecker@ma.uecker@gmail.com to comp.std.c on Sat Aug 19 01:36:23 2023
    From Newsgroup: comp.std.c

    On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote:
    > On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote:
    >> On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
    >> An implementation does not need a license from the standard
    >> to diagnose anything. I can already diagnose whatever seems
    >> useful and this does not affect conformance at all.
    >
    > That's true about diagnostics at translation time. It's not clear
    > about those that happen at run time and are indistinguishable from the
    > program's output on stdout or stderr.

    The observable behavior has to stay the same, so yes, it could
    not output to stdout or stderr. But there is nothing stopping it
    from logging debugging information somewhere else, where it could
    be accessed.

    > Also, it might be desirable for it to be conforming to terminate the
    > program if it has run afoul of the rules.

    Yes, this is one main reason to make certain things UB. But
    then it can have false positives and needs to be backward
    compatible, which limits what is possible.

    >>> I would like a model of uninitialized data which usefully lends itself
    >>> to different depths with different trade-offs, like complexity of
    >>> analysis and use of run-time resources. Limits should be imposed by
    >>> implementations (what cases they want to diagnose) rather than by the
    >>> model.
    >>
    >> Tools can already do complex analysis and track down use of
    >> uninitialized variables. But with respect to conformance, I think
    >> the current standard has very good rules: memcpy/memcmp
    >> and similar code works as expected. Locally, where a compiler
    >> can be expected to give good diagnostics via static analysis,
    >> the use of uninitialized variables is UB. But this does not
    >> spread via pointers elsewhere, where useful diagnostics
    >> are unlikely and optimizer-induced problems based on UB
    >> might be far more difficult to debug.
    >
    > Dynamic instrumentation and tracking makes it possible
    > for that information to follow pointer data flows, globally
    > in the program.
    >
    > E.g. under the Valgrind tool, if one module passes an uninitialized
    > object into another, and that other one relies on it to make
    > a conditional branch, it will be diagnosed. You can get the
    > backtrace of where that object was created as well as where
    > the use took place.

    And valgrind exists and is a useful tool (I use it myself)
    even though not everything it diagnoses is UB. But it also has
    false positives, so using the same rules for deciding what
    should be UB in the standard as valgrind uses seems difficult.

    Also note that if the output of a program relies on
    unspecified values, then it is already not strictly conforming
    even when the behavior itself is not undefined. So if an
    implementation is smart enough to see this, it could already
    reject the program.

    Making even the use of unspecified values in conditional
    branches UB seems problematic. E.g. you could not
    compute a hash over data structures with padding and
    then compare it later to see whether something has
    changed (taking into account false positives). This seems
    similar to memcpy / memcmp but involves conditions,
    and such techniques would become non-conforming.
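
    A sketch of such a technique, for illustration. It assumes an FNV-1a
    hash (chosen only for the example; any byte hash would do), and
    struct record, hash_bytes and maybe_changed are illustrative names:

    ```c
    #include <stddef.h>
    #include <stdint.h>

    struct record {
        char kind;           /* padding typically follows on common ABIs */
        int count;
    };

    /* 32-bit FNV-1a over the object representation, padding included. */
    uint32_t hash_bytes(const void *p, size_t n)
    {
        const unsigned char *b = p;
        uint32_t h = 2166136261u;    /* FNV offset basis */
        while (n--) {
            h ^= *b++;
            h *= 16777619u;          /* FNV prime */
        }
        return h;
    }

    /* Branches on a value computed from every byte of *r, including
       padding bytes that may hold unspecified values. A nonzero result
       means "possibly changed"; padding differences can cause false
       positives. */
    int maybe_changed(const struct record *r, uint32_t saved)
    {
        return hash_bytes(r, sizeof *r) != saved;
    }
    ```

    The comparison in maybe_changed() is exactly the kind of conditional
    that a "control flow on unspecified values is UB" rule would forbid.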
    Martin
  • From Richard Damon@Richard@Damon-Family.org to comp.std.c on Sat Aug 19 09:18:17 2023
    From Newsgroup: comp.std.c

    On 8/19/23 4:36 AM, Martin Uecker wrote:
    > On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote:
    >> On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote:
    >>> On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
    >>> An implementation does not need a license from the standard
    >>> to diagnose anything. I can already diagnose whatever seems
    >>> useful and this does not affect conformance at all.
    >> That's true about diagnostics at translation time. It's not clear
    >> about those that happen at run time and are indistinguishable from the
    >> program's output on stdout or stderr.
    >
    > The observable behavior has to stay the same, so yes, it could
    > not output to stdout or stderr. But there is nothing stopping it
    > from logging debugging information somewhere else, where it could
    > be accessed.
    >
    >> Also, it might be desirable for it to be conforming to terminate the
    >> program if it has run afoul of the rules.
    >
    > Yes, this is one main reason to make certain things UB. But
    > then it can have false positives and needs to be backward
    > compatible, which limits what is possible.
    >
    >>>> I would like a model of uninitialized data which usefully lends itself
    >>>> to different depths with different trade-offs, like complexity of
    >>>> analysis and use of run-time resources. Limits should be imposed by
    >>>> implementations (what cases they want to diagnose) rather than by the
    >>>> model.
    >>>
    >>> Tools can already do complex analysis and track down use of
    >>> uninitialized variables. But with respect to conformance, I think
    >>> the current standard has very good rules: memcpy/memcmp
    >>> and similar code works as expected. Locally, where a compiler
    >>> can be expected to give good diagnostics via static analysis,
    >>> the use of uninitialized variables is UB. But this does not
    >>> spread via pointers elsewhere, where useful diagnostics
    >>> are unlikely and optimizer-induced problems based on UB
    >>> might be far more difficult to debug.
    >> Dynamic instrumentation and tracking makes it possible
    >> for that information to follow pointer data flows, globally
    >> in the program.
    >>
    >> E.g. under the Valgrind tool, if one module passes an uninitialized
    >> object into another, and that other one relies on it to make
    >> a conditional branch, it will be diagnosed. You can get the
    >> backtrace of where that object was created as well as where
    >> the use took place.
    >
    > And valgrind exists and is a useful tool (I use it myself)
    > even though not everything it diagnoses is UB. But it also has
    > false positives, so using the same rules for deciding what
    > should be UB in the standard as valgrind uses seems difficult.
    >
    > Also note that if the output of a program relies on
    > unspecified values, then it is already not strictly conforming
    > even when the behavior itself is not undefined. So if an
    > implementation is smart enough to see this, it could already
    > reject the program.
    >
    > Making even the use of unspecified values in conditional
    > branches UB seems problematic. E.g. you could not
    > compute a hash over data structures with padding and
    > then compare it later to see whether something has
    > changed (taking into account false positives). This seems
    > similar to memcpy / memcmp but involves conditions,
    > and such techniques would become non-conforming.
    >
    > Martin

    My understanding is that there is no requirement that the values of the
    padding bytes remain constant over time. I can't imagine a case where
    they will just change at an arbitrary time, but setting a member of the
    structure to a value (even if it is the same value it had) might easily
    affect the value of the padding bytes, so the hash changes.
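
    One way to sidestep that, sketched here purely as an illustration, is
    to hash only the named members so that padding is never read. The
    names struct entry, fnv1a_step and hash_members are hypothetical, and
    FNV-1a is just an example hash:

    ```c
    #include <stddef.h>
    #include <stdint.h>

    struct entry {
        char kind;           /* padding typically follows on common ABIs */
        int count;
    };

    /* Folds n bytes starting at p into a running 32-bit FNV-1a hash. */
    static uint32_t fnv1a_step(uint32_t h, const void *p, size_t n)
    {
        const unsigned char *b = p;
        while (n--) {
            h ^= *b++;
            h *= 16777619u;  /* FNV prime */
        }
        return h;
    }

    /* Hashes each member's bytes in turn; padding bytes are skipped, so
       their values (and any changes to them) cannot affect the result. */
    uint32_t hash_members(const struct entry *e)
    {
        uint32_t h = 2166136261u;    /* FNV offset basis */
        h = fnv1a_step(h, &e->kind, sizeof e->kind);
        h = fnv1a_step(h, &e->count, sizeof e->count);
        return h;
    }
    ```

    The cost is that the hash function has to be kept in sync with the
    structure definition by hand.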
  • From Martin Uecker@ma.uecker@gmail.com to comp.std.c on Sat Aug 19 11:12:53 2023
    From Newsgroup: comp.std.c

    On Saturday, August 19, 2023 at 3:18:22 PM UTC+2, Richard Damon wrote:
    On 8/19/23 4:36 AM, Martin Uecker wrote:
    On Saturday, August 19, 2023 at 7:04:10 AM UTC+2, Kaz Kylheku wrote:
    On 2023-08-18, Martin Uecker <ma.u...@gmail.com> wrote:
    On Thursday, August 17, 2023 at 9:08:48 AM UTC+2, Kaz Kylheku wrote:
    An implementation does not need a license from the standard
    to diagnose anything. I can already diagnose whatever seems
    useful and this does not affect conformance at all.
    That's true about diagnostics at translation time. It's not clear
    about that happen at run time and indistinguishable from the
    program's output on stdout or stderr.

    The observable behavior has to stay the same, so yes, it could
    not output to stdout or stderr. But there is nothing stopping it
    to log debugging information somewhere else, where it could
    be accessed.

    Also, it might be desirable for it to be conforming to terminate the
    program if it has run afoul of the rules.

    Yes, this is one main reason to make certain things UB. But
    then it can have false positives and needs to be backward
    compatible, which limits what is possible.

    I would like a model of uninitialized data which usefully lends itself
    to different depths with different trade-offs, like complexity of
    analysis and use of run-time resources. Limits should be imposed by
    implementations (what cases they want to diagnose) rather than by the
    model.

    Tools can already do complex analysis and track down use of
    uninitialized variables. But with respect to conformance, I think
    the current standard has very good rules: memcpy/memcmp
    and similar code works as expected. Locally, where a compiler
    can be expected to give good diagnostics via static analysis
    the use of uninitialized variables is UB. But this does not
    spread via pointers elsewhere, where useful diagnostics
    are unlikely and optimizer induced problems based on UB
    might be far more difficult to debug.
    Dynamic instrumentation and tracking makes it possible
    for that information to follow pointer data flows, globally
    in the program.

    E.g. under the Valgrind tool, if one module passes an uninitialized
    object into another, and that other one relies on it to make
    a conditional branch, it will be diagnosed. You can get the
    backtrace of where that object was created as well as where
    the use took place.

    And valgrind exists and is a useful tool (I use it myself)
    even though not everything it diagnoses is UB. But it also has
    false positives, so using the same rules for deciding what
    should be UB in the standard as valgrind uses seems difficult.

    Also note that if the output of a program relies on
    unspecified values, then it is already not strictly conforming
    even when the behavior itself is not undefined. So if an
    implementation is smart enough to see this, it could already
    reject the program.

    Making even the use of unspecified values in conditional
    branches UB seems problematic. E.g. you could not
    compute a hash over data structures with padding and
    then compare it later to see whether something has
    changed (taking into account false positives). This seems
    similar to memcpy / memcmp but involves conditions,
    and such techniques would become non-conforming.

    Martin
    My understanding is that there is no requirement that the values of
    the padding bytes remain constant over time.
    The C standard specifies when they can change:
    "When a value is stored in an object of structure or union type,
    including in a member object, the bytes of the object representation
    that correspond to any padding bytes take unspecified values"
    I can't imagine a case where they will just change at an arbitrary
    time, but setting a member of the structure to a value (even if it is
    the same value it had) might easily affect the value of the padding
    bytes, so the hash changes.
    Sure, writing to the object may change the padding and then the
    hash changes. This is why I mentioned false positives.
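
    A minimal sketch of the hash-and-compare technique described above,
    assuming a hypothetical `struct record` that has padding on the target
    ABI. The names `record`, `hash_bytes` and `maybe_changed` are
    illustrative only, and FNV-1a is just one convenient choice of hash.
    The bytes are read as unsigned char, so the padding takes part in the
    hash; a store to a member may therefore alter the hash even when no
    member value changed, which is the false positive mentioned above.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical record; most ABIs insert padding bytes after 'tag'. */
    struct record {
        char tag;
        int  value;
    };

    /* FNV-1a over the object representation, read as unsigned char. */
    static uint32_t hash_bytes(const void *p, size_t n)
    {
        const unsigned char *b = p;
        uint32_t h = 2166136261u;
        for (size_t i = 0; i < n; i++) {
            h ^= b[i];
            h *= 16777619u;
        }
        return h;
    }

    /* Nonzero if the bytes of *r no longer hash to 'old_hash'.
       A difference may come from padding alone (a false positive). */
    static int maybe_changed(const struct record *r, uint32_t old_hash)
    {
        return hash_bytes(r, sizeof *r) != old_hash;
    }
    ```

    A caller would take the hash after the last store to any member and
    treat a later mismatch as "possibly changed", never as proof of a
    change in a member.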
    Martin
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Sat Aug 26 19:25:55 2023
    From Newsgroup: comp.std.c

    Martin Uecker <ma.uecker@gmail.com> writes:

    On Thursday, August 17, 2023 at 8:13:07 AM UTC+2, Tim Rentsch wrote:

    Martin Uecker <ma.u...@gmail.com> writes:

    [some unrelated passages removed]

    On Wednesday, August 16, 2023 at 6:06:43 AM UTC+2, Tim Rentsch wrote:

    Martin Uecker <ma.u...@gmail.com> writes:

    [...]

    One could still consider the idea that "indeterminate" is an
    abstract property that yields UB during read even for types
    that do not have trap representations. There is no wording
    in the C standard to support this, but I would not call this
    idea "fundamentally wrong". You are right that this is different
    to pointer provenance which is about values. What it would
    have in common with pointer provenance is that there is hidden
    state in the abstract machine associated with memory that
    is not part of the representation. With effective types there
    is another example of this.

    I understand that you want to consider a broader topic, and that,
    in the realm of that broader topic, something like provenance
    could have a role to play. I think it is worth responding to
    that thesis, and am expecting to do so in a separate reply (or
    new thread?) although probably not right away.

    I would love to hear your comments, because some people
    want to have such an abstract notion of "indeterminate" and
    some believe that this is how the standard should
    be understood already today.

    I've been thinking about this, and am close (I think) to having
    something to say in response. Before I do that, though, let me
    ask this: what problem or problems are motivating the question?
    What problems do you (or "some people") want to solve? I don't
    want just examples here; I'm hoping to get a full list.

    There are essentially two main interests driving this. First,
    there is some interest to precisely formulate the semantics for C.
    The provenance proposal came out of this.

    Second, there is the issue of safety problems caused by
    uninitialized reads, together with compiler support for zero
    initialization etc. So there are various people who want to
    change the semantics for uninitialized variables completely
    in the interest of safety.

    This response doesn't answer my question. What are the problems,
    specifically, that people want to solve? If there isn't a good
    understanding of what the problem is, there is little hope of
    finding a solution, let alone reaching agreement on whether a
    proposed change does in fact solve the problem. If we don't know
    where we're going, any choice of road is equally good.

    That said, I understand that you are asking not on your own behalf
    but on behalf (perhaps indirectly) of others, and the others might
    not know what the problem(s) are that they want to solve. I think
    it's worth asking the question explicitly, What is the problem
    that we want to solve here? Start by simply trying to write a
    clear statement of what the problem is; proceed on to looking for
    a solution only after there is agreement (and I don't mean just a
    majority vote) about what problem it is the group wants to solve.

    (Note added after writing: I didn't realize when I started how
    difficult this subject is and how much there is to say about it.
    I hope readers will appreciate the amount of effort that has
    been invested, and get some value out of what has been produced,
    even if it spends too much time on some less important issues.)

    (Also, after having written the whole posting, I see that there
    are some aspects that I didn't relate to the indeterminate
    question and so didn't address. If you want me to say more about
    formalizing semantics or the issue of safety for uninitialized
    variables, I really need some specifics before I can talk about
    those.)

    (One further thought: on reading through my comments one last
    time, I may have more to say about uninitialized variables. But
    I am deferring that for now, to get this beast out the door.)


    So far, there was no consensus in WG14 that the rules should
    be changed or what the new rules should be.

    That's because they don't know what problem it is that they want
    to solve.

    Consider the question of what happens with padding bits/bytes,
    and unnamed members, in structs (unions too of course, but for
    now we consider only structs). The C standard says these bits of
    memory take unspecified values whenever there is a store to any
    member of the struct (and maybe also at other times, but let's
    ignore that). I understand why this decision was made, namely,
    to give more freedom to implementations as to how such operations
    are actualized. But it leaves behind a problem. Speaking as a
    developer, I want the values of these bits to be stable, at least
    in certain cases (and I want to be able to choose which cases
    those are). The C language doesn't give me any way to do that,
    at least not one that isn't horribly inconvenient. In making the
    decision about padding bits/bytes, the C committee answered the
    /question/ but didn't address the /problem/. I expect that
    something similar is going on with the current discussions.

    To better understand the landscape, let's look at three different
    kinds of undefined behavior. The illustrating constructions are
    signed integer arithmetic, obsolete pointer values, and violating
    effective type rules.

    Situations where arithmetic on signed integers overflows might be
    called /practical/ undefined behavior. Certainly it would be
    possible to require a better-defined semantics (such as giving an
    unspecified result), but presumably overflow doesn't come up very
    often, it's not clear how useful the "better" result would be,
    and the cost in some hardware environments might be prohibitive.
    Furthermore there is a fairly easy workaround to avoid overflow:
    simply convert to unsigned types, do the operations, and then
    convert back. Overflow being undefined behavior isn't absolutely
    necessary but in practical terms it's acceptable. (I acknowledge
    that some people have different views on that last statement.)
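
    The workaround just described can be sketched as follows. The helper
    name `wrapping_add` is hypothetical. The unsigned addition is fully
    defined (it wraps modulo UINT_MAX + 1); the conversion back to int is
    implementation-defined when the result does not fit, and the comment
    assumes the modular behavior that GCC and Clang document.

    ```c
    #include <assert.h>
    #include <limits.h>

    /* Add two ints without signed overflow: do the arithmetic in
       unsigned int, then convert the result back. */
    static int wrapping_add(int a, int b)
    {
        unsigned int sum = (unsigned int)a + (unsigned int)b; /* wraps, no UB */
        /* Out-of-range conversion is implementation-defined; assumed here
           to reduce modulo 2^N, as on common two's complement targets. */
        return (int)sum;
    }
    ```

    On such an implementation wrapping_add(INT_MAX, 1) yields INT_MIN
    rather than undefined behavior.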

    An obsolete pointer value is a pointer to an object after the end
    of the object's lifetime. Attempting to make use of an obsolete
    pointer value, in any way whatsoever including simply loading it
    by means of lvalue conversion, is undefined behavior. We can
    imagine narrowing the scope a bit so simply loading an obsolete
    pointer value or comparing one for equality could be better
    defined, but any attempt to dereference an obsolete pointer value
    is what might be called an /essential/ undefined behavior. The
    problem here is both practical and theoretical: there is no way
    to be sure the underlying hardware will be able to carry out the
    asked-for operation (without a machine check, etc), and even if
    there were, there is no way to describe what happens in a way
    that can be expressed (usefully) in terms that relate to what's
    going on in the abstract machine. There simply is no practical,
    useful, sensible way to define the behavior of dereferencing an
    obsolete pointer value.

    At the other end of the spectrum, violating effective type rules is
    what might be called /gratuitous/ undefined behavior. There is no
    particular hardware motivation for choosing UB. And there is no
    problem defining the semantics of a cross-type access, which can be
    done definedly in the same way as accessing union members. So there
    is no reason to think that adding cross-type restrictions is
    necessary. An argument can be made that cross-type restrictions
    are /desirable/, because they allow code transformations that
    improve performance in some cases.

    Incidentally, it might seem like effective type rules are similar in
    some way to NaT bits or pointer provenance. They aren't. NaT bits
    are hardware indicators that actually exist, and pointer provenances
    are attached to values, not to objects. Neither of those conditions
    hold for effective types. The seeming similarity to hidden memory
    bits is a red herring.

    (Also, effective type rules are a lot more complicated than they
    seem at first blush, and have some peculiar properties as a result.
    They seem to work okay if not looked at too closely, but a closer
    look shows some serious shortcomings. But I digress.)

    There are two significant problems with undefined behavior. The
    smaller of the two is that there are no distinctions between the
    different classes of undefined behavior. There is no way around
    having some sort of undefined behavior for obsolete pointer values,
    but cross-typing rules are a completely different story. Yet the C
    standard puts all the different kinds of undefined behaviors into
    the same absolute category. Sometimes people use compiler options
    to turn off, for example, so-called "strict aliasing", and of course
    the C standard allows us to do that. But compilers aren't required
    to provide such an option, and if they do the option may not do
    exactly what we expect it to do, because there is no standard
    specification for it. The C standard should define officially
    sanctioned mechanisms -- as for example standard #pragma's -- to
    give standard-defined semantics to certain constructs of undefined
    behavior that resemble, eg, -fno-strict-aliasing.

    (Let me add in passing that this should be done for some cases of
    unspecified behavior as well. To give one example, the C standard
    should provide a way to direct a C compiler to maintain the values
    of padding bits and bytes and unnamed members, taking away the
    freedom for such things to assume unspecified values.)

    The second problem is basically The Law of Unintended Consequences
    smashing into The Law of Least Astonishment. As compiler writers
    have gotten more and more clever at exploiting the implications of
    "undefined behavior", we see more and more cases of code that looks
    reasonable being turned into mush by overly clever "optimizing"
    compilers. There is obviously something wrong with the way this
    trend is going -- ever more clever "optimizations", followed by ever
    more arcane compiler options to work around the problems caused by
    the too-clever compilers. This problem must be addressed by the C
    standard, for if it is not the ecosystem will transform into a
    confused state that is exactly what the C standard was put in place
    to avoid. (I do have some ideas about how to address this issue,
    but I want to make sure everyone appreciates the extent of the
    problem before we start talking about solutions.)

    Before leaving the sub-topic of undefined behavior, let me mention
    two success stories. The first is 'restrict': the performance
    implications are local, the choice is under control of the program
    (and programmer), and the default choice is to play safe. Good
    show. The second is the improved sequencing rules introduced in
    C11. A thorny problem, and since C11 handled very deftly. These
    parts of the C language and C standard should be held up as examples
    when considering how to go forward on other problems.

    And now on to the question of "indeterminate". Following that, a
    somewhat philosophical perspective concerning the nature of the C
    standard and the people who work on it.

    First an observation. The idea of "indeterminate values" is
    actually two ideas in one: non-valid abstract /values/ (like
    obsolete pointers), and "uninitialized" /objects/ (in quotes
    because in some circumstances objects can become "uninitialized"
    even after they have been stored into.) The word "indeterminate"
    isn't really right for either of these ideas. I understand why it
    was used in the first C standard, and in that context it seems
    okay, but going forward a better word (or words) should be found.
    I will keep using it here but please don't get overly attached to
    the word, lest it confuse the discussion.

    My very strong sense is that some general notion of indeterminate
    values (or objects) is a solution in search of a problem. Let's
    look at some different kinds of undefined behavior, while also
    considering the lens of "indeterminate values (or objects)".

    One: signed integer overflow. Could this situation somehow produce
    an "indeterminate value" that could be stored so it could wreak
    havoc later? Two problems: no sensible developer is going to want
    the bad behavior deferred rather than happening right away, and
    besides anything an "indeterminate value" could do can already be
    done by virtue of the generating condition being undefined itself.

    Two: obsolete pointers. These values are not indeterminate. They
    start off as valid, become obsolete when their pointed-to object
    ends its lifetime, and are always obsolete thereafter. It isn't
    hard to make a formal model for "obsoleteness" (ignoring problems
    such as converting pointers to and from integers, and other C-isms).
    Of course the formal model doesn't map nicely onto real computer
    hardware, because pointers would have far too many bits (and maybe
    other problems as well, but let's ignore that). So we pretend the
    extra bits are there, even though they aren't, with a strange
    consequence that two pointer objects can have the same object
    representation but still be different in that one is obsolete and
    the other isn't. Also a pointer can start off with a non-valid
    value, meaning "not null and points to no object". Here again the
    badness remains until a valid pointer value is put into the object;
    a pointer object with a non-valid value doesn't ever magically
    become valid without having been assigned or stored into. (Note
    that the same formal model for obsolete pointers can accommodate
    non-valid pointers, which are simply obsolete at the start.)

    Three: effective type rules. Broken. One of the weakest areas of
    the C standard. This framework may have started off as not a bad
    idea in C90, but looking at it now it's clear that we've gotten
    ahead of our skis, sorely in need of a top-to-bottom reformulation,
    similar at least in spirit with what was done with sequencing rules
    in C11. Also there should be a standard-defined way of allowing
    cross-type interference, with defined behavior, like what was
    explained above. I expect a well-done reformulation of cross-type
    (non-)interference rules would have no notion of assigning "magic
    state" to objects, and so have no need of any idea of "indeterminate
    objects (or values)".

    Four: uninitialized objects. Here we have a question: Why? What
    problem are we hoping to solve? Presumably the point of having
    uninitialized objects be "indeterminate" is so that reading them
    is undefined behavior. Let's explore that.

    I realize of course that any object having a trap representation
    (called a non-value representation in the C23 draft) causes
    undefined behavior if read using a type in which the object
    representation corresponds to a trap representation. Obviously
    there is good reason to say trying to read a trap representation
    is undefined behavior. Some types, notably unsigned char, don't
    have any trap representations. Should reading an uninitialized
    object using such a type be undefined behavior? Speaking as a
    developer, I don't see any benefit. An implementation would have
    to go out of its way to do anything other than deliver a valid
    unspecified value; if there is to be undefined behavior, it is
    /contrived/ undefined behavior. Consider:

    A: such UB could allow trapping on any use of an uninitialized
    object. But UB does not guarantee that, and if someone wants
    it there are tools like valgrind to get it (and without any
    special language support needed to do so).

    B: such UB could allow "optimizations" by clever compiler writers.
    The result would be more unexpected code scramblings and more
    arcane compiler options to disable them. A better way to provide
    such imagined benefits is by adding one or more new language
    constructs, along lines similar to the 'restrict' qualifier, to
    selectively enable such performance changes.

    C: future hardware developments might need or take advantage of
    such UB. If and when such things happen it's better to add
    specific wording to reflect the new hardware behaviors. The last
    sentence of 6.3.2.1 p2, added in C11, provides an excellent example
    of how to accommodate such new hardware developments.

    Indeterminate objects is a solution in search of a problem. To
    make progress, first agree on a particular problem. Only after
    that point should possible solutions be considered; I would be
    surprised if some general notion of indeterminateness ever turned
    out to be the solution of choice.

    Now I would like to offer a perspective on how to view work that
    is done in writing the C standard.

    In some respects the ISO C committee resembles the US Supreme
    Court. They consider issues, draw conclusions, and ultimately
    issue "rulings" in the form of ISO-approved standards documents.
    Like the Supreme Court, their decisions are final and cannot be
    appealed.

    However, the Supreme Court ultimately draws its authority from
    how the public views its rulings. If the rulings get too far out
    of line with what the general public believes, confidence in the
    Court will decline and its opinions will carry less weight. (I
    don't mean to make a political statement here - I am simply
    repeating some analysis I have read recently regarding current
    attitudes towards the Court.)

    The same is true of the ISO C committee. They can make whatever
    decisions they want, and those decisions will end up being what
    goes into the C standard. At the same time, it's important - I
    would say very important - to keep the confidence of people for
    whom the C standard is regarded as an important document. If
    that confidence is lost then the C standard will be on its way
    to becoming irrelevant.

    Unfortunately I have the sense that this trend has already
    started. The most important constituency for the C language (and
    so for the C standard) is developers. Many developers, but in
    particular and very especially C developers, want stability. I
    understand the desire to want to "improve" the language. Getting
    agreement on a change has to mean more than a majority vote -- it
    needs to be not just accepted but enthusiastically approved and
    with overwhelming support. Too much of what is planned for C23
    is coming from the implementation community without regard for
    what is beneficial to the development community. I see the
    reported desire for general "indeterminate"-ness as part of this
    trend. It is my hope that those people who are part of the ISO C
    committee reflect on this perspective and reconsider where the C
    language should go for the next C standard.
  • From Spiros Bousbouras@spibou@gmail.com to comp.std.c on Sun Aug 27 08:31:26 2023
    From Newsgroup: comp.std.c

    On Sat, 26 Aug 2023 19:25:55 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    Sometimes people use compiler options
    to turn off, for example, so-called "strict aliasing", and of course
    the C standard allows us to do that. But compilers aren't required
    to provide such an option, and if they do the option may not do
    exactly what we expect it to do, because there is no standard
    specification for it. The C standard should define officially
    sanctioned mechanisms -- as for example standard #pragma's -- to
    give standard-defined semantics to certain constructs of undefined
    behavior that resemble, eg, -fno-strict-aliasing.

    Surely the starting point for this should be the documentation of the
    compilers to specify precisely what -fno-strict-aliasing does. If
    a consensus emerges out of these precise specifications or C programmers
    indicate that they prefer the specification of some particular compiler
    then this can become part of the standard. Adding a relevant #pragma
    should be trivial.

    The second problem is basically The Law of Unintended Consequences
    smashing into The Law of Least Astonishment. As compiler writers
    have gotten more and more clever at exploiting the implications of
    "undefined behavior", we see more and more cases of code that looks
    reasonable being turned into mush by overly clever "optimizing"
    compilers. There is obviously something wrong with the way this
    trend is going -- ever more clever "optimizations", followed by ever
    more arcane compiler options to work around the problems caused by
    the too-clever compilers. This problem must be addressed by the C
    standard, for if it is not the ecosystem will transform into a
    confused state that is exactly what the C standard was put in place
    to avoid. (I do have some ideas about how to address this issue,
    but I want to make sure everyone appreciates the extent of the
    problem before we start talking about solutions.)

    Without specific examples, it's impossible to comment on this. Why did
    the "reasonable" code have the undefined behaviour? Could the result
    the programmer was aiming for have been achieved with defined
    behaviour? For example it has been pointed out on comp.lang.c that it's
    impossible to write a malloc() implementation in conforming C. This is
    certainly a weakness which should be addressed with some appropriate
    #pragma.

    Before leaving the sub-topic of undefined behavior, let me mention
    two success stories. The first is 'restrict': the performance
    implications are local, the choice is under control of the program
    (and programmer), and the default choice is to play safe. Good
    show.

    From my point of view, restrict is not a success because the
    specification of restrict is the one part of the C1999 standard I have
    given up trying to understand. I understand the underlying idea but the
    specifics elude me. I remember many years ago someone asked on this
    group about some code involving restrict and a member of the standard
    committee replied and I found the reply counterintuitive. So I have
    decided to not use restrict in my own code taking also into account
    that I don't need the microoptimisations which restrict is intended to
    allow. But for all I know, people who do need these optimisations find
    the specification of restrict in the standard perfectly adequate.
    --
    It is not widely known that the "CPC" in "Amstrad CPC" actually stands
    for "cool people club".
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Tue Aug 29 04:35:40 2023
    From Newsgroup: comp.std.c

    Spiros Bousbouras <spibou@gmail.com> writes:

    On Sat, 26 Aug 2023 19:25:55 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Sometimes people use compiler options to turn off, for example,
    so-called "strict aliasing", and of course the C standard allows
    us to do that. But compilers aren't required to provide such an
    option, and if they do the option may not do exactly what we
    expect it to do, because there is no standard specification for
    it. The C standard should define officially sanctioned
    mechanisms -- as for example standard #pragma's -- to give
    standard-defined semantics to certain constructs of undefined
    behavior that resemble, eg, -fno-strict-aliasing.

    Surely the starting point for this should be the documentation of
    the compilers to specify precisely what -fno-strict-aliasing does.
    [...]

    Not at all. It's easy to write a specification that says what we
    want to do, along similar lines to what is said in the footnote
    about union member access in section 6.5.2.3

    If the member used to access the contents of a union object
    is not the same as the member last used to store a value in
    the object, the appropriate part of the object representation
    of the value is reinterpreted as an object representation in
    the new type as described in 6.2.6 (a process sometimes called
    "type punning"). This might be a trap representation.

    That behavior should be the default, for all accesses. For cases
    where a developer wants to give permission to the compiler to
    optimize based on cross-type non-interference assumptions, there
    should be a #pragma to do something similar to what effective type
    rules do now. The effective type rules are in need of re-writing
    anyway, and making type punning be the default doesn't break any
    programs, because compilers are already free to ignore the
    implications of violating effective type conditions.
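
    To illustrate the union footnote quoted above, here is a small sketch
    that reinterprets the object representation of a float as a 32-bit
    unsigned integer, once through union member access and once through
    memcpy. The function names are hypothetical, and the code assumes that
    float and uint32_t have the same size (checked at compile time).

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    _Static_assert(sizeof(float) == sizeof(uint32_t),
                   "sketch assumes a 32-bit float");

    /* Type punning through a union: store one member, read the other. */
    static uint32_t float_bits_union(float f)
    {
        union { float f; uint32_t u; } pun;
        pun.f = f;
        return pun.u;
    }

    /* The same reinterpretation by copying the object representation. */
    static uint32_t float_bits_memcpy(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }
    ```

    Both functions give the same result for any float value. Getting the
    same effect through a cast pointer, as in *(uint32_t *)&f, is what the
    effective type rules currently make undefined.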


    The second problem is basically The Law of Unintended Consequences
    smashing into The Law of Least Astonishment. As compiler writers
    have gotten more and more clever at exploiting the implications of
    "undefined behavior", we see more and more cases of code that looks
    reasonable being turned into mush by overly clever "optimizing"
    compilers. There is obviously something wrong with the way this
    trend is going -- ever more clever "optimizations", followed by
    ever more arcane compiler options to work around the problems
    caused by the too-clever compilers. This problem must be addressed
    by the C standard, for if it is not the ecosystem will transform
    into a confused state that is exactly what the C standard was put
    in place to avoid. (I do have some ideas about how to address this
    issue, but I want to make sure everyone appreciates the extent of
    the problem before we start talking about solutions.)

    Without specific examples, it's impossible to comment on this.
    [...]

    I feel that so much has been written about this issue that it
    isn't necessary for me to elaborate.

    For example it has been pointed out on comp.lang.c that it's
    impossible to write a malloc() implementation in conforming
    C. This is certainly a weakness which should be addressed with
    some appropriate #pragma.

    There isn't any reason to think malloc() should be writable in
    completely portable C. That's the point of putting malloc() in
    the system library in the first place. By the way, with type
    punning semantics mentioned above being the default, and with the
    alignment features added in C11, I think it is possible to write
    malloc() in portable C without needing any additional language
    changes. But even if it isn't that is no cause for concern; one
    of the principal reasons for having a system library is to
    provide functionality that the core language cannot express (or
    cannot express conveniently).
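
    As a rough sketch of what the C11 alignment features make possible, the
    following bump allocator hands out suitably aligned chunks from a
    static byte array. The names `arena`, `ARENA_SIZE` and `bump_alloc` are
    hypothetical, there is no free(), and whether storing non-character
    types into such an array is conforming under the current effective
    type rules is exactly the kind of question under discussion here.

    ```c
    #include <assert.h>
    #include <stdalign.h>
    #include <stddef.h>
    #include <stdint.h>

    #define ARENA_SIZE 4096u

    /* Backing storage, aligned for any object type (C11 alignas). */
    static alignas(max_align_t) unsigned char arena[ARENA_SIZE];
    static size_t arena_used;

    /* Return n bytes aligned for any type, or NULL if out of space. */
    static void *bump_alloc(size_t n)
    {
        const size_t a = alignof(max_align_t);
        if (n == 0 || n > ARENA_SIZE)
            return NULL;
        size_t rounded = (n + a - 1) / a * a;  /* round up to alignment */
        if (rounded > ARENA_SIZE - arena_used)
            return NULL;
        void *p = &arena[arena_used];
        arena_used += rounded;
        return p;
    }
    ```

    Every returned pointer is a multiple of alignof(max_align_t) bytes from
    the start of the array, which is itself aligned for any type.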


    Before leaving the sub-topic of undefined behavior, let me mention
    two success stories. The first is 'restrict': the performance
    implications are local, the choice is under control of the program
    (and programmer), and the default choice is to play safe. Good
    show.

    From my point of view, restrict is not a success because the
    specification of restrict is the one part of the C1999 standard I
    have given up trying to understand. I understand the underlying
    idea but the specifics elude me. [...]

    I agree the formal definition of restrict is rather daunting. In
    practice though I think using restrict with confidence is not
    overly difficult. My working model for restrict is something
    like this:

    1. Use restrict only in the declarations of function
    parameters.

    2. For a declaration like const T *restrict foo ,
    the compiler may assume that any objects that can be
    accessed through 'foo' will not be modified.

    3. For a declaration like T *restrict bas ,
    the compiler may assume that any changes to objects
    that can be accessed through 'bas' will be done
    using 'bas' or a pointer value derived from 'bas'
    (and in particular that no changes will happen
    other than through 'bas' or 'bas'-derived pointer
    values).
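
    As a rough illustration of points 2 and 3, here is a minimal
    sketch. The function scale_into and its parameters are made up for
    this example and are not from any library.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical example.
       'src' is 'const double *restrict': the compiler may assume that
       nothing reachable through 'src' is modified during the call.
       'dst' is 'double *restrict': the compiler may assume that every
       change to the objects reachable through 'dst' is made through
       'dst' or a pointer derived from it. */
    void scale_into(size_t n, double *restrict dst,
                    const double *restrict src, double factor)
    {
        assert(dst != NULL && src != NULL);
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * factor;
    }
    ```

    Calling scale_into with overlapping arrays would break the promise
    made by the restrict qualifiers, and the behavior would then be
    undefined.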

    Is this summary description helpful?
  • From Spiros Bousbouras@spibou@gmail.com to comp.std.c on Wed Aug 30 19:53:40 2023
    From Newsgroup: comp.std.c

    On Tue, 29 Aug 2023 04:35:40 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    Spiros Bousbouras <spibou@gmail.com> writes:

    On Sat, 26 Aug 2023 19:25:55 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Sometimes people use compiler options to turn off, for example,
    so-called "strict aliasing", and of course the C standard allows
    us to do that. But compilers aren't required to provide such an
    option, and if they do the option may not do exactly what we
    expect it to do, because there is no standard specification for
    it. The C standard should define officially sanctioned
    mechanisms -- as for example standard #pragma's -- to give
    standard-defined semantics to certain constructs of undefined
    behavior that resemble, eg, -fno-strict-aliasing.

    Surely the starting point for this should be the documentation of
    the compilers to specify precisely what -fno-strict-aliasing does.
    [...]

    Not at all. It's easy to write a specification that says what we
    want to do, along similar lines to what is said in the footnote
    about union member access in section 6.5.2.3

    If the member used to access the contents of a union object
    is not the same as the member last used to store a value in
    the object, the appropriate part of the object representation
    of the value is reinterpreted as an object representation in
    the new type as described in 6.2.6 (a process sometimes called
    "type punning"). This might be a trap representation.

    Works for me but it would be good to know that this is how compiler
    writers actually understand -fno-strict-aliasing. Is there any
    compiler documentation which says something like this?

    That behavior should be the default, for all accesses. For cases
    where a developer wants to give permission to the compiler to
    optimize based on cross-type non-interference assumptions, there
    should be a #pragma to do something similar to what effective type
    rules do now. The effective type rules are in need of re-writing
    anyway, and making type punning be the default doesn't break any
    programs, because compilers are already free to ignore the
    implications of violating effective type conditions.
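
    The reinterpretation described in the footnote quoted above can be
    sketched as follows. This is only an illustration: the function
    names are made up, and it assumes float is a 32-bit IEEE 754 type
    with the same size as uint32_t.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Store through one union member, read through another: the object
       representation of the float is reinterpreted as a uint32_t. */
    uint32_t float_bits_union(float f)
    {
        union { float f; uint32_t u; } pun;

        assert(sizeof(float) == sizeof(uint32_t)); /* assumption */
        pun.f = f;
        return pun.u;
    }

    /* The same reinterpretation done by copying the bytes, which does
       not depend on the union footnote at all. */
    uint32_t float_bits_memcpy(float f)
    {
        uint32_t u;

        assert(sizeof(float) == sizeof(uint32_t)); /* assumption */
        memcpy(&u, &f, sizeof u);
        return u;
    }
    ```

    What the proposal above would add is the same result for an access
    through a cast pointer, which today is only reliable with options
    such as -fno-strict-aliasing.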

    [...]

    For example it has been pointed out on comp.lang.c that it's
    impossible to write a malloc() implementation in conforming
    C. This is certainly a weakness which should be addressed with
    some appropriate #pragma .

    There isn't any reason to think malloc() should be writable in
    completely portable C. That's the point of putting malloc() in
    the system library in the first place. By the way, with type
    punning semantics mentioned above being the default, and with the
    alignment features added in C11, I think it is possible to write
    malloc() in portable C without needing any additional language
    changes. But even if it isn't that is no cause for concern; one
    of the principal reasons for having a system library is to
    provide functionality that the core language cannot express (or
    cannot express conveniently).

    One might want to experiment with different allocation algorithms
    and it seems to me that this sort of thing is within the "remit" of
    C. So ideally one should be able to write it in C and prove , starting
    from the standard or precise specifications in compiler documentation ,
    that it works correctly. I don't necessarily mean prove the correctness
    of the whole code but certain key parts.

    Another application I have in mind is languages which get translated
    to C and support garbage collection. Again one might want to use the
    standard malloc() to allocate a large block of memory and use different
    parts of this memory for different types of objects.
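
    A minimal sketch of that second use case, assuming a single block
    obtained from malloc() and carved up by hand. The struct arena type
    and the arena_init and arena_alloc functions are hypothetical names
    for this example. Since allocated storage has no declared type,
    storing an int in one part and a double in another is the case the
    current effective type rules do cover.

    ```c
    #include <assert.h>
    #include <stdalign.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical arena: one malloc()ed block handed out in pieces. */
    struct arena {
        unsigned char *base;
        size_t size;
        size_t used;
    };

    /* Returns nonzero on success. */
    int arena_init(struct arena *a, size_t size)
    {
        a->base = malloc(size);
        a->size = a->base ? size : 0;
        a->used = 0;
        return a->base != NULL;
    }

    /* 'align' must be a power of two no larger than
       alignof(max_align_t), which is what malloc() guarantees
       for 'base'. */
    void *arena_alloc(struct arena *a, size_t size, size_t align)
    {
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(max_align_t));

        /* Round the current offset up to the requested alignment. */
        size_t start = (a->used + align - 1) & ~(align - 1);
        if (start > a->size || size > a->size - start)
            return NULL;

        a->used = start + size;
        return a->base + start;
    }
    ```

    The caller releases everything at once with free(a.base). Whether
    the correctness of such code can be proved from the standard alone
    is the question being raised here.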

    If with the semantics you propose these things are possible, I'm happy.
    I'm not bothered which is the default as long as there is a precise
    specification from which you can reason that you get the desired
    behaviour.

    Before leaving the sub-topic of undefined behavior, let me mention
    two success stories. The first is 'restrict': the performance
    implications are local, the choice is under control of the program
    (and programmer), and the default choice is to play safe. Good
    show.

    From my point of view , restrict is not a success because the
    specification of restrict is the one part of the C1999 standard I
    have given up trying to understand. I understand the underlying
    idea but the specifics elude me. [...]

    I agree the formal definition of restrict is rather daunting. In
    practice though I think using restrict with confidence is not
    overly difficult. My working model for restrict is something
    like this:

    1. Use restrict only in the declarations of function
    parameters.

    2. For a declaration like const T *restrict foo ,
    the compiler may assume that any objects that can be
    accessed through 'foo' will not be modified.

    Wouldn't that also be the case with just const T * foo ?

    3. For a declaration like T *restrict bas ,
    the compiler may assume that any changes to objects
    that can be accessed through 'bas' will be done
    using 'bas' or a pointer value derived from 'bas'
    (and in particular that no changes will happen
    other than through 'bas' or 'bas'-derived pointer
    values).

    Is this summary description helpful?

    It seems clear enough but, as I've said, I don't have any use for
    restrict anyway and it's not worth it for me to expend the additional
    mental effort to confirm that my code obeys the additional restrictions
    of restrict. If I call a function with a preexisting interface which
    involves restrict then it seems easy enough to obey the restrictions.
    --
    Carrie also narrates the film, providing useful guidelines for those
    challenged by its intricacies. Sample: "Later that day, Big and I
    arrived home."
    http://www.rogerebert.com/reviews/sex-and-the-city-2-2010
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Wed Aug 30 17:40:52 2023
    From Newsgroup: comp.std.c

    Spiros Bousbouras <spibou@gmail.com> writes:

    On Tue, 29 Aug 2023 04:35:40 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Spiros Bousbouras <spibou@gmail.com> writes:

    On Sat, 26 Aug 2023 19:25:55 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Sometimes people use compiler options to turn off, for example,
    so-called "strict aliasing", and of course the C standard allows
    us to do that. But compilers aren't required to provide such an
    option, and if they do the option may not do exactly what we
    expect it to do, because there is no standard specification for
    it. The C standard should define officially sanctioned
    mechanisms -- as for example standard #pragma's -- to give
    standard-defined semantics to certain constructs of undefined
    behavior that resemble, eg, -fno-strict-aliasing.

    Surely the starting point for this should be the documentation of
    the compilers to specify precisely what -fno-strict-aliasing does.
    [...]

    Not at all. It's easy to write a specification that says what we
    want to do, along similar lines to what is said in the footnote
    about union member access in section 6.5.2.3

    If the member used to access the contents of a union object
    is not the same as the member last used to store a value in
    the object, the appropriate part of the object representation
    of the value is reinterpreted as an object representation in
    the new type as described in 6.2.6 (a process sometimes called
    "type punning"). This might be a trap representation.

    Works for me but it would be good to know that this is how compiler
    writers actually understand -fno-strict-aliasing . [...]

    No, it wouldn't. Implementations follow the C standard, not
    the other way around. Looking at what implementations do for
    the -fno-strict-aliasing flag is worse than a waste of time.

    For example it has been pointed out on comp.lang.c that it's
    impossible to write a malloc() implementation in conforming
    C. This is certainly a weakness which should be addressed with
    some appropriate #pragma .

    There isn't any reason to think malloc() should be writable in
    completely portable C. That's the point of putting malloc() in
    the system library in the first place. By the way, with type
    punning semantics mentioned above being the default, and with the
    alignment features added in C11, I think it is possible to write
    malloc() in portable C without needing any additional language
    changes. But even if it isn't that is no cause for concern; one
    of the principal reasons for having a system library is to
    provide functionality that the core language cannot express (or
    cannot express conveniently).

    One might want to experiment with different allocation algorithms
    and it seems to me that this sort of thing is within the "remit" of
    C. So ideally one should be able to write it in C [...]

    You're conflating writing something in C and writing something
    in completely portable C. It's already possible to do these
    things writing in C.

    From my point of view , restrict is not a success because the
    specification of restrict is the one part of the C1999 standard I
    have given up trying to understand. I understand the underlying
    idea but the specifics elude me. [...]

    I agree the formal definition of restrict is rather daunting. In
    practice though I think using restrict with confidence is not
    overly difficult. My working model for restrict is something
    like this:

    1. Use restrict only in the declarations of function
    parameters.

    2. For a declaration like const T *restrict foo ,
    the compiler may assume that any objects that can be
    accessed through 'foo' will not be modified.

    Wouldn't that also be the case with just const T * foo ?

    No.
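
    A small sketch of why the two differ; the function names are made
    up for this example. With a plain pointer to const, the object can
    still be modified through another pointer, so the compiler has to
    re-read it. With restrict added, the compiler may assume it does
    not change.

    ```c
    #include <assert.h>

    /* Plain pointer to const: 'q' may alias '*p', so the store through
       'q' can change '*p' and the second read of '*p' must really
       happen. */
    int sum_twice(const int *p, int *q)
    {
        assert(p != 0 && q != 0);
        int first = *p;
        *q = first + 1;
        return first + *p;
    }

    /* const plus restrict: the compiler may assume '*p' is not
       modified during the call and may reuse 'first' instead of
       re-reading. Calling this with p and q pointing at the same
       object is undefined behavior. */
    int sum_twice_r(const int *restrict p, int *q)
    {
        assert(p != 0 && q != 0);
        int first = *p;
        *q = first + 1;
        return first + *p;
    }
    ```

    With int x = 1, the call sum_twice(&x, &x) is well defined and
    returns 3, because the second read sees the updated value. The
    same call to sum_twice_r is not allowed.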

    3. For a declaration like T *restrict bas ,
    the compiler may assume that any changes to objects
    that can be accessed through 'bas' will be done
    using 'bas' or a pointer value derived from 'bas'
    (and in particular that no changes will happen
    other than through 'bas' or 'bas'-derived pointer
    values).

    Is this summary description helpful?

    It seems clear enough but , as I've said , I don't have any use
    for restrict anyway and it's not worth it for me to expend the
    additional mental effort to confirm that my code obeys the
    additional restrictions of restrict. [...]

    If you don't want to use restrict that is quite okay. Part of
    why I call restrict a success is that it can be ignored, with
    only minimal effort, by any developer who doesn't want to use it.
  • From Spiros Bousbouras@spibou@gmail.com to comp.std.c on Thu Aug 31 18:18:59 2023
    From Newsgroup: comp.std.c

    On Wed, 30 Aug 2023 17:40:52 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
    Spiros Bousbouras <spibou@gmail.com> writes:

    On Tue, 29 Aug 2023 04:35:40 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    Spiros Bousbouras <spibou@gmail.com> writes:

    [...]

    Not at all. It's easy to write a specification that says what we
    want to do, along similar lines to what is said in the footnote
    about union member access in section 6.5.2.3

    If the member used to access the contents of a union object
    is not the same as the member last used to store a value in
    the object, the appropriate part of the object representation
    of the value is reinterpreted as an object representation in
    the new type as described in 6.2.6 (a process sometimes called
    "type punning"). This might be a trap representation.

    Works for me but it would be good to know that this is how compiler
    writers actually understand -fno-strict-aliasing . [...]

    No, it wouldn't. Implementations follow the C standard, not
    the other way around. Looking at what implementations do for
    the -fno-strict-aliasing flag is worse than a waste of time.

    Actually the influence goes in both directions. In theory the standard
    is the ultimate authority; in practice it is whatever C compilers one
    has access to. For now the standard doesn't have something like
    -fno-strict-aliasing, so if one needs it then looking at what
    implementations do is the only option. But even the standard committee
    should look at it, and at whether C programmers find it useful, to
    decide what along such lines (if anything) should go into the standard.

    There isn't any reason to think malloc() should be writable in
    completely portable C. That's the point of putting malloc() in
    the system library in the first place. By the way, with type
    punning semantics mentioned above being the default, and with the
    alignment features added in C11, I think it is possible to write
    malloc() in portable C without needing any additional language
    changes. But even if it isn't that is no cause for concern; one
    of the principal reasons for having a system library is to
    provide functionality that the core language cannot express (or
    cannot express conveniently).

    One might want to experiment with different allocation algorithms
    and it seems to me that this sort of thing is within the "remit" of
    C. So ideally one should be able to write it in C [...]

    You're conflating writing something in C and writing something
    in completely portable C. It's already possible to do these
    things writing in C.

    I wrote

    One might want to experiment with different allocation algorithms and it
    seems to me that this sort of thing is within the "remit" of C. So
    ideally one should be able to write it in C and prove , starting from the
    standard or precise specifications in compiler documentation , that it
    works correctly. I don't necessarily mean prove the correctness of the
    whole code but certain key parts.

    This doesn't conflate anything. One can do the writing, but can one do
    the proving, or something close?
    --
    vlaho.ninja/prog
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Tue Sep 5 05:39:57 2023
    From Newsgroup: comp.std.c

    Spiros Bousbouras <spibou@gmail.com> writes:

    On Wed, 30 Aug 2023 17:40:52 -0700
    Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

    [...]

    You're conflating writing something in C and writing something
    in completely portable C. It's already possible to do these
    things writing in C.

    I wrote

    One might want to experiment with different allocation
    algorithms and it seems to me that this sort of thing is
    within the "remit" of C. So ideally one should be able to
    write it in C and prove , starting from the standard or
    precise specifications in compiler documentation , that it
    works correctly. I don't necessarily mean prove the
    correctness of the whole code but certain key parts.

    This doesn't conflate anything. One can do the writing but
    can one do the proving or something close?

    A substitute for malloc()/free() can be written in standard C.

    A substitute for malloc()/free() can not be written in completely
    portable standard C.

    I hope this clarifies my earlier comments.
  • From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.std.c on Tue Sep 5 17:03:46 2023
    From Newsgroup: comp.std.c

    Martin Uecker <ma.uecker@gmail.com> writes:

    [...]

    There are essentially two main interests driving this. First,
    there is some interest to precisely formulate the semantics for
    C. The provenance proposal came out of this.

    Second, there is the issue of safety problems caused by
    uninitialized reads, together with compiler support for zero
    initialization etc. So there are various people who want to
    change the semantics for uninitialized variables completely
    in the interest of safety.

    So far, there was no consensus in WG14 that the rules should
    be changed or what the new rules should be.

    I have a second reply here, which I hope will come closer to
    being relevant to the issues of interest.

    What I think is being looked for is a way to describe the
    language semantics in areas such as cross-type interference and
    what is meant when an uninitialized object is read. I thought
    about this question both while I was writing the longer earlier
    reply and then more deeply afterwards.

    What I think is most important is that these areas in particular
    are not about language semantics in the same way as, for example,
    array indexing. Rather they are about what transformations a
    compiler is allowed to do in the presence of various combinations
    of program constructs. That difference means the C standard
    should express the rules in a way that more directly reflects
    what's going on. More specifically, the standard should say or
    explain what can be done, not by describing language semantics
    (which is indirect), but explicitly in terms of what compiler
    transformations are allowed (which is direct). Note that there
    is precedent for this idea, in how the C standard talks about
    looping constructs and when they may be assumed to terminate.

    To give an example, take uninitialized objects, either automatic
    variables without an initializer, or memory allocated by malloc or
    added by realloc. The most natural semantics for such situations
    is to say that newly "created" memory gets an unspecified object
    representation at the start of its lifetime. (Yes I know that C
    in its current form lets automatic objects be "uninitialized"
    whenever their declaration points are reached, but let's ignore
    that for now.) Now suppose a program has a read access where it
    is easy to deduce that the object being read is still in the
    "unspecified object representation" initial state. To simplify
    the discussion, suppose the type of the access is a pointer type,
    and so is known to have trap representations (the name is changed
    in the C23 draft, but the idea is what's important).

    What is a compiler allowed to do in such circumstances? One thing
    it might reasonably be allowed to do is to cause the program to be
    terminated if it ever reaches such an access. Or there might be
    an option to initialize the pointer to NULL. Or, if a suitable
    compiler option were invoked, the construct might be flagged with
    a fatal error (or of course a warning). There are all sorts of
    actions a developer might want the compiler to take, and a
    compiler could offer many of those options, as choices selected
    under control of command line switches (or equivalent). I think a
    few points are worth making.

    One, there must be some sort of default action that all compilers
    have to support. The default action in this case might be to
    issue a non-fatal diagnostic.

    Two, there must be a way for the developer to tell the compiler to
    "proceed blindly" - saying, in effect, I accept that the compiled
    code might misbehave, but let me take that risk, and generate code
    like it's going to work. (In other words, for the read access, go
    ahead and load whatever unspecified object representation happens
    to be there.) A "proceed blindly" choice probably shouldn't be
    the default, but it must be available.

    Three, the consequence must never be "undefined behavior", unless
    there is an explicit stipulation to that effect. The stipulation
    might take the form of a #pragma, or a compiler option, or a code
    decoration using "attribute" (whatever the syntax for such things
    is).

    I know my comments here are somewhat sketchy, but hopefully a
    general sense of the ideas gets across. The suggestions should at
    least serve to stimulate further discussion.
  • From Jakob Bohm@jb-usenet@wisemo.com.invalid to comp.std.c on Thu Sep 7 17:09:56 2023
    From Newsgroup: comp.std.c

    On 2023-09-06 02:03, Tim Rentsch wrote:
    Martin Uecker <ma.uecker@gmail.com> writes:

    [...]

    There are essentially two main interests driving this. First,
    there is some interest to precisely formulate the semantics for
    C. The provenance proposal came out of this.

    Second, there is the issue of safety problems caused by
    uninitialized reads, together with compiler support for zero
    initialization etc. So there are various people who want to
    change the semantics for uninitialized variables completely
    in the interest of safety.

    So far, there was no consensus in WG14 that the rules should
    be changed or what the new rules should be.

    I have a second reply here, which I hope will come closer to
    being relevant to the issues of interest.

    What I think is being looked for is a way to describe the
    language semantics in areas such as cross-type interference and
    what is meant when an uninitialized object is read. I thought
    about this question both while I was writing the longer earlier
    reply and then more deeply afterwards.

    What I think is most important is that these areas in particular
    are not about language semantics in the same way as, for example,
    array indexing. Rather they are about what transformations a
    compiler is allowed to do in the presence of various combinations
    of program constructs. That difference means the C standard
    should express the rules in a way that more directly reflects
    what's going on. More specifically, the standard should say or
    explain what can be done, not by describing language semantics
    (which is indirect), but explicitly in terms of what compiler
    transformations are allowed (which is direct). Note that there
    is precedent for this idea, in how the C standard talks about
    looping constructs and when they may be assumed to terminate.

    To give an example, take uninitialized objects, either automatic
    variables without an initializer, or memory allocated by malloc or
    added by realloc. The most natural semantics for such situations
    is to say that newly "created" memory gets an unspecified object
    representation at the start of its lifetime. (Yes I know that C
    in its current form lets automatic objects be "uninitialized"
    whenever their declaration points are reached, but let's ignore
    that for now.) Now suppose a program has a read access where it
    is easy to deduce that the object being read is still in the
    "unspecified object representation" initial state. To simplify
    the discussion, suppose the type of the access is a pointer type,
    and so is known to have trap representations (the name is changed
    in the C23 draft, but the idea is what's important).

    What is a compiler allowed to do in such circumstances? One thing
    it might reasonably be allowed to do is to cause the program to be
    terminated if it ever reaches such an access. Or there might be
    an option to initialize the pointer to NULL. Or, if a suitable
    compiler option were invoked, the construct might be flagged with
    a fatal error (or of course a warning). There are all sorts of
    actions a developer might want the compiler to take, and a
    compiler could offer many of those options, as choices selected
    under control of command line switches (or equivalent). I think a
    few points are worth making.

    One, there must be some sort of default action that all compilers
    have to support. The default action in this case might be to
    issue a non-fatal diagnostic.

    Two, there must be a way for the developer to tell the compiler to
    "proceed blindly" - saying, in effect, I accept that the compiled
    code might misbehave, but let me take that risk, and generate code
    like it's going to work. (In other words, for the read access, go
    ahead and load whatever unspecified object representation happens
    to be there.) A "proceed blindly" choice probably shouldn't be
    the default, but it must be available.

    Three, the consequence must never be "undefined behavior", unless
    there is an explicit stipulation to that effect. The stipulation
    might take the form of a #pragma, or a compiler option, or a code
    decoration using "attribute" (whatever the syntax for such things
    is).


    Agreed so far!

    As a developer of programs in C with practical but not infinite
    portability, I very much abhor the mad optimizations that use
    language lawyering to state that any code path that might,
    hypothetically, exceed the boundaries of standard-enforced behavior
    is allowed to be arbitrarily mangled to get a faster bad result.

    For example, I have one function which intentionally reads an
    uninitialized variable to get a somewhat arbitrary value of a type
    with no known trap representation. I have a number of other
    programs which extensively process a block of data before deciding
    in some other way if the data is garbage or useful. This is done
    for sound technical reasons but requires that the compiler doesn't
    plant landmines all over virgin land.

    As another example, I have speed critical code that relies on running
    on 2s complement machines with wraparound on signed integer overflow,
    and that code is being very clear and explicit in doing so, but there
    is no C90 notation to tell all ISO-C implementations that this is the
    intention, thus it is explicit only in comments, not in the tokens
    passed to the C compiler.

    I know my comments here are somewhat sketchy, but hopefully a
    general sense of the ideas gets across. The suggestions should at
    least serve to stimulate further discussion.


    I am writing from a similar perspective.

    Enjoy

    Jakob
    --
    Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
    Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
    This public discussion message is non-binding and may contain errors.
    WiseMo - Remote Service Management for PCs, Phones and Embedded
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.std.c on Thu Sep 7 17:19:56 2023
    From Newsgroup: comp.std.c

    Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

    As another example, I have speed critical code that relies on running
    on 2s complement machines with wraparound on signed integer overflow,
    and that code is being very clear and explicit in doing so, but there
    is no C90 notation to tell all ISO-C implementations that this is the
    intention, thus it is explicit only in comments, not in the tokens
    passed to the C compiler.

    You can tell the compiler you want 2s complement by using the intN_t
    types if you can find one that suits your portability requirements.

    And can you not use unsigned arithmetic, re-interpreting as signed for
    those places where it matters? The "overflow" can only happen in
    the arithmetic, not in the re-interpretation.
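
    A minimal sketch of one way to read this suggestion; the function
    name wrap_add is made up. It assumes int and unsigned int have the
    same size with no padding bits and that int uses 2s complement. The
    addition is done in unsigned arithmetic, where wraparound is
    defined, and the bits are then copied back into an int.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <string.h>

    /* Wrapping signed addition without signed overflow.
       Assumptions (not guaranteed by C90): same size for int and
       unsigned int, no padding bits, 2s complement. */
    int wrap_add(int a, int b)
    {
        unsigned int usum;
        int result;

        assert(sizeof(int) == sizeof(unsigned int));

        /* int to unsigned conversion is defined (reduction modulo
           UINT_MAX + 1), and so is unsigned wraparound. */
        usum = (unsigned int)a + (unsigned int)b;

        /* Re-interpret the bits as an int; no arithmetic happens here,
           so there is nothing that can overflow. */
        memcpy(&result, &usum, sizeof result);
        return result;
    }
    ```

    On such a machine wrap_add(INT_MAX, 1) gives INT_MIN, and the
    function only uses C90 library features.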

    I know this is a deviation from the topic, so feel free to ignore if you
    don't want to get into it.
    --
    Ben.
  • From Jakob Bohm@jb-usenet@wisemo.com.invalid to comp.std.c on Fri Sep 8 23:12:00 2023
    From Newsgroup: comp.std.c

    On 2023-09-07 18:19, Ben Bacarisse wrote:
    Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

    As another example, I have speed critical code that relies on running
    on 2s complement machines with wraparound on signed integer overflow, and
    that code is being very clear and explicit in doing so, but there
    is no C90 notation to tell all ISO-C implementations that this is the
    intention, thus it is explicit only in comments, not in the tokens
    passed to the C compiler.

    You can tell the compiler you want 2s complement by using the intN_t
    types if you can find one that suits your portability requirements.

    And can you not use unsigned arithmetic, re-interpreting as signed for
    those places where it matters? The "overflow" can only happen in
    the arithmetic, not in the re-interpretation.

    I know this is a deviation from the topic, so feel free to ignore if you
    don't want to get into it.


    The code in question has as an explicit design condition that the
    compiler implements signed versions with wraparound for each unsigned
    int type.

    The code cannot rely on the intN_t types because they were not part of
    C90 and thus do not exist as separate types in some targeted compilers.

    In the world of C90 compilers, stdint.h was a non-standard system header
    that provided convenience names for the most closely matching C90 types
    on the platform, and some platforms simply didn't provide that header,
    instead documenting how each C90 type mapped to data sizes.

    Excessive casting where directly using the desired type seems possible
    is highly counter-intuitive and thus it is inherently wrong for an
    optimizer to presume the right to mangle code using types such as "int",
    "short int", "long int" and "signed char".

    Once again this comes down to a language drift from "undefined" meaning
    "not defined by this standard" to "an extremely toxic trap condition".


    Enjoy

    Jakob
    --
    Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
    Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
    This public discussion message is non-binding and may contain errors.
    WiseMo - Remote Service Management for PCs, Phones and Embedded
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.std.c on Fri Sep 8 22:31:04 2023
    From Newsgroup: comp.std.c

    Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

    On 2023-09-07 18:19, Ben Bacarisse wrote:
    Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

    As another example, I have speed critical code that relies on running
    on 2s complement machines with wraparound on signed integer overflow, and
    that code is being very clear and explicit in doing so, but there
    is no C90 notation to tell all ISO-C implementations that this is the
    intention, thus it is explicit only in comments, not in the tokens
    passed to the C compiler.

    You can tell the compiler you want 2s complement by using the intN_t
    types if you can find one that suits your portability requirements.

    And can you not use unsigned arithmetic, re-interpreting as signed for
    those places where it matters? The "overflow" can only happen in
    the arithmetic, not in the re-interpretation.

    I know this is a deviation from the topic, so feel free to ignore if you
    don't want to get into it.

    The code in question has as an explicit design condition that the
    compiler implements signed versions with wraparound for each unsigned
    int type.

    The code cannot rely on the intN_t types because they were not part of
    C90 and thus do not exist as separate types in some targeted
    compilers.

    Ah, I didn't know targeting C90 was still a thing. I've been out of
    the business for many years.

    Excessive casting where directly using the desired type seems possible
    is highly counter-intuitive and thus it is inherently wrong for an
    optimizer to presume the right to mangle code using types such as "int",
    "short int", "long int" and "signed char".

    I wasn't suggesting casts as they don't remove the undefined behaviour.
    But you have a design that suits your needs so it's all good.
    --
    Ben.