• Arrays and pointer arithmetic

    From Tim Rentsch <tr.17687@z991.linuxsc.com> to comp.std.c on Mon Feb 28 12:07:33 2022

    This posting is prompted by a discussion in comp.lang.c++ about
    arrays and pointer arithmetic. An excerpt from that discussion
    is given below, to provide context for anyone who did not see the
    recent comp.lang.c++ discussion.

    Consider three aspects of behavior in C:

    * reading a union member after storing into a different
    member

    * sequencing rules for expressions

    * implications for code reordering around a volatile access

    Here are some notes about these areas.

    For reading a union member after a different member has been
    written, C90 says the value is implementation defined. C99 does
    not say that, and explains what appears to be a different rule in
    a non-normative footnote. Yet meeting notes from the ISO C
    website indicate that the C99 description is meant to convey the
    same semantics as the C90 description (or vice versa).
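    A minimal sketch of the union case, assuming a C11 compiler and a
    platform where float and uint32_t have the same size (checked by a
    static assertion). The union and helper names are hypothetical.
    Under the C90 wording the value read back is implementation
    defined; under the C99 footnote the stored bytes are reinterpreted
    as the other member's type. The memcpy() version performs the same
    reinterpretation without a union, for comparison.

```c
#include <stdint.h>
#include <string.h>

/* Sketch assumption: float and uint32_t occupy the same number of bytes. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Hypothetical union: store into one member, read the other. */
union pun {
    float    f;
    uint32_t u;
};

/* Write the float member, then read the uint32_t member. */
uint32_t bits_via_union(float x)
{
    union pun p;
    p.f = x;
    return p.u;   /* C90: implementation defined; C99 footnote: reinterpreted bytes */
}

/* The same reinterpretation done by copying the object representation. */
uint32_t bits_via_memcpy(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

    On common implementations the two helpers agree; on an IEEE 754
    platform both return 0x3F800000 for 1.0f. Agreement in practice
    does not by itself say which wording the committee intends.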

    For sequencing within a single expression, there was a famous
    debate about whether (for C90 and C99) an assignment such as
    'a[a[0]] = 4;', where a[0] initially has the value 0, has defined
    behavior or undefined behavior. A straightforward reading of the
    C90/C99 text suggests it was undefined. In C11, the description
    of sequencing rules was revised, and under the C11 description
    the behavior is, pretty unambiguously, well defined. Yet there
    is no mention of the C11 sequencing rules constituting a change
    from the C90/C99 rules; apparently the C11 description was meant
    to be, at best, a clarification, but without any change to what
    the semantics are.
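    The assignment in question can be written out as a small sketch
    (the function name is hypothetical). The comment records the two
    readings; the expected result is the C11 one.

```c
/* Hypothetical helper reproducing 'a[a[0]] = 4;' with a[0] initially 0. */
void assign_through_self_index(int a[2])
{
    a[0] = 0;
    a[1] = 9;
    /* C90/C99 reading: a[0] is read to decide WHERE to store, not only
       to determine the value stored, and a[0] is also modified between
       the same sequence points, so one can argue this is undefined.
       C11 reading: the store is sequenced after the value computations
       of both operands (including the index a[0], which yields 0), so
       this simply stores 4 into a[0]. */
    a[a[0]] = 4;
}
```

    Under the C11 rules, after the call a[0] holds 4 and a[1] is
    unchanged at 9.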

    For code reordering around a volatile access, it's easy to draw
    the conclusion that the C standard allows no movement (i.e., for
    purposes of optimization) of any earlier or later reads or writes
    across the volatile access expression. Yet discussion with
    committee members definitely indicates that some such code
    movement is allowed, despite what the C standard text would
    plainly indicate.
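    To make the volatile question concrete, here is a minimal sketch
    (all names hypothetical) of code where it arises: ordinary accesses
    to payload placed before and after volatile accesses to ready. The
    sketch only shows where the question comes up; it does not settle
    how much movement an implementation may perform.

```c
/* Hypothetical objects: one ordinary, one volatile. */
int payload;
volatile int ready;

void publish(int value)
{
    payload = value;  /* ordinary store */
    ready = 1;        /* volatile store */
    /* Question from the text: may the store to payload be moved after
       the volatile store to ready? A literal reading suggests no;
       committee discussion suggests some such movement is allowed. */
}

int consume(void)
{
    while (!ready)
        ;             /* each test of ready is a volatile read */
    return payload;   /* ordinary load: may it be moved above the loop? */
}
```

    In a single thread, calling publish(42) and then consume() returns
    42 regardless of how the question is answered; the question only
    matters for what an outside observer of the volatile accesses can
    see.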

    Another example has to do with type rules for printf() arguments.
    If there is a printf() call such as

    printf( "%u", 7 );

    is the behavior defined or undefined? There are reasonable
    positions both pro and con. How are we to understand which view
    better represents the judgment of the committee members? (I take
    it as given that a judgment from the ISO C committee constitutes
    the ultimate authority as to what the C standard either requires
    or allows.) Incidentally, in the recent draft N2731, there is
    new wording that answers this question in favor of the behavior
    being defined, not undefined.
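    A small sketch of the disputed call next to an unambiguous one,
    using snprintf() so the output can be inspected. The helper names
    are hypothetical. In the first helper, %u is paired with an int
    argument whose value is representable in both int and unsigned
    int; the second sidesteps the question by passing an unsigned int.

```c
#include <stdio.h>

/* Disputed form: %u with an argument of type int (value 7). */
int format_disputed(char *buf, size_t n)
{
    return snprintf(buf, n, "%u", 7);
}

/* Unambiguous form: the argument already has type unsigned int. */
int format_unsigned(char *buf, size_t n)
{
    return snprintf(buf, n, "%u", 7u);
}
```

    On common implementations both write "7"; whether the standard
    guarantees that for the first form is exactly the question above.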

    How are we to make sense of these apparent incongruities? All of
    these cases can be understood using a single explanation: members
    of the ISO C committee have a mental model for how the language
    is supposed to behave in each case, and what is written in the
    ISO C standard is meant to reflect those models, but sometimes the
    writing falls short. When it does, the model prevails, because
    as far as the members' view is concerned, "the truth" is what the
    model says, not what the words say.

    The description of semantic rules for pointer arithmetic talks
    about situations where "the expression P points [...] to an
    element of an array object [...]", but it isn't always clear what
    "array object" is being referenced (in particular in the presence
    of allocated memory).
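    Allocated memory sharpens the question, because malloc() returns
    storage with no declared type and no declared array bounds. In the
    sketch below (function name and constants hypothetical), the same
    block is indexed first as one flat array of int and then as rows of
    20 ints; which "array object" each pointer operation is relative to
    is what the text leaves unclear.

```c
#include <stdlib.h>

enum { ROWS = 10, COLS = 20 };

/* Hypothetical helper: fill an allocated block through a flat int *,
   then read one element through a pointer to arrays of COLS ints.
   Returns -1 if allocation fails. */
int flat_then_rows(void)
{
    int *flat = malloc(ROWS * COLS * sizeof *flat);
    if (flat == NULL)
        return -1;

    for (size_t i = 0; i < ROWS * COLS; i++)
        flat[i] = (int)i;                     /* arithmetic over the whole block */

    int (*rows)[COLS] = (int (*)[COLS])flat;  /* same memory, viewed as rows */
    int result = rows[3][5];                  /* flat element 3*COLS + 5 */

    free(flat);
    return result;
}
```

    On common implementations this returns 65; the sketch shows the
    access pattern in question, not a ruling on which reading the
    standard requires.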

    My understanding of what C allows for pointer arithmetic is as
    follows. What matters is where the pointer value in question
    originally came from. If the original pointer value pointed to
    an element of an array (with suitable language to handle the case
    of pointing one past the last element of the array), further use
    of that pointer value (e.g., after a cast) may access all the
    memory occupied by the array containing the element that the
    original pointer pointed to. Thus, in the example below, &foo
    points to the only element of a single-element array that
    coincides with all of the memory occupied by foo, and so, after
    the cast to int *, the resulting pointer may access all of the
    int elements of the two-dimensional array.
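    The set_elements() call from the quoted excerpt can be completed
    into a compilable sketch. The body of set_elements() is an
    assumption (the excerpt gives only its declaration), as is the
    wrapper name reset_foo(). It walks all 200 ints through a single
    int * derived from &foo: the use this model says is allowed, and
    the one the narrower reading quoted below would limit to
    foo[0][0] through foo[0][19].

```c
#include <stddef.h>

int foo[10][20];

/* Hypothetical body: store value into n consecutive ints starting at p. */
void set_elements(int *p, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        p[i] = value;   /* pointer arithmetic across every row of foo */
}

/* The call from the excerpt, wrapped in a function so it can run. */
void reset_foo(void)
{
    set_elements((int *) &foo, 10*20, -1);
}
```

    After reset_foo(), every element from foo[0][0] through foo[9][19]
    holds -1 on common implementations.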

    Evidence for this mental model, and for committee members holding
    it, can be seen in various official ISO C writings on their
    website, when the "provenance" of pointer values is discussed.
    My understanding of what C allows here is based partly or perhaps
    mostly on those written discussions.

    When I say below "an argument could be made...", it doesn't mean
    that I feel unsure about my own understanding. What it does mean
    is that someone reading just the text in the C standard, and
    nothing else, might very well reach a different conclusion. My
    comment is meant to acknowledge that such positions may exist,
    even though I myself don't find them persuasive.

    I hope this explanation clarifies both what I meant and why I
    have reached the conclusions that I have.


    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    Ben Bacarisse <ben.usenet@bsb.me.uk> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    [edited for brevity]

    If we have this code fragment

    int foo[10][20];
    extern void set_elements( int *, size_t, int );

    set_elements( (int*) &foo, 10*20, -1 );

    an argument could be made that set_elements() cannot use pointer
    arithmetic (including that implied by use of []) on its first
    argument other than to access between foo[0][0] and foo[0][19] (or
    to construct a pointer to foo[0][20]). [...]

    [...] It's what direct additions and subtractions are permitted
    for any given pointer that I no longer feel sure about. Your "a
    case could be made" suggests you are not entirely sure either,
    though it does suggest you consider that case is a stretch.

    The implied question here has a somewhat longish answer. I'll
    get to it when I can. Also, as it seems we have drifted rather
    far from C++, comp.std.c is I think a better place to continue.

    (This concludes the quoted excerpt.)