• store to wide load forwarding

    From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Jan 31 11:33:30 2026
    From Newsgroup: comp.arch

    I have seen big slowdowns (factor 5.7 on a Rocket Lake with gcc-14.2)
    from gcc's auto-vectorization for the bubble-sort benchmark of John
    Hennessy's collection of small integer benchmarks. The reason is that
    auto-vectorization turns two 4-byte stores into one 8-byte store, and
    in the next iteration of bubble-sort two 4-byte loads are
    auto-vectorized into an 8-byte load, but this load only partially
    overlaps the store. This results in taking a slow path in
    store-to-load forwarding. By contrast, without auto-vectorization the
    stores and the loads are 4-byte wide, store-to-load forwarding sees a
    full overlap, and a fast path is taken.
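
    A minimal C sketch of the access pattern described above. It is not
    taken from the benchmark source: the function names are made up for
    illustration, int is assumed to be 4 bytes, and the memcpy-based
    variant only imitates the shape of the vectorized code (one 8-byte
    load and one 8-byte store per step). Whether a compiler emits exactly
    those accesses depends on its version and flags.

```c
#include <stddef.h>
#include <string.h>

/* Scalar pass: every load and store is one int wide, so the load of
   a[i + 1] in the next iteration fully overlaps the earlier store. */
static void bubble_pass_scalar(int *a, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if (a[i] > a[i + 1]) {
            int t = a[i];
            a[i] = a[i + 1];
            a[i + 1] = t;
        }
    }
}

/* Hand-written imitation of the vectorized shape (an assumption, not
   gcc's actual output): each step reads the pair a[i], a[i + 1] as one
   block and writes it back as one block. */
static void bubble_pass_paired(int *a, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        int pair[2];
        memcpy(pair, &a[i], sizeof pair);   /* one wide load */
        if (pair[0] > pair[1]) {
            int t = pair[0];
            pair[0] = pair[1];
            pair[1] = t;
        }
        memcpy(&a[i], pair, sizeof pair);   /* one wide store */
    }
}

/* Full sort; `paired` selects which inner pass is used. */
void bubble_sort(int *a, size_t n, int paired)
{
    for (size_t top = n; top > 1; top--) {
        if (paired)
            bubble_pass_paired(a, top);
        else
            bubble_pass_scalar(a, top);
    }
}
```

    In the paired variant, the wide store at a[i] is followed one
    iteration later by a wide load at a[i + 1], which overlaps only the
    upper half of that store.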

    I found that gcc-14.2 is significantly less aggressive in vectorizing
    than gcc-12.2, but still incurs the above-mentioned slowdown. But I
    only checked that later. First I wondered whether gcc-14.2 would
    still see a slowdown from auto-vectorization, and in which
    store-to-load forwarding cases it would happen. You can find the
    results at <https://www.complang.tuwien.ac.at/anton/stwlf/>.

    For those who want the gist:

    * Narrow (8-byte) completely overlapping store-to-load forwarding (all
    those cases we see in the -O code) is fast on Zen 3 and Zen 4 in all
    measured cases, and on the other microarchitectures in most measured
    cases.

    * Wide (16-byte) completely overlapping store-to-load forwarding (-O3
    code for the wl>ws=>wl case) is significantly slower on those
    machines where the non-vectorized counterpart is fast (Zen4, Zen3,
    Gracemont), but on a number of uarchs the non-vectorized counterpart
    has the same slowdown.

    * Narrow-to-wide or partially overlapping wide-to-wide store-to-load
    forwarding is very slow and tends to become slower (in cycles) with
    newer generations. It is already slow if the dependency chain ends
    soon after the wide load, and cases involving recurrences tend to be
    even slower.

    * Wide-store-to-narrow-load forwarding is cheap.

    So it seems that unless the compiler has very good knowledge that the
    wide load was not preceded by a recent store to one of the involved
    addresses, it is better not to vectorize two narrow loads into a wide
    load.
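
    The forwarding cases listed above can be imitated with small
    dependency-chain kernels. This is a sketch under assumptions, not the
    code behind the linked results: the names are hypothetical, and for
    portability it uses 4-byte narrow and 8-byte wide accesses instead of
    the 8-byte and 16-byte widths in the list. The volatile qualifier is
    there so that the compiler actually emits each store and load.

```c
#include <stdint.h>

/* One memory cell that can be accessed as two narrow halves or as one
   wide value (hypothetical type, for illustration only). */
typedef union {
    uint32_t narrow[2];
    uint64_t wide;
} cell_t;

/* Two narrow stores feed one wide load; the loaded value feeds the
   next iteration's stores, forming a recurrence through memory. */
uint32_t chain_narrow_to_wide(uint32_t iterations)
{
    volatile cell_t cell;
    uint32_t v = 0;
    for (uint32_t i = 0; i < iterations; i++) {
        cell.narrow[0] = v;
        cell.narrow[1] = v;
        uint64_t w = cell.wide;
        v = (uint32_t)w + 1;
    }
    return v;
}

/* One wide store feeds one wide load of the same address and size. */
uint32_t chain_wide_to_wide(uint32_t iterations)
{
    volatile cell_t cell;
    uint32_t v = 0;
    for (uint32_t i = 0; i < iterations; i++) {
        cell.wide = ((uint64_t)v << 32) | v;
        uint64_t w = cell.wide;
        v = (uint32_t)w + 1;
    }
    return v;
}

/* One wide store feeds one narrow load contained in it. */
uint32_t chain_wide_to_narrow(uint32_t iterations)
{
    volatile cell_t cell;
    uint32_t v = 0;
    for (uint32_t i = 0; i < iterations; i++) {
        cell.wide = ((uint64_t)v << 32) | v;
        v = cell.narrow[0] + 1;
    }
    return v;
}
```

    All three functions return the iteration count, so they differ only
    in how the value travels through memory. Timing each one over many
    iterations would show whether a given machine takes a slow path for
    the narrow-to-wide chain; no particular numbers are implied here.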

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
    --- Synchronet 3.21b-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Jan 31 12:27:40 2026
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    I have seen big slowdowns (factor 5.7 on a Rocket Lake with gcc-14.2)
    from gcc's auto-vectorization for the bubble-sort benchmark of John
    Hennessy's collection of small integer benchmarks.

    What is the PR number?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Jan 31 18:42:49 2026
    From Newsgroup: comp.arch

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    I have seen big slowdowns (factor 5.7 on a Rocket Lake with gcc-14.2)
    from gcc's auto-vectorization for the bubble-sort benchmark of John
    Hennessy's collection of small integer benchmarks.

    What is the PR number?

    By now you should know that I consider gcc bug reports a waste of
    time. I last told you that in
    <2025Jul15.080403@mips.complang.tuwien.ac.at> and gave PR93811 as an
    example where I have wasted my time with creating a PR, and the status
    of this PR has not changed in the meantime.

    You seem to think that it is worthwhile creating gcc bug reports, so
    go ahead and create one yourself. I think the web page contains all
    information necessary, but if you miss something, let me know.

    - anton
    --
    'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
    Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sat Jan 31 21:11:24 2026
    From Newsgroup: comp.arch

    On Sat, 31 Jan 2026 18:42:49 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    I have seen big slowdowns (factor 5.7 on a Rocket Lake with
    gcc-14.2) from gcc's auto-vectorization for the bubble-sort
    benchmark of John Hennessy's collection of small integer
    benchmarks.

    What is the PR number?

    By now you should know that I consider gcc bug reports a waste of
    time. I last told you that in
    <2025Jul15.080403@mips.complang.tuwien.ac.at> and gave PR93811 as an
    example where I have wasted my time with creating a PR, and the status
    of this PR has not changed in the meantime.

    You seem to think that it is worthwhile creating gcc bug reports, so
    go ahead and create one yourself. I think the web page contains all
    information necessary, but if you miss something, let me know.

    - anton

    I have had an above-zero success rate with gcc bug reports related to
    compiler pessimization. Maybe 10%. Maybe even 15%. I didn't really
    count.


  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Jan 31 21:21:42 2026
    From Newsgroup: comp.arch

    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Thomas Koenig <tkoenig@netcologne.de> writes:
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    I have seen big slowdowns (factor 5.7 on a Rocket Lake with gcc-14.2)
    from gcc's auto-vectorization for the bubble-sort benchmark of John
    Hennessy's collection of small integer benchmarks.

    What is the PR number?

    By now you should know that I consider gcc bug reports a waste of
    time.

    Posting to this newsgroup certainly is, at least as far as actually
    accomplishing anything is concerned. Otherwise, you would have
    at least a chance of having this fixed, especially if it is
    a regression.

    But let me qualify the above statement: Make a self-contained, small
    test case, and I'll submit a PR for you.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 00:33:15 2026
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> wrote:

    I have had an above-zero success rate with gcc bug reports related to
    compiler pessimization. Maybe 10%. Maybe even 15%. I didn't really
    count.

    The case in point appears to be a regression, which are supposed to be
    fixed, and receive much higher attention than "normal" bugs.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 11:06:29 2026
    From Newsgroup: comp.arch

    On Sun, 1 Feb 2026 00:33:15 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> wrote:

    I have had an above-zero success rate with gcc bug reports related to
    compiler pessimization. Maybe 10%. Maybe even 15%. I didn't
    really count.

    The case in point appears to be a regression, which are supposed to be
    fixed, and receive much higher attention than "normal" bugs.


    In my experience, it could receive more attention than the
    average pessimization case, but there is close to zero chance
    that it would be fixed in the end.
    The typical scenario for such cases is that they "fall between chairs"
    of tree optimization and target code generation and neither party is
    taking responsibility.

  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 09:16:27 2026
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> wrote:

    I have had an above-zero success rate with gcc bug reports related to
    compiler pessimization. Maybe 10%. Maybe even 15%. I didn't really
    count.

    Maybe some figures to put this into perspective.

    In 2025, 593 missed-optimization bugs were closed, most of them
    marked as fixed, 528 new ones were submitted. As of today, there
    are 3672 missed-optimization bugs open, so we are looking at around
    a 6-year average turnover.

    97 missed-optimization regressions were submitted in 2025 and
    174 were closed, with 318 missed-optimization regressions open
    right now, so it is more of a two-year average turnover (and
    there seems to be progress in reducing those).

    I chose 2025 because it is easy to search for; it does not
    correspond to a gcc release cycle.
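
    The turnover figures above follow from dividing the open backlog by
    the number of reports closed per year. A small sketch of that
    arithmetic, using only the counts quoted in this post (the function
    names are made up here, and a constant closing rate is assumed):

```c
/* Years needed to work through the open backlog if the closing rate
   stays constant and new submissions are ignored. */
double turnover_years(unsigned open_reports, unsigned closed_per_year)
{
    return (double)open_reports / (double)closed_per_year;
}

/* Net change of the backlog over one year; negative means it shrank. */
int net_backlog_change(unsigned submitted, unsigned closed)
{
    return (int)submitted - (int)closed;
}
```

    With the numbers above, turnover_years(3672, 593) is about 6.2 and
    turnover_years(318, 174) is about 1.8, while net_backlog_change gives
    -65 for all missed-optimization reports (528 submitted, 593 closed)
    and -77 for the regressions (97 submitted, 174 closed).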
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 09:17:11 2026
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> wrote:
    On Sun, 1 Feb 2026 00:33:15 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> wrote:

    I have had an above-zero success rate with gcc bug reports related to
    compiler pessimization. Maybe 10%. Maybe even 15%. I didn't
    really count.

    The case in point appears to be a regression, which are supposed to be
    fixed, and receive much higher attention than "normal" bugs.


    In my experience, it could receive more attention than the
    average pessimization case, but there is close to zero chance
    that it would be fixed in the end.
    The typical scenario for such cases is that they "fall between chairs"
    of tree optimization and target code generation and neither party is
    taking responsibility.

    Do you have an example?
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From David Brown@david.brown@hesbynett.no to comp.arch on Sun Feb 1 11:50:29 2026
    From Newsgroup: comp.arch

    On 01/02/2026 01:33, Thomas Koenig wrote:
    Michael S <already5chosen@yahoo.com> wrote:

    I have had an above-zero success rate with gcc bug reports related to
    compiler pessimization. Maybe 10%. Maybe even 15%. I didn't really
    count.

    As have I.

    There is also the fact that not all changes or improvements that might
    have been inspired by bug reports lead to comments or closure on the bug
    report itself. Like all large development projects, there's a mismatch
    between the people interested in doing the technical stuff and improving
    the program, and the interest in paperwork and bureaucracy. That is
    particularly true for things that involve larger, more structural or
    algorithmic changes, rather than individual small patches.


    The case in point appears to be a regression, which are supposed to be
    fixed, and receive much higher attention than "normal" bugs.


    Yes, that's the idea. You still won't get a 100% hit rate, but it will be
    higher than for general "missed optimisation" reports.

    And of course with any major changes to code generation, there are
    likely to be some regressions - if it's 3 steps forward and 1 step back,
    it can be a positive improvement in general even though there are
    regressions. If that's the case, then the answer is likely to be a
    compiler flag or tuneable for helping the particular code.

  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 13:18:17 2026
    From Newsgroup: comp.arch

    On Sun, 1 Feb 2026 09:16:27 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> wrote:

    I have had an above-zero success rate with gcc bug reports related to
    compiler pessimization. Maybe 10%. Maybe even 15%. I didn't
    really count.

    Maybe some figures to put this into perspective.

    In 2025, 593 missed-optimization bugs were closed, most of them
    marked as fixed, 528 new ones were submitted. As of today, there
    are 3672 missed-optimization bugs open, so we are looking at around
    a 6-year average turnover.

    97 missed-optimization regressions were submitted in 2025 and
    174 were closed, with 318 missed-optimization regressions open
    right now, so it is more of a two-year average turnover (and
    there seems to be progress in reducing those).

    I chose 2025 because it is easy to search for; it does not
    correspond to a gcc release cycle.


    I very rarely submit "missed optimization" bugs.
    As far as I am concerned, missed optimization is not a bug, it is
    business as usual, with the hope of improvement in the future.
    My cases are nearly always "compiler tries to be too smart with
    horrible consequences" rather than "compiler is too stupid".

  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 11:31:56 2026
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> wrote:

    I very rarely submit "missed optimization" bugs.
    As far as I am concerned, missed optimization is not a bug, it is
    business as usual, with the hope of improvement in the future.

    It still makes sense to submit a PR (which stands for Problem
    Report) so people can look at it when they want to improve
    code generation. I have submitted quite a few of these - of the
    733 PRs I have submitted so far, 120 were missed-optimization.

    My cases are nearly always "compiler tries to be too smart with
    horrible consequences" rather than "compiler is too stupid".

    Keyword wrong-code then, or suboptimal code generation? The latter
    is also classified as missed-optimization.
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 13:35:16 2026
    From Newsgroup: comp.arch

    On Sun, 1 Feb 2026 09:17:11 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> wrote:
    On Sun, 1 Feb 2026 00:33:15 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> wrote:

    I have had an above-zero success rate with gcc bug reports related to
    compiler pessimization. Maybe 10%. Maybe even 15%. I didn't
    really count.

    The case in point appears to be a regression, which are supposed
    to be fixed, and receive much higher attention than "normal" bugs.


    In my experience, it could receive more attention than the
    average pessimization case, but there is close to zero chance
    that it would be fixed in the end.
    The typical scenario for such cases is that they "fall between
    chairs" of tree optimization and target code generation and neither
    party is taking responsibility.

    Do you have an example?

    Here is a good example
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
    By chance, it is remotely related to Anton's case.


  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 13:54:11 2026
    From Newsgroup: comp.arch

    On Sun, 1 Feb 2026 13:18:17 +0200
    Michael S <already5chosen@yahoo.com> wrote:

    On Sun, 1 Feb 2026 09:16:27 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> wrote:

    I have had an above-zero success rate with gcc bug reports related to
    compiler pessimization. Maybe 10%. Maybe even 15%. I didn't
    really count.

    Maybe some figures to put this into perspective.

    In 2025, 593 missed-optimization bugs were closed, most of them
    marked as fixed, 528 new ones were submitted. As of today, there
    are 3672 missed-optimization bugs open, so we are looking at around
    a 6-year average turnover.

    97 missed-optimization regressions were submitted in 2025 and
    174 were closed, with 318 missed-optimization regressions open
    right now, so it is more of a two-year average turnover (and
    there seems to be progress in reducing those).

    I chose 2025 because it is easy to search for; it does not
    correspond to a gcc release cycle.


    I very rarely submit "missed optimization" bugs.
    As far as I am concerned, missed optimization is not a bug, it is
    business as usual, with the hope of improvement in the future.
    My cases are nearly always "compiler tries to be too smart with
    horrible consequences" rather than "compiler is too stupid".


    I just re-checked one of the very few cases where what I reported could be
    properly called "missed optimization".
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86975

    It was fixed in gcc14, 5 or 6 years later.

    https://godbolt.org/z/T1o34ean8

    "The mills of GCC maintenance grind slow, but they grind exceedingly
    fine"

    Unfortunately, only on MIPS, for which I care very little.
    It was not fixed on Nios2, the architecture that I really care about,
    because Nios2 is no longer supported by gcc.



  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 11:58:58 2026
    From Newsgroup: comp.arch

    Michael S <already5chosen@yahoo.com> wrote:

    Here is a good example
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
    By chance, it is remotely related to Anton's case.

    That's a good one, and would apparently take quite some work to fix.
    I've pinged it, BTW.

    By the way, regarding your quota of fixed bugs: I see 15 bugs
    submitted, 3 as WONTFIX, 5 as FIXED. If you take out the WONTFIX
    (for an architecture which is no longer supported due to lack of
    a maintainer), you have a 42% success quota so far, at least for
    the e-mail address in the PR, not 10-15% as you estimated.
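
    The 42% figure above comes from counting fixed reports against the
    submitted ones minus the WONTFIX ones. A sketch of that calculation
    with the counts from this post (the function name is hypothetical):

```c
/* Share of fixed reports, in percent, after excluding WONTFIX reports
   from the total. Returns 0 if nothing remains to be counted. */
double success_quota_percent(unsigned submitted, unsigned wontfix,
                             unsigned fixed)
{
    if (wontfix >= submitted)
        return 0.0;
    return 100.0 * (double)fixed / (double)(submitted - wontfix);
}
```

    success_quota_percent(15, 3, 5) is 5 out of 12, which rounds to 42%.
    Without excluding the WONTFIX reports it would be 5 out of 15, about
    33%.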
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
  • From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 14:11:18 2026
    From Newsgroup: comp.arch

    On Sun, 1 Feb 2026 11:58:58 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:

    Michael S <already5chosen@yahoo.com> wrote:

    Here is a good example
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
    By chance, it is remotely related to Anton's case.

    That's a good one, and would apparently take quite some work to fix.
    I've pinged it, BTW.

    By the way, regarding your quota of fixed bugs: I see 15 bugs
    submitted, 3 as WONTFIX, 5 as FIXED. If you take out the WONTFIX
    (for an architecture which is no longer supported due to lack of
    a maintainer), you have a 42% success quota so far, at least for
    the e-mail address in the PR, not 10-15% as you estimated.


    Here is one case that I did not submit (not just out of laziness, but
    also) because I was not sure whether it is a bug or a feature.
    https://www.realworldtech.com/forum/?threadid=226267&curpostid=226267
    Although the observation made by Freddie causes me to believe that the
    change of behavior was not intentional.

