Forum: War Ensemble BBS

store to wide load forwarding

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Jan 31 11:33:30 2026

From Newsgroup: comp.arch

I have seen big slowdowns (factor 5.7 on a Rocket Lake with gcc-14.2)
from gcc's auto-vectorization for the bubble-sort benchmark of John
Hennessy's collection of small integer benchmarks. The reason is that auto-vectorization turns two 4-byte stores into one 8-byte store, and
in the next iteration of bubble-sort two 4-byte loads are
auto-vectorized into an 8-byte load, but this load only partially
overlaps the store. This results in taking a slow path in
store-to-load forwarding. By contrast, without auto-vectorization the
stores and the loads are 4-byte wide, store-to-load forwarding sees a
full overlap, and a fast path is taken.
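A minimal C sketch of the access pattern described above. It assumes a plain adjacent-swap bubble sort; the function name `bubble_sort` and this exact loop shape are illustrative and not taken from Hennessy's benchmark source. Whether gcc actually fuses the accesses depends on the compiler version and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bubble sort with an adjacent compare-and-swap.
   Assumption: a vectorizer may fuse the two 4-byte stores below into one
   8-byte store, and the two 4-byte loads into one 8-byte load. */
void bubble_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t top = n; top > 1; top--) {
        for (size_t i = 0; i + 1 < top; i++) {
            int x = a[i];       /* 4-byte load of a[i]   */
            int y = a[i + 1];   /* 4-byte load of a[i+1] */
            if (x > y) {
                a[i] = y;       /* 4-byte store to a[i]   */
                a[i + 1] = x;   /* 4-byte store to a[i+1] */
            }
            /* The next iteration loads a[i+1] and a[i+2].  If both the
               stores above and those loads are fused into 8-byte accesses,
               the load overlaps only the upper half of the store. */
        }
    }
}
```

With 4-byte `int`s and fused accesses, iteration `i` stores bytes 0..7 relative to `&a[i]`, while iteration `i+1` loads bytes 4..11. Only half of that load can be supplied by the pending store, which is the partial-overlap case.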

I found that gcc-14.2 is significantly less aggressive in vectorizing
than gcc-12.2, but still incurs the above-mentioned slowdown. But I
only checked that later. First I wondered whether gcc-14.2 would
still see a slowdown from auto-vectorization, and in which
store-to-load forwarding cases it would happen. You can find the
results at <https://www.complang.tuwien.ac.at/anton/stwlf/>.

For those who want the gist:

* Narrow (8-byte) completely overlapping store-to-load forwarding (all
those cases we see in the -O code) is fast on Zen 3 and Zen 4 in all
measured cases, and on the other microarchitectures in most measured
cases.

* Wide (16-byte) completely overlapping store-to-load forwarding (-O3
code for the wl>ws=>wl case) is significantly slower on those
machines where the non-vectorized counterpart is fast (Zen 4, Zen 3,
Gracemont), but on a number of uarchs the non-vectorized counterpart
has the same slowdown.

* Narrow-to-wide or partially overlapping wide-to-wide store-to-load
forwarding is very slow and tends to become slower (in cycles) with
newer generations. It is already slow if the dependency chain ends
soon after the wide load, and cases involving recurrences tend to be
even slower.

* Wide-store-to-narrow-load forwarding is cheap.

So it seems that unless the compiler has very good knowledge that the
wide load was not preceded by a recent store to one of the involved
addresses, it is better not to vectorize two narrow loads into a wide
load.
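The cases in the gist can be summarized with a toy byte-range model. This is a minimal C sketch that assumes forwarding depends only on how a load's byte range relates to one earlier store's byte range; real microarchitectures also care about alignment, access size and more. The names `classify`, `fwd_case` and `wide_load_ok` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: how does a load's byte range relate to one earlier store?
   Real forwarding rules also depend on alignment, size and uarch. */
enum fwd_case {
    FWD_NONE,      /* load does not touch any byte of the store            */
    FWD_CONTAINED, /* load lies entirely within the store (fast-path candidate) */
    FWD_PARTIAL    /* load needs bytes from the store and from elsewhere   */
};

enum fwd_case classify(uint64_t st_addr, unsigned st_size,
                       uint64_t ld_addr, unsigned ld_size)
{
    assert(st_size > 0 && ld_size > 0);
    uint64_t st_end = st_addr + st_size;   /* exclusive end of the store */
    uint64_t ld_end = ld_addr + ld_size;   /* exclusive end of the load  */
    if (ld_end <= st_addr || st_end <= ld_addr)
        return FWD_NONE;
    if (st_addr <= ld_addr && ld_end <= st_end)
        return FWD_CONTAINED;
    return FWD_PARTIAL;
}

/* Conservative rule in the spirit of the conclusion: only merge narrow
   loads into one wide load if no recent store touches any of its bytes. */
int wide_load_ok(const uint64_t *st_addr, const unsigned *st_size,
                 int nstores, uint64_t ld_addr, unsigned ld_size)
{
    for (int k = 0; k < nstores; k++)
        if (classify(st_addr[k], st_size[k], ld_addr, ld_size) != FWD_NONE)
            return 0;
    return 1;
}
```

In this model the bubble-sort case (8-byte store at offset 0, then 8-byte load at offset 4) is `FWD_PARTIAL`, as are two narrow stores followed by one wide load, while a wide store followed by a narrow load inside it is `FWD_CONTAINED`. Note that "contained" is only a fast-path candidate: per the measurements, wide completely overlapping forwarding is still slower than its narrow counterpart on some machines.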

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
--- Synchronet 3.21b-Linux NewsLink 1.2

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Jan 31 12:27:40 2026

Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

> I have seen big slowdowns (factor 5.7 on a Rocket Lake with gcc-14.2)
> from gcc's auto-vectorization for the bubble-sort benchmark of John
> Hennessy's collection of small integer benchmarks.

What is the PR number?
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.arch on Sat Jan 31 18:42:49 2026

Thomas Koenig <tkoenig@netcologne.de> writes:

> Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

>> I have seen big slowdowns (factor 5.7 on a Rocket Lake with gcc-14.2)
>> from gcc's auto-vectorization for the bubble-sort benchmark of John
>> Hennessy's collection of small integer benchmarks.

> What is the PR number?

By now you should know that I consider gcc bug reports a waste of
time. I last told you that in
<2025Jul15.080403@mips.complang.tuwien.ac.at> and gave PR93811 as an
example where I have wasted my time with creating a PR, and the status
of this PR has not changed in the meantime.

You seem to think that it is worthwhile creating gcc bug reports, so
go ahead and create one yourself. I think the web page contains all
the necessary information, but if anything is missing, let me know.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

From Michael S@already5chosen@yahoo.com to comp.arch on Sat Jan 31 21:11:24 2026

On Sat, 31 Jan 2026 18:42:49 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> Thomas Koenig <tkoenig@netcologne.de> writes:

>> Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

>>> I have seen big slowdowns (factor 5.7 on a Rocket Lake with
>>> gcc-14.2) from gcc's auto-vectorization for the bubble-sort
>>> benchmark of John Hennessy's collection of small integer
>>> benchmarks.

>> What is the PR number?

> By now you should know that I consider gcc bug reports a waste of
> time. I last told you that in
> <2025Jul15.080403@mips.complang.tuwien.ac.at> and gave PR93811 as an
> example where I have wasted my time with creating a PR, and the status
> of this PR has not changed in the meantime.

> You seem to think that it is worthwhile creating gcc bug reports, so
> go ahead and create one yourself. I think the web page contains all
> the necessary information, but if anything is missing, let me know.

> - anton

I have had an above-zero success rate with gcc bug reports related to
compiler pessimization. Maybe 10%, maybe even 15%. I didn't really
count.

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sat Jan 31 21:21:42 2026

Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

> Thomas Koenig <tkoenig@netcologne.de> writes:

>> Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:

>>> I have seen big slowdowns (factor 5.7 on a Rocket Lake with gcc-14.2)
>>> from gcc's auto-vectorization for the bubble-sort benchmark of John
>>> Hennessy's collection of small integer benchmarks.

>> What is the PR number?

> By now you should know that I consider gcc bug reports a waste of
> time.

Posting to this newsgroup certainly is, at least as far as actually
accomplishing anything is concerned. With a PR, you would at least
have a chance of having this fixed, especially if it is a regression.

But let me qualify the above statement: Make a self-contained, small
test case, and I'll submit a PR for you.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 00:33:15 2026

Michael S <already5chosen@yahoo.com> schrieb:

> I have had an above-zero success rate with gcc bug reports related to
> compiler pessimization. Maybe 10%, maybe even 15%. I didn't really
> count.

The case in point appears to be a regression; regressions are supposed
to be fixed and receive much higher attention than "normal" bugs.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 11:06:29 2026

On Sun, 1 Feb 2026 00:33:15 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:

> Michael S <already5chosen@yahoo.com> schrieb:

>> I have had an above-zero success rate with gcc bug reports related to
>> compiler pessimization. Maybe 10%, maybe even 15%. I didn't really
>> count.

> The case in point appears to be a regression; regressions are supposed
> to be fixed and receive much higher attention than "normal" bugs.

According to my experience, it could receive more attention than the
average pessimization case, but there is close to zero chance that it
would be fixed in the end.
The typical scenario for such cases is that they "fall between chairs"
of tree optimization and target code generation and neither party is
taking responsibility.

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 09:16:27 2026

Michael S <already5chosen@yahoo.com> schrieb:

> I have had an above-zero success rate with gcc bug reports related to
> compiler pessimization. Maybe 10%, maybe even 15%. I didn't really
> count.

Maybe some figures to put this into perspective.

In 2025, 593 missed-optimization bugs were closed, most of them
marked as fixed, and 528 new ones were submitted. As of today, there
are 3672 missed-optimization bugs open, so we are looking at around
a 6-year average turnover.

97 missed-optimization regressions were submitted in 2025 and 174
were closed, with 318 missed-optimization regressions open right
now, so it is more of a two-year average turnover (and there seems
to be progress in reducing those).

I chose 2025 because it is easy to search for; it does not
correspond to a gcc release cycle.
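The turnover figures above are simple ratios. A small C sketch of the arithmetic, treating turnover as the open backlog divided by the yearly number of closures (a Little's-law-style estimate that assumes the closure rate stays constant); the function names are hypothetical.

```c
#include <assert.h>

/* Average turnover in years: open backlog / reports closed per year.
   Assumes the yearly closure rate stays roughly constant. */
double turnover_years(double open_now, double closed_per_year)
{
    assert(closed_per_year > 0);
    return open_now / closed_per_year;
}

/* Net change of the backlog over one year; negative means it shrank. */
int backlog_change(int submitted, int closed)
{
    return submitted - closed;
}
```

With the 2025 figures, `turnover_years(3672, 593)` is about 6.2 and `turnover_years(318, 174)` about 1.8, matching the 6-year and two-year estimates. `backlog_change(528, 593)` is -65 and `backlog_change(97, 174)` is -77, consistent with the regression backlog being reduced.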
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 09:17:11 2026

Michael S <already5chosen@yahoo.com> schrieb:

> On Sun, 1 Feb 2026 00:33:15 -0000 (UTC)
> Thomas Koenig <tkoenig@netcologne.de> wrote:

>> Michael S <already5chosen@yahoo.com> schrieb:

>>> I have had an above-zero success rate with gcc bug reports related to
>>> compiler pessimization. Maybe 10%, maybe even 15%. I didn't really
>>> count.

>> The case in point appears to be a regression; regressions are supposed
>> to be fixed and receive much higher attention than "normal" bugs.

> According to my experience, it could receive more attention than the
> average pessimization case, but there is close to zero chance that it
> would be fixed in the end.
> The typical scenario for such cases is that they "fall between chairs"
> of tree optimization and target code generation and neither party is
> taking responsibility.

Do you have an example?
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From David Brown@david.brown@hesbynett.no to comp.arch on Sun Feb 1 11:50:29 2026

On 01/02/2026 01:33, Thomas Koenig wrote:

> Michael S <already5chosen@yahoo.com> schrieb:

>> I have had an above-zero success rate with gcc bug reports related to
>> compiler pessimization. Maybe 10%, maybe even 15%. I didn't really
>> count.

As have I.

There is also the fact that not all changes or improvements that might
have been inspired by bug reports lead to comments or closure on the bug
report itself. As in all large development projects, there's a mismatch
between the interest in doing the technical work of improving the
program and the interest in paperwork and bureaucracy. That is
particularly true for larger, more structural or algorithmic changes,
as opposed to individual small patches.

> The case in point appears to be a regression; regressions are supposed
> to be fixed and receive much higher attention than "normal" bugs.

Yes, that's the idea. You still won't get a 100% hit rate, but it will
be higher than for general "missed optimisation" reports.

And of course with any major changes to code generation, there are
likely to be some regressions - if it's 3 steps forward and 1 step back,
it can be a net improvement overall even though there are
regressions. If that's the case, then the answer is likely to be a
compiler flag or tuneable for helping the particular code.
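As an illustration of such a per-code knob, here is a hedged C sketch assuming the unwanted access merging in the bubble-sort case comes from GCC's basic-block (SLP) vectorizer. GCC's `optimize` function attribute maps the string `"no-tree-slp-vectorize"` to `-fno-tree-slp-vectorize` for one function; compiling the whole file with `-fno-tree-slp-vectorize` is the command-line equivalent. GCC's manual describes the `optimize` attribute as intended for debugging rather than production code. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the harmful 8-byte accesses come from GCC's SLP vectorizer.
   With GCC, this attribute disables SLP vectorization for one function;
   other compilers get an empty macro since they may not support it. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SLP_VECTORIZE __attribute__((optimize("no-tree-slp-vectorize")))
#else
#define NO_SLP_VECTORIZE
#endif

/* Hypothetical adjacent-swap bubble sort, opted out of SLP vectorization
   with the intent of keeping its loads and stores 4 bytes wide. */
NO_SLP_VECTORIZE
void bubble_sort_noslp(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t top = n; top > 1; top--)
        for (size_t i = 0; i + 1 < top; i++)
            if (a[i] > a[i + 1]) {
                int t = a[i];       /* swap a[i] and a[i+1] */
                a[i] = a[i + 1];
                a[i + 1] = t;
            }
}
```

Whether this actually restores the narrow accesses should be checked in the generated assembly for the compiler version in use.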

From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 13:18:17 2026

On Sun, 1 Feb 2026 09:16:27 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:

> Michael S <already5chosen@yahoo.com> schrieb:

>> I have had an above-zero success rate with gcc bug reports related to
>> compiler pessimization. Maybe 10%, maybe even 15%. I didn't really
>> count.

> Maybe some figures to put this into perspective.

> In 2025, 593 missed-optimization bugs were closed, most of them
> marked as fixed, and 528 new ones were submitted. As of today, there
> are 3672 missed-optimization bugs open, so we are looking at around
> a 6-year average turnover.

> 97 missed-optimization regressions were submitted in 2025 and 174
> were closed, with 318 missed-optimization regressions open right
> now, so it is more of a two-year average turnover (and there seems
> to be progress in reducing those).

> I chose 2025 because it is easy to search for; it does not
> correspond to a gcc release cycle.

I very rarely submit "missed optimization" bugs.
As far as I am concerned, missed optimization is not a bug, it is
business as usual, with the hope of improvement in the future.
My cases are nearly always "compiler tries to be too smart with
horrible consequences" rather than "compiler is too stupid".

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 11:31:56 2026

Michael S <already5chosen@yahoo.com> schrieb:

> I very rarely submit "missed optimization" bugs.
> As far as I am concerned, missed optimization is not a bug, it is
> business as usual, with the hope of improvement in the future.

It still makes sense to submit a PR (which stands for Problem
Report) so people can look at it when they want to improve
code generation. I have submitted quite a few of these - of the
733 PRs I have submitted so far, 120 were missed-optimization.

> My cases are nearly always "compiler tries to be too smart with
> horrible consequences" rather than "compiler is too stupid".

Keyword wrong-code then, or suboptimal code generation? The latter
is also classified as missed-optimization.
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 13:35:16 2026

On Sun, 1 Feb 2026 09:17:11 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:

> Michael S <already5chosen@yahoo.com> schrieb:

>> On Sun, 1 Feb 2026 00:33:15 -0000 (UTC)
>> Thomas Koenig <tkoenig@netcologne.de> wrote:

>>> Michael S <already5chosen@yahoo.com> schrieb:

>>>> I have had an above-zero success rate with gcc bug reports related to
>>>> compiler pessimization. Maybe 10%, maybe even 15%. I didn't really
>>>> count.

>>> The case in point appears to be a regression; regressions are supposed
>>> to be fixed and receive much higher attention than "normal" bugs.

>> According to my experience, it could receive more attention than the
>> average pessimization case, but there is close to zero chance that it
>> would be fixed in the end.
>> The typical scenario for such cases is that they "fall between chairs"
>> of tree optimization and target code generation and neither party is
>> taking responsibility.

> Do you have an example?

Here is a good example
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
By chance, it is remotely related to Anton's case.

From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 13:54:11 2026

On Sun, 1 Feb 2026 13:18:17 +0200
Michael S <already5chosen@yahoo.com> wrote:

> On Sun, 1 Feb 2026 09:16:27 -0000 (UTC)
> Thomas Koenig <tkoenig@netcologne.de> wrote:

>> Michael S <already5chosen@yahoo.com> schrieb:

>>> I have had an above-zero success rate with gcc bug reports related to
>>> compiler pessimization. Maybe 10%, maybe even 15%. I didn't really
>>> count.

>> Maybe some figures to put this into perspective.

>> In 2025, 593 missed-optimization bugs were closed, most of them
>> marked as fixed, and 528 new ones were submitted. As of today, there
>> are 3672 missed-optimization bugs open, so we are looking at around
>> a 6-year average turnover.

>> 97 missed-optimization regressions were submitted in 2025 and 174
>> were closed, with 318 missed-optimization regressions open right
>> now, so it is more of a two-year average turnover (and there seems
>> to be progress in reducing those).

>> I chose 2025 because it is easy to search for; it does not
>> correspond to a gcc release cycle.

> I very rarely submit "missed optimization" bugs.
> As far as I am concerned, missed optimization is not a bug, it is
> business as usual, with the hope of improvement in the future.
> My cases are nearly always "compiler tries to be too smart with
> horrible consequences" rather than "compiler is too stupid".

I just re-checked one of the very few cases where what I reported could
properly be called "missed optimization":
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86975

It was fixed in gcc14, 5 or 6 years later.

https://godbolt.org/z/T1o34ean8

"The mills of Gcc maintenance grind slow, but they grind exceedingly
fine"

Unfortunately, only on MIPS, for which I care very little.
It was not fixed on Nios2, the architecture that I really care about,
because Nios2 is no longer supported by gcc.

From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Feb 1 11:58:58 2026

Michael S <already5chosen@yahoo.com> schrieb:

> Here is a good example
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
> By chance, it is remotely related to Anton's case.

That's a good one, and would apparently take quite some work to fix.
I've pinged it, BTW.

By the way, regarding your rate of fixed bugs: I see 15 bugs
submitted, 3 closed as WONTFIX, 5 as FIXED. If you take out the
WONTFIX ones (for an architecture which is no longer supported due to
lack of a maintainer), you have a 42% success rate so far, at least
for the e-mail address in the PR, not the 10-15% you estimated.
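The 42% figure follows from excluding the WONTFIX reports from the denominator. A small C sketch of that arithmetic; the function name `fix_rate` is hypothetical.

```c
#include <assert.h>

/* Fraction of reports fixed, with WONTFIX reports excluded from the
   denominator: fixed / (submitted - wontfix). */
double fix_rate(int submitted, int wontfix, int fixed)
{
    assert(submitted > wontfix && fixed >= 0);
    assert(fixed <= submitted - wontfix);
    return (double)fixed / (double)(submitted - wontfix);
}
```

For the counts above, `fix_rate(15, 3, 5)` is 5/12, about 0.417, which rounds to the 42% quoted.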
--
This USENET posting was made without artificial intelligence,
artificial impertinence, artificial arrogance, artificial stupidity,
artificial flavorings or artificial colorants.

From Michael S@already5chosen@yahoo.com to comp.arch on Sun Feb 1 14:11:18 2026

On Sun, 1 Feb 2026 11:58:58 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:

> Michael S <already5chosen@yahoo.com> schrieb:

>> Here is a good example
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617
>> By chance, it is remotely related to Anton's case.

> That's a good one, and would apparently take quite some work to fix.
> I've pinged it, BTW.

> By the way, regarding your rate of fixed bugs: I see 15 bugs
> submitted, 3 closed as WONTFIX, 5 as FIXED. If you take out the
> WONTFIX ones (for an architecture which is no longer supported due to
> lack of a maintainer), you have a 42% success rate so far, at least
> for the e-mail address in the PR, not the 10-15% you estimated.

Here is one case that I did not submit [not just out of laziness, but
also] because I was not sure whether it is a bug or a feature.
https://www.realworldtech.com/forum/?threadid=226267&curpostid=226267
Although Freddie's observation leads me to believe that the change of
behavior was not intentional.
