• Re: More of my philosophy about the AMD infinity architecture and more of my thoughts..

    From Angel@vvvvvvvvv11111111@yahoo.com to comp.programming on Thu Apr 13 15:25:53 2023
    From Newsgroup: comp.programming

    Hey, drug...................................................😏
    On Sunday, November 13, 2022 at 12:19:54 AM UTC+2, Amine Moulay Ramdane wrote:
    Hello,



    More of my philosophy about the AMD infinity architecture and more of my thoughts..

    I am a white Arab, and I think I am smart since I have also
    invented many scalable algorithms and other algorithms.


    So I think that AMD, the US company, is moving from Infinity Fabric to Infinity Architecture; read the following article
    to see this:

    https://www.anandtech.com/show/15596/amd-moves-from-infinity-fabric-to-infinity-architecture-connecting-everything-to-everything


    And reread my following thoughts, which add more precision:

    More of my philosophy about the new Epyc Genoa and about Core Complex Die (CCD) and Core-complex(CCX) and more of my thoughts..

    I have just looked at the following paper from AMD and I invite
    you to look at it:

    https://developer.amd.com/wp-content/resources/56827-1-0.pdf

    As you can see in the paper above, you have to look at how many
    Core Complex Dies (CCDs) you have, since that tells you
    how many Infinity Fabric connections you have. This is
    important information; look at the following article
    about the new AMD Epyc Genoa:

    https://wccftech.com/amd-epyc-genoa-cpu-lineup-specs-benchmarks-leak-up-to-2-6x-faster-than-intel-xeon/


    And if you look in the above article at the new cost-optimized EPYC Genoa 9124, with 16 cores and 64 MB of L3 cache, it comes with 4 Core Complex Dies (CCDs) and not 8.
    If there are only 4 CCDs, and so only 4 Infinity Fabric controllers connecting them, and if the bandwidth of each Infinity Fabric connection is the same in the 4-CCD and 8-CCD parts, then I think, as I am saying below, that this 16-core part with 64 MB of L3 cache and 4 CCDs parallelizes or scales less well than the 16-core EPYC Genoa with 256 MB of L3 cache, which comes with 8 CCDs.
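
    To make the 4-CCD versus 8-CCD reasoning concrete, here is a minimal sketch of the arithmetic. It assumes one Infinity Fabric link per CCD and the same bandwidth on every link; the 32 GB/s figure and all function names are placeholders chosen for illustration, not AMD specifications.

```cpp
#include <cassert>
#include <cstdio>

// ASSUMPTION: placeholder per-link bandwidth, not an AMD specification.
const double kAssumedLinkGBps = 32.0;

// Aggregate CCD-to-I/O-die bandwidth, assuming one equal link per CCD.
double aggregate_bandwidth_gbps(int ccd_count) {
    assert(ccd_count > 0);
    return ccd_count * kAssumedLinkGBps;
}

// Share of that bandwidth available to each core when all cores are busy.
double per_core_bandwidth_gbps(int ccd_count, int core_count) {
    assert(core_count > 0);
    return aggregate_bandwidth_gbps(ccd_count) / core_count;
}

// Prints the two 16-core configurations side by side.
void print_comparison() {
    std::printf("4 CCDs, 16 cores: %.1f GB/s per core\n",
                per_core_bandwidth_gbps(4, 16));
    std::printf("8 CCDs, 16 cores: %.1f GB/s per core\n",
                per_core_bandwidth_gbps(8, 16));
}
```

    Calling print_comparison() from main shows that, under these assumptions, the same 16 cores get twice the fabric bandwidth per core with 8 CCDs as with 4. Whether real workloads are limited by that bandwidth is a separate question that this sketch does not answer.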


    More of my thoughts about technology and about Apple Silicon M1 Emulating x86 and more of my thoughts..

    I have just looked at the following articles about Rosetta 2 and the benchmarks of Apple Silicon M1 emulating x86:

    https://www.computerworld.com/article/3597949/everything-you-need-to-know-about-rosetta-2-on-apple-silicon-macs.html

    and read also here:

    https://www.macrumors.com/2020/11/15/m1-chip-emulating-x86-benchmark/

    But I think that the problem with Apple Silicon M1 and the next Apple Silicon M2 is that Rosetta 2 only lets you run x86-64 macOS apps. That means apps that were built for macOS (not Windows) and aren't 32-bit. The macOS restriction eliminates huge numbers of Windows apps, and the 64-bit restriction eliminates even more.

    Also read the following:

    Apple says new M2 chip won’t beat Intel’s finest

    Read more here:

    https://www.pcworld.com/article/782139/apple-m2-chip-wont-beat-intels-finest.html


    And here is what I say in my following thoughts about technology, on Arm vs. x86:

    More of my philosophy about the Apple Silicon and about Arm Vs. X86 and more of my thoughts..

    I invite you to read the following interesting article carefully
    to understand more:

    Overhyped Apple Silicon: Arm Vs. X86 Is Irrelevant

    https://seekingalpha.com/article/4447703-overhyped-apple-silicon-arm-vs-x86-is-irrelevant


    More of my philosophy about code compression of RISC-V and ARM and more of my thoughts..

    I think I am highly smart, and I have just read the following paper,
    which says that RISC-V Compressed (RVC) programs are 25% smaller than RISC-V programs, fetch 25% fewer instruction bits than RISC-V programs, and incur fewer instruction cache misses. Its code size is competitive with other compressed RISCs. RVC is expected to improve the performance and energy per operation of RISC-V.

    Read more here:

    https://people.eecs.berkeley.edu/~krste/papers/waterman-ms.pdf
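
    As a rough illustration of what a 25% code-size reduction means for the instruction cache, here is a small sketch. The 25% figure is the one quoted from the paper above; the 64 KiB code size and 64-byte cache line used in the note after the code are assumptions picked only for the arithmetic, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Size of a program after compression, given the fraction of bytes saved
// (0.25 corresponds to the 25% figure quoted from the paper).
std::size_t compressed_size(std::size_t original_bytes, double fraction_saved) {
    assert(fraction_saved >= 0.0 && fraction_saved < 1.0);
    return static_cast<std::size_t>(original_bytes * (1.0 - fraction_saved));
}

// Number of cache lines needed to hold code_bytes of instructions, rounded up.
std::size_t cache_lines_needed(std::size_t code_bytes, std::size_t line_bytes) {
    assert(line_bytes > 0);
    return (code_bytes + line_bytes - 1) / line_bytes;
}
```

    For example, a hypothetical 65536-byte hot code region shrinks to compressed_size(65536, 0.25) = 49152 bytes, and with 64-byte lines it occupies 768 cache lines instead of 1024. Fewer lines for the same work is why smaller code tends to incur fewer instruction cache misses.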


    So I think RVC achieves about the same compression as ARM Thumb-2, and I think
    that I was correct in my previous thoughts (read them below).
    So I think we now have to check whether x86 or x64 is still more cache friendly even against Thumb-2 or RVC compression.

    More of my philosophy of who will be the winner, x86 or x64 or ARM and more of my thoughts..


    I think I am highly smart, and I think that since x86 or x64 has complex instructions and ARM has simple instructions, x86 or x64 code is denser and so more cache friendly. ARM has tried to solve this problem with Thumb-2, which compresses the code, and I think Thumb-2 reduces the size of the code by around 25%. So I think
    we have to check whether x86 or x64 is still more cache friendly even against Thumb-2 compression. I also think that x86 or x64 will still be further optimized for power or energy efficiency, and since x86 or x64 has other big advantages, like the one I am talking about below, I think x86 and x64 will remain successful big players in the future, and that this will be the "tendency". So I think that x86 and x64 will be good for a long time for making money in business, and they will be good business for the USA, which makes the AMD and Intel CPUs.


    More of my philosophy about x86 or x64 and ARM architectures and more of my thoughts..

    I think I am highly smart, and I think that the x86 or x64 architectures
    have another big advantage over the ARM architecture, which is the following:


    "The Bright Parts of x86

    Backward Compatibility

    Compatibility is a two-edged sword. One reason that ARM does better in low-power contexts is that its simpler decoder doesn't have to be compatible with large accumulations of legacy cruft. The downside is that ARM operating systems need to be modified for every new chip version.

    In contrast, the latest 64-bit chips from AMD and Intel are still able to boot PC DOS, the 16-bit operating system that came with the original IBM PC. Other hardware in the system might not be supported, but the CPUs have retained backward compatibility with every version since 1978.

    Many of the bad things about x86 are due to this backward compatibility, but it's worth remembering the benefit that we've had as a result: New PCs have always been able to run old software."

    Read more at the following web link:

    https://www.informit.com/articles/article.aspx?p=1676714&seqNum=6


    So I think that you cannot compare x86 or x64 to ARM on power
    efficiency alone, as some do when they compare the Apple M1 Pro ARM CPU to x86 or x64 CPUs. That is why I think that the x86 or x64 architectures will be here for a long time, so I think that they will be good for a long time for making money in business, and they are good business for the USA, which makes the AMD and Intel CPUs.

    More of my philosophy about weak memory model and ARM and more of my thoughts..


    I think the ARM hardware memory model is not good, since it is a
    weak memory model, so ARM should provide us with a TSO memory
    model that is compatible with the x86 TSO memory model. Read what Kent Dickey says about it in the following:


    ProValid, LLC was formed in 2003 to provide hardware design and verification consulting services.

    Kent Dickey, founder and President, has had 20 years experience in hardware design and verification. Kent worked at Hewlett-Packard and Intel Corporation, leading teams in ASIC chip design and pre-silicon and post-silicon hardware verification. He architected bus interface chips for high-end servers at both companies. Kent has received more than 10 patents for innovative work in both design and verification.

    Read more here about him:

    https://www.provalid.com/about/about.html


    And read the following thoughts of Kent Dickey about weak memory models such as ARM's:

    "First, the academic literature on ordering models is terrible. My eyes glaze over and it's just so boring.

    I'm going to guess "niev" means naive. I find that surprising since x86
    is basically TSO. TSO is a good idea. I think weakly ordered CPUs are a
    bad idea.

    TSO is just a handy name for the Sparc and x86 effective ordering for writeback cacheable memory: loads are ordered, and stores are buffered and will complete in order but drain separately from the main CPU pipeline. TSO can allow loads to hit stores in the buffer and see the new value, this doesn't really matter for general ordering purposes.

    TSO lets you write basic producer/consumer code with no barriers. In fact, about the only type of code that doesn't just work with no barriers on TSO is Lamport's Bakery Algorithm since it relies on "if I write a location and read it back and it's still there, other CPUs must see that value as well", which isn't true for TSO.

    Lock free programming "just works" with TSO or stronger ordering guarantees, and it's extremely difficult to automate putting in barriers for complex algorithms for weakly ordered systems. So code for weakly ordered systems tend to either toss in lots of barriers, or use explicit locks (with barriers). And extremely weakly ordered systems are very hard to reason about, and especially hard to program since many implementations are not as weakly ordered as the specification says they could be, so just running your code and having it work is insufficient. Alpha was terrible in this regard, and I'm glad it's silliness died with it.

    HP PA-RISC was documented as weakly ordered, but all implementations guaranteed full system sequential consistency (and it was tested in and enforced, but not including things like cache flushing, which did need barriers). No one wanted to risk breaking software from the original in-order fully sequential machines that might have relied on it. It wasn't really a performance issue, especially once OoO was added.

    Weakly ordered CPUs are a bad idea in much the same way in-order VLIW is a bad idea. Certain niche applications might work out fine, but not for a general purpose CPU. It's better to throw some hardware at making TSO perform well, and keep the software simple and easy to get right.

    Kent"


    Read the rest on the following web link:

    https://groups.google.com/g/comp.arch/c/fSIpGiBhUj0
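
    The producer/consumer pattern that Kent Dickey mentions can be sketched in portable C++ as follows. This is only an illustration with hypothetical names (payload, ready, run_handoff), not code from the quoted thread. The release store and acquire load are what the C++ language requires; on x86 (TSO) compilers typically implement them as ordinary stores and loads, while on weakly ordered ARM they need barrier or load-acquire/store-release instructions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain data handed from producer to consumer
std::atomic<bool> ready{false};   // flag that publishes the payload

void producer(int value) {
    payload = value;                                // 1. write the data
    ready.store(true, std::memory_order_release);   // 2. publish it
}

int consumer() {
    // Spin until the flag is set; acquire pairs with the release store above,
    // so the payload write is guaranteed to be visible afterwards.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Hypothetical helper: runs one hand-off and returns what the consumer saw.
int run_handoff(int value) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer, value);
    p.join();
    c.join();
    return seen;
}
```

    Calling run_handoff(42) returns 42 on any conforming implementation. The point of the sketch is that on TSO hardware this ordering comes for free at the machine level, whereas on a weakly ordered CPU forgetting the release/acquire pair can let the consumer see the flag before the payload.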


    And you can read much more of my thoughts about technology in the following web links:


    https://groups.google.com/g/alt.culture.morocco/c/MosH5fY4g_Y

    And here:

    https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4







    Thank you,
    Amine Moulay Ramdane.