• Re: IA-64

    From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.arch on Mon Mar 23 21:01:15 2026
    From Newsgroup: comp.arch

    Terje Mathisen <terje.mathisen@tmsw.no> writes:

    Tim Rentsch wrote:

    Terje Mathisen <terje.mathisen@tmsw.no> writes:

    Tim Rentsch wrote:

    [...]

    An unrelated item for your reading pleasure...

    Take an unbiased coin and start flipping it. Keep flipping until
    the number of heads first exceeds the number of tails. Compute the
    fraction: the number of heads divided by the number of flips (which
    always gives a number between 0.5 and 1.0).

    Repeat the above process as many times as desired. Compute the
    average of all the fractions and what do you get?

    I heard about this yesterday from a friend. That's a hint, of
    sorts. (It is now Sunday afternoon where I am.)

    So, by definition the list of possible sequences starts with
    H ; 1/2 of all
    THH ; 1/8
    TTHHH ; 1/32
    THTHH ; 1/32 Sum up to here is 22/32
    TTTHHHH ; 1/128
    TTHTHHH
    TTHHTHH
    THTTHHH
    THTHTHH
    etc

    Here's a wild-assed guess: sqrt(0.5) = 0.707
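    Terje's enumeration above can be checked by brute force. A minimal
    sketch (assuming the stopping rule exactly as stated: stop the first
    time heads exceeds tails):

```python
from itertools import product

def first_passage(seq):
    """True iff heads first exceeds tails exactly at the end of seq."""
    h = t = 0
    for i, c in enumerate(seq):
        h += (c == 'H')
        t += (c == 'T')
        if h > t:
            return i == len(seq) - 1   # must not have stopped earlier
    return False

# Count stopping sequences of each odd length up to 7 flips.
counts = {n: sum(first_passage(''.join(s)) for s in product('HT', repeat=n))
          for n in (1, 3, 5, 7)}
print(counts)    # {1: 1, 3: 1, 5: 2, 7: 5} -- the Catalan numbers
# Probability mass of these lengths: 1/2 + 1/8 + 2/32 + 5/128
mass = sum(c / 2**n for n, c in counts.items())
print(mass)      # 0.7265625, i.e. the 22/32 above plus 5/128
```

    The counts 1, 1, 2, 5 match the list above, including the five
    length-7 sequences.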

    That's an interesting idea for how to analyze it. I'm not sure it
    works. One thing I can say for sure is when I tried to replicate it
    in a program I got wrong answers, or maybe it converges very slowly.
    An easy way to get a result that matches the theoretical value is
    just to simulate the coin flips using a random number generator. To
    save you the trouble of doing that the ultimate value is pi/4 (and
    it converges VERY slowly).
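    The simulation described can be replicated along these lines (a hedged
    sketch; the cap on run length is an artificial cutoff to keep runtime
    bounded, and it introduces a small bias, since the first-passage time
    has infinite expectation):

```python
import random

def average_fraction(trials=10000, cap=4000, seed=1):
    """Flip until heads first exceeds tails; average heads/flips."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        heads = flips = 0
        while heads <= flips - heads:      # i.e. heads <= tails
            heads += rng.random() < 0.5
            flips += 1
            if flips >= cap:               # truncate pathological runs
                break
        total += heads / flips
    return total / trials

print(average_fraction())   # close to pi/4 ~ 0.7854, converging slowly
```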

    So it is related to calculating pi by picking two random numbers and
    using them as coordinates in the [0..1 x 0..1] square.

    If that's true I don't see how or why it's true. I haven't tried
    to understand the derivation I was given earlier.

    pi/4 =~ 0.78539816, so a bit larger than my wild-assed guess. :-)

    I thought your guess was pretty reasonable. I didn't have an
    opportunity to make a guess because I knew the answer before
    I understood the method.

    Incidentally, the hint mentioned above is that I heard about it on
    pi day, March 14th. :)

    I did not grok that hint. :-(

    Definitely a very subtle hint. I didn't really expect anyone to
    get it, but I wanted to at least give an opportunity. And I've
    been surprised before by how smart some netizens are.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From David Brown@david.brown@hesbynett.no to comp.arch on Tue Mar 24 09:24:27 2026
    From Newsgroup: comp.arch

    On 24/03/2026 05:01, Tim Rentsch wrote:
    Terje Mathisen <terje.mathisen@tmsw.no> writes:

    Tim Rentsch wrote:

    Terje Mathisen <terje.mathisen@tmsw.no> writes:

    Tim Rentsch wrote:

    [...]

    An unrelated item for your reading pleasure...

    Take an unbiased coin and start flipping it. Keep flipping until
    the number of heads first exceeds the number of tails. Compute the
    fraction: the number of heads divided by the number of flips (which
    always gives a number between 0.5 and 1.0).

    Repeat the above process as many times as desired. Compute the
    average of all the fractions and what do you get?

    I heard about this yesterday from a friend. That's a hint, of
    sorts. (It is now Sunday afternoon where I am.)

    So, by definition the list of possible sequences starts with
    H ; 1/2 of all
    THH ; 1/8
    TTHHH ; 1/32
    THTHH ; 1/32 Sum up to here is 22/32
    TTTHHHH ; 1/128
    TTHTHHH
    TTHHTHH
    THTTHHH
    THTHTHH
    etc

    Here's a wild-assed guess: sqrt(0.5) = 0.707

    That's an interesting idea for how to analyze it. I'm not sure it
    works. One thing I can say for sure is when I tried to replicate it
    in a program I got wrong answers, or maybe it converges very slowly.
    An easy way to get a result that matches the theoretical value is
    just to simulate the coin flips using a random number generator. To
    save you the trouble of doing that the ultimate value is pi/4 (and
    it converges VERY slowly).

    So it is related to calculating pi by picking two random numbers and
    using them as coordinates in the [0..1 x 0..1] square.

    If that's true I don't see how or why it's true. I haven't tried
    to understand the derivation I was given earlier.

    I have not looked at the square with random numbers thing, so I can't
    comment on any similarities.

    As for the coin tossing, I would say it is just coincidence that there
    is a pi in the end result. When you are dealing with combinations of
    increasing numbers of things, you see factorials. When you are dealing
    with probabilities, you see converging sums. When you have converging
    sums whose elements have numerators of the form a·b^n and
    denominators with n!, you have something that looks like a Taylor
    series. And sometimes these can be pushed and shoved into matching the
    Taylor series for a common transcendental function. The probability
    questions you hear about are the ones that then give you a sum that
    involves popular numbers like pi or e.


    pi/4 =~ 0.78539816, so a bit larger than my wild-assed guess. :-)

    I thought your guess was pretty reasonable. I didn't have an
    opportunity to make a guess because I knew the answer before
    I understood the method.

    For this particular problem, convergence is /really/ slow - even if you
    calculate the probabilities rather than doing actual coin tosses. (In
    the Numberphile video, he did 10,000 coin tosses, and the result was
    about as far from pi/4 as sqrt(1/2) is.) So I agree that sqrt(1/2) is
    a reasonable guess, as you have to sum up a very large number of steps
    before you exceed that.
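    Calculating the probabilities directly makes the slow convergence
    visible. A sketch, using the fact (visible in Terje's enumeration) that
    the number of stopping sequences of length 2k+1 is the k-th Catalan
    number, so P(stop at 2k+1) = C_k / 2^(2k+1) and the fraction at that
    point is (k+1)/(2k+1):

```python
import math

# p_k = C_k / 2**(2k+1); recurrence p_{k+1} = p_k * (2k+1) / (2*(k+2)).
p = 0.5          # k = 0: sequence "H", probability 1/2, fraction 1/1
total = 0.0
for k in range(1_000_000):
    total += p * (k + 1) / (2 * k + 1)
    p *= (2 * k + 1) / (2 * (k + 2))

print(total, math.pi / 4)
# Even after a million terms the sum is still a few 1e-4 short of pi/4:
# the terms only decay like k**-1.5, so the tail shrinks like 1/sqrt(k).
```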


    Incidentally, the hint mentioned above is that I heard about it on
    pi day, March 14th. :)

    I did not grok that hint. :-(

    Definitely a very subtle hint. I didn't really expect anyone to
    get it, but I wanted to at least give an opportunity. And I've
    been surprised before by how smart some netizens are.

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Sun Apr 5 06:49:00 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Paul Clayton <paaronclayton@gmail.com> posted:

    On 11/5/25 3:43 PM, MitchAlsup wrote:
    [snip]
    I am now working on predictors for a 6-wide My 66000 machine--which is a bit
    different.
    a) VEC-LOOP loops do not alter the branch prediction tables.
    b) Predication clauses do not alter the BPTs.

    Not recording the history of predicates may have a negative
    effect on global history predictors. (I do not know if anyone
    has studied this, but it has been mentioned — e.g.,
    "[predication] has a negative side-effect because the removal
    of branches eliminates useful correlation information
    necessary for conventional branch predictors" from "Improving
    Branch Prediction and Predicated Execution in Out-of-Order
    Processors", Eduardo Quiñones et al., 2007.)

    It depends on where you are looking! If you think branch prediction
    alters where FETCH is Fetching, then MY 66000 predication does not
    do predication prediction--predication is used when the join point
    will have already been fetched by the time the condition is known.
    Then, either the then clause or the else clause will be nullified
    without backup (i.e., branch prediction repair).

    DECODE is still able to predict then-clause versus else-clause
    and maintain the no-backup property, as long as both sides are
    issued into the execution window.

    Predicate prediction can also be useful when the availability
    of the predicate is delayed. Similarly, selective eager
    execution might be worthwhile when the predicate is delayed;
    the selection is likely to be predictive (resource use might
    be a basis for selection but even estimating that might be
    predictive).

    The difference is that predication prediction never needs branch
    prediction repair.

    What happens to the instructions after the predicate?

    Let's say we have

    [...]
    peq0 r1,tf
    mov r2,#24
    mov r2,#48
    ldd r3,[r4,r2,0]
    [...]

    can the ldd be speculatively executed or not? And what happens if the prediction was wrong?
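    For readers following along, here is a hedged reading of what the
    sequence computes, assuming the "tf" shadow of peq0 marks the first mov
    as the then-clause (taken when r1 == 0) and the second as the
    else-clause, with memory modeled as a dict keyed by byte address:

```python
def predicated_load(r1, mem, r4):
    """Sketch of: peq0 r1,tf / mov r2,#24 / mov r2,#48 / ldd r3,[r4,r2,0]."""
    r2 = 24 if r1 == 0 else 48    # exactly one of the two movs takes effect
    r3 = mem[r4 + r2]             # the ldd depends on the selected r2
    return r3

# The speculation question: may the ldd issue before r1 is known?  A wrong
# guess means the machine has loaded from the wrong address.
mem = {1024 + 24: 3.5, 1024 + 48: 7.25}
print(predicated_load(0, mem, 1024), predicated_load(1, mem, 1024))  # 3.5 7.25
```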
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Sun Apr 5 20:35:46 2026
    From Newsgroup: comp.arch


    Thomas Koenig <tkoenig@netcologne.de> posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Paul Clayton <paaronclayton@gmail.com> posted:

    On 11/5/25 3:43 PM, MitchAlsup wrote:
    [snip]
    I am now working on predictors for a 6-wide My 66000 machine--which is a bit
    different.
    a) VEC-LOOP loops do not alter the branch prediction tables.
    b) Predication clauses do not alter the BPTs.

    Not recording the history of predicates may have a negative
    effect on global history predictors. (I do not know if anyone
    has studied this, but it has been mentioned — e.g.,
    "[predication] has a negative side-effect because the removal
    of branches eliminates useful correlation information
    necessary for conventional branch predictors" from "Improving
    Branch Prediction and Predicated Execution in Out-of-Order
    Processors", Eduardo Quiñones et al., 2007.)

    It depends on where you are looking! If you think branch prediction
    alters where FETCH is Fetching, then MY 66000 predication does not
    do predication prediction--predication is used when the join point
    will have already been fetched by the time the condition is known.
    Then, either the then clause or the else clause will be nullified
    without backup (i.e., branch prediction repair).

    DECODE is still able to predict then-clause versus else-clause
    and maintain the no-backup property, as long as both sides are
    issued into the execution window.

    Predicate prediction can also be useful when the availability
    of the predicate is delayed. Similarly, selective eager
    execution might be worthwhile when the predicate is delayed;
    the selection is likely to be predictive (resource use might
    be a basis for selection but even estimating that might be
    predictive).

    The difference is that predication prediction never needs branch
    prediction repair.

    What happens to the instructions after the predicate?

    Let's say we have

    [...]
    peq0 r1,tf
    mov r2,#24
    mov r2,#48
    ldd r3,[r4,r2,0]
    [...]

    can the ldd be speculatively executed or not? like::

    peq0 r1,tf
    ldd r3,[r4,#24]
    ldd r3,[r4,#48]

    And what happens if the prediction was wrong?

    Some form of backup and do it right, along with some form of
    not updating the cache on the one which was not supposed to
    be executed {or TLB or L2}.


    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Thomas Koenig@tkoenig@netcologne.de to comp.arch on Mon Apr 6 05:11:21 2026
    From Newsgroup: comp.arch

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    Let's say we have

    [...]
    peq0 r1,tf
    mov r2,#24
    mov r2,#48
    ldd r3,[r4,r2,0]
    [...]

    can the ldd be speculatively executed or not? like::

    peq0 r1,tf
    ldd r3,[r4,#24]
    ldd r3,[r4,#48]

    Yes, that can be simplified.

    Could the load in the original be speculatively executed?


    And what happens if the prediction was wrong?

    Some form of backup and do it right, along with some form of
    not updating the cache on the one which was not supposed to
    be executed {or TLB or L2}.

    Does your answer apply to my original code or to the one
    that you posted? If it only applies to the latter, I can
    easily make up an example that cannot be simplified the
    way you did (let's take that as a given).
    --
    This USENET posting was made without artificial intelligence,
    artificial impertinence, artificial arrogance, artificial stupidity,
    artificial flavorings or artificial colorants.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From MitchAlsup@user5857@newsgrouper.org.invalid to comp.arch on Mon Apr 6 16:24:36 2026
    From Newsgroup: comp.arch


    Thomas Koenig <tkoenig@netcologne.de> posted:

    MitchAlsup <user5857@newsgrouper.org.invalid> schrieb:

    Thomas Koenig <tkoenig@netcologne.de> posted:

    Let's say we have

    [...]
    peq0 r1,tf
    mov r2,#24
    mov r2,#48
    ldd r3,[r4,r2,0]
    [...]

    can the ldd be speculatively executed or not? like::

    peq0 r1,tf
    ldd r3,[r4,#24]
    ldd r3,[r4,#48]

    Yes, that can be simplified.

    Could the load in the original be speculatively executed?


    And what happens if the prediction was wrong?

    Some form of backup and do it right, along with some form of
    not updating the cache on the one which was not supposed to
    be executed {or TLB or L2}.

    Does your answer apply to my original code or to the one
    that you posted?

    Both.

    If it only applies to the latter, I can
    easily make up an example that cannot be simplified the
    way you did (let's take that as a given).

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Robert Finch@robfi680@gmail.com to comp.arch on Tue Apr 7 22:53:59 2026
    From Newsgroup: comp.arch

    Dedicated a general-purpose register (out of 128 GPRs) to store the
    round mode for different data types. Each data type has a nibble to hold
    the round mode.

    Bits
    0 to 3 FLT - floating point
    4 to 7 DFLT - decimal float
    8 to 11 POS - posit (reserved)
    12 to 15 FIX - fixed point
    16 to 19 INT - integer (arithmetic shift right, average)

    If the dynamic round mode is changed, the new round mode becomes
    visible through the same register renaming as the other GPRs. The
    round mode is then easily updated with a bitfield insert (DEP).
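    A sketch of the nibble layout and the DEP-style update (field offsets
    taken from the table above; dep is modeled here as a plain
    mask-and-merge bitfield insert, and the mode encodings are
    hypothetical):

```python
# Nibble offsets per the table above.
RM_FLT, RM_DFLT, RM_POS, RM_FIX, RM_INT = 0, 4, 8, 12, 16

def dep(reg, offset, mode):
    """Bitfield insert: drop a 4-bit round mode into its nibble."""
    mask = 0xF << offset
    return (reg & ~mask) | ((mode & 0xF) << offset)

def rm(reg, offset):
    """Extract the round mode nibble for one data type."""
    return (reg >> offset) & 0xF

r = 0
r = dep(r, RM_FLT, 0x1)     # e.g. a mode for binary floating point
r = dep(r, RM_INT, 0x3)     # e.g. a mode for arithmetic shift right
print(hex(r), rm(r, RM_FLT), rm(r, RM_INT))   # 0x30001 1 3
```

    Each update touches only its own nibble, so the other data types'
    round modes survive the insert.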

    It does mean the round mode occupies an operand slot in the RS.

    I suppose there could be a separate rounding mode for each precision too.

    The round mode is separate from the FP status reg., which is not a GPR.
    The FP status reg. is stored in the ROB and eventually makes it back to
    the architectural FP status reg. It is not readable without using a FP
    FENCE instruction first.

    Thinking about merging status registers for different data types into
    the same status register.



    --- Synchronet 3.21f-Linux NewsLink 1.2