• Re: Matmul in VVM

    From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Mon Jun 1 15:45:37 2026
    From Newsgroup: comp.arch

    On 5/29/2026 9:58 AM, MitchAlsup wrote:

    Stephen Fuld <sfuld@alumni.cmu.edu.invalid> posted:

    snip

    My original proposal allows you to execute one instruction based on the
    value in a register, i.e. if the register contains the value 3, then the
    third instruction is the only one executed. The enhanced version allows
    more flexibility. For example, it could let you specify executing all
    instructions up to the number in the register. As I said, while the use
    case for the basic instruction is clear, emulating register indexing, I
    am not sure there are any use cases for the enhancement.
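
    A minimal sketch in C of the two behaviours described above, modelling
    which of the following instructions execute as a bit mask. The function
    names, the 32-instruction limit, and the handling of out-of-range
    register values are all assumptions made for illustration; nothing here
    is a defined encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: bit (i-1) of the result is set when the i-th
   instruction (1-based) following the predicating instruction executes.
   n is the number of instructions covered (assumed to be 1..32). */

/* Basic form: only the instruction selected by the register executes. */
uint32_t mask_basic(unsigned reg, unsigned n)
{
    assert(n >= 1 && n <= 32);
    if (reg < 1 || reg > n)
        return 0;                 /* assumption: out of range selects nothing */
    return UINT32_C(1) << (reg - 1);
}

/* Enhanced form, one possible variant: instructions 1..reg all execute. */
uint32_t mask_up_to(unsigned reg, unsigned n)
{
    assert(n >= 1 && n <= 32);
    if (reg > n)
        reg = n;                  /* assumption: clamp to the covered length */
    if (reg == 32)
        return UINT32_MAX;        /* avoid shifting a 32-bit value by 32 */
    return (UINT32_C(1) << reg) - 1;
}
```

    With the register holding 3 and four instructions covered, the basic
    form selects only the third instruction and the enhanced variant
    selects the first three.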

    Not sure what the source code would look like in order for the compiler
    to recognize this pattern and optimize to your solution.

    Good question. I have thought about it for a while, and though I am far
    from a compiler expert, I have come up with a potential solution at
    least for the basic proposal.

    The idea is if the compiler sees a SWITCH statement where the clauses
    that are switched to will each compile to one instruction (not counting
    any instructions needed for array addressing, which would be handled
    outside the SWITCH, e.g. a loop counter), then it could emit the PREDNAT
    (or whatever name is better) followed by the single instructions for
    each clause. I am sure this needs more specificity, but I hope you get
    the idea.
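
    As a hedged illustration of the kind of source this might match, here
    is a C switch that emulates register indexing. Each clause is a single
    move, so each could plausibly compile to one instruction. The function
    name and the four-way width are made up for the example, and no
    existing compiler is claimed to perform this transformation.

```c
#include <assert.h>

/* Emulated register indexing: pick one of four values, which a compiler
   would normally keep in registers, using a run-time index. */
long select_reg(unsigned idx, long r1, long r2, long r3, long r4)
{
    long result = 0;

    assert(idx >= 1 && idx <= 4);   /* 1-based, as in the "value 3" example */

    switch (idx) {
    case 1: result = r1; break;     /* each clause is one register move */
    case 2: result = r2; break;
    case 3: result = r3; break;
    case 4: result = r4; break;
    }
    return result;
}
```

    Under the proposal, the hoped-for output would be the PREDNAT on idx
    followed by the four moves, with only the selected move taking effect.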

    I am still not sure of the benefit of the "enhanced" instruction, and
    haven't come up with any reasonable source code that would benefit from it.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)
    --- Synchronet 3.22a-Linux NewsLink 1.2
  • From Stefan Monnier@monnier@iro.umontreal.ca to comp.arch on Mon Jun 1 14:58:35 2026
    From Newsgroup: comp.arch

    MitchAlsup [2026-05-29 16:55:38] wrote:
    Stefan Monnier <monnier@iro.umontreal.ca> posted:
    But instead of skipping, you can "predicate away" the undesirable
    instructions. So, in sum, I think what you describe can be made to
    work. The main problem is that it will "fill" your dataflow core with
    many "useless" instructions, so it risks making the whole loop too large
    for vVM and it risks also making it inefficient (in case all
    N instructions end up speculatively executed and the predication
    operates by throwing away N-1 of the values).
    If the predicated instructions are used in at least 1 iteration, they
    are not useless.

    They may not be useless overall, but they still waste resources at each
    iteration where they're not used. Traditional predication of an `if`
    gives a "50% waste" (for equal size branches or when each branch is
    taken as often as the other), whereas a predicated `switch` results in
    a waste of `(N-1)/N`. As N grows larger this becomes discouraging.
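
    To put numbers on that, a small sketch in C, reading the fraction as
    (N-1)/N and assuming N mutually exclusive clauses of one instruction
    each, exactly one of which is used per iteration. The function name is
    made up for the example.

```c
#include <assert.h>

/* Fraction of predicated instruction slots whose result is discarded in
   one iteration, when exactly one of n single-instruction clauses is used. */
double wasted_fraction(unsigned n)
{
    assert(n >= 1);
    return (double)(n - 1) / (double)n;
}
```

    This gives 0.5 for a two-way `if`, 0.75 for a four-way `switch`, and
    0.875 for an eight-way one.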


    === Stefan
  • From Stephen Fuld@sfuld@alumni.cmu.edu.invalid to comp.arch on Mon Jun 1 23:40:38 2026
    From Newsgroup: comp.arch

    On 6/1/2026 11:58 AM, Stefan Monnier wrote:
    MitchAlsup [2026-05-29 16:55:38] wrote:
    Stefan Monnier <monnier@iro.umontreal.ca> posted:
    But instead of skipping, you can "predicate away" the undesirable
    instructions. So, in sum, I think what you describe can be made to
    work. The main problem is that it will "fill" your dataflow core with
    many "useless" instructions, so it risks making the whole loop too large
    for vVM and it risks also making it inefficient (in case all
    N instructions end up speculatively executed and the predication
    operates by throwing away N-1 of the values).
    If the predicated instructions are used in at least 1 iteration, they
    are not useless.

    They may not be useless overall, but they still waste resources at each
    iteration where they're not used. Traditional predication of an `if`
    gives a "50% waste" (for equal size branches or when each branch is
    taken as often as the other), whereas a predicated `switch` results in
    a waste of `(N-1)/N`. As N grows larger this becomes discouraging.

    OK, but what exactly are we wasting? We are taking space in the
    reservation stations, but we are saving executing actual load
    instructions. So the "waste" results in faster execution.
    --
    - Stephen Fuld
    (e-mail address disguised to prevent spam)