• Re: My 66000 and High word facility

    From Paul A. Clayton@paaronclayton@gmail.com to comp.arch on Wed Oct 16 13:40:13 2024
    From Newsgroup: comp.arch

    On 8/12/24 2:22 AM, Terje Mathisen wrote:
    Brett wrote:
    [snip]
    I can understand the reluctance to go to 6 bit register
    specifiers, it burns up your opcode space and makes encoding
    everything more difficult. But today that is an unserviced
    market which will get customers to give you a look. Put out
    some vapor ware and see what customers say.

    The solution (?) has always looked obvious to me: some form of
    Huffman encoding of register specifiers, so that the most common
    ones (bottom 16 or 32) require just a small amount of space (as
    today), and then either a prefix or a suffix to provide extra bits
    when you want to use those higher register numbers. Mitch's CARRY
    sets up a single extra register for a set of operations; a WIDE
    prefix could contain two extra register bits for four registers
    over the next 2 or 3 instructions.

    As long as this doesn't make the decoder a speed limiter, it would
    be zero cost for regular code and still quite cheap except for
    increasing code size by 33-50% for the inner loops of algorithms
    that need 64 or even 128 regs.

    Fujitsu's SPARC64 VIIIfx had a Set XAR (eXtended Arithmetic
    Register) instruction that supplied three extra bits for each of
    four register fields, plus a SIMD bit, to up to two following
    instructions.

    (Of course x86-64 has a prefix that adds one bit to two register
    fields for one instruction. AVX512/AVX10 further extend the
    register set to 32 entries.)
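
    For concreteness, a minimal decode sketch of the prefix idea in C.
    The field layout (5-bit base specifiers, a prefix that latches two
    extra high bits per operand for the next instruction) is invented
    for illustration, not My 66000's or SPARC64's actual encoding:

    #include <stdint.h>

    /* Hypothetical 32-bit instruction: three 5-bit register fields.
       A WIDE-style prefix latches two extra high bits per operand
       that apply only to the next instruction, widening 32
       architectural names to 128. */
    struct pending_prefix {
        int     valid;
        uint8_t rd_hi, rs1_hi, rs2_hi;  /* 2 bits each, from prefix */
    };

    static void decode_regs(uint32_t insn, struct pending_prefix *p,
                            unsigned *rd, unsigned *rs1, unsigned *rs2)
    {
        unsigned rd_lo  = (insn >> 21) & 0x1f;  /* base 5-bit fields */
        unsigned rs1_lo = (insn >> 16) & 0x1f;
        unsigned rs2_lo = (insn >> 11) & 0x1f;

        if (p->valid) {             /* prefix in effect: concatenate */
            *rd  = ((unsigned)p->rd_hi  << 5) | rd_lo;
            *rs1 = ((unsigned)p->rs1_hi << 5) | rs1_lo;
            *rs2 = ((unsigned)p->rs2_hi << 5) | rs2_lo;
            p->valid = 0;           /* consumed by one instruction   */
        } else {                    /* no prefix: low 32 registers   */
            *rd = rd_lo; *rs1 = rs1_lo; *rs2 = rs2_lo;
        }
    }

    Letting the prefix cover two or three following instructions (or
    carry a SIMD bit, as SXAR does) only changes how long the valid
    flag stays set.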

    An alternative would be to place opcode bits variably such that
    the register fields are always in the same positions but are
    truncated for some encodings (based on fixed opcode information).
    I.e., the extra register field bits could be interpreted as
    opcode bits or register bits (or perhaps immediate bits).
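
    A sketch of that alternative, again with made-up field positions:
    the operand fields always start at the same bit, and a fixed part
    of the major opcode says whether their top bits are register bits
    or extra opcode bits, so rename can pick up register names from
    fixed positions before the opcode is fully resolved:

    #include <stdint.h>

    /* Hypothetical encoding: major opcode in bits 31..26; operand
       fields at fixed positions 20, 14, and 8.  "Wide" opcodes use
       6-bit register fields (64 names); the rest steal the top bit
       of each field as a 3-bit minor opcode, leaving 5-bit names. */
    static void decode_operands(uint32_t insn,
                                unsigned *rd, unsigned *rs1,
                                unsigned *rs2, unsigned *minor)
    {
        unsigned major = insn >> 26;
        int      wide  = (major < 32);     /* assumed split of space */
        unsigned mask  = wide ? 0x3f : 0x1f;

        *rd  = (insn >> 20) & mask;        /* fixed field positions  */
        *rs1 = (insn >> 14) & mask;
        *rs2 = (insn >>  8) & mask;

        /* In narrow encodings the stolen bits become minor opcode. */
        *minor = wide ? 0
                      : (((insn >> 25) & 1) << 2) |
                        (((insn >> 19) & 1) << 1) |
                         ((insn >> 13) & 1);
    }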

    Another possibility would be to have smaller special-purpose
    register fields that, when expanded, become general purpose.
    E.g., 8 address and 8 data registers might be expanded to 32
    general-purpose registers.
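
    A tiny sketch of that mapping (the 68000-style data/address split
    is my choice of example): the compact form's 3-bit field plus its
    address/data selector aliases the low half of a unified 32-entry
    file, and the expanded form simply adds one more bit:

    /* Hypothetical mapping: compact encodings name one of 8 data
       (indices 0-7) or 8 address (indices 8-15) registers with a
       3-bit field plus a selector bit; the expanded encoding adds
       one more bit, exposing all 32 entries as general purpose. */
    static unsigned reg_index(unsigned field3, unsigned is_addr,
                              unsigned hi1)
    {
        /* hi1 is 0 in the compact form, so compact names alias the
           low half of the unified register file. */
        return ((hi1 & 1u) << 4) | ((is_addr & 1u) << 3) | (field3 & 7u);
    }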

    The tradeoffs of contiguity of fields (register, opcode,
    immediate) and consistency of interpretation would seem to depend
    on criticality in the pipeline. Except for control flow
    instructions, register names seem more critical than opcode.
    Immediate fields are usually least critical. Relative control flow
    is one exception; recognizing a branch and its offset early can be
    beneficial. Similarly, loads from a stable base (e.g., global
    pointer, stack pointer, also signature caches) might start early
    with an earlier-known offset. Also, some immediate additions might
    be merged early (comparable to trace cache optimizations but not
    cached) or converted to a single dependency delay.

    ADD R5 ← R5 #100;
    BEZ R7 TARGET; // predicted not taken
    ADD R5 ← R5 #32;

    could be converted to

    ADD R5 ← R5 #132;

    (This would require recovering more slowly from an earlier
    checkpoint on a branch misprediction or a means of inserting a SUB
    R5 ← R5 #32 to correct the state.)

    ADD R6 ← R5 #100;
    ADD R5 ← R6 #32;

    could be converted to

    ADD R6 ← R5 #100;
    ADD R5 ← R5 #132;

    (Intel recently started optimizing this case.)
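
    A rough sketch of the decode-time rewrite for that second case
    (the uop record below is invented; no claim that this is Intel's
    mechanism): when a younger add-immediate reads the result of the
    immediately older one, re-point it at the older source with the
    immediates summed, so the two additions no longer serialize:

    #include <stdbool.h>
    #include <stdint.h>

    /* Minimal uop record for the sketch: dst = src + imm. */
    struct add_imm {
        unsigned dst, src;
        int64_t  imm;
    };

    /* If 'younger' consumes the destination of 'older', rewrite it
       to read the source of 'older' with the immediates folded,
       cutting the chain from two dependent additions to one.
       Overflow of the folded immediate is ignored here; real
       hardware would have to bound or check it. */
    static bool fold_add_imm(const struct add_imm *older,
                             struct add_imm *younger)
    {
        if (younger->src != older->dst)
            return false;
        younger->src  = older->src;
        younger->imm += older->imm;
        return true;
    }

    Applied to the example above, older = {R6, R5, 100} and
    younger = {R5, R6, 32} becomes {R5, R5, 132}, matching the
    rewritten sequence.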

    An argument might also be made that the operands for multiple
    operations could be beneficially encoded to exploit variability
    in the number of register operands, possibly including temporary
    results and destructive operations. (Maybe not a strong or a
    sensible argument, but an argument.☺)
    --- Synchronet 3.20a-Linux NewsLink 1.114