From Newsgroup: comp.arch
On 8/12/24 2:22 AM, Terje Mathisen wrote:
> Brett wrote:
>> [snip]
>> I can understand the reluctance to go to 6-bit register specifiers; it
>> burns up your opcode space and makes encoding everything more
>> difficult. But today that is an unserviced market which will get
>> customers to give you a look. Put out some vapor ware and see what
>> customers say.
>
> The solution (?) has always looked obvious to me: some form of Huffman
> encoding of register specifiers, so that the most common ones (bottom
> 16 or 32) require just a small amount of space (as today), and then
> either a prefix or a suffix provides extra bits when you want to use
> those higher register numbers. Mitch's CARRY sets up a single extra
> register for a set of operations; a WIDE prefix could contain two
> extra register bits for each of four registers over the next 2 or 3
> instructions.
>
> As long as this doesn't make the decoder a speed limiter, it would be
> zero cost for regular code and still quite cheap except for increasing
> code size by 33-50% for the inner loops of algorithms that need 64 or
> even 128 regs.
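A toy sketch of such a prefix scheme (field widths, the four-field layout,
and the prefix marker are all invented for illustration, not any real ISA):
registers r0-r31 fit a 5-bit field directly, while r32-r127 additionally
need a one-word WIDE prefix carrying the two high bits of each field.

```python
PREFIX_TAG = 1 << 31          # hypothetical marker bit identifying a WIDE prefix

def encode(regs):
    """Encode four register numbers (0-127) into 5-bit fields, emitting a
    WIDE prefix word only when some register exceeds 31."""
    assert len(regs) == 4 and all(0 <= r < 128 for r in regs)
    word = 0
    for i, r in enumerate(regs):
        word |= (r & 0x1F) << (5 * i)      # low 5 bits go in the base word
    if all(r < 32 for r in regs):
        return [word]                      # common case: zero extra cost
    pfx = PREFIX_TAG
    for i, r in enumerate(regs):
        pfx |= (r >> 5) << (2 * i)         # two high bits per field
    return [pfx, word]

def decode(words):
    """Invert encode(); returns the four register numbers."""
    hi = [0, 0, 0, 0]
    if words[0] & PREFIX_TAG:
        pfx, word = words
        hi = [(pfx >> (2 * i)) & 0x3 for i in range(4)]
    else:
        word = words[0]
    return [((word >> (5 * i)) & 0x1F) | (hi[i] << 5) for i in range(4)]
```

Code touching only the bottom 32 registers pays nothing; code reaching
higher pays one extra word per group of four specifiers, matching the
33-50% inner-loop growth estimate above.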
Fujitsu's SPARC64 VIIIfx had a Set XAR (eXtended Arithmetic
Register) instruction that provided three bits to four register
fields and a SIMD bit to up to two instructions.
(Of course, x86-64 has a prefix that adds one bit to two register
fields for one instruction. AVX512/AVX10 further extend the
register set to 32 entries.)
An alternative would be to have opcode bits be variably placed
such that the register fields are always in the same positions but
are truncated for some encodings (based on fixed
opcode information). I.e., the extra register field bits could be
interpreted as opcode bits or register bits (or perhaps immediate
bits).
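A toy decoder for that idea (the 3-bit major opcode, the field position,
and the table contents are invented for illustration): the register field
always starts at the same bit position, but a fixed major-opcode lookup
tells the decoder whether the field's top bit is a register bit or a
sub-opcode bit.

```python
# Hypothetical table: major opcode -> True if the full 6-bit field is a
# register number, False if the field is truncated to 5 bits.
USES_WIDE_REGS = {0b000: False, 0b001: True}

def decode_rd(word):
    """Return (major, rd, sub_opcode); sub_opcode is None when the full
    6-bit register field is in use."""
    major = word & 0x7                 # bits 0-2: major opcode, fixed
    if USES_WIDE_REGS[major]:
        rd = (word >> 3) & 0x3F        # full 6-bit field: r0-r63
        sub = None
    else:
        rd = (word >> 3) & 0x1F        # truncated 5-bit field: r0-r31
        sub = (word >> 8) & 0x1        # top bit reinterpreted as opcode
    return major, rd, sub
```

The register field starts at bit 3 either way, so the (more critical)
register name is available before the opcode is fully resolved.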
Another possibility would be to have smaller special-purpose
register fields that, when expanded, become general purpose. E.g., 8
address and 8 data registers might be expanded to 32 general-purpose
registers.
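A minimal sketch of that expansion (the widths and slot assignments are
invented for illustration): a legacy 4-bit specifier selects one of 8
address or 8 data registers, while an expanded 5-bit specifier reaches all
32 entries of a unified file, with the legacy names occupying fixed slots.

```python
# Hypothetical mapping: data registers d0-d7 occupy physical slots 0-7,
# address registers a0-a7 occupy slots 8-15, and expanded encodings can
# name slots 0-31 directly.

def legacy_decode(spec4):
    """4-bit legacy specifier: top bit selects the bank, low 3 bits the
    register within it. Returns (bank_name, index, unified_slot)."""
    bank, n = spec4 >> 3, spec4 & 0x7
    return ("a" if bank else "d", n, 8 * bank + n)

def expanded_decode(spec5):
    """5-bit expanded specifier indexes the unified 32-entry file directly."""
    return spec5 & 0x1F
```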
The tradeoffs of contiguity of fields (register, opcode,
immediate) and consistency of interpretation would seem to depend
on criticality in the pipeline. Except for control flow
instructions, register names seem more critical than opcode.
Immediate fields are usually least critical. Relative control flow
is one exception; recognizing a branch and the offset early can be
beneficial. Similarly, loads from a stable base (e.g., global
pointer, stack pointer, also signature caches) might start early
with an earlier-known offset. Also, some immediate additions might
be merged early (comparable to trace cache optimizations but not
cached) or converted to a single dependency delay.
ADD R5 ← R5 #100;
BEZ R7 TARGET; // predicted not taken
ADD R5 ← R5 #32;
could be converted to
ADD R5 ← R5 #132;
(This would require recovering more slowly from an earlier
checkpoint on a branch misprediction or a means of inserting a SUB
R5 ← R5 #32 to correct the state.)
ADD R6 ← R5 #100;
ADD R5 ← R6 #32;
could be converted to
ADD R6 ← R5 #100;
ADD R5 ← R5 #132;
(Intel recently started optimizing this case.)
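The second merge (the dependency-shortening one, which needs no recovery
mechanism) can be sketched as a peephole pass over a toy IR of
(op, dst, src, imm) tuples; the IR and pass are illustrative inventions,
not a description of Intel's implementation. When "ADD rY ← rX #a" feeds
"ADD rX ← rY #b", the second add is re-sourced to read rX directly with
immediate a+b, cutting the chain to a single dependency level.

```python
def merge_adds(insts):
    """Rewrite adjacent dependent immediate adds so the second one reads
    the first one's source directly, with the summed immediate."""
    out = list(insts)
    for i in range(len(out) - 1):
        op1, d1, s1, k1 = out[i]
        op2, d2, s2, k2 = out[i + 1]
        # The second add must consume the first's result, and the first
        # must not have overwritten its own source (else re-sourcing is
        # unsound).
        if op1 == op2 == "ADD" and s2 == d1 and s1 != d1:
            out[i + 1] = ("ADD", d2, s1, k1 + k2)
    return out
```

Applied to the example above, ADD R5 ← R6 #32 becomes ADD R5 ← R5 #132
while ADD R6 ← R5 #100 is kept, since R6 remains architecturally live.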
An argument might also be made that the operands for multiple
operations could be beneficially encoded to exploit variability
in the number of register operands, possibly including temporary
results and destructive operations. (Maybe not a strong or a
sensible argument, but an argument.☺)
--- Synchronet 3.20a-Linux NewsLink 1.114