I was studying the IA-64 reciprocal estimate. It takes two source
operands, as if a divide were taking place, and checks for the special
values of a divide operation, setting a predicate bit if the subsequent
instructions should not execute. PowerPC does not do this; its estimate
takes only a single source operand.

The IA-64 estimate seems a bit unusual to me, but there is a certain
amount of sense in checking for divide issues, since most likely a
divide happens next.

I am considering replicating something similar, but branching instead
of using predicates. It seems to me a number of special cases could
benefit from bypassing the divide code. Reciprocals of values like 1/2,
1/4, etc. are easy to calculate exactly.
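
A minimal sketch of the branch-around idea, in Python on IEEE 754 binary64 bit patterns. This is only an illustration, not the IA-64 or PowerPC behaviour: the helper names (`exact_reciprocal`, `divide`) are hypothetical, and Python's `/` stands in for the full divide sequence. A divisor of the form ±2^k has an all-zero fraction field, so its reciprocal can be formed by negating the unbiased exponent, and multiplying by it rounds the same way as the divide would.

```python
import struct

FRAC_BITS = 52      # binary64 fraction field width
EXP_MASK = 0x7FF    # binary64 exponent field mask
BIAS = 1023         # binary64 exponent bias

def to_bits(x):
    """Reinterpret a Python float as its 64-bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(bits):
    """Reinterpret a 64-bit pattern as a Python float."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def exact_reciprocal(b):
    """Return 1/b when b is a normal +/-2**k whose reciprocal is also normal.

    Returns None for every other input (zero, subnormal, infinity, NaN,
    or any value with a non-zero fraction field).
    """
    bits = to_bits(b)
    sign = bits >> 63
    exp = (bits >> FRAC_BITS) & EXP_MASK
    frac = bits & ((1 << FRAC_BITS) - 1)
    if frac != 0 or exp == 0 or exp == EXP_MASK:
        return None
    recip_exp = 2 * BIAS - exp          # negate the unbiased exponent
    if recip_exp == 0:                  # reciprocal would be subnormal
        return None
    return from_bits((sign << 63) | (recip_exp << FRAC_BITS))

def divide(a, b):
    """Take the multiply shortcut for power-of-two divisors, else divide."""
    r = exact_reciprocal(b)
    if r is not None:
        return a * r                    # same real value as a / b, same rounding
    return a / b                        # stand-in for the general divide path
```

For example, `divide(3.0, 4.0)` takes the shortcut, while `divide(1.0, 3.0)` falls through to the ordinary divide.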
Robert Finch <robfi680@gmail.com> posted:
> I was studying the IA-64 reciprocal estimate, and it has two source
> operands as if a divide were taking place, then it checks for special
> values for a divide operation, setting a predicate bit if subsequent
> instructions should not execute. PowerPC does not do this, taking only a
> single source operand.

In practice, there is ½ a Newton-Raphson iteration difference.

Figuring out that the FDIV divisor is a power of 2 is a job for the
compiler 80% of the time.

> The IA64 estimate seems a bit unusual to me, but there is a certain
> amount of sense checking for divide issues since most likely a divide
> happens next.

IA-64 assumes that there are 2 FMAC units per core, and the FDIV code
sequence is designed for that. {Markstein has several papers on FDIV,
FSQRT and the transcendentals for IA-64.}

> I am considering replicating something similar, but branching instead
> of using predicates. Seems to me a number of special cases could benefit
> bypassing the divide code. Values like 1/2, 1/4, etc. are easy to
> calculate exact reciprocals for.

They are also easy to find in the divisor and allow taking a different
sequence through FDIV. So, as I see it, this is something HW should be
doing. After HW does this, SW can still gain something. It is only when
HW does nothing that SW alone can add significant performance (remember,
FDIV is typically under 2% of instructions executed).
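
For readers who want to see what a Newton-Raphson iteration buys, here is a rough Python sketch of reciprocal refinement. It is not the IA-64 sequence: `seed_reciprocal` is a hypothetical stand-in for a hardware estimate instruction (it simply truncates the true reciprocal to a few bits), and the two-step form in `refine` only mimics the shape of an FMAC-based update without using a fused operation. In exact arithmetic each step squares the relative error, so the number of good bits roughly doubles.

```python
import math

def seed_reciprocal(b, bits=8):
    """Crude stand-in for a hardware estimate: 1/b truncated to `bits` bits.

    Assumes b is finite and non-zero. The relative error is below 2**-(bits-1).
    """
    m, e = math.frexp(1.0 / b)          # 1/b = m * 2**e, 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)

def refine(b, x):
    """One Newton-Raphson step for 1/b: x' = x + x * (1 - b * x)."""
    err = 1.0 - b * x                   # residual of the current estimate
    return x + x * err

def reciprocal(b, iterations=3):
    """Refine the seed `iterations` times."""
    x = seed_reciprocal(b)
    for _ in range(iterations):
        x = refine(b, x)
    return x
```

With the 8-bit seed assumed here, three steps are enough to reach roughly binary64 accuracy; a better seed saves steps, which is why the quality of the estimate matters.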
On 3/25/2026 4:53 PM, MitchAlsup wrote:
> [snip]

Fwiw, my code loves nice floating points...
https://paulbourke.net/fractals/multijulia
Let me zoom before exploding...
On 2026-03-26 4:20 a.m., Chris M. Thomasson wrote:
> [snip]
> Fwiw, my code loves nice floating points...
> https://paulbourke.net/fractals/multijulia
> Let me zoom before exploding...

Beautiful fractals.
*****
Hit a divide anomaly.
Made better use of the BRAM storing reciprocal values. The estimate is
now good to better than 11 bits. That makes half-precision division
faster than multiply. Half-precision divides take only two clock cycles.
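
A rough Python model of why an 11-bit estimate is enough for half precision. Binary16 has an 11-bit significand, so a lookup followed by one multiply gets very close to the quotient. The table layout below is purely an assumption for illustration (1,024 entries indexed by the 10 fraction bits, 16 fractional bits per entry), not the actual BRAM organisation, and `half_divide` is not guaranteed to be correctly rounded in every case.

```python
import struct

INDEX_BITS = 10   # assumption: index with all 10 binary16 fraction bits
ENTRY_BITS = 16   # assumption: fractional bits kept per table entry

# RECIP_TABLE[f] is 1 / (1 + f / 2**INDEX_BITS), scaled by 2**ENTRY_BITS.
RECIP_TABLE = [
    round((1 << ENTRY_BITS) / (1.0 + f / (1 << INDEX_BITS)))
    for f in range(1 << INDEX_BITS)
]

def to_half(x):
    """Round a Python float to the nearest binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_fields(x):
    """Split a binary16-representable value into (sign, exponent, fraction)."""
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

def recip_estimate(b):
    """Table-based estimate of 1/b for a normal binary16 divisor."""
    sign, exp, frac = half_fields(b)
    if exp == 0 or exp == 31:
        raise ValueError("only normal divisors are modelled here")
    mant_recip = RECIP_TABLE[frac] / (1 << ENTRY_BITS)   # in (0.5, 1]
    value = mant_recip * 2.0 ** (15 - exp)               # undo the biased exponent
    return -value if sign else value

def half_divide(a, b):
    """Lookup plus one multiply, rounded to binary16."""
    return to_half(a * recip_estimate(b))

def max_relative_error():
    """Worst relative error of the table over all 1,024 significands."""
    worst = 0.0
    for f in range(1 << INDEX_BITS):
        m = 1.0 + f / (1 << INDEX_BITS)
        est = RECIP_TABLE[f] / (1 << ENTRY_BITS)
        worst = max(worst, abs(est * m - 1.0))
    return worst
```

Under these assumptions the table's worst-case relative error is well under 2^-11. A real design would pick the index and entry widths to fit the available BRAM.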