• [gawk] FP precision

    From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Jan 20 10:56:19 2023
    From Newsgroup: comp.lang.awk

    In an article about "AWK As A Major Systems Programming Language"[*]
    (in chapter 5.3 "Future Work") we can read:
    "Some issues are known and may not be resolvable. For example,
    64-bit integer values such as the timestamps in stat() data on
    modern systems don’t fit into awk’s 64-bit double-precision
    numbers which only have 53 bits of significand. This is also a
    problem for the bit-manipulation functions."
    I was a bit astonished to read that; I thought that IEEE 80-bit FP
    (with a 64 bit mantissa) would be standard nowadays. Not in GNU Awk,
    or, generally not in applications?

    This (and other answers) on a SO post[**] may address the question:
    "That is, you may have 32-bit or 64-bit variables, but when they
    are loaded into the FPU registers, they are converted to 80 bit;
    the FPU then (by default) performs all calculations in 80 bit;
    after the calculation, the result is stored back into a 32-bit
    or 64-bit variables."
    So it's standard only in FPUs and losses are accepted when passing
    values from FPUs to memory entities (presumably for performance
    reasons)?

    Janis

    [*] http://www.skeeve.com/awk-sys-prog.html

    [**] https://stackoverflow.com/questions/612507/what-are-the-applications-benefits-of-an-80-bit-extended-precision-data-type
    --- Synchronet 3.20a-Linux NewsLink 1.113
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.awk on Fri Jan 20 09:53:40 2023
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    In an article about "AWK As A Major Systems Programming Language"[*]
    (in chapter 5.3 "Future Work") we can read:
    "Some issues are known and may not be resolvable. For example,
    64-bit integer values such as the timestamps in stat() data on
    modern systems don’t fit into awk’s 64-bit double-precision
    numbers which only have 53 bits of significand. This is also a
    problem for the bit-manipulation functions."
    I was a bit astonished to read that; I thought that IEEE 80-bit FP
    (with a 64 bit mantissa) would be standard nowadays. Not in GNU Awk,
    or, generally not in applications?

    True -- but a 64-bit time_t value can be stored in a 64-bit IEEE double
    without loss of information as long as it's not too big. The smallest
    positive integer that can't be represented exactly in a 64-bit IEEE
    double is 2**53+1; as a time_t, that's around 285 million years in the
    future.
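
    As a rough check of that estimate, the arithmetic can be sketched in
    Python (used here only as a calculator; the 365.25-day year is an
    assumption and the names are illustrative):

```python
# Assumption: a Julian year of 365.25 days; time_t counts whole seconds.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

first_inexact = 2**53 + 1      # smallest positive integer a double cannot hold
years = first_inexact / SECONDS_PER_YEAR

print(f"{years:.3e} years")    # on the order of 10**8 years
```

    The result lands in the hundreds of millions of years, far beyond any
    timestamp a stat() call is likely to return.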

    Other 64-bit integer values can be a problem. Large values will lose
    their low-order bits.

    $ gawk 'BEGIN{print(2**53-1); print(2**53); print(2**53+1)}'
    9007199254740991
    9007199254740992
    9007199254740992
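
    The same effect shows up with realistic 64-bit values such as
    nanosecond-resolution timestamps. A minimal sketch in Python, assuming
    its float is an IEEE 754 double like awk's numbers; the timestamp is
    made up for illustration:

```python
import math

# Hypothetical nanosecond-resolution timestamp from around January 2023.
ns = 1_674_212_179_123_456_789

as_double = float(ns)          # what a double-based awk would store
round_trip = int(as_double)

print(ns)
print(round_trip)              # differs from ns in its low-order digits
print(math.ulp(as_double))     # gap between adjacent doubles at this magnitude
```

    At this magnitude adjacent doubles are hundreds of units apart, so the
    low-order nanoseconds cannot survive the conversion.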
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for XCOM Labs
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Jan 20 19:55:37 2023
    From Newsgroup: comp.lang.awk

    On 20.01.2023 18:53, Keith Thompson wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    In an article about "AWK As A Major Systems Programming Language"[*]
    (in chapter 5.3 "Future Work") we can read:
    "Some issues are known and may not be resolvable. For example,
    64-bit integer values such as the timestamps in stat() data on
    modern systems don’t fit into awk’s 64-bit double-precision
    numbers which only have 53 bits of significand. This is also a
    problem for the bit-manipulation functions."
    I was a bit astonished to read that; I thought that IEEE 80-bit FP
    (with a 64 bit mantissa) would be standard nowadays. Not in GNU Awk,
    or, generally not in applications?

    True -- but a 64-bit time_t value can be stored in a 64-bit IEEE double
    without loss of information as long as it's not too big.

    Yes. - My point was that a [standard] 80 bit FP number would have a
    64 bit mantissa that allows lossless storage (in the mantissa) while
    simply ignoring the signs/exponent parts; for gawk 64 bit operations
    and 64 bit time_t. If implementation [of gawk] would technically use
    an 80 bit "carrier" (instead of a 64 bit "long integer") the issues
    mentioned might not be an issue. Or are you saying that the "problem"
    mentioned by the article is just a gawk implementation issue to not
    use the "64 bit carrier" sophisticatedly (and instead unnecessarily
    try to use only the 53 bit mantissa or the long integer)?
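
    To make the 53-bit versus 64-bit comparison concrete, here is a small
    sketch that rounds an integer to a given number of significand bits.
    round_to_significand is a hypothetical helper, not anything from gawk;
    it only models round-to-nearest-even on non-negative integers and
    ignores exponent range:

```python
def round_to_significand(n: int, p: int) -> int:
    """Round non-negative integer n to p significant bits, ties to even."""
    excess = n.bit_length() - p
    if excess <= 0:
        return n                       # already fits in p bits
    half = 1 << (excess - 1)
    q, r = divmod(n, 1 << excess)      # q keeps the top p bits, r is dropped
    if r > half or (r == half and q & 1):
        q += 1                         # round up (ties go to the even q)
    return q << excess


big = 2**63 - 1                        # largest signed 64-bit value

print(round_to_significand(big, 53) == big)              # double-sized significand
print(round_to_significand(big, 64) == big)              # 64-bit significand
print(round_to_significand(big, 53) == int(float(big)))  # compare with a real double
```

    With 64 significand bits every 64-bit integer passes through unchanged;
    with 53 bits the value is rounded exactly as a hardware double rounds it.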

    The smallest
    positive integer that can't be represented exactly in a 64-bit IEEE
    double is 2**53+1; as a time_t, that's around 285 million years in the
    future.

    [...]

    Janis


  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.awk on Fri Jan 20 11:40:21 2023
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    On 20.01.2023 18:53, Keith Thompson wrote:
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
    In an article about "AWK As A Major Systems Programming Language"[*]
    (in chapter 5.3 "Future Work") we can read:
    "Some issues are known and may not be resolvable. For example,
    64-bit integer values such as the timestamps in stat() data on
    modern systems don’t fit into awk’s 64-bit double-precision
    numbers which only have 53 bits of significand. This is also a
    problem for the bit-manipulation functions."
    I was a bit astonished to read that; I thought that IEEE 80-bit FP
    (with a 64 bit mantissa) would be standard nowadays. Not in GNU Awk,
    or, generally not in applications?

    True -- but a 64-bit time_t value can be stored in a 64-bit IEEE double
    without loss of information as long as it's not too big.

    Yes. - My point was that a [standard] 80 bit FP number would have a
    64 bit mantissa that allows lossless storage (in the mantissa) while
    simply ignoring the signs/exponent parts; for gawk 64 bit operations
    and 64 bit time_t. If implementation [of gawk] would technically use
    an 80 bit "carrier" (instead of a 64 bit "long integer") the issues
    mentioned might not be an issue. Or are you saying that the "problem"
    mentioned by the article is just a gawk implementation issue to not
    use the "64 bit carrier" sophisticatedly (and instead unnecessarily
    try to use only the 53 bit mantissa or the long integer)?

    The smallest
    positive integer that can't be represented exactly in a 64-bit IEEE
    double is 2**53+1; as a time_t, that's around 285 million years in the
    future.

    [...]

    Different implementations of awk might use different floating-point
    representations on different platforms. I don't think the
    characteristics of floating-point are defined by the language.
    (Is there even a formal language definition?)

    There aren't many systems these days that don't use IEEE floating-point,
    but awk could probably be supported on such systems. There's a VMS port
    of gawk, and VAX floating-point probably uses a different mantissa size.

    My point, I guess, is that typical awk implementations store numbers in
    64-bit IEEE floating-point, which means they can't store full 64-bit
    integers (and can lose precision silently) -- but time_t values are
    probably not the best illustration of that issue, because while time_t
    is typically a signed 64-bit integer, most time_t values fit in 32 bits
    (33 starting in 2038, which still won't be a problem for awk).
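
    The bit counts can be checked with a short Python sketch. The 2023
    timestamp is a hypothetical example value, and Python's float is
    assumed to be the same IEEE double that awk uses:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical time_t value from January 2023.
t_2023 = 1_674_212_179
print(t_2023.bit_length())             # magnitude bits, sign bit not counted

# First instant whose magnitude needs 32 bits (so 33 with the sign bit).
rollover = EPOCH + timedelta(seconds=2**31)
print(rollover.isoformat())

# Both values are far below 2**53, so a double stores them exactly.
print(float(t_2023) == t_2023, float(2**31) == 2**31)
```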
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for XCOM Labs
    void Void(void) { Void(); } /* The recursive call of the void */
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.lang.awk on Sat Jan 21 08:33:35 2023
    From Newsgroup: comp.lang.awk

    On 2023-01-20, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Different implementations of awk might use different floating-point
    representations on different platforms. I don't think the
    characteristics of floating-point are defined by the language.
    (Is there even a formal language definition?)

    In defining Awk, the POSIX spec defers a lot of details to C,
    which seems like handwaving, but actually pins things down.

    E.g. in the area of numeric constants:

    The token NUMBER shall represent a numeric constant. Its form and
    numeric value shall either be equivalent to the
    decimal-floating-constant token as specified by the ISO C standard, or
    it shall be a sequence of decimal digits and shall be evaluated as an
    integer constant in decimal. In addition, implementations may accept
    numeric constants with the form and numeric value equivalent to the
    hexadecimal-constant and hexadecimal-floating-constant tokens as
    specified by the ISO C standard.

    If the value is too large or too small to be representable (see
    Concepts Derived from the ISO C Standard), the behavior is undefined.

    This "Concepts Derived from the ISO C Standard" section is a general
    one, outside of the Awk chapter.

    https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html#tag_17_01_02

    It's a large section, whose earliest paragraphs are relevant to Awk
    numerics.

    1.1.2 Concepts Derived from the ISO C Standard

    Some of the standard utilities perform complex data manipulation using
    their own procedure and arithmetic languages, as defined in their
    EXTENDED DESCRIPTION or OPERANDS sections. Unless otherwise noted, the
    arithmetic and semantic concepts (precision, type conversion, control
    flow, and so on) shall be equivalent to those defined in the ISO C
    standard, as described in the following sections. Note that there is no
    requirement that the standard utilities be implemented in any particular
    programming language.

    Arithmetic Precision and Operations

    Integer variables and constants, including the values of operands and
    option-arguments, used by the standard utilities listed in this volume
    of POSIX.1-2017 shall be implemented as equivalent to the ISO C standard
    signed long data type; floating point shall be implemented as equivalent
    to the ISO C standard double type. Conversions between types shall be as
    described in the ISO C standard. All variables shall be initialized to
    zero if they are not otherwise assigned by the input to the application.

    [ ... ]
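
    What "signed long" and "double" amount to on a given machine can be
    inspected with a short sketch. This assumes CPython, whose float wraps
    the platform's C double; it says nothing about how any particular awk
    is built:

```python
import ctypes
import sys

# CPython's float wraps the C double, so these describe the platform double.
print(sys.float_info.mant_dig)     # significand bits (53 for IEEE 754 binary64)
print(sys.float_info.dig)          # decimal digits that always round-trip

# Width of the C signed long on this platform; it varies between systems.
long_bits = 8 * ctypes.sizeof(ctypes.c_long)
print(long_bits)

# If long has more value bits than the significand, some long values
# have no exact double representation.
print(long_bits - 1 > sys.float_info.mant_dig)
```

    On a typical 64-bit Unix system long is 64 bits wide while the double
    significand is 53 bits, which is the gap discussed in this thread.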
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca