• Re: GAWK: Converting a number to a string (the hard way!) - Discuss!

    From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Fri Jan 14 08:00:08 2022
    From Newsgroup: comp.lang.awk

    On 13.01.2022 14:40, Kenny McCormack wrote:

    Note, BTW, that I have verified that when you printf with %c, it only uses the low 8 bits of the number you pass in. So, you don't need to do any "AND"ing.

    I also used that assumption in my code upthread but forgot to point
    out that this is not reliable or is generally even not true because
    that depends on the locale that you have set. Just two samples from
    a Unix context...

    $ printf "%s\n" 65 65601 | LC_ALL=C awk '{printf "%c\n", $0}' | od -c -tx1
    0000000   A  \n   A  \n
             41  0a  41  0a

    $ printf "%s\n" 65 65601 | LC_ALL=C.UTF-8 awk '{printf "%c\n", $0}' | od -c -tx1
    0000000   A  \n 360 220 201 201  \n
             41  0a  f0  90  81  81  0a

    So depending on context and requirements the AND'ing might still be
    necessary or the locale explicitly adjusted (as in the sample here).
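
    A minimal sketch of the two options named above, combined: mask the
    value explicitly and force the C locale for the awk call. The helper
    name low_byte_char is hypothetical, and "% 256" stands in for a
    bitwise AND so that the sketch does not depend on gawk's and().

```shell
# Hypothetical helper: print the character for the low 8 bits of a number.
low_byte_char() {
    # LC_ALL=C makes %c emit a single byte; "% 256" is the explicit mask.
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%c", n % 256 }'
}

low_byte_char 65      # prints A
low_byte_char 65601   # also prints A, since 65601 = 256*256 + 65
echo
```

    With both measures in place the result no longer depends on the
    locale of the calling shell.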

    Janis

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Fri Jan 14 14:29:46 2022
    From Newsgroup: comp.lang.awk

    In article <srr71o$ll4$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
    On 13.01.2022 14:40, Kenny McCormack wrote:

    Note, BTW, that I have verified that when you printf with %c, it only uses
    the low 8 bits of the number you pass in. So, you don't need to do any
    "AND"ing.

    I also used that assumption in my code upthread but forgot to point
    out that this is not reliable or is generally even not true because
    that depends on the locale that you have set. Just two samples from
    a Unix context...

    I get it, but I am not too concerned about it. Since this method already assumes 32 bits and little-endian, I would just add to the list of
    assumptions: "No goofy locale settings". I.e., it works in the C locale.

    In fact, on almost all of my machines, I put code in my startup files to
    unset any locale related environment variables and/or set them to just "C". Makes life a lot more predictable.

    BTW(1), this is sort of the genesis of this thread. I was looking for a more straightforward way to do it - that wouldn't depend on so many simplifying assumptions in order to work. Seems there ought to be a simpler way to
    just put 4 bytes into a string. That's what I was hoping for...
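
    For readers without the upthread code at hand, here is a minimal
    sketch of the byte-by-byte approach under the assumptions listed
    above (32-bit value, little-endian order, C locale). It is not the
    code from upthread; the function name pack_le32 is hypothetical, and
    division and modulo replace shifts and ANDs so that any POSIX awk
    can run it.

```shell
# Hypothetical helper: write the four little-endian bytes of a 32-bit value.
pack_le32() {
    # The C locale is forced so that each %c writes exactly one byte.
    LC_ALL=C awk -v n="$1" 'BEGIN {
        for (i = 0; i < 4; i++) {
            printf "%c", n % 256   # emit the current low 8 bits
            n = int(n / 256)       # drop them (shift right by 8 bits)
        }
    }'
}

# 1145258561 is 0x44434241, so the bytes written are 41 42 43 44 ("ABCD").
pack_le32 1145258561 | od -An -tx1
```

    The test value was chosen to avoid zero bytes, which keeps the result
    easy to inspect as plain text.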

    BTW(2), TAWK has this covered - there are functions "pack" and "unpack"
    specifically for this sort of thing - packing values into (and unpacking
    out of) strings that act as structs that you pass to low-level routines.
    Of course, the fact that TAWK directly supports access to low-level
    routines obliges it to provide these functionalities. Native GAWK does
    not (yet) provide access to low-level stuff. The dialect of GAWK that I
    program in does.

    Of course, I could make this whole problem go away by writing yet another extension lib to do it - but I was trying to avoid doing that.
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Sat Jan 15 02:25:56 2022
    From Newsgroup: comp.lang.awk

    On 14.01.2022 15:29, Kenny McCormack wrote:

    I get it, but I am not too concerned about it. Since this method already assumes 32 bits and little-endian, I would just add to the list of assumptions: "No goofy locale settings". I.e., it works in the C locale.

    Fair enough. For others here it might be a fact to consider to not
    get surprised.


    Of course, I could make this whole problem go away by writing yet another extension lib to do it - but I was trying to avoid doing that.

    And that (with GNU Awk) would be the way to go.

    Janis

  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Mon Jan 17 02:37:43 2022
    From Newsgroup: comp.lang.awk

    On Friday, January 14, 2022 at 8:26:02 PM UTC-5, Janis Papanagnou wrote:
    On 14.01.2022 15:29, Kenny McCormack wrote:

    I get it, but I am not too concerned about it. Since this method already assumes 32 bits and little-endian, I would just add to the list of assumptions: "No goofy locale settings". I.e., it works in the C locale.
    Fair enough. For others here it might be a fact to consider to not
    get surprised.

    Of course, I could make this whole problem go away by writing yet another extension lib to do it - but I was trying to avoid doing that.
    And that (with GNU Awk) would be the way to go.

    Janis

    If you want it to be consistent regardless of locale settings, just add a multiple of 256 that is larger than 0x10FFFF (here 8^7 = 0x200000):

    LC_ALL="UTF-8" gawk -e 'BEGIN { printf("%c",65601+8^7) }' | od -baxco
    0000000   101
                A
             0041
                A
           000101
    0000001
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Mon Jan 17 02:39:42 2022
    From Newsgroup: comp.lang.awk

    If you want it to be consistent regardless of locale settings, add a multiple of 256 that is larger than 0x10FFFF (here 8^7 = 0x200000):

    % LC_ALL="UTF-8" gawk -e 'BEGIN { printf("%c",65601+8^7) }' | od -baxco
    0000000   101
                A
             0041
                A
           000101
    0000001

    % LC_ALL="UTF-8" gawk -e 'BEGIN { printf("%c",65601) }' | od -baxco
    0000000   360 220 201 201
                ?  90  81  81
             90f0    8181
              360 220 201 201
           110360  100601
    0000004
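
    The arithmetic behind the offset can be checked separately. The
    sketch below only verifies the numbers: 8^7 is a multiple of 256, it
    lies above the last Unicode code point 0x10FFFF (1114111), and adding
    it leaves the low 8 bits unchanged. The helper name check_offset is
    hypothetical. That gawk then falls back to a single byte for such an
    out-of-range value is what the od output above shows; it is taken
    here as observed behavior, not as a documented guarantee.

```shell
# Hypothetical helper: sanity-check the 8^7 offset for a given value.
check_offset() {
    awk -v n="$1" 'BEGIN {
        off = 8^7    # 2097152, i.e. 0x200000
        # Fields: off mod 256, whether off > 0x10FFFF, low byte of n + off
        printf "%d %d %d\n", off % 256, (off > 1114111), (n + off) % 256
    }'
}

check_offset 65601   # prints: 0 1 65
```

    The last field, 65, is the code for "A", which matches the single
    byte 101 (octal) in the first od dump.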