• number to binary / binary to number

    From Digi@cosmogen@gmail.com to comp.lang.awk on Sat Apr 16 05:49:06 2022
    From Newsgroup: comp.lang.awk

    What is "heavy" in gawk?

    Here is an example of what I mean:

    If you're parsing binary structures, generating a graphics file, cutting an audio file or doing anything similar, you need to work with numbers in their binary form:

    Thus the string "0123" (hex bytes 30 31 32 33) is the number 0x33323130 when read as a little-endian 32-bit integer.
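    The arithmetic behind that example, assuming the first byte of the string is the least significant one (little-endian, the byte order x86 uses):

```latex
\texttt{"0123"} = (\mathtt{30}, \mathtt{31}, \mathtt{32}, \mathtt{33})_{16}
\;\longmapsto\;
\mathtt{0x30} + \mathtt{0x31}\cdot 2^{8} + \mathtt{0x32}\cdot 2^{16} + \mathtt{0x33}\cdot 2^{24}
= \mathtt{0x33323130} = 858927408
```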

    So to convert a number to its binary form I need something like this:


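    # run with gawk -b so that %c produces single bytes for 128..255
    BEGIN {
        for ( i = 0; i < 256; i++ )          # byte <-> value lookup tables
            ASC[ CHR[ i ] = sprintf( "%c", i ) ] = i

        n = numtobin32( 0x1234567 )

        print dump( n )
    }

    # least significant byte first (little-endian)
    func numtobin32( n ) {
        return CHR[ and( 0xFF, n ) ] \
               CHR[ and( 0xFF, rshift( n, 8 ) ) ] \
               CHR[ and( 0xFF, rshift( n, 16 ) ) ] \
               CHR[ and( 0xFF, rshift( n, 24 ) ) ] }

    # dump(): hypothetical helper (not defined in the post), lists the bytes in hex
    func dump( t ,i,s ) {
        for ( i = 1; i <= length( t ); i++ )
            s = s sprintf( "%02x ", ASC[ substr( t, i, 1 ) ] )
        return s }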

    The opposite conversion looks even worse:

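    # walks the bytes from last to first: the last byte is the most
    # significant, matching the order numtobin32 writes
    func bintonum( t ,a,r,A ) {
        a = split( t, A, "" ) + 1
        r = 0
        while ( --a in A )
            r = lshift( r, 8 ) + ASC[ A[ a ] ]
        return r + 0 }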

    The paradox is that both conversions above merely turn four bytes of data into another four bytes: exactly the same four bytes ...
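    A quick round-trip check of the two conversions, as a minimal sketch: it assumes gawk (for and(), rshift() and lshift()) running with -b under LC_ALL=C so that %c yields single bytes, and le32_roundtrip is a made-up name. Its bintonum walks the bytes from last to first, so it reads the same least-significant-byte-first order that numtobin32 writes, which is what makes the two functions inverses.

```sh
# Minimal sketch: round-trip a number through numtobin32 and bintonum.
# Assumes gawk; -b and LC_ALL=C make %c emit single bytes.
# le32_roundtrip (made-up name) prints the bytes in hex, then the value read back.
le32_roundtrip() {
    LC_ALL=C gawk -b -v n="$1" '
    function numtobin32(n) {                   # least significant byte first
        return CHR[and(0xFF, n)] CHR[and(0xFF, rshift(n, 8))] \
               CHR[and(0xFF, rshift(n, 16))] CHR[and(0xFF, rshift(n, 24))]
    }
    function bintonum(t,    a, r) {            # last byte is the most significant
        r = 0
        for (a = length(t); a >= 1; a--)
            r = lshift(r, 8) + ASC[substr(t, a, 1)]
        return r
    }
    BEGIN {
        for (i = 0; i < 256; i++)              # byte <-> value lookup tables
            ASC[CHR[i] = sprintf("%c", i)] = i
        t = numtobin32(n)
        for (i = 1; i <= length(t); i++)       # hex dump of the 4-byte string
            printf "%02x ", ASC[substr(t, i, 1)]
        printf "-> %d\n", bintonum(t)
    }'
}
```

    Called as le32_roundtrip 19088743 (that is 0x01234567) it should print 67 45 23 01 -> 19088743: the bytes come out lowest first, and decoding restores the original value.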
    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Sat Apr 16 15:06:26 2022

    On 16.04.2022 14:49, Digi wrote:
    > What is "heavy" in gawk?
    > [...]


    You haven't provided any question or statement in your post. Nonetheless
    I'd say that the answer to your thoughts is @include "binary_ops", where
    the library is what you may want to provide.

    Janis

  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Sat Apr 16 15:21:06 2022

    On 16.04.2022 15:06, Janis Papanagnou wrote:
    > On 16.04.2022 14:49, Digi wrote:
    >> What is "heavy" in gawk?
    >> [...]
    >
    > You haven't provided any question or statement in your post. Nonetheless
    > I'd say that the answer to your thoughts is @include "binary_ops", where
    > the library is what you may want to provide.

    I saw that there's an 'ordchr' extension in gawk's extension directory
    which may help reduce your own implementation effort.
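    gawk's bundled ordchr extension provides ord() and chr(), which can stand in for the 256-entry CHR/ASC tables from the first post. A minimal sketch, assuming gawk was installed with its sample extensions so that @load "ordchr" works; le32_via_ordchr is a made-up name. The (x + 256) % 256 step is only a precaution, in case ord() returns a negative value for bytes above 127 on platforms where C's char is signed.

```sh
# Sketch assuming gawk with its bundled "ordchr" extension (ord and chr).
# le32_via_ordchr (made-up name): encode N as 4 little-endian bytes with chr(),
# decode them again with ord(), and print the value read back.
le32_via_ordchr() {
    LC_ALL=C gawk -b -v n="$1" '
    @load "ordchr"
    function numtobin32(n) {                   # least significant byte first
        return chr(and(n, 255)) chr(and(rshift(n, 8), 255)) \
               chr(and(rshift(n, 16), 255)) chr(and(rshift(n, 24), 255))
    }
    function bintonum(t,    a, r) {            # last byte is the most significant
        r = 0
        for (a = length(t); a >= 1; a--)
            r = r * 256 + (ord(substr(t, a, 1)) + 256) % 256   # precaution: ord() may be signed
        return r
    }
    BEGIN { printf "%d\n", bintonum(numtobin32(n)) }'
}
```

    If the extension loads, le32_via_ordchr 3850942269 should print 3850942269 again.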


    Janis


  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Sat Apr 16 13:47:57 2022

    In article <613c5e48-d8f9-4df6-8c3e-f78462669035n@googlegroups.com>,
    Digi <cosmogen@gmail.com> wrote:
    > What is "heavy" in gawk?
    >
    > Here is an example of what I mean:
    >
    > If you're parsing binary structures, generating a graphics file, cutting an audio file or doing anything similar, you need to work with numbers in their binary form:

    Yes, this is a weakness in the AWK model. The reason we program in AWK in
    the first place is so that we don't have to worry about this stuff - i.e.,
    the underlying representations of numbers and strings. We can just program
    as if numbers and strings are native (and basically interchangeable) types.

    But, every so often, we have a need to go below the surface. To get at the underlying bits and bytes. In my case, this is usually when I want to
    access some underlying Unix/Linux functionality that GAWK doesn't currently provide. For example, I recently decided to re-implement the "touch"
    utility in GAWK. To do so, I had to be able to access (one of the many) "utime" function(s) from GAWK. As it happens, accessing the function
    itself was easy (once you have already built functionality to access
    arbitrary system calls from GAWK, as I have done), but the hard part (hard
    only because it had not already been done) was creating functionality to convert a GAWK number into the underlying binary representation needed to
    pass to the system call.

    This was all written up in a recent thread here on this newsgroup (q.v.).
    The gist of it was that this function will do the work:

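    function encode(n, i,s) {
        # little-endian: lowest byte first; relies on %c keeping only the
        # low 8 bits of its argument, which holds in byte mode (gawk -b)
        s = sprintf("%c",n)
        for (i=1; i<4; i++)
            s = s sprintf("%c",rshift(n,i*8))
        return s
    }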

    But it is not exactly a thing of beauty.
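    In a UTF-8 locale gawk's %c turns a number above 127 into a multi-byte character rather than a single byte, so encode() only yields raw bytes in byte mode. A quick way to see what it actually emits, assuming gawk and od are available (encode_hex is a made-up wrapper name):

```sh
# encode_hex N (made-up name): run encode() in byte mode and print its bytes in hex.
encode_hex() {
    LC_ALL=C gawk -b -v n="$1" '
    function encode(n, i,s) {
        s = sprintf("%c",n)                    # lowest byte first
        for (i=1; i<4; i++)
            s = s sprintf("%c",rshift(n,i*8))
        return s
    }
    BEGIN { printf "%s", encode(n + 0) }' | od -An -tx1 | tr -d ' \n'
}
```

    For example, encode_hex 3850942269 should print 3db788e5, the bytes of 0xe588b73d lowest first.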

    Overall, I think the best advice I can give is that if you think you're
    going to be doing this on any ongoing scale, you will probably end up
    writing an extension library (in C) to do most of the nitty-gritty stuff.
  • From Digi@cosmogen@gmail.com to comp.lang.awk on Sun Apr 17 05:03:19 2022

    Janis:

    "I saw that there's an 'ordchr' extension in gawk's extension directory
    which may help reduce your own implementation effort."
    Yeah, I'm just giving examples, generally in their best-performing form, and I simply want to discuss some topics around gawk.

    I heard that this is a good place for that. Isn't it? Are there other places? :)

    Kenny:

    "Yes, this is a weakness in the AWK model."

    But I heard from someone on the gawk team here that gawk positions itself as perfectly suited for "pure file parsing".
    That is still true, but it is strange that the best language has this kind of weakness.

    It looks like this kind of thing could be covered by two new dynamic extensions:

    n = bintonum( t )

    and something like:

    t = numtobin( n, bytewide)

    But this is not a perfect solution either, for at least one reason: dynamic extensions also require file infrastructure (the extension's shared library has to be installed where gawk can find it), which is hard to arrange on remote machines.
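    Until such extensions exist, the two proposed functions can be approximated as plain awk functions, which sidesteps the deployment problem. A minimal sketch, assuming non-negative integers below 2^53 and little-endian output; it uses only POSIX arithmetic (no gawk bit functions), and le_roundtrip is a made-up wrapper name. Byte value 0 is worth testing in your own awk first, since awks that keep strings as C strings may truncate at a NUL byte.

```sh
# Sketch of the proposed numtobin(n, bytewide) and bintonum(t) as plain
# awk functions (POSIX arithmetic only), least significant byte first.
# LC_ALL=C keeps %c at one byte. le_roundtrip N WIDTH (made-up name)
# prints the bytes in hex, then the value read back.
le_roundtrip() {
    LC_ALL=C awk -v n="$1" -v w="$2" '
    function numtobin(n, bytewide,    k, s) {
        s = ""
        for (k = 0; k < bytewide; k++) {       # least significant byte first
            s = s sprintf("%c", n % 256)
            n = int(n / 256)
        }
        return s
    }
    function bintonum(t,    k, r) {
        r = 0
        for (k = length(t); k >= 1; k--)       # last byte is the most significant
            r = r * 256 + ORD[substr(t, k, 1)]
        return r
    }
    BEGIN {
        for (i = 1; i < 256; i++)              # byte -> value; byte 0 maps to 0 by default
            ORD[sprintf("%c", i)] = i
        t = numtobin(n, w)
        for (k = 1; k <= length(t); k++)
            printf "%02x ", ORD[substr(t, k, 1)]
        printf "-> %.0f\n", bintonum(t)
    }'
}
```

    For example, le_roundtrip 258 2 should print 02 01 -> 258.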

    However, it looks like I should start doing that myself. I mean writing dynamic extensions... it's time =)

    regards
    D



  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Fri Apr 22 06:17:47 2022

    @Digi: If you want a num-to-bin function that handles it all, even in gawk's Unicode mode, you can try this.
    The same code can print any arbitrary 4-byte combination in mawk-1, mawk-2, macOS nawk, gawk byte mode, and gawk Unicode mode.
    It also detects the awk variant automatically and adjusts for differences in behavior: mawk-1 cannot print large or negative hex values, nawk cannot print negative hex, mawk-2 lacks the thousands-separator (') flag, and gawk's Unicode mode must be kept from interpreting the values as Unicode code points.
    gawk -e '
    function encode(_,__,___) {
    return \
    sprintf("%c%c%c%c",
    (__<__)*((_%=(___=(__^=__^=__+=__^=__<__)*__*__)*__)+\
    (_=(_+___*__)%(___*__)))+___+(_/=___),
    ___+(_*=__),
    ___+(_*=__),
    ___+(_*=__))
    } BEGIN {
    ___=2^2^5;
    __["1701734259"]
    __["3891792015"]
    __["2405365991"]
    __[ sprintf("%.f", 3^32-4^7) ]
    __["-444025027"]=\
    __["3850942269"]=-1;

    PROCINFO["sorted_in"] = "@val_num_asc";
    for(_ in __) {
    printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47"\
    )"22.f | 0x %16.8x )> = [[ %-.4s ]] \n",
    _,+"0x1" ? ((_%___)+___)%___ : _, encode(_))
    } }'
    32-bit usgned <( -444,025,027 | 0x ffffffffe588b73d )> = [[ 刷= ]]
    32-bit usgned <( 3,850,942,269 | 0x e588b73d )> = [[ 刷= ]]
    32-bit usgned <( 1,701,734,259 | 0x 656e6773 )> = [[ engs ]]
    32-bit usgned <( 1,853,020,188,835,457 | 0x 6954fe21dfe81 )> = [[ ??? ]]
    32-bit usgned <( 2,405,365,991 | 0x 8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <( 3,891,792,015 | 0x e7f8088f )> = [[ ?? ]]
    % echo; gawk -b -e 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*((_%=(___=(__^=__^=__+=__^=__<__)*__*__)*__)+(_=(_+___*__)%(___*__)))+___+(_/=___),(_*=__)+___,___+(_*=__),___+int(_*__)) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %-.4s ]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'
    32-bit usgned <( -444,025,027 | 0x ffffffffe588b73d )> = [[ 刷= ]]
    32-bit usgned <( 3,850,942,269 | 0x e588b73d )> = [[ 刷= ]]
    32-bit usgned <( 1,701,734,259 | 0x 656e6773 )> = [[ engs ]]
    32-bit usgned <( 1,853,020,188,835,457 | 0x 6954fe21dfe81 )> = [[ ??? ]]
    32-bit usgned <( 2,405,365,991 | 0x 8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <( 3,891,792,015 | 0x e7f8088f )> = [[ ?? ]]
    % echo; mawk 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*((_%=(___=(__^=__^=__+=__^=__<__)*__*__)*__)+(_=(_+___*__)%(___*__)))+___+(_/=___),(_*=__)+___,___+(_*=__),___+int(_*__)) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %-.4s ]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'
    32-bit usgned <( 1,701,734,259 | 0x 656e6773 )> = [[ engs ]]
    32-bit usgned <( 1,853,020,188,835,457 | 0x e21dfe81 )> = [[ ??? ]]
    32-bit usgned <( 2,405,365,991 | 0x 8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <( 3,891,792,015 | 0x e7f8088f )> = [[ ?? ]]
    32-bit usgned <( 3,850,942,269 | 0x e588b73d )> = [[ 刷? ]]
    32-bit usgned <( -444,025,027 | 0x e588b73d )> = [[ 刷? ]]
    % echo; mawk2 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*((_%=(___=(__^=__^=__+=__^=__<__)*__*__)*__)+(_=(_+___*__)%(___*__)))+___+(_/=___),(_*=__)+___,___+(_*=__),___+int(_*__)) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %-.4s ]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'
    32-bit usgned <( 3891792015 | 0x e7f8088f )> = [[ ?? ]]
    32-bit usgned <( 3850942269 | 0x e588b73d )> = [[ 刷= ]]
    32-bit usgned <( 1701734259 | 0x 656e6773 )> = [[ engs ]]
    32-bit usgned <( 1853020188835457 | 0x 6954fe21dfe81 )> = [[ ??? ]]
    32-bit usgned <( 2405365991 | 0x 8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <( -444025027 | 0x ffffffffe588b73d )> = [[ 刷= ]]
    % echo; nawk 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*((_%=(___=(__^=__^=__+=__^=__<__)*__*__)*__)+(_=(_+___*__)%(___*__)))+___+(_/=___),(_*=__)+___,___+(_*=__),___+int(_*__)) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %-.4s ]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'
    32-bit usgned <( 3,850,942,269 | 0x e588b73d )> = [[ 刷? ]]
    32-bit usgned <( 3,891,792,015 | 0x e7f8088f )> = [[ ?? ]]
    32-bit usgned <( 1,701,734,259 | 0x 656e6773 )> = [[ engs ]]
    32-bit usgned <( -444,025,027 | 0x e588b73d )> = [[ 刷? ]]
    32-bit usgned <( 2,405,365,991 | 0x 8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <( 1,853,020,188,835,457 | 0x e21dfe81 )> = [[ ??? ]]
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Fri Apr 22 06:38:58 2022

    [[ REPOSTING due to minor mawk-1/nawk bug found ]]
    @Digi: If you want a num-to-bin function that handles it all, even in gawk's Unicode mode, you can try this.
    The same code can print any arbitrary 4-byte combination in mawk-1.3.4, mawk-1.9.9.6, gawk 5.1.1 byte mode, gawk 5.1.1 Unicode mode, and macOS nawk.
    It also detects the awk variant automatically and adjusts for differences in behavior: mawk-1 cannot print large or negative hex values, nawk cannot print negative hex, mawk-2 lacks the thousands-separator (') flag, and gawk's Unicode mode must be kept from interpreting the values as multi-byte Unicode code points. (Any leading dots are only there for formatting on the newsgroup and should be removed.)
    gawk -e '
        function encode(_,__,___) {
            return \
            sprintf("%c%c%c%c",(__<__)*((_%=___=(__^=__^=__+=__=__==__)\
                *__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___,
                int(_*=__)%__+___,
                int(_*=__)%__+___,
                int(_*__)%__+___)
        } BEGIN {
            ___=2^2^5; __["1701734259"]__["3891792015"]
            __["2405365991"]__[sprintf("%.f",3^32-4^7)]
            __["-444025027"]=__["3850942269"]=-1
            PROCINFO["sorted_in"]="@val_num_asc"
            for(_ in __) {
                printf(" \t 32-bit usgned <( %"(\
                    ("\333\222")~"[^\333\222]"?"":"\47"\
                    )"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",
                    _, +"0x1" ? ((_%___)+___)%___ : _, encode(_))
            }
        }'
    32-bit usgned <(           -444,025,027 | 0x ffffffffe588b73d )> = [[ 刷= ]]
    32-bit usgned <(          3,850,942,269 | 0x         e588b73d )> = [[ 刷= ]]
    32-bit usgned <(          1,701,734,259 | 0x         656e6773 )> = [[ engs ]]
    32-bit usgned <(  1,853,020,188,835,457 | 0x    6954fe21dfe81 )> = [[ ??? ]]
    32-bit usgned <(          2,405,365,991 | 0x         8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <(          3,891,792,015 | 0x         e7f8088f )> = [[ ?? ]]
    % echo; mawk 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*( (_%=___=(__^=__^=__+=__=__==__)*__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___, int(_*=__)%__+___,int(_*=__)%__+___,int(_*__)%__+___) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'
    32-bit usgned <(          1,701,734,259 | 0x         656e6773 )> = [[ engs ]]
    32-bit usgned <(  1,853,020,188,835,457 | 0x         e21dfe81 )> = [[ ??? ]]
    32-bit usgned <(          2,405,365,991 | 0x         8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <(          3,891,792,015 | 0x         e7f8088f )> = [[ ?? ]]
    32-bit usgned <(          3,850,942,269 | 0x         e588b73d )> = [[ 刷= ]]
    32-bit usgned <(           -444,025,027 | 0x         e588b73d )> = [[ 刷= ]]
    % echo; mawk2 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*( (_%=___=(__^=__^=__+=__=__==__)*__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___, int(_*=__)%__+___,int(_*=__)%__+___,int(_*__)%__+___) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'
    32-bit usgned <(             3891792015 | 0x         e7f8088f )> = [[ ?? ]]
    32-bit usgned <(             3850942269 | 0x         e588b73d )> = [[ 刷= ]]
    32-bit usgned <(             1701734259 | 0x         656e6773 )> = [[ engs ]]
    32-bit usgned <(       1853020188835457 | 0x    6954fe21dfe81 )> = [[ ??? ]]
    32-bit usgned <(             2405365991 | 0x         8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <(             -444025027 | 0x ffffffffe588b73d )> = [[ 刷= ]]
    % echo; nawk 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*( (_%=___=(__^=__^=__+=__=__==__)*__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___, int(_*=__)%__+___,int(_*=__)%__+___,int(_*__)%__+___) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'
    32-bit usgned <(          3,850,942,269 | 0x         e588b73d )> = [[ 刷= ]]
    32-bit usgned <(          3,891,792,015 | 0x         e7f8088f )> = [[ ?? ]]
    32-bit usgned <(          1,701,734,259 | 0x         656e6773 )> = [[ engs ]]
    32-bit usgned <(           -444,025,027 | 0x         e588b73d )> = [[ 刷= ]]
    32-bit usgned <(          2,405,365,991 | 0x         8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <(  1,853,020,188,835,457 | 0x         e21dfe81 )> = [[ ??? ]]
    % echo; gawk -b -e 'function encode(_,__,___) { return sprintf("%c%c%c%c",(__<__)*( (_%=___=(__^=__^=__+=__=__==__)*__*__*__)+(_=int(_+___)%___))+int(_/=___/=__)%__+___, int(_*=__)%__+___,int(_*=__)%__+___,int(_*__)%__+___) } BEGIN {___=2^2^5; __["1701734259"]__["3891792015"]__["2405365991"]__[sprintf("%.f",3^32-4^7)];__["-444025027"]=__["3850942269"]=-1; PROCINFO["sorted_in"]="@val_num_asc"; for(_ in __) { printf(" \t 32-bit usgned <( %"(("\333\222")~"[^\333\222]"?"":"\47")"22.f | 0x %16.8x )> = [[ %.4s\t]] \n",_,+"0x1" ? ((_%___)+___)%___ : _, encode(_)) } }'
    32-bit usgned <(           -444,025,027 | 0x ffffffffe588b73d )> = [[ 刷= ]]
    32-bit usgned <(          3,850,942,269 | 0x         e588b73d )> = [[ 刷= ]]
    32-bit usgned <(          1,701,734,259 | 0x         656e6773 )> = [[ engs ]]
    32-bit usgned <(  1,853,020,188,835,457 | 0x    6954fe21dfe81 )> = [[ ??? ]]
    32-bit usgned <(          2,405,365,991 | 0x         8f5ef8e7 )> = [[ ?^?? ]]
    32-bit usgned <(          3,891,792,015 | 0x         e7f8088f )> = [[ ?? ]]
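    Judging by the outputs above (0x656e6773 prints as "engs"), the encode() in these posts writes the most significant byte first, i.e. big-endian, the opposite order from numtobin32 in the first post. For readers who want the same bytes without the obfuscation, here is a plain POSIX-awk sketch. be32_hex and encode_be32 are made-up names, it is not a line-by-line translation of the code above, and unlike that code it does not try to cope with gawk's Unicode mode, so it forces LC_ALL=C. Like the hex column above, it wraps negative inputs into 0 .. 2^32-1.

```sh
# be32_hex N (made-up name): big-endian 4-byte encoding of N, shown as hex via od.
be32_hex() {
    LC_ALL=C awk -v n="$1" '
    function encode_be32(n,    k, s) {
        n = n % 4294967296                     # keep the low 32 bits
        if (n < 0) n += 4294967296             # wrap negatives into 0 .. 2^32-1
        s = ""
        for (k = 0; k < 4; k++) {
            s = sprintf("%c", n % 256) s       # prepend: most significant byte ends up first
            n = int(n / 256)
        }
        return s
    }
    BEGIN { printf "%s", encode_be32(n) }' | od -An -tx1 | tr -d ' \n'
}
```

    For example, be32_hex 1701734259 should print 656e6773 ("engs"), and be32_hex -444025027 should print e588b73d.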