• Can I output a binary file from an AWK program?

    From Alan Mackenzie@acm@muc.de to comp.lang.awk on Sat Mar 28 19:14:26 2026
    From Newsgroup: comp.lang.awk

    I think the Subject: line says it all. I have a text source file to
    convert into a binary output file. For example, I want to be able to
    output a 32-bit integer as a four byte little-endian binary integer.

    Can AWK do this? (It is supposed to be Turing complete, isn't it?)

    Or should I dust off my three-quarters forgotten Python and write the
    program in Python (or P***) instead?

    Thanks for the help!
    --
    Alan Mackenzie (Nuremberg, Germany).

    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sat Mar 28 21:17:19 2026

    On 2026-03-28 20:14, Alan Mackenzie wrote:
    I think the Subject: line says it all. I have a text source file to
    convert into a binary output file. For example, I want to be able to
    output a 32-bit integer as a four byte little-endian binary integer.

    This needs clarification. - Note that Awk has no 32 bit integer data
    type. Usually you have text input, say "1234567", that is internally
    stored in a numerically interpreted field.

    Can AWK do this? (It is supposed to be Turing complete, isn't it?)

    Of course that can be done. You could read in the number and apply a
    sequence of modulus and division operations on the read-in number; say
    awk '{n=$0; printf "%c", n%256; n=int(n/256); ... }'

    Or, in GNU Awk, you could use its bit functions, and() and rshift(),
    to extract the octets, then print each one as a character as above.
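As a rough illustration of the two approaches just described, here is a C sketch (the function names are made up for this example). Both produce the bytes of a 32-bit value least significant first, i.e. in little-endian order: the first uses only % and division, like the awk loop, and the second uses masks and shifts, the way gawk's and() and rshift() would be used.

```c
#include <stdint.h>

/* Split n into four bytes, least significant first, using only %
   and division: the same steps as the awk loop n%256; n=int(n/256). */
static void le_bytes_divmod(uint32_t n, unsigned char out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (unsigned char)(n % 256);
        n /= 256;
    }
}

/* The same bytes via mask and shift: byte i is (n >> 8*i) & 0xFF,
   which is what and() and rshift() would compute in gawk. */
static void le_bytes_shift(uint32_t n, unsigned char out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((n >> (8 * i)) & 0xFFu);
}
```

For 12345678 (0x00BC614E) both give 4e 61 bc 00. In awk, each of those bytes would then go out through printf "%c" with no trailing newline, subject to the locale caveat raised later in the thread.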

    Janis

    [...]

  • From Alan Mackenzie@acm@muc.de to comp.lang.awk on Sun Mar 29 11:30:31 2026

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 2026-03-28 20:14, Alan Mackenzie wrote:
    I think the Subject: line says it all. I have a text source file to
    convert into a binary output file. For example, I want to be able to
    output a 32-bit integer as a four byte little-endian binary integer.

    This needs clarification. - Note that Awk has no 32 bit integer data
    type. Usually you have text input, say "1234567", that is internally
    stored in a numerically interpreted field.

    That field is either a string or a floating-point number. But I need
    to output 32-bit integers regardless of the internal form.
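One way to get a 32-bit integer "regardless of the internal form" is to reduce the number to its 32-bit pattern before splitting it into bytes. This matters for negative input in particular: gawk's % truncates toward zero, so -1 % 256 is -1 rather than 255. A minimal C sketch of the reduction, assuming the value fits in a long long (to_u32 is a hypothetical name):

```c
#include <stdint.h>

/* Reduce an awk-style number (a C double) to the 32-bit pattern to be
   written. Converting to long long truncates toward zero, like awk's
   int(); converting that to uint32_t is defined by C as reduction
   modulo 2^32, so negative values come out in two's-complement form.
   Assumes |x| < 2^63 so the first conversion is defined. */
static uint32_t to_u32(double x)
{
    long long v = (long long)x;
    return (uint32_t)v;
}
```

In awk terms, adding 4294967296 to a negative value after int() yields the same pattern, for negative values no smaller than -4294967296.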

    Can AWK do this? (It is supposed to be Turing complete, isn't it?)

    Of course that can be done. You could read in the number and apply a
    sequence of modulus and division operations on the read-in number; say
    awk '{n=$0; printf "%c", n%256; n=int(n/256); ... }'

    OK, thanks.

    Or, in GNU Awk, you could use its bit functions, and() and rshift(),
    to extract the octets, then print each one as a character as above.

    I see that now. But it is ugly enough that maybe I really do want to
    use some other language rather than AWK.

    Janis
    --
    Alan Mackenzie (Nuremberg, Germany).

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sun Mar 29 13:35:27 2026

    On 2026-03-29 13:30, Alan Mackenzie wrote:
    On 2026-03-28 20:14, Alan Mackenzie wrote:
    I think the Subject: line says it all. I have a text source file to
    convert into a binary output file. For example, I want to be able to
    output a 32-bit integer as a four byte little-endian binary integer.
    [...]

    I see that now. But it is ugly enough that maybe I really do want to
    use some other language rather than AWK.

    That decision certainly depends on what else your program/script
    is going to do. If it's mainly such conversions, then I'd likely come
    to the same conclusion.

    Janis

  • From Kaz Kylheku@046-301-5902@kylheku.com to comp.lang.awk on Sun Mar 29 23:45:18 2026

    On 2026-03-28, Alan Mackenzie <acm@muc.de> wrote:
    I think the Subject: line says it all. I have a text source file to
    convert into a binary output file. For example, I want to be able to
    output a 32-bit integer as a four byte little-endian binary integer.

    Firstly, Awk's printf has a %c specifier which will output any byte:

    $ awk 'BEGIN { printf("%c%c", 0x41, 0x0A) }'
    A
    $

    Another idea is to output a textual dump compatible with xxd.
    (xxd == fairly widely installed utility that comes with Vim).
    xxd can revert xxd dumps back to binary:

    $ xxd /etc/bash_completion
    00000000: 2e20 2f75 7372 2f73 6861 7265 2f62 6173  . /usr/share/bas
    00000010: 682d 636f 6d70 6c65 7469 6f6e 2f62 6173  h-completion/bas
    00000020: 685f 636f 6d70 6c65 7469 6f6e 0a         h_completion.
    $ xxd /etc/bash_completion | xxd -r
    . /usr/share/bash-completion/bash_completion

    IIRC there is a bit of tolerance in "xxd -r". Indeed:

    $ xxd -r
    0: 41 57 4B 0A
    AWK
    0: 42 494E 0A
    BIN
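Following this suggestion, the awk side would only need to print text lines like the ones typed above and let xxd -r produce the bytes, so no raw byte ever passes through awk's printf. Below is a C sketch of one such line per 32-bit value, least significant byte first, with the offset advancing by 4 per value. The name xxd_le32_line is made up, and the spacing xxd -r tolerates is assumed from the session above rather than checked against xxd's documentation. An awk printf would use essentially the same format string, minus the l length modifier.

```c
#include <stdint.h>
#include <stdio.h>

/* Format one 32-bit value as "<hex offset>: b0 b1 b2 b3\n", least
   significant byte first; e.g. offset 0 and 12345678 give
   "00000000: 4e 61 bc 00\n". Returns what snprintf returns. */
static int xxd_le32_line(char *buf, size_t size, unsigned long offset, uint32_t n)
{
    return snprintf(buf, size, "%08lx: %02x %02x %02x %02x\n", offset,
                    (unsigned)(n & 0xFFu), (unsigned)((n >> 8) & 0xFFu),
                    (unsigned)((n >> 16) & 0xFFu), (unsigned)((n >> 24) & 0xFFu));
}
```

Piping a sequence of such lines into xxd -r should then yield the little-endian bytes back to back.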
  • From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.awk on Sun Mar 29 17:07:30 2026

    Kaz Kylheku <046-301-5902@kylheku.com> writes:
    On 2026-03-28, Alan Mackenzie <acm@muc.de> wrote:
    I think the Subject: line says it all. I have a text source file to
    convert into a binary output file. For example, I want to be able to
    output a 32-bit integer as a four byte little-endian binary integer.

    Firstly, Awk's printf has a %c specifier which will output any byte:

    $ awk 'BEGIN { printf("%c%c", 0x41, 0x0A) }'
    A
    $
    [...]

    The behavior seems to depend on the current locale. I haven't
    investigated it thoroughly. I don't know whether there's a way to
    force binary output.

    For example, the cent sign '¢' is U+00a2, represented in UTF-8
    as the two-byte sequence 0xc2, 0xa2.

    I have LANG=en_US.UTF-8 in my environment.

    $ gawk --version | head -n 1
    GNU Awk 5.2.1, API 3.2, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)
    $ gawk 'BEGIN { printf("%c\n", 0xa2) }'
    ¢
    $ LANG=C gawk 'BEGIN { printf("%c\n", 0xa2) }' | hd
    00000000 a2 0a |..|
    00000002
    $

    (I filtered the last through hd because the output is not valid UTF-8.)

    nawk behaves similarly. mawk and busybox awk do not.
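The c2 a2 in the UTF-8 session is what you get if %c treats 0xa2 as a code point and encodes it as UTF-8, whereas in the C locale the number goes out as a single byte. A C sketch of that encoding step for code points below U+10000 (utf8_encode is an illustrative name, not gawk's internal function):

```c
#include <stddef.h>

/* Encode a code point below U+10000 as UTF-8 into out;
   returns the number of bytes written (1 to 3). */
static size_t utf8_encode(unsigned cp, unsigned char out[3])
{
    if (cp < 0x80) {                          /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx (caller ensures cp < 0x10000) */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

0x41 stays one byte, but every value from 0x80 upward becomes two or more bytes, which is why a plain %c with such values does not produce a single byte in a UTF-8 locale.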
    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    void Void(void) { Void(); } /* The recursive call of the void */
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Mon Mar 30 09:37:31 2026

    In article <10q99ai$ph7$1@news.muc.de>, Alan Mackenzie <acm@muc.de> wrote:
    I think the Subject: line says it all. I have a text source file to
    convert into a binary output file. For example, I want to be able to
    output a 32-bit integer as a four byte little-endian binary integer.

    Can AWK do this? (It is supposed to be Turing complete, isn't it?)

    Your question has two parts. First is the general question of whether
    GAWK can (correctly) read and write binary files. The answer to this is
    "Yes, at least under Unix/Linux", but it may not be entirely supported by
    the AWK/GAWK developers/honchos. There was a thread recently in one of the
    GNU GAWK mailing lists about this sort of thing, and one of the GAWK
    developers stated that AWK is not really the right tool for dealing with
    files that are a mixture of text and binary data.

    But it can be done and it does work, provided you are careful about what
    you are doing. Note that you can run into problems on other OSes that
    translate line endings (on Windows, for example, output in text mode
    turns each 0x0A byte into 0x0D 0x0A). So, some clarification about your
    environment would be in order.

    Second is the specific point about writing a memory object (e.g., a 32-bit
    integer) directly to a file. As mentioned, AWK does not provide a direct
    way to do this, so the answer to this question is probably "no"; that is,
    you can't do it in straight, unadorned AWK/GAWK. Now, if I were doing
    this myself, I'd write a short GAWK extension library function that wraps
    the "fwrite" function, then use that. The gist of it is that the extension
    lib would do:

    fwrite(&num,4,1,stdout);
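The fwrite() call above copies num's in-memory representation, so the bytes come out little-endian only on a little-endian host such as x86, and the four bytes hold the whole value only if int is 32 bits wide. A byte-order-independent alternative, as a C sketch (write_u32_le is a hypothetical helper, not part of the gawk API):

```c
#include <stdint.h>
#include <stdio.h>

/* Write n to fp as exactly four bytes, least significant first,
   whatever the host's byte order or sizeof(int).
   Returns 1 on success, 0 on a short write. */
static int write_u32_le(FILE *fp, uint32_t n)
{
    unsigned char b[4];
    for (int i = 0; i < 4; i++)
        b[i] = (unsigned char)((n >> (8 * i)) & 0xFFu);
    return fwrite(b, 1, sizeof b, fp) == sizeof b;
}
```

An extension could call something like this in place of the raw fwrite() of an int.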

    I would do it this way because I like AWK/GAWK a lot and would want to
    continue using it and have no interest in learning any of those other
    languages. You may or may not agree with me on this.

    Note, BTW, that if it weren't for one thing, this could be done directly
    using my call_any() extension library function, without having to write
    yet another lib for fwrite(). The problem is that (as far as I can tell)
    there is no way to pass "stdout" from an AWK program to the lib (unless
    I've overlooked something). "FILE *" objects are not a known type in
    GAWK. Actually, now that I think about it, I don't think "int *" is
    either, so that also will be problematic.

    Or should I dust off my three-quarters forgotten Python and write the
    program in Python (Or P***) instead?

    That's up to you, but I'd say "no".

    It would depend on how comfortable you are writing an extension lib.
    --
    Religion is what keeps the poor from murdering the rich.

    - Napoleon Bonaparte -

  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Thu Apr 2 12:06:10 2026

    In article <10qdg8r$3gdh5$1@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    In article <10q99ai$ph7$1@news.muc.de>, Alan Mackenzie <acm@muc.de> wrote:
    I think the Subject: line says it all. I have a text source file to
    convert into a binary output file. For example, I want to be able to
    output a 32-bit integer as a four byte little-endian binary integer.

    Can AWK do this? (It is supposed to be Turing complete, isn't it?)
    ...

    Now, if I were doing this myself, I'd write a short GAWK extension library
    function that wraps the "fwrite" function, then use that. The gist of it
    is that the extension lib would do:

    fwrite(&num,4,1,stdout);

    I would do it this way because I like AWK/GAWK a lot and would want to
    continue using it and have no interest in learning any of those other
    languages. You may or may not agree with me on this.

    Since I had an hour to kill, I threw together the code shown below. Note
    that it may have more includes than it needs and also other extraneous
    stuff; this is because every GAWK extension is just a copy of the previous
    one, with the active code changed out. Note also that this is for GAWK 4.1;
    since they keep changing the API, you may have to modify it if you are
    using a different version of GAWK.

    Usage would be like:

    gawk4 -l ./fwrite32 'BEGIN { printf "12345678 in binary is: ";fwrite32(12345678);print "" }'

    Developed under Linux, but I am told this stuff works seamlessly under
    Windows GAWK as well. OP has yet to clarify which OS/environment he is
    using.

    --- Cut Here ---
    /*
     * fwrite32.c - Provide fwrite for GAWK
     * Compile command:
     *   gcc -shared -I.. -W -Wall -Werror -fPIC -o fwrite32.so fwrite32.c
     */

    #include <stdio.h>
    #include <stddef.h>
    #include <string.h>
    #include <assert.h>
    #include <errno.h>
    #include <stdlib.h>
    #include <alloca.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/wait.h>
    #include <signal.h>
    #include <stdarg.h>

    #include "gawkapi.h"
    #define STR str_value.str
    #define LEN str_value.len

    static const gawk_api_t *api; /* for convenience macros to work */
    static awk_ext_id_t *ext_id;
    static const char *ext_version = "fwrite32 extension: version 1.0";
    #include "lintwarn.h"

    int plugin_is_GPL_compatible;

    /* Return 0 if nargs == wantedArgs; otherwise issue a lint warning
       and return 1 */
    static int checkArgNum(char *fn_name, int nargs, int wantedArgs)
    {
        if (nargs == wantedArgs) return 0;
        lintwarn(ext_id, "%s: wrong # of arguments (%d) - should be %d!",
                 fn_name, nargs, wantedArgs);
        return 1;
    }

    /* do_fwrite32_version: return the extension's version string */

    static awk_value_t *
    do_fwrite32_version(int nargs, awk_value_t *result)
    {
        if (checkArgNum("fwrite32_version", nargs, 0))
            goto the_end;
        return make_const_string(ext_version, strlen(ext_version), result);

    the_end:
        return make_const_string("<ERROR>", 7, result);
    }

    /* do_fwrite32: write each numeric argument to stdout as 4 raw bytes */

    static awk_value_t *
    do_fwrite32(int nargs, awk_value_t *result)
    {
        awk_value_t arg;
        int num;

        if (nargs == 0) {
            /* Report on stderr so the message cannot end up in the
               binary output on stdout */
            fputs("fwrite32: Need at least one arg!\n", stderr);
            goto the_end;
        }
        for (int i = 0; i < nargs; i++) {
            if (!get_argument(i, AWK_NUMBER, &arg)) {
                lintwarn(ext_id, "fwrite32: Fatal error retrieving arg #%d!", i + 1);
                goto the_end;
            }
            /* num_value is a double; values outside int's range are not handled */
            num = arg.num_value;
            /* Copies num's in-memory bytes: little-endian only on a
               little-endian host (e.g. x86), and assumes a 4-byte int */
            fwrite(&num, 4, 1, stdout);
        }
        return make_number(0, result);

    the_end:
        return make_const_string("<ERROR>", 7, result);
    }

    static awk_ext_func_t func_table[] = {
        { "fwrite32_version", do_fwrite32_version, 0 },
        { "fwrite32", do_fwrite32, 4 },
    };

    /* define the dl_load function using the boilerplate macro */

    dl_load_func(func_table, fwrite32, "")

    --- Cut Here ---
    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/Mandela
  • From Kaz Kylheku@046-301-5902@kylheku.com to comp.lang.awk on Fri Apr 17 00:25:58 2026

    On 2026-03-30, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    Kaz Kylheku <046-301-5902@kylheku.com> writes:
    On 2026-03-28, Alan Mackenzie <acm@muc.de> wrote:
    I think the Subject: line says it all. I have a text source file to
    convert into a binary output file. For example, I want to be able to
    output a 32-bit integer as a four byte little-endian binary integer.

    Firstly, Awk's printf has a %c specifier which will output any byte:

    $ awk 'BEGIN { printf("%c%c", 0x41, 0x0A) }'
    A
    $
    [...]

    The behavior seems to depend on the current locale. I haven't
    investigated it thoroughly. I don't know whether there's a way to
    force binary output.

    For example, the cent sign '¢' is U+00a2, represented in UTF-8
    as the two-byte sequence 0xc2, 0xa2.

    I have LANG=en_US.UTF-8 in my environment.

    $ gawk --version | head -n 1
    GNU Awk 5.2.1, API 3.2, PMA Avon 8-g1, (GNU MPFR 4.2.1, GNU MP 6.3.0)
    $ gawk 'BEGIN { printf("%c\n", 0xa2) }'
    ¢
    $ LANG=C gawk 'BEGIN { printf("%c\n", 0xa2) }' | hd
    00000000 a2 0a |..|
    00000002
    $

    I just read your response now, having returned from some travels.

    On a whim, informed by something unrelated to GNU Awk, I tried this:

    $ gawk 'BEGIN { printf("%c\n", 0xa2) }'
    ¢
    $ gawk 'BEGIN { printf("%c\n", 0xa2) }' | hd
    00000000 c2 a2 0a |...|
    00000003
    $ gawk 'BEGIN { printf("%c\n", 0xdca2) }'

    $ gawk 'BEGIN { printf("%c\n", 0xdca2) }' | hd
    00000000 a2 0a |..|
    00000002

    I.e. we are mapping the A2 to the surrogate pair region DCxx.
    When we do this, Awk seems to be putting out the A2 byte
    that we want (and not the UTF-8 encoding of the U+DCA2 code
    point, which would be wrong).

    This is a "thing out there", and I know about it from having implemented
    it in the TXR project's UTF-8 handling also: the concept of mapping
    invalid bytes in UTF-8 input to DCxx code points, and then mapping DCxx
    code points back to bytes on output. (This is in contrast to strict
    handling, such as throwing exceptions on invalid bytes in UTF-8.)

    It seems that GNU Awk implements at least some aspect of this idea.
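A similar convention exists elsewhere: Python's "surrogateescape" error handler (PEP 383) decodes each undecodable byte 0x80..0xFF to U+DC80..U+DCFF and turns those code points back into the original bytes on output. Below is a C sketch of both directions. The function names are hypothetical, it handles only 1- to 3-byte sequences (4-byte sequences get escaped byte by byte), and it is not gawk's actual code, whose exact rules are not established in this thread.

```c
#include <stddef.h>

/* Decode one code point from s[0..len) (len >= 1) into *cp and return
   the number of bytes consumed. A byte that does not start a valid
   1- to 3-byte UTF-8 sequence is mapped to U+DC00 + byte; since such a
   byte is >= 0x80, the result lies in U+DC80..U+DCFF. */
static size_t decode_escaped(const unsigned char *s, size_t len, unsigned *cp)
{
    unsigned char b = s[0];
    if (b < 0x80) {
        *cp = b;
        return 1;
    }
    if (b >= 0xC2 && b <= 0xDF && len >= 2 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned)(b & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if (b >= 0xE0 && b <= 0xEF && len >= 3 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
        unsigned v = ((unsigned)(b & 0x0F) << 12) |
                     ((unsigned)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        /* reject overlong forms and encoded surrogates */
        if (v >= 0x800 && (v < 0xD800 || v > 0xDFFF)) {
            *cp = v;
            return 3;
        }
    }
    *cp = 0xDC00u | b;   /* escape the single invalid byte */
    return 1;
}

/* Output side: if cp is an escaped byte (U+DC80..U+DCFF), store the
   raw byte in *byte and return 1; otherwise return 0 so the caller
   encodes cp as ordinary UTF-8. */
static int unescape_byte(unsigned cp, unsigned char *byte)
{
    if (cp >= 0xDC80 && cp <= 0xDCFF) {
        *byte = (unsigned char)(cp & 0xFF);
        return 1;
    }
    return 0;
}
```

With this scheme the lone byte a2 decodes to U+DCA2 and is written back as a2, which matches what printf("%c", 0xdca2) produced in the session above.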
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca