• A feature I'd like to see in GAWK...

    From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Mon Jul 15 18:28:31 2024
    From Newsgroup: comp.lang.awk

    As we know, AWK in general, and GAWK in particular, has several different
    ways of getting data into the program. In addition to the Automatic Input
    Loop (the main feature of AWK), there are several variations of "getline".

    "getline" can be used with files, or with processes (in 2 different ways!),
    or even with network sockets. But the problem with getline is that using
    it breaks the Automatic Input Loop. You can't use the standard
    "pattern/action" paradigm if your input is coming in via "getline". Yes,
    there are workarounds and yes we've all gotten used to it, but it is a
    shame. For one thing, you can write your program as a shell script, and
    use the shell to pipe in the data from a process. But this is ugly. And
    not always sufficient.

    Now, I have written a GAWK extension to handle this - called "pipeline".
    Here is a sample script that uses "pipeline". Note that the Linux "df"
    command has a "-l" option to show you only the local filesystems, but what
    I usually want is the non-local ones - that's much more interesting. The
    only way I can figure how to get that is to run "df" twice and compare the
    output with and without "-l". Here is my program (non-local-df):

    --- Cut Here ---
    @load "pipeline"
    @include "abort"
    # Note: You can ignore the "abort" stuff. It is part of my ecosystem, but
    # probably not part of yours.
    BEGIN {
        testAbort(ARGC > 1,"This program takes no args!!!",1)
        pipeline("in","df -l")
        while (ARGC < 3)
            ARGV[ARGC++] = "-"
    }
    ENDFILE { if (ARGIND == 1) pipeline("in","df") }
    ARGIND == 1 { x[$1]; next }
    FNR == 1 || !($1 in x)
    --- Cut Here ---

    Needless to say, I'd like to see this sort of functionality built-in.
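
    For comparison, here is a minimal sketch of the same two-pass comparison
    written with plain "getline" and no extension. It is an illustration
    under assumptions, not code from the post: the function name
    non_local_getline and its two arguments are hypothetical, and the shell
    wrapper exists only to make the snippet runnable as-is. Note how all the
    work moves into BEGIN and the pattern/action rules disappear.

    ```shell
    # Hypothetical helper: $1 is the command listing local filesystems,
    # $2 is the command listing all filesystems.
    non_local_getline() {
        awk -v local_cmd="$1" -v all_cmd="$2" '
        BEGIN {
            # Pass 1: remember the first field of every line of the local listing.
            while ((local_cmd | getline line) > 0) {
                split(line, f)
                seen[f[1]]
            }
            close(local_cmd)

            # Pass 2: print the header line plus every line whose first field
            # did not appear in pass 1.
            while ((all_cmd | getline line) > 0) {
                split(line, f)
                if (++n == 1 || !(f[1] in seen))
                    print line
            }
            close(all_cmd)
        }'
    }

    non_local_getline "df -l" "df"
    ```

    The two while loops do by hand what the Automatic Input Loop normally
    does, which is the cost the post is describing.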

    It seems to me that GAWK has been sort of fishing around lately looking for
    new worlds to conquer. Some features have been added lately that seem (to
    me anyway) sort of "out of place": namespaces, MPFR arithmetic (apparently,
    now deprecated), persistent memory (nifty idea, though I don't really see
    the practicality - and have not gotten around to testing it - i.e.,
    compiling up a new enough version to try it).

    I think something like the above would be more in line with the sort of
    things I'd like to see in GAWK.
    --
    Adderall, pseudoephed, teleprompter
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Mack The Knife@mack@the-knife.org to comp.lang.awk on Tue Jul 16 14:29:10 2024

    While this is interesting, it can actually be done very easily from the
    shell level, using process substitution:

    awk -f foo.awk <(df -l) <(df)


  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Tue Jul 16 16:25:28 2024

    In article <669683b6$0$713$14726298@news.sunsite.dk>,
    Mack The Knife <mack@the-knife.org> wrote:
    > While this is interesting, it can actually be done very easily from the
    > shell level, using process substitution:
    >
    > awk -f foo.awk <(df -l) <(df)

    Which, as noted in the OP, is ugly and not AWK, but rather shell.
    (As I said, we all know the workarounds - and we all know they are ugly)

    And it doesn't work if you have to calculate the value of the process to
    run inside the AWK script (which isn't the case with my "df" example, but
    is why I used the phrase "not always sufficient").
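
    To illustrate the case where the command is only known inside the AWK
    script, here is a minimal sketch. The input format (a label followed by
    a command on each line) and the function name count_output_lines are
    assumptions made up for this example. Because each command comes from
    the data, the shell cannot set up the pipes in advance, and the script
    has to fall back on a hand-written getline loop.

    ```shell
    # Hypothetical helper: reads "label command..." lines on stdin and prints
    # each label with the number of lines its command produced.
    # Caution: this runs whatever commands appear in the input.
    count_output_lines() {
        awk '
        NF < 2 { next }                               # skip lines with no command
        {
            label = $1
            cmd = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", cmd)      # drop the label, keep the rest
            n = 0
            while ((cmd | getline line) > 0)          # records read outside the main loop
                n++
            close(cmd)
            print label, n
        }'
    }

    printf '%s\n' "local df -l" "all df" | count_output_lines
    ```

    Process substitution has no equivalent here, since the list of commands
    does not exist until the script is already reading its input.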
    --
    After 4 years of disastrous screwups, Trump now favors 3 policies that I support:
    1) $2K/pp stimulus money. Who doesn't want more money?
    2) Water pressure. My shower doesn't work very well; I want Donnie to come fix it.
    3) Repeal of Section 230. This will lead to the demise of Face/Twit/Gram. Yey!
  • From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.awk on Tue Jul 16 17:10:25 2024

    On 2024-07-15, Kenny McCormack <gazelle@shell.xmission.com> wrote:
    > --- Cut Here ---
    > @load "pipeline"
    > @include "abort"
    > # Note: You can ignore the "abort" stuff. It is part of my ecosystem, but
    > # probably not part of yours.
    > BEGIN {
    >     testAbort(ARGC > 1,"This program takes no args!!!",1)
    >     pipeline("in","df -l")
    >     while (ARGC < 3)
    >         ARGV[ARGC++] = "-"
    > }
    > ENDFILE { if (ARGIND == 1) pipeline("in","df") }
    > ARGIND == 1 { x[$1]; next }
    > FNR == 1 || !($1 in x)
    > --- Cut Here ---
    >
    > Needless to say, I'd like to see this sort of functionality built-in.

    TXR Lisp Awk macro:

    (awk (:inputs (open-command "df -l")) (#/tmpfs/ (prn [f 5])))
    /run
    /dev/shm
    /run/lock
    /sys/fs/cgroup
    /run/user/122
    /run/user/500
    nil

    :inputs arguments can be files, lists of strings, input streams.

    (awk (:inputs '("alpha beta" "gamma delta")) (t (prn [f 0])))
    alpha
    gamma
    nil
    (awk (:inputs "/etc/hostname") (t (prn [f 0])))
    sun-go
    nil

    nil is the return value of the awk expression. You can control that.
    The awk construct establishes a hidden block named awk around
    your code.

    E.g. return the first tmpfs path from "df -l":

    (awk (:inputs (open-command "df -l"))
      (#/tmpfs/ (return-from awk [f 5])))
    "/run"
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
  • From Arti F. Idiot@addr@is.invalid to comp.lang.awk on Tue Jul 16 14:05:56 2024

    On 7/15/24 12:28 PM, Kenny McCormack wrote:
    > I think something like the above would be more in line with the sort of
    > things I'd like to see in GAWK.

    +1 ; great idea.
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Wed Jul 17 12:23:00 2024

    In article <v76jr4$tu3$1@nnrp.usenet.blueworldhosting.com>,
    Arti F. Idiot <addr@is.invalid> wrote:
    > On 7/15/24 12:28 PM, Kenny McCormack wrote:
    >> I think something like the above would be more in line with the sort of
    >> things I'd like to see in GAWK.
    >
    > +1 ; great idea.

    Well, I think so. The idea is that you shouldn't have to give up the most intrinsic part of AWK (the pattern/action paradigm) just because your input isn't a named (i.e., on the command line) file.

    I think of it as "rehabilitating getline". Bringing it back into the fold, rather than exiling it to the sidelines.

    Note also that my "pipeline" extension only handles the case of a simple process (either input or output - i.e., like AWK's "getline" and "print"
    with "|" redirection). It doesn't handle any of the other variations of getline/print - such as the ones that interface with network sockets. It
    would be nice if a built-in approach did those things as well (and better
    than my extension does).
    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/FreeCollege
  • From Jeremy Brubaker@jbrubake.362@orionarts.invalid to comp.lang.awk on Fri Jul 19 14:26:35 2024

    On 2024-07-15, Kenny McCormack wrote:
    > As we know, AWK in general, and GAWK in particular, has several different
    > ways of getting data into the program. In addition to the Automatic Input
    > Loop (the main feature of AWK), there are several variations of "getline".
    >
    > "getline" can be used with files, or with processes (in 2 different ways!),
    > or even with network sockets. But the problem with getline is that using
    > it breaks the Automatic Input Loop. You can't use the standard
    > "pattern/action" paradigm if your input is coming in via "getline". Yes,
    > there are workarounds and yes we've all gotten used to it, but it is a
    > shame. For one thing, you can write your program as a shell script, and
    > use the shell to pipe in the data from a process. But this is ugly. And
    > not always sufficient.
    >
    > Now, I have written a GAWK extension to handle this - called
    > "pipeline".

    That sounds quite useful. I am fairly certain I have wished a feature
    like that existed and ended up just wrapping awk with sh but I agree
    that's ugly.

    Awk is underrated IMHO. Not that json/yaml/etc aren't useful things but
    frequently when I see them used my first thought is "If you had just
    done well-formatted text records I could have parsed this with awk".

    --
    () www.asciiribbon.org | Jeremy Brubaker
    /\ - against html mail | јЬruЬаkе@оrіоnаrtѕ.іо / neonrex on IRC

    Even a hawk is an eagle among crows.