• function() return VAR vs ..return $0

    From someone@someone@invalid.invalid to comp.lang.awk on Mon Feb 20 10:50:01 2023
    From Newsgroup: comp.lang.awk

    Hello again,
    I wrote a small looping AWK script to practice use of functions and have
    a few questions which maybe some of you could weigh in on.

    The script:
    --
    #! /usr/bin/awk -f
    # dwmstat.awk -- populate dwm(1) window mgr status area.

    BEGIN {
        while(1) {
            status_str = " " temp() " | " load() " | " date() " "
            #system("xsetroot -name '"status_str"'")
            printf "%s\n", status_str
            sleep()
        }
    }

    function temp() {
        while ("sensors -A coretemp-isa-0000"|getline) {
            if ($0 ~ /Package/) {
                sub("\\+","",$4)
                TEMP = "core: " $4
                break
            }
        }
        close("sensors -A coretemp-isa-0000")
        return TEMP
    }

    function uptime() {
        "uptime -p" |getline
        close("uptime -p")
        sub("up","&:")
        sub("ou","")
        sub("utes","")
        return $0
    }

    function load() {
        "uptime" |getline
        close("uptime")
        sub("^.*age:","load:")
        return $0
    }

    function date() {
        "date '+%a %b %d %Y | %I:%M.%S %p'" |getline
        close("date '+%a %b %d %Y | %I:%M.%S %p'")
        sub("\n","")
        return $0
    }

    function sleep () {
        return system("sleep 5");\
        close("sleep 5")
    }

    --
    The script returns a status line that looks like this:
    core: 33.0°C | load: 0.05, 0.10, 0.09 | Mon Feb 20 2023 | 10:44.49 AM

    The commented out xsetroot(1) line will eventually be used to write
    status area via the "-name" parameter; 'printf "%s\n", status_str' is
    just for testing.

    The questions:
    All the functions just return "$0" and the script as written appears to
    run fine. However I've also written a version that uses VARS in the
    various functions, i.e.
    --

    function load_v() {
    "uptime" |getline LOAD
    close("uptime")
    sub("^.*age:","load:",LOAD)
    return LOAD
    }

    --

    I did this because I noticed that while omitting the function vars
    mostly works in some cases -- splitting date and time into two functions
    for example -- the 'return $0' is contaminated with data returned from
    other functions. Is this because "$0" is ultimately a global variable
    or something else, say, a lack of garbage collection?

    In case it matters:
    - OS system being used: Debian 11.x
    - AWKs being used: gawk and mawk

    Other questions:
    - should 'load_v()' be 'load( LOAD)' ? Why?
    - are all these close() calls really necessary?
    - can this script be improved or streamlined further?

    Regards,
    jeorge
    --- Synchronet 3.20a-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Feb 20 21:49:59 2023
    From Newsgroup: comp.lang.awk

    On 20.02.2023 18:50, someone wrote:
    [...]

    > In case it matters:

    It matters, e.g., for 'date', which gawk supports through built-in
    functions (strftime() and systime()). (I cannot tell about mawk's
    'date' support.)

    > - OS system being used: Debian 11.x
    > - AWKs being used: gawk and mawk
    >
    > Other questions:
    > - should 'load_v()' be 'load( LOAD)' ? Why?

    Local variables (like LOAD) should be declared in the function argument
    list to create a local variable instance (and not a global variable).

    'getline var' is preferable to a plain 'getline' because it does not
    overwrite $0 and leaves awk's native read loop intact.
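
    A minimal sketch of the two points above, assuming a POSIX awk and sh.
    The names demo_local_getline, first_line() and clobber() are made up for
    illustration and are not from the original script. It suggests why a
    plain getline shows up in every other function ($0 is global), while a
    variable listed as an extra parameter stays local to its function.

```shell
#!/bin/sh
# Hypothetical demo, not the OP's script.
demo_local_getline() {
    printf 'input record\n' | awk '
    # The extra parameter "line" is local; the wide gap is only a convention.
    function first_line(cmd,    line) {
        if ((cmd | getline line) <= 0)
            line = ""
        close(cmd)
        return line
    }
    # A plain getline overwrites the global $0 as a side effect.
    function clobber(cmd) {
        cmd | getline
        close(cmd)
        return $0
    }
    {
        saved = $0
        out = first_line("echo from-command")
        kept = ($0 == saved)        # $0 is still the input record
        clobber("echo from-command")
        changed = ($0 != saved)     # $0 now holds the command output
        print out, kept, changed, (line == "" ? "line-unset" : "line-leaked")
    }'
}
demo_local_getline
```

    The last field checks that no global named "line" was created by the
    call to first_line().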

    > - are all these close() calls really necessary?

    close() is necessary for commands that need to be re-invoked anew. To
    make that clear some examples...

    Without close(), every '"ps" | getline var' returns in var the next line
    of the same 'ps' output. Likewise, repeated '"date" | getline var' calls
    keep reading from the one same 'date' invocation; since that prints a
    single line, subsequent calls read nothing (getline returns 0).
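
    The effect can be checked with a small sketch, assuming a POSIX awk;
    demo_close is a hypothetical name and "echo tick" stands in for any
    one-line command such as date.

```shell
#!/bin/sh
# Hypothetical demo of re-running a command via close().
demo_close() {
    awk 'BEGIN {
        cmd = "echo tick"
        first  = (cmd | getline a)   # reads the only output line
        second = (cmd | getline b)   # same stream, already at end of output
        close(cmd)                   # forget the finished command
        third  = (cmd | getline c)   # the command is started again
        close(cmd)
        print first, second, third, a, (b == "" ? "b-unset" : b), c
    }'
}
demo_close
```

    The second read returns 0 and leaves b untouched; only after close()
    does the same command string start a fresh process.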

    > - can this script be improved or streamlined further?

    To prevent code duplication and errors I'd put the commands as strings
    and use, e.g.,
    date_cmd = "date '+%a %b %d %Y | %I:%M.%S %p'"
    date_cmd | getline
    close (date_cmd)
    (and similar for the other external commands, especially for those that
    need a close()).

    I'd use arguments for the functions, e.g. function sleep(sec), to make
    them more universally usable in case of extensions.
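
    Both suggestions might be combined along these lines. This is only a
    sketch assuming a POSIX awk; run() and demo_cmd_helper are hypothetical
    names, and "date +%Y" is used instead of the longer format to keep the
    shell quoting simple.

```shell
#!/bin/sh
# Hypothetical helper functions, not the OP's script.
demo_cmd_helper() {
    awk '
    # run(): returns the first output line of cmd, or "" on failure.
    function run(cmd,    line) {
        if ((cmd | getline line) <= 0)
            line = ""
        close(cmd)      # same string that opened the pipe, so no typos
        return line
    }
    # sleep(): the delay is an argument instead of being hard-coded.
    function sleep(sec) {
        return system("sleep " sec)   # system() waits; close() does not apply
    }
    BEGIN {
        date_cmd = "date +%Y"
        year = run(date_cmd)
        rc = sleep(0)
        ok = (year ~ /^[0-9]+$/) ? "year-ok" : "year-bad"
        print ok, rc
    }'
}
demo_cmd_helper
```

    Keeping the command in one variable means the string passed to close()
    cannot drift away from the string that opened the pipe.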

    And it's not obvious to me why all this shell functionality is embedded
    in an awk script, and what the awk code frame actually adds here.

    Janis


    > Regards,
    > jeorge

  • From jeorge@someone@invalid.invalid to comp.lang.awk on Sat Feb 25 12:34:20 2023
    From Newsgroup: comp.lang.awk

    Thanks for the feedback. Ya, embedding shell commands in AWK frame was
    just a practice thing mostly; probably not much of an additional burden
    on most modern computers.

    jeorge
  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Feb 26 15:43:34 2023
    From Newsgroup: comp.lang.awk

    On 2/25/2023 1:34 PM, jeorge wrote:
    > Thanks for the feedback. Ya, embedding shell commands in AWK frame was
    > just a practice thing mostly; probably not much of an additional burden
    > on most modern computers.

    Embedding shell commands in AWK introduces a massive burden on any
    computer, often turning tasks that should run in seconds or minutes into
    tasks that take hours or days to run, due to awk having to create a
    subshell each time it has to call such a command. Consider this with
    just 1000 lines of input:

    1) Call a GNU awk function to print the seconds since the epoch:

    $ time seq 1000 | awk '{print systime()}' >/dev/null

    real 0m0.040s
    user 0m0.000s
    sys 0m0.000s


    2) Embed a shell command to do the same thing:

    $ time seq 1000 | awk '{system("date +%s")}' >/dev/null

    real 0m29.628s
    user 0m0.420s
    sys 0m2.410s


    3) Doing the same thing in a shell loop (slow but still much faster
    than calling it from awk):

    $ time { seq 1000 | while IFS= read -r; do date +%s; done >/dev/null; }

    real 0m17.796s
    user 0m0.858s
    sys 0m2.198s


    Regards,

    Ed.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From jeorge@someone@invalid.invalid to comp.lang.awk on Sun Feb 26 20:06:47 2023
    From Newsgroup: comp.lang.awk

    On 2/26/23 2:43 PM, Ed Morton wrote:
    [...]

    Hmm, I guess my computer is a bit faster:

    $ time seq 1000 | awk '{print systime()}' >/dev/null
    real 0m0.004s
    user 0m0.003s
    sys 0m0.002s

    $ time seq 1000 | awk '{system("date +%s")}' >/dev/null
    real 0m0.836s
    user 0m0.782s
    sys 0m0.099s

    $ time { seq 1000 | while IFS= read -r; do date +%s; done >/dev/null; }
    real 0m0.826s
    user 0m0.676s
    sys 0m0.218s

    But I do take your point -- using systime() is over 200 times faster.
    Probably one simply uses something other than awk when things need to
    happen quickly/efficiently and awk lacks a built-in.

    Looking at my practice script some of the data could be pulled from
    /proc, i.e. load and uptime. Other things like core temp, pulled from
    sensors(1), or battery charge, pulled from upower(1), might not be too
    bad if only done every minute or so.
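
    For the load average, reading /proc directly might look like the sketch
    below. It assumes Linux's /proc/loadavg layout (three load figures first
    on one line) and a POSIX awk; load_from_file and the path parameter are
    hypothetical and only there so the function can be tried on any file.

```shell
#!/bin/sh
# Hypothetical sketch: read the load average without spawning uptime(1).
load_from_file() {
    # $1: path to a loadavg-style file; defaults to /proc/loadavg.
    awk -v path="${1:-/proc/loadavg}" '
    function load(file,    line, f) {
        if ((getline line < file) <= 0) {
            close(file)
            return "load: n/a"
        }
        close(file)     # so the next call re-reads fresh values
        split(line, f, " ")
        return "load: " f[1] ", " f[2] ", " f[3]
    }
    BEGIN { print load(path) }'
}
load_from_file
```

    No subshell is created here; getline reads the file from within awk.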

    Anyway, I appreciate the feedback. I should probably try to rein in my
    compulsion to over-apply awk as I learn more about it.

    jeorge
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Feb 27 07:23:48 2023
    From Newsgroup: comp.lang.awk

    On 27.02.2023 04:06, jeorge wrote:
    [...]

    Even though in this specific test case we only want the "seconds since
    Epoch" that systime() returns, in the general case, to be fair, we had
    better compare the time functions with formatting included; instead
    of

    $ time seq 100000 | awk '{print systime()}' >/dev/null

    better

    $ time seq 100000 | awk '{print strftime("%s")}' >/dev/null

    Otherwise we'd only measure the single specific case.

    And since we're at it; values of magnitude "0m0.004s" might measure
    just noise. To get a more accurate result the N of 'seq <N>' should
    be made larger.

    Janis

  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Mon Feb 27 07:26:26 2023
    From Newsgroup: comp.lang.awk

    In article <tth6o8$2o3p$1@nnrp.usenet.blueworldhosting.com>,
    jeorge <someone@invalid.invalid> wrote:
    > Embedding shell commands in AWK introduces a burden on small
    > computers, often turning tasks that should run in micro-seconds into
    > tasks that take seconds or minutes to run, due to awk having to create a
    > subshell each time it has to call such a command.

    I have made some edits to the above text, to better reflect modern reality.
    ...

    > Looking at my practice script some of the data could be pulled from
    > /proc, i.e. load and uptime. Other things like core temp, pulled from
    > sensors(1), or battery charge, pulled from upower(1), might not be too
    > bad if only done every minute or so.

    Given what I think I understand about your task(s), it'd probably be better
    to just write it as a (bash) shell script. Modern bash has most of what
    you need to do real scripting. About the only thing missing is decimal
    (aka, floating point) arithmetic, and this can usually be easily done
    via a "bc" co-process (or, in a pinch, a call to awk).

    Note that many, but not all, of the things you list can be done "natively"
    in gawk, using direct access to things in /proc and/or /sys, but it is
    often easier and clearer to use the access tools mentioned above (the
    various things listed with (1) after their names).
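
    The "call to awk" fallback for floating point arithmetic could be
    wrapped like this. It is a sketch only, assuming a POSIX sh and awk;
    fdiv is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: floating point division in a shell script via awk.
fdiv() {
    # $1: dividend, $2: divisor; prints the quotient with two decimals.
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}
fdiv 7 2
```

    Each call starts one awk process, so this suits occasional use rather
    than tight loops.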
    --
    Never, ever, ever forget that "Both sides do it" is strictly a Republican meme.

    It is always the side that sucks that insists on saying "Well, you suck, too".

  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Mon Feb 27 09:08:45 2023
    From Newsgroup: comp.lang.awk

    On 27.02.2023 08:26, Kenny McCormack wrote:
    > In article <tth6o8$2o3p$1@nnrp.usenet.blueworldhosting.com>,
    > jeorge <someone@invalid.invalid> wrote:

    [...]

    > Given what I think I understand about your task(s), it'd probably be
    > better to just write it as a (bash) shell script.

    Yeah, that's what I also thought when I upthread asked for a rationale
    of using awk as technical frame for a shell task. - I read the OP's
    statements as if he's just experimenting with awk.

    > Modern bash has most of what
    > you need to do real scripting. About the only thing missing is decimal
    > (aka, floating point) arithmetic, and this can usually be easily done
    > via a "bc" co-process (or, in a pinch, a call to awk).

    Note that the shell features are minimalistic here, so any standard
    POSIX shell will do, and then you can use ksh (instead of bash) to
    also do the FP arithmetic in the shell and avoid clumsy and inefficient
    workarounds with external processes.

    Janis


    [...]


  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Wed Mar 8 11:43:41 2023
    From Newsgroup: comp.lang.awk

    On Monday, February 27, 2023 at 3:08:47 AM UTC-5, Janis Papanagnou wrote:
    […]
    If you only care about Unix epochs numerically, and don't mind constantly
    resetting your rand() seed, then here's one way to extract them within
    just about any awk, even those without systime():
    for _____ in 1; do (____='($!NF = sprintf("%.0s%.*f", srand(), ___ = ( (__=srand()) ~ "#" ) * 6, substr(__, ++___)))^_'; for ___ in 'gawk -P' 'gawk -c' 'gawk -M' 'gawk -l time' 'gawk -be' nawk mawk1 mawk2 ; do echo " $___ ::\n\n$( (time ( jot 1000 | $(printf '%s' "$___" ) "$____" ) | gcat -b ) | gtail -n 3 )\n"; done; gawk -p- "$____" <<<'' ) done | gsed -zE 's/\t/ /g; s/ /. /g'
    ( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.01s user 0.00s system 88% cpu 0.018 total
    gcat -b 0.00s user 0.00s system 7% cpu 0.018 total
    ( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.01s user 0.00s system 90% cpu 0.017 total
    gcat -b 0.00s user 0.00s system 9% cpu 0.016 total
    ( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.48s user 0.01s system 99% cpu 0.484 total
    gcat -b 0.00s user 0.00s system 0% cpu 0.484 total
    gawk: warning: The time extension is obsolete. Use the timex extension from gawkextlib instead.
    ( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.01s user 0.00s system 89% cpu 0.017 total
    gcat -b 0.00s user 0.00s system 8% cpu 0.017 total
    ( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.01s user 0.00s system 91% cpu 0.016 total
    gcat -b 0.00s user 0.00s system 9% cpu 0.016 total
    ( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.00s user 0.00s system 102% cpu 0.007 total
    gcat -b 0.00s user 0.00s system 20% cpu 0.007 total
    ( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.00s user 0.00s system 106% cpu 0.004 total
    gcat -b 0.00s user 0.00s system 41% cpu 0.004 total
    ( jot 1000 | $(printf '%s' "$___" ) "$____"; ) 0.00s user 0.00s system 103% cpu 0.006 total
    gcat -b 0.00s user 0.00s system 22% cpu 0.006 total
    gawk -P ::
    . 998. . 1678303956
    . 999. . 1678303956
    . 1000. . 1678303956
    gawk -c ::
    . 998. . 1678303956
    . 999. . 1678303956
    . 1000. . 1678303956
    gawk -M ::
    . 998. . 1678303956
    . 999. . 1678303956
    . 1000. . 1678303956
    gawk -l time ::
    . 998. . 1678303956
    . 999. . 1678303956
    . 1000. . 1678303956
    gawk -be ::
    . 998. . 1678303956
    . 999. . 1678303956
    . 1000. . 1678303956
    nawk ::
    . 998. . 1678303956
    . 999. . 1678303956
    . 1000. . 1678303956
    mawk1 ::
    . 998. . 1678303956
    . 999. . 1678303956
    . 1000. . 1678303956
    mawk2 ::
    . 998. . 1678303956.854256
    . 999. . 1678303956.854260
    . 1000. . 1678303956.854263
    1678303956
    . . # gawk profile, created Wed Mar. 8 14:32:36 2023
    . . # Rule(s)
    . . 1. ($! NF = sprintf("%.0s%.*f", srand(), ___ = ((__ = srand()) ~ "#") * 6, substr(__, ++___))) ^ _ { # 1
    . . 1. . . print
    . . }
    ** srand() needs to be called twice since it only returns the previous seed
    ** mawk2 uniquely provides micro-second precision for platforms that support it. Floor it or round it to align with the others.