• The Art of Unix Programming - Case Study: awk

    From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Sun Jan 30 22:02:40 2022
    From Newsgroup: comp.lang.awk

    I accidentally stumbled across the book "The Art of UNIX Programming"
    (2004), by Eric S. Raymond. It has a chapter on Awk (about one and a
    half page long). I was a bit astonished about quite some statements, valuations, and conclusions. (And not only in the light of a recent
    Fosslife article that Arnold informed us about here in c.l.a in May
    2021.)

    Here are two paragraphs quoted from the book. I'm interested in your
    opinions.

    " The awk language was originally designed to be a small,
    expressive special-purpose language for report generation.
    Unfortunately, it turns out to have been designed at a bad
    spot on the complexity-vs.-power curve. The action language
    is noncompact, but the pattern-driven framework it sits
    inside keeps it from being generally applicable — that’s the
    worst of both worlds. And the new-school scripting languages
    can do anything awk can; their equivalent programs are
    usually just as readable, if not more so. "

    " For a few years after the release of Perl in 1987, awk
    remained competitive simply because it had a smaller, faster
    implementation. But as the cost of compute cycles and memory
    dropped, the economic reasons for favoring a special-purpose
    language that was relatively thrifty with both lost their
    force. Programmers increasingly chose to do awklike things
    with Perl or (later) Python, rather than keep two different
    scripting languages in their heads. By the year 2000 awk had
    become little more than a memory for most old-school Unix
    hackers, and not a particularly nostalgic one. "


    Janis
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Sun Jan 30 21:43:32 2022
    From Newsgroup: comp.lang.awk

    In article <st6udg$k03$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
    I accidentally stumbled across the book "The Art of UNIX Programming"
    (2004), by Eric S. Raymond. It has a chapter on Awk (about one and a
    half page long). I was a bit astonished about quite some statements, valuations, and conclusions. (And not only in the light of a recent
    Fosslife article that Arnold informed us about here in c.l.a in May
    2021.)

    Here are two paragraphs quoted from the book. I'm interested in your opinions.

    Obviously, this guy is full of crap.

    That's not as uncommon a situation (even in those we are supposed to admire
    and hold up as heroes) as we'd like it to be.
    --
    The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
    http://user.xmission.com/~gazelle/Sigs/Pedantic
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Jan 30 15:48:51 2022
    From Newsgroup: comp.lang.awk

    On 1/30/2022 3:02 PM, Janis Papanagnou wrote:
    I accidentally stumbled across the book "The Art of UNIX Programming"
    (2004), by Eric S. Raymond. It has a chapter on Awk (about one and a
    half page long). I was a bit astonished about quite some statements, valuations, and conclusions. (And not only in the light of a recent
    Fosslife article that Arnold informed us about here in c.l.a in May
    2021.)

    Here are two paragraphs quoted from the book. I'm interested in your opinions.

    " The awk language was originally designed to be a small,
    expressive special-purpose language for report generation.
    Unfortunately, it turns out to have been designed at a bad
    spot on the complexity-vs.-power curve. The action language
    is noncompact, but the pattern-driven framework it sits
    inside keeps it from being generally applicable — that’s the
    worst of both worlds. And the new-school scripting languages
    can do anything awk can; their equivalent programs are
    usually just as readable, if not more so. "

    " For a few years after the release of Perl in 1987, awk
    remained competitive simply because it had a smaller, faster
    implementation. But as the cost of compute cycles and memory
    dropped, the economic reasons for favoring a special-purpose
    language that was relatively thrifty with both lost their
    force. Programmers increasingly chose to do awklike things
    with Perl or (later) Python, rather than keep two different
    scripting languages in their heads. By the year 2000 awk had
    become little more than a memory for most old-school Unix
    hackers, and not a particularly nostalgic one. "


    Janis

    Sounds like he completely missed the point on how and why to use awk, misunderstood the huge benefits of a tiny language that doesn't have a
    million constructs to do things "compactly", and is unaware of current
    awk usage which, if the questions posted on StackOverflow and
    StackExchange are any indication, is thriving.

    Ed.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Sun Jan 30 21:59:40 2022
    From Newsgroup: comp.lang.awk

    In article <st70q4$3r4c6$1@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    In article <st6udg$k03$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
    I accidentally stumbled across the book "The Art of UNIX Programming" (2004), by Eric S. Raymond. It has a chapter on Awk (about one and a
    half page long). I was a bit astonished about quite some statements, valuations, and conclusions. (And not only in the light of a recent Fosslife article that Arnold informed us about here in c.l.a in May
    2021.)

    Here are two paragraphs quoted from the book. I'm interested in your opinions.

    Obviously, this guy is full of crap.

    That's not as uncommon a situation (even in those we are supposed to admire and hold up as heroes) as we'd like it to be.

    It's funny in particular, since he mentions the power-complexity curve, and
    I always thought that was AWK's main strength - that's the thing I always liked about it - that it was perfectly situated on that curve. You can do really cool things in AWK w/o having to spend lots of time bowing down to the gods
    of the language. I.e., with AWK, you can sit down and start writing your algorithm w/o having to spend lots of time writing boilerplate code to get started, as with most other languages.
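
    (To make that concrete with a trivial sketch - the file name is just a
    placeholder: summing a column is already a complete, runnable program,
    with the input loop, field splitting and variable initialization all
    implicit.

        awk '{ sum += $1 } END { print sum }' numbers.txt

    Try getting that far in most other languages without writing any setup
    code first.)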

    The problem really is, as with everything - ya always gotta push the new stuff. Whether we're talking about books, movies, TV, music, programming languages, whatever. You always have to be pushing the new stuff and disparaging the old. In fact, I read something recently that most of the interest in music these days is in old music and this is viewed as a
    certified Bad Thing, by people who need people to be interested in (and, of course, buying and supporting) new music in order to keep the economic
    engine running.
    --
    Reading any post by Fred Hodgin, you're always faced with the choice of:
    lunatic, moron, or troll.

    I always try to be generous and give benefit of the doubt, by assuming troll.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Sun Jan 30 23:23:59 2022
    From Newsgroup: comp.lang.awk

    On 30.01.2022 22:59, Kenny McCormack wrote:
    In article <st70q4$3r4c6$1@news.xmission.com>,
    Kenny McCormack <gazelle@shell.xmission.com> wrote:
    In article <st6udg$k03$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
    I accidentally stumbled across the book "The Art of UNIX Programming"
    (2004), by Eric S. Raymond. It has a chapter on Awk (about one and a
    half page long). I was a bit astonished about quite some statements,
    valuations, and conclusions. (And not only in the light of a recent
    Fosslife article that Arnold informed us about here in c.l.a in May
    2021.)

    Here are two paragraphs quoted from the book. I'm interested in your
    opinions.

    Obviously, this guy is full of crap.

    That's not as uncommon a situation (even in those we are supposed to admire and hold up as heroes) as we'd like it to be.

    It's funny in particular, since he mentions the power-complexity curve, and
    I always thought that was AWK's main strength - that's the thing I always liked about it - that it was perfectly situated on that curve.

    Yep. Exactly this was where I just thought: "WHAT?" (or rather "WTF?").
    This ratio is what I also regularly communicate as big strength of Awk.

    I read that paragraph two or three times to understand Eric's mindset; obviously he takes a general-purpose language as the yardstick. I would say (as Ed also formulated it) that he missed the point.

    I also stumbled over the claim that other languages (Perl and Python
    being the only ones mentioned) are supposedly the preferable
    alternatives ("are usually just as readable, if not more so").

    You can do really
    cool things in AWK w/o having to spend lots of time bowing down to the gods of the language. I.e., with AWK, you can sit down and start writing your algorithm w/o having to spend lots of time writing boilerplate code to get started, as with most other languages.

    The problem really is, as with everything - ya always gotta push the new stuff. Whether we're talking about books, movies, TV, music, programming languages, whatever. You always have to be pushing the new stuff and disparaging the old. [...]

    It's interesting that in the same book (200 pages earlier), in chapter
    1.2, "The Durability of Unix", he points to the strengths of Unix and
    why Unix persisted (and even got stronger).

    Schizophrenic.

    Janis

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Sun Jan 30 23:26:59 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    I accidentally stumbled across the book "The Art of UNIX Programming"
    (2004), by Eric S. Raymond. It has a chapter on Awk (about one and a
    half page long). I was a bit astonished about quite some statements, valuations, and conclusions. (And not only in the light of a recent
    Fosslife article that Arnold informed us about here in c.l.a in May
    2021.)

    Here are two paragraphs quoted from the book. I'm interested in your opinions.

    " The awk language was originally designed to be a small,
    expressive special-purpose language for report generation.
    Unfortunately, it turns out to have been designed at a bad
    spot on the complexity-vs.-power curve. The action language
    is noncompact, but the pattern-driven framework it sits
    inside keeps it from being generally applicable — that’s the
    worst of both worlds. And the new-school scripting languages
    can do anything awk can; their equivalent programs are
    usually just as readable, if not more so. "

    " For a few years after the release of Perl in 1987, awk
    remained competitive simply because it had a smaller, faster
    implementation. But as the cost of compute cycles and memory
    dropped, the economic reasons for favoring a special-purpose
    language that was relatively thrifty with both lost their
    force. Programmers increasingly chose to do awklike things
    with Perl or (later) Python, rather than keep two different
    scripting languages in their heads. By the year 2000 awk had
    become little more than a memory for most old-school Unix
    hackers, and not a particularly nostalgic one. "

    I think there's some truth in this. I don't like the quasi-scientific
    way it's put -- I'll bet ESR has no measurements of complexity or
    power -- but the story matches my experience of people moving away from
    AWK.

    As someone else has said here, there's a lot to be said for a small
    language, but that advantage starts to drain away as soon as you are
    forced to bite the bullet of using a bigger one (whatever that really
    means). A huge driver of this for Perl was CPAN. Perl had publicly
    shared modules so you could knock up something to parse out bits of
    HTML, process emails and so on in just an hour or so. And you could
    avoid name clashes quite easily. At the time (and maybe to this day)
    you shared AWK code by literal copying of text into your script, hoping
    that no name clashes would cause trouble.

    Of course, AWK was not designed for things like HTML, but once you know
    enough Perl to do the one project that needed it, it's right there for
    the next one, even if AWK would do that one just as well. In fact, even
    if AWK can do the next project /better/, because keeping two scripting languages in your head is not as easy as keeping one there.
    --
    Ben.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Mon Jan 31 11:02:08 2022
    From Newsgroup: comp.lang.awk

    On 31.01.2022 00:26, Ben Bacarisse wrote:

    As someone else has said here, there's a lot to be said for a small
    language, but that advantage starts to drain away as soon as you are
    forced to bite the bullet of using a bigger one (whatever that really
    means). A huge driver of this for Perl was CPAN. Perl had publicly
    shared modules so you could knock up something to parse out bits of
    HTML, process emails and so on in just an hour or so. And you could
    avoid name clashes quite easily. At the time (and maybe to this day)
    you shared AWK code by literal copying of text into your script, hoping
    that no name clashes would cause trouble.

    I am skeptical about that. Aren't you essentially drawing the picture
    of "featureitis" - feature driven language enhancements? It seems to
    me that often hype starts and initially fosters a new language, and
    fans continue using these languages for things initially unintended,
    so that the application domain is expanded step by step, and (if done
    in the right way) by libraries, successfully (in a way). The result
    appears to be an asymptotic evolution toward general-purpose languages, or something that's intended as one; often ignoring sophisticated design (Javascript comes to my mind)[*]. But the point is, in my opinion, that
    the original intent to be a small language that covers only a special
    domain gets lost. The schizophrenic thing - also just in my opinion -
    is that it seems contrary to Unix-Philosophy, the separation of duties
    and keeping tools small and specialized - incidentally also described
    by ESR in that book extensively. It's one thing if folks are fans of
    some new language (because of its design, features, applicability to
    their domain, or a good marketing division, etc.) and focus on that;
    that's fine. It's another thing when powerful special-purpose languages
    - I consider awk one of them - are dismissed because they are not
    general-purpose monoliths or feature-full toolboxes.[**]

    Janis

    [*] This is different from language designs as we know it e.g. from
    the 1960's (e.g. Simula, Algol) or even later (e.g. C++), especially
    when standards-driven.

    [**] Incidentally GNU Awk opens that path with its Extension Library,
    without actually taking it.

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Mon Jan 31 17:15:56 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 31.01.2022 00:26, Ben Bacarisse wrote:

    As someone else has said here, there's a lot to be said for a small
    language, but that advantage starts to drain away as soon as you are
    forced to bite the bullet of using a bigger one (whatever that really
    means). A huge driver of this for Perl was CPAN. Perl had publicly
    shared modules so you could knock up something to parse out bits of
    HTML, process emails and so on in just an hour or so. And you could
    avoid name clashes quite easily. At the time (and maybe to this day)
    you shared AWK code by literal copying of text into your script, hoping
    that no name clashes would cause trouble.

    I am skeptical about that. Aren't you essentially drawing the picture
    of "featureitis" - feature driven language enhancements?

    I don't think so because I don't think I'm talking about language
    enhancements.

    It seems to
    me that often hype starts and initially fosters a new language, and
    fans continue using these languages for things initially unintended,
    so that the application domain is expanded step by step, and (if done
    in the right way) by libraries, successfully (in a way). The result
    appears to be an asymptotic evolution toward general-purpose languages, or something that's intended as one; often ignoring sophisticated design (Javascript comes to my mind)[*].

    I must be having a bad day because I don't follow. I was describing how
    a lot of people I know transitioned away from AWK. It started with a
    task that AWK was not good at (I'm not blaming AWK here) but then they
    have two options for the next task.

    I don't think any of the people I am thinking of were subject to hype.
    For one thing, I'm talking about the late 80s and early 90s. There
    really wasn't much "hype" about scripting languages. You just used
    whatever tools suited the task.

    But the point is, in my opinion, that
    the original intent to be a small language that covers only a special
    domain gets lost. The schizophrenic thing - also just in my opinion -
    is that it seems contrary to Unix-Philosophy, the separation of duties
    and keeping tools small and specialized -; incidentally also described
    by ESR in that book extensively.

    This is why keeping AWK simple and narrowly focused is good. But that
    will inevitably lead people to find alternatives for some tasks, and
    that is a danger (if you want to look at it like a competition) because
    it opens the door to using these other alternatives in the future.
    --
    Ben.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kaz Kylheku@480-992-1380@kylheku.com to comp.lang.awk on Mon Jan 31 18:55:02 2022
    From Newsgroup: comp.lang.awk

    On 2022-01-31, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:
    On 31.01.2022 00:26, Ben Bacarisse wrote:

    As someone else has said here, there's a lot to be said for a small
    language, but that advantage starts to drain away as soon as you are
    forced to bite the bullet of using a bigger one (whatever that really
    means). A huge driver of this for Perl was CPAN. Perl had publicly
    shared modules so you could knock up something to parse out bits of
    HTML, process emails and so on in just an hour or so. And you could
    avoid name clashes quite easily. At the time (and maybe to this day)
    you shared AWK code by literal copying of text into your script, hoping
    that no name clashes would cause trouble.

    I am skeptical about that. Aren't you essentially drawing the picture
    of "featureitis" - feature driven language enhancements? It seems to
    me that often hype starts and initially fosters a new language, and
    fans continue using these languages for things initially unintended,

    Perl had enough capabilities in the core language that someone could
    write, say, a useable interface module to some RDBMS. Or HTTP serving or querying or whatever.

    For some people, not having such a ready-made module will be a
    deal-breaker. And that's even if it *can* be written in some language
    they are considering. Most of those things can't be written in Awk
    because it has insufficient system access.

    [**] Incidentally GNU Awk opens that path with its Extension Library,
    without actually taking it.

    This is probably too late, and in an awkward form. Leading scripting languages nowadays have an FFI (foreign function interface), whereby you
    can bind to shared libraries without having to compile (let alone write)
    any C code, just using FFI statements in the scripting language.

    Lisps had this kind of FFI going back to at least the 1980's. E.g.
    DEC's VaxLisp.

    Here is a documentation reference: VAX LISP VMS System Access
    Programming Guide, May 1986:

    http://www.softwarepreservation.org/projects/LISP/common_lisp_family/dec/VAX_LISP_VMS_System_Access_Programming_Guide_May86.pdf

    There are examples in section 2.10, like calling an arc cosine function
    in a math library:

    (DEFINE-EXTERNAL-ROUTINE (MTH$ACOSD
                               :FILE "MTHRTL"
                               :RESULT (:LISP-TYPE SINGLE-FLOAT
                                        :VAX-TYPE :F-FLOATING))
      "This routine returns the arc cosine
       of an angle in degrees."
      (X :LISP-TYPE SINGLE-FLOAT
         :VAX-TYPE :F-FLOATING))

    That's it: bind to the MTH$ACOSD symbol in the MTHRTL library
    (RTL == run-time library, likely). The lisp and external types
    of the parameter and return value are specified, and all the conversion
    happens automatically.

    After the above DEFINE-EXTERNAL-ROUTINE incantation, it's ready for use:

    Lisp> (CALL-OUT MTH$ACOSD 0.5)
    60.0

    OK, that's either state of the art for 1986, or else a lower bound for what
    was state of the art.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Tue Feb 1 16:59:14 2022
    From Newsgroup: comp.lang.awk

    On 31.01.2022 18:15, Ben Bacarisse wrote:
    [ keep things simple and specialized Unix philosophy ]

    This is why keeping AWK simple and narrowly focused is good. But that
    will inevitably lead people to find alternatives for some tasks, and
    that is a danger (if you want to look at it like a competition) because
    it opens the door to using these other alternatives in the future.

    Well, I think it's okay to find a language better suited for a given
    task. Certainly better than if every [simple] language gets enhanced
    only for competition purposes (which is no argument for me).

    In that light I cannot understand how ESR came to the statements that
    I quoted in my OP.

    Janis

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Tue Feb 1 16:43:44 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 31.01.2022 18:15, Ben Bacarisse wrote:
    [ keep things simple and specialized Unix philosophy ]

    This is why keeping AWK simple and narrowly focused is good. But that
    will inevitably lead people to find alternatives for some tasks, and
    that is a danger (if you want to look at it like a competition) because
    it opens the door to using these other alternatives in the future.

    Well, I think it's okay to find a language better suited for a given
    task. Certainly better than if every [simple] language gets enhanced
    only for competition purposes (which is no argument for me).

    In that light I cannot understand how ESR came to the statements that
    I quoted in my OP.

    The problem is cognitive load. Not everyone can (or wants to) keep
    multiple general-purpose programming languages, several scripting
    languages and a couple of shell languages in their head all the time.

    To my mind, where ESR errs is in thinking of AWK as a scripting language
    at all. If you think of it as a tool for manipulating line-oriented
    text files to be used alongside Unix's other tools like grep, cut, sort,
    uniq then you probably won't mind the space it takes up in your head.

    The sweet spot is a simple, easy to remember language, that can do 95%
    of the tasks you want to script. ESR thinks AWK misses that sweet spot
    and that that explains why it is not much used these days. You need a particular computing environment to see exactly where AWK fits in.
    --
    Ben.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kaz Kylheku@480-992-1380@kylheku.com to comp.lang.awk on Tue Feb 1 22:21:06 2022
    From Newsgroup: comp.lang.awk

    On 2022-02-01, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    To my mind, where ESR errs is in thinking of AWK as a scripting language
    at all. If you think of it as a tool for manipulating line-oriented
    text files to be used alongside Unix's other tools like grep, cut, sort,
    uniq then you probably won't mind the space it takes up in your head.

    Where ESR errs is believing that Awk is a language he actually knows.

    Otherwise he'd know that you can use the "curly brace dialect" without
    the pattern-condition framework, other than a BEGIN clause.

    function helper()
    {
    }

    function main()
    {
    helper();
    }

    BEGIN { main(); }

    Awk turns off the pattern-action framework when there are no patterns
    and actions other than BEGIN.

    He said some pretty quirky things about Lisp also, like it being
    useful for some profound enlightenment "once you get it" that will
    magically make you a better programmer for the rest of your days,
    without you having to learn anything about Lisp or continue using
    it beyond that one "eureka" moment.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Wed Feb 2 03:02:21 2022
    From Newsgroup: comp.lang.awk

    Kaz Kylheku <480-992-1380@kylheku.com> writes:

    On 2022-02-01, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    To my mind, where ESR errs is in thinking of AWK as a scripting language
    at all. If you think of it as a tool for manipulating line-oriented
    text files to be used alongside Unix's other tools like grep, cut, sort,
    uniq then you probably won't mind the space it takes up in your head.

    Where ESR errs is believing that Awk is a language he actually knows.

    Otherwise he'd know that you can use the "curly brace dialect" without
    the pattern-condition framework, other than a BEGIN clause.

    function helper()
    {
    }

    function main()
    {
    helper();
    }

    BEGIN { main(); }

    Awk turns off the pattern-action framework when there are no patterns
    and actions other than BEGIN.

    I'll take your word that he did not know this. But how does this weaken
    what he was saying? The "curly brace dialect" of AWK is hardly a better
    AWK. It's AWK without the most convenient part (for most tasks).
    --
    Ben.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kaz Kylheku@480-992-1380@kylheku.com to comp.lang.awk on Wed Feb 2 06:29:33 2022
    From Newsgroup: comp.lang.awk

    On 2022-02-02, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    Kaz Kylheku <480-992-1380@kylheku.com> writes:

    On 2022-02-01, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    To my mind, where ESR errs is in thinking of AWK as a scripting language at all. If you think of it as a tool for manipulating line-oriented
    text files to be used alongside Unix's other tools like grep, cut, sort,
    uniq then you probably won't mind the space it takes up in your head.

    Where ESR errs is believing that Awk is a language he actually knows.

    Otherwise he'd know that you can use the "curly brace dialect" without
    the pattern-condition framework, other than a BEGIN clause.

    function helper()
    {
    }

    function main()
    {
    helper();
    }

    BEGIN { main(); }

    Awk turns off the pattern-action framework when there are no patterns
    and actions other than BEGIN.

    I'll take your word that he did not know this.

    At that time he wrote incorrect statements, which he apparently hasn't
    subjected to errata.

    ESR> Programs in awk consist of pattern/action pairs.

    Contradicted by my example above: function definitions are not
    pattern/action pairs.

    ESR> Each pattern is a regular expression, [...]

    Nope.
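
    Patterns don't have to be regexes at all; they can be arbitrary
    expressions or ranges. A small sketch (the field meanings are invented):

        NR > 1 && $3 > 100   { over++ }    # expression pattern, no regex involved
        /BEGIN/, /END/       { print }     # range pattern
        END                  { print over+0, "records over the limit" }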

    ESR> The action language is noncompact, but the pattern-driven
    framework it sits inside keeps it from being generally applicable

    Whatever.

    But how does this weaken
    what he was saying? The "curly brace dialect" of AWK is hardly a better
    AWK.

    I think that by "action language" he means that stuff written between
    the curly braces. It is supposedly trapped in this pattern/action DSL and
    so cannot be used to just write normal programs.

    It's AWK without the most convenient part (for most tasks).

    But it's not exactly/entirely without it, because something like

    { print $1 + $2 }

    can be coded explicitly in the "action language":

    function main()
    {
        while (getline > 0) {
            print $1 + $2
        }
    }

    BEGIN { main() }

    Call getline to do the input and field splitting, then
    code your own loop around it, using if for the conditions.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Jeremy Brubaker@jbrubake.362@orionarts.invalid to comp.lang.awk on Thu Feb 3 15:31:45 2022
    From Newsgroup: comp.lang.awk

    On 2022-02-01, Kaz Kylheku wrote:
    On 2022-02-01, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    To my mind, where ESR errs is in thinking of AWK as a scripting language
    at all. If you think of it as a tool for manipulating line-oriented
    text files to be used alongside Unix's other tools like grep, cut, sort,
    uniq then you probably won't mind the space it takes up in your head.

    Where ESR errs is believing that Awk is a language he actually knows.

    Otherwise he'd know that you can use the "curly brace dialect" without
    the pattern-condition framework, other than a BEGIN clause.

    function helper()
    {
    }

    function main()
    {
    helper();
    }

    BEGIN { main(); }

    Awk turns off the pattern-action framework when there are no patterns
    and actions other than BEGIN.

    I recently figured this out and it made me appreciate awk more. I found
    it a good language for processing text files even beyond the standard
    case where each line is the same record type.

    By embracing awk's capabilities as a more robust language I could solve problems with text file input fairly quickly by combining the standard pattern-action method with the function-oriented approach Kaz mentions
    above.
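
    A rough sketch of what I mean (the record layout here is made up): the
    pattern-action rules only dispatch on each record's type, while the
    actual work lives in ordinary functions.

        function handle_header(   n) {
            n = split($0, f, ";")
            print "header with", n, "fields"
        }
        function handle_data() { total += $3 }

        /^HDR;/ { handle_header() }
        /^DAT;/ { handle_data() }
        END     { print "total:", total }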

    Awk isn't good for everything, but I'm glad I have a better appreciation
    of it.
    --
    () www.asciiribbon.org | Jeremy Brubaker
    /\ - against html mail | јЬruЬаkе@оrіоnаrtѕ.іо / neonrex on IRC

    I Know A Joke!!
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Olaf Schultz@o.schultz@enhydralutris.de to comp.lang.awk on Fri Feb 4 20:41:15 2022
    From Newsgroup: comp.lang.awk

    Am 30.01.22 um 22:02 schrieb Janis Papanagnou:
    ..

    Just my 5 ct:
    With programming languages I started with BASIC and switched in approx.
    1989 to AutoLISP.

    I've been using awk since 1996, at first for data conversion. Since then
    it has been my coding language 99.x percent of the time (besides LaTeX ;-)
    Now mainly for manipulation of FE pre- and post-processing (Nastran, Abaqus...) on Unix/Linux. But also for data processing at home
    (measurement, conversion...)

    Other colleagues tend to use python or TCL (as the Calculation/FEM-Tools
    are using this as interface-language).

    The underestimated smoothness of awk is: Piping from the command line
    for very little helpers... Oh, it gets too complex? Go with the same
    language to a larger piece of code in an editor.

    In a few cases we made benchmarks of coding (execution speed, readability, and coding speed...); awk was not the slowest (compared with perl, python).

    So I try to encourage new colleagues to use awk as often as I can.

    Olaf

    PS: An awk a day keeps the python away;-)

    PPS: And a careful look at code and files I receive at work from
    people I don't know shows I'm not the only awk user there. So that language is
    clearly not dead, it's underestimated.

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Sat Feb 5 13:45:21 2022
    From Newsgroup: comp.lang.awk

    On 04.02.2022 20:41, Olaf Schultz wrote:

    So I try to encourage new colleagues to use awk as often as I can.

    And sometimes curious colleagues ask. I recall having done
    some data evaluations, a couple of awk scripts (1-4 liners), and when
    they saw how cute solutions with awk can be they immediately asked
    for a presentation.[*]

    Janis

    [*] A similar reaction, BTW, I once had when doing editing with vi;
    some Unix tools are just brilliant.

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Tue Feb 8 16:28:08 2022
    From Newsgroup: comp.lang.awk

    all i can say is that eric raymond definitely exists in his own fantasy land where he equates unnecessary complexity with virtue, and treats bloat as a feature.

    if he could tell me what to type in python3 to apply comma-formatting to a single 526,824,456-digit integer in less than 6.24 seconds, great i'll heed his advice and switch over.

    until such time, i'll trust the raw power of awk above all.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Tue Feb 8 16:49:48 2022
    From Newsgroup: comp.lang.awk

    i'm an ultra late-comer to awk - only discovering it in 2017-2018. and the moment i found it, i realized nearly all else - perl R python java C# - can be thrown straight into the toilet, if performance is a key criterion for the task at hand
    unwashed masses still think awk is for benchmarking against perl or python. i skip those and directly benchmark my codes against compiled C-code binaries, which awk is very competitive against - rather remarkable for something largely, if not entirely, single threaded.
    the analogy i would use is python being an artist with every brush and every color, pre-mixed, every paint type, and tasked to draw on a 3x2 canvas.
    awk would be an artist with only 2 brushes, 1 type of paint, and only the 3 basic colors - even getting to orange, green, and purple would require manual mixing by the painter herself... and the only thing constraining her from fully expressing her talents in ceiling murals,
    would be the height of the Sistine Chapel itself.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Axel Reichert@mail@axel-reichert.de to comp.lang.awk on Wed Feb 9 08:49:59 2022
    From Newsgroup: comp.lang.awk

    Kpop 2GM <jason.cy.kwan@gmail.com> writes:

    i'm an ultra late-comer to awk - only discovering it in 2017-2018. and
    the moment i found it, i realized nearly all else - perl R python java
    C# - can be thrown straight into the toilet, if performance is a key
    criteria for the task at a hand

    I would rather go for TCW (Total Cost of Wizardry): A competent Python programmer once consulted me on performance tuning for an (ASCII data
    mangling) script he had written (which took him about 30 min). It was
    running for 10 min already, with no end in sight according to a monitor on the (transformed) output. After he had explained the task at hand, I replied
    that I would not use Python, but rather some Unix command line tools. I
    started immediately, cobbled something together (awk featured
    prominently among other usual suspects, such as tr, sed, cut, grep). It delivered the desired results before his Python script was finished. So
    the final tally was "10 min" versus "> 30 min + 10 min + 10 min".

    Once the logic becomes more intricate, I will usually go for Python
    though, so I will use awk mostly for command line use, rarely as a file
    to be run by "awk -f".

    I was also a late-comer to this tool. When I started to learn Perl in
    the late 90s, I learned that it was a superset of sed and awk (coming
    even with conversion scripts), and so I gave the older tools another try
    (the "man" pages were completely incomprehensible to me before, I could
    not wrap my head around stream processing). Once it clicked, I rarely
    used Perl anymore.

    Same goes for spreadsheet tools, for which I also seldom feel the need.

    Best regards

    Axel
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Wed Feb 9 16:36:06 2022
    From Newsgroup: comp.lang.awk

    On 09.02.2022 08:49, Axel Reichert wrote:
    [ about an ASCII data mangling Python script ]
    [....] I started immediately, cobbled something together (awk featured prominently among other usual suspects, such as tr, sed, cut, grep).

    Hmm.. - these four tools are amongst those where I usually say; instead
    of connecting and running a lot of such processes use just one instance
    of awk. The functions expressed in those tools are - modulo a few edge
    cases - basics in Awk and part of its core.
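
    As a toy illustration (the log format is invented), a cascade like

        grep 'ERROR' app.log | cut -d' ' -f5 | sort | uniq -c

    collapses (modulo output ordering) into a single awk process, which can
    additionally keep state such as a running total:

        awk '/ERROR/ { count[$5]++; total++ }
             END     { for (k in count) print count[k], k; print "total:", total }' app.log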

    (And as an essential plus; you can keep state information in the awk
    instance where managing state between the first and the last process in
    a pipeline is cumbersome, to say the least, sometimes "impossible", and
    usually inefficient. - But I am digressing.)

    Janis

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Wed Feb 9 10:56:52 2022
    From Newsgroup: comp.lang.awk

    (awk featured
    prominently among other usual suspects, such as tr, sed, cut, grep). It delivered the desired results before his Python script was finished. So
    the final tally was "10 min" versus "> 30 min + 10 min + 10 min".

    Once the logic becomes more intricate, I will usually go for Python
    though, so I will use awk mostly for command line use, rarely as a file
    to be run by "awk -f".
    funny you mentioned "the usual suspects". Here's a test benchmark attempting a bare-bones replication of the unix utility [ wc ], run against both GNU wc ("gwc") and BSD wc ("wc") :
    Obviously this isn't a full UTF8 validator to deal with all the edge cases of non-UTF8-compliant input, but assuming the input is already known to be UTF8-valid text, even after setting locale to LC_ALL=C, i.e. byte-level only and not UTF8-aware, to count rows, UTF8 characters, and bytes of a 1.84 GB input file,
    when compared against 17.9 secs of BSD wc and 23.3 secs of GNU wc,
    — gawk 5.1.1 posts a reasonably competitive time of 31.5 secs,
    — mawk 1.3.4's time of 19.3 secs beats GNU wc, being only slightly slower than BSD wc, while
    — mawk 1.9.9.6 's impressive 12.7secs leaves both in the dust, some 41% faster than BSD wc, and a whopping 83% faster than GNU wc. I wasn't kidding when I said I benchmark awk codes against C binaries instead of against perl or python.
    an interpreted scripting language that only can use 1 cpu core comes in as much as 41-83% faster than compiled C-code binaries. And it took me less than 10 mins to write this.
    * I couldn't set the same locales for wc otherwise they couldn't count UTF8 properly
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    command ::
    echo " awk0 :: ${awk0}" ; ( time ( pv < "${f}" | LC_ALL=C "${awk0}" 'BEGIN { FS = "^$"

    } { bytes += length
    } /[\200-\377]/ { gsub(/[\200-\301\365-\377]+/,_)
    } { chars += length
    } END {
    printf("rows = %\43.f | "\
    "UTF8 chars = %\43.f | "\
    "bytes = %\43.f\n",\
    NR, \
    chars+NR, \
    bytes+NR) } ' ) ) | lgp3
    awk0 :: mawk
    1.85GiB 0:00:19 [97.8MiB/s] [============================================>] 100%
    ( pv < "${f}" | LC_ALL=C "${awk0}" ; ) 18.75s user 1.31s system 103% cpu 19.344 total
    rows = 12494275. | UTF8 chars = 1285316715. | bytes = 1983544693.
    awk0 :: gawk
    1.85GiB 0:00:31 [60.1MiB/s] [============================================>] 100%
    ( pv < "${f}" | LC_ALL=C "${awk0}" ; ) 31.02s user 0.94s system 101% cpu 31.474 total
    rows = 12494275. | UTF8 chars = 1285316715. | bytes = 1983544693.
    awk0 :: mawk2
    1.85GiB 0:00:12 [ 148MiB/s] [============================================>] 100%
    ( pv < "${f}" | LC_ALL=C "${awk0}" ; ) 12.31s user 1.09s system 105% cpu 12.729 total
    rows = 12494275. | UTF8 chars = 1285316715. | bytes = 1983544693.
    in0: 1.85GiB 0:00:23 [81.3MiB/s] [81.3MiB/s] [=====================>] 100%
    ( pvE 0.1 in0 < "${f}" | gwc -lcm; ) 22.74s user 1.29s system 103% cpu 23.297 total
    12494275 1285316715 1983544693
    in0: 1.85GiB 0:00:17 [ 105MiB/s] [ 105MiB/s] [=====================>] 100%
    ( pvE 0.1 in0 < "${f}" | wc -lm; ) 17.18s user 1.96s system 106% cpu 17.951 total
    12494275 1285316715
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Wed Feb 9 21:05:47 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 09.02.2022 08:49, Axel Reichert wrote:
    [ about an ASCII data mangling Python script ]
    [....] I started immediately, cobbled something together (awk featured
    prominently among other usual suspects, such as tr, sed, cut, grep).

    Hmm.. - these four tools are amongst those where I usually say; instead
    of connecting and running a lot of such processes use just one instance
    of awk. The functions expressed in those tools are - modulo a few edge
    cases - basics in Awk and part of its core.

    That sometimes works, but the trouble is that once you've used AWK's pattern/action feature once, you can't do so again -- you are stuck
    inside the action part. Just the other day I needed to split fields
    within a field after finding the lines I wanted. This was, for me, an
    obvious case for two processes:

    awk -F: '/wanted/ { print $3 }' | awk -F, '...'

    but I could have used grep and cut in place of the first AWK. Maybe I'm
    just not good at remembering the details of all the key functions, but I
    find I use AWK in pipelines quite a lot.
    --
    Ben.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Andreas Eder@a_eder_muc@web.de to comp.lang.awk on Wed Feb 9 22:54:22 2022
    From Newsgroup: comp.lang.awk

    On Mi 09 Feb 2022 at 16:36, Janis Papanagnou <janis_papanagnou@hotmail.com> wrote:

    On 09.02.2022 08:49, Axel Reichert wrote:
    [ about an ASCII data mangling Python script ]
    [....] I started immediately, cobbled something together (awk featured
    prominently among other usual suspects, such as tr, sed, cut, grep).

    Hmm.. - these four tools are amongst those where I usually say; instead
    of connecting and running a lot of such processes use just one instance
    of awk. The functions expressed in those tools are - modulo a few edge
    cases - basics in Awk and part of its core.

    +1

    'Andreas
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kaz Kylheku@480-992-1380@kylheku.com to comp.lang.awk on Wed Feb 9 22:11:38 2022
    From Newsgroup: comp.lang.awk

    On 2022-02-09, Kpop 2GM <jason.cy.kwan@gmail.com> wrote:
    — gawk 5.1.1 posts a reasonably competitive time of 31.5 secs,
    — mawk 1.3.4's time of 19.3 secs beats GNU wc, being only slightly slower than BSD wc, while
    — mawk 1.9.9.6 's impressive 12.7secs leaves both in the dust, some 41% faster than BSD wc, and a whopping 83% faster than GNU wc. I wasn't kidding when I said I benchmark awk codes against C binaries instead of against perl or python.

    Why would you need a UTF8 validator in languages that are
    largely Unicode-ignorant?

    It's a bit like a visual guitar tuner for the deaf.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kaz Kylheku@480-992-1380@kylheku.com to comp.lang.awk on Wed Feb 9 22:22:43 2022
    From Newsgroup: comp.lang.awk

    On 2022-02-09, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 09.02.2022 08:49, Axel Reichert wrote:
    [ about an ASCII data mangling Python script ]
    [....] I started immediately, cobbled something together (awk featured
    prominently among other usual suspects, such as tr, sed, cut, grep).

    Hmm.. - these four tools are amongst those where I usually say; instead
    of connecting and running a lot of such processes use just one instance
    of awk. The functions expressed in those tools are - modulo a few edge
    cases - basics in Awk and part of its core.

    That sometimes works, but the trouble is that once you've used AWK's pattern/action feature once, you can't do so again -- you are stuck
    inside the action part. Just the other day I needed to split fields
    within a field after finding the lines I wanted. This was, for me, an obvious case for two processes:

    awk -F: '/wanted/ { print $3 }' | awk -F, '...'

    You can split $3 into fields by assigning its value to $0, after
    tweaking FS for the inner field separator:

    $ awk '/wanted/ { FS=","; $0=$3; OFS=":"; $1=$1; print }'
    wanted two three,a,b,c <- input
    three:a:b:c <- output

    You have to save and restore FS to do this repeatedly for
    different records of the outer file. Another approach is to
    use the split function to populate an array, where the pattern
    is an argument (only defaulting to FS if omitted).
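
    A quick sketch of that variant, sticking with Ben's field layout:

        awk -F: '/wanted/ { n = split($3, a, ",")
                            out = a[1]
                            for (i = 2; i <= n; i++) out = out ":" a[i]
                            print out }'

    split() fills the array and returns the number of pieces, so FS never
    needs to be saved and restored.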
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Thu Feb 10 01:41:16 2022
    From Newsgroup: comp.lang.awk

    On 09.02.2022 22:05, Ben Bacarisse wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 09.02.2022 08:49, Axel Reichert wrote:
    [ about an ASCII data mangling Python script ]
    [....] I started immediately, cobbled something together (awk featured
    prominently among other usual suspects, such as tr, sed, cut, grep).

    Hmm.. - these four tools are amongst those where I usually say; instead
    of connecting and running a lot of such processes use just one instance
    of awk. The functions expressed in those tools are - modulo a few edge
    cases - basics in Awk and part of its core.

    That sometimes works,

    My observation is that it usually works smoothly, and only sometimes
    (the edge cases, I called them above) not obviously straightforward,
    but usually just in a slightly different way. But it works generally.

    but the trouble is that once you've used AWK's
    pattern/action feature once, you can't do so again -- you are stuck
    inside the action part. Just the other day I needed to split fields
    within a field after finding the lines I wanted.

    You can always simply split() the fields, no need to invoke another
    process just for another implicit loop that awk supports.

    This was, for me, an obvious case for two processes:

    awk -F: '/wanted/ { print $3 }' | awk -F, '...'

    I understand the impulse to develop commands that way; that usually
    leads to such horrible and inflexible cascades of the tools mentioned
    above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).

    And as soon as you need yet more information from the first instance
    this approach needs more workarounds, e.g. passing state information
    through the OS level.

    Of course there's many ways to skin a cat. I just advocate to think
    about one-process solutions before following the reflex to construct
    inflexible pipeline constructs.


    but I could have used grep and cut in place of the first AWK. Maybe I'm
    just not good at remembering the details of all the key functions,

    The nice thing about awk - actually already mentioned in context of
    the features/complexity vs. power comments - is that you don't need
    to memorize a lot;[*] I think awk is terse and compact enough. YMMV.

    but I find I use AWK in pipelines quite a lot.

    That's how we learned it; pipelining through simple dedicated tools.
    I also still do that. My observation is that whenever a more powerful
    tool like awk gets into use, the more primitive tools in the pipeline
    can be eliminated, the whole pipeline gets then refactored, typically
    for efficiency, flexibility, robustness, and clarity in design.

    I want to close my comment with another aspect; the primitive helper
    tools are often restricted and incoherent.[*] In GNU context you have additional options that I'm glad to be able to use, but if you want to
    stay standard conforming the tools might not "suffice" or usage gets
    more bulky. With awk the standard version supports already the powerful
    core.

    Janis

    [*] If I'd have a remembering issue then it would be how options, e.g.
    the delimiters, are (differently) defined in the various tools, since
    options are incoherent and inconsistently named across the tools, and
    such options also have different semantics. That results in a lot of man
    page lookups and more software maintenance issues.

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Thu Feb 10 01:07:32 2022
    From Newsgroup: comp.lang.awk

    Kaz Kylheku <480-992-1380@kylheku.com> writes:

    On 2022-02-09, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 09.02.2022 08:49, Axel Reichert wrote:
    [ about an ASCII data mangling Python script ]
    [....] I started immediately, cobbled something together (awk featured prominently among other usual suspects, such as tr, sed, cut, grep).

    Hmm.. - these four tools are amongst those where I usually say; instead
    of connecting and running a lot of such processes use just one instance
    of awk. The functions expressed in those tools are - modulo a few edge
    cases - basics in Awk and part of its core.

    That sometimes works, but the trouble is that once you've used AWK's
    pattern/action feature once, you can't do so again -- you are stuck
    inside the action part. Just the other day I needed to split fields
    within a field after finding the lines I wanted. This was, for me, an
    obvious case for two processes:

    awk -F: '/wanted/ { print $3 }' | awk -F, '...'

    You can split $3 into fields by assigning its value to $0, after
    tweaking FS for the inner field separator:

    $ awk '/wanted/ { FS=","; $0=$3; OFS=":"; $1=$1; print }'
    wanted two three,a,b,c <- input
    three:a:b:c <- output

    Sure, but you don't get to use pattern/action pairs on the result.

    You have to save and restore FS to do this repeatedly for
    different records of the outer file. Another approach is to
    use the split function to populate an array, where the pattern
    is an argument (only defaulting to FS if omitted).

    I would much prefer to use split, but only if someone stopped me doing
    it the natural way with a pipeline.

    I suspected there would be a slew of replies about how to do it in one
    command! However, I seriously doubt that there is any Unix programmer
    or sysadmin who has not used AWK in a pipeline with commands that could, relatively easily, be coded into the {} part of one or more actions. I
    really don't think my point is very contentious.
    --
    Ben.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Thu Feb 10 01:37:17 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 09.02.2022 22:05, Ben Bacarisse wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 09.02.2022 08:49, Axel Reichert wrote:
    [ about an ASCII data mangling Python script ]
    [....] I started immediately, cobbled something together (awk featured prominently among other usual suspects, such as tr, sed, cut, grep).

    Hmm.. - these four tools are amongst those where I usually say; instead
    of connecting and running a lot of such processes use just one instance
    of awk. The functions expressed in those tools are - modulo a few edge
    cases - basics in Awk and part of its core.

    That sometimes works,

    My observation is that it usually works smoothly, and only sometimes
    (the edge cases, I called them above) not obviously straightforward,
    but usually just in a slightly different way. But it works generally.

    but the trouble is that once you've used AWK's
    pattern/action feature once, you can't do so again -- you are stuck
    inside the action part. Just the other day I needed to split fields
    within a field after finding the lines I wanted.

    You can always simply split() the fields, no need to invoke another
    process just for another implicit loop that awk supports.

    Yes, there's no need, but why worry about it? Maybe I am alone in
    thinking processes are cheap.

    But more to the point, a pipeline is an elegant, easily understood, and
    often natural way to organise a task. I will keep using them, even if
    there is no need.

    This was, for me, an obvious case for two processes:

    awk -F: '/wanted/ { print $3 }' | awk -F, '...'

    I understand the impulse to develop commands that way; that usually
    leads to such horrible and inflexible cascades of the tools mentioned
    above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).

    And as soon as you need yet more information from the first instance
    this approach needs more workarounds, e.g. passing state information
    through the OS level.

    A pipeline is not the right structure for such tasks, but there are a
    huge number of tasks where combining Unix tools is the simplest
    solution.

    Of course there's many ways to skin a cat. I just advocate to think
    about one-process solutions before following the reflex to construct inflexible pipeline constructs.


    but I could have used grep and cut in place of the first AWK. Maybe I'm
    just not good at remembering the details of all the key functions,

    The nice thing about awk - actually already mentioned in context of
    the features/complexity vs. power comments - is that you don't need
    to memorize a lot;[*] I think awk is terse and compact enough. YMMV.

    But since I use pipelines so much, I rarely use split, patsplit, gsub or gensub. I find myself checking their arguments pretty much every time I
    use them.
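
    For the record, the signatures in question (patsplit and gensub being
    gawk extensions), since I end up looking them up anyway:

        n   = split(str, arr, fs)            # fills arr, returns the piece count
        n   = patsplit(str, arr, fieldpat)   # like split, but the regex matches the fields
        n   = gsub(regex, repl, target)      # modifies target ($0 if omitted), returns count
        new = gensub(regex, repl, "g", str)  # returns the result, leaves str untouched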

    but I find I use AWK in pipelines quite a lot.

    That's how we learned it; pipelining through simple dedicated tools.
    I also still do that.

    Why? Serious question. It sounds like a dreadful risk based on your
    comments above. Doing so "usually leads to such horrible and inflexible cascades of the tools" when there is no need "to invoke another
    process". What makes you sometimes take the risk of horrible cascades
    and pay the price of another process?

    I ask because it's possible we disagree only on how frequently it should
    be done, and about exactly what circumstances warrant it.

    My observation is that whenever a more powerful
    tool like awk gets into use, the more primitive tools in the pipeline
    can be eliminated,

    I think we all agree that it /can/ be done.

    the whole pipeline gets then refactored, typically for efficiency, flexibility, robustness, and clarity in design.

    That's where I disagree. I often choose a pipeline because it is the
    most robust, flexible and clear design. (I rarely care about efficiency
    when doing this sort of thing.)

    I do it in other contexts too. In Haskell, because of its lazy
    evaluation, you can chain function calls that filter and process lists,
    even potentially infinite ones. It often results in clear, easy to
    modify code.

    I want to close my comment with another aspect; the primitive helper
    tools are often restricted and incoherent.[*] In a GNU context you have
    additional options that I'm glad to be able to use, but if you want to
    stay standards-conforming the tools might not "suffice", or their usage
    gets more bulky. With awk the standard version already supports the
    powerful core.

    I agree. That's a shame, but an inevitable cost of piecemeal historical development.
    --
    Ben.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kaz Kylheku@480-992-1380@kylheku.com to comp.lang.awk on Thu Feb 10 07:59:43 2022
    From Newsgroup: comp.lang.awk

    On 2022-02-10, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    Kaz Kylheku <480-992-1380@kylheku.com> writes:

    On 2022-02-09, Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 09.02.2022 08:49, Axel Reichert wrote:
    [ about an ASCII data mangling Python script ]
    [....] I started immediately, cobbled something together (awk featured prominently among other usual suspects, such as tr, sed, cut, grep).

    Hmm.. - these four tools are amongst those where I usually say; instead of connecting and running a lot of such processes use just one instance of awk. The functions expressed in those tools are - modulo a few edge cases - basics in Awk and part of its core.

    That sometimes works, but the trouble is that once you've used AWK's
    pattern/action feature once, you can't do so again -- you are stuck
    inside the action part. Just the other day I needed to split fields
    within a field after finding the lines I wanted. This was, for me, an
    obvious case for two processes:

    awk -F: '/wanted/ { print $3 }' | awk -F, '...'

    You can split $3 into fields by assigning its value to $0, after
    tweaking FS for the inner field separator:

    $ awk '/wanted/ { FS=","; $0=$3; OFS=":"; $1=$1; print }'
    wanted two three,a,b,c <- input
    three:a:b:c <- output

    Sure, but you don't get to use pattern/action pairs on the result.

    But that's largely just syntactic sugar for a glorified case statement.

    Instead of

    /abc/ { ... }
    $2 > $3 { ... }

    you have to write

    if (/abc/) { ... }
    if ($2 > $3) { ... }

    kind of thing.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Axel Reichert@mail@axel-reichert.de to comp.lang.awk on Thu Feb 10 18:33:08 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    I understand the impulse to develop commands that way; that usually
    leads to such horrible and inflexible cascades of the tools mentioned
    above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).

    And as soon as you need yet more information from the first instance
    this approach needs more workarounds, e.g. passing state information
    through the OS level.

    Of course there are many ways to skin a cat. I just advocate thinking
    about one-process solutions before following the reflex to construct inflexible pipelines.

    It seems that like Ben I am a pipeliner, the igniting spark probably
    "Opening the software toolbox":

    https://www.gnu.org/software/coreutils/manual/html_node/Opening-the-software-toolbox.html

    I know that a lot can be done within awk, but it often does not seem to
    meet my way of thinking. For example, I might start with a grep. To my
    surprise it finds many matches, so further processing is called for, say
    awk '{print $3}' or similar. At that point, I will NOT replace the grep
    with awk '/.../', because it is easier to just add another pipeline
    after fetching the command from history using the up arrow. And so on,
    adding pipeline after pipeline (which I also can easily relate to
    functional programming). Once the whole dataflow is ready, I will
    usually not "refactor" the beast, only in glaringly obvious cases/optimizations. I might even have started with a (in hindsight)
    Useless Use Of Cat. On the more ambitious side, I well remember how
    proud I was when plumbing several xargs into a pipeline:

    foo | bar | xargs -i baz {} 333 | quux | xargs fubar

    By now this is a common idiom for me on the command line.

    But full ACK on passing information from the first instance downstream,
    at which point I tend to start using Python. But up to then pipelining
    "just flows". That's what they were designed for. (-:

    Axel

    P. S.: I will keep your advice in memory, though, to avoid my worst
    excesses. Point taken.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Thu Feb 10 20:54:07 2022
    From Newsgroup: comp.lang.awk

    one-liner solution to that wanted-three question :

    echo 'wanted two three,a,b,c' \
    \
    | [mg]awk '/^wanted/ && gsub(",", substr(":", ($0=$3)~"", 1)) + 1'

    three:a:b:c
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Thu Feb 10 21:22:32 2022
    From Newsgroup: comp.lang.awk

    On Thursday, February 10, 2022 at 2:59:45 AM UTC-5, Kaz Kylheku wrote:
    On 2022-02-10, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
    Kaz Kylheku <480-99...@kylheku.com> writes:

    On 2022-02-09, Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
    Janis Papanagnou <janis_pa...@hotmail.com> writes:

    On 09.02.2022 08:49, Axel Reichert wrote:
    [ about an ASCII data mangling Python script ]
    [....] I started immediately, cobbled something together (awk featured prominently among other usual suspects, such as tr, sed, cut, grep).
    Hmm.. - these four tools are amongst those where I usually say; instead of connecting and running a lot of such processes use just one instance of awk. The functions expressed in those tools are - modulo a few edge cases - basics in Awk and part of its core.

    That sometimes works, but the trouble is that once you've used AWK's
    pattern/action feature once, you can't do so again -- you are stuck
    inside the action part. Just the other day I needed to split fields
    within a field after finding the lines I wanted. This was, for me, an obvious case for two processes:

    awk -F: '/wanted/ { print $3 }' | awk -F, '...'

    You can split $3 into fields by assigning its value to $0, after
    tweaking FS for the inner field separator:

    $ awk '/wanted/ { FS=","; $0=$3; OFS=":"; $1=$1; print }'
    wanted two three,a,b,c <- input
    three:a:b:c <- output

    Sure, but you don't get to use pattern/action pairs on the result.
    But that's largely just syntactic sugar for a glorified case statement.

    Instead of

    /abc/ { ... }
    $2 > $3 { ... }

    you have to write

    if (/abc/) { ... }
    if ($2 > $3) { ... }

    kind of thing.

    Two different one-liner solutions I managed to conjure up, neither of which requires dealing with patterns, or arrays, or patsplit, but both involve assigning back to $0.


    command 1 is

    [ echo "wanted two three,a,b,c" | mawk2 '/wanted/ * gsub(",", substr(":",$_!=($_=$NF),_~_))' ]

    three:a:b:c

    command 2 is

    [ echo "wanted two three,a,b,c" | mawk2 -F, '/wanted/ && ($!_=substr($!_,match($!_,/[^ \t]+$/) ) )' OFS=":" ]

    three:a:b:c
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kaz Kylheku@480-992-1380@kylheku.com to comp.lang.awk on Fri Feb 11 07:43:18 2022
    From Newsgroup: comp.lang.awk

    On 2022-02-11, Kpop 2GM <jason.cy.kwan@gmail.com> wrote:
    one-liner solution to that wanted-three question :

    echo 'wanted two three,a,b,c' \
    \
    | [mg]awk '/^wanted/ && gsub(",", substr(":", ($0=$3)~"", 1)) + 1'

    three:a:b:c

    Are you positively sure that you're taking my example literally enough?

    Try this:

    sed -e 's/wanted two //' -e 's/,/:/g'
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Axel Reichert@mail@axel-reichert.de to comp.lang.awk on Fri Feb 11 09:27:28 2022
    From Newsgroup: comp.lang.awk

    Kpop 2GM <jason.cy.kwan@gmail.com> writes:

    command 1 is

    [ echo "wanted two three,a,b,c" | mawk2 '/wanted/ * gsub(",", substr(":",$_!=($_=$NF),_~_))' ]

    three:a:b:c

    command 2 is

    [ echo "wanted two three,a,b,c" | mawk2 -F, '/wanted/ && ($!_=substr($!_,match($!_,/[^ \t]+$/) ) )' OFS=":" ]

    three:a:b:c

    And both seem to me horrendously inelegant compared to

    echo "wanted two three,a,b,c" | awk '{print $3}' | tr ',' ':'

    But maybe I missed some detail from the original task and '/wanted/' has
    to be added as awk pattern.

    Best regards

    Axel
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Fri Feb 11 00:35:36 2022
    From Newsgroup: comp.lang.awk

    On Friday, February 11, 2022 at 2:43:21 AM UTC-5, Kaz Kylheku wrote:
    On 2022-02-11, Kpop 2GM <jason....@gmail.com> wrote:
    one-liner solution to that wanted-three question :

    echo 'wanted two three,a,b,c' \
    \
    | [mg]awk '/^wanted/ && gsub(",", substr(":", ($0=$3)~"", 1)) + 1'

    three:a:b:c

    Are you positively sure that you're taking my example literally enough?

    Try this:

    sed -e 's/wanted two //' -e 's/,/:/g'

    I think yours doesn't enforce the filter criterion but simply cleans the line up, which can be circumvented like so:

    echo $'wanted two threeA,a,b,c\nhi wanted two threeB,a,b,c' \
    \
    | [mg]awk 'sub("^wanted.+ ","")*gsub(",",":")'
    threeA:a:b:c

    echo $'wanted two threeA,a,b,c\nhi wanted two threeB,a,b,c' | sed -e 's/wanted two //' -e 's/,/:/g'
    threeA:a:b:c
    hi threeB:a:b:c
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Fri Feb 11 00:39:30 2022
    From Newsgroup: comp.lang.awk

    echo "wanted two three,a,b,c" | awk '{print $3}' | tr ',' ':'

    If elegance is a concern, then no pattern and no print statement is more elegant, at least to me:

    echo "wanted two three,a,b,c" | gawk '($_=$NF)~_' | tr ',' ':'
    three:a:b:c
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Fri Feb 11 01:05:32 2022
    From Newsgroup: comp.lang.awk

    On Thursday, February 10, 2022 at 12:33:11 PM UTC-5, Axel Reichert wrote:
    Janis Papanagnou <janis_pa...@hotmail.com> writes:

    I understand the impulse to develop commands that way; that usually
    leads to such horrible and inflexible cascades of the tools mentioned above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).

    And as soon as you need yet more information from the first instance
    this approach needs more workarounds, e.g. passing state information through the OS level.

    Of course there are many ways to skin a cat. I just advocate thinking
    about one-process solutions before following the reflex to construct inflexible pipelines.
    It seems that like Ben I am a pipeliner, the igniting spark probably "Opening the software toolbox":

    https://www.gnu.org/software/coreutils/manual/html_node/Opening-the-software-toolbox.html

    I know that a lot can be done within awk, but it often does not seem to
    meet my way of thinking. For example, I might start with a grep. To my surprise it finds many matches, so further processing is called for, say
    awk '{print $3}' or similar. At that point, I will NOT replace the grep
    with awk '/.../', because it is easier to just add another pipeline
    after fetching the command from history using the up arrow. And so on, adding pipeline after pipeline (which I also can easily relate to
    functional programming). Once the whole dataflow is ready, I will
    usually not "refactor" the beast, only in glaringly obvious cases/optimizations. I might even have started with a (in hindsight)
    Useless Use Of Cat. On the more ambitious side, I well remember how
    proud I was when plumbing several xargs into a pipeline:

    foo | bar | xargs -i baz {} 333 | quux | xargs fubar

    By now this is a common idiom for me on the command line.


    Speaking of specialized tools for piping at the command line, how many open-source utilities are you aware of that can decode unsigned hex to arbitrary precision, piping in from /dev/stdin, with nothing more than:

    gawk -nM '$!_=+$_'

    or this variant, which pre-negates it for you before outputting: gawk -nM '$!_=-$_'
    or this variant, which pre-doubles it for you before outputting: gawk -nM '$!_-=-$_'

    or this best one yet :

    gawk -nM '$(($!_-=-$_)~_)++'

    decoding unsigned hex to arbitrary precision, and returning 2n + 1 of that input. Maybe it's even cleaner in Perl, I don't know.
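
    For readers who prefer the long option names: -n is gawk's
    --non-decimal-data (so input like 0xff is read as hexadecimal) and -M
    is --bignum (arbitrary-precision arithmetic). A more conventional
    spelling of just the decoding step (my illustration, not from the post):

    echo 0xffffffffffffffff | gawk -nM '{ print +$1 }'
    18446744073709551615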
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Kaz Kylheku@480-992-1380@kylheku.com to comp.lang.awk on Fri Feb 11 17:40:13 2022
    From Newsgroup: comp.lang.awk

    On 2022-02-11, Axel Reichert <mail@axel-reichert.de> wrote:
    Kpop 2GM <jason.cy.kwan@gmail.com> writes:

    command 1 is

    [ echo "wanted two three,a,b,c" | mawk2 '/wanted/ * gsub(",", substr(":",$_!=($_=$NF),_~_))' ]

    three:a:b:c

    command 2 is

    [ echo "wanted two three,a,b,c" | mawk2 -F, '/wanted/ && ($!_=substr($!_,match($!_,/[^ \t]+$/) ) )' OFS=":" ]

    three:a:b:c

    And both seem to me horrendously inelegant compared to

    How nice of you to provide company; now Kpop 2GM doesn't have to feel
    he's the only one missing the point.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Fri Feb 11 21:27:47 2022
    From Newsgroup: comp.lang.awk

    It just occurred to me that we are discussing basically Unix/shell
    issues at the moment, but since there's a strong relation to awk I
    abstain from marking it [OT].

    On 10.02.2022 18:33, Axel Reichert wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    I understand the impulse to develop commands that way; that usually
    leads to such horrible and inflexible cascades of the tools mentioned
    above (cat, sed, grep, cut, head, tail, tr, wc, pr, or yet more awks).

    [...]

    It seems that like Ben I am a pipeliner, the igniting spark probably
    "Opening the software toolbox":

    https://www.gnu.org/software/coreutils/manual/html_node/Opening-the-software-toolbox.html

    I know that a lot can be done within awk, but it often does not seem to
    meet my way of thinking. For example, I might start with a grep. To my surprise it finds many matches, so further processing is called for, say
    awk '{print $3}' or similar. At that point, I will NOT replace the grep
    with awk '/.../', because it is easier to just add another pipeline
    after fetching the command from history using the up arrow. And so on,
    adding pipeline after pipeline (which I also can easily relate to
    functional programming). Once the whole dataflow is ready, I will
    usually not "refactor" the beast, only in glaringly obvious cases/optimizations. I might even have started with a (in hindsight)
    Useless Use Of Cat. On the more ambitious side, I well remember how
    proud I was when plumbing several xargs into a pipeline:

    foo | bar | xargs -i baz {} 333 | quux | xargs fubar

    By now this is a common idiom for me on the command line.

    But full ACK on passing information from the first instance downstream,
    at which point I tend to start using Python. But up to then pipelining
    "just flows". That's what they were designed for. (-:

    Axel

    P. S.: I will keep your advice in memory, though, to avoid my worst
    excesses. Point taken.


    Don't get me wrong. Pipelines of tools are not "bad" and I also wrote:

    "That's how we learned it; pipelining through simple dedicated tools.
    I also still do that. [...]"

    Some application cases or programming patterns are also often not that
    easily implementable in other "monolithic" languages (including awk).
    Your double-xargs programming pattern is certainly rare, but I have used
    it at times too, and it didn't occur to me to try to re-implement it in
    awk just for the sake of praising the language or something like that.
    The same goes for two instances of awk; it may make sense, there's no dogma.

    It's not a black-or-white issue. And personal preferences vary anyway.
    But sometimes the way we're used to using a tool-chest gets in the way
    of "finding" (actually just writing down) better solutions.

    What I was addressing is the use of programs with primitive functions
    that awk is providing in a simple and consistent way inherently! My
    impression is that Unix folks today learn command-line patterns about
    the same way I did decades ago; starting from cat, cut, grep, sed (and
    so on), often not even reaching awk - because it's more complex than
    the simpler tools dedicated to a task. The joined dedicated Unix tools
    are not simpler, though, rather the opposite. Recognizing that led me to
    change the way I start my tasks; for simple searches it wouldn't
    occur to me to use awk, but as soon as a search task gets slightly
    more complex, say searching for lines with two matches, /A/&&/B/, I'd
    use awk. Or if it is clear from the beginning that the task will be
    harder and more clumsy to implement with pipes, e.g. extracting keys
    from a file to match records in another file; then I don't even think
    about how that (maybe) could be implemented by function compositions
    with primitive Unix programs, I'd immediately take awk if appropriate.

    There are immediate gains, but also gains not obvious in the first
    moment. You may observe that, say, /A/&&/B/ would return too many
    results; no problem to restrict the result set simply with further qualifications like, say, $1~/A/ && $1~/B/ , adding an FS definition,
    and whatnot (see the sketch below). Of course grep A | grep B isn't complex (less terse,
    okay, but still not complex), but you often end up at a dead end if your
    demands get extended (e.g. by matching fields instead of lines). For
    a one-shot ad hoc task the greps are okay, and I see that I sometimes
    start with a command, then add another piped command to the previous
    one (from shell history), and then a third one. But as soon as this
    goes into a shell script these commands are getting optimized. Lately
    I don't even start with pipes if I think the command might get bulky.
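
    A concrete form of that field restriction (my sketch, with a made-up
    ':'-separated record layout, not an example from the thread):

    awk '/A/ && /B/' file                  # both patterns anywhere on the line
    awk -F: '$1 ~ /A/ && $1 ~ /B/' file    # both patterns restricted to the first field

    The grep A | grep B pipeline has no comparably direct way to demand
    that both matches fall within one particular field.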

    Note that here I am not advocating for smart awk code patterns as we occasionally suggest here for fun, to keep things minimalistically terse.
    It's just the mundane inherent features I am talking about. Features
    that are consistently available with awk. And that's different to the
    Unix tools set where I have a hammer, a nail, and a screwdriver, and
    think about how to combine them to change the light bulb. - Yes, I'm exaggerating (at least a bit), but you get the point.

    Folks learn by example, and in shell newsgroups or in books we often
    see bad solutions based on published code that follows the historic
    approach, copy/pasted without thinking, and carried over as a code
    pattern into their future projects, posts, lectures, and own books.

    That got a bit preachy.

    Janis

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Fri Feb 11 21:58:40 2022
    From Newsgroup: comp.lang.awk

    On 10.02.2022 02:37, Ben Bacarisse wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    You can always simply split() the fields, no need to invoke another
    process just for another implicit loop that awk supports.

    Yes, there's no need, but why worry about it? Maybe I am alone in
    thinking processes are cheap.

    I just try to avoid unnecessary processes. A dozen is not an issue,
    but once you've embedded them in shell loops it might become an issue.


    But more to the point, a pipeline is an elegant, easily understood, and
    often natural way to organise a task.

    Agreed.

    [...]

    A pipeline is not the right structure for such tasks, but there are a
    huge number of tasks where combining Unix tools is the simplest
    solution.

    Agreed.


    The nice thing about awk - actually already mentioned in context of
    the features/complexity vs. power comments - is that you don't need
    to memorize a lot;[*] I think awk is terse and compact enough. YMMV.

    But since I use pipelines so much, I rarely use split, patsplit, gsub or gensub. I find myself checking their arguments pretty much every time I
    use them.

    (Some of the mentioned functions are non-standard, GNU Awk'ish.)

    Well, if you don't use them regularly you'll have to look up the docs. Personally I think the [standard] functions are easy to remember, but
    okay. I myself can easily remember them simply by thinking about their
    trailing, defaultable arguments; split (what, where [,by-what])
    omitting the "by-what" will use the standard FS, and "what", "where"
    I think is the natural order I'd expect, and similarly with gsub;
    gsub (what, by-what [, where]) omitting the where will operate on
    the whole line, and "what", "by-what" I think are the natural order.
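
    Spelled out with concrete arguments (my illustration of the call
    shapes just described, not code from the thread):

    n = split($0, a, ",")     # "what" ($0) into "where" (array a) "by-what" (","); FS is used if the last argument is omitted
    gsub(/foo/, "bar")        # replace "what" (/foo/) "by-what" ("bar") in the default "where", the whole record $0
    gsub(/foo/, "bar", $2)    # the same substitution restricted to "where" ($2)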

    The non-standard GNU functions patsplit() and gensub() I also have to
    look up, but I think just because I rarely have a need to use these
    functions.


    That's how we learned it; pipelining through simple dedicated tools.
    I also still do that.

    Why? Serious question. It sounds like a dreadful risk based on your
    comments above. Doing so "usually leads to such horrible and inflexible cascades of the tools" when there is no need "to invoke another
    process". What makes you sometimes take the risk of horrible cascades
    and pay the price of another process?

    I think this is answered in my previous post, my reply to Axel's post.

    It's certainly not something I'd call a risk, because I can control it,
    I can make the decisions, based on application case, requirements, and expertise.


    I ask because it's possible we disagree only on how frequently it should
    be done, and about exactly what circumstances warrant it.

    I think we should not take the theme too dogmatic or too strict. To
    quote from my other post:

    What I was addressing is the use of programs with primitive functions
    that awk is providing in a simple and consistent way inherently!


    the whole pipeline gets then refactored, typically for efficiency,
    flexibility, robustness, and clarity in design.

    That's where I disagree. I often choose a pipeline because it is the
    most robust, flexible and clear design. (I rarely care about efficiency
    when doing this sort of thing.)

    We do not disagree concerning the clearness of the pipe concept. It is
    just very _primitive_ (an advantage, and a restriction WRT flexibility).


    I want to close my comment with another aspect; the primitive helper
    tools are often restricted and incoherent.[*] In GNU context you have
    additional options that I'm glad to be able to use, but if you want to
    stay standard conforming the tools might not "suffice" or usage gets
    more bulky. With awk the standard version supports already the powerful
    core.

    I agree. That's a shame, but an inevitable cost of piecemeal historical development.

    My summary is that we "mostly"[*] agree. :-)

    Janis

    [*] Term borrowed from the HHGTTG.

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Axel Reichert@mail@axel-reichert.de to comp.lang.awk on Fri Feb 11 22:45:23 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 10.02.2022 18:33, Axel Reichert wrote:

    P. S.: I will keep your advice in memory, though, to avoid my worst
    excesses. Point taken.


    Don't get me wrong. Pipelines of tools are not "bad" and I also wrote:

    "That's how we learned it; pipelining through simple dedicated tools.
    I also still do that. [...]"

    Yes, sure, I noticed that. I do think that it's mostly about stylistic
    matters at this point of the discussion. I like to compare
    shell/Unix/awk/CLI issues with a language: A former boss was impressed
    by what was feasible with these strange "words", which to him sounded
    like Greek. He wanted me to save these utterances for others to benefit
    from my "words of wisdom". I argued that they were not a "quote" to be
    put into some anthology, but a spontaneous sentence formed during "live
    talk". The point for me was not "memorizing" them (shell script), but
    being able to speak.

    Of course this analogy is valid only for ad hoc stuff, but that is how I
    use them in almost all cases: These are throw-away command lines and
    only very rarely I see the potential for them to be re-used. It is for
    those instances that I will try to remember your words of
    wisdom/warning. (-:

    Your double xargs programming pattern is certainly rare

    Not for me, my guess is that I use it daily.

    as soon as a search task gets slightly more complex, say searching for
    lines with two matches, /A/&&/B/, I'd use awk.

    grep A ... | grep B

    or, often

    grep -E '(A|B)' ...

    Again, due to the ad-hoc nature of my shell usage, I will often not
    notice that two matches are needed, only after the first grep command
    has been executed: "Ah, I will need another match!"

    more clumsy to implement with pipes, e.g. extracting keys from a file
    to match records in another file; then I don't even think about how
    that (maybe) could be implemented by function compositions with
    primitive Unix programs

    But you do know "join"? An often overlooked gem.

    a one-shot ad hoc task the greps are okay, and I see that I sometimes
    start with a command, then add another piped command to the previous
    one (from shell history), and then a third one. But as soon as this
    goes into a shell script these commands are getting optimized.

    Yes. And it is at that point that I should try to reduce the number of
    tools used. In fact I am a big fan of universal weapons.

    Best regards

    Axel
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Fri Feb 11 23:32:44 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 10.02.2022 02:37, Ben Bacarisse wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    You can always simply split() the fields, no need to invoke another
    process just for another implicit loop that awk supports.

    Yes, there's no need, but why worry about it? Maybe I am alone in
    thinking processes are cheap.

    I just try to avoid unnecessary processes. A dozen is not an issue,
    but once you've embedded them in shell loops it might get an issue.

    That does not really advance the discussion because almost no processes
    (other than one to perform the task) are actually necessary. It's clear
    that you attach a different weighting to the various considerations, but
    I don't really know more than that. For my part, I can't remember the
    last time I even thought about how many processes were involved in a
    pipeline.

    But more to the point, a pipeline is an elegant, easily understood, and
    often natural way to organise a task.

    Agreed.

    [...]

    A pipeline is not the right structure for such tasks, but there are a
    huge number of tasks where combining Unix tools is the simplest
    solution.

    Agreed.


    The nice thing about awk - actually already mentioned in context of
    the features/complexity vs. power comments - is that you don't need
    to memorize a lot;[*] I think awk is terse and compact enough. YMMV.

    But since I use pipelines so much, I rarely use split, patsplit, gsub or
    gensub. I find myself checking their arguments pretty much every time I
    use them.

    (Some of the mentioned functions are non-standard, GNU Awk'ish.)

    Well, if you don't use them regularly you'll have to look up the docs. Personally I think the [standard] functions are easy to remember, but
    okay. I myself can easily remember them simply by thinking about their
    trailing, defaultable arguments; split (what, where [,by-what])
    omitting the "by-what" will use the standard FS, and "what", "where"
    I think is the natural order I'd expect, and similarly with gsub;
    gsub (what, by-what [, where]) omitting the where will operate on
    the whole line, and "what", "by-what" I think are the natural order.

    The non-standard GNU functions patsplit() and gensub() I also have to
    look up, but I think just because I rarely have a need to use these functions.


    That's how we learned it; pipelining through simple dedicated tools.
    I also still do that.

    Why? Serious question. It sounds like a dreadful risk based on your
    comments above. Doing so "usually leads to such horrible and inflexible
    cascades of the tools" when there is no need "to invoke another
    process". What makes you sometimes take the risk of horrible cascades
    and pay the price of another process?

    I think this is answered in my previous post, my reply to Axel's post.

    It's certainly not something I'd call a risk, because I can control it,
    I can make the decisions, based on application case, requirements, and expertise.

    That's an odd answer. Do you think I can't control the risk, or was the
    advice about it "usually [leading] to such horrible and inflexible
    cascades of the tools" aimed at some unstated group who are not as good
    at controlling risk as you and I?

    I ask because it's possible we disagree only on how frequently it should
    be done, and about exactly what circumstances warrant it.

    I think we should not take the theme too dogmatic or too strict. To
    quote from my other post:

    What I was addressing is the use of programs with primitive functions
    that awk is providing in a simple and consistent way inherently!


    the whole pipeline gets then refactored, typically for efficiency,
    flexibility, robustness, and clarity in design.

    That's where I disagree. I often choose a pipeline because it is the
    most robust, flexible and clear design. (I rarely care about efficiency
    when doing this sort of thing.)

    We do not disagree concerning the clearness of the pipe concept. It is
    just very _primitive_ (an advantage, and a restriction WRT
    flexibility).

    But you also appear to consider it costly and risky. That's the
    difference I think.
    --
    Ben.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Sat Feb 12 16:43:10 2022
    From Newsgroup: comp.lang.awk

    On 11.02.2022 22:45, Axel Reichert wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    Yes, sure, I noticed that. I do think that it's mostly about stylistic matters at this point of the discussion. I like to compare
    shell/Unix/awk/CLI issues with a language: A former boss was impressed
    by what was feasible with these strange "words", which to him sounded
    like Greek. He wanted me to save these utterances for others to benefit
    from my "words of wisdom". I argued that they were not a "quote" to be
    put into some anthology, but a spontaneous sentence formed during "live talk". The point for me was not "memorizing" them (shell script), but
    being able to speak.

    Of course this analogy is valid only for ad hoc stuff, but that is how I
    use them in almost all cases: These are throw-away command lines and
    only very rarely I see the potential for them to be re-used. [...]

    I understand that analogy. And thinking about your and Ben's posts I
    concluded that this must be the reason for your approach and view of
    the topic. (But please correct me if I misinterpreted your reasons.)

    I indeed have two types of tasks: ad hoc tools, and applications that
    support regularly occurring tasks. And often it is the case that
    the former evolve to the latter! I already described the former tasks
    in my previous posts by the (often incremental) development of tools;
    starting with a command and appending pipe-connected other commands
    to refine the task, to make the output better usable, and whatnot. At
    the other side there's the functionality that goes in the direction
    of software development; I'm therefore also reluctant to call what I
    am creating here a "shell script", I often call it a "shell program".
    How these two types of code are written/designed depends on the type.

    I'd like to jump in with anecdotes from Real Life (as you've done as
    well), since what I wrote, and how I am thinking, can probably be
    better understood that way.

    Quite some years ago I had an interview for a job, and since it was
    a Unix/Linux running company they asked me about writing code for a
    shell task; basically log file analysis (as far as my faint memories
    serve). I heard the task and quickly typed a simple 1-liner of awk;
    because that was what appeared to be the right tool for the given
    task, and for the perspective that the requirements for such tools
    or tool-applications will typically quickly get extended and grow
    ("implement this", "now add that", "and this function would help",
    "we also missed to consider that", etc.). The interviewing person
    was (positively!) astonished, and he told me that he expected some
    solution with 'cut', 'uniq', 'wc', and the like. So I also provided
    him with the piped variant (luckily even remembering uniq's -c option),
    but pointed out to him its inherent drawbacks and restrictions.

    On another occasion we quickly needed an analysis of mobile phone
    customer data from a production database. My boss sat next to me and
    I told him I'd just fetch and produce this data instantly; one minute
    and an awk 1-liner later we had that information. His expectation
    was that we'd have needed some high level programming environment,
    and so he was quite fascinated. With piped commands that wouldn't
    have been possible to achieve, and with a high-level language it
    would have required a lot more time.

    It all depends on the specific task, and the perspective whereto
    the application will likely evolve (and of course also personal
    preference and specific experience what makes more or less sense).


    as soon as a search task gets slightly more complex, say searching for
    lines with two matches, /A/&&/B/, I'd use awk.

    grep A ... | grep B

    I anticipated that pattern already in my previous post to that you
    are replying here (but didn't quote), so I refer to my original text.


    more clumsy to implement with pipes, e.g. extracting keys from a file
    to match records in another file; then I don't even think about how
    that (maybe) could be implemented by function compositions with
    primitive Unix programs

    But you do know "join"? An often overlooked gem.

    I know the 'join' command but don't see what that has to do with what I
    wrote here. By "function composition" I meant that programs represent functions; tool x does f, tool y does g, and combining tool x and y by,
    say, x|y does g o f, where o denotes function composition - a composition
    of functionality (and code).

    Janis

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Sat Feb 12 17:24:29 2022
    From Newsgroup: comp.lang.awk

    On 12.02.2022 00:32, Ben Bacarisse wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
    On 10.02.2022 02:37, Ben Bacarisse wrote:

    Why? Serious question. It sounds like a dreadful risk based on your
    comments above. Doing so "usually leads to such horrible and inflexible cascades of the tools" when there is no need "to invoke another
    process". What makes you sometimes take the risk of horrible cascades
    and pay the price of another process?

    I think this is answered in my previous post, my reply to Axel's post.

    It's certainly not something I'd call a risk, because I can control it,
    I can make the decisions, based on application case, requirements, and
    expertise.

    That's an odd answer. Do you think I can't control the risk,

    No, Ben. You were (IMO unnecessarily) introducing the (IMO also
    inappropriate) term "risk". And I pointed out that I see risks
    only in cases where one cannot control the situation or where I
    am restricted in any way in my decisions. I was neither saying
    nor implying anything about you. (This discussion got a spin in
    an [emotional] direction that I don't want to follow.)

    In a reply to Axel (a few minutes ago) I've just extended on my
    view; it may (or may not) shed some light on some open questions
    or maybe give some insights about possible differences of personal
    approaches or mindsets.

    Janis

    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Ben Bacarisse@ben.usenet@bsb.me.uk to comp.lang.awk on Sat Feb 12 22:25:06 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 12.02.2022 00:32, Ben Bacarisse wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
    On 10.02.2022 02:37, Ben Bacarisse wrote:

    Why? Serious question. It sounds like a dreadful risk based on your
    comments above. Doing so "usually leads to such horrible and inflexible cascades of the tools" when there is no need "to invoke another
    process". What makes you sometimes take the risk of horrible cascades and pay the price of another process?

    I think this is answered in my previous post, my reply to Axel's post.

    It's certainly not something I'd call a risk, because I can control it,
    I can make the decisions, based on application case, requirements, and
    expertise.

    That's an odd answer. Do you think I can't control the risk,

    No, Ben. You were (IMO unnecessarily) introducing the (IMO also inappropriate) term "risk".

    It seems to me reasonable to characterise doing something that "usually
    leads to such horrible and inflexible cascades of the tools" as taking a
    risk. A practice (that you describe as an impulse!) that usually leads
    to anything horrible and inflexible is surely risky, isn't it?

    And I pointed out that I see risks
    only in cases where one cannot control the situation or where I
    am restricted in any way in my decisions.

    I accept that this is what you intended to say, but it's not what you
    said. There was no inclusive "one" in your justification for your use
    of pipelines of basic tools.

    You had warned me of where following the "impulse" I had demonstrated in
    my example usually leads, but when I was surprised that you "still do
    that" you tell me it's fine because "I can control it, I can make the decisions, based on application case, requirements, and expertise". No
    "where one can make the decisions", no "where one can control it".

    I was neither saying nor implying anything about you.

    If you did not intend your remarks imply and say things about me, you
    phrased it badly. Talking about developing commands "that way",
    immediately after an example of mine inevitably includes me in the
    remark about it being an (understandable) impulse.

    (This discussion got a spin in
    an [emotional] direction that I don't want to follow.)

    Telling someone they are acting on an impulse that usually leads to
    horrible and inflexible code is obviously somewhat personal.
    --
    Ben.
    --- Synchronet 3.19b-Linux NewsLink 1.113
  • From Axel Reichert@mail@axel-reichert.de to comp.lang.awk on Sun Feb 13 22:00:41 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 11.02.2022 22:45, Axel Reichert wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    more clumsy to implement with pipes, e.g. extracting keys from a file
    to match records in another file; then I don't even think about how
    that (maybe) could be implemented by function compositions with
    primitive Unix programs

    But you do know "join"? An often overlooked gem.

    I know the 'join' command but don't see what that has to do with what I
    wrote here. By "function composition" I meant that programs represent functions; tool x does f, tool y does g, and combining tool x and y by,
    say, x|y does g o f, where o is the function connector, a composition
    of functionality (and code).

    foo-1.txt:
    foo 1 2 3
    Foo 4 5 6 7
    FOO 8 9

    foo-2.txt:
    foo 456
    Foo 45 67
    FOO 89

    To me, the first column seems like a key and the whole line like a
    record. To get something like

    foo-joined.txt:
    foo 1 2 456
    Foo 4 5 45
    FOO 8 9 89

    would be a typical job for join. Hence my question. But we digress from
    awk. (-:
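
    A hedged sketch of the join call Axel presumably has in mind (my
    guess, not from the post; join generally wants both inputs sorted on
    the key, and -o selects the output columns shown in foo-joined.txt):

    join -o 1.1,1.2,1.3,2.2 foo-1.txt foo-2.txt

    Without -o, a plain 'join foo-1.txt foo-2.txt' would instead merge the
    complete lines on the common first column.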

    Axel
    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From olivier gabathuler@ogabathuler@free.fr to comp.lang.awk on Wed Feb 16 14:11:39 2022
    From Newsgroup: comp.lang.awk

    Hi, thank you for your outstanding contributions and discussions on awk.

    Working with it for more than 20 years now and still amazed at the power of this wonderful language!

    This is my modest contribution to shed light on a usage too little documented on the Internet, which I named "Record Separator":
    https://rosettacode.org/wiki/Search_in_paragraph%27s_text

    Olivier Gabathuler
    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Thu Feb 17 03:12:19 2022
    From Newsgroup: comp.lang.awk

    On 13.02.2022 22:00, Axel Reichert wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    On 11.02.2022 22:45, Axel Reichert wrote:
    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    more clumsy to implement with pipes, e.g. extracting keys from a file
    to match records in another file; then I don't even think about how
    that (maybe) could be implemented by function compositions with
    primitive Unix programs

    But you do know "join"? An often overlooked gem.

    I know the 'join' command but don't see what that has to do with what I
    wrote here. By "function composition" I meant that programs represent
    functions; tool x does f, tool y does g, and combining tool x and y by,
    say, x|y does g o f, where o is the function connector, a composition
    of functionality (and code).

    foo-1.txt:
    foo 1 2 3
    Foo 4 5 6 7
    FOO 8 9

    foo-2.txt:
    foo 456
    Foo 45 67
    FOO 89

    To me, the first column seems like a key and the whole line like a
    record.

    Sure. You join two data sets identified by a common key. But so what?

    You have probably been triggered by the formulation of a sample use ("extracting keys from a file to match records in another file") in
    my post where I was more aiming at patterns like

    awk '
    NR==FNR { map[$1] ; next }
    $1 in map
    ' keys data

    (i.e. a filtering task - in a simple form also doable by grep) or like

    awk '
    NR==FNR { map[$1] = $2 ; next }
    { for (i in map) gsub (i, map[i]) }
    1
    ' mapping data

    (i.e. a simple replacement task).

    One point was that function compositions of piped commands may (or may
    not) work for reductions of the original data, but if you need context
    from a previous stage you are lost (or rather, you need workarounds).

    To get something like

    foo-joined.txt:
    foo 1 2 456
    Foo 4 5 45
    FOO 8 9 89

    would be a typical job for join. Hence my question.

    Yes, as I said in previous posts as well, I do know join but that was
    not what I had been speaking about.


    But we digress from awk. (-:

    I hope I dragged the thread back on topic with the awk samples. ;-)

    Janis


    Axel


    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Thu Feb 17 03:36:08 2022
    From Newsgroup: comp.lang.awk

    On 16.02.2022 23:11, olivier gabathuler wrote:
    Hi, thank you for your outstanding contributions and discussions on
    awk.

    Working with it for more than 20 years now and still amazed at the
    power of this wonderful language !

    This is my modest contribution to shed light on a usage too little
    documented on the Internet, which I named "Record Separator": https://rosettacode.org/wiki/Search_in_paragraph%27s_text

    I had a peek at the awk code and the (unstructured) data sample.

    The task is described rather unspecifically as:
    "The goal is to verify the presence of a word or regular expression
    within several paragraphs of text (structured or not) and to print
    the relevant paragraphs on the standard output."

    When I saw the code I first wondered about the definition of a
    two-newline output record separator, only to define the same again as
    the input separator of the next awk stage. (An indication of a
    refactoring candidate.)

    It seems that your code basically extracts from records of blocks
    those blocks that contain a specific string. In addition it changes
    the data in a subtle way beyond the formulated task description.

    Personally my first attempt for such a task would have been simpler
    (using awk's multi-line data blocks feature), something like

    awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
    /Traceback/ && /SystemError/
    ' Traceback.txt

    with possible extensions to test for the patterns in specific fields
    (by adding FS = "\n"), so that the patterns, if they appear elsewhere
    in the data, won't compromise the correct function.

    (Note that the output of above code keeps the matched data intact.)
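
    One possible shape of that FS = "\n" extension (my sketch; it assumes
    the keywords should be anchored to particular lines of each paragraph,
    which the sample data only partly supports):

    awk 'BEGIN { RS = "" ; FS = "\n" ; ORS = "\n----------------\n" }
         $1 ~ /Traceback/ && $2 ~ /SystemError/
        ' Traceback.txt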

    Yes, features relying on the separators allow interesting solutions.
    (In the given case it's arguable whether they've been used sensibly.)

    Janis


    Olivier Gabathuler


    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Axel Reichert@mail@axel-reichert.de to comp.lang.awk on Thu Feb 17 09:13:12 2022
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou@hotmail.com> writes:

    Sure. You join two data sets identified by a common key. But so what?

    Hey, I was very proud when I discovered this 15 years ago. (-:

    You have probably been triggered by the formulation of a sample use ("extracting keys from a file to match records in another file") in my
    post

    Yes.

    I hope I dragged the thread back on topic with the awk samples. ;-)

    You did, and I am happy to learn more here from, it seems, much more
    advanced awk usage than I am used to so far. Thanks!

    Axel
    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From olivier gabathuler@ogabathuler@free.fr to comp.lang.awk on Fri Feb 18 10:08:58 2022
    From Newsgroup: comp.lang.awk

    Le jeudi 17 février 2022 à 03:36:11 UTC+1, Janis Papanagnou a écrit :
    On 16.02.2022 23:11, olivier gabathuler wrote:
    Hi, thank you for your outstanding contributions and discussions on
    awk.

    Working with it from more than 20 years now and still amazed at the
    power of this wonderful language !

    This is my modest contribution to shed light on a usage too little documented on the Internet, I named "Record Separator" : https://rosettacode.org/wiki/Search_in_paragraph%27s_text
    I had a peek view into the awk code and (unstructured) data sample.

    The task is described not very specific as:
    "The goal is to verify the presence of a word or regular expression
    within several paragraphs of text (structured or not) and to print
    the relevant paragraphs on the standard output."

    When I saw the code I first wondered about the definition of a two
    newlines output record separator just to define the same as input
    separator to the next awk stage. (An indication for a candidate to
    be refactored.)

    It seems that your code basically extracts from records of blocks
    those blocks that contain a specific string. In addition it changes
    the data in a subtle way beyond the formulated task description.

    Personally my first attempt for such a task would have been simpler
    (using awk's multi-line data blocks feature), something like

    awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
    /Traceback/ && /SystemError/
    ' Traceback.txt

    with possible extensions to test for the patterns in specific fields
    (by adding FS = "\n") so that the patterns if appearing in the data
    won't compromise the correct function.

    (Note that the output of above code keeps the matched data intact.)

    Yes, features relying on the separators allow interesting solutions.
    (In the given case it's arguable whether they've been used sensibly.)

    Janis


    Olivier Gabathuler

    Hi Janis,
    thanks for your response :-)
    Just to understand, the output with
    awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
    /Traceback/ && /SystemError/
    ' Traceback.txt
    is :
    ..
    ----------------
    [Tue Jan 21 16:16:19.250245 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] Traceback (most recent call last):
    [Tue Jan 21 16:16:19.252221 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] SystemError: unable to access /home/dir
    [Tue Jan 21 16:16:19.249067 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Failed to exec Python script file '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
    [Tue Jan 21 16:16:19.249609 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Exception occurred processing WSGI script '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
    ----------------
    12/01 19:24:57.726 ERROR| log:0072| post-test sysinfo error: 11/01 18:24:57.727 ERROR| traceback:0013| Traceback (most recent call last): 11/01 18:24:57.728 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/common_lib/log.py", line 70, in decorated_func 11/01 18:24:57.729 ERROR| traceback:0013| fn(*args, **dargs) 11/01 18:24:57.730 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/bin/base_sysinfo.py", line 286, in log_after_each_test 11/01 18:24:57.731 ERROR| traceback:0013| old_packages = set(self._installed_packages) 11/01 18:24:57.731 ERROR| traceback:0013| SystemError: no such file or directory
    ----------------
    ..
    not exactly the output I expect, but as you said, I was not specific enough in the description of the output formatting.
    I will fix that.
    In fact I took this example, but in my working life on +10k Linux boxes as sysadmin, I used RS extensively to parse a lot of logs, so..
    Olivier G.
    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Fri Feb 18 21:57:12 2022
    From Newsgroup: comp.lang.awk

    On 18.02.2022 19:08, olivier gabathuler wrote:
    Le jeudi 17 février 2022 à 03:36:11 UTC+1, Janis Papanagnou a écrit :
    On 16.02.2022 23:11, olivier gabathuler wrote:
    Hi, thank you for your outstanding contributions and discussions on
    awk.

    Working with it from more than 20 years now and still amazed at the
    power of this wonderful language !

    This is my modest contribution to shed light on a usage too little
    documented on the Internet, I named "Record Separator" :
    https://rosettacode.org/wiki/Search_in_paragraph%27s_text
    I had a peek view into the awk code and (unstructured) data sample.

    The task is described not very specific as:
    "The goal is to verify the presence of a word or regular expression
    within several paragraphs of text (structured or not) and to print
    the relevant paragraphs on the standard output."

    When I saw the code I first wondered about the definition of a two
    newlines output record separator just to define the same as input
    separator to the next awk stage. (An indication for a candidate to
    be refactored.)

    It seems that your code basically extracts from records of blocks
    those blocks that contain a specific string. In addition it changes
    the data in a subtle way beyond the formulated task description.

    Personally my first attempt for such a task would have been simpler
    (using awk's multi-line data blocks feature), something like

    awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
    /Traceback/ && /SystemError/
    ' Traceback.txt

    with possible extensions to test for the patterns in specific fields
    (by adding FS = "\n") so that the patterns if appearing in the data
    won't compromise the correct function.

    (Note that the output of above code keeps the matched data intact.)

    Yes, features relying on the separators allow interesting solutions.
    (In the given case it's arguable whether they've been used sensibly.)

    Janis


    Olivier Gabathuler


    Hi Janis,
    thanks for your response :-)

    Just to understand, the output with
    awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }
    /Traceback/ && /SystemError/
    ' Traceback.txt
    is :
    ..
    ----------------
    [Tue Jan 21 16:16:19.250245 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] Traceback (most recent call last):
    [Tue Jan 21 16:16:19.252221 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] SystemError: unable to access /home/dir
    [Tue Jan 21 16:16:19.249067 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Failed to exec Python script file '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
    [Tue Jan 21 16:16:19.249609 2020] [wsgi:error] [pid 6515:tid 3041002528] [remote 10.0.0.12:50757] mod_wsgi (pid=6515): Exception occurred processing WSGI script '/home/pi/RaspBerryPiAdhan/www/sysinfo.wsgi'.
    ----------------
    12/01 19:24:57.726 ERROR| log:0072| post-test sysinfo error: 11/01 18:24:57.727 ERROR| traceback:0013| Traceback (most recent call last): 11/01 18:24:57.728 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/common_lib/log.py", line 70, in decorated_func 11/01 18:24:57.729 ERROR| traceback:0013| fn(*args, **dargs) 11/01 18:24:57.730 ERROR| traceback:0013| File "/tmp/sysinfo/autoserv-0tMj3m/bin/base_sysinfo.py", line 286, in log_after_each_test 11/01 18:24:57.731 ERROR| traceback:0013| old_packages = set(self._installed_packages) 11/01 18:24:57.731 ERROR| traceback:0013| SystemError: no such file or directory
    ----------------
    ..
    not exactly the output I expect, but as you said, I was not specific enough in the description of the output formatting.
    I will fix that.

    Actually I was saying and implying quite a bit more. To expand on it...

    From the code and the task description it was unclear whether the
    output of your script went beyond the description on the web page by
    accident or deliberately.

    If it differed by accident - as a consequence of a convoluted design
    based on the separators - then the simple code above is an immediate
    improvement (in more than one respect).

    If your task was actually to output the matching lines, but these
    matching lines should start from the keyword "Traceback" (and the
    leading time stamps suppressed), then you can and should formulate
    that in a clean way; not only the description but also the code
    should be clearly formulated.

    A clean awk expression for that is simply substr($0,index($0,"Traceback")),
    and the resulting code stays clean and comprehensible; instead of
    printing the whole record

    awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }

    /Traceback/ && /SystemError/ ## this implies: print $0

    ' Traceback.txt

    you just print the desired part

    awk 'BEGIN { RS = "" ; ORS = "\n----------------\n" }

    /Traceback/ && /SystemError/ {
    print substr($0,index($0,"Traceback"))
    }
    ' Traceback.txt

    A simple, straightforward addition without side effects or any
    hard-to-follow program logic. No unnecessary awk instances, no FS
    fiddling, nothing of that sort.

    And this code prints at least the same output as the code you posted
    on that web page. That code of yours was

    awk -v ORS='\n\n' '/SystemError/ { print RS $0 }' RS="Traceback" Traceback.txt |\
    awk -v ORS='\n----------------\n' '/Traceback/' RS="\n\n"
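
    For readers following along, a hedged reading of what that two-stage
    pipeline does (the commands below are copied unchanged from the post;
    only the comments are added, and note that a multi-character RS such
    as RS="Traceback" is a gawk/mawk extension rather than POSIX awk):

    # Pass 1: split the input at every literal "Traceback" (RS="Traceback"),
    # keep only the chunks that mention SystemError, and print each one with
    # the keyword glued back on and a blank line appended (ORS='\n\n').
    # Pass 2: re-read those blank-line-separated chunks (RS="\n\n"), keep
    # the ones that contain "Traceback", and append the dashed separator.
    awk -v ORS='\n\n' '/SystemError/ { print RS $0 }' RS="Traceback" Traceback.txt |\
    awk -v ORS='\n----------------\n' '/Traceback/' RS="\n\n"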

    If you think this code is in any way to prefer I'd be interested in
    your explanations. - No, not really, that was just rhetorical.


    In fact I took this example, but in my working life as a sysadmin on 10k+ Linux boxes, I have used RS extensively to parse a lot of logs, so..

    In fact, if that posted code you showed here is a characteristic code
    sample, then I doubt that it's a good idea to spread it to 10k+ Linux
    systems.

    But that taunt aside: there's nothing wrong with using the awk separators;
    they are a basic feature any proficient awk user will [sensibly] use.
    It's their pathological or unnecessary use that I consider problematic.

    YMMV.

    Janis


    Olivier G.


    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Janis Papanagnou@janis_papanagnou@hotmail.com to comp.lang.awk on Fri Feb 18 22:15:17 2022
    From Newsgroup: comp.lang.awk

    On 18.02.2022 21:57, Janis Papanagnou wrote:
    It's its pathological or unnecessary use I consider to be problematic.

    It just occurred to me that the circle closes. The thread started
    with a book called "The Art of Unix Programming", whose title
    resembles that of the classic "The Art of Computer Programming" by
    Donald Knuth, from an epoch when there was a clear path away from
    dedicated computer experts individually hacking code together,
    towards a more systematic and scientific approach. My feeling is that
    we're still balancing on the bleeding edge of software development,
    between hacks and - whatever.

    Janis

    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From olivier gabathuler@ogabathuler@free.fr to comp.lang.awk on Sat Feb 19 11:55:32 2022
    From Newsgroup: comp.lang.awk

    Hi Janis,
    thank you very much for your explanations, and apologies for my mistake.
    Yes, you stated it properly; this is exactly what I wanted (and using the index function).
    I will fix that on rosettacode.org.
    Have a nice day!
    Olivier G.
    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Tue Mar 1 08:05:46 2022
    From Newsgroup: comp.lang.awk

    @Axel :
    how many scripting languages do you know of that can compute, in just 14.3 seconds, modulo 3 of a single hexadecimal number 2.07 GB in size, with a log2(x) value of approximately 8.9 billion (~8,898,328,444) :
    echo; ( time ( nice echo '0x888889999888888888877765432CC111111111111111111111188888888CCCCCCCCC88888888899998088888888877765432111111111111111110000000000000011111888888888DDDDDDDD8888888899998888888FFFFFFFFFFFFFFFFFFFFFFFFFfFFFFFFFFFFf88877765432111111111111111111111188888888888888888999988888888887776543211111111111111111111118888AAA88888888888889B9998888888888777654321111111111111111111111' | mawk2 'sub(/^0[Xx]/,"")+gsub(//,$0,$0)+gsub(/........................./,$0,$0)+sub("^",("0x1")($0)($0),$0)+1' | pvE0 | mawk2 'function mod3(_) {
    return \
    (sub("^0[Xx]","",_)<(substr(_,76,1)==""))\
    ? (length(_)<16?(+_):substr(_,+1,15)+substr(_,16,15)+\
    substr(_,31,15)+substr(_,46,15)+substr(_,61,15) \
    ) % 3 \
    : gsub("[ -0369CFILORUXcfx_\n]+","",_)*0+\
    (length(_)*+2 + \
    gsub("[258BEHKNQTWZbe]+","",_)*0-\
    length(_)) % 3 } BEGIN { FS=ORS; RS="^$" } END { print mod3($1) }' ) | pvE9 | ggP '[0-9]*'| lgp3 ) |ecp
    in0: 0.00 B 0:00:00 [0.00 B/s] [0.00 B/s] [<=> ]
    out9: 2.00 B 0:00:14 [ 143miB/s] [ 143miB/s] [<=> ]
    in0: 2.07GiB 0:00:01 [1.72GiB/s] [1.72GiB/s] [ <=> ] ( nice echo | mawk2 | pvE 0.1 in0 | mawk2 ; ) 12.21s user 2.25s system 101% cpu 14.299 total
    pvE 0.1 out9 0.01s user 0.02s system 0% cpu 14.298 total
    ggrep --text -P --color=always '[0-9]*' 0.00s user 0.00s system 0% cpu 14.296 total
    1
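
    A minimal sketch of the digit-class idea behind that one-liner, stripped
    of the benchmarking wrappers (pvE0, pvE9, ggP, lgp3 and ecp appear to be
    local aliases): since 16 is congruent to 1 (mod 3), a hexadecimal number
    is congruent to the sum of its digit values mod 3, so counting digits per
    residue class is enough. The script below is only illustrative; it is not
    the code that was timed above.

    # mod3-hex.awk - sketch only; reads hex digits from a file or stdin.
    # Digits 0,3,6,9,C,F contribute 0 (mod 3); 2,5,8,B,E contribute 2;
    # the remaining digits 1,4,7,A,D contribute 1.
    {
        gsub(/[^0-9A-Fa-f]/, "")         # keep hex digits only (the "x" of a
                                         # 0x prefix is dropped; its 0 adds 0)
        n    += length($0)               # hex digits seen so far
        zero += gsub(/[0369CFcf]/, "")   # digits congruent to 0 (mod 3)
        two  += gsub(/[258BEbe]/, "")    # digits congruent to 2 (mod 3)
    }
    END {
        one = n - zero - two             # the rest are congruent to 1 (mod 3)
        print (one + 2 * two) % 3
    }

    Run as, e.g., "awk -f mod3-hex.awk bignumber.txt" (the file name is just a
    placeholder).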
    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Kpop 2GM@jason.cy.kwan@gmail.com to comp.lang.awk on Fri Mar 4 19:49:23 2022
    From Newsgroup: comp.lang.awk

    On Wednesday, February 9, 2022 at 2:50:01 AM UTC-5, Axel Reichert wrote:
    Kpop 2GM <> writes:

    i'm an ultra late-comer to awk - only discovering it in 2017-2018. and
    the moment i found it, i realized nearly all else - perl R python java
    C# - can be thrown straight into the toilet, if performance is a key criterion for the task at hand
    I would rather go for TCW (Total Cost of Wizardry): A competent Python programmer once consulted me on performance tuning for an (ASCII data mangling) script he had written (which took him about 30 min). It had been running for 10 min with no end in sight, according to a monitor on the (transformed) output. After he had explained the task at hand, I replied that I would not use Python, but rather some Unix command line tools. I started immediately and cobbled something together (awk featured
    prominently among other usual suspects, such as tr, sed, cut and grep). It delivered the desired results before his Python script was finished. So
    the final tally was "10 min" versus "> 30 min + 10 min + 10 min".

    Once the logic becomes more intricate, I will usually go for Python
    though, so I will use awk mostly for command line use, rarely as a file
    to be run by "awk -f".

    I was also a late-comer to this tool. When I started to learn Perl in
    the late 90s, I learned that it was a superset of sed and awk (even
    coming with conversion scripts), and so I gave the older tools another try
    (the "man" pages had been completely incomprehensible to me before; I could
    not wrap my head around stream processing). Once it clicked, I rarely
    used Perl anymore.

    Same goes for spreadsheet tools, for which I also seldom feel the need.

    Best regards

    Axel

    @Axel :

    interesting that you brought up unix CLI utilities : gnu-sed is almost unbelievably slow in this very basic replacement task -

    - replacing all odd digits with a 1, and
    - replacing all even digits with a 0

    Maybe you can tell me what I'm doing wrong in gnu-sed, because I can't even get it to within 3x of mawk-1.3.4's speed.

    f='jwengowengonoewgnwoegn.txt'; gwc -lcm "${f}";
    echo;
    ( time ( pv -q < "${f}" | mawk 'BEGIN {FS=(FS0="[3579]")(FS0);OFS="11"; ORS=""; RS="^$" } (NF=NF)+gsub("[2468][2468]","00")+gsub("[2468]","0")+gsub(FS0,"1")' | pvE 0.25 mid | xxh128sum ) ) | lgp3 ;
    sleep 1;
    ( time ( pv -q < "${f}" | gsed -zE 's/[3579]{2}/11/g;s/[2468]{2}/00/g;s/[3579]/1/g;s/[2468]/0/g' | pvE 0.25 mid | xxh128sum ) ) | lgp3

    1 275150613 275150613 jwengowengonoewgnwoegn.txt

    mid: 262MiB 0:00:02 [94.8MiB/s] [94.8MiB/s] [<=> ] ( pv -q < "${f}" | mawk | pvE 0.25 mid | xxh128sum; ) 2.52s user 0.35s system 102% cpu 2.788 total
    1fee31c387358cc9b13eea846746dfbc stdin

    mid: 262MiB 0:00:08 [30.1MiB/s] [30.1MiB/s] [<=> ] ( pv -q < "${f}" | gsed -zE | pvE 0.25 mid | xxh128sum; ) 8.57s user 0.29s system 101% cpu 8.745 total
    1fee31c387358cc9b13eea846746dfbc stdin
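
    A side note, not part of the original exchange: the task is a fixed
    one-character-to-one-character mapping, so plain transliteration avoids
    regex matching entirely and should be at least competitive with either
    command above. The file name is the one used in the posted benchmark;
    the output file is just a placeholder.

    # Map the digits 0-9 to 0,1,0,1,... in one pass: even digits become 0,
    # odd digits become 1. No regular expressions involved.
    tr '0123456789' '0101010101' < jwengowengonoewgnwoegn.txt > mapped.txt

    # The same mapping with sed's (POSIX) y/// transliteration command:
    sed 'y/0123456789/0101010101/' jwengowengonoewgnwoegn.txt > mapped.txt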


    The 4Chan Teller
    --- Synchronet 3.19c-Linux NewsLink 1.113