• Difficulty to use sensible line breaks in expressions

    From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Wed Oct 12 16:13:59 2022
    From Newsgroup: comp.lang.awk

    About the difficulty to use sensible line breaks in expressions,
    without adding syntactically spurious escape characters.
    (Note 1: The need for line breaks arises with longer expressions.)
    (Note 2: Yes, we can use/add line-continuation/escape characters.)

    1
    2 function f (a,b) { }
    3
    4 {
    5 # okay
    6 if (f(a,b) < c + d) print a, b, c, d
    7
    8 # okay
    9 if (f(a,b) < c + d) print a, b,
    10 c, d
    11
    12 # okay
    13 if (f(a,
    14 b) < c + d) print a, b, c, d
    15
    16 # error
    17 if (f(a, b) <
    18 c + d) print a, b, c, d
    19
    20 # error
    21 if (f(a,b) < c +
    22 d) print a, b, c, d
    23
    24 # error
    25 if (f(a,b) < c + d
    26 ) print a, b, c, d
    27
    28 # okay
    29 if (f(a,b) < c &&
    30 d) print a, b, c, d
    31
    32 # okay
    33 if (f(a,b) < (c &&
    34 d)) print a, b, c, d
    35
    36 # error
    37 if (f(a,b) < (c +
    38 d)) print a, b, c, d
    39 }

    awk: awk-breaks:18: if (f(a, b) <
    awk: awk-breaks:18: ^ unexpected newline or end of string
    awk: awk-breaks:18: c + d) print a, b, c, d
    awk: awk-breaks:18: ^ syntax error
    awk: awk-breaks:22: if (f(a,b) < c +
    awk: awk-breaks:22: ^ unexpected newline or end of string
    awk: awk-breaks:26: if (f(a,b) < c + d
    awk: awk-breaks:26: ^ unexpected newline or end of string
    awk: awk-breaks:38: if (f(a,b) < (c +
    awk: awk-breaks:38: ^ unexpected newline or end of string
    awk: awk-breaks:38: d)) print a, b, c, d
    awk: awk-breaks:38: ^ syntax error
    awk: awk-breaks:38: d)) print a, b, c, d
    awk: awk-breaks:38: ^ syntax error
    awk: awk-breaks:39: d)) print a, b, c, d
    awk: awk-breaks:39: ^ unexpected newline or end of string


    Is throwing (some/any of) these syntax errors mandated by POSIX? - If
    not, Awk variants, I suppose, could decide to implement semantically
    sensible [valid] interpretations and remove existing inconsistencies?

    Janis
    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.lang.awk on Wed Oct 12 16:56:19 2022
    From Newsgroup: comp.lang.awk

    On 2022-10-12, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    Is throwing (some/any of) these syntax errors mandated by POSIX? - If
    not, Awk variants, I suppose, could decide to implement semantically
    sensible [valid] interpretations and remove existing inconsistencies?

    Newlines are significant in Awk, and appear as a token (the NEWLINE
    token in the POSIX grammar).

    Not all parts of the grammar recognize newline tokens, so they
    cause a syntax error.
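
    For comparison, here are a few places where the grammar does accept a
    line break. POSIX awk allows an optional newline after a comma, "&&",
    "||", "{", "do", "else", and after the closing parenthesis of an
    if/for/while condition, which matches the "okay" cases in the original
    post. A minimal sketch (the function name demo_allowed_breaks is made
    up for illustration):

```shell
# Line breaks after ',', '&&' and '||' need no backslash in a POSIX awk.
# demo_allowed_breaks is a hypothetical name used only for this example.
demo_allowed_breaks() {
    awk 'BEGIN {
        a = 1; b = 2; c = 0
        # Newline after && and || is accepted by the grammar.
        if (a < b &&
            b > c ||
            c)
            # Newline after a comma in a print list is accepted too.
            print a,
                  b
    }'
}

demo_allowed_breaks
```

    The same program with the break placed after "<" or "+" instead is what
    triggers the "unexpected newline" errors shown in the original post.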

    I think that would require that, for instance, the phrase structure for
    E + E admit zero or more newline tokens on either side of the +,
    which would be ignored.

    Or else, we have the parser communicate with the lexer, so that the
    lexer makes newlines disappear and reappear in a syntax-directed way.

    I suspect that this wouldn't be upstreamed into gawk.

    I have a fork of gawk called egawk (enhanced gnu awk) where this
    approach could be tried.

    At certain points in the parser, we call
    some function in the lexer which says "eat newlines; do not feed me
    NEWLINE tokens", and at other points we re-enable newlines.

    The lexer could do it itself; for instance if a '(' token is processed,
    it may be okay to enable newline-eating until the matching ')',
    which just requires a counter. So then line breaks would be allowed
    in anything parenthesized, without disturbing their syntactic role
    as alternative semicolon terminators.
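
    The counter idea can be tried without touching any awk implementation,
    as a source preprocessor. The following is only a rough sketch under
    stated assumptions: join_paren_lines is a hypothetical helper, it counts
    "(" and ")" only, and it assumes no parentheses appear inside string
    literals, regex literals, or comments (a real lexer would know the
    difference).

```shell
# join_paren_lines: hypothetical helper, not part of any awk distribution.
# Reads awk source on stdin and replaces a newline by a space whenever the
# line ends inside an unclosed '('.
# Assumption: no parentheses inside strings, regex literals, or comments.
join_paren_lines() {
    awk '
    {
        line = $0
        # gsub returns the number of matches; "&" puts each match back.
        opens  = gsub(/\(/, "&", line)
        closes = gsub(/\)/, "&", line)
        depth += opens - closes
        if (depth < 0)
            depth = 0
        # Inside an open parenthesis: emit a space instead of the newline.
        printf "%s%s", $0, (depth > 0 ? " " : "\n")
    }
    # Terminate the last line if the input ended inside a parenthesis.
    END { if (depth > 0) print "" }
    '
}
```

    Usage, assuming the program text is held in a shell variable:

        prog=$(printf 'BEGIN {\nif (x +\nx == 0) { print "blah" } }\n' | join_paren_lines)
        awk "$prog"

    Newlines outside parentheses pass through unchanged, so they keep their
    role as statement terminators.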
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.lang.awk on Wed Oct 12 19:15:30 2022
    From Newsgroup: comp.lang.awk

    On 2022-10-12, Kaz Kylheku <864-117-4973@kylheku.com> wrote:
    I have a fork of gawk called egawk (enhanced gnu awk) where this
    approach could be tried.

    I got it working very easily, at the proof of concept stage,
    not having validated test cases and such:

    Patched:

    ~/gawk$ ./gawk 'BEGIN {
    if (x +
    x == 0) { print "blah" } }'
    blah

    Stock distro gawk:

    ~/gawk$ gawk 'BEGIN {
    if (x +
    x == 0) { print "blah" } }'
    gawk: cmd. line:3: if (x +
    gawk: cmd. line:3: ^ unexpected newline or end of string
    gawk: cmd. line:3: x == 0) { print "blah" } }
    gawk: cmd. line:3: ^ syntax error


    Patched, in --posix mode:

    ~/gawk$ ./gawk --posix 'BEGIN {
    if (x +
    x == 0) { print "blah" } }'
    gawk: cmd. line:3: if (x +
    gawk: cmd. line:3: ^ unexpected newline or end of string
    gawk: cmd. line:3: x == 0) { print "blah" } }
    gawk: cmd. line:3: ^ syntax error


    Patch:

    ~/gawk$ git diff awkgram.y
    diff --git a/awkgram.y b/awkgram.y
    index fc35100d..c24e35c5 100644
    --- a/awkgram.y
    +++ b/awkgram.y
    @@ -3911,6 +3911,13 @@ yylex(void)

         case '\n':
             sourceline++;
    +        /*
    +         * If not in POSIX mode, allow free-form newline in bracketed
    +         * and parenthesized expressions, by swallowing '\n' rather than
    +         * turning it into a NEWLINE token.
    +         */
    +        if (! do_posix && in_parens)
    +            goto retry;
             return lasttok = NEWLINE;

         case '#': /* it's a comment */

    Very easy; the lexer already counts parentheses, so nothing to do.

    All of the above said and patched, note that you can use backslash continuations, which is a bit ugly:

    ~/gawk$ gawk 'BEGIN {
    if (x + \
    x == 0) { print "blah" } }'
    blah

    So before trying to upstream this, you need a convincing argument why
    standard-conforming backslash-newline continuations aren't good enough.
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Wed Oct 12 23:24:34 2022
    From Newsgroup: comp.lang.awk

    On 12.10.2022 21:15, Kaz Kylheku wrote:

    So before trying to upstream this, you need a convincing argument why
    standard-conforming backslash-newline continuations aren't good enough.

    I acknowledged line-continuation/escapes in my OP:
    About the difficulty to use sensible line breaks in expressions,
    without adding syntactically spurious escape characters.
    ...
    (Note 2: Yes, we can use/add line-continuation/escape characters.)

    It may be just me, but I consider line-continuation as a hack of the
    last century or even of the 1960's (cf. the '+' symbol in column 1 of
    punch cards, where THAT continuation has NOT the issues of invisible
    whitespace characters after the '\' that we have at least since the
    UNIX epoch). In the Awk language, because of its design, we have to
    put certain things together on a line because of an otherwise changed
    semantics; e.g. pattern { action } cannot be split before the
    braces. In other places (see my OP-examples) it's syntactically and
    semantically unnecessary. There are also inconsistencies (see examples
    again) in expressions (with + vs. && to name just one).

    But as you pointed out in your first post, the syntax is in POSIX, so
    at least in POSIX mode it should behave standard conforming. (If the
    POSIX syntax is "informational" only the valuation may change, though.)

    In cases where fatal (syntax-)errors are [unnecessarily] produced,
    though, I think that a more graceful/accommodating behavior would
    not only add to readability, safety, and consistency, it might also
    increase the attractiveness for new users and acceptance by users (in
    case anyone is concerned about such considerations).

    That's all. I don't think that anything will change here. And I will
    continue to write lengthy lines in Awk (where its syntax requires it)
    and hope to not need looking into it again some time later, or check
    (in case of bug tracking) whether any continuation will have a NL
    immediately after it. And in 10 years when I will have forgotten my post
    I'll probably ask that question again.
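
    The check for invisible whitespace after a continuation backslash can
    at least be automated. A minimal sketch, assuming the awk sources are
    ordinary text files; find_broken_continuations is a made-up helper name:

```shell
# find_broken_continuations: hypothetical helper name.
# Prints "file:line: text" for every line where a backslash is followed only
# by spaces, tabs, or carriage returns before the end of the line, i.e. a
# continuation that no longer has the newline immediately after it.
find_broken_continuations() {
    awk '/\\[ \t\r]+$/ { printf "%s:%d: %s\n", FILENAME, FNR, $0 }' "$@"
}
```

    Usage: find_broken_continuations script.awk other.awk
    Lines ending in a clean backslash-newline are not reported.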

    Janis

    PS: Thanks for your proof of concept and tests.
