• approaches for reformatting data points into pairs

    From Bryan@bryanlepore@gmail.com to comp.lang.awk on Fri Mar 17 07:21:08 2023
    From Newsgroup: comp.lang.awk

    I am interested in organizing a long string of comma-separated numbers ("CSV" or "CVS") in different ways. For instance, I'd like to get every other pair of numbers (see below for my work so far). This might be useful and extendable for basic mathematical analysis, or for practical reformatting of program output. E.g. SVG files have paths with such features (see the "q" or "c" commands). It could also help with plotting different subsets of the data, e.g. every other pair, or other combinations. (However, I note that the gnuplot "every" command is also useful for this.)

    For example this sequence:

    -10.000000,-9.000000,-8.000000,-7.000000,-[...trim...]7.000000,8.000000,9.000000,10.000000

    can be put into different groups, for example these "x,y" data points:

    -10.000, -9.000
    -8.000, -7.000
    -6.000, -5.000
    -4.000, -3.000
    -2.000, -1.000
    0.000, 1.000
    2.000, 3.000
    4.000, 5.000
    6.000, 7.000
    8.000, 9.000
    10.000,

    (note there is no partner for the last value). This script will do that (with extra details shown to help follow the process):

    awk_dev_test_seq=$(seq -s',' -f '%f' -10 10)
    gawk -F, '
    {
        for (i = 1; i <= NF; i++) {
            # even-numbered fields are Y values and end the output line
            if (i % 2 == 0)
                printf("i=%s Y:%3.3f%s ", i, $i, "\n")
            else
                printf("i=%s X:%3.3f%s ", i, $i, ",")
        }
    }' <<EOF
    ${awk_dev_test_seq}
    EOF

    The number in (i % 2 == 0) can be adjusted to get, e.g., three consecutive numbers on each line by changing "i % 2" to "i % 3". Results:

    i=1 X:-10.000, i=2 X:-9.000, i=3 Y:-8.000

    ... and so on. I have been looking at how to do other groupings of the data. For example, getting every other *pair* of numbers would be interesting, as illustrated in this pseudo-output:

    keep this line : -10.000, -9.000
    Skip this line -> -8.000, -7.000
    keep this line : -6.000, -5.000
    Skip this line -> -4.000, -3.000
    keep this line : -2.000, -1.000

    I am asking what approaches might be best to do that in awk: if/else, while, for, or other control statements (I think that is the term for those).
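
For reference, the if/else route asked about here could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `label_pairs` and the "keep"/"skip" labels are made up for this example, and a trailing value without a partner is simply ignored.

```shell
# label_pairs: hypothetical helper name, not from the thread.
label_pairs() {
  awk -F, '
    {
      pair = 0                              # counts pairs on the current line
      for (i = 1; i + 1 <= NF; i += 2) {    # i + 1 <= NF ignores an unpaired last value
        pair++
        if (pair % 2 == 1)                  # 1st, 3rd, 5th ... pair
          printf "keep: %.3f, %.3f\n", $i, $(i + 1)
        else                                # 2nd, 4th, 6th ... pair
          printf "skip: %.3f, %.3f\n", $i, $(i + 1)
      }
    }'
}

printf '%s\n' '-10,-9,-8,-7,-6,-5' | label_pairs
```

Dropping the else branch leaves only the "keep" lines, which is the every-other-pair selection shown in the pseudo-output.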

    Tried to keep this short, but I'll note some interesting postings on this topic:

    "Parsing standard CVS data by gawk" https://lists.gnu.org/archive/html/bug-gawk/2015-07/msg00002.html
    "CSV parsing with awk" https://backreference.org/2010/04/17/csv-parsing-with-awk/index.html

    -Bryan
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Fri Mar 17 16:09:27 2023

    On 17.03.2023 15:21, Bryan wrote:
    I am interested in generally organizing a long string of
    comma-separated numbers ("CSV" or "CVS") in different ways. For
    instance, I'd like to get every other pair of numbers (see below for
    work). This might be useful and extendable for basic mathematical
    analysis, or practical reformatting of program output. E.g. svg files
    have paths with such features (see the "q" or "c" commands), or for
    plotting different sets the data, e.g. every other pair, or other combinations. (However, I note that the gnuplot "every" command is
    also useful for this).

    For example this sequence:

    -10.000000,-9.000000,-8.000000,-7.000000,-[...trim...]7.000000,8.000000,9.000000,10.000000

    can be put into different groups, for example these "x,y" data points :

    -10.000, -9.000
    -8.000, -7.000
    -6.000, -5.000
    -4.000, -3.000
    -2.000, -1.000
    0.000, 1.000
    2.000, 3.000
    4.000, 5.000
    6.000, 7.000
    8.000, 9.000
    10.000,

    (note there is no partner for the last pair). This script will do
    that (with extra details shown to help follow the processes):


    I'm not sure whether you want some "universal" script or just hints
    for coding variants. For the former you should specify the
    requirements precisely. For the latter, see below...

    awk_dev_test_seq=$(seq -s',' -f '%f' -10 10)
    gawk -F, '
    {
    {
    for (i=1;i<=NF;i++ )
    {
    if ( i % 2 == 0 ) printf("i=%s Y:%3.3f%s ", i, $i, "\n")
    else
    printf("i=%s X:%3.3f%s ", i, $i, ",")
    }
    }
    }' <<EOF
    ${awk_dev_test_seq}
    EOF

    Personally I'd take a (slightly) different approach here, such as
    handling the irregular (odd) cases explicitly

    awk -F, '
    NF % 2 == 1 { ...in case of odd number of fields - what to do?... }
    NF % 2 == 0 { ...(regular?) case of even number of fields... }
    '

    (The second condition may be irrelevant if you use the first action
    to fix your data, and you can fall through in the regular case.)
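
A minimal sketch of that two-pattern layout, assuming the odd case should behave like the sample output earlier in the thread (the unpaired last value is printed alone, followed by a comma). The function name `pairs` and the variables `n` and `last` are illustrative and not taken from the thread.

```shell
# pairs: hypothetical helper name, not from the thread.
pairs() {
  awk -F, '
    NF % 2 == 1 { n = NF - 1; last = $NF }   # odd count: set the unpaired value aside
    NF % 2 == 0 { n = NF;     last = ""  }   # even count: nothing left over
    {
      # common action: fields 1..n now form complete pairs
      for (i = 1; i <= n; i += 2)
        printf "%.3f, %.3f\n", $i, $(i + 1)
      if (last != "")
        printf "%.3f,\n", last               # assumption: show the lone value as in the sample
    }'
}

printf '%s\n' '1,2,3,4,5' | pairs
```

Both pattern rules only prepare `n` and `last`; the pattern-less third rule then runs for every line, which is the fall-through mentioned above. Other policies for the odd case (warn, skip the line, pad with a default) would only change the first rule.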

    For the iteration I'd do

    for (i=1; i<=NF; i+=2) # i.e. increment by 2

    and print a pair of numbers in one single printf statement

    printf "X:%3.3f,Y:%3.3f\n", $i, $(i+1)

    (adjust the formatting string and arguments as desired).

    In case you want to skip a data pair adjust the increment
    appropriately, say, by i+=4 (for your example below), or by
    i+=3 if you want to skip a data value (say a Z-coordinate).
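
The increment idea can be sketched with the group size and the step passed in as parameters. This is an illustration only: the function name `every_nth_group` and the awk variables `size` and `step` are assumptions made for this example.

```shell
# every_nth_group SIZE STEP: hypothetical helper, not from the thread.
#   SIZE = how many consecutive values go on one output line
#   STEP = how far the field index advances between output lines
every_nth_group() {
  awk -F, -v size="${1:-2}" -v step="${2:-2}" '
    {
      # i + size - 1 <= NF: only print groups that are complete
      for (i = 1; i + size - 1 <= NF; i += step) {
        line = sprintf("%.3f", $i)
        for (j = 1; j < size; j++)
          line = line sprintf(" %.3f", $(i + j))
        print line
      }
    }'
}

# size 2, step 4: every other x,y pair
printf '%s\n' '-10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0' | every_nth_group 2 4

# size 2, step 3: x,y pairs with a third (Z) value skipped
printf '%s\n' '1,2,3,4,5,6' | every_nth_group 2 3
```

The loop bound is the one deliberate difference from a plain `i<=NF` test. With an odd number of fields, a loop bounded only by `i<=NF` can land on the final unpaired value, and awk then formats the missing partner `$(i+1)` as 0.000 instead of omitting it.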

    Janis


    The number in (i % 2 == 0 ) can be adjusted to get e.g. each line
    containing the three consecutive numbers by changing "i % 2" to "i % 3". results :

    i=1 X:-10.000, i=2 X:-9.000, i=3 Y:-8.000

    ... and so on. I have been looking at how to do other groupings of
    the data - for example, getting every other *pair* of numbers would be interesting, illustrated in this pseudo-output :

    keep this line : -10.000, -9.000
    Skip this line->-8.000, -7.000
    keep this line : -6.000, -5.000
    Skip this line-> -4.000, -3.000
    keep this line : -2.000, -1.000

    I am asking what approaches might be best to do that in awk -
    if/else, while, for, or other control sequences (I think is the term
    for those).

    Tried to keep this short, but I'll note some interesting postings on this topic :

    "Parsing standard CVS data by gawk" https://lists.gnu.org/archive/html/bug-gawk/2015-07/msg00002.html
    "CSV parsing with awk" https://backreference.org/2010/04/17/csv-parsing-with-awk/index.html

    -Bryan


  • From Bryan@bryanlepore@gmail.com to comp.lang.awk on Fri Mar 17 09:32:03 2023

    On Friday, March 17, 2023 at 11:09:30 AM UTC-4, Janis Papanagnou wrote:
    Personally I'd take a (slightly) different approach here, like doing
    a handling of irregular (odd) cases

    awk -F, '
    NF % 2 == 1 { ...in case of odd number of fields - what to do?... }
    NF % 2 == 0 { ...(regular?) case of even number of fields... }
    '

    (The second condition may be irrelevant if you use the first action
    to fix your data, and you can fall through in the regular case.)

    This is interesting, thanks.

    For the iteration I'd do

    for (i=1; i<=NF; i+=2) # i.e. increment by 2

    and print a pair of numbers in one single printf statement

    printf "X:%3.3f,Y:%3.3f\n", $i, $(i+1)

    (adjust the formatting string and arguments as desired).

    In case you want to skip a data pair adjust the increment
    appropriately, say, by i+=4 (for your example below), or by
    i+=3 if you want to skip a data value (say a Z-coordinate).

    That idea, in the following script, appears to be exactly what I mean:

    awk_dev_test_seq=$(seq -s',' -f '%f' -10 10)
    gawk -F, '
    {
        for (i=1; i<=NF; i+=4 )
            printf ( "i=%s %3.3f %3.3f \n", i, $i, $(i+1) )
    }' <<EOF
    ${awk_dev_test_seq}
    EOF

    Output:

    i=1 -10.000 -9.000
    i=5 -6.000 -5.000
    i=9 -2.000 -1.000
    i=13 2.000 3.000
    i=17 6.000 7.000

    That helped a lot, thank you.

    -Bryan
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sat Mar 18 04:14:31 2023

    On 17.03.2023 17:32, Bryan wrote:
    printf ( "i=%s %3.3f %3.3f \n", i, $i, $(i+1) )

    I see you added parentheses. But note that 'printf', like 'print'
    but unlike 'sprintf()', is a statement, not a function.
    Just by the way.

    Janis

  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Sat Mar 18 04:17:25 2023

    In article <tv3aao$2aucf$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 17.03.2023 17:32, Bryan wrote:
    printf ( "i=%s %3.3f %3.3f \n", i, $i, $(i+1) )

    I see you added parenthesis. But note that 'printf' - as 'print',
    but as opposed to 'sprintf()' - is a statement, not a function.

    Although you don't say so explicitly, the implication is that using
    parentheses with printf is wrong. This implication is incorrect.

    Although the parens are optional in most cases, they are necessary in
    certain cases. I always use them (when I use printf in awk), because:

    1) It looks better (IMHO, of course). It conforms more to what we
    would expect to see in C.
    2) It is necessary in certain cases, so might as well use them always.
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Sat Mar 18 06:19:59 2023

    On 18.03.2023 05:17, Kenny McCormack wrote:
    In article <tv3aao$2aucf$1@dont-email.me>,
    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    On 17.03.2023 17:32, Bryan wrote:
    printf ( "i=%s %3.3f %3.3f \n", i, $i, $(i+1) )

    I see you added parenthesis. But note that 'printf' - as 'print',
    but as opposed to 'sprintf()' - is a statement, not a function.

    Although you don't say so explicitly, the implication is that using parentheses with printf is wrong. This implication is incorrect.

    There was no implication that they are wrong - actually they work.

    But knowing that it is a statement and not a function lets you
    understand the mechanics, and derive explanations for the cases
    in which parentheses around the expressions are necessary; in
    those cases it is not because of [wrongly assuming] that it is a
    function. In other words, knowing the difference allows one to
    grasp the semantics of these language constructs.
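
One commonly cited case where the parentheses matter is an argument that contains the `>` operator: because print and printf are statements that accept an output redirection, an unparenthesized `>` is read as "write to this file" rather than as a comparison. The sketch below shows the difference; it works in a throwaway directory (created with `mktemp -d`, an assumption of this example) because the second command creates a file named `1`.

```shell
# Work in a scratch directory: the unparenthesized form creates a file called "1".
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# With parentheses, ">" is a comparison inside the argument list: prints 1.
with_parens=$(awk 'BEGIN { printf("%d\n", 2 > 1) }')

# Without them, ">" is output redirection: "2" is written to the file "1"
# and nothing reaches standard output.
without_parens=$(awk 'BEGIN { printf "%d\n", 2 > 1 }')

echo "with parens:    '$with_parens'"
echo "without parens: '$without_parens' (file 1 contains: $(cat 1))"
```

Seen this way, the parentheses are ordinary grouping around the expression list of a statement, not a function-call syntax, which is the distinction drawn above.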


    Although the parens are optional in most cases, they are necessary in
    certain cases. I always use them (when I use printf in awk), because:

    1) It looks better (IMHO, of course). It conforms more to what we
    would expect to see in C.
    2) It is necessary in certain cases, so might as well use them always.

    That's worth religious wars. :-) I'll abstain.

    Janis
