• Unique In Column

    From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Oct 2 07:10:04 2023
    From Newsgroup: comp.lang.awk

    # verifies an item is unique to the 2nd column
    #
    # example file.csv...
    #
    # name, alias
    #
    # john, kiwi
    # suzi, apple
    # suzi, orange

    BEGIN { FS = ",[ \t]*|[ \t]+" }

    { Field2Values[tolower($2)] = 1 }

    END { if (uniqueItem("apple", FILENAME) != 0) exit 1 }

    function uniqueItem(field2, file) {

        lowerField2 = tolower(field2)

        if (lowerField2 in Field2Values) {
            print "Error: '" field2 "' was found in 2nd column of " file
            return 1
        } else print "Item: '" field2 "' is unique to 2nd column of " file

        return 0
    }

    # eof
    --
    :wq
    Mike Sanders

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Oct 2 07:37:03 2023

    Mike Sanders <porkchop@invalid.foo> wrote:

    # verifies an item is unique to the 2nd column

    quick update, why hard-code a field number anyhow?

    # verifies an item is unique to a given column
    #
    # example file.csv...
    #
    # name, alias
    #
    # john, kiwi
    # suzi, apple
    # suzi, orange

    BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

    { FieldValues[tolower($COL)] = 1 }

    END { if (uniqueItem(COL, "apple", FILENAME) != 0) exit 1 }

    function uniqueItem(col, field, file) {

        lowerField = tolower(field)

        if (lowerField in FieldValues) {
            print "Error: '" field "' was found in column " col " of " file
            return 1
        } else print "Item: '" field "' is unique to column " col " of " file

        return 0
    }

    # eof
    --
    :wq
    Mike Sanders

  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 09:26:30 2023

    On 10/2/2023 2:37 AM, Mike Sanders wrote:
    Mike Sanders <porkchop@invalid.foo> wrote:

    # verifies an item is unique to the 2nd column

    quick update, why hard-code a field number anyhow?

    # verifies an item is unique to a given column
    #
    # example file.csv...
    #
    # name, alias
    #
    # john, kiwi
    # suzi, apple
    # suzi, orange

    BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

    { FieldValues[tolower($COL)] = 1 }

    END { if (uniqueItem(COL, "apple", FILENAME) != 0) exit 1 }

    function uniqueItem(col, field, file) {

        lowerField = tolower(field)

        if (lowerField in FieldValues) {
            print "Error: '" field "' was found in column " col " of " file
            return 1
        } else print "Item: '" field "' is unique to column " col " of " file

        return 0
    }

    # eof


    That's checking whether or not a value exists, not whether or not it's
    unique, and producing the wrong output. If we modify it to take a
    variable fruit:

    $ cat tst.awk
    BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

    { FieldValues[tolower($COL)] = 1 }

    END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }

    function uniqueItem(col, field, file) {

        lowerField = tolower(field)

        if (lowerField in FieldValues) {
            print "Error: '" field "' was found in column " col " of " file
            return 1
        } else print "Item: '" field "' is unique to column " col " of " file

        return 0
    }

    and add a second "apple" in column 2 of your CSV:

    $ cat file.csv
    john, kiwi
    suzi, apple
    suzi, orange
    gwen, apple

    then we can run it as:

    $ awk -v fruit='kiwi' -f tst.awk file.csv
    Error: 'kiwi' was found in column 2 of file.csv
    $ awk -v fruit='apple' -f tst.awk file.csv
    Error: 'apple' was found in column 2 of file.csv
    $ awk -v fruit='grape' -f tst.awk file.csv
    Item: 'grape' is unique to column 2 of file.csv

    and you can see it's reporting that "grape" is a unique value when it's
    not actually present at all, and reporting "kiwi" as an error even
    though it appears exactly once.
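
    The distinction comes down to storing a flag versus storing a count.
    The following is a minimal sketch, not part of the original posts: the
    array names seen and count, the file name count.awk, and the hard-coded
    list of query values are all assumptions made for illustration. It
    prints, for each query value, whether it is present in column 2 and how
    many times it occurs.

```shell
# Recreate the sample data from the post above.
printf '%s\n' 'john, kiwi' 'suzi, apple' 'suzi, orange' 'gwen, apple' > file.csv

# count.awk is a hypothetical file name used only for this sketch.
cat > count.awk <<'EOF'
BEGIN { FS = ",[ \t]*|[ \t]+" }

{
    key = tolower($2)
    seen[key] = 1      # flag only: records that the value exists
    count[key]++       # tally: records how many times it occurs
}

END {
    # Query values chosen to cover the three cases: once, twice, never.
    n = split("kiwi apple grape", query, " ")
    for (i = 1; i <= n; i++) {
        v = query[i]
        present = (v in seen)                 # membership test yields 1 or 0
        total   = (v in count) ? count[v] : 0 # avoid creating empty entries
        printf "%s present=%d count=%d\n", v, present, total
    }
}
EOF

awk -f count.awk file.csv
```

    Only the count can separate "occurs exactly once" from "occurs more
    than once"; the flag gives the same answer for kiwi and apple.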

    If we change the script to:

    $ cat tst.awk
    BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

    { FieldValues[tolower($COL)]++ }

    END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }

    function uniqueItem(col, field, file) {

        lowerField = tolower(field)

        if (lowerField in FieldValues) {
            if (FieldValues[lowerField] == 1) {
                print "Item: '" field "' is unique to column " col " of " file
            }
            else {
                print "Error: '" field "' was found in column " col " of " file
                return 1
            }
        }
        else {
            print "Error: '" field "' was not found in column " col " of " file
        }

        return 0
    }

    THEN it'll report unique "fruit" values correctly as well as reporting
    which are present/absent:

    $ awk -v fruit='kiwi' -f tst.awk file.csv
    Item: 'kiwi' is unique to column 2 of file.csv
    $ awk -v fruit='apple' -f tst.awk file.csv
    Error: 'apple' was found in column 2 of file.csv
    $ awk -v fruit='grape' -f tst.awk file.csv
    Error: 'grape' was not found in column 2 of file.csv
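
    Note that in the script above the "not found" case still returns 0, so
    a calling shell script cannot tell it apart from the unique case by
    exit status alone. If that matters, one possible variant is sketched
    below. It is not from the thread: the file name status.awk, the
    message wording, and the exit code convention (0 unique, 1 duplicate,
    2 absent) are assumptions chosen for illustration.

```shell
# Recreate the sample data from the post above.
printf '%s\n' 'john, kiwi' 'suzi, apple' 'suzi, orange' 'gwen, apple' > file.csv

# status.awk is a hypothetical file name used only for this sketch.
cat > status.awk <<'EOF'
# Assumed convention: exit 0 = exactly once, 1 = more than once, 2 = absent.
BEGIN { FS = ",[ \t]*|[ \t]+"; if (COL == "") COL = 2 }

{ Count[tolower($COL)]++ }

END {
    key = tolower(fruit)
    n = (key in Count) ? Count[key] : 0
    if (n == 1) { print "unique: " fruit; exit 0 }
    if (n > 1)  { print "duplicate (" n "x): " fruit; exit 1 }
    print "absent: " fruit
    exit 2
}
EOF

# Run the three cases and show the exit status of each.
for f in kiwi apple grape; do
    rc=0
    awk -v fruit="$f" -f status.awk file.csv || rc=$?
    echo "exit status: $rc"
done
```

    COL defaults to 2 here but can be overridden on the command line, for
    example with -v COL=1.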

    Regards,

    Ed.
  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Nov 6 03:14:17 2023

    Ed Morton <mortonspam@gmail.com> wrote:

    [...]

    Thanks Ed, must study your example & mull it over =)
    --
    :wq
    Mike Sanders
