From Newsgroup: comp.lang.awk
On 10/2/2023 2:37 AM, Mike Sanders wrote:
Mike Sanders <porkchop@invalid.foo> wrote:
# verifies an item is unique to the 2nd column
quick update, why hard-code a field number anyhow?
# verifies an item is unique to a given column
#
# example file.csv...
#
# name, alias
#
# john, kiwi
# suzi, apple
# suzi, orange
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }
{ FieldValues[tolower($COL)] = 1 }
END { if (uniqueItem(COL, "apple", FILENAME) != 0) exit 1 }
function uniqueItem(col, field, file) {
    lowerField = tolower(field)
    if(lowerField in FieldValues) {
        print "Error: '" field "' was found in column " col " of " file
        return 1
    } else print "Item: '" field "' is unique to column " col " of " file
    return 0
}
# eof
That's checking whether or not a value exists, not whether or not it's
unique, and producing the wrong output. If we modify it to take a
variable fruit:
$ cat tst.awk
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }
{ FieldValues[tolower($COL)] = 1 }
END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }
function uniqueItem(col, field, file) {
    lowerField = tolower(field)
    if(lowerField in FieldValues) {
        print "Error: '" field "' was found in column " col " of " file
        return 1
    } else print "Item: '" field "' is unique to column " col " of " file
    return 0
}
and add a second "apple" in column 2 of your CSV:
$ cat file.csv
john, kiwi
suzi, apple
suzi, orange
gwen, apple
then we can run it as:
$ awk -v fruit='kiwi' -f tst.awk file.csv
Error: 'kiwi' was found in column 2 of file.csv
$ awk -v fruit='apple' -f tst.awk file.csv
Error: 'apple' was found in column 2 of file.csv
$ awk -v fruit='grape' -f tst.awk file.csv
Item: 'grape' is unique to column 2 of file.csv
and you can see it's reporting that "grape" is a unique value when it's
not actually present at all.
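The underlying issue can be seen in isolation: assigning 1 to an array element records only that a value was seen, while incrementing it records how many times. Here is a minimal sketch using the same four lines of file.csv; the show_counts helper and the seen/count array names are hypothetical, chosen only for this illustration:

```shell
# Hypothetical demo: compare "= 1" (membership only) with "++" (occurrence count).
show_counts() {
    printf 'john, kiwi\nsuzi, apple\nsuzi, orange\ngwen, apple\n' |
    awk -F', *' '
        # Record each column-2 value both ways.
        { key = tolower($2); seen[key] = 1; count[key]++ }
        END {
            # Query one unique, one duplicated and one absent value.
            n = split("kiwi apple grape", q, " ")
            for (i = 1; i <= n; i++)
                printf "%s: seen=%d count=%d\n", q[i], (q[i] in seen), count[q[i]]
        }'
}
show_counts
```

With that input, "kiwi" and "apple" both show seen=1, so membership alone cannot tell them apart. Only the count (1 versus 2) separates a unique value from a duplicated one, and "grape" shows 0 for both.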
If we change the script to:
$ cat tst.awk
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }
{ FieldValues[tolower($COL)]++ }
END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }
function uniqueItem(col, field, file) {
    lowerField = tolower(field)
    if(lowerField in FieldValues) {
        if (FieldValues[lowerField] == 1) {
            print "Item: '" field "' is unique to column " col " of " file
        }
        else {
            print "Error: '" field "' was found in column " col " of " file
            return 1
        }
    }
    else {
        print "Error: '" field "' was not found in column " col " of " file
    }
    return 0
}
THEN it'll report unique "fruit" values correctly as well as reporting
which are present/absent:
$ awk -v fruit='kiwi' -f tst.awk file.csv
Item: 'kiwi' is unique to column 2 of file.csv
$ awk -v fruit='apple' -f tst.awk file.csv
Error: 'apple' was found in column 2 of file.csv
$ awk -v fruit='grape' -f tst.awk file.csv
Error: 'grape' was not found in column 2 of file.csv
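Since the END rule turns the function's return value into an exit status, a calling shell script can branch on it. The following is a rough sketch of how that might be checked; the temporary directory and the check_fruit helper are hypothetical, and the script and data are the ones shown above:

```shell
# Sketch: run the counting version of the check and report awk's exit status.
# The temporary directory and the check_fruit helper are hypothetical names.
tmp=$(mktemp -d)

cat > "$tmp/file.csv" <<'EOF'
john, kiwi
suzi, apple
suzi, orange
gwen, apple
EOF

cat > "$tmp/tst.awk" <<'EOF'
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }
{ FieldValues[tolower($COL)]++ }
END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }
function uniqueItem(col, field, file) {
    lowerField = tolower(field)
    if(lowerField in FieldValues) {
        if (FieldValues[lowerField] == 1) {
            print "Item: '" field "' is unique to column " col " of " file
        }
        else {
            print "Error: '" field "' was found in column " col " of " file
            return 1
        }
    }
    else {
        print "Error: '" field "' was not found in column " col " of " file
    }
    return 0
}
EOF

# Discard the message and print only the exit status for the given value.
check_fruit() {
    awk -v fruit="$1" -f "$tmp/tst.awk" "$tmp/file.csv" >/dev/null
    echo "$1: exit status $?"
}

check_fruit kiwi
check_fruit apple
check_fruit grape
```

As written, only the duplicated value ("apple") produces a non-zero exit status. The absent value ("grape") prints an "Error:" message but still exits 0, so if a missing value should also count as a failure, the not-found branch would need its own `return 1`.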
Regards,
Ed.