Forum: War Ensemble BBS

Unique In Column

From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Oct 2 07:10:04 2023

From Newsgroup: comp.lang.awk

# verifies an item is unique to the 2nd column
#
# example file.csv...
#
# name, alias
#
# john, kiwi
# suzi, apple
# suzi, orange

BEGIN { FS = ",[ \t]*|[ \t]+" }

{ Field2Values[tolower($2)] = 1 }

END { if (uniqueItem("apple", FILENAME) != 0) exit 1 }

function uniqueItem(field2, file) {

    lowerField2 = tolower(field2)

    if (lowerField2 in Field2Values) {
        print "Error: '" field2 "' was found in 2nd column of " file
        return 1
    } else print "Item: '" field2 "' is unique to 2nd column of " file

    return 0
}

# eof
--
:wq
Mike Sanders

--- Synchronet 3.20a-Linux NewsLink 1.114

From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Oct 2 07:37:03 2023

From Newsgroup: comp.lang.awk

Mike Sanders <porkchop@invalid.foo> wrote:

> # verifies an item is unique to the 2nd column

quick update, why hard-code a field number anyhow?

# verifies an item is unique to a given column
#
# example file.csv...
#
# name, alias
#
# john, kiwi
# suzi, apple
# suzi, orange

BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)] = 1 }

END { if (uniqueItem(COL, "apple", FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        print "Error: '" field "' was found in column " col " of " file
        return 1
    } else print "Item: '" field "' is unique to column " col " of " file

    return 0
}

# eof
--
:wq
Mike Sanders
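
A possible refinement, sketched here rather than taken from the thread: COL is still fixed in BEGIN, and since BEGIN runs after -v assignments are processed, a -v COL=1 on the command line would be overwritten. Setting the default only when COL is empty lets the column be chosen per run. The wrapper name check_col, the variable item, and the file demo.csv are assumptions for illustration; the membership test itself is unchanged from the post above.

```shell
# demo.csv is a hypothetical sample file matching the example in the post
printf '%s\n' 'john, kiwi' 'suzi, apple' 'suzi, orange' > demo.csv

# check_col COLUMN ITEM: same membership test as above, column chosen at run time
check_col() {
    awk -v COL="$1" -v item="$2" '
    BEGIN { FS = ",[ \t]*|[ \t]+"; if (COL == "") COL = 2 }  # default only if unset
    { FieldValues[tolower($COL)] = 1 }
    END {
        # \047 is the octal escape for a single quote
        if (tolower(item) in FieldValues) {
            print "Error: \047" item "\047 was found in column " COL " of " FILENAME
            exit 1
        }
        print "Item: \047" item "\047 is unique to column " COL " of " FILENAME
    }' demo.csv
}
```

With that in place, the same item can be checked against either column:

$ check_col 1 suzi
Error: 'suzi' was found in column 1 of demo.csv
$ check_col 2 suzi
Item: 'suzi' is unique to column 2 of demo.csv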


From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 5 09:26:30 2023

From Newsgroup: comp.lang.awk

On 10/2/2023 2:37 AM, Mike Sanders wrote:

> Mike Sanders <porkchop@invalid.foo> wrote:
>
>> # verifies an item is unique to the 2nd column
>
> quick update, why hard-code a field number anyhow?
>
> # verifies an item is unique to a given column
> #
> # example file.csv...
> #
> # name, alias
> #
> # john, kiwi
> # suzi, apple
> # suzi, orange
>
> BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }
>
> { FieldValues[tolower($COL)] = 1 }
>
> END { if (uniqueItem(COL, "apple", FILENAME) != 0) exit 1 }
>
> function uniqueItem(col, field, file) {
>
>     lowerField = tolower(field)
>
>     if (lowerField in FieldValues) {
>         print "Error: '" field "' was found in column " col " of " file
>         return 1
>     } else print "Item: '" field "' is unique to column " col " of " file
>
>     return 0
> }
>
> # eof

That's checking whether or not a value exists, not whether or not it's
unique, and producing the wrong output. If we modify it to take a
variable fruit:

$ cat tst.awk
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)] = 1 }

END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        print "Error: '" field "' was found in column " col " of " file
        return 1
    } else print "Item: '" field "' is unique to column " col " of " file

    return 0
}

and add a second "apple" in column 2 of your CSV:

$ cat file.csv
john, kiwi
suzi, apple
suzi, orange
gwen, apple

then we can run it as:

$ awk -v fruit='kiwi' -f tst.awk file.csv
Error: 'kiwi' was found in column 2 of file.csv
$ awk -v fruit='apple' -f tst.awk file.csv
Error: 'apple' was found in column 2 of file.csv
$ awk -v fruit='grape' -f tst.awk file.csv
Item: 'grape' is unique to column 2 of file.csv

and you can see it's reporting that "grape" is a unique value when it's
not actually present at all.

If we change the script to:

$ cat tst.awk
BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }

{ FieldValues[tolower($COL)]++ }

END { if (uniqueItem(COL, fruit, FILENAME) != 0) exit 1 }

function uniqueItem(col, field, file) {

    lowerField = tolower(field)

    if (lowerField in FieldValues) {
        if (FieldValues[lowerField] == 1) {
            print "Item: '" field "' is unique to column " col " of " file
        }
        else {
            print "Error: '" field "' was found in column " col " of " file
            return 1
        }
    }
    else {
        print "Error: '" field "' was not found in column " col " of " file
    }

    return 0
}

THEN it'll report unique "fruit" values correctly as well as reporting
which are present/absent:

$ awk -v fruit='kiwi' -f tst.awk file.csv
Item: 'kiwi' is unique to column 2 of file.csv
$ awk -v fruit='apple' -f tst.awk file.csv
Error: 'apple' was found in column 2 of file.csv
$ awk -v fruit='grape' -f tst.awk file.csv
Error: 'grape' was not found in column 2 of file.csv

Regards,

Ed.
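
A sketch building on the counting version above, not something posted in the thread: that script prints an error for an absent value yet still returns 0, so a calling shell script cannot tell "unique" from "absent" by exit status alone. One way to separate the three outcomes is to map each to its own status. The status values 0/1/2, the wrapper name column_status, the shortened messages, and the file fruits.csv are all assumptions chosen for illustration.

```shell
# fruits.csv is a hypothetical copy of the four-line file.csv shown above
printf '%s\n' 'john, kiwi' 'suzi, apple' 'suzi, orange' 'gwen, apple' > fruits.csv

# column_status ITEM: exit 0 = unique, 1 = duplicated, 2 = absent (assumed convention)
column_status() {
    awk -v fruit="$1" '
    BEGIN { FS = ",[ \t]*|[ \t]+"; COL = 2 }
    { Count[tolower($COL)]++ }              # occurrences per value, case-folded
    END {
        n = Count[tolower(fruit)] + 0       # 0 when the value never appeared
        if (n == 1) { print "unique: " fruit; exit 0 }
        if (n > 1)  { print "duplicated " n " times: " fruit; exit 1 }
        print "absent: " fruit; exit 2
    }' fruits.csv
}
```

Run against the same data as above:

$ column_status kiwi; echo "status $?"
unique: kiwi
status 0
$ column_status apple; echo "status $?"
duplicated 2 times: apple
status 1
$ column_status grape; echo "status $?"
absent: grape
status 2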

From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Mon Nov 6 03:14:17 2023

From Newsgroup: comp.lang.awk

Ed Morton <mortonspam@gmail.com> wrote:

[...]

Thanks Ed, must study your example & mull it over =)
--
:wq
Mike Sanders

