I'm using gawk 5.1.0, bash 5.1.16, Ubuntu 22.04.2. I will write and
provide a lot of material in case it is useful or there is conflict
in the script, but I am trying not to ramble.
I prepared a test script below, which should be easy to copy/paste
into a shell, e.g. bash. I am focused on the gsub regexps, which are
obviously contrived. They are meant to replace a family of strings
which - as they come from the output of another program - vary, but
take the general form (attempting a "plain English" version):
[opening double quote][the word "path"][maybe an underscore][various
digits][closing double quote]
I want to take all of that ^^^ and delete it - or equivalently
replace it with nothing (ideally), to prepare input to gnuplot as
"x,y" or "x y" data - two columns.
I tried using this type of command :
gsub("^[a-z]{4}$","TEST") ;
... and more, e.g. trying sub and gensub - but did not get far - I am
aware of a curly brace escape that is important or not depending on
the awk version, so I also tried with \{ and \}.
I put "TEST" in the present case for testing a few different cases. I
wrote this script based on extensive reading of a certain popular
online resource and The AWK Programming Language (1988 - maybe
time for a newer edition?). This is a useful script because as I find
new types of output from the upstream program (a whole other story),
I might add new gsub commands to take care of it.
copy/paste example script:
echo "\
{\"path_1234567\"\
:[`seq -s',' -f '%f' 1 20 `],\
\"path_123456\"\
:[`seq -s',' -f '%f' 1 20 `],\
\"path_1234\"\
:[`seq -s',' -f '%f' 1 20 `],\
\"path1234\"\
:[`seq -s',' -f '%f' 1 20 `]}" | \
gawk -F, '
{
    gsub("\{","") ;
    gsub("\}","") ;
    gsub("\]","") ;
    gsub("^[a-z]{4}$","TEST") ;
    gsub("\"[a-z][a-z][a-z][a-z]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9]\":\\\[","TESTSEVEN") ;
    gsub("\"[a-z][a-z][a-z][a-z][0-9][0-9][0-9][0-9][0-9][0-9]\":\\\[","TESTSIX") ;
    gsub("\"[a-z][a-z][a-z][a-z][0-9][0-9][0-9][0-9]\":\\\[","TESTFOURB") ;
    gsub("\"[a-z][a-z][a-z][a-z]_[0-9][0-9][0-9][0-9]\":\\\[","TESTFOURA") ;
    for (i=1;i<=NF;i++)
    {
        printf("%s%s",$i,i%2?",":"\n")
    }
}'
... the last printf thing is perhaps for another post, but (IIUC) it
prints a comma after every odd-numbered field and a newline after
every even-numbered one, in effect replacing every 2nd comma with a
newline. So that's the "x,y" data idea. I hope that is clear - I
imagine the regexps in the [a-z][0-9] parts ought to be able to go all
into one gsub if I knew the syntax or what to read about.
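A minimal sketch of that pairing step on its own, in case it helps to see it isolated from the regexps. The helper name pair_fields and the sample input are made up for illustration, and the extra line that terminates a dangling odd field is an addition that is not in the script above. Nothing here is gawk-specific, so plain awk is used.

```shell
# pair_fields: hypothetical helper name, not from the original script.
# Reads comma-separated records and prints the fields two per line.
pair_fields() {
  awk -F, '{
    for (i = 1; i <= NF; i++)
      printf "%s%s", $i, (i % 2 ? "," : "\n")   # odd field: comma follows; even field: newline follows
    if (NF % 2) print ""                         # addition: finish the line when the field count is odd
  }'
}

# made-up sample input
printf '%s\n' '1,10,2,20,3,30' | pair_fields
```

With an even number of fields every output line is one "x,y" pair, which is the two-column shape gnuplot wants.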
But with your samples above you can also use other regexp syntaxes,
like ? (for optional parts) and use grouping with parentheses (...)
for longer subexpressions, e.g.
[a-z][4}_?[0-9]{4}([0-9]{2})?
for an optional underscore and two optional digits.
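As a rough sketch of how the optional-underscore idea could fold the four label gsubs into one: the pattern below keeps the `?` from the suggestion but swaps the fixed digit counts for `+` (one or more digits), which is a substitution made for this example, not the pattern given above. The helper name strip_labels and the sample input are also made up. Written this way it needs no interval expressions, so plain awk is enough.

```shell
# strip_labels: hypothetical helper name, for illustration only.
strip_labels() {
  awk '{
    gsub(/"[a-z]+_?[0-9]+":\[/, "")   # quoted label: letters, optional underscore, digits, then ":["
    gsub(/[{}]|\]/, "")               # leftover braces and closing brackets
    print
  }'
}

# made-up sample in the same shape as the test data
printf '%s\n' '{"path_1234567":[1,2],"path1234":[3,4]}' | strip_labels
```

Using regexp literals (/.../) instead of strings avoids the double layer of backslash escaping that the string form needs for the literal [.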
Apologies for the `seq` synthetic data, I'll prepare it the better way
next time.
This is exactly what I was looking for and it works (I think a typo is
in there but let's leave it for now).
I tried {1-4} to get a range, but it didn't work - is that the idea? so
[a-z]{4}_?[0-9]{4}([0-9]{1-4})?
to match any number of digits from 1 to 4?
This is great. My old awk book (Aho, Kernighan, and Weinberger) has a
table on p.32 saying :
"expression [c1-c2] matches any character in the range beginning
with c1 and ending with c2."
... p.30 has more discussion, and I never saw anything about the
comma "," to indicate a range - perhaps this is a strong indication I
need to get a better book.
On 12.03.2023 21:11, Bryan wrote:
This is great. My old awk book (Aho, Kernighan, and Weinberger) [...]
The multiplicity syntax {N}, {N,}, {,M}, {N,M} is not supported by the
classic awk ("nawk") that is based on the book by Aho et al. More
recent and commonly used awks like GNU awk support it, though. That's
why there's no mention in that book.
I noticed in the "Computerphile" video with Brian Kernighan - shared
on this user group - that a new version of The Awk Book might be in
the works as of August 2022.
Meanwhile, the overnight delivery is in-hand now, [...] There is
more.
Lastly, from the back cover :
"You have the freedom to copy and modify this GNU manual."
Glad to support the FSF in this way!
On 14.03.2023 14:55, Bryan wrote:
I noticed in the "Computerphile" video with Brian Kernighan - shared
on this user group - that a new version of The Awk Book might be in
the works as of August 2022.
I cannot find a new version of the original Awk book with Google
(or other commercial providers). Could you provide a link, please?
Or are you speaking about Arnold Robbins's book? (Especially since
below you mention GNU and the FSF.)
I'm certainly confused by your mention of Brian Kernighan, one of
the authors of the original book.
I noticed in the "Computerphile" video with Brian Kernighan - shared
on this user group - that a new version of The Awk Book might be in
the works as of August 2022.
Meanwhile, the overnight delivery is in-hand now, and, from page 45:
"[begin quote]
{n}
{n,}
{n,m}
One or two numbers inside braces denote an *interval expression*. If
there is one number in the braces, the preceding regexp is repeated n
times. If there are two numbers separated by a comma, the preceding
regexp is repeated n to m times. If [p. 46] there is one number
followed by a comma, then the preceding regexp is repeated at least n
times:[end quote]"
... examples shown are :
wh{3}y Matches 'whhhy', but not 'why' or 'whhhhy'.
wh{3,5}y Matches 'whhhy', 'whhhhy', or 'whhhhhy' only.
wh{2,}y Matches 'whhy', 'whhhy', and so on.
There is more.
Lastly, from the back cover :
"You have the freedom to copy and modify this GNU manual."
Glad to support the FSF in this way!
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
On 14.03.2023 14:55, Bryan wrote:
I noticed in the "Computerphile" video with Brian Kernighan - shared
on this user group - that a new version of The Awk Book might be in
the works as of August 2022.
I cannot find a new version of the original Awk book with Google
(or other commercial providers). Could you provide a link, please?
Or are you speaking about Arnold Robbins's book? (Especially since
below you mention GNU and the FSF.)
I'm certainly confused by your mention of Brian Kernighan, one of
the authors of the original book.
The phrase "might be in the works" means only that there is a possibility
that a new edition might be in preparation. Is that what's confusing?
Bryan is clearly talking about a new version of the original book, but
he is referring to the most vague suggestion that there might, soon, be
a new edition. As far as I can tell there isn't one, but there could be
one "in the works" (i.e. in preparation).
I apologize for the confusion!
I will make a note on the Brian Kernighan video thread - the video I listened to/watched when stuck (not a bad idea, IMHO).