Hi
I have a file with 7 fields.
The first field is a serial number.
In some records the 5th field is missing.
A few records got joined with the next record on the same line. In the sample file
I have shown only two records joined together, but in some cases three or four records are joined.
sample file:
1 651 643786485 107249 5190 M SMITH 1284
2 963 212018826 103480 M746 R WADHWA 156
3 232 215036022 105012 M743 SAMBA 337
4 232 215036023 105012 M743 SAMBA 443
5 054 215036704 103325 KIYA K 351 ====> 5th field is missing
6 205 308363068 103402 5537 Mc DON 943
7 231 343328800 105880 MANO M 6403 8 231 343329128 105880 MANO M 8324 =====> in both the records 5th field is missing
9 309 361257222 103595 M564 C R SAM 102 10 309 361297561 103595 M564 C R SAM 332
11 216 308659868 625402 9693 FERNAND 365
The required output:
1 651 643786485 107249 5190 M SMITH 1284
2 963 212018826 103480 M746 R WADHWA 156
3 232 215036022 105012 M743 SAMBA 337
4 232 215036023 105012 M743 SAMBA 443
5 054 215036704 103325 4897 KIYA K 351
6 205 308363068 103402 5537 Mc DON 943
7 231 343328800 105880 MANO M 6403
8 231 343329128 105880 MANO M 8324
9 309 361257222 103595 M564 C R SAM 102
10 309 361297561 103595 M564 C R SAM 332
I tried using the serial number as RS but did not get the desired result:
awk 'BEGIN{RS="[0-9]+"}{
print $0 RT
}' file
Actually, I need only the first four fields (including the serial number) and the last field.
It would be more helpful if the output used "," as the delimiter.
Thank you
The contents of your post are inconsistent...
On 18.01.2023 04:30, raj wrote:
Hi
I have a file with 7 fields.
No. The number of fields varies. A typical value is 8.
The first field is a serial number.
No. There are gaps, or joined subsequent lines.
In some records the 5th field is missing.
Also other fields in joined lines.
A few records got joined with the next record on the same line. In the sample file
I have shown only two records joined together, but in some cases three or four records are joined.
sample file:
1 651 643786485 107249 5190 M SMITH 1284
2 963 212018826 103480 M746 R WADHWA 156
3 232 215036022 105012 M743 SAMBA 337
4 232 215036023 105012 M743 SAMBA 443
5 054 215036704 103325 KIYA K 351 ====> 5th field is missing
6 205 308363068 103402 5537 Mc DON 943
7 231 343328800 105880 MANO M 6403 8 231 343329128 105880 MANO M 8324 =====> in both the records 5th field is missing
9 309 361257222 103595 M564 C R SAM 102 10 309 361297561 103595 M564 C R SAM 332
11 216 308659868 625402 9693 FERNAND 365
The required output:
1 651 643786485 107249 5190 M SMITH 1284
2 963 212018826 103480 M746 R WADHWA 156
3 232 215036022 105012 M743 SAMBA 337
4 232 215036023 105012 M743 SAMBA 443
5 054 215036704 103325 4897 KIYA K 351
And where should that "4897" come from?
6 205 308363068 103402 5537 Mc DON 943
7 231 343328800 105880 MANO M 6403
8 231 343329128 105880 MANO M 8324
You want records with 7 and 8 fields mixed?
9 309 361257222 103595 M564 C R SAM 102
10 309 361297561 103595 M564 C R SAM 332
I tried using the serial number as RS but did not get the desired result:
awk 'BEGIN{RS="[0-9]+"}{
print $0 RT
}' file
Actually, I need only the first four fields (including the serial number) and the last field.
This does not match the "required output" above.
It would be more helpful if the output used "," as the delimiter.
Thank you
...so fix your data sample and requirements first.
And have a closer look at the definition of lines whose number
of fields may be 14, 15, or 16, and at how to distinguish that data.
And speak with whoever created that data trash about fixing his process.
Janis
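The varying field counts mentioned above can be checked with a quick tally of NF per line. This is only a sketch: the four sample lines are taken from the post with the "====>" annotations removed, and the variable names `sample` and `tally` are made up for illustration.

```shell
#!/bin/sh
# Four lines from the posted sample, without the "====>" annotations.
sample='1 651 643786485 107249 5190 M SMITH 1284
5 054 215036704 103325 KIYA K 351
7 231 343328800 105880 MANO M 6403 8 231 343329128 105880 MANO M 8324
9 309 361257222 103595 M564 C R SAM 102 10 309 361297561 103595 M564 C R SAM 332'

# Count how many lines have each number of fields, sorted by field count.
tally=$(printf '%s\n' "$sample" |
    awk '{ count[NF]++ } END { for (n in count) print n, count[n] }' |
    sort -n)

printf '%s\n' "$tally"
```

For these four lines the tally lists 7, 8, 14 and 18 fields once each, so a single record has 7 or 8 fields and the joined lines have a multiple of that.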
[...]
The data was copied and pasted into a text editor from a PDF file.
The user does not have any tool or access to convert the PDF to DOC or Excel.
The problem arises when the text is copied directly from the PDF file.
That is the reason for the inconsistency.
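Since the input cannot be fixed at the source, one possible way to get the first four fields plus the last field, separated by ",", is to detect where each record starts inside a line. The sketch below rests on an assumption drawn only from the posted sample: a record starts with a serial number followed by a 3-digit, a 9-digit and a 6-digit field. The helper names `digits` and `starts` are hypothetical, and the approach has not been checked against the real file.

```shell
#!/bin/sh
# Four lines from the posted sample, without the "====>" annotations.
sample='1 651 643786485 107249 5190 M SMITH 1284
5 054 215036704 103325 KIYA K 351
7 231 343328800 105880 MANO M 6403 8 231 343329128 105880 MANO M 8324
9 309 361257222 103595 M564 C R SAM 102 10 309 361297561 103595 M564 C R SAM 332'

result=$(printf '%s\n' "$sample" | awk '
# True when field i consists only of digits and has exactly n of them.
function digits(i, n) { return ($i ~ /^[0-9]+$/) && (length($i) == n) }

# Assumed record start: serial number, then 3-, 9- and 6-digit fields.
function starts(i) {
    return (i + 3 <= NF) && ($i ~ /^[0-9]+$/) &&
           digits(i + 1, 3) && digits(i + 2, 9) && digits(i + 3, 6)
}

{
    i = 1
    while (i <= NF) {
        if (!starts(i)) { i++; continue }   # skip anything unrecognised
        # The record runs up to the next record start or the end of the line.
        j = i + 4
        while (j <= NF && !starts(j)) j++
        last = j - 1
        # Print only if there is a field after the first four.
        if (last > i + 3)
            print $i "," $(i + 1) "," $(i + 2) "," $(i + 3) "," $last
        i = j
    }
}')

printf '%s\n' "$result"
```

Because the record boundary is found by pattern rather than by field count, a missing 5th field or a name with one to three words does not matter, and lines with three or four joined records are handled by the same loop. It would misfire if the fields after the first four happened to form the same digit pattern.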
[snip]