For a number of reasons I am looking for a way of recording a
list of the files (and file-like objects) on a Unix system at
certain points in time. The main output would simply be sorted
text with one fully-qualified file name on each line.
What follows is my first attempt at it. I'd appreciate any
feedback on whether I am going about it the right way or
whether it could be improved either in concept or in coding.
There are two tiny scripts. In the examples below they write to
temporary files f1 and f2 to test the mechanism, but the idea is
that the reports would be stored in timestamped files so that
comparisons between one report and another could be made later.
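The timestamped-report idea could be sketched like this; the /var/tmp directory and the "filelist-" prefix are invented for illustration, not part of the scripts:

```shell
# Hypothetical naming scheme for the per-run reports; the directory
# and prefix here are made up and would be adjusted to taste.
stamp=$(date +%Y-%m-%dT%H%M%S)
report="/var/tmp/filelist-$stamp.txt"
echo "$report"
```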
The first, and primary, script generates nothing other than
names and is as follows.
export LC_ALL=C
sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1
You'll see I made some choices such as to omit files from /proc but not
from /dev, for example, to record any lost+found contents, to record
mounted filesystems, to show just one level of /tmp, etc.
I am not sure I coded the command correctly, although it seems to work
on test cases.
The output from that starts with lines such as
/
/bin
/boot
/boot/System.map-5.15.0-101-generic
/boot/System.map-5.15.0-102-generic
...etc...
Such a form would be ideal for input to grep and diff to look for
relevant files that have been added or removed between any two runs.
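Since the reports are already sorted, comm(1) can split additions from removals directly; a minimal sketch on two throwaway reports (the file names and contents are invented):

```shell
# Two stand-in reports; in real use these would be the timestamped files.
printf '%s\n' /bin /boot /etc        > /tmp/report-old
printf '%s\n' /bin /etc /etc/new.cfg > /tmp/report-new

# comm needs sorted input (which the reports are, by construction):
# -13 keeps lines only in the newer report (additions),
# -23 keeps lines only in the older report (removals).
comm -13 /tmp/report-old /tmp/report-new
comm -23 /tmp/report-old /tmp/report-new
```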
The second, and less important, part is to store (in a separate file)
info about each of the file names as that may be relevant in some cases.
That takes the first file as input and has the following form.
cat /tmp/f1 | \
    tr '\n' '\0' | \
    xargs -0 sudo ls -ld > /tmp/f2
The output from that is such as
drwxr-xr-x 23 root root 4096 Apr 13 2023 /
lrwxrwxrwx 1 root root 7 Mar 7 2023 /bin -> usr/bin
drwxr-xr-x 3 root root 4096 Apr 11 11:30 /boot
...etc...
As for run times, if anyone's interested: despite the server I ran this
on having multiple locally mounted filesystems and one NFS mount, the
initial tests took 90 seconds to generate the first file and 5 minutes
to generate the second. That means (as long as no faults are found) it
would be no problem to run at least the first script whenever required.
Other than that, I'd probably also schedule both to run each night.
That's the idea. As I say, comments, advice and criticisms on the idea
or on the coding would be appreciated!
On Fri, 12 Apr 2024 13:39:34 +0100,
export LC_ALL=C
sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1
I know just enough linux admin to be dangerous so this is probably
a dumb question, but I'm wondering why use find rather than ls?
On Fri, 12 Apr 2024 13:39:34 +0100, James Harris
As for run times, if anyone's interested, despite the server I ran
this on having multiple locally mounted filesystems and one NFS the
initial tests ran in 90 seconds to generate the first file and 5
minutes to generate the second, which would mean (as long as no
faults are found) that it would be no problem to run at least the
first script whenever required. Other than that, I'd probably also
schedule both to run each night.
Since there is a significant difference in run times, you might want
to try running your first find(1) with the -ls option, instead of using
the pipeline to ls(1). (You could also possibly do it all with one
find(1) command, and use cut(1), awk(1) or perl(1) to split things
up, but my brain isn't fully booted yet this morning to figure that
out. ;) )
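Ted's single-pass suggestion would look like this; demonstrated here on a throwaway tree (for a real run, substitute "/" and the -prune list from the original script):

```shell
# find -ls prints metadata (inode, blocks, perms, owner, size, name)
# during the same traversal, avoiding the second pass through ls(1).
tmp=$(mktemp -d)
touch "$tmp/a"
find "$tmp" -ls
```

Note that -ls output resembles "ls -dils" rather than "ls -ld", so the second report's layout would change slightly.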
Ted Heise <theise@panix.com> wrote:
On Fri, 12 Apr 2024 13:39:34 +0100,
export LC_ALL=C
sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1
I know just enough linux admin to be dangerous so this is
probably a dumb question, but I'm wondering why use find
rather than ls?
Because 'find' is intended to be a filesystem traversal tool,
and it includes the ability to exclude parts of the tree (the
-path -prune invocations above exclude looking in those
sub-trees).
'ls' is meant to display directories to humans, and it has only
a rudimentary ability to walk the tree, and no ability to
exclude parts of the tree you don't want to see.
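Rich's point about exclusion can be seen on a small throwaway tree, using the same -path/-prune pattern as the original script:

```shell
# Build a tiny tree with one subtree to exclude.
tmp=$(mktemp -d)
mkdir -p "$tmp/keep" "$tmp/skip"
touch "$tmp/keep/a" "$tmp/skip/b"

# As in the original script, "skip/*" is pruned but "skip" itself is
# still listed; ls(1) has no equivalent switch.
find "$tmp" -path "$tmp/skip/*" -prune -o -print | sort
```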
grep supports null-terminated lines via the "--null-data" option, so you
could grep the null-terminated files without translating them.
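That looks like this in practice; a small sketch with a NUL-terminated list containing one awkward name (the paths are invented):

```shell
# A NUL-terminated list, including a name with an embedded newline.
printf '%s\0' /etc/passwd $'/odd\nname' /etc/hosts > /tmp/nul-list

# --null-data (-z) treats NUL as the record separator on both input
# and output, so the awkward name cannot confuse the match.
grep --null-data 'hosts' /tmp/nul-list | tr '\0' '\n'
```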
One thing, find has a "printf" option, where you can format the output.
you can remove the need for "tr" by using this instead of "-print0".
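For example, GNU find's -printf can emit either terminator directly, though note (per the -print0 discussion elsewhere in the thread) that '%p\n' reintroduces the newline ambiguity:

```shell
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b"

# '%p\0' is equivalent to -print0; '%p\n' skips the tr step entirely.
# (-printf is GNU find only; it is not in POSIX find.)
find "$tmp" -printf '%p\n' | sort
```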
If you are going to output null terminated filenames (-print0) then
don't almost immediately throw out the nulls by converting them to
newlines.
... filenames can contain pretty much any character, including newlines.
Good point, though I think newlines in a filename are an abomination.
On Fri, 12 Apr 2024 15:29:02 -0000 (UTC), Borax Man wrote:
One thing, find has a "printf" option, where you can format the output.
you can remove the need for "tr" by using this instead of "-print0".
-print0 can handle arbitrary filenames. Even the find(1) man page, in the section “UNUSUAL FILENAMES”, says
If you are able to decide what format to use for the output of
find then it is normally better to use `\0' as a terminator than
to use newline, as file names can contain white space and newline
characters.
And that is what -print0 does.
... I think newlines in a filename are an abomination.
In my file/directory names, if I feel the urge to have a “/” in the name, I use a “∕” instead.
Lawrence D'Oliveiro wrote:
In my file/directory names, if I feel the urge to have a “/” in the
name, I use a “∕” instead.
Similarly for filenames of BSI documents including amendment numbers, I
will use a U+A789 character instead of a colon ...
On Fri, 12 Apr 2024 13:39:34 +0100, James Harris wrote:
You'll see I made some choices such as to omit files from /proc but not
from /dev, for example, to record any lost+found contents, to record
mounted filesystems, to show just one level of /tmp, etc.
If you want a list of mounted filesystems, you can either read from /proc/mounts or use the “findmnt” command (Linux-specific).
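A sketch of the /proc/mounts route; the second whitespace-separated field of each line is the mount point:

```shell
# Each /proc/mounts line is: device mountpoint fstype options dump pass.
# Printing field 2 gives the list of mount points (Linux-specific).
awk '{ print $2 }' /proc/mounts
```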
On 13/04/2024 01:05, Lawrence D'Oliveiro wrote:
If you want a list of mounted filesystems, you can either read from
/proc/mounts or use the “findmnt” command (Linux-specific).
What's wrong with 'mount'?
On Sat, 13 Apr 2024 10:41:40 +1000, Borax Man wrote:
... I think newlines in a filename are an abomination.
I avoid them myself. But since they are permitted, if you want your code
to be fully general, you have to deal with them.
On Sat, 13 Apr 2024 08:26:36 +0100, Andy Burns wrote:
Lawrence D'Oliveiro wrote:
In my file/directory names, if I feel the urge to have a “/” in the
name, I use a “∕” instead.
Similarly for filenames of BSI documents including amendment numbers, I
will use a U+A789 character instead of a colon ...
I use U+2236 RATIO. In the “Hack” font I like to use, it looks more like U+003A COLON than U+A789 MODIFIER LETTER COLON does.
I do this in Linux, because some commands (e.g. network-related ones) interpret U+003A COLON as a host-name prefix.
James Harris <james.harris.1@gmail.com> wrote:
export LC_ALL=C
sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1
If you are going to output null terminated filenames (-print0) then
don't almost immediately throw out the nulls by converting them to
newlines. The purpose of -print0 and the nulls is to avoid *any*
problems with any filename character (i.e., a filename /can/ contain a newline). If, by any chance, you have even one filename with a
newline in the name, converting the nulls to newlines for storage will
break the storage file (i.e., you can't differentiate the "newlines
ending filenames" from the "newlines that belong inside a filename").
Convert the nulls to newlines only when you want to view with less;
then your "files of records" are not corrupt from the start:
tr '\0' '\n' < /tmp/f1 | less     # or
< /tmp/f1 tr '\0' '\n' | less     # if you prefer the input file on the left
However, if you want 'data' on the files, you'd be better off using the
'stat' command, as it is intended to acquire meta-data about files
(and by far more meta-data than what ls shows).
You'd feed it filenames with xargs as well:
xargs -0 stat --format="%i %h %n" < /tmp/f1 > /tmp/f2
To output inode, number of links, and name (granted, you can get this
from ls as well, but look through the stat man page, there is a lot
more stuff you can pull out).
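A few more of stat's format directives, shown on a throwaway file (GNU coreutils stat; this particular selection of fields is just an example):

```shell
f=$(mktemp)
# %i inode, %h hard links, %a octal mode, %U owner, %s size, %n name.
stat --format='%i %h %a %U %s %n' "$f"
```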
The one tool that does not (yet) seem to consume "null terminated
lines" is diff, and you can work around that by converting to newlines
at the time you do the diff:
diff -u <(tr '\0' '\n' < /tmp/f1) <(tr '\0' '\n' < /tmp/f2)
And, note, all of these "convert nulls to newlines at time of need" can
be scripted, so you could have a "file-list-diff" script that contains:
#!/bin/bash
diff -u <(tr '\0' '\n' < "$1") <(tr '\0' '\n' < "$2")
And then you can diff two of your audit files by:
file-list-diff /tmp/f1 /tmp/f2
And not have to remember the tr invocation and process substitution
syntax to do the same at every call.
I am trying to do the zero termination just now but have run into a
problem. The above find command may report errors such as permission
failures and missing files. I really should include such info in the
output, but coming from stderr such lines are newline- rather than
NUL-terminated and therefore cannot be combined with just 2>&1.
To get around that I tried
find 2> >(tr '\n' '\0')
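That approach can be seen end-to-end with a stand-in command that writes to both streams (the real invocation would be the find pipeline above); process substitution is bash-specific:

```shell
# stdout records are already NUL-terminated; stderr lines pass through
# tr to become NUL-terminated too, so both merge into one record stream.
{ printf '%s\0' ok-line; echo 'error: example' >&2; } \
    2> >(tr '\n' '\0') | sort -z | tr '\0' '\n'
```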