For a number of reasons I am looking for a way of recording a
list of the files (and file-like objects) on a Unix system at
certain points in time. The main output would simply be sorted
text with one fully-qualified file name on each line.
What follows is my first attempt at it. I'd appreciate any
feedback on whether I am going about it the right way or
whether it could be improved either in concept or in coding.
There are two tiny scripts. In the examples below they write to
temporary files f1 and f2 to test the mechanism, but the idea is
that the reports would be stored in timestamped files so that
comparisons between one report and another could be made later.
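The timestamped-report idea could be sketched like this; the /var/tmp directory and the "filelist-" prefix are invented for illustration, not part of the scripts:

```shell
# Hypothetical naming scheme for the per-run reports; the directory
# and prefix here are made up and would be adjusted to taste.
stamp=$(date +%Y-%m-%dT%H%M%S)
report="/var/tmp/filelist-$stamp.txt"
echo "$report"
```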
The first, and primary, script generates nothing other than
names and is as follows.
export LC_ALL=C
sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1
You'll see I made some choices such as to omit files from /proc but not
from /dev, for example, to record any lost+found contents, to record
mounted filesystems, to show just one level of /tmp, etc.
I am not sure I coded the command correctly, although it seems to work
on test cases.
The output from that starts with lines such as
/
/bin
/boot
/boot/System.map-5.15.0-101-generic
/boot/System.map-5.15.0-102-generic
...etc...
Such a form would be ideal for input to grep and diff to look for
relevant files that have been added or removed between any two runs.
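Since the reports are already sorted, comm(1) can split additions from removals directly; a minimal sketch on two throwaway reports (the file names and contents are invented):

```shell
# Two stand-in reports; in real use these would be the timestamped files.
printf '%s\n' /bin /boot /etc        > /tmp/report-old
printf '%s\n' /bin /etc /etc/new.cfg > /tmp/report-new

# comm needs sorted input (which the reports are, by construction):
# -13 keeps lines only in the newer report (additions),
# -23 keeps lines only in the older report (removals).
comm -13 /tmp/report-old /tmp/report-new
comm -23 /tmp/report-old /tmp/report-new
```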
The second, and less important, part is to store (in a separate file)
info about each of the file names as that may be relevant in some cases.
That takes the first file as input and has the following form.
cat /tmp/f1 | \
    tr '\n' '\0' | \
    xargs -0 sudo ls -ld > /tmp/f2
The output from that is such as
drwxr-xr-x 23 root root 4096 Apr 13 2023 /
lrwxrwxrwx 1 root root 7 Mar 7 2023 /bin -> usr/bin
drwxr-xr-x 3 root root 4096 Apr 11 11:30 /boot
...etc...
As for run times, if anyone's interested: despite the server I ran this
on having multiple locally mounted filesystems and one NFS mount, the
initial tests took 90 seconds to generate the first file and 5 minutes
to generate the second. That means (as long as no faults are found) it
would be no problem to run at least the first script whenever required.
Other than that, I'd probably also schedule both to run each night.
That's the idea. As I say, comments, advice and criticisms on the idea
or on the coding would be appreciated!
On Fri, 12 Apr 2024 13:39:34 +0100,
export LC_ALL=C
sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1
I know just enough linux admin to be dangerous so this is probably
a dumb question, but I'm wondering why use find rather than ls?
On Fri, 12 Apr 2024 13:39:34 +0100, James Harris
As for run times, if anyone's interested, despite the server I ran
this on having multiple locally mounted filesystems and one NFS the
initial tests ran in 90 seconds to generate the first file and 5
minutes to generate the second, which would mean (as long as no
faults are found) that it would be no problem to run at least the
first script whenever required. Other than that, I'd probably also
schedule both to run each night.
Since there is a significant difference in run times, you might want
to try running your first find(1) with the -ls option, instead of using
the pipeline to ls(1). (You could also possibly do it all with one
find(1) command, and use cut(1), awk(1) or perl(1) to split things
up, but my brain isn't fully booted yet this morning to figure that
out. ;) )
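Ted's single-pass suggestion would look like this; demonstrated here on a throwaway tree (for a real run, substitute "/" and the -prune list from the original script):

```shell
# find -ls prints metadata (inode, blocks, perms, owner, size, name)
# during the same traversal, avoiding the second pass through ls(1).
tmp=$(mktemp -d)
touch "$tmp/a"
find "$tmp" -ls
```

Note that -ls output resembles "ls -dils" rather than "ls -ld", so the second report's layout would change slightly.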
Ted Heise <theise@panix.com> wrote:
On Fri, 12 Apr 2024 13:39:34 +0100,
export LC_ALL=C
sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1
I know just enough linux admin to be dangerous so this is
probably a dumb question, but I'm wondering why use find
rather than ls?
Because 'find' is intended to be a filesystem traversal tool,
and it includes the ability to exclude parts of the tree (the
-path -prune invocations above exclude looking in those
sub-trees).
'ls' is meant to display directories to humans, and it has only
a rudimentary ability to walk the tree, and no ability to
exclude parts of the tree you don't want to see.
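Rich's point about exclusion can be seen on a small throwaway tree, using the same -path/-prune pattern as the original script:

```shell
# Build a tiny tree with one subtree to exclude.
tmp=$(mktemp -d)
mkdir -p "$tmp/keep" "$tmp/skip"
touch "$tmp/keep/a" "$tmp/skip/b"

# As in the original script, "skip/*" is pruned but "skip" itself is
# still listed; ls(1) has no equivalent switch.
find "$tmp" -path "$tmp/skip/*" -prune -o -print | sort
```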
grep supports null-terminated lines via the "--null-data" option, so you
could grep the null-terminated files without translating them.
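That looks like this in practice; a small sketch with a NUL-terminated list containing one awkward name (the paths are invented):

```shell
# A NUL-terminated list, including a name with an embedded newline.
printf '%s\0' /etc/passwd $'/odd\nname' /etc/hosts > /tmp/nul-list

# --null-data (-z) treats NUL as the record separator on both input
# and output, so the awkward name cannot confuse the match.
grep --null-data 'hosts' /tmp/nul-list | tr '\0' '\n'
```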
One thing, find has a "printf" option, where you can format the output.
you can remove the need for "tr" by using this instead of "-print0".
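For example, GNU find's -printf can emit either terminator directly, though note (per the -print0 discussion elsewhere in the thread) that '%p\n' reintroduces the newline ambiguity:

```shell
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b"

# '%p\0' is equivalent to -print0; '%p\n' skips the tr step entirely.
# (-printf is GNU find only; it is not in POSIX find.)
find "$tmp" -printf '%p\n' | sort
```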
If you are going to output null terminated filenames (-print0) then
don't almost immediately throw out the nulls by converting them to
newlines.
... filenames can contain pretty much any character, including newlines.
Good point, though I think newlines in a filename are an abomination.
On Fri, 12 Apr 2024 15:29:02 -0000 (UTC), Borax Man wrote:
One thing, find has a "printf" option, where you can format the output.
you can remove the need for "tr" by using this instead of "-print0".
-print0 can handle arbitrary filenames. Even the find(1) man page, in the section “UNUSUAL FILENAMES”, says
If you are able to decide what format to use for the output of
find then it is normally better to use `\0' as a terminator than
to use newline, as file names can contain white space and newline
characters.
And that is what -print0 does.
... I think newlines in a filename are an abomination.
In my file/directory names, if I feel the urge to have a “/” in the name, I use a “∕” instead.
Lawrence D'Oliveiro wrote:
In my file/directory names, if I feel the urge to have a “/” in the
name, I use a “∕” instead.
Similarly for filenames of BSI documents including amendment numbers, I
will use a U+A789 character instead of a colon ...
On Fri, 12 Apr 2024 13:39:34 +0100, James Harris wrote:
You'll see I made some choices such as to omit files from /proc but not
from /dev, for example, to record any lost+found contents, to record
mounted filesystems, to show just one level of /tmp, etc.
If you want a list of mounted filesystems, you can either read from /proc/mounts or use the “findmnt” command (Linux-specific).
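A sketch of the /proc/mounts route; the second whitespace-separated field of each line is the mount point:

```shell
# Each /proc/mounts line is: device mountpoint fstype options dump pass.
# Printing field 2 gives the list of mount points (Linux-specific).
awk '{ print $2 }' /proc/mounts
```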
On 13/04/2024 01:05, Lawrence D'Oliveiro wrote:
If you want a list of mounted filesystems, you can either read from
/proc/mounts or use the “findmnt” command (Linux-specific).
What's wrong with 'mount'?
On Sat, 13 Apr 2024 10:41:40 +1000, Borax Man wrote:
... I think newlines in a filename are an abomination.
I avoid them myself. But since they are permitted, if you want your code
to be fully general, you have to deal with them.
On Sat, 13 Apr 2024 08:26:36 +0100, Andy Burns wrote:
Lawrence D'Oliveiro wrote:
In my file/directory names, if I feel the urge to have a “/” in the
name, I use a “∕” instead.
Similarly for filenames of BSI documents including amendment numbers, I
will use a U+A789 character instead of a colon ...
I use U+2236 RATIO. In the “Hack” font I like to use, it looks more like U+003A COLON than U+A789 MODIFIER LETTER COLON does.
I do this in Linux, because some commands (e.g. network-related ones) interpret U+003A COLON as a host-name prefix.
James Harris <james.harris.1@gmail.com> wrote:
export LC_ALL=C
sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1
If you are going to output null terminated filenames (-print0) then
don't almost immediately throw out the nulls by converting them to
newlines. The purpose of -print0 and the nulls is to avoid *any*
problems with any filename character (i.e., a filename /can/ contain a newline). If, by any chance, you have even one filename with a
newline in the name, converting the nulls to newlines for storage will
break the storage file (i.e., you can't differentiate the "newlines
ending filenames" from the "newlines that belong inside a filename").
Convert the nulls to newlines only when you want to view with less;
then your "files of records" are not corrupt from the start:
tr '\0' '\n' < /tmp/f1 | less     # or
< /tmp/f1 tr '\0' '\n' | less     # if you prefer the input file on the left
However, if you want 'data' on the files, you'd be better off using the
'stat' command, as it is intended to acquire meta-data about files
(and by far more meta-data than what ls shows).
You'd feed it filenames with xargs as well:
xargs -0 stat --format="%i %h %n" < /tmp/f1 > /tmp/f2
To output inode, number of links, and name (granted, you can get this
from ls as well, but look through the stat man page, there is a lot
more stuff you can pull out).
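A few more of stat's format directives, shown on a throwaway file (GNU coreutils stat; this particular selection of fields is just an example):

```shell
f=$(mktemp)
# %i inode, %h hard links, %a octal mode, %U owner, %s size, %n name.
stat --format='%i %h %a %U %s %n' "$f"
```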
The one tool that does not (yet) seem to consume "null terminated
lines" is diff, and you can work around that by converting to newlines
at the time you do the diff:
diff -u <(tr '\0' '\n' < /tmp/f1) <(tr '\0' '\n' < /tmp/f2)
And, note, all of these "convert nulls to newlines at time of need" can
be scripted, so you could have a "file-list-diff" script that contains:
#!/bin/bash
diff -u <(tr '\0' '\n' < "$1") <(tr '\0' '\n' < "$2")
And then you can diff two of your audit files by:
file-list-diff /tmp/f1 /tmp/f2
And not have to remember the tr invocation and process substitution
syntax to do the same at every call.
I am trying to do the zero termination just now but have run into a
problem. The above find command may report errors such as permission
failures and missing files. I really should include such info in the
output, but coming from stderr such lines are newline- rather than
NUL-terminated and therefore cannot be combined with just 2>&1.
To get around that I tried
find 2> >(tr '\n' '\0')
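That approach can be seen end-to-end with a stand-in command that writes to both streams (the real invocation would be the find pipeline above); process substitution is bash-specific:

```shell
# stdout records are already NUL-terminated; stderr lines pass through
# tr to become NUL-terminated too, so both merge into one record stream.
{ printf '%s\0' ok-line; echo 'error: example' >&2; } \
    2> >(tr '\n' '\0') | sort -z | tr '\0' '\n'
```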