• Files tree

    From James Harris@james.harris.1@gmail.com to comp.os.linux.misc on Fri Apr 12 13:39:34 2024
    From Newsgroup: comp.os.linux.misc

    For a number of reasons I am looking for a way of recording a list of
    the files (and file-like objects) on a Unix system at certain points in
    time. The main output would simply be sorted text with one
    fully-qualified file name on each line.

    What follows is my first attempt at it. I'd appreciate any feedback on
    whether I am going about it the right way or whether it could be
    improved either in concept or in coding.

    There are two tiny scripts. In the examples below they write to
    temporary files f1 and f2 to test the mechanism but the idea is that
    the reports would be stored in timestamped files so that comparisons
    between one report and another could be made later.

    The first, and primary, script generates nothing other than names and is
    as follows.

    export LC_ALL=C
    sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1

    You'll see I made some choices such as to omit files from /proc but not
    from /dev, for example, to record any lost+found contents, to record
    mounted filesystems, to show just one level of /tmp, etc.

    I am not sure I coded the command right albeit that it seems to work on
    test cases.

    The output from that starts with lines such as

    /
    /bin
    /boot
    /boot/System.map-5.15.0-101-generic
    /boot/System.map-5.15.0-102-generic
    ...etc...

    Such a form would be ideal for input to grep and diff to look for
    relevant files that have been added or removed between any two runs.
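
    Such a comparison between two snapshot files might be sketched like
    this (the file names and their contents here are invented for
    illustration, not taken from a real system):

```shell
# Two hypothetical snapshot files from different runs
printf '%s\n' /bin /boot /etc/passwd > /tmp/list_old
printf '%s\n' /bin /boot /etc/newconf /etc/passwd > /tmp/list_new

# Lines only in the newer snapshot = files added between the runs
diff /tmp/list_old /tmp/list_new | sed -n 's/^> //p'
# -> /etc/newconf
```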

    The second, and less important, part is to store (in a separate file)
    info about each of the file names as that may be relevant in some cases.
    That takes the first file as input and has the following form.

    cat /tmp/f1 |\
    tr '\n' '\0' |\
    xargs -0 sudo ls -ld > /tmp/f2

    The output from that is such as

    drwxr-xr-x 23 root root 4096 Apr 13 2023 /
    lrwxrwxrwx 1 root root 7 Mar 7 2023 /bin -> usr/bin
    drwxr-xr-x 3 root root 4096 Apr 11 11:30 /boot
    ...etc...

    As for run times, if anyone's interested, despite the server I ran this
    on having multiple locally mounted filesystems and one NFS the initial
    tests ran in 90 seconds to generate the first file and 5 minutes to
    generate the second, which would mean (as long as no faults are found)
    that it would be no problem to run at least the first script whenever
    required. Other than that, I'd probably also schedule both to run each
    night.

    That's the idea. As I say, comments, advice and criticisms on the idea
    or on the coding would be appreciated!
    --
    James Harris

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Ted Heise@theise@panix.com to comp.os.linux.misc on Fri Apr 12 13:07:20 2024
    From Newsgroup: comp.os.linux.misc

    On Fri, 12 Apr 2024 13:39:34 +0100,
    James Harris <james.harris.1@gmail.com> wrote:
    For a number of reasons I am looking for a way of recording a
    list of the files (and file-like objects) on a Unix system at
    certain points in time. The main output would simply be sorted
    text with one fully-qualified file name on each line.

    What follows is my first attempt at it. I'd appreciate any
    feedback on whether I am going about it the right way or
    whether it could be improved either in concept or in coding.

    There are two tiny scripts. In the examples below they write to
    temporary files f1 and f2 to test the mechanism but the idea is
    that the reports would be stored in timestamped files so that
    comparisons between one report and another could be made later.

    The first, and primary, script generates nothing other than
    names and is as follows.

    export LC_ALL=C
    sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1

    I know just enough linux admin to be dangerous so this is probably
    a dumb question, but I'm wondering why use find rather than ls?
    --
    Ted Heise <theise@panix.com> West Lafayette, IN, USA
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From vallor@vallor@cultnix.org to comp.os.linux.misc on Fri Apr 12 14:08:02 2024
    From Newsgroup: comp.os.linux.misc

    On Fri, 12 Apr 2024 13:39:34 +0100, James Harris
    <james.harris.1@gmail.com> wrote in <uvba27$2c40q$1@dont-email.me>:

    For a number of reasons I am looking for a way of recording a list of
    the files (and file-like objects) on a Unix system at certain points in
    time. The main output would simply be sorted text with one
    fully-qualified file name on each line.

    What follows is my first attempt at it. I'd appreciate any feedback on
    whether I am going about it the right way or whether it could be
    improved either in concept or in coding.

    There are two tiny scripts. In the examples below they write to
    temporary files f1 and f2 to test the mechanism but the idea is that
    the reports would be stored in timestamped files so that comparisons
    between one report and another could be made later.

    The first, and primary, script generates nothing other than names and is
    as follows.

    export LC_ALL=C
    sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1

    Filenames with newlines will contain your record delimiter, which
    will be passed through without modification. You might want
    to rethink this.


    You'll see I made some choices such as to omit files from /proc but not
    from /dev, for example, to record any lost+found contents, to record
    mounted filesystems, to show just one level of /tmp, etc.

    I am not sure I coded the command right albeit that it seems to work on
    test cases.

    The output from that starts with lines such as

    /
    /bin
    /boot
    /boot/System.map-5.15.0-101-generic
    /boot/System.map-5.15.0-102-generic
    ...etc...

    Such a form would be ideal for input to grep and diff to look for
    relevant files that have been added or removed between any two runs.

    The second, and less important, part is to store (in a separate file)
    info about each of the file names as that may be relevant in some cases.
    That takes the first file as input and has the following form.

    cat /tmp/f1 |\
    tr '\n' '\0' |\
    xargs -0 sudo ls -ld > /tmp/f2

    The output from that is such as

    drwxr-xr-x 23 root root 4096 Apr 13 2023 /
    lrwxrwxrwx 1 root root 7 Mar 7 2023 /bin -> usr/bin
    drwxr-xr-x 3 root root 4096 Apr 11 11:30 /boot
    ...etc...

    As for run times, if anyone's interested, despite the server I ran this
    on having multiple locally mounted filesystems and one NFS the initial
    tests ran in 90 seconds to generate the first file and 5 minutes to
    generate the second, which would mean (as long as no faults are found)
    that it would be no problem to run at least the first script whenever
    required. Other than that, I'd probably also schedule both to run each
    night.

    Since there is a significant difference in run times, you might want
    to try running your first find(1) with the -ls option, instead of using
    the pipeline to ls(1). (You could also possibly do it all with one
    find(1) command, and use cut(1), awk(1) or perl(1) to split things
    up, but my brain isn't fully booted yet this morning to figure that
    out. ;) )


    That's the idea. As I say, comments, advice and criticisms on the idea
    or on the coding would be appreciated!

    A commendable first effort! Just be careful -- filenames can contain
    pretty much any character, including newlines.

    BTW, there is a newsgroup "comp.unix.shell" that is alive and
    active, if you were inclined to broaden your audience.
    --
    -v
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rich@rich@example.invalid to comp.os.linux.misc on Fri Apr 12 14:13:10 2024
    From Newsgroup: comp.os.linux.misc

    James Harris <james.harris.1@gmail.com> wrote:
    export LC_ALL=C
    sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1

    If you are going to output null terminated filenames (-print0) then
    don't almost immediately throw out the nulls by converting them to
    newlines. The purpose of -print0 and the nulls is to avoid *any*
    problems with any filename character (i.e., a filename /can/ contain
    a newline). If, by any chance, you have even one filename with a
    newline in the name, converting the nulls to newlines for storage will
    break the storage file (i.e., you can't differentiate the "newlines
    ending filenames" from the "newlines that belong inside a filename").

    Convert the nulls to newlines only when you want to view with less;
    then your "files of records" are not corrupt from the start:

    tr '\0' '\n' < /tmp/f1 | less ; or

    < /tmp/f1 tr '\0' '\n' | less ; if you prefer the input file on the
    left

    You'll see I made some choices such as to omit files from /proc but not
    from /dev, for example, to record any lost+found contents, to record
    mounted filesystems, to show just one level of /tmp, etc.

    You get to decide which 'files' are important to track for whatever it
    is you are wanting to do.

    I am not sure I coded the command right albeit that it seems to work on
    test cases.

    It will work, until that day you end up with a file name containing a
    newline, then it will break for that filename. Now, if you never
    encounter a filename with a newline it won't break. But the change to
    make sure it never breaks is trivial.

    The output from that starts with lines such as

    /
    /bin
    /boot
    /boot/System.map-5.15.0-101-generic
    /boot/System.map-5.15.0-102-generic
    ...etc...

    Such a form would be ideal for input to grep and diff to look for
    relevant files that have been added or removed between any two runs.

    grep supports "null terminated lines" via its "--null-data" option, so
    you could 'grep' null-terminated list files without translating them.
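
    A minimal sketch of that, with invented names in a NUL-terminated
    list:

```shell
# Build a small NUL-terminated list (contents are illustrative)
printf '%s\0' /etc/passwd /boot/vmlinuz /etc/hosts > /tmp/nul_list

# -z / --null-data makes grep treat NUL as the record terminator, so
# the stored nulls never need translating; convert to newlines only
# for display
grep -z '^/etc/' /tmp/nul_list | tr '\0' '\n'
# -> /etc/passwd
#    /etc/hosts
```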

    The second, and less important, part is to store (in a separate file)
    info about each of the file names as that may be relevant in some
    cases. That takes the first file as input and has the following form.

    cat /tmp/f1 |\
    tr '\n' '\0' |\
    xargs -0 sudo ls -ld > /tmp/f2

    Here's where you will /break/ things if you ever encounter a file with
    a newline in its filename, converting the newlines back to nulls
    results in that filename with an embedded newline becoming "two lines"
    (two files) -- and neither will be found.

    If you kept the nulls in f1, your above can be:

    xargs -0 ls -ld < /tmp/f1 > /tmp/f2

    However, if you want 'data' on the files, you'd be better off using
    the 'stat' command, as it is intended to 'acquire' meta-data about
    files (and by far more meta-data than what ls shows).

    You'd feed it filenames with xargs as well:

    xargs -0 stat --format="%i %h %n" < /tmp/f1 > /tmp/f2

    To output inode, number of links, and name (granted, you can get this
    from ls as well, but look through the stat man page, there is a lot
    more stuff you can pull out).
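
    For instance, a runnable sketch along these lines (the demo file and
    the chosen directives are invented for illustration; see the stat man
    page for the full directive list):

```shell
# Create an empty demo file so the command has something to describe
: > /tmp/stat_demo

# %a octal permissions, %U owner, %s size in bytes, %n file name
stat --format='%a %U %s %n' /tmp/stat_demo
```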

    The one tool that does not (yet) seem to consume "null terminated
    lines" is diff, and you can work around that by converting to newlines
    at the time you do the diff:

    diff -u <(tr '\0' '\n' < /tmp/f1) <(tr '\0' '\n' < /tmp/f2)

    And, note, all of these "convert nulls to newlines at time of need" can
    be scripted, so you could have a "file-list-diff" script that contains:

    #!/bin/bash
    diff -u <(tr '\0' '\n' < "$1") <(tr '\0' '\n' < "$2")

    And then you can diff two of your audit files by:

    file-list-diff /tmp/f1 /tmp/f2

    And not have to remember the tr invocation and process substitution
    syntax to do the same at every call.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rich@rich@example.invalid to comp.os.linux.misc on Fri Apr 12 14:15:49 2024
    From Newsgroup: comp.os.linux.misc

    Ted Heise <theise@panix.com> wrote:
    On Fri, 12 Apr 2024 13:39:34 +0100,
    export LC_ALL=C
    sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1

    I know just enough linux admin to be dangerous so this is probably
    a dumb question, but I'm wondering why use find rather than ls?

    Because 'find' is intended to be a filesystem traversal tool, and it
    includes the ability to exclude parts of the tree (the -path -prune
    invocations above exclude looking in those sub-trees).

    'ls' is meant to display directories to humans, and it has only a
    rudimentary ability to walk the tree, and no ability to exclude parts
    of the tree you don't want to see.
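
    The -path/-prune exclusion can be seen in miniature (the directory
    names are invented for the demo):

```shell
# Toy tree: one subtree to keep, one to exclude
mkdir -p /tmp/prune_demo/keep /tmp/prune_demo/skip
touch /tmp/prune_demo/keep/a /tmp/prune_demo/skip/b

# -prune stops find descending into the matched directory; because
# -prune evaluates true, -o short-circuits and -print is skipped for it
find /tmp/prune_demo -path '/tmp/prune_demo/skip' -prune -o -print | sort
# -> /tmp/prune_demo
#    /tmp/prune_demo/keep
#    /tmp/prune_demo/keep/a
```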
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rich@rich@example.invalid to comp.os.linux.misc on Fri Apr 12 14:19:01 2024
    From Newsgroup: comp.os.linux.misc

    vallor <vallor@cultnix.org> wrote:
    On Fri, 12 Apr 2024 13:39:34 +0100, James Harris
    As for run times, if anyone's interested, despite the server I ran
    this on having multiple locally mounted filesystems and one NFS the
    initial tests ran in 90 seconds to generate the first file and 5
    minutes to generate the second, which would mean (as long as no
    faults are found) that it would be no problem to run at least the
    first script whenever required. Other than that, I'd probably also
    schedule both to run each night.

    Since there is a significant difference in run times, you might want
    to try running your first find(1) with the -ls option, instead of using
    the pipeline to ls(1). (You could also possibly do it all with one
    find(1) command, and use cut(1), awk(1) or perl(1) to split things
    up, but my brain isn't fully booted yet this morning to figure that
    out. ;) )

    The difference in runtime is not because of the pipeline.

    The runtime difference is that the "find" pass only has to retrieve
    the filenames, while the second "ls" pass also has to retrieve all
    the metadata (i.e., read the inodes).

    Retrieving only filenames is less disk IO, and far fewer seeks (if
    on a spinning rust disk), than retrieving all the file metadata.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Ted Heise@theise@panix.com to comp.os.linux.misc on Fri Apr 12 14:25:15 2024
    From Newsgroup: comp.os.linux.misc

    On Fri, 12 Apr 2024 14:15:49 -0000 (UTC),
    Rich <rich@example.invalid> wrote:
    Ted Heise <theise@panix.com> wrote:
    On Fri, 12 Apr 2024 13:39:34 +0100,
    export LC_ALL=C
    sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1

    I know just enough linux admin to be dangerous so this is
    probably a dumb question, but I'm wondering why use find
    rather than ls?

    Because 'find' is intended to be a filesystem traversal tool,
    and it includes the ability to exclude parts of the tree (the
    -path -prune invocations above exclude looking in those
    sub-trees).

    'ls' is meant to display directories to humans, and it has only
    a rudimentary ability to walk the tree, and no ability to
    exclude parts of the tree you don't want to see.

    Okay, that makes sense. Thanks for helping educate me!
    --
    Ted Heise <theise@panix.com> West Lafayette, IN, USA
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Borax Man@rotflol2@hotmail.com to comp.os.linux.misc on Fri Apr 12 15:29:02 2024
    From Newsgroup: comp.os.linux.misc

    On 2024-04-12, James Harris <james.harris.1@gmail.com> wrote:
    For a number of reasons I am looking for a way of recording a list of
    the files (and file-like objects) on a Unix system at certain points
    in time. The main output would simply be sorted text with one
    fully-qualified file name on each line.

    What follows is my first attempt at it. I'd appreciate any feedback on
    whether I am going about it the right way or whether it could be
    improved either in concept or in coding.

    There are two tiny scripts. In the examples below they write to
    temporary files f1 and f2 to test the mechanism but the idea is that
    the reports would be stored in timestamped files so that comparisons
    between one report and another could be made later.

    The first, and primary, script generates nothing other than names and is
    as follows.

    export LC_ALL=C
    sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1

    You'll see I made some choices such as to omit files from /proc but not
    from /dev, for example, to record any lost+found contents, to record
    mounted filesystems, to show just one level of /tmp, etc.

    I am not sure I coded the command right albeit that it seems to work on
    test cases.

    The output from that starts with lines such as

    /
    /bin
    /boot
    /boot/System.map-5.15.0-101-generic
    /boot/System.map-5.15.0-102-generic
    ...etc...

    Such a form would be ideal for input to grep and diff to look for
    relevant files that have been added or removed between any two runs.

    The second, and less important, part is to store (in a separate file)
    info about each of the file names as that may be relevant in some
    cases. That takes the first file as input and has the following form.

    cat /tmp/f1 |\
    tr '\n' '\0' |\
    xargs -0 sudo ls -ld > /tmp/f2

    The output from that is such as

    drwxr-xr-x 23 root root 4096 Apr 13 2023 /
    lrwxrwxrwx 1 root root 7 Mar 7 2023 /bin -> usr/bin
    drwxr-xr-x 3 root root 4096 Apr 11 11:30 /boot
    ...etc...

    As for run times, if anyone's interested, despite the server I ran this
    on having multiple locally mounted filesystems and one NFS the initial
    tests ran in 90 seconds to generate the first file and 5 minutes to
    generate the second, which would mean (as long as no faults are found)
    that it would be no problem to run at least the first script whenever
    required. Other than that, I'd probably also schedule both to run each
    night.

    That's the idea. As I say, comments, advice and criticisms on the idea
    or on the coding would be appreciated!


    One thing: find has a "-printf" option with which you can format the
    output. You can remove the need for "tr" by using this instead of
    "-print0".

    -printf "%P\n"

    That will also remove the leading slash, which I think is a good idea
    in this case. Use the lower-case %p to keep the starting point of the
    file and have the leading path.
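
    A small illustration of the difference (the demo paths are invented;
    -mindepth 1 is added so the starting directory itself, whose %P is
    the empty string, is skipped):

```shell
# Toy tree for the demo
mkdir -p /tmp/pdemo/sub
touch /tmp/pdemo/sub/file

# %P strips the starting point from each printed name
find /tmp/pdemo -mindepth 1 -printf '%P\n' | sort
# -> sub
#    sub/file

# %p keeps the full leading path
find /tmp/pdemo -mindepth 1 -printf '%p\n' | sort
# -> /tmp/pdemo/sub
#    /tmp/pdemo/sub/file
```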

    If you are wanting to validate a directory tree, that is, see if it
    has changed, I would recommend using mtree. It's available in debian
    under the mtree-bsd package.

    Mtree can output a list of files, plus other attributes to a spec
    file, and can tell you later, according to the spec file, what changes
    have been made. The problem with your "find" method is you can't
    tell if a file has simply been modified.

    Using mtree, you can do two things. One generate a specification
    file, which is really a list of files plus selected attributes at any
    point in time AND, see what changes have been made. As a bonus, you
    can get it to output the spec in a simple format, using the "-C"
    option, and you get output very similar to "find" with a little extra
    info tacked on, which you could remove using a pipe.

    You could output to a spec file which has the date in the filename,
    then run mtree against any previous spec file to see what has changed
    between that spec and the current state.

    If you just want the list of files, find works fine, with the
    suggestion I made about printing the filename, but have a look at
    mtree because I think it will save you a bit of coding.

    That's the thing with Linux, or computing in general: it's likely
    that what you thought of has already been done, and there is a tool
    which does it, or one easily adapted to do it.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From vallor@vallor@cultnix.org to comp.os.linux.misc on Fri Apr 12 15:38:40 2024
    From Newsgroup: comp.os.linux.misc

    On Fri, 12 Apr 2024 14:13:10 -0000 (UTC), Rich <rich@example.invalid>
    wrote in <uvbfhm$2d8f0$1@dont-email.me>:

    grep supports "null terminated lines" via its "--null-data" option, so
    you could 'grep' null-terminated list files without translating them.

    I didn't know about that one -- thank you for posting about it!
    --
    -v
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David W. Hodgins@dwhodgins@nomail.afraid.org to comp.os.linux.misc on Fri Apr 12 12:25:54 2024
    From Newsgroup: comp.os.linux.misc

    On Fri, 12 Apr 2024 08:39:34 -0400, James Harris <james.harris.1@gmail.com> wrote:

    For a number of reasons I am looking for a way of recording a list of
    the files (and file-like objects) on a Unix system at certain points in
    time. The main output would simply be sorted text with one
    fully-qualified file name on each line.
    <snip>

    Use a command that's designed for that purpose. https://oldmanprogrammer.net/source.php?dir=projects/tree

    Using "tree --noreport -ifax /" lists all files on the root file system, excluding other file systems such as /dev.
    Repeat the command for each file system you want to include.
    For a file with control characters in the name, it uses octal escape codes.
    For example, in a directory called test with a file called "some$'\n'file",
    it shows ...
    $ tree --noreport -ifax /home/dave/test
    /home/dave/test
    /home/dave/test/some\012file

    Check your distro to see if it already has the tree package available.
    Mageia has it available, but not installed unless specifically requested.

    Regards, Dave Hodgins
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.os.linux.misc on Sat Apr 13 00:05:29 2024
    From Newsgroup: comp.os.linux.misc

    On Fri, 12 Apr 2024 13:39:34 +0100, James Harris wrote:

    You'll see I made some choices such as to omit files from /proc but not
    from /dev, for example, to record any lost+found contents, to record
    mounted filesystems, to show just one level of /tmp, etc.

    If you want a list of mounted filesystems, you can either read from /proc/mounts or use the “findmnt” command (Linux-specific).
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.os.linux.misc on Sat Apr 13 00:23:31 2024
    From Newsgroup: comp.os.linux.misc

    On Fri, 12 Apr 2024 15:29:02 -0000 (UTC), Borax Man wrote:

    One thing, find has a "printf" option, where you can format the output.
    you can remove the need for "tr" by using this instead of "-print0".

    -print0 can handle arbitrary filenames. Even the find(1) man page, in the section “UNUSUAL FILENAMES”, says

    If you are able to decide what format to use for the output of
    find then it is normally better to use `\0' as a terminator than
    to use newline, as file names can contain white space and newline
    characters.

    And that is what -print0 does.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.os.linux.misc on Sat Apr 13 00:25:19 2024
    From Newsgroup: comp.os.linux.misc

    On Fri, 12 Apr 2024 14:13:10 -0000 (UTC), Rich wrote:

    If you are going to output null terminated filenames (-print0) then
    don't almost immediately throw out the nulls by converting them to
    newlines.

    And also note that Bash has a technique to read those null-terminated pathnames, stopping at the null without being confused by any newlines
    prior to that.

    You can then use “printf %q” to output your report with those funny characters properly escaped.
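
    In Bash that might look like the following sketch (the list file and
    its contents are made up; note that read -d '' and printf %q are
    bashisms):

```shell
#!/bin/bash
# A NUL-terminated list such as /tmp/f1 might hold, including one
# deliberately awkward name with an embedded newline
printf '%s\0' '/etc/hosts' $'/tmp/bad\nname' > /tmp/nul_f1

# read -r -d '' consumes up to each NUL, so the embedded newline
# survives intact; printf %q then escapes it for a safe
# one-line-per-file report
while IFS= read -r -d '' name; do
    printf '%q\n' "$name"
done < /tmp/nul_f1
```

    The first record prints unchanged as /etc/hosts; the awkward one
    comes out as a single line with the newline escaped in $'...' form.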
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.os.linux.misc on Sat Apr 13 00:28:21 2024
    From Newsgroup: comp.os.linux.misc

    On Fri, 12 Apr 2024 14:08:02 -0000 (UTC), vallor wrote:

    ... filenames can contain pretty much any character, including newlines.

    Pathnames can contain any character except null.

    Components of pathnames (file names, directory names) can contain any character except “/” (the path separator) and null.

    So “/” in a pathname is used to split it into components: zero or more directory names, ending in a directory or file name.

    In my file/directory names, if I feel the urge to have a “/” in the name, I use a “∕” instead.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Borax Man@rotflol2@hotmail.com to comp.os.linux.misc on Sat Apr 13 10:41:40 2024
    From Newsgroup: comp.os.linux.misc

    On Sat, 13 Apr 2024 00:23:31 -0000 (UTC)
    Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
    On Fri, 12 Apr 2024 15:29:02 -0000 (UTC), Borax Man wrote:

    One thing: find has a "-printf" option with which you can format the
    output. You can remove the need for "tr" by using this instead of
    "-print0".

    -print0 can handle arbitrary filenames. Even the find(1) man page, in the section “UNUSUAL FILENAMES”, says

    If you are able to decide what format to use for the output of
    find then it is normally better to use `\0' as a terminator than
    to use newline, as file names can contain white space and newline
    characters.

    And that is what -print0 does.

    Good point, though I think newlines in a filename are an abomination.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.os.linux.misc on Sat Apr 13 03:14:55 2024
    From Newsgroup: comp.os.linux.misc

    On Sat, 13 Apr 2024 10:41:40 +1000, Borax Man wrote:

    ... I think newlines in a filename are an abomination.

    I avoid them myself. But since they are permitted, if you want your code
    to be fully general, you have to deal with them.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Andy Burns@usenet@andyburns.uk to comp.os.linux.misc on Sat Apr 13 08:26:36 2024
    From Newsgroup: comp.os.linux.misc

    Lawrence D'Oliveiro wrote:

    In my file/directory names, if I feel the urge to have a “/” in the name, I use a “∕” instead.

    Similarly for filenames of BSI documents including amendment numbers, I
    will use a U+A789 character instead of a colon, to keep Windows happy.

    e.g. BS 7671:2018+A2:2022.pdf
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.os.linux.misc on Sat Apr 13 07:46:59 2024
    From Newsgroup: comp.os.linux.misc

    On Sat, 13 Apr 2024 08:26:36 +0100, Andy Burns wrote:

    Lawrence D'Oliveiro wrote:

    In my file/directory names, if I feel the urge to have a “/” in the
    name, I use a “∕” instead.

    Similarly for filenames of BSI documents including amendment numbers, I
    will use a U+A789 character instead of a colon ...

    I use U+2236 RATIO. In the “Hack” font I like to use, it looks more like U+003A COLON than U+A789 MODIFIER LETTER COLON does.

    I do this in Linux, because some commands (e.g. network-related ones) interpret U+003A COLON as a host-name prefix.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From The Natural Philosopher@tnp@invalid.invalid to comp.os.linux.misc on Sat Apr 13 08:57:27 2024
    From Newsgroup: comp.os.linux.misc

    On 13/04/2024 01:05, Lawrence D'Oliveiro wrote:
    On Fri, 12 Apr 2024 13:39:34 +0100, James Harris wrote:

    You'll see I made some choices such as to omit files from /proc but not
    from /dev, for example, to record any lost+found contents, to record
    mounted filesystems, to show just one level of /tmp, etc.

    If you want a list of mounted filesystems, you can either read from /proc/mounts or use the “findmnt” command (Linux-specific).

    What's wrong with 'mount'?
    --
    “Politics is the art of looking for trouble, finding it everywhere, diagnosing it incorrectly and applying the wrong remedies.”
    ― Groucho Marx

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.os.linux.misc on Sat Apr 13 22:01:50 2024
    From Newsgroup: comp.os.linux.misc

    On Sat, 13 Apr 2024 08:57:27 +0100, The Natural Philosopher wrote:

    On 13/04/2024 01:05, Lawrence D'Oliveiro wrote:

    If you want a list of mounted filesystems, you can either read from
    /proc/mounts or use the “findmnt” command (Linux-specific).

    What's wrong with 'mount'?

    The man page explains.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From candycanearter07@candycanearter07@candycanearter07.nomail.afraid to comp.os.linux.misc on Mon Apr 15 15:40:08 2024
    From Newsgroup: comp.os.linux.misc

    Lawrence D'Oliveiro <ldo@nz.invalid> wrote at 03:14 this Saturday (GMT):
    On Sat, 13 Apr 2024 10:41:40 +1000, Borax Man wrote:

    ... I think newlines in a filename are an abomination.

    I avoid them myself. But since they are permitted, if you want your code
    to be fully general, you have to deal with them.


    Unfortunately.
    --
    user <candycane> is generated from /dev/urandom
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From candycanearter07@candycanearter07@candycanearter07.nomail.afraid to comp.os.linux.misc on Mon Apr 15 15:40:09 2024
    From Newsgroup: comp.os.linux.misc

    Lawrence D'Oliveiro <ldo@nz.invalid> wrote at 07:46 this Saturday (GMT):
    On Sat, 13 Apr 2024 08:26:36 +0100, Andy Burns wrote:

    Lawrence D'Oliveiro wrote:

    In my file/directory names, if I feel the urge to have a “/” in the
    name, I use a “∕” instead.

    Similarly for filenames of BSI documents including amendment numbers, I
    will use a U+A789 character instead of a colon ...

    I use U+2236 RATIO. In the “Hack” font I like to use, it looks more like U+003A COLON than U+A789 MODIFIER LETTER COLON does.

    I do this in Linux, because some commands (e.g. network-related ones) interpret U+003A COLON as a host-name prefix.


    That seems very inconvenient.
    --
    user <candycane> is generated from /dev/urandom
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From James Harris@james.harris.1@gmail.com to comp.os.linux.misc on Sat Apr 20 16:23:27 2024
    From Newsgroup: comp.os.linux.misc

    On 12/04/2024 15:13, Rich wrote:
    James Harris <james.harris.1@gmail.com> wrote:
    export LC_ALL=C
    sudo find / \
    -path "/proc/*" -prune -o \
    -path "/run/*" -prune -o \
    -path "/sys/*" -prune -o \
    -path "/tmp/*/*" -prune -o \
    -print0 | sort -z | tr '\0' '\n' > /tmp/f1

    If you are going to output null terminated filenames (-print0) then
    don't almost immediately throw out the nulls by converting them to
    newlines. The purpose of -print0 and the nulls is to avoid *any*
    problems with any filename character (i.e., a filename /can/ contain
    a newline). If, by any chance, you have even one filename with a
    newline in the name, converting the nulls to newlines for storage will
    break the storage file (i.e., you can't differentiate the "newlines
    ending filenames" from the "newlines that belong inside a filename").

    Convert the nulls to newlines only when you want to view with less;
    then your "files of records" are not corrupt from the start:

    tr '\0' '\n' < /tmp/f1 | less ; or

    < /tmp/f1 tr '\0' '\n' | less ; if you prefer the input file on the
    left

    I am trying to do the zero termination just now but have run into a
    problem. The above find command may report errors such as permission
    failures and missing files. I really should include such info in the
    output but, coming from stderr, such lines are newline- rather than
    nul-terminated and therefore cannot be combined with just 2>&1.

    To get around that I tried

    find 2> >(tr '\n' '\0')

    That partly works. After sorting, error messages appear at the end (they
    begin with 'f' where non-error lines begin with '/') which is fine but
    there is usually one garbled line between good results and error
    messages and very likely some other corruption elsewhere in the file.

    I guess that's due to output buffering but even

    stdbuf -oL -eL find 2> >(stdbuf -oL -eL tr '\n' '\0')

    doesn't work. This is already well beyond my comfort zone and is
    getting increasingly complex which, it has to be said, would not be
    the case with newline terminators.

    Hence this post to ask for suggestions on where to go next.

    I guess I could write find's stdout and stderr to temp files, sed the
    stderr data, convert newlines to nuls, combine with the stdout data and
    then I'll be back on track and can sort the result. But before I do that
    I thought to check back for suggestions. Are there any simpler ways?
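
    The temp-file approach described above can be sketched roughly as
    follows. This is only an illustration: a small throwaway directory
    stands in for the full scan of /, the prune options are omitted,
    and the /tmp file names are placeholders.

```shell
#!/bin/sh
# Demo tree in place of the real scan root (/)
ROOT=$(mktemp -d)
touch "$ROOT/alpha" "$ROOT/beta"

# Step 1: names to one file (NUL-terminated), errors to another
# (newline-terminated, as find writes them)
find "$ROOT" -print0 > /tmp/names.0 2> /tmp/errs.txt

# Step 2: re-terminate the error lines with NULs
tr '\n' '\0' < /tmp/errs.txt > /tmp/errs.0

# Step 3: combine and sort the NUL-terminated records
cat /tmp/names.0 /tmp/errs.0 | sort -z > /tmp/f1

# View the result (newlines only at viewing time)
tr '\0' '\n' < /tmp/f1
```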

    ...

    However, if you want 'data' on the files, you'd be better off using
    the 'stat' command, as it is intended to 'acquire' meta-data about
    files (and by far more meta-data than what ls shows).

    You'd feed it filenames with xargs as well:

    xargs -0 stat --format="%i %h %n" < /tmp/f1 > /tmp/f2

    To output inode, number of links, and name (granted, you can get this
    from ls as well, but look through the stat man page, there is a lot
    more stuff you can pull out).

    Agreed, stat would be better.
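
    For example, a sketch of that pipeline using a few more of GNU
    stat's format sequences (a single temporary file stands in here for
    the real NUL-terminated audit list in /tmp/f1):

```shell
#!/bin/sh
# Stand-in for the real audit list of NUL-terminated names
demo=$(mktemp)
printf '%s\0' "$demo" > /tmp/f1

# %a octal permissions, %U owner, %s size in bytes,
# %Y mtime as seconds since the epoch, %n file name
xargs -0 stat --format="%a %U %s %Y %n" < /tmp/f1 > /tmp/f2

cat /tmp/f2
```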


    The one tool that does not (yet) seem to consume "null terminated
    lines" is diff, and you can work around that by converting to newlines
    at the time you do the diff:

    diff -u <(tr '\0' '\n' < /tmp/f1) <(tr '\0' '\n' < /tmp/f2)

    And, note, all of these "convert nulls to newlines at time of need" can
    be scripted, so you could have a "file-list-diff" script that contains:

    #!/bin/bash
    diff -u <(tr '\0' '\n' < "$1") <(tr '\0' '\n' < "$2")

    And then you can diff two of your audit files by:

    file-list-diff /tmp/f1 /tmp/f2

    And not have to remember the tr invocation and process substitution
    syntax to do the same at every call.

    Understood. Thanks for the clear info.
    --
    James Harris


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Rich@rich@example.invalid to comp.os.linux.misc on Sat Apr 20 18:18:22 2024
    From Newsgroup: comp.os.linux.misc

    James Harris <james.harris.1@gmail.com> wrote:
    I am trying to do the zero termination just now but have run into a
    problem. The above find command may report errors such as permission
    failures and missing files. I really should include such info in the
    output but, coming from stderr, such lines are newline- rather than
    nul-terminated and therefore cannot be combined with just 2>&1.

    To get around that I tried

    find 2> >(tr '\n' '\0')


    Since 'errors' are not "filenames" but rather "information about
    problems" (likely problems you'd want to investigate), don't try to
    zero terminate them and include them in the "filename list". Instead
    write the error stream to a separate file:

    find 2> error-stream

    And check the size of the "error-stream" (substitute your own name, or
    a temporary name for "error-stream") file when the process is finished.

    If the "error-stream" file is zero bytes, no issues occurred (well,
    none that 'find' reported). You can continue with the rest of what
    your process is doing, and remove the empty 'error-stream' file.

    If the "error-stream" file is non-zero, save it somewhere that you can
    find it (and relate it to which run it was for) and raise alarm bells
    to yourself to investigate what caused the error(s).
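
    A minimal sketch of that check, with a throwaway directory standing
    in for the real scan root (file names are illustrative):

```shell
#!/bin/sh
# Demo scan root in place of /
root=$(mktemp -d)
touch "$root/a"

# Keep the error stream in its own file, separate from the name list
errfile=$(mktemp)
find "$root" -print0 > /tmp/f1 2> "$errfile"

if [ -s "$errfile" ]; then
    # Non-empty: save it with the run's output and investigate
    echo "find reported problems; see $errfile" >&2
else
    # Zero bytes: no reported issues, discard the empty file
    rm -f "$errfile"
fi
```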

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.os.linux.misc on Sat Apr 20 22:39:58 2024
    From Newsgroup: comp.os.linux.misc

    On Sat, 20 Apr 2024 16:23:27 +0100, James Harris wrote:

    I am trying to do the zero termination just now but have run into a
    problem. The above find command may report errors such as permission
    failures and missing files. I really should include such info in the
    output but, coming from stderr, such lines are newline- rather than
    nul-terminated and therefore cannot be combined with just 2>&1.

    This is the point at which to give up on trying to do it in a shell
    script, and switch to using a proper programming language that gives you
    more control over what is going on.

    I recommend Python.
    --- Synchronet 3.20a-Linux NewsLink 1.114