• Linux Embedded: how to get info from a running service

    From pozz@pozzugno@gmail.com to comp.arch.embedded on Mon Aug 7 23:28:25 2023
    From Newsgroup: comp.arch.embedded

    I developed an application (in Python) that is started by systemd at
    startup and stays running forever (until the box is rebooted or shut
    down).

    The user can get some info about the system through a web application,
    as on a typical router or NAS. The user points his web browser to the
    local IP address of my Linux box and a graphical interface appears.
    This web application is written in Python (WSGI) and deployed on the
    box with nginx + gunicorn + Flask.

    Now the question is: how do I get info from a running service so that
    it can be shown in the web application?
    The info is specific to my application, it isn't standard. I'm in
    control of both the running service and the WSGI application, so I can
    use whichever solution is best.

    Of course this is a typical IPC scenario: one process is WSGI and the
    other is the running service.
    I can use shared memory, message queues, named pipes, unix sockets,
    Internet sockets, D-Bus and many other mechanisms.

    Is there one that is better to use in my case? After some reading, maybe
    D-Bus can be a good way. I understand systemd already uses D-Bus to
    exchange info about its services and units.
    However, its implementation is not as straightforward as a unix socket
    with custom messages.

    What do you suggest?

    PS: In the past I read only a few posts regarding Linux development,
    even if it's for embedded devices. However I don't know how to ask
    questions related to linux development, I noticed Usenet linux groups
    are somewhat dead.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Aug 8 12:27:35 2023

    On 07/08/2023 23:28, pozz wrote:
    [...]

    I don't know what kind of information you need, but an easy option
    might be to have the Python service regularly write out a JSON-format
    file with the current status or other information. The web app can have
    JavaScript that regularly reads that file and handles it in the user's
    web browser. And if you want to go the other way, your Python code can
    use "inotify" waits to see file writes from the web server.



  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Tue Aug 8 14:59:52 2023

    On 08/08/2023 12:27, David Brown wrote:

    I don't know what kind of information you need, but an easy option
    might be to have the Python service regularly write out a JSON-format
    file with the current status or other information. The web app can have
    JavaScript that regularly reads that file and handles it in the user's
    web browser. And if you want to go the other way, your Python code can
    use "inotify" waits to see file writes from the web server.

    Honestly, I don't like your solution. First of all, you are regularly
    writing to a normal file in the filesystem. OK, maybe I can use a
    tmpfs filesystem in RAM.

    Another issue I see is synchronization. Without a sync mechanism, the
    reader could read bad data, because the writer is writing to it.



  • From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Tue Aug 8 14:31:02 2023

    On 2023-08-08, pozz <pozzugno@gmail.com> wrote:

    Honestly, I don't like your solution. First of all, you are regularly
    writing to a normal file in the filesystem. OK, maybe I can use a
    tmpfs filesystem in RAM.

    Another issue I see is synchronization. Without a sync mechanism, the
    reader could read bad data, because the writer is writing to it.

    There is a trivial way to deal with that which has been used since
    time immemorial on Unix:

    Write to a temporary file, then close and rename it. The open() and
    rename() system calls are atomic with respect to each other. The task
    calling open() will get either the old file or the new file, never
    something "in between".
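
    A minimal Python sketch of this write-then-rename pattern follows. The
    function name write_status_atomically and the JSON payload are
    illustrative assumptions, not something taken from the thread. The
    temporary file is created in the same directory as the target, because
    the rename is only atomic when both names are on the same filesystem.

```python
import json
import os
import tempfile


def write_status_atomically(path, status):
    """Write `status` as JSON to `path` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that both names live
    # on the same filesystem (a requirement for an atomic rename).
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(status, tmp)
        # On Linux os.replace() uses rename(2) and overwrites the target.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

    Note that mkstemp() creates the file readable only by its owner, so if
    the web server runs as a different user the writer would probably need
    an os.chmod() call before the rename.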

  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Aug 8 16:31:36 2023

    On 08/08/2023 14:59, pozz wrote:

    Honestly, I don't like your solution. First of all, you are regularly
    writing to a normal file in the filesystem. OK, maybe I can use a
    tmpfs filesystem in RAM.


    That would be the normal choice, yes.

    Another issue I see is synchronization. Without a sync mechanism, the
    reader could read bad data, because the writer is writing to it.


    You typically handle this by writing to "status.tmp", then renaming
    (moving) it to "status.json", or whatever names you are using. Renaming
    a file like this is guaranteed atomic on Linux - anything attempting to
    open a handle to "status.json" will either get the old file (which is
    kept alive while the file descriptor is open) or the new file. This is
    not the first situation in which people wanted to avoid reading
    half-written files!



  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Tue Aug 8 17:14:51 2023

    On 08/08/2023 16:31, David Brown wrote:
    [...]

    You typically handle this by writing to "status.tmp", then renaming
    (moving) it to "status.json", or whatever names you are using.  Renaming
    a file like this is guaranteed atomic on Linux - anything attempting to
    open a handle to "status.json" will either get the old file (which is
    kept alive while the file descriptor is open) or the new file.  This is
    not the first situation in which people wanted to avoid reading
    half-written files!

    Good thing to know.

    Just to better understand what happens: if the reader opens status.json
    just before the writer renames status.tmp to status.json, will we have
    a process (the reader) that reads from the old version of "status.json"
    instead of the new version that is really on the filesystem?

    Consider that the reader could keep open the old status.json for a long
    time. Does the OS guarantee that old data (maybe 1GB) can be read even
    if a new file with new data is available?

  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Tue Aug 8 18:54:33 2023

    On 08/08/2023 17:14, pozz wrote:

    Just to better understand what happens: if the reader opens status.json
    just before the writer renames status.tmp to status.json, will we have
    a process (the reader) that reads from the old version of "status.json"
    instead of the new version that is really on the filesystem?


    Yes, exactly.

    A file in Linux exists independently from filenames. There can be many
    things pointing to a file, and the file exists until there are no more
    pointers. Usually these "pointers" are directory entries, but they can
    also be open file descriptors (which are actually visible as pseudofiles
    in the /proc filesystem).

    So when you open the "status.json" file, you get that file, and it stays
    in existence at least until the file is closed. The new "status.tmp" is
    a different file. The rename just makes a new pointer to the new file,
    and erases the old pointer to the old file.

    Consider that the reader could keep open the old status.json for a long
    time. Does the OS guarantee that old data (maybe 1GB) can be read even
    if a new file with new data is available?


    Yes, as long as you hold the file descriptor open.
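
    The "name versus file" distinction can be observed from Python with
    inode numbers. The following is only an illustrative sketch: the file
    names match the ones used in this thread, and the temporary directory
    is an assumption made so that the example is self-contained.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
status = os.path.join(workdir, "status.json")
tmp = os.path.join(workdir, "status.tmp")

with open(status, "w") as f:
    f.write("version A")

reader = open(status)                       # descriptor refers to the version A file
old_inode = os.fstat(reader.fileno()).st_ino

with open(tmp, "w") as f:
    f.write("version B")
os.replace(tmp, status)                     # the name now points at the version B file

new_inode = os.stat(status).st_ino          # inode currently reachable by name
# Number of directory entries still pointing at the old file
# (expected to be 0 on typical Linux filesystems).
links_to_old = os.fstat(reader.fileno()).st_nlink

old_text = reader.read()                    # still the complete old contents
reader.close()                              # last reference dropped; space can be freed

with open(status) as f:
    new_text = f.read()
```

    After the rename, the name "status.json" resolves to a different inode
    than the one the reader still holds open, which is why the reader keeps
    seeing the old contents.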

  • From pozz@pozzugno@gmail.com to comp.arch.embedded on Wed Aug 9 09:27:12 2023

    On 08/08/2023 18:54, David Brown wrote:
    [...]

    A file in Linux exists independently from filenames. There can be many
    things pointing to a file, and the file exists until there are no more
    pointers. Usually these "pointers" are directory entries, but they can
    also be open file descriptors (which are actually visible as pseudofiles
    in the /proc filesystem).

    Is this behaviour the same for whatever filesystem (ext2, fat, ...)?

    What I don't understand is what exactly happens under the hood.

    Consider the following sequence:
    - process W (writer) writes version A to status.tmp
    - process W renames status.tmp to status.json
    - process W writes version B to status.tmp
    - process R (reader) opens file status.json (version A)
    - process W renames status.tmp to status.json
    [Now all new open operations on status.json will get the new version of the data]
    [process W could write/rename status.tmp/json 1000 times]
    - after one hour (just to say), process R starts reading from the file
    From what I understand, process R will get the full contents of version
    A (even if it restarts reading changing file position many times). The
    OS takes care of data A, because this "ghost file"[1] is in use.

    Most probably, if the file is small, the OS copies its contents into a
    cache in RAM when process R opens the file, so process R will read from
    RAM, and this would explain why it gets the original version A content.

    Anyway, in general the file could be any size, maybe 1GB. So I assume at
    least some parts of version A's data still remain on the HDD, even when
    process W writes/renames a new version.
    Until process R closes the file, version A's data is physically on
    the HDD, consuming part of its space. Is that correct?

    [1] Ghost because it can't be read by any other process.



  • From David Brown@david.brown@hesbynett.no to comp.arch.embedded on Wed Aug 9 10:29:30 2023

    On 09/08/2023 09:27, pozz wrote:
    On 08/08/2023 18:54, David Brown wrote:
    [...]

    A file in Linux exists independently from filenames.  There can be
    many things pointing to a file, and the file exists until there are no
    more pointers.  Usually these "pointers" are directory entries, but
    they can also be open file descriptors (which are actually visible as
    pseudofiles in the /proc filesystem).

    Is this behaviour the same for whatever filesystem (ext2, fat, ...)?


    Yes (as far as I understand it), though some filesystems (like fat)
    don't support multiple directory links to the same file. All *nix
    suitable filesystems do, because hard links are a key feature. (In
    use-cases like yours, you would - as you suggested - put the file on a
    tmpfs mount.)

    What I don't understand is what exactly happens under the hood.

    Consider the following sequence:
    - process W (writer) writes version A to status.tmp
    - process W renames status.tmp to status.json
    - process W writes version B to status.tmp
    - process R (reader) opens file status.json (version A)
    - process W renames status.tmp to status.json
    [Now all new open operations on status.json will get the new version of the data]
    [process W could write/rename status.tmp/json 1000 times]
    - after one hour (just to say), process R starts reading from the file


    Yes.

    It's easy to try all this from two Python interactive sessions. It's
    hard to test race conditions of doing things quickly, but waiting an
    hour should be easy to simulate :-)

    From what I understand, process R will get the full contents of version
    A (even if it restarts reading changing file position many times). The
    OS takes care of data A, because this "ghost file"[1] is in use.


    Yes. Just don't close the file descriptor between accesses. (You can duplicate the file descriptor, close the first one, and carry on using
    the duplicate to access the old file.)
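
    The sequence discussed above, together with the duplicate-descriptor
    remark, can be tried in a single Python script. This is a sketch under
    assumptions: the writer and the reader are simulated in one process,
    the helper publish() is hypothetical, and the one-hour wait is replaced
    by 1000 immediate rewrites.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
status = os.path.join(workdir, "status.json")
tmp = os.path.join(workdir, "status.tmp")


def publish(text):
    """Writer side: write status.tmp, then rename it over status.json."""
    with open(tmp, "w") as f:
        f.write(text)
    os.replace(tmp, status)


publish("version A")
fd = os.open(status, os.O_RDONLY)    # process R opens version A

for n in range(1000):                # process W keeps publishing newer versions
    publish(f"version {n}")

dup_fd = os.dup(fd)                  # second descriptor for the same open file
os.close(fd)                         # closing the first one does not release the file

first_read = os.read(dup_fd, 100)
os.lseek(dup_fd, 0, os.SEEK_SET)     # rewind and read the same data again
second_read = os.read(dup_fd, 100)
os.close(dup_fd)                     # only now can the old file be freed
```

    Both reads return the contents of version A, even though the name
    status.json has pointed at newer files for a long time.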

    Most probably, if the file is small, the OS copies its contents into a
    cache in RAM when process R opens the file, so process R will read from
    RAM, and this would explain why it gets the original version A content.


    It is all just like normal files. But since version A is marked for
    immediate deletion upon closure, the OS knows it never needs to survive
    a reboot. So if the data was not written out to disk, but was still in
    the write cache, then it will never be written out to the disk unless
    the system is desperate for memory.

    This is similar to the trick for temporary files. Create a new file for
    writing. Open a read/write handle to the file. Then delete the file.
    You can use the handle (and duplicates of it) to access the file, pass
    on the handle to child processes, etc., but the OS knows it does not
    need to actually write anything to the disk. (Windows has special API
    calls to get a similar effect for immediately deletable files.)
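
    A short sketch of that temporary-file trick in Python, assuming a
    scratch file name chosen only for illustration. On Unix, the standard
    library's tempfile.TemporaryFile() works along the same lines, so in
    real code it would probably be the simpler choice.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "scratch.bin")

f = open(path, "w+b")         # create the file and keep a read/write handle
os.unlink(path)               # remove its only name; the open handle keeps it alive
name_gone = not os.path.exists(path)

f.write(b"temporary data")    # the nameless file is still fully usable
f.seek(0)
data = f.read()
f.close()                     # the storage is released here
```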

    Anyway, in general the file could be any size, maybe 1GB. So I assume at
    least some parts of version A's data still remain on the HDD, even when
    process W writes/renames a new version.
    Until process R closes the file, version A's data is physically on
    the HDD, consuming part of its space. Is that correct?


    Yes. You have understood perfectly.

    [1] Ghost because it can't be read by any other process.


    It can, through a duplicate file descriptor. In practice this is
    commonly done by child processes - when you fork(), the child gets a
    copy of all the open file descriptors.



  • From Grant Edwards@invalid@invalid.invalid to comp.arch.embedded on Wed Aug 9 13:00:59 2023

    On 2023-08-09, David Brown <david.brown@hesbynett.no> wrote:
    On 09/08/2023 09:27, pozz wrote:

    [1] Ghost because it can't be read by any other process.


    It can, through a duplicate file descriptor. In practice this is
    commonly done by child processes - when you fork(), the child gets a
    copy of all the open file descriptors.

    You can also pass filedescriptors to non-related processes via
    Unix-domain sockets.

    https://copyconstruct.medium.com/file-descriptor-transfer-over-unix-domain-sockets-dcbbf5b3b6ec

    https://www.sobyte.net/post/2022-01/pass-fd-over-domain-socket/
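
    In Python 3.9 and later on Unix, descriptor passing is available
    through socket.send_fds() and socket.recv_fds(). The sketch below keeps
    both ends of a socketpair in one process purely for brevity (an
    assumption of this example); in practice the two ends would belong to
    unrelated processes connected through a named Unix-domain socket.

```python
import os
import socket
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "status.json")
with open(path, "w") as f:
    f.write("version A")

sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

fd = os.open(path, os.O_RDONLY)
os.unlink(path)                                   # the file no longer has a name
socket.send_fds(sender, [b"fd"], [fd])            # descriptor travels as SCM_RIGHTS data
os.close(fd)                                      # sender drops its own descriptor

# The receiver gets a new descriptor that refers to the same open file.
msg, fds, flags, addr = socket.recv_fds(receiver, 1024, 1)
received = os.read(fds[0], 100)
os.close(fds[0])
sender.close()
receiver.close()
```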