• GlusterFS with replica 3

    From ^Bart@none@none.it to alt.os.linux on Sat Jul 5 08:41:23 2025
    From Newsgroup: alt.os.linux

    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    ^Bart
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D'Oliveiro@ldo@nz.invalid to alt.os.linux on Sat Jul 5 07:18:35 2025
    From Newsgroup: alt.os.linux

    On Sat, 5 Jul 2025 08:41:23 +0200, ^Bart wrote:

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    Any messages in the logs? journalctl? dmesg?
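    A minimal triage sketch along those lines (the glusterd unit name and
    log path are assumptions; adjust to the actual install):

```shell
# Check the daemon's journal, the kernel ring buffer, and glusterd's own
# log around the time the node flips to "N" (unit name and path assumed):
journalctl -u glusterd --since "1 hour ago" | tail -n 50
dmesg | tail -n 30                     # OOM kills and I/O errors show up here
grep -E "disconnect|lock" /var/log/glusterfs/glusterd.log | tail -n 20
```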
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Daniel70@daniel47@eternal-september.org to alt.os.linux on Sat Jul 5 20:08:15 2025
    From Newsgroup: alt.os.linux

    On 5/07/2025 4:41 pm, ^Bart wrote:
    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    Free RAM       ~300MB
    Used RAM       ~1.5GB
    Buffer/cache   ~5GB

    Total          ~8GB
    Available RAM  ~8GB

    .... so are you all full up??

    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    ^Bart

    If you clear your Buffer/cache, might things run better??
    --
    Daniel70
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From J.O. Aho@user@example.net to alt.os.linux on Sat Jul 5 17:25:43 2025
    From Newsgroup: alt.os.linux

    On 05/07/2025 08.41, ^Bart wrote:

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    Have you seen anything in the logs? Maybe check /var/log/glusterfs/glusterd.log
    It can be a lock that hasn't been released; sadly, the only fix is to
    restart glusterd.
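    To see whether a stale lock or a dropped peer is what the cluster
    itself reports, the standard gluster CLI status commands are a quick
    check (a sketch; run it on each node):

```shell
# Ask gluster what each node thinks of its peers and bricks:
gluster peer status | tail -n 20     # healthy: "State: Peer in Cluster (Connected)"
gluster volume status | tail -n 30   # the Online column is the Y/N from the post
```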


    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    It's been quite a few years since I used GlusterFS, but back then at
    work we had fairly large Dell servers (64GB RAM, 2 CPUs with 8 cores
    each) with SAN-based storage as nodes, and the system degraded under
    heavy read/write load. In the end they were replaced by standard NFS
    servers, which gave more stability, with replication from one SAN to
    another; admittedly not a fully HA solution.

    I think I should have pushed harder for Lustre; the guys at CERN were
    really helpful when I did some testing with a simple setup that just
    needed a few more nodes. Sadly, the requirements changed along the way:
    we ended up having to provide the file system to a closed-source
    operating system with poor file system support.


    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    gluster.org does write, for basic nodes: 2 CPUs, 4GB of RAM each, and a
    1 Gigabit network.
    --
    //Aho
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From vallor@vallor@cultnix.org to alt.os.linux on Sun Jul 6 02:09:52 2025
    From Newsgroup: alt.os.linux

    On Sat, 5 Jul 2025 20:08:15 +1000, Daniel70
    <daniel47@eternal-september.org> wrote in <104atij$1e1pj$1@dont-email.me>:

    On 5/07/2025 4:41 pm, ^Bart wrote:
    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    Free Ram 300MB approx
    Used Ram 1.5GB
    Buffer/cache 5GB approx

    Total 8GB approx
    Available Ram 8GB approx

    .... so are you all full up??

    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    ^Bart

    If you clear your Buffer/cache, might things run better??

    Hi Daniel,

    The way Linux works, free memory gets used as Buffer/cache.
    As more memory is allocated, it pulls it from the B/C. It's
    basically part of the "free" memory, but being "borrowed"
    by the OS for better performance.
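    On a modern kernel the "available" figure already folds in the
    reclaimable buffer/cache, so it is the number worth watching. A sketch
    (assumes a Linux /proc/meminfo):

```shell
# "available" ~= free + reclaimable cache; MemAvailable is the kernel's
# own estimate of memory usable without swapping:
free -h | tail -n 3
awk '/MemAvailable/ {printf "%.1f GiB actually available\n", $2/1048576}' /proc/meminfo
```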
    --
    -v System76 Thelio Mega v1.1 x86_64 NVIDIA RTX 3090Ti 24G
    OS: Linux 6.15.4 D: Mint 22.1 DE: Xfce 4.18 Mem: 258G
    "Ever notice how fast Windows runs? Neither have I."
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Paul@nospam@needed.invalid to alt.os.linux on Sun Jul 6 04:12:45 2025
    From Newsgroup: alt.os.linux

    On Sat, 7/5/2025 10:09 PM, vallor wrote:
    On Sat, 5 Jul 2025 20:08:15 +1000, Daniel70
    <daniel47@eternal-september.org> wrote in <104atij$1e1pj$1@dont-email.me>:

    On 5/07/2025 4:41 pm, ^Bart wrote:
    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    Free Ram 300MB approx
    Used Ram 1.5GB
    Buffer/cache 5GB approx

    Total 8GB approx
    Available Ram 8GB approx

    .... so are you all full up??

    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    ^Bart

    If you clear your Buffer/cache, might things run better??

    Hi Daniel,

    The way Linux works, free memory gets used as Buffer/cache.
    As more memory is allocated, it pulls it from the B/C. It's
    basically part of the "free" memory, but being "borrowed"
    by the OS for better performance.


    "If you clear your Buffer/cache, might things run better"

    Some people will tell you to test that for yourself as a Linux user,
    and, just as they predicted, you will find it makes no difference when
    you drop the caches.

    On some of the older Linux media, I used to write stuff
    as a liner note. This is from the paper wrapper on the
    Knoppix 5.3.1 DVD I made a while back.

    echo 1 > /proc/sys/vm/drop_caches

    1=PageCache
    2=Dentries,Inodes
    3=Both

    If you run "top" in one terminal session, then issue
    the command in another terminal session, you will see
    some numbers change in the "top" display, but the performance
    of the machine does not change.

    *******

    The OP's problem is at a different scale than what a home user can
    reproduce. Perhaps someone who works in a Linux-based environment has
    seen such a setup and can comment. For us home users, it would be
    pretty difficult to put together a convincing test setup.

    Imagine if the owner of Archive.org came online and
    asked a question about "a problem with his setup".
    Not many users here have an Archive.org setup in
    their basement :-}

    Paul
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ^Bart@none@none.it to alt.os.linux on Mon Jul 7 22:52:37 2025
    From Newsgroup: alt.os.linux

    Have you seen anything in the logs? Maybe check /var/log/glusterfs/glusterd.log

    There's nothing wrong in normal operation, but when the system starts
    backing up the databases (more or less six of them), the node changes
    from Y to N, though only with the most important one (1.9GB); it works
    fine with the other databases up to 1.4GB. In the gluster logs I can
    read something like "the node is disconnected" because it can't reach
    the other peers.

    It can be a lock that hasn't been released; sadly, the only fix is to
    restart glusterd.

    It's very sad that all I can do to fix the "N" state is restart the
    daemon :\ but I could try adding another 2GB of RAM, going from 8GB to
    10GB.

    It's been quite a few years since I used GlusterFS, but back then at
    work we had fairly large Dell servers (64GB RAM, 2 CPUs with 8 cores
    each) with SAN-based storage as nodes, and the system degraded under
    heavy read/write load. In the end they were replaced by standard NFS
    servers, which gave more stability, with replication from one SAN to
    another; admittedly not a fully HA solution.

    I think GlusterFS needs more than 8GB of RAM to handle the spikes when
    the system runs backups. Granted, even without the backup job there
    isn't a lot of free memory (300-400MB), but at least no nodes go down!

    I'm looking at CephFS, but I've read it needs more RAM than what I use
    now, so... as I wrote above, I think for now I could try upgrading the
    RAM and running tests on GlusterFS, because changing a production
    cluster is no small feat. But I also know there are no future plans for
    Gluster, and I've heard it will be discontinued, so... CephFS will be
    the only alternative.

    gluster.org does write, for basic nodes: 2 CPUs, 4GB of RAM each, and a
    1 Gigabit network.

    I read that, but in a real production environment I think the CPU and
    RAM requirements are a little bit different...

    ^Bart
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From J.O. Aho@user@example.net to alt.os.linux on Tue Jul 8 09:10:54 2025
    From Newsgroup: alt.os.linux

    On 07/07/2025 22.52, ^Bart wrote:
    Have you seen anything in the logs? Maybe check
    /var/log/glusterfs/glusterd.log

    There's nothing wrong in normal operation, but when the system starts
    backing up the databases (more or less six of them), the node changes
    from Y to N, though only with the most important one (1.9GB); it works
    fine with the other databases up to 1.4GB. In the gluster logs I can
    read something like "the node is disconnected" because it can't reach
    the other peers.

    Could it be that you're hitting the maximum transfer rate of your
    network, so there isn't enough bandwidth left for both the file
    transfer and the peer health checks?

    Another possibility is that the nodes can't keep up with the incoming
    traffic for long: at 100% disk utilization, network traffic suffers too
    (I have seen this; when the disk is slow, everything else slows to a
    crawl and network connections fail, as everything gets queued up and
    the queued packets time out).
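    Both hypotheses are easy to check during a backup window; a rough
    sketch (iostat comes from the sysstat package, and eth0 is an assumed
    interface name):

```shell
# Disk side: %util pinned near 100 means the disk is the choke point:
iostat -x 5 3 | tail -n 40
# Network side: sample the kernel's receive-byte counter over 5 seconds:
rx1=$(awk -F'[: ]+' '/eth0/ {print $3}' /proc/net/dev); rx1=${rx1:-0}
sleep 5
rx2=$(awk -F'[: ]+' '/eth0/ {print $3}' /proc/net/dev); rx2=${rx2:-0}
echo "inbound: $(( (rx2 - rx1) / 5 )) bytes/s"
```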


    It can be a lock that hasn't been released; sadly, the only fix is to
    restart glusterd.

    It's very sad that all I can do to fix the "N" state is restart the
    daemon :\ but I could try adding another 2GB of RAM, going from 8GB to
    10GB.

    If the RAM is used as a cache before things are written to disk, this
    could help for a while, until the extra 2GB is used up. If you're lucky
    that's more than needed, and nothing bad will happen until you have
    more data to back up.


    It's been quite a few years since I used GlusterFS, but back then at
    work we had fairly large Dell servers (64GB RAM, 2 CPUs with 8 cores
    each) with SAN-based storage as nodes, and the system degraded under
    heavy read/write load. In the end they were replaced by standard NFS
    servers, which gave more stability, with replication from one SAN to
    another; admittedly not a fully HA solution.

    I think GlusterFS needs more than 8GB of RAM to handle the spikes when
    the system runs backups. Granted, even without the backup job there
    isn't a lot of free memory (300-400MB), but at least no nodes go down!

    I'm looking at CephFS, but I've read it needs more RAM than what I use
    now, so... as I wrote above, I think for now I could try upgrading the
    RAM and running tests on GlusterFS, because changing a production
    cluster is no small feat. But I also know there are no future plans for
    Gluster, and I've heard it will be discontinued, so... CephFS will be
    the only alternative.

    Yeah, it's a load of work; I think we had something like 48 hours of
    downtime when we switched from Gluster to NFS, and the customers
    weren't too happy.

    gluster.org does write, for basic nodes: 2 CPUs, 4GB of RAM each, and a
    1 Gigabit network.

    I read that, but in a real production environment I think the CPU and
    RAM requirements are a little bit different...

    It all depends on what you are doing; a small amount of CPU/RAM works
    fine in lab environments, since you usually don't have 300+ clients
    trying to write.
    --
    //Aho

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ^Bart@none@none.it to alt.os.linux on Sun Oct 26 21:25:51 2025
    From Newsgroup: alt.os.linux

    It all depends on what you are doing; a small amount of CPU/RAM works
    fine in lab environments, since you usually don't have 300+ clients
    trying to write.

    My CEO found the solution: the issue with syncing WordPress over
    GlusterFS was... millions of small files created by WordPress! :D

    If we delete these files from time to time, the cluster works well!

    We figured it out because even a simple du -sh on /var/www/ made the
    node go down!

    So... I'm sorry, GlusterFS, you weren't the guilty one! :D
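    That diagnosis fits how GlusterFS behaves: every file costs metadata
    round-trips between peers, so a tree walk like du stalls on huge file
    counts. A sketch for quantifying it before blaming the file system
    (the path is the thread's /var/www; the per-directory loop is just
    illustrative):

```shell
# Total file count under the web root -- millions here explains the du stalls:
count=$(find /var/www -type f 2>/dev/null | wc -l)
echo "$count files under /var/www"
# Directories with the most files (WordPress cache dirs are the usual suspects):
find /var/www -xdev -type d 2>/dev/null | while read -r d; do
    printf '%s %s\n' "$(find "$d" -maxdepth 1 -type f 2>/dev/null | wc -l)" "$d"
done | sort -rn | head
```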

    ^Bart
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to alt.os.linux on Sun Oct 26 20:56:16 2025
    From Newsgroup: alt.os.linux

    On Sun, 26 Oct 2025 21:25:51 +0100, ^Bart wrote:

    My CEO found the solution: the issue with syncing WordPress over
    GlusterFS was... millions of small files created by WordPress! :D

    If we delete these files from time to time, the cluster works well!

    Maybe if you delete WordPress altogether, your site would work even
    better ...

    I’ve been setting up WordPress for a client, and it’s funny how sluggish it is, immediately after you have got it going, before you have even done
    any actual work with it.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Joerg Walther@joerg.walther@magenta.de to alt.os.linux on Mon Oct 27 17:40:28 2025
    From Newsgroup: alt.os.linux

    Lawrence D’Oliveiro wrote:

    I’ve been setting up WordPress for a client, and it’s funny how
    sluggish it is, immediately after you have got it going, before you
    have even done any actual work with it.

    Shared hosting most likely? Usually it is possible to give your WP
    instance more CPU power for a couple of bucks.

    -jw-
    --
    And now for something completely different...
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to alt.os.linux on Mon Oct 27 21:15:21 2025
    From Newsgroup: alt.os.linux

    On Mon, 27 Oct 2025 17:40:28 +0100, Joerg Walther wrote:

    Lawrence D’Oliveiro wrote:

    I’ve been setting up WordPress for a client, and it’s funny how
    sluggish it is, immediately after you have got it going, before you
    have even done any actual work with it.

    Shared hosting most likely?

    Nope. Dedicated in-house VM, on pretty decent hardware (lots of RAM
    and disk, CPU cores in the dozens) under XCP-ng. There are other
    company intranet apps running on other VMs, written (by me) in both
    PHP and Python, and they all work much more snappily than WordPress.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From J.O. Aho@user@example.net to alt.os.linux on Tue Oct 28 09:10:17 2025
    From Newsgroup: alt.os.linux

    On 27/10/2025 22.15, Lawrence D’Oliveiro wrote:
    On Mon, 27 Oct 2025 17:40:28 +0100, Joerg Walther wrote:

    Lawrence D’Oliveiro wrote:

    I’ve been setting up WordPress for a client, and it’s funny how
    sluggish it is, immediately after you have got it going, before you
    have even done any actual work with it.

    Shared hosting most likely?

    Nope. Dedicated in-house VM, on pretty decent hardware (lots of RAM
    and disk, CPU cores in the dozens) under XCP-ng.

    It's not so much about how much the host machine has; it's how much
    you've assigned to the VM, and of course whether you have
    over-provisioned.

    There are other
    company intranet apps running on other VMs, written (by me) in both
    PHP and Python, and they all work much more snappily than WordPress.

    Are they the same size and as database-dependent as WP?
    The database may also be the bottleneck; since I don't know anything
    about your setup, it's difficult to say what could be wrong...

    There is so much more that can differ in a VM compared to running on
    bare metal. For example, memcached had big issues some years ago when
    run in a VM: it was just extremely slow, while on a smaller bare-metal
    instance it was snappy as hell. The reason was how memory was handled
    by the virtualization layer.
    --
    //Aho

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to alt.os.linux on Tue Oct 28 23:09:55 2025
    From Newsgroup: alt.os.linux

    On Tue, 28 Oct 2025 09:10:17 +0100, J.O. Aho wrote:

    Are they at the same size and as much db dependent as WP?

    Database-dependent -- several of them are quite heavily so.

    Same size and complexity ... obviously not.
    --- Synchronet 3.21a-Linux NewsLink 1.2