• GlusterFS with replica 3

    From ^Bart@none@none.it to alt.os.linux on Sat Jul 5 08:41:23 2025
    From Newsgroup: alt.os.linux

    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    ^Bart
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D'Oliveiro@ldo@nz.invalid to alt.os.linux on Sat Jul 5 07:18:35 2025
    From Newsgroup: alt.os.linux

    On Sat, 5 Jul 2025 08:41:23 +0200, ^Bart wrote:

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    Any messages in the logs? journalctl? dmesg?
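    A minimal triage sketch along those lines (the glusterd unit name and
    log path are assumptions; adjust to the actual install):

```shell
# Check the daemon's journal, the kernel ring buffer, and glusterd's own
# log around the time the node flips to "N" (unit name and path assumed):
journalctl -u glusterd --since "1 hour ago" | tail -n 50
dmesg | tail -n 30                     # OOM kills and I/O errors show up here
grep -E "disconnect|lock" /var/log/glusterfs/glusterd.log | tail -n 20
```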
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Daniel70@daniel47@eternal-september.org to alt.os.linux on Sat Jul 5 20:08:15 2025
    From Newsgroup: alt.os.linux

    On 5/07/2025 4:41 pm, ^Bart wrote:
    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    Free RAM       ~300MB
    Used RAM       ~1.5GB
    Buffer/cache   ~5GB

    Total          ~8GB
    Available RAM  ~8GB

    .... so are you all full up??

    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    ^Bart

    If you clear your Buffer/cache, might things run better??
    --
    Daniel70
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From J.O. Aho@user@example.net to alt.os.linux on Sat Jul 5 17:25:43 2025
    From Newsgroup: alt.os.linux

    On 05/07/2025 08.41, ^Bart wrote:

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    Have you seen anything in the logs? Maybe check /var/log/glusterfs/glusterd.log
    It can be a lock that hasn't been released; sadly, the only fix is to
    restart glusterd.
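    To see whether a stale lock or a dropped peer is what the cluster
    itself reports, the standard gluster CLI status commands are a quick
    check (a sketch; run it on each node):

```shell
# Ask gluster what each node thinks of its peers and bricks:
gluster peer status | tail -n 20     # healthy: "State: Peer in Cluster (Connected)"
gluster volume status | tail -n 30   # the Online column is the Y/N from the post
```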


    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    It's been quite a few years since I used GlusterFS, but back then at
    work we had fairly large Dell servers (64GB RAM, 2 CPUs with 8 cores
    each) with SAN-based storage as nodes, and the system degraded under
    heavy read/write load. In the end they were replaced by standard NFS
    servers, which gave more stability, with replication from one SAN to
    another; admittedly not a fully HA solution.

    I think I should have pushed harder for Lustre; the guys at CERN were
    really helpful when I did some testing with a simple setup that just
    needed a few more nodes. Sadly, the requirements changed along the way:
    we ended up having to provide the file system to a closed-source
    operating system with poor file system support.


    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    gluster.org does write, for basic nodes: 2 CPUs, 4GB of RAM each, and a
    1 Gigabit network.
    --
    //Aho
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From vallor@vallor@cultnix.org to alt.os.linux on Sun Jul 6 02:09:52 2025
    From Newsgroup: alt.os.linux

    On Sat, 5 Jul 2025 20:08:15 +1000, Daniel70
    <daniel47@eternal-september.org> wrote in <104atij$1e1pj$1@dont-email.me>:

    On 5/07/2025 4:41 pm, ^Bart wrote:
    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    Free Ram 300MB approx
    Used Ram 1.5GB
    Buffer/cache 5GB approx

    Total 8GB approx
    Available Ram 8GB approx

    .... so are you all full up??

    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    ^Bart

    If you clear your Buffer/cache, might things run better??

    Hi Daniel,

    The way Linux works, free memory gets used as Buffer/cache.
    As more memory is allocated, it pulls it from the B/C. It's
    basically part of the "free" memory, but being "borrowed"
    by the OS for better performance.
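    On a modern kernel the "available" figure already folds in the
    reclaimable buffer/cache, so it is the number worth watching. A sketch
    (assumes a Linux /proc/meminfo):

```shell
# "available" ~= free + reclaimable cache; MemAvailable is the kernel's
# own estimate of memory usable without swapping:
free -h | tail -n 3
awk '/MemAvailable/ {printf "%.1f GiB actually available\n", $2/1048576}' /proc/meminfo
```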
    --
    -v System76 Thelio Mega v1.1 x86_64 NVIDIA RTX 3090Ti 24G
    OS: Linux 6.15.4 D: Mint 22.1 DE: Xfce 4.18 Mem: 258G
    "Ever notice how fast Windows runs? Neither have I."
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Paul@nospam@needed.invalid to alt.os.linux on Sun Jul 6 04:12:45 2025
    From Newsgroup: alt.os.linux

    On Sat, 7/5/2025 10:09 PM, vallor wrote:
    On Sat, 5 Jul 2025 20:08:15 +1000, Daniel70
    <daniel47@eternal-september.org> wrote in <104atij$1e1pj$1@dont-email.me>:

    On 5/07/2025 4:41 pm, ^Bart wrote:
    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I run a simple du -sh on /var/www, the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I also
    used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at around
    70% utilization and 8GB of RAM; free RAM is roughly 300MB, used is
    about 1.5GB, and buffer/cache is about 5GB.

    Free Ram 300MB approx
    Used Ram 1.5GB
    Buffer/cache 5GB approx

    Total 8GB approx
    Available Ram 8GB approx

    .... so are you all full up??

    I read on the internet that AWS starts from 16GB of RAM for GlusterFS,
    while other documents say to use 12GB; do you have any experience with
    this?

    ^Bart

    If you clear your Buffer/cache, might things run better??

    Hi Daniel,

    The way Linux works, free memory gets used as Buffer/cache.
    As more memory is allocated, it pulls it from the B/C. It's
    basically part of the "free" memory, but being "borrowed"
    by the OS for better performance.


    "If you clear your Buffer/cache, might things run better"

    Some people will tell you to test that for yourself as a Linux user,
    and, just as they predicted, you will find it makes no difference when
    you drop the caches.

    On some of the older Linux media, I used to write stuff
    as a liner note. This is from the paper wrapper on the
    Knoppix 5.3.1 DVD I made a while back.

    echo 1 > /proc/sys/vm/drop_caches

    1=PageCache
    2=Dentries,Inodes
    3=Both

    If you run "top" in one terminal session, then issue
    the command in another terminal session, you will see
    some numbers change in the "top" display, but the performance
    of the machine does not change.

    *******

    The OP's problem is at a different scale than what a home user can
    reproduce. Perhaps someone who works in a Linux-based environment has
    seen such a setup and can comment. For us home users, it would be
    pretty difficult to put together a convincing test setup.

    Imagine if the owner of Archive.org came online and
    asked a question about "a problem with his setup".
    Not many users here have an Archive.org setup in
    their basement :-}

    Paul
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ^Bart@none@none.it to alt.os.linux on Mon Jul 7 22:52:37 2025
    From Newsgroup: alt.os.linux

    Have you seen anything in the logs? Maybe check /var/log/glusterfs/glusterd.log

    There's nothing wrong in normal operation, but when the system starts
    backing up the databases (more or less six of them), the node changes
    from Y to N, though only with the most important one (1.9GB); it works
    fine with the other databases up to 1.4GB. In the gluster logs I can
    read something like "the node is disconnected" because it can't reach
    the other peers.

    It can be a lock that hasn't been released; sadly, the only fix is to
    restart glusterd.

    It's very sad that all I can do to fix the "N" state is restart the
    daemon :\ but I could try adding another 2GB of RAM, going from 8GB to
    10GB.

    It's been quite a few years since I used GlusterFS, but back then at
    work we had fairly large Dell servers (64GB RAM, 2 CPUs with 8 cores
    each) with SAN-based storage as nodes, and the system degraded under
    heavy read/write load. In the end they were replaced by standard NFS
    servers, which gave more stability, with replication from one SAN to
    another; admittedly not a fully HA solution.

    I think GlusterFS needs more than 8GB of RAM to handle the spikes when
    the system runs backups. Granted, even without the backup job there
    isn't a lot of free memory (300-400MB), but at least no nodes go down!

    I'm looking at CephFS, but I've read it needs more RAM than what I use
    now, so... as I wrote above, I think for now I could try upgrading the
    RAM and running tests on GlusterFS, because changing a production
    cluster is no small feat. But I also know there are no future plans for
    Gluster, and I've heard it will be discontinued, so... CephFS will be
    the only alternative.

    gluster.org does write, for basic nodes: 2 CPUs, 4GB of RAM each, and a
    1 Gigabit network.

    I read that, but in a real production environment I think the CPU and
    RAM requirements are a little bit different...

    ^Bart
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From J.O. Aho@user@example.net to alt.os.linux on Tue Jul 8 09:10:54 2025
    From Newsgroup: alt.os.linux

    On 07/07/2025 22.52, ^Bart wrote:
    Have you seen anything in the logs? Maybe check
    /var/log/glusterfs/glusterd.log

    There's nothing wrong in normal operation, but when the system starts
    backing up the databases (more or less six of them), the node changes
    from Y to N, though only with the most important one (1.9GB); it works
    fine with the other databases up to 1.4GB. In the gluster logs I can
    read something like "the node is disconnected" because it can't reach
    the other peers.

    Could it be that you're hitting the maximum transfer rate of your
    network, so there isn't enough bandwidth left for both the file
    transfer and the peer health checks?

    Another possibility is that the nodes can't keep up with the incoming
    traffic for long: at 100% disk utilization, network traffic suffers too
    (I have seen this; when the disk is slow, everything else slows to a
    crawl and network connections fail, as everything gets queued up and
    the queued packets time out).
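    Both hypotheses are easy to check during a backup window; a rough
    sketch (iostat comes from the sysstat package, and eth0 is an assumed
    interface name):

```shell
# Disk side: %util pinned near 100 means the disk is the choke point:
iostat -x 5 3 | tail -n 40
# Network side: sample the kernel's receive-byte counter over 5 seconds:
rx1=$(awk -F'[: ]+' '/eth0/ {print $3}' /proc/net/dev); rx1=${rx1:-0}
sleep 5
rx2=$(awk -F'[: ]+' '/eth0/ {print $3}' /proc/net/dev); rx2=${rx2:-0}
echo "inbound: $(( (rx2 - rx1) / 5 )) bytes/s"
```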


    It can be a lock that hasn't been released; sadly, the only fix is to
    restart glusterd.

    It's very sad that all I can do to fix the "N" state is restart the
    daemon :\ but I could try adding another 2GB of RAM, going from 8GB to
    10GB.

    If the RAM is used as a cache before things are written to disk, this
    could help for a while, until the extra 2GB is used up. If you're lucky
    that's more than needed, and nothing bad will happen until you have
    more data to back up.


    It's been quite a few years since I used GlusterFS, but back then at
    work we had fairly large Dell servers (64GB RAM, 2 CPUs with 8 cores
    each) with SAN-based storage as nodes, and the system degraded under
    heavy read/write load. In the end they were replaced by standard NFS
    servers, which gave more stability, with replication from one SAN to
    another; admittedly not a fully HA solution.

    I think GlusterFS needs more than 8GB of RAM to handle the spikes when
    the system runs backups. Granted, even without the backup job there
    isn't a lot of free memory (300-400MB), but at least no nodes go down!

    I'm looking at CephFS, but I've read it needs more RAM than what I use
    now, so... as I wrote above, I think for now I could try upgrading the
    RAM and running tests on GlusterFS, because changing a production
    cluster is no small feat. But I also know there are no future plans for
    Gluster, and I've heard it will be discontinued, so... CephFS will be
    the only alternative.

    Yeah, it's a load of work; I think we had something like 48 hours of
    downtime when we switched from Gluster to NFS, and the customers
    weren't too happy.

    gluster.org does write, for basic nodes: 2 CPUs, 4GB of RAM each, and a
    1 Gigabit network.

    I read that, but in a real production environment I think the CPU and
    RAM requirements are a little bit different...

    It all depends on what you are doing; a small amount of CPU/RAM works
    fine in lab environments, since you usually don't have 300+ clients
    trying to write.
    --
    //Aho

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ^Bart@none@none.it to alt.os.linux on Sun Oct 26 21:25:51 2025
    From Newsgroup: alt.os.linux

    It all depends on what you are doing; a small amount of CPU/RAM works
    fine in lab environments, since you usually don't have 300+ clients
    trying to write.

    My CEO found the solution: the issue with syncing WordPress over
    GlusterFS was... millions of small files created by WordPress! :D

    If we delete these files from time to time, the cluster works well!

    We figured it out because even a simple du -sh on /var/www/ made the
    node go down!

    So... I'm sorry, GlusterFS, you weren't the guilty one! :D
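    That diagnosis fits how GlusterFS behaves: every file costs metadata
    round-trips between peers, so a tree walk like du stalls on huge file
    counts. A sketch for quantifying it before blaming the file system
    (the path is the thread's /var/www; the per-directory loop is just
    illustrative):

```shell
# Total file count under the web root -- millions here explains the du stalls:
count=$(find /var/www -type f 2>/dev/null | wc -l)
echo "$count files under /var/www"
# Directories with the most files (WordPress cache dirs are the usual suspects):
find /var/www -xdev -type d 2>/dev/null | while read -r d; do
    printf '%s %s\n' "$(find "$d" -maxdepth 1 -type f 2>/dev/null | wc -l)" "$d"
done | sort -rn | head
```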

    ^Bart
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to alt.os.linux on Sun Oct 26 20:56:16 2025
    From Newsgroup: alt.os.linux

    On Sun, 26 Oct 2025 21:25:51 +0100, ^Bart wrote:

    My CEO found the solution: the issue with syncing WordPress over
    GlusterFS was... millions of small files created by WordPress! :D

    If we delete these files from time to time, the cluster works well!

    Maybe if you delete WordPress altogether, your site would work even
    better ...

    I’ve been setting up WordPress for a client, and it’s funny how sluggish it is, immediately after you have got it going, before you have even done
    any actual work with it.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Joerg Walther@joerg.walther@magenta.de to alt.os.linux on Mon Oct 27 17:40:28 2025
    From Newsgroup: alt.os.linux

    Lawrence D’Oliveiro wrote:

    I’ve been setting up WordPress for a client, and it’s funny how
    sluggish it is, immediately after you have got it going, before you
    have even done any actual work with it.

    Shared hosting most likely? Usually it is possible to give your WP
    instance more CPU power for a couple of bucks.

    -jw-
    --
    And now for something completely different...
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to alt.os.linux on Mon Oct 27 21:15:21 2025
    From Newsgroup: alt.os.linux

    On Mon, 27 Oct 2025 17:40:28 +0100, Joerg Walther wrote:

    Lawrence D’Oliveiro wrote:

    I’ve been setting up WordPress for a client, and it’s funny how
    sluggish it is, immediately after you have got it going, before you
    have even done any actual work with it.

    Shared hosting most likely?

    Nope. Dedicated in-house VM, on pretty decent hardware (lots of RAM
    and disk, CPU cores in the dozens) under XCP-ng. There are other
    company intranet apps running on other VMs, written (by me) in both
    PHP and Python, and they all work much more snappily than WordPress.
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From J.O. Aho@user@example.net to alt.os.linux on Tue Oct 28 09:10:17 2025
    From Newsgroup: alt.os.linux

    On 27/10/2025 22.15, Lawrence D’Oliveiro wrote:
    On Mon, 27 Oct 2025 17:40:28 +0100, Joerg Walther wrote:

    Lawrence D’Oliveiro wrote:

    I’ve been setting up WordPress for a client, and it’s funny how
    sluggish it is, immediately after you have got it going, before you
    have even done any actual work with it.

    Shared hosting most likely?

    Nope. Dedicated in-house VM, on pretty decent hardware (lots of RAM
    and disk, CPU cores in the dozens) under XCP-ng.

    It's not so much about how much the host machine has; it's how much
    you've assigned to the VM, and of course whether you have
    over-provisioned.

    There are other
    company intranet apps running on other VMs, written (by me) in both
    PHP and Python, and they all work much more snappily than WordPress.

    Are they the same size and as database-dependent as WP?
    The database may also be the bottleneck; since I don't know anything
    about your setup, it's difficult to say what could be wrong...

    There is so much more that can differ in a VM compared to running on
    bare metal. For example, memcached had big issues some years ago when
    run in a VM: it was just extremely slow, while on a smaller bare-metal
    instance it was snappy as hell. The reason was how memory was handled
    by the virtualization layer.
    --
    //Aho

    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D’Oliveiro@ldo@nz.invalid to alt.os.linux on Tue Oct 28 23:09:55 2025
    From Newsgroup: alt.os.linux

    On Tue, 28 Oct 2025 09:10:17 +0100, J.O. Aho wrote:

    Are they at the same size and as much db dependent as WP?

    Database-dependent -- several of them are quite heavily so.

    Same size and complexity ... obviously not.
    --- Synchronet 3.21a-Linux NewsLink 1.2