Forum: War Ensemble BBS

C++20 futex with heavy contention slower than mutex

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Fri Mar 29 14:14:11 2024

From Newsgroup: comp.lang.c++

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.

#include <iostream>
#include <thread>
#include <mutex>
#include <atomic>
#include <functional>
#include <chrono>
#include <vector>

using namespace std;
using namespace chrono;

int main()
{
constexpr int64_t ROUNDS = 10000;
auto bench = [&]( char const *head, auto fn )
{
cout << head << endl;
atomic_int64_t tSum( 0 );
vector<jthread> threads;
int hc = jthread::hardware_concurrency();
for( int nThreads = 1; nThreads <= hc; ++nThreads )
{
for( unsigned t = nThreads; t; --t )
threads.emplace_back( [&]()
{
auto start = high_resolution_clock::now();
int64_t cmp = 0;
for( int64_t r = ROUNDS; r; --r )
cmp = fn( cmp );
tSum += duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count();
} );
threads.resize( 0 );
cout << "\t" << nThreads << ": " << tSum /
((double)nThreads * ROUNDS) << endl;
}
};
{
mutex mtx;
bench( "mutex: ",
[&]( int64_t )
{
mtx.lock();
mtx.unlock();
return 0;
} );
}
{
atomic_int64_t futex( 0 );
constexpr int64_t HIBIT = numeric_limits<int64_t>::min();
bench( "futex:",
[&]( int64_t cmp )
{
int64_t niu;
for( ; ; )
if( cmp >= 0 )
if( futex.compare_exchange_weak( cmp, niu = cmp
| HIBIT, memory_order_acquire, memory_order_relaxed ) )
{
cmp = niu;
break;
}
else;
else
if( futex.compare_exchange_weak( cmp, niu = cmp
+ 1, memory_order_relaxed, memory_order_relaxed ) )
{
futex.wait( niu, memory_order_acquire );
cmp = futex.load( memory_order_relaxed );
}
for( ; ; )
if( cmp & ~HIBIT )
if( futex.compare_exchange_weak( cmp, niu =
(cmp - 1) & ~HIBIT, memory_order_release, memory_order_relaxed ) )
{
cmp = niu;
futex.notify_one();
break;
}
else;
else
if( futex.compare_exchange_weak( cmp, 0, memory_order_release, memory_order_relaxed ) )
{
cmp = 0;
break;
}
return cmp;
} );
}
}

This are the results under Ubuntu 20.04 LTS:

mutex:
1: 3.9134
2: 5.8706
3: 33.1753
4: 48.63
5: 71.6777
6: 109.556
7: 151.271
8: 208.871
9: 288.031
10: 353.536
11: 446.28
12: 563.841
13: 701.134
14: 833.08
15: 983.734
16: 1138.48
17: 1297.53
18: 1481.15
19: 1664.51
20: 1857.88
21: 2053.83
22: 2285.07
23: 2546.67
24: 2782.53
25: 3065.25
26: 3349.4
27: 3652.06
28: 3971.25
29: 4339.13
30: 4727.31
31: 5129.95
32: 5499.05
futex:
1: 3.3654
2: 5.02155
3: 6.69107
4: 15.2522
5: 16.8824
6: 185.235
7: 162.176
8: 360.421
9: 2272.83
10: 4805.11
11: 8987.24
12: 14225.5
13: 20076.4
14: 27199.4
15: 36711.8
16: 49480.8
17: 57641.8
18: 75633.2
19: 97196.6
20: 122223
21: 147617
22: 180377
23: 216084
24: 256032
25: 299681
26: 345518
27: 398785
28: 445499
29: 506351
30: 572440
31: 643455
32: 721461

With all threads locking and unlocking the conventional mutex is 131
timea faster.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Mar 29 15:28:16 2024

From Newsgroup: comp.lang.c++

On 3/29/2024 6:14 AM, Bonita Montero wrote:

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex explodes and the conventional mutex is faster with Windows as and with
Linux.

[...]

A futex is not a mutex!

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Mar 29 15:30:48 2024

From Newsgroup: comp.lang.c++

On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:

On 3/29/2024 6:14 AM, Bonita Montero wrote:

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.

[...]

A futex is not a mutex!

I have to take a look at the logic you used for your "mutex" based on a
futex, not enough time right now. Perhaps, the std::mutex is just way
more efficient that your use of a futex? Keep in mind futexs are tricky:

https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf

;^)

It depends on the logic you use.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Mar 29 16:29:43 2024

From Newsgroup: comp.lang.c++

On 3/29/2024 3:30 PM, Chris M. Thomasson wrote:

On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:

On 3/29/2024 6:14 AM, Bonita Montero wrote:

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.

[...]

A futex is not a mutex!

I have to take a look at the logic you used for your "mutex" based on a futex, not enough time right now. Perhaps, the std::mutex is just way
more efficient that your use of a futex? Keep in mind futexs are tricky:

https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf

;^)

It depends on the logic you use.

You generally want a "waitbit" type of logic... I posted a mutex based
on a futex here a while back. I just need to find it. Iirc, it used a
Windows "futex"... If I can find it, you should test against it. It used
NO compare and swap. Ugghhhh.... Try to avoid that.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Mar 29 19:46:29 2024

From Newsgroup: comp.lang.c++

On 3/29/2024 4:29 PM, Chris M. Thomasson wrote:

On 3/29/2024 3:30 PM, Chris M. Thomasson wrote:

On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:

On 3/29/2024 6:14 AM, Bonita Montero wrote:

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with >>>> Linux.

[...]

A futex is not a mutex!

I have to take a look at the logic you used for your "mutex" based on
a futex, not enough time right now. Perhaps, the std::mutex is just
way more efficient that your use of a futex? Keep in mind futexs are
tricky:

https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf

;^)

It depends on the logic you use.

You generally want a "waitbit" type of logic... I posted a mutex based
on a futex here a while back. I just need to find it. Iirc, it used a Windows "futex"... If I can find it, you should test against it. It used
NO compare and swap. Ugghhhh.... Try to avoid that.

Another key, try to avoid waiting on a futex, strive for that. Maximize
the fast paths, and minimize the slow paths.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Mar 30 05:18:35 2024

From Newsgroup: comp.lang.c++

Am 29.03.2024 um 23:28 schrieb Chris M. Thomasson:

On 3/29/2024 6:14 AM, Bonita Montero wrote:

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.

[...]

A futex is not a mutex!

You can build a mutex on top of a futex and that's the typical usage.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Mar 30 09:25:26 2024

From Newsgroup: comp.lang.c++

Am 30.03.2024 um 00:29 schrieb Chris M. Thomasson:

On 3/29/2024 3:30 PM, Chris M. Thomasson wrote:

On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:

On 3/29/2024 6:14 AM, Bonita Montero wrote:

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with >>>> Linux.

[...]

A futex is not a mutex!

I have to take a look at the logic you used for your "mutex" based on
a futex, not enough time right now. Perhaps, the std::mutex is just
way more efficient that your use of a futex? Keep in mind futexs are
tricky:

https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf

;^)

It depends on the logic you use.

You generally want a "waitbit" type of logic...

The sign-bit of my uint64 shows that signs that the lock is occupied
and the 63 lower bits sign the number of threads waiting to get owner-
ship. But i simplified the code a lot by using just an atomic_bool.
Now under Windows the futex is superior from 12 threads on. But under
Linux the futex is slower except for one or two threads.

Windows:

1: 248%
2: 54%
3: 27%
4: 28%
5: 33%
6: 31%
7: 37%
8: 37%
9: 53%
10: 42%
11: 82%
12: 101%
13: 116%
14: 142%
15: 153%
16: 160%
17: 175%
18: 191%
19: 192%
20: 194%
21: 206%
22: 211%
23: 221%
24: 220%
25: 223%
26: 220%
27: 220%
28: 223%
29: 222%
30: 220%
31: 219%
32: 216%

Linux:

1: 121%
2: 106%
3: 67%
4: 99%
5: 87%
6: 18%
7: 13%
8: 12%
9: 10%
10: 10%
11: 9%
12: 9%
13: 8%
14: 8%
15: 8%
16: 8%
17: 8%
18: 8%
19: 8%
20: 8%
21: 7%
22: 7%
23: 8%
24: 7%
25: 7%
26: 7%
27: 7%
28: 7%
29: 7%
30: 7%
31: 7%
32: 6%

#include <iostream>
#include <thread>
#include <atomic>
#include <chrono>
#include <vector>
#include <mutex>

using namespace std;
using namespace chrono;

int main()
{
unsigned hc = jthread::hardware_concurrency();
auto test = [&]( char const *head, auto fn, vector<double> &times )
{
cout << head << endl;
constexpr size_t ROUNDS = 10'000;
atomic_int64_t tSum, roundsSum;
vector<jthread> threads;
for( unsigned nThreads = 1; nThreads <= hc; ++nThreads )
{
auto st = []<typename T, integral X>( atomic<T> &a, X v ) { return
a.store( (T)v, memory_order_relaxed ); };
st( tSum, 0 );
st( roundsSum, 0 );
atomic_bool stop( false );
auto ld = []<typename T>( atomic<T> &a ) { return a.load(
memory_order_relaxed ); };
for( unsigned t = nThreads; t; --t )
threads.emplace_back( [&]
{
auto start = high_resolution_clock::now();
int64_t rounds = 0;
for( ; !ld( stop ); fn(), ++rounds );
auto add = []( atomic_int64_t &a, int64_t add ) { a.fetch_add(
add, memory_order_relaxed ); };
add( tSum, duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() );
add( roundsSum, rounds );
} );
this_thread::sleep_for( 100ms );
st( stop, true );
threads.resize( 0 );
times.emplace_back( ld( tSum ) / (double)ld( roundsSum ) );
cout << "\t" << nThreads << ": " << times.back() << endl;
}
};
vector<double> mutexTimes, futexTimes;
{
mutex mtx;
test( "mutex:",
[&]
{
mtx.lock();
mtx.unlock();
}, mutexTimes );
}
{
atomic_bool futex( false );
test( "futex:",
[&]
{
for( bool cmp; !futex.compare_exchange_weak( cmp = false, true,
memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
}
cout << "relation: " << endl;
for( size_t i = 0; i != hc; ++i )
cout << "\t" << i + 1 << ": " << (int)(100.0 * mutexTimes[i] / futexTimes[i] + 0.5) << "%" << endl;
}
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Mar 30 11:51:59 2024

From Newsgroup: comp.lang.c++

On 3/30/2024 1:25 AM, Bonita Montero wrote:

{
                for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
                    futex.wait( true, memory_order_relaxed );
                futex.store( false, memory_order_release );
                futex.notify_one();
            }, futexTimes );

You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a
hard time finding the damn thing in comp.lang.c++. I will try to find it.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Mar 30 12:00:31 2024

From Newsgroup: comp.lang.c++

On 3/29/2024 6:14 AM, Bonita Montero wrote:

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex explodes and the conventional mutex is faster with Windows as and with
Linux.

[...]

I think I found my highly experimental windows futex based mutex:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

(read all to refresh the context... ;^)

Notice:
___________________
void unlock()
{
if (InterlockedExchange(&m_state, 0) == 2)
{
WakeByAddressSingle(&m_state);
}
}
___________________

We only signal when we have to!

copy and paste from the link. Sorry about the copy and paste destroying
all of my indentation, argh!

Can you run this Bonita? Fwiw, it would be my first time using a Windows
Futex! lol:
______________________________________
#include <iostream>
#include <thread>
#include <functional>

#define WIN32_LEAN_AND_MEAN
#include <Windows.h>

#define CT_L2_ALIGNMENT 128
#define CT_THREADS 32
#define CT_ITERS 6666666

struct ct_futex_mutex
{
ULONG alignas(CT_L2_ALIGNMENT) m_state;

ct_futex_mutex() : m_state(0)
{

}

void lock()
{
if (InterlockedExchange(&m_state, 1))
{
while (InterlockedExchange(&m_state, 2))
{
ULONG cmp = 2;
WaitOnAddress(&m_state, &cmp, sizeof(ULONG), INFINITE);
}
}
}

void unlock()
{
if (InterlockedExchange(&m_state, 0) == 2)
{
WakeByAddressSingle(&m_state);
}
}
};

struct ct_shared
{
ct_futex_mutex m_mtx;
unsigned long m_count;

ct_shared() : m_count(0) {}

~ct_shared()
{
if (m_count != 0)
{
std::cout << "counter is totally fubar!\n";
}
}
};

void ct_thread(ct_shared& shared)
{
for (unsigned long i = 0; i < CT_ITERS; ++i)
{
shared.m_mtx.lock();
++shared.m_count;
shared.m_mtx.unlock();

shared.m_mtx.lock();
--shared.m_count;
shared.m_mtx.unlock();
}
}

int main()
{
std::thread threads[CT_THREADS];

std::cout << "Starting up...\n";

{
ct_shared shared;

for (unsigned long i = 0; i < CT_THREADS; ++i)
{
threads[i] = std::thread(ct_thread, std::ref(shared));
}

std::cout << "Running...\n";

for (unsigned long i = 0; i < CT_THREADS; ++i)
{
threads[i].join();
}
}

std::cout << "Completed!\n";

return 0;
}
______________________________________

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Mar 30 12:02:28 2024

From Newsgroup: comp.lang.c++

On 3/30/2024 11:51 AM, Chris M. Thomasson wrote:

On 3/30/2024 1:25 AM, Bonita Montero wrote:

{
                 for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
                     futex.wait( true, memory_order_relaxed );
                 futex.store( false, memory_order_release ); >>                  futex.notify_one();
             }, futexTimes );

You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a hard time finding the damn thing in comp.lang.c++. I will try to find it.

I found my older experimental code, take a deep look at ct_futex_mutex:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Mar 30 21:18:06 2024

From Newsgroup: comp.lang.c++

Am 30.03.2024 um 19:51 schrieb Chris M. Thomasson:

You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a hard time finding the damn thing in comp.lang.c++. I will try to find it.

Futex notify and wait are at least very cheap with Windows.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Mar 31 13:11:52 2024

From Newsgroup: comp.lang.c++

On 3/30/2024 1:18 PM, Bonita Montero wrote:

Am 30.03.2024 um 19:51 schrieb Chris M. Thomasson:

You need to introduce some sort of waitbit logic to minimize calls to
futex.notify_one() and futex.wait(). I posted an example but am having
a hard time finding the damn thing in comp.lang.c++. I will try to
find it.

Futex notify and wait are at least very cheap with Windows.

Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on every
mutex unlock, and wait on every point of contention wrt lock. Interesting.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Mar 31 22:01:55 2024

From Newsgroup: comp.lang.c++

On 3/30/2024 11:51 AM, Chris M. Thomasson wrote:

On 3/30/2024 1:25 AM, Bonita Montero wrote:

{
                 for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
                     futex.wait( true, memory_order_relaxed );
                 futex.store( false, memory_order_release ); >>                  futex.notify_one();
             }, futexTimes );

You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a hard time finding the damn thing in comp.lang.c++. I will try to find it.

Also, does notify_one automatically imply release semantics?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Mar 31 22:02:23 2024

From Newsgroup: comp.lang.c++

On 3/31/2024 10:01 PM, Chris M. Thomasson wrote:

On 3/30/2024 11:51 AM, Chris M. Thomasson wrote:

On 3/30/2024 1:25 AM, Bonita Montero wrote:

{
                 for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
                     futex.wait( true, memory_order_relaxed );
                 futex.store( false, memory_order_release );
                 futex.notify_one();
             }, futexTimes );

You need to introduce some sort of waitbit logic to minimize calls to
futex.notify_one() and futex.wait(). I posted an example but am having
a hard time finding the damn thing in comp.lang.c++. I will try to
find it.

Also, does notify_one automatically imply release semantics?

I don't think so, humm. Cannot remember.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 1 08:32:08 2024

From Newsgroup: comp.lang.c++

Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on every mutex unlock, and wait on every point of contention wrt lock. Interesting.

I'm using notify and wait only when there's contention.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Mar 31 23:40:28 2024

From Newsgroup: comp.lang.c++

On 3/31/2024 11:32 PM, Bonita Montero wrote:

Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on
every mutex unlock, and wait on every point of contention wrt lock.
Interesting.

I'm using notify and wait only when there's contention.

Even here?
_____________
for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
____________

What am I missing?

futex.store( false, memory_order_release );
futex.notify_one();

?

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 1 10:57:32 2024

From Newsgroup: comp.lang.c++

Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:

On 3/31/2024 11:32 PM, Bonita Montero wrote:

Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on
every mutex unlock, and wait on every point of contention wrt lock.
Interesting.

I'm using notify and wait only when there's contention.

Even here?
_____________
                 for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
                     futex.wait( true, memory_order_relaxed );
                 futex.store( false, memory_order_release );
                 futex.notify_one();
             }, futexTimes );
____________

What am I missing?

                 futex.store( false, memory_order_release );
                 futex.notify_one();

?

Ok, I was talking about my older version.
But notify is fast.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 1 11:01:05 2024

From Newsgroup: comp.lang.c++

Am 01.04.2024 um 10:57 schrieb Bonita Montero:

Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:

On 3/31/2024 11:32 PM, Bonita Montero wrote:

Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on
every mutex unlock, and wait on every point of contention wrt lock.
Interesting.

I'm using notify and wait only when there's contention.

Even here?
_____________
                  for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
                      futex.wait( true, memory_order_relaxed );
                  futex.store( false, memory_order_release );
                  futex.notify_one();
              }, futexTimes );
____________

What am I missing?

                  futex.store( false, memory_order_release );
                  futex.notify_one();

?

Ok, I was talking about my older version.
But notify is fast.

An uncontended notify is 2.5ns on my Zen4 computer.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Michael S@already5chosen@yahoo.com to comp.lang.c++ on Mon Apr 1 12:18:00 2024

From Newsgroup: comp.lang.c++

On Fri, 29 Mar 2024 14:14:11 +0100
Bonita Montero <Bonita.Montero@gmail.com> wrote:

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex explodes and the conventional mutex is faster with Windows as and with
Linux.

In case of heavy contention what to consider 'faster' is not at all
obvious.

On lightly loaded system with more cores than work to do (a typical
client) 'faster' means faster forward progress of group of contending
threads. Achieved by very long polling before switching to wait,
probably up to several tens of usec and by hyperactive tickless OS
scheduler.

On heavily loaded system with much more work to do than available
cores, 'faster' means more work done by unrelated threads and processes. Achieved by very short polling before switching to wait, probably less
than 500 nsec and by 'passive' OS scheduler that rarely intervenes
outside of clock tick.

And of course there are cases in the middle.

And then traditional HPC with MPI that is completely different kettle of
fish.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Apr 1 12:27:31 2024

From Newsgroup: comp.lang.c++

On 4/1/2024 2:01 AM, Bonita Montero wrote:

Am 01.04.2024 um 10:57 schrieb Bonita Montero:

Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:

On 3/31/2024 11:32 PM, Bonita Montero wrote:

Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on
every mutex unlock, and wait on every point of contention wrt lock. >>>>> Interesting.

I'm using notify and wait only when there's contention.

Even here?
_____________
                  for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
                      futex.wait( true, memory_order_relaxed );
                  futex.store( false, memory_order_release );
                  futex.notify_one();
              }, futexTimes );
____________

What am I missing?

                  futex.store( false, memory_order_release );
                  futex.notify_one();

?

Ok, I was talking about my older version.
But notify is fast.

An uncontended notify is 2.5ns on my Zen4 computer.

For the Microsoft version, right? I don't have Linux installed at the
moment.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Tue Apr 2 04:37:47 2024

From Newsgroup: comp.lang.c++

Am 01.04.2024 um 21:27 schrieb Chris M. Thomasson:

On 4/1/2024 2:01 AM, Bonita Montero wrote:

Am 01.04.2024 um 10:57 schrieb Bonita Montero:

Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:

On 3/31/2024 11:32 PM, Bonita Montero wrote:

Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

Well, I have not taken a look at Microsoft's Futex implementation. >>>>>> Afaict, your code showcases that aspect wrt calling into notify on >>>>>> every mutex unlock, and wait on every point of contention wrt
lock. Interesting.

I'm using notify and wait only when there's contention.

Even here?
_____________
                  for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
                      futex.wait( true, memory_order_relaxed );
                  futex.store( false, memory_order_release );
                  futex.notify_one();
              }, futexTimes );
____________

What am I missing?

                  futex.store( false, memory_order_release );
                  futex.notify_one();

?

Ok, I was talking about my older version.
But notify is fast.

An uncontended notify is 2.5ns on my Zen4 computer.

For the Microsoft version, right? I don't have Linux installed at the moment.

Linux with libstdc++ is 1.7ns.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Apr 4 19:56:13 2024

From Newsgroup: comp.lang.c++

On 4/1/2024 2:18 AM, Michael S wrote:

On Fri, 29 Mar 2024 14:14:11 +0100
Bonita Montero <Bonita.Montero@gmail.com> wrote:

The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.

In case of heavy contention what to consider 'faster' is not at all
obvious.

On lightly loaded system with more cores than work to do (a typical
client) 'faster' means faster forward progress of group of contending threads. Achieved by very long polling before switching to wait,
probably up to several tens of usec and by hyperactive tickless OS
scheduler.

On heavily loaded system with much more work to do than available
cores, 'faster' means more work done by unrelated threads and processes. Achieved by very short polling before switching to wait, probably less
than 500 nsec and by 'passive' OS scheduler that rarely intervenes
outside of clock tick.

And of course there are cases in the middle.

And then traditional HPC with MPI that is completely different kettle of fish.

Ditto. However, I did find it interesting (20+ years ago) to test worst
case scenarios. What happens when I blast this mutex algorithm A with artificial load vs algo B forcing it into horrible periods of radical contention. How do they handle it...
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Apr 12 14:16:52 2024

From Newsgroup: comp.lang.c++

On 4/1/2024 7:37 PM, Bonita Montero wrote:
[...]

Ok, I was talking about my older version.
But notify is fast.

An uncontended notify is 2.5ns on my Zen4 computer.

Keep in mind that notify and wait are slow paths. One needs to strive to minimize calls to notify and wait. Calling notify and/or wait when they
do _have_ to be called is a rather inefficient design.

For the Microsoft version, right? I don't have Linux installed at the
moment.

Linux with libstdc++ is 1.7ns.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Apr 13 03:00:21 2024

From Newsgroup: comp.lang.c++

Am 12.04.2024 um 23:16 schrieb Chris M. Thomasson:

On 4/1/2024 7:37 PM, Bonita Montero wrote:
[...]

Ok, I was talking about my older version.
But notify is fast.

An uncontended notify is 2.5ns on my Zen4 computer.

Keep in mind that notify and wait are slow paths.

There's no slow path with futexes since you've to notify _always_.
And 1.7ns isn't slow.

One needs to strive to minimize calls to notify and wait. Calling notify and/or wait when they
do _have_ to be called is a rather inefficient design.

For the Microsoft version, right? I don't have Linux installed at the
moment.

Linux with libstdc++ is 1.7ns.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 12:26:55 2024

From Newsgroup: comp.lang.c++

On 4/12/2024 6:00 PM, Bonita Montero wrote:

Am 12.04.2024 um 23:16 schrieb Chris M. Thomasson:

On 4/1/2024 7:37 PM, Bonita Montero wrote:
[...]

Ok, I was talking about my older version.
But notify is fast.

An uncontended notify is 2.5ns on my Zen4 computer.

Keep in mind that notify and wait are slow paths.

There's no slow path with futexes since you've to notify _always_.

Huh? Nope! Where did you get that idea from? calling notify is a slow
path, and calling wait is a slow path.

And 1.7ns isn't slow.

You do not want to _always_ call notify and/or wait when you do not have to.

One needs to strive to minimize calls to notify and wait. Calling
notify and/or wait when they do _have_ to be called is a rather
inefficient design.

For the Microsoft version, right? I don't have Linux installed at
the moment.

Linux with libstdc++ is 1.7ns.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 06:19:16 2024

From Newsgroup: comp.lang.c++

Am 13.04.2024 um 21:26 schrieb Chris M. Thomasson:

On 4/12/2024 6:00 PM, Bonita Montero wrote:

There's no slow path with futexes since you've to notify _always_.

Huh? Nope! Where did you get that idea from? calling notify is a slow
path, and calling wait is a slow path.

Unlocking a futex always involves a notify since the notify doesn't
depend on a state like a wait().

And 1.7ns isn't slow.

You do not want to _always_ call notify and/or wait when you do not have
to.

You don't unterstand futexes. Show me your mutex basing on C++20 futexes.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 21:26:12 2024

From Newsgroup: comp.lang.c++

On 4/13/2024 9:19 PM, Bonita Montero wrote:

Am 13.04.2024 um 21:26 schrieb Chris M. Thomasson:

On 4/12/2024 6:00 PM, Bonita Montero wrote:

There's no slow path with futexes since you've to notify _always_.

Huh? Nope! Where did you get that idea from? calling notify is a slow
path, and calling wait is a slow path.

Unlocking a futex always involves a notify since the notify doesn't
depend on a state like a wait().

And 1.7ns isn't slow.

You do not want to _always_ call notify and/or wait when you do not
have to.

You don't unterstand futexes. Show me your mutex basing on C++20 futexes.

Did you read it? using windwos futexes:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

Notice how it does not unlock all the time?

I can port it to c++20.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 06:30:52 2024

From Newsgroup: comp.lang.c++

Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

On 4/13/2024 9:19 PM, Bonita Montero wrote:

You don't unterstand futexes. Show me your mutex basing on C++20 futexes.

Did you read it? using windwos futexes: https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.

Show me your mutex with a futex.
It's impossible to write that without notifying *always*.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 06:34:09 2024

From Newsgroup: comp.lang.c++

Am 14.04.2024 um 06:30 schrieb Bonita Montero:

Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

On 4/13/2024 9:19 PM, Bonita Montero wrote:

You don't unterstand futexes. Show me your mutex basing on C++20
futexes.

Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.

Show me your mutex with a futex.
It's impossible to write that without notifying *always*.

That's the simplest code in C++20:

struct fute_xchg_mutex
{
fute_xchg_mutex();
void lock();
void unlock();
private:
atomic_bool m_locked;
};

fute_xchg_mutex::fute_xchg_mutex() :
m_locked( false )
{
}

void fute_xchg_mutex::lock()
{
while( m_locked.exchange( true, memory_order_acquire ) )
m_locked.wait( true, memory_order_relaxed );
}

void fute_xchg_mutex::unlock()
{
m_locked.store( false, memory_order_release );
m_locked.notify_one();
}

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:21:22 2024

From Newsgroup: comp.lang.c++

On 4/13/2024 9:30 PM, Bonita Montero wrote:

Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

On 4/13/2024 9:19 PM, Bonita Montero wrote:

You don't unterstand futexes. Show me your mutex basing on C++20
futexes.

Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.

Show me your mutex with a futex.
It's impossible to write that without notifying *always*.

I beg to disagree.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:21:42 2024

From Newsgroup: comp.lang.c++

On 4/13/2024 9:34 PM, Bonita Montero wrote:

Am 14.04.2024 um 06:30 schrieb Bonita Montero:

Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

On 4/13/2024 9:19 PM, Bonita Montero wrote:

You don't unterstand futexes. Show me your mutex basing on C++20
futexes.

Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.

Show me your mutex with a futex.
It's impossible to write that without notifying *always*.

That's the simplest code in C++20:

struct fute_xchg_mutex
{
    fute_xchg_mutex();
    void lock();
    void unlock();
private:
    atomic_bool m_locked;
};

fute_xchg_mutex::fute_xchg_mutex() :
    m_locked( false )
{
}

void fute_xchg_mutex::lock()
{
    while( m_locked.exchange( true, memory_order_acquire ) )
        m_locked.wait( true, memory_order_relaxed );
}

void fute_xchg_mutex::unlock()
{
    m_locked.store( false, memory_order_release );
    m_locked.notify_one();

^^^^^^^^^^

NONONNOONONO!

}

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:22:29 2024

From Newsgroup: comp.lang.c++

On 4/13/2024 9:30 PM, Bonita Montero wrote:

Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

On 4/13/2024 9:19 PM, Bonita Montero wrote:

You don't unterstand futexes. Show me your mutex basing on C++20
futexes.

Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.

Show me your mutex with a futex.
It's impossible to write that without notifying *always*.

this is a complete troll, right?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 07:30:59 2024

From Newsgroup: comp.lang.c++

Am 14.04.2024 um 07:22 schrieb Chris M. Thomasson:

On 4/13/2024 9:30 PM, Bonita Montero wrote:

Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

On 4/13/2024 9:19 PM, Bonita Montero wrote:

You don't unterstand futexes. Show me your mutex basing on C++20
futexes.

Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.

Show me your mutex with a futex.
It's impossible to write that without notifying *always*.

this is a complete troll, right?

You don't know futexes.
Show me your code with a mutex basing on a futex.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:32:27 2024

From Newsgroup: comp.lang.c++

On 4/13/2024 10:30 PM, Bonita Montero wrote:

Am 14.04.2024 um 07:22 schrieb Chris M. Thomasson:

On 4/13/2024 9:30 PM, Bonita Montero wrote:

Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

On 4/13/2024 9:19 PM, Bonita Montero wrote:

You don't unterstand futexes. Show me your mutex basing on C++20
futexes.

Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ >>>> Notice how it does not unlock all the time?
I can port it to c++20.

Show me your mutex with a futex.
It's impossible to write that without notifying *always*.

this is a complete troll, right?

You don't know futexes.
Show me your code with a mutex basing on a futex.

I did. Read again, well Windows Futex:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

Do you really want me to port this to c++20?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 07:36:25 2024

From Newsgroup: comp.lang.c++

Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:

On 4/13/2024 10:30 PM, Bonita Montero wrote:

Am 14.04.2024 um 07:22 schrieb Chris M. Thomasson:

On 4/13/2024 9:30 PM, Bonita Montero wrote:

Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

On 4/13/2024 9:19 PM, Bonita Montero wrote:

You don't unterstand futexes. Show me your mutex basing on C++20
futexes.

Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ >>>>> Notice how it does not unlock all the time?
I can port it to c++20.

Show me your mutex with a futex.
It's impossible to write that without notifying *always*.

this is a complete troll, right?

You don't know futexes.
Show me your code with a mutex basing on a futex.

I did. Read again, well Windows Futex:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

Do you really want me to port this to c++20?

Show it here with correct formatting.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:42:57 2024

From Newsgroup: comp.lang.c++

On 4/13/2024 10:36 PM, Bonita Montero wrote:

Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:

[...]

I did. Read again, well Windows Futex:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

Do you really want me to port this to c++20?

Show it here with correct formatting.

Do you mean, indentation? ;^)

Anyway, you commented in the thread I linked you to.

So, you can read my code, right?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 07:46:08 2024

From Newsgroup: comp.lang.c++

Am 14.04.2024 um 07:42 schrieb Chris M. Thomasson:

On 4/13/2024 10:36 PM, Bonita Montero wrote:

Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:

[...]

I did. Read again, well Windows Futex:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

Do you really want me to port this to c++20?

Show it here with correct formatting.

Do you mean, indentation? ;^)

Anyway, you commented in the thread I linked you to.

So, you can read my code, right?

You don't understand futexes since you call notifying with
the slow path slow. But notifying is even not slow if there
are contenders with futexes.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:46:33 2024

From Newsgroup: comp.lang.c++

On 4/13/2024 10:42 PM, Chris M. Thomasson wrote:

On 4/13/2024 10:36 PM, Bonita Montero wrote:

Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:

[...]

I did. Read again, well Windows Futex:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

Do you really want me to port this to c++20?

Show it here with correct formatting.

Do you mean, indentation? ;^)

Anyway, you commented in the thread I linked you to.

So, you can read my code, right?

I use the explicit Windows API for a futex. Think of:

https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress

;^)
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:48:45 2024

From Newsgroup: comp.lang.c++

On 4/13/2024 10:46 PM, Bonita Montero wrote:

Am 14.04.2024 um 07:42 schrieb Chris M. Thomasson:

On 4/13/2024 10:36 PM, Bonita Montero wrote:

Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:

[...]

I did. Read again, well Windows Futex:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ >>>>
Do you really want me to port this to c++20?

Show it here with correct formatting.

Do you mean, indentation? ;^)

Anyway, you commented in the thread I linked you to.

So, you can read my code, right?

You don't understand futexes since you call notifying with
the slow path slow. But notifying is even not slow if there
are contenders with futexes.

What? I am quite sure you read this right?
____________________
void unlock()
{
if (InterlockedExchange(&m_state, 0) == 2)
{
WakeByAddressSingle(&m_state);
}
}
____________________

Take careful notice how I try to avoid a call into WakeByAddressSingle?
See? ;^o

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 11:25:33 2024

From Newsgroup: comp.lang.c++

Am 14.04.2024 um 07:48 schrieb Chris M. Thomasson:

On 4/13/2024 10:46 PM, Bonita Montero wrote:

Am 14.04.2024 um 07:42 schrieb Chris M. Thomasson:

On 4/13/2024 10:36 PM, Bonita Montero wrote:

Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:

[...]

I did. Read again, well Windows Futex:

https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ >>>>>
Do you really want me to port this to c++20?

Show it here with correct formatting.

Do you mean, indentation? ;^)

Anyway, you commented in the thread I linked you to.

So, you can read my code, right?

You don't understand futexes since you call notifying with
the slow path slow. But notifying is even not slow if there
are contenders with futexes.

What? I am quite sure you read this right?
____________________
void unlock()
{
    if (InterlockedExchange(&m_state, 0) == 2)
    {
        WakeByAddressSingle(&m_state);
    }
}
____________________

Take careful notice how I try to avoid a call into WakeByAddressSingle?
See? ;^o

This is from a benchmark of two constantly contending threads:

atomic with locker counter + std::binary_semaphore:
10.4597
atoomic with locker counter + shown futephore:
14.9949
Chris Code:
18.6445
shown atomic_bool with always wakeup:
24.7084
std::mutex:
33.8646

The numbers are in nanoseconds for a locking unlocking iteration.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 15:12:17 2024

From Newsgroup: comp.lang.c++

That's my code:

#include <iostream>
#include <mutex>
#include <thread>
#include <chrono>
#include <semaphore>
#include <algorithm>
#include <vector>

using namespace std;
using namespace chrono;

int main()
{
using kv = pair<char const *, double>;
vector<kv> results;
auto bench = [&]( char const *what, auto fn )
{
constexpr size_t ROUNDS = 10'000'000;
atomic_int64_t tSum = 0;
auto thr = [&]()
{
auto start = high_resolution_clock::now();
for( size_t r = ROUNDS; r; --r )
fn();
tSum.fetch_add( duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() );
};
jthread jtA( thr ), jtB( thr );
jtA.join(), jtB.join();
results.emplace_back( what, tSum / ((double)ROUNDS * 2.0) );
};
// atomic coutner with futex-semaphore
atomic_uint32_t
lockCounter( 0 ),
futSem( 0 );
bench( "atomic coutner with futex-semaphore", [&]
{
if( lockCounter.fetch_add( 1, memory_order_acquire ) > 0 )
for( uint32_t before = futSem.load(
memory_order_relaxed ); ; )
if( !before )
futSem.wait( 0, memory_order_relaxed ),
before = futSem.load( memory_order_relaxed );
else if( futSem.compare_exchange_strong( before,
before - 1, memory_order_acquire, memory_order_relaxed ) ) [[likely]]
break;
if( lockCounter.fetch_sub( 1, memory_order_relaxed ) > 1 )
for( uint32_t before = futSem.load(
memory_order_relaxed ); ; )
if( futSem.compare_exchange_strong( before, before
+ 1, memory_order_release, memory_order_relaxed ) )
{
futSem.notify_one();
break;
}
} );
// std::mutex
mutex mtx;
bench( "std::mutex", [&] { mtx.lock(), mtx.unlock(); });
// atomic coutner with a binary_semaphore
binary_semaphore sem( 0 );
bench( "atomic coutner with binary_semaphore", [&]
{
if( lockCounter.fetch_add( 1, memory_order_acquire ) > 0 )
sem.acquire();
if( lockCounter.fetch_sub( 1, memory_order_release ) > 1 )
sem.release();
} );
// xchg-mutex and always wakeup
atomic_bool lockFlag( false );
bench( "xchg-mutex and always wakeup", [&]
{
while( lockFlag.exchange( true, memory_order_acquire ) )
lockFlag.wait( true, memory_order_relaxed );
lockFlag.store( false, memory_order_release );
lockFlag.notify_one();
} );
// Chris' mutex
atomic_int lockState( 0 );
bench( "Chris' mutex", [&]
{
if( lockState.exchange( 1, memory_order_acquire ) )
while( lockState.exchange( 2, memory_order_acquire ) )
lockState.wait( 2, memory_order_acquire );
if( lockState.exchange( 0, memory_order_release ) == 2 )
lockState.notify_one();
} );
sort( results.begin(), results.end(), []( kv &lhs, kv &rhs ) {
return lhs.second < rhs.second; } );
for( kv &result : results )
cout << result.first << ": " << result.second << endl;
}
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 12:02:18 2024

From Newsgroup: comp.lang.c++

On 4/14/2024 6:12 AM, Bonita Montero wrote:

That's my code:

[...]

Might as well throw my semaphore into the mix:

https://vorbrodt.blog/2019/02/05/fast-semaphore/

I will code up a little program, most likely today, to check out those
neat C++20 futexes.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 12:06:07 2024

From Newsgroup: comp.lang.c++

On 4/14/2024 12:02 PM, Chris M. Thomasson wrote:

On 4/14/2024 6:12 AM, Bonita Montero wrote:

That's my code:

[...]

Might as well throw my semaphore into the mix:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Error! It's not my semaphore!

https://vorbrodt.blog/2019/02/05/fast-semaphore/

I will code up a little program, most likely today, to check out those
neat C++20 futexes.

OOPS! I misstated, I did not create this algo. Afaict, the original can
be found here:

https://www.haiku-os.org/legacy-docs/benewsletter/Issue1-26.html

Sorry everybody. Now, here is _my_ rwlock:

https://vorbrodt.blog/2019/02/14/read-write-mutex

All me! :^)
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 12:10:19 2024

From Newsgroup: comp.lang.c++

On 4/14/2024 6:12 AM, Bonita Montero wrote:

That's my code:

[...]

    atomic_int lockState( 0 );
    bench( "Chris' mutex", [&]
        {
            if( lockState.exchange( 1, memory_order_acquire ) )
                while( lockState.exchange( 2, memory_order_acquire ) )

                    lockState.wait( 2, memory_order_acquire );

No need for memory_order_acquire on the wait. Relaxed will do.

            if( lockState.exchange( 0, memory_order_release ) == 2 )
                lockState.notify_one();
        } );
    sort( results.begin(), results.end(), []( kv &lhs, kv &rhs ) { return lhs.second < rhs.second; } );
    for( kv &result : results )
        cout << result.first << ": " << result.second << endl;
}

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 22:57:30 2024

From Newsgroup: comp.lang.c++

Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:

On 4/14/2024 6:12 AM, Bonita Montero wrote:

That's my code:

[...]

Might as well throw my semaphore into the mix:

https://vorbrodt.blog/2019/02/05/fast-semaphore/

I will code up a little program, most likely today, to check out those
neat C++20 futexes.

A semaphore with a mutex, you're funny ...
That ain't fast.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 20:23:28 2024

From Newsgroup: comp.lang.c++

On 4/14/2024 1:57 PM, Bonita Montero wrote:

Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:

On 4/14/2024 6:12 AM, Bonita Montero wrote:

That's my code:

[...]

Might as well throw my semaphore into the mix:

https://vorbrodt.blog/2019/02/05/fast-semaphore/

I will code up a little program, most likely today, to check out those
neat C++20 futexes.

A semaphore with a mutex, you're funny ...
That ain't fast.

The fast_semaphore::semaphore is meant to be a kernel semaphore. So, a
Windows one will work. A Linux one will work, and the portable mutex
condvar semaphore impl emulation will work. This semaphore is fast wrt
its fast paths. Btw, you have heard of the benaphore before, right?

https://www.haiku-os.org/legacy-docs/benewsletter/Issue1-26.html#Engineering1-26

Its been there for a while... ;^) It's something that every threading
nerd should know about. Remember when you called me a nerd several times
in the past? lol. ;^)

Anyway, take a close look at the fast paths vs the slow paths:

https://pastebin.com/raw/Q7p6e7Xc

_________________
class fast_semaphore
{
public:
fast_semaphore(int count) noexcept
: m_count(count), m_semaphore(0) {}

void post()
{
std::atomic_thread_fence(std::memory_order_release);
int count = m_count.fetch_add(1, std::memory_order_relaxed);
if (count < 0)
m_semaphore.post();
}

void wait()
{
int count = m_count.fetch_sub(1, std::memory_order_relaxed);
if (count < 1)
m_semaphore.wait();
std::atomic_thread_fence(std::memory_order_acquire);
}

private:
std::atomic<int> m_count;
semaphore m_semaphore;
};
_________________

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 20:26:13 2024

From Newsgroup: comp.lang.c++

On 4/14/2024 8:23 PM, Chris M. Thomasson wrote:

On 4/14/2024 1:57 PM, Bonita Montero wrote:

Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:

On 4/14/2024 6:12 AM, Bonita Montero wrote:

That's my code:

[...]

Might as well throw my semaphore into the mix:

https://vorbrodt.blog/2019/02/05/fast-semaphore/

I will code up a little program, most likely today, to check out
those neat C++20 futexes.

A semaphore with a mutex, you're funny ...
That ain't fast.

[...]

I think you might have a problem differentiating a fast-path from a slow-path... Humm...

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 15 06:15:55 2024

From Newsgroup: comp.lang.c++

Am 15.04.2024 um 05:26 schrieb Chris M. Thomasson:

On 4/14/2024 8:23 PM, Chris M. Thomasson wrote:

On 4/14/2024 1:57 PM, Bonita Montero wrote:

Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:

On 4/14/2024 6:12 AM, Bonita Montero wrote:

That's my code:

[...]

Might as well throw my semaphore into the mix:

https://vorbrodt.blog/2019/02/05/fast-semaphore/

I will code up a little program, most likely today, to check out
those neat C++20 futexes.

A semaphore with a mutex, you're funny ...
That ain't fast.

[...]

I think you might have a problem differentiating a fast-path from a slow-path... Humm...

Absolutely not, but you consider a futex wake as slow.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 22:03:24 2024

From Newsgroup: comp.lang.c++

On 4/14/2024 9:15 PM, Bonita Montero wrote:

Am 15.04.2024 um 05:26 schrieb Chris M. Thomasson:

On 4/14/2024 8:23 PM, Chris M. Thomasson wrote:

On 4/14/2024 1:57 PM, Bonita Montero wrote:

Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:

On 4/14/2024 6:12 AM, Bonita Montero wrote:

That's my code:

[...]

Might as well throw my semaphore into the mix:

https://vorbrodt.blog/2019/02/05/fast-semaphore/

I will code up a little program, most likely today, to check out
those neat C++20 futexes.

A semaphore with a mutex, you're funny ...
That ain't fast.

[...]

I think you might have a problem differentiating a fast-path from a
slow-path... Humm...

Absolutely not, but you consider a futex wake as slow.

Yes. I consider a futex notify and wake as "slow" paths. One should
strive to avoid them. I don't know about the current impl of futex wrt
Linux and Windows. I just treat them as kernel calls that should be on
the slow path. Not on the fast path. The damn thing might even lock an internal hashed mutex on the address for all I know wrt an unknown impl
of a futex.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 15 08:20:51 2024

From Newsgroup: comp.lang.c++

Am 15.04.2024 um 07:03 schrieb Chris M. Thomasson:

On 4/14/2024 9:15 PM, Bonita Montero wrote:

Absolutely not, but you consider a futex wake as slow.

Yes. I consider a futex notify and wake as "slow" paths.

I've mentioned that a futex wake with no contenders is 1.7ns or about
10 clock cycles on my Zen4-CPU; that's nothing worth to think about.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Apr 15 12:08:57 2024

From Newsgroup: comp.lang.c++

On 4/14/2024 11:20 PM, Bonita Montero wrote:

Am 15.04.2024 um 07:03 schrieb Chris M. Thomasson:

On 4/14/2024 9:15 PM, Bonita Montero wrote:

Absolutely not, but you consider a futex wake as slow.

Yes. I consider a futex notify and wake as "slow" paths.

I've mentioned that a futex wake with no contenders is 1.7ns or about
10 clock cycles on my Zen4-CPU; that's nothing worth to think about.

I don't know what that futex notify is actually doing under the covers.
One impl might be faster than another. Whatever. I still consider a
futex notify to be a slow path because it denotes contention. Trying to
avoid a call into it is still rather "prudent", well according to me.
Why call a notify when we do not have to?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Tue Apr 16 06:32:11 2024

From Newsgroup: comp.lang.c++

Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:

I don't know what that futex notify is actually doing under the covers.

In 10 clock cycles you can't do much.

One impl might be faster than another.

With Windows it's ony 2.7ns and with Linux 1.7ns.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Tue Apr 16 11:31:34 2024

From Newsgroup: comp.lang.c++

Am 16.04.2024 um 06:32 schrieb Bonita Montero:

Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:

I don't know what that futex notify is actually doing under the covers.

In 10 clock cycles you can't do much.

One impl might be faster than another.

With Windows it's ony 2.7ns and with Linux 1.7ns.

Try it yourself:

#include <iostream>
#include <atomic>
#include <chrono>

using namespace std;
using namespace chrono;

int main()
{
atomic_bool ab;
auto start = high_resolution_clock::now();
for( size_t i = 1'000'000'000; i; --i )
ab.notify_one();
cout << duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / 1.0e9 << endl;
}

With WSL2 on a Zen4-CPU it's only 1.2ns, i.e. seven clock cycles.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Apr 16 14:07:35 2024

From Newsgroup: comp.lang.c++

On 4/16/2024 2:31 AM, Bonita Montero wrote:

Am 16.04.2024 um 06:32 schrieb Bonita Montero:

Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:

I don't know what that futex notify is actually doing under the covers.

In 10 clock cycles you can't do much.

One impl might be faster than another.

With Windows it's ony 2.7ns and with Linux 1.7ns.

Try it yourself:

#include <iostream>
#include <atomic>
#include <chrono>

using namespace std;
using namespace chrono;

int main()
{
    atomic_bool ab;
    auto start = high_resolution_clock::now();
    for( size_t i = 1'000'000'000; i; --i )
        ab.notify_one();
    cout << duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / 1.0e9 << endl;
}

With WSL2 on a Zen4-CPU it's only 1.2ns, i.e. seven clock cycles.

I will, however I am working on another project right now, heavy
graphics based. Basically, I would want to see how notify_one is
actually implemented. Bust out the disassembler.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Apr 16 14:14:25 2024

From Newsgroup: comp.lang.c++

On 4/15/2024 9:32 PM, Bonita Montero wrote:

Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:

I don't know what that futex notify is actually doing under the covers.

In 10 clock cycles you can't do much.

Well, futex notify might have fast paths in and of itself. To be prudent
I would need to see how they implement it to allow a futex notify by,
every time. Fair enough?

One impl might be faster than another.

With Windows it's ony 2.7ns and with Linux 1.7ns.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Wed Apr 17 10:24:20 2024

From Newsgroup: comp.lang.c++

Am 16.04.2024 um 23:07 schrieb Chris M. Thomasson:

I will, however I am working on another project right now, heavy
graphics based. Basically, I would want to see how notify_one is
actually implemented. Bust out the disassembler.

It will practically not matter how it is implemented if it is that fast.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Wed Apr 17 11:48:42 2024

From Newsgroup: comp.lang.c++

On 4/17/2024 1:24 AM, Bonita Montero wrote:

Am 16.04.2024 um 23:07 schrieb Chris M. Thomasson:

I will, however I am working on another project right now, heavy
graphics based. Basically, I would want to see how notify_one is
actually implemented. Bust out the disassembler.

It will practically not matter how it is implemented if it is that fast.

It just might take an internal hashed mutex to check for contention...
Humm... I don't know until I look at it. I don't even want it to do a
CAS, or execute any membars on a "fast-path". Since I don't know what
its doing under the covers, I still think its "prudent" to try to avoid calling into it when we do absolutely have to.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Wed Apr 17 11:52:13 2024

From Newsgroup: comp.lang.c++

On 4/17/2024 11:48 AM, Chris M. Thomasson wrote:

On 4/17/2024 1:24 AM, Bonita Montero wrote:

Am 16.04.2024 um 23:07 schrieb Chris M. Thomasson:

I will, however I am working on another project right now, heavy
graphics based. Basically, I would want to see how notify_one is
actually implemented. Bust out the disassembler.

It will practically not matter how it is implemented if it is that fast.

It just might take an internal hashed mutex to check for contention... Humm... I don't know until I look at it. I don't even want it to do a
CAS, or execute any membars on a "fast-path". Since I don't know what
its doing under the covers, I still think its "prudent" to try to avoid calling into it when we do absolutely have to.

^^^^^^^^^^^^^^

God damn typos! Sorry Bonita!

I still think its "prudent" to try to avoid calling into it, _except_
when we absolutely have to.

Argh! Sorry again.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Thu Apr 18 15:12:22 2024

From Newsgroup: comp.lang.c++

Am 16.04.2024 um 23:14 schrieb Chris M. Thomasson:

Well, futex notify might have fast paths in and of itself. To be prudent
I would need to see how they implement it to allow a futex notify by,
every time. Fair enough?

I'm asking myself if it would be possible to have context-switching as
most as possible in userspace. If there would be a context-switch from
one thread of a process to another thread because a timeslice expired
the kernel should send a signal to the thread and the thread does the
userspace context-switch by itself. Only if there's a context switch
to another process' thread or in kernel mode the kernel's scheduler
acts itself.
This would give the opportunity to have voluntary context switches
when doing locking much faster than trough the kernel, and voluntary
context switches usually happen with a much higher frequency that
there would be a real gain.
With Linux this would be possible trough signals and on Windows the
kernel could induce SEH-exceptions for a thread-switch.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Richard Damon@richard@damon-family.org to comp.lang.c++ on Thu Apr 18 22:03:12 2024

From Newsgroup: comp.lang.c++

On 4/18/24 9:12 AM, Bonita Montero wrote:

Am 16.04.2024 um 23:14 schrieb Chris M. Thomasson:

Well, futex notify might have fast paths in and of itself. To be
prudent I would need to see how they implement it to allow a futex
notify by, every time. Fair enough?

I'm asking myself if it would be possible to have context-switching as
most as possible in userspace. If there would be a context-switch from
one thread of a process to another thread because a timeslice expired
the kernel should send a signal to the thread and the thread does the userspace context-switch by itself. Only if there's a context switch
to another process' thread or in kernel mode the kernel's scheduler
acts itself.
This would give the opportunity to have voluntary context switches
when doing locking much faster than trough the kernel, and voluntary
context switches usually happen with a much higher frequency that
there would be a real gain.
With Linux this would be possible trough signals and on Windows the
kernel could induce SEH-exceptions for a thread-switch.

How do you "signal" a user-thread without doing a kernel operation and a thread switch?

Admittedly, if the kernel knows it is switching from one thread to
another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Fri Apr 19 06:18:18 2024

From Newsgroup: comp.lang.c++

Am 19.04.2024 um 04:03 schrieb Richard Damon:

How do you "signal" a user-thread without doing a kernel operation
and a thread switch?

The signals for the involuntary userspace thread-switch would be sent
by a dedicated kernel-thread or by the timer interrupt. This would be
more expensive than a thread-switch through the timer interrupt but
as voluntary thread-switches have a much higher frequency this would
be outweighed.

Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of context-switch, but it still needs to deal with kernel space operations.

A context switch through the kernel is always expensive. A user
-level thread switch when blocking for a lock would be much faster.
--- Synchronet 3.20a-Linux NewsLink 1.114

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c++ on Fri Apr 19 13:41:36 2024

From Newsgroup: comp.lang.c++

Bonita Montero <Bonita.Montero@gmail.com> writes:

Am 19.04.2024 um 04:03 schrieb Richard Damon:

Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.

A context switch through the kernel is always expensive. A user
-level thread switch when blocking for a lock would be much faster.

SVR4.2MP implemented a M-N thread model (M user threads mapped to
N kernel threads). Turned out not to work well.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Fri Apr 19 15:44:37 2024

From Newsgroup: comp.lang.c++

Am 19.04.2024 um 15:41 schrieb Scott Lurndal:

Bonita Montero <Bonita.Montero@gmail.com> writes:

Am 19.04.2024 um 04:03 schrieb Richard Damon:

Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.

A context switch through the kernel is always expensive. A user
-level thread switch when blocking for a lock would be much faster.

SVR4.2MP implemented a M-N thread model (M user threads mapped to
N kernel threads). Turned out not to work well.

The thing that I'm imaging is still 1:1 but if threads are in userspace thread-switching would be done by the userspace.
--- Synchronet 3.20a-Linux NewsLink 1.114

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c++ on Fri Apr 19 14:48:44 2024

From Newsgroup: comp.lang.c++

Bonita Montero <Bonita.Montero@gmail.com> writes:

Am 19.04.2024 um 15:41 schrieb Scott Lurndal:

Bonita Montero <Bonita.Montero@gmail.com> writes:

Am 19.04.2024 um 04:03 schrieb Richard Damon:

Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations. >>>

A context switch through the kernel is always expensive. A user
-level thread switch when blocking for a lock would be much faster.

SVR4.2MP implemented a M-N thread model (M user threads mapped to
N kernel threads). Turned out not to work well.

The thing that I'm imaging is still 1:1 but if threads are in userspace >thread-switching would be done by the userspace.

Feel free to prototype it using setcontext(2), getcontext(2) and makecontext(2).
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Fri Apr 19 17:32:38 2024

From Newsgroup: comp.lang.c++

Am 19.04.2024 um 16:48 schrieb Scott Lurndal:

Feel free to prototype it using setcontext(2), getcontext(2) and makecontext(2).

I'd need the support of the kernel which should not make context
switches to another thread inside the same process if the thread
is within userspace. And the kernel should have to periodically
inject signals from the timer interrupt to userspace to make it
possible that the userspace-code does the involuntary context
-switch on its own. And I'd need synchronization-primitives like
mutexes and semaphores that would do the otherwise costly context
-switch in userspace; but that's rather easy compared to the kernel
support.

--- Synchronet 3.20a-Linux NewsLink 1.114

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c++ on Fri Apr 19 18:27:14 2024

From Newsgroup: comp.lang.c++

Bonita Montero <Bonita.Montero@gmail.com> writes:

Am 19.04.2024 um 16:48 schrieb Scott Lurndal:

Feel free to prototype it using setcontext(2), getcontext(2) and
makecontext(2).

I'd need the support of the kernel which should not make context
switches to another thread inside the same process if the thread
is within userspace. And the kernel should have to periodically
inject signals from the timer interrupt to userspace to make it
possible that the userspace-code does the involuntary context
-switch on its own. And I'd need synchronization-primitives like
mutexes and semaphores that would do the otherwise costly context
-switch in userspace; but that's rather easy compared to the kernel
support.

https://www.kernel.org/

Feel free to modify the kernel to your heart's content.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Apr 20 04:48:47 2024

From Newsgroup: comp.lang.c++

Am 19.04.2024 um 20:27 schrieb Scott Lurndal:

Bonita Montero <Bonita.Montero@gmail.com> writes:

Am 19.04.2024 um 16:48 schrieb Scott Lurndal:

Feel free to prototype it using setcontext(2), getcontext(2) and
makecontext(2).

I'd need the support of the kernel which should not make context
switches to another thread inside the same process if the thread
is within userspace. And the kernel should have to periodically
inject signals from the timer interrupt to userspace to make it
possible that the userspace-code does the involuntary context
-switch on its own. And I'd need synchronization-primitives like
mutexes and semaphores that would do the otherwise costly context
-switch in userspace; but that's rather easy compared to the kernel
support.

https://www.kernel.org/

Feel free to modify the kernel to your heart's content.

Seems you don't understand the idead and you think this isn't
possible.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Richard Damon@richard@damon-family.org to comp.lang.c++ on Fri Apr 19 23:00:18 2024

From Newsgroup: comp.lang.c++

On 4/19/24 12:18 AM, Bonita Montero wrote:

Am 19.04.2024 um 04:03 schrieb Richard Damon:

How do you "signal" a user-thread without doing a kernel operation
and a thread switch?

The signals for the involuntary userspace thread-switch would be sent
by a dedicated kernel-thread or by the timer interrupt. This would be
more expensive than a thread-switch through the timer interrupt but
as voluntary thread-switches have a much higher frequency this would
be outweighed.

TO WHAT?

Are you going to reserve a core with a dedicated thread to do this?

To "interrupt" a user thread to notify it, you would either need to
perform a context switch to save the threads previous context or make
the interrupt non-returnable. If you are going to context switch to the notification thread, you might as well switch the the new user-thread
that you want to go to.

Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.

A context switch through the kernel is always expensive. A user
-level thread switch when blocking for a lock would be much faster.

But you still had the kernal doing a context switch.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Apr 20 06:11:57 2024

From Newsgroup: comp.lang.c++

Am 20.04.2024 um 05:00 schrieb Richard Damon:

On 4/19/24 12:18 AM, Bonita Montero wrote:

Am 19.04.2024 um 04:03 schrieb Richard Damon:

How do you "signal" a user-thread without doing a kernel operation
and a thread switch?

The signals for the involuntary userspace thread-switch would be sent
by a dedicated kernel-thread or by the timer interrupt. This would be
more expensive than a thread-switch through the timer interrupt but
as voluntary thread-switches have a much higher frequency this would
be outweighed.

TO WHAT?

Are you going to reserve a core with a dedicated thread to do this?

A userspace thread is scheduled on a core if there's no other thread
on that core. But if there would be other threads eligible to run on
the core they would be scheduled by a signal in userspace if their
context is also in userspace.

To "interrupt" a user thread to notify it, you would either need to
perform a context switch to save the threads previous context or make
the interrupt non-returnable. If you are going to context switch to the notification thread, you might as well switch the the new user-thread
that you want to go to.

You simply don't understand my idea !

--- Synchronet 3.20a-Linux NewsLink 1.114

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c++ on Sat Apr 20 15:16:49 2024

From Newsgroup: comp.lang.c++

Bonita Montero <Bonita.Montero@gmail.com> writes:

Am 19.04.2024 um 20:27 schrieb Scott Lurndal:

Bonita Montero <Bonita.Montero@gmail.com> writes:

Am 19.04.2024 um 16:48 schrieb Scott Lurndal:

Feel free to prototype it using setcontext(2), getcontext(2) and
makecontext(2).

I'd need the support of the kernel which should not make context
switches to another thread inside the same process if the thread
is within userspace. And the kernel should have to periodically
inject signals from the timer interrupt to userspace to make it
possible that the userspace-code does the involuntary context
-switch on its own. And I'd need synchronization-primitives like
mutexes and semaphores that would do the otherwise costly context
-switch in userspace; but that's rather easy compared to the kernel
support.

https://www.kernel.org/

Feel free to modify the kernel to your heart's content.

Seems you don't understand the idead and you think this isn't
possible.

It seems the lack of understanding is on you. Du verstehst es nicht.

"... should not make context switches to another thread inside
the same process if the thread is within userspace".

It seems clear that the thread within userspace is completely
invisible to the kernel, thus it cannot by definition switch to it.

It's not that I don't think it is possible, I just don't think
it provides any measurable benefit for the added complexity.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Apr 20 17:36:35 2024

From Newsgroup: comp.lang.c++

Am 20.04.2024 um 17:16 schrieb Scott Lurndal:

It seems clear that the thread within userspace is completely
invisible to the kernel, thus it cannot by definition switch to it.

I'm not talking about userlevel-threads but the state when a thread
is in userspace-state. You lack imagination since you don't understand
my idea.

--- Synchronet 3.20a-Linux NewsLink 1.114

Who's Online

System Info

Sysop:	DaiTengu
Location:	Appleton, WI
Users:	915
Nodes:	10 (1 / 9)
Uptime:	22:09:51
Calls:	12,168
Files:	186,520
Messages:	2,233,970

C++20 futex with heavy contention slower than mutex

Who's Online

System Info