The following program simulates constant locking and unlocking of one[...]
to jthread::hardware_concurrency() threads with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond that the CPU time of the futex explodes and the conventional mutex is faster on Windows as well as on
Linux.
On 3/29/2024 6:14 AM, Bonita Montero wrote:
The following program simulates constant locking und unlocking of one[...]
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.
A futex is not a mutex!
On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:
On 3/29/2024 6:14 AM, Bonita Montero wrote:
The following program simulates constant locking und unlocking of one[...]
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.
A futex is not a mutex!
I have to take a look at the logic you used for your "mutex" based on a futex, not enough time right now. Perhaps, the std::mutex is just way
more efficient than your use of a futex? Keep in mind futexes are tricky:
https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf
;^)
It depends on the logic you use.
On 3/29/2024 3:30 PM, Chris M. Thomasson wrote:
On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:
On 3/29/2024 6:14 AM, Bonita Montero wrote:
The following program simulates constant locking und unlocking of one[...]
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.
A futex is not a mutex!
I have to take a look at the logic you used for your "mutex" based on
a futex, not enough time right now. Perhaps, the std::mutex is just
way more efficient that your use of a futex? Keep in mind futexs are
tricky:
https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf
;^)
It depends on the logic you use.
You generally want a "waitbit" type of logic... I posted a mutex based
on a futex here a while back. I just need to find it. Iirc, it used a Windows "futex"... If I can find it, you should test against it. It used
NO compare and swap. Ugghhhh.... Try to avoid that.
On 3/29/2024 6:14 AM, Bonita Montero wrote:
The following program simulates constant locking und unlocking of one[...]
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.
A futex is not a mutex!
On 3/29/2024 3:30 PM, Chris M. Thomasson wrote:
On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:
On 3/29/2024 6:14 AM, Bonita Montero wrote:
The following program simulates constant locking und unlocking of one[...]
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.
A futex is not a mutex!
I have to take a look at the logic you used for your "mutex" based on
a futex, not enough time right now. Perhaps, the std::mutex is just
way more efficient that your use of a futex? Keep in mind futexs are
tricky:
https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf
;^)
It depends on the logic you use.
You generally want a "waitbit" type of logic...
{
for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
On 3/30/2024 1:25 AM, Bonita Montero wrote:
{
for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a hard time finding the damn thing in comp.lang.c++. I will try to find it.
Am 30.03.2024 um 19:51 schrieb Chris M. Thomasson:
You need to introduce some sort of waitbit logic to minimize calls to
futex.notify_one() and futex.wait(). I posted an example but am having
a hard time finding the damn thing in comp.lang.c++. I will try to
find it.
Futex notify and wait are at least very cheap with Windows.
On 3/30/2024 1:25 AM, Bonita Montero wrote:
{
for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a hard time finding the damn thing in comp.lang.c++. I will try to find it.
On 3/30/2024 11:51 AM, Chris M. Thomasson wrote:
On 3/30/2024 1:25 AM, Bonita Montero wrote:
{
for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
You need to introduce some sort of waitbit logic to minimize calls to
futex.notify_one() and futex.wait(). I posted an example but am having
a hard time finding the damn thing in comp.lang.c++. I will try to
find it.
Also, does notify_one automatically imply release semantics?
Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on every mutex unlock, and wait on every point of contention wrt lock. Interesting.
Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:
Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on
every mutex unlock, and wait on every point of contention wrt lock.
Interesting.
I'm using notify and wait only when there's contention.
On 3/31/2024 11:32 PM, Bonita Montero wrote:
Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:
Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on
every mutex unlock, and wait on every point of contention wrt lock.
Interesting.
I'm using notify and wait only when there's contention.
Even here?
_____________
for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
____________
What am I missing?
futex.store( false, memory_order_release );
futex.notify_one();
?
Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:
On 3/31/2024 11:32 PM, Bonita Montero wrote:
Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:
Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on
every mutex unlock, and wait on every point of contention wrt lock.
Interesting.
I'm using notify and wait only when there's contention.
Even here?
_____________
for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
____________
What am I missing?
futex.store( false, memory_order_release );
futex.notify_one();
?
Ok, I was talking about my older version.
But notify is fast.
The following program simulates constant locking and unlocking of one
to jthread::hardware_concurrency() threads with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond that the CPU time of the futex explodes and the conventional mutex is faster on Windows as well as on
Linux.
Am 01.04.2024 um 10:57 schrieb Bonita Montero:
Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:
On 3/31/2024 11:32 PM, Bonita Montero wrote:
Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:
Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on
every mutex unlock, and wait on every point of contention wrt lock.
Interesting.
I'm using notify and wait only when there's contention.
Even here?
_____________
for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
____________
What am I missing?
futex.store( false, memory_order_release );
futex.notify_one();
?
Ok, I was talking about my older version.
But notify is fast.
An uncontended notify is 2.5ns on my Zen4 computer.
On 4/1/2024 2:01 AM, Bonita Montero wrote:
Am 01.04.2024 um 10:57 schrieb Bonita Montero:
Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:
On 3/31/2024 11:32 PM, Bonita Montero wrote:
Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:
Well, I have not taken a look at Microsoft's Futex implementation.
Afaict, your code showcases that aspect wrt calling into notify on
every mutex unlock, and wait on every point of contention wrt
lock. Interesting.
I'm using notify and wait only when there's contention.
Even here?
_____________
for( bool cmp; !futex.compare_exchange_weak( cmp =
false, true, memory_order_acquire, memory_order_relaxed ); )
futex.wait( true, memory_order_relaxed );
futex.store( false, memory_order_release );
futex.notify_one();
}, futexTimes );
____________
What am I missing?
futex.store( false, memory_order_release );
futex.notify_one();
?
Ok, I was talking about my older version.
But notify is fast.
An uncontended notify is 2.5ns on my Zen4 computer.
For the Microsoft version, right? I don't have Linux installed at the moment.
On Fri, 29 Mar 2024 14:14:11 +0100
Bonita Montero <Bonita.Montero@gmail.com> wrote:
The following program simulates constant locking und unlocking of one
to jthread::hardware_concurrency() threadas with a std::mutex and a
futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
5 threads constantly contending, but beyond the CPU time of the futex
explodes and the conventional mutex is faster with Windows as and with
Linux.
In case of heavy contention, what to consider 'faster' is not at all
obvious.
On a lightly loaded system with more cores than work to do (a typical
client), 'faster' means faster forward progress of the group of contending threads. This is achieved by very long polling before switching to wait,
probably up to several tens of usec, and by a hyperactive tickless OS
scheduler.
On a heavily loaded system with much more work to do than available
cores, 'faster' means more work done by unrelated threads and processes. This is achieved by very short polling before switching to wait, probably less
than 500 nsec, and by a 'passive' OS scheduler that rarely intervenes
outside of the clock tick.
And of course there are cases in the middle.
And then there is traditional HPC with MPI, which is a completely different kettle of fish.
Ok, I was talking about my older version.
But notify is fast.
An uncontended notify is 2.5ns on my Zen4 computer.
For the Microsoft version, right? I don't have Linux installed at the
moment.
Linux with libstdc++ is 1.7ns.
On 4/1/2024 7:37 PM, Bonita Montero wrote:
[...]
Ok, I was talking about my older version.
But notify is fast.
An uncontended notify is 2.5ns on my Zen4 computer.
Keep in mind that notify and wait are slow paths.
One needs to strive to minimize calls to notify and wait. Calling notify and/or wait when they
do not _have_ to be called is a rather inefficient design.
For the Microsoft version, right? I don't have Linux installed at the
moment.
Linux with libstdc++ is 1.7ns.
Am 12.04.2024 um 23:16 schrieb Chris M. Thomasson:
On 4/1/2024 7:37 PM, Bonita Montero wrote:
[...]
Ok, I was talking about my older version.
But notify is fast.
An uncontended notify is 2.5ns on my Zen4 computer.
Keep in mind that notify and wait are slow paths.
There's no slow path with futexes since you have to notify _always_.
And 1.7ns isn't slow.
One needs to strive to minimize calls to notify and wait. Calling
notify and/or wait when they do not _have_ to be called is a rather
inefficient design.
For the Microsoft version, right? I don't have Linux installed at
the moment.
Linux with libstdc++ is 1.7ns.
On 4/12/2024 6:00 PM, Bonita Montero wrote:
There's no slow path with futexes since you've to notify _always_.
Huh? Nope! Where did you get that idea from? calling notify is a slow
path, and calling wait is a slow path.
And 1.7ns isn't slow.
You do not want to _always_ call notify and/or wait when you do not have
to.
Am 13.04.2024 um 21:26 schrieb Chris M. Thomasson:
On 4/12/2024 6:00 PM, Bonita Montero wrote:
There's no slow path with futexes since you've to notify _always_.
Huh? Nope! Where did you get that idea from? calling notify is a slow
path, and calling wait is a slow path.
Unlocking a futex always involves a notify since the notify doesn't
depend on a state like a wait().
And 1.7ns isn't slow.
You do not want to _always_ call notify and/or wait when you do not
have to.
You don't understand futexes. Show me your mutex based on C++20 futexes.
On 4/13/2024 9:19 PM, Bonita Montero wrote:
You don't unterstand futexes. Show me your mutex basing on C++20 futexes.
Did you read it? Using Windows futexes: https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.
Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:
On 4/13/2024 9:19 PM, Bonita Montero wrote:
You don't unterstand futexes. Show me your mutex basing on C++20
futexes.
Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.
Show me your mutex with a futex.
It's impossible to write that without notifying *always*.
Am 14.04.2024 um 06:30 schrieb Bonita Montero:
Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:
On 4/13/2024 9:19 PM, Bonita Montero wrote:
You don't unterstand futexes. Show me your mutex basing on C++20
futexes.
Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.
Show me your mutex with a futex.
It's impossible to write that without notifying *always*.
That's the simplest code in C++20:
struct fute_xchg_mutex
{
    fute_xchg_mutex();
    void lock();
    void unlock();
private:
    atomic_bool m_locked;
};

fute_xchg_mutex::fute_xchg_mutex() :
    m_locked( false )
{
}

void fute_xchg_mutex::lock()
{
    while( m_locked.exchange( true, memory_order_acquire ) )
        m_locked.wait( true, memory_order_relaxed );
}

void fute_xchg_mutex::unlock()
{
    m_locked.store( false, memory_order_release );
    m_locked.notify_one();
}
On 4/13/2024 9:30 PM, Bonita Montero wrote:
Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:
On 4/13/2024 9:19 PM, Bonita Montero wrote:
You don't unterstand futexes. Show me your mutex basing on C++20
futexes.
Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.
Show me your mutex with a futex.
It's impossible to write that without notifying *always*.
this is a complete troll, right?
Am 14.04.2024 um 07:22 schrieb Chris M. Thomasson:
On 4/13/2024 9:30 PM, Bonita Montero wrote:
Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:
On 4/13/2024 9:19 PM, Bonita Montero wrote:
You don't unterstand futexes. Show me your mutex basing on C++20
futexes.
Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.
Show me your mutex with a futex.
It's impossible to write that without notifying *always*.
this is a complete troll, right?
You don't know futexes.
Show me your code with a mutex basing on a futex.
On 4/13/2024 10:30 PM, Bonita Montero wrote:
Am 14.04.2024 um 07:22 schrieb Chris M. Thomasson:
On 4/13/2024 9:30 PM, Bonita Montero wrote:
Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:
On 4/13/2024 9:19 PM, Bonita Montero wrote:
You don't unterstand futexes. Show me your mutex basing on C++20
futexes.
Did you read it? using windwos futexes:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Notice how it does not unlock all the time?
I can port it to c++20.
Show me your mutex with a futex.
It's impossible to write that without notifying *always*.
this is a complete troll, right?
You don't know futexes.
Show me your code with a mutex basing on a futex.
I did. Read again, well Windows Futex:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Do you really want me to port this to c++20?
Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:[...]
I did. Read again, well Windows Futex:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Do you really want me to port this to c++20?
Show it here with correct formatting.
On 4/13/2024 10:36 PM, Bonita Montero wrote:
Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:[...]
I did. Read again, well Windows Futex:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Do you really want me to port this to c++20?
Show it here with correct formatting.
Do you mean, indentation? ;^)
Anyway, you commented in the thread I linked you to.
So, you can read my code, right?
Am 14.04.2024 um 07:42 schrieb Chris M. Thomasson:
On 4/13/2024 10:36 PM, Bonita Montero wrote:
Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:[...]
I did. Read again, well Windows Futex:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Do you really want me to port this to c++20?
Show it here with correct formatting.
Do you mean, indentation? ;^)
Anyway, you commented in the thread I linked you to.
So, you can read my code, right?
You don't understand futexes, since you call notifying on
the slow path slow. But with futexes notifying isn't slow
even if there are contenders.
On 4/13/2024 10:46 PM, Bonita Montero wrote:
Am 14.04.2024 um 07:42 schrieb Chris M. Thomasson:
On 4/13/2024 10:36 PM, Bonita Montero wrote:
Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:[...]
I did. Read again, well Windows Futex:
https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
Do you really want me to port this to c++20?
Show it here with correct formatting.
Do you mean, indentation? ;^)
Anyway, you commented in the thread I linked you to.
So, you can read my code, right?
You don't understand futexes since you call notifying with
the slow path slow. But notifying is even not slow if there
are contenders with futexes.
What? I am quite sure you read this right?
____________________
void unlock()
{
if (InterlockedExchange(&m_state, 0) == 2)
{
WakeByAddressSingle(&m_state);
}
}
____________________
Take careful notice how I try to avoid a call into WakeByAddressSingle?
See? ;^o
That's my code:[...]
On 4/14/2024 6:12 AM, Bonita Montero wrote:
That's my code:[...]
Might as well throw my semaphore into the mix:
https://vorbrodt.blog/2019/02/05/fast-semaphore/
I will code up a little program, most likely today, to check out those
neat C++20 futexes.
That's my code:
atomic_int lockState( 0 );
bench( "Chris' mutex", [&]
{
if( lockState.exchange( 1, memory_order_acquire ) )
while( lockState.exchange( 2, memory_order_acquire ) )
lockState.wait( 2, memory_order_acquire );
if( lockState.exchange( 0, memory_order_release ) == 2 )
lockState.notify_one();
} );
sort( results.begin(), results.end(), []( kv &lhs, kv &rhs ) { return lhs.second < rhs.second; } );
for( kv &result : results )
cout << result.first << ": " << result.second << endl;
}
Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:
On 4/14/2024 6:12 AM, Bonita Montero wrote:
That's my code:[...]
Might as well throw my semaphore into the mix:
https://vorbrodt.blog/2019/02/05/fast-semaphore/
I will code up a little program, most likely today, to check out those
neat C++20 futexes.
A semaphore with a mutex, you're funny ...
That ain't fast.
On 4/14/2024 1:57 PM, Bonita Montero wrote:[...]
Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:
On 4/14/2024 6:12 AM, Bonita Montero wrote:
That's my code:[...]
Might as well throw my semaphore into the mix:
https://vorbrodt.blog/2019/02/05/fast-semaphore/
I will code up a little program, most likely today, to check out
those neat C++20 futexes.
A semaphore with a mutex, you're funny ...
That ain't fast.
On 4/14/2024 8:23 PM, Chris M. Thomasson wrote:
On 4/14/2024 1:57 PM, Bonita Montero wrote:[...]
Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:
On 4/14/2024 6:12 AM, Bonita Montero wrote:
That's my code:[...]
Might as well throw my semaphore into the mix:
https://vorbrodt.blog/2019/02/05/fast-semaphore/
I will code up a little program, most likely today, to check out
those neat C++20 futexes.
A semaphore with a mutex, you're funny ...
That ain't fast.
I think you might have a problem differentiating a fast-path from a slow-path... Humm...
Am 15.04.2024 um 05:26 schrieb Chris M. Thomasson:
On 4/14/2024 8:23 PM, Chris M. Thomasson wrote:
On 4/14/2024 1:57 PM, Bonita Montero wrote:[...]
Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:
On 4/14/2024 6:12 AM, Bonita Montero wrote:
That's my code:[...]
Might as well throw my semaphore into the mix:
https://vorbrodt.blog/2019/02/05/fast-semaphore/
I will code up a little program, most likely today, to check out
those neat C++20 futexes.
A semaphore with a mutex, you're funny ...
That ain't fast.
I think you might have a problem differentiating a fast-path from a
slow-path... Humm...
Absolutely not, but you consider a futex wake as slow.
On 4/14/2024 9:15 PM, Bonita Montero wrote:
Absolutely not, but you consider a futex wake as slow.
Yes. I consider a futex notify and wake as "slow" paths.
Am 15.04.2024 um 07:03 schrieb Chris M. Thomasson:
On 4/14/2024 9:15 PM, Bonita Montero wrote:
Absolutely not, but you consider a futex wake as slow.
Yes. I consider a futex notify and wake as "slow" paths.
I've mentioned that a futex wake with no contenders is 1.7ns or about
10 clock cycles on my Zen4-CPU; that's nothing worth to think about.
I don't know what that futex notify is actually doing under the covers.
One impl might be faster than another.
Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:
I don't know what that futex notify is actually doing under the covers.
In 10 clock cycles you can't do much.
One impl might be faster than another.
With Windows it's only 2.7ns and with Linux 1.7ns.
Am 16.04.2024 um 06:32 schrieb Bonita Montero:
Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:
I don't know what that futex notify is actually doing under the covers.
In 10 clock cycles you can't do much.
One impl might be faster than another.
With Windows it's ony 2.7ns and with Linux 1.7ns.
Try it yourself:
#include <iostream>
#include <atomic>
#include <chrono>

using namespace std;
using namespace chrono;

int main()
{
    atomic_bool ab;
    auto start = high_resolution_clock::now();
    // 1e9 notifies with no waiter; total ns / 1e9 = ns per notify_one()
    for( size_t i = 1'000'000'000; i; --i )
        ab.notify_one();
    cout << duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / 1.0e9 << endl;
}
With WSL2 on a Zen4-CPU it's only 1.2ns, i.e. seven clock cycles.
Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:
I don't know what that futex notify is actually doing under the covers.
In 10 clock cycles you can't do much.
One impl might be faster than another.
With Windows it's ony 2.7ns and with Linux 1.7ns.
I will, however I am working on another project right now, heavy
graphics based. Basically, I would want to see how notify_one is
actually implemented. Bust out the disassembler.
Am 16.04.2024 um 23:07 schrieb Chris M. Thomasson:
I will, however I am working on another project right now, heavy
graphics based. Basically, I would want to see how notify_one is
actually implemented. Bust out the disassembler.
It will practically not matter how it is implemented if it is that fast.
On 4/17/2024 1:24 AM, Bonita Montero wrote:
Am 16.04.2024 um 23:07 schrieb Chris M. Thomasson:
I will, however I am working on another project right now, heavy
graphics based. Basically, I would want to see how notify_one is
actually implemented. Bust out the disassembler.
It will practically not matter how it is implemented if it is that fast.
It just might take an internal hashed mutex to check for contention... Humm... I don't know until I look at it. I don't even want it to do a
CAS, or execute any membars on a "fast-path". Since I don't know what
it's doing under the covers, I still think it's "prudent" to try to avoid calling into it when we do not absolutely have to.
Well, futex notify might have fast paths in and of itself. To be prudent
I would need to see how they implement it to allow a futex notify by,
every time. Fair enough?
Am 16.04.2024 um 23:14 schrieb Chris M. Thomasson:
Well, futex notify might have fast paths in and of itself. To be
prudent I would need to see how they implement it to allow a futex
notify by, every time. Fair enough?
I'm asking myself if it would be possible to do as much of the
context-switching as possible in userspace. If there is a context switch
from one thread of a process to another thread because a timeslice expired,
the kernel would send a signal to the thread and the thread would do the userspace context switch by itself. Only if there's a context switch
to another process' thread, or in kernel mode, would the kernel's scheduler
act itself.
This would give the opportunity to make voluntary context switches
when locking much faster than through the kernel, and voluntary
context switches usually happen with such a high frequency that
there would be a real gain.
With Linux this would be possible through signals, and on Windows the
kernel could induce SEH-exceptions for a thread-switch.
How do you "signal" a user-thread without doing a kernel operation
and a thread switch?
Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of context-switch, but it still needs to deal with kernel space operations.
Am 19.04.2024 um 04:03 schrieb Richard Damon:
Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.
A context switch through the kernel is always expensive. A user
-level thread switch when blocking for a lock would be much faster.
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 19.04.2024 um 04:03 schrieb Richard Damon:
Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.
A context switch through the kernel is always expensive. A user
-level thread switch when blocking for a lock would be much faster.
SVR4.2MP implemented a M-N thread model (M user threads mapped to
N kernel threads). Turned out not to work well.
Am 19.04.2024 um 15:41 schrieb Scott Lurndal:
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 19.04.2024 um 04:03 schrieb Richard Damon:
Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.
A context switch through the kernel is always expensive. A
user-level thread switch when blocking for a lock would be much faster.
SVR4.2MP implemented a M-N thread model (M user threads mapped to
N kernel threads). Turned out not to work well.
The thing that I'm imagining is still 1:1, but if threads are in userspace,
thread-switching would be done by the userspace.
Feel free to prototype it using setcontext(2), getcontext(2) and makecontext(2).
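As a starting point for such a prototype, here is a minimal sketch of a purely voluntary user-space switch between two contexts, assuming Linux with glibc's `<ucontext.h>`. The names `run_user_switch_demo`, `worker`, `g_main_ctx`, `g_worker_ctx` and `g_trace` are made up for illustration. It uses `swapcontext`, which combines the save and restore steps, and it covers none of the kernel-assisted, signal-driven involuntary switching discussed in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <ucontext.h>

namespace
{
    ucontext_t g_main_ctx, g_worker_ctx;   // hypothetical names
    std::vector<std::string> g_trace;      // records the order of execution

    void worker()
    {
        g_trace.push_back( "worker: step 1" );
        // Voluntary switch back to main, done entirely by this code.
        swapcontext( &g_worker_ctx, &g_main_ctx );
        g_trace.push_back( "worker: step 2" );
        // Returning from worker() resumes the context stored in uc_link.
    }
}

std::vector<std::string> run_user_switch_demo()
{
    static char stack[64 * 1024];          // stack for the worker context
    g_trace.clear();

    int rc = getcontext( &g_worker_ctx );
    assert( rc == 0 );
    (void)rc;
    g_worker_ctx.uc_stack.ss_sp = stack;
    g_worker_ctx.uc_stack.ss_size = sizeof stack;
    g_worker_ctx.uc_link = &g_main_ctx;    // where to go when worker() returns
    makecontext( &g_worker_ctx, worker, 0 );

    g_trace.push_back( "main: start" );
    swapcontext( &g_main_ctx, &g_worker_ctx );   // run worker up to its first switch
    g_trace.push_back( "main: resumed" );
    swapcontext( &g_main_ctx, &g_worker_ctx );   // let worker finish
    g_trace.push_back( "main: done" );
    return g_trace;
}
```

Keep in mind that glibc's `swapcontext` also saves and restores the signal mask, which as far as I know involves a system call on Linux, so this sketch shows the mechanism rather than the achievable speed.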
Am 19.04.2024 um 16:48 schrieb Scott Lurndal:
Feel free to prototype it using setcontext(2), getcontext(2) and
makecontext(2).
I'd need the support of the kernel which should not make context
switches to another thread inside the same process if the thread
is within userspace. And the kernel should have to periodically
inject signals from the timer interrupt to userspace to make it
possible that the userspace-code does the involuntary context
-switch on its own. And I'd need synchronization-primitives like
mutexes and semaphores that would do the otherwise costly context
-switch in userspace; but that's rather easy compared to the kernel
support.
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 19.04.2024 um 16:48 schrieb Scott Lurndal:
Feel free to prototype it using setcontext(2), getcontext(2) and
makecontext(2).
I'd need the support of the kernel which should not make context
switches to another thread inside the same process if the thread
is within userspace. And the kernel should have to periodically
inject signals from the timer interrupt to userspace to make it
possible that the userspace-code does the involuntary context
-switch on its own. And I'd need synchronization-primitives like
mutexes and semaphores that would do the otherwise costly context
-switch in userspace; but that's rather easy compared to the kernel
support.
https://www.kernel.org/
Feel free to modify the kernel to your heart's content.
Am 19.04.2024 um 04:03 schrieb Richard Damon:
How do you "signal" a user-thread without doing a kernel operation
and a thread switch?
The signals for the involuntary userspace thread-switch would be sent
by a dedicated kernel-thread or by the timer interrupt. This would be
more expensive than a thread-switch through the timer interrupt but
as voluntary thread-switches have a much higher frequency this would
be outweighed.
Admittedly, if the kernel knows it is switching from one thread
to another in the same process it can do a lighter weight sort of
context-switch, but it still needs to deal with kernel space operations.
A context switch through the kernel is always expensive. A user
-level thread switch when blocking for a lock would be much faster.
On 4/19/24 12:18 AM, Bonita Montero wrote:
Am 19.04.2024 um 04:03 schrieb Richard Damon:
How do you "signal" a user-thread without doing a kernel operation
and a thread switch?
The signals for the involuntary userspace thread-switch would be sent
by a dedicated kernel-thread or by the timer interrupt. This would be
more expensive than a thread-switch through the timer interrupt but
as voluntary thread-switches have a much higher frequency this would
be outweighed.
TO WHAT?
Are you going to reserve a core with a dedicated thread to do this?
To "interrupt" a user thread to notify it, you would either need to
perform a context switch to save the thread's previous context or make
the interrupt non-returnable. If you are going to context switch to the notification thread, you might as well switch to the new user-thread
that you want to go to.
Am 19.04.2024 um 20:27 schrieb Scott Lurndal:
Bonita Montero <Bonita.Montero@gmail.com> writes:
Am 19.04.2024 um 16:48 schrieb Scott Lurndal:
Feel free to prototype it using setcontext(2), getcontext(2) and
makecontext(2).
I'd need the support of the kernel which should not make context
switches to another thread inside the same process if the thread
is within userspace. And the kernel should have to periodically
inject signals from the timer interrupt to userspace to make it
possible that the userspace-code does the involuntary context
-switch on its own. And I'd need synchronization-primitives like
mutexes and semaphores that would do the otherwise costly context
-switch in userspace; but that's rather easy compared to the kernel
support.
https://www.kernel.org/
Feel free to modify the kernel to your heart's content.
Seems you don't understand the idea and you think this isn't
possible.
It seems clear that the thread within userspace is completely
invisible to the kernel, thus it cannot by definition switch to it.