• C++20 futex with heavy contention slower than mutex

    From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Fri Mar 29 14:14:11 2024
    From Newsgroup: comp.lang.c++

    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex
    explodes and the conventional mutex is faster with Windows as and with
    Linux.

    #include <iostream>
    #include <thread>
    #include <mutex>
    #include <atomic>
    #include <functional>
    #include <chrono>
    #include <vector>

    using namespace std;
    using namespace chrono;

    int main()
    {
    constexpr int64_t ROUNDS = 10000;
    auto bench = [&]( char const *head, auto fn )
    {
    cout << head << endl;
    atomic_int64_t tSum( 0 );
    vector<jthread> threads;
    int hc = jthread::hardware_concurrency();
    for( int nThreads = 1; nThreads <= hc; ++nThreads )
    {
    for( unsigned t = nThreads; t; --t )
    threads.emplace_back( [&]()
    {
    auto start = high_resolution_clock::now();
    int64_t cmp = 0;
    for( int64_t r = ROUNDS; r; --r )
    cmp = fn( cmp );
    tSum += duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count();
    } );
    threads.resize( 0 );
    cout << "\t" << nThreads << ": " << tSum /
    ((double)nThreads * ROUNDS) << endl;
    }
    };
    {
    mutex mtx;
    bench( "mutex: ",
    [&]( int64_t )
    {
    mtx.lock();
    mtx.unlock();
    return 0;
    } );
    }
    {
    atomic_int64_t futex( 0 );
    constexpr int64_t HIBIT = numeric_limits<int64_t>::min();
    bench( "futex:",
    [&]( int64_t cmp )
    {
    int64_t niu;
    for( ; ; )
    if( cmp >= 0 )
    if( futex.compare_exchange_weak( cmp, niu = cmp
    | HIBIT, memory_order_acquire, memory_order_relaxed ) )
    {
    cmp = niu;
    break;
    }
    else;
    else
    if( futex.compare_exchange_weak( cmp, niu = cmp
    + 1, memory_order_relaxed, memory_order_relaxed ) )
    {
    futex.wait( niu, memory_order_acquire );
    cmp = futex.load( memory_order_relaxed );
    }
    for( ; ; )
    if( cmp & ~HIBIT )
    if( futex.compare_exchange_weak( cmp, niu =
    (cmp - 1) & ~HIBIT, memory_order_release, memory_order_relaxed ) )
    {
    cmp = niu;
    futex.notify_one();
    break;
    }
    else;
    else
    if( futex.compare_exchange_weak( cmp, 0, memory_order_release, memory_order_relaxed ) )
    {
    cmp = 0;
    break;
    }
    return cmp;
    } );
    }
    }

    This are the results under Ubuntu 20.04 LTS:

    mutex:
    1: 3.9134
    2: 5.8706
    3: 33.1753
    4: 48.63
    5: 71.6777
    6: 109.556
    7: 151.271
    8: 208.871
    9: 288.031
    10: 353.536
    11: 446.28
    12: 563.841
    13: 701.134
    14: 833.08
    15: 983.734
    16: 1138.48
    17: 1297.53
    18: 1481.15
    19: 1664.51
    20: 1857.88
    21: 2053.83
    22: 2285.07
    23: 2546.67
    24: 2782.53
    25: 3065.25
    26: 3349.4
    27: 3652.06
    28: 3971.25
    29: 4339.13
    30: 4727.31
    31: 5129.95
    32: 5499.05
    futex:
    1: 3.3654
    2: 5.02155
    3: 6.69107
    4: 15.2522
    5: 16.8824
    6: 185.235
    7: 162.176
    8: 360.421
    9: 2272.83
    10: 4805.11
    11: 8987.24
    12: 14225.5
    13: 20076.4
    14: 27199.4
    15: 36711.8
    16: 49480.8
    17: 57641.8
    18: 75633.2
    19: 97196.6
    20: 122223
    21: 147617
    22: 180377
    23: 216084
    24: 256032
    25: 299681
    26: 345518
    27: 398785
    28: 445499
    29: 506351
    30: 572440
    31: 643455
    32: 721461

    With all threads locking and unlocking the conventional mutex is 131
    timea faster.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Mar 29 15:28:16 2024
    From Newsgroup: comp.lang.c++

    On 3/29/2024 6:14 AM, Bonita Montero wrote:
    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex explodes and the conventional mutex is faster with Windows as and with
    Linux.
    [...]

    A futex is not a mutex!

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Mar 29 15:30:48 2024
    From Newsgroup: comp.lang.c++

    On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:
    On 3/29/2024 6:14 AM, Bonita Montero wrote:
    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex
    explodes and the conventional mutex is faster with Windows as and with
    Linux.
    [...]

    A futex is not a mutex!


    I have to take a look at the logic you used for your "mutex" based on a
    futex, not enough time right now. Perhaps, the std::mutex is just way
    more efficient that your use of a futex? Keep in mind futexs are tricky:

    https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf

    ;^)

    It depends on the logic you use.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Mar 29 16:29:43 2024
    From Newsgroup: comp.lang.c++

    On 3/29/2024 3:30 PM, Chris M. Thomasson wrote:
    On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:
    On 3/29/2024 6:14 AM, Bonita Montero wrote:
    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex
    explodes and the conventional mutex is faster with Windows as and with
    Linux.
    [...]

    A futex is not a mutex!


    I have to take a look at the logic you used for your "mutex" based on a futex, not enough time right now. Perhaps, the std::mutex is just way
    more efficient that your use of a futex? Keep in mind futexs are tricky:

    https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf

    ;^)

    It depends on the logic you use.

    You generally want a "waitbit" type of logic... I posted a mutex based
    on a futex here a while back. I just need to find it. Iirc, it used a
    Windows "futex"... If I can find it, you should test against it. It used
    NO compare and swap. Ugghhhh.... Try to avoid that.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Mar 29 19:46:29 2024
    From Newsgroup: comp.lang.c++

    On 3/29/2024 4:29 PM, Chris M. Thomasson wrote:
    On 3/29/2024 3:30 PM, Chris M. Thomasson wrote:
    On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:
    On 3/29/2024 6:14 AM, Bonita Montero wrote:
    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex
    explodes and the conventional mutex is faster with Windows as and with >>>> Linux.
    [...]

    A futex is not a mutex!


    I have to take a look at the logic you used for your "mutex" based on
    a futex, not enough time right now. Perhaps, the std::mutex is just
    way more efficient that your use of a futex? Keep in mind futexs are
    tricky:

    https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf

    ;^)

    It depends on the logic you use.

    You generally want a "waitbit" type of logic... I posted a mutex based
    on a futex here a while back. I just need to find it. Iirc, it used a Windows "futex"... If I can find it, you should test against it. It used
    NO compare and swap. Ugghhhh.... Try to avoid that.

    Another key, try to avoid waiting on a futex, strive for that. Maximize
    the fast paths, and minimize the slow paths.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Mar 30 05:18:35 2024
    From Newsgroup: comp.lang.c++

    Am 29.03.2024 um 23:28 schrieb Chris M. Thomasson:
    On 3/29/2024 6:14 AM, Bonita Montero wrote:
    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex
    explodes and the conventional mutex is faster with Windows as and with
    Linux.
    [...]

    A futex is not a mutex!


    You can build a mutex on top of a futex and that's the typical usage.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Mar 30 09:25:26 2024
    From Newsgroup: comp.lang.c++

    Am 30.03.2024 um 00:29 schrieb Chris M. Thomasson:
    On 3/29/2024 3:30 PM, Chris M. Thomasson wrote:
    On 3/29/2024 3:28 PM, Chris M. Thomasson wrote:
    On 3/29/2024 6:14 AM, Bonita Montero wrote:
    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex
    explodes and the conventional mutex is faster with Windows as and with >>>> Linux.
    [...]

    A futex is not a mutex!


    I have to take a look at the logic you used for your "mutex" based on
    a futex, not enough time right now. Perhaps, the std::mutex is just
    way more efficient that your use of a futex? Keep in mind futexs are
    tricky:

    https://cis.temple.edu/~giorgio/cis307/readings/futex.pdf

    ;^)

    It depends on the logic you use.

    You generally want a "waitbit" type of logic...

    The sign-bit of my uint64 shows that signs that the lock is occupied
    and the 63 lower bits sign the number of threads waiting to get owner-
    ship. But i simplified the code a lot by using just an atomic_bool.
    Now under Windows the futex is superior from 12 threads on. But under
    Linux the futex is slower except for one or two threads.

    Windows:

    1: 248%
    2: 54%
    3: 27%
    4: 28%
    5: 33%
    6: 31%
    7: 37%
    8: 37%
    9: 53%
    10: 42%
    11: 82%
    12: 101%
    13: 116%
    14: 142%
    15: 153%
    16: 160%
    17: 175%
    18: 191%
    19: 192%
    20: 194%
    21: 206%
    22: 211%
    23: 221%
    24: 220%
    25: 223%
    26: 220%
    27: 220%
    28: 223%
    29: 222%
    30: 220%
    31: 219%
    32: 216%

    Linux:

    1: 121%
    2: 106%
    3: 67%
    4: 99%
    5: 87%
    6: 18%
    7: 13%
    8: 12%
    9: 10%
    10: 10%
    11: 9%
    12: 9%
    13: 8%
    14: 8%
    15: 8%
    16: 8%
    17: 8%
    18: 8%
    19: 8%
    20: 8%
    21: 7%
    22: 7%
    23: 8%
    24: 7%
    25: 7%
    26: 7%
    27: 7%
    28: 7%
    29: 7%
    30: 7%
    31: 7%
    32: 6%

    #include <iostream>
    #include <thread>
    #include <atomic>
    #include <chrono>
    #include <vector>
    #include <mutex>

    using namespace std;
    using namespace chrono;

    int main()
    {
    unsigned hc = jthread::hardware_concurrency();
    auto test = [&]( char const *head, auto fn, vector<double> &times )
    {
    cout << head << endl;
    constexpr size_t ROUNDS = 10'000;
    atomic_int64_t tSum, roundsSum;
    vector<jthread> threads;
    for( unsigned nThreads = 1; nThreads <= hc; ++nThreads )
    {
    auto st = []<typename T, integral X>( atomic<T> &a, X v ) { return
    a.store( (T)v, memory_order_relaxed ); };
    st( tSum, 0 );
    st( roundsSum, 0 );
    atomic_bool stop( false );
    auto ld = []<typename T>( atomic<T> &a ) { return a.load(
    memory_order_relaxed ); };
    for( unsigned t = nThreads; t; --t )
    threads.emplace_back( [&]
    {
    auto start = high_resolution_clock::now();
    int64_t rounds = 0;
    for( ; !ld( stop ); fn(), ++rounds );
    auto add = []( atomic_int64_t &a, int64_t add ) { a.fetch_add(
    add, memory_order_relaxed ); };
    add( tSum, duration_cast<nanoseconds>(
    high_resolution_clock::now() - start ).count() );
    add( roundsSum, rounds );
    } );
    this_thread::sleep_for( 100ms );
    st( stop, true );
    threads.resize( 0 );
    times.emplace_back( ld( tSum ) / (double)ld( roundsSum ) );
    cout << "\t" << nThreads << ": " << times.back() << endl;
    }
    };
    vector<double> mutexTimes, futexTimes;
    {
    mutex mtx;
    test( "mutex:",
    [&]
    {
    mtx.lock();
    mtx.unlock();
    }, mutexTimes );
    }
    {
    atomic_bool futex( false );
    test( "futex:",
    [&]
    {
    for( bool cmp; !futex.compare_exchange_weak( cmp = false, true,
    memory_order_acquire, memory_order_relaxed ); )
    futex.wait( true, memory_order_relaxed );
    futex.store( false, memory_order_release );
    futex.notify_one();
    }, futexTimes );
    }
    cout << "relation: " << endl;
    for( size_t i = 0; i != hc; ++i )
    cout << "\t" << i + 1 << ": " << (int)(100.0 * mutexTimes[i] / futexTimes[i] + 0.5) << "%" << endl;
    }
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Mar 30 11:51:59 2024
    From Newsgroup: comp.lang.c++

    On 3/30/2024 1:25 AM, Bonita Montero wrote:
    {
                    for( bool cmp; !futex.compare_exchange_weak( cmp =
    false, true, memory_order_acquire, memory_order_relaxed ); )
                        futex.wait( true, memory_order_relaxed );
                    futex.store( false, memory_order_release );
                    futex.notify_one();
                }, futexTimes );


    You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a
    hard time finding the damn thing in comp.lang.c++. I will try to find it.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Mar 30 12:00:31 2024
    From Newsgroup: comp.lang.c++

    On 3/29/2024 6:14 AM, Bonita Montero wrote:
    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex explodes and the conventional mutex is faster with Windows as and with
    Linux.
    [...]

    I think I found my highly experimental windows futex based mutex:


    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

    (read all to refresh the context... ;^)


    Notice:
    ___________________
    void unlock()
    {
    if (InterlockedExchange(&m_state, 0) == 2)
    {
    WakeByAddressSingle(&m_state);
    }
    }
    ___________________


    We only signal when we have to!


    copy and paste from the link. Sorry about the copy and paste destroying
    all of my indentation, argh!


    Can you run this Bonita? Fwiw, it would be my first time using a Windows
    Futex! lol:
    ______________________________________
    #include <iostream>
    #include <thread>
    #include <functional>

    #define WIN32_LEAN_AND_MEAN
    #include <Windows.h>


    #define CT_L2_ALIGNMENT 128
    #define CT_THREADS 32
    #define CT_ITERS 6666666


    struct ct_futex_mutex
    {
    ULONG alignas(CT_L2_ALIGNMENT) m_state;

    ct_futex_mutex() : m_state(0)
    {

    }

    void lock()
    {
    if (InterlockedExchange(&m_state, 1))
    {
    while (InterlockedExchange(&m_state, 2))
    {
    ULONG cmp = 2;
    WaitOnAddress(&m_state, &cmp, sizeof(ULONG), INFINITE);
    }
    }
    }

    void unlock()
    {
    if (InterlockedExchange(&m_state, 0) == 2)
    {
    WakeByAddressSingle(&m_state);
    }
    }
    };


    struct ct_shared
    {
    ct_futex_mutex m_mtx;
    unsigned long m_count;

    ct_shared() : m_count(0) {}

    ~ct_shared()
    {
    if (m_count != 0)
    {
    std::cout << "counter is totally fubar!\n";
    }
    }
    };


    void ct_thread(ct_shared& shared)
    {
    for (unsigned long i = 0; i < CT_ITERS; ++i)
    {
    shared.m_mtx.lock();
    ++shared.m_count;
    shared.m_mtx.unlock();

    shared.m_mtx.lock();
    --shared.m_count;
    shared.m_mtx.unlock();
    }
    }


    int main()
    {
    std::thread threads[CT_THREADS];

    std::cout << "Starting up...\n";

    {
    ct_shared shared;

    for (unsigned long i = 0; i < CT_THREADS; ++i)
    {
    threads[i] = std::thread(ct_thread, std::ref(shared));
    }

    std::cout << "Running...\n";

    for (unsigned long i = 0; i < CT_THREADS; ++i)
    {
    threads[i].join();
    }
    }

    std::cout << "Completed!\n";

    return 0;
    }
    ______________________________________


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Mar 30 12:02:28 2024
    From Newsgroup: comp.lang.c++

    On 3/30/2024 11:51 AM, Chris M. Thomasson wrote:
    On 3/30/2024 1:25 AM, Bonita Montero wrote:
    {
                     for( bool cmp; !futex.compare_exchange_weak( cmp =
    false, true, memory_order_acquire, memory_order_relaxed ); )
                         futex.wait( true, memory_order_relaxed );
                     futex.store( false, memory_order_release ); >>                  futex.notify_one();
                 }, futexTimes );


    You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a hard time finding the damn thing in comp.lang.c++. I will try to find it.

    I found my older experimental code, take a deep look at ct_futex_mutex:

    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Mar 30 21:18:06 2024
    From Newsgroup: comp.lang.c++

    Am 30.03.2024 um 19:51 schrieb Chris M. Thomasson:

    You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a hard time finding the damn thing in comp.lang.c++. I will try to find it.

    Futex notify and wait are at least very cheap with Windows.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Mar 31 13:11:52 2024
    From Newsgroup: comp.lang.c++

    On 3/30/2024 1:18 PM, Bonita Montero wrote:
    Am 30.03.2024 um 19:51 schrieb Chris M. Thomasson:

    You need to introduce some sort of waitbit logic to minimize calls to
    futex.notify_one() and futex.wait(). I posted an example but am having
    a hard time finding the damn thing in comp.lang.c++. I will try to
    find it.

    Futex notify and wait are at least very cheap with Windows.


    Well, I have not taken a look at Microsoft's Futex implementation.
    Afaict, your code showcases that aspect wrt calling into notify on every
    mutex unlock, and wait on every point of contention wrt lock. Interesting.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Mar 31 22:01:55 2024
    From Newsgroup: comp.lang.c++

    On 3/30/2024 11:51 AM, Chris M. Thomasson wrote:
    On 3/30/2024 1:25 AM, Bonita Montero wrote:
    {
                     for( bool cmp; !futex.compare_exchange_weak( cmp =
    false, true, memory_order_acquire, memory_order_relaxed ); )
                         futex.wait( true, memory_order_relaxed );
                     futex.store( false, memory_order_release ); >>                  futex.notify_one();
                 }, futexTimes );


    You need to introduce some sort of waitbit logic to minimize calls to futex.notify_one() and futex.wait(). I posted an example but am having a hard time finding the damn thing in comp.lang.c++. I will try to find it.

    Also, does notify_one automatically imply release semantics?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Mar 31 22:02:23 2024
    From Newsgroup: comp.lang.c++

    On 3/31/2024 10:01 PM, Chris M. Thomasson wrote:
    On 3/30/2024 11:51 AM, Chris M. Thomasson wrote:
    On 3/30/2024 1:25 AM, Bonita Montero wrote:
    {
                     for( bool cmp; !futex.compare_exchange_weak( cmp =
    false, true, memory_order_acquire, memory_order_relaxed ); )
                         futex.wait( true, memory_order_relaxed );
                     futex.store( false, memory_order_release );
                     futex.notify_one();
                 }, futexTimes );


    You need to introduce some sort of waitbit logic to minimize calls to
    futex.notify_one() and futex.wait(). I posted an example but am having
    a hard time finding the damn thing in comp.lang.c++. I will try to
    find it.

    Also, does notify_one automatically imply release semantics?

    I don't think so, humm. Cannot remember.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 1 08:32:08 2024
    From Newsgroup: comp.lang.c++

    Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

    Well, I have not taken a look at Microsoft's Futex implementation.
    Afaict, your code showcases that aspect wrt calling into notify on every mutex unlock, and wait on every point of contention wrt lock. Interesting.

    I'm using notify and wait only when there's contention.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Mar 31 23:40:28 2024
    From Newsgroup: comp.lang.c++

    On 3/31/2024 11:32 PM, Bonita Montero wrote:
    Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

    Well, I have not taken a look at Microsoft's Futex implementation.
    Afaict, your code showcases that aspect wrt calling into notify on
    every mutex unlock, and wait on every point of contention wrt lock.
    Interesting.

    I'm using notify and wait only when there's contention.

    Even here?
    _____________
    for( bool cmp; !futex.compare_exchange_weak( cmp =
    false, true, memory_order_acquire, memory_order_relaxed ); )
    futex.wait( true, memory_order_relaxed );
    futex.store( false, memory_order_release );
    futex.notify_one();
    }, futexTimes );
    ____________

    What am I missing?

    futex.store( false, memory_order_release );
    futex.notify_one();

    ?

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 1 10:57:32 2024
    From Newsgroup: comp.lang.c++

    Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:
    On 3/31/2024 11:32 PM, Bonita Montero wrote:
    Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

    Well, I have not taken a look at Microsoft's Futex implementation.
    Afaict, your code showcases that aspect wrt calling into notify on
    every mutex unlock, and wait on every point of contention wrt lock.
    Interesting.

    I'm using notify and wait only when there's contention.

    Even here?
    _____________
                     for( bool cmp; !futex.compare_exchange_weak( cmp =
    false, true, memory_order_acquire, memory_order_relaxed ); )
                         futex.wait( true, memory_order_relaxed );
                     futex.store( false, memory_order_release );
                     futex.notify_one();
                 }, futexTimes );
    ____________

    What am I missing?

                     futex.store( false, memory_order_release );
                     futex.notify_one();

    ?


    Ok, I was talking about my older version.
    But notify is fast.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 1 11:01:05 2024
    From Newsgroup: comp.lang.c++

    Am 01.04.2024 um 10:57 schrieb Bonita Montero:
    Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:
    On 3/31/2024 11:32 PM, Bonita Montero wrote:
    Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

    Well, I have not taken a look at Microsoft's Futex implementation.
    Afaict, your code showcases that aspect wrt calling into notify on
    every mutex unlock, and wait on every point of contention wrt lock.
    Interesting.

    I'm using notify and wait only when there's contention.

    Even here?
    _____________
                      for( bool cmp; !futex.compare_exchange_weak( cmp =
    false, true, memory_order_acquire, memory_order_relaxed ); )
                          futex.wait( true, memory_order_relaxed );
                      futex.store( false, memory_order_release );
                      futex.notify_one();
                  }, futexTimes );
    ____________

    What am I missing?

                      futex.store( false, memory_order_release );
                      futex.notify_one();

    ?


    Ok, I was talking about my older version.
    But notify is fast.


    An uncontended notify is 2.5ns on my Zen4 computer.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Michael S@already5chosen@yahoo.com to comp.lang.c++ on Mon Apr 1 12:18:00 2024
    From Newsgroup: comp.lang.c++

    On Fri, 29 Mar 2024 14:14:11 +0100
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex explodes and the conventional mutex is faster with Windows as and with
    Linux.


    In case of heavy contention what to consider 'faster' is not at all
    obvious.

    On lightly loaded system with more cores than work to do (a typical
    client) 'faster' means faster forward progress of group of contending
    threads. Achieved by very long polling before switching to wait,
    probably up to several tens of usec and by hyperactive tickless OS
    scheduler.

    On heavily loaded system with much more work to do than available
    cores, 'faster' means more work done by unrelated threads and processes. Achieved by very short polling before switching to wait, probably less
    than 500 nsec and by 'passive' OS scheduler that rarely intervenes
    outside of clock tick.

    And of course there are cases in the middle.

    And then traditional HPC with MPI that is completely different kettle of
    fish.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Apr 1 12:27:31 2024
    From Newsgroup: comp.lang.c++

    On 4/1/2024 2:01 AM, Bonita Montero wrote:
    Am 01.04.2024 um 10:57 schrieb Bonita Montero:
    Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:
    On 3/31/2024 11:32 PM, Bonita Montero wrote:
    Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

    Well, I have not taken a look at Microsoft's Futex implementation.
    Afaict, your code showcases that aspect wrt calling into notify on
    every mutex unlock, and wait on every point of contention wrt lock. >>>>> Interesting.

    I'm using notify and wait only when there's contention.

    Even here?
    _____________
                      for( bool cmp; !futex.compare_exchange_weak( cmp =
    false, true, memory_order_acquire, memory_order_relaxed ); )
                          futex.wait( true, memory_order_relaxed );
                      futex.store( false, memory_order_release );
                      futex.notify_one();
                  }, futexTimes );
    ____________

    What am I missing?

                      futex.store( false, memory_order_release );
                      futex.notify_one();

    ?


    Ok, I was talking about my older version.
    But notify is fast.


    An uncontended notify is 2.5ns on my Zen4 computer.


    For the Microsoft version, right? I don't have Linux installed at the
    moment.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Tue Apr 2 04:37:47 2024
    From Newsgroup: comp.lang.c++

    Am 01.04.2024 um 21:27 schrieb Chris M. Thomasson:
    On 4/1/2024 2:01 AM, Bonita Montero wrote:
    Am 01.04.2024 um 10:57 schrieb Bonita Montero:
    Am 01.04.2024 um 08:40 schrieb Chris M. Thomasson:
    On 3/31/2024 11:32 PM, Bonita Montero wrote:
    Am 31.03.2024 um 22:11 schrieb Chris M. Thomasson:

    Well, I have not taken a look at Microsoft's Futex implementation. >>>>>> Afaict, your code showcases that aspect wrt calling into notify on >>>>>> every mutex unlock, and wait on every point of contention wrt
    lock. Interesting.

    I'm using notify and wait only when there's contention.

    Even here?
    _____________
                      for( bool cmp; !futex.compare_exchange_weak( cmp =
    false, true, memory_order_acquire, memory_order_relaxed ); )
                          futex.wait( true, memory_order_relaxed );
                      futex.store( false, memory_order_release );
                      futex.notify_one();
                  }, futexTimes );
    ____________

    What am I missing?

                      futex.store( false, memory_order_release );
                      futex.notify_one();

    ?


    Ok, I was talking about my older version.
    But notify is fast.


    An uncontended notify is 2.5ns on my Zen4 computer.


    For the Microsoft version, right? I don't have Linux installed at the moment.

    Linux with libstdc++ is 1.7ns.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Thu Apr 4 19:56:13 2024
    From Newsgroup: comp.lang.c++

    On 4/1/2024 2:18 AM, Michael S wrote:
    On Fri, 29 Mar 2024 14:14:11 +0100
    Bonita Montero <Bonita.Montero@gmail.com> wrote:

    The following program simulates constant locking und unlocking of one
    to jthread::hardware_concurrency() threadas with a std::mutex and a
    futex. On my 16 core / 32 thread Zen4 system a futex is faster up to
    5 threads constantly contending, but beyond the CPU time of the futex
    explodes and the conventional mutex is faster with Windows as and with
    Linux.


    In case of heavy contention what to consider 'faster' is not at all
    obvious.

    On lightly loaded system with more cores than work to do (a typical
    client) 'faster' means faster forward progress of group of contending threads. Achieved by very long polling before switching to wait,
    probably up to several tens of usec and by hyperactive tickless OS
    scheduler.

    On heavily loaded system with much more work to do than available
    cores, 'faster' means more work done by unrelated threads and processes. Achieved by very short polling before switching to wait, probably less
    than 500 nsec and by 'passive' OS scheduler that rarely intervenes
    outside of clock tick.

    And of course there are cases in the middle.

    And then traditional HPC with MPI that is completely different kettle of fish.


    Ditto. However, I did find it interesting (20+ years ago) to test worst
    case scenarios. What happens when I blast this mutex algorithm A with artificial load vs algo B forcing it into horrible periods of radical contention. How do they handle it...
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Fri Apr 12 14:16:52 2024
    From Newsgroup: comp.lang.c++

    On 4/1/2024 7:37 PM, Bonita Montero wrote:
    [...]
    Ok, I was talking about my older version.
    But notify is fast.


    An uncontended notify is 2.5ns on my Zen4 computer.

    Keep in mind that notify and wait are slow paths. One needs to strive to minimize calls to notify and wait. Calling notify and/or wait when they
    do _have_ to be called is a rather inefficient design.




    For the Microsoft version, right? I don't have Linux installed at the
    moment.

    Linux with libstdc++ is 1.7ns.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Apr 13 03:00:21 2024
    From Newsgroup: comp.lang.c++

    Am 12.04.2024 um 23:16 schrieb Chris M. Thomasson:
    On 4/1/2024 7:37 PM, Bonita Montero wrote:
    [...]
    Ok, I was talking about my older version.
    But notify is fast.


    An uncontended notify is 2.5ns on my Zen4 computer.

    Keep in mind that notify and wait are slow paths.

    There's no slow path with futexes since you've to notify _always_.
    And 1.7ns isn't slow.

    One needs to strive to minimize calls to notify and wait. Calling notify and/or wait when they
    do _have_ to be called is a rather inefficient design.




    For the Microsoft version, right? I don't have Linux installed at the
    moment.

    Linux with libstdc++ is 1.7ns.



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 12:26:55 2024
    From Newsgroup: comp.lang.c++

    On 4/12/2024 6:00 PM, Bonita Montero wrote:
    Am 12.04.2024 um 23:16 schrieb Chris M. Thomasson:
    On 4/1/2024 7:37 PM, Bonita Montero wrote:
    [...]
    Ok, I was talking about my older version.
    But notify is fast.


    An uncontended notify is 2.5ns on my Zen4 computer.

    Keep in mind that notify and wait are slow paths.

    There's no slow path with futexes since you've to notify _always_.

    Huh? Nope! Where did you get that idea from? calling notify is a slow
    path, and calling wait is a slow path.

    And 1.7ns isn't slow.

    You do not want to _always_ call notify and/or wait when you do not have to.



    One needs to strive to  minimize calls to notify and wait. Calling
    notify and/or wait when they do _have_ to be called is a rather
    inefficient design.






    For the Microsoft version, right? I don't have Linux installed at
    the moment.

    Linux with libstdc++ is 1.7ns.




    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 06:19:16 2024
    From Newsgroup: comp.lang.c++

    Am 13.04.2024 um 21:26 schrieb Chris M. Thomasson:

    On 4/12/2024 6:00 PM, Bonita Montero wrote:

    There's no slow path with futexes since you've to notify _always_.

    Huh? Nope! Where did you get that idea from? calling notify is a slow
    path, and calling wait is a slow path.

    Unlocking a futex always involves a notify since the notify doesn't
    depend on a state like a wait().


    And 1.7ns isn't slow.

    You do not want to _always_ call notify and/or wait when you do not have
    to.

    You don't unterstand futexes. Show me your mutex basing on C++20 futexes.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 21:26:12 2024
    From Newsgroup: comp.lang.c++

    On 4/13/2024 9:19 PM, Bonita Montero wrote:
    Am 13.04.2024 um 21:26 schrieb Chris M. Thomasson:

    On 4/12/2024 6:00 PM, Bonita Montero wrote:

    There's no slow path with futexes since you've to notify _always_.

    Huh? Nope! Where did you get that idea from? calling notify is a slow
    path, and calling wait is a slow path.

    Unlocking a futex always involves a notify since the notify doesn't
    depend on a state like a wait().


    And 1.7ns isn't slow.

    You do not want to _always_ call notify and/or wait when you do not
    have to.

    You don't unterstand futexes. Show me your mutex basing on C++20 futexes.

    Did you read it? using windwos futexes:

    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

    Notice how it does not unlock all the time?

    I can port it to c++20.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 06:30:52 2024
    From Newsgroup: comp.lang.c++

    Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

    On 4/13/2024 9:19 PM, Bonita Montero wrote:

    You don't unterstand futexes. Show me your mutex basing on C++20 futexes.

    Did you read it? using windwos futexes: https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
    Notice how it does not unlock all the time?
    I can port it to c++20.

    Show me your mutex with a futex.
    It's impossible to write that without notifying *always*.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 06:34:09 2024
    From Newsgroup: comp.lang.c++

    Am 14.04.2024 um 06:30 schrieb Bonita Montero:
    Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

    On 4/13/2024 9:19 PM, Bonita Montero wrote:

    You don't unterstand futexes. Show me your mutex basing on C++20
    futexes.

    Did you read it? using windwos futexes:
    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
    Notice how it does not unlock all the time?
    I can port it to c++20.

    Show me your mutex with a futex.
    It's impossible to write that without notifying *always*.

    That's the simplest code in C++20:

    struct fute_xchg_mutex
    {
    fute_xchg_mutex();
    void lock();
    void unlock();
    private:
    atomic_bool m_locked;
    };

    fute_xchg_mutex::fute_xchg_mutex() :
    m_locked( false )
    {
    }

    void fute_xchg_mutex::lock()
    {
    while( m_locked.exchange( true, memory_order_acquire ) )
    m_locked.wait( true, memory_order_relaxed );
    }

    void fute_xchg_mutex::unlock()
    {
    m_locked.store( false, memory_order_release );
    m_locked.notify_one();
    }


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:21:22 2024
    From Newsgroup: comp.lang.c++

    On 4/13/2024 9:30 PM, Bonita Montero wrote:
    Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

    On 4/13/2024 9:19 PM, Bonita Montero wrote:

    You don't unterstand futexes. Show me your mutex basing on C++20
    futexes.

    Did you read it? using windwos futexes:
    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
    Notice how it does not unlock all the time?
    I can port it to c++20.

    Show me your mutex with a futex.
    It's impossible to write that without notifying *always*.

    I beg to disagree.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:21:42 2024
    From Newsgroup: comp.lang.c++

    On 4/13/2024 9:34 PM, Bonita Montero wrote:
    Am 14.04.2024 um 06:30 schrieb Bonita Montero:
    Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

    On 4/13/2024 9:19 PM, Bonita Montero wrote:

    You don't unterstand futexes. Show me your mutex basing on C++20
    futexes.

    Did you read it? using windwos futexes:
    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
    Notice how it does not unlock all the time?
    I can port it to c++20.

    Show me your mutex with a futex.
    It's impossible to write that without notifying *always*.

    That's the simplest code in C++20:

    struct fute_xchg_mutex
    {
        fute_xchg_mutex();
        void lock();
        void unlock();
    private:
        atomic_bool m_locked;
    };

    fute_xchg_mutex::fute_xchg_mutex() :
        m_locked( false )
    {
    }

    void fute_xchg_mutex::lock()
    {
        while( m_locked.exchange( true, memory_order_acquire ) )
            m_locked.wait( true, memory_order_relaxed );
    }

    void fute_xchg_mutex::unlock()
    {
        m_locked.store( false, memory_order_release );
        m_locked.notify_one();
    ^^^^^^^^^^

    NONONNOONONO!

    }



    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:22:29 2024
    From Newsgroup: comp.lang.c++

    On 4/13/2024 9:30 PM, Bonita Montero wrote:
    Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

    On 4/13/2024 9:19 PM, Bonita Montero wrote:

    You don't unterstand futexes. Show me your mutex basing on C++20
    futexes.

    Did you read it? using windwos futexes:
    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
    Notice how it does not unlock all the time?
    I can port it to c++20.

    Show me your mutex with a futex.
    It's impossible to write that without notifying *always*.

    this is a complete troll, right?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 07:30:59 2024
    From Newsgroup: comp.lang.c++

    Am 14.04.2024 um 07:22 schrieb Chris M. Thomasson:
    On 4/13/2024 9:30 PM, Bonita Montero wrote:
    Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

    On 4/13/2024 9:19 PM, Bonita Montero wrote:

    You don't unterstand futexes. Show me your mutex basing on C++20
    futexes.

    Did you read it? using windwos futexes:
    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ
    Notice how it does not unlock all the time?
    I can port it to c++20.

    Show me your mutex with a futex.
    It's impossible to write that without notifying *always*.

    this is a complete troll, right?

    You don't know futexes.
    Show me your code with a mutex basing on a futex.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:32:27 2024
    From Newsgroup: comp.lang.c++

    On 4/13/2024 10:30 PM, Bonita Montero wrote:
    Am 14.04.2024 um 07:22 schrieb Chris M. Thomasson:
    On 4/13/2024 9:30 PM, Bonita Montero wrote:
    Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

    On 4/13/2024 9:19 PM, Bonita Montero wrote:

    You don't unterstand futexes. Show me your mutex basing on C++20
    futexes.

    Did you read it? using windwos futexes:
    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ >>>> Notice how it does not unlock all the time?
    I can port it to c++20.

    Show me your mutex with a futex.
    It's impossible to write that without notifying *always*.

    this is a complete troll, right?

    You don't know futexes.
    Show me your code with a mutex basing on a futex.


    I did. Read again, well Windows Futex:

    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

    Do you really want me to port this to c++20?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 07:36:25 2024
    From Newsgroup: comp.lang.c++

    Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:
    On 4/13/2024 10:30 PM, Bonita Montero wrote:
    Am 14.04.2024 um 07:22 schrieb Chris M. Thomasson:
    On 4/13/2024 9:30 PM, Bonita Montero wrote:
    Am 14.04.2024 um 06:26 schrieb Chris M. Thomasson:

    On 4/13/2024 9:19 PM, Bonita Montero wrote:

    You don't unterstand futexes. Show me your mutex basing on C++20
    futexes.

    Did you read it? using windwos futexes:
    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ >>>>> Notice how it does not unlock all the time?
    I can port it to c++20.

    Show me your mutex with a futex.
    It's impossible to write that without notifying *always*.

    this is a complete troll, right?

    You don't know futexes.
    Show me your code with a mutex basing on a futex.


    I did. Read again, well Windows Futex:

    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

    Do you really want me to port this to c++20?

    Show it here with correct formatting.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:42:57 2024
    From Newsgroup: comp.lang.c++

    On 4/13/2024 10:36 PM, Bonita Montero wrote:
    Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:
    [...]
    I did. Read again, well Windows Futex:

    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

    Do you really want me to port this to c++20?

    Show it here with correct formatting.


    Do you mean, indentation? ;^)

    Anyway, you commented in the thread I linked you to.

    So, you can read my code, right?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 07:46:08 2024
    From Newsgroup: comp.lang.c++

    Am 14.04.2024 um 07:42 schrieb Chris M. Thomasson:
    On 4/13/2024 10:36 PM, Bonita Montero wrote:
    Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:
    [...]
    I did. Read again, well Windows Futex:

    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

    Do you really want me to port this to c++20?

    Show it here with correct formatting.


    Do you mean, indentation? ;^)

    Anyway, you commented in the thread I linked you to.

    So, you can read my code, right?

    You don't understand futexes since you call notifying with
    the slow path slow. But notifying is even not slow if there
    are contenders with futexes.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:46:33 2024
    From Newsgroup: comp.lang.c++

    On 4/13/2024 10:42 PM, Chris M. Thomasson wrote:
    On 4/13/2024 10:36 PM, Bonita Montero wrote:
    Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:
    [...]
    I did. Read again, well Windows Futex:

    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ

    Do you really want me to port this to c++20?

    Show it here with correct formatting.


    Do you mean, indentation? ;^)

    Anyway, you commented in the thread I linked you to.

    So, you can read my code, right?

    I use the explicit Windows API for a futex. Think of:

    https://learn.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress

    ;^)
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Apr 13 22:48:45 2024
    From Newsgroup: comp.lang.c++

    On 4/13/2024 10:46 PM, Bonita Montero wrote:
    Am 14.04.2024 um 07:42 schrieb Chris M. Thomasson:
    On 4/13/2024 10:36 PM, Bonita Montero wrote:
    Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:
    [...]
    I did. Read again, well Windows Futex:

    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ >>>>
    Do you really want me to port this to c++20?

    Show it here with correct formatting.


    Do you mean, indentation? ;^)

    Anyway, you commented in the thread I linked you to.

    So, you can read my code, right?

    You don't understand futexes since you call notifying with
    the slow path slow. But notifying is even not slow if there
    are contenders with futexes.

    What? I am quite sure you read this right?
    ____________________
    void unlock()
    {
    if (InterlockedExchange(&m_state, 0) == 2)
    {
    WakeByAddressSingle(&m_state);
    }
    }
    ____________________

    Take careful notice how I try to avoid a call into WakeByAddressSingle?
    See? ;^o

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 11:25:33 2024
    From Newsgroup: comp.lang.c++

    Am 14.04.2024 um 07:48 schrieb Chris M. Thomasson:
    On 4/13/2024 10:46 PM, Bonita Montero wrote:
    Am 14.04.2024 um 07:42 schrieb Chris M. Thomasson:
    On 4/13/2024 10:36 PM, Bonita Montero wrote:
    Am 14.04.2024 um 07:32 schrieb Chris M. Thomasson:
    [...]
    I did. Read again, well Windows Futex:

    https://groups.google.com/g/comp.lang.c++/c/1MZvhswJ6DQ/m/qyaYH-i0CgAJ >>>>>
    Do you really want me to port this to c++20?

    Show it here with correct formatting.


    Do you mean, indentation? ;^)

    Anyway, you commented in the thread I linked you to.

    So, you can read my code, right?

    You don't understand futexes since you call notifying with
    the slow path slow. But notifying is even not slow if there
    are contenders with futexes.

    What? I am quite sure you read this right?
    ____________________
    void unlock()
    {
        if (InterlockedExchange(&m_state, 0) == 2)
        {
            WakeByAddressSingle(&m_state);
        }
    }
    ____________________

    Take careful notice how I try to avoid a call into WakeByAddressSingle?
    See? ;^o


    This is from a benchmark of two constantly contending threads:

    atomic with locker counter + std::binary_semaphore:
    10.4597
    atoomic with locker counter + shown futephore:
    14.9949
    Chris Code:
    18.6445
    shown atomic_bool with always wakeup:
    24.7084
    std::mutex:
    33.8646

    The numbers are in nanoseconds for a locking unlocking iteration.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 15:12:17 2024
    From Newsgroup: comp.lang.c++

    That's my code:

    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <chrono>
    #include <semaphore>
    #include <algorithm>
    #include <vector>

    using namespace std;
    using namespace chrono;

    int main()
    {
    using kv = pair<char const *, double>;
    vector<kv> results;
    auto bench = [&]( char const *what, auto fn )
    {
    constexpr size_t ROUNDS = 10'000'000;
    atomic_int64_t tSum = 0;
    auto thr = [&]()
    {
    auto start = high_resolution_clock::now();
    for( size_t r = ROUNDS; r; --r )
    fn();
    tSum.fetch_add( duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() );
    };
    jthread jtA( thr ), jtB( thr );
    jtA.join(), jtB.join();
    results.emplace_back( what, tSum / ((double)ROUNDS * 2.0) );
    };
    // atomic coutner with futex-semaphore
    atomic_uint32_t
    lockCounter( 0 ),
    futSem( 0 );
    bench( "atomic coutner with futex-semaphore", [&]
    {
    if( lockCounter.fetch_add( 1, memory_order_acquire ) > 0 )
    for( uint32_t before = futSem.load(
    memory_order_relaxed ); ; )
    if( !before )
    futSem.wait( 0, memory_order_relaxed ),
    before = futSem.load( memory_order_relaxed );
    else if( futSem.compare_exchange_strong( before,
    before - 1, memory_order_acquire, memory_order_relaxed ) ) [[likely]]
    break;
    if( lockCounter.fetch_sub( 1, memory_order_relaxed ) > 1 )
    for( uint32_t before = futSem.load(
    memory_order_relaxed ); ; )
    if( futSem.compare_exchange_strong( before, before
    + 1, memory_order_release, memory_order_relaxed ) )
    {
    futSem.notify_one();
    break;
    }
    } );
    // std::mutex
    mutex mtx;
    bench( "std::mutex", [&] { mtx.lock(), mtx.unlock(); });
    // atomic coutner with a binary_semaphore
    binary_semaphore sem( 0 );
    bench( "atomic coutner with binary_semaphore", [&]
    {
    if( lockCounter.fetch_add( 1, memory_order_acquire ) > 0 )
    sem.acquire();
    if( lockCounter.fetch_sub( 1, memory_order_release ) > 1 )
    sem.release();
    } );
    // xchg-mutex and always wakeup
    atomic_bool lockFlag( false );
    bench( "xchg-mutex and always wakeup", [&]
    {
    while( lockFlag.exchange( true, memory_order_acquire ) )
    lockFlag.wait( true, memory_order_relaxed );
    lockFlag.store( false, memory_order_release );
    lockFlag.notify_one();
    } );
    // Chris' mutex
    atomic_int lockState( 0 );
    bench( "Chris' mutex", [&]
    {
    if( lockState.exchange( 1, memory_order_acquire ) )
    while( lockState.exchange( 2, memory_order_acquire ) )
    lockState.wait( 2, memory_order_acquire );
    if( lockState.exchange( 0, memory_order_release ) == 2 )
    lockState.notify_one();
    } );
    sort( results.begin(), results.end(), []( kv &lhs, kv &rhs ) {
    return lhs.second < rhs.second; } );
    for( kv &result : results )
    cout << result.first << ": " << result.second << endl;
    }
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 12:02:18 2024
    From Newsgroup: comp.lang.c++

    On 4/14/2024 6:12 AM, Bonita Montero wrote:
    That's my code:
    [...]

    Might as well throw my semaphore into the mix:

    https://vorbrodt.blog/2019/02/05/fast-semaphore/

    I will code up a little program, most likely today, to check out those
    neat C++20 futexes.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 12:06:07 2024
    From Newsgroup: comp.lang.c++

    On 4/14/2024 12:02 PM, Chris M. Thomasson wrote:
    On 4/14/2024 6:12 AM, Bonita Montero wrote:
    That's my code:
    [...]

    Might as well throw my semaphore into the mix:

    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    Error! It's not my semaphore!


    https://vorbrodt.blog/2019/02/05/fast-semaphore/

    I will code up a little program, most likely today, to check out those
    neat C++20 futexes.


    OOPS! I misstated, I did not create this algo. Afaict, the original can
    be found here:

    https://www.haiku-os.org/legacy-docs/benewsletter/Issue1-26.html

    Sorry everybody. Now, here is _my_ rwlock:

    https://vorbrodt.blog/2019/02/14/read-write-mutex

    All me! :^)
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 12:10:19 2024
    From Newsgroup: comp.lang.c++

    On 4/14/2024 6:12 AM, Bonita Montero wrote:
    That's my code:

    [...]
        atomic_int lockState( 0 );
        bench( "Chris' mutex", [&]
            {
                if( lockState.exchange( 1, memory_order_acquire ) )
                    while( lockState.exchange( 2, memory_order_acquire ) )


                        lockState.wait( 2, memory_order_acquire );

    No need for memory_order_acquire on the wait. Relaxed will do.


                if( lockState.exchange( 0, memory_order_release ) == 2 )
                    lockState.notify_one();
            } );
        sort( results.begin(), results.end(), []( kv &lhs, kv &rhs ) { return lhs.second < rhs.second; } );
        for( kv &result : results )
            cout << result.first << ": " << result.second << endl;
    }

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Apr 14 22:57:30 2024
    From Newsgroup: comp.lang.c++

    Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:
    On 4/14/2024 6:12 AM, Bonita Montero wrote:
    That's my code:
    [...]

    Might as well throw my semaphore into the mix:

    https://vorbrodt.blog/2019/02/05/fast-semaphore/

    I will code up a little program, most likely today, to check out those
    neat C++20 futexes.


    A semaphore with a mutex, you're funny ...
    That ain't fast.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 20:23:28 2024
    From Newsgroup: comp.lang.c++

    On 4/14/2024 1:57 PM, Bonita Montero wrote:
    Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:
    On 4/14/2024 6:12 AM, Bonita Montero wrote:
    That's my code:
    [...]

    Might as well throw my semaphore into the mix:

    https://vorbrodt.blog/2019/02/05/fast-semaphore/

    I will code up a little program, most likely today, to check out those
    neat C++20 futexes.


    A semaphore with a mutex, you're funny ...
    That ain't fast.

    The fast_semaphore::semaphore is meant to be a kernel semaphore. So, a
    Windows one will work. A Linux one will work, and the portable mutex
    condvar semaphore impl emulation will work. This semaphore is fast wrt
    its fast paths. Btw, you have heard of the benaphore before, right?

    https://www.haiku-os.org/legacy-docs/benewsletter/Issue1-26.html#Engineering1-26

    Its been there for a while... ;^) It's something that every threading
    nerd should know about. Remember when you called me a nerd several times
    in the past? lol. ;^)

    Anyway, take a close look at the fast paths vs the slow paths:

    https://pastebin.com/raw/Q7p6e7Xc

    _________________
    class fast_semaphore
    {
    public:
    fast_semaphore(int count) noexcept
    : m_count(count), m_semaphore(0) {}

    void post()
    {
    std::atomic_thread_fence(std::memory_order_release);
    int count = m_count.fetch_add(1, std::memory_order_relaxed);
    if (count < 0)
    m_semaphore.post();
    }

    void wait()
    {
    int count = m_count.fetch_sub(1, std::memory_order_relaxed);
    if (count < 1)
    m_semaphore.wait();
    std::atomic_thread_fence(std::memory_order_acquire);
    }

    private:
    std::atomic<int> m_count;
    semaphore m_semaphore;
    };
    _________________




    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 20:26:13 2024
    From Newsgroup: comp.lang.c++

    On 4/14/2024 8:23 PM, Chris M. Thomasson wrote:
    On 4/14/2024 1:57 PM, Bonita Montero wrote:
    Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:
    On 4/14/2024 6:12 AM, Bonita Montero wrote:
    That's my code:
    [...]

    Might as well throw my semaphore into the mix:

    https://vorbrodt.blog/2019/02/05/fast-semaphore/

    I will code up a little program, most likely today, to check out
    those neat C++20 futexes.


    A semaphore with a mutex, you're funny ...
    That ain't fast.
    [...]

    I think you might have a problem differentiating a fast-path from a slow-path... Humm...

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 15 06:15:55 2024
    From Newsgroup: comp.lang.c++

    Am 15.04.2024 um 05:26 schrieb Chris M. Thomasson:
    On 4/14/2024 8:23 PM, Chris M. Thomasson wrote:
    On 4/14/2024 1:57 PM, Bonita Montero wrote:
    Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:
    On 4/14/2024 6:12 AM, Bonita Montero wrote:
    That's my code:
    [...]

    Might as well throw my semaphore into the mix:

    https://vorbrodt.blog/2019/02/05/fast-semaphore/

    I will code up a little program, most likely today, to check out
    those neat C++20 futexes.


    A semaphore with a mutex, you're funny ...
    That ain't fast.
    [...]

    I think you might have a problem differentiating a fast-path from a slow-path... Humm...


    Absolutely not, but you consider a futex wake as slow.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Apr 14 22:03:24 2024
    From Newsgroup: comp.lang.c++

    On 4/14/2024 9:15 PM, Bonita Montero wrote:
    Am 15.04.2024 um 05:26 schrieb Chris M. Thomasson:
    On 4/14/2024 8:23 PM, Chris M. Thomasson wrote:
    On 4/14/2024 1:57 PM, Bonita Montero wrote:
    Am 14.04.2024 um 21:02 schrieb Chris M. Thomasson:
    On 4/14/2024 6:12 AM, Bonita Montero wrote:
    That's my code:
    [...]

    Might as well throw my semaphore into the mix:

    https://vorbrodt.blog/2019/02/05/fast-semaphore/

    I will code up a little program, most likely today, to check out
    those neat C++20 futexes.


    A semaphore with a mutex, you're funny ...
    That ain't fast.
    [...]

    I think you might have a problem differentiating a fast-path from a
    slow-path... Humm...


    Absolutely not, but you consider a futex wake as slow.


    Yes. I consider a futex notify and wake as "slow" paths. One should
    strive to avoid them. I don't know about the current impl of futex wrt
    Linux and Windows. I just treat them as kernel calls that should be on
    the slow path. Not on the fast path. The damn thing might even lock an internal hashed mutex on the address for all I know wrt an unknown impl
    of a futex.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Apr 15 08:20:51 2024
    From Newsgroup: comp.lang.c++

    Am 15.04.2024 um 07:03 schrieb Chris M. Thomasson:

    On 4/14/2024 9:15 PM, Bonita Montero wrote:

    Absolutely not, but you consider a futex wake as slow.

    Yes. I consider a futex notify and wake as "slow" paths.

    I've mentioned that a futex wake with no contenders is 1.7ns or about
    10 clock cycles on my Zen4-CPU; that's nothing worth to think about.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Apr 15 12:08:57 2024
    From Newsgroup: comp.lang.c++

    On 4/14/2024 11:20 PM, Bonita Montero wrote:
    Am 15.04.2024 um 07:03 schrieb Chris M. Thomasson:

    On 4/14/2024 9:15 PM, Bonita Montero wrote:

    Absolutely not, but you consider a futex wake as slow.

    Yes. I consider a futex notify and wake as "slow" paths.

    I've mentioned that a futex wake with no contenders is 1.7ns or about
    10 clock cycles on my Zen4-CPU; that's nothing worth to think about.


    I don't know what that futex notify is actually doing under the covers.
    One impl might be faster than another. Whatever. I still consider a
    futex notify to be a slow path because it denotes contention. Trying to
    avoid a call into it is still rather "prudent", well according to me.
    Why call a notify when we do not have to?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Tue Apr 16 06:32:11 2024
    From Newsgroup: comp.lang.c++

    Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:

    I don't know what that futex notify is actually doing under the covers.

    In 10 clock cycles you can't do much.

    One impl might be faster than another.


    With Windows it's ony 2.7ns and with Linux 1.7ns.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Tue Apr 16 11:31:34 2024
    From Newsgroup: comp.lang.c++

    Am 16.04.2024 um 06:32 schrieb Bonita Montero:
    Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:

    I don't know what that futex notify is actually doing under the covers.

    In 10 clock cycles you can't do much.

    One impl might be faster than another.


    With Windows it's ony 2.7ns and with Linux 1.7ns.

    Try it yourself:

    #include <iostream>
    #include <atomic>
    #include <chrono>

    using namespace std;
    using namespace chrono;

    int main()
    {
    atomic_bool ab;
    auto start = high_resolution_clock::now();
    for( size_t i = 1'000'000'000; i; --i )
    ab.notify_one();
    cout << duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / 1.0e9 << endl;
    }

    With WSL2 on a Zen4-CPU it's only 1.2ns, i.e. seven clock cycles.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Apr 16 14:07:35 2024
    From Newsgroup: comp.lang.c++

    On 4/16/2024 2:31 AM, Bonita Montero wrote:
    Am 16.04.2024 um 06:32 schrieb Bonita Montero:
    Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:

    I don't know what that futex notify is actually doing under the covers.

    In 10 clock cycles you can't do much.

    One impl might be faster than another.


    With Windows it's ony 2.7ns and with Linux 1.7ns.

    Try it yourself:

    #include <iostream>
    #include <atomic>
    #include <chrono>

    using namespace std;
    using namespace chrono;

    int main()
    {
        atomic_bool ab;
        auto start = high_resolution_clock::now();
        for( size_t i = 1'000'000'000; i; --i )
            ab.notify_one();
        cout << duration_cast<nanoseconds>( high_resolution_clock::now() - start ).count() / 1.0e9 << endl;
    }

    With WSL2 on a Zen4-CPU it's only 1.2ns, i.e. seven clock cycles.


    I will, however I am working on another project right now, heavy
    graphics based. Basically, I would want to see how notify_one is
    actually implemented. Bust out the disassembler.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Apr 16 14:14:25 2024
    From Newsgroup: comp.lang.c++

    On 4/15/2024 9:32 PM, Bonita Montero wrote:
    Am 15.04.2024 um 21:08 schrieb Chris M. Thomasson:

    I don't know what that futex notify is actually doing under the covers.

    In 10 clock cycles you can't do much.

    Well, futex notify might have fast paths in and of itself. To be prudent
    I would need to see how they implement it to allow a futex notify by,
    every time. Fair enough?



    One impl might be faster than another.


    With Windows it's ony 2.7ns and with Linux 1.7ns.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Wed Apr 17 10:24:20 2024
    From Newsgroup: comp.lang.c++

    Am 16.04.2024 um 23:07 schrieb Chris M. Thomasson:

    I will, however I am working on another project right now, heavy
    graphics based. Basically, I would want to see how notify_one is
    actually implemented. Bust out the disassembler.

    It will practically not matter how it is implemented if it is that fast.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Wed Apr 17 11:48:42 2024
    From Newsgroup: comp.lang.c++

    On 4/17/2024 1:24 AM, Bonita Montero wrote:
    Am 16.04.2024 um 23:07 schrieb Chris M. Thomasson:

    I will, however I am working on another project right now, heavy
    graphics based. Basically, I would want to see how notify_one is
    actually implemented. Bust out the disassembler.

    It will practically not matter how it is implemented if it is that fast.

    It just might take an internal hashed mutex to check for contention...
    Humm... I don't know until I look at it. I don't even want it to do a
    CAS, or execute any membars on a "fast-path". Since I don't know what
    its doing under the covers, I still think its "prudent" to try to avoid calling into it when we do absolutely have to.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Wed Apr 17 11:52:13 2024
    From Newsgroup: comp.lang.c++

    On 4/17/2024 11:48 AM, Chris M. Thomasson wrote:
    On 4/17/2024 1:24 AM, Bonita Montero wrote:
    Am 16.04.2024 um 23:07 schrieb Chris M. Thomasson:

    I will, however I am working on another project right now, heavy
    graphics based. Basically, I would want to see how notify_one is
    actually implemented. Bust out the disassembler.

    It will practically not matter how it is implemented if it is that fast.

    It just might take an internal hashed mutex to check for contention... Humm... I don't know until I look at it. I don't even want it to do a
    CAS, or execute any membars on a "fast-path". Since I don't know what
    its doing under the covers, I still think its "prudent" to try to avoid calling into it when we do absolutely have to.
    ^^^^^^^^^^^^^^

    God damn typos! Sorry Bonita!

    I still think its "prudent" to try to avoid calling into it, _except_
    when we absolutely have to.

    Argh! Sorry again.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Thu Apr 18 15:12:22 2024
    From Newsgroup: comp.lang.c++

    Am 16.04.2024 um 23:14 schrieb Chris M. Thomasson:

    Well, futex notify might have fast paths in and of itself. To be prudent
    I would need to see how they implement it to allow a futex notify by,
    every time. Fair enough?

    I'm asking myself if it would be possible to have context-switching as
    most as possible in userspace. If there would be a context-switch from
    one thread of a process to another thread because a timeslice expired
    the kernel should send a signal to the thread and the thread does the
    userspace context-switch by itself. Only if there's a context switch
    to another process' thread or in kernel mode the kernel's scheduler
    acts itself.
    This would give the opportunity to have voluntary context switches
    when doing locking much faster than trough the kernel, and voluntary
    context switches usually happen with a much higher frequency that
    there would be a real gain.
    With Linux this would be possible trough signals and on Windows the
    kernel could induce SEH-exceptions for a thread-switch.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Richard Damon@richard@damon-family.org to comp.lang.c++ on Thu Apr 18 22:03:12 2024
    From Newsgroup: comp.lang.c++

    On 4/18/24 9:12 AM, Bonita Montero wrote:
    Am 16.04.2024 um 23:14 schrieb Chris M. Thomasson:

    Well, futex notify might have fast paths in and of itself. To be
    prudent I would need to see how they implement it to allow a futex
    notify by, every time. Fair enough?

    I'm asking myself if it would be possible to have context-switching as
    most as possible in userspace. If there would be a context-switch from
    one thread of a process to another thread because a timeslice expired
    the kernel should send a signal to the thread and the thread does the userspace context-switch by itself. Only if there's a context switch
    to another process' thread or in kernel mode the kernel's scheduler
    acts itself.
    This would give the opportunity to have voluntary context switches
    when doing locking much faster than trough the kernel, and voluntary
    context switches usually happen with a much higher frequency that
    there would be a real gain.
    With Linux this would be possible trough signals and on Windows the
    kernel could induce SEH-exceptions for a thread-switch.


    How do you "signal" a user-thread without doing a kernel operation and a thread switch?

    Admittedly, if the kernel knows it is switching from one thread to
    another in the same process it can do a lighter weight sort of
    context-switch, but it still needs to deal with kernel space operations.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Fri Apr 19 06:18:18 2024
    From Newsgroup: comp.lang.c++

    Am 19.04.2024 um 04:03 schrieb Richard Damon:

    How do you "signal" a user-thread without doing a kernel operation
    and a thread switch?

    The signals for the involuntary userspace thread-switch would be sent
    by a dedicated kernel-thread or by the timer interrupt. This would be
    more expensive than a thread-switch through the timer interrupt but
    as voluntary thread-switches have a much higher frequency this would
    be outweighed.

    Admittedly, if the kernel knows it is switching from one thread
    to another in the same process it can do a lighter weight sort of context-switch, but it still needs to deal with kernel space operations.

    A context switch through the kernel is always expensive. A user
    -level thread switch when blocking for a lock would be much faster.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c++ on Fri Apr 19 13:41:36 2024
    From Newsgroup: comp.lang.c++

    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 19.04.2024 um 04:03 schrieb Richard Damon:

    Admittedly, if the kernel knows it is switching from one thread
    to another in the same process it can do a lighter weight sort of
    context-switch, but it still needs to deal with kernel space operations.

    A context switch through the kernel is always expensive. A user
    -level thread switch when blocking for a lock would be much faster.

    SVR4.2MP implemented a M-N thread model (M user threads mapped to
    N kernel threads). Turned out not to work well.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Fri Apr 19 15:44:37 2024
    From Newsgroup: comp.lang.c++

    Am 19.04.2024 um 15:41 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 19.04.2024 um 04:03 schrieb Richard Damon:

    Admittedly, if the kernel knows it is switching from one thread
    to another in the same process it can do a lighter weight sort of
    context-switch, but it still needs to deal with kernel space operations.

    A context switch through the kernel is always expensive. A user
    -level thread switch when blocking for a lock would be much faster.

    SVR4.2MP implemented a M-N thread model (M user threads mapped to
    N kernel threads). Turned out not to work well.

    The thing that I'm imaging is still 1:1 but if threads are in userspace thread-switching would be done by the userspace.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c++ on Fri Apr 19 14:48:44 2024
    From Newsgroup: comp.lang.c++

    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 19.04.2024 um 15:41 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 19.04.2024 um 04:03 schrieb Richard Damon:

    Admittedly, if the kernel knows it is switching from one thread
    to another in the same process it can do a lighter weight sort of
    context-switch, but it still needs to deal with kernel space operations. >>>
    A context switch through the kernel is always expensive. A user
    -level thread switch when blocking for a lock would be much faster.

    SVR4.2MP implemented a M-N thread model (M user threads mapped to
    N kernel threads). Turned out not to work well.

    The thing that I'm imaging is still 1:1 but if threads are in userspace >thread-switching would be done by the userspace.

    Feel free to prototype it using setcontext(2), getcontext(2) and makecontext(2).
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Fri Apr 19 17:32:38 2024
    From Newsgroup: comp.lang.c++

    Am 19.04.2024 um 16:48 schrieb Scott Lurndal:

    Feel free to prototype it using setcontext(2), getcontext(2) and makecontext(2).

    I'd need the support of the kernel which should not make context
    switches to another thread inside the same process if the thread
    is within userspace. And the kernel should have to periodically
    inject signals from the timer interrupt to userspace to make it
    possible that the userspace-code does the involuntary context
    -switch on its own. And I'd need synchronization-primitives like
    mutexes and semaphores that would do the otherwise costly context
    -switch in userspace; but that's rather easy compared to the kernel
    support.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c++ on Fri Apr 19 18:27:14 2024
    From Newsgroup: comp.lang.c++

    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 19.04.2024 um 16:48 schrieb Scott Lurndal:

    Feel free to prototype it using setcontext(2), getcontext(2) and
    makecontext(2).

    I'd need the support of the kernel which should not make context
    switches to another thread inside the same process if the thread
    is within userspace. And the kernel should have to periodically
    inject signals from the timer interrupt to userspace to make it
    possible that the userspace-code does the involuntary context
    -switch on its own. And I'd need synchronization-primitives like
    mutexes and semaphores that would do the otherwise costly context
    -switch in userspace; but that's rather easy compared to the kernel
    support.

    https://www.kernel.org/

    Feel free to modify the kernel to your heart's content.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Apr 20 04:48:47 2024
    From Newsgroup: comp.lang.c++

    Am 19.04.2024 um 20:27 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 19.04.2024 um 16:48 schrieb Scott Lurndal:

    Feel free to prototype it using setcontext(2), getcontext(2) and
    makecontext(2).

    I'd need the support of the kernel which should not make context
    switches to another thread inside the same process if the thread
    is within userspace. And the kernel should have to periodically
    inject signals from the timer interrupt to userspace to make it
    possible that the userspace-code does the involuntary context
    -switch on its own. And I'd need synchronization-primitives like
    mutexes and semaphores that would do the otherwise costly context
    -switch in userspace; but that's rather easy compared to the kernel
    support.

    https://www.kernel.org/

    Feel free to modify the kernel to your heart's content.

    Seems you don't understand the idead and you think this isn't
    possible.

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Richard Damon@richard@damon-family.org to comp.lang.c++ on Fri Apr 19 23:00:18 2024
    From Newsgroup: comp.lang.c++

    On 4/19/24 12:18 AM, Bonita Montero wrote:
    Am 19.04.2024 um 04:03 schrieb Richard Damon:

    How do you "signal" a user-thread without doing a kernel operation
    and a thread switch?

    The signals for the involuntary userspace thread-switch would be sent
    by a dedicated kernel-thread or by the timer interrupt. This would be
    more expensive than a thread-switch through the timer interrupt but
    as voluntary thread-switches have a much higher frequency this would
    be outweighed.

    TO WHAT?

    Are you going to reserve a core with a dedicated thread to do this?

    To "interrupt" a user thread to notify it, you would either need to
    perform a context switch to save the threads previous context or make
    the interrupt non-returnable. If you are going to context switch to the notification thread, you might as well switch the the new user-thread
    that you want to go to.


    Admittedly, if the kernel knows it is switching from one thread
    to another in the same process it can do a lighter weight sort of
    context-switch, but it still needs to deal with kernel space operations.

    A context switch through the kernel is always expensive. A user
    -level thread switch when blocking for a lock would be much faster.

    But you still had the kernal doing a context switch.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Apr 20 06:11:57 2024
    From Newsgroup: comp.lang.c++

    Am 20.04.2024 um 05:00 schrieb Richard Damon:
    On 4/19/24 12:18 AM, Bonita Montero wrote:
    Am 19.04.2024 um 04:03 schrieb Richard Damon:

    How do you "signal" a user-thread without doing a kernel operation
    and a thread switch?

    The signals for the involuntary userspace thread-switch would be sent
    by a dedicated kernel-thread or by the timer interrupt. This would be
    more expensive than a thread-switch through the timer interrupt but
    as voluntary thread-switches have a much higher frequency this would
    be outweighed.

    TO WHAT?

    Are you going to reserve a core with a dedicated thread to do this?

    A userspace thread is scheduled on a core if there's no other thread
    on that core. But if there would be other threads eligible to run on
    the core they would be scheduled by a signal in userspace if their
    context is also in userspace.

    To "interrupt" a user thread to notify it, you would either need to
    perform a context switch to save the threads previous context or make
    the interrupt non-returnable. If you are going to context switch to the notification thread, you might as well switch the the new user-thread
    that you want to go to.

    You simply don't understand my idea !


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c++ on Sat Apr 20 15:16:49 2024
    From Newsgroup: comp.lang.c++

    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 19.04.2024 um 20:27 schrieb Scott Lurndal:
    Bonita Montero <Bonita.Montero@gmail.com> writes:
    Am 19.04.2024 um 16:48 schrieb Scott Lurndal:

    Feel free to prototype it using setcontext(2), getcontext(2) and
    makecontext(2).

    I'd need the support of the kernel which should not make context
    switches to another thread inside the same process if the thread
    is within userspace. And the kernel should have to periodically
    inject signals from the timer interrupt to userspace to make it
    possible that the userspace-code does the involuntary context
    -switch on its own. And I'd need synchronization-primitives like
    mutexes and semaphores that would do the otherwise costly context
    -switch in userspace; but that's rather easy compared to the kernel
    support.

    https://www.kernel.org/

    Feel free to modify the kernel to your heart's content.

    Seems you don't understand the idead and you think this isn't
    possible.


    It seems the lack of understanding is on you. Du verstehst es nicht.

    "... should not make context switches to another thread inside
    the same process if the thread is within userspace".

    It seems clear that the thread within userspace is completely
    invisible to the kernel, thus it cannot by definition switch to it.

    It's not that I don't think it is possible, I just don't think
    it provides any measurable benefit for the added complexity.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Apr 20 17:36:35 2024
    From Newsgroup: comp.lang.c++

    Am 20.04.2024 um 17:16 schrieb Scott Lurndal:

    It seems clear that the thread within userspace is completely
    invisible to the kernel, thus it cannot by definition switch to it.

    I'm not talking about userlevel-threads but the state when a thread
    is in userspace-state. You lack imagination since you don't understand
    my idea.

    --- Synchronet 3.20a-Linux NewsLink 1.114