• Idea for spin-wait loops

    From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sat Mar 23 17:53:40 2024
    From Newsgroup: comp.lang.c++

    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.
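
To make the intended semantics concrete, here is a minimal software sketch of "wait while the word equals a value, until it changes or a timeout expires". The name `wait_while_equal` is hypothetical, and the sketch uses `std::chrono::steady_clock` in place of the timestamp counter. It still polls, which is exactly what the proposed instruction is meant to avoid; it only shows the interface and the two exit conditions.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical software stand-in for the proposed instruction.
// Returns true if `word` was observed to differ from `expected`,
// false if the timeout expired while the value was still equal.
inline bool wait_while_equal(const std::atomic<std::uint32_t>& word,
                             std::uint32_t expected,
                             std::chrono::nanoseconds timeout)
{
    assert(timeout >= std::chrono::nanoseconds::zero());
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (word.load(std::memory_order_acquire) == expected) {
        if (std::chrono::steady_clock::now() >= deadline)
            return false;           // timed out, value unchanged
        // The proposed hardware would sleep here until the cache line
        // is modified; this sketch can only yield and poll again.
        std::this_thread::yield();
    }
    return true;                    // value changed
}
```

A caller would typically re-check its real condition after a `true` return, since a change to the cache line does not guarantee that the lock or flag is in the state the caller wants.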
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Mar 23 13:52:49 2024
    From Newsgroup: comp.lang.c++

    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex
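
For readers who have not used it: the Linux futex system call already offers "sleep while the word equals a value, with a timeout" at the kernel level. The following is a rough sketch, Linux-only, with hypothetical wrapper names `futex_wait_for` and `futex_wake_one`; real code would access the futex word with atomic operations and handle `EINTR`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Sleep while *addr == expected, for at most timeout_ns nanoseconds.
// Returns 0 after a wake-up, or -1 with errno set:
//   EAGAIN    the word already differed from `expected`
//   ETIMEDOUT the relative timeout expired
inline long futex_wait_for(std::uint32_t* addr, std::uint32_t expected,
                           long timeout_ns)
{
    struct timespec ts{};
    ts.tv_sec  = timeout_ns / 1000000000L;
    ts.tv_nsec = timeout_ns % 1000000000L;
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected,
                   &ts, nullptr, 0);
}

// Wake at most one thread sleeping on addr; returns the number woken.
inline long futex_wake_one(std::uint32_t* addr)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, 1,
                   nullptr, nullptr, 0);
}
```

The difference from the proposal is that the waiter goes through the kernel and is descheduled, whereas the proposed instruction would keep the hardware thread parked in user space.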
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sat Mar 23 13:58:02 2024
    From Newsgroup: comp.lang.c++

    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Mar 24 07:37:33 2024
    From Newsgroup: comp.lang.c++

    Am 23.03.2024 um 21:52 schrieb Chris M. Thomasson:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    Not all kinds of mutexes can be done with a futex.

  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Sun Mar 24 07:38:02 2024
    From Newsgroup: comp.lang.c++

    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.

  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Sun Mar 24 12:33:42 2024
    From Newsgroup: comp.lang.c++

    On 3/23/2024 11:37 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:52 schrieb Chris M. Thomasson:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    Not all kinds of mutexes can be done with a futex.


    Have you ever heard of an asymmetric mutex?
  • From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c++ on Sun Mar 24 20:43:37 2024
    From Newsgroup: comp.lang.c++

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until

    A processor which doesn't own (or have a shared copy of) the
    cacheline which would contain that word in memory will never know
    if it was modified, as it won't see the invalidate messages in
    a directory-based cache subsystem (leaving aside noncachable
    accesses to the word in memory, of course).

    This sounds like a solution to a problem that doesn't exist,
    and there would be no incentive for a processor designer
    to include the substantial additional complexity required
    to support your feature.
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Mar 25 07:23:14 2024
    From Newsgroup: comp.lang.c++

    Am 24.03.2024 um 21:43 schrieb Scott Lurndal:

    This sounds like a solution to a problem that doesn't exist,
    and there would be no incentive for a processor designer
    to include the substantial additional complexity required
    to support your feature.

    MONITOR / MWAIT is nearly the same except for the timeout.
  • From Michael S@already5chosen@yahoo.com to comp.lang.c++ on Mon Mar 25 14:34:50 2024
    From Newsgroup: comp.lang.c++

    On Sun, 24 Mar 2024 20:43:37 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until

    A processor which doesn't own (or have a shared copy of) the
    cacheline which would contain that word in memory will never know
    if it was modified, as it won't see the invalidate messages in
    a directory-based cache subsystem (leaving aside noncachable
    accesses to the word in memory, of course).


    It seems, I didn't understand the idea.
    Of course, the waiting thread/core has the word in question in its
    L1D cache when it enters the wait loop.
    Of course, it is awakened if/when the word is evicted from the cache
    for an unrelated reason, i.e. in practice because of a capacity conflict
    caused by activity of other threads that are running on the same
    core. There is nothing wrong with spurious awakenings as long as they
    are rare.

    This sounds like a solution to a problem that doesn't exist,
    and there would be no incentive for a processor designer
    to include the substantial additional complexity required
    to support your feature.

    The problem does exist and the primitive proposed by Bonita is not new.
    It is a minor modification of Monitor/Mwait.
    For current Intel and AMD processors this sort of thing is
    relatively unattractive because, at 2 threads per core and with rather
    measurable throughput gains achieved by running 2 threads instead of
    one (for AMD up to 30%, for Intel a little less, but often measurable),
    each thread is a valuable resource, so you don't really want to keep it
    paused for too long. And the whole point of Bonita's amendment to the
    existing mechanism is that the software has more control over long waits.

    On IBM POWER and on a few Sun/Oracle chips there are up to 8 threads
    per core, so each thread is not that valuable. That means a longer
    uninterrupted wait makes more sense and control over the duration of
    the timeout is more desirable. But maybe IBM's and Oracle's variants of
    MWAIT already have it?




  • From Michael S@already5chosen@yahoo.com to comp.lang.c++ on Mon Mar 25 19:11:22 2024
    From Newsgroup: comp.lang.c++

    On Mon, 25 Mar 2024 14:34:50 +0200
    Michael S <already5chosen@yahoo.com> wrote:

    On Sun, 24 Mar 2024 20:43:37 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until


    A processor which doesn't own (or have a shared copy of) the
    cacheline which would contain that word in memory will never know
    if it was modified, as it won't see the invalidate messages in
    a directory-based cache subsystem (leaving aside noncachable
    accesses to the word in memory, of course).


    It seems, I didn't understand the idea.

    I meant to say 'you' instead of 'I'.

  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Mon Mar 25 18:53:52 2024
    From Newsgroup: comp.lang.c++

    Am 25.03.2024 um 13:34 schrieb Michael S:

    The problem does exist and the primitive proposed by Bonita is not new.
    It is a minor modification of Monitor/Mwait.

    Functionally the modification is minor, but the effect would be
    major since the cache-interconnect traffic would be minimized.

  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Mon Mar 25 19:48:27 2024
    From Newsgroup: comp.lang.c++

    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout... You
    are referring to user space, right?
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Tue Mar 26 11:12:07 2024
    From Newsgroup: comp.lang.c++

    Am 26.03.2024 um 03:48 schrieb Chris M. Thomasson:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout... You
    are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much
    less interconnect traffic compared to polling.
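
As a rough illustration of what "limited spinning" means here, the sketch below polls `try_lock()` a bounded number of times and then falls back to a blocking `lock()`. This is similar in spirit to glibc's adaptive mutex type (`PTHREAD_MUTEX_ADAPTIVE_NP`), but it is not glibc's implementation; `lock_with_limited_spin` and `cpu_relax` are hypothetical names. An MWAIT with a timeout would replace the polling phase.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

// Tell the core that this is one iteration of a spin-wait loop.
inline void cpu_relax() noexcept {
#if defined(__x86_64__)
    _mm_pause();
#else
    std::this_thread::yield();
#endif
}

// Poll try_lock() for at most max_spins attempts, then block in lock().
// Returns the number of failed polling attempts before the lock was taken.
template <class Mutex>
int lock_with_limited_spin(Mutex& m, int max_spins) {
    assert(max_spins >= 0);
    for (int failed = 0; failed < max_spins; ++failed) {
        if (m.try_lock())
            return failed;      // acquired during the spin phase
        cpu_relax();
    }
    m.lock();                   // spin budget exhausted: block
    return max_spins;
}
```

The caller still releases the mutex with `m.unlock()` as usual. The spin budget is a tuning parameter: too small and the thread blocks for locks that would have been released in a moment, too large and it burns cycles that a sibling SMT thread could have used.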

  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Mar 26 13:02:47 2024
    From Newsgroup: comp.lang.c++

    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    Am 26.03.2024 um 03:48 schrieb Chris M. Thomasson:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout... You
    are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much
    less interconnect traffic compared to polling.


    MWAIT is meant to get around polling?
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Mar 26 13:13:58 2024
    From Newsgroup: comp.lang.c++

    On 3/25/2024 10:53 AM, Bonita Montero wrote:
    Am 25.03.2024 um 13:34 schrieb Michael S:

    The problem does exist and the primitive proposed by Bonita is not new.
    It is a minor modification of Monitor/Mwait.

    Functionally the modification is minor, but the effect would be
    major since the cache-interconnect traffic would be minimized.


    Ask over in comp.arch
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Tue Mar 26 21:23:07 2024
    From Newsgroup: comp.lang.c++

    Am 26.03.2024 um 21:02 schrieb Chris M. Thomasson:
    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    Am 26.03.2024 um 03:48 schrieb Chris M. Thomasson:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout... You
    are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much
    less interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    MWAIT could replace polling / spinning on a mutex for a limited
    time if it would have a timeout.

  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Mar 26 13:30:45 2024
    From Newsgroup: comp.lang.c++

    On 3/26/2024 1:23 PM, Bonita Montero wrote:
    Am 26.03.2024 um 21:02 schrieb Chris M. Thomasson:
    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    Am 26.03.2024 um 03:48 schrieb Chris M. Thomasson:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout...
    You are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much
    less interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    MWAIT could replace polling / spinning on a mutex for a limited
    time if it would have a timeout.


    So, you timeout, check some other stuff, then wait again. Still sounds
    like polling?
  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Tue Mar 26 13:31:24 2024
    From Newsgroup: comp.lang.c++

    On 3/26/2024 1:30 PM, Chris M. Thomasson wrote:
    On 3/26/2024 1:23 PM, Bonita Montero wrote:
    Am 26.03.2024 um 21:02 schrieb Chris M. Thomasson:
    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    Am 26.03.2024 um 03:48 schrieb Chris M. Thomasson:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:
    On 3/23/2024 1:52 PM, Chris M. Thomasson wrote:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until
    the cacheline containing the word is modified or there's a timeout
    according to the timestamp-counter's value.
    This would eliminate active spinning and polling a value in memory.
    Polling would occur only if the cacheline would be modified by
    another thread.

    futex

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout...
    You are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much
    less interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    MWAIT could replace polling / spinning on a mutex for a limited
    time if it would have a timeout.


    So, you timeout, check some other stuff, then wait again. Still sounds
    like polling?

    Sounds like you want a hardware based futex.
  • From Bonita Montero@Bonita.Montero@gmail.com to comp.lang.c++ on Wed Mar 27 10:18:47 2024
    From Newsgroup: comp.lang.c++

    Am 26.03.2024 um 21:30 schrieb Chris M. Thomasson:

    So, you timeout, check some other stuff, then wait again.
    Still sounds like polling?

    The checks only would occur if the cacheline containing the
    word actually was modified.

  • From Michael S@already5chosen@yahoo.com to comp.lang.c++ on Wed Mar 27 17:09:57 2024
    From Newsgroup: comp.lang.c++

    On Tue, 26 Mar 2024 13:02:47 -0700
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> wrote:

    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    Am 26.03.2024 um 03:48 schrieb Chris M. Thomasson:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout...
    You are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much
    less interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    I don't know what you mean by 'get around'.
    The main point of the original Monitor/MWAIT is to allow one SMT thread
    to poll a memory address in a way that consumes almost none of the
    core's execution resources, thus allowing the other SMT thread(s) of
    the same core to run faster. A sort of more intelligent PAUSE.
    In the absence of other SMT threads, the main advantage of a polling
    loop with Monitor/MWAIT vs. a simple tight polling loop (STPL) is reduced
    power consumption.
    As far as cache coherence traffic (CCT) is concerned, a Monitor/MWAIT
    polling loop provides virtually no advantage relative to STPL. Both
    are quite efficient from the CCT perspective, at least as long as the
    programmer does not do anything stupid.
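
One common reading of "not doing anything stupid" is to spin on plain loads rather than on atomic read-modify-write operations. The sketch below is a test-and-test-and-set spinlock along those lines; `TtasSpinLock` and `cpu_relax` are hypothetical names, and this is an illustration of the general technique rather than anything specific from this thread. While the lock is held, waiters only read, so the cache line can stay shared in their caches and no coherence traffic is generated until the owner writes to it.

```cpp
#include <atomic>
#include <cassert>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

// PAUSE on x86-64; a no-op elsewhere in this sketch.
inline void cpu_relax() noexcept {
#if defined(__x86_64__)
    _mm_pause();
#endif
}

class TtasSpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() noexcept {
        for (;;) {
            // Acquiring write: needs the cache line in exclusive state.
            if (!locked_.exchange(true, std::memory_order_acquire))
                return;
            // Read-only wait: the line can stay shared in the local
            // cache until the owner's unlock() invalidates it.
            while (locked_.load(std::memory_order_relaxed))
                cpu_relax();
        }
    }
    bool try_lock() noexcept {
        // Check with a load first so a held lock costs no write attempt.
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }
    void unlock() noexcept {
        locked_.store(false, std::memory_order_release);
    }
};
```

The "stupid" variant would loop directly on `exchange`, which forces the line to bounce between cores on every iteration even though the lock is still held.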

    Later on Intel invented 'MWAIT for Power Management', which has slightly
    different objectives. But that is O.T.

  • From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c++ on Wed Mar 27 12:58:50 2024
    From Newsgroup: comp.lang.c++

    On 3/27/2024 8:09 AM, Michael S wrote:
    On Tue, 26 Mar 2024 13:02:47 -0700
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> wrote:

    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    Am 26.03.2024 um 03:48 schrieb Chris M. Thomasson:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout...
    You are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of. The advantage of an MWAIT with a timeout would be much
    less interconnect traffic compared to polling.


    MWAIT is meant to get around polling?

    I don't know what you mean by 'get around'.

    Turning a "hot" spin wait into a cooler one...

    ;^)


    The main point of the original Monitor/MWAIT is to allow one SMT thread
    to poll a memory address in a way that consumes almost none of the
    core's execution resources, thus allowing the other SMT thread(s) of
    the same core to run faster. A sort of more intelligent PAUSE.
    In the absence of other SMT threads, the main advantage of a polling
    loop with Monitor/MWAIT vs. a simple tight polling loop (STPL) is reduced
    power consumption.
    As far as cache coherence traffic (CCT) is concerned, a Monitor/MWAIT
    polling loop provides virtually no advantage relative to STPL. Both
    are quite efficient from the CCT perspective, at least as long as the
    programmer does not do anything stupid.

    Later on Intel invented 'MWAIT for Power Management', which has slightly
    different objectives. But that is O.T.


    Indeed.