• GIL-Removal Project Takes Another Step (Posting On Python-List Prohibited)

    From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.python on Fri Mar 15 01:51:12 2024
    From Newsgroup: comp.lang.python

    Python takes another step in removing a major bottleneck to
    multithreading performance <https://devclass.com/2024/03/12/python-progresses-towards-faster-concurrency-option-to-disable-gil-merged-into-main-code/>.

    Currently Python uses a combination of reference counting and garbage
    collection to spare programmers from having to keep track of
    allocated objects and remember when to dispose of them. A pure
    garbage-collection scheme, like in Java or Lisp, makes it easier to
    support multithreading, but at the cost of memory usage that can
    easily get out of hand. Reference counting helps ensure that objects
    disappear as soon as the program drops its last reference to them,
    and this works well for most objects in a typical program, with
    garbage collection as a fallback for cleaning up the rest.
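    The division of labour is easy to observe in CPython itself (a
    CPython-specific demonstration; weakref is used only to watch when
    the object dies):

```python
import gc
import weakref

class Node:
    pass

# Reference counting: an acyclic object dies the instant its last
# reference is dropped -- no collector run needed.
n = Node()
alive = weakref.ref(n)
del n
assert alive() is None

# A reference cycle keeps both refcounts above zero, so the fallback
# cyclic collector has to clean it up.
a, b = Node(), Node()
a.other, b.other = b, a
alive = weakref.ref(a)
del a, b
assert alive() is not None   # the cycle keeps the pair alive
gc.collect()
assert alive() is None       # the fallback collector reclaimed it
```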

    But Python’s present scheme protects reference counts with a single
    lock (the “Global Interpreter Lock” or “GIL”), which prevents Python
    code from taking full advantage of multiple threads. Some are
    advocating a switch to the pure garbage-collection approach, but
    fortunately (I think) this is not the plan that has been adopted by
    the Steering Council. Instead, they are going to use a technique
    known as “Biased Reference Counting”. This splits the reference
    count into two components: one managed by the thread considered to
    “own” the object (and presumably responsible for most accesses to
    it), the other managed on a shared basis by the other threads
    accessing the object (which make fewer accesses to it). This seems
    to offer the best performance in tests so far.
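    The idea can be sketched in pure Python (a toy model only -- the real
    implementation lives in C, and the field names here are invented for
    illustration):

```python
import threading

class BiasedRef:
    """Toy model of biased reference counting."""

    def __init__(self):
        self.owner = threading.get_ident()  # the thread that created us
        self.local_count = 1    # owner-only: plain, unsynchronized updates
        self.shared_count = 0   # other threads: updated under a lock
        self._lock = threading.Lock()

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local_count += 1        # fast path: no synchronization
        else:
            with self._lock:             # slow path: shared component
                self.shared_count += 1

    def decref(self):
        if threading.get_ident() == self.owner:
            self.local_count -= 1
        else:
            with self._lock:
                self.shared_count -= 1
        # The object is garbage once both components reach zero.
        return self.local_count + self.shared_count == 0

obj = BiasedRef()
obj.incref()                             # owner: uncontended fast path
t = threading.Thread(target=obj.incref)  # other thread: locked path
t.start()
t.join()
```

    Since most objects are only ever touched by their owning thread, the
    uncontended fast path dominates, which is how this recovers most of
    the single-threaded performance that naive atomic refcounting loses.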

    The switchover is a complicated procedure, which is certain to have
    implications for some existing Python code that never had to worry
    about thread safety before, as well as far-reaching implications for
    the design of the CPython implementation itself. So it will take
    place in multiple stages over some years, and if worst comes to
    worst, the changes can always be rolled back. (Or a different
    strategy chosen.)

    Seems some people are still smarting over the flak they got from the
    Python 2 → 3 transition. “This is not Python 4,” they are saying.
    But why not call it “Python 4”, as a warning over the likely
    compatibility issues? Even if it probably won’t be quite as painful ...
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.python on Mon Mar 18 17:11:52 2024
    From Newsgroup: comp.lang.python

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    > But Python’s present scheme for maintaining reference counts (the
    > “Global Interpreter Lock” or “GIL”) prevents Python code from taking
    > full advantage of multiple threads. Some are advocating switching to
    > the pure garbage-collection approach, but fortunately (I think) this
    > is not the plan that has been adopted by the Council.

    I don't know the pro and anti GC arguments specifically in flight, so I
    won't take a side here, but the idea that GC uses more memory seems
    erroneous to me. MicroPython uses GC and runs in machines with as
    little as 16KB of ram (the BBC micro:bit v1). I've used the CircuitPython
    variant on chips with 32KB of ram and it is reasonably comfortable on
    those. I don't think CPython has ever run on machines that small. GC
    was invented in the 1950s for use in Lisp, on computers of that era that
    were tiny compared with today's computers.

    Classic GC adds one bit of overhead to each object, while reference
    counting requires storing a refcount that is potentially large. A
    strict refcounting approach in a big computer might even need 64 bit
    refcounts. Also, the refcount has to be modified constantly. I'm
    amazed if that doesn't slow things down badly even in the single
    threaded case.

    Also, the "with" statement was added to Python partly to support
    GC-based implementations, as some applications were relying on
    refcounting to release resources when the object went out of scope, a
    bad kludge.
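    The contrast in miniature (the filename here is arbitrary):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "refcount_demo.txt")

# Relying on refcounting: CPython happens to flush and close the file
# as soon as the last reference goes away, but a pure-GC implementation
# may keep it open until some later collection -- the kludge in question.
f = open(path, "w")
f.write("hello")
del f   # closed here under CPython; unspecified elsewhere

# The 'with' statement makes the release explicit and portable:
with open(path) as f:
    data = f.read()
assert data == "hello"

os.remove(path)
```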

    > Instead, they are going to use a technique known as “Biased
    > Reference Counting”.

    That sounds ugly but I'm far away from it so dunno.

    > This seems to offer the best performance in tests so far.

    They compared it to a serious GC and it won? If yes, that is
    interesting.

    > Seems some people are still smarting over the flak they got from the
    > Python 2 → 3 transition. “This is not Python 4,” they are saying.
    > But why not call it “Python 4”, as a warning over the likely
    > compatibility issues? Even if it probably won’t be quite as painful ...

    We seem to be getting a Python 4 transition (i.e. breaking old code)
    with each new release of Python 3, so this is just more of the same.

    Anyway I'm glad effort is being made to remove the GIL. If it were up
    to me, I'd switch to BEAM or something like it as the underlying VM.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.python on Tue Mar 19 02:33:44 2024
    From Newsgroup: comp.lang.python

    On Mon, 18 Mar 2024 17:11:52 -0700, Paul Rubin wrote:

    > ... the idea that GC uses more memory seems erroneous to me.

    It will use whatever memory it is permitted to use.

    Consider that every time you call a method, a new method-wrapper
    object is dynamically created, and almost always immediately deleted
    when the call returns. So even a very simple, seemingly well-behaved
    Python script, if it ran for long enough, would consume more and
    more memory were it not for reference counting.
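    This is easy to observe: each attribute lookup on an instance builds
    a fresh bound-method object (a CPython implementation detail, but an
    illustrative one):

```python
class C:
    def m(self):
        return 42

c = C()

# Two lookups yield two distinct bound-method objects:
assert c.m is not c.m

# Calling c.m() builds one, invokes it, and refcounting reclaims it
# the moment the call returns -- no garbage-collector run involved.
assert c.m() == 42
```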
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.python on Tue Mar 19 17:51:54 2024
    From Newsgroup: comp.lang.python

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    So even a very simple, seemingly well-behaved Python script, if
    running for long enough, would consume more and more memory if it were
    not for reference-counting.

    That is completely false. It's usual to set a GC to run every
    so-many allocations. GHC normally does a minor GC every 256K of
    allocations so that the most recent stuff fits in the L2 CPU cache,
    speeding things up a lot. Refcounting schemes are of course
    incapable of that optimization because they don't relocate objects
    in memory.

    You can of course configure a GC to not run very often, in which case
    the memory region can get large. That is an optimization you do
    intentionally, to spend less CPU time doing GC, and of course you only
    do that if you have the memory for it. I think you are imagining that
    people always do that, but again remember MicroPython.
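    CPython's own cyclic collector is tuned the same way -- triggered by
    allocation counts, via the stock gc module (the exact default
    thresholds vary between versions, so none are assumed here):

```python
import gc

# Thresholds are per generation: a generation-0 collection runs once
# allocations minus deallocations exceed g0, and so on up.
g0, g1, g2 = gc.get_threshold()
assert g0 > 0 and g1 > 0 and g2 > 0

# Trading memory for CPU time is a one-liner:
gc.set_threshold(g0 * 10, g1, g2)   # collect 10x less often
assert gc.get_threshold()[0] == g0 * 10
gc.set_threshold(g0, g1, g2)        # restore the original setting
```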

    The allocation of a new method wrapper on every method call is of course something that the interpreter could also be optimized to not do. The
    Emacs Lisp interpreter does something like that for function args, IIRC.
    They are passed on a permanent stack instead of in temporary cons cells.

    Erlang on a midsized server can run millions of lightweight processes in
    its VM, each with its own GC. The minimum ram size of an Erlang process
    is around 2KB iirc. But I don't know if they get bigger than that
    before the GC runs.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.python on Wed Mar 20 03:14:49 2024
    From Newsgroup: comp.lang.python

    On Tue, 19 Mar 2024 17:51:54 -0700, Paul Rubin wrote:

    > Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    >> So even a very simple, seemingly well-behaved Python script, if
    >> running for long enough, would consume more and more memory if it
    >> were not for reference-counting.
    >
    > That is completely false. It's usual to set a GC to [fix it so it’s
    > not false] ...

    In other words, it’s not “completely” false if you have to do
    something to make it false. But that GC process creates its own
    overhead, not to mention the latency when there isn’t quite enough
    memory for an allocation and you have to wait until the next GC run
    to proceed. Run the GC a thousand times a second, and the latency is
    still 1 millisecond.

    With reference counting, most objects are immediately freed as soon
    as they are discarded--no need to wait for the next GC run.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Greg Ewing@greg.ewing@canterbury.ac.nz to comp.lang.python on Wed Mar 20 20:29:30 2024
    From Newsgroup: comp.lang.python

    On 20/03/24 4:14 pm, Lawrence D'Oliveiro wrote:
    > ... not to mention the latency when there isn’t quite enough memory
    > for an allocation and you have to wait until the next GC run to
    > proceed. Run the GC a thousand times a second, and the latency is
    > still 1 millisecond.

    That's not the way it usually works. If you run out of memory, you
    run a GC there and then. You don't have to wait for GCs to occur on
    a time schedule.

    Also, as a previous poster pointed out, GCs are typically scheduled
    by number of allocations, not by time.
    --
    Greg
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Chris Angelico@rosuav@gmail.com to comp.lang.python on Wed Mar 20 18:42:21 2024
    From Newsgroup: comp.lang.python

    On Wed, 20 Mar 2024 at 18:31, Greg Ewing via Python-list <python-list@python.org> wrote:

    > On 20/03/24 4:14 pm, Lawrence D'Oliveiro wrote:
    >> ... not to mention the latency when there isn’t quite enough memory
    >> for an allocation and you have to wait until the next GC run to
    >> proceed. Run the GC a thousand times a second, and the latency is
    >> still 1 millisecond.
    >
    > That's not the way it usually works. If you run out of memory, you
    > run a GC there and then. You don't have to wait for GCs to occur on
    > a time schedule.
    >
    > Also, as a previous poster pointed out, GCs are typically scheduled
    > by number of allocations, not by time.

    FYI you're violating someone's request by responding to them in a way
    that results in it getting onto python-list, so it's probably safest
    to just ignore cranks and trolls and let them stew in their own
    juices.

    But normally the GC doesn't need to be scheduled at all. In CPython,
    the only reason to "run garbage collection" is to detect cycles, so
    you would have to be generating inordinate amounts of cyclic garbage
    for this to matter at all.
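    Concretely: even with the collector switched off outright, acyclic
    garbage is still reclaimed immediately by refcounting (a
    CPython-specific sketch; weakref just observes the lifetime):

```python
import gc
import weakref

gc.disable()    # turn the cycle detector off entirely
try:
    class Obj:
        pass

    o = Obj()
    watcher = weakref.ref(o)
    del o
    # Freed instantly anyway: refcounting needs no collector runs.
    assert watcher() is None
finally:
    gc.enable()
```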
    ChrisA
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.python on Wed Mar 20 01:35:00 2024
    From Newsgroup: comp.lang.python

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    > In other words, it’s not “completely” false if you have to do
    > something to make it false.

    No, you don't have to do anything; it is false by default. In Java
    the default maximum heap size is relatively small, and to make it
    bigger you use the -Xmx option, IIRC.

    Remember we are talking about GIL removal to get speedups on
    multicore processors. I expect any such processors these days to
    have plenty of memory.

    > But that GC process creates its own overhead

    Refcounting has overhead too!


    > ... not to mention the latency when there isn’t quite enough memory
    > for an allocation and you have to wait until the next GC run to
    > proceed.

    If you are GC'ing every N allocations and you run out of free space
    before you've done those N, you increase the region size by asking for
    more memory from the system. If the system is out of memory, it is
    out of memory and you need a bigger computer or some other change.
    You don't "wait for the next GC run" as if it were a periodic daemon.

    > With reference counting, most objects are immediately freed as soon
    > as they are discarded--no need to wait for the next GC run.

    In other words you effectively GC every time an object is freed instead
    of having a tuneable parameter that you can optimize. And of course you
    don't get freedom from pauses either. If you allocate a million element
    list in Python, then drop the last reference to the list, you have to
    decrement the refcounts of each of the million elements, however long
    that takes. Plus if that decrements most or all of them to zero, you
    have to free them one by one. With a copying-style GC, you never have
    to visit those elements or free them individually.
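    The pause is easy to demonstrate in miniature (a CPython-specific
    sketch; the __del__ counter just makes the per-element frees
    visible):

```python
N = 100_000
freed = 0

class Elem:
    def __del__(self):
        global freed
        freed += 1

big = [Elem() for _ in range(N)]
del big   # CPython decrefs and frees all N elements right here
assert freed == N
```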

    Look, widely used GC'd languages include Java, SBCL and other
    comparable Lisp systems, GHC, OCaml, Erlang, .NET (C# and F#), Go,
    and current incarnations of JavaScript, among others. All of them
    beat the pants off of CPython in performance. If you're claiming
    CPython's refcounting
    system somehow outperforms the above mentioned GC's, I'd be interested
    in seeing some benchmarks. There may be some trade-offs in CPython that
    make its refcount system still advantageous for some things, but
    performance is unlikely to be one.
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Paul Rubin@no.email@nospam.invalid to comp.lang.python on Wed Mar 20 01:37:50 2024
    From Newsgroup: comp.lang.python

    Chris Angelico <rosuav@gmail.com> writes:
    > FYI you're violating someone's request by responding to them in a
    > way that results in it getting onto python-list,

    I don't know if I'm doing that, but if yes, it's not on purpose.
    I'm responding on Usenet.

    > In CPython, the only reason to "run garbage collection" is to
    > detect cycles

    That's the current version. We're discussing the possibility of
    switching from refcounting to a GC system as part of the GIL removal
    project.
    --- Synchronet 3.20a-Linux NewsLink 1.114