• Re: Memory allocators and reporting the allocation size

    From Dmitry A. Kazakov@mailbox@dmitry-kazakov.de to comp.lang.misc on Tue Dec 13 10:23:56 2022
    From Newsgroup: comp.lang.misc

    On 2022-12-12 22:46, James Harris wrote:
    On 07/12/2022 19:48, Dmitry A. Kazakov wrote:
    On 2022-12-07 19:31, James Harris wrote:
    On 07/12/2022 16:46, Dmitry A. Kazakov wrote:
    On 2022-12-07 17:37, James Harris wrote:

    Well, formal specifications are more relevant for end-user apps but
    someone writing middleware will not know how the software might be
    used in future.

    As a designer of a middleware I can tell you that for it
    specifications are even more important.

    Specifications for middleware performance? What form do they take?

    Performance is a non-functional requirement, ergo, not a part of
    formal specifications. Formal specifications contain only functional
    requirements.

    I appear to be somewhere between you and David on this. For sure, it's
    not possible to set user-facing performance figures for software we
    don't know the future uses of! But we can design it so that its run time will scale predictably and evenly and reasonably well. That's so for the
    two modules we have been discussing: a string library and a memory allocator.

    Yes, in other words, remembering the motto (of rather sad history):

    "don't be evil"

    Programs must be written in good conscience.
    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From David Brown@david.brown@hesbynett.no to comp.lang.misc on Tue Dec 13 15:16:09 2022
    From Newsgroup: comp.lang.misc

    On 12/12/2022 22:17, James Harris wrote:
    On 07/12/2022 16:20, James Harris wrote:
    On 06/12/2022 07:46, David Brown wrote:

    ...

    to squeeze in a few extra bytes.  The rest is done by a /new/
    allocation, not an enlargement.  That means all the time spent trying
    to figure out if you have an extra byte or two is wasted.

    Your understanding of the proposal may be improving but there's some
    way to go. The resize call would not be limited to padding for
    alignment but could return a *lot* more if a lot more were available.

    Maybe it's the small memory requirements I used in the example which
    is misleading you. They were just illustrative. Say, instead, an
    existing allocation, D, is 400k and a library using that space wanted
    to extend it by 100k bytes. It would call

       resize(D, 500_000)

    If there was enough space after D for it to be extended to 500k then
    that's what would happen. The size of D would be increased to (at
    least) 500k without moving the initial 400k to new memory.
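    As a sketch of those semantics (the header layout and all names here
    are invented for illustration, not part of any proposal in the
    thread), a resize-in-place call might look like:

```c
#include <stddef.h>
#include <stdbool.h>

/* Toy model of the proposed resize-in-place call.  The allocator is
   assumed to keep a header just before each allocation recording the
   block's usable capacity; both the header layout and the names are
   hypothetical. */
typedef struct {
    size_t capacity;   /* usable bytes in this block            */
    size_t used;       /* bytes the caller has asked for so far */
} alloc_header;

static alloc_header *header_of(void *p) {
    return (alloc_header *)p - 1;
}

/* Grow (or shrink) the allocation without moving it.  Returns true on
   success; false means there is not enough room in the block and the
   caller must fall back to a moving realloc-style call. */
bool m_resize(void *p, size_t new_size) {
    alloc_header *h = header_of(p);
    if (new_size <= h->capacity) {
        h->used = new_size;
        return true;
    }
    return false;
}
```

    In this sketch, resize(D, 500_000) succeeds exactly when the block
    behind D already has 500k of capacity; otherwise it tells the caller
    to reallocate.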

    I see no reply to the above. Maybe, David, you now better understand the proposal. Hopefully! :-)


    I didn't reply because I thought we had covered everything several times already. I understand your proposal - I disagree that it is realistic
    in practice or a good way to implement allocators, strings, or anything
    else. But I am not here to try to force my opinion on you, or convince
    you to change your mind - or to have my mind changed either. We learn
    from discussing different thoughts and opinions, and sharing different experiences, and from putting our own thoughts and ideas into words. I
    expect no more from you than to consider the suggestions or ideas I
    present - implement them, be inspired by them or reject them as it suits.

    So for now, at least, I think we have covered that topic and moved on.

    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From David Brown@david.brown@hesbynett.no to comp.lang.misc on Tue Dec 13 15:38:43 2022
    From Newsgroup: comp.lang.misc

    On 12/12/2022 22:13, James Harris wrote:
    On 06/12/2022 16:33, Dmitry A. Kazakov wrote:
    On 2022-12-06 14:57, David Brown wrote:

    ...

    I don't think you understand what the term "undefined behaviour"
    means.   Many people don't, even when they are seasoned programmers -
    and that is why they fear it.

    It simply means the specifications don't say anything about the given
    circumstances.

    No, it means exactly what it says: the program's behavior is
    undefined, it can do absolutely anything = be in any possible state.
    The state of the Universe is always confined and variations of states
    are always localized in space and time. Which is why engineering of
    hardware and programming are so different.

    I agree with your interpretation of undefined behaviour. I hope David
    reviews his position because I recall him saying that he welcomed (or
    similar) UB in a programming language. Definitions matter.



    I do welcome UB in programming languages - I think it is a good thing to
    be clear and up-front about it. It is a bad idea for a language to try
    and hide its limitations and give the impression that it will always
    save the programmer from their mistakes. Programmers are already too
    lazy, even in languages with UB ("I don't need to check if this alloc
    returns a null pointer - the OS will handle the crash neatly if I'm wrong").

    I much prefer a clear separation of responsibilities and guarantees.
    The programmer promises to write code that follows certain rules, and
    the language/library promises to give certain effects as long as the
    rules are followed. As a language user, I don't want a language or
    library that has extra inefficiencies, complications or irregularities
    to support programmers who don't learn the language and follow its
    rules. As a language implementer or library implementer, I don't want
    to have to try to figure out what the user meant or conjure up sensible answers to senseless questions.

    Trust the programmer. Trust the language and the libraries. Everyone
    is happier that way - it's all simpler, cleaner, and more efficient.

    And then provide tools that can help people find their mistakes and
    improve their code!


    Yes, definitions matter. So does a lack of definitions. Undefined
    behaviour has /no/ definition - by the definition of the words
    themselves! Anyone who thinks they have an "interpretation of
    undefined behaviour" has missed the point entirely - it has no
    definition or interpretation.


    Any function (or operator, or other feature of the language or library)
    should be specified in terms of two things:

    1. Precondition.
    2. Postcondition.

    The precondition says what must be true before the function is called.
    It includes the number and types of the parameters, any limitations on
    the parameters, any global data or environmental conditions, etc. The programmer promises that this expression will be "true" when the
    function is called. The function implementer can, and should, rely upon
    it being true - any effort spent handling situations when it is not true
    is wasted effort and inefficiency.

    The postcondition says what is true when the function returns, and
    includes the number and types of the return values, any effects on
    global data or the environment, side-effects, etc. The function
    implementer's job is to ensure the postcondition is true on the
    assumption that the precondition is true. The function user can then
    rely upon it being true after the call - any effort spent checking it
    is wasted effort.


    "Undefined behaviour" is then simply calling the function when the precondition is not true. What happens if you try? The function specification does not say. End of story.
    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From David Brown@david.brown@hesbynett.no to comp.lang.misc on Tue Dec 13 16:17:49 2022
    From Newsgroup: comp.lang.misc

    On 12/12/2022 22:07, James Harris wrote:
    On 06/12/2022 14:19, David Brown wrote:
    On 05/12/2022 23:38, James Harris wrote:
    On 04/12/2022 23:12, David Brown wrote:
    On 04/12/2022 15:07, James Harris wrote:
    On 04/12/2022 13:27, David Brown wrote:

    ...

    I thought that was what you and Bart (who IIRC also mentioned
    something similar) had in mind and I spent some time designing an
    allocator to support splitting memory in that way down to the byte
    level. It was an interesting exercise and I may use it one day but
    I thought you had some use cases in mind.

    Who cares?  Launch the nasal daemons.

    Never, never, and never!


    Garbage in, garbage out.

    You cannot give the program sensible answers to stupid questions.
    And no matter how much you try to check the program's requests for
    sanity and valid values, there will be a programmer who will outwit
    you and find new ways to be wrong that are beyond your imagination.
    If you have a low-level language that allows things like pointers
    and conversions between pointers and integers, this will happen /very/
    easily, and there is nothing you can do to stop it.

    ...

    The concept of undefined behaviour - of "garbage in, garbage out" -
    is as old as computing:

    Garbage out is not the same as undefined behaviour. Here's an example
    of garbage in leading to garbage out

       Enter your age: 1000
       You owe us $-1,234,567

    Here's an example of UB

       Enter your age: 1000
       Deleting files in your home directory


    Nasal demons should never be permitted.



    No.

    An example of undefined behaviour is :

    Enter your age between 0 and 200, and I will convert it to Mars years.

    No one has said what will happen here if you enter 1000.  The
    developer can do whatever he or she likes in such cases, including
    ignoring it entirely, or converting it accurately to Mars years.

    Where did you get that idea of undefined behaviour from? It is different
    and far more benign than anything I've seen before.

    It's not benign or malign - it is undefined. (I could have added "the
    program signs you up for a one-way ticket to Mars" as a possible result.)


    What you describe sounds more like /unspecified/ behaviour to me.
    Undefined is more like the example you gave in another post which ends with

      delete_recursive(path)

    In other words, for undefined behaviour ANYTHING could happen, including something catastrophic. That's why I say it should be prohibited.


    Undefined behaviour means nobody has defined what will happen.

    The /possibility/ of undefined behaviour in a function is a good thing.
    It is very rare that it makes sense to define the behaviour for
    absolutely any input in absolutely any situation. And no matter how
    hard you try, you can't cover /everything/ - some things are outside
    your control. How is your function supposed to behave when a cosmic ray causes an SEU in the processor leading to an unexpected jump? What is
    the result of your square root function when the user spills coffee on
    the computer? How will it react to a stack overflow?

    Trying to /execute/ undefined behaviour is the problem.

    int add_and_divide(int a, int b, int c) {
        return (a + b) / c;
    }

    This has undefined behaviour in C - it is UB if c is 0, or if (a + b) overflows, or if (a + b) equals INT_MIN and c equals -1. (I bet you
    didn't even think of that last one.)

    It is absolutely /fine/, as long as its limitations are clear.

    There is only a problem if the code calling "add_and_divide()" has
    errors, and calls it with inappropriate arguments. So yes, prohibit it
    - tell the programmer not to do that, because it would be a bug in the
    code. (It may be that the fault lies with the documenter of the
    function, or the person specifying the code, or the programmer that
    calls the function that calls add_and_divide, but it is still a bug.)
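    One sketch of how a caller might uphold that precondition in C (a
    hypothetical helper, not from the thread): check the arguments before
    the call instead of defining behaviour for the bad cases inside the
    function.

```c
#include <stdbool.h>
#include <limits.h>

/* Returns true iff add_and_divide(a, b, c) has defined behaviour,
   i.e. c is nonzero, a + b does not overflow, and the division is
   not INT_MIN / -1. */
bool can_add_and_divide(int a, int b, int c) {
    if (c == 0)
        return false;                 /* division by zero          */
    if (b > 0 && a > INT_MAX - b)
        return false;                 /* a + b would overflow      */
    if (b < 0 && a < INT_MIN - b)
        return false;                 /* a + b would underflow     */
    if (a + b == INT_MIN && c == -1)
        return false;                 /* INT_MIN / -1 overflows    */
    return true;                      /* precondition holds        */
}
```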

    All run-time undefined behaviour that is executed is the result of
    buggy software. Further, all bugs result in undefined behaviour - by
    definition, a "bug" means the code is not doing what it is supposed to
    do, and thus the program is not behaving according to the defined
    specification.


    The worst thing you can do - IMHO, as always - is pretend that you can
    "solve" the "problem" of undefined behaviour by trying to define the
    behaviour of everything. You want to make a language where it is
    impossible for the programmer to have bugs in their code? Good luck
    with that! The "best" you can do is tell programmers you think they are
    so incompetent that you'd rather slow down their code and give them meaningless and wrong results than trust them to do their job properly.


    (Again, let me stress it is useful for tools - compiler warnings,
    linters, analysers, IDE's, debuggers, run-time sanitizers, etc., - to
    help find and correct bugs at all levels.)




    The precondition to the program is simple and clear - it is /your/
    fault if you put in an age outside the range.

    There is no simple age limit but the behaviour up to a certain limit can
    be specified, along with an undertaking to reject anything higher.


    That would be changing the precondition and the postcondition. Your
    program no longer has a limit to the value you put in. But it is no
    longer a program that calculates the sum owed, or your age in Mars
    years. It is now a program that /sometimes/ does that, and sometimes
    does something else, depending on what value you enter. So it is a significantly different specification for a significantly different
    program. Maybe that's fine, maybe not.

    (When you have unsanitized data coming in from the outside, it is almost invariably best to have a specification that includes checks and limits
    on the data.)



    The designer of a saucepan says what will happen if you put it on a
    normal cooker, or in a dishwasher, or in a home oven - because that is
    expected use of the saucepan.  He does not say what will happen when
    you throw it in a blast furnace, or put it on your head and dance
    naked in the street - those are undefined behaviours.  Should those
    possibilities have been included in the instruction manual for the
    saucepan?  Should the saucepan manufacturer take responsibility for
    knowing the local laws where you live, and determining if you'll be
    arrested, locked in a padded cell, freeze to death, or simply have a
    refreshing dance in your own garden?

    So why should the implementer of sqrt consider all the ways you could
    attempt to use the function outside of its specification?

    Mathematical functions can vet their inputs before passing them through
    to the procedure body.


    Mathematical functions - /real/ mathematical functions - are often
    defined on limited domains. The mathematical square root function on
    reals is not defined for negative values. √-1 is undefined behaviour.
    Trying to define the function as mapping from ℝ to ℝ ∪ { "Error:
    Negative square root" } would be an utter disaster in mathematics.

    At its heart, programming is a branch of mathematics. Trying to define
    the indefinable can appear tempting, but it is a bad idea in the long run.

    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From Bart@bc@freeuk.com to comp.lang.misc on Tue Dec 13 15:26:24 2022
    From Newsgroup: comp.lang.misc

    On 13/12/2022 14:38, David Brown wrote:
    On 12/12/2022 22:13, James Harris wrote:

    I agree with your interpretation of undefined behaviour. I hope David
    reviews his position because I recall him saying that he welcomed (or
    similar) UB in a programming language. Definitions matter.



    I do welcome UB in programming languages - I think it is a good thing to
    be clear and up-front about it.

    Except that, with C, and exploitative C compilers, that doesn't happen.
    They simply assume UB can never happen, and use that as an excuse to be
    able to do what they like, and you didn't expect. That's the opposite of
    being up-front!


    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From David Brown@david.brown@hesbynett.no to comp.lang.misc on Tue Dec 13 18:12:57 2022
    From Newsgroup: comp.lang.misc

    On 13/12/2022 16:26, Bart wrote:
    On 13/12/2022 14:38, David Brown wrote:
    On 12/12/2022 22:13, James Harris wrote:

    I agree with your interpretation of undefined behaviour. I hope David
    reviews his position because I recall him saying that he welcomed (or
    similar) UB in a programming language. Definitions matter.



    I do welcome UB in programming languages - I think it is a good thing
    to be clear and up-front about it.

    Except that, with C, and exploitative C compilers, that doesn't happen.
    They simply assume UB can never happen, and use that as an excuse to be
    able to do what they like, and you didn't expect. That's the opposite of being up-front!


    No, they are up-front about it. The language standards say, for
    example, that signed integer overflow is undefined behaviour. You don't
    have any justification for any expectations about what might happen if
    you try to overflow your integers, or dereference a null pointer.

    It's really not difficult in most cases. If you have 16-bit integers (I always use that in Usenet posts, because the numbers are easier!), then
    it is quite clear that you cannot add 20,000 and 15,000 and expect a mathematically correct answer. I have no expectations for what would
    happen if I tried to do something that is clearly wrong mathematically,
    and clearly wrong in the definition of the language. The language says
    it has no meaning, the compiler says it implements the language - what
    more could I ask for?
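    The 16-bit case can be sketched in C: computing in a wider type makes
    the range check explicit, so the overflowing addition is never
    attempted (names invented for illustration).

```c
#include <stdint.h>
#include <stdbool.h>

/* With 16-bit signed integers, 20000 + 15000 = 35000 does not fit
   (INT16_MAX is 32767), so the addition has no defined meaning at
   that width.  Do the arithmetic in 32 bits and check the range. */
bool add16_checked(int16_t a, int16_t b, int16_t *out) {
    int32_t wide = (int32_t)a + (int32_t)b;   /* cannot overflow */
    if (wide < INT16_MIN || wide > INT16_MAX)
        return false;              /* would overflow at 16 bits */
    *out = (int16_t)wide;
    return true;
}
```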



    --- Synchronet 3.19c-Linux NewsLink 1.113
  • From James Harris@james.harris.1@gmail.com to comp.lang.misc on Tue Dec 12 19:03:24 2023
    From Newsgroup: comp.lang.misc

    On 28/11/2022 07:11, David Brown wrote:
    On 26/11/2022 14:05, James Harris wrote:
    On Friday, 25 November 2022 at 16:15:08 UTC, David Brown wrote:
    On 25/11/2022 14:02, James Harris wrote:
    I will get back to you guys on other topics but I need to improve
    the memory allocator I use in my compiler and that has led to the
    following query.

    Imagine an allocator which carries out plain malloc-type
    allocations (i.e. specified as a number of 8-bit bytes suitably
    aligned for any type) without having to have compatibility with
    malloc, and in a language which is not C.

    For the sake of discussion the set of calls could be

    m_alloc m_calloc m_realloc m_resize m_free
    In a new language, I would not follow the names from C as closely.
    For one thing, it could confuse people - they may think they do the
    same thing as in C. For another, "calloc" in particular is a silly
    name. And you don't need "realloc" and "resize".

    The idea is that realloc could move the allocation whereas resize
    could only resize it in place. The latter is not present in C's
    malloc family.


    Where would such a function be useful?  If the application needs to
    expand an object, it needs to expand it - if that means moving it, so
    be it.  I can't see where you would have use for a function that
    /might/ be able to expand an object.  Do you have /real/ use-cases in
    mind?

    Sure. Imagine implementing a rope data structure

    https://en.wikipedia.org/wiki/Rope_(data_structure)

    or anything which could be in parts and/or of varying size.




    It can make sense to distinguish between "alloc_zeroed" and
    "alloc_unintialised", for performance reasons.

    Agreed, but what exactly would you want for the calls? Maybe

    m_alloc(size) m_alloc_a(size, flags)

    ?


    There could be many different interfaces.  Dmitry also pointed out other possible distinctions for allocation of shared memory, unpageable
    memory, and so on.  Not all memory is created equal!

    Maybe you want a more object-oriented interface, for supporting several different pools or memory allocation choices.

    There is use for (and, I would argue, a need for) various different
    types of allocator. Some allocators are good for fixed-size objects,
    some for space which does not need to be returned until a program
    terminates, and some for where releases are in the opposite order from requests, for example.

    But such allocators impose restrictions on how they can be used or how
    they can be implemented, e.g. requiring meta space to manage the storage space.

    There is still a need for one approach to be completely general:

    * not requiring fixed-size allocations
    * not requiring separate data structures to describe data space
    * permitting arbitrary order of allocation and deallocation

    Once one has a completely general allocator then it can take control of available address space and other, more specialised allocators can be
    built on top of it - e.g. by requesting meta space and data space for an array-style allocator.

    ...


    The one thing I would avoid at all costs, in any interfaces, is the
    "integer flag" nonsense found in many languages (such as C).  If you
    really want some kind of flags, then at least be sure your language has strong type-checked enumerations.

    I presume you mean not to combine flags into an integer such as

    mem_allocator_call(size, flags)

    where flags is such as

    ALLOC_DATA | ALLOC_WRITABLE

    but what would you use instead of an integer?
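    One possibility, sketched in C purely for illustration (nothing here
    is settled in the thread, and every name is hypothetical): a struct of
    named booleans instead of an integer bitmask, so call sites are
    type-checked and self-describing.

```c
#include <stdbool.h>

/* An alternative to an "int flags" word: a small struct of named
   booleans passed by value.  A call site then reads
       mem_allocator_call(size, (alloc_opts){ .writable = true });
   and the compiler rejects a stray integer in that position. */
typedef struct {
    bool zeroed;     /* zero-fill the memory     */
    bool writable;   /* request writable memory  */
    bool shared;     /* request shareable memory */
} alloc_opts;
```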




    Would there be any value in giving the programmer a way to find
    out the size of a given allocation? I don't mean the size
    requested but the size allocated (which would be greater than or
    equal to the size requested).

    No. It is pointless.

    Why would anyone want to know the result of such a function? The
    only conceivable thought would be if they had first allocated space
    for x units, and then they now want to store y units - calling
    "get_real_size" could let them know if they need to call "resize"
    or not.

    The answer is that they should simply call "resize". If "resize"
    does not need to allocate new memory because the real size is big
    enough, it does nothing.

    It's partly for performance, as Bart's comments have backed up. If
    code can find out the capacity then it can fill that allocation up to
    the stated capacity without having to make any other calls and
    without risking moving what has been stored so far.


    There is no performance benefit in real code - and you should not care
    about silly cases.

    On the contrary, there is huge potential for real benefit in real code.
    There is no advantage in toy code. That's the difference.


    I ask because it's simple enough to precede an aligned chunk of
    memory with the allocation size. The allocator may well do that
    anyway in order to help it manage memory. So there seems to be no
    good reason to keep that info from the programmer. The question
    is over whether there's value in allowing the programmer to get
    such info. I am thinking that he could get the value with a call
    such as

    m_msize(p)

    It seems simple enough, potentially useful, and should be
    harmless. But it's noticeable that the malloc-type calls don't
    have anything similar. So maybe there's a good reason why not.

    I thought it might cause implementation issues such as requiring
    a larger header if the allocator wants to search only free
    memory. But whatever implementation is used I think the allocator
    will still need to either store or calculate the size of the
    memory allocated so I am not sure I see a problem with the idea.

    What do you guys think? Opinions welcome!

    I think you should not try to copy C's malloc/free mechanism. In
    particular, I do not think the memory allocator should track the
    sizes of allocations. "free" should always include the size,
    matching the value used in "alloc". (And "resize" should pass the
    old size as well as the new size.)
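    The sized interface described in the quoted paragraph might look like
    the following sketch, layered over malloc purely for illustration (a
    real allocator would use the passed sizes to locate pools without
    stored per-allocation headers):

```c
#include <stdlib.h>
#include <stddef.h>

/* Sketch: the caller passes the size back on free, and resize carries
   both the old and the new size. */
void *m_alloc(size_t size) {
    return malloc(size);
}

void m_free(void *p, size_t size) {
    (void)size;      /* a real allocator would use this to pick a pool */
    free(p);
}

void *m_realloc(void *p, size_t old_size, size_t new_size) {
    (void)old_size;  /* likewise: lets the allocator avoid a lookup */
    return realloc(p, new_size);
}
```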

    Bart said the same and the suggestion surprises me. It seems prone to
    error.


    Your choices with memory management are basically to either trust the programmer, or do everything automatically with garbage collection.
    Even if you use something like C++'s RAII mechanisms for allocating and deallocating, you still have to trust the programmer to some extent.  If
    you are relying on the programmer to get their pointers right for calls
    to "free", why are you worried that they'll get the size wrong?

    The implementation has to know the sizes. There's no point in requiring
    the client program to know the sizes as well.



    Storing the size of allocation was not too bad in earliest days of
    C, with simpler processors, no caches, no multi-threading, and
    greater concern for simple allocation implementations than for
    performance or reliability. Modern allocators no longer work with a
    simple linked list storing sizes and link pointers at an address
    below the address returned to the user, so why follow a similar
    interface?

    Programs know the size of memory they requested when they allocated
    it. They know, almost invariably, the size when they are freeing
    the memory. They know the size of the types, and the size of the
    arrays. So having the memory allocator store this too is a waste.

    That's sometimes the case but not always. Say I wanted to read a line
    from a file a byte at a time without knowing in advance how long the
    line would be (a common-enough requirement but one which C programs
    all too often fudge by defining a maximum line length). The required
    allocation could not be known in advance.


    That is irrelevant.  (On a side note, it makes sense to have library
    calls that make this kind of common function more convenient.)  The programmer knows the size of any "malloc" calls or "realloc" calls made
    - that's the size given back to "free".

    Please see above. The implementation (of a fully general allocator) has
    to know the sizes so it knows what's free and what's in use.


    There are various specialised allocation techniques but the malloc
    style is the most general I know of.


    ...


    As Dmitry has pointed out, there are /many/ ways to make allocators, and none of them are optimal in all cases.  Trying to squeeze things into
    one very limited interface simply guarantees that you never get the best choice in your code.

    I did not say that there should be only one allocator.



    Once you've dropped size tracking, you can easily separate
    allocation tracking from the memory in use - this gives neater
    sizes for pools and better cache and virtual page usage. You can
    have pools of different sizes - perhaps 16 byte blocks for the
    smallest, then 256 byte blocks, then 4 KB blocks, etc. (You might
    prefer different sizes, perhaps with more intermediary sizes.) A
    pool element would have 64 blocks. Allocation within a pool is done
    by quick searches within a single 64-bit map - no need to search
    through lists.
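    The quoted pool scheme can be sketched in C as a 64-bit occupancy
    map, one bit per block; allocation is a bit search within a single
    word, with no lists to walk (names invented for illustration):

```c
#include <stdint.h>

/* One pool element: 64 fixed-size blocks, tracked by one 64-bit word. */
typedef struct {
    uint64_t used;            /* bit i set => block i is allocated */
} pool_map;

/* Returns the index of a free block (0..63), or -1 if the pool is full. */
int pool_alloc_block(pool_map *m) {
    if (m->used == UINT64_MAX)
        return -1;                           /* all 64 blocks taken */
    uint64_t free_bits = ~m->used;
    /* Isolate the lowest set bit of free_bits (two's-complement trick,
       well defined for unsigned arithmetic), then find its position. */
    uint64_t lowest = free_bits & (~free_bits + 1);
    int i = 0;
    while ((lowest >> i) != 1u)
        ++i;
    m->used |= lowest;                       /* mark block allocated */
    return i;
}

void pool_free_block(pool_map *m, int i) {
    m->used &= ~((uint64_t)1 << i);          /* mark block free */
}
```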

    I presume by pools you mean slots of a certain size. I may, longer
    term, let a program specify slot sizes and pool geometry. But that's
    for the future and it wouldn't work well with variable-length
    strings, especially where the length is not known in advance. So I
    need something simple and general for now.


    Make a bad interface now and you'll be stuck with it.

    Indeed. Hence the original question.


    I still don't understand your targets - either in terms of computer
    systems, or expected uses.  If the target is modern PC-style systems
    (so that you can, like in Bart's languages, assume that you have
    64-bit integers and plenty of memory), then you can make a string
    format that uses, say, 256 byte lumps.  Some of that is for pointers
    and length counts to handle chains for big strings, and maybe
    reference counting, with over 200 bytes free for the string itself.
    The solid majority of all strings in practical use will fit in one
    lump, and you have no challenges with memory allocators, resizing, or
    the rest of it.  That simplification will completely outweigh the
    inefficiency of the wasted memory.  And then for long strings, doing
    everything in a single C-style block is inefficient anyway - you'll
    want some kind of linked chain anyway.
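    The fixed-lump layout suggested here might be sketched like this in C
    (the exact fields are illustrative guesses, not a design from the
    thread):

```c
#include <stdint.h>

struct lump;                    /* forward declaration for the chain */

/* Header of a 256-byte string "lump": chaining, total length and
   sharing information, with the remainder of the lump as payload. */
struct lump_header {
    struct lump *next;          /* next lump in the chain, or NULL */
    uint32_t     total_len;     /* length of the whole string      */
    uint32_t     refcount;      /* shared-string reference count   */
};

struct lump {
    struct lump_header h;
    char payload[256 - sizeof(struct lump_header)];
};
```

    With an 8-byte pointer the header is 16 bytes, leaving 240 bytes of
    payload - comfortably "over 200 bytes free for the string itself".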

    The targets are simply 'computers'. They can be of different sizes -
    e.g. 12-bit processors with tiny amounts of RAM up to 64-bit processors
    with oodles of RAM, and beyond. The size of the machine shouldn't much
    matter.
    --
    James Harris


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From David Brown@david.brown@hesbynett.no to comp.lang.misc on Sun Dec 17 21:53:13 2023
    From Newsgroup: comp.lang.misc

    On 12/12/2023 20:03, James Harris wrote:
    On 28/11/2022 07:11, David Brown wrote:
    On 26/11/2022 14:05, James Harris wrote:
    On Friday, 25 November 2022 at 16:15:08 UTC, David Brown wrote:
    On 25/11/2022 14:02, James Harris wrote:
    I will get back to you guys on other topics but I need to improve
    the memory allocator I use in my compiler and that has led to the
    following query.

    Imagine an allocator which carries out plain malloc-type
    allocations (i.e. specified as a number of 8-bit bytes suitably
    aligned for any type) without having to have compatibility with
    malloc, and in a language which is not C.

    For the sake of discussion the set of calls could be

    m_alloc m_calloc m_realloc m_resize m_free
    In a new language, I would not follow the names from C as closely.
    For one thing, it could confuse people - they may think they do the
    same thing as in C. For another, "calloc" in particular is a silly
    name. And you don't need "realloc" and "resize".

    The idea is that realloc could move the allocation whereas resize
    could only resize it in place. The latter is not present in C's
    malloc family.


    Where would such a function be useful?  If the application needs to
    expand an object, it needs to expand it - if that means moving it, so be
    it.  I can't see where you would have use for a function that /might/ be
    able to expand an object.  Do you have /real/ use-cases in mind?

    Sure. Imagine implementing a rope data structure

      https://en.wikipedia.org/wiki/Rope_(data_structure)

    or anything which could be in parts and/or of varying size.

    You still don't need a "resize" that is distinct from "realloc". The
    parts of your ropes would have a known allocation size (ideally matching
    the appropriate units for the main allocator) and a "used" size. Small changes that can be handled within the existing allocated size for the
    rope are easily done without any memory allocation functions, and if you
    need more, you have a new block on your rope.

    The number of times - even with a rope - that a "resize if possible
    without a new allocation" function is useful is negligible.

    /Vastly/ more useful, I think, would be to re-think allocation systems entirely - don't copy C. C's memory management was invented in a time
    before caches, and before SMP. Stop trying to tweak C's outdated
    choices with hacks that have almost no real-world uses, and start
    thinking differently.

    For example, have your allocation functions return a tuple of address
    and actual available memory in the allocation. For data structures
    where you think "resize" might have relevance, store the real available
    memory size in the blocks - then you have all the benefits of "resize"
    for free.
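    That tuple-returning style might be sketched as follows, layered over
    malloc purely for illustration (malloc itself reports no capacity, so
    this sketch just echoes the request size back; a real allocator would
    return the rounded-up capacity):

```c
#include <stdlib.h>
#include <stddef.h>

/* An allocation call that returns both the address and the usable
   size, so callers that care about "resize" can simply record the
   capacity.  All names are hypothetical. */
typedef struct {
    void  *ptr;       /* start of the allocation, NULL on failure */
    size_t size;      /* usable bytes, >= the requested size      */
} allocation;

allocation m_alloc2(size_t request) {
    allocation a;
    a.ptr = malloc(request);
    a.size = (a.ptr != NULL) ? request : 0;
    return a;
}
```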

    And don't store the size of the allocation data in hidden parts of
    the allocation, as is traditional in C malloc/free implementations.
    For 95% of all allocations, the software knows what the allocated
    size is when it is time to free the allocation. And for the other
    5%, the size can easily be stored, tracked or calculated when
    needed. Including allocation size and pointers to free lists or
    linked list chains of allocations in the allocated memory made sense
    before caches - it does not make sense now. Keep the metadata
    separate.





    It can make sense to distinguish between "alloc_zeroed" and
    "alloc_uninitialised", for performance reasons.

    Agreed, but what exactly would you want for the calls? Maybe

      m_alloc(size)
      m_alloc_a(size, flags)

    ?


    There could be many different interfaces.  Dmitry also pointed out other
    possible distinctions for allocation of shared memory, unpageable
    memory, and so on.  Not all memory is created equal!

    Maybe you want a more object-oriented interface, for supporting several
    different pools or memory allocation choices.

    There is use for (and, I would argue, a need for) various different
    types of allocator. Some allocators are good for fixed-size objects,
    some for space which does not need to be returned until a program
    terminates, and some for where releases are in the opposite order
    from requests, for example.

    Yes.


    But such allocators impose restrictions on how they can be used or
    how they can be implemented, e.g. requiring meta space to manage the
    storage space.

    Fixed-size allocators are particularly efficient for their
    metadata - all you need is a single bit per allocation.

    Metadata is always needed. Putting it within the allocated space is
    an outdated and inefficient solution, and does not save space.
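    A fixed-size pool with one bit of metadata per slot, kept entirely
    outside the storage it describes, might look like this in C (pool
    geometry and names are illustrative):

```c
#include <stddef.h>
#include <stdint.h>

/* A fixed-size-object pool: the only metadata is one bit per slot,
   held in a bitmap *outside* the storage it describes, so the data
   cache lines carry nothing but the caller's data. */
#define SLOTS     64
#define SLOT_SIZE 32

static unsigned char storage[SLOTS * SLOT_SIZE];
static uint64_t      used_bits;          /* bit i set => slot i in use */

void *pool_alloc(void) {
    for (int i = 0; i < SLOTS; i++) {
        if (!(used_bits & ((uint64_t)1 << i))) {
            used_bits |= (uint64_t)1 << i;
            return storage + (size_t)i * SLOT_SIZE;
        }
    }
    return NULL;                          /* pool exhausted */
}

void pool_free(void *p) {
    size_t i = ((unsigned char *)p - storage) / SLOT_SIZE;
    used_bits &= ~((uint64_t)1 << i);
}
```

    Since every slot has the same size, "free" needs no size argument
    and no per-allocation header: the address alone identifies the bit.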


    There is still a need for one approach to be completely general:

    Certainly there are uses for general allocations as well as more
    specific ones. But often most allocations in a program can be better
    handled by specific ones.


    * not requiring fixed-size allocations

    Yes.

    * not requiring separate data structures to describe data space

    No.

    * permitting arbitrary order of allocation and deallocation

    Yes.


    Once one has a completely general allocator then it can take control
    of available address space and other, more specialised allocators
    can be built on top of it - e.g. by requesting meta space and data
    space for an array-style allocator.
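    For instance, an arena (bump) allocator - where everything is
    released in one go - layers on the general allocator in a few lines
    of C (a sketch; names and the 16-byte alignment are assumptions):

```c
#include <stddef.h>
#include <stdlib.h>

/* A specialised allocator built on top of the general one: an arena
   that hands out space by bumping an offset and frees everything at
   once.  The general allocator supplies the backing block. */
struct arena {
    unsigned char *base;
    size_t         size;
    size_t         next;   /* offset of first free byte */
};

int arena_init(struct arena *a, size_t size) {
    a->base = malloc(size);
    a->size = size;
    a->next = 0;
    return a->base != NULL;
}

void *arena_alloc(struct arena *a, size_t n) {
    n = (n + 15) & ~(size_t)15;            /* keep 16-byte alignment */
    if (a->next + n > a->size) return NULL;
    void *p = a->base + a->next;
    a->next += n;
    return p;
}

void arena_release(struct arena *a) {       /* frees every allocation */
    free(a->base);
    a->base = NULL;
}
```

    Individual allocations here cost one addition and one compare; the
    price is that nothing can be freed piecemeal.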


    That is certainly possible.

    ...


    The one thing I would avoid at all costs, in any interfaces, is the
    "integer flag" nonsense found in many languages (such as C).  If you
    really want some kind of flags, then at least be sure your language has
    strong type-checked enumerations.

    I presume you mean not to combine flags into an integer such as

      mem_allocator_call(size, flags)

    where flags is such as

      ALLOC_DATA | ALLOC_WRITABLE

    but what would you use instead of an integer?


    Use proper types - strong enumerations, sets, structs with bits or
    flags, etc. /Strong/ types. It all boils down to words of some size
    in the end, but the stronger your types as seen by the user in the
    programming language, the less possibility there is for error or
    misunderstanding.
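    C itself lacks strong enumerations, but the principle can be
    approximated with a struct of named flag fields, so that each option
    is distinct and an out-of-place integer fails to compile (a hedged
    sketch; the field names and options are invented for illustration):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Instead of OR-ing untyped integer flags, give each option a named
   field.  Passing a stray integer, or mixing up the argument order,
   then becomes a type error rather than a silent bug. */
struct alloc_options {
    bool zeroed;      /* zero-fill the block                  */
    bool executable;  /* request executable pages (not here)  */
    bool pageable;    /* may be paged out (not here)          */
};

void *m_alloc_opt(size_t size, struct alloc_options opt) {
    /* Only "zeroed" is honoured in this sketch; the other fields
       would map to OS-specific calls in a real allocator. */
    return opt.zeroed ? calloc(1, size) : malloc(size);
}
```

    Call sites then read as self-describing named options, e.g.
    `m_alloc_opt(128, (struct alloc_options){ .zeroed = true })`.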




    Would there be any value in giving the programmer a way to find
    out the size of a given allocation? I don't mean the size
    requested but the size allocated (which would be greater than or
    equal to the size requested).

    No. It is pointless.

    Why would anyone want to know the result of such a function? The
    only conceivable thought would be if they had first allocated space
    for x units, and then they now want to store y units - calling
    "get_real_size" could let them know if they need to call "resize"
    or not.

    The answer is that they should simply call "resize". If "resize"
    does not need to allocate new memory because the real size is big
    enough, it does nothing.

    It's partly for performance, as Bart's comments have backed up. If
    code can find out the capacity then it can fill that allocation up to
    the stated capacity without having to make any other calls and
    without risking moving what has been stored so far.


    There is no performance benefit in real code - and you should not care
    about silly cases.

    On the contrary, there is huge potential for real benefit in real
    code. There is no advantage in toy code. That's the difference.


    I completely disagree, and nothing you have suggested here or
    previously has given me any reason to suspect that "resize" is
    useful. You've given vague ideas about things where you think it
    might be helpful, but nothing that is concrete and nothing that -
    IMHO - would not be better off handled in a very different way.
    However, this is /your/ language, not mine, and if you want to
    disagree with me on this point then that is absolutely fine. I think
    we can call this one a dead donkey, and stop beating it. There are
    surely many other points for which my suggestions or recommendations
    could be more useful to you, and we should focus there.


    I ask because it's simple enough to precede an aligned chunk of
    memory with the allocation size. The allocator may well do that
    anyway in order to help it manage memory. So there seems to be no
    good reason to keep that info from the programmer. The question
    is over whether there's value in allowing the programmer to get
    such info. I am thinking that he could get the value with a call
    such as

    m_msize(p)

    It seems simple enough, potentially useful, and should be
    harmless. But it's noticeable that the malloc-type calls don't
    have anything similar. So maybe there's a good reason why not.

    I thought it might cause implementation issues such as requiring
    a larger header if the allocator wants to search only free
    memory. But whatever implementation is used I think the allocator
    will still need to either store or calculate the size of the
    memory allocated so I am not sure I see a problem with the idea.

    What do you guys think? Opinions welcome!

    I think you should not try to copy C's malloc/free mechanism. In
    particular, I do not think the memory allocator should track the
    sizes of allocations. "free" should always include the size,
    matching the value used in "alloc". (And "resize" should pass the
    old size as well as the new size.)
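    A sized interface of that shape could be sketched in C as below,
    simply forwarding to malloc/free here; a real allocator could use
    the passed size to index a size-class free list directly, with no
    hidden header lookup (the names are illustrative):

```c
#include <stddef.h>
#include <stdlib.h>

/* The client supplies the size it originally asked for, so the
   allocator need not store it in a hidden header next to the data. */
void *sized_alloc(size_t size) {
    return malloc(size);
}

void sized_free(void *p, size_t size) {
    (void)size;   /* a real allocator would pick a free list from it */
    free(p);
}

void *sized_resize(void *p, size_t old_size, size_t new_size) {
    (void)old_size;   /* lets the allocator skip looking the size up */
    return realloc(p, new_size);
}
```

    This mirrors C++'s sized `operator delete`: the caller almost always
    knows the size anyway, so passing it costs nothing and saves the
    allocator a lookup.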

    Bart said the same and the suggestion surprises me. It seems prone to
    error.


    Your choices with memory management are basically to either trust the
    programmer, or do everything automatically with garbage collection.
    Even if you use something like C++'s RAII mechanisms for allocating and
    deallocating, you still have to trust the programmer to some extent.  If
    you are relying on the programmer to get their pointers right for calls
    to "free", why are you worried that they'll get the size wrong?

    The implementation has to know the sizes.

    Nope.

    There's no point in requiring
    the client program to know the sizes as well.

    The client /already/ knows the sizes - after all, it asked for the
    memory of particular sizes in the first place. There is no point in
    the /implementation/ tracking that data as well as the client.


    Make a bad interface now and you'll be stuck with it.

    Indeed. Hence the original question.

    Unfortunately, I have long forgotten the original question!



    I still don't understand your targets - either in terms of computer
    systems, or expected uses.  If the target is modern PC-style systems
    (so that you can, like in Bart's languages, assume that you have
    64-bit integers and plenty of memory), then you can make a string
    format that uses, say, 256-byte lumps.  Some of that is for
    pointers and length counts to handle chains for big strings, and maybe
    reference counting, with over 200 bytes free for the string itself.
    The solid majority of all strings in practical use will fit in one
    lump, and you have no challenges with memory allocators, resizing, or
    the rest of it.  That simplification will completely outweigh the
    inefficiency of the wasted memory.  And for long strings, doing
    everything in a single C-style block is inefficient anyway - you'll
    want some kind of linked chain.
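    One possible layout for such a 256-byte lump, sketched in C (the
    field names and the exact header split are guesses at the idea
    above, not a prescribed design):

```c
#include <stdint.h>

/* One 256-byte string lump: chain pointer, counts and a reference
   count up front, the rest free for text.  Short strings - the common
   case - fit in exactly one lump, so no resizing ever happens. */
#define LUMP_SIZE 256

struct string_lump {
    struct string_lump *next;     /* chain for long strings      */
    uint32_t total_length;        /* length of the whole string  */
    uint32_t used;                /* bytes used in this lump     */
    uint32_t refcount;
    char     text[LUMP_SIZE - sizeof(struct string_lump *)
                            - 3 * sizeof(uint32_t)];
};

_Static_assert(sizeof(struct string_lump) == LUMP_SIZE,
               "a lump must be exactly one allocator unit");
```

    On a 64-bit machine the header takes 20 bytes, leaving 236 bytes of
    text per lump - comfortably the "over 200 bytes" mentioned above.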

    The targets are simply 'computers'. They can be of different sizes -
    e.g. 12-bit processors with tiny amounts of RAM up to 64-bit
    processors with oodles of RAM, and beyond. The size of the machine
    shouldn't much matter.


    The size of the target matters /enormously/ for the language.

    If it has to work on very small systems, then the language has to be
    small and limited. If it only needs to work on big systems, it can be
    far higher level - and thus much more helpful to the programmer.

    If your language and its standard library need to work on small
    microcontrollers, you need to make sure the runtime minimises ram
    usage and is optimised for tiny data requirements. You probably
    don't want dynamic memory at all, because people writing for such
    devices avoid dynamic memory as much as possible. You will want no
    more than minimum string handling, with blind handling of 8-bit
    characters, to avoid library overheads. You won't have floating
    point support. You won't have IO libraries, or network support, or
    file handling. You will either have no multi-threading support, or
    you will have your own RTOS. Everything revolves around predictable
    timing with very small data needs.

    If it is for big machines (Raspberry Pi, PC, etc.), then you have a
    completely different world. It doesn't matter if your system uses a
    megabyte of ram extra - that's a few microseconds of wasted time at
    most. Memory allocation is vital - it needs to be fast and low
    time-cost, with great care about cache lines, but it doesn't matter
    if ram is wasted. You need to fit the threading world of the OS's
    you support, and programmer tools to help avoid data races, false
    sharing, and other hard to debug issues would be of great help. So
    would language support for multi-threading of algorithms - perhaps
    also multi-processing, and even distributed computing (using OpenMP
    and MPI). Vectorisation of code makes a huge difference. You need
    full IO, networking and file support, using UTF-8 throughout - but
    with libraries for other encodings for foreign function interfaces.
    Everything must be optimised for big data structures, because
    anything else is just noise.

    These are completely different worlds. You can't make a language that
    is good for both.




    --- Synchronet 3.20a-Linux NewsLink 1.114