On 06/12/2022 08:43, Dmitry A. Kazakov wrote:
On 2022-12-06 00:47, James Harris wrote:
Rather than a contract (which can lead to an error if the
precondition is not met) it may be better to /define/ the range of
possible inputs to a function: a call which has parameters which do
not meet the definitions will not even match with the function which
cannot handle them. For example,
sqrt(-4.0)
would not match with a definition
function sqrt(real value :: value >= 0) -> real
but would match with
function sqrt(real value) -> complex
It is untyped. The formal proof is trivial, I gave it already:
X := 4.0;
if HALT(p) then
X := -4.0;
end if;
sqrt (X); -- What is the type of?
What does HALT(p) do?
On 06/12/2022 07:56, David Brown wrote:
On 05/12/2022 23:25, James Harris wrote:
On 04/12/2022 22:59, David Brown wrote:
On 04/12/2022 19:21, James Harris wrote:
...
The case in point is two libraries: a memory allocator and a string
library which uses the memory allocator. What I am saying to you is
that such libraries need to be correct but they also need to scale
to large strings, many strings, many string operations, many other
non-string allocation activity, etc, because it's impossible to say
just now how they will be used. It's no good to just "focus on
correctness".
I did not say that you should ignore performance or efficiency. I
said /correctness/ trumps performance every time. /Every/ time. No
exceptions.
You can say it yet again but that doesn't mean it's true. Correctness
is essential, of course, but for some programs which need hard
real-time guarantees performance is equally as important.
What part of "specific timing requirements are part of correctness" is
giving you trouble here? Other requirements can be part of
correctness too - if the customer says the program must be written in
Quick Basic, then a much faster and neater alternative in Turbo Pascal
is /wrong/.
None. I indicated agreement with that statement.
It really is impossible to stress this too much. Correctness is
always key - it doesn't matter how efficient or performant an
incorrect program is. If you do not understand that, you are in the
wrong profession.
Which part of "Correctness is essential" is giving you trouble? ;-)
If particular specific performance expectations are part of the
requirements, then they are part of /correctness/.
Indeed.
If you are just looking for "as fast as practically possible given
the budget constraints for the development process", then that's
fine too - but it /always/ comes secondary to correctness.
That is at least a development on what you said before but it is
still impractical. If you implemented a simple n-squared sort "due to
budget constraints" it may work fine for weeks but then hang a system
which got hit with a much larger data set. It does not scale. I dread
to think how much 'professional' and paid-for software has followed
similar principles but I've been unfortunate enough to use some of it.
The problem is not from programmers focusing on correctness. The main
problem in software development is failures at the specification level.
Well, formal specifications are more relevant for end-user apps but
someone writing middleware will not know how the software might be used
in future. It is then important to anticipate potential future uses. As
well as correctness such software needs to /scale/.
On 05/12/2022 23:38, James Harris wrote:
On 04/12/2022 23:12, David Brown wrote:
On 04/12/2022 15:07, James Harris wrote:
On 04/12/2022 13:27, David Brown wrote:
...
I thought that was what you and Bart (who IIRC also mentioned
something similar) had in mind and I spent some time designing an
allocator to support splitting memory in that way down to the byte
level. It was an interesting exercise and I may use it one day but I
thought you had some use cases in mind.
Who cares? Launch the nasal daemons.
Never, never, and never!
Garbage in, garbage out.
You cannot give the program sensible answers to stupid questions.
And no matter how much you try to check the program's requests for
sanity and valid values, there will be a programmer who will outwit
you and find new ways to be wrong that are beyond your imagination.
If you have a low-level language that allows things like pointers and
conversions between pointers and integers, this will happen /very/
easily, and there is nothing you can do to stop it.
...
The concept of undefined behaviour - of "garbage in, garbage out" -
is as old as computing:
Garbage out is not the same as undefined behaviour. Here's an example
of garbage in leading to garbage out
Enter your age: 1000
You owe us $-1,234,567
Here's an example of UB
Enter your age: 1000
Deleting files in your home directory
Nasal demons should never be permitted.
No.
An example of undefined behaviour is :
Enter your age between 0 and 200, and I will convert it to Mars years.
No one has said what will happen here if you enter 1000. The developer
can do whatever he or she likes in such cases, including ignoring it entirely, or converting it accurately to Mars years.
The precondition to the program is simple and clear - it is /your/ fault
if you put in an age outside the range.
The designer of a saucepan says what will happen if you put it on a
normal cooker, or in a dishwasher, or in a home oven - because that is expected use of the saucepan. He does not say what will happen when you throw it in a blast furnace, or put it on your head and dance naked in
the street - those are undefined behaviours. Should those possibilities have been included in the instruction manual for the saucepan? Should
the saucepan manufacturer take responsibility for knowing the local laws where you live, and determining if you'll be arrested, locked in a
padded cell, freeze to death, or simply have a refreshing dance in your
own garden?
So why should the implementer of sqrt consider all the ways you could attempt to use the function outside of its specification?
On 2022-12-06 14:57, David Brown wrote:
I don't think you understand what the term "undefined behaviour"
means. Many people don't, even when they are seasoned programmers -
and that is why they fear it.
It simply means the specifications don't say anything about the given
circumstances.
No, it means exactly what it says: the program's behavior is undefined,
it can do absolutely anything = be in any possible state. The state of
the Universe is always confined and variation of states are always
localized in space and time. Which is why engineering of hardware and programming are so different.
On 06/12/2022 07:46, David Brown wrote:
to squeeze in a few extra bytes. The rest is done by a /new/
allocation, not an enlargement. That means all the time spent trying
to figure out if you have an extra byte or two is wasted.
Your understanding of the proposal may be improving but there's some way
to go. The resize call would not be limited to padding for alignment but could return a *lot* more if a lot more were available.
Maybe it's the small memory requirements I used in the example which is misleading you. They were just illustrative. Say, instead, an existing allocation, D, is 400k and a library using that space wanted to extend
it by 100k bytes. It would call
resize(D, 500_000)
If there was enough space after D for it to be extended to 500k then
that's what would happen. The size of D would be increased to (at least) 500k without moving the initial 400k to new memory.
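
The in-place resize described above can be sketched as follows. This is only a minimal illustration in C, assuming a toy arena in which just the most recently allocated block can have free space after it. The names toy_alloc and toy_resize are hypothetical and are not the m_resize interface under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { ARENA_SIZE = 1 << 20 };   /* 1 MiB backing store for the sketch */

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;             /* bytes handed out so far */
static unsigned char *last_block = NULL;  /* most recent allocation  */

/* Round n up to the strictest fundamental alignment. */
static size_t arena_round_up(size_t n) {
    size_t a = _Alignof(max_align_t);
    return (n + a - 1) / a * a;
}

/* Carve a new block from the end of the used region. */
void *toy_alloc(size_t size) {
    if (size == 0 || size > ARENA_SIZE) return NULL;
    size_t need = arena_round_up(size);
    if (need > ARENA_SIZE - arena_used) return NULL;
    last_block = arena + arena_used;
    arena_used += need;
    return last_block;
}

/* Resize p without moving it. Returns true on success. On failure
   the block is untouched and the caller must fall back to a
   moving reallocation or to chaining a second block. */
bool toy_resize(void *p, size_t new_size) {
    if (p == NULL || p != (void *)last_block) return false; /* not extendable */
    if (new_size == 0 || new_size > ARENA_SIZE) return false;
    size_t start = (size_t)(last_block - arena);
    size_t need = arena_round_up(new_size);
    if (need > ARENA_SIZE - start) return false;  /* not enough room after p */
    arena_used = start + need;
    return true;
}
```

With this sketch, growing a 400k block to 500k succeeds without copying as long as nothing has been allocated after it; once another block follows, the same call reports failure instead of moving the data.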
On 2022-12-07 19:31, James Harris wrote:
On 07/12/2022 16:46, Dmitry A. Kazakov wrote:
On 2022-12-07 17:37, James Harris wrote:
Well, formal specifications are more relevant for end-user apps but
someone writing middleware will not know how the software might be
used in future.
As a designer of a middleware I can tell you that for it
specifications are even more important.
Specifications for middleware performance? What form do they take?
Performance is a non-functional requirement, ergo, not a part of formal specifications. Formal specifications contain only functional requirements.
On 07/12/2022 17:37, James Harris wrote:
On 06/12/2022 07:56, David Brown wrote:
On 05/12/2022 23:25, James Harris wrote:
On 04/12/2022 22:59, David Brown wrote:
On 04/12/2022 19:21, James Harris wrote:
The case in point is two libraries: a memory allocator and a
string library which uses the memory allocator. What I am saying
to you is that such libraries need to be correct but they also
need to scale to large strings, many strings, many string
operations, many other non-string allocation activity, etc,
because it's impossible to say just now how they will be used.
It's no good to just "focus on correctness".
I did not say that you should ignore performance or efficiency. I
said /correctness/ trumps performance every time. /Every/ time.
No exceptions.
You can say it yet again but that doesn't mean it's true.
Correctness is essential, of course, but for some programs which
need hard real-time guarantees performance is equally as important.
What part of "specific timing requirements are part of correctness"
is giving you trouble here? Other requirements can be part of
correctness too - if the customer says the program must be written in
Quick Basic, then a much faster and neater alternative in Turbo
Pascal is /wrong/.
None. I indicated agreement with that statement.
OK - sorry for misinterpreting you here.
Well, formal specifications are more relevant for end-user apps but
someone writing middleware will not know how the software might be used
in future. It is then important to anticipate potential future uses.
As well as correctness such software needs to /scale/.
I would say it is the opposite.
Formal specifications for end-user situations are notoriously difficult.
How do you define "user-friendly", or "feels comfortable in use" ? Of course it is good to /try/ to specify that kind of thing, but in reality
you do countless rounds of trials and tests, and in the end the specification is "works the way it did on that afternoon when the beta testers were happy".
The deeper into the system you get - through middleware, libraries, and
down to low-level libraries and language details - the easier it is to
make a clear specification, and the more important it is to have clear specifications. Make them formal and testable if possible and practical (real engineering always has budget constraints!).
The end user app is specified "show the items in alphabetical order".
The app will use library functions to do that - and to know which
library and which functions, these need to be specified in detail
(scaling, algorithm class, collation method, language support, etc.).
On 07/12/2022 19:48, Dmitry A. Kazakov wrote:
On 2022-12-07 19:31, James Harris wrote:
On 07/12/2022 16:46, Dmitry A. Kazakov wrote:
On 2022-12-07 17:37, James Harris wrote:
Well, formal specifications are more relevant for end-user apps but
someone writing middleware will not know how the software might be
used in future.
As a designer of a middleware I can tell you that for it
specifications are even more important.
Specifications for middleware performance? What form do they take?
Performance is a non-functional requirement, ergo, not a part of
formal specifications. Formal specifications contain only functional
requirements.
I appear to be somewhere between you and David on this. For sure, it's
not possible to set user-facing performance figures for software we
don't know the future uses of! But we can design it so that its run time will scale predictably and evenly and reasonably well. That's so for the
two modules we have been discussing: a string library and a memory allocator.
On 07/12/2022 16:20, James Harris wrote:
On 06/12/2022 07:46, David Brown wrote:
...
to squeeze in a few extra bytes. The rest is done by a /new/
allocation, not an enlargement. That means all the time spent trying
to figure out if you have an extra byte or two is wasted.
Your understanding of the proposal may be improving but there's some
way to go. The resize call would not be limited to padding for
alignment but could return a *lot* more if a lot more were available.
Maybe it's the small memory requirements I used in the example which
is misleading you. They were just illustrative. Say, instead, an
existing allocation, D, is 400k and a library using that space wanted
to extend it by 100k bytes. It would call
resize(D, 500_000)
If there was enough space after D for it to be extended to 500k then
that's what would happen. The size of D would be increased to (at
least) 500k without moving the initial 400k to new memory.
I see no reply to the above. Maybe, David, you now better understand the proposal. Hopefully! :-)
On 06/12/2022 16:33, Dmitry A. Kazakov wrote:
On 2022-12-06 14:57, David Brown wrote:
...
I don't think you understand what the term "undefined behaviour"
means. Many people don't, even when they are seasoned programmers -
and that is why they fear it.
It simply means the specifications don't say anything about the given
circumstances.
No, it means exactly what it says: the program's behavior is
undefined, it can do absolutely anything = be in any possible state.
The state of the Universe is always confined and variation of states
are always localized in space and time. Which is why engineering of
hardware and programming are so different.
I agree with your interpretation of undefined behaviour. I hope David reviews his position because I recall him saying that he welcomed (or similar) UB in a programming language. Definitions matter.
On 06/12/2022 14:19, David Brown wrote:
On 05/12/2022 23:38, James Harris wrote:
On 04/12/2022 23:12, David Brown wrote:
On 04/12/2022 15:07, James Harris wrote:
On 04/12/2022 13:27, David Brown wrote:
...
I thought that was what you and Bart (who IIRC also mentioned
something similar) had in mind and I spent some time designing an
allocator to support splitting memory in that way down to the byte
level. It was an interesting exercise and I may use it one day but
I thought you had some use cases in mind.
Who cares? Launch the nasal daemons.
Never, never, and never!
Garbage in, garbage out.
You cannot give the program sensible answers to stupid questions.
And no matter how much you try to check the program's requests for
sanity and valid values, there will be a programmer who will outwit
you and find new ways to be wrong that are beyond your imagination.
If you have a low-level language that allows things like pointers
and conversions between pointers and integers, this will happen /very/
easily, and there is nothing you can do to stop it.
...
The concept of undefined behaviour - of "garbage in, garbage out" -
is as old as computing:
Garbage out is not the same as undefined behaviour. Here's an example
of garbage in leading to garbage out
Enter your age: 1000
You owe us $-1,234,567
Here's an example of UB
Enter your age: 1000
Deleting files in your home directory
Nasal demons should never be permitted.
No.
An example of undefined behaviour is :
Enter your age between 0 and 200, and I will convert it to Mars years.
No one has said what will happen here if you enter 1000. The
developer can do whatever he or she likes in such cases, including
ignoring it entirely, or converting it accurately to Mars years.
Where did you get that idea of undefined behaviour from? It is different
and far more benign than anything I've seen before.
What you describe sounds more like /unspecified/ behaviour to me.
Undefined is more like the example you gave in another post which ends with
delete_recursive(path)
In other words, for undefined behaviour ANYTHING could happen, including something catastrophic. That's why I say it should be prohibited.
The precondition to the program is simple and clear - it is /your/
fault if you put in an age outside the range.
There is no simple age limit but the behaviour up to a certain limit can
be specified, along with an undertaking to reject anything higher.
The designer of a saucepan says what will happen if you put it on a
normal cooker, or in a dishwasher, or in a home oven - because that is
expected use of the saucepan. He does not say what will happen when
you throw it in a blast furnace, or put it on your head and dance
naked in the street - those are undefined behaviours. Should those
possibilities have been included in the instruction manual for the
saucepan? Should the saucepan manufacturer take responsibility for
knowing the local laws where you live, and determining if you'll be
arrested, locked in a padded cell, freeze to death, or simply have a
refreshing dance in your own garden?
So why should the implementer of sqrt consider all the ways you could
attempt to use the function outside of its specification?
Mathematical functions can vet their inputs before passing them through
to the procedure body.
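
The idea of specifying behaviour up to a limit and rejecting anything beyond it can be sketched like this. It is a minimal illustration in C, assuming the 0 to 200 range from the example above and an approximate figure of 1.881 Earth years per Mars year. The names to_mars_years and mars_result are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { AGE_MIN = 0, AGE_MAX = 200 };  /* range taken from the prompt text */

/* Approximate length of a Mars year in Earth years. */
static const double EARTH_YEARS_PER_MARS_YEAR = 1.881;

typedef struct {
    bool   ok;          /* false when the input was rejected    */
    double mars_years;  /* meaningful only when ok is true      */
} mars_result;

/* Vet the input first; the conversion runs only for values inside
   the documented range, so every possible int has defined behaviour. */
mars_result to_mars_years(int age) {
    if (age < AGE_MIN || age > AGE_MAX)
        return (mars_result){ .ok = false, .mars_years = 0.0 };
    return (mars_result){ .ok = true,
                          .mars_years = age / EARTH_YEARS_PER_MARS_YEAR };
}
```

Here an input of 1000 is neither converted nor left open: the caller gets ok == false and decides what to do next.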
On 12/12/2022 22:13, James Harris wrote:
I agree with your interpretation of undefined behaviour. I hope David
reviews his position because I recall him saying that he welcomed (or
similar) UB in a programming language. Definitions matter.
I do welcome UB in programming languages - I think it is a good thing to
be clear and up-front about it.
On 13/12/2022 14:38, David Brown wrote:
On 12/12/2022 22:13, James Harris wrote:
I agree with your interpretation of undefined behaviour. I hope David
reviews his position because I recall him saying that he welcomed (or
similar) UB in a programming language. Definitions matter.
I do welcome UB in programming languages - I think it is a good thing
to be clear and up-front about it.
Except that, with C, and exploitative C compilers, that doesn't happen.
They simply assume UB can never happen, and use that as an excuse to be
able to do what they like, and you didn't expect. That's the opposite of being up-front!
On 26/11/2022 14:05, James Harris wrote:
On Friday, 25 November 2022 at 16:15:08 UTC, David Brown wrote:
On 25/11/2022 14:02, James Harris wrote:
I will get back to you guys on other topics but I need to improve
the memory allocator I use in my compiler and that has led to the
following query.
Imagine an allocator which carries out plain malloc-type
allocations (i.e. specified as a number of 8-bit bytes suitably
aligned for any type) without having to have compatibility with
malloc, and in a language which is not C.
For the sake of discussion the set of calls could be
m_alloc m_calloc m_realloc m_resize m_free
In a new language, I would not follow the names from C as closely.
For one thing, it could confuse people - they may think they do the
same thing as in C. For another, "calloc" in particular is a silly
name. And you don't need "realloc" and "resize".
The idea is that realloc could move the allocation whereas resize
could only resize it in place. The latter is not present in C's
malloc family.
Where would such a function be useful? If the application needs to
expand an object, it needs to expand it - if that means moving it, so be it. I can't see where you would have use for a function that /might/ be able to expand an object. Do you have /real/ use-cases in mind?
It can make sense to distinguish between "alloc_zeroed" and
"alloc_uninitialised", for performance reasons.
Agreed, but what exactly would you want for the calls? Maybe
m_alloc(size) m_alloc_a(size, flags)
?
There could be many different interfaces. Dmitry also pointed out other possible distinctions for allocation of shared memory, unpageable
memory, and so on. Not all memory is created equal!
Maybe you want a more object-oriented interface, for supporting several different pools or memory allocation choices.
The one thing I would avoid at all costs, in any interfaces, is the
"integer flag" nonsense found in many languages (such as C). If you
really want some kind of flags, then at least be sure your language has strong type-checked enumerations.
Would there be any value in giving the programmer a way to find
out the size of a given allocation? I don't mean the size
requested but the size allocated (which would be greater than or
equal to the size requested).
No. It is pointless.
Why would anyone want to know the result of such a function? The
only conceivable thought would be if they had first allocated space
for x units, and then they now want to store y units - calling
"get_real_size" could let them know if they need to call "resize"
or not.
The answer is that they should simply call "resize". If "resize"
does not need to allocate new memory because the real size is big
enough, it does nothing.
It's partly for performance, as Bart's comments have backed up. If
code can find out the capacity then it can fill that allocation up to
the stated capacity without having to make any other calls and
without risking moving what has been stored so far.
There is no performance benefit in real code - and you should not care
about silly cases.
I ask because it's simple enough to precede an aligned chunk of
memory with the allocation size. The allocator may well do that
anyway in order to help it manage memory. So there seems to be no
good reason to keep that info from the programmer. The question
is over whether there's value in allowing the programmer to get
such info. I am thinking that he could get the value with a call
such as
m_msize(p)
It seems simple enough, potentially useful, and should be
harmless. But it's noticeable that the malloc-type calls don't
have anything similar. So maybe there's a good reason why not.
I thought it might cause implementation issues such as requiring
a larger header if the allocator wants to search only free
memory. But whatever implementation is used I think the allocator
will still need to either store or calculate the size of the
memory allocated so I am not sure I see a problem with the idea.
What do you guys think? Opinions welcome!
I think you should not try to copy C's malloc/free mechanism. In
particular, I do not think the memory allocator should track the
sizes of allocations. "free" should always include the size,
matching the value used in "alloc". (And "resize" should pass the
old size as well as the new size.)
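
The interface shape suggested here, where the caller hands the size back on every call, might look roughly like the following. This is a sketch in C with hypothetical names (sized_alloc, sized_resize, sized_free), layered over the standard malloc family purely for illustration. An allocator actually designed this way would use the passed size to locate the right pool and would not need a per-block header. The bytes_live counter exists only to make a mismatched size visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t bytes_live = 0;   /* running total, kept only for checking */

void *sized_alloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) bytes_live += size;
    return p;
}

/* The caller supplies both the old and the new size. */
void *sized_resize(void *p, size_t old_size, size_t new_size) {
    if (new_size == 0) return NULL;          /* keep the sketch well defined */
    void *q = realloc(p, new_size);
    if (q != NULL) bytes_live = bytes_live - old_size + new_size;
    return q;
}

/* The caller supplies the size it asked for when allocating. */
void sized_free(void *p, size_t size) {
    if (p == NULL) return;
    assert(size <= bytes_live);              /* catches some wrong sizes */
    bytes_live -= size;
    free(p);
}

size_t sized_live(void) { return bytes_live; }
```

If every size passed back matches the size requested, sized_live() returns to zero once all blocks are freed, which gives a cheap consistency check in test builds.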
Bart said the same and the suggestion surprises me. It seems prone to
error.
Your choices with memory management are basically to either trust the programmer, or do everything automatically with garbage collection.
Even if you use something like C++'s RAII mechanisms for allocating and deallocating, you still have to trust the programmer to some extent. If
you are relying on the programmer to get their pointers right for calls
to "free", why are you worried that they'll get the size wrong?
Storing the size of allocation was not too bad in earliest days of
C, with simpler processors, no caches, no multi-threading, and
greater concern for simple allocation implementations than for
performance or reliability. Modern allocators no longer work with a
simple linked list storing sizes and link pointers at an address
below the address returned to the user, so why follow a similar
interface?
Programs know the size of memory they requested when they allocated
it. They know, almost invariably, the size when they are freeing
the memory. They know the size of the types, and the size of the
arrays. So having the memory allocator store this too is a waste.
That's sometimes the case but not always. Say I wanted to read a line
from a file a byte at a time without knowing in advance how long the
line would be (a common-enough requirement but one which C programs
all too often fudge by defining a maximum line length). The required
allocation could not be known in advance.
That is irrelevant. (On a side note, it makes sense to have library
calls that make this kind of common function more convenient.) The programmer knows the size of any "malloc" calls or "realloc" calls made
- that's the size given back to "free".
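
The line-reading case can be sketched with a growable buffer in which the program itself records the capacity it last requested. This is a minimal C illustration with hypothetical names (line_buf, line_push, line_free). The doubling policy and the initial 64 bytes are assumptions, not something proposed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    char  *data;
    size_t len;   /* bytes stored so far                        */
    size_t cap;   /* bytes most recently requested from realloc */
} line_buf;

/* Append one byte, growing the buffer when it is full. */
bool line_push(line_buf *b, char c) {
    if (b->len == b->cap) {
        if (b->cap > SIZE_MAX / 2) return false;       /* would overflow */
        size_t new_cap = b->cap ? b->cap * 2 : 64;
        char *p = realloc(b->data, new_cap);
        if (p == NULL) return false;                   /* old buffer still valid */
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    return true;
}

void line_free(line_buf *b) {
    free(b->data);
    b->data = NULL;
    b->len = b->cap = 0;
}
```

The final line length is unknown in advance, yet the value in cap is always the size of the last request, so it is available if a sized free is wanted. What the program cannot see this way is any extra space the allocator happened to round up to.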
There are various specialised allocation techniques but the malloc
style is the most general I know of.
As Dmitry has pointed out, there are /many/ ways to make allocators, and none of them are optimal in all cases. Trying to squeeze things into
one very limited interface simply guarantees that you never get the best choice in your code.
Once you've dropped size tracking, you can easily separate
allocation tracking from the memory in use - this gives neater
sizes for pools and better cache and virtual page usage. You can
have pools of different sizes - perhaps 16 byte blocks for the
smallest, then 256 byte blocks, then 4 KB blocks, etc. (You might
prefer different sizes, perhaps with more intermediary sizes.) A
pool element would have 64 blocks. Allocation within a pool is done
by quick searches within a single 64-bit map - no need to search
through lists.
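
A pool element of the kind described, with 64 blocks tracked by one 64-bit map, could be sketched as below. This is an illustrative C fragment for the 16-byte size class only. The names pool16, pool16_alloc and pool16_free are hypothetical, and the bit search is a plain portable loop (compilers such as GCC and Clang offer a count-trailing-zeros builtin that does the same job in one instruction).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BLOCK_SIZE = 16, BLOCKS_PER_POOL = 64 };

typedef struct {
    uint64_t used;  /* bit i set means block i is allocated */
    _Alignas(16) unsigned char blocks[BLOCKS_PER_POOL][BLOCK_SIZE];
} pool16;

/* Index of the lowest clear bit in map, or -1 if all 64 are set. */
static int first_free(uint64_t map) {
    uint64_t free_bits = ~map;
    if (free_bits == 0) return -1;
    int i = 0;
    while ((free_bits & 1u) == 0) { free_bits >>= 1; i++; }
    return i;
}

void *pool16_alloc(pool16 *p) {
    int i = first_free(p->used);
    if (i < 0) return NULL;                 /* pool element is full */
    p->used |= (uint64_t)1 << i;
    return p->blocks[i];
}

/* No size lookup is needed: the block index follows from the address. */
void pool16_free(pool16 *p, void *ptr) {
    size_t i = (size_t)((unsigned char *)ptr - &p->blocks[0][0]) / BLOCK_SIZE;
    p->used &= ~((uint64_t)1 << i);
}
```

A fuller allocator would keep one such pool type per size class and pick the class from the size the caller passes in.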
I presume by pools you mean slots of a certain size. I may, longer
term, let a program specify slot sizes and pool geometry. But that's
for the future and it wouldn't work well with variable-length
strings, especially where the length is not known in advance. So I
need something simple and general for now.
Make a bad interface now and you'll be stuck with it.
I still don't understand your targets - either in terms of computer
systems, or expected uses. If the target is modern PC-style systems (so that you can, like in Bart's languages, assume that you have 64-bit
integers and plenty of memory), then you can make a string format that
uses, say, 256 byte lumps. Some of that is for pointers and length counts to handle chains for big strings, and maybe reference counting,
with over 200 bytes free for the string itself. The solid majority of
all strings in practical use will fit in one lump, and you have no challenges with memory allocators, resizing, or the rest of it. That simplification will completely outweigh the inefficiency of the wasted memory. And then for long strings, doing everything in a single C-style block is inefficient anyway - you'll want some kind of linked chain anyway.
On 28/11/2022 07:11, David Brown wrote:
On 26/11/2022 14:05, James Harris wrote:
On Friday, 25 November 2022 at 16:15:08 UTC, David Brown wrote:
On 25/11/2022 14:02, James Harris wrote:
I will get back to you guys on other topics but I need to improve
the memory allocator I use in my compiler and that has led to the
following query.
Imagine an allocator which carries out plain malloc-type
allocations (i.e. specified as a number of 8-bit bytes suitably
aligned for any type) without having to have compatibility with
malloc, and in a language which is not C.
For the sake of discussion the set of calls could be
m_alloc m_calloc m_realloc m_resize m_free
In a new language, I would not follow the names from C as closely.
For one thing, it could confuse people - they may think they do the
same thing as in C. For another, "calloc" in particular is a silly
name. And you don't need "realloc" and "resize".
The idea is that realloc could move the allocation whereas resize
could only resize it in place. The latter is not present in C's
malloc family.
Where would such a function be useful? If the application needs to
expand an object, it needs to expand it - if that means moving it, so be
it. I can't see where you would have use for a function that /might/ be
able to expand an object. Do you have /real/ use-cases in mind?
Sure. Imagine implementing a rope data structure
https://en.wikipedia.org/wiki/Rope_(data_structure)
or anything which could be in parts and/or of varying size.
It can make sense to distinguish between "alloc_zeroed" and
"alloc_uninitialised", for performance reasons.
Agreed, but what exactly would you want for the calls? Maybe
m_alloc(size) m_alloc_a(size, flags)
?
There could be many different interfaces. Dmitry also pointed out other
possible distinctions for allocation of shared memory, unpageable
memory, and so on. Not all memory is created equal!
Maybe you want a more object-oriented interface, for supporting several
different pools or memory allocation choices.
There is use for (and, I would argue, a need for) various different
types of allocator. Some allocators are good for fixed-size objects,
some for space which does not need to be returned until a program terminates, and some for where releases are in the opposite order from requests, for example.
But such allocators impose restrictions on how they can be used or how
they can be implemented, e.g. requiring meta space to manage the storage space.
There is still a need for one approach to be completely general:
* not requiring fixed-size allocations
* not requiring separate data structures to describe data space
* permitting arbitrary order of allocation and deallocation
Once one has a completely general allocator then it can take control of available address space and other, more specialised allocators can be
built on top of it - e.g. by requesting meta space and data space for an array-style allocator.
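
Layering a specialised allocator on a general one can be sketched as follows. This is a minimal C illustration of the "releases in the opposite order from requests" case, with the standard malloc standing in for the general allocator. The names stack_alloc, stack_push and stack_pop_to are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;  /* space obtained from the general allocator */
    size_t cap;           /* total bytes available                     */
    size_t top;           /* offset of the first free byte             */
} stack_alloc;

/* Ask the general allocator for one region to manage. */
bool stack_init(stack_alloc *s, size_t cap) {
    s->base = malloc(cap);
    s->cap = s->base ? cap : 0;
    s->top = 0;
    return s->base != NULL;
}

/* Hand out the next aligned chunk, or NULL when the region is full. */
void *stack_push(stack_alloc *s, size_t size) {
    size_t a = _Alignof(max_align_t);
    if (size > s->cap) return NULL;
    size_t need = (size + a - 1) / a * a;
    if (need > s->cap - s->top) return NULL;
    void *p = s->base + s->top;
    s->top += need;
    return p;
}

/* Release p and everything pushed after it (strict LIFO order). */
void stack_pop_to(stack_alloc *s, void *p) {
    s->top = (size_t)((unsigned char *)p - s->base);
}

void stack_destroy(stack_alloc *s) {
    free(s->base);
    s->base = NULL;
    s->cap = s->top = 0;
}
```

The restriction is the price of the simplicity: nothing can be freed out of order, which is why a general allocator is still needed underneath.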
...
The one thing I would avoid at all costs, in any interfaces, is the
"integer flag" nonsense found in many languages (such as C). If you
really want some kind of flags, then at least be sure your language has
strong type-checked enumerations.
I presume you mean not to combine flags into an integer such as
mem_allocator_call(size, flags)
where flags is such as
ALLOC_DATA | ALLOC_WRITABLE
but what would you use instead of an integer?
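
One possible shape for an answer, offered only as a sketch: give each choice its own enumeration and pass them together in a small options record, so that unrelated settings cannot be OR-ed into one integer. The C below uses hypothetical names (fill_mode, access_mode, alloc_options, options_alloc). C's own enumerations are not strongly type-checked, so the full benefit would need a language that enforces the distinction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum { FILL_UNINITIALISED, FILL_ZEROED } fill_mode;
typedef enum { ACCESS_READ_WRITE, ACCESS_READ_ONLY } access_mode;

/* Each field has its own type and a sensible zero default. */
typedef struct {
    fill_mode   fill;
    access_mode access;
} alloc_options;

void *options_alloc(size_t size, alloc_options opts) {
    /* Read-only memory is not modelled in this sketch. */
    if (opts.access != ACCESS_READ_WRITE) return NULL;
    return opts.fill == FILL_ZEROED ? calloc(1, size) : malloc(size);
}
```

A call then names what it wants, for example options_alloc(32, (alloc_options){ .fill = FILL_ZEROED }), and any option left out takes its default.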
Would there be any value in giving the programmer a way to find
out the size of a given allocation? I don't mean the size
requested but the size allocated (which would be greater than or
equal to the size requested).
No. It is pointless.
Why would anyone want to know the result of such a function? The
only conceivable thought would be if they had first allocated space
for x units, and then they now want to store y units - calling
"get_real_size" could let them know if they need to call "resize"
or not.
The answer is that they should simply call "resize". If "resize"
does not need to allocate new memory because the real size is big
enough, it does nothing.
It's partly for performance, as Bart's comments have backed up. If
code can find out the capacity then it can fill that allocation up to
the stated capacity without having to make any other calls and
without risking moving what has been stored so far.
There is no performance benefit in real code - and you should not care
about silly cases.
On the contrary, there is huge potential for real benefit in real code. There is no advantage in toy code. That's the difference.
I ask because it's simple enough to precede an aligned chunk of
memory with the allocation size. The allocator may well do that
anyway in order to help it manage memory. So there seems to be no
good reason to keep that info from the programmer. The question
is over whether there's value in allowing the programmer to get
such info. I am thinking that he could get the value with a call
such as
m_msize(p)
It seems simple enough, potentially useful, and should be
harmless. But it's noticeable that the malloc-type calls don't
have anything similar. So maybe there's a good reason why not.
I thought it might cause implementation issues such as requiring
a larger header if the allocator wants to search only free
memory. But whatever implementation is used I think the allocator
will still need to either store or calculate the size of the
memory allocated so I am not sure I see a problem with the idea.
What do you guys think? Opinions welcome!
I think you should not try to copy C's malloc/free mechanism. In
particular, I do not think the memory allocator should track the
sizes of allocations. "free" should always include the size,
matching the value used in "alloc". (And "resize" should pass the
old size as well as the new size.)
Bart said the same and the suggestion surprises me. It seems prone to
error.
Your choices with memory management are basically to either trust the
programmer, or do everything automatically with garbage collection.
Even if you use something like C++'s RAII mechanisms for allocating and
deallocating, you still have to trust the programmer to some extent. If
you are relying on the programmer to get their pointers right for calls
to "free", why are you worried that they'll get the size wrong?
The implementation has to know the sizes.
There's no point in requiring
the client program to know the sizes as well.
Make a bad interface now and you'll be stuck with it.
Indeed. Hence the original question.
I still don't understand your targets - either in terms of computer
systems, or expected uses. If the target is modern PC-style systems
(so that you can, like in Bart's languages, assume that you have
64-bit integers and plenty of memory), then you can make a string
format that uses, say, 256 byte lumps. Some of that is for
pointers and length counts to handle chains for big strings, and maybe
reference counting, with over 200 bytes free for the string itself.
The solid majority of all strings in practical use will fit in one
lump, and you have no challenges with memory allocators, resizing, or
the rest of it. That simplification will completely outweigh the
inefficiency of the wasted memory. And then for long strings, doing
everything in a single C-style block is inefficient anyway - you'll
want some kind of linked chain anyway.
The targets are simply 'computers'. They can be of different sizes -
e.g. 12-bit processors with tiny amounts of RAM up to 64-bit processors
with oodles of RAM, and beyond. The size of the machine shouldn't much matter.