Forum: War Ensemble BBS

Re: transpiling to low level C

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Tue Dec 17 18:29:37 2024

From Newsgroup: comp.lang.c

On 17.12.2024 00:26, bart wrote:

On 16/12/2024 20:39, Janis Papanagnou wrote:

I wasn't commenting on any "IL",

The subthread was about ILs.

You obviously missed that I wasn't quoting anything else from this
subthread but the one isolated statement I replied to (which also
wasn't a part of a larger paragraph but standing alone on its own).

(That's why I think it's good style to strip a post down to the
essentials one intends to reply to, and not to assume (or refer to) any
unspoken or unquoted parts of a thread; keep posts self-contained! YMMV.
That will also keep potential confusion and misunderstandings small.)

Janis

--- Synchronet 3.20a-Linux NewsLink 1.114

From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Tue Dec 17 14:55:34 2024

From Newsgroup: comp.lang.c

On 12/17/2024 4:03 AM, BGB wrote:

On 12/16/2024 5:21 AM, Thiago Adams wrote:

On 15/12/2024 20:53, BGB wrote:

On 12/15/2024 3:32 PM, bart wrote:

On 15/12/2024 19:08, Bonita Montero wrote:

C++ is more readable because it is magnitudes more expressive than C.
You can easily write a C++ statement that would take hundreds of lines in
C (imagine specializing an unordered_map by hand). Making a language
less expressive makes it even less readable, and that's also true for
your reduced C.

That's not really the point of it. This reduced C is used as an
intermediate language for a compiler target. It will not usually be
read, or maintained.

An intermediate language needs to be at a lower level than the source
language.

And for this project, it needs to be compilable by any C89 compiler.

Generating C++ would be quite useless.

As an IL, even C is a little overkill, unless turned into a
restricted subset (say, along similar lines to GCC's GIMPLE).

Say:
   Only function-scope variables allowed;
   No high-level control structures;
   ...

Say:
   int foo(int x)
   {
     int i, v;
     for(i=x, v=0; i>0; i--)
       v=v*i;
     return(v);
   }

Becoming, say:
   int foo(int x)
   {
     int i;
     int v;
     i=x;
     v=0;
     if(i<=0)goto L1;
     L0:
     v=v*i;
     i=i-1;
     if(i>0)goto L0;
     L1:
     return v;
   }

...

I have considered removing loops and keeping only goto.
But I think this would not bring much simplification.

It depends.

If the compiler works like an actual C compiler, with a full parser and
AST stage, yeah, it may not save much.

If the parser is a thin wrapper over 3AC operations (only allowing
statements that map 1:1 with a 3AC IR operation), it may save a bit
more...

As for whether or not it makes sense to use a C like syntax here, this
is more up for debate (for practical use within a compiler, I would
assume a binary serialization rather than an ASCII syntax, though ASCII
may be better in terms of inter-operation or human readability).

But, as can be noted, I would assume a binary serialization that is
oriented around operators; and *not* about serializing the structures
used to implement those operators. Also I would assume that the IR need
not be in SSA form (conversion to full SSA could be done when reading in
the IR operations).

My argument is that not using SSA form means fewer issues for both the
serialization format and the compiler front-end to deal with (and SSA is
comparatively easy to regenerate in the backend, with the backend
operating on its internal IR in SSA form).

Well, contrast to LLVM assuming everything is always in SSA form.

...

I have also considered splitting expressions.

For instance

if (a*b+c) {}

into

register int r1 = a * b;
register int r2 = r1 + c;
if (r2) {}

This would make it easier to add overflow checks at runtime (if desired)
and to implement things like _Complex.

Is this what you mean by 3AC or SSA?

This would definitely simplify the expression grammar.
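
A minimal sketch of how the split form could carry overflow checks, assuming plain int operands. The helper names checked_add, checked_mul and lowered_if, and the ok flag, are made up for illustration; a real generator might call a trap routine instead of returning a flag.

```c
#include <limits.h>

/* Hypothetical helpers: return 1 and store the result if it fits in int,
   return 0 (leaving *out untouched) if the operation would overflow. */
static int checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}

static int checked_mul(int a, int b, int *out)
{
    if (a != 0 && b != 0) {
        if (a > 0) {
            if (b > 0) { if (a > INT_MAX / b) return 0; }
            else       { if (b < INT_MIN / a) return 0; }
        } else {
            if (b > 0) { if (a < INT_MIN / b) return 0; }
            else       { if (b < INT_MAX / a) return 0; }
        }
    }
    *out = a * b;
    return 1;
}

/* Lowered form of: if (a*b+c) {...}
   Returns 1 if the branch would be taken, 0 otherwise.
   *ok is set to 0 when one of the steps would overflow. */
static int lowered_if(int a, int b, int c, int *ok)
{
    int r1;
    int r2;
    *ok = 0;
    if (!checked_mul(a, b, &r1)) return 0;
    if (!checked_add(r1, c, &r2)) return 0;
    *ok = 1;
    if (r2) return 1;
    return 0;
}
```

Because each temporary holds exactly one operation, the check sits next to the operation it guards, which would be harder to arrange inside a nested expression.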


From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Tue Dec 17 14:59:49 2024

From Newsgroup: comp.lang.c

On 12/17/2024 2:55 PM, Thiago Adams wrote:

This would definitely simplify the expression grammar.

I have also considered removing local scopes. But I think local scopes may
be useful for making better use of the stack, reusing the same addresses
when variables go out of scope.
For instance:

{
    int i = 1;
    {
        int a = 2;
    }
    {
        int b = 3;
    }
}

I think scopes make it easier to use the same stack position for a and b,
because it is easier to see that a no longer exists.
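
A rough sketch of the slot reuse that scopes make easy, assuming a backend that hands out frame offsets from a simple bump pointer. Frame, scope_enter, scope_leave, frame_alloc and layout_example are all hypothetical names; a backend with liveness analysis could find the same reuse without scopes.

```c
/* Hypothetical frame-layout helper: offsets come from a bump pointer, and
   leaving a scope resets the pointer to where that scope began. */
typedef struct {
    int top;   /* next free offset in the frame */
    int max;   /* high-water mark, i.e. the frame size needed */
} Frame;

static int scope_enter(const Frame *f) { return f->top; }
static void scope_leave(Frame *f, int mark) { f->top = mark; }

static int frame_alloc(Frame *f, int size)
{
    int off = f->top;
    f->top += size;
    if (f->top > f->max) f->max = f->top;
    return off;
}

/* Lays out the example: i in the outer scope, a and b in sibling scopes.
   Writes the three offsets and returns the total frame size. */
static int layout_example(int *off_i, int *off_a, int *off_b)
{
    Frame f = {0, 0};
    int outer, inner;

    outer = scope_enter(&f);
    *off_i = frame_alloc(&f, (int)sizeof(int));

    inner = scope_enter(&f);
    *off_a = frame_alloc(&f, (int)sizeof(int));
    scope_leave(&f, inner);                      /* a is dead from here on */

    inner = scope_enter(&f);
    *off_b = frame_alloc(&f, (int)sizeof(int));  /* lands on a's old slot */
    scope_leave(&f, inner);

    scope_leave(&f, outer);
    return f.max;
}
```

With this scheme a and b get the same offset, and the frame only needs room for two ints rather than three.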


From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Tue Dec 17 15:16:48 2024

From Newsgroup: comp.lang.c

On 12/17/2024 2:59 PM, Thiago Adams wrote:

I think scopes make it easier to use the same stack position for a and b,
because it is easier to see that a no longer exists.

Also, remove structs, replacing them with unsigned char[] and casting
parts of the array to access members.

I think this is the lowest level possible in C.
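
A small sketch of the idea, for a struct with two int members. The names POINT_OFF_X, POINT_OFF_Y, POINT_SIZE, store_int, load_int and point_sum are invented here. One deliberate swap: member access goes through memcpy rather than pointer casts, since cast-based access can run into alignment and aliasing trouble on a plain char array; a generator that controls alignment might use casts as described above.

```c
#include <string.h>

/* Stand-in for: struct point { int x; int y; };
   The layout (offsets and total size) is chosen by the generator. */
#define POINT_OFF_X 0
#define POINT_OFF_Y sizeof(int)
#define POINT_SIZE  (2 * sizeof(int))

static void store_int(unsigned char *base, size_t off, int v)
{
    memcpy(base + off, &v, sizeof v);
}

static int load_int(const unsigned char *base, size_t off)
{
    int v;
    memcpy(&v, base + off, sizeof v);
    return v;
}

/* Equivalent of: struct point p; p.x = x; p.y = y; return p.x + p.y; */
static int point_sum(int x, int y)
{
    unsigned char p[POINT_SIZE];
    store_int(p, POINT_OFF_X, x);
    store_int(p, POINT_OFF_Y, y);
    return load_int(p, POINT_OFF_X) + load_int(p, POINT_OFF_Y);
}
```

Compilers commonly turn such fixed-size memcpy calls into plain loads and stores, so the generated code need not be worse than the cast version.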


From bart@bc@freeuk.com to comp.lang.c on Tue Dec 17 18:37:47 2024

From Newsgroup: comp.lang.c

On 17/12/2024 18:16, Thiago Adams wrote:

Also, remove structs, replacing them with unsigned char[] and casting
parts of the array to access members.

I think this is the lowest level possible in C.

This is what I do in my IL, where structs are just fixed blocks of so
many bytes.

But there are some things to consider:

* A struct may still need alignment corresponding to the strictest
alignment among the members. (Any padding between members and at the end
should already be taken care of.)

I use an alignment based on overall size, so a 40-byte struct is assumed
to have a 64-bit max alignment, but it may only need 16-bit alignment.
That is harmless, but it can be fixed with some extra metadata.

With a C char[], you can choose to use a short[] array for example
(obviously of half the length) to signal that it needs 16-bit alignment.

* Some machine ABIs, like SYS V for 64 bits, may need to know the
internal layout of structs when they are passed 'by value'.

If reduced down to char[], this info will be missing.

I ignore this because I only target the Win64 ABI. It only comes up in SYS
V, when calling functions across an FFI, and when the API uses value
structs, which is uncommon. Also, I can't make head or tail of
the rules.


From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Tue Dec 17 18:46:00 2024

From Newsgroup: comp.lang.c

bart <bc@freeuk.com> wrote:

If you try to extract any meaning, it is that any control flow can be expressed either with 'goto' or with 'recursive functions'.

This is what I picked up on. Who on earth would eschew 'goto' and use
such a disproportionately more complex and inefficient method like
recursive functions?

Due to a silly coding standard? Or in a language that does not have
'goto'.

How would you even express an arbitrary goto from random point X in a function to random point Y, which may be inside differently nested
blocks, via a recursive function?

AFAICS in C the main limitation is that you either pass all variables
as parameters (ugly and verbose) or use only global variables
(much worse than 'goto'). The following silly example shows
that 'if' can be simulated using an array of function pointers and
indirect calls:

static int bar(int a) {
    return a + 1;
}

static int baz(int a) {
    return 2*a;
}

int
silly(int a) {
    int (*t[2])(int) = {bar, baz};
    return (*t[!!(a > 3)])(a);
}

If you compile it with 'gcc -S -O2' you can see that there are actually
no function calls in the generated code (but the generated code is clearly
crappy). However, the needed optimization is really simple, so
in principle any compiler could do better. OTOH code like
this is rare in practice, so compiler writers probably did not
bother.

In a similar way one can simulate a dense C 'switch'.
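
A sketch of that 'switch' simulation, in the same silly spirit. The functions case0, case1, case2, case_default and dispatch are made-up names, and mapping out-of-range selectors to a default slot by arithmetic is just one possible choice.

```c
static int case0(int x) { return x + 10; }
static int case1(int x) { return x * 2; }
static int case2(int x) { return -x; }
static int case_default(int x) { return x; }

/* Stand-in for:
     switch (sel) {
     case 0:  return x + 10;
     case 1:  return x * 2;
     case 2:  return -x;
     default: return x;
     }                                                         */
static int dispatch(int sel, int x)
{
    static int (*const table[4])(int) = {
        case0, case1, case2, case_default
    };
    /* Comparison results are used as 0/1 values, so there is no 'if'
       here either: out-of-range selectors map to slot 3, the default. */
    int in_range = (sel >= 0) & (sel <= 2);
    int idx = in_range * sel + (1 - in_range) * 3;
    return table[idx](x);
}
```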

The main point is that a function call at the end of, say, 'F' to a
function 'G' which returns in the same way as 'F' can be compiled to some
stack shuffling + goto (this is called 'tail call optimization').
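
To make this concrete, here is a sketch of the simplest case, where 'F' and 'G' are the same function. sum_rec and sum_goto are illustrative names; the second function is hand-written to show the shape a compiler might produce, not actual compiler output.

```c
/* Tail-recursive form: the call is the last thing the function does. */
static long sum_rec(int n, long acc)
{
    if (n <= 0) return acc;
    return sum_rec(n - 1, acc + n);
}

/* The same function after a hand-applied tail call transformation:
   the "stack shuffling" becomes overwriting the parameters in place,
   and the call becomes a goto back to the entry point. */
static long sum_goto(int n, long acc)
{
entry:
    if (n <= 0) return acc;
    acc = acc + n;
    n = n - 1;
    goto entry;
}
```

The goto version uses constant stack space no matter how large n is, which is why languages without 'goto' or loops can still express iteration this way.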

IIUC at least some Scheme and ML compilers keep calls in their
intermediate-level representation (both have no 'goto') and convert them
to jumps only when emitting machine code.

A similar thing was used by one disassembly system: it generated "high
level code" by converting all jumps in machine code to function
calls. Later the result was cleaned up by transformations, in
particular recursion elimination.

Of course, for the original purpose of this thread replacing 'if' by
indirect calls is useless.
--
Waldek Hebisch

From BGB@cr88192@gmail.com to comp.lang.c on Tue Dec 17 12:51:04 2024

From Newsgroup: comp.lang.c

On 12/17/2024 6:04 AM, bart wrote:

On 16/12/2024 21:23, Lawrence D'Oliveiro wrote:

On Sun, 15 Dec 2024 17:53:30 -0600, BGB wrote:

As an IL, even C is a little overkill, unless turned into a restricted
subset ...

Why not use WASM as your IL?

Have you tried it? I mean, directly generating WASM from a compiler front-end, not just using somebody else's tool to do so.

WASM is a stack-based language, but one that supposedly doesn't even
have branching, although there is a 'br' statement, with some restrictions.

Information about it is quite elusive; it took me 5 minutes to even get examples of what it looks like (and I've seen it before).

C can apparently compile to WASM via Clang, so I tried this program:

void F(void) {
    int i=0;
    while (i<10000) ++i;
}

which compiled to 128 lines of WASM (technically, some form of 'WAT', as
WASM is a binary format). The 60 lines corresponding to F are shown
below, and below that is my own stack IL code.

So, what do you do with your WASM/WAT program once generated? I've no
idea, except that WASM is inextricably tied up with browsers and with
JavaScript, in which I have no interest.

With C, you run a compiler; with ASM, an assembler; these formats are
well understood.

You can appreciate that it can be easier to devise your own format and
your own tools that you understand 100%.

Hmm... It looks like the WASM example is already trying to follow SSA
rules, then mapped to a stack IL... Not necessarily the best way to do
it IMO.

But, yeah, in BGBCC I am also using a stack-based IL (RIL), which
follows rules more in a similar category to .NET CIL (in that, stack
items carry type, and the stack is generally fully emptied on branch).

In my IL, labels are identified with a LABEL opcode (with an immediate),
and things like branches work by having the branch target and label
having the same immediate (label ID).

I ended up considering this preferable to byte offsets, as:
Easier to generate from the front-end;
LABEL also marks the start/end of basic blocks;
...
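
A toy illustration of branch targets being matched to labels by ID rather than by byte offset. Everything here is hypothetical (the opcode names, the Insn layout, the single accumulator) and is not RIL itself; the test program is loosely modeled on the counting loop quoted above.

```c
/* Hypothetical mini-IL: branches name a label ID, and a LABEL opcode
   carrying the same ID marks the target. */
enum { OP_LABEL, OP_JUMP, OP_JUMPLT, OP_INC, OP_RET };

typedef struct {
    int op;
    int label;   /* label ID for LABEL, JUMP and JUMPLT */
    int imm;     /* comparison limit for JUMPLT */
} Insn;

#define MAX_LABELS 16

static int target(const int *pos, int id)
{
    return (id >= 0 && id < MAX_LABELS) ? pos[id] : -1;
}

/* Runs the program and returns the accumulator, or -1 on a bad jump,
   an unknown opcode, or running off the end. */
static int run(const Insn *code, int n)
{
    int pos[MAX_LABELS];
    int pc, acc = 0;

    /* Pass 1: record where each label ID lives. */
    for (pc = 0; pc < MAX_LABELS; pc++) pos[pc] = -1;
    for (pc = 0; pc < n; pc++)
        if (code[pc].op == OP_LABEL &&
            code[pc].label >= 0 && code[pc].label < MAX_LABELS)
            pos[code[pc].label] = pc;

    /* Pass 2: execute; branches look their target up by label ID. */
    pc = 0;
    while (pc >= 0 && pc < n) {
        const Insn *i = &code[pc];
        switch (i->op) {
        case OP_LABEL:  pc++; break;
        case OP_INC:    acc++; pc++; break;
        case OP_JUMP:   pc = target(pos, i->label); break;
        case OP_JUMPLT: pc = (acc < i->imm) ? target(pos, i->label) : pc + 1; break;
        case OP_RET:    return acc;
        default:        return -1;
        }
    }
    return -1;
}

/* jump #2; #4: inc; #2: jumplt limit, #4; ret */
static int count_to(int limit)
{
    Insn prog[6] = {
        { OP_JUMP,   2, 0 },
        { OP_LABEL,  4, 0 },
        { OP_INC,    0, 0 },
        { OP_LABEL,  2, 0 },
        { OP_JUMPLT, 4, 0 },
        { OP_RET,    0, 0 }
    };
    prog[4].imm = limit;
    return run(prog, 6);
}
```

The front-end only has to invent fresh label numbers; nothing needs patching when instructions are inserted or removed, which is the main attraction over byte offsets.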

There are also opcodes to convey the source filename and line number,
these don't generate any output but merely serve to transport filename
and line number information (useful for debugging).

RIL was a little weird in that functions and variables are themselves
defined via bytecode operations. This is unlike both JVM and .NET CIL,
which had used external metadata/structures for defining functions and variables (nevermind the significant differences between JVM and .NET in
this area).

This has pros and cons. The main downside of the current format is that it
requires the bytecode modules to be loaded sequentially and fully. This
works OK for a compiler on a modern PC, but does impose on RAM somewhat
for a compiler on a more memory-constrained target. One idea would be to
individually wrap functions and have a mechanism so that they can be
loaded dynamically. But, this hasn't really been done for my existing
IL. The most likely option is that metadata continues to be defined via
bytecode operations, just that each function is separately wrapped, and
there may be an index to map function names to the corresponding "lump"
(say, if using a WAD variant as the top-level container).

Say:
Lump name is "FNC01234" (IWAD) or "func_1234" (WAD2).
And there is a table mapping "FOO_SomeFunction" to "FNC01234" or "func_1234".
But, this sort of thing, along with past ideas to try moving this over
to a format along similar lines to RIFF/AVI, has generally fizzled
(along with possible debate over the merits of a WAD-like or
RIFF-like format).

Though, an arguably simpler option might be to just individually wrap
the bytecode for each translation unit, and have a manifest of what
symbols are present. In this way, it would function more like a
traditional static library (as opposed to the current strategy of
globbing all of the translation units in the library into a single large
blob of bytecode); and probably dumping the bytecode for each
translation unit into a WAD (again, possibly either IWAD or WAD2, though
probably WAD2 in this case, as the comparably larger lumps would
eliminate most concern over the larger directory entries).

When converting to the 3AC IR, there is the quirk that function calls
are split into multiple parts:
The CALL operation, which ends the current basic-block;
A CSRV operation, which is at the start of a new basic block.
CSRV = Caller Save Return Value.

In cases where the 3AC was being interpreted, this was better, as the
CSRV operation serves to save the return value from the called function
to the correct place in the caller's frame (where the interpreter does
not use recursion for its own operation).
Internal conversion to 3AC was faster than trying to directly interpret
a stack bytecode (as well as 3AC being a better format for code generation).

--------------------------------------
F:                                      # @F
    .functype    F () -> ()
    .local      i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32
# %bb.0:
    global.get    __stack_pointer
    local.set    0
    i32.const    16
    local.set    1
    local.get    0
    local.get    1
    i32.sub
    local.set    2
    i32.const    0
    local.set    3
    local.get    2
    local.get    3
    i32.store    12
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
    block
    loop                                        # label1:
    local.get    2
    i32.load    12
    local.set    4
    i32.const    10000
    local.set    5
    local.get    4
    local.set    6
    local.get    5
    local.set    7
    local.get    6
    local.get    7
    i32.lt_s
    local.set    8
    i32.const    1
    local.set    9
    local.get    8
    local.get    9
    i32.and
    local.set    10
    local.get    10
    i32.eqz
    br_if       1                               # 1: down to label0
# %bb.2:                                #   in Loop: Header=BB0_1 Depth=1
    local.get    2
    i32.load    12
    local.set    11
    i32.const    1
    local.set    12
    local.get    11
    local.get    12
    i32.add
    local.set    13
    local.get    2
    local.get    13
    i32.store    12
    br          0                               # 0: up to label1
.LBB0_3:
    end_loop
    end_block                               # label0:
    return
    end_function

-----------------------------

proc F::
    local    i32       i.1
    load     i32       0
    store    i32       i.1
    jump               #2
#4:
    load     u64       &i.1
    incrto   i32 /1
#2:
    load     i32       i.1
    load     i32       10000
    jumplt   i32       #4
#3:
#1:
    retproc
endproc


From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Tue Dec 17 16:07:45 2024

From Newsgroup: comp.lang.c

On 12/17/2024 3:37 PM, bart wrote:

On 17/12/2024 18:16, Thiago Adams wrote:

Also, remove structs, replacing them with unsigned char[] and casting
parts of the array to access members.

I think this is the lowest level possible in C.

This is what I do in my IL, where structs are just fixed blocks of so
many bytes.

How do you deal with struct parameters?


From BGB@cr88192@gmail.com to comp.lang.c on Tue Dec 17 13:07:44 2024

From Newsgroup: comp.lang.c

On 12/17/2024 11:55 AM, Thiago Adams wrote:

Is this what you mean by 3AC or SSA?

3AC (three-address code) means that the IR expresses 3 (or sometimes
more) operands per IR op.

So:
   MUL r1, a, b
Rather than, say, stack:
   LOAD a
   LOAD b
   MUL
   STORE r1

SSA:
Static Single Assignment

Generally:
   Every variable may only be assigned once (more like in a functional
   programming language);
   Variables are "merged" in the control flow via PHI operators
   (which variable merges in depends on which path control came from).
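
As a rough illustration in C terms (C has no PHI, so the merge is faked here with a flag that records which path was taken; the names original, ssa_style and from_then are made up for this sketch):

```c
/* Ordinary mutable-variable form: x is assigned on two paths. */
static int original(int c)
{
    int x = 1;
    if (c) x = 2;
    return x + 1;
}

/* SSA-flavoured rewrite: every name is assigned at most once, and the
   join point selects between the incoming versions. */
static int ssa_style(int c)
{
    int x1, x2, x3, r1;
    int from_then = 0;     /* bookkeeping only: which edge reached 'join' */

    x1 = 1;
    if (!c) goto join;
    x2 = 2;
    from_then = 1;
join:
    x3 = from_then ? x2 : x1;   /* plays the role of x3 = PHI(x2, x1) */
    r1 = x3 + 1;
    return r1;
}
```

A real SSA IR has no such flag: the PHI is resolved by the backend, typically by placing copies on the incoming edges when leaving SSA form.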

IMHO, while SSA is preferable for backend analysis, optimization, and
code generation; it is undesirable pretty much everywhere else as it
adds too much complexity.

Better IMO for the frontend compiler and main IL stage to assume that
local variables are freely mutable.

Typically, global variables are excluded in most variants, and remain
fully mutable; but may be handled as designated LOAD/STORE operations.

In BGBCC though, full SSA only applies to temporaries. Normal local
variables are merely flagged by "version", and all versions of the same
local variable implicitly merge back together at each branch/label.

This allows some similar advantages (for analysis and optimization)
while limiting some of the complexities. Though, this differs from
temporaries which are assumed to essentially fully disappear once they
go outside of the span in which they exist (albeit with an awkward case
to deal with temporaries that cross basic-block boundaries, which need
to actually "exist" in some semi-concrete form, more like local variables).

Note that unless the address is taken of a local variable, it need not
have any backing in memory. Temporaries can never have their address
taken, so generally exist exclusively in CPU registers.

This would definitely simplify the expression grammar.


From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Tue Dec 17 16:33:09 2024

From Newsgroup: comp.lang.c

On 12/17/2024 4:07 PM, BGB wrote:

On 12/17/2024 11:55 AM, Thiago Adams wrote:

Is this what you mean by 3AC or SSA?

3AC (three-address code) means that the IR expresses 3 (or sometimes
more) operands per IR op.

So:
   MUL r1, a, b
Rather than, say, stack:
   LOAD a
   LOAD b
   MUL
   STORE r1

SSA:
Static Single Assignment

Oh sorry .. I knew what SSA is.

Generally:
   Every variable may only be assigned once (more like in a functional
   programming language);
   Variables are "merged" in the control flow via PHI operators
   (which variable merges in depends on which path control came from).

I do a similar merge in my flow analysis, but without the concept of SSA.

IMHO, while SSA is preferable for backend analysis, optimization, and
code generation; it is undesirable pretty much everywhere else as it
adds too much complexity.

Better IMO for the frontend compiler and main IL stage to assume that
local variables are freely mutable.

Typically, global variables are excluded in most variants, and remain
fully mutable; but may be handled as designated LOAD/STORE operations.

In BGBCC though, full SSA only applies to temporaries. Normal local
variables are merely flagged by "version", and all versions of the same
local variable implicitly merge back together at each branch/label.

Sorry, what is BGBCC? (A C compiler?)

This allows some similar advantages (for analysis and optimization)
while limiting some of the complexities. Though, this differs from
temporaries which are assumed to essentially fully disappear once they
go outside of the span in which they exist (albeit with an awkward case
to deal with temporaries that cross basic-block boundaries, which need
to actually "exist" in some semi-concrete form, more like local
variables).

Note that unless the address is taken of a local variable, it need not
have any backing in memory. Temporaries can never have their address
taken, so generally exist exclusively in CPU registers.

This would definitely simplify the expression grammar.

It can be added in the future.


From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Tue Dec 17 19:40:53 2024

From Newsgroup: comp.lang.c

On Tue, 17 Dec 2024 12:04:29 +0000, bart wrote:

Information about it is quite elusive ...

Did you try the usual place for Web-related stuff?

<https://developer.mozilla.org/en-US/docs/WebAssembly>

From bart@bc@freeuk.com to comp.lang.c on Tue Dec 17 19:42:51 2024

From Newsgroup: comp.lang.c

On 17/12/2024 19:07, Thiago Adams wrote:

Em 12/17/2024 3:37 PM, bart escreveu:

On 17/12/2024 18:16, Thiago Adams wrote:

Also remove structs, replacing them with unsigned char[] and casting parts
of it to access members.

I think this is the lowest level possible in C.

This is what I do in my IL, where structs are just fixed blocks of so
many bytes.

How do you deal with struct parameters?

In the IL they are always passed notionally by value. This side of the
IL (that is, the frontend compiler that generates IL) knows nothing
about the target, such as ABI details.

(In practice, some things are known, like the word size of the target,
since that can change characteristics of the source language, like the
size of 'int' or of 'void*'. It also needs to assume, or request from
the backend, argument evaluation order, although my IL can reverse order
if necessary.)

It is the backend, on the other side of the IL, that needs to deal with
those details.

That can include making copies of structs that the ABI says are passed
by value. But when targeting SYS V ABI (which I haven't attempted yet),
it may need to know the internal layout of a struct.

You can, however, experiment with SYS V on Linux (must be 64-bit):

* Create test structs with, say, int32 or int64 elements

* Write a test function where such a struct is passed by value, and
then return a modified copy

* Rerun the test using a version of the function where a char[] version
of the struct is passed and returned, and which contains the member
access casts you suggested

* See if it gives the same results.

You might need a union of the two structs, or use memcpy to transfer
contents, before and after calling the test function.
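
A possible shape for that experiment, as a sketch only. The type and function names (Pair, PairBlock, bump_struct, bump_block, same_result) are made up. Two assumptions are worth stating: C cannot pass a bare array by value, so the char[] is wrapped in a struct; and member access uses memcpy with offsetof rather than pointer casts, to stay clear of alignment and aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { int32_t x; int64_t y; } Pair;

/* The same storage viewed as an opaque block of bytes. */
typedef struct { unsigned char bytes[sizeof(Pair)]; } PairBlock;

/* Ordinary version: struct passed and returned by value. */
Pair bump_struct(Pair p)
{
    p.x += 1;
    p.y *= 2;
    return p;
}

/* Byte-block version: members reached via their offsets. */
PairBlock bump_block(PairBlock b)
{
    int32_t x;
    int64_t y;

    memcpy(&x, b.bytes + offsetof(Pair, x), sizeof x);
    memcpy(&y, b.bytes + offsetof(Pair, y), sizeof y);
    x += 1;
    y *= 2;
    memcpy(b.bytes + offsetof(Pair, x), &x, sizeof x);
    memcpy(b.bytes + offsetof(Pair, y), &y, sizeof y);
    return b;
}

/* Returns 1 if both versions produce the same member values. */
int same_result(int32_t x, int64_t y)
{
    Pair in, out_s, out_b;
    PairBlock blk;

    memset(&in, 0, sizeof in);              /* also clears padding */
    in.x = x;
    in.y = y;

    out_s = bump_struct(in);

    memcpy(blk.bytes, &in, sizeof in);      /* transfer contents in */
    blk = bump_block(blk);
    memcpy(&out_b, blk.bytes, sizeof out_b);/* and back out */

    return out_s.x == out_b.x && out_s.y == out_b.y;
}
```

Note that this only compares results inside one program. It does not show that the two forms are passed the same way across a separately compiled call boundary, which is the part the ABI actually governs.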

From bart@bc@freeuk.com to comp.lang.c on Tue Dec 17 19:45:49 2024

From Newsgroup: comp.lang.c

On 17/12/2024 19:40, Lawrence D'Oliveiro wrote:

On Tue, 17 Dec 2024 12:04:29 +0000, bart wrote:

Information about it is quite elusive ...

Did you try the usual place for Web-related stuff?

<https://developer.mozilla.org/en-US/docs/WebAssembly>

That's all at the wrong level, eg:

"When you've written code in C/C++, you can then compile it into Wasm
using a tool like Emscripten"

It's not aimed at people /implementing/ such a tool.


From Keith Thompson@Keith.S.Thompson+u@gmail.com to comp.lang.c on Tue Dec 17 12:13:15 2024

From Newsgroup: comp.lang.c

bart <bc@freeuk.com> writes:

On 17/12/2024 01:19, Keith Thompson wrote:

bart <bc@freeuk.com> writes:
[SNIP]

In that case I've no idea what you were trying to say.

When somebody says that 'goto' can emulate any control structure, then
clearly some of them need to be conditional; that is implied.

Your reply suggested that you can do away with 'goto', and use
recursive functions, in a scenario where no other control structures
need exist.

OK, if this is not for an IL, then it's not a language I would care
for either. Why tie one hand behind your back for no good reason?

I read Janis's post. I saw a suggestion that certain constructs are
*theoretically* unnecessary. I saw no suggestion of any advocacy for
such an approach.
"""
A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.
"""

This doesn't actually make much sense. So 'goto' is necessary, but
'goto' *is*?

I presume you didn't write what you intended to write. Responding to
what I *think* you meant :

Either
"if" and "goto"
or
"if" and recursive functions
are theoretically sufficient to express certain kinds of algorithms
(I'm handwaving a bit). Which implies that "goto" is not strictly
necessary. It also implies that recursive functions are not strictly
necessary if you have "goto".
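
For illustration only (this example is not from Janis's post), here is the same loop written once with just "if" and "goto" and once with just "if" and a recursive function. The names sum_goto, sum_rec and sum_rec_step are invented for the sketch.

```c
/* Sum 1..n using only 'if' and 'goto'. */
int sum_goto(int n)
{
    int i = 1;
    int s = 0;
loop:
    if (i > n) goto done;
    s += i;
    i++;
    goto loop;
done:
    return s;
}

/* Sum 1..n using only 'if' and recursion: the loop state
   (i and s) travels along as arguments. */
static int sum_rec_step(int i, int n, int s)
{
    if (i > n) return s;
    return sum_rec_step(i + 1, n, s + i);
}

int sum_rec(int n)
{
    return sum_rec_step(1, n, 0);
}
```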

Since this is comp.lang.c, not comp.theory (or what comp.theory was
intended to be), I'm not going to go into the details, nor am I going to
take the time to express the concept in mathematically rigorous terms.

If you try to extract any meaning, it is that any control flow can be expressed either with 'goto' or with 'recursive functions'.

Yes, either of those plus "if". It appears you understand the point.

This is what I picked up on. Who on earth would eschew 'goto' and use
such a disproportionately more complex and inefficient method like
recursive functions?

Perhaps it wasn't clear initially, but it should be by now,
that Janis was talking about what's theoretically sufficient to
express general algorithms. You seized on the silly idea that
Janis was *advocating* the use of one of the two minimal methods in
an intermediate language for a compiler. The idea Janis brought
up (briefly, in passing) is about theoretical computer science,
not practical software engineering. (Janis, please correct me if
I'm mistaken.)

Repeatedly asking why anyone would do such a thing misses the point.

[...]
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
void Void(void) { Void(); } /* The recursive call of the void */

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Tue Dec 17 22:25:44 2024

From Newsgroup: comp.lang.c

On Tue, 17 Dec 2024 19:45:49 +0000, bart wrote:

On 17/12/2024 19:40, Lawrence D'Oliveiro wrote:

On Tue, 17 Dec 2024 12:04:29 +0000, bart wrote:

Information about it is quite elusive ...

Did you try the usual place for Web-related stuff?

<https://developer.mozilla.org/en-US/docs/WebAssembly>

It's not aimed at people /implementing/ such a tool.

It is aimed at those capable of following the links to relevant specs.

From bart@bc@freeuk.com to comp.lang.c on Tue Dec 17 22:45:14 2024

From Newsgroup: comp.lang.c

On 17/12/2024 18:46, Waldek Hebisch wrote:

bart <bc@freeuk.com> wrote:

If you try to extract any meaning, it is that any control flow can be
expressed either with 'goto' or with 'recursive functions'.

This is what I picked up on. Who on earth would eschew 'goto' and use
such a disproportionately more complex and inefficient method like
recursive functions?

Due to a silly coding standard? Or in a language that does not have
'goto'.

It was suggested that 'theoretically', 'goto' could be replaced by
recursive function calls.

Whether still within the context of a language with no other control
flow instructions, is not known. The suggester also chose not to share examples of how it would work.


From bart@bc@freeuk.com to comp.lang.c on Tue Dec 17 22:55:53 2024

From Newsgroup: comp.lang.c

On 17/12/2024 22:25, Lawrence D'Oliveiro wrote:

On Tue, 17 Dec 2024 19:45:49 +0000, bart wrote:

On 17/12/2024 19:40, Lawrence D'Oliveiro wrote:

On Tue, 17 Dec 2024 12:04:29 +0000, bart wrote:

Information about it is quite elusive ...

Did you try the usual place for Web-related stuff?

<https://developer.mozilla.org/en-US/docs/WebAssembly>

It's not aimed at people /implementing/ such a tool.

It is aimed at those capable of following the links to relevant specs.

It's also a pretty terrible link. Trying to extract useful info a snippet
at a time is like pulling teeth. Here I was merely after an example of
the WASM textual format.

WASM is somewhat like LLVM in that the docs are so extensive that
they become impossible.

Show me (I assume you know all about it) how to write Hello, World in
WAT format, and what tool I need to download and use to run it. On Windows.

I can do it with my IL in half a page.

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Wed Dec 18 00:23:12 2024

From Newsgroup: comp.lang.c

bart <bc@freeuk.com> wrote:

On 17/12/2024 18:46, Waldek Hebisch wrote:

bart <bc@freeuk.com> wrote:

If you try to extract any meaning, it is that any control flow can be
expressed either with 'goto' or with 'recursive functions'.

This is what I picked up on. Who on earth would eschew 'goto' and use
such a disproportionately more complex and inefficient method like
recursive functions?

Due to a silly coding standard? Or in a language that does not have
'goto'.

It was suggested that 'theoretically', 'goto' could be replaced by
recursive function calls.

Whether still within the context of a language with no other control
flow instructions, is not known. The suggester also chose not to share examples of how it would work.

The example I gave (and you snipped) was supposed to explain how
the technique works, but it seems that it is not enough. So
let us look at another example. Start from ordinary C code that
only uses global variables (this is not strictly necessary, but
let us make such an assumption for simplicity):

int n;
int * a;
int b;
int i;

...
/* Simple search loop */
for(i = 0; i < n; i++) {
if (a[i] == b) {
break;
}
}

First, express flow control using only conditional and unconditional
jump:

l0:
i = 0;
goto l3;
l1:
int c1 = a[i] == b;
if (c1) {
goto l4;
} else {
goto l2;
}
l2:
i++;
l3:
int c2 = i < n;
if (c2) {
goto l1;
} else {
goto l4;
}
l4:
;

Note, I introduced more jumps than strictly necessary, so that
hunks between labels end either in conditional or unconditional
jump.

Next, replace each hunk starting at a label, up to (but not
including) the next label, with a new function. Replace the final jumps
with function calls; for conditional jumps, use the same trick
as in the previous 'silly' example:

int n;
int * a;
int b;
int i;

void l2(void);
void l3(void);
void l4(void);

void l0(void) {
i = 0;
l3();
}

void l1(void) {
void (*(t[2]))(void) = {l4, l2};
int c1 = a[i] == b;
(*(t[c1]))();
}

void l2(void) {
i++;
l3();
}

void l3(void) {
void (*(t[]))(void) = {l1, l4};
int c2 = i < n;
(*(t[c2]))();
}

void l4(void) {
}

Note: 'l4' is different from the other functions: instead of calling
something it returns, ensuring that the sequence of calls
eventually terminates.

I hope that the principles are clear now. If you compile this
with gcc at -O2 you will see that there are no calls
in the generated code, only jumps. Slightly better code is
generated by clang. Note that the generated code uses the stack
only for the final return.

BTW: you can see that tcc currently does not support this
coding style; that is, the code generated by tcc dutifully performs
all the calls, possibly leading to stack overflow and to
lower performance. The code generated by tcc from the "jumpy"
version looks slightly worse than the code generated by
clang from the version using calls.
--
Waldek Hebisch

From bart@bc@freeuk.com to comp.lang.c on Wed Dec 18 01:24:42 2024

From Newsgroup: comp.lang.c

On 18/12/2024 00:23, Waldek Hebisch wrote:

bart <bc@freeuk.com> wrote:

On 17/12/2024 18:46, Waldek Hebisch wrote:

bart <bc@freeuk.com> wrote:

If you try to extract any meaning, it is that any control flow can be
expressed either with 'goto' or with 'recursive functions'.

This is what I picked up on. Who on earth would eschew 'goto' and use
such a disproportionately more complex and inefficient method like
recursive functions?

Due to a silly coding standard? Or in a language that does not have
'goto'.

It was suggested that 'theoretically', 'goto' could be replaced by
recursive function calls.

Whether still within the context of a language with no other control
flow instructions, is not known. The suggester also chose not to share
examples of how it would work.

The example I gave (and you snipped) was supposed to explain how
the technique works, but it seems that it is not enough.

It showed how to do conditional code without explicit branching. It
didn't seem to me to cover arbitrary gotos, or where recursion comes
into it.

(Actually I implemented it in my two languages to compare performance to 'straight' versions, however my test called silly() lots of times so it
wasn't a good test.)

So
let us look at another example. Start from ordinary C code that
only uses global variables (this is not strictly necessary, but
let us make such an assumption for simplicity):

int n;
int * a;
int b;
int i;

...
/* Simple search loop */
for(i = 0; i < n; i++) {
if (a[i] == b) {
break;
}
}

First, express flow control using only conditional and unconditional
jump:

l0:
i = 0;
goto l3;
l1:
int c1 = a[i] == b;
if (c1) {
goto l4;
} else {
goto l2;
}
l2:
i++;
l3:
int c2 = i < n;
if (c2) {
goto l1;
} else {
goto l4;
}
l4:
;

Note, I introduced more jumps than strictly necessary, so that
hunks between labels end either in conditional or unconditional
jump.

Next, replace each hunk starting at a label, up to (but not
including) the next label, with a new function. Replace the final jumps
with function calls; for conditional jumps, use the same trick
as in the previous 'silly' example:

int n;
int * a;
int b;
int i;

void l2(void);
void l3(void);
void l4(void);

void l0(void) {
i = 0;
l3();
}

void l1(void) {
void (*(t[2]))(void) = {l4, l2};
int c1 = a[i] == b;
(*(t[c1]))();
}

void l2(void) {
i++;
l3();
}

void l3(void) {
void (*(t[]))(void) = {l1, l4};
int c2 = i < n;
(*(t[c2]))();
}

void l4(void) {
}

Note: 'l4' is different from the other functions: instead of calling
something it returns, ensuring that the sequence of calls
eventually terminates.

OK thanks for this. I tried to duplicate it based on this starting point:

#include <stdio.h>

int n=6;
int a[]={10,20,30,40,50,60};
int b=30;
int i;

int main(void) {
for(i = 0; i < n; i++) {
printf("%d\n",a[i]);
if (a[i] == b) {
break;
}
}
}

This prints 10 20 30 as it is. But the version with the function calls
showed only '10'. If I swapped '{l1, l4}' in l3(), then I got '10 10 20'.

I didn't spend too long to debug it further. I will take your word that
this works. (I tried 3 compilers all with the same results, including TCC.)

I don't fully understand it; what I got was that you first produce
linear code with labels. Each span between labels is turned into a
function. To 'step into' label L, or jump to L, I have to do L().

There would still be lots of questions (even ignoring the problems of accessing locals), like what the return path is, or how an early return
would work (also returning a value). Or what kind of pressure the stack
would be under.

It looks like a crude form of threaded code (which, when I use that,
never returns, and it doesn't use a stack either).

I've seen enough to know that it would be the last kind of IL I would choose (unless it was the last IL left in the world - then maybe).

There is also the oddity that eliminating a simple kind of branching
relies on more elaborate branching: call and return mechanisms.

More interesting and more practical would be replacing call/return by
'goto'! (It would need to support label pointers or indirect jumps,
unless runtime code modification was allowed.)
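
A tiny sketch of that idea in portable C, under some loud assumptions: standard C has no label pointers, so the indirect jump for "return" is spelled as a switch on a return-site number, and the "subroutine" takes its argument and result through ordinary variables. The function name sum_of_doubles and the label names are invented for the example.

```c
/* Computes 2*x + 2*y, "calling" one doubling subroutine twice
   without using a real call or return for it. */
int sum_of_doubles(int x, int y)
{
    int ret_site;               /* which "return address" to resume at */
    int arg;
    int result = 0;
    int first = 0;

    arg = x; ret_site = 0; goto sub_double;   /* "call" from site 0 */
site0:
    first = result;
    arg = y; ret_site = 1; goto sub_double;   /* "call" from site 1 */
site1:
    return first + result;

sub_double:                     /* the "subroutine" body */
    result = arg * 2;
    /* "return": an indirect jump, emulated with a switch */
    switch (ret_site) {
    case 0:  goto site0;
    default: goto site1;
    }
}
```

Nested or recursive "calls" would additionally need an explicit stack of return-site numbers and saved variables, which is exactly the work a real call/return mechanism does for you.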

(my test)
--------------------------
#include <stdio.h>

int n=6;
int a[]={10,20,30,40,50,60};
int b=30;
int i;

void k2(void);
void k3(void);
void k4(void);

void k0(void) {
i = 0;
k3();
}

void k1(void) {
void (*(t[2]))(void) = {k4, k2};
printf("%d\n",a[i]);
int c1 = a[i] == b;
(*(t[c1]))();
}

void k2(void) {
i++;
// k3();
}

void k3(void) {
void (*(t[]))(void) = {k4, k1};
int c2 = i < n;
(*(t[c2]))();
}

void k4(void) {
}

int main(void) {
k0();
k1();
k2();
k3();
k4();
}


From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Wed Dec 18 03:51:17 2024

From Newsgroup: comp.lang.c

bart <bc@freeuk.com> wrote:

On 18/12/2024 00:23, Waldek Hebisch wrote:

bart <bc@freeuk.com> wrote:

On 17/12/2024 18:46, Waldek Hebisch wrote:

bart <bc@freeuk.com> wrote:

If you try to extract any meaning, it is that any control flow can be
expressed either with 'goto' or with 'recursive functions'.

This is what I picked up on. Who on earth would eschew 'goto' and use
such a disproportionately more complex and inefficient method like
recursive functions?

Due to a silly coding standard? Or in a language that does not have
'goto'.

It was suggested that 'theoretically', 'goto' could be replaced by
recursive function calls.

Whether still within the context of a language with no other control
flow instructions, is not known. The suggester also chose not to share
examples of how it would work.

The example I gave (and you snipped) was supposed to explain how
the technique works, but it seems that it is not enough.

It showed how to do conditional code without explicit branching. It
didn't seem to me to cover arbitrary gotos, or where recursion comes
into it.

(Actually I implemented it in my two languages to compare performance to 'straight' versions, however my test called silly() lots of times so it wasn't a good test.)

So
let us look at another example. Start from ordinary C code that
only uses global variables (this is not strictly necessary, but
let us make such an assumption for simplicity):

int n;
int * a;
int b;
int i;

...
/* Simple search loop */
for(i = 0; i < n; i++) {
if (a[i] == b) {
break;
}
}

First, express flow control using only conditional and unconditional
jump:

l0:
i = 0;
goto l3;
l1:
int c1 = a[i] == b;
if (c1) {
goto l4;
} else {
goto l2;
}
l2:
i++;
l3:
int c2 = i < n;
if (c2) {
goto l1;
} else {
goto l4;
}
l4:
;

Note, I introduced more jumps than strictly necessary, so that
hunks between labels end either in conditional or unconditional
jump.

Next, replace each hunk starting at a label, up to (but not
including) the next label, with a new function. Replace the final jumps
with function calls; for conditional jumps, use the same trick
as in the previous 'silly' example:

int n;
int * a;
int b;
int i;

void l2(void);
void l3(void);
void l4(void);

void l0(void) {
i = 0;
l3();
}

void l1(void) {
void (*(t[2]))(void) = {l4, l2};

^^^^^^^
Should be
l2, l4

int c1 = a[i] == b;
(*(t[c1]))();
}

void l2(void) {
i++;
l3();
}

void l3(void) {
void (*(t[]))(void) = {l1, l4};

^^^^^^
Should be
l4, l1

int c2 = i < n;
(*(t[c2]))();
}

void l4(void) {
}

Note: 'l4' is different from the other functions: instead of calling
something it returns, ensuring that the sequence of calls
eventually terminates.

OK thanks for this. I tried to duplicate it based on this starting point:

#include <stdio.h>

int n=6;
int a[]={10,20,30,40,50,60};
int b=30;
int i;

int main(void) {
for(i = 0; i < n; i++) {
printf("%d\n",a[i]);
if (a[i] == b) {
break;
}
}
}

This prints 10 20 30 as it is. But the version with the function calls showed only '10'. If I swapped '{l1, l4}' in l3(), then I got '10 10 20'.

Sorry, there was a thinko: 1 is true and it selects the second element
of the array, while I was thinking that the first one is the true branch
and the second is the false branch.
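
For reference, a sketch of the function-call version with both tables in the order index 0 = false, index 1 = true, using the test data and the printf from the test program quoted above. Only the table order and the extra forward declaration of l1 differ from the posted code.

```c
#include <stdio.h>

int n = 6;
int a[] = {10, 20, 30, 40, 50, 60};
int b = 30;
int i;

void l1(void);
void l2(void);
void l3(void);
void l4(void);

void l0(void) {
    i = 0;
    l3();
}

void l1(void) {
    /* t[0]: a[i] != b, keep going; t[1]: found, stop */
    void (*t[2])(void) = {l2, l4};
    int c1 = a[i] == b;
    printf("%d\n", a[i]);
    t[c1]();
}

void l2(void) {
    i++;
    l3();
}

void l3(void) {
    /* t[0]: i >= n, stop; t[1]: i < n, test the element */
    void (*t[2])(void) = {l4, l1};
    int c2 = i < n;
    t[c2]();
}

void l4(void) {
}
```

Calling l0() once from main should print 10, 20 and 30 and leave i at 2, matching the plain loop.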

I didn't spend too long to debug it further. I will take your word that
this works. (I tried 3 compilers all with the same results, including TCC.)

I don't fully understand it; what I got was that you first produce
linear code with labels. Each span between labels is turned into a
function. To 'step into' label L, or jump to L, I have to do L().

Yes.

There would still be lots of questions (even ignoring the problems of accessing locals), like what the return path is, or how an early return would work (also returning a value). Or what kind of pressure the stack would be under.

OK, you take a function F; it has some arguments and local variables,
and some return type. You create an "entry function" that takes the
same arguments as F and has the same return type as F. You transform the
body as above, but now each new function has the same return type
as F, and its arguments are the arguments of the original function plus
extra arguments, one for each local variable of F. In the "entry function"
you call the function corresponding to the first label, passing it the
arguments and the initial values of the local variables of F. In each
subsequent call you pass the arguments and values along so that they are
available in each new function. And the call is the operand of a return
statement. When you want to return, you simply return the value,
without performing a call.
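
A sketch of that scheme applied to the search loop, now as a function with a local variable and a return value. This is only an illustration: the names find, find_l1, find_l2 and find_l3 are invented, and 'if' is kept rather than replaced by a function table, as one would do when writing such code by hand.

```c
/* Original shape:
     int find(const int *a, int n, int b) {
         int i;
         for (i = 0; i < n; i++) if (a[i] == b) return i;
         return -1;
     }
   Each hunk below gets F's arguments plus one extra argument
   for the local variable i. */
static int find_l1(const int *a, int n, int b, int i);
static int find_l2(const int *a, int n, int b, int i);
static int find_l3(const int *a, int n, int b, int i);

/* Entry function: same signature as the original; passes the
   initial value of the local (i = 0) to the first hunk. */
int find(const int *a, int n, int b)
{
    return find_l3(a, n, b, 0);
}

static int find_l1(const int *a, int n, int b, int i)
{
    if (a[i] == b) return i;            /* early return: no call */
    return find_l2(a, n, b, i);
}

static int find_l2(const int *a, int n, int b, int i)
{
    return find_l3(a, n, b, i + 1);     /* i++ becomes a new argument */
}

static int find_l3(const int *a, int n, int b, int i)
{
    if (i < n) return find_l1(a, n, b, i);
    return -1;                          /* loop exit */
}
```

Every call here is the operand of a return statement, so each one is a candidate for being turned into a jump.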

Stack use depends on the optimizations in your compiler. With small
effort a compiler can recognize that it will return the value (possibly
void) from a call and replace such a call by stack+register
shuffling + jump. Actually, when there is a return value, you
have something like

return lx(a0, a1, ..., ak);

which is easy to recognize due to the 'return' keyword. One also
needs to check that the types agree (C automatically applies integer
conversions, but such conversions may produce real code, so in
such a case one needs a normal call). In the void case one needs to
check that the call is textually the last thing or that
it is followed by a return statement. Stack+register
shuffling may require some code before the control transfer, but
the call can be replaced by a jump.

So, if the compiler has tail call optimization, then there is no
more stack use than the maximum needed by any of the functions.

Note: I described the general transformation, partially to show
that 'if' is _not_ needed. But a similar style is used to
write code by hand. In hand-written code people do not
bother with transforming 'if', which makes tail call
optimization a bit more complicated. OTOH, unlike the rather
ugly code produced by mechanical transformation, hand-written
code depending on tail call optimization may be quite
nice and readable. There is potential trouble: sometimes the
author thinks that a call is a tail call, but the compiler
disagrees, leading to lower efficiency.

Of course, when the compiler does not have tail call optimization,
then stack use may be quite high.

It looks like a crude form of threaded code (which, when I use that,
never returns, and it doesn't use a stack either).

IMO it is quite different than what I know as threaded code.

I've seen enough to know that it would be the last kind of IL I would choose (unless it was the last IL left in the world - then maybe).

There is also the oddity that eliminating a simple kind of branching
relies on more elaborate branching: call and return mechanisms.

One motivation for eliminating 'goto' is that it is not easy to
say what effect 'goto' has on variables. I mean, variables keep
their values, but when you may arrive at a given point from several
places, the values of variables depend on the place that control came
from, and this may be hard to analyze. In a sense functions have
the same problem, but there is a well-developed technique for reasoning
about function calls. So both jumps and function calls are
hard to analyze, but eliminating jumps allows re-use of the work
done for functions.

More interesting and more practical would be replacing call/return by 'goto'! (It would need to support label pointers or indirect jumps,
unless runtime code modification was allowed.)

The point is that calls are strictly more powerful than jumps
(you get parameter passing and local variables).
--
Waldek Hebisch

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Wed Dec 18 05:55:31 2024

From Newsgroup: comp.lang.c

On Tue, 17 Dec 2024 22:55:53 +0000, bart wrote:

On 17/12/2024 22:25, Lawrence D'Oliveiro wrote:

On Tue, 17 Dec 2024 19:45:49 +0000, bart wrote:

It's not aimed at people /implementing/ such a tool.

It is aimed at those capable of following the links to relevant specs.

It's also a pretty terrible link.

Did you see this link <https://developer.mozilla.org/en-US/docs/WebAssembly/Reference>? Lots of examples from there.

From bart@bc@freeuk.com to comp.lang.c on Wed Dec 18 12:08:24 2024

From Newsgroup: comp.lang.c

On 17/12/2024 18:51, BGB wrote:

On 12/17/2024 6:04 AM, bart wrote:

C can apparently compile to WASM via Clang, so I tried this program:

  void F(void) {
     int i=0;
     while (i<10000) ++i;
  }

which compiled to 128 lines of WASM (technically, some form of 'WAT',
as WASM is a binary format). The 60 lines corresponding to F are
shown below, and below that is my own stack IL code.

I'm not even sure what format that code is in, as WAT is supposed to use S-expressions. The generated code is flat. It differs in other ways from examples of WAT.

Hmm... It looks like the WASM example is already trying to follow SSA
rules, then mapped to a stack IL... Not necessarily the best way to do
it IMO.

I hadn't considered that SSA could be represented in stack form.

But couldn't each push be converted to an assignment to a fresh
variable, and the same with pop?

As for Phi functions, the only similar thing I encounter (but could be mistaken), is when there is a choice of paths to yield a value (such as
(c ? a : b) in C; my language has several such constructs).

With stack code, the result conveniently ends up on top of the stack
whichever path is taken, which is a big advantage. Unless you then have
to convert that to register code, and need to ensure the values end up
in the same register when the control paths join up again.

But, yeah, in BGBCC I am also using a stack-based IL (RIL), which
follows rules more in a similar category to .NET CIL (in that, stack
items carry type, and the stack is generally fully emptied on branch).

In my IL, labels are identified with a LABEL opcode (with an immediate),
and things like branches work by having the branch target and label
having the same immediate (label ID).

So, you jump to label L123, and the label looks like:

L123:

I think that is pretty standard! But it sounds like you use a very tight encoding for bytecode, while mine uses a 32-byte descriptor for each IL instruction.

(One quibble with labels is whether a label definition occupies an
actual IL instruction. With my IL used as a backend for static
languages, it does. And there can be clusters of labels at the same spot.

With dynamic bytecode designed for interpretation, it doesn't. It uses a different structure. This means labels don't need to be 'executed' when encountered.)


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Dec 18 17:19:01 2024

From Newsgroup: comp.lang.c

On 17.12.2024 21:13, Keith Thompson wrote:

bart <bc@freeuk.com> writes:

[...]

[...]
(Janis, please correct me if I'm mistaken.)

I think it couldn't have been explained clearer. - Thanks.

Janis


From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Wed Dec 18 17:26:49 2024

From Newsgroup: comp.lang.c

On 17.12.2024 19:46, Waldek Hebisch wrote:

bart <bc@freeuk.com> wrote:

[...]

[ ponderings about where recursive functions might be used ]

Due to a silly coding standard? Or in a language that does not have
'goto'.

(I'd rule out the "coding standards" hypothesis.)

Languages without 'goto', I suppose, would either have other control
constructs ('while', etc.) to formulate in an imperative style, or
be of the Functional Programming Languages type.

Janis


From BGB@cr88192@gmail.com to comp.lang.c on Wed Dec 18 12:50:20 2024

From Newsgroup: comp.lang.c

On 12/18/2024 6:08 AM, bart wrote:

On 17/12/2024 18:51, BGB wrote:

On 12/17/2024 6:04 AM, bart wrote:

C can apparently compile to WASM via Clang, so I tried this program:

  void F(void) {
     int i=0;
     while (i<10000) ++i;
  }

which compiled to 128 lines of WASM (technically, some form of 'WAT',
as WASM is a binary format). The 60 lines corresponding to F are
shown below, and below that is my own stack IL code.

I'm not even sure what format that code is in, as WAT is supposed to use S-expressions. The generated code is flat. It differs in other ways from examples of WAT.

Dunno there...

It looks like WASM has changed slightly from what I remember when I
originally looked at it, so it could be "possible" if it could be made
to support separate compilation and similar.

Hmm... It looks like the WASM example is already trying to follow SSA
rules, then mapped to a stack IL... Not necessarily the best way to do
it IMO.

I hadn't considered that SSA could be represented in stack form.

But couldn't each push be converted to an assignment to a fresh
variable, and the same with pop?

As for Phi functions, the only similar thing I encounter (but could be mistaken), is when there is a choice of paths to yield a value (such as
(c ? a : b) in C; my language has several such constructs).

I was mostly noting that it appeared that every operation was creating a
new variable and only assigning to it once.

I didn't look too much more closely than this, only to note that it was different.

With stack code, the result conveniently ends up on top of the stack whichever path is taken, which is a big advantage. Unless you then have
to convert that to register code, and need to ensure the values end up
in the same register when the control paths join up again.

With JVM, the rule was that all paths landing at the same label need to
have the same stack depth and same types.

With .NET, the rule was that the stack was always empty, any merging
would need to be done using variables.

BGBCC is sorta mixed:
In most cases, it follows the .NET rule;
A special-case exception exists mostly for implementing the ?: operation (which in turn has special stack operations to signal its use).

BEGINU // start a ?: operator
L0:
... //one case
SETU
JMP L2
L1:
... //other case
SETU
JMP L2
ENDU
L2:

This is a bit of wonk; if I were designing it now, I would likely do it
the same way as .NET and use temporary variables.
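
Roughly, the temporary-variable approach amounts to the following, shown as C with gotos rather than as IL. This is a sketch of the general idea only; the function name select_lowered, the temporary t and the label names are invented and do not correspond to anything in BGBCC.

```c
/* Lowered form of: return c ? a : b;
   Both paths store into the same temporary, so nothing has to
   stay on an operand stack across the branch or the join. */
int select_lowered(int c, int a, int b)
{
    int t;

    if (c) goto L_true;
    t = b;              /* false path */
    goto L_join;
L_true:
    t = a;              /* true path */
L_join:
    return t;
}
```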

Actually, I might be tempted to use a 3AC IR as well (though, probably non-SSA). And, probably design things a bit differently.

In this case, if I did a 3AC IR, might design a textual syntax along
similar lines to BASIC or FORTRAN 77 (albeit probably without the
fixed-column formatting or line numbers).

Though, the nominal format for use in the compiler would remain binary.

But, yeah, in BGBCC I am also using a stack-based IL (RIL), which
follows rules more in a similar category to .NET CIL (in that, stack
items carry type, and the stack is generally fully emptied on branch).

In my IL, labels are identified with a LABEL opcode (with an
immediate), and things like branches work by having the branch target
and label having the same immediate (label ID).

So, you jump to label L123, and the label looks like:

L123:

Yeah, in textual form.
Though, the label is internally represented as, say:
LABEL 123
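
As a toy illustration of labels that are ordinary instructions carrying an ID, here is how a branch target could be resolved by matching immediates. The Insn layout, the opcode names and find_label are all assumptions made up for this sketch, not the actual representation in either IL.

```c
#include <stddef.h>

/* Hypothetical opcodes and instruction record. */
enum { OP_LABEL, OP_JMP, OP_NOP };

typedef struct {
    int op;
    int imm;    /* label ID for OP_LABEL and OP_JMP */
} Insn;

/* Returns the index of the LABEL instruction whose immediate
   matches id, or -1 if there is no such label. */
int find_label(const Insn *code, size_t count, int id)
{
    size_t k;

    for (k = 0; k < count; k++) {
        if (code[k].op == OP_LABEL && code[k].imm == id)
            return (int)k;
    }
    return -1;
}
```

A real backend would more likely build a table from label ID to position once, instead of scanning on every branch.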

IIRC, usually numbering starts over from 0 for each function, though in
the backend IR all labels get a unique number within a 24-bit numbering
space.

The labels are then split into several categories:
Global labels, used to identify functions/variables, with an associated
name;
IL labels, which were mapped over from the RIL bytecode;
Temporary labels, which exist solely in the backend;
Line numbers, not true labels, mostly exist to convey line-number info (associated with a file-name and line number);
Special/Architectural, used as placeholders for things like CPU
registers (for variable load/store).

I think that is pretty standard! But it sounds like you use a very tight encoding for bytecode, while mine uses a 32-byte descriptor for each IL instruction.

(One quibble with labels is whether a label definition occupies an
actual IL instruction. With my IL used as a backend for static
languages, it does. And there can be clusters of labels at the same spot.

With dynamic bytecode designed for interpretation, it doesn't. It uses a different structure. This means labels don't need to be 'executed' when encountered.)

In my interpreters, it always uses a bytecode operation.
However, apart from my very early interpreters, typically the stack IL
is not used directly.

So, a personal timeline was like:
   2003/2004: BGBScript came into existence
     First version used DOM and directly walked the DOM tree.
     Used a GC, generated lots of garbage objects;
     Syntax was based on JavaScript with some wonk;
     Was horridly slow.
   2006:
     BGBScript VM (BS-VM) was rewritten to S-Expressions internally;
     Dropped some of the original wonk, moving to a cleaner JS syntax;
     Went to a bytecode interpreter.
   2007:
     BGBCC was written using the frontend from the 2003 VM as a base;
     The IL design was based on 2006 BS-VM;
     Replaced the original DOM with a custom stand-in;
     Used parts of the 2006 VM as well.
   2009:
     The BS-VM was modified to turn the stack IL into 3AC and run this;
     Also had a JIT and similar by this point;
     Using 3AC and JIT made things significantly faster;
     Also tended to leak a lot less garbage,
       operating mostly at "steady state".
     Syntactically, it had become more like ActionScript3 or HaXE.
   2013: Created BGBScript2 (BS2)
     This mostly resembled a Java/C#/AS3 hybrid;
     Eliminated the GC in favor of primarily static + manual MM.
   2015/2016: Created the BGBTech2 3D engine
     Partly written in a mix of C and BGBScript2
     Was my biggest project to use BS2

Then:
   2017: Started on my BJX1 project
     Revived BGBCC, used it as the compiler.
   2019: Rebooted the project to BJX2.
     BJX1 quickly turned into a huge mess
       which was non-viable to implement in an FPGA.
     Until now, BJX2 project has continued.

Some stuff following the design of the BS2 VM was back-ported onto
BGBCC, but in many ways, BGBCC has a lot more cruft.

In the BS2 VM, the image format is a TLV container.
There is a string table, data area for functions/etc;
Index tables;
...
Generally, functions could be loaded and converted to 3AC on demand.

The IL in the BS2 VM was not a pure stack machine, but more like:
OP with 2 stack args, stack dest (common with BGBCC)
OP with 2 stack args, local dest (common with BGBCC)
OP with 2 local args, stack dest
OP with 2 local args, local dest (like in 3AC)
OP with local and immediate, stack dest
OP with local and immediate, local dest
OP with local and stack, stack dest
OP with local and stack, local dest

This was more complicated, but reduced the number of IL operations. Internally, it all converted to 3AC for the backend interpreter.

The incentive to do this for BGBCC was less, as folding the
local-variable or constant-loads into the operator is less immediately beneficial to a compiler; but does make the bytecode loader more
complicated. Folding the destination register into the bytecode ops in
many cases is still relevant, as it is comparably harder to fold the destination-store into the 3AC op than to fold a source load.

Generally, bytecode ops and operands were encoded with VLNs (variable
length numbers).

Generally (numeric VLN):
00..7F: 0..127
80..BF XX: 128..16383
C0..DF XX XX: 16384..2M
...

These values were encoded in MSB first order, and could directly
represent values up to 64 bits (in both the BS2VM and BGBCC, 128-bit
values tend to be represented as pairs of 64-bit values).

For signed integer values, the sign was folded into the LSB.
Floating point values were represented as a base/exponent VLN pair.
Basically, an integer value scaled by a power-of-2 exponent.
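
As a rough illustration of the numeric VLN scheme described above, here is a minimal C sketch of the first three forms (MSB first) plus folding the sign into the LSB. It assumes the two-byte form uses lead bytes 80..BF, and it only covers the forms that are spelled out; anything longer is reported as a failure rather than guessed. The function names are made up for the example and are not taken from the BS2VM or BGBCC.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold the sign into the LSB: 0, -1, 1, -2, ... becomes 0, 1, 2, 3, ... */
static uint64_t vln_fold_sign(int64_t v)
{
    uint64_t u = (uint64_t)v << 1;
    return (v < 0) ? ~u : u;
}

/* Inverse of vln_fold_sign(). */
static int64_t vln_unfold_sign(uint64_t u)
{
    return (int64_t)(u >> 1) ^ -(int64_t)(u & 1u);
}

/* Encode v MSB-first. Returns the number of bytes written (1..3),
   or 0 if v needs one of the longer forms not covered by this sketch. */
static size_t vln_encode(uint32_t v, unsigned char *out)
{
    if (v < 0x80u) {                     /* 00..7F: 7 bits */
        out[0] = (unsigned char)v;
        return 1;
    }
    if (v < 0x4000u) {                   /* 80..BF XX: 14 bits (assumed lead range) */
        out[0] = (unsigned char)(0x80u | (v >> 8));
        out[1] = (unsigned char)(v & 0xFFu);
        return 2;
    }
    if (v < 0x200000u) {                 /* C0..DF XX XX: 21 bits */
        out[0] = (unsigned char)(0xC0u | (v >> 16));
        out[1] = (unsigned char)((v >> 8) & 0xFFu);
        out[2] = (unsigned char)(v & 0xFFu);
        return 3;
    }
    return 0;
}

/* Decode one value. Returns the number of bytes consumed, or 0 if the
   lead byte belongs to a longer form not covered by this sketch. */
static size_t vln_decode(const unsigned char *in, uint32_t *v)
{
    unsigned b = in[0];
    if (b < 0x80u) {
        *v = b;
        return 1;
    }
    if (b < 0xC0u) {
        *v = ((uint32_t)(b & 0x3Fu) << 8) | in[1];
        return 2;
    }
    if (b < 0xE0u) {
        *v = ((uint32_t)(b & 0x1Fu) << 16) | ((uint32_t)in[1] << 8) | in[2];
        return 3;
    }
    return 0;
}
```

A signed operand would go through vln_fold_sign() before encoding and vln_unfold_sign() after decoding, so small negative numbers stay as short as small positive ones.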

Opcodes were different, IIRC:
00..DF: Single Byte
E0..EF: Two Byte (224..4095)
F0..F7: Three Byte
...

But, generally, only 1 and 2 byte cases were used.
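
For the one- and two-byte opcode cases, a decoder could look something like the C sketch below. This is only a guess at the layout implied by the ranges above: it assumes the low 4 bits of an E0..EF lead byte are the high bits of a 12-bit opcode number. The three-byte form is left out, and the function names are hypothetical.

```c
#include <stddef.h>

/* Encode an opcode number. Returns bytes written (1 or 2),
   or 0 if the opcode would need a longer form. */
static size_t op_encode(unsigned op, unsigned char *out)
{
    if (op < 0xE0u) {                    /* 00..DF: single byte */
        out[0] = (unsigned char)op;
        return 1;
    }
    if (op < 0x1000u) {                  /* E0..EF XX: 224..4095 */
        out[0] = (unsigned char)(0xE0u | (op >> 8));
        out[1] = (unsigned char)(op & 0xFFu);
        return 2;
    }
    return 0;
}

/* Decode an opcode number. Returns bytes consumed (1 or 2),
   or 0 for lead bytes F0 and above (not handled here). */
static size_t op_decode(const unsigned char *in, unsigned *op)
{
    unsigned b = in[0];
    if (b < 0xE0u) {
        *op = b;
        return 1;
    }
    if (b < 0xF0u) {
        *op = ((b & 0x0Fu) << 8) | in[1];
        return 2;
    }
    return 0;
}
```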

IIRC, did not define a textual notation for the BS2VM's ASM.

Local variables, labels, etc, were all identified as numeric indices.
Typically a single byte.

Like JVM, and unlike BGBCC, in the BS2VM, all the variables (including arguments) were held in an array of local variables (BGBCC has locals, arguments, and temporaries, as 3 separate spaces).

IIRC, BS2VM had still used variable type-tagging (like BGBCC and .NET),
rather than the untyped variables with typed operators scheme (what JVM
had used).

But, typed operators make more sense if you intend to interpret the
stack bytecode directly, which was generally not done in my VMs (except
in very early versions). Otherwise, implicitly typed operators probably
make more sense.

...


From BGB@cr88192@gmail.com to comp.lang.c on Wed Dec 18 12:51:04 2024

From Newsgroup: comp.lang.c

On 12/17/2024 1:42 PM, bart wrote:

On 17/12/2024 19:07, Thiago Adams wrote:

Em 12/17/2024 3:37 PM, bart escreveu:

On 17/12/2024 18:16, Thiago Adams wrote:

also remove structs, replacing them with unsigned char[] and casting
parts of it to access members.

I think this is the lowest level possible in C.

This is what I do in my IL, where structs are just fixed blocks of so
many bytes.

How do you deal with struct parameters?

In the IL they are always passed notionally by value. This side of the
IL (that is, the frontend compiler that generates IL) knows nothing
about the target, such as ABI details.

(In practice, some things are known, like the word size of the target,
since that can change characteristics of the source language, like the
size of 'int' or of 'void*'. It also needs to assume, or request from
the backend, argument evaluation order, although my IL can reverse order
if necessary.)

It is the backend, on the other side of the IL, that needs to deal with those details.

That can include making copies of structs that the ABI says are passed
by value. But when targeting SYS V ABI (which I haven't attempted yet),
it may need to know the internal layout of a struct.

You can however do experiments with using SYS V on Linux (must be 64 bits):

* Create test structs with, say, int32 or int64 elements

* Write a test function where such a struct is passed by value, and
then return a modified copy

* Rerun the test using a version of the function where a char[] version
of the struct is passed and returned, and which contains the member
access casts you suggested

* See if it gives the same results.

You might need a union of the two structs, or use memcpy to transfer contents, before and after calling the test function.
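
A minimal C sketch of that experiment might look like the following. The struct, the function names, and the use of memcpy (instead of pointer casts, to stay clear of aliasing problems) are all choices made for this example. Note that it only checks that both versions compute the same values within one compiler; whether the two versions are passed in the same registers under SYS V has to be checked separately, for instance by reading the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct pair { int32_t a; int64_t b; };

/* The struct wrapped as a plain block of bytes of the same size. */
struct pair_bytes { unsigned char raw[sizeof(struct pair)]; };

/* Reference version: struct passed and returned by value. */
static struct pair bump_struct(struct pair p)
{
    p.a += 1;
    p.b *= 2;
    return p;
}

/* Byte-block version: members are reached through their offsets. */
static struct pair_bytes bump_bytes(struct pair_bytes p)
{
    int32_t a;
    int64_t b;
    memcpy(&a, p.raw + offsetof(struct pair, a), sizeof a);
    memcpy(&b, p.raw + offsetof(struct pair, b), sizeof b);
    a += 1;
    b *= 2;
    memcpy(p.raw + offsetof(struct pair, a), &a, sizeof a);
    memcpy(p.raw + offsetof(struct pair, b), &b, sizeof b);
    return p;
}

/* Returns 1 if both versions give the same member values. */
static int same_result(int32_t a, int64_t b)
{
    struct pair in, out1, out2;
    struct pair_bytes inb, outb;

    memset(&in, 0, sizeof in);
    in.a = a;
    in.b = b;
    memcpy(inb.raw, &in, sizeof in);     /* transfer contents before the call */

    out1 = bump_struct(in);
    outb = bump_bytes(inb);
    memcpy(&out2, outb.raw, sizeof out2); /* and back again afterwards */

    return out1.a == out2.a && out1.b == out2.b;
}
```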

I took a different approach:
In the backend IR stage, structs are essentially treated as references
to the structure.

A local structure may be "initialized" via an IR operation, at which
point it will be assigned storage in the stack frame, and the reference
will be initialized to the storage area for the structure.

Most operations will pass them by reference.

Assigning a struct will essentially be turned into a struct-copy
operation (using the same mechanism as inline memcpy).

Type model could be seen as multiple levels:
   I: integer types of 'int' and smaller;
   L: integer types of 64 bits or less that are not I.
   D: 'double' and smaller floating-point types.
   A: Address (pointers, arrays, structs, ...)
   X: 128-bit types.
     int128, 'long double', SIMD vectors, ...

   I:
     char, signed char, unsigned char
     short, unsigned short
     int, unsigned int
     _Bool, wchar_t, ...
   L:
     long, long long, unsigned long, unsigned long long
     64-bit SIMD vectors
     variant (sorta)
   D: double, float, short float
   A:
     pointers
     arrays
     structs
     class instances
     ...
   X:
     grab bag of pretty much everything that is 128 bits.

The toplevel types all basically have similar storage and behavior, so
in many cases one can rely on this rather than the actual type.
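
To make the grouping concrete, here is a small C sketch that maps a handful of source types onto the five classes listed above. The enum names are invented for this example and are not BGBCC's actual identifiers; the mapping just follows the lists as given (so 'long' lands in L, as it would on a target where it is 64 bits).

```c
/* Hypothetical front-end type tags, for illustration only. */
enum ctype {
    CT_CHAR, CT_SHORT, CT_INT, CT_BOOL,
    CT_LONG, CT_LLONG,
    CT_FLOAT, CT_DOUBLE,
    CT_POINTER, CT_ARRAY, CT_STRUCT,
    CT_INT128, CT_LDOUBLE
};

/* The five top-level classes from the list above. */
enum tclass { TC_I, TC_L, TC_D, TC_A, TC_X };

static enum tclass classify(enum ctype t)
{
    switch (t) {
    case CT_CHAR: case CT_SHORT: case CT_INT: case CT_BOOL:
        return TC_I;                     /* 'int' and smaller */
    case CT_LONG: case CT_LLONG:
        return TC_L;                     /* 64-bit integers that are not I */
    case CT_FLOAT: case CT_DOUBLE:
        return TC_D;                     /* 'double' and smaller */
    case CT_INT128: case CT_LDOUBLE:
        return TC_X;                     /* 128-bit grab bag */
    case CT_POINTER: case CT_ARRAY: case CT_STRUCT:
        break;
    }
    return TC_A;                         /* addresses: pointers, arrays, structs */
}
```

Code that only cares about storage and register handling can then switch on the class rather than on the full type.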

...


From BGB@cr88192@gmail.com to comp.lang.c on Wed Dec 18 12:51:23 2024

From Newsgroup: comp.lang.c

On 12/17/2024 1:33 PM, Thiago Adams wrote:

Em 12/17/2024 4:07 PM, BGB escreveu:

On 12/17/2024 11:55 AM, Thiago Adams wrote:

Em 12/17/2024 4:03 AM, BGB escreveu:

On 12/16/2024 5:21 AM, Thiago Adams wrote:

On 15/12/2024 20:53, BGB wrote:

On 12/15/2024 3:32 PM, bart wrote:

On 15/12/2024 19:08, Bonita Montero wrote:

C++ is more readable because it is magnitudes more expressive than C.
You can easily write a C++ statement that would take hundreds of
lines in C (imagine specializing an unordered_map by hand). Making a
language less expressive makes it even less readable, and that's also
true for your reduced C.

That's not really the point of it. This reduced C is used as an
intermediate language for a compiler target. It will not usually
be read, or maintained.

An intermediate language needs to be at a lower level than the
source language.

And for this project, it needs to be compilable by any C89 compiler.
Generating C++ would be quite useless.

As an IL, even C is a little overkill, unless turned into a
restricted subset (say, along similar lines to GCC's GIMPLE).

Say:
   Only function-scope variables allowed;
   No high-level control structures;
   ...

Say:
   int foo(int x)
   {
     int i, v;
     for(i=x, v=0; i>0; i--)
       v=v*i;
     return(v);
   }

Becoming, say:
   int foo(int x)
   {
     int i;
     int v;
     i=x;
     v=0;
     if(i<=0)goto L1;
     L0:
     v=v*i;
     i=i-1;
     if(i>0)goto L0;
     L1:
     return v;
   }

...

I have considered removing loops and keeping only goto.
But I don't think this brings much simplification.

It depends.

If the compiler works like an actual C compiler, with a full parser
and AST stage, yeah, it may not save much.

If the parser is a thin wrapper over 3AC operations (only allowing
statements that map 1:1 with a 3AC IR operation), it may save a bit
more...

As for whether or not it makes sense to use a C like syntax here,
this is more up for debate (for practical use within a compiler, I
would assume a binary serialization rather than an ASCII syntax,
though ASCII may be better in terms of inter-operation or human
readability).

But, as can be noted, I would assume a binary serialization that is
oriented around operators; and *not* about serializing the
structures used to implement those operators. Also I would assume
that the IR need not be in SSA form (conversion to full SSA could be
done when reading in the IR operations).

My argument is that not using SSA form means fewer issues for both
the serialization format and compiler front-end to need to deal with
(and is comparably easy to regenerate for the backend, with the
backend operating with its internal IR in SSA form).

Well, contrast to LLVM assuming everything is always in SSA form.

...

I have also considered splitting expressions.

For instance

if (a*b+c) {}

into

register int r1 = a * b;
register int r2 = r1 + c;
if (r2) {}

This would make it easier to add overflow checks at runtime (if desired)
and implement things like _Complex
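
As a sketch of why the split form helps with overflow checks: once every operation has its own temporary, each one can be swapped for a checked helper without touching the rest. The helper names below are invented for this example, and mul_checked assumes 'long long' is at least twice as wide as 'int', which holds on common targets but is not guaranteed by the C standard.

```c
#include <limits.h>

/* Checked multiply: stores a*b in *out and returns 1, or returns 0 on
   overflow. Assumes long long is at least twice the width of int. */
static int mul_checked(int a, int b, int *out)
{
    long long wide = (long long)a * (long long)b;
    if (wide < INT_MIN || wide > INT_MAX)
        return 0;
    *out = (int)wide;
    return 1;
}

/* Checked add: stores a+b in *out and returns 1, or returns 0 on overflow. */
static int add_checked(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}

/* "if (a*b+c)" split into one operation per temporary.
   Returns the truth value of the condition; *overflowed is set to 1
   if either step overflowed (the result is then 0). */
static int cond_split(int a, int b, int c, int *overflowed)
{
    int r1, r2;
    *overflowed = 0;
    if (!mul_checked(a, b, &r1)) goto overflow;   /* r1 = a * b  */
    if (!add_checked(r1, c, &r2)) goto overflow;  /* r2 = r1 + c */
    return r2 != 0;
overflow:
    *overflowed = 1;
    return 0;
}
```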

Is this what you mean by 3AC or SSA?

3AC means that the IR expresses 3 (or sometimes more) operands per IR op.

So:
   MUL r1, a, b
Rather than, say, stack:
   LOAD a
   LOAD b
   MUL
   STORE r1

SSA:
   Static Single Assignment

Oh sorry .. I knew what SSA is.

Generally:
Every variable may only be assigned once (more like in a functional
programming language);
Generally, variables are "merged" in the control-flow via PHI
operators (which variable merges in depending on which path control
came from).

I do a similar merge in my flow analysis but without the concept of SSA.

IMHO, while SSA is preferable for backend analysis, optimization, and
code generation; it is undesirable pretty much everywhere else as it
adds too much complexity.

Better IMO for the frontend compiler and main IL stage to assume that
local variables are freely mutable.

Typically, global variables are excluded in most variants, and remain
fully mutable; but may be handled as designated LOAD/STORE operations.

In BGBCC though, full SSA only applies to temporaries. Normal local
variables are merely flagged by "version", and all versions of the
same local variable implicitly merge back together at each branch/label.

Sorry, what is BGBCC? (A C compiler?)

It is my C compiler.

Can be found within my current main project: https://github.com/cr88192/bgbtech_btsr1arch/tree/master/bgbcc22

It started out, long ago, as a fork off my scripting language, which was originally a JavaScript clone.

First stage:
Originally written as a C interpreter of sorts.

Original idea was to use dynamically compiled C as an application
scripting language, but C wasn't a great language for this task (vs a JS clone), and the compiler was a lot harder to debug.

Then, for a while, it was turned over to mining metadata from headers to generate an FFI for the script language.

Its use as a C compiler was revived when I started my CPU ISA project,
as I needed a compiler for it, and other options (Clang, GCC, and LCC)
were unattractive in various ways.

Though, in all, a lot more effort in the project has gone into the C
compiler than into much of anything else, and it is still a bit of a
pain finding and fixing bugs (and avoiding causing new bugs).

It targets both BJX2 (my own ISA) and RISC-V, albeit using PE/COFF for
the latter (rather than ELF).

This allows some similar advantages (for analysis and optimization)
while limiting some of the complexities. Though, this differs from
temporaries which are assumed to essentially fully disappear once they
go outside of the span in which they exist (albeit with an awkward
case to deal with temporaries that cross basic-block boundaries, which
need to actually "exist" in some semi-concrete form, more like local
variables).

Note that unless the address is taken of a local variable, it need not
have any backing in memory. Temporaries can never have their address
taken, so generally exist exclusively in CPU registers.

This would definitely simplify the expression grammar.

It can be added in the future.


From Thiago Adams@thiago.adams@gmail.com to comp.lang.c on Wed Dec 18 16:43:59 2024

From Newsgroup: comp.lang.c

Em 12/18/2024 3:51 PM, BGB escreveu:

I took a different approach:
In the backend IR stage, structs are essentially treated as references
to the structure.

A local structure may be "initialized" via an IR operation, at which
point it will be assigned storage in the stack frame, and the reference
will be initialized to the storage area for the structure.

Most operations will pass them by reference.

Assigning a struct will essentially be turned into a struct-copy
operation (using the same mechanism as inline memcpy).

But what happens when calling an external C function that has a struct X
as parameter? (not pointer to struct)

From bart@bc@freeuk.com to comp.lang.c on Wed Dec 18 23:37:11 2024

From Newsgroup: comp.lang.c

On 18/12/2024 18:50, BGB wrote:

On 12/18/2024 6:08 AM, bart wrote:

With stack code, the result conveniently ends up on top of the stack
whichever path is taken, which is a big advantage. Unless you then
have to convert that to register code, and need to ensure the values
end up in the same register when the control paths join up again.

With JVM, the rule was that all paths landing at the same label need to
have the same stack depth and same types.

With .NET, the rule was that the stack was always empty, any merging
would need to be done using variables.

BGBCC is sorta mixed:
In most cases, it follows the .NET rule;
A special-case exception exists mostly for implementing the ?: operation (which in turn has special stack operations to signal its use).

BEGINU // start a ?: operator
L0:
... //one case
SETU
JMP L2
L1:
... //other case
SETU
JMP L2
ENDU
L2:

This is a bit of wonk,

Well, this is pretty much what I do in stack code. I consider it impure,
as in needing artificial hints, but also the simplest solution.

I use opcodes STARTMX, RESETMX, ENDMX. They are no-ops when the IL is interpreted. But during the linear scan needed during code generation,
where it has to keep track of the IL's operand stack, RESETMX will reset
the stack too.

(As mentioned, I have a lot more constructs that can yield N values not
just two. Apart from N-way select, if-else, switch-when and case-when statements can also return values.)

if I were designing it now, would likely do it
the same as .NET, and use temporary variables.

In 3AC then it's easy, all paths write to the same temporary.
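
A small C sketch of that, written with gotos to mimic the lowered form: both arms of a ?: store into the same temporary, so nothing special is needed where the paths join. The function names and the expression are made up for the example.

```c
/* Source form. */
static int select_source(int cond, int a, int b)
{
    return cond ? a + 1 : b * 2;
}

/* Lowered form: one shared temporary, written on every path. */
static int select_lowered(int cond, int a, int b)
{
    int t0;                 /* the shared temporary */
    if (!cond) goto L1;
    t0 = a + 1;             /* first arm */
    goto L2;
L1:
    t0 = b * 2;             /* second arm */
L2:
    return t0;              /* paths have joined; t0 holds the result */
}
```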


From BGB@cr88192@gmail.com to comp.lang.c on Wed Dec 18 18:27:26 2024

From Newsgroup: comp.lang.c

On 12/18/2024 1:43 PM, Thiago Adams wrote:

Em 12/18/2024 3:51 PM, BGB escreveu:

I took a different approach:
In the backend IR stage, structs are essentially treated as references
to the structure.

A local structure may be "initialized" via an IR operation, at which
point it will be assigned storage in the stack frame, and the
reference will be initialized to the storage area for the structure.

Most operations will pass them by reference.

Assigning a struct will essentially be turned into a struct-copy
operation (using the same mechanism as inline memcpy).

But what happens when calling an external C function that has a struct X
as parameter? (not pointer to struct)

In my ABI, if larger than 16 bytes, it is passed by reference (as a
pointer in a register or on the stack), callee is responsible for
copying it somewhere else if needed.

For struct return, a pointer to return the struct into is provided by
the caller, and the callee copies the returned struct into this address.

If the caller ignores the return value, the caller provides a dummy
buffer for the return value.

If no prototype is provided... well, most likely the program crashes or similar.

So, in effect, the by-value semantics are mostly faked by the compiler.
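
Expressed back in C, the "faked" by-value convention could look roughly like the sketch below. This is an illustration of the general idea (argument by reference, caller-provided return buffer), not the code BGBCC actually emits; the struct and function names are invented here.

```c
#include <stdint.h>

struct big { int64_t v[4]; };   /* 32 bytes: over the 16-byte limit */

/* What the programmer writes: by-value argument and return. */
static struct big scale(struct big in, int64_t k)
{
    int i;
    for (i = 0; i < 4; i++)
        in.v[i] *= k;
    return in;
}

/* Roughly what it turns into: the argument arrives as a pointer, and
   the caller supplies the address to return the struct into. */
static void scale_lowered(struct big *ret, const struct big *in, int64_t k)
{
    struct big tmp = *in;       /* callee copies, since it modifies the argument */
    int i;
    for (i = 0; i < 4; i++)
        tmp.v[i] *= k;
    *ret = tmp;                 /* callee copies the result into the caller's buffer */
}
```

A caller that ignores the result would still have to pass the address of a dummy struct as the first argument.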

It is roughly similar to the handling of C array types, which in this
case are also seen as a combination of a hidden pointer to the data, and
the backing data (the array's contents). The code-generator mostly
operates in terms of this hidden pointer.

By-Value Structs smaller than 16 bytes are passed as-if they were a 64
or 128 bit integer type (as a single register or as a register pair,
with a layout matching their in-memory representation).
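
For the small-struct case, the effect is similar to copying the struct's bytes into an integer and back, as in this C sketch. The struct and helper names are hypothetical, and the example assumes the struct has no padding and fits in 8 bytes (checked at compile time below).

```c
#include <stdint.h>
#include <string.h>

struct small { int32_t x; int16_t y; int16_t z; };   /* 8 bytes expected */

/* Compile-time check that the struct fits in one 64-bit register. */
typedef char small_fits_in_reg[(sizeof(struct small) <= sizeof(uint64_t)) ? 1 : -1];

/* Pack the in-memory bytes of the struct into a 64-bit value. */
static uint64_t small_to_reg(struct small s)
{
    uint64_t r = 0;
    memcpy(&r, &s, sizeof s);
    return r;
}

/* Unpack on the callee side: same bytes, same layout. */
static struct small reg_to_small(uint64_t r)
{
    struct small s;
    memcpy(&s, &r, sizeof s);
    return s;
}
```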

...

But, yeah, at the IL level, one could potentially eliminate structs and
arrays as a separate construct, and instead have bare pointers and a
generic "reserve a blob of bytes in the frame and initialize this
pointer to point to it" operator (with the business end of this operator happening in the function prolog).

...


From bart@bc@freeuk.com to comp.lang.c on Thu Dec 19 00:32:47 2024

From Newsgroup: comp.lang.c

On 18/12/2024 05:55, Lawrence D'Oliveiro wrote:

On Tue, 17 Dec 2024 22:55:53 +0000, bart wrote:

On 17/12/2024 22:25, Lawrence D'Oliveiro wrote:

On Tue, 17 Dec 2024 19:45:49 +0000, bart wrote:

It's not aimed at people /implementing/ such a tool.

It is aimed at those capable of following the links to relevant specs.

It's also a pretty terrible link.

Did you see this link <https://developer.mozilla.org/en-US/docs/WebAssembly/Reference>? Lots of examples from there.

I promised an example of Hello World using my IL, and how to process and
run it, in half a page. This is it for Windows:

------------------------------------
Paste the indented code into a file hello.pcl:

addlib "msvcrt"
extproc puts

proc main:::
setcall i32 /1
load u64 "Hello World!"
setarg u64 /1
callf i32 /1 &puts
unload i32
load i32 0
stop
retproc
endproc

Download the pc.exe file here: https://github.com/sal55/langs/blob/master/pc.exe, which is a 65KB file (UPX-compressed from 180KB). (Advice to navigate AV not included here.)

At a command prompt with both files present, type:

pc -r hello

This will convert it to x64 code and run it. Use 'pc' by itself to see
the 6 other processing options.
------------------------------------

So 20 non-blank lines. It would be nice if an equally simple example
existed for WASM/WAT, or if people who suggested that choice could post
a link to such an example /that/ works on Windows.

From bart@bc@freeuk.com to comp.lang.c on Thu Dec 19 00:35:41 2024

From Newsgroup: comp.lang.c

On 19/12/2024 00:27, BGB wrote:

On 12/18/2024 1:43 PM, Thiago Adams wrote:

Em 12/18/2024 3:51 PM, BGB escreveu:

I took a different approach:
In the backend IR stage, structs are essentially treated as
references to the structure.

A local structure may be "initialized" via an IR operation, at which
point it will be assigned storage in the stack frame, and the
reference will be initialized to the storage area for the structure.

Most operations will pass them by reference.

Assigning a struct will essentially be turned into a struct-copy
operation (using the same mechanism as inline memcpy).

But what happens when calling an external C function that has a struct
X as parameter? (not pointer to struct)

In my ABI, if larger than 16 bytes, it is passed by reference (as a
pointer in a register or on the stack), callee is responsible for
copying it somewhere else if needed.

For struct return, a pointer to return the struct into is provided by
the caller, and the callee copies the returned struct into this address.

If the caller ignores the return value, the caller provides a dummy
buffer for the return value.

If no prototype is provided... well, most likely the program crashes or similar.

So, in effect, the by-value semantics are mostly faked by the compiler.

It is roughly similar to the handling of C array types, which in this
case are also seen as a combination of a hidden pointer to the data, and
the backing data (the array's contents). The code-generator mostly
operates in terms of this hidden pointer.

By-Value Structs smaller than 16 bytes are passed as-if they were a 64
or 128 bit integer type (as a single register or as a register pair,
with a layout matching their in-memory representation).

...

But, yeah, at the IL level, one could potentially eliminate structs and arrays as a separate construct, and instead have bare pointers and a
generic "reserve a blob of bytes in the frame and initialize this
pointer to point to it" operator (with the business end of this operator happening in the function prolog).

The problem with this, that I mentioned elsewhere, is how well it would
work with SYS V ABI, since the rules for structs are complex, and
apparently recursive.

Having just a block of bytes might not be enough.

From BGB@cr88192@gmail.com to comp.lang.c on Wed Dec 18 23:46:21 2024

From Newsgroup: comp.lang.c

On 12/18/2024 6:35 PM, bart wrote:

On 19/12/2024 00:27, BGB wrote:

On 12/18/2024 1:43 PM, Thiago Adams wrote:

Em 12/18/2024 3:51 PM, BGB escreveu:

I took a different approach:
In the backend IR stage, structs are essentially treated as
references to the structure.

A local structure may be "initialized" via an IR operation, at which
point it will be assigned storage in the stack frame, and the
reference will be initialized to the storage area for the structure.

Most operations will pass them by reference.

Assigning a struct will essentially be turned into a struct-copy
operation (using the same mechanism as inline memcpy).

But what happens when calling an external C function that has a struct
X as parameter? (not pointer to struct)

In my ABI, if larger than 16 bytes, it is passed by reference (as a
pointer in a register or on the stack), callee is responsible for
copying it somewhere else if needed.

For struct return, a pointer to return the struct into is provided by
the caller, and the callee copies the returned struct into this address.

If the caller ignores the return value, the caller provides a dummy
buffer for the return value.

If no prototype is provided... well, most likely the program crashes
or similar.

So, in effect, the by-value semantics are mostly faked by the compiler.

It is roughly similar to the handling of C array types, which in this
case are also seen as a combination of a hidden pointer to the data,
and the backing data (the array's contents). The code-generator mostly
operates in terms of this hidden pointer.

By-Value Structs smaller than 16 bytes are passed as-if they were a 64
or 128 bit integer type (as a single register or as a register pair,
with a layout matching their in-memory representation).

...

But, yeah, at the IL level, one could potentially eliminate structs
and arrays as a separate construct, and instead have bare pointers and
a generic "reserve a blob of bytes in the frame and initialize this
pointer to point to it" operator (with the business end of this
operator happening in the function prolog).

The problem with this, that I mentioned elsewhere, is how well it would
work with SYS V ABI, since the rules for structs are complex, and
apparently recursive.

Having just a block of bytes might not be enough.

In my case, I am not bothering with the SysV style ABI's (well, along
with there not being any x86 or x86-64 target...).

For my ISA, it is a custom ABI, but follows mostly similar rules to some
of the other "Microsoft style" ABIs (where, I have noted that across
multiple targets, MS tools have tended to use similar ABI designs).

For my compiler targeting RISC-V, it uses a variation of RV's ABI rules. Argument passing is basically similar, but struct pass/return is
different; and it passes floating-point values in GPRs (and, in my own
ISA, all floating-point values use GPRs, as there are no FPU registers;
though FPU registers do exist for RISC-V).

Not likely a huge issue as one is unlikely to use ELF and PE/COFF in the
same program.

For the "OS" that runs on my CPU core, it is natively using PE/COFF, but
ELF is supported for RISC-V (currently PIE only). It generally needs to
use my own C library as I still haven't gotten glibc or musl libc to
work on it (and they work in a different way from my own C library).

Seemingly, something is going terribly wrong in the "dynamic linking"
process, but too hard to figure out in the absence of any real debugging interface (what debug mechanisms I have, effectively lack any symbols
for things inside "ld-linux.so"'s domain).

Theoretically, could make porting usermode software easier, as then I
could compile stuff as-if it were running on an RV64 port of Linux.

But, easier said than done.

...


From bart@bc@freeuk.com to comp.lang.c on Thu Dec 19 11:27:03 2024

From Newsgroup: comp.lang.c

On 19/12/2024 05:46, BGB wrote:

On 12/18/2024 6:35 PM, bart wrote:

On 19/12/2024 00:27, BGB wrote:

By-Value Structs smaller than 16 bytes are passed as-if they were a
64 or 128 bit integer type (as a single register or as a register
pair, with a layout matching their in-memory representation).

...

But, yeah, at the IL level, one could potentially eliminate structs
and arrays as a separate construct, and instead have bare pointers
and a generic "reserve a blob of bytes in the frame and initialize
this pointer to point to it" operator (with the business end of this
operator happening in the function prolog).

The problem with this, that I mentioned elsewhere, is how well it
would work with SYS V ABI, since the rules for structs are complex,
and apparently recursive.

Having just a block of bytes might not be enough.

In my case, I am not bothering with the SysV style ABI's (well, along
with there not being any x86 or x86-64 target...).

I'd imagine it's worse with ARM targets as there are so many more
registers to try and deconstruct structs into.

For my ISA, it is a custom ABI, but follows mostly similar rules to some
of the other "Microsoft style" ABIs (where, I have noted that across multiple targets, MS tools have tended to use similar ABI designs).

When you do your own thing, it's easy.

In the 1980s, I didn't need to worry about call conventions used for
other software, since there /was/ no other software! I had to write everything, save for the odd calls to DOS which used some form of SYSCALL.

Then, arrays and structs were actually passed and returned by value (not
via hidden references), by copying the data to and from the stack.

However, I don't recall ever using the feature, as I considered it
inefficient. I always used explicit references in my code.

For my compiler targeting RISC-V, it uses a variation of RV's ABI rules. Argument passing is basically similar, but struct pass/return is
different; and it passes floating-point values in GPRs (and, in my own
ISA, all floating-point values use GPRs, as there are no FPU registers; though FPU registers do exist for RISC-V).

Supporting C's variadic functions, which is needed for many languages
when calling C across an FFI, usually requires different rules. On Win64
ABI for example, by passing low variadic arguments in both GPRs and FPU registers.

/Implementing/ variadic functions (which only occurs if implementing C)
is another headache if it has to work with the ABI (which can be assumed
for a non-static function).

I barely have a working solution for Win64 ABI, which needs to be done
via stdarg.h, but wouldn't have a clue how to do it for SYS V.

(Even Win64 has problems, as it assumes a downward-growing stack; in my
IL interpreter, the stack grows upwards!)

Not likely a huge issue as one is unlikely to use ELF and PE/COFF in the same program.

For the "OS" that runs on my CPU core, it is natively using PE/COFF, but

That's interesting: you deliberately used one of the most complex file
formats around, when you could have devised your own?

I did exactly that at a period when my generated DLLs were buggy for
some reason (it turned out to be two reasons). I created a simple
dynamic library format of my own. Then I found the same format worked
also for executables.

But I needed a loader program to run them, as Windows obviously didn't understand the format. Such a program can be written in 800 lines of C,
and can dynamically load libraries in both my format and proper DLLs (not
the buggy ones I generated!).

A hello-world program is under 300 bytes compared with 2 or
2.5KB of EXE. And the format is portable to Linux, so no need to
generate ELF (but I haven't tried). Plus the format might be transparent
to AV software (haven't tried that either).


From BGB@cr88192@gmail.com to comp.lang.c on Thu Dec 19 14:36:37 2024

From Newsgroup: comp.lang.c

On 12/19/2024 5:27 AM, bart wrote:

On 19/12/2024 05:46, BGB wrote:

On 12/18/2024 6:35 PM, bart wrote:

On 19/12/2024 00:27, BGB wrote:

By-Value Structs smaller than 16 bytes are passed as-if they were a
64 or 128 bit integer type (as a single register or as a register
pair, with a layout matching their in-memory representation).

...

But, yeah, at the IL level, one could potentially eliminate structs
and arrays as a separate construct, and instead have bare pointers
and a generic "reserve a blob of bytes in the frame and initialize
this pointer to point to it" operator (with the business end of this
operator happening in the function prolog).

The problem with this, that I mentioned elsewhere, is how well it
would work with SYS V ABI, since the rules for structs are complex,
and apparently recursive.

Having just a block of bytes might not be enough.

In my case, I am not bothering with the SysV style ABI's (well, along
with there not being any x86 or x86-64 target...).

I'd imagine it's worse with ARM targets as there are so many more
registers to try and deconstruct structs into.

Not messed much with the ARM64 ABI or similar, but I will draw the line
in the sand somewhere.

Struct passing/return is enough of an edge case that one can just sort
of declare it "no go" between compilers with "mostly but not strictly compatible" ABIs.

For my ISA, it is a custom ABI, but follows mostly similar rules to
some of the other "Microsoft style" ABIs (where, I have noted that
across multiple targets, MS tools have tended to use similar ABI
designs).

When you do your own thing, it's easy.

In the 1980s, I didn't need to worry about call conventions used for
other software, since there /was/ no other software! I had to write everything, save for the odd calls to DOS which used some form of SYSCALL.

Then, arrays and structs were actually passed and returned by value (not
via hidden references), by copying the data to and from the stack.

However, I don't recall ever using the feature, as I considered it inefficient. I always used explicit references in my code.

Most of the time, one is passing/returning structures as pointers, and
not by value.

By value structures are usually small.

When a structure is not small, it is both simpler to implement, and
usually faster, to internally pass it by reference.

If you pass a large structure to a function by value, via an on-stack
copy, and the function assigns it to another location (say, a global variable):
Pass by reference: Only a single copy operation is needed;
Pass by value on-stack: At least two copy operations are needed.

One also needs to reserve enough space in the function arguments list to
hold any structures passed, which could be bad if they are potentially
large.

But, on my ISA, ABI is sort of like:
R4 ..R7 : Arg0 ..Arg3
R20..R23: Arg4 ..Arg7
R36..R39: Arg8 ..Arg11 (optional)
R52..R55: Arg12..Arg15 (optional)
Return Value:
R2, R3:R2 (128 bit)
R2 is also used to pass in the return value pointer.
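
As a rough illustration of the register assignment listed above (a sketch only, not code from the actual compiler; the name arg_register and the -1 "on stack" convention are hypothetical), the argument-index-to-register mapping can be written as a small formula, since each group of four arguments sits 16 registers above the previous one:

```c
#include <assert.h>

/* Hypothetical helper: map an argument index to the GPR number from the
   table above. num_arg_regs is 8 or 16 depending on the ABI variant.
   Returns -1 when the argument would be passed on the stack instead. */
int arg_register(int arg_index, int num_arg_regs)
{
    assert(num_arg_regs == 8 || num_arg_regs == 16);
    if (arg_index < 0 || arg_index >= num_arg_regs)
        return -1;
    /* Groups of four: R4..R7, R20..R23, R36..R39, R52..R55. */
    return 4 + (arg_index & 3) + 16 * (arg_index >> 2);
}
```

For example, Arg5 maps to R21 in either variant, while Arg8 maps to R36 only in the 16-register variant.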

'this':
Generally passed in either R3 or R18, depending on ABI variant.

Where, callee-save:
R8 ..R14, R24..R31,
R40..R47, R56..R63
R15=SP

Non-saved scratch:
R2 ..R7 , R16..R23,
R32..R39, R48..R55

Arguments beyond the first 8/16 register arguments are passed on stack.
In this case, a spill space for the first 8/16 arguments (64 or 128
bytes) is provided on stack before the first non-register argument.

If the function accepts a fixed number of arguments and the number of
argument registers is 8 or less, spill space need only be provided for
the first 8 arguments (calling vararg functions will always reserve
space for 16 registers in the 16-register ABI). This spill space
effectively belongs to the callee rather than the caller.

Structures (by value):
1.. 8 bytes: Passed in a single register
9..16 bytes: Passed in a pair, padded to the next even pair
17+: Pass as a reference.

Things like 128-bit types are also passed/returned in register pairs.
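
The size rules above could be sketched roughly like this. The enum and function names are made up for illustration, and reading "padded to the next even pair" as "round the argument slot up to an even index" is an assumption, not something spelled out in the post:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a by-value struct argument. */
enum struct_pass { PASS_ONE_REG, PASS_REG_PAIR, PASS_BY_REF };

enum struct_pass classify_struct(size_t size)
{
    assert(size > 0);
    if (size <= 8)
        return PASS_ONE_REG;    /* 1..8 bytes: single register */
    if (size <= 16)
        return PASS_REG_PAIR;   /* 9..16 bytes: register pair */
    return PASS_BY_REF;         /* 17+ bytes: passed as a reference */
}

/* Assumption: a pair must start at an even argument slot, so an odd
   slot index is skipped (rounded up) before the pair is assigned. */
int align_to_pair(int next_slot)
{
    return (next_slot + 1) & ~1;
}
```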

Contrast, RV ABI:
X10..X17 are used for arguments;
No spill space is provided;
...

My variant uses similar rules to my own ABI for passing/returning
structures, with:
X28, structure return pointer
X29, 'this'
Normal return values go into X10 or X11:X10.

Note that in both ABI's, passing 'this' in a register would mean that
class instances and COM objects are not equivalent (COM object methods
always pass 'this' as the first argument).

The 'this' register is implicitly also used by lambdas to pass in the
pointer to the captured bindings area (which mostly resembles a
structure containing each variable captured by the lambda).

Can note though that in this case, capturing a binding by reference
means the lambda is limited to automatic lifetime (non-automatic lambdas
may only capture by value). In this case, capture by value is the default.
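
To make the "captured bindings area" concrete, here is a rough portable-C model of a by-value capturing lambda. All names are hypothetical, and the environment pointer is modelled as an explicit first parameter, where the ABI described above would pass it in the 'this' register:

```c
#include <assert.h>

/* Captured-bindings area for a hypothetical lambda that captures
   'scale' and 'bias' by value and computes x * scale + bias. */
struct lambda_env {
    int scale;
    int bias;
};

/* A closure is the code pointer plus its own copy of the bindings. */
struct closure {
    int (*fn)(const struct lambda_env *env, int x);
    struct lambda_env env;
};

static int lambda_body(const struct lambda_env *env, int x)
{
    return x * env->scale + env->bias;
}

/* By-value capture copies the variables, so the closure can outlive
   the frame that created it. */
struct closure make_scaler(int scale, int bias)
{
    struct closure c = { lambda_body, { scale, bias } };
    return c;
}

int call_closure(const struct closure *c, int x)
{
    assert(c->fn != 0);
    return c->fn(&c->env, x);
}
```

A by-reference capture would store pointers to the creator's locals in the environment instead, which is why such a lambda cannot safely outlive that frame.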

For my compiler targeting RISC-V, it uses a variation of RV's ABI rules.
Argument passing is basically similar, but struct pass/return is
different; and it passes floating-point values in GPRs (and, in my own
ISA, all floating-point values use GPRs, as there are no FPU
registers; though FPU registers do exist for RISC-V).

Supporting C's variadic functions, which is needed for many languages
when calling C across an FFI, usually requires different rules. On Win64
ABI for example, by passing low variadic arguments in both GPRs and FPU registers.

I simplified things by assuming only GPRs are used.

/Implementing/ variadic functions (which only occurs if implementing C)
is another headache if it has to work with the ABI (which can be assumed
for a non-static function).

I barely have a working solution for Win64 ABI, which needs to be done
via stdarg.h, but wouldn't have a clue how to do it for SYS V.

(Even Win64 has problems, as it assumes a downward-growing stack; in my
IL interpreter, the stack grows upwards!)

Most targets use a downward growing stack.
Mine is no exception here...

Not likely a huge issue as one is unlikely to use ELF and PE/COFF in
the same program.

For the "OS" that runs on my CPU core, it is natively using PE/COFF, but

That's interesting: you deliberately used one of the most complex file formats around, when you could have devised your own?

For what I wanted, I would have mostly needed to recreate most of the
same functionality as PE/COFF anyways.

When one considers the entire loading process (including DLLs/SOs), then PE/COFF loading is actually simpler than ELF loading (ELF subjects the
loader to needing to deal with symbol and relocation tables), similar to
PIE loading.

Things like the MZ stub are optional in my case, and mostly ignored if
present (in my LZ compressed PE variants, the MZ stub is omitted entirely).

I had at one point considered doing a custom format resembling LZ
compressed MachO, but ended up not bothering, as it wouldn't have really
saved anything over LZ compressed PE/COFF.

Some "unneeded cruft" like the Resource Section was discarded, mostly
replaced by an embedded WAD2 image. The header was modified some to
allow for backwards compatibility with the Windows format (mostly
creating a dummy header in the original format that points to the WAD2 directory).

Idea is that icons, bitmaps, and other things, would mostly be held in
WAD lumps. Though, resources which may be accessed via symbols in the
EXE/DLL need to be stored uncompressed (where "__rsrc_lumpname" may be
used to access the contents of resource-section lumps as an extern symbol).

Say, for example:
extern byte __rsrc_mybitmap[]; //resolves to a DIB/BMP or similar

For now, resource formats:
Images:
BMP (various settings)
4, 8, and 16 bpp typical
Supports a non-standard 16-bpp alpha-blended mode (*1).
Supports non-standard 16 color and 256 color with transparent.
Supports CRAM BMP as well (2 bpp)
QOI (assumes RGBA32, nominally lossless)
QOI is a semi-simplistic non-entropy-coded format.
Can give PNG-like compression in some cases.
Reasonably fast/cheap to decode.
LCIF, custom lossy format, color-cell compression.
OK Q/bpp but mostly only on the low-end.
Resembles a QOI+CRAM hybrid.
UPIC, lossy or lossless, JPEG-like (*2)

*1:
0rrrrrgggggbbbbb Normal/Opaque
1rrrraggggabbbba With 3 bit alpha (4b/ch RGB).
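
A decoder for the two bit layouts above might look roughly like this. It is a sketch under assumptions: the order of the three interleaved alpha bits (bit 10 taken as the most significant) and the bit-replication expansion to 8 bits are guesses, and the names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba8;

/* Expand small fields to 8 bits by bit replication. */
static uint8_t expand5(unsigned v)
{
    assert(v < 32);
    return (uint8_t)((v << 3) | (v >> 2));
}
static uint8_t expand4(unsigned v) { return (uint8_t)((v << 4) | v); }
static uint8_t expand3(unsigned v)
{
    return (uint8_t)((v << 5) | (v << 2) | (v >> 1));
}

rgba8 decode_px16(uint16_t px)
{
    rgba8 out;
    if (!(px & 0x8000)) {
        /* 0rrrrrgggggbbbbb: plain opaque RGB555 */
        out.r = expand5((px >> 10) & 31);
        out.g = expand5((px >> 5) & 31);
        out.b = expand5(px & 31);
        out.a = 255;
    } else {
        /* 1rrrraggggabbbba: 4 bits per channel, alpha bits at 10, 5, 0.
           Assumption: bit 10 is the most significant alpha bit. */
        unsigned a = (((px >> 10) & 1u) << 2) |
                     (((px >> 5) & 1u) << 1) |
                     (px & 1u);
        out.r = expand4((px >> 11) & 15);
        out.g = expand4((px >> 6) & 15);
        out.b = expand4((px >> 1) & 15);
        out.a = expand3(a);
    }
    return out;
}
```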

For 16 and 256 color, a variant is supported with a transparent color. Generally the high intensity magenta is reused as the transparent color.
This is encoded in the color palette (if all colors apart from one have
the alpha bits set to FF, and one color has 00, then that color is
assumed to be a transparent color).

CRAM bpp: Uses a limited form of the 8-bit CRAM format:
16 bits, 4x4 pixels, 1 bit per pixel
2x 8 bits: Color Endpoints
The rest of the format being unsupported, so it can simply assume a
fixed 32-bits per 4x4 pixel cell.
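
Decoding one such fixed 32-bit cell to palette indices could be sketched as below. The byte order (mask first, little-endian, then the two endpoints), the bit-to-pixel order, and which endpoint a set bit selects are all assumptions here, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 4x4 cell (4 bytes) into 8-bit palette indices.
   dst points at the top-left pixel; stride is the row pitch in pixels. */
void cram2_decode_cell(const uint8_t cell[4], uint8_t *dst, int stride)
{
    assert(stride >= 4);
    /* Assumed layout: 16-bit selector mask, then two color endpoints. */
    unsigned mask = cell[0] | ((unsigned)cell[1] << 8);
    uint8_t color0 = cell[2];
    uint8_t color1 = cell[3];

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            /* Assumed order: bit 0 is the top-left pixel, row-major. */
            int bit = (mask >> (y * 4 + x)) & 1;
            dst[y * stride + x] = bit ? color1 : color0;
        }
    }
}
```

Since every cell is exactly 4 bytes, the cell at column cx, row cy of a w-pixel-wide image starts at byte offset 4 * (cy * (w / 4) + cx), with no need to parse earlier cells.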

*2: The UPIC format is structurally similar to JPEG, but:
Uses TLV packaging (vs FF-escape tagging);
Uses Rice coding (vs Huffman)
Uses Z3.V5 VLC, vs Z4.V4
Uses Block-Haar and RCT
Vs DCT and YCbCr.
Supports an alpha channel.
Y 1 (*2A)
YA 1:1 (*2A)
YUV 4:2:0
YUV 4:4:4 (*2A)
YUVA 4:2:0:4
YUVA 4:4:4:4 (*2A)
*2A: May be used in the lossless modes, depending on image.

VLC coding resembles Deflate's match distance encoding, with sign-folded values. Runs of zero coefficients have a shorter limit, but similar.
Like with JPEG, an 0x00 symbol encodes an early EOB.

In tests, on my main PC:
Vs JPEG: It is a little faster
Q/bpp is similar, better/worse depends on image.
Slightly worse on photos, but "similar".
Generally somewhat better on artificial images.
Vs PNG:
Faster to decode (with less memory overhead);
Better compression on many images (particularly photo-like).

Note that UPIC was designed to not require any large intermediate
buffers, so will decode directly to an RGB555 or RGBA32 output buffer (decoding happens in terms of individual 16x16 pixel macroblocks).

It was designed to be moderately fast and to try to minimize memory
overhead for decoding (vs either PNG or JPEG, which need a more
significant chunk of working memory to decode).

Block-Haar is a Haar transform made to fit the same 8x8 pixel blocks as
DCT, where Haar maps (A,B)->(C,D):
C=(A+B)/2 (*: X/2 here being defined as (X>>1))
D=A-B
But, can be reversed exactly, IIRC:
B=C-(D/2)
A=B+D
By doing multiple stages of Haar transform, one can build an 8-pixel
version, and then use horizontal and vertical transforms for an 8x8
block. It is computationally fairly cheap, and lossless.
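
The exact reversibility can be checked with a small sketch. The pair transform follows the formulas above; the three-stage 8-point layout (averages in the low half, differences in the high half) is one plausible arrangement and not necessarily the one UPIC uses. Because right-shifting a negative int is implementation-defined in C, the sketch uses an explicit floor-divide helper in place of X>>1:

```c
#include <assert.h> /* only needed for the checks in the usage note */

/* floor(x / 2), avoiding >> on negative values. */
static int half_floor(int x)
{
    return (x >= 0) ? x / 2 : -((-x + 1) / 2);
}

void haar_fwd(int a, int b, int *c, int *d)
{
    *c = half_floor(a + b);   /* C = (A+B)/2 */
    *d = a - b;               /* D = A-B     */
}

void haar_inv(int c, int d, int *a, int *b)
{
    *b = c - half_floor(d);   /* B = C-(D/2) */
    *a = *b + d;              /* A = B+D     */
}

/* 8-point forward transform: three stages over 8, 4, then 2 values. */
void haar8_fwd(int v[8])
{
    int t[8];
    for (int n = 8; n > 1; n >>= 1) {
        for (int i = 0; i < n / 2; i++)
            haar_fwd(v[2 * i], v[2 * i + 1], &t[i], &t[n / 2 + i]);
        for (int i = 0; i < n; i++)
            v[i] = t[i];
    }
}

/* Inverse: undo the stages in the opposite order. */
void haar8_inv(int v[8])
{
    int t[8];
    for (int n = 2; n <= 8; n <<= 1) {
        for (int i = 0; i < n / 2; i++)
            haar_inv(v[i], v[n / 2 + i], &t[2 * i], &t[2 * i + 1]);
        for (int i = 0; i < n; i++)
            v[i] = t[i];
    }
}

/* Returns 1 when inverse(forward(src)) reproduces src exactly. */
int haar8_roundtrips(const int src[8])
{
    int w[8], ok = 1;
    for (int i = 0; i < 8; i++) w[i] = src[i];
    haar8_fwd(w);
    haar8_inv(w);
    for (int i = 0; i < 8; i++) ok &= (w[i] == src[i]);
    return ok;
}
```

The reason it reverses exactly: A+B = 2B+D, so floor((A+B)/2) = B + floor(D/2), and the rounding lost in C is recoverable from D. An 8x8 block would apply haar8_fwd to each row and then to each column.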

The Walsh-Hadamard transform can give similar properties, but generally involves a few extra steps that make it more computationally expensive.

It is possible to use a lifting transform to make a Reversible DCT, but
it is slow...

BGBCC accepts JPEG and PNG for input and can convert them to
BMP/QOI/UPIC as needed.

For audio storage, generally using the RIFF WAV format. For bulk audio,
both A-Law and IMA ADPCM work OK. Granted, IMA ADPCM is not space
efficient for stereo, but mostly OK for mono (most common use-case for
sound effects).

I did exactly that at a period when my generated DLLs were buggy for
some reason (it turned out to be two reasons). I created a simple
dynamic library format of my own. Then I found the same format worked
also for executables.

But I needed a loader program to run them, as Windows obviously didn't understand the format. Such a program can be written in 800 lines of C,
and can dynamically load libraries in both my format, and proper DLLs (not
the buggy ones I generated!).

A hello-world program is under 300 bytes compared with 2 or
2.5KB of EXE. And the format is portable to Linux, so no need to
generate ELF (but I haven't tried). Plus the format might be transparent
to AV software (haven't tried that either).

OK.

By design, my PEL format (PE+LZ) isn't going to get under 2K (1K for
headers, 1K for LZ'ed sections).

But, usually this is not a problem.

From BGB@cr88192@gmail.com to comp.lang.c on Fri Dec 20 05:10:44 2024

From Newsgroup: comp.lang.c

On 12/19/2024 2:36 PM, BGB wrote:

On 12/19/2024 5:27 AM, bart wrote:

On 19/12/2024 05:46, BGB wrote:

On 12/18/2024 6:35 PM, bart wrote:

On 19/12/2024 00:27, BGB wrote:

By-Value Structs smaller than 16 bytes are passed as-if they were a
64 or 128 bit integer type (as a single register or as a register
pair, with a layout matching their in-memory representation).

...

But, yeah, at the IL level, one could potentially eliminate structs
and arrays as a separate construct, and instead have bare pointers
and a generic "reserve a blob of bytes in the frame and initialize
this pointer to point to it" operator (with the business end of
this operator happening in the function prolog).

The problem with this, that I mentioned elsewhere, is how well it
would work with SYS V ABI, since the rules for structs are complex,
and apparently recursive.

Having just a block of bytes might not be enough.

In my case, I am not bothering with the SysV style ABI's (well, along
with there not being any x86 or x86-64 target...).

I'd imagine it's worse with ARM targets as there are so many more
registers to try and deconstruct structs into.

Not messed much with the ARM64 ABI or similar, but I will draw the line
in the sand somewhere.

Struct passing/return is enough of an edge case that one can just sort
of declare it "no go" between compilers with "mostly but not strictly compatible" ABIs.

For my ISA, it is a custom ABI, but follows mostly similar rules to
some of the other "Microsoft style" ABIs (where, I have noted that
across multiple targets, MS tools have tended to use similar ABI
designs).

When you do your own thing, it's easy.

In the 1980s, I didn't need to worry about call conventions used for
other software, since there /was/ no other software! I had to write
everything, save for the odd calls to DOS which used some form of
SYSCALL.

Then, arrays and structs were actually passed and returned by value
(not via hidden references), by copying the data to and from the stack.

However, I don't recall ever using the feature, as I considered it
inefficient. I always used explicit references in my code.

Most of the time, one is passing/returning structures as pointers, and
not by value.

By value structures are usually small.

When a structure is not small, it is both simpler to implement, and
usually faster, to internally pass it by reference.

If you pass a large structure to a function by value, via an on-stack
copy, and the function assigns it to another location (say, a global variable):
Pass by reference: Only a single copy operation is needed;
Pass by value on-stack: At least two copy operations are needed.

One also needs to reserve enough space in the function arguments list to hold any structures passed, which could be bad if they are potentially large.

But, on my ISA, ABI is sort of like:
R4 ..R7 : Arg0 ..Arg3
R20..R23: Arg4 ..Arg7
R36..R39: Arg8 ..Arg11 (optional)
R52..R55: Arg12..Arg15 (optional)
Return Value:
R2, R3:R2 (128 bit)
R2 is also used to pass in the return value pointer.

'this':
Generally passed in either R3 or R18, depending on ABI variant.

Where, callee-save:
R8 ..R14, R24..R31,
R40..R47, R56..R63
R15=SP

Non-saved scratch:
R2 ..R7 , R16..R23,
R32..R39, R48..R55

Arguments beyond the first 8/16 register arguments are passed on stack.
In this case, a spill space for the first 8/16 arguments (64 or 128
bytes) is provided on stack before the first non-register argument.

If the function accepts a fixed number of arguments and the number of argument registers is 8 or less, spill space need only be provided for
the first 8 arguments (calling vararg functions will always reserve
space for 16 registers in the 16-register ABI). This spill space
effectively belongs to the callee rather than the caller.

Structures (by value):
1.. 8 bytes: Passed in a single register
9..16 bytes: Passed in a pair, padded to the next even pair
17+: Pass as a reference.

Things like 128-bit types are also passed/returned in register pairs.

Contrast, RV ABI:
X10..X17 are used for arguments;
No spill space is provided;
...

My variant uses similar rules to my own ABI for passing/returning structures, with:
X28, structure return pointer
X29, 'this'
Normal return values go into X10 or X11:X10.

Note that in both ABI's, passing 'this' in a register would mean that
class instances and COM objects are not equivalent (COM object methods always pass 'this' as the first argument).

The 'this' register is implicitly also used by lambdas to pass in the pointer to the captured bindings area (which mostly resembles a
structure containing each variable captured by the lambda).

Can note though that in this case, capturing a binding by reference
means the lambda is limited to automatic lifetime (non-automatic lambdas
may only capture by value). In this case, capture by value is the default.

For my compiler targeting RISC-V, it uses a variation of RV's ABI rules.
Argument passing is basically similar, but struct pass/return is
different; and it passes floating-point values in GPRs (and, in my
own ISA, all floating-point values use GPRs, as there are no FPU
registers; though FPU registers do exist for RISC-V).

Supporting C's variadic functions, which is needed for many languages
when calling C across an FFI, usually requires different rules. On
Win64 ABI for example, by passing low variadic arguments in both GPRs
and FPU registers.

I simplified things by assuming only GPRs are used.

/Implementing/ variadic functions (which only occurs if implementing
C) is another headache if it has to work with the ABI (which can be
assumed for a non-static function).

I barely have a working solution for Win64 ABI, which needs to be done
via stdarg.h, but wouldn't have a clue how to do it for SYS V.

(Even Win64 has problems, as it assumes a downward-growing stack; in
my IL interpreter, the stack grows upwards!)

Most targets use a downward growing stack.
Mine is no exception here...

Not likely a huge issue as one is unlikely to use ELF and PE/COFF in
the same program.

For the "OS" that runs on my CPU core, it is natively using PE/COFF, but

That's interesting: you deliberately used one of the most complex file
formats around, when you could have devised your own?

For what I wanted, I would have mostly needed to recreate most of the
same functionality as PE/COFF anyways.

When one considers the entire loading process (including DLLs/SOs), then PE/COFF loading is actually simpler than ELF loading (ELF subjects the loader to needing to deal with symbol and relocation tables), similar to
PIE loading.

My wording there sucked...

PIE loading is the same as the case for ELF shared object loading, so is fairly complex.

For normal loading, they try to make it simpler for the kernel loader by having a special "interpreter" program deal with it. The process it then
uses to bootstrap itself is rather convoluted.

Things like the MZ stub are optional in my case, and mostly ignored if present (in my LZ compressed PE variants, the MZ stub is omitted entirely).

My loader will accept multiple sub-variants:
With MZ stub (original format);
Without MZ stub (but uncompressed);
With LZ4 compression (no MZ stub allowed).

The format for the no-stub case is basically the same as the with-stub
case, except that the stub is absent and thus the 'PE' sig is still present.
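
For the two uncompressed variants, locating the PE header might be sketched like this. The MZ path uses the standard PE/COFF layout (32-bit little-endian e_lfanew at offset 0x3C); treating a stub-less image as starting directly with the "PE\0\0" signature is my reading of the description above, and the function name is hypothetical. The LZ4 variant is left out, since its signature is not given here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the byte offset of the "PE\0\0" signature, or -1 if the
   image is neither a stub-less PE nor an MZ-stubbed PE. */
int64_t find_pe_header(const uint8_t *img, size_t size)
{
    static const uint8_t pe_sig[4] = { 'P', 'E', 0, 0 };

    assert(img != NULL);

    /* No MZ stub: assumed to begin with the PE signature itself. */
    if (size >= 4 && memcmp(img, pe_sig, 4) == 0)
        return 0;

    /* MZ stub: e_lfanew at 0x3C points at the PE signature. */
    if (size >= 0x40 && img[0] == 'M' && img[1] == 'Z') {
        uint32_t off = (uint32_t)img[0x3C] |
                       ((uint32_t)img[0x3D] << 8) |
                       ((uint32_t)img[0x3E] << 16) |
                       ((uint32_t)img[0x3F] << 24);
        if (off <= size - 4 && memcmp(img + off, pe_sig, 4) == 0)
            return (int64_t)off;
    }
    return -1;
}
```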

Note that in my variants, omitting the MZ stub does cause it to change
to a different checksum algorithm (the original PE/COFF checksum being unacceptably weak).

I had at one point considered doing a custom format resembling LZ
compressed MachO, but ended up not bothering, as it wouldn't have really saved anything over LZ compressed PE/COFF.

The core process is still:
Read stuff into memory;
Apply post-load fixups.

This part of the process was essentially unavoidable.

Some "unneeded cruft" like the Resource Section was discarded, mostly replaced by an embedded WAD2 image. The header was modified some to
allow for backwards compatibility with the Windows format (mostly
creating a dummy header in the original format that points to the WAD2 directory).

Note that the change of resource section format was more because the
original approach to the resource section made little sense to me.

Identifying things with short names made a lot more sense than magic
numbers.

The WAD approach worked for Doom and similar, probably sufficient for
things like inline bitmap images and icons.

Idea is that icons, bitmaps, and other things, would mostly be held in
WAD lumps. Though, resources which may be accessed via symbols in the EXE/DLL need to be stored uncompressed (where "__rsrc_lumpname" may be
used to access the contents of resource-section lumps as an extern symbol).

Note that it can also load blobs of text or binary data.
Though, BGBCC provides less in terms of format converters for arbitrary
data.

A special text format is used both to define files to pull into the
resource section (and what lump name to use), as well as format
conversions to apply.

Say, for example:
extern byte __rsrc_mybitmap[]; //resolves to a DIB/BMP or similar

For now, resource formats:
Images:
    BMP (various settings)
      4, 8, and 16 bpp typical
      Supports a non-standard 16-bpp alpha-blended mode (*1).
      Supports non-standard 16 color and 256 color with transparent.
      Supports CRAM BMP as well (2 bpp)
    QOI (assumes RGBA32, nominally lossless)
      QOI is a semi-simplistic non-entropy-coded format.
      Can give PNG-like compression in some cases.
      Reasonably fast/cheap to decode.
    LCIF, custom lossy format, color-cell compression.
      OK Q/bpp but mostly only on the low-end.
      Resembles a QOI+CRAM hybrid.
    UPIC, lossy or lossless, JPEG-like (*2)

*1:
0rrrrrgggggbbbbb Normal/Opaque
1rrrraggggabbbba With 3 bit alpha (4b/ch RGB).

For 16 and 256 color, a variant is supported with a transparent color. Generally the high intensity magenta is reused as the transparent color. This is encoded in the color palette (if all colors apart from one have
the alpha bits set to FF, and one color has 00, then that color is
assumed to be a transparent color).

CRAM bpp: Uses a limited form of the 8-bit CRAM format:
16 bits, 4x4 pixels, 1 bit per pixel
2x 8 bits: Color Endpoints
The rest of the format being unsupported, so it can simply assume a
fixed 32-bits per 4x4 pixel cell.

There being cases where one may want this...
If an image doesn't have more than 2 colors per 4x4 cell, it may give an acceptable image (and is often less space than 16-color).

Though, for small images, 16 color may use less space due to a smaller
color palette (but, in theory, could add a special case to allow
omitting the color palette when it is the default palette).

Say:
biBitCount=8, biClrUsed=0, biClrImportant=256
Encoding a special "palette is absent, use fixed OS palette" case.
As the BMP format burns 1K just to encode a 256-color palette.

*2: The UPIC format is structurally similar to JPEG, but:
Uses TLV packaging (vs FF-escape tagging);
Uses Rice coding (vs Huffman)
Uses Z3.V5 VLC, vs Z4.V4
Uses Block-Haar and RCT
    Vs DCT and YCbCr.
Supports an alpha channel.
    Y    1       (*2A)
    YA   1:1     (*2A)
    YUV 4:2:0
    YUV 4:4:4   (*2A)
    YUVA 4:2:0:4
    YUVA 4:4:4:4 (*2A)
*2A: May be used in the lossless modes, depending on image.

VLC coding resembles Deflate's natch distance encoding, with sign-folded values. Runs of zero coefficients have a shorter limit, but similar.
Like with JPEG, an 0x00 symbol encodes an early EOB.

^ match. Also, UPIC is a custom format.

Add context:
Actually, it is using an entropy coding scheme I call STF+AdRice:
Swap towards front, with Adaptive Rice Coding.

The Rice coding parameter (k) is adapted based on Q:
0: k--;
1: no change;
2..7: k++
8: k++; Symbol index encoded as a raw 8 bits.

Symbols are encoded as indices into a table. Whenever an index is
encoded, the symbol swaps places with the symbol at (I*15)/16, causing
more commonly used symbols to migrate towards 0.
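
A minimal sketch of how such a scheme could fit together, for illustration only. The adaptation rule and the (I*15)/16 swap follow the description above; the bit order (LSB-first), the unary convention (Q one-bits then a zero-bit, no terminator after the escape), the starting k of 4, and clamping k to 0..7 are assumptions, as are all names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t tab[256];  /* index -> symbol; common symbols drift toward 0 */
    int k;             /* current Rice parameter */
    uint8_t *buf;      /* bit buffer, LSB-first; zero-filled when encoding */
    size_t pos;        /* current bit position */
} adrice;

void adrice_init(adrice *c, uint8_t *buf)
{
    for (int i = 0; i < 256; i++) c->tab[i] = (uint8_t)i;
    c->k = 4;          /* assumed starting value */
    c->buf = buf;
    c->pos = 0;
}

static void put_bits(adrice *c, unsigned v, int n)
{
    for (int i = 0; i < n; i++, c->pos++)
        if ((v >> i) & 1u)
            c->buf[c->pos >> 3] |= (uint8_t)(1u << (c->pos & 7));
}

static unsigned get_bits(adrice *c, int n)
{
    unsigned v = 0;
    for (int i = 0; i < n; i++, c->pos++)
        v |= ((c->buf[c->pos >> 3] >> (c->pos & 7)) & 1u) << i;
    return v;
}

/* Shared by encoder and decoder so both stay in sync. */
static void adapt(adrice *c, unsigned idx, unsigned q)
{
    assert(idx < 256);
    if (q == 0) { if (c->k > 0) c->k--; }        /* Q=0: k--        */
    else if (q >= 2) { if (c->k < 7) c->k++; }   /* Q=2..8: k++     */
    unsigned j = (idx * 15) / 16;                /* swap toward front */
    uint8_t t = c->tab[idx];
    c->tab[idx] = c->tab[j];
    c->tab[j] = t;
}

void adrice_put(adrice *c, uint8_t sym)
{
    unsigned idx = 0;
    while (c->tab[idx] != sym) idx++;            /* tab is a permutation */
    unsigned q = idx >> c->k;
    if (q < 8) {
        put_bits(c, (1u << q) - 1, (int)q + 1);  /* q ones, then a zero */
        put_bits(c, idx, c->k);                  /* low k bits of index */
    } else {
        q = 8;
        put_bits(c, 0xFF, 8);                    /* escape: 8 ones */
        put_bits(c, idx, 8);                     /* raw 8-bit index */
    }
    adapt(c, idx, q);
}

uint8_t adrice_get(adrice *c)
{
    unsigned q = 0, idx;
    while (q < 8 && get_bits(c, 1)) q++;
    if (q < 8) idx = (q << c->k) | get_bits(c, c->k);
    else       idx = get_bits(c, 8);
    uint8_t sym = c->tab[idx];
    adapt(c, idx, q);
    return sym;
}
```

Note that the only state is the 256-byte table and k, which is where the "less memory, faster to initialize" trade-off against a table-driven Huffman decoder comes from.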

Theoretically, the decoding process is more complex than a table-driven
static Huffman decoder (as well as worse compression), but:
Less memory is needed;
Faster to initialize;
On average, it is speed competitive.
Lookup table initialization for static Huffman is expensive;
Decode speed hindered by high L1 miss rates.

With a 15-bit symbol-length limit, Huffman has a very high L1 miss rate. Generally, to be fast, one needs to impose a 12 or 13 bit symbol length
limit, reducing compression, but greatly reducing the number of L1
misses. Though, 12 bits is a lower limit in practice (going much less
than this, and Huffman coding becomes ineffective).

In tests, on my main PC:
Vs JPEG: It is a little faster
    Q/bpp is similar, better/worse depends on image.
      Slightly worse on photos, but "similar".
      Generally somewhat better on artificial images.
Vs PNG:
    Faster to decode (with less memory overhead);
    Better compression on many images (particularly photo-like).

Note that UPIC was designed to not require any large intermediate
buffers, so will decode directly to an RGB555 or RGBA32 output buffer (decoding happens in terms of individual 16x16 pixel macroblocks).

It was designed to be moderately fast and to try to minimize memory
overhead for decoding (vs either PNG or JPEG, which need a more
significant chunk of working memory to decode).

Block-Haar is a Haar transform made to fit the same 8x8 pixel blocks as
DCT, where Haar maps (A,B)->(C,D):
C=(A+B)/2 (*: X/2 here being defined as (X>>1))
D=A-B
But, can be reversed exactly, IIRC:
B=C-(D/2)
A=B+D
By doing multiple stages of Haar transform, one can build an 8-pixel version, and then use horizontal and vertical transforms for an 8x8
block. It is computationally fairly cheap, and lossless.

The Walsh-Hadamard transform can give similar properties, but generally involves a few extra steps that make it more computationally expensive.

It is possible to use a lifting transform to make a Reversible DCT, but
it is slow...

Also, the code-size footprint for UPIC is smaller than a JPEG decoder.

BGBCC accepts JPEG and PNG for input and can convert them to BMP/QOI/
UPIC as needed.

For audio storage, generally using the RIFF WAV format. For bulk audio,
both A-Law and IMA ADPCM work OK. Granted, IMA ADPCM is not space
efficient for stereo, but mostly OK for mono (most common use-case for
sound effects).

This isn't used much yet in this project.

In general, for other cases where I use audio, 16kHz is a typical default.

Where:
8 and 11 kHz sound poor.
Also 8-bit linear PCM sounds poor.

I am less a fan of MP3:
Very complex decoder;
Much under 96 or 128 kbps, has very obvious audio distortions...
At lower bitrates, the audio quality is decidedly unpleasant.
IMHO: 16 kHz ADPCM sounds better than 64 kbps MP3.

Not sure why it is so popular, when, as noted, at lower bitrates it
sounds pretty broken (but, then again, it mostly sounds fine at 128
kbps or beyond, so dunno).

ADPCM's property of sounding tinny is still preferable to sounding like
one is rattling a steel can full of broken glass, IMHO.

Did experimentally create an MP3-like audio codec (but much simpler),
also using Block-Haar (rather than MDCT), and reused some amount of code
from UPIC, which seems to avoid some of MP3's more obvious artifacts.
But, the design did have a few of its own issues (might need to revisit later).

Mostly, it uses a half-cubic spline to approximate the low-frequency components (and try to reduce blocking artifacts; the spline is
subtracted out so only higher frequency components use the Block-Haar),
but seemingly the spline was too coarse (one sample per block), and I
would likely need a higher effective sampling rate for the spline to
avoid blocking artifacts in some cases (mostly, with sounds at roughly
the same frequency as the block size effectively resulting in square
waves, which sound bad).

I did exactly that at a period when my generated DLLs were buggy for
some reason (it turned out to be two reasons). I created a simple
dynamic library format of my own. Then I found the same format worked
also for executables.

But I needed a loader program to run them, as Windows obviously didn't
understand the format. Such a program can be written in 800 lines of
C, and can dynamically load libraries in both my format, and proper DLLs
(not the buggy ones I generated!).

A hello-world program is under 300 bytes compared with 2 or
2.5KB of EXE. And the format is portable to Linux, so no need to
generate ELF (but I haven't tried). Plus the format might be
transparent to AV software (haven't tried that either).

OK.

By design, my PEL format (PE+LZ) isn't going to get under 2K (1K for headers, 1K for LZ'ed sections).

But, usually this is not a problem.

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Fri Dec 20 17:28:29 2024

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Sat Dec 21 05:34:07 2024

From Newsgroup: comp.lang.c

On Tue, 17 Dec 2024 13:07:44 -0600, BGB wrote:

Every variable may only be assigned once ...

Note this only applies to registers.

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sat Dec 21 21:31:24 2024

From Newsgroup: comp.lang.c

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

(Unless you just wanted to say that in some HLL abstraction like
'printf("Hello world!\n")' there's no [visible] conditional branch.
Likewise in a 'ClearAccumulator' machine instruction, or the like.)

The comparisons and predicates are one key function (not any specific
branch construct, whether on HLL level, assembler level, or with the (elementary but most powerful) Turing Machine). Comparisons inherently
result in predicates, which are what control program execution.

So your statement asks for some explanation at least.

Janis

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Dec 21 13:51:27 2024

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

(Unless you just wanted to say that in some HLL abstraction like 'printf("Hello world!\n")' there's no [visible] conditional branch.
Likewise in a 'ClearAccumulator' machine instruction, or the like.)

The comparisons and predicates are one key function (not any specific
branch construct, whether on HLL level, assembler level, or with the (elementary but most powerful) Turing Machine). Comparisons inherently result in predicates, which are what control program execution.

So your statement asks for some explanation at least.

Start with C - any of C90, C99, C11.

Take away the short-circuiting operators - &&, ||, ?:.

Take away all statement types that involve intra-function transfer
of control: goto, break, continue, if, for, while, switch, do/while.
Might as well take away statement labels too.

Take away setjmp and longjmp.

Rule out programs with undefined behavior.

The language that is left is still Turing complete.

Proof: exercise for the reader.
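
One possible ingredient of such a proof, offered as an illustration and not necessarily the construction intended here: a comparison still yields 0 or 1, and that value can index a table of function pointers, so data-dependent control flow survives without if, ?:, &&, ||, loops, or goto. The names below are hypothetical, and the sketch shows only the selection mechanism, not the unbounded storage a full argument would also need:

```c
#include <assert.h> /* only needed for the check in the usage note */

typedef unsigned long (*step_fn)(unsigned long n, unsigned long acc);

static unsigned long fact_done(unsigned long n, unsigned long acc);
static unsigned long fact_more(unsigned long n, unsigned long acc);

/* Index 0: stop and return the accumulator; index 1: keep recursing. */
static const step_fn fact_table[2] = { fact_done, fact_more };

static unsigned long fact_done(unsigned long n, unsigned long acc)
{
    (void)n;
    return acc;
}

static unsigned long fact_more(unsigned long n, unsigned long acc)
{
    /* (n - 1 != 0) evaluates to 0 or 1 and selects the next step. */
    return fact_table[n - 1 != 0](n - 1, acc * n);
}

unsigned long factorial(unsigned long n)
{
    return fact_table[n != 0](n, 1);
}
```

With this, assert(factorial(5) == 120) holds, and none of the removed constructs appear in the three functions.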

From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Dec 22 00:20:32 2024

From Newsgroup: comp.lang.c

On Sat, 21 Dec 2024 21:31:24 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

So your statement asks for some explanation at least.

Janis

I would guess that Tim worked as CS professor for several dozen years.
And it shows.

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Dec 22 01:13:07 2024

From Newsgroup: comp.lang.c

On 21.12.2024 23:20, Michael S wrote:

On Sat, 21 Dec 2024 21:31:24 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

So your statement asks for some explanation at least.

I would guess that Tim worked as CS professor for several dozen years.
And it shows.

Ranks and titles are, per se, no guarantee. I'm not impressed; I've
seen all sorts/qualities of professors. YMMV.

If that is true (that he was one) I'm wondering why we observe so
often that he posts statements here and doesn't care to explain it.
At least the many _good_ professors I met in my life typically were
keen to explain their theses, statements, or knowledge (instead of
dragging that out of him).

Janis

--- Synchronet 3.20a-Linux NewsLink 1.114

From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Dec 22 02:18:51 2024

From Newsgroup: comp.lang.c

On Sun, 22 Dec 2024 01:13:07 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 23:20, Michael S wrote:

On Sat, 21 Dec 2024 21:31:24 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

So your statement asks for some explanation at least.

I would guess that Tim worked as a CS professor for several dozen
years. And it shows.

Ranks and titles are, per se, no guarantee. I'm not impressed; I've
seen all sorts/qualities of professors. YMMV.

If that is true (that he was one) I'm wondering why we observe so
often that he posts statements here and doesn't care to explain it.
At least the many _good_ professors I met in my life typically were
keen to explain their theses, statements, or knowledge (instead of
dragging that out of him).

Janis

It seems, you didn't understand me. (Ogh, it is contagious ;-)

--- Synchronet 3.20a-Linux NewsLink 1.114

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Dec 22 01:22:01 2024

From Newsgroup: comp.lang.c

On 21.12.2024 22:51, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

(Unless you just wanted to say that in some HLL abstraction like
'printf("Hello world!\n")' there's no [visible] conditional branch.
Likewise in a 'ClearAccumulator' machine instruction, or the like.)

The comparisons and predicates are one key function (not any specific
branch construct, whether on HLL level, assembler level, or with the
(elementary but most powerful) Turing Machine). Comparisons inherently
result in predicates which is what controls program execution).

So your statement asks for some explanation at least.

Start with C - any of C90, C99, C11.

Take away the short-circuiting operators - &&, ||, ?:.

Take away all statement types that involve intra-function transfer
of control: goto, break, continue, if, for, while, switch, do/while.
Might as well take away statement labels too.

Take away setjmp and longjmp.

And also things like the above mentioned 'printf()' that most certainly
implies an iteration over the format string checking for its '\0'-end.
And so on, and so on. - What will be left as "language"?

Would you be able to formulate functionality of the class of Recursive
Functions (languages class of a Turing Machine with Chomsky-0 grammar)?

Rule out programs with undefined behavior.

The language that is left is still Turing complete.

Is it? - But wouldn't that be just the argument I mentioned above; that
a, say, 'ClearAccumulator' machine statement wouldn't contain any jump?

Proof: exercise for the reader.

(Typical sort of your reply.)

Janis

--- Synchronet 3.20a-Linux NewsLink 1.114

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Dec 22 01:39:49 2024

From Newsgroup: comp.lang.c

On 22.12.2024 01:18, Michael S wrote:

On Sun, 22 Dec 2024 01:13:07 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 23:20, Michael S wrote:

On Sat, 21 Dec 2024 21:31:24 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

So your statement asks for some explanation at least.

I would guess that Tim worked as a CS professor for several dozen
years. And it shows.

Ranks and titles are, per se, no guarantee. I'm not impressed; I've
seen all sorts/qualities of professors. YMMV.

If that is true (that he was one) I'm wondering why we observe so
often that he posts statements here and doesn't care to explain it.
At least the many _good_ professors I met in my life typically were
keen to explain their theses, statements, or knowledge (instead of
dragging that out of him).

It seems, you didn't understand me. (Ogh, it is contagious ;-)

I'm sorry, no. - I certainly took it literally - as I do (at first)
with most people and their statements (until I get to know better).

If it was meant sarcastically or anything, I'd appreciate a smiley
or something like that. (It certainly wasn't obvious to me.)

If it was meant serious and I completely missed the point - which
may also happen occasionally - I'd appreciate a pointer.

Janis

--- Synchronet 3.20a-Linux NewsLink 1.114

From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Dec 22 03:04:51 2024

From Newsgroup: comp.lang.c

On Sun, 22 Dec 2024 01:39:49 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 22.12.2024 01:18, Michael S wrote:

On Sun, 22 Dec 2024 01:13:07 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 23:20, Michael S wrote:

On Sat, 21 Dec 2024 21:31:24 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

So your statement asks for some explanation at least.

I would guess that Tim worked as a CS professor for several dozen
years. And it shows.

Ranks and titles are, per se, no guarantee. I'm not impressed; I've
seen all sorts/qualities of professors. YMMV.

If that is true (that he was one) I'm wondering why we observe so
often that he posts statements here and doesn't care to explain it.
At least the many _good_ professors I met in my life typically were
keen to explain their theses, statements, or knowledge (instead of
dragging that out of him).

It seems, you didn't understand me. (Ogh, it is contagious ;-)

I'm sorry, no. - I certainly took it literally - as I do (at first)
with most people and their statements (until I get to know better).

If it was meant sarcastically or anything, I'd appreciate a smiley
or something like that. (It certainly wasn't obvious to me.)

If it was meant serious and I completely missed the point - which
may also happen occasionally - I'd appreciate a pointer.

Janis

Part of the answer is in your previous response.
You wrote: "many _good_ professors I met in my life typically were
keen to explain their theses, statements, or knowledge (instead of
dragging that out of him)". You essentially admitted that not all good
professors behave like that.

There is more than one school of teaching. One school believes that
students learn from explanations and exercises. Other school believes
that students learn best when provided with bare basics and then asked
to figure out the rest by themselves. There is also the third school
that believes that students don't really learn anything before they try
to explain it to somebody else.

You make an impression of one that received basics of CS. Probably, 40
or so years ago, but still you have to know basic facts. Unlike me, for
example.
So, Tim expects that you will be able to utilize his hints. And that
it would lead to much better understanding on your part than if he
feeds you by teaspoon.
That is one part. Another part is that he is annoyed by your tone.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Dec 22 03:06:54 2024

From Newsgroup: comp.lang.c

On 22.12.2024 02:04, Michael S wrote:

[...]

Part of the answer is in your previous response.
You wrote: "many _good_ professors I met in my life typically were
keen to explain their theses, statements, or knowledge (instead of
dragging that out of him)". You essentially admitted that not all good
professors behave like that.

Oh, what I meant to express was different; that good professors
*would* explain it (only bad ones wouldn't).

(At least that was my experience; and not only covering the CS
domain, BTW.)

[ "schools of teaching" stuff snipped ]

You make an impression of one that received basics of CS. Probably, 40
or so years ago, but still you have to know basic facts. Unlike me, for
example.
So, Tim expects that you will be able to utilize his hints.

The point [repeatedly] stated (also by others here) was that
he more often than not just provides no information but simple
arbitrary statements of opinion.

*Especially* if folks here that are discussing CS stuff have 40
or 50 years experience, as you say, with academic and practical
background one would think that a non-substantial "kindergarten"
statement is then effectively just an offense (or likely part of
an arrogant [professorial?] behavior).

And that
it would lead to much better understanding on your part than if he
feeds you by teaspoon.

Which he doesn't do.

Moreover, the fact that many of the folks here obviously *do* have
a solid background (or at least years long IT or CS experiences)
should, IMO, be a motivation to try to explain any arguable point
if one really cares about the topic. (Unless some habit, maybe of
being an inerrant authority, prevents one from such.)

Myself I'm at least trying to explain what I know and back it up by
experiences, not just throw short phrases into the pool.

That is one part. Another part is that he is annoyed by your tone.

(And I'm annoyed by his. But, anyway, his posting tone is the
same in most of his responses to folks here, not just to me.)

Janis

--- Synchronet 3.20a-Linux NewsLink 1.114

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sat Dec 21 22:17:20 2024

From Newsgroup: comp.lang.c

On 12/21/24 20:04, Michael S wrote:
...

There is more than one school of teaching. One school believes that
students learn from explanations and exercises. Other school believes
that students learn best when provided with bare basics and then asked
to figure out the rest by themselves.

I personally believe that Tim generally thinks there's a justification
for what he says, and that we'd be better off figuring it out ourselves.
I also know, from the rare occasions when he's been convinced to provide
his justification, that I often don't consider his justification valid.
However, he says things that seem to be unjustified so often, I can't
help wondering if he doesn't occasionally say things he realizes are
unjustified (either at the time, or as the result of subsequent
discussion), and withholds his justifications in order to hide the fact
that he knows he was wrong. Probably not, but I keep wondering.
--- Synchronet 3.20a-Linux NewsLink 1.114

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Dec 22 06:01:52 2024

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts where I explained in detail how to translate
goto program (with conditional jumps) into program that contains
no goto and no conditional jumps).

Or try to figure out how to do this knowing that C has function
pointers.
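
A minimal sketch of that idea follows. It is not the code from the
earlier post (which is not reproduced here); the names struct state,
block_test, block_body, block_done and sum_below are made up for
illustration. Each basic block of a small goto loop becomes a function,
and the conditional jump becomes a call through a table indexed by the
0/1 result of the comparison:

```c
/* Original goto program being translated (hypothetical example):
 *        i = 0; s = 0;
 *  test: if (!(i < n)) goto done;
 *        s += i; i++; goto test;
 *  done: return s;
 */
struct state { int i; int n; long s; };

typedef long (*block_fn)(struct state *);

static long block_done(struct state *st);
static long block_body(struct state *st);

/* index 0: i < n is false, index 1: i < n is true */
static const block_fn next_block[2] = { block_done, block_body };

static long block_test(struct state *st)
{
    int c = st->i < st->n;        /* the predicate, used only as an index */
    return next_block[c](st);
}

static long block_done(struct state *st)
{
    return st->s;
}

static long block_body(struct state *st)
{
    st->s += st->i;
    st->i += 1;
    return block_test(st);        /* replaces "goto test" */
}

long sum_below(int n)
{
    struct state st;
    st.i = 0;
    st.n = n;
    st.s = 0;
    return block_test(&st);
}
```

Here sum_below(5) adds 0+1+2+3+4. The call depth grows with n unless the
compiler happens to turn the tail calls into jumps, so this shows what
can be expressed rather than a practical replacement for a loop.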
--
Waldek Hebisch
--- Synchronet 3.20a-Linux NewsLink 1.114

From Michael S@already5chosen@yahoo.com to comp.lang.c on Sun Dec 22 11:22:51 2024

From Newsgroup: comp.lang.c

On Sun, 22 Dec 2024 06:01:52 -0000 (UTC)
antispam@fricas.org (Waldek Hebisch) wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via
goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts where I explained in detail how to translate
goto program (with conditional jumps) into program that contains
no goto and no conditional jumps).

Considering that Janis replied to your post I find a possibility that
he did not look at it unlikely. Although not completely impossible.

Or try to figure out how to do this knowing that C has function
pointers.

--- Synchronet 3.20a-Linux NewsLink 1.114

From bart@bc@freeuk.com to comp.lang.c on Sun Dec 22 11:35:53 2024

From Newsgroup: comp.lang.c

On 22/12/2024 09:22, Michael S wrote:

On Sun, 22 Dec 2024 06:01:52 -0000 (UTC)
antispam@fricas.org (Waldek Hebisch) wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via
goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts where I explained in detail how to translate
goto program (with conditional jumps) into program that contains
no goto and no conditional jumps).

Considering that Janis replied to your post I find a possibility that
he did not look at it unlikely. Although not completely impossible.

He only replied to the first remark. And summarised the rest with:

"[ ponderings about where recursive functions might be used ]"

(18-Dec, 16:26 GMT)

I don't think JP does details, and I've struggled to find posts where he
writes actual code. His replies to mine have mostly been about trying to
beat me over the head.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Ben Bacarisse@ben@bsb.me.uk to comp.lang.c on Sun Dec 22 14:19:13 2024

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

I don't want to speak for Tim, but as far as I am concerned, it all
boils down to what you take to be a model of (effective) computation.
In some purely theoretical sense, models like the pure lambda calculus
and combinator calculus are "complete" and they have no specific
conditional "branches".

Going into detail (such as examples of making a "choice" in pure lambda
calculus) is way off topic here.

This is exactly what comp.theory should be used for, so I will cross
post there and set the followup-to header. comp.theory has been trashed
by cranks but maybe a topical post will help it a bit.
--
Ben.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Ben Bacarisse@ben@bsb.me.uk to comp.lang.c on Sun Dec 22 15:30:30 2024

From Newsgroup: comp.lang.c

Ben Bacarisse <ben@bsb.me.uk> writes:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

I don't want to speak for Tim, but as far as I am concerned, it all
boils down to what you take to be a model of (effective) computation.
In some purely theoretical sense, models like the pure lambda calculus
and combinator calculus are "complete" and they have no specific
conditional "branches".

Going into detail (such as examples of making a "choice" in pure lambda
calculus) is way off topic here.

This is exactly what comp.theory should be used for, so I will cross
post there and set the followup-to header. comp.theory has been trashed
by cranks but maybe a topical post will help it a bit.

I see from a post I had not read before replying that Tim's point was
very much focused on C. Given that theory is off topic here (and
comp.theory is a mess) there is probably no point in trying to
discuss the more general point.
--
Ben.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Dec 22 10:38:11 2024

From Newsgroup: comp.lang.c

antispam@fricas.org (Waldek Hebisch) writes:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts [...]

What makes you think I didn't?
--- Synchronet 3.20a-Linux NewsLink 1.114

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Dec 22 19:51:13 2024

From Newsgroup: comp.lang.c

On 22.12.2024 04:17, James Kuyper wrote:

On 12/21/24 20:04, Michael S wrote:
...

There is more than one school of teaching. One school believes that
students learn from explanations and exercises. Other school believes
that students learn best when provided with bare basics and then asked
to figure out the rest by themselves.

In context of this newsgroup where my impression is that there's a lot
of years long IT/CS experienced and quite old people discussing topics
the explanatory "model" of "schools of teaching" is anyway completely
inappropriate; there's not "one _teacher_ [who knows almost all]" and
"all the rest are [ignorant] _pupils_" that need to be "guided" (in
one way or the other). Not saying anything substantial on a topic can
certainly be perceived as some rhetorical move but it's surely not any
sort of teaching-didactics [of whatever "school of teaching"].

I personally believe that Tim generally thinks there's a justification
for what he says, and that we'd be better off figuring it out ourselves.

(My impression is that he often says something on a topic where he has
no deeper knowledge, but is pretending to know by not saying anything
substantial.)

I also know, from the rare occasions when he's been convinced to provide
his justification, that I often don't consider his justification valid.
However, he says things that seem to be unjustified so often, I can't
help wondering if he doesn't occasionally say things he realizes are
unjustified (either at the time, or as the result of subsequent
discussion), and withholds his justifications in order to hide the fact
that he knows he was wrong. Probably not, but I keep wondering.

(This matches with my observations and I drew a similar conclusion.)

Janis

--- Synchronet 3.20a-Linux NewsLink 1.114

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.c on Sun Dec 22 20:41:44 2024

From Newsgroup: comp.lang.c

On 22.12.2024 07:01, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts where I explained in detail how to translate
goto program (with conditional jumps) into program that contains
no goto and no conditional jumps).

I'm not sure but may have just skimmed over your "C" example if it
wasn't of interest to the point I tried to make (at that stage).

Or try to figure out how to do this knowing that C has function
pointers.

I will retry to explain what I tried to say... - very simply put...

There's "Recursive Functions" and the Turing Machines "equivalent".
The "Recursive Functions" is the most powerful class of algorithms.
Formal Recursive Functions are formally defined in terms of abstract
mathematical formulated properties; one of these [three properties]
is the "Test Sets". (Here I can already stop.)

But since we're not in a theoretical CS newsgroup I'd just wanted
to see an example of some common, say, mathematical function and
see it implemented without 'if' and 'goto' or recursion. - Take a
simple one, say, fac(n) = n! , the factorial function. I know how
I can implement that with 'if' and recursion, and I know how I can
implement that with 'while' (or 'goto').

If I re-inspect your example upthread - I hope it was the one you
wanted to refer to - I see that you have removed the 'if' symbol
but not the conditional, the test function; there's still the
predicate (the "Test Set") present in form of 'int c2 = i < n',
and it's there in the original code, in the goto transformed code,
and in the function-pointer code. And you cannot get rid of that.

Whether you have the test in an 'if', or in a ternary '?:', or
use it through a bool-int coercion as integer index to an indexed
function[-pointer] table; it's a conditional branch based on the
("Test Set") predicate i<n. You showed in your example how to get
rid of the 'if' symbol, but you could - as expected - not get rid
of the actual test that is the substance of a conditional branch.
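
To make that concrete for fac(n), here is a small sketch in the
table-dispatch style (the names fac_base, fac_step and fac_table are
invented for this illustration, and it is only one possible way to
write it). There is no 'if', no loop, no 'goto' and no '?:', and it
does rely on recursion; the predicate n > 1 is still present and still
decides which function runs:

```c
typedef unsigned long (*fac_fn)(unsigned long);

static unsigned long fac_base(unsigned long n);
static unsigned long fac_step(unsigned long n);

/* index 0: n > 1 is false (base case); index 1: n > 1 is true (recurse) */
static const fac_fn fac_table[2] = { fac_base, fac_step };

unsigned long fac(unsigned long n)
{
    /* the comparison yields 0 or 1 and is used purely as an array index */
    return fac_table[n > 1](n);
}

static unsigned long fac_base(unsigned long n)
{
    (void)n;                      /* 0! and 1! are both 1 */
    return 1;
}

static unsigned long fac_step(unsigned long n)
{
    return n * fac(n - 1);
}
```

Whether one calls the indexed call a "conditional branch" or not is
exactly the question under dispute; the sketch only shows what the C
source looks like when the selection is expressed this way.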

I think that is what is to expect by the theory and the essence of
the point I tried to make.

Janis

--- Synchronet 3.20a-Linux NewsLink 1.114

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Dec 22 19:44:49 2024

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

antispam@fricas.org (Waldek Hebisch) writes:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts [...]

What makes you think I didn't?

I made the same claim as you earlier and gave examples. You
did not acknowledge my posts. Why? For me the most natural
explanation is that you did not read them.
--
Waldek Hebisch
--- Synchronet 3.20a-Linux NewsLink 1.114

From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Sun Dec 22 21:45:14 2024

From Newsgroup: comp.lang.c

On 2024-12-21, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

In a functional language, we can make a decision by, for instance,
putting two lambdas into an array A, and then calling A[0] or A[1],
where the index 0 or 1 comes from some Boolean result.

The only reason we have a control construct like if(A, X, Y) where X
is only evaluated if A is true, otherwise Y, is that X and Y
have side effects.

If X and Y don't have side effects, then if(A, X, Y) can be an ordinary function whose arguments are strictly evaluated.
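
As a sketch in C (the names if_value and max_int are made up for this
illustration), such a strictly evaluated "if" can be written as a plain
function that picks one of two already-computed values by array index:

```c
/* An "if" as an ordinary function: both x and y have already been
 * evaluated by the caller, so all that is left is to select a value. */
int if_value(int a, int x, int y)
{
    int choice[2];
    choice[0] = y;                /* taken when a is false (zero)    */
    choice[1] = x;                /* taken when a is true (non-zero) */
    return choice[a != 0];
}

/* Example use: maximum of two ints without if, ?:, && or ||. */
int max_int(int p, int q)
{
    return if_value(p > q, p, q);
}
```

This only works as a drop-in for a real conditional because p and q are
side-effect free; if either argument were a call that prints or
modifies state, both would run regardless of the test.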

Moreover, if we give the functional language lazy evaluation semantics,
then anyway we get the behavior that Y is not evaluated if A is true,
and that lazy evaluation model can be used as the basis for sneaking
effects into the functional language and controlling them.

Anyway, Turing calculation by primitive recursion does not require
conditional branching. Just perhaps an if function which returns
either its second or third argument based on the truth value of
its first argument.

For instance, in certain C preprocessor tricks, conditional expansion
is achieved by such macros.

When we run the following through the GNU C preprocessor (e.g. by pasting
into gcc -E -x c -p -):

#define TRUE_SELECT_TRUE(X) X
#define TRUE_SELECT_FALSE(X)

#define FALSE_SELECT_TRUE(X)
#define FALSE_SELECT_FALSE(X) X

#define SELECT_TRUE(X) X
#define SELECT_FALSE(X)

#define PASTE(X, Y) X ## Y

#define IF(A, B, C) PASTE(TRUE_SELECT_, A)(B) PASTE(FALSE_SELECT_, A)(C)

#define FOO TRUE
#define BAR FALSE

IF(FOO, foo is true, foo is false)
IF(BAR, bar is true, bar is false)

We get these tokens:

foo is true
bar is false

Yet, macro expansion has no conditionals. The preprocessing language has
#if and #ifdef, but we didn't use those. Just expansion of computed names.

This is an example of not strictly needing conditionals to achieve
conditional evaluation or expansion: an IF(A, B, C) operator that
yields B or C depending on the truth of A, and so forth.
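
The same trick can be dropped into ordinary C source. In the sketch
below, USE_WIDE, USE_FAST, wide_value and fast_value are hypothetical
names chosen for illustration; the SELECT macros are the ones from
above. In IF(USE_WIDE, 64, 32) the argument USE_WIDE is first expanded
to TRUE, then the pasted names TRUE_SELECT_TRUE and FALSE_SELECT_TRUE
expand to 64 and to nothing respectively:

```c
#define TRUE_SELECT_TRUE(X) X
#define TRUE_SELECT_FALSE(X)

#define FALSE_SELECT_TRUE(X)
#define FALSE_SELECT_FALSE(X) X

#define PASTE(X, Y) X ## Y

#define IF(A, B, C) PASTE(TRUE_SELECT_, A)(B) PASTE(FALSE_SELECT_, A)(C)

/* Hypothetical configuration switches; each must expand to TRUE or FALSE. */
#define USE_WIDE TRUE
#define USE_FAST FALSE

int wide_value(void)
{
    return IF(USE_WIDE, 64, 32);  /* preprocesses to: return 64; */
}

int fast_value(void)
{
    return IF(USE_FAST, 1, 0);    /* preprocesses to: return 0;  */
}
```

The selection happens entirely at preprocessing time; the compiler
proper never sees the alternative that was not chosen.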

John McCarthy (Lisp inventor) wrote himself such an IF function
in Fortran, in a program for calculating chess moves. It evaluated
both the B and C expressions, and so it wasn't a proper imperative
conditional, but it didn't matter.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
--- Synchronet 3.20a-Linux NewsLink 1.114

From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Dec 23 00:20:48 2024

From Newsgroup: comp.lang.c

On Sun, 22 Dec 2024 20:41:44 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 22.12.2024 07:01, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via
goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts where I explained in detail how to translate
goto program (with conditional jumps) into program that contains
no goto and no conditional jumps).

I'm not sure but may have just skimmed over your "C" example if it
wasn't of interest to the point I tried to make (at that stage).

Or try to figure out how to do this knowing that C has function
pointers.

I will retry to explain what I tried to say... - very simply put...

There's "Recursive Functions" and the Turing Machines "equivalent".
The "Recursive Functions" is the most powerful class of algorithms.
Formal Recursive Functions are formally defined in terms of abstract
mathematical formulated properties; one of these [three properties]
is the "Test Sets". (Here I can already stop.)

But since we're not in a theoretical CS newsgroup I'd just wanted
to see an example of some common, say, mathematical function and
see it implemented without 'if' and 'goto' or recursion. - Take a
simple one, say, fac(n) = n! , the factorial function. I know how
I can implement that with 'if' and recursion, and I know how I can
implement that with 'while' (or 'goto').

If I re-inspect your example upthread - I hope it was the one you
wanted to refer to - I see that you have removed the 'if' symbol
but not the conditional, the test function; there's still the
predicate (the "Test Set") present in form of 'int c2 = i < n',
and it's there in the original code, in the goto transformed code,
and in the function-pointer code. And you cannot get rid of that.

Whether you have the test in an 'if', or in a ternary '?:', or
use it through a bool-int coercion as integer index to an indexed
function[-pointer] table; it's a conditional branch based on the
("Test Set") predicate i<n. You showed in your example how to get
rid of the 'if' symbol, but you could - as expected - not get rid
of the actual test that is the substance of a conditional branch.

I think that is what is to expect by the theory and the essence of
the point I tried to make.

Janis

You make no sense. I am starting to suspect that the reason for it
is ignorance rather than mere stubbornness.

https://godbolt.org/z/EKo5rrYce
Show me conditional branch in the right pane.

--- Synchronet 3.20a-Linux NewsLink 1.114

From bart@bc@freeuk.com to comp.lang.c on Sun Dec 22 23:22:36 2024

From Newsgroup: comp.lang.c

On 22/12/2024 21:45, Kaz Kylheku wrote:

On 2024-12-21, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

In a functional language, we can make a decision by, for instance,
putting two lambdas into an array A, and then calling A[0] or A[1],
where the index 0 or 1 comes from some Boolean result.

The only reason we have a control construct like if(A, X, Y) where X
is only evaluated if A is true, otherwise Y, is that X and Y
have side effects.

If X and Y don't have side effects, then if(A, X, Y) can be an ordinary function whose arguments are strictly evaluated.

Moreover, if we give the functional language lazy evaluation semantics,
then anyway we get the behavior that Y is not evaluated if A is true,
and that lazy evaluation model can be used as the basis for sneaking
effects into the functional language and controlling them.

Anyway, Turing calculation by primitive recursion does not require
conditional branching. Just perhaps an if function which returns
either its second or third argument based on the truth value of
its first argument.

For instance, in certain C preprocessor tricks, conditional expansion
is achieved by such macros.

When we run the following through the GNU C preprocessor (e.g. by pasting into gcc -E -x c -p -):

#define TRUE_SELECT_TRUE(X) X
#define TRUE_SELECT_FALSE(X)

#define FALSE_SELECT_TRUE(X)
#define FALSE_SELECT_FALSE(X) X

#define SELECT_TRUE(X) X
#define SELECT_FALSE(X)

#define PASTE(X, Y) X ## Y

#define IF(A, B, C) PASTE(TRUE_SELECT_, A)(B) PASTE(FALSE_SELECT_, A)(C)

#define FOO TRUE
#define BAR FALSE

IF(FOO, foo is true, foo is false)
IF(BAR, bar is true, bar is false)

We get these tokens:

foo is true
bar is false

So, how long did it take to debug? (I've no idea how it works. If I
change all TRUE/FALSE to BART/LISA respectively, it still gives the same output. I'm not sure how germane such an example is.)

Yet, macro expansion has no conditionals. The preprocessing language has
#if and #ifdef, but we didn't use those. Just expansion of computed names.

This is an example of not strictly needing conditionals to achieve conditional evaluation or expansion: an IF(A, B, C) operator that
yields B or C depending on the truth of A, and so forth.

John McCarthy (Lisp inventor) wrote himself such an IF function
in Fortran, in a program for calculating chess moves. It evaluated
both the B and C expressions, and so it wasn't a proper imperative conditional, but it didn't matter.

You mean like this one:

int IF(int c, int a, int b) {
    return a*!!c + b*!c;
}

I think most languages can manage that. (I guess there was a reason
McCarthy needed it rather than use Fortran's existing IF statement,
other than, being the Lisp guy, that was how his mind worked.)
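
As a rough sketch of what "evaluates both" means for such a function in C: the names noisy, calls, eager_pick and eager_calls below are made up for illustration and appear nowhere in the thread. The counter shows that both operands are computed regardless of the condition, which is harmless only while the operands have no side effects that matter.

```c
/* Counts how many operand expressions were evaluated (illustrative only). */
static int calls;

/* Hypothetical operand with a visible side effect: it bumps the counter. */
static int noisy(int v)
{
    calls++;
    return v;
}

/* Function-style IF: selects a or b arithmetically, no branch construct. */
int IF(int c, int a, int b)
{
    return a*!!c + b*!c;
}

/* Both noisy() calls run before IF is entered, whatever c is. */
int eager_pick(int c)
{
    calls = 0;
    return IF(c, noisy(10), noisy(20));
}

/* Number of operands evaluated by the most recent eager_pick() call. */
int eager_calls(void)
{
    return calls;
}
```

Calling eager_pick(1) should give 10 and eager_pick(0) should give 20, with eager_calls() reporting 2 in both cases.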

--- Synchronet 3.20a-Linux NewsLink 1.114

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Sun Dec 22 23:29:50 2024

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 22.12.2024 07:01, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts where I explained in detail how to translate
goto program (with conditional jumps) into program that contains
no goto and no conditional jumps).

I'm not sure but may have just skimmed over your "C" example if it
wasn't of interest to the point I tried to make (at that stage).

Or try to figure out how to do this knowing that C has function
pointers.

I will retry to explain what I tried to say... - very simply put...

There's "Recursive Functions" and the Turing Machines "equivalent".
The "Recursive Functions" is the most powerful class of algorithms.
Formal Recursive Functions are formally defined in terms of abstract mathematical formulated properties; one of these [three properties]
are the "Test Sets". (Here I can already stop.)

Classic definition uses some number of base functions, some
number of base conditions, conditional definitions and
"minimum operator". "Minimum operator" given a (possibly
partially defined) function f and l computes smallest n such that
f(k, l) is defined for k=0,1,...,n and f(n, l) = 0 and is undefined
otherwise. Some texts require minimum to be effective, that
is f should be total and for each l there should be n >= 0 such
that f(n, l) = 0. Clearly "minimum operator" is equivalent to
'while' loop. IIRC, if instead of "minimum operator" you only allow
recursion, then resulting class of functions is strictly smaller.
So assuming that I remember correctly, in framework of recursive
functions claim that conditionals and recursion give Turing
completeness is false, one needs some "programming" constructs.
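
To illustrate the correspondence between "minimum operator" and a 'while' loop, here is a minimal sketch in C. The names mu, binary_fn and square_below are assumptions made for this example, and it covers only the "effective" case where f is total and a zero exists; otherwise the loop does not terminate.

```c
/* A two-argument total function on unsigned integers. */
typedef unsigned (*binary_fn)(unsigned, unsigned);

/* Minimum operator as a loop: smallest n with f(n, l) == 0.
   Assumes such an n exists; otherwise this never returns. */
unsigned mu(binary_fn f, unsigned l)
{
    unsigned n = 0;
    while (f(n, l) != 0)
        n++;
    return n;
}

/* Hypothetical example function: zero exactly when n*n >= l. */
unsigned square_below(unsigned n, unsigned l)
{
    return n * n < l;
}
```

With these definitions, mu(square_below, 10) yields 4, the smallest n whose square reaches 10.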

Anyway, using recursion you clearly need some way to stop it. If you
restrict yourself to eagerly evaluated total integer valued functions
only, then clearly there is no way to stop recursion. But if
you have different system like lambda calculus or C, then there
are ways to stop recursion that are quite different than 'if'
or ternary operator.

But since we're not in a theoretical CS newsgroup I'd just wanted
to see an example of some common, say, mathematical function and
see it implemented without 'if' and 'goto' or recursion.

To be clear: I need recursion in general. I do not need 'if'
to stop recursion.

- Take a
simple one, say, fac(n) = n! , the factorial function. I know how
I can implement that with 'if' and recursion, and I know how I can
implement that with 'while' (or 'goto').

If I re-inspect your example upthread - I hope it was the one you
wanted to refer to - I see that you have removed the 'if' symbol
but not the conditional, the test function; there's still the
predicate (the "Test Set") present in form of 'int c2 = i < n',

You failed to see that this is an ordinary total function: it
evaluates both arguments and produces a value. If I take the
following C function:

int lt(int a, int b) {
    return (a < b);
}

and compile it using 'gcc -O -S' I get:

lt:
.LFB0:
.cfi_startproc
cmpl %esi, %edi
setl %al
movzbl %al, %eax
ret

As you can see the only control transfer there is 'ret' at the
end of the function. 'if' and C ternary operators are quite
different, you can not implement them as ordinary functions
(some special case can be optimized to jumpless code, but not
in general).
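
A small sketch of using the value of such a comparison as plain data: the result of lt() indexes an array, so the selection happens without 'if' or '?:' in the source. The name max_of is made up for this example, and whether a given compiler emits a jump for it is implementation-dependent.

```c
/* Ordinary total function: evaluates both arguments, returns 0 or 1. */
int lt(int a, int b)
{
    return (a < b);
}

/* Select the larger value by indexing with the comparison result:
   index 0 (a >= b) picks a, index 1 (a < b) picks b. */
int max_of(int a, int b)
{
    int v[2] = { a, b };
    return v[lt(a, b)];
}
```

Both elements of v are computed before the selection, so this works only because a and b are plain values; it does not replace 'if' when the alternatives must not both be evaluated.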

and it's there in the original code, in the goto transformed code,
and in the function-pointer code. And you cannot get rid of that.

I can, but for something like factorial code would be quite
ugly. One can implement reasonable Turing machine emulator
using just integer and function pointer arrays, array accesses
and assignments, direct and indirect function calls.

By reasonable I mean that as long as Turing machine stays
in part of tape modeled as C array, emulator and Turing machine
would move in step. Stop in Turing machine would exit emulator.
Only when Turing machine exceeds memory of C program, C program
would exhibit undefined behaviour. If you allow yourself also
C arithmetic operators (crucially '/' and '%'), then you can
stop execution.
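
A minimal sketch of what such an emulator could look like, under assumptions of this example only: a two-symbol machine that appends a 1 to a unary number, a fixed 16-cell tape, and made-up names (step, stop, dispatch, run_increment, tape_at). The emulator itself uses only arrays, assignments and direct and indirect calls; halting is done by dispatching to a function that simply returns. As stated above, running off the array would be undefined behaviour, so the input must fit on the tape.

```c
enum { TAPE_LEN = 16, HALT = 1 };

static int tape[TAPE_LEN];
static int head;
static int state;

/* Transition tables indexed by [state][symbol]; state 0 is the only
   active state, HALT has no row because it is never looked up. */
static const int write_sym[1][2]  = { { 1, 1 } };     /* always write 1      */
static const int move_by[1][2]    = { { 0, 1 } };     /* move right over 1s  */
static const int next_state[1][2] = { { HALT, 0 } };  /* blank ends the scan */

static void step(void);
static void stop(void) { }

/* Indirect call by state number does the job of the conditional. */
static void (*const dispatch[2])(void) = { step, stop };

static void step(void)
{
    int sym = tape[head];
    int s = state;
    tape[head] = write_sym[s][sym];
    head += move_by[s][sym];
    state = next_state[s][sym];
    dispatch[state]();   /* call in tail position: continue or halt */
}

/* Load unary 3, run the machine, return the final head position. */
int run_increment(void)
{
    tape[0] = tape[1] = tape[2] = 1;
    tape[3] = 0;
    head = 0;
    state = 0;
    dispatch[state]();
    return head;
}

/* Read back one tape cell after a run. */
int tape_at(int i)
{
    return tape[i];
}
```

After run_increment() the tape should hold four 1s followed by blanks, with the head resting on the cell that was just written.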

If you assume C implementation with infinite memory such that
'malloc' never fails, then instead of array you can use
doubly linked list which gets extended when Turing machine
tries to get outside allocated space.

IIUC such infinite C implementation would exhibit undefined
behaviour as C standard requires finite bound on integers and
injective cast from pointers to some integer type.

Whether you have the test in an 'if', or in a ternary '?:', or
use it through a bool-int coercion as integer index to an indexed function[-pointer] table; it's a conditional branch based on the
("Test Set") predicate i<n. You showed in your example how to get
rid of the 'if' symbol, but you could - as expected - not get rid
of the actual test that is the substance of a conditional branch.

Wrong, one can use properties of C23 division (actually,
what is needed division and remainder by a fixed positive
number, say 3).

I think that is what is to expect by the theory and the essence of
the point I tried to make.

One point that I wanted to make is that programming languages are
different than theory of integer functions, in particular programming constructs may be surprisingly powerful. For example, there was
a theorem about some special concurrency problem saying that
desired mutual exclusion can not be done by binary semaphores.
David Parnas showed that it in fact can be solved using arrays
of binary semaphores. The theorem had unstated assumption that
only scalar semaphore variables are in use. Of course, once you eliminate
all useful constructs from a language, then one can not do anything
in such a language (as a joke David Parnas defined such a language).

Second point was that function calls in tail position are quite similar
to goto, and in case of indirect calls they can do job of 'if' or 'switch'.
So if you consider elimination of 'if' (or 'goto') as a cheat, the
cheat is in using function calls, and not in predicates.
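
As a sketch of that second point (all names here are made up for illustration): a counting loop written as calls in tail position, where an indirect call through a two-entry table does the job of the loop test. Note that C does not guarantee tail-call elimination, so stack use may grow with n.

```c
static int sum_done(int i, int n, int acc);
static int sum_more(int i, int n, int acc);

/* Index 0: leave the "loop"; index 1: run one more iteration. */
static int (*const sum_next[2])(int, int, int) = { sum_done, sum_more };

/* Loop exit: just hand back the accumulator. */
static int sum_done(int i, int n, int acc)
{
    (void)i;
    (void)n;
    return acc;
}

/* Loop body: add i, then "jump" via the table using i + 1 < n. */
static int sum_more(int i, int n, int acc)
{
    return sum_next[i + 1 < n](i + 1, n, acc + i);
}

/* Sum of 0 .. n-1 without if, ?:, goto or loop statements. */
int sum_below(int n)
{
    return sum_next[0 < n](0, n, 0);
}
```

Here sum_below(5) should yield 10, and sum_below(0) returns 0 because the first dispatch already selects sum_done.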
--
Waldek Hebisch
--- Synchronet 3.20a-Linux NewsLink 1.114

From Kaz Kylheku@643-408-1753@kylheku.com to comp.lang.c on Sun Dec 22 23:47:25 2024

From Newsgroup: comp.lang.c

On 2024-12-22, bart <bc@freeuk.com> wrote:

On 22/12/2024 21:45, Kaz Kylheku wrote:

On 2024-12-21, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

In a functional language, we can make a decision by, for instance,
putting two lambdas into an array A, and then calling A[0] or A[1],
where the index 0 or 1 comes from some Boolean result.

The only reason we have a control construct like if(A, X, Y) where X
is only evaluated if A is true, otherwise Y, is that X and Y
have side effects.

If X and Y don't have side effects, then if(A, X, Y) can be an ordinary
function whose arguments are strictly evaluated.

Moreover, if we give the functional language lazy evaluation semantics,
then anyway we get the behavior that Y is not evaluated if A is true,
and that lazy evaluation model can be used as the basis for sneaking
effects into the functional language and controlling them.

Anyway, Turing calculation by primitive recursion does not require
conditional branching. Just perhaps an if function which returns
either its second or third argument based on the truth value of
its first argument.

For instance, in certain C preprocessor tricks, conditional expansion
is achieved by such macros.

When we run the following through the GNU C preprocessor (e.g. by pasting
into gcc -E -x c -p -):

#define TRUE_SELECT_TRUE(X) X
#define TRUE_SELECT_FALSE(X)

#define FALSE_SELECT_TRUE(X)
#define FALSE_SELECT_FALSE(X) X

#define SELECT_TRUE(X) X
#define SELECT_FALSE(X)

#define PASTE(X, Y) X ## Y

#define IF(A, B, C) PASTE(TRUE_SELECT_, A)(B) PASTE(FALSE_SELECT_, A)(C)

#define FOO TRUE
#define BAR FALSE

IF(FOO, foo is true, foo is false)
IF(BAR, bar is true, bar is false)

We get these tokens:

foo is true
bar is false

So, how long did it take to debug? (I've no idea how it works. If I

I typed it out right in the middle of my article and piped it out to
gcc, iterating a few times. I made a few silly mistakes in IF, mostly
due to referencing the wrong A, B, C.

Also, the SELECT_TRUE and SELECT_FALSE macros are dead code; not used.

change all TRUE/FALSE to BART/LISA respectively, it still gives the same output. I'm not sure how germane such an example is.)

If you rename consistently, it will work. But it's not hygienic in that
since the solution relies on calculated identifiers, you have to change TRUE_SELECT_TRUE to TRUE_SELECT_BART.

How it works is very simple in that PASTE(TRUE_SELECT_, A) calculates TRUE_SELECT_TRUE or TRUE_SELECT_FALSE depending on whether A contains
TRUE or FALSE. Then the argument list (B) is combined with this
calculated name, resulting in a macro call to TRUE_SELECT_TRUE(B) or TRUE_SELECT_FALSE(B) with the value of B as an argument.
If the former is used, then it expands to B; if the latter, then to
nothing.

One of the two PASTE calls in the expansion of IF() produces tokens, and
the other nothing. The two results are catenated together into one token sequence, so we get the result of whichever one is nonempty.
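
For what it's worth, here is a sketch of the consistently renamed variant, using the BART/LISA spelling from the question. The function names foo_choice and bar_choice are made up for this example, and it assumes that BART and LISA are not themselves defined as macros anywhere. Every computed identifier has to be renamed along with the values.

```c
/* Selectors named after the two possible values, BART and LISA. */
#define BART_SELECT_BART(X) X
#define BART_SELECT_LISA(X)

#define LISA_SELECT_BART(X)
#define LISA_SELECT_LISA(X) X

#define PASTE(X, Y) X ## Y

/* A is expanded first, then pasted onto each prefix to form a macro name. */
#define IF(A, B, C) PASTE(BART_SELECT_, A)(B) PASTE(LISA_SELECT_, A)(C)

#define FOO BART
#define BAR LISA

/* Expands to "return 10;" because FOO is BART. */
int foo_choice(void)
{
    return IF(FOO, 10, 20);
}

/* Expands to "return 20;" because BAR is LISA. */
int bar_choice(void)
{
    return IF(BAR, 10, 20);
}
```

Leaving any of the four selector names in the old TRUE/FALSE spelling would make the pasted identifier refer to an undefined macro, which is the non-hygienic part.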
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Dec 22 17:22:01 2024

From Newsgroup: comp.lang.c

Kaz Kylheku <643-408-1753@kylheku.com> writes:

On 2024-12-21, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via
goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

In a functional language, we can make a decision by, for instance,
putting two lambdas into an array A, and then calling A[0] or A[1],
where the index 0 or 1 comes from some Boolean result.

The only reason we have a control construct like if(A, X, Y) where X
is only evaluated if A is true, otherwise Y, is that X and Y
have side effects.

If X and Y don't have side effects, then if(A, X, Y) can be an ordinary function whose arguments are strictly evaluated.

Moreover, if we give the functional language lazy evaluation
semantics, then anyway we get the behavior that Y is not evaluated
if A is true, and that lazy evaluation model can be used as the
basis for sneaking effects into the functional language and
controlling them.

Anyway, Turing calculation by primitive recursion does not require conditional branching. Just perhaps an if function which returns
either its second or third argument based on the truth value of its
first argument.

For instance, in certain C preprocessor tricks, conditional
expansion is achieved by such macros.

When we run the following through the GNU C preprocessor (e.g. by
pasting into gcc -E -x c -p -):

#define TRUE_SELECT_TRUE(X) X
#define TRUE_SELECT_FALSE(X)

#define FALSE_SELECT_TRUE(X)
#define FALSE_SELECT_FALSE(X) X

#define SELECT_TRUE(X) X
#define SELECT_FALSE(X)

#define PASTE(X, Y) X ## Y

#define IF(A, B, C) PASTE(TRUE_SELECT_, A)(B) PASTE(FALSE_SELECT_, A)(C)

#define FOO TRUE
#define BAR FALSE

IF(FOO, foo is true, foo is false)
IF(BAR, bar is true, bar is false)

We get these tokens:

foo is true
bar is false

Yet, macro expansion has no conditionals. The preprocessing
language has #if and #ifdef, but we didn't use those. Just
expansion of computed names.

Nice example.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sun Dec 22 17:39:52 2024

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 22.12.2024 02:04, Michael S wrote:

[...]

Part of the answer is in your previous response.
You wrote: "many _good_ professors I met in my life typically were
keen to explain their theses, statements, or knowledge (instead of
dragging that out of him)". You essentially admitted that not all good
professors behave like that.

Oh, what I meant to express was different; that good professors
*would* explain it (only bad ones wouldn't).

(At least that was my experience; and not only covering the CS
domain, BTW.)

[ "schools of teaching" stuff snipped ]

You make an impression of one that received basics of CS. Probably, 40
or so years ago, but still you have to know basic facts. Unlike me, for
example.
So, Tim expects that you will be able to utilize his hints.

The point [repeatedly] stated (also by others here) was that
he more often than not just provides no information but simple
arbitrary statements of opinion.

The comments I made here, in two responses to postings of yours,
were not statements of opinion but statements of fact. They are
no more statements of opinion than a statement about whether the
Riemann Hypothesis is true is a statement of opinion. Someone
might wonder whether an assertion "The Riemann Hypothesis is
true" is true or false, but it is still a matter of fact, not a
matter of opinion.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.c on Mon Dec 23 02:08:46 2024

From Newsgroup: comp.lang.c

On Wed, 18 Dec 2024 23:46:21 -0600, BGB wrote:

... (what debug mechanisms I have, effectively lack any symbols
for things inside "ld-linux.so"'s domain).

nm -D /lib/ld-linux.so.2
--- Synchronet 3.20a-Linux NewsLink 1.114

From antispam@antispam@fricas.org (Waldek Hebisch) to comp.lang.c on Mon Dec 23 02:41:10 2024

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

The comments I made here, in two responses to postings of yours,
were not statements of opinion but statements of fact.

They are opinions _about facts_, or if you prefer, opinion
about truth value of some statements.

They are
no more statements of opinion than a statement about whether the
Riemann Hypothesis is true is a statement of opinion. Someone
might wonder whether an assertion "The Riemann Hypothesis is
true" is true or false, but it is still a matter of fact, not a
matter of opinion.

It is reasonable to assume that you do not know if Riemann Hypothesis
is true or false. So if you say "Riemann Hypothesis is true",
this is just your opinion. I am not a native English speaker
but I believed that "statements of opinion" means just that:
person does not know the truth, but makes a statement.
--
Waldek Hebisch
--- Synchronet 3.20a-Linux NewsLink 1.114

From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Dec 23 08:43:07 2024

From Newsgroup: comp.lang.c

On 23/12/2024 03:41, Waldek Hebisch wrote:

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

The comments I made here, in two responses to postings of yours,
were not statements of opinion but statements of fact.

They are opinions _about facts_, or if you prefer, opinion
about truth value of some statements.

You can program in C without the "normal" conditional statements or expressions. You can make an array of two (or more) function pointers
and select between them using your controlling expression, and that
should be sufficient for conditionals. (There may be other methods too.)

So as far as I can see, Tim gave statements of fact, not opinion.

You can say that Tim's posts were patronising, arrogant, and irritating.
/That/ would be an opinion - a /justified/ opinion because it is
backed up in the evidence of these posts and corroborating evidence from previous posts and discussions from Tim. But without some kind of
precise definition of the terms involved and a robust and repeatable
method of classification, it could not be called "fact".

You could say that Tim's posts were intended to be annoying, or you
could say that he has refused to give an answer to how C can be used
without the "normal" conditionals because he realises he was wrong in
his posts and won't admit it. That would be /unjustified/ opinion - or "speculation" - because we have no way of knowing his motives or
anything more than what he wrote in his posts.

You could, quite fairly, characterise Tim's posts as unjustified
statements of fact - because he has stated his claim as fact, but has
given no justification or reasoning, and it is not something that is
obvious or well-known to people.

They are
no more statements of opinion than a statement about whether the
Riemann Hypothesis is true is a statement of opinion. Someone
might wonder whether an assertion "The Riemann Hypothesis is
true" is true or false, but it is still a matter of fact, not a
matter of opinion.

It is reasonable to assume that you do not know if Riemann Hypothesis
is true or false.

I think if anyone knew the truth or falsity of the Riemann Hypothesis -
i.e., they had a proof one way or the other - we'd have heard about it!

So if you say "Riemann Hypothesis is true",
this is just your opinion.

No, that would not be an opinion. It would be an unjustified claim. "I /believe/ the Riemann Hypothesis is true" is an opinion.

I am not a native English speaker
but I believed that "statements of opinion" means just that:
person does not know the truth, but makes a statement.

No, an opinion is a personal preference or judgement. That's very
different from not knowing about something factual. If I say "the
number 17 will turn up in next week's lottery numbers", that's not an
opinion, it's a claim about facts. It's an unjustified claim, since I
don't know if it is true or not, but it's not an opinion.

It is not always clear when something is a fact or not, and whether a statement is a justified statement of fact, an unjustified statement of
fact (i.e., it might happen to be true, but you have not presented
evidence of it), a justified opinion, or an unjustified opinion. I'm
sure there's a philosophy group on Usenet somewhere, but I doubt if cross-posting there would lead to any clarification!

--- Synchronet 3.20a-Linux NewsLink 1.114

From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Dec 23 09:46:46 2024

From Newsgroup: comp.lang.c

On 22/12/2024 20:41, Janis Papanagnou wrote:

On 22.12.2024 07:01, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts where I explained in detail how to translate
goto program (with conditional jumps) into program that contains
no goto and no conditional jumps).

I'm not sure but may have just skimmed over your "C" example if it
wasn't of interest to the point I tried to make (at that stage).

Or try to figure out how to do this knowing that C has function
pointers.

I will retry to explain what I tried to say... - very simply put...

There's "Recursive Functions" and the Turing Machines "equivalent".
The "Recursive Functions" is the most powerful class of algorithms.
Formal Recursive Functions are formally defined in terms of abstract mathematical formulated properties; one of these [three properties]
are the "Test Sets". (Here I can already stop.)

But since we're not in a theoretical CS newsgroup I'd just wanted
to see an example of some common, say, mathematical function and
see it implemented without 'if' and 'goto' or recursion. - Take a
simple one, say, fac(n) = n! , the factorial function. I know how
I can implement that with 'if' and recursion, and I know how I can
implement that with 'while' (or 'goto').

If I re-inspect your example upthread - I hope it was the one you
wanted to refer to - I see that you have removed the 'if' symbol
but not the conditional, the test function; there's still the
predicate (the "Test Set") present in form of 'int c2 = i < n',
and it's there in the original code, in the goto transformed code,
and in the function-pointer code. And you cannot get rid of that.

Whether you have the test in an 'if', or in a ternary '?:', or
use it through a bool-int coercion as integer index to an indexed function[-pointer] table; it's a conditional branch based on the
("Test Set") predicate i<n. You showed in your example how to get
rid of the 'if' symbol, but you could - as expected - not get rid
of the actual test that is the substance of a conditional branch.

I think that is what is to expect by the theory and the essence of
the point I tried to make.

You are adding more restrictions than Tim had given.

We all know that for most non-trivial algorithms you need some kind of repetition (loops, recursion, etc.) and some way to end that repetition.
No one is claiming otherwise.

Tim ruled out &&, ||, ?:, goto, break, continue, if, for, while, switch,
do, labels, setjmp and longjmp.

He didn't rule out recursion, or the relational operators, or any other
part of C.

#include <stdbool.h>    /* for bool before C23 */

int fact(int n);

int fact_zero(int n) {
        return 1;
}

int n_fact_n1(int n) {
        return n * fact(n - 1);
}

int fact(int n) {
        return (int (*[])(int)){ fact_zero, n_fact_n1 }[(bool) n](n);
}

There are additional fun things that can be done using different
operators. For an unsigned integer "n" that is not big enough to wrap,
"(n + 2) / (n + 1) - 1" evaluates "(n == 0)".

And Tim did not rule out using the standard library, which would surely
open up new possibilities.
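
Going back to the division trick, a minimal sketch with it plugged into the same dispatch pattern. The names is_zero, fact_u, fact_base and fact_step are made up for this example, and it assumes n is small enough that n + 2 does not wrap and that the result fits in unsigned.

```c
/* 1 when n == 0, otherwise 0, using only +, / and -.
   For n >= 1, (n + 2) / (n + 1) is 1; for n == 0 it is 2. */
unsigned is_zero(unsigned n)
{
    return (n + 2u) / (n + 1u) - 1u;
}

static unsigned fact_step(unsigned n);

/* Base case: 0! is 1. */
static unsigned fact_base(unsigned n)
{
    (void)n;
    return 1u;
}

/* Dispatch on is_zero(n): index 1 ends the recursion, index 0 continues. */
unsigned fact_u(unsigned n)
{
    unsigned (*const pick[2])(unsigned) = { fact_step, fact_base };
    return pick[is_zero(n)](n);
}

/* Recursive case: n * (n - 1)!, only reached for n >= 1. */
static unsigned fact_step(unsigned n)
{
    return n * fact_u(n - 1u);
}
```

With this, fact_u(5) should come out as 120 and fact_u(0) as 1, with no relational operator, conversion to bool, or '!' involved.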

--- Synchronet 3.20a-Linux NewsLink 1.114

From BGB@cr88192@gmail.com to comp.lang.c on Mon Dec 23 05:15:29 2024

From Newsgroup: comp.lang.c

On 12/22/2024 8:08 PM, Lawrence D'Oliveiro wrote:

On Wed, 18 Dec 2024 23:46:21 -0600, BGB wrote:

... (what debug mechanisms I have, effectively lack any symbols
for things inside "ld-linux.so"'s domain).

nm -D /lib/ld-linux.so.2

It is not actually on Linux, but rather trying to make my kernel mimic Linux...

The issue isn't getting the symbol map, but rather that in this case,
there are multiple levels of abstraction and so, at the level of the CPU emulator (where I can get instruction traces when something crashes), it
can no longer figure out what addresses map to where.

With the normal PE loader, it can send messages to the virtual debug
UART which signal where it has loaded things in memory (for every EXE
and DLL). But, things partly break down for ELF PIE binaries with glibc
or musl.

Granted, the ELF loader does at least know in theory where the main
binary and interpreter were loaded.

But, seemingly, process is sort of like:
Read in main ELF binary;
Read in interpreter;
Set up argument list, environment, and other stuff (*1), on the stack;
Branch to entry point on interpreter;
Magic happens.
(Currently, it just crashes).

*1:
(SP+ 0): argc
(SP+ 8): argv[0]
(SP+16): argv[1]
...
(SP+(argc+1)*8): NULL
(SP+xx): Env var pointers...
(SP+xx): NULL
(SP+xx): Auxiliary Vectors
Key/value pairs
Terminated by Key==0

Information on how exactly to set up the auxiliary vectors in a way that
glibc and musl are happy, is harder to figure out. At this stage, things become rather poorly documented.
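
For reference, the layout in *1 can be sketched in C roughly as follows. This is only an illustration of the word order described above, assuming 64-bit words; build_initial_stack and aux_pair are made-up names, the caller is assumed to provide a large enough buffer, and which auxiliary-vector entries glibc or musl actually need is not addressed here.

```c
#include <stddef.h>
#include <stdint.h>

/* One auxiliary-vector entry: a key/value pair of 64-bit words. */
typedef struct {
    uint64_t key;
    uint64_t value;
} aux_pair;

/* Write argc, argv pointers, NULL, env pointers, NULL, aux pairs and a
   terminating pair with key 0 into out, lowest address first.
   Returns the number of 64-bit words written. */
size_t build_initial_stack(uint64_t *out,
                           size_t argc, const uint64_t *argv_ptrs,
                           size_t envc, const uint64_t *env_ptrs,
                           size_t auxc, const aux_pair *aux)
{
    size_t k = 0;
    size_t i;

    out[k++] = (uint64_t)argc;          /* (SP+0): argc            */
    for (i = 0; i < argc; i++)
        out[k++] = argv_ptrs[i];        /* (SP+8*(i+1)): argv[i]   */
    out[k++] = 0;                       /* (SP+(argc+1)*8): NULL   */

    for (i = 0; i < envc; i++)
        out[k++] = env_ptrs[i];         /* env var pointers        */
    out[k++] = 0;                       /* NULL after environment  */

    for (i = 0; i < auxc; i++) {
        out[k++] = aux[i].key;          /* auxiliary vector key    */
        out[k++] = aux[i].value;        /* auxiliary vector value  */
    }
    out[k++] = 0;                       /* terminator: key == 0    */
    out[k++] = 0;                       /* terminator value        */

    return k;
}
```

The pointer values passed in would be the addresses of the strings as seen by the loaded program; copying the strings themselves onto the stack is left out of this sketch.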

Theoretically the interpreter program is responsible for loading the
other SO's; or if the main ELF loader is supposed to do it, it is not
obvious how it is supposed to tell ld-so where it had loaded them.

...

--- Synchronet 3.20a-Linux NewsLink 1.114

From bart@bc@freeuk.com to comp.lang.c on Mon Dec 23 11:35:56 2024

From Newsgroup: comp.lang.c

On 23/12/2024 08:46, David Brown wrote:

Tim ruled out &&, ||, ?:, goto, break, continue, if, for, while, switch,
do, labels, setjmp and longjmp.

He didn't rule out recursion, or the relational operators, or any other
part of C.

int fact(int n);

int fact_zero(int n) {
        return 1;
}

int n_fact_n1(int n) {
        return n * fact(n - 1);
}

int fact(int n) {
        return (int (*[])(int)){ fact_zero, n_fact_n1 }[(bool) n](n);
}

There are additional fun things that can be done using different operators. For an unsigned integer "n" that is not big enough to wrap,
"(n + 2) / (n + 1) - 1" evaluates "(n == 0)".

Isn't this just !n ? I don't think "!" was ruled out. This would also
work for negative n.

And Tim did not rule out using the standard library, which would surely
open up new possibilities.

printf (not sprintf) would be reasonable here to show results. Anything
else could be considered cheating.

The original context was a small subset of C that can be used to
represent a larger subset.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Dec 23 13:40:08 2024

From Newsgroup: comp.lang.c

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

--- Synchronet 3.20a-Linux NewsLink 1.114

From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Dec 23 13:18:46 2024

From Newsgroup: comp.lang.c

On 23/12/2024 12:35, bart wrote:

On 23/12/2024 08:46, David Brown wrote:

Tim ruled out &&, ||, ?:, goto, break, continue, if, for, while,
switch, do, labels, setjmp and longjmp.

He didn't rule out recursion, or the relational operators, or any
other part of C.

int fact(int n);

int fact_zero(int n) {
         return 1;
}

int n_fact_n1(int n) {
         return n * fact(n - 1);
}

int fact(int n) {
         return (int (*[])(int)){ fact_zero, n_fact_n1 }[(bool) n](n);
}

There are additional fun things that can be done using different
operators. For an unsigned integer "n" that is not big enough to
wrap, "(n + 2) / (n + 1) - 1" evaluates "(n == 0)".

Isn't this just !n ? I don't think "!" was ruled out. This would also
work for negative n.

Sure. It was merely another example of something you could use, if you
had ruled out simpler things (like the conversion to bool that I used,
or the ! operator that you suggest).

And Tim did not rule out using the standard library, which would
surely open up new possibilities.

printf (not sprintf) would be reasonable here to show results. Anything
else could be considered cheating.

No, I would not say so - as long as the standard library is not ruled
out, it is part of C. But I think you could reasonably argue that
allowing the standard library makes this whole pointless exercise even
more pointless!

The original context was a small subset of C that can be used to
represent a larger subset.

--- Synchronet 3.20a-Linux NewsLink 1.114

From David Brown@david.brown@hesbynett.no to comp.lang.c on Mon Dec 23 13:24:14 2024

From Newsgroup: comp.lang.c

On 23/12/2024 12:40, Michael S wrote:

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

Fairly sure, yes.

But if you think I missed something, please say.

--- Synchronet 3.20a-Linux NewsLink 1.114

From scott@scott@slp53.sl.home (Scott Lurndal) to comp.lang.c on Mon Dec 23 15:41:40 2024

From Newsgroup: comp.lang.c

Michael S <already5chosen@yahoo.com> writes:

On Sun, 22 Dec 2024 20:41:44 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

Whether you have the test in an 'if', or in a ternary '?:', or
use it through a bool-int coercion as integer index to an indexed
function[-pointer] table; it's a conditional branch based on the
("Test Set") predicate i<n. You showed in your example how to get
rid of the 'if' symbol, but you could - as expected - not get rid
of the actual test that is the substance of a conditional branch.

I think that is what is to expect by the theory and the essence of
the point I tried to make.

Janis

You make no sense. I am starting to suspect that the reason for it
is ignorance rather than mere stubbornness.

https://godbolt.org/z/EKo5rrYce
Show me conditional branch in the right pane.

The 'C' in 'CSET' is short for conditional. Because
the branch is folded into the compare doesn't mean it
isn't there.
--- Synchronet 3.20a-Linux NewsLink 1.114

From bart@bc@freeuk.com to comp.lang.c on Mon Dec 23 15:51:24 2024

From Newsgroup: comp.lang.c

On 23/12/2024 15:41, Scott Lurndal wrote:

Michael S <already5chosen@yahoo.com> writes:

On Sun, 22 Dec 2024 20:41:44 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

Whether you have the test in an 'if', or in a ternary '?:', or
use it through a bool-int coercion as integer index to an indexed
function[-pointer] table; it's a conditional branch based on the
("Test Set") predicate i<n. You showed in your example how to get
rid of the 'if' symbol, but you could - as expected - not get rid
of the actual test that is the substance of a conditional branch.

I think that is what is to expect by the theory and the essence of
the point I tried to make.

Janis

You make no sense. I am starting to suspect that the reason for it
is ignorance rather than mere stubbornness.

https://godbolt.org/z/EKo5rrYce
Show me conditional branch in the right pane.

The 'C' in 'CSET' is short for conditional. Because
the branch is folded into the compare doesn't mean it
isn't there.

That's just a mnemonic, which doesn't exist in the x86 version.

Anyway, 'w0' seems to be set either way, and the program counter will
point to the same instruction in each case too.

So there's no branching at this level of code, unless you consider
stepping PC to the next instruction to be a jump.

How is it 'folded into' the compare anyway? Are they not two independent instructions?

From Michael S@already5chosen@yahoo.com to comp.lang.c on Mon Dec 23 18:05:48 2024

From Newsgroup: comp.lang.c

On Mon, 23 Dec 2024 15:41:40 GMT
scott@slp53.sl.home (Scott Lurndal) wrote:

Michael S <already5chosen@yahoo.com> writes:

On Sun, 22 Dec 2024 20:41:44 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

Whether you have the test in an 'if', or in a ternary '?:', or
use it through a bool-int coercion as integer index to an indexed
function[-pointer] table; it's a conditional branch based on the
("Test Set") predicate i<n. You showed in your example how to get
rid of the 'if' symbol, but you could - as expected - not get rid
of the actual test that is the substance of a conditional branch.

I think that is what is to expect by the theory and the essence of
the point I tried to make.

Janis

You make no sense. I am starting to suspect that the reason for it
is ignorance rather than mere stubbornness.

https://godbolt.org/z/EKo5rrYce
Show me conditional branch in the right pane.

The 'C' in 'CSET' is short for conditional. Because
the branch is folded into the compare doesn't mean it
isn't there.

No, branch is not "folded". It is absent. CSET is an ALU operation.
The logical-arithmetic nature of comparison operator is even more
pronounced in code that gcc generates for POWER
https://godbolt.org/z/8Gs9s6nEo


From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Dec 23 13:02:02 2024

From Newsgroup: comp.lang.c

Michael S <already5chosen@yahoo.com> writes:

On Sat, 21 Dec 2024 21:31:24 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

So your statement asks for some explanation at least.

I would guess that Tim worked as a CS professor for several dozen years.
And it shows.

I'm not sure whether to feel flattered or insulted. ;)

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Dec 23 13:18:24 2024

From Newsgroup: comp.lang.c

Michael S <already5chosen@yahoo.com> writes:

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

I explicitly called out setjmp and longjmp as being excluded.
Based on that, it's reasonable to infer the rest of the
standard library is allowed.

Furthermore I don't think it matters. Except for a very small
set of functions -- eg, fopen, fgetc, fputc, malloc, free --
everything else in the standard library either isn't important
for Turing Completeness or can be synthesized from the base
set. The functionality of fprintf(), for example, can be
implemented on top of fputc and non-library language features.

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Mon Dec 23 13:25:48 2024

From Newsgroup: comp.lang.c

On 12/23/2024 1:02 PM, Tim Rentsch wrote:

Michael S <already5chosen@yahoo.com> writes:

On Sat, 21 Dec 2024 21:31:24 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

So your statement asks for some explanation at least.

I would guess that Tim worked as a CS professor for several dozen years.
And it shows.

I'm not sure whether to feel flattered or insulted. ;)

AHAHA! lol. You forced me to laugh here. wow. :^D

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Mon Dec 23 13:28:38 2024

From Newsgroup: comp.lang.c

On 12/23/2024 12:46 AM, David Brown wrote:

On 22/12/2024 20:41, Janis Papanagnou wrote:

On 22.12.2024 07:01, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts where I explained in detail how to translate
goto program (with conditional jumps) into program that contains
no goto and no conditional jumps).

I'm not sure but may have just skimmed over your "C" example if it
wasn't of interest to the point I tried to make (at that stage).

Or try to figure out how to do this knowing that C has function
pointers.

I will retry to explain what I tried to say... - very simply put...

There's "Recursive Functions" and the Turing Machines "equivalent".
The "Recursive Functions" is the most powerful class of algorithms.
Formal Recursive Functions are formally defined in terms of abstract
mathematical formulated properties; one of these [three properties]
are the "Test Sets". (Here I can already stop.)

But since we're not in a theoretical CS newsgroup I'd just wanted
to see an example of some common, say, mathematical function and
see it implemented without 'if' and 'goto' or recursion. - Take a
simple one, say, fac(n) = n! , the factorial function. I know how
I can implement that with 'if' and recursion, and I know how I can
implement that with 'while' (or 'goto').

If I re-inspect your example upthread - I hope it was the one you
wanted to refer to - I see that you have removed the 'if' symbol
but not the conditional, the test function; there's still the
predicate (the "Test Set") present in form of 'int c2 = i < n',
and it's there in the original code, in the goto transformed code,
and in the function-pointer code. And you cannot get rid of that.

Whether you have the test in an 'if', or in a ternary '?:', or
use it through a bool-int coercion as integer index to an indexed
function[-pointer] table; it's a conditional branch based on the
("Test Set") predicate i<n. You showed in your example how to get
rid of the 'if' symbol, but you could - as expected - not get rid
of the actual test that is the substance of a conditional branch.

I think that is what is to expect by the theory and the essence of
the point I tried to make.

You are adding more restrictions than Tim had given.

We all know that for most non-trivial algorithms you need some kind of repetition (loops, recursion, etc.) and some way to end that repetition.
No one is claiming otherwise.

Tim ruled out &&, ||, ?:, goto, break, continue, if, for, while, switch,
do, labels, setjmp and longjmp.

He didn't rule out recursion, or the relational operators, or any other
part of C.

[...]

pseudo code

func_ptr icall = funcs[i % 3];

icall->you()

;^)


From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Dec 23 14:00:38 2024

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 22.12.2024 07:01, Waldek Hebisch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts where I explained in detail how to translate
goto program (with conditional jumps) into program that contains
no goto and no conditional jumps).

I'm not sure but may have just skimmed over your "C" example if it
wasn't of interest to the point I tried to make (at that stage).

Or try to figure out how to do this knowing that C has function
pointers.

I will retry to explain what I tried to say... - very simply put...

There's "Recursive Functions" and the Turing Machines "equivalent".
The "Recursive Functions" is the most powerful class of algorithms.
Formal Recursive Functions are formally defined in terms of abstract
mathematical formulated properties; one of these [three properties]
are the "Test Sets". (Here I can already stop.)

But since we're not in a theoretical CS newsgroup I'd just wanted
to see an example of some common, say, mathematical function and
see it implemented without 'if' and 'goto' or recursion. - Take a
simple one, say, fac(n) = n! , the factorial function. I know how
I can implement that with 'if' and recursion, and I know how I can
implement that with 'while' (or 'goto').

If I re-inspect your example upthread - I hope it was the one you
wanted to refer to - I see that you have removed the 'if' symbol
but not the conditional, the test function; there's still the
predicate (the "Test Set") present in form of 'int c2 = i < n',
and it's there in the original code, in the goto transformed code,
and in the function-pointer code. And you cannot get rid of that.

Are you sure about that?

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Dec 23 14:05:55 2024

From Newsgroup: comp.lang.c

scott@slp53.sl.home (Scott Lurndal) writes:

Michael S <already5chosen@yahoo.com> writes:

On Sun, 22 Dec 2024 20:41:44 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

Whether you have the test in an 'if', or in a ternary '?:', or
use it through a bool-int coercion as integer index to an indexed
function[-pointer] table; it's a conditional branch based on the
("Test Set") predicate i<n. You showed in your example how to get
rid of the 'if' symbol, but you could - as expected - not get rid
of the actual test that is the substance of a conditional branch.

I think that is what is to expect by the theory and the essence of
the point I tried to make.

You make no sense. I am starting to suspect that the reason for it
is ignorance rather than mere stubbornness.

https://godbolt.org/z/EKo5rrYce
Show me conditional branch in the right pane.

The 'C' in 'CSET' is short for conditional. Because
the branch is folded into the compare doesn't mean it
isn't there.

It's a moot point because relational operators and equality
operators can be synthesized out of bitwise and arithmetic
operators.
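
As a rough illustration of that last remark (a sketch only, not
necessarily the construction Tim has in mind), "less than" and
"equal" on 32-bit unsigned values can be built from subtraction, XOR,
OR and shifts alone. The names lt_u32 and eq_u32 are made up for this
example.

```c
#include <stdint.h>

/* Returns 1 if a < b, else 0, without using any relational operator.
   The subtraction is done in 64 bits; it wraps modulo 2^64 exactly
   when a < b, which sets the top bit of the result. */
static uint32_t lt_u32(uint32_t a, uint32_t b)
{
    uint64_t diff = (uint64_t)a - (uint64_t)b;
    return (uint32_t)(diff >> 63);
}

/* Returns 1 if a == b, else 0, without using any equality operator. */
static uint32_t eq_u32(uint32_t a, uint32_t b)
{
    uint32_t x = a ^ b;                  /* zero exactly when a == b */
    uint32_t neg = (uint32_t)(0u - x);   /* two's-complement negation of x */
    uint32_t nonzero = (x | neg) >> 31;  /* top bit of x | -x is set iff x != 0 */
    return nonzero ^ 1u;
}
```

The 0-or-1 result can then be used as an array index, the same way
the result of i < n is used in the function-pointer examples upthread.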

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Mon Dec 23 15:50:44 2024

From Newsgroup: comp.lang.c

On 12/23/2024 1:25 PM, Chris M. Thomasson wrote:

On 12/23/2024 1:02 PM, Tim Rentsch wrote:

Michael S <already5chosen@yahoo.com> writes:

On Sat, 21 Dec 2024 21:31:24 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

So your statement asks for some explanation at least.

I would guess that Tim worked as a CS professor for several dozen years.
And it shows.

I'm not sure whether to feel flattered or insulted. ;)

AHAHA! lol. You forced me to laugh here. wow. :^D

merry christmas!

From Ben Bacarisse@ben@bsb.me.uk to comp.lang.c on Tue Dec 24 00:41:23 2024

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Michael S <already5chosen@yahoo.com> writes:

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

I explicitly called out setjmp and longjmp as being excluded.
Based on that, it's reasonable to infer the rest of the
standard library is allowed.

Furthermore I don't think it matters.

Hmm... I'm puzzled. Where does the unbounded store come from without
I/O? Do you take "C is Turing complete" to mean that there is a
theoretically possible implementation of C sufficient for any given
problem instance (rather than for any given problem)? That's not how
different models are usually compared, and I think it would run into
some rather odd theoretical problems.

There is a somewhat informal version of "C (with the restrictions you
have stated) is Turing complete" which just means "you can do anything
you want provided you don't hit an implementation limit".
--
Ben.

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Dec 23 20:55:04 2024

From Newsgroup: comp.lang.c

Ben Bacarisse <ben@bsb.me.uk> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

Michael S <already5chosen@yahoo.com> writes:

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

I explicitly called out setjmp and longjmp as being excluded.
Based on that, it's reasonable to infer the rest of the
standard library is allowed.

Furthermore I don't think it matters.

Hmm... I'm puzzled. Where does the unbounded store come from without
I/O? Do you take "C is Turing complete" to mean that there is a
theoretically possible implementation of C sufficient for any given
problem instance (rather than for any given problem)? That's not how
different models are usually compared, and I think it would run into
some rather odd theoretical problems.

Sorry, it seems my comment was misleading. I thought it was
apparent from the rest of my paragraph (not shown in your excerpt)
that my statement was meant as "Furthermore I don't think it matters
if _most_ of the standard library is excluded." There had been a
mention of printf as being infringing (which in my view is silly,
but never mind that), so I wanted to point out that most of the
standard library is irrelevant, including in particular [f]printf.

There is a somewhat informal version of "C (with the restrictions you
have stated) is Turing complete" which just means "you can do anything
you want provided you don't hit an implementation limit".

Yes, I'm familiar with that, and I knowingly glossed over the
distinction, because I think it's customary, when talking about
Turing Completeness relative to conventional programming languages,
to ignore the finiteness of conventional language models. I should
have known better with you in the audience. You got me! :)

From BGB@cr88192@gmail.com to comp.lang.c on Wed Dec 25 00:51:37 2024

From Newsgroup: comp.lang.c

On 12/23/2024 1:43 AM, David Brown wrote:

On 23/12/2024 03:41, Waldek Hebisch wrote:

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

The comments I made here, in two responses to postings of yours,
were not statements of opinion but statements of fact.

They are opinions _about facts_, or if you prefer, opinion
about truth value of some statements.

You can program in C without the "normal" conditional statements or expressions. You can make an array of two (or more) function pointers
and select between them using your controlling expression, and that
should be sufficient for conditionals. (There may be other methods too.)

So as far as I can see, Tim gave statements of fact, not opinion.
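
A minimal sketch of the function-pointer approach described above,
applied to the factorial example raised earlier in the thread. It
uses recursion and the != operator (neither of which Tim ruled out)
but no if, ?:, &&, ||, loops or goto. The names fac, fac_base,
fac_step and fac_table are invented for this illustration; this is
one possible reading, not necessarily the code Tim or Waldek had in
mind.

```c
typedef unsigned long (*fac_fn)(unsigned long);

static unsigned long fac_base(unsigned long n);
static unsigned long fac_step(unsigned long n);

/* Index 0 handles n == 0, index 1 handles every other n. */
static const fac_fn fac_table[2] = { fac_base, fac_step };

static unsigned long fac(unsigned long n)
{
    /* n != 0 yields 0 or 1 and selects the function to call. */
    return fac_table[n != 0](n);
}

static unsigned long fac_base(unsigned long n)
{
    (void)n;            /* 0! is 1 */
    return 1;
}

static unsigned long fac_step(unsigned long n)
{
    return n * fac(n - 1);
}
```

Calling fac(5) goes through fac_step five times and fac_base once.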

Jumping back in:
That one can do this seems obvious enough;
Downside, as I see it, is that there is no current or likely processor hardware where this is likely to be performance competitive with the
more traditional if-goto mechanism (and if the backend is expected to
optimize it away, not obvious what would be gained).

Sort of like with "continuation passing style":
Yes, you can do this, but the performance overhead relative to
conventional call-frames is severe.

But, CPS does at least have use-cases which can justify this overhead.

Though, FWIW, doing control flow via a combination of CPS and plugging
things together with function pointers is fairly useful in implementing
things like fast interpreters (where calling through function pointers
can be faster than going through big if/else trees or "switch()" blocks).

Where, early on in writing interpreters, I had often run into a limit
that the interpreter would become bottle-necked by how quickly it could
spin in a loop and feed instructions through a big "switch()" block.
Using function pointers can theoretically sidestep this limit (then one
is more limited by how quickly they can walk the trace graph and call
the relevant function pointers).

But, can get within 10x of native code in some cases, which is pretty
fast by interpreter standards (to get much faster usually requires a JIT).
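
A toy sketch of the function-pointer dispatch style described above,
as opposed to a central "switch()". Everything here (the Op and VM
structs, the op_* handlers, vm_run, demo_sum) is a made-up miniature
for illustration, not code from an actual interpreter. Each decoded
instruction carries a pointer to its handler, and each handler
returns the next instruction to run.

```c
#include <stddef.h>

typedef struct VM VM;
typedef struct Op Op;

struct Op {
    const Op *(*run)(VM *vm, const Op *op); /* handler; returns next op, or NULL to stop */
    long imm;                               /* immediate operand */
};

struct VM {
    long acc;   /* accumulator */
    long ctr;   /* loop counter */
};

static const Op *op_load(VM *vm, const Op *op)   { vm->acc = op->imm;  return op + 1; }
static const Op *op_setctr(VM *vm, const Op *op) { vm->ctr = op->imm;  return op + 1; }
static const Op *op_add(VM *vm, const Op *op)    { vm->acc += op->imm; return op + 1; }

/* Decrement the counter; jump by imm ops while it is nonzero. */
static const Op *op_loop(VM *vm, const Op *op)
{
    vm->ctr--;
    return vm->ctr != 0 ? op + op->imm : op + 1;
}

static const Op *op_halt(VM *vm, const Op *op)
{
    (void)vm; (void)op;
    return NULL;
}

/* The whole dispatch loop: no opcode decoding, just an indirect call. */
static long vm_run(const Op *prog)
{
    VM vm = { 0, 0 };
    const Op *op = prog;
    while (op != NULL)
        op = op->run(&vm, op);
    return vm.acc;
}

/* Adds 5 to the accumulator four times. */
static long demo_sum(void)
{
    static const Op prog[] = {
        { op_load,   0 },
        { op_setctr, 4 },
        { op_add,    5 },   /* index 2 */
        { op_loop,  -1 },   /* back to index 2 while ctr != 0 */
        { op_halt,   0 },
    };
    return vm_run(prog);
}
```

Whether this actually beats a switch depends on the compiler and the
target; the sketch only shows the shape of the technique.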

Well, except in my current emulator, where in trying to be
cycle-accurate, the much bigger overhead is in trying to mimic behavior
and cycle costs of the cache hierarchy and similar.

You can say that Tim's posts were patronising, arrogant, and irritating.
/That/ would be an opinion - a /justified/ opinion because it is
backed up in the evidence of these posts and corroborating evidence from previous posts and discussions from Tim. But without some kind of
precise definition of the terms involved and a robust and repeatable
method of classification, it could not be called "fact".

You could say that Tim's posts were intended to be annoying, or you
could say that he has refused to give an answer to how C can be used
without the "normal" conditionals because he realises he was wrong in
his posts and won't admit it. That would be /unjustified/ opinion - or "speculation" - because we have no way of knowing his motives or
anything more than what he wrote in his posts.

You could, quite fairly, characterise Tim's posts as unjustified
statements of fact - because he has stated his claim as fact, but has
given no justification or reasoning, and it is not something that is
obvious or well-known to people.

They are
no more statements of opinion than a statement about whether the
Riemann Hypothesis is true is a statement of opinion. Someone
might wonder whether an assertion "The Riemann Hypothesis is
true" is true or false, but it is still a matter of fact, not a
matter of opinion.

It is reasonable to assume that you do not know if Riemann Hypothesis
is true or false.

I think if anyone knew the truth or falsity of the Riemann Hypothesis - i.e., they had a proof one way or the other - we'd have heard about it!

So if you say "Riemann Hypothesis is true",
this is just your opinion.

No, that would not be an opinion. It would be an unjustified claim.
"I /believe/ the Riemann Hypothesis is true" is an opinion.

I am not a native English speaker
but I believed that "statements of opinion" means just that:
person does not know the truth, but makes a statement.

No, an opinion is a personal preference or judgement. That's very different from not knowing about something factual. If I say "the
number 17 will turn up in next week's lottery numbers", that's not an opinion, it's a claim about facts. It's an unjustified claim, since I don't know if it is true or not, but it's not an opinion.

It is not always clear when something is a fact or not, and whether a statement is a justified statement of fact, an unjustified statement of
fact (i.e., it might happen to be true, but you have not presented
evidence of it), a justified opinion, or an unjustified opinion. I'm
sure there's a philosophy group on Usenet somewhere, but I doubt if cross-posting there would lead to any clarification!


From BGB@cr88192@gmail.com to comp.lang.c on Wed Dec 25 03:41:41 2024

From Newsgroup: comp.lang.c

On 12/23/2024 3:18 PM, Tim Rentsch wrote:

Michael S <already5chosen@yahoo.com> writes:

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

I explicitly called out setjmp and longjmp as being excluded.
Based on that, it's reasonable to infer the rest of the
standard library is allowed.

Furthermore I don't think it matters. Except for a very small
set of functions -- eg, fopen, fgetc, fputc, malloc, free --
everything else in the standard library either isn't important
for Turing Completeness or can be synthesized from the base
set. The functionality of fprintf(), for example, can be
implemented on top of fputc and non-library language features.

If I were to choose a set of primitive functions, probably:
malloc/free and/or realloc
could define, say:
malloc(sz) => realloc(NULL, sz)
free(ptr) => realloc(ptr, 0)
Maybe _msize and _mtag/..., but this is non-standard.
With _msize, can implement realloc on top of malloc/free.

For basic IO:
fopen, fclose, fseek, fread, fwrite

printf could be implemented on top of vsnprintf and fputs
fputs can be implemented on top of fwrite (via strlen).
With a temporary buffer being used for the printed string.
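
A rough sketch of that layering, assuming a fixed 512-byte temporary
buffer (output longer than that is simply truncated here). The names
my_fputs and my_printf are placeholders so as not to clash with the
real library functions.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* fputs on top of fwrite, using strlen to find the length. */
static int my_fputs(const char *s, FILE *fp)
{
    size_t n = strlen(s);
    return fwrite(s, 1, n, fp) == n ? 0 : EOF;
}

/* printf on top of vsnprintf and my_fputs.
   Returns the number of characters written, or a negative value on error. */
static int my_printf(const char *fmt, ...)
{
    char buf[512];      /* temporary buffer for the formatted string */
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);

    if (n < 0)
        return n;
    if (n >= (int)sizeof buf)
        n = (int)sizeof buf - 1;    /* vsnprintf truncated the output */
    if (my_fputs(buf, stdout) == EOF)
        return -1;
    return n;
}
```

A real implementation would more likely grow the buffer or format in
chunks rather than truncate.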

...

Though, one may still end up with various other stuff over the interface
as well. Though, the interface can be made open-ended if one has a GetInterface call or similar, which can request other interfaces given
an ID, such as, FOURCC/EIGHTCC pair, a SIXTEENCC, or GUID (*1). IMHO, generally preferable over a "GetProcAddress" mechanism due to lower
overheads; though, with an annoyance that interface vtables generally
have a fixed layout (generally can't really add or change anything
without creating binary compatibility issues; so a lot of
tables/structures need to be kept semi-frozen).

Though, APIs like DirectX had dealt with the issue of having version
numbers for vtables and then one requests a specific version of the
vtable (within the range of versions supported by the major version of DirectX). But, this is crufty.

*1: Say: QWORD qwMajor, QWORD qwMinor.
qwMajor:
Major ID (FOURCC, EIGHTCC)
Or: First 8 bytes of SIXTEENCC or GUID
qwMinor:
SubID/Version (FOURCC or EIGHTCC)
Second 8 bytes of SIXTEENCC or GUID.
Where:
High 32 bits are 0, assume FOURCC.
Else, look at bits to determine EIGHTCC vs GUID.
Assume if both are EIGHTCC, value represents a SIXTEENCC.
Bit patterns for valid SIXTEENCCs vs GUIDs are mutually exclusive.
Names make more sense for public interfaces.
Leaving GUIDs mostly for private/internal interfaces.
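
A small sketch of the ID pair described in (*1). The struct name
iface_id, the helper names, and the byte order (first character in
the lowest byte) are assumptions made for this illustration. The post
does not spell out the bit patterns that separate EIGHTCC from GUID,
so that check is left out.

```c
#include <stdint.h>

/* Hypothetical interface ID: two 64-bit words as in the footnote. */
typedef struct {
    uint64_t qwMajor;   /* major ID, or first 8 bytes of SIXTEENCC/GUID */
    uint64_t qwMinor;   /* sub-ID/version, or second 8 bytes */
} iface_id;

/* Pack four characters; first character in the lowest byte (assumed). */
static uint64_t make_fourcc(char a, char b, char c, char d)
{
    return  (uint64_t)(unsigned char)a
         | ((uint64_t)(unsigned char)b << 8)
         | ((uint64_t)(unsigned char)c << 16)
         | ((uint64_t)(unsigned char)d << 24);
}

/* Pack the first eight characters of s the same way. */
static uint64_t make_eightcc(const char *s)
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; i++)
        v |= (uint64_t)(unsigned char)s[i] << (8 * i);
    return v;
}

/* "High 32 bits are 0, assume FOURCC." */
static int is_fourcc(uint64_t v)
{
    return (v >> 32) == 0;
}

/* Two IDs match when both words match. */
static int iface_id_equal(iface_id x, iface_id y)
{
    return x.qwMajor == y.qwMajor && x.qwMinor == y.qwMinor;
}
```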

Well, unlike Windows, where they use GUIDs for pretty much everything
here (and also, I didn't bother with an IDL compiler; generally doing
all this directly in C).

Well, and some wonk, like the exact contents of structures like BITMAPINFOHEADER being interpreted based on using biSize as a magic
number (well, sometimes with other stuff glued onto the end, as
understood based on the use of the biCompression field), ...

But, it has held up well, this structure being almost as old as I am...

In a few cases, one might also take the option of using a "DriverProc()"
style interface, where one provides a pair of context-dependent pointers
and uses magic numbers to identify the desired operation, or, intermediate:
(*ifvt)->QueryProc(ifvt, iHdl, lParm, pParm1, pParm2);
(*ifvt)->ModifyProc(ifvt, iHdl, lParm, pParm1, pParm2);

Where, QueryProc is intended for non-destructive operations, and
ModifyProc for destructive operations.
iHdl: Context-dependent integer handle;
lParm: Magic command number.
pParm1/pParm2: Magic pointers, often:
pParm1: Input data address;
pParm2: Output data address.

Where, vtable is usually provided in "VT **" form, hence the need to
deref the table before a method can be invoked.
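
To make the "VT **" shape concrete, here is a minimal sketch. Only
QueryProc, ModifyProc and the parameter names come from the post; the
types IfVt, IfObj and Counter, the command numbers, and the parameter
types are invented for the example and are not the actual interface.

```c
#include <stddef.h>

typedef struct IfVt IfVt;       /* hypothetical vtable type */
typedef const IfVt *IfObj;      /* an object starts with a vtable pointer */

struct IfVt {
    long (*QueryProc)(IfObj *self, int iHdl, long lParm, void *pParm1, void *pParm2);
    long (*ModifyProc)(IfObj *self, int iHdl, long lParm, void *pParm1, void *pParm2);
};

/* A concrete object; the vtable pointer must be the first member. */
typedef struct {
    const IfVt *vt;
    long value;
} Counter;

enum { CMD_GET = 1, CMD_ADD = 2 };  /* made-up magic command numbers */

/* Non-destructive: copy the counter value to the output address. */
static long counter_query(IfObj *self, int iHdl, long lParm, void *pParm1, void *pParm2)
{
    Counter *c = (Counter *)self;
    (void)iHdl; (void)pParm1;
    if (lParm != CMD_GET || pParm2 == NULL)
        return -1;
    *(long *)pParm2 = c->value;
    return 0;
}

/* Destructive: add the input value to the counter. */
static long counter_modify(IfObj *self, int iHdl, long lParm, void *pParm1, void *pParm2)
{
    Counter *c = (Counter *)self;
    (void)iHdl; (void)pParm2;
    if (lParm != CMD_ADD || pParm1 == NULL)
        return -1;
    c->value += *(const long *)pParm1;
    return 0;
}

static const IfVt counter_vt = { counter_query, counter_modify };

static long demo_counter(void)
{
    Counter c = { &counter_vt, 10 };
    IfObj *ifvt = (IfObj *)&c;      /* the "VT **" handle handed to clients */
    long delta = 5, out = 0;

    (*ifvt)->ModifyProc(ifvt, 0, CMD_ADD, &delta, NULL);
    (*ifvt)->QueryProc(ifvt, 0, CMD_GET, NULL, &out);
    return out;
}
```

The client only ever sees the IfObj pointer; the Counter layout stays
private to whoever implements the interface.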

Actually, some of this overlaps with how I had implemented the C library
for DLLs in my project:
Only the main binary has the full C library;
DLL's generally use a C library which calls back to the main C library
via a COM style interface (things like malloc/free and stdio calls are
routed over this interface).

Note that this is partly because in my case:
1, DLLs only allow an acyclic dependency graph;
2, The mechanism does not currently allow sharing global variables;
3, There was a desire to allow dlopen/dlsym to dynamically load libraries.

1 & 3 mean that if a statically-linked C library is used for the main
binary:
One needs to also statically link a C library to each DLL;
The C library needs to operate over a COM interface for shared interfaces.

Or, alternatively, that only a DLL may be used for the C library, and
all DLLs would need to use the same C library DLL.

Note that neither 1 nor 2 traditionally apply with ELF Shared Objects
(which usually both share everything and allow for cyclic dependency
graphs). But, traditionally ELF has other drawbacks, like needing to
access variables and call functions via a GOT (which has higher overhead
than direct calls, or accessing global variables as a fixed offset
relative to a known base register, ...).

Note that having the kernel inject DLLs into a running process wouldn't
really mix well with the way glibc approaches shared objects (where, it manages this stuff in userland, rather than having this left up to the kernel's program loader).

May not matter as much though as if providing a COM-like interface, one doesn't necessarily actually need dlopen/dlsym to be able to see the
symbols in the library that the interface came from.

Where, in this case, COM-like interfaces may be used in ways that
deviate from usual dependency ordering; and was more flexible. They are awkward to use directly, so it may make sense to provide C API wrappers
(thus far, usually statically linked, but they can fetch the interfaces
they need from the main C library or the OS).

Where, in my case, the OS interface is a mix of conventional syscalls
and object-method-calls routed over the syscall interface (the target
being either in the kernel or in another process; or the OS might load a
DLL into the client process and return a process-local vtable).

If non-local, generally the method pointers are generic, and serve to
forward the call over the syscall mechanism (the syscall interface being
used in a somewhat different way from how it would be used in something
like Linux; where Linux generally just does not do things this way...).

...


From BGB@cr88192@gmail.com to comp.lang.c on Wed Dec 25 15:43:29 2024

From Newsgroup: comp.lang.c

On 12/25/2024 3:41 AM, BGB wrote:

On 12/23/2024 3:18 PM, Tim Rentsch wrote:

Michael S <already5chosen@yahoo.com> writes:

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

I explicitly called out setjmp and longjmp as being excluded.
Based on that, it's reasonable to infer the rest of the
standard library is allowed.

Furthermore I don't think it matters. Except for a very small
set of functions -- eg, fopen, fgetc, fputc, malloc, free --
everything else in the standard library either isn't important
for Turing Completeness or can be synthesized from the base
set. The functionality of fprintf(), for example, can be
implemented on top of fputc and non-library language features.

If I were to choose a set of primitive functions, probably:
malloc/free and/or realloc
    could define, say:
      malloc(sz) => realloc(NULL, sz)
      free(ptr) => realloc(ptr, 0)
    Maybe _msize and _mtag/..., but this is non-standard.
      With _msize, can implement realloc on top of malloc/free.

For basic IO:
fopen, fclose, fseek, fread, fwrite

printf could be implemented on top of vsnprintf and fputs
fputs can be implemented on top of fwrite (via strlen).
With a temporary buffer being used for the printed string.

...

Though, one may still end up with various other stuff over the interface
as well. Though, the interface can be made open-ended if one has a GetInterface call or similar, which can request other interfaces given
an ID, such as, FOURCC/EIGHTCC pair, a SIXTEENCC, or GUID (*1). IMHO, generally preferable over a "GetProcAddress" mechanism due to lower overheads; though, with an annoyance that interface vtables generally
have a fixed layout (generally can't really add or change anything
without creating binary compatibility issues; so a lot of tables/
structures need to be kept semi-frozen).

Though, APIs like DirectX had dealt with the issue of having version
numbers for vtables and then one requests a specific version of the
vtable (within the range of versions supported by the major version of DirectX). But, this is crufty.

*1: Say: QWORD qwMajor, QWORD qwMinor.
qwMajor:
    Major ID (FOURCC, EIGHTCC)
    Or: First 8 bytes of SIXTEENCC or GUID
qwMinor:
    SubID/Version (FOURCC or EIGHTCC)
    Second 8 bytes of SIXTEENCC or GUID.
Where:
    High 32 bits are 0, assume FOURCC.
    Else, look at bits to determine EIGHTCC vs GUID.
    Assume if both are EIGHTCC, value represents a SIXTEENCC.
    Bit patterns for valid SIXTEENCCs vs GUIDs are mutually exclusive.
    Names make more sense for public interfaces.
      Leaving GUIDs mostly for private/internal interfaces.

Well, unlike Windows, where they use GUIDs for pretty much everything
here (and also, I didn't bother with an IDL compiler; generally doing
all this directly in C).

Clarification:
Though, despite taking influence from COM, it is not COM.

I am not using the COM API, and generally practices regarding vtable structure, etc, are a bit more loose.

There is also not currently any plan to actually implement the OLE or
COM APIs. Only that some similar ideas are in use.

Pretty much everything else is different...

COM uses a 16-byte struct to convey a GUID;
I was using pairs of 64-bit integer values.

...

Some ideas from OLE, such as storing object instances from one library
in a "document" held by an unrelated program instance, and then saving/reloading them later, are not a thing in my case.

It is possible I could consider doing something similar to OLE, but I
don't have an immediate use-case (and, more often, I was using the
object interfaces internally for things like OS level APIs).

Note that many core OS APIs are still a bit more mundane, like:
memory is still managed using pointers;
file IO is still managed using integer handles;
...

Though, within the kernel, open VFS files are implemented via objects
with vtable pointers. This detail is not exposed to program instances,
where the system calls identify them via integer handles.

Well, and also I am using a Unix style directory tree structure, rather
than drive letters.

But, does differ some in things like locating DLLs for a program:
ELF:
Either "/lib/", "/usr/lib/",
or a hard-coded path in the binary.
Win:
Check current directory;
Then search PATH;
TK:
Check first in the directory the EXE is found;
Then search LIBPATH;
Then search PATH.
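
The "TK" search order above could be sketched roughly as follows.
The function names, the ':' list separator, and the exists callback
(standing in for whatever file-existence check the loader really
uses) are all assumptions for illustration. demo_exists fakes a
filesystem that contains a single file.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*exists_fn)(const char *path);

/* Try name in each ':'-separated directory of list, in order.
   On success returns 1 and leaves the full path in out. */
static int search_list(const char *list, const char *name,
                       exists_fn exists, char *out, size_t outsz)
{
    while (list != NULL && *list != '\0') {
        size_t len = strcspn(list, ":");
        if (len > 0) {
            int n = snprintf(out, outsz, "%.*s/%s", (int)len, list, name);
            if (n > 0 && (size_t)n < outsz && exists(out))
                return 1;
        }
        list += len;
        if (*list == ':')
            list++;
    }
    return 0;
}

/* Directory of the EXE first, then LIBPATH, then PATH. */
static int find_dll(const char *exe_dir, const char *libpath, const char *path,
                    const char *name, exists_fn exists, char *out, size_t outsz)
{
    return search_list(exe_dir, name, exists, out, outsz)
        || search_list(libpath, name, exists, out, outsz)
        || search_list(path, name, exists, out, outsz);
}

/* Stand-in existence check for the demo: only one file "exists". */
static int demo_exists(const char *p)
{
    return strcmp(p, "/opt/lib/foo.dll") == 0;
}
```

When nothing is found, out just holds the last candidate that was
tried, so callers should only trust it when the return value is 1.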

Hard coding paths in the binary does mean though that the installation
path for any binaries that depend on custom SO's is fixed, which is not ideal. Checking relative to the binary allows more flexible installation paths.

Well, and some wonk, like the exact contents of structures like BITMAPINFOHEADER being interpreted based on using biSize as a magic
number (well, sometimes with other stuff glued onto the end, as
understood based on the use of the biCompression field), ...

But, it has held up well, this structure being almost as old as I am...

Clarification:
I am towards the older end of the Millennial / Gen Y age range...

Started existence in the IBM Clones and MS-DOS era, but by the time I
was using computers, was mostly in the era of Windows, CD-ROM based FMV
games, and early/slow internet.

In a few cases, one might also take the option of using a "DriverProc()" style interface, where one provides a pair of context-dependent pointers
and uses magic numbers to identify the desired operation, or, intermediate:
(*ifvt)->QueryProc(ifvt, iHdl, lParm, pParm1, pParm2);
(*ifvt)->ModifyProc(ifvt, iHdl, lParm, pParm1, pParm2);

Where, QueryProc is intended for non-destructive operations, and
ModifyProc for destructive operations.
iHdl: Context-dependent integer handle;
lParm: Magic command number.
pParm1/pParm2: Magic pointers, often:
    pParm1: Input data address;
    pParm2: Output data address.

Where, vtable is usually provided in "VT **" form, hence the need to
deref the table before a method can be invoked.

Well, theoretically, say:
First 4 pointers are reserved;
Used internally for various stuff;
Methods take the object instance as the first argument;
...

The pointer itself points to an object instance, which may often be a dummy.
Then, the object starts with a pointer to the vtable.
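
To make the "VT **" convention concrete, here is a small self-contained sketch. It is not the actual interface: the type names (DemoVT, DemoObj), the command number, and the return-code convention are invented for the example. Only the shape (reserved slots first, object starting with its vtable pointer, call through a double dereference) follows the description above.

```c
#include <stddef.h>

typedef struct DemoVT DemoVT;

/* Hypothetical vtable layout: 4 reserved slots, then the methods. */
struct DemoVT {
    void *reserved[4];
    int (*QueryProc)(DemoVT **self, int iHdl, long lParm,
                     void *pParm1, void *pParm2);
    int (*ModifyProc)(DemoVT **self, int iHdl, long lParm,
                      void *pParm1, void *pParm2);
};

/* The object starts with its vtable pointer, so &obj->vt is a "VT **". */
typedef struct {
    DemoVT *vt;
    int value;
} DemoObj;

enum { DEMO_CMD_VALUE = 1 };  /* made-up "magic command number" */

/* Non-destructive: copy the stored value to the output address. */
static int demo_query(DemoVT **self, int iHdl, long lParm,
                      void *pParm1, void *pParm2)
{
    DemoObj *obj = (DemoObj *)self;  /* vt is the first member */
    (void)iHdl; (void)pParm1;
    if (lParm != DEMO_CMD_VALUE || pParm2 == NULL)
        return -1;
    *(int *)pParm2 = obj->value;
    return 0;
}

/* Destructive: overwrite the stored value from the input address. */
static int demo_modify(DemoVT **self, int iHdl, long lParm,
                       void *pParm1, void *pParm2)
{
    DemoObj *obj = (DemoObj *)self;
    (void)iHdl; (void)pParm2;
    if (lParm != DEMO_CMD_VALUE || pParm1 == NULL)
        return -1;
    obj->value = *(const int *)pParm1;
    return 0;
}

static DemoVT demo_vt = {
    { NULL, NULL, NULL, NULL }, demo_query, demo_modify
};

/* Set up caller-provided storage and hand back the interface pointer. */
DemoVT **demo_init(DemoObj *storage, int initial)
{
    storage->vt = &demo_vt;
    storage->value = initial;
    return &storage->vt;
}
```

With this, a caller holding `DemoVT **ifvt` writes `(*ifvt)->QueryProc(ifvt, 0, DEMO_CMD_VALUE, NULL, &out)`, which is the same call shape as shown earlier.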

Actually, some of this overlaps with how I had implemented the C library
for DLLs in my project:
Only the main binary has the full C library;
DLLs generally use a C library which calls back to the main C library
via a COM style interface (things like malloc/free and stdio calls are routed over this interface).

Looking back at it, this may not count as "COM like", as, many of the
vtable pointers deviate from the traditional form:
Don't take the object as a first argument;
Many are just plain C function pointers.
Eg: ptr=(*vt)->malloc_fp(sz);

Then again, for marshaling the C library across DLL boundaries, it
likely does not matter (and would have complicated the interface;
requiring method pointers to take a vtable pointer only to ignore it; as conceptually the C library is global across the whole process instance).

Note that this is partly because in my case:
1, DLLs only allow an acyclic dependency graph;
2, The mechanism does not currently allow sharing global variables;
3, There was a desire to allow dlopen/dlsym to dynamically load libraries.

1 & 3 mean that if a statically-linked C library is used for the main binary:
One needs to also statically link a C library to each DLL;
The C library needs to operate over a COM interface for shared interfaces.

Or, alternatively, that only a DLL may be used for the C library, and
all DLLs would need to use the same C library DLL.

Groan, I had described it as a COM interface, but as noted above, this
is not correct in this case...

The vtable in question isn't even close to following COM patterns.

How closely patterns are followed is kinda variable.

But, as noted, if no shared interface were used, then each DLL (and the
main program binary) would effectively have their own heap and could not
share "FILE *" pointers.

While Windows generally has this limitation (at least with MSVC), I
personally didn't want this (better if one can "malloc()" something in
one library and "free()" it in another, and not horribly break the C
runtime).

Cygwin and MinGW had addressed this issue in different ways (say, in the
case of Cygwin, by consolidating all of the core stuff into "cygwin1.dll").

Can note that in my case, each binary image still gets its own native
copy of things like memcpy/memset/strlen/...

These don't depend on any external state, and generally one wants a low-overhead interface for these (along with potential of special
handling by the compiler).

I had at one point considered writing a new C library where this stuff
would have been engineered better, but this fizzled due to inertia. In
this case, the library would have been fully split into "client" and "server" parts:
"client": Has all the recognizable parts of the C library;
"server": Backend where all the magic happens (the core parts of malloc
and stdio and similar reside here).

In such a library, a lot of the client-side calls would be wrappers, say:
void *malloc(size_t size)
{
    __clib_autoinit(); //bring up C library if needed
    return((*__clib_vta)->Malloc(__clib_vta, size));
}

It is unclear here if the server would still be static linked to the
main EXE, or if it would be a component that is dynamically loaded by
the kernel as needed during process creation (this could make the main
EXE smaller);

or, instead, going the route of having the C library server part inside
of a common DLL (like in Cygwin).

At present, there is a pointer in the task context structure that is set
by the main binary to allow for the DLLs' C libraries to bootstrap
themselves (initially a "GetProcAddress" function, that at present only
serves to fetch the main instance vtable pointer).

Note that some libraries, like TKGDI (used for graphics/sound/user-input),
are themselves mostly thin wrappers over a vtable on the client side
(though, with some of their own logic, as data needs to be passed over the interface in "GlobalAlloc" memory buffers; as the server is generally
running in a different process).

TKRA-GL (my OpenGL implementation) also internally uses a vtable
structure, but slightly different:
Most of the normal OpenGL calls are handled on the client side;
The vtable essentially mostly handles things like texture uploads and
the backend logic for glDrawArrays / glDrawElements calls.

Note that the "glBegin()"/"glEnd()" interface exists primarily as a
wrapper over the "glDrawArrays()" mechanism.

Currently the backend parts run in the kernel, but it is tempting to
consider folding it off to a dynamically loadable module as it adds significant bulk (and is not always needed).

Say, for example, only loading the OpenGL DLL kernel-side if a user
program tries to create an instance of OpenGL.

Likewise, maybe work towards further separating the client and server
parts of the GL implementation, as there was not a split initially. Not
really sure how it usually works in other systems, this stuff doesn't
seem well documented (though, generally seems like, at least on Windows: "opengl32.dll" wraps GPU vendor provided DLL, which then does whatever,
to communicate with the backend driver).

Note that neither 1 nor 2 traditionally apply with ELF Shared Objects
(which usually both share everything and allow for cyclic dependency graphs). But, traditionally ELF has other drawbacks, like needing to
access variables and call functions via a GOT (which has higher overhead than direct calls, or accessing global variables as a fixed offset
relative to a known base register, ...).

Note that having the kernel inject DLLs into a running process wouldn't really mix well with the way glibc approaches shared objects (where, it manages this stuff in userland, rather than having this left up to the kernel's program loader).

May not matter as much though, since if providing a COM-like interface, one doesn't necessarily actually need dlopen/dlsym to be able to see the
symbols in the library that the interface came from.

Where, in this case, COM-like interfaces may be used in ways that
deviate from the usual dependency ordering, and are more flexible. They are awkward to use directly, so it may make sense to provide C API wrappers (thus far, usually statically linked, but they can fetch the interfaces
they need from the main C library or the OS).

Where, in my case, the OS interface is a mix of conventional syscalls
and object-method-calls routed over the syscall interface (the target
being either in the kernel or in another process; or the OS might load a
DLL into the client process and return a process-local vtable).

If non-local, generally the method pointers are generic, and serve to forward the call over the syscall mechanism (the syscall interface being used in a somewhat different way from how it would be used in something
like Linux; where Linux generally just does not do things this way...).

Can note that in trying to get glibc ELF binaries to work on my stuff, effectively there is a separate syscall interface that mimics the Linux syscall interface.

But, likely, these would represent a different "ecosystem" in terms of
the binaries (besides just the ELF / PE differences).

--- Synchronet 3.20a-Linux NewsLink 1.114

From Rosario19@Ros@invalid.invalid to comp.lang.c on Thu Dec 26 13:16:57 2024

From Newsgroup: comp.lang.c

On Mon, 16 Dec 2024 21:22:31 -0000 (UTC), Lawrence D'Oliveiro <>
wrote:

On Sun, 15 Dec 2024 20:08:53 +0100, Bonita Montero wrote:

C++ is more readable because it is magnitudes more expressive than C.

My position is that everything is built from the instructions that are
easiest for both the CPU and the human: if-goto.

Some say that if-call is better, but to me it is not very readable;
it is worse than if-goto.

And it is certainly more surprising than C. Often unpleasantly so.

You can easily write a C++-statement that would take hundreds of lines in C

Yes, but *which* hundreds of lines, exactly, would be the correct C equivalent?

--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Dec 28 09:20:23 2024

From Newsgroup: comp.lang.c

BGB <cr88192@gmail.com> writes:

On 12/23/2024 1:43 AM, David Brown wrote:

On 23/12/2024 03:41, Waldek Hebisch wrote:

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

The comments I made here, in two responses to postings of yours,
were not statements of opinion but statements of fact.

They are opinions _about facts_, or if you prefer, opinion
about truth value of some statements.

You can program in C without the "normal" conditional statements or
expressions. You can make an array of two (or more) function
pointers and select between them using your controlling expression,
and that should be sufficient for conditionals. (There may be other
methods too.)

So as far as I can see, Tim gave statements of fact, not opinion.
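
For readers who want to see the technique spelled out, a minimal sketch follows. It was written for illustration and is not taken from any post in the thread; the function names are invented. It selects between two functions by indexing an array with the 0-or-1 result of a comparison, so no `if`, `?:`, `&&`, `||`, or loop appears.

```c
typedef int (*int_fn)(int);

static int keep(int x)   { return x; }
static int negate(int x) { return -x; }

/* Absolute value with the "branch" done by table lookup.
 * (x < 0) evaluates to 0 or 1, which picks keep or negate.
 * Like abs(), the result is undefined for the most negative int. */
int abs_no_branch(int x)
{
    static const int_fn table[2] = { keep, negate };
    return table[x < 0](x);
}
```

Whether a compiler turns this into an indirect call or folds it back into an ordinary conditional is a separate matter from whether the construct is expressible.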

Jumping back in:
That one can do this seems obvious enough;
Downside, as I see it, is that there is no current or likely
processor hardware where this is likely to be performance
competitive with the more traditional if-goto mechanism [...]

Irrelevant to the issue being discussed.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Dec 28 09:24:16 2024

From Newsgroup: comp.lang.c

BGB <cr88192@gmail.com> writes:

On 12/23/2024 3:18 PM, Tim Rentsch wrote:

Michael S <already5chosen@yahoo.com> writes:

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

I explicitly called out setjmp and longjmp as being excluded.
Based on that, it's reasonable to infer the rest of the
standard library is allowed.

Furthermore I don't think it matters. Except for a very small
set of functions -- eg, fopen, fgetc, fputc, malloc, free --
everything else in the standard library either isn't important
for Turing Completeness or can be synthesized from the base
set. The functionality of fprintf(), for example, can be
implemented on top of fputc and non-library language features.

If I were to choose a set of primitive functions, probably:
malloc/free and/or realloc
could define, say:
malloc(sz) => realloc(NULL, sz)
free(ptr) => realloc(ptr, 0)
Maybe _msize and _mtag/..., but this is non-standard.
With _msize, can implement realloc on top of malloc/free.

For basic IO:
fopen, fclose, fseek, fread, fwrite

printf could be implemented on top of vsnprintf and fputs
fputs can be implemented on top of fwrite (via strlen).
With a temporary buffer being used for the printed string.

Most of these aren't needed. I think everything can be
done using only fopen, fclose, fgetc, fputc, and feof.
--- Synchronet 3.20a-Linux NewsLink 1.114

From BGB@cr88192@gmail.com to comp.lang.c on Sat Dec 28 13:59:24 2024

From Newsgroup: comp.lang.c

On 12/28/2024 11:24 AM, Tim Rentsch wrote:

BGB <cr88192@gmail.com> writes:

On 12/23/2024 3:18 PM, Tim Rentsch wrote:

Michael S <already5chosen@yahoo.com> writes:

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

I explicitly called out setjmp and longjmp as being excluded.
Based on that, it's reasonable to infer the rest of the
standard library is allowed.

Furthermore I don't think it matters. Except for a very small
set of functions -- eg, fopen, fgetc, fputc, malloc, free --
everything else in the standard library either isn't important
for Turing Completeness or can be synthesized from the base
set. The functionality of fprintf(), for example, can be
implemented on top of fputc and non-library language features.

If I were to choose a set of primitive functions, probably:
malloc/free and/or realloc
could define, say:
malloc(sz) => realloc(NULL, sz)
free(ptr) => realloc(ptr, 0)
Maybe _msize and _mtag/..., but this is non-standard.
With _msize, can implement realloc on top of malloc/free.

For basic IO:
fopen, fclose, fseek, fread, fwrite

printf could be implemented on top of vsnprintf and fputs
fputs can be implemented on top of fwrite (via strlen).
With a temporary buffer being used for the printed string.

Most of these aren't needed. I think everything can be
done using only fopen, fclose, fgetc, fputc, and feof.

If you only have fgetc and fputc, IO speeds are going to be unacceptably
slow for non-trivial file sizes.

If you try to fake fseek by closing, re-opening, and an fgetc loop,
well, also going to be very slow.

Then again, fgetc/fputc as the primary operations could make sense for
text files if the implementation is doing some form of format conversion
(such as converting between LF only and CR+LF), though admittedly IMO
one is better off treating text files as equivalent to binary files (and letting the application deal with any conversions here).

OTOH:
fgetc and fputc can be implemented via fread and fwrite;
feof (for normal files) can be implemented via fseek (*1);
Similar, ftell could be treated as a special case of fseek.

*1: Say, if the internal fseek call were made to return the current file position (similar to lseek).
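
A rough sketch of these mappings, for illustration only. The `my_` names are invented, and since standard fseek() does not return the position, the end-of-file check below uses ftell() instead of the lseek-like variant suggested in *1. It only makes sense for seekable regular files (ideally opened in binary mode), and it reports "positioned at end" rather than reproducing feof()'s exact "a read already hit EOF" semantics.

```c
#include <stdio.h>

/* fgetc in terms of fread: one byte, or EOF on failure. */
int my_fgetc(FILE *f)
{
    unsigned char c;
    return fread(&c, 1, 1, f) == 1 ? (int)c : EOF;
}

/* fputc in terms of fwrite: returns the byte written, or EOF. */
int my_fputc(int ch, FILE *f)
{
    unsigned char c = (unsigned char)ch;
    return fwrite(&c, 1, 1, f) == 1 ? (int)c : EOF;
}

/* End-of-file check via seeking: compare the current position with the
 * end position, then restore the current position.
 * Returns nonzero when positioned at (or past) the end of the file. */
int my_at_end(FILE *f)
{
    long cur = ftell(f);
    long end;

    if (cur < 0 || fseek(f, 0L, SEEK_END) != 0)
        return 0;                 /* not seekable: report "not at end" */
    end = ftell(f);
    fseek(f, cur, SEEK_SET);      /* put the position back */
    return cur >= end;
}
```

The per-byte fread/fwrite calls still go through the stream's buffering, so this direction of the layering does not by itself impose the per-character cost discussed above.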

...

Well, in another area, I was also recently left facing off with the wonk of UTF-8 normalization for the VFS layer in my project (for paths/filenames).
Options:
  Do nothing: assume valid UTF-8 and that it is sensibly normalized;
    May risk malformed encodings at deeper levels of the VFS though.
  Encoding-only normalization:
    Normalize to an M-UTF-8 variant and call it done.
  Do a subset of normalizing combining characters:
    The full set of Unicode rules would likely be too bulky;
    Filesystem should have no concept of locale;
    The rules should ideally be "semi frozen" once defined.

At present, this is applied at the level of VFS syscalls (like "open()"
or "opendir()").

Current thinking is that it will normalize to a variant of M-UTF-8 NFC (characters are stored in composed forms), but:
Will only apply the rules covering the Latin-1 and Latin Extended A
spaces, and a subset of Latin Extended B.

Though, a case could be made for limiting the scope solely to the
Latin-1/1252 range (and passing everything beyond this along as-is).

Less sure, had also added cases for the Roman numeral characters, mostly
for decomposing them into ASCII; various ligatures would also be
decomposed to ASCII (excluding those which appear as their own glyph, so
AE and OE are left as-is, but IJ/DZ/... would be decomposed). A case
could also be made for leaving these alone (passing them along
unmodified). Depends mostly on the open question of whether or not these convey relevant semantic information (or are merely historical/aesthetic).

At present, the rules are stored as a table, with roughly 8 bytes needed
per combiner rule (increases to 12 once initialized, mostly because it allocates a pair of 16-bit hash chains).
Namely: SrcCodepoint1, SrcCodepoint2, DstCodepoint, Flags
Flags specify when and how the rule is applied.
SrcCodepoint2 is currently 0x0000 for simple conversion rules.
DstCodepoint is used for lookup for decompose.
...
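
As a sketch of what such a rule table might look like (my guess at a layout that matches the 8 bytes per rule described above, not the actual code): four 16-bit fields per rule, with a plain linear scan standing in for the hash chains. The struct name, the flag value, and the three sample rules are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t src1;   /* SrcCodepoint1 */
    uint16_t src2;   /* SrcCodepoint2; 0x0000 for simple conversion rules */
    uint16_t dst;    /* DstCodepoint; also the lookup key for decompose */
    uint16_t flags;  /* when and how the rule is applied */
} NormRule;          /* 4 x 16 bits: 8 bytes on typical ABIs */

#define NORM_COMPOSE 0x0001u  /* made-up flag bit for this sketch */

static const NormRule demo_rules[] = {
    { 0x0041, 0x0300, 0x00C0, NORM_COMPOSE },  /* A + combining grave */
    { 0x0065, 0x0301, 0x00E9, NORM_COMPOSE },  /* e + combining acute */
    { 0x006E, 0x0303, 0x00F1, NORM_COMPOSE },  /* n + combining tilde */
};

#define DEMO_RULE_COUNT (sizeof demo_rules / sizeof demo_rules[0])

/* Compose: return the combined code point for (a, b), or 0 if no rule. */
uint16_t norm_compose(uint16_t a, uint16_t b)
{
    size_t i;
    for (i = 0; i < DEMO_RULE_COUNT; i++) {
        const NormRule *r = &demo_rules[i];
        if ((r->flags & NORM_COMPOSE) && r->src1 == a && r->src2 == b)
            return r->dst;
    }
    return 0;
}

/* Decompose: find the rule whose destination is 'c', or NULL if none. */
const NormRule *norm_decompose(uint16_t c)
{
    size_t i;
    for (i = 0; i < DEMO_RULE_COUNT; i++)
        if (demo_rules[i].dst == c)
            return &demo_rules[i];
    return NULL;
}
```

16-bit fields only cover the Basic Multilingual Plane, which is enough for the Latin ranges mentioned here.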

Limiting the scope also makes things likely more repeatable (where inconsistent normalization could result in file lookup issues in cases
where rules differ, if stepping on the offending code-points). Goal is
mostly to find an acceptable set of rules that can be "mostly frozen".
Though, in most cases this is likely N/A as the majority of filenames
tend to be plain ASCII.

The responsibility for any more advanced normalization (or
locale-dependent stuff) would be left up at the application level.

Can't seem to find much information about "best practices" in these areas.

It is not certain normalizing for combining characters is actually a
good idea, vs only normalizing for codepoint encoding. Mostly to deal
with cases where malformed data is submitted to the VFS, or possibly
1252 (if the VFS calls and similar are given something that is invalid
UTF-8, then it may be assumed to be 1252). Theoretically, the locale
code in the C library is expected to normalize for 1252 vs UTF-8 though
(but, ideally, the integrity of the VFS should be kept protected from
this sort of thing).

This also applies to console printing, which is also expected to be
handed UTF-8, but may also normalize the strings. Though, there is some
wonk with the console here in my case.

Seemingly (from what I can gather):
  Linux:
    It is per FS driver;
    Some are "do nothing", others normalize.
  MacOS:
    Also depends on filesystem:
      HFS/HFS+, normalizing (as NFD for some reason);
      APFS, does nothing (apparently leads to a lot of hassles).
  Windows:
    FAT32: Depends solely on OS locale;
    NTFS: Locale rules are baked-in when the drive is formatted.
      The relevant tables are held in filesystem metadata.

...

--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Dec 31 04:57:58 2024

From Newsgroup: comp.lang.c

BGB <cr88192@gmail.com> writes:

On 12/28/2024 11:24 AM, Tim Rentsch wrote:

BGB <cr88192@gmail.com> writes:

On 12/23/2024 3:18 PM, Tim Rentsch wrote:

Michael S <already5chosen@yahoo.com> writes:

On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:

And Tim did not rule out using the standard library,

Are you sure?

I explicitly called out setjmp and longjmp as being excluded.
Based on that, it's reasonable to infer the rest of the
standard library is allowed.

Furthermore I don't think it matters. Except for a very small
set of functions -- eg, fopen, fgetc, fputc, malloc, free --
everything else in the standard library either isn't important
for Turing Completeness or can be synthesized from the base
set. The functionality of fprintf(), for example, can be
implemented on top of fputc and non-library language features.

If I were to choose a set of primitive functions, probably:
malloc/free and/or realloc
could define, say:
malloc(sz) => realloc(NULL, sz)
free(ptr) => realloc(ptr, 0)
Maybe _msize and _mtag/..., but this is non-standard.
With _msize, can implement realloc on top of malloc/free.

For basic IO:
fopen, fclose, fseek, fread, fwrite

printf could be implemented on top of vsnprintf and fputs
fputs can be implemented on top of fwrite (via strlen).
With a temporary buffer being used for the printed string.

Most of these aren't needed. I think everything can be
done using only fopen, fclose, fgetc, fputc, and feof.

If you only have fgetc and fputc, IO speeds are going to be
unacceptably slow for non-trivial file sizes.

Once again, any performance concerns are not relevant to the
matter under discussion.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Jan 4 11:18:07 2025

From Newsgroup: comp.lang.c

antispam@fricas.org (Waldek Hebisch) writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

antispam@fricas.org (Waldek Hebisch) writes:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via goto.

A 'goto' may be used but it isn't strictly *necessary*. What *is*
necessary, though, that is an 'if' (some conditional branch), and
either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not strictly
necessary either.

No? - Can you give an example of your statement?

Look at example that I posted (apparently neither you nor Tim
looked at my posts [...]

What makes you think I didn't?

I made the same claim as you earlier and gave examples. You
did not acknowledge my posts. Why? For me most natural
explanation is that you did not read them.

You should revise your inference heuristics. There are any
number of reasons why I might not have referred to your
comments. Furthermore your conclusion is incorrect.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Sat Jan 4 12:12:15 2025

From Newsgroup: comp.lang.c

antispam@fricas.org (Waldek Hebisch) writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

The comments I made here, in two responses to postings of yours,
were not statements of opinion but statements of fact.

They are opinions _about facts_, or if you prefer, opinion
about truth value of some statements.

They are
no more statements of opinion than a statement about whether the
Riemann Hypothesis is true is a statement of opinion. Someone
might wonder whether an assertion "The Riemann Hypothesis is
true" is true or false, but it is still a matter of fact, not a
matter of opinion.

It is reasonable to assume that you do not know if Riemann Hypothesis
is true or false. So if you say "Riemann Hypothesis is true",
this is just your opinion. I am not a native English speaker
but I believed that "statements of opinion" means just that:
person does not know the truth, but makes a statement.

A statement of opinion is a statement concerning a subjective
question, such as "Do cats make better pets than dogs?" A
statement of opinion isn't ever right or wrong or true or false,
it merely expresses an individual point of view. Most statements
that have a word like "should" or "good" or "bad" or "better",
etc., are statements of opinion. That can change if the
qualifying words are given precise and objective definitions, but
in most cases they have not been.

A statement of fact is a statement concerning an objective question,
such as "Is every even number greater than 4 the sum of two prime
numbers?". A statement of fact can be right or wrong or true or
false, even if it isn't known at the present time which of those is
the case. The statement "Four colors suffice to color any planar
map such that adjacent regions do not have the same color" is a
statement of fact, both now and 60 years ago before the statement
had been proven. Both P==NP and P!=NP are statements of fact, even
though one of them must certainly be false; the key property is
that they are objective statements, subject to falsification. If I
say "The Earth is flat", that is a statement of fact, even though
the statement is false.

In any case, my statements about a particular subset of C being
Turing Complete were statements of fact, and also true statements.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Chris M. Thomasson@chris.m.thomasson.1@gmail.com to comp.lang.c on Sat Jan 4 12:53:01 2025

From Newsgroup: comp.lang.c

On 1/4/2025 12:12 PM, Tim Rentsch wrote:

antispam@fricas.org (Waldek Hebisch) writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

The comments I made here, in two responses to postings of yours,
were not statements of opinion but statements of fact.

They are opinions _about facts_, or if you prefer, opinion
about truth value of some statements.

They are
no more statements of opinion than a statement about whether the
Riemann Hypothesis is true is a statement of opinion. Someone
might wonder whether an assertion "The Riemann Hypothesis is
true" is true or false, but it is still a matter of fact, not a
matter of opinion.

It is reasonable to assume that you do not know if Riemann Hypothesis
is true or false. So if you say "Riemann Hypothesis is true",
this is just your opinion. I am not a native English speaker
but I believed that "statements of opinion" means just that:
person does not know the truth, but makes a statement.

A statement of opinion is a statement concerning a subjective
question, such as "Do cats make better pets than dogs?"

sometimes, why do cats seem to own their owners?

;^)

[...]
--- Synchronet 3.20a-Linux NewsLink 1.114

From Ben Bacarisse@ben@bsb.me.uk to comp.lang.c on Sun Jan 5 11:18:03 2025

From Newsgroup: comp.lang.c

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

A statement of fact is a statement concerning an objective question,
such as "Is every even number greater than 4 the sum of two prime
numbers?". A statement of fact can be right or wrong or true or
false, even if it isn't known at the present time which of those is
the case. The statement "Four colors suffice to color any planar
map such that adjacent regions do not have the same color" is a
statement of fact, both now and 60 years ago before the statement
had been proven. Both P==NP and P!=NP are statements of fact, even
though one of them must certainly be false; the key property is
that they are objective statements, subject to falsification. If I
say "The Earth is flat", that is a statement of fact, even though
the statement is false.

I think you go too far. The word "fact" is not neutral as far as its
truth is concerned, and writing "a statement of fact" does not
significantly change that. Most dictionaries define a fact as something
that is true (or at least supported by currently available evidence).
One online essay[1] concludes that

"A statement of fact is one that has objective content and is
well-supported by the available evidence."

[1] https://philosophersmag.com/the-fact-opinion-distinction/
--
Ben.
--- Synchronet 3.20a-Linux NewsLink 1.114

From James Kuyper@jameskuyper@alumni.caltech.edu to comp.lang.c on Sun Jan 5 12:04:41 2025

From Newsgroup: comp.lang.c

On 1/5/25 06:18, Ben Bacarisse wrote:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

A statement of fact is a statement concerning an objective question,
such as "Is every even number greater than 4 the sum of two prime
numbers?". A statement of fact can be right or wrong or true or
false, even if it isn't known at the present time which of those is
the case. The statement "Four colors suffice to color any planar
map such that adjacent regions do not have the same color" is a
statement of fact, both now and 60 years ago before the statement
had been proven. Both P==NP and P!=NP are statements of fact, even
though one of them must certainly be false; the key property is
that they are objective statements, subject to falsification. If I
say "The Earth is flat", that is a statement of fact, even though
the statement is false.

I think you go too far. The word "fact" is not neutral as far as its
truth is concerned, and writing "a statement of fact" does not
significantly change that. Most dictionaries define a fact as something
that is true (or at least supported by currently available evidence).
One online essay[1] concludes that

"A statement of fact is one that has objective content and is
well-supported by the available evidence."

[1] https://philosophersmag.com/the-fact-opinion-distinction/

In US constitutional law, there is the concept of "False statements of
fact". The distinction is important in that context because they have
less protection under the First Amendment than true statements of fact.
They still have some protection, but not if they are defamatory, false advertising, or commercial speech.

--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Tue Jan 7 21:38:38 2025

From Newsgroup: comp.lang.c

Ben Bacarisse <ben@bsb.me.uk> writes:

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

A statement of fact is a statement concerning an objective question,
such as "Is every even number greater than 4 the sum of two prime
numbers?". A statement of fact can be right or wrong or true or
false, even if it isn't known at the present time which of those is
the case. The statement "Four colors suffice to color any planar
map such that adjacent regions do not have the same color" is a
statement of fact, both now and 60 years ago before the statement
had been proven. Both P==NP and P!=NP are statements of fact, even
though one of them must certainly be false; the key property is
that they are objective statements, subject to falsification. If I
say "The Earth is flat", that is a statement of fact, even though
the statement is false.

I think you go too far. The word "fact" is not neutral as far as its
truth is concerned, and writing "a statement of fact" does not
significantly change that. Most dictionaries define a fact as something
that is true (or at least supported by currently available evidence).
One online essay[1] concludes that

"A statement of fact is one that has objective content and is
well-supported by the available evidence."

[1] https://philosophersmag.com/the-fact-opinion-distinction/

I will concede that the phrase "statement of fact" can be used in
the sense you describe.

I believe it is also true that "statement of fact" is used in the
sense I describe, and that sense appears among the alternatives in
various well-regarded dictionaries.

In any case, my point was not to have a debate about the meaning of
a phrase, but to clarify the intended meaning of my earlier remarks.
I was making a statement about an objective question, one subject to independent verification or falsification. I was not offering a
comment that was merely expressing a personal point of view.
--- Synchronet 3.20a-Linux NewsLink 1.114

From Tim Rentsch@tr.17687@z991.linuxsc.com to comp.lang.c on Mon Jan 13 08:10:31 2025

From Newsgroup: comp.lang.c

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 21.12.2024 22:51, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 21.12.2024 02:28, Tim Rentsch wrote:

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

On 16.12.2024 00:53, BGB wrote:

[...]

Pretty much all higher level control flow can be expressed via
goto.

A 'goto' may be used but it isn't strictly *necessary*. What
*is* necessary, though, that is an 'if' (some conditional
branch), and either 'goto' or recursive functions.

Conditional branches, including 'if', '?:', etc., are not
strictly necessary either.

No? - Can you give an example of your statement?

(Unless you just wanted to say that in some HLL abstraction like
'printf("Hello world!\n")' there's no [visible] conditional
branch. Likewise in a 'ClearAccumulator' machine instruction, or
the like.)

The comparisons and predicates are one key function (not any
specific branch construct, whether on HLL level, assembler
level, or with the (elementary but most powerful) Turing
Machine). Comparisons inherently result in predicates which is
what controls program execution).

So your statement asks for some explanation at least.

Start with C - any of C90, C99, C11.

Take away the short-circuiting operators - &&, ||, ?:.

Take away all statement types that involve intra-function
transfer of control: goto, break, continue, if, for, while,
switch, do/while. Might as well take away statement labels too.

Take away setjmp and longjmp.

And also things like the above mentioned 'printf()' that most
certainly implies an iteration over the format string checking for
its '\0'-end.

The *printf() functions can be implemented in standard C, under the
above stated limitations, without needing iteration.
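
One way this could look for the simplest piece of printf-like output, writing a string with fputc, is sketched below. This is an illustration written for this point, not code posted in the thread, and the function names are invented. It uses no loop, `if`, `?:`, `&&`, or `||`: the end-of-string test indexes a table of function pointers, and recursion replaces iteration.

```c
#include <stdio.h>

typedef int (*emit_fn)(const char *s, FILE *f);

static int emit_rest(const char *s, FILE *f);

/* Reached at the terminating '\0': nothing left to write. */
static int emit_done(const char *s, FILE *f)
{
    (void)s; (void)f;
    return 0;
}

/* Write one character, then handle the remainder of the string. */
static int emit_char(const char *s, FILE *f)
{
    fputc(*s, f);
    return 1 + emit_rest(s + 1, f);
}

/* (*s != '\0') evaluates to 0 or 1 and selects the next step. */
static int emit_rest(const char *s, FILE *f)
{
    static const emit_fn next[2] = { emit_done, emit_char };
    return next[*s != '\0'](s, f);
}

/* Returns the number of characters passed to fputc. */
int put_string(const char *s, FILE *f)
{
    return emit_rest(s, f);
}
```

Recursion depth grows with the string length, so this is about expressibility rather than practicality; format-directive handling could be layered on in the same style with a larger dispatch table.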

And so on, and so on. - What will be left as "language".

I think most C developers would be able to answer that question
given the above stated description. Is there some part that isn't
clear to you?

Would you be able to formulate functionality of the class of
Recursive Functions (languages class of a Turing Machine with
Chomsky-0 grammar).

General rewrite grammars, which is another name IIRC for Chomsky
Type 0 languages, are computationally equivalent to Turing Machines
(which incidentally takes me back almost five decades to my formal computability education). The answer is yes.

Rule out programs with undefined behavior.

The language that is left is still Turing complete.

Is it?

Yes, it is.

But wouldn't that be just the argument I mentioned above; that a,
say, 'ClearAccumulator' machine statement wouldn't contain any
jump?

No, afaict the two questions have nothing to do with each other.

Proof: exercise for the reader.

(Typical sort of your reply.)

I expect you will see better results if you put more effort into
listening and thinking, and less effort into making ad hominem
remarks.
--- Synchronet 3.20a-Linux NewsLink 1.114
