• ciforth model

    From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Thu Apr 16 15:38:47 2026
    From Newsgroup: comp.lang.forth

    The cimodel is based on a memory map that is well defined.

    The most important difference from all preceding models is the
    central role of the dictionary entry concept.
    The dictionary is subdivided into mutually exclusive entries
    with no other data.

    Memory model

    - os system header (or disk boot code)
    - boot code
    - bespoke dictionary
      - dictionary entries
        - header
        - memory possessed by the previous header
    HERE
    - free dictionary
    - task frame
    - buffers

    There is a HIP (high level interpreter pointer) that points to high
    level code. NEXT executes the dictionary entry pointed to by HIP.

    Only the part up to HERE is present in the executable.

    The os system header is the obligatory os dependent code that makes
    the file loadable by the os (or bootable from disk).
    Definitions of segments may also be found here.

    boot code
    The boot code first preserves the external interface,
    for example the parameters that are passed to an executable.
    Then it initializes the data stack pointer DSP, the return stack
    pointer RSP, and the high level interpreter pointer HIP, and possibly
    initializes other registers for special purposes.
    It imposes a structure on the task frame and possibly the buffers,
    based on pertinent data that is stored in so-called user variables.
    (In order to save the system, user variables are changed
    to reflect the new state of the system.)
    The HIP is made to point to the first command
    of the word COLD. Then this first command is executed,
    which is indicated by "doing NEXT".

    Bespoke dictionary
    The reserved part of the dictionary is subdivided in dictionary
    entries.
    A d.e. is
    - name (string, i.e. possibly variable length)
    - Fixed size fields, each normally the cell-size of the current Forth
      - C Code pointer to what can be executed.
      - D Data pointer to what can be fetched or stored.
      - F A bit array of flags, e.g. immediate flag.
      - L Link information to other dictionary entries.
      - N Contains a name or points to a name
      And possibly optional CELLS like
      - S Points to source code
      - X Whatever serves
    - Data possessed by the preceding entry.
      This data may be machine code, interpreted code or plain data.
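
To make the field list above concrete, here is a minimal Python sketch of a dictionary entry. Only the field letters C, D, F, L, N, S and X come from the description; the class name `DictEntry`, the `IMMEDIATE` bit value, the helper `words` and the three example entries are assumptions made for illustration, not ciforth names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

IMMEDIATE = 0x01  # assumed bit position; the post only says F is a bit array of flags


@dataclass
class DictEntry:
    """One dictionary entry (d.e.) with the fixed fields named in the post."""
    C: Optional[Callable] = None      # code: what can be executed
    D: object = None                  # data: what can be fetched or stored
    F: int = 0                        # flags, e.g. the immediate flag
    L: Optional["DictEntry"] = None   # link to another dictionary entry
    N: str = ""                       # name (or a pointer to it)
    S: Optional[str] = None           # optional: points to source code
    X: object = None                  # optional: whatever serves


def words(latest):
    """Collect names by following the L fields, starting at the newest entry."""
    names = []
    entry = latest
    while entry is not None:
        names.append(entry.N)
        entry = entry.L
    return names


zless = DictEntry(N="0<")
plus = DictEntry(N="+", L=zless)
semis = DictEntry(N=";", F=IMMEDIATE, L=plus)

print(words(semis))
```

Everything about an entry is reached from the one object, which mirrors the idea that a single handle gives access to all properties of a definition.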

    Free dictionary
    The free dictionary can be allocated and then becomes bespoke.
    Strings, functions, buffers, and headers all can be allocated.

    Task frame
    A task frame consists of a data stack, terminal input buffer,
    return stack and user variables.
    It is so called because it is replicated if multiple
    tasks are running concurrently.
    Note that the data stack can run down to HERE, like in the FIG-model.

    The buffers
    The buffers are 1 Kbyte buffers. They are used for a block
    system that plays an important role as a library.
    They are locked while in use and unlocked afterwards, and can
    serve for nested includes of files as well.

    Indirect threading model.
    A pointer to the first fixed field of a header is called dea,
    dictionary entry address.
    Indirect threaded code means that the program counter is loaded
    with the content of the C field of a dea, thus effecting an indirect
    jump through the C field.

    An entry is identified by this handle, the dea. All manipulation and
    properties of "definitions", "words", "functions", "buffers" are
    referred to by this address.
    Instead of having a plethora of relations between different fields,
    one finds the properties of a dea by passing through its fields.


    A dea containing a low level definition has a pointer to machine code
    in C.

    A variable or buffer contains code that returns the content
    of the D field pointing to storage (in general directly
    after the header.)
    A constant contains code that returns the content of the
    D field, where there is no implication of that being a pointer.

    A high level definition contains a pointer to a specific
    machine code called DOCOL in C.
    The D pointer points to an area (in general directly after
    the header) where a sequence of dea's is stored.
    Executing the definition means executing these dea's in order.

    An object (CREATE DOES> construct) contains a pointer to specific
    machine code called DODOES in C.
    The D pointer points to an area where a pointer to the DOES>
    code resides, followed by a data area.
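
The cases above (low level code, variable, constant, high level definition) can be played through in a toy Python model. This is a sketch under assumptions, not ciforth's implementation: HIP is modelled as a (thread, index) pair, a Python loop stands in for the NEXT that ends every real code routine, the names `VM`, `Entry`, `push_d`, `STOP` and the example word `ANSWER` are invented, and DODOES is left out for brevity.

```python
class Entry:
    """A dictionary entry reduced to a name, a C field and a D field."""
    def __init__(self, name, C, D=None):
        self.name, self.C, self.D = name, C, D


class VM:
    def __init__(self):
        self.ds = []          # data stack
        self.rs = []          # return stack
        self.hip = None       # (thread, index) pair standing in for HIP
        self.running = False

    def next(self):
        """NEXT: fetch the dea at HIP, advance HIP, jump through its C field."""
        thread, index = self.hip
        dea = thread[index]
        self.hip = (thread, index + 1)
        dea.C(self, dea)

    def run(self, dea):
        """Execute one entry by threading it in front of STOP."""
        self.hip = ([dea, STOP], 0)
        self.running = True
        while self.running:   # real code routines end in NEXT; a loop stands in
            self.next()
        return self.ds


def docol(vm, dea):
    """High level definition: save HIP, continue in the thread D points to."""
    vm.rs.append(vm.hip)
    vm.hip = (dea.D, 0)

def exit_(vm, dea):
    """End of a high level definition: resume the caller's thread."""
    vm.hip = vm.rs.pop()

def push_d(vm, dea):
    """Variables and constants alike: return the content of the D field."""
    vm.ds.append(dea.D)

def plus(vm, dea):
    b, a = vm.ds.pop(), vm.ds.pop()
    vm.ds.append(a + b)

def fetch(vm, dea):
    vm.ds.append(vm.ds.pop()[0])   # the "address" is a one-element list

def stop(vm, dea):
    vm.running = False


STOP = Entry("STOP", stop)
EXIT = Entry("EXIT", exit_)
PLUS = Entry("+", plus)
FETCH = Entry("@", fetch)
X = Entry("X", push_d, [40])       # variable: D refers to storage
TWO = Entry("TWO", push_d, 2)      # constant: D is just a value
ANSWER = Entry("ANSWER", docol, [X, FETCH, TWO, PLUS, EXIT])  # : ANSWER X @ TWO + ;

print(VM().run(ANSWER))
```

Note that the variable and the constant share the same code routine here; they differ only in whether the content of D is treated as a pointer, which is how the text describes them.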

    ANNEX
    In the 64-bit era a string constant is
    - a cell containing the length in bytes
    - string itself, not necessarily one char per byte.
    - alignment to 8 bytes.
    All fields are one cell.
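
The string layout just described can be sketched in a few lines of Python. Little-endian byte order and UTF-8 encoding are assumptions chosen for the example (the post only says the count is in bytes and that a char need not be one byte); `string_constant` is a made-up helper name.

```python
import struct

CELL = 8  # bytes per cell on a 64-bit Forth


def string_constant(text):
    """Lay out a string as: count cell, the bytes, zero padding to a cell boundary."""
    payload = text.encode("utf-8")                  # length is counted in bytes
    body = struct.pack("<Q", len(payload)) + payload
    padding = -len(body) % CELL                     # bytes needed to reach alignment
    return body + bytes(padding)


print(string_constant("+").hex())
print(len(string_constant("+")))
```

A one-character name thus occupies two cells: one for the count and one for the padded character.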

    However we could squeeze for 16 bits, without logically affecting the model.

    code field: one byte, an offset into a code area of 256 bytes.
    data field: 16 bit pointer
    name field: 4 bytes, first 3 chars and last char. Only 7 bits are used,
    the 8th bits count as flags
    flag field: hidden in the name
    link field: 8 bit offset, d.e. are at most 256 bytes long.
    The total of 8 bytes will put even the original fig model to shame.
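
As a rough check of the 8 byte total, the squeezed header can be packed with Python's `struct` module. Several things here are assumptions, since the post does not fix them: the field order (code, data, name, link), little-endian byte order, padding short names with spaces, the mapping of flag bit i to the top bit of name byte i, and reading the link field as a single byte offset, which is what the 8 byte total implies. `pack_name` and `pack_header` are invented names.

```python
import struct


def pack_name(name, flags=0):
    """Keep the first three characters and the last one, 7 bits each.

    The top bit of each of the four bytes carries one flag
    (bit i of `flags` goes into byte i)."""
    chars = name[:3].ljust(3) + name[-1]
    out = bytearray()
    for i, ch in enumerate(chars):
        code = ord(ch)
        if code > 0x7F:
            raise ValueError("only 7-bit characters fit")
        out.append(code | (((flags >> i) & 1) << 7))
    return bytes(out)


def pack_header(code_offset, data_ptr, name, link_offset, flags=0):
    """1 byte code offset, 16 bit data pointer, 4 byte name, 1 byte link offset."""
    return struct.pack("<BH4sB", code_offset, data_ptr,
                       pack_name(name, flags), link_offset)


header = pack_header(0x10, 0x1234, "DUP", 24, flags=0b0001)
print(len(header), header.hex())
print(pack_name("READ-FILE"), pack_name("READ-LINE"))
```

The last line shows the price of the squeeze: two different names can end up with the same four bytes.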
    --
    The Chinese government is satisfied with its military superiority over USA.
    The next 5 year plan has as primary goal to advance life expectancy
    over 80 years, like Western Europe.
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From dxf@dxforth@gmail.com to comp.lang.forth on Fri Apr 17 11:44:49 2026

    On 16/04/2026 11:38 pm, albert@spenarnc.xs4all.nl wrote:
    ...
    >However we could squeeze for 16 bits, without logically affecting the model.
    >
    >code field: one byte, an offset into a code area of 256 bytes.
    >data field: 16 bit pointer
    >name field: 4 bytes, first 3 chars and last char. Only 7 bits are used,
    >the 8th bits count as flags
    >flag field: hidden in the name
    >link field: 8 bit offset, d.e. are at most 256 bytes long.
    >The total of 8 bytes will put even the original fig model to shame.

    Don't know who was first but Fig-forth's variable length names is something
    that Forth Inc and pretty much everyone adopted. Moore attempted to defend
    '3 chars plus count' but to no avail. That ship had sailed.

  • From Paul Rubin@no.email@nospam.invalid to comp.lang.forth on Fri Apr 17 00:27:21 2026

    albert@spenarnc.xs4all.nl writes:
    >However we could squeeze for 16 bits, without logically affecting the model.

    These days for such a constrained target, it's probably best to tether
    from a bigger machine.
  • From anton@anton@mips.complang.tuwien.ac.at (Anton Ertl) to comp.lang.forth on Fri Apr 17 07:29:44 2026

    dxf <dxforth@gmail.com> writes:
    >>name field: 4 bytes, first 3 chars and last char. Only 7 bits are used,
    >>the 8th bits count as flags
    >...
    >Don't know who was first but Fig-forth's variable length names is something
    >that Forth Inc and pretty much everyone adopted. Moore attempted to defend
    >'3 chars plus count' but to no avail. That ship had sailed.

    Looking at the traditional length+3 chars and
    albert@spenarnc.xs4all.nl's 3 first and last, at least one pair of
    words in Forth-94 conflicts on both systems, and WORDS could show them
    as REA?????? and REA*E, respectively. It would be interesting to
    determine (say, by checking the words from an existing Forth system),
    which scheme produces more conflicts.
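
A small Python sketch of the comparison suggested above. The word list is a hand-picked sample of standard names, not the word list of any real system, so the counts say nothing about which scheme is better in general; the function names are invented. READ-FILE and READ-LINE are presumably the pair meant above, since both fit the patterns REA?????? and REA*E.

```python
from collections import defaultdict


def len_plus_3(name):
    """Traditional scheme: the length and the first three characters."""
    return (len(name), name[:3])


def first3_last(name):
    """The scheme from the 16 bit squeeze: first three characters and the last."""
    return (name[:3], name[-1])


def conflicts(names, key):
    """Group names that the given scheme cannot tell apart."""
    groups = defaultdict(list)
    for name in names:
        groups[key(name)].append(name)
    return sorted(group for group in groups.values() if len(group) > 1)


# Illustrative sample only; replace with the output of WORDS for a real test.
SAMPLE = ["READ-FILE", "READ-LINE", "WRITE-FILE", "WRITE-LINE",
          "CREATE", "CREATE-FILE", "DUP", "DROP", "SWAP", "OVER"]

for scheme in (len_plus_3, first3_last):
    print(scheme.__name__, conflicts(SAMPLE, scheme))
```

On this sample the length helps to separate CREATE from CREATE-FILE, while neither scheme separates the READ and WRITE pairs.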

    Moore continues with this approach in Color Forth, but he uses some
    compression approach to usually store more characters in the number of
    bits he reserves for the name (IIRC 2 cells, with cell sizes of 20
    bits, 18 bits, and 32 bits on different hardware). I don't remember
    if he stores the length.

    Another option would be to store a hash value that is computed using
    all characters in the name. If a good hash function is used, the
    probability of a conflict is relatively small with, e.g. 4000 names in
    a wordlist (about the number of names that Gforth has in the Forth
    wordlist), and even the 28 bits that albert@spenarnc.xs4all.nl
    provides. The probability of no conflict is approximately

    ((2^28-1)/(2^28))^((4000*3999)/2)

    i.e.

    1 28 lshift s>f fdup 1e f- fswap f/ 4000 dup 1- * 2/ s>f f** f.

    The result is 0.97, i.e., there is a 3% probability of conflict for
    these numbers.
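
The same calculation as the Forth line above, written out in Python for readers without a Forth at hand. The function name `p_no_conflict` is made up; the formula is the approximation given above, with one factor per pair of names.

```python
import math


def p_no_conflict(bits, names):
    """Approximate probability that `names` hash values of `bits` bits are all distinct."""
    pairs = names * (names - 1) // 2
    return ((2**bits - 1) / 2**bits) ** pairs


p = p_no_conflict(28, 4000)
print(f"{p:.2f}")

# The usual exponential approximation gives practically the same number.
print(f"{math.exp(-4000 * 3999 / 2 / 2**28):.2f}")
```

With fewer bits the picture changes quickly, so the function can be used to try other field sizes.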

    The disadvantage of this approach is that WORDS or SEE cannot even
    show the little about the name that Chuck Moore's approaches or
    albert@spenarnc.xs4all.nl's approach shows. But then, if you are so
    pressed for memory that you use one of these approaches, why not also
    save the memory for WORDS and SEE?

    Another disadvantage is that the system cannot tell if a redefinition
    warning comes from a hash conflict or from the name actually being
    redefined; but it shares this disadvantage with all approaches that do
    not store the full name.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2025 proceedings: http://www.euroforth.org/ef25/papers/
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Fri Apr 17 12:10:34 2026

    In article <2026Apr17.092944@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    >dxf <dxforth@gmail.com> writes:
    >>>name field: 4 bytes, first 3 chars and last char. Only 7 bits are used,
    >>>the 8th bits count as flags
    >>...
    >>Don't know who was first but Fig-forth's variable length names is something
    >>that Forth Inc and pretty much everyone adopted. Moore attempted to defend
    >>'3 chars plus count' but to no avail. That ship had sailed.
    >
    >Looking at the traditional length+3 chars and
    >albert@spenarnc.xs4all.nl's 3 first and last, at least one pair of
    >words in Forth-94 conflicts on both systems, and WORDS could show them
    >as REA?????? and REA*E, respectively. It would be interesting to
    >determine (say, by checking the words from an existing Forth system),
    >which scheme produces more conflicts.

    The point was that even an extremely impractical Forth can be implemented
    with this model as a guideline; I brought it up to counter the argument
    of extreme waste that I expected.

    This is a realistic header in this model.
    The name takes 24 bytes or 3 cells: a pointer to an area, a preceding
    count, and a 1 byte area padded to 8 bytes.


                           # *********
                           # *   +   *
                           # *********
                           #
    21c3 0000000000            .balign 8,0x00
                           N_PLUS:                   # Name string
    21c8 0100000000000000      .quad 1               # Name string
    21d0 2B                    .ASCII "+"            # Name string
    21d1 00000000000000        .balign 8,0x00        # Name string
                           PLUS:                     # 0x21D8 is the handle
    21d8 0000000000000000      .quad X_PLUS          # code
    21e0 0000000000000000      .quad PLUS+HEADSIZE   # data ignored
    21e8 0000000000000000      .quad 0x0             # flags, empty
    21f0 0000000000000000      .quad ZLESS           # link pointer
    21f8 0000000000000000      .quad N_PLUS          # points to name
    2200 0000000000000000      .quad 0               # source field
    2208 0000000000000000      .quad 0               # extra field (spare)

                           X_PLUS:

    2210 58                    POP %RAX              #(S1) <- (S1) + (S2)
    2211 5B                    POP %RBX
    2212 4801D8                ADD %RAX,%RBX
    2215 50                    PUSH %RAX
    2216 48AD                  LODSQ                 # NEXT
    2218 FF20                  JMP QWORD PTR[%RAX]

    <SNIP>

    Print the name for + :
    HEX 21D8 >NFA $@ TYPE

    A total of 10 cells for the header alone.
    Who cares?

    lina+ -a
    AMDX86 ciforth beta 2026Apr12
    WANT UNUSED
    OK
    UNUSED .
    134221795712 OK
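
The "who cares" can be put into numbers. The sketch below only does arithmetic on the two figures from the post (10 cells per header, 134221795712 reported by UNUSED) and assumes that UNUSED counts bytes, as address units are bytes on this machine.

```python
CELL = 8                  # bytes per cell on a 64-bit system
HEADER_CELLS = 10         # figure quoted in the post
UNUSED = 134221795712     # as printed by UNUSED . (assumed to be bytes)

header_bytes = HEADER_CELLS * CELL
print(header_bytes)                  # bytes per header
print(UNUSED // header_bytes)        # whole headers that would fit
print(round(UNUSED / 2**30, 1))      # free dictionary space in GiB
```

Roughly 1.7 billion headers of this size fit in the free space, so the header size is no practical limit here.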



    Groetjes Albert
  • From albert@albert@spenarnc.xs4all.nl to comp.lang.forth on Fri Apr 17 12:13:20 2026

    In article <87ecke59nq.fsf@nightsong.com>,
    Paul Rubin <no.email@nospam.invalid> wrote:
    >albert@spenarnc.xs4all.nl writes:
    >>However we could squeeze for 16 bits, without logically affecting the model.
    >
    >These days for such a constrained target, it's probably best to tether
    >from a bigger machine.

    What was the argument about? See my answer to Anton Ertl.

    Groetjes Albert