• TclON Formal Specification Draft 1

    From Choosechee@choosechee@pm.me to comp.lang.tcl on Wed Mar 25 21:04:39 2026
    From Newsgroup: comp.lang.tcl

    I'm writing a formal specification for a data interchange format. It's
    meant to be like JSON, but based on Tcl 9.0 instead of JavaScript, so
    I'm calling it TclON. I will find some group about formal specifications
    for feedback, but I also want to see if anyone here has any feedback,
    like if I've missed something from Tcl, though the lack of command and variable substitution is intentional. The first draft is below. Thank you.

    # Introduction {#introduction}

    *Tcl Object Notation*, or *TclON* (pronounced tick-lawn), is a collection of
    text-based formats for representing data that allows for structured data
    interchange. These formats can be nested within each other, allowing for the
    representation of complex objects in the same way that syntaxes like JSON, XML,
    and YAML can. Formats are not mutually exclusive with each other, leading to
    very dynamic objects, arguably more so than other data interchange formats. The
    formats are based on the formats used by Tcl, and TclON can be thought of as
    being to Tcl what JSON is to JavaScript/ECMAScript, hence the name. Although
    based on Tcl, TclON formats are meant to be language independent in the same
    way that JSON is; the formats do not specify how the resulting values should be
    represented in a programming language, though there should be an obvious
    representation for every format in most languages.

    # Scope {#scope}

    This specification will generally not describe a semantic meaning for each
    format, though they will have an obvious semantic meaning that is likely to be
    used by applications. Instead, only the syntactical requirements for each
    format are described. The only exceptions to this are the string, list, and
    dict formats, which have some defined semantics associated with them.

    # Formats {#formats}

    ## String {#string}

    The *string* format is the foundational format of TclON; every other TclON
    format is a subset of this format. A valid string shall be a sequence of
    Unicode code points that has both a literal and a final representation; the
    literal representation is the exact sequence of code points used to encode the
    string, while the final representation is what the string should actually be
    decoded to. The specification for this format applies to the literal
    representation, and describes how it shall be transformed to create the final
    representation. The specifications for other formats apply to the final
    representation of a string, *not* the literal representation.

    ### Types of Strings {#types_of_strings}

    A string shall be unsurrounded, quoted, or braced:
    - An *unsurrounded* string shall not start with a double quote ('"') or an
      opening curly brace ('{'), and shall not contain unescaped ASCII whitespace
      characters (other Unicode whitespace characters shall be allowed).
    - A *quoted* string shall start and end with unescaped double quotes, may
      contain unescaped ASCII whitespace, but shall not contain unescaped quotes in
      the middle.
    - Finally, a *braced* string shall start with an unescaped opening curly brace
      and end with an unescaped closing curly brace ('}'). Braced strings may
      contain unescaped ASCII whitespace, and may contain unescaped braces in the
      middle *if* they are properly closed, but shall not contain unescaped braces
      that are not properly closed. An escaped brace does not count for opening or
      closing an unescaped brace.

    All other Unicode characters may be in any string and in any position, as long
    as their position does not cause a constraint to be violated, except for the
    backslash ('\') character in [certain circumstances](#escapes).

    For quoted strings, the final representation shall not contain the surrounding
    double quotes. For braced strings, the final representation shall not contain
    the surrounding braces, but shall still contain any other unescaped braces.

    ### Escapes {#escapes}

    Putting a backslash in a string shall have one of two effects on the final
    representation:
    1. Replacing the immediately following recognized sequence of code points with
       a different, single code point.
    2. Escaping the code point immediately following the backslash.

    In both cases, the backslash shall not be included in the final representation.
    A backslash shall be followed by at least one code point, meaning that a string
    cannot end with an unescaped backslash.

    The recognized escape sequences, in Tcl ARE format[^1], along with the
    replacement code point, shall be as follows:
    - `a` = Audible Bell (0x07)
    - `b` = Backspace (0x08)
    - `t` = Horizontal Tab (0x09)
    - `n` = New Line (0x0A)
    - `v` = Vertical Tab (0x0B)
    - `f` = Form Feed (0x0C)
    - `r` = Carriage Return (0x0D)
    - `\n[\t\v\f\r ]*` = Space (0x20)
    - `[0-3]?[0-7]{1,2}` = Treating the sequence as an octal number, the Unicode
      code point whose value equals that number when treated as an unsigned
      integer
    - `x[[:xdigit:]]{1,2}` = Treating the digits after the `x` as a hexadecimal
      number, the Unicode code point whose value equals that number when treated
      as an unsigned integer
    - `u[[:xdigit:]]{1,4}` = Treating the digits after the `u` as a hexadecimal
      number, the Unicode code point whose value equals that number when treated
      as an unsigned integer
    - `U[[:xdigit:]]{1,8}` = Treating the digits after the `U` as a hexadecimal
      number, the Unicode code point whose value equals that number when treated
      as an unsigned integer

    The last four substitutions shall still take place without an error even if
    the specified code point is not a valid Unicode code point.

    When a backslash is not followed by a recognized escape sequence, the
    immediately following code point shall be *escaped*. Escaping a code point
    shall remove any special meaning that the code point has, if any, and force the
    code point to be included in the final representation. This shall be used to
    put ASCII whitespace characters in an unsurrounded string, double quotes in the
    middle of a quoted string, unclosed braces in a braced string (there is a
    [caveat](#braced_strings_&_escapes) with this case), or backslashes themselves
    in a string. Escaping shall still work even when the character has no special
    meaning, in which case the backslash would have no effect on the final
    representation. However, a general TclON parser should provide some way to
    determine whether a character was escaped that is separate from the final
    representation, in case custom substitutions for certain sequences are desired
    by a user of the parser.
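
Editorial illustration (not part of the original post): the escape rules above could be sketched in Python roughly as follows. The names `decode_escapes` and `to_char` are hypothetical. The sketch applies to the body of an unsurrounded or quoted string only, and it assumes that values above U+10FFFF, which a Python `str` cannot hold, are replaced with U+FFFD; the draft itself does not say how such values should be stored.

```python
import re

# Single-letter escapes and their replacement code points (from the list above).
SIMPLE = {"a": "\x07", "b": "\x08", "t": "\x09", "n": "\x0a",
          "v": "\x0b", "f": "\x0c", "r": "\x0d"}

NEWLINE = re.compile(r"\n[\t\v\f\r ]*")
OCTAL = re.compile(r"[0-3]?[0-7]{1,2}")
HEX = {
    "x": re.compile(r"x([0-9A-Fa-f]{1,2})"),
    "u": re.compile(r"u([0-9A-Fa-f]{1,4})"),
    "U": re.compile(r"U([0-9A-Fa-f]{1,8})"),
}


def to_char(value):
    # Assumption: values beyond U+10FFFF become U+FFFD, since chr() rejects them.
    return chr(value) if value <= 0x10FFFF else "\ufffd"


def decode_escapes(body):
    """Turn the literal body of an unsurrounded or quoted string into its final form."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        i += 1  # step over the backslash
        if i == len(body):
            raise ValueError("string ends with an unescaped backslash")
        match = NEWLINE.match(body, i)
        if match:
            out.append(" ")
            i = match.end()
            continue
        match = OCTAL.match(body, i)
        if match:
            out.append(to_char(int(match.group(), 8)))
            i = match.end()
            continue
        if body[i] in HEX:
            match = HEX[body[i]].match(body, i)
            if match:
                out.append(to_char(int(match.group(1), 16)))
                i = match.end()
                continue
        # Either a single-letter escape, or a plain escaped code point.
        out.append(SIMPLE.get(body[i], body[i]))
        i += 1
    return "".join(out)
```

With this sketch, `decode_escapes(r"a\ b")` gives `a b`, and an `x` that is not followed by a hexadecimal digit is simply treated as an escaped `x`.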

    #### Braced Strings and Escapes {#braced_strings_&_escapes}

    Most escape sequences and escapes shall not happen in braced strings, only in
    unsurrounded and quoted strings. Backslashes in braced strings, for the most
    part, shall be treated as normal characters and included in the final
    representation. The only exceptions are the new line to space escape sequence,
    and escaping curly braces. *However*, when escaping curly braces in braced
    strings, the backslash shall still be present in the final representation. The
    backslash shall *not* be present in the final representation when used for the
    new line to space escape sequence.
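
Editorial illustration (not part of the original post): a minimal Python sketch of the braced-string rules, under stated assumptions. The name `decode_braced` is hypothetical, the input is assumed to be an already validated braced literal, and a backslash is assumed to pair with the code point after it (so in `\\` followed by a new line, the new line is not part of an escape sequence). That pairing follows Tcl's behaviour and is not spelled out in the draft.

```python
import re

# A backslash followed either by the new line sequence or by any one code point.
BACKSLASH_PAIR = re.compile(r"\\(\n[\t\v\f\r ]*|.)", re.DOTALL)


def decode_braced(literal):
    """Return the final representation of a braced string literal."""
    if len(literal) < 2 or literal[0] != "{" or literal[-1] != "}":
        raise ValueError("not a braced string")
    body = literal[1:-1]  # the surrounding braces are dropped

    def replace(match):
        if match.group(1).startswith("\n"):
            return " "          # backslash and new line sequence become a space
        return match.group(0)   # everything else, backslash included, is kept

    return BACKSLASH_PAIR.sub(replace, body)
```

The asymmetry described above shows up directly: `decode_braced(r"{a \{ b}")` keeps the backslash, while a backslash followed by a new line collapses to a single space.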

    ## Number {#number}

    A *number* shall be a string whose final representation is of the integer or float subformats.

    ### Integer {#integer}

    An *integer* shall be composed of the following elements, in the following
    order:
    1. An optional plus ('+') or minus ('-') sign.
    2. An optional base specifier, which shall be '0d'/'0D' (decimal), '0b'/'0B'
       (binary), '0o'/'0O' (octal), or '0x'/'0X' (hexadecimal).
    3. One or more digits of the specified base (case-insensitive for
       hexadecimal), or the decimal base if a base specifier is not present.
       Underscores ('_') may also be present, but *not* at the beginning or end of
       this element.
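
Editorial illustration (not part of the original post): the integer rules above map fairly directly onto a regular expression. This is a sketch only; `digits`, `INTEGER`, and `is_integer` are hypothetical names, and the check is purely syntactic.

```python
import re


def digits(char_class):
    # One or more digits of the class; underscores only between digits.
    return f"[{char_class}](?:[{char_class}_]*[{char_class}])?"


INTEGER = re.compile(
    "[+-]?(?:"
    + "|".join([
        "0[dD]" + digits("0-9"),
        "0[bB]" + digits("01"),
        "0[oO]" + digits("0-7"),
        "0[xX]" + digits("0-9A-Fa-f"),
        digits("0-9"),  # no base specifier: decimal
    ])
    + ")"
)


def is_integer(text):
    """Check whether a final representation satisfies the integer subformat."""
    return INTEGER.fullmatch(text) is not None
```

For instance, `is_integer("-0x1F")` and `is_integer("1_000")` hold, while `is_integer("1_")` and `is_integer("0b2")` do not.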

    ### Float {#float}

    A *float* shall be composed of the following elements, in the following order:
    1. An optional plus ('+') or minus ('-') sign.
    2. Either:
       - One or more decimal digits followed by a decimal point ('.').
       - A decimal point followed by one or more decimal digits.
       - One or more decimal digits followed by a decimal point, followed by one
         or more decimal digits.
       - One or more decimal digits, without any decimal point. This form shall
         only be allowed when an exponent is present.

       Underscores may also be present in any of the forms, but not at the
       beginning or end, and not adjacent to a decimal point.
    3. An optional exponent, which shall be composed of an 'e' or 'E', followed by
       an integer that does not contain a base specifier.
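
Editorial illustration (not part of the original post): a regular-expression sketch of the float rules, with hypothetical names `D`, `EXP`, `FLOAT`, and `is_float`. It reads "not at the beginning or end" as applying to each run of digits, which also keeps underscores away from the decimal point.

```python
import re

# A run of decimal digits; underscores only between digits.
D = r"[0-9](?:[0-9_]*[0-9])?"
# Exponent: 'e' or 'E' followed by an integer without a base specifier.
EXP = rf"[eE][+-]?{D}"

FLOAT = re.compile(
    rf"[+-]?(?:(?:{D}\.{D}|{D}\.|\.{D})(?:{EXP})?|{D}{EXP})"
)


def is_float(text):
    """Check whether a final representation satisfies the float subformat."""
    return FLOAT.fullmatch(text) is not None
```

Under this reading `is_float("1e10")` holds because an exponent is present, whereas `is_float("1")` does not, so a bare run of digits is an integer only.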

    ## Boolean {#boolean}

    A *boolean* shall be a string whose final representation is, case-insensitively,
    any of the following:
    - '1'
    - 'true'
    - 'yes'
    - 'on'
    - '0'
    - 'false'
    - 'no'
    - 'off'
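
Editorial illustration (not part of the original post): a small Python sketch of the boolean check. The names are hypothetical, and mapping the first four words to true and the last four to false is an assumption that mirrors Tcl; the draft only defines the syntax.

```python
TRUE_WORDS = {"1", "true", "yes", "on"}
FALSE_WORDS = {"0", "false", "no", "off"}


def is_boolean(text):
    """Check whether a final representation satisfies the boolean format."""
    return text.lower() in TRUE_WORDS | FALSE_WORDS


def parse_boolean(text):
    """Return True or False for a boolean, or None if the text is not one."""
    word = text.lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return None
```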

    ## List {#list}

    A *list* shall be a string whose final representation contains any number of
    valid strings, in their literal representations, separated by one or more ASCII
    whitespace characters, not counting whitespace preceded by an odd number of
    backslashes or whitespace in the contained quoted or braced strings. A list may
    also contain leading or trailing whitespace. Parsers handling lists shall
    convert the contained strings to their final representations when considered
    individually. This is done regardless of the list's literal representation,
    such as whether the list is quoted or braced.

    If all the strings contained in a list are of the format X, then a list shall
    satisfy the 'X list' format. For example, if all the strings in a list are
    numbers, the list shall be considered a number list. A 'string list' should
    just be called a list, as, by definition, a list contains strings.

    ## Dict {#dict}

    A *dict* shall be a list which contains an even number of strings. It shall
    otherwise be treated exactly like a list; it is simply present to help
    facilitate representation of associative arrays, dictionaries, tables, maps,
    and similar data structures, though it can, of course, be used for any other
    purpose.

    If, for every non-overlapping pair of adjacent strings in a dict (the first and
    second, the third and fourth, and so on), the first satisfies format X, and the
    second satisfies the format Y, the dict shall satisfy the 'X-Y dict' format. If
    X or Y is string, but the other is not, 'string' should be in the name. If both
    X and Y are string, however, it should just be called a dict.
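
Editorial illustration (not part of the original post): a Python sketch of splitting a list's final representation into the literal representations of its elements, plus the even-count check for dicts. The names `split_list` and `is_dict` are hypothetical. Two assumptions are made: ASCII whitespace means space, tab, new line, vertical tab, form feed, and carriage return, and a quoted or braced element must be followed by whitespace or the end of the list. Decoding each element to its final representation is left out.

```python
WS = " \t\n\v\f\r"  # assumed set of ASCII whitespace separators


def split_list(text):
    """Split a list into the literal representations of its elements."""
    items = []
    i, n = 0, len(text)
    while True:
        while i < n and text[i] in WS:
            i += 1  # skip leading, separating, and trailing whitespace
        if i == n:
            return items
        start = i
        if text[i] == "{":
            depth = 0
            while i < n:
                c = text[i]
                if c == "\\":
                    i += 1  # the escaped code point never opens or closes
                elif c == "{":
                    depth += 1
                elif c == "}":
                    depth -= 1
                    if depth == 0:
                        break
                i += 1
            if i >= n:
                raise ValueError("unclosed brace")
            i += 1  # consume the closing brace
        elif text[i] == '"':
            i += 1
            while i < n and text[i] != '"':
                i += 2 if text[i] == "\\" else 1
            if i >= n:
                raise ValueError("unclosed quote")
            i += 1  # consume the closing quote
        else:
            while i < n and text[i] not in WS:
                i += 2 if text[i] == "\\" else 1
            if i > n:
                raise ValueError("element ends with a backslash")
        if i < n and text[i] not in WS:
            raise ValueError("missing whitespace after element")
        items.append(text[start:i])


def is_dict(text):
    """A dict is a list with an even number of elements."""
    return len(split_list(text)) % 2 == 0
```

For example, `split_list('name {Ada Lovelace} born 1815')` yields four literals, so the same text also satisfies the dict format, and `{Ada Lovelace}` can in turn be read as a two-element list.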

    # Sources

    [^1]: https://www.tcl-lang.org/man/tcl9.0/TclCmd/re_syntax.html
    --- Synchronet 3.21f-Linux NewsLink 1.2
  • From Christian Gollwitzer@auriocus@gmx.de to comp.lang.tcl on Thu Mar 26 08:03:01 2026
    From Newsgroup: comp.lang.tcl

    On 26.03.26 at 03:04, Choosechee wrote:
    > I'm writing a formal specification for a data interchange format. It's
    > meant to be like JSON, but based on Tcl 9.0 instead of JavaScript, so
    > I'm calling it TclON. I will find some group about formal specifications
    > for feedback, but I also want to see if anyone here has any feedback,
    > like if I've missed something from Tcl, though the lack of command and
    > variable substitution is intentional. The first draft is below. Thank you.


    1.) If you are serious about it, don't write the specification in plain
    text. Write it in a formal grammar like BNF or PEG or in a form suitable
    for one of the existing parser generators. Then you can really test it.


    2.) I'm not sure, have you not just repeated the specifications of Tcl's serialization formats for various "types"?


    3.) What is your use-case? Beware of https://xkcd.com/927/


    Christian
  • From Choosechee@choosechee@pm.me to comp.lang.tcl on Thu Mar 26 09:28:27 2026
    From Newsgroup: comp.lang.tcl

    On 3/26/26 02:03, Christian Gollwitzer wrote:
    > On 26.03.26 at 03:04, Choosechee wrote:
    >> I'm writing a formal specification for a data interchange format. It's
    >> meant to be like JSON, but based on Tcl 9.0 instead of JavaScript, so
    >> I'm calling it TclON. I will find some group about formal
    >> specifications for feedback, but I also want to see if anyone here has
    >> any feedback, like if I've missed something from Tcl, though the lack
    >> of command and variable substitution is intentional. The first draft
    >> is below. Thank you.
    >
    > 1.) If you are serious about it, don't write the specification in plain
    > text. Write it in a formal grammar like BNF or PEG or in a form suitable
    > for one of the existing parser generators. Then you can really test it.
    >
    > 2.) I'm not sure, have you not just repeated the specifications of Tcl's
    > serialization formats for various "types"?
    >
    > 3.) What is your use-case? Beware of https://xkcd.com/927/
    >
    > Christian

    1. Maybe I haven't used the right term? I will write an EBNF grammar for
    it, but I thought standards usually had some document like this one in addition to the formal grammar.
    2. Well, yes, that's what I was trying to do. I wanted to make sure I
    didn't miss something that I didn't intentionally exclude.
    3. It's just for a bit of fun and practice. It's also taught me some
    more things about Tcl, like how the backslash new line substitution
    replaces any following whitespace that isn't another new line as well.
  • From Olivier@user1108@newsgrouper.org.invalid to comp.lang.tcl on Sat Mar 28 11:28:13 2026
    From Newsgroup: comp.lang.tcl


    Choosechee <choosechee@pm.me> posted:

    > I'm writing a formal specification for a data interchange format. It's
    > meant to be like JSON, but based on Tcl 9.0 instead of JavaScript, so
    > I'm calling it TclON. I will find some group about formal specifications
    > for feedback, but I also want to see if anyone here has any feedback,
    > like if I've missed something from Tcl, though the lack of command and
    > variable substitution is intentional. The first draft is below. Thank you.


    What is the license for TclON? The same as https://www.json.org/license.html? Shouldn't it be added here?

    Olivier
  • From schmitzu@schmitzu@mail.de to comp.lang.tcl on Sat Mar 28 22:12:19 2026
    From Newsgroup: comp.lang.tcl

    It's a nice idea, and I'm sure it's good practice for writing a
    formal data interchange format.

    Reading your specification, I have a few questions.

    How does TclON differentiate between, let's say, a string 3.1415 and
    a number 3.1415? In JSON, strings are always surrounded by "...".
    In TclON, if I understand correctly, that isn't the case.
    Additionally, in JSON, you can detect the type of every value
    by looking at its first character. In Tcl, however, everything is
    a string. Usage determines the type.

    Other example:
    How does TclON differentiate between lists and dicts?
    Same problem as before. Dicts are also lists with an even number
    of elements. However, there can also be lists that should be
    lists and have an even number of elements.

    You may have noticed that some Tcl packages that read JSON data
    store a pair for each JSON value. For example, look at tDOM's
    "<domDoc> asTypedList." Otherwise, you won't be able to write
    JSON back with the correct types.

    Another aspect: JSON is a subset of JavaScript. This means you can
    feed the interpreter JSON data, and the result is a native data
    structure that fits seamlessly into the usual syntax.
    This is possible because JavaScript allows objects to be defined
    as prototypes. Tcl doesn't have this feature.
    Therefore, it's impossible to "source" TclON data natively into Tcl.
    Thus, TclON cannot be exactly the same as JSON for JavaScript.
    However, that may not be what you're looking for.

    Uwe

  • From Simon Geard@simon@whiteowl.co.uk to comp.lang.tcl on Sun Mar 29 12:34:57 2026
    From Newsgroup: comp.lang.tcl

    On 26/03/2026 02:04, Choosechee wrote:
    > I'm writing a formal specification for a data interchange format. It's
    > meant to be like JSON, but based on Tcl 9.0 instead of JavaScript, so
    > I'm calling it TclON. I will find some group about formal specifications
    > for feedback, but I also want to see if anyone here has any feedback,
    > like if I've missed something from Tcl, though the lack of command and
    > variable substitution is intentional. The first draft is below. Thank you.
    >
    > <snip>


    I was reading today about how JSON isn't a great fit for AI since it's
    very verbose and leads to a relatively large token consumption. Is that
    a consideration for TclON?

    Simon
  • From Choosechee@choosechee@pm.me to comp.lang.tcl on Mon Mar 30 13:36:15 2026
    From Newsgroup: comp.lang.tcl

    On 3/28/26 06:28, Olivier wrote:
    > Choosechee <choosechee@pm.me> posted:
    >
    >> I'm writing a formal specification for a data interchange format. It's
    >> meant to be like JSON, but based on Tcl 9.0 instead of JavaScript, so
    >> I'm calling it TclON. I will find some group about formal specifications
    >> for feedback, but I also want to see if anyone here has any feedback,
    >> like if I've missed something from Tcl, though the lack of command and
    >> variable substitution is intentional. The first draft is below. Thank you.
    >
    > What is the license for TclON? The same as https://www.json.org/license.html?
    > Shouldn't it be added here?
    >
    > Olivier

    Well, it's not done yet, so it's All Rights Reserved for now. If I
    somehow don't finish it by January 1st of next year, I'll consider it
    Creative Commons licensed (fully public domain). Otherwise, it probably
    will use the JSON license.
  • From Choosechee@choosechee@pm.me to comp.lang.tcl on Mon Mar 30 13:53:15 2026
    From Newsgroup: comp.lang.tcl

    On 3/28/26 16:12, schmitzu wrote:
    > How does TclON differentiate between, let's say, a string 3.1415 and
    > a number 3.1415? In JSON, strings are always surrounded by "...".
    > In TclON, if I understand correctly, that isn't the case.
    > Additionally, in JSON, you can detect the type of every value
    > by looking at its first character. In Tcl, however, everything is
    > a string. Usage determines the type.
    >
    > Other example:
    > How does TclON differentiate between lists and dicts?
    > Same problem as before. Dicts are also lists with an even number
    > of elements. However, there can also be lists that should be
    > lists and have an even number of elements.

    That's the neat part; it doesn't. You simply would ask for a string to
    be parsed as a number, list, etc. If you *need* to distinguish between
    types, then you could use a dict with a value and type key. If
    necessary, I could create a typed values extension later on.

    > Another aspect: JSON is a subset of JavaScript. This means you can
    > feed the interpreter JSON data, and the result is a native data
    > structure that fits seamlessly into the usual syntax.
    > This is possible because JavaScript allows objects to be defined
    > as prototypes. Tcl doesn't have this feature.
    > Therefore, it's impossible to "source" TclON data natively into Tcl.
    > Thus, TclON cannot be exactly the same as JSON for JavaScript.
    > However, that may not be what you're looking for.

    Hmm, true. You could read a TclON value and probably use it as a valid
    Tcl value, but it wouldn't quite be that simple due to potential
    variable substitution. I'll reconsider how I phrase that part of the specification.
  • From Choosechee@choosechee@pm.me to comp.lang.tcl on Mon Mar 30 13:54:37 2026
    From Newsgroup: comp.lang.tcl

    On 3/29/26 06:34, Simon Geard wrote:
    > On 26/03/2026 02:04, Choosechee wrote:
    >> I'm writing a formal specification for a data interchange format. It's
    >> meant to be like JSON, but based on Tcl 9.0 instead of JavaScript, so
    >> I'm calling it TclON. I will find some group about formal
    >> specifications for feedback, but I also want to see if anyone here has
    >> any feedback, like if I've missed something from Tcl, though the lack
    >> of command and variable substitution is intentional. The first draft
    >> is below. Thank you.
    >>
    >> <snip>
    >
    > I was reading today about how JSON isn't a great fit for AI since it's
    > very verbose and leads to a relatively large token consumption. Is that
    > a consideration for TclON?
    >
    > Simon

    It wasn't, but maybe I could consider it.
  • From saito@saitology9@gmail.com to comp.lang.tcl on Mon Mar 30 15:12:45 2026
    From Newsgroup: comp.lang.tcl

    On 3/25/2026 10:04 PM, Choosechee wrote:
    > I'm writing a formal specification for a data interchange format. It's
    > meant to be like JSON, but based on Tcl 9.0 instead of JavaScript, so
    > I'm calling it TclON. I will find some group about formal specifications
    > for feedback, but I also want to see if anyone here has any feedback,
    > like if I've missed something from Tcl, though the lack of command and
    > variable substitution is intentional. The first draft is below. Thank you.
    >
    > # Introduction {#introduction}
    >
    > *Tcl Object Notation*, or *TclON* (pronounced tick-lawn), is a collection of
    > text-based formats for representing data that allows for structured data
    > interchange. These formats can be nested within each other, allowing for the
    > representation of complex objects in the same way that syntaxes like JSON,
    > XML, and YAML can. Formats are not mutually exclusive with each other,


    Do you have a sample of what the notation would look like on a sample
    input? This might make it easier to see what it is, what it does, etc.
