• Why Strict YAML Refuses To Do Implicit Typing

    From Ben Collver@bencollver@tilde.pink to comp.misc on Wed Apr 24 03:08:15 2024
    From Newsgroup: comp.misc

    Why StrictYAML refuses to do implicit typing and so should you
    ==============================================================
    The Norway Problem
    ------------------
    A while back I met an old coworker and he started telling me about
    this interesting bug he faced:

    "So, we started internationalizing the website by creating a config
    file. We added the UK, Ireland, France and Germany at first."

    countries:
    - GB
    - IE
    - FR
    - DE

    "This was all fine. However, one day after a quick configuration
    change all hell broke loose. It turned out that while the UK, France
    and Germany were all fine, Norway was not..."

    "While the website went down and we were losing money we chased down
    a number of loose ends until finally finding the root cause."

    "It turned out that if you feed this configuration file into pyyaml:"

    countries:
    - GB
    - IE
    - FR
    - DE
    - NO

    "This is what you got in return:"

    from yaml import safe_load  # PyYAML is imported as "yaml"
    safe_load(the_configuration)
    {'countries': ['GB', 'IE', 'FR', 'DE', False]}

    It snows a lot in False.

    When this is fed to code that expects a string of the form 'NO', then
    the code will usually break, often with a cryptic error. Typically it
    would be a KeyError when trying to use False as a key in a dict
    when no such key exists.
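
    A minimal sketch of that failure mode, using only the standard
    library. The lookup table LANGUAGE_BY_COUNTRY and the function
    site_language are hypothetical names for illustration; the list
    literal stands in for what the parser returned above.

```python
# Hypothetical lookup table keyed by ISO country code (illustration only).
LANGUAGE_BY_COUNTRY = {"GB": "en", "IE": "en", "FR": "fr", "DE": "de", "NO": "nb"}

# What the parser handed back: the last entry is the boolean False, not 'NO'.
countries = ["GB", "IE", "FR", "DE", False]


def site_language(country):
    """Look up the language for a country code; raises KeyError if unknown."""
    return LANGUAGE_BY_COUNTRY[country]


failures = []
for country in countries:
    try:
        site_language(country)
    except KeyError as exc:
        # The error mentions only False, with no hint that Norway was meant.
        failures.append(repr(exc))

print(failures)
```

    The traceback names False rather than Norway, which is why the
    root cause can take a while to find.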

    It can be "quick fixed" by using quotes - a fix for sure, but kind of
    a hack - and by that time the damage is done:

    countries:
    - GB
    - IE
    - FR
    - DE
    - 'NO'

    The most tragic aspect of this bug, however, is that it is intended
    behavior according to the YAML 2.0 specification. The real fix
    requires explicitly disregarding the spec - which is why most YAML
    parsers have it.

    StrictYAML sidesteps this problem by ignoring key parts of the spec,
    in an attempt to create a "zero surprises" parser.

    Everything is a string by default:

    from strictyaml import load
    load(the_configuration).data
    {'countries': ['GB', 'IE', 'FR', 'DE', 'NO']}

    String or float?
    ----------------
    Norway is just the tip of the iceberg. The first time this problem
    hit me I was maintaining a configuration file of application
    versions. I had a file like this initially - which caused no
    issues:

    python: 3.5.3
    postgres: 9.3.0

    However, if I changed it very slightly:

    python: 3.5.3
    postgres: 9.3

    I started getting type errors because it was parsed like this:

    from ruamel.yaml import YAML
    YAML(typ="safe").load(versions) == {"python": "3.5.3", "postgres": 9.3}
    # oops those *both* should have been strings

    Again, this led to type errors in my code. Again, I 'quick fixed' it
    with quotes. However, the solution I really wanted was:

    from strictyaml import load
    load(versions).data == {"python": "3.5.3", "postgres": "9.3"}
    # that's better
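
    A float is not only the wrong type here; it can also silently
    change the value. A small standard-library sketch, using
    hypothetical version strings, of what float conversion does to
    scalars that merely look like decimals:

```python
# Hypothetical version strings as they might appear unquoted in a config file.
versions = ["9.3", "9.10", "3.10"]

for text in versions:
    as_float = float(text)      # what an implicitly typed parser would produce
    round_trip = str(as_float)  # what the application sees afterwards
    print(f"{text!r} -> {as_float!r} -> {round_trip!r}")

# Versions whose text does not survive the trip through float.
lossy = [text for text in versions if str(float(text)) != text]
print(lossy)
```

    Anything with a trailing zero in the minor version ends up in the
    lossy list, so "9.10" and "9.1" become indistinguishable.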

    The world's most buggy name
    ---------------------------
    Christopher Null has a name that is notorious for breaking software
    code - airlines, banks, every bug caused by a programmer who didn't
    know a type from their elbow has hit him.

    YAML, sadly, is no exception:

    first name: Christopher
    surname: Null

    Is it okay if we just call you Christopher None instead?
    --------------------------------------------------------
    load(name) == {"first name": "Christopher", "surname": None}

    With StrictYAML:

    from strictyaml import load
    load(name).data == {"first name": "Christopher", "surname": "Null"}

    Type theoretical concerns
    -------------------------
    Type theory is a popular topic with regard to programming languages,
    where a well designed type system is regarded (rightly) as a yoke
    that can catch bugs at an early stage of development while poorly
    designed type systems provide fertile breeding ground for edge case
    bugs.

    (It's equally true that extremely strict type systems require a lot
    more upfront effort, and the law of diminishing returns applies to
    type strictness - a cogent answer to the question "why is so little
    software written in Haskell?").

    A less popular, although equally true idea is the notion that markup
    languages like YAML have the same issues with types - as demonstrated
    above.

    User Experience
    ---------------
    In a way, type systems can be considered both a mathematical concern
    and a UX device.

    In the above, and in most cases, implicit typing represents a major
    violation of the UX principle of least astonishment.

    From: <https://hitchdev.com/strictyaml/why/implicit-typing-removed/>

    p.s.

    1. There is no YAML 2.0, only 1.0, 1.1, and 1.2.
    2. If you just quoted every country, then it wouldn't be a problem.
    3. The YAML 1.1 spec does NOT mandate "language independent types"
    4. The YAML 1.2 spec solves this problem with "schemas."
    <https://yaml.org/spec/1.2/spec.html#id2802346>
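
    To make points 3 and 4 concrete, here is a rough standard-library
    sketch of implicit boolean resolution. The two regular expressions
    are transcribed from the bool type of the YAML 1.1 type repository
    and from the YAML 1.2 core schema; the function name resolve_bool
    is hypothetical, and real parsers differ in detail.

```python
import re

# Plain scalars that the YAML 1.1 bool type treats as booleans.
YAML_1_1_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)

# The YAML 1.2 core schema only recognizes spellings of true and false.
YAML_1_2_CORE_BOOL = re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")


def resolve_bool(scalar, pattern):
    """Return 'bool' if the unquoted scalar matches the pattern, else 'str'."""
    return "bool" if pattern.match(scalar) else "str"


for scalar in ["GB", "NO", "no", "on", "true"]:
    print(
        scalar,
        resolve_bool(scalar, YAML_1_1_BOOL),
        resolve_bool(scalar, YAML_1_2_CORE_BOOL),
    )
```

    Under the 1.1 pattern NO resolves to a boolean; under the 1.2 core
    schema it stays a string, but only if the parser actually
    implements 1.2.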
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.misc on Wed Apr 24 11:39:56 2024

    Ben Collver <bencollver@tilde.pink> wrote or quoted:
    StrictYAML sidesteps this problem by ignoring key parts of the spec,
    in an attempt to create a "zero surprises" parser.

    HTML went the opposite way. Back in the day, quotation marks around
    attributes were mandatory, no ifs or buts. But nowadays, you can
    sometimes skip 'em, depending on what's inside the attribute!
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.misc on Wed Apr 24 12:14:22 2024

    Ben Collver <bencollver@tilde.pink> wrote or quoted:
    - GB
    - IE
    - FR
    - DE

    These days JSON is ubiquitous on the web. In the programming world,
    it's Python that rules. Check this out - the following notation is
    valid JSON as well as valid Python, and it's so straightforward that
    most people could understand it after a bit of training.

    { "countries":
    [ "GB",
    "IE",
    "FR",
    "DE" ]}

    Here's a script that parses the same string (like it could
    have been read from a config file) once as JSON and once
    as Python - the result is the same.

    Main.py

    # This is a script for Python 3.9.

    # Prepare that input like it's a string that got read from a file.
    input = '''
    { "countries":
    [ "GB",
    "IE",
    "FR",
    "DE" ]}
    '''[ 1: -1 ]
    print( f"input is (without the outer apostrophes):\n'{input}'" )

    print( "\nReading input as json:" )
    import json
    print( json.loads( input ))

    print( "\nReading input as Python:" )
    import ast
    print( ast.literal_eval( input ))

    Output

    input is (without the outer apostrophes):
    '{ "countries":
    [ "GB",
    "IE",
    "FR",
    "DE" ]}'

    Reading input as json:
    {'countries': ['GB', 'IE', 'FR', 'DE']}

    Reading input as Python:
    {'countries': ['GB', 'IE', 'FR', 'DE']}

    So the input got converted into a Python dict, which contains
    a Python list, which in turn holds Python strs (strings).
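
    One caveat, sketched below under the assumption that such a config
    might later contain JSON's true, false or null: those literals are
    valid JSON but not valid Python literals, so the two readers stop
    agreeing. The sample string is hypothetical.

```python
import ast
import json

# Hypothetical config using JSON literals that Python spells differently.
text = '{ "countries": [ "GB", "NO" ], "enabled": true, "region": null }'

as_json = json.loads(text)  # true becomes True, null becomes None

try:
    as_python = ast.literal_eval(text)
except ValueError as exc:
    # literal_eval rejects the bare names true and null.
    as_python = None
    print("literal_eval failed:", exc)

print(as_json)
```

    Note that the quoted "NO" stays a string in both readers, since
    neither format guesses types for quoted values.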
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.misc on Wed Apr 24 22:24:35 2024

    On 24 Apr 2024 11:39:56 GMT, Stefan Ram wrote:

    HTML went the opposite way. Back in the day, quotation marks around
    attributes were mandatory, no ifs or buts. But nowadays, you can
    sometimes skip 'em, depending on what's inside the attribute!

    HTML was originally an “application” of SGML and followed its
    conventions, which meant that quotation marks were optional if there
    was no confusion.

    Nowadays, we put them in as a matter of course.

    Omitted closing tags are still with us.
  • From Blue-Maned_Hawk@bluemanedhawk@invalid.invalid to comp.misc on Thu Apr 25 03:43:32 2024

    Datatypes are a social construct.
    --
    Blue-Maned_Hawk│shortens to Hawk│/blu.mɛin.dʰak/│he/him/his/himself/Mr. blue-maned_hawk.srht.site
    It's amazing to see how poorly it's possible to do this.
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.misc on Thu Apr 25 04:29:24 2024

    On Thu, 25 Apr 2024 03:43:32 -0000 (UTC), Blue-Maned_Hawk wrote:

    Datatypes are a social construct.

    Is that interpretation inherited, or constructed?