• OT: unicode (Was: Re: Upcoming gfortran 15 will contain unsigned numbers)

    From Wolfgang Agnes@wagnes@example.com to comp.lang.fortran on Mon Nov 25 08:35:48 2024
    From Newsgroup: comp.lang.fortran

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Sat, 23 Nov 2024 09:18:11 -0300, Wolfgang Agnes wrote:

    How about UCS-2?

    “UCS-2” was the name of the encoding back when it was assumed that Unicode
    was always going to be just 16 bits. After the coding was extended, those “surrogate” ranges were introduced, to allow representation of the extra characters within a 16-bit encoding, and so “UCS-2” was renamed to “UTF-16”.

    In short, “UTF-16” is basically “UCS-2 with surrogates”.

    Nice to know! Thanks. So, UCS means ``Universal Character Set''. I
    thought it was a whole different character set. It's a bit difficult to understand ``surrogates''. So many definitions come up such as ``Basic Multilingual Plane''. Can you explain what surrogates are?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Lynn McGuire@lynnmcguire5@gmail.com to comp.lang.fortran on Mon Nov 25 14:39:37 2024
    From Newsgroup: comp.lang.fortran

    On 11/25/2024 5:35 AM, Wolfgang Agnes wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Sat, 23 Nov 2024 09:18:11 -0300, Wolfgang Agnes wrote:

    How about UCS-2?

    “UCS-2” was the name of the encoding back when it was assumed that Unicode
    was always going to be just 16 bits. After the coding was extended, those
    “surrogate” ranges were introduced, to allow representation of the extra characters within a 16-bit encoding, and so “UCS-2” was renamed to
    “UTF-16”.

    In short, “UTF-16” is basically “UCS-2 with surrogates”.

    Nice to know! Thanks. So, UCS means ``Universal Character Set''. I
    thought it was a whole different character set. It's a bit difficult to understand ``surrogates''. So many definitions come up such as ``Basic Multilingual Plane''. Can you explain what surrogates are?

    There is lots of information at
    https://home.unicode.org/

    And
    https://stackoverflow.com/

    Lynn

  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.fortran on Mon Nov 25 23:35:34 2024
    From Newsgroup: comp.lang.fortran

    On Mon, 25 Nov 2024 08:35:48 -0300, Wolfgang Agnes wrote:

    It's a bit difficult to understand ``surrogates''.

    The Unicode folks just decided that the ranges 0xD800-0xDBFF (1024 codes
    of “high surrogates”) and 0xDC00-0xDFFF (1024 codes of “low surrogates”)
    would be used in pairs to represent codes above 0xFFFF in UTF-16 encoding.
    This gives an additional 1024×1024 = 1048576 different codes, which is
    exactly enough to cover the rest of the (current) Unicode range, 0x10000
    through the official upper limit of 0x10FFFF. At least, that’s what
    they’re saying right now.

    In the full UCS-4 encoding, those ranges are considered invalid.
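
    To make the pairing concrete, here is a minimal Python sketch of the
    arithmetic described above. The function names to_surrogates and
    from_surrogates are made up for illustration; they are not part of any
    library. The idea is to subtract 0x10000, which leaves a 20-bit value,
    then put the top 10 bits in the high surrogate and the bottom 10 bits in
    the low surrogate.

```python
def to_surrogates(cp: int) -> tuple:
    """Split a code point above 0xFFFF into a (high, low) surrogate pair.

    Hypothetical helper, written only to illustrate the UTF-16 arithmetic.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only codes 0x10000..0x10FFFF need a surrogate pair")
    offset = cp - 0x10000              # 20-bit value, 0..0xFFFFF
    high = 0xD800 + (offset >> 10)     # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")
    # Cross-check against Python's built-in UTF-16 codec (big-endian, no BOM).
    encoded = chr(0x1F600).encode("utf-16-be")
    assert encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big")
    assert from_surrogates(high, low) == 0x1F600
```

    Codes at or below 0xFFFF (other than the surrogate ranges themselves) are
    stored as a single 16-bit unit, so only the extra 1048576 codes go
    through this path.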