• FidoNews 33:40 [01/06]: General Articles

    From Fidonews Robot@1:2320/100 to All on Mon Oct 3 01:40:22 2016
    =================================================================
                            GENERAL ARTICLES
    =================================================================

    An UTF-8 nodelist. And now what?
    By Michiel van der Vlist, 2:280/5555


    As you may or may not know, ZC2, in cooperation with some Z2 RCs,
    distributes a UTF-8 encoded nodelist in addition to the weekly and
    daily ASCII encoded lists. It is distributed daily in the file area
    DAILYUTF.

    The project started a couple of years ago and got off to a slow
    start. Presently only R28 and R56 participate actively by submitting
    a UTF-8 encoded segment. The number of regions participating
    passively is unknown.

    The UTF-8 nodelist offers sysops the opportunity to have their names,
    their location and the name of their system listed as spelled in the
    alphabet of their native language.

    I am often asked "what use is a UTF-8 encoded nodelist, when there is
    hardly any FTN software around that can handle it?" This article is
    about how I deal with the UTF-8 nodelist on my system. I hope it will
    serve as a guideline for others.

    Of the software I use (binkd, Fmail and Golded), only Golded is
    relevant. Binkd and Fmail are character encoding agnostic with regard
    to the nodelist. But Golded uses the nodelist as a lookup database
    and it looks at the names of the sysops.

    Golded cannot deal directly with UTF-8. Even worse, it has the nasty
    habit - at least the Windows version - of disregarding the active
    code page and always forcing the code page that was in effect at
    system startup. In my case that is CP850. So I have no choice but -
    for better or worse - to convert whatever I offer to Golded's
    nodelist compiler to CP850.

    For this I use two commonly available utilities: sed and iconv. Sed is
    a so-called stream editor. It transforms a stream of text according
    to a script. In this case I use it to replace characters not present
    in CP850 with an ASCII equivalent. Sed is driven by a script, in this
    case called to850.scr.

    Here is the line in the batch file that is called when DAILYUTF.ddd
    comes in:

    Daynbr Sed -f to850.scr dailyutf.@### >dailyutf.998

    Daynbr is the win32 version of the famous daynbr.com by Ben Baker.

    Here is the content of to850.scr:

    s/─│/ij/g
    s/─▓/IJ/g
    s/Ôé¼/EUR/g

    Oops, that looks nasty. It looks OK when viewed in a UTF-8
    environment, but unfortunately Fidonews cannot deal with UTF-8 yet,
    so it does not look so good. What is between the first pair of
    slashes in the first line is the two-byte UTF-8 code for the Dutch
    ligature 'ij' as viewed in CP850. The line translates the ligature
    into the plain two-letter 'ij'. The second line does the same for
    the companion capital.

    The third line translates the Euro currency symbol into "EUR". Those
    are all the characters that I presently expect in the UTF-8 nodelist
    that do not fit into CP850.
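
    The three substitutions can also be written down without depending on
    how a code page displays the bytes. The following is a minimal Python
    sketch, not the sed script actually used. The names REPLACEMENTS,
    fold_to_cp850_safe and as_seen_in_cp850 are made up for illustration.
    It names the characters by their Unicode code points and shows the
    CP850 view of their UTF-8 bytes.

```python
# Sketch only: mirrors the three substitutions of to850.scr in Python.
# The table and function names are illustrative assumptions.
REPLACEMENTS = {
    "\u0133": "ij",   # LATIN SMALL LIGATURE IJ
    "\u0132": "IJ",   # LATIN CAPITAL LIGATURE IJ
    "\u20ac": "EUR",  # EURO SIGN
}


def fold_to_cp850_safe(text: str) -> str:
    """Replace characters that CP850 lacks with ASCII equivalents."""
    for char, replacement in REPLACEMENTS.items():
        text = text.replace(char, replacement)
    return text


def as_seen_in_cp850(char: str) -> str:
    """Show how the UTF-8 bytes of char look when read as CP850."""
    return char.encode("utf-8").decode("cp850")


if __name__ == "__main__":
    print(fold_to_cp850_safe("\u0132sselstein, \u20ac 5"))
    for char in REPLACEMENTS:
        # ascii() keeps the output printable on any console code page.
        print(char.encode("utf-8").hex(), ascii(as_seen_in_cp850(char)))
```

    The folding has to happen before the conversion to CP850, because
    after that step the characters are no longer there to be matched.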

    The next line in the batch file calls iconv. Iconv is a character
    encoding conversion utility. It converts files from one character
    encoding scheme to another. Of course it will only work properly if
    the characters to be converted are present in the target character
    set.
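
    What iconv does in this step can be illustrated with a short Python
    sketch. This is not the tool chain described here, just the same
    kind of conversion, under the assumption that silently dropping
    unconvertible characters is acceptable. The function name
    utf8_to_cp850 is hypothetical.

```python
# Sketch only: a rough Python counterpart of a UTF-8 to CP850
# conversion that discards characters the target cannot represent.


def utf8_to_cp850(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as CP850, dropping what does not fit."""
    # errors="ignore" skips characters missing from CP850 instead of
    # raising UnicodeEncodeError.
    return data.decode("utf-8").encode("cp850", errors="ignore")


if __name__ == "__main__":
    sample = "Schr\u00f6ter \u20ac".encode("utf-8")
    # The o-umlaut becomes the single byte 0x94; the Euro sign is dropped.
    print(utf8_to_cp850(sample))
```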

    Oh, wait, I first introduce a 2 second delay, to make sure that the
    next file, DAILYUTF.999, has a later time stamp than DAILYUTF.998.
    That way Golded's nodelist compiler sees it as "the latest".

    ping -n 2 loopback
    Iconv -c -f utf-8 -t cp850 dailyutf.998 >dailyutf.999
    cd \fido\golded
    gncyg -f -d

    The -f tells Golded to do a forced compile and the -d tells it to
    remove duplicate entries. There will be a lot of those, as I compile
    both the ASCII nodelist and the UTF-8 nodelist.
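
    The delay in the batch file only exists to order the two files by
    time stamp. As an alternative sketch in Python (a different technique
    from the ping delay, and purely an assumption about what would
    satisfy the nodelist compiler), the time stamp of the newer file can
    be set explicitly instead of waiting. The function name make_newer
    and its parameters are hypothetical.

```python
import os


def make_newer(older_path: str, newer_path: str, gap_seconds: int = 2) -> None:
    """Set newer_path's modification time gap_seconds after older_path's."""
    older_mtime = os.stat(older_path).st_mtime
    stamp = older_mtime + gap_seconds
    # os.utime takes (access time, modification time) in seconds.
    os.utime(newer_path, (stamp, stamp))
```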

    In Golded's config I have this for the nodelist section:

    NODEPATH d:\fido\nodelist\
    NODELIST d:\fido\nodelist\dailyutf.*
    NODELIST d:\fido\nodelist\dailylst.*
    NODELIST d:\fido\z2pnt\z2pnt.*


    As a result I can now type either "schroeter" or "schröter" in the To:
    field of a message in Golded and the nodelist lookup of Golded will
    give me the node numbers of Ullrich Schroeter or Ullrich Schröter.
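
    Why both spellings end up searchable can be sketched in Python. The
    nodelist lines below are invented placeholders (the node number,
    system name, location and flags are not real entries), and the
    parsing only assumes the usual comma separated nodelist fields with
    underscores for spaces. It says nothing about how Golded's compiler
    works internally, and the name build_index is hypothetical.

```python
# Sketch only: sample lines are hypothetical, not real nodelist data.
ASCII_LIST = [
    ",9999,Example_BBS,Somewhere,Ullrich_Schroeter,-Unpublished-,300,IBN",
]
CP850_LIST = [
    ",9999,Example_BBS,Somewhere,Ullrich_Schr\u00f6ter,-Unpublished-,300,IBN",
]


def build_index(*nodelists):
    """Map lower-cased sysop surnames to the set of node numbers."""
    index = {}
    for nodelist in nodelists:
        for line in nodelist:
            if line.startswith(";"):  # comment line
                continue
            fields = line.split(",")
            node_number = fields[1]
            sysop = fields[4].replace("_", " ")
            surname = sysop.split()[-1].lower()
            index.setdefault(surname, set()).add(node_number)
    return index


if __name__ == "__main__":
    index = build_index(ASCII_LIST, CP850_LIST)
    # ascii() keeps the output printable on any console code page.
    print(ascii(sorted(index)))
```

    Compiling both lists gives one key per spelling, both pointing at
    the same node.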

    This is an ongoing project. When our friends in Eastern Europe join
    the project, things will get more complicated. I do not yet know how
    to deal with that. If and when it happens and I find a workaround,
    that may trigger a follow-up article.


    (C) 2016, Michiel van der Vlist.

    -----------------------------------------------------------------

    --- Azure/NewsPrep 3.0
    # Origin: Home of the Fidonews (2:2/2.0)
    * Origin: LiveWire BBS - Synchronet - LiveWireBBS.com (1:2320/100)