• URI encoding in both Python and JavaScript

    From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.python,comp.lang.javascript on Mon Oct 15 22:24:59 2018
    From Newsgroup: comp.lang.javascript

    Newsgroups: comp.lang.python,comp.lang.javascript

    Here's a small comparative tutorial about URI encoding
    and how to use it to repair mis-encoded strings.

    Code examples are given for both JavaScript ("JS") and
    Python ("py").

    Section 1: Preparations

    In Python, we need two imports:

    from urllib.parse import quote
    from urllib.parse import unquote

    Section 2: One-pass operations

    convert a character to its ISO 8859-1 representation and
    then represent this in URI notation

    escape( 'ä' )
    "%E4"
    quote( 'ä', encoding='iso8859-1' )
    '%E4'

    convert a character to its UTF-8 representation and then
    represent this in URI notation

    encodeURI( 'ä' )
    "%C3%A4"
    quote( 'ä', encoding='utf-8' )
    '%C3%A4'

    get the character from URI notation assuming it was
    encoded using its ISO 8859-1 representation

    unescape( '%E4' )
    "ä"
    unquote( '%E4', encoding='iso8859-1' )
    'ä'

    get the character from URI notation assuming it was
    encoded using its UTF-8 representation

    decodeURIComponent( '%C3%A4' )
    "ä"
    unquote( '%C3%A4', encoding='utf-8' )
    'ä'
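
    The four one-pass operations can be cross-checked with a short
    Python sketch. The helper percent_encode is hypothetical (it is
    not part of urllib); it only illustrates what quote does to a
    non-ASCII character: encode it to octets, then write each octet
    as %XX. Unlike quote, it also escapes ASCII characters, so the
    two agree only for input such as 'ä'.

```python
from urllib.parse import quote, unquote

def percent_encode(text, encoding):
    """Hypothetical helper: encode text to octets, write each octet as %XX."""
    return ''.join('%{:02X}'.format(octet) for octet in text.encode(encoding))

for enc in ('iso8859-1', 'utf-8'):
    encoded = quote('ä', encoding=enc)
    # quote and the hand-written helper agree for a non-ASCII character
    assert encoded == percent_encode('ä', enc)
    # decoding with the same encoding gives the character back
    assert unquote(encoded, encoding=enc) == 'ä'
    print(enc, encoded)
```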

    Section 3: Repairing a mis-encoded character sequence

    Sometimes people decode UTF-8 as if it were ISO 8859-1.
    This results in ugly strings like »Ã¤« (for »ä«).

    unescape( encodeURI( 'ä' ))
    "Ã¤"
    unquote( quote( 'ä', encoding='utf-8' ), encoding='iso8859-1' )
    'Ã¤'
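
    At the octet level, the mis-decoding can be sketched in Python
    as follows. The variable names are chosen for illustration only.
    UTF-8 represents the character as two octets, and ISO 8859-1
    turns every octet into one character, so one character becomes
    two.

```python
from urllib.parse import quote, unquote

original = 'ä'
utf8_bytes = original.encode('utf-8')       # two octets: 0xC3 0xA4
misread = utf8_bytes.decode('iso8859-1')    # each octet becomes one character

print([hex(octet) for octet in utf8_bytes])  # ['0xc3', '0xa4']
print([hex(ord(ch)) for ch in misread])      # ['0xc3', '0xa4']

# The quote/unquote detour produces exactly the same two-character string.
assert misread == unquote(quote(original, encoding='utf-8'),
                          encoding='iso8859-1')
```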

    But with JavaScript or Python we can repair such strings!
    We first URI-encode them to get a kind of octet sequence.

    escape( 'Ã¤' )
    "%C3%A4"
    quote( 'Ã¤', encoding='iso8859-1' )
    '%C3%A4'

    Now we can decode this octet sequence using the correct
    encoding!

    decodeURIComponent( escape( 'Ã¤' ))
    "ä"
    unquote( quote( 'Ã¤', encoding='iso8859-1' ), encoding='utf-8' )
    'ä'
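
    The repair can be wrapped in a small function. This is only a
    sketch: the name repair_mojibake and the decision to return the
    input unchanged when it cannot be repaired are assumptions, not
    part of the tutorial. Passing errors='strict' makes unquote raise
    an exception instead of inserting replacement characters.

```python
from urllib.parse import quote, unquote

def repair_mojibake(text):
    """Hypothetical helper: undo 'UTF-8 decoded as ISO 8859-1'.

    Returns the input unchanged if it contains characters outside
    ISO 8859-1 or if its octets are not valid UTF-8.
    """
    try:
        octets = quote(text, encoding='iso8859-1', errors='strict')
        return unquote(octets, encoding='utf-8', errors='strict')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(repair_mojibake('Ã¤'))  # ä

# The same repair without the detour through URI notation:
assert repair_mojibake('Ã¤') == 'Ã¤'.encode('iso8859-1').decode('utf-8')
```

    A string that was never mis-decoded, such as a lone 'ä', is not
    valid UTF-8 when written as ISO 8859-1 octets, so this sketch
    leaves it alone.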

    Summary

    Both Python and JavaScript can URI-encode ISO 8859-1
    characters using either ISO 8859-1 or UTF-8, and both can
    decode them again. The details of the function calls
    differ somewhat. Naming the encoding explicitly in the
    Python calls makes them more orthogonal (and more readable)
    than the JavaScript names, which do not mention the encodings.


    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.python,comp.lang.javascript on Mon Oct 15 22:31:42 2018
    From Newsgroup: comp.lang.javascript

    ram@zedat.fu-berlin.de (Stefan Ram) writes:
    >encodeURI( 'ä' )

    PS:

    In JavaScript, there is a difference between "encodeURI" and
    "encodeURIComponent" that might be represented in Python as:

    encodeURI( s ) --> urllib.parse.quote(s, safe='~@#$&()*!+=:;,.?/\'')
    encodeURIComponent( s ) --> urllib.parse.quote(s, safe='~()*!.\'')
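
    As runnable Python, the two mappings might look like the sketch
    below. The function and constant names are made up for
    illustration, and the example URL is hypothetical. The safe
    strings are the ones given above; quote never escapes letters,
    digits, or the characters "_.-~", so those need not be listed.

```python
from urllib.parse import quote

# Characters left unescaped, in addition to quote's built-in safe set.
URI_SAFE = "~@#$&()*!+=:;,.?/'"
COMPONENT_SAFE = "~()*!.'"

def encode_uri(s):
    """Rough Python counterpart of JavaScript's encodeURI (sketch)."""
    return quote(s, safe=URI_SAFE)

def encode_uri_component(s):
    """Rough Python counterpart of JavaScript's encodeURIComponent (sketch)."""
    return quote(s, safe=COMPONENT_SAFE)

url = 'http://example.com/a b?q=ä&r=1'
print(encode_uri(url))
# http://example.com/a%20b?q=%C3%A4&r=1
print(encode_uri_component(url))
# http%3A%2F%2Fexample.com%2Fa%20b%3Fq%3D%C3%A4%26r%3D1
```

    The first form keeps the URI delimiters intact and suits a whole
    URI; the second escapes them as well and suits a single query
    value or path segment.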
