From Newsgroup: comp.lang.javascript
Newsgroups: comp.lang.python,comp.lang.javascript
Here's a small comparative tutorial about URI encoding
and how to use it to repair mis-encoded strings.
Code examples are given for both JavaScript ("JS") and
Python ("py").
Section 1: Preparations
In Python, we need two imports:
from urllib.parse import quote
from urllib.parse import unquote
Section 2: One-pass operations
convert a character to its ISO 8859-1 representation and
then represent this in URI notation
escape( 'ä' )
"%E4"
quote( 'ä', encoding='iso8859-1' )
'%E4'
convert a character to its UTF-8 representation and then
represent this in URI notation
encodeURI( 'ä' )
"%C3%A4"
quote( 'ä', encoding='utf-8' )
'%C3%A4'
get the character from URI notation assuming it was
encoded using its ISO 8859-1 representation
unescape( '%E4' )
"ä"
unquote( '%E4', encoding='iso8859-1' )
'ä'
get the character from URI notation assuming it was
encoded using its UTF-8 representation
decodeURIComponent( '%C3%A4' )
"ä"
unquote( '%C3%A4', encoding='utf-8' )
'ä'
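As a quick cross-check of the four Python calls above, here is a
minimal sketch that round-trips 'ä' through both encodings. The
variable names are my own; the expected strings are the ones listed
above. It also shows that quote() percent-encodes the same octets
that str.encode() yields.

```python
from urllib.parse import quote, unquote

# Expected URI notation of 'ä' per encoding, as listed above.
expected = {'iso8859-1': '%E4', 'utf-8': '%C3%A4'}

for encoding, uri_form in expected.items():
    encoded = quote('ä', encoding=encoding)
    decoded = unquote(encoded, encoding=encoding)
    # quote() also accepts bytes, so we can compare against str.encode().
    same_as_bytes = quote('ä'.encode(encoding)) == encoded
    print(encoding, encoded, decoded, same_as_bytes)
    assert encoded == uri_form and decoded == 'ä' and same_as_bytes
```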
Section 3: Repairing a mis-encoded character sequence
Sometimes people decode UTF-8 as if it were ISO 8859-1.
This results in ugly strings like »Ã¤« (for »ä«).
unescape( encodeURI( 'ä' ))
"Ã¤"
unquote( quote( 'ä', encoding='utf-8' ), encoding='iso8859-1' )
'Ã¤'
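The same mix-up can be reproduced without URI notation, which may
make it clearer what happens at the octet level. This is only a
sketch with variable names of my own: UTF-8 needs two octets for
'ä', and ISO 8859-1 turns each octet into one character.

```python
original = 'ä'
octets = original.encode('utf-8')      # b'\xc3\xa4': two octets
garbled = octets.decode('iso8859-1')   # each octet becomes one character
print(garbled, len(garbled))           # prints: Ã¤ 2
```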
But with JavaScript or Python we can repair such strings!
We first URI-encode them to get a kind of octet sequence.
escape( 'Ã¤' )
"%C3%A4"
quote( 'Ã¤', encoding='iso8859-1' )
'%C3%A4'
Now we can decode this octet sequence using the correct
encoding!
decodeURIComponent( escape( 'Ã¤' ))
"ä"
unquote( quote( 'Ã¤', encoding='iso8859-1' ), encoding='utf-8' )
'ä'
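For reuse, the Python repair can be wrapped in a small function.
This is a sketch, not a general solution: the name repair_mojibake
is hypothetical, and it assumes the damage really came from reading
UTF-8 as ISO 8859-1. If the text contains characters outside
ISO 8859-1, or the octets are not valid UTF-8, it gives up and
returns the input unchanged.

```python
from urllib.parse import quote, unquote

def repair_mojibake(text):
    """Hypothetical helper: undo UTF-8 that was decoded as ISO 8859-1."""
    try:
        # Back to octets in URI notation; raises UnicodeEncodeError
        # if a character does not exist in ISO 8859-1.
        octets = quote(text, encoding='iso8859-1')
        # errors='strict' raises instead of inserting U+FFFD
        # when the octets are not valid UTF-8.
        return unquote(octets, encoding='utf-8', errors='strict')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not repairable under our assumption

print(repair_mojibake('Ã¤'))   # prints: ä
print(repair_mojibake('ä'))    # already fine, prints: ä
```

Without the detour through URI notation, the same repair can be
written as text.encode('iso8859-1').decode('utf-8').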
Summary
Both Python and JavaScript can URI-encode ISO 8859-1
characters using either ISO 8859-1 or UTF-8, and both can
decode them again. The details of the function calls
differ somewhat. Naming the encoding explicitly in
the Python calls makes them more orthogonal (and more readable)
than the JavaScript function names, which do not mention the encodings.