UTF-8 is your friend
This morning, I experienced a great example of REALLY BAD internationalization.
I was busy booking tickets on the website of one of the largest European low-cost airlines. At some point during the booking process, I was asked to type the first name and family name of each passenger. I typed my daughter's first name: Iséa. When I tabbed to the next field, I received an error message rejecting the name.

It appears that this European website, which sells tickets to airports in more than ten different countries with almost as many official languages, was not able to process the acute accent in my daughter's first name!
And don't think it's because of some arrogant English-only company spirit: the site is available in 18 languages!
UTF-8 offers support for diacritics and non-Latin scripts for free
I blame them because allowing me to write Iséa correctly is such an easy job: just save your HTML pages using the UTF-8 encoding and declare that charset in the page (yes, your HTML editor can do that. Even Notepad can!).
UTF-8 is an encoding of Unicode. But the _very_ nice thing is that it preserves the basic properties of null-terminated ASCII strings: basic 7-bit ASCII characters are not modified by the encoding, and no other character ever produces a zero byte. Every other character is converted to a sequence of 2 or 3 bytes (OK, sometimes 4, but that is rarer), each with its high bit set.
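To make these byte counts concrete, here is a small Python sketch. The sample characters and variable names are my own picks for illustration; only the standard `str.encode` method is used.

```python
# How many bytes does UTF-8 need for a few sample characters?
samples = ["I", "é", "€", "😀"]
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

# The name from the booking form, encoded as UTF-8.
name = "Iséa"
encoded = name.encode("utf-8")

# Pure 7-bit ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_unchanged = "Isa".encode("utf-8") == "Isa".encode("ascii")

# No zero byte appears, so a null terminator stays unambiguous.
has_null = 0 in encoded

# Every byte of a multi-byte sequence has its high bit set (value >= 0x80).
high_bits = all(b >= 0x80 for b in "é".encode("utf-8"))

print(lengths)
print(encoded, ascii_unchanged, has_null, high_bits)
```

The four samples need 1, 2, 3 and 4 bytes respectively, and Iséa becomes five bytes for four characters.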
This has a really cool consequence: as long as you simply store and forward the data, you don't have to modify your code at all. Your code doesn't have to be Unicode-aware. Simply store and forward UTF-8 encoded strings as if they were simple and stupid null-terminated ASCII strings! It's the browser's job to display them correctly (and they all do a good job of it).
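Here is a minimal sketch of that store-and-forward idea, assuming a backend that never looks inside the bytes. The `store` and `forward` helpers and the dict standing in for a database are hypothetical, purely for illustration.

```python
def store(db, key, raw):
    """Keep the submitted bytes as an opaque blob; no decoding, no validation."""
    db[key] = bytes(raw)


def forward(db, key):
    """Hand the stored bytes back exactly as they were received."""
    return db[key]


db = {}  # stand-in for a real database or message queue

# What a browser would send for a form field on a UTF-8 page.
submitted = "Iséa".encode("utf-8")

store(db, "first_name", submitted)
out = forward(db, "first_name")

# Only the final consumer (the browser, in the article) interprets the bytes.
roundtrip = out.decode("utf-8")
print(out, roundtrip)
```

One thing to keep in mind with this approach: lengths are now counted in bytes, so Iséa takes five bytes even though it has four characters.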
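The irony, in this case, is that the so-called special character é is already covered by the infamous ISO-8859-1 specified in the HTML page: it is the single byte 0xE9 there, and Unicode gives it the very same number, U+00E9. In UTF-8, that code point is encoded as two bytes, 0xC3 0xA9.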
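A quick way to check how é relates to the two encodings is the Python sketch below (variable names are mine). It prints the code point, the ISO-8859-1 byte, the UTF-8 bytes, and what happens when UTF-8 bytes are misread as ISO-8859-1.

```python
ch = "é"

code_point = ord(ch)                    # Unicode code point of é
latin1_bytes = ch.encode("iso-8859-1")  # a single byte in ISO-8859-1
utf8_bytes = ch.encode("utf-8")         # two bytes in UTF-8

# Reading UTF-8 bytes with the wrong charset gives the classic garbage.
misread = utf8_bytes.decode("iso-8859-1")

print(hex(code_point), latin1_bytes, utf8_bytes, misread)
```

The code point and the ISO-8859-1 byte have the same value, while the UTF-8 form is a different, two-byte sequence. That is why the charset declared in the page has to match the encoding the page was actually saved in.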
Conclusion: Save your HTML pages as UTF-8
It's the biggest yet simplest step to make your website world-ready.

