See here: Globalization in Microsoft Internet Explorer. Also make sure you read the transcript for details (link on that page).
Here are some parts of the slides:
Character Sets
Map numeric values to characters
..Set of characters = Repertoire
ASCII, ISO-8859-1 most well known
At most 8 bits wide (ASCII is 7-bit, ISO-8859-1 is 8-bit)
Other standards needed
..Korean, Japanese, Chinese, Arabic, Devanagari, etc.
Charset != Language
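To make the "numeric values to characters" idea concrete, here is a minimal Python sketch. The helper name `in_repertoire` is made up for illustration, and it leans on Python's codec names as a stand-in for the character sets on the slide.

```python
# A character set maps numeric values to characters; ord() and chr()
# expose that mapping for Unicode code points.
for ch in ["A", "é", "한"]:
    print(ch, ord(ch), hex(ord(ch)))


def in_repertoire(ch: str, charset: str) -> bool:
    """Return True if `charset` (a Python codec name) can represent `ch`.

    Hypothetical helper: it treats "encodable by this codec" as
    "member of the repertoire", which is close enough for a demo.
    """
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# ISO-8859-1 serves many Western European languages (charset != language),
# but Korean needs a different standard such as EUC-KR.
print(in_repertoire("é", "iso-8859-1"))
print(in_repertoire("한", "iso-8859-1"))
print(in_repertoire("한", "euc_kr"))
```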
Character Encodings
Specify memory/transmission format for chars
Japanese: JIS vs. Shift-JIS
..Both use JIS X 0208-1990 charset
Create chars 1-4 bytes long
“Charset” vs. “character encoding” – much confusion
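The JIS vs. Shift-JIS point can be seen by encoding the same characters both ways. This is only a sketch: it assumes "JIS" on the slide means the 7-bit ISO-2022-JP encoding, and it uses Python's `iso2022_jp` and `shift_jis` codecs as stand-ins.

```python
# Same characters, two encodings: the byte sequences differ even though
# both encodings draw their kanji from the JIS X 0208 character set.
text = "日本語"

jis_bytes = text.encode("iso2022_jp")   # assumption: "JIS" = ISO-2022-JP
sjis_bytes = text.encode("shift_jis")

print(jis_bytes)    # 7-bit bytes wrapped in escape sequences
print(sjis_bytes)   # 8-bit bytes, two per kanji

# Decoding with the matching codec recovers the original text.
print(jis_bytes.decode("iso2022_jp") == text)
print(sjis_bytes.decode("shift_jis") == text)
```

Decoding with the wrong codec is what produces garbled pages, which is why the encoding has to be declared or detected correctly.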
Codepages
Windows® implementation of encodings
Full list in MSDN™ Library
Don’t worry about them
Problems with Character Sets
Not uniform
Competing standards
Must support each new charset/encoding
Must select encoding per language
Enter Unicode
All languages in one charset
Chars are 16-bit
..Can be larger using surrogates
Display multiple languages at once
Same as ISO-10646’s Basic Multilingual Plane (BMP)
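The surrogate mechanism mentioned above can be sketched in a few lines of Python. The function name `to_surrogates` is hypothetical; the arithmetic is the standard UTF-16 surrogate-pair calculation.

```python
import struct


def to_surrogates(code_point: int) -> tuple:
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the BMP need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the 16-bit BMP.
clef = "\U0001D11E"
high, low = to_surrogates(ord(clef))
print(hex(high), hex(low))

# Cross-check against Python's own UTF-16 encoder.
print(struct.unpack(">HH", clef.encode("utf-16-be")) == (high, low))
```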
Unicode Encoding Types
UTF-16 (“raw” Unicode)
..BOM determines endian-ness
UTF-8 (standard)
UTF-7 (mainly historical)
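As a rough illustration of the three encoding types, the Python sketch below encodes one character each way. It relies only on the standard `codecs` module; the variable names are arbitrary.

```python
import codecs

text = "é"  # U+00E9

# UTF-16: the byte order mark (BOM) tells the reader which byte order follows.
utf16_le = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
utf16_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
print(utf16_le.hex(), utf16_be.hex())

# The generic "utf-16" decoder reads the BOM and picks the byte order itself.
print(utf16_le.decode("utf-16") == utf16_be.decode("utf-16") == text)

# UTF-8: variable length, no byte-order issue.
utf8 = text.encode("utf-8")
print(utf8)

# UTF-7: pure 7-bit output, meant for channels that are not 8-bit clean.
utf7 = text.encode("utf-7")
print(utf7)
```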
Creating Unicode Files
Notepad
Microsoft® Word
HTML Entity References
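For the HTML entity route, one possible approach is to write numeric character references so the file itself stays plain ASCII. This is a sketch using Python's standard library, not something taken from the slides.

```python
import html

text = "Ünïcödé 日本"

# Replace every non-ASCII character with a numeric reference (&#NNN;),
# so the page can be saved and served in any ASCII-compatible encoding.
ascii_html = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_html)

# A browser, or html.unescape here, turns the references back into characters.
print(html.unescape(ascii_html) == text)
```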
Unicode Benefits
Developing binaries easier
Internet Explorer & ASP understand natively
COM understands natively
SQL Server™ 7.0 has UNICODE field type
Unicode Drawbacks
No Unicode .asp or .js files
SQL Server uses UCS-2, not UTF-8
..Fixed in IIS 5.0 - KB Q232580
Overview of Language Negotiation
Internet Explorer sends language options to server
..Accept-language
..Accept-charset
Server processes data w/ codepage
Server returns data
..Content-type
..Content-language
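On the server side, the negotiation above might look roughly like the following Python sketch. The function names `parse_accept_language` and `choose_language` and the `available` set are invented for illustration; a real server would more likely use its framework's own negotiation support.

```python
def parse_accept_language(header: str) -> list:
    """Parse an Accept-Language value into (tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so ties keep the order the browser sent.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_language(header: str, available: set, default: str = "en") -> str:
    """Pick the best language the server can actually provide."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # fall back from "en-us" to "en"
        if primary in available:
            return primary
    return default


print(choose_language("ja, en-us;q=0.8, en;q=0.5", {"en", "fr"}))
```

The chosen tag would then be echoed back in the Content-language response header.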
How Internet Explorer Reads/Sends Data
Content-type
..META tag also used
If not specified, autodetection used
..Byte-sniffing
..Problematic, sometimes fails
If autodetection fails, Internet Explorer defaults to ISO-8859-1
Internet Explorer 5: Language pack auto-download
..Not available with Unicode
..POSTs to server use page encoding
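The lookup order described above (Content-type header, then META tag, then a default) can be approximated in Python as follows. This is a simplified sketch and not Internet Explorer's actual algorithm: the function names are hypothetical, the regular expressions are deliberately loose, and the byte-sniffing step is skipped.

```python
import re

# Matches charset=... in a header value or inside a META tag.
_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
_META_RE = re.compile(r"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def charset_from_content_type(value: str):
    """Extract the charset parameter from a Content-Type value, if any."""
    match = _CHARSET_RE.search(value)
    return match.group(1).lower() if match else None


def pick_encoding(content_type: str, body: bytes) -> str:
    """Guess a page's encoding: header first, META tag second, default last."""
    if content_type:
        charset = charset_from_content_type(content_type)
        if charset:
            return charset
    # ISO-8859-1 maps every byte to a character, so this decode cannot fail
    # and is safe for scanning the start of the page for a META tag.
    head = body[:1024].decode("iso-8859-1")
    match = _META_RE.search(head)
    if match:
        return match.group(1).lower()
    # Byte-sniffing would go here; this sketch goes straight to the default.
    return "iso-8859-1"


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=shift_jis"></head></html>'
print(pick_encoding("text/html", page))
```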
How Internet Explorer Enables Input - IME
Japanese, Chinese need front-end processing
Input Method Editor (IME)
Controlled by Input Method Manager (IMM)
...