Snitz Forums 2000

 FAQ: Charsets, codepages & UTF-8 (IE only)

Deleted

4116 Posts

Posted - 05 October 2002 :  23:43:41
See here: Globalization in Microsoft Internet Explorer. Also make sure you read the transcript for details (link on that page).

Here are some parts of the slides:

Character Sets
Map numeric values to characters
..Set of characters = Repertoire
ASCII, ISO-8859-1 most well known
At most 8 bits per char (ASCII is 7-bit, ISO-8859-1 is 8-bit)
Other standards needed
..Korean, Japanese, Chinese, Arabic, Devanagari, etc.
Charset != Language
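
To make the "numeric values to characters" point concrete, here is a minimal sketch in Python (not part of the slides; the helper name `describe` is made up for this example). It decodes the same byte value under two ISO-8859 charsets:

```python
def describe(byte_value: int) -> dict:
    """Return the character that one byte value maps to in two charsets."""
    raw = bytes([byte_value])
    return {
        "iso-8859-1": raw.decode("iso-8859-1"),  # Western European repertoire
        "iso-8859-7": raw.decode("iso-8859-7"),  # Greek repertoire
    }

# 0x41 is in the shared ASCII range, so both charsets agree on it.
same = describe(0x41)
# 0xE1 is above the ASCII range, so the two charsets map it differently.
different = describe(0xE1)
```

Here 0x41 comes out as "A" in both charsets, while 0xE1 is "á" in ISO-8859-1 and "α" in ISO-8859-7. ISO-8859-1 alone also serves many languages, which is one way to read "Charset != Language".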

Character Encodings
Specify memory/transmission format for chars
Japanese: JIS vs. Shift-JIS
..Both use JIS X 0208-1990 charset
Create chars 1-4 bytes long
“Charset” vs. “character encoding” – much confusion
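
The JIS vs. Shift-JIS point can be tried out with a short Python sketch (my own illustration, not from the slides; `encode_both` is a hypothetical helper, and I am assuming Python's `iso2022_jp` codec stands in for what the slide calls JIS):

```python
def encode_both(text: str) -> dict:
    """Encode the same characters with two different Japanese encodings."""
    return {
        "iso2022_jp": text.encode("iso2022_jp"),  # 7-bit, uses escape sequences
        "shift_jis": text.encode("shift_jis"),    # 8-bit, two bytes per kana/kanji
    }

# U+3042 is HIRAGANA LETTER A, which is part of the JIS X 0208 charset.
encoded = encode_both("\u3042")
for name, raw in encoded.items():
    print(name, raw.hex())
```

The character is the same in both cases; only the byte sequence used to store or transmit it differs.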

Codepages
Windows® implementation of encodings
Full list in MSDN™ Library
Don’t worry about them
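
If you do want to see what a codepage does, this small Python sketch (an illustration only; `decode_with_codepages` is a made-up name) decodes one byte under three Windows codepages that Python happens to ship codecs for:

```python
def decode_with_codepages(raw: bytes,
                          codepages=("cp1252", "cp1251", "cp1253")) -> dict:
    """Decode the same bytes under several Windows codepages."""
    return {cp: raw.decode(cp) for cp in codepages}

# One byte, three different characters depending on the codepage.
result = decode_with_codepages(b"\xe9")
for cp, char in result.items():
    print(cp, "U+%04X" % ord(char))
```

Byte 0xE9 is U+00E9 (é) in cp1252 (Western European), U+0439 (й) in cp1251 (Cyrillic) and U+03B9 (ι) in cp1253 (Greek).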

Problems with Character Sets
Not uniform
Competing standards
Must support each new charset/encoding
Must select encoding per language
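
What goes wrong when the selected encoding does not match the data can be shown with a tiny Python sketch (not from the slides; the function name `mojibake` and its parameters are assumptions for this example):

```python
def mojibake(text: str, actual: str = "utf-8",
             assumed: str = "iso-8859-1") -> str:
    """Encode text one way, then decode it as if it were another encoding."""
    return text.encode(actual).decode(assumed)

# U+00E9 (e with acute) is two bytes in UTF-8: 0xC3 0xA9.
# Read back as ISO-8859-1, each byte becomes its own character.
garbled = mojibake("\u00e9")
print([hex(ord(ch)) for ch in garbled])  # ['0xc3', '0xa9']
```

That is the familiar "Ã©" instead of "é" you get when a UTF-8 page is displayed as ISO-8859-1.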

Enter Unicode
All languages in one charset
Chars are 16-bit
..Can be larger using surrogates
Display multiple languages at once
Same as ISO-10646’s Basic Multilingual Plane (BMP)
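
The "larger using surrogates" bullet refers to how UTF-16 represents code points above U+FFFF as two 16-bit units. A minimal Python sketch of that arithmetic (the function name `to_surrogates` is mine, not from the slides):

```python
def to_surrogates(code_point: int) -> tuple:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need surrogates")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
high, low = to_surrogates(0x1D11E)
print(hex(high), hex(low))  # 0xd834 0xdd1e
```

Anything inside the BMP fits in a single 16-bit unit and needs no pair.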

Unicode Encoding Types
UTF-16 (“raw” Unicode)
..BOM determines endian-ness
UTF-8 (standard)
UTF-7 (mainly historical)
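
A rough Python sketch of how a BOM can be used to tell these encodings apart (my own simplification, not from the slides; `sniff_bom` is a hypothetical helper and it ignores UTF-32, whose little-endian BOM starts with the same two bytes as UTF-16 LE):

```python
import codecs

def sniff_bom(data: bytes):
    """Guess an encoding from a leading byte order mark, or return None."""
    if data.startswith(codecs.BOM_UTF8):       # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return None

# The same character under the three encodings named above.
for name in ("utf-16-le", "utf-16-be", "utf-8", "utf-7"):
    print(name, "\u00e9".encode(name))
```

Plain UTF-8 without a BOM returns None here, so a real reader still needs the Content-type header or META tag to be sure.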

Creating Unicode Files
Notepad
Microsoft® Word
HTML Entity References
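
For the last option, numeric character references let a plain ASCII file carry any Unicode character. A small Python sketch (illustrative only; `to_char_refs` is a made-up name) that produces and reads them back:

```python
import html

def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric reference (&#NNNN;)."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# U+65E5 U+672C is the word "Nihon" (Japan) in kanji.
refs = to_char_refs("\u65e5\u672c")
print(refs)                               # &#26085;&#26412;
assert html.unescape(refs) == "\u65e5\u672c"  # a browser-style decode restores it
```

ASCII characters pass through unchanged, so ordinary markup is not affected.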

Unicode Benefits
Developing binaries easier
Internet Explorer & ASP understand natively
COM understands natively
SQL Server™ 7.0 has UNICODE field type

Unicode Drawbacks
No Unicode .asp or .js files
SQL Server uses UCS-2, not UTF-8
..Fixed in IIS 5.0 - KB Q232580
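
The UCS-2 vs. UTF-8 mismatch means something has to convert between the two on the way in and out of the database. This Python sketch only illustrates the byte-level difference; it is not how IIS or SQL Server do it, the function names are hypothetical, and I am assuming little-endian UCS-2 and treating it as UTF-16 restricted to the BMP:

```python
def utf8_to_ucs2(raw: bytes) -> bytes:
    """Convert UTF-8 bytes to little-endian UCS-2 (BMP characters only)."""
    text = raw.decode("utf-8")
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("UCS-2 cannot represent characters outside the BMP")
    return text.encode("utf-16-le")

def ucs2_to_utf8(raw: bytes) -> bytes:
    """Convert little-endian UCS-2 bytes back to UTF-8."""
    return raw.decode("utf-16-le").encode("utf-8")

page_bytes = "\u00e9".encode("utf-8")    # what a UTF-8 web page would send
stored = utf8_to_ucs2(page_bytes)        # what a 16-bit Unicode column would hold
print(page_bytes.hex(), stored.hex())    # c3a9 e900
```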

Overview of Language Negotiation
Internet Explorer sends language options to server
..Accept-language
..Accept-charset
Server processes data w/ codepage
Server returns data
..Content-type
..Content-language
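
On the server side, the first step above amounts to reading the Accept-language header and picking something the site supports. A simplified Python sketch (not from the slides; both function names and the "en" default are assumptions, and it does no prefix matching such as en-us to en):

```python
def parse_accept_language(header: str) -> list:
    """Parse an Accept-Language value into (tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0                               # q defaults to 1 when omitted
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0                       # treat a malformed q as "not wanted"
        prefs.append((tag.strip().lower(), q))
    prefs.sort(key=lambda pair: pair[1], reverse=True)  # stable sort keeps ties in order
    return prefs

def pick_language(header: str, supported: set, default: str = "en") -> str:
    """Return the first acceptable language the server supports."""
    for tag, q in parse_accept_language(header):
        if q > 0 and tag in supported:
            return tag
    return default

print(pick_language("fr;q=0.9, ja", {"ja", "en"}))  # ja
```

The chosen language would then go back to the browser in Content-language, alongside the charset in Content-type.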

How Internet Explorer Reads/Sends Data
Content-type 
..META tag also used
If not specified, autodetection used
..Byte-sniffing
..Problematic, sometimes fails
If fails, Internet Explorer defaults to ISO-8859-1
Internet Explorer 5: Language pack auto-download
..Not available with Unicode
..POSTs to server use page encoding
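
The decision order above (declared charset first, ISO-8859-1 as the last resort) can be sketched in Python. This is only an illustration: it skips the byte-sniffing step entirely, the function names are made up, and it is not a description of Internet Explorer's actual code:

```python
from email.message import Message

def charset_from_content_type(value: str, default: str = "iso-8859-1") -> str:
    """Extract the charset parameter from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lowercases the value and returns the
    # fallback when no charset parameter is present.
    return msg.get_content_charset(default)

def decode_body(body: bytes, content_type: str) -> str:
    """Decode a response body using the declared charset, else ISO-8859-1."""
    charset = charset_from_content_type(content_type)
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or bytes that do not fit it.
        return body.decode("iso-8859-1")

print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # iso-8859-1
```

ISO-8859-1 works as a fallback because every byte value decodes to some character in it, so the last step can never fail, even if the result is wrong.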

How Internet Explorer Enables Input - IME
Japanese, Chinese need front-end processing
Input Method Editor (IME)
Controlled by Input Method Manager (IMM)
...


Stop the WAR!