Snitz Forums 2000
 V3406 Unicode Encoding Support #2

TSAloha
Junior Member

USA
151 Posts

Posted - 08 May 2007 :  00:25:49
Note on a basic rationale for considering unicode based forums:

"<b>Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. </b>"

Unicode Home Page

Unicode support is widely used in application internationalization and localization, especially when "converting" English-based applications to other locales and languages, and it lets the source application run in a multilingual environment.


Edit: November 11, 2007

<hr noshade size="1">
<font color="red">[Amendment Note]</font id="red">
First, MarcelG... thanks for reconfirming how you handled your Unicode forum.

There appear to be many approaches, but for setting a base forum encoding/charset, yours is probably the simplest.

<u>Some updated remarks on this </u> (perhaps for people who may be interested in deploying Snitz Forums with unicode/utf-8).


I have added
Codepage = 65001
Charset = "utf-8"
(instead of Response.xxx)
to config.asp for both versions, the V3406 source and V4b03Unicode. That is, the lines right below the copyright statement should read:

<font color="blue">Codepage = 65001
Charset= "utf-8"</font id="blue">
Session.LCID = 1033
Response.Buffer = true


**The proper ASP expression for setting the encoding is what MarcelG indicated, shown below.
<font color="blue">
Response.Codepage = 65001
Response.Charset= "utf-8"</font id="blue">
Session.LCID = 1033
Response.Buffer = true


Both work to set UTF-8 as the base encoding format for the forums.

No other declarations are required in any file. The V3406 native code as is, V4b03Unicode with 10 languages (with UTF-8 formatted language resource files), and V3406Unicode with the V4b03 design (4 languages) all work beautifully with this setting.
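
One way to check which settings are actually in effect is a small test page like the sketch below. The file name check_encoding.asp is made up for illustration, and the sketch assumes the page sits in the forum folder so that it can include the forum's own config.asp.

```vbscript
<!--#INCLUDE FILE="config.asp"-->
<%
' check_encoding.asp (hypothetical file name, not part of Snitz).
' Reports the code page and locale in effect after config.asp has run.
Response.Write "Response.CodePage = " & Response.CodePage & "<br>"
Response.Write "Session.CodePage = " & Session.CodePage & "<br>"
Response.Write "Session.LCID = " & Session.LCID & "<br>"
%>
```

A code page of 65001 would indicate that UTF-8 is in effect for the response. The charset sent to the browser can be checked separately in the Content-Type response header.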

**<font color="maroon">Additional note:
In V4b03, setup.asp contains the character set definition below (there is no such definition in the V3406 source code). Setting the Unicode encoding in config.asp will (I assume) override it. The forum source code of both V3406 (and earlier versions) and V4b03 is basically designed around ISO character sets, with a default LCID=1033 (English-US) as the base encoding scheme. A localized language file in V4b03 likewise assumes that the locale language is encoded in one of the ISO character sets. This charset definition is important if you deploy multiple languages. Unicode is a single encoding scheme for "all" written scripts, regardless of locale language, so it may be sufficient to have the above defined in config.asp to enable Unicode encoding for the forums.
</font id="maroon">



<b>V3406 native code</b>: No additional change is required. The forum UI remains in English, and forum content can be multilingual.
<b>V4b03-based code/designs</b>: Some additional modifications may be required, as described below.
(a) inc_top.asp (V4b03), inc_header.asp (V3406)
inc_top.asp (V4b03) has a meta tag definition that uses a charset variable, strLangCharset.
Around lines 113-116, it is defined as:
Response.Write "<html>" & vbNewline & vbNewline & _
               "<head>"& vbNewline & _
"<meta http-equiv=""Content-Type"" content = ""text/html;charset=" & strLangCharset & """>" & vbNewline & _
"<title>" & strForumTitle & "</title>" & vbNewline

The corresponding section in inc_header.asp (V3406) is around lines 85-87, shown here with the same meta tag inserted (in blue):
Response.Write "<html>" & vbNewline & _
<font color="blue">"<meta http-equiv=""Content-Type"" content = ""text/html;charset=" & strLangCharset & """>" & vbNewline & _</font id="blue">
"<title>" & strForumTitle & "</title>" & vbNewline
(b) Language files
V4b03 has internationalization features built in: config.asp includes the language files, and the default language file has LCID 1033 (English-US), with its language text string variables extracted into Lang1033.asp.

Within each language file, the V4b03 code defines the LCID, charset, locale, and so on in index variables, with all language text extracted from the source code and arranged under langstring variable names. (This is an array of text strings for localization.) It is necessary to set the codepage/charset in each language file to 65001 and utf-8. To fully support Unicode/UTF-8 for each language, it is also necessary to convert and save each language file as a UTF-8 encoded file, including the default language, 1033 = English-US. (This is easily done by opening the locale ISO charset language file in Notepad, or any other text editor that supports Unicode/UTF-8, and using Save As.)
(c) Language selector and multilingual content - As part of its internationalization features, V4b03 implements a language selector for choosing the preferred locale language for viewing the forums. It pulls the locale language description to display for selection (e.g. 1033 = English-US) from each language file. With the forum Unicode/UTF-8 enabled and each language resource file saved in UTF-8, each language can be displayed in its own script in the selector, and forum content can be published multilingually (rendering all included languages simultaneously without switching the web page encoding). The browser encoding needs to be set to Unicode/UTF-8 (in IE, this may require unselecting Auto-Select to view multilingual content properly without refreshing and resetting it). Without Unicode/UTF-8, this may not work if you mix different ISO charset groups or non-Western charsets. It may not be a problem if you have only English-US and one locale language, especially languages within the same ISO charset group, for example Latin-1. Otherwise, each page in a specific charset can be viewed only with that encoding selected and will appear corrupted in a different language view...
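
For the language file conversion described in (b), a script can do the same job as the Notepad "Save As" step. This is only a sketch using ADODB.Stream from Windows Script Host; the file names and the source charset (shift_jis) are assumptions for illustration, so substitute the charset your language file is really saved in.

```vbscript
' convert_lang.vbs (hypothetical name): run with "cscript convert_lang.vbs".
Option Explicit

Const adTypeText = 2
Const adSaveCreateOverWrite = 2

Sub ConvertToUtf8(srcPath, srcCharset, dstPath)
    Dim reader, writer, text

    ' Read the file using its current (legacy) charset.
    Set reader = CreateObject("ADODB.Stream")
    reader.Type = adTypeText
    reader.Charset = srcCharset
    reader.Open
    reader.LoadFromFile srcPath
    text = reader.ReadText
    reader.Close

    ' Write the same text back out as UTF-8.
    Set writer = CreateObject("ADODB.Stream")
    writer.Type = adTypeText
    writer.Charset = "utf-8"
    writer.Open
    writer.WriteText text
    writer.SaveToFile dstPath, adSaveCreateOverWrite
    writer.Close
End Sub

' Assumed example: a Japanese language file saved as Shift-JIS.
ConvertToUtf8 "Lang1041.asp", "shift_jis", "Lang1041_utf8.asp"
```

Note that ADODB.Stream writes a byte order mark at the start of a UTF-8 file. Writing to a new file name keeps the original as a backup until the converted file has been checked in the forum.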

The above was further tested by: <font color="blue">(All local testing forums)</font id="blue">
(1) Converting V3406 native codes to Unicode version
(2) Updating an existing V3406Unicode Forum
(3) Updating an existing V4b03 Unicode Multilingual Forums (w/ 10 languages)
(4) Updating a test forum V3406Unicode with V4b03Design (w/4 languages)
(6) All tested with IE7, Access 2000 DB, and on IIS5.1 (intranet/local)
<font color="maroon">(7) Further tested with a single-locale-language version of V3406Unicode with the V4b03 design: English only and Japanese only. The language selector is disabled, as there is no need for it in a single-locale-language version: forum UI in the locale language, forum contents multilingual.</font id="maroon">



The original post was:

<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote">Additional note on enabling Snitz Forum V3406 with Unicode/UTF-8 encoding: it also works well (at least in what I have tested) with the following simple change to inc_header.asp:

Add
<META HTTP-EQUIV="content-type" content="text/html;charset=utf-8">
at the beginning of inc_header.asp codes, just below the first include.

example:
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote">
%>
<!--#INCLUDE FILE="inc_func_common.asp"-->
<META HTTP-EQUIV="content-type" content="text/html;charset=utf-8">
<hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote">

No additional code changes are required. Every page that includes inc_header.asp will then declare UTF-8 as its encoding/charset.

<font color="red">Reminder: This only makes forum contents multilingual; no localization of the forum UI into particular locale language(s) and no internationalization is really involved.</font id="red">

A quick verification of this is done by testing with 3 different Snitz Forums unicode versions running on IIS
(1) V3406Unicode version
(2) V4b03Unicode version with 6 languages [base code V3304]
(3) A private pet project of V3406Unicode with the V4b03 design, currently pre-alpha or something like that, but up and running with 4 languages (English-US and Japanese fully localized, Simplified/Traditional Chinese partially done; approx. 65%+ of the existing V4b03 text strings/variables reused)... If you are familiar with V4b03, the look and feel is pretty much the same... and all Admin Options sections are also localized, for Japanese at least.<hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote">









Edited by - TSAloha on 11 November 2007 15:20:14

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 08 May 2007 :  02:07:08
I had to add this to the beginning of config.asp, in addition to the change you describe for inc_header.asp.
Response.CodePage = 65001
Response.CharSet = "utf-8"

portfolio - linkshrinker - oxle - twitter

Edited by - MarcelG on 08 May 2007 02:07:31

TSAloha
Junior Member

USA
151 Posts

Posted - 08 May 2007 :  08:58:27
I understand that your config.asp includes a server-side UTF-8 encoding setting. I am curious, as I have not changed config.asp at all for any of my 3 Unicode (UTF-8) versions... and they seem to run OK. The V4b03 version is a revival of my old site, and I have the charset set in inc_top.asp, the counterpart of inc_header.asp in the old V3.3.0x version.

I believe your config.asp setting makes the server fully support UTF-8 encoding, while the HTML meta tag lets the web page handle the UTF-8 encoding/charset...
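
To make the distinction concrete, here is a minimal sketch of the three places an encoding can be declared, assuming a standalone test page that is not part of Snitz. Response.CodePage controls how ASP converts its strings to bytes on output, Response.Charset adds the charset to the HTTP Content-Type header, and the meta tag declares it inside the page itself.

```vbscript
<%
' Server side: how ASP converts its internal strings to bytes on output.
Response.CodePage = 65001
' HTTP header: adds "charset=utf-8" to the Content-Type response header.
Response.Charset = "utf-8"
%>
<html>
<head>
<!-- In-page declaration, read by the browser when it parses the HTML. -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>Encoding layers</title>
</head>
<body>
<%
' ChrW builds the two Chinese characters from their code points, so the
' test does not depend on how this .asp file itself was saved.
Response.Write "English / " & ChrW(&H4F60) & ChrW(&H597D)
%>
</body>
</html>
```

Commenting out one layer at a time and reloading the page is a simple way to see which of them a given server and browser combination really needs.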

With a native V3406 version, the above seems to work fine.
For an "internationalized" version, which uses a language file as a resource file with text string/variable definitions, the language file itself needs to be saved with UTF-8 encoding so that all text string variables render properly in a browser.


Maybe I am missing something?

BTW, I love your site/forum design...





MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 08 May 2007 :  09:38:39
Well, perhaps Response.CodePage = 65001 and Response.CharSet = "utf-8" are only necessary when your IIS server is not set to do that by default ... ?

Thanks for the compliment.

TSAloha
Junior Member

USA
151 Posts

Posted - 08 May 2007 :  11:14:58
As far as I know (or believe I know), IIS supports Unicode, like other MS products that use UTF-16 encoding as the basis for their globalization... I assume this has more to do with the DB, Access in this case, as I have only tested with Access 2000 on IIS 5.1. MS extended its Unicode support with Access 2000, which can handle multilingual data. Perhaps it is not so much about text handling per se.

I am also assuming that the Response.CodePage=65001 and Response.Charset="utf-8" settings may have more to do with handling global variables in the internationalization features. One that comes to mind immediately is year/date/time handling. I believe part of the reason for setting index variables in the V4b03 language files for future use (datetime, for example) has to do with this. I haven't verified this yet, but I kind of remember that V4b04alpha was implementing international date/time handling in this fashion...??

Again I may be totally off....

To a Forum moderator: Not quite sure whether this Unicode support topic should be in this forum... so please move this to an appropriate forum if that makes sense.



Edited by - TSAloha on 09 May 2007 16:17:22

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 09 May 2007 :  18:50:36
Hi Taku,

I don't see a better fitting forum, for that matter...


Snitz 3.4 Readme | Like the support? Support Snitz too

TSAloha
Junior Member

USA
151 Posts

Posted - 13 May 2007 :  13:50:36
Rui, ok thanks.
The original post has been updated...
Thanks for your help.

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 18 May 2007 :  14:19:46
Good wrap up of all the UTF things you need to know. Great!!!

buringhan
Starting Member

3 Posts

Posted - 31 October 2007 :  01:19:50
Hi TSaloha,

Thanks a lot for your timely reply. I tried putting the following lines of code under the copyright statement in config.asp:
Codepage = 65001
Charset= "utf-8"
Session.LCID = 1033
Response.Buffer = true

This is the only file I modified in order to enable V3406 with UTF-8 (according to the message, I don't have to edit any other file), but after I made the changes the problem still existed: I still can't use Chinese to post a new topic or write content (question marks are displayed). The problem seems to lie in saving/retrieving Chinese characters to/from the database (I'm using MS SQL).

Any idea?

buringhan
Starting Member

3 Posts

Posted - 31 October 2007 :  01:32:12
Hi,

I forgot to post the URL for the forum that I'm working on. Here it is:
http://www.vital168.org/mswforum

TSAloha
Junior Member

USA
151 Posts

Posted - 31 October 2007 :  11:32:15
What Chinese encoding are you using?
Chinese Simplified - GB18030, GB2312 or HZ?
Chinese Traditional - Big5?

What Chinese input method are you using?

When you create a post, do you see Chinese characters in the Preview window before you submit the post?

I cannot view your post in any of the above encodings, nor in UTF-8; all I see is "?". Can you set up a guest account so that I can test posting in your forum?

I am running a V3406 Unicode forum with only the above change, and it can handle CJK (a local V3406Unicode with one mod, the Events Calendar Mod, which is installed as is without code changes).

Try this in your config.asp and see whether it makes a difference:
Session.Codepage = 65001
Response.Charset = "utf-8"
'Session.LCID = 1033 '## Do Not Edit
Response.Buffer = true

Is your SQL server configured to support Unicode? I am not that familiar with MySQL db so hopefully someone else here can help you on that for clarification. MarcelG has a unicode forum which I believe is running on MySQL, check: http://www.oxle.com/
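
On the database side, and assuming Microsoft SQL Server as mentioned in the question, two things are worth checking: the text columns should be Unicode types (nvarchar/ntext rather than varchar/text), and string literals in SQL statements need the N prefix. The sketch below is hypothetical: DEMO_POSTS and P_TEXT are placeholder names rather than the real Snitz schema, and strConnString is assumed to hold a valid connection string.

```vbscript
<%
' Hypothetical example: DEMO_POSTS / P_TEXT are placeholder names and
' strConnString is assumed to be defined elsewhere (e.g. in config.asp).
Dim conn, strText, strSql

strText = ChrW(&H4F60) & ChrW(&H597D)   ' two Chinese characters

Set conn = Server.CreateObject("ADODB.Connection")
conn.Open strConnString

' Without the N prefix, i.e. VALUES ('...'), SQL Server treats the literal
' as non-Unicode text and converts it to the database code page;
' characters that do not fit that code page are stored as "?".
' With the N prefix and an nvarchar/ntext column, the text stays Unicode.
strSql = "INSERT INTO DEMO_POSTS (P_TEXT) VALUES (N'" & _
         Replace(strText, "'", "''") & "')"
conn.Execute strSql

conn.Close
Set conn = Nothing
%>
```

If the forum code builds its SQL statements as strings without the N prefix, that alone could explain question marks appearing even when the page encoding is correct.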



Edited by - TSAloha on 31 October 2007 12:14:27

TSAloha
Junior Member

USA
151 Posts

Posted - 31 October 2007 :  17:32:38
It seems your forum's default encoding is set OK with UTF-8.
I noticed one of the posts in your Test Forums rendering Chinese OK....你好

你好吗?
If you set IE Page > Encoding > Unicode (UTF-8), you should be able to see the Chinese characters rendered OK here in this post.

As you seem to be able to create a post in Chinese:
Have you tried to create a new category in Chinese?
Have you tried to create a new Forum in Chinese?
Have you tried to add a new topic in Chinese?



Edited by - TSAloha on 31 October 2007 17:43:11

TSAloha
Junior Member

USA
151 Posts

Posted - 01 November 2007 :  13:42:21
Your updated mswforum looks very nice. Congrats!

Just curious:

Are you doing in-code translation of the language text strings in the released V3406, creating a localized single-language V3406 forum in Simplified Chinese?

If so, you can force the default LCID to 2052.
ASCII exists as a subset of the Chinese charsets, as it does in the Japanese ones, if I am not mistaken, so if you set the LCID to 2052 you should still be able to input English text. (Though this may not really affect the behavior of a Unicode forum, as it handles both, and others, OK.)
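
What the LCID mainly changes is locale-dependent formatting such as dates. The sketch below is a standalone test page (a hypothetical file, separate from the forum's config.asp) that formats the same date under both LCIDs. It assumes the server has the Chinese locale installed, and it restores the session's original LCID at the end.

```vbscript
<%
' Standalone test page (hypothetical); it does not modify config.asp.
Response.CodePage = 65001
Response.Charset = "utf-8"

Dim dtSample, lcidSaved
dtSample = DateSerial(2007, 11, 1)
lcidSaved = Session.LCID          ' remember the current setting

Session.LCID = 1033               ' English - United States
Response.Write "1033: " & FormatDateTime(dtSample, vbLongDate) & "<br>"

Session.LCID = 2052               ' Chinese (Simplified, PRC)
Response.Write "2052: " & FormatDateTime(dtSample, vbLongDate) & "<br>"

Session.LCID = lcidSaved          ' put the original LCID back
%>
```

Keep in mind that the stock config.asp marks its Session.LCID line as "Do Not Edit", so it is safer to try a change like this on a test copy first.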



Edited by - TSAloha on 01 November 2007 14:19:27

buringhan
Starting Member

3 Posts

Posted - 02 November 2007 :  00:00:24
Hi TSAloha,

I really appreciate your effort to reply. Want to check out my newly born forum, http://www.vital247.org/mswforum/? The interface is OK, but the content is wonderful.
Oh, I have a question for you. If I use V3406 and all I need to do to make it a Unicode forum is edit the config.asp file to set the codepage and charset, do I still have to edit the inc_header file to include the <meta.....charset=utf-8>? All I want to achieve is to allow users to post in both Simplified and Traditional Chinese, to see those posts, and to search them.

The new forum I use now is a V3404 version (because I didn't quite get V3406 working). In V3404, I only added the <meta.....charset=gb2312> to the inc_header file and then it worked. But I think Unicode is still a better option.

TSAloha
Junior Member

USA
151 Posts

Posted - 02 November 2007 :  00:45:44
You do not need to add a meta tag statement in inc_header, as it already contains a default strcharset string, which is fetched from config when the page is processed. As you have set the default charset = "utf-8", it should be OK... (I am assuming this is how you have it now.)

Yes, your members/users can post topics in both Simplified and Traditional Chinese in your Unicode forums.
(This is not quite possible if the charset is set to gb2312 only. It is a similar case for Japanese: there are 3 flavors of JA charsets, Shift-JIS, EUC, etc., and they are not quite compatible with each other.) You can also do a text search in any language: you can input English, Simplified Chinese, and/or Traditional Chinese and search for matching text. The same goes for member names, but I strongly recommend having your members use English alphanumeric characters for passwords.


I agree that you would be better off with a Unicode deployment for the purpose you mention. My local V3406Unicode forum is set up for pretty much the same purpose.

BTW, your mswforum is looking better and better... (It is also a good Mandarin Chinese refresher for me.) Good luck with your forums.

Edited by - TSAloha on 02 November 2007 00:48:31
Snitz Forums 2000 © 2000-2021 Snitz™ Communications
Powered By: Snitz Forums 2000 Version 3.4.07