Author |
Topic |
|
_barbara
Junior Member
Germany
123 Posts |
Posted - 16 April 2002 : 19:10:15
|
Hello, first of all thanks to bozden and all of you dev guys for your great efforts to internationalize the Snitz forum. For a few days now I have been running Snitz 4 beta for an international group of people (most of them German, but a few Russian and English speakers too), which gives me the chance to offer them a language interface of their own choice. So I installed three language files: German (as default), English, and Russian (I use the Russian language file translated by stratege, which is mentioned at http://forum.snitz.com/forum/topic.asp?TOPIC_ID=16035).
Now, using the forum, I ran into the following problem: messages, titles etc. containing German special characters (ä, ö, ü, ß) are displayed correctly when they are viewed with the German language interface. With the Russian interface, however, those characters are not displayed correctly: they are replaced by Russian characters. This is annoying because most (if not all) of the messages are or will be written in German, which should also help the Russian members of our group to practice German.
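A minimal sketch of why this substitution happens, assuming the German text is stored as single bytes in a Western code page such as windows-1252 and the Russian interface labels the page as windows-1251 (the Western code page is an assumption; the thread only names windows-1251). It is written in Python purely for illustration, and the function name is hypothetical:

```python
def misread(text, stored_as="cp1252", shown_as="cp1251"):
    """Return what a browser shows when bytes written in one single-byte
    code page are interpreted under another one."""
    return text.encode(stored_as).decode(shown_as)


# Each German special character occupies one byte above 0x7F, and that
# same byte value is assigned to a Cyrillic letter in windows-1251.
for ch in "äöüß":
    print(ch, "->", misread(ch))
```

The bytes themselves never change; only the charset label decides which letters they turn into.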
To get a quick solution, I added a few lines to the chkString function in inc_functions.asp (starting at line 383):
function chkString(pString,fField_Type) '## Types - name, password, title, message, url, urlpath, email, number, list
	fString = trim(pString)
	'----------------------------------
	'new - replace german special characters with their HTML entities
	fString = Replace(fString, "ä", "&auml;")
	fString = Replace(fString, "ö", "&ouml;")
	fString = Replace(fString, "ü", "&uuml;")
	fString = Replace(fString, "ß", "&szlig;")
	'----------------------------------
This works fine. But I think there should be a better solution for this problem. Non-Western characters like Cyrillic can't simply be converted like this, so it seems to me to be a more general problem, namely displaying mixed-language content correctly. Do you have any idea how to solve it?
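For what it's worth, HTML numeric character references exist for every Unicode character, Cyrillic included, so the escaping idea can in principle be generalised beyond four hand-picked letters. A minimal Python sketch of that idea (not Snitz code; the function name is hypothetical, and it assumes the text is already available as Unicode):

```python
def to_char_refs(text):
    """Replace every non-ASCII character with an HTML numeric character
    reference (&#NNN;), leaving plain ASCII untouched."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_char_refs("Grüße"))
print(to_char_refs("Привет"))
```

Note that this does not escape `&` or `<`, and it makes stored messages harder to search and edit, which is one reason a single page encoding is usually the cleaner route.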
Barbara
Edited by - _barbara on 16 April 2002 19:14:34
Edited by - _barbara on 16 April 2002 19:19:21
|
Deleted
deleted
4116 Posts |
|
n/a
deleted
593 Posts |
Posted - 25 April 2002 : 16:56:23
|
quote:
The ultimate solution for this would be UTF-8. It's not fully problem-free though.
Think Pink ==> Start Internationalization Here
A little note on this subject:
To set up a multilingual forum in Unicode, perhaps you can try the following:
(1) First, change LANGxxxx.asp for Unicode (UTF-8). The commented-out LANGUAGE STRINGS DEFINITION FILE header has no effect, but it can be updated to note the Unicode settings (example for LANG1041 below).
(2) Second, change inc_top.asp to include an HTML meta tag. (inc_top.asp is included in most xxxx.asp files, so this sets the page to Unicode mode.)
------
(1) ----------Langxxx.asp
Lang1041.asp example -
' LANGUAGE STRINGS DEFINITION FILE
' Japanese - LCID=1041
' CodePage= 65001 (using UTF-8)
' For More Information Please Read the LangReadMe.txt file
' Language Data Dated: 28.02.2002 (UK 24h Format)
LN47
arrLang (intLangIndexCount,2) = "utf-8" ' HTML Content-Type charset definition to be used in <HEAD> tag
(2)-----------inc_top.asp
-immediately below copyright statement...
%>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<%
'### start of timer code ------
There may be some locale-specific character sets that need to be hardcoded in Unicode, but try the above and see how it works (assuming that you have V4.03 and the patches downloaded and installed). If the locale language description at LN46 of the LANGxxxx.asp file is written in the local language (for example, German as "Deutsch"), the language title will appear in that language in the language selector pull-down box.
This is the approach I used for a UTF-8 encoded forum (in my case tested with English, German, French, Italian, Spanish, Chinese and Japanese, and it worked OK).
Of course, all the languages you support have to be included in config.asp; also, as I understand it, the default maximum number of languages is set to 6.
Additionally, when you have a forum in Unicode with multiple languages, the current version does not separate forums/topics/messages according to the language selection, meaning you will see all (public) posts in all languages. So, if you want language-specific forums, perhaps the best way is to set up language group forums, depending on how you want to organize your forum. One way I did it was to set up a forum per language, and then set up different topics inside that language forum. (This can be done the other way around.) It can be managed by using members-only forums and assigning a moderator to each language group, etc.
Hope this helps a little.
|
|
_barbara
Junior Member
Germany
123 Posts |
Posted - 30 April 2002 : 14:58:20
|
Thanks a lot for your hints, leorat. I've been playing around a little bit with the charset definition in the langxxxx.asp files. (There seems to be no need to change inc_top.asp, because this file generates the meta tag specifying the charset from the langxxxx.asp file in use.) Setting the charset to Unicode in the langxxxx.asp files as you suggest,
arrLang (intLangIndexCount,2) = "utf-8"
seems to work fine - provided this is done in all the language files I want to use. Now my problem is that the Russian language file I use apparently uses the Windows charset:
arrLang (intLangIndexCount,2) = "windows-1251"
Do you have any idea how to re-encode this file in order to get Unicode? Just changing the charset to "utf-8" gives strange results...
Of course you are right, just having a multilingual forum installed does not separate the messages by language. Depending on the purpose, creating language group forums may indeed be a good solution. But for my forum that's not necessary, I think. What I would like is just that, e.g., the Russian users of my forum can use their Russian language interface and still read and post messages in Russian *or* German (or English), and that such mixed-language forum content is displayed correctly in whatever language interface they use...
Barbara
|
|
Deleted
deleted
4116 Posts |
Posted - 30 April 2002 : 15:19:34
|
quote:
Now my problem is that the russian language file I use apparently uses the windows charset:
arrLang (intLangIndexCount,2) = "windows-1251"
Do you have any idea how to re-encode this file in order to get unicode? Just changing the charset to "utf-8" gives strange results...
This is one of the major problems you face when working with UTF. You must have the right configuration of database, operating system, installed languages on OS, editor and editor font, to be able to edit the language file.
Best editor I found so far when working on multiple language files is the Notepad from W2K... In some cases, the initial editor (the editor used in creating the language file) is also important.
Think Pink ==> Start Internationalization Here
|
|
n/a
deleted
593 Posts |
Posted - 30 April 2002 : 21:23:15
|
Hello,
I have created a utf-8 format lang1049, and posted it at my site for downloading: Please check Snitz Xchange for Lang1049 link: http://www26.brinkster.com/i2asia/forum/
I basically used the same approach that I used for creating utf-8 format langxxx.asp files when setting up a UTF-8 based forum. I know that converting locale-specific charsets into Unicode sometimes needs some hardcoding (I don't know about Russian, but I assume there are some unique/special chars and some Unicode-specific issues, as Bozden mentioned...).
I visited the demo site and verified the language encoding type - Cyrillic (Windows) or Windows-1251 - and looked at Lang1049.asp in IE after converting it into utf-8, and it looks fine (at least from my quick glance - no guarantee, as I have no idea about Russian). So, I assume this one should work fine... give it a try.
Here is how I did it, and hope this may be of some help for you to consider other languages for utf-8/unicode implementation.
1. Downloaded the source language pack, Lang1049.
2. Opened Lang1049.asp in Internet Explorer (IE6.x)
3. File>Save As>, which pops up the Save Web Page window, with: File name: [ lang1049_asp ] Save as type: [ Web Page, complete (*.htm;*.html) ] Encoding: [ xxxxxxxx ] where [ ] means a pull-down box and xxxxx means whatever encoding you are using in the browser window (mine popped up with Shift-JIS, as that was the last encoding I used). Here, leave File name and Save as type untouched, but change Encoding: to Unicode (UTF-8). Choose a location to save in (I normally use the Desktop for further work).
4. Open "Lang1049_asp.html" on Desktop with Internet Explorer. When you right-click on this window for Lang1049_asp.htm, and check "encoding" it should show "UNICODE(UTF-8)"
5. Open Source/View Source of this file (here comes NotePad2000 or 2002). Do some clean-up work here. Some junk is added to the file by the HTML save. You will most likely see some HTML headers/footers - at the top
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=utf-8">
<META content="MSHTML 6.00.2715.400" name=GENERATOR></HEAD>
<BODY><PRE>
and at the bottom
</PRE></BODY></HTML>
Remove these. (I also got some corrupted versions of <, >, 'amp;' and blank spaces, which were not converted properly from the source lang file, perhaps because of my editor? But it was the same in Notepad2002.) These were replaced/changed.
Next, to set Lang1049 for utf-8, edit
arrLang (intLangIndexCount,2) = "windows-1251" ' HTML Content-Type charset definition to be used in <HEAD> tag
to change "windows-1251" to "utf-8" (now ready to be saved as Lang1049.asp file with utf-8 encoding format.)
6. When the file is cleaned, save it as LANG1049.asp. Save As should show the same kind of information as the Save Web Page window before: File name: [ lang1049_asp ] Save as type: [ *.htm;*.html,*.asp;*.shtml;*.shtm)] Code Page: [UTF-8] (and perhaps a checked "Add a Unicode Signature (BOM)" option below the code page). Change File name to lang1049.asp and SAVE. You now have a clean utf-8 format lang1049.
7. Open lang1049.asp with Internet Explorer. Right click to check encoding, and you should see UTF-8, and should be able to view the langstrings in Russian!!!!
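The manual steps above can also be sketched as a small script. This is a Python sketch under stated assumptions, not part of Snitz: the function name and the `add_bom` parameter are hypothetical, and it assumes the source file really is windows-1251 and declares its charset with the exact string "windows-1251".

```python
def reencode_lang_bytes(data, source_encoding="cp1251", add_bom=False):
    """Re-encode the raw bytes of a language file as UTF-8 and update
    the charset string it declares for the HTML <HEAD> tag."""
    # Interpret the bytes in the encoding the file was written in.
    text = data.decode(source_encoding)
    # Point the declared charset at utf-8 (same edit as in step 5).
    text = text.replace('"windows-1251"', '"utf-8"')
    # "utf-8-sig" prepends the Unicode signature (BOM); "utf-8" does not.
    return text.encode("utf-8-sig" if add_bom else "utf-8")


# Hypothetical usage: read the original file as bytes, write the result.
# with open("lang1049.asp", "rb") as src:
#     converted = reencode_lang_bytes(src.read())
# with open("lang1049_utf8.asp", "wb") as dst:
#     dst.write(converted)
```

Working on bytes rather than on text-mode file handles leaves the line endings of the file untouched.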
--- As for installing this to your forum. Simply upload this lang1049 to your /forum/ directory. Remember to set arrLang (intLangIndexCount,2) to "UTF-8" in all langxxxx.asp (including default Lang1033.asp).
You are right: No need to add %> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <%
in inc_top.asp (Thank you and thanks Bozden on this!)
---- That's all.
Give it a try with a utf-8 format lang1049 prepared... Hope it works for you.
LR
Edited by - leorat on 30 April 2002 21:25:32
Edited by - leorat on 30 April 2002 22:50:27
|
|
_barbara
Junior Member
Germany
123 Posts |
Posted - 01 May 2002 : 05:29:55
|
I had just found exactly the same solution, i.e. using IE for the utf-8 conversion. It works perfectly, also with IE 5.5.
What makes things even easier is to rename the langxxxx.asp file to langxxxx.txt before opening it in IE, and then save as..., encoding utf-8, and as file type: text file, so you don't get any html tags inserted into the file.
Barbara
|
|
n/a
deleted
593 Posts |
Posted - 01 May 2002 : 08:01:07
|
quote:
I had just found out exactly the same solution , i.e. using IE for utf-8 conversion, it works perfectly, also with IE 5.5.
What makes things even easier is to rename the langxxxx.asp file to langxxxx.txt before opening it in IE, and then save as..., encoding utf-8, and as file type: text file, so you don't get any html tags inserted into the file.
Barbara
Great!!!!! I just tested it on my site...but now I guess I can remove the link and also lang1049 from the forum....
Thanks for file naming...makes lots of sense. (Wasn't looking at a right detail...). This makes life much easier.
Leorat
|
|
RusselHarvey
Starting Member
USA
24 Posts |
Posted - 03 May 2002 : 22:31:41
|
It becomes quite clear that even lang1033 'English-US' should be coded as 'utf-8', with the file saved in Notepad in 'utf-8' format.
When the forum has more than just 'English-US', this setting decides which encoding IE chooses. If it is not utf-8, then all the other languages that you may have listed in the 'language selection' drop-down list will not display correctly.
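A quick way to check that requirement could look like the following Python sketch. It is only an illustration: the function names are hypothetical, and it assumes every language file declares its charset on a line of the form `arrLang (intLangIndexCount,2) = "..."` as quoted earlier in this thread.

```python
import re

# Matches the charset declaration line quoted in this thread.
CHARSET_LINE = re.compile(
    r'arrLang\s*\(\s*intLangIndexCount\s*,\s*2\s*\)\s*=\s*"([^"]*)"',
    re.IGNORECASE,
)


def declared_charset(source):
    """Return the charset a language file declares, lower-cased,
    or None when no declaration line is found."""
    match = CHARSET_LINE.search(source)
    return match.group(1).lower() if match else None


def files_not_utf8(sources):
    """Given {file name: file text}, list the files whose declared
    charset is missing or is anything other than utf-8."""
    return sorted(
        name for name, text in sources.items()
        if declared_charset(text) != "utf-8"
    )
```

This only inspects the declared charset; it does not prove that the file itself was saved in UTF-8.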
|
|
n/a
deleted
593 Posts |
Posted - 03 May 2002 : 23:37:56
|
You are correct. That's why utf-8/Unicode should be used as the base encoding scheme for a multilingual forum.
It will probably require a new mod to make the language names in the language selection box depend on the selected language. For example, if you choose English-US (default), all languages in the selector would show up with their English names, and if you choose Simplified Chinese, all languages would show up with their Simplified Chinese names, etc.
If you want to see topics/messages posted in various languages, currently the best way is to use utf-8 encoding. Otherwise, you have to refresh your browser to see a particular language rendered properly. (I haven't used Netscape for some time, but I know that browsers also differed in how they implemented encodings, font sizing and so on, so there is a bit more to this.)
LR
|
|