Snitz Forums 2000
Snitz Forums 2000
Home | Profile | Register | Active Topics | Members | Search | FAQ
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 Help Groups for Snitz Forums 2000 Users
 Help: Database: MS SQL Server
 No UTF-8 support...sort of?!
 New Topic  Topic Locked
 Printer Friendly
Author Previous Topic Topic Next Topic  

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 28 January 2009 :  15:58:33  Show Profile  Visit MarcelG's Homepage
It's been little over a month since I upgrade oxle from MS Access to SQL Server 2000.
All seemed well, but today I tried to post the word #1070;#1087;#1080;#1090;#1077;#1088; (jupiter in Russian) and it came out ???????.

Strange thing however is that it did work:
http://oxle.com/topic/4009.html still shows the site seems to be able to succesfully retrieve the data from the database, so it's still there.
But, if I look in the database it does not show the contents of t_message for that topic......


I checked the MS knowledge base and all I found was this: http://support.microsoft.com/kb/232580

Is there something I should change in the way I connect to the SQL database, or should something be changed on the SQL server itself to enable this?<

portfolio - linkshrinker - oxle - twitter

Edited by - MarcelG on 28 January 2009 15:59:18

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 29 January 2009 :  05:04:11  Show Profile  Visit MarcelG's Homepage
BTW: I noticed in the SQL Enterprise Manager that the database uses SQL_Latin1_General_CP1_CI_AS .... is that the cause?
I also notice that here at Snitz UTF-8 is not supported at all ; is that just a fact, or the result of the decision not to support them?<

portfolio - linkshrinker - oxle - twitter

Edited by - MarcelG on 29 January 2009 07:26:34
Go to Top of Page

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 29 January 2009 :  05:23:33  Show Profile  Send ruirib a Yahoo! Message
I'm not sure about the reason for your problem. I would expect that direct insertion of the characters would work, but I don't have enough experience to state that was the cuase for your isse.

A collation in a SQL Server database is used for mailing for sort ordering purposes, so it does not affect what's stored in Unicode (it defines the code page for non-Unicode data). More info on collations: http://msdn.microsoft.com/en-us/library/aa174903(SQL.80).aspx

Snitz is an english speaking forum, so no need to support UTF-8. I see no other reason than that, though this is just an assumption, no one told me this.<


Snitz 3.4 Readme | Like the support? Support Snitz too
Go to Top of Page

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 29 January 2009 :  07:02:35  Show Profile  Visit MarcelG's Homepage
Mmmm, I'm reading in on the collation stuff, and I must say it's interesting reading material.
It's curious to see that Microsoft does not natively support UTF-8 as input.
Source: http://support.microsoft.com/kb/232580
quote:
UTF-8 is not valid character data to SQL Server

Perhaps the route I took with making Unicode available on oxle 2,5 years ago was completely wrong, but I'm a bit disappointed that there is no clear approach described anywhere on how to do this.
This disappointment is not about Snitz at all, but more in general ; it's a global internet, but making your applications work globally for all languages seems to be nearly impossible. I assumed UTF-8 was the way to go, but that doesn't seem true when you're using MS SQL.<

portfolio - linkshrinker - oxle - twitter
Go to Top of Page

HuwR
Forum Admin

United Kingdom
20584 Posts

Posted - 29 January 2009 :  07:35:45  Show Profile  Visit HuwR's Homepage
data should be stored in MS SQL as Unicode data, the character set (UTF-8) is not relevant for what is stored in the db, as long as you are storing unicode data you can store any language you want. There is nothing special about the setup here, yet it is possible to store any language in the message

the string you tried to post posts perfectly well here, and since you were posting hex values why do you think it is the db that issue, since the db should be storing the hex string not the actual characters (unless you pasted thecaharcters rather than the hex codes, so did you post the hex codes or actually cut and paste the chars from somewhere ?

<
Go to Top of Page

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 29 January 2009 :  07:48:24  Show Profile  Visit MarcelG's Homepage
I wasn't posting hex values when I posted the russian word for Jupiter ; I was posting actual russian characters, and they were shown here as the hex values only *after* the post.
Example: #1088;#1091;#1089;#1089;#1082;#1080;#1081;

Here's what the post window looks like:


In the preview here at Snitz it looks ok: (click for bigger version)


At oxle each character as a questionmark instead of a hex equivealent. That's a difference compared to here at Snitz.<

portfolio - linkshrinker - oxle - twitter

Edited by - MarcelG on 29 January 2009 07:49:22
Go to Top of Page

HuwR
Forum Admin

United Kingdom
20584 Posts

Posted - 29 January 2009 :  08:01:49  Show Profile  Visit HuwR's Homepage
my guess is that you need to adjust the queries so that they are infact correctly storing the data as unicode, that is what the .Net version is doing and that has no problems displaying or saving foreign character sets (it is using exactly the same database server as the standard asp version, the only difference is the use of N to prefix the unicode strings in a query<
Go to Top of Page

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 29 January 2009 :  08:08:43  Show Profile  Visit MarcelG's Homepage
Mmmm, I read something about that n prefix too. Is it feasible to enable that for the ASP version?<

portfolio - linkshrinker - oxle - twitter
Go to Top of Page

HuwR
Forum Admin

United Kingdom
20584 Posts

Posted - 29 January 2009 :  09:27:03  Show Profile  Visit HuwR's Homepage
yes, it is SQL not ASP, just change the queries so that wherever you update T_MESSAGE or R_MESSAGE etc so that it has N in front of the string to be inserted/update.

for example in post_info.asp the reply update query around line ~360 would change from

strSql = "UPDATE " & strActivePrefix & "REPLY "
strSql = strSql & " SET R_MESSAGE = '" & txtMessage & "'"

to

strSql = "UPDATE " & strActivePrefix & "REPLY "
strSql = strSql & " SET R_MESSAGE = N'" & txtMessage & "'"
<
Go to Top of Page

HuwR
Forum Admin

United Kingdom
20584 Posts

Posted - 29 January 2009 :  09:27:48  Show Profile  Visit HuwR's Homepage
you would do this for any string that needs to be unicode, ie message/subject etc<
Go to Top of Page

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 29 January 2009 :  09:41:36  Show Profile  Visit MarcelG's Homepage
Mmmm, that's a lot of search and replace, but I'll give it a try.
If I'm correct all the topic & reply related updates and inserts are done through post_info.asp.
The other updates are done in the profiles (registration and profile edits)...this sounds feasible. Now just find some time to do this.

-edit-
first test: http://oxle.com/topic/5707.html
It works!!! Thanks Huw!

-second edit-
It's done!! At least for forum (title/subject), topic (title, topic, reply).
I'm not going to do the text fields of the profile page, nor am I going to support unicode usernames (that's just creepy...)
I'll have to test some things, such as the split topic mod, but I think it's pretty easy now I know what I should change.<

portfolio - linkshrinker - oxle - twitter

Edited by - MarcelG on 29 January 2009 10:01:33
Go to Top of Page

HuwR
Forum Admin

United Kingdom
20584 Posts

Posted - 29 January 2009 :  09:59:04  Show Profile  Visit HuwR's Homepage
I wouldn't try it using search and replace anyway, it is not as many as you think, there are only a handful of instances that need changing if that (you only need to change where T_MESSAGE,R_MESSAGE or T_SUBJECT are being inserted or updated.)<
Go to Top of Page

HuwR
Forum Admin

United Kingdom
20584 Posts

Posted - 29 January 2009 :  09:59:58  Show Profile  Visit HuwR's Homepage
as you suggest, only do posts, forget about the profile stuff.<
Go to Top of Page

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 29 January 2009 :  10:04:40  Show Profile  Visit MarcelG's Homepage
Yup, there were not that many instances where I had to use the N' trick. Only 16 if I'm right.
Now I can freely post the real russian name of the lens I just bought. Thanks Huw!!!!<

portfolio - linkshrinker - oxle - twitter
Go to Top of Page
  Previous Topic Topic Next Topic  
 New Topic  Topic Locked
 Printer Friendly
Jump To:
Snitz Forums 2000 © 2000-2021 Snitz™ Communications Go To Top Of Page
This page was generated in 0.23 seconds. Powered By: Snitz Forums 2000 Version 3.4.07