Snitz Forums 2000 - international characters

Help Groups for Snitz Forums 2000 Users

Help: General / Classic ASP versions(v3.4.XX)

international characters

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 08 August 2006 : 06:30:29

I'm curious how I can solve the following 'issue'.
Whenever a user enters text using certain non-standard characters, they're shown as, for example, #305 instead of ı.
When I post this here at Snitz it seems to work correctly (e.g. the characters are shown in the preview), but the final post shows the broken Unicode string.
In ChkMessage the string &# is replaced by #.
Can I circumvent this issue?

I imagine that people who run international Snitz forums have found a workaround for this behaviour?

PS: I've circumvented this issue here at Snitz by posting &&#305 instead of &#305 or the actual character.
The &# part of the string is then replaced by just #, so the end result in the db is &#305, which translates to the correct character.
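
The double-ampersand workaround above can be modelled in a few lines. This is a Python sketch for illustration only, not the Snitz VBScript source; `strip_entity_prefix` is a hypothetical stand-in for the single replacement of &# by # described in the post.

```python
def strip_entity_prefix(message):
    """Hypothetical stand-in for the filter step: replace each "&#" with "#"."""
    return message.replace("&#", "#")


# Typed as a plain numeric reference: the ampersand is dropped,
# so the reference no longer decodes to a character.
plain = strip_entity_prefix("&#305;")

# Typed with an extra ampersand: "&&#305;" contains "&#" exactly once,
# so one pass leaves a valid reference behind.
doubled = strip_entity_prefix("&&#305;")

print(plain)    # #305;
print(doubled)  # &#305;
```

Because the replacement runs only once, the extra ampersand survives it, which is why the workaround succeeds.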

portfolio - linkshrinker - oxle - twitter

Edited by - MarcelG on 08 August 2006 06:34:55

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 08 August 2006 : 07:30:23

I don't know whether people who run international forums have those issues, since the characters are not inserted that way. Using a different LCID and a different page encoding avoids that, I suppose.

Snitz 3.4 Readme | Like the support? Support Snitz too

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 08 August 2006 : 08:08:57

Ah, so when the page encoding is set to some international character set, the form that's used to enter the text is submitted in that character set too, so different text is sent over the wire. OK, I see.

Mmm, I think SiSL has some Turkish Snitz forums, so I'll have a look at his pages to see if that solves it.

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 08 August 2006 : 09:03:25

Ok, let me know if you need any help.

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 08 August 2006 : 09:31:10

Well, I discovered that SiSL's forums work because he's using the Turkish character set.
I don't think that using that will solve my issues, as my site is not specifically intended for a Turkish audience.
That's why I'm using the character set definition iso-8859-1.
However, I might just need to use utf-8 instead.

The other thing I'm looking at is the replacement of &# by # in the ChkString function for the field type 'Message'.
I'm not sure, but as far as XSS vulnerabilities go, I think only a handful of characters need to be removed from a message to prevent XSS from happening.

< and > (these two certainly!)
', ", ; and : (possibly)

I even think that when the first 2 are excluded from the message, any HTML would be lost, as every tag requires a literal < and >, not &lt; or &gt;.
So, by preventing just these 6 characters from being posted as HTML entities in their hex or decimal (ASCII code) form, we'd be safe, and every other character would be allowed.

Of course, I might just be mistaken, and underestimating the power of XSS.
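
The idea above (block only the six listed characters when they arrive as numeric references, and let every other reference through) could look roughly like the following. This is a Python sketch for illustration, not Snitz code and not a vetted XSS filter; `filter_numeric_refs`, `BLOCKED` and `NUMERIC_REF` are hypothetical names, and literal < and > in the message would still need separate handling.

```python
import re

# The six characters singled out in the post above.
BLOCKED = set("<>'\";:")

# Decimal (&#60;) or hexadecimal (&#x3C;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?")


def filter_numeric_refs(message):
    """Neutralise references to blocked characters; keep all others."""

    def check(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        digits = (hex_digits or dec_digits).lstrip("0") or "0"
        if len(digits) > 7:
            # Far outside the Unicode range: treat as suspicious.
            return match.group(0)[1:]
        code = int(digits, 16 if hex_digits else 10)
        if code > 0x10FFFF or chr(code) in BLOCKED:
            # Drop the leading ampersand so the reference is not decoded.
            return match.group(0)[1:]
        return match.group(0)

    return NUMERIC_REF.sub(check, message)


print(filter_numeric_refs("&#305; &#60;b&#62; &#x3C;"))
# &#305; #60;b#62; #x3C;
```

Leading zeros are stripped before the check so that a padded form such as &#0060; is treated the same as &#60;.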

Edited by - MarcelG on 08 August 2006 09:35:51

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 08 August 2006 : 10:28:27

Yeah, utf-8 should work.

The replacements you're talking about are an alternative to use of utf-8?

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 08 August 2006 : 10:43:02

Well, I just tried UTF-8; this still won't work, as the characters are then still inserted as their HTML entities.
Besides that, UTF-8 seems to miss some characters that I require for my site (for example the € and » are shown as ? instead of the original character).

Therefore, I went back to the original cause of this issue, being the replacement of &# by #.
Even though this replacement may look 'safe' for preventing XSS attacks, I think the real security risk still exists, as anyone can still enter these codes in a Snitz board by adding the extra &.
For example: ı and Ļ

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 08 August 2006 : 11:16:53

What database are you using?

AnonJr
Moderator

United States
5768 Posts

Posted - 08 August 2006 : 11:35:50

quote:
Originally posted by ruirib

What database are you using?

Just for my own clarification, why would that matter? Is there some bit of database information I'm missing? (Maybe I don't want an answer to the second question since it's probably an essay answer...)

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 08 August 2006 : 13:05:45

I'd like to know what kind of Unicode support that database has. Don't think it would make much of a difference, but I don't think I ever saw the characters stored in the format described by Marcel.

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 08 August 2006 : 14:43:53

I'm using MS Access.
But, correct me if I'm wrong, as I see it this has little to do with the database format, since Snitz itself on SQL shows the same behaviour.
I'll use the double-ampersand hack to work around this, but here's the example:
> input : ı (shown here via double ampersand hack, but on the line below using the character itself)
> output : #305;
> input : ř (shown here via double ampersand hack, but on the line below using the character itself)
> output : #345;
So, somewhere down the line, the input post form is interpreted in some way, and transformed into the html entities. I haven't been able to pinpoint this, but seeing the ChkString function is active prior to db insertion, I wouldn't expect the db type to make the difference.
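
One likely explanation for the transformation described above is the browser itself: when a form is submitted in a legacy charset such as iso-8859-1, characters that the charset cannot represent are commonly sent as numeric character references. The Python sketch below models that with the standard `xmlcharrefreplace` error handler and then applies the &# to # replacement. It is an illustration under that assumption, not Snitz code; `browser_form_bytes` and `strip_entity_prefix` are hypothetical names.

```python
def browser_form_bytes(text, page_charset):
    """Model of form submission: characters the page charset cannot
    represent are sent as numeric character references."""
    return text.encode(page_charset, errors="xmlcharrefreplace")


def strip_entity_prefix(message):
    """Hypothetical stand-in for the filter step: replace "&#" with "#"."""
    return message.replace("&#", "#")


# Page served as iso-8859-1: neither character exists in that charset.
submitted = browser_form_bytes("ı ř", "iso-8859-1").decode("iso-8859-1")
stored = strip_entity_prefix(submitted)
print(submitted)  # &#305; &#345;
print(stored)     # #305; #345;

# Page served as utf-8, assuming the server also decodes the bytes as UTF-8:
# the characters travel as themselves and the filter has nothing to touch.
utf8_submitted = browser_form_bytes("ı ř", "utf-8").decode("utf-8")
print(strip_entity_prefix(utf8_submitted))  # ı ř
```

Under this model the conversion happens before the text reaches the forum code at all, which fits the observation that the database type makes no difference.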

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 08 August 2006 : 15:16:26

Yeah, I agree with that, even if I do find it weird that the conversion takes place...
Is that Access 2000?

MarcelG
Retired Support Moderator

Netherlands
2625 Posts

Posted - 08 August 2006 : 15:25:04

Yep, MS Access 2000.

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 08 August 2006 : 16:33:42

Snitz is not using UTF-8 encoding, but you said you did. Which language do you intend to support?

AnonJr
Moderator

United States
5768 Posts

Posted - 08 August 2006 : 16:38:57

quote:
Originally posted by ruirib

I'd like to know what kind of Unicode support that database has. Don't think it would make much of a difference, but I don't think I ever saw the characters stored in the format described by Marcel.

Ok. I checked and noticed the same behavior on some of my forums as well (all using Access). I didn't check all of them, but after getting the same results from the first three...

Don't know if this helps or makes it more confusing...

ruirib
Snitz Forums Admin

Portugal
26364 Posts

Posted - 08 August 2006 : 16:46:12

Well, I know it doesn't work like that. I have a web app that uses old Portuguese chars, in Unicode, of course. We define the codepage as 65001 and the charset as utf-8, and the chars are stored in the database just like any other characters, not as HTML-encoded chars.

Probably what Marcel missed was the use of the codepage value.

P.S: Of course, if the characters are inserted as HTML instead of directly from the keyboard, they will be stored as HTML...
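
The pairing of codepage 65001 with charset utf-8 described above matters because both sides must agree on how bytes map to characters. The Python sketch below is only a model of that agreement, not ASP code: it decodes the same UTF-8 bytes once with the matching encoding and once with a mismatched one (cp1252 is chosen here purely as an example of a Western default).

```python
text = "ı ř € »"

# What a browser would send from a page declared as utf-8.
wire = text.encode("utf-8")

# Server codepage agrees with the page charset: the text round-trips.
matched = wire.decode("utf-8")

# Server still assumes a Western single-byte codepage: every multi-byte
# character is split into several unrelated characters.
mismatched = wire.decode("cp1252", errors="replace")

print(matched == text)     # True
print(mismatched == text)  # False
print(len(text), len(mismatched))
```

In this model, setting the charset alone changes what the browser sends but not how the server reads it, which is one possible reason a charset-only switch to UTF-8 could still show wrong characters.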
