Author |
Topic  |
MarcelG
Retired Support Moderator
    
Netherlands
2625 Posts |
Posted - 08 August 2006 : 06:30:29
|
I'm curious how I can solve the following 'issue'. Whenever a user enters text containing non-standard characters, they are shown as, for example, #305; instead of ı. When I post this here at Snitz it seems to work correctly (e.g. the characters are shown in the preview), but the final post shows the broken unicode string. In ChkMessage the string &# is replaced by #. Can I circumvent this issue?
I imagine that people who run international Snitz forums have found a workaround for this behaviour?
PS: I've circumvented this issue here at Snitz by posting &&#305; instead of &#305; or the actual character. The &# part of the string is then replaced by just the #, so the end result in the db is &#305;, which translates to the correct character. |
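The double-ampersand workaround can be sketched in a few lines. This is a minimal Python illustration, assuming the filter does nothing more than the plain &# to # replacement described above; `strip_entity_prefix` is a hypothetical stand-in, not the actual Snitz code.

```python
import html

def strip_entity_prefix(message: str) -> str:
    """Hypothetical stand-in for the replacement described in the post."""
    return message.replace("&#", "#")

# A normal numeric entity loses its ampersand and is stored broken.
broken = strip_entity_prefix("&#305;")

# With an extra ampersand, the single "&#" match still leaves an ampersand behind.
stored = strip_entity_prefix("&&#305;")

# A browser would render the stored entity as the dotless i.
rendered = html.unescape(stored)
```

Because the replacement runs only once over the text, the extra ampersand survives and the entity is rebuilt, which is why the filter by itself does not keep numeric entities out of the database.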
portfolio - linkshrinker - oxle - twitter |
Edited by - MarcelG on 08 August 2006 06:34:55 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
Posted - 08 August 2006 : 07:30:23
|
I don't know whether people who run international forums have those issues, since the characters are not inserted that way. Using a different LCID and a different page encoding avoids that, I suppose. |
Snitz 3.4 Readme | Like the support? Support Snitz too |
 |
|
MarcelG
Retired Support Moderator
    
Netherlands
2625 Posts |
Posted - 08 August 2006 : 08:08:57
|
Ah, so when the page encoding is some international character set, the form that's used to enter the text also uses that character set, resulting in different text being sent over the line. OK, I see.
Mmm, I think SiSL has some Turkish Snitz forums, so I'll have a look at his pages to see if that solves it. |
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
|
MarcelG
Retired Support Moderator
    
Netherlands
2625 Posts |
Posted - 08 August 2006 : 09:31:10
|
Well, I discovered that SiSL's forums work because he's using the Turkish character set. I don't think that will solve my issue, as my site is not specifically intended for a Turkish audience. That's why I'm using the character set definition iso-8859-1. However, I might just need to use utf-8 instead.
The other thing I'm looking at is the replacement of &# by # in the ChkString function for the field type 'Message'. I'm not sure, but when looking at XSS vulnerabilities, there are only a couple of characters that need to be removed from a message, to prevent the XSS from happening.
- < and > (these two certainly!)
- ', ", ; and : (possibly)
I even think that when the first 2 are excluded from the message, any HTML would be lost, as every tag requires a literal < and >, instead of a &lt; or &gt;. So, by preventing just these 6 characters from being posted as HTML entities in their hex or decimal form, we'd be safe, and all other characters would be allowed.
Of course, I might just be mistaken, and underestimating the power of XSS. |
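The idea of blocking only the dangerous characters could look roughly like this. It is a minimal Python sketch, assuming the six characters listed above are the whole blocklist; `RISKY` and `neutralize_risky_entities` are hypothetical names, and this is an illustration of the idea, not a vetted XSS filter.

```python
import re

# Assumption: the six characters named in the post are the whole blocklist.
RISKY = set("<>'\";:")

# Decimal (&#60;) or hex (&#x3c;) numeric entities; the trailing semicolon is
# optional because browsers tolerate its absence.
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?")

def neutralize_risky_entities(message: str) -> str:
    """Drop the ampersand only from entities that decode to a risky character."""
    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code <= 0x10FFFF and chr(code) in RISKY:
            return match.group(0)[1:]  # "&#60;" becomes "#60;"
        return match.group(0)          # harmless entity passes through
    return NUMERIC_ENTITY.sub(repl, message)
```

With this approach an entity such as &#305; is left alone, while &#60; or &#x3C; lose their ampersand. Unlike a plain string replacement, the check is applied to each decoded entity, so it depends on the character being encoded and not on how the entity happens to be written.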
Edited by - MarcelG on 08 August 2006 09:35:51 |
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
|
MarcelG
Retired Support Moderator
    
Netherlands
2625 Posts |
Posted - 08 August 2006 : 10:43:02
|
Well, I just tried UTF-8; this still won't work, as the characters are then still inserted as their HTML entities. Besides that, with UTF-8 some characters that I need for my site seem to go missing (for example, € and » are shown as ? instead of the original character).
Therefore, I went back to the original cause of this issue, being the replacement of &# by #. Even though this replacement may look 'safe' for preventing XSS attacks, I think the real security risk still exists, as anyone can still enter these codes in a Snitz board by adding the extra &. For example: ı and Ļ (posted as &&#305; and &&#315;) |
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
|
AnonJr
Moderator
    
United States
5768 Posts |
Posted - 08 August 2006 : 11:35:50
|
quote: Originally posted by ruirib
What database are you using?
Just for my own clarification, why would that matter? Is there some bit of database information I'm missing? (Maybe I don't want an answer to the second question since it's probably an essay answer... ) |
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
Posted - 08 August 2006 : 13:05:45
|
I'd like to know what kind of Unicode support that database has. I don't think it would make much of a difference, but I don't think I ever saw the characters stored in the format described by Marcel. |
 |
|
MarcelG
Retired Support Moderator
    
Netherlands
2625 Posts |
Posted - 08 August 2006 : 14:43:53
|
I'm using MS Access. But, correct me if I'm wrong, as I see this, it wouldn't have much to do with the database format, as Snitz itself on SQL has the same situation. I'll use the double-ampersand hack to work around this, but here's the example:
> input : ı (shown here via double ampersand hack, but on the line below using the character itself)
> output : #305;
> input : ř (shown here via double ampersand hack, but on the line below using the character itself)
> output : #345;
So, somewhere down the line, the input post form is interpreted in some way, and transformed into the html entities. I haven't been able to pinpoint this, but seeing the ChkString function is active prior to db insertion, I wouldn't expect the db type to make the difference. |
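One likely explanation for the transformation is that a browser submitting a form from a page whose character set cannot represent a character sends that character as a numeric character reference. The Python sketch below imitates this with the `xmlcharrefreplace` error handler, assuming an iso-8859-1 page; `submit_as` is a hypothetical helper, and the final replace is only a stand-in for the &# filter, not Snitz code.

```python
def submit_as(text: str, charset: str) -> str:
    """Imitate a form submission: unencodable characters become &#NNN;."""
    raw = text.encode(charset, errors="xmlcharrefreplace")
    return raw.decode(charset)

# On an iso-8859-1 page neither character fits, so both arrive as entities.
sent = submit_as("ı and ř", "iso-8859-1")

# The &# replacement then strips the ampersands before the db insert.
stored = sent.replace("&#", "#")

# On a utf-8 page the same characters can be sent as they are.
utf8_sent = submit_as("ı and ř", "utf-8")
```

Under these assumptions the entities are produced before the text ever reaches the server, which would be consistent with the database type making no difference.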
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
|
MarcelG
Retired Support Moderator
    
Netherlands
2625 Posts |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
|
AnonJr
Moderator
    
United States
5768 Posts |
Posted - 08 August 2006 : 16:38:57
|
quote: Originally posted by ruirib
I'd like to know what kind of Unicode support that database has. I don't think it would make much of a difference, but I don't think I ever saw the characters stored in the format described by Marcel.
Ok. I checked and noticed the same behavior on some of my forums as well (all using Access). I didn't check all of them, but after getting the same results from the first three... 
Don't know if this helps or makes it more confusing... |
 |
|
ruirib
Snitz Forums Admin
    
Portugal
26364 Posts |
Posted - 08 August 2006 : 16:46:12
|
Well, I know it doesn't work like that. I have a web app that uses old Portuguese chars, in Unicode, of course. We define the codepage as 65001 and the charset as utf-8, and the chars are stored in the database just like any other characters, not as HTML-encoded chars.
Probably what Marcel missed was the use of the codepage value.
P.S: Of course, if the characters are inserted as HTML instead of directly from the keyboard, they will be stored as HTML... |
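Why the codepage and the charset need to agree can be illustrated outside ASP. This is a minimal Python sketch, assuming the codepage decides which bytes the server writes and the charset is what the browser is told to expect (codepage 65001 is the Windows identifier for UTF-8); `round_trip` is a hypothetical helper.

```python
def round_trip(text: str, server_codec: str, declared_charset: str) -> str:
    """Encode as the server would, then decode as the browser was told to."""
    return text.encode(server_codec, errors="replace").decode(
        declared_charset, errors="replace"
    )

# Codepage and charset agree: every character survives.
matched = round_trip("€ » ı", "utf-8", "utf-8")

# Charset says utf-8 but the bytes come from a Windows-1252 codepage:
# the characters are damaged on the way to the screen.
mismatched = round_trip("€ » ı", "cp1252", "utf-8")
```

A mismatch of this kind might also be why € and » showed up as ? after only the charset was switched to utf-8, though that is a guess and not something confirmed in this thread.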
 |
|