TOPIC REVIEW
AnonJr
Posted - 11 January 2009 : 13:21:17 As I was waiting for the Windows 7 Beta ISO to finish downloading, my mind was wandering over to the URL issues we've discussed - from long URLs stretching the posts, to weird issues with multiple www's in them, and other URL oddities.
I was thinking that maybe it would be better to add an interface to two or three of the more popular link-shrinking sites like TinyURL, etc., and just run all links through whichever service the Admin chose.
And, as an alternative to relying on a service that might disappear, we could build an internal service to do something similar. If Marcel were willing to share his code, it could be the basis for said service.
Oh, and when I say service, I don't necessarily intend it to be accessible outside the forum software.
And my last random thought was to convert all URLs to [link] if no link text was specified. (This may need some more explanation once I've made another pot of coffee and thought it out more thoroughly.)
This was just a random idea I had and I wanted to write it down somewhere to make it harder to forget. I haven't hit 30 yet and I'm balding, seeing gray in my beard, and forgetting stuff all the time... I think.
15 LATEST REPLIES (Newest First)
HuwR
Posted - 13 January 2009 : 14:47:48 Having it between tags does not make it any more valid; I didn't say that it did. But when it is between tags you do not have to bother pattern-matching random text just to see if there may or may not be a valid URL embedded in it somewhere. That is what causes the issues, because it is not possible to easily grab every possibility, especially where punctuation is concerned: how do you know whether a dot is part of the URL or denotes the end of a sentence where they just forgot the space? At least if you insist on URL tags, you know 100% for sure where a URL starts and ends.
SiSL
Posted - 13 January 2009 : 13:01:03 The examples are all based on finding whether a URL is working / well-written or not... That's not the main point, actually. It would give the very same error if a user wrapped a wrong URL in URL tags; it still would not be valid... So I really don't understand the necessity of checking whether a URL is right or wrong. If it matches certain patterns, such as having a protocol etc., you wrap it automatically until the line ends with a break or a space; if not, you leave it as it is... Add [url]/[/url] at the start and end during posting, and never check it with edit_hrefs afterwards. The current script also replaces between url tags; that's where it goes wrong, that's all. For example:
not-a-valid-url
[url]forum.snitz.com[/url]
[url]forum.snitz.com[/url] goes to http://forum.snitz.com/forum/forum.snitz.com, which is not a valid URL at all. See that? Auto-URL recognition would leave this text as it is. Same thing... Tell me how putting it between url tags makes it any more of a valid URL than auto-recognition; it would mess things up even more for your visitors...
What's wrong with the current script is that it is a global replace. A case-by-case inspection during posting, done just once, would solve all your problems with auto-recognition. Such as:
If a URL starts with http:// (etc.), don't stop until a line break, a space, a [ or a < (or whichever characters you want it to stop at), and wrap it between url tags.
This way you would simply stop double redirections on the same line, just by removing the edit_hrefs check afterwards... So it does not replace globally but handles each case one by one...
So this will not happen: http://www.chip.com.tr/urltest.asp?http://www.level.com.tr
But it would act like:
[url]http://www.chip.com.tr/urltest.asp?http://www.level.com.tr[/url]
By removing edit_hrefs, it would parse URL as it should...
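Something like this rough VBScript sketch of the single-pass wrap (illustrative only, not the actual Snitz code; the pattern and the leading-whitespace guard are my own assumptions):
[code]
' Illustrative only: wrap bare http/https URLs in [url] tags in a
' single pass at posting time. A URL already inside [url] tags is
' preceded by "]", so requiring start-of-line or whitespace before
' the match skips it (VBScript's RegExp has no lookbehind).
Function WrapUrls(strText)
    Dim re
    Set re = New RegExp
    re.Global = True
    re.IgnoreCase = True
    re.MultiLine = True
    re.Pattern = "(^|\s)(https?://[^\s\[<]+)"
    WrapUrls = re.Replace(strText, "$1[url]$2[/url]")
End Function
[/code]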
PS: That's my final note on the subject. I'll ask you to come over and test as much as you can when I'm done with it, for any cases you can find. But please, 3.4.07 soon! :)
Podge
Posted - 13 January 2009 : 12:51:16 For starters, the regexp would have to contain:
1. All protocols - http:// ftp:// news:// etc.
2. All TLDs - .ad .tv .com .net .org .ie .info, etc.
3. Some way of validating the text in between 1. and 2., which could include anything from valid URLs to IP addresses.
There are 106 regexp results for url here - http://regexlib.com/Search.aspx?k=url&c=-1&m=-1&ps=20
There are lots of different ways to do it, but none of them are perfect. It's a lot easier for the user to specify what they want as a URL by using url tags, and then treat that text as a URL, rather than have the forum software try to guess what the user wants as a URL.
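To make that concrete, here is a deliberately naive sketch of just those three parts in VBScript (the TLD list is a tiny subset; a real pattern balloons very quickly):
[code]
' Naive illustration only: 1. protocol, 2. host text, 3. a handful of TLDs.
' Nowhere near RFC-complete.
Dim reUrl
Set reUrl = New RegExp
reUrl.IgnoreCase = True
reUrl.Pattern = "^(http|https|ftp|news)://([a-z0-9\.\-]+)\.(com|net|org|ie|info|tv|ad)(/\S*)?$"
Response.Write reUrl.Test("http://forum.snitz.com") ' True
Response.Write reUrl.Test("forum.snitz.com")        ' False - no protocol
[/code]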
Shaggy made an important point when he said:
quote: you wouldn't expect a link written in HTML to be active without wrapping it in an anchor tag
SiSL
Posted - 13 January 2009 : 12:27:12 quote: Originally posted by HuwR
it has nothing to do with being lazy. Have you actually got any idea how large the regexp is that is required to parse URLs correctly? If you can't do something correctly then IMHO you shouldn't do it at all. So no, we can't do url parsing; if we could, why would we keep repeating this discussion over and over again?
Wrapping long text is an entirely different issue and is extremely easy to fix using a very basic regexp; URLs are not.
I'm just trying to figure out how hard it can be. Nope, I'm not here for an argument; I just think you might be exaggerating about the regexp. Besides, when it is done during posting, multiple regexp passes would ease it. That's all I'm thinking at this moment...
If you have any hard-to-parse URLs as examples (most likely in a txt file, since they might be a problem in the current code), I'd be happy to have them and run my tests, which would help me greatly with my new forum design as well.
HuwR
Posted - 13 January 2009 : 12:15:01 it has nothing to do with being lazy. Have you actually got any idea how large the regexp is that is required to parse URLs correctly? If you can't do something correctly then IMHO you shouldn't do it at all. So no, we can't do url parsing; if we could, why would we keep repeating this discussion over and over again?
Wrapping long text is an entirely different issue and is extremely easy to fix using a very basic regexp; URLs are not.
SiSL
Posted - 13 January 2009 : 11:44:01 quote: Originally posted by HuwR
What I am trying to point out is that if you automatically parse valid URLs in the message, then why do we have URL tags? Is not the purpose of the tags to tell you it is a URL?
I humbly disagree on the subject. We (designers, webmasters) use BBCode URL tags to help *us* parse URLs and produce HTML links easily, but that forgets the main thing: our primary duty is to make visitors' lives easier, not just ours.
We cannot quote for users, we cannot add bold tags for them, but heck, we sure can do URL parsing, and in a way that leaves URLs unparsed if they don't fit our expectations and turns them into links if they do.
Let's just not get lazy or "let go" of things. Come on, people, where is your developer spirit to search for the better options? ;)
Here is something to start with as a basic solution for any long text: the &shy; (soft hyphen) character. It basically lets the browser wrap the text and adds a - where it is wrapped...
ThatsalongtextthatdoesnothaveanyspacesinitsothatitisnotwrappedbyanyborderssowecanuseourshycharactersitissousefulwhenrenderedsuccessfullyforexamplehowevernowonderIneedtowritesomemoretexttofitmyscreenat1680x1050resolutiontogivebestresults.
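A hypothetical helper along those lines (my own sketch, not forum code) could insert the entity every so many characters:
[code]
' Hypothetical sketch: insert a soft hyphen (&shy;) every nEvery
' characters so the browser can break an otherwise unbreakable run.
' Assumes the chunk contains no markup the entity could split.
Function SoftWrap(strText, nEvery)
    Dim i, strOut
    strOut = ""
    For i = 1 To Len(strText) Step nEvery
        strOut = strOut & Mid(strText, i, nEvery) & "&shy;"
    Next
    SoftWrap = Left(strOut, Len(strOut) - 5) ' drop the trailing entity
End Function
[/code]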
MarcelG
Posted - 13 January 2009 : 10:36:48 Jeff, it's a bit off-topic, but here's how I built it. I'm afraid I cannot share the link shrinker code, as it's my own addition to a commercial, non-open-source link manager package. The basic idea of the link manager is pretty straightforward, however:
- enter the URL on an input page (for example, input.asp)
- store the URL in a database (for example, SQL) and auto-assign an ID to it
- provide the redirection URL to the user (for example, redirect.asp?URL=ID)
- every time the redirect URL is requested, the counter for that URL in the database is increased by 1, and the browser is redirected to the URL found in the database using Response.Redirect.
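A minimal sketch of that redirect page (not my actual code; the table, column, and connection-string names are placeholders):
[code]
<%
' redirect.asp?URL=ID - hypothetical names: tblLinks(L_ID, L_URL, L_HITS)
Dim conn, rs, linkId
linkId = CLng(Request.QueryString("URL"))
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open strConnString ' connection string defined elsewhere
' bump the counter, then fetch and follow the stored URL
conn.Execute "UPDATE tblLinks SET L_HITS = L_HITS + 1 WHERE L_ID = " & linkId
Set rs = conn.Execute("SELECT L_URL FROM tblLinks WHERE L_ID = " & linkId)
If Not rs.EOF Then Response.Redirect rs("L_URL")
%>
[/code]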
I've added two features to it: instead of decimal URL IDs, use base62-encoded IDs using a-z, A-Z and 0-9, effectively enabling 62x62x62x62 = 14,776,336 IDs to be fitted in only 4 characters. Here's some more info: http://oxle.com/convert.asp and here's the function for that: http://oxle.nl/Wh
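A minimal base62 encoder in that spirit (a sketch of mine, not the oxle.com function itself):
[code]
' Turn a positive integer ID into a short base62 string.
Function Base62(ByVal n)
    Const DIGITS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    Dim s
    s = ""
    Do
        s = Mid(DIGITS, (n Mod 62) + 1, 1) & s
        n = n \ 62
    Loop While n > 0
    Base62 = s
End Function
' Base62(14776335) -> "ZZZZ", the largest 4-character ID
[/code]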
If you'd want *clean* looking URLs instead of cryptic ones such as the one above, I'd do it differently:
- enter the URL *and a title* on the input page (for example, input.asp)
- if the title is already stored in the database, give feedback to the user asking for a new, unique title
- if the title is unique, store it together with the URL in the database
- provide the redirection URL to the user (for example, http://mysite.com/link/title)
- enable a custom 404 page in the form of an ASP page, specifically checking for 404s in the above subfolder (/link/)
- have this custom 404 page do a lookup in the database for that unique title, and increase the view counter
- have the custom 404 page perform the redirect to the URL found in the database using Response.Redirect(url).
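A sketch of such a 404 page (hypothetical names again, and an open connection conn as above; IIS passes the original address to a custom error page as "404;http://mysite.com/link/title"):
[code]
<%
' Custom 404 handler: look up the /link/<title> slug and redirect.
Dim raw, title, sql, rs
raw = Request.QueryString ' e.g. "404;http://mysite.com/link/latestnews"
If InStr(raw, "/link/") > 0 Then
    title = Mid(raw, InStrRev(raw, "/") + 1)
    sql = " WHERE L_TITLE = '" & Replace(title, "'", "''") & "'"
    conn.Execute "UPDATE tblLinks SET L_HITS = L_HITS + 1" & sql
    Set rs = conn.Execute("SELECT L_URL FROM tblLinks" & sql)
    If Not rs.EOF Then Response.Redirect rs("L_URL")
End If
%>
[/code]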
This way you could create clean-looking links like http://mysite.com/link/latestnews etc.
JJenson
Posted - 13 January 2009 : 10:15:29 Hey Marcel, would you be willing to share the link shrinker code with me? My brother has a site where they want to truncate links and make them look like they are coming from their own site, i.e. www.theirdomain.com/theirtitle.
Is this possible? I am not going to use this to let people truncate their own URLs.
MarcelG
Posted - 13 January 2009 : 07:08:04 Anon, that snipping feature can be found here: http://oxle.com/topic/3787.html
AnonJr
Posted - 13 January 2009 : 06:35:05 It seems we have a consensus - at least on the automatic recognition of URLs.
Rather than hold up 3.4.07 any longer than it has been, shall we agree to slate this for the next update?
Also, while that takes care of a lot of the "weird URL issues" we've been talking about, there's the other issue of long URLs forcing horizontal scrolling. I remember a few years back there was some discussion of (and some code for) snipping them in the middle and adding an ellipsis. Alternately, I've also considered doing URL shortening a la Marcel's Link Shrinker, or just providing some text if none is given ([ url ]http://www.jesusjoshua2415.com[ /url ] -> [link]).
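For the middle-snip idea, a hypothetical sketch (it shortens only the visible link text; the href would keep the full URL):
[code]
' Hypothetical: shorten only the display text of a long link.
Function SnipUrl(strUrl, nMax)
    If Len(strUrl) <= nMax Then
        SnipUrl = strUrl
    Else
        SnipUrl = Left(strUrl, nMax \ 2) & "..." & Right(strUrl, nMax \ 2)
    End If
End Function
' SnipUrl("http://forum.snitz.com/forum/topic.asp?TOPIC_ID=12345", 30)
' -> "http://forum.sn...?TOPIC_ID=12345"
[/code]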
ruirib
Posted - 13 January 2009 : 05:55:19 quote: Originally posted by Shaggy
quote: Originally posted by HuwR is not the purpose of the tags to tell you it is a URL?
This has always been my thinking, as well; you wouldn't expect a link written in HTML to be active without wrapping it in an anchor tag and isn't forum code essentially a simplified version of HTML?
I agree too. I think we should just remove the automatic recognition of URLs.
Shaggy
Posted - 13 January 2009 : 04:42:10 quote: Originally posted by HuwR is not the purpose of the tags to tell you it is a URL?
This has always been my thinking, as well; you wouldn't expect a link written in HTML to be active without wrapping it in an anchor tag and isn't forum code essentially a simplified version of HTML?
HuwR
Posted - 12 January 2009 : 14:50:26 This issue has been discussed over and over again for nearly 10 years, and it is still being discussed. Unless you fancy porting the gigantic regexp that is required to validate every conceivable URL allowed by the RFC standards, we will most likely still be discussing it in 10 years' time.
What I am trying to point out is that if you automatically parse valid URLs in the message, then why do we have URL tags? Is not the purpose of the tags to tell you it is a URL?
IMHO we should therefore not bother checking for URLs that are not in URL tags; it would save a whole heap of grief and aggravation, and another 10 years of recurring discussions.
SiSL
Posted - 12 January 2009 : 14:13:10 quote: Originally posted by HuwR
Checking whether a string of text contains a valid URL embedded in it is not as trivial as you seem to think; that is the problem with URLs, not whether they are parsed pre- or post-insertion into the db.
Even URLs starting with http or www need validation.
I definitely know that. You can be sure I'll do my best on the checks... In the end, we're brainstorming on this, right? Peace ;)
HuwR
Posted - 12 January 2009 : 14:10:49 Checking whether a string of text contains a valid URL embedded in it is not as trivial as you seem to think; that is the problem with URLs, not whether they are parsed pre- or post-insertion into the db.
Even URLs starting with http or www need validation.