Author |
Topic  |
|
Podge
Support Moderator
    
Ireland
3776 Posts |
|
pdrg
Support Moderator
    
United Kingdom
2897 Posts |
Posted - 03 December 2004 : 06:27:57
|
Do you want the bit after the http:// or the bit two to the left of the .com?
I see problems either way - for instance http://www.someusername.freeserve.co.uk would stump both of the above. You need to know the dataset you'll be using, and work backwards from there, I suspect. |
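To make the problem concrete, here is a minimal Python sketch of the two naive approaches. The helper names `first_label` and `two_left_of_tld` are made up for illustration; neither approach returns "someusername" for the freeserve example.

```python
from urllib.parse import urlparse

def first_label(url):
    """The bit straight after http:// (the leftmost host label)."""
    return urlparse(url).hostname.split(".")[0]

def two_left_of_tld(url):
    """The label two places to the left of the final one (.com, .uk, ...)."""
    labels = urlparse(url).hostname.split(".")
    return labels[-3] if len(labels) >= 3 else None

url = "http://www.someusername.freeserve.co.uk"
print(first_label(url))      # www
print(two_left_of_tld(url))  # freeserve
```

Both helpers do return "sub1" for http://sub1.domain.com, which is why knowing the dataset matters.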
 |
|
Podge
Support Moderator
    
Ireland
3776 Posts |
|
pdrg
Support Moderator
    
United Kingdom
2897 Posts |
Posted - 03 December 2004 : 08:04:27
|
OK I get ya, but if you break the problem out more - eg how do you cope with .co.uk domains - it will help you envision the solution more easily.
I agree in principle with your idea to use regex, but even with regex you still need to work out how to handle exceptions (like .co.uk). Also, can't subdomains contain the magic characters '-' and '_'? And will all domains come with the http:// bit?
the regex pattern would be something like (untested):
(?:http\:\/\/)?(?:www\.)?([A-Za-z0-9_-]*)\.[A-Za-z0-9._-]*$
Optional, non-capturing http:// and www. (the trailing ? on each group is what makes it optional); the first captured group is any run of letters, numbers, _ or - up to the next dot; the rest is read and discarded to the end of the string.
Someone may tidy this up a bit, but assuming you just feed it domains, should handle it (but check the escaped characters need to be escaped - may cause hiccoughs otherwise!)
hth
edit: this will not work *if there is no subdomain*, but that wasn't part of the request! |
Edited by - pdrg on 03 December 2004 08:05:12 |
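For anyone wanting to try the idea quickly, here is a rough Python sketch of the same pattern, written with a `?` after each prefix group so that http:// and www. are optional, as the description intends. The `^` anchor and the function name `first_part` are additions for illustration, and Python's `re` syntax is assumed to be close enough to whatever regex engine the forum code ends up using.

```python
import re

# The prefixes are optional non-capturing groups; "-" sits last in the
# character class, so it does not need escaping there.
PATTERN = re.compile(r"^(?:http://)?(?:www\.)?([A-Za-z0-9_-]*)\.[A-Za-z0-9._-]*$")

def first_part(host):
    """Return the first captured label, or None if nothing matches."""
    m = PATTERN.match(host)
    return m.group(1) if m else None

print(first_part("http://www.sub1.domain.com"))  # sub1
print(first_part("http://sub1.domain.com"))      # sub1
print(first_part("http://www.domain.com"))       # domain
```

The last call shows the caveat from the edit note: with no subdomain present, the capture is the domain name itself.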
 |
|
Podge
Support Moderator
    
Ireland
3776 Posts |
Posted - 03 December 2004 : 08:51:07
|
Thanks for taking the time to help me with this. It's a good start.
Basically I'm trying to find the most efficient code which would return the subdomain to me in a string.
I won't be checking lots of different domains and their subdomains. It will be on one particular domain. There won't be any .net or .co.uk.
I'll outline the rules as best I can.
Although subdomains could contain "-" or "_", they would be illegal for my purposes. All subdomains should only have letters or numbers and are of at least length 4.
If no subdomain exists, that's already catered for by DNS (as are A HOSTS for mail, ftp, www etc.). DNS will redirect to the correct website, etc. for those and the regex won't get a chance to run.
This is what would happen in the following situations
http://sub1.domain.com - I need "sub1" as a string
http://sub2.sub1.domain.com - I need "sub1" as a string
http://sub-marine.domain.com - Return an error
I don't need to check for http:// or https:// or that the domain.com is valid. It will always be valid.
|
Podge. |
 |
|
pdrg
Support Moderator
    
United Kingdom
2897 Posts |
Posted - 03 December 2004 : 09:20:15
|
And will that string be clean? ie the domain won't be in the middle of a load of 'stuff'?
(?:https?\:\/\/)(?:[A-Za-z0-9\-_]*\.?)([A-Za-z0-9]{4,})\.domain\.com should do it!
(?:https?\:\/\/) non-capturing http:// or https://
(?:[A-Za-z0-9\-_]*\.?) should non-capture any deeper subdomain (eg sub2), with its optional dot
([A-Za-z0-9]{4,}) should capture the alphanumeric string at least 4 characters long
\.domain\.com note the \. escapes the dot so it matches a literal dot
caveat - would need a bit of in-situ testing as (?:[A-Za-z0-9\-_]*\.?) term may also match sub2 AND sub1 - if it does, this will need some tinkering, or you may find the sub1 match will not be the zero'th submatch in the submatch collection returned by the regex object, but maybe the first.
hope this is helpful! Maybe someone else wants to point out if this is a hiding to nothing, but if we go down the regex route, I think the above is 'about right' |
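As a way of checking the rules Podge listed, here is a small Python sketch. It is not the pattern above verbatim: requiring a dot after every skipped label and anchoring at both ends are assumptions made here so that a hyphenated label such as sub-marine cannot be partly matched. The name `get_subdomain` is hypothetical, "domain.com" stands in for the real domain, and the input is assumed to be a bare host with no path after it.

```python
import re

# (?:[A-Za-z0-9_-]+\.)*  skips any deeper labels such as "sub2."
# ([A-Za-z0-9]{4,})      captures the label directly left of domain.com
SUBDOMAIN = re.compile(
    r"^(?:https?://)?(?:[A-Za-z0-9_-]+\.)*([A-Za-z0-9]{4,})\.domain\.com$",
    re.IGNORECASE,
)

def get_subdomain(host):
    """Return the subdomain next to domain.com, or raise ValueError."""
    m = SUBDOMAIN.match(host)
    if m is None:
        raise ValueError("no valid subdomain in %r" % host)
    return m.group(1)

print(get_subdomain("http://sub1.domain.com"))       # sub1
print(get_subdomain("http://sub2.sub1.domain.com"))  # sub1
```

Calling `get_subdomain("http://sub-marine.domain.com")` raises ValueError, which covers the "return an error" case. In Python the wanted string is group 1 of the match, since group 0 is the whole match; other regex objects may number their submatches differently.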
 |
|
Podge
Support Moderator
    
Ireland
3776 Posts |
|
pdrg
Support Moderator
    
United Kingdom
2897 Posts |
Posted - 03 December 2004 : 09:31:44
|
(?:https?\:\/\/)(?:[A-Za-z0-9\-_]*\.?)([A-Za-z0-9]{4,})\.domain\.com.*
note the ending .* which will match (and ignore) the rest of the string - however not sure it's even needed! Sorry I cannot test all this for you, but I haven't got the kit with me so this is largely hypothetical!! |
 |
|
-gary
Development Team Member
 
406 Posts |
Posted - 03 December 2004 : 11:26:09
|
Here's an online tester with the ability to toggle things like case and line break detection. They also have tons of code samples.
http://www.regexlib.com |
 |
 |