Shaggy
Support Moderator
Ireland
6780 Posts |
Posted - 21 February 2008 : 05:30:35
|
Back to one of my nemeses in the fabulous world of programming, the dreaded regular expression!
I use this function to parse tags on all my sites and, as you can see, it's a bit bloated (the actual function uses arrays, though) despite my attempts to refine and improve it with each new site. You'll notice I've highlighted a few parts of the function in red; these are the parts I want to try and clean up, if possible. The final output of the function needs to validate as XHTML with no errors or warnings which is why I have all those extra bits and pieces stripping out redundant tags, etc. there in the first place. I'm thinking, maybe, that that's as good as it's going to get but hoping somebody might be able to point out some places where it can be improved.
At the same time, I'm trying to do away with the need for the [list] tag and have it ignore individual line breaks between [bullet] tags. At the moment, to create a list the user needs to input something like:
[list]
[bullet]Point 1[/bullet]
[bullet]Point 2[/bullet]
[bullet]Point 3[/bullet]
[/list]
To output this:
<ul><li>Point 1</li><li>Point 2</li><li>Point 3</li></ul>
Note that the line breaks are being stripped out towards the end of the regular expression. What I'd prefer the user to be able to enter but still get the same output is simply:
[bullet]Point 1[/bullet]
[bullet]Point 2[/bullet]
[bullet]Point 3[/bullet]
With the individual line breaks being ignored. If however, the user should enter more than one line break between 2 bullets, a new list should be started. For example:
[bullet]Point 1[/bullet]
[bullet]Point 2[/bullet]
[bullet]Point 3[/bullet]

[bullet]Point 4[/bullet]
[bullet]Point 5[/bullet]
[bullet]Point 6[/bullet]
Should give:
<ul><li>Point 1</li><li>Point 2</li><li>Point 3</li></ul><ul><li>Point 4</li><li>Point 5</li><li>Point 6</li></ul>
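The behaviour described above can be sketched in JavaScript, the language used for the preview later in this thread. This is only an illustration under stated assumptions, not the function from the original post: the helper name bulletsToLists is hypothetical, and it assumes bullets in the same list are separated by at most one line break (optionally padded with spaces or tabs).

```javascript
// Hypothetical helper: turns runs of [bullet] tags into <ul> lists.
// Assumption: bullets separated by at most one line break share a list;
// anything more (for example a blank line) starts a new list.
function bulletsToLists(text) {
  // One bullet, with its contents captured lazily (line breaks allowed inside).
  const bullet = String.raw`\[bullet\]([\s\S]*?)\[/bullet\]`;
  // The gap allowed between two bullets of the same list: at most one line break.
  const gap = String.raw`[ \t]*(?:\r\n|\r|\n)?[ \t]*`;
  const run = new RegExp(`${bullet}(?:${gap}${bullet})*`, "gi");
  const item = new RegExp(bullet, "gi");

  return text
    .replace(run, (match) => {
      // Rebuild the run as list items, dropping the line breaks between them.
      const items = [...match.matchAll(item)].map((m) => `<li>${m[1]}</li>`);
      return `<ul>${items.join("")}</ul>`;
    })
    // Strip the whitespace left between two adjacent lists.
    .replace(/<\/ul>\s+<ul>/g, "</ul><ul>");
}
```

Fed the six-bullet example above (three bullets, a blank line, three more), this sketch returns the two adjacent lists shown, with no [list] tag required.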
Sorry for the lengthy post, been sitting on this one for a while; hopefully I've explained it all clearly enough.
|
Search is your friend “I was having a mildly paranoid day, mostly due to the fact that the mad priest lady from over the river had taken to nailing weasels to my front door again.” |
Edited by - Shaggy on 21 February 2008 05:31:09 |
|
pdrg
Support Moderator
United Kingdom
2897 Posts |
Posted - 21 February 2008 : 11:28:02
|
I'll take a stab at it - I bought regexbuddy a while back, it's a great help! |
|
|
pdrg
Support Moderator
United Kingdom
2897 Posts |
Posted - 21 February 2008 : 12:01:56
|
Working towards it... Case insensitive, of course
Search text: \[bullet\](?<bullettext>.*?)\[/bullet\]
Replace text: <li>${bullettext}</li>
Doesn't do the split list yet, doesn't do the auto <ul> </ul> yet, but we'll get there |
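The search and replace text above look like .NET-style syntax. As a rough JavaScript equivalent (an assumption for illustration only; bulletsToItems is a hypothetical name), the named group is referenced as $<bullettext> in the replacement string rather than ${bullettext}:

```javascript
// Same idea as the search/replace pair above, written for JavaScript.
// The "g" flag replaces every bullet; "i" makes the tags case insensitive.
const bulletPattern = /\[bullet\](?<bullettext>.*?)\[\/bullet\]/gi;

// Hypothetical helper name, used only for this sketch.
function bulletsToItems(text) {
  // $<bullettext> inserts whatever the named group captured.
  return text.replace(bulletPattern, "<li>$<bullettext></li>");
}
```

As in the original, this only converts the individual bullets; it does not add the surrounding <ul> tags or split lists.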
|
|
pdrg
Support Moderator
United Kingdom
2897 Posts |
Posted - 21 February 2008 : 13:32:23
|
Dim myRegExp, ResultString
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.Global = True
' Lazily match each bullet's contents, including any line breaks inside it
myRegExp.Pattern = "\[bullet\]([\s\S]*?)\[/bullet\](?=\r{0,1})"
ResultString = myRegExp.Replace(SubjectString, "<li>$1</li>")
Just going to hit the UL /UL |
|
|
pdrg
Support Moderator
United Kingdom
2897 Posts |
Posted - 21 February 2008 : 13:44:57
|
Dim myRegExp, ResultString
Set myRegExp = New RegExp
myRegExp.IgnoreCase = True
myRegExp.MultiLine = True
myRegExp.Global = True
' First pass: wrap the bullets in <ul></ul>
myRegExp.Pattern = "^(\[bullet\][\s\S]*\[/bullet\])$"
ResultString = myRegExp.Replace(SubjectString, "<ul>$1</ul>")
' Second pass: convert each bullet, working on the first pass's output
myRegExp.Pattern = "\[bullet\]([\s\S]*?)\[/bullet\](?=\r{0,1})"
ResultString = myRegExp.Replace(ResultString, "<li>$1</li>")
Getting closer - the split groups is the nightmare |
|
|
pdrg
Support Moderator
United Kingdom
2897 Posts |
Posted - 21 February 2008 : 14:19:42
|
Actually mate my brain is softening, I can't get there tonight - want to drop me an email a sec? |
|
|
AnonJr
Moderator
United States
5768 Posts |
Posted - 21 February 2008 : 15:23:36
|
quote: Originally posted by pdrg
Actually mate my brain is softening
RegEx has a way of doing that.
Which reminds me, I still need to fix one for an internal app here at the hospital... |
|
|
Shaggy
Support Moderator
Ireland
6780 Posts |
Posted - 22 February 2008 : 04:48:05
|
Thanks, Paddy, that's got me a bit closer. Out of interest, what's the difference between . and [\s\S]? I'd been trying to get a match on (\[bullet\].*\[/bullet\]) but it was creating a separate list for each bullet unless there was nothing at all between each bullet.
|
|
|
HuwR
Forum Admin
United Kingdom
20584 Posts |
Posted - 22 February 2008 : 04:52:22
|
\s vs \S
They actually mean the opposite of each other: \s represents a whitespace character and \S a non-whitespace character. Put together as [\s\S], they match any character at all, including line breaks, which . does not.
I would recommend downloading Regex Designer from http://www.radsoftware.com.au/regexdesigner/. It is free and very helpful for playing with regex. |
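A minimal JavaScript sketch of the difference between . and [\s\S] (the variable names are made up for illustration; VBScript's RegExp should behave the same way for \n, though that is not tested here). Because . stops at a line break, a pattern built on .* cannot span bullets that sit on separate lines, which would explain each bullet ending up in its own list:

```javascript
// A single bullet whose contents contain a line break.
const multiLineBullet = "[bullet]first line\nsecond line[/bullet]";

// "." matches any character except line terminators.
const dotPattern = /\[bullet\](.*?)\[\/bullet\]/i;
// "[\s\S]" matches whitespace OR non-whitespace, i.e. any character at all.
const anyCharPattern = /\[bullet\]([\s\S]*?)\[\/bullet\]/i;

const dotMatches = dotPattern.test(multiLineBullet);         // false
const anyCharMatches = anyCharPattern.test(multiLineBullet); // true
```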
|
|
Shaggy
Support Moderator
Ireland
6780 Posts |
Posted - 22 February 2008 : 05:20:30
|
Yup, using that at the moment
|
|
|
Shaggy
Support Moderator
Ireland
6780 Posts |
Posted - 22 February 2008 : 06:54:06
|
A bit of progress. Although a little sloppy looking, this seems to work perfectly:
(\[bullet\][\s\S]+?\[/bullet\])(\s{3,}|$)
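To see why this works with Windows-style line breaks, here is a hedged JavaScript sketch of the pattern above; wrapRuns and the sample input are illustrative names, not part of the original function. A single \r\n is only two whitespace characters, so \s{3,} cannot match between bullets of the same run, while a blank line gives four and ends the run.

```javascript
// The pattern from the post: a run of bullets ends at 3+ whitespace
// characters or at the end of the input.
const listRun = /(\[bullet\][\s\S]+?\[\/bullet\])(\s{3,}|$)/gi;

// Hypothetical helper: wraps each run in <ul></ul>. The second group
// (the trailing whitespace) is deliberately left out of the replacement.
function wrapRuns(text) {
  return text.replace(listRun, "<ul>$1</ul>");
}

// Two bullets, a blank line, then a third bullet, using \r\n line breaks.
const crlfInput =
  "[bullet]A[/bullet]\r\n[bullet]B[/bullet]\r\n\r\n[bullet]C[/bullet]";

const wrapped = wrapRuns(crlfInput);
// "<ul>[bullet]A[/bullet]\r\n[bullet]B[/bullet]</ul><ul>[bullet]C[/bullet]</ul>"
```

The bullets themselves still need converting to <li> tags in a separate pass.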
|
|
|
HuwR
Forum Admin
United Kingdom
20584 Posts |
Posted - 22 February 2008 : 07:00:25
|
If you think that looks a bit sloppy, you should try looking at some of the URL regexes in the .NET version. |
|
|
Shaggy
Support Moderator
Ireland
6780 Posts |
Posted - 22 February 2008 : 07:02:28
|
I think my head might explode! This one was bad enough.
I think I'll leave the clean-up of the red bits until the next site
|
|
|
Shaggy
Support Moderator
Ireland
6780 Posts |
Posted - 22 February 2008 : 10:20:34
|
Of course, nothing's ever easy - JavaScript handles line breaks differently to ASP.
|
|
|
HuwR
Forum Admin
United Kingdom
20584 Posts |
Posted - 22 February 2008 : 10:47:10
|
Yes, JavaScript probably only uses \n rather than \r\n. |
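With bare \n line breaks, a blank line is only two whitespace characters, so a \s{3,} test never matches. A hedged sketch of one way to make the preview and the server agree is to normalise every line break to \n first, then end a run on two or more \n. The names normaliseLineBreaks and wrapRunsAnyLineBreak are hypothetical, and swapping \s{3,} for \n{2,} is an assumption rather than something settled in this thread.

```javascript
// Convert \r\n (Windows) and lone \r to \n so both environments agree.
function normaliseLineBreaks(text) {
  return text.replace(/\r\n?/g, "\n");
}

// After normalising, a run of bullets ends at a blank line (2+ \n)
// or at the end of the input.
const listRunLf = /(\[bullet\][\s\S]+?\[\/bullet\])(\n{2,}|$)/gi;

// Hypothetical helper: wraps each run in <ul></ul> whatever the
// original line-break style was.
function wrapRunsAnyLineBreak(text) {
  return normaliseLineBreaks(text).replace(listRunLf, "<ul>$1</ul>");
}
```

Unlike \s{3,}, this variant does not treat stray spaces between bullets as a list break, so it is stricter about what counts as a blank line.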
|
|
Shaggy
Support Moderator
Ireland
6780 Posts |
Posted - 22 February 2008 : 10:54:05
|
Exactly it. Ah, well, the JS is only used for previewing; I'll let it slide for now.
|
|
|