Sunday, February 19, 2006

Multiple line regular expressions

Today I was looking for a way to remove JavaScript from HTML pages. The main problem was that, by default, "." in a regular expression does not match newline characters, so a pattern like .* cannot span multiple lines. After thinking about it for quite a while without finding a proper solution, I finally settled on a little hack to solve my problem.

What was the hack? Convert all newline characters in the HTML text to a temporary marker, apply the regular expression, and then convert the markers back to newlines.

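# Replace every newline variant with a temporary marker so the text becomes one line.
originalresponse = originalresponse.gsub(/\r\n|\n|\r/, '[[[NEWLINE]]]')

# Remove script blocks; [^>]* also matches a bare <script> tag with no attributes.
originalresponse = originalresponse.gsub(/<script[^>]*>.*?<\/script>/, '')

# Restore the newlines (every line ending comes back as "\n").
originalresponse = originalresponse.gsub('[[[NEWLINE]]]', "\n")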
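To see why the marker is needed at all, here is a small sketch that runs the same kind of pattern with and without the trick. The sample HTML and the variable names are made up for illustration, and the pattern uses [^>]* so that a bare <script> tag is also matched.

```ruby
# Hypothetical sample input: a script block that spans several lines.
html = "<p>before</p>\n<script type=\"text/javascript\">\nalert('hi');\n</script>\n<p>after</p>"

pattern = /<script[^>]*>.*?<\/script>/

# Without the marker trick, "." stops at the first newline, so nothing is removed.
unchanged = html.gsub(pattern, '')

# With the marker trick, the whole document is one line while the pattern runs.
flattened = html.gsub(/\r\n|\n|\r/, '[[[NEWLINE]]]')
stripped  = flattened.gsub(pattern, '').gsub('[[[NEWLINE]]]', "\n")
```

Here unchanged is identical to html, while stripped no longer contains the script block. Note that the trick would misbehave if the page itself happened to contain the text [[[NEWLINE]]].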

3 Comments:

At 10:44 PM, Blogger Unknown said...

I know, this comment is a little late. But maybe it will help anybody who finds your page while looking for a solution, just like us.

I think the "correct" way of doing this is to use multiline mode.

"Multiline Mode. Normally, '.' matches any character except a newline. With the /m option, '.' matches any character."

Which means you can write your regexps like this:

/Line1.*Line10/m
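Applied to the script-removal problem from the post, the /m option might be used as in the following sketch. The method name strip_scripts is made up for illustration, and the /i flag (case-insensitive matching) is an extra assumption so that <SCRIPT> tags are caught too.

```ruby
# Hypothetical helper: remove <script>...</script> blocks, including
# blocks that span several lines.
#   /m lets "." match newline characters
#   /i makes the match case-insensitive
def strip_scripts(html)
  html.gsub(/<script[^>]*>.*?<\/script>/mi, '')
end

# Example call on made-up input with Windows line endings.
cleaned = strip_scripts("<p>a</p>\r\n<SCRIPT>\r\nfoo();\r\n</SCRIPT>\r\n<p>b</p>")
```

Unlike the marker approach, this leaves the original line endings untouched, since no newline is ever rewritten.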

 
At 11:42 AM, Blogger Demon said...

Regular expressions are really wonderful for parsing HTML or matching patterns. I use them a lot when I code. Actually, when I learn any new language, the first thing I check is whether it supports regex or not. I feel at ease when I find that it does.

http://icfun.blogspot.com/2008/04/ruby-regular-expression-handling.html

This link is about Ruby regex. I posted it when I was first learning Ruby regex, so it should be helpful for new coders.

 
At 9:22 PM, Blogger Tejuteju said...

It was really a nice post and I was really impressed by reading this Ruby on Rails Online Training Hyderabad

 
