Multiple line regular expressions
Today I was looking for a way to remove JavaScript from HTML pages. The main problem was that, by default, "." in a regular expression does not match a newline, so a pattern cannot easily span multiple lines. After thinking about it for quite a while (and being unable to find a solution), I finally decided on a little hack to solve my problem.
What was the hack? To convert all newline characters in HTML text to a temporary marker, apply the regular expression, and then convert them back.
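# Replace every newline variant with a temporary marker so "." can match across lines
originalresponse = originalresponse.gsub(/\r\n|\n|\r/, '[[[NEWLINE]]]')
# Remove each script block; [^>]* also matches a bare <script> tag with no attributes
originalresponse = originalresponse.gsub(/<script[^>]*>.*?<\/script>/, '')
# Restore the newlines (every variant comes back as "\n")
originalresponse = originalresponse.gsub('[[[NEWLINE]]]', "\n")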
Comments:
I know, this comment is a little late. But maybe it will help anybody who finds your page while looking for a solution, just like us.
I think the "correct" way of doing this is to use multiline mode.
"Multiline Mode. Normally, '.' matches any character except a newline. With the /m option, '.' matches any character."
Which means you can write your regexps like this:
/Line1.*Line10/m
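Applied to the script-removal problem from the post, multiline mode might look like the sketch below. The helper name strip_scripts and the sample string are made up for illustration, and the /i flag is an extra assumption so that upper-case tags are caught too. This is still a regex-based approximation, not a real HTML parser.

```ruby
# Hypothetical helper, not from the original post.
def strip_scripts(html)
  # /m lets "." match newlines, so the lazy ".*?" can span several lines.
  # /i also accepts tags written as <SCRIPT>.
  html.gsub(/<script\b[^>]*>.*?<\/script\s*>/mi, '')
end

# Illustrative input with a script block spread over several lines.
sample = "<p>before</p>\n<script type=\"text/javascript\">\nalert('hi');\n</script>\n<p>after</p>"
puts strip_scripts(sample)
```

With /m there is no need to swap newlines for a marker and back, so the original line endings are left untouched.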
Regular expressions are really wonderful for parsing HTML or matching patterns. I use them a lot when I code. Actually, when I learn any new language, the first thing I check is whether it supports regex. I feel at ease when I find that it does.
http://icfun.blogspot.com/2008/04/ruby-regular-expression-handling.html
The link above is about Ruby regex. I posted it when I was first learning Ruby regex, so it should be helpful for new coders.