On ASP.NET MVC 3, I created an action filter that removes white space from the entire HTML output. It works as expected most of the time, but now I need to change the regex so that it does not touch anything inside a pre element.
I got the regex logic from the awesome Mads Kristensen's blog, and I am not sure how to modify it for this purpose.
Here is the logic:
public override void Write(byte[] buffer, int offset, int count) {
    string HTML = Encoding.UTF8.GetString(buffer, offset, count);
    Regex reg = new Regex(@"(?<=[^])\t{2,}|(?<=[>])\s{2,}(?=[<])|(?<=[>])\s{2,11}(?=[<])|(?=[\n])\s{2,}");
    HTML = reg.Replace(HTML, string.Empty);
    buffer = System.Text.Encoding.UTF8.GetBytes(HTML);
    this.Base.Write(buffer, 0, buffer.Length);
}
Whole code of the filter:
Any idea?
EDIT:
BIG NOTE:
My intention is not at all to speed up the response time. In fact, this may slow things down. I GZipped the pages, and this minification saves me approx. 4–5 KB per page, which is nothing.
Answers:
Method 1
Parsing HTML with regex is very complicated, and any simple solution could break easily. (Use the right tool for the job.) That being said, I'll show a simple solution.
First I simplified the regex you had to:
(?<=\s)\s+
Replace those matches with an empty string to get rid of double spaces everywhere.
Assuming there are no < or > characters inside the pre element, you can add (?![^<>]*</pre>) at the end of the expression to make it fail inside pre elements. The lookahead makes sure that </pre> doesn't follow the current match without any tags in between.
Resulting in:
(?<=\s)\s+(?![^<>]*</pre>)
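To make the behaviour of that expression concrete, here is a minimal C# sketch. The class and method names (PreAwareWhitespace, Collapse) are made up for illustration and are not part of the original filter. It also relies on the assumption stated above: no < or > characters inside the pre element.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original filter.
public static class PreAwareWhitespace
{
    // Matches whitespace that directly follows another whitespace character,
    // unless a closing </pre> comes next with no '<' or '>' in between.
    private static readonly Regex ExtraWhitespace =
        new Regex(@"(?<=\s)\s+(?![^<>]*</pre>)", RegexOptions.Compiled);

    // Removes the extra whitespace; the first whitespace character of each run is kept.
    public static string Collapse(string html)
    {
        return ExtraWhitespace.Replace(html, string.Empty);
    }
}
```

For example, Collapse("<p>a  b</p>\n\n<pre>x   y</pre>") should leave the three spaces inside the pre element alone while collapsing the runs outside it. If the pre element contains markup of its own (a nested code or span tag, say), the lookahead no longer protects the whitespace before that tag.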
Method 2
Please see the very epic "RegEx match open tags except XHTML self-contained tags" for all the reasons why regular expressions and HTML don't get along.
If you're using the approach above to make the page size smaller, you should definitely look into IIS compression, as most browsers can take advantage of it and it'd be easier than how you're going about it. Here's how to do it in IIS 6 and IIS 7:
http://technet.microsoft.com/en-us/library/cc771003(WS.10).aspx
Method 3
Maybe break it up into four steps:
1. extract any matching PRE elements using regex, something simple like "start with <pre>(anything not </pre>)* end with </pre>"
2. replace each of those matches with a separate GUID and save a dictionary of GUID -> pre element html.
3. take out whitespace (won't affect the GUIDs or their placement).
4. iterate through the dictionary you saved in step 2 and put the pre elements back in the correct spot.
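The four steps could be sketched in C# roughly as follows. This is only an illustration under some assumptions: the names (PrePlaceholderMinifier, Minify) are hypothetical, the pre elements are assumed not to be nested, and the whitespace pattern is the simplified one from Method 1 rather than the original expression.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the placeholder approach.
public static class PrePlaceholderMinifier
{
    // Step 1: a simple pattern for <pre ...>...</pre>; assumes pre elements are not nested.
    private static readonly Regex PreElement =
        new Regex(@"<pre\b[^>]*>.*?</pre>", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Whitespace that directly follows another whitespace character.
    private static readonly Regex ExtraWhitespace = new Regex(@"(?<=\s)\s+");

    public static string Minify(string html)
    {
        var saved = new Dictionary<string, string>();

        // Steps 1 and 2: swap each pre element for a GUID and remember the original markup.
        string withPlaceholders = PreElement.Replace(html, match =>
        {
            string key = Guid.NewGuid().ToString("N"); // 32 hex digits, no whitespace
            saved[key] = match.Value;
            return key;
        });

        // Step 3: remove extra whitespace; the placeholders contain none, so they survive.
        string minified = ExtraWhitespace.Replace(withPlaceholders, string.Empty);

        // Step 4: put the saved pre elements back where their placeholders are.
        foreach (KeyValuePair<string, string> entry in saved)
        {
            minified = minified.Replace(entry.Key, entry.Value);
        }
        return minified;
    }
}
```

Unlike the lookahead approach, this keeps working when the pre element contains other tags. One caveat if it is called from a stream Write override like the one in the question: a pre element that happens to be split across two Write calls would not be matched, so the response would probably need to be buffered first.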
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.