HTML Sanitizer for .NET

I’m starting a public-facing project using ASP.NET MVC. I know there are about a billion PHP, Python, and Ruby HTML sanitizers out there, but does anyone have pointers to anything good in .NET? What are your experiences with what is out there? I know Stack Overflow is built on ASP.NET and allows freeform HTML; what does it use?

Answers:

Method 1

HtmlSanitizer

Source: https://github.com/mganss/HtmlSanitizer

A fairly robust sanitizer. It understands and can clean inline styles, but doesn’t have a parser that can deal with <style> blocks, so it strips them. It’s certainly up to and probably beyond the level that Microsoft’s AntiXSS was at, before it was abandoned.
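
A minimal usage sketch, assuming the HtmlSanitizer NuGet package is installed and a C# version that supports top-level statements. The namespace shown (Ganss.Xss) is the one used by recent releases; older releases used Ganss.XSS, so check the version you install. The sample input is made up for illustration.

```csharp
using System;
using Ganss.Xss; // assumption: recent package versions; older ones use Ganss.XSS

// Untrusted input with a script element and an inline event handler.
var dirty = "<p onclick=\"alert(1)\">Hello<script>alert('xss')</script></p>";

// The default constructor uses the library's built-in allow-lists,
// which include neither <script> nor onclick.
var sanitizer = new HtmlSanitizer();
var clean = sanitizer.Sanitize(dirty);

Console.WriteLine(clean);
```

The allow-lists are exposed as collections on the sanitizer (for example AllowedTags and AllowedAttributes), so they can be narrowed or extended before calling Sanitize.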

Method 2

https://blog.stackoverflow.com/2008/06/safe-html-and-xss/

Method 3

HtmlRuleSanitizer

Based on your question I have the following suggestions:

  • You want to allow free form HTML, so you need a solution to be able to specify a set of tags, attributes and/or CSS classes which are allowed.
  • By allowing free form HTML it is likely that you’ll also have to deal with malformed HTML because users make errors (deliberate or not). You thus need a solution built on a tolerant parser such as the Html Agility Pack.
  • You’ll want to take a white listing approach because a black listing sanitizer does not protect you from anything introduced by new HTML specifications. In addition, it is very hard to guarantee that a black list covers all cases in the first place, given the size of the HTML specification.

I faced the same problem and built HtmlRuleSanitizer, a white listing, rule-based HTML sanitizer built on top of the Html Agility Pack.
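
The white listing idea above can be sketched roughly as follows. This assumes the HtmlRuleSanitizer NuGet package with its Vereyon.Web namespace and the fluent rule API as shown in the project's README; verify the names against the version you install. The rules and the sample input are illustrative only.

```csharp
using System;
using Vereyon.Web; // assumption: namespace used by the HtmlRuleSanitizer package

var sanitizer = new HtmlSanitizer();

// Only tags that have a rule are allowed through.
sanitizer.Tag("p");
sanitizer.Tag("strong").RemoveEmpty();
sanitizer.Tag("b").Rename("strong").RemoveEmpty();
sanitizer.Tag("a")
    .SetAttribute("rel", "nofollow")
    .CheckAttribute("href", HtmlSanitizerCheckType.Url)
    .RemoveEmpty();

var dirty = "<p>Hi <b>there</b><script>alert(1)</script></p>";
var clean = sanitizer.Sanitize(dirty);

Console.WriteLine(clean);
```

The README also shows a preconfigured HtmlSanitizer.SimpleHtml5Sanitizer() factory, which may be a simpler starting point than building the rule set by hand.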

Method 4

There is a C# version here.

Method 5

Here is one built by Microsoft: http://wpl.codeplex.com/

var cleanHtml = Sanitizer.GetSafeHtml(unsafeHtml);

Method 6

We can also use

Sanitizer.GetSafeHtmlFragment

This method sanitizes input by parsing the HTML fragment. Use it for rich content to ensure that the content contains no harmful script and is safe to display in the browser. For plain-text input (not rich content), use AntiXss.HtmlEncode or any equivalent HTML encoder. Here is a sample for rich content:

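// Requires the AntiXSS package and: using Microsoft.Security.Application;
// Deliberately malformed input with inline event handlers.
string mal = "<IMG NAME = 'myPic' SRC = 'images / myPic.gif' onerror='alert(1)' onerror='alert(1) ><div bottommargin = 150 ondblclick = 'alert('double clicked!')' >< p > Double - click anywhere in the page.</p> </div> ";
var cleanHtml = Sanitizer.GetSafeHtmlFragment(mal);
Console.Write(cleanHtml);
Console.Read(); // keep the console window open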

Note: Download the AntiXSS library from the NuGet package manager and include the namespace
Microsoft.Security.Application in the source code.
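
For the plain-text case mentioned above, here is a small sketch of the "equivalent HTML encoder" route. It uses System.Net.WebUtility.HtmlEncode from the .NET base class library instead of the AntiXSS encoder, so it needs no extra package. The input string is made up for illustration.

```csharp
using System;
using System.Net;

// Plain-text input that should be displayed literally, never interpreted as markup.
var userText = "<script>alert(1)</script>";

// Encoding turns the angle brackets into entities, so the browser renders them as text.
var encoded = WebUtility.HtmlEncode(userText);

Console.WriteLine(encoded); // prints &lt;script&gt;alert(1)&lt;/script&gt;
```

Encoding is the right tool when no markup should survive at all; a sanitizer is only needed when some HTML must be preserved.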


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
