Is IIS performing an illegal character substitution? If so, how to stop it?

Context: ASP.NET MVC running in IIS, with a UTF-8 %-encoded URL.

Using the standard project template, and a test-action in HomeController like:

public ActionResult Test(string id)
{
    return Content(id, "text/plain");
}

This works fine for most %-encoded UTF-8 routes, such as:

http://mydevserver/Home/Test/%e4%ba%ac%e9%83%bd%e5%bc%81

with the expected result 京都弁

However using the route:

http://mydevserver/Home/Test/%ee%93%bb

the url is not received correctly.

Aside: %ee%93%bb is %-encoded code-point 0xE4FB; basic-multilingual-plane, private-use area; but ultimately – a valid unicode code-point; you can verify this manually, or via:

string value = ((char) 0xE4FB).ToString();
string encoded = HttpUtility.UrlEncode(value); // %ee%93%bb

Now, what happens next depends on the web-server; on the Visual Studio Development Server (aka cassini), the correct id is received – a string of length one, containing code-point 0xE4FB.

If, however, I do this in IIS or IIS Express, I get a different id, specifically "î“»", code-points: 0xEE, 0x201C, 0xBB. You will immediately recognise the first and last as the start and end of our percent-encoded string… so what happened in the middle?

Well:

  • code-point 0x93 is “ in the Windows-1252 (ANSI) code page (source)
  • code-point 0x201c is “, the Unicode left double quotation mark (source)

It looks to me very much like IIS has performed some kind of quote-translation when processing my url. Now maybe this might have uses in a few scenarios (I don’t know), but it is certainly a bad thing when it happens in the middle of a %-encoded UTF-8 block.
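The substitution can be reproduced outside IIS by decoding the three UTF-8 bytes with the Windows-1252 code page instead of UTF-8. The following is only a sketch to illustrate the suspected mechanism; the class and method names are made up for this example, and the RegisterProvider call is there on the assumption that the code runs on .NET Core / .NET 5+ (on the classic .NET Framework it should not be needed).

```csharp
using System;
using System.Linq;
using System.Text;

public static class MisdecodeDemo
{
    // Decodes raw bytes with the Windows-1252 code page (hypothetical helper).
    public static string DecodeAsWindows1252(byte[] bytes)
    {
        // Assumption: running on .NET Core / .NET 5+, where code page 1252
        // is only available after registering the code-pages provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(bytes);
    }

    // Formats each UTF-16 code unit of s as four hex digits, space-separated.
    public static string CodeUnits(string s)
    {
        return string.Join(" ", s.Select(c => ((int)c).ToString("X4")));
    }
}
```

Passing new byte[] { 0xEE, 0x93, 0xBB } through DecodeAsWindows1252 and then CodeUnits gives "00EE 201C 00BB", the same three code-points seen in IIS, whereas Encoding.UTF8.GetString on the same bytes gives the single code-point E4FB.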

Note that HttpContext.Current.Request.RawUrl also shows this translation has occurred, so this does not look like an MVC bug; note also Darin’s comment, highlighting that it works differently in the path vs query portion of the url.

So (two-parter):

  1. is my analysis missing some important subtlety of unicode / url processing?
  2. how do I fix it? (i.e. make it so that I receive the expected character)

Answers:

Method 1

id = Encoding.UTF8.GetString(Encoding.Default.GetBytes(id));

This will give you your original id.
IIS uses the Default (ANSI) encoding for path characters. The percent-encoded bytes in your url are decoded with that encoding instead of UTF-8, which is why you get the wrong characters back.

To get the original id you can convert it back to bytes and get the string using utf8 encoding.
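A slightly more defensive version of that round trip might look like the sketch below. It is not what the answer literally proposes: it assumes the server's ANSI code page is Windows-1252 (Encoding.Default depends on the machine), and it returns the id unchanged when the string cannot have come from mis-decoded UTF-8. The IdRepair and Repair names are hypothetical, and the RegisterProvider call is only an assumption for running on .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class IdRepair
{
    // Tries to undo a "UTF-8 bytes decoded as Windows-1252" mix-up.
    // Returns the input unchanged if the repair does not apply.
    public static string Repair(string id)
    {
        // Assumption: needed on .NET Core / .NET 5+ to make code page 1252 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Strict encodings: throw instead of silently substituting characters.
        Encoding ansi = Encoding.GetEncoding(
            1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        Encoding strictUtf8 = new UTF8Encoding(false, true);

        try
        {
            byte[] raw = ansi.GetBytes(id);    // back to the bytes that were sent
            return strictUtf8.GetString(raw);  // decode them as UTF-8 this time
        }
        catch (EncoderFallbackException)
        {
            return id; // contains characters outside 1252, so it was decoded correctly
        }
        catch (DecoderFallbackException)
        {
            return id; // bytes are not valid UTF-8, so leave the string alone
        }
    }
}
```

With this, IdRepair.Repair("\u00EE\u201C\u00BB") gives back the single character U+E4FB, while an id such as 京都弁 that already arrived intact passes through untouched.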

See Unicode and ISAPI Filters

ISAPI Filter is an ANSI API – all values you can get/set using the API
must be ANSI. Yes, I know this is shocking; after all, it is 2006 and
everything nowadays are in Unicode… but remember that this API
originated more than a decade ago when barely anything was 32bit, much
less Unicode. Also, remember that the HTTP protocol which ISAPI
directly manipulates is in ANSI and not Unicode.

EDIT: Since you mentioned that it works with most other characters, I’m assuming that IIS has some sort of encoding detection mechanism which is failing in this case. As a workaround, you can prefix your id with this char and then easily detect whether the problem occurred (if this char is missing). Not an ideal solution, but it will work. You can then write a custom model binder and a wrapper class in ASP.NET MVC to make your consuming code cleaner.

Method 2

Once Upon A Time, URLs themselves were not in UTF-8. They were in the ANSI code page. This facilitates the fact that they often are used to select, well, pathnames in the server’s file system. In ancient times, IE had an option to tell whether you wanted to send UTF-8 URLs or not.

Perhaps buried in the bowels of the IIS config there is a place to specify the URL encoding, and perhaps not.

Method 3

Ultimately, to get around this, I had to use request.ServerVariables["HTTP_URL"] and some manual parsing, with a bunch of error-handling fallbacks (additionally compensating for some related glitches in Uri). Not great, but only affects a tiny minority of awkward requests.
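The manual parsing is not shown in the answer, so the following is only a rough sketch of what it could look like, assuming request.ServerVariables["HTTP_URL"] returns the still percent-encoded url (for example "/Home/Test/%ee%93%bb") and that the id is the last path segment. The RawUrlParser class and both method names are hypothetical, and the error-handling fallbacks mentioned above are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RawUrlParser
{
    // Returns the text after the last '/' of the path, ignoring any query string.
    public static string LastSegment(string rawUrl)
    {
        int q = rawUrl.IndexOf('?');
        string path = q >= 0 ? rawUrl.Substring(0, q) : rawUrl;
        return path.Substring(path.LastIndexOf('/') + 1);
    }

    // Percent-decodes a segment and interprets the resulting bytes as UTF-8.
    public static string DecodeSegment(string rawSegment)
    {
        var bytes = new List<byte>(rawSegment.Length);
        for (int i = 0; i < rawSegment.Length; i++)
        {
            char c = rawSegment[i];
            bool isEscape = c == '%'
                && i + 2 < rawSegment.Length
                && Uri.IsHexDigit(rawSegment[i + 1])
                && Uri.IsHexDigit(rawSegment[i + 2]);
            if (isEscape)
            {
                // "%ee" -> byte 0xEE
                bytes.Add(Convert.ToByte(rawSegment.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                // Literal character; expected to be ASCII in a raw url.
                bytes.AddRange(Encoding.UTF8.GetBytes(new[] { c }));
            }
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

In a controller this would be used roughly as RawUrlParser.DecodeSegment(RawUrlParser.LastSegment(Request.ServerVariables["HTTP_URL"])), falling back to the normal id parameter if anything looks wrong.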


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
