Normalization of Strings With String.ToUpperInvariant()

I am currently storing normalized versions of strings in my SQL Server database in lower case. For example, in my Users table, I have a UserName and a LoweredUserName field. Depending on the context, I either use T-SQL’s LOWER() function or C#’s String.ToLower() method to generate the lower case version of the user name to fill the LoweredUserName field. According to Microsoft’s guidelines and Visual Studio’s code analysis rule CA1308, I should be using C#’s String.ToUpperInvariant() instead of ToLower(). According to Microsoft, this is both a performance and globalization issue: converting to upper case is safe, while converting to lower case can cause a loss of information (for example, the Turkish ‘I’ problem).
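To make the globalization concern concrete, here is a minimal sketch of the Turkish 'I' problem. The class name, the helper method names, and the sample string "INFO" are illustrative, not taken from the question. It assumes the runtime has culture data for tr-TR available, which may not hold when .NET runs in invariant globalization mode.

```csharp
using System;
using System.Globalization;

public static class TurkishIDemo
{
    // Culture-sensitive lowering: the result depends on the culture passed in.
    public static string LowerWithCulture(string value, string cultureName)
    {
        return value.ToLower(new CultureInfo(cultureName));
    }

    // Culture-insensitive lowering: the result is the same on every machine.
    public static string LowerInvariant(string value)
    {
        return value.ToLowerInvariant();
    }

    public static void Main()
    {
        const string name = "INFO";

        // Under Turkish casing rules, 'I' lowers to the dotless 'ı' (U+0131),
        // so this comparison is expected to print False when tr-TR data is present.
        Console.WriteLine(LowerWithCulture(name, "tr-TR") == "info");

        // The invariant conversion maps 'I' to 'i' regardless of the current culture.
        Console.WriteLine(LowerInvariant(name) == "info");
    }
}
```

Parameterless String.ToLower() uses the current culture, so the same code can produce different LoweredUserName values on differently configured servers. The invariant methods avoid that regardless of which direction you normalize in.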

If I move to using ToUpperInvariant for string normalization, I will have to change my database schema as well, since my schema is based on Microsoft’s ASP.NET Membership framework (see this related question), which normalizes strings to lower case.

Isn’t Microsoft contradicting itself by telling us to use upper case normalization in C#, while its own code in the Membership tables and procedures uses lower case normalization? Should I switch everything to upper case normalization, or just continue using lower case normalization?

Answers:

Method 1

According to CA1308, the reason to prefer upper case is that a small group of characters cannot make a round trip when they are converted to lower case. The important thing is that you always normalize in one direction, so if your standard is to always convert to lower case, there is no reason to change it.
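One way to guarantee a single direction is to route every normalization through one helper. The sketch below is hypothetical: the UserNameNormalizer class and its methods are not part of the Membership framework. It also assumes the culture-insensitive ToLowerInvariant() rather than the ToLower() mentioned in the question.

```csharp
using System;

public static class UserNameNormalizer
{
    // The only place in the application that decides the normalization direction.
    // Lower case is kept here to match the existing LoweredUserName column.
    public static string Normalize(string userName)
    {
        if (userName == null)
        {
            throw new ArgumentNullException(nameof(userName));
        }

        return userName.ToLowerInvariant();
    }

    // Compares two user names by their normalized forms.
    // The == operator on strings performs an ordinal comparison.
    public static bool IsSameUser(string first, string second)
    {
        return Normalize(first) == Normalize(second);
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("JohnDoe"));
        Console.WriteLine(IsSameUser("JohnDoe", "JOHNDOE"));
    }
}
```

Values are only ever converted toward lower case and never converted back, so the round-trip concern behind CA1308 does not come up. If values are also generated with T-SQL's LOWER(), it is worth checking that both sides agree for the characters you actually store.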

Method 2

To answer your first question: yes, Microsoft is a bit inconsistent. To answer your second question: no, do not switch anything until you have confirmed that this is causing a bottleneck in your application.

Think how much forward progress you can make on your project instead of spending time switching everything. Your development time is much more valuable than the savings you would get from such a change.

Remember:

Premature optimization is the root of all evil (or at least most of it) in programming. – Donald Knuth

Method 3

Continue using lower case normalization. Only change to conform to Microsoft standards if a large issue develops.

This is unfortunate, but worthwhile. Sadly, Microsoft “standards” tend to be poorly considered and somewhat less than consistent; experience with them has shown that unless there is a compelling reason, it’s best to simply stick with what works while it works. Note that this is generally NOT true of non-Microsoft technologies; but the arbitrariness of the Microsoft “standards” makes them worth avoiding.

Edit: I should clarify here; my opinion of Microsoft is very low, from long experience with their standards. As was pointed out in the comments, I don’t have particular references to point out about “everybody else other than Microsoft”; this just comes from my personal experience. Your Mileage May Vary widely. This answer should be considered really just my opinion. Sorry for not making that more clear earlier.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.