How do I strip non alphanumeric characters from a string and loose spaces in C# with Replace?
I want to keep a-z, A-Z, 0-9 and nothing more (not even ” ” spaces).
"Hello there(hello#)".Replace(regex-i-want, "");
should give
"Hellotherehello"
I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", ""); but the spaces remain.
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
In your regex, you have excluded the spaces from being matched (and you haven’t used Regex.Replace() which I had overlooked completely…):
result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");
should work. The + makes the regex a bit more efficient by matching more than one consecutive non-alphanumeric character at once instead of one by one.
If you want to keep non-ASCII letters/digits, too, use the following regex:
@"[^p{L}p{N}]+"
which leaves
BonjourmesélèvesGutenMorgenliebeSchüler
instead of
BonjourmeslvesGutenMorgenliebeSchler
Method 2
You can use Linq to filter out required characters:
String source = "Hello there(hello#)";
// "Hellotherehello"
String result = new String(source
.Where(ch => Char.IsLetterOrDigit(ch))
.ToArray());
Or
String result = String.Concat(source
.Where(ch => Char.IsLetterOrDigit(ch)));
And so you have no need in regular expressions.
Method 3
Or you can do this too:
public static string RemoveNonAlphanumeric(string text)
{
StringBuilder sb = new StringBuilder(text.Length);
for (int i = 0; i < text.Length; i++)
{
char c = text[i];
if (c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
sb.Append(text[i]);
}
return sb.ToString();
}
Usage:
string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ñ $ 123 ٠١٢٣٤");
//text: textLaLalol123
Method 4
The mistake made above was using Replace incorrectly (it doesn’t take regex, thanks CodeInChaos).
The following code should do what was specified:
Regex reg = new Regex(@"[^p{L}p{N}]+");//Thanks to Tim Pietzcker for regex
string regexed = reg.Replace("Hello there(hello#)", "");
This gives:
regexed = "Hellotherehello"
Method 5
And as a replace operation as an extension method:
public static class StringExtensions
{
public static string ReplaceNonAlphanumeric(this string text, char replaceChar)
{
StringBuilder result = new StringBuilder(text.Length);
foreach(char c in text)
{
if(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
result.Append(c);
else
result.Append(replaceChar);
}
return result.ToString();
}
}
And test:
[TestFixture]
public sealed class StringExtensionsTests
{
[Test]
public void Test()
{
Assert.AreEqual("text_LaLa__lol________123______", "text LaLa (lol) á ñ $ 123 ٠١٢٣٤".ReplaceNonAlphanumeric('_'));
}
}
Method 6
var text = "Hello there(hello#)";
var rgx = new Regex("[^a-zA-Z0-9]");
text = rgx.Replace(text, string.Empty);
Method 7
Use following regex to strip those all characters from the string using Regex.Replace
([^A-Za-z0-9s])
Method 8
In .Net 4.0 you can use the IsNullOrWhitespace method of the String class to remove the so called white space characters. Please take a look here http://msdn.microsoft.com/en-us/library/system.string.isnullorwhitespace.aspx
However as @CodeInChaos pointed there are plenty of characters which could be considered as letters and numbers. You can use a regular expression if you only want to find A-Za-z0-9.
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0