Issues with System.Text.Json serializing Unicode characters (like emojis)

I am upgrading an application from .NET Core 2.2 to .NET Core 3.0, and the new System.Text.Json serializer is not behaving the same as Newtonsoft did in 2.2. For characters like a non-breaking space (\u00A0) or emoji, Newtonsoft (and even Utf8Json) serializes them as the actual characters, not as Unicode escape sequences.

I’ve created a simple .NET Fiddle to show this.

var input = new Foo { Bar = "\u00A0 Test !@#$%^&*() 💯\uD83D\uDCAF 你好" };
var newtonsoft = Newtonsoft.Json.JsonConvert.SerializeObject(input);
var system = System.Text.Json.JsonSerializer.Serialize(input, new System.Text.Json.JsonSerializerOptions
    {
        Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping, 
    });
var utf8Json = Utf8Json.JsonSerializer.ToJsonString(input);

Console.WriteLine($"Original: {input.Bar} - {input.Bar.Contains('\u00A0')}"); // Original
Console.WriteLine($"Newtonsoft: {newtonsoft} - {newtonsoft.Contains('\u00A0')}"); // Works
Console.WriteLine($"System.Text.Json: {system} - {system.Contains('\u00A0')}"); // Does not work
Console.WriteLine($"Utf8Json: {utf8Json} - {utf8Json.Contains('\u00A0')}"); // Works

https://dotnetfiddle.net/erCaZl

Is there an Encoder or a JsonSerializerOptions property to serialize like Newtonsoft did?

Answers:

Method 1

This is by design. Our goal is to ship secure defaults, which is why we escape anything that we don’t know for a fact is safe. For practical reasons, we can’t detect all safe characters, because that would mean shipping large tables and performing potentially non-trivial lookups.

If you really insist, you can extend the JavaScriptEncoder class and choose the encoded characters yourself. I would advise against this, because if you’re not careful, people can sneak in payloads that might change the semantics of the JSON.
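
Short of subclassing JavaScriptEncoder, one less invasive way to widen the set of unescaped characters is the built-in JavaScriptEncoder.Create factory, which takes the Unicode ranges to allow. The sketch below is an illustration under assumptions, not the answer author's recommendation: the Foo class is a minimal stand-in for the type in the question, and SerializeWithAllRanges is a hypothetical helper name. It should leave BMP text such as 你好 unescaped, but it likely does not reproduce Newtonsoft's output exactly: the built-in encoders are expected to keep escaping characters outside the BMP (such as the emoji) and some separator characters.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Minimal stand-in for the Foo type used in the question (assumed shape).
public class Foo
{
    public string Bar { get; set; }
}

public static class EncoderSketch
{
    // Hypothetical helper: serializes with an encoder that allows every
    // Unicode range the built-in encoder is willing to emit unescaped.
    public static string SerializeWithAllRanges(Foo value)
    {
        var options = new JsonSerializerOptions
        {
            // Create(params UnicodeRange[]) builds an encoder from allowed ranges.
            Encoder = JavaScriptEncoder.Create(UnicodeRanges.All),
        };
        return JsonSerializer.Serialize(value, options);
    }

    public static void Main()
    {
        var input = new Foo { Bar = "\u00A0 Test 💯 你好" };

        // Inspect the output to see which characters are still escaped.
        Console.WriteLine(SerializeWithAllRanges(input));
    }
}
```

Passing a narrower set of ranges (for example UnicodeRanges.BasicLatin together with UnicodeRanges.CjkUnifiedIdeographs) keeps the allow-list closer to what the application actually needs, which is more in the spirit of the secure-defaults reasoning above.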


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
