I want to insert a record into MySQL that has a non-ASCII Unicode character, but I’m on a terminal that doesn’t let me easily type non-ASCII characters. How do I escape a Unicode literal in MySQL’s SQL syntax?
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
(Bug #10199: “Allow Unicode escape sequence for string literals.”)
This request has been “Open” since 2005. More details in Worklog Task #3529: Unicode Escape Sequences.
though, you can see the following example, which does actually seem to work, but requires you to know the actual byte-by-byte UTF8 encoding:
You can also use the variable-length UTF-8 representation (convenient when, for example, copying from a utf-8 URL-encoded value like %E2%80%98).mysql> select _utf8 x'E28098'; +---+ | ‘ | +---+
This stored function provides the functionality MySQL is (apparently) missing, with a way to turn a literal code point into a character without having to already know the UTF-8 encoding.
VARCHAR(1) seems strange, since utf8 characters in MySQL can be up to 3 bytes long, remember the size of
VARCHAR is characters, not bytes. The function returns a single UTF-8-encoded character from the input value.
For hexadecimal literals, prepend
DELIMITER $$ DROP FUNCTION IF EXISTS `utf8_char` $$ CREATE FUNCTION `utf8_char`(v smallint unsigned) RETURNS VARCHAR(1) CHARSET utf8 NO SQL DETERMINISTIC BEGIN -- http://stackoverflow.com/questions/3632410/mysql-unicode-literals/30675371#30675371 RETURN CHAR(CASE WHEN v <= 0x7F THEN v WHEN v <= 0x7FF THEN 0xC080 | ((v >> 6) << 8) | (v & 0x3F) WHEN v <= 0xFFFF THEN 0xE08080 | (((v >> 12) & 0x0F ) << 16) | (((v >> 6) & 0x3F ) << 8) | (v & 0x3F) ELSE NULL END); END $$ DELIMITER ;
mysql> select utf8_char(8592) AS 'leftwards_arrow'; +-----------------+ | leftwards_arrow | +-----------------+ | ← | +-----------------+ 1 row in set (0.00 sec) mysql> select utf8_char(0x2192) AS 'rightwards_arrow_hex'; +----------------------+ | rightwards_arrow_hex | +----------------------+ | → | +----------------------+ 1 row in set (0.00 sec)
If the goal is to specify the code point instead of the encoded byte sequence (i.e.
0x0F02 instead of the UTF-8
0xE0BC82 for “༂”), then you need to use an encoding in which the code point value just happens to be the encoded byte sequence. For example, “0xE28098″ is the UTF-8 encoded byte sequence for the ” ‘ ” character (as shown in dkamins’s answer), which is code point U+2018. However,
0x2018 is both the code point value for
‘ and the encoded byte sequence for
utf16 (they are effectively the same encoding for BMP characters, but I prefer to use “utf16” as it is consistent with “utf8” and “utf32”, consistent in the “utf” theme). Hence:
returns the same
‘ character as:
utf16 only works for BMP characters (code points U+0000 – U+FFFF) in terms of specifying the code point value. If you want a Supplementary Character (by specifying the code point instead of a specific encoding’s sequence of bytes), then you will need to use the
utf32 encoding. Not only does
_utf32 0x2018 return
To use either UTF-8 or UTF-16 encodings for that same Supplementary Character would require the following:
_utf8mb4 0xF09F91BE _utf16 0xD83DDC7E
HOWEVER, if you are having trouble adding this to a string that is already utf8, then you will need to convert this into
utf8 (or into
utf8mb4 when creating Supplementary Characters as the
utf8 encoding / charset can only handle BMP characters):
CONVERT(_utf32 0x1F47E USING utf8mb4)
Or, using the example character from Michael – sqlbot’s answer:
CONVERT(_utf32 0x2192 USING utf8)
→. Hence, a custom function is not needed in order to create a UTF-8 encoded character from its code point (at least not as of MySQL 8.0). Here is a test query
SELECT _utf32 0x1F47E AS "Supplementary Character in utf32", CONVERT(_utf32 0x1F47E USING utf8mb4) AS "Supplementary Character in utf8mb4", CHARSET(CONVERT(_utf32 0x1F47E USING utf8mb4)) AS "Proof", "---" AS "---", _utf32 0x2192 AS "BMP character in utf32", CONVERT(_utf32 0x2192 USING utf8) AS "BMP character in utf8", CHARSET(CONVERT(_utf32 0x2192 USING utf8)) AS "Proof";
And you can see it working on db<>fiddle (might not work in pre-8.0 MySQL).
For more details on these options, plus Unicode escape sequences for other languages and platforms, please see my post: