How to un-escape a backslash-escaped string?

Suppose I have a string which is a backslash-escaped version of another string. Is there an easy way, in Python, to unescape the string? I could, for example, do:

>>> escaped_str = '"Hello,\\nworld!"'
>>> raw_str = eval(escaped_str)
>>> print raw_str
Hello,
world!
>>>

However, that involves passing a (possibly untrusted) string to eval(), which is a security risk. Is there a function in the standard lib that takes a string and produces the unescaped string with no security implications?

Answers:

Method 1

>>> print '"Hello,\\nworld!"'.decode('string_escape')
"Hello,
world!"

Method 2

You can use ast.literal_eval, which is safe. From its documentation:

Safely evaluate an expression node or a string containing a Python
expression. The string or node provided may only consist of the
following Python literal structures: strings, numbers, tuples, lists,
dicts, booleans, and None.

Like this:

>>> import ast
>>> escaped_str = '"Hello,\\nworld!"'
>>> print ast.literal_eval(escaped_str)
Hello,
world!
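
For Python 3 readers, here is a minimal sketch of the same idea. It also shows that an expression which is not a plain literal is rejected rather than executed. The `malicious` string is a made-up example, not something from the question.

```python
import ast

# The escaped text includes its surrounding quotes, so it is itself a
# valid Python string literal.
escaped_str = '"Hello,\\nworld!"'
raw_str = ast.literal_eval(escaped_str)
print(raw_str)

# Hypothetical hostile input: a function call, not a literal.
malicious = '__import__("os").getcwd()'
try:
    ast.literal_eval(malicious)
    rejected = False
except (ValueError, SyntaxError):
    # literal_eval refuses anything beyond literal structures.
    rejected = True
print(rejected)
```

Note that this only works when the input already carries its own quotes, as in the question.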

Method 3

The other answers break on general Unicode strings. The following works for Python 3 in all cases, as far as I can tell:

from codecs import encode, decode
sample = u'mon€y\\nröcks'
result = decode(encode(sample, 'latin-1', 'backslashreplace'), 'unicode-escape')
print(result)

In recent Python versions, this also works without the import:

sample = u'mon€y\\nröcks'
result = sample.encode('latin-1', 'backslashreplace').decode('unicode-escape')

As outlined in the comments, you can also use the literal_eval method from the ast module like so:

import ast
sample = u'mon€y\\nröcks'
print(ast.literal_eval(f'"{sample}"'))

Or like this when your string really contains a string literal (including the quotes):

import ast
sample = u'"mon€y\\nröcks"'
print(ast.literal_eval(sample))

However, if you are uncertain whether the input string uses double or single quotes as delimiters, or when you cannot assume it to be properly escaped at all, then literal_eval may raise a SyntaxError while the encode/decode method will still work.
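
To make that difference concrete, here is a small sketch. The `tricky` input (with an unescaped double quote in the middle) and the helper name `unescape_via_codecs` are made up for illustration.

```python
import ast

def unescape_via_codecs(text):
    # The encode/decode approach from this answer.
    return text.encode('latin-1', 'backslashreplace').decode('unicode-escape')

# Hypothetical input: unescaped double quotes plus an escaped newline.
tricky = 'say "hi"\\n'

try:
    # Wrapping in double quotes yields an invalid literal here.
    ast.literal_eval(f'"{tricky}"')
    literal_eval_ok = True
except (SyntaxError, ValueError):
    literal_eval_ok = False

print(literal_eval_ok)
print(repr(unescape_via_codecs(tricky)))
```

Here literal_eval fails because the inner quotes end the literal early, while the codec round trip does not care about quoting at all.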

Method 4

In Python 3, str objects don’t have a decode method, so you have to go through a bytes object. ChristopheD’s answer covers Python 2.

# create a `bytes` object from a `str`
my_str = "Hello,\\nworld"
# (pick an encoding suitable for your str, e.g. 'latin1')
my_bytes = my_str.encode("utf-8")

# or directly
my_bytes = b"Hello,\\nworld"

print(my_bytes.decode("unicode_escape"))
# Hello,
# world
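
The choice of encoding matters once the string has non-ASCII characters, because the unicode_escape decoder reads every non-escape byte as Latin-1. A quick sketch with a made-up sample string:

```python
# Hypothetical sample: "röcks" followed by a backslash and the letter n.
escaped = 'röcks\\n'

# UTF-8 turns "ö" into two bytes, which unicode_escape then decodes
# as two separate Latin-1 characters.
via_utf8 = escaped.encode('utf-8').decode('unicode_escape')

# Latin-1 keeps "ö" as a single byte, so it survives the round trip.
via_latin1 = escaped.encode('latin-1').decode('unicode_escape')

print(repr(via_utf8))
print(repr(via_latin1))
```

Plain 'latin-1' in turn raises UnicodeEncodeError for characters outside Latin-1 such as "€", which is why the other answers add 'backslashreplace' or use 'raw_unicode_escape'.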

Method 5

For Python 3, consider:

my_string.encode('raw_unicode_escape').decode('unicode_escape')

The 'raw_unicode_escape' codec encodes to latin1, but first replaces all other Unicode code points with an escaped '\uXXXX' or '\UXXXXXXXX' form. Importantly, it differs from the normal 'unicode_escape' codec in that it does not touch existing backslashes.

So when the normal 'unicode_escape' decoder is applied, both the newly-escaped code points and the originally-escaped elements are treated equally, and the result is an unescaped native Unicode string.

(The 'raw_unicode_escape' decoder appears to pay attention only to the '\uXXXX' and '\UXXXXXXXX' forms, ignoring all other escapes.)
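
As a rough sketch of the two steps, assuming a sample string borrowed from Method 3 (the `unescape` helper name is made up here):

```python
def unescape(text):
    """Hypothetical helper wrapping the one-liner from this answer."""
    return text.encode('raw_unicode_escape').decode('unicode_escape')

# "€" is outside Latin-1, "ö" is inside it, and "\\n" is an escaped newline.
sample = 'mon€y\\nröcks'

# Step 1: "€" becomes a backslash-u escape, the existing backslash is kept.
intermediate = sample.encode('raw_unicode_escape')
print(intermediate)

# Step 2: every escape, old or new, is decoded in one pass.
result = unescape(sample)
print(result)
```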

Documentation:
https://docs.python.org/3/library/codecs.html?highlight=codecs#text-encodings

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
