In a regular expression, which characters need escaping?

In general, which characters in a regular expression need escaping?

For example, the following is not syntactically correct:

echo '[]' | grep '[]'
grep: Unmatched [ or [^

This, however, is syntactically correct:

echo '[]' | grep '\[]'
[]

Is there any documentation on which characters should be escaped in a regular expression, and which should not?

Answers:


Method 1

This depends on the application. In your example [ must be quoted as an argument for grep but not echo.

For the shell (from the POSIX specs):

Quoting is used to remove the special meaning of certain characters or
words to the shell. Quoting can be used to preserve the literal
meaning of the special characters in the next paragraph, prevent
reserved words from being recognized as such, and prevent parameter
expansion and command substitution within here-document processing
(see Here-Document).

The application shall quote the following characters if they are to
represent themselves:

|  &  ;  <  >  (  )  $  `  \  "  '  <space>  <tab>  <newline>

and the following may need to be quoted under certain circumstances.
That is, these characters may be special depending on conditions
described elsewhere in this volume of IEEE Std 1003.1-2001:

*   ?   [   #   ~   =   %

The various quoting mechanisms are the escape character,
single-quotes, and double-quotes. The here-document represents
another form of quoting; see Here-Document.

Specific programs (using regexes, perl, awk) could have additional requirements on escaping.
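The three quoting mechanisms named in the quoted text can be compared side by side. This is a minimal sketch assuming a POSIX-compatible shell; the variable names are illustrative only.

```shell
# Three ways to pass the literal text  a*b $HOME  as a single argument.
with_backslash=$(printf '%s' a\*b\ \$HOME)   # escape character before each special character
with_single=$(printf '%s' 'a*b $HOME')       # single quotes: everything inside is literal
with_double=$(printf '%s' "a*b \$HOME")      # double quotes: $ still needs a backslash

printf '%s\n' "$with_backslash" "$with_single" "$with_double"
```

All three variables should end up holding the same string, which shows that the mechanisms differ only in how much the shell still interprets inside them.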

Method 2

There are multiple types of regular expressions, and the set of special characters depends on the particular type. Some of them are described below. In all cases, special characters are escaped with a backslash \. E.g. to match [ you write \[ instead. Alternatively, the characters (except ^) can be escaped by enclosing them in square brackets one by one, like [[].

Characters that are special only in some contexts, such as ^ (special at the beginning of a (sub-)expression), can be escaped in all contexts.

As others wrote: in the shell, if you do not enclose the expression in single quotes, you must additionally escape the shell's special characters in the already escaped regex. Example: instead of '\[' you can write \\[ (alternatively: "\[" or "\\[") in Bourne-compatible shells like bash, but this is another story.
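As a rough illustration of the two layers of escaping, the sketch below passes grep a pattern for a literal [ in four different spellings. It assumes a Bourne-compatible shell with default globbing options; the variable names are made up for the example.

```shell
# Each command hands grep a regex that matches a literal [ character.
input='a[b'
m1=$(printf '%s\n' "$input" | grep -c '\[')    # backslash escape, protected by single quotes
m2=$(printf '%s\n' "$input" | grep -c '[[]')   # bracket expression containing only [
m3=$(printf '%s\n' "$input" | grep -c \\[)     # unquoted: the shell reduces \\ to \
m4=$(printf '%s\n' "$input" | grep -c "\[")    # double quotes keep the backslash before [

printf '%s %s %s %s\n' "$m1" "$m2" "$m3" "$m4"
```

Each count should be 1, since every spelling reaches grep as either \[ or [[].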

Basic Regular Expressions (BRE)

  • POSIX: Basic Regular Expressions
  • Commands: grep, sed
  • Special characters: .[\
  • Special in some contexts: *^$
  • Escape a string: "$(printf '%s' "$string" | sed 's/[.[\*^$]/\\&/g')"
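The BRE escaping substitution can be wrapped in a small function, written here with its backslashes in place. This is only a sketch: escape_bre is a hypothetical helper name and the sample strings are invented.

```shell
# escape_bre (hypothetical name): prefix every BRE special character with a backslash.
escape_bre() {
    printf '%s' "$1" | sed 's/[.[\*^$]/\\&/g'
}

string='a.b*c[1]'
pattern=$(escape_bre "$string")
printf '%s\n' "$pattern"

# With the escaped pattern only the identical line matches;
# the unescaped string used as a regex would match axbbc1 instead.
count=$(printf '%s\n' 'a.b*c[1]' 'axbbc1' | grep -c "$pattern")
printf '%s\n' "$count"
```

Inside the bracket expression the backslash is an ordinary character, which is why a single \ is enough there while the replacement needs \\ to emit one.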

Extended Regular Expressions (ERE)

  • POSIX: Extended Regular Expressions
  • Commands: grep -E, sed -E (old GNU versions: sed -r)
  • Special characters: .[\(
  • Special in some contexts: *^$)+?{|
  • Escape a string: "$(printf '%s' "$string" | sed 's/[.[\(*^$)+?{|]/\\&/g')"
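To see why the extra ERE characters matter, the sketch below compares a raw string and its escaped form as patterns for grep -E. The function name escape_ere and the sample data are assumptions made for the example.

```shell
# escape_ere (hypothetical name): prefix every ERE special character with a backslash.
escape_ere() {
    printf '%s' "$1" | sed 's/[.[\(*^$)+?{|]/\\&/g'
}

string='f(x)+1'
pattern=$(escape_ere "$string")

# Raw string as an ERE: ( ) group and + repeats, so it matches fx1, not f(x)+1.
raw_hits=$(printf '%s\n' 'f(x)+1' 'fx1' | grep -E "$string")
# Escaped string: every character is literal, so it matches f(x)+1 only.
escaped_hits=$(printf '%s\n' 'f(x)+1' 'fx1' | grep -E "$pattern")

printf 'raw: %s\nescaped: %s\n' "$raw_hits" "$escaped_hits"
```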

Method 3

Each application will have its own set of ‘special’ characters. The issue that you ran into was with grep not the shell. For which characters need to be quoted in grep, read the manpage’s section on “REGULAR EXPRESSIONS”.

For the shell, the characters that should be quoted are:

;'"`#$&*?[]<>{}\

and any whitespace.

Depending on the shell, other characters may need to be quoted as well:

!^%

Look under “SHELL GRAMMAR” on the shell’s manpage.

Method 4

grep uses BRE as its default regex syntax. There is good documentation on it here; a general rundown would be “escape any special character or metacharacter to get its literal meaning, and use a backslash to create escape sequences (\n, \r, etc.)”. This is not always true, though: for example, in BRE you have to escape ( and ) as \( and \) to get their special meaning (grouping, which backreferences rely on).
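The reversed role of the backslash for parentheses can be checked directly. A minimal sketch, with invented sample lines and variable names:

```shell
# BRE: \( \) form a group and \1 refers back to it; bare ( ) are literal.
doubled=$(printf '%s\n' 'abab' 'abcd' | grep -c '\(ab\)\1')
literal_bre=$(printf '%s\n' 'f(x)' 'fx' | grep -c '(x)')

# ERE (grep -E): the roles are swapped, so \( \) are the literal parentheses.
literal_ere=$(printf '%s\n' 'f(x)' 'fx' | grep -E -c '\(x\)')

printf '%s %s %s\n' "$doubled" "$literal_bre" "$literal_ere"
```

In each case only one of the two input lines should match.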

Method 5

The shell may transform the command line before executing the command. Both the shell and grep use quoting to remove the special meaning of some characters, but they have different sets of special characters. Moreover, before the command is executed, the shell removes unquoted quoting characters (backslashes and quotes) that did not themselves result from an expansion.

echo '[]' | grep '[]'

The shell transmits the argument [] to grep and it is parsed as a malformed bracket expression by grep.

echo '[]' | grep \[]

Above, we can see a similar case. The backslash is removed and [] is transmitted as argument to grep. grep recognizes a malformed bracket expression.

echo '[]' | grep '\[]'

Finally, in this case, the quotes are removed by the shell and \[] is transmitted as argument to grep but, in this specific case ¹, \[ is interpreted by grep as a literal bracket. Quotes are needed to prevent the interpretation of the backslash as a special character by the shell.


¹ POSIX specification.
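One way to check what the shell actually hands to grep is to give the same word to printf instead. The sketch below assumes a POSIX-compatible shell; the variable names are illustrative.

```shell
# printf '%s' prints its argument exactly as the command receives it.
arg_quoted=$(printf '%s' '[]')       # quotes removed, grep would see []
arg_backslash=$(printf '%s' \[])     # backslash removed, grep would see []
arg_both=$(printf '%s' '\[]')        # quotes removed, backslash kept: \[]

printf '%s\n' "$arg_quoted" "$arg_backslash" "$arg_both"

# Only the last form reaches grep as a pattern for a literal [ followed by ].
matches=$(printf '%s\n' '[]' | grep -c '\[]')
printf '%s\n' "$matches"
```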


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
