Search and replace strings that are not substrings of other strings

I have a list of replacements like so:

search_and -> replace
big_boy -> bb
little_boy -> lb
good_dog -> gd
...

I need to make replacements for the above, but at the same time avoid matching strings that are longer like these:

big_boys
good_little_boy

I tried this:

sed -i -r "s/(\W)${search}(\W)/\1${replacement}\2/g"

But the above does not work when the string (“good_dog” in this case) occurs at the end of a line. The first line below is the input, the second is the result:

Mary had a 'little_boy', good_little_boy, $big_boy, big_boys and good_dog

Mary had a 'lb', good_little_boy, $bb, big_boys and good_dog

And I doubt the above would work when the string occurs at the start of the line too. Is there a good way to do the search and replacement?

Answers:

Method 1

If you’re using GNU sed (which bare -i suggests you are), there is a “word boundary” escape \b:

sed -i "s/\b$SEARCH\b/$REPLACE/g"

\b matches exactly on a word boundary: the character to one side is a “word” character, and the character to the other is not. The start and end of a line count as boundaries too when a word character sits next to them. It is a zero-width match, so you don’t need to use capturing subgroups to keep the surrounding characters with \1 and \2. There is also \B, which is exactly the opposite.
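
As a quick check of the \b approach, the sketch below runs the three replacements from the question against the sample line. It assumes GNU sed (for \b); the variable names line and result are made up for illustration.

```shell
#!/bin/sh
# Sample line from the question (\$ keeps the shell from expanding $big_boy).
line="Mary had a 'little_boy', good_little_boy, \$big_boy, big_boys and good_dog"

# One -e expression per replacement; \b requires GNU sed.
result=$(printf '%s\n' "$line" | sed \
  -e 's/\bbig_boy\b/bb/g' \
  -e 's/\blittle_boy\b/lb/g' \
  -e 's/\bgood_dog\b/gd/g')

printf '%s\n' "$result"
```

The underscore counts as a word character, so little_boy inside good_little_boy has no boundary in front of it and is left alone, while good_dog at the end of the line is replaced.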


If you’re not using GNU sed, you can use alternation with the start and end of line in your capturing subpatterns: (\W|^). That will match either a non-word character or the start of a line, and (\W|$) will match either a non-word character or the end of a line. In that case you still use \1 and \2 as you were. Some non-GNU seds do support \b anyway, at least in an extended mode, so it’s worth giving that a try regardless.
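
A minimal sketch of the alternation approach, assuming a sed that accepts -E for extended regular expressions. Because \W is itself a GNU extension, the sketch spells it out as the bracket expression [^[:alnum:]_]; the SEARCH and REPLACE values are example data, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical search/replacement pair, chosen for illustration.
SEARCH='good_dog'
REPLACE='gd'
line="Mary had a 'little_boy', good_little_boy, \$big_boy, big_boys and good_dog"

# (^|[^[:alnum:]_]) stands in for (\W|^) and ([^[:alnum:]_]|$) for (\W|$).
# \$ passes a literal $ (end-of-line anchor) through the double quotes.
result=$(printf '%s\n' "$line" |
  sed -E "s/(^|[^[:alnum:]_])$SEARCH([^[:alnum:]_]|\$)/\1$REPLACE\2/g")

printf '%s\n' "$result"
```

One caveat of this form: the separator characters are consumed by the match, so two occurrences separated by a single character (for example "good_dog good_dog") are not both replaced in one pass. The zero-width \b does not have that limitation.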

Method 2

If you want something more portable, you can use \< and \>:

sed -i "s/\<$SEARCH\>/$REPLACE/g" file

\< and \> work in gsed, ssed, sed15, sed16, sedmod.

\b and \B work in gsed only.

On Mac OS X, you must use this syntax:

sed -i '' -e "s/[[:<:]]$SEARCH[[:>:]]/$REPLACE/g" file
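
If the same script has to run on both kinds of sed, one option is to probe which word-boundary syntax the local sed accepts and pick the markers accordingly. This is only a sketch: the variable names wb_start and wb_end are invented here, and the probe is an assumption about how an unsupported \< behaves (it is expected either to fail or to not match).

```shell
#!/bin/sh
# Hypothetical pair, for illustration only.
SEARCH='big_boy'
REPLACE='bb'

# Probe: does this sed understand \< and \> as word boundaries?
if [ "$(printf 'a\n' | sed 's/\<a\>/b/' 2>/dev/null)" = "b" ]; then
  wb_start='\<'; wb_end='\>'
else
  # Fall back to the bracket forms used by BSD/macOS sed.
  wb_start='[[:<:]]'; wb_end='[[:>:]]'
fi

result=$(printf '%s\n' 'big_boys $big_boy big_boy' |
  sed "s/${wb_start}${SEARCH}${wb_end}/${REPLACE}/g")

printf '%s\n' "$result"
```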

Method 3

You could also use perl, which should support \b on all platforms. Assuming your list of replacements is in the format you show (separated by ->), you could do:

perl -F'\s*->\s*' -ane 'next unless @F == 2; chomp $F[1]; $rep{$F[0]} = $F[1];
                  END{open(A, "<", "file") or die "file: $!";
                    while(my $line = <A>){
                        $line =~ s/\b\Q$_\E\b/$rep{$_}/g for keys(%rep);
                        print $line
                    }
                  }' replacements

Explanation

  • The -a makes perl run like awk, automatically splitting fields into the array @F so $F[0] is the 1st field, $F[1] the second etc. The -F sets the input field separator, just like -F in awk; here it is the pattern \s*->\s*, so the spaces around -> do not end up in the fields. The -n means “read the input file, line by line and apply the script given by -e to each line”.
  • next unless @F == 2 : skips lines that do not split into exactly two fields, such as blank lines.
  • chomp $F[1] : removes the newline (\n) from the end of the second field.
  • $rep{$F[0]} = $F[1]; : this populates the hash %rep making the pattern to be replaced (the first field, $F[0]) the key and the replacement ($F[1]) the value.
  • END{} : this is executed after the input file (replacements) has been read.
  • open(A, "<", "file") : open the file file for reading with filehandle A.
  • while (my $line = <A>) : read the file line by line into $line.
  • $line =~ s/// for keys(%rep) : this will iterate through all the keys of the %rep hash, saving each key as the special variable $_. The s/// is the substitution operator, applied to $line, and is making the same substitution as explained in Method 1. The \Q...\E around the key makes any regex metacharacters in it match literally.

You could also read through the replacements list and run sed once per entry, as shown in the other methods (both \t in the replacement and \b assume GNU sed):

$ sed 's/ *-> */\t/' replacements |
    while IFS=$'\t' read -r from to; do sed -i "s/\b$from\b/$to/g" file; done
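
The loop above rewrites the file once per replacement. A possible variation, sketched here under the assumption that neither side of a replacement contains whitespace, slashes or regex metacharacters, is to turn the list into a sed script and apply it in one pass. The scratch directory, the sample data and the name script.sed are made up for the example; \b again assumes GNU sed.

```shell
#!/bin/sh
# Work in a scratch directory with sample data taken from the question.
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > replacements <<'EOF'
big_boy -> bb
little_boy -> lb
good_dog -> gd
EOF
printf '%s\n' "Mary had a 'little_boy', good_little_boy, \$big_boy, big_boys and good_dog" > file

# Turn each "from -> to" line into an s/\bfrom\b/to/g command.
while IFS= read -r entry; do
  case $entry in
    *'->'*) ;;          # looks like a replacement line
    *) continue ;;      # skip anything else, e.g. blank lines
  esac
  from=$(printf '%s' "${entry%%->*}" | tr -d '[:space:]')
  to=$(printf '%s' "${entry#*->}" | tr -d '[:space:]')
  printf 's/\\b%s\\b/%s/g\n' "$from" "$to"
done < replacements > script.sed

# Apply every substitution in a single pass over the file.
result=$(sed -f script.sed file)
printf '%s\n' "$result"
```

To edit the file in place rather than print the result, the last sed call could be written as sed -i -f script.sed file.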


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
