Replace strings in a file based on a list of strings and a list of corresponding replacements

I am trying to replace strings in a file A:

Hello Peter, how is your dad? where is mom?

where the strings to be replaced are in file B:

Peter
dad
mom

and their corresponding replacements are in file C:

John
wife
grandpa

Expected outcome:

Hello John, how is your wife? where is grandpa?

Can I edit file A, replacing the value in file B by using the value from the corresponding line in file C?

What I have done so far:

 cat 1.txt | sed -e "s/$(sed 's:/:\\/:g' 2.txt)/$(sed 's:/:\\/:g' 3.txt)/" > 4.txt

It works if there is only one line in file B and file C. If there is more than one line, it doesn’t work.

Answers:

Method 1

The easiest way to do this with sed is to process those two lists and turn them into a script-file e.g.

s/line1-from-fileB/line1-from-fileC/g
s/line2-from-fileB/line2-from-fileC/g
....................................
s/lineN-from-fileB/lineN-from-fileC/g

that sed will then execute, editing fileA. The proper way is to process the LHS/RHS first and escape any special characters that may appear on those lines, then join the LHS and RHS adding the s, the delimiters / and the g (e.g. with paste) and pipe the result to sed:

paste -ds///g /dev/null /dev/null \
<(sed 's|[[\.*^$/]|\\&|g' fileB) <(sed 's|[\&/]|\\&|g' fileC) \
/dev/null /dev/null | sed -f - fileA

So there it is: one paste and three seds that will process each file only once, regardless of the number of lines.
This assumes that your shell supports process substitution and that your sed can read a script-file from stdin. Also, it doesn’t edit in-place (I’ve left out the -i switch as it’s not supported by all seds).
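
If process substitution or sed -f - is not available, the same idea can be sketched with temporary files. This is only an illustration: the sample data, the temporary directory and the names lhs, rhs and script.sed are assumptions made for the example, not part of the answer above.

```shell
# Sketch: build the sed script from fileB/fileC using temp files only.
tmp=$(mktemp -d)
printf '%s\n' 'Hello Peter, ratio 1.5/2 vs 1x5/2' > "$tmp/fileA"
printf '%s\n' 'Peter' '1.5/2' > "$tmp/fileB"
printf '%s\n' 'John' 'a&b' > "$tmp/fileC"

# Escape characters that are special on the LHS (regex) and on the RHS.
sed 's|[[\.*^$/]|\\&|g' "$tmp/fileB" > "$tmp/lhs"
sed 's|[\&/]|\\&|g' "$tmp/fileC" > "$tmp/rhs"

# Join LHS and RHS with "/", then wrap each line as s/LHS/RHS/g.
paste -d/ "$tmp/lhs" "$tmp/rhs" | sed 's|^|s/|; s|$|/g|' > "$tmp/script.sed"

script=$(cat "$tmp/script.sed")
result=$(sed -f "$tmp/script.sed" "$tmp/fileA")

printf '%s\n' "$script"   # the generated script-file
printf '%s\n' "$result"   # fileA after all replacements
rm -rf "$tmp"
```

Because the dot and the slash in 1.5/2 are escaped, only the literal string is replaced and 1x5/2 is left alone.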

Method 2

If you want the replacements to be done independently of each other, for instance for:

foo -> bar
bar -> foo

Applied on

foobar

To result in:

barfoo

as opposed to foofoo as a naive s/foo/bar/g; s/bar/foo/g translation would do, you could do:

perl -pe '
  BEGIN{
    open STRINGS, "<", shift@ARGV or die"STRINGS: $!";
    open REPLACEMENTS, "<", shift@ARGV or die "REPLACEMENTS: $!";
    while (defined($a=<STRINGS>) and defined($b=<REPLACEMENTS>)) {
      chomp ($a, $b);
      push @repl, $b;
      push @re, "$a(?{\$repl=\$repl[" . $i++. "]})"
    }
    eval q($re = qr{) . join("|", @re) . "}";
  }
  s/$re/$repl/g' strings.txt replacements.txt fileA

This expects Perl regexps in strings.txt. Since Perl regexps can execute arbitrary code, it’s important that they be sanitized. If you want to replace fixed strings only, you can change that to:

perl -pe '
  BEGIN{
    open PATTERNS, "<", shift@ARGV or die"PATTERNS: $!";
    open REPLACEMENTS, "<", shift@ARGV or die "REPLACEMENTS: $!";
    for ($i = 0; defined($a=<PATTERNS>) and defined($b=<REPLACEMENTS>); $i++) {
      chomp ($a, $b);
      push @string, $a;
      push @repl, $b;
      push @re, "\\Q\$string[$i]\\E(?{\$repl=\$repl[$i]})"
    }
    eval q($re = qr{) . join("|", @re) . "}";
  }
  s/$re/$repl/g' patterns.txt replacements.txt fileA

Method 3

In the simple example you show where each of the target words appears only once in the file, you could simply do:

$ paste fileB fileC | while read a b; do sed -i "s/$a/$b/" fileA; done
$ cat fileA
Hello John, how is your wife? where is grandpa?

The paste command will print the data from both files combined:

$ paste fileB fileC
Peter   John
dad wife
mom grandpa

We pass this through a simple while read loop which will iterate over every line, saving the value from fileB as $a and that of fileC as $b. Then, the sed command will replace the first occurrence of $a with $b. This is repeated three times.
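
The "first occurrence" behaviour is easy to check on a line where a target word appears twice. The sample line below is made up for the illustration; adding the g flag is what makes sed replace every occurrence on the line.

```shell
# Sketch: s/// without g replaces only the first match on each line.
line='mom and dad; mom again'

first=$(printf '%s\n' "$line" | sed 's/mom/grandpa/')
all=$(printf '%s\n' "$line" | sed 's/mom/grandpa/g')

printf '%s\n' "$first"   # grandpa and dad; mom again
printf '%s\n' "$all"     # grandpa and dad; grandpa again
```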

This approach is fine if you know that your target words only appear once in the file (they have to, otherwise, you’ll need to provide more details that we can use to identify which occurrence should be replaced) and if your files are tiny, like what you showed. For larger files, this will take a long time and is very inefficient since it will need to be run once for every pair of words.

So, if you have larger files, you might want something like this instead:

paste fileB fileC |
    perl -lane '$words{$F[0]}=$F[1]}
        END{open(A,"fileA"); while($line=<A>){chomp $line; $line=~s/$_/$words{$_}/ for keys %words; print $line}'

Method 4

Using xargs, paste, and sed commands:

xargs -a <(paste -d'/' fileB fileC) -L1 -I @ sed -i "s/@/g" fileA

This will process fileA N times where N is the number of lines in fileB or fileC.
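
To preview what xargs will run, one option is to put echo in front of sed so the commands are printed instead of executed. This sketch feeds xargs from a pipe rather than with -a and process substitution; the sample files in a temporary directory are assumptions for the illustration.

```shell
# Sketch: print the sed commands that xargs would run, one per line pair.
tmp=$(mktemp -d)
printf '%s\n' Peter dad mom > "$tmp/fileB"
printf '%s\n' John wife grandpa > "$tmp/fileC"

# paste emits Peter/John etc.; -I @ substitutes each line into "s/@/g".
cmds=$(paste -d/ "$tmp/fileB" "$tmp/fileC" | xargs -I @ echo sed -i "s/@/g" fileA)

printf '%s\n' "$cmds"
rm -rf "$tmp"
```

Each printed line is a separate sed invocation, which is why fileA is read and rewritten N times.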

Method 5

The solution I’ve created is not very short, but it is simple enough to be very readable, unless your task was to do the whole thing with sed?

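 #!/usr/bin/bash

 # Work on a copy so that A.txt stays untouched.
 cp A.txt D.txt

 x=1
 # Number of search strings (lines in B.txt).
 length=$(wc -l B.txt | sed 's/ .*//g')

 # Use -gt rather than -eq so that the last line is processed too.
 until [ "$x" -gt "$length" ]; do

    # Line x of B.txt is the search string, line x of C.txt its replacement.
    Bx=$(awk "NR==$x" B.txt)
    Cx=$(awk "NR==$x" C.txt)

    sed -i "s/$Bx/$Cx/g" D.txt

    x=$((x+1))

 done

 # Clean up stray temporary files from sed -i (see the note below).
 rm -f ./sed*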

Note that this script creates a ton of junk files if B.txt is longer than C.txt, and perhaps vice versa (I didn’t test it that far).
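
One way to sidestep the length mismatch is to read both files in lockstep and stop at the end of the shorter one. The following is only a sketch under assumptions: it uses file descriptors 3 and 4, collects the substitutions into one sed script instead of calling sed -i per line, and does not escape special characters in the strings. The sample data is made up so that B.txt is one line longer than C.txt.

```shell
# Sketch: read B.txt and C.txt in parallel; stop when either runs out.
tmp=$(mktemp -d)
printf '%s\n' 'Hello Peter, how is your dad? where is mom?' > "$tmp/A.txt"
printf '%s\n' Peter dad mom extra > "$tmp/B.txt"   # one line longer than C.txt
printf '%s\n' John wife grandpa > "$tmp/C.txt"

script=
count=0
while IFS= read -r b <&3 && IFS= read -r c <&4; do
    # Append one s/old/new/g command per line pair (no escaping done here).
    script="${script}s/$b/$c/g
"
    count=$((count + 1))
done 3< "$tmp/B.txt" 4< "$tmp/C.txt"

# A single sed run applies every collected substitution.
result=$(sed -e "$script" "$tmp/A.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The unpaired line "extra" is simply ignored, and the input file is processed once rather than once per pair.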

Method 6

This might help solve your problem.
(Refer: https://unix.stackexchange.com/questions/283017/awk-command-i-want-to-compare-two-rows-in-two-files-and-update-the-second-file-i)

Source.txt has the following two lines:

OldString
NewString

Before the command is executed, Target.txt has the following lines:

OldString ==> NewString
This is Target File containing OldString now.
OldString is to be replaced.
NewString won't get impacted.

Use:

awk -v lookupStr="$(awk 'NR==1' Source.txt)" -v replacementStr="$(awk 'NR==2' Source.txt)" 'NR==2 && (idx=index($0,lookupStr)) { $0=substr($0,1,idx-1) replacementStr substr($0,idx+length(lookupStr)) } 1' Target.txt > temp.txt && mv temp.txt Target.txt

After the command is executed, Target.txt has the following lines:

OldString ==> NewString
This is Target File containing NewString now.
OldString is to be replaced.
NewString won't get impacted.

Here I have defined two variables, lookupStr and replacementStr, which are assigned line 1 and line 2 of Source.txt respectively.
Then, in the second line of Target.txt, I replace the content of $0 with the characters from the start of the line up to the index of lookupStr (i.e. “OldString”), followed by replacementStr (i.e. “NewString”), followed by the rest of the characters. Finally, the output is written to temp.txt, which is then renamed to Target.txt.

If you need to do this replacement in the entire file, just remove the condition NR==2 && from the above command. It will then replace the first occurrence on every line.
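
To replace every occurrence on a line rather than only the first, the same index/substr idea can be put in a loop. This is a sketch, not part of the answer above: the variable names old and new and the sample input are assumptions, old must be non-empty, and awk still interprets backslash escapes in values passed with -v.

```shell
# Sketch: literal (non-regex) replacement of all occurrences with awk.
input='a.b a.b axb'

result=$(printf '%s\n' "$input" |
  awk -v old='a.b' -v new='X' '
    {
      out = ""
      # Consume the line left to right, copying text before each match.
      while ((idx = index($0, old)) > 0) {
        out = out substr($0, 1, idx - 1) new
        $0 = substr($0, idx + length(old))
      }
      print out $0
    }')

# For comparison: sed treats the dot as "any character".
regex=$(printf '%s\n' "$input" | sed 's/a.b/X/g')

printf '%s\n' "$result"   # X X axb
printf '%s\n' "$regex"    # X X X
```

Because index() compares plain strings, no escaping of regex characters is needed for the search string.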


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
