Sed — Replace first k instances of a word in the file

Method 1

The first section below describes using sed to change the first k occurrences on a line. The second section extends this approach to change only the first k occurrences in a file, regardless of which line they appear on.

Line-oriented solution

With standard sed, there is a command to replace the k-th occurrence of a word on a line. If k is 3, for example:

sed 's/old/new/3'

Or, one can replace all occurrences with:

sed 's/old/new/g'

Neither of these is what you want.

GNU sed offers an extension that will change the k-th occurrence and all after that. If k is 3, for example:

sed 's/old/new/g3'

These can be combined to do what you want. To change the first 3 occurrences:

$ echo old old old old old | sed -E 's/\<old\>/\n/g4; s/\<old\>/new/g; s/\n/old/g'
new new new old old

where \n is useful here because we can be sure that it never occurs on a line.

Explanation:

We use three sed substitution commands:

s/\<old\>/\n/g4
This is the GNU extension to replace the fourth and all subsequent occurrences of old with \n.

The regex feature \< is used to match the beginning of a word and \> to match the end of a word. This ensures that only complete words are matched. Both are GNU extensions; the -E option to sed selects extended regex syntax.
s/\<old\>/new/g
Only the first three occurrences of old remain, and this replaces them all with new.
s/\n/old/g
The fourth and all remaining occurrences of old were replaced with \n in the first step. This returns them to their original state.
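
The three steps generalize to any k: only the number after g in the first substitution changes, and it must be k+1. A minimal sketch, assuming GNU sed and a search word and replacement that contain no characters special to sed. The function name and argument order are hypothetical:

```shell
# first_k_per_line K OLD NEW  (hypothetical helper; reads stdin)
# Assumes GNU sed: relies on the combined "g<N>" flag, \< \> and \n in the replacement.
# Assumes OLD and NEW contain no sed metacharacters such as / & . *
first_k_per_line() {
    # Step 1: park occurrence K+1 and later as newlines; step 2: replace what is left;
    # step 3: turn the parked newlines back into OLD.
    sed -E "s/\\<$2\\>/\\n/g$(($1 + 1)); s/\\<$2\\>/$3/g; s/\\n/$2/g"
}

echo 'old old bold old old old' | first_k_per_line 3 old new
# prints: new new bold new old old
```

Note that the count restarts on every line, exactly as in the one-liner above.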

Non-GNU solution

If GNU sed is not available and you want to change the first 3 occurrences of old to new, then use three s commands:

$ echo old old old old old | sed -E -e 's/\<old\>/new/' -e 's/\<old\>/new/' -e 's/\<old\>/new/'
new new new old old

This works well when k is a small number but scales poorly to large k.

Since some non-GNU seds do not support combining commands with semicolons, each command here is introduced with its own -e option. It may also be necessary to verify that your sed supports the word boundary symbols, \< and \>.
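
For larger k, the repeated s commands can be generated rather than typed. A rough sketch, assuming a sed that understands \< and \> and accepts newline-separated commands in one script. The helper name and argument order are made up for illustration:

```shell
# first_k_portable K OLD NEW  (hypothetical helper; reads stdin)
# Builds a script of K identical s commands, one per line, instead of typing them out.
# Assumes OLD and NEW contain no sed metacharacters and that NEW does not contain OLD
# as a whole word (otherwise a later s command would match the inserted text).
first_k_portable() {
    script=''
    i=0
    while [ "$i" -lt "$1" ]; do
        script="${script}s/\\<$2\\>/$3/
"
        i=$((i + 1))
    done
    sed -E -e "$script"
}

echo 'old old old old old' | first_k_portable 3 old new   # prints: new new new old old
```

This only saves typing; sed still executes k separate substitutions per line.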

File-oriented solution

We can tell sed to read the whole file in and then perform the substitutions. For example, to replace the first three occurrences of old using a BSD-style sed:

sed -E -e 'H;1h;$!d;x' -e 's/\<old\>/new/' -e 's/\<old\>/new/' -e 's/\<old\>/new/'

The sed commands H;1h;$!d;x read the whole file in.

Because the above does not use any GNU extension, it should work on BSD (OSX) sed. Note, though, that this approach requires a sed that can handle long lines. GNU sed should be fine. Those using a non-GNU version of sed should test its ability to handle long lines.

With a GNU sed, we can further use the g trick described above, but with \n replaced with \x00, to replace the first three occurrences:

sed -E -e 'H;1h;$!d;x; s/\<old\>/\x00/g4; s/\<old\>/new/g; s/\x00/old/g'

This approach scales well as k becomes large. This assumes, though, that \x00 is not in your original string. Since it is impossible to put the character \x00 in a bash string, this is usually a safe assumption.
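
The file-wide command can be wrapped the same way so that k is a parameter. This is only a sketch under the assumptions already stated (GNU sed, no NUL bytes in the input, no sed metacharacters in the words). The function name is hypothetical:

```shell
# first_k_in_file K OLD NEW [FILE...]  (hypothetical helper; reads stdin if no FILE)
# Assumes GNU sed (g<N> flag, \x00 escape) and input without NUL bytes.
first_k_in_file() {
    k=$1 old=$2 new=$3
    shift 3
    # First script slurps the whole input into the pattern space;
    # second script applies the park / replace / restore trick once, file-wide.
    sed -E -e 'H;1h;$!d;x' \
        -e "s/\\<$old\\>/\\x00/g$((k + 1)); s/\\<$old\\>/$new/g; s/\\x00/$old/g" "$@"
}

printf 'old a old\nold old\nold\n' | first_k_in_file 3 old new
```

With k set to 3, the two occurrences on the first line and the first one on the second line are replaced; the rest are left alone.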

Method 2

Using Awk

The awk commands can be used to replace the first N occurrences of the word with the replacement.
The commands will only replace if the word is a complete match.

In the examples below, I am replacing the first 27 occurrences of old with new

Using sub

awk '{for(i=1;i<=NF;i++){if(x<27&&$i=="old"){x++;sub("old","new",$i)}}}1' file

This command loops through each field until it finds one equal to old. It then checks that the counter is below 27, increments it, and substitutes the match in that field. It then moves on to the next field or line and repeats.

Replacing the field manually

awk '{for(i=1;i<=NF;i++)if(x<27&&$i=="old"&&$i="new")x++}1' file

Similar to the command before but as it already has a marker on which field it is up to ($i), it simply changes the value of the field from old to new.

Performing a check before

awk '/old/&&x<27{for(i=1;i<=NF;i++)if(x<27&&$i=="old"&&$i="new")x++}1' file

Checking that the line contains old and the counter is below 27 SHOULD provide a small speed boost as it won’t process lines when these are false.
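
The same idea with the count and the words passed in as awk variables might look like the sketch below. The helper name and argument order are hypothetical; like the commands above, it compares whole whitespace-separated fields and rebuilds any modified line with single spaces:

```shell
# first_k_awk K OLD NEW  (hypothetical helper; reads stdin)
# Plain POSIX awk; no gawk extensions are used.
first_k_awk() {
    awk -v k="$1" -v old="$2" -v new="$3" '
        count < k {                       # skip the loop once K replacements are done
            for (i = 1; i <= NF && count < k; i++)
                if ($i == old) { $i = new; count++ }
        }
        { print }
    '
}

printf 'old bold old\nold old\n' | first_k_awk 3 old new
```

Here bold is left untouched because the comparison is on the whole field, and the third replacement lands on the first field of the second line.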

RESULTS

E.g., given this input:

old bold old old old
old old nold old old
old old old gold old
old gold gold old old
old old old man old old
old old old old dog old
old old old old say old
old old old old blah old

The commands above produce:

new bold new new new
new new nold new new
new new new gold new
new gold gold new new
new new new man new new
new new new new dog new
new new old old say old
old old old old blah old

Method 3

Say you want to replace only the first three instances of a string…

seq 11 100 311 |
sed -e 's/1/\
&/g'               #s/match string/\nmatch string/globally 
-e :t              #define label t
-e '/\n/{ x'       #newlines must match - exchange hold and pattern spaces
-e '/.\{3\}/!{'    #if not 3 characters in hold space do
-e     's/$/./'    #add a new char to hold space
-e      x          #exchange hold/pattern spaces again
-e     's/\n1/2/'  #replace first occurring '\n1' string w/ '2' string
-e     'b t'       #branch back to label t
-e '};x'           #end match function; exchange hold/pattern spaces
-e '};s/\n//g'     #end match function; remove all newline characters

note: the above will likely not work with embedded comments
…or in my example case, of a ‘1’…

OUTPUT:

22
211
211
311

I use two notable techniques there. First, every occurrence of 1 on a line is replaced with \n1. That way, as I do the recursive replacements next, I can be sure not to replace the same occurrence twice if my replacement string contains my search string. For example, replacing he with hey still works.

I do this like:

s/1/\
&/g

Second, I count the replacements by adding a character to hold space for each one. Once I reach three, no more occur. If you apply this to your data, change the \{3\} to the total number of replacements you want and the /\n1/ addresses to whatever you mean to replace, you should replace only as many as you wish.
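
To illustrate the he/hey case mentioned above, here is a small sketch of the same script written one command per line, with the limit set to 2. The function name is made up, and the script assumes a sed that accepts \n in a regex to match an embedded newline:

```shell
# first2_he: hypothetical demo; replaces the first 2 occurrences of "he" with "hey".
first2_he() {
    sed 's/he/\
&/g
:t
/\n/{
    x
    /.\{2\}/!{
        s/$/./
        x
        s/\nhe/hey/
        b t
    }
    x
}
s/\n//g'
}

echo 'he he he' | first2_he   # prints: hey hey he
```

Because each candidate is marked with a newline first, the he inside an already inserted hey is never matched again. The hold space keeps its dots between lines, so the limit applies to the whole input.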

I only did all of the -e stuff for readability. POSIXly, it could be written like this:

nl='
'; sed "s/1/\\$nl&/g;:t${nl}/\n/{x;/.\{3\}/!{${nl}s/$/./;x;s/\n1/2/;bt$nl};x$nl};s/\n//g"

And w/ GNU sed:

sed 's/1/\n&/g;:t;/\n/{x;/.\{3\}/!{s/$/./;x;s/\n1/2/;bt};x};s/\n//g'

Remember also that sed is line-oriented – it does not read in the entire file and then attempt to loop back over it as is often the case in other editors. sed is simple and efficient. That said, it is often convenient to do something like the following:

Here is a little shell function that bundles it up into a simply executed command:

firstn() { sed "s/$2/\\
&/g;:t
    /\n/{x
        /.\{$(($1))"',\}/!{
            s/$/./; x; s/\n'"$2/$3"'/
            b t
        };x
};s/\n//g'; }

So with that I can do:

seq 11 100 311 | firstn 7 1 5

…and get…

55
555
255
311

…or…

seq 10 1 25 | firstn 6 '\(.\)\([1-5]\)' '152'

…to get…

…or, to match your example (on a smaller order of magnitude):

yes linux | head -n 10 | firstn 5 linux 'linux is an os kernel'
linux is an os kernel
linux is an os kernel
linux is an os kernel
linux is an os kernel
linux is an os kernel
linux
linux
linux
linux
linux

Method 4

A short alternative in Perl:

perl -pe 'BEGIN{$n=3} 1 while $i < $n && s/old/new/ && ++$i' your_file

Change the value of $n to your liking.

How it works:

For every line, it keeps trying to substitute new for old (s/old/new/), and each time it succeeds it increments the variable $i (++$i).
It keeps working on the line (1 while ...) as long as it has made fewer than $n substitutions in total and it can make at least one more substitution on that line.

Method 5

Use a shell loop and ex!

{ for i in {1..50}; do printf '%s\n' '0/old/s//new/'; done; echo x;} | ex file.txt

Yes, it’s a bit goofy.

😉

Note: This may fail if there are fewer than 50 instances of old in the file. (I haven’t tested it.) If so, it would leave the file unmodified.

Better yet, use Vim.

vim file.txt
qqgg/old<CR>:s/old/new/<CR>q49@q
:x

Explanation:

q                                # Start recording macro
 q                               # Into register q
  gg                             # Go to start of file
    /old<CR>                     # Go to first instance of 'old'
            :s/old/new/<CR>      # Change it to 'new'
                           q     # Stop recording
                            49@q # Replay macro 49 times

:x  # Save and exit

Method 6

A simple, but not very fast solution is to loop over commands described in
https://stackoverflow.com/questions/148451/how-to-use-sed-to-replace-only-the-first-occurrence-in-a-file

for i in $(seq 50) ; do sed -i -e "0,/oldword/s//newword/"  file.txt  ; done

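This particular sed command probably works only with GNU sed, and only if oldword does not occur within newword (otherwise later iterations replace text that was already substituted). For non-GNU sed, see the question linked above for how to replace only the first pattern in a file.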
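
Since every pass rewrites the whole file, it may be worth stopping as soon as no match is left. A possible sketch, assuming GNU sed (for -i and the 0,/re/ address) and a pattern that is valid as a basic regex for both grep and sed. The function name and argument order are hypothetical:

```shell
# replace_first_k_loop K OLD NEW FILE  (hypothetical helper; edits FILE in place)
# Stops early when FILE no longer contains OLD, so at most K passes are made.
# The caveat above still applies: NEW must not contain OLD.
replace_first_k_loop() {
    i=0
    while [ "$i" -lt "$1" ] && grep -q -e "$2" "$4"; do
        sed -i -e "0,/$2/s//$3/" "$4"
        i=$((i + 1))
    done
}
```

For example, `replace_first_k_loop 50 oldword newword file.txt` behaves like the loop above but makes no further passes once oldword is gone.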

Method 7

With GNU awk you can set the record separator RS to the word to be replaced, delimited by word boundaries. Then it is a case of setting the record separator on the output to the replacement word for the first k records while retaining the original record separator (RT) for the remainder:

awk -vRS='\\ylinux\\y' -vreplacement=unix -vlimit=50 \
'{printf "%s%s", $0, NR <= limit? replacement: RT}' file

or

awk -vRS='\\ylinux\\y' -vreplacement=unix -vlimit=50 \
'{printf "%s%s", $0, limit-- > 0? replacement: RT}' file

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, or CC BY-SA 4.0.