I want to replace only the first k instances of a word.
How can I do this?
E.g., say file foo.txt contains 100 occurrences of the word ‘linux’.
I need to replace the first 50 occurrences only.
Answers:
Method 1
The first section below describes using sed to change the first k occurrences on a line. The second section extends this approach to change only the first k occurrences in a file, regardless of what line they appear on.
Line-oriented solution
With standard sed, there is a command to replace the k-th occurrence of a word on a line. If k is 3, for example:
sed 's/old/new/3'
Or, one can replace all occurrences with:
sed 's/old/new/g'
Neither of these is what you want.
GNU sed offers an extension that will change the k-th occurrence and all after that. If k is 3, for example:
sed 's/old/new/g3'
These can be combined to do what you want. To change the first 3 occurrences:
$ echo old old old old old | sed -E 's/\<old\>/\n/g4; s/\<old\>/new/g; s/\n/old/g'
new new new old old
where \n is useful here because we can be sure that it never occurs on a line.
Explanation:
We use three sed substitution commands:
- s/\<old\>/\n/g4
  This is the GNU extension to replace the fourth and all subsequent occurrences of old with \n.
  The extended regex feature \< is used to match the beginning of a word and \> to match the end of a word. This assures that only complete words are matched. Extended regex requires the -E option to sed.
- s/\<old\>/new/g
  Only the first three occurrences of old remain, and this replaces them all with new.
- s/\n/old/g
  The fourth and all remaining occurrences of old were replaced with \n in the first step. This returns them to their original state.
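The three steps above can be wrapped in a small shell function so that k becomes a parameter. This is only a sketch: the name replace_first_k_on_line is made up for illustration, it assumes GNU sed, and it assumes that old and new contain no characters that are special to sed (such as / or &). As in the example above, the count restarts on every line.

```shell
# replace_first_k_on_line K OLD NEW   (hypothetical helper; GNU sed only)
# On each input line, replace the first K whole-word occurrences of OLD with NEW.
replace_first_k_on_line() {
    k=$1 old=$2 new=$3
    # 1. park occurrence K+1 and all later ones as newlines
    # 2. replace the occurrences that are left
    # 3. turn the parked newlines back into OLD
    sed -E "s/\\<$old\\>/\\n/g$((k + 1)); s/\\<$old\\>/$new/g; s/\\n/$old/g"
}

echo old old old old old | replace_first_k_on_line 3 old new
# prints: new new new old old
```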
Non-GNU solution
If GNU sed is not available and you want to change the first 3 occurrences of old to new, then use three s commands:
$ echo old old old old old | sed -E -e 's/\<old\>/new/' -e 's/\<old\>/new/' -e 's/\<old\>/new/'
new new new old old
This works well when k is a small number but scales poorly to large k.
Since some non-GNU seds do not support combining commands with semicolons, each command here is introduced with its own -e option. It may also be necessary to verify that your sed supports the word boundary symbols, \< and \>.
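For a larger k, typing the same -e option over and over gets tedious, so the shell can build the argument list instead. The following is a rough sketch, not a tested portable tool: replace_first_k_portable is a hypothetical name, K must be at least 1, new must not contain old as a whole word (each s command rescans the line from the start), and it still relies on your sed accepting -E, \< and \>.

```shell
# replace_first_k_portable K OLD NEW   (hypothetical helper)
# Runs sed with K copies of -e 's/\<OLD\>/NEW/', so it uses no GNU-only s flags.
replace_first_k_portable() {
    k=$1 old=$2 new=$3
    set --                       # reuse "$@" to collect the sed arguments
    i=0
    while [ "$i" -lt "$k" ]; do
        set -- "$@" -e "s/\\<$old\\>/$new/"
        i=$((i + 1))
    done
    sed -E "$@"
}

echo old old old old old | replace_first_k_portable 3 old new
```

This only saves typing. sed still runs k separate substitutions per line, so the scaling concern mentioned above remains.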
File-oriented solution
We can tell sed to read the whole file in and then perform the substitutions. For example, to replace the first three occurrences of old using a BSD-style sed:
sed -E -e 'H;1h;$!d;x' -e 's/\<old\>/new/' -e 's/\<old\>/new/' -e 's/\<old\>/new/'
The sed commands H;1h;$!d;x read the whole file in.
Because the above does not use any GNU extension, it should work on BSD (OSX) sed. Note, though, that this approach requires a sed that can handle long lines. GNU sed should be fine. Those using a non-GNU version of sed should test its ability to handle long lines.
With a GNU sed, we can further use the g trick described above, but with \n replaced with \x00, to replace the first three occurrences:
sed -E -e 'H;1h;$!d;x; s/\<old\>/\x00/g4; s/\<old\>/new/g; s/\x00/old/g'
This approach scales well as k becomes large. This assumes, though, that \x00 is not in your original string. Since it is impossible to put the character \x00 in a bash string, this is usually a safe assumption.
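The file-wide variant can be parameterized in the same way. In this sketch the function name replace_first_k_in_file is hypothetical, GNU sed is assumed, and the placeholder is the control character \x01 rather than \x00. That is my own substitution, made on the assumption that the input contains no \x01 bytes. As before, old and new are assumed to be free of sed-special characters.

```shell
# replace_first_k_in_file K OLD NEW   (hypothetical helper; GNU sed only)
# Reads all of standard input into the pattern space, then replaces the
# first K whole-word occurrences of OLD with NEW across the whole input.
replace_first_k_in_file() {
    k=$1 old=$2 new=$3
    sed -E -e 'H;1h;$!d;x' \
        -e "s/\\<$old\\>/\\x01/g$((k + 1)); s/\\<$old\\>/$new/g; s/\\x01/$old/g"
}

printf 'old old\nold old\nold\n' | replace_first_k_in_file 3 old new
# prints:
# new new
# new old
# old
```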
Method 2
Using Awk
The awk commands can be used to replace the first N occurrences of the word with the replacement.
The commands will only replace if the word is a complete match.
In the examples below, I am replacing the first 27 occurrences of old with new
Using sub
awk '{for(i=1;i<=NF;i++){if(x<27&&$i=="old"){x++;sub("old","new",$i)}}}1' file
This command loops through each field until it matches old. It checks that the counter is below 27, increments it, and then substitutes the first match in that field. It then moves on to the next field or line and repeats.
Replacing the field manually
awk '{for(i=1;i<=NF;i++)if(x<27&&$i=="old"&&$i="new")x++}1' file
Similar to the command before, but as it already has a marker on which field it is up to ($i), it simply changes the value of the field from old to new.
Performing a check before
awk '/old/&&x<27{for(i=1;i<=NF;i++)if(x<27&&$i=="old"&&$i="new")x++}1' file
Checking that the line contains old and the counter is below 27 SHOULD provide a small speed boost, as it won’t process lines when these are false.
RESULTS
E.g
old bold old old old old old nold old old old old old gold old old gold gold old old old old old man old old old old old old dog old old old old old say old old old old old blah old
to
new bold new new new new new nold new new new new new gold new new gold gold new new new new new man new new new new new new dog new new new old old say old old old old old blah old
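The same field loop can take its limit and words from -v variables instead of hard-coding them. This is a sketch only, and replace_first_k_fields is a hypothetical name. One thing it makes visible: assigning to a field makes awk rebuild the line with single spaces between fields, so runs of whitespace are collapsed on any line where a replacement happens.

```shell
# replace_first_k_fields K OLD NEW   (hypothetical helper; any POSIX awk)
# Replaces the first K fields that are exactly OLD with NEW, counting across lines.
replace_first_k_fields() {
    awk -v k="$1" -v old="$2" -v new="$3" '
    {
        for (i = 1; i <= NF; i++)
            if (n < k && $i == old) { $i = new; n++ }
    }
    1'
}

printf 'old bold old\nold   old\n' | replace_first_k_fields 3 old new
# prints:
# new bold new
# new old
```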
Method 3
Say you want to replace only the first three instances of a string…
seq 11 100 311 |
sed -e 's/1/\
&/g'              \ #s/match string/\nmatch string/globally
-e :t             \ #define label t
-e '/\n/{ x'      \ #newlines must match - exchange hold and pattern spaces
-e '/.\{3\}/!{'   \ #if not 3 characters in hold space do
-e     's/$/./'   \ #add a new char to hold space
-e      x         \ #exchange hold/pattern spaces again
-e     's/\n1/2/' \ #replace first occurring '\n1' string w/ '2' string
-e     'b t'      \ #branch back to label t
-e '};x'          \ #end match function; exchange hold/pattern spaces
-e '};s/\n//g'      #end match function; remove all newline characters
Note: this is an annotated listing. It will likely not work as typed, because the comments sit after the line-continuation backslashes.
…or in my example case, of a ‘1’…
OUTPUT:
22
211
211
311
I use two notable techniques there. First, every occurrence of 1 on a line is replaced with \n1. That way, as I do the recursive replacements next, I can be sure not to replace an occurrence twice if my replacement string contains my search string. For example, if I replace he with hey it will still work.
I do this like:
s/1/\n&/g
Second, I count the replacements by adding a character to hold space for each occurrence. Once I reach three, no more occur. If you apply this to your data, change the \{3\} to the total number of replacements you want and the /\n1/ addresses to whatever you mean to replace, and you should replace only as many as you wish.
I only did all of the -e stuff for readability. POSIXly, it could be written like this:
nl='
'; sed "s/1/\\$nl&/g;:t${nl}/\n/{x;/.\{3\}/!{${nl}s/$/./;x;s/\n1/2/;bt$nl};x$nl};s/\n//g"
And w/ GNU sed:
sed 's/1/\n&/g;:t;/\n/{x;/.\{3\}/!{s/$/./;x;s/\n1/2/;bt};x};s/\n//g'
Remember also that sed is line-oriented: it does not read in the entire file and then attempt to loop back over it, as is often the case in other editors. sed is simple and efficient. That said, it is often convenient to bundle the above into a little shell function so that it can be run as a simple command:
firstn() { sed "s/$2/\\
&/g;:t
/\n/{x
/.\{$(($1))"',\}/!{
s/$/./; x; s/\n'"$2/$3"'/
b t
};x
};s/\n//g'; }
So with that I can do:
seq 11 100 311 | firstn 7 1 5
…and get…
55
555
255
311
…or…
seq 10 1 25 | firstn 6 '\(.\)\([1-5]\)' '15\2'
…to get…
10
151
152
153
154
155
16
17
18
19
20
251
22
23
24
25
…or, to match your example (on a smaller order of magnitude):
yes linux | head -n 10 | firstn 5 linux 'linux is an os kernel'
linux is an os kernel
linux is an os kernel
linux is an os kernel
linux is an os kernel
linux is an os kernel
linux
linux
linux
linux
linux
Method 4
A short alternative in Perl:
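perl -pe 'BEGIN{$n=3} 1 while $i < $n && s/old/new/ && ++$i' your_file
Change the value of $n to your liking.
How it works:
- For every line, it keeps trying to substitute new for old (s/old/new/), and whenever it succeeds, it increments the variable $i (++$i).
- It keeps working on the line (1 while ...) as long as it has made fewer than $n substitutions in total ($i < $n) and it can make at least one more substitution on that line. The count is checked before each substitution is attempted, so nothing is replaced once the limit has been reached.
- Each s/old/new/ rescans the line from the start, so this assumes that new does not itself contain old.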
Method 5
Use a shell loop and ex!
{ for i in {1..50}; do printf '%s\n' '0/old/s//new/'; done; echo x;} | ex file.txt
Yes, it’s a bit goofy.
😉
Note: This may fail if there are fewer than 50 instances of old in the file. (I haven’t tested it.) If so, it would leave the file unmodified.
Better yet, use Vim.
vim file.txt
qqgg/old<CR>:s/old/new/<CR>q49@q
:x
Explanation:
q # Start recording macro
q # Into register q
gg # Go to start of file
/old<CR> # Go to first instance of 'old'
:s/old/new/<CR> # Change it to 'new'
q # Stop recording
49@q # Replay macro 49 times
:x # Save and exit
Method 6
A simple, but not very fast solution is to loop over commands described in
https://stackoverflow.com/questions/148451/how-to-use-sed-to-replace-only-the-first-occurrence-in-a-file
for i in $(seq 50) ; do sed -i -e "0,/oldword/s//newword/" file.txt ; done
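This particular sed command probably works only with GNU sed, and only if oldword does not occur within newword (otherwise each pass would match the text inserted by the previous one). For non-GNU sed, see the question linked above for how to replace only the first match in a file.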
Method 7
With GNU awk you can set the record separator RS to the word to be replaced, delimited by word boundaries. Then it is a case of setting the record separator on the output to the replacement word for the first k records, while retaining the original record separator for the remainder. The RT != "" test keeps the final record, which has no terminator, from gaining a stray replacement when the file holds fewer matches than the limit.
awk -vRS='\\ylinux\\y' -vreplacement=unix -vlimit=50 \
'{printf "%s%s", $0, (RT != "" && NR <= limit) ? replacement : RT}' file
OR
awk -vRS='\\ylinux\\y' -vreplacement=unix -vlimit=50 \
'{printf "%s%s", $0, (RT != "" && limit-- > 0) ? replacement : RT}' file
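RT and \y are gawk extensions. Where only a POSIX awk is available, a roughly similar effect can be had by cutting each line at successive matches with index(). This is a sketch of a different technique, not the method above: replace_first_k_literal is a hypothetical name, old must be non-empty, and it matches old as a literal substring, so it does not respect word boundaries (linux inside linuxes would also be replaced).

```shell
# replace_first_k_literal K OLD NEW   (hypothetical helper; any POSIX awk)
# Replaces the first K literal occurrences of OLD with NEW, counting across
# lines, and leaves all whitespace exactly as it was.
replace_first_k_literal() {
    awk -v k="$1" -v old="$2" -v new="$3" '
    {
        out = ""; rest = $0
        # peel off text up to each match until the limit is reached
        while (n < k && (p = index(rest, old)) > 0) {
            out = out substr(rest, 1, p - 1) new
            rest = substr(rest, p + length(old))
            n++
        }
        print out rest
    }'
}

printf 'linux a  linux\nlinux linux\n' | replace_first_k_literal 3 linux unix
# prints:
# unix a  unix
# unix linux
```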
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.