In the following file:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut eu metus id lectus vestibulum ultrices. Maecenas rhoncus.
I want to delete everything before consectetuer and everything after elit.
My desired output:
consectetuer adipiscing elit.
How can I do this?
Answers:
Method 1
I’d use sed
sed 's/^.*\(consectetuer.*elit\.\).*$/\1/' file
Decoded the sed s/find/replace/ syntax:
s/^.* : substitute, starting at the beginning of the line (^), followed by anything (.*) up to...
\( : start a capture group
consectetuer.*elit\. : match the first word, everything (.*) up to the last word (in this case including the trailing, escaped dot) you want to match
\) : end the capture group
.*$ : match everything else (.*) to the end of the line ($)
/ : end the "find" section of the substitute
\1 : replace with the contents of the capture group between the \( and the \) above
/ : end the replacement
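For what it's worth, the same substitution can also be written with extended regular expressions, where the group parentheses need no backslashes. This is only a sketch: it assumes a sed that accepts -E (GNU and BSD sed both do), and extract_span is a made-up helper name, not something from the answer above.

```shell
# extract_span is a hypothetical helper name used only for this illustration.
extract_span() {
  # -E switches to extended regular expressions, so ( ) group without backslashes.
  # The dot after "elit" is still escaped so that it matches a literal dot.
  sed -E 's/^.*(consectetuer.*elit\.).*$/\1/'
}

printf '%s\n' 'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut eu metus id lectus vestibulum ultrices. Maecenas rhoncus.' | extract_span
```

Lines that do not contain both words are passed through unchanged, because the substitution simply does not apply to them.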
Method 2
If every line contains both start and end pattern then the easiest way to do this is with grep. Instead of deleting the beginning and ending of each line you can simply output the contents between both patterns. The -o option in GNU grep outputs only the matches:
grep -o 'consectetuer.*elit' file
Note: as mentioned, this only works if every line in the file can be parsed this way. Then again, that’s 80% of all typical use-cases.
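To make that caveat concrete, here is a small sketch comparing how grep -o and sed treat a line that has no match. The sample function and its two input lines are invented for the demonstration, and the pattern adds an escaped dot so the trailing "." of the desired output is kept.

```shell
# sample is a hypothetical helper that emits two test lines; only the second one matches.
sample() {
  printf '%s\n' 'a line without the start word' 'Lorem ipsum, consectetuer adipiscing elit. Ut eu metus.'
}

# grep -o prints only the matched text, so the non-matching line disappears.
sample | grep -o 'consectetuer.*elit\.'

# Plain sed prints every line; the non-matching line comes through unchanged.
sample | sed 's/^.*\(consectetuer.*elit\.\).*$/\1/'

# sed -n with the p flag prints only the lines where the substitution happened.
sample | sed -n 's/^.*\(consectetuer.*elit\.\).*$/\1/p'
```

Which behaviour you want depends on whether lines without both words should be kept or dropped.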
Method 3
I’m not sure why the question title was edited from “from file” to “from a line” when the OP doesn’t exclude matches spanning multiple lines, even though the example seems to be one line only. In any case, it might be helpful to provide a multi-line solution here.
This works across lines, provided both from1 and to2 exist in the file:
from1=consectetuer; to2=elit; a="$(cat file)"; a="$(echo "${a#*"$from1"}")"; echo "$from1${a%%"$to2"*}$to2"
Examples:
$ cat file
1
abc consectetuer lsl
home
def elit dd
2 consectetuer ABC elit
$ from1=consectetuer; to2=elit; a="$(cat file)"; a="$(echo "${a#*"$from1"}")"; echo "$from1${a%%"$to2"*}$to2"
consectetuer lsl
home
def elit
reference: Shell Parameter Expansion
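In case the two expansions in that one-liner look cryptic, the sketch below shows each of them on a short string and then wraps them in a function. The name between and the test strings are made up for illustration; the expansions themselves are standard POSIX shell.

```shell
# between is a hypothetical helper name; it wraps the two expansions used in the one-liner.
between() {
  # $1 = start word, $2 = end word, $3 = text to search
  rest=${3#*"$1"}                      # drop the shortest prefix that ends with the start word
  printf '%s\n' "$1${rest%%"$2"*}$2"   # drop the longest suffix that begins with the end word
}

s='one two three two four'
echo "${s#*two}"    # prints " three two four"
echo "${s%%two*}"   # prints "one "

between consectetuer elit 'abc consectetuer lsl def elit dd'   # prints "consectetuer lsl def elit"
```

So the result runs from the first occurrence of the start word to the first occurrence of the end word after it. If the start word is missing, ${3#*"$1"} leaves the text untouched, which is why the answer only holds when both words exist.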
Method 4
Two for loops in AWK:
$ awk '{for(i=1;i<=NF;i++) {if ($i == "consectetuer") beginning=i; if($i== "elit.") ending=i }; for (j=beginning;j<=ending;j++) printf "%s ", $j; printf "\n" }' file.txt
consectetuer adipiscing elit.
AWK’s gsub:
$ awk '{gsub(/^.*consectetuer/,"consectetuer"); gsub(/elit.*$/,"elit.");print}' file.txt
consectetuer adipiscing elit.
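Another possibility, not part of the original answer, is awk's match() function, which sets RSTART and RLENGTH so the matched text can be cut out with substr(). This is just a sketch; extract_with_match is a hypothetical name.

```shell
# extract_with_match is a hypothetical helper name used only for this illustration.
extract_with_match() {
  # match() returns non-zero when the regex is found and sets RSTART/RLENGTH;
  # substr() then prints exactly the matched part of the line.
  awk 'match($0, /consectetuer.*elit\./) { print substr($0, RSTART, RLENGTH) }'
}

printf '%s\n' 'Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Ut eu metus id lectus vestibulum ultrices. Maecenas rhoncus.' | extract_with_match
```

Unlike the gsub version, this prints nothing for lines that do not contain the pattern.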
Method 5
A Perl way. This is essentially the same as MikeV’s sed answer:
perl -pe 's/.*(consectetuer.*elit\.).*/$1/' file
The -p means “print every line after applying the script given with -e”. The s/foo/bar/ is the substitution operator; it will replace foo with bar. The parentheses capture a pattern and let us use it in the replacement. The first captured pattern is $1, the second $2 and so on.
So, the command will match everything up to consectetuer (.*consectetuer), then everything up to and including elit and its trailing dot (.*elit\.), and then everything else until the end of the line (.*), and will replace all of that with the captured pattern.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.