I am trying to extract a value from a long string that may change over time. For example, the string could look something like this:
....../filename-1.9.0.3.tar.gz"<....
And what I want to extract is the value between filename- and .tar.gz, essentially the file version (1.9.0.3 in this case). The reason I need to do it this way is because I may later run the command and the value will be 1.9.0.6 or 2.0.0.2 or something entirely different.
How can I do this? I’m currently only using grep, but I wouldn’t mind using other utilities such as sed, awk, cut or whatever. To be perfectly clear, I need to extract only the file version part of the string. Since the string is very long on both sides, everything else needs to be cut out somehow.
Answers:
Method 1
With grep -P/pcregrep, using a positive look-behind and a positive look-ahead:
grep -P -o '(?<=STRING1).*?(?=STRING2)' infile
In your case, replace STRING1 with filename- and STRING2 with \.tar\.gz (the dots are escaped so that they match literal dots rather than any character).
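As a concrete illustration of the substitution just described, here is a minimal sketch. The sample line and the variable names `line` and `version` are made up for the example, and it assumes a grep built with PCRE support (such as GNU grep with -P).

```shell
# Made-up sample input modelled on the question; only the file name matters.
line='<a href="http://example.com/pub/filename-1.9.0.3.tar.gz">download</a>'

# The look-behind anchors the match right after "filename-"; the lazy .*?
# stops at the first ".tar.gz" thanks to the look-ahead.
version=$(printf '%s\n' "$line" | grep -oP '(?<=filename-).*?(?=\.tar\.gz)')

printf '%s\n' "$version"
```

With this sample input the command prints 1.9.0.3.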
If you don’t have access to pcregrep and/or if your grep doesn’t support -P you can do this with your favourite text processing tool. Here’s a portable way with ed that gives you the same output:
ed -s infile <<'IN'
g/STRING1/s//\
&/g
v/STRING1.*STRING2/d
,s/STRING1//
,s/STRING2.*//
,p
IN
How it works: a newline is prepended to each STRING1 occurrence (so now there’s at most one occurrence per line) then all lines not matching STRING1.*STRING2 are deleted; on the remaining ones we only keep what’s between STRING1 and STRING2 and print the result.
Method 2
For the benefit of people without grep -P, you can do this with sed or awk, which are available on any POSIX system.
sed -n -e 's/^.*\/filename-\([^\/]*\)\.tar\.gz.*$/\1/p' -e T -e q
Explanation: -n turns off default printing; the s command finds a line containing the desired pattern, substitutes away everything except the part you want to keep, and prints the result of the substitution; T skips to the next line if there was no match, so q exits only after a match. Note that if there are multiple matches on the first matching line, this picks up the last one. Also note that T is a GNU sed extension, so this exact command needs GNU sed.
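The T command used above is a GNU sed extension. For a sed without it, one possible variant is sketched below: it uses an address so that the substitution and the q only run on a matching line. The function name `extract_version` and the sample input are hypothetical, chosen just for this example.

```shell
# Hypothetical helper: print the version from the first matching line of stdin.
extract_version() {
  sed -n '/\/filename-[^\/]*\.tar\.gz/{
s/^.*\/filename-\([^\/]*\)\.tar\.gz.*$/\1/p
q
}'
}

# Made-up sample input; only the second and third lines contain the pattern.
printf '%s\n' \
  'no match on this line' \
  '<a href="/pub/filename-1.9.0.3.tar.gz">old</a>' \
  '<a href="/pub/filename-2.0.0.2.tar.gz">new</a>' |
  extract_version
```

On this input it prints 1.9.0.3 and stops reading, so the later 2.0.0.2 line is never examined.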
With awk (picking the first match on the line):
awk 'match($0, /filename-[^\/]*\.tar\.gz/) {
    # "filename-" is 9 characters long and ".tar.gz" is 7
    print substr($0, RSTART + 9, RLENGTH - 9 - 7)
    exit
}'
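The offsets in the awk command depend on the lengths of the text before and after the version. As a sketch of the same idea without hard-coded numbers, the lengths can be computed with length(). The function name `extract_version_awk`, the awk variables `prefix` and `suffix`, and the sample line are all assumptions made for this example.

```shell
# Hypothetical helper: same match as above, with offsets derived from the
# prefix and suffix strings instead of literal numbers.
extract_version_awk() {
  awk -v prefix='filename-' -v suffix='.tar.gz' '
    match($0, /filename-[^\/]*\.tar\.gz/) {
      # Skip the prefix, and drop both prefix and suffix from the length.
      print substr($0, RSTART + length(prefix),
                   RLENGTH - length(prefix) - length(suffix))
      exit
    }'
}

# Made-up sample line modelled on the question.
printf '%s\n' '<a href="/pub/filename-1.9.0.3.tar.gz">download</a>' |
  extract_version_awk
```

For this sample line the function prints 1.9.0.3. If the file name changes, the regular expression still has to be edited to match the new prefix and suffix.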
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.