It seems I am misusing grep/egrep.
I was trying to search for strings spanning multiple lines and could not find a match, even though I know that what I'm looking for should match. Originally I thought that my regexes were wrong, but I eventually read that these tools operate per line (also, my regexes were so trivial that they could not be the issue).
So which tool would one use to search for patterns across multiple lines?
Answers:
Method 1
Here’s a sed one that will give you grep-like behavior across multiple lines:
sed -n '/foo/{:start /bar/!{N;b start};/your_regex/p}' your_file
How it works
-n suppresses the default behavior of printing every line.
/foo/{} instructs it to match foo and do what comes inside the squigglies to the matching lines. Replace foo with the starting part of the pattern.
:start is a branching label to help us keep looping until we find the end to our regex.
/bar/!{} will execute what's in the squigglies to the lines that don't match bar. Replace bar with the ending part of the pattern.
N appends the next line to the active buffer (sed calls this the pattern space).
b start will unconditionally branch to the start label we created earlier so as to keep appending the next line as long as the pattern space doesn't contain bar.
/your_regex/p prints the pattern space if it matches your_regex. You should replace your_regex by the whole expression you want to match across multiple lines.
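As a quick check of the loop described above, here is a minimal sketch that runs it on a small made-up file. The file contents and the patterns start, end and start.*end are assumptions chosen for illustration. The script is written one command per line so that it does not depend on how a particular sed parses labels on a single line.

```shell
# Hypothetical sample file: the text of interest spans lines 2 to 4.
sample=$(mktemp)
printf '%s\n' 'header' 'start of block' 'middle' 'end of block' 'footer' > "$sample"

# /start/ opens the match, the loop appends lines with N until /end/ appears,
# and the joined pattern space is printed if it matches the full regex.
multiline_match=$(sed -n '/start/{
:a
/end/!{
N
b a
}
/start.*end/p
}' "$sample")

printf '%s\n' "$multiline_match"
rm -f "$sample"
```

In the pattern space the appended lines are separated by newline characters, which is why a plain .* can bridge them here.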
Method 2
I generally use a tool called pcregrep, which can be installed on most Linux distributions using yum or apt.
For example, suppose you have a file named testfile with the content:
abc blah
blah blah
def blah
blah blah
You can run the following command:
$ pcregrep -M 'abc.*(\n|.)*def' testfile
to do pattern matching across multiple lines.
Moreover, you can do the same with sed as well.
$ sed -e '/abc/,/def/!d' testfile
Method 3
A normal grep that supports the Perl-regexp option -P will do this job.
$ echo 'abc blah
blah blah
def blah
blah blah' | grep -oPz '(?s)abc.*?def'
abc blah
blah blah
def
(?s) is the DOTALL modifier, which makes the dot in your regex match line breaks as well as ordinary characters. The -z option makes grep treat the input as NUL-separated records, so input without NUL bytes is handled as a single record instead of line by line.
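One detail worth seeing in practice: with -z, grep also terminates each piece of output with a NUL byte instead of a newline. The sketch below, which assumes GNU grep built with PCRE support and uses made-up input, pipes the result through tr to get ordinary line endings back.

```shell
# Hypothetical input; the text between abc and def spans three lines.
input=$(printf '%s\n' 'abc blah' 'blah blah' 'def blah' 'blah blah')

# -z: treat the whole input as one record (it contains no NUL bytes)
# -P: Perl-compatible regex, -o: print only the matched text
# tr converts the NUL that grep -z writes after each match into a newline.
grep_match=$(printf '%s\n' "$input" | grep -oPz '(?s)abc.*?def' | tr '\0' '\n')

printf '%s\n' "$grep_match"
```

The non-greedy .*? stops at the first def, which matters when the input contains several abc ... def blocks.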
Method 4
Here’s a simpler approach using Perl:
perl -e '$f=join("",<>); print $& if $f=~/foo\nbar.*\n/m' file
or (since JosephR took the sed route, I’ll shamelessly steal his suggestion)
perl -n000e 'print $& while /^foo.*\nbar.*\n/mg' file
Explanation
$f=join("",<>); : this reads the entire file and saves its contents (newlines and all) into the variable $f. We then attempt to match foo\nbar.*\n, and print the match if there is one (the special variable $& holds the last match found). The /m modifier makes ^ and $ match at the start and end of every line inside the string rather than only at the ends of the whole string; it is the \n in the regex that lets a match span lines.
The -0 switch sets the input record separator. Setting it to 00 activates ‘paragraph mode’, where Perl uses consecutive newlines (\n\n) as the record separator. In cases where there are no consecutive newlines, the entire file is read (slurped) at once.
Warning:
Do not do this for large files: it loads the entire file into memory, which may be a problem.
Method 5
Suppose we have the file test.txt containing:
blabla
blabla
foo
here
is the
text
to keep between the 2 patterns
bar
blabla
blabla
The following code can be used:
sed -n '/foo/,/bar/p' test.txt
For the following output:
foo
here
is the
text
to keep between the 2 patterns
bar
Method 6
The grep alternative sift supports multiline matching (disclaimer: I am the author).
Suppose testfile contains:
<book>
  <title>Lorem Ipsum</title>
  <description>Lorem ipsum dolor sit amet, consectetur
  adipiscing elit, sed do eiusmod tempor incididunt ut
  labore et dolore magna aliqua</description>
</book>
Show the lines containing the description:
sift -m '<description>.*?</description>'
Result:
testfile: <description>Lorem ipsum dolor sit amet, consectetur
testfile: adipiscing elit, sed do eiusmod tempor incididunt ut
testfile: labore et dolore magna aliqua</description>
Extract and reformat the description:
sift -m '<description>(.*?)</description>' --replace 'description="$1"' --no-filename
Result:
description="Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua"
Method 7
I solved this for myself by using grep with the -A option and piping the result into another grep.
grep first_line_word -A 1 testfile | grep second_line_word
The -A 1 option prints 1 line after each matching line. Of course, whether this works depends on your file and word combination, but for me it was the fastest and most reliable solution.
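A small illustration of what this pipeline does and does not catch, using made-up input (the sample lines are assumptions): only a second_line_word that sits on the matching line or the line directly after first_line_word gets through.

```shell
# Hypothetical input: one adjacent pair and one stray second_line_word.
log=$(printf '%s\n' \
  'first_line_word here' \
  'second_line_word here' \
  'unrelated' \
  'second_line_word alone')

# The first grep keeps each matching line plus the one after it;
# the second grep then filters that reduced output.
hits=$(printf '%s\n' "$log" | grep -A 1 'first_line_word' | grep 'second_line_word')

printf '%s\n' "$hits"
```

Note that the result contains only the second line of each pair, and that a line containing both words would also pass, so this is a filter rather than a true multi-line match.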
Method 8
One way to do this is with Perl. e.g. here’s the contents of a file named foo:
foo line 1
bar line 2
foo
foo
foo line 5
foo
bar line 6
Now, here’s some Perl which will match against any line that begins with foo followed by any line that begins with bar:
cat foo | perl -e 'while(<>){$all .= $_}
while($all =~ /^(foo[^\n]*\nbar[^\n]*\n)/m) {
print $1; $all =~ s/^(foo[^\n]*\nbar[^\n]*\n)//m;
}'
The Perl, broken down:
while(<>){$all .= $_} This loads the entire standard input into the variable $all.
while($all =~ While the variable $all matches the regular expression…
/^(foo[^\n]*\nbar[^\n]*\n)/m The regex: foo at the beginning of a line, followed by any number of non-newline chars, followed by a newline, followed immediately by “bar”, and the rest of the line with bar in it. /m at the end of the regex lets ^ match at the start of any line in the string, not just at the start of the string.
print $1 Print the part of the regex that was in parentheses (in this case, the entire regular expression).
s/^(foo[^\n]*\nbar[^\n]*\n)//m Erase the first match for the regex, so we can match multiple cases of the regex in the file in question.
And the output:
foo line 1
bar line 2
foo
bar line 6
Method 9
This method gets the text between the 2 patterns, excluding the patterns themselves.
Suppose we have the file test.txt containing:
blabla
blabla
foo
here
is the
text
to keep between the 2 patterns
bar
blabla
blabla
The following code can be used:
sed -n '/foo/{
n
b gotoloop
:loop
N
:gotoloop
/bar/!{
h
b loop
}
/bar/{
g
p
}
}' test.txt
For the following output:
here
is the
text
to keep between the 2 patterns
How does it work? Let's go through it step by step:
1. /foo/{ is triggered when a line contains “foo”.
2. n replaces the pattern space with the next line, i.e. the word “here”.
3. b gotoloop branches to the label “gotoloop”.
4. :gotoloop defines the label “gotoloop”.
5. /bar/!{ runs the block if the pattern space doesn't contain “bar”.
6. h replaces the hold space with the pattern space, so “here” is saved in the hold space.
7. b loop branches to the label “loop”.
8. :loop defines the label “loop”.
9. N appends the next input line to the pattern space.
Now the pattern space contains:
“here”
“is the”
10. :gotoloop We are back at step 4, and we loop (h copies the pattern space to the hold space on each pass) until a line contains “bar”.
11. /bar/{ The loop is finished: “bar” has been found, and it is the last line of the pattern space.
12. g replaces the pattern space with the hold space, which contains all the lines between “foo” and “bar” that were saved during the main loop.
13. p copies the pattern space to standard output.
Done!
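For comparison, here is a shorter sketch that reaches the same output by a different route: it uses the /foo/,/bar/ range from Method 5 and deletes the two boundary lines inside it. This is a swapped-in technique, not the hold-space script explained above, and the sample file is an assumed reconstruction of test.txt.

```shell
# Hypothetical reconstruction of test.txt.
tmpfile=$(mktemp)
printf '%s\n' 'blabla' 'blabla' 'foo' 'here' 'is the' 'text' \
  'to keep between the 2 patterns' 'bar' 'blabla' 'blabla' > "$tmpfile"

# Inside the foo..bar range, drop lines matching either boundary pattern
# and print the rest.
between=$(sed -n '/foo/,/bar/{
/foo/d
/bar/d
p
}' "$tmpfile")

printf '%s\n' "$between"
rm -f "$tmpfile"
```

One caveat of this shortcut: any line inside the range that happens to contain foo or bar is dropped too, which the hold-space version does not do.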
Method 10
cat file | egrep "<pattern1>|<pattern2>"
would list all lines matching either <pattern1> or <pattern2>. Note that each line is still tested on its own, so this does not match a pattern that spans lines.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.