How to get all lines between first and last occurrences of patterns?

How can I trim a file (or rather, an input stream) so that I only get the lines ranging from the first occurrence of pattern foo to the last occurrence of pattern bar?

For instance, consider the following input:

A line
like
foo
this 
foo
bar
something
something else
foo
bar
and
the
rest

I expect this output:

foo
this 
foo
bar
something
something else
foo
bar

Answers:

Method 1

sed -n '/foo/{:a;N;/^\n/s/^\n//;/bar/{p;s/.*//;};ba;}'

A sed range such as /first/,/second/ processes lines one by one. When a line matches /first/, sed starts the range and keeps going until the first line that matches /second/, applying the commands given for that range to every line along the way. Then it starts looking for /first/ again, and so on until the end of the file.

That's not what we need: we need the range to end at the last match of the /second/ pattern. Therefore we build a construction that looks only for the first /foo/. When it is found, loop a starts: N appends the next line to the pattern space, and if the pattern space now matches /bar/, we print it and empty it. Either way, ba jumps back to the beginning of the loop.

We also need to delete the leading newline left behind after the pattern space is emptied; that is what /^\n/s/^\n// does. I'm sure there is a much better solution, but unfortunately it didn't come to my mind.

Hope everything is clear.
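
A minimal sketch of Method 1 wrapped as a reusable filter. It assumes GNU sed, and the function name first_foo_to_last_bar_sed is hypothetical. The jump is written as ba;} so the label name clearly ends at the semicolon. It is run on the question's sample (without the trailing space after "this").

```shell
#!/bin/sh
# Hypothetical wrapper around the Method 1 sed program (GNU sed assumed).
# From the first foo on, lines are appended with N; whenever the pattern
# space contains bar it is printed and emptied, so lines after the last
# bar are never printed.
first_foo_to_last_bar_sed() {
    sed -n '/foo/{:a;N;/^\n/s/^\n//;/bar/{p;s/.*//;};ba;}' "$@"
}

# Sample input from the question, one line per printf argument.
printf '%s\n' 'A line' like foo this foo bar something 'something else' foo bar and the rest |
    first_foo_to_last_bar_sed
```

Because nothing is printed until a bar is seen, an input where foo is never followed by bar produces no output at all.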

Method 2

I would do it with a little Perl one-liner.

cat <<EOF | perl -ne 'BEGIN { $/ = undef; } print "$1\n" if /(foo.*bar)/s'
A line
like
foo
this 
foo
bar
something
something else
foo
bar
and
the
rest
EOF

yields

foo
this 
foo
bar
something
something else
foo
bar

Method 3

Here’s a two-pass GNU sed solution that doesn’t require much memory:

< infile sed -n '/foo/ { =; :a; z; N; /bar/=; ba }' |
sed -n '1p; $p'                                     |
tr '\n' ' '                                         |
sed 's/ /,/; s/ /p/'                                |
sed -n -f - infile

Explanation

The first sed invocation reads infile and prints the line number of the first occurrence of foo and of every subsequent occurrence of bar.
These line numbers are then shaped into a new sed script by two more invocations of sed and one of tr. The output of the third sed is [start_address],[end_address]p, without the brackets.
The final invocation of sed reads infile again and prints the lines in that address range.
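
To see what the middle stages produce, here is a sketch that writes the question's sample to a temporary file (created with mktemp, an assumption about your system) and prints the generated script before using it. GNU sed is assumed because of the z command. For this sample the first foo is on line 3 and the last bar on line 10, so the generated script should be 3,10p.

```shell
#!/bin/sh
# Sketch: reproduce the question's sample in a temporary file and show the
# sed script that the middle stages of the pipeline generate from it.
infile=$(mktemp)
printf '%s\n' 'A line' like foo this foo bar something 'something else' foo bar and the rest >"$infile"

# Stage 1 prints the line number of the first foo and of every later bar;
# the next stages keep the first and last number and turn them into "M,Np".
script=$(sed -n '/foo/ { =; :a; z; N; /bar/=; ba }' "$infile" |
    sed -n '1p; $p' |
    tr '\n' ' ' |
    sed 's/ /,/; s/ /p/')
echo "generated script: $script"

# Final pass: print that line range from the original file.
result=$(sed -n "$script" "$infile")
printf '%s\n' "$result"
rm -f "$infile"
```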

Method 4

If the input file fits comfortably in memory, keep it simple.

If the input file is huge, you can use csplit to break it into pieces at the first foo and just after every subsequent bar, then assemble the pieces. The pieces are called piece-000000000, piece-000000001, etc. Choose a prefix (here, piece-) that won't clash with other existing files.

csplit -f piece- -n 9 - '%foo%' '/bar/+1' '{*}' <input-file

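(The {*} repeat is a GNU extension. On other systems, e.g. most non-Linux ones, you'll have to use a large number inside the braces, e.g. {999999999}, and pass the -k option so that csplit keeps the pieces it has written when it runs out of matches. That number is the maximum number of bar pieces.)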

You can assemble all the pieces with cat piece-*, but this will give you everything after the first foo. So remove that last piece first. Since the file names produced by csplit don’t contain any special characters, you can work them over without taking any special quoting precaution, e.g. with

rm $(echo piece-* | sed 's/.* //')

or equivalently

rm $(ls piece-* | tail -n 1)

Now you can join all the pieces and remove the temporary files:

cat piece-* >output
rm piece-*

If you want to remove the pieces as they are concatenated to save disk space, do it in a loop:

mv piece-000000000 output
for x in piece-?????????; do
  cat "$x" >>output; rm "$x"
done
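
Putting the steps together, here is a sketch that runs the whole csplit procedure in a throwaway directory, so the piece- prefix cannot clash with existing files. It assumes GNU csplit (for {*}) and mktemp -d; the function name is hypothetical, and -s only silences csplit's byte counts. The +1 offset makes each cut fall just after a bar line, so the final piece holds only what follows the last bar and can be discarded.

```shell
#!/bin/sh
# Hypothetical wrapper around the csplit steps (GNU csplit assumed).
# The body runs in a subshell, so cd and the EXIT trap stay local.
first_foo_to_last_bar_csplit() (
    set -e
    dir=$(mktemp -d)
    trap 'rm -rf "$dir"' EXIT
    cd "$dir"
    # Drop everything before the first foo, then cut just after each bar.
    csplit -s -f piece- -n 9 - '%foo%' '/bar/+1' '{*}'
    # The last piece is the tail after the final bar: discard it.
    rm "$(ls piece-* | tail -n 1)"
    cat piece-*
)

# Sample input from the question, one line per printf argument.
printf '%s\n' 'A line' like foo this foo bar something 'something else' foo bar and the rest |
    first_foo_to_last_bar_csplit
```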

Method 5

Here’s another way with sed:

sed '/foo/,$!d;H;/bar/!d;s/.*//;x;s/\n//' infile

It appends each line in the /foo/,$ range to the hold space (lines not in this range, hence the !, are deleted). Lines not matching bar are then deleted. On lines that do match, the pattern space is emptied and exchanged with the hold space, and the leading newline in the pattern space is removed.

With huge input and few occurrences of bar, this should be (much) faster than pulling each line into the pattern space and then, each time, checking the whole pattern space for bar.
Explained:

sed '/foo/,$!d                     # delete line if not in this range
H                                  # append to hold space
/bar/!d                            # if it does not match bar, delete
s/.*//                             # otherwise empty pattern space and
x                                  # exchange hold buffer w. pattern space then
s/\n//                             # remove the leading newline
' infile
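
A sketch wrapping the hold-space one-liner in a shell function (the name is hypothetical) and running it on the question's sample. It only uses the d, H, x and s commands, so it should not need GNU extensions, though that is an assumption worth checking on your sed.

```shell
#!/bin/sh
# Hypothetical wrapper: collect lines from the first foo in the hold space
# and flush them each time a bar line is seen.
first_foo_to_last_bar_hold() {
    sed '/foo/,$!d;H;/bar/!d;s/.*//;x;s/\n//' "$@"
}

# Sample input from the question, one line per printf argument.
printf '%s\n' 'A line' like foo this foo bar something 'something else' foo bar and the rest |
    first_foo_to_last_bar_hold
```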

Of course, if the input is a file (and fits in memory), you could simply run:

 ed -s infile<<'IN'
.t.
/foo/,?bar?p
q
IN

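because ed can search both forward and backward. The .t. command first copies the last line after itself; that makes the copy the current line, so the backward search ?bar?, which starts from the line before the current one, also checks the original last line.
You could even read the output of a command into the buffer if your shell supports process substitution: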

printf '%s\n' .t. '/foo/,?bar?p' q | ed -s <(your command)

or, if it doesn't, with GNU ed:

printf '%s\n' .t. '/foo/,?bar?p' q | ed -s '!your command'

Method 6

Using any awk in any shell on any UNIX system, and without reading the whole file or input stream into memory at one time (only the lines since the most recent bar are buffered):

$ awk '
    f {
        rec = rec $0 ORS
        if (/bar/) {
            printf "%s", rec
            rec = ""
        }
        next
    }
    /foo/ { f=1; rec=$0 ORS }
' file
foo
this
foo
bar
something
something else
foo
bar
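
A sketch that wraps the awk program in a shell function (the name is hypothetical) so the buffering is easy to check: on the question's sample it prints the wanted range, a bar that comes before the first foo is ignored, and when foo is never followed by bar nothing is printed because the buffer is never flushed.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk program above.
first_foo_to_last_bar_awk() {
    awk '
        f {                             # inside the range: buffer the line
            rec = rec $0 ORS
            if (/bar/) {                # flush the buffer at every bar
                printf "%s", rec
                rec = ""
            }
            next
        }
        /foo/ { f = 1; rec = $0 ORS }   # the first foo starts buffering
    ' "$@"
}

# Sample input from the question, one line per printf argument.
printf '%s\n' 'A line' like foo this foo bar something 'something else' foo bar and the rest |
    first_foo_to_last_bar_awk
```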

Method 7

Grep can do it too (well, GNU grep can):

<infile grep -ozP '(?s)foo.*bar' | tr '\0' '\n'

<infile grep -ozP '        #  call grep to print only the matching section (`-o`)
                           #  use NUL as the line delimiter (`-z`), so the whole file is read as one record.
                           #  And use a PCRE regex (`-P`).
(?s)foo.*bar               #  Allow the dot (`.`) to also match newlines.
' | tr '\0' '\n'           #  Turn the NUL that ends the match back into a newline.

For the input from the question body:

$ <infile grep -ozP '(?s)foo.*bar' | tr '\0' '\n'
foo
this 
foo
bar
something
something else
foo
bar

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.