I’m parsing a mailbox file that stores e-mail server reports for unsuccessfully delivered e-mail. I wish to extract the bad e-mail addresses so that I can remove them from the system. The log file looks like this:
...some content...
The mail system

<[email protected]>: host mx1.hotmail.com[65.54.188.94] said: 550
Requested action not taken: mailbox unavailable (in reply to RCPT TO
command)
...some content...
The mail system

<[email protected]>: host viking.optimumpro.net[79.101.51.82] said: 550
Unknown user (in reply to RCPT TO command)
...some content...
The mail system

<[email protected]>: host mta5.am0.yahoodns.net[74.6.140.64] said: 554
delivery error: dd This user doesn't have a yahoo.com account
([email protected]) [0] - mta1172.mail.sk1.yahoo.com (in reply to end
of DATA command)
...etc.
The e-mail address comes 2 lines after a line with “The mail system”. Using grep like this gives me the “The mail system” line and the next two lines:
grep -A 2 "The mail system" mbox_file
However, I don’t know how to remove the “The mail system” line and the second, empty line from this output. I guess I could write a PHP/Perl/Python script to do it, but I wonder if this is possible with grep or some other standard tool. I tried to give a negative offset to the -B parameter:
grep -A 2 -B -2 "The mail system" mbox_file
But grep complains:
grep: -2: invalid context length argument
Is there a way to do this with grep?
Answers:
Method 1
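The simplest way to solve it using grep only is to pipe the output through further inverted greps at the end. The second grep drops the “The mail system” line and the last one drops empty lines.
For example:
grep -A 4 "The mail system" temp.txt | grep -v "The mail system" | grep -v '^$'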
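The pipeline above can be tried on a small file. This is only a sketch: sample_mbox.txt and the example.com addresses are hypothetical stand-ins built to mirror the layout described in the question, and the extra '^--$' pattern is an addition (not part of the original answer) that drops the "--" separator grep prints between non-adjacent groups.

```shell
# Hypothetical sample file: header line, empty line, then the report.
cat > sample_mbox.txt <<'EOF'
...some content...
The mail system

<user1@example.com>: host mx.example.com[192.0.2.1] said: 550
Unknown user (in reply to RCPT TO command)
...some content...
The mail system

<user2@example.org>: host mx.example.org[192.0.2.2] said: 554
delivery error (in reply to end of DATA command)
EOF

# Keep each match plus the 2 following lines, then filter out the header
# line, empty lines, and grep's "--" group separators.
result=$(grep -A 2 "The mail system" sample_mbox.txt |
  grep -v -e "The mail system" -e '^$' -e '^--$')
printf '%s\n' "$result"
```

With -A 2 only the first line of each report survives; raising the count (as the answer does with -A 4) also keeps the continuation lines.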
Method 2
If you aren’t locked into using grep, try sed:
sed -n '/The mail system/{n;n;p}'
When it finds a line containing “The mail system”, it reads the next line twice via n;n;, discarding each previous line as it does so.
This leaves the 3rd line of your group in the pattern space, which is then printed by sed’s p command. The leading -n option suppresses all other printing.
To print the next two lines as well, just repeat next-and-print (n;p) twice more:
sed -n '/The mail system/{n; n;p; n;p; n;p}'
The next-line reads for the lines you require can be accumulated and printed as a single block with just one p: N reads the next line and appends it to the pattern space.
Here is the final condensed version:
sed -n '/The mail system/{n;n;N;N;p}'
If you want a group separator, similar to what grep would output, you can use sed’s insert command i, which must be the last command on a line (the one-line i-- form relies on a GNU sed extension).
Here is the syntax to include a group separator:
sed -n '/The mail system/{n;n;N;N;p;i--
}' > output-file # or | ...
Here is the output for the first match:
<[email protected]>: host mx1.hotmail.com[65.54.188.94] said: 550
Requested action not taken: mailbox unavailable (in reply to RCPT TO
command)
--
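Since the goal in the question is the bare address, the n;n;p form can be combined with a substitution. The following is a sketch under assumptions: it expects the address to sit in angle brackets at the start of the line, and sample_mbox.txt with its example.com addresses is a hypothetical stand-in for the real mailbox file.

```shell
# Hypothetical sample file: header line, empty line, then the report.
cat > sample_mbox.txt <<'EOF'
...some content...
The mail system

<user1@example.com>: host mx.example.com[192.0.2.1] said: 550
Unknown user (in reply to RCPT TO command)
...some content...
The mail system

<user2@example.org>: host mx.example.org[192.0.2.2] said: 554
delivery error (in reply to end of DATA command)
EOF

# On each header match, skip ahead two lines, then keep only the text
# between "<" and ">" and print it if the substitution succeeded.
addresses=$(sed -n '/The mail system/{
n
n
s/^<\([^>]*\)>.*/\1/p
}' sample_mbox.txt)
printf '%s\n' "$addresses"
```

Writing the commands on separate lines avoids depending on how a particular sed parses ";" and "}" after a substitution.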
Method 3
grep -A 2 -B -2 "The mail system" mbox_file
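-B sets the number of lines printed before the match, so it takes a plain non-negative count; there is no need to give a negative value.
grep -A 2 -B 2 "The mail system" mbox_file # Runs without the error; prints 2 lines before and 2 lines after each match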
Method 4
I see no point in using only grep, unless that’s a strict constraint.
It cannot be done with a single call to grep.
grep -A 2 "The mail system" mbox_file | tail -n +3
- grep: find the line and output the 2 lines after it,
- tail: cut the first 2 lines (i.e. start from the third line).
Note that tail trims only the start of the whole stream, so this handles a single match.
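For a file containing several reports, one possible per-match alternative is awk, which the original answer does not mention. This is a sketch only: sample_mbox.txt and its example.com addresses are hypothetical, and the variable name target is chosen for illustration.

```shell
# Hypothetical sample file: header line, empty line, then the report.
cat > sample_mbox.txt <<'EOF'
...some content...
The mail system

<user1@example.com>: host mx.example.com[192.0.2.1] said: 550
Unknown user (in reply to RCPT TO command)
...some content...
The mail system

<user2@example.org>: host mx.example.org[192.0.2.2] said: 554
delivery error (in reply to end of DATA command)
EOF

# On each header match, remember the line number two lines ahead;
# print a line only when its number equals that remembered target.
second_lines=$(awk '/The mail system/ { target = NR + 2 } NR == target' sample_mbox.txt)
printf '%s\n' "$second_lines"
```

Because the target is recomputed at every header, each report contributes its own line, regardless of how many reports the file holds.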
Method 5
If you want to remove the first 2 lines, pipe the output to sed:
sed '1,2d'
as in
grep -A 2 "The mail system" mbox_file | sed '1,2d'
Method 6
This prints the matching line and the 1 line that follows it, using Perl (the value assigned to $end sets how many following lines are printed):
perl -ne 'print if( (/The mail system/ && ($end=1))..!$end-- )'
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.