grep to ignore patterns

I am extracting URLs from a website using cURL as below.

curl www.somesite.com | grep "<a href=.*title=" > new.txt

My new.txt file is as below.

<a href="http://website1.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" title="something">
<a href="http://website1.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" information="something" title="something">
<a href="http://website2.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" title="some_other_thing">
<a href="http://website2.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" information="something" title="something">
<a href="http://websitenotneeded.com" rel="nofollow noreferrer noopener" title="something NOTNEEDED">

However, I need to extract only the below information.

<a href="http://website1.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" title="something">
<a href="http://website2.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" information="something" title="something">

I am trying to ignore the <a href which have information in them and whose title end with NOTNEEDED.

How can I modify my grep statement?

Answers:

Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.

Method 1

I’m not fully following your example + the description but it sounds like what you want is this:

$ grep -v "<a href=.*title=.*NOTNEEDED" sample.txt 
<a href="http://website1.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" title="something">
<a href="http://website1.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" information="something" title="something">
<a href="http://website2.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" title="some_other_thing">
<a href="http://website2.com" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" rel="nofollow noreferrer noopener" information="something" title="something">

So for your example:

$ curl www.example.com | grep -v "<a href=.*title=" | grep -v NOTNEEDED > new.txt

Method 2

The grep man page says:

-v, --invert-match
    Invert the sense of matching, to select non-matching lines. (-v is specified by POSIX .)

You can use regular expressions for multiple inversions:

grep -v 'red|green|blue'

or

grep -v red | grep -v green | grep -v blue


All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

0 0 votes
Article Rating
Subscribe
Notify of
guest

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x