extract from line to line and then save to separate file

I tried my luck with grep and sed, but somehow I can't get it right.

I have a log file which is about 8 GB in size. I need to analyze a 15-minute period of suspicious activity. I have located the part of the log file that I need to look at, and I am trying to extract those lines and save them to a separate file. How would I do that on a regular CentOS machine?

My last try was this, but it didn't work. I am at a loss when it comes to sed and that type of command.

sed -n '2762818,2853648w /var/log/output.txt' /var/log/logfile

Answers:


Method 1

sed -n '2762818,2853648p' /var/log/logfile > /var/log/output.txt

-n suppresses sed's default output of every line, and p prints only the lines in the given range.
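One caveat with the command above: after the range has been printed, sed still reads the rest of the file, which matters on an 8 GB log. A common refinement is to add a q command at the last wanted line so sed quits there. The sketch below shows this on a small numbered file; the temporary paths and the 40 to 45 range are made up for the demonstration.

```shell
# Demo on a small numbered file; paths and line numbers are illustrative only.
tmp=$(mktemp -d)
seq 100 >"$tmp/logfile"

first=40
last=45

# Print lines first..last, then quit at line `last` so the remainder is never read.
sed -n "${first},${last}p;${last}q" "$tmp/logfile" >"$tmp/output.txt"

cat "$tmp/output.txt"
```

For the log in the question this would be sed -n '2762818,2853648p;2853648q' /var/log/logfile > /var/log/output.txt.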

Method 2

Probably the best way to do this is with shell redirection, as others have mentioned. sed though, while a personal favorite, is probably not going to do this more efficiently than will head – which is designed to grab only so many lines from a file.

There are other answers on this site which demonstrate that for large files head -n[num] | tail -n[num] will outperform sed every time, but probably even faster than that is to eschew the pipe altogether.

I created a file like:

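echo | dd cbs=5000000 conv=block | tr ' ' '\n' >/tmp/5mil_lines

(That command produces 5,000,000 empty lines. The sample output below shows line numbers, so the file behind it evidently held one number per line; seq 5000000 >/tmp/5mil_lines builds such a file.)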

And I ran it through:

{ head -n "$((ignore=2762817))" >&2
  head -n "$((2853648-ignore))" 
} </tmp/5mil_lines 2>/dev/null  |
sed -n '1p;$p'                

I only used sed at all there to grab only the first and last line to show you…

2762818
2853648

This works because when you group commands with { ... ; } and redirect the input for the group, as in { ... ; } <input, all of them share the same input. Most commands read their input through to the end, so in a { cmd1 ; cmd2; } <infile case cmd1 usually reads from the head of the infile to its tail and cmd2 is left with nothing.

head, however, will always seek only so far through its infile as it is instructed to do, and so in a…

{ head -n [num] >/dev/null
  head -n [num]
} <infile 

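…case the first seeks through to [num] and dumps its output to /dev/null, and the second is left to begin its read where the first left off. This relies on the input being a regular, seekable file: head can then leave the shared file offset just past the last line it printed. With a pipe as input, the first head may read more than it prints, and the excess is lost to the second.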
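A small demonstration of the difference between a command that drains its input and head, assuming GNU head (as on CentOS) and a regular file as input. The 10-line test file and the variable names are made up for the example.

```shell
tmp=$(mktemp -d)
seq 10 >"$tmp/infile"

# cat reads the shared input to the end, so the head after it finds nothing left.
after_cat=$({ cat >/dev/null; head -n 2; } <"$tmp/infile")

# head stops after 3 lines, so the second head continues at line 4.
after_head=$({ head -n 3 >/dev/null; head -n 2; } <"$tmp/infile")

printf 'after cat:  [%s]\n' "$after_cat"
printf 'after head: [%s]\n' "$after_head"
```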

You can do…

{ head -n "$((ignore=2762817))" >/dev/null
  head -n "$((2853648-ignore))" >/path/to/outfile
} <infile
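If you need this more than once, the grouped-head approach can be wrapped in a small function. This is only a sketch: extract_range and its argument order are names invented here, and it assumes GNU head and a regular (seekable) input file.

```shell
# extract_range FROM TO INFILE OUTFILE
# Hypothetical helper, not part of the original answer.
# Assumes INFILE is a regular file and that head leaves the shared file
# offset just past the last line it printed (GNU head does this).
extract_range() {
    { head -n "$(($1 - 1))" >/dev/null     # skip the lines before FROM
      head -n "$(($2 - $1 + 1))" >"$4"     # copy lines FROM..TO
    } <"$3"
}

# Small demonstration on a 20-line numbered file.
tmp=$(mktemp -d)
seq 20 >"$tmp/infile"
extract_range 5 8 "$tmp/infile" "$tmp/outfile"
cat "$tmp/outfile"
```

For the log in the question the call would look like extract_range 2762818 2853648 /var/log/logfile /var/log/output.txt.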

This construct also works with other kinds of compound commands. For example:

set "$((n=2762817))" "$((2853648-n))"
for n do head "-n$n" >&"$#"; shift
done </tmp/5mil_lines 2>/dev/null | 
sed -n '1p;$p'

…which prints…

2762818
2853648

But it might also work like:

d=$(((  n=$(wc -l </tmp/5mil_lines))/43 ))      &&
until   [ "$(((n-=d)>=(!(s=143-n/d))))" -eq 0 ] &&
        head "-n$d" >>"/tmp/${s#1}.split"
do      head "-n$d" > "/tmp/${s#1}.split"       || ! break
done    </tmp/5mil_lines

Above the shell initially sets the $n and $d variables to …

  • $n
    • The line count as reported by wc for my test file /tmp/5mil_lines
  • $d
    • The quotient of $n/43 where 43 is just some arbitrarily selected divisor.

It then loops until it has decremented $n by $d to a value less than $d. While doing so it keeps its split count in $s and uses that value to number the output file /tmp/[num].split for each iteration. The result is that each iteration reads an equal number of newline-delimited lines from the infile into a new outfile, splitting the file 43 ways over the course of the loop (the last outfile also receives the few lines left over by the integer division). It manages this without reading the infile more than twice: wc reads it once to count the lines, and for the rest of the operation head reads only as many lines as it writes to each outfile.

After running it I checked my results like…

tail -n1 /tmp/*split | grep .

OUTPUT:

==> /tmp/01.split <==
116279  
==> /tmp/02.split <==
232558  
==> /tmp/03.split <==
348837  
==> /tmp/04.split <==
465116  
==> /tmp/05.split <==
581395  
==> /tmp/06.split <==
697674  
==> /tmp/07.split <==
813953  
==> /tmp/08.split <==
930232  
==> /tmp/09.split <==
1046511 
==> /tmp/10.split <==
1162790 
==> /tmp/11.split <==
1279069 
==> /tmp/12.split <==
1395348 
==> /tmp/13.split <==
1511627 
==> /tmp/14.split <==
1627906 
==> /tmp/15.split <==
1744185 
==> /tmp/16.split <==
1860464 
==> /tmp/17.split <==
1976743 
==> /tmp/18.split <==
2093022 
==> /tmp/19.split <==
2209301 
==> /tmp/20.split <==
2325580 
==> /tmp/21.split <==
2441859 
==> /tmp/22.split <==
2558138 
==> /tmp/23.split <==
2674417 
==> /tmp/24.split <==
2790696 
==> /tmp/25.split <==
2906975 
==> /tmp/26.split <==
3023254 
==> /tmp/27.split <==
3139533 
==> /tmp/28.split <==
3255812 
==> /tmp/29.split <==
3372091 
==> /tmp/30.split <==
3488370 
==> /tmp/31.split <==
3604649 
==> /tmp/32.split <==
3720928 
==> /tmp/33.split <==
3837207 
==> /tmp/34.split <==
3953486 
==> /tmp/35.split <==
4069765 
==> /tmp/36.split <==
4186044 
==> /tmp/37.split <==
4302323 
==> /tmp/38.split <==
4418602 
==> /tmp/39.split <==
4534881 
==> /tmp/40.split <==
4651160 
==> /tmp/41.split <==
4767439 
==> /tmp/42.split <==
4883718 
==> /tmp/43.split <==
5000000 
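The split loop above is compact but hard to read. Below is a more explicit sketch of the same idea: count the lines once, then let successive head calls on a shared input write each part. It is not the original author's code; the 1000-line test file, the 7-way split and the variable names are chosen for illustration, and it assumes GNU head and a regular input file.

```shell
tmp=$(mktemp -d)
seq 1000 >"$tmp/infile"

parts=7
total=$(wc -l <"$tmp/infile")
per=$((total / parts))          # lines per part (integer division)

i=1
while [ "$i" -le "$parts" ]; do
    if [ "$i" -lt "$parts" ]; then
        head -n "$per"          # next $per lines from the shared input
    else
        cat                     # last part takes everything that remains
    fi >"$tmp/$(printf '%02d' "$i").split"
    i=$((i + 1))
done <"$tmp/infile"

# Show the last line of each part.
tail -n1 "$tmp"/*.split | grep .
```

As in the original loop, the final part absorbs the remainder lines left over by the integer division.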

Method 3

You could probably accomplish this with the help of head and tail command combinations as below.

head -n{to_line_number} logfile | tail -n+{from_line_number} > newfile

Replace the from_line_number and to_line_number with the line numbers you desire.

Testing

cat logfile
This is first line.
second
Third
fourth
fifth
sixth
seventh
eighth
ninth
tenth

## I use the command below to extract lines 4 through 10, then print the result.

head -n10 logfile | tail -n+4 > newfile
cat newfile
fourth
fifth
sixth
seventh
eighth
ninth
tenth


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
