I tried my luck with grep and sed, but somehow I can't get it right.
I have a log file which is about 8 GB in size. I need to analyze a 15-minute time period of suspicious activity. I located the part of the log file that I need to look at, and I am trying to extract those lines and save them into a separate file. How would I do that on a regular CentOS machine?
My last try was this, but it didn't work. I am at a loss when it comes to sed and those types of commands.
sed -n '2762818,2853648w /var/log/output.txt' /var/log/logfile
Answers:
Method 1
sed -n '2762818,2853648p' /var/log/logfile > /var/log/output.txt
-n suppresses sed's default printing of every line, and p prints only the lines in the given range.
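One caveat with the command above is that sed keeps reading to the end of the file even after the last wanted line has been printed. A minimal sketch of the same extraction with an early quit follows; the 100-line demo file and the variable names (demo, first, last, extract) are assumptions made for illustration, not part of the original answer.

```shell
# Hypothetical demo file: 100 lines, each holding its own line number.
demo=$(mktemp)
seq 100 >"$demo"

# Illustrative range; substitute the real line numbers for a real log.
first=40
last=45

# Print lines first..last, then quit at line "last" so sed stops reading.
extract=$(sed -n "${first},${last}p;${last}q" "$demo")
printf '%s\n' "$extract"

rm -f "$demo"
```

On an 8 GB log this should spare sed from scanning everything after the range, though it still has to read every line before it.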
Method 2
Probably the best way to do this is with shell redirection, as others have mentioned. sed though, while a personal favorite, is probably not going to do this more efficiently than will head – which is designed to grab only so many lines from a file.
There are other answers on this site which report that for large files head -n[num] | tail -n[num] typically outperforms sed, but probably even faster than that is to eschew the pipe altogether.
I created a test file of 5 million lines, each holding its own line number, like:
echo | dd cbs=5000000 conv=block | tr ' ' '\n' | sed -n '=' >/tmp/5mil_lines
And I ran it through:
{ head -n "$((ignore=2762817))" >&2
head -n "$((2853648-ignore))"
} </tmp/5mil_lines 2>/dev/null |
sed -n '1p;$p'
I only used sed at all there to grab only the first and last line to show you…
2762818
2853648
This works because when you group commands with { ... ; } and redirect the input for the group like ... ; } <input, all of them share the same input. Most commands exhaust the whole infile while reading it, so in a { cmd1 ; cmd2; } <infile case cmd1 usually reads from the head of the infile to its tail and cmd2 is left with nothing.
head, however, reads only as far into its infile as it is instructed to (provided the infile is a regular, seekable file, it leaves the read position just past the last line it printed), and so in a…
{ head -n [num] >/dev/null
head -n [num]
} <infile
…case the first seeks through to [num] and dumps its output to /dev/null and the second is left to begin its read where the first left it.
You can do…
{ head -n "$((ignore=2762817))" >/dev/null
head -n "$((2853648-ignore))" >/path/to/outfile
} <infile
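The grouped-head technique can be tried on a small file before running it against a multi-gigabyte log. The sketch below assumes a head implementation that repositions the shared file offset after the last line it prints (GNU coreutils head does this for regular files); the demo file and the variable names are hypothetical.

```shell
# Hypothetical demo file: 100 lines, each holding its own line number.
demo=$(mktemp)
seq 100 >"$demo"

# Illustrative range; substitute the real line numbers for a real log.
first=40
last=45

# Both head calls share one open file description, so the second one
# starts reading where the first one stopped.
range=$(
  {
    head -n "$((first - 1))" >/dev/null   # skip lines 1..39
    head -n "$((last - first + 1))"       # emit lines 40..45
  } <"$demo"
)
printf '%s\n' "$range"

rm -f "$demo"
```

Note that this depends on the input being a seekable file. If the group reads from a pipe instead, the first head may consume more than it prints, and the second would then start at the wrong place.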
This construct also works with other kinds of compound commands. For example:
set "$((n=2762817))" "$((2853648-n))"
for n do head "-n$n" >&"$#"; shift
done </tmp/5mil_lines 2>/dev/null |
sed -n '1p;$p'
…which prints…
2762818
2853648
But it might also work like:
d=$((( n=$(wc -l </tmp/5mil_lines))/43 )) &&
until [ "$(((n-=d)>=(!(s=143-n/d))))" -eq 0 ] &&
head "-n$d" >>"/tmp/${s#1}.split"
do head "-n$d" > "/tmp/${s#1}.split" || ! break
done </tmp/5mil_lines
Above, the shell initially sets the $n and $d variables to:
- $n: the line count as reported by wc for my test file /tmp/5mil_lines
- $d: the quotient of $n/43, where 43 is just some arbitrarily selected divisor.
It then loops, decrementing $n by $d on each pass, until $n has dropped to a value less than $d. While doing so it saves its split count in $s and uses that value to name the output file /tmp/[num].split for each pass. The result is that each iteration reads an equal number of newline-delimited lines from the infile into a new outfile, splitting it into 43 equal parts over the course of the loop (the few leftover lines are appended to the last file). It manages this without reading its infile more than twice: once when wc counts its lines, and once more as head reads only as many lines as it writes to each outfile.
After running it I checked my results like…
tail -n1 /tmp/*split | grep .
OUTPUT:
==> /tmp/01.split <==
116279
==> /tmp/02.split <==
232558
==> /tmp/03.split <==
348837
==> /tmp/04.split <==
465116
==> /tmp/05.split <==
581395
==> /tmp/06.split <==
697674
==> /tmp/07.split <==
813953
==> /tmp/08.split <==
930232
==> /tmp/09.split <==
1046511
==> /tmp/10.split <==
1162790
==> /tmp/11.split <==
1279069
==> /tmp/12.split <==
1395348
==> /tmp/13.split <==
1511627
==> /tmp/14.split <==
1627906
==> /tmp/15.split <==
1744185
==> /tmp/16.split <==
1860464
==> /tmp/17.split <==
1976743
==> /tmp/18.split <==
2093022
==> /tmp/19.split <==
2209301
==> /tmp/20.split <==
2325580
==> /tmp/21.split <==
2441859
==> /tmp/22.split <==
2558138
==> /tmp/23.split <==
2674417
==> /tmp/24.split <==
2790696
==> /tmp/25.split <==
2906975
==> /tmp/26.split <==
3023254
==> /tmp/27.split <==
3139533
==> /tmp/28.split <==
3255812
==> /tmp/29.split <==
3372091
==> /tmp/30.split <==
3488370
==> /tmp/31.split <==
3604649
==> /tmp/32.split <==
3720928
==> /tmp/33.split <==
3837207
==> /tmp/34.split <==
3953486
==> /tmp/35.split <==
4069765
==> /tmp/36.split <==
4186044
==> /tmp/37.split <==
4302323
==> /tmp/38.split <==
4418602
==> /tmp/39.split <==
4534881
==> /tmp/40.split <==
4651160
==> /tmp/41.split <==
4767439
==> /tmp/42.split <==
4883718
==> /tmp/43.split <==
5000000
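The splitting loop in Method 2 is quite dense, so here is a more explicit sketch of the same idea on a tiny file: several head calls share one redirected input, and each continues where the previous one stopped. This is not the author's loop; the file names, the variable names, and the choice of 3 parts are all assumptions for illustration, and it relies on a head that repositions the file offset (as GNU head does for regular files).

```shell
# Hypothetical demo: split a 10-line file into 3 consecutive parts.
demo=$(mktemp)
outdir=$(mktemp -d)
seq 10 >"$demo"

parts=3
total=$(wc -l <"$demo")      # first pass over the file: count lines
chunk=$((total / parts))     # 10 / 3 = 3 lines per part

{
  i=1
  while [ "$i" -le "$parts" ]; do
    if [ "$i" -lt "$parts" ]; then
      head -n "$chunk" >"$outdir/$i.split"   # next $chunk lines
    else
      cat >"$outdir/$i.split"                # last part takes the remainder
    fi
    i=$((i + 1))
  done
} <"$demo"                   # second pass: each line is read once

# Record the last line of each part, then clean up.
end1=$(tail -n 1 "$outdir/1.split")
end2=$(tail -n 1 "$outdir/2.split")
end3=$(tail -n 1 "$outdir/3.split")
printf '%s\n' "$end1" "$end2" "$end3"

rm -rf "$demo" "$outdir"
```

As in the original loop, the file is read only twice in total, and the leftover lines from the integer division end up in the last part.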
Method 3
You could probably accomplish this with the help of head and tail command combinations as below.
head -n{to_line_number} logfile | tail -n+{from_line_number} > newfile
Replace the from_line_number and to_line_number with the line numbers you desire.
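The head and tail combination above can be wrapped in a small helper so the two line numbers are passed as arguments. This is only a sketch: extract_range is a hypothetical function name, and the 20-line demo file is made up for the example.

```shell
# extract_range FROM TO FILE: print lines FROM..TO of FILE.
# (Hypothetical helper name, not part of the original answer.)
extract_range() {
  # head keeps lines 1..TO; tail then drops everything before FROM.
  head -n "$2" "$3" | tail -n "+$1"
}

# Hypothetical demo file: 20 lines, each holding its own line number.
demo=$(mktemp)
seq 20 >"$demo"

result=$(extract_range 4 10 "$demo")
printf '%s\n' "$result"

rm -f "$demo"
```

For the log in the question, the call would look like extract_range 2762818 2853648 /var/log/logfile > /var/log/output.txt.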
Testing
cat logfile
This is first line.
second
Third
fourth
fifth
sixth
seventh
eighth
ninth
tenth
## I use the command as below. I extract from the 4th line to the 10th line.
head -n10 logfile | tail -n+4 > newfile
## newfile then contains:
fourth
fifth
sixth
seventh
eighth
ninth
tenth
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.