I want to extract a specific line from a very big file. For example, I would like to get line 8000 with something like this:
command -line 8000 > output_line_8000.txt
Answers:
Thank you for visiting the Q&A section on Magenaut.
Method 1
There’s already an answer with perl and awk. Here’s a sed answer:
sed -n '8000{p;q}' file
The advantage of the q command is that sed quits as soon as it has read line 8000 instead of scanning the rest of the file (unlike the other perl and awk methods, at least as they were first posted; those answers were later edited to exit early too).
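To pass the line number in a shell variable instead of hard-coding 8000, the same sed script can be built inside double quotes. This is a minimal sketch; the function name print_line is hypothetical, and the extra ';' before '}' is only there because some non-GNU seds require it.

```shell
#!/usr/bin/env bash
# print_line N FILE: hypothetical helper that prints line N of FILE with sed.
print_line() {
  local n=$1 file=$2
  # "${n}{p;q;}" expands to e.g. 8000{p;q;}: print line n, then quit immediately
  sed -n "${n}{p;q;}" "$file"
}
```

Usage: `print_line 8000 file > output_line_8000.txt`. If the file has fewer than N lines, nothing is printed.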
A pure Bash alternative (Bash 4 or later):
mapfile -s 7999 -n 1 ary < file
printf '%s' "${ary[0]}"
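This reads the contents of file into the array ary (one line per element), but skips the first 7999 lines (-s 7999) and reads only one line (-n 1). Because -t is not used, the stored line keeps its trailing newline, which is why printf '%s' needs no \n.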
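The same idea wrapped in a function, as a sketch: print_line_bash is a hypothetical name, and -t is added here (an assumption, not in the answer above) so the newline is stripped and printf adds it back. mapfile still reads and discards the skipped lines; it does not seek.

```shell
#!/usr/bin/env bash
# Requires bash >= 4 for mapfile.
# print_line_bash N FILE: hypothetical helper that prints line N using only builtins.
print_line_bash() {
  local n=$1 file=$2
  local -a ary
  # -s: skip the first n-1 lines; -n 1: read one line; -t: strip the trailing newline
  mapfile -t -s "$((n - 1))" -n 1 ary < "$file"
  # Print only if the file actually had at least n lines
  if (( ${#ary[@]} )); then
    printf '%s\n' "${ary[0]}"
  fi
}
```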
Method 2
It’s Saturday and I had nothing better to do, so I tested some of these for speed. It turns out that the sed, gawk and perl approaches are basically equivalent. The head & tail one is the slowest but, surprisingly, the fastest by an order of magnitude is the pure bash one:
Here are my tests (each number is the average wall-clock time in seconds over 100 runs):
$ for i in {1..5000000}; do echo "This is line $i" >>file; done
The above creates a file with 5 million lines, which occupies about 100M.
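Appending with >>file inside the loop reopens the file once per line, which makes generating the test file slow. A sketch that writes it in one pass with awk, wrapped in a hypothetical make_test_file helper (the name and argument order are assumptions):

```shell
#!/usr/bin/env bash
# make_test_file COUNT PATH: write COUNT lines "This is line N" to PATH.
# Hypothetical helper; awk runs the loop and the output file is opened only once.
make_test_file() {
  local count=$1 path=$2
  awk -v count="$count" 'BEGIN { for (i = 1; i <= count; i++) print "This is line " i }' > "$path"
}
```

Usage: `make_test_file 5000000 file` produces the same content as the loop above.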
$ for cmd in "sed -n '8000{p;q}' file" \
             "perl -ne 'print && exit if $. == 8000' file" \
             "awk 'FNR==8000 {print;exit}' file" \
             "head -n 8000 file | tail -n 1" \
             "mapfile -s 7999 -n 1 ary < file; printf '%s' \"\${ary[0]}\"" \
             "tail -n 8001 file | head -n 1"; do
    echo "$cmd"
    for i in {1..100}; do
      # time writes "real 0mX.XXXs" to stderr; keep only the seconds
      (time eval "$cmd") 2>&1 | grep -oP 'real.*?m\K[\d.]+'
    done | awk '{k+=$1} END {print k/100}'   # average over the 100 runs
  done
sed -n '8000{p;q}' file
0.04502
perl -ne 'print && exit if $. == 8000' file
0.04698
awk 'FNR==8000 {print;exit}' file
0.04647
head -n 8000 file | tail -n 1
0.06842
mapfile -s 7999 -n 1 ary < file; printf '%s' "${ary[0]}"
0.00137
tail -n 8001 file | head -n 1
0.0033
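Before comparing timings it is worth checking that each command prints the same line. tail -n 8001 (without a +) counts from the end of the file, so the last command above returns a different line than the others; GNU tail can also seek to the end of a regular file, which helps explain its speed. A minimal sketch of such a check; the 20000-line temporary file and the m_* function names are assumptions, and perl is left out only to keep the check dependency-free:

```shell
#!/usr/bin/env bash
# Check that each "print line n" command returns the same line.
n=8000
file=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 20000; i++) print "This is line " i }' > "$file"

m_sed()       { sed -n "${n}{p;q;}" "$file"; }
m_awk()       { awk -v n="$n" 'NR == n { print; exit }' "$file"; }
m_head_tail() { head -n "$n" "$file" | tail -n 1; }
m_mapfile()   { local -a ary; mapfile -t -s "$((n - 1))" -n 1 ary < "$file"; printf '%s\n' "${ary[0]}"; }
m_tail_plus() { tail -n "+$n" "$file" | head -n 1; }         # +n: start AT line n
m_tail_end()  { tail -n "$((n + 1))" "$file" | head -n 1; }  # no +: counts from the END

expected="This is line $n"
mismatches=()
for m in m_sed m_awk m_head_tail m_mapfile m_tail_plus m_tail_end; do
  out=$("$m")
  if [[ $out == "$expected" ]]; then
    printf '%-12s ok\n' "$m"
  else
    printf '%-12s MISMATCH: %s\n' "$m" "$out"
    mismatches+=("$m")
  fi
done
rm -f "$file"
```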
Method 3
There are many ways to do it.
Using perl:
perl -nle 'print && exit if $. == 8000' file
Using awk:
awk 'FNR==8000 {print;exit}' file
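When the line number comes from a shell variable, awk's -v option passes it in without splicing it into the program text. A sketch with a hypothetical print_line_awk helper:

```shell
#!/usr/bin/env bash
# print_line_awk N FILE: hypothetical helper; -v sets the awk variable n from $1,
# so the awk program stays single-quoted and needs no shell interpolation.
print_line_awk() {
  awk -v n="$1" 'FNR == n { print; exit }' "$2"
}
```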
Or you can use tail and head; once head has printed the first line and exited, tail stops instead of reading the file to the end:
tail -n +8000 file | head -n 1
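The + in tail -n +8000 matters: with it, tail starts at that line number; without it, tail prints the last N lines. A small sketch on an assumed 10-line sample file:

```shell
#!/usr/bin/env bash
# Contrast "tail -n +N" (start at line N) with "tail -n N" (last N lines).
sample=$(mktemp)
seq 1 10 > "$sample"

from_start=$(tail -n +4 "$sample" | head -n 1)   # line 4
from_end=$(tail -n 4 "$sample" | head -n 1)      # line 7 (4th from the end)

printf 'tail -n +4 | head -n 1 -> %s\n' "$from_start"
printf 'tail -n 4  | head -n 1 -> %s\n' "$from_end"
rm -f "$sample"
```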
Method 4
Another version with head and tail:
head -n 8000 file | tail -n 1
Method 5
You could use sed:
sed -n '8000p;' filename
If the file is large, it is better to quit right after that line:
sed -n '8000p;8001q' filename
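A parameterized sketch of the same command, with n and the five-line sample file as illustrative assumptions. This form reads one line past the target before quitting, whereas the '8000{p;q}' form quits on the target line itself.

```shell
#!/usr/bin/env bash
# Parameterized form of sed -n '8000p;8001q': print line n, quit on line n+1.
n=3
sample=$(mktemp)
printf '%s\n' one two three four five > "$sample"
line=$(sed -n "${n}p;$((n + 1))q" "$sample")
printf '%s\n' "$line"
rm -f "$sample"
```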
Similarly, you can stop reading early with awk or perl:
awk 'NR==8000{print;exit}' filename
perl -ne 'print if $.==8000; last if $.==8000' filename
Method 6
How about this?
$ cat -n filename | grep -E "^[[:space:]]*8000[[:space:]]"
Example
$ cat -n /etc/abrt/plugins/CCpp.conf | grep -E "^[[:space:]]*16[[:space:]]"
16 #DebuginfoLocation = /var/cache/abrt-di
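A variation on the same numbering idea, sketched with a hypothetical print_line_grep helper: grep -n '' prefixes every line with "NUMBER:", so anchoring on "^N:" matches only line N (not 80001 or 18000), -m 1 stops at the first match, and cut removes the prefix while keeping any colons in the line itself.

```shell
#!/usr/bin/env bash
# print_line_grep N FILE: hypothetical helper in the spirit of "cat -n | grep".
print_line_grep() {
  # '' matches every line; -n adds the "NUMBER:" prefix that the second grep anchors on
  grep -n '' "$2" | grep -m 1 "^$1:" | cut -d: -f2-
}
```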
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.