I’m comparing the following
tail -n 1000000 stdout.log | grep -c '"success": true' tail -n 1000000 stdout.log | grep -c '"success": false'
with the following
log=$(tail -n 1000000 stdout.log) echo "$log" | grep -c '"success": true' echo "$log" | grep -c '"success": false'
and surprisingly the second takes almost 3 times longer than the first. It should be faster, shouldn’t it?
Answers:
Thank you for visiting the Q&A section on Magenaut. Please note that all the answers may not help you solve the issue immediately. So please treat them as advisements. If you found the post helpful (or not), leave a comment & I’ll get back to you as soon as possible.
Method 1
I’ve done the following test and on my system the resulting difference is about 100 times longer for the second script.
My file is a strace output called bigfile
$ wc -l bigfile.log 1617000 bigfile.log
Scripts
<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="027a766b636c42616e636477686b77">[email protected]</a>:~/tmp$ cat p1.sh tail -n 1000000 bigfile.log | grep '"success": true' | wc -l tail -n 1000000 bigfile.log | grep '"success": false' | wc -l <a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="cbb3bfa2aaa58ba8a7aaadbea1a2be">[email protected]</a>:~/tmp$ cat p2.sh log=$(tail -n 1000000 bigfile.log) echo "$log" | grep '"success": true' | wc -l echo "$log" | grep '"success": true' | wc -l
I don’t actually have any matches for the grep so nothing is written to the last pipe through to wc -l
Here are the timings:
<a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="98e0ecf1f9f6d8fbf4f9feedf2f1ed">[email protected]</a>:~/tmp$ time bash p1.sh 0 0 real 0m0.381s user 0m0.248s sys 0m0.280s <a href="https://getridbug.com/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="a0d8d4c9c1cee0c3ccc1c6d5cac9d5">[email protected]</a>:~/tmp$ time bash p2.sh 0 0 real 0m46.060s user 0m43.903s sys 0m2.176s
So I ran the two scripts again via the strace command
strace -cfo p1.strace bash p1.sh strace -cfo p2.strace bash p2.sh
Here are the results from the traces:
$ cat p1.strace % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 97.24 0.508109 63514 8 2 waitpid 1.61 0.008388 0 84569 read 1.08 0.005659 0 42448 write 0.06 0.000328 0 21233 _llseek 0.00 0.000024 0 204 146 stat64 0.00 0.000017 0 137 fstat64 0.00 0.000000 0 283 149 open 0.00 0.000000 0 180 8 close ... 0.00 0.000000 0 162 mmap2 0.00 0.000000 0 29 getuid32 0.00 0.000000 0 29 getgid32 0.00 0.000000 0 29 geteuid32 0.00 0.000000 0 29 getegid32 0.00 0.000000 0 3 1 fcntl64 0.00 0.000000 0 7 set_thread_area ------ ----------- ----------- --------- --------- ---------------- 100.00 0.522525 149618 332 total
And p2.strace
$ cat p2.strace % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 75.27 1.336886 133689 10 3 waitpid 13.36 0.237266 11 21231 write 4.65 0.082527 1115 74 brk 2.48 0.044000 7333 6 execve 2.31 0.040998 5857 7 clone 1.91 0.033965 0 705681 read 0.02 0.000376 0 10619 _llseek 0.00 0.000000 0 248 132 open ... 0.00 0.000000 0 141 mmap2 0.00 0.000000 0 176 126 stat64 0.00 0.000000 0 118 fstat64 0.00 0.000000 0 25 getuid32 0.00 0.000000 0 25 getgid32 0.00 0.000000 0 25 geteuid32 0.00 0.000000 0 25 getegid32 0.00 0.000000 0 3 1 fcntl64 0.00 0.000000 0 6 set_thread_area ------ ----------- ----------- --------- --------- ---------------- 100.00 1.776018 738827 293 total
Analysis
Not surprisingly, in both cases most of the time is spent waiting for a process to complete, but p2 waits 2.63 times longer than p1, and as others have mentioned, you are starting late in p2.sh.
So now forget about the waitpid, ignore the % column and look at the seconds column on both traces.
Largest time p1 spends most of its time in read probably understandably, because there’s a large file to read, but p2 spends 28.82 times longer in read than p1 does. – bash is not expecting to read such a large file into a variable and is probably reading buffer at a time, splitting into lines and and then getting another.
read count p2 is 705k vs 84k for p1, each read requiring a context switch into kernel space and out again. Nearly 10 times the number of reads and context switches.
Time in write p2 spends 41.93 times longer in write than p1
write count p1 does more writes than p2, 42k vs 21k, however they are much faster.
Probably because of the echo of lines into grep as opposed to tail writing buffers.
Further more, p2 spends more time in write than it does in read, p1 is the other way round!
Other factor Look at the number of brk system calls : p2 spends 2.42 times longer breaking than it does reading! In p1 (it doesn’t even register). brk is when the program needs to expand its address space because enough wasn’t allocated initially, this is probably due to bash having to read that file into the variable, and not expecting it to be that large, and as @scai mentioned, if the file gets too large, even that would not work.
tail is probably quite an efficient file reader, because this is what it was designed to do, it probably memmaps the file and scans for line breaks, thus allowing the kernel to optimise the i/o. bash doesn’t isn’t as good both on time spent reading and writing.
p2 spends 44ms and 41ms in clone and execv it’s not a measurable amount for p1. Probably bash reading and creating the variable from tail.
Finally the Totals p1 executes ~ 150k system calls vs p2 740k (4.93 times greater).
Eliminating waitpid, p1 spends 0.014416 seconds executing system calls, p2 0.439132 seconds (30 times longer).
So it appears p2 spends most of the time in user space doing nothing except waiting for the system calls to complete and the kernel to regorganise memory, p1 performs more writes, but is more efficient and causes significantly less system load, and is therefore faster.
Conclusion
I would never try to worry about coding through memory when writing a bash script, that does not mean to say that you do not try to be efficient.
tail is designed to do what it does, it probably memory maps the file so that it is efficient to read and allows the kernel to optimise the i/o.
A better way to optimise your problem might be to first grep for ‘”success”: ‘ lines and then count the trues and falses, grep has a count option which again avoids the wc -l, or even better still, pipe the tail through to awk and count trues and falses concurrently. p2 not only takes long but adds load to the system whilst memory is being shuffled about with brks.
Method 2
On the one hand, the first method calls tail twice, so it has to do more work than the second method which only does this once. On the other hand, the second method has to copy the data into the shell and then back out, so it has to do more work than the first version where tail is directly piped into grep. The first method has an extra advantage on a multi-processor machine: grep can work in parallel with tail, whereas the second method is strictly serialized, first tail, then grep.
So there’s no obvious reason why one should be faster than the other.
If you want to see what’s going on, look at what system calls the shell makes. Try with different shells, too.
strace -t -f -o 1.strace sh -c ' tail -n 1000000 stdout.log | grep ""success": true" | wc -l; tail -n 1000000 stdout.log | grep ""success": false" | wc -l' strace -t -f -o 2-bash.strace bash -c ' log=$(tail -n 1000000 stdout.log); echo "$log" | grep ""success": true" | wc -l; echo "$log" | grep ""success": true" | wc -l' strace -t -f -o 2-zsh.strace zsh -c ' log=$(tail -n 1000000 stdout.log); echo "$log" | grep ""success": true" | wc -l; echo "$log" | grep ""success": true" | wc -l'
With method 1, the main stages are:
tailreads and seek to find its starting point.tailwrites 4096-byte chunks whichgrepreads as fast as they’re produced.- Repeat the previous step for the second search string.
With method 2, the main stages are:
tailreads and seek to find its starting point.tailwrites 4096-byte chunks which bash reads 128 bytes at a time, and zsh reads 4096 bytes at a time.- Bash or zsh writes 4096-byte chunks which
grepreads as fast as they’re produced. - Repeat the previous step for the second search string.
Bash’s 128-byte chunks when reading the output of the command substitution slows it down significantly; zsh comes out about as fast as method 1 for me. Your mileage may vary depending on the CPU type and number, scheduler configuration, versions of the tools involved, and size of the data.
Method 3
Actually the first solution reads the file into memory, too! This is called caching and is automatically done by the operating system.
And as already correctly explained by mikeserv the first solution exectutes grep while the file is being read whereas the second solution executes it after the file has been read by tail.
So the first solution is faster because of various optimizations. But this doesn’t always have to be true. For really large files which the OS decides not to cache the second solution could become faster. But note that for even larger files which won’t fit into your memory the second solution won’t work at all.
Method 4
I think the main difference is very simply that echo is slow. Consider this:
$ time (tail -n 1000000 foo | grep 'true' | wc -l;
tail -n 1000000 foo | grep 'false' | wc -l;)
666666
333333
real 0m0.999s
user 0m1.056s
sys 0m0.136s
$ time (log=$(tail -n 1000000 foo); echo "$log" | grep 'true' | wc -l;
echo "$log" | grep 'false' | wc -l)
666666
333333
real 0m4.132s
user 0m3.876s
sys 0m0.468s
$ time (tail -n 1000000 foo > bb; grep 'true' bb | wc -l;
grep 'false' bb | wc -l)
666666
333333
real 0m0.568s
user 0m0.512s
sys 0m0.092s
As you can see above, the time consuming step is printing the data. If you simply redirect to a new file and grep through that it is much faster when only reading the file once.
And as requested, with a here string:
$ time (log=$(tail -n 1000000 foo); grep 'true' <<< $log | wc -l;
grep 'false' <<< $log | wc -l )
1
1
real 0m7.574s
user 0m7.092s
sys 0m0.516s
This one is even slower, presumably because the here string is concatenating all data onto one long line and that will slow down the grep:
$ tail -n 1000000 foo | (time grep -c 'true') 666666 real 0m0.500s user 0m0.472s sys 0m0.000s $ tail -n 1000000 foo | perl -pe 's/n/ /' | (time grep -c 'true') 1 real 0m1.053s user 0m0.048s sys 0m0.068s
If the variable is quoted so that no splitting occurs, things are a little faster:
$ time (log=$(tail -n 1000000 foo); grep 'true' <<< "$log" | wc -l;
grep 'false' <<< "$log" | wc -l )
666666
333333
real 0m6.545s
user 0m6.060s
sys 0m0.548s
But still slow because the rate limiting step is printing the data.
Method 5
I had a go at this as well… First, I built the file:
printf '"success": "true"
"success": "true"
"success": "false"
%.0b' `seq 1 500000` >|/tmp/log
If you run the above yourself, you should come up with 1.5million lines in /tmp/log with 2:1 ratio of "success": "true" lines to "success": "false" lines.
The next thing I did was run some tests. I ran all of the tests through a proxy sh so time would only have to watch a single process – and could therefore show a single result for the entire job.
This seems to be the fastest, even though it adds a second file descriptor and tee, though I think I can explain why:
time sh <<-CMD
. <<HD /dev/stdin | grep '"success": "true"' | wc -l
tail -n 1000000 /tmp/log | { tee /dev/fd/3 |
grep '"success": "false"' |
wc -l 1>&2 & } 3>&1 &
HD
CMD
666666
333334
sh <<<'' 0.11s user 0.08s system 84% cpu 0.224 total
Here’s your first:
time sh <<CMD
tail -n 1000000 /tmp/log | grep '"success": "true"' | wc -l
tail -n 1000000 /tmp/log | grep '"success": "false"' | wc -l
CMD
666666
333334
sh <<<'' 0.31s user 0.17s system 148% cpu 0.323 total
And your second:
time sh <<CMD
log=$(tail -n 1000000 /tmp/log)
echo "$log" | grep '"success": "true"' | wc -l
echo "$log" | grep '"success": "false"' | wc -l
CMD
666666
333334
sh <<<'' 2.12s user 0.46s system 108% cpu 2.381 total
You can see that in my tests there was more than a 3*difference in speed when reading it into a variable as you did.
I think part of that is that a shell variable has to be split and handled by the shell when it’s being read – it’s not a file.
A here-document on the other hand, for all intents and purposes, is a file – a file descriptor, anyway. And as we all know – Unix works with files.
What’s most interesting to me about here-docs is that you can manipulate their file-descriptors – as a straight |pipe– and execute them. This is very handy as it allows you a little more freedom in pointing your |pipe where you’d like.
I had to tee the tail because the first grep eats the here-doc |pipe and there’s nothing left for the second to read. But since I |piped it into /dev/fd/3 and picked it up again to pass to >&1 stdout, it didn’t much matter. If you use grep -c as many others recommend:
time sh <<-CMD
. <<HD /dev/stdin | grep -c '"success": "true"'
tail -n 1000000 /tmp/log | { tee /dev/fd/3 |
grep -c '"success": "false"' 1>&2 & } 3>&1 &
HD
CMD
666666
333334
sh <<<'' 0.07s user 0.04s system 62% cpu 0.175 total
It’s even faster.
But when I run it without . sourcing the heredoc I can’t successfully background the first process to run them fully concurrently. Here it is without fully backgrounding it:
time sh <<CMD
tail -n 1000000 /tmp/log | { tee /dev/fd/3 |
grep -c '"success": "true"' 1>&2 & } 3>&1 |
grep -c '"success": "false"'
CMD
666666
333334
sh <<<'' 0.10s user 0.08s system 109% cpu 0.165 total
But when I add the &:
time sh <<CMD
tail -n 1000000 /tmp/log | { tee /dev/fd/3 |
grep -c '"success": "true"' 1>&2 & } 3>&1 & |
grep -c '"success": "false"'
CMD
sh: line 2: syntax error near unexpected token `|'
Still, the difference appears to be only a few hundredths of a second, at least for me, so take it as you will.
Anyway, the reason it runs faster with tee is because both greps run at the same time with only one invocation of tail. tee duplicates the file for us and splits it off to the second grep process all in-stream – everything runs at once from beginning to end, so they all finish up around the same time, too.
So going back to your first example:
tail | grep | wc #wait til finished
tail | grep | wc #now we're done
And your second:
var=$( tail ) ; #wait til finished
echo | grep | wc #wait til finished
echo | grep | wc #now we're done
But when we split our input and run our processes concurrently:
3>&1 | grep #now we're done
/
tail | tee #both process together
>&1 | grep #now we're done
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0