I was wondering if there are general guidelines for optimizing Bash scripts.
- For example, it is more convenient to write loops than lines of commands, but is it also faster to process for the system? Example:

      for i in a b c; do echo $i; done

      echo a
      echo b
      echo c
- Sometimes people present different solutions for the same problem. For example, sed, cut, awk, and echo are all able to strip characters from a string. I was wondering if you can say that the fewer characters the code has, the faster it is, if you use:

  - the same command, e.g.

        STRING=abc.def
        echo ${STRING} | sed 's/.def//g'
        echo ${STRING} | sed '$s/....$//'

  - different commands, e.g.

        STRING=abc.def
        echo ${STRING} | cut -d . -f 1
        echo ${STRING} | sed 's/.def//g'

Answers:
Method 1
The first rule of optimization is: don’t optimize. Test first. If the tests show that your program is too slow, look for possible optimizations.
The only way to be sure is to benchmark for your use case. There are some general rules, but they only apply for typical volumes of data in typical applications.
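As a minimal sketch of what benchmarking for your use case might look like, the helper below runs a command a fixed number of times so that the whole batch can be timed. The names run_n, strip_with_sed, and strip_with_expansion are hypothetical and introduced only for illustration; the two candidates are modelled on the question's example, and any iteration count you pick is an assumption that should reflect your real workload.

```shell
# run_n COUNT COMMAND [ARGS...]
# Hypothetical helper: run COMMAND COUNT times, discarding its output,
# so that the total run time of the batch can be measured from outside.
run_n() {
    run_n_left=$1
    shift
    while [ "$run_n_left" -gt 0 ]; do
        "$@" >/dev/null
        run_n_left=$((run_n_left - 1))
    done
}

# Two candidate ways to drop a trailing ".def" (illustrative only).
strip_with_sed() { printf '%s\n' "$1" | sed 's/\.def$//'; }
strip_with_expansion() { printf '%s\n' "${1%.def}"; }

# Sanity check first: a timing comparison only means something if
# both candidates produce the same result.
[ "$(strip_with_sed abc.def)" = "$(strip_with_expansion abc.def)" ] &&
    echo "candidates agree"
```

Under bash, each candidate can then be timed with the time keyword, for example time run_n 1000 strip_with_sed abc.def followed by time run_n 1000 strip_with_expansion abc.def. The absolute numbers depend on the machine, so only the comparison on your own system is meaningful.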
Some general rules which may or may not be true in any particular circumstance:
- For internal processing in the shell, AT&T ksh is fastest. If you do a lot of string manipulations, use AT&T ksh. Dash comes second; bash, pdksh and zsh lag behind.
- If you need to invoke a shell frequently to perform a very short task each time, dash wins because of its low startup time.
- Starting an external process costs time, so it’s faster to have one pipeline with complex pieces than a pipeline in a loop.
- echo $foo is slower than echo "$foo", because with no double quotes, it splits $foo into words and interprets each word as a filename wildcard pattern. More importantly, that splitting and globbing behavior is rarely desired. So remember to always put double quotes around variable substitutions and command substitutions: "$foo", "$(foo)".
- Dedicated tools tend to win over general-purpose tools. For example, tools like cut or head can be emulated with sed, but sed will be slower and awk will be even slower. Shell string processing is slow, but for short strings it largely beats calling an external program.
- More advanced languages such as Perl, Python, and Ruby often let you write faster algorithms, but they have a significantly higher startup time so they’re only worth it for performance for large amounts of data.
- On Linux at least, pipes tend to be faster than temporary files.
- Most uses of shell scripting are around I/O-bound processes, so CPU consumption doesn’t matter.
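
The quoting rule in the list above can be seen directly by counting the arguments a command receives. This is only a small sketch: count_args and the scratch directory are hypothetical and exist just for the demonstration, which assumes the default IFS and that mktemp -d is available.

```shell
# count_args: hypothetical helper that prints how many arguments it received.
count_args() { echo "$#"; }

# Work in an empty scratch directory so the glob result is predictable.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch x.txt y.txt

foo='a  b *.txt'

# Unquoted: the value is split on whitespace into a, b and *.txt,
# and *.txt is then expanded to the matching file names.
unquoted=$(count_args $foo)

# Quoted: the value is passed through as a single argument, untouched.
quoted=$(count_args "$foo")

echo "unquoted: $unquoted arguments"
echo "quoted: $quoted argument"

# Return to the previous directory and clean up.
cd - >/dev/null && rm -rf "$scratch"
```

With the two files created above, the unquoted call receives four arguments (a, b, x.txt, y.txt) and the quoted call receives one. The extra splitting and globbing work is where the small speed difference comes from, but the change in behavior is the more important reason to quote.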
It’s rare that performance is a concern in shell scripts. The list above is purely indicative; it’s perfectly fine to use “slow” methods in most cases as the difference is often a fraction of a percent.
Usually the point of a shell script is to get something done fast. You have to gain a lot from optimization to justify spending extra minutes writing the script.
Method 2
Shells do not do any reorganization of the code they get handed; it is just interpreted one line after the other (nothing else makes much sense in a command interpreter). Much of the time spent by the shell goes to lexical analysis, parsing, and launching the programs called.
For simple operations (like the ones munging strings in the examples at the end of the question) I’d be surprised if the time to load the programs doesn’t swamp any minuscule speed differences.
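
For the string examples from the question, the program-loading cost can often be avoided altogether with the shell's own parameter expansion, which runs inside the shell process. A sketch using only standard POSIX expansions; the variable names other than STRING are illustrative:

```shell
STRING=abc.def

# ${var%pattern} removes the shortest match of pattern from the end.
no_suffix=${STRING%.def}     # drop a literal ".def" suffix

# ${var%%pattern} removes the longest match of pattern from the end.
first_field=${STRING%%.*}    # text before the first "." (like cut -d . -f 1)

# ${var#pattern} removes the shortest match of pattern from the start.
extension=${STRING#*.}       # text after the first "."

printf '%s\n' "$no_suffix" "$first_field" "$extension"
```

This prints abc, abc, and def on separate lines. Note that in the question's sed 's/.def//g' the unescaped . matches any character, whereas the expansion patterns above treat . literally, so the two are not exact equivalents for every input.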
The moral of the story is that if you really need more speed, you are better off with a (semi)compiled language like Perl or Python. Such a language is faster to run to start with, lets you write many of the operations mentioned directly instead of calling out to external programs, and still has the option to invoke external programs or call into optimized C (or whatever) modules to do much of the job. That is the reason why in Fedora the “system administration sugar” (GUIs, essentially) is written in Python: you can add a nice GUI with not too much effort, it is fast enough for such applications, and it has direct access to system calls. If that isn’t enough speed, grab C++ or C.
But do not go there unless you can prove that the performance gain is worth the loss in flexibility and the development time. Shell scripts are not too bad to read, but I shudder when I remember some scripts used to install Ultrix that I once tried to decipher. I gave up; too much “shell script optimization” had been applied.
Method 3
We’ll expand here on our globbing example above to illustrate some performance characteristics of the shell script interpreter. Comparing the bash and dash interpreters for this example, where a process is spawned for each of 30,000 files, shows that dash can fork the wc processes faster than bash (1.238s versus 1.422s of real time in the run below):

    bash-4.2$ time dash -c 'for i in *; do wc -l "$i"; done>/dev/null'
    real    0m1.238s
    user    0m0.309s
    sys     0m0.815s

    bash-4.2$ time bash -c 'for i in *; do wc -l "$i"; done>/dev/null'
    real    0m1.422s
    user    0m0.349s
    sys     0m0.940s

Comparing the base looping speed by not invoking the wc processes shows that dash’s looping is several times faster (0.375s versus 1.715s of real time in the run below):

    $ time bash -c 'for i in *; do echo "$i">/dev/null; done'
    real    0m1.715s
    user    0m1.459s
    sys     0m0.252s

    $ time dash -c 'for i in *; do echo "$i">/dev/null; done'
    real    0m0.375s
    user    0m0.169s
    sys     0m0.203s

The looping is still relatively slow in either shell as demonstrated previously, so for scalability we should try to use more functional techniques so that iteration is performed in compiled processes.

    $ time find -type f -print0 | wc -l --files0-from=- | tail -n1
    30000 total
    real    0m0.299s
    user    0m0.072s
    sys     0m0.221s

The above is by far the most efficient solution and illustrates well the point that one should do as little as possible in shell script and aim just to use it to connect the existing logic available in the rich set of utilities on a UNIX system.
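
A small self-contained version of the same comparison, with three files instead of 30,000, might look like the sketch below. The function names count_with_loop and count_with_find are hypothetical. The second one uses find -exec ... + with cat as a portable stand-in for the GNU-specific wc --files0-from option, so it is a variation on the technique, not the exact command shown above.

```shell
# Build a small scratch tree: three files with 1, 2 and 3 lines.
scratch=$(mktemp -d)
printf 'a\n'       > "$scratch/one"
printf 'a\nb\n'    > "$scratch/two"
printf 'a\nb\nc\n' > "$scratch/three"

# Shell loop: one wc process per file, with the summing done in the shell.
count_with_loop() {
    total=0
    for f in "$1"/*; do
        n=$(wc -l < "$f")
        total=$((total + n))
    done
    echo "$total"
}

# Single pipeline: find batches the file names into as few cat
# invocations as possible, and a single wc counts all the lines.
count_with_find() {
    find "$1" -type f -exec cat {} + | wc -l | tr -d ' '
}

loop_total=$(count_with_loop "$scratch")
find_total=$(count_with_find "$scratch")
echo "loop: $loop_total, find: $find_total"

rm -rf "$scratch"
```

Both functions report 6 lines here. With only three files the timing difference is not measurable; the point is that the number of processes in the loop grows with the number of files, while the pipeline stays at a handful.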
Stolen from “Common shell script mistakes” by Pádraig Brady.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.