I have a small script that loops through all files in a folder and runs a (usually long-running) command on each one. Basically it's:
for file in ./folder/*; do
    name=${file##*/}    # strip the ./folder/ prefix
    ./bin/myProgram "$file" > "./done/$name"
done
(Please ignore syntax errors; it's just pseudocode.)
Now I want to run this script twice at the same time. Obviously, there is no need to run the command if the corresponding file in ./done already exists. So I changed the script to:
for file in ./folder/*; do
    name=${file##*/}
    # skip files that already have output in ./done
    [ -f "./done/$name" ] || ./bin/myProgram "$file" > "./done/$name"
done
So basically the question is:
Is it possible that both scripts (or, in general, more than one script) reach the same point at the same time, both find that the done file does not exist yet, and so both run the command?
If not, that would be just perfect, but I highly doubt it. That would be too easy 😀
If it can happen that they process the same file, is it possible to somehow “synchronize” the scripts?
Answers:
Method 1
This is possible and does happen in practice. Use a lock file to avoid it. An example, from the page linked in the original answer:
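# mkdir either creates the directory or fails; two processes cannot both succeed
if mkdir /var/lock/mylock; then
    echo "Locking succeeded" >&2
else
    echo "Lock failed - exit" >&2
    exit 1
fi
# ... program code ...
rmdir /var/lock/mylock    # release the lock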
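The example above takes a single lock for the whole script, so a second instance would simply exit. To let both instances share the work, the same mkdir trick can be applied per input file. A minimal sketch, assuming the question's layout; the function name process_dir and its lock-directory argument are hypothetical. Lock directories are never removed, so they double as "claimed" markers (a file whose run failed stays claimed until its lock directory is deleted by hand):

```shell
# Hypothetical helper: process_dir SRC DONE LOCKS CMD...
# Runs CMD on every file in SRC, writing the result to DONE/<name>.
# Safe to run several times concurrently with the same LOCKS directory.
process_dir() {
  src=$1 done_dir=$2 locks=$3
  shift 3
  mkdir -p "$done_dir" "$locks" || return 1
  for file in "$src"/*; do
    [ -f "$file" ] || continue      # skip an unexpanded glob or non-files
    name=${file##*/}
    # mkdir is atomic: exactly one instance can create this directory,
    # so exactly one instance "claims" the input file.
    if mkdir "$locks/$name" 2>/dev/null; then
      "$@" "$file" > "$done_dir/$name"
    fi
  done
}
```

Usage: start `process_dir ./folder ./done ./locks ./bin/myProgram` in as many terminals (or background jobs) as you like.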
Method 2
The two instances of your script can certainly interact in this way, causing the command to run twice. This is called a race condition.
One way to avoid this race condition would be if each instance grabbed its input file by moving it to another directory. Moving a file (inside the same filesystem) is atomic. Moving the input files may not be desirable, and this is already getting a bit complicated.
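mkdir staging-$$ making-$$
for input in folder/*; do
  name=${input#folder/}
  staging=staging-$$/$name
  output=making-$$/$name
  destination=done/$name
  # mv within one filesystem is atomic: only one instance can grab the file
  if mv -- "$input" "$staging" 2>/dev/null; then
    # another instance may already have finished this file and put it back
    if [ ! -e "$destination" ]; then
      bin/myProgram "$staging" >"$output"
      mv -- "$output" "$destination"
    fi
    mv -- "$staging" "$input"   # restore the input file
  fi
done
rmdir staging-$$ making-$$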
A simple way to process the files in parallel using a widely-available tool is GNU make, using the -j flag for parallel execution. Here’s a makefile for this task (remember to use tabs to indent commands):
all: $(patsubst folder/%,done/%,$(wildcard folder/*))
done/%: folder/%
	./bin/myProgram $< >$@.tmp
	mv $@.tmp $@
Run make -j 3 to run 3 instances in parallel.
See also the question "Four tasks in parallel… how do I do that?"
Method 3
I have the feeling you are really trying to run multiple jobs in parallel and that the lock file is simply a means to an end.
If you have GNU Parallel (http://www.gnu.org/software/parallel/) installed, you can do this:
parallel ./bin/myProgram ::: ./folder/*
It runs myProgram on the files in parallel, by default one job per CPU core.
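As written, the command above sends every result to standard output instead of into ./done. GNU Parallel's replacement strings can rebuild the question's layout: {} is the input path and {/} its basename. A sketch wrapped in a hypothetical helper run_parallel, assuming the directory and command names contain no spaces or shell metacharacters (they are pasted into the command string unquoted):

```shell
# Hypothetical helper: run_parallel SRC DONE CMD
# Runs CMD on every file in SRC via GNU Parallel, one job per CPU core by default,
# writing each result to DONE/<basename>.
run_parallel() {
  src=$1 done_dir=$2 cmd=$3
  mkdir -p "$done_dir" || return 1
  # {} is replaced by the (quoted) input path and {/} by its basename;
  # the redirection is performed by the shell that parallel starts for each job.
  parallel "$cmd {} > $done_dir/{/}" ::: "$src"/*
}
```

Usage: `run_parallel ./folder ./done ./bin/myProgram`. Because a single parallel process hands out the files, no two jobs ever get the same input and no locking is needed.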
You can install GNU Parallel simply by:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel
chmod 755 parallel
cp parallel sem    # sem is GNU Parallel's semaphore mode (parallel --semaphore)
Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Method 4
The problem with locking is that you need an operation that creates the lock in one uninterruptible step (usually called atomic). As Chris wrote in his answer, mkdir is such an operation. Creating a file with a plain redirection such as > file is not, because it does not fail when the file already exists.
There is also a higher-level command, lockfile, usually shipped in the procmail package. It has some nice features and can easily be used in your own scripts without the need to "reinvent the wheel" (for instance, writing your own locking function based on directory creation).
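A sketch of per-file locking with lockfile, assuming procmail's lockfile is installed; the helper name process_with_lockfile and the hidden .NAME.lock naming are hypothetical. The -r 0 option sets the number of retries to zero, so lockfile gives up immediately instead of waiting. The done check happens only while the lock is held, so two instances cannot both decide that a file still needs processing:

```shell
# Hypothetical helper: process_with_lockfile SRC DONE CMD...
process_with_lockfile() {
  src=$1 done_dir=$2
  shift 2
  mkdir -p "$done_dir" || return 1
  for file in "$src"/*; do
    [ -f "$file" ] || continue          # skip an unexpanded glob or non-files
    name=${file##*/}
    lock=$done_dir/.$name.lock          # hidden, so later globs on DONE skip it
    # -r 0: no retries; fail at once if another instance holds the lock
    lockfile -r 0 "$lock" 2>/dev/null || continue
    # checked while holding the lock: skip files another instance already did
    if [ ! -e "$done_dir/$name" ]; then
      "$@" "$file" > "$done_dir/$name"
    fi
    rm -f "$lock"                       # release the lock
  done
}
```

Usage: run `process_with_lockfile ./folder ./done ./bin/myProgram` in two or more shells at once; each file is processed by exactly one of them.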
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.