Linux: Launch multiple instances of a script and wait for all of them to complete

In order to parallelize some work, I needed to start multiple instances of a script (with an instance number as parameter) and wait for all of them to complete before displaying some results. Here is how I implemented it:

echo "Starting myscript with $instances instances"
for i in $(seq 1 "$instances")
do
	./myscript.sh "$i" >> "$LOGS/myscript.$i.log" &
done

echo "Waiting for all $instances instances to complete"
ps -Af | grep myscript.sh | grep -v grep | awk ' { print $2; }' | xargs wait

echo "All $instances scripts completed"

It basically starts all the scripts in the background (with &).
I use ps -Af to get the list of all processes including the command line used to start them. You would also get the same results with ps -ef or with ps aux.
Then I grep my script name and use grep -v grep not to see my grep command (which also has the script name in its command line).
Extract the PID (the second column) from the list with awk.
Convert the vertical list to a horizontal one with xargs and feed it to wait.
wait, as its name says, will wait for all provided PIDs to complete.
I can then display results or just write that we’re done.

Update: I’ve just noticed that this actually didn’t work. There was an error message I didn’t see because I redirected the output of the script:

xargs: wait: No such file or directory

This basically means that wait is a shell built-in and not an executable on disk, and xargs can only start executables. So it can’t work with xargs.
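A quick way to check this for yourself is to ask the shell how it resolves each name. This is only a small sketch using the standard command -v, and the path printed for xargs depends on your system:

```shell
# A built-in is reported by its bare name.
command -v wait

# An external program is reported by its full path (the path varies by system).
command -v xargs
```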
Additionally, I executed the following from another shell while the scripts were running:

# ps -Af | grep myscript.sh | grep -v grep | awk ' { print $2; }' | xargs
11731 11732 11733
# wait 11731 11732 11733
-bash: wait: pid 11731 is not a child of this shell
-bash: wait: pid 11732 is not a child of this shell
-bash: wait: pid 11733 is not a child of this shell

This is because wait can only wait for processes started from the same shell, i.e. its own child processes…

So what do we do now? Read the man page for wait:

If the wait utility is invoked with no operands, it shall wait until all process IDs known to the invoking shell have terminated and exit with a zero exit status.

So basically if I just call wait without doing the whole thing with ps, grep, awk, xargs, it will actually do what I need:

echo "Starting myscript with $instances instances"
for i in $(seq 1 "$instances")
do
	./myscript.sh "$i" >> "$LOGS/myscript.$i.log" &
done

echo "Waiting for all $instances instances to complete"
wait

echo "All $instances scripts completed"

The only drawback is that if you started other background processes, you have to wait for them all to complete, which might not be what you want…
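One possible way around this drawback, which is not part of the original script, is to record the PID of each instance with $! and pass only those PIDs to wait. A minimal sketch, where the worker function is a hypothetical stand-in for ./myscript.sh so that the example runs on its own:

```shell
# Hypothetical stand-in for ./myscript.sh so the sketch is self-contained.
worker() {
	sleep 1
	echo "instance $1 done"
}

instances=3
pids=""

for i in $(seq 1 "$instances")
do
	worker "$i" &
	# $! holds the PID of the most recently started background job.
	pids="$pids $!"
done

failed=0
for pid in $pids
do
	# wait PID blocks until that child exits and returns its exit status.
	wait "$pid" || failed=$((failed + 1))
done

echo "All $instances instances completed, $failed failed"
```

Since wait returns the exit status of the PID it waited for, this variant can also count the instances that failed, and any other background job of the shell is left alone.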

Another way to do it is to still use the whole ps and grep thing, but with a polling loop instead of wait:

echo "Starting myscript with $instances instances"
for i in $(seq 1 "$instances")
do
	./myscript.sh "$i" >> "$LOGS/myscript.$i.log" &
done

echo "Waiting for all $instances instances to complete"
while [ "$(ps -Af | grep myscript.sh | grep -v grep | wc -l)" -ne 0 ]
do
	sleep 1
done

echo "All $instances scripts completed"

Polling in a loop is not very elegant, but it works and lets you select which processes to wait for more precisely than the wait command does.
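If the processes were not started from the current shell, a polling loop can also target specific PIDs instead of a name pattern. The sketch below is a swapped-in variant, not the loop above: the function name wait_for_pids is made up for illustration, and it relies on kill -0, which sends no signal and only reports whether the PID exists and may be signalled. It assumes the processes belong to the current user and that the PIDs are not reused while polling.

```shell
# Hypothetical helper: block until none of the given PIDs exists any more.
wait_for_pids() {
	for pid in "$@"
	do
		# kill -0 delivers nothing; it fails once the process is gone.
		while kill -0 "$pid" 2>/dev/null
		do
			sleep 1
		done
	done
}
```

With the PIDs from the ps output shown earlier, the call would be wait_for_pids 11731 11732 11733, and it works from any shell, not only the one that started the scripts.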
