off topic: combined output of concurrent processes

Amadeus W.M. amadeus84 at verizon.net
Sun Apr 15 05:52:43 UTC 2012


> 
> if your script forks off lots of curls into a file and does not wait for
> them all, then you may get to run the grep before they all finish, hence
> the weird results.

Here ioTest.sh is the original example I posted. I'm NOT doing this:

./ioTest.sh | grep ^A | wc -l

I am doing this:

./ioTest.sh > out          # go drink beer
grep ^A out | wc -l

All echoes should have completed by the time I run the grep, yet I see
fewer than 100 lines.

It does work if I append (>>) instead of truncating (>), though.
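
For concreteness, the test is roughly along these lines (a sketch, not
necessarily the exact ioTest.sh): it forks a batch of background echoes
and exits without waiting for them.

#!/bin/bash
# sketch of the test: fork 100 background echoes, no wait
for n in $(seq 1 100)
do
	echo "A line $n" &
done

The comparison is then ./ioTest.sh > out versus ./ioTest.sh >> out (onto
an initially empty file), with the grep run well after the script has
returned.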



> 
> Note that you can only wait for immediate children (whereas the pipe
> does not show EOF until all attached processes have exited - that means
> it works for non immediate children too).
> 
> Consider:
> 
>   for n in `seq 1 100`
>   do  echo FOO &
>   done >zot
>   wait
> 

With this exact script it works for FOO (probably because it's short).
With FOOOOOOOOOO... (1000 Os) I again see fewer than 100 lines in "zot"
when I iterate 100 times. If I iterate only, say, 10-20 times, I seem to
get all the lines. Could it have something to do with the number of jobs
running in the background?
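
For reference, the long-line variant I ran is essentially this (the way
the 1000-O string is built here is incidental):

long=F$(printf 'O%.0s' {1..1000})     # "F" followed by 1000 Os

for n in $(seq 1 100)
do  echo "$long" &
done >zot
wait
wc -l zot                             # expect 100; sometimes fewer show up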



The real code is like this:

#!/bin/bash

for url in $(cat myURLs)       # one URL per line in myURLs
do
	curl -s "$url" &       # fetch each URL in the background
done
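
If I were sending that output to a file instead of a pipe, I take the
advice above to mean I should also wait for the curls (they are all
immediate children of the script), and, given what I saw with zot, append
rather than truncate. Something like this (the output file name is just
an example):

#!/bin/bash
# file-output variant: redirect the whole loop, then wait for every curl
for url in $(cat myURLs)
do
	curl -s "$url" &
done >> pages.html
wait

In practice, though, I don't write to a file at all.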



I pipe the combined curl outputs to a program that parses the HTML and
keeps track of something (I do pipe after all). I could do that serially
(without &), but parallel is better. I'm only spewing out some 20 network
requests simultaneously, and so far no warnings from Verizon. I'm guessing
that if I do, say, 1000, I might set off some alarms. But that's another
problem.
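
If I ever do scale up to hundreds of URLs, I would probably cap the
concurrency rather than fork everything at once. One option (the parser
name is a placeholder, and this assumes one plain URL per line in myURLs)
is xargs with -P:

xargs -n 1 -P 20 curl -s < myURLs | ./myParser

That keeps at most 20 curls in flight at any time, and the pipe still
does not show EOF until the last curl has exited.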



