off topic: combined output of concurrent processes
Cameron Simpson
cs at zip.com.au
Sun Apr 15 21:38:13 UTC 2012
On 15Apr2012 14:30, Amadeus W.M. <amadeus84 at verizon.net> wrote:
| > Look at this (completely untested) loop:
| >
| > # a little setup
| > cmd=`basename "$0"`
| > : ${TMPDIR:=/tmp}
| > tmppfx=$TMPDIR/$cmd.$$
| >
| > i=0
| > while read -r url
| > do
| >   i=$((i+1))
| >   out=$tmppfx.$i
| >   if curl -s "$url" >"$out"
| >   then echo "$out"
| >   else echo "$cmd: curl fails on: $url" >&2
| >   fi &
| > done < myURLs \
| > | while read -r out
| >   do
| >     cat "$out"
| >     rm "$out"
| >   done \
| > | tee all-data.out \
| > | your-data-parsing-program
|
|
| I understand the script, although I haven't tested it either. My take on
| it:
| + it solves the problem of curls overwriting (I think)
| + the data parsing and tracking is done on the combined curls
Yes.
| - it retrieves the urls serially, not in parallel
No, in parallel: there is an "&" after the "fi" of the if statement,
which backgrounds the whole if/then/else, so each curl runs concurrently.
The "fi &" needs to sit on its own line, not folded onto the end of the
echo statement.
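A tiny standalone demonstration of that construct (not from the script
above; the job bodies are just placeholders):

```shell
#!/bin/sh
# The trailing '&' after 'fi' backgrounds the whole if/then/else,
# so both iterations run concurrently; 'wait' collects them before
# the command substitution returns.
results=$(
  for n in 1 2
  do
    if sleep 0
    then echo "job $n ok"
    else echo "job $n failed" >&2
    fi &
  done
  wait
)
echo "$results"
```

Run it a few times and the two "ok" lines may swap order, which is the
same reason the main loop prints temp-file names as the fetches finish
rather than in input order.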
| - it writes them to disk
Just long enough to be read and catted, then removed.
| - it re-reads them from disk, hence some disk activity, although
| probably insignificant relative to the download time.
Should be, yes.
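If that disk traffic ever did matter, pointing TMPDIR at a RAM-backed
filesystem avoids it entirely. A sketch, and Linux-specific (/dev/shm is
tmpfs there; "myfetch" is a placeholder for the script name):

```shell
# Linux-specific sketch: /dev/shm is tmpfs, so temp files written
# under it live in RAM and never touch the disk.
TMPDIR=/dev/shm
export TMPDIR
tmppfx=$TMPDIR/myfetch.$$   # "myfetch" stands in for `basename "$0"`
```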
| The way I'm doing it now is this: I do the retrieval and the parsing and
| tracking all within a single program. For each url I create a separate
| thread from which I call curl and get its output, then parse.
| Like this:
|
| // inside each thread:
[... popen(curl...) ...]
| // when threads done, analyze the combined info.
|
| This works, but I would have liked a more modular solution. I want the url
| retrieval to be a separate, standalone entity and the parsing and
| tracking another entity (possibly two entities). Hence, what I want is
|
| - in a shell
| - download in parallel
| - merge curl outputs
My above loop tries to do that. The curls do run in parallel.
| then pipe into the parser/tracker. Parsing can be done per url, but
| tracking MUST be across urls.
That should work; your parser comes at the end of the pipeline.
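For what it's worth, the same fan-out can be had from xargs -P, which
also caps how many fetches run at once. A sketch only, not from the
thread: to keep it runnable without a network, plain files and "cat"
stand in for your URL list and "curl -s"; swap cat back to curl for
real use, and put your-data-parsing-program after the tee.

```shell
#!/bin/sh
# Sketch: bounded-parallel fetches via xargs -P, keeping the per-URL
# temp-file trick so concurrent outputs never interleave mid-record.
# Plain files and "cat" stand in for URLs and "curl -s" here.
tmpd=$(mktemp -d)
printf 'one\n' >"$tmpd/a"
printf 'two\n' >"$tmpd/b"
printf '%s\n%s\n' "$tmpd/a" "$tmpd/b" >"$tmpd/myURLs"

# One fetch per line, at most 4 at a time; each prints its temp-file
# name when done (short pipe writes are atomic, so names don't mangle).
xargs -n 1 -P 4 sh -c 'f=$(mktemp) && cat "$1" >"$f" && echo "$f"' fetch \
    <"$tmpd/myURLs" \
| while read -r out
  do
    cat "$out"      # merge each completed fetch, whole-file at a time
    rm -f "$out"
  done \
| tee "$tmpd/all-data.out"   # your-data-parsing-program would follow here
```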
Cheers,
--
Cameron Simpson <cs at zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/
[Alain] had been looking at his dashboard, and had not seen me, so I
ran into him. - Jean Alesi on his qualifying prang at Imola '93