Thanks for the reply.
My questions would mostly be about how to speed something up, or how to
redo/re-architect part of the crawl process.
As an example, I have a situation where I use cheap cloud VMs
(DigitalOcean) to perform the fetches. The fetches are basic "curl"
invocations with the required attributes, including a "Cookie" header
for the target server. When running from the VM's own IP address, the
target "blocks" the fetch. (I guess someone else could have tried to
fetch a bunch earlier -- who knows.) So I route the fetch through an
anonymous proxy-server IP instead. This process works, but it's slow,
so I run a number of these fetches in parallel on the cheap droplet.
While that speeds things up, it's still "slow"... I've also tested
running curl with multiple "http" URLs in the same curl invocation.
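To make that concrete, here's a rough sketch of the parallel-fetch-through-a-proxy setup I mean. The proxy address, cookie value, and URLs are just placeholders, and the leading "echo" makes it a dry run that prints the curl commands instead of executing them:

```shell
#!/bin/sh
# Placeholder values -- substitute a real proxy and cookie.
PROXY="socks5://198.51.100.7:1080"   # hypothetical anonymous proxy
COOKIE="session=abc123"              # placeholder cookie value

# One target URL per line (placeholders).
printf '%s\n' \
  "https://example.com/page1" \
  "https://example.com/page2" \
  "https://example.com/page3" > urls.txt

# xargs -P 4 runs up to four curl processes at once, each fetching
# one URL through the proxy with the cookie attached. Drop the
# "echo" to actually perform the fetches.
xargs -P 4 -I{} echo curl --silent --proxy "$PROXY" \
  -H "Cookie: $COOKIE" -o /dev/null {} < urls.txt
```

On curl 7.66 or newer there's also the built-in --parallel (-Z) flag, so a single invocation with multiple URLs can fetch them concurrently, e.g. `curl -Z --parallel-max 10 --proxy "$PROXY" -H "Cookie: $COOKIE" url1 url2 ...` -- which may be what the multi-URL variant I tested was missing.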
It's these types of things that I'm grappling with...