bruce wrote:
> i know this is way off topic, but i'm considering creating a seti at home like
> grid project for testing purposes. the goal of the project would be to
> extract book information from the amazon.com site/servers using the amazon
> AWS services.
> to get to the scale in a fast enough timeframe, i might have to create some
> sort of distributed/grid application. the key issue is that while Amazon
> permits the extraction/use of the book data from their site/servers, Amazon
> restricts how fast you can hit their servers with a given machine/IP. amazon
> allows a server to hit their site once/second. the obvious solution is to
> create a distributed app that would be used to parse/extract the
> information, building the database.
> while the initial app would be to test, to make sure everything would work
> correctly, the obvious end result would be to use the database to support a
> possible business venture.
> the client app for this project would consist of a perl/python app used to
> hit the amazon.com server, and then to return the data to the test server.
> the goal is to extract information for ~2-3 million books. i estimate that
> i'd have to have a network of 200-500 machines to accomplish this over 2-3
> days.... with each client machine hitting the amazon server once every 5
> seconds...

And then there's keeping all of this up to date...
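
For what it's worth, the arithmetic in the estimate above can be checked
with a quick sketch. It assumes exactly one request per book and clients
that run flat out for the whole period; neither is stated in the post, and
the function name and parameters here are purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def machines_needed(books, days, seconds_between_requests=5, requests_per_book=1):
    """Return the minimum number of client machines for one full pass.

    Assumption (not from the original post): each book costs
    `requests_per_book` requests and every client runs non-stop.
    """
    # Requests a single client can make over the whole period.
    per_machine = (days * SECONDS_PER_DAY) // seconds_between_requests
    total_requests = books * requests_per_book
    # Ceiling division, so a partial machine counts as a whole one.
    return -(-total_requests // per_machine)


if __name__ == "__main__":
    for books, days in [(3_000_000, 2), (2_000_000, 3)]:
        print(books, "books in", days, "days:", machines_needed(books, days), "machines")
```

Under those assumptions, 3 million books in 2 days needs about 87 clients
at one request every 5 seconds, and 2 million books in 3 days about 39, so
200-500 machines leaves room for retries or several requests per book.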

As far as I can tell, this looks like an end-run around the AWS license
conditions: they might well view the distributed application as a single
"Application". It's also something they might well notice: do you really
think they're *that* unlikely to spot which IP addresses are hammering
their servers?

There's a chance that they'll just track down the IP addresses in
question, find out who owns them, and sue you. There's a rather larger
chance that they'll just block you.

Now you might be prepared to take this risk if you're just
experimenting. But I seriously wouldn't want to build a business that is
solely dependent on the goodwill of Amazon.

I don't know exactly what you're planning, but it sounds like a bad idea.

