Looking for some apache config help to block evil spiders

Steven W. Orr steveo at syslang.net
Sat Oct 10 20:08:53 UTC 2009


On 10/10/09 14:37, quoth Steven W. Orr:
> I never really checked before, but I have a lot of evil spiders crawling
> around my server. Some of them respect my robots.txt file and others do not.
> Some of the ones that do are still *very* pushy. So I decided to shut those
> bastards off. Here's what I added to my httpd.conf:
> 
> RewriteLog    logs/rewrite_log
> RewriteLogLevel 1
> 
> RewriteCond %{HTTP_USER_AGENT}  ^Baiduspider.* [OR]
> RewriteCond %{HTTP_USER_AGENT}	^msnbot.* [OR]
> RewriteCond %{HTTP_USER_AGENT}	^NaverBot.* [OR]
> RewriteCond %{HTTP_USER_AGENT}	^Sogou-Test-Spider.*
> RewriteCond %{HTTP_USER_AGENT}	^Mozilla/4.0.*
> RewriteCond %{HTTP_USER_AGENT}	^T-Mobile Dash.*
> RewriteRule .* - [F,L]
> 
> and inside each of the virtual domains, I added:
> 
>     RewriteEngine On
>     RewriteOptions Inherit
> 
> Here's the problem. What I want to see is the rewrite_log telling me what it
> has redirected or failed. Instead, I'm getting a line telling me every link
> that it does NOT rewrite. For example:
> 
> 72.30.65.61 - - [10/Oct/2009:14:28:24 --0400] \
> [vdom.syslang.net/sid#b7298ed0][rid#b6b488e8/initial] (1) pass through /d1/fn
> 
> I have googled my brains out and it seems like others have had the same
> question. I see no answers. If anyone has any idea, I'd love to hear it.
> 
> I understand that mod_rewrite is complicated, but what I'd like to end up with
> is a log of all the spiders that got rejected by my rules. Currently, the
> access_log tells me where the attempt is, the error_log tells me nothing and
> the rewrite_log is telling me more than I want with none of what I need.
> 
> The goal is to see the spiders bouncing off.
> 
> Anyone?
> 
> 

On 10/10/09 14:55, quoth Sharpe, Sam J:
> Are you actually missing the [OR] at the end of the 4th and 5th
> RewriteCond lines, or is that a mispaste...

Yes, thanks, I missed that, but that isn't the problem. The problem is that I
want to be able to see what gets rejected in the log files.

> I found this for a customer today, it's a cracking read and has some
> great pre-written ways of blocking this kind of thing:
> http://www.askapache.com/htaccess/fight-blog-spam-with-apache.html

Turns out there are a number of these kinds of pages out there and some of
them are asking the same question I am: How can I see the rejects in the log
files?

Anyone?
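
One approach that may produce the reject log described above, offered as an
untested sketch rather than a confirmed answer: tag the unwanted user agents
with an environment variable via mod_setenvif, deny access based on that
variable, and write a second access log conditioned on it. The variable name
bad_bot and the file name logs/spider_reject_log are made up for illustration.
The Order/Deny syntax assumes Apache 2.2, and the "combined" format nickname
assumes the stock LogFormat line is still present in httpd.conf.

```apache
# Tag requests whose User-Agent starts with one of the unwanted names.
# "bad_bot" is an arbitrary variable name chosen for this sketch.
SetEnvIfNoCase User-Agent "^Baiduspider"       bad_bot
SetEnvIfNoCase User-Agent "^msnbot"            bad_bot
SetEnvIfNoCase User-Agent "^NaverBot"          bad_bot
SetEnvIfNoCase User-Agent "^Sogou-Test-Spider" bad_bot

# Refuse tagged requests with a 403 (Apache 2.2 access-control syntax).
<Location />
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Location>

# Log only the tagged requests; the log file name is hypothetical.
CustomLog logs/spider_reject_log combined env=bad_bot
```

With this in place the ordinary access_log keeps recording everything, while
the extra log should contain only the requests that carried the bad_bot tag,
each showing a 403 status. If a virtual host defines its own CustomLog, the
main-server CustomLog is not used for that host, so the conditional CustomLog
line would probably need to be repeated inside each VirtualHost block.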


-- 
Time flies like the wind. Fruit flies like a banana. Stranger things have  .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net
