referrer-b-gone

Tom Sherman has some thoughts on the growing problem of referrer spam, which you’ve seen me rant about here before. He’s got some good ideas, and he’s certainly right that simply blacklisting domains or IPs via .htaccess or even firewalling is a losing battle. He does have one suggestion I disagree with, though:

Referer spam is a problem because spammers can improve their sites’ Google PageRank by getting listed on popular sites through spoofing of the HTTP_REFERER field in an HTTP request.

If bloggers (and other website maintainers) did not publish this information, spammers would not bother to send these spoofed requests to blogs – it would be pointless.

I don’t agree. This logic is attractive, particularly when you are thinking like a reasonable member of polite society. The problem is, you have to think like a spammer. Trying to get spammers to stop referrer-spamming by convincing people to stop posting referrer info is like thinking that e-mail spam could be stopped if most people would just ignore it. Of course we know that most people do ignore spam (and filter/delete it with extreme prejudice).

The problem here is the same as it is in the world of e-mail. The cost of sending spam is low. When faced with a decrease in results, spammers won’t stop spamming, they will spam more, to make sure they hit the one or two people out there still publishing their referer data.

Anyways, that said, using the various anti-spam blacklists out there seems like a good idea and is one I’ve been meaning to follow up on for some time, so I finally have. Blars has written a great module, mod_access_rbl, for Apache, which is a replacement for the default mod_access that also lets you use DNS RBLs to limit access to your website.

Below are the instructions for doing so on Debian stable, as I have done tonight:

First, snag the module source:

shkaf:~$ wget http://www.blars.org/mod_access_rbl.tar.gz
--23:24:39--  http://www.blars.org/mod_access_rbl.tar.gz
           => `mod_access_rbl.tar.gz'
Resolving www.blars.org... done.
Connecting to www.blars.org[64.81.35.59]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5,222 [application/x-tar]

100%[====================================>] 5,222         26.56K/s    ETA 00:00

23:24:40 (26.56 KB/s) - `mod_access_rbl.tar.gz' saved [5222/5222]

Next we take a look at the tarball, since sometimes people are naughty and create a tarball of files with no parent directory, causing you to untar and clutter up your pretty home directory. So thoughtless!

shkaf:~$ tar tvfz mod_access_rbl.tar.gz
-rw-r--r-- blarson/666    1397 2000-07-01 01:28:46 README.rbl
-rw-r--r-- blarson/blars 12142 2000-07-01 00:56:37 mod_access_rbl.c

I knew it! We’ll make our own directory, then:

shkaf:~$ mkdir mod_access
shkaf:~$ cd mod_access
shkaf:~/mod_access$ tar xfz ../mod_access_rbl.tar.gz
shkaf:~/mod_access$ ls
README.rbl  mod_access_rbl.c

Now we use the “apxs” program to compile the module. If you don’t have this program, you probably need to apt-get install apache-dev.

shkaf:~/mod_access$ apxs -c mod_access_rbl.c
gcc -DLINUX=22 -DEAPI -DTARGET="apache" -I/usr/include/db1 -DDEV_RANDOM=/dev/random -DUSE_HSREGEX -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O1 -fPIC -DSHARED_MODULE -I/usr/include/apache-1.3  -c mod_access_rbl.c
gcc -shared -o mod_access_rbl.so mod_access_rbl.o -lc -lm -lcrypt -ldb1 -ldb -lexpat

And then su to root to install the module with apxs as well:

cwage@shkaf:~/mod_access$ su -
Password:
shkaf:~# cd /home/cwage/mod_access
shkaf:/home/cwage/mod_access# apxs -i -n access_module mod_access_rbl.so
cp mod_access_rbl.so /usr/lib/apache/1.3/mod_access_rbl.so
chmod 755 /usr/lib/apache/1.3/mod_access_rbl.so

Last but not least, we need to edit /etc/apache/httpd.conf to reflect the changes. mod_access_rbl is a drop-in replacement for mod_access, so instead of adding it we will just change:

LoadModule access_module /usr/lib/apache/1.3/mod_access.so

to:

LoadModule access_module /usr/lib/apache/1.3/mod_access_rbl.so

Now simply restart apache and voila: you can use DNS blacklists in your .htaccess or in httpd.conf. This is what my .htaccess looks like:

Order allow,deny
allow from all
deny via sbl-xbl.spamhaus.org
deny via relays.ordb.org
deny via list.dsbl.org
deny via unconfirmed.dsbl.org
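Each `deny via` line costs a DNS lookup: the module reverses the octets of the client's IP, appends the blacklist zone, and denies access if that name resolves. You can reproduce the lookup by hand; here's a minimal sketch (the `dnsbl_query` helper is my own illustration, not part of the module):

```shell
# Build the DNSBL query name for an IP: reverse the octets and
# append the blacklist zone. (dnsbl_query is a hypothetical helper.)
dnsbl_query() {
  ip=$1; zone=$2
  rev=$(echo "$ip" | awk -F. '{print $4"."$3"."$2"."$1}')
  echo "${rev}.${zone}"
}

# 127.0.0.2 is the conventional "always listed" test address.
dnsbl_query 127.0.0.2 sbl-xbl.spamhaus.org   # -> 2.0.0.127.sbl-xbl.spamhaus.org

# To actually query the RBL (needs network access):
# host "$(dnsbl_query 127.0.0.2 sbl-xbl.spamhaus.org)"
```

An answer in 127.0.0.0/8 means the IP is listed; NXDOMAIN means it's clean.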

I may tweak these over time and put some more thought into which blacklists are most likely to catch only zombie traffic, but for now, there you have it.


Comments

tom sherman · January 17, 2005 at 00:09

Chris,

Great stuff! And slightly beyond my ken, so I’m glad to see someone more technical than myself thinking about this as well. I will disagree with your disagreement with me, however. :)

Trying to get spammers to stop referrer-spamming by convincing people to stop posting referrer info is like thinking that e-mail spam could be stopped if most people would just ignore it.

I think a better analogy would be to try to combat e-mail spam by telling people not to read their e-mail. If people didn’t read their e-mail, then yes, sending spam e-mails would be pointless. Similarly, if people didn’t publish their referrer logs, sending referrer spam would be pointless. (I can’t believe that spammers are putting this much effort into referrer spamming just to get webmasters to click on spam URLs from password-protected stats pages.)

As to incorporating DNS blacklists into .htaccess (great write-up!)—how does that affect response time? Seems like it might slow things down a bit?

Chris Wage · January 17, 2005 at 00:16

I think a better analogy would be to try to combat e-mail spam by telling people not to read their e-mail. If people didn’t read their e-mail, then yes, sending spam e-mails would be pointless. Similarly, if people didn’t publish their referrer logs, sending referrer spam would be pointless.

Well, I for one have stopped publishing any referrer info (or indeed, any website stats at all), but I’m not optimistic this strategy will help much.

Seeing the occasional forged referrer in my logs these days is water off my back – I’m resigned to it.

What I’m really hoping to stop are floods big enough to DoS my site, like I’ve experienced in the past.

Response time is affected, but not much for normal usage. The query responses are cached by my local nameserver on the same network, so the most someone would notice is a slight delay on the first load of the page.

Spammers using a botnet of zombies would notice a delay for every new IP they use, which, well, is fine by me, and hopefully it’s followed by a 403!

tom sherman · January 17, 2005 at 00:24

Excellent. I’ve updated my entry with this info.

Thanks for the pointers on this. I can see it helping in the apache logs already.. heh..

Now, if we could somehow combine this with MT-Blacklist so that we could get a list of IPs that have sent comment spam and make our own comment-spam DNSBL.. Heck, if I could get a list of IPs, I could incorporate it into my BIND config files and make my own DNSBL. I’ll have to think on that process..
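Turning a list of spam-source IPs into zone data is mostly just octet reversal. A rough sketch with awk (the file name `spam-ips.txt` and the addresses are made up for illustration; 127.0.0.2 is the conventional "listed" DNSBL answer):

```shell
# Hypothetical input: comment-spam source IPs, one per line.
printf '192.0.2.15\n203.0.113.7\n' > spam-ips.txt

# Emit BIND-style A records with reversed octets, all answering
# 127.0.0.2 -- the standard DNSBL "listed" response.
awk -F. '{print $4"."$3"."$2"."$1 "\tIN A\t127.0.0.2"}' spam-ips.txt
```

Paste the output into a zone file for your private blacklist domain and point the RBL-aware tools at it.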

Chris Wage · January 23, 2005 at 21:11

Strictly speaking, it won’t help anything in your logs, except to make them easier to filter – the referer is still logged, but the request yields a 403 instead of a 200 OK.

Using this module is more useful for preventing the 200 OK from sending the entire page as data every time it’s requested and decimating your bandwidth.
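You can see the effect in the access log: blocked requests still carry the spoofed referer but return a 403 with a tiny body. A quick way to count them (the sample log lines below are invented; in practice you’d point grep at /var/log/apache/access.log):

```shell
# Two hypothetical combined-log lines: one blocked (403), one served (200).
printf '%s\n' \
  '192.0.2.15 - - [23/Jan/2005:21:00:00 -0600] "GET / HTTP/1.0" 403 210 "http://spam.example/" "-"' \
  '198.51.100.9 - - [23/Jan/2005:21:00:05 -0600] "GET / HTTP/1.0" 200 5120 "-" "Mozilla/4.0"' \
  > sample-access.log

# Count the requests the RBL turned away (status follows the quoted request).
grep -c '" 403 ' sample-access.log   # -> 1
```

Comparing the byte counts (210 vs. 5120 in the sample) shows the bandwidth saved per hit.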

Both my Wordpress and MT installations have been getting pummelled by referer spam over the last few days. What I’ve learned from ~8 years of fighting email spam is that playing whack-a-mole with domain names and IP addresses does not scale, and that operating an effective public DNSBL invites lawsuits and DDoS attacks.

Considering that bleak backdrop, I offer the following project:

Set up a local pair of DNSBLs (one traditional IP blacklist to track spam sources, and one RHSBL to track spammed domains) using rbldnsd (see http://www.corpit.ru/mjt/rb… and http://www.surbl.org/rbldnsd-howto.html)

Obtain abusive source addresses and domains from logs (webserver, firewall, application/blog/wiki, etc.), and store in a database along with a time-to-live metric. This implies having a local listing and expiry policy.

Generate rbldnsd zone files and other blacklists from the database. Periodically expire old entries but save them for historical analysis.
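The zone-file generation step above is straightforward, since rbldnsd’s dataset format was designed to be machine-generated. A small sketch of an ip4set file, as I understand the format (addresses invented; the leading `:`-line sets the default A/TXT answer for every entry):

```
# sample ip4set dataset for rbldnsd
# default A record and TXT reason for all listed entries:
:127.0.0.2:Listed by local referrer-spam blacklist
# individual addresses and CIDR ranges gathered from logs:
192.0.2.15
203.0.113.0/24
```

Expired entries are dropped simply by regenerating the file from the database and letting rbldnsd reload it.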

This is all well and good for a single site, especially if the DNSBLs are not exposed to the general public. The real value is allowing blacklist information to be shared in a robust, secure, anonymous, trusted manner, e.g., via an application such as Freenet (http://freenet.sourceforge.net/)

Much of the necessary technology already exists and is mature; the difficulty is in integrating the pieces, building tools to simplify data analysis, policy enforcement, and information sharing, and developing a trust model in which some participants may choose to remain anonymous (this is a hard problem - see http://www.cl.cam.ac.uk/Research/SRG/opera/projects/secure/)

Not that all this needs to be implemented at once or at every site. The goal is to distribute the effort specifically to allow every casual blogger to take advantage of the results without requiring them to be a DBA/network admin/spamfighter.

Admittedly, it’s a big project but one that I believe is achievable.

Can anyone post up a copy of mod_access_rbl? I can’t download it because blars is so aggressive about blocking IPs.. I have a Comcast cable modem at home and can’t d/l it.. and for whatever reason, my colo provider, where my website is hosted, is also blocked by blars (but it’s kosher: no spam, definitely not an open relay, and not on SPEWS or other lists.. what the heck?!)..

Any help would be greatly appreciated… I’d like to get this installed as I’m currently getting killed by referer spam…

sine nomine · September 05, 2006 at 16:18

thank you so much. a web board i run was invaded by trolls, and this has helped me keep them banned. no more anonymouse for them.
