Address Munging Strategy

Updated: 13 June, 1999

About a year ago, I started to get a lot of spam at my usual email address, which is guitar@plover.com. Actually it isn't, but I can't mention my real address on this page; I'll explain why in a moment. I had posted a lot of articles to Usenet with this address, and it was widely advertised and attracted a lot of junk mail. So I used it as the focus for my mail filtering experiments.

The mail filtering was a very mixed success. It trapped most of the spam, but it also had a lot of false positives. People who should have been able to write to me were having their mail rejected. This was a problem.

I thought I'd found a clever solution: I would assign a different address to each of my software projects. For example, I would put the address guitar-tpj-regex@plover.com on my article about regular expressions for The Perl Journal, and the address guitar-perl-diff@plover.com in my Perl Diff.pm module and on the web pages about it. Those addresses weren't filtered, so people writing to me about specific projects would never run afoul of the filter. I'd also be able to take advantage of the qmail mailer's multiple-address features to arrange separate delivery for each of those addresses: I could have mail to a given address filed into its own folder, given priority, or given any other special treatment I wanted.
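
With qmail, mail to an extension address such as guitar-perl-diff@plover.com is delivered to the user guitar, and the extension (perl-diff) can get its own handling through a per-extension .qmail file. The routing decision itself can be sketched in Perl as below. This is an illustration, not qmail configuration: the folder names, the %folder_for table, and folder_for_address are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical routing table: extension (the part after "guitar-") => folder.
my %folder_for = (
    'tpj-regex' => 'Mail/tpj-regex',
    'perl-diff' => 'Mail/perl-diff',
);

# Given a local part such as "guitar-perl-diff", split off the user name
# and return the folder for its extension.  Returns nothing (undef in
# scalar context) for the plain address or an unknown extension, which
# here means "run the usual spam filter".
sub folder_for_address {
    my ($local) = @_;
    my (undef, $ext) = split /-/, $local, 2;
    return unless defined $ext && exists $folder_for{$ext};
    return $folder_for{$ext};
}

for my $local (qw(guitar-perl-diff guitar-tpj-regex guitar)) {
    my $folder = folder_for_address($local);
    print "$local => ", (defined $folder ? $folder : 'spam filter'), "\n";
}
```

Project addresses bypass the filter entirely, which is exactly what made them attractive and, as it turned out, what made them a liability.
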

This plan backfired on me. Lots of spammers have web robot programs that systematically collect all the email addresses that they can from all the web pages they can find. When they got to my pages they would collect twenty or thirty different addresses for me and send spam to all of them. Ooops.

Here's the solution I adopted: I installed a plug-in module into my Apache web server that prevents it from ever displaying any of my real addresses. Instead, if it sees something like guitar-perl-diff@plover.com it replaces it dynamically with an address of the form guitar-perl-diff-id-Cw7+Dz8pKC@plover.com. That's why I can't show you my real email address; if I tried, the server would alter it for me before you saw it. Here's what happens if I try to mention the address guitar-fake@plover.com with the guitar part replaced with my real username, mjd: mjd-fake@plover.com.

The Cw7+Dz8pKC part of the address is generated at the moment the page is served and contains an encoded version of the client's IP address and the date. This means that every person sees a slightly different address, and the addresses change daily. By looking at the ID part I can see who first copied my address for spamming purposes, and when.
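
The page doesn't describe how the ID is actually encoded. One hypothetical scheme that carries both pieces of information: write the day number and the client's IPv4 address in base 36, using only lowercase letters and digits (unlike the mixed-case Cw7+Dz8pKC), and join them with a hyphen. The names make_id and parse_id and the field layout are assumptions for illustration; note that this sketch does nothing to hide the IP from anyone who decodes it.

```perl
use strict;
use warnings;

# Hypothetical ID scheme: day number and IPv4 address, each written in
# base 36 with lowercase letters and digits, joined by a hyphen.
my $DIGITS = '0123456789abcdefghijklmnopqrstuvwxyz';

sub to_base36 {
    my ($n) = @_;
    my $s = '';
    do {
        $s = substr($DIGITS, $n % 36, 1) . $s;
        $n = int($n / 36);
    } while ($n > 0);
    return $s;
}

sub from_base36 {
    my ($s) = @_;
    my $n = 0;
    $n = $n * 36 + index($DIGITS, $_) for split //, $s;
    return $n;
}

# Build an ID from a dotted-quad IP and a day number
# (days since 1 January 1970, e.g. int(time / 86400)).
sub make_id {
    my ($ip, $day) = @_;
    my $ip_num = unpack('N', pack('C4', split /\./, $ip));
    return to_base36($day) . '-' . to_base36($ip_num);
}

# Recover (day, ip) from an ID; returns nothing if the ID is malformed.
sub parse_id {
    my ($id) = @_;
    my ($day36, $ip36) = $id =~ /^([0-9a-z]+)-([0-9a-z]+)$/ or return;
    my $ip = join '.', unpack('C4', pack('N', from_base36($ip36)));
    return (from_base36($day36), $ip);
}

# Day 10755 is 13 June 1999.
my $id = make_id('10.0.0.7', 10755);
print "mjd-perl-diff-id-$id\@plover.com\n";   # prints mjd-perl-diff-id-8ar-2rvxtz@plover.com
```

Because the ID decodes back to a day and an address, the server doesn't need to remember which ID it handed to whom.
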

Here's the real point though: All these addresses actually work. You can send mail to any of them. Here, try this one if you like: mjd-perl-addrmunge@plover.com. But if one of those addresses starts to attract spam, I can expire it, so that it doesn't work any more. This only affects the spammers, because only the spammers have the expired address; all the other addresses that the web server showed to other people are still good. I can expire all the addresses from a particular day, or all the addresses that were served to a particular IP address on a particular day. Everyone else's addresses continue to work.

One downside is that I had to expire all the ID-less versions of the addresses that everyone already has. But I can recognize those, and when someone sends mail to one of them, I can have the filter send back an apologetic note explaining the situation. That's inconvenient, but I'll never have to do it again.
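
A minimal sketch of the expiry check described in the last two paragraphs, under several assumptions: the ID is a -id-DAY-IP suffix with both fields in lowercase base 36 (the page doesn't describe the real encoding), day numbers count from 1 January 1970, and the expiry tables, classify_recipient, and its three return values are hypothetical. Only mjd... addresses are considered, matching the module's current scope.

```perl
use strict;
use warnings;

# Hypothetical: decode a lowercase base-36 field.
sub from_base36 {
    my ($s) = @_;
    my $n = 0;
    $n = $n * 36 + index('0123456789abcdefghijklmnopqrstuvwxyz', $_) for split //, $s;
    return $n;
}

# Hypothetical expiry tables.  Day numbers count from 1 January 1970.
my %expired_day    = (10755 => 1);               # every address served that day
my %expired_day_ip = ('10756 10.0.0.7' => 1);    # one client on one day

# Decide what to do with mail for the local part $local (text before the @).
sub classify_recipient {
    my ($local) = @_;
    return 'deliver' unless $local =~ /^mjd/;    # only mjd... addresses are munged
    if ($local =~ /-id-([0-9a-z]+)-([0-9a-z]+)$/) {
        my ($day36, $ip36) = ($1, $2);
        my $day = from_base36($day36);
        my $ip  = join '.', unpack('C4', pack('N', from_base36($ip36)));
        return 'reject' if $expired_day{$day} || $expired_day_ip{"$day $ip"};
        return 'deliver';
    }
    # An old ID-less address: expired, but answered with an apologetic note.
    return 'apologize';
}
```

Expiring a whole day or a single client on a single day is just one more table entry; every other ID keeps working.
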

A possible side effect of this strategy is that I might be able to see who sold address lists to whom, but it's too soon to know if that'll be interesting or not.

The code isn't very pretty, but I banged it out quickly. If you want to use it, go ahead, but beware that there are many things that are peculiar to my configuration that are built into it; for example, at present it only affects addresses of the form mjdsomething@plover.com.

Updated: 13 June, 1999

Two small changes today:

I changed the IDs to use only lowercase letters. I found that the spammers would see an address like guitar-perl-diff-id-Cw7+Dz8pKC@plover.com and try sending mail to guitar-perl-diff-id-cw7+dz8pkc@plover.com instead. Using all-lowercase makes the IDs a little longer, but only by about 15%, and that is all right.
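
A rough check of the 15% figure, under stated assumptions: suppose the old IDs used a 64-symbol alphabet (the + in Cw7+Dz8pKC suggests something base64-like) and the new ones use the 36 symbols a-z and 0-9. Each character then carries log2 64 = 6 bits before and log2 36, about 5.17 bits, after, so the same information needs about 16% more characters:

$$\frac{\log_2 64}{\log_2 36} = \frac{6}{5.17} \approx 1.16$$

With lowercase letters alone (26 symbols) the ratio would be 6 / log2 26, about 1.28, so the quoted figure fits the 36-symbol assumption better.
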

I added an exemption for addresses that contain -subscribe@ because messages to those addresses are handled automatically by EZMLM and the IDs mess them up. Unfortunately the exemption makes the code substantially more peculiar. Here's what I tried first, and it's a real horror:
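```
    # First attempt.  $id holds the per-request ID computed elsewhere in
    # the module.  The /e flag evaluates the replacement as Perl code, so
    # -subscribe@ addresses can be put back unchanged while every other
    # mjd...@plover.com address gets -id-$id added.
    s{\b(mjd[-\w]*)
       \@
       (
         (?:\w+\.)?
         plover\.com
       \b)
      }
     { my ($user, $host) = ($1, $2);
       $user =~ m{-subscribe$} ? "$user\@$host" : "$user-id-$id\@$host"
     }gex;
```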
Oh, like, gag me with a runcible. Does anyone have any suggestions about what to do about this? I know a way to eliminate the eval, but it is of line-of-the-day complexity (watch for it; it's useful) and the code will get substantially bigger. There should be a simple way to do it. I finally settled on using a negative lookbehind assertion, which is fine if you like negative lookbehind assertions, and also if you are running Perl 5.005. Can anyone think of anything else, preferably something that does not depend on an Ilyazakharevichism?
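
The page doesn't show the version with the negative lookbehind assertion. A minimal sketch of what it might look like, assuming the same address pattern as the first attempt (munge_addresses and its $id argument are hypothetical names): the (?<!-subscribe) assertion sits just before the @, so a local part ending in -subscribe can never match, and the replacement becomes a plain string with no /e. Backing off a few characters of [-\w]* doesn't get around the assertion, because the \@ must still come immediately afterwards.

```perl
use strict;
use warnings;

# Hypothetical wrapper: add "-id-$id" to every mjd...@plover.com address
# in $text, except local parts ending in "-subscribe".
sub munge_addresses {
    my ($text, $id) = @_;
    $text =~ s{\b(mjd[-\w]*)(?<!-subscribe)
               \@
               ((?:\w+\.)?plover\.com\b)
              }{$1-id-$id\@$2}gx;
    return $text;
}

print munge_addresses('mjd-perl-diff@plover.com', 'x1'), "\n";   # prints mjd-perl-diff-id-x1@plover.com
```

Lookbehind assertions first appeared in Perl 5.005, so this version won't run on older perls.
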

Also a note: the spammers see guitar-perl-diff-id-cw7+dz8pkc@plover.com and then actually send mail to dz8pkc@plover.com instead. So far several such messages have arrived at my machine, but no spam has arrived at the correct addresses. Isn't that interesting? It suggests that a really simple antispam strategy is just to include a + sign in your address.
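
One plausible explanation, assuming the harvesting robots use an address pattern that doesn't allow + in the local part: the match can't cross the +, so the first place it succeeds is just after it. The pattern and the harvest function below are guesses for illustration, not taken from any real robot.

```perl
use strict;
use warnings;

# A guessed harvester pattern: word characters, dots, and hyphens on either
# side of the @, but no +.  Returns every match found in $page.
sub harvest {
    my ($page) = @_;
    return $page =~ /([\w.-]+\@[\w.-]+)/g;
}

my @found = harvest('Mail guitar-perl-diff-id-cw7+dz8pkc@plover.com today.');
print "$_\n" for @found;    # prints dz8pkc@plover.com
```
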

Another note of interest: I got email today addressed to d0u7tm@plover.com, which bounced. Perusal of the logs showed that the address should have been something-id-Cxc-D0U7tm instead, and the Cxc indicates that this address was served today. It was collected by someone running EmailSiphon at 7:08 AM, and I received email at that address only nine hours later, at 16:14. Is that not astonishing?

Note to self: Fix the module so that if the User-Agent is EmailSiphon, addresses are munged to be @cyberpromo.com. Or maybe just run wpoison for them.
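
The first idea in that note could look something like the function below, a hypothetical sketch: the module would pass it the request's User-Agent header and use the returned domain when rewriting addresses. The name domain_for_agent and the case-insensitive match are assumptions.

```perl
use strict;
use warnings;

# Hypothetical: pick the domain for munged addresses from the client's
# User-Agent header; EmailSiphon gets a decoy domain instead.
sub domain_for_agent {
    my ($user_agent) = @_;
    return 'plover.com' unless defined $user_agent;
    return $user_agent =~ /EmailSiphon/i ? 'cyberpromo.com' : 'plover.com';
}
```
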

mjd-perl-addrmunge@plover.com