xiven.com stating the blatantly obvious since 2002

The ongoing war against comment spam

Whilst looking around various weblogs today, I noticed many posts concerning the evil that is comment spam. Fortunately, I have yet to be affected by this blight, but it is still something I am concerned about.

During my wanderings, one particular thing caught my attention: an entry on Adam Kalsey's weblog in which a person replied to an accusation of being a comment spammer. They said that someone (one of their competitors) had posted the spam whilst masquerading under the accused's IP address.

Whether or not this was true, it raises a possibility: because posting a comment requires only a single form submission, a user could quite easily use a fake IP address, since there is no need to receive the reply from the server.

I thought to myself: there must be a safe way to prevent this possibility entirely using a simple bit of HTTP confirmation.

Note: from this point on, an understanding of HTTP is strongly recommended.

My first thought was to use HTTP Authentication to force the browser to send back some kind of response before it can initiate a POST request. Of course, this would cause the browser to pop up a login box, so that idea was quickly scrapped.

Then I thought: how about I redirect the POST request? Much like this:

  1. User posts comment
  2. Browser submits form as a POST request to the specified action URI
  3. Server tells the browser to send the POST request to another URI
  4. Browser submits form as POST request to new URI
  5. Server accepts the POST request and then tells the browser to GET the original comment page
  6. Browser GETs the comment page, therefore returning the user to their post

This can be achieved by using a 307 Temporary Redirect followed by a 303 See Other. A side effect of using the 307 code, however, is that a conforming browser would alert the user that their POST request was being redirected (for security reasons), which is undesirable in this particular case.
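The difference between the two status codes can be sketched in a few lines of Python. This is only a simulation under stated assumptions: the paths (in particular /acceptcomment.php) are hypothetical names chosen for illustration, and the "client" is a toy loop rather than a real browser. It shows that a 307 keeps the method as POST, while a 303 switches the follow-up request to GET.

```python
def respond(method, path):
    """Toy server: return (status, headers) for a request.

    All paths here are hypothetical; /acceptcomment.php is an invented
    name for the "other URI" in step 3.
    """
    if method == "POST" and path == "/processcomment.php":
        return 307, {"Location": "/acceptcomment.php"}
    if method == "POST" and path == "/acceptcomment.php":
        return 303, {"Location": "/viewcomments.php"}
    if method == "GET" and path == "/viewcomments.php":
        return 200, {}
    return 404, {}


def follow(method, path, max_hops=5):
    """Toy client: follow redirects and record each request made."""
    trail = []
    for _ in range(max_hops):
        status, headers = respond(method, path)
        trail.append((method, path, status))
        if status == 307:
            # 307: repeat the request with the same method at the new URI
            path = headers["Location"]
        elif status == 303:
            # 303: fetch the new URI with GET
            method, path = "GET", headers["Location"]
        else:
            break
    return trail


for hop in follow("POST", "/processcomment.php"):
    print(hop)
```

The trail has three entries: two POSTs (the second caused by the 307) and a final GET (caused by the 303).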

So I come to my final idea. It works a little like this:

  1. User posts comment
  2. Browser submits form as a POST request to the specified action URI
  3. Server accepts the POST request and then tells the browser to GET a confirmation URI
  4. Browser GETs the confirmation URI
  5. Server accepts the confirmation and then tells the browser to GET the original comment page
  6. Browser GETs the comment page, therefore returning the user to their post

This can be done by using two 303 See Other responses.

An example in more detail:

  1. User fills in their comment in a form. The form has method="post" and action="processcomment.php"
  2. User clicks the submit button. The browser retrieves processcomment.php using a POST with the contents of the comment
  3. processcomment.php writes the comment to the database, but gives it an unconfirmed status. It also generates a random number and stores it in the database with the comment.
  4. processcomment.php then sends the following HTTP headers:
    HTTP/1.1 303 See Other
    Location: confirmcomment.php?postid=xxx&confirm=yyy
    where xxx is the unique ID of the new comment and yyy is the random number
  5. The browser retrieves confirmcomment.php?postid=xxx&confirm=yyy using the GET method
  6. confirmcomment.php checks the equality of the random number then updates the database, marking the comment as confirmed
  7. confirmcomment.php then sends the following HTTP headers:
    HTTP/1.1 303 See Other
    Location: viewcomments.php#xxx
    where xxx is the unique ID of the new comment
  8. The browser retrieves viewcomments.php, returning the user to their comment

This process is entirely transparent to the user.
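The server side of the steps above can be sketched roughly as follows. It is written in Python rather than PHP purely for illustration, and several things are assumptions rather than part of the scheme as described: the in-memory COMMENTS dictionary stands in for the database, the function names are hypothetical, and the 400 and 403 responses for bad confirmations are one possible choice that the steps above do not specify.

```python
import secrets
from urllib.parse import parse_qs

# In-memory stand-in for the comments table (an assumption; the steps
# above store this in a real database).
COMMENTS = {}


def process_comment(text):
    """Steps 3-4: store the comment as unconfirmed, then redirect."""
    post_id = len(COMMENTS) + 1
    token = secrets.token_hex(16)  # plays the role of the random number yyy
    COMMENTS[post_id] = {"text": text, "token": token, "confirmed": False}
    location = f"confirmcomment.php?postid={post_id}&confirm={token}"
    return "303 See Other", location


def confirm_comment(query_string):
    """Steps 6-7: check the random value, confirm the comment, redirect."""
    params = parse_qs(query_string)
    try:
        post_id = int(params["postid"][0])
        token = params["confirm"][0]
    except (KeyError, ValueError):
        return "400 Bad Request", None  # assumed error handling
    comment = COMMENTS.get(post_id)
    if comment is None or not secrets.compare_digest(
        comment["token"].encode(), token.encode()
    ):
        return "403 Forbidden", None  # assumed error handling
    comment["confirmed"] = True
    return "303 See Other", f"viewcomments.php#{post_id}"


# Walk through the exchange as a browser would.
status, location = process_comment("First!")
print(status, location)
status, location = confirm_comment(location.split("?", 1)[1])
print(status, location)
```

Each function returns the status line and the Location value that the corresponding script would send; wiring them into a real request handler is left out to keep the sketch short.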

Important note: this method will not prevent people from using proxies (anonymous or otherwise) to post a comment. All it does is ensure that the comment poster is not faking their IP address. Anyone who does use a fake IP will not receive the instruction to GET the confirmation URI and so their comment will not be confirmed. Now I may be completely bonkers, but I think that this could be potentially useful. More than likely, several people have already thought of this before me, but such is life.

Other important note: this is almost certainly a violation of the intended use of GET and POST, since a GET is being used for an action which has side-effects (it is not a "safe" request in HTTP terms). In this case though, I would consider this use to be acceptable IMHO.

Addendum: this story of a weblog owner billing a spammer provided some amusement. ☺

Addendum 2: as has been pointed out to me in a comment, this is all completely pointless since HTTP sits on top of TCP which already deals with this scenario. Bleh.
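A small Python sketch of why this is so, using only the standard socket module on localhost (the port is whatever the operating system hands out, and the variable names are mine). Before any HTTP data can be sent, the TCP handshake (SYN, SYN-ACK, ACK) has to complete, and that requires the client to receive the server's reply at the address it claims to have.

```python
import socket

# Listen on an ephemeral localhost port (an arbitrary choice for the demo).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

# connect() returns only once the SYN / SYN-ACK / ACK exchange has finished,
# which requires the client to receive the server's reply.
client = socket.create_connection(server.getsockname())

# accept() hands back connections whose handshake has completed, so the
# address it reports is one that was able to receive the server's packets.
conn, peer_addr = server.accept()
same_endpoint = peer_addr == client.getsockname()
print(peer_addr, same_endpoint)

for s in (conn, client, server):
    s.close()
```

By the time a web server sees a POST, the peer address has therefore already been through a round trip, which is the same guarantee the extra redirect was meant to provide.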

Addendum 3: this weblog actually does now use this technique (and has done for a few months), not to solve the “fake IP problem” but instead to block badly written spambots that haven't been programmed to deal correctly with the response. It's really quite effective (though of course it doesn't block them all, and certainly won't stop manual spams), and it's quite nice how a problem arose to fit my solution. ☺
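To illustrate the effect, here is a toy Python simulation (not the code this weblog runs). The in-memory dictionary, the function names, and the two example clients are all assumptions made for the sketch: one client follows the 303 to the confirmation URI as a browser would, the other ignores the response as a badly written spambot might.

```python
import secrets

comments = {}  # in-memory stand-in for the comments table (an assumption)


def handle_post(text):
    """Store an unconfirmed comment; return the 303 Location value."""
    post_id = len(comments) + 1
    token = secrets.token_hex(8)
    comments[post_id] = {"text": text, "token": token, "confirmed": False}
    return f"confirmcomment.php?postid={post_id}&confirm={token}"


def handle_confirm(location):
    """Mark the comment confirmed if the value in the URI matches."""
    query = location.split("?", 1)[1]
    fields = dict(pair.split("=", 1) for pair in query.split("&"))
    comment = comments.get(int(fields["postid"]))
    if comment and comment["token"] == fields["confirm"]:
        comment["confirmed"] = True


def browser(text):
    handle_confirm(handle_post(text))  # follows the redirect


def careless_bot(text):
    handle_post(text)  # ignores the response entirely


def visible_comments():
    """Only confirmed comments would be shown on the page."""
    return [c["text"] for c in comments.values() if c["confirmed"]]


browser("Nice post!")
careless_bot("Free llama t-shirts!")
print(visible_comments())
```

Both comments end up stored, but only the one whose sender followed the redirect is confirmed and therefore shown.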

Posted: 2003-11-13 15:31:03 UTC by Xiven | Cross-references (0) | Comments (11)

Comments

  • Adam Kalsey (2003-11-13 16:22:07 UTC)

    A simpler scheme to implement would be to have the comment form post to another page that then requires the user to click a button confirming their post. The intermediate page would have all the comment information in hidden fields, along with a random ID of some sort. So in order to post a comment, the user would have to hit two submit buttons, preventing him from IP spoofing.

    But I don't think the spammer was saying his IP address had been spoofed. He was saying that someone had posted his URL and email address in the form in order to make it look as if he were spamming.

  • Kye (2003-11-13 16:22:19 UTC)

    Given that HTTP uses TCP, which is already a two-way protocol (sender sends a SYN packet, receiver sends an ACK packet (and possibly another flag, I can't be bothered looking it up), sender sends first data packet), forging an IP address is non-trivial without controlling a router between you and the sender. The real problem with IP addresses is HTTP proxies, where several people will have the same address. One solution may be to switch to HTTPS, which tends not to go through proxies.

  • Joseph (chujoe) (2003-11-13 17:05:48 UTC)

    Well, I thought I was pretty clear. My spammer ultimately admitted that he was the owner of the domain that had spammed me. He simply thought it was his right to do so because I had comments enabled. Actually, I got spammed yesterday by someone who appears to have spoofed an IP & I'm working with the admin at the spoofed host to hunt the lowlife down.

    Anyhow, I hope to retire from my job as the Scourge of Comment Spammers in a couple of weeks when I change hosting companies & install Jay Allen's MT-Blacklist. I'm finding that turning off comments in older posts & not allowing HTML tags in comments has cut down on the problem. Along with MT's IP-banning feature. I have to say, being the Scourge has been good for my traffic stats. Nothing like moral indignation to haul in the hits!

  • Xiven (Registered) (2003-11-14 01:08:36 UTC)

    Hmmm... I hadn't considered the TCP aspect. Silly me - back to the drawing board.

  • Casey (2003-11-14 05:47:05 UTC)

    Please visit my site ([URL removed -- Xiven]). Anyone signing up in the next 2 days will get a free t-shirt of a llama.

  • Xiven (Registered) (2003-11-14 07:08:28 UTC)

    Wow, my first comment spam. I feel so privileged. Maybe the poster was trying to be funny, so I'll leave the comment there, but the URL is gone, just in case.

  • Jacques Distler (2003-11-16 22:10:05 UTC)

    Dang! And I *really* wanted that llama T-shirt!

    Seriously, though, as far as I can tell, the spammers have been using two techniques so far.

    1) search google for commonly-used comment-entry scripts (e.g. mt-comments.cgi) and post their spam to them.

    2) follow links to popular weblog entries (from daypop, technorati, or wherever), look for a comment-entry form on that page. If they find one, submit their spam to it.

    Both of these techniques are easily defeated.

    Which is not to say that the spammers won't take the next step and add even more intelligence to their spambots. But, frankly, that is a battle they will lose. There's a real asymmetry between the defender and the attacker. The latter needs to make provisions for a myriad of contingencies, while the former needs to choose just one.

    Since I did my little bit of spam-proofing on Oct 12,
    http://golem.ph.utexas.edu/~distler/blog/archives/000236.html
    I've seen 40 attempts of type 1 (and I've no idea how many attempts of type 2), but not a single spam has gotten through.

  • Kredit (2004-01-18 20:35:28 UTC)

    junkeater.com is offering a tool for a spam free blog. have a look.

    greetings

  • Chris (2004-07-06 18:18:08 UTC)

    I tried the spam tool at junkeater.com and it's not that effective; it blocks about 50% of spam for me, which is really low.

  • Anonymous (2006-05-19 03:54:17 UTC)

    hey, where are you?

  • Xiven (Registered) (2007-05-01 12:21:22 UTC)

    Just as a matter of interest, this weblog does now use the above technique (and has done for a few months), not to solve the "fake IP problem", but to block badly written spambots. It's surprisingly effective.