
Anti-spam : how to protect your blog/forum from comment spam

Many webmasters do not want to protect their blog or forum with a CAPTCHA test or an external filtering service, so as not to bother their users too much. However, they all face the same problem: blog spam (also known as comment spam).
Let's see how to get rid of it with a few tips and tricks, some of them very simple, so that spambots become a thing of the past while everything stays fully transparent for your visitors.
The examples below are sorted from the simplest to the trickiest.

I - Blogs

II - Forums


I - Blogs :

  • Multiple external JS files :
  • Very easy to set up and relatively effective: the form is split into several JavaScript parts, spread across several external files :

    form_01.js :

       function name_form (){
          document.write("Enter your name: <input type=text name=nom size=20><p>");
       }
       function close_form(){
          document.write("</form>");
       }
       var myscript='my_scrip.pl';
    
    form_02.js :
       function comment_form(){
          document.write("Your comments:<br><textarea name=comment cols=48 rows=6>");
          document.write("</textarea><p>");
       }
    
    form_03.js :
       function display(){
          document.write("<form method=post action="+myscript+">");
          name_form();
          document.write("Your email: <input type=text name=email size=20><p>");
          comment_form();
          document.write("<input type=submit value='Post your message'>");
          close_form();
        }
    
    HTML page :
       <html>
       <head>
       <script language="JavaScript" src="form_01.js" type="text/javascript"></script>
       <script language="JavaScript" src="form_02.js" type="text/javascript"></script>
       <script language="JavaScript" src="form_03.js" type="text/javascript"></script>
       </head>
       <body>
       ...
       ...
       <script>display();</script>
    
    

    That's it! A spambot script that does not execute JavaScript will not be able to reconstruct such a form.

  • Pop-up window :
  • Another very simple method that only a browser can follow: to post a comment, your visitors click a JS link that opens a window. That link is written by a JS function (display_link) and calls the popup() function, which holds the URL of the contact form and opens the window centered on the screen.

    JS form_01.js external file :

       function popup() {
          // center the popup window
          var height=400;
          var width=400;
          var scroll_bar=0;
          // form URL :
          var url='http://mysite.tld/myform.html';
    
          var str = "height=" + height + ",innerHeight=" + height;
          str += ",width=" + width + ",innerWidth=" + width;
          if (window.screen) {
             var ah = screen.availHeight - 30;
             var aw = screen.availWidth - 10;
             var xc = (aw - width) / 2;
             var yc = (ah - height) / 2;
             str += ",left=" + xc + ",screenX=" + xc;
             str += ",top=" + yc + ",screenY=" + yc;
             if (scroll_bar) {str += ",scrollbars=yes";}
             else {str += ",scrollbars=no";}
             str += ",status=no,location=no,resizable=yes";
          }
          var win = window.open(url, 'myform', str);
       }
    
       function display_link(){
          document.write('<a href="javascript:popup();">Post a message</a>');
       }
    
    HTML page :
       <html>
       <head>
       <script language="JavaScript" src="form_01.js" type="text/javascript"></script>
       </head>
       <body>
       ...
       ...
       <script>display_link();</script>
       ...
    

  • HTML events :
  • The user has to type some text before sending a message. With a simple event such as onKeyUp, typing a name sets a hidden variable to a specific value that your ASP/PHP or CGI script will check. If the value is not correct, the message is rejected. In the example below, the initial (and wrong) value is 1350 and the correct one is 893 :

    HTML page :

       <form name=myform action=mon_script.pl method=post>
       <input type=hidden name=myevent value=1350>
       ...
       Your name: <input type=text name=nom size=20 onKeyUp="document.myform.myevent.value=893">
       ...
       </form>
    

    CGI script (or whatever) :

       #!/usr/bin/perl
       use CGI;
       $QUERY = new CGI;
       $EVENT = $QUERY->param('myevent');
       if ($EVENT != 893){
          print $QUERY->redirect(-url => "http://fbi.gov");
          # stop here so the rest of the script does not process the message
          exit;
       }
       ...
       ...
    

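    The same idea, but this time three variables hidden in an external JS script are combined to produce the correct value, 893 (topsecret becomes 1000 on onKeyUp, then 1000 - 107 = 893 is stored on onSubmit) :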

    JS form_1.js external file :

       var topsecret=0;
       var substract=107;
       var myvalue=1000;
    
    HTML page :
       <form name=myform action=myscript.pl method=post
       onSubmit="document.myform.myevent.value=topsecret-substract;">
    
       <input type=hidden name=myevent value=1350>
       ...
       Your name: <input type=text name=name size=20 onKeyUp="topsecret=myvalue">
       ...
       <input type=submit value='Post your message'>
       </form>
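
    To see which value each kind of client ends up posting, the event flow of this three-variable form can be simulated outside the browser. This is only a sketch under stated assumptions: simulateSubmit and its two flags are hypothetical names introduced for illustration, and the constants mirror form_1.js.

```javascript
// Values taken from form_1.js
var substract = 107;
var myvalue = 1000;

// Hypothetical helper: returns the value of the hidden "myevent" field
// as the server would receive it.
//   runsJavaScript - does the client execute JS event handlers at all?
//   typedName      - did the onKeyUp event fire on the name field?
function simulateSubmit(runsJavaScript, typedName) {
  var myevent = 1350;   // initial (wrong) value written in the HTML
  var topsecret = 0;    // initial value in form_1.js
  if (!runsJavaScript) {
    return myevent;     // the raw form is posted unchanged
  }
  if (typedName) {
    topsecret = myvalue;            // onKeyUp handler
  }
  myevent = topsecret - substract;  // onSubmit handler
  return myevent;
}

console.log(simulateSubmit(true, true));   // 893: visitor with a browser
console.log(simulateSubmit(false, false)); // 1350: bot that ignores JavaScript
console.log(simulateSubmit(true, false));  // -107: JS ran but nothing was typed
```

    Only the first case produces the value the server expects, which is why the check stays a single comparison on the server side.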
    

    We could use other events such as onMouseOver or onFocus, as well as onLoad (or window.onload), which will modify our value once the visitor's browser has fully loaded the HTML page :

       <html>
       <head></head>
       <body onload="document.myform.myevent.value=893">
          ...
          <form name=myform action=myscript.pl method=post>
    
          <input type=hidden name=myevent value=1350>
    
          Your name: <input type=text name=nom size=20>
          <input type=submit value='Post your message'>
          </form>
          ...
       </body>
       </html>
    

  • Elapsed time between loading the HTML page and posting the message :
  • It is easy to understand that at least ten seconds or so should elapse between loading/reading the HTML page and posting a message. By posting to your PHP/ASP or CGI script the time at which the page was loaded, you can check that a reasonable amount of time has elapsed between the two. We'll use the epoch time (number of milliseconds or seconds since January 1st, 1970), obtained either with JavaScript or with SSI (Server Side Includes).

    External JS file + HTML code :

       function print_epoch(){
          var thisdate = new Date();
          // getTime() number of MILLISECONDS since epoch:
          var epoch=thisdate.getTime();
          document.write("<input type=hidden name=epoch value="+epoch+">");
       }
    

       <form name=myform action=mon_script.pl method=post>
       <script>print_epoch();</script>
       ...
       Your name: <input type=text name=name size=20><p>
       ...
       <input type=submit value='Post your message'>
       </form>
    

    HTML file only + SSI :

       <form name=myform action=mon_script.pl method=post>
       <!--#config timefmt="%s" -->
       <input type=hidden name=epoch value="<!--#echo var="DATE_LOCAL" -->">
       ...
       Your name: <input type=text name=name size=20><p>
       ...
       <input type=submit value='Post your message'>
       </form>
    

    Perl (or whatever) script :

       #!/usr/bin/perl
       use CGI;
       $QUERY = new CGI;
       $EPOCH = $QUERY->param('epoch');
       # a missing or malformed value is treated like a too-fast post
       $VALID = (defined $EPOCH && $EPOCH =~ /^\d{10,13}$/);
       # keep only the seconds (first 10 digits):
       $EPOCH =~ s/^(\d{10}).*/$1/ if $VALID;
       $now = time;
       # invalid value, or less than 10 seconds between loading the page and posting the message?
       if (!$VALID || $now - $EPOCH < 10){
          # bugger off !
          print $QUERY->redirect(-url => "http://fbi.gov");
          exit;
       }
       ...
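
    Since the check can live in "whatever" server-side language, here is a minimal JavaScript sketch of the same test, for instance for a Node.js back end. The function name shouldReject and its parameters are assumptions made for this illustration; only the 10 to 13 digit format and the 10 second threshold come from the Perl script.

```javascript
// Hypothetical helper mirroring the Perl check.
//   epochField - raw string posted in the hidden "epoch" field
//   nowSeconds - current time, in seconds since the epoch
//   minSeconds - minimum delay expected between page load and post
// Returns true when the post should be rejected.
function shouldReject(epochField, nowSeconds, minSeconds) {
  // 10 digits = seconds (SSI), 13 digits = milliseconds (JavaScript)
  if (typeof epochField !== "string" || !/^\d{10,13}$/.test(epochField)) {
    return true;
  }
  // Keep only the first 10 digits, i.e. the seconds
  var loadedAt = parseInt(epochField.slice(0, 10), 10);
  return nowSeconds - loadedAt < minSeconds;
}

var now = Math.floor(Date.now() / 1000);
console.log(shouldReject(String(now * 1000), now, 10));        // true: posted immediately
console.log(shouldReject(String((now - 60) * 1000), now, 10)); // false: posted after a minute
console.log(shouldReject("not-a-number", now, 10));            // true: malformed field
```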
    

  • Obfuscate HTML elements :
  • Without the need for a heavy cryptographic system, you can simply obfuscate your HTML code by replacing each character with its decimal value, using the charCodeAt() and fromCharCode() JS functions. The encoded output can then be cut and pasted into your HTML page.
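
    A small encoder along these lines could look like the sketch below. It is not the tool used on the original page: obfuscate, deobfuscate and the perLine parameter are hypothetical names chosen for this illustration, built only on the standard charCodeAt() and String.fromCharCode() functions.

```javascript
// Hypothetical helper: turns an HTML string into document.write() calls
// built from decimal character codes, "perLine" codes per call.
function obfuscate(html, perLine) {
  var lines = [];
  for (var i = 0; i < html.length; i += perLine) {
    var codes = [];
    for (var j = i; j < Math.min(i + perLine, html.length); j++) {
      codes.push(html.charCodeAt(j));
    }
    lines.push("document.write(String.fromCharCode(" + codes.join(",") + "));");
  }
  return lines.join("\n");
}

// Reverse operation, handy to check what an encoded snippet contains.
function deobfuscate(codes) {
  return String.fromCharCode.apply(null, codes);
}

console.log(obfuscate("<p>", 10));
// document.write(String.fromCharCode(60,112,62));
console.log(deobfuscate([60, 105, 102, 114, 97, 109, 101]));
// <iframe
```

    The generated lines are meant to be pasted between <script> and </script> tags at the place where the original HTML should appear.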

  • <IFRAME> tag :
  • The <IFRAME> tag lets you load another page within the current HTML page. This is very useful for our purposes, as we can save the contact form in a separate file and load it with an iframe :

    Main HTML page :

       <html>
       <head></head>
       <body>
       ...
       This is my text/article...
       ...
       <iframe src="myform.html" width=400 height=200 style="border:none;"></iframe>
       ...
       ...
       </body>
       </html>
    
    External contact form page (myform.html) which will be loaded by the <IFRAME> tag :

       <html>
       <head></head>
       <body>
       <form method=post>
       Name:<input type=text name=nom><p>
       Email:<input type=text name=email><p>
       Comments:<textarea name=comment></textarea><p>
       <input type=submit value='Post your message'>
       </form>
       </body>
       </html>
    
    Of course, you can obfuscate the <IFRAME> link to your contact form as seen in the "Obfuscate HTML elements" section :

       <html>
       <head></head>
       <body>
       ...
       This is my text/article...
    
       ...
       <script>
     document.write(String.fromCharCode(60,105,102,114,97,109,101,32,115,114));
     document.write(String.fromCharCode(99,61,34,109,121,102,111,114,109,46));
     document.write(String.fromCharCode(104,116,109,108,34,32));
     document.write(String.fromCharCode(119,105,100,116,104,61,52,48,48,32));
     document.write(String.fromCharCode(104,101,105,103,104,116,61,50,48,48));
     document.write(String.fromCharCode(32,115,116,121,108,101,61,34,98,111));
     document.write(String.fromCharCode(114,100,101,114,58,110,111,110,101));
     document.write(String.fromCharCode(59,34,62,60,47,105,102,114,97,109,101));
     document.write(String.fromCharCode(62));
     </script>
       ...
       ...
       </body>
       </html>
    


  • Pushing the spambot to trap itself :
  • That's a cool one: if there is one thing spammers really hate, it is when you decide to play with them, making them lose their time and therefore money. It would be a shame not to do it!
    So we will take our first example again (multiple external JS files) and obfuscate the call to the main function display() with the built-in fromCharCode() JS function. Then, just for fun, we will create a fake HTML form with fake <INPUT> tags that only spambot scripts will see, because it is hidden from visitors with a 'DISPLAY:NONE' CSS instruction. When a spambot comes to your site, it will use the fake form and can be blacklisted, redirected or banned. The real form will only be visible to a visitor with a browser :

    HTML page:

       <html>
       <head>
       <script language="JavaScript" src="form_01.js" type="text/javascript"></script>
       <script language="JavaScript" src="form_02.js" type="text/javascript"></script>
       <script language="JavaScript" src="form_03.js" type="text/javascript"></script>
       </head>
       <body>
       ...
       <!-- display the real form by obfuscating the call to "display();" -->
       <!-- and thus hiding it from spambots -->
       <script>
          document.write(String.fromCharCode(60,115,99,114,105,112,116,62,100,105,115,112,108));
          document.write(String.fromCharCode(97,121,40,41,59,60,47,115,99,114,105,112,116,62));
       </script>
       ..
       ..
       <!-- this is the fake form, only visible to spambots -->
       <div style="display:none;">
       <form method=post action=this_is_a_trap.pl>
       Name:<input type=text name=nom>
       Email:<input type=text name=email>
       Comments:<textarea name=comment></textarea>
       <input type=submit value='Post your message'>
       </form>
       </div>
    

    What should we do with spambots posting to the fake form? They could be blacklisted, but that would be almost useless since they will come back a few hours later with another IP. For spammers, time is money, so let's make them waste their time by redirecting them to localhost (127.0.0.1).

    this_is_a_trap.pl :

       #!/usr/bin/perl
       use CGI;
       $QUERY = new CGI;
       print $QUERY->redirect(-url => "http://127.0.0.1");
       exit;
    


  • Forcing Gzip compression :
  • Do you have a shared or dedicated server? If so, nothing is easier than forcing Apache to output your contact form using gzip compression.
    Activate mod_deflate from a terminal window (for Apache v1.x, use the aenmod command) :

    
     # a2enmod deflate
    
    
    Open your apache2.conf file and add the following lines :

    
     # Activate gzip for the following MIME types :
     AddOutputFilterByType DEFLATE text/html
     AddOutputFilterByType DEFLATE text/plain
     AddOutputFilterByType DEFLATE text/xml
     AddOutputFilterByType DEFLATE text/css
     AddOutputFilterByType DEFLATE text/javascript
     AddOutputFilterByType DEFLATE application/x-javascript
    
     # Force compression for your contact form :
     SetEnvIf Request_URI contac.html force-gzip
    
    
    Replace contac.html with the name of your HTML contact form page and reload Apache. As its name suggests, the force-gzip environment variable forces the HTML output of your contact form to be compressed with gzip, even if the browser or script does not support any kind of encoding, and thus feeds the spambots a lot of rubbish data.
    Here is a sample of what the bot will get :
    
     �T��W��.�;Jt@#f��	��\̷
     �F��>}�[]�+�B7�������7��j�1f��4��0,�"���h.;�n�7���QYEJwEP�VYf
     �N w�����ؘ���
     ק���‚®.▒8ªìW._˜F 5+Ò┬§␤Õç	KŽÎý®æé
     ÆÕ¢÷ù┘#ðê«ô²ž—▒SÊÐ_éÝ▒à¹=̒ƒ ùàÃ緈™3≠MVê┐Ø
     €° $¶┘¿§Ù┌ñÕ5”R≤°┼”« ½8Ð"␍┼™┐6Ù┌⎻�=-�o����A�T
     �J_�C��Ln���N���ku@��^
     �ÆƗ1�
     Ŕ�ڑM�ٔ��я?�i'Ӆ�n8�ɲ����ޞ��F!��g�]�0�	��K�
     ;cv�>���
    
    
    Of course, most search engines (Google, Yahoo...) won't be able to index your page. If you want to allow them to do so, you can 'whitelist' them as in the next example below.
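
    The difference between what a browser and a non-decompressing client receive can be reproduced with a few lines of Node.js. This is a sketch for illustration only: the page content is made up, and the zlib module here merely stands in for what mod_deflate does on the server.

```javascript
var zlib = require("zlib");

// Made-up page content, only for the demonstration
var page = "<form method=post action=myscript.pl>" +
           "Name:<input type=text name=nom>" +
           "Email:<input type=text name=email>" +
           "</form>";

var compressed = zlib.gzipSync(page);

// Every gzip stream starts with the two magic bytes 1f 8b
console.log(compressed[0].toString(16), compressed[1].toString(16));

// A client that ignores Content-Encoding reads the bytes as if they were text
console.log(JSON.stringify(compressed.toString("latin1").slice(0, 40)));

// A browser decompresses first and gets the original markup back
console.log(zlib.gunzipSync(compressed).toString("utf8") === page);
```

    The second line printed is the kind of binary noise shown in the sample, while the last one confirms that a gzip-aware client recovers the form intact.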


    II - Forums :

    Protecting a large forum may seem more difficult because small JS or CSS tricks are not really practical there. Fortunately, here again, forcing gzip compression is more than enough in most cases.
    Here is an example with the YaBB forum and a very nasty bot used by spammers: XRumer. That bot can post hundreds of spam messages, open user accounts if needed and even crack some CAPTCHA codes. However, it does not support gzip compression. The screenshot below shows how a powerful bot like XRumer (v5.05 Palladium demo) suddenly becomes totally harmless when the force-gzip variable is activated :

    XRumer

    force-gzip must be activated for the forum's main script (viewing / posting messages). In the case of YaBB, it is YaBB.pl. To let crawlers index your forum messages, we can simply list them with the BrowserMatch directive, which cancels gzip compression for them only by setting the no-gzip variable.

    apache2.conf file (or httpd.conf) :

    
     # Activate gzip for the following MIME types :
     AddOutputFilterByType DEFLATE text/html
     AddOutputFilterByType DEFLATE text/plain
     AddOutputFilterByType DEFLATE text/xml
     AddOutputFilterByType DEFLATE text/css
     AddOutputFilterByType DEFLATE text/javascript
     AddOutputFilterByType DEFLATE application/x-javascript
    
     # Force gzip compression when calling forum script :
     SetEnvIf Request_URI YaBB.pl force-gzip
     # Deactivate if search engines :
     BrowserMatch "Googlebot" no-gzip
     BrowserMatch "Yahoo\!" no-gzip
     BrowserMatch "msnbot\/" no-gzip
     BrowserMatch "Twiceler" no-gzip
     BrowserMatch "Ask Jeeves" no-gzip
     BrowserMatch "Gigabot" no-gzip
    
    
    Restart Apache.
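
    The way the two variables interact can be modelled in a few lines of JavaScript. This is a simplified sketch, not Apache's actual logic: it assumes that no-gzip takes precedence over force-gzip and that force-gzip ignores the client's Accept-Encoding header, and it leaves the MIME type filter aside. The function encodingFor and the sample User-Agent strings are hypothetical.

```javascript
// Patterns copied from the BrowserMatch lines
var crawlers = [/Googlebot/, /Yahoo!/, /msnbot\//, /Twiceler/, /Ask Jeeves/, /Gigabot/];

// Hypothetical helper: returns "gzip" or "plain" for a given request.
//   acceptsGzip - did the client send "Accept-Encoding: gzip"?
function encodingFor(uri, userAgent, acceptsGzip) {
  var forceGzip = /YaBB.pl/.test(uri);          // SetEnvIf Request_URI
  var noGzip = crawlers.some(function (re) {    // BrowserMatch
    return re.test(userAgent);
  });
  if (noGzip) { return "plain"; }               // assumption: no-gzip wins
  return (forceGzip || acceptsGzip) ? "gzip" : "plain";
}

console.log(encodingFor("/cgi-bin/YaBB.pl", "made-up spambot", false));  // gzip
console.log(encodingFor("/cgi-bin/YaBB.pl",
                        "Mozilla/5.0 (compatible; Googlebot/2.1)", true)); // plain
console.log(encodingFor("/index.html", "made-up script", false));         // plain
```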


  • Dos and Don'ts :
  • As we have just seen, it is quite easy to get rid of blog spam. However, some methods should be avoided because they are totally obsolete :
    - checking the referer (HTTP_REFERER): spambots can forge it and make you believe they come from anywhere, including your contact page.
    - cookies: a spambot can accept any cookie you send and will even return it each time you ask for it (this takes only 5 or 6 lines of code).
    - blacklisting IPs: bots use proxies or compromised servers and will always come back with a different IP.

  • Conclusion:
  • Several other methods can be used to protect your blog/forum from comment spam, including rejecting posted messages that contain a URL in their body, or even some AJAX tricks. The best methods are not the most complicated ones but rather the personalised ones. Your contact form is probably generated by a script, so it is easy to alternate between the examples above, as well as other simple and effective ones, to protect your blog without any constraint for your visitors if you do not want to bother them with a CAPTCHA test or with moderation of their messages.
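
    As an illustration of the URL-rejection idea mentioned in the conclusion, a very small filter could look like this. It is a sketch only: containsUrl is a hypothetical name, and the pattern (http://, https:// or www. followed by text) is one possible definition of "a URL in the body", chosen here as an assumption.

```javascript
// Hypothetical helper: true when the comment body contains something
// that looks like a link (http://, https:// or www.).
function containsUrl(text) {
  return /(https?:\/\/|www\.)\S+/i.test(text);
}

console.log(containsUrl("Nice article, thanks!"));         // false
console.log(containsUrl("Buy at http://spam.tld/pills"));  // true
console.log(containsUrl("see WWW.example.tld for more"));  // true
```

    A script using such a test would reject or hold back the message when the function returns true.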