Many WordPress MU blog admins are fighting spam registrations and spam comments the wrong way.  Lately I have been studying how spam bots attack websites with this spam, and why they do it.

METHODOLOGY
Bots usually rely on cURL, an open source tool and library (libcurl, with bindings for PHP and many other languages) used to fetch data from web pages, either to include in a script or to dump in a database.  It is very effective at turning dinosaur websites that refuse to offer feeds into a neatly laid out feed for your own website or for syndication on other websites. cURL also lets a developer make a program mimic any web browser as far as what the targeted site sees and logs, and it can be run from nearly any hosting provider out there.
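
To illustrate how little the targeted site can trust those headers, here is a minimal sketch in Python (standard library only, not cURL itself). It only builds a request object and never sends it; the example.com URL is a placeholder, not a real target.

```python
from urllib.request import Request

# Placeholder URL for illustration only; this script never sends the request.
request = Request(
    "http://example.com/",
    headers={
        # Any string can be supplied here; the server only sees what it is told.
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Referer": "http://example.com/some-page",
    },
)

# urllib stores header names with only the first letter capitalized.
print(request.get_header("User-agent"))
print(request.get_header("Referer"))
```

Both values are plain strings chosen by whoever writes the script, which is why the server's log cannot tell this apart from a real browser or crawler.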

This makes them extremely hard to stop: someone who has 50 hosting accounts and can make the “referring” address and “user agent” (aka web browser) appear to be anything they want is especially hard to identify as a bot.  I created a program called Super CAPTCHA for WordPress MU based websites and BuddyPress.  It protects sites against bot registrations and brute force login attacks by simply displaying an obscured image with words on it, intentionally cutting off a few letters or making some letters completely unreadable, which forces the human logic engine to step forward and complete the puzzle.  Every bot I have tried against my own website has failed, and I have been testing with the same ones these false “SEO Specialist” companies use.

I then added a logging mechanism that records all failed registrations and automatically “perma-blocks” registrations that fail more than 5 times, turning even an otherwise successful attack into an utter failure.  With this, I analyzed these bots and the way they attack websites, and determined that they use different user agents and even disguise themselves as legitimate bots like Google's.
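
The idea of logging failures and blocking after more than 5 of them can be sketched roughly as follows. This is a Python illustration under my own assumptions, not the actual Super CAPTCHA code: the class name, the fields that get logged, and the choice of what identifies a "client" (the `client_key`) are all hypothetical.

```python
from collections import defaultdict

MAX_FAILURES = 5  # block once a client has failed MORE than this many times


class RegistrationGuard:
    """Logs failed registration attempts and permanently blocks repeat offenders."""

    def __init__(self, max_failures=MAX_FAILURES):
        self.max_failures = max_failures
        self.failures = defaultdict(int)  # client_key -> number of failed attempts
        self.blocked = set()              # client keys that are perma-blocked
        self.log = []                     # one entry per failed attempt

    def record_failure(self, client_key, user_agent, username, email):
        """Store the failed attempt and block the client once it passes the limit."""
        self.log.append({
            "client": client_key,
            "user_agent": user_agent,
            "user": username,
            "email": email,
        })
        self.failures[client_key] += 1
        if self.failures[client_key] > self.max_failures:
            self.blocked.add(client_key)

    def is_blocked(self, client_key):
        """A blocked client is refused even if a later CAPTCHA answer is correct."""
        return client_key in self.blocked


guard = RegistrationGuard()
for attempt in range(6):
    guard.record_failure("174.133.10.71", "Mozilla/5.0 (fake)", "bot%d" % attempt, "bot@example.com")
print(guard.is_blocked("174.133.10.71"))  # True: six failures is more than five
```

Because the block is checked before the CAPTCHA answer, a bot that finally guesses right on a later attempt still gets nowhere.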

Here is an example from the logs:

47.a.85ae.static.theplanet.com 174.133.10.71 | User: coiamjmartzv | Email: coiamjmartzv@blogstores.info
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

As you can see, they use the EXACT user agent that the Google bot uses.  However, the Google bot has never made a post on any of my websites, ever, and it sure loves to crawl every single page I have ever published.  This means that these people have gotten smarter, making IP banning and user-agent banning about as effective as using a pea-shooter against an armored tank.
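
The log entry above already gives the impostor away: the user agent says Googlebot, but the hostname belongs to a hosting company. A rough Python sketch of that comparison follows. It assumes the hostname has already been resolved (as it is in my log) and that genuine Google crawlers resolve to names under googlebot.com or google.com; the function name is my own, and a real check would also confirm the hostname resolves back to the same IP.

```python
# Assumption: genuine Google crawler hostnames end in one of these domains.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def looks_like_fake_googlebot(hostname, user_agent):
    """Return True when the user agent claims Googlebot but the hostname disagrees."""
    if "googlebot" not in user_agent.lower():
        return False  # makes no Googlebot claim, so there is nothing to contradict
    host = hostname.lower().rstrip(".")
    return not host.endswith(GOOGLE_CRAWLER_DOMAINS)


# Values taken from the log entry above.
hostname = "47.a.85ae.static.theplanet.com"
user_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(looks_like_fake_googlebot(hostname, user_agent))  # True
```

This is only a way of spotting the mismatch in the logs. It does nothing against bots that send an ordinary browser user agent, which is why I do not rely on it for blocking.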

WHY THEY DO IT
Many people ask me, “What the hell do they get out of registering a blog on my website?”  The answer takes a bit of logic, common sense, and thinking like your opponent.  Applying that logic, they target blog-hosting sites and blog systems where a “Google Ranked” page links to each newly created blog. This by itself gives their new blog a page rank, usually within just a month's time.  From there, they post links, some hidden from regular web browsers using <style> rules and <span style="display: none;"> wrappers, so that visitors never see the links but search engines still do.  So when it boils right down to it, they are spamming your blog to artificially inflate their master website's page rank, which in turn pushes their sites higher in the results on search engines like Google and Bing.
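
As a rough illustration of the hidden-link trick, the following Python sketch scans a piece of submitted HTML for links that sit inside an element with an inline display: none style. The class and function names are my own, and it is deliberately simple: it assumes reasonably well-formed HTML and only looks at inline style attributes, so links hidden through <style> rules or CSS classes would slip past it.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not be tracked as "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenLinkFinder(HTMLParser):
    """Collects hrefs of links that are hidden by an inline display:none style."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # one bool per open element: does it hide its content?
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        style = (attrs.get("style") or "").lower().replace(" ", "")
        hides = "display:none" in style
        # A link is hidden if it hides itself or any enclosing element is hidden.
        if tag == "a" and attrs.get("href") and (hides or any(self.open_elements)):
            self.hidden_links.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.open_elements.append(hides)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.open_elements:
            self.open_elements.pop()


def find_hidden_links(html):
    finder = HiddenLinkFinder()
    finder.feed(html)
    return finder.hidden_links


post = ('<p>Nice post!</p>'
        '<span style="display: none;"><a href="http://spam.example/">cheap pills</a></span>'
        '<a href="http://ok.example/">my site</a>')
print(find_hidden_links(post))  # ['http://spam.example/']
```

A check like this could flag a post for review before it is published, since a human visitor would never have seen the hidden link anyway.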

Thankfully there are people like me out there who strive to do nothing more than help out public places and improve the quality of content on the web (I grow tired of searching for something and having to sift through 100-200 websites of spam links before reaching the website I was looking for in the first place).  If you are a WordPress MU blog owner, I strongly suggest you download Super CAPTCHA for your blog.  I only ask that, in return, you do me 2 favors: 1) keep the credits intact, and 2) let me know what you think.