I have to admit I don’t like spam — not with breakfast, not in my Jello, and certainly not in my Google Analytics reports. Google has provided the interwebs with a really complex and useful tool for analyzing website traffic, and it’s totally free. But therein lies the problem. Anyone can sign up for Google Analytics and begin sending their data over for analysis. And just as easily anyone can send fake data to Google.
This is called “Ghost Referral Spam,” and it’s trending in a terrible way. You may recognize some of the top offenders (4webmasters.org, darodar.com, and best-seo-offer.com); their names clog your reports and push down the real referrers and valuable leads. Why would someone add fake hits to your site? Are they just mean? No, there’s something devious going on here too. The intent is to get you to visit their site, which often redirects to a store (earning the spammer click-through credit). Worse, they may be leading you into a malware trap. If you see a fishy-looking referrer in your Analytics reports, do a web search before visiting the site.
How Does This Work Anyway?
Assuming you have Google Analytics set up, when a real visitor loads a page on your site, the tracking code requests a tiny file from Google. This request carries your Google Analytics ID and the visitor’s tracking information (e.g., source, medium, referrer). Ghost Referrals exploit this open, unauthenticated exchange by faking the request to Google. There is no actual visit to your site. Instead, a malicious robot sends a random Google Analytics ID (potentially yours) along with the spammer’s site URL as the referrer.
So those hits from ‘4webmasters.org’ and the like never actually touch your site, which means no server-side filter or code added to .htaccess can block them. The only way to remove this false data is through Google Analytics, using filters or segments. And it turns out you probably need both, at least for now.
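To make the mechanics concrete, here is a small sketch that picks apart one tracking request. It assumes Universal Analytics style Measurement Protocol parameters (tid for the Analytics ID, dh for the hostname, dp for the page path, dr for the referrer). The example URL, the ID and both hostnames are made-up placeholders, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical hit for illustration only; the ID and hostnames are placeholders.
EXAMPLE_HIT = (
    "https://www.google-analytics.com/collect"
    "?v=1&tid=UA-XXXXX-Y&cid=555&t=pageview"
    "&dh=example.com&dp=%2Fpricing&dr=http%3A%2F%2Fspammer.example%2F"
)

# Assumed mapping from Measurement Protocol parameter names to readable labels.
FIELD_NAMES = {
    "tid": "tracking_id",
    "t": "hit_type",
    "dh": "hostname",
    "dp": "page_path",
    "dr": "referrer",
}


def describe_hit(url):
    """Return the labelled fields carried in the query string of one hit URL."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; take the first value, or "" if missing.
    return {label: params.get(key, [""])[0] for key, label in FIELD_NAMES.items()}


hit = describe_hit(EXAMPLE_HIT)
for label, value in hit.items():
    print(f"{label}: {value}")
```

Every field in that request is supplied by whoever sends it, which is why a robot that never loads your pages can still land in your reports.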
Filtering and Segmenting Spam
The latest trick to remove Ghost Referral Spam is to exclude any hit whose hostname isn’t one of your own. Since the spammers are randomly sending IDs and data, they don’t actually know whose ID they are using. Often the hostname, which should be your domain name, is not set or uses another random website name. This post has a guide for setting up your filter, and also has a link to an advanced segment you can import into your views.
This excellent segment is easy to install, updated with info about all the latest bad guys, and it works like a charm. Segments are great because they don’t permanently alter your data. They’re just a way of viewing part of your data. You can even compare the full data view (below, in blue) with the segment (orange).
Notice how the blue spikes (Ghost Referral Spam) are clipped off. Turns out this spam accounted for over 25% of the website traffic in these three months! Segments are a little annoying because you need to remember to apply them every time you view your reports (unless you create a custom report or dashboard where segments can be included).
The best thing to do is create a new filtered view of your data that will restrict by hostname. Make this filter as soon as you can, and update it regularly with the latest spammers. However, keep in mind that all the spam hits collected before you created the filter will remain in your data. (This is why you’ll want to keep that segment around — use it whenever you need to look at your historical data.)
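The hostname idea behind both the filter and the segment can be sketched in a few lines. This is only an illustration of the logic, not the filter from the guide mentioned above: the hostnames, the sample hits and the helper names are all assumptions, and in practice you would paste a pattern like this into an include filter on the hostname field rather than run any code.

```python
import re

# Assumption: the hostnames that legitimately serve your tracking code.
VALID_HOSTNAMES = ["example.com", "www.example.com", "shop.example.com"]


def build_hostname_pattern(hostnames):
    """Build an anchored regex that matches only the listed hostnames."""
    escaped = sorted(re.escape(name) for name in hostnames)
    return "^(" + "|".join(escaped) + ")$"


def split_hits(hits, pattern):
    """Separate hits into (kept, dropped) by whether the hostname matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    kept, dropped = [], []
    for hit in hits:
        hostname = hit.get("hostname") or ""
        (kept if regex.match(hostname) else dropped).append(hit)
    return kept, dropped


# Hypothetical sample data: one real visit and three ghost hits.
SAMPLE_HITS = [
    {"hostname": "www.example.com", "referrer": "google.com"},
    {"hostname": "(not set)", "referrer": "darodar.com"},
    {"hostname": "", "referrer": "4webmasters.org"},
    {"hostname": "example.com.evil.test", "referrer": "best-seo-offer.com"},
]

pattern = build_hostname_pattern(VALID_HOSTNAMES)
kept, dropped = split_hits(SAMPLE_HITS, pattern)
print("pattern:", pattern)
print("kept:", [hit["referrer"] for hit in kept])
print("dropped:", [hit["referrer"] for hit in dropped])
```

The ^ and $ anchors matter here. Without them, a hostname that merely contains your domain somewhere in the string would slip through.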
So We’re Safe Now?
Unfortunately, there’s one major problem with filtering or segmenting by incorrect hostnames — it won’t take long for spammers to figure out how to spoof the real hostname as well. What’s required here is a global solution from Google that will authenticate analytics requests to confirm real hostnames. Google has mentioned that they are working on the problem, but there is no mention of when this might be fixed.
In the meantime, set up your filters, apply your segments, and under no circumstances visit any of the spam sites that show up in your reports.