Wednesday, September 2, 2015

Eliminate Ghost Referrals from Google Analytics Reporting

If you use Google Analytics for reporting, I'm sure you've seen mystery domains appearing in your reports. Many people have investigated which websites send ghost referrals and why they do it. I'm not going to cover that in this post. Instead, I plan to focus on how to eliminate the problem from your reports!

Google Analytics, while simple to implement, is just as simple to spoof. Tracking is a basic JavaScript call that passes information to be stored against a Google Analytics property, and nothing proves that the call came from your website. Several low-life developers have started sending ghost referrals to Google Analytics, which falsely inflate the numbers for your website.
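To make the spoofing point concrete, here is a minimal sketch of what a tracking hit boils down to. It assumes the Universal Analytics Measurement Protocol field names (v, tid, cid, t, dh, dp, dr); the property ID UA-XXXXX-Y, the example hostnames, and the helper name build_hit are placeholders of mine, not anything from a real account. Nothing is sent here. The point is only that the hostname and the referrer are plain values chosen by whoever builds the hit.

```python
from urllib.parse import urlencode, parse_qs


def build_hit(property_id, hostname, path, referrer):
    """Return the URL-encoded body of a pageview hit (never sent here)."""
    fields = {
        "v": "1",            # protocol version
        "tid": property_id,  # the property the hit is stored against
        "cid": "555",        # arbitrary client id (placeholder)
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname: whatever the sender claims
        "dp": path,          # document path
        "dr": referrer,      # document referrer: also sender-supplied
    }
    return urlencode(fields)


# A genuine hit and a ghost hit differ only in the values filled in.
real = build_hit("UA-XXXXX-Y", "www.example.com", "/", "https://www.google.com/")
ghost = build_hit("UA-XXXXX-Y", "translate.googleusercontent.com", "/",
                  "http://spammer.example/")
print(parse_qs(ghost)["dh"])
```

Because the hostname is just another field, checking it is a useful filter but not proof of origin, which is why the steps below combine it with the source.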

Prevention steps:
Step 1: Start with a segment

Segments are similar to filters, except that they don't remove any data; they group it so you can report on certain patterns. I suggest starting with a segment because no data will be lost.

a) Add a name for the segment: I used Remove Ghost Referrals.
b) Add conditions using 'hostname' to only include your valid domains. Add all that apply and then also add translate.googleusercontent.com to cover Google Translate.
c) Add another condition to exclude where the hostname = translate.googleusercontent.com and the source does not contain google.

Additional criteria can be added, but I find this covers 99.9% of what I want.
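The segment conditions above can be sketched as a small predicate. This is only an illustration of the logic, not how Google Analytics evaluates segments; the function name in_segment, the example.com hostnames, and the spammer.example source are made up, and the case-insensitive comparison is my own choice.

```python
TRANSLATE_HOST = "translate.googleusercontent.com"


def in_segment(hostname, source, valid_hosts):
    """Return True if a session would be kept by the segment."""
    hostname = hostname.lower()
    source = source.lower()
    # Condition (b): hostname must be one of ours, or Google Translate.
    allowed = {h.lower() for h in valid_hosts} | {TRANSLATE_HOST}
    if hostname not in allowed:
        return False
    # Condition (c): drop Translate hostnames whose source is not Google.
    if hostname == TRANSLATE_HOST and "google" not in source:
        return False
    return True


valid = ["www.example.com", "example.com"]
sessions = [
    ("www.example.com", "google"),
    ("www.example.com", "(direct)"),
    (TRANSLATE_HOST, "google"),
    (TRANSLATE_HOST, "spammer.example"),
    ("(not set)", "spammer.example"),
]
kept = [s for s in sessions if in_segment(s[0], s[1], valid)]
print(kept)
```

Only the first three sample sessions survive: the last two fail on the source check and the hostname check respectively.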

Step 2: Verify that your segment works as expected. Open a sessions report in Google Analytics and select the newly created segment (top of the reporting pane). It should show you the percentage of sessions that met your criteria. In my case the segment kept 97% of my original (All Sessions) traffic. Your numbers will vary based on the size of the website and the amount of traffic. If you change the report to display hostnames, you can verify which hostnames were allowed through the segment.
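If you export the hostname column from a report, the same check can be done offline. The sketch below is an assumption-laden illustration: segment_share is a hypothetical helper, and the hostname list is invented sample data, not figures from my reports.

```python
from collections import Counter


def segment_share(hostnames, valid_hosts):
    """Return (percent of sessions kept, Counter of rejected hostnames)."""
    allowed = set(valid_hosts)
    kept = sum(1 for h in hostnames if h in allowed)
    rejected = Counter(h for h in hostnames if h not in allowed)
    percent = 100.0 * kept / len(hostnames) if hostnames else 0.0
    return percent, rejected


# Invented sample: six legitimate sessions and two ghost hits.
hostnames = ["www.example.com"] * 6 + ["(not set)", "spam.example"]
percent, rejected = segment_share(hostnames, ["www.example.com"])
print(percent)
print(rejected.most_common())
```

The rejected counter is the useful part: any hostname in it that is actually yours should be added to the segment before you move on to filters.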

Step 3: If you like the segment data and are ready to strip the ghost referrals out of All Sessions, then it's time to create a filter. The problem I ran into here is that there is no easy way to combine filters, so the logic for the segment won't translate exactly to a filter.

To come close to the segment logic, I implemented 2 filters.
1) Include only the domains I want to see in the hostname field.
2) Exclude any referrals from domains that are passing translate.googleusercontent.com as their hostname.
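The two filters can be sketched as regular expressions applied one after the other. This assumes the filter pattern fields accept regular expressions; the helper names hostname_pattern and apply_filters, the example.com hostnames, and the spammer.example source are hypothetical stand-ins.

```python
import re


def hostname_pattern(hosts):
    """Build an anchored regex alternation with literal dots escaped."""
    return "^(" + "|".join(re.escape(h) for h in hosts) + ")$"


def apply_filters(hits, include_hosts, exclude_sources):
    """Apply an include-hostname filter, then an exclude-source filter."""
    include_re = re.compile(hostname_pattern(include_hosts), re.IGNORECASE)
    exclude_re = re.compile("|".join(re.escape(s) for s in exclude_sources),
                            re.IGNORECASE)
    kept = []
    for hostname, source in hits:
        if not include_re.search(hostname):
            continue  # filter 1: hostname is not on the include list
        if exclude_sources and exclude_re.search(source):
            continue  # filter 2: source is a known ghost referrer
        kept.append((hostname, source))
    return kept


hits = [
    ("www.example.com", "google"),
    ("translate.googleusercontent.com", "google"),
    ("translate.googleusercontent.com", "spammer.example"),
    ("(not set)", "spammer.example"),
]
include = ["www.example.com", "translate.googleusercontent.com"]
clean = apply_filters(hits, include, ["spammer.example"])
print(hostname_pattern(["www.example.com"]))
print(clean)
```

Escaping the dots matters, since an unescaped dot matches any character. Note the gap compared with the segment: a hit that claims the Translate hostname from a source not yet on the exclude list would still get through.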

So far I've only found one low life using that domain as the hostname. Look at the image to see the name, as I refuse to list his domain in this post, but it's a pretty version of the boxer that Will Smith played.

As other sites adopt this technique, they will need to be added to the exclude filter. Let's hope Google comes up with a way to prove ownership of the request, but in the meantime this provides a suitable solution.

I find these ghost referrals cause more headaches for the small businesses of the world, such as my NJ Wedding Photographer friend. They don't receive as much traffic, so the spoofed hits make up a larger percentage of their overall traffic. Larger companies may not even notice a few hundred additional spoofed pageviews.

Want to spoof your own website to see how it works? Then try this cool tool in Andreas Veithen's blog post on the subject.

