Did you know that when Bill Gates enters a bar, the average customer becomes a billionaire? That is what I call the Bill Gates Effect. (Many thanks to Introductory Statistics)
Pss! There is a Power BI dashboard at the end of the blog post, don’t miss it!
The Bill Gates Effect Explained
A business owner is trying to understand the average income of the clients in the bar next door so he hires a student to interview the customers.
When the student had finished the interviews and the analysis, the person with the highest income leaves the bar and Bill Gates comes in. Frustrated, he re-runs the analysis this time with Bill Gates data instead.
The student comes back to the business owner and reports that the average customer in the next door bar is a billionaire!!
Average income= $ 7 900 042 500
The business owner ask the student to show the calculations and after reviewing the data understands the ”mistake”.
Income average Before Bill enters the bar:

Income average After Bill enters the bar:

Most people would agree that $47,500 (the median) is a better reflection of the income amounts for the people sitting in the bar and that the amount of $8 billion has nothing to do with the reality of the 9 people in the bar who are not Bill Gates.
Side Note: I was refreshing my statistical knowledge for my Lean Six Sigma certification when I found that example, and since then I have never had to search for those terms again: average, mean and median. This clearly demonstrated for me how powerful Story Telling can be.
But, how does the “Bill Gates Effect” relates to Google Analytics?
You might already see where this is going, but let me give you an example using Site speed metrics given by Google Analytics.
In Excel, I have the downloaded the following metrics for two days: 8th and 7th of February 2016:
• Page Load Time (ms)
• Avg Page Load Time (ms)
• And Page Load Sample
The Calc. Load time (sec) is a manual calculation of Avg. Page Load Time according to Google Analytics formula in Power BI:
Sum(Page Load Time (ms))/1000/sum(Page Load Sample),
and of course, it yields the same results as if you would take Avg. Page Load Time (sec) directly from Google Analytics.

Let’s calculate the “averages” and Median for both days:

Let me explain the metrics:
1. Avg. Page Load Time (sec) is the metric you will get from Google Analytics
2. Avg Load time is the average the column Avg. Page Load Time (sec)
3. Median is the Median of the Avg. Page Load Time (sec)
Here is an example with Google Analytics Average Page Time and our calculated Power BI Median segmented by country:

There is a huge difference between the metrics when you have outliers in your data, right?
What metric should you use?
Now that you understand the problems using average, you will be horrified by all the averages Google Analytics provides without any consideration to outliers.
Between the average and the mean, here is the Rule of thumb of which metric to use:
• For data sets with outliers (remember, Bill Gates in the club), it is better to use Median
• For symmetric data sets, use the mean.
It is a bit more complex than that, refer to Introductory Statistics if you want to dig deeper, but this is good enough for our analysis.
Do I have outliers?
I can almost guarantee you that you do, but, let’s found out.
Using Power BI (Power Query), we categorize the Page Load Time in bins:
- a) 0-1
- b) 1-3
- c) 3-7
- d) 7-13
- e) 13-21
- f) 21-25
- g) 25-35
- h) 55-60
- i) 60-80
- j) +180
Add a column in the Query Editor with the following formula:
Bins = if [#”Avg. Page Load Time (sec)”]<=1 then “a) 0-1″ else if [#”Avg. Page Load Time (sec)”]<=3 then “b) 1-3″ else if [#”Avg. Page Load Time (sec)”]<=7 then “c) 3-7″ else if [#”Avg. Page Load Time (sec)”]<=13 then “d) 7-13″ else if [#”Avg. Page Load Time (sec)”]<=21 then “e) 13-21″ else if [#”Avg. Page Load Time (sec)”]<=25 then “f) 21-25″ else if [#”Avg. Page Load Time (sec)”]<=35 then “g) 25-35” else if
[#”Avg. Page Load Time (sec)”]<=60 then “h) 55-60″ else if [#”Avg. Page Load Time (sec)”]<=180 then “i) 60-180” else “j) +180”
And visualize them in a bar chart:

Or you can use one of the latest custom visualizations, the box plot: (the outliers are the red dots)

Can I kick Bill out from my club?
Unfortunately, no.
You cannot remove your outliers, but to understand why, we need to understand how site speed is measured. Here is a timeline for how Google Analytics measures site speed and page load:

Simplifying things a lot, you have to take into account:
• People’s internet speed, that varies by location
• your webserver and domain name service,
• how well your website is built,
• and which browser is being used.
Looking back at our example above, 28 seconds might not be an outlier, but a normal performance at some locations and or performance with some browsers.

What should you do then?
Segmentation, segmentation, segmentation
What I normally do is to segment my data by those factors and look for patterns there.
Here is a great example, using our bar chart and filtering them by continent, you can see that the majority of the data in Africa falls in the 7-13 secs bin, while in America falls into the 3-7 secs category.
Makes sense, right? America has probably better internet conenctions than Africa.

So, an outlier in America might not be an outlier in Africa.
Does this mean that I don’t have outliers? Looking at the tail in the bar chart above, we still have “true” outliers, and those, you can remove.
Let’s dig deeper to the Top Offenders to uncover our true outliers:

One person from Russia, had a Page Load Time of 551 Secs!! And the median for Europe is 5,93, so that is definitely an outlier.
It would be great if we could eliminate those outliers easily in Power BI, and of course, somebody thought about it already, so Vote it up, but if you are in a hurry, DataSavvy solved this and posted the solution in his blog, thanks!.
In the meantime, Mean instead of averages will give you a better picture of reality. Remember our chart Page Loads by Location?
When we talk about Page Loads, seconds are as important as milliseconds for a 100 meters runner, so it is important to get as close to the truth as possible.

Do you want to analyze your Google Analytics data?
Try my Google Analytics 4 tempate:
About the report
The following Power BI report app will allow you to connect to Google Analytics 4 to analyze your website performance.
The report contains 7 tabs to analyze different aspects of your website:
- The User Acquisition Page, will allow you to get a deeper understanding on how your new users reach your site.
- The Traffic Acquisition Page, will help you understand where your traffic comes from for new and return users.
- The Pages page, lets you analyze your content so you can improve it or create more of what is working already.
- The Landing Page , will help you understand which pages are driving traffic to your site.
- The Tech Page, will help you optimize your site to the technology your audience uses. and finally,
- The Demographic Page, will help you learn more about your audience.
Important note:
This report uses Windsor.ai connector for Google Analytics.
They have a free tier you can use. If you need a pair tier you can get 15% off with this code:
CURBAL
Is the template using the built in Power BI connector in Power BI?
No, because it is not possible to send filters back to Google Analytics at the time of the query and therefore the data will be incorrectly aggregated.
What did I use to create the report?
This report uses Windsor.ai connector for Google Analytics.
They have a free tier you can use. If you want a paid tier you can get 15% off with this code:
CURBAL
Can I get the report?
If you tested the app and you want to personalize it, you can buy the report here:
-
Google Analytics 4 Template App by Curbal449 € (Price excl. VAT: 359 €)
For more information about this report check this blog post.
Leave a Reply