Visualizing Web Analytics in R Part 5: Interactive Heatmap
This article is the fifth in a series about visualizing Google Analytics and other web analytics data using R. This article focuses on using heatmaps to determine which articles are important in which geographical areas. The series hopes to show how R and interactive visualizations can help to answer the following business questions:
- Which articles need work to improve search engine ranking
- Which articles are well ranked but do not get clicked, and need work on titles or meta data
- Where to focus efforts for new content
- How to use passive web search data to focus new product development
The other articles in the series are:
- Visualizing Web Analytics Data in R Part 1: the Problem
- Visualizing Web Analytics Data in R Part 2: Interactive Outliers
- Visualizing Web Analytics Data in R Part 3: Interactive 5D (3D)
- Visualizing Web Analytics Data in R Part 4: Interactive Globe
- Visualizing Web Analytics Data in R Part 6: Interactive Networks
- Visualizing Web Analytics Data in R Part 7: Interactive Complex
Heatmap of Region and Page
When looking at large amounts of data with only three dimensions, heatmaps often convey general trends faster than scatter plots or geographic maps. They are not as good at identifying outliers as the scatter plots shown in articles 1 through 3, nor as good at displaying geographic relations as the maps shown in article 4. Heatmaps are best used to get an overall sense of a dataset. Figures 1, 2 and 3 show heatmaps of ISO region vs. URL scaled by column (ISO region), by row (URL) and not at all. As these three interactive graphs show, it is important to know how scaling was done when reading a heatmap.
Figure 1 is scaled by column, and shows that the social-buttons.com article is the most important article in all geographies, followed by effective-yield-loan-amortization in Saudi Arabia, Nigeria, the Philippines and Egypt, and by sales-and-lead-management-with-suite-crm in Lithuania, Slovakia and other smaller countries.
Figure 2 is scaled by row and is visually very different; it shows that the overwhelming sources of traffic for all pages are the US, the UK and other English-speaking countries. This isn’t a surprise since the site is English-only.
Figure 3 is scaled across all cells, and shows that from the perspective of all traffic, only three articles and three countries are important: stopping-rachel-from-cardholder-services are the main articles and are largely accessed in the US, the UK and by users whose country is not set.
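To make the effect of scaling concrete, here is a minimal base R sketch of the three options. The session counts, page names and region codes are made up for illustration and are not the data behind the figures. The sketch also assumes that row and column scaling means z-scoring, as base R's heatmap() does; the exact scaling behind the figures may differ.

```r
# Hypothetical session counts: rows are pages (URLs), columns are ISO regions.
# These numbers are invented for illustration; they are not the article's data.
sessions <- matrix(
  c(900, 400, 30,
    300, 100, 40,
     50,  20, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    page   = c("page-a", "page-b", "page-c"),
    region = c("US", "GB", "LT")
  )
)

# Column scaling: z-score each region, so colors compare pages within a region.
by_column <- scale(sessions)

# Row scaling: z-score each page, so colors compare regions within a page.
by_row <- t(scale(t(sessions)))

# No scaling: raw counts, so the largest cells dominate the color range.
unscaled <- sessions

# Strongest page in each region under column scaling.
top_page_by_region <- rownames(by_column)[apply(by_column, 2, which.max)]
names(top_page_by_region) <- colnames(by_column)

# Strongest region for each page under row scaling.
top_region_by_page <- colnames(by_row)[apply(by_row, 1, which.max)]
names(top_region_by_page) <- rownames(by_row)

print(round(by_column, 2))
print(round(by_row, 2))
print(top_page_by_region)
print(top_region_by_page)
```

In this toy data, column scaling lets a different page lead in the small region, while row scaling makes the largest traffic source stand out for every page. This is the same kind of contrast described for Figures 1 and 2.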
Heatmaps are easier to read than scatter plots, but don’t necessarily yield as much information and can be misleading if the reader does not know how the data was scaled for the plot.
This article was written in RStudio and uses the d3heatmap package for all graphics.
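As a rough sketch of how three such heatmaps might be produced, the snippet below calls d3heatmap() once per scaling option through its scale argument. The matrix is hypothetical stand-in data, and this is not the code used for the figures in this article. The calls only run if the d3heatmap package is installed.

```r
# Hypothetical page-by-region session counts; not the article's data.
sessions <- matrix(
  c(900, 400, 30,
    300, 100, 40,
     50,  20, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("page-a", "page-b", "page-c"),  # rows: pages (URLs)
    c("US", "GB", "LT")               # columns: ISO regions
  )
)

# The three scaling options discussed above.
scalings <- c("column", "row", "none")

if (requireNamespace("d3heatmap", quietly = TRUE)) {
  # Build one interactive heatmap widget per scaling option.
  heatmaps <- lapply(scalings, function(s) {
    d3heatmap::d3heatmap(sessions, scale = s)
  })
  names(heatmaps) <- scalings
} else {
  message("d3heatmap is not installed; skipping the interactive plots.")
}
```

When the package is available, evaluating heatmaps$column, heatmaps$row or heatmaps$none at the console should display the corresponding widget in the RStudio viewer.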