Drill-down visualizations can be a good way to present a lot of data in a digestible format. In this example, we’ll create a graph of median home values by U.S. state using R and the highcharter package.

[Figure: Median home values by state. Credit: Sharon Machlis, IDG]

Initial graph of median home values by state (highest and lowest 10 states). Data from Zillow.

Each state’s bar will be clickable, so a user can drill down to see that state’s data by county.

[Figure: Graph of median home values in Massachusetts counties. Credit: Sharon Machlis, IDG]

After clicking the bar for Massachusetts, a user sees median home values by Massachusetts county. Data from Zillow.

There are three main steps to making a drill-down graph with highcharter:

  1. Wrangle your data into the necessary format; 
  2. Create a basic top-level graph; and 
  3. Add the drill-down.
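To make those three steps concrete before touching the Zillow files, here is a minimal sketch on a hypothetical toy data set. The names `top`, `detail`, and `drill_series`, and the "Region A"/"A1" values, are all made up for illustration. It uses highcharter's `hchart()` with a `drilldown` aesthetic for the top-level bars and `hc_drilldown()` for the child series. This is one way to wire it up, not necessarily the exact code behind the Zillow graph.

```r
library(dplyr)
library(purrr)
library(highcharter)

# Step 1 (toy version): hypothetical data, one row per top-level bar...
top <- tibble(
  region = c("Region A", "Region B"),
  value = c(300, 200)
) %>%
  mutate(drilldown = region)  # id that links each bar to its drill-down series

# ...and one row per drill-down bar (made-up values).
detail <- tibble(
  region = c("Region A", "Region A", "Region B", "Region B"),
  subregion = c("A1", "A2", "B1", "B2"),
  value = c(350, 250, 220, 180)
)

# Build one drill-down series per top-level bar. Each series needs an id
# matching the parent's drilldown value, and data as [name, value] pairs.
drill_series <- detail %>%
  group_split(region) %>%
  map(function(df) {
    list(
      id = df$region[1],
      name = df$region[1],
      type = "column",
      data = list_parse2(select(df, subregion, value))
    )
  })

# Step 2: the basic top-level column chart. The drilldown aesthetic
# attaches each bar's drill-down id to its point.
hc <- hchart(
  top, "column",
  hcaes(x = region, y = value, name = region, drilldown = drilldown),
  name = "Top level",
  colorByPoint = TRUE
) %>%
  # Step 3: register the drill-down series with the chart.
  hc_drilldown(allowPointDrilldown = TRUE, series = drill_series)

# Print `hc` in RStudio or save it with htmlwidgets to view the chart.
```

The design hinges on one link: each top-level point's `drilldown` value must equal the `id` of one series in `drill_series`. Clicking a bar then swaps in the series with that id.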

If you want to follow along, download state- and county-level data sets for the Zillow Home Value Index from Zillow at https://www.zillow.com/research/data/. I’m using the ZHVI Single-Family Homes series.

First, load the packages we’ll be using:

library(rio)
library(dplyr)
library(purrr)
library(highcharter)
library(scales)
library(stringr)

All can be installed from CRAN with install.packages() if you don’t already have them on your system.
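If you would rather not check each package by hand, a short helper can install only the ones that are missing. This is a sketch. The CRAN mirror URL https://cloud.r-project.org is an assumption, and any CRAN mirror should work.

```r
pkgs <- c("rio", "dplyr", "purrr", "highcharter", "scales", "stringr")

# requireNamespace() returns FALSE for packages that aren't installed.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```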

Note that highcharter is an R wrapper for the Highcharts JavaScript library, and that library is free only for personal, non-commercial use (including testing it locally) or for use by non-profits, universities, or public schools. For anything else, including government use, you need to buy a license. 

Next, I import the state and county CSV files into R with the following code. (My CSV files are in a data subfolder of my project directory.)

states <- import("data/State_zhvi.csv")
counties <- import("data/County_zhvi.csv")

These files have hundreds of columns, one for each month starting in 1996. I want to graph the most recent data, so I find the name of the last column with:

names(states)[ncol(states)]

At the time I wrote this, that returned 2020-06-30, which I’ll use as my MedianValue column. I’d like to compare that value to the start of the century, so I’ll also include 2000-01-31 as a PriceIn2000 column.
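The column picking and renaming described above can be sketched with dplyr. This is an illustration on a hypothetical toy data frame (`toy`, with made-up values and an assumed `RegionName` ID column), not the code used for the Zillow graph. The `PctChange` and `PctLabel` columns are one assumed way to make the 2000-versus-latest comparison.

```r
library(dplyr)
library(scales)

# Hypothetical toy data mimicking the wide Zillow layout: ID columns first,
# then one column per month from oldest to newest. Values are made up.
toy <- data.frame(
  RegionName = c("Region A", "Region B"),
  `2000-01-31` = c(100, 150),
  `2020-05-31` = c(290, 210),
  `2020-06-30` = c(300, 200),
  check.names = FALSE  # keep the date-style column names unchanged
)

# The most recent month is the last column...
latest_col <- names(toy)[ncol(toy)]

# ...or, without relying on column order, take the largest ISO-format
# date name (YYYY-MM-DD strings sort correctly as plain text).
date_cols <- grep("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", names(toy), value = TRUE)
latest_col_alt <- max(date_cols)

# One row per region with the 2000 baseline and the latest value.
toy_latest <- toy %>%
  select(
    State = RegionName,
    PriceIn2000 = `2000-01-31`,
    MedianValue = all_of(latest_col)
  ) %>%
  mutate(
    PctChange = (MedianValue - PriceIn2000) / PriceIn2000,
    PctLabel = percent(PctChange, accuracy = 1)
  )

toy_latest
```

The backticks around `2000-01-31` matter: a column name that starts with a digit and contains hyphens isn't a syntactic R name, so dplyr needs it quoted. `all_of()` covers the case where the column name is stored in a variable, as `latest_col` is here.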

Data wrangling

Here’s my code for creating a latest_states data frame, which I’ll use as a base for the graph:

Copyright © 2020 IDG Communications, Inc.