Data visualization automation isn’t just a dream. This short tutorial shows you how to create and save 154 charts as high-resolution PNG image files with just 10 lines of R code.
Years ago, creating multiple charts for publication was painful. Wrangling the data into the right format in Excel, or painstakingly applying manual configurations in InDesign, was neither fast nor fun. And updating the graphics whenever the dataset changed was equally frustrating, if not more so.
Thankfully, today's data professionals have many tools at their disposal to generate data visualization content at scale.
We start by loading the tidyverse, which provides the data wrangling (via dplyr) and data visualization (via ggplot2) functions we need. We also bring in the lesser-known hrbrthemes library for some opinionated design options.
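```r
library(tidyverse)   # dplyr, ggplot2, readr, stringr, purrr and friends
library(hrbrthemes)  # opinionated ggplot2 themes such as theme_ipsum()
```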
We'll use a CSV file, which you can grab from GitHub, and read it into a data frame called countries.
```r
countries <- read_csv("input_data/countries.csv")  # columns: country, year, population
```
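If you want read_csv() to skip its column-guessing message and guarantee that year comes back as a whole number, you can declare the column types up front. This is a minimal sketch, assuming the file has exactly the columns country, year, and population; the temporary file and its two rows (copied from the Albania records) only stand in for input_data/countries.csv so the snippet runs on its own.

```r
library(readr)

# Stand-in for input_data/countries.csv: two Albania rows written to a temp file
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("country,year,population",
             "Albania,1991,3.26679",
             "Albania,1992,3.247039"), tmp_csv)

# Declare the column types explicitly instead of letting readr guess them
countries_sample <- read_csv(
  tmp_csv,
  col_types = cols(
    country    = col_character(),
    year       = col_integer(),
    population = col_double()   # population in millions
  )
)

countries_sample
```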
A quick look at the data reveals three columns: country name, year, and population in millions. In other words, the dataset tracks how each country's total population (in millions) changed between 1991 and 2018. Here are the raw records for Albania:
country | year | population |
--- | --- | --- |
Albania | 1991 | 3.26679 |
Albania | 1992 | 3.247039 |
Albania | 1993 | 3.227287 |
Albania | 1994 | 3.207536 |
Albania | 1995 | 3.187784 |
Albania | 1996 | 3.168033 |
Albania | 1997 | 3.148281 |
Albania | 1998 | 3.12853 |
Albania | 1999 | 3.108778 |
Albania | 2000 | 3.089027 |
Albania | 2001 | 3.060173 |
Albania | 2002 | 3.05101 |
Albania | 2003 | 3.039616 |
Albania | 2004 | 3.026939 |
Albania | 2005 | 3.011487 |
Albania | 2006 | 2.992547 |
Albania | 2007 | 2.970017 |
Albania | 2008 | 2.947314 |
Albania | 2009 | 2.927519 |
Albania | 2010 | 2.913021 |
Albania | 2011 | 2.905195 |
Albania | 2012 | 2.900401 |
Albania | 2013 | 2.895092 |
Albania | 2014 | 2.889104 |
Albania | 2015 | 2.880703 |
Albania | 2016 | 2.876101 |
Albania | 2017 | 2.873457 |
Albania | 2018 | 2.866376 |
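To confirm the year range and the size of the change without eyeballing the table, a short dplyr summary works. This is a minimal sketch using four of the Albania rows above (tribble() comes from the tibble package, which the tidyverse loads); on the full file you would add group_by(country) before summarise() to get one row per country.

```r
library(dplyr)
library(tibble)

# A few of the Albania records from the table above
albania <- tribble(
  ~country,  ~year, ~population,
  "Albania",  1991, 3.26679,
  "Albania",  1992, 3.247039,
  "Albania",  2017, 2.873457,
  "Albania",  2018, 2.866376
)

albania_summary <- albania %>%
  arrange(year) %>%
  summarise(
    first_year = first(year),
    last_year  = last(year),
    start_pop  = first(population),
    end_pop    = last(population),
    # Percent change over the whole period (roughly -12.26 for Albania)
    pct_change = 100 * (end_pop - start_pop) / start_pop
  )

albania_summary
```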
We want one line chart for each country in the dataset, so we loop over unique(countries$country). On each pass, the loop body calls two functions in turn: ggplot() to build the chart for the current country and ggsave() to write it to disk.
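```r
dir.create("output_charts", showWarnings = FALSE)  # ggsave() does not create a missing folder by default

for (target_country in unique(countries$country)) {
  # Build one line chart from this country's rows
  p <- ggplot(countries %>% filter(country == target_country),
              aes(x = year, y = population)) +
    geom_line() + geom_point(color = "blue") +
    labs(title = target_country, subtitle = "Population in millions from 1991 to 2018",
         y = "Population in millions", x = "Year") +
    theme_ipsum()
  # Save it as <country>.png inside output_charts/
  ggsave(filename = str_c(target_country, ".png"), plot = p, path = "output_charts")
}
```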
The first function, ggplot(), builds a simple line chart from the current country's rows, with a few labels and design options along the way. Adding + theme_ipsum() at the end cleans up the appearance nicely, courtesy of the hrbrthemes library we loaded above.
Once a plot has been built, ggsave() exports it. We use the country name plus a .png extension as the file name; ggsave() infers the PNG format from that extension, and each file lands in the output_charts folder.
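To preview a single chart before writing all of them, or to pin down the image size, you can move the loop body into a function. This is a minimal sketch: plot_country() is a hypothetical helper name, the theme is left out so the snippet needs only ggplot2 and dplyr (add + theme_ipsum() back in your own script), and the 8 × 5 inch, 300 dpi size is an assumed choice that yields a 2400 × 1500 pixel PNG. It writes to a temporary folder rather than output_charts.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Hypothetical helper: the loop body, returning the plot instead of saving it
plot_country <- function(data, target_country) {
  ggplot(filter(data, country == target_country),
         aes(x = year, y = population)) +
    geom_line() +
    geom_point(color = "blue") +
    labs(title = target_country,
         subtitle = "Population in millions from 1991 to 2018",
         y = "Population in millions", x = "Year")
}

# Three of the Albania records from the table above
albania <- tribble(
  ~country,  ~year, ~population,
  "Albania",  1991, 3.26679,
  "Albania",  2000, 3.089027,
  "Albania",  2018, 2.866376
)

# Typing p at the console previews this one chart
p <- plot_country(albania, "Albania")

# Explicit size: 8 in x 300 dpi = 2400 px wide, 5 in x 300 dpi = 1500 px tall
out_dir <- tempdir()
ggsave(filename = "Albania.png", plot = p, path = out_dir,
       width = 8, height = 5, units = "in", dpi = 300)
```

Fixing the width and height also means every country's PNG comes out the same size, regardless of the size of whatever graphics window happens to be open.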
Albania, our example country, shows a steady decline in population over the period, from about 3.27 million in 1991 to about 2.87 million in 2018, a drop of roughly 12%.
Don’t like that the y-axis doesn’t start at 0 for every country? Add + ylim(0, NA) to the ggplot() call chain and it will fix things right up; the NA tells ggplot2 to keep computing the upper limit from the data.
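An alternative worth knowing is expand_limits(y = 0). The difference is that ylim() sets hard scale limits, so any value outside them is dropped with a warning, while expand_limits() only widens the axis so it includes 0. Population is never negative, so either works here. A minimal sketch on three of the Albania rows; y_lower() is a hypothetical helper that reads the lower edge of the drawn y-range from the built plot.

```r
library(ggplot2)
library(tibble)

albania <- tribble(
  ~year, ~population,
  1991, 3.26679,
  2000, 3.089027,
  2018, 2.866376
)

base <- ggplot(albania, aes(x = year, y = population)) + geom_line()

p_ylim   <- base + ylim(0, NA)           # hard lower limit at 0, upper limit from the data
p_expand <- base + expand_limits(y = 0)  # widen the axis so it includes 0

# Hypothetical helper: lower edge of the drawn y-range (includes default padding)
y_lower <- function(p) ggplot_build(p)$layout$panel_params[[1]]$y.range[1]

c(default = y_lower(base), ylim = y_lower(p_ylim), expand = y_lower(p_expand))
```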
Now for the fun part. Send your beautiful charts to your marketing, publications, or comms teams to share with the world. Just don’t tell them it took you only 10 lines of code to get it done!
Finally, for the R purists out there who find traditional loops offensive, here is a functional approach using purrr. In terms of total processing time for this example, however, purrr was only about one second faster on my desktop at creating all the saved images (19 seconds vs. 20 seconds). The code below is adapted from the indispensable R for Data Science by Hadley Wickham and Garrett Grolemund.
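```r
dir.create("output_charts", showWarnings = FALSE)  # ggsave() does not create a missing folder by default

plots <- countries %>%
  split(.$country) %>%  # named list: one data frame per country
  imap(~ ggplot(.x, aes(x = year, y = population)) +  # .x = country's rows, .y = country name
         geom_line() + geom_point(color = "blue") +
         labs(title = .y, subtitle = "Population in millions from 1991 to 2018",
              y = "Population in millions", x = "Year") +
         theme_ipsum())

paths <- str_c(names(plots), ".png")                       # e.g. "Albania.png"
pwalk(list(paths, plots), ggsave, path = "output_charts")  # calls ggsave(paths[i], plots[[i]], path = ...)
```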
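Because pwalk() here only ever receives two lists, purrr's walk2() can do the same pairing of file names and plots a little more directly. This is a minimal sketch on three of the Albania rows, writing to a temporary folder instead of output_charts; it also shows that the names produced by split() are what become the file names.

```r
library(dplyr)
library(ggplot2)
library(purrr)
library(stringr)
library(tibble)

countries_sample <- tribble(
  ~country,  ~year, ~population,
  "Albania",  1991, 3.26679,
  "Albania",  2000, 3.089027,
  "Albania",  2018, 2.866376
)

# split() returns a list named by country; imap() passes each name as .y
plots <- countries_sample %>%
  split(.$country) %>%
  imap(~ ggplot(.x, aes(x = year, y = population)) +
         geom_line() +
         labs(title = .y))

paths <- str_c(names(plots), ".png")

# Same effect as pwalk(list(paths, plots), ggsave, path = out_dir)
out_dir <- tempdir()
walk2(paths, plots, ggsave, path = out_dir)
```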