Bee Swarm charts in R using ggplot

Bee swarm charts are a great way of showing the distribution of data within a category. They can also represent the magnitude of a related variable through the size of each swarm point. The New York Times used this very effectively in its interactive chart of vaccination rates by county social vulnerability, where each swarm point is sized by county population.

Although bee swarms are relatively common in D3.js visualizations, they are used less often in R-based charts. Using a four-step process, we can combine ggplot with the ggbeeswarm and packcircles packages to achieve the same style of bee swarm chart in R.

Getting started

The three packages we’re going to use are tidyverse, ggbeeswarm and packcircles.

library(tidyverse)
library(ggbeeswarm)
library(packcircles)

For the purposes of this tutorial, we’re going to create a data frame of 200 random points, each with an x-axis value and a size, all belonging to the same group. Although it is outside the scope of this tutorial, using more than one group lets us create comparative distributions like those in the New York Times example.

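set.seed(100)  # reproducible random data (same seed as the combined code)
rand_points <- data.frame(x_group = "group",            # one shared category
                          x=rnorm(200, mean=50, sd=10),  # normally distributed x values
                          size=rexp(200, 1))             # exponentially distributed sizes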

Our random data frame will look something like this:

##   x_group        x       size
## 1   group 48.81626 1.06354986
## 2   group 26.50262 0.40919700
## 3   group 44.14879 0.07213613
## 4   group 57.60283 0.56124153
## 5   group 62.85665 0.13134574
## 6   group 44.45474 0.38002462

Step 1: Create a ggplot chart with the data set

We’re going to create a ggplot chart so that we can later extract data from it for our final bee swarm chart. In this chart, geom_beeswarm places our data points; we will then resize them and repack them with the packcircles package. The chart can be a very simple ggplot object, because apart from the x and y coordinates of the bee swarm points, we will throw the rest of the chart data away.

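# groupOnX=FALSE spreads the swarm along y; newer ggbeeswarm releases may warn
# that groupOnX is deprecated and infer the orientation themselves
beeswarm_without_size <- rand_points %>%
  ggplot(aes(x=x, y=x_group)) +
  geom_beeswarm(groupOnX=FALSE)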

This produces a bee swarm chart that looks like this:

Step 2: Extract the relevant data from the ggplot object

First, we build the beeswarm_without_size chart from step 1 with ggplot_build and store the result in an intermediary variable:

oldbee_chart_data <- ggplot_build(beeswarm_without_size)

Next, we create a new data frame from the data inside the built (but not yet rendered) ggplot object oldbee_chart_data. Because this is a very simple chart with no other geoms before geom_beeswarm, the beeswarm layer is the first element of the data list, so the coordinates are in oldbee_chart_data$data[[1]]$x and oldbee_chart_data$data[[1]]$y.

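We also add our size data back into the new data frame, in a column named r. Because the sizes are attached by position, this assumes ggplot_build returns the rows in the same order as rand_points. In the next step, circleRepelLayout uses this column to size the circles: by default (sizetype = "area") it treats each value as a circle's area and converts it to a radius, rather than using it as a radius directly.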

newbee_frame <- data.frame(x=oldbee_chart_data$data[[1]]$x, 
                           y=oldbee_chart_data$data[[1]]$y, 
                           r=rand_points$size)

Our newly created data frame with the coordinates provided by ggplot and our own point sizes will look something like this:

##          x         y          r
## 1 48.81626 1.0000000 1.06354986
## 2 26.50262 1.0000000 0.40919700
## 3 44.14879 1.0099974 0.07213613
## 4 57.60283 0.9698270 0.56124153
## 5 62.85665 1.0086223 0.13134574
## 6 44.45474 0.9915534 0.38002462
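Since the sizes are re-attached by position, it is worth confirming that the built layer keeps the original row order. The sketch below rebuilds step 1 with set.seed(100), as in the combined code at the end, and calls ggplot() without the pipe so only ggplot2 and ggbeeswarm are needed. It omits groupOnX, assuming ggbeeswarm infers from the discrete y aesthetic that y is the grouping axis. Because x is continuous and the swarm only offsets y (as the matching x values in the two outputs above suggest), identical x columns indicate that the rows line up.

```r
library(ggplot2)
library(ggbeeswarm)

set.seed(100)
rand_points <- data.frame(x_group = "group",
                          x = rnorm(200, mean = 50, sd = 10),
                          size = rexp(200, 1))

# Step 1 again; groupOnX is omitted, assuming ggbeeswarm infers that the
# discrete y aesthetic is the grouping axis
beeswarm_without_size <- ggplot(rand_points, aes(x = x, y = x_group)) +
  geom_beeswarm()

oldbee_chart_data <- ggplot_build(beeswarm_without_size)
layer_data_1 <- oldbee_chart_data$data[[1]]

# x is continuous and only y is offset by the swarm, so if the row order is
# preserved, the built x values match the original x values
rows_in_order <- isTRUE(all.equal(layer_data_1$x, rand_points$x))
print(rows_in_order)
```

If this prints FALSE, attaching sizes by position would pair sizes with the wrong points.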

Step 3: Repel the points and create polygons

Using the packcircles package, we call circleRepelLayout on newbee_frame. It iteratively moves overlapping circles apart, generating a new set of coordinates in which our swarm points no longer overlap.

# wrap=FALSE: circles pushed past an edge are not wrapped around to the opposite side
newbee_repel <- circleRepelLayout(newbee_frame, wrap=FALSE)
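To confirm that the repel step did its job, you can check every pair of circles for overlap. count_overlaps below is a hypothetical helper, not part of packcircles: it compares the distance between two centres with the sum of their radii and counts the pairs that are closer, ignoring overlaps smaller than tol. The two toy layouts are made up to exercise it.

```r
# Hypothetical helper (not part of packcircles): counts pairs of circles in a
# layout data frame (columns x, y, radius) whose centres are closer than the
# sum of their radii, ignoring overlaps smaller than tol
count_overlaps <- function(layout, tol = 1e-6) {
  n <- nrow(layout)
  if (n < 2) return(0L)
  pairs <- utils::combn(n, 2)  # every pair of row indices, one pair per column
  i <- pairs[1, ]
  j <- pairs[2, ]
  centre_dist <- sqrt((layout$x[i] - layout$x[j])^2 +
                      (layout$y[i] - layout$y[j])^2)
  sum(centre_dist < layout$radius[i] + layout$radius[j] - tol)
}

# Two unit circles 1.5 apart overlap; 2.5 apart they do not
overlapping <- data.frame(x = c(0, 1.5), y = c(0, 0), radius = c(1, 1))
separate    <- data.frame(x = c(0, 2.5), y = c(0, 0), radius = c(1, 1))

print(count_overlaps(overlapping))
print(count_overlaps(separate))
```

On the real data you would call count_overlaps(newbee_repel$layout). circleRepelLayout stops after maxiter iterations (1000 by default) and reports the number it used in newbee_repel$niter, so if the count is non-zero, a larger maxiter or a looser tol may be worth trying.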

Once again we use the packcircles package, this time to turn the repelled points into circle polygons that we can place in our final ggplot, using the circleLayoutVertices function. Much like in step 2, we keep only the layout data from the repel result (newbee_repel$layout), which looks like this:

##          x          y    radius
## 1 49.16399 -0.7210968 0.5818406
## 2 26.50262  1.0000000 0.3609037
## 3 46.37695  2.5305196 0.1515310
## 4 58.93265 -0.3659824 0.4226686
## 5 62.12639 -0.6018168 0.2044716
## 6 45.62201 -1.6546517 0.3478011
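Comparing this layout with newbee_frame, the radius column is not the same as r: a size of 1.06354986 became a radius of 0.5818406, which is sqrt(1.06354986 / pi). The toy example below is a minimal sketch that isolates this behaviour. The toy data frame is made up, with two circles placed far enough apart that the repel step should have nothing to move, and it assumes packcircles' sizetype argument accepts "area" (the default) and "radius".

```r
library(packcircles)

# Toy data (made up): two circles 100 units apart, with sizes pi and 4*pi
toy <- data.frame(x = c(0, 100), y = c(0, 0), size = c(pi, 4 * pi))

# Default sizetype ("area"): radius = sqrt(size / pi), i.e. radii 1 and 2
as_area <- circleRepelLayout(toy, wrap = FALSE)$layout
print(as_area$radius)

# sizetype = "radius": the sizes are used as radii unchanged
as_radius <- circleRepelLayout(toy, sizetype = "radius", wrap = FALSE)$layout
print(as_radius$radius)
```

With the default, each circle's area rather than its radius is proportional to size, which avoids the quadratic exaggeration of large values that radius scaling would produce.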

We then pass that data to circleLayoutVertices, using xysizecols to specify the columns that contain the x, y and radius values.

newbee_repel_out <- circleLayoutVertices(newbee_repel$layout, xysizecols = 1:3)

This will give us a set of data that looks like this:

##          x          y id
## 1 49.74584 -0.7210968  1
## 2 49.72756 -0.5763989  1
## 3 49.67387 -0.4407930  1
## 4 49.58814 -0.3227995  1
## 5 49.47576 -0.2298325  1
## 6 49.34379 -0.1677335  1
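circleLayoutVertices approximates each circle with a ring of points on its circumference. The sketch below is a hand-rolled version of that idea, not the package's source: circle_vertices is a hypothetical helper that places npoints vertices (25, assumed to match circleLayoutVertices' default) at equal angles starting from 0 radians. Given the centre and radius of circle 1 from the layout above, it reproduces the first vertices shown, which sit 2π/25 radians (14.4°) apart. Whether packcircles also repeats the first vertex to close the ring is not assumed here; geom_polygon closes each polygon automatically.

```r
# Hypothetical helper mimicking the idea behind circleLayoutVertices:
# npoints vertices evenly spaced around a circle, starting at angle 0
circle_vertices <- function(x0, y0, r, npoints = 25, id = 1) {
  # npoints + 1 angles from 0 to 2*pi, dropping the last (a repeat of 0)
  theta <- seq(0, 2 * pi, length.out = npoints + 1)[-(npoints + 1)]
  data.frame(x = x0 + r * cos(theta),
             y = y0 + r * sin(theta),
             id = id)
}

# Centre and radius of circle 1 from the repelled layout shown above
v <- circle_vertices(49.16399, -0.7210968, 0.5818406)
print(head(v, 2))
```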

We now have a final set of polygon points that we can use to create our bee swarm chart. Note the id column: it identifies which circle each vertex belongs to, and it will be very important in creating our chart.

Step 4: Create the final bee swarm chart

We can now use the newbee_repel_out data to create our bee swarm chart.

newbee_repel_out %>% ggplot(aes(x, y, group=id)) +
  geom_polygon(aes(color=id, fill=id)) +
  coord_equal() +
  labs(title="This is a Bee Swarm Chart in R")

Note the use of grouping and equal coordinates. Both are essential to ensuring that we have proper circles for our bee swarm points.

Grouping by id ensures that ggplot draws each circle as its own polygon. Without the grouping, ggplot connects all of the vertices into a single polygon, and the chart looks like this:

Using coord_equal is essential for properly shaped circles: it makes one unit on the x-axis the same length as one unit on the y-axis, so the polygons are not stretched into ellipses. Without coord_equal, the chart looks like this:

At this point we can apply all of the usual ggplot theming to our new bee swarm plot, making something both interesting and beautiful for our audience.

newbee_repel_out %>% ggplot(aes(x, y, group=id)) +
  geom_polygon(aes(color=id, fill=id), alpha=0.75) +
  coord_equal() +
  labs(x="X value", y="",
    title="This is a Bee Swarm Chart in R") +
  theme_void() +
  theme(legend.position = "none")

If you found this tutorial useful, please consider sharing it on your favorite social media site, and maybe even buying me a coffee. Thank you for your interest and support!

Bee swarm code combined

library(tidyverse)
library(ggbeeswarm)
library(packcircles)

set.seed(100)

rand_points <- data.frame(x_group = "group",
                          x=rnorm(200, mean=50, sd=10),
                          size=rexp(200, 1))

beeswarm_without_size <- rand_points %>%
  ggplot(aes(x=x, y=x_group)) +
  geom_beeswarm(groupOnX=FALSE)

oldbee_chart_data <- ggplot_build(beeswarm_without_size)

newbee_frame <- data.frame(x=oldbee_chart_data$data[[1]]$x, 
                           y=oldbee_chart_data$data[[1]]$y, 
                           r=rand_points$size)

newbee_repel <- circleRepelLayout(newbee_frame, wrap=FALSE)

newbee_repel_out <- circleLayoutVertices(newbee_repel$layout, xysizecols = 1:3)

newbee_repel_out %>% ggplot(aes(x, y, group=id)) +
  geom_polygon(aes(color=id, fill=id)) +
  coord_equal() +
  labs(x="X value", y="",
    title="This is a Bee Swarm Chart in R") +
  theme_void() +
  theme(legend.position = "none")