06/03/2021

How to create ggplot labels in R

Labeling all or some of your data with text can help tell a story, even when your graph is using other cues like color and size. ggplot has a couple of built-in ways of doing this, and the ggrepel package adds some more functionality to those options. 

For this demo, I'll start with a scatter plot looking at the percentage of adults with at least a four-year college degree vs. known Covid-19 cases per capita in Massachusetts counties. (The theory: A college education might mean you're more likely to have a job that lets you work safely from home. Of course there are plenty of exceptions, and many other factors affect infection rates.)

If you want to follow along, you can get the code to re-create my sample data on page 2 of this article.

Creating a scatter plot with ggplot

To start, the code below loads several libraries and sets scipen = 999 so I don't get scientific notation in my graphs:

library(ggplot2)
library(ggrepel)
library(dplyr)
options(scipen = 999)
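It may help to see what scipen actually changes. Per R's ?options help, scipen is a penalty: fixed notation is preferred unless it is more than scipen digits wider than scientific notation. A large value like 999 therefore effectively turns scientific notation off. This small base R sketch is my own illustration, not code from the article. It saves and restores the previous setting so it doesn't leak into the rest of a session.

```r
# Illustration only: compare how 1e6 is formatted at two scipen settings
old <- options(scipen = 0)      # R's default penalty; keep the previous value
fmt_default <- format(1e6)      # "1e+06" is shorter than "1000000", so it wins
options(scipen = 999)           # heavily penalize scientific notation
fmt_fixed <- format(1e6)        # now the fixed form is used
options(old)                    # restore whatever scipen was before

print(c(default = fmt_default, scipen_999 = fmt_fixed))
```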

Here is the data structure for the ma_data data frame:

head(ma_data)
                Place AdultPop Bachelors PctBachelors CovidPer100K Positivity    Region
1          Barnstable   165336     70795    0.4281887          7.0     0.0188 Southeast
2           Berkshire    92946     31034    0.3338928          9.0     0.0095      West
3             Bristol   390230    109080    0.2795275         30.8     0.0457 Southeast
4 Dukes and Nantucket    20756      9769    0.4706591         25.3     0.0294 Southeast
5               Essex   538981    212106    0.3935315         29.5     0.0406 Northeast
6            Franklin    53210     19786    0.3718474          4.7     0.0052      West

The next block of code creates a ggplot scatter plot with that data, sizing points by county adult population and coloring them by region. geom_smooth() adds a linear regression line, and I also tweak a few ggplot style defaults. The graph is stored in a variable called ma_graph.

ma_graph <- ggplot(ma_data, aes(x = PctBachelors, y = CovidPer100K, 
  size = AdultPop, color = Region)) +
  geom_point() +
  scale_x_continuous(labels = scales::percent) +
  geom_smooth(method = 'lm', se = FALSE, color = "#0072B2", linetype = "dotted") +
  theme_minimal() +
  guides(size = "none")

That generates a basic scatter plot:

ggplot2 scatter plot with percent college education on x axis and Covid-19 infection rates on y axis Sharon Machlis, IDG

Basic scatter plot with ggplot2.
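The full ma_data code is on page 2 and isn't reproduced here. As a rough stand-in, this sketch rebuilds only the six rows printed by head(ma_data) above as a hypothetical ma_data_head. It then assembles the same kind of graph. Because it uses just six counties, the regression line and layout won't match the article's figure. One deliberate difference from the article: I map size inside geom_point() rather than globally, so the regression layer doesn't inherit a size aesthetic it can't use.

```r
library(ggplot2)

# Hypothetical subset: only the six rows shown by head(ma_data) above
ma_data_head <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  PctBachelors = c(0.4281887, 0.3338928, 0.2795275,
                   0.4706591, 0.3935315, 0.3718474),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Region = c("Southeast", "West", "Southeast",
             "Southeast", "Northeast", "West")
)

ma_graph_head <- ggplot(ma_data_head,
                        aes(x = PctBachelors, y = CovidPer100K, color = Region)) +
  geom_point(aes(size = AdultPop)) +              # size mapped for points only
  scale_x_continuous(labels = scales::percent) +  # 0.43 -> "43%"
  # A constant color here overrides the Region mapping, giving one overall line
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              color = "#0072B2", linetype = "dotted") +
  theme_minimal() +
  guides(size = "none")                           # hide the size legend

print(ma_graph_head)
```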

However, it's currently hard to tell which points represent which counties. ggplot's geom_text() function adds labels to all the points:

ma_graph +
geom_text(aes(label = Place))
ggplot scatter plot with default text labels Sharon Machlis

ggplot scatter plot with default text labels.

geom_text() uses the same color and size aesthetics as the graph by default. But sizing the text based on point size makes the small points' labels hard to read. I can stop that behavior by setting size = NULL.

It can also be a bit hard to read labels when they're right on top of the points. geom_text() lets you "nudge" them a bit higher with the nudge_y argument.
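Here is a minimal sketch of both tweaks with geom_text(), again using only the six rows printed by head(ma_data) above (the data frame name ma_data_head is mine). Setting size = NULL inside aes() drops the inherited size mapping for the text layer, so every label gets the same default size. nudge_y = 0.7 shifts each label 0.7 units up the y axis. layer_data() is used to inspect the built label positions.

```r
library(ggplot2)

# Hypothetical subset: only the six rows shown by head(ma_data) above
ma_data_head <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  PctBachelors = c(0.4281887, 0.3338928, 0.2795275,
                   0.4706591, 0.3935315, 0.3718474),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Region = c("Southeast", "West", "Southeast",
             "Southeast", "Northeast", "West")
)

# size is mapped globally, so every layer inherits it unless told otherwise
base <- ggplot(ma_data_head, aes(x = PctBachelors, y = CovidPer100K,
                                 size = AdultPop, color = Region)) +
  geom_point()

# size = NULL removes the inherited size mapping for the labels only;
# nudge_y moves each label 0.7 y-units above its point
labeled <- base +
  geom_text(aes(label = Place, size = NULL), nudge_y = 0.7)

text_layer <- layer_data(labeled, 2)
print(text_layer[, c("label", "y", "size")])
```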

There's another built-in ggplot labeling function called geom_label(), which is similar to geom_text() but adds a box around the text. The following code using geom_label() produces the graph shown below.

ma_graph +
  geom_label(aes(label = Place, size = NULL), nudge_y = 0.7)
ggplot scatter plot with geom_label() Sharon Machlis, IDG

ggplot scatter plot with geom_label().

These functions work well when points are spaced out. But if data points are closer together, labels can end up on top of each other, especially in a smaller graph. I added a fake data point close to Middlesex County in the Massachusetts data. If I re-run the code with the new data, the Fake label blocks part of the Middlesex label.

ma_graph2 <- ggplot(ma_data_fake, aes(x = PctBachelors, y = CovidPer100K, size = AdultPop, color = Region)) +
  geom_point() +
  scale_x_continuous(labels = scales::percent) +
  geom_smooth(method = 'lm', se = FALSE, color = "#0072B2", linetype = "dotted") +
  theme_minimal() +
  guides(size = "none")
ma_graph2
ma_graph2 +
  geom_label(aes(label = Place, size = NULL, color = NULL), nudge_y = 0.75)
ggplot2 scatter plot with labels on top of each other Sharon Machlis, IDG

ggplot2 scatter plot with default geom_label() labels on top of each other

Enter ggrepel.

Creating non-overlapping labels with ggrepel

The ggrepel package has its own versions of ggplot's text and label geom functions: geom_text_repel() and geom_label_repel(). With those functions' defaults, ggrepel automatically moves one of the labels below its point so it doesn't overlap with the other one.

As with ggplot's geom_text() and geom_label(), the ggrepel functions let you set color to NULL and size to NULL. You can also use the same nudge_y argument to create more space between the labels and the points.

ma_graph2 + 
  geom_label_repel(data = subset(ma_data_fake, Region == "MetroBoston"),
    aes(label = Place, size = NULL, color = NULL), nudge_y = 0.75)
Scatter plot with labels not overlapping for close points Sharon Machlis, IDG

Scatter plot with geom_label_repel().

The graph above has the Middlesex label above the point and the Fake label below, so there's no risk of overlap.
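ma_data_fake is only built in the page 2 code. So this sketch reproduces the effect a different way. It takes the six rows printed by head(ma_data) above and adds one hypothetical point labeled "Fake" right next to Essex. Its values are invented purely to force an overlap. The sketch then builds the plain geom_label() version and the geom_label_repel() version side by side. Note that ggrepel computes the repelled label positions when the plot is drawn, so you need to print both graphs to see the difference.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical subset: only the six rows shown by head(ma_data) above
ma_data_head <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  PctBachelors = c(0.4281887, 0.3338928, 0.2795275,
                   0.4706591, 0.3935315, 0.3718474),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Region = c("Southeast", "West", "Southeast",
             "Southeast", "Northeast", "West")
)

# Invented point placed next to Essex (0.39, 29.5) so the labels collide
fake_point <- data.frame(Place = "Fake", AdultPop = 500000,
                         PctBachelors = 0.40, CovidPer100K = 29.0,
                         Region = "Northeast")
ma_data_head_fake <- rbind(ma_data_head, fake_point)

base2 <- ggplot(ma_data_head_fake, aes(x = PctBachelors, y = CovidPer100K,
                                       size = AdultPop, color = Region)) +
  geom_point() +
  theme_minimal() +
  guides(size = "none")

# Plain geom_label(): the Essex and Fake boxes sit on top of each other
overlapping <- base2 +
  geom_label(aes(label = Place, size = NULL, color = NULL), nudge_y = 0.75)

# geom_label_repel(): the same call, but labels are pushed apart at draw time
repelled <- base2 +
  geom_label_repel(aes(label = Place, size = NULL, color = NULL), nudge_y = 0.75)

print(overlapping)
print(repelled)
```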

Focusing attention on subsets of data with ggrepel

Sometimes you may want to label only a few points of special interest and not all of your data. You can do so by specifying a subset of data in the data argument of geom_label_repel():

ma_graph2 + geom_label_repel(data = subset(ma_data_fake, Region == "MetroBoston"), 
  aes(label = Place, size = NULL, color = NULL),
  nudge_y = 2,
  segment.size = 0.2,
  segment.color = "grey50",
  direction = "x"
)
Scatter plot with only some points labelled Sharon Machlis, IDG

Scatter plot with only some points labeled. 
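A variation worth knowing: ggplot2 layers also accept a function as their data argument. ggplot2 calls that function with the plot's own data and uses the result as the layer's data. That means you don't have to repeat the data frame's name inside subset(). This sketch is my own, not the article's code. It assumes that ggrepel passes data straight through to ggplot2's layer(), as its geoms normally do. It keeps the article's segment and direction settings and labels only the Southeast counties from the six rows printed by head(ma_data) above.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical subset: only the six rows shown by head(ma_data) above
ma_data_head <- data.frame(
  Place = c("Barnstable", "Berkshire", "Bristol",
            "Dukes and Nantucket", "Essex", "Franklin"),
  AdultPop = c(165336, 92946, 390230, 20756, 538981, 53210),
  PctBachelors = c(0.4281887, 0.3338928, 0.2795275,
                   0.4706591, 0.3935315, 0.3718474),
  CovidPer100K = c(7.0, 9.0, 30.8, 25.3, 29.5, 4.7),
  Region = c("Southeast", "West", "Southeast",
             "Southeast", "Northeast", "West")
)

southeast_only <- ggplot(ma_data_head, aes(x = PctBachelors, y = CovidPer100K,
                                           size = AdultPop, color = Region)) +
  geom_point() +
  theme_minimal() +
  guides(size = "none") +
  geom_label_repel(
    # Called with the plot data; returns only the rows to label
    data = function(d) subset(d, Region == "Southeast"),
    aes(label = Place, size = NULL, color = NULL),
    nudge_y = 2,
    segment.size = 0.2,        # thinner pointer lines
    segment.color = "grey50",  # gray pointer lines
    direction = "x"            # only repel labels horizontally
  )

print(southeast_only)
```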

Customizing labels and lines with ggrepel

There's more customization you can do with ggrepel. For example, you can set the width and color of labels' pointer lines with segment.size and segment.color.

You can even turn label lines into arrows with the arrow argument:

ma_graph2 + geom_label_repel(aes(label = Place, size = NULL),
  arrow = arrow(length = unit(0.03, "npc"),
    type = "closed", ends = "last"),
  nudge_y = 3,
  segment.size = 0.3
)
Scatter plot with ggrepel labels and arrows. Sharon Machlis, IDG

Scatter plot with ggrepel labels and arrows.

And you can use ggrepel to label lines in a multi-series line graph as well as points in a scatter plot.

For this demo, I'll use another data frame, mydf, which has some quarterly unemployment data for four US states. The code for that data frame is also on page 2. mydf has three columns: Rate, State, and Quarter.

In the graph below, I find it a little hard to see which line goes with which state, because I have to look back and forth between the lines and the legend.

graph2 <- ggplot(mydf, aes(x = Quarter, y = Rate, color = State, group = State)) +
  geom_line() +
  theme_minimal() +
  scale_y_continuous(expand = c(0, 0), limits = c(0, NA))
graph2
line graph with 4 lines and a legend to the right Sharon Machlis, IDG

ggplot line graph.

In the next code block, I'll add a label for each line in the series, and I'll have geom_label_repel() point to the second-to-last quarter and not the last quarter. The code calculates what the second-to-last quarter is and then tells geom_label_repel() to use filtered data for only that quarter. The code uses the State column as the label, "nudges" the data 0.75 horizontally, removes all the other data points, and gets rid of the graph's default legend.

second_to_last_quarter <- max(mydf$Quarter[mydf$Quarter != max(mydf$Quarter)])
graph2 +
  geom_label_repel(data = filter(mydf, Quarter == second_to_last_quarter),
    aes(label = State),
    nudge_x = 0.75,
    na.rm = TRUE) +
  theme(legend.position = "none")
Line graph with label for each line Sharon Machlis, IDG

Line graph with ggrepel labels.
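mydf is defined on page 2, so here is a self-contained sketch of the same direct-labeling technique. It uses two series from ggplot2's built-in economics_long data as a stand-in. The expression max(x[x != max(x)]) finds the second-to-last date: it drops the latest date, then takes the maximum of what remains. Because the x axis is a Date, nudge_x is measured in days. The 365-day value is my arbitrary choice, analogous to the article's 0.75.

```r
library(ggplot2)
library(ggrepel)
library(dplyr)

# Stand-in for mydf: two monthly series from ggplot2's economics_long data
econ <- economics_long %>%
  filter(variable %in% c("psavert", "uempmed"))

# Second-to-last date: remove the maximum date, then take the max of the rest
second_to_last_date <- max(econ$date[econ$date != max(econ$date)])

graph_econ <- ggplot(econ, aes(x = date, y = value,
                               color = variable, group = variable)) +
  geom_line() +
  # One label per series, anchored at the second-to-last date
  geom_label_repel(data = filter(econ, date == second_to_last_date),
                   aes(label = variable),
                   nudge_x = 365,   # Date axis, so this nudge is in days
                   na.rm = TRUE) +
  theme_minimal() +
  theme(legend.position = "none")   # labels replace the legend

print(graph_econ)
```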

Why not label the last quarter instead of the second-to-last one? I tried that first, and the pointer lines ended up looking like a continuation of the graph's data:

Line graph with confusing label pointing lines at the end of each line Sharon Machlis, IDG

Line graph with confusing label pointing lines.

The top two lines should not be starting to trend downward at the end!

If you want to find out more about ggrepel, check out the ggrepel vignette with

vignette("ggrepel", "ggrepel")