Violin plots in ggplot2

Use geom_violin() to quickly plot a visual summary of variables, using the Boston dataset from the MASS library.

1. Load the relevant libraries:
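The list below is inferred from the functions used in the following steps (gather, ggplot, brewer.pal, distinctColorPalette and the Boston data), so treat it as an assumption rather than a fixed requirement:

```r
library(MASS)          # Boston dataset
library(tidyr)         # gather()
library(ggplot2)       # ggplot(), geom_violin(), geom_boxplot()
library(RColorBrewer)  # brewer.pal(), only needed for the commented-out palette
library(randomcoloR)   # distinctColorPalette()
```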


2. Load data and use the tidyr package to transform wide into long format:

dt.long <- gather(Boston, "variable", "value", crim:medv)
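gather() still works, but the tidyr documentation marks it as superseded by pivot_longer(). Here is a minimal sketch of the same reshaping with the newer function, assuming a tidyr version that provides pivot_longer() (1.0.0 or later); dt.long2 is just an illustrative name:

```r
library(tidyr)
data(Boston, package = "MASS")

# Stack all 14 columns (crim to medv) into variable/value pairs
dt.long2 <- pivot_longer(Boston, crim:medv,
                         names_to = "variable", values_to = "value")

dim(dt.long2)
```

The rows come out in a different order than with gather(), which does not matter for the plots below.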

3. Create some color palettes:

col <- colorRampPalette(c("red", "blue"))(14)
# col.bp <- brewer.pal(9, "Set1") # brewer.pal only has a max of 9 colors
col.rc <- as.vector(distinctColorPalette(14))

4. Plot(s):

  • With the standard colors produced by ggplot2:
ggplot(dt.long,aes(factor(variable), value))+
geom_violin(aes(fill=factor(variable)))+
geom_boxplot(alpha=0.3, color="black", width=.1)+
labs(x = "", y = "")+
theme(legend.title = element_blank())+
facet_wrap(~variable, scales="free")


  • With the color palette produced by colorRampPalette:
ggplot(dt.long,aes(factor(variable), value))+
geom_violin(aes(fill=factor(variable)))+
geom_boxplot(alpha=0.3, color="black", width=.1)+
labs(x = "", y = "")+
scale_fill_manual(values = col, name="")+
facet_wrap(~variable, scales="free")


  • With the color palette produced by randomcoloR library:
ggplot(dt.long,aes(factor(variable), value))+
geom_violin(aes(fill=factor(variable)))+
geom_boxplot(alpha=0.3, color="black", width=.1)+
labs(x = "", y = "")+
scale_fill_manual(values = col.rc, name="")+
facet_wrap(~variable, scales="free")


Find color breaks for mapping (fast)

I’ve stumbled upon a little trick to compute Jenks breaks faster than with the classInt package: getJenksBreaks() from the BAMMtools library. Just be sure to use n+1 instead of n, as the breaks are computed a little differently. That is to say, if you want 5 breaks, use n=6; no biggie there.

For more on the Bayesian Analysis of Macroevolutionary Mixtures, see the BAMMtools library.

system.time(getJenksBreaks(mydata$myvar, 6))
> user system elapsed
> 0.970 0.001 0.971

On the other hand, this takes far longer with large datasets:
system.time(classIntervals(mydata$myvar, n=5, style="jenks"))
> Timing stopped at: 1081.894 1.345 1083.511
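Since the point of the breaks is to map values to colors, here is a minimal sketch of that last step. The data are simulated and the names x, brks, pal, cls and cols are made up for illustration; it assumes BAMMtools is installed.

```r
library(BAMMtools)

set.seed(1)
x <- rexp(1000)                      # simulated stand-in for mydata$myvar

brks <- getJenksBreaks(x, 6)         # n + 1 = 6, as explained above
pal  <- colorRampPalette(c("red", "blue"))(length(brks) - 1)

# Class index of each value; all.inside = TRUE keeps the maximum in the last class
cls  <- findInterval(x, brks, all.inside = TRUE)
cols <- pal[cls]                     # one color per observation

table(cls)                           # how many values fall in each class
```

The cols vector can then be passed as the col argument of a base plot or map.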

Saving graphics in R

A brief minimal guide on saving graphics in R

This is intended to be a minimalistic guide on how to save graphics in an R environment, with tips on formats and code.

1. What format?

a. Vector files like PDF, EPS, PS, SVG: high quality, easily resizable, and they work in any environment. In particular, I find PDF to work great with LaTeX, ppt, and Word. pdf("mygraph.pdf").
b. WMF: easily resizable but works only in a Windows environment. I don’t own or work with Windows, so I have never used this format. The general command is win.metafile("mygraph.wmf"). I personally despise Word as a writing tool: I wrote my master's thesis in it and it was a nightmare, but if you really need to use it… If you have a Mac (and you are still using Word), I recommend you take a look at this website for inspiration. If you work in a Windows environment, there are free alternatives here and here (mostly for reports or lecture notes, but I know people who write entire articles with them), a non-free one here, … and LaTeX for all. If you work with Linux, you’re probably laughing.
c. JPG: never use JPG formats.
d. PNG and TIFF are bitmap (or raster) formats, preferable for raster graphics such as photos. png("myplot.png") or tiff("myplot.tiff"). Good to know: to save more than one page of graphs, add -%d to the file name, as in png("plot-%d.png"); see example 3.
e. SVG is another vector format, like PDF or EPS. The default settings for svg() do not allow multiple pages in a single file.

f. One extra mention for the .eps format, the one I normally use and find the most practical. I use it to store graphs for the most disparate purposes: to include them in a LaTeX document (it will just transform your .eps files into .pdf files and add them to your library), for presentations in ppt, Keynote or LaTeX (again), and for publications. Windows usually does not display Encapsulated PostScript files automatically, **but** if you own a Windows machine you can always download a program such as Ghostscript, GIMP, Photoshop, or EPS viewer.


2. How to use it?

This works with most plotting libraries: (1) first call the device function for your format (e.g. pdf, png, jpeg, postscript…), (2) run your plot commands, (3) call dev.off(), which tells R to stop saving whatever you are plotting. If you don’t call it, you may end up with a bunch of graphs in the same file.

example 1:
pdf("myplot.pdf")
plot(x, y)
dev.off() # closes the graphics device and stops the saving of any further plotting commands, so be sure to add it when you are done with plotting

Alternatively, you can use the dev.print command, which copies the plot on the current device to another device (PostScript by default) and then closes it:

example 2:
plot(x, y)
dev.print(pdf, "myplot.pdf") # here I use pdf, but it can be any other format

example 3: Multiple pages, 1 plot per page…
png("plot-%d.png")
plot(x, y)
plot(x, y)
dev.off()
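Put together, the open / plot / close pattern looks like this in a self-contained form. The data and file names are invented for illustration, and everything is written to a temporary directory. The second half uses pdf() with onefile = FALSE, which, like the png("plot-%d.png") pattern mentioned in the formats list, writes one file per page when the file name contains %d.

```r
x <- 1:10
y <- x^2
out <- tempdir()

# One file, two pages (pdf() keeps all pages in one file by default)
pdf(file.path(out, "both.pdf"))
plot(x, y)
plot(y, x)
dev.off()

# One file per page: %d is replaced by the page number
pdf(file.path(out, "page-%d.pdf"), onefile = FALSE)
plot(x, y)   # written to page-1.pdf
plot(y, x)   # written to page-2.pdf
dev.off()

list.files(out, pattern = "\\.pdf$")
```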

3. What if I am using ggplot2 or ggmap?
ggsave: you only need to specify the filename, which makes it very convenient for quick plots. It saves the last plot that you displayed. The default size is the size of the current graphics device, unless you specify height and width; the unit of measure is set with units, which can be "in", "cm" or "mm". It guesses the type of graphics device from the file extension: see this for more details
Of course, you need to plot with ggplot2 to use ggsave…
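A minimal ggsave() sketch, assuming ggplot2 is installed; the plot, the object name p and the file name are made up for illustration:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Passing the plot explicitly avoids relying on "the last plot displayed"
f <- file.path(tempdir(), "myplot.pdf")
ggsave(f, plot = p, width = 12, height = 8, units = "cm")
```

Changing the extension of the file name (for example to .png or .eps) is enough to switch the output format.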

4. What size?
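The code above sizes plots according to the size at which they are displayed in R or by default values. Sizing can be controlled via width and height, which are in inches for pdf() and postscript(); for png() and tiff() they are in pixels unless you set units.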

pdf("myplot.pdf", width=10, height=5)
a # I am using code from this post
dev.off()

5. How to customize graphs
This works with pdf() and postscript() (I always use postscript).
As usual, like most things in R, everything is highly customizable. For instance, you can:
1. Define the font family to be used via the family option. The default is Helvetica, but you can find an exhaustive list of fonts here
2. Change the background color with bg (I usually set it to transparent, bg = "transparent", so that I don’t have problems when using graphs for presentations, especially if they are in PowerPoint or Keynote)
3. Set the orientation of the printed image with horizontal; if set to FALSE it’s vertical. Note that only postscript() has this argument, pdf() does not.

… and much more… Those three are the ones that I most commonly use.
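The three options together, as a sketch with postscript(). The data, the font choice and the file name are only illustrative:

```r
x <- 1:10
y <- x^2
f <- file.path(tempdir(), "custom.eps")

postscript(f,
           family = "Times",        # font family (the default is Helvetica)
           bg = "transparent",      # no background fill
           horizontal = FALSE,      # portrait orientation
           width = 6, height = 4)   # size in inches
plot(x, y)
dev.off()
```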


To paraphrase Röyksopp, what else is there?

Well, a lot…

  1. The Cairo package (see this link) to export anti-aliased, high resolution plots in R for Windows
  2. I have purposely avoided mentioning the lattice package, mostly because I don’t use it. Lattice is a trellis graphics system that exists in parallel with the normal R graphics system, and its graph exporting system is a bit different from that of other environments. A great intro for anyone who wants a go at lattice is this set of slides. In general, when plotting in R you have plenty of choice, and usually one environment (ggplot, base plot, or lattice, just to mention some) is enough to do everything.
  3. Some journals have strict requirements for graph quality, and the .eps format called via postscript("myveryniceplot.eps", paper = "special", onefile = FALSE) seems to do the trick
  4. The whole world of interactive plotting: see ggvis, plotly, htmlwidgets, googleVis, and shiny (just to mention a few)