Coloring maps in R for social sciences

“It’s all about matching perceptual dimensions with data dimensions” Cindy Brewer

Plotting a map raises two issues, colors and scale. The two have to work together to describe your data well, or the map fails.

To get a better sense of how colors look on a map, I find the ColorBrewer website by Cindy Brewer very useful (read a recent interview here). It gives advice on map colors, hues and scales for various backgrounds and contexts.

R makes it easy to choose a palette thanks to the RColorBrewer library, so you don’t have to create one yourself (you can see the available combinations by calling the display.brewer.all() function):

Figure: palettes displayed by display.brewer.all()

  • Sequential: continuous variables with data ranging from relatively low to relatively high or interesting values;
  • Qualitative: categorical variables with no specific ordering;
  • Diverging: continuous variables where both very low and very high values are interesting, or a scale running from negative to positive values. Usually cold colors (blue) denote low or negative (-) values while warm colors (red, orange) denote high or positive (+) values. The mid-point should also mean something and add information to your message (e.g. the national average).
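As a rough sketch of how the three types look in code (the palette names "Blues", "Set2" and "RdBu" are just my picks for illustration, any palette of the right type would do):

```r
library(RColorBrewer)

# Palette names grouped by type: "div", "qual" or "seq"
info <- brewer.pal.info
split(rownames(info), info$category)

# One palette of each type, five colors each
seq_pal  <- brewer.pal(5, "Blues")     # sequential, single hue
qual_pal <- brewer.pal(5, "Set2")      # qualitative
div_pal  <- rev(brewer.pal(5, "RdBu")) # diverging, reversed so blue = low, red = high

# Show one family at a time instead of every palette at once
display.brewer.all(n = 5, type = "div")
```

RdBu runs from red to blue by default, which is why it is reversed here to follow the cold = low, warm = high convention.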

The choice of colors is trickier, as it depends on the subject being mapped, the message to convey and the context:

  • Rainbow palette: think again;
  • Red means “look at me”, so use it to highlight something meaningful;

    Figure 1. Foreign presence in Spain, year 2008.
  • Similarly, use bright or dark colors to highlight important information, contrasting them with softer, paler tones;
  • Prefer a single-hue palette if possible;
  • Be aware that specific colors may have specific cultural meanings (!). A few examples:
      1. Red: in South Africa it’s the color of mourning;
      2. Orange: color of Protestants in Ireland;
      3. Yellow: color of mourning in Egypt and a positive color in Asia;
      4. Purple: color of mourning in Thailand.

For further information, see this comprehensive graph of colors in culture. Another good rule is not to choose colors in order to give a good/bad message (green vs red or blue vs red), unless you are mapping the number of drowned kittens.
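One way to see why the rainbow palette deserves a second thought is to look at how bright each color is. The sketch below uses a rough luma formula (my own choice for illustration, not something RColorBrewer provides) to compare a single-hue palette with rainbow():

```r
library(RColorBrewer)

# Rough perceived brightness (luma) of each color, on a 0-255 scale
luma <- function(cols) {
  rgb <- col2rgb(cols)
  as.vector(0.299 * rgb["red", ] + 0.587 * rgb["green", ] + 0.114 * rgb["blue", ])
}

luma_blues   <- luma(brewer.pal(7, "Blues"))
luma_rainbow <- luma(rainbow(7))

round(luma_blues)    # falls steadily from light to dark
round(luma_rainbow)  # goes up and down along the palette
```

With the single-hue palette, brightness follows the data ordering, so darker reads as "more". With the rainbow it does not, and the reader has to memorize the legend.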

Once the color issue is solved (and I’d like to stress that colors should be weighed together with the choice of a specific palette), we can choose the number of class intervals, that is to say how many colors we are going to use. If I use a diverging palette for a continuous variable, I prefer an odd number of colors, either 5 or 7 (though this fits my specific mapping requirements), so that the middle class gets a neutral light tone. Ideally, breaks should also mean something and not be chosen arbitrarily. Once again R has a solution for this: the classInt library, which provides a set of styles to choose from for continuous numerical variables (sd, equal, pretty, quantile, kmeans, hclust, bclust, fisher, or jenks), as well as the option to set the breaks manually (fixed).

  • equal: equal distance, ideally for data with a normal distribution;
  • quantile: quantiles are good for data with a skewed distribution;
  • jenks/fisher: my personal favorites; they try to reduce the variance within classes and maximize the variance between classes.
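To get a feel for how much the style matters, here is a small sketch on made-up, right-skewed data (the variable x is simulated purely for illustration):

```r
library(classInt)

set.seed(42)
# Made-up right-skewed variable, standing in for real data
x <- rlnorm(200, meanlog = 0, sdlog = 0.8)
n <- 5
styles <- c("equal", "quantile", "jenks")

# Break points for each style, one column per style
breaks <- sapply(styles, function(s) classIntervals(x, n, style = s)$brks)
round(breaks, 2)

# Number of observations falling in each class, one column per style
counts <- sapply(styles, function(s) {
  ci <- classIntervals(x, n, style = s)
  tabulate(findCols(ci), nbins = n)
})
counts
```

On skewed data like this, equal intervals tend to pile most observations into the lowest classes, while quantiles spread them roughly evenly across classes.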

library(RColorBrewer)
library(classInt)
n <- 5 # how many colors?
# variable: numeric vector with the values to map, one per polygon in 'spain'
category <- classIntervals(variable, n, style = "jenks") # missing values are dropped with a warning
pal <- brewer.pal(n, "RdBu") # not called 'palette', to avoid masking base R's palette()
color <- findColours(category, pal)
bins <- category$brks
lb <- length(bins)
plot(spain, col = color, border = TRUE)
legend("topright", fill = pal, legend = paste(round(bins[-lb], 1), "-", round(bins[-1], 1)), cex = 2, bg = "white")

Figure 2. Mean Age at Childbearing for year 1981 in Spain
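If the middle class of a diverging palette should sit on a meaningful value, one option is to set the breaks by hand with style = "fixed". A minimal sketch on simulated data, assuming the reference value is simply the sample mean (in practice it could be a national average) and that class widths in standard deviations suit your data:

```r
library(RColorBrewer)
library(classInt)

set.seed(7)
# Made-up values standing in for a real variable (one value per area)
x <- rnorm(300, mean = 28, sd = 1.5)

# Reference value for the neutral middle class; here simply the sample mean
center <- mean(x)
s <- sd(x)

# Five classes, symmetric around the reference value
brks <- c(min(x), center - 1.5 * s, center - 0.5 * s,
          center + 0.5 * s, center + 1.5 * s, max(x))
category <- classIntervals(x, n = 5, style = "fixed", fixedBreaks = brks)

# RdBu runs red to blue, so reverse it to get blue = low, red = high
pal <- rev(brewer.pal(5, "RdBu"))
color <- findColours(category, pal)

# Number of observations per class
attr(color, "table")
```

The reference value falls in the third of five classes, which is the one that gets the light neutral tone.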

Of course, we can edit pretty much everything to tailor the map to our needs and preferences. For instance, the map above portrays 910 areas, so I prefer to suppress borders to avoid overcrowding by setting plot(spain, col=color, border=FALSE). I also use the layout function to separate the plot from the legends, to get something like this:

Figure 3. TFR difference between Spaniards and Foreigners

layout(matrix(c(1,2,3),1,3,byrow=TRUE), widths=c(1,1,0.35), heights=1)
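Here is a self-contained sketch of that layout, with two made-up grids drawn with image() standing in for the maps (the data, panel titles and the shared quantile breaks are my assumptions for illustration, not the code behind the figure above):

```r
library(RColorBrewer)
library(classInt)

set.seed(1)
# Two made-up 10 x 10 grids standing in for two maps
z1 <- matrix(rnorm(100, mean = 28, sd = 1.5), 10, 10)
z2 <- matrix(rnorm(100, mean = 30, sd = 1.5), 10, 10)

n <- 5
# Shared breaks so that both panels use the same color scale
brks <- classIntervals(c(z1, z2), n, style = "quantile")$brks
pal <- brewer.pal(n, "RdBu")

# Three panels in one row: two wide map panels and a narrow legend panel
nf <- layout(matrix(c(1, 2, 3), 1, 3, byrow = TRUE),
             widths = c(1, 1, 0.35), heights = 1)

image(z1, breaks = brks, col = pal, axes = FALSE, main = "Panel 1")
image(z2, breaks = brks, col = pal, axes = FALSE, main = "Panel 2")

# Third panel: an empty plot that only holds the legend
par(mar = c(0, 0, 0, 0))
plot.new()
labels <- paste(round(brks[-length(brks)], 1), "-", round(brks[-1], 1))
legend("center", fill = pal, legend = labels, bty = "n")
```

Keeping the legend in its own narrow panel means it never covers part of a map, and one legend can serve both panels.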

Author: acarioli

is a PostDoc at the Geography and Environment department of the University of Southampton, WorldPop project team. She is also an affiliated researcher at CED, UAB and Dondena Centre. Her interests include spatial econometrics and modeling, Bayesian methods, machine learning processes, forecasting, micro-data simulation, and data visualization. Demo-traveler, Mac enthusiast, R zealot and Rladies member.