Coloring maps in R for social sciences

“It’s all about matching perceptual dimensions with data dimensions” Cindy Brewer

Plotting a map raises two issues, colors and scale: they have to work together to describe your data well, otherwise the map fails.

To get a better sense of how colors look on a map, I find the Color Brewer website by Cindy Brewer very useful (read a recent interview here): it gives advice on map colors, hues and scales for various backgrounds and contexts.

R makes it easy to choose a palette thanks to the RColorBrewer library, so you don’t have to create one yourself (you can see the available combinations by calling the display.brewer.all() function):

Rplot

  • Sequential: a continuous variable with data ranging from relatively low to relatively high or interesting values;
  • Qualitative: categorical variables with no specific ordering;
  • Diverging: a continuous variable where both very low and very high values are interesting, or a scale running from negative to positive values. Usually cold colors denote low or negative (-) values while warm colors (red, orange) denote high or positive (+) values. The mid-point should also mean something and add information to your message (e.g. the national average).
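As a minimal sketch of the three families, assuming RColorBrewer is installed, one palette of each type can be pulled by name. The palette names "Blues", "Set2" and "RdBu" are only illustrative picks, not a recommendation:

```r
library(RColorBrewer)

# One illustrative palette per family (names chosen only for illustration)
seq_pal  <- brewer.pal(5, "Blues")  # sequential: light to dark in a single hue
qual_pal <- brewer.pal(5, "Set2")   # qualitative: unordered categories
div_pal  <- brewer.pal(5, "RdBu")   # diverging: two hues around a light mid-point

# brewer.pal.info lists the family and maximum size of every palette
brewer.pal.info[c("Blues", "Set2", "RdBu"), ]

# Preview a single family instead of every palette at once
display.brewer.all(n = 5, type = "div")
```

The type argument also accepts "seq" and "qual", which helps when the full display.brewer.all() chart is too crowded to read.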

The choice of colors is trickier and depends on the subject being mapped, the message to convey and the context:

  • Rainbow palette: think again;
  • Red means “look at me”, so use it to highlight something meaningful;

    percForeign2008
    1. Foreign presence in Spain, year 2008;
  • Similarly, use bright or dark colors to highlight important information, contrasting them with softer/pale tones;
  • Prefer a single hue palette if possible;
  • Be aware that specific colors may have specific cultural meanings (!), a few examples:
  1. Red: in South Africa it’s the color of mourning;
  2. Orange: color of Protestants in Ireland;
  3. Yellow: color of mourning in Egypt and a positive color in Asia;
  4. Purple: color of mourning in Thailand.

For further information, see this comprehensive graph of colors in culture. Another good rule is not to pick colors that imply a good/bad judgment (green vs red or blue vs red), unless you are mapping the number of drowned kittens.

Once the color issue is settled (and I’d like to stress that it should be weighed together with the choice of a specific palette), we can choose the number of class intervals, that is to say how many colors we are going to use. If I use a diverging palette for a continuous variable, I prefer an odd number of colors, either 5 or 7 (though this fits my specific mapping requirements), so that the middle class gets a neutral light tone. Ideally, breaks should also mean something rather than be chosen arbitrarily. R once again has a solution for this, the classInt library, which provides a set of styles to choose from for continuous numerical variables (sd, equal, pretty, quantile, kmeans, hclust, bclust, fisher, or jenks), as well as the option to set the breaks manually (fixed).

  • equal: equal distance, ideally for data with a normal distribution;
  • quantile: quantiles are good for data with a skewed distribution;
  • jenks/fisher: my personal favorites, they try to minimize the variance within classes and maximize the variance between classes.
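What the equal and quantile styles compute can be sketched by hand in base R, without classInt. The skewed variable x below is simulated and purely hypothetical; the point is only to show how differently the two rules fill the classes:

```r
set.seed(42)
x <- rexp(500)   # hypothetical right-skewed variable
n <- 5           # number of classes

# "equal": n classes of identical width between the minimum and the maximum
equal_brks <- seq(min(x), max(x), length.out = n + 1)

# "quantile": n classes holding roughly the same number of observations
quant_brks <- quantile(x, probs = seq(0, 1, length.out = n + 1), names = FALSE)

# Observations per class under each rule
equal_counts <- table(cut(x, equal_brks, include.lowest = TRUE))
quant_counts <- table(cut(x, quant_brks, include.lowest = TRUE))
print(equal_counts)
print(quant_counts)
```

With skewed data the equal-width classes pile most areas into the first color, which is why quantile (or jenks/fisher) breaks tend to give a more readable map in that case.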

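library(classInt) # class intervals
library(RColorBrewer) # color palettes
n <- 5 # how many colors?
# variable: my variable of choice, a numeric vector with one value per area of spain
category <- classIntervals(variable, n, style = "jenks", na.ignore = TRUE)
pal <- brewer.pal(n, "RdBu") # named pal so it does not mask base R's palette()
color <- findColours(category, pal) # one color per area
bins <- category$brks # class breaks, used for the legend labels
plot(spain, col = color, border = TRUE)
legend("topright", fill = pal, legend = paste(round(bins[-length(bins)], 1), "-", round(bins[-1], 1)), cex = 2, bg = "white")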

blogmap1
2. Mean Age at Childbearing for year 1981 in Spain
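To make the middle class of a diverging palette mean something, the breaks can be set manually with the fixed style. The following is only a sketch under stated assumptions: the data are simulated, the reference value is the plain mean of the variable, and the class widths are arbitrary choices of mine:

```r
library(classInt)
library(RColorBrewer)

set.seed(1)
variable <- rnorm(200, mean = 28, sd = 1.5)  # hypothetical mean ages at childbearing

mid  <- mean(variable)                        # reference value for the neutral class
half <- max(abs(variable - mid)) * 1.001      # small padding so every value falls inside the breaks

# Six breaks, symmetric around the reference value, give five classes
breaks <- mid + half * c(-1, -0.6, -0.2, 0.2, 0.6, 1)

category <- classIntervals(variable, n = 5, style = "fixed", fixedBreaks = breaks)
pal <- brewer.pal(5, "RdBu")
color <- findColours(category, pal)           # one color per observation
table(color)
```

Because the breaks are symmetric, the light middle tone covers the areas closest to the reference value and the two dark ends are directly comparable.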

Of course we can edit pretty much everything to tailor the map to our needs and preferences. For instance, the map above portrays 910 areas, so I prefer to suppress borders to avoid overcrowding, setting plot(spain, col=color,border=F), and to use the layout function to separate the plot from the legend, which gives something like this:

percForeign2008
3. TFR difference between Spaniards and Foreigners

layout(matrix(c(1,2,3),1,3,byrow=T), widths=c(1,1,0.35), heights=1)
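How the layout call splits the device can be tried without any spatial data. In this sketch the two "maps" are just colored strips standing in for plot(spain, ...), and the palette and class labels are hypothetical:

```r
# A red-to-blue diverging palette and hypothetical class labels
pal <- c("#CA0020", "#F4A582", "#F7F7F7", "#92C5DE", "#0571B0")
class_labels <- c("lowest", "low", "middle", "high", "highest")

# Stand-in for plot(spain, col = color, border = FALSE): five borderless strips
draw_map <- function(cols) {
  plot(0:1, 0:1, type = "n", axes = FALSE, xlab = "", ylab = "")
  rect(seq(0, 0.8, by = 0.2), 0, seq(0.2, 1, by = 0.2), 1, col = cols, border = NA)
}

# Three panels in one row: two maps of equal width and a narrow legend strip
n_panels <- layout(matrix(c(1, 2, 3), 1, 3, byrow = TRUE),
                   widths = c(1, 1, 0.35), heights = 1)

par(mar = c(1, 1, 2, 1))
draw_map(pal)        # panel 1
draw_map(rev(pal))   # panel 2
par(mar = c(0, 0, 0, 0))
plot.new()           # panel 3: an empty plot that only hosts the legend
legend("center", fill = pal, legend = class_labels, bty = "n")
```

Each high-level plotting call fills the next panel in the order given by the matrix, so the legend gets its own strip and never covers the maps.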

A space-time box plot of Spain’s TFR for 910 comarcas.

The idea behind spatial analysis is that space matters and that near things are more similar than distant ones: a variable measured in city A is (ideally) different from the same variable measured in city B. A simple way to get a feel for this hypothesis and to represent it is graphical visualization, usually one or more maps.

TFRG_all_4years_Spain

However, when dealing with time series, maps become cumbersome and some information is sometimes lost, such as the national average or path convergence. Box plots are a simple yet very effective way to synthesize a lot of information in one graph. The following plot depicts TFR over a 30-year period for 910 Spanish areas with respect to the national average value (thick black line in the middle of the boxes).

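library(ggplot2)
# dat: one row per area and year, with columns YEAR and TFR
p <- ggplot(dat, aes(x=factor(YEAR), y=TFR)) # refer to columns by name inside aes(), not dat$TFR
p <- p + geom_boxplot()
p <- p + scale_y_continuous(limits=c(0,2.5)) + scale_x_discrete("YEAR", breaks=as.character(seq(1981,2011,by=5)))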

TFRG
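For readers without the Spanish data, here is a sketch of the same kind of plot on simulated values. Everything about dat is hypothetical (50 areas, a downward drift in TFR), and stat_summary(fun = mean) assumes ggplot2 3.3.0 or later. Note that the line inside each box is the median across areas, so the sketch adds the unweighted mean across areas as a separate marker, which is itself only a rough stand-in for a true national average:

```r
library(ggplot2)

set.seed(1)
years <- seq(1981, 2011, by = 5)
n_areas <- 50  # hypothetical number of areas per year

# Hypothetical data: TFR drifting downwards over time
dat <- data.frame(
  YEAR = rep(years, each = n_areas),
  TFR  = rnorm(length(years) * n_areas,
               mean = rep(seq(2.0, 1.3, length.out = length(years)), each = n_areas),
               sd = 0.2)
)

p <- ggplot(dat, aes(x = factor(YEAR), y = TFR)) +
  geom_boxplot() +
  # unweighted mean across areas, drawn as a diamond on top of each box
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3) +
  labs(x = "YEAR", y = "TFR")
print(p)
```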

Moran plots in ggplot2

Moran plots are one of the many ways to depict spatial autocorrelation:
moran.test(varofint,listw)
where “varofint” is the variable we are studying, “listw” is a spatial weights list describing the neighbourhood relationships, and “moran.test”, included in the spdep library, performs Moran’s test (duh!) for spatial autocorrelation; the plot itself is drawn by spdep’s moran.plot. The same plot can be done using the ggplot2 library. Provided that we already have our weights list of neighborhood relationships listw, we first define the variable under study and its spatial lag, compute their means and save everything into a data frame (there are a lot of datasets available in R: afcon, columbus, syracuse, just to cite a few). The purpose is to obtain something that looks like this (I have used my own *large* set of Spanish data to obtain it):

ggplot2.moranplot1

Load your data. Here is Anselin's (1995) data on African conflicts, afcon:

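library(spdep)
data(afcon)
varofint <- afcon$totcon # assumed: the total conflicts column of afcon
listw <- nb2listw(paper.nb) # assumed: weights built from paper.nb, the neighbour list loaded with afcon
varlag <- lag.listw(listw, varofint) # spatially lagged variable
var.name <- "Total Conflicts"
m.varofint <- mean(varofint)
m.varlag <- mean(varlag)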
and compute the local Moran's statistic using localmoran:

lisa <- localmoran(varofint, listw)
and save everything into a data frame:
df <- data.frame(varofint, varlag, m.varofint, m.varlag)

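Use these variables to derive the four sectors "High-High" (red), "Low-Low" (blue), "Low-High" (lightblue), "High-Low" (pink):
df$sector <- 0 # assumed starting value
significance <- 0.05 # assumed threshold, matching the 0.05 in the legend label
vec <- ifelse(lisa[,5] < significance, 1, 0) # assumed: column 5 of the localmoran output holds the p-values
df$sector[df$varofint>=df$m.varofint & df$varlag>=df$m.varlag] <- 1
df$sector[df$varofint<df$m.varofint & df$varlag<df$m.varlag] <- 2
df$sector[df$varofint<df$m.varofint & df$varlag>=df$m.varlag] <- 3
df$sector[df$varofint>=df$m.varofint & df$varlag<df$m.varlag] <- 4

df$sec.data <- df$sector * vec # assumed: 0 marks areas that are not significant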

df$sector.col[df$sec.data==1] <- "red"
df$sector.col[df$sec.data==2] <- "blue"
df$sector.col[df$sec.data==3] <- "lightblue"
df$sector.col[df$sec.data==4] <- "pink"
df$sector.col[df$sec.data==0] <- "white"

df$sizevar df$sizevar 0.1)
df$FILL df$BORDER
to get the ggplot graph:
p 0.05", "High-High", "Low-Low","Low-High","High-Low"))+
scale_x_continuous(name=var.name)+
scale_y_continuous(name=paste("Lagged",var.name))+
theme(axis.line=element_line(color="black"),
axis.title.x=element_text(size=20,face="bold",vjust=0.1),
axis.title.y=element_text(size=20,face="bold",vjust=0.1),
axis.text= element_text(colour="black", size=20, angle=0,face = "plain"),
plot.margin=unit(c(0,1.5,0.5,2),"lines"),
panel.background=element_rect(fill="white",colour="black"),
panel.grid=element_line(colour="grey"),
axis.text.x  = element_text(hjust=.5, vjust=.5),
axis.text.y  = element_text(hjust=1, vjust=1),
strip.text.x  = element_text(size = 20, colour ="black", angle = 0),
plot.title= element_text(size=20))+
stat_smooth(method="lm",se=F,colour="black", size=1)+
geom_vline(xintercept=m.varofint,colour="black",linetype="longdash")+
geom_hline(yintercept=m.varlag,colour="black",linetype="longdash")+
theme(legend.background =element_rect("white"))+
theme(legend.key=element_rect("white",colour="white"),
legend.text =element_text(size=20))
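The ingredients of the plot can also be checked on a toy example that needs no spatial data. This is a sketch under my own assumptions, not the code used for the figure above: five hypothetical areas on a line, a hand-built row-standardised weights matrix instead of an spdep listw object, and no significance filtering of the sectors:

```r
library(ggplot2)

# Hypothetical toy data: five areas on a line, each bordering the adjacent ones
x <- c(9, 8, 2, 1, 7)
W <- matrix(0, 5, 5)
for (i in 1:4) W[i, i + 1] <- W[i + 1, i] <- 1
W <- W / rowSums(W)           # row-standardised weights
lagx <- as.vector(W %*% x)    # spatial lag: mean of the neighbours' values

toy <- data.frame(x = x, lagx = lagx)
toy$sector <- with(toy,
  ifelse(x >= mean(x) & lagx >= mean(lagx), "High-High",
  ifelse(x <  mean(x) & lagx <  mean(lagx), "Low-Low",
  ifelse(x <  mean(x) & lagx >= mean(lagx), "Low-High", "High-Low"))))

p <- ggplot(toy, aes(x = x, y = lagx)) +
  geom_point(aes(fill = sector), shape = 21, size = 4) +
  scale_fill_manual(values = c("High-High" = "red", "Low-Low" = "blue",
                               "Low-High" = "lightblue", "High-Low" = "pink")) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  geom_vline(xintercept = mean(x), linetype = "longdash") +
  geom_hline(yintercept = mean(lagx), linetype = "longdash")
print(p)
```

With row-standardised weights the slope of the fitted line equals the global Moran's I, which is why the regression line belongs on the plot.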

Check out the interactive shiny version on pracademic

You can find me at PAA 2015 Poster Session 2 and 3!

You can find me at PAA 2015:

The Relationship between Space and Time: a Spatial Approach, Poster Session 2, slot 15, 10:30 AM – 12:30 PM

A Spatial Econometrics Analysis of Three Decades of Fertility Change in Spain, Poster Session 3, slot 27 Thursday, April 30 1:00 PM – 3:00 PM

Indigo Ballrooms A-H Level 2

A ggmap of 2015 Israeli elections by city

The recent Israeli elections for the 20th Knesset are a reminder of the crucial role that demography and space play in the outcome. For more insight, read the full Demotrends blog post by Ashira Menashe-Oren on the demographics of the Israeli electorate here. The map was made using ggmap and ggplot, two simple mapping tools I really like. If you are interested in the code, you can find the syntax and data below.

To start, load the libraries:

library(maptools)

library(rgdal) #provides readOGR, which reads the shape file

library(ggmap)

library(ggplot2)

Download the shape file (I normally use Diva-GIS website) and read it:

map.ogr<- readOGR(".","ISR_adm1")

Data set:

df <- structure(list(lon = c(35.148529, 35.303546, 34.753934, 34.781768,34.989571, 34.824785, 34.808871, 34.883879, 34.844675, 34.90761, 35.010397, 34.871326, 35.21371, 34.655314, 34.887762, 34.792501, 34.574252, 34.791462, 34.748019, 34.787384, 34.853196, 34.811272, 34.919652, 34.888075, 35.098051, 35.119773, 34.872938, 34.835226, 34.988099, 35.002462), lat = c(32.517127, 32.699635, 31.394548, 32.0853, 32.794046, 32.068424, 32.072176, 32.149961, 32.162413, 32.178195, 31.890267, 32.184781, 31.768319, 31.804381, 32.084041, 31.973001, 31.668789, 31.252973, 32.013186, 32.015833, 32.321458, 31.892773, 32.434046, 31.951014, 33.008536, 32.809144, 31.931566,32.084932, 31.747041, 31.90912), City = structure(c(30L, 19L,24L, 29L, 9L, 25L, 7L, 11L, 10L, 14L, 16L, 23L, 13L, 1L, 21L,28L, 2L, 4L, 3L, 12L, 20L, 27L, 8L, 15L, 18L, 22L, 26L, 6L, 5L, 17L), .Label = c("Ashdod", "Ashkelon", "Bat yam", "Beersheva",  "Beit  Shemesh", "Bnei brak", "Giv'atayim", "Hadera", "Haifa",  "Herzliyya", "Hod HaSharon", "Holon", "Jerusalem", "Kefar Sava",  "Lod", "Modi'in - Makkabbim - Re'ut", "Modi'in Illit", "Nahariyya", "Nazareth ", "Netanya", "Petach Tikva", "Qiryat Atta", "Ra'annana",  "Rahat", "Ramat gan", "Ramla", "Rehovot", "Rishon", "Tel-Aviv",  "Umm Al-Fahm"), class = "factor"), most.votes = c(96.28, 91.41,  87.62, 34.03, 24.98, 30.93, 40.1, 38.77, 34.2, 34.66, 28.95,  32.75, 23.9, 30.96, 27.87, 29.78, 39.31, 37.17, 32.88, 30.86,  33.14, 26.95, 31.77, 32.22, 34.25, 35.01, 39.1, 57.56, 27.89,  71.63), party = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,  2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L), .Label = c("joint list", "labour", "likud", "yahadut hatora"), class = "factor")), .Names = c("lon", "lat",  "City", "most.votes", "party"), class = "data.frame", row.names = c(NA,  -30L))

Get the map using "get_map":

gmap <- get_map(location=c(34.2,29.4,36,33.5),zoom=7,source="stamen",maptype="watercolor")

and plot the map:

ggmap(gmap)+

geom_polygon(aes(x = long, y = lat, group=id), data = map.ogr, color ="blue", fill ="white", alpha = .8, size = .4)+

geom_point(aes(x=lon,y=lat,color=party,size=most.votes),data=df)+ scale_colour_discrete("Coalition", labels = c("Joint List", "Labour","Likud","United Torah Judaism"), breaks = c("joint list", "labour","likud","yahadut hatora")) + scale_size_continuous("Coalition", labels = c("Joint List", "Labour","Likud","United Torah Judaism"), breaks = c("joint list", "labour","likud","yahadut hatora"), range=c(10,15), guide = FALSE)+ theme(axis.text=element_text(size=18), plot.title=element_text(size=rel(3)), legend.key = element_rect(fill = "white"), legend.background =element_rect("white"), legend.text = element_text(size = 25), legend.title = element_text(size = 25))+ guides(colour = guide_legend(override.aes = list(size=8)))+ labs(x="",y="")

If you want to add city names, you can use the “annotate” option, adding the code below after guides(...)+. I have shifted the coordinates to keep the labels from overlapping and colored the names to match the color of the winning party.

annotate("text",x=c(35.14853+ 0.2,35.21371+0.15,35.00246+ 0.15,34.79146+0.15, 34.98957-0.08,34.78177-0.14), y=c(32.51713,31.76832,31.90912,31.25297, 32.79405,32.08530),size=5,font=3, label=c("Umm Al-Fahm","Jerusalem","Modin  Illit","Beersheva","Haifa","Tel Aviv"), color=c("darkred","blue4","deeppink4", "blue4","springgreen4","green4"))+

For beginners I highly recommend the ggplot2 mailing list, a great and shame-free place to learn.

A Spatial Analysis of Recent Fertility Patterns in Spain – EPC poster

Here is the third poster session winner for EPC Budapest 2014 presenting the main results for our research on fertility differentials in Spain!
Here is the link to the high-resolution PDF of the EPC poster, in case you’re interested.

EPC Poster_final Alessandra Carioli
All the graphics were made in R, using the maptools library for maps and the ggplot library for graphs.

The working paper will soon follow.

Location, location, location! Why space matters in demography and why we should care.

Read my first contribution to the Demotrends blog, and don’t forget to like Demotrends on Facebook or Twitter 🙂
Of course all graphics were made in R (the maptools library and a bunch of others).