Valar Morghulis: Some charts using GOT (tv-show) deaths

Drawing from one of the most important demographic laws, Valar Morghulis (all men must die), here is a simple summary of the deadly happenings in four seasons of GOT as reported by the Washington Post.

Let’s start with the total number of (portrayed) deaths by season:

library(ggplot2)
df1 # data frame with columns Series (season number) and Total (deaths)
ggplot(df1,aes(x=factor(Series),y=Total))+
geom_bar(stat="identity",fill=c("yellow","orange","red","brown"))+
xlab("Season number")+
ylab("Total number of deaths")

Number of deaths by season

ggplot(df1,aes(x=Series,y=Total))+
geom_line(lwd=2)+
xlab("Season number")+
ylab("Total number of deaths")

Number of deaths by season

by location in Westeros:

df2 # data frame with columns Location and Deaths
Location=c("King's Landing","Beyond the Wall","Castle Black","The Twins","The Riverlands")
ggplot(df2,aes(x=factor(Location),y=Deaths))+
geom_bar(stat="identity",fill=c("lightblue","black","brown","darkseagreen","red"))+
ylab("Total number of deaths")+
xlab("")+
theme(axis.text=element_text(size=15))

Number of deaths by location

by method of death:
df3 # data frame with a Method column
Method=c("Animal","Animal Death","Arrows","Axe","Blade","Bludgeon","Crushing","Falling","Fire","Hands","HH item","Mace","Magic","Other","Poison","Spear","Unknown")
df3.1
df3.2 # long format: columns Method, variable (season) and value (deaths)
ggplot(df3.2,aes(x=factor(Method),y=value,fill=variable))+
geom_bar(stat="identity")+
ylab("")+
xlab("")+
theme(axis.text.x=element_text(size=15,angle=45))+
scale_fill_discrete(name ="Season", labels=c("Season 1", "Season 2", "Season 3", "Season 4")) # the fill encodes the season, so the legend is titled accordingly

Number of deaths by method
and lastly by House allegiance:
df4 # data frame with a House column
df4.1
df4.2 # long format: columns House, variable (season) and value (deaths)
ggplot(df4.2,aes(x=reorder(factor(House),value),y=value,fill=variable))+
geom_bar(stat="identity")+
ylab("")+
xlab("")+
theme(axis.text.x=element_text(size=15,color="black"),
axis.text.y=element_text(size=15,color="black"))+
scale_fill_discrete(name ="Season", labels=c("Season 1", "Season 2", "Season 3", "Season 4"))+ # the fill encodes the season
coord_flip()

Number of deaths by house


Pyramid-like bar chart for climate change barriers

I was scrolling through the Independent and got hooked on a graph displaying, country by country, the percentage of people holding certain views on climate change, and was extremely surprised by the results. The UK and the US lag far behind countries including China in wanting their governments to pursue a meaningful commitment to successfully address climate change.

newplot

library(ggplot2)
library(grid)
library(plyr)

dta<-
structure(list(country = structure(c(15L, 3L, 4L, 5L, 14L, 6L,
10L, 12L, 1L, 2L, 7L, 8L, 9L, 11L, 13L, 15L, 3L, 4L, 5L, 14L,
6L, 10L, 12L, 1L, 2L, 7L, 8L, 9L, 11L, 13L), .Label = c("Australia",
"China", "Denmark", "Finland", "France", "Germany", "Hong Kong",
"Indonesia", "Malaysia", "Norway", "Singapore", "Sweden", "Thailand",
"UK", "US"), class = "factor"), issue = c("Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage who think climate \nchange is 'not a serious problem' ",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change",
"Percentage that want their country's strategy not to agree \nto any international agreement that addresses climate change"
), perc = c(32L, 14L, 23L, 10L, 26L, 11L, 22L, 18L, 11L, 4L,
5L, 3L, 2L, 5L, 6L, -17L, -4L, -8L, -3L, -7L, -4L, -10L, -8L,
-3L, -1L, -1L, -1L, -1L, -1L, -1L)), .Names = c("country", "issue",
"perc"), row.names = c(NA, -30L), class = "data.frame")

issue.a <- dta$issue[1]  # "not a serious problem": positive bars
issue.b <- dta$issue[16] # "no international agreement": negative bars

# data = subset(...) replaces the old subset = .() layer argument, which recent ggplot2 versions no longer accept
p <- ggplot(dta, aes(reorder(country,perc),perc,fill=issue)) +
geom_bar(data = subset(dta, issue == issue.a), stat = "identity",colour="black",alpha=0.5) +
annotate("text",x = 16.5, y = -12,label=issue.b, fontface="bold")+
geom_bar(data = subset(dta, issue == issue.b),colour="black", stat = "identity",alpha=0.5) +
annotate("text",x = 16.5, y = 15,label=issue.a, fontface="bold")+
scale_fill_manual(values = c("#F7320B", "#2BC931"))+
# perc.a and perc.b are not columns of dta, so the labels are built from perc here (an assumption)
geom_text(data = subset(dta, issue == issue.a),
aes(label=paste0(perc,"%")), hjust=-.35)+
geom_text(data = subset(dta, issue == issue.b),
aes(label=paste0(abs(perc),"%")), colour="black", hjust=2)+
coord_flip() +
xlab("")+
ylab("")+
scale_x_discrete(expand=c(0.2,0.55))+
scale_y_continuous(limits=c(-22,32),
breaks = c(-17,-10,0,10,32),
labels = paste0(c(17,10,0,10,32), "%"))+
theme(axis.text.y  = element_text(size=13,hjust=1),
axis.text = element_text(colour = "black"),
plot.background = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
axis.ticks = element_blank(),
axis.text.x = element_blank(),
legend.background =element_rect("white"),
legend.position="none",
strip.background = element_rect(fill = "white", colour = "white"),
strip.text.x = element_text(size = 13))

ggsave("newplot.pdf",p,scale=2)

Mean Age at Childbearing in Spain 2011

TFR 2011 fixed

Coloring maps in R for social sciences

“It’s all about matching perceptual dimensions with data dimensions” Cindy Brewer

Plotting a map raises two issues, colors and scale: both have to work together to best describe your data, or else failure is on the line.

To get a better sense of how colors look on a map, I find the ColorBrewer website by Cindy Brewer very useful (read a recent interview here): it gives advice on map colors, hues and scales for various backgrounds and contexts.

R makes it easy to choose a palette thanks to the RColorBrewer library, so you don’t have to create one yourself (you can see the available combinations by calling the display.brewer.all() function):

Rplot

  • Sequential: continuous variables with data ranging from relatively low to relatively high or interesting values;
  • Qualitative: categorical variables with no specific ordering;
  • Diverging: continuous variables where both very low and very high values are interesting, or a scale running from negative to positive values. Usually cold colors denote low or negative (-) values while warm colors (red, orange) denote high or positive (+) values. Also, the mid-point should mean something and add information to your message (e.g. the national average).
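
As a quick illustration of the three palette types, here is a minimal sketch using RColorBrewer. The palette names "Blues", "Set2" and "RdBu" are just picks among those shown by display.brewer.all(), not a recommendation.

```r
library(RColorBrewer)

n <- 5
seq.pal  <- brewer.pal(n, "Blues") # sequential: light to dark, single hue
qual.pal <- brewer.pal(n, "Set2")  # qualitative: unordered categories
div.pal  <- brewer.pal(n, "RdBu")  # diverging: two hues around a light mid-point

# brewer.pal.info lists every palette with its type ("seq", "qual" or "div")
brewer.pal.info[c("Blues", "Set2", "RdBu"), ]

# with an odd n, the middle colour of a diverging palette is the neutral one
div.pal[(n + 1) / 2]
```

Each call returns a character vector of n hex colours that can be passed straight to col= or fill=.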

The choice of colors is trickier, as it depends on the subject being mapped, the message to convey and the context:

  • Rainbow palette: think again;
  • Red means “look at me”, so use it to highlight something meaningful;

    percForeign2008
    1. Foreign presence in Spain, year 2008;
  • Similarly use bright or dark colors to highlight important information, contrasting it to softer/pale tones;
  • Prefer a single hue palette if possible;
  • Be aware that specific colors may have specific cultural meanings (!), a few examples:
  1. Red: in South Africa it’s the color of mourning;
  2. Orange: color of Protestants in Ireland;
  3. Yellow: color of mourning in Egypt and a positive color in Asia;
  4. Purple: color of mourning in Thailand.

For further information see this comprehensive graph of colors in culture. A good rule is also not to pick colors in order to convey a good/bad message (green vs red or blue vs red), unless you are mapping the number of drowned kittens.

Once the color issue is solved, which I’d like to stress should be weighed together with the choice of a specific palette, we can choose the number of class intervals, that is to say how many colors we are going to use. If I use a diverging palette for continuous variables, I prefer an odd number of colors, either 5 or 7 (but this fits my specific mapping requirements), so that the mid value has neutral light tones. Also, ideally the breaks should mean something and not be chosen arbitrarily. R once again has a solution for this, the classInt library, which provides a set of styles to choose from for continuous numerical variables (sd, equal, pretty, quantile, kmeans, hclust, bclust, fisher, or jenks), as well as the option to set the breaks manually (fixed).

  • equal: intervals of equal width, ideally for data with a normal distribution;
  • quantile: the same number of observations in each class, good for data with a skewed distribution;
  • jenks/fisher: my personal favorites, they try to reduce the variance within classes and maximize the variance between classes.
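
To see how much the style matters, here is a small sketch comparing three of them on the same variable. The data are simulated (a right-skewed sample from rlnorm) purely for illustration, and the seed is arbitrary.

```r
library(classInt)

set.seed(1)
x <- rlnorm(200) # simulated right-skewed variable, for illustration only
n <- 5           # number of classes

brks.equal    <- classIntervals(x, n, style = "equal")$brks
brks.quantile <- classIntervals(x, n, style = "quantile")$brks
brks.fisher   <- classIntervals(x, n, style = "fisher")$brks

# number of observations falling in each class under each style
counts.equal    <- table(cut(x, brks.equal, include.lowest = TRUE))
counts.quantile <- table(cut(x, brks.quantile, include.lowest = TRUE))
counts.fisher   <- table(cut(x, brks.fisher, include.lowest = TRUE))

counts.equal
counts.quantile
counts.fisher
```

With skewed data, equal-width classes tend to pile most areas into the first class or two, while quantile classes hold the same number of areas each; which is preferable depends on the message of the map.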

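library(classInt)      # classIntervals(), findColours()
library(RColorBrewer)  # brewer.pal()

n <- 5 # how many colors?
variable # my variable of choice
category <- classIntervals(variable, n,style = "jenks",na.ignore=T)
palette <- brewer.pal(n,"RdBu")
color <- findColours(category,(palette)) # one color per area
bins <- category$brks # class breaks, n + 1 values
lb <- length(bins)
plot(spain, col=color,border=T) # spain: the polygons being mapped
# legend labels read "lower - upper" for each class
legend("topright",fill=palette,legend=(paste(round(bins[-lb],1),"-",round(bins[-1],1))),cex=2, bg="white")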

blogmap1
2. Mean Age at Childbearing for year 1981 in Spain

Of course we can edit pretty much everything to tailor the map to our needs and preferences. For instance, the above map portrays 910 areas, so I prefer to suppress borders to avoid overcrowding by setting plot(spain, col=color,border=F), and to use the layout function to separate the plot from the legends, to get something like this:

percForeign2008
3. TFR difference between Spaniards and Foreigners

layout(matrix(c(1,2,3),1,3,byrow=T), widths=c(1,1,0.35), heights=1)
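
A minimal sketch of how that layout call divides the device: two wide panels for maps and a narrow one for the legend. The panels below are empty placeholders and the three colors are arbitrary; in practice each placeholder would be a plot(spain, col=color, border=F) call.

```r
pdf(NULL) # null device so the sketch runs anywhere; drop this line to see the result

# one row, three panels: two wide ones for the maps, a narrow one for the legend
nf <- layout(matrix(c(1, 2, 3), 1, 3, byrow = TRUE), widths = c(1, 1, 0.35), heights = 1)

cols <- c("red", "white", "blue") # placeholder colors

for (panel in c("Panel 1", "Panel 2")) {
  plot.new() # placeholder for a map
  title(main = panel)
  box()
}

par(mar = c(0, 0, 0, 0)) # no margins around the legend panel
plot.new()
legend("center", fill = cols, legend = c("low", "mid", "high"), bty = "n")

invisible(dev.off())
```

layout() returns the number of panels it defined, and each high-level plotting call then fills the next panel in order.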

A space-time box plot of Spain’s TFR for 910 comarcas.

The idea behind spatial analysis is that space matters and near things are more similar than distant ones, so a variable measured in city A is (ideally) different from the same variable measured in a distant city B. A simple way to get a feeling for this hypothesis and to represent it is through graphical visualization, usually one or more maps.

TFRG_all_4years_Spain

However, when dealing with time series, maps are cumbersome and some information is sometimes lost, such as the national average or the convergence path. Box plots are a simple yet very effective way to synthesize a lot of information in one graph. The following plot depicts TFR over a 30-year period for 910 Spanish areas with respect to the central value for the country (the thick black line in the middle of each box, which geom_boxplot draws at the median of the areas).

library(ggplot2)
p <- ggplot(dat, aes(x=factor(YEAR), y=TFR)) # columns are named directly inside aes(), not via dat$
p <- p + geom_boxplot()
p <- p + scale_y_continuous(limits=c(0,2.5)) + scale_x_discrete("YEAR", breaks=seq(1981,2011,by=5))
p

TFRG

Moran plots in ggplot2

Moran plots are one of the many ways to depict spatial autocorrelation:
moran.test(varofint,listw)
where “varofint” is the variable we are studying, “listw” is a spatial weights list describing the neighbourhood relationships, and the function “moran.test”, included in the spdep package, performs Moran’s test (duh!) for spatial autocorrelation. spdep’s moran.plot draws the corresponding scatterplot, and the same plot can be done using the ggplot2 library. Provided that we already have our weights list of neighborhood relationships listw, we first define the variable and the lagged variable under study, compute their means and save them into a data frame (there are a lot of datasets you can find implemented in R: afcon, columbus, syracuse, just to cite a few). The purpose is to obtain something that looks like this (I have used my own *large* set of Spanish data to obtain it):

ggplot2.moranplot1

Load your data. Here is Anselin’s (1995) data on African conflicts, afcon:

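data(afcon)
varofint # variable of interest
listw # spatial weights list for the same areas
varlag <- lag.listw(listw, varofint) # spatially lagged variable
var.name <- "Total Conflicts"
m.varofint <- mean(varofint)
m.varlag <- mean(varlag)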
and compute the local Moran's statistic using localmoran:

lisa <- localmoran(varofint, listw)
and save everything into a dataframe:
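df <- data.frame(varofint, varlag, m.varofint, m.varlag, lisa) # columns assumed from the df$ references below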

use these variables to derive the four sectors "High-High"(red), "Low-Low"(blue), "Low-High"(lightblue), "High-Low"(pink):
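df$sector # quadrant code: 1 = High-High, 2 = Low-Low, 3 = Low-High, 4 = High-Low
significance
vec
df$sector[df$varofint>=df$m.varofint & df$varlag>=df$m.varlag] <- 1
df$sector[df$varofint<df$m.varofint & df$varlag<df$m.varlag] <- 2
df$sector[df$varofint<df$m.varofint & df$varlag>=df$m.varlag] <- 3
df$sector[df$varofint>=df$m.varofint & df$varlag<df$m.varlag] <- 4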

df$sec.data

df$sector.col[df$sec.data==1] <- "red"
df$sector.col[df$sec.data==2] <- "blue"
df$sector.col[df$sec.data==3] <- "lightblue"
df$sector.col[df$sec.data==4] <- "pink"
df$sector.col[df$sec.data==0] <- "white"

df$sizevar df$sizevar 0.1)
df$FILL df$BORDER
to get the ggplot graph:
p 0.05", "High-High", "Low-Low","Low-High","High-Low"))+
scale_x_continuous(name=var.name)+
scale_y_continuous(name=paste("Lagged",var.name))+
theme(axis.line=element_line(color="black"),
axis.title.x=element_text(size=20,face="bold",vjust=0.1),
axis.title.y=element_text(size=20,face="bold",vjust=0.1),
axis.text= element_text(colour="black", size=20, angle=0,face = "plain"),
plot.margin=unit(c(0,1.5,0.5,2),"lines"),
panel.background=element_rect(fill="white",colour="black"),
panel.grid=element_line(colour="grey"),
axis.text.x  = element_text(hjust=.5, vjust=.5),
axis.text.y  = element_text(hjust=1, vjust=1),
strip.text.x  = element_text(size = 20, colour ="black", angle = 0),
plot.title= element_text(size=20))+
stat_smooth(method="lm",se=F,colour="black", size=1)+
geom_vline(xintercept=m.varofint,colour="black",linetype="longdash")+
geom_hline(yintercept=m.varlag,colour="black",linetype="longdash")+
theme(legend.background =element_rect("white"))+
theme(legend.key=element_rect("white",colour="white"),
legend.text =element_text(size=20))
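
The quadrant logic itself can be sketched in base R without spdep. The six values and the chain of neighbours below are made up for illustration, and the weights matrix is built by hand rather than with spdep's own functions.

```r
# toy data: six areas on a line, each one neighbouring the area before and after it
x <- c(1, 2, 9, 3, 8, 10)
n <- length(x)

W <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}
W <- W / rowSums(W) # row-standardise: each row sums to 1

x.lag <- as.vector(W %*% x) # spatial lag = average of the neighbours' values
m.x <- mean(x)
m.lag <- mean(x.lag)

# same four sectors as above, split at the two means
sector <- ifelse(x >= m.x & x.lag >= m.lag, "High-High",
          ifelse(x <  m.x & x.lag <  m.lag, "Low-Low",
          ifelse(x <  m.x & x.lag >= m.lag, "Low-High", "High-Low")))
data.frame(x, x.lag, sector)

# with row-standardised weights, global Moran's I is the slope of the lag on the variable
z <- x - m.x
moran.I <- sum(z * (W %*% z)) / sum(z^2)
slope <- unname(coef(lm(x.lag ~ x))[2])
all.equal(moran.I, slope)
```

This is why the regression line drawn by stat_smooth(method="lm") in the plot can be read as the global Moran's I, as long as the weights are row-standardised.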

Check out the interactive shiny version on pracademic

You can find me at PAA 2015 Poster Session 2 and 3!

You can find me at PAA 2015:

The Relationship between Space and Time: a Spatial Approach, Poster Session 2, slot 15, 10:30 AM – 12:30 PM

A Spatial Econometrics Analysis of Three Decades of Fertility Change in Spain, Poster Session 3, slot 27, Thursday, April 30, 1:00 PM – 3:00 PM

Indigo Ballrooms A-H Level 2