Chapter 3 Plotting Sociomatrices
This chapter describes in detail how we make our sociomatrices using the ggplot2
package.
We assume that you already have data in a sociomatrix. We will start with one such matrix - the mouse
matrix that is available in the compete
package.
# devtools::install_github('jalapic/compete')
library(compete)
mouse
## A B C D E F G H I J K L
## A 0 0 0 1 0 0 2 0 0 0 0 0
## B 1 0 1 1 0 0 0 0 1 0 0 0
## C 0 2 0 1 0 1 1 0 1 0 0 1
## D 0 0 1 0 0 1 1 0 0 0 0 1
## E 0 1 0 0 0 0 0 0 0 0 0 0
## F 0 0 2 2 0 0 2 0 0 0 3 1
## G 1 2 1 1 0 0 0 0 0 1 0 2
## H 4 7 4 4 5 6 2 0 5 7 14 5
## I 0 2 0 1 1 0 1 0 0 0 3 0
## J 5 8 6 2 7 15 3 4 1 0 25 6
## K 1 5 6 1 2 9 5 2 4 1 0 2
## L 0 1 0 0 0 2 0 2 0 0 1 0
These data are actually stored in a data.frame of 12 rows and 12 columns:
str(mouse)
## 'data.frame': 12 obs. of 12 variables:
## $ A: int 0 1 0 0 0 0 1 4 0 5 ...
## $ B: int 0 0 2 0 1 0 2 7 2 8 ...
## $ C: int 0 1 0 1 0 2 1 4 0 6 ...
## $ D: int 1 1 1 0 0 2 1 4 1 2 ...
## $ E: int 0 0 0 0 0 0 0 5 1 7 ...
## $ F: int 0 0 1 1 0 0 0 6 0 15 ...
## $ G: int 2 0 1 1 0 2 0 2 1 3 ...
## $ H: int 0 0 0 0 0 0 0 0 0 4 ...
## $ I: int 0 1 1 0 0 0 0 5 0 1 ...
## $ J: int 0 0 0 0 0 0 1 7 0 0 ...
## $ K: int 0 0 0 0 0 3 0 14 3 25 ...
## $ L: int 0 0 1 1 0 1 2 5 0 6 ...
3.1 Raw Sociomatrices
The first step we need to do is to convert the sociomatrix dataframe into long format data so we can give x
and y
coordinates. We can do this using the melt
function from the reshape2
package. We ensure that this output is a data.frame using data.frame()
.
In case our diagonal had NA
entries, we also only keep complete cases.
#make the df we will use for plotting
<- reshape2::melt(mouse)
m.dat <- data.frame(m.dat)
m.dat <- m.dat[complete.cases(m.dat),] #removing NAs
m.dat head(m.dat)
## variable value
## 1 A 0
## 2 A 1
## 3 A 0
## 4 A 0
## 5 A 0
## 6 A 0
tail(m.dat)
## variable value
## 139 L 2
## 140 L 5
## 141 L 0
## 142 L 6
## 143 L 2
## 144 L 0
This just gives 2 columns of output. A variable
column which refers to the original column that the value
entry came from. We need to add a new column that gives the row location. These are in the order A->L.
We could just add that column like this:
$row.id <- LETTERS[1:12]
m.dathead(m.dat)
## variable value row.id
## 1 A 0 A
## 2 A 1 B
## 3 A 0 C
## 4 A 0 D
## 5 A 0 E
## 6 A 0 F
tail(m.dat)
## variable value row.id
## 139 L 2 G
## 140 L 5 H
## 141 L 0 I
## 142 L 6 J
## 143 L 2 K
## 144 L 0 L
At this point, there are two main things that we should do before plotting. i) Decide whether to remove zeros - as sociomatrices generally look better without them in. ii) Reorder the ids by their dominance.
Let’s first make a very basic plot without doing either of these things.
We use geom_tile()
from ggplot2
to make our plot, and use geom_text()
to add in the value. The x
axis should be the loser (column from original sociomatrix) and the y
axis should be the winner (row from the original sociomatrix).
The parameters inside geom_tile()
define the color, fill and size of each tile.
The parameter label=
inside geom_text()
define which column supplies the text. We also dictate the color and size of the text.
library(tidyverse)
<- ggplot(m.dat, aes(x=variable, y=row.id)) +
p geom_tile(color="black", size=0.5, stat="identity", fill='white') +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic()
p
There are a number of other things that are wrong with this plot. First, it is more readable to have the labels on the x-axis at the top of the plot. We can do this as follows, while also labeling the axes with new titles:
<- p +
p scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner")
p
It still isn’t quite right. We can adjust a number of other theme
elements to make the plot look better:
<- p +
p theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
)
p
This looks better, but it still has three issues. The ids are in alphabetical order as opposed to being in dominance order. The matrix is full of zeros which make it harder to read. Finally, all tiles are filled with white, when it would be more readable to fill with a color that depicts the value of the cell/tile.
Let’s first remove the zeros. This should be done when we create the data.frame after melting. We substitute zeros with NA
s.
== 0] <- NA
m.dat[m.dat head(m.dat)
## variable value row.id
## 1 A NA A
## 2 A 1 B
## 3 A NA C
## 4 A NA D
## 5 A NA E
## 6 A NA F
tail(m.dat)
## variable value row.id
## 139 L 2 G
## 140 L 5 H
## 141 L NA I
## 142 L 6 J
## 143 L 2 K
## 144 L NA L
We can now replot using the same plotting code as above:
<- ggplot(m.dat, aes(x=variable, y=row.id)) +
p geom_tile(color="black", size=0.5, stat="identity", fill='white') +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
)
p
Reordering the order of individuals on the rows and columns requires a bit more thought. The best way to reorder would be based on some measure of dominance. For this example, we are going to choose to reorder the ids based on David’s Scores. We could pick other ranking methods though such as I&SI.
To reorder by David’s Scores, we first must calculate the David’s Scores of each individual. We will use the ds
function from the compete
package:
<- ds(mouse)
v v
## A B C D E F G
## -7.50702839 -20.20524267 -8.08154762 -16.11250000 -7.58202839 -4.25514347 -9.33750000
## H I J K L
## 36.78053221 -0.06417125 34.27341270 13.91740735 -11.82619048
We can reorder this vector and grab the names in descending order:
<- rev(names(v)[order(v)])
v1
v1
## [1] "H" "J" "K" "I" "F" "A" "E" "C" "G" "L" "D" "B"
Now we have the order of individuals by their dominance. We can factorize our two columns containing the individual ids and set their levels. We actually have to make the levels of the winners the reverse order.
$variable <- factor(m.dat$variable, levels=v1)
m.dat$row.id <- factor(m.dat$row.id, levels = rev(v1))
m.dat
str(m.dat)
## 'data.frame': 144 obs. of 3 variables:
## $ variable: Factor w/ 12 levels "H","J","K","I",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ value : int NA 1 NA NA NA NA 1 4 NA 5 ...
## $ row.id : Factor w/ 12 levels "B","D","L","G",..: 7 1 5 2 6 8 4 12 9 11 ...
We can now replot the sociomatrix with the order of the ids now set:
<- ggplot(m.dat, aes(x=variable, y=row.id)) +
p geom_tile(color="black", size=0.5, stat="identity", fill='white') +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
)
p
David’s Scores are a quick way of determining rank, but let’s instead go with the more formal I&SI rankings. We can calculate these using the isi98
function in the compete
package:
<- isi98(mouse) isi
##
## INITIAL RANK:
## [1] "H" "J" "K" "I" "F" "A" "E" "C" "G" "L" "D" "B"
## I = 5
## SI = 17
isi
## $best_matrix
## H J K F C I E A G L B D
## H 0 7 14 6 4 5 5 4 2 5 7 4
## J 4 0 25 15 6 1 7 5 3 6 8 2
## K 2 1 0 9 6 4 2 1 5 2 5 1
## F 0 0 3 0 2 0 0 0 2 1 0 2
## C 0 0 0 1 0 1 0 0 1 1 2 1
## I 0 0 3 0 0 0 1 0 1 0 2 1
## E 0 0 0 0 0 0 0 0 0 0 1 0
## A 0 0 0 0 0 0 0 0 2 0 0 1
## G 0 1 0 0 1 0 0 1 0 2 2 1
## L 2 0 1 2 0 0 0 0 0 0 1 0
## B 0 0 0 0 1 1 0 1 0 0 0 1
## D 0 0 0 1 1 0 0 0 1 1 0 0
##
## $best_order
## [1] "H" "J" "K" "F" "C" "I" "E" "A" "G" "L" "B" "D"
##
## $I
## [1] 3
##
## $SI
## [1] 11
##
## $rs
## [1] 0.9300699
We can graph the dominance rankings directly from the $best_order
output.
It’s noticeable that this order differs quite a bit from the David’s Scores rankings:
$best_order isi
## [1] "H" "J" "K" "F" "C" "I" "E" "A" "G" "L" "B" "D"
v1
## [1] "H" "J" "K" "I" "F" "A" "E" "C" "G" "L" "D" "B"
We will then refactor our two columns based on these rankings.
$variable <- factor(m.dat$variable, levels=isi$best_order)
m.dat$row.id <- factor(m.dat$row.id, levels = rev(isi$best_order))
m.dat
str(m.dat)
## 'data.frame': 144 obs. of 3 variables:
## $ variable: Factor w/ 12 levels "H","J","K","F",..: 8 8 8 8 8 8 8 8 8 8 ...
## $ value : int NA 1 NA NA NA NA 1 4 NA 5 ...
## $ row.id : Factor w/ 12 levels "D","B","L","G",..: 5 2 8 1 6 9 4 12 7 11 ...
And we can replot the matrix based on this order
<- ggplot(m.dat, aes(x=variable, y=row.id)) +
p geom_tile(color="black", size=0.5, stat="identity", fill='white') +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
)
p
This now is starting to look like a proper sociomatrix. There are however quite a few numbers under the diagonal. It’s not immediately obvious which numbers are inconsistencies - i.e. when the individual in the row is a winner over the individual in the column. For instance, L beats F 2 times, whereas F beats L 1 time only. Therefore this is an inconsistent relationship. However, L beats H 2 times, but H beats L 5 times, so this is not inconsistent.
One way we could fill these tiles would just be based on the value of the cell. This can be done using scale_fill_gradient()
. To this we need to supply a fill color for the lowest value (which is actually 1 not 0 as we removed them) and a fill color for the highest value. We also need to add fill=value
inside the ggplot()
function - we also remove fill="white"
from the geom_tile()
function.
<- ggplot(m.dat, aes(x=variable, y=row.id, fill=value)) +
p geom_tile(color="black", size=0.5, stat="identity") +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
+
) scale_fill_gradient(
low = "white",
high = "red1",
space = "Lab",
na.value = "white",
guide = "colourbar")
p
This looks ok, but the shading is heavily skewed by the highest value. One alternatively strategy would be to log the value column, and supply that information as the new fill value. To do this, we would add a new column to the data.frame.
$value_log <- log(m.dat$value)
m.dat
<- ggplot(m.dat, aes(x=variable, y=row.id, fill=value_log)) +
p geom_tile(color="black", size=0.5, stat="identity") +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
+
) scale_fill_gradient(
low = "white",
high = "red1",
space = "Lab",
na.value = "white",
guide = "colourbar")
p
This looks a bit better.
Another way to represent the relationships is by filling the tile with a shade of color that represents the directional consistency of each relationship.
We can do that by adding a new column to the data.frame which represents the directional consistency of each measure. We can calculate this directly from the original matrix.
<- mouse / (mouse + t(mouse))
dcs $DC <- reshape2::melt(dcs)[,2]
m.dat
head(m.dat)
## variable value row.id value_log DC
## 1 A NA A NA NaN
## 2 A 1 B 0 1
## 3 A NA C NA NaN
## 4 A NA D NA 0
## 5 A NA E NA NaN
## 6 A NA F NA NaN
tail(m.dat)
## variable value row.id value_log DC
## 139 L 2 G 0.6931472 1.0000000
## 140 L 5 H 1.6094379 0.7142857
## 141 L NA I NA NaN
## 142 L 6 J 1.7917595 1.0000000
## 143 L 2 K 0.6931472 0.6666667
## 144 L NA L NA NaN
Now we can plot based on this measure:
<- ggplot(m.dat, aes(x=variable, y=row.id, fill=DC)) +
p geom_tile(color="black", size=0.5, stat="identity") +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
+
) scale_fill_gradient(
low = "white",
high = "red1",
space = "Lab",
na.value = "white",
guide = "colourbar")
p
Except this looks confusing. It would be better only to color the tiles when the animal is dominant and exclude those who are not.
We can check for each pair of individuals which has the highest directional consistency by checking the matrix of directional consistencies against its transpose. We then assign a NA
to any cell that is not greater than its transpose:
<=t(dcs)]<-NA
dcs[dcs dcs
## A B C D E F G H I J K
## A NaN NA NaN 1.0000000 NaN NaN 0.6666667 NA NaN NA NA
## B 1 NaN NA 1.0000000 NA NaN NA NA NA NA NA
## C NaN 0.6666667 NaN NA NaN NA NA NA 1.0000000 NA NA
## D NA NA NA NaN NaN NA NA NA NA NA NA
## E NaN 1.0000000 NaN NaN NaN NaN NaN NA NA NA NA
## F NaN NaN 0.6666667 0.6666667 NaN NaN 1.0000000 NA NaN NA NA
## G NA 1.0000000 NA NA NaN NA NaN NA NA NA NA
## H 1 1.0000000 1.0000000 1.0000000 1 1.0000000 1.0000000 NaN 1.0000000 0.6363636 0.8750000
## I NaN 0.6666667 NA 1.0000000 1 NaN 1.0000000 NA NaN NA NA
## J 1 1.0000000 1.0000000 1.0000000 1 1.0000000 0.7500000 NA 1.0000000 NaN 0.9615385
## K 1 1.0000000 1.0000000 1.0000000 1 0.7500000 1.0000000 NA 0.5714286 NA NaN
## L NaN 1.0000000 NA NA NaN 0.6666667 NA NA NaN NA NA
## L
## A NaN
## B NA
## C 1.0000000
## D 1.0000000
## E NaN
## F NA
## G 1.0000000
## H 0.7142857
## I NaN
## J 1.0000000
## K 0.6666667
## L NaN
We now can melt
this matrix to provide the new fill values of directional consistency and use those for plotting:
$DC1 <- reshape2::melt(dcs)[,2] # only need 2nd column of melt output
m.dat
<- ggplot(m.dat, aes(x=variable, y=row.id, fill=DC1)) +
p geom_tile(color="black", size=0.5, stat="identity") +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
+
) scale_fill_gradient(
low = "white",
high = "red1",
space = "Lab",
na.value = "white",
guide = "colourbar")
p
But wait, there is actually however one mistake with this plot. Remeber that scale_fill_manual()
fills the colors of the plot from lowest to highest, making the lowest equal to “white”. As we plotted the above plot based on DC1
column which was only keeping the highest directional consistency value for each relationship, that means one of these “high” DC values is still the “lowest” in that column. We can find it:
min(m.dat$DC1,na.rm=T)
## [1] 0.5714286
which(m.dat$DC1 == min(m.dat$DC1,na.rm=T))
## [1] 107
107,] # I vs K m.dat[
## variable value row.id value_log DC DC1
## 107 I 4 K 1.386294 0.5714286 0.5714286
It turns out that K beat I 4 times, and I beat K 3 times - giving K a directional consistency of 0.57 over I. When looking at the above plot, you will notice that the tile in the row of K and column of I which has a 4 in it is white. Technically, this should be a very pale red color.
How do we fix this ?
Essentially we need to put some DC values back into this column. But we need to choose wisely. Should we pick a value of 0? This then might skew the range of colors, although DC does vary between 0 and 1. Alternatively, we could pick 0.5 as we are only interested in looking at the higher DC values.
In this example, I’ll pick 0.5 as the cutoff value. What we should do is replace and NA
values in the DC1
column with 0.5
$DC1 <- ifelse(is.na(m.dat$DC1), 0.5, m.dat$DC1)
m.dattail(m.dat)
## variable value row.id value_log DC DC1
## 139 L 2 G 0.6931472 1.0000000 1.0000000
## 140 L 5 H 1.6094379 0.7142857 0.7142857
## 141 L NA I NA NaN 0.5000000
## 142 L 6 J 1.7917595 1.0000000 1.0000000
## 143 L 2 K 0.6931472 0.6666667 0.6666667
## 144 L NA L NA NaN 0.5000000
Now, let’s replot:
<- ggplot(m.dat, aes(x=variable, y=row.id, fill=DC1)) +
p geom_tile(color="black", size=0.5, stat="identity") +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
+
) scale_fill_gradient(
low = "white",
high = "red1",
space = "Lab",
na.value = "white",
guide = "colourbar")
p
Notice that now the tile K-I with a 4 in it is now filled with color, but all the others that should be white are still white.
3.2 Dichotomized Sociomatrices
We can plot dichotomized sociomatrices in two ways. The first would be to use the function get_di_matrix()
from the compete
package to generate a dichotomized matrix. We would then go through similar steps as above.
This is the dichotomized matrix:
get_di_matrix(mouse)
## A B C D E F G H I J K L
## A 0 0 0 1 0 0 1 0 0 0 0 0
## B 1 0 0 1 0 0 0 0 0 0 0 0
## C 0 1 0 0 0 0 0 0 1 0 0 1
## D 0 0 0 0 0 0 0 0 0 0 0 1
## E 0 1 0 0 0 0 0 0 0 0 0 0
## F 0 0 1 1 0 0 1 0 0 0 0 0
## G 0 1 0 0 0 0 0 0 0 0 0 1
## H 1 1 1 1 1 1 1 0 1 1 1 1
## I 0 1 0 1 1 0 1 0 0 0 0 0
## J 1 1 1 1 1 1 1 0 1 0 1 1
## K 1 1 1 1 1 1 1 0 1 0 0 1
## L 0 1 0 0 0 1 0 0 0 0 0 0
Alternatively, if we have already gone through the above steps, we can simply piggy-back on the plot we just made based on the directional consistency. In this plot, the only tiles that receive a color are those with a directional consistency above 0.5, which is exactly what we want for a dichotomized matrix. All we need to do is replace the text values. We no longer want the raw values, but a 1 if their directional consistency is above 0.5.
All we need to do is create a new value column and populate with 1s when the DC > 0.5 using an ifelse()
statement:
$value1 <- ifelse(m.dat$DC1>.5, 1, NA) m.dat
We can then replot putting in this new text:
<- ggplot(m.dat, aes(x=variable, y=row.id, fill=DC1)) +
p geom_tile(color="black", size=0.5, stat="identity") +
geom_text(aes(label = value1), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
+
) scale_fill_gradient(
low = "white",
high = "red1",
space = "Lab",
na.value = "white",
guide = "colourbar")
p
3.3 Multiple Sociomatrices
If we want to plot multiple sociomatrices we need to consider how to color the tiles.
As an example here are some data that have three groups of four mice:
<- read_csv("https://gist.githubusercontent.com/jalapic/98d09d5c98e28276e81d3bc3de3e3832/raw/2175e26485d5b9790c7ef48bf14f94a96e16a4b9/threegroups.csv")
df
head(df)
## # A tibble: 6 x 4
## Group id1 id2 value
## <dbl> <chr> <chr> <dbl>
## 1 1 A A 0
## 2 1 A B 15
## 3 1 A C 11
## 4 1 A D 8
## 5 1 B A 3
## 6 1 B B 0
tail(df)
## # A tibble: 6 x 4
## Group id1 id2 value
## <dbl> <chr> <chr> <dbl>
## 1 3 C C 0
## 2 3 C D 7
## 3 3 D A 2
## 4 3 D B 4
## 5 3 D C 4
## 6 3 D D 0
We can look at the win-loss matrices from each of these three groups:
<- split(df, df$Group) %>%
mats map(~ reshape2::acast(., id1~id2, value.var="value"))
mats
## $`1`
## A B C D
## A 0 15 11 8
## B 3 0 4 2
## C 0 0 0 1
## D 0 0 0 0
##
## $`2`
## A B C D
## A 0 0 0 1
## B 2 0 1 1
## C 4 6 0 3
## D 4 5 2 0
##
## $`3`
## A B C D
## A 0 3 0 5
## B 33 0 24 7
## C 11 3 0 7
## D 2 4 4 0
One approach would be to create three separate matrices and plot them side by side. Let’s first create a plotting function that will color the tiles by the number of wins in each matrix. Then we’ll apply that function to each of the three groups separately.
Here is the function:
<- function(mat){
make_plot
# Although plotting by value, adding in code that would enable filling by DC
# melt matrix back to df
<- reshape2::melt(mat)
m.dat <- data.frame(m.dat)
m.dat <- m.dat[complete.cases(m.dat),] #removing NAs
m.dat
# remove 0s from value column
$value <- ifelse(m.dat$value==0, NA, m.dat$value)
m.dat
# factorize ids by rank order
<- isi98(mat)
isi $Var2 <- factor(m.dat$Var2, levels=isi$best_order)
m.dat$Var1 <- factor(m.dat$Var1, levels = rev(isi$best_order))
m.dat
# create directional consistency column
<- mat / (mat + t(mat))
dcs $DC <- reshape2::melt(dcs)[,3]
m.dat
# create directional consistency column only >0.5
<=t(dcs)]<-NA
dcs[dcs$DC1 <- reshape2::melt(dcs)[,3]
m.dat
# put in .5 as minimum DC column
$DC1 <- ifelse(is.na(m.dat$DC1), .5, m.dat$DC1)
m.dat
# put in 0 as minimum in value1 column for fill
$value1 <- ifelse(is.na(m.dat$value), .5, m.dat$value)
m.dat
# plot
<- ggplot(m.dat, aes(x=Var2, y=Var1, fill=value1)) +
p geom_tile(color="black", size=0.5, stat="identity") +
geom_text(aes(label = value), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
+
) scale_fill_gradient(
low = "white",
high = "red1",
space = "Lab",
na.value = "white",
guide = "colourbar")
return(p)
}
Now we will create individual plots and put them together using gridExtra
.
<- make_plot(mats[[1]])
p1 <- make_plot(mats[[2]])
p2 <- make_plot(mats[[3]]) p3
library(gridExtra)
grid.arrange(p1,p2,p3, nrow=1)
Hopefully you notice what the problem is with this approach. The highest value in each matrix(15, 6, 33) is the same shade of red which is misleading.
What we have to do instead is scale the fill color based on the whole range of values across all matrices.
The best way to do this is to have all of the data in one dataframe and create a scale based on the value column. We can do that as follows:
$scale_value <- (df$value-min(df$value)) / ((max(df$value)-min(df$value)))
dfhead(df)
## # A tibble: 6 x 5
## Group id1 id2 value scale_value
## <dbl> <chr> <chr> <dbl> <dbl>
## 1 1 A A 0 0
## 2 1 A B 15 0.455
## 3 1 A C 11 0.333
## 4 1 A D 8 0.242
## 5 1 B A 3 0.0909
## 6 1 B B 0 0
We next have to piece that scale_value back. The best way to do this is to start right back from the original data.frame including all groups. But before we do that, let’s create all the data.frames we need for plotting and store them in a list:
# Get matrices
<- split(df, df$Group) %>%
mats map(~ reshape2::acast(., id1~id2, value.var="value"))
# make dataframe for plotting
<- mats %>%
dx map(~ reshape2::melt(.)) %>%
map(~ data.frame(.)) %>%
map(~ mutate(., value0 = ifelse(value==0, NA, value)))
# get rank orders
<- lapply(mats, isi98) %>% map(~ .$best_order) l
We can look at these like this:
lapply(dx, head)
## $`1`
## Var1 Var2 value value0
## 1 A A 0 NA
## 2 B A 3 3
## 3 C A 0 NA
## 4 D A 0 NA
## 5 A B 15 15
## 6 B B 0 NA
##
## $`2`
## Var1 Var2 value value0
## 1 A A 0 NA
## 2 B A 2 2
## 3 C A 4 4
## 4 D A 4 4
## 5 A B 0 NA
## 6 B B 0 NA
##
## $`3`
## Var1 Var2 value value0
## 1 A A 0 NA
## 2 B A 33 33
## 3 C A 11 11
## 4 D A 2 2
## 5 A B 3 3
## 6 B B 0 NA
Next we need to add the scale column. Again, we can do this to each element of the list using a loop:
<- split(df, df$Group)
df.l
<- NULL
out for(i in 1:length(l)){
$Var1 <- df.l[[i]]$id1
df.l[[i]]$Var2 <- df.l[[i]]$id2
df.l[[i]]<- full_join(dx[[i]],df.l[[i]])
out[[i]]
# factorize
$Var2 <- factor(out[[i]]$Var2, levels=l[[i]])
out[[i]]$Var1 <- factor(out[[i]]$Var1, levels = rev(l[[i]]))
out[[i]]
}
lapply(out,head)
## [[1]]
## Var1 Var2 value value0 Group id1 id2 scale_value
## 1 A A 0 NA 1 A A 0.00000000
## 2 B A 3 3 1 B A 0.09090909
## 3 C A 0 NA 1 C A 0.00000000
## 4 D A 0 NA 1 D A 0.00000000
## 5 A B 15 15 1 A B 0.45454545
## 6 B B 0 NA 1 B B 0.00000000
##
## [[2]]
## Var1 Var2 value value0 Group id1 id2 scale_value
## 1 A A 0 NA 2 A A 0.00000000
## 2 B A 2 2 2 B A 0.06060606
## 3 C A 4 4 2 C A 0.12121212
## 4 D A 4 4 2 D A 0.12121212
## 5 A B 0 NA 2 A B 0.00000000
## 6 B B 0 NA 2 B B 0.00000000
##
## [[3]]
## Var1 Var2 value value0 Group id1 id2 scale_value
## 1 A A 0 NA 3 A A 0.00000000
## 2 B A 33 33 3 B A 1.00000000
## 3 C A 11 11 3 C A 0.33333333
## 4 D A 2 2 3 D A 0.06060606
## 5 A B 3 3 3 A B 0.09090909
## 6 B B 0 NA 3 B B 0.00000000
We can very nearly do the plotting. However, there is still one thing to fix. Previously, the scale_fill_manual()
worked by setting a range between two colors based on the values in a particular column. This time, we want the fill to be based on the scale_value
column which ranges from 0 to 1 across the data but may range between any values in between for any given data.frame. Therefore, we need to scale the fill slightly differently.
<- function(df){
make_scale_plot <- ggplot(df, aes(x=Var2, y=Var1, fill=scale_value)) +
p geom_tile(color="black", size=0.5, stat="identity") +
geom_text(aes(label = value0), color="black", size=rel(3.5)) +
theme_classic() +
scale_x_discrete(expand = c(0, 0), position = "top") +
scale_y_discrete(expand = c(0, 0)) +
xlab("Loser") +
ylab("Winner") +
theme(axis.text.x = element_text(vjust = 1),
axis.text.y = element_text(hjust = 0.5),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(fill=NA,color="black", size=0.5, linetype="solid"),
axis.line = element_blank(),
axis.ticks = element_blank(),
panel.background = element_rect(fill="white"),
plot.background = element_rect(fill="white"),
axis.text = element_text(color="#3C3C3C", size=rel(1.1)),
legend.position = "none"
+
) scale_fill_gradientn(
limits = c(0,1),
colors=c("white", "red1")
)
return(p)
}
We can finally plot each matrix side-by-side:
<- make_scale_plot(out[[1]])
pp1 <- make_scale_plot(out[[2]])
pp2 <- make_scale_plot(out[[3]]) pp3
grid.arrange(pp1,pp2,pp3, nrow=1)
There are clearly many improvements that could be made. The code could definitely be cleaned up for the multiple sociomatrices. We also probably don’t want to have “Winner” and “Loser” labels on every matrix. There are probably other things we could do better too.