**Exercise 1 - How to average a dataset of morphometric data (shape data) by group.**

**Lets start with the input objects. We have a 3D array of 30 specimens shape data (p by k coordinate data) called**

*data*, and a grouping variable

*group,*which is of class factor, with m levels. We want to average the morphometric data by these m levels.

[Here you should familiarise yourself with the factor functions, such as as.factor() and levels(), see here for more details].

The first thing to do is to change the 3D array

*data*into a 2D array (n by p x k) with geomorph function two.d.array()

x <- two.d.array(data)

I will explain a long-winded way of doing things, which is intuitive but cumbersome. Then I will show a simple one line version that utilises two useful base functions in R, which make life much simpler!

1) Set up the output matrix, which will be filled by a for loop:

p <- dim(data)[1] # the number of landmarks

k <- dim(data)[2] # the dimensions of the coordinates

Y <- array(NA, dim=c(p,k,length(levels(group))) #new empty array to fill

dimnames(Y)[[3]] <- levels(group)# set group levels as new names

Now write a for loop that subsets the 2D array by group and calculates the mean shape using geomorph function mshape():

for (i in 1: length(levels(group))){

grp <- x[which(group==levels(group)[i]),]# makes a subset 2D array with only specimens of the ith group by using which().

foo <- arrayspecs(grp ,p,k, byLand=F) # turn back into an 3D array

Y[,,i] <- mshape(foo) # place into the new 3D array

}

Since mshape() requires a 3D array, the switching between 2D and #D is necessary, but cumbersome. While this procedure is very long-winded, it does allow you to get used to manipulating arrays, and learn how to do procedures by levels of a grouping factor.

But if you look under the hood, you see that mshape is simply taking the means of the rows and columns of the matrix. This leads us to:

2) This for loop can in fact be replaced with one line:

means <- rowsum(x, group)/as.vector(table(group))# rowsum() is a simple base function to get the sum of the rows, while table() is a great function for getting group sizes (group n).

Y <- arrayspecs(means ,dim(data)[1],dim(data)[2], byLand=F)# then all you have to do is put the averaged data back into an 3D array.

Both ways produce the same result. As is often the case with R, 'there are many ways to skin a cat'! The first example works with functions you likely already know, and allows you to become more proficient with writing your own for loops and searching or subsetting with which(). The second example shows you that looking deeper into R base functions will uncover little gem functions, which make our life much easier!

Enjoy averaging by group!

Emma

This comment has been removed by the author.

ReplyDeleteThanks Emma!

ReplyDeleteThere is another way to find means (or other statistics) by groups

means <- aggregate(x ~ group, FUN=mean)

It will append the group levels as the first column, but that can be fixed by

means <- (aggregate(x ~ group, FUN=mean))[,-1]

What's that you say, why would this method be better to use?

1) You can use any function. E.g., FUN = mean, sd, var, etc. Even make a special one as a separate function, if you like.

2) It can aggregate over multiple factors. For example,

means <- aggregate(x ~ A + B, FUN=mean)

will produce group means for all levels of B within all levels of A. This is the same as doing this

group <- factor(paste(A, B))

means <- aggregate(x ~ group, FUN=mean)

So, it is a bit simpler (appended columns for factor levels, notwithstanding)