**Exercise 1 - How to average a dataset of morphometric data (shape data) by group.**

**Lets start with the input objects. We have a 3D array of 30 specimens shape data (p by k coordinate data) called**

*data*, and a grouping variable

*group,*which is of class factor, with m levels. We want to average the morphometric data by these m levels.

[Here you should familiarise yourself with the factor functions, such as as.factor() and levels(), see here for more details].

The first thing to do is to change the 3D array

*data*into a 2D array (n by p x k) with geomorph function two.d.array()

x <- two.d.array(data)

I will explain a long-winded way of doing things, which is intuitive but cumbersome. Then I will show a simple one line version that utilises two useful base functions in R, which make life much simpler!

1) Set up the output matrix, which will be filled by a for loop:

p <- dim(data)[1] # the number of landmarks

k <- dim(data)[2] # the dimensions of the coordinates

Y <- array(NA, dim=c(p,k,length(levels(group))) #new empty array to fill

dimnames(Y)[[3]] <- levels(group)# set group levels as new names

Now write a for loop that subsets the 2D array by group and calculates the mean shape using geomorph function mshape():

for (i in 1: length(levels(group))){

grp <- x[which(group==levels(group)[i]),]# makes a subset 2D array with only specimens of the ith group by using which().

foo <- arrayspecs(grp ,p,k, byLand=F) # turn back into an 3D array

Y[,,i] <- mshape(foo) # place into the new 3D array

}

Since mshape() requires a 3D array, the switching between 2D and #D is necessary, but cumbersome. While this procedure is very long-winded, it does allow you to get used to manipulating arrays, and learn how to do procedures by levels of a grouping factor.

But if you look under the hood, you see that mshape is simply taking the means of the rows and columns of the matrix. This leads us to:

2) This for loop can in fact be replaced with one line:

means <- rowsum(x, group)/as.vector(table(group))# rowsum() is a simple base function to get the sum of the rows, while table() is a great function for getting group sizes (group n).

Y <- arrayspecs(means ,dim(data)[1],dim(data)[2], byLand=F)# then all you have to do is put the averaged data back into an 3D array.

Both ways produce the same result. As is often the case with R, 'there are many ways to skin a cat'! The first example works with functions you likely already know, and allows you to become more proficient with writing your own for loops and searching or subsetting with which(). The second example shows you that looking deeper into R base functions will uncover little gem functions, which make our life much easier!

Enjoy averaging by group!

Emma

Thanks Emma!

ReplyDeleteThere is another way to find means (or other statistics) by groups

means <- aggregate(x ~ group, FUN=mean)

It will append the group levels as the first column, but that can be fixed by

means <- (aggregate(x ~ group, FUN=mean))[,-1]

What's that you say, why would this method be better to use?

1) You can use any function. E.g., FUN = mean, sd, var, etc. Even make a special one as a separate function, if you like.

2) It can aggregate over multiple factors. For example,

means <- aggregate(x ~ A + B, FUN=mean)

will produce group means for all levels of B within all levels of A. This is the same as doing this

group <- factor(paste(A, B))

means <- aggregate(x ~ group, FUN=mean)

So, it is a bit simpler (appended columns for factor levels, notwithstanding)

Hello there!

ReplyDeleteI am a newcomer in both R and geomorph, and here I am struggling with a dataset of my own, in order to establish comparisons on the shape of five different species of fish. I have used ImageJ for landmarks and MorphoJ for Procrustes fit and outliers visualization... Resulting in a 63 rows x 49 columns data matrix, where each pair of column consist in the general XnYn coordinates combination, being the last column the fish IDs. As all examples I have read on geomorph are in a completely different array of data, I am now wondering if I can use the package for establishing the comparisons I need... Please, give me your thoughts!

Regards, Bowie

Hi Bowie, thanks for your message. I've actually written a blog post showing how to read in those type of data matrices. Please see: http://www.geomorph.net/2014/07/tips-tricks-4-reading-in-data-files

DeleteHi,

ReplyDeleteThis may be a dumb question, but do you use Procrustes coordinates or raw landmark coordinates for this? If I understand, using raw landmarks could seriously skew the mean because of rotation and such?

The Procrustes coordinates should only be used in this instance, yes.

DeleteThanks for the info. Would you then want to do a second gpa alignment on the mean data? Also can this be modified for centroid size?

DeleteOne GPA is all that's needed, on the raw specimen coordinate data. The group means after GPA are ready for further analysis, like a PCA using plotTangentSpace, or in a phylogenetic analysis.

DeleteAnd yes, for centroid size, CS, you do: CS.means <- (aggregate(CS ~ group, mean))[,-1]