Friday, April 18, 2014

Tips & Tricks 3: Ordering Datasets Alphabetically

Today's exercise is another nice and simple one, and allows you to get used to manipulating datasets in R.

Exercise 3 - How to reorder the dataset alphabetically by specimen name.

Say you have a 2D array dataset with specimen names (here species) as the row names of species data 
(I have jumbled up the species of geomorph's data(plethspecies) for illustrative purposes)
                      [,1]         [,2]      [,3]        [,4]        [,5]        [,6]       [,7]       [,8]
P_cinereus       0.2170911 -0.000276374 0.2592660 -0.05280429 -0.01647032 -0.01611658 -0.2561081 -0.1222936
P_nettingi       0.2182337 -0.007565857 0.2550516 -0.07272572 -0.02249780 -0.02076196 -0.2448233 -0.1139709
P_hoffmani       0.2157877 -0.002494932 0.2538394 -0.05447151 -0.02241651 -0.01523319 -0.2502073 -0.1157858
P_virginia       0.2155726  0.001949692 0.2575607 -0.05119322 -0.03633579 -0.01912463 -0.2491259 -0.1158853
P_serratus       0.2086011  0.005599942 0.2474015 -0.05370812 -0.02172973 -0.01504136 -0.2538781 -0.1217033
P_electromorphus 0.2094443 -0.001604654 0.2502967 -0.05379854 -0.02843583 -0.01551027 -0.2536498 -0.1242497
P_shenandoah     0.2164737  0.003217350 0.2621750 -0.04501284 -0.01737607 -0.01863017 -0.2564934 -0.1198138
P_hubrichti      0.2154898 -0.000836868 0.2522478 -0.06473866 -0.03708750 -0.02215923 -0.2497272 -0.1155023

P_richmondi      0.2115516 -0.005225566 0.2499019 -0.06172998 -0.02749914 -0.01938572 -0.2553196 -0.1238423

and you want to order them alphabetically by name, this is simply:

> y <- y[sort(rownames(y)),] # where y is a 2D array of your data

The equivalent for your 3D array is:
> Y <- Y[,,sort(dimnames(Y)[[3]])] # where Y is a 3D array

Easy! In these two examples, the specimen names are in the data matrix itself. But what if you want to sort by other classifiers?

Using this construct, you can be smarter with ordering and sort by any other classifier, simply by adding in the classifying vector as x in Y[,,sort(x)]
for example,

> data(plethodon)
> Y <- plethodon$land
# the individual specimens in this dataset are in a random order. To sort the individuals by species names as given by the classifier $species:
> plethodon$species 
 [1] Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah Teyah Teyah
[19] Teyah Teyah Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah
[37] Teyah Teyah Teyah Teyah
Levels: Jord Teyah

> Y[,,sort(plethodon$species)] # where Y is a 3D array

Manipulating datasets in R is challenging at first, but is easy once you know the tricks!


Wednesday, April 16, 2014

Geomorph 3D Visualization

Dear geomorph users,

version 2.0 of geomorph brings new developments in how shape deformations from 3D coordinate shape data can be viewed. We have implemented warping of 3D surface files (e.g., .ply files), which allows the user to visualize the shape deformations along Principal Component axes, Multivariate Regression slopes, Partial Least Squares axes and group differences, to name a few.

The new function warpRefMesh() reads in a .ply file and landmark coordinates associated with this specimen, and uses the thin-plate spline method (Bookstein 1989) to warp the mesh into the shape defined by a second set of landmark coordinates, usually
those of the mean shape for a set of aligned specimens. 

When using this function for warping, it is highly recommended that the mean shape (derived from the users sample using mshape()) is used as the reference for warping (see Rohlf 1998). 

The workflow is as follows:
1. Calculate the mean shape using mshape()
2. Choose an actual specimen to use for the warping - the specimen used as the template for this warping is recommended to be one most similar in shape to the average of the sample, but can be any reasonable specimen - do this by eye, or use findMeanSpec()
3. Warp this specimen into the mean shape using warpRefMesh()
4. Use this average mesh where it asks for a mesh= in the analysis functions and visualization functions (plotTangentSpace(), plotAllometry(), bilat.symmetry(), and plotRefToTarget()).

> ref <- mshape(data)
> refmesh <- warpRefMesh("plyfile.ply", data[,,1], ref, color=NULL, centered=T) # here the digitized coordinates for specimen plyfile are the first in the array

warpRefMesh plots the imported mesh (below left) and the new warped mesh that represents the ref shape (mean shape, below right):

The function also returns to object refmesh a mesh3d object of this new warped mesh, which can be used in any function in geomorph where mesh= is an option. For example,
> plotTangentSpace(data, axis1 = 1, axis2 = 2, warpgrids=T, mesh= refmesh)

Here, we see that function plotTangentSpace returns the shape deformation matching the most negative and most positive shape along the axis specified in axis1=.

The warping capabilities in geomorph are similar to those done using IDAV Landmark (e.g., Drake and Klingenberg 2010), but of course the benefit is that now all your analyses and visualizations can be done in R.


Geomorph Version 2.0 Now Available!

Geomorph users,

We have uploaded version 2.0 to CRAN. The windows and mac binaries have been compiled and the tarball is available.

Version 2.0 comes with substantial changes and new features: 

New function phylo.pls()for assessing the multivariate association between two blocks of variables in a phylogenetic context. This is based on a new paper by Adams and Felice

New function two.b.pls() for assessing the multivariate association between two blocks of variables

New function morphol.disparity() to compare Procrustes variance disparity among groups

New method for function physignal() to calculate phylogenetic signal using a generalized Kappa statsitic. This is based on a new paper by Adams (Systematic Biology in press)

F-ratios and R-squared values added to output of ProcD.lm()

3D Visualizations now include "surface" option to view shape deformation as warped mesh3d surfaces in the following: plotRefToTarget(), plotTangentSpace(), plotAllometry(), and bilat.symmetry(). 

New I/O functions: warpRefMesh() to create a mesh3d surface that represents the mean shape, findMeanSpec() to assist in choosing a template ply file for use with warpRefMesh() that identifies specimen closest to the mean shape, and defline.modules() to interactively assign landmarks to modular partitions (currently 2D only)

Generalized data input to allow 3D array or 2D matrix of data added to the following analysis functions: compare.evol.rates(), phylo.pls(), morphol.integr(), two.b.pls(), physignal(), and plotGMPhyloMorphoSpace()

Ability to input univariate data added to compare.evol.rates() and physignal()

Verbose output = T/F added to the following functions: bilat.symmetry(), phylo.pls(), two.b.pls(), morphol.integr(), plotAllometry(), plotTangentSpace(), physignal(). This new option allows the user to decide what information should be returned. Verbose=F, the default, returns the basic output, usually the test statistic and P-value.Verbose=T, returns new datasets, shape scores and more. 

As always, all the changes and additions for this version can be found in the geomorph news file.


Wednesday, February 5, 2014

Geomorph 1.1-6 Now Available!

Geomorph users,

We have uploaded version 1.1-6 to CRAN. The windows and mac binaries have been compiled and the tarball is available.

The issue of the Mac OS X installers has been resolved. The problem stemmed from a dependancy (to JPEG), which was lagging behind for Mac binaries. We thank the curators for helping us with this issue.

Version 1.1-6 does not contain any new functions. It fixes a few small inconsistencies in the code and helpfiles (listed below). The most important update deals with two functions missing from the NAMESPACE in 1.1-5. We apologize for any problems this may have caused.
  • Minor I/O enhancements in readmulti.nts()
  • Minor I/O enhancements in define.sliders.3d() to allow sliders to be in any order
  • Simplified pPsup (original code from J. Claude) to not include size re-scaling by beta (underlying function used in trajectory.analysis() only)
  • Added name.check for groups in compare.evol.rates()

Friday, January 17, 2014

Tips & Tricks 2: Deleting Specimens and Landmarks

Today's exercise is nice and simple, and allows you to get used to manipulating datasets in R.

Exercise 2 - How to delete specimens or landmarks from a dataset of morphometric data (shape data).

When you load your shape data into R with Geomorph (with the read... functions), it will typically be in a 3D array format (class array). Recall that a 3D array is akin to having a p landmarks by k dimension matrix for each individual specimen arranged on separate filing cards, and these are stacked up into n dimensions.
For example:
[1] array
[1] 12 2 40 # This means there are 12 landmarks, in 2 dimensions, and 40 specimens.
Imagine 40 filing cards with a 12 by 2 matrix on them. You get the idea.

Some analyses in geomorph require the dataset to be a 2D array, which is n rows and p*k columns.
mydata.2D <- two.d.array(mydata)
[1] matrix
[1] 40 24 # This means there are 40 specimens and 24 variables (12 * 2).

R has a wonderful built in functionality to access parts of these arrays and matrices. This is why we don't write functions in Geomorph to add/delete specimens, or landmarks, because CRAN did it for us!
I'll give you a quick overview of indexing as it applies to deleting specimens. (You can read more about indexing here (section 3.4).)

Take the 2D array, mydata.2D. Say specimen 7 is an outlier, too juvenile perhaps, or damaged. <- mydata.2d[-7,] # here we remove row number 7.

Or if you know specimens 7 through to 10 are no good to use: <- mydata.2d[-(7:10),]

For several specimens, or those out of sequence:
omit <- c(5,7, 11, 30) # make a vector of the specimen numbers to delete <- mydata.2d[-omit,]

For the 3D array, it is equally simple. Using the same specimen examples as above: <- mydata[,,-7] <- mydata[,,-(7:10)] <- mydata[,,-omit] # note this time, we are accessing the third position in the array, the "card", whereas above we were accessing the row of the matrix.

This same technique can be applied to landmarks.

In the 2D array: Landmark 5 is missing from some specimens. So let's delete it.
Since the landmark data are arranged x1 y1 x2 y2 ... the landmark we want to delete is in:
delete <- c((5*2), (5*2+1)) # position 10 and 11 <- mydata.2d[,-delete)]
[1] 40 22 

Note that this will change the numbering of the landmarks.

In a 3D array, this is simpler: <- mydata[-5,,)] # this removes landmark 5 from each "card"
[1] 11  2 40

Getting adept with indexing is a valuable skill with R. Have fun deleting!


Wednesday, January 15, 2014

Tips & Tricks 1: Averaging Data By Group

I thought it would be helpful to new users of Geomorph and R to discuss some tips and tricks for manipulating morphometric datasets in R. So I am starting a series called "Tips & Tricks". These will be published as I think of them, and assume you have already inputted your data into R using geomorph functions, such as readland.nts, readmulti.nts, readland.tps, read.morphologika, or simple read.csv. Remember that the read... functions in Geomorph turn the morphometric data into a 3D array (like a stack of filing cards; each card representing an individual and having on it a p landmarks by k dimensions matrix of coordinates). 

Exercise 1 - How to average a dataset of morphometric data (shape data) by group.

Lets start with the input objects. We have a 3D array of 30 specimens shape data (p by k coordinate data) called data, and a grouping variable group, which is of class factor, with m levels. We want to average the morphometric data by these m levels.

[Here you should familiarise yourself with the factor functions, such as  as.factor() and levels(), see here for more details].

The first thing to do is to change the 3D array data into a 2D array (n by p x k) with geomorph function two.d.array()

  x <- two.d.array(data)

I will explain a long-winded way of doing things, which is intuitive but cumbersome. Then I will show a simple one line version that utilises two useful base functions in R, which make life much simpler!

1) Set up the output matrix, which will be filled by a for loop:

  p <- dim(data)[1] # the number of landmarks
  k <- dim(data)[2] # the dimensions of the coordinates
  Y <- array(NA, dim=c(p,k,length(levels(group))) #new empty array to fill
  dimnames(Y)[[3]] <- levels(group)# set group levels as new names

Now write a for loop that subsets the 2D array by group and calculates the mean shape using geomorph function mshape():

for (i in 1: length(levels(group))){
grp <- x[which(group==levels(group)[i]),]# makes a subset 2D array with only specimens of the ith group by using which().
foo <- arrayspecs(grp ,p,k, byLand=F) # turn back into an 3D array
    Y[,,i] <- mshape(foo) # place into the new 3D array
Since mshape() requires a 3D array, the switching between 2D and #D is necessary, but cumbersome. While this procedure is very long-winded, it does allow you to get used to manipulating arrays, and learn how to do procedures by levels of a grouping factor.

But if you look under the hood, you see that mshape is simply taking the means of the rows and columns of the matrix. This leads us to:

2) This for loop can in fact be replaced with one line:
means <- rowsum(x, group)/as.vector(table(group))# rowsum() is a simple base function to get the sum of the rows, while table() is a great function for getting group sizes (group n).
Y <- arrayspecs(means ,dim(data)[1],dim(data)[2], byLand=F)# then all you have to do is put the averaged data back into an 3D array.

Both ways produce the same result. As is often the case with R, 'there are many ways to skin a cat'! The first example works with functions you likely already know, and allows you to become more proficient with writing your own for loops and searching or subsetting with which().  The second example shows you that looking deeper into R base functions will uncover little gem functions, which make our life much easier!

Enjoy averaging by group!


Friday, January 3, 2014

Geomorph and MacOS X

Happy New Year to all Geomorph users,

I have received a few emails from Mac users wanting to update to the new version of Geomorph. Unfortunately there is still a lag on the CRAN repository for a MacOS X binary of version 1.1-5. While this is a nuisance, there is a work around.

The package source 'tarball' can be installed in R or RStudio: Download the tarball (.tar.gz), and then navigate to the Package Installer and choose the Local Source Package option. To do this, you will need to make sure you have C and Fortran compilers installed (see below).

First, read thoroughly the R FAQs. The process has become simpler with newer versions of R, but still involves two main downloads: Go to the tools page to install gofortran. Also install the free Xcode 5 (from the App store) - this is the developer tools package. Once installed XQuartz needs to be open when running R.
EDIT: Update Jan 22 2014: Make sure also to have the following libraries installed before installing the .tar.gz file: jpeg, calibrate, vegan, ape, geiger.
EDIT: Update Jan 23 2014: Also, when installing Xcode tools, make sure to go to preferences and install the "command line tools".

If you have any problems, please make sure you have the current version of all software being used (R, Xcode, gofortran). We hope that CRAN will upload the Mac binary soon.