Monday, July 21, 2014

Geomorph update 2.1.1 now available!

Geomorph users,

We have uploaded version 2.1.1 to CRAN. The windows and mac binaries have been compiled and the tarball is available.

Version 2.1.1 contains small updates and fixes a few small bugs.

New Feature: Specimens can now be rotated to their principal axes in gpagen() with option to disable (PrinAxes=TRUE by default). This helps visualizations in other functions. Thank you to Jessie Atterholt of UC Berkeley for this suggestion. 

gpagen(hummingbirds$land,curves=hummingbirds$curvepts, PrinAxes=TRUE)

Bug Fixes: 

  •     Fixed concatenated SSCP matrix issue in procD.lm(), pairwise.D.test() and pairwise.slope.test()
  •    Corrected issue reading specimen names in readland.tps()
  •    Corrected color options for plotting groups in plotTangentSpace()
  •    Corrected test for slope:group interaction in plotAllometry()

Computational Change: Underlying code for ancestral state estimation (used in plotGMPhyloMorphoSpace() and physignal()) has been changed to use fastAnc() from Liam Revell's phytools.


Thursday, July 10, 2014

Tips & Tricks 4: Reading In Data Files

Today's exercise is another nice and simple one, and allows you to get used to manipulating datasets in R.

Exercise 4 - How to read a file of coordinate data into R and make sure it is numeric.

Reading your data files into R  for analysis with geomorph or other packages can be challenging. Geomorph has several functions for common data file types of coordinate data (e.g. readland.tps(), readland.nts()).

But what if your data are not in this format?

One of the main issues people face is that using read.table() or read.delim(), numerical data can be coerced into a factor automatically. Below is a very simple solution to make sure your data are numerical:

mydata <- read.table("data.txt",header=TRUE,row.names=1,
stringsAsFactors = FALSE) 
# here we are reading in a tab-delimited text file from MorphoJ, but this can be an issue with data from any outside program. The stringsAsFactors = FALSE is VERY important here

[1] FALSE # here R tells is the data are not numeric, even though we can see they are if we use View(mydata).

The solution is:
as.matrix(sapply(mydata[], as.numeric))

For example, say we know the shape coordinates are present in the file after two columns of non-shape coordinates (these could be centroid size, or a classifier), then:

shape <- as.matrix(sapply(mydata[,-(1:2)], as.numeric)) 
# here we say, use all columns except the first two.

[1] TRUE # now it's numeric. Ready to go!



Wednesday, June 18, 2014

Free Geometric Mophometrics - a Summary

I have seen a few questions floating around the interwebs recently from people starting up in morphometrics and wondering what is the best software to use. I am inspired by my fellow 3D expert Dr. Peter Falkingham's blog post on free software choices for working with 3D slice data and surface models (acquired by micro-CT, laser/photo surface scanners or Photogrammetry). I though i'd write a complementary piece on free software for collecting landmark-based morphometric data.

Why free?
Software development is hard work, so it is understandable that it costs money. However, sometimes the cost of the software is so high it makes it completely inaccessible - and software that isn't used is not helpful for anyone. As Peter so eloquently put it, 

"using freely available software where possible means that my skills are transferrable to different workplaces. It means grant money can be spent on hardware and materials, rather than software licenses. And it means that when I train students in the software I use, they can take those skills wherever they go and not have to re-learn a different proprietary package to do something they have already been doing for weeks/months/years."

And this is my stance also - I spent a good portion at the start of my PhD trying to find adequate ways of getting 3D landmark data from my CT scans of caecilian skulls for free, because I didn't want to be tied to the CT facilities computers. Most importantly though, I believe in Science being accessible and affordable. And the good news is Morphometrics is more free than ever!

Geomorph is, of course, free, and available on CRAN, which is also free! I will be highlighting several geomorph features here. But many people for good reasons prefer more GUI-based applications for their data collection, and so I will present a brief overview of some common and easy to find options.

2D landmark data collection

Geomorph's function digitize2d() allows users to read in one .jpeg file after the next, set the scale and place the landmarks. Then it conveniently write a .TPS file of the coordinate data, which can then be easily read into geomorph (using readland.tps()) . One benefit of using geomorph and R is that it works on Mac, Windows and Linux. 

tpsDIG distributed by F.James Rohlf on the SUNY Morphometrics website, is the classic 2D landmark data software. A good video tutorial of how to use this software is found here. The GUI interface is very user-friendly, and it facilitates collecting both landmark data and also curves and outlines. TpsDIG is only available for Windows, but is not too RAM-demanding so can be run on Virtual machines (a good free virtual machine is VirtualBox). TpsDIG also writes .tps files (obviously!).

ImageJ from NIH is the go-to image manipulation and measurement tool, a must for any morphometrician. It also runs on Mac, Windows and Linux. It isn't specifically for 2D landmark data, so it can be daunting to start with because there are so many other features. But it is simple: Import an image; measure the scale using the line tool and cmd+M, the go to Analyze > Set scale and set what the distance in pixels is in mm. Then you can use the point tool to click and place landmarks. The coordinates are printed in the results window, and need to be copied into an excel sheet (or your freeware equivalent) and saved as a text file. For importing into geomorph, it's best to save it as one row per specimen and x1,y1,x2,y2 ... columns, then it can be brought in using read.table() and converted to a 3D array using arrayspecs(). Advanced users can even make their own macros in ImageJ for their specific needs. Related to ImageJ is FiJi, developed for more image processing.

3D landmark data collection

Geomorph's function digit.fixed() allows users to digitize 3D landmarks on surfaces of .PLY files. , which are one type of 3D surface file (Meshlab is great free software to convert files into this format). These already contain the scale of the specimen, so all one needs to do is place the landmarks. Here is a video of how. The landmark data for each specimen are written to a .NTS file, which can be read into geomorph (using readmulti.nts()) . As mentioned  already, the benefit of using geomorph and R is that it works on Mac, Windows and Linux.  Semilandmark data can also be collected with geomorph, using either digit.fixed(), or buildtemplate() and digitsurface().  Videos demonstrating how to use these are here.

Landmark Editor by IDAV was the software I used the most before starting with geomorph. It also reads in .PLY files, and has a nice feature of setting correspondence between meshes for semi-automatic digitizing. LE only works on Windows machine, sadly. I have tried with minimal success to get it working on a virtual machine, because it has high-reliance on the VRAM and RAM, particularly with large meshes. Also the software is not being updated with the changing Windows operating systems, so it still performs the best on XP machines. LE outputs a .DTA file, which is essentially an .NTS file for multiple specimens. This file can be read into geomorph using readland.nts() but beware, the file has incorrect header notation; every header is 1 n p-x-k 1 9999 Dim=3, rather than 1 n p-x-k 0 Dim=3, which denotes that missing data is in the file even when it is not. Best to change the header by hand so that you don't get a "NAs introduced by coercion" error.

Meshlab by the Visual Computing Lab has a feature called PickPoints. I'll say now that I have found this feature quite difficult to master, and so I always returned to Landmark Editor. Landmarks are placed on the surface of any 3D surface file (that has faces) that Meshlab can read (see the website for the full list, but includes the standard .PLY, .STL and .VRML). To place a landmark, hold option and left-mouse button click on the surface. The function has a template option for semi-automated digitizing also. To save the coordinate data, Meshlab writes a .pp file, which has the coordinates embedded in with a lot of other code. The R package Morpho by S. Schlager has a function to read in these files (read.mpp()). Meshlab is available for Mac, Windows and Linux. 

Edgewarp by Bill Green and Fred L. Bookstein has been around since 1994 (available here). The software reads in slice data, i.e. CT or MRI imaging systems, and allows users to place landmarks and semilandmarks. It is available for Mac and Linux only. The command-line startup makes this software less appealing to newbies, but it has a dedicated following, particularly in the anthropology sector. I have very little experience with it, so would love to have anyone with more experience to comment.

I'd love to hear about anyone's experience with these, or other software I have omitted (this is NOT a comprehensive list). Comment away!


Monday, June 2, 2014

Geomorph 2.1 Now Available!

Geomorph users,

We have uploaded version 2.1 to CRAN. The windows and mac binaries have been compiled and the tarball is available.

Version 2.1 comes with some small changes and new features: 

Mike Collyer has now officially joined the geomorph team!

New functions 
pairwise.slope.test() to compare slopes of regression lines. This is based on a new paper by Mike Collyer and colleagues (Collyer, M.L., D.J. Sekora, and D.C. Adams. 2014. A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity. In Press) 

procD.pgls() to assess high-dimensional ANOVA and regression models in a phylogenetic context. This is based on a new paper by Dean Adams (Adams, D.C. 2014. A method for assessing phylogenetic least squares models for shape and other high-dimensional multivariate data. Evolution. In Press)

New features
We have added residual randomization options added to procD.lm()and pairwiseD.test(), as described in Collyer et al. 2014.

We have also enhanced capabilities of the digitizing function digitize2d(). The function now reads multiple images and outputs a TPS file. It can be used with missing data. And a digitizing session can be restarted where previous session stopped. In this overhaul, we have fixed a bug in the previous digitize2d() regarding the scale. If any users think they are affected by this, please contact Emma (sherratt[at]iastate[dot]edu).


Friday, April 18, 2014

Tips & Tricks 3: Ordering Datasets Alphabetically

Today's exercise is another nice and simple one, and allows you to get used to manipulating datasets in R.

Exercise 3 - How to reorder the dataset alphabetically by specimen name.

Say you have a 2D array dataset with specimen names (here species) as the row names of species data 
(I have jumbled up the species of geomorph's data(plethspecies) for illustrative purposes)
                      [,1]         [,2]      [,3]        [,4]        [,5]        [,6]       [,7]       [,8]
P_cinereus       0.2170911 -0.000276374 0.2592660 -0.05280429 -0.01647032 -0.01611658 -0.2561081 -0.1222936
P_nettingi       0.2182337 -0.007565857 0.2550516 -0.07272572 -0.02249780 -0.02076196 -0.2448233 -0.1139709
P_hoffmani       0.2157877 -0.002494932 0.2538394 -0.05447151 -0.02241651 -0.01523319 -0.2502073 -0.1157858
P_virginia       0.2155726  0.001949692 0.2575607 -0.05119322 -0.03633579 -0.01912463 -0.2491259 -0.1158853
P_serratus       0.2086011  0.005599942 0.2474015 -0.05370812 -0.02172973 -0.01504136 -0.2538781 -0.1217033
P_electromorphus 0.2094443 -0.001604654 0.2502967 -0.05379854 -0.02843583 -0.01551027 -0.2536498 -0.1242497
P_shenandoah     0.2164737  0.003217350 0.2621750 -0.04501284 -0.01737607 -0.01863017 -0.2564934 -0.1198138
P_hubrichti      0.2154898 -0.000836868 0.2522478 -0.06473866 -0.03708750 -0.02215923 -0.2497272 -0.1155023

P_richmondi      0.2115516 -0.005225566 0.2499019 -0.06172998 -0.02749914 -0.01938572 -0.2553196 -0.1238423

and you want to order them alphabetically by name, this is simply:

> y <- y[order(rownames(y)),] # where y is a 2D array of your data

The equivalent for your 3D array is:
> Y <- Y[,,order(dimnames(Y)[[3]])] # where Y is a 3D array

Easy! In these two examples, the specimen names are in the data matrix itself. But what if you want to sort by other classifiers?

Using this construct, you can be smarter with ordering and sort by any other classifier, simply by adding in the classifying vector as x in Y[,,sort(x)]
for example,

> data(plethodon)
> Y <- plethodon$land
# the individual specimens in this dataset are in a random order. To sort the individuals by species names as given by the classifier $species:
> plethodon$species 
 [1] Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah Teyah Teyah
[19] Teyah Teyah Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah
[37] Teyah Teyah Teyah Teyah
Levels: Jord Teyah

> Y[,,order(plethodon$species)] # where Y is a 3D array

Manipulating datasets in R is challenging at first, but is easy once you know the tricks!


Update 19th April: The original version  of this post used the function sort() rather than order(). While they both came to the same conclusion in these examples, I should explain the difference between them, and why I changed them. sort() returns an ordered vector of the object within the parentheses, while order() returns the addresses of the elements within object if they would be ordered. order() therefore is more often used within Y[] than on its own.

Wednesday, April 16, 2014

Geomorph 3D Visualization

Dear geomorph users,

version 2.0 of geomorph brings new developments in how shape deformations from 3D coordinate shape data can be viewed. We have implemented warping of 3D surface files (e.g., .ply files), which allows the user to visualize the shape deformations along Principal Component axes, Multivariate Regression slopes, Partial Least Squares axes and group differences, to name a few.

The new function warpRefMesh() reads in a .ply file and landmark coordinates associated with this specimen, and uses the thin-plate spline method (Bookstein 1989) to warp the mesh into the shape defined by a second set of landmark coordinates, usually
those of the mean shape for a set of aligned specimens. 

When using this function for warping, it is highly recommended that the mean shape (derived from the users sample using mshape()) is used as the reference for warping (see Rohlf 1998). 

The workflow is as follows:
1. Calculate the mean shape using mshape()
2. Choose an actual specimen to use for the warping - the specimen used as the template for this warping is recommended to be one most similar in shape to the average of the sample, but can be any reasonable specimen - do this by eye, or use findMeanSpec()
3. Warp this specimen into the mean shape using warpRefMesh()
4. Use this average mesh where it asks for a mesh= in the analysis functions and visualization functions (plotTangentSpace(), plotAllometry(), bilat.symmetry(), and plotRefToTarget()).

> ref <- mshape(data)
> refmesh <- warpRefMesh("plyfile.ply", data[,,1], ref, color=NULL, centered=T) # here the digitized coordinates for specimen plyfile are the first in the array

warpRefMesh plots the imported mesh (below left) and the new warped mesh that represents the ref shape (mean shape, below right):

The function also returns to object refmesh a mesh3d object of this new warped mesh, which can be used in any function in geomorph where mesh= is an option. For example,
> plotTangentSpace(data, axis1 = 1, axis2 = 2, warpgrids=T, mesh= refmesh)

Here, we see that function plotTangentSpace returns the shape deformation matching the most negative and most positive shape along the axis specified in axis1=.

The warping capabilities in geomorph are similar to those done using IDAV Landmark (e.g., Drake and Klingenberg 2010), but of course the benefit is that now all your analyses and visualizations can be done in R.


Geomorph Version 2.0 Now Available!

Geomorph users,

We have uploaded version 2.0 to CRAN. The windows and mac binaries have been compiled and the tarball is available.

Version 2.0 comes with substantial changes and new features: 

New function phylo.pls()for assessing the multivariate association between two blocks of variables in a phylogenetic context. This is based on a new paper by Adams and Felice

New function two.b.pls() for assessing the multivariate association between two blocks of variables

New function morphol.disparity() to compare Procrustes variance disparity among groups

New method for function physignal() to calculate phylogenetic signal using a generalized Kappa statsitic. This is based on a new paper by Adams (Systematic Biology in press)

F-ratios and R-squared values added to output of ProcD.lm()

3D Visualizations now include "surface" option to view shape deformation as warped mesh3d surfaces in the following: plotRefToTarget(), plotTangentSpace(), plotAllometry(), and bilat.symmetry(). 

New I/O functions: warpRefMesh() to create a mesh3d surface that represents the mean shape, findMeanSpec() to assist in choosing a template ply file for use with warpRefMesh() that identifies specimen closest to the mean shape, and defline.modules() to interactively assign landmarks to modular partitions (currently 2D only)

Generalized data input to allow 3D array or 2D matrix of data added to the following analysis functions: compare.evol.rates(), phylo.pls(), morphol.integr(), two.b.pls(), physignal(), and plotGMPhyloMorphoSpace()

Ability to input univariate data added to compare.evol.rates() and physignal()

Verbose output = T/F added to the following functions: bilat.symmetry(), phylo.pls(), two.b.pls(), morphol.integr(), plotAllometry(), plotTangentSpace(), physignal(). This new option allows the user to decide what information should be returned. Verbose=F, the default, returns the basic output, usually the test statistic and P-value.Verbose=T, returns new datasets, shape scores and more. 

As always, all the changes and additions for this version can be found in the geomorph news file.