Thursday, November 20, 2014

Tips & Tricks 5: Extracting Classifiers Using Substring

Today's exercise in another easy one, and is inspired by Ariel Marcy of the University of Queensland, who came up with the algorithm and co-wrote the code below.

Exercise 5 - How to extract classifiers from names of specimens.

Well-organised morphometricians will have a consistent naming system for their specimens, such that information about the species and ID are included in the image name, and perhaps other information such as locality, sex, side (for asymmetry studies), replicate etc. (this in fact is recommended for new users of morphometrics. See this blog post for more details on starting up your morphometrics study). Although beware that the maximum length of path names for files on Windows machines is 260 characters.


Here, the name of the specimen is the binomial species name, followed by it's ID, the sex of the specimen and then the locality.

Here is a simply workflow for converting this information into a table so that these classifiers can be used in geomorph functions.

Assuming you have a 3D array of the coordinate data, mydata, as read in using readland.tps()readland.nts() or any other reasonable way.

categories <- strsplit(dimnames(mydata)[[3]], "_")  # separates the specimen names by underscore

Strspilt() is a very useful base function! If you are using a space, or period, just replace the second option with " " or ".". However it's very good practise to use underscores since spaces can cause REALLY irritating issues (particularly with and other phylogeny reading functions)

classifiers <- matrix(unlist(categories), ncol=5, byrow=T)  #reads list into matrix
classifiers <- cbind(dimnames(mydata)[[3]], classifiers) # adds the specimen ID to the first column of the table
colnames(classifiers) <- c("fileID", "Genus", "Species", "ID", "Sex", "Locality")#rename the column headings

classifiers <-   #converts to data frame so can index using $



Geomorph update 2.1.2 Now Available!

Geomorph users,

We have uploaded version 2.1.2 to CRAN. The windows and mac binaries have been compiled and the tarball is available.

Version 2.1.2 comes with some small changes and new features: 

New functions 
advanced.procD.lm()for statistically comparing two or more explanatory models. This function is brought to geomorph by Mike Collyer, and can be tested by using the new dataset pupfish.

To warp a specimen outline to the reference, we have added a new function called warpRefOutline(). Also, we have added warping of outline in plotRefToTarget()

These functions together allows the visualisation of 2D datasets. More details on how to use this feature are available in the helpguide.

Other New features

We have added a ShowPlot option to two.b.pls().

And Mike Collyer has been working on improving the lm functions, now adding RRPP option to procD.pgls().


Monday, July 21, 2014

Geomorph update 2.1.1 now available!

Geomorph users,

We have uploaded version 2.1.1 to CRAN. The windows and mac binaries have been compiled and the tarball is available.

Version 2.1.1 contains small updates and fixes a few small bugs.

New Feature: Specimens can now be rotated to their principal axes in gpagen() with option to disable (PrinAxes=TRUE by default). This helps visualizations in other functions. Thank you to Jessie Atterholt of UC Berkeley for this suggestion. 

gpagen(hummingbirds$land,curves=hummingbirds$curvepts, PrinAxes=TRUE)

Bug Fixes: 

  •     Fixed concatenated SSCP matrix issue in procD.lm(), pairwise.D.test() and pairwise.slope.test()
  •    Corrected issue reading specimen names in readland.tps()
  •    Corrected color options for plotting groups in plotTangentSpace()
  •    Corrected test for slope:group interaction in plotAllometry()

Computational Change: Underlying code for ancestral state estimation (used in plotGMPhyloMorphoSpace() and physignal()) has been changed to use fastAnc() from Liam Revell's phytools.


Thursday, July 10, 2014

Tips & Tricks 4: Reading In Data Files

Today's exercise is another nice and simple one, and allows you to get used to manipulating datasets in R.

Exercise 4 - How to read a file of coordinate data into R and make sure it is numeric.

Reading your data files into R  for analysis with geomorph or other packages can be challenging. Geomorph has several functions for common data file types of coordinate data (e.g. readland.tps(), readland.nts()).

But what if your data are not in this format?

One of the main issues people face is that using read.table() or read.delim(), numerical data can be coerced into a factor automatically. Below is a very simple solution to make sure your data are numerical:

mydata <- read.table("data.txt",header=TRUE,row.names=1,
stringsAsFactors = FALSE) 
# here we are reading in a tab-delimited text file from MorphoJ, but this can be an issue with data from any outside program. The stringsAsFactors = FALSE is VERY important here

[1] FALSE # here R tells is the data are not numeric, even though we can see they are if we use View(mydata).

The solution is:
as.matrix(sapply(mydata[], as.numeric))

For example, say we know the shape coordinates are present in the file after two columns of non-shape coordinates (these could be centroid size, or a classifier), then:

shape <- as.matrix(sapply(mydata[,-(1:2)], as.numeric)) 
# here we say, use all columns except the first two.

[1] TRUE # now it's numeric. Ready to go!



Wednesday, June 18, 2014

Free Geometric Mophometrics - a Summary

I have seen a few questions floating around the interwebs recently from people starting up in morphometrics and wondering what is the best software to use. I am inspired by my fellow 3D expert Dr. Peter Falkingham's blog post on free software choices for working with 3D slice data and surface models (acquired by micro-CT, laser/photo surface scanners or Photogrammetry). I though i'd write a complementary piece on free software for collecting landmark-based morphometric data.

Why free?
Software development is hard work, so it is understandable that it costs money. However, sometimes the cost of the software is so high it makes it completely inaccessible - and software that isn't used is not helpful for anyone. As Peter so eloquently put it, 

"using freely available software where possible means that my skills are transferrable to different workplaces. It means grant money can be spent on hardware and materials, rather than software licenses. And it means that when I train students in the software I use, they can take those skills wherever they go and not have to re-learn a different proprietary package to do something they have already been doing for weeks/months/years."

And this is my stance also - I spent a good portion at the start of my PhD trying to find adequate ways of getting 3D landmark data from my CT scans of caecilian skulls for free, because I didn't want to be tied to the CT facilities computers. Most importantly though, I believe in Science being accessible and affordable. And the good news is Morphometrics is more free than ever!

Geomorph is, of course, free, and available on CRAN, which is also free! I will be highlighting several geomorph features here. But many people for good reasons prefer more GUI-based applications for their data collection, and so I will present a brief overview of some common and easy to find options.

2D landmark data collection

Geomorph's function digitize2d() allows users to read in one .jpeg file after the next, set the scale and place the landmarks. Then it conveniently write a .TPS file of the coordinate data, which can then be easily read into geomorph (using readland.tps()) . One benefit of using geomorph and R is that it works on Mac, Windows and Linux. 

tpsDIG distributed by F.James Rohlf on the SUNY Morphometrics website, is the classic 2D landmark data software. A good video tutorial of how to use this software is found here. The GUI interface is very user-friendly, and it facilitates collecting both landmark data and also curves and outlines. TpsDIG is only available for Windows, but is not too RAM-demanding so can be run on Virtual machines (a good free virtual machine is VirtualBox). TpsDIG also writes .tps files (obviously!).

ImageJ from NIH is the go-to image manipulation and measurement tool, a must for any morphometrician. It also runs on Mac, Windows and Linux. It isn't specifically for 2D landmark data, so it can be daunting to start with because there are so many other features. But it is simple: Import an image; measure the scale using the line tool and cmd+M, the go to Analyze > Set scale and set what the distance in pixels is in mm. Then you can use the point tool to click and place landmarks. The coordinates are printed in the results window, and need to be copied into an excel sheet (or your freeware equivalent) and saved as a text file. For importing into geomorph, it's best to save it as one row per specimen and x1,y1,x2,y2 ... columns, then it can be brought in using read.table() and converted to a 3D array using arrayspecs(). Advanced users can even make their own macros in ImageJ for their specific needs. Related to ImageJ is FiJi, developed for more image processing.

3D landmark data collection

Geomorph's function digit.fixed() allows users to digitize 3D landmarks on surfaces of .PLY files. , which are one type of 3D surface file (Meshlab is great free software to convert files into this format). These already contain the scale of the specimen, so all one needs to do is place the landmarks. Here is a video of how. The landmark data for each specimen are written to a .NTS file, which can be read into geomorph (using readmulti.nts()) . As mentioned  already, the benefit of using geomorph and R is that it works on Mac, Windows and Linux.  Semilandmark data can also be collected with geomorph, using either digit.fixed(), or buildtemplate() and digitsurface().  Videos demonstrating how to use these are here.

Landmark Editor by IDAV was the software I used the most before starting with geomorph. It also reads in .PLY files, and has a nice feature of setting correspondence between meshes for semi-automatic digitizing. LE only works on Windows machine, sadly. I have tried with minimal success to get it working on a virtual machine, because it has high-reliance on the VRAM and RAM, particularly with large meshes. Also the software is not being updated with the changing Windows operating systems, so it still performs the best on XP machines. LE outputs a .DTA file, which is essentially an .NTS file for multiple specimens. This file can be read into geomorph using readland.nts() but beware, the file has incorrect header notation; every header is 1 n p-x-k 1 9999 Dim=3, rather than 1 n p-x-k 0 Dim=3, which denotes that missing data is in the file even when it is not. Best to change the header by hand so that you don't get a "NAs introduced by coercion" error.

Meshlab by the Visual Computing Lab has a feature called PickPoints. I'll say now that I have found this feature quite difficult to master, and so I always returned to Landmark Editor. Landmarks are placed on the surface of any 3D surface file (that has faces) that Meshlab can read (see the website for the full list, but includes the standard .PLY, .STL and .VRML). To place a landmark, hold option and left-mouse button click on the surface. The function has a template option for semi-automated digitizing also. To save the coordinate data, Meshlab writes a .pp file, which has the coordinates embedded in with a lot of other code. The R package Morpho by S. Schlager has a function to read in these files (read.mpp()). Meshlab is available for Mac, Windows and Linux. 

Edgewarp by Bill Green and Fred L. Bookstein has been around since 1994 (available here). The software reads in slice data, i.e. CT or MRI imaging systems, and allows users to place landmarks and semilandmarks. It is available for Mac and Linux only. The command-line startup makes this software less appealing to newbies, but it has a dedicated following, particularly in the anthropology sector. I have very little experience with it, so would love to have anyone with more experience to comment.

I'd love to hear about anyone's experience with these, or other software I have omitted (this is NOT a comprehensive list). Comment away!


Monday, June 2, 2014

Geomorph 2.1 Now Available!

Geomorph users,

We have uploaded version 2.1 to CRAN. The windows and mac binaries have been compiled and the tarball is available.

Version 2.1 comes with some small changes and new features: 

Mike Collyer has now officially joined the geomorph team!

New functions 
pairwise.slope.test() to compare slopes of regression lines. This is based on a new paper by Mike Collyer and colleagues (Collyer, M.L., D.J. Sekora, and D.C. Adams. 2014. A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity. In Press) 

procD.pgls() to assess high-dimensional ANOVA and regression models in a phylogenetic context. This is based on a new paper by Dean Adams (Adams, D.C. 2014. A method for assessing phylogenetic least squares models for shape and other high-dimensional multivariate data. Evolution. In Press)

New features
We have added residual randomization options added to procD.lm()and pairwiseD.test(), as described in Collyer et al. 2014.

We have also enhanced capabilities of the digitizing function digitize2d(). The function now reads multiple images and outputs a TPS file. It can be used with missing data. And a digitizing session can be restarted where previous session stopped. In this overhaul, we have fixed a bug in the previous digitize2d() regarding the scale. If any users think they are affected by this, please contact Emma (sherratt[at]iastate[dot]edu).


Friday, April 18, 2014

Tips & Tricks 3: Ordering Datasets Alphabetically

Today's exercise is another nice and simple one, and allows you to get used to manipulating datasets in R.

Exercise 3 - How to reorder the dataset alphabetically by specimen name.

Say you have a 2D array dataset with specimen names (here species) as the row names of species data 
(I have jumbled up the species of geomorph's data(plethspecies) for illustrative purposes)
                      [,1]         [,2]      [,3]        [,4]        [,5]        [,6]       [,7]       [,8]
P_cinereus       0.2170911 -0.000276374 0.2592660 -0.05280429 -0.01647032 -0.01611658 -0.2561081 -0.1222936
P_nettingi       0.2182337 -0.007565857 0.2550516 -0.07272572 -0.02249780 -0.02076196 -0.2448233 -0.1139709
P_hoffmani       0.2157877 -0.002494932 0.2538394 -0.05447151 -0.02241651 -0.01523319 -0.2502073 -0.1157858
P_virginia       0.2155726  0.001949692 0.2575607 -0.05119322 -0.03633579 -0.01912463 -0.2491259 -0.1158853
P_serratus       0.2086011  0.005599942 0.2474015 -0.05370812 -0.02172973 -0.01504136 -0.2538781 -0.1217033
P_electromorphus 0.2094443 -0.001604654 0.2502967 -0.05379854 -0.02843583 -0.01551027 -0.2536498 -0.1242497
P_shenandoah     0.2164737  0.003217350 0.2621750 -0.04501284 -0.01737607 -0.01863017 -0.2564934 -0.1198138
P_hubrichti      0.2154898 -0.000836868 0.2522478 -0.06473866 -0.03708750 -0.02215923 -0.2497272 -0.1155023

P_richmondi      0.2115516 -0.005225566 0.2499019 -0.06172998 -0.02749914 -0.01938572 -0.2553196 -0.1238423

and you want to order them alphabetically by name, this is simply:

> y <- y[order(rownames(y)),] # where y is a 2D array of your data

The equivalent for your 3D array is:
> Y <- Y[,,order(dimnames(Y)[[3]])] # where Y is a 3D array

Easy! In these two examples, the specimen names are in the data matrix itself. But what if you want to sort by other classifiers?

Using this construct, you can be smarter with ordering and sort by any other classifier, simply by adding in the classifying vector as x in Y[,,sort(x)]
for example,

> data(plethodon)
> Y <- plethodon$land
# the individual specimens in this dataset are in a random order. To sort the individuals by species names as given by the classifier $species:
> plethodon$species 
 [1] Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah Teyah Teyah
[19] Teyah Teyah Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah
[37] Teyah Teyah Teyah Teyah
Levels: Jord Teyah

> Y[,,order(plethodon$species)] # where Y is a 3D array

Manipulating datasets in R is challenging at first, but is easy once you know the tricks!


Update 19th April: The original version  of this post used the function sort() rather than order(). While they both came to the same conclusion in these examples, I should explain the difference between them, and why I changed them. sort() returns an ordered vector of the object within the parentheses, while order() returns the addresses of the elements within object if they would be ordered. order() therefore is more often used within Y[] than on its own.