Friday, April 18, 2014

Tips & Tricks 3: Ordering Datasets Alphabetically

Today's exercise is another nice and simple one, and allows you to get used to manipulating datasets in R.

Exercise 3 - How to reorder the dataset alphabetically by specimen name.

Say you have a 2D array dataset with the specimen names (here, species) as its row names
(I have jumbled up the species of geomorph's data(plethspecies) for illustrative purposes):
                      [,1]         [,2]      [,3]        [,4]        [,5]        [,6]       [,7]       [,8]
P_cinereus       0.2170911 -0.000276374 0.2592660 -0.05280429 -0.01647032 -0.01611658 -0.2561081 -0.1222936
P_nettingi       0.2182337 -0.007565857 0.2550516 -0.07272572 -0.02249780 -0.02076196 -0.2448233 -0.1139709
P_hoffmani       0.2157877 -0.002494932 0.2538394 -0.05447151 -0.02241651 -0.01523319 -0.2502073 -0.1157858
P_virginia       0.2155726  0.001949692 0.2575607 -0.05119322 -0.03633579 -0.01912463 -0.2491259 -0.1158853
P_serratus       0.2086011  0.005599942 0.2474015 -0.05370812 -0.02172973 -0.01504136 -0.2538781 -0.1217033
P_electromorphus 0.2094443 -0.001604654 0.2502967 -0.05379854 -0.02843583 -0.01551027 -0.2536498 -0.1242497
P_shenandoah     0.2164737  0.003217350 0.2621750 -0.04501284 -0.01737607 -0.01863017 -0.2564934 -0.1198138
P_hubrichti      0.2154898 -0.000836868 0.2522478 -0.06473866 -0.03708750 -0.02215923 -0.2497272 -0.1155023
P_richmondi      0.2115516 -0.005225566 0.2499019 -0.06172998 -0.02749914 -0.01938572 -0.2553196 -0.1238423
...

and you want to order them alphabetically by name, this is simply:

> y <- y[order(rownames(y)),] # where y is a 2D array of your data

The equivalent for your 3D array is:
> Y <- Y[,,order(dimnames(Y)[[3]])] # where Y is a 3D array
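
To try both lines without loading geomorph, here is a minimal sketch on made-up toy data. The objects y_sorted and Y_sorted and the numbers inside them are purely illustrative; only the species names are borrowed from the table above.

```r
# Toy 2D matrix: 3 specimens (rows) x 2 variables, rows deliberately out of order
y <- matrix(1:6, nrow = 3,
            dimnames = list(c("P_nettingi", "P_cinereus", "P_hoffmani"), NULL))

# order() gives the row positions that put the row names in alphabetical order
y_sorted <- y[order(rownames(y)), ]
rownames(y_sorted)

# Toy 3D array: 2 landmarks x 2 dimensions x 3 specimens (names on the 3rd dimension)
Y <- array(1:12, dim = c(2, 2, 3),
           dimnames = list(NULL, NULL, c("P_nettingi", "P_cinereus", "P_hoffmani")))

# Same idea, but the index goes in the third slot of the square brackets
Y_sorted <- Y[, , order(dimnames(Y)[[3]])]
dimnames(Y_sorted)[[3]]
```

Each row (or specimen slice) travels with its name, so the data stay attached to the right specimen after reordering.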

Easy! In these two examples, the specimen names are stored with the data itself (as row names, or as the names of the third dimension). But what if you want to sort by other classifiers?

Using this construct, you can be smarter with ordering and sort by any other classifier, simply by supplying the classifying vector as x in Y[,,order(x)].
For example:

> data(plethodon)
> Y <- plethodon$land
# The individual specimens in this dataset are not grouped by species. To sort them by species name, as given by the classifier $species:
> plethodon$species 
 [1] Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah Teyah Teyah
[19] Teyah Teyah Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Jord  Teyah Teyah Teyah Teyah Teyah Teyah
[37] Teyah Teyah Teyah Teyah
Levels: Jord Teyah

> Y[,,order(plethodon$species)] # where Y is a 3D array
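
One thing to watch when sorting by a separate classifier is that the classifier itself needs reordering in the same way, otherwise it no longer lines up with the specimens. A minimal sketch, using a made-up array and factor as stand-ins for plethodon$land and plethodon$species (the names idx, Y_sorted and species_sorted are just for illustration):

```r
# Toy stand-in for plethodon$land: 2 landmarks x 2 dimensions x 6 specimens
Y <- array(seq_len(24), dim = c(2, 2, 6),
           dimnames = list(NULL, NULL, paste0("spec", 1:6)))

# Toy stand-in for plethodon$species, with the two groups interleaved
species <- factor(c("Teyah", "Jord", "Teyah", "Jord", "Jord", "Teyah"))

idx <- order(species)            # positions that put the classifier in order
Y_sorted <- Y[, , idx]           # reorder the specimens
species_sorted <- species[idx]   # reorder the classifier the same way

dimnames(Y_sorted)[[3]]
species_sorted
```

Note that order() on a factor follows the order of its levels, which is alphabetical by default but need not be if the levels were set by hand.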

Manipulating datasets in R is challenging at first, but is easy once you know the tricks!

Emma

Update 19th April: The original version of this post used the function sort() rather than order(). Both give the same result when indexing by names, as in the first two examples, but I should explain the difference between them and why I changed them. sort() returns the elements of the object within the parentheses, arranged in order, while order() returns the positions of the elements in the order they would take if the object were sorted. order() is therefore more often used within Y[] than on its own.
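
A small sketch of the difference, on made-up vectors (x, grp and vals are illustrative names only). The last two lines show why sort() inside the square brackets can go wrong with a classifier: R treats a factor used as an index as its integer codes, so the result is not a reordering at all.

```r
x <- c("b", "c", "a")
sorted_values <- sort(x)       # the values themselves: "a" "b" "c"
positions     <- order(x)      # where those values sit in x: 3 1 2
via_order     <- x[order(x)]   # indexing by the positions gives the same as sort(x)

# A classifier that is separate from the data
grp  <- factor(c("Teyah", "Jord", "Teyah", "Jord"))
vals <- c(10, 20, 30, 40)

by_order <- vals[order(grp)]   # 20 40 10 30: every value kept, grouped by species
by_sort  <- vals[sort(grp)]    # 10 10 20 20: the factor is read as codes 1 1 2 2
```

With unique character names, as in the row-name examples, indexing by sort() happens to work because the sorted names pick out the matching rows; with a factor or any repeated labels it does not.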

2 comments:

  1. Hi Emma,
    You're making a wrong usage of "sort", you should be using "order" instead.
    In the particular case of rownames, sort can be used, but I would consider it a dangerous habit to foster.

    Best,
    Tal

    1. Thank you for your comment, Tal. Please can you explain why it is a "dangerous habit"? That sounds terribly onerous for such a task.

      I do however understand the difference between sort and order. I will make amends to the post to clarify this to my audience.
