Traversing nodes in a phylogenetic tree
Converting internal node numbering from postorder to preorder in a bifurcating tree using DendroPy
.
In a tree structure, we can visit each node in different orders: inorder (left subtree → root → right subtree), preorder (root → left subtree → right subtree), postorder (left subtree → right subtree → root), etc. A detailed introduction to tree traversal algorithms can be found
here.
For example, we have a tree with five species: ((A,B),(C,(D,E))); In the above tree, if we visit every node (including tips and internal nodes) in preorder, we will start at the root, and always visit left branch before right branch:
Preorder 1 2 3 4 5 6 7 8 9
Nodes R* X* A B Y* C Z* D E
*: internal nodes
If following postorder, we would go to left subtree, then right subtree, and visit the root in the end:
Postorder 1 2 3 4 5 6 7 8 9
Nodes A B X* C D E Z* Y* R*
Sometimes in phylogenetics, nodes are indexed slightly differently: we would want to visit the tips of a tree first, and then the internal nodes would be in preorder or postorder, depending on the program.
In the ggtree
R package, internal nodes in a tree are always in preorder. However, in one of our datasets, the internal nodes were indexed in postorder. To plot the tree correctly, I will need to convert the internal node numbers:
Postorder_internal 1 2 3 4 5 6 7 8 9
Nodes A B C D E X* Z* Y* R*
Preorder_internal 1 2 3 4 5 6 7 8 9
Nodes A B C D E R* X* Y* Z*
Here is a python script to obtain both postorder and preorder internal node numbers using the DendroPy
library:
1 

python internalnodes_postorder_preorder.py tree.tre 5
where
tree.tre
contains the tree file in newick format: ((A,B),(C,(D,E)));
and 5
is the number of species(tips) in the tree.The output of the python script will look like the following:
Postorder => Preorder
6 7
7 9
8 8
9 6