TDAstats: R pipeline for computing persistent homology in topological data analysis

The TDAstats package is a comprehensive pipeline for conducting topological data analysis in R, allowing useRs to calculate, visualize, and conduct statistical inference on persistent homology in a Vietoris-Rips simplicial complex. The increased use of next-generation sequencing (and other newer experimental methods) has resulted in the increased prevalence of high-dimensional datasets. Although dimension reduction methods (e.g. principal component analysis) have been used with some success, it would be ideal to preserve all of the high-dimensional information within a dataset during data analysis. This is where topological data analysis and persistent homology come in.
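To make the idea of persistent homology in a Vietoris-Rips complex more concrete, the following is a minimal sketch in Python of the simplest case only: dimension 0 (connected components). It does not use TDAstats or Ripser; the function name `h0_persistence` and the toy point cloud are assumptions made for illustration, and the filtration scale is assumed to be the edge length between two points. Every point is born as its own component at scale 0, and a component "dies" at the length of the edge that merges it into another.

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Return the death scales of 0-dimensional features (connected components).

    Hypothetical helper for illustration; the scale of an edge is assumed
    to be the Euclidean distance between its endpoints.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, ordered by the scale at which they appear.
    edges = sorted(
        (dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one of them dies
            deaths.append(length)
    return deaths  # n - 1 finite deaths; the last component never dies


# Toy point cloud (made up): three nearby points and one distant outlier.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
deaths = h0_persistence(cloud)
```

In this toy example the single long-lived component corresponds to the outlier, which is the kind of feature a barcode or persistence diagram would highlight. Higher-dimensional features (loops, voids) require a full boundary-matrix reduction, which is what dedicated libraries handle.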
Currently, the fastest method to compute persistent homology is the Ripser C++ library, a variant of which is wrapped by TDAstats using Rcpp. This allows TDAstats to outpace other comprehensive pipelines for topological data analysis while also being implemented in R, a language familiar to many data scientists. TDAstats visualizes persistent homology as either topological barcodes (below, left panel) or persistence diagrams (below, right panel), producing publication-quality figures with the ggplot2 library so that useRs can fully customize the plots. Lastly, TDAstats is the first software library (as far as we know) that allows nonparametric statistical inference on persistent homology using permutation tests. This allows useRs to compare the topology, or "shape", of two datasets and determine whether they originate from similar populations. Current efforts are directed at applying TDAstats to problems in topological graph theory.
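The logic of a permutation test can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure implemented in TDAstats: the function name `permutation_p_value` is hypothetical, the inputs are assumed to be two lists of feature lifetimes (made-up numbers), and the test statistic is assumed to be the absolute difference in mean lifetime rather than a distance between persistence diagrams.

```python
import random


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Estimate a p-value for the difference between two samples.

    Hypothetical helper: the statistic is the absolute difference of means,
    chosen only to keep the sketch short.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def statistic(x, y):
        return abs(sum(x) / len(x) - sum(y) / len(y))

    observed = statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffling them yields a draw from the null distribution.
        rng.shuffle(pooled)
        if statistic(pooled[: len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Made-up lifetimes for two datasets with clearly different "shapes".
lifetimes_a = [0.10, 0.20, 0.15, 0.12, 0.18]
lifetimes_b = [5.0, 5.2, 4.9, 5.1, 5.3]
p_different = permutation_p_value(lifetimes_a, lifetimes_b)
p_same = permutation_p_value([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

A small p-value suggests the two sets of lifetimes are unlikely to come from the same population; identical samples give a p-value of 1 because no permutation can be less extreme than an observed difference of zero.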
TDAstats has been published in the Journal of Open Source Software. For more details, you can visit its CRAN page; the associated GitHub repo contains the code behind TDAstats. See the file for information on how to get started.


sigQC: an R package for gene signature quality control

sigQC is an R package, available on CRAN, that we developed collaboratively; it defines an integrated methodology for gene signature quality control. The growing volume of genomic data means that gene expression signatures are becoming critically important tools, poised to make a large impact on the diagnosis, management, and prognosis of a number of diseases. For the purposes of this package, we define a gene signature as ‘a set of genes whose co-ordinated mRNA expression pattern is representative of a biological pathway, process, phenotype, or clinical outcome.’
A key issue with gene signatures of this nature is whether the expression of many genes can be summarised as a single score, or whether multiple components are represented. In this package, we have automated a number of quality control metrics that assess whether a single score, such as the median or mean, is an appropriate summary of the genes’ expression in a dataset. The tools in this package enable the visualization of properties of a set of genes in a specific dataset, such as expression profile, variability, correlation, and comparison of methods of standardisation and scoring metrics.
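One of the checks described above, comparing scoring metrics, can be illustrated with a short Python sketch. This is not the sigQC API (sigQC is an R package); the function names, gene names, and expression values are all hypothetical. The sketch computes a mean score and a median score per sample across the signature genes, then measures how well the two scores agree using a Pearson correlation.

```python
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def signature_scores(expression):
    """Return (mean scores, median scores), one value per sample.

    `expression` maps each gene name to its expression across samples.
    """
    per_sample = list(zip(*expression.values()))  # transpose to sample-major
    return [mean(s) for s in per_sample], [median(s) for s in per_sample]


# Made-up expression values: three signature genes measured in four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [1.2, 2.1, 2.9, 4.2],
    "GENE_C": [0.8, 1.9, 3.3, 3.9],
}
mean_scores, median_scores = signature_scores(expression)
agreement = pearson(mean_scores, median_scores)
```

When the genes move together, as in this toy data, the mean and median scores are almost perfectly correlated, which supports summarising the signature as a single score. A low correlation would hint that multiple components are present.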

Read more about it on Andrew’s blog or in the preprint.
Try it for yourself today: sigQC is available as a CRAN package.

egtplot: A Python Package for 3-Strategy Evolutionary Games

Evolutionary game theory is a broad modeling framework that effectively describes many aspects of biological cooperation and competition. Visualization of three-strategy evolutionary games has historically been difficult within the Python ecosystem. We have created a package to ease visualization efforts; it can display both static and animated dynamics within the game space. For detailed software usage instructions, we refer to our interactive Jupyter notebook. We also welcome comments and questions regarding our whitepaper on bioRxiv.
See our GitHub repository for detailed installation instructions.
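For readers unfamiliar with the dynamics being visualized, the following is a minimal sketch of a three-strategy game under the standard replicator equation, integrated with simple Euler steps. It does not use egtplot; the function names, the payoff matrix, and the initial condition are assumptions chosen for illustration.

```python
def replicator_step(x, payoff, dt=0.01):
    """Advance strategy frequencies x by one Euler step of the replicator equation."""
    # Fitness of each strategy is its expected payoff against the population.
    fitness = [sum(payoff[i][j] * x[j] for j in range(3)) for i in range(3)]
    average = sum(x[i] * fitness[i] for i in range(3))
    # Strategies with above-average fitness grow; the others shrink.
    updated = [x[i] + dt * x[i] * (fitness[i] - average) for i in range(3)]
    total = sum(updated)
    return [value / total for value in updated]  # renormalise onto the simplex


def simulate(x0, payoff, steps=2000, dt=0.01):
    """Return the list of frequency vectors visited from the starting point x0."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(replicator_step(trajectory[-1], payoff, dt))
    return trajectory


# Hypothetical payoff matrix in which strategy 0 strictly dominates the others.
payoff = [
    [2.0, 2.0, 2.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
]
trajectory = simulate([0.2, 0.3, 0.5], payoff)
final = trajectory[-1]
```

Each entry of `trajectory` is a point on the three-strategy simplex, which is the triangular game space that static and animated plots of such games typically display. With this payoff matrix the population converges toward the dominant strategy.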