Computational Methods for Paleogenomics and Comparative Genomics


Algorithms for NGS data analysis

In a fruitful collaboration with Faraz Hach (UBC and Vancouver Prostate Centre), we are developing novel and efficient algorithms for processing large genomic and transcriptomic sequence data sets, mostly motivated by questions from cancer genomics. We are currently focusing on the analysis of third-generation sequencing data (PacBio and Nanopore) [Freddie (2021), HASLR (2020), CoLoRMap (2016)]. We are also working on data from other sequencing protocols, such as barcoded short reads [Calib (2019)] and linked reads (in collaboration with Rayan Chikhi) [WABI (2020)].

Genome evolution

One of our favorite research problems, and historically the first research focus of our group, is the reconstruction of ancestral genome structures (genome maps based on synteny blocks, or ancestral gene orders) [MMB (2018)]. We work within two methodological frameworks for this problem: a local approach, which considers a single ancestral genome within a given species phylogeny [PLoS Comput Biol (2008), ANGES (2012)], and a global approach (also known as the small parsimony approach), which considers all ancestral genomes of a species phylogeny at once [PhySca (2017)]. Over the last few years, we have aimed to extend these approaches to models that account for gene family events such as gene duplication, loss, or transfer [SCJ-SGD (2020), DeCoSTAR (2017)]. This line of work also motivated a series of papers on the reconciliation between gene trees and species trees [ecceTERA (2016), SuGeT].

Anopheles mosquito genomics and comparative scaffolding

We applied our comparative genomics and genome rearrangement algorithms to a fascinating large-scale data set composed of roughly twenty Anopheles mosquito genomes [Science (2015)]. This in turn raises interesting questions about how to handle fragmented genome assemblies in genome rearrangement studies [ArtDeCo (2015), ADSeq (2018), BMC Biology (2020)].

Pathogen genomics

We recently started a very active collaboration with the SFU Computational Epidemiology lab of Dr. Leonid Chindelevitch and with Dr. Will Hsiao (BC Centre for Disease Control), focusing on the development and application of novel bioinformatics tools for the analysis of whole-genome sequencing data from microbial pathogens [MentaLiST (2018), HyAsP (2019), PathoGiST (2020)].

Ancient DNA

Recent breakthroughs in ancient DNA (aDNA) sequencing naturally complement methods for reconstructing ancient genomes. This motivated our project to assemble recently sequenced historical samples of the pathogen Yersinia pestis, showing that it is possible to go beyond single-nucleotide mutations when analyzing ancient pathogen data [FPSAC (2013), AGaPES (2017)].

Big data flow cytometry bioinformatics

This topic stems from a collaboration with Ryan Brinkman (BC Cancer Agency) and Max Libbrecht (SFU), funded by Genome Canada and NIH [flowGraph (2019), flowLearn (2018)].

Combinatorics problems motivated by bioinformatics questions

We are also interested in applying enumerative combinatorics techniques to theoretical questions motivated by bioinformatics problems, such as RNA secondary structure alignment [IJFCS (2018)], RNA design [BCB (2019)], gene tree counting [JMB (2020)], and sequence alignment [PSC (2021)].