Algorithms for NGS data analysis
In a fruitful collaboration with Faraz Hach (UBC and Vancouver Prostate Center)
we are developing novel and efficient algorithms for processing large genomic and transcriptomic sequence
data sets, mostly motivated by questions from cancer genomics. We are currently focusing on the analysis of
Third Generation Sequencing data (PacBio and Nanopore)
[
Freddie (2021),
HASLR (2020),
CoLoRMap (2016)].
We are also working on data from other sequencing protocols such as barcoded short reads
[
Calib (2019)]
and linked reads (collaboration with Rayan Chikhi)
[
WABI 2020].
Genome evolution
One of our favorite research problems, and historically the first research topic of our group,
is the reconstruction of ancestral genome structures (genome maps based on synteny blocks, or
ancestral gene orders) [
MMB 2018].
We work within two methodological frameworks for this problem: a local approach that
considers a single ancestral genome within a given species phylogeny
[
PLoS Comput Biol (2008),
ANGES (2012)],
and a global approach (also known as the small parsimony approach) that considers all
ancestral genomes of a species phylogeny at once
[
PhySca (2017)].
Over the last few years, we have aimed to extend these approaches in order to work within a
model accounting for gene family events such as gene duplication,
loss or transfer
[
SCJ-SGD (2020),
DeCoSTAR (2017)].
This line of work also motivated a series of papers on the reconciliation between gene trees and species
trees [
ecceTERA (2016),
SuGeT].
Anopheles mosquito genomics and comparative scaffolding
We applied our comparative genomics and genome rearrangement algorithms to
a fascinating, large-scale data set composed of roughly twenty
Anopheles mosquito genomes
[
Science (2015)].
This in turn raises interesting questions on how to handle fragmented genome assemblies in genome
rearrangement studies [
ArtDeCo (2015),
ADSeq (2018),
BMC Biology (2020)].
Pathogen genomics
We recently started a very active collaboration with the SFU Computational Epidemiology lab of
Dr. Leonid Chindelevitch,
and with Dr. Will Hsiao (BC Centre for Disease Control), focusing on the development and application of
novel bioinformatics tools for the analysis of whole-genome sequencing data of microbial pathogens
[
MentaLiST (2018),
HyAsP (2019),
PathoGiST (2020)].
Ancient DNA
The recent breakthroughs in ancient DNA (aDNA) sequencing naturally
complement methods for reconstructing ancient genomes. This motivated our project to
assemble recently sequenced historical samples of the pathogen
Yersinia pestis,
showing that it is possible to go beyond single nucleotide mutations to analyze ancient
pathogen data
[
FPSAC (2013),
AGaPES (2017)].
Big data flow cytometry bioinformatics
This topic stems from a collaboration with Ryan Brinkman (BC Cancer Agency) and Max Libbrecht
(SFU), funded by Genome Canada and NIH
[
flowGraph (2019),
flowLearn (2018)].
Combinatorics problems motivated by bioinformatics questions
We are also interested in applications of enumerative combinatorics techniques to theoretical questions motivated by
bioinformatics problems such as RNA secondary structure alignment [
IJFCS 2018],
RNA design [
BCB 2019],
gene tree counting [
JMB 2020], and sequence alignment
[
PSC 2021].