GATK for Microbes: by Bhanu Gandham

Detecting mutations in microbial genomes is essential to understanding drug resistance, immune evasion, and other epidemiological characteristics of infectious disease. In an effort to leverage the algorithms that have already become the standard for human genomic data processing, thousands of researchers have applied the Broad’s Genome Analysis Toolkit (GATK) to microbial variant discovery. However, this human-focused software may not currently provide the best results for all pathogens. For example, genomic “trouble spots” (such as repetitive loci, regions of high genetic diversity, and translocated or entirely absent regions) are often ignored in human datasets because they represent a small percentage of the overall genome. In contrast, pathogen genomes are much smaller, so these trouble spots are proportionally more abundant. In addition, these same trouble spots often underlie clinically important phenotypes (such as severity, transmissibility, and resistance to antimicrobials) that are the immediate subject of investigation, and thus cannot be omitted from analysis.

To provide the bacterial research community with robust variant calling methods, and with funding from the Chan Zuckerberg Initiative’s Essential Open Source Software for Science program, we optimized the GATK to call short variants on bacterial genomic datasets. We developed an automated workflow that calls high-quality filtered variants against a single reference using short-read sequencing data. We optimized for low allele frequencies, varying read depths, and the sequencing and mapping errors typical of microbial data, resulting in improved sensitivity and precision. For improved coverage across circular bacterial genomes, we developed a tool that calls variants on reads spanning the artificial breakpoint at which they are linearized for sequencing. We are currently working on expanding the GATK’s usability to other microbes: viruses, fungi, and protozoans.
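The abstract does not describe how the breakpoint tool works internally. One common way to recover reads that span the breakpoint of a circular genome is to align against a second copy of the reference that has been rotated, so the original breakpoint falls in the middle of the sequence, and then map coordinates back. The Python sketch below illustrates that general idea only; the function names and the toy genome are hypothetical and are not taken from the GATK.

```python
def shift_reference(seq: str, offset: int) -> str:
    """Rotate a circular sequence so that position `offset` becomes the new start."""
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def shifted_to_original(pos: int, offset: int, length: int) -> int:
    """Map a 0-based position on the rotated sequence back to the original."""
    return (pos + offset) % length


# Toy circular genome (hypothetical data, 10 bases).
genome = "ACGTTGCAAT"
offset = len(genome) // 2  # rotate by half the genome length
shifted = shift_reference(genome, offset)

# A read that wraps around the breakpoint: last two bases + first two bases.
read = genome[-2:] + genome[:2]

# The read has no exact match on the linear reference...
linear_hit = genome.find(read)
# ...but matches the rotated reference, where the breakpoint is mid-sequence.
shifted_hit = shifted.find(read)
original_start = shifted_to_original(shifted_hit, offset, len(genome))
```

In this toy case the wrapping read is only found on the rotated copy, and the coordinate mapping places its start two bases before the end of the original sequence. A real workflow would use an aligner rather than exact string matching.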

Another of our goals for this project is to encourage community adoption by providing the microbial research community with an automated best-practices variant discovery workflow that is runtime- and cost-efficient. To that end, we have made our scalable and reproducible end-to-end WDL (Workflow Description Language) workflow publicly available, along with technical documentation and test datasets, in Terra, a cloud-native platform developed at the Broad Institute.



Bhanu Gandham

Bhanu Gandham is a bioinformatics scientist who is passionate about developing new and improved computational solutions to the research challenges the biomedical community faces. Before joining the Broad Institute, she gained well-rounded industry and academic experience as a systems engineer at an IT company, a Bioinformatics Staff Scientist in an epigenetics lab at Emory School of Medicine, and a Bioinformatics Lead of Exome pipelines at Veritas Genetics. At the Broad Institute, Bhanu previously led the bioinformatics support efforts for the Genome Analysis Toolkit (GATK) software package. In that role, she was responsible for helping genomics researchers adopt and efficiently implement the GATK tools.


By interacting closely with the genomics community and working as both a software engineer and a scientist in academia and industry, Bhanu gained a unique perspective that helps her identify gaps in the engineering that supports genomic research. It enabled her to see first-hand the need for bioinformatics support in microbial research, which inspired her to take on her current role as a Computational Scientist leading the development effort to build best practices for microbial variant discovery, among other projects aimed at solving the biomedical research challenges the community faces today.
