Research

As a computational biologist, I combine expertise in biology and computer science to solve complex biological problems with computational tools and techniques. I have a strong background in molecular biology, genomics, and computer science, coupled with extensive programming experience in Python and R. I am proficient in analyzing large datasets, developing algorithms, and using statistical models to gain insights into biological phenomena. I communicate complex scientific concepts to both technical and non-technical audiences, and I thrive in interdisciplinary teams, collaborating effectively with wet lab scientists, bioinformaticians, and software developers. I am passionate about applying my skills to advance scientific research, and I am excited to contribute to a better understanding of human health. Learn more about my open source contributions here.

Thesis Project: Elucidating the genetic and cellular origins of diffuse large B cell lymphoma

Advisor: Sandeep Dave
For my thesis project in the Dave Lab, I am working to understand the differences between the subtypes of diffuse large B cell lymphoma (DLBCL) using genomic and transcriptomic sequencing data. I also serve as the bioinformatics lead for multiple projects under the Atlas of Blood Cancer Genomes (ABCG), working with collaborators across the globe.

Previous Projects:

During my first year in the Computational Biology and Bioinformatics PhD program, I rotated through three labs before choosing one for my dissertation. I worked with (a) Alex Hartemink, (b) Sandeep Dave, and (c) Ed Iversen (with Ravi Karra). After working with these excellent people, I decided to return to the Dave Lab for my dissertation work. You can find my CV here. Below is a brief description of some of my previous projects.

Clonal analysis of cardiomyocyte growth and regeneration

Advisor: Ed Iversen (with Ravi Karra)
In my third rotation, I worked on a statistical model for the analysis of cardiomyocyte proliferation. The biological questions were how the distribution of proliferating cardiomyocytes differs from the background growth rate and how the location of cardiomyocytes correlates with the vasculature. We worked with three-channel fluorescence microscopy images of slices from injured hearts. I am currently wrapping up this project.

Reconstructing Immunoglobulin sequences from bulk sequencing data

Advisor: Sandeep Dave
In my second rotation, I worked on immunoglobulin reconstruction from bulk tumor sequencing data. The immunoglobulin locus is so variable and so heavily mutated that almost every B cell carries a different sequence at this locus. This extreme variability is key to the effectiveness of our immune system. In lymphomas, one of these B cells is clonally replicated, so the tumor contains a population of cells that share a single clonotypic sequence at this locus. Single-cell methods have recently been used to identify these clonotypes. I used bulk sequencing data to accomplish the same task. I presented this work at the CBB 2019 retreat poster session; you can find my poster here.

Understanding non-coding transcripts

Advisor: Alex Hartemink
In my first rotation, I had the pleasure of working with Alex Hartemink on RNA-seq expression data for non-coding transcripts. Non-coding transcription is known to be pervasive in the yeast genome. I developed and implemented a systematic classification of the relevant transcripts based on their location with respect to coding genes. The next step was to identify the relationship between changes in the transcription of genes and of their adjacent non-coding transcripts. I ended the project by looking at the gain or loss of nucleosomal structure in these adjacent gene-transcript pairs.
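As a rough illustration of classifying transcripts by location, the sketch below labels a non-coding transcript according to whether it overlaps a coding gene and on which strand. The Feature class, the three category names, and the coordinates are hypothetical and chosen only for this example; they are not the classification scheme used in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A genomic interval (hypothetical helper type for this sketch)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a, b):
    """True if two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify(transcript, genes):
    """Assign one of three illustrative categories to a transcript."""
    hits = [g for g in genes if overlaps(transcript, g)]
    if not hits:
        return "intergenic"
    if any(g.strand == transcript.strand for g in hits):
        return "sense_overlapping"
    return "antisense"


# Made-up coordinates, for illustration only.
genes = [Feature("chrI", 1000, 2000, "+")]
print(classify(Feature("chrI", 1500, 1800, "-"), genes))
print(classify(Feature("chrI", 3000, 3500, "+"), genes))
```

A fuller scheme would likely also consider distance and orientation relative to neighboring genes (for example, upstream or divergent transcripts), which this sketch leaves out.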

Selecting Features from Sample Specific Coexpression Networks using Random Forests

Advisor: Chloé-Agathe Azencott
Sample-specific coexpression networks can be derived from the aggregate network by estimating the effect of each sample on the network. The proposition here is that other dissimilarity measures might yield edge weights that are at least as expressive. I used random forests to predict the edge weights from measures including the L1, L2, and Mahalanobis distances, and identified the important features using permutation feature importance. (Code)
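Permutation feature importance, mentioned above, can be sketched in a model-agnostic way: shuffle one feature column at a time and measure how much the prediction error grows. The sketch below uses only the standard library; the predict function is a stand-in for a trained random forest, and the three feature columns (imagined here as L1, L2, and Mahalanobis distances) and the synthetic data are assumptions made for illustration, not the project's code or data.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    predict: function mapping one row (list of floats) to a prediction.
    X: list of rows; y: list of targets.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic example: the target depends only on the first feature.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(50)]
y = [2.0 * row[0] for row in X]


def predict(row):
    # Stand-in for a fitted model's prediction on a single row.
    return 2.0 * row[0]


print(permutation_importance(predict, X, y))
```

In this toy setup, only the first feature should receive a positive importance, because the stand-in model ignores the other two columns.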

Modeling the optimal propensity of lysogeny for coexisting populations

Advisor: Supreet Saini
Temperate phages make a developmental decision between lysogeny and lysis in order to avoid the extinction of not only their own species but also their bacterial hosts. I worked on estimating the optimal lysogenic propensity as a function of the environmental stresses on individual species and of the multiplicity of infection, with the goal of maximizing coexistence (bioRxiv).