As a first-year student in the Computational Biology and Bioinformatics program, I get the chance to rotate through three labs before choosing my thesis lab. I'm currently in my second rotation, with one more to go before I choose my home for the next few years. Below are brief descriptions of projects I've recently been part of.

Reconstructing Immunoglobulin sequences from bulk sequencing data

Advisor: Sandeep Dave
I'm currently rotating in the Dave Lab, working on reconstructing immunoglobulin sequences from bulk data. The immunoglobulin locus is extraordinarily diverse: V(D)J recombination and somatic hypermutation make it so variable that almost every B cell carries a different sequence there. This diversity is what lets the adaptive immune system recognize an enormous range of antigens. In B-cell lymphomas, a single B cell expands clonally, so the tumor contains a population of cells that share one clonotypic sequence at this locus. Single-cell methods have recently been used to identify these clonotypes; I'm using bulk sequencing data to accomplish the same task. Hit me up to know more!

Understanding non-coding transcripts

Advisor: Alex Hartemink
In my first rotation, I had the pleasure of working with Alex Hartemink on RNA-seq expression data for non-coding transcripts. Non-coding transcription is known to be pervasive in the yeast genome. I developed and implemented a systematic classification of these transcripts based on their location with respect to coding genes. The next step was to relate changes in the transcription of each gene to changes in the transcription of its adjacent non-coding transcripts. I ended the project by looking at the gain or loss of nucleosomal structure in these adjacent gene-transcript pairs. Hit me up to know more!
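
To illustrate what a location-based classification can look like, here is a minimal Python sketch. The class names (sense-overlapping, antisense, divergent, convergent, tandem, intergenic), the 500 bp window and the order in which rules are checked are all hypothetical choices for this example, not necessarily the scheme used in the project.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """A stranded genomic interval, 0-based and half-open: [start, end)."""
    start: int
    end: int
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base (strand ignored)."""
    return a.start < b.end and b.start < a.end


def upstream(gene: Interval, window: int) -> Interval:
    """Region of length `window` just 5' of the gene, relative to its strand."""
    if gene.strand == "+":
        return Interval(gene.start - window, gene.start, "+")
    return Interval(gene.end, gene.end + window, "-")


def downstream(gene: Interval, window: int) -> Interval:
    """Region of length `window` just 3' of the gene, relative to its strand."""
    if gene.strand == "+":
        return Interval(gene.end, gene.end + window, "+")
    return Interval(gene.start - window, gene.start, "-")


def classify(tx: Interval, genes: list[Interval], window: int = 500) -> str:
    """Assign a non-coding transcript to one hypothetical location class."""
    # 1. Transcripts overlapping a coding gene, split by relative strand.
    for g in genes:
        if overlaps(tx, g):
            return "sense-overlapping" if tx.strand == g.strand else "antisense"
    # 2. Transcripts near a gene's 5' end (possible shared promoter region).
    for g in genes:
        if overlaps(tx, upstream(g, window)):
            return "tandem-upstream" if tx.strand == g.strand else "divergent"
    # 3. Transcripts near a gene's 3' end.
    for g in genes:
        if overlaps(tx, downstream(g, window)):
            return "tandem-downstream" if tx.strand == g.strand else "convergent"
    return "intergenic"


gene = Interval(1000, 2000, "+")
print(classify(Interval(1500, 1800, "-"), [gene]))  # antisense
print(classify(Interval(700, 950, "-"), [gene]))    # divergent
print(classify(Interval(2100, 2300, "-"), [gene]))  # convergent
print(classify(Interval(5000, 5300, "+"), [gene]))  # intergenic
```

Checking overlap before proximity means a transcript that both overlaps one gene and sits near another is labeled by the overlap; a real classification would need to settle such ties explicitly.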

Selecting Features from Sample-Specific Coexpression Networks using Random Forests

Advisor: Chloé-Agathe Azencott
Sample-specific coexpression networks can be estimated from the aggregate network by measuring the effect of each sample on that network. The hypothesis here was that other dissimilarity measures for computing the edge weights might be at least as expressive. I used random forests to predict the edge weights from measures including the L1, L2, and Mahalanobis distances, and identified the most important features using permutation feature importance. (Code) Hit me up to know more!
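
A minimal NumPy sketch of the feature and importance steps, under assumptions that are mine rather than the project's: the target edge weight is a LIONESS-style leave-one-out estimate on Pearson correlation, each distance measures how far one sample's expression of a gene pair lies from the centroid of all samples, and an ordinary least-squares fit stands in for the random forest so the example needs no extra dependencies. With scikit-learn installed, `RandomForestRegressor` and `sklearn.inspection.permutation_importance` would replace the stand-in model and the hand-written loop.

```python
import numpy as np


def sample_specific_weight(x, y, q):
    """Assumed target: LIONESS-style weight of edge (x, y) for sample q,
    n * (r_all - r_without_q) + r_without_q, using Pearson correlation."""
    n = len(x)
    r_all = np.corrcoef(x, y)[0, 1]
    keep = np.arange(n) != q
    r_loo = np.corrcoef(x[keep], y[keep])[0, 1]
    return n * (r_all - r_loo) + r_loo


def dissimilarity_features(x, y, q):
    """Assumed definition: L1, L2 and Mahalanobis distances of sample q's
    point (x[q], y[q]) from the centroid of all samples for this gene pair."""
    pts = np.column_stack([x, y])
    diff = pts[q] - pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)  # 2x2 covariance of the gene pair
    mahalanobis = np.sqrt(diff @ np.linalg.solve(cov, diff))
    return np.array([np.abs(diff).sum(), np.sqrt(diff @ diff), mahalanobis])


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean((predict(X_perm) - y) ** 2) - baseline
    return scores / n_repeats


# Toy data: two hypothetical genes measured in 50 samples.
rng = np.random.default_rng(1)
g1 = rng.normal(size=50)
g2 = 0.8 * g1 + rng.normal(scale=0.5, size=50)

X = np.array([dissimilarity_features(g1, g2, q) for q in range(50)])
y = np.array([sample_specific_weight(g1, g2, q) for q in range(50)])

# Stand-in model: least squares with an intercept (the project used random forests).
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(M):
    return np.column_stack([np.ones(len(M)), M]) @ coef


importances = permutation_importance(predict, X, y)
print(dict(zip(["L1", "L2", "Mahalanobis"], importances.round(3))))
```

Permutation importance only needs a fitted model's predictions, which is why the same loop works unchanged whether the model is this linear stand-in or a random forest.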

Modeling the optimal propensity of lysogeny for coexisting populations

Advisor: Supreet Saini
Temperate phages make a developmental decision between lysogeny and lysis upon infection, and how often they choose lysogeny affects whether both the phage and its bacterial hosts avoid extinction. I worked on estimating the lysogenic propensity that maximizes coexistence, as a function of the environmental stress on each species and the multiplicity of infection (bioRxiv). Hit me up to know more!