Research Projects & Resources
"Anything that can go wrong will go wrong." - Edward Aloysius Murphy Jr
#X handle: @G2Pportal ↗️
➡️ Have you tried to Investigate your target gene by mapping mutational data together with functional/genomic annotation on protein sequences and 3D structures? Our lab develops the Genomics 2 Proteins (G2P) portal to enable broad and diverse biomedical and computational scientists to connect genetic screening outputs to protein sequences and structures efficiently and interactively.
The G2P portal generalizes the capability of linking genomics to proteins beyond databases by allowing users to interactively upload protein residue-wise annotations (variants, scores, etc.) as well as the protein structure to establish the connection. The portal is an easy-to-use discovery tool for researchers and scientists to Generate Hypotheses on the Structure-Function Relationship between Natural or Synthetic Genetic Variations and their Molecular Phenotype.
Read the G2P Portal Flagship Publication! Check out the portal's use cases in this Video Tutorial Series, Talks/Workshops.
For Codes/APIs related to the portal, Click here ↗️
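As a rough illustration of what linking residue-wise annotations to a protein sequence can look like, here is a minimal sketch. It is not the portal's code or API: the sequence, variants, annotation fields, and the `map_variants` helper are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy inputs; none of these values come from the G2P portal.
sequence = "MKTAYIAK"

# Variants written as (reference residue, 1-based position, alternate residue).
variants = [("K", 2, "E"), ("A", 4, "V"), ("A", 7, "T")]

# Residue-wise annotations keyed by 1-based position (assumed format).
annotations = {
    2: {"secondary_structure": "helix"},
    4: {"secondary_structure": "helix", "site": "binding"},
}


def map_variants(sequence, variants, annotations):
    """Group variants by position and attach any annotation at that residue."""
    by_position = defaultdict(list)
    for ref, pos, alt in variants:
        # Guard against numbering mismatches between the variant and the sequence.
        if not 1 <= pos <= len(sequence) or sequence[pos - 1] != ref:
            raise ValueError(f"{ref}{pos}{alt} does not match the sequence")
        by_position[pos].append(f"{ref}{pos}{alt}")
    return {
        pos: {"variants": names, "annotation": annotations.get(pos, {})}
        for pos, names in sorted(by_position.items())
    }


mapped = map_variants(sequence, variants, annotations)
for pos, entry in mapped.items():
    print(pos, entry["variants"], entry["annotation"])
```

The reference-residue check is the important design choice here: variant and structure numbering often disagree, and failing early is safer than silently mapping a variant onto the wrong residue.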
➡️ Protein-coding, single-gene rare disease variants have a profound impact on protein structure–function relationships. Can We Use Biologically Interpretable Protein Features for Characterizing Disease Mutations at Scale?
We hypothesize that protein structural and functional features can be leveraged to interpret missense variants, which lead to single amino acid substitutions in proteins. Towards that, we are developing methods for precise, protein function-specific interpretation of the molecular effect of variants, such as the impact on protein stability/conformation, enzymatic or catalytic activity, inter- and intra-molecular interactions, and post-translational modifications.
Learn more: Poster (ASHG'23); prior Publications (PNAS 2020, BRAIN 2022, 2023).
➡️ Many (rare) disease mutations trigger pathogenesis by misfolding the protein, leading to cellular toxicity and disease. Can We Quantify Mutations' Effects on Protein Folding, Conformation, and Dynamics?
Autosomal dominant tubulointerstitial kidney disease (ADTKD) is a rare yet clinically significant genetic disorder caused by UMOD genetic variants. Using all-atom molecular dynamics simulations, we explored the degree of conformational change caused by UMOD pathogenic missense variants. Simulation results reveal that UMOD variants undergo significant conformational changes, adopting highly extended conformations, and show significant protein destabilization.
Learn more: Abstract (BPS'24).
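One common way to put a number on how "extended" a conformation is uses the radius of gyration. The sketch below is a minimal, unweighted version on made-up coordinates; it is not the analysis used in the abstract, and the `radius_of_gyration` helper and toy coordinates are assumptions for illustration only.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid (no mass weighting)."""
    n = len(coords)
    centroid = [sum(axis) / n for axis in zip(*coords)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in coords) / n)


# Hypothetical coordinates (arbitrary units), not taken from any simulation.
compact = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
extended = [(0, 0, 0), (2, 0, 0), (4, 0, 0), (6, 0, 0)]

print(radius_of_gyration(compact))
print(radius_of_gyration(extended))
```

In a real trajectory the same quantity would be computed per frame, typically mass-weighted, and compared between wild-type and variant simulations.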
➡️ Many classically "druggable" targets are "understudied." What can we learn about these targets using computational approaches, such as Structure Prediction using AlphaFold, Pocket Identification, and In Silico Saturation Mutagenesis?
Kinases are enzymes with critical roles in regulation and metabolism; they are commonly targeted in oncology and, recently, in other diseases. However, hundreds of kinases are marked "understudied" by the NIH Pharos database due to the lack of known biology and chemical matter. We are interested in charting important sites in these kinases by generating variant effect maps on stability (folding free energy upon mutation; ΔΔG) and identifying druggable pockets.
Learn more: Abstract (BPS'24).
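In silico saturation mutagenesis starts by enumerating every possible single amino acid substitution. A minimal sketch of that enumeration step is shown below, assuming the 20 standard amino acids and a made-up toy sequence. Scoring each mutant for ΔΔG would require a separate stability predictor, which is not shown; the `saturation_mutagenesis` helper is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def saturation_mutagenesis(sequence):
    """Yield every single substitution as (label, mutant sequence)."""
    for index, ref in enumerate(sequence):
        for alt in AMINO_ACIDS:
            if alt == ref:
                continue  # skip the wild-type residue
            label = f"{ref}{index + 1}{alt}"  # 1-based position, e.g. M1A
            yield label, sequence[:index] + alt + sequence[index + 1:]


toy_sequence = "MKTAY"  # hypothetical sequence, not a real kinase
mutants = list(saturation_mutagenesis(toy_sequence))
print(len(mutants), mutants[0])
```

A sequence of length L yields 19 × L mutants, so a variant effect map is naturally laid out as a 20 × L grid with the wild-type cells left empty.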
➡️ Ion channels are reportedly associated with many neurodevelopmental disorders and have been targets of therapeutics. Can We Combine AI and Protein Features to Predict the Functional Effect of Ion Channel Variants?
The functional effects of ion channel variants are heterogeneous (a spectrum of loss-of-function, neutral, and gain-of-function effects). We sought to study the combined power of (evolutionary) information captured in protein language models and (mechanistic) information represented by protein sequence and structure for predicting ion channel variant function.
Learn more: Poster (MSS'24, ISMB'24); prior Publications (Sci. Transl. Med. 2020).
➡️ Embeddings from Protein Language Models (PLMs) are increasingly used to develop predictors of protein features and mutation effects. How Can We Compare PLM Embeddings Before Using Them for Downstream Tasks?
Our team is interested in developing a tool to analyze and compare information captured in the high-dimensional embeddings of PLMs to find the best PLM for a downstream biological task. Check out the first version of the tool: EMA, which shows differences across PLMs, their complementarity with traditional protein features, and probable gene bias in variant effect prediction.
Learn more: Preprint (bioRxiv 2024). For Codes/APIs related to EMA, Click here ↗️
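One generic way to compare two embedding spaces is to correlate their pairwise-distance profiles over the same set of items. The sketch below illustrates that idea only; it is not EMA's method or API, and the function names and toy embeddings are assumptions made for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(embeddings):
    """Euclidean distance for every unordered pair of items, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(embeddings, 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def embedding_agreement(emb_a, emb_b):
    """Correlate distance profiles; items must be in the same order in both sets."""
    if len(emb_a) != len(emb_b):
        raise ValueError("both embedding sets must cover the same items")
    return pearson(pairwise_distances(emb_a), pairwise_distances(emb_b))


# Hypothetical embeddings of four proteins from two models of different width.
model_a = [[0, 0], [1, 0], [0, 1], [1, 1]]
model_b = [[0, 0, 5], [2, 0, 5], [0, 2, 5], [2, 2, 5]]

print(embedding_agreement(model_a, model_b))
```

Because distances are computed within each model, the two embedding sets do not need the same dimensionality; only the item order has to match.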
➡️ DNA-Encoded Library (DEL) technology allows the screening of millions, or even billions, of encoded compounds in a pooled fashion, which is faster and cheaper than traditional approaches. The resulting large datasets of DEL binders and non-binders for the target of interest enable Machine Learning (ML) model development and screening of large, readily accessible, drug-like libraries in an ultra-high-throughput fashion. In this project, we developed A DEL+ML Pipeline for Hit Discovery using three DELs and five ML Models (fifteen DEL+ML combinations) to Identify Novel Binders of Validated Cancer Targets, CK1α/δ.
Read the ChemRxiv Preprint!
For Codes/APIs related to the pipeline, Click here ↗️
➡️ Breakthroughs in precision genome editing technologies now enable the systematic mutation of endogenous proteins at scale and directly in cells, in their native cellular and genomic contexts. Efficient computational methods, however, are needed to identify functional hotspots out of large-scale base editor (BE) mutagenesis screens and unravel insights into protein complex function, regulation, and structure. Our lab develops Methods for Clustering Mutagenesis Readouts and Investigating the Structure-Function Relationship of the Readouts using Machine Learning and Structural Bioinformatics.
Learn more: Coming soon!