Resources, Tools & Codebases
"When you take on a task, finding the best way to achieve the desired results is always your responsibility." - Gilbert Lafayette
#X handle: @G2Pportal ↗️
The Genomics 2 Proteins portal has two main modules:
1. Gene/Protein Lookup: A human proteome-wide resource for mapping genetic variants from multiple databases (ClinVar, gnomAD, HGMD) onto protein sequences and structures from the Protein Data Bank and AlphaFold databases. A suite of biologically interpretable protein features can be explored at variant positions to form hypotheses about how a variant affects the protein's structure-function relationship.
2. Interactive Mapping: An interactive tool for users to upload their own mutational data and protein structures and establish the link between variant positions in proteins and their structure-function relationship.
Codebases / APIs:
➡️ G2P APIs: https://g2p.broadinstitute.org/api-docs
➡️ G2P3D API (gene to transcript to protein isoform to structure mapping): Provides API access to the Gene-Transcript-Protein Isoform-Structure identifier mapping for a given gene as a CSV file. The G2P3D API is available at the following endpoint:
https://g2p.broadinstitute.org/api/gene/:geneName/protein/:UniProtAC/gene-transcript-protein-isoform-structure-map
Example (LDLR): https://g2p.broadinstitute.org/api/gene/LDLR/protein/P01130/gene-transcript-protein-isoform-structure-map
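As a minimal sketch of how the endpoint template above could be filled in from Python, the snippet below builds the G2P3D URL for a gene and UniProt accession and offers an optional download step. The function names (`g2p3d_map_url`, `save_g2p3d_map`) are illustrative and are not part of the G2P API or its client library; the UTF-8 decoding is an assumption about the response encoding.

```python
from urllib.parse import quote
from urllib.request import urlopen

G2P_API_BASE = "https://g2p.broadinstitute.org/api"


def g2p3d_map_url(gene_name: str, uniprot_ac: str) -> str:
    """Fill the :geneName and :UniProtAC placeholders of the G2P3D endpoint."""
    return (
        f"{G2P_API_BASE}/gene/{quote(gene_name, safe='')}"
        f"/protein/{quote(uniprot_ac, safe='')}"
        "/gene-transcript-protein-isoform-structure-map"
    )


def save_g2p3d_map(gene_name: str, uniprot_ac: str, path: str) -> None:
    """Download the mapping (served as CSV) and write it to `path`.

    Assumption: the response body is UTF-8 text.
    """
    with urlopen(g2p3d_map_url(gene_name, uniprot_ac), timeout=30) as response:
        text = response.read().decode("utf-8")
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)


if __name__ == "__main__":
    # Prints the URL only; call save_g2p3d_map(...) to hit the network.
    print(g2p3d_map_url("LDLR", "P01130"))
```

Calling `save_g2p3d_map("LDLR", "P01130", "ldlr_map.csv")` would write the mapping to a local file, which can then be opened with any CSV reader.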
➡️ G2P Protein Feature API: Provides all protein features integrated into the G2P portal for the canonical protein isoform for a given gene in a tabular format. The table has one row per amino acid in the protein sequence and columns for each feature. The features include physicochemical properties, structural features, PPIs, PTMs, UniProtKB annotations, pocket annotations, and MaveDB experimental data. The Protein Feature API is available at the following endpoint:
https://g2p.broadinstitute.org/api/gene/:geneName/protein/:UniProtAC/protein-features
Example (LDLR): https://g2p.broadinstitute.org/api/gene/LDLR/protein/P01130/protein-features
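Because the feature table has one row per amino acid and one column per feature, it can be loaded as a list of per-residue records. The sketch below is illustrative only: the helper names are hypothetical, and the delimiter check (tab if the header contains one, otherwise comma) is an assumption, since the exact serialization of the table is not stated here. The column names in the test string are made up for the demonstration and are not the API's real headers.

```python
import csv
import io


def protein_features_url(gene_name: str, uniprot_ac: str) -> str:
    """Fill the placeholders of the Protein Feature endpoint."""
    return (
        "https://g2p.broadinstitute.org/api"
        f"/gene/{gene_name}/protein/{uniprot_ac}/protein-features"
    )


def parse_feature_table(text: str) -> list[dict[str, str]]:
    """Parse delimited text into one dict per row (one row per residue).

    Assumption: the table is tab- or comma-delimited with a header row.
    """
    lines = text.splitlines()
    header = lines[0] if lines else ""
    delimiter = "\t" if "\t" in header else ","
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


if __name__ == "__main__":
    print(protein_features_url("LDLR", "P01130"))
    # Toy input with hypothetical column names, not real API output.
    toy = "position\tresidue\n1\tM\n2\tG\n"
    for row in parse_feature_table(toy):
        print(row["position"], row["residue"])
```

Keeping the parser separate from the download step makes it easy to test on a saved file before running it against the live endpoint.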
➡️ G2P API Client Library: An open-source Python package that provides streamlined access to the G2P API for automated data retrieval and analysis. Link: https://github.com/broadinstitute/g2papi ↗️
➡️ G2P - Bio Integration Suite: A suite of tools and algorithms used by the Genomics 2 Proteins portal to integrate and align genomics and protein data. Link: https://github.com/broadinstitute/g2p-bis ↗️
➡️ Tutorials: YouTube playlist ↗️
The EMA Python library analyzes and compares embeddings of the same set of samples produced by different foundation models, for example the protein language models ESM1v and ESM2. EMA examines pairwise distances to uncover local and global patterns, and tracks the representations and relationships of sample groups across different embedding spaces.
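To illustrate the general idea of comparing embedding spaces through pairwise distances, here is a minimal standard-library sketch. It does not use the EMA package or reproduce its API: the function names and the choice of Euclidean distance with a Pearson correlation are assumptions made for this example, and the vectors are toy values rather than real model embeddings.

```python
import math
from itertools import combinations


def pairwise_distances(embeddings):
    """Euclidean distance for every unordered pair of samples, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(embeddings, 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all distances are equal")
    return cov / math.sqrt(var_x * var_y)


def distance_agreement(space_a, space_b):
    """Correlate the pairwise distances of the same samples in two spaces."""
    return pearson(pairwise_distances(space_a), pairwise_distances(space_b))


if __name__ == "__main__":
    # Toy embeddings of three samples; space_b is space_a scaled by 3,
    # so the relative geometry of the samples is identical.
    space_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    space_b = [(0.0, 0.0), (3.0, 0.0), (0.0, 6.0)]
    print(distance_agreement(space_a, space_b))
```

A value near 1 indicates that samples that are close (or far) in one space are also close (or far) in the other; lower values suggest the two models organize the samples differently.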
Codebases / APIs:
➡️ EMA source code: https://github.com/broadinstitute/ema ↗️
Examples:
Google Colab ↗️: how to use the ema-tool to compare protein embeddings across three ESM models.
Google Colab ↗️: how to use the ema-tool to compare embeddings of missense mutations across two ESM models.
This repository contains pre-trained graph neural network (ChemProp/GNN) and multilayer perceptron (MLP/ANN) models, along with scripts for virtual screening of small-molecule libraries to predict hits/binders for casein kinase 1α/δ (CK1α/δ). We also provide scripts to analyze the chemical diversity of the library using t-SNE.
Codebases / APIs:
➡️ GitHub: https://github.com/broadinstitute/DEL-ML-Refactor ↗️