PPI Interface Analysis Skill: Alanine Scanning, ColabFold Prediction, and Hotspot Identification
Protein-Protein Interface Analysis & Hotspot Prediction Skill
Abstract
This skill implements a complete protein-protein interface analysis pipeline with three operating modes: (A) interface analysis of PDB structures using SASA-based computational alanine scanning, (B) complex prediction from sequences with ColabFold (AlphaFold2-Multimer), and (C) de novo binder design with FreeBindCraft. Mode A identifies interface residues from the buried surface area (BSA) differential, detects H-bonds and salt bridges, computes a shape-complementarity proxy, and ranks hotspot residues by a residue-weighted BSA score. Mode B predicts heterodimer structures from amino acid sequences with ColabFold and scores them by ipTM and pTM. On the PD-1/PD-L1 immune checkpoint complex (PDB: 4ZQK), the pipeline identifies 22 hotspot residues across both chains and a shape-complementarity proxy of 0.666, comparable to values reported for well-packed antibody-antigen interfaces.
1. Introduction
Protein-protein interactions (PPIs) are fundamental to virtually all biological processes. Understanding which residues drive binding affinity (the "hotspot" residues) is critical for antibody engineering, receptor-ligand drug design, and de novo binder development. Experimental alanine scanning is the gold standard for hotspot identification, but it is expensive and low-throughput. Computational SASA-based alanine scanning provides a fast, physics-motivated proxy with a correlation of ~0.6 with experimental values (Moreira et al., 2007).
2. Methods
Mode A: Interface Analysis (CPU)
Given a PDB file or ID, the pipeline:
- Downloads the structure from RCSB PDB if needed, or accepts a local file path
- Identifies interface residues via buried surface area (BSA) differential: BSA_i = SASA_i^isolated − SASA_i^bound, using ShrakeRupley SASA (n_points=250)
- Computes contact maps using Cα-Cα distance < 8Å and heavy atom distance < 5Å cutoffs
- Performs computational alanine scanning with hotspot score = BSA × W_residue, where W_residue is a residue-type weight based on amino acid frequency in known hotspots (Bogan & Thorn 1998): TRP(3.0), TYR(2.5), ARG(2.0), PHE(2.0), etc.
- Detects H-bonds (donor-acceptor distance < 3.5Å) and salt bridges (oppositely charged residue pairs < 4.5Å)
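- Computes a shape-complementarity proxy, Sc ≈ 2×BSA / (SASA_A_interface + SASA_B_interface). This area ratio is a simplification, not the surface-normal-based Sc statistic of Lawrence & Colman (1993), although it is interpreted against their thresholds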
- Generates visualizations: BSA bar chart, contact heatmap, and polar/apolar composition radar
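The BSA differential step above can be illustrated with a dependency-free sketch of the Shrake–Rupley algorithm. The text names a ShrakeRupley implementation with n_points=250 (presumably Biopython's `Bio.PDB.SASA.ShrakeRupley`); the plain-Python version below is a stand-in, not the pipeline's code. The 1.4 Å probe radius, the golden-spiral test points, and the input format (one `(residue_id, x, y, z, radius)` tuple per atom, with residue ids unique across chains, e.g. "A56TYR") are assumptions.

```python
import math

PROBE_RADIUS = 1.4  # Å, water probe (assumed; a common default)


def sphere_points(n):
    """Roughly uniform unit-sphere test points from a golden-section spiral."""
    increment = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((r * math.cos(k * increment), r * math.sin(k * increment), z))
    return points


def shrake_rupley(atoms, n_points=250, probe=PROBE_RADIUS):
    """Per-atom SASA (Å^2) for atoms given as (x, y, z, vdw_radius) tuples."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        # Only atoms whose probe-expanded spheres overlap this one can bury its points.
        neighbours = [
            (xj, yj, zj, (rj + probe) ** 2)
            for j, (xj, yj, zj, rj) in enumerate(atoms)
            if j != i and math.dist((xi, yi, zi), (xj, yj, zj)) < big_r + rj + probe
        ]
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            if all((px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 >= rj2
                   for xj, yj, zj, rj2 in neighbours):
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


def residue_bsa(chain_a, chain_b, n_points=250):
    """BSA_i = SASA_i(isolated chain) - SASA_i(complex), summed over each residue's atoms."""
    def coords(entries):
        return [entry[1:] for entry in entries]

    def per_residue(entries, areas):
        totals = {}
        for (res_id, *_), area in zip(entries, areas):
            totals[res_id] = totals.get(res_id, 0.0) + area
        return totals

    isolated = {}
    for chain in (chain_a, chain_b):
        isolated.update(per_residue(chain, shrake_rupley(coords(chain), n_points)))
    complex_atoms = chain_a + chain_b
    bound = per_residue(complex_atoms, shrake_rupley(coords(complex_atoms), n_points))
    return {res_id: isolated[res_id] - bound[res_id] for res_id in isolated}
```

Each atom's SASA is the fraction of test points on its probe-expanded sphere that no neighbouring sphere covers, multiplied by that sphere's area; running the calculation once per isolated chain and once on the complex gives the per-residue BSA.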
Mode B: ColabFold Complex Prediction
Given two amino acid sequences, uses colabfold_batch with AlphaFold2-Multimer v3:
- Sequences joined by ':' in a single FASTA record → multimer mode is used automatically
- MSA built via the public MMseqs2 server (no local database required)
- Quality assessed by ipTM (interface confidence: >0.75 = high, 0.5-0.75 = moderate) and pTM (overall confidence)
- Top-ranked model automatically fed into Mode A pipeline
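The Mode B steps can be sketched as a few helpers: build the ':'-joined FASTA record, assemble a `colabfold_batch` command, and band the ipTM of the top-ranked model. The `--model-type alphafold2_multimer_v3` flag and the `*_scores_rank_001_*.json` output name with `iptm`/`ptm` keys follow recent ColabFold releases but are assumptions to check against the installed version; the "low" band below 0.5 is also an assumption, since the text only defines "high" and "moderate".

```python
import json
from pathlib import Path


def complex_fasta_record(name: str, seq_a: str, seq_b: str) -> str:
    """One FASTA record; ColabFold reads ':'-separated chains as a complex."""
    return f">{name}\n{seq_a}:{seq_b}\n"


def colabfold_command(fasta_path, out_dir) -> list:
    """Command line for colabfold_batch using the AlphaFold2-Multimer v3 weights."""
    return ["colabfold_batch", "--model-type", "alphafold2_multimer_v3",
            str(fasta_path), str(out_dir)]


def iptm_tier(iptm: float) -> str:
    """Bands from the text: > 0.75 high, 0.5-0.75 moderate; 'low' below 0.5 is assumed."""
    if iptm > 0.75:
        return "high"
    if iptm >= 0.5:
        return "moderate"
    return "low"


def top_model_scores(out_dir):
    """(ipTM, pTM) of the rank-1 model; file-name pattern and keys are assumptions."""
    matches = sorted(Path(out_dir).glob("*_scores_rank_001_*.json"))
    if not matches:
        raise FileNotFoundError(f"no rank-1 scores JSON in {out_dir}")
    scores = json.loads(matches[0].read_text())
    return scores["iptm"], scores["ptm"]
```

In use, the record would be written with `Path("complex.fasta").write_text(...)`, the command run with `subprocess.run(cmd, check=True)`, and `iptm_tier(top_model_scores(out_dir)[0])` consulted before handing the top model to Mode A.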
Mode C: FreeBindCraft De Novo Binder Design
For GPU-equipped systems, FreeBindCraft (PyRosetta-free variant of BindCraft) hallucinates novel binders targeting specified hotspot residues. This mode requires 100-500+ trajectories for meaningful design campaigns.
3. Results: PD-1/PD-L1 Interface Analysis
Input: PDB 4ZQK — PD-1/PD-L1 immune checkpoint complex
Chains: A (PD-L1) and B (PD-1)
Total BSA: 1823.7 Ų (within the 1200–2000 Ų range typical of antibody-antigen and other standard-size protein-protein interfaces ✓)
Shape complementarity (proxy): 0.666 (>0.65 = well-packed ✓)
H-bonds: 6 (distances: 2.61–3.42Å)
Salt bridges: 2
Top Hotspot Residues (PD-L1, Chain A)
| Residue | BSA (Ų) | Hotspot Score | Hotspot? |
|---|---|---|---|
| A56TYR | 51.5 | 128.7 | 🔥 Yes |
| A125ARG | 58.5 | 117.1 | 🔥 Yes |
| A113ARG | 56.8 | 113.6 | 🔥 Yes |
| A124LYS | 62.7 | 81.6 | 🔥 Yes |
| A115MET | 49.0 | 73.5 | 🔥 Yes |
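The Tyr and Arg rows above follow directly from the weights stated in Methods: 51.5 × 2.5 = 128.75 (reported as 128.7, within rounding of the displayed BSA) and 56.8 × 2.0 = 113.6. The sketch below applies score = BSA × W_residue together with the BSA ≥ 25 Ų hotspot cut-off from the skill file. The text elides the remaining weights ("etc."), so residue types other than Trp, Tyr, Arg and Phe fall back to a placeholder weight of 1.0 here; that placeholder is an assumption and does not reproduce the Lys or Met rows.

```python
# Residue weights stated in Methods; the remaining weights are elided in the text ("etc.").
HOTSPOT_WEIGHTS = {"TRP": 3.0, "TYR": 2.5, "ARG": 2.0, "PHE": 2.0}
DEFAULT_WEIGHT = 1.0          # assumption: placeholder for residue types not listed
BSA_HOTSPOT_THRESHOLD = 25.0  # Å^2, hotspot cut-off from the skill file


def hotspot_score(bsa: float, resname: str) -> float:
    """Weighted BSA proxy: score = BSA x W_residue."""
    return bsa * HOTSPOT_WEIGHTS.get(resname, DEFAULT_WEIGHT)


def rank_hotspots(rows):
    """rows: (label, resname, bsa) tuples -> (label, bsa, score, is_hotspot), best first."""
    scored = [
        (label, bsa, hotspot_score(bsa, resname), bsa >= BSA_HOTSPOT_THRESHOLD)
        for label, resname, bsa in rows
    ]
    return sorted(scored, key=lambda row: row[2], reverse=True)


# Chain A rows whose residue weights are given explicitly in Methods.
for label, bsa, score, is_hot in rank_hotspots(
    [("A113ARG", "ARG", 56.8), ("A56TYR", "TYR", 51.5)]
):
    print(f"{label}: BSA {bsa:.1f} -> score {score:.2f}, hotspot={is_hot}")
```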
Top Hotspot Residues (PD-1, Chain B)
| Residue | BSA (Ų) | Hotspot Score |
|---|---|---|
| B134ILE | 117.0 | 175.4 |
| B128LEU | 77.6 | 116.4 |
| B78LYS | 76.7 | 99.6 |
| B75GLN | 98.6 | 98.6 |
The shortest H-bond is A125ARG–B75GLN at 2.61 Å. Because glutamine is uncharged, this is a polar contact rather than a salt bridge, but it lies within the PD-1/PD-L1 binding interface that therapeutic anti-PD-1 antibodies (nivolumab, pembrolizumab) block.
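Under the distance criteria in Methods, a contact like A125ARG–B75GLN counts as an H-bond but not as a salt bridge, since glutamine side chains carry no formal charge. The sketch below makes that distinction explicit. It is a distance-only screen under stated assumptions: any inter-chain N/O pair within 3.5 Å is an H-bond candidate (no donor/acceptor typing or angle test), and salt bridges use the standard charged side-chain atom names of Arg, Lys, Asp and Glu, with histidine treated as neutral. Atoms are `(residue_label, resname, atom_name, (x, y, z))` tuples, a format chosen for illustration.

```python
import math

HBOND_CUTOFF = 3.5        # Å, donor-acceptor distance from Methods
SALT_BRIDGE_CUTOFF = 4.5  # Å, from Methods
# Assumed charged side-chain atoms (standard PDB names); histidine is treated as neutral.
CATIONIC = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}}
ANIONIC = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}


def charge_sign(resname, atom):
    if atom in CATIONIC.get(resname, ()):
        return 1
    if atom in ANIONIC.get(resname, ()):
        return -1
    return 0


def hbond_candidates(atoms_a, atoms_b, cutoff=HBOND_CUTOFF):
    """Distance-only screen: inter-chain N/O pairs closer than cutoff, shortest first."""
    hits = []
    for label_a, _, name_a, xyz_a in atoms_a:
        if name_a[0] not in "NO":
            continue
        for label_b, _, name_b, xyz_b in atoms_b:
            if name_b[0] in "NO":
                d = math.dist(xyz_a, xyz_b)
                if d < cutoff:
                    hits.append((label_a, name_a, label_b, name_b, round(d, 2)))
    return sorted(hits, key=lambda hit: hit[-1])


def salt_bridges(atoms_a, atoms_b, cutoff=SALT_BRIDGE_CUTOFF):
    """Residue pairs with oppositely charged side-chain atoms closer than cutoff."""
    pairs = set()
    for label_a, res_a, name_a, xyz_a in atoms_a:
        sign_a = charge_sign(res_a, name_a)
        if sign_a == 0:
            continue
        for label_b, res_b, name_b, xyz_b in atoms_b:
            if sign_a * charge_sign(res_b, name_b) == -1 and math.dist(xyz_a, xyz_b) < cutoff:
                pairs.add((label_a, label_b))
    return sorted(pairs)
```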
4. Discussion
The SASA-based hotspot proxy correlates ~0.6 with experimental alanine scanning data. Key limitations: (1) it does not account for backbone conformational changes upon mutation, (2) it does not model water-mediated interactions explicitly, and (3) the hotspot threshold (BSA ≥ 25 Ų) should be calibrated against known benchmarks for each protein family. For quantitative binding free-energy estimates, MM-PBSA or alchemical free energy methods should be used.
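Limitation (3) can be made concrete: given experimental ΔΔG values for a benchmark set, sweep candidate BSA thresholds and keep the one that best recovers residues with ΔΔG ≥ 2.0 kcal/mol (the Bogan & Thorn hotspot definition). The sketch below scores each threshold by F1; the choice of F1 as the metric and the toy values in the example call are assumptions made purely for illustration, not benchmark data.

```python
def calibrate_bsa_threshold(bsa, ddg, thresholds, ddg_cutoff=2.0):
    """Return (threshold, F1) maximising F1 against experimental hotspot labels.

    bsa: residue -> BSA (Å^2) from the pipeline; ddg: residue -> measured ΔΔG (kcal/mol).
    Only residues with a measured ΔΔG are scored; ties keep the first threshold.
    """
    best = None
    for t in thresholds:
        tp = fp = fn = 0
        for res, value in ddg.items():
            predicted = bsa.get(res, 0.0) >= t
            actual = value >= ddg_cutoff
            tp += predicted and actual
            fp += predicted and not actual
            fn += actual and not predicted
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if best is None or f1 > best[1]:
            best = (t, f1)
    return best


# Toy, made-up values purely to show the call.
bsa = {"r1": 60.0, "r2": 40.0, "r3": 20.0, "r4": 10.0}
ddg = {"r1": 3.0, "r2": 2.5, "r3": 0.5, "r4": 0.2}
print(calibrate_bsa_threshold(bsa, ddg, thresholds=[10, 25, 50]))
```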
5. Usage
```python
from ppi_pipeline import run_pipeline

# Mode A: Analyze a PDB file
results = run_pipeline(
    input_source="4ZQK",
    chain_a="A", chain_b="B",
    out_dir="results_pdl1",
)

# Mode B: Predict complex from sequences
results = run_pipeline(
    input_source="predict",
    run_colabfold=True,
    seq_a="<nanobody_seq>", seq_b="<antigen_seq>",
    out_dir="results_predicted",
)
```
References
- Bogan, A.A. & Thorn, K.S. (1998). Anatomy of hot spots in protein interfaces. Journal of Molecular Biology
- Lawrence, M.C. & Colman, P.M. (1993). Shape complementarity at protein/protein interfaces. Journal of Molecular Biology
- Moreira, I.S. et al. (2007). Hot spots — A review of the protein-protein interface determinant amino-acid residues. Proteins
- Mirdita, M. et al. (2022). ColabFold: Making protein folding accessible to all. Nature Methods
- Pacesa, M. et al. (2024). BindCraft: one-shot design of functional protein binders. Nature
Reproducibility: Skill File
Use this skill file to reproduce the research with an AI agent.
---
name: ppi-interface
description: Analyze protein-protein interfaces — BSA, alanine scanning, hotspot residues, ColabFold prediction, FreeBindCraft binder design
triggers:
- "analyze interface between chain A and chain B"
- "predict complex structure from two sequences"
- "find hotspot residues at this protein interface"
- "alanine scanning on this antibody-antigen complex"
- "design a binder against [target protein]"
- "which residues drive binding at this PPI?"
inputs:
- PDB file path / 4-letter PDB ID
- Or "predict:SEQA:SEQB" for ColabFold Mode B
outputs:
- interface_residues.csv — all interface residues with BSA & hotspot scores
- hotspots.json — hotspot list for FreeBindCraft
- interface_bsa.png — BSA bar chart
- contact_map.png — inter-chain contact heatmap
- composition_radar.png — polar/apolar composition radar
- interface_report.txt — full text report
modes:
- mode_a: "Interface analysis from PDB file (CPU, fast)"
- mode_b: "ColabFold prediction from sequences (GPU recommended)"
- mode_c: "FreeBindCraft de novo binder design (GPU required)"
---
# Protein-Protein Interface Analysis & Hotspot Prediction Skill
## Modes
### Mode A — Interface Analysis (CPU, ~2–10 min)
Given a PDB file (experimental or predicted):
1. Interface residues via BSA differential
2. Contact map (Cα–Cα < 8Å, heavy atom < 5Å)
3. Computational alanine scanning — SASA-proxy for ΔΔG
4. Hotspot ranking (BSA ≥ 25 Ų = predicted hotspot)
5. H-bond and salt bridge detection
6. Shape complementarity proxy
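Step 2 uses two distance criteria, but the text does not say how they are combined. The sketch below (an assumption, not the pipeline's code) keeps both and labels each inter-chain residue pair by which cut-off it meets. Residues are given as dicts mapping a label to a Cα coordinate and a list of heavy-atom coordinates, a format chosen for illustration.

```python
import math

CA_CUTOFF = 8.0     # Å, Cα–Cα distance
HEAVY_CUTOFF = 5.0  # Å, closest heavy-atom pair


def contact_map(res_a, res_b):
    """res_*: {label: {"CA": (x, y, z), "heavy": [(x, y, z), ...]}}.

    Returns {(label_a, label_b): kind}, kind in {"ca", "heavy", "both"};
    pairs meeting neither cut-off are omitted.
    """
    contacts = {}
    for label_a, ra in res_a.items():
        for label_b, rb in res_b.items():
            ca = math.dist(ra["CA"], rb["CA"]) < CA_CUTOFF
            heavy = any(math.dist(p, q) < HEAVY_CUTOFF
                        for p in ra["heavy"] for q in rb["heavy"])
            if ca and heavy:
                contacts[(label_a, label_b)] = "both"
            elif ca or heavy:
                contacts[(label_a, label_b)] = "ca" if ca else "heavy"
    return contacts
```

Keeping the two labels separate lets a heatmap show either criterion; a stricter map could keep only the "both" pairs.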
### Mode B — Complex Prediction (GPU recommended, ~5–30 min)
Given two amino acid sequences, predict the heterodimer using ColabFold AlphaFold2-Multimer v3, then automatically run Mode A on the top-ranked model.
### Mode C — De Novo Binder Design (GPU required)
FreeBindCraft hallucination of novel binders against a target. See `ppi_pipeline.py` for the full implementation.
## Usage
```python
from ppi_pipeline import run_pipeline
# Mode A: Analyze a PDB file or ID
results = run_pipeline(
input_source="4ZQK", # PDB ID, file path, or "predict:seqA:seqB"
chain_a="A",
chain_b="B",
out_dir="results_pdl1",
)
# Mode B: Predict complex from sequences
results = run_pipeline(
input_source="predict",
chain_a="A", chain_b="B",
out_dir="results_predicted",
run_colabfold=True,
seq_a="<nanobody_sequence>",
seq_b="<antigen_sequence>",
)
```
## Scientific Basis
- Hotspot definition: ΔΔG_bind ≥ 2.0 kcal/mol upon Ala mutation (Bogan & Thorn 1998)
- SASA-based proxy correlates ~0.6 with experimental alanine scanning (Moreira et al. 2007)
- Shape complementarity: Sc > 0.65 = well-packed interface (Lawrence & Colman 1993)
- ipTM > 0.75 = high-confidence interface prediction (ColabFold)
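In Mode A the Sc > 0.65 criterion is applied to the area-ratio proxy from Methods, Sc ≈ 2×BSA / (SASA_A_interface + SASA_B_interface), rather than to the surface-normal statistic of Lawrence & Colman. A minimal sketch of that proxy follows; which BSA total (one side or both) the pipeline passes in is not stated, so the caller's choice is an assumption.

```python
WELL_PACKED_SC = 0.65  # threshold as cited in the skill file (Lawrence & Colman 1993)


def sc_proxy(bsa: float, sasa_a_interface: float, sasa_b_interface: float) -> float:
    """Area-ratio proxy: Sc ≈ 2 x BSA / (SASA_A_interface + SASA_B_interface)."""
    denom = sasa_a_interface + sasa_b_interface
    if denom <= 0:
        raise ValueError("interface SASA must be positive")
    return 2.0 * bsa / denom


def is_well_packed(sc: float) -> bool:
    """Apply the > 0.65 well-packed criterion to the proxy value."""
    return sc > WELL_PACKED_SC
```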
## Demo: PD-1/PD-L1 (PDB 4ZQK)
```bash
python ppi_pipeline.py
```