Cancer drug target discovery is a critical yet challenging task in modern oncology. The identification of valid molecular targets underlies all successful cancer therapies. We present CancerDrugTarget-Skill, an automated bioinformatics tool designed for comprehensive cancer drug target screening and discovery. This tool integrates multiple analytical approaches including differential gene expression analysis, mutation frequency profiling, protein-protein interaction network analysis, and machine learning-based drug-target interaction prediction. Additionally, it provides drug repurposing capabilities by matching gene expression signatures with approved drug profiles. CancerDrugTarget-Skill streamlines the drug discovery pipeline and provides researchers with prioritized lists of candidate targets with supporting evidence, predicted drug interactions, and pathway enrichment analysis.

**Keywords**: Cancer Drug Discovery, Target Identification, Drug-Target Prediction, Drug Repurposing, Bioinformatics, Precision Oncology
ponchik-monchik · with Irina Tirosyan, Yeva Gabrielyan, Vahe Petrosyan
Assessing whether a protein target is druggable typically relies on a single
metric — pocket geometry from tools like fpocket — which ignores bioactivity
evidence, binding site amino acid composition, structural flexibility, and
cross-structure consistency. We present a reproducible, agent-executable pipeline
that integrates six evidence streams into a composite druggability score: (1)
fpocket pocket geometry, (2) benchmarking percentile against curated druggable
and undruggable reference structures, (3) ChEMBL bioactivity evidence resolved
via the RCSB–UniProt–ChEMBL API chain, (4) binding site amino acid composition,
(5) B-factor flexibility analysis, and (6) multi-structure pocket stability.
Applied to 13 protein targets spanning established kinases, nuclear receptors,
and canonical undruggable targets, the composite score spans 0.051 (MYC,
CHALLENGING) to 0.913 (BCR-ABL, HIGH CONFIDENCE DRUGGABLE), correctly
discriminating all four reference kinases and flagging NMR structural artifacts
that cause single-metric methods to misclassify known druggable targets. The
pipeline generates a per-target HTML dossier and a cross-target batch summary,
fully reproducible from any PDB ID.
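One way to picture how six evidence streams could be folded into a single score is a weighted mean of per-stream values, each already normalised to the range 0 to 1. The sketch below is an assumption made for illustration only: the abstract does not state the weights or the aggregation rule, so the stream keys, the equal default weights, and the function name `composite_score` are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical keys for the six evidence streams named in the abstract.
STREAMS = (
    "pocket_geometry",
    "benchmark_percentile",
    "chembl_bioactivity",
    "site_composition",
    "bfactor_flexibility",
    "pocket_stability",
)


def composite_score(
    components: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of six per-stream scores, each expected in [0, 1].

    Equal weights are an assumption; the source does not give the real ones.
    """
    if weights is None:
        weights = {name: 1.0 for name in STREAMS}

    missing = [name for name in STREAMS if name not in components]
    if missing:
        raise ValueError(f"missing evidence streams: {missing}")

    for name in STREAMS:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {components[name]}")

    total_weight = sum(weights[name] for name in STREAMS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    weighted_sum = sum(weights[name] * components[name] for name in STREAMS)
    return weighted_sum / total_weight
```

Calling `composite_score` with a dictionary holding one value per stream returns a number in the same 0 to 1 range as the reported scores. Passing a custom `weights` dictionary would let one stream, for example bioactivity evidence, count more than the others.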
We present a fully executable, multi-agent computational pipeline for small-molecule hit identification and compound triage from molecular screening data. Inspired by DNA-Encoded Library (DEL) selection campaigns, this workflow orchestrates four specialized AI agents—Data Engineer, ML Researcher, Computational Chemist, and Paper Writer—under a Chief Scientist coordinator to perform end-to-end virtual drug discovery. Using the MoleculeNet HIV dataset (41,127 compounds, ~3.5% active), our pipeline achieves an AUC-ROC of 0.8095 and an 8.82× enrichment factor in the top-500 predicted actives. After ADMET filtering and multi-objective ranking, we identify 20 drug-like candidates with mean QED of 0.768, mean synthetic accessibility score of 2.83, and 100% Lipinski compliance. Notably, 13 of the top 20 ranked compounds (65%) are confirmed true actives, demonstrating that the composite scoring approach effectively prioritizes genuinely bioactive, drug-like molecules. The entire pipeline is released as a self-contained, reproducible AI4Science Skill.
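The enrichment factor quoted above compares the hit rate among the top-ranked compounds with the hit rate in the whole library. A minimal sketch of the standard top-k definition follows; it is not taken from the authors' code, and the function name and argument layout are illustrative assumptions.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Enrichment factor in the top-k ranked compounds.

    EF = (actives in top k / k) / (total actives / N),
    where labels are 1 for active and 0 for inactive.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of compounds")

    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("enrichment is undefined when there are no actives")

    # Rank compound indices by predicted score, highest first.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_in_top_k = sum(labels[i] for i in ranked[:k])

    return (hits_in_top_k / k) / (total_actives / n)
```

An enrichment factor of 1 means the model's top-k selection is no better than random picking; larger values mean the top of the ranking is correspondingly richer in actives than the library as a whole.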
This paper examines the remarkable journey of ancient remedies into modern medicine, focusing on colchicine, a drug documented since 1500-2000 BCE that continues to find new applications in contemporary healthcare. We trace colchicine's history of more than 3,000 years from its earliest recorded use in ancient Egyptian medical texts through its recent approval by the U.S. Food and Drug Administration (FDA) in June 2023 for cardiovascular disease prevention. Beyond colchicine, we explore other ancient remedies that have transitioned from traditional medicine to modern pharmaceuticals, including artemisinin from Chinese traditional medicine, aspirin derived from willow bark, morphine from opium, and paclitaxel (Taxol) from the Pacific yew tree. We also examine traditional practices like yoga and acupuncture that have gained scientific validation through clinical trials. The paper concludes by discussing the ongoing research into ancient remedies and the potential for future discoveries from traditional knowledge systems.
The pharmaceutical industry faces unprecedented challenges in drug discovery, including skyrocketing costs, lengthy development timelines, and high failure rates. This paper presents a comprehensive analysis of how agentic AI—autonomous artificial intelligence systems capable of independent decision-making and tool use—can revolutionize the drug discovery pipeline. We examine the integration of agentic AI across key stages of drug development, from target identification and lead optimization to clinical trial design and post-market surveillance. Our analysis demonstrates that agentic AI systems can reduce discovery timelines by up to 60%, decrease costs by 40-50%, and improve success rates through enhanced decision-making capabilities. We propose a framework for implementing agentic AI in pharmaceutical research, discuss technical and ethical considerations, and outline future research directions. Our findings suggest that agentic AI represents a paradigm shift in drug discovery, enabling autonomous research capabilities that were previously unattainable.
ponchik-monchik · with Irina Tirosyan, Yeva Gabrielyan, Vahe Petrosyan
We quantify the structural overlap between FDA-approved small molecule drugs and
clinical-stage candidates using a fully executable cheminformatics pipeline.
Applying our workflow to 3,280 approved drugs (ChEMBL phase 4) and 9,433 clinical
candidates (phases 1–3), after standardisation and PAINS removal, we find that
81.1% of approved drug chemical space is covered by at least one clinical candidate
at Tanimoto ≥ 0.4 (Morgan fingerprints, radius=2). The mean nearest-neighbour
similarity from an approved drug to the clinical pipeline is 0.580, suggesting
broad but imperfect overlap. Paradoxically, the clinical pipeline is structurally
more diverse than the approved set (scaffold diversity index 0.605 vs. 0.419), yet
18.9% of approved chemical space remains unoccupied — a measurable opportunity gap
for drug repurposing and scaffold exploration. Physicochemical properties differ
significantly between sets across all five tested dimensions (KS test, p < 0.05),
with clinical candidates being more lipophilic (mean LogP 2.84 vs. 1.92) and less
polar (TPSA 84.8 vs. 98.8 Ų) than approved drugs. The pipeline is fully
parameterised and reproducible on any ChEMBL phase subset.
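The coverage and nearest-neighbour figures above can be read as follows: for each approved drug, take its highest Tanimoto similarity to any clinical candidate, then report the fraction of drugs whose best match reaches the threshold and the mean of those best matches. The sketch below assumes each fingerprint is represented as a set of on-bit indices, as a plain-Python stand-in for the Morgan fingerprints used in the pipeline; the function names are hypothetical and this is not the authors' implementation.

```python
from typing import FrozenSet, Sequence, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: indices of set bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity |A & B| / |A | B| between two bit sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def coverage_and_mean_nn(
    approved: Sequence[Fingerprint],
    candidates: Sequence[Fingerprint],
    threshold: float = 0.4,
) -> Tuple[float, float]:
    """Return (fraction of approved drugs covered, mean nearest-neighbour similarity).

    A drug counts as covered when at least one candidate has
    Tanimoto similarity >= threshold to it.
    """
    if not approved or not candidates:
        raise ValueError("both compound sets must be non-empty")

    # Best similarity from each approved drug to the candidate set.
    nearest = [max(tanimoto(drug, cand) for cand in candidates) for drug in approved]

    covered = sum(1 for s in nearest if s >= threshold) / len(nearest)
    mean_nn = sum(nearest) / len(nearest)
    return covered, mean_nn
```

This brute-force comparison is quadratic in the number of compounds. For sets of the size reported here, a cheminformatics toolkit's bulk similarity routines would likely be the more practical choice.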
ponchik-monchik · with Irina Tirosyan, Yeva Gabrielyan, Vahe Petrosyan
We present a fully executable pipeline for assessing the translational viability of bioactive chemical matter from public databases. Applied to EGFR (CHEMBL279), the workflow downloads and curates IC50 data from ChEMBL, standardises structures, removes PAINS compounds, computes RDKit physicochemical descriptors and ADMET-AI predictions, and produces scaffold diversity analysis, activity cliff detection, and ADMET filter intersection analysis. Of 16,463 raw ChEMBL records, 7,908 compounds survived curation (48% retention). The curated actives occupy narrow chemical space (scaffold diversity index 0.356), with hERG cardiac liability emerging as the dominant ADMET bottleneck: only 5.3% of actives are predicted safe, collapsing the all-filter pass rate to 1.2% (95/7,908 compounds). The pipeline is fully parameterised and reproduces on any ChEMBL target by editing a single config file.
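The filter intersection analysis can be illustrated with per-compound pass/fail flags: compute the pass rate of each filter on its own, then the fraction of compounds that pass every filter at once. Because the intersection can never exceed the strictest single filter, one dominant bottleneck such as the hERG prediction caps the overall rate. The sketch below is a minimal illustration under assumed inputs; the filter names other than hERG and the function name are hypothetical.

```python
from typing import Dict, List, Tuple


def filter_pass_rates(flags: List[Dict[str, bool]]) -> Tuple[Dict[str, float], float]:
    """Per-filter pass rates and the all-filter (intersection) pass rate.

    Each element of `flags` describes one compound and maps a filter name
    to True (predicted to pass) or False (predicted to fail).
    """
    if not flags:
        raise ValueError("need at least one compound")

    names = sorted(flags[0])
    if any(sorted(row) != names for row in flags):
        raise ValueError("every compound must report the same filters")

    n = len(flags)
    # Fraction of compounds passing each individual filter.
    per_filter = {name: sum(row[name] for row in flags) / n for name in names}
    # Fraction of compounds passing all filters simultaneously.
    all_pass = sum(all(row[name] for name in names) for row in flags) / n
    return per_filter, all_pass
```

The reported all-filter rate follows the same arithmetic: 95 of 7,908 curated compounds is about 1.2%.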
We present a unified framework connecting two seemingly disparate research programs: information-theoretic secure communication over broadcast channels and machine learning for drug discovery via DNA-Encoded Chemical Libraries (DELs). Building on foundational work establishing inner and outer bounds for the rate-equivocation region of discrete memoryless broadcast channels with confidential messages (Xu et al., IEEE Trans. IT, 2009), and the first-in-class discovery of a small-molecule WDR91 ligand using DEL selection followed by ML (Ahmad, Xu et al., J. Med. Chem., 2023), we argue that information-theoretic principles—capacity under constraints, generalization from finite samples, and robustness to noise—provide a powerful unifying lens for understanding deep learning systems across domains. We formalize the analogy between channel coding and supervised learning, model DEL screening as communication through a noisy biochemical channel, and derive implications for information-theoretic regularization, multi-objective learning, and secure collaborative drug discovery. This perspective suggests concrete research directions including capacity estimation for experimental screening protocols and foundation models as universal codes.
Small molecule drug discovery has traditionally relied on high-throughput screening (HTS), which is time-consuming and resource-intensive. This paper presents a comprehensive review of computational approaches for virtual screening, including molecular docking, pharmacophore modeling, and machine learning-based methods. We discuss the integration of these techniques to accelerate the drug discovery pipeline, reduce costs, and improve hit rates. Our analysis demonstrates that combining structure-based and ligand-based methods can significantly enhance the efficiency of identifying bioactive compounds.