From years to days: St. Petersburg scientists use AI to speed up cancer drug discovery

St. Petersburg researchers built a configurable AI platform that ranks drug candidates in days rather than years. It unifies multimodal data and models in one place and offers clear next steps engineering teams can reuse.

Published on: Nov 14, 2025

AI from St. Petersburg labs is speeding up drug development: here's what engineering teams can take from it

At a press event tied to the Priority-2030 program (launched in 2021 and now covering 141 Russian universities), researchers from St. Petersburg shared how they're using AI as a practical tool for cross-industry problems, drug discovery included.

The core idea: build a configurable platform for multimodal data and deliver predictive and prescriptive analytics across domains. In their words, it's a "constructor" of solutions: assemble methods, validate them on real tasks, and ship them as a single product.

A configurable platform for multimodal analytics

Peter the Great St. Petersburg Polytechnic University (SPbPU) is developing a digital platform that unifies methods under one roof: ingest, model, evaluate, explain, and recommend. The emphasis is not just on forecasting what will happen, but also on explaining why and what to do next.

  • Single entry point for different data types (text, tabular, chemical structures, imaging, etc.).
  • Model catalog that can be adapted per sector: industry, medicine, and beyond (a sketch of the idea follows this list).
  • Task-first validation: test on concrete problems, prove accuracy, then deliver as a packaged tool.
  • Prescriptive analytics: tie model outputs to recommended actions and plausible interventions.
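
The press materials don't describe the platform's internals, so the sketch below is purely illustrative of the "constructor" idea, not the SPbPU design; every name in it is an assumption. The pattern is a shared registry of model factories keyed by modality and task, assembled from a declarative spec:

```python
from typing import Callable, Dict, Any

# Hypothetical "constructor"-style catalog: components are registered per
# modality/task and assembled into a model on demand from a task spec.
REGISTRY: Dict[str, Callable[..., Any]] = {}

def register(name: str):
    """Decorator that adds a component factory to the shared catalog."""
    def wrap(factory: Callable[..., Any]):
        REGISTRY[name] = factory
        return factory
    return wrap

@register("tabular/gradient_boosting")
def make_tabular_baseline(**params):
    from sklearn.ensemble import HistGradientBoostingClassifier
    return HistGradientBoostingClassifier(**params)

@register("text/tfidf_logreg")
def make_text_baseline(**params):
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    return make_pipeline(TfidfVectorizer(), LogisticRegression(**params))

def assemble(task_spec: dict):
    """Build a model from a declarative task description."""
    return REGISTRY[task_spec["component"]](**task_spec.get("params", {}))

model = assemble({"component": "tabular/gradient_boosting", "params": {"max_depth": 6}})
```

A real catalog would hang data adapters, evaluation harnesses, and explainability hooks off the same interface so that swapping sectors means swapping specs, not rewriting pipelines.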

Drug discovery use case: from years to days

A team led by Aleksandr Timin (SPbPU) is applying this approach to oncology. They built a database of 100,000+ aminothiophene-based structures and use a pre-trained neural network to rank the most promising compounds for synthesis and preclinical testing.

What used to take years of trial-and-error now compresses to days of model-driven triage. The lab then synthesizes short-listed molecules and feeds results back into the system to sharpen future screens.

  • Data curation: clean structures, standardize formats (e.g., SMILES), and attach assay labels where available.
  • Model screening: run pre-trained models to score efficacy, toxicity, and selectivity proxies (the triage step is sketched after this list).
  • Downstream filters: uncertainty estimates, diversity selection, synthetic feasibility, and cost constraints.
  • Lab handoff: generate a compact, testable batch; close the loop with active learning after assays.
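
As an illustration of the triage step, not the team's actual code, the sketch below assumes upstream models have already produced activity scores, uncertainty estimates, and feasibility flags (all stubbed here), and simply filters and ranks them into a lab-sized batch:

```python
import numpy as np

# Hypothetical triage: keep confident, synthesizable candidates and rank them.
rng = np.random.default_rng(0)
n = 100_000
scores = rng.random(n)          # predicted activity proxy (higher is better)
uncertainty = rng.random(n)     # e.g. ensemble spread or MC-dropout variance
feasible = rng.random(n) > 0.3  # synthetic-feasibility / cost filter

mask = (uncertainty < 0.5) & feasible
ranked = np.argsort(-scores[mask])             # best candidates first
shortlist = np.flatnonzero(mask)[ranked][:96]  # e.g. one assay plate
print(f"{len(shortlist)} compounds short-listed out of {n}")
```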

What this means for engineering teams

This is platform thinking applied to AI. Rather than one-off models, you build shared components (data adapters, evaluation harnesses, explainability, and decision layers) and retarget them across domains.

  • Prioritize reproducibility: version datasets, features, models, and evaluation pipelines.
  • Mix methods pragmatically: graph models for molecules, Transformers for sequences and text, gradient boosting for tabular baselines.
  • Track business metrics, not just AUC: cycle time to candidate, lab throughput, and validation yield.
  • Close the loop: route experimental results back to training for active learning and drift control (see the sketch after this list).
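
A minimal sketch of that close-the-loop item, assuming a scikit-learn classifier and a pool of unscored candidates (both hypothetical): retrain on returned assay labels, then pick the most uncertain remaining compounds for the next round.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

# Illustrative feedback loop (not the SPbPU pipeline): fold new assay labels
# into the training pool and retrain before the next screen.
def update_model(X_train, y_train, X_assayed, y_assayed):
    X = np.vstack([X_train, X_assayed])
    y = np.concatenate([y_train, y_assayed])
    model = HistGradientBoostingClassifier()
    model.fit(X, y)
    return model, X, y

def select_next_batch(model, X_pool, batch_size=96):
    """Uncertainty sampling: assay the compounds closest to the decision boundary."""
    proba = model.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)
    return np.argsort(uncertainty)[:batch_size]
```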

Practical implementation notes

  • Data layer: schema for multimodal assets, lineage, and consent/usage constraints.
  • Feature store + embeddings: molecular fingerprints, graph embeddings, learned representations (a fingerprint sketch follows this list).
  • Model ops: containerize, run batch and on-demand scoring, enforce CI on metrics and fairness checks.
  • Explainability: per-sample rationales (substructures, fragments), confidence scores, counterfactuals when possible.
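
For the fingerprint item above, here is a minimal featurization sketch, assuming RDKit is available; Morgan (ECFP-like) bit vectors are a common baseline representation before moving to graph embeddings or learned ones.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def featurize(smiles_list, radius=2, n_bits=2048):
    """Morgan fingerprints as a dense numpy matrix; invalid SMILES are skipped."""
    rows = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
        arr = np.zeros((n_bits,))
        DataStructs.ConvertToNumpyArray(fp, arr)
        rows.append(arr)
    return np.array(rows)

# Example inputs: 2-aminothiophene and ethanol (illustrative only)
X = featurize(["Nc1cccs1", "CCO"])
```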

Validation, risk, and compliance

Clinical contexts raise the bar. Log decisions, justify thresholds, and plan for audits. Align with regulatory thinking early: documentation, traceability, and human oversight are not optional.
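
The field names below are assumptions rather than any regulatory schema, but a decision log along these lines makes thresholds and sign-offs auditable after the fact.

```python
import json
import datetime

def log_decision(path, compound_id, score, threshold, model_version, dataset_hash, reviewer):
    """Append one JSON-lines audit record per model-assisted decision."""
    record = {
        "timestamp": datetime.datetime.utcnow().isoformat() + "Z",
        "compound_id": compound_id,
        "score": score,
        "threshold": threshold,
        "decision": "advance" if score >= threshold else "hold",
        "model_version": model_version,
        "dataset_hash": dataset_hash,
        "reviewer": reviewer,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```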

For context on regulator expectations, see the FDA's perspective on AI/ML in drug development: FDA: AI/ML in Drug Development.

Why this approach is gaining traction

AI is shifting from experiment to routine tool because the feedback loops are shorter and the infrastructure is reusable. Prove accuracy on a real task, package it, and move to the next adjacent problem.

The St. Petersburg work shows how a single platform can support discovery workflows while remaining adaptable to other industries-without rebuilding the stack every time.

Next steps for developers

  • Define your target decision: what gets prioritized, funded, or synthesized based on model output.
  • Stand up a minimal pipeline: ingestion → baseline model → evaluation → report → human-in-the-loop (a skeleton follows this list).
  • Add uncertainty + explainability before scaling to production.
  • Integrate an active learning loop once you have lab or real-world feedback.
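
A skeleton of that minimal pipeline, on synthetic data and with hypothetical names throughout: ingest, train a baseline, evaluate, and emit a short report for human review before anything is promoted.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def run_pipeline(X, y):
    """Ingest -> baseline -> evaluation -> report -> human-in-the-loop gate."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
    model = HistGradientBoostingClassifier().fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    report = {"n_train": len(X_tr), "n_test": len(X_te), "auc": round(auc, 3)}
    print("For human review before promotion:", report)
    return model, report

# Synthetic stand-in for real ingested data
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 16))
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)
model, report = run_pipeline(X, y)
```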

Level up your skills

If you're building similar systems (MLOps, LLM tooling, and data-centric workflows), ongoing practice matters. You can explore role-based learning tracks here: AI courses by job.

