AI from St. Petersburg labs is speeding up drug development - here's what engineering teams can take from it
At a press event tied to the Priority-2030 program (launched in 2021 and now covering 141 Russian universities), researchers from St. Petersburg shared how they're using AI as a practical tool for cross-industry problems, drug discovery included.
The core idea: build a configurable platform for multimodal data and deliver predictive and prescriptive analytics across domains. In their words, it's a "constructor" of solutions: assemble methods, validate them on real tasks, and ship them as a single product.
A configurable platform for multimodal analytics
SPbPU is developing a digital platform that unifies methods under one roof: ingest, model, evaluate, explain, and recommend. The emphasis is not just on forecasting what will happen, but also on explaining why and recommending what to do next. A minimal sketch of the catalog idea follows the list below.
- Single entry point for different data types (text, tabular, chemical structures, imaging, etc.).
- Model catalog that can be adapted per sector-industry, medicine, and beyond.
- Task-first validation: test on concrete problems, prove accuracy, then deliver as a packaged tool.
- Prescriptive analytics: tie model outputs to recommended actions and plausible interventions.
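To make the "constructor" idea concrete, here's a minimal sketch of a modality-keyed model catalog. All names (`ModelEntry`, `CATALOG`, `run_task`) are hypothetical; the SPbPU platform's actual API hasn't been published.

```python
# Hypothetical sketch of a "constructor"-style model catalog.
# Names are illustrative, not the SPbPU platform's API.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ModelEntry:
    name: str
    modality: str               # "text", "tabular", "molecule", "image"
    predict: Callable[[Any], Any]
    explain: Callable[[Any], str]

CATALOG: dict[str, list[ModelEntry]] = {}

def register(entry: ModelEntry) -> None:
    """Add a model to the catalog, keyed by the data modality it handles."""
    CATALOG.setdefault(entry.modality, []).append(entry)

def run_task(modality: str, data: Any) -> list[dict]:
    """Run every registered model for a modality; return prediction + rationale."""
    return [
        {"model": m.name, "prediction": m.predict(data), "why": m.explain(data)}
        for m in CATALOG.get(modality, [])
    ]
```

The point is the shape, not the code: registration decouples method authors from task owners, so the same catalog can serve industry, medicine, and the next adjacent domain.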
Drug discovery use case: from years to days
A team led by Aleksandr Timin (SPbPU) is applying this approach to oncology. They built a database of 100,000+ aminothiophene-based structures and use a pre-trained neural network to rank the most promising compounds for synthesis and preclinical testing.
What used to take years of trial-and-error now compresses to days of model-driven triage. The lab then synthesizes short-listed molecules and feeds results back into the system to sharpen future screens.
- Data curation: clean structures, standardize formats (e.g., SMILES), and attach assay labels where available.
- Model screening: run pre-trained models to score efficacy, toxicity, and selectivity proxies.
- Downstream filters: uncertainty estimates, diversity selection, synthetic feasibility, and cost constraints.
- Lab handoff: generate a compact, testable batch; close the loop with active learning after assays (a rough code sketch of this flow follows the list).
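Here's a rough Python sketch of that screening flow, under stated assumptions: the ensemble of scoring functions stands in for the team's pre-trained network, whose details aren't public, and the thresholds are placeholders.

```python
# Minimal screening sketch: standardize structures, score, keep a testable batch.
# The scoring models are stand-ins for the team's pre-trained network.
# Requires: pip install rdkit numpy
import numpy as np
from rdkit import Chem

def standardize(smiles_list):
    """Canonicalize SMILES and drop structures RDKit can't parse."""
    out = []
    for s in smiles_list:
        mol = Chem.MolFromSmiles(s)
        if mol is not None:
            out.append(Chem.MolToSmiles(mol))  # canonical form
    return out

def score_with_uncertainty(smiles, models):
    """Score each compound with an ensemble; std-dev acts as a crude uncertainty."""
    preds = np.array([[m(s) for s in smiles] for m in models])  # (n_models, n_cpds)
    return preds.mean(axis=0), preds.std(axis=0)

def shortlist(smiles, mean, std, k=50, max_std=0.2):
    """Keep confident, high-scoring compounds; cap the batch size for the lab."""
    keep = [(s, m) for s, m, sd in zip(smiles, mean, std) if sd <= max_std]
    return [s for s, _ in sorted(keep, key=lambda t: -t[1])[:k]]
```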
What this means for engineering teams
This is platform thinking applied to AI. Rather than one-off models, you build shared components - data adapters, evaluation harnesses, explainability, and decision layers - then retarget them across domains. A reproducibility sketch follows the checklist below.
- Prioritize reproducibility: version datasets, features, models, and evaluation pipelines.
- Mix methods pragmatically: graph models for molecules, Transformers for sequences and text, gradient boosting for tabular baselines.
- Track business metrics, not just AUC: cycle time to candidate, lab throughput, and validation yield.
- Close the loop: route experimental results back to training for active learning and drift control.
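On the reproducibility point, one lightweight pattern is a run manifest that hashes the exact dataset and records parameters alongside business metrics. File names and fields below are illustrative, not a standard.

```python
# One way to pin a run: hash the exact data and config that produced a model.
import hashlib
import json
import pathlib

def sha256_of(path: str) -> str:
    """Content hash of a dataset file, for lineage records."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def run_manifest(dataset_path: str, params: dict, metrics: dict) -> str:
    """Serialize everything needed to reproduce or audit a training run."""
    manifest = {
        "dataset_sha256": sha256_of(dataset_path),
        "params": params,
        "metrics": metrics,  # include business metrics, not just AUC
    }
    return json.dumps(manifest, indent=2, sort_keys=True)

# Usage (illustrative file and fields):
# run_manifest("compounds.csv", {"model": "gbm", "depth": 6},
#              {"auc": 0.87, "cycle_time_days": 4})
```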
Practical implementation notes
- Data layer: schema for multimodal assets, lineage, and consent/usage constraints.
- Feature store + embeddings: molecular fingerprints, graph embeddings, learned representations (see the fingerprint sketch after this list).
- Model ops: containerize, run batch and on-demand scoring, enforce CI on metrics and fairness checks.
- Explainability: per-sample rationales (substructures, fragments), confidence scores, counterfactuals when possible.
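For the feature layer, Morgan (ECFP-style) fingerprints from RDKit are a common baseline embedding for molecules. The example molecule below is just an illustrative thiophene derivative, not one of the team's compounds.

```python
# Feature-layer sketch: Morgan fingerprints as a baseline molecular embedding.
# Requires: pip install rdkit numpy
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def morgan_fingerprint(smiles: str, radius: int = 2, n_bits: int = 2048):
    """Return a fixed-length bit vector for a molecule, or None if unparseable."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp)  # plugs into gradient boosting or a feature store

vec = morgan_fingerprint("c1ccsc1CC(=O)N")  # illustrative thiophene derivative
```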
Validation, risk, and compliance
Clinical contexts raise the bar. Log decisions, justify thresholds, and plan for audits. Align with regulatory thinking early: documentation, traceability, and human oversight are not optional.
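A minimal pattern for "log decisions" is an append-only JSONL record of the score, the threshold applied, and the human who signed off. Fields here are illustrative, not a regulatory template.

```python
# Sketch of an append-only decision log for auditability; fields are illustrative.
import json
import time

def log_decision(logf, compound_id: str, score: float, threshold: float, reviewer: str):
    """Record what the model said, the threshold applied, and who signed off."""
    record = {
        "ts": time.time(),
        "compound": compound_id,
        "score": score,
        "threshold": threshold,
        "advanced": score >= threshold,
        "human_reviewer": reviewer,  # human oversight, logged per decision
    }
    logf.write(json.dumps(record) + "\n")

with open("decisions.jsonl", "a") as f:
    log_decision(f, "CMPD-0042", 0.91, 0.85, "a.chemist")
```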
For context on regulator expectations, see FDA: AI/ML in Drug Development.
Why this approach is gaining traction
AI is shifting from experiment to routine tool because the feedback loops are shorter and the infrastructure is reusable. Prove accuracy on a real task, package it, and move to the next adjacent problem.
The St. Petersburg work shows how a single platform can support discovery workflows while remaining adaptable to other industries-without rebuilding the stack every time.
Next steps for developers
- Define your target decision: what gets prioritized, funded, or synthesized based on model output.
- Stand up a minimal pipeline: ingestion → baseline model → evaluation → report → human-in-the-loop.
- Add uncertainty + explainability before scaling to production.
- Integrate an active learning loop once you have lab or real-world feedback (a minimal loop is sketched below).
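A minimal active-learning loop via uncertainty sampling, assuming `train` and `assay` are your model-fit and wet-lab steps (both placeholders):

```python
# Minimal active-learning loop: send the most uncertain candidates to the lab,
# retrain on the returned labels. `train` and `assay` are placeholders.
import numpy as np

def uncertainty_sampling(pool_X, models, batch_size=16):
    """Select pool items where an ensemble disagrees most (highest std-dev)."""
    preds = np.stack([m.predict(pool_X) for m in models])  # (n_models, n_pool)
    return np.argsort(preds.std(axis=0))[-batch_size:]

def active_loop(pool_X, train, assay, rounds=5):
    """Alternate querying the lab and retraining; queried items are not
    removed from the pool here, for brevity."""
    X_lab, y_lab = [], []
    models = train(X_lab, y_lab)          # initial (possibly pretrained) models
    for _ in range(rounds):
        idx = uncertainty_sampling(pool_X, models)
        X_new = pool_X[idx]
        y_new = assay(X_new)              # real-world feedback from the lab
        X_lab.extend(X_new)
        y_lab.extend(y_new)
        models = train(X_lab, y_lab)      # close the loop
    return models
```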
Level up your skills
If you're building similar systems - MLOps, LLM tooling, and data-centric workflows - ongoing practice matters. You can explore role-based learning tracks here: AI courses by job.