SciSciGPT: a practical AI collaborator for the science of science
Science is producing more data, more methods, and more specialization than any one person can hold in their head. That's good for progress overall, but it slows individual projects down and raises the barrier to starting new ones. SciSciGPT is an open-source prototype AI collaborator built to help researchers work through data-heavy projects in the science of science: faster, with clearer provenance, and fewer technical bottlenecks.
Think of it as a structured assistant that reads the literature, writes queries, runs analyses, and critiques its own outputs. You stay in control of questions, assumptions, and interpretation. It handles the tedious steps and gives you a reproducible trail.
What it does
SciSciGPT is a multi-agent system that mirrors how strong research teams operate. Each specialist focuses on a core job, coordinated by a manager that plans the workflow and keeps things on track.
- ResearchManager: plans the project, breaks the prompt into steps, assigns tasks, and synthesizes results.
- LiteratureSpecialist: finds, reads, and organizes relevant science-of-science papers and extracts key points.
- DatabaseSpecialist: cleans and queries large scholarly datasets, standardizes entities, and outputs structured data.
- AnalyticsSpecialist: runs statistical analysis, builds visuals, and iterates based on feedback.
- EvaluationSpecialist: evaluates methods, visuals, and outputs, scores quality, and suggests concrete improvements.
The system works conversationally: you ask a question, it plans, executes, checks, and refines. It's modular, so data sources and methods can be swapped as needs change.
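To make that loop concrete, here is a minimal sketch of the plan-execute-evaluate cycle in Python. The class names, scoring threshold, and retry count are illustrative assumptions, not SciSciGPT's actual API.

```python
# Hypothetical sketch of the manager/specialist loop described above.
# Names like ResearchManager and Task are illustrative, not the real SciSciGPT API.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    specialist: str                 # e.g. "database", "analytics"
    instruction: str
    result: str | None = None
    score: float | None = None

@dataclass
class ResearchManager:
    specialists: dict[str, Callable[[str], str]]   # each specialist is a callable tool
    evaluate: Callable[[Task], float]              # stand-in for the EvaluationSpecialist
    threshold: float = 0.8                         # assumed quality bar
    max_refinements: int = 3                       # bounded refinement cycles

    def run(self, plan: list[Task]) -> list[Task]:
        for task in plan:
            for _ in range(self.max_refinements):
                task.result = self.specialists[task.specialist](task.instruction)
                task.score = self.evaluate(task)
                if task.score >= self.threshold:
                    break                          # good enough, move to the next step
        return plan
```

In the real system each specialist is an LLM-backed agent with its own tools, but the control flow has the same shape: plan, delegate, evaluate, refine.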
Case study 1: collaboration among Ivy League universities
Prompt: "Generate a network for collaborations among Ivy League Universities between 2000 and 2020. Optimize colors and annotations."
The system scoped what was needed: acquire data, construct the network, visualize. The DatabaseSpecialist built the dataset with SQL, standardized institution names, filtered by year, computed co-authorship counts, and saved the result. The EvaluationSpecialist scored each step and flagged small fixes before moving on.
Then the AnalyticsSpecialist built a graph (NetworkX + Matplotlib), iterated on layout, labels, edge weights, and color. After two refinement cycles guided by the EvaluationSpecialist, the final figure clearly mapped collaboration intensity (edge weight) and institutional output (node size). From there, you can extend by field-level breakdowns or time-sliced views.
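To give a feel for the kind of code the AnalyticsSpecialist writes at this step, here is a hedged sketch with NetworkX and Matplotlib. The edge table and counts below are placeholders, not the actual query output.

```python
# Minimal sketch of the collaboration-network figure, assuming the
# DatabaseSpecialist has already produced an edge table. Column names and
# counts are placeholders, not real data.
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

edges = pd.DataFrame({
    "inst_a":   ["Harvard", "Yale", "Harvard"],
    "inst_b":   ["Yale", "Princeton", "Columbia"],
    "n_papers": [1200, 450, 800],          # placeholder co-authorship counts
})

G = nx.Graph()
for row in edges.itertuples():
    G.add_edge(row.inst_a, row.inst_b, weight=row.n_papers)

pos = nx.spring_layout(G, seed=42)                       # fixed seed for a reproducible layout
edge_widths = [G[u][v]["weight"] / 200 for u, v in G.edges()]
node_sizes = [G.degree(n, weight="weight") / 2 for n in G.nodes()]  # bigger node = more output

nx.draw_networkx(G, pos, width=edge_widths, node_size=node_sizes, font_size=8)
plt.axis("off")
plt.savefig("ivy_collab_network.png", dpi=300, bbox_inches="tight")
```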
Case study 2: replicating a team-size finding
Prompt: upload a figure from a well-known study on team size, then ask, "Interpret this figure. Redo the analysis using your database. Create a similar visualization."
The ResearchManager parsed the dual-axis structure and trends. The DatabaseSpecialist pulled data for millions of papers, including citations, disruption percentiles, and team sizes. With this, the AnalyticsSpecialist recreated the plot, estimated confidence intervals, and reported correlations and percentage changes by team size.
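A rough sketch of what that replication can look like, under stated assumptions: the Parquet extract and the column names (team_size, citations, disruption_pct) are hypothetical stand-ins for the table the DatabaseSpecialist returns.

```python
# Hedged sketch of the team-size replication. The Parquet extract and column
# names (team_size, citations, disruption_pct) are assumptions, not the real schema.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def bootstrap_ci(values: np.ndarray, n_boot: int = 1000, alpha: float = 0.05) -> np.ndarray:
    """Percentile bootstrap confidence interval for the mean."""
    rng = np.random.default_rng(0)
    boot_means = rng.choice(values, size=(n_boot, len(values)), replace=True).mean(axis=1)
    return np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])

papers = pd.read_parquet("papers_team_size.parquet")      # hypothetical extract
sizes = sorted(papers["team_size"].unique())
cites = [papers.loc[papers["team_size"] == s, "citations"].to_numpy() for s in sizes]
means = [c.mean() for c in cites]
yerr = np.array([np.abs(bootstrap_ci(c) - c.mean()) for c in cites]).T   # (2, n) for errorbar

fig, ax1 = plt.subplots()
ax1.errorbar(sizes, means, yerr=yerr, color="tab:blue", label="Mean citations")
ax1.set_xlabel("Team size")
ax1.set_ylabel("Citations (mean, 95% CI)")

ax2 = ax1.twinx()                                         # second axis for disruption
ax2.plot(sizes, papers.groupby("team_size")["disruption_pct"].mean(), color="tab:red")
ax2.set_ylabel("Disruption percentile (mean)")

fig.tight_layout()
fig.savefig("team_size_replication.png", dpi=300)
```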
Want more? Ask for OLS with controls, propensity score matching, or for computing disruption scores at runtime instead of using precomputed fields. The system will document the exact choices and steps for review.
A roadmap for agent capability
The team behind SciSciGPT proposes a practical maturity model for LLM agents used in research. It's a useful checklist when you assess or build similar systems:
- Functional capabilities: tool use for literature, data processing, and statistical methods.
- Workflow orchestration: planning, task decomposition, and reflective feedback loops.
- Memory architecture: keep only what matters across steps to stay focused and efficient.
- Human-AI collaboration: conversational workflows that keep researchers in control.
SciSciGPT implements core pieces across all four levels and leaves room for deeper reasoning and domain extensions.
Early evidence: faster cycles, credible outputs
In a small pilot, SciSciGPT was tested against three researchers (predoc, PhD, postdoc) using their normal workflows and coding assistants. On the same tasks, SciSciGPT finished in roughly one-tenth of the average time and received higher ratings for effectiveness, technical soundness, analytical depth, visualization, and documentation.
There are caveats: a small sample, time constraints for participants, and differences in personal preferences for methods. Reviewers also noted that the documentation was long. In response, the system interface now supports collapsible logs: complete provenance when you want it, less screen clutter by default.
Limits and what broke
- Occasional unnecessary downsampling (e.g., a stray LIMIT clause in generated SQL).
- Coordination issues: if data extraction falls short, analysis can suffer.
- Method preferences: some experts wanted different models or field conventions (e.g., exponential random graph models, or ERGMs).
- Nondeterminism: identical prompts can yield small variations; tighter prompts reduce drift.
Interestingly, some variability is useful. Running a prompt multiple ways can surface alternative analytical paths you might want to compare before locking in a final approach.
Data and infrastructure
SciSciGPT connects to large scholarly data lakes and a dedicated literature corpus. It has been integrated with a structured subset of SciSciNet and can work with sources like OpenAlex. Queries run in a relational setup (e.g., BigQuery), and the system brings back clean tables ready for analysis and plotting.
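As an illustration of that query step, the snippet below uses the BigQuery Python client. The project id and table path are placeholders, not the actual SciSciNet schema.

```python
# Hedged sketch of a warehouse query that returns a clean table ready for analysis.
# The project id and table name are placeholders, not the real deployment.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")             # hypothetical project id
sql = """
    SELECT year, COUNT(*) AS n_papers
    FROM `my-project.sciscinet.papers`                      -- placeholder table
    WHERE year BETWEEN 2000 AND 2020
    GROUP BY year
    ORDER BY year
"""
df = client.query(sql).to_dataframe()                       # tidy DataFrame, ready to plot
```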
The literature side (SciSciCorpus) parses PDFs into paragraphs, summarizes them, tags sections (methods, results, discussion), and stores embeddings for retrieval. This supports source-grounded answers and quick citation trails.
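A minimal sketch of how paragraph-level retrieval like this can work, assuming an off-the-shelf embedding model and an in-memory index; the model choice and corpus entries are illustrative, not necessarily what SciSciCorpus uses.

```python
# Illustrative paragraph retrieval over pre-parsed text. The embedding model and
# the two placeholder paragraphs are assumptions, not the actual SciSciCorpus stack.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

paragraphs = [
    {"text": "Large teams tend to develop existing ideas.", "section": "results"},
    {"text": "We measure disruption with a citation-based index.", "section": "methods"},
]  # placeholder corpus entries

emb = model.encode([p["text"] for p in paragraphs], normalize_embeddings=True)

def retrieve(query: str, k: int = 5):
    """Return the k paragraphs most similar to the query (cosine similarity)."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = emb @ q                          # vectors are normalized, so dot = cosine
    top = np.argsort(-scores)[:k]
    return [(float(scores[i]), paragraphs[i]) for i in top]

print(retrieve("How does team size relate to disruption?", k=1))
```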
Ethics and practical concerns
As AI takes on more of the mechanical work, credit and authorship norms need clarity. Early-career researchers still need to build core analytical skills, so blind outsourcing is a risk. Access may be uneven across institutions.
The simplest guardrails help: clear documentation of every step, explicit sign-off by human authors, and shared guidelines on what counts as AI assistance versus intellectual contribution. Treat the system like a capable assistant: verify important steps and keep ownership of decisions.
Where this goes next
Expect stronger reasoning, better retrieval, wider data coverage, and easier user data imports. Interface improvements will keep the "show your work" ethos while staying readable. Because it's open-source, the research community can extend it to other data-heavy fields with their own data sources and methods.
How to get value right now
- Use it for early scoping: ask for 2-3 candidate approaches, then pick one to push.
- Be explicit about datasets, time windows, and metrics to reduce variance.
- Ask the system to list key assumptions and possible failure modes before running.
- Lock methods with a short checklist (sampling frame, filters, models, CIs) and reuse it across runs; see the sketch after this list.
- Keep the critique loop on: ask it to grade its plots and suggest improvements, then rerun.
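To make the checklist idea concrete, here is one lightweight way to pin those choices down so you can paste the same spec into every prompt; the fields and defaults are illustrative, not a SciSciGPT feature.

```python
# A small, illustrative spec for locking analysis choices across runs.
# Every default below is an assumption you would replace with your own.
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class AnalysisSpec:
    sampling_frame: str = "journal articles, 2000-2020, >= 1 citation"
    filters: tuple[str, ...] = ("exclude retracted papers",)
    model: str = "OLS with year and field fixed effects"
    ci_method: str = "bootstrap, 1000 resamples, 95%"

spec = AnalysisSpec()
print(json.dumps(asdict(spec), indent=2))     # paste the JSON into each prompt
```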
The end goal is simple: spend less time wrangling, more time asking sharper questions and stress-testing answers.