Decades of Jane Goodall's chimpanzee notes go digital as ASU's AI makes them searchable

ASU and the Jane Goodall Institute are turning 60+ years of Gombe chimp notes into a searchable database with AI. Computer vision, LLMs, and human checks keep it accurate.

Categorized in: AI News, Science and Research
Published on: Feb 17, 2026

Turning 60+ years of Gombe chimpanzee field notes into searchable data with AI

Arizona State University is helping carry forward the work started by Jane Goodall by bringing decades of Gombe Stream Research Center records into a usable digital format. In 2022, the Jane Goodall Institute partnered with ASU primatologist Ian Gilby to host more than six decades of handwritten and digital data at the Institute of Human Origins. The archive covers daily observations of wild chimpanzees in Gombe National Park, ecological context and related artifacts, protected in fire- and waterproof cabinets funded by donors.

The goal is simple: take a mountain of handwritten Tiki sheets and turn them into structured datasets that researchers can query, analyze and link to video and geospatial records. The outcome feeds into the Jane Goodall Institute's new Gombe AI Research Platform.

From Tiki sheets to a living database

Gombe researchers have logged daily records for over 60 years on a checkbox-style form called a Tiki sheet. Each sheet tracks one "focal" chimpanzee across a day: subgroup entries and exits, feeding bouts, species encounters and more. Hundreds of thousands of these records now live in ASU's Gombe Research Archive.

Manual data entry was the bottleneck. In fall 2025, Gilby partnered with ASU Enterprise Technology's AI Acceleration team. Senior AI engineer Krishna Sriharsha Gundu built a pipeline that combines computer vision with language models to read the sheets. The team calls it "Gombe AI."

How the pipeline works

Starting from scanned images, computer vision straightens pages, locates form fields and extracts checkbox and numeric entries, outputting clean rows and columns. Those tables flow into a relational database for analysis.
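
The team has not published the Gombe AI code, but the field-extraction step it describes follows a familiar pattern. Here is a minimal sketch of that pattern in Python, assuming OpenCV, a scan already straightened to match a known form template, and a hypothetical dictionary of checkbox positions; the field names and ink threshold are illustrative, not the project's.

```python
# Minimal sketch of the form-field extraction step, not the project's actual code.
# Assumes: OpenCV + NumPy, a scan already deskewed to a known template, and a
# hypothetical dictionary of checkbox positions measured from that template.
import cv2
import numpy as np

# Hypothetical template layout: field name -> (x, y, width, height) in pixels.
TEMPLATE_BOXES = {
    "feeding_bout": (120, 340, 28, 28),
    "subgroup_arrival": (120, 380, 28, 28),
    "subgroup_departure": (120, 420, 28, 28),
}
INK_THRESHOLD = 0.15  # fraction of dark pixels that counts as "checked"; tune per scanner


def extract_checkboxes(image_path: str) -> dict[str, bool]:
    """Return {field: checked?} based on ink density inside each template box."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    # Otsu thresholding separates ink (white in the inverted binary image) from paper.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    results = {}
    for field, (x, y, w, h) in TEMPLATE_BOXES.items():
        roi = binary[y:y + h, x:x + w]
        ink_fraction = float(np.count_nonzero(roi)) / roi.size
        results[field] = ink_fraction > INK_THRESHOLD
    return results


if __name__ == "__main__":
    # One row per sheet; in practice these rows would be keyed by sheet ID and date
    # before being written into the relational database.
    print(extract_checkboxes("tiki_sheet_scan.png"))
```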

Margin notes are handled differently. The team layers in large language models to interpret handwritten text, abbreviations and context-specific symbols. Gundu developed the extraction code, while Joesh Jhaj, a data science undergraduate and student researcher, handles translation and interpretation, turning to the GPT API when handwriting or symbols are unclear.
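
The article doesn't detail the prompts or model calls behind that escalation step, but a common pattern is to send the ambiguous transcript to a model with a tight schema and a few domain conventions spelled out. Below is a minimal sketch using the OpenAI Python SDK; the prompt wording, model choice, symbol conventions and JSON fields are assumptions for illustration, not the Gombe team's actual setup.

```python
# Sketch of escalating an unclear margin note to an LLM with a tight JSON schema.
# Not the project's code; prompt wording, model name, and fields are illustrative.
import json
from openai import OpenAI  # official OpenAI Python SDK

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = """You interpret handwritten margin notes from chimpanzee field sheets.
Known conventions (examples only, not the real Gombe symbol set):
  "arr" = arrival of a chimp into the focal subgroup
  "dep" = departure from the focal subgroup
Reply ONLY with JSON: {"interpretation": str, "confidence": "high"|"medium"|"low"}."""


def interpret_margin_note(ocr_text: str) -> dict:
    """Ask the model for a structured interpretation of an ambiguous OCR transcript."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Margin note transcript: {ocr_text!r}"},
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)


if __name__ == "__main__":
    print(interpret_margin_note("FD arr 1430 w/ infant?"))
```

Low-confidence interpretations from a step like this are exactly the items that get routed to the human checks described next.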

Nothing is accepted blindly. Students working with the archive compare the digitized output against original sheets to verify accuracy before the data is committed to the database.
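
The article doesn't specify how that verification gate is implemented. One simple way to stage it, assuming each extracted record carries a confidence score, is to commit only high-confidence rows and park the rest in a review queue alongside a pointer to the original scan; the table names, threshold and fields below are hypothetical.

```python
# Sketch of a review gate: only high-confidence rows go straight into the database;
# everything else is queued for a human to compare against the original scan.
# Table names, threshold, and record fields are hypothetical.
import sqlite3

REVIEW_THRESHOLD = 0.9


def commit_or_queue(conn: sqlite3.Connection, record: dict) -> None:
    """Insert confident extractions; route uncertain ones to a review queue."""
    if record["confidence"] >= REVIEW_THRESHOLD:
        conn.execute(
            "INSERT INTO observations (sheet_id, field, value, confidence) VALUES (?, ?, ?, ?)",
            (record["sheet_id"], record["field"], record["value"], record["confidence"]),
        )
    else:
        conn.execute(
            "INSERT INTO review_queue (sheet_id, field, value, confidence, scan_path) "
            "VALUES (?, ?, ?, ?, ?)",
            (record["sheet_id"], record["field"], record["value"],
             record["confidence"], record["scan_path"]),
        )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observations (sheet_id, field, value, confidence)")
    conn.execute("CREATE TABLE review_queue (sheet_id, field, value, confidence, scan_path)")
    commit_or_queue(conn, {"sheet_id": "1987-06-14-FD", "field": "feeding_bout",
                           "value": True, "confidence": 0.62, "scan_path": "scans/0001.png"})
    print(conn.execute("SELECT * FROM review_queue").fetchall())
```

Logging who reviewed each queued item and why it changed preserves the traceability that long-term datasets like Gombe's depend on.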

Why this matters for scientists and conservationists

Gilby's team sees clear scientific upside. Standardized, searchable records make it faster to test behavioral hypotheses, examine social networks, align behavioral events with ecological data and revisit classic findings with new methods. Better, cleaner data helps inform conservation strategies for an endangered species while strengthening links to questions about human origins.

The workflow also sets a pattern other long-term field projects can adopt: modernize legacy notes without losing the nuance that margin annotations often carry.

What other research teams can copy right now

  • Scanning: Aim for consistent, high-resolution scans; include color to preserve faint pencil marks and aging paper contrast.
  • Preprocessing: Auto-dewarp, deskew and denoise. Simple heuristics (borders, corner markers) can boost form detection accuracy.
  • Form extraction: Treat structured fields (checkboxes, times, counts) separately from free text. Save both raw images and parsed outputs.
  • Handwriting: Use a handwriting-aware OCR step; escalate ambiguous tokens to an LLM with a tight schema and examples.
  • Context rules: Encode domain conventions (e.g., symbols for arrivals/departures) so the model isn't guessing in a vacuum.
  • Human-in-the-loop: Require double-checks for low-confidence items; log edits and reasons for traceability.
  • Schema and versioning: Store parsed data in a relational model with controlled vocabularies; version both raw scans and processed tables.
  • Validation: Build unit tests on a gold-standard subset of sheets; track precision/recall for each field type over time (see the sketch after this list).
  • Governance: Document permissions, sensitive locations and researcher identifiers; apply access tiers if needed.
  • Integration: Plan from day one to link behavior logs with video, GPS and ecological data via shared IDs and timestamps.
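
To make the validation point concrete: one way to track extraction quality, assuming a small hand-verified set of sheets, is to compute precision and recall per field and watch those numbers as the pipeline changes. The sketch below uses made-up field names rather than the real Tiki sheet schema.

```python
# Sketch of per-field precision/recall against a hand-verified "gold" subset of sheets.
# Field names and data layout are illustrative, not the Gombe schema.
from collections import defaultdict


def field_precision_recall(gold: list[dict], predicted: list[dict]) -> dict[str, tuple[float, float]]:
    """gold/predicted: one dict of {field: bool} per sheet, aligned by position."""
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)
    for gold_sheet, pred_sheet in zip(gold, predicted):
        for field, truth in gold_sheet.items():
            pred = pred_sheet.get(field, False)
            if pred and truth:
                tp[field] += 1
            elif pred and not truth:
                fp[field] += 1
            elif truth and not pred:
                fn[field] += 1
    scores = {}
    for field in set(tp) | set(fp) | set(fn):
        precision = tp[field] / (tp[field] + fp[field]) if (tp[field] + fp[field]) else 1.0
        recall = tp[field] / (tp[field] + fn[field]) if (tp[field] + fn[field]) else 1.0
        scores[field] = (precision, recall)
    return scores


if __name__ == "__main__":
    gold = [{"feeding_bout": True, "subgroup_arrival": False}]
    pred = [{"feeding_bout": True, "subgroup_arrival": True}]
    print(field_precision_recall(gold, pred))
```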

The people and the platform

This work is a cross-campus effort. Gilby leads the scientific direction. Gundu engineers the vision and extraction stack. Jhaj builds translation logic and handles edge cases where handwriting or symbols need context to parse correctly. The output strengthens the Jane Goodall Institute's Gombe AI Research Platform and will make decades of observations far more accessible to current and future researchers.

As Gundu puts it, the real win is collaboration: domain experts setting the rules, engineers building to those rules and students pressure-testing the process against real-world data.

Learn more

For background on the long-running field program, see the Jane Goodall Institute's overview of the Gombe Stream Research Center.

If your lab is standing up similar pipelines and needs structured learning on applied AI and data workflows, explore our AI certification for data analysis.

