1.5 Million Plant Images Accelerate AI for Precision Agriculture

NC State and USDA built AgIR, an open-source repository of 1.5 million labeled plant images for farm-ready computer vision. The first release, including background-free cut-outs, lands on USDA SCINet.

Published on: Oct 10, 2025

N.C. Plant Sciences Initiative Researchers Share Images to Accelerate AI Use in Agriculture

An open-source plant image repository developed at NC State University and the USDA Agricultural Research Service is closing a major data gap for AI in agriculture. The Ag Image Repository (AgIR) holds 1.5 million plant photos with rich metadata across growth stages, environments and genotypes. The first release will land on the USDA's high-performance computing cluster, SCINet, as a step toward broad public access.

Beyond raw photos, the team is producing background-free "cut-outs" of plants. Current coverage includes 16 cover crop species, 38 weed species, and core cash crops such as corn, soybeans and cotton. These assets make it far easier to build reliable computer vision tools for field conditions.

Why this dataset matters for research

Farm fields are visually messy: light shifts, soils differ, leaves overlap, and varieties express traits differently across weather and management. As Alexander Allen notes, "A stop sign looks the same on the East Coast as it does on the West. But that's not always the case with a pea plant."

For computer vision to perform across sites and seasons, you need large, well-labeled, diverse image sets. AgIR delivers scale, consistent imaging, and contextual labels so models can learn plant identity, growth stage, stress signals and more under real agricultural variability.

How the images are built: BenchBots and a clean pipeline

The Precision Sustainable Agriculture team engineered three wheel-mounted BenchBots operating in "semi-field" environments in Beltsville, MD; at Texas A&M; and outside NC State's Plant Sciences Building. Each site arranges hundreds of potted plants in rows, while an overhead track system captures high-detail images three times per week throughout the plant life cycle.

Software handles segmentation (cut-outs), color correction and metadata attachment. The goal: make annotation reasonable at scale, so researchers get both volume and quality without months of manual cleanup.
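The article doesn't publish the pipeline's internals, but the two core steps it names (vegetation segmentation and color standardization) have well-known baselines. Below is a minimal sketch in Python, assuming OpenCV, an excess-green index for segmentation, and gray-world balancing as a stand-in for chart-based color correction; the file paths and threshold are illustrative, not AgIR's actual implementation.

```python
# Minimal cut-out pipeline sketch: gray-world color correction plus
# excess-green vegetation segmentation. Paths/thresholds are illustrative.
import cv2
import numpy as np

def gray_world_balance(img: np.ndarray) -> np.ndarray:
    """Scale each channel so its mean matches the global mean (gray-world)."""
    img = img.astype(np.float32)
    means = img.reshape(-1, 3).mean(axis=0)
    img *= means.mean() / means
    return np.clip(img, 0, 255).astype(np.uint8)

def excess_green_mask(img: np.ndarray, thresh: float = 20.0) -> np.ndarray:
    """Segment vegetation with the ExG index: 2G - R - B."""
    b, g, r = cv2.split(img.astype(np.float32))
    exg = 2 * g - r - b
    mask = (exg > thresh).astype(np.uint8) * 255
    # Remove speckle noise with a small morphological opening.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

img = cv2.imread("bench_image.jpg")            # hypothetical input frame
img = gray_world_balance(img)
mask = excess_green_mask(img)
cutout = cv2.bitwise_and(img, img, mask=mask)  # background-free cut-out
cv2.imwrite("cutout.png", cutout)
```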

From data to field impact

AgIR is aimed at researchers building tools for farmers, breeders and agronomists. Current and near-term uses include:

  • Weed, cover crop and cash crop identification for targeted spraying and mechanical control.
  • Growth stage detection to schedule operations and input timing.
  • Disease scoring and stress detection from visual cues for earlier interventions.
  • High-throughput phenotyping: fruit counting, canopy cover, biomass estimation and trait scoring (a canopy cover sketch follows this list).
  • Domain adaptation studies across sites, cultivars and seasons to improve generalization.
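Canopy cover, one of the phenotyping traits above, has an especially simple baseline: the fraction of vegetation pixels in an overhead plot image. Here is a hedged sketch reusing the excess-green index from the pipeline example; the threshold and file path are illustrative assumptions, not an AgIR-provided method.

```python
# Estimate canopy cover as the fraction of vegetation pixels in a plot image.
import cv2

def canopy_cover(img, thresh: float = 20.0) -> float:
    b, g, r = cv2.split(img.astype("float32"))
    exg = 2 * g - r - b        # excess-green vegetation index
    veg = exg > thresh         # boolean vegetation mask
    return float(veg.mean())   # fraction of the plot covered by canopy

plot = cv2.imread("plot_overhead.jpg")  # hypothetical overhead plot image
print(f"Canopy cover: {canopy_cover(plot):.1%}")
```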

As Chris Reberg-Horton puts it, "Smart equipment is available now to apply most of our inputs variably, but we have been stuck on creating enough intelligence to tell that equipment what to do. Computer vision is the technology that can do it."

Access and baselines

AgIR will first be accessible via USDA ARS SCINet, with plans to broaden availability. The team is also preparing open baselines so labs can fine-tune and benchmark without reinventing core pipelines, which is useful for students, small teams and applied projects.

Quick start for researchers

  • Review the metadata schema and image protocols to align your labeling strategy with AgIR fields.
  • Use cut-outs for data augmentation and synthetic scene generation, then fine-tune on full-scene images (see the compositing sketch below).
  • Split train/validation sets by site and season to measure true generalization (see the split sketch below).
  • Benchmark against provided baselines; report metrics that matter in-field, e.g. per-species precision at operational thresholds (see the precision sketch below).
  • Plan for on-equipment constraints early: quantization, pruning and image-resolution trade-offs (see the quantization sketch below).
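For the cut-out augmentation step, a common pattern is alpha-compositing cut-outs onto field backgrounds to generate synthetic training scenes. A minimal sketch, assuming RGBA cut-out files; the paths, scale range and plant count are illustrative, and AgIR's actual cut-out format may differ.

```python
# Paste RGBA plant cut-outs onto a field background at random positions.
import random
from PIL import Image

def composite_scene(background_path: str, cutout_paths: list[str],
                    n_plants: int = 10, seed: int = 0) -> Image.Image:
    rng = random.Random(seed)
    scene = Image.open(background_path).convert("RGB")
    w, h = scene.size
    for _ in range(n_plants):
        cutout = Image.open(rng.choice(cutout_paths)).convert("RGBA")
        scale = rng.uniform(0.5, 1.2)            # vary apparent plant size
        cw = int(cutout.width * scale)
        ch = int(cutout.height * scale)
        cutout = cutout.resize((cw, ch))
        x = rng.randint(0, max(w - cw, 1))
        y = rng.randint(0, max(h - ch, 1))
        scene.paste(cutout, (x, y), mask=cutout)  # alpha-aware paste
        # A real pipeline would also record bounding boxes as detection labels.
    return scene

scene = composite_scene("soil_background.jpg", ["weed_01.png", "soy_02.png"])
scene.save("synthetic_scene.jpg")
```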
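For site- and season-based splits, the key is to hold out entire groups rather than random images, so validation actually measures cross-site generalization. A sketch assuming a metadata table with "site" and "season" columns; those column names are hypothetical, not the confirmed AgIR schema.

```python
# Leave-one-site-and-season-out split driven by image metadata.
import pandas as pd

def split_by_group(meta: pd.DataFrame, holdout_site: str, holdout_season: str):
    """Hold out one site+season entirely so validation measures generalization."""
    val = (meta["site"] == holdout_site) & (meta["season"] == holdout_season)
    return meta[~val], meta[val]

meta = pd.read_csv("agir_metadata.csv")  # hypothetical metadata export
train_meta, val_meta = split_by_group(meta, "NC_State", "2024_summer")
print(len(train_meta), "train images,", len(val_meta), "held-out images")
```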
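For in-field metrics, one useful framing is precision per species counted only on predictions confident enough to trigger an action, mirroring how a sprayer controller consumes model output. A small, self-contained sketch:

```python
# Per-species precision at an operational confidence threshold.
import numpy as np

def per_species_precision(y_true, y_pred, confidences, threshold=0.8):
    """Precision per class, counting only predictions above the threshold."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acted = np.asarray(confidences) >= threshold  # predictions we'd act on
    out = {}
    for cls in np.unique(y_true):
        hits = acted & (y_pred == cls)            # confident calls for this class
        out[cls] = float((y_true[hits] == cls).mean()) if hits.any() else float("nan")
    return out

print(per_species_precision(
    y_true=[0, 0, 1, 1, 1], y_pred=[0, 1, 1, 1, 0],
    confidences=[0.9, 0.7, 0.95, 0.6, 0.85]))   # -> {0: 0.5, 1: 1.0}
```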
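And for on-equipment constraints, post-training quantization is a quick way to probe the model-size side of the trade-off. A sketch using PyTorch's dynamic quantization on a placeholder classifier head; real deployments often go through TFLite, ONNX Runtime or TensorRT instead, and conv backbones need static quantization rather than the dynamic API shown here.

```python
# Compare fp32 vs. int8-dynamic model size with PyTorch.
import os
import torch

# Placeholder classifier head; 38 outputs loosely echo the 38 weed species.
model = torch.nn.Sequential(
    torch.nn.Linear(1280, 512), torch.nn.ReLU(), torch.nn.Linear(512, 38)
).eval()

# Dynamic quantization rewrites only the listed module types (Linear here).
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(m: torch.nn.Module) -> float:
    torch.save(m.state_dict(), "tmp.pt")
    return os.path.getsize("tmp.pt") / 1e6

print(f"fp32 {size_mb(model):.2f} MB -> int8 {size_mb(quantized):.2f} MB")
```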

BenchBots: engineering notes

Hardware design lead Mark Funderburk and colleagues prioritized repeatable imaging: consistent camera geometry, controlled motion and scheduled passes for time-series analysis. That consistency reduces spurious variance, making downstream model training and cross-study comparisons more reliable.

Allen's software team focused on annotation throughput: semi-automated segmentation, color standardization and structured labels. The result is a library researchers can actually use at scale.

Beyond cover crops: breeding and phenomics

Plant breeders can plug AgIR into high-throughput phenotyping workflows. Computer vision can offload fruit counts, lesion scoring and plot-level trait measurements; a crude counting baseline is sketched below. With standardized imaging and labels, cross-program comparisons get easier, and models trained on one crop can be adapted faster to another.
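As one concrete example of offloading a count, a rough fruit-counting baseline needs only color thresholding and connected components. Trained detectors do far better in practice, and the HSV range below is an illustrative assumption for red-ripe fruit, not a published AgIR method.

```python
# Crude fruit counting via HSV thresholding and connected components.
import cv2

img = cv2.imread("plot_image.jpg")                # hypothetical plot photo
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))  # low-hue red band
n_labels, _, stats, _ = cv2.connectedComponentsWithStats(mask)
# Skip the background label (0) and tiny specks below a minimum area.
fruit_count = sum(1 for i in range(1, n_labels)
                  if stats[i, cv2.CC_STAT_AREA] > 100)
print(f"Estimated fruit count: {fruit_count}")
```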

What's next

The team is expanding species coverage and adding more cash crops. More field environments and growth stages are in the pipeline. As Matthew Kutugata notes, shared baselines and shared data give researchers a clear on-ramp to build, test and improve tools without starting from zero.

About the N.C. Plant Sciences Initiative

The N.C. Plant Sciences Initiative unites over 100 faculty affiliates across nine NC State colleges with partners in government and industry. The mission: solve complex agricultural challenges through interdisciplinary research, extension and education.

Upskill your AI toolkit

If you're building computer vision or data pipelines for ag research, a curated catalog of AI courses can save weeks of trial and error. Explore options by role at Complete AI Training.