Is AI already killing people by accident?
Short answer: we don't know in any specific case, and we may never know. Military programs are opaque, incident reports are selective, and the fog of war is thick.
Longer answer: such incidents are not hypothetical. Current AI systems are unreliable in precisely the ways that make lethal mistakes likely under pressure.
What we can and can't know
Targeting errors have always happened. The difference now is the temptation to bolt generative models and weak perception systems onto life-and-death workflows without proof they reduce harm.
Even if AI wasn't involved in a given strike, the pattern is clear: secrecy, hype, and "move fast" thinking are bleeding into warfare. That combo guarantees more bad calls.
The technical problem: unreliable reasoning and brittle vision
Generative models still fail at reasoning, calibration, and common sense. Vision models overfit to training data and break under small shifts in lighting, angles, or context.
That's fine for demos; it's disastrous for targeting. Performance varies by task: logistics and maintenance are forgiving; classification under uncertainty in unfamiliar environments is not.
Deploying on vibes or vendor claims is a safety hazard. Without hard, public benchmarks on collateral damage and false positives, "AI-assisted" can quietly mean "more mistakes, faster."
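To make "hard benchmarks" concrete, here's a minimal evaluation sketch (simulated predictions, hypothetical numbers throughout) of two metrics any public benchmark should report per scenario: false-positive rate and calibration error. The point it illustrates: a model can stay just as confident while its accuracy collapses under distribution shift, and only measurement exposes that.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average gap between stated confidence and observed accuracy, per bin."""
    ece = 0.0
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

def false_positive_rate(predictions, labels):
    """Fraction of true negatives the model flagged as positive."""
    negatives = labels == 0
    return (predictions[negatives] == 1).mean()

rng = np.random.default_rng(0)

# Simulated "in-distribution" run: confident and mostly right.
conf_id = rng.uniform(0.7, 1.0, 1000)
labels_id = rng.integers(0, 2, 1000)
preds_id = np.where(rng.uniform(size=1000) < 0.90, labels_id, 1 - labels_id)

# Simulated "shifted" run: equally confident, far less accurate --
# the classic overconfidence failure mode under distribution shift.
conf_sh = rng.uniform(0.7, 1.0, 1000)
labels_sh = rng.integers(0, 2, 1000)
preds_sh = np.where(rng.uniform(size=1000) < 0.65, labels_sh, 1 - labels_sh)

for name, conf, preds, labels in [
    ("in-distribution", conf_id, preds_id, labels_id),
    ("shifted", conf_sh, preds_sh, labels_sh),
]:
    correct = (preds == labels).astype(float)
    print(f"{name:16s} FPR={false_positive_rate(preds, labels):.2f} "
          f"ECE={expected_calibration_error(conf, correct):.2f}")
```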
The moral problem: outsourcing blame
AI can become a moral smokescreen. Leaders set the error budget, decide what "acceptable" civilian casualties are, and choose to push the button anyway.
If you'd blame dice for a bad roll, you're missing the point: the decision was choosing to trust the dice. Same with algorithms.
Where AI may help safely (for now)
- Logistics and supply planning
- Predictive maintenance and parts forecasting
- Summarization for intel triage with strict human review
High-ambiguity, high-cost decisions like target selection should be last on the list, not first.
Practical guardrails for leaders, engineers, and policymakers
- Define hard thresholds: maximum tolerated false-positive rates, civilian casualty ceilings, and abort criteria under uncertainty (see the sketch after this list).
- Demand a baseline comparison: legacy process vs. AI-assisted vs. AI-disabled across identical scenarios.
- Require abstention modes: systems must say "I don't know" and hand control back to humans under distribution shift.
- Measure what matters: collateral-damage rate, near-miss count, escalation triggers, and time-to-human-intervention.
- Red-team with adversarial inputs, spoofing, and sensor failures. No pass, no deployment.
- Log everything: inputs, model versions, prompts, overrides, and decision rationale for audit and accountability.
- Freeze scope: ban use in targeting until empirical evidence shows a net reduction in civilian harm in blinded trials.
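Here's what machine-checkable abort criteria might look like, as a minimal sketch. Every name and number below is hypothetical; the real values must come from accountable leaders, not engineers tuning defaults.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EngagementThresholds:
    """Hard limits set by accountable humans, not tuned by engineers."""
    max_expected_civilian_harm: float = 0.0  # hypothetical ceiling
    min_confidence: float = 0.99             # abort below this
    max_ood_score: float = 0.1               # abort above this

def must_abort(thresholds, confidence, ood_score, est_civilian_harm):
    """Return every reason the system must abstain and hand back control.

    An empty list means the automated step may proceed to human review;
    it never means "fire".
    """
    reasons = []
    if confidence < thresholds.min_confidence:
        reasons.append("confidence below floor")
    if ood_score > thresholds.max_ood_score:
        reasons.append("input looks out-of-distribution")
    if est_civilian_harm > thresholds.max_expected_civilian_harm:
        reasons.append("estimated civilian harm exceeds ceiling")
    return reasons

# A marginal call must abort, not "round up" to a strike.
t = EngagementThresholds()
print(must_abort(t, confidence=0.97, ood_score=0.3, est_civilian_harm=0.2))
```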
Technical controls teams should implement now
- Uncertainty calibration and selective prediction so the system abstains under low confidence (sketched after this list).
- Ensemble and sensor cross-checks to reduce single-model failure modes.
- Out-of-distribution detection with safe fallback to conservative, non-AI procedures.
- Scenario-based evaluation, not leaderboard vanity: test in unfamiliar terrains, lighting, and adversarial camo.
- Model cards, data lineage, and versioned policies tied to explicit risk limits.
- Human-in-the-loop with meaningful authority and time to intervene (no dark patterns, no rubber-stamp UX).
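Tying the first three items together, here's a minimal selective-prediction sketch under assumed thresholds: an ensemble must be confident, consistent, and in-distribution, or the system abstains and falls back to the non-AI procedure.

```python
import numpy as np

ABSTAIN = "ABSTAIN: hand back to human / non-AI procedure"

def selective_predict(member_probs, min_conf=0.95, max_disagreement=0.05,
                      ood_score=0.0, max_ood=0.1):
    """Return a class only if the ensemble is confident, consistent,
    and the input looks in-distribution; otherwise abstain.

    member_probs: (n_members, n_classes) per-model class probabilities.
    ood_score: higher means more out-of-distribution (detector assumed).
    """
    mean_probs = member_probs.mean(axis=0)
    top = int(mean_probs.argmax())
    # Disagreement: spread of the members' probability for the top class.
    disagreement = member_probs[:, top].std()
    if ood_score > max_ood:
        return ABSTAIN
    if mean_probs[top] < min_conf:
        return ABSTAIN
    if disagreement > max_disagreement:
        return ABSTAIN
    return top

# Three hypothetical ensemble members score the same input.
agree = np.array([[0.97, 0.03], [0.98, 0.02], [0.96, 0.04]])
split = np.array([[0.97, 0.03], [0.55, 0.45], [0.96, 0.04]])

print(selective_predict(agree))                 # -> 0 (class accepted)
print(selective_predict(split))                 # -> abstains (low confidence)
print(selective_predict(agree, ood_score=0.5))  # -> abstains (OOD input)
```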
Governance and accountability
- Independent safety reviews with the power to block deployment.
- Public reporting (to the extent possible) on test results, incident rates, and corrective actions.
- Clear assignment of responsibility up the chain of command. "The AI did it" is not a defense.
- Procurement contracts that tie payment to demonstrated reduction in harm, not glossy benchmarks.
If you work in or around public-sector AI, see resources on AI for Government and policy-focused training like AI for Policy Makers.
Questions to answer before any field deployment
- What is the maximum acceptable civilian harm rate, and who signed off on it?
- What is the evidence that AI reduces that rate vs. current practice, across blinded tests?
- When must the system abstain, and what is the human fallback?
- What changes if sensors are spoofed, jammed, or partially degraded?
- Who goes to court if it fails? Name the role, not "the tool."
For dev and IT teams inside defense contractors
- Refuse ambiguous acceptance criteria. Get failure thresholds and test plans in writing.
- Instrument everything for post-incident forensics. Design for auditability from day one.
- Block deployment behind safety gates that require red-team signoff and abstention tests to pass.
- Ship kill-switches and rate limiters that default to human control under uncertainty spikes (a minimal sketch follows below).
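A minimal sketch of that last item, with hypothetical components throughout: a wrapper that only ever recommends, rate-limits calls, trips a kill-switch on an uncertainty spike, and logs everything for forensics.

```python
import time
from collections import deque

class SafetyGate:
    """Wraps a model so it can only recommend, never act, and shuts off
    automatically when uncertainty spikes or the rate limit is hit."""

    def __init__(self, max_calls_per_min=6, uncertainty_trip=0.2):
        self.max_calls_per_min = max_calls_per_min
        self.uncertainty_trip = uncertainty_trip
        self.calls = deque()
        self.tripped = False
        self.audit_log = []  # in production: an append-only, versioned store

    def recommend(self, model_fn, observation):
        now = time.time()
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        if self.tripped or len(self.calls) >= self.max_calls_per_min:
            return self._hand_off("rate limit or kill-switch", observation)
        self.calls.append(now)

        label, uncertainty = model_fn(observation)
        self.audit_log.append((now, observation, label, uncertainty))
        if uncertainty > self.uncertainty_trip:
            self.tripped = True  # stays off until a human resets it
            return self._hand_off("uncertainty spike", observation)
        return {"recommendation": label, "requires_human_approval": True}

    def _hand_off(self, reason, observation):
        self.audit_log.append((time.time(), observation, "HANDOFF", reason))
        return {"recommendation": None, "handoff_reason": reason}

# Hypothetical model stub: returns (label, uncertainty).
gate = SafetyGate()
print(gate.recommend(lambda obs: ("vehicle", 0.05), "frame-001"))
print(gate.recommend(lambda obs: ("vehicle", 0.40), "frame-002"))  # trips
print(gate.recommend(lambda obs: ("vehicle", 0.05), "frame-003"))  # stays off
```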
For product and policy leaders
- Stop announcing "AI-enabled" before you have evidence it reduces harm.
- Tie incentives to safety metrics, not feature delivery. Vendors get paid for outcomes, not promises.
- Publish incident playbooks with escalation paths that assume AI error, not perfection.
The bottom line
We may not get straight answers about any one strike. But the pattern is obvious: unreliable systems plus secrecy equals preventable deaths.
Use AI where it's provably safe and measurably helpful. Everywhere else, slow down, measure, and keep humans fully responsible.