Solar Open 100B dispute puts Korea's "from scratch" rule and sovereign AI ambitions on trial

Upstage's Solar Open 100B faces claims it echoes Zhipu's GLM. The dispute is pushing South Korea to define "from scratch" and set proof standards.

Published on: Jan 03, 2026

South Korea's "from-scratch" AI test: The Solar Open 100B controversy

A dispute over Upstage's large language model, Solar Open 100B, has turned into a national stress test for South Korea's AI sovereignty program. At stake is a simple but stubborn question: what does "from scratch" actually mean, and how do you prove it?

The allegation: Solar Open 100B might be derived from Zhipu AI's work rather than built independently. If true, it would violate the rules of the National AI Foundation Model Project, which funds five teams (Naver Cloud, Upstage, LG AI Research, NC AI, and SK Telecom) under strict independence requirements.

How the dispute started

Seokhyun Ko, CEO of Sionic AI, published a technical analysis arguing that Solar Open 100B shows strong structural similarity to Zhipu AI's GLM-4.5-Air. He highlighted unusually close layer normalization parameters, leftover GLM-style config code, and license references to Zhipu AI in the repo.

His claim: those signals don't look like the output of a model trained independently from random initialization.

Upstage's response: public verification

Upstage denies any reuse of foreign weights. The company says Solar Open 100B was trained end-to-end in-house and that parameter similarity alone isn't proof of derivation.

On January 2 in Seoul's Gangnam district, Upstage opened its logs, intermediate checkpoints, and training history to public scrutiny. The session included CEO SungHoon Kim, company engineers, and external experts, and Ko was invited to review the materials. The company leaned on tracked experiment artifacts (for example, Weights & Biases logs) as evidence that's hard to fake after the fact.

The technical core: what weight similarity can and can't prove

Layer norm parameters are easy to compare, but they're also low-dimensional and can converge to similar values across models with related architectures and training objectives. Many researchers argue that this signal, on its own, is weak evidence.

A credible derivation test should include attention Q/K/V weights, cross-layer correlation patterns, optimizer states, training trajectories, and consistency of loss curves from early checkpoints. One metric won't settle it. A stack of orthogonal signals might.
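As a rough illustration, here's a minimal sketch of one such comparison, assuming both checkpoints load as PyTorch state dicts with overlapping parameter names; the file names and the "norm"/"attn" name filters below are hypothetical, not taken from either model's actual layout:

```python
# Minimal sketch: per-tensor cosine similarity between two checkpoints.
# High similarity on layer norms alone is weak evidence; broad agreement
# across attention and MLP weight matrices is the stronger signal.
import torch

def cosine(a: torch.Tensor, b: torch.Tensor) -> float:
    a, b = a.flatten().float(), b.flatten().float()
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

def compare_state_dicts(sd_a: dict, sd_b: dict) -> dict:
    """Cosine similarity for every tensor name present in both models."""
    shared = sd_a.keys() & sd_b.keys()
    return {
        name: cosine(sd_a[name], sd_b[name])
        for name in sorted(shared)
        if sd_a[name].shape == sd_b[name].shape
    }

# Hypothetical file names; real checkpoints from different codebases
# would first need their parameter names remapped to a common scheme.
sims = compare_state_dicts(
    torch.load("model_a.pt", map_location="cpu"),
    torch.load("model_b.pt", map_location="cpu"),
)
ln = [v for k, v in sims.items() if "norm" in k]    # weak signal on its own
attn = [v for k, v in sims.items() if "attn" in k]  # stronger signal
print(f"layer-norm mean similarity: {sum(ln) / max(len(ln), 1):.4f}")
print(f"attention mean similarity:  {sum(attn) / max(len(attn), 1):.4f}")
```

A caveat: independently trained networks carry permutation and scaling symmetries, so even models with related architectures and data tend to show low raw similarity on large weight matrices. It's near-identical values across many unrelated tensors that would be hard to explain away.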

For background on why layer norm values can converge, see the original paper: Layer Normalization (Ba, Kiros, Hinton).

The policy question: what qualifies as sovereign AI?

Ko later clarified a broader point: even if licenses allow code reuse, does heavy reference to foreign architectures align with the intent of a sovereign AI program? This is a policy gap between what's legal in open source and what the project expects in provenance and independence.

If independence is the goal, the development pathway matters as much as the final checkpoint. Provenance isn't a footnote; it's part of the deliverable.

Expect spillover to all five teams

The controversy won't end with Upstage. Naver Cloud, LG AI Research, NC AI, and SK Telecom will likely face tougher disclosure expectations when they publish. Performance scores won't be enough.

Prepare for requirements around training transparency: intermediate checkpoints, reproducible logs, and clear documentation of external dependencies.

Time to define "from scratch" and make it verifiable

The phrase "from scratch" is vague without a verification plan. The fix is to convert policy goals into testable artifacts and repeatable procedures.

If the government wants trust, it needs standards that teams can implement and auditors can check: consistently, quickly, and without guesswork.

A practical verification checklist for teams

  • Publish immutable experiment logs with run IDs (e.g., W&B/MLflow), including seeds, hyperparameters, optimizer settings, LR schedules, and eval metrics per step.
  • Release periodic checkpoints (early, mid, late) with cryptographic hashes (see the manifest sketch after this list); show continuous training loss curves that match those artifacts.
  • Provide per-layer statistics: weight/grad norms, attention Q/K/V similarity maps across checkpoints, and alignment drift over time.
  • Keep optimizer states (e.g., Adam moments) for sampled checkpoints to prove a live training path rather than a one-shot copy.
  • Offer small-scale replay: reproduce a subset of training (same seed/data slice) and match curves and checkpoint hashes within tolerance.
  • Document a software bill of materials: base code, borrowed components, licenses, and changes. Call out any third-party kernels or configs explicitly.
  • Track data lineage: sources, filters, deduping steps, and any overlap with public corpora used by alleged source models.
  • Commission an independent audit that reviews private artifacts under NDA and publishes a public attestation.
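To make the hashing and logging items concrete, here is a minimal standard-library sketch; the paths, run ID, and field names are illustrative, not the project's actual format. It ties checkpoint digests to the run metadata an auditor would replay against:

```python
# Minimal sketch: a manifest of SHA-256 digests for released checkpoints,
# recorded alongside the run metadata needed for a small-scale replay.
import hashlib
import json
import time
from pathlib import Path

def sha256_file(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def write_manifest(ckpt_dir: str, run_id: str, seed: int, out: str) -> None:
    manifest = {
        "run_id": run_id,  # ties the hashes back to the experiment log
        "seed": seed,      # needed to rerun the replay test
        "created_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "checkpoints": {
            p.name: sha256_file(p)
            for p in sorted(Path(ckpt_dir).glob("*.pt"))
        },
    }
    Path(out).write_text(json.dumps(manifest, indent=2))

write_manifest("checkpoints/", run_id="run-0042", seed=1337,
               out="checkpoint_manifest.json")
```

Publishing the manifest, or even just its own hash, at release time makes later tampering detectable without exposing the weights themselves.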

For engineering leaders

  • Bake provenance into your pipeline early. Retroactive proof is messy.
  • Automate artifact capture: configs, seeds, checkpoints, and diffs per run (a sketch follows this list).
  • Treat licenses like dependencies in prod: no ambiguity, no surprises.
  • Assume you'll need to show your work to a third party.
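For the artifact-capture bullet, here's a minimal sketch; all names (config fields, run IDs, output paths) are hypothetical. It snapshots a run's config, seed, and exact code revision before training starts:

```python
# Minimal sketch: capture config, seed, and code revision per run,
# so provenance is recorded up front instead of reconstructed later.
import json
import subprocess
from dataclasses import asdict, dataclass
from pathlib import Path

@dataclass
class RunConfig:
    run_id: str
    seed: int
    learning_rate: float
    batch_size: int
    optimizer: str

def git_commit() -> str:
    """The exact code revision the run was launched from."""
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()

def capture_run(cfg: RunConfig, out_dir: str = "runs") -> Path:
    record = {"config": asdict(cfg), "git_commit": git_commit()}
    path = Path(out_dir) / f"{cfg.run_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2))
    return path

capture_run(RunConfig(run_id="exp-001", seed=1337,
                      learning_rate=3e-4, batch_size=256, optimizer="adamw"))
```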

What to watch next

Evidence review from the Upstage session will shape the near-term narrative. Regardless of the outcome, the bar for documentation and disclosure just got higher for everyone in the program.

If the government codifies standards (what artifacts to log, how to submit checkpoints, how audits run), teams can focus on execution instead of speculation. Clear rules reduce noise.

This case is a reminder: performance matters, but provenance decides trust. Build both into your roadmap from day one.

