AI Breakthrough Gives Data Owners Unprecedented Control Over Their Contributions

FlexOlmo lets data owners keep control of their data after AI training by merging independently trained sub-models. Data can be added or removed without sharing raw files.

Published on: Jul 10, 2025

A New AI Model Lets Data Owners Regain Control

Researchers at the Allen Institute for AI (Ai2) have developed a large language model called FlexOlmo that changes how training data is managed in AI development. Unlike traditional models where data is permanently embedded after training, FlexOlmo allows data owners to maintain control over their data even after the model is built.

Currently, big AI companies collect massive datasets from various sources, often without clear ownership rights, and create models that fully absorb this data. Extracting or removing specific data after training is practically impossible, like trying to recover eggs from a finished cake.

How FlexOlmo Works

FlexOlmo introduces a modular training process. Data owners start with a publicly available “anchor” model, then independently train a second model on their own data. The resulting sub-model is combined with the anchor and contributed to the final model. Data owners therefore never have to share raw data directly and, crucially, their sub-models can be removed later if needed.

This asynchronous training means contributors don’t need to coordinate their efforts, making the process flexible and scalable.
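
The workflow above can be sketched in a few lines of Python. This is a toy illustration, not Ai2's code: the `SharedModel` and `Expert` classes, the `train_expert` function, and the word-count "training" are all hypothetical stand-ins. They are chosen only to show that each contributor trains against the same public anchor on their own, hands over a trained module rather than raw data, and can withdraw that module later.

```python
from dataclasses import dataclass, field


@dataclass
class Expert:
    """A contributed sub-model. Only derived values leave the owner."""
    owner: str
    weights: dict  # hypothetical stand-in for trained parameters


def train_expert(owner: str, anchor_vocab: frozenset, private_docs: list) -> Expert:
    """Toy 'training': count private-data words that the public anchor knows.

    Runs entirely on the owner's side; private_docs is never returned.
    """
    counts = {}
    for doc in private_docs:
        for word in doc.lower().split():
            if word in anchor_vocab:
                counts[word] = counts.get(word, 0) + 1
    return Expert(owner=owner, weights=counts)


@dataclass
class SharedModel:
    """Public anchor plus whichever experts are currently opted in."""
    anchor_vocab: frozenset
    experts: dict = field(default_factory=dict)

    def add_expert(self, expert: Expert) -> None:
        # Adding one expert does not touch or retrain the others.
        self.experts[expert.owner] = expert

    def remove_expert(self, owner: str) -> None:
        # Opt-out: drop the owner's module from the shared model.
        self.experts.pop(owner, None)


anchor = frozenset({"model", "data", "training"})
shared = SharedModel(anchor_vocab=anchor)

# Two owners train independently, in any order, without coordinating.
news = train_expert("news_org", anchor, ["Data about training data"])
books = train_expert("publisher", anchor, ["A model of the novel"])
shared.add_expert(books)
shared.add_expert(news)

shared.remove_expert("publisher")  # the publisher later opts out
print(sorted(shared.experts))
```

The point of the design is that the shared model only ever holds trained modules keyed by owner, so adding or withdrawing a contribution is a local operation.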

The Technology Behind It

FlexOlmo uses a “mixture of experts” architecture, which combines several sub-models to form a larger, more capable model. The key innovation is a method to merge independently trained sub-models seamlessly. This is done by representing model values in a new way that allows their capabilities to be integrated during inference.
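
To make the mixture-of-experts idea concrete, here is a minimal sketch of how a router might weight expert outputs at inference time, and what happens when one expert is dropped. It is a toy under stated assumptions, not FlexOlmo's actual routing: the `route` and `combine` functions, the router scores, and the output vectors are all invented for illustration.

```python
import math


def route(scores: dict) -> dict:
    """Softmax over router scores for the experts currently present."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def combine(outputs: dict, weights: dict) -> list:
    """Weighted sum of per-expert output vectors."""
    dim = len(next(iter(outputs.values())))
    mixed = [0.0] * dim
    for name, vec in outputs.items():
        for i, value in enumerate(vec):
            mixed[i] += weights[name] * value
    return mixed


# Hypothetical router scores and expert outputs for a single token.
scores = {"anchor": 1.0, "books": 2.0, "news": 0.5}
outputs = {"anchor": [1.0, 0.0], "books": [0.0, 1.0], "news": [1.0, 1.0]}

full = combine(outputs, route(scores))

# Opting out the "books" expert: delete its entry and renormalize the rest.
scores.pop("books")
outputs.pop("books")
reduced = combine(outputs, route(scores))

print([round(x, 3) for x in full])
print([round(x, 3) for x in reduced])
```

In this toy, removal needs no retraining because the routing weights are simply recomputed over the experts that remain.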

The Ai2 team tested this approach on a dataset called Flexmix, which includes proprietary books and websites. Their 37-billion-parameter model outperformed individual models across all tasks and scored 10% better on benchmarks than other merging methods.

Why This Matters

  • Data Ownership: Contributors can add data without losing control or handing over raw files.
  • Data Removal: If legal issues or objections arise, the sub-model trained on the data in question can be removed from the final model.
  • Privacy: Sensitive data can be used more safely, potentially combined with privacy techniques like differential privacy.
  • Modular Development: Models can be collaboratively built and updated without full retraining.

Ali Farhadi, CEO of Ai2, points out that this approach allows opting out of the system without harming model performance or inference speed, making it a practical alternative to the current “all or nothing” model training.

Industry and Legal Implications

Data ownership has become a complex legal issue in AI development. Some publishers have taken legal action against AI companies for using their content without permission, while others have negotiated licensing deals.

FlexOlmo offers a new path for building shared AI models where multiple data owners can contribute and maintain control. This could reduce conflicts and promote more transparent AI development.

Researchers also note that while FlexOlmo reduces the need to share raw data, care is still needed to prevent training data from being reconstructed from the final model, which is where privacy-preserving techniques could play a role.

Looking Ahead

FlexOlmo’s modular approach challenges the traditional view of AI models as monolithic entities and opens the door to more collaborative and controlled AI training. For professionals involved in AI development or data management, this model offers a fresh perspective on balancing innovation with data rights and privacy.
