AI Researchers Push Back Against Unrestricted Use of Public Data for Model Training
A global survey shows most AI researchers oppose unrestricted use of public data for training AI models. Creatives demand stronger ethical standards and consent rights.

A recent global survey finds that only about 25% of AI researchers support the unrestricted use of publicly available data to train AI models. The result lands amid sharpening tensions between creatives demanding control over their work, governments shaping policy, and the ambitions of the AI industry.
Writers and academics are increasingly voicing concerns about their work being used without consent in AI development. Investigations, including reporting by The Atlantic, revealed that engineers at Meta downloaded over 7 million books and 80 million research papers from pirated online libraries. Meanwhile, the Society of Authors, which represents more than 11,000 members, has petitioned for transparency and fairness in AI training practices.
New findings from University College London (UCL) highlight that most AI researchers advocate for stronger ethical standards regarding training data. The majority oppose the UK government's current proposal, which would require creatives to explicitly opt out if they don’t want their work used. This approach contradicts long-standing copyright norms that typically require explicit permission to use creative content.
This ongoing debate illustrates that, despite AI’s hype and potential, AI systems and companies have yet to earn a broad social license to operate. Building this license depends heavily on respecting the values and norms of various sectors, including the creative industries.
What AI Researchers Think About Training Data
In June 2024, the team behind the UCL study conducted the largest international survey of published AI researchers to date, collecting 4,260 responses. The aim was to understand researchers' views on AI innovation, ethics, and responsibility, especially regarding training data.
The survey revealed diverse opinions. While both researchers and the public recognize AI’s risks, they disagree on who should be responsible for ensuring AI’s safe use. Researchers ranked the top three responsible parties as:
- Companies developing AI
- Government
- International standards bodies
In contrast, a prior UK public survey by the Alan Turing Institute and Ada Lovelace Institute found the public prioritized:
- An independent regulator
- Companies developing AI
- An independent oversight committee with citizen involvement
Researchers generally support public involvement in AI decisions after deployment: 84% consider it at least somewhat important. Fewer support public input at earlier stages such as model training or development. Most see public engagement as most relevant once AI is already in use, where it can inform risk management and regulation.
Researchers' experience with public participation methods is limited. Where such methods are used, they tend to be low-involvement formats such as surveys, rather than deeper engagement such as co-designing standards or open deliberative processes. Even so, researchers acknowledge concerns about training data quality, industry control, and the need for guidance in setting AI research agendas.
An Agenda for AI Research Policy
These differences between researchers and creatives are not simply misunderstandings. For example, Paul McCartney recently told the BBC, “I think AI is great, and it can do lots of great things, but it shouldn’t rip creative people off.” McCartney, who used AI to revive John Lennon’s voice in a recent track, appreciates AI’s opportunities but insists on respecting creators’ rights.
Such views, shared by many creatives and members of the public, deserve serious consideration rather than dismissal as misconceptions. Greater dialogue and deliberation between the public and AI developers is crucial. Yet nearly 40% of researchers cite a lack of time and funding as key barriers to public engagement.
Addressing these barriers will require dedicated resources and support from funding bodies and from the institutions that train AI professionals. Projects such as Public Voices in AI show how public perspectives can be integrated into AI research in practice.
Further qualitative research is needed to explore AI researchers' views on risk, responsibility, and public attitudes in greater depth. The survey results point to a clear opportunity to engage both researchers and the public in meaningful debate about AI's future.