How AI is Changing Open Source Development
AI is influencing open source software development across multiple fronts. While it opens new possibilities, it also raises challenges for developers and communities.
Open source projects face pressing questions: May their code be freely used to train AI models? Where does AI add value, and where might it cause harm, whether in writing code or managing projects? What does open washing mean, and how does open source relate to regulations like the EU AI Act? And is AI a threat to the future of open source itself? Insights from recent open source conferences such as OCX, 38C3, and FOSDEM 2025 shed light on these topics, with input from the Eclipse Foundation and other experts.
Open Source Code as AI Training Data: Legal and Ethical Questions
Code makes up a significant part of the data used to train current AI models. This isn't just to tackle programming tasks but also to enhance the AI’s reasoning. Some chatbots can write Python code that runs in a sandbox to compute answers, rather than responding directly. This method is especially helpful for complex calculations.
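The sandbox pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the `model_generated_code` string stands in for a chatbot's output, and a real sandbox would run the snippet in an isolated process or container with resource limits, not an in-process `exec()`.

```python
# Sketch of the "write code, run it in a sandbox" pattern.
# model_generated_code stands in for the snippet a chatbot produces
# instead of answering a calculation question directly.
model_generated_code = """
from math import comb
# How many 5-card hands can be dealt from a 52-card deck?
result = comb(52, 5)
"""

def run_in_sandbox(code: str) -> object:
    # Execute the snippet in a fresh namespace; by convention the
    # generated code assigns its answer to `result`.
    # A production sandbox would isolate this far more strictly.
    namespace: dict = {}
    exec(code, namespace)
    return namespace["result"]

answer = run_in_sandbox(model_generated_code)
print(answer)  # 2598960
```

Delegating the arithmetic to executed code sidesteps the model's weakness at multi-digit calculation: the model only has to produce correct code, not a correct number.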
Most of the training code is open source and widely accessible online. Meta's Llama 3 model, for example, was trained on 15 trillion tokens, of which about 17 percent (roughly 2.5 trillion tokens) was code. This raises legal questions: Is a language model a derivative work of the original code? Must it comply with that code's licenses?
The legal situation remains unclear. Training collections such as Hugging Face's BigCode datasets avoid code under licenses like the Eclipse Public License (EPL), favoring permissive licenses such as Apache and MIT, which do not require derivative works to stay under the same license. The Eclipse Foundation's Mike Milinkovich sees no clear EPL violation but acknowledges the uncertainty: copyright law varies by country, and courts have yet to decide whether AI models count as derivative works or fall under fair use.
Even if the legal questions are settled, ethical concerns remain. Open source contributors typically share their code hoping for collaboration, improvements, and community feedback, not for their work to be silently absorbed as AI training material. Restrictive licenses deter users and may therefore reduce contributions; hosting projects in vendor-neutral organizations like the Eclipse Foundation sacrifices some control but encourages participation by companies and individuals. When AI models train on open source code without any direct interaction, however, it becomes hard to ask those downstream users to give anything back.
Chatbots: Reducing Work or Adding Burden?
AI chatbots help developers by answering questions that would otherwise flood internal forums or sites like Stack Overflow. Smaller models can even run locally on regular computers, using open source tools like Apple’s MLX or llama.cpp, with management tools like Ollama simplifying usage.
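With a manager like Ollama, querying a local model typically reduces to one HTTP call against its REST API on the default port. The sketch below is a minimal example under that assumption; the model name and prompt are placeholders, and the network call is kept separate from the payload builder so the helper works without a running server.

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}

def ask_local_model(model: str, prompt: str) -> str:
    # Requires a local Ollama server with the model pulled,
    # e.g. after running `ollama run llama3` once.
    data = json.dumps(build_payload(model, prompt)).encode()
    req = request.Request(
        OLLAMA_URL, data=data,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (needs Ollama running locally):
#   print(ask_local_model("llama3", "Explain the EPL in one sentence."))
```

Because everything runs on the developer's own machine, no code or question ever leaves the computer, which matters for the confidentiality concerns raised below.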
Since ChatGPT’s launch in late 2022, Stack Overflow’s new questions and answers have halved annually. Yet, chatbot answers can sometimes be inaccurate or hallucinated. Despite this, AI often provides faster and friendlier responses than traditional forums or web searches. "Let me ask ChatGPT for you" is becoming the new "Let me Google that."
The downside: AI-generated answers occasionally produce useless or misleading solutions, causing extra work. They also tend to favor older frameworks and tools unless the prompt includes newer information. Without public questions and feedback, project maintainers miss out on insights about user challenges. This lack of visible interaction threatens both projects and the quality of future AI training data. Finding a way to restore this feedback loop is crucial for both AI and open source communities.
AI Assistance in Programming
Beyond chatbots, AI tools help with coding by generating code snippets, comments, tests, and offering troubleshooting advice. These tools integrate either as chatbots or directly within code editors, providing context-aware suggestions based on surrounding code.
Available options include GitHub Copilot and several open source alternatives; some require registration and cap monthly usage. Projects like Eclipse Theia offer an alternative IDE with Copilot integration, letting users inspect the requests being sent and define custom agents to tailor prompts.
GitHub Copilot recently added support for locally run models, broadening its accessibility. The Visual Studio Code extension Spring Tools, for example, uses Copilot to explain code and SQL queries directly, adding context automatically and offering "Apply Changes" buttons for seamless integration. Copilot's continuous updates can cause inconsistencies, however: features that work today might break tomorrow. Copilot may also lack knowledge of very recent frameworks or run into prompt length limits, which affects the quality of its suggestions.
Upcoming Conference on AI in Software Development
The online betterCode() GenAI conference returns on June 26, focusing on AI-supported software development. This event builds on previous success and features updated content. Highlights include:
- Software development with Copilot, ChatGPT, and similar tools
- Latest trends in AI coding tools
- AI-supported software testing
- Using large language models to analyze legacy systems
- Strengths and weaknesses of AI in secure software development
- Legal considerations in AI-assisted development
For developers exploring AI’s impact on open source and programming, events like this offer valuable perspectives.