Why Visual Artists Struggle to Protect Their Work from AI Crawlers
Visual artists face a growing challenge: keeping their creative work safe from generative AI tools that scrape the internet for training data. While tools exist to help block these AI crawlers, most artists lack the technical knowledge or control over their websites to effectively use them.
The best way to protect art from unauthorized AI use is to stop AI crawlers from accessing it in the first place. But this requires specific tools and website controls that many artists either don’t have or don’t know how to operate.
What the Research Reveals
A study by researchers at the University of California San Diego and the University of Chicago surveyed over 200 visual artists and analyzed more than 1,100 artist websites. The goal was to understand artists’ awareness, use, and control of tools that can block AI crawlers.
The key takeaway? Content creators want control over how their work is used, not just whether it’s accessible. But current internet tools and standards fall short of providing clear, enforceable rights for creators.
Artists' Efforts and Challenges
- Nearly 80% of surveyed artists have tried to prevent their work from being used in AI training.
- Two-thirds have used a tool called Glaze, which alters artwork so that it is less useful to AI models trained on it.
- 60% of artists reduced how much work they share online, and 51% only post low-resolution images.
- Despite this demand, over 60% were unfamiliar with robots.txt, a simple file that can block web crawlers.
Glaze, developed by the University of Chicago team, alters images to make them less useful to AI systems. While helpful, it’s a workaround. The ideal solution is preventing AI crawlers from accessing the original files altogether.
Robots.txt: A Simple Yet Underused Tool
Robots.txt is a basic text file placed in a website’s root directory. It tells web crawlers which pages or files they may or may not access. Compliance is voluntary, however: AI crawlers are not legally obligated to follow these instructions.
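As a rough illustration of how such a file works, the sketch below defines a small robots.txt and uses Python's standard urllib.robotparser module to check which crawlers it turns away. The crawler names GPTBot and CCBot, the example.com URL, and the helper blocked_agents are assumptions chosen for illustration; they are not taken from the study, and the right list of crawlers to block will change over time.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content (an assumption, not from the study).
# Each "User-agent" group names one crawler; "Disallow: /" asks it to
# stay away from the whole site. The final group leaves all other
# crawlers, such as ordinary search engines, unaffected.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, agents: list[str], url: str) -> list[str]:
    """Return the crawler names that the given robots.txt asks not to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    crawlers = ["GPTBot", "CCBot", "Bytespider", "Googlebot"]
    page = "https://example.com/gallery/painting.jpg"
    print(blocked_agents(ROBOTS_TXT, crawlers, page))
```

This only shows what the file requests. Whether a given crawler actually honors the request is a separate question.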
The study found that over 10% of the top 100,000 websites explicitly block AI crawlers using robots.txt. However, some high-profile sites like Vox Media and The Atlantic removed such blocks after licensing content to AI companies.
Interestingly, the websites that allow AI crawling include many misinformation sites, which may be intentionally feeding false data into AI models.
Limited Control for Artists
More than 75% of artist websites are hosted on third-party platforms that don’t allow users to modify their robots.txt files. This means artists often can’t block AI crawlers on their own sites.
Squarespace stands out as the only platform offering a straightforward interface to block AI tools. Still, only 17% of artists using Squarespace enable this feature, likely due to lack of awareness.
Do AI Crawlers Respect Robots.txt?
The answer varies. Large corporations’ AI crawlers generally respect robots.txt rules. However, some crawlers, such as Bytespider from TikTok owner ByteDance, do not.
Many AI assistant crawlers claim to respect robots.txt, but those claims are not verified, and in practice many of them do not follow the rules.
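One way a site owner with access to server logs might check this for themselves is to compare logged requests against the site's own robots.txt. The sketch below is a minimal version of that idea and is not the method used in the study. The helper find_violations, the sample rules, and the sample requests are all hypothetical, and it assumes each log entry has already been reduced to a crawler name and a requested path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: Bytespider is asked to stay
# out entirely, and every other crawler is asked to avoid /private/.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /private/
"""


def find_violations(
    robots_txt: str, requests: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Return the (crawler, path) pairs that robots.txt asked crawlers not to fetch.

    Assumes the crawler field is a short product token such as "Bytespider",
    not a full User-Agent header; real logs would need that extraction first.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path) for agent, path in requests if not parser.can_fetch(agent, path)
    ]


if __name__ == "__main__":
    # Made-up log entries, not real measurements.
    sample_requests = [
        ("Bytespider", "/gallery/a.jpg"),
        ("Googlebot", "/gallery/a.jpg"),
        ("Googlebot", "/private/draft.png"),
    ]
    for agent, path in find_violations(ROBOTS_TXT, sample_requests):
        print(f"{agent} fetched {path} despite robots.txt")
```

A check like this only catches crawlers that identify themselves honestly. A crawler that disguises its user agent would not show up.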
Recently, Cloudflare introduced a "block AI bots" feature. Currently, only 5.7% of sites using Cloudflare have enabled it, but it shows promise as a defense tool if adopted more widely.
Legal Uncertainty Adds to the Problem
Legal frameworks around AI training data are still developing. In the US, courts are debating how copyright law applies to AI models trained on scraped content. The European Union’s AI Act requires model providers to get permission from copyright holders.
This legal uncertainty is driving more attention toward technical measures to control access to creative content, especially if courts grant broad “fair use” rights to AI model builders.
What Artists Can Do Now
- Learn about and use tools like robots.txt and Glaze to protect your work.
- Choose website platforms that allow you to control crawler access, such as Squarespace.
- Limit the quality and quantity of your online portfolio if you want to reduce exposure.
- Stay informed about legal developments affecting AI and copyright.
Further Reading
The full study, "Somesite I Used To Crawl: Awareness, Agency and Efficacy in Protecting Content Creators From AI Crawlers," is available on arXiv for those interested in the detailed findings and methodology.