AI training site stole his photos, then sued when he complained: Robert Kneschke's story
What would you do if you found out that the photos that you’d taken were being used to train AI without your consent? You might think that you have the right to ask the organization that is using them to stop and delete them. Well, it’s not that simple, as the case of Robert Kneschke, a German stock photographer, shows. Kneschke said he asked LAION — a non-profit that provides training materials for machine learning research — to remove his photos from its dataset. Not only did LAION refuse, but it also said he must pay for making “unjustified” copyright claims.
His case raises questions: Is copyright protection enforceable at all in the age of AI? And is AI training, as it exists now, intrinsically unethical? As we try to answer these questions, let’s take a closer look at how Kneschke has challenged one of the pillars of the generative AI industry: its ability to train AI models on billions of images, many of them copyrighted, for free.
LAION and its role in the generative AI revolution
Before we go deeper into the case, a couple of words need to be said about LAION (Large-scale Artificial Intelligence Open Network), which has been instrumental in the ongoing generative AI craze. LAION is a German non-profit that is best known for releasing several huge datasets of images and captions. These datasets, the largest of which contains 5.8 billion filtered image-text pairs, have been used to train prominent text-to-image and video-generating models, including Stable Diffusion, Midjourney and Google’s Imagen. The data that LAION parsed to find image-text pairs comes from Common Crawl, another non-profit that scrapes the web every month and provides petabytes of data for free to the public.
As we discussed in one of our previous articles, it might be hard to hold LAION liable for copyright infringement. The reason is that LAION says it does not host the images themselves, but rather provides the URLs where these images can be downloaded. So one could argue that LAION technically has nothing to take down, even if asked to, but Kneschke believes that LAION has not shielded itself from liability.
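To make that "nothing to take down" argument concrete: a LAION release is essentially an enormous table of image URLs paired with alt-text captions (plus metadata), not a folder of image files. The sketch below shows how someone might scan such a table for links pointing at their own site, which is roughly what tools like ‘Have I Been Trained?’ automate at scale. The rows and the `find_my_images` helper are invented for illustration and are not part of any real LAION tooling:

```python
from urllib.parse import urlparse

# A LAION-style row is roughly (image URL, caption); the real releases
# also carry similarity scores, image dimensions, and so on.
# These example rows are made up.
dataset = [
    ("https://images.example-stock.com/12345/sunset.jpg", "sunset over a lake"),
    ("https://cdn.othersite.net/pics/cat.png", "a cat sleeping on a sofa"),
    ("https://images.example-stock.com/67890/forest.jpg", "misty forest path"),
]

def find_my_images(rows, domain):
    """Return the (url, caption) pairs whose URL is hosted on the given domain."""
    return [(url, caption) for url, caption in rows
            if urlparse(url).netloc == domain]

matches = find_my_images(dataset, "images.example-stock.com")
print(len(matches))  # 2
```

Note that the scan only proves a link to the image exists in the dataset; whether assembling and distributing those links (and the temporary copies made along the way) infringes copyright is exactly what Kneschke's lawsuit disputes.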
What has Kneschke done to cause LAION’s ire?
In his blog, Kneschke says that he first wrote to LAION in February after he found “heaps of images” from his portfolio in the LAION dataset and asked that they be removed. Kneschke used a tool called ‘Have I Been Trained?’ to search for the images, and discovered that most of his photos that were in the dataset had watermarks.
Kneschke might have cherished some hopes when he reached out to LAION based on the information on the nonprofit’s website. There LAION states that its dataset “contains links to external third-party websites, the content of which we have no influence on,” but at the same time notes: “Should you nevertheless become aware of a copyright infringement, we ask that you inform us accordingly. As soon as we become aware of legal violations, we will remove such content immediately.”
LAION responded to Kneschke’s request almost immediately, although, to be more precise, it was not LAION itself that replied. A law firm hired by LAION told Kneschke that not only was there nothing to remove, but also that LAION did not facilitate the use of the photographer’s images by third parties, since “image content linked by our client can be accessed freely on the internet.” They also told him to back off and deal with those third parties directly. Finally, they warned him that if he persisted, LAION would sue him for damages for making an “unjustified copyright claim” under German law.
Undeterred, Kneschke contacted LAION again in March. He demanded that it cease and desist from any possible claims for damages, and that it disclose the source of the images and how long it had used them. With that, Kneschke apparently crossed a red line. LAION refused to provide any information, and followed up with a threat to make the photographer pay the legal fees (about 800 euros) it claimed to have incurred while “defending itself against the obviously unjustified claim.” Kneschke fired back in April by filing a copyright-infringement lawsuit against LAION in a German court. Kneschke told us he was still waiting for the court to acknowledge his claim, which would set the legal process in motion, including laying the groundwork for LAION to tell its side of the story.
The bone of contention
Kneschke says that LAION had to store the images, at least temporarily, in order to include them in the dataset, and that this makes it liable for copyright infringement. LAION does not deny the act of reproduction, however fleeting, but insists that it is permitted under the German Copyright Act §§ 44b and 60d.
Section 44b(2) of the Act states that “It is permitted to reproduce lawfully accessible works in order to carry out text and data mining.” Section 44b(3) adds that such “uses … are permitted only if they have not been reserved by the rightholder … in a machine-readable format,” which suggests the opt-out must be a signal that a crawler can read automatically, not just a human-readable notice. For its part, section 60d permits reproductions for text and data mining carried out for scientific research purposes.
One of Kneschke’s other issues with LAION is that, despite its non-profit status, it seems to have found some cozy bedfellows among for-profit companies. One such example is Richard Vencu, who is on both LAION’s team and that of Stability AI, the makers of Stable Diffusion.
In any case, the whole situation is pretty bad optics for LAION, which, even if it is found to be legally in the right, will still look like a bully shaking down artists for money.
What Kneschke hopes to achieve with his lawsuit
When we contacted Robert in late May, he told us that he was optimistic about the outcome of the legal proceedings. “Of course we think we have a valid claim, otherwise we wouldn’t have started the lawsuit,” he told us. Kneschke said that he believes that LAION “is not fully transparent in how they worked in detail” and that litigation could shed light on some processes hidden from public view that could ultimately determine its outcome.
As for what he hopes to accomplish through what appears to be an uphill legal battle with LAION, Kneschke said that he wants professional artists to have a say in how companies that develop AI-based tools use their work: at a minimum, the ability to opt out, to grant permission up front, and to be financially compensated:
AI generative imagery is a groundbreaking new technology that mainly affects professional artists. The technology is also heavily reliant on getting useful images and metadata from the artists it tries to imitate and replace, so we think artists should have a much stronger voice in how their works are used for AI training. The minimal rules should be choice, consent and compensation.
Lawsuits piling up
The lack of clear rules for image use in AI training has sparked a wave of lawsuits by artists and others who are now challenging the status quo. Kneschke is taking on LAION, which may be a long shot, while others are going after AI startups that, one could argue, are directly cashing in on artists’ work by using it to create their paid products.
In January, a group of artists filed a class-action lawsuit over a trio of AI-powered image generators: Midjourney, Stable Diffusion, and DeviantArt’s DreamUp. They argued that the organizations violated the rights of “millions of artists” by using their works for training without consent. Around the same time, Stability AI was sued by the stock photography platform Getty Images for infringing on its copyright and trademark protections. Getty alleged that Stability AI copied over 12 million images from its database without permission or any sort of payout.
Implications for the AI industry
Since we are not lawyers, we are not qualified to comment on the legal merits of Kneschke’s or any other lawsuit. We will leave that to the courts to decide. However, we are curious to see how these cases turn out, because if the artists gain the upper hand, it could turn the entire AI industry upside down.
They also raise important questions about who owns the data on the Internet, how it can be used and reused, and what the rights and responsibilities of both data providers and data consumers are.
Like Kneschke and many others who see AI not necessarily as an enemy but as a potential ally, we hope that AI startups train their models ethically. That means allowing those who don’t want to participate with their copyrighted images or personal information to easily opt out, preferably at any stage of the process. While this may inconvenience the fledgling AI industry, in the long run it will make things right with data owners, including artists and other content creators. For this to happen, we need clear and consistent rules for data use across the AI industry that everyone follows and respects.