Authors are suing Nvidia, alleging that it illegally used their books to train its artificial intelligence (AI), spotlighting the need for firms to navigate copyright laws carefully in AI applications.
AI infringement issues are escalating as companies increasingly use copyrighted materials to train sophisticated algorithms without the creators’ consent. Experts say the problem underscores the urgent need for clearer guidelines and protections in the rapidly advancing field of AI.
“AI presents unique copyright concerns for businesses, primarily because it can produce content that closely resembles or ‘copies’ human-generated content, such as articles, publications, images and music,” Star Kashman, a cybersecurity and privacy lawyer, told PYMNTS in an interview.
“The use of AI-generated creations raises complex questions about ownership and copyright, as these creations often use datasets that include copyrighted works of art and may infringe upon these copyrights.”
In a proposed class-action lawsuit, authors Brian Keene, Abdi Nazemian, and Stewart O’Nan claimed that their books were among the 196,640 titles in a dataset used to train NeMo, an AI platform designed to mimic ordinary written language. The dataset was taken down in October following allegations of copyright infringement. The authors assert that Nvidia acknowledged using this dataset to train NeMo and thereby infringed their copyrights.
The legal challenge demands compensation for the U.S.-based authors whose copyrighted materials were employed in training NeMo’s advanced language models over the past three years. Such language models underpin AI solutions like NeMo, which according to Nvidia offers a quick and cost-effective gateway to generative AI technology.
At the center of the issue is the Books3 dataset, which, according to the lawsuit, contains 108 gigabytes of data copied from Bibliotik, a private tracker and one of many “shadow library” sites known for distributing large amounts of copyrighted material without a license. The authors are seeking financial compensation and the destruction of any infringing copies of their works that Nvidia made or used.
Nvidia did not immediately respond to a request for comment from PYMNTS. Nvidia told The Wall Street Journal, “We respect the rights of all content creators and believe we created NeMo in full compliance with copyright law.”
The Nvidia lawsuit is one of several legal disputes involving copyrighted material and AI. In another case, The New York Times sued Microsoft and OpenAI in December, claiming they illegally used the newspaper’s articles to train their AI chatbots.
Microsoft, however, argues that the case is merely a narrative fabricated by the Times. The software giant recently compared the lawsuit to Hollywood’s alarm over VCRs in the 1970s, when studios feared that the ability to record TV would ruin the entertainment business.
Determining whether AI-generated content is eligible for intellectual property protection is complex. While the U.K.’s Copyright, Designs and Patents Act 1988 recognizes copyright in computer-generated works that have no human author, the U.S. lacks clear guidance and is awaiting a decision on the matter from the U.S. Court of Appeals for the D.C. Circuit.
These differences in copyright laws could significantly affect the revenue models of both AI firms and content creators, raising questions about originality and the protection of style in AI-produced works, Ryan Abbott, professor of law and health sciences at the University of Surrey, recently told PYMNTS as part of the “TechREG Talks” series.
To avoid AI copyright problems, companies should ensure the data used for training is in the public domain, properly licensed, or covered by fair use, Kashman said. Companies also need to obtain permission for any copyrighted material they intend to use.
Kashman said that for businesses, staying up to date with the latest laws on AI and copyright and creating specific rules for using AI-generated content is crucial. Following the law and ethical guidelines is essential, as is ensuring that any copyrighted material in training datasets is used only with proper permission.
“In AI, copyright infringement occurs most often with content generation tools, deep learning models trained on copyrighted content without authorization, and automated content aggregation platforms,” Kashman said. “Managing these concerns requires a proactive approach to copyright management and compliance, ensuring that AI technologies are employed ethically and legally, so businesses are protected.”