Protecting Commercial AI Rights is Harder than You Think – EU Edition – The Scholarly Kitchen

In the quaint days of 2019, when the EU issued its Digital Single Market Copyright Directive (DSM), much attention was focused on issues such as news publishers’ rights and the obligations of platforms to take down infringing materials. It seemed that outside of STM publishing, not many people engaged in discussions around the scope of the text and data mining (TDM) exceptions contained in Articles 3 (non-commercial research) and 4 (commercial research).

Generative AI changed this dynamic. After all, text and data mining is the technological approach by which generative AI systems are trained. As noted in the current draft of the EU’s AI Act, “[t]ext and data mining techniques may be used extensively in this [training] context for the retrieval and analysis of such content, which may be protected by copyright and related rights.” The current draft of the AI Act explicitly requires compliance with the DSM to access the EU market, regardless of the country in which the copyright-relevant acts of training occur.

There are, however, several open questions about the DSM, and especially about the rights reservation language in Article 4 for commercial TDM, which are likely to confound rights holders and AI companies alike.

DSM Articles 3 and 4 Revisited

Article 3 of the DSM, which is similar in scope to the exception that was (and is) in place in then-EU member the United Kingdom, allows non-commercial TDM on lawfully acquired content by research organizations. As research organizations are typically publishers’ customers or use content available under open access licenses, STM publishers were generally supportive of this exception.

Article 4, which created a commercial exception subject to rights reservation by the copyright owner, seemed more problematic given that copyright is an “opt in” regime. However, at the time (and based on conversations I had with EU officials), the law seemed to impute a distinction between professional content, placed on the websites owned and controlled by publishers, and non-professional content such as Reddit comments and Facebook posts. My understanding is that the EU saw little harm in expecting that owners of the former could reserve their rights when desired, while owners of the latter were unlikely to care.

Recent lawsuits have increased my concern about this issue, especially now that text and data mining is being used as part of large-scale commercial AI.

Challenges of Rights Reservation

The rights reservation language of Article 4 provides:

  • The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online. (italics added)

In explanatory text, the DSM states:

  • In the case of content that has been made publicly available online, it should only be considered appropriate to reserve those rights by the use of machine-readable means, including metadata and terms and conditions of a website or a service. Other uses should not be affected by the reservation of rights for the purposes of text and data mining. In other cases, it can be appropriate to reserve the rights by other means, such as contractual agreements or a unilateral declaration. Rightholders should be able to apply measures to ensure that their reservations in this regard are respected. (italics added)

This language leaves several questions unanswered. What does “machine readable” mean in this context? After all, the TDM exception is an exception to allow very smart machines to “read” and process information, so isn’t anything on a website “machine readable?” What level of granularity is required under DSM Article 4? Is a copyright notice sufficient? What about the words “all rights reserved?” Would it be enough to include “CC BY-NC” in metadata fields? Or does it need to state “commercial rights are expressly reserved under Article 4 of the DSM?” The ambiguity is troubling.
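To make the granularity question concrete, here is a minimal Python sketch of how a crawler might look for the candidate signals listed above on a single web page. It is an illustration under stated assumptions, not a statement of what Article 4 requires: the `tdm-reservation` meta tag follows a community proposal (the W3C community group's TDM Reservation Protocol), and the function name, the returned keys, and the simple substring checks are all hypothetical choices made for this example.

```python
from html.parser import HTMLParser


class ReservationSignalParser(HTMLParser):
    """Collects <meta> name/content pairs and text fragments from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}   # lowercased meta name -> content value
        self.text = []   # text fragments found in the page

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            name = (attr.get("name") or "").lower()
            if name:
                self.meta[name] = attr.get("content") or ""

    def handle_data(self, data):
        self.text.append(data)


def find_reservation_signals(html):
    """Report which candidate reservation signals appear on the page.

    Hypothetical helper: whether any of these signals amounts to an
    "appropriate" reservation under Article 4 is the open legal question.
    """
    parser = ReservationSignalParser()
    parser.feed(html)
    text = " ".join(parser.text).lower()
    meta_values = " ".join(parser.meta.values()).lower()
    return {
        # Assumption: meta tag convention from the TDM Reservation Protocol proposal
        "tdm_meta_tag": parser.meta.get("tdm-reservation") == "1",
        # A non-commercial Creative Commons license named in any metadata field
        "license_nc_in_metadata": "by-nc" in meta_values,
        # A conventional human-readable notice in the page text
        "all_rights_reserved_text": "all rights reserved" in text,
        # An explicit statement that mentions Article 4 and a reservation
        "explicit_article_4_text": "article 4" in text and "reserved" in text,
    }


sample = (
    '<html><head><meta name="license" content="CC BY-NC 4.0"></head>'
    "<body><p>© 2024 Example Press. All rights reserved.</p></body></html>"
)
print(find_reservation_signals(sample))
```

A page like the sample carries two of the four signals and lacks the other two, which is the problem in miniature: a machine can detect each signal easily enough, but nothing in the DSM says which of them counts.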

Where is the Content?

Even the foregoing unanswered questions assume the content is in the control of the rights owner. There are several situations in which this is not true.

First, there is pirated content. It has been well documented that some AI companies have trained systems on illegal sets of content. Would an EU-based court hold that the failure to have rights reservation language on illegal content means that such rights have been waived? That is highly unlikely, so let’s move to the next category.

Content may be legally posted online over the objections of the copyright owner. For example, in the recent case Am. Soc’y for Testing and Materials v. Public.Resource.Org, Inc., 82 F.4th 1262 (D.C. Cir. 2023), the Court of Appeals for the District of Columbia Circuit ruled that the non-commercial posting of standards incorporated by reference into law is fair use. It is safe to assume that the entity posting the standards over the objection of copyright owners will not take steps to reserve the copyright owner’s commercial AI rights in the EU. Would an EU-based court hold that the failure to reserve rights on a “non-commercial” website, where the content is posted over the copyright owner’s objections, constitutes a waiver? Doubtful, but murky.

Let’s take this further. What about preprint servers? Currently, several journal publishers allow authors to post preprints of author manuscripts on servers, notwithstanding the fact that copyright often is subsequently transferred to publishers. Does the preprint server need to expressly reserve TDM rights, or is it enough that they are reserved on the version of record? How might an AI company know that the preprint and the version of record are the same work? Similar questions are raised with respect to other aggregation sites such as PubMed Central and institutional repositories.
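As a rough illustration of why the "is it the same work?" question is hard to answer mechanically, the following Python sketch compares two pieces of text (say, a preprint title or abstract and its counterpart in the version of record) using the standard library's `difflib`. The function names and the 0.9 threshold are hypothetical choices made for this example, not an established matching standard.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse punctuation and whitespace so formatting differences are ignored."""
    return re.sub(r"\W+", " ", text.lower()).strip()


def similarity(a, b):
    """Return a similarity ratio between 0 and 1 for two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same_work(preprint_text, record_text, threshold=0.9):
    """Guess whether two texts are versions of the same work.

    The threshold is an arbitrary illustrative cutoff (an assumption).
    """
    return similarity(preprint_text, record_text) >= threshold


print(likely_same_work("Text and Data Mining: A Review",
                       "text and data mining - a review"))
```

Titles and text often change between preprint and publication, so any such cutoff will miss some true matches and accept some false ones. A heuristic like this shows the problem more than it solves it.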

Will this Change?

Legislative changes, like lawsuits, are often a lagging indicator of the times. In 2019, the legislators in the EU seemed focused on commercial and non-commercial research aspects of TDM. They were likely not worried that well-funded commercial entities were developing AI systems through mass infringement and ignoring Article 4 rights reservation clauses, nor did they seem focused on how copyright-compliant AI companies might be able to identify reservations for content on multiple sites.

In an ideal world the EU might revisit Article 4, but that is unlikely to happen. In the meantime, rights owners should reserve AI rights as explicitly as possible, as granularly as possible, using machine- and human-readable language, and should require licensees who republish their content online to do the same. And with the AI Act removing any ambiguity about compliance requirements, AI companies seeking to train on copyrighted content might do best to license content directly from rightsholders. Relying on the absence of rights reservation language is risky, unless the AI developer is absolutely certain that it is using an official version.
