
AI training and copyright laws

  • Writer: Nick Redfearn
  • 3 hours ago
  • 2 min read

A key issue in AI LLM training is the increasingly common adoption of "opt-out" systems in copyright law reforms. These allow copyright owners to specifically refuse to have their works used in LLM training. Many countries are keen to enable AI LLM training within their borders in order to attract AI tech companies.

 

This is usually called a text and data mining (TDM) exception. It allows AI LLM operators to use copyright-protected data for training without consent, unless copyright owners explicitly opt out. Singapore has one and Hong Kong is proposing one. The EU has a broad TDM exception whose "opt-out" must be expressed by "machine-readable means", which EU courts have interpreted as requiring a robust, automated system.
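By way of illustration only (none of the reforms mandate this particular mechanism), the most widely used machine-readable signal today is a site-level robots.txt file that asks known AI-training crawlers not to collect a site's content. The user-agent tokens below are those published by OpenAI, Google and Common Crawl; a directive like this is only a request, and only covers crawlers that choose to honour it:

  # Illustrative robots.txt entries asking AI-training crawlers to stay away
  # OpenAI's training crawler
  User-agent: GPTBot
  Disallow: /
  # Google's token for AI training uses of content
  User-agent: Google-Extended
  Disallow: /
  # Common Crawl, whose datasets feed many LLM training sets
  User-agent: CCBot
  Disallow: /

A signal like this attaches to a particular website rather than to the work itself, which is part of the tagging problem discussed in point 4 below.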

 

There are many problems with using opt-outs.

 

  1. Firstly, an opt-out is national. Does Elton John have to opt out in every country, or will his music be used for training in Singapore but not in the UK, where he lives and will probably opt out? Or is an opt-out worldwide?

  2. Secondly, copyright law grants owners exclusive rights, but opt-out rules reverse that position, requiring owners to take action to avoid losing those rights.

  3. The impacts are arguably huge: LLMs can in effect digest the back catalogue of every publishing house, undermining their licensing business models.

  4. What exactly is opting out? Does an artist need to say it out loud, register it in a database, send a notification to someone, or put watermarks in their works? Technology-based solutions are usually suggested, meaning some way for AI systems to determine that a specific work is not permitted to be copied. Artists must therefore ensure all their works are tagged, watermarked or carry the relevant metadata. The challenge is the vast number of existing works, earlier editions and versions without the tech solution embedded.

  5. Lastly, many web-hosted databases of works contain their own service terms, many of which prohibit reproduction.

 

The dominant AI player, the US, has no TDM exception but a huge number of AI cases. These argue that training LLMs on copyrighted works without permission constitutes direct, vicarious and contributory copyright infringement. AI developers have responded that training their models is "fair use", transformative use or intermediate use. There is no definitive answer, and many content sectors are in live litigation now.

 

The danger is that countries start to take different approaches. The UK's and Hong Kong's continuing analyses of the area are worth following.

 

