Training AI With TV & Film Content: How Licensing Deals Look

Photo illustration of a robot's hand dropping a coin into a human palm
Illustration: Variety VIP+; Adobe Stock

In this article

  • Demand for high-quality video to train video generation models is significant as more developers enter the field
  • Film & TV content licensing hasn’t materialized for Hollywood studios, but deals are forming among smaller global distributors
  • Early licensees are considering multiple factors when preparing a video dataset — and stipulating how the data can be used

As AI developers rush to acquire data to train AI models, there is a substantial need for video content.

Training the fast-growing set of AI video models requires up to “millions of hours of video,” a source told VIP+, citing comments by multiple AI companies.

There’s a particular need for high-quality content that isn’t already available on the internet to be scraped, and that could come from licensing publisher archives. Training video models on synthetic data instead could introduce problems of its own, because many outputs of today’s video models still don’t perfectly simulate 3D reality.

The video licensing opportunity could expand as multiple developers race to improve video generation models capable of producing sophisticated outputs in an increasingly competitive field. Strong rival entrants in the wake of Sora and Runway’s model update Gen-3 have included Kling and Luma AI’s Dream Machine. There are many more video models in development, numbering as many as 65, a source told VIP+.

Yet known licensing activity for AI training has been scant for video content. Even as developers continue to scrape video data, there’s apparent willingness to pay to license it. So far, deals have primarily involved stock video providers Shutterstock and Getty Images. Licensing talks have also reportedly occurred with other large-scale holders of video clips such as Photobucket, per Reuters in April.

Publicly acknowledged deals for film and TV content to train video generation models have been nonexistent, though talks have occurred. Alphabet, Meta and OpenAI have engaged in talks with Hollywood studios, Bloomberg reported in May.

Video Licensing Emerging Among Smaller Distributors
While it’s likely to take time for the major studios to move on any AI licensing opportunity, some smaller-scale licensing deals for higher-quality film and TV content are quietly gearing up.

Calliope Networks has been aggregating content licenses to build a catalog of movies, TV episodes and photos, among other types of data, to be used as a high-quality dataset to train AI models.

Through partnerships with content owners, its growing catalog has now reached over 17,000 hours of film and TV content from more than 10,000 titles, said Calliope Networks CEO and co-founder Dave Davis. The titles come from smaller distributors based around the world and range in length from film shorts of a few minutes to episodic TV to full-length feature films. Davis began aggregating content for Calliope’s video dataset in earnest after OpenAI unveiled Sora in February.

Calliope Networks is currently in deal talks about licensing its catalog with several AI companies building video generation models, including some of the biggest companies in the space. Davis anticipates closing multiple major licensing deals sometime this fall, though he said the licensees may not want to be named.

Preparing AI Video Datasets for AI Training
In curating and preparing its dataset to be useful for AI training, Calliope has emphasized the high quality (fidelity) and diversity of the content.

“We’ve kept an eye on what is useful for AI companies. We’ve tried to make it into a very diverse dataset, so there’s lots of variety of locations, objects, activities,” said Davis, adding that the company purposefully “overindexed” on documentary fare for its diversity. He added that all content in the catalog is HD or better, although that hasn’t been stipulated by AI companies, meaning older content could still be of value.

AI companies that license content directly from producers would be getting clean, high-quality data files that aren’t accessible (“publicly available”) on the web. “Developers will pay for content that’s difficult to source otherwise. For example, a catalog of global documentaries is not stuff that’s generally sitting up on YouTube and easy to scrape,” said Davis. “When companies get files directly, they also eliminate a lot of hassle with artifacts like pop-up ads that they have to process.”

Setting Terms for Video Licensing Deals
High-quality film and TV content would likely come at a significant premium to stock video footage. Calliope’s list price for HD content is set at $6.25 per minute, with an additional premium for 4K or 3D content, said Davis.

By comparison, the video content licensing deals that have occurred between tech companies and stock video providers have reportedly paid out on a price-per-clip or price-per-minute basis, with rates of more than $1 per short-form video, said Reuters. Adobe is reportedly offering to pay its network of photographers and videographers for videos to train its own text-to-video model at a rate of $120 for about 45 minutes, or less than $3 per minute, per a Bloomberg report in April.
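
For readers who want to check the arithmetic behind that comparison, here is a rough sketch in Python. It uses only figures quoted above; the function and variable names are ours, and the catalog total is a purely hypothetical upper-level illustration that assumes every one of the 17,000 hours were licensed once at the HD list price, which is not something Calliope has said any deal would do.

```python
# Figures quoted in the article. Everything else here is illustrative.
CALLIOPE_HD_RATE_PER_MIN = 6.25   # Calliope's list price, USD per minute of HD content
CALLIOPE_CATALOG_HOURS = 17_000   # catalog is "over 17,000 hours", so this is a floor
ADOBE_PAYMENT_USD = 120.0         # reported Adobe payment
ADOBE_MINUTES = 45.0              # for "about 45 minutes" of video


def rate_per_minute(payment_usd: float, minutes: float) -> float:
    """Implied price per minute for a flat payment covering a given runtime."""
    return payment_usd / minutes


def list_price_total(hours: float, rate_per_min: float) -> float:
    """Hypothetical total if a whole catalog were licensed once at a flat per-minute rate."""
    return hours * 60 * rate_per_min


adobe_rate = rate_per_minute(ADOBE_PAYMENT_USD, ADOBE_MINUTES)
premium_multiple = CALLIOPE_HD_RATE_PER_MIN / adobe_rate
catalog_at_list = list_price_total(CALLIOPE_CATALOG_HOURS, CALLIOPE_HD_RATE_PER_MIN)

print(f"Adobe implied rate: ${adobe_rate:.2f} per minute")
print(f"Calliope HD list price vs. Adobe rate: {premium_multiple:.2f}x")
print(f"17,000 hours at HD list price (hypothetical): ${catalog_at_list:,.0f}")
```

Negotiated prices, 4K or 3D premiums and partial-catalog deals would all move these numbers, so the output is best read as a scale check, not a valuation.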

Davis further argued that negotiating deals directly, rather than selling through marketplaces, is critical to maximizing value for content owners, since marketplaces will tend to commoditize content and lead to a race to the bottom on price.

Calliope’s standard terms limit AI training to one model, with a one-year training period, meaning a developer would not be permitted to use the data to train a new model, even if a new training cycle fell within the one-year period. This ensures that if an AI company wanted to train a new model on the same content, it would need to go back and license from the publisher again.

Model updates happen frequently but still require developers to train models from scratch. “New models are trained from scratch, like GPT-5 is being trained from scratch. This is how it works. It’s a new training run,” Ed Newton-Rex, CEO at Fairly Trained, told VIP+ in April.

So far, major studios haven’t moved on licensing video for AI training. Among other factors, a predominant fear among video publishers is that, by licensing, they will be contributing to the tech that replaces them in the market, which some have analogized to the early days of studios licensing high-value content to Netflix.

Yet the reality remains that in the meantime, AI video models are being commercially and capably built through massive data scraping regardless of publisher participation and without their compensation, consent or credit. Recently, 404 Media reported on internal spreadsheets detailing Runway’s model training data, which included copyrighted material from major publishers and creators, drawn from YouTube and movie piracy sites.

Davis argues for licensing, since publishers can at least expect to negotiate fair compensation that becomes a new revenue stream. Perhaps more important, it can give publishers a collaborative voice in the way AI development proceeds and at least some say in how their content is used, such as restrictions barring AI outputs from re-creating an exact scene.

“Everybody’s concerned about AI and what it means for the future of creators. But the question is, should we perpetuate a world where content is used without a license or compensation?” said Davis. “Or do we try to be a part of the conversation and change the outcome for producers and creators for the better by engaging?”

The choice publishers should make isn’t obvious, but three paths appear open: license content to AI companies, sue on claims of copyright infringement or wait and withhold content in a losing battle against data scraping.
