Google IP boss: We don’t believe we should pay for AI …

Google AI Overviews shown in front of a Google webpage. Picture: Shutterstock/DIA TV

A leading Google public affairs executive has admitted that the tech giant does not believe it should have to pay to use unpaywalled content for AI training.

Roxanne Carter told the Lords Communications and Digital Committee on Tuesday that Google wants to create “wholly new content” and not replicate publishers’ work in products like AI Overviews and AI Mode.

She also insisted that publishers can opt out of AI training without being penalised in search, but did not comment when asked whether they can likewise opt out of their content being scraped for AI Overviews without being penalised.

Committee chair Baroness Keeley asked Carter if she agreed that rightsholders in the UK creative industries should receive payment or a revenue share when their work is used by AI systems.

Carter responded: “When it comes to training AI models on freely available content that is available on the open web, we do not believe that we should license.

“What the AI model is trying to do is analyse huge amounts of data to identify patterns and statistical relationships between words, language concepts. It is not an information retrieval system. It is not a database. It is not looking to make copies. What it’s trying to do is develop new tools to then produce wholly new content.”

Carter said they are, however, “seeing the market develop” for access to content such as archive content, specialised datasets, or other work that has been opted out of AI training.

Baroness Keeley repeated the question about payment, saying it had not been answered.

Carter answered: “I think it depends on if you’re asking us to pay for every single piece of content for the training of the model, and if that content is freely available on the web, then no.

“But if you’re asking us whether or not there should be deals for access to content that might be off-platform or archive, then, yes, absolutely, and those deals are being done.”

She later said: “We license for content… We don’t believe that we need a licence to train, but we are doing licences for access.”

And she said at the end of the session: “Just to reiterate, it’s not a question of that we won’t pay. I think we need the certainty as to what we are paying for.”

In December Google announced AI deals with publishers, although it did not describe them as licensing agreements.

The tech giant said it pays publishers for “extended display rights and content delivery methods like APIs”.

Publishers involved include The Guardian, Der Spiegel, The Washington Post, The Washington Examiner, The Times of India and El País.

Carter said: “That’s something that we’re keen to roll out to more news publishers over the coming months.”

Culture Secretary Lisa Nandy told the committee later on Tuesday afternoon that deals are working “very well for the bigger players. They don’t necessarily work well for the smaller players. Now, we’re as concerned about them as we are about the bigger players…”

Google’s Carter argued that the current system through which website publishers have to opt out of being crawled by particular AI bots using robots.txt signals is “effective” and has “worked for many years”.

She said it means “the large media sites, for example, [are] able to opt out as well as your small personal blog”.

Asked about fears from publishers that if they opt out of Google’s AI bots they will not appear as they should in Google search results, Carter said this is not true.

She said: “You can remain on search and yet still opt out of your content being used for AI training.

“When we were designing our generative AI tools, we launched a consultation, and we spoke very widely with the web ecosystem, and one of the clear pieces of feedback that we got was they want to remain on search, but they want to have the control to opt out of their content being used for AI training.

“So in 2023 we launched something called Google Extended, which sits in robots.txt, but is a standalone control that allows you to say yes to search, no to AI training.”

However, although Google Extended lets publishers stop AI chatbot Gemini and AI development platform Vertex from scraping their content, it does not stop sites from being accessed and used in Google’s AI Overviews.

To avoid this, publishers would have to opt out of being scraped by Googlebot, which indexes for search.
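The distinction Carter described can be illustrated with a short sketch. It assumes a hypothetical robots.txt that blocks only the `Google-Extended` token while leaving every other crawler, including Googlebot, unrestricted, and it uses Python's standard-library robots.txt parser rather than anything Google publishes. The URL, the rules and the `can_fetch` helper are illustrative assumptions, not taken from the hearing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: say no to the Google-Extended token (AI training)
# while leaving all other crawlers, including Googlebot, free to crawl.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let `user_agent` crawl `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/story"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, can_fetch(ROBOTS_TXT, agent, page))
```

Under these assumed rules the page stays crawlable by Googlebot, which is why, as described above, blocking `Google-Extended` alone would not keep it out of features built on the search index.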

Asked directly if publishers can opt out of AI Overviews, Carter did not answer. She said: “This is an issue that is live, ongoing discussion with the CMA,” referring to the Competition and Markets Authority, which recently found Google has “strategic market status” via its search products including AI Overviews and separate product AI Mode. The CMA is now consulting on potential rules and regulations for Google.

“So this is a ongoing process with them, and I’m not able to give more details at this time.”

Google tells publishers that don’t want their content to appear in AI Overviews to use the NOSNIPPET meta tag and the DATA-NOSNIPPET attribute to limit visibility of specific pages or parts of a page, similar to how they could previously control whether they appeared as featured snippets at the top of results.
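As a rough illustration of those two controls, the sketch below generates the markup in Python. The page-level form is a robots meta tag with the value `nosnippet`, and the fragment-level form is a `data-nosnippet` attribute on an element such as a `span`. The helper name `wrap_nosnippet` and the sample text are hypothetical, and how far either control affects AI Overviews rests on Google's guidance as reported above.

```python
from html import escape

# Page-level control: placed in the <head>, asks for no snippet of this page.
NOSNIPPET_META = '<meta name="robots" content="nosnippet">'


def wrap_nosnippet(text: str) -> str:
    """Wrap one fragment so only that part of the page is excluded from snippets."""
    return f"<span data-nosnippet>{escape(text)}</span>"


if __name__ == "__main__":
    print(NOSNIPPET_META)
    print(wrap_nosnippet("Subscriber-only analysis"))
```

The meta tag applies to a whole page, while the attribute lets a publisher hold back selected passages and leave the rest eligible for snippets.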

Nandy said publishers had “particular concern” about search, noting: “Opting out means opting out of the search engine, then you’ve effectively killed off your business before you’ve started.”

She said AI Overviews have “emerged as a stronger concern… I don’t think there’s a concern, particularly, around the AI summaries. There’s a concern when the AI summaries become competitor to the original products, using the original product in order to create another article that then is used by people instead of the original.”

She said the arrival of AI Overviews illustrated how “fast moving” these issues are and “that new technologies are being developed all the time that are throwing up new challenges that people debating this three or four years ago wouldn’t even been able to foresee”.

Google aiming to create ‘wholly new content’, not replicate

Carter was also asked about concerns that content is being replicated by AI products like Google’s AI Overviews in search results after models have been trained on it.

She responded: “Certainly from the Google side, when we are designing these tools, the purpose is to then create wholly new content. To get to that point, you need to feed the model really huge data sets, really, really, really massive, and then that guards against bias, it ensures different perspectives.”

She added that AI products therefore “shouldn’t be replicating” although there were “rare” cases of memorisation when the models were first launched.

This has been managed by output filters and other technical controls, she said, so “the models are now performing much better”.
