CMA says UK publishers should be allowed to opt out of …


Google AI Overviews shown in front of a Google webpage. Picture: Shutterstock/DIA TV

The UK’s competition watchdog has said publishers should be able to opt out of their content being used in Google’s AI Overviews without it affecting how they appear in search engine results.

The Competition and Markets Authority also said the way Google decides how content is ranked in search results, including AI Overviews and AI Mode, should be made “fair and transparent”.

The CMA’s proposals come after it designated Google as a tech platform with “strategic market status” for regulation in October.

Google, on which more than 90% of online searches in the UK are carried out, is the first tech platform to face conduct requirements under new powers in the Digital Markets, Competition and Consumers Act 2024.

The proposals are now out for consultation until 25 February, after which a final decision will be taken.

Sarah Cardell, chief executive of the CMA, said the proposals would “give UK businesses and consumers more choice and control over how they interact with Google’s search services”.

She added: “They would also provide a fairer deal for content publishers, particularly news organisations, over how their content is used in Google’s AI Overviews.”

According to Press Gazette analysis, Google makes at least £20bn a year in revenue from UK advertisers.

CMA: Publishers ‘do not have sufficient choice’ over how Google uses their content

The CMA said publishers “currently do not have sufficient choice” over how their content is used in Google’s AI-generated responses.

And it said they have “no realistic option but to allow their content to be crawled” because of Google’s market dominance.

The CMA said: “By not providing sufficient control over how this content is then used, Google can limit the ability of publishers to monetise their content, while accessing content for AI-generated responses in a way that its competitors cannot match.”

The effect of all the limitations imposed on publishers by Google, the CMA said, “can restrict publishers’ abilities to invest in new high-quality content, which leads to detriment for end users. This could be particularly significant for certain sectors, such as news”.

Google tells publishers that don’t want their content to appear in AI Overviews to use the NOSNIPPET meta tag and the DATA-NOSNIPPET attribute so they remain linked in organic search without summaries of their content appearing either there or in search generative AI features.

But the CMA said that “use of this control is likely to affect [publishers’] organic search ranking and reduce traffic because publishers lose the descriptive text that helps users decide whether to click onto the website”.

It cited a Google study (carried out before AI products were present) that found reducing detail on publisher content in search reduced traffic by 45%.
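For readers unfamiliar with the snippet controls mentioned above, the following is a minimal sketch of how they typically appear in a page: a robots meta tag carrying the nosnippet value, and a data-nosnippet attribute on an individual element. The page markup and the SnippetControlFinder class are hypothetical and written only for illustration; the script simply reports which controls a page contains and says nothing about how Google itself processes them.

```python
from html.parser import HTMLParser

# Hypothetical article markup; the text and structure are made up.
PAGE = """\
<html>
<head>
  <meta name="robots" content="nosnippet">
</head>
<body>
  <p>Opening paragraph.</p>
  <div data-nosnippet>Passage the publisher wants kept out of snippets.</div>
</body>
</html>
"""


class SnippetControlFinder(HTMLParser):
    """Hypothetical helper that records which snippet controls a page uses."""

    def __init__(self) -> None:
        super().__init__()
        self.page_level = False  # robots meta tag with the nosnippet value
        self.element_level = 0   # number of elements carrying data-nosnippet

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # valueless attributes map to None
        if tag == "meta" and (attr_map.get("name") or "").lower() == "robots":
            content = attr_map.get("content") or ""
            directives = {part.strip().lower() for part in content.split(",")}
            if "nosnippet" in directives:
                self.page_level = True
        if "data-nosnippet" in attr_map:
            self.element_level += 1


finder = SnippetControlFinder()
finder.feed(PAGE)
print("Page-level nosnippet:", finder.page_level)
print("Elements marked data-nosnippet:", finder.element_level)
```

In practice a publisher would normally pick one of the two: the meta tag covers the whole page, while the attribute covers only the marked passage. Both are shown together here purely to keep the example short.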

Publishers can also use the Google-Extended robots.txt control to opt out of allowing AI chatbot Gemini and AI development platform Vertex to scrape their content, but this does not stop sites from being accessed and used in Google’s AI Overviews or the main search results.
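As a rough illustration of the robots.txt control described above, the sketch below uses Python’s standard-library robots.txt parser to read a hypothetical file that disallows the Google-Extended token while leaving all other crawlers unrestricted. The domain, path and allowed helper are made up for the example, and the parser only shows how a standards-following crawler would interpret the rules, not how Google applies them internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher site.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Made-up article URL used only for the check below.
URL = "https://example.com/news/story"


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print("Google-Extended may fetch:", allowed("Google-Extended", URL))
print("Googlebot may fetch:", allowed("Googlebot", URL))
```

The rule blocks only the Google-Extended token, so the general search crawler is still allowed, which matches the limitation described above: content remains available to search and therefore to AI Overviews.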

The CMA said there was a “lack of clarity over the scope” of Google Extended which “prevents publishers from making an informed choice as to how their content is used in broader generative AI services”.

The regulator therefore said Google should “provide publishers with effective controls to withhold their search content from being used in” the training and grounding of its broader generative AI services like Gemini and Vertex as well as the grounding of AI Overviews and AI Mode.

Google defines grounding as “the ability to connect model output to verifiable sources of information. If you provide models with access to specific data sources, then grounding tethers their output to these data and reduces the chances of inventing content”.

The CMA said “any changes necessary” should be made to Google Extended to enable publishers to opt their search content out of AI training and grounding.

And “clear and detailed” information should be published about the scope of Google Extended’s control, including its purpose, the key exceptions and limitations of it, and what publisher content it covers.

The CMA added that, in relation to the grounding of AI Overviews and AI Mode, Google should introduce a new control so publishers can opt out at both directory level (entire website) and page level (for example on content where generative AI summaries would most impact clickthroughs). It said Google must not “take any actions or omissions that would frustrate the effectiveness of the controls”.

The CMA also said Google must not “maintain or introduce ranking signals” designed to downrank publishers in general search results because they have opted out of their content being used in AI Overviews and AI Mode.

Such publishers should also not have their content “presented or displayed differently” in general search.

And Google must not “attempt to circumvent any publisher’s choice to withhold its search content by acquiring that content through other sources”, for example via third-party scrapers, which are believed to have been used by other AI platforms.

The CMA said it would be “reasonable”, however, for Google to “acquire such content through open-source datasets, where these datasets have obtained content legally”.

Publishers have ‘limited transparency’ over how Google uses their content

The CMA also said publishers “have limited transparency over how their content, gathered for search, is used by Google in AI-generated responses and how users engage with that content.

“This makes it harder for publishers to make informed decisions, including on whether to allow Google to use their content and for which purposes.”

It is proposed that Google must “publish clear and detailed information explaining how search content is used for the training and grounding of its generative AI services and features”, for example on its Google Search Central website.

The CMA also said Google should make sure the “effect and scope” of the ways publishers can control the use of their content “is transparent and can be well-understood”.

Publishers should also be given “clear and detailed metrics on user engagement with their search content where it is used in its search generative AI features”.

That data should include user impressions (including when a publisher’s content is attributed in AI Overviews or AI Mode), clickthroughs from links in those AI features so publishers can assess the quality of those clicks, and clickthrough rate (the percentage of users who click through).

The data should be provided on a disaggregated basis for each publisher, fully disaggregated from other elements of general search, and through a commonly accessible platform such as Google Search Console.
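To show how such metrics fit together, here is a minimal sketch that computes clickthrough rate separately for each search surface. All figures are invented for illustration, the surface names are hypothetical labels rather than fields from any Google report, and the rate is assumed to be clicks divided by impressions.

```python
def clickthrough_rate(clicks: int, impressions: int) -> float:
    """Return clicks as a percentage of impressions (0.0 if no impressions)."""
    if impressions <= 0:
        return 0.0
    return 100.0 * clicks / impressions


# Made-up figures for one publisher, kept separate for each surface.
report = {
    "ai_overviews": {"impressions": 20_000, "clicks": 300},
    "ai_mode": {"impressions": 5_000, "clicks": 100},
    "general_search": {"impressions": 50_000, "clicks": 2_500},
}

rates = {
    surface: clickthrough_rate(row["clicks"], row["impressions"])
    for surface, row in report.items()
}

for surface, rate in rates.items():
    print(f"{surface}: {rate:.1f}%")
```

Keeping the AI features separate from general search, as the proposal describes, is what would let a publisher compare these rates rather than seeing a single blended number.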

Why Google needs better source attribution in AI Overviews and AI Mode

The CMA also said attribution of content in AI products like AI Overviews “is important for both consumers and publishers. For consumers, attribution can allow them to test the veracity of AI-generated content.

“For publishers, attribution can help ensure that consumers are aware of the sources of content, which in turn can allow them to sustain brand value and the creation of new material.”

The watchdog said it had seen “evidence that prominent and accurate attribution… can have a significant impact on publisher outcomes including clickthrough rates”.

The CMA said Google has been taking “concrete steps” to improve attribution but that “publishers’ inability to withdraw content contributes to insufficient incentives for Google to assure its attribution of publishers in search generative AI features remains accurate and sufficiently prominent”.

The new proposals state that Google should “take reasonable steps to ensure” content is sufficiently and accurately attributed in AI Overviews and AI Mode and that it must publish “clear and detailed information” explaining how it is doing so.

That explanation should include how Google identifies what content to attribute and the steps it takes to monitor the accuracy of its attributions.

The CMA said: “Given the way generative AI responses are created, we recognise that search generative AI features cannot always be completely accurate; however… we expect Google to outline the steps it takes to improve the factuality of search generative AI features, how it deals with factual inaccuracies when they arise, and metrics on factuality that would help publishers to take a decision on whether to give consent for Google to use their search content in search generative AI features”.

The CMA said it recognised its collective proposals would “not address all the issues currently facing publishers” but said the changes “would be an important precondition for addressing broader concerns about publishers’ ability to negotiate reasonable terms with Google for use of their search content”.

Google’s response to CMA proposals

Ron Eden, Google’s principal for product management, said in a statement: “For years, we have provided web publishers with a range of controls, based on open standards like robots.txt, to manage how their content appears in Search. As technology has evolved, so have our tools.

“We added controls for things like Featured Snippets and image previews (which also apply to AI Overviews). And more recently, we introduced Google-Extended, a new control that lets websites manage how their content is used to train our Gemini models.”

Eden continued: “Building on this framework, and working with the web ecosystem, we’re now exploring updates to our controls to let sites specifically opt out of Search generative AI features. Our goal is to protect the helpfulness of Search for people who want information quickly, while also giving websites the right tools to manage their content. We look forward to engaging in the CMA’s process and will continue discussions with website owners and other stakeholders on this topic.

“Any new controls need to avoid breaking Search in a way that leads to a fragmented or confusing experience for people. As AI increasingly becomes a core part of how people find information, any new controls also need to be simple and scalable for website owners.

“We’re optimistic we can find a path forward that provides even more choice to website owners and publishers, while ensuring people continue to get the most helpful and innovative Search experience possible.”

NMA urges ‘full physical separation of crawlers’

Owen Meredith, chief executive of the News Media Association, which represents major national and regional publishers in the UK, said: “We strongly welcome the CMA’s proposals to allow news publishers to control whether their content is used in AI Overviews while remaining visible in Google Search results and to ensure fair ranking of publisher content.

“The CMA rightly recognises that Google is able to extract valuable data without reward, harming publishers and giving the company an unfair advantage over competitors in the AI model market, including British start-ups.”

However, Meredith said that “given the seriousness of Google’s anti-competitive impact on news media and the AI market, we are sceptical that weaker behavioural remedies will be sufficient, and we urge the CMA to consider full physical separation of crawlers for Search and individual AI products.

“Ensuring the CMA is properly resourced and supported by the government will allow the regulator to move faster on other essential interventions, including requiring Google to negotiate fair and reasonable payment terms with publishers. Without a clear duty on Google to negotiate, publishers will continue to face significant barriers to securing a fair price for their valuable content. It is critical that ministers support the CMA to move at pace.”

In a recent letter to the Lords Communications and Digital Committee, which is carrying out an inquiry on AI and copyright, Meredith called for “remedies that ensure Google keeps crawling for General Search and crawling for the development or provision of AI Search Interfaces (such as AI Overviews and AI Mode) separate, and must seek a publisher’s informed consent before deploying a web crawler for the purpose of an AI Search Interface on a publisher’s website”.
