What's New

Google's Internal API Leak - What's the big deal?

Tom Elliot
October 10, 2024

Introduction

At the start of May 2024, some internal Google documents were leaked, causing quite a stir in the SEO community. Rand Fishkin of SparkToro originally published them, and Mike King from iPullRank did an early dive into the details (both are reputable names in the SEO world). Google later confirmed to The Verge that the documents are authentic. The documents shed light on the inner workings of Google's ranking and search systems, and in particular suggest that some of Google's public communications have been misleading. This post covers the key takeaways and what they mean for the SEO community.

Overview of the Leak

In May 2024 an anonymous source leaked thousands of pages of internal Google documentation, revealing intricate (and previously well protected) details about Google's search ranking algorithms. This unprecedented leak has given the SEO community insights into how Google ranks web pages, confirming long-held suspicions and unveiling new ranking factors with actual internal Google data.

Key Points:

  • User Interaction Signals: Evidence suggesting that user click data and behavior metrics are used in rankings.
  • Site Authority: Internal use of metrics similar to domain authority, contradicting public statements.
  • Technical Infrastructure: Insights into the systems and processes that support Google’s search functions.

Key Findings from the Documents

The leaked Google documents provide significant insights into the factors influencing search rankings and Google's internal operations. Here are the key findings:

User Interaction Signals

Google collects user behavior metrics such as click-through rate (CTR) and dwell time. The documents don't confirm that these metrics feed directly into rankings, but the fact that they are collected suggests a potential influence, which is something Google has always denied.
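For readers less familiar with these two metrics, here is a minimal sketch of how they are commonly defined. The log format, field names and numbers are all hypothetical and made up for illustration; nothing here comes from the leaked documents or reflects how Google computes or uses these signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Impression:
    """One time a result was shown to a searcher (hypothetical log record)."""
    url: str
    clicked: bool
    # Seconds spent on the page before returning to the results, if clicked.
    dwell_seconds: Optional[float] = None


def click_through_rate(impressions: list[Impression]) -> float:
    """CTR = clicks / impressions (0.0 when there are no impressions)."""
    if not impressions:
        return 0.0
    clicks = sum(1 for i in impressions if i.clicked)
    return clicks / len(impressions)


def average_dwell_time(impressions: list[Impression]) -> float:
    """Mean dwell time over clicked impressions that recorded a dwell time."""
    times = [i.dwell_seconds for i in impressions
             if i.clicked and i.dwell_seconds is not None]
    return sum(times) / len(times) if times else 0.0


# Made-up sample: four impressions of the same URL, two of them clicked.
log = [
    Impression("https://example.com/guide", clicked=True, dwell_seconds=90.0),
    Impression("https://example.com/guide", clicked=False),
    Impression("https://example.com/guide", clicked=True, dwell_seconds=30.0),
    Impression("https://example.com/guide", clicked=False),
]

print(f"CTR: {click_through_rate(log):.0%}")
print(f"Average dwell time: {average_dwell_time(log):.0f}s")
```

If you want to track something similar for your own site, the clicks and impressions would typically come from your analytics or search performance exports rather than a hand-built list like this one.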

Site Authority Metrics

Google uses metrics similar to domain authority to evaluate the trustworthiness and authority of websites. This suggests that factors like domain age and the number of inbound links play a significant role in rankings. Again, this is something Google has repeatedly denied.

Technical Infrastructure

We also got some insights into the technical infrastructure supporting Google’s search functions. The documentation is all about the Content Warehouse API, which stores vast amounts of data related to web content, links, and user interactions.

Data Collection Practices

Google collects data extensively, including clickstream data that tracks the sequence of pages a user visits. This data is used to enhance search quality and to better understand user behavior. Google has denied using clicks as a ranking factor.

Implications for SEO: Rethinking Google's Credibility

The leak reveals discrepancies between Google's public statements and internal practices, indicating a lack of transparency about ranking factors like domain authority and user behavior signals. Marketers and SEOs should approach Google's guidelines with a bit more skepticism and prioritise empirical testing and independent analysis. It'd be great to have a bit more transparency and clarity from Google, but don't hold your breath.
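As one example of what empirical testing can look like in practice, the sketch below compares click-through rates before and after a change (say, a rewritten title tag) using a standard two-proportion z-test. The function name and the sample numbers are hypothetical, and the test only tells you whether the difference is likely to be noise; it says nothing about why the change happened or how Google ranks pages.

```python
import math


def two_proportion_z_test(clicks_a: int, impressions_a: int,
                          clicks_b: int, impressions_b: int) -> tuple[float, float]:
    """Compare two CTRs; returns (z statistic, two-sided p-value).

    A = before the change, B = after. Uses the pooled-proportion
    standard error, which assumes reasonably large sample sizes.
    """
    p_a = clicks_a / impressions_a
    p_b = clicks_b / impressions_b
    pooled = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / impressions_a + 1 / impressions_b))
    if se == 0:
        # No variation at all (e.g. zero clicks in both periods).
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 50 clicks from 1,000 impressions before the change,
# 80 clicks from 1,000 impressions after it.
z, p = two_proportion_z_test(50, 1000, 80, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the CTR difference is unlikely to be random variation, but seasonality, ranking shifts and other changes made at the same time can all muddy the picture, so treat a result like this as one piece of evidence rather than proof.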