
Google appears to have accidentally posted a trove of internal technical documents to GitHub that detail some of the ways its search engine ranks web pages. To the extent most people think about search rankings at all, it's the basic question of whether web search results are good or bad, but the SEO community is both excited to get a glimpse behind the curtain and outraged that the documents contradict some of what Google has said in the past. Most of the commentary on the leak comes from SEO experts Rand Fishkin and Mike King.
Google confirmed the documents' authenticity to The Verge, saying it is "careful not to make inaccurate inferences about Search based on out-of-context, out-of-date, or incomplete information," and that it shares "extensive information about how Search works and the types of factors our systems weigh" while working "to protect the integrity of our search results from manipulation."
What's interesting about the accidental publication on Google's API GitHub is that, even though these are highly confidential internal company documents, Google technically published them under the Apache 2.0 license. That means anyone who stumbled across them was granted a "perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license," and the documents remain freely available online.

The leaked information includes a ton of API documentation for Google’s “ContentWarehouse,” which looks a lot like its search index. As expected, even this incomplete explanation of how Google ranks web pages is incredibly complex. King writes, “The API documentation represents 2,596 modules with 14,014 attributes (features).” These are all documents written by programmers for programmers, and they rely on a lot of background information that you would only know if you worked on the search team. The SEO community is still scouring the documentation and using it to build hypotheses about how Google search works.
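To make that scale concrete: a "module" here is essentially a named record, and its "attributes" are typed, documented fields on that record. The sketch below is purely hypothetical and only illustrates that shape; the class name and fields are invented stand-ins (loosely echoing attributes mentioned elsewhere in this article), not text from the leaked documentation.

```python
# Hypothetical illustration of what "2,596 modules with 14,014 attributes"
# means structurally: each module is a named record, each attribute a typed
# field. Every name below is invented for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExamplePerDocumentSignals:
    """Stand-in for one module among thousands."""
    site_authority: Optional[float] = None        # a site-level quality score
    good_clicks: Optional[int] = None             # a click-derived count
    is_election_authority: Optional[bool] = None  # a topic whitelist flag
```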
Fishkin and King accuse Google of "lying" to SEO experts in the past. One of the things the documents reveal is that click-through rates on search result listings affect rankings, something Google has repeatedly denied plays any role in how it ranks results. The click-tracking system is called "Navboost"; in other words, it boosts websites that users navigate to. A lot of this click data apparently comes from Chrome, even after you leave the search results. For example, some results can show a small set of "sitelinks" below the main listing, and part of what apparently drives this is which subpages are most popular, as determined by Chrome's click tracking.
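To see why click data is so contentious, here is a deliberately toy sketch of the general idea behind a click-driven boost. The formula, weight, and function name are invented for illustration; nothing below comes from the leaked documents or claims to be how Navboost actually works.

```python
# Toy sketch of a click-based boost, to illustrate the concept only.
def boosted_score(base_relevance: float, impressions: int, clicks: int,
                  weight: float = 0.3) -> float:
    """Blend a base relevance score with an observed click-through rate."""
    ctr = clicks / impressions if impressions else 0.0
    return base_relevance * (1.0 + weight * ctr)

# A result that users click more often ends up ranked above an otherwise
# equally relevant one.
print(boosted_score(0.8, impressions=1000, clicks=300))  # ~0.87
print(boosted_score(0.8, impressions=1000, clicks=30))   # ~0.81
```

A toy like this also makes the click-farm question at the end of this article concrete: any signal derived from raw clicks can, in principle, be inflated by fake clicks.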
The documents also suggest that Google has whitelists that artificially prioritize certain websites for certain topics. The two whitelist flags mentioned are "isElectionAuthority" and "isCovidLocalAuthority."
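The commentary doesn't say what behavior is attached to these flags, so the sketch below is an assumption: a toy illustration of how a topic whitelist could reorder results, with invented domains and scores. Only the flag names above come from the leak.

```python
# Illustrative assumption only: one way a topic whitelist *could* be applied.
# The domains, scores, and sorting rule are invented for illustration.
ELECTION_WHITELIST = {"election-office.example"}  # hypothetical entry

def rank_election_results(results: list[dict]) -> list[dict]:
    """Put whitelisted election authorities ahead of everything else,
    then sort each group by its ordinary relevance score."""
    return sorted(
        results,
        key=lambda r: (r["site"] not in ELECTION_WHITELIST, -r["score"]),
    )

results = [
    {"site": "random-blog.example", "score": 0.9},
    {"site": "election-office.example", "score": 0.6},
]
print([r["site"] for r in rank_election_results(results)])
# ['election-office.example', 'random-blog.example']
```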
Much of what's in the documents works exactly as you would expect a search engine to: sites have a "SiteAuthority" value, so well-known sites will rank higher than lesser-known ones, and authors have a rank of their own. But as with everything here, it's impossible to know how each factor interacts with the others.
Both SEO write-ups have a tone of anger that Google misled them, but doesn't Google need to maintain at least a somewhat adversarial relationship with people who try to manipulate its search results? Recent studies have found that "search engines appear to be losing the cat-and-mouse game of SEO spam" and that "there is an inverse correlation between page optimization level and perceived expertise, suggesting that SEO can at least undermine subjective page quality." None of this extra SEO tinkering seems to be good for users or for the quality of Google's results. For example, now that it's known that click-through rates affect search rankings, couldn't a click farm boost a website's listing?