Did the web get too big for Google to Google?

In their article “Google Considers Reducing Webpage Crawl Rate”, Search Engine Journal reported that Google may soon visit websites to look for new and updated content a lot less frequently than it currently does. It’s an interesting article because none of the talking heads from Google really comes clean on just why they are considering this… but I think I might have the answer. I’ve definitely got a theory…

What Google Said About Reducing Crawl Rate

“… what I mean is that computing, in general, is not really sustainable. And if you think of Bitcoin, for example, Bitcoin mining has real impact on the environment that you can actually measure, especially if the electricity is coming from coal plants or other less sustainable plants. We are carbon-free, since I don’t even know, 2007 or something, 2009, but it doesn’t mean that we can’t reduce even more our footprint on the environment. And crawling is one of those things that early on, we could chop off some low-hanging fruits.”

Gary Illyes – Google

There’s no doubt that Google crawl a lot of webpages only to find that nothing has happened. Nothing at all. Fundamentally it’s wasteful, but they only know they’ve wasted their computing time and bandwidth after they’ve come to your website and found that the last time you updated it was in late 2017 with a “coming soon” post. Existing technologies like XML sitemaps and RSS feeds could address this problem, though, by giving Google an easy way to check whether a site has been updated without having to crawl every single page.
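Google has never published how its crawl scheduler actually works, but as a rough illustration of the idea, here’s a minimal Python sketch of how a crawler could use a sitemap’s lastmod dates to skip URLs that haven’t changed since its last visit. The sitemap URL and the “last crawled” date are placeholders, not anything Google has said it does.

```python
# Minimal sketch: use a sitemap's <lastmod> dates to decide what needs recrawling.
# The sitemap URL and the "last crawled" date are placeholder assumptions.
from datetime import datetime
from urllib.request import urlopen
from xml.etree import ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"   # hypothetical site
LAST_CRAWLED = datetime(2017, 11, 1)              # when we last visited this site

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urlopen(SITEMAP_URL) as response:
    tree = ET.parse(response)

worth_recrawling = []
for url in tree.getroot().findall("sm:url", NS):
    loc = url.findtext("sm:loc", namespaces=NS)
    lastmod = url.findtext("sm:lastmod", namespaces=NS)
    if lastmod is None:
        worth_recrawling.append(loc)              # no date given: crawl to be safe
    elif datetime.fromisoformat(lastmod[:10]) > LAST_CRAWLED:
        worth_recrawling.append(loc)              # changed since our last visit

print(f"{len(worth_recrawling)} of the listed URLs are worth a fresh crawl")
```

Everything a crawler needs for that decision is already in the sitemap; it never has to touch the pages themselves.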

So… what’s the problem? Well, the major problem Google are talking about is the sustainability of massive computing operations like theirs and their carbon footprint. But… Google are already carbon neutral (Gary Illyes’ statement that Google are carbon-free is actually incorrect; they went carbon neutral in 2007 and plan to be carbon-free by 2030).

With the vast amount of money Google has, though, it could easily become a carbon-negative business. Changing what they do and how they do it seems like more work than necessary when they could, alternatively, just plant a small forest or two. You can plant a tree for £1 at https://moretrees.eco/.

You have to ask… what gives? Why is crawling suddenly such a bad thing?

Theory: The Internet is Growing Faster than Google Can Cope With

I’m not convinced by Google’s “green-washing” of the reduction in crawl rate. I think the reasons for it are much simpler than Google are letting on. I think the Internet is getting too big for Google to handle.

Back in 2021, https://websitesetup.org/ estimated that the web was growing by 576,000 new websites per day. If Google want to continue to index all of the web, that means they need to index an extra 576,000 sites every single day, an increase of 210,240,000 sites over the course of a year. And that was 2021. The internet is expanding exponentially, with more and more people creating more and more content every single day.
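For what it’s worth, the annual figure is just that daily estimate multiplied out. A quick back-of-the-envelope in Python, taking websitesetup.org’s 576,000-per-day figure as given:

```python
# Back-of-the-envelope growth maths, based on websitesetup.org's 2021 estimate.
new_sites_per_day = 576_000                       # estimated new websites per day
new_sites_per_year = new_sites_per_day * 365      # ignoring leap years
print(f"{new_sites_per_year:,} extra sites to index per year")   # 210,240,000
```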

In 2020, the amount of data on the internet was estimated to have hit 40 zettabytes (a zettabyte being roughly a trillion gigabytes). Compare this to an early estimate by Eric Schmidt, then CEO of Google, that put the web at a measly 5 million terabytes or so of data; even then, Google had taken 7 years to index just 0.004% of it.
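Putting those two estimates side by side, and taking the 0.004% figure at face value, the shift in scale is stark. A rough Python sketch of the arithmetic, using only the numbers quoted above:

```python
# Scale comparison using only the figures quoted above.
TB_PER_ZETTABYTE = 1_000_000_000                 # 1 ZB is a billion TB (a trillion GB)

web_2020_tb = 40 * TB_PER_ZETTABYTE              # ~40 zettabytes in 2020
web_schmidt_tb = 5_000_000                       # Schmidt's ~5 million TB estimate
indexed_tb = web_schmidt_tb * 0.004 / 100        # 0.004% of the web Schmidt described

print(f"The 2020 web is roughly {web_2020_tb / web_schmidt_tb:,.0f}x Schmidt's estimate")
print(f"0.004% of 5 million TB is about {indexed_tb:,.0f} TB indexed")
```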

To paraphrase Douglas Adams…

The internet is big. Really, really big.

me, by way of Douglas Adams

You have to wonder if it is physically (digitally?) possible for Google to index all of it. You also have to wonder if it’s economically viable.

There are more bytes of data than there are people on the planet

Google’s core business model remains showing adverts to people who click on them. We all use Google for free and, according to the now infamous study by SparkToro, an increasing percentage of searches on Google do not generate a click at all. No click = no revenue for Google. Only roughly half the population of the planet uses the internet, but even if every single one of them were generating revenue for Google by clicking on links, the expansion of the internet is massively outstripping the expansion of the human race.

Is there a tipping point at which Google simply won’t be able to index all, or even a significant percentage, of the internet? Will the number of sites needing to be indexed not only outstrip Google’s ability to index them (according to Eric Schmidt, it already has) but also outstrip the commercial imperative to do so?

Remember – every crawl costs money, every addition to the index costs money, every search costs money, and Google only gets paid when you click.

How can Google fix it?

There are already technologies out there, including the humble XML sitemap and RSS feeds, that allow websites to direct Google and other search engines to their new and updated content. Will Google issue another one of its famous pieces of "guidance" saying that websites will need to provide these, or some other form of feed, if they want to be indexed?

Some pundits more on the fringe of these matters have even questioned whether, in the future, Google might charge website owners to index their sites. This would effectively make the entire Google search index "pay to play": every single entry you see on Google would be paid for and, therefore, an advert. Think that sounds ridiculous? Google already did this with Google Shopping, converting it from a free service to a paid service back in 2012, a decision they reversed in 2020 in response to the financial crisis caused by the Covid-19 pandemic.

Of course, Google could just follow the likes of Bing and start to support IndexNow… (but that’s a whole other story).
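For the curious, submitting a changed URL via IndexNow amounts to a single HTTP request. Here’s a rough Python sketch based on the publicly documented endpoint; the host, key, and key file location are placeholders, and you should check the spec at indexnow.org before relying on it:

```python
# Rough sketch of an IndexNow submission (stdlib only).
# Host, key and keyLocation are placeholder values; see indexnow.org for the spec.
import json
from urllib.request import Request, urlopen

payload = {
    "host": "example.com",
    "key": "your-indexnow-key",
    "keyLocation": "https://example.com/your-indexnow-key.txt",
    "urlList": ["https://example.com/new-blog-post/"],
}

request = Request(
    "https://api.indexnow.org/indexnow",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json; charset=utf-8"},
    method="POST",
)

with urlopen(request) as response:
    print(response.status)   # 200 or 202 means the submission was accepted
```

The point is that the site tells the search engine what changed, rather than the search engine burning crawl budget finding out for itself.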

How does this affect your website and what should you do?

Google may already be visiting your site less often than it used to. Updating your site on a regular basis has been SEO best practice for some time, but keeping a regular schedule of adding and updating content is likely to become more important as Google look to get crawling under control. If Google thinks your site is dormant, it will come and check it less frequently. That might not be a problem if your site is genuinely dormant, but it will be a problem when you do have new content and you’re waiting for Google to pick it up.

If you want to stay ahead of the curve on this one, you need to:

  1. Ensure your website has a functioning XML sitemap.
  2. Ensure your website has RSS feeds (a quick way to check this and the sitemap is sketched after this list).
  3. Ensure your blog posts appear on your homepage.
  4. Blog frequently.
  5. Share your content regularly and repeatedly across social media channels.
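For the first two items, a minimal sanity check is simply confirming that your sitemap and feed respond at their usual locations. A rough Python sketch, with a placeholder domain and the common default paths (adjust both for your own site):

```python
# Quick sanity check for items 1 and 2: does the site serve a sitemap and an RSS feed?
# The domain and the paths checked are placeholder assumptions; adjust for your site.
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

SITE = "https://example.com"
PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/feed/", "/rss.xml"]

for path in PATHS:
    url = SITE + path
    try:
        with urlopen(Request(url, method="HEAD")) as response:
            print(f"OK   {url} -> {response.status}")
    except (HTTPError, URLError) as error:
        print(f"MISS {url} -> {error}")
```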
