I got an email from Seth Godin asking for clarification of one of yesterday's mini-posts about how Google can index the web without crawling.
Doc said that Google and Bing are doing an increasingly bad job of archiving the history of the web. When he looks for something he wrote 20 years ago, the search engines can't reliably find it. He also said they're not indexing the current web like they used to. He experiences it thus: Google can't find one of his recent posts, but then after he visits it, presumably in Chrome, it can.
Google isn't crawling his site. In the past it would check frequently updated pages, such as blogs, every few minutes, for new links. On discovering a new link, it would read the page, add it to its index, and then the page would be findable in Google.
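For what it's worth, here's a minimal sketch of that old polling approach, assuming the crawler keeps a set of links it has already seen and re-reads a blog's home page every few minutes. The names `LinkCollector` and `find_new_links` are made up for illustration. This isn't Google's code, just the general idea.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in the order they appear."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_new_links(html, seen):
    """Return links in html that are not yet in seen, and remember them.

    seen is a hypothetical stand-in for the crawler's record of pages
    it already knows about.
    """
    collector = LinkCollector()
    collector.feed(html)
    new_links = []
    for link in collector.links:
        if link not in seen:
            seen.add(link)
            new_links.append(link)
    return new_links
```

Each time the crawler comes back to the page, it calls `find_new_links` with the fresh HTML and the same `seen` set. Anything returned is a page to read and add to the index.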
But when he visits one of those pages in Chrome, a few minutes later it can be found in the search engine. Google appears to be using Doc's human behavior to find new pages to index. They can do this now because they have a popular web browser, so they can retire their old method of discovering links and let the users do their crawling.
My own experience. For what it's worth, I've found that Google is still really good at finding my old stuff, as long as there are some good unique words I can search for. If I search for "future-safe archives" for example, it gives me back a really good set of results. But the other day I was trying to find something I wrote about Obama and the online social network he let dwindle after the 2008 election, and although I believe I wrote about this a number of times, I could only find one piece, by going to the January 2009 archive page and scanning it with my eyes.
Caveat: This is all based on tea-leaf reading. Neither of us has any insight into what Google is actually doing to maintain its index.
Last update: Saturday January 11, 2020; 11:22 AM EST.