Google Index to Go Real Time

Google is developing a system that will enable web publishers of any size to automatically submit new content to Google for indexing within seconds of that content being published. Search industry analyst Danny Sullivan told us today that this could be “the next chapter” for Google.

Last fall, Google’s Brett Slatkin, lead developer on the PubSubHubbub (PuSH) real-time syndication protocol, told us that he hoped Google would someday use PuSH to index the web instead of crawling links, which is how search engines have indexed the web for years. Google senior product manager Dylan Casey said yesterday at Sullivan’s Search Marketing Expo in Santa Clara, California, that the company plans to publish soon a standard way for site owners to participate in a program along those lines.

How The System Might Work

PuSH is a syndication protocol based on the Atom format. A Publisher tells the world about a Hub that it will notify every time it publishes new content. Subscribers then tell the Hub, “when this Publisher posts new content, please deliver it to me right away.” So instead of checking back with the Publisher constantly to see whether there’s anything new, Subscribers simply wait for the Hub to tell them. The Publisher publishes something and notifies the Hub, and the Hub then delivers it to every Subscriber. The whole process can take as little as a few seconds.
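To make the Publisher/Hub/Subscriber roles concrete, here is a toy, in-memory sketch of the fan-out described above. It is not the PuSH wire protocol: in the real thing a Subscriber registers an HTTP callback URL with the Hub, and after the Publisher pings it, the Hub fetches the feed and POSTs the new entries to each callback. Here those HTTP calls are replaced by plain Python function calls, and the `Hub` class and example URLs are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# A subscriber's delivery endpoint. In real PuSH this is an HTTP callback URL
# (hub.callback) that the Hub POSTs to; here it is just a Python function.
Callback = Callable[[str, str], None]


class Hub:
    """Toy in-memory Hub: no HTTP, no subscription verification, no feed fetching."""

    def __init__(self) -> None:
        # topic (feed URL) -> callbacks of everyone subscribed to it
        self.subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        # Subscriber: "when this Publisher posts new content, deliver it to me."
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, content: str) -> int:
        # Publisher notifies the Hub; the Hub pushes the content to every
        # subscriber of that topic. Returns how many deliveries were made.
        targets = self.subscribers.get(topic, [])
        for deliver in targets:
            deliver(topic, content)
        return len(targets)


# Usage: one subscriber waiting on one feed.
hub = Hub()
inbox: List[str] = []
hub.subscribe("https://example.com/feed.atom", lambda topic, body: inbox.append(body))
delivered = hub.publish("https://example.com/feed.atom", "<entry>New post</entry>")
print(delivered, inbox)  # 1 ['<entry>New post</entry>']
```

The key property is that the Subscriber never polls: work happens only when the Publisher actually publishes, which is why the model suits sites that post rarely.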

If Google implements an indexing-by-PuSH program, it would ask websites to adopt the protocol and to declare, at the top of each document, which Hub they push to, just as they already declare where their RSS feeds can be found. Google would then subscribe to those PuSH feeds to discover new content as soon as it’s published.
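Google hasn’t said what the declaration would look like. PuSH feeds advertise their Hub with a `<link rel="hub" href="...">` element, so the sketch below assumes a page-level declaration would look similar, sitting next to the familiar `<link rel="alternate">` feed link. It shows the discovery and subscription steps: parse a page for its hub and feed links, then build the form-encoded body a subscriber POSTs to the Hub. The `hub.mode`, `hub.topic` and `hub.callback` parameter names come from the PuSH spec (revisions differ on extras such as `hub.verify`); the class name, function name and all URLs are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlencode

FEED_TYPES = ("application/rss+xml", "application/atom+xml")


class HubLinkFinder(HTMLParser):
    """Collects <link rel="hub"> and <link rel="alternate"> feed hrefs from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hubs: List[str] = []
        self.feeds: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()  # rel may hold several tokens
        href = a.get("href")
        if not href:
            return
        if "hub" in rels:
            self.hubs.append(href)
        elif "alternate" in rels and a.get("type") in FEED_TYPES:
            self.feeds.append(href)


def subscription_request(topic: str, callback: str) -> str:
    # Form-encoded body POSTed to the Hub: "send updates for `topic` to `callback`".
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic,
        "hub.callback": callback,
    })


# Usage: discover the hub and feed on a (hypothetical) page, then build a request.
page = """<html><head>
<link rel="alternate" type="application/atom+xml" href="https://example.com/feed.atom">
<link rel="hub" href="https://hub.example.net/">
</head><body>...</body></html>"""

finder = HubLinkFinder()
finder.feed(page)  # HTMLParser.feed, not a feed URL
print(finder.hubs)   # ['https://hub.example.net/']
print(finder.feeds)  # ['https://example.com/feed.atom']
body = subscription_request(finder.feeds[0], "https://crawler.example.org/push-callback")
```

Under the PuSH spec the Hub would then confirm the subscription by calling the callback URL with a `hub.challenge` value that the subscriber must echo back, which keeps third parties from signing someone else up for updates.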

PuSH likely wouldn’t replace crawling; in fact, a crawl would still be needed to discover the PuSH feeds to subscribe to. Instead, the real-time format would augment Google’s existing index.

As Danny Sullivan told us today, Google would need some form of spam control rather than letting pushed content go live in the index unvetted. That is what happened in the earliest days of search, and it was a real mess, he said.

The Advantages of a Real Time Google Index

PuSH is much more computationally efficient for Google, but Slatkin says its impact on small publishers is even more important. Right now, many small sites are visited by Google perhaps once a week. With a PuSH system in place, they could get their content to Google automatically, right away.

A richer, faster, more efficient internet would be good for everyone, and the benefits in search wouldn’t be limited to Google. PubSubHubbub is an open protocol, so the feeds would be as visible to Yahoo and Bing as they would be to Google.

“I am being told by my engineering bosses to openly promote this open approach even to our competitors,” Slatkin says. That’s a very good sign.

We expect this will be a very big deal and we’ll be covering it more extensively in the coming days, as well as whenever Google has something to announce more formally.