instrumenting the web, instrumenting people
We would love to have better access to data that’s out there. We find it frustrating that we don’t.
–Larry Page, via Bloomberg
Let’s unpack that statement for a minute. Larry Page, CEO of Google, sees a meaningful void in the data his company collects, and that void frustrates Google’s mission to organize the world’s information.
But there have always been datasets that are off-limits. Google doesn’t process commercial data such as I/B/E/S from Thomson, FICO scores or D&B numbers. It doesn’t have access to telephone records or to transaction histories from payment networks such as Visa and Mastercard. So why now?
It’s because the web is slowly closing off, a sign of a sea change in how people find content on the web.
Through most of its existence, Google has relied on the open web to expand, collecting new information, new users, and new levels of engagement. Every day, it harvested the needs, desires and questions of an increasingly connected global community through its search box and matched these to results taken from its ever-expanding index of content. Google instrumented the web.
The web’s still growing. More users are engaging more services, day by day. But it’s happening in social networks and through mobile apps: it’s happening outside of the open web, beyond the reach of the Google crawlers. The robots.txt-gambit has finally been countered.
Robots.txt is the text file that sits on a web server and tells Google and other search engines which parts of the site they may crawl and which to leave out of the index. In a web mediated by search, it’s inconceivable that any site in need of traffic would opt out of the Google index. And with a knowing wink, the search industry has always said, “if you don’t want to be indexed, just let us know with robots.txt.” They said it to the news industry as Google News hoovered up article after article, splayed them out in all their equivalence, and cemented the concept of news as a commodity.
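To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult those rules, using Python’s standard urllib.robotparser module. The rules, the example.com URLs, and the allowed() helper are all made up for illustration; they are not taken from any real site or from Google’s crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: Googlebot may crawl
# everything except /private/, every other crawler is shut out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/photos/1"),
        ("Googlebot", "https://example.com/private/x"),
        ("OtherBot", "https://example.com/photos/1"),
    ]:
        print(agent, url, allowed(agent, url))
```

The file is only a request: it works because crawlers choose to honor it, which is why the opt-out was always the site owner’s move to make.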
What’s different today? Instagram took the bait, built a thriving community of more than 30 million users, and none of their photos make it into a Google search. Facebook and Twitter built terrific size and engagement too, but they didn’t do it by making it easier to be found in a Google search. Rather than build a business out of arbitraging the commercial value of paid search terms, like Demand Media, they changed the way people found things on the web.
While Google invested billions in instrumenting the vast documents and contents of the web, Twitter, Facebook, and Instagram instrumented people through tweets, status updates, and photos. And it’s this web of people that sits just beyond Google’s reach and frustrates Larry Page.