My goal with this blog

I write about relevant changes in the way people use the web and how startups are built to provide services and products for this ever-changing, wonderful thing we still know as "the web." As a former entrepreneur turned early-stage investor, I hope most of all that this will be useful to other folks like me, so that they can avoid some of the mistakes I've made.

Getting back to the distributed Internet

These days everyone wants to grow a big cloud-based service with millions of users and billions of page views. Even Microsoft has got cloud-envy and wants to plunk down $50B to prove it. Unfortunately, most of us mere mortals have tons of trouble really getting these services to scale once we do get the users in the door. Things start to crawl, drives start to fill up, and before you know it you're Twitter, with everyone biting the hand of the free service you've given them. That includes this piece, which argues for fronting Twitter with a proxy based on some sort of distributed hash table thing.
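For anyone wondering what "some sort of distributed hash table thing" looks like in practice, here is a minimal sketch of consistent hashing, the routing trick that most DHT-style caches are built on. To be clear, this is my own toy illustration and not what that piece or Twitter actually proposes: the HashRing class, the cache node names, and the replica count are all assumptions made up for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring. Node names are hypothetical."""

    def __init__(self, nodes, replicas=64):
        self._replicas = replicas  # virtual points per node (assumed value)
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Place several virtual points per node to even out the load.
        for i in range(self._replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        # Drop only this node's points; every other point stays put.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return the node owning the first ring point at or after the key's hash."""
        if not self._points:
            raise LookupError("ring is empty")
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
print(ring.lookup("user:42"))  # one of the three hypothetical cache nodes
```

The point of the ring is that when a node drops out, only the keys it owned get reassigned; everything else keeps hitting the same cache. That property is what makes the approach attractive for read-heavy traffic, and it does nothing by itself for the write-heavy case.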

While the caching/proxying thing has worked great wonders for most of today's super-scaled cloud services (YouTube, Facebook), I doubt it will continue to be the panacea for write-heavy communications applications like Twitter. Even the infallible Google seems to occasionally hit speed bumps with its venerable Gmail (which is not RDBMS-backed but is no doubt cooked from dilithium crystals).

It strikes me that the right solution to this problem is to revisit the notion of making distributed Internet applications work, but not the kind that have the service provider scaling across cages and datacenters. Instead, what we need is to find models for using the "dark matter" of the Internet, namely all the client computers sitting on broadband connections and idling for most of the day. NAT traversal issues and the mass migration to laptops notwithstanding, these machines could be what is needed to help truly scale cloud-based services.

And even if all of those home/office PCs are too sporadically available, we could begin to rely on the growing number of personal accounts being created on VPS/shared hosting/Dot-Mac/whatever environments where each user is given some fractional part of a CPU and disk storage to do with as he pleases. This is why WordPress's attempt at distributed Twitter, launched just a couple of weeks ago, is so interesting; most shared hosting environments provide a WordPress install, which means that with a little work (and some coordination between Twitter and the Prologue team), everyone can help take the load off.

There are "business model" reasons why pulling off a hybrid implementation like this might prove difficult. After all, at the end of the day, most consumer cloud services are valued by how many users (and, more importantly, how much user data and metadata) they possess at the time of acquisition. Technical challenges aside, I am 100% certain that had someone pitched this approach to us at Tabblo back in 2005, we would have flat-out rejected it for all of the wrong business-model reasons. Now that we are moving data centers, though, I'm keenly aware of all of the terabytes of high-res image data that we must shepherd across the continent when, for most of what we needed, we could have done with a lot less (incidentally, this problem is much worse for all of our bigger, older cousins in the photo-hosting business).

Hopefully sometime soon we'll all get back to the original architecture of the Internet when it comes to this stuff (loosely coupled and distributed when it counts the most), and the scaling question will morph into one about writing large-scale distributed systems, a much more fun problem to work on.