DaveNet : Clay Shirky on P2P

Wednesday, November 15, 2000 by Dave Winer.

Intro

Clay Shirky is an Internet entrepreneur and venture capitalist at The Accelerator Group in NYC. He's also one of the leading thinkers in the new software artform called P2P. Like many others, he's uncomfortable with the phrase, but is coming to terms with it.

Clay posted an essay on a mailing list I subscribe to, attempting to explain, once and for all, what P2P is, and I think he did a wonderful job. I asked if I could run his comments through DaveNet and he said yes.

So here's what Clay Shirky says about P2P.

Clay Shirky: What is P2P?

After a year or so of attempting to describe the revolution in file sharing and related technologies, we have finally settled on a label for what's happening: peer-to-peer.

Somehow, though, this label hasn't clarified things. Taken literally, servers talking to one another are peer-to-peer. The game Doom is peer-to-peer. There are even people applying the label to email and telephones. Meanwhile, Napster, which jump-started the conversation, is not peer-to-peer in the strictest sense, because it uses a centralized server to store pointers and resolve addresses.

If we treat peer-to-peer as a literal definition for what is happening, then we have a phrase that describes Doom but not Napster, and suggests that Alexander Graham Bell is a peer-to-peer engineer but Shawn Fanning is not.

This literal approach to peer-to-peer is plainly not helping us understand what makes P2P important. Having computers act as peers on the internet is hardly novel, so the mere fact of peer-to-peer architecture can't possibly be the explanation for the recent changes in internet use.

What *has* changed is what the nodes of these P2P systems are -- internet-connected PCs, which had formerly been relegated to being nothing but clients -- and where these nodes are -- at the edges of the internet, cut off from the DNS system because they have no fixed IP address.

Resource-centric addressing

P2P is a class of applications that takes advantage of resources -- storage, cycles, content, human presence -- available at the edges of the internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, P2P nodes must operate outside the DNS system, and have significant or total autonomy from central servers.

That's it. That's what makes P2P distinctive.

Napster and ICQ and Freenet and AIMster and Popular Power and Groove are all leveraging previously unused resources, by tolerating and even working with the variable connectivity of the hundreds of millions of devices that have been connected to the edges of the internet in the last few years.

P2P is as P2P does

Up until 1994, the whole internet had one model of connectivity. Machines were assumed to be always on, always connected, and assigned permanent IP addresses. The DNS system was designed for this environment, where a change in IP address was assumed to be abnormal and rare, and could take days to propagate through the system.

With the invention of Mosaic, another model began to spread. To run a Web browser, a PC needed to be connected to the internet over a modem. This created a second class of connectivity, because PCs would enter and leave the network cloud frequently and unpredictably, and would have a different, possibly masked, IP address with each new session. This instability prevented PCs from having DNS entries, and therefore prevented PC users from hosting any data or net-facing applications locally.

For a few years, treating PCs as dumb but expensive clients worked well. PCs had never been designed to be part of the fabric of the internet, and in the early days of the Web, the toy hardware and operating systems of the average PC made it an adequate life-support system for a browser, but good for little else.

Over time, though, as hardware and software improved, the unused resources that existed behind this veil of second class connectivity started to look like something worth getting at. At a conservative estimate, the world's net-connected PCs presently host an aggregate ten billion MHz of processing power and ten thousand terabytes of storage, assuming only 100 million PCs among the net's 300 million users, and only a 100 MHz chip and 100 MB drive on the average PC.
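The estimate above is simple multiplication. The short sketch below checks it. It reads the per-PC drive as 100 megabytes, because that is the reading that produces ten thousand terabytes, and it uses decimal units (1 TB = 10^6 MB). The constant names are illustrative only.

```python
# Back-of-envelope check of the aggregate-resource estimate, using the essay's assumptions.
PCS = 100_000_000   # assumed net-connected PCs (out of roughly 300 million users)
CPU_MHZ = 100       # assumed average clock speed per PC, in MHz
DISK_MB = 100       # assumed average drive size per PC, read here as megabytes

total_mhz = PCS * CPU_MHZ              # aggregate clock speed, in MHz
total_tb = PCS * DISK_MB / 1_000_000   # aggregate storage: MB -> TB in decimal units

print(f"{total_mhz:,} MHz")   # 10,000,000,000 MHz, i.e. ten billion
print(f"{total_tb:,.0f} TB")  # 10,000 TB, i.e. ten thousand terabytes
```

Multiplying clock speeds is only a crude proxy for usable computing power, which is why the essay calls this a conservative, order-of-magnitude figure.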

The veil is drawn back

The launch of ICQ in 1996 marked the first time those intermittently connected PCs became directly addressable by average users. Faced with the challenge of establishing portable presence, ICQ bypassed DNS in favor of creating its own directory of protocol-specific addresses that could update IP addresses in real time, a trick followed by Groove, Napster, and NetMeeting as well. (Not all P2P systems use this trick. Gnutella and Freenet, for example, bypass DNS the old-fashioned way, by relying on numeric IP addresses. Popular Power and SETI@Home bypass it by giving the nodes scheduled times to contact fixed addresses, thus delivering their current IP address at the time of the connection.)
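The directory trick can be illustrated with a toy model. This is not ICQ's actual protocol, and the class and method names are hypothetical. A stable, protocol-specific address maps to whatever IP the user happens to have right now. Every login overwrites the mapping, so a new dynamic IP takes effect immediately instead of waiting days for DNS to propagate.

```python
from typing import Dict, Optional


class PresenceDirectory:
    """Hypothetical ICQ-style directory: stable user IDs -> current IP (None when offline)."""

    def __init__(self) -> None:
        self._current_ip: Dict[str, Optional[str]] = {}

    def login(self, user_id: str, ip: str) -> None:
        # Each session may arrive with a different dynamic IP; replace the old mapping at once.
        self._current_ip[user_id] = ip

    def logout(self, user_id: str) -> None:
        # The user ID survives the session; only its reachability changes.
        self._current_ip[user_id] = None

    def resolve(self, user_id: str) -> Optional[str]:
        # Returns None both for offline users and for IDs the directory has never seen.
        return self._current_ip.get(user_id)


# Example addresses come from the 192.0.2.0/24 documentation range.
directory = PresenceDirectory()
directory.login("alice", "192.0.2.5")    # first dial-up session
directory.logout("alice")
directory.login("alice", "192.0.2.77")   # next session: new IP, same address
print(directory.resolve("alice"))        # 192.0.2.77
```

The design choice that matters is that the user ID, not the machine, is the key. That is what lets the address travel with the person.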

Whois counts 23 million domain names, built up in the 16 years since the inception of IP addresses, in 1984. Napster alone has created more than 23 million non-DNS addresses in 16 months, and when you add in all the non-DNS Instant Messaging addresses, the number of P2P addresses designed to reach dynamic IPs tops 200 million. Even if you assume that the average DNS host has 10 additional addresses of the form foo.host.com, the total number of P2P addresses now equals the total number of DNS addresses after only 4 years, and is growing faster than the DNS universe today.

As new kinds of net-connected devices like DVRs and wireless PDAs proliferate, they will doubtless become an important part of the internet as well, but for now PCs make up the enormous preponderance of these untapped resources. PCs are the dark matter of the internet, and their underused resources are fueling P2P.

Litmus tests

If you're looking for a litmus test for P2P, this is it: 1) Does it treat variable connectivity and temporary network addresses as the norm and, 2) does it give the nodes at the edges of the network significant autonomy?

If the answer to both of those questions is yes, the application is P2P. If the answer to either question is no, it's not P2P.
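The two-question test can be written as a predicate. In the sketch below, the field names are hypothetical. The yes/no answers for each example are a reading of the verdicts given later in the essay: Napster and Popular Power pass, and Intel's "server peer-to-peer" fails because its nodes have fixed addresses and permanent connections.

```python
from typing import NamedTuple


class Application(NamedTuple):
    name: str
    variable_connectivity_is_norm: bool  # question 1: are temporary addresses treated as normal?
    edge_nodes_autonomous: bool          # question 2: do nodes at the edges get significant autonomy?


def is_p2p(app: Application) -> bool:
    # Both answers must be yes; a single no disqualifies the application.
    return app.variable_connectivity_is_norm and app.edge_nodes_autonomous


# Answers below are an interpretation of the essay's examples, not data.
napster = Application("Napster", True, True)
popular_power = Application("Popular Power", True, True)
intel_server_p2p = Application("Intel server peer-to-peer", False, False)  # fixed IPs, data-center servers

for app in (napster, popular_power, intel_server_p2p):
    print(app.name, "->", "P2P" if is_p2p(app) else "not P2P")
```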

Another way to examine this distinction is to think about ownership. The question is less about "Can the nodes speak to one another?" and more about "Who owns the hardware that the service runs on?" The huge preponderance of the hardware that makes Yahoo work is owned by Yahoo and managed in Santa Clara. The huge preponderance of the hardware that makes Napster work is owned by Napster users and managed on tens of millions of individual desktops. P2P is a way of decentralizing not just features, but costs and administration as well.

Who's whom?

Napster is P2P, because the addresses of Napster nodes bypass the DNS system, and because once the Napster server resolves the IP addresses of the PCs hosting a particular song, it shifts control of the file transfers to the nodes. Furthermore, the ability of the Napster nodes to host the songs without central intervention lets Napster users get access to several terabytes of storage and bandwidth at no additional cost.
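The division of labor described above can be sketched as two pieces. This is a simplified model under stated assumptions, not Napster's real protocol, and every name in it is hypothetical. A central index that stores only pointers (song title to the IPs of the peers sharing it), and peers that hold the files and serve them directly. In-process method calls stand in for network connections.

```python
from typing import Dict, List, Set


class CentralIndex:
    """Hypothetical Napster-style index: stores pointers to peers, never the files themselves."""

    def __init__(self) -> None:
        self._hosts: Dict[str, Set[str]] = {}  # song title -> IPs of peers currently sharing it

    def announce(self, peer_ip: str, songs: List[str]) -> None:
        # A peer that has just connected (perhaps with a new IP) reports what it shares.
        for song in songs:
            self._hosts.setdefault(song, set()).add(peer_ip)

    def disconnect(self, peer_ip: str) -> None:
        # Drop a departing peer's pointers; its files stay on its owner's disk.
        for peers in self._hosts.values():
            peers.discard(peer_ip)

    def search(self, song: str) -> List[str]:
        # The server's only job: resolve a song to the addresses of peers hosting it.
        return sorted(self._hosts.get(song, set()))


class Peer:
    def __init__(self, ip: str, library: Dict[str, bytes]) -> None:
        self.ip = ip
        self.library = library  # files stored on the user's own desktop

    def serve(self, song: str) -> bytes:
        # The transfer itself is node to node; the index plays no part.
        return self.library[song]


index = CentralIndex()
alice = Peer("192.0.2.10", {"song.mp3": b"...audio bytes..."})
peers_by_ip = {alice.ip: alice}  # stands in for opening a direct connection to an IP
index.announce(alice.ip, list(alice.library))

hosts = index.search("song.mp3")                 # step 1: ask the central server where the song is
data = peers_by_ip[hosts[0]].serve("song.mp3")  # step 2: fetch it straight from the peer
```

The storage and bandwidth in this model belong entirely to the peers. The server holds a few strings per song, which is the cost-shifting the essay points to.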

However, Intel's "server peer-to-peer" is not P2P, because servers have always been peers. Their fixed IP addresses and permanent connections present no new problems, and calling what they already do "peer-to-peer" presents no new solutions.

ICQ and Jabber are P2P, because not only do they devolve connection management to the individual nodes once they resolve the addresses, they violate the machine-centric worldview encoded in the DNS system. Your address has nothing to do with the DNS system, or even with a particular machine, except temporarily -- your chat address travels with you. Furthermore, by mapping 'presence' -- whether you are at your computer at any given moment in time -- chat turns the old idea of permanent connectivity and IP addresses on its head. Chat is an important protocol *because* of the transience of the connectivity.

Email, which treats variable connectivity as the norm, is nevertheless not P2P, because your address is not machine independent. If you drop AOL in favor of another ISP, your AOL email address disappears as well, because it hangs off DNS. Interestingly, in the early days of the internet, there was a suggestion to make the part of the email address before the @ globally unique, linking email to a person rather than to a person@machine. That would have been P2P in the current sense, but it was rejected in favor of a machine-centric view of the internet.

Popular Power is P2P, because the distributed clients that contact the server need no fixed IP address and have a high degree of autonomy in performing and reporting their calculations, and can even be offline for long stretches while still doing work for the Popular Power network.
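The offline-work pattern can be modeled as a client that fetches work units while connected, computes with no network at all, and buffers results until its next contact with the fixed server address. This is an assumed, simplified design, not Popular Power's actual client. The class, its methods, and the squaring "computation" are placeholders. Because the client always initiates contact, the server never needs to know its IP address in advance.

```python
from typing import Callable, List, Tuple


class VolunteerClient:
    """Hypothetical volunteer-computing worker: computes offline, reports on reconnect."""

    def __init__(self, compute: Callable[[int], int]) -> None:
        self.compute = compute
        self.pending_units: List[int] = []
        self.unreported: List[Tuple[int, int]] = []

    def fetch(self, units: List[int]) -> None:
        # Called during a scheduled connection to the server's fixed address.
        self.pending_units.extend(units)

    def work_offline(self) -> None:
        # Needs no connectivity: process every queued unit and buffer the results locally.
        while self.pending_units:
            unit = self.pending_units.pop(0)
            self.unreported.append((unit, self.compute(unit)))

    def report(self) -> List[Tuple[int, int]]:
        # On the next connection, hand over everything finished since the last contact.
        results, self.unreported = self.unreported, []
        return results


client = VolunteerClient(compute=lambda n: n * n)  # placeholder workload
client.fetch([1, 2, 3])      # online: pick up work
client.work_offline()        # offline for a long stretch, still doing work
results = client.report()    # online again: deliver results
print(results)               # [(1, 1), (2, 4), (3, 9)]
```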

Dynamic DNS is not P2P, because it tries to retrofit PCs into the traditional DNS system, and so on.

This list of resources which current P2P systems take advantage of -- storage, cycles, content, presence -- is not necessarily complete. If there were some application which needed 30,000 separate video cards, or microphones, or speakers, a P2P system could be designed that used those resources as well.

P2P is a horseless carriage

Whenever something new seems to be happening on the internet, there is a push to define it, and as with the "horseless" carriage or the "compact" disc, new technologies are often labelled according to some simple difference from what came before -- horse-drawn carriages, non-compact records.

Calling this new class of applications peer-to-peer emphasizes their difference from the dominant client/server model. However, like the horselessness of the carriage or the compactness of the disc, the "peeriness" of P2P is more a label than a definition.

As we've learned from the history of the internet, adoption is a better predictor of software longevity than perfection is, and as the P2P movement matures, users will not adopt applications that embrace decentralization for decentralization's sake. Instead, they will adopt those applications which use just enough decentralization, in just the right way, to create novel functions or improve existing ones.

Clay Shirky