Filesharing Technologies


The Technology

There are four basic technologies used in file-sharing applications:

1) File-transfer itself.

This is the obvious basic building block, and is a simple case of one program opening a file and transmitting the contents to another program, which creates a file and puts the information it receives into it. Reading and writing files is about as simple as you can get and transmitting information is what the internet does. This means that file-transfer programs can, in their most basic form, be incredibly simple (a few dozen lines of Visual Basic will do it).
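As a rough illustration of how little is involved, here is a minimal sketch in Python (rather than Visual Basic). The function names send_file and receive_file and the 64 KB block size are assumptions made up for this example; the sketch also assumes the two programs have already connected to each other over a stream socket.

```python
import socket

CHUNK = 64 * 1024  # bytes handled per step; an arbitrary choice for this sketch


def send_file(sock, path):
    """Open a file and transmit its contents over a connected socket."""
    with open(path, "rb") as src:
        while True:
            block = src.read(CHUNK)
            if not block:          # end of file reached
                break
            sock.sendall(block)
    sock.shutdown(socket.SHUT_WR)  # tell the receiver that no more data is coming


def receive_file(sock, path):
    """Create a file and put everything received on the socket into it."""
    with open(path, "wb") as dst:
        while True:
            block = sock.recv(CHUNK)
            if not block:          # empty read: the sender has finished
                break
            dst.write(block)
```

One program would call send_file on its end of the connection while the other calls receive_file on its end. A real application would add error handling and some way of naming the file being sent.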

2) Indexing.

Basic file transfer is fine if the server is sending a pre-set file - but if the client is going to request a file then the user usually wants to be able to search for the correct one. For a system where all files are kept on a central server (e.g. iTunes or a filestore) the index merely has to keep track of the name of each file and where on the server it can be found. For centrally indexed systems (such as the original Napster), where the files themselves stay on the users' machines, the index has to keep track of which machine each file is on. Again, this is fairly trivial - a simple database with a single table can manage this with ease. More complex systems could be used for efficiency's sake, but creating something basic shouldn't take long at all.
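The single-table index could look something like the sketch below, which uses Python's built-in sqlite3 module. The table layout and the function names (make_index, register, unregister, search) are assumptions for illustration, not a description of how Napster or any other real system stored its index.

```python
import sqlite3


def make_index():
    """Create an in-memory index: one row per (file name, machine) pair."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (name TEXT NOT NULL, host TEXT NOT NULL, "
        "UNIQUE (name, host))"
    )
    return db


def register(db, host, names):
    """Record that `host` is sharing each of the given file names."""
    db.executemany(
        "INSERT OR IGNORE INTO files (name, host) VALUES (?, ?)",
        [(name, host) for name in names],
    )
    db.commit()


def unregister(db, host):
    """Forget everything shared by a host, e.g. when it disconnects."""
    db.execute("DELETE FROM files WHERE host = ?", (host,))
    db.commit()


def search(db, term):
    """Return (name, host) pairs whose file name contains `term`.

    Note: '%' and '_' inside `term` act as SQL LIKE wildcards here.
    """
    rows = db.execute(
        "SELECT name, host FROM files WHERE name LIKE ? ORDER BY name, host",
        ("%" + term + "%",),
    )
    return rows.fetchall()
```

A client would register its list of shared files when it connects, and a search then tells the requester which machine to open a file-transfer connection to.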

3) Message-Passing.

When Napster was sued out of existence it sent shockwaves through the file-sharing community - not only had they lost a major resource, but any successor built the same way was certain to suffer the same fate. Much to their relief Justin Frankel's Gnutella - a decentralised file-sharing system released in 2000 - offered a way out. Rather than each client (or 'node') connecting to a central server, each one connected to a few other clients. Each node would pass on messages from the ones connected to it, allowing nodes to communicate even if they weren't directly connected. The ways that messages are passed can get very complicated, in order to prevent the number of messages from swamping the system when it gets too large, but the basic idea is quite simple.
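The basic idea can be sketched as a small simulation. This is not the Gnutella protocol itself: the function name flood_query, the dictionaries describing the network, and the use of a hop limit (ttl) plus a record of nodes already visited are assumptions chosen to show one simple way of stopping messages from circulating forever.

```python
from collections import deque


def flood_query(links, shared, start, filename, ttl):
    """Simulate passing a search query from node to node.

    links  : dict mapping each node to the set of nodes it is connected to
    shared : dict mapping each node to the set of file names it shares
    ttl    : how many hops the query may travel before it is dropped
    Returns the nodes that hold `filename`, in the order the query reached them.
    """
    seen = {start}                 # nodes that have already handled this query
    queue = deque([(start, ttl)])
    hits = []
    while queue:
        node, hops_left = queue.popleft()
        if filename in shared.get(node, ()):
            hits.append(node)
        if hops_left == 0:         # hop limit reached: do not pass it on
            continue
        for peer in sorted(links.get(node, ())):
            if peer not in seen:   # never hand the same query to a node twice
                seen.add(peer)
                queue.append((peer, hops_left - 1))
    return hits
```

With four nodes connected in a line A-B-C-D, a query from A with ttl=1 only reaches B, while ttl=3 reaches D as well. In a real network each node would keep its own list of recently seen message IDs rather than sharing one set.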

The most common method of simplifying the system has been for the majority of clients to connect to 'supernodes' - more powerful, well connected nodes - with these nodes acting as central indexes for all the clients attached to them. This means that queries never have to go to the clients themselves, keeping their traffic levels much lower.
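A minimal sketch of the supernode arrangement follows. The class name Supernode and its methods are hypothetical, and the sketch assumes each supernode asks only its directly connected fellow supernodes; real networks differ in how far queries travel between supernodes.

```python
class Supernode:
    """A well-connected node that indexes the files of its attached clients."""

    def __init__(self, name):
        self.name = name
        self.index = {}   # file name -> set of attached clients sharing it
        self.peers = []   # other supernodes this one is connected to

    def attach(self, client, filenames):
        """An ordinary client connects and reports what it shares."""
        for filename in filenames:
            self.index.setdefault(filename, set()).add(client)

    def lookup(self, filename):
        """Answer from the local index only; attached clients are never contacted."""
        return set(self.index.get(filename, ()))

    def search(self, filename):
        """Answer a client's query using this index and those of peer supernodes."""
        hits = self.lookup(filename)
        for peer in self.peers:
            hits |= peer.lookup(filename)
        return hits
```

The point of the design is visible in lookup: a query is answered entirely from supernode indexes, so the ordinary clients only see traffic when someone actually downloads from them.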

4) Swarming.

Imagine that there are 500 people all trying to download a file from the same place at once - most servers would get completely swamped. But with most file-sharing networks the same file is being shared by many people - it therefore makes sense for them to share the load between them. However, some people are on very fast connections and some are on very slow ones. If your download happens to be directed to a very slow uploader, you might be waiting days for a file that other people downloaded in minutes. The answer is to break the files into sections small enough for even a slow connection to upload in a few minutes, and to then download these tiny chunks from different places.
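Splitting and reassembling can be sketched as below. The names split_into_pieces and swarm_download are made up for this example, each "peer" is simply a function that returns a piece when given its number, and pieces are requested one after another in round-robin order; a real client would fetch from several peers at once and favour the faster ones.

```python
def split_into_pieces(data, piece_size):
    """Cut a file's bytes into pieces of at most `piece_size` bytes."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def swarm_download(peers, piece_count):
    """Fetch each piece from a different peer in turn and stitch them together.

    peers : list of callables; peer(index) returns the bytes of piece `index`
    """
    pieces = []
    for index in range(piece_count):
        fetch = peers[index % len(peers)]  # spread the requests across the peers
        pieces.append(fetch(index))
    return b"".join(pieces)
```

Because each request is for a small piece, a slow uploader only ever holds up one piece rather than the whole file.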

Another tweak makes this even more powerful - there's no need for every segment of a file to be downloaded before you start to share them - which means that once a node has downloaded segment one, it can upload it to a different node, taking the pressure off the original node and allowing it to share the rest of the segments faster.
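The effect of this tweak can be shown with a small sketch of how a downloader might choose where to get each segment. The names pick_source and plan_download and the "least busy source first" rule are assumptions for illustration; real clients use a variety of selection strategies.

```python
def pick_source(segment, holdings, load):
    """Choose the least-busy node that already holds `segment`.

    holdings : dict mapping node name -> set of segment numbers it has so far
    load     : dict mapping node name -> segments already requested from it
    """
    candidates = [node for node, have in holdings.items() if segment in have]
    if not candidates:
        return None
    # Ties are broken by name so the result is predictable.
    return min(candidates, key=lambda node: (load.get(node, 0), node))


def plan_download(wanted, holdings):
    """Decide which node to ask for each wanted segment."""
    load = {}
    plan = {}
    for segment in wanted:
        source = pick_source(segment, holdings, load)
        if source is not None:
            plan[segment] = source
            load[source] = load.get(source, 0) + 1
    return plan
```

If the original node holds segments 0 to 3 and another node has so far downloaded only segment 0, that partial holder still appears as a candidate for segment 0, so the original node is left free to serve the segments nobody else has yet.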

The only complex technology here is making sure that two files _are_ identical - i.e. that when two people are both sharing the same Britney Spears song they both have exactly the same mp3, not two different ones - otherwise the file cannot be stitched back together.
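The text above does not say how the check is done. One common approach, sketched here as an assumption, is to compare cryptographic hashes: two copies with the same hash are treated as the same file, and each segment can be checked against its expected hash as it arrives. The function names and the choice of SHA-256 are illustrative only.

```python
import hashlib


def file_id(data):
    """Identify a file by the hash of its contents rather than by its name."""
    return hashlib.sha256(data).hexdigest()


def piece_hashes(data, piece_size):
    """Return the expected hash of every piece of the file, in order."""
    return [
        hashlib.sha256(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def verify_piece(piece, expected_hash):
    """Check a downloaded piece before stitching it into the file."""
    return hashlib.sha256(piece).hexdigest() == expected_hash
```

Two mp3s of the same song that differ by even one byte get different identifiers, so they are never mixed together, and a piece that fails verify_piece can simply be fetched again from another node.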

What Uses What

All file-sharing programs obviously transfer files. IM programs (MSN, AIM, YM, etc.) use none of the other three technologies, but generally don't need them.

FTP and file servers have a very basic file-search capability - they allow files to be grouped into folders, but if you've ever gone looking for a file on your computer you know how cumbersome this is. More modern file-servers (Windows 2000/XP, for instance) create indexes to speed up searching. Napster was the first widely used program to allow file-searching across multiple machines, which is why it was so useful/dangerous. Grokster/Kazaa/eMule all have this functionality.

Message-passing is used by most semi-decentralised systems, like Grokster/Kazaa/eMule/etc. However, they have the supernode (SuperPeer) capability, which means that only these huge nodes really need to pass messages along - all other nodes simply hang off one of them. WASTE relies heavily on message-passing to make sure that information is passed from machine to machine, and encrypts it to ensure that people can't accidentally see messages they shouldn't. The ultra-paranoid system Freenet _only_ uses message-passing - a node never makes a direct connection beyond nodes that the user trusts - and when messages are passed their origin is removed so that they cannot be traced back.

Swarming originated with Kazaa, but has swiftly become commonplace throughout the P2P ecosystem. All modern filesharing systems include this. BitTorrent is an oddity in that it _only_ supports File-transfer and Swarming - it is deliberately designed to allow for the downloading of a single file from a centrally-specified location, using Swarming to speed up the download. It has become the most common legitimate file-sharing system (after FTP and HTTP) and is used by many Linux distributors to keep bandwidth costs low.


Last edited December 7, 2004 7:54 am by 193.138.107.178