Project 3: Building a BitTorrent Client
Overview:
In
this assignment, your job is to design and
implement a simple
BitTorrent (BT) client that reads a
.torrent file containing metadata for the file to be downloaded, connects to a tracker as
well as multiple peers, and is able to upload and download to seeds and
peers simultaneously. This means you will have to manage the state of
several connections and work with multiple threads. Your client will
connect to seeds, peers, and leeches to download a
single video file (.mov) from that peer. When the file is finished downloading, you
will save it to the hard disk. All communication will be done over TCP; and should also accept incoming connections
from other students' clients.You will not have to worry about choking any
peers, but you will have to make sure to unchoke them and keep them up to date on what pieces your
client has downloaded. You only have to download a single file from the torrent file.Design Reviews:
To make sure that you have a working design before you attempt to implement
it, you should meet with a TA to review your design. These meetings will
be held around ten days before the due date, and should take around 20-25
minutes. Please bring a copy of your design for the TA to read. In addition,
you will turn in your final design document reflecting the actual implementation
one day after the phase is due (might not change from the design review
document if you design well).
Submitting the Project:
Your project should be submitted by email as either a .zip or .tar file., and the "main" starting method
should be in a file called mybtclient.java.Assignment Outline:
- Resources for this project .
- Java and torrent files
- BitTorrent client operation
- Bencoding
- Communicating with the Tracker
- Communicating with Peers
- Helpful Hints and Programming Style
- Tasks
- Design Questions
Resources for this Project:
It is strongly recommended that you
bookmark or download the Sun Java 1.5.0 API
, as well as read the following pages:
Java and Torrent Files:
The following files will allow you to concentrate
on the networking portion of the project and without having to worry about how to
bencode or "unbencode" the
torrent file and some of the communication.
- TorrentFile.java - Stores the data from a .torrent
file in an easy-to-access format.
- TorrentFileHandler.java - Accepts a String
corresponding to a filename for a .torrent file and returns a TorrentFile object containing the unencoded data stored in the file.
- TorrentFileHandlerTester.java - An
example of how to use the TorrentFileHandler and TorrentFile classes.
- Bencoder.java
- Accepts Strings, Integers, Lists, and Maps (Dictionaries) and returns them
as bencoded byte arrays. Also accepts bencoded byte arrays of Strings, Integers, Lists, and
Dictionaries and returns them as Strings, Integers, Lists and Maps.
- BencoderTester.java
- An example of how to use the Bencoder class.
- ToolKit.java
- A utility class that includes functions for conversion between byte[] and int.
- project3.torrent
- This is the torrent file that contains the metadata about the file your client will download.
BitTorrent Client Operation:
Your BitTorrent client should basically do the
following:
- Take as a command-line
argument the name of the .torrent file to be loaded and the name of the file to
save the data to. For example:
java mybtclient project3.torrent myvideo.wmv
- Open the .torrent file and
parse it using the Bencoder class to decode the data.
- Allocate the total space for the file (named by the second command-line argument) on the disk before
starting the download of the file.
- Open a TCP socket to the tracker at the IP address and
port specified by the TorrentFile object and send an HTTP GET request to the tracker.
- Decode the response
from the tracker and extract the IP adresses and ports of the list of
peers. You must extract these IP addresses from the list -- hard-coding
it is not acceptable.
- Open TCP connections to the
peers and begin downloading simultaneously from several of them.
- Download each piece
of the file, verify its SHA-1 hash against the hash stored in the .torrent
file, and then writing the appropriate pieces into the file. The first time you begin the download,
you need to contact the tracker and let it know you are starting download.
- After a piece is downloaded
and verified, notify the peers that you have completed the piece.
You can keep track of valid pieces using a java.util.BitSet.
- Other peers should be able to
request pieces that you have verified from your client, and your client should
send those pieces to the peers, according to the BT protocol.
- Repeat steps 7-8 (using the
same TCP connections) for the rest of the file. Periodically, your client should update its status to the tracker by repeating steps 4-6 (using the same TCP connections). The update period should be
no less than the interval returned by the tracker.
This may change during execution, so it should be updated each time the tracker
is contacted (if the interval value is excessively
large, you may cap it at 180 seconds).
- Make sure your client has
uploaded at least 10% of the file size before closing your program.
- When the file is finished,
contact the tracker and send it the completed event and properly
close all connections.
In
addition to the above basic run-through of your program's execution, your
program should be able to detect input from the user at any point (you can
decide what the input should be), and allow the user to close the program,
suspending download of the file. The user should be able to restart the program
at a later time and resume downloading. Any non-verified pieces/blocks should be
discarded before suspending, and the program should also save how much it has
uploaded and downloaded for that torrent. How
you choose to implement saving the state of your program is up to you.
Bencoding (Pronounced "Bee Encoding"):
Bencoding a method of encoding
binary data. Tracker responses and inter-peer
communication are bencoded, according to the BT protocol as shown in the following list (taken from http://www.bittorrent.org/protocol.html)
- Strings are length-prefixed
base ten followed by a colon and the string. They are encoded in UTF-8. For
example 4:spam corresponds to 'spam'.
- Integers are represented by
an 'i' followed by the number in ASCII base 10
followed by an 'e'. For example i3e corresponds to 3 and i-3e corresponds to -3.
Integers have no size limitation. i-0e is invalid. All
encodings with a leading zero, such as i03e, are invalid, other than i0e, which
of course corresponds to 0.
- Lists are encoded as an 'l'
followed by their elements (also bencoded) followed by
an 'e'. For example, l4:spam4:egse corresponds to ['spam', 'eggs'].
- Dictionaries are encoded as a
'd' followed by a list of alternating keys and their corresponding values
followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow':'moo', 'spam':'eggs'} and
d4:spaml1:a1:bee corresponds to {'spam': ['a', 'b']}. Keys must be strings and
appear in sorted order (sorted as raw strings, not alphanumerics).
Communicating with the Tracker:
Your BT client uses the tracker's IP address and port number supplied by
the TorrentFile object to contact the tracker, and send it an HTTP_GET request
with the following key/value pairs. Note that these are NOT bencoded, but must be properly escaped [this list is taken
from http://www.bittorrent.org/protocol.html
]:
- info_hash - The 20 byte(160-bit) SHA1
hash of the bencoded form of the info value from the
metainfo file. This value must be escaped.
- peer_id - A string of length 20 which
this downloader uses as its id. Each downloader generates its own id at random
at the start of a new download. This value must be
escaped.
- ip - An optional parameter
giving the IP (or DNS name) for this peer.
- port - The port number this peer
is listening on. Common behavior is for a downloader to try to listen on port
6881 and if that port is taken try 6882, then 6883, etc. and give up after 6889.
- uploaded - The total amount uploaded
so far, encoded in base ten ASCII.
- downloaded - The total amount downloaded
so far, encoded in base ten ASCII.
- left - The number of bytes this
peer still has to downloaded, encoded in base ten ASCII. This key is important - if you do not specify
how much you have left to download, the tracker assumes you are a seed and will
not return any seeds in the peer list.
- event - This is an optional key
which maps to started , completed , or stopped (or empty
, which is the same as not being present. If not present, this is one of the
announcements done at regular intervals. An announcement using started is
sent when a download first begins, and one using completed is sent when
the download is complete. No completed is sent if the file was complete
when started. Downloaders send an announcement using
stopped when they cease downloading. Your client should also
periodically update its status to the tracker, using an update period no less than the interval returned by the tracker. Since the interval may change during execution, it should be updated each time the tracker
is contacted. If the interval value is excessively
large, you may cap it at 180 seconds.
The response from the tracker
is a bencoded dictionary and contains two keys:
- interval - Maps to the number of
seconds the downloader should wait between regular rerequests.
- peers - Maps to a list of
dictionaries corresponding to peers, each of which contains the keys peer id
, ip , and port , which map
to the peer's self-selected ID, IP address or dns name
as a string, and port number, respectively.
Communicating with Peers:
Handshaking between peers begins with character
nineteen (decimal) followed by the string 'BitTorrent protocol'. After the fixed headers are 8 reserved bytes which are set to
0. Next is the 20-byte SHA-1 hash of the bencoded form of the info value from the metainfo (.torrent) file. The next 20-bytes are the peer id
generated by the client. The info_hash should be the
same as sent to the tracker, and the peer_id is the
same as sent to the tracker. If the info_hash is
different between two peers, then the connection is dropped.
- All integers are encoded as
4-bytes big-endian.
- The peer_id should simply be random numbers.
After the handshake, messages
between peers take the form of <length prefix><message
ID><payload> , where length prefix is a 4-byte
big-endian value and message ID is a single decimal character. The payload
depends on the message. Please consult either of the BT-related resources for
detailed information describing these messages. Below is a list of messages that
need to be implemented for this project.
- bitfield: <len=0001+X><id=5><bitfield>
The bitfield message
may only be sent immediately after the handshaking sequence is completed, and
before any other messages are sent. It is optional, and need not be sent if a
client has no pieces.
The bitfield message
is variable length, where X is the length of the bitfield. The payload is a bitfield representing the pieces that have been successfully
downloaded. The high bit in the first byte corresponds to piece index 0. Bits
that are cleared indicated a missing piece, and set bits indicate a valid and
available piece. Spare bits at the end are set to zero.
- keep-alive: <length prefix>
is 0000. There is no message ID and no payload. These should be sent around
once every 2 minutes to prevent peers from closing connections.
- choke: <length
prefix> is 1 and message ID is 0. There is no
payload.
- unchoke: <length
prefix> is 1 and the message ID is 1. There is no payload.
- interested: <length prefix>
is 0001 and message ID is 2. There is no payload.
- uninterested: <length prefix>
is 0001 and message ID is 3. There is no payload.
- have: <length prefix>
is 0005 and message ID is 4. The payload is a zero-based index of the piece
that has just been downloaded and verified.
- request: <length prefix>
is 0013 and message ID is 6. The payload is as follows:
<index><begin><length>
Where <index>
is an integer specifying the zero-based piece index, <begin> is
an integer specifying the zero-based byte offset within the piece, and
<length> is the integer specifying the requested length.
<length> is typically 2^14 (16384) bytes.
A smaller piece should only be used if the piece length is not divisible by
16384. A peer may close the connection if a block larger than 2^14 bytes is
requested.
- piece: <length prefix>
is 0009+X and message ID is 7. The payload is as follows:
<index><begin><block>
Where <index>
is an integer specifying the zero-based piece index, <begin> is
an integer specifying the zero-based byte offset within the piece, and
<block> which is a block of data, and is a subset of the piece
specified by <index> .
Below is an example of what would take place between
two peers setting-up a connection and starting sharing.
- The local host opens a TCP
Socket to the remote peer and sends the handshake packet. The local host then
listens for the remote peer to respond.
- Upon receiving the handshake
packet and verifying the info_hash, the remote peer
responds with a similar handshake packet (except with its peer_id). The remote peer then listens for the local host to
send a bitfield or other packet. The remote host can
send a bitfield packet to the local host at this time.
- Upon receiving the handshake
and verifying the info_hash, the local host then
(optionally) sends a bitfield packet which tells the
remote peer which pieces it has downloaded and verified so far.
- If the local host is
interested in what the remote peer has downloaded, then it sends an interested
packet, otherwise it sends an uninterested packet. If the remote peer is
interested in what the local host has downloaded, then it sends an interested
packet, otherwise it sends an uninterested packet.
- When the local host, or the
remote peer, is ready to download/upload to the other, it will send an unchoke packet.
Please note that clients will, and your client
should, ignore any request packets received while a remote peer is choked. A
client should only upload to another client if the connection is unchoked AND the remote peer is interested. This means that
a remote peer will not reply to the local hosts's
request packets unless you have expressed interest AND the remote peer has sent
an unchoke packet. If the remote peer sends a choke
packet during data transfer, any outstanding requests will be discarded and
unanswered - they should be re-requested after the next unchoke packet.
Helpful Hints and Programming Style:
- If you aren't sure what a particular packet is supposed to look
like, you can always fire up a working client, such as Azereus BitTorrent client,
and then use a network packet sniffing tool like Wireshark to
capture the traffic and analyze it.
- If
you're having trouble contacting the Tracker, remember that contacting
the Tracker is just a simple HTTP GET request. Since this type of
operation is commonly done by every web browser and several smaller
programs, you can use the Java class java.net.URL to easily handle this type of request. Remember that whatever you're trying to do in Java has probably already been
done a thousand times, so it's usually better (and faster) to find an existing implementation
than to write one yourself.
- Bencoder.java doesn't work with LANG=en_US.UTF8, therefore, remember to set the
LANG environment variable to "en_US" by following commands:
For C shell:
setenv LANG en_US
For Bourne shell: export LANG=en_US
Writing maintainable code is just as important as
writing correct code. Consequently, you will be graded on your style. Below is a
list of suggestions to make your code easier to understand and maintain:
- Conditions - All of your
methods must have clearly specified preconditions and postconditions. A precondition is what you assume to be true
before your method is called, and a postcondition is what you can show to be true after
your method has returned. Trivial preconditions (such as "int a is an integer") should be
omitted. The best place to put the pre and postconditions is in a comment before the first line of each
method.
Example:
/*
Precondition: b is non-zero.
Postcondition: returns a / b.
*/
double divide(double a, double b) {
return (a / b);
}
- Comments - Comments for
variables and for sections of code that does not have an obvious result. Please
do not comment obvious statements or sections of code. For example,
int i = 0; //i is an integer and the initial value is 0
is not useful and unnecessarily
clutters the code.
- Naming - Names of classes,
methods, and variables should be descriptive. For example, if a class has a
field containing its hash value, naming the field hash_value is much better than naming it hv .
- Exceptions - Catch exceptions
and do something useful with them. For example, print a statement describing
what went wrong to System.error.
- Easy to understand
loops/recursion. When possible, it's best to avoid break statements
whenever possible. When using recursion, tail-recursion is often preferable
because "smart" compilers can often translate it into a loop.
- Efficiency - Avoid being
terribly inefficient. For example, if you have to sort 10M objects, it's better
to sort them with a O(n*lg(n)) algorithm than a O(n2)
algorithm.
- Encapsulation - Encapsulation
is a useful feature of object-oriented languages, and so you should try to write
classes that expose as little as possible. This way you can change the
implementation of a class without affecting the rest of your program.
- Encapsulation/Objects - Encapsulation is a useful
feature of object-oriented languages, and because you are writing your program
in Java, which is an object-oriented language, your program must separate the functionality
into multiple classes. This way you can change the implementation of a class
without affecting the rest of your program. An example of separation would be to
have separate FileHandler, PeerInterface, and TrackerInterface classes to manipulate files, interface with
peers and interface with the tracker, respectively. If you do not separate your
program into functional units, you will lose points on the Style portion of the
grading.
Tasks:
- (5%) Correctly setting up the TCP
sockets/connections
- (20%) Correctly interfacing with
the tracker
-- Retrieving the peer list, periodically updating the tracker,
updating the tracker when quitting the application, and updating the
tracker after completing the download.
- (20%) Correctly interfacing with peers -- Handshaking with peers, maintaining local state, sending and receiving keep-alive messages.
- (20%) Correctly downloading from at
least 2 peers simultaneously
- (20%) Correctly uploading to at
least 2 peers simultaneously
- (15%) Correctly downloading and
verifying the file
Design Questions:
Your design document should include the following:
- A high-level description and diagrams of
how your program works. Specifically, describe the high-level interactions
between the classes. This is a good way to make sure that your program doesn't
have any classes depending on the implementations of other classes. If you know
how to use LaTeX, this should be fairly easy to
diagram. Also, OpenOffice.org's Draw program has an
Export to PDF feature that is very simple to use.
- A brief (about 1 paragraph)
description of each class in the program.
Please answer the following questions in your design document (due one
day after the code for this phase is due).
- How long did you originally think this project would take
you to finish?
- How much time did you actually spend on this
project?
- What parts of the project were the most challenging and
which were the easiest?
- Did you have trouble finding resources for the project?
- Were any parts of the project confusing to you?
- Any additional feedback about the program -- please provide constructive/useful
comments or insights.
This assignment is derived from one developed at Rutgers University for CS 352.