Homework 7 - parsing packet traces
In this homework, we use libpcap to analyze packet traces captured with tcpdump. Run "man pcap" to learn about the pcap API. libpcap hands us one packet at a time, in the order in which the packets were captured; it is up to us to process them and extract useful information.
Our goal in this homework is to reconstruct the data flowing between hosts from tcpdump traces. We focus on TCP flows, and a key part of the homework is reassembling TCP segments into the original data stream. A correct submission contains the following:
- A Makefile that, given simply the command "make", produces an executable called 'hw7'.
- The hw7 binary takes two command line arguments: an input file (produced by tcpdump -w), and a directory for output files.
- Running hw7 prints a table with one row per unidirectional flow, identified by source IP/port and destination IP/port. For each flow, list the number of segments and the number of payload bytes, not counting duplicate packets.
- In the given output directory, one file per flow, named SRCIP.SRCPORT-DSTIP.DSTPORT.log
- Each of these files should contain all payload data (no IP/TCP headers) sent over that flow. Take care to handle duplicate and reordered packets!
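One way to organize the per-flow bookkeeping described above is a small table keyed by the 4-tuple. This is only an illustrative sketch: the names flow_key, flow, flow_lookup() and flow_filename(), the fixed MAX_FLOWS limit, and the linear search are assumptions, not part of the assignment. It also builds the SRCIP.SRCPORT-DSTIP.DSTPORT.log file name using inet_ntop().

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical key for one unidirectional flow. Addresses and ports stay
 * in network byte order, exactly as they appear in the packet headers. */
struct flow_key {
    struct in_addr src, dst;
    uint16_t sport, dport;
};

/* Hypothetical per-flow record holding the counters for the output table. */
struct flow {
    struct flow_key key;
    unsigned long segments;   /* non-duplicate segments */
    unsigned long bytes;      /* non-duplicate payload bytes */
};

#define MAX_FLOWS 1024        /* assumption: a fixed-size table is enough */
static struct flow flows[MAX_FLOWS];
static size_t nflows;

static int key_equal(const struct flow_key *a, const struct flow_key *b) {
    return a->src.s_addr == b->src.s_addr && a->dst.s_addr == b->dst.s_addr &&
           a->sport == b->sport && a->dport == b->dport;
}

/* Return the record for key, creating it on first sight; NULL if the table
 * is full. Linear search keeps the sketch short; a hash table scales better. */
static struct flow *flow_lookup(const struct flow_key *key) {
    assert(key != NULL);
    for (size_t i = 0; i < nflows; i++)
        if (key_equal(&flows[i].key, key))
            return &flows[i];
    if (nflows == MAX_FLOWS)
        return NULL;
    struct flow *f = &flows[nflows++];
    memset(f, 0, sizeof *f);
    f->key = *key;
    return f;
}

/* Build "SRCIP.SRCPORT-DSTIP.DSTPORT.log" into out. inet_ntop() writes into
 * caller-owned buffers, so the two addresses cannot clobber each other.
 * Returns 0 on success, -1 on error or truncation. */
static int flow_filename(const struct flow_key *k, char *out, size_t outlen) {
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &k->src, s, sizeof s) == NULL ||
        inet_ntop(AF_INET, &k->dst, d, sizeof d) == NULL)
        return -1;
    int n = snprintf(out, outlen, "%s.%u-%s.%u.log", s, (unsigned)ntohs(k->sport),
                     d, (unsigned)ntohs(k->dport));
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Because flows are unidirectional, the two directions of one TCP connection map to two different keys, and therefore to two records and two output files.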
For example, the output may look like this:
~> hw7 thetrace thedirectory
SRC IP/PORT      DST IP/PORT      BYTES  PACKETS
a.b.c.d/8484     e.f.g.h/80       12205     2115
e.f.g.h/80       a.b.c.d/8484      3555      223
a.b.c.d/22       e.f.g.i/19495     1205      211
e.f.g.i/19495    a.b.c.d/22         335       32
and thedirectory would then contain these files:
~> ls thedirectory
a.b.c.d.8484-e.f.g.h.80.log
e.f.g.h.80-a.b.c.d.8484.log
a.b.c.d.22-e.f.g.i.19495.log
e.f.g.i.19495-a.b.c.d.22.log
An example tcpdump trace file is included in the hw7 template directory. It is advisable to also record your own traces and test your solution on them. For grading, we will use this file as well as another trace containing TCP flows.
Hints
Read the IP and TCP header structure definitions in /usr/include/netinet/ip.h and /usr/include/netinet/tcp.h.
Use tcpdump or Wireshark to verify that your code parses the packets correctly.
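As a starting point for parsing, the sketch below extracts addresses, ports, sequence number, flags, and payload from one captured frame, i.e., the byte buffer and capture length that libpcap returns for each packet. It assumes an Ethernet link layer (DLT_EN10MB, which you can check with pcap_datalink()), no VLAN tags, IPv4 only, and no IP fragmentation; the names tcp_segment and parse_tcp() are hypothetical. The payload length is derived from the IP total length rather than the capture length, because short Ethernet frames carry padding after the IP packet.

```c
#define _DEFAULT_SOURCE   /* glibc: exposes struct ip and the BSD tcphdr field names */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN    14      /* assumption: DLT_EN10MB, no VLAN tag */
#define ETHERTYPE_IPV4 0x0800

/* Hypothetical result of parsing one captured frame. */
struct tcp_segment {
    struct in_addr src, dst;   /* network byte order */
    uint16_t sport, dport;     /* network byte order */
    uint32_t seq;              /* host byte order */
    uint8_t flags;             /* TH_SYN, TH_ACK, TH_FIN, ... */
    const uint8_t *payload;    /* points into pkt, not copied */
    size_t payload_len;
};

/* Parse an Ethernet/IPv4/TCP frame of caplen captured bytes. Returns 0 on
 * success, -1 if it is not IPv4/TCP or the capture is truncated. */
static int parse_tcp(const uint8_t *pkt, size_t caplen, struct tcp_segment *seg) {
    struct ip iph;
    struct tcphdr tcph;
    assert(pkt != NULL && seg != NULL);

    if (caplen < ETH_HDR_LEN + sizeof iph)
        return -1;
    if (((pkt[12] << 8) | pkt[13]) != ETHERTYPE_IPV4)
        return -1;
    memcpy(&iph, pkt + ETH_HDR_LEN, sizeof iph);   /* copy: pkt may be unaligned */
    if (iph.ip_v != 4 || iph.ip_p != IPPROTO_TCP)
        return -1;

    size_t ip_hlen = (size_t)iph.ip_hl * 4;        /* ip_hl counts 32-bit words */
    size_t ip_total = ntohs(iph.ip_len);           /* excludes link-layer padding */
    if (ip_hlen < 20 || ip_total < ip_hlen + sizeof tcph ||
        ETH_HDR_LEN + ip_total > caplen)           /* e.g. snaplen cut the packet */
        return -1;
    size_t tcp_start = ETH_HDR_LEN + ip_hlen;
    memcpy(&tcph, pkt + tcp_start, sizeof tcph);

    size_t tcp_hlen = (size_t)tcph.th_off * 4;     /* includes TCP options */
    if (tcp_hlen < 20 || ip_hlen + tcp_hlen > ip_total)
        return -1;

    seg->src = iph.ip_src;
    seg->dst = iph.ip_dst;
    seg->sport = tcph.th_sport;
    seg->dport = tcph.th_dport;
    seg->seq = ntohl(tcph.th_seq);
    seg->flags = tcph.th_flags;
    seg->payload = pkt + tcp_start + tcp_hlen;
    seg->payload_len = ip_total - ip_hlen - tcp_hlen;
    return 0;
}
```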
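Use lseek() to jump to an arbitrary position in a file, even beyond its current size. This lets you write each segment's payload at its offset within the stream, regardless of the order in which segments arrive. fopen() in "w" mode truncates the file when opening it for writing, so open() (without O_TRUNC) may be a better choice.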
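A minimal sketch of this hint, assuming one output file descriptor per flow; open_flow_file() and write_at() are hypothetical helpers. The offset passed to write_at() would be the segment's position in the stream relative to the flow's initial sequence number.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (or create) a flow's output file. No O_TRUNC, so reopening an
 * existing file keeps what was already written. */
static int open_flow_file(const char *path) {
    return open(path, O_WRONLY | O_CREAT, 0644);
}

/* Write len bytes at byte offset off. Seeking past the end of the file is
 * allowed: the gap reads back as zero bytes until a later (reordered)
 * segment fills it. A duplicate segment rewrites the same bytes with the
 * same data, so the file contents do not change. Returns 0 or -1. */
static int write_at(int fd, off_t off, const void *buf, size_t len) {
    const char *p = buf;
    assert(fd >= 0);
    if (lseek(fd, off, SEEK_SET) == (off_t)-1)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With this approach, a hole remains in the output only if a segment was never captured at all.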
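inet_ntoa() is handy for printing IP addresses. Beware, though: it returns a pointer to a static buffer and allocates no memory, so each call overwrites the result of the previous one. For example, passing two inet_ntoa() results to a single printf() prints the same address twice.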
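One safe pattern is to copy each inet_ntoa() result into a buffer you own before calling it again. The helpers ip_to_str() and print_row() below are hypothetical; print_row() follows the column order of the example table. inet_ntop(), which writes into a caller-supplied buffer directly, is an alternative.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Pitfall: inet_ntoa() returns a pointer to a single static buffer, so
 *     printf("%s %s\n", inet_ntoa(src), inet_ntoa(dst));
 * prints the same address twice. */

/* Copy the dotted-quad text out of inet_ntoa()'s static buffer into a
 * caller-owned buffer before anything can call inet_ntoa() again. */
static void ip_to_str(struct in_addr addr, char out[INET_ADDRSTRLEN]) {
    const char *s = inet_ntoa(addr);
    size_t n = strlen(s);            /* at most 15 characters for IPv4 */
    assert(n < INET_ADDRSTRLEN);
    memcpy(out, s, n + 1);           /* include the terminating NUL */
}

/* Print one row of the flow table (ports in host byte order). */
static void print_row(struct in_addr src, unsigned sport, struct in_addr dst,
                      unsigned dport, unsigned long bytes, unsigned long packets) {
    char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];
    ip_to_str(src, s);
    ip_to_str(dst, d);
    printf("%s/%u %s/%u %lu %lu\n", s, sport, d, dport, bytes, packets);
}
```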
Store the initial sequence number (ISN) of each flow when the connection is established, i.e., from its SYN segment.
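A minimal sketch of ISN bookkeeping for one direction of a connection, with the hypothetical names seq_state, seq_observe() and seq_offset(). It assumes the SYN for that direction appears in the trace and that each stream is shorter than 4 GiB, since 32-bit sequence numbers wrap around.

```c
#define _DEFAULT_SOURCE            /* glibc: exposes TH_SYN in netinet/tcp.h */
#include <assert.h>
#include <netinet/tcp.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sequence-number state for one direction of a connection. */
struct seq_state {
    bool have_isn;
    uint32_t isn;                  /* initial sequence number, host byte order */
};

/* Record the ISN when this direction's SYN (or SYN+ACK) is seen.
 * seq must already be in host byte order (converted with ntohl). */
static void seq_observe(struct seq_state *st, uint8_t tcp_flags, uint32_t seq) {
    if (tcp_flags & TH_SYN) {
        st->isn = seq;
        st->have_isn = true;
    }
}

/* Offset of a segment's first payload byte within the stream. The SYN
 * consumes one sequence number, so the first data byte has seq ISN + 1.
 * Unsigned 32-bit arithmetic stays correct across wraparound, assuming
 * the stream is shorter than 4 GiB. */
static uint32_t seq_offset(const struct seq_state *st, uint32_t seq) {
    assert(st->have_isn);
    return (uint32_t)(seq - st->isn - 1u);
}
```

With the ISN recorded, each data segment's payload can be written to the flow's output file at seq_offset(), which places reordered segments correctly and makes duplicates harmless.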