Due: Tue, Dec 3, 7:00am
In this project, you will implement a tool called revlookup that uses multiple threads to look up the domain names for a list of IPv4 addresses. We will design the tool around a producer-consumer model: the producer (the main thread) reads the list of IPv4 addresses into a bounded circular queue, and the consumers (worker threads) each remove an address from the queue and call the libc function getnameinfo to try to resolve that IPv4 address to a domain name.
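The skeleton already provides the data structures, so the part left to you is the synchronization around them. The sketch below, assuming POSIX threads, shows one common way to make a bounded circular queue safe for one producer and several consumers: a single mutex plus two condition variables, and a closed flag so that workers can exit once the producer has enqueued every address. The names (struct bqueue, bq_init, bq_put, bq_get, bq_close, bq_destroy) are hypothetical and need not match the skeleton's queue interface.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bounded FIFO of strings shared by one producer (main) and
 * several consumers (workers). */
struct bqueue {
    char **items;             /* circular buffer with capacity slots */
    size_t capacity;          /* from the queue-size option; must be > 0 */
    size_t head;              /* index of the oldest item */
    size_t count;             /* number of items currently stored */
    bool closed;              /* set once the producer has enqueued everything */
    pthread_mutex_t lock;
    pthread_cond_t not_full;  /* signaled when a slot frees up */
    pthread_cond_t not_empty; /* signaled when an item arrives or on close */
};

int bq_init(struct bqueue *q, size_t capacity)
{
    q->items = calloc(capacity, sizeof *q->items);
    if (q->items == NULL)
        return -1;
    q->capacity = capacity;
    q->head = 0;
    q->count = 0;
    q->closed = false;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    return 0;
}

/* Producer side: blocks while the queue is full. Do not call after bq_close. */
void bq_put(struct bqueue *q, char *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == q->capacity)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % q->capacity] = item;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Producer side: no more items will arrive; wake every waiting worker. */
void bq_close(struct bqueue *q)
{
    pthread_mutex_lock(&q->lock);
    q->closed = true;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: blocks while empty. Returns false once the queue is closed
 * and drained, which tells the worker to exit. */
bool bq_get(struct bqueue *q, char **item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return false;
    }
    *item = q->items[q->head];
    q->head = (q->head + 1) % q->capacity;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return true;
}

void bq_destroy(struct bqueue *q)
{
    pthread_cond_destroy(&q->not_empty);
    pthread_cond_destroy(&q->not_full);
    pthread_mutex_destroy(&q->lock);
    free(q->items);
}
```

In this sketch, main would call bq_put once per address and bq_close after the last one, while each worker loops on `while (bq_get(&q, &ip)) { ... }`. Waiting inside a while loop rather than an if guards against spurious wakeups and against another worker taking the item first.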
Additionally, the workers will add their lookup results to a hash table that maps an IPv4 address to a domain name. The hash table both stores the results and also serves as a cache: a worker thread should check if the result is already in the cache before it makes a potentially expensive call to getnameinfo.
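The lookup step on a cache miss can be sketched as follows, assuming the worker holds the address as a dotted-decimal string. reverse_lookup is a hypothetical helper: it fills a struct sockaddr_in with inet_pton and hands it to getnameinfo. The cache check described above would happen before this call. Because two workers can both miss on the same repeated address, inserting into the shared hash table still has to be synchronized.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: reverse-resolve a dotted-decimal IPv4 address into
 * host (hostlen bytes). Returns true if getnameinfo filled host, false if ip
 * is not valid IPv4 or the lookup failed. flags are passed to getnameinfo. */
bool reverse_lookup(const char *ip, char *host, socklen_t hostlen, int flags)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return false;   /* not dotted-decimal IPv4 */

    /* No service name is wanted, so serv is NULL with length 0. */
    int rc = getnameinfo((const struct sockaddr *)&sa, sizeof sa,
                         host, hostlen, NULL, 0, flags);
    return rc == 0;
}
```

Passing NI_NAMEREQD makes getnameinfo fail (typically with EAI_NONAME) when no name can be found, instead of silently returning the numeric address. Passing NI_NUMERICHOST skips DNS entirely, which can be handy when testing the rest of the pipeline offline.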
The program's output is the contents of this hash table: loop over the hash table and print each entry (see the rubric for the specific output format).
IP_LIST_FILE: A file where each line contains a single IPv4 address in dotted-decimal notation, followed by a newline. Lines contain no leading or trailing whitespace.
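Because every line holds exactly one address with no surrounding whitespace, the producer only needs to strip the trailing newline. A sketch using POSIX getline follows; next_ip is a hypothetical name, and returning a freshly allocated string per line (so ownership can pass to a worker through the queue) is a design assumption rather than a requirement of this spec.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical producer helper: returns the next line of fp with its trailing
 * newline removed, as a heap-allocated string the caller must free, or NULL
 * at end of file (or on a read error). */
char *next_ip(FILE *fp)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, fp);

    if (len == -1) {
        free(line);   /* getline may allocate even when it fails */
        return NULL;
    }
    if (len > 0 && line[len - 1] == '\n')
        line[len - 1] = '\0';
    return line;
}
```

The main thread could then feed the queue with a loop such as `while ((ip = next_ip(fp)) != NULL)`, enqueuing each string it gets back.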
Print a usage statement to stdout and exit with status 0.
-q: The maximum number of IPv4 addresses the circular queue can hold at one time. The main thread inserts each IP address from IP_LIST_FILE into this queue. Must be > 0. The default is 10.
-t: The number of worker threads. Each worker thread attempts to dequeue an IPv4 address from the queue and performs a reverse DNS lookup to try to resolve the address to a domain name. Must be > 0. The default is 1.
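Parsing the settings above might look like the following sketch, which uses getopt_long so that the -s|--sort bonus option fits in as well. struct options and parse_options are hypothetical names. The help flag and the -n bonus option are left out because this section does not define them; the defaults (10 and 1) and the > 0 checks come from the descriptions above.

```c
#include <errno.h>
#include <getopt.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical container for the parsed settings. */
struct options {
    long max_queue;       /* -q: circular queue capacity (default 10) */
    long num_threads;     /* -t: number of worker threads (default 1) */
    bool sort;            /* -s|--sort: bonus sorted output */
    const char *ip_file;  /* IP_LIST_FILE positional argument */
};

/* Parse a base-10 integer that must be > 0; rejects junk and overflow. */
static bool parse_positive(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE || v <= 0)
        return false;
    *out = v;
    return true;
}

/* Fill *o from argv; returns 0 on success or -1 on a usage error. */
int parse_options(int argc, char *argv[], struct options *o)
{
    static const struct option longopts[] = {
        {"sort", no_argument, NULL, 's'},
        {NULL, 0, NULL, 0},
    };
    int c;

    o->max_queue = 10;
    o->num_threads = 1;
    o->sort = false;
    o->ip_file = NULL;

    while ((c = getopt_long(argc, argv, "q:t:s", longopts, NULL)) != -1) {
        switch (c) {
        case 'q':
            if (!parse_positive(optarg, &o->max_queue))
                return -1;
            break;
        case 't':
            if (!parse_positive(optarg, &o->num_threads))
                return -1;
            break;
        case 's':
            o->sort = true;
            break;
        default:              /* unknown option or missing argument */
            return -1;
        }
    }
    if (optind != argc - 1)   /* expect exactly one IP_LIST_FILE */
        return -1;
    o->ip_file = argv[optind];
    return 0;
}
```

On a -1 return, main would print the usage statement and exit with a nonzero status. Note that glibc's getopt_long permutes argv by default, so the IP file may also appear before the options.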
To help you get started, please use the following zip file of skeleton code.
The skeleton code implements the major data structures, but it is up to you to implement the option parsing and the producer-consumer design. Note that for the hash table we use uthash; the skeleton code wraps the uthash calls in a simpler interface. If you attempt bonus1, consult the uthash user guide to learn how to sort the hash table.
Implement a -s|--sort command-line option. When specified, the output should be sorted by IP address. The IP addresses should be sorted not lexicographically by their string representation, but by their 4-byte integer value (so 9.255.255.255 comes before 10.0.0.2). In other words, your sort function should use inet_pton to convert the IP string to a uint32_t in network byte order (big-endian), and then ntohl to convert it to host byte order (little-endian on x86). It is this final value that is the sort key.
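A sketch of that sort key and a qsort-style comparator, assuming the addresses are held as an array of char pointers. The names ip_sort_key and cmp_ip are hypothetical. With uthash, HASH_SORT instead expects a comparator that takes two hash-table entries, so the comparator body would need adapting to the skeleton's entry struct.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>

/* Numeric sort key for a dotted-decimal IPv4 string. inet_pton stores the
 * 4-byte address in network (big-endian) byte order; ntohl converts it to
 * host byte order so plain integer comparison follows numeric order.
 * Hypothetical fallback: strings that fail to parse get key 0. */
uint32_t ip_sort_key(const char *ip)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, ip, &addr) != 1)
        return 0;
    return ntohl(addr.s_addr);
}

/* qsort comparator for an array of const char * holding IPv4 strings. */
int cmp_ip(const void *a, const void *b)
{
    uint32_t ka = ip_sort_key(*(const char *const *)a);
    uint32_t kb = ip_sort_key(*(const char *const *)b);
    return (ka > kb) - (ka < kb);   /* -1, 0 or 1 with no unsigned wraparound */
}
```

Sorting {"10.0.0.10", "9.255.255.255", "10.0.0.2", "192.168.1.1"} with this comparator yields 9.255.255.255, 10.0.0.2, 10.0.0.10, 192.168.1.1, whereas a lexicographic string sort would put 10.0.0.10 first and 9.255.255.255 last. The `(ka > kb) - (ka < kb)` idiom avoids the wraparound that `ka - kb` would cause on unsigned values.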
Submit your project as a zip file via Gradescope. Your project must include a Makefile that builds an executable called revlookup. Please refer to the instructions for submitting an assignment for details on how to log in to Gradescope and properly zip your project.
The input file, 50.txt, contains 50 lines. Note, however, that some IP addresses are repeated, so there are only 42 unique IP addresses.
I generated most of the IP addresses randomly: they may or may not resolve to a domain name. Moreover, they may resolve to one domain name today and a different one tomorrow. To help spot-check the output, the input file also contains the following nine stable IP addresses (I've also noted the domain name each one points to):
For all tests other than the bonuses, the output (stdout) should resemble the following (the order is not important):
./revlookup -t1 -q2 50.txt
./revlookup -t1 -q10 50.txt
./revlookup -t2 -q2 50.txt
./revlookup -t2 -q10 50.txt
./revlookup -t4 -q2 50.txt
./revlookup -t4 -q10 50.txt
./revlookup -t2 -q10 -s 50.txt
The output (stdout) should resemble the following:
./revlookup -t2 -q10 -n 50.txt
The output (stdout) should resemble the following: