CSCI 415/515: Fall 2023
Systems Programming
Project 6: revlookup

Due: Fri, Dec 8, 5:00pm


In this project, you will implement a tool called revlookup that uses multiple threads to look up the domain names for a list of IPv4 addresses. We will design the tool around a producer-consumer model: the producer (main thread) reads the list of IPv4 addresses into a bounded circular queue, and each consumer (worker thread) removes an address from the queue and uses the libc function getnameinfo to try to resolve the IPv4 address to a domain name.
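To make the producer-consumer model concrete, here is a minimal sketch of a bounded circular queue protected by a mutex and two condition variables. The type and function names (struct ipqueue, ipqueue_init, ipqueue_put, ipqueue_get) are hypothetical; the skeleton code provides its own queue, whose interface may differ.

```c
#include <assert.h>
#include <netinet/in.h>   /* INET_ADDRSTRLEN */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bounded queue; the skeleton's real type and names may differ. */
struct ipqueue {
    char (*slots)[INET_ADDRSTRLEN];  /* ring buffer of dotted-decimal strings */
    size_t cap, len, head, tail;
    pthread_mutex_t lock;
    pthread_cond_t not_full;         /* signaled when a slot frees up */
    pthread_cond_t not_empty;        /* signaled when an item arrives */
};

/* Returns 0 on success, -1 if memory cannot be allocated. */
static int ipqueue_init(struct ipqueue *q, size_t cap)
{
    assert(q != NULL && cap > 0);
    q->slots = calloc(cap, sizeof *q->slots);
    if (q->slots == NULL)
        return -1;
    q->cap = cap;
    q->len = q->head = q->tail = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    return 0;
}

/* Producer side: blocks while the queue is full. */
static void ipqueue_put(struct ipqueue *q, const char *ip)
{
    pthread_mutex_lock(&q->lock);
    while (q->len == q->cap)
        pthread_cond_wait(&q->not_full, &q->lock);
    strncpy(q->slots[q->tail], ip, INET_ADDRSTRLEN - 1);
    q->slots[q->tail][INET_ADDRSTRLEN - 1] = '\0';
    q->tail = (q->tail + 1) % q->cap;
    q->len++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Consumer side: blocks while the queue is empty.
 * out must have room for INET_ADDRSTRLEN bytes. */
static void ipqueue_get(struct ipqueue *q, char *out)
{
    pthread_mutex_lock(&q->lock);
    while (q->len == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    memcpy(out, q->slots[q->head], INET_ADDRSTRLEN);
    q->head = (q->head + 1) % q->cap;
    q->len--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
}
```

The sketch deliberately leaves out shutdown: the workers also need some way to learn that the producer has finished (for example, a sentinel entry or a "done" flag checked under the same lock), and that design is up to you.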

Additionally, the workers will add their lookup result to a hash table that maps an IPv4 address to a domain name. The hash table both stores the results and also serves as a cache: a worker thread should check if the result is already in the cache before it makes a potentially expensive call to getnameinfo.

The program's output is the content of this hash table: it should loop over the hash table and print each entry (see the rubric for the specific output format).

Name

revlookup - Look up the domain names for a list of IPv4 addresses

Synopsis

revlookup [-h] [-q MAX_QUEUE_SIZE] [-t NUM_THREADS] IP_LIST_FILE

Positional Arguments

IP_LIST_FILE

A file where each line contains a single IPv4 address in dotted-decimal notation, followed by a newline (note that there can be no leading or trailing whitespace).

Options

-h, --help

Print a usage statement to stdout and exit with status 0.

-q, --max-queue-size MAX_QUEUE_SIZE

The maximum number of IPv4 addresses that the circular queue can store at one time. The main thread inserts each IP address from IP_LIST_FILE into this queue. Must be > 0. The default is 10.

-t, --threads NUM_THREADS

The number of worker threads. Each worker thread attempts to dequeue an IPv4 address from the queue and performs a DNS reverse lookup to try to resolve the address to a domain name. Must be > 0. The default is 1.
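One possible way to parse these options is with getopt_long, as in the sketch below. The struct options type and the parse_positive and parse_options helpers are hypothetical names introduced for this example; only the option letters, long names, and defaults come from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the parsed settings. */
struct options {
    long max_queue_size;       /* -q, default 10 */
    long num_threads;          /* -t, default 1 */
    int help;                  /* nonzero if -h/--help was given */
    const char *ip_list_file;  /* the single positional argument */
};

/* Parse a strictly positive decimal integer; returns -1 on any error. */
static long parse_positive(const char *s)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v <= 0)
        return -1;
    return v;
}

/* Returns 0 on success, -1 on a usage error. */
static int parse_options(int argc, char *argv[], struct options *o)
{
    static const struct option longopts[] = {
        {"help",           no_argument,       NULL, 'h'},
        {"max-queue-size", required_argument, NULL, 'q'},
        {"threads",        required_argument, NULL, 't'},
        {NULL, 0, NULL, 0}
    };
    int c;

    assert(o != NULL);
    o->max_queue_size = 10;
    o->num_threads = 1;
    o->help = 0;
    o->ip_list_file = NULL;

    while ((c = getopt_long(argc, argv, "hq:t:", longopts, NULL)) != -1) {
        switch (c) {
        case 'h':
            o->help = 1;
            return 0;
        case 'q':
            o->max_queue_size = parse_positive(optarg);
            if (o->max_queue_size < 0)
                return -1;
            break;
        case 't':
            o->num_threads = parse_positive(optarg);
            if (o->num_threads < 0)
                return -1;
            break;
        default:  /* unknown option or missing argument */
            return -1;
        }
    }

    if (optind != argc - 1)  /* exactly one positional argument expected */
        return -1;
    o->ip_list_file = argv[optind];
    return 0;
}
```

In main, a caller might print the usage statement to stdout and exit with status 0 when help is set. How to report a usage error (for instance, usage on stderr and a nonzero exit status) is an assumption here, since the description above only specifies the behavior of -h.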

Skeleton Code

To help get started, please use the following zip file of skeleton code.

The skeleton code implements the major data structures, but it is up to you to implement the option parsing and the producer-consumer design. Note that for the hash table, we use uthash; the skeleton code wraps the calls to uthash in a simpler interface. If you are attempting Bonus 1, you will want to consult the uthash user guide to learn how to sort the hash table.

Bonus 1: Sort by IP

Implement a -s|--sort command-line option. When specified, the output should be sorted by IP address. Note that the IP addresses should be sorted not lexicographically on their string representation, but based on their 4-byte integer value. In other words, your sort function should use inet_pton to convert the IP string to a uint32_t in network byte order (big-endian), and then ntohl to convert it to a uint32_t in host byte order (little-endian on x86). It is this final value that is the sort key.

Bonus 2: Sort by Domain Name

Implement a -n|--sort-by-name command-line option. When specified, the output should be sorted lexicographically based on the domain name.

Submitting

Submit your project as a zip file via Gradescope. Your project must include a Makefile that builds an executable called revlookup. Please refer to the instructions for submitting an assignment for details on how to log in to Gradescope and properly zip your project.

Rubric

Input

The input file contains 50 lines. Note, however, that some IP addresses are repeated, and so there are only 42 unique IP addresses.

50.txt


      

I generated most of the IP addresses randomly, so they may or may not resolve to a domain name. Moreover, an address may resolve to one domain name today and a different one tomorrow. To help spot-check the output, the input file also contains the following nine stable IP addresses (I've also noted the domain name each one points to):

knowns.txt




      

Output

For all tests other than the bonuses, the output (stdout) should resemble the following (the order is not important):

output.txt




      

One Worker Thread


1.1 -q2 (17 pts)


        ./revlookup -t1 -q2 50.txt
        

1.2 -q10 (17 pts)


        ./revlookup -t1 -q10 50.txt
        

Two Worker Threads


2.1 -q2 (17 pts)


        ./revlookup -t2 -q2 50.txt
        

2.2 -q10 (17 pts)


        ./revlookup -t2 -q10 50.txt
        

Four Worker Threads


3.1 -q2 (17 pts)


        ./revlookup -t4 -q2 50.txt
        

3.2 -q10 (15 pts)


        ./revlookup -t4 -q10 50.txt
        

Bonus 1: Sort Output by IP


100.1 sort (10 pts)


        ./revlookup -t2 -q10 -s 50.txt
        

The output (stdout) should resemble the following:

output_sorted_by_ip.txt

      

Bonus 2: Sort Output by Domain Name


200.1 sort (10 pts)


        ./revlookup -t2 -q10 -n 50.txt
        

The output (stdout) should resemble the following:

output_sorted_by_domain.txt