Class Project: Reproduce a Network Research Result using Large Language Models

Due Jan. 24, 2026 by email to the teaching assistant.

Problem Definition

In class, you have learned about a number of networking-related papers, but in the absence of open-source prototypes, manually reproducing one of these articles takes a long time.In this project, you will learn how to use large language models, chain-of-thought-based prompt engineering, and few-shot learning to reproduce networking papers. Your goals are to (1) select a networking domain paper and obtain a good grasp of the effects of large language models, and (2) reproduce your selected networking domain paper with our semi-automated reproduction framework and perform an evaluation.

Requirements

Large Language Model:

You can choose one or more large language models to help you reproduce the networking domain paper.

GPT-5 (suggested)
Claude Sonnet 4.5 (suggested)
Gemini 3 (suggested)
Doubao
DeepSeek

Papers:

Choose one of the papers below to reproduce.

GRoot: Proactive Verification of DNS Configurations (SIGCOMM'20)
Tiramisu: Fast Multilayer Network Verification (NSDI'20)
Arya: Arbitrary Graph Pattern Mining with Decomposition-based Sampling (NSDI'23)
NCFlow: Contracting Wide-area Network Topologies to Solve Flow Problems Quickly (NSDI'21)
One-Size-Fits-None: Understanding and Enhancing Slow-Fault Tolerance in Modern Distributed Systems (NSDI'25)
ABM: Active Buffer Management in Datacenters (SIGCOMM'22)
DCTCP: Data Center TCP (SIGCOMM'10)
HPCC: High Precision Congestion Control (SIGCOMM'19)
ARROW: Restoration-Aware Traffic Engineering (SIGCOMM'21)
Agua: A Concept-Based Explainer for Learning-Enabled Systems (SIGCOMM'25)

Metric: Criteria for Judging Success

Criteria for judging the success of reproducing the paper:

Functional Evaluation:

Verify if the replication system achieves the basic functionalities of the original system. If the replication system performs similarly to or better than the original system in these aspects, it can be deemed successful functionally.

Performance Evaluation:

Compare the performance of the replication system with that of the original system. Performance metrics may include speed, stability, resource utilization, etc.

Experimental Document: Information to be Recorded

The information that needs to be recorded during the experiment:

Choice of Papers and Large Language Model:

Your choice of papers and large language model.

Number of Prompts:

Count the number of all prompts you used, the number of prompts constructed with the semi-automated framework, the number of prompts used for debugging, and the number of prompts used in addition to semi-automated frameworks (human involvement). If you believe the current number of manual prompts is excessive, you can improve the semi-automated reproduction framework and include the document outlining your improvements to the prompt framework in the final submission.

All Prompt	Prompts Constructed with the Semi-automated Frame	Prompt of Human Involvement	Debug Prompt

Reproduction Time:

Count the total time spent from the time you read the paper to the time you reproduced the system, and how much of that time was spent reading the paper, how much was spent on code generation, and how much was spent on debugging.

All Time (hour)	Read Paper	Code Generation	Debug

Realized Functions:

What functions were implemented in the original system and what functions did you reproduce. For example:

	Simple Marking at the Switch	...	...
Functions realized in the original system
Reproduced system

If there are functions that have not been reproduced, please explain why.

Performance Comparison:

Perform two or more performance evaluations, with each performance evaluation tested with at least two datasets. As much as possible, use the dataset used by the original system. If the dataset tested by the original system is overly large, you may choose a subset of the data for this project's testing. However, be sure to elaborate on the method used to select the subset in your documentation. For example:

Time: (One type of performance in one table)

Dataset	Original System	Reproduction System	Average Relative Error(%)
Dataset1	X s	Y s	\|Y-X\|/X *100%
Dataset2
...

LOC (Line of Code):

Count the lines of code of the reproduction system and original system.

	Original System	Reproduction System
LOC (line)	X lines	Y lines

Submission

Please submit by email to the instructor. Turn in electronic and paper material as follows.

Submission should include:

Your source code.
A dialogue document that records all your interactions with (inputs and outputs) LLM.
A report document including evaluation.