CS262a: Fall 2021 Final Projects

This page contains final CS262a projects for Fall of 2021. These projects were done in groups of two or three (maximum of four for undergraduates) and spanned a wide range of topics.

To see what the project suggestions were, see HERE.

1:   Co-scheduling Feature Updates and Queries for Feature Stores (GoogleDoc proposal: HERE)
 Woosuk Kwon and Debbie Liang and Jimmy Xu
RALF is a system for maintaining data in feature stores, which store pre-computed features used for ML model training and inference. RALF is responsible for processing new data to compute updates to feature tables, which are queried by downstream models. Currently, RALF only computes updates eagerly. This project adds support for lazy computation in RALF, so that features are computed when queried to meet some specified deadline.
Supporting Documentation: Final Report (pdf) Poster (pdf)
2:   Privacy-Preserving Opportunistic Network (GoogleDoc proposal: HERE)
 Alvin Tan and Tess Despres and Shishir Patil and Jean-Luc Watson
Wireless Sensor Networks (WSNs) are becoming increasingly popular for applications such as city sensing, health monitoring, wildlife tracking, etc. However, these systems are bottlenecked by the number and range of available gateways along with the cost of maintaining such gateways. To address this, our project goal is to investigate privacy-preserving opportunistic sensor networks. These networks require no fixed infrastructure and allow sensors to opportunistically piggy-back data through Mobile Ubiquitous LAN Extensions (MULE) devices that may be intermittently passing through the sensor's local radio broadcast area.
Supporting Documentation: Final Report (pdf) Poster (pdf)
3:   Scalable Firewalls for Publicly-Routable Cloud Tenants (GoogleDoc proposal: HERE)
 Tenzin Ukyab and Emily Marx
Cloud tenants frequently run workloads that span multiple regions in the cloud and require complex network setup to connect all the regions. McClure et al. suggest that cloud providers reduce this complexity by presenting tenants with a high-level virtual network API, rather than a set of low-level virtual network components. However, this requires that cloud providers make all tenant endpoints publicly routable but default-off. Currently, cloud tenant endpoints are not publicly routable, which reduces the risk of DDoS attacks. Thus, generally the only firewalls are at the endpoints. However, adding public addresses for all tenant hosts makes the cloud network vulnerable to an increased volume of resource-exhaustion attacks. To mitigate this risk, we propose adding a firewall system to each cloud point of presence (PoP), in addition to the per-endpoint firewalls. A naive solution that adds a single firewall holding the rules for all the endpoints served by the PoP would not scale, so we propose a more sophisticated system. This new system adds firewall servers sharded by tenant to the PoP. The edge router directs traffic to the correct firewall server; the most commonly used endpoint firewall rules are also cached in the edge router. With a cloud network of size $10^{12}$ VMs, our design has a $0.0004\%$ rate of leaking disallowed network traffic into the cloud network, with half the CPU usage per tenant of the status-quo VPN tunnels.
Supporting Documentation: Final Report (pdf) Poster (pdf)
4:   Statistic Collection During Think Time in Modin (GoogleDoc proposal: HERE)
 Connor McMahon and Jonathan Shi
Modin is a drop-in replacement for Pandas that allows for convenient parallelization of dataframe transformations across multiple cores, and is often used in Jupyter notebooks for data science algorithm development. This project leverages user "think time" -- periods of time in which the user is examining notebook outputs and not running any computations -- to collect statistics about dataframes to enable optimizations in query planning.
Supporting Documentation: Final Report (pdf) Poster (pdf)
5:   Learned Memory Allocation in Heterogeneous Memory Systems (GoogleDoc proposal: HERE)
 Jaewan Hong and Junsun Choi
Current hardware and application memory trends put immense pressure on the computer system’s memory subsystem. With the end of Moore's Law and Dennard scaling, DRAM capacity growth has slowed down significantly. To cope with this limitation, on the hardware side, the market for memory devices has diversified into a multi-layer memory topology spanning multiple orders of magnitude in cost and performance. On the user side, applications increasingly need to process vast data sets with low latency and high throughput. A memory allocation system designed for the average case cannot support all of these demands together.

We present a learned memory allocation scheme, MARL (Memory Allocation Reinforcement Learning), for heterogeneous memory systems. In this project, we built a reinforcement learning algorithm to automate memory allocation.

Supporting Documentation: Final Report (pdf) Poster (pdf)
6:   ProtoBlocks: A protocol-safe programming language (GoogleDoc proposal: HERE)
 William Mullen and Vivek Nair
Memory-safe languages like Rust and Java have effectively eliminated buffer overflow vulnerabilities through good programming language design alone. Can the same be done for cryptographic protocol vulnerabilities? We seek to implement a "protocol-safe" language, using DataCapsules as a basis, that enforces good design choices in the implementation of cryptographic protocols and eliminates common vulnerabilities and pitfalls.
Supporting Documentation: Final Report (pdf) Poster (pdf)
7:   QoS Aware Accelerator for Multi-Tenancy Execution (GoogleDoc proposal: HERE)
 Seah Kim and Avinash Nandakumar
Recent works have proposed multi-tenancy on deep neural network (DNN) accelerators to enable concurrent execution of DNN applications co-located on the same hardware while improving overall system utilization. However, workload co-location causes contention over shared resources, which degrades quality of service (QoS). Thus, to maintain the QoS of latency-critical workloads, we propose a new dynamic, contention-aware accelerator architecture that combines software runtime progress monitoring with a hardware dynamic memory request manipulator.
Supporting Documentation: Final Report (pdf) Poster (pdf)
8:   Efficient cross-mesh communication and data loading library (for large-scale training) (GoogleDoc proposal: HERE)
 Chanwut Kittivorawong and Sheng Shen
Parax is a JIT compiler for distributed tensor computation and large-scale neural network training. Parax is built on top of JAX and XLA. Parax can compile a tensor computational graph, generate a parallelization strategy and run it on a Ray GPU cluster. Parax searches for the best parallelization strategy in a comprehensive search space that combines intra-operator parallelism and inter-operator parallelism. Parax can automatically find and compose complicated strategies such as data-parallel, tensor partitioning and pipeline parallel.

One performance bottleneck of Parax is the communication of distributed tensors across device meshes. The goal of this project is to design and implement an efficient collective communication library that supports this cross-mesh communication.

Supporting Documentation: Final Report (pdf) Poster (pdf)
9:   Automated Cache Hierarchy for Feature Stores (GoogleDoc proposal: HERE)
 Edward Choi and Shreyas Krishnaswamy and Priyam Mohanty
Real-world data pipelines are becoming more and more reliant on data featurization. This has propelled development of feature stores, systems that store featurized data and serve it to ML models or other workloads downstream. While many production feature stores are ad-hoc, a recently developed, unified feature store called RALF has facilitated research on end-to-end feature store optimizations. However, RALF currently assumes all features fit in memory. This inhibits its ability to handle large workloads. Thus, a hierarchical caching model is necessary to evict features and save them to disk when necessary. Towards this model, we introduce three novel caching policy "classes" specific to feature stores: cost-awareness, table-level caching, and a combination of the two. For rapid prototyping, we design and build a feature store simulator that implements these policy classes and demonstrate that they perform considerably better than naive caching strategies like LRU and MRU. We also perform a validation experiment to confirm that the simulator actually mirrors RALF in terms of functionality and speed.
Supporting Documentation: Final Report (pdf) Poster (pdf)
10:   Fast Thread Migration in a Heterogenous ISA System (GoogleDoc proposal: HERE)
 John Fang and Charles Hong and Max Banister
Heterogeneous systems-on-chip (SoCs) are becoming more popular across different levels of computing. However, the modularity of extensible ISAs like RISC-V can also cause fragmentation and the coexistence of cores in the same system that support different ISA extensions. Currently, no OS scheduler is designed for managing threads on such heterogeneous systems. Users must use poorly suited mechanisms to manually set the cores on which their software should run, hurting performance and increasing development time. We propose an automatic thread migration mechanism that schedules accelerated programs to the appropriate cores in a shared-memory heterogeneous system, adapting system software to an increasingly common hardware environment.
Supporting Documentation: Final Report (pdf) Poster (pdf)
11:   Fair and Efficient Resource Allocation for Distributed Systems (GoogleDoc proposal: HERE)
 Wenshuo Guo
Efficient resource allocation in distributed computing systems and cloud computing has long been an important and difficult task. For user satisfaction, these allocations are also expected to satisfy certain fairness criteria. In particular, user needs are constantly changing, and even the users themselves are often unable to precisely derive the amount of each computing resource, such as CPU, memory, and disk space, that they need to satisfy their SLOs. In this project, we combine the resource allocation scheme with new machine learning algorithms, which can learn the users' needs and design efficient and fair allocations from feedback.
Supporting Documentation: Final Report (pdf) Poster (pdf)
12:   Privacy Preserving Personal Search Index (GoogleDoc proposal: HERE)
 Shomil Jain and Ben Hoberman (CS 294) and Solomon Joseph (CS 294)
Google, Apple, Facebook, and other third-party services collect, store, and operate on large quantities of personal data. These companies most frequently use this data for personalization, advertising, or other financially motivated purposes (e.g. direct sale to data brokers). Due to the current advertising-driven paradigm of personal computing, users are faced with a tradeoff: either use third-party applications that may collect, trade, operate on, and sell personal data, or don’t use these applications at all. In this paper, we propose Lens: an end-to-end application framework built on top of EGo & OpenEnclave that enables users to leverage third-party applications that perform secure computation over their personal data. Lens provides an alternative to users who are seeking to leverage third-party services without trusting such services with their data. Lens protects the privacy of sensitive user-provided data and ensures the confidentiality and integrity of data-in-use through Intel SGX enclaves. Lens allows users to initialize and use third-party application modules running on trusted hardware, while empowering a larger number of untrusted parties to take advantage of large reservoirs of personal data in a privacy-preserving manner.
Supporting Documentation: Final Report (pdf) Poster (pdf)
13:   Secure Services for the Extensible Internet (GoogleDoc proposal: HERE)
 Cathy Lu and Zach Van Hyfte and William Lin (CS 294)
The Extensible Internet is a proposal for an addition to the public internet architecture that allows for the deployment of new in-network services. EI provides this framework through service nodes that form a new L3.5 layer on top of the existing L3 network layer. Service nodes improve the security and privacy guarantees of the services they offer by running service execution environments within enclaves. We aim to build two security-focused services on top of these secure service nodes: a scalable attestation service and an implementation of oblivious DNS.
Supporting Documentation: Final Report (pdf) Poster (pdf)
14:   FlitReduce: Improving Memory Fabric Performance via End-to-End Network Packet Compression (GoogleDoc proposal: HERE)
 Xingyu Li and Tushar Sondhi
New technologies in fabrication and packaging have led to an explosion in core counts over time. A Network-on-Chip (NoC) is a router-based packet switching network that enables an efficient on-chip interconnect. Prior research has shown that, especially for highly parallel workloads, the design and performance of the NoC has a great effect on the overall performance of the system. Our project, FlitReduce, aims to improve data transmission efficiency over the NoC through data compression. FlitReduce implements compressors and decompressors at the ends of the NoC to reduce the amount of overall memory traffic and, thus, communication congestion, improving NoC performance while imposing little effect on the other components in the network.
Supporting Documentation: Final Report (pdf) Poster (pdf)
15:   Sharded, Secure Multicast Tree for Paranoid Stateful Lambdas (GoogleDoc proposal: HERE)
 Marcus Plutowski and Vivek Bharadwaj and Willis Wang
The Paranoid Stateful Lambda (PSL) system is a component of the broader Global Data Plane, specifically built to provide secure, scalable, and high-speed communication between the worker enclaves managed by a given PSL instance. This project aims to reimplement the inter-enclave networking structure of the Paranoid Stateful Lambda system with the goal of increasing its scalability while maintaining a high degree of responsiveness.
Supporting Documentation: Final Report (pdf) Poster (pdf)
16:   The easy way to Parallel & Recursive programming on GPU (GoogleDoc proposal: HERE)
 Cheng Cao and Justin Kalloor
Traditionally, GPU DSLs like GLSL/HLSL/OpenCL do not allow recursion, and only CUDA supports it through a maze of software and hardware features. In order to express recursion, GPU programmers have been converting recursive or large branching algorithms into producer-consumer queues, or writing their own stacks. We aim to do this automatically in an optimizing compiler, and we hope to create a scheduling and reordering system to further enhance performance on GPUs.
Supporting Documentation: Final Report (pdf) Poster (pdf)
17:   Secure, Fast, Location-Independent Routing for the Global Data Plane (GoogleDoc proposal: HERE)
 Rahul Arya and Thanakul Wattanawong and Praveen Batra
This project implements a fast, flexible location-independent routing system for the Global Data Plane. The system contains many features of a complete GDP switching network, including location-independent routing, certificate chaining, and a routing information base. Other features were also implemented for security and performance, including packet-level encryption that can be used to build full dTLS support at each hop, and almost lock-free caching of the forwarding table. Benchmarks indicate that the system has superior performance both in general and compared to previous implementations, and we propose that our implementation can be naturally extended to support other novel features such as arbitrary trust domains for routing.
Supporting Documentation: Final Report (pdf) Poster (pdf)
18:   CapsuleDB: A DataCapsule Based Key-Value Store (GoogleDoc proposal: HERE)
 William Mullen and Nivedha Krishnakumar and Brian Wu
The Paranoid Stateful Lambda (PSL) project implemented a stateful Function-as-a-Service (FaaS) framework with confidentiality, integrity, and provenance by using DataCapsules. However, the system was heavily constrained by the need to keep everything in memory. CapsuleDB is a secure database designed to address several performance issues with PSLs, such as the in-memory constraint, poor read/write latency, and inefficient data recovery.
Supporting Documentation: Final Report (pdf) Poster (pdf)

Maintained by John Kubiatowicz (kubitron@cs.berkeley.edu).
Last modified Fri Dec 24 12:24:20 2021