1: Enabling multi-leaf transactions on a Merkle tree (GoogleDoc proposal: HERE)
Aneesh Khera and Cibi Pari and Alex Kazorian and Janakirama Kalidhindi
Merkle trees allow for efficient and authenticated verification of data contents through hierarchical cryptographic hashes. Because each parent node's content is the hash of its children, concurrency in Merkle trees is a hard problem. In this project, we will explore possible ideas for concurrent single-leaf updates in a Merkle tree. Our model of updates follows the epoch-style methodology used by Google Key Transparency, where a new tree root is published at the beginning of each epoch. We plan to implement our ideas in a distributed environment using Ray, a distributed Python framework, and measure performance using generated workloads. We also plan to compare the performance against the naive approach used by Trillian, the sparse Merkle tree implementation used by the Google Key Transparency team.
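To make the path-rehashing cost concrete, here is a minimal Python sketch of a binary Merkle tree in which a single-leaf update rehashes only the leaf-to-root path; the names and structure are illustrative assumptions, not the project's implementation:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

class MerkleTree:
    """Minimal binary Merkle tree over a power-of-two list of leaves."""
    def __init__(self, leaves):
        n = len(leaves)
        assert n and (n & (n - 1)) == 0, "sketch assumes power-of-two leaves"
        self.levels = [[h(leaf) for leaf in leaves]]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append([h(prev[i] + prev[i + 1])
                                for i in range(0, len(prev), 2)])

    def root(self) -> bytes:
        return self.levels[-1][0]

    def update_leaf(self, index: int, leaf: bytes) -> None:
        # A single-leaf update rehashes only the O(log n) nodes on the
        # leaf-to-root path; concurrent updates contend on shared ancestors,
        # which is exactly why concurrency is hard here.
        self.levels[0][index] = h(leaf)
        for level in range(1, len(self.levels)):
            index //= 2
            left = self.levels[level - 1][2 * index]
            right = self.levels[level - 1][2 * index + 1]
            self.levels[level][index] = h(left + right)
```

In the epoch model described above, many such updates would be buffered during an epoch and the new root published once at the epoch boundary.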
Supporting Documentation: Final Report (pdf) Poster (pdf)
2: AUTOPHASING: Learning to Optimize Compiler Passes with Deep Reinforcement Learning (GoogleDoc proposal: HERE)
Ameer Haj-Ali and Qijing Huang
The precise ordering of optimization passes in the compiler can drastically change a program's performance. Traditionally, an order is chosen that programmers deem "good enough" based on what they think works well. However, the optimal ordering of optimizations, often referred to as the phase ordering, is often quite different from the compiler's default ordering. Moreover, the best ordering may vary from program to program, and even with the hardware the program runs on. Thus an ideal compiler must take into account various program features, the underlying hardware, and whatever else may be running on the machine.
After each optimization pass is applied, the program's performance and the order and number of operations (its features) change accordingly. Passes can be seen as actions the compiler takes, and the program features as what the compiler observes after taking each action, i.e., the result of interacting with and manipulating the underlying program. The goal of the compiler is to maximize the performance of the program. This setup is isomorphic to deep reinforcement learning, where an agent interacts with the environment by taking actions according to a policy that takes a state/observation as input and outputs the action that maximizes the reward. Advances in deep reinforcement learning thus open a new horizon for addressing the phase ordering challenge.
In this paper, we explore the benefits of using deep reinforcement learning to achieve a better ordering of optimization passes in a mainstream compiler (LLVM). Previous work relied on the frequencies of a program's instructions as the state. However, we show that this is insufficient due to the limited observations it provides, the fact that multiple passes are sometimes needed before such features change, and the inability to operate on multiple programs simultaneously. Instead, we propose a solution that uses a histogram of the actions (passes) themselves as the observation. Such a system yields several performance improvements, both from taking the entire trajectory into account and from faster learning, since the rate-limiting step (calling the compiler repeatedly) is reduced.
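To illustrate the formulation, the following gym-style environment sketch treats passes as actions and uses the proposed pass histogram as the observation; the PASSES list and the compile_and_measure hook are placeholder assumptions, not the framework's actual interface:

```python
import numpy as np

# Hypothetical subset of LLVM passes; the real action set depends on the
# compiler toolchain being driven.
PASSES = ["-inline", "-loop-unroll", "-gvn", "-licm", "-mem2reg"]

class PhaseOrderingEnv:
    """Sketch of a phase-ordering RL environment: actions are passes, the
    observation is a histogram of the passes applied so far, and the reward
    is the cycle-count improvement after each pass."""
    def __init__(self, program, compile_and_measure, horizon=12):
        self.program = program
        self.measure = compile_and_measure  # (program, pass_seq) -> cycles
        self.horizon = horizon

    def reset(self):
        self.seq = []
        self.prev_cycles = self.measure(self.program, self.seq)
        return np.zeros(len(PASSES))

    def step(self, action: int):
        self.seq.append(PASSES[action])
        cycles = self.measure(self.program, self.seq)
        reward = self.prev_cycles - cycles   # fewer cycles -> positive reward
        self.prev_cycles = cycles
        # Observation: histogram of actions taken, as the paper proposes,
        # rather than program instruction frequencies.
        obs = np.array([self.seq.count(p) for p in PASSES], dtype=float)
        done = len(self.seq) >= self.horizon
        return obs, reward, done, {}
```

Because the observation is derived from the action history alone, the compiler only needs to be invoked to compute the reward, which is the rate-limiting step mentioned above.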
We implement a framework that takes any group of programs and intelligently finds a sequence of passes that optimizes their run cycles. As a case study, the framework is used to compile multiple High-Level Synthesis (HLS) programs. A simulation environment for deep reinforcement learning is built by leveraging an existing open-source HLS compiler backend; it takes a few seconds to generate feedback from the environment for each action the algorithm takes. We run both policy gradient and deep Q-network algorithms with various settings in the framework. We also modify these algorithms to better fit the needs of compiler phase ordering and to achieve shorter runtimes. The design choices we explore include the batch size, the use of reward-to-go, and the frequency of gathering rewards. The optimization goal is to generate, within a reasonable time frame, a sequence of passes that reduces overall target program runtime below that achieved by -O3.
To compare the deep reinforcement learning algorithms with other existing methods, we also implement and apply random, greedy, and genetic algorithms proposed in the literature. We compare the runtime and performance of our framework against state-of-the-art algorithms for the compiler phase ordering challenge on an HLS benchmark suite of 12 programs, using six programs for training and six for testing. Overall, our framework runs one to two orders of magnitude faster than state-of-the-art approaches for compiler phase ordering and achieves a 16% improvement over the compiler's standard "high optimization level" (the -O3 flag).
Supporting Documentation: Final Report (pdf) Poster (pdf)
3: Using Software-defined Caching to Enable Efficient Communication in a Serverless Environment (GoogleDoc proposal: HERE)
Kailas Vodrahalli and Eric Zhou
There has been a recent surge in the popularity of serverless computing with stateless functions such as AWS Lambda, mainly because of abstractions that hide complex cluster-management details from the user. PyWren is a serverless computing framework that leverages the elasticity and simplicity of such stateless functions to run embarrassingly parallel tasks. We are interested in extending this framework to exploit the locality offered by co-located processors. In practice, this means augmenting PyWren with a local cache. The local cache will decrease both the number of messages and the quantity of data sent, especially in situations where broadcasts are necessary. One specific application where we believe our cache can enable significant performance gains is serverless linear algebra.
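As a sketch of the idea, the following wrapper shows how a worker-local cache short-circuits repeated remote reads; the generic remote store with get/put methods is an assumption of this sketch, not PyWren's actual API:

```python
class CachedStore:
    """Minimal sketch of a worker-local read cache in front of a remote
    object store. Co-located workers sharing this cache turn N remote
    reads of the same object (e.g., a broadcast operand) into one."""
    def __init__(self, remote):
        self.remote = remote     # assumed to expose get(key) / put(key, value)
        self.local = {}

    def get(self, key):
        if key not in self.local:               # miss: one remote fetch...
            self.local[key] = self.remote.get(key)
        return self.local[key]                  # ...then served locally

    def put(self, key, value):
        self.remote.put(key, value)             # write-through keeps the
        self.local[key] = value                 # remote store authoritative
```

For broadcast-heavy workloads such as serverless linear algebra, this is exactly the pattern where locality pays off: every worker on a machine after the first reads the operand from the local cache.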
Supporting Documentation: Final Report (pdf) Poster (pdf)
4: Spectrum: Classifying, Replicating and Mitigating Spectre Attacks on a Speculating RISC-V Microarchitecture (GoogleDoc proposal: HERE)
Abraham Gonzalez and Ben Korpan and Ed Younis and Jerry Zhao
Following the disclosure of Spectre and Meltdown at the beginning of 2018, many attacks have surfaced targeting speculative microarchitectural state in out-of-order processors. While classifications have been developed for cache-based and timing-based side-channel attacks, a general taxonomy for speculative execution attacks targeting microarchitectural vulnerabilities has not been formalized. This project aims to create a taxonomy that would both inform the categorization of new attacks and help the development of hardware mitigations for them. Additionally, this project uses a generic, open-source out-of-order processor (BOOM) to both replicate a subset of the attacks and create a new mitigation for cache side-channel attacks. This mitigation will allow the execution core to fill and access speculative cache lines in a structure separate from the L1D$, providing isolation while minimizing performance loss.
Supporting Documentation: Final Report (pdf) Poster (pdf)
5: Enabling multi-leaf transactions on a Merkle tree (GoogleDoc proposal: HERE)
Nikhil Athreya and Nick Riasanovsky
This is a problem that Google's KeyTransparency team recently described to Prof. Popa. Merkle trees (or hash trees) are a very commonly used primitive for authentication in recent cryptography-based systems such as Bitcoin, Ethereum, KeyTransparency, Certificate Transparency, and many others. However, one change to a leaf in the tree percolates changes to all the hashes on the path from that leaf up to the root, posing challenges to concurrency. There is existing research on achieving concurrency among single-node operations, but recent use cases for Merkle trees require *multi-node* operations to happen atomically. For example, Google's KeyTransparency system stores pairs of (username, public key) in the leaves of a huge Merkle tree. The same user might have several leaves, some mapping a phone number, a username, or other identifiers to the same public key, yet Google needs to update all these entries at the same time when the user's public key changes. Since Google envisions storing everyone's public keys across all their applications in this tree, accesses to the tree must be very efficient. Thus, such multi-leaf transactions have to run as concurrently as possible. How would you design such a system? If you are interested in this problem, please contact Prof. Popa as well.
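To illustrate why atomic multi-leaf updates interact, here is a sketch (our illustration, not a proposed design) of applying several leaf updates as one batch while rehashing each affected ancestor only once, so overlapping root paths are shared:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def batch_update(levels, updates):
    """Atomically apply several leaf updates to a binary Merkle tree.
    `levels` is a bottom-up list of hash lists (levels[0] = leaf hashes,
    power-of-two width); `updates` maps leaf index -> new leaf bytes.
    Each dirty ancestor is rehashed exactly once even when several
    updated leaves share part of their root path."""
    dirty = set()
    for index, leaf in updates.items():
        levels[0][index] = h(leaf)
        dirty.add(index)
    for level in range(1, len(levels)):
        parents = {i // 2 for i in dirty}
        for p in parents:
            left = levels[level - 1][2 * p]
            right = levels[level - 1][2 * p + 1]
            levels[level][p] = h(left + right)
        dirty = parents
    return levels[-1][0]   # new root: published only if all updates commit
```

The concurrency challenge is visible in the `parents` sets: two transactions touching nearby leaves converge on the same ancestors, so they must either serialize on those nodes or merge their batches.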
Supporting Documentation: Final Report (pdf) Poster (pdf)
6: A Succinctly Verifiable Key Transparency System (GoogleDoc proposal: HERE)
Yuncong Hu and Weikeng Chen
Key transparency provides a global and accountable database of users' public keys, which can instantiate public-key infrastructure (PKI) for end users. In these systems, each user publishes their public key and can detect any manipulation of it. However, existing systems require each key owner to validate each version of the public-key database, implemented by having each key owner verify a membership proof from the server for every version. This leads to a limited security guarantee and imposes heavy overheads on both clients and the server: (1) each key owner must stay constantly online and check its own public key; (2) the server generates a user-specific proof for each key owner for each version.
In this paper we present SVKT, a key transparency system that is succinctly verifiable and addresses the issues above: the server publishes a few global proofs, and each user verifies these proofs, after which they are convinced that the database was updated correctly.
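For contrast, the per-user membership check that existing systems repeat for every database version looks roughly like the following sketch; SVKT's global proofs aim to amortize this per-user, per-version work:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_membership(root: bytes, leaf: bytes, index: int, path) -> bool:
    """Verify a Merkle membership proof: `path` is the list of sibling
    hashes from the leaf's level up to (but not including) the root,
    and `index` is the leaf's position, which determines left/right
    ordering at each step. Illustrative sketch, not SVKT's protocol."""
    node = h(leaf)
    for sibling in path:
        if index % 2 == 0:                 # node is a left child
            node = h(node + sibling)
        else:                              # node is a right child
            node = h(sibling + node)
        index //= 2
    return node == root
```

Each such proof is user-specific and O(log n) in size, which is why generating one per key owner per version is the server-side overhead the abstract describes.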
Supporting Documentation: Final Report (pdf) Poster (pdf)
7: Job Scheduling with Deep Reinforcement Learning (GoogleDoc proposal: HERE)
Xinyun Chen and Huazhe Xu and Zhuang Liu
Job scheduling and resource management problems are ubiquitous and fundamental in computer systems, and various manually designed heuristic algorithms have been proposed for them. Meanwhile, deep reinforcement learning has been widely applied in domains such as game playing and robotic control, and it has the potential to learn a policy that achieves better performance than humans. However, this technique has not been well explored for scheduling. In this project, we study the effectiveness of applying deep reinforcement learning to automate scheduling-algorithm design. We hope that the learned algorithms can provide insights into the design of future scheduling algorithms.
Supporting Documentation:
8: Transparent Block-Level Compression in Hardware (GoogleDoc proposal: HERE)
Kimberly Lu and Kyle Kovacs
Modern SSD storage technology is desirable for its non-volatility and high I/O performance. Unfortunately, flash memory by design wears out after a limited number of write cycles. One potential way to increase the lifetime of SSDs is to limit the number of writes performed: compressing data between the OS and the disk can lead to fewer physical disk writes, thereby extending SSD endurance.
In addition to compression, we also offload encryption and error correction into hardware, in order to build a more complete picture of what a storage node in a disaggregated datacenter could look like. However, we first focus on studying the performance of compression alone, and compare it against a software implementation.
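As a software point of comparison, block-level compression on the write path reduces to something like the following sketch; the device interface is an assumption of this sketch, per-block metadata handling is elided, and the hardware version pushes this logic below the OS:

```python
import zlib

def write_block(device, offset: int, block: bytes, level: int = 6) -> int:
    """Sketch of transparent compression on the write path: compress each
    block and fall back to the raw bytes when compression does not help.
    `device` is assumed to expose write(offset, data); a real design also
    needs per-block metadata recording whether the block is compressed."""
    compressed = zlib.compress(block, level)
    if len(compressed) < len(block):
        device.write(offset, compressed)   # fewer bytes -> less flash wear
        return len(compressed)
    device.write(offset, block)            # incompressible data: store raw
    return len(block)
```

The returned byte count is what a study like ours would aggregate to estimate the reduction in physical writes, and hence the endurance gain.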
Poster Link: https://docs.google.com/presentation/d/1m5spC408E019Cqd7Fp58AkW5SCp-d4e6cB6-_9q8IQA/edit?usp=sharing
Supporting Documentation: Final Report (pdf) Poster (pdf)
9: Distributed Cloud Edge Video Analytics (GoogleDoc proposal: HERE)
Rosanna Neuhausler and Katie Li and Ameena Golding
As the processing capabilities of edge devices have advanced, shifting CPU-intensive tasks to the edge no longer seems impossible. The implications for reducing latency and increasing performance introduce a new era of computing with smart devices as the focus. One such field is video surveillance, which not only carries the computational weight of machine learning but also gigabytes to terabytes of data storage. Previous smart video surveillance networks have sent all data to the cloud for processing. This not only exposes security risks but also makes surveillance dependent on the network connection, which in rural areas may be the primary bottleneck. To address these concerns, we propose surveillance system designs that use recent hardware advances in vision processing units (VPUs) and a convolutional neural network for image recognition, which, in concert with edge caching and relaxed storage requirements, allow independence from the cloud at larger scales than before. We test our design using Google's Vision Kit and the AWS DeepLens, motivated by those devices' use of Intel's Movidius Myriad X VPU and the Amazon cloud, respectively.
Supporting Documentation: Final Report (pdf) Poster (pdf)
10: A Distributed Multithreaded Monitoring System for Anna: a Distributed Key-Value Store (GoogleDoc proposal: HERE)
Neil Giridharan and Arvind Sankar
It is hard to build systems that work well across many orders of magnitude of scale. Anna is a key-value store that addresses this problem, achieving blazingly fast speeds through coordination-free consistency and wait-free execution. It has a multi-master replication architecture that uses coordination-free actors and lattices to support a variety of consistency models. To achieve better performance, Anna has a monitoring system and policy engine that enable workload responsiveness and adaptability: the monitoring system tracks workload changes and resource usage, and the policy engine uses the gathered statistics to take specific actions. Currently, the monitoring system runs single-threaded on a single node, and thus becomes a bottleneck as more traffic is added to the system. In this project, we design and implement a distributed, multithreaded monitoring system that scales with the workload.
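One way to avoid serializing all monitoring threads on a single structure is to shard the statistics across locks, as in this illustrative sketch (our illustration, not Anna's actual monitoring engine):

```python
import threading
from collections import defaultdict

class ShardedMonitor:
    """Sketch of a multithreaded statistics collector: key access counts
    are sharded across independent locks, so concurrent monitoring
    threads rarely contend on the same shard."""
    def __init__(self, num_shards: int = 16):
        self.shards = [defaultdict(int) for _ in range(num_shards)]
        self.locks = [threading.Lock() for _ in range(num_shards)]

    def record_access(self, key: str) -> None:
        i = hash(key) % len(self.shards)
        with self.locks[i]:
            self.shards[i][key] += 1

    def snapshot(self) -> dict:
        """Merged view handed to the policy engine each reporting period."""
        merged = defaultdict(int)
        for shard, lock in zip(self.shards, self.locks):
            with lock:
                for key, count in shard.items():
                    merged[key] += count
        return dict(merged)
```

Distributing such shards across nodes, and merging snapshots at the policy engine, is the direction a scalable design would take.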
Supporting Documentation: Final Report (pdf) Poster (pdf)
11: Exploiting Cross-Camera Correlations to Decrease Resource Cost in Large-Scale Video Analytics (GoogleDoc proposal: HERE)
Samvit Jain and Paras Jain
Can we exploit cross-camera content correlations in wide-area enterprise camera deployments to 1) decrease resource usage and 2) improve accuracy in large-scale video analytics operations? In the context of person re-identification and tracking workloads, can we use a model of spatio-temporal traffic patterns, built on historical data, to prune the inference search space (thus reducing the cloud workload) and reduce the rate of false-positive matches (by searching fewer uncorrelated cameras)? What is the scaling behavior of such a system?
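A minimal sketch of the pruning idea, under an assumed history format mapping camera pairs to observed transit times (the format, the 30-second window, and the scoring are all illustrative assumptions):

```python
def candidate_cameras(history, source, elapsed, k=3):
    """Rank destination cameras by the historical likelihood that a person
    leaving `source` reappears there after `elapsed` seconds, then query
    only the top k. `history` maps (source, destination) camera pairs to
    lists of observed transit times in seconds."""
    scores = {}
    for (src, dst), transit_times in history.items():
        if src != source:
            continue
        # Fraction of past transitions consistent with the elapsed time
        # (within a 30s window) approximates P(dst | src, elapsed).
        near = sum(1 for t in transit_times if abs(t - elapsed) < 30)
        scores[dst] = near / len(transit_times)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]   # run re-identification only on these cameras
```

Querying only correlated cameras cuts the cloud inference load and, by skipping uncorrelated feeds, removes opportunities for false-positive matches.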
Supporting Documentation: Final Report (pdf) Poster (pdf)
12: Replication for Global Data Plane (GoogleDoc proposal: HERE)
Yucheng Yang and Scott Numamoto and Steven Wu
The Global Data Plane (GDP) provides a secure, single-writer, append-only logging interface that serves as a storage and communication primitive. GDP is a data-centric platform designed to scale to IoT-sized numbers of devices. GDP logs can be stored on potentially untrusted distributed infrastructure and can be placed to optimize for locality.
GDP offers location-independent routing and overlay multicast. The design of GDP involves two related challenges: how to ensure high availability for users, and how to persist logs through failures in the GDP infrastructure.
We proposed a set of semantics for users that offers high availability, and designed a mechanism to replicate logs between servers without relying on timing assumptions. We further implemented it on top of the current GDP services.
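The flavor of a timing-independent replication step can be sketched as a quorum append; the replica interface here is an assumption of this sketch, not the GDP implementation:

```python
def replicated_append(replicas, record, quorum=None):
    """Sketch of replication with no reliance on timing: an append is
    durable once a write quorum of replicas acknowledges it, with no
    clocks or timeouts in the correctness argument. Each replica is
    assumed to expose an append(record) method that raises
    ConnectionError on failure."""
    quorum = quorum if quorum is not None else len(replicas) // 2 + 1
    acks = 0
    for replica in replicas:
        try:
            replica.append(record)
            acks += 1
        except ConnectionError:
            continue                 # a slow or failed replica is ignored
    if acks < quorum:
        raise RuntimeError("append not durable: quorum not reached")
    return acks
```

Because durability depends only on counting acknowledgments, correctness survives arbitrary message delays, which is the property the abstract's "no reliance on timing" goal captures.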
Supporting Documentation: Final Report (pdf) Poster (pdf)
13: Techniques for Privacy Over the Interledger (GoogleDoc proposal: HERE)
Akash Khosla and Nick Zoghb and Vedant Saran
The Interledger Protocol (ILP) is premised on the idea that the world will always have more than one payment network (e.g., Visa, SWIFT, Bitcoin, Cosmos, and others), because the world will never agree to use a single ledger. Interledger is now live, and the core protocol was finalized in late 2017. It acts as a universal network for sending value, independent of any company or currency. There are scenarios in which senders (and receivers) want to minimize what a passive observer in the network can learn about transactions: corporations may want to hide their balance sheets to protect the confidentiality of customers, and individuals may want to protect themselves from mass data mining.
Since Interledger has a fair amount in common with the current Internet Protocol (IP) design, we want to provide a solution that lets people use onion routing, and also lets non-onion-routed users leverage a VPN-like service through internal networks of connectors built on ILP. On top of that, we may be able to build the first incentivized Tor-like network: having money involved adds new benefits, because you no longer have the participation problem of Tor being run by volunteers.
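A minimal sketch of the onion-layering idea over a path of connectors, using Fernet symmetric encryption as a stand-in for the per-hop keys a real design would negotiate (the packet format and key handling are illustrative assumptions):

```python
from cryptography.fernet import Fernet

def build_onion(payload: bytes, hop_keys):
    """Wrap the payload in one encryption layer per hop, innermost layer
    for the last hop, so each connector learns only its predecessor and
    successor, never the full path or the end-to-end payment details."""
    packet = payload
    for key in reversed(hop_keys):       # last hop's layer goes on first
        packet = Fernet(key).encrypt(packet)
    return packet

def peel_layer(packet: bytes, key) -> bytes:
    """Each connector removes exactly one layer before forwarding."""
    return Fernet(key).decrypt(packet)

# Usage: three connectors, each holding one hop key.
keys = [Fernet.generate_key() for _ in range(3)]
onion = build_onion(b"payment instruction", keys)
for key in keys:                         # hops peel in path order
    onion = peel_layer(onion, key)
assert onion == b"payment instruction"
```

The incentive angle is that each connector can be paid per layer it forwards, which is the property volunteer-run Tor relays lack.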
Supporting Documentation: Final Report (pdf) Poster (pdf)
14: Secure Federated Analytics (GoogleDoc proposal: HERE)
Charles Lin and Sukrit Kalra
There has been much recent attention on using concepts from cryptography to make databases more secure. One direction in which much work remains is Secure Federated Analytics. In this setup, multiple parties each hold their own databases, and we would like to run analytics queries over the union ("federation") of their data. However, the parties may be heavily restricted in the amount of data they can share with one another, due to legal or operational concerns such as those governing medical data. Secure Federated Analytics is thus the challenge of executing these queries without leaking any intermediate protected data to any party.
Existing solutions are some combination of slow, restricted in expressiveness, or weak in threat model: DJoin only allows a restricted set of queries, while SMCQL can run general queries but performs slowly and relies on semi-honest participants plus one fully trusted third party. We hope to explore the design space and construct a protocol that improves on both performance and threat model. Evaluation will be done using query benchmarks against existing systems.
Supporting Documentation: Final Report (pdf) Poster (pdf)
15: Nomad - Hierarchical Computation Framework for IoT applications (GoogleDoc proposal: HERE)
Romil Bhardwaj and Alvin Ghouas
Modern IoT applications are computationally monolithic and built assuming a "flat" computing architecture, where processing and inference on data from edge devices happen exclusively in the cloud. With growing computational power on the edge, increasingly remote edge deployments, and data-privacy concerns, there is a need to push portions of the processing pipeline closer to the edge. We propose Nomad, a framework to partition these monolithic programs, optimally schedule the partitions across the heterogeneous cloud-edge compute hierarchy, and orchestrate data pipelines with minimal developer effort.
Supporting Documentation: Final Report (pdf) Poster (pdf)
16: DeepBM: A deep learning-based dynamic page replacement policy (GoogleDoc proposal: HERE)
Xinyun Chen
In database systems, buffer management, which caches pages that are frequently used or will be used shortly, is critical for the efficiency of page accesses: database pages must be in RAM for the DBMS to operate on them, so whenever a page must be brought into a full buffer, another page must be evicted. Various heuristic policies for selecting the right page to evict have been proposed, such as least recently used (LRU), most recently used (MRU), and the CLOCK algorithm. Existing database systems typically implement one generalized heuristic for all workloads, without taking the workload structure into consideration. This is reasonable, since manually developing heuristics for each individual workload is infeasible in practice. However, we will demonstrate that there can be a considerable hit-rate gap between a general-purpose page replacement policy and the optimal workload-specific one.
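For reference, the LRU baseline reduces to a few lines; DeepBM replaces the eviction choice in such a loop with a learned prediction (this sketch is our illustration, not the paper's code):

```python
from collections import OrderedDict

class LRUBuffer:
    """Minimal sketch of the LRU baseline: on a full buffer, evict the
    page whose last access is oldest. A learned policy like DeepBM would
    substitute a model's eviction prediction for popitem() below."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()           # page_id -> page, access-ordered

    def access(self, page_id, load_page):
        """`load_page` is an assumed callback fetching a page from disk."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # hit: mark most recently used
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)   # miss on full buffer: evict LRU
        self.pages[page_id] = load_page(page_id)
        return self.pages[page_id]
```

The hit-rate gap mentioned above arises because a fixed rule like this cannot exploit workload structure, e.g., scan patterns where LRU is provably the worst choice.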
In this work, we are the first to explore the potential of leveraging deep neural networks to learn a policy for buffer management. In particular, we propose DeepBM, a deep learning-based dynamic buffer management policy focused on the page replacement task. During the execution of a database workload, when the buffer becomes full, DeepBM predicts which page in the buffer to evict. Unlike existing heuristic-based algorithms, which apply a fixed heuristic throughout execution, DeepBM learns from past execution and dynamically adapts to the workload. We hope that our work suggests the benefits of bringing modern learning-based techniques into traditional database system design.
Supporting Documentation: Final Report (pdf) Poster (pdf)