Google Scholar page

rashmikv[at]eecs.berkeley[dot]edu
I am a postdoctoral researcher at EECS, UC Berkeley, working with Prof. Ion Stoica (AMPLab) and Prof. Kannan Ramchandran (BLISS). Prior to this, I received my Ph.D. from UC Berkeley in September 2016, advised by Prof. Kannan Ramchandran. My research interests lie in the theoretical and practical challenges that arise in the storage and analysis of big data, with a current focus on erasure coding for big-data systems. This broadly includes the areas of information and coding theory, and computer systems.

My Ph.D. research was supported by a Microsoft Research PhD fellowship, a Facebook Fellowship, and a Google Anita Borg Memorial Scholarship. Thanks to Microsoft, Facebook, and Google!



Research

Today's large-scale distributed storage systems comprise thousands of nodes storing hundreds of petabytes of data. In these systems, failures are common, making it essential to store the data redundantly to ensure reliability and availability. The most common way of adding redundancy is replication. However, replication is highly inefficient in terms of storage capacity utilization. With the rapid increase in the volume of data to be stored, replication is quickly becoming unsustainable, and many distributed storage systems are now turning to erasure coding, which offers a storage-efficient alternative. While traditional codes are optimal in terms of storage capacity utilization, they perform poorly in terms of other system resources such as network bandwidth and device I/O. Furthermore, the use of codes has largely been limited to achieving space-efficient fault tolerance in disk-based storage systems, that is, for storing "cold" (less-frequently accessed) data.
My research deals with constructing new erasure codes (i.e., designing new encoding and decoding algorithms) that provably overcome the limitations of classical erasure codes for use in large-scale distributed storage systems, and with designing and building systems that employ this new generation of storage codes in novel ways. My research also explores new avenues for employing erasure coding in big-data systems, in particular for "hot" (frequently accessed) data, beyond the realm of disk-based storage systems and for goals beyond fault tolerance.

Here is a brief description of my projects.

Hitchhiker:

An erasure-coded storage system that reduces both network traffic and disk I/O during recovery by around 25-45% compared to Reed-Solomon-based systems, while requiring no additional storage, providing the same fault tolerance, and retaining arbitrary flexibility in the choice of parameters. We have implemented Hitchhiker on top of Facebook's Hadoop Distributed File System (HDFS) and evaluated various metrics on the data-warehouse cluster in production at Facebook with real-time traffic and workloads. Hitchhiker is being incorporated into Apache Hadoop. The underlying erasure code employed in Hitchhiker is designed using the Piggybacking framework (see below). You can find more details about Hitchhiker in our ACM SIGCOMM 2014 paper.

Reference:

Piggybacking Code Design Framework:

Piggybacking is a framework for designing practical distributed storage codes that are efficient in terms of device I/O and network bandwidth without compromising on storage efficiency. The Piggybacking framework enables the construction of the best known codes for multiple settings. The basic idea behind this framework is to take multiple instances of existing codes and add carefully designed functions of the data of one instance to the other. You can find more details about Piggybacking in our IEEE ISIT 2013 paper and the upcoming IEEE Transactions on Information Theory paper.
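A toy sketch of this idea is given below, assuming a hypothetical systematic (n = 4, k = 2) code with two substripes and plain integer arithmetic in place of a finite field. It is meant only to show the mechanism (encode two instances of a base code, add a carefully chosen "piggyback" to a parity of the second instance, and strip it off during repair), not to reproduce an actual construction from the papers. In this toy example, repairing node 1 downloads 3 symbols instead of the 4 needed by a conventional Reed-Solomon repair.

```python
# Toy illustration of the Piggybacking idea (not an actual construction from
# the papers): two instances of a systematic (n=4, k=2) code, with a function
# of instance-1 data ("piggyback") added to a parity of instance 2.
# Plain integer arithmetic is used for readability; a real code works over a
# finite field.

def encode(a1, a2, b1, b2):
    """Each of the 4 nodes stores one symbol from each instance."""
    return [
        (a1, b1),                         # node 1 (systematic)
        (a2, b2),                         # node 2 (systematic)
        (a1 + a2, b1 + b2),               # node 3 (parity)
        (a1 + 2 * a2, b1 + 2 * b2 + a1),  # node 4 (parity, piggybacked with a1)
    ]

def repair_node1(nodes):
    """Repair node 1 by reading only 3 symbols (vs. 4 for plain RS repair)."""
    b2 = nodes[1][1]                 # from node 2
    b1 = nodes[2][1] - b2            # node 3's second symbol is b1 + b2
    a1 = nodes[3][1] - b1 - 2 * b2   # strip the known part, leaving the piggyback
    return (a1, b1)

nodes = encode(3, 7, 11, 5)
assert repair_node1(nodes) == (3, 11)
```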

Reference:

Regenerating Code Constructions:

'Regenerating Codes' are a class of erasure codes for distributed storage systems that are optimal with respect to both storage space and network-bandwidth utilization. Regenerating codes come in two flavors: (i) Minimum Storage Regenerating (MSR) codes, which minimize the repair bandwidth while storing the minimum possible amount of data per node, and (ii) Minimum Bandwidth Regenerating (MBR) codes, which further reduce the repair bandwidth by allowing a slightly higher per-node storage. The two operating points are written out explicitly below, followed by descriptions of some of our regenerating code constructions.
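For reference, the two operating points can be stated explicitly in the standard notation of the regenerating-codes literature (these symbols are not defined elsewhere on this page): a file of size B is stored across n nodes such that any k of them suffice for reconstruction, each node stores α symbols, and a failed node downloads β symbols from each of d surviving helper nodes.

```latex
% Standard expressions for the two extreme operating points
% (per-node storage \alpha, per-helper download \beta):
\[
  (\alpha_{\mathrm{MSR}},\, \beta_{\mathrm{MSR}})
    = \left( \frac{B}{k},\ \frac{B}{k(d-k+1)} \right),
  \qquad
  (\alpha_{\mathrm{MBR}},\, \beta_{\mathrm{MBR}})
    = \left( \frac{2Bd}{k(2d-k+1)},\ \frac{2B}{k(2d-k+1)} \right).
\]
```

In both cases the total download for a single repair is dβ; at the MBR point this equals the amount stored per node.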

Product-Matrix Codes

Product-Matrix codes are the most general class of regenerating code constructions available in the literature, spanning both the MBR and MSR points. These codes are the first and only explicit constructions of regenerating codes that allow the number of nodes in the system to scale independently of the other system parameters. This attribute allows one to vary the number of nodes in the system on the fly, which is very useful, for example, for provisioning resources based on demand. Further, these codes can be implemented using existing Reed-Solomon encoders and decoders as building blocks.

Under these codes, each node is associated with an encoding vector. The source data is assembled in the form of a matrix, and each node stores the projection of this matrix along its encoding vector. To recover the data stored in a failed node, each helper node passes the projection of its stored data along the encoding vector of the failed node.
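A minimal sketch of this repair mechanism is given below, using the MBR flavor with real-valued arithmetic and a randomly filled symmetric message matrix purely for illustration; the actual construction operates over a finite field, and the message matrix has additional structure that determines how many message symbols it carries and enables data reconstruction from any k nodes. The variable names are assumptions made for the sketch.

```python
# Minimal sketch of product-matrix (MBR-style) repair over the reals.
# A real construction uses finite-field arithmetic and a structured symmetric
# message matrix; here M is just a random symmetric matrix for illustration.
import numpy as np

n, d = 6, 4                      # n nodes; repair contacts d helpers
rng = np.random.default_rng(0)

A = rng.standard_normal((d, d))
M = A + A.T                      # symmetric "message" matrix (d x d)

# Encoding vectors: Vandermonde-style, so any d of them are linearly independent.
psi = np.vander(np.arange(1, n + 1, dtype=float), d, increasing=True)  # n x d

store = psi @ M                  # node i stores row i: psi_i^T M  (d symbols)

# Repair node f: each helper projects its stored data onto psi_f and sends a scalar.
f = 0
helpers = [1, 2, 3, 4]           # any d surviving nodes
received = np.array([store[h] @ psi[f] for h in helpers])   # psi_h^T M psi_f

# Solve Psi_helpers (M psi_f) = received; symmetry of M then gives back
# exactly the content of the failed node, psi_f^T M.
M_psi_f = np.linalg.solve(psi[helpers], received)
assert np.allclose(M_psi_f, store[f])
```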

We have also optimized these codes with respect to other aspects of distributed storage systems such as security, error correction, scaling via ratelessness, etc.
References:
Repair-by-Transfer Codes

These are MBR codes that perform repair-by-transfer: the data stored in any node can be recovered by merely transferring data from other nodes, without requiring any computation. This minimizes disk I/O, since each node reads only the data that it transfers, and it also permits the use of "dumb" nodes. The animated video above presents the working of these codes.
References:


Interference Alignment based MISER Codes

Interference Alignment is a concept that was proposed in the field of wireless communication to efficiently handle multiple interfering communications. We show that this concept necessarily arises during repair in MSR codes. Using these insights, we construct codes operating at the MSR point.
References:
Twin Codes

This is a simple yet powerful framework that allows one to use any erasure code and still remain bandwidth-efficient during repair. Under this framework, nodes are partitioned into two types, and the data is encoded using two (possibly different) codes, in such a manner that the problem of repairing nodes of one type reduces to erasure decoding of the code employed by the other type. One can then choose the constituent codes based on the properties required; for instance, employing LDPC codes allows for low-complexity decoding algorithms.
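Below is a minimal sketch of this repair property, over the reals purely for illustration: an actual deployment works over a finite field, and the two constituent codes can be any erasure codes (e.g., Reed-Solomon or LDPC). The arrangement of the message as a k x k matrix and the variable names are assumptions made for the sketch, not a faithful reproduction of the construction in the paper.

```python
# Minimal sketch of the Twin-code repair idea over the reals (illustration only;
# an actual system works over a finite field).
import numpy as np

k, n = 3, 5
rng = np.random.default_rng(1)

# Generator matrices (k x n) of the two constituent codes; Vandermonde-style so
# that any k columns are invertible (MDS over the reals for distinct points).
G0 = np.vander(np.arange(1, n + 1, dtype=float), k, increasing=True).T
G1 = np.vander(np.arange(n + 1, 2 * n + 1, dtype=float), k, increasing=True).T

M = rng.standard_normal((k, k))          # message arranged as a k x k matrix

# Type-0 node i stores M^T g0_i; Type-1 node i stores M g1_i (k symbols each).
type0 = [M.T @ G0[:, i] for i in range(n)]
type1 = [M @ G1[:, i] for i in range(n)]

# Repair a failed Type-1 node f: each of k Type-0 helpers sends one inner
# product, and the replacement node simply erasure-decodes code C0 (the code
# used by the other type).
f = 2
helpers = [0, 1, 3]                       # any k surviving Type-0 nodes
received = np.array([type0[h] @ G1[:, f] for h in helpers])   # g0_h^T (M g1_f)
recovered = np.linalg.solve(G0[:, helpers].T, received)        # decode C0
assert np.allclose(recovered, type1[f])
```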
References:
Non-achievability Results

It has been shown that there is a fundamental tradeoff between the two resources, storage space and network bandwidth, with MSR and MBR being the two extreme points of this tradeoff. While the codes described above operate at these extreme points, we have also shown that there are no codes that can achieve the interior points of this tradeoff.
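For reference, the tradeoff in question is the cut-set bound from the regenerating-codes literature, in the same notation as above (file size B, per-node storage α, download of β symbols from each of d helpers, any k nodes sufficient for reconstruction):

```latex
% Storage-bandwidth tradeoff (cut-set bound): any regenerating code with the
% parameters above must satisfy
\[
  B \;\le\; \sum_{i=0}^{k-1} \min\{\alpha,\ (d-i)\beta\},
\]
% with the MSR and MBR points sitting at the two extremes of the curve traced
% out by this bound.
```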
References:



Publications

(On Google Scholar)

* indicates equal contribution

Journal Papers


Conference Papers


Workshop Papers


Journal Preprints


  • "Information-theoretically Secure Erasure Codes for Distributed Storage"
    K. V. Rashmi*, Nihar B. Shah*, Kannan Ramchandran, and P. Vijay Kumar
    Under submission to IEEE Transactions on Information Theory.




CV

Please find my CV here.




News

• 'EC-Cache' paper accepted at USENIX OSDI 2016.
• Received Eli Jury Award 2016 for best thesis in the area of Systems, Communications, Control, or Signal Processing at EECS, UC Berkeley.
• New paper at IEEE ISIT 2016 on sparsifying distributed storage codes for fast encoding.
• Gave an invited ITA Graduation Day talk at ITA 2016.
• Gave an invited talk at Allerton 2015.
• Awarded Google Anita Borg Memorial Scholarship 2015. Thanks, Google!
• FAST 2015 paper chosen as the best paper of USENIX FAST 2015 by StorageMojo.
• Distributed secret sharing paper accepted to the IEEE Journal of Selected Topics in Signal Processing.
• New paper at USENIX FAST 2015 on reducing I/O cost in distributed storage codes.
• New paper at ACM SIGCOMM 2014 on Hitchhiker, an efficient erasure-coding system for data centers, in collaboration with Facebook.
• Awarded Microsoft Research PhD Fellowship 2013-15. Thanks, Microsoft!
• New paper at IEEE ISIT 2013 on Piggybacking framework for constructing storage codes.
• Awarded Facebook Fellowship 2012-13. Thanks, Facebook!
• Product-Matrix codes paper awarded IEEE Data Storage Best Paper and Best Student Paper Awards for 2011/2012.