Google Scholar page
I am a Ph.D. candidate at UC Berkeley in the Electrical Engineering and Computer Science department. My research interests lie in the theoretical and practical challenges that arise in storage and analysis of big data. This broadly includes the areas of information and coding theory, and computer systems. I am advised by Prof. Kannan Ramchandran, and I also closely collaborate with Prof. Ion Stoica.

My research has been supported by a Microsoft Research PhD fellowship, a Facebook Fellowship, and a Google Anita Borg Memorial Scholarship. Thanks to Microsoft, Facebook, and Google!

Currently, I am researching new encoding mechanisms for large-scale distributed storage systems that significantly improve their I/O, network and storage efficiency as compared to the state-of-the-art techniques. I am also working on building practical systems that employ these new coding techniques in novel ways.


Today's large scale distributed storage systems comprise of thousands of nodes, storing hundreds of petabytes of data. In these systems, failures are common, and this makes it essential to store the data in a redundant fashion to ensure reliability and availability. The most common way of adding redundancy is replication. However, replication is highly inefficient in terms of storage capacity utilization, and hence many distributed storage systems are now turning to Reed-Solomon (erasure) codes. While these codes are optimal in terms of storage capacity utilization, they perform poorly in terms of other system resources such as disk I/O and network bandwidth. During recovery of a failed or otherwise unavailable unit of data, traditional techniques require a large amount of data to be read and transferred across the network. The above video provides a fun introduction to this issue.
My research deals with constructing new erasure codes (i.e., designing new encoding and decoding algorithms) that overcome the limitations of classical erasure codes for application into large scale distributed storage systems, and building storage systems that employ these new generation of storage codes in novel ways.

Here is a brief description of my past projects.


A new erasure-coded storage system that reduces both network traffic and disk I/O by around 25-45% during recovery with no additional storage, the same fault tolerance, and arbitrary flexibility in the choice of parameters, as compared to Reed-Solomon based systems. We have implemented Hitchhiker on top of the Hadoop Distributed File System (HDFS) and evaluated various metrics on the data-warehouse cluster in production at Facebook with real-time traffic and workloads. The underlying Hitchhiker's erasure code is desgined by making use of the Piggybacking framework (see project below). You can find more details about Hitchhiker in our ACM SIGCOMM 2014 paper.


Piggybacking Code Design Framework:

Piggybacking is a framework for designing distributed storage codes that are efficient in data I/O and network bandwidth consumption during node-repair. The basic idea behind this framework is to take multiple instances of existing codes and add carefully designed functions of the data of one instance to the other. You can find more details about Piggybacking in our IEEE ISIT 2013 paper.


Regenerating Code Constuctions:

'Regenerating Codes' are a new class of erasure codes for distributed storage systems which are optimal with respect to both storage space and network bandwidth utilization. Regenerating codes come in two flavors: (i) Minimum Storage (MSR): which minimize bandwidth usage while storing an optimal amount of data, (ii) Minimum Bandwidth (MBR): which further minimize bandwidth usage by allowing for a slightly higher storage space. Here is a description of some of our regenerating code constructions.

Product-Matrix Codes

Product-Matrix codes are the most general class of regenerating code constructions available in the literature, spanning both MBR and MSR. These codes are the first and the only explicit construction of regenerating codes that allow the number of nodes in the system to scale irrespective of other system parameters. This attribute allows one to vary the number of nodes in the system on the fly, which is very useful, for example, in provisioning resources based on the demand. Further, these codes can be implemented using Reed-Solomon encoders and decoders (which are already existing) as building blocks.

Under these codes, each node is associated with an encoding vector. The source data is assembled in the form of a matrix, and each node stores the projection of this matrix along its encoding vector. To recover data stored in a node, all helper nodes pass a projection of the data stored in them along the direction of the encoding vector of the failed node.

We have also optimized these codes with respect to other aspects of distributed storage systems such as security, error correction, scaling via ratelessness, etc.
Repair-by-Transfer Codes

These are MBR codes which perform repair-by-transfer: the data stored in any node can be recovered by mere transfer of data from other nodes, without requiring any computation. This minimizes the disk IO since each node reads only the data that it transfers, and also permits the use of "dumb" nodes. The animated video above presents the working of these codes.

Interference Alignment based MISER Codes

Interference Alignment is a concept that was proposed in the field of wireless communication to efficiently handle multiple interfering communications. We show that this concept necessarily arises during repair in MSR codes. Using these insights we construct codes operating at the MSR point.
Twin Codes

This is a simple, yet powerful, framework which allows one to use any erasure code and still remain bandwidth efficient. Under this framework, nodes are partitioned into two types, and the data is encoded using two (possibly different) codes. The data is encoded in a manner such that the problem of repairing nodes of one type is reduced to that of erasure-decoding of the code employed by the other type. Here, one can choose the constituent codes based on the properties required, for instance, employing LDPC codes allows for low-complexity decoding algorithms.
Non-achievability Results

It has been shown that there is a fundamental tradeoff between the two resources: storage space and network bandwidth, and MSR and MBR are the two extreme points of this tradeoff. While the codes described above operate at these extreme points, we have also shown that there are no codes which can achieve the interior points on this tradeoff.


(On Google Scholar)

* indicates equal contribution

Journal Preprints

Journal Papers

Conference Papers

Workshop Papers

Tutorial Articles

  • "Network Coding"
    K. V. Rashmi*, Nihar B. Shah*, and P. Vijay Kumar, in Resonance , vol. 15, no. 7, pp. 604-621.
    (Resonance is a journal of science education published by the Indian Academy of Sciences)


Please find my CV here.


• Received Eli Jury Award 2016 for best thesis in the area of Systems, Communications, Control, or Signal Processing at EECS, UC Berkeley.
• New paper at IEEE ISIT 2016 on sparsifying distributed storage codes for fast encoding.
• Gave an invited ITA Graduation Day talk at ITA 2016.
• Gave an invited talk at Allerton 2015.
• Awarded Google Anita Borg Memorial Scholarsihp 2015. Thanks, Google!
• Our FAST 2015 paper chosen as the best paper of USENIX FAST 2015 by StorageMojo.
• Our distributed secret sharing paper accepted at IEEE Journal of Selected Topics in Signal Processing.
• New paper at USENIX FAST 2015 on reducing I/O cost in distributed storage codes.
• New paper at ACM SIGCOMM 2014 on Hitchhiker, an effcieint erasure-coding system for data centers in collaboration with Facebook.
• Awarded Microsoft Research PhD Fellowship 2013-15. Thanks, Microsoft!
• New paper at IEEE ISIT 2013 on Piggybacking framework for constructing storage codes.
• Awarded Facebook Fellowship 2012-13. Thanks, Facebook!
• Our Product-Matrix codes paper awarded IEEE Data Storage Best Paper and Best Student Paper Awards for 2011/2012.