Google Scholar page


I am on the academic job market this year.
I am a postdoctoral researcher at EECS, UC Berkeley working with Prof. Ion Stoica (AMPLab) and Prof. Kannan Ramchandran (BLISS). Prior to this, I received my Ph.D. from UC Berkeley in September 2016, advised by Prof. Kannan Ramchandran. My research interests lie in the theoretical and practical challenges that arise in storage and analysis of big data. This broadly includes the areas of information and coding theory, computer systems, and machine learning.

My Ph.D. research was supported by a Microsoft Research PhD fellowship, a Facebook Fellowship, and a Google Anita Borg Memorial Scholarship. Thanks to Microsoft, Facebook, and Google!


Today's large scale distributed storage systems comprise of thousands of nodes, storing hundreds of petabytes of data. In these systems, failures are common, and this makes it essential to store the data in a redundant fashion to ensure reliability and availability. The most common way of adding redundancy is replication. However, replication is highly inefficient in utilizing storage capacity. With the rapid increase in the volume of data needed to be stored, replication is quickly becoming unsustainable, and many distributed storage systems are now turning to erasure coding which offers a storage-efficient alternative. While classical erasure codes are optimal in terms of storage capacity utilization, they come with many drawbacks when applied to the distributed storage setting. For instance, they result in siginificant increase in the usage of system resources such as network bandwidth and device I/O. Furthermore, in big-data systems, the usage of codes has largely been limited to achieving space-efficient fault tolerance in disk-based storage systems, that is, for storing "cold" (less-frequently accessed) data.
My PhD research dealt with constructing new erasure codes (i.e., designing new encoding and decoding algorithms) that provably overcome the limitations of classical erasure codes for application into large-scale distributed storage systems, and designing and building systems that employ these new generation of storage codes in novel ways. My research work also explores new avenues for employing erasure coding in big-data systems, in particular for "hot" (frequently accessed) data, beyond the realm of disk-based storage systems and for goals beyond fault tolerance.

Here is a brief description of few of my projects.


In-memory object caching in data-intensive clusters routinely face the challenges of popularity skew, background load imbalance, and server failures, which result in severe load imbalance across servers and degraded I/O performance. Selective replication is a commonly used technique to tackle these challenges, where the number of cached replicas of an object is proportional to its popularity. EC-Cache is a load-balanced, low latency cluster cache that uses online erasure coding to overcome the limitations of selective replication. As compared to selective replication, EC-Cache improves load balancing by more than 3x and reduces the median and tail read latencies by more than 2x for typical parameters. The benefits offered by EC-Cache are further amplified in the presence of background network load imbalance and server failures.



An erasure-coded storage system that reduces both network traffic and device I/O by around 25-45% during recovery with no additional storage, the same fault tolerance, and arbitrary flexibility in the choice of parameters, as compared to Reed-Solomon based systems. We have implemented Hitchhiker on top of Facebook's Hadoop Distributed File System (HDFS) and evaluated various metrics on the data-warehouse cluster in production at Facebook with real-time traffic and workloads. The underlying erasure code employed in Hitchhiker is desgined by making use of our Piggybacking framework (see below). Hitchhiker has beeb incorporated into Apache Hadoop.


Piggybacking Code Design Framework:

Piggybacking is a framework for designing practical distributed storage codes that are efficient in terms of device I/O and network-bandwidth while not compromising on storage efficiency. Piggybacking framework enables construction of best known codes for multiple settings. The basic idea behind this framework is to take multiple instances of existing codes and add carefully designed functions of the data of one instance to the other. You can find more details about Piggybacking in our IEEE ISIT 2013 paper and the upcoming IEEE Transactions on Information Theory paper.


Regenerating Code Constuctions:

'Regenerating Codes' are a class of erasure codes for distributed storage systems which are optimal with respect to both storage space and network-bandwidth utilization. Regenerating codes come in two flavors: (i) Minimum Storage (MSR): which minimize bandwidth usage while storing an optimal amount of data, (ii) Minimum Bandwidth (MBR): which further minimize bandwidth usage by allowing for a slightly higher storage space. Here is a description of some of our regenerating code constructions.

Product-Matrix Codes

Product-Matrix codes are the most general class of regenerating code constructions available in the literature, spanning both MBR and MSR. These codes are the first and the only explicit construction of regenerating codes that allow the number of nodes in the system to scale irrespective of other system parameters. This attribute allows one to vary the number of nodes in the system on the fly, which is very useful, for example, in provisioning resources based on the demand. Further, these codes can be implemented using Reed-Solomon encoders and decoders (which are already existing) as building blocks.

Under these codes, each node is associated with an encoding vector. The source data is assembled in the form of a matrix, and each node stores the projection of this matrix along its encoding vector. To recover data stored in a node, all helper nodes pass a projection of the data stored in them along the direction of the encoding vector of the failed node.

We have also optimized these codes with respect to other aspects of distributed storage systems such as security, error correction, scaling via ratelessness, etc.
Repair-by-Transfer Codes

These are MBR codes which perform repair-by-transfer: the data stored in any node can be recovered by mere transfer of data from other nodes, without requiring any computation. This minimizes the disk IO since each node reads only the data that it transfers, and also permits the use of "dumb" nodes. The animated video above presents the working of these codes.

Interference Alignment based MISER Codes

Interference Alignment is a concept that was proposed in the field of wireless communication to efficiently handle multiple interfering communications. We show that this concept necessarily arises during repair in MSR codes. Using these insights we construct codes operating at the MSR point.
Twin Codes

This is a simple, yet powerful, framework which allows one to use any erasure code and still remain bandwidth efficient. Under this framework, nodes are partitioned into two types, and the data is encoded using two (possibly different) codes. The data is encoded in a manner such that the problem of repairing nodes of one type is reduced to that of erasure-decoding of the code employed by the other type. Here, one can choose the constituent codes based on the properties required, for instance, employing LDPC codes allows for low-complexity decoding algorithms.
Non-achievability Results

It has been shown that there is a fundamental tradeoff between the two resources: storage space and network bandwidth, and MSR and MBR are the two extreme points of this tradeoff. While the codes described above operate at these extreme points, we have also shown that there are no codes which can achieve the interior points on this tradeoff.


(On Google Scholar)

* indicates equal contribution

Journal Papers

Conference Papers

Workshop Papers

Journal Preprints

  • "Information-theoretically Secure Erasure Codes for Distributed Storage"
    K. V. Rashmi*, Nihar B. Shah*, Kannan Ramchandran, and P. Vijay Kumar
    Under submission to IEEE Transactions on Information Theory.


Please find my CV here.


• Talk at Stanford Information Theory Forum.
• 'EC-Cache' accepted at USENIX OSDI 2016.
• Received Eli Jury Award 2016 for best thesis in the area of Systems, Communications, Control, or Signal Processing at EECS, UC Berkeley.
• 'Piggybacking framework' accepted to IEEE Transactions on Information Theory.
• 'Sparsifying storage codes for fast encoding' accepted at IEEE ISIT 2016.
• Invited talk at ITA Graduation Day 2016.
• Invited talk at Allerton 2015.
• Awarded Google Anita Borg Memorial Scholarsihp 2015. Thanks, Google!
• 'Distributed secret sharing' accepted to IEEE Journal of Selected Topics in Signal Processing.
• 'Reducing I/O cost in distributed storage codes' at USENIX FAST 2015. Chosen as the best paper of USENIX FAST 2015 by StorageMojo.