Research Projects

Learning from Shuffled Data

 

The classical prediction problem in machine learning involves learning a prediction function from samples of data and responses. In some applications, however, the correspondence between data and responses is missing, either because it was intentionally removed (e.g. to preserve privacy) or because it was never recorded during collection (e.g. point cloud data in graphics, or data from mobile sensors). It therefore becomes necessary to analyze learning procedures under this additional “permutation” noise.

We take preliminary steps in this direction by analyzing perhaps the simplest prediction problem, linear regression, with shuffled data. We have results for both the model selection problem (inferring the unknown permutation) and the prediction problem (analyzing the prediction error).
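
As a concrete illustration of the observation model (a minimal simulation sketch, not the estimators analysed in the paper; all dimensions and the noise level are illustrative choices):

    import numpy as np

    # Minimal simulation of linear regression with shuffled data:
    # the responses y are a permuted version of X @ beta plus noise,
    # and the permutation is unknown to the learner.
    rng = np.random.default_rng(0)
    n, d = 100, 5

    X = rng.standard_normal((n, d))      # design matrix
    beta = rng.standard_normal(d)        # unknown regression vector
    pi = rng.permutation(n)              # unknown permutation of the responses
    noise = 0.1 * rng.standard_normal(n)

    y = (X @ beta)[pi] + noise           # responses observed in shuffled order

    # Naive baseline that ignores the shuffling: ordinary least squares.
    beta_naive = np.linalg.lstsq(X, y, rcond=None)[0]
    print("estimation error when ignoring the permutation:",
          np.linalg.norm(beta_naive - beta))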

Ashwin Pananjady, Martin J. Wainwright, Thomas A. Courtade, “Linear Regression with an Unknown Permutation: Statistical and Computational Limits”, Allerton 2016.

Compressing Sparse Sources Under Local Decodability Constraints

 

Standard information-theoretic schemes for source coding compress data by exploiting redundancy. The data is then “decompressed” by reading all the bits of the compressed representation and mapping this sequence back to the original uncompressed sequence. What if, however, we are not interested in the uncompressed data in its entirety, but only in a few bits of it? This is often the case in DNA compression, where diagnostic applications may care about only a few bases that indicate the presence or absence of a condition. Is it still reasonable to expect such an application to read all of the compressed data? Our contributions to this problem of locally decodable source coding include tight bounds on variable-block-length schemes that compress r-sparse sequences and recover any single source bit by probing at most d bits of the codeword. We are now exploring the analogous problem for the Bernoulli source, and extending our results to average-case decoding depth.
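
To make the trade-off concrete, here is a toy positional code for sparse binary sequences (purely illustrative, not the scheme from the paper): it compresses well, but answering even a single-bit query requires scanning essentially the whole codeword, which is precisely the behaviour that a locally decodable scheme with probe budget d avoids.

    import math

    def encode_sparse(x):
        """Toy positional code: store only the indices of the ones in a
        sparse binary sequence, each as a fixed-width binary string."""
        n = len(x)
        width = max(1, math.ceil(math.log2(n)))
        positions = [i for i, bit in enumerate(x) if bit == 1]
        return "".join(format(i, f"0{width}b") for i in positions), width

    def decode_bit(codeword, width, i):
        """Recover source bit i. This scans the entire codeword, i.e. it
        probes every stored bit -- the inefficiency that locally decodable
        source codes are designed to avoid."""
        stored = {int(codeword[j:j + width], 2)
                  for j in range(0, len(codeword), width)}
        return 1 if i in stored else 0

    x = [0, 0, 1, 0, 0, 0, 1, 0]
    codeword, width = encode_sparse(x)
    assert all(decode_bit(codeword, width, i) == x[i] for i in range(len(x)))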

Ashwin Pananjady, Thomas A. Courtade, “Compressing Sparse Sequences under Local Decodability Constraints”, IEEE International Symposium on Information Theory (ISIT) 2015.

On Variants of the Disjoint Set Cover Problem

 

The maximum disjoint set cover problem, also known as the cover decomposition problem, is a fundamental combinatorial optimization problem with the following setting: given a universe of elements and a collection of subsets of that universe, partition the subsets into as many sub-collections as possible such that each sub-collection forms a set cover of the universe (i.e., the union of its subsets equals the universe). Applications of the problem include satisfying fixed user demands during crowd-sourcing, maximizing coverage lifetime in sensor networks, and simultaneous batch-processing in supply chains. We study two aspects of this problem. The first concerns online algorithms: subsets are revealed one at a time by an adversary, and the algorithm must assign each one irrevocably to a sub-collection so as to maximize the number of disjoint set covers formed relative to the optimal offline algorithm, a metric known as the competitive ratio. The applications, again, are natural online extensions of those stated above. Key technical contributions so far include a lower bound on the competitive ratio of every online algorithm, and an online algorithm whose performance is comparable to that lower bound for a reasonable number of targets (elements).
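
The sketch below shows one natural greedy baseline for this online setting (an illustrative heuristic under the stated model, not the algorithm from the paper): each arriving subset is irrevocably assigned to the partially built sub-collection whose uncovered elements it helps the most.

    def online_greedy_dsc(universe, stream):
        """Greedy online heuristic for disjoint set cover: assign each
        arriving subset to the incomplete sub-collection for which it covers
        the most new elements; open a new sub-collection if it helps nowhere.
        Returns the number of complete set covers formed."""
        universe = set(universe)
        partitions = []                      # covered elements per sub-collection
        for subset in stream:
            subset = set(subset)
            best, best_gain = None, 0
            for covered in partitions:
                gain = len(subset - covered)
                if covered != universe and gain > best_gain:
                    best, best_gain = covered, gain
            if best is None:
                partitions.append(subset & universe)
            else:
                best |= subset               # irrevocable assignment
        return sum(1 for covered in partitions if covered == universe)

    # Example: universe {1, 2, 3}, subsets arriving one at a time.
    print(online_greedy_dsc({1, 2, 3}, [{1, 2}, {3}, {1}, {2, 3}, {1, 2, 3}]))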

Another angle we have been exploring involves a particular integer programming formulation of the disjoint set cover problem, whose LP relaxation corresponds exactly to the maximum coverage lifetime problem in sensor networks. This work is motivated by a conjecture that the integrality gap of this packing LP is 2; its practical ramifications include a theoretical justification of the widely used disjoint-set-cover approach to coverage lifetime problems, and an efficient way of constructing large packing LPs with constant integrality gaps. Our technical contribution so far is a counterexample to the conjecture, constructed using projective geometries, which exhibits an integrality gap of 7/3.
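
For concreteness, one standard way to write the exponential packing formulation referred to above is the following (the notation here is mine, not taken from the thesis). Let \mathcal{C} denote the family of all sub-collections of the given subsets whose union is the universe, and introduce one variable x_C per cover C:

    \[
    \max \sum_{C \in \mathcal{C}} x_C
    \quad \text{subject to} \quad
    \sum_{C \in \mathcal{C} \,:\, S \in C} x_C \le 1 \;\; \text{for every given subset } S,
    \qquad x_C \ge 0 \;\; \text{for all } C \in \mathcal{C}.
    \]

Restricting each x_C to {0, 1} counts disjoint set covers, while the LP relaxation lets each cover be operated for a fractional amount of time, which is the maximum coverage lifetime interpretation; the integrality gap thus measures the loss incurred by insisting on disjoint covers.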

Ashwin Pananjady, Vivek Kumar Bagaria, Rahul Vaze, “The Online Disjoint Set Cover Problem and its Applications”, IEEE Conference on Computer Communications (INFOCOM) 2015.

Ashwin Pananjady, Andrew Thangaraj, “On the Integrality Gap of an Exponential Packing LP Formulation of the Cover Decomposition Problem”, part of my BTech Senior Thesis, IIT Madras.

Maximizing Coverage Lifetime of Wireless Sensor Networks

 

The maximum lifetime coverage problem is a widely studied NP-hard problem in wireless sensor networks: given a universe of targets and a collection of battery-limited sensors that monitor those targets, the objective is to design a schedule of operation for the sensors that maximizes the total time for which all targets are monitored. While efficient approximation algorithms with provable guarantees are known for this problem, its hardness of approximation by polynomial-time algorithms had remained open. In this body of work, we first showed a tight lower bound on the approximation ratio achievable by any polynomial-time algorithm, subject to certain complexity-theoretic assumptions.

Moreover, a common approach to the maximum lifetime coverage problem uses disjoint set covers, with each set cover operated independently of the others in its own time slot. However, existing algorithms for finding disjoint set covers have all been heuristic. We provide an approximation algorithm with provably optimal guarantees for the general problem of finding disjoint set covers, which also extends to the maximum lifetime coverage problem with the same approximation ratio. Other key contributions include a polynomial-time optimal algorithm for the maximum lifetime coverage problem with sensors having one-dimensional coverage, and a polynomial-time approximation scheme for sensors with circular coverage.
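
As a schematic illustration of the disjoint-set-cover approach described above (not the paper's algorithm), once the sensors have been partitioned into disjoint covers, each cover is kept active in its own time slot for as long as its most depleted sensor allows:

    def lifetime_from_disjoint_covers(covers, battery):
        """Operate each disjoint cover in its own time slot, for as long as
        its weakest battery lasts. `covers` is a list of lists of sensor ids,
        `battery` maps sensor id -> remaining battery (in time units)."""
        return sum(min(battery[s] for s in cover) for cover in covers)

    # Two disjoint covers over sensors {a, b} and {c}; unit batteries give lifetime 2.
    print(lifetime_from_disjoint_covers([["a", "b"], ["c"]], {"a": 1, "b": 1, "c": 1}))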

Vivek Kumar Bagaria, Ashwin Pananjady, Rahul Vaze, “Optimally Approximating the Coverage Lifetime of Wireless Sensor Networks”, IEEE/ACM Transactions on Networking, accepted.

Approximating Adversarial Vertex Deletion in Graphs

 

Consider an election network in which vertices represent people or organizations, and edges represent their alliances. The problem of adversarially removing people from the election to ensure that a particular candidate loses or wins maps to the following problem: given a graph and a distinguished vertex, find the minimum number of vertices that must be deleted so that the distinguished vertex becomes the minimum or maximum degree vertex in the remaining graph. This NP-hard problem also has applications in measuring influence in protein networks and in analysing terrorist networks. Our key contributions are a lower bound on the approximation ratio of all polynomial-time algorithms for this problem, and a polynomial-time algorithm with a matching approximation ratio for a sub-class of instances. We also analysed the problem for bipartite graphs and regular graphs.
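
For small instances, the objective can be made precise with an exhaustive search (an exponential-time illustration only, not the approximation algorithms studied in this work):

    from itertools import combinations

    def min_deletions_to_max_degree(adj, v):
        """Brute force over deletion sets in increasing size: find the fewest
        vertices (other than v) whose removal leaves v with maximum degree in
        the remaining graph. `adj` maps each vertex to its set of neighbours."""
        others = [u for u in adj if u != v]
        for k in range(len(others) + 1):
            for deleted in combinations(others, k):
                remaining = set(adj) - set(deleted)
                degree = {u: len(adj[u] & remaining) for u in remaining}
                if all(degree[v] >= degree[u] for u in remaining):
                    return k, set(deleted)

    # Star with centre c and a distinguished leaf v: deleting c makes v maximum degree.
    adj = {"c": {"v", "x", "y"}, "v": {"c"}, "x": {"c"}, "y": {"c"}}
    print(min_deletions_to_max_degree(adj, "v"))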

Sounaka Mishra, Ashwin Pananjady, N Safina Devi, “On the Complexity of Making a Distinguished Vertex Minimum or Maximum Degree by Vertex Deletion”, Elsevier Journal of Discrete Algorithms, July 2015.

File Exchange Between Selfish Users

 

Peer-to-peer networks are built on the premise that file exchange between peers over a local network is cheaper than downloading from a remote server. However, current paradigms of file exchange allow free-riding, in which users leech off other peers without contributing anything valuable to the network. Platforms like BitTorrent try to minimize free-riding by following dynamic tit-for-tat policies during file transfer in order to promote fairness. A strict fairness criterion is the Give-and-Take (GT) criterion, under which two peers exchange file segments only if each has something unique to offer the other; this disallows free-riding entirely. The GT criterion turns out to be too strict for arbitrary file-segment acquisitions, however, so we propose random sampling of file segments by users as the mode of acquisition. The problem is then to maximize the number of users who are able to acquire all file segments. Our key contribution so far is a split-and-exchange recursive algorithm that guarantees a constant-factor approximation to this problem with high probability, across all regimes of the number of users and file segments.
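
The Give-and-Take criterion itself is easy to state in code. The sketch below checks whether two peers may exchange under GT and, if so, performs a full swap of their unique segments (an illustrative reading of the criterion, not the split-and-exchange algorithm from the paper):

    def can_exchange(a, b):
        """Give-and-Take criterion: peers a and b (sets of segment ids) may
        exchange only if each holds a segment that the other lacks."""
        return bool(a - b) and bool(b - a)

    def exchange(a, b):
        """If GT is satisfied, both peers end up with the union of their
        segments; otherwise nothing changes, so free-riders gain nothing."""
        if can_exchange(a, b):
            return a | b, a | b
        return a, b

    peer1, peer2 = {0, 1}, {1, 2}
    print(exchange(peer1, peer2))       # both peers obtain {0, 1, 2}
    print(exchange(peer1, set()))       # GT blocks exchange with a free-rider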

Ashwin Pananjady, Vivek Kumar Bagaria, Rahul Vaze, “Maximizing Utility Among Selfish Users in Social Groups”, Proceedings of the Twentieth National Conference on Communications (NCC) 2014, IIT Kanpur.