I am a Computer Science PhD candidate at AMPLab, EECS, UC Berkeley, advised by Michael I. Jordan.
My research interests encompass Machine Learning and Big Data problems, including designing scalable machine learning algorithms for deployment in large-scale systems.
Specifically, my current research focuses on adapting concurrency control concepts from databases to parallelize inherently sequential machine learning algorithms, in order to maximize scalability while preserving correctness and theoretical guarantees. This approach has been successfully applied to non-parametric clustering, non-parametric feature modelling, online facility location, submodular maximization, and correlation clustering.
Prior to my PhD studies, I worked as a research scientist at DSO National Laboratories, Singapore. As part of a collaboration with the Future Urban Mobility project at SMART (Singapore-MIT Alliance for Research and Technology), I worked with Javed Aslam and Daniela Rus on mining travel patterns using data collected from a roving sensor network of taxi probes.
I obtained my BS and MS in Computer Science at Carnegie Mellon University, where I was advised by Priya Narasimhan. As part of my thesis work, I developed a framework for localizing and diagnosing faulty nodes in a MapReduce cluster, based on OS-level performance counters, white-box metrics extracted from logs, and application-level heartbeats. The fault diagnosis framework was able to capture a variety of faults, including resource hogs and application hangs, and to localize each fault to a subset of worker nodes in a Hadoop system.