
Current Projects

I am wrapping up the following projects at UC Berkeley. While I have worked on a variety of projects in the past, my future research will focus on generalized query optimization for relational and non-relational data analytics workloads (e.g., pandas dataframes, spreadsheet formulas, and machine learning) and on serving these workloads efficiently in the cloud.

  • FormS: a Python library that efficiently translates spreadsheet formulas to SQL queries
  • Smash: a string distance metric that captures acronyms, abbreviations, and typos together

Past Projects

My past research as a PhD student and postdoc focused on supporting user-centered analytical interfaces at scale by reshaping the modern data analytics stack along three aspects:

  • Interactivity: how do we help end users consume visual results while preserving desirable properties and performance?
  • Scalability: how do we scale the execution of user-centered analytical interfaces to multiple machines?
  • Cost: how do we reduce resource usage without sacrificing performance?

Please check out this video for an overview of my past research. The major projects I worked on include:

  • Transactional Panorama: a conceptual framework for user perception in analytical visual interfaces
  • Taco: efficient and compact spreadsheet formula graphs
  • Modin: a scalable dataframe system
  • Lux: a visualization recommendation library that helps data scientists explore data easily within dataframe workflows
  • CrocodileDB: a new database architecture that exploits time slackness to enable new resource-efficient query execution (video)
  • ACC: a high-performance main-memory database that adaptively chooses and mixes multiple concurrency control protocols