Database management systems (DBMSs) are designed to be general-purpose tools that support a wide variety of applications, from banking to social networking to scientific discovery. To improve the performance of such applications, researchers have leveraged the unique characteristics of application areas to build domain-specific DBMSs that outperform traditional implementations. Performing such specialization requires labor-intensive, complex, and error-prone effort. The intellectual merit of this project is to advance the state of the art in application-specific DBMS design by investigating techniques to perform such domain specialization automatically.
Specifically, this project aims to leverage recent advances in programming systems and data management research to build tools that can automatically understand database application semantics. Given such knowledge, the goals of this project are to:
All software artifacts developed in this project are released to the public, with plans to incorporate their usage in both the undergraduate and graduate curricula. In addition, because part of the project is to collect and study the shortcomings of real-world database applications, the collected applications are gathered into a publicly accessible repository that researchers and practitioners in the field can use to experiment and reproduce the results.
From purchasing plane tickets to running climate simulations, we interact with database-backed applications on a daily basis. Since their initial deployments, DBMSs have emerged as the de facto software system for managing persistent data and have supported a variety of applications across multiple domains.
While typical database applications are organized using the archetypal three-tier architecture, the ever-increasing need to improve application performance and software maintainability has led to new research directions. Researchers have recently realized that specializing DBMS implementations for different application domains can dramatically improve performance and programmability, as demonstrated in large-scale data analytics, machine learning, transactional systems, scientific applications, and streaming applications. While the general architectures of such specialized DBMSs do not deviate much from three-tier architectures, the implementation of each component can differ significantly across application domains. Unfortunately, devising domain-specific DBMSs requires substantial manual labor: developers must identify a group of applications that share common characteristics in terms of data needs, gain a thorough understanding of the semantics of the identified applications, find areas for innovation, and produce new, customized DBMS implementations. Even if this complex work succeeds, existing applications must often be completely rewritten to utilize the new implementations, since their APIs for interacting with the DBMS might have changed. This entire process is time consuming, labor intensive, and error prone.
In this project, we propose a fundamentally different approach to the design of domain-specific DBMSs. Rather than relying on manual effort, we will leverage existing techniques and devise new ones from programming languages and formal methods research to understand application semantics well enough to generate domain-specific DBMS implementations customized for a given application. Doing so lets application developers from different domains interact with DBMSs using familiar APIs, without having to worry about the efficiency and correctness of the underlying DBMSs that their applications use.
Our work has been published in top-tier academic conferences, with research software prototypes released publicly on GitHub (see below).
Research produced from this project has been implemented in the following software prototypes:
Alvin has initiated and chaired the ACM Student Research Competition at the annual SIGMOD conference:
Alvin will continue to lead other similar events that aim to increase and broaden research opportunities for students.
Findings from this project have been incorporated into classes offered at the University of Washington:
July 1, 2017 - June 30, 2022
Generating Application-Specific Database Management Systems
This material is based upon work supported by the National Science Foundation under Grant No. IIS-2027575. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Contact: Alvin Cheung
Date of Last Update: July 2020