Selected Research Projects in Deep Learning and Security

Deep Learning for Program Synthesis

Can we teach computers to write code? The ability to automatically synthesize code has numerous applications: helping non-technical end users create snippets of code for task automation and simple data manipulation, helping software developers synthesize mundane or optimized code, helping data scientists clean up and explore data, and helping algorithm designers discover new algorithms.
The task is extremely challenging. Traditional approaches based on constraint solving have scaled only to small programs. In this project, we will explore new directions using deep learning and deep reinforcement learning, combined with other techniques such as formal reasoning and probabilistic programming, for broader and more scalable program synthesis.
We will consider program synthesis from a broad spectrum of specification methods, including natural language description, input-output examples, programming by demonstrations, formal goal descriptions, novelty-based metrics, and other reward functions.
We will consider a broad spectrum of application domains, including end-user programming, automatic synthesis of security protocols and secure systems including smart contracts, automatic synthesis of data analytics queries and pipelines, automated assistance for program analysis and verification, and automatic synthesis of distributed systems code.
In this process, besides enabling real-world applications using program synthesis, we hope to make contributions towards addressing core challenges in deep learning including generalization, search, abstraction, and representation. We believe that fully solving the program synthesis problem is equivalent to solving AGI (artificial general intelligence).
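Synthesis from input-output examples, one of the specification methods listed above, can be illustrated with a minimal sketch. The code below is not the approach pursued in this project; it is the kind of brute-force enumerative search that learned methods aim to scale beyond. The toy DSL, its primitive names, and the function names are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical toy DSL: each primitive maps an int to an int.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "neg": lambda x: -x,
}


def run(program, x):
    """Apply the primitives named in `program` to x, left to right."""
    for name in program:
        x = PRIMITIVES[name](x)
    return x


def synthesize(examples, max_len=4):
    """Return the shortest primitive sequence consistent with every
    (input, output) pair in `examples`, or None if none is found."""
    for length in range(max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, i) == o for i, o in examples):
                return list(program)
    return None


if __name__ == "__main__":
    # Examples drawn from the target behaviour f(x) = (x + 1) * 2.
    print(synthesize([(1, 4), (2, 6), (5, 12)]))  # ['inc', 'double']
```

The number of candidate programs grows exponentially with program length (here, 5 to the power of the length), which is one concrete reason exhaustive search handles only small programs.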

Secure Deep Learning

Even though deep learning has made huge advances in many application domains, achieving super-human performance on certain tasks and datasets, deep learning systems can be fragile and easily fooled. For example, an attacker could add adversarial perturbations, often invisible to human vision, to an image to cause a deep neural network to misclassify the perturbed image. Such attacks go beyond image classification, and are effective across different neural network architectures and applications. Why are neural networks easily fooled? How can we build effective defenses against such attacks?
Generalization is a key challenge for deep learning systems. How do we know how a deep learning system, such as a neural program, a robot, or a self-driving car, will behave in a new environment, and whether it will remain safe and secure against attacks such as adversarial perturbations? How do we specify security properties for deep learning systems? How do we test and verify desired security properties for deep learning systems? Is it possible to provide provable guarantees?
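To make the idea of an adversarial perturbation concrete, here is a minimal sketch using the fast gradient sign method, one well-known attack, applied to a tiny logistic-regression model rather than a deep network. It illustrates the general idea only and is not a method taken from this project. The weights, input, and epsilon are made-up values, and the function names are hypothetical.

```python
import math


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Move every feature by eps in the direction that increases the
    cross-entropy loss for the true label y (0 or 1)."""
    p = predict(w, b, x)
    # For this model, d(loss)/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    w, b = [2.0, -3.0, 1.0], 0.0   # assumed "trained" weights
    x, y = [0.3, 0.1, 0.2], 1      # input whose true label is class 1
    x_adv = fgsm(w, b, x, y, eps=0.2)
    print(predict(w, b, x))        # about 0.62: classified as class 1
    print(predict(w, b, x_adv))    # about 0.33: now classified as class 0
```

No feature moves by more than eps, yet the predicted class flips. On high-dimensional inputs such as images, a much smaller per-pixel eps can have the same effect, because many small changes add up in the model's output.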
Security will be one of the biggest challenges in deploying artificial intelligence. Traditional program verification techniques such as symbolic reasoning are mainly effective for logical programs. To reason about the safety and security of artificial intelligence systems, we need to design and develop new techniques and approaches. We plan to take a multi-pronged approach to explore deeper understanding of attacks, defenses, and methods for reasoning about the security of artificial intelligence systems.

Privacy-preserving Machine Learning and Data Analytics

Machine learning and data analytics compute statistics and build models from different users' (often sensitive) data, such as location, medical, and financial data. With the rise of ubiquitous sensing, personalization, and virtual assistants, users' privacy is at ever-increasing risk. Can we enable the power and utility of machine learning and data analytics while still ensuring users' privacy? What is the relationship between privacy-preservation and generalization and robustness in machine learning?
We will explore new techniques and approaches, including differential privacy, to enable privacy-preserving machine learning and data analytics in the real world. We aim to design and develop a general framework that automatically analyzes and rewrites data analytics queries so that their results are differentially private. We plan to explore different approaches for differentially private deep learning. Our goal is both to provide practical real-world solutions for privacy-preserving machine learning and data analytics and to deepen the theoretical understanding in this area.
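As a small illustration of what a differentially private query result looks like, the sketch below applies the standard Laplace mechanism to a counting query. It is a textbook construction, not the query-rewriting framework described above. The function names and sample data are hypothetical, and a floating-point sampler like this one is for illustration only, not for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables, each with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Return an epsilon-differentially-private count.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 52, 29, 64, 47, 38]      # made-up sensitive data
    rng = random.Random(0)                        # seeded for repeatability
    noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
    print(noisy)  # the true count is 4; the released value is 4 plus noise
```

A smaller epsilon gives stronger privacy but noisier answers, which is the privacy-utility trade-off at the heart of the questions above.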

Keystone: Open Source Secure Enclave

Secure computation is a powerful abstraction, protecting the integrity and confidentiality of computations over sensitive data. There are already many applications for secure computation, and its importance will continue to grow. Secure hardware enclaves provide a powerful solution to secure computation with little or no performance overhead over native computation. Hardware enclaves enable computation over confidential data, providing strong isolation from other applications, the operating system, and the host.
How can we create a truly trustworthy secure enclave? It will require an open-source design and implementation, and decentralized trust in its lifecycle management. Although many trusted execution environments (TEEs) have been proposed by both industry (e.g., Intel SGX) and academia (e.g., Sanctum), no full-stack implementation has been open-sourced for use.
Keystone is an open-source project for building trusted execution environments (TEEs) with secure hardware enclaves, based on the RISC-V architecture. Our goal is to build a secure and trustworthy open-source secure hardware enclave, accessible to everyone in industry and academia. Keystone introduces the customizable TEE, a new paradigm for building TEEs in which both platform providers and enclave developers customize their TEE to have a minimal trusted computing base (TCB) and to be highly optimized for the resource usage of each application. This enables a wide range of use cases for Keystone enclaves, from embedded IoT applications to machine learning.

Artificial Intelligence for Data Science

More and more data is being collected in all areas, from business activities to smart homes, smart buildings, and smart cities, with the promise of improving decision making and efficiency. However, data analytics today is still a labor-intensive process, requiring significant manual effort at almost every stage of the data science pipeline. As a result, huge volumes of collected data go unutilized due to the lack of analyst resources. Can we make the data science pipeline more automated and reduce the mundane manual labor needed? Can we help analysts be more productive and help automatically extract insights from data?
We aim to explore new approaches for automated data exploration and insight extraction, while leveraging limited guidance and feedback from human analysts. To achieve this, we will explore and combine techniques including deep learning, reinforcement learning, program synthesis, meta learning, probabilistic programming, and interpretable machine learning. Given a dataset, we will explore how to automate the different stages of the data science pipeline, including data wrangling, data cleaning, feature engineering and extraction, model building and architecture search, model criticism and revision, and results presentation and interpretation.
We will explore diverse application domains including computer security such as attack and anomaly detection and diagnosis, system monitoring and diagnosis, and trend analysis. Our long-term vision is to build real-world systems that automatically explore, analyze and learn from data in order to glean insights from data and facilitate decision making, while leveraging limited human guidance and feedback.

AI and Blockchain Technology

Blockchain and smart contracts provide new ways to build distributed and decentralized applications, enabling new capabilities in financial instruments and beyond. How do we ensure that a distributed and decentralized system maintains the desired properties, even when its participants are autonomous and have different incentives? How do we ensure that smart contracts are correctly written to enforce the desired behaviors?
We plan to explore new techniques for reasoning about smart contracts and decentralized applications. We aim to design new techniques for automatically exploring the design space of distributed consensus and decentralized systems. We also plan to explore new approaches for automatic synthesis of smart contracts. Our exploration will in particular leverage methods and new developments in machine learning and deep learning.

Deep Learning for Security Applications

Deep learning can provide new capabilities and approaches for addressing security problems. In this project, we will explore different security application domains where deep learning provides promising solutions, including vulnerability detection, fraud detection, and defense against various types of malicious attacks.

Others

We are interested in advancing the state of the art in deep learning and artificial general intelligence more broadly. As an example, we are particularly interested in self-learning, where an agent learns to improve itself by interacting with its environment. Could we build an agent that automatically learns to read math textbooks and do math? We are excited to explore these areas and build self-learning agents as a long-term vision.