Publications
AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads
Top Picks in Computer Architecture
ACM Reproducibility Badges: Artifacts Available, Artifacts Evaluated - Functional, Results Reproduced
Seah Kim, Jerry Zhao, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao
International Symposium on Microarchitecture (MICRO), October 2023.
DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators
ACM Reproducibility Badges: Artifacts Available, Artifacts Evaluated - Functional, Results Reproduced
Charles Hong, Qijing Huang, Grace Dinh, Mahesh Subedar, Yakun Sophia Shao
International Symposium on Microarchitecture (MICRO), October 2023.
RoSÉ: A Hardware-Software Co-Simulation Infrastructure Enabling Pre-Silicon Full-Stack Robotics SoC Evaluation
ISCA Distinguished Artifact Award
ACM Reproducibility Badges: Artifacts Available, Artifacts Evaluated - Functional, Results Reproduced
Dima Nikiforov, Shengjun Chris Dong, Chengyi Lux Zhang, Seah Kim, Borivoje Nikolic, Yakun Sophia Shao
International Symposium on Computer Architecture (ISCA), June 2023.
CDPU: Co-designing Compression and Decompression Processing Units for Hyperscale Systems
ACM Reproducibility Badges: Artifacts Available, Artifacts Evaluated - Functional, Results Reproduced
Sagar Karandikar, Aniruddha Udipi, Junsun Choi, Joonho Whangbo, Jerry Zhao, Svilen Kanev, Edwin Lim, Jyrki Alakuijala, Vrishab Madduri, Yakun Sophia Shao, Borivoje Nikolic, Krste Asanovic, Parthasarathy Ranganathan
International Symposium on Computer Architecture (ISCA), June 2023.
RETROSPECTIVE: Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures
Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, and David Brooks
ISCA@50 Retrospective: 1996-2020, June 2023.
MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks
IEEE Reproducibility Badges: Open Research Objects, Research Objects Reviewed, Results Reproduced
Seah Kim, Hasan Genc, Vadim Nikiforov, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao
IEEE International Symposium on High-Performance Computer Architecture (HPCA), March 2023.
Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design
Qijing Huang, Charles Hong, John Wawrzynek, Mahesh Subedar, Yakun Sophia Shao
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), May 2022.
Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration
DAC Best Paper Award
Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt, Samuel Steffl, John Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao
Design Automation Conference (DAC), December 2021.
A 16mm² 106.1 GOPS/W Heterogeneous RISC-V Multi-Core Multi-Accelerator SoC in Low-Power 22nm FinFET
Abraham Gonzalez, Jerry Zhao, Ben Korpan, Hasan Genc, Colin Schmidt, John Wright, Ayan Biswas, Alon Amid, Farhana Sheikh, Anton Sorokin, Sirisha Kale, Mani Yalamanchi, Ramya Yarlagadda, Mark Flannigan, Larry Abramowitz, Elad Alon, Yakun Sophia Shao, Krste Asanovic, and Bora Nikolic
IEEE European Solid-State Circuits Conference (ESSCIRC), September 2021.
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators
Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, Yakun Sophia Shao
International Symposium on Computer Architecture (ISCA), June 2021.
Simba: Scaling Deep-Learning Inference with Chiplet-Based Architecture
Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Brucek Khailany, Stephen W. Keckler
Communications of the ACM, June 2021.
Vertically Integrated Computing Labs Using Open-Source Hardware Generators and Cloud-Hosted FPGAs
Alon Amid, Albert Ou, Krste Asanovic, Yakun Sophia Shao, and Borivoje Nikolic
IEEE International Symposium on Circuits and Systems (ISCAS), May 2021.
Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms
Jingyi Xu, Sehoon Kim, Borivoje Nikolic, Yakun Sophia Shao
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2021.
SNAP: An Efficient Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference
Jie-Fang Zhang, Ching-En Lee, Chester Liu, Yakun Sophia Shao, Stephen W. Keckler, Zhengya Zhang
IEEE Journal of Solid-State Circuits (JSSC), February 2021.
Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs
Alon Amid, David Biancolin, Abraham Gonzalez, Daniel Grubb, Sagar Karandikar, Harrison Liew, Albert Magyar, Howard Mao, Albert Ou, Nathan Pemberton, Paul Rigge, Colin Schmidt, John Wright, Jerry Zhao, Yakun Sophia Shao, Krste Asanovic, Borivoje Nikolic
IEEE Micro Special Issue on Agile and Open-Source Hardware, July/August 2020.
NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning
Ameer Haj-Ali, Nesreen K. Ahmed, Ted Willke, Yakun Sophia Shao, Krste Asanovic, Ion Stoica
International Symposium on Code Generation and Optimization (CGO), February 2020.
A 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Inference Accelerator with Ground-Referenced Signaling in 16nm
JSSC Best Paper Award
Brian Zimmer, Rangharajan Venkatesan, Yakun Sophia Shao, Jason Clemons, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Stephen W. Keckler, Brucek Khailany
IEEE Journal of Solid-State Circuits (JSSC), January 2020.
MAGNet: A Modular Accelerator Generator for Neural Networks
Rangharajan Venkatesan, Yakun Sophia Shao, Miaorong Wang, Jason Clemons, Steve Dai, Matthew Fojtik, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Yanqing Zhang, Brian Zimmer, William J. Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany
International Conference on Computer Aided Design (ICCAD), November 2019.
Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture
MICRO Best Paper Award, Top Picks in Computer Architecture Honorable Mention, Selected as a CACM Research Highlight
Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Brucek Khailany, Stephen W. Keckler
International Symposium on Microarchitecture (MICRO), October 2019.
A 0.11pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology
Rangharajan Venkatesan, Yakun Sophia Shao, Brian Zimmer, Jason Clemons, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Stephen W. Keckler, Brucek Khailany
HotChips: A Symposium on High Performance Chips (HotChips), August 2019.
A 0.11pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Referenced Signaling in 16nm
Brian Zimmer, Rangharajan Venkatesan, Yakun Sophia Shao, Jason Clemons, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Stephen W. Keckler, Brucek Khailany
International Symposia on VLSI Technology and Circuits (VLSI), June 2019.
SNAP: A 1.67-21.55 TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference
Jie-Fang Zhang, Ching-En Lee, Chester Liu, Yakun Sophia Shao, Stephen W. Keckler, Zhengya Zhang
International Symposia on VLSI Technology and Circuits (VLSI), June 2019.
Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration
Top Picks in Computer Architecture Honorable Mention
Michael Pellauer, Yakun Sophia Shao, Jason Clemons, Neal Crago, Kartik Hegde, Rangharajan Venkatesan, Stephen W. Keckler, Christopher W. Fletcher, Joel Emer
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2019.
Timeloop: A Systematic Approach to DNN Accelerator Evaluation
Angshuman Parashar, Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, Rangharajan Venkatesan, Brucek Khailany, Stephen W. Keckler, Joel Emer
International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2019.
A Modular Digital VLSI Flow for High-Productivity SoC Design
Brucek Khailany, Evgeni Krimer, Rangharajan Venkatesan, Jason Clemons, Joel Emer, Matthew Fojtik, Alicia Klinefelter, Michael Pellauer, Nathaniel Pinckney, Yakun Sophia Shao, Shreesha Srinath, Christopher Torng, Sam (Likun) Xi, Yanqing Zhang, Brian Zimmer
Design Automation Conference (DAC), June 2018.
Stitch-X: An Accelerator Architecture for Exploiting Unstructured Sparsity in Deep Neural Networks
Ching-En Lee, Yakun Sophia Shao, Jie-Fang Zhang, Angshuman Parashar, Joel Emer, Stephen W. Keckler, Zhengya Zhang
SysML Conference, February 2018.
Using Dynamic Dependence Analysis to Improve the Quality of High-Level Synthesis Designs
Rafael Garibotti, Brandon Reagen, Yakun Sophia Shao, Gu-Yeon Wei, and David Brooks
International Symposium on Circuits and Systems (ISCAS), May 2017.
Co-Designing Accelerators and SoC Interfaces using gem5-Aladdin
Yakun Sophia Shao, Sam Likun Xi, Vijayalakshmi Srinivasan, Gu-Yeon Wei, and David Brooks
International Symposium on Microarchitecture (MICRO), October 2016.
The Aladdin Approach to Accelerator Design and Modeling
Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, and David Brooks
IEEE Micro, Top Picks of 2014, May-June 2015.
Toward Cache-Friendly Hardware Accelerators
Yakun Sophia Shao, Sam Xi, Viji Srinivasan, Gu-Yeon Wei, and David Brooks
HPCA Sensors and Cloud Architectures Workshop (SCAW), February 2015.
MachSuite: Benchmarks for Accelerator Design and Customized Architectures
Brandon Reagen, Robert Adolf, Yakun Sophia Shao, Gu-Yeon Wei, and David Brooks
IEEE International Symposium on Workload Characterization (IISWC), October 2014.
Aladdin: A Pre-RTL, Power-Performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures
Top Picks in Computer Architecture
Yakun Sophia Shao, Brandon Reagen, Gu-Yeon Wei, and David Brooks
International Symposium on Computer Architecture (ISCA), June 2014.
Energy Characterization and Instruction-Level Energy Model of Intel's Xeon Phi Processor
Yakun Sophia Shao and David Brooks
International Symposium on Low Power Electronics and Design (ISLPED), September 2013.
Quantifying Acceleration: Power/Performance Trade-Offs of Application Kernels in Hardware
Brandon Reagen, Yakun Sophia Shao, Gu-Yeon Wei, and David Brooks
International Symposium on Low Power Electronics and Design (ISLPED), September 2013.
ISA-Independent Workload Characterization and its Implications for Specialized Architectures
Yakun Sophia Shao and David Brooks
International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2013.
Power, Performance and Portability: System Design Considerations for Micro Air Vehicle Applications
Yakun Sophia Shao, Judson Porter, Michael Lyons, Gu-Yeon Wei, and David Brooks
Advanced Computer Architecture and Compilation for Embedded Systems (ACACES), July 2010.
Dissertation and Book
Design and Modeling of Specialized Architectures
Yakun Sophia Shao
Ph.D. Dissertation, Harvard University, May 2016.
Research Infrastructures for Hardware Accelerators
Yakun Sophia Shao and David Brooks
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, November 2015.
Patents
Efficient Neural Network Accelerator Dataflows
US Patent App. 16/672,918, Filed November 2019.
Scalable Multi-Die Deep Learning System
US Patent App. 16/517,431, Filed July 2019.
Deep Neural Network Accelerator with Fine-Grained Parallelism Discovery
US Patent App. 15/929,093, Filed January 2019.