The End of Central Processor Units
by David A. Patterson and Kimberly Keeton

1) Problem: Servers today cannot scale as fast as demand is increasing, and, surprisingly, even I/O-intensive applications are limited by CPU speed. They are also needlessly expensive: almost half the price of a large Sun Enterprise 10000 (64 processors, 64 GB DRAM, 668 disks) is in the microprocessors, boards, and enclosures for the CPU, not including memory or disks and their enclosures. Moreover, servers today are plagued by availability and system administration problems, with the annual cost of ownership running at three times the original hardware price.

2) Solution: Microprocessors are placed near the I/O devices and connected via fast serial links and single-chip crossbar switches. Such a system scales communication bandwidth and processing with the number of disks, yet reduces cost by replacing expensive desktop-oriented microprocessors with low-cost embedded-oriented microprocessors and by leveraging the cabinetry, cooling, and power supplies already needed for I/O devices. It is also easier to support redundancy, which increases availability and therefore reduces the cost of system administration.

3) Justification: In the past, I/O devices were slow relative to the CPU, and hence were relegated to third-class status in the minds of system designers. Current systems rely on a hierarchy of interconnects and controller firmware: the SCSI controller on a disk is connected via a SCSI bus to a SCSI controller on a PCI chip, which is connected via the PCI bus to a PCI bridge chip, which is connected via the memory bus to a memory controller. Such a hodge-podge of buses and firmware was satisfactory when I/O devices were slow, but Gigabit Ethernet is near and a single disk now transfers at 30 MB/sec from the media, a rate increasing at 40%/year.
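
To make the bandwidth figures concrete, the following is a minimal sketch that compounds the quoted 30 MB/sec media rate at 40%/year and multiplies it by the 668 disks of the Sun Enterprise 10000 configuration cited above. The function names, the five-year horizon, and the assumption that every disk streams at its full media rate simultaneously are illustrative only and are not taken from the text.

```python
# Figures taken from the text; everything else is an illustrative assumption.
DISK_MB_PER_SEC = 30.0   # media transfer rate of a single disk today
ANNUAL_GROWTH = 0.40     # growth of the media rate per year
NUM_DISKS = 668          # disks in the Sun Enterprise 10000 configuration cited


def projected_disk_rate(years: int) -> float:
    """Per-disk media rate in MB/sec after `years` of compound growth."""
    return DISK_MB_PER_SEC * (1.0 + ANNUAL_GROWTH) ** years


def aggregate_rate(num_disks: int, years: int = 0) -> float:
    """Upper bound in MB/sec, assuming all disks stream at full media rate at once."""
    return num_disks * projected_disk_rate(years)


if __name__ == "__main__":
    # Hypothetical five-year horizon, chosen only for illustration.
    for years in range(0, 6):
        per_disk = projected_disk_rate(years)
        total_gb = aggregate_rate(NUM_DISKS, years) / 1000.0
        print(f"year +{years}: {per_disk:6.1f} MB/sec per disk, "
              f"{total_gb:6.1f} GB/sec across {NUM_DISKS} disks")
```

At the quoted figures, 668 disks at 30 MB/sec already add up to roughly 20 GB/sec of raw media bandwidth, which is the aggregate load that the chain of shared buses described above would have to carry under this best-case streaming assumption.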
Since power supplies and fans for disks can only support a modest amount of additional power, and because of the size restrictions of a disk, the ideal microprocessor for an "Intelligent DISK" (IDISK) needs to be power-efficient and to integrate memory, fast serial lines, and a disk interface into a single chip. Embedded microprocessors hold promise: they deliver more than half the integer performance of desktop microprocessors, yet the die is 4 to 6 times smaller and they burn 10 to 100 times less power. Since Moore's Law also applies to crossbar switches, a high-bandwidth communication system can be constructed using fast serial lines over 10s of meters of copper wire connected to single-chip switches, allowing the system both to scale economically with the number of disks and to reduce the cost of the redundancy needed to support availability. IDISKs have the processing and communication to enable low-cost RAID support, and the monitoring and control to detect and isolate failed components. By providing a conventional interface to a standard front end, IDISKs can appear as just a bunch of disks rather than as hundreds of computers. This combination of availability and hidden complexity should significantly reduce the cost of ownership. Thus "CPU" will soon join terms like "core" and "drum," which reflect the age of the speaker rather than the state of the art.