The Freedom CPU architecture: a GNU/GPL'ed high-performance 64-bit microprocessor developed in an open, Web-wide collaborative environment.

Andrew D. Balsa, 5 August 1998
w/ many contributions from Rafael Reilova and Richard Gooch.

History
-------

The idea of a GNU/GPL'ed CPU design sprang up in the middle of some email exchanges between three long-time GNU/Linux users (also Linux kernel developers in their spare time) with diverse backgrounds*.

We were questioning monopolies and how the dominance of an operating system (including the kernel, the Graphical User Interface and the availability of "killer-applications" as well as documentation) was intimately related to the world-wide dominance of a specific, outdated, awkward and inefficient CPU architecture. I guess we all know what I am referring to.

We also expressed our faith that GNU/Linux is well on its way to providing the basic foundation for a totally Free software environment (in the GNU/GPL sense; please get a copy of the GNU GPL license if you are reading this, or check www.gnu.org). However, this Freedom is limited, or rather bound, by the proprietary hardware on which it feels most at home: the traditional x86-based PC.

Finally, we were worried that Intel's attitude of not releasing advance information to the Free Software community about its forthcoming Merced architecture would delay the development of a compatible gcc compiler, of a custom version of the Linux kernel, and finally of the vast universe of Free Software tools. It is vaguely rumoured that Linus Torvalds may have received advance information on Merced by signing an Intel NDA, but this would be an individual exception and wouldn't really fit with the spirit of Free Software. On the whole, even though Merced will certainly be more modern than the x86 architecture, it will be a step backwards in terms of Freedom, since unlike for the x86, there will most likely never be a Merced clone chip.
In the previous days, we had been discussing the various models for Free Software development, their advantages and disadvantages. Putting these two discussions together, I quickly drafted an idea and posted it to Rafael and Richard, warning them that this would be good reading while they were compiling XFree86 or a similarly large package... and then they liked it!

Here is this crazy, utopian idea, merged with comments, criticism and further ideas from Rafael and Richard:

The Freedom GNU/GPL'ed architecture
-----------------------------------

We started with some questions:

- Why don't we develop a 64-bit CPU and put the design under the GNU General Public License?
- Why don't we make the development process of this new CPU completely open and transparent, so that the best brains worldwide can contribute with the best ideas (somehow using the same communication mechanisms traditionally used by the Free Software community)?
- How can we make the CPU development process entirely democratic and truly open, whereas it is usually surrounded by paranoia and secrets?
- How can we design something that will improve on *technical* *grounds* on what will be available in 2000 from the most advanced CPU architecture team ever put together by any corporation (the Merced)?

There are really two distinct incredible challenges here: a) the performance and feasibility of the resulting architecture, and b) the open development process under a GNU/GPL license and the intellectual property rights questions raised by this process.

Tackling a) first (performance and feasibility), we think the Freedom architecture could be more efficient under GNU/Linux than other architectures if we make it:

1) More compatible with the gcc compiler. We have the source code to gcc, but most importantly, we have the gcc developers available to help us figure out what features they would like to see in a CPU architecture. Why gcc? Because it is the cornerstone of the entire body of Free Software.
Basically, an efficient architecture for gcc will see an increase in efficiency across-the-board on *all* Free Software programs.

2) Faster in the Linux kernel. Right now, if we take for example the PC architecture, we notice that the Linux kernel has to "work around" (and some would say "work against") various idiosyncrasies of the x86/PC specifications and hardware. We also have to maintain compatibility with outdated x86 chips. And obviously, there is no possibility of implementing some of the often used Linux kernel functions in silicon. A new design, custom fitted to the Linux kernel code, would vastly improve the performance of any kernel-bound applications.

Further ideas for a possible architecture and implementation can be found in the appendices (as well as the "economics" of the project).

Note that we are calling the architecture "Freedom" (for obvious reasons), and its first implementation "F1". Projected end-user cost of an F1 CPU is around $100. Everything is very utopian, we know. :-)

However, it also seems to us that at this stage, the real challenges for our project are entirely within b): the development process and the intellectual property issues.

Developing the Freedom architecture: issues and challenges
----------------------------------------------------------

The Dilbert cartoon says it all, in fact: our project *is* a whole new paradigm!

What we are basically proposing is to bring together the competences and creative powers of thousands of individuals over the Web into the design process of an advanced, Free, GNU/GPL'ed 64-bit CPU architecture. And we don't even know if it's possible!

We know two things for sure:

1) In the past and present, corporations like Intel, IBM and Motorola are known for having broken down design teams, so that no close groups could be formed that would be able to recreate the entire design (and eventually quit and form their own companies).
Recently, Andy Grove has given a new meaning to the word "paranoia" as a management tool. Our proposed Free, open, transparent, collaborative environment counters this trend. It is also in large part related to some new trends in Human Resources management and Organizational theory. In fact, it is very akin to the concept of Virtual Corporations, except that in this case we are rather dealing with a Virtual Non-Profit Organization. In this respect, the Freedom project is also an experiment in Organizational theory, but it's not a gratuitous experiment. Many studies indicate that keeping people in small closed groups, bound by strict NDAs and other legal constraints to public silence, and putting a relatively high amount of pressure on these groups, is _not_ the best method to unleash creative powers. It also sometimes leads to buggy designs...

2) The development of the Linux kernel by a group of highly talented programmers/system developers is an example showing that an open, collaborative environment aiming for a GNU/GPL'ed piece of software with a particularly high intellectual/technological value is possible. Moreover, it can be shown that in some areas, the Linux kernel performs _better_ than its commercial counterparts.

However, this list of certainties is rather short compared to the list of questions generated by our proposal:

- How will new ideas be selected or discarded for inclusion in the design, amid the inevitable "noise" of Bad Ideas (tm)? Who will be the judge of what's Good and Bad?
- Also inevitably, mutually exclusive options/features will appear during the course of development. Again, who will decide on the direction to be chosen?
- Who will own the final design intellectual property rights? Is the "copyleft" applicable in the case of a CPU design? What about the masks for the first silicon?
- Will the GPL be sufficient as a legal instrument to protect the design? What changes, if any, will have to be made to the GNU/GPL to adapt it to a chip design?
- If the design process uses commercial EDA and other tools, to what extent do these proprietary items "taint" our GNU/GPL'ed design? Is it possible to separate the GPL part from the commercial/proprietary parts?
- What about existing patents? Will the project need any? Will it be able to "buy" any, or pay royalties?
- Unlike a piece of software, partial implementations of the Freedom design will not be possible. The first implementation that will go to silicon *must* be functional and complete. All "holes" in the design must be plugged before the first mask gets drawn. How do we make volunteers accept such a rigid schedule?

There are some questions raised as a consequence of the possible success of the Freedom implementation:

- There are vast possibilities for a GNU/GPL'ed CPU design in the industrial, medical, aeronautical, automotive and other domains. In fact, a Free, stable, high-performance design offers possibilities never before envisioned by hardware designers in various domains. Is this the beginning of a small revolution in e.g. embedded hardware?
- Will the design sustain itself over the years as the ideal GNU/Linux processor?
- Can this experiment in open development have other consequences on the electronics industry? Are we really proposing a new paradigm for CPU development? Can this paradigm be applied to other VLSI designs?

Tools
-----

We all know the saying: "If the only tool one has is a hammer...".

We'll need "groupware" tools for the Freedom project, but the word "groupware" has a bad reputation nowadays. We prefer to use "collaborative work tools". Some of them have only come into existence and widespread use in the last decade; I am obviously talking about the Web itself, and its assortment of communication technologies: email, newsgroups, mailing lists, Web sites, SGML/PDF/HTML documentation and editing/translation software. Much of this infrastructure is/has been used to develop GNU/Linux, and is nowadays based on GNU/Linux, BTW.
But we'll also need new tools, which perhaps don't even exist yet. I think it's worth mentioning that perhaps one of the greatest steps in this direction is the WELD project, developed at Berkeley. It could well become the cornerstone of the Freedom project, or conversely, the Freedom project can perhaps be thought of as _the_ ideal, perfect test case for the WELD project.

Conclusion
----------

The conclusion is simple and obvious:

- if you are a CPU architect/VLSI engineer, or
- if you have a good idea on CPU design that you have been toying with for some time and would like to test, or
- if you just like challenging intellectual propositions _and_ brainstorming interaction:

Please join and help us turn this idea into a reality!

--
*: Richard is an Australian astrophysicist preparing his Ph.D. on astronomic visualization; Rafael is a researcher on EDA tools at the University of Cincinnati. I am an ex-Ph.D. student in Management and an ex-firmware engineer, with a special interest in Ethical problems in multi-cultural environments (I was born in Brazil and am presently living in France).

None of us has any formal education in CPU architecture. Rafael comes closest, since he is in VLSI design and EDA tools development, and also developed some new code for CPU recognition in the Linux kernel. Richard developed the Pentium Pro MTRR support in the Linux 2.1.x kernels (as well as other novel kernel routines), and is also a hardware developer. I have the honour of having diagnosed the Cyrix 6x86 "Coma" bug and proposed a workaround for it under GNU/Linux (both were at first rejected by Cyrix Corp.). I am also a long-time hardware and firmware developer, and have contributed in various ways to GNU/Linux development (e.g. the Linux Benchmarking HOWTO).

Richard E. Gooch
Rafael R. Reilova
Andrew D. Balsa

--
Appendix A: Ideas for a GPL'ed 64-bit high performance processor design
24 July 1998

This is just a dream, a utopian idea of a free processor design.
It's also a list of things I would like to see in a future processor.

1) This project will need a sponsor if it ever wants to become a reality. Getting first silicon is not going to be free, nor easy.
2) Choice of a 64-bit datapath, address space: obvious nowadays. Simplifies just about everything.
3) Huffman encoded instruction set: improves cache/memory -> CPU bandwidth, which is one of the main bottlenecks nowadays. Should be quite simple to add a Huffman encoder to a compiler back-end. All instruction lengths are a multiple of one byte.
4) RISC vs. CISC vs. dataflow debate: it's over! Get the advantages of each, disadvantages of none as much as feasible.
5) 1, 2 or 4 internal 7-stage pipelines.
6) Speculative execution: 4 branches, 8 instructions deep each.
7) 64-byte instruction prefetch queue.
8) 32-byte write buffers.
9) Microprogram partly in RAM. Must be able to emulate x86 instruction set (assembler source level).
10) 64-bit TSC w/ multiple interrupt capabilities.
11) Power saving features.
12) MMX and 3DNow! emulation.
13) Fully-static design (clock-stoppable).
14) F1 implementation: 128 bits external data path, 40 bits external addressing capabilities.
15) Performance monitoring registers "a la" Pentium.
16) External FPU, memory mapped (have no idea what it should look like). FPUs can be added to work in parallel (up to 4?). Separate bus. Same bus can handle a graphics coprocessor with its dual-ported memory.
17) 8KB 4-ported L1 unified cache, with independent line-locking/line-flushing capabilities. Can be thought of as a 1 KB register set.
18) Separate 64KB each L2 instruction and data caches, running at CPU speed.
19) Integrated intelligent DMA controller, 32 channels.
20) Integrated interrupt controller: 30 maskable interrupts, 1 System Management interrupt, 1 non-maskable interrupt.
21) 0 internal registers! Yep, this is a memory-memory machine. Instruction set recognizes 32 pseudo-registers at any moment.
22) Interrupts cause automatic register set switch to vectored register set => 0 (zero) context switch latency!
23) No penalty for instructions that access byte, word, dword data.
24) Operation in little or big-endian mode "a la" MIPS.
25) Paging "a la" Intel, with 4k pages + 4M extension.
26) Also VSPM "a la" Cyrix 6x86, with 1K definable pages.
27) ARR registers "a la" Cyrix 6x86 (similar to MTRR on Intel PPro): allows defining non-cacheable regions (useful for NUMA, see below).
28) Internal PLL with software programmable multiplier; can switch from 1x to 2x to 3x to nx in 0.5 increments, on-the-fly.
29) The MMU should also support object protection "a la" Apple Newton.
30) Single-bit ECC throughout.
31) Direct support of 4 1MB dual ported memory regions for NUMA-style multiprocessing (also on FPU bus).
32) CPU architecture project name: "Freedom". Could also be called "Merced-killer", or "Anti-Merced", or "!Merced", but in fact we are not anti-anything with this project. We are just pro-Freedom and open; what we dislike about the Intel Merced is its proprietary design and restrictive development environment.

I guess the challenge here is to determine whether a GPL'ed CPU design is feasible. Is open, collaborative development possible WRT CPU design? How does one get the funding to actually put the design on silicon, once it is ready? How can revisions be handled? Are there patents that would inherently block such a development process?

The idea is also to use gcc as the ideal development compiler for this CPU (unlike Merced), and to be able to port the Linux kernel to this new processor with minimal effort.

--
Appendix B: Freedom-F1 die area / cost / packaging physical characteristics / external bus
August 5, 1998

Just as a reminder, the F1 CPU does _not_ include an FPU or 3DNow! unit (but SIMD integer instructions will be included).

Recommended maximum size: 122 mm2.
This gives us 200 dies/8-inch wafer (see an example of such a wafer in Hennessy and Patterson, page 11). Roughly, die yield = 0.5 for our 122 mm2 5-layer 0.25 micron CPU (H&P, page 13, updated to reflect better fabs). This allows more or less 10-11 million transistors, divided as follows: 6-7 million for the caches, 4-5 million for the rest.

Assume wafer yield = 95%, final test yield = 95%.
Testing costs of $500/hour, 20 seconds/CPU.
Packaging costs = $25-50 (see below).

Roughly, following H&P, this gives us a unit cost of $75-100/good CPU, tested, boxed in anti-static packaging and shipped to the US, if the Taiwan foundries can keep the wafer processing cost around $3,500.

Packaging: I am going to propose something surprising, but I think we should use the same packaging as the Celeron CPU, in terms of physical dimensions and CPU placement. That way we can also use the Celeron heatsinks/fans already on the market, and the Celeron mounting hardware.

PCI set: again I am going to propose a heresy, but I think we could use 100MHz Slot 1 motherboards. First, Intel is no longer alone in manufacturing Slot 1 chipsets: VIA has just released a Slot 1 chipset with excellent performance and the latest goodies in terms of technology (we can get timing info from the VIA chipset datasheets). Second, we don't have to worry about the motherboard/PCI set issue anymore. Third, it's almost impossible to go beyond 100MHz on a standard motherboard, because of RFI issues; so basically 100-112MHz is as good as it gets. Fourth, there will be many people out there with Slot 1 motherboards, willing to upgrade their PII/Celeron CPUs (especially the Celeron). Fifth, these motherboards are nowadays quite cheap, and we get all the benefits of high-volume production. Sixth, this allows easy upgrades of the Freedom CPU to higher speed grades, larger cache versions, versions with an FPU, etc...
Now, if we accept the above, we have to put on the Celeron-style Freedom printed circuit a small EEPROM that will contain the Freedom BIOS, the L2 cache and a socket for the FPU. This increases the cost of the CPU, but decreases overall costs, so I still think it's a good move.

Please check a photograph of the Celeron and tell me if I am just dreaming.

--
Appendix C: Legal issues / financial issues
August 5, 1998

We would like to have support from the Free Software Foundation for the Freedom project. We are _not_ proposing that the Free Software Foundation build a fab. What we are saying is:

If we go to a foundry in the US or Taiwan, give them a mask, and ask them to run a batch of 0.25 micron, 5 layer 8-inch wafers for us, they'll quote approx. $3K-5K, or even less, per wafer as their price (our cost) for our batch (in the year 2000).

An approximate cost for a batch of F1 CPUs would theoretically be somewhere between $500K and $1000K, for 5000-10000 good CPUs. Not exactly pocket money, but we could sell those CPUs on a subscription basis.

Like this: people who subscribe would get the Merced-killer for around $100 (compare that to the projected cost of $5000/unit for the Merced), on a first-come, first-served basis. Once the cost of the batch has been covered, any left-over CPUs could be sold for a slightly higher price to pay for the next batch and further mask development.

We suggest putting some quotas in the system. Demand is likely to be higher than supply. ;-)

The Free Software Foundation could coordinate all the legal/financial/logistic aspects of the project (and would be adequately compensated for this work). This, of course, would depend on getting support from Mr. Stallman for this initiative.
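
As a rough cross-check of the cost figures in Appendices B and C, here is a minimal sketch in Python. The input numbers are the ones quoted in the appendices; the way they are combined is an assumption on our part (a simplified Hennessy & Patterson style cost model in which wafer yield and die yield are multiplied together and the test cost is spread over good dies). The function names unit_cost and batch_cost are hypothetical, and boxing and shipping are not included.

```python
# Hedged cross-check of the Appendix B and C cost estimates.
# Inputs come from the appendices; the formula itself is an ASSUMPTION
# (simplified H&P-style model), not a confirmed foundry quote.

def unit_cost(packaging_cost,
              wafer_cost=3500.0,         # dollars per processed 8-inch wafer
              dies_per_wafer=200,        # for a 122 mm2 die
              die_yield=0.5,
              wafer_yield=0.95,
              final_test_yield=0.95,
              test_cost_per_hour=500.0,
              test_seconds=20.0):
    """Estimated cost in dollars of one good, tested, packaged CPU."""
    # Assumption: wafer yield and die yield are independent factors.
    good_dies_per_wafer = dies_per_wafer * wafer_yield * die_yield
    die_cost = wafer_cost / good_dies_per_wafer
    # Every die is tested but only good ones are sold, so the per-die
    # test cost is divided by the die yield.
    test_cost = (test_cost_per_hour * test_seconds / 3600.0) / die_yield
    # Packaged parts that fail final test are lost as well.
    return (die_cost + test_cost + packaging_cost) / final_test_yield


def batch_cost(good_cpus, packaging_cost, **model):
    """Estimated cost in dollars of a batch yielding `good_cpus` good CPUs."""
    return good_cpus * unit_cost(packaging_cost, **model)


if __name__ == "__main__":
    for packaging in (25.0, 50.0):
        print(f"packaging ${packaging:.0f}: "
              f"unit cost ${unit_cost(packaging):.2f}")
    for n in (5000, 10000):
        low, high = batch_cost(n, 25.0), batch_cost(n, 50.0)
        print(f"{n} good CPUs: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions the unit cost comes out at about $71 with $25 packaging and about $97 with $50 packaging, which is in line with the $75-100 estimate once boxing and shipping are added. A batch of 10000 good CPUs at the upper packaging cost lands just under $1000K, so a $100 subscription price would roughly break even only if nearly the whole batch is sold.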