Maximally Scalable Computing
via Physics-Based Models & Architectures
(working title)

ABSTRACT
for SGER and/or CAREER proposal

Michael P. Frank
CISE Department
University of Florida
June 6, 2000

The following is a draft abstract describing a line of research to be proposed for NSF support under SGER, CAREER, and/or possibly other programs.

Motivation and Background

We are all aware of the great economic and social benefits that our increasingly cost-effective computing technologies have been contributing to the nation and to humankind, benefits that have been especially apparent during the record-setting U.S. economic expansion of the last decade.  How, as scientists, can we best help to ensure that rapid improvements in computing technology will continue for as long as possible?

Over the next few decades, assuming that Moore's Law [] continues to hold, the sizes of our lowest-level computational devices (initially MOSFET transistors, then whatever replaces them) will rapidly approach the nanometer scale, and affordable computational systems will contain exponentially increasing numbers of such devices.  As a result, the question of whether those devices are being operated and organized in the most asymptotically efficient manner becomes increasingly important.  With the number of devices per machine approaching the trillions and beyond, even a small polynomial deficit in asymptotic efficiency can easily overwhelm typical constant-factor differences between architectures, and translate into significant real shortfalls in computer performance below what could potentially be achieved.  (For instance, in a machine of a trillion devices, an inefficiency growing only as the cube root of machine size already amounts to a factor of ten thousand.)

To gain insight into the most asymptotically efficient and most scalable possible ways of organizing computation, my colleagues and I have been studying models of computation based directly on the (known) computational capabilities and limits of physics itself [,].  Our models incorporate all known fundamental physical constraints on information processing, such as the speed-of-light limit, fundamental quantum limits on information density and processing rates, and the very important constraints posed by physical reversibility and the second law of thermodynamics.  (Our models are also perfectly compatible with the possibility of quantum-coherent computing [,], though it is not yet clear whether that particular feature will become practical to implement within the foreseeable future.)

The key feature of all our models is that they permit computations to be performed in a manner that can be made both logically and thermodynamically reversible to an arbitrarily high degree of perfection (which must be scaled up modestly with machine size).  These models are thermodynamically efficient, continually recycling nearly all of the energy used to encode their computational states.  Because they dissipate so little heat, their devices can be packed into dense, three-dimensional configurations that decrease communication delays and thereby speed up many parallel computations.

We have posed the very strong conjecture, which we call the "Tight Church's Thesis" [,], that our preferred model (or some minor variation of it) is, for all classes of computational tasks, asymptotically exactly as efficient (within only a constant, not a polynomial, factor) as the most efficient possible physical mechanism for performing the task.  In fact, we have proven that our model, in its physical instantiation, is strictly more efficient, by polynomial factors, than all bounded-temperature physical implementations of traditional (irreversible) models of computation, for a broad class of problems.

The major attraction for theoretical computer science of having such a tight model is that it permits developing exact predictions of the asymptotic performance of aggressive algorithms, and studying asymptotically exact complexity classes, without the worry that the model's predictions will fail to scale well to very large machine sizes, or that some more efficient model will come along later to undercut one's lower bounds by some polynomial factor.  In contrast, the sort of model that we propose could only be unseated if there were surprising new developments in fundamental physics.

For computer engineers, the attraction of these physical models is that they provide a very direct and explicit guide to the design of efficient computer architectures, especially in the nano-scale regime.  We have calculated that, for a wide variety of proposed nano-scale computing mechanisms and reasonable thermal constraints, any densely-packed machine larger than a few microns or millimeters across will operate faster if it uses our reversible model than if it uses any physically possible implementation of any irreversible model of computation.
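For concreteness, the following is a minimal back-of-the-envelope sketch, in Python, of the kind of scaling argument behind this claim.  It is purely illustrative: the device density, energy per operation, and removable heat flux used here are assumed round numbers, not figures from our calculations.  It shows how, for a fully irreversible 3-D machine, the per-device operation rate permitted by surface-limited heat removal falls off in inverse proportion to the machine's width.

# Back-of-the-envelope sketch (illustrative assumptions only): a fully
# irreversible 3-D machine must push all of its dissipated heat out through
# its surface, so the sustainable per-device operation rate falls off in
# inverse proportion to the machine's width.

def max_irreversible_rate(width_m,
                          device_density=1e24,  # devices per cubic meter (assumed)
                          energy_per_op=1e-19,  # joules dissipated per operation (assumed)
                          max_heat_flux=1e6):   # removable watts per square meter (assumed)
    """Maximum ops/second per device for a cube-shaped, fully irreversible machine."""
    devices = device_density * width_m ** 3        # grows with the volume
    max_power = max_heat_flux * 6 * width_m ** 2   # limited by the surface area
    return max_power / (devices * energy_per_op)   # falls as 1 / width

for width in (1e-6, 1e-3, 1.0):                    # 1 micron, 1 mm, 1 m
    print(f"width {width:g} m -> {max_irreversible_rate(width):.2g} ops/s per device")

A (nearly) reversible machine, whose dissipation per operation can be driven toward zero, is not bound by this 1/width falloff; that difference is the origin of the crossover scale mentioned above.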

In the course of our research to date, we have actually implemented and tested physical instantiations of our model using standard silicon VLSI processes, including a simple universal parallel architecture, and a more conventional RISC-style CPU.  With present technology, the number of devices in an inexpensive machine is not yet so great as to yield benefits in raw performance per unit cost from the use of reversible technology.  However, even today, reversible technology can yield enormous increases in performance per unit power, which may have near-term practical applications in contexts such as mobile, embedded, or space-based systems where power limitations are critical.

Finally, an interesting long-term insight gained from our work on reversible computing is that the underlying physical fact of reversibility cannot be efficiently hidden away at a low level, and instead, for optimal scalability of computation to ever-larger numbers of processing elements, reversibility must eventually be propagated all the way up to the level of the instruction-set architecture, programming language, and application algorithms.  So, a large fraction of the field of computer science will eventually need to evolve to accommodate a significant concern with reversibility, if we are truly to approach the ultimate physical limits of computing.

Proposed Plan of Research

The aim of the proposed research is fourfold:
  1. To continue to refine our theoretical understanding of the fundamental physical limits of computation;

  2. To continue our theoretical and design work on models of computation, computer architectures, programming languages, and algorithms that are explicitly designed with the intent to attain the maximum asymptotic performance that is compatible with physical law, within competitive constant factors;

  3. To continue our work on developing implementations of our designs that are cost-effective for near-term specialized applications in low-power computing and densely-packed supercomputing;

  4. To use our insights from the physics of computation to help guide the widespread, longer-term efforts to develop a cost-effective, general purpose nano-scale computing technology to replace the MOSFET transistor, once that device has reached its physical limits.

Here are my plans for working towards these four goals, in more detail.

1. Fundamental physical limits.

This part of the work will involve an ongoing personal effort, with assistance from my colleagues, to understand more deeply the fundamental implications that physics has for computation (and vice versa).  This is a long-term effort; its currently evident aspects include:
  1. Quantum computing.  It is important for this effort to continue to keep up with the latest developments in quantum computing (theoretical & experimental).  If and when practical, scalable quantum computing starts to look feasible, it should be included in our model, and we should focus more effort on developing quantum coherent algorithms.

  2. Entropy density.  We would like to learn more about the fundamental quantum limits on entropy/information densities.  Presently understood bounds [,] are rather loose and approximate.  We would like to know more accurately how high an information density can realistically be achieved, as a function of energy density.

  3. Relationship between physical entropy and computational information.  We think we have gained significant new insights into the meaning of physical entropy as a result of our work in reversible computation.  Specifically, physical entropy can be simply defined, for all practical purposes, as the amount of physically-expressed information in a system that (for one reason or another) cannot be uncomputed.  (Here, "uncomputing" refers to any process that clears some redundant, encoded information by performing, in reverse, a series of physical steps that could have generated that information; a toy illustration of this compute/uncompute pattern follows this list.)  We have been invited by the journal Entropy to contribute a paper to discuss this viewpoint in greater depth.

  4. Relativistic quantum models.  I would like to better understand the computational implications, if any, of relativistic quantum physics (QED, QCD, etc.).  To this end I will educate myself further about these fields.

  5. General relativity.  We would also like to attain a better understanding of the (very long-term) computational implications of general relativity.  Gravitational limits are the only fundamental physical constraint on computation that we know of that is not yet well reflected in our models.  (Of course, gravitational effects are not expected to become important any time soon, so this task has a relatively low priority.)
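To make the notion of "uncomputing" in item 3 concrete, here is a toy Python sketch of the standard compute / copy-out / uncompute pattern.  It is purely illustrative (the function and variable names are invented for this example): intermediate information is generated by reversible steps, the answer is copied out, and the intermediates are then cleared by undoing those same steps in reverse, so that no information ever has to be irreversibly erased.

# Toy illustration of compute / copy-out / uncompute: intermediate information
# is produced by recorded, reversible steps, the answer is copied out, and the
# recorded steps are then undone in reverse order, leaving the scratch state
# blank without any irreversible erasure.

def square_reversibly(x):
    history = []                       # record of the reversible steps taken
    acc = 0

    for _ in range(x):                 # forward phase: acc grows to x * x
        acc += x
        history.append(('add', x))

    result = acc                       # copy out the desired answer

    for op, amount in reversed(history):   # uncompute phase: undo newest first
        if op == 'add':
            acc -= amount              # exact inverse of the forward step
    assert acc == 0                    # scratch register restored to blank

    return result

print(square_reversibly(7))            # -> 49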

2. Physics-based computing models and architectures.

The second major goal of this research is to design and develop optimally efficient and effective computing architectures, at all levels from theoretical models through processor architectures, languages, and algorithms.  This work includes:
  1. Model building and dissemination.  Although the basic structure of our physical models of computing has already been defined [], work remains to formalize the models more thoroughly, and especially to disseminate what we have learned about the advantages of these types of models to the wider theoretical computer science community, through publications in more widely read journals.

  2. Asymptotic advantages of reversible models.  Though we have already published two papers and a thesis chapter [,] on this topic, some of our latest results in this area have not yet appeared in journal form, and all of these results need to be cleaned up and presented more formally and rigorously.  Some further extensions of the scaling theory are also possible.

  3. Theoretical bounds on overhead from reversibility.  The polynomial asymptotic performance advantages of reversible models, which arise from thermal constraints, are attenuated by small algorithmic overheads that are generally incurred in reversible computing.  We have made significant progress [,] towards establishing tight theoretical lower bounds on the worst-case amount of this overhead, and we believe this work can be completed fairly readily.  This will enable a more precise and definitive characterization of the maximum asymptotic speedup that is attainable through reversibility in the general case.

  4. Reversible algorithms.  Beyond the general case, reversibility yields even greater speedups for many specific classes of problems for which highly efficient reversible algorithms exist [].  This line of work will develop efficient reversible equivalents for a wide variety of important conventional algorithms.

  5. Reversible hardware architectures.  A key issue is devising an efficient hardware architecture with a convenient programming model.  Conventional irreversible architectures are still evolving rapidly, with many interesting recent innovations such as reconfigurable [] and processor-in-memory [] architectures.  We foresee an ongoing effort to develop efficient partially- to fully-reversible versions of the best ideas in conventional irreversible architectures.

  6. Reversible programming languages.  Similarly, we would like to develop asymptotically efficient reversible variants of the best new ideas in programming languages.  We note that a large part of the challenge of developing an asymptotically efficient programming language is to develop an efficient & convenient language for expressing parallel computations.  This is a problem that, to our knowledge, the parallel computing community has not yet solved in a fully satisfactory way.  Our work will aim to integrate our insights in the physics of computing with the parallel programming challenge.  (For example, perhaps an effective reversible, parallel programming language would be similar to a language for specifying an assembly-line process in a physical factory.  This makes sense because information, like material objects, cannot just appear and disappear, but rather can be moved from place to place, assembled (computed), and disassembled (uncomputed).)  A minimal illustration of this reversible style of programming appears below.
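As a purely illustrative sketch of the flavor of programming model item 6 aims at (the operation names, registers, and program here are invented for illustration and are not a proposed language design), the following Python fragment treats each primitive as an exactly invertible update of machine state, so that an entire program can be undone step by step:

# Minimal sketch of a reversible "instruction set": every primitive updates
# the machine state invertibly, so a whole program can be run forwards and
# then undone exactly, step by step.

regs = {'a': 5, 'b': 3, 'c': 0}

def add(dst, src):  regs[dst] += regs[src]                # undone by sub
def sub(dst, src):  regs[dst] -= regs[src]                # undone by add
def swap(x, y):     regs[x], regs[y] = regs[y], regs[x]   # its own inverse

INVERSE = {add: sub, sub: add, swap: swap}

program = [(add, 'c', 'a'),    # c += a
           (add, 'c', 'b'),    # c += b   (c now holds a + b)
           (swap, 'a', 'b')]   # exchange a and b

for op, *args in program:                # run the program forwards
    op(*args)
print(regs)                              # {'a': 3, 'b': 5, 'c': 8}

for op, *args in reversed(program):      # run it backwards: undo every step
    INVERSE[op](*args)
print(regs)                              # {'a': 5, 'b': 3, 'c': 0} (restored)

Note that a destructive assignment (simply overwriting a register with 0, say) has no place in such an instruction set, since it would discard information rather than moving or uncomputing it.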

3. Development of Practical Near-Term Applications.

There are a number of interdisciplinary challenges that remain for bringing reversible computing ideas to practical fruition, even for the more near-term applications such as ultra-low-power computing in energy-limited environments, and for densely-configured supercomputing.  These challenges include:
  1. Power supplies.  Develop a high-Q resonant oscillator with the right characteristics for driving MOSFET-based reversible ("adiabatic") circuits.  Currently we think that MEMS-based mechanical oscillators are the most promising approach, as they can attain Q's in the tens of thousands and sufficiently high frequencies, while remaining reasonably compact.  I am currently collaborating with MEMS experts Toshi Nishida (in UF's ECE department) and Sam Miller (at Sandia National Labs) to work out the details of this approach, and I am also exploring analog designs with power electronics expert Khai Ngo, also in ECE.

  2. Leakage bounds.  More accurately characterize expected leakage characteristics of future MOSFET technology generations to determine upper bounds on the feasible performance/power ratio of adiabatic technology.  (Leakage is an important limiting factor on all low-power MOSFET technologies.)  Current estimates are that adiabatic technology will enable substantial low-power advantages throughout at least the next 10 years of MOSFET technology, but for a more precise forecast, I am talking with researchers who are involved in producing the ITRS (International Technology Roadmap for Semiconductors).  (The basic trade-off between adiabatic ramp losses and leakage losses is sketched following this list.)

  3. Packaging.  For dense supercomputing applications, better 3-D packaging and SOI (silicon-on-insulator) technology need to be developed to enable large numbers (hundreds to thousands) of layers of reversible circuitry to be cost-effectively integrated into modestly-sized packages.  Sandia National Labs also has experience in that area, and I am talking with Thomas Fischer, head of their packaging division, about working together on this problem.

  4. RAMs.  Good, dense adiabatic RAMs with long-term data retention need to be developed.  Adiabatic SRAM and DRAM designs exist, but the SRAMs are not very dense, and the DRAMs may have problems with refresh rates.  I am talking with Joe Brewer, an expert in nonvolatile RAMs, about alternative ideas in this area.

  5. Functional elements.  Work needs to be done on developing more efficient adiabatic versions of critical functional circuit elements such as adders, multipliers, etc.  This will involve developing hardware equivalents of known efficient reversible algorithms.  This work will be done in collaboration with Bill Eisenstadt, a VLSI circuits expert in UF's ECE department.

  6. Architecture & algorithms.  The reversible architecture and algorithms work mentioned in the previous section will of course also contribute to this effort, although practical near-term reversible architectures need not be fully reversible.

  7. Applications.  Last but not least, more work is needed to identify specific, suitable near-term applications for this technology.  In small systems, the greater energy efficiency of the adiabatic approach comes at the price of somewhat lower logic-hardware efficiency.  Therefore, it is not cost-effective except in applications where total costs are limited more by energy-related costs than by the cost of the logic hardware itself.  Identifying such applications and quantifying the improvements in cost-efficiency that can be attained is of course critical to the development of reversible computing as a practical near-term technology for the low-power niche.  I am drawing upon a number of university and industry contacts for more information about application requirements in this area.

  8. Experimental validation.  In addition to creating the above-mentioned designs & characterizing them theoretically & through simulation, we have a strong desire to thoroughly validate our work with physical experiments as we go along.  This will ensure that our understanding of the relevant physical issues does not miss anything critical.  All of the above design elements, except for the new packaging technology, can be implemented straightforwardly using academically available design tools and lithographic processes.  (For advanced packaging, we will rely more on collaborations with industrial partners such as Sandia.)  Testing of components and systems can primarily proceed in a typical electronics workbench environment, although we would like to augment the electronic testing with thermal imaging and high-precision thermoelectrics-based calorimetry (such as that described in []) to enable us to isolate parts of our system that are dissipating too much power, and also to accurately measure the dissipation in subsystems that are operating in an extremely low-power regime.
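As a rough, purely illustrative sketch of the energy trade-off underlying items 1 and 2 above (all parameter values are assumed round numbers, not process data or results from this work): adiabatic switching loss falls as the ramp time T is stretched, while leakage loss grows with T, so there is an optimum ramp time and a corresponding minimum energy per operation, which can sit well below the conventional (1/2)CV^2 switching energy as long as leakage stays small.

import math

# Rough sketch of the trade-off behind adiabatic ("reversible") CMOS logic.
# Conventional switching dissipates about (1/2) C V^2 per transition.
# Adiabatic charging through an effective resistance R over a ramp time T
# dissipates roughly (RC/T) * C V^2 (for T >> RC), while leakage adds
# roughly V * I_leak * T per cycle.  All values below are assumed,
# illustrative round numbers, not process data.

C      = 1e-14    # node capacitance, farads (assumed)
V      = 1.0      # supply voltage swing, volts (assumed)
R      = 1e4      # effective on-resistance, ohms (assumed)
I_leak = 1e-10    # leakage current per node, amps (assumed)

conventional = 0.5 * C * V**2                        # ~5e-15 J per transition

def adiabatic_energy(T):
    return (R * C / T) * C * V**2 + V * I_leak * T   # ramp loss + leakage loss

# Ramp time minimizing total energy (found by setting the derivative to zero):
T_opt = C * math.sqrt(R * V / I_leak)
print(f"optimal ramp time   ~ {T_opt:.2e} s")
print(f"conventional energy ~ {conventional:.2e} J per op")
print(f"adiabatic energy    ~ {adiabatic_energy(T_opt):.2e} J per op")

With these particular assumed values the optimum ramp time comes out near 100 ns and the energy per operation lands a couple of hundred times below the conventional figure; the real numbers will of course depend on the actual device and process parameters to be characterized in this work.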

4. Help guide efforts towards nano-scale computing technologies.

Part of this work will involve keeping up with the latest research developments in all the promising areas dealing with potential future nano-scale computing mechanisms.  Of course, the set of candidate technologies will change as promising new approaches arise.  The purpose of studying these emerging technologies is to enable accurate forecasting of their computationally relevant parameters, which will let us more accurately determine the size/cost scale of machines above which our new asymptotically optimal architectures will begin to dominate.  Thus we will be better able to anticipate when will be the best time to focus our maximum effort on pushing these novel architectures into the mainstream.

In addition to influencing computer architecture, there is an important need to disseminate, to researchers who are working on new low-level device technologies, some of the important insights about device design that have been learned from studies of physical models of computing, most importantly the need for devices to support logically and thermodynamically reversible modes of operation.

We believe there is a valuable role to be played in making sure that applied physicists and device technologists who are working on future nano-scale computing mechanisms are well aware of these issues, as it will help enable machines based on these mechanisms to be as scalable as possible.

Conclusion

In summary, I am planning an ambitious, multifaceted research career, pursuing an interdisciplinary but unified set of scientific and engineering studies that are expected to have a significant impact across many areas of computer engineering and computer science, and that ultimately will help computing performance to continue to improve as rapidly as possible, for as long as possible, so as hopefully to foster continuing national and international prosperity.

I am preparing now to submit NSF proposals to continue the significant start that I have already made towards this goal during the course of my Ph.D. work and in my first year as a faculty member at UF.  I anticipate making even more rapid progress in the above-outlined areas over the next few years, as I recruit graduate students, continue to build my industry & university collaborations, and publish a variety of journal articles describing this work (I have already been invited to submit articles relating to this work to four different journals).  Over time, I hope that the continuation of this project, together with its network of collaborators, will become an important nexus in the worldwide research effort pushing boldly onwards towards the ultimate physical limits of computation.

I hope that the NSF and its directorates will agree with me that the proposed research program represents a significant potential long-term scientific contribution towards the future of our technology and of our civilization, and that the time is now ripe to pursue it.