We have developed a high-performance extensible simulation engine in C++ that currently uses threads to exploit the power of parallel execution on shared-memory multi-processors. In the future, we will also incorporate message-passing interfaces to support modern hybrid platforms consisting of clusters of shared-memory multi-processors.
Our primary objective has been to perform first-principles calculations of microscopic quantities on systems of macroscopic size, such as 10^{15}--and this is only the beginning!
The dynamic recursion method is another player in the world of simulation tools. As such, it coexists with methods like molecular dynamics, Monte Carlo, fluid dynamics, etc., but is not directly related to them. Despite numerous recent advances in software and hardware, many of these tools still struggle with the seemingly hopeless problem of 10^{23} coupled differential equations in macroscopic condensed-matter systems. With PaRP, the dynamic recursion method allows us for the first time to bridge the gap between the microscopic and macroscopic worlds in a numerical context.
What to do? The obvious approach of limiting the number of electrons (or particles, in general) is hardly a viable option, not only because it was already done when computers were far less powerful, but also for more fundamental reasons. On the one hand, the probabilistic nature of quantum mechanics usually prevents a simple extrapolation from microscopic to mesoscopic or macroscopic systems. On the other hand, the smallest experimentally accessible systems in solid state physics, such as quantum dots, still contain on the order of thousands of electrons.
Therefore, clever approximations are needed to cope with the sheer size of the task. The dynamic recursion method has such an approximation built in, in that it stresses local quantities over global quantities. Local quantities are most strongly dominated by the immediate surroundings of the site (or state) of interest, e.g. in a lattice. Whatever happens far away from that site only weakly influences the local quantity and can be neglected altogether beyond a certain distance. This lies at the heart of the recursion method and is sometimes also called the "black-body theorem," in reminiscence of the fact that the electromagnetic mode density in a cavity is insensitive to the actual shape of the cavity.
An interesting side-effect of this "black-body theorem" is that numerical errors, which are inevitably incurred with finite-precision computer arithmetic, do not grossly distort the result because their influence, too, decays with distance and does so at an exponential rate. Dynamic recursion takes the recursion method one step further by systematically exploiting this insensitivity to errors. It is then possible to omit small contributions to the local quantity without introducing runaway errors. Thereby, sticking with our example, not all the 2^{100} electronic states must be represented in computer memory at all times, but only the ones with "large" contributions. This is what makes these problems tractable.
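The locality and truncation ideas described above can be illustrated with a minimal sketch. It is not the PaRP implementation: it assumes the standard three-term recursion H u_n = a_n u_n + b_{n+1} u_{n+1} + b_n u_{n-1} started from a local state, a toy Hamiltonian (an infinite 1D chain with nearest-neighbour hopping t), and a simple amplitude threshold as the rule for omitting small contributions. All names (SparseVec, run_recursion, prune_small, threshold) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Sparse state: site index -> amplitude. Only sites that were reached are stored.
using SparseVec = std::map<long, double>;

// Toy Hamiltonian (assumption): infinite 1D chain, zero on-site energy,
// nearest-neighbour hopping t.
SparseVec apply_hamiltonian(const SparseVec& v, double t) {
    SparseVec out;
    for (const auto& kv : v) {
        out[kv.first - 1] += t * kv.second;
        out[kv.first + 1] += t * kv.second;
    }
    return out;
}

// Scalar product over the sites stored in both vectors.
double dot(const SparseVec& x, const SparseVec& y) {
    double sum = 0.0;
    for (const auto& kv : x) {
        auto it = y.find(kv.first);
        if (it != y.end()) sum += kv.second * it->second;
    }
    return sum;
}

// w += factor * v
void add_scaled(SparseVec& w, const SparseVec& v, double factor) {
    for (const auto& kv : v) w[kv.first] += factor * kv.second;
}

// Drops every entry with |amplitude| < threshold (hypothetical selection rule).
void prune_small(SparseVec& v, double threshold) {
    for (auto it = v.begin(); it != v.end();) {
        if (std::fabs(it->second) < threshold) it = v.erase(it);
        else ++it;
    }
}

struct RecursionResult {
    std::vector<double> a;            // a_0, a_1, ...
    std::vector<double> b;            // b_1, b_2, ...
    std::vector<std::size_t> stored;  // entries held in u_n at each level
};

// Three-term recursion started from the local state u_0 = |start_site>.
RecursionResult run_recursion(long start_site, int levels, double t, double threshold) {
    assert(levels >= 0 && threshold >= 0.0);
    RecursionResult result;
    SparseVec previous;                 // u_{n-1}; empty for n = 0
    SparseVec current;                  // u_n
    current[start_site] = 1.0;
    double b_n = 0.0;
    for (int n = 0; n < levels; ++n) {
        result.stored.push_back(current.size());
        SparseVec w = apply_hamiltonian(current, t);
        const double a_n = dot(current, w);
        add_scaled(w, current, -a_n);   // remove the component along u_n
        add_scaled(w, previous, -b_n);  // remove the component along u_{n-1}
        const double b_next = std::sqrt(dot(w, w));
        result.a.push_back(a_n);
        result.b.push_back(b_next);
        if (b_next == 0.0) break;       // recursion terminated exactly
        for (auto& kv : w) kv.second /= b_next;
        prune_small(w, threshold);      // the norm is then only approximately 1
        previous = std::move(current);
        current = std::move(w);
        b_n = b_next;
    }
    return result;
}
```

Calling `run_recursion(0, 3, 1.0, 0.0)` on this toy chain gives a_n = 0, b_1 = sqrt(2) and b_2 = 1, and the stored vector grows by one site in each direction per level. With a small threshold such as `1e-12` the same coefficients are obtained while fewer entries are kept, which is the effect the text attributes to dynamic recursion. How PaRP actually selects the "large" contributions is not specified here.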
A new implementation was therefore desired that remedies these deficiencies and also supports the latest high-performance hardware architectures. From a computer science point of view, our objectives can be summarized as follows:
- Automatic storage scheme for the Hamiltonian matrix and associated vectors
- Built-in matrix-vector multiplication algorithm interfacing with this storage scheme
- Automatic partitioning of the problem and data structures for concurrent execution
- Definition of an API that hides details of matrix storage, multiplication routines and parallel execution
- Extensibility and adaptability to a wide range of physics problems while re-using existing code
- Use of modern object-oriented C++ facilities (encapsulation, inheritance, templates) to achieve these objectives with highly adaptable, low-overhead and easy-to-read code
We have developed an initial prototype in C++ using the pthread library for thread-based concurrent execution on shared-memory systems (currently SGI Power Challenge). Today's highest-performance platforms, however, consist of clusters of shared-memory multiprocessors interconnected with high-speed links (HIPPI). This architecture requires a thread-based approach inside each shared-memory machine and a message-passing strategy between machines. At the same time, one has to ensure that the memory affinity of data structures is high, because otherwise effects like network latency, non-uniform memory access and cache misses will degrade performance to the point where multi-processing gains are wiped out.
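The storage, multiplication, partitioning and API objectives listed above can be pictured with a small sketch. It is an illustration under stated assumptions, not the PaRP interface: the class names (LinearOperator, CsrMatrix), the compressed-sparse-row layout and the helper partition_rows are all hypothetical. Thread creation is left out to keep the sketch short; each row range writes a disjoint slice of the result, so in a threaded version every range could be handed to one thread (for example a pthread) without locking.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical interface: callers only see "y = H x", not the storage layout.
class LinearOperator {
public:
    virtual ~LinearOperator() {}
    virtual std::size_t dimension() const = 0;
    // Computes y[r] = sum_c H(r, c) * x[c] for rows r in [row_begin, row_end).
    virtual void multiply_rows(const std::vector<double>& x, std::vector<double>& y,
                               std::size_t row_begin, std::size_t row_end) const = 0;
};

// One possible storage scheme (assumption): compressed sparse row (CSR).
class CsrMatrix : public LinearOperator {
public:
    CsrMatrix(std::vector<std::size_t> row_start, std::vector<std::size_t> column,
              std::vector<double> value)
        : row_start_(std::move(row_start)), column_(std::move(column)),
          value_(std::move(value)) {
        assert(!row_start_.empty() && row_start_.back() == value_.size());
        assert(column_.size() == value_.size());
    }

    std::size_t dimension() const override { return row_start_.size() - 1; }

    void multiply_rows(const std::vector<double>& x, std::vector<double>& y,
                       std::size_t row_begin, std::size_t row_end) const override {
        for (std::size_t r = row_begin; r < row_end; ++r) {
            double sum = 0.0;
            for (std::size_t k = row_start_[r]; k < row_start_[r + 1]; ++k)
                sum += value_[k] * x[column_[k]];
            y[r] = sum;
        }
    }

private:
    std::vector<std::size_t> row_start_;  // row r occupies [row_start_[r], row_start_[r+1])
    std::vector<std::size_t> column_;     // column index of each stored element
    std::vector<double> value_;           // value of each stored element
};

typedef std::pair<std::size_t, std::size_t> RowRange;

// Splits [0, n) into `parts` contiguous ranges whose sizes differ by at most one.
std::vector<RowRange> partition_rows(std::size_t n, std::size_t parts) {
    assert(parts > 0);
    std::vector<RowRange> ranges;
    const std::size_t base = n / parts;
    const std::size_t extra = n % parts;
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t size = base + (p < extra ? 1 : 0);
        ranges.push_back(RowRange(begin, begin + size));
        begin += size;
    }
    return ranges;
}

// Driver: the ranges are processed one after another here; they are independent
// of each other and could be dispatched to separate threads instead.
std::vector<double> multiply(const LinearOperator& h, const std::vector<double>& x,
                             std::size_t parts) {
    assert(x.size() == h.dimension());
    std::vector<double> y(h.dimension(), 0.0);
    const std::vector<RowRange> ranges = partition_rows(h.dimension(), parts);
    for (std::size_t p = 0; p < ranges.size(); ++p)
        h.multiply_rows(x, y, ranges[p].first, ranges[p].second);
    return y;
}
```

Keeping each contiguous row range, and the slice of the result it writes, with the same worker is also one simple way to keep memory affinity high on a shared-memory machine.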
Our future efforts will focus on supporting these hybrid architectures, e.g. SGI array systems or Linux clusters. One option is to add MPI support for communication between machines. The other option is to rewrite the code using existing portable thread and communication packages such as Tulip, SMARTS and Nexus. In addition, we are evaluating to what extent we can adapt our interfaces to conform to standard scientific application frameworks such as POOMA, to scripting interfaces such as SILOON, or to component models supported by projects such as Ligature.
In past and present work, the value of visualization of recursion vectors has been underappreciated. In the figure above, which is a first attempt at displaying the recursion vectors, we notice patterns that are reminiscent of interference, but their interpretation is as yet unclear. We believe important physical insights may be gained from visualization. For example, in disordered systems that we have investigated, this interference-like pattern breaks down and grows at a disorder-dependent rate. Our first steps towards visualization consisted of gathering data from the parallel application on one node and dumping them to disk. These data are then processed and displayed in a second stage with different applications. The amount of data generated this way is enormous (several hundred MB for only 200 levels, corresponding to a 3D sample of 400*400*400 sites), so this procedure is too crude to handle larger simulations. In order to follow a recursion simulation run in real time and step by step, we would need to employ high-performance parallel visualization tools such as the ACL visualization tools with parallel point-to-point transfers (e.g. PAWS). This avenue, although promising and interesting, is currently not pursued, mostly due to a lack of manpower.
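As a rough consistency check on the data volume quoted above, the size of a single dense vector on the 400*400*400 sample can be estimated. The estimate assumes one double-precision real amplitude (8 bytes) per site, which the text does not state.

```latex
N_{\text{sites}} = 400^{3} = 6.4 \times 10^{7},
\qquad
M_{\text{vector}} = N_{\text{sites}} \times 8\ \text{bytes}
                  = 5.12 \times 10^{8}\ \text{bytes} \approx 512\ \text{MB}
```

Under that assumption, one such vector alone already amounts to several hundred MB.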
If you have comments or suggestions,
email me at wolfram@darkwing.uoregon.edu
Last modified: Mon Mar 20 09:55:01 PST 2000