The Parallel Recursion Project


Welcome to the Parallel Recursion Project Page. Please skip to the  Introduction for more information.

NOTE:

March 20, 2000: A new version of PARP has been completed recently. This web page will be updated shortly. Please check back soon.

surface.jpg
The figure shows the magnitudes and distribution of the components of  the 100th transformation vector (tridiagonal basis vector) for a two-dimensional square lattice. The x-y plane in the figure corresponds to the lattice plane, the magnitude of the components is plotted along the z-axis and is also color-coded.

Introduction

The PaRP (Parallel Recursion Project) is a software package for materials science numerical simulations on high-performance computing platforms. It is based on the dynamic recursion method which is a method suited for physical systems that can be expressed in terms of sparse matrices (Hamiltonians). The latter is possible for all physical systems whose interactions are short-ranged in real space.

We have developed a high-performance extensible simulation engine in C++ that currently uses threads to exploit the power of parallel execution on shared-memory multi-processors. In the future, we will also incorporate message-passing interfaces to support modern hybrid platforms consisting of clusters of shared-memory multi-processors.

Our primary objective has been to perform first-principles calculations of microscopic quantities on systems of macroscopic size, such as 1015--and this is only the beginning!

The dynamic recursion method is another player in the world of simulation tools. As such is coexists with methods like molecular dynamics, Monte Carlo, fluid dynamics, etc. but is not directly related to these. Despite numerous recent advances in software and hardware, many of these tools struggle to the present day with the seemingly hopeless problem of 1023 coupled differential equations in macroscopic condensed-matter systems. With PaRP, the dynamic recursion method allows us for the first time to bridge the gap between microscopic and macroscopic worlds in a numerical context.

The Problem, Motivation, and Physics

This kind of potential is necessary to be prepared for tomorrow's applications in physics. One of the coming challenges in numerical physics is the quest for correlated systems. Many interesting phenomena arise from correlations and interactions between electrons, ions, excitons, etc. Examples are superconductivity, the fractional quantum Hall effect, metal-insulator transitions, phase transitions, etc. The intrinsic difficulty with these systems is their enormous state-space or, in other words, the mindboggling number of configurations of the system that must be accounted. Consider that a system with 100 electronic states has already 2100 distinct configurations. To store all configurations, one would need 2100 bits (= 257TByte ~ 1017TByte)!! Without approximations, this is obviously beyond the scope of what any numerical method (or analytic for that matter--except maybe for a number of narrow special cases) can be reasonably expected to achieve.

What to do? The obvious thing of limiting the number of electrons (or particles, in general) included is hardly a viable option--not only because it has been done in times when computers were more wimpy--but also because of other reasons: On the one hand, the probabilistic nature of quantum mechanics usually prevents a simple extrapolation from microscopic to mesoscopic or macroscopic systems. On the other hand, the smallest experimentally accessible systems in solid state physics, such as quantum dots, still contain on the order of thousands of electrons.

Therefore, clever approximations are needed to cope with sheer size of the task. The dynamic recursion has auch an approximation built-in, in that it stresses local quantities over global quantities. Local quantities are most strongly dominated by the immediate surroundings of the site (or state) of interest, e.g. in a lattice.Whatever happens far away from that site of interest only weakly influences the local quantity and can be neglected altogether beyond a certain distance. This lies at the heart of the recursion method and is sometimes also called "black body theorem," in reminiscence of the fact that the electro-magnetic mode density in a cavity is insensitive to the actual shape of the cavity.

An interesting side-effect of this "black-body theorem" is that numerical errors, which are inevitably incurred with finite precision computer arithmetic, do not grossly distort the result because their influence, too, decays with distance and does so at an exponential rate. Dynamic recursion takes the recursion method one step further by systematically exploiting this insensitivity to errors. It is then possible to omit small contributions to the local quantity without introducing run-away errors. Thereby, sticking with our example, not all the 2100 electronic states must be represented in computer memory at all times, but only the ones with "large" contributions. This is what makes these problems tractable.


The Objectives, Solutions, and Computer Science

When developing a large software package or library, one of the goals should be to make it accessible to other users and application programmers. With the recursion method, this has been tricky. In the past, a number of applications based on the recursion method have been developed and a library of recursion-related routines, the Cambridge Recusion Library (Fortran) exists. As a result of the recursion method's being a rather general purpose technique with wide applicability, it is difficult to design a library which satisfies all possible applications for different materials, Hamiltonians, models, etc. In fact, in the past library support has therefore been limited to helper routines, with the application programmer bearing the brunt of the work, namely writing the central routines for storing the Hamiltonian matrix and the associated transformation vectors and for multiplying them. This was not only time-consuming but it also required considerable programming skill on the part of the physicist wishing to employ the recursion method for a specific simulation. Coincidentally, the user-supplied code is the most CPU-time and memory intensive portion, thus critically affecting performance.

A new implementation was therefore desired that remedies these deficiencies and also supports the latest high-performance hardware architectures. From a computer science point of view, our objectives can be summarized as follows:

We have developed an initial prototype in C++ using the pthread library for thread-based concurrent execution on shared-memory systems (currently SGI Power Challenge). Today's highest-performance platforms, however, consist of clusters of shared-memory multiprocessors interconnected with high-speed links (HIPPI). This unique architecture requires a thread-based approach inside each shared-memory machine and a message-passing strategy between machines. At the same tim, one has to ensure that the memory affinity of data structures is high, because if ignored effects like network latency, non-uniform memory access and cache misses will all degrade performance to the point where muli-processing gains are wiped out.

Our future efforts will focus on supporting these hybrid architectures, e.g. SGI array systems or Linux clusters. One option is to add MPI support for communication between machines. The other option is to rewrite the code using existing portable threads and communications packages such as Tulip, SMARTS and Nexus. In addition, we are evaluating to what extent we can adapt our interfaces to conform to standard scientific applications frameworks such  POOMA. to scripting interfaces such as SILOON, or to component models which are supported by projects such as Ligature.

In past and present work, the value of visualization of recursion vectors has been underappreciated. In the figure above, which is a first attempt at displaying the recursion vectors, we notice patterns that are reminiscent of interference, but their interpretation is as yet unclear. We believe important physical insights may be gained from visualization. For example, in disordered systems that we have investigated this interference-like pattern breaks down and grows at a disorder-dependent rate. Our first steps towards visualization have been done by gathering data from the parallel application on one node and dumping them to disc. These data are then processed and displayed in a second stage with differenct applications. The amount of data generated this way is enormous (several hundred MB for only 200 levels--corresponding to a 3D sample of 400*400*400 sites), so that this procedure is too crude to handle larger simulations. In order to follow a recursion simulation run in real time and step-by-step, we would need to employ high-performance parallel visualization tools such as the ACL visualizaton tools with parallel point-to-point transfers (e.g. PAWS). This avenue, although promising and interesting, is currently not pursued, mostly due to a lack of manpower.

Papers

Talks & Presentations

Contributors and Collaborators

Wolfram T. Arnold, Roger Haydock.
We wish to thank Sameer Shende (TAU) for invaluable input and support, Janice Cuny and John Conery for useful discussions.

Thank you for visiting. Check back frequently for updates.
If you have comments or suggestions, email me at wolfram@darkwing.uoregon.edu

Last modified: Mon Mar 20 09:55:01 PST 2000