Power and Performance Optimization at the System Level

The BlueGene/L supercomputer has been designed with a focus on power/performance effciency to achieve high application performance under the thermal constraints of common data centers. To achieve this goal, emphasis was put on system solutions to engineer a power-efficient system. To exploit thread level parallelism, the BlueGene/L system can scale to 64 racks with a total of 65536 computer nodes consisting of a single compute ASIC integrating all system functions with two industry-standard PowerPC microprocessor cores in a chip multiprocessor configuration. Each PowerPC processor exploits data-level parallelism with a high/performance SIMD floating point unit.

To support good application scaling on such a massive system, special emphasis was put on efficient communication primitives by including five highly optimized communication networks. After an initial introduction of the BlueGene/L system architecture, we analyze power/performance efficiency for the BlueGene system using performance and power characteristics for the overall system performance (as exemplified by peak performance numbers).

To understand application scaling behavior, and its impact on performance and power/performance efficiency, we analyze the NAMD molecular dynamics package using the ApoA1 benchmark. We fid that even for strong scaling problems, BlueGene/L systems can deliver superior performance scaling and deliver significant power/performance efficiency. Application benchmark power/performance scaling for the voltage-invariant energy*delay² power/performance metric demonstrates that choosing a power-efficient 700MHz embedded PowerPC processor core and relying on application parallelism was the right decision to build a powerful, and power/performance efficient system.

By: Valentina Salapura; Randy Bickford; Matthias Blumrich; Arthur A. Bright; Dong Chen; Paul Coteus; Alan Gara; Mark Giampapa; Michael Gschwind; Manish Gupta; Shawn Hall; Ruud A. Haring; Philip Heidelberger; Dirk Hoenicke; Gerard V. Kopcsay; Martin Ohmacht; Rick A. Rand; Todd Takken; Pavlos Vranas

Published in: RC23555 in 2005


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.


Questions about this service can be mailed to reports@us.ibm.com .