Scalable instruction-level parallelism through tree-instructions

We describe a representation of instruction-level parallelism which does not
require checking dependencies at run-time, and which is suitable for processor
implementations with varying issue-width. In this approach, a program is
represented as a sequence of tree-instructions, each containing multiple
primitive operations and executable either in one or multiple cycles, but
which do not require a specific processor organization. Tree-instructions
are generated assuming an implementation capable of large instruction-level
parallelism. During instruction cache reloading/accessing, tree-instructions
are decomposed into subtrees which fit the actual resources available in an
implementation, and which require a simple instruction-dispatch mechanism as
in the case of statically scheduled processors. The representation makes
practical the use of the same parallelized code in implementations with
different issue-width; simulation results indicate that the instruction-level
parallelism achievable with this approach degrades less than 10% with respect
to code compiled for each specific implementation.

By: Jaime H. Moreno and Mayan Moudgill

Published in: RC20661 in 1996


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

Questions about this service can be mailed to .