Advances in VLSI have made it attractive to package multiple processors into a single multichip
or board module. There is an increasing trend towards using such processor clusters in
large multiprocessor designs. Past research on designing processor-cluster based systems has focused
mainly on the packaging technologies affecting the inter-cluster network. To make
processor-cluster based multiprocessor design more attractive, there is a strong need to understand
the details about the topology inside the cluster, its memory organization, and the impact of this
organization on system performance. In this paper we focus on such aspects of processor-cluster
design, with the overall objective of supporting a logically shared address programming model. We analyze
the communication costs for accessing inter-cluster and intra-cluster memories under different
cluster organizations. The merits of these organizations are evaluated based on the performance
of collective communication operations, which occur frequently in applications. Specifically,
we focus on implementing the broadcast collective communication algorithm, U-mesh, on clustered
systems. Our results indicate that cluster organizations such as bus and crossbar, which allow memory
inside a cluster to be accessed without messaging overheads, outperform other organizations because
of faster intra-cluster access. We also demonstrate that such faster access can be exploited to
design better algorithms on clustered systems. We propose a new algorithm, clus mesh, for broadcasting
on clustered meshes. When communication within clusters is reasonably fast, this algorithm
can outperform the existing U-mesh algorithm by up to 20%.
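
As a rough illustration of why faster intra-cluster access can pay off, the following sketch compares two toy cost models for a one-to-all broadcast. It is not the algorithm evaluated in the paper: the flat baseline is a generic recursive-doubling broadcast, the clustered variant is a generic two-level scheme, and the function names, the parameters `t_inter` and `t_intra`, and the assumption that every step of the flat broadcast pays the full inter-cluster cost are all hypothetical.

```python
def ceil_log2(n: int) -> int:
    """Return ceil(log2(n)) for n >= 1 using exact integer arithmetic."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()


def flat_broadcast_time(num_procs: int, t_inter: float) -> float:
    """Toy model: recursive doubling over all processors.

    Assumption: every step is charged the inter-cluster message cost.
    """
    return ceil_log2(num_procs) * t_inter


def clustered_broadcast_time(num_clusters: int, procs_per_cluster: int,
                             t_inter: float, t_intra: float) -> float:
    """Toy model: broadcast across clusters first, then inside each cluster.

    Assumption: one processor per cluster takes part in the inter-cluster
    phase, and the intra-cluster phase uses the cheaper per-step cost t_intra.
    """
    inter_steps = ceil_log2(num_clusters)
    intra_steps = ceil_log2(procs_per_cluster)
    return inter_steps * t_inter + intra_steps * t_intra


if __name__ == "__main__":
    # Illustrative cost units only; these are not measurements.
    flat = flat_broadcast_time(64, t_inter=10.0)
    clustered = clustered_broadcast_time(16, 4, t_inter=10.0, t_intra=2.0)
    print(f"flat: {flat}, clustered: {clustered}")
```

In this model the clustered scheme gains only when `t_intra` is smaller than `t_inter`; when the two costs are equal and the sizes are powers of two, both functions return the same value.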