Routing and fault tolerance in Z-Fat tree

Mo Adda, Adamantini Peratikou

Research output: Contribution to journal › Article › peer-review


Abstract

Fat tree topologies have been extensively used as interconnection networks for high performance computing, cluster and data center systems, with their most recent variants able to extend and scale to accommodate higher processing power. While each successive fat-tree topology introduces further improvements, these networks do not fully address all the issues of large-scale HPC. We propose a topology called Zoned-Fat tree (Z-Fat tree), which is a further extension of the fat tree. The extension provides an extra degree of connectivity to utilize the extra ports per switch (routing node) that are, in some cases, left unused because of the architectural constraints of other fat-tree variants, and hence increases the bisection bandwidth, reduces the latency and supplies additional paths for fault tolerance. To support and profit from the extra links, we propose an adaptive low-latency routing for up traffic, which is based on a series of leading direction bits predefined at the source; furthermore, we suggest a deterministic routing by implementing a dynamic round robin algorithm that outperforms D-mod-K in some cases and guarantees the utilization of all the extra links. We also propose a fault tolerance algorithm, named recoil-and-reroute, which makes use of the extra links to ensure higher message delivery even in the presence of faulty links and switches.
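
For readers unfamiliar with the two deterministic up-port selection ideas named above, the following is a minimal sketch, not the paper's algorithms. It assumes a conventional k-ary fat tree with switch levels numbered from 0 at the leaves, uses a common textbook formulation of D-mod-K, and contrasts it with a plain per-switch round robin counter. The names `d_mod_k_up_port` and `RoundRobinSelector` are hypothetical, and the paper's dynamic round robin for Z-Fat trees is not specified in this abstract.

```python
def d_mod_k_up_port(dest: int, level: int, k: int) -> int:
    """Common D-mod-K formulation (assumption, not taken from the paper):
    a switch at `level` (0 = leaf level) forwards upward on port
    floor(dest / k**level) mod k, so the choice depends only on the destination."""
    return (dest // k ** level) % k


class RoundRobinSelector:
    """Plain round robin over a switch's up ports (illustrative only).

    Each call returns the next port in cyclic order, so every up port,
    including any extra ones, is used equally often over time.
    """

    def __init__(self, num_up_ports: int) -> None:
        if num_up_ports <= 0:
            raise ValueError("a switch needs at least one up port")
        self.num_up_ports = num_up_ports
        self._next = 0

    def next_port(self) -> int:
        port = self._next
        self._next = (self._next + 1) % self.num_up_ports
        return port


if __name__ == "__main__":
    # Destination 13 in a tree with k = 4 up ports per switch.
    print([d_mod_k_up_port(13, level, 4) for level in range(2)])

    # Four consecutive selections on a switch with 3 up ports.
    selector = RoundRobinSelector(3)
    print([selector.next_port() for _ in range(4)])
```

The contrast the sketch is meant to show: D-mod-K ties the up port to the destination address, so some ports can stay idle under skewed traffic, whereas a round robin counter cycles through every available port regardless of destination.
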
Original language: English
Pages (from-to): 2373-2386
Number of pages: 14
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 28
Issue number: 8
Early online date: 13 Feb 2017
Publication status: Published - 1 Aug 2017

Keywords

  • Adaptive routing
  • connectivity
  • deterministic routing
  • fault tolerance
  • fat-tree
