Abstract
Fat-tree topologies have been extensively used as interconnection networks for high-performance computing (HPC), cluster and data center systems, with their most recent variants able to extend and scale fairly well to accommodate higher processing power. While each successive fat-tree variant adds some improvements, these networks do not fully address all the issues of large-scale HPC. We propose a topology called the Zoned-Fat tree (Z-Fat tree), which is a further extension of the fat tree. The extension provides an extra degree of connectivity that uses the spare ports per switch (routing node) which, in some cases, are left unused because of the architectural constraints of other fat-tree variants; this increases the bisection bandwidth, reduces the latency and supplies additional paths for fault tolerance. To support and profit from the extra links, we propose an adaptive low-latency routing scheme for up traffic, based on a series of leading direction bits predefined at the source. Furthermore, we suggest a deterministic routing scheme that implements a dynamic round-robin algorithm, which outperforms D-mod-K in some cases and guarantees the utilization of all the extra links. We also propose a fault-tolerance algorithm, named recoil-and-reroute, which makes use of the extra links to ensure higher message delivery even in the presence of faulty links and switches.
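For readers unfamiliar with the baseline named above, the following is a minimal sketch of how D-mod-K is commonly described for a k-ary fat tree, next to a plain round-robin up-port selector. It is an illustration under stated assumptions, not the paper's algorithm: the function and class names, the level numbering (leaf switch = level 0), and the simple per-switch counter are all hypothetical, and the paper's dynamic round-robin scheme may differ.

```python
def d_mod_k_up_port(dest: int, level: int, k: int) -> int:
    """Up port picked by D-mod-K in its commonly cited form.

    Assumption: switches have k up ports and `level` counts hops above
    the leaf switches (leaf switch = 0). The choice depends only on the
    destination id, so it is deterministic per destination.
    """
    return (dest // k**level) % k


class RoundRobinUpPort:
    """Generic round-robin selector (hypothetical, not the paper's scheme).

    Cycles through all up ports in order, so every up link is used once
    per cycle regardless of the destination pattern.
    """

    def __init__(self, num_up_ports: int) -> None:
        self.num_up_ports = num_up_ports
        self._next = 0

    def pick(self) -> int:
        port = self._next
        self._next = (self._next + 1) % self.num_up_ports
        return port


# Destinations that are all multiples of k collide on one up port at the leaf level.
dests = [0, 4, 8, 12]
d_mod_k_ports = [d_mod_k_up_port(d, level=0, k=4) for d in dests]

selector = RoundRobinUpPort(num_up_ports=4)
round_robin_ports = [selector.pick() for _ in dests]
```

With this traffic pattern the destination-based rule sends every flow through the same up port, while the round-robin selector spreads the four flows over four ports, which is the kind of case where a round-robin scheme can have an advantage.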
Original language | English |
---|---|
Pages (from-to) | 2373-2386 |
Number of pages | 14 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 28 |
Issue number | 8 |
Early online date | 13 Feb 2017 |
Publication status | Published - 1 Aug 2017 |
Keywords
- Adaptive routing
- connectivity
- deterministic routing
- fault tolerance
- fat-tree