Design Methodologies for Exascale Reconfigurable Data-Center-on-a-Chip Architectures

The heterogeneity, dynamicity and variability of the data- and computation-intensive tasks in big data applications call for massively parallel architectures. Networks-on-chip (NoCs) offer a promising approach for handling these resource-intensive tasks (processing, communication, storage). However, relying on simple networking principles to address chip-level interconnection problems (e.g., throughput, network delay, power consumption, fault tolerance) does not adequately capture hardware-software interactions (e.g., dynamic traffic patterns). Moreover, it is not feasible to keep increasing NoC size and dimensions in the deep-submicron domain in pursuit of higher performance, since wire-based interconnects incur long propagation delays and higher energy consumption. To provide ultra-fast, high-fidelity terabyte (TB) communication and exploit the massive fine-grained parallelism of exascale applications, we develop novel design methodologies for data-center-on-a-chip (DCoC) architectures: (i) We propose a user-cooperated network coding (NC) NoC consisting of cooperation units, a corridor routing algorithm to support NC-based multicast, and an adaptive flit dropping scheme to avoid network congestion and save power. This communication strategy offers a 100X improvement in throughput over traditional approaches. (ii) To overcome the inefficiencies associated with the spatio-temporal variability of big data applications, we propose a general mathematical framework for reconfigurable NoC design and runtime optimization. This analytical formalism can be applied to arbitrary network topologies and sizes, and to routing or heterogeneous resource allocation problems. A mathematical investigation of the NoC reconfigurability problem allows us to design a dynamic resource allocation algorithm that is guaranteed to reach the optimality region.
(iii) To reduce the complexity of the design-space exploration of large-scale NoCs, we formulate a mathematical benchmark synthesis framework that not only extracts the dynamic task dependencies of big data applications and synthesizes traffic workloads that are spatio-temporally consistent with realistic traffic behavior, but also scales easily via complex-network-inspired algorithms to generate large-scale benchmarks while preserving the key structural features that govern application communication behavior. This enables the design of DCoC architectures consisting of thousands of cores.
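
To make the network coding idea in (i) concrete, the following is a minimal sketch of XOR-based coding on the textbook "butterfly" multicast, where a shared bottleneck link carries one coded flit instead of two separate ones. It is an illustration under our own assumptions, not the cooperation-unit design, the corridor routing algorithm, or the flit dropping scheme; the names `xor_bytes` and `butterfly_multicast` are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length flits (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("flits must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def butterfly_multicast(flit_a: bytes, flit_b: bytes):
    """Deliver both flits to two sinks with one coded flit on the shared link.

    Assumed topology (illustrative only): sink 1 receives flit_a on a
    direct link, sink 2 receives flit_b on a direct link, and both
    receive the coded flit from the shared bottleneck link.
    """
    coded = xor_bytes(flit_a, flit_b)  # single transmission on the bottleneck

    # Each sink XORs the coded flit with the flit it already holds
    # to recover the one it is missing.
    sink1 = (flit_a, xor_bytes(flit_a, coded))
    sink2 = (xor_bytes(flit_b, coded), flit_b)
    return sink1, sink2


if __name__ == "__main__":
    s1, s2 = butterfly_multicast(b"\x01\x02", b"\x0f\xf0")
    print(s1 == s2)  # both sinks hold the same pair (flit_a, flit_b)
```

Without coding, the shared link would have to carry the two flits one after the other; with coding, one transmission serves both sinks, which is the basic reason NC-based multicast can raise throughput.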