New Kids on the Unblocking: Strategies to Overcome Blocking Networks

Caio Von Rondow Morais; Jeronimo Penha; José A. Nacif; Ricardo Ferreira

doi:10.5753/wscad.2023.235874

Caio Von Rondow Morais UFV
Jeronimo Penha UFV
José A. Nacif UFV
Ricardo Ferreira UFV

DOI: https://doi.org/10.5753/wscad.2023.235874

Resumo

Full-crossbar interconnections offer high parallel bandwidth, simple routing, and performance. However, their cost is prohibitive. Multistage networks have emerged as a cost-effective alternative, reducing the cost to O(n log(n)). This study addresses the challenges posed by large networks with 256 connections, where the configuration space expands exponentially. We present an approach that efficiently handles routing by reducing the extra stages. We reduce network cost by a factor of 2×. Our approach enables graph routing with up to 214 operators within the 256-connection multistage network, doubling performance compared to previous multistage CGRA frameworks.

Referências

Carvalho, W., Canesche, M., , Silva, L., Nacif, J., and Ferreira, R. (2020). A design exploration of scalable mesh-based fully pipelined accelerators. In IEEE ICFPT.

Choi, Y.-k., Chi, Y., Qiao, W., Samardzic, N., and Cong, J. (2021). Hbm connect: High-performance hls interconnect for fpga hbm. In ACM FPGA.

Ferreira, R., Vendramini, J., Pereira, M. M., and Carro, L. (2011). An fpga-based heterogeneous coarse-grained dynamically reconfigurable architecture. In Int conference on Compilers, architectures and synthesis for embedded systems CASES.

Gazit, I. and Malek, M. (1989). On the number of permutations performable by extra-stage multistage interconnection networks. IEEE trans on computers, 38(2).

Hu, Q., Shen, X., and Liang, W. (1996). Optimally routing lc permutations on k-extra-stage cube-type networks. IEEE transactions on computers, 45(1):97–103.

Katangur, A. K., Pan, Y., and Fraser, M. D. (2002). Message routing and scheduling in optical multistage networks using simulated annealing. In Parallel and Distributed Processing Symposium, International, volume 2, pages 8–pp. IEEE Computer Society.

Kong, X., Huang, Y., Zhu, J., Man, X., Liu, Y., Feng, C., Gou, P., Tang, M., Wei, S., and Liu, L. (2023). Mapzero: Mapping for coarse-grained reconfigurable architectures with reinforcement learning and monte-carlo tree search. In ISCA.

Lawrie, D. H. (1975). Access and alignment of data in an array processor. IEEE Transactions on computers, 100(12):1145–1155.

Lee, C. (1997). Mediabench: A tool for evaluating and synthesizing multimedia and communications systems. In IEEE MICRO.

Mei, B., Lambrechts, A., Mignolet, J.-Y., Verkest, D., and Lauwereins, R. (2005). Architecture exploration for a reconfigurable architecture template. IEEE Design & Test of Computers, 22(2):90–101.

Shen, X. (1995). Optimal realization of any bpc permutation on k-extra-stage omega networks. IEEE transactions on computers, 44(5):714–719.

Silva, L., Ferreira, R., Canesche, M., Penha, J., , and Nacif, J. (2019). Ready: A fine-grained multithreading overlay framework for modern cpu-fpga dataflow applications. ACM Transactions on Embedded Computing Systems (TECS), 18.

UCSB (2020). Bench:. https://web.ece.ucsb.edu/EXPRESS/benchmark/.

UFV (2023). Multistage. https://github.com/lesc-ufv/graphs.

Walker, M. J. (2019). Generic connectivity-based cgra mapping via integer linear programming. In Symp on Field-Programmable Custom Computing Machines (FCCM).

Yang, Y. and Wang, J. (1998). On blocking probability of multicast networks. IEEE Transactions on Communications, 46(7):957–968.