DF-DTM: Exploring Task Redundancy in Dataflow

  • Leandro Rouberte UFRJ
  • Alexandre Sena UERJ
  • Alexandre Nery UERJ
  • Leandro Marzulo UERJ
  • Tiago Alves UERJ
  • Felipe França UFRJ

Abstract


Instruction reuse is a technique adopted in von Neumann architectures that improves performance by avoiding the redundant execution of instructions (or traces of instructions) whenever the result can be obtained by looking up a history table of previous inputs and outputs for that instruction. Such techniques, however, have yet to be studied in the context of the Dataflow model, which has been gaining traction in the high-performance computing community due to its inherent parallelism. This paper proposes an approach to reuse in Dataflow called DF-DTM (Dataflow Dynamic Task Memoization). The technique supports the reuse of individual nodes and of subgraphs, which are analogous to instructions and traces, respectively. The potential of DF-DTM is evaluated through a series of experiments that analyze the behavior of redundant tasks in three relevant benchmark applications, in which up to 97% of the executed tasks could be reused.
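
The node-level reuse idea described in the abstract can be illustrated with a minimal Python sketch. It is not the DF-DTM implementation: the `MemoTable` class, its `run` method, and the `square` task are hypothetical names introduced here, and the sketch assumes every task is deterministic and free of side effects, so a result may be reused whenever the same node receives the same operands. Subgraph (trace-like) reuse is not covered.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class MemoTable:
    """Hypothetical input/output history table keyed by (node name, operands)."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, Tuple[Hashable, ...]], Any] = {}
        self.hits = 0    # executions avoided through reuse
        self.misses = 0  # executions that actually ran

    def run(self, node: str, func: Callable[..., Any], *operands: Hashable) -> Any:
        """Run a node's task, or reuse its stored result for identical operands."""
        key = (node, operands)
        if key in self.table:
            self.hits += 1
            return self.table[key]
        self.misses += 1
        result = func(*operands)  # assumed deterministic and side-effect free
        self.table[key] = result
        return result


def square(x: int) -> int:
    """Stand-in for the work done by a dataflow node."""
    return x * x


memo = MemoTable()
outputs = [memo.run("square", square, x) for x in [3, 4, 3, 3, 4]]

# Fraction of task firings served from the table instead of being executed.
reuse_rate = memo.hits / (memo.hits + memo.misses)
print(outputs, reuse_rate)
```

In this toy run, the reuse rate is the share of task firings whose result came from the table, which is the kind of quantity the reported percentage of reused tasks refers to.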

Published
2016-10-05
ROUBERTE, Leandro; SENA, Alexandre; NERY, Alexandre; MARZULO, Leandro; ALVES, Tiago; FRANÇA, Felipe. DF-DTM: explorando redundância de tarefas em Dataflow. In: SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS (SSCAD), 17., 2016, Aracaju. Anais [...]. Porto Alegre: Sociedade Brasileira de Computação, 2016. p. 275-286. DOI: https://doi.org/10.5753/wscad.2016.14266.