A method for prediction execution time of GPU programs
(Pp. 38-45)

More about authors
Kleimenov Andrey Anatolievich, postgraduate student, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University
Popova Nina Nikolaevna, Candidate of Physical and Mathematical Sciences; Associate Professor, Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University
Abstract:
The use of coprocessors such as GPUs and FPGAs is a leading trend in HPC, and many applications from a wide variety of domains have been ported to GPUs and are used successfully. In this paper, we propose an approach for predicting the execution time of CUDA kernels based on static analysis of the program source code. The approach relies on building a model of the CUDA kernel and a model of the graphics accelerator. The developed method is applied to implementations of matrix multiplication, the Fourier transform, and the backpropagation method for training neural networks. In verification, the approach showed good prediction accuracy, especially at low GPU loads.
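To make the idea of combining a kernel model with an accelerator model concrete, here is a minimal Python sketch. It is not the authors' model: the class names `GpuModel` and `KernelModel`, the function `estimate_time`, the instruction classes, and all latency figures are hypothetical placeholders, and the estimate ignores latency hiding, memory coalescing, and caches.

```python
import math
from dataclasses import dataclass


@dataclass
class GpuModel:
    """Hypothetical accelerator description; values are illustrative only."""
    sm_count: int            # number of streaming multiprocessors
    warps_per_sm: int        # warps assumed to execute concurrently per SM
    clock_hz: float          # core clock frequency
    latency_cycles: dict     # instruction class -> assumed cycles per warp instruction
    warp_size: int = 32


@dataclass
class KernelModel:
    """Hypothetical kernel summary, as a static pass over the source might produce."""
    instr_per_thread: dict   # instruction class -> static count per thread
    threads: int             # total threads launched


def estimate_time(kernel: KernelModel, gpu: GpuModel) -> float:
    """Toy estimate in seconds: serialized cycles per warp times number of waves."""
    # Cycles one warp needs if its instructions run back to back (no overlap assumed).
    cycles_per_warp = sum(
        count * gpu.latency_cycles[cls]
        for cls, count in kernel.instr_per_thread.items()
    )
    # Total warps in the launch, then how many "waves" fill the whole device.
    warps = math.ceil(kernel.threads / gpu.warp_size)
    waves = math.ceil(warps / (gpu.sm_count * gpu.warps_per_sm))
    return waves * cycles_per_warp / gpu.clock_hz


if __name__ == "__main__":
    gpu = GpuModel(sm_count=2, warps_per_sm=4, clock_hz=1e9,
                   latency_cycles={"alu": 4, "mem": 400})
    kernel = KernelModel(instr_per_thread={"alu": 100, "mem": 10}, threads=1024)
    print(f"estimated time: {estimate_time(kernel, gpu):.3e} s")
```

Because the sketch charges every instruction its full latency, it behaves like a pessimistic bound. A realistic model would have to account for overlap between warps, which is where the accuracy of any such method depends on GPU load.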
How to Cite:
Kleimenov A.A., Popova N.N., (2021), A METHOD FOR PREDICTION EXECUTION TIME OF GPU PROGRAMS. Computational Nanotechnology, 1: 38-45. DOI: 10.33693/2313-223X-2021-8-1-38-45
Keywords:
performance analysis, CUDA-kernel, static analysis, GPU model.