Fundamentals of Accelerated Computing with CUDA Python
Prerequisites:
- Basic Python competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulation
- NumPy competency, including the use of ndarrays and ufuncs
- No previous knowledge of CUDA programming is required
Certificate: Upon successful completion of the assessment, participants will receive an NVIDIA DLI certificate to recognize their subject matter competency and support professional career growth.
Hardware Requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated server in the cloud.
This workshop teaches you the fundamental tools and techniques for running GPU-accelerated Python applications on CUDA® GPUs with the Numba compiler.
Learning Objectives:
- GPU-accelerate NumPy ufuncs with a few lines of code.
- Configure code parallelization using the CUDA thread hierarchy.
- Write custom CUDA device kernels for maximum performance and flexibility.
- Use memory coalescing and on-device shared memory to increase CUDA kernel bandwidth.
Workshop Outline:
- Meet the instructor.
- Create an account at courses.nvidia.com/join.
- Begin working with the Numba compiler and CUDA programming in Python.
- Use Numba decorators to GPU-accelerate numerical Python functions.
- Optimize host-to-device and device-to-host memory transfers.
- Learn CUDA’s parallel thread hierarchy and how it extends the range of problems you can parallelize.
- Launch massively parallel custom CUDA kernels on the GPU.
- Utilize CUDA atomic operations to avoid race conditions during parallel execution.
- Learn multidimensional grid creation and how to work in parallel on 2D matrices.
- Leverage on-device shared memory to promote memory coalescing while reshaping 2D matrices.
- Review key learnings and answer final questions.
- Complete the assessment to earn a certificate.
- Take the workshop survey.