Making the Work-Efficient Parallel Prefix Sum Do Less Work

The prefix sum (cumulative sum) can be parallelized in several ways, including the work-efficient algorithm, in which the calculation is performed in "up sweep" and "down sweep" stages.
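
As a rough illustration of the two stages, here is a minimal sequential sketch of the textbook work-efficient exclusive scan in C++. It is not the speaker's implementation: the function name `exclusive_scan_work_efficient` is hypothetical, and the input length is assumed to be a power of two. Each iteration of an inner loop touches disjoint elements, which is what would let a GPU assign one thread per iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference for the work-efficient exclusive prefix sum.
// Hypothetical helper for illustration; assumes data.size() is a power of two.
std::vector<float> exclusive_scan_work_efficient(std::vector<float> data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two length (assumption)

    // Up sweep: accumulate partial sums up a balanced binary tree.
    // After this stage, data[n - 1] holds the total of all elements.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            data[i] += data[i - stride];
        }
    }

    // Down sweep: clear the root, then push prefixes back down the tree.
    data[n - 1] = 0.0f;
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            const float left = data[i - stride];
            data[i - stride] = data[i];  // left child receives the parent's prefix
            data[i] += left;             // right child adds the left subtree's sum
        }
    }
    return data;
}
```

Both stages perform on the order of n additions in total, which is where the "work-efficient" name comes from.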

A common use of the prefix sum is to aid in selecting a random element from an array of probabilities. The cumulative sum of the array is generated, and then an element is chosen such that it is the first element with a value greater than a random fraction of the total.
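
The selection step described above can be sketched sequentially in C++ as follows. This is only an illustration of the baseline "sum and search" pattern, not the optimized version from the talk; the name `select_index` and the parameter `u` (a uniform random number in [0, 1) supplied by the caller) are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Pick an index with probability proportional to its weight.
// Hypothetical helper: u is a caller-supplied uniform random number in [0, 1).
std::size_t select_index(const std::vector<float>& weights, float u) {
    assert(!weights.empty());

    // Inclusive cumulative sum; the last entry is the total weight.
    std::vector<float> cumulative(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cumulative.begin());

    // Random fraction of the total.
    const float target = u * cumulative.back();

    // First element whose cumulative value is strictly greater than target.
    const auto it = std::upper_bound(cumulative.begin(), cumulative.end(), target);

    // Guard against rounding pushing target up to the total.
    if (it == cumulative.end()) {
        return weights.size() - 1;
    }
    return static_cast<std::size_t>(it - cumulative.begin());
}
```

Note that this baseline computes the full cumulative array even though only one index is needed, which is the kind of redundant work a combined sum-and-search approach aims to avoid.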

In this talk, Stephen Sanderson from The University of Queensland develops modifications to the work-efficient parallel prefix sum algorithm and the subsequent search algorithm that avoid unnecessary work in this specific "sum and search" use case, thereby minimizing expensive memory transactions. The optimization steps are discussed in terms of NVIDIA CUDA® best practices, covering topics such as shared memory caching, bank conflict avoidance, and efficient memory access patterns.
