MLIR-based code generation for GPU tensor cores

Abstract:
The state of the art in high-performance deep learning today is primarily driven by manually developed libraries, optimized and highly tuned by expert programmers using low-level abstractions with significant effort. This effort is often repeated for similar and future hardware. We pursue and evaluate the more modular and reusable approach of using compiler IR infrastructure to generate libraries by encoding all the required optimizations as a sequence of transformations and customized passes on an IR. We believe that until the recent introduction of MLIR (Multi-Level Intermediate Representation), it had been hard to represent and transform computation at various levels of abstraction within a single IR. Using the MLIR infrastructure, we build a transformation and lowering pipeline to automatically generate near-peak performance code for matrix-matrix multiplication (matmul), as well as matmul fused with simple pointwise operators, targeting tensor cores on NVIDIA GPUs. On a set of problem sizes ranging from 256 to 16384, our performance evaluation shows that we can obtain performance that is 0.95X to 1.19X and 0.80X to 1.60X that of cuBLAS for FP32 and FP16 accumulation, respectively, on NVIDIA's Ampere-based GeForce RTX 3090. Furthermore, by allowing the fusion of common pointwise operations with matrix-matrix multiplication, we obtain performance ranging from 0.95X to 1.67X that of a cuBLAS-based implementation. Additionally, we present matmul-like examples such as 3-d contraction and batched matmul, which the pipeline can efficiently handle while providing competitive performance. We believe that these results motivate further research and engineering on automatic domain-specific library generation using compiler IR infrastructure for similar specialized accelerators.
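To make "matmul fused with simple pointwise operators" concrete, here is a minimal pure-Python sketch. It is not the pipeline from the talk and does not use MLIR or tensor cores. The function names, the tile size, and the choice of bias-add plus ReLU as the pointwise operation are all assumptions made for illustration. The point is that a fused version applies the pointwise step to each output tile as soon as that tile's accumulation finishes, instead of writing out the full product and reading it back in a second pass.

```python
# Illustrative sketch only: the names, the tile size, and the bias+ReLU
# epilogue are assumptions for this example, not taken from the paper.

def fused_tiled_matmul(a, b, bias, tile=2):
    """Compute relu(a @ b + bias) tile by tile, applying the pointwise
    epilogue to each output tile right after its accumulation finishes."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):              # rows of the output tile
        for j0 in range(0, n, tile):          # columns of the output tile
            i1, j1 = min(i0 + tile, m), min(j0 + tile, n)
            # Per-tile accumulator (a stand-in for on-chip storage).
            acc = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
            for k0 in range(0, k, tile):      # tiles along the reduction dim
                k1 = min(k0 + tile, k)
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        for p in range(k0, k1):
                            acc[i - i0][j - j0] += a[i][p] * b[p][j]
            # Fused epilogue: bias add + ReLU before the tile is written out.
            for i in range(i0, i1):
                for j in range(j0, j1):
                    c[i][j] = max(acc[i - i0][j - j0] + bias[j], 0.0)
    return c


def unfused_reference(a, b, bias):
    """Same result in two passes: a full matmul, then a pointwise pass."""
    m, k, n = len(a), len(b), len(b[0])
    prod = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]
    return [[max(prod[i][j] + bias[j], 0.0) for j in range(n)]
            for i in range(m)]
```

Both functions return the same values. On a GPU, the fused form would avoid one round trip of the intermediate matrix through global memory, which is the kind of saving the fusion results in the abstract refer to.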

Paper: https://dl.acm.org/doi/10.1145/349777...

Speaker Bio:
Navdeep Katel is a Senior Software Engineer at PolyMage Labs, focusing on code generation for GPUs using MLIR. He obtained his Master's (Research) degree in Computer Science and Engineering from the Indian Institute of Science (IISc) in 2021. Prior to IISc, he obtained his Bachelor's degree from U.I.E.T., Panjab University, in 2019. At IISc, he was part of the Multicore Computing Lab, where he researched automatic code generation targeting accelerators such as GPUs, including tensor cores on NVIDIA GPUs.

LinkedIn: / navdeepkumarkatel

Meetup Link: https://www.meetup.com/Bangalore-Comp...
