Why GPUs Outpace CPUs?


A Deep Dive into Why GPUs Outpace CPUs - A Hands-On Tutorial

FLOPS is commonly used to quantify the computational power of processors and other computing devices. It is an important metric for tasks that involve complex mathematical calculations, such as scientific simulations, artificial intelligence and machine learning algorithms.

FLOPS stands for "Floating Point Operations Per Second": the number of floating-point calculations a computer system can perform in one second. The higher the FLOPS value, the faster the computer or processor can perform floating-point calculations, which indicates better computational performance.

In this tutorial, let us use FLOPS as a metric to evaluate the performance of the CPU versus the GPU. We will begin with the DAXPY (Double-precision A*X plus Y) operation, which is commonly used in numerical computing. This operation multiplies a vector (X) by a scalar (A) and adds the result to another vector (Y), so each element costs two floating-point operations: one multiplication and one addition. We will measure the FLOPS achieved on the DAXPY operation, first on the CPU and then on the GPU.
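As a rough illustration of the CPU half of that measurement, here is a minimal sketch in NumPy. It is not the code from the video: the function name `daxpy_flops`, the vector length, and the repeat count are assumptions chosen for the example. It counts two floating-point operations per element and divides by the best wall-clock time observed.

```python
import time

import numpy as np


def daxpy_flops(n, a=2.0, repeats=5):
    """Time a * x + y on float64 vectors of length n.

    Returns (result, flops), where flops is the operation count divided
    by the fastest of `repeats` timed runs. Hypothetical helper, not the
    author's code.
    """
    rng = np.random.default_rng(0)
    x = rng.random(n)  # float64 by default, hence "double precision"
    y = rng.random(n)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = a * x + y  # the DAXPY operation itself
        best = min(best, time.perf_counter() - start)

    flop_count = 2 * n  # one multiply and one add per element
    return result, flop_count / max(best, 1e-12)  # guard against a zero timing


result, flops = daxpy_flops(1_000_000)
print(f"DAXPY on CPU: {flops / 1e9:.2f} GFLOPS")
```

The printed number depends heavily on the machine and on the vector length, so it is best read as a relative figure for comparing devices rather than as an absolute rating.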

In the first part, the DAXPY operation is written as a NumPy expression (A * X + Y). NumPy can dispatch to optimized implementations, so the actual computation may run in compiled C or Fortran libraries rather than in Python itself. For that reason, a more effective way to compare speeds is to run matrix multiplications using TensorFlow, which is exactly what the second part of our code does. We will multiply matrices of various sizes and explore how the true advantage of GPUs lies in working with large matrices (and large datasets in general).

In the second part of this tutorial, we will verify the GPU speed advantage over CPU for different matrix sizes. The relative efficiency of the GPU compared to the CPU can vary based on the computational demands of the specific task.
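To show the shape of such a size sweep, here is a minimal CPU-only sketch. Note the substitution: the video runs the multiplications in TensorFlow on both CPU and GPU, whereas this example uses NumPy on the CPU alone, so it illustrates only the measurement method. The helper name `matmul_gflops` and the list of sizes are assumptions. Multiplying two n x n matrices takes roughly 2n^3 floating-point operations.

```python
import time

import numpy as np


def matmul_gflops(n, repeats=3):
    """Return the GFLOPS achieved multiplying two n x n float32 matrices.

    Hypothetical helper: NumPy on the CPU stands in for the TensorFlow
    CPU/GPU runs described in the video.
    """
    rng = np.random.default_rng(0)
    a = rng.random((n, n), dtype=np.float32)
    b = rng.random((n, n), dtype=np.float32)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        _ = a @ b
        best = min(best, time.perf_counter() - start)

    flop_count = 2 * n**3  # approximate: n multiplies and n adds per output element
    return flop_count / max(best, 1e-9) / 1e9


sizes = [128, 256, 512, 1024]  # illustrative sizes, not the ones from the video
gflops_by_size = {n: matmul_gflops(n) for n in sizes}
for n, gflops in gflops_by_size.items():
    print(f"{n:5d} x {n:<5d} {gflops:8.2f} GFLOPS")
```

Running the same sweep on a GPU and dividing the two timings for each size gives the speedup as a function of matrix size, which is the comparison this part of the tutorial is after.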

To make sure each matrix multiplication task starts from a common baseline, we will clear the default graph and release the GPU memory before each run. We will also disable eager execution in TensorFlow for the matrix multiplication task. Eager execution is a mode in which operations run immediately as they are called, instead of having to be executed explicitly within a session. It is enabled by default in TensorFlow 2.x. With eager execution disabled, operations are added to a computation graph, and the graph is executed within a session.

Finally, forget FLOPS: it's all about memory bandwidth!

Memory bandwidth is a measure of how quickly data can be transferred between the processor (CPU or GPU) and the memory.

High memory bandwidth is crucial for tasks that involve frequent access to large datasets (e.g., deep learning training).

Memory bandwidth becomes particularly important when dealing with large matrices, as transferring data between the processor and memory efficiently can significantly impact overall performance.
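One way to make this concrete is to compare how many floating-point operations each task performs per byte of data moved. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes 8-byte doubles, assumes every input and output is transferred exactly once (ideal caching), and uses a made-up bandwidth of 100 GB/s purely for illustration. All function names are hypothetical.

```python
BYTES_PER_DOUBLE = 8


def daxpy_intensity(n):
    """FLOPs per byte for a * x + y on vectors of length n."""
    flops = 2 * n                           # one multiply and one add per element
    bytes_moved = 3 * n * BYTES_PER_DOUBLE  # read x, read y, write the result
    return flops / bytes_moved


def matmul_intensity(n):
    """FLOPs per byte for an n x n matrix product, assuming ideal caching."""
    flops = 2 * n**3                            # approximate operation count
    bytes_moved = 3 * n * n * BYTES_PER_DOUBLE  # read A, read B, write C once each
    return flops / bytes_moved


def bandwidth_bound_gflops(intensity, bandwidth_gb_s):
    """Upper limit on GFLOPS when memory traffic is the only bottleneck."""
    return intensity * bandwidth_gb_s


ASSUMED_BANDWIDTH_GB_S = 100.0  # illustrative value, not a real device spec

print(f"DAXPY intensity:  {daxpy_intensity(1_000_000):.4f} FLOPs/byte")
print(f"matmul intensity: {matmul_intensity(1200):.1f} FLOPs/byte (n = 1200)")
ceiling = bandwidth_bound_gflops(daxpy_intensity(1_000_000), ASSUMED_BANDWIDTH_GB_S)
print(f"DAXPY ceiling at {ASSUMED_BANDWIDTH_GB_S:.0f} GB/s: {ceiling:.2f} GFLOPS")
```

Under these assumptions DAXPY performs only 1/12 of a floating-point operation per byte regardless of vector length, so its speed is capped by how fast memory can feed the processor, while the ratio for matrix multiplication grows with n.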

Code used in this video is available here: https://github.com/bnsreenu/python_fo...

Original title: Why GPUs Outpace CPUs? (tips tricks 56)
