Efficient 3D Perception for Autonomous Vehicles (Zhijian Liu)


Autonomous vehicles rely on 3D perception to understand their surrounding environment. Although there has been remarkable progress in improving the accuracy of perception models, their efficiency still falls short of real-time requirements, which impedes deployment in real-world applications.

In this guest lecture at Penn hosted by Prof. Rahul Mangharam (‪@RealTimemLAB‬), I presented our recent work, BEVFusion (ICRA 2023), which enables efficient multi-task, multi-sensor fusion by unifying camera, LiDAR, and radar features in a shared bird's-eye view (BEV) space. We addressed its key efficiency bottleneck by accelerating the view transformation operator by 40 times. BEVFusion ranked first on three popular 3D perception benchmarks, including nuScenes, Argoverse, and Waymo, across tasks such as object detection, object tracking, and map segmentation. It has received more than 1k GitHub stars since its release.
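To make the shared-BEV idea concrete, here is a minimal conceptual sketch (not the official BEVFusion code): LiDAR point features are scattered into a BEV grid and concatenated with a camera branch that, in the real model, is lifted into the same grid by the view transformation. Grid size, ranges, feature dimensions, and all names (points_to_bev, camera_bev, etc.) are illustrative assumptions.

import torch

def points_to_bev(points, feats, grid_size=128, cell=0.5, x_range=(-32, 32), y_range=(-32, 32)):
    """Scatter per-point features into a BEV grid, summing features that
    fall into the same cell (a simplified stand-in for BEV pooling)."""
    bev = torch.zeros(feats.shape[1], grid_size, grid_size)
    ix = ((points[:, 0] - x_range[0]) / cell).long().clamp(0, grid_size - 1)
    iy = ((points[:, 1] - y_range[0]) / cell).long().clamp(0, grid_size - 1)
    flat = iy * grid_size + ix                          # flatten the 2D cell index
    bev = bev.view(feats.shape[1], -1)
    bev.index_add_(1, flat, feats.t())                  # accumulate point features per cell
    return bev.view(feats.shape[1], grid_size, grid_size)

# Toy inputs: 1000 LiDAR points with 16-dim features, plus a placeholder
# camera feature map already lifted to the same BEV grid.
lidar_xyz = torch.rand(1000, 3) * 64 - 32
lidar_feat = torch.rand(1000, 16)
camera_bev = torch.rand(32, 128, 128)

lidar_bev = points_to_bev(lidar_xyz, lidar_feat)
fused_bev = torch.cat([camera_bev, lidar_bev], dim=0)  # shared BEV space: simple concat fusion
print(fused_bev.shape)                                  # torch.Size([48, 128, 128])

In the actual system, the expensive step is the camera-to-BEV lifting; the 40x speedup mentioned above comes from optimizing that operator, which this toy scatter does not attempt to reproduce.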

Subsequently, I discussed two of our latest works, FlatFormer (CVPR 2023) and SparseViT (CVPR 2023), which accelerate 3D point cloud and 2D image backbones for perception. FlatFormer is an efficient point cloud transformer that attains real-time performance on edge GPUs and runs faster than sparse convolutional methods while delivering superior accuracy. SparseViT explores spatial sparsity in 2D image transformers and delivers a 1.5x measured speedup over its dense counterpart without compromising accuracy. A toy sketch of the window-level sparsity idea follows below.
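The sketch below is a toy illustration of window-level spatial sparsity, not the SparseViT or FlatFormer implementation: it scores non-overlapping windows of a feature map by activation magnitude and keeps only the most active ones, so downstream attention can skip near-empty regions. The window size, keep ratio, and function name are illustrative assumptions.

import torch

def select_active_windows(feat, window=8, keep_ratio=0.5):
    """feat: (C, H, W). Returns the kept window tokens and their indices."""
    C, H, W = feat.shape
    # Partition into non-overlapping windows: (num_windows, C, window*window)
    wins = feat.unfold(1, window, window).unfold(2, window, window)   # (C, H/w, W/w, w, w)
    wins = wins.permute(1, 2, 0, 3, 4).reshape(-1, C, window * window)
    scores = wins.abs().mean(dim=(1, 2))               # simple L1 activation score per window
    k = max(1, int(keep_ratio * wins.shape[0]))
    keep = scores.topk(k).indices                      # indices of the most active windows
    return wins[keep], keep                            # only these windows go through attention

feat = torch.zeros(16, 64, 64)
feat[:, :16, :16] = torch.rand(16, 16, 16)             # only one corner is "occupied"
kept, idx = select_active_windows(feat)
print(kept.shape, idx.shape)                           # torch.Size([32, 16, 64]) torch.Size([32])

The same intuition carries over to point clouds, where most of the scene is empty space, so grouping and pruning by occupancy is what makes these backbones fast on edge GPUs.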
