Efficient and Cross-Platform AI Inference Apps Using Rust and Wasm - Michael Yuan, WasmEdge

Today’s AI inference apps are primarily written in Python or C and then wrapped in a container or VM for cloud deployment. Those apps are heavyweight (especially with Python), not portable across CPU/GPU platforms, difficult for developers (especially with C), and very slow at Python-based data processing. Wasm has emerged as a strong alternative runtime for AI inference workloads. Developers write inference functions in Rust, JavaScript, or Python, and then run them in Wasm sandboxes. Wasm functions are tiny, fast, safe, and very easy to develop. They run without modification on almost any device or OS, and can automatically take advantage of the device’s CPU, GPU, or other hardware accelerators. They are securely isolated for cloud-native deployment and can be managed by container tools. In this talk, we will start with the architecture of Wasm-based AI services. Then we will take a deep dive into how to create PyTorch and TensorFlow inference functions, as well as functions for newer LLM frameworks such as GGML, in Rust, and how to run them in Wasm. We will demonstrate complete examples using Google MediaPipe models and Llama 2 LLM models.
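To make the workflow concrete, here is a minimal sketch of what such an inference function can look like in Rust, using the WASI-NN interface with a GGML backend as supported by WasmEdge. It assumes the wasmedge-wasi-nn crate and a model that the host has preloaded under the alias "default"; the crate methods, the alias, and the output buffer size are illustrative assumptions rather than the exact code shown in the talk.

```rust
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn main() {
    // Load a GGML-format model that the WasmEdge host has preloaded under
    // the alias "default"; ExecutionTarget::AUTO lets the runtime pick the
    // CPU, GPU, or another accelerator available on the device.
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load model");
    let mut ctx = graph
        .init_execution_context()
        .expect("failed to create execution context");

    // Feed the prompt as a UTF-8 byte tensor at input index 0.
    let prompt = "Q: What is WebAssembly? A:";
    ctx.set_input(0, TensorType::U8, &[1], prompt.as_bytes())
        .expect("failed to set input");

    // Run inference inside the Wasm sandbox; the heavy numeric work is
    // delegated to the host's native GGML backend.
    ctx.compute().expect("inference failed");

    // Read the generated text back from output index 0.
    let mut out = vec![0u8; 4096];
    let n = ctx.get_output(0, &mut out).expect("failed to read output");
    println!("{}", String::from_utf8_lossy(&out[..n]));
}
```

Compiled to the wasm32-wasi target, the same .wasm binary can then run under WasmEdge with its WASI-NN plugin on Linux, macOS, or an edge device, with the model file supplied by the host (for example via WasmEdge's --nn-preload option), which is what makes the function portable across CPU and GPU platforms without modification.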
