This video provides a comprehensive demonstration of how you can run self-hosted models behind the AI Gateway. Self-hosting models is crucial for organizations that need maximum data privacy and security, tighter cost control, or custom fine-tuned models.
First, we cover the key concepts in the TrueFoundry platform that form the basis of deployments - clusters, workspaces, repositories, and environments. Then, we explore "how" deployment is enabled - via open-source model servers such as vLLM, SGLang, and TensorRT-LLM, which prevents vendor lock-in - as well as the key role the TrueFoundry platform plays in simplifying infrastructure management through its user-friendly interface. Models can be deployed from HuggingFace, from models logged in the TrueFoundry model registry, or via your custom inference code in any framework.
Then we explore three model deployment scenarios -
1) a simple deployment using the model deployment feature
2) a gated-model deployment using the model deployment feature
3) a model deployment using the service deployment feature (which is the basis for the model deployment abstraction)
The models deployed are popular open-source ones - DeepSeek-R1-Distill-Qwen-1.5B, Mixtral-8x22B-Instruct-v0.1, and Llama-3.2-3B-Instruct. Along the way, we also see the secret store feature in action, which is useful for storing HuggingFace access tokens, for instance.
Finally, once the self-hosted models are deployed, we show how they can be integrated into a new or existing model provider account. Inference can be done via the Playground feature for experimentation, or via code snippets integrated into your applications. A centralized API key is provided, and all self-hosted models behind the TrueFoundry AI Gateway expose OpenAI-compatible endpoints, so switching between models, even closed-source ones, stays streamlined.
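As a rough illustration of what those OpenAI-compatible endpoints mean in practice, here is a minimal sketch of calling a self-hosted model through the gateway using only the Python standard library. The base URL, API key, and model IDs are placeholders, not real values - substitute the ones shown for your deployment in the TrueFoundry UI.

```python
# Minimal sketch: chat completion against an OpenAI-compatible gateway
# endpoint. GATEWAY_BASE_URL, API_KEY, and the model IDs below are
# hypothetical placeholders, not actual TrueFoundry values.
import json
import urllib.request

GATEWAY_BASE_URL = "https://<your-org>.truefoundry.cloud/api/llm"  # placeholder
API_KEY = "<your-centralized-gateway-api-key>"  # placeholder


def build_payload(model: str, prompt: str) -> dict:
    # OpenAI-style chat completion request body, shared by every model
    # behind the gateway, self-hosted or closed-source.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(model: str, prompt: str) -> dict:
    req = urllib.request.Request(
        f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Because every model speaks the same schema, switching from a self-hosted
# model to a closed-source one is a one-line change of the model ID:
#   chat("my-provider/llama-3-2-3b-instruct", "Hello!")
#   chat("openai-main/gpt-4o", "Hello!")
```

The same call shape works with the official OpenAI SDK by pointing its base URL at the gateway, which is what makes model switching a configuration change rather than a code change.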
We encourage you to explore our detailed documentation and create a free account to experience the powerful capabilities of the TrueFoundry AI Gateway firsthand. If anything is unclear, watch the other feature walkthroughs in this playlist, or contact us using the details below.
Link to documentation: https://docs.truefoundry.com/gateway/
Create a free account: https://www.truefoundry.com/register
Contact: Navigate to chat at https://www.truefoundry.com/ and the team will get back to you
_____________________________________
Chapters
00:00 The Case for Self-Hosted Models
00:20 Fundamental Deployment Concepts in TrueFoundry
01:50 The "How" of Deploying Self-Hosted Models
03:04 How You Can Self-Host Models in TrueFoundry
03:42 3 Deployment Examples of Self-Hosted Models
04:24 Platform Demo of Cluster, Workspace, Repository, Environment
05:52 Platform Demo of Deployment Interface
06:25 Self-Hosted Model Deployment #1 - DeepSeek-R1-Distill-Qwen-1.5B
07:08 Model Serving and Machine Configurations Available
07:49 Back to Self-Hosted Model Deployment #1
10:15 Deployment Options with Python/YAML/Direct UI
11:12 Self-Hosted Model Deployment #2 - Mixtral-8x22B-Instruct-v0.1
11:31 How to Add Secrets for Gated Models in TrueFoundry
12:34 Back to Self-Hosted Model Deployment #2
13:08 Self-Hosted Model Deployment #3 - Llama-3.2-3B-Instruct (with Service Deployment)
15:04 Add Models within the AI Gateway
16:20 Inference of Self-Hosted Models with Playground / Application Code
16:38 Closing Notes
Explore TrueFoundry AI Gateway
https://www.truefoundry.com/ai-gateway
Explore TrueFoundry MCP Gateway
https://www.truefoundry.com/mcp-gateway
_____________________________________
ABOUT TRUEFOUNDRY
TrueFoundry provides a low-latency enterprise AI Gateway with MCP server integrations, guardrails, observability, latency-based routing, model fallback, caching, quotas, and access control (on-premise, VPC, or cloud). The company also provides a robust enterprise LLMOps platform for model serving, inference, and efficient fine-tuning.
TrueFoundry is a cloud-native PaaS for machine learning teams to build, deploy, and ship ML/LLM applications on their own cloud or on-premise infrastructure in a faster, scalable, cost-efficient way with the right governance controls, allowing them to achieve 90% faster time to value than other teams.
TrueFoundry abstracts away the engineering required and offers GenAI accelerators - LLM Playground, LLM Gateway, LLM Deploy, LLM Finetune, RAG Playground, and Application Templates - that enable an organization to speed up the buildout of its overall GenAI/LLMOps framework. Enterprises can plug these accelerators into their internal systems, or build on top of them to offer an LLMOps platform of their choice to GenAI developers. TrueFoundry is modular and completely API-driven, with native integrations for popular tools such as LangChain, vector databases, and guardrails.
Website
https://www.truefoundry.com
#ai #modelcontextprotocol #agenticai #genai