2510.18234 - DeepSeek OCR: Contexts Optical Compression

Machine Learning, Secure ML, Robust ML, Trustworthy AI, Machine Learning Security, Data Science

Video description

title: DeepSeek-OCR: Contexts Optical Compression
authors: Haoran Wei, Yaofeng Sun, Yukun Li
arXiv:2510.18234 - https://arxiv.org/abs/2510.18234

We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping. DeepSeek-OCR consists of two components: DeepEncoder and DeepSeek3B-MoE-A570M as the decoder. Specifically, DeepEncoder serves as the core engine, designed to maintain low activations under high-resolution input while achieving high compression ratios to ensure an optimal and manageable number of vision tokens. Experiments show that when the number of text tokens is within 10 times that of vision tokens (i.e., a compression ratio < 10x), the model can achieve decoding (OCR) precision of 97%. Even at a compression ratio of 20x, the OCR accuracy still remains at about 60%. This shows considerable promise for research areas such as historical long-context compression and memory forgetting mechanisms in LLMs. Beyond this, DeepSeek-OCR also demonstrates high practical value. On OmniDocBench, it surpasses GOT-OCR2.0 (256 tokens/page) using only 100 vision tokens, and outperforms MinerU2.0 (6000+ tokens per page on average) while using fewer than 800 vision tokens. In production, DeepSeek-OCR can generate training data for LLMs/VLMs at a scale of 200k+ pages per day (on a single A100-40G). Code and model weights are publicly accessible at http://github.com/deepseek-ai/DeepSee....
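The compression ratio in the abstract is the number of text tokens divided by the number of vision tokens. The short Python sketch below only illustrates that arithmetic. The function names `compression_ratio` and `min_vision_tokens` and the 1000-token page are hypothetical choices for this example and do not come from the DeepSeek-OCR code base.

```python
import math


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Return how many text tokens each vision token stands for."""
    if text_tokens < 0 or vision_tokens <= 0:
        raise ValueError("text_tokens must be >= 0 and vision_tokens must be > 0")
    return text_tokens / vision_tokens


def min_vision_tokens(text_tokens: int, max_ratio: float) -> int:
    """Return the smallest vision-token budget that keeps the ratio <= max_ratio."""
    if max_ratio <= 0:
        raise ValueError("max_ratio must be > 0")
    return math.ceil(text_tokens / max_ratio)


if __name__ == "__main__":
    # Hypothetical page holding 1000 text tokens (an assumption for illustration).
    page_text_tokens = 1000

    # 1000 text tokens rendered into 100 vision tokens is a 10x ratio.
    print(compression_ratio(page_text_tokens, 100))

    # Vision-token budgets for the two ratios mentioned in the abstract.
    print(min_vision_tokens(page_text_tokens, 10))  # budget at the 10x boundary
    print(min_vision_tokens(page_text_tokens, 20))  # budget at 20x
```

For this hypothetical page, a budget of 100 vision tokens sits at the 10x boundary and 50 vision tokens corresponds to 20x, the regime where the abstract reports accuracy of about 60%.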
#llms #ocr
