[QA] Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs

The paper investigates extreme-token phenomena in transformer-based LLMs, traces attention sinks to an active-dormant mechanism in attention heads, and proposes strategies to mitigate their impact during pretraining.

https://arxiv.org/abs/2410.13835

YouTube: @arxivpapers

TikTok: arxiv_papers

Apple Podcasts: https://podcasts.apple.com/us/podcast...

Spotify: https://podcasters.spotify.com/pod/sh...
