Randomized Exploration for Non-Stationary Stochastic Linear Bandits

Video description: Randomized Exploration for Non-Stationary Stochastic Linear Bandits

"Randomized Exploration for Non-Stationary Stochastic Linear Bandits

Baekjin Kim (University of Michigan)*; Ambuj Tewari (University of Michigan)

We investigate two perturbation approaches to overcome the conservatism that optimism-based algorithms chronically suffer from in practice. The first approach replaces optimism with simple randomization when using confidence sets. The second adds random perturbations to the current estimate before maximizing the expected reward. For non-stationary linear bandits, where each action is associated with a $d$-dimensional feature vector and the unknown parameter is time-varying with total variation $B_T$, we propose two randomized algorithms, Discounted Randomized LinUCB (D-RandLinUCB) and Discounted Linear Thompson Sampling (D-LinTS), via the two perturbation approaches. We highlight the trade-off between statistical optimality and computational efficiency: the former asymptotically achieves the optimal dynamic regret $\tilde{\mathcal{O}}(d^{2/3} B_T^{1/3} T^{2/3})$, whereas the latter is oracle-efficient but incurs an extra logarithmic factor in the number of arms compared to the minimax-optimal dynamic regret. In a simulation study, both algorithms perform well in tackling the conservatism issue that Discounted LinUCB struggles with."
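
The second approach in the abstract (perturb the current estimate, then maximize) can be illustrated with a minimal sketch. This is not the authors' D-LinTS: it assumes a discounted regularized least-squares estimate with a hypothetical discount factor `gamma` and regularizer `lam`, and it substitutes a simple isotropic Gaussian perturbation of hypothetical scale `noise_scale` for the perturbation distribution used in the paper. The class and method names are illustrative only.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


class DiscountedPerturbedLinearBandit:
    """Illustrative sketch: discounted least squares plus a random perturbation.

    Assumption: isotropic Gaussian noise stands in for the paper's
    perturbation; gamma, lam and noise_scale are hypothetical parameters.
    """

    def __init__(self, dim, gamma=0.95, lam=1.0, noise_scale=0.1, seed=0):
        self.dim, self.gamma, self.lam = dim, gamma, lam
        self.noise_scale = noise_scale
        self.rng = random.Random(seed)
        # V = lam * I + discounted sum of x x^T; b = discounted sum of x * reward
        self.V = [[lam if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def estimate(self):
        """Discounted regularized least-squares estimate of the parameter."""
        return solve(self.V, self.b)

    def select(self, arms):
        """Perturb the estimate, then return the index of the best arm under it."""
        theta = self.estimate()
        theta_tilde = [t + self.noise_scale * self.rng.gauss(0.0, 1.0) for t in theta]
        scores = [sum(a_i * t_i for a_i, t_i in zip(a, theta_tilde)) for a in arms]
        return max(range(len(arms)), key=scores.__getitem__)

    def update(self, x, reward):
        """Down-weight past data by gamma, then add the new observation."""
        g, d = self.gamma, self.dim
        for i in range(d):
            for j in range(d):
                reg = (1.0 - g) * self.lam if i == j else 0.0  # keeps lam * I undiscounted
                self.V[i][j] = g * self.V[i][j] + x[i] * x[j] + reg
            self.b[i] = g * self.b[i] + x[i] * reward
```

A typical loop would call `select(arms)` with the current list of feature vectors, play the returned arm, and pass its features and observed reward to `update`. Discounting lets old observations fade, which is what allows the estimate to follow a time-varying parameter. Setting `noise_scale=0.0` recovers a purely greedy rule.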
