Studying Average Reward Reinforcement Learning via Anchored Value Iteration
| Category | PhD dissertation defense |
|---|---|
| Date | 2025-06-19 12:00 ~ 13:00 |
| Speaker | Jongmin Lee (Seoul National University) |
| Notes | |
| Host professor | Myungjoo Kang |
Average-reward Markov decision processes (MDPs) provide a fundamental framework for long-term, steady-state decision-making. As reinforcement learning becomes central to deep learning and large-language-model research, interest in the average-reward setting has grown. However, compared with its discounted-reward counterpart, the average-reward MDP is harder to analyze, and the literature remains sparse. This thesis advances the study of average-reward reinforcement learning through Anchored Value Iteration (Anc-VI), presenting three main contributions in the tabular, generative-model, and offline RL settings.
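To make the anchoring idea concrete, here is a minimal sketch of anchored value iteration on a hypothetical two-state MDP (not taken from the thesis). Each update mixes the current Bellman iterate with the initial point, `V_k = beta_k * V_0 + (1 - beta_k) * T(V_{k-1})`; the schedule `beta_k = 1/(k+1)` and the toy MDP are illustrative assumptions, not the thesis's exact construction.

```python
import numpy as np

# Hypothetical 2-state, 2-action average-reward MDP (illustration only).
# P[a, s, s'] = transition probability; r[s, a] = reward.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # action 0 in state 0: stay, reward 1
P[1, 0, 1] = 1.0  # action 1 in state 0: move to state 1, reward 0
P[0, 1, 0] = 1.0  # action 0 in state 1: move to state 0, reward 0
P[1, 1, 1] = 1.0  # action 1 in state 1: stay, reward 2
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])  # r[s, a]; optimal average reward is 2

def bellman(V):
    # Undiscounted Bellman optimality operator:
    # (TV)(s) = max_a [ r(s, a) + sum_s' P(s'|s, a) V(s') ]
    Q = r + np.einsum('ast,t->sa', P, V)
    return Q.max(axis=1)

def anchored_vi(V0, iters):
    # Anchored value iteration: each iterate is pulled back toward the
    # anchor V0 with a vanishing weight beta_k (illustrative schedule).
    V = V0.copy()
    for k in range(1, iters + 1):
        beta = 1.0 / (k + 1)
        V = beta * V0 + (1 - beta) * bellman(V)
    return V

V = anchored_vi(np.zeros(2), 1000)
err = bellman(V) - V          # Bellman error approaches (gain) * 1
gain = err.mean()             # estimated optimal average reward, near 2
span = err.max() - err.min()  # span seminorm of the Bellman error, near 0
```

In the average-reward setting the operator `T` has no fixed point (values grow linearly), so progress is measured through the Bellman error `T(V) - V`, which flattens toward the constant vector `gain * 1` as the iterates improve.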
