Studying Average Reward Reinforcement Learning via Anchored Value Iteration


Category: Ph.D. thesis presentation
Date: 2025-06-19 12:00 ~ 13:00
Speaker: Jongmin Lee (이종민), Seoul National University
Other:
Professor in charge: Myungjoo Kang (강명주)

Average-reward Markov decision processes (MDPs) provide a fundamental framework for long-term, steady-state decision-making. As reinforcement learning becomes central to deep learning and large language model research, interest in the average-reward setting has grown. However, average-reward MDPs are harder to analyze than their discounted-reward counterparts, and the literature remains sparse. This thesis advances the study of average-reward reinforcement learning through Anchored Value Iteration (Anc-VI), presenting three main contributions, in the tabular, generative-model, and offline RL settings.



Research Institute of Mathematics
Room 305, Building 129, College of Natural Sciences, Seoul National University, Daehak-dong, Gwanak-gu, Seoul
Tel. 02-880-6562 / Fax. 02-877-6541 su305@snu.ac.kr
