Frontiers of Offline Interactive Machine Learning: From Contextual Bandits to LLM Alignment



Category: Applied Mathematics
Date and time: 2025-10-29 16:00 ~ 17:00
Speaker: 전광성 (University of Arizona)
Other:
Host professor: 홍영준

At the heart of modern machine learning lies a fundamental challenge: how can an intelligent system not just learn from data but also decide which data to collect for learning? This is the essence of interactive machine learning (IML), a paradigm that encompasses reinforcement learning, contextual bandits, and active learning. Recently, the offline version of IML has gained popularity because the standard online version often cannot be run under real-world constraints. In this talk, I will present two recent advances in offline IML. First, I will discuss the contextual bandit problem, which has applications in recommendation systems. I will show how an improved confidence bound for [0,∞)-valued random variables translates into a superior learning algorithm, both in theory and in practice. Second, I will show that the LLM alignment problem is an instance of offline IML and that existing training objectives for it lack theoretical justification, leaving open whether they are the right ones to use. To address this, I will present a novel theoretical framework for alignment from which three different alignment algorithms are derived, each with theoretical guarantees, which is a strong form of justification. Surprisingly, two of them are very similar to the existing algorithms Direct Preference Optimization (DPO) and reinforcement learning from human feedback (RLHF), respectively. Together with our theoretical guarantees, our work can be seen as providing theoretical justification for DPO and RLHF, with minor corrections. Furthermore, our theory confirms the existing empirical finding that RLHF performs better than DPO. I will conclude with empirical results and exciting future research directions.


Research Institute of Mathematics
Room 305, Building 129, College of Natural Sciences, Seoul National University, Daehak-dong, Gwanak-gu, Seoul
Tel. 02-880-6562 / Fax. 02-877-6541 su305@snu.ac.kr
