Frontiers of Offline Interactive Machine Learning: From Contextual Ban…
| Category | Applied Mathematics |
|---|---|
| Date | 2025-10-29 16:00 ~ 17:00 |
| Speaker | 전광성 (University of Arizona) |
| Notes | |
| Host Professor | 홍영준 |
At the heart of modern machine learning lies a fundamental challenge: how can an intelligent system not just learn from data but also decide which data to collect for learning? This is the essence of interactive machine learning (IML) -- a paradigm that encompasses reinforcement learning, contextual bandits, and active learning. Recently, the offline version of IML has gained popularity because the standard online version often cannot be run due to real-world constraints. In this talk, I will present two recent advances in offline IML. First, I will discuss the contextual bandit problem, which has applications in recommendation systems. I will show how an improved confidence bound for [0,∞)-valued random variables translates into a superior learning algorithm, both in theory and in practice. Second, I will show that the LLM alignment problem is an instance of offline IML and that existing training objectives for it lack theoretical justification, leaving us wondering whether they are the right ones to use. To address this, I will present a novel theoretical framework for alignment from which three different alignment algorithms are derived along with theoretical guarantees, which is a strong form of justification. Surprisingly, two of them are very similar to existing algorithms called Direct Preference Optimization (DPO) and reinforcement learning from human feedback (RLHF), respectively. Together with our theoretical guarantees, our work can be seen as providing theoretical justification for DPO and RLHF, with minor corrections. Furthermore, our theory confirms the existing empirical finding that RLHF performs better than DPO. I will conclude with empirical results and exciting future research directions.
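
As background for the first part, a confidence bound typically enters an offline bandit method through a pessimism-style selection rule. The sketch below uses generic placeholders (the value estimate \hat{V}, the width U, and the failure probability δ) and does not reproduce the talk's specific improved bound for [0,∞)-valued rewards.

```latex
% Generic confidence-bound-based policy selection from logged (offline) data.
% \hat{V}(\pi): estimate of the value of policy \pi from the logged dataset.
% U(\pi): a high-probability confidence width; the improved bound for
% [0,\infty)-valued rewards discussed in the talk would be substituted here.
\hat{\pi} \in \operatorname*{arg\,max}_{\pi \in \Pi}
  \Big[\, \hat{V}(\pi) - U(\pi) \,\Big],
\qquad
\Pr\!\Big( \big|\hat{V}(\pi) - V(\pi)\big| \le U(\pi) \ \ \forall \pi \in \Pi \Big)
  \ge 1 - \delta .
```

Under this template, a tighter width U(π) narrows the gap between the selected policy and the best policy in Π, which is the generic mechanism by which a sharper confidence bound yields a better algorithm.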

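For the second part, the two existing objectives the abstract refers to are, in their widely used published forms, the KL-regularized RLHF objective and the DPO preference loss. They are shown below for reference only; the talk's own framework, derivations, and corrections are not reproduced here.

```latex
% Standard KL-regularized RLHF objective: maximize reward under a learned
% reward model r while staying close to a reference policy \pi_{\mathrm{ref}}.
\max_{\pi_\theta}\;
\mathbb{E}_{x \sim \mathcal{D}}\Big[
  \mathbb{E}_{y \sim \pi_\theta(\cdot\mid x)}\big[ r(x,y) \big]
  - \beta\, \mathrm{KL}\!\big( \pi_\theta(\cdot\mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot\mid x) \big)
\Big]

% Standard DPO loss on preference pairs (y_w preferred over y_l), which
% optimizes the same regularized objective without an explicit reward model.
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) =
- \mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
\left[ \log \sigma\!\left(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
\right) \right]
```

Here σ is the logistic function, β controls the strength of the regularization toward the reference policy π_ref, and (y_w, y_l) are the preferred and dispreferred responses in a preference pair.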