Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting
CVPR Findings 2026
A study on preserving observable regions and generating unobservable areas to achieve temporally coherent and realistic video outpainting.
Video Understanding · Video Generation & Editing · Multimodal Learning
M.S. student in Electrical and Electronic Engineering at Yonsei University, researching video understanding, generation, and editing with a focus on multimodal learning.
My research focuses on intelligent systems for video, spanning video understanding, generation, and editing. I am particularly interested in developing scalable and controllable video creation technologies for complex dynamic scenes, with the long-term vision of supporting next-generation video production and streaming platforms.
Research Focus
Exploring intelligent systems across the video spectrum, including video understanding, generation, and editing, with an emphasis on multimodal learning and scalable video creation technologies.
About
I am an M.S. student in Electrical and Electronic Engineering at Yonsei University, advised by Prof. Sangyoun Lee. My research lies broadly in computer vision and artificial intelligence, with a focus on advancing intelligent systems for complex visual data.
I am particularly interested in models that can understand and generate dynamic visual content, reason about motion and long-range temporal structure, and operate robustly in real-world video environments.
My long-term goal is to develop scalable visual intelligence systems that can support next-generation video technologies and large-scale content creation platforms.
News
Co-authored paper under review at ECCV 2026.
Co-authored paper under review at ICIP 2026.
One first-author paper accepted to CVPR 2026 as a Findings paper.
One first-author paper accepted to ICIP 2025 as an oral presentation.
Began the M.S. program in Electrical and Electronic Engineering at Yonsei University.
Publications
CVPR Findings 2026
A study on preserving observable regions and generating unobservable areas to achieve temporally coherent and realistic video outpainting.
ICIP Oral 2025
A study on learning cross-modal representations between appearance and motion through a masking-based token modulation framework.
Research
Exploring intelligent systems that understand dynamic visual content, reason about motion, and capture long-range temporal structure in videos.
Topics: Temporal Reasoning, Temporal Grounding, Moment Retrieval
Developing controllable and temporally coherent systems for video creation, manipulation, and high-quality content synthesis in complex real-world scenes.
Topics: Motion Editing, Video Inpainting, Video Outpainting
Learning robust interactions across heterogeneous modalities to improve both understanding and generation in complex visual environments.
Topics: Text, Optical Flow, Speech
Studying robust methods for discovering and segmenting salient objects in videos, with a particular interest in unsupervised settings, motion cues, and dynamic scene parsing.
Topics: Unsupervised VOS, Motion-Guided Segmentation, Salient Object Discovery
Investigating scalable generative paradigms for visual synthesis, with an emphasis on controllability, fidelity, and temporal consistency.
Topics: Diffusion Models, Flow-based Models, Autoregressive Modeling
Building effective representations for videos that capture appearance, motion, and cross-modal structure for downstream analysis and generation.
Topics: Temporal Modeling, Motion Representation, Cross-Modal Representation
Contact
I am always happy to discuss research, collaboration, and Ph.D. opportunities, so please feel free to get in touch.