Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting
CVPR Findings 2026
A study on preserving observable regions and generating unobservable areas to achieve temporally coherent and realistic video outpainting.
Video Understanding · Video Generation & Editing · Multimodal Learning
M.S. student in Electrical and Electronic Engineering at Yonsei University, researching video understanding, generation, and editing with a focus on multimodal learning.
My research focuses on intelligent systems for video, spanning video understanding, generation, and editing. I am particularly interested in developing scalable and controllable video creation technologies for complex dynamic scenes, with the long-term vision of supporting next-generation video production and streaming platforms.
Research Focus
Exploring intelligent systems across the video spectrum, including video understanding, generation, and editing, with an emphasis on multimodal learning and scalable video creation technologies.
About
I am an M.S. student in Electrical and Electronic Engineering at Yonsei University, advised by Prof. Sangyoun Lee. My research lies broadly in computer vision and artificial intelligence, with a focus on advancing intelligent systems for complex visual data.
I am particularly interested in models that can understand and generate dynamic visual content, reason about motion and long-range temporal structure, and operate robustly in real-world video environments.
My long-term goal is to develop scalable visual intelligence systems that can support next-generation video technologies and large-scale content creation platforms.
News
Co-authored paper under review at ECCV 2026.
Co-authored paper under review at ICIP 2026.
One first-author paper accepted to CVPR 2026 as a Findings paper.
One first-author paper accepted to ICIP 2025 as an oral presentation.
Began the M.S. program in Electrical and Electronic Engineering at Yonsei University.
Publications
CVPR Findings 2026
A study on preserving observable regions and generating unobservable areas to achieve temporally coherent and realistic video outpainting.
ICIP Oral 2025
A study on learning cross-modal representations between appearance and motion through a masking-based token modulation framework.
Research
Exploring intelligent systems that understand dynamic visual content, reason about motion, and capture long-range temporal structure in videos.
Topics: Temporal Reasoning, Temporal Grounding, Moment Retrieval
Developing controllable and temporally coherent systems for video creation, manipulation, and high-quality content synthesis in complex real-world scenes.
Topics: Motion Editing, Video Inpainting, Video Outpainting
Learning robust interactions across heterogeneous modalities to improve both understanding and generation in complex visual environments.
Topics: Text, Optical Flow, Speech
Studying robust methods for discovering and segmenting salient objects in videos, with a particular interest in unsupervised settings, motion cues, and dynamic scene parsing.
Topics: Unsupervised VOS, Motion-Guided Segmentation, Salient Object Discovery
Investigating scalable generative paradigms for visual synthesis, with an emphasis on controllability, fidelity, and temporal consistency.
Topics: Diffusion Models, Flow-based Models, Autoregressive Modeling
Building effective representations for videos that capture appearance, motion, and cross-modal structure for downstream analysis and generation.
Topics: Temporal Modeling, Motion Representation, Cross-Modal Representation
Contact
I am always happy to discuss research, collaboration, and Ph.D. opportunities, so please feel free to get in touch.