Tutorial on Generative Models for Content Creation, Reconstruction, and Beyond
The tutorial will take place at the Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP) 2025, organised by IIT Mandi, from Dec 17–20.
Overview 🚀
Abstract
Generative models have advanced rapidly in recent years, driving breakthroughs across image and video synthesis, 3D scene understanding, and even text generation. Among these, diffusion models have emerged as one of the most powerful and versatile generative frameworks. This tutorial provides a comprehensive introduction to diffusion models, covering their core principles, key design choices, and the evolution of architectures enabling state-of-the-art performance in modern computer vision tasks. We begin with foundational concepts and gradually build toward advanced topics, ensuring accessibility for newcomers while offering depth for experienced researchers. The tutorial is organized into three sections as outlined below:
[Section 1] Introduction to Diffusion Models and their application in Content Creation
Index
- What are Diffusion Models? (see the sketch after this index)
- Classifier guidance
- Latent Diffusion Model
- Conditional generation
  - ControlNet
- Diffusion Image Transformers
  - OminiControl
  - SeeThrough3D
- Applications
  - InstructPix2Pix
  - Continuous Editing Control
  - Personalization
    - PreciseControl
  - Text2Place
  - Depth Prediction
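As a quick preview of what "What are Diffusion Models?" covers, below is a minimal sketch of the DDPM training step: noise a clean image to a random timestep and train the network to predict that noise. The `model`, the linear schedule constants, and the epsilon-prediction parameterization are illustrative assumptions, not the tutorial's own code.

```python
# Minimal DDPM training-loss sketch (assumed linear schedule, eps-prediction).
# `model(x_t, t)` is a placeholder denoiser taking (B, C, H, W) images.
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative alpha-bar_t

def ddpm_loss(model, x0):
    """Sample a timestep, noise the clean image x0, and predict the noise."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod.to(x0.device)[t].view(-1, 1, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward process
    return F.mse_loss(model(x_t, t), noise)               # eps-prediction loss
```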
[Section 2] Diffusion Models for 3D Generation
Index
- Different 3D Representations
- Image Priors for Sparse-View Novel-View Synthesis
- 3D Generation using Image Diffusion Models
  - DreamFusion (see the sketch after this index)
  - DreamGaussian
- Large Reconstruction Models
- Diffusion Models for 3D Generation
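For a flavor of the DreamFusion line of work, here is a minimal sketch of Score Distillation Sampling (SDS), which optimizes 3D parameters through a frozen 2D diffusion model. `render` (a differentiable renderer over parameters `theta`) and `eps_model` (a pretrained text-conditioned denoiser) are hypothetical stand-ins, and the weighting is one common choice rather than the definitive one.

```python
# Score Distillation Sampling (SDS) sketch: optimize 3D params `theta` so
# renders look plausible to a frozen text-conditioned diffusion model.
# `render` and `eps_model` are placeholders, not a specific library API.
import torch

def sds_step(render, eps_model, theta, text_emb, alphas_cumprod, T=1000):
    x = render(theta)                         # image from current 3D params
    t = torch.randint(20, T - 20, (1,), device=x.device)
    a_bar = alphas_cumprod.to(x.device)[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x)
    x_t = a_bar.sqrt() * x + (1 - a_bar).sqrt() * noise
    with torch.no_grad():                     # diffusion model stays frozen
        eps_hat = eps_model(x_t, t, text_emb)
    w = 1 - a_bar                             # a common weighting choice
    grad = w * (eps_hat - noise)              # SDS gradient w.r.t. the render
    x.backward(gradient=grad)                 # pushes the gradient into theta
```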
[Section 3] Diffusion Models for Video Generation and Multimodal Applications
Index
- Video Generation
  - Different Video Models
  - Controlling Video Models
    - Camera Control
    - Motion Control
- Evaluation
  - VBench, EvalCrafter, VideoScore, etc.
  - Physics Evaluations
- Limitations
  - Fixed-Length Denoising
  - Non-Causal Video Generation
    - Diffusion Forcing (see the sketch after this index)
  - Inadequate Prompts, Motion Dynamics
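To make the fixed-length-denoising limitation and the Diffusion Forcing idea concrete, below is a sketch contrasting the two noising schemes: standard video diffusion gives every frame one shared timestep, while Diffusion Forcing assigns an independent noise level per frame, which is what enables causal, variable-length rollouts. Function names and the (B, F, C, H, W) layout are assumptions for illustration.

```python
# Shared-timestep noising (standard video diffusion) vs. per-frame noising
# (Diffusion Forcing). Video tensors are assumed to be (B, F, C, H, W).
import torch

def shared_timestep_noising(x0, alphas_cumprod, T=1000):
    """Standard video diffusion: all frames share one noise level, so the
    model must denoise the whole fixed-length clip jointly."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    a_bar = alphas_cumprod.to(x0.device)[t].view(-1, 1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * torch.randn_like(x0), t

def per_frame_timestep_noising(x0, alphas_cumprod, T=1000):
    """Diffusion Forcing: each frame draws its own timestep, so earlier
    frames can be nearly clean while later ones stay noisy."""
    B, n_frames = x0.shape[:2]
    t = torch.randint(0, T, (B, n_frames), device=x0.device)
    a_bar = alphas_cumprod.to(x0.device)[t].view(B, n_frames, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * torch.randn_like(x0), t
```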
Speakers
- Rishubh Parihar
- Ankit Dhiman
- Badrinath Singhal
- AS Anudeep
Prerequisites
To get the most out of the sessions, attendees should be comfortable with:
- Deep Learning Fundamentals: working knowledge of CNNs, RNNs/Transformers, and loss functions.
- Python & PyTorch/TensorFlow: ability to read and modify deep learning code.
Slides for the tutorial
(Updated Link) Here is the link for the slides. Please note that the link will be disabled 30 days after 17th Dec 2025.
Organisers
- Ankit Dhiman (PhD Student at Vision and AI Lab (VAL) at IISc, Bangalore)
- Badrinath Singhal (PhD Student at Vision and AI Lab (VAL) at IISc, Bangalore)
- AS Anudeep (Master's Student at IISc, Bangalore)
- Rishubh Parihar (PhD Student at Vision and AI Lab (VAL) at IISc, Bangalore)
Questions: For any inquiries regarding the tutorial content, please contact us at badrinaths@iisc.ac.in with the email subject “Tutorial at ICVGIP 2025”.