Recent advances in diffusion models make it possible to render realistic videos. Upon closer inspection, however, such videos often contain artifacts: scene content appears and disappears, and parts of the scene that should be static can deform significantly. Precisely controlling the viewpoint and camera motion of the video is also challenging. In this project I will modify the video diffusion sampling process so that the rendered video is grounded in a 3D model of the scene. This is done by iteratively fitting a 3D model to the current sample of the diffusion model and using renders of that 3D model as a guidance signal that steers the sample towards the space of videos consistent with a single 3D scene. The method is training-free and does not require fine-tuning the diffusion model. It will be evaluated on the Tanks and Temples dataset using several baseline video diffusion models.
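To make the intended sampling loop concrete, the sketch below shows one way such 3D grounding could be wired into a DDIM-style sampler. It is only an illustrative sketch of the idea, not the finalized method: the helper callables `fit_scene` and `render_scene`, the `guidance_weight` parameter, and the linear blend of the predicted clean video towards its 3D render are all assumptions introduced here.

```python
# Minimal sketch of 3D-grounded guided sampling (assumptions noted above).
import torch

def grounded_sample(eps_model, fit_scene, render_scene, poses,
                    alphas_cumprod, shape, guidance_weight=1.0, device="cuda"):
    # eps_model(x, t): video diffusion noise predictor
    # fit_scene(frames, poses): fits a 3D scene model to predicted frames (hypothetical)
    # render_scene(scene, poses): renders the fitted scene from the same poses (hypothetical)
    x = torch.randn(shape, device=device)  # (T, C, H, W) video noise
    for t in range(len(alphas_cumprod) - 1, -1, -1):
        a_t = alphas_cumprod[t]
        eps = eps_model(x, t)                                 # predicted noise
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # predicted clean video

        # Fit a 3D scene to the current prediction, then render it back.
        scene = fit_scene(x0_pred.detach(), poses)
        x0_render = render_scene(scene, poses)

        # Steer the prediction towards the space of renders from a single 3D scene.
        x0_guided = x0_pred + guidance_weight * (x0_render - x0_pred)

        # Deterministic DDIM-style update using the guided prediction.
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0, device=device)
        x = a_prev.sqrt() * x0_guided + (1 - a_prev).sqrt() * eps
    return x
```

No pretrained model is modified here; the grounding enters only through the per-step blend of the predicted video with the 3D render, which is what makes the approach training-free in spirit.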