This project aims to improve the ability of diffusion models to generate 3D-consistent novel views of an object or scene. Current methods are limited in the geometric consistency of the views they generate, and their fine-tuning process is computationally expensive (up to 7 days on 8xA100 GPUs). We aim to investigate the latent space of these diffusion models and find efficient ways of incorporating geometric constraints to improve consistency, focusing specifically on improving results for whole scenes rather than single objects. During development, large amounts of intermediate data (3D reconstructions, image sequences, and videos) are created and required for development and evaluation, which leads to a need for substantial storage while these methods are being developed.