HAD: Hallucination-Aware Diffusion Priors for 3D Reconstruction

¹Amazon AWS  ²Clemson University
Project lead *Work done at Amazon

Abstract

Diffusion priors have recently demonstrated a strong capability to improve sparse-view 3D reconstruction by augmenting training views at novel viewpoints, but they inevitably introduce hallucinated content (artifacts inconsistent with the input views) into the final 3D model. To address this challenge, we propose the Hallucination-Aware Diffusion prior (HAD), which estimates pixel-wise hallucination score maps for augmented images by leveraging the multi-view reasoning capabilities of a feedforward novel view synthesis (NVS) network pre-trained on large-scale 3D data. These hallucination scores enable selective masking of unreliable pixels during the progressive 3D reconstruction procedure, preventing non-existent content from being baked into the 3D model. To further enhance performance, we create multiple versions of the augmented image at each novel view by conditioning the diffusion prior on different input views, and then fuse them into a final image that draws on the broader context of all input views. We show that our method substantially reduces hallucination artifacts in diffusion-assisted 3D reconstruction, achieving state-of-the-art performance on multiple novel view synthesis benchmarks.
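The multi-version fusion step can be illustrated with a small sketch. The abstract does not specify the fusion rule, so the per-pixel reliability weighting below (weight = 1 - hallucination score) is an assumption made only for illustration, as are the function name `fuse_versions`, the array layout, and the `eps` parameter. It is not the authors' implementation.

```python
import numpy as np

def fuse_versions(versions, scores, eps=1e-8):
    """Fuse K augmented images of one novel view into a single image.

    versions: (K, H, W, 3) array, one image per conditioning input view.
    scores:   (K, H, W) array of hallucination scores in [0, 1],
              where a higher score means a less reliable pixel.
    Assumption: pixels are blended with weights 1 - score. The source
    only says the versions are "fused"; it does not give the rule.
    """
    versions = np.asarray(versions, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)

    weights = 1.0 - scores                    # per-pixel reliability, (K, H, W)
    total = weights.sum(axis=0)               # (H, W)
    norm = weights / (total[None] + eps)      # normalize across the K versions

    # Broadcast the weights over the color channel and blend.
    return (norm[..., None] * versions).sum(axis=0)   # (H, W, 3)
```

With this weighting, a pixel that one version hallucinates but another renders reliably is dominated by the reliable version. A pixel that every version marks as fully unreliable comes out near zero, so it would still need to be masked downstream.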

Method Overview


We train 3DGS with input images and HAD-augmented novel views. HAD combines a pretrained diffusion prior (which generates images from 3DGS-rendered views conditioned on reference input images) with our hallucination score network (which predicts pixel-wise reliability maps). Our multi-sampling strategy fuses multiple generated versions into refined augmented views. Hallucination scores guide 3DGS optimization by masking out unreliable content, which improves reconstruction quality in data-sparse scenarios.
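The masking step described above can be sketched as a photometric loss that ignores pixels flagged as hallucinated. This is a minimal illustration under stated assumptions: the hard threshold of 0.5, the L1 photometric term, and the name `masked_l1_loss` are hypothetical choices, not details taken from the method. NumPy is used for readability; an actual 3DGS training loop would need a differentiable framework.

```python
import numpy as np

def masked_l1_loss(rendered, target, halluc_score, threshold=0.5):
    """L1 loss between a 3DGS render and an augmented view, computed only
    over pixels whose hallucination score is below `threshold`.

    rendered, target: (H, W, 3) arrays.
    halluc_score:     (H, W) array in [0, 1]; higher means less reliable.
    Assumption: a hard binary mask with a fixed threshold. The source
    only states that unreliable pixels are masked.
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    keep = np.asarray(halluc_score) < threshold    # True = trusted pixel
    n_keep = int(keep.sum())
    if n_keep == 0:
        return 0.0                                 # nothing to supervise

    # Average the absolute error over color channels, then over kept pixels.
    per_pixel = np.abs(rendered - target).mean(axis=-1)   # (H, W)
    return float(per_pixel[keep].sum() / n_keep)
```

Normalizing by the number of kept pixels rather than by the image size keeps the loss scale comparable between views with small and large masked regions.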

Hallucination Detection

Our hallucination scoring network can recognize artifacts introduced by diverse generative priors, including image diffusion, video diffusion, and multi-view diffusion models.

We evaluate whether the hallucination scoring module generalizes beyond the diffusion model used during training. Although HAD is trained on images generated by Difix3D+, which adopts image diffusion priors, we find that the learned scorer also identifies unreliable regions produced by other generative methods without additional fine-tuning, including GenFusion, which relies on video diffusion priors, and SVC, which uses multi-view diffusion. This suggests that the predicted scores are driven by underlying multi-view inconsistencies rather than by artifacts specific to a particular generator, enabling HAD to detect hallucinations from diverse diffusion models before they are fused into the 3D reconstruction.

Hallucination Removal Results

Our method effectively eliminates "dreaming" artifacts introduced by diffusion priors

DL3DV Dataset Results

Hallucination Analysis

Video Comparison

Difix3D (with hallucinations) | Ours (HAD) (mitigated hallucinations)

MipNeRF360 Dataset Results

Hallucination Analysis

Video Comparison

Difix3D (with hallucinations) | Ours (HAD) (mitigated hallucinations)

Qualitative Comparisons

DL3DV Dataset

Side-by-side comparison showing reconstruction quality improvements

Gsplat-MCMC | Ours (HAD)

MipNeRF360 Dataset

Evaluation on challenging 360° scenes

Gsplat-MCMC | Ours (HAD)