MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World

CVPR 2025

Ankit Dhiman 1,2*, Manan Shah 1* and R Venkatesh Babu 1

* Equal Contribution
1 Vision and AI Lab, IISc Bangalore
2 Samsung R & D Institute India - Bangalore

TL;DR: We introduce SynMirrorV2, a large-scale synthetic dataset containing 207K samples with full scene geometry, including depth maps, normal maps, and segmentation masks. We demonstrate that, with a training curriculum, MirrorFusionv2, a depth-conditioned generation network, can effectively generalize to real-world scenes.

Key features of SynMirrorV2 are:
Pose randomization (position, rotation, grounding)
Object pairing to enable complex multi-object scenes

Introduction

Despite remarkable progress in text-to-image generation, state-of-the-art methods fail to generate realistic mirror reflections. We task recent state-of-the-art models, Stable Diffusion 3.5 and FLUX, with generating a scene containing realistic and coherent reflections.

Prompt: A perfect plane mirror reflection of a mug which is placed in front of the mirror.

Prompt: A perfect plane mirror reflection of a stuffed toy bear which is placed in front of the mirror.

We observe that T2I methods fail at the challenging task of generating realistic and plausible mirror reflections: they either get the orientation of the object wrong in the reflection or fail to create a photo-realistic scene with a mirror placed in it. Inpainting methods also fail at this task. MirrorFusion, a recent method for generating realistic and controllable mirror reflections, likewise falls short on challenging real-world scenes, as apparent in the figure below.

Note: All the images were generated by prefixing the mirror text prompt "A perfect plane mirror reflection of " to the input object description.
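As a concrete illustration, the snippet below assembles such prompts and feeds them to a text-to-image pipeline. This is a minimal sketch assuming the Hugging Face diffusers StableDiffusion3Pipeline and the stabilityai/stable-diffusion-3.5-large checkpoint; the exact checkpoints and sampler settings behind the figures above are not specified on this page.

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Prefix used for all mirror prompts on this page.
PREFIX = "A perfect plane mirror reflection of "

# Assumption: SD 3.5 via diffusers; the page does not pin an exact checkpoint.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
).to("cuda")

object_descriptions = [
    "a mug which is placed in front of the mirror",
    "a stuffed toy bear which is placed in front of the mirror",
]

for i, desc in enumerate(object_descriptions):
    image = pipe(prompt=PREFIX + desc).images[0]
    image.save(f"sample_{i}.png")
```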

We demonstrate that a generative method trained on the SynMirrorV2 dataset performs well in complex multi-object scenes and effectively generalizes to real-world scenarios.

Dataset

We find that previous mirror datasets are insufficient for training a generative model, as they are not tailored to the mirror-reflection generation task. Further, the previously proposed SynMirror lacks key augmentations such as object grounding, rotation, and multi-object scenarios, which restricts models trained on it from generalizing to real-world scenes.

We introduce SynMirrorV2, a dataset enhanced with key augmentations such as object grounding, rotation, and support for multiple objects within a scene. To create the dataset, we use 3D assets from Objaverse and Amazon Berkeley Objects (ABO).
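For illustration, Objaverse assets can be fetched with the objaverse Python package, as in the sketch below; the exact asset subset and the ABO download procedure are not described here, so the selection is a placeholder.

```python
import random

import objaverse

# All Objaverse asset UIDs; we draw a small placeholder subset.
uids = objaverse.load_uids()
subset = random.sample(uids, 5)

# Download the assets and get a mapping {uid: local .glb path}.
objects = objaverse.load_objects(uids=subset)
print(objects)
```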

We employ BlenderProc to render each 3D object along with its corresponding depth map, normal map, and segmentation mask. For each object, we generate three random views and apply augmentations, including varied object placement and orientation relative to the mirror within the scene.
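A minimal BlenderProc sketch of this per-object rendering step is shown below: it loads one asset, places a single camera, and writes RGB together with depth, normal, and instance-segmentation outputs. The asset path, camera pose, and scene setup are placeholders, not the authors' exact pipeline.

```python
import blenderproc as bproc
import numpy as np

bproc.init()

# Placeholder asset path; in practice, one Objaverse/ABO object per scene.
objs = bproc.loader.load_obj("assets/object.obj")

# One of the three random views rendered per object.
cam_pose = bproc.math.build_transformation_mat(
    [0.0, -2.5, 1.2],         # camera location
    [np.pi / 2.2, 0.0, 0.0],  # Euler rotation
)
bproc.camera.add_camera_pose(cam_pose)

# Request the auxiliary outputs shipped with each SynMirrorV2 sample.
bproc.renderer.enable_depth_output(activate_antialiasing=False)
bproc.renderer.enable_normals_output()
bproc.renderer.enable_segmentation_output(map_by=["instance"])

data = bproc.renderer.render()
bproc.writer.write_hdf5("output/", data)
```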

Our dataset generation pipeline introduces key augmentations such as random positioning, rotation, and grounding of objects within the scene using the 3D-Positioner. Additionally, we pair objects in semantically consistent combinations to simulate complex spatial relationships and occlusions, capturing realistic interactions for multi-object scenes.
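The following is a hypothetical numpy sketch of the kind of pose sampling the 3D-Positioner performs: a random position in front of the mirror, a random yaw rotation, and grounding by resting the object's bounding box on the floor. All ranges and the helper name augment_pose are illustrative, not the actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_pose(bbox_min_z: float):
    """Return (position, yaw) for one object; bbox_min_z is the lowest
    point of the object's bounding box in its local frame."""
    x = rng.uniform(-0.6, 0.6)          # lateral offset along the mirror
    y = rng.uniform(0.8, 2.0)           # distance in front of the mirror
    yaw = rng.uniform(0.0, 2 * np.pi)   # random rotation about the up-axis
    z = -bbox_min_z                     # grounding: rest the object on the floor
    return np.array([x, y, z]), yaw

position, yaw = augment_pose(bbox_min_z=-0.12)
print(position, yaw)
```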

Dataset            | Type             | Size     | Attributes
-------------------|------------------|----------|--------------------------------------------------------------------------
MSD                | Real             | 4,018    | RGB, Masks
Mirror-NeRF        | Real & Synthetic | 9 scenes | RGB, Masks, Multi-View
DLSU-OMRS          | Real             | 454      | RGB, Masks
TROSD              | Real             | 11,060   | RGB, Masks
PMD                | Real             | 6,461    | RGB, Masks
RGBD-Mirror        | Real             | 3,049    | RGB, Depth
Mirror3D           | Real             | 7,011    | RGB, Masks, Depth
SynMirror          | Synthetic        | 198,204  | Single fixed objects: RGB, Depth, Masks, Normals, Multi-View
SynMirrorV2 (Ours) | Synthetic        | 207,610  | Single + multiple objects: RGB, Depth, Masks, Normals, Multi-View, Augmentations

A comparison between SynMirrorV2 and other mirror datasets. SynMirrorV2 surpasses existing mirror datasets in terms of attribute diversity and variability.

Qualitative Results

Comparison with MirrorFusion.
We compare our method with the baseline MirrorFusion on MirrorBenchV2. The baseline struggles with pose variations even in single-object scenes, and fails to produce accurate reflections for multiple objects. In contrast, our method handles variations in object orientation effectively and generates geometrically accurate reflections, even in complex multi-object scenarios.

Comparison on Real-World Dataset.
We show results for MirrorFusion, our method, and our method fine-tuned on the MSD dataset. We observe that our method generates reflections that capture the intricacies of complex scenes, such as a cluttered cable on a table and the presence of two mirrors in the scene.