Generative models can produce impressively realistic images. This paper demonstrates that generated images have geometric features different from those of real images. We build a set of collections of generated images, prequalified to fool simple, signal-based classifiers into believing they are real. We then show that prequalified generated images can be identified reliably by classifiers that look only at geometric properties. We use three such classifiers. All three are denied access to image pixels and look only at derived geometric features. The first classifier looks at the perspective field of the image, the second looks at lines detected in the image, and the third looks at relations between detected objects and shadows. Our procedure detects generated images more reliably than state-of-the-art (SOTA) local-signal-based detectors, for images from a number of distinct generators. Saliency maps suggest that the classifiers can identify geometric problems reliably. We conclude that current generators cannot reliably reproduce geometric properties of real images.
Grad-CAM analysis of indoor scenes exposes geometric flaws. The input images contain object-shadow mismatches, and the Object-Shadow Grad-CAM highlights the regions that mark them as synthetic, such as misdirected shadows cast by furniture. The Perspective Fields Grad-CAM highlights anomalies such as misaligned lines and discrepancies along a cupboard top. These maps suggest that the classifiers respond to genuine geometric errors in AI-generated images.
Grad-CAM highlights key errors in outdoor images: inconsistent shadow directions among vehicles and structural distortions near vanishing points. This suggests that the classifiers localize geometric inconsistencies in AI-generated content.
Interior scenes generated by DALL-E 3, analyzed using the Object-Shadow and Perspective Fields cues, with Grad-CAM visualizations. The Object-Shadow Grad-CAM highlights mismatched shadow directions and lengths, while the Perspective Fields Grad-CAM highlights inaccuracies in line alignment and vanishing points in the room geometry.
Street scenes generated by Adobe Firefly show projective geometry inconsistencies, analyzed using the Object-Shadow and Perspective Fields cues. The Object-Shadow Grad-CAM highlights mismatched shadow directions and lengths, while the Perspective Fields Grad-CAM highlights line inconsistencies near vanishing points and in scene depth, particularly in road markings and building facades.
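The captions above refer to line inconsistencies near vanishing points. As background, the underlying projective constraint is that images of lines that are parallel in 3D should meet at a common vanishing point. The following is a minimal sketch of that constraint only, not the paper's learned classifier; the function names and the example segments are hypothetical, chosen for illustration.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p: Point, q: Point) -> Vec3:
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersection(l1: Vec3, l2: Vec3) -> Optional[Point]:
    """Intersection of two homogeneous lines, or None if parallel in the image."""
    x, y, w = cross(l1, l2)
    if abs(w) < 1e-12:
        return None
    return (x / w, y / w)


def point_line_distance(pt: Point, line: Vec3) -> float:
    """Perpendicular distance from a point to a homogeneous line."""
    a, b, c = line
    return abs(a * pt[0] + b * pt[1] + c) / math.hypot(a, b)


def vanishing_point_residual(segments: List[Segment]) -> Optional[float]:
    """Estimate a vanishing point from the first two segments, then return
    the largest distance from that point to any remaining segment's line.
    A value near zero means the segments agree on one vanishing point.
    (Hypothetical helper: assumes the segments are images of 3D-parallel lines.)"""
    if len(segments) < 3:
        raise ValueError("need at least three segments")
    lines = [line_through(p, q) for p, q in segments]
    vp = intersection(lines[0], lines[1])
    if vp is None:
        return None
    return max(point_line_distance(vp, ln) for ln in lines[2:])


# Illustrative segments (made-up coordinates): three that converge at one
# point, and a variant whose third segment is shifted by 10 pixels.
consistent = [((0.0, 0.0), (50.0, 25.0)),
              ((0.0, 100.0), (50.0, 75.0)),
              ((0.0, 50.0), (40.0, 50.0))]
inconsistent = consistent[:2] + [((0.0, 60.0), (40.0, 60.0))]
```

Calling `vanishing_point_residual` on each list gives a small residual for the converging set and a larger one for the shifted set. A hand-written check like this needs to know which lines are parallel in 3D, which is one reason a learned classifier over detected lines may be preferable in practice.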
ROC curves assessing projective geometry cues on AI-generated images from Kandinsky
ROC curves assessing projective geometry cues on AI-generated images from DeepFloyd, Firefly, SDXL, PixArt-alpha, and DALL-E 3
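For readers unfamiliar with ROC curves, the sketch below shows how such a curve and its area (AUC) can be computed from classifier scores. It is a generic illustration under stated assumptions, not the paper's evaluation code: the label convention (1 = generated, 0 = real), the function names, and the example scores are all hypothetical.

```python
from typing import List, Tuple


def roc_curve(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points obtained by
    sweeping the decision threshold from the highest score to the lowest.
    Assumed convention: label 1 = generated image, label 0 = real image."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Examples with tied scores cross the threshold together.
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up scores for six images; higher means "more likely generated".
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(roc_curve(example_labels, example_scores))
```

An AUC of 1.0 corresponds to perfect separation of generated and real images, and 0.5 to chance-level ranking.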
@InProceedings{Sarkar_2024_CVPR,
author = {Sarkar, Ayush and Mai, Hanlin and Mahapatra, Amitabh and Lazebnik, Svetlana and Forsyth, D.A. and Bhattad, Anand},
title = {Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {28140-28149}
}