The Unreasonable Effectiveness of Deep Features as a Perceptual Metric.
🌏 Source
Downloadable at: Open Access - CVPR 2018. Source code is available at: GitHub - richzhang/PerceptualSimilarity.
The paper argues that widely used image quality metrics like SSIM and PSNR (mentioned in image-quality-assessment) are simple, shallow functions that may fail to account for many nuances of human perception. The paper introduces a new dataset of human perceptual similarity judgments to systematically evaluate deep features across different architectures and tasks and compare them with classic metrics.
The findings of this paper suggest that perceptual similarity is an emergent property shared across deep visual representations.
In this paper, the authors hypothesize that perceptual similarity is not a special function all of its own, but rather a consequence of visual representations tuned to be predictive about important structure in the world.
The paper suggests that, with this data, the metric can be further improved by calibrating the feature responses of a pre-trained network.
This content is less related to my interests, so I'll only cover it briefly.
The distance between reference and distorted patches is computed by extracting deep features from a network, unit-normalizing them in the channel dimension, scaling each channel by a learned weight vector w_l, taking the squared L2 distance, then averaging spatially and summing across layers.
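Below is a minimal PyTorch sketch of this distance, using torchvision's pre-trained AlexNet as the backbone. The per-channel weights here are placeholders (all ones) rather than the learned calibration weights, so it only illustrates the computation, not the paper's exact metric.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Pre-trained AlexNet; the five ReLU outputs in .features serve as feature stacks.
backbone = models.alexnet(weights="IMAGENET1K_V1").features.eval()
relu_ids = {1, 4, 7, 9, 11}

def extract_features(x):
    feats = []
    for i, layer in enumerate(backbone):
        x = layer(x)
        if i in relu_ids:
            feats.append(x)
    return feats

def lpips_like_distance(x, x0, weights=None):
    d = 0.0
    for l, (f, f0) in enumerate(zip(extract_features(x), extract_features(x0))):
        # Unit-normalize each feature map in the channel dimension.
        f, f0 = F.normalize(f, dim=1), F.normalize(f0, dim=1)
        # Placeholder per-channel weights; the paper learns these from human judgments.
        w = weights[l] if weights is not None else torch.ones(f.shape[1])
        diff = (w.view(1, -1, 1, 1) * (f - f0)) ** 2
        # Sum over channels, average over spatial positions, accumulate over layers.
        d = d + diff.sum(dim=1).mean(dim=(1, 2))
    return d

# Toy patches; real inputs would be normalized the way the backbone expects.
x = torch.rand(1, 3, 64, 64)
x0 = torch.rand(1, 3, 64, 64)
with torch.no_grad():
    print(lpips_like_distance(x, x0))
```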
The paper considers the following variants:
- lin: keep the pre-trained network weights fixed and learn linear weights on top of the intermediate activations,
- tune: initialize from a pre-trained classification network and fine-tune the whole network on the perceptual judgments,
- scratch: train the same architecture from scratch on the perceptual judgments.

Finally, the paper refers to these as variants of the proposed Learned Perceptual Image Patch Similarity (LPIPS) metric.
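The authors' released code (linked above) wraps the calibrated metric as a drop-in distance. A hedged usage sketch, assuming the pip-installable lpips package exposes the LPIPS(net=...) interface I recall from the repo's README:

```python
import torch
import lpips  # pip install lpips; interface assumed from the repo's README

# 'alex', 'vgg', or 'squeeze' select the backbone; linear calibration weights load by default.
loss_fn = lpips.LPIPS(net="alex")

# Inputs are (N, 3, H, W) tensors scaled to [-1, 1].
img0 = torch.rand(1, 3, 64, 64) * 2 - 1
img1 = torch.rand(1, 3, 64, 64) * 2 - 1

d = loss_fn(img0, img1)
print(d.item())  # smaller means more perceptually similar
```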
Figure 4 shows the performance of various low-level metrics (in red), deep networks, and the human ceiling (in black).
Scores on the 2AFC distortion preference test correlate highly with scores on the JND test, suggesting the two tests measure a shared aspect of human perception.
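For reference, here is how I understand the paper's 2AFC scoring (a sketch with made-up numbers, not the authors' evaluation code): for a triplet where a fraction p of humans preferred patch x1 over x0, a metric earns credit p if it also rates x1 as closer to the reference, and 1 - p otherwise.

```python
import numpy as np

def two_afc_score(d0, d1, p_human):
    """d0, d1: metric distances from the reference to patches x0 and x1.
    p_human: fraction of human judges that preferred x1."""
    metric_prefers_x1 = d1 < d0
    credit = np.where(metric_prefers_x1, p_human, 1.0 - p_human)
    return credit.mean()

# Toy example (numbers are invented, not from the paper's data).
d0 = np.array([0.42, 0.10, 0.33])
d1 = np.array([0.18, 0.25, 0.30])
p_human = np.array([0.9, 0.2, 0.6])
print(two_afc_score(d0, d1, p_human))  # ~0.77
```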
Pairs that BiGAN judges to be far apart but SSIM judges to be close generally contain some blur; conversely, BiGAN tends to treat correlated noise patterns as a smaller distortion than SSIM does.
The stronger a feature set is at classification and detection, the stronger it is as a model of perceptual similarity judgments.
Features that are good at semantic tasks are also good at self-supervised and unsupervised tasks, and provide good models of both human perceptual behavior and macaque neural activity.