The following is a summary of “From Photos to Sketches – how humans and deep neural networks process objects across different levels of visual abstraction,” published in the February 2022 issue of the Journal of Vision by Singer et al.
For a study, researchers sought to investigate the ability of deep convolutional neural networks (CNNs) to generalize to abstracted object images, such as drawings, and to compare this ability with human behavior. Specifically, the study addressed seemingly conflicting findings in previous work: CNNs trained on natural images exhibited poor classification performance on drawings, whereas other studies demonstrated highly similar latent representations in the networks for abstracted and natural images.
The study analyzed the activation patterns of a CNN trained on natural images across a set of photographs, drawings, and sketches of the same objects and compared them to human behavior. In addition, the representational structure across levels of visual abstraction was examined in the early and intermediate layers of the network. The study also investigated texture bias in CNNs as a contributing factor to the dissimilar representational structure in later layers and the poor performance on drawings. Finally, the study tested the general utility of features learned on natural images in early and intermediate layers for recognizing drawings by fine-tuning late network layers with object drawings.
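The layer-wise comparison of activation patterns described above is commonly carried out as a representational similarity analysis (RSA). The following is a minimal sketch of such an analysis, assuming an ImageNet-pretrained VGG-16 from torchvision as the CNN and placeholder tensors `photos` and `drawings` holding depictions of the same objects in the same order; the model choice, layer indices, and variable names are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
import torch
from scipy.stats import spearmanr
from torchvision.models import vgg16, VGG16_Weights

# Placeholder inputs: tensors of shape (n_objects, 3, 224, 224) holding photos
# and drawings of the same objects in the same order (random data stands in here).
photos = torch.rand(20, 3, 224, 224)
drawings = torch.rand(20, 3, 224, 224)

model = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).eval()

def layer_activations(images, layer_idx):
    """Flattened activations of one convolutional layer for a batch of images."""
    x = images
    with torch.no_grad():
        for i, module in enumerate(model.features):
            x = module(x)
            if i == layer_idx:
                return x.flatten(start_dim=1).numpy()

def rdm(activations):
    """Representational dissimilarity matrix: 1 - Pearson correlation between objects."""
    return 1.0 - np.corrcoef(activations)

# Compare the representational structure for photos vs. drawings at an early,
# an intermediate, and a late convolutional layer (indices are illustrative).
for name, idx in [("early", 2), ("intermediate", 14), ("late", 28)]:
    rdm_photos = rdm(layer_activations(photos, idx))
    rdm_drawings = rdm(layer_activations(drawings, idx))
    upper = np.triu_indices_from(rdm_photos, k=1)
    rho, _ = spearmanr(rdm_photos[upper], rdm_drawings[upper])
    print(f"{name:12s} layer: RDM correlation (Spearman rho) = {rho:.2f}")
```

With real stimuli in place of the random tensors, a high correlation between the photo and drawing dissimilarity matrices at a given layer would indicate a similar representational structure across levels of abstraction at that processing stage.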
The study found a highly similar representational structure across levels of visual abstraction in early and intermediate layers of the CNN. However, this similarity did not translate to later stages in the network, resulting in low classification performance for drawings and sketches.
Furthermore, texture bias in CNNs was identified as a contributor to the dissimilar representational structure in late layers and the poor performance on drawings. Finally, the study showed that performance could be largely restored by fine-tuning late network layers with object drawings, demonstrating the general utility of features learned on natural images in early and intermediate layers for recognizing drawings.
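As a rough illustration of this fine-tuning step, the sketch below freezes the convolutional feature layers of an ImageNet-pretrained VGG-16 and retrains only its late, fully connected layers on a drawing dataset. The dataset path, number of classes, and training settings are hypothetical placeholders, not the authors' actual configuration.

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms, models

NUM_CLASSES = 42  # hypothetical number of object categories in the drawing set

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the early and intermediate feature layers learned on natural images.
for param in model.features.parameters():
    param.requires_grad = False

# Replace the final classification layer to match the drawing categories.
model.classifier[6] = nn.Linear(4096, NUM_CLASSES)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
# Hypothetical ImageFolder layout: one subdirectory of drawings per object class.
train_set = datasets.ImageFolder("drawings/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# Only the late, fully connected layers receive gradient updates.
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

Because only the classifier parameters are updated, any recovery of drawing recognition in such a setup rests entirely on the features that the frozen early and intermediate layers learned from natural images, which is the point the study's fine-tuning result makes.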
Generalization to abstracted images, such as drawings, is an emergent property of CNNs trained on natural images but is suppressed by domain-related biases that arise during later processing stages in the network. The study highlighted the importance of understanding the factors that contribute to the differences in representational structure between natural and abstracted images in CNNs and provided insights into how to improve the ability of CNNs to generalize to abstracted images.
Reference: jov.arvojournals.org/article.aspx?articleid=2778420