Authors:
Qing Lin; Bo Yan; Weimin Tan
Publication:
This paper is included in the Proceedings of the 29th ACM International Conference on Multimedia (ACM MM), October 20–24, 2021.
Abstract:
Glasses removal is a challenging task due to the diversity of glasses styles and the difficulty of obtaining paired datasets. Most existing methods either build separate models for different types of glasses or rely on expensive paired datasets for supervised training, and thus lack generality. In this paper, we propose a multimodal asymmetric dual learning method for unsupervised glasses removal. The method learns dual features from large-scale face images with and without glasses and requires no intensive manual annotation of the glasses. Given a face image with glasses, we aim to generate a glasses-free image that preserves the person's identity. To compensate for the lack of semantic features in the glasses region, we introduce a text description of the target image into the task and propose a text-guided multimodal feature fusion method. For better dual feature learning, we adaptively select the glasses-free image closest to the target one. We also propose an exchange residual loss to generate a more precise glasses mask. Extensive experiments show that our method generates realistic glasses-free images and better preserves person identity, which can benefit face recognition.
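To make the text-guided fusion idea concrete, below is a minimal sketch of one common way to inject a sentence embedding into spatial image features: broadcast the text vector over the feature map, concatenate, and mix with a 1x1 convolution. The abstract does not specify the paper's actual fusion architecture, so the `TextGuidedFusion` module, its dimensions, and the fusion operator here are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: the fusion operator (spatial broadcast +
# concatenation + 1x1 conv) is an assumption; the paper's exact
# text-guided multimodal fusion is not described in the abstract.
import torch
import torch.nn as nn


class TextGuidedFusion(nn.Module):
    def __init__(self, img_channels: int, text_dim: int):
        super().__init__()
        # Project the sentence embedding into the image feature space.
        self.text_proj = nn.Linear(text_dim, img_channels)
        # 1x1 convolution mixes the two modalities per spatial location.
        self.fuse = nn.Conv2d(img_channels * 2, img_channels, kernel_size=1)

    def forward(self, img_feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W); text_emb: (B, text_dim)
        b, c, h, w = img_feat.shape
        t = self.text_proj(text_emb)               # (B, C)
        t = t.view(b, c, 1, 1).expand(b, c, h, w)  # broadcast over space
        return self.fuse(torch.cat([img_feat, t], dim=1))


# Usage: fuse 256-channel image features with a 512-d text embedding.
fusion = TextGuidedFusion(img_channels=256, text_dim=512)
out = fusion(torch.randn(2, 256, 32, 32), torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 256, 32, 32])
```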