Thesis Information

Chinese Title: 基于ASAPNet的图像翻译模型优化方法研究 (Research on Optimization Methods for ASAPNet-Based Image Translation Models)

Name: 李欢 (Li Huan)

Student ID: 1049732002605

Confidentiality Level: Public

Thesis Language: chi (Chinese)

Subject Code: 085212

Subject Name: Engineering - Engineering - Software Engineering

Student Type: Master's

University: 武汉理工大学 (Wuhan University of Technology)

Department: 计算机与人工智能学院 (School of Computer Science and Artificial Intelligence)

Major: Software Engineering

Research Directions: Deep Learning, Image Processing

First Supervisor: 李玉强 (Li Yuqiang)

First Supervisor's Department: 计算机与人工智能学院 (School of Computer Science and Artificial Intelligence)

Completion Date: 2023-03-29

Defense Date: 2023-05-20

Chinese Keywords: image translation; spatial correlation; focal frequency loss; data augmentation; co-training

Chinese Abstract:

Image translation, also known as image-to-image translation, is the task of converting one kind of input image into another kind of output image; it has very wide applications in computer vision. The development of deep learning has produced many image translation models built on generative adversarial networks (GANs). Among them, some models train quickly but leave room for improvement in translation quality, while others overfit easily and need large amounts of data to perform well. This thesis therefore takes ASAPNet, a classic model in fast image translation, as its foundation and improves its translation quality from the perspectives of objective-function optimization and model structure, while alleviating overfitting so that the model also translates well with limited data. The main research contents are as follows:

(1) To address the fact that ASAPNet's loss function cannot decouple image structure from appearance and lacks frequency-domain optimization, both of which degrade translation quality, this thesis proposes SF-ASAPNet, an image translation model based on spatial correlation and the focal frequency loss. The model uses two feature extractors to capture the self-similarity patterns of the input and output images, and it replaces ASAPNet's feature matching loss with a spatially-correlative loss to reduce differences in scene structure. At the same time, applying the focal frequency loss (FFL) supplies the frequency-domain constraint missing from image synthesis and complements the existing spatial losses, improving synthesis quality during translation. Finally, comparative experiments on public datasets against other representative image translation models verify the effectiveness of the SF-ASAPNet improvements.
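The two losses described above can be sketched compactly. The following is a minimal PyTorch sketch, assuming (B, C, H, W) tensors from some fixed feature extractor; the function names, the L1 comparison of similarity maps, and the normalization details are illustrative assumptions rather than the thesis's exact implementation.

import torch
import torch.nn.functional as F

def self_similarity(feat):
    # feat: (B, C, H, W) features from a fixed extractor (assumed, e.g. VGG).
    # Returns the pairwise cosine-similarity map between spatial locations.
    f = F.normalize(feat.flatten(2), dim=1)        # (B, C, HW), unit-norm per location
    return torch.bmm(f.transpose(1, 2), f)         # (B, HW, HW) self-similarity pattern

def spatial_correlation_loss(feat_x, feat_y):
    # Compare self-similarity patterns of input and translated images:
    # scene structure is constrained while appearance is free to change.
    return F.l1_loss(self_similarity(feat_x), self_similarity(feat_y))

def focal_frequency_loss(fake, real, alpha=1.0):
    # Frequency-domain penalty in the spirit of FFL: weight each spectral
    # coordinate by its normalized error so hard frequencies dominate.
    fake_f = torch.fft.fft2(fake, norm="ortho")
    real_f = torch.fft.fft2(real, norm="ortho")
    dist = (fake_f - real_f).abs() ** 2            # squared spectral distance
    weight = dist.sqrt() ** alpha                  # focal weighting matrix
    weight = weight / weight.amax(dim=(-2, -1), keepdim=True).clamp(min=1e-8)
    return (weight.detach() * dist).mean()         # gradients flow through dist only

In practice the (HW x HW) similarity map is usually computed over sampled patches rather than the full feature map to keep memory tractable.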

(2) To address ASAPNet's tendency to overfit and diverge in training when data is scarce, this thesis proposes RG-ASAPNet, an image translation model based on ReMix data augmentation and a generative co-training network. On the one hand, inspired by the ReMix data augmentation method, training samples are interpolated at the feature level, enlarging the sample pool while training proceeds; a new content loss built on the perceptual relations among samples makes the generator fit the feature-level samples rather than the training set, reducing generator overfitting. On the other hand, a generative co-training network replaces the original discriminator: multiple complementary discriminators with diversified parameters are trained jointly, each focusing on different information in the image, which reduces discriminator overfitting. Finally, extensive experiments verify that RG-ASAPNet effectively improves image generation quality in low-data settings and alleviates overfitting.
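A minimal sketch of both ideas follows, again in PyTorch. The Beta-distributed mixing coefficient, the seed-based parameter diversification, and in particular the exact form of the perceptual-relation content loss are assumptions made for illustration, not the thesis's or ReMix's exact formulas.

import torch
import torch.nn as nn
import torch.nn.functional as F

def remix_interpolate(latent, alpha=0.2):
    # Feature-level interpolation in the spirit of ReMix: mix each latent code
    # with a randomly paired one, creating new samples while training runs.
    lam = torch.distributions.Beta(alpha, alpha).sample((latent.size(0),)).to(latent.device)
    lam = lam.view(-1, *([1] * (latent.dim() - 1)))  # broadcast over feature dims
    perm = torch.randperm(latent.size(0), device=latent.device)
    return lam * latent + (1.0 - lam) * latent[perm], perm, lam

def remix_content_loss(dist_to_i, dist_to_j, lam):
    # One plausible form of the perceptual-relation content loss (an assumption):
    # the mixed output's relative perceptual distance to each endpoint output
    # should follow the mixing coefficient.
    ratio = dist_to_i / (dist_to_i + dist_to_j + 1e-8)
    return F.l1_loss(ratio, 1.0 - lam.flatten())

class CoTrainedDiscriminators(nn.Module):
    # Generative co-training sketch: several discriminators with diversified
    # initializations are trained jointly, each scoring the same image.
    def __init__(self, build_disc, num=3):
        super().__init__()
        self.discs = nn.ModuleList()
        for i in range(num):
            torch.manual_seed(i)       # crude parameter diversification (illustrative)
            self.discs.append(build_disc())

    def forward(self, img):
        # The generator must fool every member, which curbs discriminator overfitting.
        return [d(img) for d in self.discs]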

(3) This thesis uses the spatially-correlative loss and the focal frequency loss to improve ASAPNet's optimization objective, applies ReMix data augmentation on the generator side together with a revised training procedure, and employs the generative co-training network to improve ASAPNet's discriminator; these methods are combined organically so that they act on ASAPNet together. On this basis, a comprehensively improved model, SFRG-ASAPNet, is proposed. Experiments show that, compared with the original ASAPNet, SFRG-ASAPNet improves image translation quality without unduly increasing the average running time and still achieves good translation results with limited data.
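Putting the pieces together, one generator update for the combined model could look like the sketch below, reusing spatial_correlation_loss and focal_frequency_loss from the first sketch. G, feat (a frozen feature extractor), the non-saturating adversarial form, and the loss weights are placeholders, not the thesis's tuned configuration.

import torch
import torch.nn.functional as F

def generator_step(G, discriminators, feat, x, y_real, opt_G, w_sc=10.0, w_ffl=1.0):
    # One illustrative SFRG-ASAPNet-style update: the adversarial term is
    # averaged over the co-trained discriminators, then the structural and
    # spectral losses are added with placeholder weights.
    y_fake = G(x)
    adv = torch.stack([F.softplus(-out).mean()
                       for out in discriminators(y_fake)]).mean()
    loss = (adv
            + w_sc * spatial_correlation_loss(feat(x), feat(y_fake))
            + w_ffl * focal_frequency_loss(y_fake, y_real))
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()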


CLC Number: TP391.41

Barcode Number: 002000073845

Holdings Number: YD10001928

Holding Location: 203

Notes: 403 - West Campus Branch Library, master's and doctoral thesis collection; 203 - Yujiatou Branch Library, master's and doctoral thesis collection
