Cross-CNN: An Animation Cross-Frame Sketch Colorization Algorithm Based on Hybrid Model with CNN and Transformer

YU Yi-feng; QIAN Jiang-bo; YAN Di-qun; WANG Chong; DONG Li

doi:10.12263/DZXB.20230622


Research article | Updated: 2025-12-24

- Acta Electronica Sinica, 2024, Vol. 52, No. 7, pp. 2491-2502
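- Affiliations:
  1. School of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315000, China
  2. Zhejiang Key Laboratory of Mobile Network Application Technology, Ningbo, Zhejiang 315000, China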
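- About the authors:
  - YU Yi-feng, male, born in August 1998 in Ningbo, Zhejiang Province. Master's student at the School of Information Science and Engineering, Ningbo University. His main research interests are computer vision and image colorization. E-mail: 2011082343@nbu.edu.cn
  - QIAN Jiang-bo, male, born in July 1974 in Ningbo, Zhejiang Province. Professor and doctoral supervisor at the School of Information Science and Engineering, Ningbo University. His main research interests are computer vision and data mining. E-mail: qianjiangbo@nbu.edu.cn
  - YAN Di-qun, male, born in July 1979 in Ningbo, Zhejiang Province. Currently an associate professor at the School of Information Science and Engineering, Ningbo University. His main research interests are deep learning and computer vision. E-mail: yandiqun@nbu.edu.cn
  - WANG Chong, male, born in February 1985 in Ningbo, Zhejiang Province. Currently an associate professor at the School of Information Science and Engineering, Ningbo University. His main research interests are computer vision and image/video processing. E-mail: wangchong@nbu.edu.cn
  - DONG Li, male, born in August 1990 in Zhoukou, Henan Province. Associate researcher at the School of Information Science and Engineering, Ningbo University. His main research interest is multimedia content. Chinese Institute of Electronics member number: E190036628M. E-mail: dongli@nbu.edu.cn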
- Funding:
  National Natural Science Foundation of China (62271274); Ningbo Science and Technology Project (2024Z004; 2023Z059)
- DOI: 10.12263/DZXB.20230622
  CLC number: TP391.41
- Received: 2023-07-03; Revised: 2023-10-25; Published in print: 2024-07-25
YU Yi-feng, QIAN Jiang-bo, YAN Di-qun, et al. Cross-CNN: An Animation Cross-Frame Sketch Colorization Algorithm Based on Hybrid Model with CNN and Transformer[J]. Acta Electronica Sinica, 2024, 52(07): 2491-2502. DOI: 10.12263/DZXB.20230622

Abstract

Coloring long sequences of animated sketch frames is a challenging task in computer vision. On the one hand, the information contained in sketches is sparse, so a coloring algorithm must infer the missing information. On the other hand, colors must stay consistent across consecutive frames to preserve the visual quality of the whole video. Most existing coloring algorithms are designed for single images and produce only one open-ended, plausible color result, which makes them unsuitable for coloring frame sequences. Other reference-based coloring algorithms do not organically connect the two frames, which leads to unsatisfactory coloring results. Within the same shot sequence, the features of the same object usually do not change much, so it is possible to design a model that automatically colors a sketch from a given reference frame. This paper therefore proposes Cross-CNN, a model that combines convolutional neural networks (CNN) with a Transformer. Cross-CNN finds and matches colors from the reference frame, thereby ensuring feature consistency along the temporal dimension. In this model, the reference frame and the sketch frame are stacked along the channel dimension and fed into a pre-trained ResNet50 network to extract locally fused features. The fused feature map is then passed to the Transformer structure, which encodes it to extract global features. Within the Transformer structure, a cross-attention mechanism is designed to better match long-distance features. Finally, a convolutional decoder with skip connections outputs the colored image. For the dataset, frames were extracted from eight movies and strictly screened, yielding 20 000 reference-sketch frame pairs for the experiments. Cross-CNN reaches an SSIM (Structural SIMilarity) of 0.932, which is 0.014 higher than the SOTA algorithm. The code for this paper is available at:

https://github.com/silenye/Cross-CNN
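As a rough illustration of the cross-attention idea mentioned in the abstract, the sketch below implements generic single-head scaled dot-product cross-attention in plain Python. It is not the Cross-CNN implementation (the repository linked above holds the authors' code): it omits learned projections, multiple heads, and the ResNet50 features. The choice of sketch features as queries and reference features as keys and values is an assumption, as are the names `sketch_tokens`, `reference_tokens`, `reference_colors`, and all toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic single-head scaled dot-product cross-attention.

    queries come from one source (assumed here: sketch-frame tokens);
    keys and values come from another (assumed here: reference-frame tokens).
    Returns one output vector and one weight row per query.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy data: 2-D features, RGB "colors" attached to reference tokens.
sketch_tokens = [[4.0, 0.0], [0.0, 4.0]]
reference_tokens = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
reference_colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

colors, weights = cross_attention(sketch_tokens, reference_tokens, reference_colors)
for token, color in zip(sketch_tokens, colors):
    print(token, [round(c, 3) for c in color])
```

In this toy setup each sketch token puts almost all of its attention on the reference token with the most similar feature vector and so picks up that token's color, wherever the token sits in the list. This is one way to read the abstract's claim that attention helps match features that are far apart.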


