CN114898021B - Intelligent cartoon method for music stage performance video - Google Patents

Intelligent cartoon method for music stage performance video

Info

Publication number
CN114898021B
CN114898021B (application CN202210812946.2A)
Authority
CN
China
Prior art keywords
cartoon
image
loss function
representing
generation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210812946.2A
Other languages
Chinese (zh)
Other versions
CN114898021A (en)
Inventor
朱春霖
姜秋晨子
廖勇
夏雄军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Normal University
Original Assignee
Hunan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Normal University filed Critical Hunan Normal University
Priority to CN202210812946.2A priority Critical patent/CN114898021B/en
Publication of CN114898021A publication Critical patent/CN114898021A/en
Application granted granted Critical
Publication of CN114898021B publication Critical patent/CN114898021B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06T 13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an intelligent cartoonization method for music stage performance videos, which comprises the following steps: step one, acquiring a real stage image data set and a cartoon image data set, and preprocessing the image data; step two, performing semantic segmentation of the characters, props and background in the music stage performance video; step three, constructing and training separate cartoon video generation models for the characters, the props and the background; step four, inputting the music stage performance video into the models to obtain a cartoonized music stage performance video; and step five, constructing a composite image harmonization model to perform image harmonization on the cartoonized music stage performance video. The invention can cartoonize music stage performance videos for use in fields such as music performance and animation production, and favours generating music stage performance videos with clean outlines, clear boundaries and harmonious colors.

Description

Intelligent cartoon method for music stage performance video
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an intelligent cartoon method for music stage performance videos.
Background
In recent years, with the continuous development of artificial intelligence, many algorithms have been applied in the field of image processing, for example image style transfer. The cartoon is currently a very popular artistic form and is widely used across society, including in advertising, games, film and television works, and photography. Young people today are strongly influenced by Japanese animation, and cartoons have a truly worldwide influence; however, because cartoons are drawn by hand and then rendered by computer, they take considerable time and effort and cannot be produced by people without a drawing background. Modern cartoon animation workflows therefore allow artists to create content from a variety of resources, and some well-known works are created by converting real-world pictures into usable cartoon scene material, a process called image cartoonization.
The image cartoonization method can also be applied to the field of music education. Rendering a music stage performance video with a cartoonization method presents the performance in a cartoon-style artistic form, which can attract children's interest in music stage performance. Although existing cartoonization methods are applied in many fields, applications in music education are scarce; in addition, existing methods cannot cartoonize characters, props and backgrounds at the same time, and do not harmonize the cartoonized characters, props and backgrounds into a unified cartoon animation.
Definitions of terms:
DCNN model based on semantic segmentation: a model that performs semantic segmentation of images using a deep convolutional neural network (DCNN).
Cartoonization model based on a generative adversarial network: a model that cartoonizes images using a generative adversarial network (GAN).
Disclosure of Invention
The invention aims to provide, in view of the shortcomings of existing image stylization methods, a novel intelligent cartoonization method for music stage performance videos, which semantically segments the different contents of a complex scene and cartoonizes them with different image stylization methods.
The purpose of the invention is realized by the following technical scheme:
an intelligent cartoon method for music stage performance videos comprises the following steps:
step one, acquiring image data and preprocessing the image data; the image data comprises a real stage image data set and a cartoon image data set; the real stage image data set is obtained from music stage performance videos;
step two, constructing a semantic segmentation model, wherein the semantic segmentation model performs semantic segmentation of the characters, props and background in the image data;
step three, constructing and training different cartoon video generation models for the music stage performance, for the characters, the props and the background respectively, to obtain a trained character cartoon video generation model, a trained prop cartoon video generation model and a trained background cartoon video generation model:
on the basis of a cartoonization model built on a generative adversarial network, forming a character cartoon video generation model, a prop cartoon video generation model and a background cartoon video generation model corresponding to the characters, the props and the background respectively;
3.1) The total loss function L_body of the character cartoon video generation model is as follows:

L_body = λ1·L_surface + λ2·L_structure + λ3·L_texture + λ4·L_content + λ5·L_tv + λ6·L_1

where λ1, λ2, λ3, λ4, λ5, λ6 are respectively the weights of the character surface information loss L_surface, the character structure information loss L_structure, the character texture information loss L_texture, the character content information loss L_content, the total variation loss L_tv and the l1 regularization term L_1; giving them different weights controls which information the generated image emphasizes;
3.11) The character surface information loss function L_surface is as follows:

[equation image: definition of L_surface]

Edge-preserving filtering is performed with a guided filter, denoted F_dgf, which takes an image I as input and, using the image itself as the guide map, returns the extracted surface representation F_dgf(I, I) with textures and details removed. A discriminator D_s is introduced to judge whether the model output and the reference cartoon image have similar surfaces and to guide the generator G to learn the information stored in the extracted surface representation; where G denotes the generator, D_s denotes the surface information discriminator, I_c denotes a cartoon image, and I_p denotes a real image;
3.12) The character structure information loss function L_structure is as follows:

[equation image: definition of L_structure]

High-level features extracted by a pre-trained VGG16 network are used to enforce spatial constraints between the character cartoon images generated by the character cartoon video generation model and the structural representations extracted from those generated images; F_st denotes the structural representation extraction applied to the generated character cartoon image, namely a selective search over picture regions followed by color filling of each region, and the high-level features extracted from the generated character cartoon image by the VGG network are used alongside it;
the color of each region is computed as a weighted sum of the region's median and mean pixel values:

[equation image: S_{i,j} expressed as a weighted sum of the mean and the median of the current region's pixel values, with the weights determined by the regional standard deviation σ(S)]

where S_{i,j} denotes the pixel value of the region at position (i, j), i denotes a row, j denotes a column, and σ(S) denotes the standard deviation of S;
3.13) The character texture information loss function L_texture is as follows:

[equation image: definition of L_texture]

where F_rcs denotes a random color shift algorithm that extracts a single-channel texture representation from a color image, and D_t denotes the texture discriminator;

the single-channel texture representation F_rcs(I_rgb) is extracted from the color image with the random color shift algorithm as follows:

[equation image: definition of F_rcs(I_rgb)]

where I_rgb denotes a 3-channel RGB color image, I_r, I_g and I_b denote the three color channels, and Y denotes the standard gray image converted from the RGB color image; the discriminator D_t is introduced to distinguish the texture representations extracted from the character cartoon images generated by the character cartoon video generation model from those of reference cartoon images, and to guide the generator to learn the clear contours and fine textures stored in the texture representation; α denotes the weight of the standard gray image, and β1, β2, β3 denote the weights of the r, g and b channels, each taking values in (−1, 1);
3.14) The character content information loss function L_content is as follows:

[equation image: definition of L_content, computed on the feature map of a VGG layer]

After initialization, l1 sparse regularization of the VGG feature maps of the input photograph and the generated picture is used to refine the semantic content loss;
3.15) The total variation loss function is as follows:

[equation image: definition of L_tv]

where H, W, C denote the spatial dimensions of the image, and the loss is built from the backward difference of the generated image in the horizontal direction and the backward difference in the vertical direction;
3.16) The l1 regularization term:

[equation image: L_1, defined via the one-norm of the character cartoon image generated by the character cartoon video generation model]
3.2) The total loss function L_prop of the prop cartoon video generation model is as follows:

L_prop = a·L_adv + b·L_con + c·L_tex + d·L_1 + e·L_IS

where a, b, c, d, e are the weights of L_adv, L_con, L_tex, L_1 and L_IS, which are respectively the edge-promoting adversarial loss function, the content information loss function, the texture information loss function, the l1 regularization term and the illumination smoothness loss;
3.21) Edge-promoting adversarial loss:

For each image c_i ∈ Sdata(c), the following three steps are applied: (1) a standard Canny edge detector detects edge pixels; (2) the edge region is dilated; (3) Gaussian smoothing is applied to the dilated edge region, yielding Sdata(e). Here Sdata(c) denotes the collection of cartoon images, Sdata(e) denotes the collection of cartoon images with clear boundaries removed, c_i denotes the i-th image of the cartoon image collection Sdata(c), e_j denotes the j-th image of the collection of cartoon images with clear boundaries removed, and p_k denotes the k-th image of the collection of images to be cartoonized;

thus, the edge-promoting adversarial loss function L_adv is as follows:

[equation image: definition of L_adv]

where the first term denotes the entropy of the discrete variable c_i under the probability distribution Sdata(c), the second term denotes the entropy of the discrete variable e_j under the probability distribution Sdata(e), and the third term denotes the entropy of the discrete variable p_k under the probability distribution Sdata(p); D denotes the discriminator, G denotes the generator, and G(p_k) denotes the image generated by the generator G;
3.22) The content information loss function L_con is as follows:

[equation image: definition of L_con, computed on the feature map of a VGG layer]

3.23) The texture information loss function L_tex is as follows:

[equation image: definition of L_tex]
3.3) The total loss function L_background of the background cartoon video generation model is as follows:

L_background = e·L_adv + f·L_con + g·L_str + h·L_1

where e, f, g, h are respectively the weights of L_adv, L_con, L_str and L_1 in the background cartoon video generation model;

the character cartoon video generation model, the prop cartoon video generation model and the background cartoon video generation model are trained separately so as to minimize their respective total loss functions, yielding the trained character cartoon video generation model, the trained prop cartoon video generation model and the trained background cartoon video generation model; the different semantically segmented parts of the video to be cartoonized are input into the trained character, prop and background cartoon video generation models respectively to obtain a character cartoon video, a prop cartoon video and a background cartoon video; each frame of the character cartoon video, the prop cartoon video and the background cartoon video is then composited to obtain a composite image, thereby obtaining a composite cartoonized music stage performance video;
step four, preprocessing the music stage performance video to be processed, segmenting out the characters, props and background with the semantic segmentation model, and then inputting them into the trained character cartoon video generation model, the trained prop cartoon video generation model and the trained background cartoon video generation model respectively to obtain a cartoonized music stage performance video;
and fifthly, constructing a composite image coordination model to carry out image harmony processing on the cartoon music stage performance video to obtain the final cartoon music stage performance video.
In a further improvement, in the first step, the preprocessing method includes image enhancement and image normalization.
In the second step, the semantic segmentation model is a DCNN model based on semantic segmentation;
firstly, a picture is fed into the DCNN model based on semantic segmentation, and dilated (hole) convolutions are added to extract features, yielding high-level semantic features and low-level semantic features; the dilated convolution is:

y[i] = Σ_k x[i + τ·k] · w[k]

where y[i] denotes the dilated convolution output at position i, x[i + τ·k] denotes the input at position i + τ·k, K denotes the length of the convolution kernel, w[k] denotes the k-th weight of the convolution filter of length K, and τ denotes the sampling step (dilation rate) over the input signal;
the low-level semantic features are the feature information obtained after one dilated convolution with dilation rate 1, and the high-level semantic features are the feature information obtained after four dilated convolutions; the extracted high-level semantic features are input into an atrous (hole) pyramid pooling module and convolved with dilated convolution layers of different dilation rates to obtain four feature maps, the dilation rates being 1, 6, 12 and 18 respectively; the extracted high-level semantic features are also pooled to obtain a further feature map; the five feature maps obtained from all branches are concatenated to obtain a first feature map;
the first feature map is put into a multi-layer channel attention module to obtain a second feature map; the second feature map is upsampled by bilinear interpolation and merged with the low-level semantic features to obtain a merged feature map; the decoder part recovers the spatial information of the merged feature map using 3×3 convolutions and upsamples by bilinear interpolation to refine the target boundary, obtaining the segmentation result;
since there are multiple objects in the image segmentation task, a multi-class cross-entropy loss function is used:

L_ce = − Σ_{i=1}^{C} y_i · log(p_i)

where p_i denotes the probability that the sample belongs to class i; y_i is the label indicator: when the sample belongs to class i, y_i = 1, and when the sample does not belong to class i, y_i = 0; C denotes the number of classes;
through the process, the characters and the props are separated from the stage background.
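As an illustration of the dilated-convolution feature extractor and the pyramid pooling described above, the following PyTorch sketch builds the five-branch module with dilation rates 1, 6, 12 and 18 plus an image-level pooling branch; the channel sizes, the 1×1 projection and other wiring details are assumptions, not the patent's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtrousSpatialPyramidPooling(nn.Module):
    """Five branches: dilation rates 1, 6, 12, 18 plus image-level pooling."""
    def __init__(self, in_ch: int, out_ch: int = 256):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in (1, 6, 12, 18)
        ])
        self.pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
        )
        self.project = nn.Conv2d(5 * out_ch, out_ch, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]       # four dilated maps
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        feats.append(pooled)
        # Concatenate the five feature maps -> the "first feature map" in the text.
        return self.project(torch.cat(feats, dim=1))
```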
In a further improvement, γ1 = 20 and γ2 = 40.
In a further improvement, the concrete steps of step five are as follows:

the composite image is decomposed into a reflectance intrinsic image and an illumination intrinsic image:

[equation image: the composite image expressed as the element-wise product of the reflectance intrinsic image and the illumination intrinsic image]

where ⊙ denotes the element-wise product;
the harmonization is embedded into the process from decomposing the composite image to reconstructing the real image through an image reconstruction loss function L_rec:

[equation image: definition of L_rec, an entropy value of a norm of the spatial distance between the output harmonized image and the real image]
taking the gradient of the reflectance of the harmonized image and the gradient of the harmonized image as constraints for coordinating the reflectance, a reflectance harmonization loss L_RH is generated:

[equation image: definition of L_RH, an entropy value of a norm of the difference between the reflectance gradient of the harmonized image and the gradient of the harmonized image]
[equation image: a constraint defined on the gradient ∇ of the harmonized intrinsic image]
to coordinate the illumination so that the illumination of the foreground and the background is compatible, the light is first learned and then transferred from the background to the foreground, on the premise that the image gradient corresponding to the illumination is smooth; a constraint that this gradient is approximately 0 provides the decoupling and yields the illumination smoothness loss;
the illumination harmonization loss L_IH is set as follows:

[equation image: definition of L_IH, an entropy value of a two-norm of the spatial distance between the harmonized intrinsic image and the real image]
an inharmony loss L_IF of the composite image is constructed:

[equation image: definition of L_IF]

where a similarity function is used; an encoder receives the composite image as input and outputs an inharmony feature map; C is the number of channels of that feature map, the entropy value is taken over the channels, and a gray-scale real image reduced to the same size as the feature map serves as the reference;
the total loss function L_harm is obtained as follows:

[equation image: L_harm, the weighted combination of L_rec, L_RH, L_IS, L_IH and L_IF]

Through training, the total loss function L_harm is minimized, obtaining the final harmonization processing model; the composite cartoonized music stage performance video is input into the final harmonization processing model to obtain the harmonized music stage performance video; λ_RH, λ_IS, λ_IH and λ_IF are respectively the weights of L_RH, L_IS, L_IH and L_IF.
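A minimal sketch of how the harmonization objective above could be assembled from the listed components; whether L_rec carries its own weight is not stated in the text, so it is added unweighted here as an assumption.

```python
def harmonization_total_loss(l_rec, l_rh, l_is, l_ih, l_if, lam):
    """L_harm assembled from the component losses of step five.
    `lam` holds the weights lambda_RH, lambda_IS, lambda_IH, lambda_IF;
    L_rec is added without a weight (an assumption)."""
    return (l_rec + lam["RH"] * l_rh + lam["IS"] * l_is
            + lam["IH"] * l_ih + lam["IF"] * l_if)
```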
The invention has the following advantages:
Compared with the prior art, the method can cartoonize a captured music stage performance video and apply different cartoonization to the characters, the props and the background. The edge-promoting adversarial loss and the l1 sparse regularization on high-level feature maps of the VGG network provide good flexibility for reproducing smooth shading. The image harmonization model is trained to harmonize the composited cartoon video so that the foreground is consistent with the background. The method favours generating clean contours, clear boundaries and harmonious colors, and the generated cartoonized music stage performance video can be widely applied in the field of music education, increasing children's interest in music.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a test video screenshot;
FIG. 3 is a video semantic segmentation result screenshot;
fig. 4 is a video screenshot after image harmonization.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following examples.
The invention relates to an intelligent cartoon method for music stage performance videos, which comprises the following steps:
the method comprises the following steps of firstly, acquiring a real stage image data set and a cartoon image data set, and preprocessing the image data:
and collecting a real scene image data set and a cartoon image data set, carrying out image preprocessing, and constructing a training set and a testing set.
Step two, performing semantic segmentation on characters, props and backgrounds in the music stage performance video:
and performing framing processing on the original music stage table performance video input by the user, and designing a DCNNs model based on the hole convolution to perform semantic segmentation on each frame image. Extracting features using the DCNNs model and predicting a label, e.g., a person, a background, a prop, etc.;
designing a loss function to measure the difference between the predicted label and the real label;
calculating the gradient of each layer's parameters according to the difference, and then updating the parameters;
repeating the previous steps until the predicted label and the real label reach a certain accuracy;
given a picture, each pixel will output a probability of different categories, thereby generating a corresponding mask to segment characters, props and backgrounds in the music stage performance video.
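A small Python sketch of the mask-generation step just described: per-pixel class scores are turned into per-class masks with a softmax/argmax rule (an assumed decision rule); the class names and the `segmentation_model` call are illustrative placeholders, not names from the patent.

```python
import numpy as np

def masks_from_logits(logits: np.ndarray,
                      class_names=("background", "person", "prop")):
    """Turn per-pixel class scores of shape (C, H, W) into one boolean mask
    per class, so characters, props and background can be processed separately."""
    exp = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs = exp / exp.sum(axis=0, keepdims=True)      # per-pixel probabilities
    labels = probs.argmax(axis=0)                     # winning class per pixel
    return {name: labels == i for i, name in enumerate(class_names)}

# Example (hypothetical model call):
# frame_logits = segmentation_model(frame)
# masks = masks_from_logits(frame_logits)
# person_pixels = frame * masks["person"][..., None]
```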
Step three, constructing and training different cartoon video generation models for the music stage performance, for the characters, the props and the background respectively:
A cartoonization model based on a generative adversarial network is designed and trained to cartoonize each kind of object.
A cartoonization model based on a generative adversarial network is constructed to cartoonize the characters, with the total loss function:

L_body = λ1·L_surface + λ2·L_structure + λ3·L_texture + λ4·L_content + λ5·L_tv + λ6·L_1

where λ1, λ2, λ3, λ4, λ5, λ6 are respectively the weights of the character surface information loss L_surface, the character structure information loss L_structure, the character texture information loss L_texture, the character content information loss L_content, the total variation loss L_tv and the l1 regularization term L_1; giving them different weights controls which information the generated image emphasizes.

1) Surface information loss:

[equation image: definition of L_surface]

Edge-preserving filtering is performed with a guided filter, denoted F_dgf, which takes an image I as input and, using the image itself as the guide map, returns the extracted surface representation F_dgf(I, I) with textures and details removed. A discriminator D_s is introduced to judge whether the model output and the reference cartoon image have similar surfaces and to guide the generator G to learn the information stored in the extracted surface representation.
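A minimal sketch of the surface-representation extraction F_dgf(I, I) using the guided filter from opencv-contrib (cv2.ximgproc); the radius and eps values are illustrative assumptions, and the discriminator comparison is only indicated in a comment.

```python
import cv2
import numpy as np

def surface_representation(img_bgr: np.ndarray,
                           radius: int = 5, eps: float = 0.01) -> np.ndarray:
    """Edge-preserving smoothing F_dgf(I, I): the image guides its own
    filtering, removing textures and fine details while keeping large
    color regions. Requires opencv-contrib-python (cv2.ximgproc)."""
    img = img_bgr.astype(np.float32) / 255.0
    # guide and source are the same image, as described above
    return cv2.ximgproc.guidedFilter(img, img, radius, eps)

# During training, the discriminator D_s would compare
# surface_representation(cartoon) with surface_representation(generator_output).
```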
2) Structure information loss:

[equation image: definition of L_structure]

High-level features extracted by the pre-trained VGG16 network are used to enforce spatial constraints between our results and the extracted structural representation; F_st denotes the structural representation extraction.

An adaptive filtering algorithm is used here in which median filtering is combined with mean filtering, computing each region's color as a weighted sum of its median and mean:

[equation image: the region color as a weighted sum of the region median and mean, with the weights switching at the thresholds γ1 and γ2 according to the regional standard deviation]

where γ1 = 20 and γ2 = 40.
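A sketch of the adaptive region-coloring step, mixing the region median and mean according to the regional standard deviation; only the thresholds γ1 = 20 and γ2 = 40 come from the text, while the specific weighting rule below is an assumption.

```python
import numpy as np

def region_color(pixels: np.ndarray,
                 gamma1: float = 20.0, gamma2: float = 40.0) -> np.ndarray:
    """Fill a segmented region (pixels of shape (N, 3)) with a single color
    obtained by mixing the region's median and mean, switching the mix
    according to its standard deviation. The weighting rule is assumed."""
    mean = pixels.mean(axis=0)
    median = np.median(pixels, axis=0)
    sigma = pixels.std()
    if sigma < gamma1:          # flat region: the median is representative
        w_mean, w_median = 0.0, 1.0
    elif sigma < gamma2:        # mixed region: average the two statistics
        w_mean, w_median = 0.5, 0.5
    else:                       # highly varied region: favour the mean
        w_mean, w_median = 1.0, 0.0
    return w_mean * mean + w_median * median
```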
3) Texture information loss:

[equation image: definition of L_texture]

A single-channel texture representation is extracted from the color image with a random color shift algorithm:

[equation image: definition of F_rcs(I_rgb)]

We set α = 0.8 and β1, β2, β3 ∼ U(−1, 1). A discriminator D_t is introduced to distinguish the texture representations extracted from the model output and from the cartoon images, and to guide the generator to learn the clear contours and fine textures stored in the texture representations.
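A sketch of the random color shift used to obtain the single-channel texture representation; the mixing formula below is an assumed form consistent with α = 0.8 as the gray-image weight and β1, β2, β3 ∼ U(−1, 1) as channel weights, not a formula quoted from the patent.

```python
import numpy as np

def random_color_shift(img_rgb: np.ndarray, alpha: float = 0.8,
                       rng: np.random.Generator | None = None) -> np.ndarray:
    """Collapse an RGB image (H, W, 3) to one randomly weighted channel so the
    texture discriminator D_t sees contours and textures without color."""
    rng = rng or np.random.default_rng()
    r, g, b = img_rgb[..., 0], img_rgb[..., 1], img_rgb[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b          # standard luma conversion
    b1, b2, b3 = rng.uniform(-1.0, 1.0, size=3)       # random channel weights
    # Assumed combination of the gray image and the randomly weighted channels.
    return alpha * gray + (1.0 - alpha) * (b1 * r + b2 * g + b3 * b)
```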
4) Content information loss:

[equation image: definition of L_content, computed on the feature map of a particular VGG layer]

After initialization, l1 sparse regularization of the VGG feature maps of the input photograph and the generated picture is used to refine the semantic content loss; l1 sparse regularization copes better with the effect of large style differences on the feature maps.
5) Total variation loss function:

[equation image: definition of L_tv]

The total variation loss L_tv imposes spatial smoothness on the generated image and also reduces high-frequency noise such as salt-and-pepper noise; in the formula, H, W, C denote the spatial dimensions of the image.
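A short PyTorch sketch of a total variation penalty of the kind described above; the absolute-difference form and the normalization by H·W·C are assumptions about the exact formula.

```python
import torch

def total_variation_loss(img: torch.Tensor) -> torch.Tensor:
    """Total variation penalty on a batch of generated images (N, C, H, W):
    mean absolute difference between horizontally and vertically adjacent
    pixels, encouraging spatial smoothness and suppressing salt-and-pepper
    noise as described in the text."""
    n, c, h, w = img.shape
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().sum()   # vertical differences
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().sum()   # horizontal differences
    return (dh + dw) / (n * c * h * w)
```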
6) l1 regularization term:

[equation image: L_1, the one-norm of the generated image]

A cartoonization model based on a generative adversarial network is constructed to cartoonize the props, with the total loss function:

L_prop = a·L_adv + b·L_con + c·L_tex + d·L_1 + e·L_IS

where a, b, c, d, e are weights balancing the given losses, and L_adv, L_con, L_tex, L_1, L_IS are respectively the adversarial loss function, the content information loss function, the texture information loss function, the l1 regularization term and the illumination smoothness loss.
1) Adversarial loss:

For each image c_i ∈ Sdata(c), we apply the following three steps: (1) a standard Canny edge detector detects edge pixels; (2) the edge region is dilated; (3) Gaussian smoothing is applied to the dilated edge region, yielding Sdata(e); a code sketch of this preparation follows the loss definition below. Thus, the edge-promoting adversarial loss function is as follows:
[equation image: definition of the edge-promoting adversarial loss L_adv]
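The three-step preparation of Sdata(e) can be sketched with OpenCV as follows; the Canny thresholds, dilation kernel and blur size are illustrative choices, not values given in the patent.

```python
import cv2
import numpy as np

def remove_clear_edges(cartoon_bgr: np.ndarray) -> np.ndarray:
    """Produce an Sdata(e)-style sample: detect edges with Canny, dilate the
    edge region, and blur only that region, so the sharp cartoon boundaries
    are smoothed away while the rest of the image is kept."""
    gray = cv2.cvtColor(cartoon_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                            # step (1)
    dilated = cv2.dilate(edges, np.ones((5, 5), np.uint8))       # step (2)
    blurred = cv2.GaussianBlur(cartoon_bgr, (5, 5), 0)           # step (3)
    mask = (dilated > 0)[..., None]
    return np.where(mask, blurred, cartoon_bgr)
```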
2) Content information loss:

[equation image: definition of L_con]

The content information loss ensures that the semantics of the cartoonized result and the input photo remain unchanged; it is also computed over the pre-trained VGG16 feature space.
3) Texture information loss:

[equation image: definition of L_tex]

A discriminator D_t is introduced to distinguish the texture representations extracted from the model output and from the cartoon images, and to guide the generator to learn the sharp contours and fine textures stored in the texture representations.
4) l1 regularization term:

[equation image: L_1, the one-norm of the generated image]
5) Illumination smoothness loss:

To coordinate the lighting, the foreground illumination needs to be adjusted to approximately match the background illumination, so that the illumination of the foreground and the background becomes compatible. We devise a new illumination strategy: learn the light first and then transfer the light from the background to the foreground, on the premise that the image gradient corresponding to the illumination is small (i.e. the illumination is smooth); a constraint that this gradient is approximately 0 provides the decoupling, yielding the illumination smoothness loss:

[equation image: definition of the illumination smoothness loss L_IS]

A cartoonization model based on a generative adversarial network is constructed to cartoonize the background, with the total loss function:

L_background = f·L_adv + g·L_con + h·L_str + i·L_1

where f, g, h, i are weights balancing the given losses, and L_adv, L_con, L_str, L_1 are respectively the adversarial loss function, the content information loss function, the structure information loss and the l1 regularization term.
1) Adversarial loss:

[equation image: definition of L_adv]

2) Content information loss:

[equation image: definition of L_con]

3) Structure information loss:

[equation image: definition of L_str]

High-level features extracted by the pre-trained VGG16 network are used to enforce spatial constraints between our results and the extracted structural representation; F_st denotes the structural representation extraction.

4) l1 regularization term:

[equation image: L_1, the one-norm of the generated image]

The three cartoonization models for the different objects, all based on generative adversarial networks, are trained with these loss functions until convergence.
Step four, inputting the music stage performance video into the model to obtain cartoon music stage performance video:
and inputting the music stage performance video into the music stage performance cartoon video generation model to obtain the music stage performance video with cartoon effect. Firstly, extracting original video frames by using opencv, then carrying out semantic segmentation on images of different categories by using the methods in the steps two, three, four and five on each image, carrying out cartoon processing on the images of different categories by using different style migration algorithms, and finally carrying out image harmony processing. And then reading and writing each frame of the harmonious music stage performance cartoon image into the video, and then obtaining a complete video after the harmonious music stage performance cartoon. And extracting the audio of the original video through movie, and adding the audio to the cartoon music stage performance video to obtain the final cartoon effect of the music stage performance video.
And fifthly, constructing a composite image coordination model to carry out image harmony processing on the cartoon music stage performance video, and acquiring the cartoon music stage performance video with harmonious colors:
and carrying out image harmony processing on the video after cartoonization by using transfer learning. And carrying out image harmony processing on the cartoon video generated by the model by using a new composite image harmony method, wherein incoordination is eliminated mainly through separable reflectivity and illumination intrinsic image harmony, so that foreground and background are better fused. Firstly, an automatic encoder-based framework is constructed, a composite image is decomposed into a reflectivity and an illumination inherent image, then the reflectivity is punished and coordinated through material consistency, meanwhile, illumination is coordinated through adjusting the compatibility of foreground illumination and a background, a coordination relation model between the foreground and the background is further established, the coordination of the inherent image is guided, a mask is used in the illumination and guidance processes to separate the foreground and the background, and finally, the input video is enabled to obtain a harmonious performance video through a trained model.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting the protection scope of the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (5)

1. An intelligent cartoon method for a music stage performance video, characterized by comprising the following steps:
step one, acquiring image data and preprocessing the image data; the image data comprises a real stage image data set and a cartoon image data set; the real stage image data set is obtained from music stage performance videos;
step two, constructing a semantic segmentation model, wherein the semantic segmentation model performs semantic segmentation of the characters, props and background in the image data;
step three, constructing and training different cartoon video generation models for the music stage performance, for the characters, the props and the background respectively, to obtain a trained character cartoon video generation model, a trained prop cartoon video generation model and a trained background cartoon video generation model:
on the basis of a cartoonization model built on a generative adversarial network, forming a character cartoon video generation model, a prop cartoon video generation model and a background cartoon video generation model corresponding to the characters, the props and the background respectively;
3.1) The total loss function L_body of the character cartoon video generation model is as follows:

L_body = λ1·L_surface + λ2·L_structure + λ3·L_texture + λ4·L_content + λ5·L_tv + λ6·L_1

where λ1, λ2, λ3, λ4, λ5, λ6 are respectively the weights of the character surface information loss L_surface, the character structure information loss L_structure, the character texture information loss L_texture, the character content information loss L_content, the total variation loss L_tv and the l1 regularization term L_1; giving them different weights controls which information the generated image emphasizes;

3.11) The character surface information loss function L_surface is as follows:

[equation image: definition of L_surface]

Edge-preserving filtering is performed with a guided filter, denoted F_dgf, which takes an image I as input and, using the image itself as the guide map, returns the extracted surface representation F_dgf(I, I) with textures and details removed. A discriminator D_s is introduced to judge whether the model output and the reference cartoon image have similar surfaces and to guide the generator G to learn the information stored in the extracted surface representation; where G denotes the generator, D_s denotes the surface information discriminator, I_c denotes a cartoon image, and I_p denotes a real image;
3.12) The character structure information loss function L_structure is as follows:

[equation image: definition of L_structure]

High-level features extracted by a pre-trained VGG16 network are used to enforce spatial constraints between the character cartoon images generated by the character cartoon video generation model and the structural representations extracted from those generated images; F_st denotes the structural representation extraction applied to the generated character cartoon image, namely a selective search over picture regions followed by color filling of each region, and the high-level features extracted from the generated character cartoon image by the VGG network are used alongside it;

the color of each region is computed as a weighted sum of the region's median and mean pixel values:

[equation image: S_{i,j} expressed as a weighted sum of the mean and the median of the current region's pixel values, with the weights determined by the regional standard deviation σ(S)]

where S_{i,j} denotes the pixel value of the region at position (i, j), i denotes a row, j denotes a column, and σ(S) denotes the standard deviation of S;
3.13) The character texture information loss function L_texture is as follows:

[equation image: definition of L_texture]

where F_rcs denotes a random color shift algorithm that extracts a single-channel texture representation from a color image, and D_t denotes the texture discriminator;

the single-channel texture representation F_rcs(I_rgb) is extracted from the color image with the random color shift algorithm as follows:

[equation image: definition of F_rcs(I_rgb)]

where I_rgb denotes a 3-channel RGB color image, I_r, I_g and I_b denote the three color channels, and Y denotes the standard gray image converted from the RGB color image; the discriminator D_t is introduced to distinguish the texture representations extracted from the character cartoon images generated by the character cartoon video generation model from those of reference cartoon images, and to guide the generator to learn the clear contours and fine textures stored in the texture representation; α denotes the weight of the standard gray image, and β1, β2, β3 denote the weights of the r, g and b channels, each taking values in (−1, 1);

3.14) The character content information loss function L_content is as follows:

[equation image: definition of L_content, computed on the feature map of a VGG layer]

After initialization, l1 sparse regularization of the VGG feature maps of the input photograph and the generated picture is used to refine the semantic content loss;
3.15) The total variation loss function is as follows:

[equation image: definition of L_tv]

where H, W, C denote the spatial dimensions of the image, and the loss is built from the backward difference of the generated image in the horizontal direction and the backward difference in the vertical direction;

3.16) The l1 regularization term:

[equation image: L_1, defined via the one-norm of the character cartoon image generated by the character cartoon video generation model]
3.2) The total loss function L_prop of the prop cartoon video generation model is as follows:

L_prop = a·L_adv + b·L_con + c·L_tex + d·L_1 + e·L_IS

where a, b, c, d, e are the weights of L_adv, L_con, L_tex, L_1 and L_IS, which are respectively the edge-promoting adversarial loss function, the content information loss function, the texture information loss function, the l1 regularization term and the illumination smoothness loss;

3.21) Edge-promoting adversarial loss:

For each image c_i ∈ Sdata(c), the following three steps are applied: (1) a standard Canny edge detector detects edge pixels; (2) the edge region is dilated; (3) Gaussian smoothing is applied to the dilated edge region, yielding Sdata(e). Here Sdata(c) denotes the collection of cartoon images, Sdata(e) denotes the collection of cartoon images with clear boundaries removed, c_i denotes the i-th image of the cartoon image collection Sdata(c), e_j denotes the j-th image of the collection of cartoon images with clear boundaries removed, and p_k denotes the k-th image of the collection of images to be cartoonized;

thus, the edge-promoting adversarial loss function L_adv is as follows:

[equation image: definition of L_adv]

where the first term denotes the entropy of the discrete variable c_i under the probability distribution Sdata(c), the second term denotes the entropy of the discrete variable e_j under the probability distribution Sdata(e), and the third term denotes the entropy of the discrete variable p_k under the probability distribution Sdata(p); D denotes the discriminator, G denotes the generator, and G(p_k) denotes the image generated by the generator G;
3.22) The content information loss function L_con is as follows:

[equation image: definition of L_con, computed on the feature map of a VGG layer]

3.23) The texture information loss function L_tex is as follows:

[equation image: definition of L_tex]

3.3) The total loss function L_background of the background cartoon video generation model is as follows:

L_background = e·L_adv + f·L_con + g·L_str + h·L_1

where e, f, g, h are respectively the weights of L_adv, L_con, L_str and L_1 in the background cartoon video generation model;

the character cartoon video generation model, the prop cartoon video generation model and the background cartoon video generation model are trained separately so as to minimize their respective total loss functions, yielding the trained character cartoon video generation model, the trained prop cartoon video generation model and the trained background cartoon video generation model; the different semantically segmented parts of the video to be cartoonized are input into the trained character, prop and background cartoon video generation models respectively to obtain a character cartoon video, a prop cartoon video and a background cartoon video; each frame of the character cartoon video, the prop cartoon video and the background cartoon video is then composited to obtain a composite image, thereby obtaining a composite cartoonized music stage performance video;
step four, preprocessing the music stage performance video to be processed, segmenting out the characters, props and background with the semantic segmentation model, and then inputting them into the trained character cartoon video generation model, the trained prop cartoon video generation model and the trained background cartoon video generation model respectively to obtain a cartoonized music stage performance video;
and fifthly, constructing a composite image coordination model to carry out image harmony processing on the cartoon music stage performance video to obtain the final cartoon music stage performance video.
2. The intelligent cartoonification method of a music stage performance video according to claim 1, wherein in the first step, the preprocessing method comprises image enhancement and image normalization.
3. The intelligent cartoonization method of a music stage performance video according to claim 1, wherein in the second step, the semantic segmentation model is a DCNN model based on semantic segmentation;
firstly, a picture is fed into the DCNN model based on semantic segmentation, and dilated (hole) convolutions are added to extract features, yielding high-level semantic features and low-level semantic features; the dilated convolution is:

y[i] = Σ_k x[i + τ·k] · w[k]

where y[i] denotes the dilated convolution output at position i, x[i + τ·k] denotes the input at position i + τ·k, K denotes the length of the convolution kernel, w[k] denotes the k-th weight of the convolution filter of length K, and τ denotes the sampling step (dilation rate) over the input signal;

the low-level semantic features are the feature information obtained after one dilated convolution with dilation rate 1, and the high-level semantic features are the feature information obtained after four dilated convolutions; the extracted high-level semantic features are input into an atrous (hole) pyramid pooling module and convolved with dilated convolution layers of different dilation rates to obtain four feature maps, the dilation rates being 1, 6, 12 and 18 respectively; the extracted high-level semantic features are also pooled to obtain a further feature map; the five feature maps obtained from all branches are concatenated to obtain a first feature map;

the first feature map is put into a multi-layer channel attention module to obtain a second feature map; the second feature map is upsampled by bilinear interpolation and merged with the low-level semantic features to obtain a merged feature map; the decoder part recovers the spatial information of the merged feature map using 3×3 convolutions and upsamples by bilinear interpolation to refine the target boundary, obtaining the segmentation result;

since there are multiple objects in the image segmentation task, a multi-class cross-entropy loss function is used:

L_ce = − Σ_{i=1}^{C} y_i · log(p_i)

where p_i denotes the probability that the sample belongs to class i; y_i is the label indicator: when the sample belongs to class i, y_i = 1, and when the sample does not belong to class i, y_i = 0; C denotes the number of classes;
through the process, the characters and the props are separated from the stage background.
4. The intelligent cartoonizing method for music stage performance video as claimed in claim 3, wherein γ1 = 20 and γ2 = 40.
5. The intelligent cartoonization method of a music stage performance video according to claim 1, wherein the concrete steps of the fifth step are as follows:
the composite image is decomposed into a reflectance intrinsic image and an illumination intrinsic image:

[equation image: the composite image expressed as the element-wise product of the reflectance intrinsic image and the illumination intrinsic image]

where ⊙ denotes the element-wise product;

the harmonization is embedded into the process from decomposing the composite image to reconstructing the real image through an image reconstruction loss function L_rec:

[equation image: definition of L_rec, an entropy value of a norm of the spatial distance between the output harmonized image and the real image]
taking the gradient of the reflectance of the harmonized image and the gradient of the harmonized image as constraints for coordinating the reflectance, a reflectance harmonization loss L_RH is generated:

[equation image: definition of L_RH, an entropy value of a norm of the difference between the reflectance gradient of the harmonized image and the gradient of the harmonized image]

[equation image: a constraint defined on the gradient ∇ of the harmonized intrinsic image]
to coordinate the illumination so that the illumination of the foreground and the background is compatible, the light is first learned and then transferred from the background to the foreground, on the premise that the image gradient corresponding to the illumination is smooth; a constraint that this gradient is approximately 0 provides the decoupling and yields the illumination smoothness loss;

the illumination harmonization loss L_IH is set as follows:

[equation image: definition of L_IH, an entropy value of a two-norm of the spatial distance between the harmonized intrinsic image and the real image]
an inharmony loss L_IF of the composite image is constructed:

[equation image: definition of L_IF]

where a similarity function is used; an encoder receives the composite image as input and outputs an inharmony feature map; C is the number of channels of that feature map, the entropy value is taken over the channels, and a gray-scale real image reduced to the same size as the feature map serves as the reference;

the total loss function L_harm is obtained as follows:

[equation image: L_harm, the weighted combination of L_rec, L_RH, L_IS, L_IH and L_IF]

Through training, the total loss function L_harm is minimized, obtaining the final harmonization processing model; the composite cartoonized music stage performance video is input into the final harmonization processing model to obtain the harmonized music stage performance video; λ_RH, λ_IS, λ_IH and λ_IF are respectively the weights of L_RH, L_IS, L_IH and L_IF.
CN202210812946.2A 2022-07-12 2022-07-12 Intelligent cartoon method for music stage performance video Active CN114898021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210812946.2A CN114898021B (en) 2022-07-12 2022-07-12 Intelligent cartoon method for music stage performance video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210812946.2A CN114898021B (en) 2022-07-12 2022-07-12 Intelligent cartoon method for music stage performance video

Publications (2)

Publication Number Publication Date
CN114898021A CN114898021A (en) 2022-08-12
CN114898021B true CN114898021B (en) 2022-09-27

Family

ID=82729610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210812946.2A Active CN114898021B (en) 2022-07-12 2022-07-12 Intelligent cartoon method for music stage performance video

Country Status (1)

Country Link
CN (1) CN114898021B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100334B (en) * 2022-08-24 2022-11-25 广州极尚网络技术有限公司 Image edge tracing and image animation method, device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112295211A (en) * 2019-07-31 2021-02-02 上海虞姿信息技术有限公司 Stage performance virtual entertainment practical training system and method
CN112561786A (en) * 2020-12-22 2021-03-26 作业帮教育科技(北京)有限公司 Online live broadcast method and device based on image cartoonization and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011045768A2 (en) * 2009-10-15 2011-04-21 Yeda Research And Development Co. Ltd. Animation of photo-images via fitting of combined models
US10671838B1 (en) * 2019-08-19 2020-06-02 Neon Evolution Inc. Methods and systems for image and voice processing
CN112070080A (en) * 2020-08-19 2020-12-11 湖南师范大学 Method for classifying cartoon characters playing songs based on Faster R-CNN
CN112102153B (en) * 2020-08-20 2023-08-01 北京百度网讯科技有限公司 Image cartoon processing method and device, electronic equipment and storage medium
CN112132922A (en) * 2020-09-24 2020-12-25 扬州大学 Method for realizing cartoon of images and videos in online classroom

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112295211A (en) * 2019-07-31 2021-02-02 上海虞姿信息技术有限公司 Stage performance virtual entertainment practical training system and method
CN112561786A (en) * 2020-12-22 2021-03-26 作业帮教育科技(北京)有限公司 Online live broadcast method and device based on image cartoonization and electronic equipment

Also Published As

Publication number Publication date
CN114898021A (en) 2022-08-12

Similar Documents

Publication Publication Date Title
Li et al. Low-light image enhancement via progressive-recursive network
CN109299274B (en) Natural scene text detection method based on full convolution neural network
CN110378985B (en) Animation drawing auxiliary creation method based on GAN
CN112887698B (en) High-quality face voice driving method based on nerve radiation field
CN111242841B (en) Image background style migration method based on semantic segmentation and deep learning
CN108830913B (en) Semantic level line draft coloring method based on user color guidance
CN111553837B (en) Artistic text image generation method based on neural style migration
CN112508991B (en) Panda photo cartoon method with separated foreground and background
CN111967533B (en) Sketch image translation method based on scene recognition
CN114898021B (en) Intelligent cartoon method for music stage performance video
Chen et al. A review of image and video colorization: From analogies to deep learning
Zhao et al. Cartoon image processing: a survey
Xiao et al. Image hazing algorithm based on generative adversarial networks
Zhao et al. Research on the application of computer image processing technology in painting creation
Xing et al. Diffsketcher: Text guided vector sketch synthesis through latent diffusion models
Yi et al. Animating portrait line drawings from a single face photo and a speech signal
Mun et al. Texture preserving photo style transfer network
Ye et al. Hybrid scheme of image’s regional colorization using mask r-cnn and Poisson editing
CN115018729A (en) White box image enhancement method for content
Ruan Anime Characters Generation with Generative Adversarial Networks
Chen et al. Image Colored-Pencil-Style Transformation Based on Generative Adversarial Network
Raut et al. Generative Adversial Network Approach for Cartoonifying image using CartoonGAN.
Ye et al. Method of Image Style Transfer Based on Edge Detection
Guo Oil painting art style extraction method based on image data recognition
Wang et al. Deep Learning in Computer Real-time Graphics and Image Using the Visual Effects of Non-photorealistic Rendering of Ink Painting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant