CN114037644A - Artistic digital image synthesis system and method based on generation countermeasure network - Google Patents

Artistic digital image synthesis system and method based on generation countermeasure network

Info

Publication number
CN114037644A
CN114037644A
Authority
CN
China
Prior art keywords
image
discriminator
network
texture
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111421417.1A
Other languages
Chinese (zh)
Other versions
CN114037644B (en)
Inventor
刘印全
陈庄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202111421417.1A priority Critical patent/CN114037644B/en
Publication of CN114037644A publication Critical patent/CN114037644A/en
Application granted granted Critical
Publication of CN114037644B publication Critical patent/CN114037644B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/40: Analysis of texture
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20212: Image combination
    • G06T2207/20221: Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention claims an artistic digital image synthesis system and method based on a generative adversarial network (GAN), comprising: an image preprocessing module, which preprocesses the artistic text image by a morphological method and limits the degree of font transformation when the network learns the font transformation; a structure transformation module, which comprises a generator and a discriminator, wherein the generator is a transformation network formed by sequentially connecting several convolutional layers, residual layers, and deconvolution layers, and the discriminator comprises two parts, a fully-connected discriminator and a fully-convolutional GAN discriminator; and a texture transformation module, which adds texture to the structure-transformed image through a cycle-consistent GAN. The invention exploits the adversarial property of the GAN to guide the generation of artistic text images of progressively better quality. The system can generate artistic text in various styles, generate posters of commercial products from keywords, and extract the style effects of artistic text within the network.

Description

Artistic digital image synthesis system and method based on generation countermeasure network
Technical Field
The invention belongs to the field of image processing, and particularly relates to an image synthesis technology.
Background
With the development of deep learning in artificial intelligence and the continuous iterative improvement of image style transfer and generative adversarial networks (GANs), artistic text style transfer has become a research hotspot. In a typical image style transfer process, given a reference image and a target image, a style transfer system migrates the style of the reference image onto the target image. Text-effect style transfer aims to render a text image with a style image, thereby producing a text effect. With the help of GANs, existing elaborately designed artistic text can be imitated and its text effect applied to other characters, satisfying different tasks. Posters of commercial products can be generated from keywords given by a user, so that the text better expresses the main characteristics of the product. Well-designed fonts with special effects are more attractive than ordinary fonts, can well reflect users' ideas and feelings, and meet the multi-faceted, multi-level demands of mass media in publications, advertisements, and software that supports custom fonts, such as the social software QQ and smartphone theme fonts.
General style transfer uses a convolutional neural network model trained on an image classification task. The trained classification model separates semantics and style in an image well: the outputs of different layers of the network serve as semantic loss and style loss, which are iteratively minimized by gradient descent, so that the generated picture simultaneously preserves the semantic information of the target image and the style information of the reference image.
Existing methods can produce ghosting artifacts for characters with complicated strokes, which interferes with users' recognition of the characters. The present method preprocesses the artistic text image with a morphological method to limit the degree of font transformation when the network learns the font transformation; adds a distance transformation loss to constrain the texture transfer process when training the texture network, so that shape and texture features are learned better; judges the quality of the pictures produced by the generator through a discriminator; and exploits the adversarial property of GANs to guide the generation of artistic text images of progressively better quality. The system can generate artistic text in various styles, generate posters of commercial products from keywords, and extract the style effects of artistic text within the network.
Upon retrieval, the closest prior art is CN111971689A, a computer-implemented method for synthesizing medical images using a trained statistical learning model, the method comprising: receiving medical imaging data obtained using a first imaging modality type; applying the trained statistical learning model to the received medical imaging data to synthesize a medical image corresponding to a different second imaging modality type; and providing the synthesized medical image for presentation or further processing; wherein the trained statistical learning model is built at least in part using a similarity determination between training imaging data provided at the model input and synthetic imaging data at the model output, the training imaging data corresponding to the first imaging modality type and the synthetic imaging data corresponding to the second imaging modality type; and wherein the trained statistical learning model is built at least in part using a separate statistical learning model built to discriminate between actual imaging data corresponding to the second imaging modality and the synthetic imaging data. That technology is an application of generative adversarial networks; however, in image style transfer, images must often be transformed between two different types and different domains, whereas the difference between the first-modality and second-modality images in that method is not very large.
The generative adversarial network used by the structure transformation module of the present invention can map the strokes of characters and the contour structures of other images into the same space, thereby overcoming the shortcomings of that method.
CN109637634B, a medical image synthesis method based on a generative adversarial network, relates to the field of image synthesis. A neural-network generator branch synthesizes a healthy image from a lesion image, removing the lesion from the lesion area; another generator branch synthesizes a lesion image from the healthy image, adding a lesion to a certain region of the healthy image; an adversarial loss function is constructed between the generated image and the real image according to the GAN model; to stabilize the training of the neural network, a cycle-consistency loss function is constructed between each lesion or healthy image and the image regenerated from it by the pair of generators; and to optimize the generated healthy image, a fidelity loss term is constructed on the non-lesion area shared by the lesion image and the generated healthy image. That technology uses a cycle-consistent generative adversarial network, which is common in style transfer; however, it is designed for medical images and lacks the structural transformation required for synthesizing artistic text style images. Because the strokes of characters must remain recognizable to users, clear and recognizable artistic digital images cannot be generated with that technology.
The texture transformation module of the present invention is similar in structure to the above technology. However, the texture and structure of the stylized image can be expressed on the generated artistic text only when the output of the structure transformation module is used as the input of the texture transformation module.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art by providing an artistic digital image synthesis system and method based on a generative adversarial network. The technical scheme of the invention is as follows:
An artistic digital image synthesis system based on a generative adversarial network, comprising: an image preprocessing module, which preprocesses the artistic text image and the style image by a morphological method, limits the degree of font transformation when the network learns the font transformation, calculates the distance from the background color to the character strokes and to the object contour in the character image, and distinguishes the degree of distance by contrasting colors;
a structure transformation module, which maps the character image and the style image under different preprocessing parameters into the same domain as the source image, so as to learn how to transform the edge contour of the character strokes into the contour of the style image; the module comprises a generator and a discriminator, wherein the generator is a transformation network formed by sequentially connecting several convolutional layers, residual layers, and deconvolution layers, and the discriminator comprises two parts, a fully-connected discriminator and a fully-convolutional generative adversarial network discriminator; the fully-connected discriminator calculates pixel-level loss values between the output image and the input image, and the fully-convolutional generative adversarial network discriminator evaluates the loss value over the whole output and input images;
a texture transformation module, which adds texture information to the structure-transformed image through a cycle-consistent generative adversarial network.
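The cycle-consistent adversarial setup used by the texture transformation module can be illustrated with a minimal numeric sketch (NumPy). The toy generators `g_xy` and `g_yx` below are hypothetical stand-ins for the module's trained networks, shown only to make the cycle-consistency objective concrete:

```python
import numpy as np

def cycle_consistency_loss(x, y, g_xy, g_yx):
    """L1 cycle loss: x -> G_XY(x) -> G_YX(G_XY(x)) should recover x,
    and symmetrically for y, as in cycle-consistent GANs."""
    x_rec = g_yx(g_xy(x))   # forward cycle: texture-free -> textured -> texture-free
    y_rec = g_xy(g_yx(y))   # backward cycle: textured -> texture-free -> textured
    return np.abs(x - x_rec).mean() + np.abs(y - y_rec).mean()

# Toy stand-in generators: add / remove a constant "texture" offset.
g_xy = lambda img: img + 0.5
g_yx = lambda img: img - 0.5

x = np.zeros((4, 4))        # texture-free structure image
y = np.full((4, 4), 0.5)    # textured image
print(cycle_consistency_loss(x, y, g_xy, g_yx))  # perfect cycles -> 0.0
```

When the two mappings invert each other exactly, the loss is zero; any inconsistency between the forward and backward cycles increases it, which is what stabilizes training of the texture network.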
Further, preprocessing the artistic text image by the morphological method, used to limit the degree of font transformation when the network learns the font transformation, specifically comprises:
applying an erosion operation from image processing, in which setting the size of the erosion convolution kernel limits the distance between strokes in the structure transformation phase.
Further, the loss of the generator of the structure transformation module is defined as follows. To train the generator $G_B$, let t be an image in the character data set, and take a parameter value l from the interval [0, 1] to represent different degrees of morphological transformation. The loss function of the generator is:

$$L_{G_B} = \mathbb{E}_{t,l}\left[\, \lVert G_B(t, l) - \tilde{t}_l \rVert_1 \,\right]$$

where $\mathbb{E}$ denotes the mathematical expectation, $L_{G_B}$ denotes the loss value of the generator of the structure transformation module, $G_B(t, l)$ denotes the output of the generator of the structure transformation module for the input image with l as the preprocessing parameter, t denotes a character image, and $\tilde{t}_l$ denotes the given smoothed (morphologically eroded) image of t under parameter l.

The structure transformation module also requires a discriminator $D_B$ to constrain the generator:

$$L_{D_B} = \mathbb{E}_{t,l}\left[\log D_B(\tilde{t}_l, l)\right] + \mathbb{E}_{t,l}\left[\log\left(1 - D_B(G_B(t, l), l)\right)\right]$$

$D_B$ learns to determine the authenticity of the input image and the given smoothed image $\tilde{t}_l$, and whether it matches the parameter l, where $L_{D_B}$ denotes the loss value of the discriminator of the structure transformation module. The total loss takes the form:

$$L_B = L_{G_B} + \lambda\, L_{D_B}$$

where $\lambda$ balances the pixel-level and adversarial terms.
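A numeric sketch of the loss terms just described (assuming the standard GAN log-loss form, an L1 pixel term, and a balancing weight `lam`; the constant-valued stand-in images are illustrative only):

```python
import numpy as np

def l1_pixel_loss(generated, target):
    """Pixel-level L1 loss, the term evaluated by the fully-connected discriminator."""
    return np.abs(generated - target).mean()

def gan_losses(d_real, d_fake, eps=1e-8):
    """Standard GAN log losses from scalar discriminator outputs in (0, 1).
    d_real: discriminator score on the real smoothed image;
    d_fake: discriminator score on the generator output."""
    d_loss = -np.log(d_real + eps) - np.log(1.0 - d_fake + eps)
    g_adv = -np.log(d_fake + eps)
    return d_loss, g_adv

gen = np.full((4, 4), 0.6)   # stand-in for G_B(t, l)
tgt = np.full((4, 4), 0.5)   # stand-in for the eroded target image
pixel = l1_pixel_loss(gen, tgt)
d_loss, g_adv = gan_losses(d_real=0.9, d_fake=0.2)
lam = 1.0                    # assumed balancing weight
total = pixel + lam * g_adv  # generator-side total, in the spirit of L_B
```

The discriminator is rewarded for scoring real smoothed images high and generated ones low, while the generator is pushed both toward pixel fidelity and toward fooling the discriminator.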
furthermore, a discriminator of the texture transformation module is of a fully-connected neural network structure, and the quality of the image generated by the generator is obtained through the output of the discriminator;
generator G of texture transformation moduleTD, discriminator DTThe losses are respectively:
Figure BDA0003377601300000048
Figure BDA0003377601300000049
Figure BDA00033776013000000410
refers to the mathematical expectation when the inputs are x and y, GT(x) Refers to the output of the generator of the texture transform module after the x-image is input. x means pre-treatedThe latter image without texture, and y denotes a normal image containing texture.
Further, a distance transformation loss module is also included, which calculates the distance loss between the character image C and the distance image D, and between the image X generated by the texture transformation module and the distance image D. The loss function is:

$$L_{dist} = \lVert \phi(C) - D \rVert_1 + \lVert \phi(X) - D \rVert_1$$

where $\phi(\cdot)$ denotes the distance transform.
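One way to realize the distance images used by this loss (a sketch; the two-pass Manhattan distance transform and the L1 comparison are assumptions, as the patent does not fix the metric, and `scipy.ndimage.distance_transform_edt` would be a common alternative):

```python
import numpy as np

def manhattan_distance_transform(binary):
    """Two-pass L1 (Manhattan) distance to the nearest foreground (1) pixel."""
    h, w = binary.shape
    big = h + w                                   # upper bound on any L1 distance
    d = np.where(binary > 0, 0, big).astype(float)
    for i in range(h):                            # forward pass: top-left to bottom-right
        for j in range(w):
            if i > 0: d[i, j] = min(d[i, j], d[i - 1, j] + 1)
            if j > 0: d[i, j] = min(d[i, j], d[i, j - 1] + 1)
    for i in range(h - 1, -1, -1):                # backward pass: bottom-right to top-left
        for j in range(w - 1, -1, -1):
            if i < h - 1: d[i, j] = min(d[i, j], d[i + 1, j] + 1)
            if j < w - 1: d[i, j] = min(d[i, j], d[i, j + 1] + 1)
    return d

def distance_loss(c, x):
    """Compare the distance fields of character image C and generated image X."""
    return np.abs(manhattan_distance_transform(c)
                  - manhattan_distance_transform(x)).mean()

c = np.zeros((5, 5)); c[2, 2] = 1     # single-pixel "stroke" in a character image
print(distance_loss(c, c))            # identical images -> 0.0
```

Penalizing discrepancies in the distance field, rather than raw pixels, discourages strokes from sticking together or leaving artifacts, since any thickening or displacement of a stroke changes the distances of all surrounding background pixels.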
an artistic digital image synthesis method based on a generation countermeasure network comprises the following steps: an image preprocessing step: preprocessing an artistic word picture and a style image by using a morphological method, and limiting the font transformation degree during network learning font transformation; calculating the distance between the background color and the character strokes and the outline of the object in the character image, and distinguishing the distance degree by the contrast color;
structure transformation: and mapping the character image and the style image under different preprocessing parameters to the same domain with the source image so as to learn how to transform the edge contour of the stroke of the character into the contour of the style image. The module comprises a generator and a discriminator, wherein the generator is a conversion network and is a network structure formed by sequentially connecting a plurality of convolution layers, a residual error layer and a deconvolution layer, and the discriminator comprises two parts, namely a fully-connected discriminator and a fully-convoluted generation countermeasure network discriminator; the fully-connected discriminator is used for calculating the loss values of the output image and the input image in pixel levels, and the fully-convolved generation countermeasure network discriminator is used for evaluating the loss values of the whole output image and the whole input image;
and (3) texture transformation: for adding texture information on the structurally transformed image by a loop-generating countermeasure network.
Further, the structure transformation step includes a training stage and a testing stage, which are respectively:
Training stage: input different convolution kernel sizes for the preprocessing erosion operation, turning the same character image into several different character images; after passing through the structure transformation module, calculate the pixel-level loss of the module's output against the homologous character image, and feed it into the discriminator to calculate the adversarial loss;
Testing stage: input the texture-removed style image obtained by image preprocessing, while using a parameter to constrain the degree of preprocessing; calculate the pixel-level loss of the module's output against the homologous character image, and feed it into the discriminator to calculate the adversarial loss.
Furthermore, a parameter is used to constrain the degree of preprocessing: the parameter controls the convolution kernel size of the erosion operation in the preprocessing stage, and thereby the degree of erosion. The pixel-level loss is computed as the L1 norm against the output homologous character image.
Further, the texture transformation step specifically includes:
Training stage: input the texture-removed style image obtained by image preprocessing, generate texture with the generator of the cycle-consistent generative adversarial network, and calculate the loss of the generated image with the discriminator, so as to color the style image with texture;
Testing stage of the texture transformation module: input the character image processed by image preprocessing, and directly use the generator of the cycle-consistent generative adversarial network to generate texture, completing the generation of the artistic characters.
The invention has the following advantages and beneficial effects:
The innovations of the invention lie mainly in claims 2 and 5. The image erosion operation of claim 2 maps the stroke structure contour and the structure contour of the style image into the same domain better, improving the quality of the generated image. The distance transformation loss module of claim 5 solves the problems of inter-stroke sticking and artifacts around text images that arise in prior-art methods when generating artistic digital images for text with complicated stroke structures.
Drawings
FIG. 1 is a schematic diagram of a configuration transformation module according to a preferred embodiment of the present invention;
FIG. 2 is a schematic diagram of the texture transformation module;
FIG. 3 is a schematic diagram of the distance transformation loss module;
FIG. 4 is a schematic diagram of an artistic digital image synthesis system and method based on a generative confrontation network.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
As shown in FIGS. 1-3, an artistic digital image synthesis system based on a generative adversarial network mainly comprises three modules:
1. An image preprocessing module: produces the distance transformation image of the character image and of the morphologically eroded image, together with the distance transformation loss module;
2. A structure transformation module: the module comprises a generator and a discriminator, wherein the generator is a transformation network formed by sequentially connecting several convolutional layers, residual layers, and deconvolution layers, and the discriminator comprises two parts, a fully-connected discriminator and a PatchGAN discriminator.
Loss of the generator of the structure transformation module: to train the generator $G_B$, let t be an image in the character data set, and take a parameter value l from the interval [0, 1] to represent different degrees of morphological transformation. The loss function of the generator is:

$$L_{G_B} = \mathbb{E}_{t,l}\left[\, \lVert G_B(t, l) - \tilde{t}_l \rVert_1 \,\right]$$

The structure transformation module also requires a discriminator $D_B$ to constrain the generator:

$$L_{D_B} = \mathbb{E}_{t,l}\left[\log D_B(\tilde{t}_l, l)\right] + \mathbb{E}_{t,l}\left[\log\left(1 - D_B(G_B(t, l), l)\right)\right]$$

$D_B$ learns to determine the authenticity of the input image and the given smoothed image $\tilde{t}_l$, and whether it matches the parameter l. Thus, the total loss takes the form:

$$L_B = L_{G_B} + \lambda\, L_{D_B}$$
3. A texture transformation module: the module is a cycle-consistent generative adversarial network whose main function is to add texture to the structure-transformed image. The discriminator of this module is a fully-connected neural network structure, and the quality of the generated image of the generator can be obtained from the output of the discriminator.
The losses of the generator $G_T$ and the discriminator $D_T$ of the texture transformation module are respectively:

$$L_{G_T} = \mathbb{E}_{x}\left[\log\left(1 - D_T(G_T(x))\right)\right]$$

$$L_{D_T} = -\,\mathbb{E}_{y}\left[\log D_T(y)\right] - \mathbb{E}_{x}\left[\log\left(1 - D_T(G_T(x))\right)\right]$$
Distance transformation loss module: the module calculates the distance loss between the character image C and the distance image D, and between the image X generated by the texture transformation module and the distance image D. The loss function is:

$$L_{dist} = \lVert \phi(C) - D \rVert_1 + \lVert \phi(X) - D \rVert_1$$

where $\phi(\cdot)$ denotes the distance transform.
preferably, the method also comprises an artistic digital image synthesis method based on the generation of the confrontational network, which comprises the following steps: an image preprocessing step: preprocessing an artistic word picture by adopting a morphological method for limiting the font transformation degree during network learning font transformation;
structure transformation: the device comprises a generator and a discriminator, wherein the generator is a conversion network and is a network structure formed by sequentially connecting a plurality of convolution layers, a residual error layer and a deconvolution layer, and the discriminator comprises two parts, namely a fully-connected discriminator and a PatchGAN discriminator; and (3) texture transformation: for adding texture to the structurally transformed image by a loop-generating competing network.
Further, the structure transformation step includes a training stage and a testing stage, which are respectively:
a training stage: inputting character images subjected to image preprocessing, using a parameter to restrict the degree of preprocessing, calculating pixel-level loss of the output homologous character images, and inputting the pixel-level loss into a discriminator to calculate the countermeasure loss;
and (3) a testing stage: inputting the texture-removed style image subjected to image preprocessing, simultaneously using a parameter to restrict the degree of preprocessing, calculating pixel-level loss of the output homologous character image of the module, and inputting the pixel-level loss into the discriminator to calculate the countermeasure loss.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.

Claims (9)

1. An artistic digital image synthesis system based on a generative adversarial network, characterized by comprising: an image preprocessing module, which preprocesses the artistic text image and the style image by a morphological method, limits the degree of font transformation when the network learns the font transformation, calculates the distance from the background color to the character strokes and to the object contour in the character image, and distinguishes the degree of distance by contrasting colors;
a structure transformation module, which maps the character image and the style image under different preprocessing parameters into the same domain as the source image, so as to learn how to transform the edge contour of the character strokes into the contour of the style image; the module comprises a generator and a discriminator, wherein the generator is a transformation network formed by sequentially connecting several convolutional layers, residual layers, and deconvolution layers, and the discriminator comprises two parts, a fully-connected discriminator and a fully-convolutional generative adversarial network discriminator; the fully-connected discriminator calculates pixel-level loss values between the output image and the input image, and the fully-convolutional generative adversarial network discriminator evaluates the loss value over the whole output and input images;
a texture transformation module, which adds texture information to the structure-transformed image through a cycle-consistent generative adversarial network.
2. The artistic digital image synthesis system based on a generative adversarial network according to claim 1, characterized in that preprocessing the artistic text image by the morphological method, used to limit the degree of font transformation when the network learns the font transformation, specifically comprises:
applying an erosion operation from image processing, in which setting the size of the erosion convolution kernel limits the distance between strokes in the structure transformation phase.
3. The artistic digital image synthesis system based on a generative adversarial network according to claim 2, characterized in that the loss of the generator of the structure transformation module is: to train the generator $G_B$, let t be an image in the character data set, and take a parameter value l from the interval [0, 1] to represent different degrees of morphological transformation; the loss function of the generator is:

$$L_{G_B} = \mathbb{E}_{t,l}\left[\, \lVert G_B(t, l) - \tilde{t}_l \rVert_1 \,\right]$$

where $\mathbb{E}$ denotes the mathematical expectation, $L_{G_B}$ denotes the loss value of the generator of the structure transformation module, $G_B(t, l)$ denotes the output of the generator of the structure transformation module for the input image with l as the preprocessing parameter, and t denotes a character image;
the structure transformation module also requires a discriminator $D_B$ to constrain the generator:

$$L_{D_B} = \mathbb{E}_{t,l}\left[\log D_B(\tilde{t}_l, l)\right] + \mathbb{E}_{t,l}\left[\log\left(1 - D_B(G_B(t, l), l)\right)\right]$$

$D_B$ learns to determine the authenticity of the input image and the given smoothed image $\tilde{t}_l$, and whether it matches the parameter l, where $L_{D_B}$ denotes the loss value of the discriminator of the structure transformation module; the total loss takes the form:

$$L_B = L_{G_B} + \lambda\, L_{D_B}$$
4. The artistic digital image synthesis system based on a generative adversarial network according to claim 1, characterized in that the discriminator of the texture transformation module is a fully-connected neural network structure, and the quality of the image generated by the generator is obtained from the output of the discriminator;
the losses of the generator $G_T$ and the discriminator $D_T$ of the texture transformation module are respectively:

$$L_{G_T} = \mathbb{E}_{x}\left[\log\left(1 - D_T(G_T(x))\right)\right]$$

$$L_{D_T} = -\,\mathbb{E}_{y}\left[\log D_T(y)\right] - \mathbb{E}_{x}\left[\log\left(1 - D_T(G_T(x))\right)\right]$$

where $\mathbb{E}_{x,y}$ refers to the mathematical expectation when the inputs are x and y, $G_T(x)$ refers to the output of the generator of the texture transformation module after the image x is input, x denotes the preprocessed image without texture, and y denotes a normal image containing texture.
5. The artistic digital image synthesis system based on a generative adversarial network according to any one of claims 1 to 4, characterized by further comprising a distance transformation loss module: distance-transform images are calculated respectively for the character image C and for the image X generated by the texture transformation module, and the loss function is:

$\mathcal{L}_{dist} = \big\|\mathrm{DT}(C) - \mathrm{DT}(X)\big\|_1$

where $\mathrm{DT}(\cdot)$ denotes the distance transform.
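As an illustrative sketch (not part of the claims), the distance-transform loss of claim 5 can be demonstrated with a brute-force Euclidean distance transform; in practice a library routine such as scipy's `distance_transform_edt` would be used, and the function names below are illustrative:

```python
def distance_transform(img):
    """Brute-force Euclidean distance transform: for each pixel, the
    distance to the nearest foreground (value 1) pixel."""
    h, w = len(img), len(img[0])
    fg = [(i, j) for i in range(h) for j in range(w) if img[i][j] == 1]
    return [[min(((i - a) ** 2 + (j - b) ** 2) ** 0.5 for a, b in fg)
             for j in range(w)] for i in range(h)]

def distance_loss(char_img, gen_img):
    """L1 difference between the two distance maps."""
    dc = distance_transform(char_img)
    dg = distance_transform(gen_img)
    return sum(abs(dc[i][j] - dg[i][j])
               for i in range(len(dc)) for j in range(len(dc[0])))

C = [[0, 1, 0],
     [0, 1, 0],
     [0, 1, 0]]   # a vertical stroke
G = [[0, 1, 0],
     [1, 1, 0],
     [0, 1, 0]]   # the same stroke with one extra pixel
print(distance_loss(C, C))  # identical maps -> 0
print(distance_loss(C, G))
```

The loss penalizes deviations of the generated strokes' geometry from the target character rather than raw pixel differences.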
6. An artistic digital image synthesis method based on a generative adversarial network, characterized by comprising the following steps: an image preprocessing step: preprocessing the artistic character picture and the style image by a morphological method, and limiting the degree of font deformation when the network learns the font transformation; calculating the distances from the background color to the character strokes and to the contour of the object in the character image, and distinguishing the degree of distance by contrasting colors;
a structure transformation step: mapping the character image and the style images under different preprocessing parameters into the same domain as the source image, so as to learn how to transform the stroke contours of the characters into the contours of the style image; the module comprises a generator and a discriminator, wherein the generator is a transformation network, namely a network structure formed by sequentially connecting a plurality of convolution layers, residual layers and deconvolution layers, and the discriminator comprises two parts, namely a fully connected discriminator and a fully convolutional generative adversarial network discriminator; the fully connected discriminator is used for calculating the pixel-level loss between the output image and the input image, and the fully convolutional generative adversarial network discriminator is used for evaluating the loss between the whole output image and the whole input image;
a texture transformation step: adding texture information to the structure-transformed image through a cycle-consistent generative adversarial network.
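As an illustrative sketch (not part of the claims), the morphological preprocessing in the image preprocessing step amounts to binary erosion, where the structuring-element size plays the role of the parameter limiting the degree of deformation; a pure-Python sketch with a square k x k structuring element:

```python
def erode(img, k):
    """Binary erosion with a k x k square structuring element:
    a pixel stays foreground only if its whole k x k neighbourhood is."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = int(all(
                0 <= i + di < h and 0 <= j + dj < w and img[i + di][j + dj]
                for di in range(-r, r + 1) for dj in range(-r, r + 1)))
    return out

stroke = [[1] * 5 for _ in range(5)]  # a 5x5 solid "stroke"
thin = erode(stroke, 3)               # larger k erodes the stroke more
print(sum(map(sum, thin)))            # 9: only the 3x3 core survives
```

Running the same character through erosions of several kernel sizes yields the family of progressively thinned inputs described in the training phase of claim 7.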
7. The artistic digital image synthesis method based on a generative adversarial network according to claim 6, wherein the structure transformation step comprises a training phase and a testing phase, respectively:
a training phase: inputting erosion operations with different convolution kernel sizes in preprocessing, so that the same character image becomes a plurality of different character images; after passing through the structure transformation module, calculating the pixel-level loss of the output homologous character image of the module, and inputting it into the discriminator to calculate the adversarial loss;
a testing phase: inputting the texture-removed style image obtained by image preprocessing, while using a parameter to constrain the degree of preprocessing; calculating the pixel-level loss of the output homologous character image of the module, and inputting it into the discriminator to calculate the adversarial loss.
8. The artistic digital image synthesis method based on a generative adversarial network as claimed in claim 7, wherein the degree of preprocessing is constrained by a parameter and the pixel-level loss of the output homologous character image is calculated; the parameter controls the size of the convolution kernel of the erosion operation in the preprocessing stage, and thereby the degree of erosion; the pixel-level loss of the output homologous character image is calculated as the $L_1$ norm.
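As an illustrative sketch (not part of the claims), the pixel-level $L_1$ loss of claim 8 is simply the mean absolute difference between corresponding pixels of two images:

```python
def l1_pixel_loss(a, b):
    """Mean absolute difference between two images of equal size."""
    n = len(a) * len(a[0])
    return sum(abs(a[i][j] - b[i][j])
               for i in range(len(a)) for j in range(len(a[0]))) / n

x = [[0.0, 1.0], [1.0, 0.0]]  # toy 2x2 "output" image
y = [[0.0, 0.5], [1.0, 0.0]]  # toy 2x2 "target" image
print(l1_pixel_loss(x, y))    # 0.125
```

Compared with the squared ($L_2$) loss, the $L_1$ norm penalizes large outliers less severely and tends to produce sharper generated strokes.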
9. The artistic digital image synthesis method based on a generative adversarial network as claimed in claim 7, wherein the training phase of the texture transformation step is specifically:
inputting the texture-removed style image obtained by image preprocessing, generating texture with the generator of the cycle-consistent generative adversarial network, and calculating the loss of the generated image with the discriminator, so as to apply texture coloring to the style image;
a testing phase of the texture transformation module: inputting the character image obtained by image preprocessing, and directly generating texture with the generator of the cycle-consistent generative adversarial network, so as to complete the generation of the artistic characters.
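As an illustrative sketch (not part of the claims), the cycle-consistency constraint used in the texture transformation step (as in CycleGAN) requires that mapping an image to the texture domain and back reproduces the original; a toy example with invertible scalar "generators" standing in for the two networks (all functions here are illustrative only):

```python
def cycle_loss(x, g, f):
    """L1 cycle-consistency: x -> G(x) -> F(G(x)) should return to x."""
    return abs(f(g(x)) - x)

g = lambda v: v * 2 + 1          # stand-in for G: texture-less -> textured
f_good = lambda v: (v - 1) / 2   # a consistent inverse mapping
f_bad = lambda v: v / 2          # an inconsistent one

print(cycle_loss(4.0, g, f_good))  # 0.0: the cycle is consistent
print(cycle_loss(4.0, g, f_bad))   # 0.5
```

In the real networks this loss is computed per pixel over images and added to the adversarial losses, which removes the need for paired training data.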
CN202111421417.1A 2021-11-26 2021-11-26 Artistic word image synthesis system and method based on generation countermeasure network Active CN114037644B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111421417.1A CN114037644B (en) 2021-11-26 2021-11-26 Artistic word image synthesis system and method based on generation countermeasure network

Publications (2)

Publication Number Publication Date
CN114037644A (en) 2022-02-11
CN114037644B CN114037644B (en) 2024-07-23

Family

ID=80138923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111421417.1A Active CN114037644B (en) 2021-11-26 2021-11-26 Artistic word image synthesis system and method based on generation countermeasure network

Country Status (1)

Country Link
CN (1) CN114037644B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782961A (en) * 2022-03-23 2022-07-22 华南理工大学 Character image augmentation method based on shape transformation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021017372A1 (en) * 2019-08-01 2021-02-04 中国科学院深圳先进技术研究院 Medical image segmentation method and system based on generative adversarial network, and electronic equipment
CN113205574A (en) * 2021-04-30 2021-08-03 武汉大学 Art character style migration system based on attention system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YINQUAN LIU (School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China); ZHUANG CHEN: "Research on GAN-based Text Effects Style Transfer", 2021 IEEE 4th International Conference on Information Systems and Computer Aided Education (ICISCAE), 11 November 2021 (2021-11-11) *
YE Wujian; GAO Haijian; WENG Shaowei; GAO Zhi; WANG Shanjin; ZHANG Chunyu; LIU Yijun: "Two-stage artistic font rendering method based on CGAN network", Journal of Guangdong University of Technology, no. 03, 4 April 2019 (2019-04-04) *

Similar Documents

Publication Publication Date Title
Pang et al. Image-to-image translation: Methods and applications
Jo et al. Sc-fegan: Face editing generative adversarial network with user's sketch and color
US9501724B1 (en) Font recognition and font similarity learning using a deep neural network
Cai et al. Dualattn-GAN: Text to image synthesis with dual attentional generative adversarial network
KR102646293B1 (en) Image-to-image transformation using unpaired data for supervised learning
Zhang et al. Content-adaptive sketch portrait generation by decompositional representation learning
CN110853119B (en) Reference picture-based makeup transfer method with robustness
Elmahmudi et al. A framework for facial age progression and regression using exemplar face templates
Yang et al. Controllable sketch-to-image translation for robust face synthesis
CN116188912A (en) Training method, device, medium and equipment for image synthesis model of theme image
Dogan et al. Iterative facial image inpainting based on an encoder-generator architecture
Li et al. High-resolution network for photorealistic style transfer
Wang et al. Towards harmonized regional style transfer and manipulation for facial images
Rao et al. UMFA: a photorealistic style transfer method based on U-Net and multi-layer feature aggregation
Chen et al. Anisotropic stroke control for multiple artists style transfer
CN114037644B (en) Artistic word image synthesis system and method based on generation countermeasure network
Zhang et al. Caster: Cartoon style transfer via dynamic cartoon style casting
Tan et al. Enhanced text-to-image synthesis with self-supervision
Zhao et al. Artistic rendering of portraits
Zhao et al. Regional Traditional Painting Generation Based on Controllable Disentanglement Model
Roy Applying aging effect on facial image with multi-domain generative adversarial network
Yang et al. Deep neural networks for Chinese traditional landscape painting creation
Maharjan et al. Image-to-image translation based face de-occlusion
Liu et al. Semi-supervised single image dehazing based on dual-teacher-student network with knowledge transfer
Guo et al. Face illumination normalization based on generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant