CN114943783A - Artistic word generation system oriented to complex texture structure - Google Patents

Artistic word generation system oriented to complex texture structure

Info

Publication number
CN114943783A
CN114943783A (application CN202210651537.9A)
Authority
CN
China
Prior art keywords
black
style
preset
white
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210651537.9A
Other languages
Chinese (zh)
Inventor
王中风 (Wang Zhongfeng)
毛文东 (Mao Wendong)
石卉虹 (Shi Huihong)
林军 (Lin Jun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Nanjing University
Priority to CN202210651537.9A
Publication of CN114943783A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 - 2D [Two Dimensional] image generation
    • G06T 11/001 - Texturing; Colouring; Generation of texture or colour
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The application provides an artistic word generation system oriented to complex texture structures. An input processing module processes input source characters to generate a black-and-white text mask, and uses the black-and-white text mask to process an input style picture into style small blocks. A first generator of a generative adversarial network model processes the black-and-white text mask and the style small block to generate a style big block whose real edges are expanded by a preset multiple; a second generator of the model processes the style big block to generate the black-and-white style mask of the style big block. A detail refinement module comprises a structure refinement network, which refines the structure of the style big block to generate an intermediate artistic word, and a texture refinement network, which refines the texture of the intermediate artistic word according to the black-and-white style mask to generate the final artistic word. Thus, by first generating a rough artistic word and then refining its structure and details, artistic words with complex style effects are generated based on complex texture structures.

Description

Artistic word generation system oriented to complex texture structure
Technical Field
The application relates to the field of artistic word generation, and in particular to an artistic word generation system oriented to complex texture structures.
Background
With the development of visual arts, artistic words are used more and more in all kinds of promotional advertisements, so artistic words in a desired style must be generated quickly according to different requirements. Currently, artistic words can be generated with neural network models.
For artistic word generation, a commonly used neural network model is the Generative Adversarial Network (GAN). A GAN is an unsupervised deep learning model, and there are generally two methods for generating artistic words with a GAN. The first generates, from source characters and the corresponding source artistic word, the target artistic word for the target characters, in the same style as the source artistic word. This method requires a dedicated artistic-word style dataset for training and, once trained, can only generate artistic words in existing styles, which cannot satisfy the diverse needs of promotional advertising; generating complex artistic words with this method would require building a large-scale, diversified artistic-word dataset of high-resolution images, at excessive cost. The second method sets a simple stylized text and animates a static text image with reference to a style picture or style video, perceiving the shapes in the style video to generate artistic words with controllable fonts for the static text image.
In summary, conventional artistic word generation systems can only generate simple artistic words based on simple styles and cannot generate artistic words with complex style effects based on complex texture structures.
Disclosure of Invention
The application provides an artistic word generation system oriented to complex texture structures, which can solve the technical problem that existing artistic word generation systems can only generate simple artistic words based on simple styles and cannot generate artistic words with complex style effects based on complex texture structures.
To solve this technical problem, the application discloses the following technical solution:
an artistic word generation system oriented to complex texture structures, the generation system comprising: an input processing module, a generative adversarial network model and a detail refinement module connected in sequence;
the input processing module is configured to process input source characters to generate a black-and-white text mask with smooth edges, and to process an input style picture with the black-and-white text mask to generate style small blocks, the style picture being a picture with a complex texture;
the generative adversarial network model comprises a first generator and a second generator; the first generator is configured to process the black-and-white text mask and the style small block to generate a style big block whose real edges are expanded by a preset multiple; the second generator is configured to process the style big block to generate a black-and-white style mask of the style big block;
the generation training trains the generative adversarial network model to convergence by taking the preset black-and-white mask small blocks and the preset clipping-style small blocks in a pre-created generation training set as input and the preset style big blocks and the preset black-and-white style mask big blocks in the generation training set as output;
the detail refinement module comprises a structure refinement network and a texture refinement network; the structure refinement network is configured to perform structure refinement on the style big block to generate an intermediate artistic word; the texture refinement network is configured to perform texture refinement on the intermediate artistic word according to the black-and-white style mask to generate the final artistic word;
the structure refinement network is a Structure Net network trained through style training, which takes pictures in a preset conventional picture dataset as input content images, a preset source style picture as the reference style image and the stylized content images as output, and trains the structure refinement network to convergence.
In one implementation, the generation training set is created in advance through the following steps:
selecting an original style image Y_g, the original style image Y_g being an image with a complex texture;
acquiring an original black-and-white mask M_g of the original style image Y_g, where the style part of Y_g corresponds to the black area of M_g and the background part of Y_g corresponds to the white area of M_g;
selecting, according to a preset first size L×L, the local black-and-white mask M_l with the largest black area in the original black-and-white mask M_g, and the local style image Y_l in the original style image Y_g corresponding to M_l;
performing edge simplification on the original black-and-white mask M_g to generate a first black-and-white mask M̂_g with smooth edges;
performing edge simplification on the local black-and-white mask M_l to generate a second black-and-white mask M̂_l with smooth edges;
cropping, according to a preset big-block cropping method, a plurality of preset style big blocks y from the original style image Y_g and the local style image Y_l, obtaining the preset black-and-white mask big block m at the position corresponding to each preset style big block y in the original black-and-white mask M_g and the local black-and-white mask M_l, and obtaining the preset smoothed black-and-white mask big block m_s at the corresponding position in the first black-and-white mask M̂_g and the second black-and-white mask M̂_l;
randomly cropping each preset style big block y to obtain a preset style small block ŷ, the size of the preset style big block y being a preset multiple of the size of the preset style small block ŷ;
down-sampling each preset smoothed black-and-white mask big block m_s by the preset multiple to obtain a preset black-and-white mask small block m̂_s, the size of m_s being the preset multiple of the size of m̂_s;
cutting the preset style small block ŷ through the preset black-and-white mask small block m̂_s to obtain a preset clipping-style small block ŷ_s;
determining all the preset clipping-style small blocks ŷ_s and all the preset black-and-white mask small blocks m̂_s as the generation training set.
In one implementation, the preset big-block cropping method comprises:
setting the second size of the big blocks to xN×xN, where xN < L and x is the preset multiple;
cropping a plurality of big blocks from a first reference image with a first probability a, the first reference image comprising the original style image Y_g, the original black-and-white mask M_g and the first black-and-white mask M̂_g;
cropping a plurality of big blocks from a second reference image with a second probability 1-a, the second reference image comprising the local style image Y_l, the local black-and-white mask M_l and the second black-and-white mask M̂_l.
In one implementation, the size of the preset small blocks is N×N, the preset small blocks comprising the preset style small block ŷ, the preset black-and-white mask small block m̂_s and the preset clipping-style small block ŷ_s.
In one implementation, the first generator comprises convolutional layers, residual modules, a concatenation layer and transposed convolutional layers arranged according to the generation requirements; the second generator comprises convolutional layers, residual modules and a concatenation layer arranged according to the generation requirements.
In one implementation, the generation training includes a first generation training and a second generation training, wherein:
the first generation training sets the stride and dilation rate of the convolutional layers inside the first generator, the stride and dilation rate of the transposed convolutional layers and the sizes of the convolution kernels, and takes the preset black-and-white mask small block m̂_s and the preset clipping-style small block ŷ_s as input and the preset style big block y as output, to realize the training of processing the black-and-white text mask and the style small block into a style big block whose real edges are expanded by the preset multiple;
the second generation training sets the stride and dilation rate of the convolutional layers inside the second generator, and takes the preset style big block y as input and the preset black-and-white mask big block m as output, to realize the training of processing the style big block into the black-and-white style mask of the style big block.
In one implementation, the generative adversarial network model further comprises a discriminator, which cooperates with the first generator to complete the first generation training.
In one implementation, the generation system further comprises a deformable module disposed between the input processing module and the generative adversarial network model, the deformable module being configured to control the degree of deformation of the black-and-white text mask by adding noise to the black-and-white text mask and eroding its edges.
In one implementation, the deformable module adds noise and applies edge erosion to the black-and-white text mask through the following steps, thereby controlling the degree of deformation of the black-and-white text mask:
eroding the edge of the black-and-white text mask and adding noise to the eroded edge to obtain a noisy black-and-white mask;
setting a vector f to expand the noisy black-and-white mask, thereby controlling the degree of deformation of the black-and-white text mask; the vector f comprises f_0, f_1 and f_2, where f_0 is the kernel size for erosion and dilation, f_1 controls the degree of noise added at the edge, and f_2 controls the degree of internal noise.
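As an illustration, one plausible realization of the erosion-and-noise step can be sketched with NumPy; the min-filter kernel shape and the exact noise model here are assumptions, not the patent's specified procedure:

```python
import numpy as np

def deform_mask(mask, f0=3, f1=0.3, f2=0.05, rng=None):
    """Deformable-module sketch (assumed realization): erode the black glyph
    region with an f0-by-f0 kernel, randomly re-blacken the eroded edge ring
    at rate f1, and whiten interior pixels at rate f2.
    Convention: 0 = black glyph, 1 = white background."""
    rng = rng or np.random.default_rng(0)
    glyph = mask < 0.5
    pad = f0 // 2
    p = np.pad(glyph, pad, mode="constant")
    eroded = np.ones_like(glyph)
    for di in range(f0):                       # f0 x f0 min-filter (erosion)
        for dj in range(f0):
            eroded &= p[di:di + mask.shape[0], dj:dj + mask.shape[1]]
    edge = glyph & ~eroded                     # ring removed by erosion
    out = np.where(eroded, 0.0, 1.0)           # eroded glyph stays black
    out[edge] = (rng.random(edge.sum()) > f1).astype(float)  # edge noise, rate f1
    interior = eroded & (rng.random(mask.shape) < f2)
    out[interior] = 1.0                        # sprinkle interior noise, rate f2
    return out
```

Larger f0 removes more of the glyph boundary, while f1 and f2 trade off how noisy the recovered edge and interior are, matching the role of the vector f described above.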
In one implementation, the structure refinement network comprises an attention mechanism module and an image conversion network, the attention mechanism module being disposed before the image conversion network, wherein:
the attention mechanism module is configured to output element attention parameters according to the input tensor of the structure refinement network, the input tensor being the input content image, the element attention parameters being used to make the image conversion network attend to the element parts of the input content image;
the image conversion network is configured to generate an output tensor from the element-wise product of the input tensor and the element attention parameters, the output tensor being the stylized content image; the output tensor is combined with the reference style image to compute loss function values, which are used to train the structure refinement network.
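The element-wise gating described above can be sketched as a small function composition; the attention and conversion networks are passed in as hypothetical callables, since the patent does not fix their internals:

```python
import numpy as np

def structure_refine_forward(x, attention, convert):
    """Forward pass of the attention-gated structure refinement (sketch):
    the attention module yields one weight per element of the input tensor,
    which gates it element-wise before the image conversion network."""
    a = attention(x)                 # element attention parameters
    assert a.shape == x.shape        # one weight per input element
    return convert(x * a)            # stylized content image
```

During style training, the returned tensor would be compared against the reference style image by the loss function.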
The application provides an artistic word generation system oriented to complex texture structures, in which an input processing module processes input source characters to generate a black-and-white text mask and uses the black-and-white text mask to process an input style picture into style small blocks; a first generator of a generative adversarial network model processes the black-and-white text mask and the style small block to generate a style big block whose real edges are expanded by a preset multiple; a second generator of the generative adversarial network model processes the style big block to generate the black-and-white style mask of the style big block; a detail refinement module comprises a structure refinement network and a texture refinement network, the structure refinement network refining the structure of the style big block to generate an intermediate artistic word, and the texture refinement network refining the texture of the intermediate artistic word according to the black-and-white style mask to generate the final artistic word. Thus, by first generating a rough artistic word and then refining its structure and details, artistic words with complex style effects are generated based on complex texture structures.
Drawings
FIG. 1 is a schematic structural diagram of the artistic word generation system oriented to complex texture structures provided in the present application;
FIG. 2 is a schematic diagram of the creation of the generation training set of the artistic word generation system oriented to complex texture structures provided in the present application;
FIG. 3 is a schematic structural diagram of the generative adversarial network model of the artistic word generation system oriented to complex texture structures provided in the present application;
FIG. 4 is a schematic diagram of the first generation training of the artistic word generation system oriented to complex texture structures provided in the present application;
FIG. 5 is a schematic diagram of the style training of the artistic word generation system oriented to complex texture structures provided in the present application;
FIG. 6 is a structural diagram of the structure refinement network of the artistic word generation system oriented to complex texture structures provided in the present application;
FIG. 7 is a schematic structural diagram of the deformable module of the artistic word generation system oriented to complex texture structures provided in the present application;
FIG. 8 is a schematic diagram of the testing stage of the artistic word generation system oriented to complex texture structures provided in the present application;
FIG. 9 is a schematic diagram of a final artistic word of the artistic word generation system oriented to complex texture structures provided in the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of this application and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" mean one, two or more, and "a plurality" means two or more. The term "and/or" describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
In order to solve the technical problem that existing artistic word generation systems can only generate simple artistic words based on simple styles and cannot generate artistic words with complex style effects based on complex texture structures, the present application provides an artistic word generation system oriented to complex texture structures. Embodiments of the present application are described in detail below with reference to the drawings.
Referring to FIG. 1, a schematic structural diagram of the artistic word generation system oriented to complex texture structures provided by the present application;
as can be seen from FIG. 1, the generation system comprises an input processing module, a generative adversarial network model and a detail refinement module connected in sequence;
the input processing module is configured to process input source characters to generate a black-and-white text mask with smooth edges, and to process an input style picture with the black-and-white text mask to generate style small blocks, the style picture being a picture with a complex texture;
the generative adversarial network model comprises a first generator and a second generator; the first generator is configured to process the black-and-white text mask and the style small block to generate a style big block whose real edges are expanded by a preset multiple; the second generator is configured to process the style big block to generate a black-and-white style mask of the style big block;
the generation training trains the generative adversarial network model to convergence by taking the preset black-and-white mask small blocks and the preset clipping-style small blocks in a pre-created generation training set as input and the preset style big blocks and the preset black-and-white style mask big blocks in the generation training set as output;
the detail refinement module comprises a structure refinement network (Structure Net, Ns) and a texture refinement network (Texture Net, Nt); the structure refinement network is configured to perform structure refinement on the style big block to generate an intermediate artistic word; the texture refinement network is configured to perform texture refinement on the intermediate artistic word according to the black-and-white style mask to generate the final artistic word;
the structure refinement network is a Structure Net network trained through style training, which takes pictures in a preset conventional picture dataset as input content images, a preset source style picture as the reference style image and the stylized content images as output, and trains the structure refinement network to convergence.
Referring to FIG. 2, a schematic diagram of the creation of the generation training set of the artistic word generation system oriented to complex texture structures provided by the present application;
as can be seen from FIG. 2, the generation training set in the present application is created in advance through the following steps.
Step 101: selecting an original style image Y_g, the original style image Y_g being an image with a complex texture.
Step 102: acquiring an original black-and-white mask M_g of the original style image Y_g, where the style part of Y_g corresponds to the black area of M_g and the background part of Y_g corresponds to the white area of M_g.
Step 103: selecting, according to a preset first size L×L, the local black-and-white mask M_l with the largest black area in the original black-and-white mask M_g, and the local style image Y_l in the original style image Y_g corresponding to M_l.
Specifically, steps 101 to 103 acquire the contour features of the style elements of the style image.
Step 104: performing edge simplification on the original black-and-white mask M_g to generate a first black-and-white mask M̂_g with smooth edges.
Step 105: performing edge simplification on the local black-and-white mask M_l to generate a second black-and-white mask M̂_l with smooth edges.
Specifically, steps 104 and 105 simulate the smoothness of the edges of the black-and-white mask of the original text; the edge simplification is completed through Gaussian blur and a sigmoid(·) function.
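A minimal sketch of such edge simplification, assuming a separable Gaussian blur followed by a sigmoid re-sharpening; the kernel width `sigma` and `sharpness` values are illustrative, as the patent does not list its parameters:

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(mask, sigma=2.0):
    """Separable Gaussian blur: filter rows, then columns."""
    r = int(3 * sigma)
    k = gaussian_kernel1d(sigma, r)
    padded = np.pad(mask, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def simplify_edges(mask, sigma=2.0, sharpness=10.0):
    """Smooth jagged mask edges: Gaussian blur, then sigmoid re-binarization."""
    blurred = gaussian_blur(mask.astype(np.float64), sigma)
    return 1.0 / (1.0 + np.exp(-sharpness * (blurred - 0.5)))
```

The blur rounds off fine serrations along the mask boundary, and the sigmoid pushes values back toward 0/1 so the result remains an almost-binary mask with a smooth contour.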
Step 106: cropping, according to the preset big-block cropping method, a plurality of preset style big blocks y from the original style image Y_g and the local style image Y_l, obtaining the preset black-and-white mask big block m at the position corresponding to each preset style big block y in the original black-and-white mask M_g and the local black-and-white mask M_l, and obtaining the preset smoothed black-and-white mask big block m_s at the corresponding position in the first black-and-white mask M̂_g and the second black-and-white mask M̂_l.
Thus, the six-channel real training samples of the generation training set can be created from one single style picture.
Specifically, the preset big-block cropping method is completed through the following steps.
Step 601: setting the second size of the big blocks to xN×xN, where xN < L and x is the preset multiple.
Step 602: cropping a plurality of big blocks from a first reference image with a first probability a, the first reference image comprising the original style image Y_g, the original black-and-white mask M_g and the first black-and-white mask M̂_g.
Step 603: cropping a plurality of big blocks from a second reference image with a second probability 1-a, the second reference image comprising the local style image Y_l, the local black-and-white mask M_l and the second black-and-white mask M̂_l.
Thus, the preset style big block y carries real style features, the preset black-and-white mask big block m describes the real contours of the style elements, and combining them yields the six-channel real training sample [y; m].
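The probabilistic big-block cropping of steps 601 to 603 can be sketched as follows, assuming each reference image is stored as one channel-aligned array so that image and masks are cropped at the same position:

```python
import numpy as np

def crop_big_blocks(global_stack, local_stack, n_blocks, big, a=0.5, rng=None):
    """Big-block cropping sketch: each aligned stack concatenates a style
    image with its masks on the channel axis (e.g. [Y_g; M_g; M_g-smoothed]).
    Crop from the global stack with probability a, else from the local one."""
    rng = rng or np.random.default_rng(0)
    blocks = []
    for _ in range(n_blocks):
        src = global_stack if rng.random() < a else local_stack
        H, W = src.shape[:2]
        i = int(rng.integers(0, H - big + 1))
        j = int(rng.integers(0, W - big + 1))
        blocks.append(src[i:i + big, j:j + big])
    return blocks
```

Biasing some crops toward the local image (the largest black area) keeps style elements well represented in the training samples even when the global image is mostly background.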
Step 107: randomly cropping each preset style big block y to obtain a preset style small block ŷ, the size of the preset style big block y being the preset multiple of the size of the preset style small block ŷ.
Step 108: down-sampling each preset smoothed black-and-white mask big block m_s by the preset multiple to obtain a preset black-and-white mask small block m̂_s, the size of m_s being the preset multiple of the size of m̂_s.
Step 109: cutting the preset style small block ŷ through the preset black-and-white mask small block m̂_s to obtain a preset clipping-style small block ŷ_s.
Step 110: determining all the preset clipping-style small blocks ŷ_s and all the preset black-and-white mask small blocks m̂_s as the generation training set.
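Steps 107 to 109 can be sketched as follows; the mask convention (0 = black style area) follows the description above, and applying the down-sampled mask directly to the randomly cropped tile is a simplifying assumption:

```python
import numpy as np

def make_small_inputs(y, m_s, x, rng=None):
    """Sketch of steps 107-109: random-crop an N-by-N style small block from
    the big block y (xN-by-xN), stride-x down-sample the smoothed mask m_s to
    N-by-N, and cut the style small block with it."""
    rng = rng or np.random.default_rng(0)
    big = y.shape[0]
    N = big // x
    i = int(rng.integers(0, big - N + 1))
    j = int(rng.integers(0, big - N + 1))
    y_small = y[i:i + N, j:j + N]                 # preset style small block
    m_small = m_s[::x, ::x]                       # preset B/W mask small block
    y_cut = y_small * (m_small < 0.5)[..., None]  # keep pixels under the black area
    return y_cut, m_small
```

The pair (y_cut, m_small) then mimics the test-time input of a text mask plus a mask-shaped style tile.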
Specifically, the size of the preset small blocks is N×N, the preset small blocks comprising the preset style small block ŷ, the preset black-and-white mask small block m̂_s and the preset clipping-style small block ŷ_s.
Thus, owing to the simplification of the black-and-white mask edges, the preset clipping-style small block ŷ_s and the preset black-and-white mask small block m̂_s have smooth contours, and their characteristics are similar to those of the black-and-white mask of the original text m̂_s′ and its clipping-style small block ŷ_s′, where the superscript ′ denotes input or output data of the testing phase. Concatenating the preset clipping-style small block ŷ_s and the preset black-and-white mask small block m̂_s yields the six-channel input data [ŷ_s; m̂_s] of the generation training set.
Referring to FIG. 3, a schematic structural diagram of the generative adversarial network model of the artistic word generation system oriented to complex texture structures provided by the present application;
as can be seen from FIG. 3, the first generator G_p1 comprises convolutional layers, residual modules, a concatenation layer and transposed convolutional layers arranged according to the generation requirements; the second generator G_p2 comprises convolutional layers, residual modules and a concatenation layer arranged according to the generation requirements.
Specifically, the internal structure of the first generator is, in sequence, a first convolutional layer, a second convolutional layer, a third convolutional layer, a plurality of residual modules, a first concatenation layer, a first transposed convolutional layer, a second transposed convolutional layer and a fourth convolutional layer;
the second generator comprises a fifth convolutional layer, residual modules, a sixth convolutional layer and a second concatenation layer connected in sequence.
the generation training includes a first generation training and a second generation training, wherein:
the first generation training is to set the stride and the expansion rate of the convolution layer inside the first generator, the stride and the expansion rate of the transposition convolution layer and the size of the convolution kernel, and to set the preset black-and-white mask small block
Figure BDA00036863085100000611
And the preset cutting style small block
Figure BDA00036863085100000612
Taking the preset style big block y as input, and realizing the training of processing the black-and-white text mask and the style small block to generate a style big block of a real edge with an expanded preset multiple;
specifically, Sidj represents the step size of the layer as i and the expansion rate as j. Where the convolutional layer with step 2 may downsample the feature and the transposed convolutional layer with step 2 may upsample the feature. kx denotes the convolution kernel size of the convolutional layer as x × x, and ty denotes the convolution kernel size of the transposed convolutional layer as y × y.
Referring to fig. 4, a schematic diagram of a first generation training of a complex texture structure-oriented artistic word generation system is provided in the present application;
As can be seen from fig. 4, the generation training phase takes as input the preset cropped style small blocks ŷ_s and the preset black-and-white mask small blocks m̂_s in the generation training set, and as output the preset style large block y (enlarged by the preset multiple) and the preset black-and-white mask large block m in the generation training set. Specifically, the output corresponding to the first generator G_p1 is the preset style large block y, and the output corresponding to the second generator G_p2 is the preset black-and-white mask large block m. The function of the first generator G_p1 is to obtain, from the smooth-edged mask and the small style picture, a large-block picture whose real edges and content are enlarged by the preset multiple; the function of the second generator G_p2 is to extract the real black-and-white mask of that large picture.
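As an illustration of how one such training pair could be assembled (the block size N = 64 and the preset multiple x = 2 are assumptions for this sketch; the convention that the style part is the black/zero area of the mask follows the training-set construction described in the claims):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_training_pair(y, m, n=64, x=2):
    """From a style large block y (x*n by x*n) and its mask large block m,
    build the inputs of the first generator: a randomly cropped style small
    block with the downsampled mask applied, plus the mask small block."""
    top = int(rng.integers(0, y.shape[0] - n + 1))
    left = int(rng.integers(0, y.shape[1] - n + 1))
    y_small = y[top:top + n, left:left + n]          # preset style small block
    m_small = m[::x, ::x]                            # mask downsampled by x
    y_cropped = y_small * (m_small[..., None] == 0)  # keep the black (style) area
    return y_cropped, m_small

y = rng.random((128, 128, 3))                        # preset style large block
m = (rng.random((128, 128)) > 0.5).astype(np.uint8)  # black/white mask large block
y_in, m_in = make_training_pair(y, m)
print(y_in.shape, m_in.shape)
```

The pair (y_in, m_in) plays the role of (ŷ_s, m̂_s); the targets are the original large blocks y and m.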
The second generation training sets the stride and dilation rate of the convolutional layers inside the second generator, takes the preset style large block y as input and the preset black-and-white mask large block m as output, thereby training the second generator to process a style large block into the black-and-white style mask of that large block.
Specifically, the generative adversarial network model further comprises a discriminator D_p for completing the first generation training in cooperation with the first generator.
Referring to fig. 5, a schematic diagram of style training of a complex texture structure-oriented artistic word generation system provided by the present application is shown;
As can be seen from fig. 5, specifically, the structure refinement network is a Structure Net network trained by style training: the style training takes pictures in a conventional picture data set (for example, the COCO dataset) as input content graphs, the preset source style picture as the reference style graph, and the stylized content graph as output, and trains the structure refinement network to convergence.
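The application does not spell out the loss used to train Structure Net against the reference style graph; style-transfer networks of this kind conventionally use a Gram-matrix style term over feature maps, which can be sketched as an illustration (the feature shapes are arbitrary stand-ins):

```python
import numpy as np

def gram(features):
    """Gram matrix of a (C, H, W) feature map: channel-wise correlations,
    normalized by the number of elements."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(out_feats, ref_feats):
    """Mean squared distance between Gram matrices of the stylized output
    and the reference style features."""
    return float(np.mean((gram(out_feats) - gram(ref_feats)) ** 2))

rng = np.random.default_rng(0)
f_out = rng.random((8, 16, 16))
assert style_loss(f_out, f_out) == 0.0   # identical features -> zero style loss
print(style_loss(f_out, rng.random((8, 16, 16))))
```

Minimizing such a term (usually alongside a content term) is one standard way to drive the stylized content graph toward the reference style.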
Referring to fig. 6, a network architecture diagram is refined for the structure of the complex texture structure-oriented art word generating system provided by the present application;
As can be seen from fig. 6, specifically, the structure refinement network further comprises an attention mechanism module (attention module) and an image transformation network (ITN), the attention mechanism module being disposed before the image transformation network, where:
the attention mechanism module is used for outputting an element attention parameter according to an input tensor of the structure refinement network, wherein the input tensor is the input content graph, and the element attention parameter is used for enabling the image conversion network to pay attention to an element part of the input content graph;
specifically, the attention mechanism module comprises an averaging pooling layer (averaging capacitance), a seventh convolution layer, a relu activation function layer, an eighth convolution layer and a sigmoid layer which are connected in sequence, wherein k1 represents that the convolution kernel of the convolution layer is 1, Och1 represents that the output channel of the seventh convolution layer is 1, Och3 represents that the output channel of the eighth convolution layer is 3, and the relu activation function layer and the sigmoid layer represent corresponding nonlinear layers.
The image conversion network is used for generating an output tensor according to a result of element-by-element multiplication of the input tensor and the element attention parameters, wherein the output tensor is the stylized content graph, the output tensor is used for calculating loss function values by combining the reference style graph, and the loss function values are used for training the structure refinement network.
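A minimal NumPy sketch of this attention path (whether the average pooling is local or global is not stated; a stride-1 local pool is assumed here so that the attention map stays spatial, and the convolution weights are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)

def avg_pool(x, k=3):
    """Stride-1 k-by-k average pooling with edge padding (spatial size kept)."""
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)), mode="edge")
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[:, i:i + x.shape[1], j:j + x.shape[2]]
    return out / (k * k)

def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map across channels; w is (out, in)."""
    return np.einsum("oc,chw->ohw", w, x)

def attention(x, w7, w8):
    """avg pool -> conv k1 (Och1=1) -> relu -> conv k1 (Och3=3) -> sigmoid."""
    a = avg_pool(x)
    a = np.maximum(conv1x1(a, w7), 0.0)          # relu, 3 -> 1 channel
    return 1.0 / (1.0 + np.exp(-conv1x1(a, w8)))  # sigmoid, 1 -> 3 channels

x = rng.random((3, 32, 32))                       # input content tensor
a = attention(x, rng.standard_normal((1, 3)), rng.standard_normal((3, 1)))
itn_input = x * a                                 # element-by-element product
print(a.shape, itn_input.shape)
```

The result of the element-by-element product is what the image transformation network would consume.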
Referring to fig. 7, a schematic structural diagram of a deformable module of a complex texture structure-oriented artistic word generating system is provided in the present application;
As can be seen from fig. 7, the generating system further comprises a deformable module disposed between the input processing module and the generative adversarial network model; the deformable module adds noise to and erodes the edges of the black-and-white text mask, thereby controlling the degree of deformation of the black-and-white text mask.
Specifically, the deformable module adds noise to and erodes the edges of the black-and-white text mask, and thereby controls the deformation degree of the black-and-white text mask, through the following steps:
eroding the edge of the black-and-white text mask, and adding noise to the eroded edge to obtain a noisy black-and-white mask;
setting a vector f to expand the noisy black-and-white mask, realizing control of the deformation degree of the black-and-white text mask; the vector f comprises f0, f1 and f2, where f0 is the size of the erosion and expansion kernel, f1 controls the degree of noise added at the edge, and f2 controls the degree of internal noise added.
In particular, three factors in the deformable module relate to style control: two concern edge erosion and the third concerns the degree of internal deformation. Specifically, as shown in fig. 7(a), first, f0 controls the erosion of the edge of the black-and-white mask and f1 controls the noise added at the eroded edge. Second, the noisy black-and-white mask is expanded; the scale of the edge deformation is controlled by (f0, f1). For internal deformation, noise is added inside the text, where f2 controls the degree of internal noise. The combination (f0, f1, f2) is named the vector f, which determines the degree of deformation. Therefore, as shown in fig. 7(b), text black-and-white masks of various scales can be obtained by changing the vector f. Note that the deformable module is used only in the testing phase: it processes the text black-and-white mask, and the picture is then cropped with the style-changeable black-and-white mask to obtain a coarse style text. As shown in fig. 7(c), multi-scale artistic text is obtained through the changeable-style black-and-white mask and the cropped texture, without retraining the network.
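The erode / add-edge-noise / expand / add-internal-noise sequence can be sketched as follows (pure-NumPy stand-ins for the morphological operators; the convention that text pixels are True, and the default values of f, are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def _filter(mask, k, fn):
    """Sliding k-by-k min/max filter: binary erosion (np.min) or dilation (np.max)."""
    p = k // 2
    mp = np.pad(mask, p, mode="edge")
    windows = [mp[i:i + mask.shape[0], j:j + mask.shape[1]]
               for i in range(k) for j in range(k)]
    return fn(np.stack(windows), axis=0)

def deform(mask, f0=3, f1=0.3, f2=0.05):
    """Deform a boolean text mask. f0: erosion/expansion kernel size,
    f1: edge-noise level, f2: internal-noise level."""
    eroded = _filter(mask, f0, np.min)
    edge = mask & ~eroded                                    # ring removed by erosion
    noisy = eroded | ((rng.random(mask.shape) < f1) & edge)  # re-add random edge pixels
    expanded = _filter(noisy, f0, np.max)                    # expand the noisy mask
    inner = (rng.random(mask.shape) < f2) & eroded           # internal noise holes
    return expanded & ~inner

mask = np.zeros((32, 32), dtype=bool)
mask[8:24, 8:24] = True          # a toy square "glyph"
out = deform(mask)
print(out.shape, int(out.sum()))
```

Varying (f0, f1, f2) then yields masks of different deformation scales without touching the trained networks, as the testing-phase description states.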
Referring to fig. 8, a schematic diagram of a testing stage of the complex texture structure-oriented artistic word generating system provided by the present application is shown;
referring to fig. 9, a schematic diagram of a final artistic word of the complex texture structure-oriented artistic word generating system is provided.
As shown in figs. 8 and 9, the present application sets a test stage to verify the results of the generation training and the style training. In the test stage, under the control of the vector f, the binary text mask and the style image enter the deformable module to obtain a coarse-grained style-text prototype, which is sent to G_p, N_s and N_t in sequence for forward inference; the output of N_t is the stylized final artistic word.
The present application provides an artistic word generation system oriented to complex texture structures. The input processing module processes input source characters to generate a black-and-white text mask, and uses the black-and-white text mask to process an input style picture to generate style small blocks. The first generator of the generative adversarial network model processes the black-and-white text mask and the style small blocks to generate a style large block with real edges, enlarged by a preset multiple. The second generator of the generative adversarial network model processes the style large block to generate the black-and-white style mask of the style large block. The detail refinement module comprises a structure refinement network and a texture refinement network: the structure refinement network refines the structure of the style large block to generate an intermediate artistic word, and the texture refinement network refines the texture of the intermediate artistic word according to the black-and-white style mask to generate the final artistic word. Thus, artistic words with complex style effects are generated on the basis of complex texture structures by first generating an artistic-word prototype and then refining its structure and details.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains; it is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof; the scope of the invention is limited only by the appended claims.

Claims (10)

1. A complex texture structure-oriented artistic word generation system, the generation system comprising: an input processing module, a generative adversarial network model and a detail refinement module connected in sequence;
the input processing module is used for processing input source characters to generate a black and white text mask with smooth edges, and processing input style pictures to generate style small blocks by using the black and white text mask, wherein the style pictures are pictures with complex textures;
the generative adversarial network model comprises a first generator and a second generator, wherein the first generator is used for processing the black-and-white text mask and the style small block to generate a style large block with real edges, enlarged by a preset multiple; the second generator is used for processing the style large block to generate a black-and-white style mask of the style large block;
the generation training trains the generative adversarial network model to convergence by taking preset black-and-white mask small blocks and preset cropped style small blocks in a pre-created generation training set as input and taking preset style large blocks and preset black-and-white style mask large blocks in the generation training set as output;
the detail refining module comprises a structure refining network and a texture refining network, and the structure refining network is used for carrying out structure refining processing on the style large blocks to generate middle artistic words; the texture refining network is used for carrying out texture refining processing on the intermediate artistic word according to the black-and-white style mask to generate a final artistic word;
the structure refinement network is a Structure Net network trained by style training, the style training taking pictures in a preset conventional picture data set as input content graphs, a preset source style picture as the reference style graph and the stylized content graph as output, and training the structure refinement network to convergence.
2. The complex texture structure-oriented artistic word generation system according to claim 1, wherein the generation training set is created in advance by:
selecting an original style image Y_g, the original style image Y_g being an image with complex texture;
acquiring an original black-and-white mask M_g of the original style image Y_g, wherein the style part of the original style image Y_g is a black area in the original black-and-white mask M_g, and the remaining part of the original style image Y_g is a white area in the original black-and-white mask M_g;
selecting, according to a preset first size L×L, the local black-and-white mask M_l with the largest black area in the original black-and-white mask M_g, and the local style image Y_l corresponding to the local black-and-white mask M_l in the original style image Y_g;
performing edge simplification on the original black-and-white mask M_g to generate a first black-and-white mask M̂_g with smooth edges;
performing edge simplification on the local black-and-white mask M_l to generate a second black-and-white mask M̂_l with smooth edges;
cropping, according to a preset large-block cropping method, a plurality of preset style large blocks y from the original style image Y_g and the local style image Y_l; obtaining the preset black-and-white mask large block m at the position corresponding to each preset style large block y in the original black-and-white mask M_g and the local black-and-white mask M_l; and obtaining the preset black-and-white mask block m_s at the position corresponding to each preset black-and-white mask large block m in the first black-and-white mask M̂_g and the second black-and-white mask M̂_l;
randomly cropping each preset style large block y to obtain preset style small blocks ŷ, the size of the preset style large block y being a preset multiple of the size of the preset style small block ŷ;
downsampling each preset black-and-white mask block m_s by the preset multiple to obtain a preset black-and-white mask small block m̂_s, the size of the black-and-white mask block m_s being the preset multiple of the size of the preset black-and-white mask small block m̂_s;
cropping the preset style small blocks ŷ through the preset black-and-white mask small blocks m̂_s to obtain preset cropped style small blocks ŷ_s;
determining all the preset cropped style small blocks ŷ_s and all the preset black-and-white mask small blocks m̂_s as the generation training set.
3. The complex texture structure-oriented artistic word generation system according to claim 2, wherein the preset large-block cropping method comprises:
setting a second size xN×xN for the large blocks, wherein xN is less than L and x is the preset multiple;
cropping a plurality of large blocks from a first reference image according to a first probability a, the first reference image comprising the original style image Y_g, the original black-and-white mask M_g and the first black-and-white mask M̂_g;
cropping a plurality of large blocks from a second reference image according to a second probability 1−a, the second reference image comprising the local style image Y_l, the local black-and-white mask M_l and the second black-and-white mask M̂_l.
4. The complex texture structure-oriented artistic word generation system according to claim 3, wherein the preset small blocks have a size of N×N, the preset small blocks comprising the preset style small block ŷ, the preset black-and-white mask small block m̂_s and the preset cropped style small block ŷ_s.
5. The complex texture structure-oriented artistic word generation system of claim 1, wherein the first generator comprises convolutional layers, residual modules, a splicing layer and transposed convolutional layers set according to generation requirements; the second generator comprises convolutional layers, residual modules and a splicing layer set according to generation requirements.
6. The complex texture structure-oriented artistic word generation system of claim 5, wherein the generation training comprises a first generation training and a second generation training, wherein:
the first generation training sets the stride and dilation rate of the convolutional layers inside the first generator, the stride and dilation rate of the transposed convolutional layers, and the convolution kernel sizes, takes the preset black-and-white mask small block m̂_s and the preset cropped style small block ŷ_s as input and the preset style large block y as output, and thereby realizes the training of processing the black-and-white text mask and the style small block to generate a style large block with real edges, enlarged by the preset multiple;
the second generation training sets the stride and dilation rate of the convolutional layers inside the second generator, takes the preset style large block y as input and the preset black-and-white mask large block m as output, and thereby realizes the training of processing the style large block to generate the black-and-white style mask of the style large block.
7. The complex texture structure-oriented artistic word generation system of claim 6, wherein the generative adversarial network model further comprises a discriminator, the discriminator being configured to cooperate with the first generator to perform the first generation training.
8. The complex texture structure-oriented artistic word generation system of claim 1, wherein the generation system further comprises a deformable module disposed between the input processing module and the generative adversarial network model, the deformable module being configured to control the degree of deformation of the black-and-white text mask by adding noise to and eroding the edges of the black-and-white text mask.
9. The complex texture structure-oriented artistic word generation system of claim 8, wherein the deformable module adds noise to and erodes the edges of the black-and-white text mask, and thereby controls the deformation degree of the black-and-white text mask, through the following steps:
eroding the edge of the black-and-white text mask, and adding noise to the eroded edge to obtain a noise black-and-white mask;
setting a vector f to expand the noisy black-and-white mask, realizing control of the deformation degree of the black-and-white text mask; the vector f comprises f0, f1 and f2, where f0 is the size of the erosion and expansion kernel, f1 controls the degree of noise added at the edge, and f2 controls the degree of internal noise added.
10. The complex texture structure-oriented artistic word generation system of claim 1, wherein the structure refinement network further comprises an attention mechanism module and an image conversion network, the attention mechanism module being disposed before the image conversion network, wherein:
the attention mechanism module is used for outputting an element attention parameter according to an input tensor of the structure refinement network, wherein the input tensor is the input content graph, and the element attention parameter is used for enabling the image conversion network to pay attention to an element part of the input content graph;
the image conversion network is used for generating an output tensor according to a result of element-by-element multiplication of the input tensor and the element attention parameters, wherein the output tensor is the stylized content graph, the output tensor is used for calculating a loss function value by combining the reference style graph, and the loss function value is used for training the structure refinement network.
CN202210651537.9A 2022-06-09 2022-06-09 Artistic word generation system oriented to complex texture structure Pending CN114943783A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210651537.9A CN114943783A (en) 2022-06-09 2022-06-09 Artistic word generation system oriented to complex texture structure


Publications (1)

Publication Number Publication Date
CN114943783A true CN114943783A (en) 2022-08-26

Family

ID=82910004



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination