CN114494003B - Ancient character generation method combining shape transformation and texture transformation - Google Patents


Info

Publication number: CN114494003B
Authority: CN (China)
Prior art keywords: generator, transformation, ancient, network, discriminator
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Application number: CN202210336338.9A
Other languages: Chinese (zh)
Other versions: CN114494003A (en)
Inventors: 黄双萍, 黄鸿翔, 杨代辉
Current assignee: Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy (Guangzhou); South China University of Technology (SCUT)
Original assignee: Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy (Guangzhou); South China University of Technology (SCUT)
Application filed by Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy (Guangzhou) and South China University of Technology (SCUT)
Priority to CN202210336338.9A
Publication of CN114494003A
Application granted
Publication of CN114494003B

Classifications

    • G06T3/18
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06T11/203 Drawing of straight lines or curves
    • G06T11/40 Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G06T3/02

Abstract

The invention discloses an ancient character generation method combining shape transformation and texture conversion, which comprises the following steps: constructing a shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2; constructing a texture conversion generative adversarial network B, comprising a generator B1, a discriminator B2, a generator B3 and a discriminator B4, where network B is a cycle-consistent generative adversarial network; connecting the shape transformation network A with the texture conversion network B to obtain a photographic ancient character generation network model; jointly training network A and network B within this model; connecting the output of the trained generator A1 with the input of the trained generator B1 to obtain a photographic ancient character image generator; and generating ancient characters with the photographic ancient character image generator.

Description

Ancient character generation method combining shape transformation and texture transformation
Technical Field
The invention belongs to the technical field of image processing and artificial intelligence, and particularly relates to an ancient character generation method combining shape transformation and texture conversion.
Background
Deep-learning-based algorithms typically rely on massive amounts of training data to improve performance, and supervised learning algorithms in particular depend heavily on data annotation. This has drawn attention to obtaining annotated data at low labor cost, with data augmentation and data generation being the most common approaches. However, for ancient character images with widely varying foreground shapes and rich background textures, data augmentation yields very limited diversity because it usually relies on manually designed prior probability distributions. Data generation, by contrast, directly fits the distribution of the data and can therefore mine more diverse samples.
The most widely used data generation technique at present is the generative adversarial network (GAN). Within character generation, both GAN-based shape transformation and GAN-based texture conversion are active research directions. GAN-based shape transformation methods usually combine spatial transformer networks, dilated convolution, deformable convolution and similar techniques with adversarial learning to deform characters; dilated and deformable convolutions can achieve good deformation results when supported by massive annotated training data, whereas spatial transformer networks are better suited to unsupervised learning and place relatively low demands on the training data. On the texture side, some methods perform supervised texture conversion with conditional GANs, while others perform unsupervised conversion of texture patterns with a cycle-consistent GAN framework; unsupervised texture conversion requires almost no annotated data, which is friendly to ancient character images available only in small quantities.
Existing character generation techniques mainly train the shape transformation GAN and the texture conversion GAN independently and then stack them to generate samples that are more realistic in both shape and texture. However, this way of connecting the models can cause vanishing gradients during training and inconsistency between the shape and texture characteristics of the fused generated images.
Shape transformation in the prior art is a random font augmentation scheme in which deformation parameters must be sampled empirically from a manually designed prior distribution to obtain diversity. A manually designed distribution may not fit the distribution of real ancient character shapes, its design and selection are complex and labor-intensive, and errors in fitting the distribution lead to low-quality generated data. Moreover, simply stacking a shape transformation model and a texture conversion model easily causes vanishing gradients, prevents the two models from being jointly optimized, and prevents the deformed character foreground from fusing well with the texture-rich background, reducing the quality of the generated ancient characters.
Disclosure of Invention
In view of the above, there is a need for an ancient character generation method combining shape transformation and texture conversion. The method uses a shape transformation generative adversarial network that combines affine transformation with thin plate spline (TPS) transformation, so that the network can learn the target shape probability distribution directly from data and produce finer-grained shape deformation; it also jointly optimizes several generative adversarial networks with a training scheme based on information interaction, so that they promote each other's tuning and improve the quality of the generated samples.
The invention discloses an ancient character generation method combining shape transformation and texture conversion, which comprises the following steps:
Step 1: construct a shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2. A copied ancient character image is fed to generator A1, which produces a deformed character image through spatial transformation; the output of generator A1 is connected to one input of discriminator A2, a target character image is fed to the other input of discriminator A2, and discriminator A2 outputs its judgment of the deformed character image versus the target character image.
Step 2: construct a texture conversion generative adversarial network B, comprising a generator B1, a discriminator B2, a generator B3 and a discriminator B4; network B is a cycle-consistent generative adversarial network. A copied ancient character image is fed to generator B1, whose output is connected to the input of discriminator B2, and a photographed ancient character image is also fed to discriminator B2; in parallel, a photographed ancient character image is fed to generator B3, whose output is connected to the input of discriminator B4, and a copied ancient character image is also fed to discriminator B4. Discriminators B2 and B4 output judgments of photographed-style texture and copied-style texture, respectively.
Step 3: connect network A and network B to obtain a photographic ancient character generation network model. The output of generator A1 in network A is connected to the input of generator B1 in network B, and the output of generator B3 in network B is connected to the input of discriminator A2 in network A; that is, the output of generator B3 serves as the target character image for network A, and the output of generator A1 serves as the deformed character image fed into generator B1.
Step 4: jointly train network A and network B within the photographic ancient character generation network model.
Step 5: connect the output of the trained generator A1 to the input of the trained generator B1 to obtain the photographic ancient character image generator.
Step 6: generate ancient characters with the photographic ancient character image generator.
Specifically, the generator A1 is a spatial transformer network comprising an encoder, a predictor, a sampler, a noise reconstruction network and an image reconstruction network.
The encoder consists of several convolution modules, each containing a two-dimensional convolution layer, a nonlinear activation layer and a pooling layer connected in sequence.
The predictor consists of several fully connected modules followed by a final fully connected layer; each fully connected module contains a fully connected layer and a nonlinear activation layer, and the number of output channels of the final fully connected layer is set to the number of deformation parameters to be predicted.
The sampler maps the pixel region of the deformed character image to the pixel region of the copied ancient character image by applying matrix multiplication on a sampling grid.
The image reconstruction network consists of several fully connected modules, one fully connected layer and several transposed convolution modules connected in sequence; each transposed convolution module contains a transposed convolution layer and a nonlinear activation layer connected in sequence.
The noise reconstruction network consists of several fully connected modules and one fully connected layer connected in sequence.
The discriminator A2 is based on the PatchGAN structure and consists of five convolution modules connected in sequence; each of the first four convolution modules contains a two-dimensional convolution layer, an instance normalization layer and a LeakyReLU activation layer, and the last convolution module contains a padding layer and a two-dimensional convolution layer.
First, the copied ancient character image x is fed to the encoder, which extracts shape features from x and outputs a shape feature vector f. A noise hidden vector z is then drawn at random from the standard normal distribution, f and z are fused, and the fused hidden vector is fed to the predictor, which maps it to TPS transformation parameters and affine transformation parameters. The TPS transformation parameters are the coordinates of the TPS transformation sampling grid matching points, and the affine transformation parameters are converted into an affine transformation sampling grid. The TPS transformation sampling grid, the affine transformation sampling grid and the copied ancient character image are then fed to the sampler, which outputs the deformed character image. Meanwhile, the output of the predictor is connected to the inputs of the image reconstruction network and the noise reconstruction network, which reconstruct the input copied ancient character image and the noise hidden vector, respectively. Finally, the deformed character image output by the generator and the target character image are each fed to the discriminator, which outputs its judgment of the deformed character image versus the target character image.
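As an illustrative, non-limiting sketch, the data flow of generator A1 can be written in PyTorch as follows; the fusion of f and z by addition and the 132 predicted parameters follow the description above, while the class name STNGenerator and all layer sizes are assumptions chosen for illustration (the sketch applies only the affine part of the predicted grid; the TPS warp would be applied analogously).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STNGenerator(nn.Module):
    """Minimal sketch of generator A1: encoder -> (fuse noise) -> predictor -> sampler."""
    def __init__(self, z_dim=128, n_tps=64):
        super().__init__()
        self.n_tps = n_tps
        # Encoder: 4 conv modules (conv -> BN -> ReLU -> max-pool), 64x64 -> 4x4
        chans = [1, 32, 64, 128, 128]
        enc = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            enc += [nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout),
                    nn.ReLU(inplace=True), nn.MaxPool2d(2)]
        self.encoder = nn.Sequential(*enc, nn.Flatten(), nn.Linear(128 * 4 * 4, z_dim))
        # Predictor: 3 FC modules + final FC producing 132 deformation parameters
        # (128 = 2 coords x 64 TPS grid points, 4 = elements of a 2x2 affine matrix)
        self.predictor = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 2 * n_tps + 4))

    def forward(self, x, z):
        f = self.encoder(x)                      # shape feature vector f
        h = f + z                                # fuse by element-wise addition
        params = self.predictor(h)
        tps_points = params[:, :2 * self.n_tps]  # TPS grid matching-point coordinates
        affine = params[:, 2 * self.n_tps:].view(-1, 2, 2)
        # Build a 2x3 affine matrix (zero translation) and sample with it;
        # a full implementation would additionally warp the grid with the TPS points.
        theta = torch.cat([affine, torch.zeros_like(affine[:, :, :1])], dim=2)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        warped = F.grid_sample(x, grid, align_corners=False)
        return warped, tps_points

# Usage: x is a batch of 64x64 copied character images, z ~ N(0, I)
x = torch.rand(8, 1, 64, 64)
z = torch.randn(8, 128)
gen = STNGenerator()
deformed, tps = gen(x, z)
```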
Optionally, each convolution module in the encoder further contains a batch normalization layer between the two-dimensional convolution layer and the nonlinear activation layer; the nonlinear activation function of the encoder is the ReLU function, and the pooling operation of the pooling layer is max pooling.
Optionally, each fully connected module in the predictor further contains a batch normalization layer between the fully connected layer and the nonlinear activation layer; the nonlinear activation function of the predictor is the ReLU function.
Preferably, the number of output channels of the final fully connected layer in the predictor is set to 132: 128 of the deformation parameters to be predicted are the coordinates of 64 TPS transformation sampling grid matching points, and 4 are the element values of an affine transformation matrix.
Specifically, the sampler is implemented with the torch.nn.functional.grid_sample() method of PyTorch, and the affine parameters are converted into an affine transformation sampling grid with the torch.nn.functional.affine_grid() method of PyTorch.
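For illustration, a minimal PyTorch example of this sampling step is given below; the 2 x 3 affine matrix used here is an arbitrary example value, not a parameter of the invention.

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 64, 64)                      # copied ancient character image
theta = torch.tensor([[[1.0, 0.2, 0.0],           # illustrative affine matrix
                       [0.0, 1.0, 0.0]]])
grid = F.affine_grid(theta, x.size(), align_corners=False)   # affine sampling grid
warped = F.grid_sample(x, grid, align_corners=False)         # resample the image
print(warped.shape)  # torch.Size([1, 1, 64, 64])
```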
Specifically, the generator B1 comprises three convolution modules, four residual convolution modules, three transposed convolution modules and one output convolution module connected in sequence. Each convolution module contains a two-dimensional convolution layer, an instance normalization layer and a ReLU activation layer connected in sequence; each residual convolution module contains two convolution modules connected in sequence plus an adder that sums the input and output of the residual module; each transposed convolution module contains a transposed convolution layer, an instance normalization layer and a ReLU activation layer connected in sequence; the output convolution module contains a convolution layer and a Tanh activation function connected in sequence. Generator B3 has the same structure as generator B1.
The discriminators B2 and B4 have the same structure as the discriminator A2.
preferably, the gradient backhauling is cut off between A1 and B1, and between B3 and A2, and the shape transformation generation countermeasure network A and the texture transformation generation countermeasure network B interact through the forward propagation of the image information.
Preferably, a least-squares adversarial loss for the shape transformation network A is constructed as the training objective:

$$\mathcal{L}_{G_A} = \mathbb{E}_{x}\big[(D_A(G_A(x)) - 1)^2\big] + \mathcal{L}_{SN} + \mathcal{L}_{div}$$

$$\mathcal{L}_{D_A} = \mathbb{E}_{t}\big[(D_A(t) - 1)^2\big] + \mathbb{E}_{x}\big[D_A(G_A(x))^2\big]$$

where $\mathcal{L}_{G_A}$ is the loss function of generator A1, $D_A$ is the discriminator of network A, $G_A$ is the generator of network A, $\mathcal{L}_{SN}$ is the signal-noise reconstruction loss, $\mathcal{L}_{div}$ is the diversity loss, $\mathcal{L}_{D_A}$ is the loss function of discriminator A2, $x$ is the copied ancient character image, $t$ is the target character image, and $\mathbb{E}_{x}$ and $\mathbb{E}_{t}$ denote the corresponding mathematical expectations.
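For illustration, a least-squares adversarial loss of this shape can be sketched in PyTorch as follows; it follows the standard LSGAN formulation, with the reconstruction and diversity terms supplied externally.

```python
import torch

def lsgan_d_loss(d_real, d_fake):
    # discriminator: push real outputs to 1 and fake outputs to 0
    return ((d_real - 1) ** 2).mean() + (d_fake ** 2).mean()

def lsgan_g_loss(d_fake, rec_loss=0.0, div_loss=0.0):
    # generator: push fake outputs to 1, plus reconstruction and diversity terms
    return ((d_fake - 1) ** 2).mean() + rec_loss + div_loss

# d_real / d_fake are PatchGAN output maps for target and deformed images
d_real, d_fake = torch.rand(4, 1, 6, 6), torch.rand(4, 1, 6, 6)
print(lsgan_d_loss(d_real, d_fake), lsgan_g_loss(d_fake))
```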
The signal-noise reconstruction loss comprises a signal reconstruction term, a noise reconstruction term and a reconstruction error ratio term:

$$\mathcal{L}_{SN} = \mathrm{MAE}(\hat{x}, x) + \mathrm{MAE}(\hat{z}, z) + \alpha \,\log\frac{\mathrm{MAE}(\hat{z}, z)}{\mathrm{MAE}(\hat{x}, x)}$$

where MAE denotes the mean absolute error, $z$ is the noise hidden vector, $\hat{x}$ and $\hat{z}$ are the copied ancient character image and the noise hidden vector reconstructed by the image reconstruction network $R_x$ and the noise reconstruction network $R_z$, respectively, and $\alpha$ is a dynamic coefficient: if the reconstruction error ratio term is greater than $\log M$, $\alpha$ is set to 1; if it is less than $-\log M$, $\alpha$ is set to $-1$; and if it lies within the ideal range $[-\log M, \log M]$, $\alpha$ is set to 0, where $M$ is a hyperparameter.
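For illustration, the signal-noise reconstruction loss can be sketched in PyTorch as follows; the direction of the ratio (noise-reconstruction error over signal-reconstruction error) is inferred from the description of α above and should be read as an assumption.

```python
import torch
import torch.nn.functional as F

def signal_noise_reconstruction_loss(x, x_rec, z, z_rec, M=4.0):
    sig_err = F.l1_loss(x_rec, x)          # signal (image) reconstruction term
    noise_err = F.l1_loss(z_rec, z)        # noise reconstruction term
    ratio = torch.log(noise_err / (sig_err + 1e-8) + 1e-8)
    logM = torch.log(torch.tensor(M))
    if ratio > logM:        # noise reconstruction much worse -> optimize positive ratio
        alpha = 1.0
    elif ratio < -logM:     # signal reconstruction much worse -> optimize negative ratio
        alpha = -1.0
    else:                   # within the ideal range [-logM, logM]
        alpha = 0.0
    return sig_err + noise_err + alpha * ratio

x, x_rec = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
z, z_rec = torch.randn(4, 128), torch.randn(4, 128)
print(signal_noise_reconstruction_loss(x, x_rec, z, z_rec))
```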
the calculation formula of the diversity loss function is as follows:
Figure DEST_PATH_IMAGE023
where P denotes a predictor, E denotes an encoder,
Figure 335521DEST_PATH_IMAGE024
and
Figure DEST_PATH_IMAGE025
respectively representing different noise hidden vectors taken from the same gaussian distribution.
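One plausible, non-limiting realization of such a diversity term is a negative distance between the deformation parameters predicted for two noise vectors, as sketched below; the exact formula of the invention is not reproduced here.

```python
import torch

def diversity_loss(predictor, encoder, x):
    # Encourage different noise hidden vectors to yield different deformation parameters.
    f = encoder(x)
    z1, z2 = torch.randn_like(f), torch.randn_like(f)   # two vectors from the same Gaussian
    p1, p2 = predictor(f + z1), predictor(f + z2)
    return -torch.mean(torch.abs(p1 - p2))              # minimizing this maximizes diversity

# Usage with the STNGenerator sketch above:
# loss_div = diversity_loss(gen.predictor, gen.encoder, x)
```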
Preferably, a least-squares adversarial loss for the texture conversion network B is constructed as the training objective:

$$\mathcal{L}_{G_B} = \mathbb{E}_{s}\big[(D_Y(G_{X \to Y}(s)) - 1)^2\big] + \mathbb{E}_{y}\big[(D_X(G_{Y \to X}(y)) - 1)^2\big] + \mathcal{L}_{cyc}$$

$$\mathcal{L}_{D_B} = \mathbb{E}_{y}\big[(D_Y(y) - 1)^2\big] + \mathbb{E}_{s}\big[D_Y(G_{X \to Y}(s))^2\big] + \mathbb{E}_{s}\big[(D_X(s) - 1)^2\big] + \mathbb{E}_{y}\big[D_X(G_{Y \to X}(y))^2\big]$$

where $s$ is a sample drawn from the set formed by the input copied ancient character images and the deformed character images output by generator A1, $y$ is a photographed ancient character image, $D_Y$ is the texture discriminator for photographed ancient character images in network B (discriminator B2), $G_{X \to Y}$ is the photographed-texture generator in network B (generator B1), $D_X$ is the texture discriminator for copied ancient characters in network B (discriminator B4), $G_{Y \to X}$ is the copied-texture generator in network B (generator B3), and $\mathcal{L}_{cyc}$ is the stroke-aware cycle consistency loss, computed as

$$\mathcal{L}_{cyc} = \mathbb{E}\big[\, \big\| W \odot \big(G_{Y \to X}(G_{X \to Y}(x)) - x\big) \big\|_1 \,\big] + \mathbb{E}\big[\, \big\| G_{X \to Y}(G_{Y \to X}(y)) - y \big\|_1 \,\big]$$

where $x$ denotes the copied ancient character image, $y$ denotes the photographed ancient character image, $\odot$ denotes the element-wise product, and $W$ is a weight matrix extracted from the copied ancient character image whose elements are determined by the area of the stroke region and the area of the background region.
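For illustration, a stroke-aware cycle consistency term of this kind can be sketched in PyTorch as follows; the specific weighting rule (background-to-stroke area ratio on stroke pixels, 1 elsewhere) is an assumption standing in for the exact weight matrix W.

```python
import torch

def stroke_weight(x, thresh=0.5):
    # Weight matrix W from the copied image: up-weight stroke pixels by the
    # background-area / stroke-area ratio (illustrative rule, not the exact formula).
    stroke = (x < thresh).float()              # dark pixels treated as strokes
    s_stroke = stroke.sum().clamp(min=1.0)
    s_bg = (1 - stroke).sum().clamp(min=1.0)
    return stroke * (s_bg / s_stroke) + (1 - stroke)

def stroke_aware_cycle_loss(g_x2y, g_y2x, x, y):
    w = stroke_weight(x)
    cyc_x = (w * (g_y2x(g_x2y(x)) - x).abs()).mean()   # copied -> photo -> copied
    cyc_y = (g_x2y(g_y2x(y)) - y).abs().mean()         # photo -> copied -> photo
    return cyc_x + cyc_y

# Usage: g_x2y and g_y2x are generator B1 and generator B3 (see TextureGenerator above).
```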
In particular, the objective of the TPS transformation is to solve for a deformation function $\varphi$ such that $\varphi(c_i) = c_i'$ for every matching point and the bending energy is minimized, where $c_i$ are the coordinates of the TPS transformation sampling grid matching points on the original character image, $c_i'$ are the coordinates of the corresponding matching points on the deformed character image, and $n$ is the number of sampling grid matching points in the TPS transformation. Assume that $n$ pairs of matching points of the two images have been acquired: $(c_1, c_1'), (c_2, c_2'), \dots, (c_n, c_n')$. The deformation function can be imagined as bending a thin metal plate so that the plate passes through the given $n$ TPS transformation sampling grid matching points; the energy of bending the thin plate is expressed as

$$I_\varphi = \iint_{\mathbb{R}^2} \left( \frac{\partial^2 \varphi}{\partial x^2} \right)^2 + 2\left( \frac{\partial^2 \varphi}{\partial x \partial y} \right)^2 + \left( \frac{\partial^2 \varphi}{\partial y^2} \right)^2 \, dx\, dy$$

It can be shown that the thin plate spline function minimizes this bending energy; it has the form

$$\varphi(x, y) = a_0 + a_1 x + a_2 y + \sum_{i=1}^{n} w_i \, U\big(\|(x, y) - c_i\|\big)$$

where $U$ is the basis function $U(r) = r^2 \log r^2$. The coefficients $a_0, a_1, a_2$ and $w_1, \dots, w_n$ are solved from the preset values of the $n$ TPS transformation sampling grid matching-point coordinates and the offsets predicted by the predictor, which yields the explicit expression of $\varphi$.
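For illustration, the following PyTorch sketch solves the standard thin plate spline system for a set of control points and evaluates the resulting deformation on a dense sampling grid usable with grid_sample; it is a generic TPS implementation written from the formulas above, with illustrative grid sizes.

```python
import torch

def tps_grid(src_pts, dst_pts, height=64, width=64):
    """src_pts, dst_pts: (n, 2) control points in [-1, 1]; returns a (1, H, W, 2) grid
    mapping output pixels back to input coordinates, usable with F.grid_sample."""
    n = src_pts.shape[0]
    def U(r2):
        return r2 * torch.log(r2 + 1e-9)                       # U(r) = r^2 log r^2
    d2 = ((dst_pts[:, None] - dst_pts[None]) ** 2).sum(-1)     # pairwise squared distances
    K = U(d2)
    P = torch.cat([torch.ones(n, 1), dst_pts], dim=1)          # (n, 3)
    A = torch.zeros(n + 3, n + 3)
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.t()
    b = torch.cat([src_pts, torch.zeros(3, 2)], dim=0)         # map dst control pts -> src
    params = torch.linalg.solve(A, b)                          # (n+3, 2): w and a coefficients
    ys = torch.linspace(-1, 1, height)
    xs = torch.linspace(-1, 1, width)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    pts = torch.stack([gx.reshape(-1), gy.reshape(-1)], dim=1) # (H*W, 2)
    rbf = U(((pts[:, None] - dst_pts[None]) ** 2).sum(-1))     # (H*W, n)
    mapped = rbf @ params[:n] + torch.cat([torch.ones(len(pts), 1), pts], 1) @ params[n:]
    return mapped.reshape(1, height, width, 2)

# Usage: 8x8 = 64 matching points, slightly perturbed to deform the character.
base = torch.stack(torch.meshgrid(torch.linspace(-1, 1, 8), torch.linspace(-1, 1, 8),
                                  indexing="ij"), dim=-1).reshape(-1, 2)
grid = tps_grid(base, base + 0.05 * torch.randn_like(base))
# warped = torch.nn.functional.grid_sample(x, grid, align_corners=True)
```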
The sampling formula of the affine transformation sampling grid is

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

where $\theta_{11}$, $\theta_{12}$, $\theta_{21}$ and $\theta_{22}$ are the affine transformation parameters predicted by the predictor, and $(x, y)$ and $(x', y')$ are the position coordinates of a pixel point before and after the transformation, respectively.
Preferably, all images have a pixel size of 64 x 64 and the batch size is 64; the initial learning rate of the shape transformation network A is 0.0001 and that of the texture conversion network B is 0.001; the number of training iterations is 30000, the learning rates start to decay linearly to 1e-5 after 15000 iterations, and the networks are optimized with the Adam optimizer.
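As an illustrative sketch, these hyperparameters can be wired up in PyTorch as follows; the placeholder modules and the LambdaLR-based linear decay are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for generator A1 and generators B1/B3.
gen_a1, gen_b1, gen_b3 = nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8)

total_iters, decay_start, end_lr = 30000, 15000, 1e-5
opt_a = torch.optim.Adam(gen_a1.parameters(), lr=1e-4)    # network A: initial lr 0.0001
opt_b = torch.optim.Adam(list(gen_b1.parameters()) + list(gen_b3.parameters()), lr=1e-3)  # network B: 0.001

def linear_decay(base_lr):
    # constant for the first 15000 iterations, then linear decay towards 1e-5
    def fn(step):
        if step < decay_start:
            return 1.0
        frac = (step - decay_start) / (total_iters - decay_start)
        return max(end_lr / base_lr, 1.0 - frac)
    return fn

sched_a = torch.optim.lr_scheduler.LambdaLR(opt_a, linear_decay(1e-4))
sched_b = torch.optim.lr_scheduler.LambdaLR(opt_b, linear_decay(1e-3))
```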
Compared with the prior art, the invention has the following beneficial effects:
The method uses a spatial transformer network with reconstruction branches as the generator of the shape transformation generative adversarial network, automatically learning the shape distribution of photographed ancient character images at both the global and the local level; no shape distribution needs to be designed by hand, which reduces labor cost while improving the realism and diversity of the generated samples.
The method jointly optimizes the shape transformation network and the texture conversion network, so that the deformed character foreground fuses better with the texture-rich background, improving the realism of the generated samples.
Drawings
FIG. 1 is a schematic flow diagram of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure of the shape transformation generative adversarial network in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of the texture conversion generative adversarial network in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of the photographic ancient character generation network model in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For reference and clarity, the technical terms and abbreviations used hereinafter are summarized as follows:
STN: spatial transformer network.
TPS: thin plate spline.
CNN: convolutional neural network.
FC network: fully connected network.
PyTorch: a mainstream deep learning framework that encapsulates many commonly used deep-learning functions and classes.
ReLU/LeakyReLU: nonlinear activation functions.
Generative adversarial network (GAN): a generative-network training framework based on the idea of a zero-sum game, comprising a generator and a discriminator.
Hidden vector: a vector in the random-variable space.
The invention discloses an ancient character generation method combining shape transformation and texture conversion, which aims to solve the problems in the prior art described above. FIG. 1 shows the flow of an embodiment of the invention; the method comprises the following steps:
Step 1: construct a shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2. A copied ancient character image is fed to generator A1, which produces a deformed character image through spatial transformation; the output of generator A1 is connected to one input of discriminator A2, a target character image is fed to the other input of discriminator A2, and discriminator A2 outputs its judgment of the deformed character image versus the target character image.
Step 2: construct a texture conversion generative adversarial network B, comprising a generator B1, a discriminator B2, a generator B3 and a discriminator B4; network B is a cycle-consistent generative adversarial network. A copied ancient character image is fed to generator B1, whose output is connected to the input of discriminator B2, and a photographed ancient character image is also fed to discriminator B2; in parallel, a photographed ancient character image is fed to generator B3, whose output is connected to the input of discriminator B4, and a copied ancient character image is also fed to discriminator B4. Discriminators B2 and B4 output judgments of photographed-style texture and copied-style texture, respectively.
Step 3: connect network A and network B to obtain a photographic ancient character generation network model. The output of generator A1 in network A is connected to the input of generator B1 in network B, and the output of generator B3 in network B is connected to the input of discriminator A2 in network A; that is, the output of generator B3 serves as the target character image for network A, and the output of generator A1 serves as the deformed character image fed into generator B1.
Step 4: jointly train network A and network B within the photographic ancient character generation network model.
Step 5: connect the output of the trained generator A1 to the input of the trained generator B1 to obtain the photographic ancient character image generator.
Step 6: generate ancient characters with the photographic ancient character image generator.
Specifically, this embodiment implements the method of the invention with the following steps.
1. Collect manually copied ancient character image data and ancient character image data photographed in real scenes. The manually copied ancient character images may be character images produced with drawing-input devices such as digitizer tablets and electronic tablets, or with computer drawing software similar to Photoshop. Build one data set from the collected copied ancient character images and another from the photographed ancient character images.
2. Construct the shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2.
(1) First, a spatial transformer network is constructed as generator A1, comprising three modules: an encoder, a predictor and a sampler. The encoder is built from connected convolutional neural networks (CNNs), usually with more than 3 convolution layers; in this embodiment four convolution modules are connected in sequence, each comprising a two-dimensional convolution layer, a batch normalization layer, a nonlinear activation layer and a pooling layer. The batch normalization layer is optional, the nonlinear activation function may be the ReLU or LeakyReLU function, and the pooling operation may be max pooling, average pooling or adaptive pooling; this embodiment uses the ReLU function and max pooling.
(2) Second, a predictor is constructed from a fully connected (FC) neural network, usually with more than 2 FC layers. In this embodiment the FC network comprises 3 FC modules connected in sequence followed by a final FC layer. Each FC module comprises an FC layer, a batch normalization layer and a nonlinear activation layer; the batch normalization layer is optional and the nonlinear activation function may be the ReLU or LeakyReLU function, with the ReLU function used here. The number of output channels of the final FC layer is set to the number of deformation parameters to be predicted, 132 in this embodiment: 128 parameters are the coordinate values of 8 x 8 = 64 TPS transformation sampling grid matching points, and 4 parameters are the element values of an affine transformation matrix. Note that the number of TPS transformation sampling grid matching points may instead be any integer smaller than the number of pixels of the original character image.
(3) Next, a sampler is constructed; this embodiment implements it with the torch.nn.functional.grid_sample() method of the PyTorch deep learning framework.
(4) Finally, an image reconstruction network R_x and a noise reconstruction network R_z are constructed. R_x consists of 3 FC modules, 1 FC layer and 4 transposed convolution modules connected in sequence; R_z consists of 3 FC modules and 1 FC layer connected in sequence. Each transposed convolution module comprises a transposed convolution layer, a batch normalization layer and a nonlinear activation layer connected in sequence, where the batch normalization layer is optional and the nonlinear activation function may be the ReLU or LeakyReLU function; this embodiment uses the ReLU function.
(5) The working principle of the spatial transformer network is as follows. First, the copied ancient character image x is fed to the encoder, which extracts shape features from the input image and outputs a shape feature vector f. Next, a noise hidden vector z is drawn at random from the standard normal distribution, and f and z are fused; in this embodiment the fusion is a direct element-wise summation. Here f carries the character shape feature information and ensures the realism of the output, while z introduces randomness and ensures the diversity of the output. The fused hidden vector is fed to the predictor, which maps it to TPS transformation parameters and affine transformation parameters: the TPS transformation parameters are the coordinate values of the matching points of the TPS transformation sampling grid, which has 8 x 8 = 64 matching points, and the affine transformation parameters are the 4 elements of an affine matrix. The 4 affine parameters are then converted into an affine transformation sampling grid with the torch.nn.functional.affine_grid() method. Next, the TPS transformation sampling grid, the affine transformation sampling grid and the original image are fed to the sampler, which outputs the deformed character image.
Assume that the coordinates of n pairs of TPS sampling grid matching points of two images (image A and image B) have been acquired, $(c_1, c_1'), (c_2, c_2'), \dots, (c_n, c_n')$, with n = 64 in this embodiment. The coordinate correspondence between image A and image B is computed with the TPS transformation as follows. The objective of the TPS transformation is to solve for a function $\varphi$ such that $\varphi(c_i) = c_i'$ and the bending energy is minimized, so that the other points of the image obtain good transformation results by interpolation. The deformation function can be thought of as bending a thin metal plate so that it passes through the given n TPS transformation sampling grid matching points; the energy of bending the thin plate can be expressed as

$$I_\varphi = \iint_{\mathbb{R}^2} \left( \frac{\partial^2 \varphi}{\partial x^2} \right)^2 + 2\left( \frac{\partial^2 \varphi}{\partial x \partial y} \right)^2 + \left( \frac{\partial^2 \varphi}{\partial y^2} \right)^2 \, dx\, dy$$

It can be shown that the thin plate spline function minimizes this bending energy; it has the form

$$\varphi(x, y) = a_0 + a_1 x + a_2 y + \sum_{i=1}^{n} w_i \, U\big(\|(x, y) - c_i\|\big)$$

where U is the basis function $U(r) = r^2 \log r^2$. Once the coefficients $a_0, a_1, a_2$ and $w_1, \dots, w_n$ are determined, $\varphi$ is fully specified; they are solved from the preset values of the 64 TPS transformation sampling grid matching-point coordinates and the offsets predicted by the predictor.
Similarly, assume that $(x, y)$ and $(x', y')$ are the position coordinates of a pixel point before and after the transformation; the sampling formula of the affine transformation is

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

where $\theta_{11}$, $\theta_{12}$, $\theta_{21}$ and $\theta_{22}$ are the 4 affine transformation parameters predicted by the predictor.
Finally, the output of the predictor is connected to the inputs of the image reconstruction network and the noise reconstruction network, which reconstruct the original image and the noise hidden vector, respectively.
(6) Construct the discriminator A2. The discriminator A2 is based on the PatchGAN structure and consists of 5 convolution modules connected in sequence; each of the first 4 convolution modules comprises a two-dimensional convolution layer, an instance normalization layer and a LeakyReLU activation layer, and the last convolution module comprises an optional padding layer and a two-dimensional convolution layer.
(7) The generative adversarial network A, composed of the generator (the spatial transformer network) and the discriminator, is shown in FIG. 2. The copied ancient character image x is taken as input and spatially transformed into a deformed character image; the output of generator A1 is connected to the input of discriminator A2, the target character image t is fed to the other input of discriminator A2, and discriminator A2 outputs its judgment of the deformed ancient character image versus the target character image.
3. Construct the texture conversion generative adversarial network B, which has a cyclic structure comprising a generator B1 with a discriminator B2 and a generator B3 with a discriminator B4.
(1) Construct generator B1 and generator B3, whose structures are identical. Each comprises three convolution modules, four residual convolution modules, three transposed convolution modules and one output convolution module connected in sequence. Each convolution module comprises a two-dimensional convolution layer, an instance normalization layer and a ReLU activation layer connected in sequence; each residual convolution module comprises two convolution modules connected in sequence plus an adder that sums the input and output of the residual module; each transposed convolution module comprises a transposed convolution layer, an instance normalization layer and a ReLU activation layer connected in sequence; the output convolution module comprises a convolution layer and a Tanh activation function connected in sequence.
(2) Construct discriminator B2 and discriminator B4, whose structures are identical to that of discriminator A2.
(3) The generative adversarial network B is connected as shown in FIG. 3. To form a cycle-consistent GAN, the copied ancient character image is first fed to generator B1, the output of generator B1 is connected to the input of discriminator B2, and the photographed ancient character image is also fed to discriminator B2. Similarly, the photographed ancient character image is fed to generator B3, the output of generator B3 is connected to the input of discriminator B4, and the copied ancient character image is also fed to discriminator B4.
4. As shown in FIG. 4, the whole photographic ancient character generation system is constructed by connecting A and B as follows: the output of generator A1 in network A is connected to the input of generator B1 in network B, and the output of generator B3 in network B is connected to the input of discriminator A2 in network A; that is, the output of generator B3 serves as the target character image of network A, and the output of generator A1 is fed to generator B1 as the deformed character image. Note that, to avoid vanishing gradients, gradient backpropagation is cut off between A1 and B1 and between B3 and A2, so that networks A and B interact only through the forward propagation of image information, as sketched in the code below.
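As an illustrative, non-limiting sketch, one joint forward pass under this connection scheme can be written in PyTorch as follows; detach() realizes the cut gradient paths between A1 and B1 and between B3 and A2, and all module names refer to the illustrative sketches above.

```python
import torch

def joint_forward(x_copy, y_photo, gen_a1, dis_a2, gen_b1, dis_b2, gen_b3, dis_b4):
    """One joint forward pass; .detach() cuts the gradient between A1 -> B1 and B3 -> A2."""
    z = torch.randn(x_copy.size(0), 128)
    deformed, _ = gen_a1(x_copy, z)                 # network A: shape transformation
    target = gen_b3(y_photo)                        # B3 output is the target image for A
    # Network A losses (B3's output enters A2 without gradient).
    g_a = ((dis_a2(deformed) - 1) ** 2).mean()
    d_a = ((dis_a2(target.detach()) - 1) ** 2).mean() + (dis_a2(deformed.detach()) ** 2).mean()
    # Network B losses (A1's output enters B1 without gradient).
    fake_photo = gen_b1(deformed.detach())          # deformed copy -> photographed texture
    fake_copy = gen_b3(y_photo)                     # photographed -> copied texture
    g_b = ((dis_b2(fake_photo) - 1) ** 2).mean() + ((dis_b4(fake_copy) - 1) ** 2).mean()
    d_b = (((dis_b2(y_photo) - 1) ** 2).mean() + (dis_b2(fake_photo.detach()) ** 2).mean() +
           ((dis_b4(x_copy) - 1) ** 2).mean() + (dis_b4(fake_copy.detach()) ** 2).mean())
    return g_a, d_a, g_b, d_b

# The four losses are then combined with the reconstruction, diversity and
# stroke-aware cycle consistency terms and minimized with Adam optimizers.
```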
5. Construct the optimization objectives that guide the learning of the neural networks.
For network A, a signal-noise reconstruction loss, a diversity loss and a least-squares adversarial loss are constructed as follows. The signal-noise reconstruction loss is

$$\mathcal{L}_{SN} = \mathrm{MAE}(\hat{x}, x) + \mathrm{MAE}(\hat{z}, z) + \alpha \,\log\frac{\mathrm{MAE}(\hat{z}, z)}{\mathrm{MAE}(\hat{x}, x)}$$

where MAE denotes the mean absolute error loss, and $\hat{x}$ and $\hat{z}$ are the copied ancient character image and the noise hidden vector reconstructed by the reconstruction networks $R_x$ and $R_z$, respectively. In the absence of strong supervision, the contribution of the input information can be suppressed during neural network learning; to avoid this, separate reconstruction terms are designed for the information and for the noise vector, so that neither the realism-preserving effect of the shape information nor the diversity-preserving effect of the noise is suppressed. In addition, to keep the degree of deformation of the transformed font reasonable and controllable, a reconstruction error ratio term is designed to balance the respective effects of the shape information and the noise, constrained by a hyperparameter $M > 1$. Here $\alpha$ is a dynamic coefficient: if the ratio term is greater than $\log M$ during training, $\alpha$ is set to 1; in this case the noise reconstruction is much worse than the signal reconstruction, meaning that the effect of the noise is being suppressed by the network, and gradient descent is used to optimize the positive ratio term. Conversely, if the term is less than $-\log M$, $\alpha$ is set to $-1$; in this case the signal reconstruction is much worse than the noise reconstruction, meaning that the effect of the noise is too prominent and the effect of the shape information is suppressed, and gradient descent is used to optimize the negative ratio term. If the term lies within the ideal range $[-\log M, \log M]$, $\alpha$ is set to 0 and no additional optimization is applied.
The diversity loss is computed from the deformation parameters predicted for two different noise hidden vectors $z_1$ and $z_2$ drawn from the same Gaussian distribution, where $P$ denotes the predictor and $E$ denotes the encoder; it rewards differences between $P(E(x) + z_1)$ and $P(E(x) + z_2)$, so that distinct noise vectors yield distinct deformations.
The least-squares adversarial loss of network A is

$$\mathcal{L}_{G_A} = \mathbb{E}_{x}\big[(D_A(G_A(x)) - 1)^2\big] + \mathcal{L}_{SN} + \mathcal{L}_{div}, \qquad \mathcal{L}_{D_A} = \mathbb{E}_{t}\big[(D_A(t) - 1)^2\big] + \mathbb{E}_{x}\big[D_A(G_A(x))^2\big]$$

where $t$, the target character image, is the output of generator B3.
For network B, a stroke-aware cycle consistency loss and a least-squares adversarial loss are constructed as follows. The stroke-aware cycle consistency loss is

$$\mathcal{L}_{cyc} = \mathbb{E}\big[\, \big\| W \odot \big(G_{Y \to X}(G_{X \to Y}(x)) - x\big) \big\|_1 \,\big] + \mathbb{E}\big[\, \big\| G_{X \to Y}(G_{Y \to X}(y)) - y \big\|_1 \,\big]$$

where $x$ denotes the copied ancient character image, $y$ denotes the photographed ancient character image, $\odot$ denotes the element-wise product, and $W$ is a weight matrix extracted from the copied ancient character image whose elements are determined by the area of the stroke region and the area of the background region. The least-squares adversarial loss of network B is

$$\mathcal{L}_{G_B} = \mathbb{E}_{s}\big[(D_Y(G_{X \to Y}(s)) - 1)^2\big] + \mathbb{E}_{y}\big[(D_X(G_{Y \to X}(y)) - 1)^2\big] + \mathcal{L}_{cyc}$$

$$\mathcal{L}_{D_B} = \mathbb{E}_{y}\big[(D_Y(y) - 1)^2\big] + \mathbb{E}_{s}\big[D_Y(G_{X \to Y}(s))^2\big] + \mathbb{E}_{s}\big[(D_X(s) - 1)^2\big] + \mathbb{E}_{y}\big[D_X(G_{Y \to X}(y))^2\big]$$

where $s$ denotes a sample from the set formed by the input copied ancient character images and the deformed character images output by generator A1, and $y$ denotes a photographed ancient character image.
6. All image pixel sizes are set to 64 x 64 and the batch size is 64; the initial learning rate of network A is 0.0001 and that of network B is 0.001; the number of training iterations is 30000, the learning rates start to decay linearly to 1e-5 after 15000 iterations, and the networks are optimized with the Adam optimizer.
7. Networks A and B are trained jointly according to the connection scheme of step 4, yielding the trained generator A1 and generator B1; connecting the output of generator A1 to the input of generator B1 gives the complete photographic ancient character image generator, which can be used to generate diverse samples.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The technical features of the above embodiments may be combined arbitrarily; for brevity, not all possible combinations are described, but any combination of these technical features that contains no contradiction should be regarded as falling within the scope of this specification.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) having computer-usable program code embodied therein.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (9)

1. A method for generating ancient characters combining shape transformation and texture conversion, characterized by comprising the following steps:
Step 1: constructing a shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2; a copied ancient character image is taken as the input of generator A1, which produces a deformed character image through spatial transformation; the output of generator A1 is connected to one input of discriminator A2, a target character image is fed to the other input of discriminator A2, and discriminator A2 outputs its judgment of the deformed character image versus the target character image;
Step 2: constructing a texture conversion generative adversarial network B, comprising a generator B1, a discriminator B2, a generator B3 and a discriminator B4, network B being a cycle-consistent generative adversarial network; a copied ancient character image is fed to generator B1, whose output is connected to the input of discriminator B2, and a photographed ancient character image is also fed to discriminator B2; in parallel, a photographed ancient character image is fed to generator B3, whose output is connected to the input of discriminator B4, and a copied ancient character image is also fed to discriminator B4; discriminators B2 and B4 output judgments of photographed-style texture and copied-style texture, respectively;
Step 3: connecting network A and network B to obtain a photographic ancient character generation network model; the output of generator A1 in network A is connected to the input of generator B1 in network B, and the output of generator B3 in network B is connected to the input of discriminator A2 in network A, that is, the output of generator B3 serves as the target character image of network A and the output of generator A1 serves as the deformed character image fed into generator B1;
Step 4: jointly training network A and network B within the photographic ancient character generation network model;
Step 5: connecting the output of the trained generator A1 to the input of the trained generator B1 to obtain the photographic ancient character image generator;
Step 6: generating ancient characters with the photographic ancient character image generator;
the generator A1 being a spatial transformer network comprising an encoder, a predictor, a sampler, a noise reconstruction network and an image reconstruction network;
the encoder consisting of several convolution modules, each comprising a two-dimensional convolution layer, a nonlinear activation layer and a pooling layer connected in sequence;
the predictor consisting of several fully connected modules followed by a final fully connected layer, each fully connected module comprising a fully connected layer and a nonlinear activation layer, the number of output channels of the final fully connected layer being set to the number of deformation parameters to be predicted;
the sampler mapping the pixel region of the deformed character image to the pixel region of the copied ancient character image by applying matrix multiplication on a sampling grid;
the image reconstruction network consisting of several fully connected modules, one fully connected layer and several transposed convolution modules connected in sequence, each transposed convolution module comprising a transposed convolution layer and a nonlinear activation layer connected in sequence;
the noise reconstruction network consisting of several fully connected modules and one fully connected layer connected in sequence;
the discriminator A2 being based on the PatchGAN structure and consisting of five convolution modules connected in sequence, each of the first four convolution modules comprising a two-dimensional convolution layer, an instance normalization layer and a LeakyReLU activation layer, and the last convolution module comprising a padding layer and a two-dimensional convolution layer;
wherein, first, the copied ancient character image x is fed to the encoder, which extracts shape features from x and outputs a shape feature vector f; a noise hidden vector z is then drawn at random from the standard normal distribution, f and z are fused, and the fused hidden vector is fed to the predictor, which maps it to TPS transformation parameters and affine transformation parameters, the TPS transformation parameters being the coordinate values of the TPS transformation sampling grid matching points and the affine transformation parameters being converted into an affine transformation sampling grid; the TPS transformation sampling grid, the affine transformation sampling grid and the copied ancient character image are then fed to the sampler, which outputs the deformed character image; meanwhile, the output of the predictor is connected to the inputs of the image reconstruction network and the noise reconstruction network, which reconstruct the input copied ancient character image and the noise hidden vector, respectively; then the deformed character image output by generator A1 and the target character image are fed to discriminator A2, and discriminator A2 outputs its judgment of the deformed character image versus the target character image.
2. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 1, wherein the generator B1 comprises three convolution modules, four residual convolution modules, three transposed-convolution modules and an output convolution module connected in sequence; each convolution module comprises a two-dimensional convolution layer, an instance normalization layer and a nonlinear ReLU activation layer connected in sequence; each residual convolution module comprises two convolution modules connected in sequence and an adder, the adder adding the input and the output of the residual module; each transposed-convolution module comprises a transposed convolution layer, an instance normalization layer and a nonlinear ReLU activation layer connected in sequence; and the output convolution module comprises a convolution layer and a Tanh activation function connected in sequence; the generator B3 has the same structure as the generator B1;
the structures of the discriminator B2 and the discriminator B4 are the same as the structure of the discriminator A2;
gradient back-propagation is cut off between generator A1 and generator B1, and between generator B3 and discriminator A2; the shape-transformation generative adversarial network A and the texture-transformation generative adversarial network B interact only through the forward propagation of image information.
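To make claim 2 concrete, the sketch below shows one residual convolution module of the kind used in generator B1 and the point where back-propagation from network B into network A is stopped with detach(); the channel width and the names generator_a1 / generator_b1 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two convolution modules plus an adder joining the block's input and output."""
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

# Cutting gradient back-propagation between the two networks (usage sketch):
# the deformed image produced by generator A1 is detached before entering generator B1,
# so network B's losses do not update network A, and the two networks interact only
# through the forward flow of images.
# deformed = generator_a1(x)
# textured = generator_b1(deformed.detach())
```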
3. The method of claim 1, wherein each of said convolution modules in said encoder further comprises a batch normalization layer disposed between said two-dimensional convolution layer and said nonlinear activation layer; the nonlinear activation function of the nonlinear activation layer in the encoder is the ReLU function, and the pooling operation of the pooling layer is max pooling;
each fully-connected module in the predictor also comprises a batch normalization layer located between the fully-connected layer and the nonlinear activation layer; the nonlinear activation function of the nonlinear activation layer in the predictor is the ReLU function.
4. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 1 or 2, wherein a least-squares adversarial loss function of the shape-transformation generative adversarial network A is constructed as the optimization objective for training, and the least-squares adversarial loss function is calculated as follows:
$$L_{G_A} = \mathbb{E}_{x}\big[(D_A(G_A(x)) - 1)^2\big] + L_{rec} + L_{div}$$
$$L_{D_A} = \mathbb{E}_{y}\big[(D_A(y) - 1)^2\big] + \mathbb{E}_{x}\big[D_A(G_A(x))^2\big]$$
wherein $L_{G_A}$ represents the loss function of the generator A1, $D_A$ represents the discriminator in the shape-transformation generative adversarial network A, $G_A$ represents the generator in the shape-transformation generative adversarial network A, $L_{rec}$ represents the signal-to-noise reconstruction loss function, $L_{div}$ represents the diversity loss function, P represents the predictor, E represents the encoder, $L_{D_A}$ represents the loss function of the discriminator A2, $x$ represents the copied ancient character image, $y$ represents the target character image, and $\mathbb{E}_{x}$ and $\mathbb{E}_{y}$ denote the corresponding mathematical expectations.
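Under the reconstruction above, which assumes the standard least-squares GAN form, the adversarial parts of the two losses could be computed as in this small sketch; the weighting of the reconstruction and diversity terms is not fixed by the claim and is left as plain addition here, and the variable names are illustrative.

```python
import torch

def lsgan_generator_loss(d_fake):
    """Least-squares adversarial term for generator A1: push D(G(x)) toward 1."""
    return ((d_fake - 1.0) ** 2).mean()

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares loss for discriminator A2: real toward 1, fake toward 0."""
    return ((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean()

# usage sketch (l_rec and l_div computed as in claim 5):
# d_fake = discriminator_a2(deformed)
# d_real = discriminator_a2(target)
# loss_g = lsgan_generator_loss(d_fake) + l_rec + l_div
# loss_d = lsgan_discriminator_loss(d_real, d_fake.detach())
```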
5. The method as claimed in claim 4, wherein the signal-to-noise reconstruction loss function comprises a signal reconstruction sub-term, a noise reconstruction sub-term and a reconstruction error ratio term, and the calculation formula is as follows:
$$L_{rec} = \mathrm{MAE}(x, \hat{x}) + \mathrm{MAE}(z, \hat{z}) + \alpha \,\log\frac{\mathrm{MAE}(x, \hat{x})}{\mathrm{MAE}(z, \hat{z})}$$
wherein $\mathrm{MAE}$ represents the mean absolute error, $z$ represents the noise hidden vector, $\hat{x}$ and $\hat{z}$ respectively represent the copied ancient character image reconstructed by the image reconstruction network and the noise hidden vector reconstructed by the noise reconstruction network, and $\alpha$ is a dynamic coefficient: if the reconstruction error ratio term is greater than $\log M$, $\alpha = 1$; if the reconstruction error ratio term is less than $-\log M$, $\alpha = -1$; and if the reconstruction error ratio term lies within the ideal range $[-\log M, \log M]$, $\alpha = 0$, where $M$ represents a hyperparameter;
the calculation formula of the diversity loss function is as follows:
$$L_{div} = -\,\frac{\mathrm{MAE}\big(P(E(x), z_1),\, P(E(x), z_2)\big)}{\mathrm{MAE}(z_1, z_2)}$$
where P denotes the predictor, E denotes the encoder, and $z_1$ and $z_2$ respectively denote different noise hidden vectors drawn from the same Gaussian distribution.
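The sketch below illustrates the two terms of claim 5 under the reconstructed formulas above: the dynamic coefficient alpha is switched by comparing the log error ratio against plus or minus log M, and the diversity term rewards output changes for different noise vectors. The exact form of both terms is an assumption; only the alpha switching rule is stated in the claim.

```python
import torch
import torch.nn.functional as F

def signal_noise_reconstruction_loss(x, x_hat, z, z_hat, M=10.0):
    """Signal term + noise term + alpha-weighted reconstruction error ratio (form assumed)."""
    sig = F.l1_loss(x_hat, x)          # mean absolute error on the image
    noi = F.l1_loss(z_hat, z)          # mean absolute error on the noise vector
    ratio = torch.log(sig / (noi + 1e-8))
    log_m = torch.log(torch.tensor(M))
    if ratio > log_m:
        alpha = 1.0
    elif ratio < -log_m:
        alpha = -1.0
    else:                              # ratio inside the ideal range [-log M, log M]
        alpha = 0.0
    return sig + noi + alpha * ratio

def diversity_loss(out1, out2, z1, z2):
    """Mode-seeking style term (assumed): different noise vectors should give different outputs."""
    return -F.l1_loss(out1, out2) / (F.l1_loss(z1, z2) + 1e-8)
```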
6. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 1 or 2, wherein a least-squares adversarial loss function of the texture-transformation generative adversarial network B is constructed as the optimization objective for training, and the least-squares adversarial loss function is calculated as follows:
$$L_{B} = \mathbb{E}_{y}\big[(D_{y}(y) - 1)^2\big] + \mathbb{E}_{s}\big[D_{y}(G_{y}(s))^2\big] + \mathbb{E}_{s}\big[(D_{s}(s) - 1)^2\big] + \mathbb{E}_{y}\big[D_{s}(G_{s}(y))^2\big] + L_{cyc}$$
wherein $s$ denotes a sample from the set jointly formed by the input copied ancient text digital images and the deformed character images output by the generator A1, $y$ denotes a photographed ancient text digital image, $D_{y}$ denotes the texture discriminator for photographed ancient character images in the texture-transformation generative adversarial network B, $G_{y}$ denotes the texture generator toward photographed ancient character images in the network B, $D_{s}$ denotes the texture discriminator for copied ancient characters in the network B, $G_{s}$ denotes the texture generator toward copied ancient text digital images in the network B, and $L_{cyc}$ denotes the stroke-perception cycle-consistency loss function, which is calculated as follows:
$$L_{cyc} = \mathbb{E}_{s}\big[\,\lVert W \odot (G_{s}(G_{y}(s)) - s) \rVert_{1}\,\big] + \mathbb{E}_{y}\big[\,\lVert G_{y}(G_{s}(y)) - y \rVert_{1}\,\big]$$
wherein $\odot$ denotes the element-wise product of vector elements, and $W$ is a weight matrix extracted from the copied ancient text digital image, calculated as follows:
$$W_{ij} = \begin{cases} \dfrac{S_{b}}{S_{s} + S_{b}}, & \text{if pixel } (i, j) \text{ belongs to the stroke region} \\[6pt] \dfrac{S_{s}}{S_{s} + S_{b}}, & \text{if pixel } (i, j) \text{ belongs to the background region} \end{cases}$$
wherein $S_{s}$ and $S_{b}$ are respectively the area of the stroke region and the area of the background region.
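As a rough sketch of the stroke-perception idea in claim 6 (the weight formula above is a reconstruction, and this weighting scheme is likewise an assumption), the weight matrix can be derived from a binarized copied character image so that stroke pixels are weighted by the relative background area and vice versa, and then applied element-wise to the cycle-reconstruction error.

```python
import torch

def stroke_weight_matrix(copy_img, threshold=0.5):
    """Weight matrix W from a copied ancient character image (weighting scheme assumed).

    copy_img: (B, 1, H, W) grayscale in [0, 1]; dark pixels are treated as strokes.
    """
    stroke = (copy_img < threshold).float()
    s_stroke = stroke.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)         # stroke area
    s_back = (1.0 - stroke).sum(dim=(2, 3), keepdim=True).clamp(min=1.0)   # background area
    total = s_stroke + s_back
    return stroke * (s_back / total) + (1.0 - stroke) * (s_stroke / total)

def stroke_aware_cycle_loss(s, cycled_s, w):
    """Element-wise weighted L1 distance between the input and its cycle reconstruction."""
    return (w * (cycled_s - s).abs()).mean()
```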
7. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 1 or 2, wherein the number of output channels of the last fully-connected layer in the predictor is set to 132; among the deformation parameters to be predicted, 128 parameters are the coordinates of 64 TPS-transformation sampling-grid matching points, and 4 parameters are the element values of an affine transformation matrix.
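A short sketch of how the 132 predicted channels in claim 7 could be split; the reshaping into 64 (x, y) control points and a 2x2 affine matrix follows the counts in the claim, while the ordering of the channels is an assumption.

```python
import torch

def split_deformation_params(params):
    """params: (B, 132) predictor output -> 64 TPS control points and a 2x2 affine matrix."""
    tps_points = params[:, :128].view(-1, 64, 2)   # 128 values = 64 (x, y) grid matching points
    affine = params[:, 128:132].view(-1, 2, 2)     # 4 values = affine matrix entries
    return tps_points, affine
```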
8. The method of claim 7, wherein the TPS transformation aims to solve a deformation function $f$ such that $f(x_i, y_i) = (x'_i, y'_i)$ and the bending energy function is minimized, where $(x_i, y_i)$ denotes the coordinates of the TPS-transformation sampling-grid matching points on the original character image, $(x'_i, y'_i)$ denotes the coordinates of the TPS-transformation sampling-grid matching points on the deformed character image, and $n$ is the number of sampling-grid matching points in the TPS transformation; assuming that $n$ pairs of matching points of the two images have been acquired: $\big((x_1, y_1), (x'_1, y'_1)\big)$, $\big((x_2, y_2), (x'_2, y'_2)\big)$, …, $\big((x_n, y_n), (x'_n, y'_n)\big)$, the deformation function is imagined as bending a thin metal plate so that it passes through the given $n$ TPS-transformation sampling-grid matching points, and the energy function for bending the thin plate is expressed as:
$$E(f) = \iint_{\mathbb{R}^2} \left[ \left(\frac{\partial^2 f}{\partial x^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 + \left(\frac{\partial^2 f}{\partial y^2}\right)^2 \right] \mathrm{d}x\,\mathrm{d}y$$
it can be proved that the thin-plate spline function is the function with the minimum bending energy, and the thin-plate spline function is:
$$f(x, y) = a_0 + a_1 x + a_2 y + \sum_{i=1}^{n} w_i\, U\big(\lVert (x_i, y_i) - (x, y) \rVert\big)$$
where $U$ is the basis function:
$$U(r) = r^2 \log r^2$$
the coefficients $a_0, a_1, a_2$ and $w_1, \dots, w_n$ are solved from the preset values of the $n$ TPS-transformation sampling-grid matching-point coordinates and the offsets predicted by the predictor, whereby the specific expression of $f$ can be obtained;
the sampling formula of the affine transformation sampling grid is as follows:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
wherein $\theta_{11}$, $\theta_{12}$, $\theta_{21}$ and $\theta_{22}$ respectively represent the affine transformation parameters predicted by the predictor, and $(x, y)$ and $(x', y')$ are respectively the position coordinates of a pixel point before and after the transformation.
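To illustrate the thin-plate spline above, the snippet below evaluates the radial basis U(r) = r² log r² and the spline f at query points once the coefficients have been solved; the coefficient-solving step itself (a linear system built from the matching points) is omitted, and all names here are illustrative.

```python
import torch

def tps_basis(r2, eps=1e-9):
    """U(r) = r^2 * log(r^2), written in terms of the squared distance r^2."""
    return r2 * torch.log(r2 + eps)

def tps_apply(points, ctrl, w, a):
    """Evaluate f(x, y) = a0 + a1*x + a2*y + sum_i w_i * U(||ctrl_i - (x, y)||).

    points: (N, 2) query coordinates, ctrl: (n, 2) control points,
    w: (n,) radial coefficients, a: (3,) affine coefficients (a0, a1, a2).
    Returns one coordinate component of the deformation for each query point.
    """
    d2 = ((points[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)   # (N, n) squared distances
    return a[0] + a[1] * points[:, 0] + a[2] * points[:, 1] + tps_basis(d2) @ w
```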
9. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 7, wherein the sampler is implemented by the torch.nn.functional.grid_sample() method in PyTorch, and the affine parameters are converted into the affine-transformation sampling grid by the torch.nn.functional.affine_grid() method in PyTorch.
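A short usage sketch of the two PyTorch calls named in claim 9; note that torch.nn.functional.affine_grid() expects a 2x3 matrix, so the four predicted parameters are assumed here to be padded with a zero translation column, and the input size and parameter values are toy examples.

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 128, 128)                    # copied ancient character image (toy input)
affine4 = torch.tensor([[1.0, 0.1, 0.0, 0.9]])    # four predicted affine parameters (example values)

theta = affine4.view(-1, 2, 2)
theta = torch.cat([theta, torch.zeros(theta.size(0), 2, 1)], dim=2)  # pad to the 2x3 form

grid = F.affine_grid(theta, x.size(), align_corners=False)   # affine-transformation sampling grid
warped = F.grid_sample(x, grid, align_corners=False)         # sampler: resample x on the grid
print(warped.shape)  # torch.Size([1, 1, 128, 128])
```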
CN202210336338.9A 2022-04-01 2022-04-01 Ancient character generation method combining shape transformation and texture transformation Active CN114494003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210336338.9A CN114494003B (en) 2022-04-01 2022-04-01 Ancient character generation method combining shape transformation and texture transformation


Publications (2)

Publication Number Publication Date
CN114494003A CN114494003A (en) 2022-05-13
CN114494003B true CN114494003B (en) 2022-06-21

Family

ID=81488501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210336338.9A Active CN114494003B (en) 2022-04-01 2022-04-01 Ancient character generation method combining shape transformation and texture transformation

Country Status (1)

Country Link
CN (1) CN114494003B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681604B (en) * 2023-04-24 2024-01-02 吉首大学 Qin simple text restoration method based on condition generation countermeasure network
CN116977667B (en) * 2023-08-01 2024-01-26 中交第二公路勘察设计研究院有限公司 Tunnel deformation data filling method based on improved GAIN

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626926A (en) * 2020-04-06 2020-09-04 温州大学 Intelligent texture image synthesis method based on GAN

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664996B (en) * 2018-04-19 2020-12-22 厦门大学 Ancient character recognition method and system based on deep learning
CN110309889A (en) * 2019-07-04 2019-10-08 西南大学 A kind of Old-Yi character symbol restorative procedure of double arbiter GAN

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626926A (en) * 2020-04-06 2020-09-04 温州大学 Intelligent texture image synthesis method based on GAN

Also Published As

Publication number Publication date
CN114494003A (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN108416752B (en) Method for removing motion blur of image based on generation type countermeasure network
CN114494003B (en) Ancient character generation method combining shape transformation and texture transformation
CN111798369B (en) Face aging image synthesis method for generating confrontation network based on circulation condition
CN112348743B (en) Image super-resolution method fusing discriminant network and generation network
CN106447626A (en) Blurred kernel dimension estimation method and system based on deep learning
CN112419455B (en) Human skeleton sequence information-based character action video generation method and system and storage medium
CN113449612B (en) Three-dimensional target point cloud identification method based on sub-flow sparse convolution
CN109345604B (en) Picture processing method, computer device and storage medium
CN116168067B (en) Supervised multi-modal light field depth estimation method based on deep learning
CN112907448A (en) Method, system, equipment and storage medium for super-resolution of any-ratio image
CN111861886A (en) Image super-resolution reconstruction method based on multi-scale feedback network
CN109658508B (en) Multi-scale detail fusion terrain synthesis method
CN115713462A (en) Super-resolution model training method, image recognition method, device and equipment
CN115439849B (en) Instrument digital identification method and system based on dynamic multi-strategy GAN network
Kubade et al. Afn: Attentional feedback network based 3d terrain super-resolution
CN113313133A (en) Training method for generating countermeasure network and animation image generation method
Wang et al. High-resolution point cloud reconstruction from a single image by redescription
CN115601257A (en) Image deblurring method based on local features and non-local features
CN109087247A (en) The method that a kind of pair of stereo-picture carries out oversubscription
CN114972024A (en) Image super-resolution reconstruction device and method based on graph representation learning
Yang Super resolution using dual path connections
CN114782961B (en) Character image augmentation method based on shape transformation
Yang et al. Deep networks for image super-resolution using hierarchical features
CN117173368B (en) Human body template dynamic expression method, device, equipment and medium
CN117292144A (en) Sonar image simulation method based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant