CN114494003B - Ancient character generation method combining shape transformation and texture transformation - Google Patents


Info

Publication number: CN114494003B
Authority: CN (China)
Prior art keywords: generator, transformation, ancient, network, discriminator
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to its accuracy)
Application number: CN202210336338.9A
Other languages: Chinese (zh)
Other versions: CN114494003A (en)
Inventors: 黄双萍, 黄鸿翔, 杨代辉
Current assignee: Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy (Guangzhou); South China University of Technology (SCUT)
Original assignee: Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy (Guangzhou); South China University of Technology (SCUT)
Application filed by Guangdong Provincial Laboratory of Artificial Intelligence and Digital Economy (Guangzhou) and South China University of Technology (SCUT)
Priority to CN202210336338.9A
Publication of CN114494003A
Application granted
Publication of CN114494003B

Classifications

    • G06T3/18
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G06T11/203 Drawing of straight lines or curves
    • G06T11/40 Filling a planar surface by adding surface attributes, e.g. colour or texture
    • G06T3/02

Abstract

The invention discloses an ancient character generation method combining shape transformation and texture conversion, which comprises the following steps: constructing a shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2; constructing a texture conversion generative adversarial network B, comprising a generator B1, a discriminator B2, a generator B3 and a discriminator B4, where network B is a cycle-consistent generative adversarial network; connecting the shape transformation network A with the texture conversion network B to obtain a photographic ancient character generation network model; jointly training network A and network B within this model; connecting the output of the trained generator A1 with the input of the trained generator B1 to obtain a photographic ancient character image generator; and generating ancient characters with the photographic ancient character image generator.

Description

Ancient character generation method combining shape transformation and texture transformation
Technical Field
The invention belongs to the technical field of image processing and artificial intelligence, and particularly relates to an ancient character generation method combining shape transformation and texture conversion.
Background
Deep-learning-based algorithms typically rely on massive amounts of training data to improve performance, and supervised learning algorithms in particular depend heavily on data annotation. This has drawn attention to obtaining annotated data at low labor cost, with data augmentation and data generation being the most common approaches. However, for ancient character images with widely varying foreground shapes and rich background textures, data augmentation yields very limited diversity because it usually relies on manually designed prior probability distributions. Data generation, by contrast, directly fits the distribution of the data and can therefore mine more diverse samples.
The most widely used data generation technique at present is the generative adversarial network (GAN). Within character generation, both GAN-based shape transformation and GAN-based texture conversion are active research directions. GAN-based shape transformation methods usually combine spatial transformer networks, dilated convolution, deformable convolution and similar techniques with adversarial learning to deform characters; dilated and deformable convolutions can achieve good deformation results when supported by massive annotated training data, whereas spatial transformer networks are better suited to unsupervised learning and place relatively low demands on the training data. On the texture side, some methods perform supervised texture conversion with conditional GANs, while others perform unsupervised conversion of texture patterns with a cycle-consistent GAN framework; unsupervised texture conversion requires almost no annotated data, which is friendly to ancient character images available only in small quantities.
Existing character generation techniques mainly train the shape transformation GAN and the texture conversion GAN independently and then stack them to generate samples that are more realistic in both shape and texture. However, this way of connecting the models can cause vanishing gradients during training and inconsistency between the shape and texture characteristics of the fused generated images.
Shape transformation in the prior art is a random font augmentation scheme in which deformation parameters must be sampled empirically from a manually designed prior distribution to obtain diversity. A manually designed distribution may not fit the distribution of real ancient character shapes, its design and selection are complex and labor-intensive, and errors in fitting the distribution lead to low-quality generated data. Moreover, simply stacking a shape transformation model and a texture conversion model easily causes vanishing gradients, prevents the two models from being jointly optimized, and prevents the deformed character foreground from fusing well with the texture-rich background, reducing the quality of the generated ancient characters.
Disclosure of Invention
In view of the above, there is a need for an ancient character generation method combining shape transformation and texture conversion. The method uses a shape transformation generative adversarial network that combines affine transformation with thin plate spline (TPS) transformation, so that the network can learn the target shape probability distribution directly from data and produce finer-grained shape deformation; it also jointly optimizes several generative adversarial networks with a training scheme based on information interaction, so that they promote each other's tuning and improve the quality of the generated samples.
The invention discloses an ancient character generation method combining shape transformation and texture conversion, which comprises the following steps:
Step 1: construct a shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2. A copied ancient character image is fed to generator A1, which produces a deformed character image through spatial transformation; the output of generator A1 is connected to one input of discriminator A2, a target character image is fed to the other input of discriminator A2, and discriminator A2 outputs its judgment of the deformed character image versus the target character image.
Step 2: construct a texture conversion generative adversarial network B, comprising a generator B1, a discriminator B2, a generator B3 and a discriminator B4; network B is a cycle-consistent generative adversarial network. A copied ancient character image is fed to generator B1, whose output is connected to the input of discriminator B2, and a photographed ancient character image is also fed to discriminator B2; in parallel, a photographed ancient character image is fed to generator B3, whose output is connected to the input of discriminator B4, and a copied ancient character image is also fed to discriminator B4. Discriminators B2 and B4 output judgments of photographed-style texture and copied-style texture, respectively.
Step 3: connect network A and network B to obtain a photographic ancient character generation network model. The output of generator A1 in network A is connected to the input of generator B1 in network B, and the output of generator B3 in network B is connected to the input of discriminator A2 in network A; that is, the output of generator B3 serves as the target character image for network A, and the output of generator A1 serves as the deformed character image fed into generator B1.
Step 4: jointly train network A and network B within the photographic ancient character generation network model.
Step 5: connect the output of the trained generator A1 to the input of the trained generator B1 to obtain the photographic ancient character image generator.
Step 6: generate ancient characters with the photographic ancient character image generator.
Specifically, the generator A1 is a spatial transformer network comprising an encoder, a predictor, a sampler, a noise reconstruction network and an image reconstruction network.
The encoder consists of several convolution modules, each containing a two-dimensional convolution layer, a nonlinear activation layer and a pooling layer connected in sequence.
The predictor consists of several fully connected modules followed by a final fully connected layer; each fully connected module contains a fully connected layer and a nonlinear activation layer, and the number of output channels of the final fully connected layer is set to the number of deformation parameters to be predicted.
The sampler maps the pixel region of the deformed character image to the pixel region of the copied ancient character image by applying matrix multiplication on a sampling grid.
The image reconstruction network consists of several fully connected modules, one fully connected layer and several transposed convolution modules connected in sequence; each transposed convolution module contains a transposed convolution layer and a nonlinear activation layer connected in sequence.
The noise reconstruction network consists of several fully connected modules and one fully connected layer connected in sequence.
The discriminator A2 is based on the PatchGAN structure and consists of five convolution modules connected in sequence; each of the first four convolution modules contains a two-dimensional convolution layer, an instance normalization layer and a LeakyReLU activation layer, and the last convolution module contains a padding layer and a two-dimensional convolution layer.
First, the copied ancient character image x is fed to the encoder, which extracts shape features from x and outputs a shape feature vector f. A noise hidden vector z is then drawn at random from the standard normal distribution, f and z are fused, and the fused hidden vector is fed to the predictor, which maps it to TPS transformation parameters and affine transformation parameters. The TPS transformation parameters are the coordinates of the TPS transformation sampling grid matching points, and the affine transformation parameters are converted into an affine transformation sampling grid. The TPS transformation sampling grid, the affine transformation sampling grid and the copied ancient character image are then fed to the sampler, which outputs the deformed character image. Meanwhile, the output of the predictor is connected to the inputs of the image reconstruction network and the noise reconstruction network, which reconstruct the input copied ancient character image and the noise hidden vector, respectively. Finally, the deformed character image output by the generator and the target character image are each fed to the discriminator, which outputs its judgment of the deformed character image versus the target character image.
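As an illustrative, non-limiting sketch, the data flow of generator A1 can be written in PyTorch as follows; the fusion of f and z by addition and the 132 predicted parameters follow the description above, while the class name STNGenerator and all layer sizes are assumptions chosen for illustration (the sketch applies only the affine part of the predicted grid; the TPS warp would be applied analogously).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STNGenerator(nn.Module):
    """Minimal sketch of generator A1: encoder -> (fuse noise) -> predictor -> sampler."""
    def __init__(self, z_dim=128, n_tps=64):
        super().__init__()
        self.n_tps = n_tps
        # Encoder: 4 conv modules (conv -> BN -> ReLU -> max-pool), 64x64 -> 4x4
        chans = [1, 32, 64, 128, 128]
        enc = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            enc += [nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout),
                    nn.ReLU(inplace=True), nn.MaxPool2d(2)]
        self.encoder = nn.Sequential(*enc, nn.Flatten(), nn.Linear(128 * 4 * 4, z_dim))
        # Predictor: 3 FC modules + final FC producing 132 deformation parameters
        # (128 = 2 coords x 64 TPS grid points, 4 = elements of a 2x2 affine matrix)
        self.predictor = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 2 * n_tps + 4))

    def forward(self, x, z):
        f = self.encoder(x)                      # shape feature vector f
        h = f + z                                # fuse by element-wise addition
        params = self.predictor(h)
        tps_points = params[:, :2 * self.n_tps]  # TPS grid matching-point coordinates
        affine = params[:, 2 * self.n_tps:].view(-1, 2, 2)
        # Build a 2x3 affine matrix (zero translation) and sample with it;
        # a full implementation would additionally warp the grid with the TPS points.
        theta = torch.cat([affine, torch.zeros_like(affine[:, :, :1])], dim=2)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        warped = F.grid_sample(x, grid, align_corners=False)
        return warped, tps_points

# Usage: x is a batch of 64x64 copied character images, z ~ N(0, I)
x = torch.rand(8, 1, 64, 64)
z = torch.randn(8, 128)
gen = STNGenerator()
deformed, tps = gen(x, z)
```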
Optionally, each convolution module in the encoder further contains a batch normalization layer between the two-dimensional convolution layer and the nonlinear activation layer; the nonlinear activation function of the encoder is the ReLU function, and the pooling operation of the pooling layer is max pooling.
Optionally, each fully connected module in the predictor further contains a batch normalization layer between the fully connected layer and the nonlinear activation layer; the nonlinear activation function of the predictor is the ReLU function.
Preferably, the number of output channels of the final fully connected layer in the predictor is set to 132: 128 of the deformation parameters to be predicted are the coordinates of 64 TPS transformation sampling grid matching points, and 4 are the element values of an affine transformation matrix.
Specifically, the sampler is implemented with the torch.nn.functional.grid_sample() method of PyTorch, and the affine parameters are converted into an affine transformation sampling grid with the torch.nn.functional.affine_grid() method of PyTorch.
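For illustration, a minimal PyTorch example of this sampling step is given below; the 2 x 3 affine matrix used here is an arbitrary example value, not a parameter of the invention.

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 64, 64)                      # copied ancient character image
theta = torch.tensor([[[1.0, 0.2, 0.0],           # illustrative affine matrix
                       [0.0, 1.0, 0.0]]])
grid = F.affine_grid(theta, x.size(), align_corners=False)   # affine sampling grid
warped = F.grid_sample(x, grid, align_corners=False)         # resample the image
print(warped.shape)  # torch.Size([1, 1, 64, 64])
```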
Specifically, the generator B1 comprises three convolution modules, four residual convolution modules, three transposed convolution modules and one output convolution module connected in sequence. Each convolution module contains a two-dimensional convolution layer, an instance normalization layer and a ReLU activation layer connected in sequence; each residual convolution module contains two convolution modules connected in sequence plus an adder that sums the input and output of the residual module; each transposed convolution module contains a transposed convolution layer, an instance normalization layer and a ReLU activation layer connected in sequence; the output convolution module contains a convolution layer and a Tanh activation function connected in sequence. Generator B3 has the same structure as generator B1.
The discriminators B2 and B4 have the same structure as the discriminator A2.
preferably, the gradient backhauling is cut off between A1 and B1, and between B3 and A2, and the shape transformation generation countermeasure network A and the texture transformation generation countermeasure network B interact through the forward propagation of the image information.
Preferably, a least-squares adversarial loss for the shape transformation network A is constructed as the training objective:

$$\mathcal{L}_{G_A} = \mathbb{E}_{x}\big[(D_A(G_A(x)) - 1)^2\big] + \mathcal{L}_{SN} + \mathcal{L}_{div}$$

$$\mathcal{L}_{D_A} = \mathbb{E}_{t}\big[(D_A(t) - 1)^2\big] + \mathbb{E}_{x}\big[D_A(G_A(x))^2\big]$$

where $\mathcal{L}_{G_A}$ is the loss function of generator A1, $D_A$ is the discriminator of network A, $G_A$ is the generator of network A, $\mathcal{L}_{SN}$ is the signal-noise reconstruction loss, $\mathcal{L}_{div}$ is the diversity loss, $\mathcal{L}_{D_A}$ is the loss function of discriminator A2, $x$ is the copied ancient character image, $t$ is the target character image, and $\mathbb{E}_{x}$ and $\mathbb{E}_{t}$ denote the corresponding mathematical expectations.
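For illustration, a least-squares adversarial loss of this shape can be sketched in PyTorch as follows; it follows the standard LSGAN formulation, with the reconstruction and diversity terms supplied externally.

```python
import torch

def lsgan_d_loss(d_real, d_fake):
    # discriminator: push real outputs to 1 and fake outputs to 0
    return ((d_real - 1) ** 2).mean() + (d_fake ** 2).mean()

def lsgan_g_loss(d_fake, rec_loss=0.0, div_loss=0.0):
    # generator: push fake outputs to 1, plus reconstruction and diversity terms
    return ((d_fake - 1) ** 2).mean() + rec_loss + div_loss

# d_real / d_fake are PatchGAN output maps for target and deformed images
d_real, d_fake = torch.rand(4, 1, 6, 6), torch.rand(4, 1, 6, 6)
print(lsgan_d_loss(d_real, d_fake), lsgan_g_loss(d_fake))
```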
The signal-noise reconstruction loss comprises a signal reconstruction term, a noise reconstruction term and a reconstruction error ratio term:

$$\mathcal{L}_{SN} = \mathrm{MAE}(\hat{x}, x) + \mathrm{MAE}(\hat{z}, z) + \alpha \,\log\frac{\mathrm{MAE}(\hat{z}, z)}{\mathrm{MAE}(\hat{x}, x)}$$

where MAE denotes the mean absolute error, $z$ is the noise hidden vector, $\hat{x}$ and $\hat{z}$ are the copied ancient character image and the noise hidden vector reconstructed by the image reconstruction network $R_x$ and the noise reconstruction network $R_z$, respectively, and $\alpha$ is a dynamic coefficient: if the reconstruction error ratio term is greater than $\log M$, $\alpha$ is set to 1; if it is less than $-\log M$, $\alpha$ is set to $-1$; and if it lies within the ideal range $[-\log M, \log M]$, $\alpha$ is set to 0, where $M$ is a hyperparameter.
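For illustration, the signal-noise reconstruction loss can be sketched in PyTorch as follows; the direction of the ratio (noise-reconstruction error over signal-reconstruction error) is inferred from the description of α above and should be read as an assumption.

```python
import torch
import torch.nn.functional as F

def signal_noise_reconstruction_loss(x, x_rec, z, z_rec, M=4.0):
    sig_err = F.l1_loss(x_rec, x)          # signal (image) reconstruction term
    noise_err = F.l1_loss(z_rec, z)        # noise reconstruction term
    ratio = torch.log(noise_err / (sig_err + 1e-8) + 1e-8)
    logM = torch.log(torch.tensor(M))
    if ratio > logM:        # noise reconstruction much worse -> optimize positive ratio
        alpha = 1.0
    elif ratio < -logM:     # signal reconstruction much worse -> optimize negative ratio
        alpha = -1.0
    else:                   # within the ideal range [-logM, logM]
        alpha = 0.0
    return sig_err + noise_err + alpha * ratio

x, x_rec = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
z, z_rec = torch.randn(4, 128), torch.randn(4, 128)
print(signal_noise_reconstruction_loss(x, x_rec, z, z_rec))
```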
the calculation formula of the diversity loss function is as follows:
Figure DEST_PATH_IMAGE023
where P denotes a predictor, E denotes an encoder,
Figure 335521DEST_PATH_IMAGE024
and
Figure DEST_PATH_IMAGE025
respectively representing different noise hidden vectors taken from the same gaussian distribution.
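One plausible, non-limiting realization of such a diversity term is a negative distance between the deformation parameters predicted for two noise vectors, as sketched below; the exact formula of the invention is not reproduced here.

```python
import torch

def diversity_loss(predictor, encoder, x):
    # Encourage different noise hidden vectors to yield different deformation parameters.
    f = encoder(x)
    z1, z2 = torch.randn_like(f), torch.randn_like(f)   # two vectors from the same Gaussian
    p1, p2 = predictor(f + z1), predictor(f + z2)
    return -torch.mean(torch.abs(p1 - p2))              # minimizing this maximizes diversity

# Usage with the STNGenerator sketch above:
# loss_div = diversity_loss(gen.predictor, gen.encoder, x)
```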
Preferably, a least-squares adversarial loss for the texture conversion network B is constructed as the training objective:

$$\mathcal{L}_{G_B} = \mathbb{E}_{s}\big[(D_Y(G_{X \to Y}(s)) - 1)^2\big] + \mathbb{E}_{y}\big[(D_X(G_{Y \to X}(y)) - 1)^2\big] + \mathcal{L}_{cyc}$$

$$\mathcal{L}_{D_B} = \mathbb{E}_{y}\big[(D_Y(y) - 1)^2\big] + \mathbb{E}_{s}\big[D_Y(G_{X \to Y}(s))^2\big] + \mathbb{E}_{s}\big[(D_X(s) - 1)^2\big] + \mathbb{E}_{y}\big[D_X(G_{Y \to X}(y))^2\big]$$

where $s$ is a sample drawn from the set formed by the input copied ancient character images and the deformed character images output by generator A1, $y$ is a photographed ancient character image, $D_Y$ is the texture discriminator for photographed ancient character images in network B (discriminator B2), $G_{X \to Y}$ is the photographed-texture generator in network B (generator B1), $D_X$ is the texture discriminator for copied ancient characters in network B (discriminator B4), $G_{Y \to X}$ is the copied-texture generator in network B (generator B3), and $\mathcal{L}_{cyc}$ is the stroke-aware cycle consistency loss, computed as

$$\mathcal{L}_{cyc} = \mathbb{E}\big[\, \big\| W \odot \big(G_{Y \to X}(G_{X \to Y}(x)) - x\big) \big\|_1 \,\big] + \mathbb{E}\big[\, \big\| G_{X \to Y}(G_{Y \to X}(y)) - y \big\|_1 \,\big]$$

where $x$ denotes the copied ancient character image, $y$ denotes the photographed ancient character image, $\odot$ denotes the element-wise product, and $W$ is a weight matrix extracted from the copied ancient character image whose elements are determined by the area of the stroke region and the area of the background region.
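For illustration, a stroke-aware cycle consistency term of this kind can be sketched in PyTorch as follows; the specific weighting rule (background-to-stroke area ratio on stroke pixels, 1 elsewhere) is an assumption standing in for the exact weight matrix W.

```python
import torch

def stroke_weight(x, thresh=0.5):
    # Weight matrix W from the copied image: up-weight stroke pixels by the
    # background-area / stroke-area ratio (illustrative rule, not the exact formula).
    stroke = (x < thresh).float()              # dark pixels treated as strokes
    s_stroke = stroke.sum().clamp(min=1.0)
    s_bg = (1 - stroke).sum().clamp(min=1.0)
    return stroke * (s_bg / s_stroke) + (1 - stroke)

def stroke_aware_cycle_loss(g_x2y, g_y2x, x, y):
    w = stroke_weight(x)
    cyc_x = (w * (g_y2x(g_x2y(x)) - x).abs()).mean()   # copied -> photo -> copied
    cyc_y = (g_x2y(g_y2x(y)) - y).abs().mean()         # photo -> copied -> photo
    return cyc_x + cyc_y

# Usage: g_x2y and g_y2x are generator B1 and generator B3 (see TextureGenerator above).
```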
In particular, the objective of the TPS transformation is to solve for a deformation function $\varphi$ such that $\varphi(c_i) = c_i'$ for every matching point and the bending energy is minimized, where $c_i$ are the coordinates of the TPS transformation sampling grid matching points on the original character image, $c_i'$ are the coordinates of the corresponding matching points on the deformed character image, and $n$ is the number of sampling grid matching points in the TPS transformation. Assume that $n$ pairs of matching points of the two images have been acquired: $(c_1, c_1'), (c_2, c_2'), \dots, (c_n, c_n')$. The deformation function can be imagined as bending a thin metal plate so that the plate passes through the given $n$ TPS transformation sampling grid matching points; the energy of bending the thin plate is expressed as

$$I_\varphi = \iint_{\mathbb{R}^2} \left( \frac{\partial^2 \varphi}{\partial x^2} \right)^2 + 2\left( \frac{\partial^2 \varphi}{\partial x \partial y} \right)^2 + \left( \frac{\partial^2 \varphi}{\partial y^2} \right)^2 \, dx\, dy$$

It can be shown that the thin plate spline function minimizes this bending energy; it has the form

$$\varphi(x, y) = a_0 + a_1 x + a_2 y + \sum_{i=1}^{n} w_i \, U\big(\|(x, y) - c_i\|\big)$$

where $U$ is the basis function $U(r) = r^2 \log r^2$. The coefficients $a_0, a_1, a_2$ and $w_1, \dots, w_n$ are solved from the preset values of the $n$ TPS transformation sampling grid matching-point coordinates and the offsets predicted by the predictor, which yields the explicit expression of $\varphi$.
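For illustration, the following PyTorch sketch solves the standard thin plate spline system for a set of control points and evaluates the resulting deformation on a dense sampling grid usable with grid_sample; it is a generic TPS implementation written from the formulas above, with illustrative grid sizes.

```python
import torch

def tps_grid(src_pts, dst_pts, height=64, width=64):
    """src_pts, dst_pts: (n, 2) control points in [-1, 1]; returns a (1, H, W, 2) grid
    mapping output pixels back to input coordinates, usable with F.grid_sample."""
    n = src_pts.shape[0]
    def U(r2):
        return r2 * torch.log(r2 + 1e-9)                       # U(r) = r^2 log r^2
    d2 = ((dst_pts[:, None] - dst_pts[None]) ** 2).sum(-1)     # pairwise squared distances
    K = U(d2)
    P = torch.cat([torch.ones(n, 1), dst_pts], dim=1)          # (n, 3)
    A = torch.zeros(n + 3, n + 3)
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.t()
    b = torch.cat([src_pts, torch.zeros(3, 2)], dim=0)         # map dst control pts -> src
    params = torch.linalg.solve(A, b)                          # (n+3, 2): w and a coefficients
    ys = torch.linspace(-1, 1, height)
    xs = torch.linspace(-1, 1, width)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    pts = torch.stack([gx.reshape(-1), gy.reshape(-1)], dim=1) # (H*W, 2)
    rbf = U(((pts[:, None] - dst_pts[None]) ** 2).sum(-1))     # (H*W, n)
    mapped = rbf @ params[:n] + torch.cat([torch.ones(len(pts), 1), pts], 1) @ params[n:]
    return mapped.reshape(1, height, width, 2)

# Usage: 8x8 = 64 matching points, slightly perturbed to deform the character.
base = torch.stack(torch.meshgrid(torch.linspace(-1, 1, 8), torch.linspace(-1, 1, 8),
                                  indexing="ij"), dim=-1).reshape(-1, 2)
grid = tps_grid(base, base + 0.05 * torch.randn_like(base))
# warped = torch.nn.functional.grid_sample(x, grid, align_corners=True)
```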
The sampling formula of the affine transformation sampling grid is

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

where $\theta_{11}$, $\theta_{12}$, $\theta_{21}$ and $\theta_{22}$ are the affine transformation parameters predicted by the predictor, and $(x, y)$ and $(x', y')$ are the position coordinates of a pixel point before and after the transformation, respectively.
Preferably, all images have a pixel size of 64 x 64 and the batch size is 64; the initial learning rate of the shape transformation network A is 0.0001 and that of the texture conversion network B is 0.001; the number of training iterations is 30000, the learning rates start to decay linearly to 1e-5 after 15000 iterations, and the networks are optimized with the Adam optimizer.
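As an illustrative sketch, these hyperparameters can be wired up in PyTorch as follows; the placeholder modules and the LambdaLR-based linear decay are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for generator A1 and generators B1/B3.
gen_a1, gen_b1, gen_b3 = nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8)

total_iters, decay_start, end_lr = 30000, 15000, 1e-5
opt_a = torch.optim.Adam(gen_a1.parameters(), lr=1e-4)    # network A: initial lr 0.0001
opt_b = torch.optim.Adam(list(gen_b1.parameters()) + list(gen_b3.parameters()), lr=1e-3)  # network B: 0.001

def linear_decay(base_lr):
    # constant for the first 15000 iterations, then linear decay towards 1e-5
    def fn(step):
        if step < decay_start:
            return 1.0
        frac = (step - decay_start) / (total_iters - decay_start)
        return max(end_lr / base_lr, 1.0 - frac)
    return fn

sched_a = torch.optim.lr_scheduler.LambdaLR(opt_a, linear_decay(1e-4))
sched_b = torch.optim.lr_scheduler.LambdaLR(opt_b, linear_decay(1e-3))
```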
Compared with the prior art, the invention has the following beneficial effects:
The method uses a spatial transformer network with reconstruction branches as the generator of the shape transformation generative adversarial network, automatically learning the shape distribution of photographed ancient character images at both the global and the local level; no shape distribution needs to be designed by hand, which reduces labor cost while improving the realism and diversity of the generated samples.
The method jointly optimizes the shape transformation network and the texture conversion network, so that the deformed character foreground fuses better with the texture-rich background, improving the realism of the generated samples.
Drawings
FIG. 1 is a schematic flow diagram of a method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure of the shape transformation generative adversarial network in an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of the texture conversion generative adversarial network in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of the photographic ancient character generation network model in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For reference and clarity, the technical terms and abbreviations used hereinafter are summarized as follows:
STN: spatial transformer network.
TPS: thin plate spline.
CNN: convolutional neural network.
FC network: fully connected network.
PyTorch: a mainstream deep learning framework that encapsulates many commonly used deep-learning functions and classes.
ReLU/LeakyReLU: nonlinear activation functions.
Generative adversarial network (GAN): a generative-network training framework based on the idea of a zero-sum game, comprising a generator and a discriminator.
Hidden vector: a vector in the random-variable space.
The invention discloses an ancient character generation method combining shape transformation and texture conversion, which aims to solve the problems in the prior art described above. FIG. 1 shows the flow of an embodiment of the invention; the method comprises the following steps:
Step 1: construct a shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2. A copied ancient character image is fed to generator A1, which produces a deformed character image through spatial transformation; the output of generator A1 is connected to one input of discriminator A2, a target character image is fed to the other input of discriminator A2, and discriminator A2 outputs its judgment of the deformed character image versus the target character image.
Step 2: construct a texture conversion generative adversarial network B, comprising a generator B1, a discriminator B2, a generator B3 and a discriminator B4; network B is a cycle-consistent generative adversarial network. A copied ancient character image is fed to generator B1, whose output is connected to the input of discriminator B2, and a photographed ancient character image is also fed to discriminator B2; in parallel, a photographed ancient character image is fed to generator B3, whose output is connected to the input of discriminator B4, and a copied ancient character image is also fed to discriminator B4. Discriminators B2 and B4 output judgments of photographed-style texture and copied-style texture, respectively.
Step 3: connect network A and network B to obtain a photographic ancient character generation network model. The output of generator A1 in network A is connected to the input of generator B1 in network B, and the output of generator B3 in network B is connected to the input of discriminator A2 in network A; that is, the output of generator B3 serves as the target character image for network A, and the output of generator A1 serves as the deformed character image fed into generator B1.
Step 4: jointly train network A and network B within the photographic ancient character generation network model.
Step 5: connect the output of the trained generator A1 to the input of the trained generator B1 to obtain the photographic ancient character image generator.
Step 6: generate ancient characters with the photographic ancient character image generator.
Specifically, this embodiment implements the method of the invention with the following steps.
1. Collect manually copied ancient character image data and ancient character image data photographed in real scenes. The manually copied ancient character images may be character images produced with drawing-input devices such as digitizer tablets and electronic tablets, or with computer drawing software similar to Photoshop. Build one data set from the collected copied ancient character images and another from the photographed ancient character images.
2. Construct the shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2.
(1) First, a spatial transformer network is constructed as generator A1, comprising three modules: an encoder, a predictor and a sampler. The encoder is built from connected convolutional neural networks (CNNs), usually with more than 3 convolution layers; in this embodiment four convolution modules are connected in sequence, each comprising a two-dimensional convolution layer, a batch normalization layer, a nonlinear activation layer and a pooling layer. The batch normalization layer is optional, the nonlinear activation function may be the ReLU or LeakyReLU function, and the pooling operation may be max pooling, average pooling or adaptive pooling; this embodiment uses the ReLU function and max pooling.
(2) Second, a predictor is constructed from a fully connected (FC) neural network, usually with more than 2 FC layers. In this embodiment the FC network comprises 3 FC modules connected in sequence followed by a final FC layer. Each FC module comprises an FC layer, a batch normalization layer and a nonlinear activation layer; the batch normalization layer is optional and the nonlinear activation function may be the ReLU or LeakyReLU function, with the ReLU function used here. The number of output channels of the final FC layer is set to the number of deformation parameters to be predicted, 132 in this embodiment: 128 parameters are the coordinate values of 8 x 8 = 64 TPS transformation sampling grid matching points, and 4 parameters are the element values of an affine transformation matrix. Note that the number of TPS transformation sampling grid matching points may instead be any integer smaller than the number of pixels of the original character image.
(3) Next, a sampler is constructed; this embodiment implements it with the torch.nn.functional.grid_sample() method of the PyTorch deep learning framework.
(4) Finally, an image reconstruction network R_x and a noise reconstruction network R_z are constructed. R_x consists of 3 FC modules, 1 FC layer and 4 transposed convolution modules connected in sequence; R_z consists of 3 FC modules and 1 FC layer connected in sequence. Each transposed convolution module comprises a transposed convolution layer, a batch normalization layer and a nonlinear activation layer connected in sequence, where the batch normalization layer is optional and the nonlinear activation function may be the ReLU or LeakyReLU function; this embodiment uses the ReLU function.
(5) The working principle of the spatial transformer network is as follows. First, the copied ancient character image x is fed to the encoder, which extracts shape features from the input image and outputs a shape feature vector f. Next, a noise hidden vector z is drawn at random from the standard normal distribution, and f and z are fused; in this embodiment the fusion is a direct element-wise summation. Here f carries the character shape feature information and ensures the realism of the output, while z introduces randomness and ensures the diversity of the output. The fused hidden vector is fed to the predictor, which maps it to TPS transformation parameters and affine transformation parameters: the TPS transformation parameters are the coordinate values of the matching points of the TPS transformation sampling grid, which has 8 x 8 = 64 matching points, and the affine transformation parameters are the 4 elements of an affine matrix. The 4 affine parameters are then converted into an affine transformation sampling grid with the torch.nn.functional.affine_grid() method. Next, the TPS transformation sampling grid, the affine transformation sampling grid and the original image are fed to the sampler, which outputs the deformed character image.
Assume that the coordinates of n pairs of TPS sampling grid matching points of two images (image A and image B) have been acquired, $(c_1, c_1'), (c_2, c_2'), \dots, (c_n, c_n')$, with n = 64 in this embodiment. The coordinate correspondence between image A and image B is computed with the TPS transformation as follows. The objective of the TPS transformation is to solve for a function $\varphi$ such that $\varphi(c_i) = c_i'$ and the bending energy is minimized, so that the other points of the image obtain good transformation results by interpolation. The deformation function can be thought of as bending a thin metal plate so that it passes through the given n TPS transformation sampling grid matching points; the energy of bending the thin plate can be expressed as

$$I_\varphi = \iint_{\mathbb{R}^2} \left( \frac{\partial^2 \varphi}{\partial x^2} \right)^2 + 2\left( \frac{\partial^2 \varphi}{\partial x \partial y} \right)^2 + \left( \frac{\partial^2 \varphi}{\partial y^2} \right)^2 \, dx\, dy$$

It can be shown that the thin plate spline function minimizes this bending energy; it has the form

$$\varphi(x, y) = a_0 + a_1 x + a_2 y + \sum_{i=1}^{n} w_i \, U\big(\|(x, y) - c_i\|\big)$$

where U is the basis function $U(r) = r^2 \log r^2$. Once the coefficients $a_0, a_1, a_2$ and $w_1, \dots, w_n$ are determined, $\varphi$ is fully specified; they are solved from the preset values of the 64 TPS transformation sampling grid matching-point coordinates and the offsets predicted by the predictor.
Similarly, assume that $(x, y)$ and $(x', y')$ are the position coordinates of a pixel point before and after the transformation; the sampling formula of the affine transformation is

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

where $\theta_{11}$, $\theta_{12}$, $\theta_{21}$ and $\theta_{22}$ are the 4 affine transformation parameters predicted by the predictor.
Finally, the output of the predictor is connected to the inputs of the image reconstruction network and the noise reconstruction network, which reconstruct the original image and the noise hidden vector, respectively.
(6) Construct the discriminator A2. The discriminator A2 is based on the PatchGAN structure and consists of 5 convolution modules connected in sequence; each of the first 4 convolution modules comprises a two-dimensional convolution layer, an instance normalization layer and a LeakyReLU activation layer, and the last convolution module comprises an optional padding layer and a two-dimensional convolution layer.
(7) The generative adversarial network A, composed of the generator (the spatial transformer network) and the discriminator, is shown in FIG. 2. The copied ancient character image x is taken as input and spatially transformed into a deformed character image; the output of generator A1 is connected to the input of discriminator A2, the target character image t is fed to the other input of discriminator A2, and discriminator A2 outputs its judgment of the deformed ancient character image versus the target character image.
3. Construct the texture conversion generative adversarial network B, which has a cyclic structure comprising a generator B1 with a discriminator B2 and a generator B3 with a discriminator B4.
(1) Construct generator B1 and generator B3, whose structures are identical. Each comprises three convolution modules, four residual convolution modules, three transposed convolution modules and one output convolution module connected in sequence. Each convolution module comprises a two-dimensional convolution layer, an instance normalization layer and a ReLU activation layer connected in sequence; each residual convolution module comprises two convolution modules connected in sequence plus an adder that sums the input and output of the residual module; each transposed convolution module comprises a transposed convolution layer, an instance normalization layer and a ReLU activation layer connected in sequence; the output convolution module comprises a convolution layer and a Tanh activation function connected in sequence.
(2) Construct discriminator B2 and discriminator B4, whose structures are identical to that of discriminator A2.
(3) The generative adversarial network B is connected as shown in FIG. 3. To form a cycle-consistent GAN, the copied ancient character image is first fed to generator B1, the output of generator B1 is connected to the input of discriminator B2, and the photographed ancient character image is also fed to discriminator B2. Similarly, the photographed ancient character image is fed to generator B3, the output of generator B3 is connected to the input of discriminator B4, and the copied ancient character image is also fed to discriminator B4.
4. As shown in FIG. 4, the whole photographic ancient character generation system is constructed by connecting A and B as follows: the output of generator A1 in network A is connected to the input of generator B1 in network B, and the output of generator B3 in network B is connected to the input of discriminator A2 in network A; that is, the output of generator B3 serves as the target character image of network A, and the output of generator A1 is fed to generator B1 as the deformed character image. Note that, to avoid vanishing gradients, gradient backpropagation is cut off between A1 and B1 and between B3 and A2, so that networks A and B interact only through the forward propagation of image information, as sketched in the code below.
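As an illustrative, non-limiting sketch, one joint forward pass under this connection scheme can be written in PyTorch as follows; detach() realizes the cut gradient paths between A1 and B1 and between B3 and A2, and all module names refer to the illustrative sketches above.

```python
import torch

def joint_forward(x_copy, y_photo, gen_a1, dis_a2, gen_b1, dis_b2, gen_b3, dis_b4):
    """One joint forward pass; .detach() cuts the gradient between A1 -> B1 and B3 -> A2."""
    z = torch.randn(x_copy.size(0), 128)
    deformed, _ = gen_a1(x_copy, z)                 # network A: shape transformation
    target = gen_b3(y_photo)                        # B3 output is the target image for A
    # Network A losses (B3's output enters A2 without gradient).
    g_a = ((dis_a2(deformed) - 1) ** 2).mean()
    d_a = ((dis_a2(target.detach()) - 1) ** 2).mean() + (dis_a2(deformed.detach()) ** 2).mean()
    # Network B losses (A1's output enters B1 without gradient).
    fake_photo = gen_b1(deformed.detach())          # deformed copy -> photographed texture
    fake_copy = gen_b3(y_photo)                     # photographed -> copied texture
    g_b = ((dis_b2(fake_photo) - 1) ** 2).mean() + ((dis_b4(fake_copy) - 1) ** 2).mean()
    d_b = (((dis_b2(y_photo) - 1) ** 2).mean() + (dis_b2(fake_photo.detach()) ** 2).mean() +
           ((dis_b4(x_copy) - 1) ** 2).mean() + (dis_b4(fake_copy.detach()) ** 2).mean())
    return g_a, d_a, g_b, d_b

# The four losses are then combined with the reconstruction, diversity and
# stroke-aware cycle consistency terms and minimized with Adam optimizers.
```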
5. Construct the optimization objectives that guide the learning of the neural networks.
For network A, a signal-noise reconstruction loss, a diversity loss and a least-squares adversarial loss are constructed as follows. The signal-noise reconstruction loss is

$$\mathcal{L}_{SN} = \mathrm{MAE}(\hat{x}, x) + \mathrm{MAE}(\hat{z}, z) + \alpha \,\log\frac{\mathrm{MAE}(\hat{z}, z)}{\mathrm{MAE}(\hat{x}, x)}$$

where MAE denotes the mean absolute error loss, and $\hat{x}$ and $\hat{z}$ are the copied ancient character image and the noise hidden vector reconstructed by the reconstruction networks $R_x$ and $R_z$, respectively. In the absence of strong supervision, the contribution of the input information can be suppressed during neural network learning; to avoid this, separate reconstruction terms are designed for the information and for the noise vector, so that neither the realism-preserving effect of the shape information nor the diversity-preserving effect of the noise is suppressed. In addition, to keep the degree of deformation of the transformed font reasonable and controllable, a reconstruction error ratio term is designed to balance the respective effects of the shape information and the noise, constrained by a hyperparameter $M > 1$. Here $\alpha$ is a dynamic coefficient: if the ratio term is greater than $\log M$ during training, $\alpha$ is set to 1; in this case the noise reconstruction is much worse than the signal reconstruction, meaning that the effect of the noise is being suppressed by the network, and gradient descent is used to optimize the positive ratio term. Conversely, if the term is less than $-\log M$, $\alpha$ is set to $-1$; in this case the signal reconstruction is much worse than the noise reconstruction, meaning that the effect of the noise is too prominent and the effect of the shape information is suppressed, and gradient descent is used to optimize the negative ratio term. If the term lies within the ideal range $[-\log M, \log M]$, $\alpha$ is set to 0 and no additional optimization is applied.
The diversity loss is computed from the deformation parameters predicted for two different noise hidden vectors $z_1$ and $z_2$ drawn from the same Gaussian distribution, where $P$ denotes the predictor and $E$ denotes the encoder; it rewards differences between $P(E(x) + z_1)$ and $P(E(x) + z_2)$, so that distinct noise vectors yield distinct deformations.
The least-squares adversarial loss of network A is

$$\mathcal{L}_{G_A} = \mathbb{E}_{x}\big[(D_A(G_A(x)) - 1)^2\big] + \mathcal{L}_{SN} + \mathcal{L}_{div}, \qquad \mathcal{L}_{D_A} = \mathbb{E}_{t}\big[(D_A(t) - 1)^2\big] + \mathbb{E}_{x}\big[D_A(G_A(x))^2\big]$$

where $t$, the target character image, is the output of generator B3.
For network B, a stroke-aware cycle consistency loss and a least-squares adversarial loss are constructed as follows. The stroke-aware cycle consistency loss is

$$\mathcal{L}_{cyc} = \mathbb{E}\big[\, \big\| W \odot \big(G_{Y \to X}(G_{X \to Y}(x)) - x\big) \big\|_1 \,\big] + \mathbb{E}\big[\, \big\| G_{X \to Y}(G_{Y \to X}(y)) - y \big\|_1 \,\big]$$

where $x$ denotes the copied ancient character image, $y$ denotes the photographed ancient character image, $\odot$ denotes the element-wise product, and $W$ is a weight matrix extracted from the copied ancient character image whose elements are determined by the area of the stroke region and the area of the background region. The least-squares adversarial loss of network B is

$$\mathcal{L}_{G_B} = \mathbb{E}_{s}\big[(D_Y(G_{X \to Y}(s)) - 1)^2\big] + \mathbb{E}_{y}\big[(D_X(G_{Y \to X}(y)) - 1)^2\big] + \mathcal{L}_{cyc}$$

$$\mathcal{L}_{D_B} = \mathbb{E}_{y}\big[(D_Y(y) - 1)^2\big] + \mathbb{E}_{s}\big[D_Y(G_{X \to Y}(s))^2\big] + \mathbb{E}_{s}\big[(D_X(s) - 1)^2\big] + \mathbb{E}_{y}\big[D_X(G_{Y \to X}(y))^2\big]$$

where $s$ denotes a sample from the set formed by the input copied ancient character images and the deformed character images output by generator A1, and $y$ denotes a photographed ancient character image.
6. All image pixel sizes are set to 64 x 64 and the batch size is 64; the initial learning rate of network A is 0.0001 and that of network B is 0.001; the number of training iterations is 30000, the learning rates start to decay linearly to 1e-5 after 15000 iterations, and the networks are optimized with the Adam optimizer.
7. Networks A and B are trained jointly according to the connection scheme of step 4, yielding the trained generator A1 and generator B1; connecting the output of generator A1 to the input of generator B1 gives the complete photographic ancient character image generator, which can be used to generate diverse samples.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The technical features of the above embodiments may be combined arbitrarily; for brevity, not all possible combinations are described, but any combination of these technical features that contains no contradiction should be regarded as falling within the scope of this specification.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM and optical storage) having computer-usable program code embodied therein.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various modifications and improvements without departing from the principle of the present invention, and such modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (9)

1. A method for generating ancient characters combining shape transformation and texture conversion, characterized by comprising the following steps:
Step 1: constructing a shape transformation generative adversarial network A, comprising a generator A1 and a discriminator A2; a copied ancient character image is taken as the input of generator A1, which produces a deformed character image through spatial transformation; the output of generator A1 is connected to one input of discriminator A2, a target character image is fed to the other input of discriminator A2, and discriminator A2 outputs its judgment of the deformed character image versus the target character image;
Step 2: constructing a texture conversion generative adversarial network B, comprising a generator B1, a discriminator B2, a generator B3 and a discriminator B4, network B being a cycle-consistent generative adversarial network; a copied ancient character image is fed to generator B1, whose output is connected to the input of discriminator B2, and a photographed ancient character image is also fed to discriminator B2; in parallel, a photographed ancient character image is fed to generator B3, whose output is connected to the input of discriminator B4, and a copied ancient character image is also fed to discriminator B4; discriminators B2 and B4 output judgments of photographed-style texture and copied-style texture, respectively;
Step 3: connecting network A and network B to obtain a photographic ancient character generation network model; the output of generator A1 in network A is connected to the input of generator B1 in network B, and the output of generator B3 in network B is connected to the input of discriminator A2 in network A, that is, the output of generator B3 serves as the target character image of network A and the output of generator A1 serves as the deformed character image fed into generator B1;
Step 4: jointly training network A and network B within the photographic ancient character generation network model;
Step 5: connecting the output of the trained generator A1 to the input of the trained generator B1 to obtain the photographic ancient character image generator;
Step 6: generating ancient characters with the photographic ancient character image generator;
the generator A1 being a spatial transformer network comprising an encoder, a predictor, a sampler, a noise reconstruction network and an image reconstruction network;
the encoder consisting of several convolution modules, each comprising a two-dimensional convolution layer, a nonlinear activation layer and a pooling layer connected in sequence;
the predictor consisting of several fully connected modules followed by a final fully connected layer, each fully connected module comprising a fully connected layer and a nonlinear activation layer, the number of output channels of the final fully connected layer being set to the number of deformation parameters to be predicted;
the sampler mapping the pixel region of the deformed character image to the pixel region of the copied ancient character image by applying matrix multiplication on a sampling grid;
the image reconstruction network consisting of several fully connected modules, one fully connected layer and several transposed convolution modules connected in sequence, each transposed convolution module comprising a transposed convolution layer and a nonlinear activation layer connected in sequence;
the noise reconstruction network consisting of several fully connected modules and one fully connected layer connected in sequence;
the discriminator A2 being based on the PatchGAN structure and consisting of five convolution modules connected in sequence, each of the first four convolution modules comprising a two-dimensional convolution layer, an instance normalization layer and a LeakyReLU activation layer, and the last convolution module comprising a padding layer and a two-dimensional convolution layer;
wherein, first, the copied ancient character image x is fed to the encoder, which extracts shape features from x and outputs a shape feature vector f; a noise hidden vector z is then drawn at random from the standard normal distribution, f and z are fused, and the fused hidden vector is fed to the predictor, which maps it to TPS transformation parameters and affine transformation parameters, the TPS transformation parameters being the coordinate values of the TPS transformation sampling grid matching points and the affine transformation parameters being converted into an affine transformation sampling grid; the TPS transformation sampling grid, the affine transformation sampling grid and the copied ancient character image are then fed to the sampler, which outputs the deformed character image; meanwhile, the output of the predictor is connected to the inputs of the image reconstruction network and the noise reconstruction network, which reconstruct the input copied ancient character image and the noise hidden vector, respectively; then the deformed character image output by generator A1 and the target character image are fed to discriminator A2, and discriminator A2 outputs its judgment of the deformed character image versus the target character image.
2. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 1, wherein the generator B1 comprises three convolution modules, four residual convolution modules, three transposed-convolution modules and an output convolution module connected in sequence; each convolution module comprises a two-dimensional convolution layer, an instance normalization layer and a nonlinear ReLU activation layer connected in sequence; each residual convolution module comprises two convolution modules connected in sequence and an adder, the adder adding the input and the output of the residual module; each transposed-convolution module comprises a transposed convolution layer, an instance normalization layer and a nonlinear ReLU activation layer connected in sequence; and the output convolution module comprises a convolution layer and a Tanh activation function connected in sequence; the generator B3 has the same structure as the generator B1;
the structures of the discriminator B2 and the discriminator B4 are the same as the structure of the discriminator A2;
gradient back-propagation is cut off between generator A1 and generator B1, and between generator B3 and discriminator A2; the shape-transformation generative adversarial network A and the texture-transformation generative adversarial network B interact only through the forward propagation of image information.
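To make claim 2 concrete, the sketch below shows one residual convolution module of the kind used in generator B1 and the point where back-propagation from network B into network A is stopped with detach(); the channel width and the names generator_a1 / generator_b1 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two convolution modules plus an adder joining the block's input and output."""
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

# Cutting gradient back-propagation between the two networks (usage sketch):
# the deformed image produced by generator A1 is detached before entering generator B1,
# so network B's losses do not update network A, and the two networks interact only
# through the forward flow of images.
# deformed = generator_a1(x)
# textured = generator_b1(deformed.detach())
```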
3. The method of claim 1, wherein each of said convolution modules in said encoder further comprises a batch normalization layer disposed between said two-dimensional convolution layer and said nonlinear activation layer; the nonlinear activation function of the nonlinear activation layer in the encoder is the ReLU function, and the pooling operation of the pooling layer is max pooling;
each fully-connected module in the predictor also comprises a batch normalization layer located between the fully-connected layer and the nonlinear activation layer; the nonlinear activation function of the nonlinear activation layer in the predictor is the ReLU function.
4. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 1 or 2, wherein a least-squares adversarial loss function of the shape-transformation generative adversarial network A is constructed as the optimization objective for training, and the least-squares adversarial loss function is calculated as follows:
$$L_{G_A} = \mathbb{E}_{x}\big[(D_A(G_A(x)) - 1)^2\big] + L_{rec} + L_{div}$$
$$L_{D_A} = \mathbb{E}_{y}\big[(D_A(y) - 1)^2\big] + \mathbb{E}_{x}\big[D_A(G_A(x))^2\big]$$
wherein $L_{G_A}$ represents the loss function of the generator A1, $D_A$ represents the discriminator in the shape-transformation generative adversarial network A, $G_A$ represents the generator in the shape-transformation generative adversarial network A, $L_{rec}$ represents the signal-to-noise reconstruction loss function, $L_{div}$ represents the diversity loss function, P represents the predictor, E represents the encoder, $L_{D_A}$ represents the loss function of the discriminator A2, $x$ represents the copied ancient character image, $y$ represents the target character image, and $\mathbb{E}_{x}$ and $\mathbb{E}_{y}$ denote the corresponding mathematical expectations.
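Under the reconstruction above, which assumes the standard least-squares GAN form, the adversarial parts of the two losses could be computed as in this small sketch; the weighting of the reconstruction and diversity terms is not fixed by the claim and is left as plain addition here, and the variable names are illustrative.

```python
import torch

def lsgan_generator_loss(d_fake):
    """Least-squares adversarial term for generator A1: push D(G(x)) toward 1."""
    return ((d_fake - 1.0) ** 2).mean()

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares loss for discriminator A2: real toward 1, fake toward 0."""
    return ((d_real - 1.0) ** 2).mean() + (d_fake ** 2).mean()

# usage sketch (l_rec and l_div computed as in claim 5):
# d_fake = discriminator_a2(deformed)
# d_real = discriminator_a2(target)
# loss_g = lsgan_generator_loss(d_fake) + l_rec + l_div
# loss_d = lsgan_discriminator_loss(d_real, d_fake.detach())
```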
5. The method as claimed in claim 4, wherein the signal-to-noise reconstruction loss function comprises a signal reconstruction sub-term, a noise reconstruction sub-term and a reconstruction error ratio term, and the calculation formula is as follows:
$$L_{rec} = \mathrm{MAE}(x, \hat{x}) + \mathrm{MAE}(z, \hat{z}) + \alpha \,\log\frac{\mathrm{MAE}(x, \hat{x})}{\mathrm{MAE}(z, \hat{z})}$$
wherein $\mathrm{MAE}$ represents the mean absolute error, $z$ represents the noise hidden vector, $\hat{x}$ and $\hat{z}$ respectively represent the copied ancient character image reconstructed by the image reconstruction network and the noise hidden vector reconstructed by the noise reconstruction network, and $\alpha$ is a dynamic coefficient: if the reconstruction error ratio term is greater than $\log M$, $\alpha = 1$; if the reconstruction error ratio term is less than $-\log M$, $\alpha = -1$; and if the reconstruction error ratio term lies within the ideal range $[-\log M, \log M]$, $\alpha = 0$, where $M$ represents a hyperparameter;
the calculation formula of the diversity loss function is as follows:
$$L_{div} = -\,\frac{\mathrm{MAE}\big(P(E(x), z_1),\, P(E(x), z_2)\big)}{\mathrm{MAE}(z_1, z_2)}$$
where P denotes the predictor, E denotes the encoder, and $z_1$ and $z_2$ respectively denote different noise hidden vectors drawn from the same Gaussian distribution.
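The sketch below illustrates the two terms of claim 5 under the reconstructed formulas above: the dynamic coefficient alpha is switched by comparing the log error ratio against plus or minus log M, and the diversity term rewards output changes for different noise vectors. The exact form of both terms is an assumption; only the alpha switching rule is stated in the claim.

```python
import torch
import torch.nn.functional as F

def signal_noise_reconstruction_loss(x, x_hat, z, z_hat, M=10.0):
    """Signal term + noise term + alpha-weighted reconstruction error ratio (form assumed)."""
    sig = F.l1_loss(x_hat, x)          # mean absolute error on the image
    noi = F.l1_loss(z_hat, z)          # mean absolute error on the noise vector
    ratio = torch.log(sig / (noi + 1e-8))
    log_m = torch.log(torch.tensor(M))
    if ratio > log_m:
        alpha = 1.0
    elif ratio < -log_m:
        alpha = -1.0
    else:                              # ratio inside the ideal range [-log M, log M]
        alpha = 0.0
    return sig + noi + alpha * ratio

def diversity_loss(out1, out2, z1, z2):
    """Mode-seeking style term (assumed): different noise vectors should give different outputs."""
    return -F.l1_loss(out1, out2) / (F.l1_loss(z1, z2) + 1e-8)
```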
6. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 1 or 2, wherein a least-squares adversarial loss function of the texture-transformation generative adversarial network B is constructed as the optimization objective for training, and the least-squares adversarial loss function is calculated as follows:
$$L_{B} = \mathbb{E}_{y}\big[(D_{y}(y) - 1)^2\big] + \mathbb{E}_{s}\big[D_{y}(G_{y}(s))^2\big] + \mathbb{E}_{s}\big[(D_{s}(s) - 1)^2\big] + \mathbb{E}_{y}\big[D_{s}(G_{s}(y))^2\big] + L_{cyc}$$
wherein $s$ denotes a sample from the set jointly formed by the input copied ancient text digital images and the deformed character images output by the generator A1, $y$ denotes a photographed ancient text digital image, $D_{y}$ denotes the texture discriminator for photographed ancient character images in the texture-transformation generative adversarial network B, $G_{y}$ denotes the texture generator toward photographed ancient character images in the network B, $D_{s}$ denotes the texture discriminator for copied ancient characters in the network B, $G_{s}$ denotes the texture generator toward copied ancient text digital images in the network B, and $L_{cyc}$ denotes the stroke-perception cycle-consistency loss function, which is calculated as follows:
$$L_{cyc} = \mathbb{E}_{s}\big[\,\lVert W \odot (G_{s}(G_{y}(s)) - s) \rVert_{1}\,\big] + \mathbb{E}_{y}\big[\,\lVert G_{y}(G_{s}(y)) - y \rVert_{1}\,\big]$$
wherein $\odot$ denotes the element-wise product of vector elements, and $W$ is a weight matrix extracted from the copied ancient text digital image, calculated as follows:
$$W_{ij} = \begin{cases} \dfrac{S_{b}}{S_{s} + S_{b}}, & \text{if pixel } (i, j) \text{ belongs to the stroke region} \\[6pt] \dfrac{S_{s}}{S_{s} + S_{b}}, & \text{if pixel } (i, j) \text{ belongs to the background region} \end{cases}$$
wherein $S_{s}$ and $S_{b}$ are respectively the area of the stroke region and the area of the background region.
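As a rough sketch of the stroke-perception idea in claim 6 (the weight formula above is a reconstruction, and this weighting scheme is likewise an assumption), the weight matrix can be derived from a binarized copied character image so that stroke pixels are weighted by the relative background area and vice versa, and then applied element-wise to the cycle-reconstruction error.

```python
import torch

def stroke_weight_matrix(copy_img, threshold=0.5):
    """Weight matrix W from a copied ancient character image (weighting scheme assumed).

    copy_img: (B, 1, H, W) grayscale in [0, 1]; dark pixels are treated as strokes.
    """
    stroke = (copy_img < threshold).float()
    s_stroke = stroke.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)         # stroke area
    s_back = (1.0 - stroke).sum(dim=(2, 3), keepdim=True).clamp(min=1.0)   # background area
    total = s_stroke + s_back
    return stroke * (s_back / total) + (1.0 - stroke) * (s_stroke / total)

def stroke_aware_cycle_loss(s, cycled_s, w):
    """Element-wise weighted L1 distance between the input and its cycle reconstruction."""
    return (w * (cycled_s - s).abs()).mean()
```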
7. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 1 or 2, wherein the number of output channels of the last fully-connected layer in the predictor is set to 132; among the deformation parameters to be predicted, 128 parameters are the coordinates of 64 TPS-transformation sampling-grid matching points, and 4 parameters are the element values of an affine transformation matrix.
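A short sketch of how the 132 predicted channels in claim 7 could be split; the reshaping into 64 (x, y) control points and a 2x2 affine matrix follows the counts in the claim, while the ordering of the channels is an assumption.

```python
import torch

def split_deformation_params(params):
    """params: (B, 132) predictor output -> 64 TPS control points and a 2x2 affine matrix."""
    tps_points = params[:, :128].view(-1, 64, 2)   # 128 values = 64 (x, y) grid matching points
    affine = params[:, 128:132].view(-1, 2, 2)     # 4 values = affine matrix entries
    return tps_points, affine
```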
8. The method of claim 7, wherein the TPS transformation aims to solve a deformation function $f$ such that $f(x_i, y_i) = (x'_i, y'_i)$ and the bending energy function is minimized, where $(x_i, y_i)$ denotes the coordinates of the TPS-transformation sampling-grid matching points on the original character image, $(x'_i, y'_i)$ denotes the coordinates of the TPS-transformation sampling-grid matching points on the deformed character image, and $n$ is the number of sampling-grid matching points in the TPS transformation; assuming that $n$ pairs of matching points of the two images have been acquired: $\big((x_1, y_1), (x'_1, y'_1)\big)$, $\big((x_2, y_2), (x'_2, y'_2)\big)$, …, $\big((x_n, y_n), (x'_n, y'_n)\big)$, the deformation function is imagined as bending a thin metal plate so that it passes through the given $n$ TPS-transformation sampling-grid matching points, and the energy function for bending the thin plate is expressed as:
$$E(f) = \iint_{\mathbb{R}^2} \left[ \left(\frac{\partial^2 f}{\partial x^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial x\,\partial y}\right)^2 + \left(\frac{\partial^2 f}{\partial y^2}\right)^2 \right] \mathrm{d}x\,\mathrm{d}y$$
it can be proved that the thin-plate spline function is the function with the minimum bending energy, and the thin-plate spline function is:
$$f(x, y) = a_0 + a_1 x + a_2 y + \sum_{i=1}^{n} w_i\, U\big(\lVert (x_i, y_i) - (x, y) \rVert\big)$$
where $U$ is the basis function:
$$U(r) = r^2 \log r^2$$
the coefficients $a_0, a_1, a_2$ and $w_1, \dots, w_n$ are solved from the preset values of the $n$ TPS-transformation sampling-grid matching-point coordinates and the offsets predicted by the predictor, whereby the specific expression of $f$ can be obtained;
the sampling formula of the affine transformation sampling grid is as follows:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$
wherein $\theta_{11}$, $\theta_{12}$, $\theta_{21}$ and $\theta_{22}$ respectively represent the affine transformation parameters predicted by the predictor, and $(x, y)$ and $(x', y')$ are respectively the position coordinates of a pixel point before and after the transformation.
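To illustrate the thin-plate spline above, the snippet below evaluates the radial basis U(r) = r² log r² and the spline f at query points once the coefficients have been solved; the coefficient-solving step itself (a linear system built from the matching points) is omitted, and all names here are illustrative.

```python
import torch

def tps_basis(r2, eps=1e-9):
    """U(r) = r^2 * log(r^2), written in terms of the squared distance r^2."""
    return r2 * torch.log(r2 + eps)

def tps_apply(points, ctrl, w, a):
    """Evaluate f(x, y) = a0 + a1*x + a2*y + sum_i w_i * U(||ctrl_i - (x, y)||).

    points: (N, 2) query coordinates, ctrl: (n, 2) control points,
    w: (n,) radial coefficients, a: (3,) affine coefficients (a0, a1, a2).
    Returns one coordinate component of the deformation for each query point.
    """
    d2 = ((points[:, None, :] - ctrl[None, :, :]) ** 2).sum(-1)   # (N, n) squared distances
    return a[0] + a[1] * points[:, 0] + a[2] * points[:, 1] + tps_basis(d2) @ w
```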
9. The method for generating ancient characters by combining shape transformation and texture transformation according to claim 7, wherein the sampler is implemented by the torch.nn.functional.grid_sample() method in PyTorch, and the affine parameters are converted into the affine-transformation sampling grid by the torch.nn.functional.affine_grid() method in PyTorch.
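A short usage sketch of the two PyTorch calls named in claim 9; note that torch.nn.functional.affine_grid() expects a 2x3 matrix, so the four predicted parameters are assumed here to be padded with a zero translation column, and the input size and parameter values are toy examples.

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 1, 128, 128)                    # copied ancient character image (toy input)
affine4 = torch.tensor([[1.0, 0.1, 0.0, 0.9]])    # four predicted affine parameters (example values)

theta = affine4.view(-1, 2, 2)
theta = torch.cat([theta, torch.zeros(theta.size(0), 2, 1)], dim=2)  # pad to the 2x3 form

grid = F.affine_grid(theta, x.size(), align_corners=False)   # affine-transformation sampling grid
warped = F.grid_sample(x, grid, align_corners=False)         # sampler: resample x on the grid
print(warped.shape)  # torch.Size([1, 1, 128, 128])
```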
CN202210336338.9A 2022-04-01 2022-04-01 Ancient character generation method combining shape transformation and texture transformation Active CN114494003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210336338.9A CN114494003B (en) 2022-04-01 2022-04-01 Ancient character generation method combining shape transformation and texture transformation


Publications (2)

Publication Number Publication Date
CN114494003A CN114494003A (en) 2022-05-13
CN114494003B true CN114494003B (en) 2022-06-21

Family

ID=81488501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210336338.9A Active CN114494003B (en) 2022-04-01 2022-04-01 Ancient character generation method combining shape transformation and texture transformation

Country Status (1)

Country Link
CN (1) CN114494003B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116681604B (en) * 2023-04-24 2024-01-02 吉首大学 Qin simple text restoration method based on condition generation countermeasure network
CN116977667B (en) * 2023-08-01 2024-01-26 中交第二公路勘察设计研究院有限公司 Tunnel deformation data filling method based on improved GAIN

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626926A (en) * 2020-04-06 2020-09-04 温州大学 Intelligent texture image synthesis method based on GAN

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664996B (en) * 2018-04-19 2020-12-22 厦门大学 Ancient character recognition method and system based on deep learning
CN110309889A (en) * 2019-07-04 2019-10-08 西南大学 A kind of Old-Yi character symbol restorative procedure of double arbiter GAN

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626926A (en) * 2020-04-06 2020-09-04 温州大学 Intelligent texture image synthesis method based on GAN

Also Published As

Publication number Publication date
CN114494003A (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN108416752B (en) Method for removing motion blur of image based on generation type countermeasure network
CN114494003B (en) Ancient character generation method combining shape transformation and texture transformation
CN111798369B (en) Face aging image synthesis method for generating confrontation network based on circulation condition
CN112348743B (en) Image super-resolution method fusing discriminant network and generation network
CN106447626A (en) Blurred kernel dimension estimation method and system based on deep learning
CN112419455B (en) Human skeleton sequence information-based character action video generation method and system and storage medium
CN113449612B (en) Three-dimensional target point cloud identification method based on sub-flow sparse convolution
CN109345604B (en) Picture processing method, computer device and storage medium
CN116168067B (en) Supervised multi-modal light field depth estimation method based on deep learning
CN112907448A (en) Method, system, equipment and storage medium for super-resolution of any-ratio image
CN111861886A (en) Image super-resolution reconstruction method based on multi-scale feedback network
CN109658508B (en) Multi-scale detail fusion terrain synthesis method
CN115713462A (en) Super-resolution model training method, image recognition method, device and equipment
CN115439849B (en) Instrument digital identification method and system based on dynamic multi-strategy GAN network
Kubade et al. Afn: Attentional feedback network based 3d terrain super-resolution
CN113313133A (en) Training method for generating countermeasure network and animation image generation method
Wang et al. High-resolution point cloud reconstruction from a single image by redescription
CN115601257A (en) Image deblurring method based on local features and non-local features
CN109087247A (en) The method that a kind of pair of stereo-picture carries out oversubscription
CN114972024A (en) Image super-resolution reconstruction device and method based on graph representation learning
Yang Super resolution using dual path connections
CN114782961B (en) Character image augmentation method based on shape transformation
Yang et al. Deep networks for image super-resolution using hierarchical features
CN117173368B (en) Human body template dynamic expression method, device, equipment and medium
CN117292144A (en) Sonar image simulation method based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant