CN117351227A - Training of oracle bone character picture generation model, and oracle bone character picture generation method and device - Google Patents

Training of oracle bone character picture generation model, and oracle bone character picture generation method and device

Info

Publication number
CN117351227A
CN117351227A (application CN202311592812.5A)
Authority
CN
China
Prior art keywords
picture
rubbing
model
oracle bone
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311592812.5A
Other languages
Chinese (zh)
Other versions
CN117351227B (en)
Inventor
王秋锋
李菁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Jiaotong-Liverpool University
Original Assignee
Xi'an Jiaotong-Liverpool University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Jiaotong-Liverpool University
Priority to CN202311592812.5A
Publication of CN117351227A
Application granted
Publication of CN117351227B
Status: Active
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of ancient character information processing, in particular to a training method for an oracle bone character picture generation model, and an oracle bone character picture generation method and device. The training method comprises: inputting a rubbing picture into a preset oracle bone character picture generation model, and converting the rubbing picture to generate a corresponding matrix picture; encoding the matrix picture to generate content features of the matrix picture; encoding the rubbing picture to generate a style description of the rubbing picture; compressing the rubbing picture into latent features, and forward-diffusing the latent features; inputting the content features and the style description into a denoising model as control conditions, guiding the oracle bone character picture generation model to learn the latent features; and calculating the training loss through a loss function and updating the model parameters of the oracle bone character picture generation model until the loss converges, obtaining the trained model. The method and device can generate rubbing pictures of high quality and diversity.

Description

Training of oracle bone character picture generation model, and oracle bone character picture generation method and device
Technical Field
The application relates to the technical field of ancient character information processing, in particular to a training method for an oracle bone character picture generation model, and an oracle bone character picture generation method and device.
Background
Oracle bone script is a key stage in the development of Chinese characters. Research on it not only helps people understand ancient Chinese history, culture and language, but also provides important clues about the evolution of human history and culture. Oracle bone character recognition, whose purpose is to classify the characters and symbols found on oracle bone fragments, is one of the key scientific techniques for reading the script. With the continuous development of artificial intelligence, modern Chinese character recognition has made major breakthroughs by applying AI techniques; the field of ancient character recognition is therefore also considering how AI can assist the recognition of oracle bone characters.
Image data of oracle bone characters falls mainly into matrix pictures and rubbing pictures. A matrix picture is produced by an expert tracing the characters on an oracle bone fragment, so its characters are complete and its background is clean. In contrast, rubbing pictures are cropped from images of the bone fragments, so problems such as noise and incompleteness are unavoidable. Automatic oracle bone character recognition based on deep learning is still at an early stage, and its largest obstacle is the scarcity of oracle bone material: the number of training samples is insufficient, so data enhancement is adopted to expand the samples. Data enhancement means expanding an existing data set through a series of techniques and processing steps in order to improve the performance of a deep neural network model.
Data enhancement methods in the field of oracle bone character recognition fall broadly into two categories. The first synthesizes new samples from existing samples in the same domain and supplements the data set with the synthesized samples to alleviate class imbalance, thereby improving recognition accuracy within that domain. The second focuses on single-character rubbing images, which occur far more often in practical applications, and augments the rubbing data set by converting matrix pictures into rubbing pictures, thereby improving single-character rubbing recognition accuracy.
The second category mainly proceeds as follows: a generator adds the style of a rubbing picture to a matrix picture, converting the matrix picture into a rubbing picture; the generated rubbing picture and a real rubbing picture are then input simultaneously into a discriminator, which tries to distinguish the generated picture from the real one, while the generator must produce rubbing pictures realistic enough that the discriminator cannot tell them apart.
In summary, converting matrix pictures into rubbing pictures to augment the rubbing data set in the current oracle bone character recognition field relies mainly on adversarial training between a generator and a discriminator. During their competition, however, the optimal equilibrium between the two is hard to find and training easily collapses, so the generated rubbing pictures are distorted and the goal of augmenting the rubbing data set is not well achieved.
Disclosure of Invention
The application provides a training method for an oracle bone character picture generation model, and an oracle bone character picture generation method and device, which can generate rubbing pictures with a given matrix content and rubbing style, thereby effectively augmenting a rubbing data set. The application provides the following technical scheme:
in a first aspect, the present application provides a training method for an oracle bone character picture generation model, the method comprising:
inputting a rubbing picture into a preset oracle bone character picture generation model, and converting the rubbing picture to generate a corresponding matrix picture;
encoding the matrix picture to generate content features of the matrix picture; encoding the rubbing picture to generate a style description of the rubbing picture;
acquiring latent features of the rubbing picture, and forward-diffusing the latent features;
inputting the content features of the matrix picture and the style description of the rubbing picture as control conditions into a denoising model preset in the oracle bone character picture generation model, and guiding the model to learn the latent features;
and calculating the training loss through a loss function, and updating model parameters of the oracle bone character picture generation model until the loss converges, to obtain the trained model.
In a specific embodiment, encoding the rubbing picture and generating the style description of the rubbing picture comprises:
converting the rubbing picture into an embedded image;
inputting the embedded image to a multi-layer attention extraction module of the oracle bone character picture generation model, and extracting the style-related key information in the embedded image;
and projecting the key information into a text space to generate the style description of the rubbing picture.
In a specific embodiment, acquiring the latent features of the rubbing picture and forward-diffusing the latent features comprises:
mapping the rubbing picture into a latent space, and acquiring the latent features of the rubbing picture;
randomly sampling a time step t, adding t rounds of sampled Gaussian noise, and forward-diffusing the latent features;
acquiring the noisy latent features, and storing the Gaussian noise sampled at time t.
In a specific implementation, inputting the content features of the matrix picture and the style description of the rubbing picture as control conditions into the denoising model preset in the oracle bone character picture generation model and guiding the model to learn the latent features, then calculating the training loss through a loss function and updating the model parameters until the loss converges, comprises:
inputting the content features of the matrix picture, the style description of the rubbing picture, the time step $t$ and the noisy latent features into the denoising model;
under the conditional control of the content features and the style description, predicting the noise added at time $t$ in the noisy latent features to obtain the predicted noise;
calculating the mean squared error between the predicted noise and the actually sampled noise;
and updating the model parameters of the oracle bone character picture generation model through the loss function, repeating the forward diffusion and noise prediction operations until the loss converges, to obtain the trained model.
In a second aspect, the present application provides an oracle bone character picture generation method, the method comprising:
selecting a preset matrix picture and a rubbing picture;
inputting the matrix picture and the rubbing picture into an oracle bone character picture generation model, wherein the model is obtained through the training method of the oracle bone character picture generation model described above;
encoding the matrix picture to generate content features of the matrix picture; encoding the rubbing picture to generate a style description of the rubbing picture;
inputting initial Gaussian noise, the content features of the matrix picture and the style description of the rubbing picture into a denoising model preset in the oracle bone character picture generation model, and performing reverse diffusion to generate latent features;
and decoding the latent features to generate a rubbing picture with the given matrix content and rubbing style.
In a specific embodiment, inputting the initial Gaussian noise, the content features of the matrix picture and the style description of the rubbing picture into the denoising model, performing reverse diffusion and generating the latent features comprises:
sampling Gaussian noise, and setting it as the noisy feature at time $T$;
inputting the content features, the style description, the time step $t$ and the noisy feature at time $t$ into the denoising model to obtain the predicted noise;
performing reverse diffusion according to the predicted noise, denoising the noisy feature at time $t$ to obtain the noisy feature at time $t-1$;
at time $t-1$, converting the input of the denoising model into the content features, the style description, the time step $t-1$ and the noisy feature at time $t-1$, and repeating the noise prediction and denoising operations step by step until the noise-free latent feature at time 0 is generated.
In a third aspect, the present application provides a training device for an oracle bone character picture generation model, comprising:
a rubbing picture conversion module, configured to input a rubbing picture into a preset oracle bone character picture generation model, convert the rubbing picture and generate a corresponding matrix picture;
a picture encoding module, configured to encode the matrix picture and generate content features of the matrix picture, and to encode the rubbing picture and generate a style description of the rubbing picture;
a forward diffusion module, configured to acquire latent features of the rubbing picture and forward-diffuse the latent features;
a model training module, configured to input the content features of the matrix picture and the style description of the rubbing picture as control conditions into a denoising model and guide the oracle bone character picture generation model to learn the latent features;
and a model acquisition module, configured to calculate the training loss through a loss function and update the model parameters of the oracle bone character picture generation model until the loss converges, to obtain the trained model.
In a fourth aspect, the present application provides an oracle bone character picture generation device, comprising:
a picture selection module, configured to select a preset matrix picture and a rubbing picture;
a picture input module, configured to input the matrix picture and the rubbing picture into an oracle bone character picture generation model, wherein the model is obtained by the training method of the oracle bone character picture generation model according to any one of claims 1-4;
a picture encoding module, configured to encode the matrix picture and generate content features of the matrix picture, and to encode the rubbing picture and generate a style description of the rubbing picture;
a latent feature generation module, configured to input initial Gaussian noise, the content features of the matrix picture and the style description of the rubbing picture into a denoising model preset in the oracle bone character picture generation model, perform reverse diffusion and generate latent features;
and a picture generation module, configured to decode the latent features and generate a rubbing picture with the given matrix content and rubbing style.
In a fifth aspect, the present application provides an electronic device comprising a processor and a memory; the memory stores a program that is loaded and executed by the processor to implement the training method of the oracle bone character picture generation model of the first aspect, or the oracle bone character picture generation method of the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium in which a program is stored; when executed by a processor, the program implements the training method of the oracle bone character picture generation model of the first aspect, or the oracle bone character picture generation method of the second aspect.
In summary, the beneficial effects of the present application at least include:
(1) Compared with the prior art, which adopts a training mode in which a generator and a discriminator compete with each other, the present application generates rubbing pictures on the basis of a diffusion model, and the parameters of the oracle bone character picture generation model are continuously updated through the calculation of a loss function, so the model is more stable and does not collapse easily.
(2) During training, the content features of the matrix picture and the style description of the rubbing picture are used as control conditions to guide the diffusion model's generation of rubbing pictures, so the training can be controlled to a certain extent and the training precision can be guaranteed.
(3) Training can be staged: the diffusion model and the style encoder are trained first, so that the style encoder has good robustness, and the content encoder and the diffusion model are then trained on that basis, so that the content features and the style description are independent and do not affect each other, improving the extraction accuracy of both.
In the model training stage, a matrix picture that corresponds pixel-for-pixel to the rubbing picture is first obtained through a conversion model, and the content features of the matrix picture and the style description of the rubbing picture are then extracted by a content encoder and a style encoder, respectively. Next, the latent features of the rubbing picture are obtained through a picture encoder, and forward diffusion is carried out in the latent space. The content features and the style description are then input into the denoising model as control conditions, guiding the diffusion model to learn the latent feature information. Finally, the training loss is calculated and the model parameters are updated until the loss converges, giving the trained oracle bone character picture generation model. In the picture generation stage, the trained model feeds the input rubbing picture and matrix picture into the style encoder and the content encoder, respectively, obtaining the content features of the matrix picture and the style description of the rubbing picture. Gaussian noise, the content features and the style description are then input into the denoising model together, and reverse diffusion proceeds until the noise-free latent feature at time 0 is generated. Finally, a picture decoder in the model decodes the latent feature, generating a rubbing picture with the given matrix content and rubbing style, thereby effectively augmenting the rubbing data set.
The foregoing is only an overview of the technical solutions of the present application. In order that the technical means of the present application may be understood more clearly and implemented in accordance with the content of the specification, preferred embodiments of the present application are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a flow chart of the training method of the oracle bone character picture generation model according to an embodiment of the present application.
Fig. 2 is a flow chart of the model training process in the training method of the oracle bone character picture generation model according to an embodiment of the present application.
Fig. 3 is a flow chart of the diffusion model at time step $t$ in the latent space during training of the oracle bone character picture generation model, according to an embodiment of the present application.
Fig. 4 is a flow chart of the oracle bone character picture generation method according to an embodiment of the present application.
Fig. 5 is a flow chart of the oracle bone character picture generation method in the picture generation stage, according to an embodiment of the present application.
Fig. 6 is a block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The implementation of the present application is described in further detail below with reference to the drawings and examples. The following examples illustrate the application but are not intended to limit its scope.
Optionally, the diffusion-based oracle bone character picture generation method of each embodiment is described as running on an electronic device, where the electronic device is a terminal or a server, and the terminal may be a computer, a tablet computer, or the like.
First, a brief introduction to diffusion models. A diffusion model has a diffusion stage and a reverse diffusion stage. The diffusion stage gradually adds random noise to the data, while the reverse diffusion stage learns, through a deep neural network, how to denoise starting from the noisy data. The trained model can then generate new data samples by denoising step by step from pure noise.
The diffusion model comprises forward diffusion, a denoising model and reverse diffusion; training involves forward diffusion and the denoising model, while generation involves the denoising model and reverse diffusion. To reduce computation and increase speed, both the diffusion and reverse diffusion operations are performed in a low-dimensional latent space.
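As a concrete reference for the steps that follow, the sketch below precomputes a DDPM-style noise schedule in PyTorch. The linear endpoints 0.00085 and 0.012 and the step count T = 1000 are taken from the worked example later in this description; the cumulative-product parameterization is the standard DDPM assumption, which the patent text does not spell out.

```python
import torch

# Minimal noise-schedule sketch for the latent diffusion described above.
# The beta endpoints and T follow the worked example (S1005); treat them
# as illustrative hyperparameters rather than fixed values.
T = 1000
betas = torch.linspace(0.00085, 0.012, T)      # beta_t: Gaussian variance per step
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)      # cumulative products used by forward diffusion
```

The later sketches for forward diffusion, training and sampling reuse `betas` and `alpha_bars` from this snippet.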
Referring to fig. 1, the flow of the training method of the oracle bone character picture generation model according to an embodiment of the present application includes:
s110, inputting the rubbing picture into a preset alpha-character picture generation model.
S120, converting the rubbing picture into a corresponding matrix picture, coding the matrix picture and obtaining the content characteristics of the matrix picture.
Specifically, referring to fig. 2, a flow chart of a model training process in the training method of an a-word image generation model according to an embodiment of the present application is shown, a rubbing image is input to a content feature extraction module in the a-word image generation model, the content feature extraction module includes a conversion model and a content encoder, firstly, the rubbing image is input to the conversion model, and the rubbing image is converted to generate a corresponding word model image. Wherein the conversion model aims at converting the style of the rubbing picture into the style of the matrix picture, such as resolution, background, stroke weight, etc., while preserving the alpha-word in the rubbing picture. And then using a content encoder to encode the converted matrix pictures so as to obtain the content characteristics of the matrix pictures.
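A minimal sketch of this content branch follows. The module names, layer widths and the conversion-model interface are assumptions for illustration; the patent specifies only a conversion model (CUT in the worked example) followed by a CNN content encoder.

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Hypothetical CNN content encoder: maps a (B, 3, H, W) matrix picture
    to a (B, c, h, w) content feature map. The 8x spatial downsampling is
    chosen here to match the latent resolution of the picture encoder."""
    def __init__(self, latent_channels: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, latent_channels, 3, stride=2, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def extract_content(rubbing, conversion_model, content_encoder):
    # conversion_model stands in for any rubbing-to-matrix translator
    # (the worked example uses CUT); its call signature is a placeholder.
    matrix_pic = conversion_model(rubbing)
    return content_encoder(matrix_pic), matrix_pic
```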
S130, encoding the rubbing picture and obtaining the style description of the rubbing picture.
Specifically, referring to fig. 2, the rubbing picture is input to a style description extraction module in the oracle bone character picture generation model, which comprises a style picture encoder, a multi-layer attention module and a text encoder. First, the style picture encoder converts the rubbing picture into an embedded image; the multi-layer attention module then extracts the style-related key information in the embedded image; finally, the text encoder projects the key information into a text space, yielding the style description of the rubbing picture.
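One way to realize this style branch is sketched below: a set of learned query tokens attends to the image embedding over several attention layers and is then projected into the text-embedding space. The token count, layer count and widths are illustrative assumptions; the patent fixes only the overall structure.

```python
import torch
import torch.nn as nn

class StyleDescriptionExtractor(nn.Module):
    """Sketch of the multi-layer attention module plus text-space projection.
    All sizes here are assumptions."""
    def __init__(self, embed_dim=768, text_dim=768, num_layers=4, num_tokens=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tokens, embed_dim))
        self.attn_layers = nn.ModuleList(
            nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
            for _ in range(num_layers)
        )
        self.to_text_space = nn.Linear(embed_dim, text_dim)

    def forward(self, image_embedding):
        # image_embedding: (B, N, embed_dim) patch tokens of the embedded image
        q = self.queries.expand(image_embedding.size(0), -1, -1)
        for attn in self.attn_layers:
            q, _ = attn(q, image_embedding, image_embedding)   # keep style-related information
        return self.to_text_space(q)   # (B, num_tokens, text_dim): the style description
```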
S140, acquiring the latent features of the rubbing picture, and forward-diffusing them to obtain noisy latent features.
Specifically, referring to fig. 2, the rubbing picture is first mapped into the latent space by a preset picture encoder in the oracle bone character picture generation model, yielding the latent features of the rubbing picture. These latent features are then input into the diffusion model of the oracle bone character picture generation model, which comprises a forward diffusion model and a denoising model.
Referring to fig. 3, which shows the diffusion model at time step $t$ in the latent space during training: a time step $t$ is first randomly sampled, then $t$ rounds of randomly sampled Gaussian noise are added to the latent features (forward diffusion), yielding noisy latent features; the Gaussian noise sampled at time $t$ is stored. Here $t$ is defined as a nonzero positive integer.
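Adding t rounds of Gaussian noise collapses, under the usual DDPM derivation, into a single closed-form step; the sketch below assumes that standard identity and reuses the schedule precomputed earlier.

```python
import torch

def forward_diffuse(z0, t, alpha_bars):
    """Closed-form forward diffusion (standard DDPM identity, assumed here):
    z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps.
    Returns the noisy latent and the sampled noise, which is stored as the
    prediction target for the denoising model."""
    eps = torch.randn_like(z0)                  # Gaussian noise sampled at time t
    ab = alpha_bars[t].view(-1, 1, 1, 1)        # broadcast over (B, C, H, W)
    zt = ab.sqrt() * z0 + (1.0 - ab).sqrt() * eps
    return zt, eps
```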
S150, predicting the noise added at time $t$ in the noisy latent features to obtain the predicted noise.
Specifically, referring again to fig. 3, the content features of the matrix picture, the style description of the rubbing picture, the time step $t$ and the noisy latent features are input into the denoising model; under the conditional control of the content features and the style description, the model predicts the noise added at time $t$ in the noisy latent features, obtaining the predicted noise.
S160, calculating the training loss through a loss function, and updating the model parameters of the oracle bone character picture generation model until the loss converges, to obtain the trained model.
In implementation, referring to fig. 2, the training loss is calculated through a loss function and the parameters of the style encoder, the content encoder and the diffusion model are updated, where "style encoder" refers collectively to the style picture encoder, the multi-layer attention module and the text encoder.
Specifically, the mean squared error between the predicted noise and the actually sampled Gaussian noise is computed by the loss function, and the forward diffusion and noise prediction operations are repeated until the loss converges, giving the trained oracle bone character picture generation model.
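Put together, one training iteration might look like the following sketch; `denoiser` stands for the conditional denoising model and `forward_diffuse` is the helper above. The optimizer choice and batching are assumptions.

```python
import torch
import torch.nn.functional as F

def training_step(z0, content_feat, style_desc, denoiser, alpha_bars, optimizer, T=1000):
    """One hedged training iteration: sample t, forward-diffuse the latents,
    predict the added noise under content/style conditioning, and minimize
    the mean squared error against the stored noise."""
    t = torch.randint(0, T, (z0.size(0),), device=z0.device)
    zt, eps = forward_diffuse(z0, t, alpha_bars)
    eps_pred = denoiser(zt, t, content_feat, style_desc)   # conditional noise prediction
    loss = F.mse_loss(eps_pred, eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```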
In summary, compared with the prior art's adversarial training of a generator against a discriminator, the present application generates rubbing pictures on the basis of a diffusion model and continuously updates the parameters of the oracle bone character picture generation model through the loss calculation, so the model is more stable and does not collapse easily.
Moreover, during training the content features of the matrix picture and the style description of the rubbing picture serve as control conditions that guide the diffusion model's generation of rubbing pictures, so the training can be controlled to a certain extent and the training precision guaranteed. Training can also be staged: the diffusion model and the style encoder are trained first, so that the style encoder has good robustness, and the content encoder and the diffusion model are then trained on that basis; the content features and the style description thus remain independent and do not affect each other, improving the extraction accuracy of both.
Referring to fig. 4, the flow of the oracle bone character picture generation method according to an embodiment of the present application includes:
S210, selecting a preset matrix picture and a rubbing picture, and inputting them into the oracle bone character picture generation model.
Specifically, a preset matrix picture and a rubbing picture are randomly selected and input into a preset oracle bone character picture generation model, trained by the training method of the oracle bone character picture generation model described above.
S220, encoding the matrix picture and obtaining the content features of the matrix picture.
Specifically, referring to fig. 5, the matrix picture is input into the content encoder of the oracle bone character picture generation model and encoded, yielding the content features of the matrix picture.
S230, encoding the rubbing picture and obtaining the style description of the rubbing picture.
Specifically, referring to fig. 5, the rubbing picture is input into the style encoder of the oracle bone character picture generation model: the style picture encoder converts the rubbing picture into an embedded image, the multi-layer attention module extracts the style-related key information in the embedded image, and the text encoder projects the key information into a text space, yielding the style description of the rubbing picture.
S240, sampling Gaussian noise, inputting the Gaussian noise, the content features of the matrix picture and the style description of the rubbing picture into the denoising model, and performing reverse diffusion to generate latent features.
Specifically, referring to fig. 5, Gaussian noise is first randomly sampled and set as the noisy feature at time $T$. The content features of the matrix picture, the style description of the rubbing picture, the time step $t$ and the noisy feature at time $t$ are input into the denoising model to obtain the predicted noise. Reverse diffusion then denoises the noisy feature at time $t$, yielding the noisy feature at time $t-1$.
Finally, at time $t-1$ the input of the denoising model is converted in turn into the content features of the matrix picture, the style description of the rubbing picture, the time step $t-1$ and the noisy feature at time $t-1$, and the noise prediction and denoising operations are repeated step by step until the noise-free latent feature at time 0 is generated.
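The patent describes this loop only as repeatedly removing the predicted noise; the sketch below fills in the gaps with DDPM-style ancestral sampling, which is an assumption about the exact update rule.

```python
import torch

@torch.no_grad()
def generate_latent(content_feat, style_desc, denoiser, betas, alpha_bars, shape, T=1000):
    """Reverse diffusion from pure noise at time T down to the noise-free
    latent at time 0, conditioned on content features and style description."""
    z = torch.randn(shape)                                   # noisy feature at time T
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        eps_pred = denoiser(z, t_batch, content_feat, style_desc)
        alpha, ab = 1.0 - betas[t], alpha_bars[t]
        z = (z - (1.0 - alpha) / (1.0 - ab).sqrt() * eps_pred) / alpha.sqrt()
        if t > 0:
            z = z + betas[t].sqrt() * torch.randn_like(z)    # posterior noise, skipped at t = 0
    return z                                                 # the latent feature z_0
```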
S250, decoding the latent feature to generate a rubbing picture with the given matrix content and rubbing style.
Specifically, referring to fig. 5, after the noise-free latent feature at time 0 is generated, a picture decoder in the oracle bone character picture generation model decodes it, generating a rubbing picture with the given matrix content and rubbing style.
In summary, in the model training stage, a matrix picture corresponding pixel-for-pixel to the rubbing picture is first obtained through the conversion model, and the content features of the matrix picture and the style description of the rubbing picture are extracted by the content encoder and the style encoder, respectively. The latent features of the rubbing picture are then obtained through the picture encoder, and forward diffusion is carried out in the latent space. The content features and the style description are input into the denoising model as control conditions, guiding the diffusion model to learn the latent feature information. Finally, the training loss is calculated and the model parameters are updated until the loss converges, giving the trained oracle bone character picture generation model. In the picture generation stage, the trained model feeds the input rubbing picture and matrix picture into the style encoder and the content encoder, respectively, obtaining the content features of the matrix picture and the style description of the rubbing picture. Gaussian noise, the content features and the style description are then input into the denoising model together, and reverse diffusion continues until the noise-free latent feature at time 0 is generated. The picture decoder then decodes the latent feature, generating a rubbing picture with the given matrix content and rubbing style, thereby effectively augmenting the rubbing data set.
The present application is described in more detail below by way of a specific example. A rubbing picture is taken as the training-stage input and is first resized to a fixed resolution $H \times W$.
The training method of the oracle bone character picture generation model comprises the following steps (model training stage):
S1001, the rubbing picture $x_r$ is first converted into a matrix picture $x_m$ through the conversion model.
In implementation, the rubbing picture $x_r$ and the matrix picture $x_m$ are both of size $H \times W \times C$, where $H$ is the height, $W$ the width and $C$ the number of channels, i.e. $C = 3$.
Optionally, this embodiment uses CUT, which is based on contrastive learning and a generative adversarial model, as the conversion model; other conversion models may also convert the rubbing picture, and the application does not limit the type of conversion model.
S1002, extracting the content features $f_c$ of the matrix picture $x_m$.
In implementation, a content encoder encodes the matrix picture $x_m$ to obtain the content features $f_c$, of size $h \times w \times c$, where $c$ is the number of latent channels.
Optionally, the content encoder in this embodiment is a convolutional neural network (CNN); other content encoders may also be used, and the application does not limit their type.
S1003, extracting the style description $f_s$ of the rubbing picture $x_r$.
In implementation, a style encoder encodes the rubbing picture $x_r$ to obtain a style description $f_s$ in text-embedding form, of size $l \times d$, where $l$ is the maximum text encoding length and $d$ is the dimension of the text vectors.
Optionally, this embodiment uses a pre-trained CLIP image encoder as the style picture encoder and a pre-trained CLIP text encoder as the text encoder, with the CLIP parameters kept fixed.
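A sketch of loading the frozen CLIP encoders with the Hugging Face transformers library follows; the specific checkpoint name is an assumption, not something the patent mandates.

```python
import torch
from transformers import CLIPVisionModel, CLIPTextModel

# Frozen CLIP encoders; the checkpoint is illustrative.
vision_encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
for p in list(vision_encoder.parameters()) + list(text_encoder.parameters()):
    p.requires_grad_(False)                    # CLIP parameters stay fixed during training

with torch.no_grad():
    # pixel_values: a preprocessed rubbing picture batch, (B, 3, 224, 224) for this checkpoint
    pixel_values = torch.randn(1, 3, 224, 224)
    patch_tokens = vision_encoder(pixel_values).last_hidden_state
    # patch_tokens feed the multi-layer attention module sketched earlier
```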
S1004, extracting the latent features $z$ of the rubbing picture $x_r$.
In implementation, a picture encoder maps the rubbing picture $x_r$ into the latent space, obtaining the latent features of the picture; the size changes from $H \times W \times C$ to $h \times w \times c$, where $c$ is the number of latent channels.
Optionally, this embodiment uses the encoder of a pre-trained AutoencoderKL autoencoder as the picture encoder, with the AutoencoderKL parameters kept fixed.
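The picture encoder/decoder can be loaded from the diffusers library as sketched below; the checkpoint name is an assumption, and Stable Diffusion's usual latent scaling factor is omitted for brevity.

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")  # illustrative checkpoint
vae.requires_grad_(False)                       # AutoencoderKL parameters stay fixed

with torch.no_grad():
    x = torch.randn(1, 3, 256, 256)             # rubbing picture batch, values in [-1, 1]
    z = vae.encode(x).latent_dist.sample()      # latent features, here (1, 4, 32, 32)
    x_rec = vae.decode(z).sample                # the decoder is reused in the generation stage
```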
S1005, predicting the noise added at time $t$ in the noisy latent features to obtain the predicted noise $\hat\epsilon$.
In implementation, a time step $t$ is first randomly sampled from the uniform distribution $U(0, T)$. Next, the latent features $z$ are forward-diffused by adding $t$ rounds of randomly sampled Gaussian noise, i.e. $z_t = \sqrt{\bar\alpha_t}\,z + \sqrt{1-\bar\alpha_t}\,\epsilon$, where $\beta_t$ is the hyperparameter of the Gaussian variance and $\bar\alpha_t = \prod_{i=1}^{t}(1-\beta_i)$. Finally the noisy latent features $z_t$ are obtained and the Gaussian noise $\epsilon$ added at time $t$ is stored; both are of size $h \times w \times c$.
Specifically, assume $T$ is 1000, so the time step $t$ is randomly sampled from $U(0, 1000)$. Forward diffusion then adds $t$ rounds of randomly sampled Gaussian noise to the extracted latent features $z$, where the Gaussian variance schedule $\beta_t$ is a linear interpolation from 0.00085 to 0.012, yielding the noisy latent features $z_t$ at time $t$ and the Gaussian noise $\epsilon$ added at time $t$. The content features $f_c$, the style description $f_s$, the time step $t$ and the noisy latent features $z_t$ are input into the denoising model; under the conditional control of $f_c$ and $f_s$, the denoising model predicts the noise added at time $t$ in $z_t$, obtaining the predicted noise $\hat\epsilon$.
Optionally, this embodiment adopts a UNet as the denoising model; other denoising models may also be chosen, and the application does not limit their type.
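One plausible wiring of the conditional UNet with the diffusers library is sketched below: the style description enters through cross-attention and the content features are concatenated to the noisy latent along the channel axis. This wiring is an assumption; the patent states only that both conditions are input to the denoising model.

```python
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel(
    sample_size=32,            # latent resolution h = w = 32 (assumed)
    in_channels=8,             # 4 noisy latent channels + 4 content-feature channels
    out_channels=4,            # the predicted noise has the latent's channel count
    cross_attention_dim=768,   # width of the style-description tokens
)

def denoiser(zt, t, content_feat, style_desc):
    x = torch.cat([zt, content_feat], dim=1)            # channel-wise content conditioning
    return unet(x, t, encoder_hidden_states=style_desc).sample
```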
S1006, calculating the training loss through a loss function, and updating the model parameters of the oracle bone character picture generation model until the loss converges, to obtain the trained model.
In implementation, the model is trained with the loss function
$L = \lVert \epsilon - \hat\epsilon \rVert_2^2$, i.e. the mean squared error between the predicted noise $\hat\epsilon$ and the actually sampled Gaussian noise $\epsilon$ is calculated, the parameters of the style encoder, the content encoder and the diffusion model are updated, and the forward diffusion and noise prediction operations are repeated until the loss converges, giving the trained oracle bone character picture generation model.
The oracle bone character picture generation method comprises the following steps (picture generation stage):
S1007, the matrix picture is used as the content picture and input into the content encoder to obtain the content features $f_c$.
In implementation, a preset matrix picture $x_m$ is selected and encoded with the content encoder, yielding content features $f_c$ of size $h \times w \times c$.
S1008, the rubbing picture is used as the style picture, and the corresponding style description $f_s$ is obtained through the style encoder.
In implementation, a preset rubbing picture $x_r$ is randomly selected and encoded with the style encoder, yielding a style description $f_s$ of size $l \times d$.
S1009, sampling Gaussian noise, inputting the Gaussian noise, the content features of the matrix picture and the style description of the rubbing picture into the denoising model, and performing reverse diffusion to generate the latent features.
Specifically, Gaussian noise is sampled and set as the initial latent feature $z_{1000}$ at time 1000, of size $h \times w \times c$. The content features $f_c$, the style description $f_s$, the time step 1000 and $z_{1000}$ are input into the denoising model to obtain the predicted noise $\hat\epsilon_{1000}$; that is, according to the input conditions $f_c$ and $f_s$, the denoising model predicts the noise added at time 1000. Reverse diffusion then removes the predicted noise $\hat\epsilon_{1000}$ from the latent feature $z_{1000}$, yielding the denoised latent feature $z_{999}$ at time 999. Next, $f_c$, $f_s$, time 999 and $z_{999}$ are taken as the input of the denoising model, and the denoising is repeated until the noise-free latent feature $z_0$ at time 0 is obtained.
S1010, the denoised latent feature $z_0$ is input to the picture decoder, finally generating a new rubbing picture.
In implementation, the picture decoder decodes the latent feature $z_0$ and finally generates a new pixel-level rubbing picture of size $H \times W \times C$; the generated image has the style of the rubbing and the content of the matrix picture.
Optionally, this embodiment uses the decoder of AutoencoderKL as the picture decoder; other picture decoders may also be chosen, and the application does not limit their type.
An embodiment of the application also provides a training device for the oracle bone character picture generation model, comprising at least:
a rubbing picture conversion module, for inputting the rubbing picture into a preset oracle bone character picture generation model, converting the rubbing picture and generating a corresponding matrix picture;
a picture encoding module, for encoding the matrix picture and generating the content features of the matrix picture, and for encoding the rubbing picture and generating the style description of the rubbing picture;
a forward diffusion module, for acquiring the latent features of the rubbing picture and forward-diffusing the latent features;
a model training module, for inputting the content features of the matrix picture and the style description of the rubbing picture as control conditions into the denoising model preset in the oracle bone character picture generation model, and guiding the model to learn the latent features;
and a model acquisition module, for calculating the training loss through a loss function and updating the model parameters until the loss converges, to obtain the trained oracle bone character picture generation model.
An embodiment of the application also provides an oracle bone character picture generation device, comprising at least:
a picture selection module, for selecting a preset matrix picture and a rubbing picture;
a picture input module, for inputting the matrix picture and the rubbing picture into the oracle bone character picture generation model, wherein the model is obtained through the training method of the oracle bone character picture generation model described above;
a picture encoding module, for encoding the matrix picture and generating the content features of the matrix picture, and for encoding the rubbing picture and generating the style description of the rubbing picture;
a latent feature generation module, for inputting the initial Gaussian noise, the content features of the matrix picture and the style description of the rubbing picture into the denoising model, performing reverse diffusion and generating the latent features;
and a picture generation module, for decoding the latent features and generating a rubbing picture with the given matrix content and rubbing style.
Fig. 6 is a block diagram of an electronic device provided in one embodiment of the present application. The device comprises at least a processor 401 and a memory 402.
Processor 401 may include one or more processing cores, such as a 4-core or 8-core processor. Processor 401 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) or PLA (Programmable Logic Array). Processor 401 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, processor 401 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the screen. In some embodiments, processor 401 may also include an AI (Artificial Intelligence) processor for handling machine-learning computations.
Memory 402 may include one or more computer-readable storage media, which may be non-transitory. Memory 402 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk or flash storage devices. In some embodiments, a non-transitory computer-readable storage medium in memory 402 stores at least one instruction that is executed by processor 401 to implement the training method of the oracle bone character picture generation model or the oracle bone character picture generation method provided by the method embodiments of the present application.
In some embodiments, the electronic device may further optionally include: a peripheral interface and at least one peripheral. The processor 401, memory 402, and peripheral interfaces may be connected by buses or signal lines. The individual peripheral devices may be connected to the peripheral device interface via buses, signal lines or circuit boards. Illustratively, peripheral devices include, but are not limited to: radio frequency circuitry, touch display screens, audio circuitry, and power supplies, among others.
Of course, the electronic device may also include fewer or more components, as the present embodiment is not limited in this regard.
Optionally, the application further provides a computer-readable storage medium in which a program is stored; the program is loaded and executed by a processor to implement the training method of the oracle bone character picture generation model or the oracle bone character picture generation method of the above method embodiments.
Optionally, the application further provides a computer product comprising a computer-readable storage medium in which a program is stored; the program is loaded and executed by a processor to implement the training method of the oracle bone character picture generation model or the oracle bone character picture generation method of the above method embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of those features are described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this description.
The foregoing examples represent only a few embodiments of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, and these fall within its scope of protection. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (10)

1. A training method for an oracle bone character picture generation model, characterized by comprising:
inputting a rubbing picture into a preset oracle bone character picture generation model, and converting the rubbing picture to generate a corresponding matrix picture;
encoding the matrix picture to generate content features of the matrix picture; encoding the rubbing picture to generate a style description of the rubbing picture;
acquiring latent features of the rubbing picture, and forward-diffusing the latent features;
inputting the content features of the matrix picture and the style description of the rubbing picture as control conditions into a denoising model preset in the oracle bone character picture generation model, and guiding the model to learn the latent features;
and calculating the training loss through a loss function, and updating model parameters of the oracle bone character picture generation model until the loss converges, to obtain the trained model.
2. The training method of the oracle bone character picture generation model according to claim 1, wherein encoding the rubbing picture and generating the style description of the rubbing picture comprises:
converting the rubbing picture into an embedded image;
inputting the embedded image to a multi-layer attention extraction module of the oracle bone character picture generation model, and extracting the style-related key information in the embedded image;
and projecting the key information into a text space to generate the style description of the rubbing picture.
3. The training method of the oracle bone character picture generation model according to claim 1, wherein acquiring the latent features of the rubbing picture and forward-diffusing the latent features comprises:
mapping the rubbing picture into a latent space, and acquiring the latent features of the rubbing picture;
randomly sampling a time step t, adding t rounds of sampled Gaussian noise, and forward-diffusing the latent features;
acquiring the noisy latent features, and storing the Gaussian noise sampled at time t.
4. The training method of the oracle bone character picture generation model according to claim 3, wherein inputting the content features of the matrix picture and the style description of the rubbing picture as control conditions into the denoising model preset in the oracle bone character picture generation model and guiding the model to learn the latent features, then calculating the training loss through a loss function and updating the model parameters until the loss converges, to obtain the trained model, comprises:
inputting the content features of the matrix picture, the style description of the rubbing picture, the time step $t$ and the noisy latent features into the denoising model;
under the conditional control of the content features and the style description, predicting the noise added at time $t$ to obtain the predicted noise;
calculating the mean squared error between the predicted noise and the actually sampled noise;
and updating the model parameters of the oracle bone character picture generation model through the loss function, repeating the forward diffusion and noise prediction operations until the loss converges, to obtain the trained model.
5. An oracle bone character picture generation method, characterized by comprising:
selecting a preset matrix picture and a rubbing picture;
inputting the matrix picture and the rubbing picture into an oracle bone character picture generation model, wherein the oracle bone character picture generation model is obtained by the training method of the oracle bone character picture generation model according to any one of claims 1-4;
encoding the matrix picture to generate content features of the matrix picture; encoding the rubbing picture to generate a style description of the rubbing picture;
inputting initial Gaussian noise, the content features of the matrix picture and the style description of the rubbing picture into a denoising model preset in the oracle bone character picture generation model, and performing reverse diffusion to generate latent features;
and decoding the latent features to generate a rubbing picture with the given matrix content and rubbing style.
6. The method of claim 5, wherein inputting the initial Gaussian noise, the content features of the matrix picture and the style description of the rubbing picture into the denoising model, performing reverse diffusion and generating the latent features comprises:
sampling Gaussian noise, and setting it as the noisy feature at time $T$;
inputting the content features, the style description, the time step $t$ and the noisy feature at time $t$ into the denoising model to obtain the predicted noise;
performing reverse diffusion according to the predicted noise, denoising the noisy feature at time $t$ to obtain the noisy feature at time $t-1$;
at time $t-1$, converting the input of the denoising model into the content features, the style description, the time step $t-1$ and the noisy feature at time $t-1$, and repeating the noise prediction and denoising operations step by step until the noise-free latent feature at time 0 is generated.
7. A training device for an oracle bone character picture generation model, comprising:
a rubbing picture conversion module, configured to input a rubbing picture into a preset oracle bone character picture generation model, convert the rubbing picture, and generate a corresponding character template picture;
a picture encoding module, configured to encode the character template picture to generate the content features of the character template picture, and to encode the rubbing picture to generate the style description of the rubbing picture;
a forward diffusion module, configured to obtain the latent features of the rubbing picture and perform forward diffusion on the latent features;
a model training module, configured to input the content features of the character template picture and the style description of the rubbing picture as control conditions into a denoising model preset in the oracle bone character picture generation model, to guide the oracle bone character picture generation model to learn the latent features;
a model acquisition module, configured to calculate the training loss through a loss function and update the model parameters of the oracle bone character picture generation model until the loss converges, to obtain the trained oracle bone character picture generation model.
8. An oracle bone character picture generation device, comprising:
a picture selection module, configured to select a preset character template picture and a rubbing picture;
a picture input module, configured to input the character template picture and the rubbing picture into an oracle bone character picture generation model, wherein the oracle bone character picture generation model is obtained by the training method according to any one of claims 1-4;
a picture encoding module, configured to encode the character template picture to generate the content features of the character template picture, and to encode the rubbing picture to generate the style description of the rubbing picture;
a latent feature generation module, configured to input initial Gaussian noise, the content features of the character template picture, and the style description of the rubbing picture into a denoising model preset in the oracle bone character picture generation model, and to perform reverse diffusion to generate latent features;
a picture generation module, configured to decode the latent features and generate an oracle bone character picture with the given character template content and the rubbing style.
9. An electronic device, comprising a processor and a memory; wherein the memory stores a program that is loaded and executed by the processor to implement the training method of an oracle bone character picture generation model according to any one of claims 1 to 4, or to implement the oracle bone character picture generation method according to any one of claims 5 to 6.
10. A computer-readable storage medium, wherein the storage medium stores a program which, when executed by a processor, implements the training method of an oracle bone character picture generation model according to any one of claims 1 to 4, or implements the oracle bone character picture generation method according to any one of claims 5 to 6.
CN202311592812.5A 2023-11-27 2023-11-27 Training of oracle bone character picture generation model, and oracle bone character picture generation method and device Active CN117351227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311592812.5A CN117351227B (en) 2023-11-27 2023-11-27 Training of oracle bone character picture generation model, and oracle bone character picture generation method and device


Publications (2)

Publication Number Publication Date
CN117351227A true CN117351227A (en) 2024-01-05
CN117351227B CN117351227B (en) 2024-03-08

Family

ID=89369645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311592812.5A Active CN117351227B (en) 2023-11-27 2023-11-27 Training of oracle bone character picture generation model, and oracle bone character picture generation method and device

Country Status (1)

Country Link
CN (1) CN117351227B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230368073A1 (en) * 2022-05-13 2023-11-16 Nvidia Corporation Techniques for content synthesis using denoising diffusion models
US20230118966A1 (en) * 2022-12-16 2023-04-20 Lemon Inc. Generation of story videos corresponding to user input using generative models
CN116309890A (en) * 2023-01-17 2023-06-23 北京达佳互联信息技术有限公司 Model generation method, stylized image generation method and device and electronic equipment
CN116740204A (en) * 2023-03-09 2023-09-12 网易(杭州)网络有限公司 Method, device, equipment and storage medium for generating stylized image generation model
CN116502699A (en) * 2023-04-06 2023-07-28 清华大学 Method for sampling from diffusion model through energy function guidance based on contrast learning
CN117057310A (en) * 2023-07-20 2023-11-14 深圳仪品信息技术有限公司 Font generation method and device based on diffusion model
CN116935166A (en) * 2023-08-10 2023-10-24 Oppo广东移动通信有限公司 Model training method, image processing method and device, medium and equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HADAS ORGAD et al.: "Editing Implicit Assumptions in Text-to-Image Diffusion Models", arXiv:2303.08084v2, 31 August 2023, pages 1-19 *
ZHIZHONG WANG et al.: "StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models", 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 6 October 2023, pages 7643-7655, XP034514906, DOI: 10.1109/ICCV51070.2023.00706 *
LIU DONGBIN et al.: "Blind restoration method for incomplete and sparse text images based on content-style transfer" (in Chinese), Laser & Optoelectronics Progress, vol. 59, no. 24, 31 December 2022, page 2411001-1 *
LIN LIANG et al.: "From perception to creation: a frontier discussion of generative methods for images and videos" (in Chinese), Acta Optica Sinica, vol. 43, no. 15, 2 August 2023, page 1510002-1 *

Also Published As

Publication number Publication date
CN117351227B (en) 2024-03-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant