CN117333881A - Oracle bone script assisted decipherment method based on a conditional diffusion model - Google Patents

Oracle bone script assisted decipherment method based on a conditional diffusion model

Info

Publication number
CN117333881A
Authority
CN
China
Prior art keywords
oracle
pictures
picture
conditional
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311295878.8A
Other languages
Chinese (zh)
Inventor
管海粟
万金鹏
匡嚞玢
张凯乐
王鹏杰
陈文炳
刘禹良
金连文
白翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202311295878.8A
Publication of CN117333881A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19173 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/19007 - Matching; Proximity measures
    • G06V 30/19013 - Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 30/1902 - Shifting or otherwise transforming the patterns to accommodate for positional errors
    • G06V 30/1904 - Shifting or otherwise transforming the patterns to accommodate for positional errors involving a deformation of the sample or reference pattern; Elastic matching
    • G06V 30/19053 - Shifting or otherwise transforming the patterns to accommodate for positional errors involving a deformation of the sample or reference pattern; Elastic matching based on shape statistics, e.g. active shape models of the pattern to be recognised
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19147 - Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19187 - Graphical models, e.g. Bayesian networks or Markov models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an oracle bone script assisted decipherment method based on a conditional diffusion model, comprising the following steps: collecting and organizing glyph-evolution data of already-deciphered oracle bone characters across five stages, namely oracle bone script, bronze script, seal script, clerical script, and regular script; pairing two character images of the same character from different periods and performing fixed-size random cropping; constructing a conditional diffusion model neural network, feeding the paired images into the network for training, and optimizing the network parameters; and inputting an oracle bone character image, running reverse diffusion on the image with the trained conditional diffusion model, and generating glyph-evolution images with a weighted sliding-window method. The invention provides a simple and effective glyph-evolution generation model that uses images of already-deciphered characters at different periods as supervision, so that, given an input oracle bone image, the model can predict and generate the corresponding glyph image for any period, thereby assisting experts in deciphering characters that have not yet been deciphered.

Description

Oracle bone script assisted decipherment method based on a conditional diffusion model
Technical Field
The invention relates to the fields of artificial intelligence and computer vision, and in particular to a method for assisting the decipherment of oracle bone script based on a conditional diffusion model.
Background
Writing is a symbol of civilization and an important marker of a people. Oracle bone script, the earliest known systematic writing in China, is an exceptionally precious archaeological resource. Deepening research on oracle bone script and thoroughly excavating its historical background and cultural connotations can strengthen the historical identity of the Chinese nation, promote cultural confidence, and contribute to cultural development.
Although the state has promoted oracle bone research, more scholars have devoted themselves to the field, and public attention has gradually increased, follow-up research remains difficult to carry out because the proportion of deciphered characters is still small; decipherment therefore remains the core of oracle bone research.
At present, the main difficulties of oracle bone decipherment lie in the difficulty of constructing datasets, the large variation in written glyph styles, and missing data for intermediate stages of glyph evolution. Decipherment is complex: a character can only be truly deciphered by philologists after multi-dimensional in-depth study supported by a large body of evidence. There is currently no general and efficient method for oracle bone decipherment.
Therefore, the invention provides an oracle bone script assisted decipherment method based on a conditional diffusion model, which can generate character images for each glyph-evolution stage corresponding to an oracle bone character and assist experts in decipherment.
Disclosure of Invention
To address these problems, the invention provides an oracle bone script assisted decipherment method based on a conditional diffusion model, comprising the following steps:
step one: and collecting data, namely collecting the decoded oracle characters, and finishing font evolution data of the characters in five stages of oracle, gold, seal, clerical script and regular script.
Step two: and (3) preprocessing data, namely matching two character pictures of the same word in different periods, performing random cutting operation with fixed size, and combining the cut paired pictures.
Step three: training a network, constructing a conditional diffusion model neural network, sending the paired pictures into the neural network, and training the conditional diffusion model neural network until convergence by taking Gaussian noise regression loss at a plurality of moments in the forward diffusion process as a loss function of the network by utilizing the supervision information of the text pictures of the oracle in different periods.
Step four: and testing a network, storing a trained conditional diffusion model, inputting a oracle text picture, performing reverse diffusion on the picture by using the trained conditional diffusion model, and generating an oracle font evolution picture by using a weighted sliding method.
In a further refinement, the deciphered oracle bone characters collected in step one comprise glyph-evolution data for the five stages of 'oracle bone script', 'bronze script', 'seal script', 'clerical script', and 'regular script'. Not all five stages need to exist for every character, but at least the oracle bone script and regular script stages are required.
In a further refinement, the pairing of character images in step two may select any two glyph stages of the same character: first resize all images to a fixed resolution of 100 × 100, then perform a fixed-size random crop at the same position in both images of the pair, and finally combine the randomly cropped images, ensuring that the combined images remain paired.
The further improvement is that: the randomly cropped picture size is fixed to a size of 64 x 3, where 3 is the original RGB channel of the picture. For a pair of two-stage glyph pictures, the spatial positions of random cropping must be the same.
In a further refinement, the backbone of the conditional diffusion model neural network in step three is a U-Net denoising neural network composed of a downsampling network, an upsampling network, and skip connections.
In a further refinement, the downsampling network consists of 5 convolution modules: the first 4 each comprise 2 residual modules, 2 self-attention modules, and one downsampling convolution, while the 5th comprises only two residual modules and one self-attention module. The upsampling network consists of 4 up-convolution modules, each comprising 3 residual modules, 3 self-attention modules, and one upsampling convolution. Downsampling and upsampling modules at the same resolution are linked by skip connections: the downsampling convolution feature map is concatenated with the upsampling convolution feature map along the channel dimension and then fed into the upsampling network for residual convolution and self-attention computation.
In a further refinement, the U-Net denoising neural network is trained as follows: an RGB oracle bone image of height 64, width 64, and 3 color channels and an RGB noised regular-script image of the same shape are given; the two images are concatenated along the channel dimension to obtain a fused feature map of height 64, width 64, and 6 channels, which is fed into the U-Net; the U-Net outputs a predicted noise of height 64, width 64, and 3 channels, and the loss function is obtained by computing the distance between the predicted noise and the noise actually added to the regular-script image. Gradient descent is then used to train the U-Net denoising neural network.
In a further refinement, the U-Net denoising neural network is realized as follows:
A U-Net denoising neural network model is trained conditioned on the oracle bone image $x$, gradually removing noise of varying degrees from a pure-noise image $y_T \sim \mathcal{N}(0, I)$ until the regular-script image $y'$ is obtained. The diffusion model comprises two processes, forward noising and reverse denoising. The forward noising process $q$ is treated as a Markov chain that repeatedly adds Gaussian noise to the original image, i.e. the regular-script image $y_0 = y'$, until a pure-noise image is obtained:

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1}),$$

where $T$ is the number of steps of the diffusion model. Each iteration of the forward process adds noise according to

$$q(y_t \mid y_{t-1}) = \mathcal{N}\!\left(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1 - \alpha_t) I\right),$$

where $\alpha_{1:T}$ are hyperparameters between 0 and 1 that determine the noise variance of each step, and $I$ is the identity matrix. The forward noising process also supports sampling at an arbitrary time step $t$ for a given original image $y_0$:

$$q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\gamma_t}\, y_0,\ (1 - \gamma_t) I\right), \qquad \gamma_t = \prod_{i=1}^{t} \alpha_i,$$

which is quite helpful for rapidly sampling $y_t$ at step $t$:

$$y_t = \sqrt{\gamma_t}\, y_0 + \sqrt{1 - \gamma_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$

For a given conditional oracle bone image $x$ and noised target image $y_t$, a U-Net neural network $f_\theta(x, y_t, \gamma)$ is trained as the denoising model to predict the noise vector $\epsilon$, where $x$ is the conditional oracle bone image and $\gamma$ is the noise-variance statistic; finally, the diffusion loss term is minimized:

$$\mathbb{E}_{(x, y_0)}\, \mathbb{E}_{\epsilon, \gamma} \left\| f_\theta\!\left(x,\ \sqrt{\gamma}\, y_0 + \sqrt{1 - \gamma}\, \epsilon,\ \gamma\right) - \epsilon \right\|_2^2,$$

where $\gamma \sim p(\gamma)$ and $\epsilon \sim \mathcal{N}(0, I)$. During training, a time step $t$ is first sampled uniformly from $\{1, \dots, T\}$, and $\gamma$ is then sampled from the uniform distribution $U(\gamma_{t-1}, \gamma_t)$. Furthermore, the posterior distribution of $y_{t-1}$ given $(y_0, y_t)$ is derived as

$$q(y_{t-1} \mid y_0, y_t) = \mathcal{N}\!\left(y_{t-1};\ \tilde{\mu},\ \tilde{\sigma}^2 I\right), \quad \tilde{\mu} = \frac{\sqrt{\gamma_{t-1}}\,(1 - \alpha_t)}{1 - \gamma_t}\, y_0 + \frac{\sqrt{\alpha_t}\,(1 - \gamma_{t-1})}{1 - \gamma_t}\, y_t, \quad \tilde{\sigma}^2 = \frac{(1 - \gamma_{t-1})(1 - \alpha_t)}{1 - \gamma_t}.$$

The reverse denoising process, with parameters $\theta$, is defined as

$$p_\theta(y_{0:T} \mid x) = p(y_T) \prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t, x),$$

which transforms the latent-variable distribution $p_\theta(y_T)$ into the data distribution $p_\theta(y_0)$, where $x$ is the oracle bone image. Combining the above and substituting the estimate of $y_0$ into the posterior $q(y_{t-1} \mid y_0, y_t)$, the mean of $p_\theta(y_{t-1} \mid y_t, x)$ is parameterized as

$$\mu_\theta(x, y_t, \gamma_t) = \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right).$$

Finally, in the inference stage, each reverse step is obtained by the reparameterization

$$y_{t-1} \leftarrow \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right) + \sqrt{1 - \alpha_t}\, \epsilon_t,$$

where $\epsilon_t \sim \mathcal{N}(0, I)$; the model finally uses the prediction $\hat{y}_0 = \left(y_t - \sqrt{1 - \gamma_t}\, f_\theta(x, y_t, \gamma_t)\right) / \sqrt{\gamma_t}$ as its output.
In a further refinement, the U-Net denoising neural network can perform multi-stage inference, i.e. it can be directed to generate the glyph image of a specified stage from the oracle bone script. This process is treated as a class condition with 4 classes in total: oracle bone script to bronze script, to seal script, to clerical script, and to regular script. The class is sampled together with the time step, converted into an embedding, and injected into the feature map of each convolution module, thereby specifying the target glyph-generation stage.
In a further refinement, in step four a full oracle bone image is cropped with a sliding window, the sequence of crops is input into the conditional diffusion model, and after the glyph-stage class of the images to be generated is specified, the diffusion model outputs a predicted generation result for each crop; the image assembled from these results by weighted sliding is the final predicted generation result.
Beneficial effects:
by embedding the oracle pictures and the character form evolution stage categories as condition information into a condition diffusion model, and training the condition diffusion model by taking the character pictures of the already decoded oracle at different periods as supervision information, the model can predict and generate the character form evolution pictures of each stage corresponding to the appointed oracle pictures, and the oracle experts are assisted in oracle decoding.
Description of the drawings:
FIG. 1 is a schematic flow chart of the oracle bone script assisted decipherment method based on a conditional diffusion model;
FIG. 2 is a schematic diagram of a U-Net denoising neural network in the method of the present invention;
FIG. 3 is a schematic diagram of forward diffusion and reverse inference of the conditional diffusion model in the method of the present invention;
FIG. 4 is a schematic diagram of training and inference testing of the conditional diffusion model in the method of the present invention.
Detailed description of the embodiments:
the present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in fig. 1, an embodiment of the invention provides an oracle bone script assisted decipherment method based on a conditional diffusion model, comprising the following steps:
step one: and collecting data, namely collecting the decoded oracle characters, and finishing font evolution data of the characters in five stages of oracle, gold, seal, clerical script and regular script. The decoded oracle characters comprise font evolution data of five stages, namely 'oracle', 'gold', 'big seal', 'clerical script' and 'regular script'. The five-stage fonts do not necessarily exist all but at least two-stage fonts of the oracle and the regular script are needed.
Step two: and (3) preprocessing data, namely matching two character pictures of the same word in different periods, performing random cutting operation with fixed size, and combining the cut paired pictures. The matching of the character and the picture can be carried out by arbitrarily selecting two different character pattern stages of the same character, firstly, unifying all the pictures to a fixed resolution of 100 x 100, then, carrying out random cutting operation with fixed size at the same position of the two paired pictures, finally, combining the paired pictures after random cutting, and ensuring that the combined pictures are paired. The randomly cropped picture size is fixed to a size of 64 x 3, where 3 is the original RGB channel of the picture. For a pair of two-stage glyph pictures, the spatial positions of random cropping must be the same.
Step three: training a network, constructing a conditional diffusion model neural network, sending the paired pictures into the neural network, and training the conditional diffusion model neural network until convergence by taking Gaussian noise regression loss at a plurality of moments in the forward diffusion process as a loss function of the network by utilizing the supervision information of the text pictures of the oracle in different periods.
As shown in fig. 2, the backbone of the conditional diffusion model neural network in the method is a U-Net denoising neural network composed of a downsampling network, an upsampling network, and skip connections. The downsampling network consists of 5 convolution modules: the first 4 each comprise 2 residual modules, 2 self-attention modules, and one downsampling convolution, while the 5th comprises only two residual modules and one self-attention module. The upsampling network consists of 4 up-convolution modules, each comprising 3 residual modules, 3 self-attention modules, and one upsampling convolution. Downsampling and upsampling modules at the same resolution are linked by skip connections: the downsampling convolution feature map is concatenated with the upsampling convolution feature map along the channel dimension and then fed into the upsampling network for residual convolution and self-attention computation.
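A minimal PyTorch sketch of such building blocks is given below. The module granularity (2 residual + 2 self-attention per downsampling module, 3 + 3 per upsampling module, channel-wise skip concatenation) follows the description above, while the channel counts, group normalization, and attention head count are illustrative assumptions (channels are assumed divisible by the group and head counts):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual module: two norm-activation-conv layers with an identity shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.GroupNorm(8, ch), nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1),
            nn.GroupNorm(8, ch), nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class SelfAttention(nn.Module):
    """Self-attention over spatial positions, applied as a residual update."""
    def __init__(self, ch, heads=4):
        super().__init__()
        self.norm = nn.GroupNorm(8, ch)
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = self.norm(x).flatten(2).transpose(1, 2)      # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        return x + out.transpose(1, 2).reshape(b, c, h, w)

class DownModule(nn.Module):
    """2 residual modules + 2 self-attention modules + one strided downsampling conv."""
    def __init__(self, ch_in, ch_out):
        super().__init__()
        self.body = nn.Sequential(ResBlock(ch_in), SelfAttention(ch_in),
                                  ResBlock(ch_in), SelfAttention(ch_in))
        self.down = nn.Conv2d(ch_in, ch_out, 3, stride=2, padding=1)

    def forward(self, x):
        skip = self.body(x)   # feature map kept for the skip connection
        return self.down(skip), skip

class UpModule(nn.Module):
    """Channel-wise concatenation with the skip feature map, then
    3 residual modules + 3 self-attention modules + one upsampling conv."""
    def __init__(self, ch_in, ch_skip, ch_out):
        super().__init__()
        self.fuse = nn.Conv2d(ch_in + ch_skip, ch_out, 1)
        self.body = nn.Sequential(ResBlock(ch_out), SelfAttention(ch_out),
                                  ResBlock(ch_out), SelfAttention(ch_out),
                                  ResBlock(ch_out), SelfAttention(ch_out))
        self.up = nn.ConvTranspose2d(ch_out, ch_out, 4, stride=2, padding=1)

    def forward(self, x, skip):
        x = self.fuse(torch.cat([x, skip], dim=1))   # skip connection (channel dim)
        return self.up(self.body(x))
```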
The training process of the U-Net denoising neural network is as follows: an RGB oracle bone image of height 64, width 64, and 3 color channels and an RGB noised regular-script image of the same shape are given; the two images are concatenated along the channel dimension to obtain a fused feature map of height 64, width 64, and 6 channels, which is fed into the U-Net; the U-Net outputs a predicted noise of height 64, width 64, and 3 channels, and the loss function is obtained by computing the distance between the predicted noise and the noise actually added to the regular-script image. Gradient descent is then used to train the U-Net denoising neural network.
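Under these assumptions, one training step could be sketched as follows; the `unet` call signature and the schedule tensor `gammas` are hypothetical, and the noise-level sampling follows the formulas given in the next subsection:

```python
import torch
import torch.nn.functional as F

def training_step(unet, oracle, target, gammas, optimizer):
    """One gradient-descent step of the conditional noise-regression objective.

    oracle, target: (B, 3, 64, 64) image tensors scaled to [-1, 1];
    gammas: 1-D tensor of cumulative noise-schedule products (gamma_0 .. gamma_T).
    """
    b = target.shape[0]
    t = torch.randint(1, len(gammas), (b,))                    # uniform time steps
    lo, hi = gammas[t - 1], gammas[t]
    gamma = (lo + (hi - lo) * torch.rand(b)).view(b, 1, 1, 1)  # gamma ~ U(g_{t-1}, g_t)
    eps = torch.randn_like(target)                             # Gaussian noise
    noisy = gamma.sqrt() * target + (1.0 - gamma).sqrt() * eps
    x_in = torch.cat([oracle, noisy], dim=1)                   # (B, 6, 64, 64) fusion
    pred = unet(x_in, gamma.view(b))                           # predicted noise
    loss = F.mse_loss(pred, eps)                               # noise-regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```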
As shown in fig. 3, the forward diffusion and reverse inference of the conditional diffusion model in the method are implemented as follows:
A U-Net denoising neural network model is trained conditioned on the oracle bone image $x$, gradually removing noise of varying degrees from a pure-noise image $y_T \sim \mathcal{N}(0, I)$ until the regular-script image $y'$ is obtained. The diffusion model comprises two processes, forward noising and reverse denoising. The forward noising process $q$ is treated as a Markov chain that repeatedly adds Gaussian noise to the original image, i.e. the regular-script image $y_0 = y'$, until a pure-noise image is obtained:

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1}),$$

where $T$ is the number of steps of the diffusion model. Each iteration of the forward process adds noise according to

$$q(y_t \mid y_{t-1}) = \mathcal{N}\!\left(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1 - \alpha_t) I\right),$$

where $\alpha_{1:T}$ are hyperparameters between 0 and 1 that determine the noise variance of each step, and $I$ is the identity matrix. The forward noising process also supports sampling at an arbitrary time step $t$ for a given original image $y_0$:

$$q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\gamma_t}\, y_0,\ (1 - \gamma_t) I\right), \qquad \gamma_t = \prod_{i=1}^{t} \alpha_i,$$

which is quite helpful for rapidly sampling $y_t$ at step $t$:

$$y_t = \sqrt{\gamma_t}\, y_0 + \sqrt{1 - \gamma_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$

For a given conditional oracle bone image $x$ and noised target image $y_t$, a U-Net neural network $f_\theta(x, y_t, \gamma)$ is trained as the denoising model to predict the noise vector $\epsilon$, where $x$ is the conditional oracle bone image and $\gamma$ is the noise-variance statistic; finally, the diffusion loss term is minimized:

$$\mathbb{E}_{(x, y_0)}\, \mathbb{E}_{\epsilon, \gamma} \left\| f_\theta\!\left(x,\ \sqrt{\gamma}\, y_0 + \sqrt{1 - \gamma}\, \epsilon,\ \gamma\right) - \epsilon \right\|_2^2,$$

where $\gamma \sim p(\gamma)$ and $\epsilon \sim \mathcal{N}(0, I)$. During training, a time step $t$ is first sampled uniformly from $\{1, \dots, T\}$, and $\gamma$ is then sampled from the uniform distribution $U(\gamma_{t-1}, \gamma_t)$. Furthermore, the posterior distribution of $y_{t-1}$ given $(y_0, y_t)$ is derived as

$$q(y_{t-1} \mid y_0, y_t) = \mathcal{N}\!\left(y_{t-1};\ \tilde{\mu},\ \tilde{\sigma}^2 I\right), \quad \tilde{\mu} = \frac{\sqrt{\gamma_{t-1}}\,(1 - \alpha_t)}{1 - \gamma_t}\, y_0 + \frac{\sqrt{\alpha_t}\,(1 - \gamma_{t-1})}{1 - \gamma_t}\, y_t, \quad \tilde{\sigma}^2 = \frac{(1 - \gamma_{t-1})(1 - \alpha_t)}{1 - \gamma_t}.$$

The reverse denoising process, with parameters $\theta$, is defined as

$$p_\theta(y_{0:T} \mid x) = p(y_T) \prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t, x),$$

which transforms the latent-variable distribution $p_\theta(y_T)$ into the data distribution $p_\theta(y_0)$, where $x$ is the oracle bone image. Combining the above and substituting the estimate of $y_0$ into the posterior $q(y_{t-1} \mid y_0, y_t)$, the mean of $p_\theta(y_{t-1} \mid y_t, x)$ is parameterized as

$$\mu_\theta(x, y_t, \gamma_t) = \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right).$$

Finally, in the inference stage, each reverse step is obtained by the reparameterization

$$y_{t-1} \leftarrow \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right) + \sqrt{1 - \alpha_t}\, \epsilon_t,$$

where $\epsilon_t \sim \mathcal{N}(0, I)$; the model finally uses the prediction $\hat{y}_0 = \left(y_t - \sqrt{1 - \gamma_t}\, f_\theta(x, y_t, \gamma_t)\right) / \sqrt{\gamma_t}$ as its output.
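For illustration, this reverse-diffusion inference loop could be sketched as follows, reusing the hypothetical `unet` and noise schedule from the training sketch above:

```python
import torch

@torch.no_grad()
def reverse_diffusion(unet, oracle, alphas, gammas):
    """Iteratively denoise a pure-noise image into the target-stage glyph crop,
    conditioned on the oracle bone crop (B, 3, 64, 64)."""
    y = torch.randn_like(oracle)                     # y_T ~ N(0, I)
    for t in range(len(alphas) - 1, -1, -1):
        a, g = float(alphas[t]), float(gammas[t])
        eps = unet(torch.cat([oracle, y], dim=1),    # condition via channel concat
                   torch.full((y.shape[0],), g))     # predicted noise at step t
        y = (y - (1.0 - a) / (1.0 - g) ** 0.5 * eps) / a ** 0.5
        if t > 0:                                    # no noise added at the last step
            y = y + (1.0 - a) ** 0.5 * torch.randn_like(y)
    return y.clamp(-1.0, 1.0)
```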
The U-Net denoising neural network can perform multi-stage inference, i.e. it can be directed to generate the glyph image of a specified stage from the oracle bone script. This process is treated as a class condition with 4 classes in total: oracle bone script to bronze script, to seal script, to clerical script, and to regular script. The class is sampled together with the time step, converted into an embedding, and injected into the feature map of each convolution module, thereby specifying the target glyph-generation stage.
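The stage-class conditioning can be sketched as a learned embedding broadcast-added to a module's feature map; the module and variable names below are hypothetical:

```python
import torch
import torch.nn as nn

NUM_STAGES = 4   # oracle->bronze, oracle->seal, oracle->clerical, oracle->regular

class StageConditionedConv(nn.Module):
    """Adds a learned embedding of the target glyph stage to a convolution
    module's feature map, broadcast over the spatial dimensions."""
    def __init__(self, ch):
        super().__init__()
        self.stage_embed = nn.Embedding(NUM_STAGES, ch)
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x, stage_id):
        emb = self.stage_embed(stage_id)[:, :, None, None]   # (B, C, 1, 1)
        return self.conv(x + emb)
```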
Step four: as shown in fig. 4, the inference and testing process of the conditional diffusion model is as follows:
and testing a network, storing a trained conditional diffusion model, inputting a oracle text picture, performing reverse diffusion on the picture by using the trained conditional diffusion model, and generating an oracle font evolution picture by using a weighted sliding method. The specific process is that a piece of oracle pictures is subjected to sliding window cutting, a series of cut pictures are input into a conditional diffusion model, after the font stage category of the generated pictures is specified, the diffusion model outputs the prediction generation result of each cut picture, and the pictures output after the generation results are subjected to weighted sliding are the final prediction generation result.
By embedding the oracle bone image and the glyph-evolution stage class into the conditional diffusion model as conditioning information, the model can predict and generate the glyph-evolution image of each stage corresponding to a specified oracle bone image, assisting experts in decipherment.
The foregoing has shown and described the basic principles, principal features, and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and descriptions merely illustrate its principles, and various changes and improvements may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. An oracle bone script assisted decipherment method based on a conditional diffusion model, characterized by comprising the following steps:
step one: data collection: collect already-deciphered oracle bone characters and organize glyph-evolution data of the characters across five stages: oracle bone script, bronze script, seal script, clerical script, and regular script;
step two: data preprocessing: pair two images of the same character from different periods, perform fixed-size random cropping, and combine the cropped images into pairs;
step three: network training: construct a conditional diffusion model neural network, feed the paired images into the network, and, using the images of deciphered characters from different periods as supervision, train the network to convergence with the Gaussian-noise regression loss at multiple time steps of the forward diffusion process as the loss function;
step four: network testing: save the trained conditional diffusion model, input an oracle bone character image, run reverse diffusion on the image with the trained model, and generate the glyph-evolution image with a weighted sliding-window method.
2. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 1, characterized in that: the deciphered oracle bone characters in step one comprise glyph-evolution data for the five stages of 'oracle bone script', 'bronze script', 'seal script', 'clerical script', and 'regular script'; not every character possesses all five stages, but at least the oracle bone script and regular script stages are required.
3. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 1, characterized in that: the pairing of character images in step two may select any two glyph stages of the same character; all images are first resized to a fixed resolution of 100 × 100, a fixed-size random crop is then performed at the same position in the two paired images, and the randomly cropped images are finally combined, ensuring that the combined images remain paired.
4. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 3, characterized in that: the randomly cropped image size is fixed at 64 × 64 × 3, where 3 is the number of RGB channels of the original image; for a paired two-stage glyph image, the spatial positions of the random crops must be identical.
5. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 1, characterized in that: the backbone of the conditional diffusion model neural network in step three is a U-Net denoising neural network composed of a downsampling network, an upsampling network, and skip connections.
6. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 5, characterized in that: the downsampling network consists of 5 convolution modules, the first 4 of which each comprise 2 residual modules, 2 self-attention modules, and one downsampling convolution, while the 5th comprises only two residual modules and one self-attention module; the upsampling network comprises 4 up-convolution modules, each comprising 3 residual modules, 3 self-attention modules, and one upsampling convolution; downsampling and upsampling modules at the same resolution are linked by skip connections, i.e. the downsampling convolution feature map is concatenated with the upsampling convolution feature map along the channel dimension and then fed into the upsampling network for residual convolution and self-attention computation.
7. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 5, characterized in that the U-Net denoising neural network is trained as follows: an RGB oracle bone image of height 64, width 64, and 3 color channels and an RGB noised regular-script image of height 64, width 64, and 3 color channels are given; the two images are concatenated along the channel dimension to obtain a fused feature map of height 64, width 64, and 6 channels, which is fed into the U-Net; the U-Net outputs a predicted noise of height 64, width 64, and 3 channels, and the loss function is obtained by computing the distance between the predicted noise and the noise added to the regular-script image; gradient descent is then used to train the U-Net denoising neural network.
8. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 7, characterized in that the U-Net denoising neural network is realized as follows:
a U-Net denoising neural network model is trained conditioned on the oracle bone image $x$, gradually removing noise of varying degrees from a pure-noise image $y_T \sim \mathcal{N}(0, I)$ until the regular-script image $y'$ is obtained; the diffusion model comprises two processes, forward noising and reverse denoising: the forward noising process $q$ is treated as a Markov chain that repeatedly adds Gaussian noise to the original image, i.e. the regular-script image $y_0 = y'$, until a pure-noise image is obtained:

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1}),$$

where $T$ is the number of steps of the diffusion model, and each iteration of the forward process adds noise according to

$$q(y_t \mid y_{t-1}) = \mathcal{N}\!\left(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1 - \alpha_t) I\right),$$

where $\alpha_{1:T}$ are hyperparameters between 0 and 1 that determine the noise variance of each step, and $I$ is the identity matrix; the forward noising process also supports sampling at an arbitrary time step $t$ for a given original image $y_0$:

$$q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\gamma_t}\, y_0,\ (1 - \gamma_t) I\right), \qquad \gamma_t = \prod_{i=1}^{t} \alpha_i,$$

which is quite helpful for rapidly sampling $y_t$ at step $t$:

$$y_t = \sqrt{\gamma_t}\, y_0 + \sqrt{1 - \gamma_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I);$$

for a given conditional oracle bone image $x$ and noised target image $y_t$, a U-Net neural network $f_\theta(x, y_t, \gamma)$ is trained as the denoising model to predict the noise vector $\epsilon$, where $x$ is the conditional oracle bone image and $\gamma$ is the noise-variance statistic; finally, the diffusion loss term is minimized:

$$\mathbb{E}_{(x, y_0)}\, \mathbb{E}_{\epsilon, \gamma} \left\| f_\theta\!\left(x,\ \sqrt{\gamma}\, y_0 + \sqrt{1 - \gamma}\, \epsilon,\ \gamma\right) - \epsilon \right\|_2^2,$$

where $\gamma \sim p(\gamma)$ and $\epsilon \sim \mathcal{N}(0, I)$; during training, a time step $t$ is first sampled uniformly from $\{1, \dots, T\}$, and $\gamma$ is then sampled from the uniform distribution $U(\gamma_{t-1}, \gamma_t)$; furthermore, the posterior distribution of $y_{t-1}$ given $(y_0, y_t)$ is derived as

$$q(y_{t-1} \mid y_0, y_t) = \mathcal{N}\!\left(y_{t-1};\ \tilde{\mu},\ \tilde{\sigma}^2 I\right), \quad \tilde{\mu} = \frac{\sqrt{\gamma_{t-1}}\,(1 - \alpha_t)}{1 - \gamma_t}\, y_0 + \frac{\sqrt{\alpha_t}\,(1 - \gamma_{t-1})}{1 - \gamma_t}\, y_t, \quad \tilde{\sigma}^2 = \frac{(1 - \gamma_{t-1})(1 - \alpha_t)}{1 - \gamma_t};$$

the reverse denoising process, with parameters $\theta$, is defined as

$$p_\theta(y_{0:T} \mid x) = p(y_T) \prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t, x),$$

which transforms the latent-variable distribution $p_\theta(y_T)$ into the data distribution $p_\theta(y_0)$, where $x$ is the oracle bone image; combining the above and substituting the estimate of $y_0$ into the posterior $q(y_{t-1} \mid y_0, y_t)$, the mean of $p_\theta(y_{t-1} \mid y_t, x)$ is parameterized as

$$\mu_\theta(x, y_t, \gamma_t) = \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right);$$

finally, in the inference stage, each reverse step is obtained by the reparameterization

$$y_{t-1} \leftarrow \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right) + \sqrt{1 - \alpha_t}\, \epsilon_t,$$

where $\epsilon_t \sim \mathcal{N}(0, I)$; the model finally uses the prediction $\hat{y}_0 = \left(y_t - \sqrt{1 - \gamma_t}\, f_\theta(x, y_t, \gamma_t)\right) / \sqrt{\gamma_t}$ as its output.
9. The oracle bone script assisted decipherment method based on the conditional diffusion model according to any one of claims 5 to 8, characterized in that: the U-Net denoising neural network can perform multi-stage inference, i.e. it can be directed to generate the glyph image of a specified stage from the oracle bone script; this process is treated as a class condition with 4 classes in total: oracle bone script to bronze script, to seal script, to clerical script, and to regular script; the class is sampled together with the time step, converted into an embedding, and injected into the feature map of each convolution module, thereby specifying the glyph-generation stage.
10. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 1, characterized in that: in step four, a full oracle bone image is cropped with a sliding window, the sequence of crops is input into the conditional diffusion model, the glyph-stage class of the images to be generated is specified, the diffusion model outputs a predicted generation result for each crop, and the image output after weighted sliding of these generation results is the final predicted generation result.
CN202311295878.8A 2023-10-07 2023-10-07 Oracle bone script assisted decipherment method based on a conditional diffusion model Pending CN117333881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311295878.8A 2023-10-07 2023-10-07 CN117333881A (en) Oracle bone script assisted decipherment method based on a conditional diffusion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311295878.8A 2023-10-07 2023-10-07 CN117333881A (en) Oracle bone script assisted decipherment method based on a conditional diffusion model

Publications (1)

Publication Number Publication Date
CN117333881A 2024-01-02

Family

ID=89292667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311295878.8A 2023-10-07 2023-10-07 Pending CN117333881A (en) Oracle bone script assisted decipherment method based on a conditional diffusion model

Country Status (1)

Country Link
CN (1) CN117333881A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117809318A (en) * 2024-03-01 2024-04-02 微山同在电子信息科技有限公司 Oracle identification method and system based on machine vision
CN117809318B (en) * 2024-03-01 2024-05-28 微山同在电子信息科技有限公司 Oracle identification method and system based on machine vision


Similar Documents

Publication Publication Date Title
CN111428718B (en) Natural scene text recognition method based on image enhancement
CN108170649B (en) Chinese character library generation method and device based on DCGAN deep network
CN109857871B (en) User relationship discovery method based on social network mass contextual data
CN111062329B (en) Unsupervised pedestrian re-identification method based on augmented network
CN111401156B (en) Image identification method based on Gabor convolution neural network
CN110599502A (en) Skin lesion segmentation method based on deep learning
CN116682120A (en) Multilingual mosaic image text recognition method based on deep learning
CN116958827A (en) Deep learning-based abandoned land area extraction method
Wang et al. A new blind image denoising method based on asymmetric generative adversarial network
CN113762265A (en) Pneumonia classification and segmentation method and system
CN117474796B (en) Image generation method, device, equipment and computer readable storage medium
CN114037893A (en) High-resolution remote sensing image building extraction method based on convolutional neural network
CN113140023A (en) Text-to-image generation method and system based on space attention
CN105069767A (en) Image super-resolution reconstruction method based on representational learning and neighbor constraint embedding
CN117333881A (en) Oracle bone script assisted decipherment method based on a conditional diffusion model
CN113743315B (en) Handwriting elementary mathematical formula identification method based on structure enhancement
CN116433934A (en) Multi-mode pre-training method for generating CT image representation and image report
CN114821067A (en) Pathological image segmentation method based on point annotation data
CN115471718A (en) Construction and detection method of lightweight significance target detection model based on multi-scale learning
CN114519344A (en) Discourse element sub-graph prompt generation and guide-based discourse-level multi-event extraction method
CN111260570B (en) Binarization background noise simulation method for posts based on cyclic consistency confrontation network
Fang et al. A New Method of Image Restoration Technology Based on WGAN.
CN114581789A (en) Hyperspectral image classification method and system
CN112396598A (en) Image matting method and system based on single-stage multi-task collaborative learning
CN113901913A (en) Convolution network for ancient book document image binaryzation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination