CN117333881A - Oracle auxiliary decoding method based on conditional diffusion model - Google Patents
- Publication number
- CN117333881A CN117333881A CN202311295878.8A CN202311295878A CN117333881A CN 117333881 A CN117333881 A CN 117333881A CN 202311295878 A CN202311295878 A CN 202311295878A CN 117333881 A CN117333881 A CN 117333881A
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/19007—Matching; Proximity measures
- G06V30/19013—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
- G06V30/1902—Shifting or otherwise transforming the patterns to accommodate for positional errors
- G06V30/1904—Shifting or otherwise transforming the patterns to accommodate for positional errors involving a deformation of the sample or reference pattern; Elastic matching
- G06V30/19053—Shifting or otherwise transforming the patterns to accommodate for positional errors involving a deformation of the sample or reference pattern; Elastic matching based on shape statistics, e.g. active shape models of the pattern to be recognised
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19187—Graphical models, e.g. Bayesian networks or Markov models
Abstract
The invention discloses an oracle bone script auxiliary decipherment method based on a conditional diffusion model, which comprises the following steps: collecting and organizing glyph evolution data of already-deciphered oracle bone characters across five stages: oracle bone script, bronze script, seal script, clerical script and regular script; pairing character pictures from two different periods and performing fixed-size random cropping; constructing a conditional diffusion model neural network, feeding the paired pictures into the network for training, and optimizing the network parameters; inputting an oracle bone character picture, performing reverse diffusion on the picture with the trained conditional diffusion model, and generating glyph evolution pictures with a weighted sliding method. The invention provides a simple and effective glyph-evolution generation model that uses pictures of already-deciphered oracle bone characters at different periods as supervision: given an input oracle bone picture, the model can predict and generate its glyph picture at any period, thereby assisting experts in deciphering oracle bone characters that have not yet been deciphered.
Description
Technical Field
The invention relates to the fields of artificial intelligence and computer vision, in particular to a method for assisting the decipherment of oracle bone script based on a conditional diffusion model.
Background
Writing is a symbol of civilization and an important marker of a people. Oracle bone script, the earliest known systematic writing in China, is an archaeological resource of exceptional value. Deepening research on oracle bone script and excavating its historical background and cultural connotations can strengthen the historical identity of the Chinese nation, promote cultural confidence, and contribute to new cultural achievements.
Although the state has been promoting oracle bone research, more expert scholars have joined the field, and public attention has gradually increased, follow-up research remains difficult because the decipherment rate of oracle bone characters is still low; decipherment therefore remains the core of oracle bone research.
At present, the main difficulties of oracle bone decipherment lie in the difficulty of constructing datasets, the large variation in written glyph styles, and missing data in the glyph evolution stages. Interpreting an oracle bone character is complex: only after multi-dimensional, in-depth research and a large amount of evidence can philologists truly decipher a character. There is currently no general and efficient method for oracle bone decipherment.
Therefore, the invention provides an oracle bone script auxiliary decipherment method based on a conditional diffusion model, which can generate a character picture for each glyph evolution stage corresponding to an oracle bone character and thereby assist experts in decipherment.
Disclosure of Invention
In view of the above problems, the invention provides an oracle bone script auxiliary decipherment method based on a conditional diffusion model, which comprises the following steps:
step one: and collecting data, namely collecting the decoded oracle characters, and finishing font evolution data of the characters in five stages of oracle, gold, seal, clerical script and regular script.
Step two: and (3) preprocessing data, namely matching two character pictures of the same word in different periods, performing random cutting operation with fixed size, and combining the cut paired pictures.
Step three: training a network, constructing a conditional diffusion model neural network, sending the paired pictures into the neural network, and training the conditional diffusion model neural network until convergence by taking Gaussian noise regression loss at a plurality of moments in the forward diffusion process as a loss function of the network by utilizing the supervision information of the text pictures of the oracle in different periods.
Step four: and testing a network, storing a trained conditional diffusion model, inputting a oracle text picture, performing reverse diffusion on the picture by using the trained conditional diffusion model, and generating an oracle font evolution picture by using a weighted sliding method.
A further improvement is that: the already-deciphered oracle bone characters in step one comprise glyph evolution data for the five stages "oracle bone script", "bronze script", "seal script", "clerical script" and "regular script". Not all five stages need exist for every character, but at least the oracle bone script and regular script forms are required.
A further improvement is that: in step two, picture pairing may arbitrarily select two different glyph stages of the same character. All pictures are first resized to a fixed resolution of 100 x 100; a fixed-size random crop is then taken at the same position in both paired pictures; finally, the randomly cropped picture pairs are combined, ensuring that the combined pictures remain paired.
A further improvement is that: the randomly cropped picture size is fixed at 64 x 64 x 3, where 3 is the picture's original RGB channel count. For a pair of two-stage glyph pictures, the spatial positions of the random crops must be identical.
A further improvement is that: the backbone of the conditional diffusion model neural network in step three is a U-Net denoising network, composed of a downsampling network, an upsampling network and skip connections.
A further improvement is that: the downsampling network consists of 5 convolution modules; each of the first 4 modules consists of 2 residual modules, 2 self-attention modules and one downsampling convolution, while the 5th module consists of only two residual modules and one self-attention module. The upsampling network consists of 4 up-convolution modules, each consisting of 3 residual modules, 3 self-attention modules and one upsampling convolution. Downsampling and upsampling modules of the same resolution are linked by skip connections: the downsampling feature map and the upsampling feature map are concatenated along the channel dimension and then fed into the upsampling network for residual convolution and self-attention computation.
A further improvement is that: the training process of the U-Net denoising network is as follows: given an RGB oracle bone picture of height 64, width 64 and 3 color channels and an RGB noisy regular-script picture of height 64, width 64 and 3 color channels, the two pictures are concatenated along the channel dimension to obtain a fused feature map of height 64, width 64 and 6 channels, which is fed into the U-Net; the network outputs a predicted noise of height 64, width 64 and 3 channels, and the loss function is obtained by computing the distance between the predicted noise and the noise actually added to the regular-script picture. Gradient descent is then used to train the U-Net denoising network.
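One training step as described above can be sketched as follows. This is a hypothetical NumPy sketch, with NumPy standing in for a deep-learning framework and a zero-output placeholder in place of the real U-Net; the noise-level sampling is simplified to a single uniform draw:

```python
import numpy as np

def train_step(x_oracle, y0_regular, f_theta, rng):
    """One denoising training step (sketch). `f_theta` stands in for the
    U-Net: any callable mapping a (64, 64, 6) fused input plus a noise
    level to a (64, 64, 3) noise prediction."""
    gamma = rng.uniform(0.0, 1.0)                    # noise-variance statistic
    eps = rng.standard_normal(y0_regular.shape)      # Gaussian noise to regress
    y_t = np.sqrt(gamma) * y0_regular + np.sqrt(1.0 - gamma) * eps
    fused = np.concatenate([x_oracle, y_t], axis=-1) # (64, 64, 6) feature map
    pred = f_theta(fused, gamma)                     # (64, 64, 3) predicted noise
    loss = np.mean((pred - eps) ** 2)                # noise-regression distance
    return loss

rng = np.random.default_rng(0)
x = rng.random((64, 64, 3))
y0 = rng.random((64, 64, 3))
dummy_unet = lambda fused, gamma: np.zeros(fused.shape[:2] + (3,))
print(train_step(x, y0, dummy_unet, rng))  # a non-negative scalar
```

In a real implementation the loss would be backpropagated through the U-Net parameters via gradient descent; the sketch only shows the channel concatenation and the noise-regression target.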
A further improvement is that: the U-Net denoising neural network is specifically implemented as follows:

Taking the oracle bone picture $x$ as the condition, a U-Net denoising neural network model is trained to gradually remove noise of various degrees from a pure-noise image $y_T \sim \mathcal{N}(0, I)$ until the regular-script picture $y'$ is obtained. The diffusion model comprises two processes, forward noising and reverse denoising. The forward noising process $q$ is treated as a Markov chain that repeatedly adds Gaussian noise to the original image, i.e. the regular-script picture $y_0 = y'$, until a pure-noise image is obtained:

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1})$$

where $T$ is the number of diffusion steps. Each iteration of the forward process adds noise according to:

$$q(y_t \mid y_{t-1}) = \mathcal{N}\big(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1-\alpha_t) I\big)$$

where the hyperparameters $\alpha_{1:T} \in (0,1)$ determine the noise variance of each step and $I$ is the identity matrix. The forward noising process supports sampling $y_t$ at an arbitrary time step $t$ directly from a given original picture $y_0$:

$$q(y_t \mid y_0) = \mathcal{N}\big(y_t;\ \sqrt{\gamma_t}\, y_0,\ (1-\gamma_t) I\big)$$

where $\gamma_t = \prod_{i=1}^{t} \alpha_i$, which is quite helpful for rapidly sampling $y_t$ at step $t$:

$$y_t = \sqrt{\gamma_t}\, y_0 + \sqrt{1-\gamma_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

Given the conditional oracle bone picture $x$ and the noisy target picture $y_t$, a U-Net neural network $f_\theta(x, y_t, \gamma)$ is trained as the denoising model to predict the noise vector $\epsilon$, where $x$ is the conditional oracle bone picture and $\gamma$ is the statistic of the noise variance; finally the diffusion loss term is minimized:

$$\mathbb{E}_{(x,y)}\ \mathbb{E}_{\epsilon \sim \mathcal{N}(0,I)}\ \mathbb{E}_{\gamma \sim p(\gamma)} \big\| f_\theta\big(x,\ \sqrt{\gamma}\, y_0 + \sqrt{1-\gamma}\, \epsilon,\ \gamma\big) - \epsilon \big\|_2^2$$

During training, a time step $t \sim U\{0, \dots, T\}$ is first uniformly sampled, and $\gamma$ is then sampled from the uniform distribution $U(\gamma_{t-1}, \gamma_t)$. Furthermore, given $(y_0, y_t)$, the posterior distribution of $y_{t-1}$ is derived as:

$$q(y_{t-1} \mid y_0, y_t) = \mathcal{N}\big(y_{t-1};\ \mu,\ \sigma^2 I\big), \quad \mu = \frac{\sqrt{\gamma_{t-1}}\,(1-\alpha_t)}{1-\gamma_t}\, y_0 + \frac{\sqrt{\alpha_t}\,(1-\gamma_{t-1})}{1-\gamma_t}\, y_t, \quad \sigma^2 = \frac{(1-\gamma_{t-1})(1-\alpha_t)}{1-\gamma_t}$$

The reverse denoising process, with parameters $\theta$, is defined as:

$$p_\theta(y_{0:T} \mid x) = p(y_T) \prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t, x)$$

The reverse process transforms the latent-variable distribution $p_\theta(y_T)$ into the data distribution $p_\theta(y_0)$, where $x$ is the oracle bone picture. Combining the above and substituting the estimate of $y_0$ into the posterior $q(y_{t-1} \mid y_0, y_t)$, the mean of $p_\theta(y_{t-1} \mid y_t, x)$ is parameterised as:

$$\mu_\theta(x, y_t, \gamma_t) = \frac{1}{\sqrt{\alpha_t}} \Big( y_t - \frac{1-\alpha_t}{\sqrt{1-\gamma_t}}\, f_\theta(x, y_t, \gamma_t) \Big)$$

Finally, in the inference stage, each step of the reverse process is reparameterised as:

$$y_{t-1} \leftarrow \frac{1}{\sqrt{\alpha_t}} \Big( y_t - \frac{1-\alpha_t}{\sqrt{1-\gamma_t}}\, f_\theta(x, y_t, \gamma_t) \Big) + \sqrt{1-\alpha_t}\, \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, I)$$

and the model finally uses the resulting $y_0$ as its output.
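The reverse inference loop can be sketched as follows. This is a hedged NumPy sketch: `f_theta` is a placeholder for the trained U-Net, and the α schedule is purely illustrative (the patent does not give concrete schedule values):

```python
import numpy as np

def reverse_diffusion(x, f_theta, alphas, rng):
    """Ancestral-sampling sketch for the conditional diffusion model:
    start from pure noise y_T and repeatedly apply the reverse update
    y_{t-1} = (y_t - (1-a_t)/sqrt(1-g_t) * eps_hat) / sqrt(a_t) + sqrt(1-a_t) * z."""
    gammas = np.cumprod(alphas)            # gamma_t = prod of alpha_1..alpha_t
    y = rng.standard_normal(x.shape)       # y_T ~ N(0, I)
    for t in range(len(alphas) - 1, -1, -1):
        a_t, g_t = alphas[t], gammas[t]
        eps_hat = f_theta(x, y, g_t)       # predicted noise, conditioned on x
        y = (y - (1.0 - a_t) / np.sqrt(1.0 - g_t) * eps_hat) / np.sqrt(a_t)
        if t > 0:                          # no noise injected on the last step
            y += np.sqrt(1.0 - a_t) * rng.standard_normal(y.shape)
    return y                               # the estimate of y_0

rng = np.random.default_rng(0)
x = rng.random((64, 64, 3))                # conditional oracle bone picture
alphas = np.linspace(0.999, 0.9, 50)       # illustrative 50-step schedule
dummy_unet = lambda x, y, g: np.zeros_like(y)
y0_hat = reverse_diffusion(x, dummy_unet, alphas, rng)
print(y0_hat.shape)  # (64, 64, 3)
```

With a trained network in place of `dummy_unet`, the returned array would be the generated regular-script (or other stage) glyph picture.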
A further improvement is that: the U-Net denoising network can perform multi-stage inference, i.e. it can be directed to generate the glyph picture of a specified stage from the oracle bone script. This can be treated as class conditioning with 4 classes in total: oracle bone to bronze script, oracle bone to large seal script, oracle bone to clerical script, and oracle bone to regular script. The 4 classes are sampled together with the time step and converted into embeddings that are injected into the feature map of each convolution module, thereby specifying the glyph generation stage.
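The class-conditioning mechanism can be illustrated as below. This is a sketch under assumptions: the embedding dimension, additive injection, and all names are hypothetical, since the patent only states that class embeddings are injected into each convolution module's feature map:

```python
import numpy as np

# The 4 glyph-evolution targets, each mapped to a learned embedding vector
# (randomly initialised here; in training these would be learned parameters).
STAGES = ["bronze", "large_seal", "clerical", "regular"]
rng = np.random.default_rng(0)
n_channels = 128
embed_table = rng.standard_normal((len(STAGES), n_channels)) * 0.02

def condition_feature_map(feat, stage):
    """Add the stage embedding to every spatial position of `feat`
    (shape H x W x C), mimicking class-conditional feature injection."""
    e = embed_table[STAGES.index(stage)]   # (C,) embedding for this class
    return feat + e[None, None, :]         # broadcast over H and W

feat = rng.standard_normal((16, 16, n_channels))
out = condition_feature_map(feat, "regular")
print(out.shape)  # (16, 16, 128)
```

Additive broadcast injection is one common choice; scale-and-shift (FiLM-style) conditioning would be an equally plausible reading of the patent's description.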
A further improvement is that: in step four, a full oracle bone picture is cut with a sliding window, and the series of cropped pictures is input into the conditional diffusion model. After the glyph-stage class of the pictures to be generated is specified, the diffusion model outputs a predicted generation result for each cropped picture, and the picture obtained by weighted sliding over these results is the final prediction.
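The weighted sliding merge can be sketched as follows. Since the patent does not specify the weighting, this sketch uses uniform weights, i.e. overlapping regions are simply averaged; all names are illustrative:

```python
import numpy as np

def sliding_windows(img, win=64, stride=32):
    """Cut overlapping win x win windows from a full glyph picture."""
    h, w = img.shape[:2]
    coords = [(t, l) for t in range(0, h - win + 1, stride)
                     for l in range(0, w - win + 1, stride)]
    return coords, [img[t:t + win, l:l + win] for t, l in coords]

def weighted_merge(shape, coords, patches, win=64):
    """Re-assemble generated patches: accumulate each patch with a weight
    map and normalise, so overlapping regions are averaged."""
    out = np.zeros(shape)
    weight = np.zeros(shape[:2] + (1,))
    for (t, l), p in zip(coords, patches):
        out[t:t + win, l:l + win] += p
        weight[t:t + win, l:l + win] += 1.0
    return out / np.maximum(weight, 1e-8)

rng = np.random.default_rng(0)
full = rng.random((128, 128, 3))
coords, patches = sliding_windows(full)
# Identity "generator": merging unmodified crops recovers the picture,
# confirming the cut/merge round trip is lossless where windows cover.
merged = weighted_merge(full.shape, coords, patches)
print(np.allclose(merged, full))  # True
```

In the method itself, each patch in `patches` would first be replaced by the diffusion model's generated output before merging.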
The beneficial effects are that:
By embedding the oracle bone pictures and the glyph-evolution stage classes into a conditional diffusion model as conditional information, and training the model with pictures of already-deciphered oracle bone characters at different periods as supervision, the model can predict and generate the glyph evolution picture of each stage corresponding to a specified oracle bone picture, assisting experts in oracle bone decipherment.
Description of the drawings:
FIG. 1 is a schematic flow chart of the oracle bone script auxiliary decipherment method based on a conditional diffusion model;
FIG. 2 is a schematic diagram of a U-Net denoising neural network in the method of the present invention;
FIG. 3 is a schematic diagram of forward diffusion and reverse reasoning of a conditional diffusion model in the method of the present invention;
FIG. 4 is a schematic diagram of training and reasoning tests of a conditional diffusion model in the method of the present invention.
The specific embodiment is as follows:
the present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in fig. 1, an embodiment of the invention provides an oracle bone script auxiliary decipherment method based on a conditional diffusion model, comprising the following steps:
step one: and collecting data, namely collecting the decoded oracle characters, and finishing font evolution data of the characters in five stages of oracle, gold, seal, clerical script and regular script. The decoded oracle characters comprise font evolution data of five stages, namely 'oracle', 'gold', 'big seal', 'clerical script' and 'regular script'. The five-stage fonts do not necessarily exist all but at least two-stage fonts of the oracle and the regular script are needed.
Step two: and (3) preprocessing data, namely matching two character pictures of the same word in different periods, performing random cutting operation with fixed size, and combining the cut paired pictures. The matching of the character and the picture can be carried out by arbitrarily selecting two different character pattern stages of the same character, firstly, unifying all the pictures to a fixed resolution of 100 x 100, then, carrying out random cutting operation with fixed size at the same position of the two paired pictures, finally, combining the paired pictures after random cutting, and ensuring that the combined pictures are paired. The randomly cropped picture size is fixed to a size of 64 x 3, where 3 is the original RGB channel of the picture. For a pair of two-stage glyph pictures, the spatial positions of random cropping must be the same.
Step three: network training. Construct a conditional diffusion model neural network, feed the paired pictures into the network, and, using the pictures of already-deciphered characters at different periods as supervision, train the network to convergence with the Gaussian-noise regression loss at multiple time steps of the forward diffusion process as the loss function.
As shown in fig. 2, the backbone of the conditional diffusion model neural network in the method is a U-Net denoising network, composed of a downsampling network, an upsampling network and skip connections. The downsampling network consists of 5 convolution modules; each of the first 4 modules consists of 2 residual modules, 2 self-attention modules and one downsampling convolution, while the 5th module consists of only two residual modules and one self-attention module. The upsampling network consists of 4 up-convolution modules, each consisting of 3 residual modules, 3 self-attention modules and one upsampling convolution. Downsampling and upsampling modules of the same resolution are linked by skip connections: the downsampling feature map and the upsampling feature map are concatenated along the channel dimension and then fed into the upsampling network for residual convolution and self-attention computation.
The training process of the U-Net denoising network is as follows: given an RGB oracle bone picture of height 64, width 64 and 3 color channels and an RGB noisy regular-script picture of height 64, width 64 and 3 color channels, the two pictures are concatenated along the channel dimension to obtain a fused feature map of height 64, width 64 and 6 channels, which is fed into the U-Net; the network outputs a predicted noise of height 64, width 64 and 3 channels, and the loss function is obtained by computing the distance between the predicted noise and the noise actually added to the regular-script picture. Gradient descent is then used to train the U-Net denoising network.
As shown in fig. 3, the forward diffusion and reverse inference of the conditional diffusion model in the method of the present invention are specifically implemented as follows:

Taking the oracle bone picture $x$ as the condition, a U-Net denoising neural network model is trained to gradually remove noise of various degrees from a pure-noise image $y_T \sim \mathcal{N}(0, I)$ until the regular-script picture $y'$ is obtained. The diffusion model comprises two processes, forward noising and reverse denoising. The forward noising process $q$ is treated as a Markov chain that repeatedly adds Gaussian noise to the original image, i.e. the regular-script picture $y_0 = y'$, until a pure-noise image is obtained:

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1})$$

where $T$ is the number of diffusion steps. Each iteration of the forward process adds noise according to:

$$q(y_t \mid y_{t-1}) = \mathcal{N}\big(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1-\alpha_t) I\big)$$

where the hyperparameters $\alpha_{1:T} \in (0,1)$ determine the noise variance of each step and $I$ is the identity matrix. The forward noising process supports sampling $y_t$ at an arbitrary time step $t$ directly from a given original picture $y_0$:

$$q(y_t \mid y_0) = \mathcal{N}\big(y_t;\ \sqrt{\gamma_t}\, y_0,\ (1-\gamma_t) I\big)$$

where $\gamma_t = \prod_{i=1}^{t} \alpha_i$, which is quite helpful for rapidly sampling $y_t$ at step $t$:

$$y_t = \sqrt{\gamma_t}\, y_0 + \sqrt{1-\gamma_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

Given the conditional oracle bone picture $x$ and the noisy target picture $y_t$, a U-Net neural network $f_\theta(x, y_t, \gamma)$ is trained as the denoising model to predict the noise vector $\epsilon$, where $x$ is the conditional oracle bone picture and $\gamma$ is the statistic of the noise variance; finally the diffusion loss term is minimized:

$$\mathbb{E}_{(x,y)}\ \mathbb{E}_{\epsilon \sim \mathcal{N}(0,I)}\ \mathbb{E}_{\gamma \sim p(\gamma)} \big\| f_\theta\big(x,\ \sqrt{\gamma}\, y_0 + \sqrt{1-\gamma}\, \epsilon,\ \gamma\big) - \epsilon \big\|_2^2$$

During training, a time step $t \sim U\{0, \dots, T\}$ is first uniformly sampled, and $\gamma$ is then sampled from the uniform distribution $U(\gamma_{t-1}, \gamma_t)$. Furthermore, given $(y_0, y_t)$, the posterior distribution of $y_{t-1}$ is derived as:

$$q(y_{t-1} \mid y_0, y_t) = \mathcal{N}\big(y_{t-1};\ \mu,\ \sigma^2 I\big), \quad \mu = \frac{\sqrt{\gamma_{t-1}}\,(1-\alpha_t)}{1-\gamma_t}\, y_0 + \frac{\sqrt{\alpha_t}\,(1-\gamma_{t-1})}{1-\gamma_t}\, y_t, \quad \sigma^2 = \frac{(1-\gamma_{t-1})(1-\alpha_t)}{1-\gamma_t}$$

The reverse denoising process, with parameters $\theta$, is defined as:

$$p_\theta(y_{0:T} \mid x) = p(y_T) \prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t, x)$$

The reverse process transforms the latent-variable distribution $p_\theta(y_T)$ into the data distribution $p_\theta(y_0)$, where $x$ is the oracle bone picture. Combining the above and substituting the estimate of $y_0$ into the posterior $q(y_{t-1} \mid y_0, y_t)$, the mean of $p_\theta(y_{t-1} \mid y_t, x)$ is parameterised as:

$$\mu_\theta(x, y_t, \gamma_t) = \frac{1}{\sqrt{\alpha_t}} \Big( y_t - \frac{1-\alpha_t}{\sqrt{1-\gamma_t}}\, f_\theta(x, y_t, \gamma_t) \Big)$$

Finally, in the inference stage, each step of the reverse process is reparameterised as:

$$y_{t-1} \leftarrow \frac{1}{\sqrt{\alpha_t}} \Big( y_t - \frac{1-\alpha_t}{\sqrt{1-\gamma_t}}\, f_\theta(x, y_t, \gamma_t) \Big) + \sqrt{1-\alpha_t}\, \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, I)$$

and the model finally uses the resulting $y_0$ as its output.
The U-Net denoising network can perform multi-stage inference, i.e. it can be directed to generate the glyph picture of a specified stage from the oracle bone script. This can be treated as class conditioning with 4 classes in total: oracle bone to bronze script, oracle bone to large seal script, oracle bone to clerical script, and oracle bone to regular script. The 4 classes are sampled together with the time step and converted into embeddings that are injected into the feature map of each convolution module, thereby specifying the glyph generation stage.
Step four: as shown in fig. 4, the inference and testing process of the conditional diffusion model in the method is as follows:
Network testing. Save the trained conditional diffusion model, input an oracle bone character picture, perform reverse diffusion on the picture with the trained model, and generate the glyph evolution pictures with a weighted sliding method. Specifically, a full oracle bone picture is cut with a sliding window, the series of cropped pictures is input into the conditional diffusion model, and the glyph-stage class of the pictures to be generated is specified; the diffusion model then outputs a predicted generation result for each cropped picture, and the picture obtained by weighted sliding over these results is the final prediction.
By embedding the oracle bone pictures and the glyph-evolution stage classes into the conditional diffusion model as conditional information, the model can predict and generate the glyph evolution picture of each stage corresponding to a specified oracle bone picture, assisting experts in oracle bone decipherment.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which merely illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.
Claims (10)
1. An oracle bone script auxiliary decipherment method based on a conditional diffusion model, characterized by comprising the following steps:
step one: data collection: collect already-deciphered oracle bone characters and organize glyph evolution data of these characters across five stages: oracle bone script, bronze script, seal script, clerical script and regular script;
step two: data preprocessing: pair two pictures of the same character from different periods, perform fixed-size random cropping, and combine the cropped picture pairs;
step three: network training: construct a conditional diffusion model neural network, feed the paired pictures into the network, and, using the pictures of already-deciphered characters at different periods as supervision, train the network to convergence with the Gaussian-noise regression loss at multiple time steps of the forward diffusion process as the loss function;
step four: network testing: save the trained conditional diffusion model, input an oracle bone character picture, perform reverse diffusion on the picture with the trained model, and generate the glyph evolution pictures with a weighted sliding method.
2. The method for assisted oracle bone fracture according to claim 1, wherein the method is characterized by comprising the following steps: the first step of the method comprises the step of decoding character evolution data of the oracle of the first five stages, namely, the character of the second five stages, namely, the character of the first five stages, the character of the third five stages, namely, the character of the first five stages, the character of the first three stages; the five-stage fonts do not necessarily exist all but at least two-stage fonts of the oracle and the regular script are needed.
3. The method for assisted oracle bone fracture according to claim 1, wherein the method is characterized by comprising the following steps: matching the character pictures in the second step, optionally selecting two different font stages of the same character, firstly unifying all the pictures to a fixed resolution of 100 x 100, then performing random cutting operation of a fixed size at the same position of the two paired pictures, finally combining the paired pictures after random cutting, and ensuring that the combined pictures are paired.
4. The conditional diffusion model-based oracle auxiliary decoding method according to claim 3, wherein: the size of the randomly cropped pictures is fixed at 64 x 64 x 3, where 3 is the original RGB channel count of the picture; for the two glyph pictures of a pair, the spatial position of the random crop must be the same.
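The pair-consistent random cropping of claims 3 and 4 can be sketched as follows; this is a minimal illustration with hypothetical function and variable names (not from the patent), assuming the glyph pictures are NumPy arrays already resized to 100 x 100:

```python
import numpy as np

def paired_random_crop(img_a, img_b, size=64, rng=None):
    """Crop the same spatial window from two paired glyph pictures.

    img_a, img_b: H x W x 3 arrays. The crop offsets are sampled once
    and reused for both pictures, so the pair stays spatially aligned.
    """
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    crop = lambda im: im[top:top + size, left:left + size, :]
    return crop(img_a), crop(img_b)

# Example: two paired 100 x 100 RGB glyph pictures
a = np.zeros((100, 100, 3), dtype=np.uint8)
b = np.ones((100, 100, 3), dtype=np.uint8)
ca, cb = paired_random_crop(a, b, size=64)
print(ca.shape, cb.shape)  # (64, 64, 3) (64, 64, 3)
```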
5. The conditional diffusion model-based oracle auxiliary decoding method according to claim 1, wherein: the backbone of the conditional diffusion model neural network in step three is a U-Net denoising neural network structure, composed of a downsampling network, an upsampling network, and skip connections.
6. The conditional diffusion model-based oracle auxiliary decoding method according to claim 5, wherein: the downsampling network consists of 5 convolution modules; each of the first 4 modules consists of 2 residual modules, 2 self-attention modules, and one downsampling convolution, while the 5th convolution module consists of only two residual modules and one self-attention module; the upsampling network comprises 4 up-convolution modules, each comprising 3 residual modules, 3 self-attention modules, and one upsampling convolution; downsampling and upsampling modules of the same resolution are joined by skip connections, i.e. the downsampling convolution feature map and the upsampling convolution feature map are concatenated along the channel dimension and then fed into the upsampling network for residual convolution and self-attention computation.
7. The conditional diffusion model-based oracle auxiliary decoding method according to claim 5, wherein the training process of the U-Net denoising neural network is as follows: given an RGB oracle bone picture of height 64, width 64, and 3 color channels and an RGB noisy regular script picture of height 64, width 64, and 3 color channels, the two pictures are concatenated along the channel dimension into a fused feature map of height 64, width 64, and 6 channels, which is fed into the U-Net; the U-Net outputs a predicted noise of height 64, width 64, and 3 channels, and the loss function is obtained by computing the distance between this predicted noise and the noise added to the regular script picture; gradient descent is then used to train the U-Net denoising neural network.
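One training step as described in this claim can be sketched as follows; the U-Net itself is replaced by a dummy zero predictor and the noise level gamma is an assumed value, so this only illustrates the data flow, not the patented model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs with the shapes of claim 7 (H x W x C).
x = rng.random((64, 64, 3))             # conditional oracle bone picture
y0 = rng.random((64, 64, 3))            # clean regular script picture
eps = rng.standard_normal((64, 64, 3))  # Gaussian noise to be regressed
gamma = 0.5                             # assumed sampled noise level
y_noisy = np.sqrt(gamma) * y0 + np.sqrt(1 - gamma) * eps

# Channel-dimension fusion: the 64 x 64 x 6 input to the U-Net.
unet_in = np.concatenate([x, y_noisy], axis=-1)

# Stand-in for the U-Net noise predictor (dummy: returns zeros).
predicted_eps = np.zeros_like(eps)

# L2 regression loss between predicted noise and the added noise.
loss = float(np.mean((predicted_eps - eps) ** 2))
print(unet_in.shape)  # (64, 64, 6)
```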
8. The conditional diffusion model-based oracle auxiliary decoding method according to claim 7, wherein the U-Net denoising neural network is specifically realized as follows:
A U-Net denoising neural network model is trained with the oracle bone picture $x$ as condition; starting from a pure noise image $y_T$, noise of decreasing degree is removed step by step until a regular script picture $y'$ is obtained. The diffusion model comprises two steps, forward noising and reverse denoising: the forward noising process $q$ is regarded as a Markov chain that repeatedly adds Gaussian noise to the original image, i.e. the regular script picture $y_0 = y'$, until a pure noise image is obtained, expressed by the following equation:

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1})$$
where $T$ is the number of steps of the diffusion model, and each iteration of the forward process adds noise according to the following equation:

$$q(y_t \mid y_{t-1}) = \mathcal{N}\!\left(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1-\alpha_t)\, I\right)$$
where $\alpha_{1:T}$ are hyperparameters between 0 and 1 that determine the noise variance of each step and $I$ is the standard identity matrix; the forward noising process supports sampling at an arbitrary time step $t$ directly from a given original picture $y_0$, expressed by the following formula:

$$q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\gamma_t}\, y_0,\ (1-\gamma_t)\, I\right)$$
where $\gamma_t = \prod_{i=1}^{t} \alpha_i$, which is quite helpful for rapidly sampling $y_t$ at step $t$, expressed by the following formula:

$$y_t = \sqrt{\gamma_t}\, y_0 + \sqrt{1-\gamma_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$
For a given conditional oracle bone picture $x$ and noisy target picture $y_t$, a U-Net neural network denoted $f_\theta(x, y_t, \gamma)$ is trained as the denoising model to predict the noise vector $\epsilon$, where $x$ is the conditional oracle bone picture and $\gamma$ is the noise-variance statistic; finally the diffusion loss term is minimized:

$$\mathbb{E}_{(x,\,y_0)}\ \mathbb{E}_{\epsilon \sim \mathcal{N}(0,I),\ \gamma}\ \left\| f_\theta\!\left(x,\ \sqrt{\gamma}\, y_0 + \sqrt{1-\gamma}\,\epsilon,\ \gamma\right) - \epsilon \right\|_2^2$$
where $\gamma \sim p(\gamma)$; during training, a time step $t$ is first sampled uniformly from $\{1, \dots, T\}$, and $\gamma$ is then sampled from the uniform distribution $U(\gamma_{t-1}, \gamma_t)$. Furthermore, given $(y_0, y_t)$, the posterior distribution of $y_{t-1}$ is derived as the following formula:

$$q(y_{t-1} \mid y_0, y_t) = \mathcal{N}\!\left(y_{t-1};\ \mu,\ \sigma^2 I\right),\quad \mu = \frac{\sqrt{\gamma_{t-1}}\,(1-\alpha_t)}{1-\gamma_t}\, y_0 + \frac{\sqrt{\alpha_t}\,(1-\gamma_{t-1})}{1-\gamma_t}\, y_t,\quad \sigma^2 = \frac{(1-\gamma_{t-1})(1-\alpha_t)}{1-\gamma_t}$$
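The forward sampling and the training-time noise-level sampling described above can be sketched as follows; the $\alpha$ schedule is an assumed example, not a value taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
alphas = np.linspace(0.9999, 0.98, T)  # assumed schedule, alpha_t in (0, 1)
gammas = np.cumprod(alphas)            # gamma_t = prod_{i<=t} alpha_i

y0 = rng.random((64, 64, 3))           # clean target picture

# Training-time sampling: uniform time step t, then
# gamma ~ U(gamma_{t-1}, gamma_t). Note gammas decrease with t.
t = int(rng.integers(1, T))
gamma = float(rng.uniform(gammas[t], gammas[t - 1]))

# Direct (single-shot) sampling of the noisy picture y_t at step t.
eps = rng.standard_normal(y0.shape)
y_t = np.sqrt(gamma) * y0 + np.sqrt(1.0 - gamma) * eps
print(y_t.shape)  # (64, 64, 3)
```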
The reverse denoising process, with parameters $\theta$, is defined as:

$$p_\theta(y_{0:T} \mid x) = p(y_T)\,\prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t, x)$$
where the reverse denoising process converts the latent-variable distribution $p_\theta(y_T)$ into the data distribution $p_\theta(y_0)$ and $x$ is the oracle bone picture; combining the formulas above and substituting the estimate $\hat{y}_0 = \frac{1}{\sqrt{\gamma_t}}\bigl(y_t - \sqrt{1-\gamma_t}\, f_\theta(x, y_t, \gamma_t)\bigr)$ for $y_0$ in the posterior distribution $q(y_{t-1} \mid y_0, y_t)$, the mean of the parameterised $p_\theta(y_{t-1} \mid y_t, x)$ is the following formula:

$$\mu_\theta(x, y_t, \gamma_t) = \frac{1}{\sqrt{\alpha_t}}\left(y_t - \frac{1-\alpha_t}{\sqrt{1-\gamma_t}}\, f_\theta(x, y_t, \gamma_t)\right)$$
Finally, in the inference stage, the reparameterised reverse-process update is obtained by the following formula:

$$y_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(y_t - \frac{1-\alpha_t}{\sqrt{1-\gamma_t}}\, f_\theta(x, y_t, \gamma_t)\right) + \sqrt{1-\alpha_t}\;\epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, I)$$

and the model finally uses the picture $y_0$ predicted at the last step as its output.
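The reverse-process update can be sketched as the following sampling loop; the trained denoiser is replaced by a dummy zero predictor and the schedule is an assumed example, so only the update rule itself is illustrated:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
alphas = np.linspace(0.999, 0.9, T)  # assumed schedule
gammas = np.cumprod(alphas)

def f_theta(x, y_t, gamma):
    """Stand-in for the trained U-Net noise predictor (dummy: zeros)."""
    return np.zeros_like(y_t)

x = rng.random((64, 64, 3))            # conditional oracle bone picture
y = rng.standard_normal((64, 64, 3))   # y_T: pure Gaussian noise

# Iterate the reparameterised update from t = T down to t = 1;
# noise is added at every step except the last.
for t in range(T - 1, -1, -1):
    eps_hat = f_theta(x, y, gammas[t])
    y = (y - (1 - alphas[t]) / np.sqrt(1 - gammas[t]) * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        y = y + np.sqrt(1 - alphas[t]) * rng.standard_normal(y.shape)
print(y.shape)  # (64, 64, 3)
```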
9. The conditional diffusion model-based oracle auxiliary decoding method according to any one of claims 5 to 8, wherein: the U-Net denoising neural network can perform multi-stage inference, i.e. a specified stage's glyph picture is generated from the oracle bone script; this process is treated as a class condition with 4 classes in total: oracle bone to gold inscription, oracle bone to seal script, oracle bone to clerical script, and oracle bone to regular script; the class label is sampled together with the time step and converted into an embedding that is added to the feature map of each convolution module, so that the target glyph-generation stage can be specified.
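The class-conditional embedding described here can be sketched as a lookup table whose selected entry is broadcast-added to a feature map; the embedding dimension and feature-map shape are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Four target-stage classes of claim 9: oracle->gold, oracle->seal,
# oracle->clerical, oracle->regular. Each maps to a (learned) embedding.
NUM_CLASSES, EMB_DIM = 4, 128
class_embedding = rng.standard_normal((NUM_CLASSES, EMB_DIM))

# N x C x H x W feature map of one convolution module; the class
# embedding is broadcast over the spatial dimensions and added.
feat = rng.standard_normal((1, EMB_DIM, 32, 32))
cls = 3  # e.g. oracle bone -> regular script
conditioned = feat + class_embedding[cls][None, :, None, None]
print(conditioned.shape)  # (1, 128, 32, 32)
```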
10. The conditional diffusion model-based oracle auxiliary decoding method according to claim 1, wherein: in step four, an oracle bone picture is cut with a sliding window, the series of crops is fed into the conditional diffusion model, and after the target glyph stage of the generated pictures is specified, the diffusion model outputs a generated prediction for each crop; the picture obtained by weighted sliding over these generation results is the final predicted generation result.
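The sliding-window generation with weighted merging can be sketched as follows; here a simple uniform-weight overlap average stands in for the patent's weighted sliding, and an identity function stands in for the trained diffusion model, so all names and parameters are illustrative assumptions:

```python
import numpy as np

def sliding_window_generate(img, model, win=64, stride=32):
    """Cut a picture into overlapping windows, run the model on each
    crop, and merge the outputs: overlapping predictions are
    accumulated and divided by a per-pixel coverage weight."""
    h, w, c = img.shape
    out = np.zeros((h, w, c), dtype=np.float64)
    weight = np.zeros((h, w, 1), dtype=np.float64)
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            crop = img[top:top + win, left:left + win]
            pred = model(crop)  # per-crop generation result
            out[top:top + win, left:left + win] += pred
            weight[top:top + win, left:left + win] += 1.0
    return out / np.maximum(weight, 1e-8)

# Usage with an identity stand-in for the trained diffusion model:
img = np.random.default_rng(0).random((128, 128, 3))
merged = sliding_window_generate(img, model=lambda crop: crop)
print(merged.shape)  # (128, 128, 3)
```

With the identity stand-in, the merged output reproduces the input wherever windows cover it, which makes the averaging behaviour easy to verify.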
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311295878.8A CN117333881A (en) | 2023-10-07 | 2023-10-07 | Oracle auxiliary decoding method based on conditional diffusion model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117333881A true CN117333881A (en) | 2024-01-02 |
Family
ID=89292667
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311295878.8A Pending CN117333881A (en) | 2023-10-07 | 2023-10-07 | Oracle auxiliary decoding method based on conditional diffusion model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117333881A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117809318A (en) * | 2024-03-01 | 2024-04-02 | 微山同在电子信息科技有限公司 | Oracle identification method and system based on machine vision |
CN117809318B (en) * | 2024-03-01 | 2024-05-28 | 微山同在电子信息科技有限公司 | Oracle identification method and system based on machine vision |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111428718B (en) | Natural scene text recognition method based on image enhancement | |
CN108170649B (en) | Chinese character library generation method and device based on DCGAN deep network | |
CN109857871B (en) | User relationship discovery method based on social network mass contextual data | |
CN111062329B (en) | Unsupervised pedestrian re-identification method based on augmented network | |
CN111401156B (en) | Image identification method based on Gabor convolution neural network | |
CN110599502A (en) | Skin lesion segmentation method based on deep learning | |
CN116682120A (en) | Multilingual mosaic image text recognition method based on deep learning | |
CN116958827A (en) | Deep learning-based abandoned land area extraction method | |
Wang et al. | A new blind image denoising method based on asymmetric generative adversarial network | |
CN113762265A (en) | Pneumonia classification and segmentation method and system | |
CN117474796B (en) | Image generation method, device, equipment and computer readable storage medium | |
CN114037893A (en) | High-resolution remote sensing image building extraction method based on convolutional neural network | |
CN113140023A (en) | Text-to-image generation method and system based on space attention | |
CN105069767A (en) | Image super-resolution reconstruction method based on representational learning and neighbor constraint embedding | |
CN117333881A (en) | Oracle auxiliary decoding method based on conditional diffusion model | |
CN113743315B (en) | Handwriting elementary mathematical formula identification method based on structure enhancement | |
CN116433934A (en) | Multi-mode pre-training method for generating CT image representation and image report | |
CN114821067A (en) | Pathological image segmentation method based on point annotation data | |
CN115471718A (en) | Construction and detection method of lightweight significance target detection model based on multi-scale learning | |
CN114519344A (en) | Discourse element sub-graph prompt generation and guide-based discourse-level multi-event extraction method | |
CN111260570B (en) | Binarization background noise simulation method for posts based on cyclic consistency confrontation network | |
Fang et al. | A New Method of Image Restoration Technology Based on WGAN. | |
CN114581789A (en) | Hyperspectral image classification method and system | |
CN112396598A (en) | Image matting method and system based on single-stage multi-task collaborative learning | |
CN113901913A (en) | Convolution network for ancient book document image binaryzation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||