CN117333881A - Oracle bone script assisted decipherment method based on a conditional diffusion model - Google Patents

Oracle bone script assisted decipherment method based on a conditional diffusion model

Info

Publication number
CN117333881A
Authority
CN
China
Prior art keywords
oracle
pictures
picture
conditional
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311295878.8A
Other languages
Chinese (zh)
Inventor
管海粟
万金鹏
匡嚞玢
张凯乐
王鹏杰
陈文炳
刘禹良
金连文
白翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202311295878.8A
Publication of CN117333881A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19173 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/19007 - Matching; Proximity measures
    • G06V 30/19013 - Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 30/1902 - Shifting or otherwise transforming the patterns to accommodate for positional errors
    • G06V 30/1904 - Shifting or otherwise transforming the patterns to accommodate for positional errors involving a deformation of the sample or reference pattern; Elastic matching
    • G06V 30/19053 - Shifting or otherwise transforming the patterns to accommodate for positional errors involving a deformation of the sample or reference pattern; Elastic matching based on shape statistics, e.g. active shape models of the pattern to be recognised
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19147 - Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/19 - Recognition using electronic means
    • G06V 30/191 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19187 - Graphical models, e.g. Bayesian networks or Markov models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an oracle bone script assisted decipherment method based on a conditional diffusion model, comprising the following steps: collecting and organizing glyph-evolution data of already-deciphered oracle bone characters across five stages, namely oracle bone script, bronze script, seal script, clerical script, and regular script; pairing two character images of the same character from different periods and performing fixed-size random cropping; constructing a conditional diffusion model neural network, feeding the paired images into the network for training, and optimizing the network parameters; and inputting an oracle bone character image, running reverse diffusion on the image with the trained conditional diffusion model, and generating glyph-evolution images with a weighted sliding-window method. The invention provides a simple and effective glyph-evolution generation model that uses images of already-deciphered characters at different periods as supervision, so that, given an input oracle bone image, the model can predict and generate the corresponding glyph image for any period, thereby assisting experts in deciphering characters that have not yet been deciphered.

Description

Oracle bone script assisted decipherment method based on a conditional diffusion model
Technical Field
The invention relates to the fields of artificial intelligence and computer vision, and in particular to a method for assisting the decipherment of oracle bone script based on a conditional diffusion model.
Background
Writing is a symbol of civilization and an important marker of a people. Oracle bone script, the earliest known systematic writing in China, is an exceptionally precious archaeological resource. Deepening research on oracle bone script and thoroughly excavating its historical background and cultural connotations can strengthen the historical identity of the Chinese nation, promote cultural confidence, and contribute to cultural development.
Although the state has promoted oracle bone research, more scholars have devoted themselves to the field, and public attention has gradually increased, follow-up research remains difficult to carry out because the proportion of deciphered characters is still small; decipherment therefore remains the core of oracle bone research.
At present, the main difficulties of oracle bone decipherment lie in the difficulty of constructing datasets, the large variation in written glyph styles, and missing data for intermediate stages of glyph evolution. Decipherment is complex: a character can only be truly deciphered by philologists after multi-dimensional in-depth study supported by a large body of evidence. There is currently no general and efficient method for oracle bone decipherment.
Therefore, the invention provides an oracle bone script assisted decipherment method based on a conditional diffusion model, which can generate character images for each glyph-evolution stage corresponding to an oracle bone character and assist experts in decipherment.
Disclosure of Invention
To address these problems, the invention provides an oracle bone script assisted decipherment method based on a conditional diffusion model, comprising the following steps:
step one: and collecting data, namely collecting the decoded oracle characters, and finishing font evolution data of the characters in five stages of oracle, gold, seal, clerical script and regular script.
Step two: and (3) preprocessing data, namely matching two character pictures of the same word in different periods, performing random cutting operation with fixed size, and combining the cut paired pictures.
Step three: training a network, constructing a conditional diffusion model neural network, sending the paired pictures into the neural network, and training the conditional diffusion model neural network until convergence by taking Gaussian noise regression loss at a plurality of moments in the forward diffusion process as a loss function of the network by utilizing the supervision information of the text pictures of the oracle in different periods.
Step four: and testing a network, storing a trained conditional diffusion model, inputting a oracle text picture, performing reverse diffusion on the picture by using the trained conditional diffusion model, and generating an oracle font evolution picture by using a weighted sliding method.
In a further refinement, the deciphered oracle bone characters collected in step one comprise glyph-evolution data for the five stages of 'oracle bone script', 'bronze script', 'seal script', 'clerical script', and 'regular script'. Not all five stages need to exist for every character, but at least the oracle bone script and regular script stages are required.
In a further refinement, the pairing of character images in step two may select any two glyph stages of the same character: first resize all images to a fixed resolution of 100 × 100, then perform a fixed-size random crop at the same position in both images of the pair, and finally combine the randomly cropped images, ensuring that the combined images remain paired.
The further improvement is that: the randomly cropped picture size is fixed to a size of 64 x 3, where 3 is the original RGB channel of the picture. For a pair of two-stage glyph pictures, the spatial positions of random cropping must be the same.
In a further refinement, the backbone of the conditional diffusion model neural network in step three is a U-Net denoising neural network composed of a downsampling network, an upsampling network, and skip connections.
In a further refinement, the downsampling network consists of 5 convolution modules: the first 4 each comprise 2 residual modules, 2 self-attention modules, and one downsampling convolution, while the 5th comprises only two residual modules and one self-attention module. The upsampling network consists of 4 up-convolution modules, each comprising 3 residual modules, 3 self-attention modules, and one upsampling convolution. Downsampling and upsampling modules at the same resolution are linked by skip connections: the downsampling convolution feature map is concatenated with the upsampling convolution feature map along the channel dimension and then fed into the upsampling network for residual convolution and self-attention computation.
In a further refinement, the U-Net denoising neural network is trained as follows: an RGB oracle bone image of height 64, width 64, and 3 color channels and an RGB noised regular-script image of the same shape are given; the two images are concatenated along the channel dimension to obtain a fused feature map of height 64, width 64, and 6 channels, which is fed into the U-Net; the U-Net outputs a predicted noise of height 64, width 64, and 3 channels, and the loss function is obtained by computing the distance between the predicted noise and the noise actually added to the regular-script image. Gradient descent is then used to train the U-Net denoising neural network.
In a further refinement, the U-Net denoising neural network is realized as follows:
A U-Net denoising neural network model is trained conditioned on the oracle bone image $x$, gradually removing noise of varying degrees from a pure-noise image $y_T \sim \mathcal{N}(0, I)$ until the regular-script image $y'$ is obtained. The diffusion model comprises two processes, forward noising and reverse denoising. The forward noising process $q$ is treated as a Markov chain that repeatedly adds Gaussian noise to the original image, i.e. the regular-script image $y_0 = y'$, until a pure-noise image is obtained:

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1}),$$

where $T$ is the number of steps of the diffusion model. Each iteration of the forward process adds noise according to

$$q(y_t \mid y_{t-1}) = \mathcal{N}\!\left(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1 - \alpha_t) I\right),$$

where $\alpha_{1:T}$ are hyperparameters between 0 and 1 that determine the noise variance of each step, and $I$ is the identity matrix. The forward noising process also supports sampling at an arbitrary time step $t$ for a given original image $y_0$:

$$q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\gamma_t}\, y_0,\ (1 - \gamma_t) I\right), \qquad \gamma_t = \prod_{i=1}^{t} \alpha_i,$$

which is quite helpful for rapidly sampling $y_t$ at step $t$:

$$y_t = \sqrt{\gamma_t}\, y_0 + \sqrt{1 - \gamma_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$

For a given conditional oracle bone image $x$ and noised target image $y_t$, a U-Net neural network $f_\theta(x, y_t, \gamma)$ is trained as the denoising model to predict the noise vector $\epsilon$, where $x$ is the conditional oracle bone image and $\gamma$ is the noise-variance statistic; finally, the diffusion loss term is minimized:

$$\mathbb{E}_{(x, y_0)}\, \mathbb{E}_{\epsilon, \gamma} \left\| f_\theta\!\left(x,\ \sqrt{\gamma}\, y_0 + \sqrt{1 - \gamma}\, \epsilon,\ \gamma\right) - \epsilon \right\|_2^2,$$

where $\gamma \sim p(\gamma)$ and $\epsilon \sim \mathcal{N}(0, I)$. During training, a time step $t$ is first sampled uniformly from $\{1, \dots, T\}$, and $\gamma$ is then sampled from the uniform distribution $U(\gamma_{t-1}, \gamma_t)$. Furthermore, the posterior distribution of $y_{t-1}$ given $(y_0, y_t)$ is derived as

$$q(y_{t-1} \mid y_0, y_t) = \mathcal{N}\!\left(y_{t-1};\ \tilde{\mu},\ \tilde{\sigma}^2 I\right), \quad \tilde{\mu} = \frac{\sqrt{\gamma_{t-1}}\,(1 - \alpha_t)}{1 - \gamma_t}\, y_0 + \frac{\sqrt{\alpha_t}\,(1 - \gamma_{t-1})}{1 - \gamma_t}\, y_t, \quad \tilde{\sigma}^2 = \frac{(1 - \gamma_{t-1})(1 - \alpha_t)}{1 - \gamma_t}.$$

The reverse denoising process, with parameters $\theta$, is defined as

$$p_\theta(y_{0:T} \mid x) = p(y_T) \prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t, x),$$

which transforms the latent-variable distribution $p_\theta(y_T)$ into the data distribution $p_\theta(y_0)$, where $x$ is the oracle bone image. Combining the above and substituting the estimate of $y_0$ into the posterior $q(y_{t-1} \mid y_0, y_t)$, the mean of $p_\theta(y_{t-1} \mid y_t, x)$ is parameterized as

$$\mu_\theta(x, y_t, \gamma_t) = \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right).$$

Finally, in the inference stage, each reverse step is obtained by the reparameterization

$$y_{t-1} \leftarrow \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right) + \sqrt{1 - \alpha_t}\, \epsilon_t,$$

where $\epsilon_t \sim \mathcal{N}(0, I)$; the model finally uses the prediction $\hat{y}_0 = \left(y_t - \sqrt{1 - \gamma_t}\, f_\theta(x, y_t, \gamma_t)\right) / \sqrt{\gamma_t}$ as its output.
In a further refinement, the U-Net denoising neural network can perform multi-stage inference, i.e. it can be directed to generate the glyph image of a specified stage from the oracle bone script. This process is treated as a class condition with 4 classes in total: oracle bone script to bronze script, to seal script, to clerical script, and to regular script. The class is sampled together with the time step, converted into an embedding, and injected into the feature map of each convolution module, thereby specifying the target glyph-generation stage.
In a further refinement, in step four a full oracle bone image is cropped with a sliding window, the sequence of crops is input into the conditional diffusion model, and after the glyph-stage class of the images to be generated is specified, the diffusion model outputs a predicted generation result for each crop; the image assembled from these results by weighted sliding is the final predicted generation result.
Beneficial effects:
by embedding the oracle pictures and the character form evolution stage categories as condition information into a condition diffusion model, and training the condition diffusion model by taking the character pictures of the already decoded oracle at different periods as supervision information, the model can predict and generate the character form evolution pictures of each stage corresponding to the appointed oracle pictures, and the oracle experts are assisted in oracle decoding.
Description of the drawings:
FIG. 1 is a schematic flow chart of the oracle bone script assisted decipherment method based on a conditional diffusion model;
FIG. 2 is a schematic diagram of a U-Net denoising neural network in the method of the present invention;
FIG. 3 is a schematic diagram of forward diffusion and reverse inference of the conditional diffusion model in the method of the present invention;
FIG. 4 is a schematic diagram of training and inference testing of the conditional diffusion model in the method of the present invention.
Detailed description of the embodiments:
the present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
As shown in fig. 1, an embodiment of the invention provides an oracle bone script assisted decipherment method based on a conditional diffusion model, comprising the following steps:
step one: and collecting data, namely collecting the decoded oracle characters, and finishing font evolution data of the characters in five stages of oracle, gold, seal, clerical script and regular script. The decoded oracle characters comprise font evolution data of five stages, namely 'oracle', 'gold', 'big seal', 'clerical script' and 'regular script'. The five-stage fonts do not necessarily exist all but at least two-stage fonts of the oracle and the regular script are needed.
Step two: and (3) preprocessing data, namely matching two character pictures of the same word in different periods, performing random cutting operation with fixed size, and combining the cut paired pictures. The matching of the character and the picture can be carried out by arbitrarily selecting two different character pattern stages of the same character, firstly, unifying all the pictures to a fixed resolution of 100 x 100, then, carrying out random cutting operation with fixed size at the same position of the two paired pictures, finally, combining the paired pictures after random cutting, and ensuring that the combined pictures are paired. The randomly cropped picture size is fixed to a size of 64 x 3, where 3 is the original RGB channel of the picture. For a pair of two-stage glyph pictures, the spatial positions of random cropping must be the same.
Step three: training a network, constructing a conditional diffusion model neural network, sending the paired pictures into the neural network, and training the conditional diffusion model neural network until convergence by taking Gaussian noise regression loss at a plurality of moments in the forward diffusion process as a loss function of the network by utilizing the supervision information of the text pictures of the oracle in different periods.
As shown in fig. 2, the backbone of the conditional diffusion model neural network in the method is a U-Net denoising neural network composed of a downsampling network, an upsampling network, and skip connections. The downsampling network consists of 5 convolution modules: the first 4 each comprise 2 residual modules, 2 self-attention modules, and one downsampling convolution, while the 5th comprises only two residual modules and one self-attention module. The upsampling network consists of 4 up-convolution modules, each comprising 3 residual modules, 3 self-attention modules, and one upsampling convolution. Downsampling and upsampling modules at the same resolution are linked by skip connections: the downsampling convolution feature map is concatenated with the upsampling convolution feature map along the channel dimension and then fed into the upsampling network for residual convolution and self-attention computation.
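A minimal PyTorch sketch of such building blocks is given below. The module granularity (2 residual + 2 self-attention per downsampling module, 3 + 3 per upsampling module, channel-wise skip concatenation) follows the description above, while the channel counts, group normalization, and attention head count are illustrative assumptions (channels are assumed divisible by the group and head counts):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual module: two norm-activation-conv layers with an identity shortcut."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.GroupNorm(8, ch), nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1),
            nn.GroupNorm(8, ch), nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class SelfAttention(nn.Module):
    """Self-attention over spatial positions, applied as a residual update."""
    def __init__(self, ch, heads=4):
        super().__init__()
        self.norm = nn.GroupNorm(8, ch)
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = self.norm(x).flatten(2).transpose(1, 2)      # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        return x + out.transpose(1, 2).reshape(b, c, h, w)

class DownModule(nn.Module):
    """2 residual modules + 2 self-attention modules + one strided downsampling conv."""
    def __init__(self, ch_in, ch_out):
        super().__init__()
        self.body = nn.Sequential(ResBlock(ch_in), SelfAttention(ch_in),
                                  ResBlock(ch_in), SelfAttention(ch_in))
        self.down = nn.Conv2d(ch_in, ch_out, 3, stride=2, padding=1)

    def forward(self, x):
        skip = self.body(x)   # feature map kept for the skip connection
        return self.down(skip), skip

class UpModule(nn.Module):
    """Channel-wise concatenation with the skip feature map, then
    3 residual modules + 3 self-attention modules + one upsampling conv."""
    def __init__(self, ch_in, ch_skip, ch_out):
        super().__init__()
        self.fuse = nn.Conv2d(ch_in + ch_skip, ch_out, 1)
        self.body = nn.Sequential(ResBlock(ch_out), SelfAttention(ch_out),
                                  ResBlock(ch_out), SelfAttention(ch_out),
                                  ResBlock(ch_out), SelfAttention(ch_out))
        self.up = nn.ConvTranspose2d(ch_out, ch_out, 4, stride=2, padding=1)

    def forward(self, x, skip):
        x = self.fuse(torch.cat([x, skip], dim=1))   # skip connection (channel dim)
        return self.up(self.body(x))
```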
The training process of the U-Net denoising neural network is as follows: an RGB oracle bone image of height 64, width 64, and 3 color channels and an RGB noised regular-script image of the same shape are given; the two images are concatenated along the channel dimension to obtain a fused feature map of height 64, width 64, and 6 channels, which is fed into the U-Net; the U-Net outputs a predicted noise of height 64, width 64, and 3 channels, and the loss function is obtained by computing the distance between the predicted noise and the noise actually added to the regular-script image. Gradient descent is then used to train the U-Net denoising neural network.
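Under these assumptions, one training step could be sketched as follows; the `unet` call signature and the schedule tensor `gammas` are hypothetical, and the noise-level sampling follows the formulas given in the next subsection:

```python
import torch
import torch.nn.functional as F

def training_step(unet, oracle, target, gammas, optimizer):
    """One gradient-descent step of the conditional noise-regression objective.

    oracle, target: (B, 3, 64, 64) image tensors scaled to [-1, 1];
    gammas: 1-D tensor of cumulative noise-schedule products (gamma_0 .. gamma_T).
    """
    b = target.shape[0]
    t = torch.randint(1, len(gammas), (b,))                    # uniform time steps
    lo, hi = gammas[t - 1], gammas[t]
    gamma = (lo + (hi - lo) * torch.rand(b)).view(b, 1, 1, 1)  # gamma ~ U(g_{t-1}, g_t)
    eps = torch.randn_like(target)                             # Gaussian noise
    noisy = gamma.sqrt() * target + (1.0 - gamma).sqrt() * eps
    x_in = torch.cat([oracle, noisy], dim=1)                   # (B, 6, 64, 64) fusion
    pred = unet(x_in, gamma.view(b))                           # predicted noise
    loss = F.mse_loss(pred, eps)                               # noise-regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```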
As shown in fig. 3, the forward diffusion and reverse inference of the conditional diffusion model in the method are implemented as follows:
A U-Net denoising neural network model is trained conditioned on the oracle bone image $x$, gradually removing noise of varying degrees from a pure-noise image $y_T \sim \mathcal{N}(0, I)$ until the regular-script image $y'$ is obtained. The diffusion model comprises two processes, forward noising and reverse denoising. The forward noising process $q$ is treated as a Markov chain that repeatedly adds Gaussian noise to the original image, i.e. the regular-script image $y_0 = y'$, until a pure-noise image is obtained:

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1}),$$

where $T$ is the number of steps of the diffusion model. Each iteration of the forward process adds noise according to

$$q(y_t \mid y_{t-1}) = \mathcal{N}\!\left(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1 - \alpha_t) I\right),$$

where $\alpha_{1:T}$ are hyperparameters between 0 and 1 that determine the noise variance of each step, and $I$ is the identity matrix. The forward noising process also supports sampling at an arbitrary time step $t$ for a given original image $y_0$:

$$q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\gamma_t}\, y_0,\ (1 - \gamma_t) I\right), \qquad \gamma_t = \prod_{i=1}^{t} \alpha_i,$$

which is quite helpful for rapidly sampling $y_t$ at step $t$:

$$y_t = \sqrt{\gamma_t}\, y_0 + \sqrt{1 - \gamma_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).$$

For a given conditional oracle bone image $x$ and noised target image $y_t$, a U-Net neural network $f_\theta(x, y_t, \gamma)$ is trained as the denoising model to predict the noise vector $\epsilon$, where $x$ is the conditional oracle bone image and $\gamma$ is the noise-variance statistic; finally, the diffusion loss term is minimized:

$$\mathbb{E}_{(x, y_0)}\, \mathbb{E}_{\epsilon, \gamma} \left\| f_\theta\!\left(x,\ \sqrt{\gamma}\, y_0 + \sqrt{1 - \gamma}\, \epsilon,\ \gamma\right) - \epsilon \right\|_2^2,$$

where $\gamma \sim p(\gamma)$ and $\epsilon \sim \mathcal{N}(0, I)$. During training, a time step $t$ is first sampled uniformly from $\{1, \dots, T\}$, and $\gamma$ is then sampled from the uniform distribution $U(\gamma_{t-1}, \gamma_t)$. Furthermore, the posterior distribution of $y_{t-1}$ given $(y_0, y_t)$ is derived as

$$q(y_{t-1} \mid y_0, y_t) = \mathcal{N}\!\left(y_{t-1};\ \tilde{\mu},\ \tilde{\sigma}^2 I\right), \quad \tilde{\mu} = \frac{\sqrt{\gamma_{t-1}}\,(1 - \alpha_t)}{1 - \gamma_t}\, y_0 + \frac{\sqrt{\alpha_t}\,(1 - \gamma_{t-1})}{1 - \gamma_t}\, y_t, \quad \tilde{\sigma}^2 = \frac{(1 - \gamma_{t-1})(1 - \alpha_t)}{1 - \gamma_t}.$$

The reverse denoising process, with parameters $\theta$, is defined as

$$p_\theta(y_{0:T} \mid x) = p(y_T) \prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t, x),$$

which transforms the latent-variable distribution $p_\theta(y_T)$ into the data distribution $p_\theta(y_0)$, where $x$ is the oracle bone image. Combining the above and substituting the estimate of $y_0$ into the posterior $q(y_{t-1} \mid y_0, y_t)$, the mean of $p_\theta(y_{t-1} \mid y_t, x)$ is parameterized as

$$\mu_\theta(x, y_t, \gamma_t) = \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right).$$

Finally, in the inference stage, each reverse step is obtained by the reparameterization

$$y_{t-1} \leftarrow \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right) + \sqrt{1 - \alpha_t}\, \epsilon_t,$$

where $\epsilon_t \sim \mathcal{N}(0, I)$; the model finally uses the prediction $\hat{y}_0 = \left(y_t - \sqrt{1 - \gamma_t}\, f_\theta(x, y_t, \gamma_t)\right) / \sqrt{\gamma_t}$ as its output.
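For illustration, this reverse-diffusion inference loop could be sketched as follows, reusing the hypothetical `unet` and noise schedule from the training sketch above:

```python
import torch

@torch.no_grad()
def reverse_diffusion(unet, oracle, alphas, gammas):
    """Iteratively denoise a pure-noise image into the target-stage glyph crop,
    conditioned on the oracle bone crop (B, 3, 64, 64)."""
    y = torch.randn_like(oracle)                     # y_T ~ N(0, I)
    for t in range(len(alphas) - 1, -1, -1):
        a, g = float(alphas[t]), float(gammas[t])
        eps = unet(torch.cat([oracle, y], dim=1),    # condition via channel concat
                   torch.full((y.shape[0],), g))     # predicted noise at step t
        y = (y - (1.0 - a) / (1.0 - g) ** 0.5 * eps) / a ** 0.5
        if t > 0:                                    # no noise added at the last step
            y = y + (1.0 - a) ** 0.5 * torch.randn_like(y)
    return y.clamp(-1.0, 1.0)
```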
The U-Net denoising neural network can perform multi-stage inference, i.e. it can be directed to generate the glyph image of a specified stage from the oracle bone script. This process is treated as a class condition with 4 classes in total: oracle bone script to bronze script, to seal script, to clerical script, and to regular script. The class is sampled together with the time step, converted into an embedding, and injected into the feature map of each convolution module, thereby specifying the target glyph-generation stage.
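The stage-class conditioning can be sketched as a learned embedding broadcast-added to a module's feature map; the module and variable names below are hypothetical:

```python
import torch
import torch.nn as nn

NUM_STAGES = 4   # oracle->bronze, oracle->seal, oracle->clerical, oracle->regular

class StageConditionedConv(nn.Module):
    """Adds a learned embedding of the target glyph stage to a convolution
    module's feature map, broadcast over the spatial dimensions."""
    def __init__(self, ch):
        super().__init__()
        self.stage_embed = nn.Embedding(NUM_STAGES, ch)
        self.conv = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x, stage_id):
        emb = self.stage_embed(stage_id)[:, :, None, None]   # (B, C, 1, 1)
        return self.conv(x + emb)
```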
Step four: as shown in fig. 4, the inference and testing process of the conditional diffusion model is as follows:
and testing a network, storing a trained conditional diffusion model, inputting a oracle text picture, performing reverse diffusion on the picture by using the trained conditional diffusion model, and generating an oracle font evolution picture by using a weighted sliding method. The specific process is that a piece of oracle pictures is subjected to sliding window cutting, a series of cut pictures are input into a conditional diffusion model, after the font stage category of the generated pictures is specified, the diffusion model outputs the prediction generation result of each cut picture, and the pictures output after the generation results are subjected to weighted sliding are the final prediction generation result.
By embedding the oracle bone image and the glyph-evolution stage class into the conditional diffusion model as conditioning information, the model can predict and generate the glyph-evolution image of each stage corresponding to a specified oracle bone image, assisting experts in decipherment.
The foregoing has shown and described the basic principles, principal features, and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and descriptions merely illustrate its principles, and various changes and improvements may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. An oracle bone script assisted decipherment method based on a conditional diffusion model, characterized by comprising the following steps:
step one: data collection: collect already-deciphered oracle bone characters and organize glyph-evolution data of the characters across five stages: oracle bone script, bronze script, seal script, clerical script, and regular script;
step two: data preprocessing: pair two images of the same character from different periods, perform fixed-size random cropping, and combine the cropped images into pairs;
step three: network training: construct a conditional diffusion model neural network, feed the paired images into the network, and, using the images of deciphered characters from different periods as supervision, train the network to convergence with the Gaussian-noise regression loss at multiple time steps of the forward diffusion process as the loss function;
step four: network testing: save the trained conditional diffusion model, input an oracle bone character image, run reverse diffusion on the image with the trained model, and generate the glyph-evolution image with a weighted sliding-window method.
2. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 1, characterized in that: the deciphered oracle bone characters in step one comprise glyph-evolution data for the five stages of 'oracle bone script', 'bronze script', 'seal script', 'clerical script', and 'regular script'; not every character possesses all five stages, but at least the oracle bone script and regular script stages are required.
3. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 1, characterized in that: the pairing of character images in step two may select any two glyph stages of the same character; all images are first resized to a fixed resolution of 100 × 100, a fixed-size random crop is then performed at the same position in the two paired images, and the randomly cropped images are finally combined, ensuring that the combined images remain paired.
4. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 3, characterized in that: the randomly cropped image size is fixed at 64 × 64 × 3, where 3 is the number of RGB channels of the original image; for a paired two-stage glyph image, the spatial positions of the random crops must be identical.
5. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 1, characterized in that: the backbone of the conditional diffusion model neural network in step three is a U-Net denoising neural network composed of a downsampling network, an upsampling network, and skip connections.
6. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 5, characterized in that: the downsampling network consists of 5 convolution modules, the first 4 of which each comprise 2 residual modules, 2 self-attention modules, and one downsampling convolution, while the 5th comprises only two residual modules and one self-attention module; the upsampling network comprises 4 up-convolution modules, each comprising 3 residual modules, 3 self-attention modules, and one upsampling convolution; downsampling and upsampling modules at the same resolution are linked by skip connections, i.e. the downsampling convolution feature map is concatenated with the upsampling convolution feature map along the channel dimension and then fed into the upsampling network for residual convolution and self-attention computation.
7. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 5, characterized in that the U-Net denoising neural network is trained as follows: an RGB oracle bone image of height 64, width 64, and 3 color channels and an RGB noised regular-script image of height 64, width 64, and 3 color channels are given; the two images are concatenated along the channel dimension to obtain a fused feature map of height 64, width 64, and 6 channels, which is fed into the U-Net; the U-Net outputs a predicted noise of height 64, width 64, and 3 channels, and the loss function is obtained by computing the distance between the predicted noise and the noise added to the regular-script image; gradient descent is then used to train the U-Net denoising neural network.
8. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 7, characterized in that the U-Net denoising neural network is realized as follows:
a U-Net denoising neural network model is trained conditioned on the oracle bone image $x$, gradually removing noise of varying degrees from a pure-noise image $y_T \sim \mathcal{N}(0, I)$ until the regular-script image $y'$ is obtained; the diffusion model comprises two processes, forward noising and reverse denoising: the forward noising process $q$ is treated as a Markov chain that repeatedly adds Gaussian noise to the original image, i.e. the regular-script image $y_0 = y'$, until a pure-noise image is obtained:

$$q(y_{1:T} \mid y_0) = \prod_{t=1}^{T} q(y_t \mid y_{t-1}),$$

where $T$ is the number of steps of the diffusion model, and each iteration of the forward process adds noise according to

$$q(y_t \mid y_{t-1}) = \mathcal{N}\!\left(y_t;\ \sqrt{\alpha_t}\, y_{t-1},\ (1 - \alpha_t) I\right),$$

where $\alpha_{1:T}$ are hyperparameters between 0 and 1 that determine the noise variance of each step, and $I$ is the identity matrix; the forward noising process also supports sampling at an arbitrary time step $t$ for a given original image $y_0$:

$$q(y_t \mid y_0) = \mathcal{N}\!\left(y_t;\ \sqrt{\gamma_t}\, y_0,\ (1 - \gamma_t) I\right), \qquad \gamma_t = \prod_{i=1}^{t} \alpha_i,$$

which is quite helpful for rapidly sampling $y_t$ at step $t$:

$$y_t = \sqrt{\gamma_t}\, y_0 + \sqrt{1 - \gamma_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I);$$

for a given conditional oracle bone image $x$ and noised target image $y_t$, a U-Net neural network $f_\theta(x, y_t, \gamma)$ is trained as the denoising model to predict the noise vector $\epsilon$, where $x$ is the conditional oracle bone image and $\gamma$ is the noise-variance statistic; finally, the diffusion loss term is minimized:

$$\mathbb{E}_{(x, y_0)}\, \mathbb{E}_{\epsilon, \gamma} \left\| f_\theta\!\left(x,\ \sqrt{\gamma}\, y_0 + \sqrt{1 - \gamma}\, \epsilon,\ \gamma\right) - \epsilon \right\|_2^2,$$

where $\gamma \sim p(\gamma)$ and $\epsilon \sim \mathcal{N}(0, I)$; during training, a time step $t$ is first sampled uniformly from $\{1, \dots, T\}$, and $\gamma$ is then sampled from the uniform distribution $U(\gamma_{t-1}, \gamma_t)$; furthermore, the posterior distribution of $y_{t-1}$ given $(y_0, y_t)$ is derived as

$$q(y_{t-1} \mid y_0, y_t) = \mathcal{N}\!\left(y_{t-1};\ \tilde{\mu},\ \tilde{\sigma}^2 I\right), \quad \tilde{\mu} = \frac{\sqrt{\gamma_{t-1}}\,(1 - \alpha_t)}{1 - \gamma_t}\, y_0 + \frac{\sqrt{\alpha_t}\,(1 - \gamma_{t-1})}{1 - \gamma_t}\, y_t, \quad \tilde{\sigma}^2 = \frac{(1 - \gamma_{t-1})(1 - \alpha_t)}{1 - \gamma_t};$$

the reverse denoising process, with parameters $\theta$, is defined as

$$p_\theta(y_{0:T} \mid x) = p(y_T) \prod_{t=1}^{T} p_\theta(y_{t-1} \mid y_t, x),$$

which transforms the latent-variable distribution $p_\theta(y_T)$ into the data distribution $p_\theta(y_0)$, where $x$ is the oracle bone image; combining the above and substituting the estimate of $y_0$ into the posterior $q(y_{t-1} \mid y_0, y_t)$, the mean of $p_\theta(y_{t-1} \mid y_t, x)$ is parameterized as

$$\mu_\theta(x, y_t, \gamma_t) = \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right);$$

finally, in the inference stage, each reverse step is obtained by the reparameterization

$$y_{t-1} \leftarrow \frac{1}{\sqrt{\alpha_t}} \left( y_t - \frac{1 - \alpha_t}{\sqrt{1 - \gamma_t}}\, f_\theta(x, y_t, \gamma_t) \right) + \sqrt{1 - \alpha_t}\, \epsilon_t,$$

where $\epsilon_t \sim \mathcal{N}(0, I)$; the model finally uses the prediction $\hat{y}_0 = \left(y_t - \sqrt{1 - \gamma_t}\, f_\theta(x, y_t, \gamma_t)\right) / \sqrt{\gamma_t}$ as its output.
9. The oracle bone script assisted decipherment method based on the conditional diffusion model according to any one of claims 5 to 8, characterized in that: the U-Net denoising neural network can perform multi-stage inference, i.e. it can be directed to generate the glyph image of a specified stage from the oracle bone script; this process is treated as a class condition with 4 classes in total: oracle bone script to bronze script, to seal script, to clerical script, and to regular script; the class is sampled together with the time step, converted into an embedding, and injected into the feature map of each convolution module, thereby specifying the glyph-generation stage.
10. The oracle bone script assisted decipherment method based on the conditional diffusion model according to claim 1, characterized in that: in step four, a full oracle bone image is cropped with a sliding window, the sequence of crops is input into the conditional diffusion model, the glyph-stage class of the images to be generated is specified, the diffusion model outputs a predicted generation result for each crop, and the image output after weighted sliding of these generation results is the final predicted generation result.
CN202311295878.8A 2023-10-07 2023-10-07 Oracle bone script assisted decipherment method based on a conditional diffusion model Pending CN117333881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311295878.8A 2023-10-07 2023-10-07 CN117333881A (en) Oracle bone script assisted decipherment method based on a conditional diffusion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311295878.8A 2023-10-07 2023-10-07 CN117333881A (en) Oracle bone script assisted decipherment method based on a conditional diffusion model

Publications (1)

Publication Number Publication Date
CN117333881A 2024-01-02

Family

ID=89292667

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311295878.8A 2023-10-07 2023-10-07 Pending CN117333881A (en) Oracle bone script assisted decipherment method based on a conditional diffusion model

Country Status (1)

Country Link
CN (1) CN117333881A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117809318A (en) * 2024-03-01 2024-04-02 微山同在电子信息科技有限公司 Oracle identification method and system based on machine vision
CN117809318B (en) * 2024-03-01 2024-05-28 微山同在电子信息科技有限公司 Oracle identification method and system based on machine vision


Similar Documents

Publication Publication Date Title
CN111428718B (en) Natural scene text recognition method based on image enhancement
CN108170649B (en) Chinese character library generation method and device based on DCGAN deep network
CN109857871B (en) User relationship discovery method based on social network mass contextual data
CN111062329B (en) Unsupervised pedestrian re-identification method based on augmented network
CN111401156B (en) Image identification method based on Gabor convolution neural network
CN110599502A (en) Skin lesion segmentation method based on deep learning
CN116682120A (en) Multilingual mosaic image text recognition method based on deep learning
CN116958827A (en) Deep learning-based abandoned land area extraction method
Wang et al. A new blind image denoising method based on asymmetric generative adversarial network
CN113762265A (en) Pneumonia classification and segmentation method and system
CN117474796B (en) Image generation method, device, equipment and computer readable storage medium
CN114037893A (en) High-resolution remote sensing image building extraction method based on convolutional neural network
CN113140023A (en) Text-to-image generation method and system based on space attention
CN105069767A (en) Image super-resolution reconstruction method based on representational learning and neighbor constraint embedding
CN117333881A (en) Oracle bone script assisted decipherment method based on a conditional diffusion model
CN113743315B (en) Handwriting elementary mathematical formula identification method based on structure enhancement
CN116433934A (en) Multi-mode pre-training method for generating CT image representation and image report
CN114821067A (en) Pathological image segmentation method based on point annotation data
CN115471718A (en) Construction and detection method of lightweight significance target detection model based on multi-scale learning
CN114519344A (en) Discourse element sub-graph prompt generation and guide-based discourse-level multi-event extraction method
CN111260570B (en) Binarization background noise simulation method for posts based on cyclic consistency confrontation network
Fang et al. A New Method of Image Restoration Technology Based on WGAN.
CN114581789A (en) Hyperspectral image classification method and system
CN112396598A (en) Image matting method and system based on single-stage multi-task collaborative learning
CN113901913A (en) Convolution network for ancient book document image binaryzation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination