CN115019323A - Handwriting erasing method and device, electronic equipment and storage medium
- Publication number: CN115019323A
- Application number: CN202210660738.5A
- Authority: CN (China)
- Prior art keywords: handwriting, image, sample, prediction, model
- Legal status: Granted
Classifications
- G06V30/36—Matching; Classification (Character recognition; Digital ink)
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06V30/19147—Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V30/19173—Classification techniques
Abstract
The present disclosure provides a handwriting erasing method, including: acquiring an image to be detected, wherein the image to be detected comprises print handwriting and handwritten handwriting; inputting the image to be detected into a handwriting erasure model to obtain a handwriting prediction graph and a print prediction graph of the image to be detected, wherein the handwriting erasure model is obtained by training based on sample images with corresponding handwriting images and print images; and removing the handwriting remaining in the print prediction graph according to the handwriting prediction graph to obtain an image with the handwriting erased. With this scheme, the handwriting residue in the print prediction graph can be eliminated according to the handwriting prediction graph output by the handwriting erasure model, so that a final image with the handwriting cleanly erased is obtained, achieving a better handwriting erasing effect.
Description
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a handwriting erasing method and apparatus, an electronic device, and a storage medium.
Background
In scenarios where paper documents are photographed and processed, electronically removing the handwritten writing on the paper is a document restoration technique that is widely applied in fields such as education and office work.
Currently, the related art uses methods such as clustering for handwriting erasure. Research has found that existing handwriting erasing methods have a poor erasing effect and find it difficult to erase the handwriting traces on the paper cleanly.
Disclosure of Invention
In order to solve the technical problems or at least partially solve the technical problems, embodiments of the present disclosure provide a handwriting erasing method, apparatus, electronic device, and storage medium.
According to an aspect of the present disclosure, there is provided a method for training a handwriting-erasure model, including:
acquiring a training sample set, wherein the training sample set comprises sample images, and the sample images comprise sample handwritten form images and sample print form images;
training a model to be trained based on the training sample set to obtain a prediction result of the model to be trained on the sample image, wherein the prediction result comprises a sample handwriting prediction graph and a sample print prediction graph;
determining a loss function value of the model to be trained according to the sample handwritten form prediction graph, the sample handwritten form image, the sample print form prediction graph and the sample print form image;
and updating the network parameters of the model to be trained according to the loss function values to carry out iterative training until the loss function values of the model to be trained are less than or equal to a preset value, so as to obtain the handwriting erasure model.
According to another aspect of the present disclosure, there is provided a handwriting erasing method, the method including:
acquiring an image to be detected, wherein the image to be detected comprises print handwriting and handwritten handwriting;
inputting the image to be detected into a handwriting erasing model to obtain a handwriting prediction graph and a print prediction graph of the image to be detected, wherein the handwriting erasing model is obtained by training based on sample images with corresponding handwriting images and print images;
and removing the handwriting remaining in the print prediction graph according to the handwriting prediction graph to obtain an image with the handwriting erased.
According to another aspect of the present disclosure, there is provided a training apparatus for a handwriting-erasure model, including:
a sample set acquisition module, configured to acquire a training sample set, wherein the training sample set comprises sample images, and the sample images comprise sample handwriting images and sample print images;
the prediction result obtaining module is used for training a model to be trained based on the training sample set so as to obtain a prediction result of the model to be trained on the sample image, wherein the prediction result comprises a sample handwritten form prediction graph and a sample print form prediction graph;
the calculation module is used for determining a loss function value of the model to be trained according to the sample handwriting prediction graph, the sample handwriting image, the sample print prediction graph and the sample print image;
and the parameter updating module is used for updating the network parameters of the model to be trained according to the loss function values to carry out iterative training until the loss function values of the model to be trained are less than or equal to a preset value so as to obtain the handwriting erasing model.
According to another aspect of the present disclosure, there is provided a handwriting erasing apparatus, the apparatus including:
the image acquisition module is used for acquiring an image to be detected, wherein the image to be detected comprises print handwriting and handwritten handwriting;
the input module is used for inputting the image to be detected into a handwriting erasing model to obtain a handwriting prediction graph and a print prediction graph of the image to be detected, wherein the handwriting erasing model is obtained by training based on sample images with corresponding handwriting images and print images;
and the processing module is used for removing the handwriting remaining in the print prediction graph according to the handwriting prediction graph to obtain an image with the handwriting erased.
According to another aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing a program,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform a method of training a handwriting-erasure model according to the one aspect or to perform a method of handwriting-erasure according to the other aspect.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method of training a handwriting-erasure model according to the preceding aspect or to perform the handwriting-erasure method of the preceding aspect.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program, wherein the computer program, when being executed by a processor, implements the method for training a handwriting-erasure model according to the preceding aspect or performs the method for handwriting-erasure according to the preceding aspect.
According to one or more technical solutions provided in the embodiments of the present disclosure, a pre-trained handwriting erasure model is used to obtain the handwriting prediction graph and the print prediction graph of an image to be detected, and the handwriting residue in the print prediction graph is then eliminated according to the handwriting prediction graph, so that an image with the handwriting erased can be obtained, achieving a better handwriting erasing effect.
Drawings
Further details, features and advantages of the disclosure are disclosed in the following description of exemplary embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 shows a flow diagram of a method of training a handwriting-erasure model according to an example embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a method of training a handwriting-erasure model, according to another example embodiment of the present disclosure;
FIG. 3 illustrates a network structure diagram of a model to be trained, according to an exemplary embodiment of the present disclosure;
FIG. 4 illustrates a flow chart of a method of training a handwriting-erasure model according to yet another exemplary embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a handwriting erasure method according to an example embodiment of the present disclosure;
FIG. 6 schematically shows an image to be detected;
FIG. 7 illustrates a schematic diagram of a print prediction graph from a handwriting erasure model provided in accordance with the present disclosure;
FIG. 8 is a schematic diagram illustrating a handwriting prediction graph derived from a handwriting erasure model provided in accordance with the present disclosure;
FIG. 9 exemplarily illustrates the erasing effect of the handwriting erasing method provided by the present disclosure;
FIG. 10 shows a schematic block diagram of a training apparatus for a handwriting-erasure model according to an exemplary embodiment of the present disclosure;
FIG. 11 shows a schematic block diagram of a handwriting erasure apparatus according to an example embodiment of the present disclosure;
FIG. 12 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The following describes a training method and apparatus for a handwriting-erasure model and a handwriting-erasure method and apparatus provided by the present disclosure with reference to the drawings.
Fig. 1 is a flowchart illustrating a method for training a handwriting-erasure model according to an exemplary embodiment of the present disclosure, where the method may be performed by a device for training a handwriting-erasure model, where the device may be implemented by software and/or hardware, and may be generally integrated in an electronic device, and the electronic device may be a terminal device such as a computer, and may also be a server. As shown in fig. 1, the method for training the handwriting-erasure model includes:
Step 101, acquiring a training sample set, wherein the training sample set comprises sample images, and the sample images comprise sample handwriting images and sample print images.

In the embodiment of the disclosure, a plurality of sample images containing both print handwriting and handwritten handwriting may be obtained from images published on the Internet or collected offline; the obtained sample images include, but are not limited to, educational-scene documents such as test papers. Each obtained sample image is processed to obtain the sample handwriting image and the sample print image of that sample image, thereby obtaining the training sample set. The sample handwriting image contains the handwritten handwriting of the sample image but not the print handwriting, and the sample print image contains the print handwriting of the sample image but not the handwritten handwriting. That is, in the embodiment of the present disclosure, the training sample set may be composed of a plurality of sample subsets, each sample subset comprising a sample image, the sample handwriting image of that sample image, and the sample print image of that sample image.
It should be noted that, in the embodiment of the present disclosure, the sizes of the sample image, the sample handwriting image, and the sample print image are the same, so as to facilitate the subsequent processing.
Optionally, because different sample images differ in size, the collected sample images may be subjected to size correction to bring them to a uniform size, which facilitates model training; for example, the sample images may be uniformly scaled to 512 × 512. In addition, the pixel values of all pixel points in a sample image can be normalized, converting the pixel value range 0-255 into the range 0-1, and the preprocessed sample images are then used to construct the training sample set for model training, as sketched below.
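The following is a minimal preprocessing sketch in Python with OpenCV and NumPy (libraries the patent does not name; the helper and its parameters are illustrative assumptions) of the resizing and normalization described above:

```python
import cv2
import numpy as np

def preprocess_sample(image_path: str, size: int = 512) -> np.ndarray:
    """Resize a sample image to the uniform training size and normalize 0-255 to 0-1."""
    image = cv2.imread(image_path)            # H x W x 3, uint8, pixel values 0-255
    image = cv2.resize(image, (size, size))   # correct differing sizes to 512 x 512
    return image.astype(np.float32) / 255.0   # normalize the pixel range to 0-1
```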
Step 102, training a model to be trained based on the training sample set to obtain a prediction result of the model to be trained on the sample image, wherein the prediction result comprises a sample handwriting prediction graph and a sample print prediction graph.
The model to be trained is an improved generative adversarial network (GAN). Unlike the existing Pixel2Pixel method, handwriting-area mask prediction is added on top of the original design and the loss function is strengthened; the model to be trained contains two output branches, which output a sample handwriting prediction graph and a sample print prediction graph respectively.
In the embodiment of the disclosure, after the training sample set is obtained, the sample images in the training sample set may be sequentially input to the model to be trained; the model to be trained detects and classifies the handwriting regions and print regions in each sample image, and outputs the corresponding sample handwriting prediction graph and sample print prediction graph as the prediction result.
It can be understood that when the model to be trained predicts the handwriting and print of an input sample image, the prediction is made mainly according to the feature information extracted from the sample image, and the prediction result corresponding to that feature information is output. The feature information may be obtained by performing convolution processing on the sample image, or be extracted by a feature extraction network.
Step 103, determining a loss function value of the model to be trained according to the sample handwriting prediction graph, the sample handwriting image, the sample print prediction graph and the sample print image.
In the embodiment of the present disclosure, after the sample handwriting prediction graph and the sample print prediction graph of the sample image output by the model to be trained are obtained, the loss function value of the model to be trained may be calculated according to the sample handwriting prediction graph, the sample handwriting image, the sample print prediction graph, and the sample print image.
For example, a predetermined loss function may be used to determine a loss function value for the model to be trained by calculating the difference between the sample handwriting prediction map and the sample handwriting image, and calculating the difference between the sample print prediction map and the sample print image.
Step 104, updating the network parameters of the model to be trained according to the loss function value and performing iterative training until the loss function value of the model to be trained is less than or equal to a preset value, so as to obtain the handwriting erasure model.
The preset value may be preset, for example, the preset value is set to 0.01, 0.001, and the like.
It can be understood that training a model is a repeated iterative process: the loss function value is calculated after each iteration, and when it is greater than the preset value the network parameters of the model are adjusted and iterative training continues, until the overall loss function value of the model is less than or equal to the preset value, or no longer changes, or changes only slowly, at which point the model has converged and the trained model is obtained.
In the embodiment of the disclosure, the loss function value obtained from each calculation is compared with the preset value. If the loss function value is greater than the preset value, the network parameters of the model to be trained are updated for iterative training: after updating the network parameters, the model to be trained obtains the sample handwriting prediction graph and the sample print prediction graph again, the loss function value of the current iteration is recalculated and compared with the preset value once more, and this repeats until the loss function value of the model to be trained is less than or equal to the preset value, so that the trained handwriting erasure model is obtained.
Alternatively, in each iterative training process, a loss function value may be calculated according to the difference between the sample handwriting prediction image and the sample handwriting image, and a loss function value may be calculated according to the difference between the sample print prediction image and the sample print image, and the two calculated loss function values may be summed or weighted to obtain a loss function value of the whole model to be trained.
With the training method of the handwriting erasure model of the embodiment of the disclosure, a training sample set is obtained, the training sample set comprising sample images together with their sample handwriting images and sample print images; a sample image is input into the model to be trained to obtain its prediction result, comprising a sample handwriting prediction graph and a sample print prediction graph; the loss function value of the model to be trained is then calculated according to the sample handwriting prediction graph, the sample handwriting image, the sample print prediction graph and the sample print image; and the network parameters of the model to be trained are updated according to the loss function value for iterative training, until the loss function value is less than or equal to the preset value, so as to obtain the handwriting erasure model. The trained handwriting erasure model can thus detect and recognize the handwriting and print in an image, which provides the conditions for subsequently using the recognized handwriting prediction graph to process the handwriting residue in the recognized print prediction graph and obtain a cleanly erased image, thereby helping to improve the handwriting erasing effect.
Optionally, in the process of training the model to be trained, the number of training iterations may be counted; when the number of training iterations of the model to be trained reaches an iteration-count threshold, the model is considered converged and the trained handwriting erasure model is obtained. The iteration-count threshold may be determined according to the number of sample images in the training sample set. When the loss function value of the model remains greater than the preset value but the number of training iterations reaches the threshold, training is ended and the trained handwriting erasure model is obtained. Setting an iteration-count threshold allows the model training process to be ended in time when the loss function value cannot converge, avoiding the situation where training never terminates because the loss function value fails to converge. A minimal sketch combining both stopping criteria is given below.
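The sketch below is a hypothetical PyTorch training loop illustrating the two stopping criteria; the optimizer, learning rate, loader interface and `loss_fn` signature are assumptions, not taken from the patent:

```python
import torch

def train_model(model, loss_fn, loader, preset_value: float = 0.01, max_iters: int = 100_000):
    """loss_fn(print_pred, mask_pred, print_gt, hw_gt) returns the scalar training loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed optimizer
    iters = 0
    while True:
        for sample_img, print_gt, hw_gt in loader:
            print_pred, mask_pred = model(sample_img)     # the two output branches
            loss = loss_fn(print_pred, mask_pred, print_gt, hw_gt)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            iters += 1
            # Stop on convergence (loss at or below the preset value) or when the
            # iteration-count threshold is reached (guards against non-convergence).
            if loss.item() <= preset_value or iters >= max_iters:
                return model
```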
Fig. 2 shows a flowchart of a training method of a handwriting-erasure model according to another exemplary embodiment of the present disclosure, and as shown in fig. 2, step 102 may include the following sub-steps based on the embodiment shown in fig. 1:
Step 201, inputting the sample image into the model to be trained, and performing convolution processing on the sample image through the multilayer convolution layers of the model to be trained to obtain an output feature map.

In the embodiment of the disclosure, when the model to be trained is trained, each sample image in the training sample set may be sequentially input into the model to be trained, and the input sample image is convolved by the multilayer convolution layers of the model to be trained, so as to extract the feature information in the sample image and obtain the output feature map.
The number of layers of the convolution layer in the model to be trained can be flexibly set according to actual requirements, and the number of layers is not limited by the disclosure.
In an optional embodiment of the present disclosure, when performing convolution processing on an input sample image to generate the output feature map, the sample image may first be convolved by two layers of 3 × 3 convolution and down-sampled to obtain a first feature map; the first feature map is then taken as input, convolved by the next two layers of 3 × 3 convolution and down-sampled to obtain a second feature map; the second feature map is taken as input, convolved by the following two layers of 3 × 3 convolution and down-sampled to obtain a third feature map; and the third feature map is taken as input, convolved by the following two layers of 3 × 3 convolution and down-sampled to obtain a fourth feature map. The fourth feature map is then taken as input and convolved by one layer of 1 × 1 convolution to obtain a fifth feature map. Next, the fifth feature map is up-sampled, the up-sampled feature map is convolved by two layers of 3 × 3 convolution, and the resulting feature map is channel-fused with the third feature map to obtain a sixth feature map; the sixth feature map is up-sampled, convolved by two layers of 3 × 3 convolution, and channel-fused with the second feature map to obtain a seventh feature map; finally, the seventh feature map is up-sampled, convolved by two layers of 3 × 3 convolution, and channel-fused with the first feature map to obtain the output feature map. It can be seen that in this embodiment there are 15 convolution layers for feature extraction in the model to be trained.
In the embodiment of the disclosure, the sample image is thus processed by the stack of convolutions described above: four stages of two-layer 3 × 3 convolution with down-sampling produce the first to fourth feature maps; a 1 × 1 convolution produces the fifth feature map; and three stages of up-sampling, two-layer 3 × 3 convolution and channel fusion with the third, second and first feature maps produce the sixth feature map, the seventh feature map and the output feature map respectively. Image features in the sample image are thereby extracted through multiple convolution layers and the feature maps are channel-fused, so that more detailed features are extracted from the sample image, the quality of the output feature map is improved, and richer features are guaranteed to be extracted.
Step 202, performing convolution processing on the output feature map through the handwriting erasing branch of the model to be trained to obtain a sample print prediction graph of the sample image.
Step 203, performing convolution processing on the output feature map through the handwriting mask branch of the model to be trained to obtain a sample handwriting prediction graph of the sample image.
In the embodiment of the disclosure, the output layer of the model to be trained includes two branches, namely a handwriting erasing branch and a handwriting mask branch: the handwriting erasing branch outputs the sample print prediction graph with the handwriting erased, and the handwriting mask branch outputs the sample handwriting prediction graph of the image. After the output feature map is obtained, it may be input to the handwriting erasing branch and the handwriting mask branch of the model to be trained respectively: the handwriting erasing branch convolves the output feature map to obtain the sample print prediction graph of the input sample image, and the handwriting mask branch convolves the output feature map to obtain the sample handwriting prediction graph of the input sample image.
For example, the output feature maps may be convolved twice by 3 x 3, respectively, to obtain a sample print prediction map and a sample handwriting prediction map.
Fig. 3 is a schematic diagram illustrating the network structure of a model to be trained according to an exemplary embodiment of the present disclosure; in fig. 3, the dotted portion is the newly added handwriting mask branch for predicting the handwriting area in the input image. As shown in fig. 3, the input image has a size of 512 × 512 × 3; after two layers of 3 × 3 convolution and down-sampling, a first feature map of size 256 × 256 × 128 is obtained. The first feature map is taken as input to the subsequent convolution layers, convolved by two layers of 3 × 3 convolution and down-sampled to obtain a second feature map of size 128 × 128 × 256; the second feature map is likewise convolved and down-sampled to obtain a third feature map of size 64 × 64 × 512; and the third feature map is convolved and down-sampled to obtain a fourth feature map of size 32 × 32 × 1024. The fourth feature map is then passed through one layer of 1 × 1 convolution to obtain a fifth feature map of size 32 × 32 × 1024. The fifth feature map is up-sampled and convolved by two layers of 3 × 3 convolution to obtain a 64 × 64 × 512 feature map, which is channel-merged with the third feature map to obtain a sixth feature map of size 64 × 64 × 1024. The sixth feature map is up-sampled and convolved by two layers of 3 × 3 convolution to obtain a 128 × 128 × 256 feature map, which is channel-merged with the second feature map to obtain a seventh feature map of size 128 × 128 × 512. Finally, the seventh feature map is up-sampled and convolved by two layers of 3 × 3 convolution to obtain a 256 × 256 × 128 feature map, which is channel-merged with the first feature map to obtain an output feature map of size 256 × 256 × 256. The output feature map is then passed through two 3 × 3 convolutions in each branch, yielding a sample print prediction graph of size 512 × 512 × 3 and a sample handwriting prediction graph of size 512 × 512 × 1. The output sample print prediction graph and sample handwriting prediction graph therefore have the same size as the input image, which facilitates the calculation of the loss function value. By training the model shown in fig. 3 to obtain the handwriting erasure model, the print area and the handwriting area in an image can be recognized. A sketch of this architecture is given below.
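The architecture of fig. 3 can be sketched in PyTorch (an assumed framework). The activation functions, max pooling, nearest-neighbor upsampling, the sigmoid on the mask head, and the final upsampling back to 512 × 512 are assumptions made so the code runs, since the patent does not specify them:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def double_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions, the repeated building block of the network above."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class HandwritingErasureNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: each stage is two 3x3 convs followed by 2x down-sampling.
        self.enc1 = double_conv(3, 128)      # pooled to 256 x 256 x 128
        self.enc2 = double_conv(128, 256)    # pooled to 128 x 128 x 256
        self.enc3 = double_conv(256, 512)    # pooled to 64 x 64 x 512
        self.enc4 = double_conv(512, 1024)   # pooled to 32 x 32 x 1024
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = nn.Conv2d(1024, 1024, 1)   # the single 1x1 convolution
        # Decoder: upsample, two 3x3 convs, then channel concat with the skip.
        self.dec3 = double_conv(1024, 512)   # + enc3 skip -> 64 x 64 x 1024
        self.dec2 = double_conv(1024, 256)   # + enc2 skip -> 128 x 128 x 512
        self.dec1 = double_conv(512, 128)    # + enc1 skip -> 256 x 256 x 256
        # Output branches: two 3x3 convolutions each.
        self.erase_head = nn.Sequential(nn.Conv2d(256, 64, 3, padding=1),
                                        nn.ReLU(inplace=True),
                                        nn.Conv2d(64, 3, 3, padding=1))
        self.mask_head = nn.Sequential(nn.Conv2d(256, 64, 3, padding=1),
                                       nn.ReLU(inplace=True),
                                       nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, x):                                  # x: B x 3 x 512 x 512
        f1 = self.pool(self.enc1(x))                       # B x 128 x 256 x 256
        f2 = self.pool(self.enc2(f1))                      # B x 256 x 128 x 128
        f3 = self.pool(self.enc3(f2))                      # B x 512 x 64 x 64
        f4 = self.pool(self.enc4(f3))                      # B x 1024 x 32 x 32
        f5 = self.bottleneck(f4)                           # B x 1024 x 32 x 32
        u3 = F.interpolate(f5, scale_factor=2)             # B x 1024 x 64 x 64
        f6 = torch.cat([self.dec3(u3), f3], dim=1)         # B x 1024 x 64 x 64
        u2 = F.interpolate(f6, scale_factor=2)             # B x 1024 x 128 x 128
        f7 = torch.cat([self.dec2(u2), f2], dim=1)         # B x 512 x 128 x 128
        u1 = F.interpolate(f7, scale_factor=2)             # B x 512 x 256 x 256
        out = torch.cat([self.dec1(u1), f1], dim=1)        # B x 256 x 256 x 256
        out = F.interpolate(out, scale_factor=2)           # back to 512 x 512 (assumed)
        print_pred = self.erase_head(out)                  # B x 3 x 512 x 512
        mask_pred = torch.sigmoid(self.mask_head(out))     # B x 1 x 512 x 512
        return print_pred, mask_pred
```

The sigmoid on the mask head reflects the later description of the handwriting branch outputting per-pixel probabilities.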
With the training method of the handwriting erasure model described above, a sample image is input into the model to be trained; the sample image is convolved by the multilayer convolution layers of the model to be trained to obtain an output feature map; the output feature map is convolved by the handwriting erasing branch of the model to be trained to obtain the sample print prediction graph of the sample image; and the output feature map is convolved by the handwriting mask branch of the model to be trained to obtain the sample handwriting prediction graph of the sample image.
In the embodiment of the disclosure, a mask branch capable of predicting a handwritten region in an image is added to a model to be trained on the basis of an original GAN model, and based on the mask branch, the loss function of the model is redefined, and loss calculation of the mask branch part is innovatively added. Thus, in one possible implementation manner of the embodiment of the present disclosure, as shown in fig. 4, on the basis of the embodiment shown in fig. 1, step 103 may include the following sub-steps:
Step 301, determining a first loss function value according to the sample print prediction graph and the sample print image based on a first loss function.

In the embodiment of the present disclosure, after the sample handwriting prediction graph and the sample print prediction graph output by the model to be trained are obtained, the first loss function value may be calculated from the sample print prediction graph and the sample print image based on the preset first loss function.
For example, when the first loss function value is calculated from the sample print prediction chart and the sample print image, the first loss function value may be calculated from a difference between pixel values of the same pixel points in the sample print prediction chart and the sample print image.
Step 302, determining a second loss function value according to the sample print prediction graph and the sample print image based on a second loss function.

In the embodiment of the present disclosure, in addition to calculating the first loss function value from the sample print prediction graph and the sample print image based on the first loss function, a second loss function value is calculated from the sample print prediction graph and the sample print image based on the second loss function.
Wherein the second loss function and the first loss function are different predefined loss functions.
Illustratively, the second loss function value may be the average of the L1 losses for all pixel points between the sample print prediction map and the sample print image.
Step 303, determining a third loss function value according to the sample handwriting prediction graph and the sample handwriting image based on a third loss function.

In the embodiment of the present disclosure, the third loss function value is calculated, based on the third loss function, from the obtained sample handwriting prediction graph and the sample handwriting image of the corresponding sample image in the training sample set.
For example, the sample handwritten form prediction map and the sample handwritten form image may be regarded as a set, the set includes pixel points in the image, and then a preset set similarity measurement function is used to calculate a third loss function value.
Step 304, determining a fourth loss function value according to the sample handwriting prediction graph and the sample print prediction graph based on the third loss function.
In the embodiment of the present disclosure, after the sample handwriting prediction graph and the sample print prediction graph output by the model to be trained are obtained, a fourth loss function value may be calculated based on the third loss function according to the sample handwriting prediction graph and the sample print prediction graph.
It should be noted that, in this embodiment of the present disclosure, the execution sequence of steps 301 to 304 is not sequential, and each step may be executed sequentially or simultaneously, which is not limited in this disclosure. Fig. 4 illustrates the present disclosure by way of example only, and not by way of limitation, as the sequential execution of steps.
Step 305, determining the loss function value of the model to be trained according to the first loss function value, the second loss function value, the third loss function value and the fourth loss function value.

In the embodiment of the present disclosure, after the first loss function value, the second loss function value, the third loss function value and the fourth loss function value are obtained through calculation, the loss function value of the model to be trained may be calculated from these loss values.
For example, the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value may be summed or weighted to obtain the loss function value of the model to be trained.
For example, the average of the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value may be calculated, and the obtained average may be used as the loss function value of the model to be trained.
In an alternative embodiment of the present disclosure, the first loss function may be a mean square error loss function, the second loss function may be an L1 norm loss function, and the third loss function may be a Dice loss function.
Illustratively, the loss function value of the model to be trained can be calculated by the following formula (1).
$$L = L_{GAN} + L_{l1} + L_{GANmask} + L_{mask} \quad (1)$$

In the above formula (1), the calculation of each part is shown in formula (2), reconstructed here from the stated loss choices (mean square error for the first term, L1 for the second, Dice for the third and fourth) and the pairings of steps 301 to 304:

$$L_{GAN} = \frac{1}{n}\sum_{i=1}^{n}\left(output_i - y_i\right)^2, \qquad L_{l1} = \frac{1}{n}\sum_{i=1}^{n}\left|output_i - y_i\right|,$$
$$L_{GANmask} = 1 - \frac{2\sum_{i=1}^{n} mask_i \cdot input_i}{\sum_{i=1}^{n} mask_i + \sum_{i=1}^{n} input_i}, \qquad L_{mask} = 1 - \frac{2\sum_{i=1}^{n} mask_i \cdot output_i}{\sum_{i=1}^{n} mask_i + \sum_{i=1}^{n} output_i} \quad (2)$$

wherein, in the above formula (2), L_GAN is the first loss function value; L_l1 is the second loss function value, calculated as the average of the L1 losses of all pixel points between the sample print prediction graph and the sample print image; L_GANmask is the third loss function value; L_mask is the fourth loss function value; output denotes the sample print prediction graph output by the model to be trained, and output_i its i-th pixel point; y_i denotes the i-th pixel point of the print image corresponding to the input sample image; input denotes the sample handwriting image corresponding to the sample image, and input_i its i-th pixel point; mask denotes the sample handwriting prediction graph output by the model to be trained, and mask_i its i-th pixel point; n denotes the number of pixel points contained in the maps; and i indexes the pixel points.
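A sketch of formulas (1) and (2) in PyTorch follows; the exact Dice form of the third and fourth terms, and the reduction of the 3-channel print prediction to one channel in the fourth term, are assumptions consistent with the definitions above rather than details given by the patent:

```python
import torch
import torch.nn.functional as F

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Dice loss over two maps viewed as sets of pixels (the third loss function)."""
    intersection = (pred * target).sum()
    return 1 - (2 * intersection + eps) / (pred.sum() + target.sum() + eps)

def total_loss(output: torch.Tensor, y: torch.Tensor,
               mask: torch.Tensor, hw_input: torch.Tensor) -> torch.Tensor:
    """L = L_GAN + L_l1 + L_GANmask + L_mask, following formulas (1) and (2)."""
    l_gan = F.mse_loss(output, y)          # first loss: mean square error
    l_l1 = F.l1_loss(output, y)            # second loss: mean L1 over all pixels
    l_ganmask = dice_loss(mask, hw_input)  # third loss: mask vs. handwriting image
    # Fourth loss: mask vs. print prediction; reducing the 3-channel prediction
    # to one channel by a mean is an assumption (the patent does not specify).
    l_mask = dice_loss(mask, output.mean(dim=1, keepdim=True))
    return l_gan + l_l1 + l_ganmask + l_mask
```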
With the training method of the handwriting erasure model of the embodiment of the present disclosure, a first loss function value is determined from the sample print prediction graph and the sample print image based on the first loss function; a second loss function value is determined from the sample print prediction graph and the sample print image based on the second loss function; a third loss function value is determined from the sample handwriting prediction graph and the sample handwriting image based on the third loss function; and a fourth loss function value is determined from the sample handwriting prediction graph and the sample print prediction graph based on the third loss function. The loss function value of the model to be trained is then determined from these four values. Calculation of the third and fourth loss function values is thus added to the original model loss function, redefining the loss function of the model. Adding the fourth loss function value enables the model to better learn the position information of the handwriting, improving the accuracy with which handwriting positions are learned; adding the third loss function value better guides the details and texture characteristics of the generated image, which helps extract rich feature information.
In the embodiment of the present disclosure, the handwriting erasure model obtained by training in the foregoing embodiment can be used to detect and recognize the print and the handwriting in the image, and separate the print and the handwriting from the image. Fig. 5 is a flowchart illustrating a handwriting erasure method according to an exemplary embodiment of the present disclosure, where the handwriting erasure model may be trained by the handwriting erasure model training method described in the foregoing embodiment, and the handwriting erasure method may be executed by a handwriting erasure apparatus based on the handwriting erasure model, where the handwriting erasure apparatus may be implemented in software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 5, the handwriting erasing method may include the steps of:
Step 401, acquiring an image to be detected, wherein the image to be detected comprises print handwriting and handwritten handwriting.

In the embodiment of the disclosure, the image to be detected may be any image from which handwriting needs to be erased, for example a test paper image written on by a student, which contains both the original print handwriting of the test paper and the handwriting written by the student.
Step 402, inputting the image to be detected into the handwriting erasure model to obtain a handwriting prediction graph and a print prediction graph of the image to be detected.

The handwriting prediction graph contains the handwriting predicted from the image to be detected, and the print prediction graph contains the print predicted from the image to be detected; because the effect of the handwriting erasure model is limited, the obtained print prediction graph may contain some residual handwriting.
In the embodiment of the disclosure, for the acquired image to be detected, the image to be detected can be input into a trained handwriting erasing model, and the handwriting erasing model outputs a handwriting prediction graph and a print prediction graph contained in the image to be detected. The handwriting erasure model can be obtained by training through the handwriting erasure model training method described in the foregoing embodiment, and the training process of the handwriting erasure model is not repeated here.
For example, for the image to be detected shown in fig. 6, inputting it into the trained handwriting erasure model shown in fig. 3, the print prediction graph shown in fig. 7 and the handwriting prediction graph shown in fig. 8 can be obtained.
In an optional embodiment of the present disclosure, when the sample images used to train the handwriting erasure model have a specific size, for example 512 × 512 × 3, it may be determined, before the image to be detected is input to the handwriting erasure model, whether the size of the image to be detected is that specific size. If so, the image to be detected is input to the handwriting erasure model directly; if not, the image to be detected may be preprocessed, including but not limited to scaling it to the specific size. In addition, the image to be detected can be normalized, converting the data range 0-255 in the image into the range 0-1. The preprocessed image is then input into the handwriting erasure model to obtain its handwriting prediction graph and print prediction graph, as sketched below.
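A minimal sketch of this inference path, assuming the PyTorch model sketched earlier and OpenCV preprocessing (the function name and tensor layout are illustrative assumptions):

```python
import cv2
import numpy as np
import torch

def predict(model, image_bgr: np.ndarray, size: int = 512):
    """Resize and normalize the image to be detected, then run the erasure model."""
    resized = cv2.resize(image_bgr, (size, size)).astype(np.float32) / 255.0
    tensor = torch.from_numpy(resized).permute(2, 0, 1).unsqueeze(0)  # 1 x 3 x H x W
    with torch.no_grad():
        print_pred, hw_pred = model(tensor)   # print prediction map, handwriting mask
    return print_pred[0], hw_pred[0]
```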
Step 403, removing the handwriting remaining in the print prediction graph according to the handwriting prediction graph to obtain an image with the handwriting erased.
Due to the limited effect of the trained handwriting erasure model, a part of the handwriting may remain in the obtained print prediction graph; for example, the print prediction graph shown in fig. 7 still contains some handwriting, so the handwriting erasure model alone erases the handwriting imperfectly.
In view of the above problem, in the embodiment of the present disclosure, after the handwriting prediction graph and the print prediction graph of the image to be detected are obtained, the handwriting remaining in the print prediction graph may be removed according to the obtained handwriting prediction graph, so as to obtain an image with the handwriting cleanly erased.
In an optional implementation of the present disclosure, the handwriting erasure model outputs, for each pixel point in the handwriting prediction graph, a probability value of that pixel belonging to the handwriting region; the higher the probability, the more likely the pixel belongs to the handwriting region. In this embodiment, the pixel points of the corresponding handwriting region in the print prediction graph can be found according to these probability values, and their pixel values replaced with the pixel values of background-region pixel points in the print prediction graph (i.e. the paper color), thereby removing the residual handwriting in the print prediction graph and obtaining a clean print image. A sketch of this replacement follows.
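A sketch of the replacement with NumPy arrays; estimating the paper color as the per-channel median of non-handwriting pixels is an assumption, since the patent only refers to "the paper color":

```python
import numpy as np

def replace_with_background(print_pred: np.ndarray, hw_prob: np.ndarray,
                            threshold: float = 0.5) -> np.ndarray:
    """Overwrite pixels whose handwriting probability exceeds the threshold."""
    result = print_pred.copy()
    handwriting = hw_prob > threshold                          # H x W boolean mask
    paper_color = np.median(print_pred[~handwriting], axis=0)  # assumed paper color
    result[handwriting] = paper_color
    return result
```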
With the handwriting erasing method described above, the image to be detected is acquired and input into the handwriting erasure model to obtain the handwriting prediction graph and the print prediction graph of the image to be detected, and the handwriting remaining in the print prediction graph is removed according to the handwriting prediction graph to obtain an image with the handwriting erased. The handwriting residue in the print prediction graph can thus be eliminated according to the handwriting prediction graph output by the handwriting erasure model, yielding a final image with the handwriting cleanly erased and achieving a good handwriting erasing effect.
In an optional implementation of the present disclosure, when removing the handwriting remaining in the print prediction graph according to the handwriting prediction graph, the prediction score of each pixel point in the handwriting prediction graph may first be obtained and a handwriting mask image determined based on the prediction scores; the handwriting mask image is then matrix-multiplied with the print prediction graph to obtain a handwriting residual image; the handwriting residual region in the handwriting residual image is filtered to obtain a handwriting residual optimized image; and the image with the handwriting erased is determined based on the handwriting residual image and the handwriting residual optimized image. In this way the handwriting left in the print prediction graph can be eliminated, a cleaner print image obtained, and the handwriting erasing effect improved.
The prediction score of each pixel point in the handwriting prediction graph is obtained from the pre-trained handwriting erasure model: when the model outputs the handwriting prediction graph, it simultaneously outputs the prediction score of each pixel point, which represents the probability that the corresponding pixel point is handwriting; the higher the prediction score, the higher the probability that the corresponding pixel point is handwriting.
In the embodiment of the disclosure, for the handwriting prediction image output by the handwriting erasure model, the prediction score of each pixel point in the handwriting prediction image can be obtained, and then the handwriting mask image is determined based on the prediction score.
In an optional embodiment of the present disclosure, when determining a handwritten mask image based on a prediction score, the prediction score of each pixel point in the handwritten mask image may be compared with a preset threshold, where the prediction score is used to represent a probability that the corresponding pixel point is handwritten; and further setting the pixel value of the pixel point with the prediction score larger than the preset threshold value as 1, and setting the pixel value of the pixel point with the prediction score not larger than the preset threshold value as 0, so as to obtain the handwriting mask image. Therefore, the handwriting prediction image can be subjected to binarization processing to obtain a handwriting mask image only containing 0 and 1, and data support is provided for subsequent handwriting removal.
The threshold may be preset, for example to 0.5, as in the sketch below.
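The binarization amounts to a simple threshold over the prediction scores (the helper name is illustrative):

```python
import numpy as np

def binarize_mask(hw_pred: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Scores above the preset threshold become 1 (handwriting), the rest 0."""
    return (hw_pred > threshold).astype(np.float32)
```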
It can be understood that the pixel values are reset according to the relationship between the prediction score and the preset threshold to obtain the handwriting mask image, and the handwriting mask image is then matrix-multiplied with the print prediction graph to obtain the handwriting residual image. Where the handwriting mask image is 0, multiplying by the corresponding pixel value in the print prediction graph gives 0, i.e. the corresponding pixel value in the handwriting residual image is 0; where the handwriting mask image is 1, multiplying gives a pixel value identical to that of the corresponding pixel point in the print prediction graph. Because the handwriting region predicted in the handwriting prediction graph is usually larger than the handwriting region actually remaining in the print prediction graph, the handwriting residual image obtained after matrix multiplication contains, besides zero-valued pixels and the pixels of the handwriting residual region, pixels of the background region near the handwriting. Filtering the handwriting residual region of the handwriting residual image therefore allows the pixel values of the surrounding background region to be used to optimize the pixel values of the handwriting residual region, bringing them close to the pixel values of the surrounding background and yielding the handwriting residual optimized image.
For example, the pixel values of background-region pixel points (corresponding to the paper color) may be taken as reference pixel values; the non-zero pixel values in the handwriting residual image are compared with the reference values, and the pixel points that differ from the reference values are those of the handwriting residual region. When filtering the handwriting residual region of the handwriting residual image, filtering methods such as mean filtering or median filtering may be used, which the present disclosure does not limit. A sketch of this step follows.
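A sketch of this step with NumPy/OpenCV; median-filtering the whole print prediction so that residual positions pick up surrounding background values is an approximation of the described filtering, and the kernel size is an assumption:

```python
import cv2
import numpy as np

def residual_and_optimized(print_pred: np.ndarray, hw_mask: np.ndarray, ksize: int = 5):
    """Return the handwriting residual image and the residual-optimized image."""
    # Element-wise ("matrix") multiplication of the binary mask with the print
    # prediction: pixels outside the mask become 0, masked pixels keep their values.
    residual = print_pred * hw_mask[..., None]
    # Median filtering pushes residual pixels toward the surrounding paper color.
    optimized = cv2.medianBlur(print_pred, ksize)  # float32 input works for ksize 3 or 5
    return residual, optimized
```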
Finally, the image with the handwriting erased can be determined based on the handwriting residual image and the handwriting residual optimized image, giving a final image from which the handwriting has been removed completely and achieving a good handwriting erasing effect.
The handwriting residual image can be used to determine the residual positions that need to be erased in the print prediction graph; according to the determined positions, the pixel values at those positions in the print prediction graph can be weakened according to the pixel values of the pixel points at the same positions in the handwriting residual optimized image, so that the purpose of removing the handwriting residue from the print prediction graph is achieved.
In an optional implementation of the present disclosure, when determining the image with the handwriting erased based on the handwriting residual image and the handwriting residual optimized image, the corresponding pixel points may be found in the handwriting residual optimized image and the print prediction graph according to the position of each pixel point contained in the handwriting residual region of the handwriting residual image; the average of the pixel value of the corresponding pixel point in the handwriting residual optimized image and that in the print prediction graph is calculated and taken as the new pixel value of the corresponding pixel point in the print prediction graph. The handwriting residue in the print prediction graph is thus weakened by the pixel values of the handwriting residual optimized image, reducing the perceptibility of the residual handwriting and giving a better erasing effect, as sketched below.
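A sketch of this averaging variant (array layouts and the helper name are assumptions):

```python
import numpy as np

def weaken_residuals(print_pred: np.ndarray, optimized: np.ndarray,
                     residual: np.ndarray) -> np.ndarray:
    """Average print-prediction and residual-optimized pixels at residual positions."""
    result = print_pred.copy()
    positions = residual.any(axis=-1)   # pixel positions of the handwriting residual region
    result[positions] = (print_pred[positions] + optimized[positions]) / 2.0
    return result
```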
In an optional implementation manner of the present disclosure, when determining an image after erasing handwritten form handwriting based on a handwritten residual image and a handwritten residual optimized image, pixel values of pixel points in a handwritten residual region in a print prediction image may be replaced with pixel values of corresponding pixel points in the handwritten residual optimized image, so as to obtain the image after erasing handwritten form handwriting.
Exemplarily, the corresponding pixel points can be found in the handwriting residual optimized image according to the positions of the pixel points of the handwriting residual area in the handwriting residual image, and the pixel values found there are used to replace the pixel values of the corresponding pixel points in the print prediction image, obtaining the image after the handwriting is erased, as shown in fig. 9. As can be seen from fig. 9, the finally obtained image has no handwritten handwriting, and the handwriting erasing effect is good.
In the embodiment of the disclosure, the image after erasing the handwritten handwriting is obtained by replacing the pixel values of the pixels in the handwriting residual area of the print prediction graph with the pixel values of the corresponding pixels in the handwriting residual optimized image. The pixel values of the handwriting remaining in the print prediction graph are thereby replaced with optimized values, so that the residual handwriting is weakened and becomes difficult to perceive, achieving a better handwriting erasing effect.
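As an illustrative companion to the two optional implementations above, the following Python sketch applies either the replacement strategy or the averaging strategy to the residual region; the helper name `erase_residual`, the `mode` switch, and the float arithmetic are assumptions.

```python
import numpy as np

def erase_residual(print_pred, residual, optimized, mode="replace"):
    """Apply one of the two optional strategies described above.

    mode="replace" swaps residual pixels for their optimized values;
    mode="average" blends them to merely weaken the residual handwriting.
    """
    out = print_pred.copy()
    region = residual != 0  # positions of the handwriting residual area
    if mode == "replace":
        # Replace residual pixels with the optimized (background-like) values.
        out[region] = optimized[region]
    else:
        # Average the optimized value with the print prediction value to
        # weaken, rather than fully remove, the residual handwriting.
        blended = (optimized[region].astype(np.float32) +
                   print_pred[region].astype(np.float32)) / 2.0
        out[region] = blended.astype(out.dtype)
    return out
```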
In an optional embodiment of the present disclosure, when acquiring the handwriting prediction image and the print prediction image of the image to be detected, a first convolution layer of the handwriting erasure model may first be used to perform convolution and down-sampling on the image to be detected to obtain a first feature map. Down-sampling is then continued through a second convolution layer, a third convolution layer, and a fourth convolution layer of the handwriting erasure model to obtain a second feature map, a third feature map, and a fourth feature map, wherein the second feature map is obtained by convolving and down-sampling the first feature map with the second convolution layer, the third feature map is obtained by convolving and down-sampling the second feature map with the third convolution layer, and the fourth feature map is obtained by convolving and down-sampling the third feature map with the fourth convolution layer.
Wherein, the first convolution layer, the second convolution layer, the third convolution layer, and the fourth convolution layer are each a two-layer 3 × 3 convolution network.
In the embodiment of the present disclosure, the image to be detected may be sequentially input into a plurality of continuous two-layer 3 × 3 convolution networks to perform convolution processing and downsampling, so as to obtain a first feature map, a second feature map, a third feature map, and a fourth feature map of different sizes, and then determine a handwriting prediction map and a print prediction map of the image to be detected based on the first feature map, the second feature map, the third feature map, and the fourth feature map.
For example, the second feature map, the third feature map, and the fourth feature map may each be up-sampled to obtain feature maps of the same size as the first feature map; the up-sampled feature maps and the first feature map are then fused to obtain a fusion feature map, and the fusion feature map is passed through two separate two-layer 3 × 3 convolutions to obtain the handwriting prediction map and the print prediction map, respectively.
In an optional embodiment of the present disclosure, when determining the handwriting prediction map and the print prediction map of the image to be detected, a fifth convolution layer of the handwriting erasure model may be used to perform convolution processing on the fourth feature map to obtain a fifth feature map; the fifth feature map is up-sampled, convolved, and channel-fused with the third feature map to obtain a sixth feature map; the sixth feature map is then up-sampled, convolved, and channel-fused with the second feature map to obtain a seventh feature map; and the seventh feature map is further up-sampled, convolved, and channel-fused with the first feature map to obtain the handwriting prediction map and the print prediction map of the image to be detected.
Wherein, the fifth convolution layer can be a single-layer 1 × 1 convolution network.
In the embodiment of the present disclosure, after the fourth feature map is obtained, convolution processing may be continued on the fourth feature map with one layer of 1 × 1 convolution to obtain a fifth feature map. The fifth feature map is up-sampled, convolved with two layers of 3 × 3 convolution, and channel-fused with the third feature map to obtain a sixth feature map; the sixth feature map is up-sampled, convolved with two layers of 3 × 3 convolution, and channel-fused with the second feature map to obtain a seventh feature map; and the seventh feature map is up-sampled, convolved with two layers of 3 × 3 convolution, and channel-fused with the first feature map to obtain a fusion feature map. The fusion feature map is then passed through two separate two-layer 3 × 3 convolutions to obtain the handwriting prediction map and the print prediction map. In this way, image features are extracted by multiple convolution layers and the feature maps are channel-fused, so that more detail features can be extracted from the image, the quality of the output feature map is improved, and richer feature extraction is ensured.
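The encoder–decoder just described can be pictured with a compact PyTorch sketch. This is not the disclosed model itself: the channel widths, max-pool down-sampling, bilinear up-sampling, fusing by concatenation before (rather than after) the two 3 × 3 convolutions, the single-convolution output heads, and keeping the first block at full resolution are all simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def double_conv(cin, cout):
    # Two layers of 3x3 convolution: the building block used by every
    # encoder and decoder stage in the description above.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class HandwritingEraser(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = double_conv(1, ch)                    # first feature map
        self.enc2 = double_conv(ch, ch * 2)               # second feature map
        self.enc3 = double_conv(ch * 2, ch * 4)           # third feature map
        self.enc4 = double_conv(ch * 4, ch * 8)           # fourth feature map
        self.bottleneck = nn.Conv2d(ch * 8, ch * 8, 1)    # 1x1 conv -> fifth
        self.dec3 = double_conv(ch * 8 + ch * 4, ch * 4)  # fuse with third
        self.dec2 = double_conv(ch * 4 + ch * 2, ch * 2)  # fuse with second
        self.dec1 = double_conv(ch * 2 + ch, ch)          # fuse with first
        self.print_head = nn.Conv2d(ch, 1, 3, padding=1)  # erasing branch
        self.mask_head = nn.Conv2d(ch, 1, 3, padding=1)   # handwriting mask branch

    def forward(self, x):
        f1 = self.enc1(x)                            # H x W
        f2 = self.enc2(F.max_pool2d(f1, 2))          # H/2 x W/2
        f3 = self.enc3(F.max_pool2d(f2, 2))          # H/4 x W/4
        f4 = self.enc4(F.max_pool2d(f3, 2))          # H/8 x W/8
        f5 = self.bottleneck(f4)                     # fifth feature map

        def up(t):
            return F.interpolate(t, scale_factor=2, mode="bilinear",
                                 align_corners=False)

        f6 = self.dec3(torch.cat([up(f5), f3], dim=1))   # sixth feature map
        f7 = self.dec2(torch.cat([up(f6), f2], dim=1))   # seventh feature map
        fused = self.dec1(torch.cat([up(f7), f1], dim=1))
        print_pred = self.print_head(fused)              # print prediction map
        mask_pred = torch.sigmoid(self.mask_head(fused)) # per-pixel handwriting score
        return print_pred, mask_pred

# Usage: print_pred, mask_pred = HandwritingEraser()(torch.randn(1, 1, 256, 256))
```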
The exemplary embodiment of the present disclosure also provides a training device for a handwriting erasure model. Fig. 10 shows a schematic block diagram of a training apparatus for a handwriting erasure model according to an exemplary embodiment of the present disclosure. As shown in fig. 10, the training apparatus 50 for a handwriting erasure model includes: a sample set obtaining module 501, a prediction result obtaining module 502, a calculating module 503, and a parameter updating module 504.
The sample set obtaining module 501 is used for acquiring a training sample set, wherein the training sample set comprises sample images, and the sample images comprise sample handwriting images and sample print images;
a prediction result obtaining module 502, configured to train a model to be trained based on the training sample set, so as to obtain a prediction result of the model to be trained on the sample image, where the prediction result includes a sample handwritten form prediction diagram and a sample print form prediction diagram;
a calculating module 503, configured to determine a loss function value of the model to be trained according to the sample handwriting prediction graph, the sample handwriting image, the sample print prediction graph, and the sample print image;
and a parameter updating module 504, configured to update the network parameter of the model to be trained according to the loss function value, and perform iterative training until the loss function value of the model to be trained is less than or equal to a preset value, so as to obtain the handwriting erasure model.
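For orientation, a conventional PyTorch training loop that wires the four modules together might look like the sketch below; the dataset interface, the Adam optimizer, the batch size, the learning rate, and the stopping value are all assumptions, and `total_loss` refers to the loss sketch given after the loss-function description below.

```python
import torch
from torch.utils.data import DataLoader

def train_eraser(model, train_set, total_loss, preset_value=0.05,
                 lr=1e-4, max_epochs=100):
    """Sketch of the four modules: sample set acquisition (501), prediction
    (502), loss calculation (503), and parameter updating (504)."""
    loader = DataLoader(train_set, batch_size=8, shuffle=True)  # module 501
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for sample_img, hw_img, print_img in loader:
            print_pred, mask_pred = model(sample_img)           # module 502
            loss = total_loss(print_pred, print_img,
                              mask_pred, hw_img)                # module 503
            opt.zero_grad()
            loss.backward()                                     # module 504
            opt.step()
        # Iterate until the loss is less than or equal to the preset value.
        if loss.item() <= preset_value:
            break
    return model
```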
Optionally, the prediction result obtaining module 502 includes:
the first processing unit is used for inputting the sample image into the model to be trained and carrying out convolution processing on the sample image through the multilayer convolution layer of the model to be trained to obtain an output characteristic diagram;
the second processing unit is used for performing convolution processing on the output characteristic diagram through a handwriting erasing branch of the model to be trained to obtain a sample print prediction diagram of the sample image;
and the third processing unit is used for performing convolution processing on the output characteristic diagram through the handwriting mask branch of the model to be trained to obtain a sample handwriting prediction diagram of the sample image.
Optionally, the first processing unit is further configured to:
performing convolution processing and down-sampling on the sample image by utilizing two layers of 3-by-3 convolution to obtain a first feature map;
continuing to perform convolution processing and down-sampling, with two layers of 3-by-3 convolution each time, on the feature map generated by the previous stage to obtain a second feature map, a third feature map, and a fourth feature map respectively; the second feature map is obtained by performing convolution processing and down-sampling on the first feature map, the third feature map is obtained by performing convolution processing and down-sampling on the second feature map, and the fourth feature map is obtained by performing convolution processing and down-sampling on the third feature map;
performing convolution processing on the fourth feature map by using a layer of 1-by-1 convolution to obtain a fifth feature map;
performing up-sampling on the fifth feature map, performing convolution processing by using two layers of 3-by-3 convolution, and performing channel fusion with the third feature map to obtain a sixth feature map;
performing up-sampling on the sixth feature map, performing convolution processing by using two layers of 3-by-3 convolution, and performing channel fusion with the second feature map to obtain a seventh feature map;
and performing up-sampling on the seventh feature map, performing convolution processing by using two layers of 3-by-3 convolution, and performing channel fusion with the first feature map to obtain the output feature map.
Optionally, the calculating module 503 is further configured to:
determining a first loss function value according to the sample print prediction graph and the sample print image based on a first loss function;
determining a second loss function value according to the sample print prediction graph and the sample print image based on a second loss function;
determining a third loss function value according to the sample handwriting prediction image and the sample handwriting image based on a third loss function;
determining a fourth loss function value according to the sample handwriting prediction graph and the sample print prediction graph based on the third loss function;
and determining the loss function value of the model to be trained according to the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value.
Optionally, the first loss function is a mean square error loss function, the second loss function is an L1 norm loss function, and the third loss function is a Dice loss function.
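The four terms can be written out concretely. The sketch below follows the names used later in the embodiments (L_GAN, L_l1, L_GANmask, L_mask) and assumes all maps are normalized to [0, 1]; the smoothing constant, the equal weighting of the terms, and the literal reading of the fourth term as a Dice loss between the handwriting prediction and the print prediction are assumptions.

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    # Dice loss: 1 - 2|X ∩ Y| / (|X| + |Y|), computed on soft maps in [0, 1].
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

def total_loss(print_pred, print_img, mask_pred, hw_img):
    l_gan = F.mse_loss(print_pred, print_img)   # first: mean square error
    l_l1 = F.l1_loss(print_pred, print_img)     # second: L1 norm loss
    l_ganmask = dice_loss(mask_pred, hw_img)    # third: Dice, mask vs. handwriting
    # Fourth term: Dice between the handwriting prediction and the print
    # prediction, a literal reading of the description (an assumption).
    l_mask = dice_loss(mask_pred, print_pred.clamp(0, 1))
    # Equal weighting is an assumption; the disclosure gives no weights.
    return l_gan + l_l1 + l_ganmask + l_mask
```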
The training device for the handwriting erasure model provided by the embodiment of the present disclosure can execute any training method for the handwriting erasure model applicable to electronic equipment such as a terminal, and has the corresponding functional modules and beneficial effects of the executed method. For details not described in the device embodiments, reference may be made to the description of the corresponding method embodiments of the present disclosure.
The exemplary embodiment of the present disclosure further provides a handwriting erasing apparatus based on the handwriting erasing model, and the handwriting erasing model can be obtained by training using the training method of the handwriting erasing model described in the foregoing embodiment. Fig. 11 shows a schematic block diagram of a handwriting erasing apparatus according to an exemplary embodiment of the present disclosure, and as shown in fig. 11, the handwriting erasing apparatus 60 may include: an image acquisition module 601, an input module 602, and a processing module 603.
The image acquisition module 601 is configured to acquire an image to be detected, where the image to be detected includes print handwriting and handwritten handwriting;
an input module 602, configured to input the image to be detected into the handwriting erasing model to obtain a handwriting prediction graph and a print prediction graph of the image to be detected, where the handwriting erasing model is trained based on sample images having a handwriting image and a print image;
and the processing module 603 is configured to remove the handwriting remaining in the print prediction graph according to the handwriting prediction graph, so as to obtain an erased image of the handwriting.
Optionally, the processing module 603 includes:
the acquisition unit is used for acquiring the prediction score of each pixel point in the handwritten prediction graph;
a first determination unit configured to determine a handwritten mask image based on the prediction score;
the processing unit is used for carrying out matrix multiplication on the handwritten mask image and the print prediction image to obtain a handwritten residual image;
the filtering unit is used for carrying out filtering processing on a handwriting residual area in the handwriting residual image to obtain a handwriting residual optimized image;
and the second determining unit is used for determining the image after the handwriting of the handwriting is erased based on the handwriting residual image and the handwriting residual optimized image.
Optionally, the second determining unit is further configured to:
and replacing the pixel values of the pixel points of the handwriting residual area in the print prediction graph with the pixel values of the corresponding pixel points in the handwriting residual optimization image to obtain an image with the handwriting erased.
Optionally, the first determining unit is further configured to:
comparing the prediction score of each pixel point in the handwritten form prediction graph with a preset threshold value, wherein the prediction score is used for representing the probability that the corresponding pixel point is handwritten;
setting the pixel value of the pixel point with the prediction score larger than the preset threshold value as 1, and setting the pixel value of the pixel point with the prediction score not larger than the preset threshold value as 0 to obtain the handwriting mask image.
Optionally, the input module 602 includes:
the first convolution unit is used for performing convolution and down-sampling processing on an image to be detected by utilizing the first convolution layer of the handwriting erasure model to obtain a first characteristic diagram;
a second convolution unit configured to perform downsampling processing successively using a second convolution layer, a third convolution layer, and a fourth convolution layer of the handwriting erasure model to obtain a second feature map, a third feature map, and a fourth feature map, wherein the second feature map is obtained by performing convolution and downsampling processing on the first feature map by the second convolution layer, the third feature map is obtained by performing convolution and downsampling processing on the second feature map by the third convolution layer, and the fourth feature map is obtained by performing convolution and downsampling processing on the third feature map by the fourth convolution layer;
and the third determining unit is used for determining the handwriting prediction graph and the print prediction graph of the image to be detected based on the first feature graph, the second feature graph, the third feature graph and the fourth feature graph.
Optionally, the third determining unit is further configured to:
performing convolution processing on the fourth feature map by using a fifth convolution layer of the handwriting erasure model to obtain a fifth feature map;
performing upsampling and convolution processing on the fifth feature map, and performing channel fusion on the fifth feature map and the third feature map to obtain a sixth feature map;
performing upsampling and convolution processing on the sixth feature map, and performing channel fusion with the second feature map to obtain a seventh feature map;
and performing upsampling and convolution processing on the seventh characteristic diagram, and performing channel fusion on the seventh characteristic diagram and the first characteristic diagram to obtain a handwritten form prediction diagram and a print form prediction diagram of the image to be detected.
Optionally, the handwriting erasing apparatus further includes a training module, configured to obtain the handwriting erasure model through training;
wherein the training module is further configured to:
acquiring a training sample set, wherein the training sample set comprises sample images, and the sample images comprise sample handwritten form images and sample print form images;
training a model to be trained based on the training sample set to obtain a prediction result of the model to be trained on the sample image, wherein the prediction result comprises a sample handwriting prediction graph and a sample print prediction graph;
determining a loss function value of the model to be trained according to the sample handwritten form prediction graph, the sample handwritten form image, the sample print form prediction graph and the sample print form image;
and updating the network parameters of the model to be trained according to the loss function values to perform iterative training until the loss function values of the model to be trained are less than or equal to a preset value, so as to obtain the handwriting erasure model.
Optionally, the loss function values of the model to be trained comprise a first loss function value and a second loss function value,
wherein the first loss function value and the second loss function value are determined based on a sample print prediction image and a sample print image output by a model to be trained.
Optionally, the first loss function value is:

$L_{GAN} = \frac{1}{n}\sum_{i=1}^{n}(output_i - y_i)^2$

and the second loss function value is:

$L_{l1} = \frac{1}{n}\sum_{i=1}^{n}|output_i - y_i|$

wherein $L_{GAN}$ represents the first loss function value, $L_{l1}$ represents the second loss function value, $output_i$ represents the ith pixel point in the sample print prediction graph output by the model to be trained, $y_i$ represents the ith pixel point in the sample print image, and $n$ represents the number of pixels included in the sample print prediction graph or the sample print image.
Optionally, the loss function values of the model to be trained comprise a third loss function value and a fourth loss function value,
wherein the third loss function value is determined based on the sample handwriting prediction image output by the model to be trained and the sample handwriting image, and the fourth loss function value is determined based on the sample print prediction image and the sample handwriting prediction image output by the model to be trained.
Optionally, the third loss function value is:

$L_{GANmask} = 1 - \dfrac{2\sum_{i}(mask_i \cdot input_i)}{\sum_{i} mask_i + \sum_{i} input_i}$

and the fourth loss function value is:

$L_{mask} = 1 - \dfrac{2\sum_{i}(mask_i \cdot output_i)}{\sum_{i} mask_i + \sum_{i} output_i}$

wherein $L_{GANmask}$ represents the third loss function value, $L_{mask}$ represents the fourth loss function value, $input$ represents the sample handwriting image corresponding to the sample image, $mask$ represents the sample handwriting prediction graph output by the model to be trained, and $output$ represents the sample print prediction graph output by the model to be trained.
The handwriting erasing device provided by the embodiment of the present disclosure can execute any handwriting erasing method applicable to electronic equipment such as a terminal, and has the corresponding functional modules and beneficial effects of the executed method. For details not described in the device embodiments, reference may be made to the description of the corresponding method embodiments of the present disclosure.
An exemplary embodiment of the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor; when executed by the at least one processor, the computer program causes the electronic device to perform a handwriting erasing method or a training method of a handwriting erasure model according to an embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a handwriting-erasure method or a training method of a handwriting-erasure model according to an embodiment of the present disclosure.
Exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when being executed by a processor of a computer, is adapted to cause the computer to carry out a method of training a handwriting-erasure model or a method of handwriting-erasure according to embodiments of the present disclosure.
Referring to fig. 12, a block diagram of an electronic device 1100, which may be a server or a client of the present disclosure and which is an example of a hardware device applicable to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a random access memory (RAM) 1103. The RAM 1103 may also store various programs and data necessary for the operation of the device 1100. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to one another by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106, an output unit 1107, a storage unit 1108, and a communication unit 1109. The input unit 1106 may be any type of device capable of inputting information to the electronic device 1100; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 1107 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 1108 may include, but is not limited to, a magnetic disk or an optical disk. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers, and/or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1101 can be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1101 performs the respective methods and processes described above. For example, in some embodiments, the training method of the handwriting erasure model or the handwriting erasing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1100 via the ROM 1102 and/or the communication unit 1109. In some embodiments, the computing unit 1101 may be configured to perform the handwriting erasing method or the training method of the handwriting erasure model by any other suitable means (e.g., by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Claims (14)
1. A handwriting erasing method, comprising:
acquiring an image to be detected, wherein the image to be detected comprises print handwriting and handwritten handwriting;
inputting the image to be detected into a handwriting erasing model to obtain a handwriting prediction graph and a print prediction graph of the image to be detected, wherein the handwriting erasing model is trained based on sample images having a handwriting image and a print image;
and removing the handwritten handwriting remaining in the print prediction graph according to the handwriting prediction graph to obtain an image with the handwritten handwriting erased.
2. The method of claim 1, wherein removing the handwriting remaining in the print prediction graph according to the handwriting prediction graph to obtain the erased image of the handwriting comprises:
obtaining the prediction score of each pixel point in the handwritten form prediction graph;
determining a handwriting mask image based on the prediction score;
performing matrix multiplication on the handwritten form mask image and the print form prediction image to obtain a handwritten residual image;
filtering a handwriting residual area in the handwriting residual image to obtain a handwriting residual optimized image;
and determining the image after the handwriting is erased based on the handwriting residual image and the handwriting residual optimized image.
3. The method of claim 2, wherein determining the erased image of the handwritten script based on the handwritten residual image and the handwritten residual optimized image comprises:
and replacing the pixel values of the pixel points of the handwriting residual area in the print prediction graph with the pixel values of the corresponding pixel points in the handwriting residual optimization image to obtain an image with the handwriting erased.
4. The method of claim 2, wherein said determining a handwriting mask image based on said prediction score comprises:
comparing the prediction score of each pixel point in the handwritten form prediction graph with a preset threshold value, wherein the prediction score is used for representing the probability that the corresponding pixel point is handwritten;
and setting the pixel value of the pixel point with the prediction score larger than the preset threshold value as 1, and setting the pixel value of the pixel point with the prediction score not larger than the preset threshold value as 0 to obtain the handwriting mask image.
5. The method according to any one of claims 1 to 4, wherein inputting the image to be detected into the handwriting erasing model to obtain a handwriting prediction map and a print prediction map of the image to be detected comprises:
performing convolution and downsampling processing on an image to be detected by utilizing the first convolution layer of the handwriting erasure model to obtain a first characteristic diagram;
performing downsampling processing successively using a second convolutional layer, a third convolutional layer and a fourth convolutional layer of the handwriting-erasure model to obtain a second feature map, a third feature map and a fourth feature map, wherein the second feature map is obtained by performing convolution and downsampling processing on the first feature map by the second convolutional layer, the third feature map is obtained by performing convolution and downsampling processing on the second feature map by the third convolutional layer, and the fourth feature map is obtained by performing convolution and downsampling processing on the third feature map by the fourth convolutional layer;
and determining a handwriting prediction graph and a print prediction graph of the image to be detected based on the first feature graph, the second feature graph, the third feature graph and the fourth feature graph.
6. The method according to claim 5, wherein determining the handwriting prediction graph and the print prediction graph of the image to be detected based on the first feature graph, the second feature graph, the third feature graph and the fourth feature graph comprises:
performing convolution processing on the fourth feature map by using a fifth convolution layer of the handwriting erasure model to obtain a fifth feature map;
performing up-sampling and convolution processing on the fifth feature map, and performing channel fusion with the third feature map to obtain a sixth feature map;
performing upsampling and convolution processing on the sixth feature map, and performing channel fusion with the second feature map to obtain a seventh feature map;
and performing up-sampling and convolution processing on the seventh characteristic diagram, and performing channel fusion with the first characteristic diagram to obtain a handwriting prediction diagram and a print prediction diagram of the image to be detected.
7. The method of any of claims 1-4, wherein the handwriting-erasure model is trained based on:
acquiring a training sample set, wherein the training sample set comprises sample images, and the sample images comprise sample handwritten form images and sample print form images;
training a model to be trained on the basis of the training sample set to obtain a prediction result of the model to be trained on the sample image, wherein the prediction result comprises a sample handwriting prediction graph and a sample print prediction graph;
determining a loss function value of the model to be trained according to the sample handwritten form prediction graph, the sample handwritten form image, the sample print form prediction graph and the sample print form image;
and updating the network parameters of the model to be trained according to the loss function values to carry out iterative training until the loss function values of the model to be trained are less than or equal to a preset value, so as to obtain the handwriting erasure model.
8. The method of claim 7, wherein the loss function values of the model to be trained comprise a first loss function value and a second loss function value,
wherein the first loss function value and the second loss function value are determined based on a sample print prediction image and a sample print image output by the model to be trained.
9. The method of claim 8, wherein the first loss function value is:

$L_{GAN} = \frac{1}{n}\sum_{i=1}^{n}(output_i - y_i)^2$

and the second loss function value is:

$L_{l1} = \frac{1}{n}\sum_{i=1}^{n}|output_i - y_i|$

wherein $L_{GAN}$ represents the first loss function value, $L_{l1}$ represents the second loss function value, $output_i$ represents the ith pixel point in the sample print prediction graph output by the model to be trained, $y_i$ represents the ith pixel point in the sample print image, and $n$ represents the number of pixels included in the sample print prediction graph or the sample print image.
10. The method of claim 7, wherein the loss function values of the model to be trained comprise a third loss function value and a fourth loss function value,
wherein the third loss function value is determined based on the sample handwriting prediction image output by the model to be trained and the sample handwriting image, and the fourth loss function value is determined based on the sample print prediction image and the sample handwriting prediction image output by the model to be trained.
11. The method of claim 10, wherein the third loss function value is:

$L_{GANmask} = 1 - \dfrac{2\sum_{i}(mask_i \cdot input_i)}{\sum_{i} mask_i + \sum_{i} input_i}$

and the fourth loss function value is:

$L_{mask} = 1 - \dfrac{2\sum_{i}(mask_i \cdot output_i)}{\sum_{i} mask_i + \sum_{i} output_i}$

wherein $L_{GANmask}$ represents the third loss function value, $L_{mask}$ represents the fourth loss function value, $input$ represents the sample handwriting image corresponding to the sample image, $mask$ represents the sample handwriting prediction graph output by the model to be trained, and $output$ represents the sample print prediction graph output by the model to be trained.
12. A handwriting erasing apparatus, comprising:
the image acquisition module is used for acquiring an image to be detected, wherein the image to be detected comprises print handwriting and handwritten handwriting;
the input module is used for inputting the image to be detected into the handwriting erasing model so as to obtain a handwriting prediction graph and a print prediction graph of the image to be detected, wherein the handwriting erasing model is trained based on sample images having a handwriting image and a print image;
and the processing module is used for eliminating the handwriting remained in the print prediction graph according to the handwriting prediction graph to obtain an image of the erased handwriting.
13. An electronic device, comprising:
a processor; and
a memory for storing a program, wherein the program is stored in the memory,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the handwriting erasing method of any of claims 1-11.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the handwriting erasing method of any of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210660738.5A (granted as CN115019323B) | 2022-06-13 | 2022-06-13 | Handwriting erasing method and device, electronic equipment and storage medium
Publications (2)
Publication Number | Publication Date
---|---
CN115019323A | 2022-09-06
CN115019323B | 2024-07-19
Patent Citations (5)

Publication number | Priority date | Publication date | Title
---|---|---|---
CN111488881A | 2020-04-10 | 2020-08-04 | Method, device and storage medium for removing handwritten content in text image
CN111626284A | 2020-05-26 | 2020-09-04 | Method and device for removing handwritten fonts, electronic equipment and storage medium
CN114255242A | 2021-11-17 | 2022-03-29 | Image processing method, image processing device, electronic equipment and storage medium
CN114399771A | 2021-12-09 | 2022-04-26 | Handwriting font removing method based on convolution self-adaptive denoising
CN114332150A | 2021-12-28 | 2022-04-12 | Handwriting erasing method, device, equipment and readable storage medium
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant