CN112364838A - Method for improving handwriting OCR performance by utilizing synthesized online text image
- Publication number: CN112364838A
- Application number: CN202011429519.3A
- Authority: CN (China)
- Prior art keywords: content, style, image, data set, encoder
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06V30/226: Character recognition characterised by the type of writing of cursive writing
- G06V30/10: Character recognition
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045: Combinations of networks
- G06N3/08: Learning methods
- G06T3/04: Context-preserving transformations, e.g. by using an importance map
Abstract
The invention provides a method for improving handwritten OCR (optical character recognition) performance by utilizing synthesized online text images, comprising the following steps: step S1, selecting and dividing a data set, adopting the IAM data set, which comprises the IAM handwriting data set and the IAM online handwriting data set; step S2, constructing the generator of the style GAN network, the generator comprising three parts: a content encoder, a content decoder and a style encoder; step S3, training the generator of the network; and step S4, synthesizing text images from the online data set with the trained generator network model. The handwritten images generated by this framework can effectively improve OCR recognition accuracy and provide a feasible alternative to collecting and constructing a large-scale handwriting data set.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a method for improving handwriting OCR performance by utilizing synthesized online text images.
Background
Generative adversarial networks (GANs) have become a popular research direction in the field of deep learning. A GAN actually contains two networks: a generator network (Generator) and a discriminator network (Discriminator). Either network may be any kind of neural network, from convolutional and recurrent networks to autoencoders. In this configuration, the two networks play a competitive game, each trying to outdo the other while completing its own task. After thousands of iterations, if everything goes well, the generator can produce convincing fake images, and the discriminator can reliably judge whether an image is real or fake. The core of a GAN is the generator; the discriminator exists mainly to introduce the adversarial training through which the generator network learns to produce high-quality pictures. During training the generator aims to produce pictures so realistic that the discriminator cannot tell whether a picture is real or generated, while the discriminator aims to separate real pictures from fake ones as well as possible: the generator wants the discriminator's error rate maximized, the discriminator wants it minimized, and the two improve by competing against each other.
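By way of illustration, a minimal adversarial training step of this kind might be sketched in PyTorch as follows; the generator `G`, a discriminator `D` ending in a sigmoid, the optimizers and the inputs are assumptions for the sketch, not the specific networks of the invention:

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real, noise):
    """One adversarial round: D learns real-vs-fake, G learns to fool D."""
    # --- discriminator update: push D(real) toward 1 and D(G(noise)) toward 0 ---
    fake = G(noise).detach()                       # detach: do not backprop into G here
    pred_real, pred_fake = D(real), D(fake)
    d_loss = F.binary_cross_entropy(pred_real, torch.ones_like(pred_real)) \
           + F.binary_cross_entropy(pred_fake, torch.zeros_like(pred_fake))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # --- generator update: make D predict "real" on freshly generated images ---
    pred = D(G(noise))
    g_loss = F.binary_cross_entropy(pred, torch.ones_like(pred))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```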
In the field of handwriting OCR, an ideal training set for a deep-learning-based handwriting OCR engine should cover various writing styles, background and lighting variations, all vocabulary likely to occur, and so on. However, acquiring such a training set is time-consuming and labor-intensive, and this in some cases severely limits handwriting OCR recognition accuracy; hence the importance of synthesizing handwritten images. Thanks to the recent development of generative adversarial networks, many scholars have proposed methods for generating handwritten-style text lines from text or from printed-style text lines. However, the handwriting styles generated by these methods are still not rich enough and lack the natural variability of handwritten characters. Another approach to handwritten text line generation is to convert online handwritten data into offline images. Online data can be conveniently acquired with devices such as mobile phones and writing tablets, is available in large quantities, and is varied in style; if it can be converted into realistic offline handwritten images, it can greatly assist the training of handwriting OCR.
Disclosure of Invention
The invention aims to provide a method for improving handwriting OCR performance by utilizing synthesized online text images, so as to ensure the recognition accuracy of handwriting OCR (optical character recognition) by converting online handwritten data into realistic offline handwritten images for assisting OCR training.
To achieve the above object, the present invention provides a method for improving the performance of handwritten OCR by using a synthesized online text image, comprising the steps of:
Step S1, selecting and dividing a data set: the IAM data set is adopted, comprising the IAM handwriting data set and the IAM online handwriting data set; the real image I_gt is processed to obtain the real handwritten image I_sty, which is stored in the IAM handwriting data set.
Step S2, constructing the generator of the style GAN network, the generator comprising a content encoder, a content decoder and a style encoder.
Step S3, training the generator of the GAN network: the real handwritten image I_sty is skeletonized to obtain the skeleton map I_ske; the skeleton map I_ske is then input into the content encoder, which extracts content features and outputs a feature map to the content decoder; the real handwritten image I_sty is input into the style encoder, which extracts style features and outputs a 512-dimensional style vector s obtained through global pooling; the content decoder receives the feature map from the content encoder together with the components obtained by affine transformation of the style vector s from the style encoder, changes the distribution at the feature-map level via the AdaIN operation so as to blend the style information into the feature map, and outputs the composite image I_syn.
Step S4, synthesizing text images from the online data set with the trained generator: the IAM online handwritten data are converted into skeleton maps, a test set selected from the IAM handwriting data set serves as style images, and both are input together into the generator to generate offline handwritten images.
Further, in step S2, the content encoder follows the network structure of VGG-19 and is composed of pooling layers and convolutional layers that down-sample the input; the content encoder comprises five convolutional-layer modules and three fully connected layers.
Further, in step S3, AdaIN changes the data distribution at the feature-map level, and the effect of style transfer can be achieved by controlling the affine parameters of the AdaIN layer; the AdaIN layer changes the distribution of the feature maps inside the network, handing the style-transfer task over to AdaIN while the remaining tasks are realized in the network structure. AdaIN operates as follows:

AdaIN(X, s) = V_σ · (X - μ(X)) / σ(X) + V_μ

where X denotes the encoded feature map of the content picture, V_σ and V_μ are obtained from the style vector s by affine transformation, and σ(X) and μ(X) denote the standard deviation and mean of X.
Further, in step S2, the content decoder has a structure similar to the decoder part of U-Net and symmetrical to that of the content encoder, generating a composite handwritten text picture of the same size as the input using multiple convolution layers and bilinear upsampling layers; the content decoder comprises five convolutional-layer modules.
Further, in step S3, for a skeleton map I_ske of height H and width W, the content encoder converts it into a feature map; the input of the style encoder is the real handwritten image I_sty, and its output is the 512-dimensional style vector s obtained through global pooling; the content decoder generates an image I_syn of height H and width W using multiple convolution layers and bilinear upsampling layers; the style vector s output by the style encoder adjusts the feature maps of the intermediate layers of the content decoder via AdaIN, so that the style is blended into the final output I_syn of the content decoder.
Further, in step S3, three loss functions are adopted, comprising a content loss function, a perceptual loss function and an adversarial loss function, so that the composite picture generated by the trained generator is as realistic as possible and carries the style of the corresponding real picture.
The content loss function measures the pixel-level difference between the synthesized text image and the real image in order to optimize the parameters, where the image synthesized by the generator is I_syn and the real handwritten image is I_gt, written here as a pixel-wise L1 penalty:

L_content = || I_syn - I_gt ||_1

The perceptual loss function alleviates the over-smoothing of the generated image caused by the content loss function. A VGG-19 network pre-trained on ImageNet is used; denoting by φ_i the feature space after the ReLU activation following the first convolutional layer of each of the five convolution modules, the feature difference between the composite image I_syn and the real image I_gt is computed as:

L_per = Σ_{i=1..5} α_i · MSE(φ_i(I_syn), φ_i(I_gt))

where MSE denotes the mean-squared-error function and α_i takes the values 1/32, 1/16, 1/8, 1/4 and 1, respectively.
The adversarial loss function adopts the PatchGAN approach: the discriminator of the original GAN is replaced by a fully convolutional network. The PatchGAN discriminator classifies small receptive-field regions of the input image, which makes the model focus on image details; the stacked convolution layers finally output an N × N matrix, each element of which corresponds to a larger receptive field, i.e. a patch, in the original image, specifically:

L_adv = E[ log D(I_gt) ] + E[ log(1 - D(I_syn)) ]

where D denotes the discriminator network, which outputs the probability that each patch of its input is real.
Compared with the prior art, the invention has the following beneficial effects: a GAN framework is provided that adopts an encoder-decoder structure and uses a style encoder to extract style features from a real handwritten image as conditional input to the decoder; the method is good at preserving high-resolution, high-detail images and can improve the fidelity of local regions of the generated images; training an OCR model with the generated synthetic images as real handwritten text images greatly improves accuracy; once the generator network structure is trained, the resulting generator model can convert online handwritten data into realistic offline handwritten images that assist OCR training and further improve the recognition accuracy of the OCR model; the handwritten images generated by this framework effectively improve OCR recognition accuracy and provide a feasible alternative to collecting and constructing a large-scale handwriting data set.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a block diagram of the style GAN network of the present invention;
FIG. 3 is a diagram of the training process of the GAN network of the present invention;
fig. 4 is a structural diagram of a generator in the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Referring to FIGS. 1-4, the present invention provides a method for improving handwritten OCR performance using synthesized online text images, comprising the following steps:
Step S1, selecting and dividing the data set: the IAM data set is adopted, comprising the IAM handwriting data set and the IAM online handwriting data set. In the IAM handwriting data set there are 6,161 offline handwritten text lines in the training set, with height normalized to H = 96 and width set to W = 3H; the validation set contains 1,840 text lines and the test set 1,861 lines; the IAM online handwriting data set contains 13,049 text lines. The real image I_gt is processed to obtain the real handwritten image I_sty, which is stored in the IAM handwriting data set.
Step S2, constructing the generator of the style GAN network, the generator comprising three parts: a content encoder, a content decoder and a style encoder. The content encoder comprises five convolutional-layer modules and three fully connected layers. The first convolutional module conv1 contains two convolutional layers with kernels of (3 × 64), each followed by a ReLU activation, and a pooling layer between adjacent convolutional modules performs down-sampling; the second convolutional module conv2 contains two convolutional layers (3 × 128), each followed by a ReLU activation; the third convolutional module contains four convolutional layers (3 × 256), each followed by a ReLU activation, with a pooling layer after the last activation for down-sampling; the fourth convolutional module contains four convolutional layers (3 × 512), each followed by a ReLU activation, with a pooling layer after the last activation for down-sampling; the fifth convolutional module contains four convolutional layers with kernels of (3 × 512), the last followed by a pooling layer, which is connected in sequence to a fully connected layer FC4096, a ReLU activation, a fully connected layer FC4096, a ReLU activation, a fully connected layer FC1000 and a softmax classifier that outputs the result. The skeletonized skeleton map serves as input; the content encoder maps the image space into a feature space to extract the content features of the handwritten text picture, forming a feature map that is then input into the content decoder.
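A non-authoritative sketch of such a VGG-19-style content encoder follows; it assumes "(3 × 64)" means 3 × 3 kernels with 64 output channels, follows VGG-19's pooling placement, and omits the fully connected classification head since the generator consumes the feature map:

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 convolutions, each followed by ReLU, then a 2x2 max-pool for down-sampling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ContentEncoder(nn.Module):
    """VGG-19-style encoder: five conv modules that down-sample the skeleton map."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            vgg_block(1,   64, 2),    # conv1: two 3x3x64 layers
            vgg_block(64, 128, 2),    # conv2: two 3x3x128 layers
            vgg_block(128, 256, 4),   # conv3: four 3x3x256 layers
            vgg_block(256, 512, 4),   # conv4: four 3x3x512 layers
            vgg_block(512, 512, 4),   # conv5: four 3x3x512 layers
        )

    def forward(self, skeleton):      # (B, 1, H, W) skeleton map -> content feature map
        return self.features(skeleton)
```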
For the style encoder, its structure is identical to that of the content encoder, except that it receives the real handwritten image I_sty and outputs the 512-dimensional style vector s obtained through global pooling; the style vector s adjusts the feature maps of the intermediate layers of the content decoder through the AdaIN operation, so that the style features extracted by the style encoder can be combined with the content features. AdaIN changes the data distribution at the feature-map level, and the effect of style transfer can be achieved by controlling the affine parameters of the AdaIN layer. Because the AdaIN layer changes the distribution of the feature maps inside the network, the style-transfer task can be handed over to AdaIN while the remaining tasks are realized in the network structure. AdaIN operates as follows:
AdaIN(X, s) = V_σ · (X - μ(X)) / σ(X) + V_μ

where X represents the encoded feature map of the content picture; V_σ and V_μ are obtained by affine transformation of the style vector s, with different affine parameters for different layers of the content decoder; and σ(X) and μ(X) represent the standard deviation and mean of X.
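A minimal sketch of the AdaIN operation and the per-layer affine transform of s (function and class names are hypothetical):

```python
import torch.nn as nn

def adain(x, v_sigma, v_mu, eps=1e-5):
    """AdaIN(X, s) = V_sigma * (X - mu(X)) / sigma(X) + V_mu, with channel-wise statistics."""
    mu = x.mean(dim=(2, 3), keepdim=True)
    sigma = x.std(dim=(2, 3), keepdim=True) + eps    # eps guards against flat feature maps
    return v_sigma * (x - mu) / sigma + v_mu

class StyleAffine(nn.Module):
    """Affine transform of the 512-d style vector s into (V_sigma, V_mu) for one decoder layer."""
    def __init__(self, style_dim, num_channels):
        super().__init__()
        self.fc = nn.Linear(style_dim, num_channels * 2)

    def forward(self, s):
        v = self.fc(s)                               # (B, 2C); a separate affine per decoder layer
        v_sigma, v_mu = v.chunk(2, dim=1)
        return v_sigma[:, :, None, None], v_mu[:, :, None, None]
```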
For the content decoder, its structure is similar to the decoder part of U-Net and symmetrical to that of the content encoder, generating a composite handwritten text picture of the same size as the input using multiple convolution layers and bilinear upsampling layers. It comprises five convolutional-layer modules: the first convolutional module conv1 contains four convolutional layers with kernels of (3 × 512), each followed by a ReLU activation, with an upsampling layer between adjacent convolutional modules to enlarge the image; the second convolutional module conv2 contains four convolutional layers (3 × 512), each followed by a ReLU activation; the third convolutional module contains four convolutional layers (3 × 256), each followed by a ReLU activation; the fourth convolutional module has two convolutional layers with kernels of (3 × 128); the fifth convolutional module has two convolutional layers of (3 × 64). Each convolutional layer is followed by an activation function, and an upsampling layer between adjacent convolutional modules enlarges the image, with upsampling performed by the bilinear method. At each scale the output feature map receives the components obtained from the style encoder by affine transformation of the style vector s; the AdaIN operation changes the distribution at the feature-map level, blending the style features from the style encoder into the content feature map.
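One decoder stage of this kind might be sketched as follows, reusing the `adain` and `StyleAffine` helpers from the sketch above (the exact placement of the AdaIN injection within each module is an assumption):

```python
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """One content-decoder stage: bilinear 2x upsampling, 3x3 convolutions, AdaIN style injection."""
    def __init__(self, in_ch, out_ch, n_convs, style_dim=512):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1)
             for i in range(n_convs)])
        self.affine = StyleAffine(style_dim, out_ch)   # per-layer affine of the style vector s

    def forward(self, x, s):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        for conv in self.convs:
            x = F.relu(conv(x))
        v_sigma, v_mu = self.affine(s)
        return adain(x, v_sigma, v_mu)                 # blend the style into the feature map
```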
Step S3, training the generator of the GAN network: through the generator, the skeleton map is converted into a realistic handwritten image that carries the style of the real picture while keeping the same content. To train the generator, the real image is skeletonized to obtain its skeleton map I_ske; the skeleton map I_ske is input into the content encoder for content feature extraction, the real handwritten image I_sty is input into the style encoder to extract its style features, and the output of the content encoder is then fed to the content decoder. For a skeleton map I_ske of height H and width W, the input of the style encoder is the real handwritten image I_sty and its output is the 512-dimensional style vector s obtained through global pooling; the content decoder generates an image I_syn of height H and width W using multiple 3 × 3 convolution layers and bilinear upsampling layers, and the style vector s output by the style encoder adjusts the feature maps of the intermediate layers of the content decoder via AdaIN so as to blend the style into the final output I_syn of the content decoder.
The skeleton map I_ske and the real handwritten image I_sty form the paired training data I_ske-I_sty; sample pairs from a validation set and a test set extracted from the IAM handwriting data set are used to check the quality of the composite pictures produced by the generator network model, so as to train the generator.
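A sketch of how one such training pair could be built with standard skeletonization (the binarization threshold is an assumption; the patent does not specify the preprocessing):

```python
import numpy as np
from skimage.morphology import skeletonize

def make_pair(gray_img, threshold=128):
    """Build one (I_ske, I_sty) training pair from a real handwritten line image (uint8, HxW)."""
    ink = gray_img < threshold                        # binarize: dark strokes on light paper
    ske = skeletonize(ink)                            # one-pixel-wide stroke skeleton
    i_ske = np.where(ske, 0, 255).astype(np.uint8)    # skeleton map: black strokes on white
    return i_ske, gray_img                            # (content input, style/ground-truth image)
```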
Three loss functions are adopted in the training experiments of the invention: a content loss function, a perceptual loss function and an adversarial loss function. The content loss function measures the pixel-level difference between the synthesized text image and the real image in order to optimize the parameters, where the image synthesized by the generator is I_syn and the real image is I_gt, written here as a pixel-wise L1 penalty:

L_content = || I_syn - I_gt ||_1
The perceptual loss function alleviates the over-smoothing of the generated image caused by the content loss function. A VGG-19 network pre-trained on ImageNet is used; denoting by φ_i the feature space after the ReLU activation following the first convolutional layer of each of the five convolution modules, the feature difference between the composite image I_syn and the real image I_gt is computed as:

L_per = Σ_{i=1..5} α_i · MSE(φ_i(I_syn), φ_i(I_gt))

where MSE denotes the mean-squared-error function and α_i takes the values 1/32, 1/16, 1/8, 1/4 and 1, respectively.
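A sketch of this weighted perceptual loss, assuming torchvision's layer indexing for VGG-19 (relu1_1 through relu5_1 at feature indices 1, 6, 11, 20, 29) and 3-channel inputs (a grayscale handwriting image would be replicated across channels):

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

_RELU_IDX = [1, 6, 11, 20, 29]          # ReLU after the first conv of each VGG-19 module
_ALPHA = [1/32, 1/16, 1/8, 1/4, 1.0]    # per-level weights from the description

class PerceptualLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.vgg = vgg19(weights="IMAGENET1K_V1").features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)      # VGG-19 is a frozen feature extractor

    def forward(self, i_syn, i_gt):
        loss, x, y = 0.0, i_syn, i_gt
        for i, layer in enumerate(self.vgg[:_RELU_IDX[-1] + 1]):
            x, y = layer(x), layer(y)
            if i in _RELU_IDX:           # compare features at relu{1..5}_1
                loss = loss + _ALPHA[_RELU_IDX.index(i)] * F.mse_loss(x, y)
        return loss
```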
The adversarial loss function adopts the PatchGAN approach: the discriminator of the original GAN is replaced by a fully convolutional network. Whereas an ordinary GAN discriminator outputs a single real-or-fake score, the PatchGAN discriminator classifies small receptive-field regions of the input image, so that training makes the model pay more attention to image details. The stacked convolution layers finally output an N × N matrix, each element of which corresponds to a larger receptive field, i.e. a patch, in the original image; this is good at preserving high-resolution, high-detail images and improves the fidelity of local regions of the generated image. Specifically:

L_adv = E[ log D(I_gt) ] + E[ log(1 - D(I_syn)) ]

where D denotes the discriminator network, which outputs the probability that each patch of its input is real.
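A sketch of a PatchGAN-style fully convolutional discriminator (channel widths and normalization are assumptions following the common 70 × 70 PatchGAN layout, not values stated in the patent):

```python
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator: outputs an N x N map of per-patch real/fake scores."""
    def __init__(self, in_ch=1, base=64):
        super().__init__()
        def block(cin, cout, stride):
            return [nn.Conv2d(cin, cout, 4, stride, 1),
                    nn.InstanceNorm2d(cout),
                    nn.LeakyReLU(0.2, inplace=True)]
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            *block(base, base * 2, 2),
            *block(base * 2, base * 4, 2),
            *block(base * 4, base * 8, 1),
            nn.Conv2d(base * 8, 1, 4, 1, 1),   # one logit per receptive-field patch
        )

    def forward(self, x):
        # (B, 1, N, N) patch logits; a sigmoid (or BCE-with-logits) turns them into probabilities
        return self.net(x)
```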
Step S4, synthesizing text images from the online data set with the trained generator: the IAM online handwriting data are converted into skeleton maps, a test set selected from the IAM handwriting data set serves as style images, and both are input together into the generator, which produces highly realistic offline handwritten images. Experiments show that training an OCR model with the generated synthetic images as real handwritten text images greatly improves accuracy compared with images synthesized from online data by other algorithms, and that training the OCR model on the generated images together with the real images further improves its recognition accuracy.
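The inference path might be sketched as follows; the attribute names on `generator` and the `rasterize_strokes` helper are hypothetical stand-ins for the components described above:

```python
import torch

@torch.no_grad()
def synthesize(generator, rasterize_strokes, strokes, style_image):
    """Render online pen-stroke data as an offline handwritten image (inference sketch)."""
    i_ske = rasterize_strokes(strokes)                        # pen trajectory -> skeleton map (uint8)
    i_ske = torch.from_numpy(i_ske).float()[None, None] / 255.0
    s = generator.style_encoder(style_image)                  # 512-d style vector from a test-set image
    feat = generator.content_encoder(i_ske)
    return generator.content_decoder(feat, s)                 # offline handwritten image I_syn
```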
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A method for improving handwritten OCR performance using a composite online text image, comprising the steps of:
step S1, selecting and dividing a data set: an IAM data set is adopted, comprising an IAM handwriting data set and an IAM online handwriting data set; the real image I_gt is processed to obtain the real handwritten image I_sty, which is stored in the IAM handwriting data set;
step S2, constructing a generator of the style GAN network, wherein the generator comprises a content encoder, a content decoder and a style encoder;
step S3, training the generator of the GAN network: the real handwritten image I_sty is skeletonized to obtain the skeleton map I_ske; the skeleton map I_ske is then input into the content encoder, which extracts content features and outputs a feature map to the content decoder; the real handwritten image I_sty is input into the style encoder, which extracts style features and outputs a 512-dimensional style vector s obtained through global pooling; the content decoder receives the feature map from the content encoder together with the components obtained by affine transformation of the style vector s from the style encoder, changes the distribution at the feature-map level via the AdaIN operation so as to blend the style information into the feature map, and outputs the composite image I_syn;
Step S4, synthesizing the text image in the online data set through the trained generator: and converting the IAM online handwritten data into a skeleton diagram, selecting a test set from the IAM handwritten data set as a style diagram, inputting the test set into a generator together, and generating an offline handwritten image.
2. The method of claim 1, wherein in step S2, the content encoder follows the network structure of VGG-19 and is composed of pooling layers and convolutional layers that down-sample the input; the content encoder comprises five convolutional-layer modules and three fully connected layers.
3. The method of claim 1, wherein in step S3, the skeleton map I_ske and the real handwritten image I_sty form the paired training data I_ske-I_sty; sample pairs from a validation set and a test set extracted from the IAM handwriting data set are used to check the quality of the composite pictures produced by the generator network model, so as to train the generator.
4. The method of claim 1, wherein in step S3, AdaIN changes the data distribution at the feature-map level, and the effect of style transfer can be achieved by controlling the affine parameters of the AdaIN layer; the AdaIN layer changes the distribution of the feature maps inside the network, handing the style-transfer task over to AdaIN while the remaining tasks are realized in the network structure, AdaIN operating as follows:

AdaIN(X, s) = V_σ · (X - μ(X)) / σ(X) + V_μ

where X represents the encoded feature map of the content picture; V_σ and V_μ are obtained by affine transformation of the style vector s, with different affine parameters for different layers of the content decoder; and σ(X) and μ(X) represent the standard deviation and mean of X.
5. The method of claim 1, wherein in step S2, the content decoder has a structure similar to the decoder part of U-Net and symmetrical to that of the content encoder, generating a composite handwritten text picture of the same size as the input using multiple convolution layers and bilinear upsampling layers; the content decoder comprises five convolutional-layer modules.
6. The method of claim 1, wherein in step S3, for a skeleton map I_ske of height H and width W, the input of the style encoder is the real handwritten image I_sty and its output is the 512-dimensional style vector s obtained through global pooling; the content decoder generates an image I_syn of height H and width W using multiple 3 × 3 convolution layers and bilinear upsampling layers, and the style vector s output by the style encoder adjusts the feature maps of the intermediate layers of the content decoder via AdaIN so as to blend the style into the final output I_syn of the content decoder.
7. The method of claim 1, wherein in step S3, three loss functions are adopted, comprising a content loss function, a perceptual loss function and an adversarial loss function, so that the composite picture generated by the trained generator is as realistic as possible and carries the style of the corresponding real picture;
the content loss function measures the pixel-level difference between the synthesized text image and the real image in order to optimize the parameters, where the image synthesized by the generator is I_syn and the real image is I_gt, written here as a pixel-wise L1 penalty:

L_content = || I_syn - I_gt ||_1

the perceptual loss function alleviates the over-smoothing of the generated image caused by the content loss function; a VGG-19 network pre-trained on ImageNet is used, and denoting by φ_i the feature space after the ReLU activation following the first convolutional layer of each of the five convolution modules, the feature difference between the composite image I_syn and the real image I_gt is computed as:

L_per = Σ_{i=1..5} α_i · MSE(φ_i(I_syn), φ_i(I_gt))

where MSE denotes the mean-squared-error function and α_i takes the values 1/32, 1/16, 1/8, 1/4 and 1, respectively;
the adversarial loss function adopts the PatchGAN approach: the discriminator of the original GAN is replaced by a fully convolutional network, and the PatchGAN discriminator classifies small receptive-field regions of the input image, making the model focus on image details; the stacked convolution layers finally output an N × N matrix, each element of which corresponds to a larger receptive field, i.e. a patch, in the original image, specifically:

L_adv = E[ log D(I_gt) ] + E[ log(1 - D(I_syn)) ]

where D denotes the discriminator network, which outputs the probability that each patch of its input is real.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011429519.3A CN112364838B (en) | 2020-12-09 | 2020-12-09 | Method for improving handwriting OCR performance by utilizing synthesized online text image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112364838A true CN112364838A (en) | 2021-02-12 |
CN112364838B CN112364838B (en) | 2023-04-07 |
Family
ID=74536790
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011429519.3A Active CN112364838B (en) | 2020-12-09 | 2020-12-09 | Method for improving handwriting OCR performance by utilizing synthesized online text image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112364838B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200134415A1 (en) * | 2018-10-30 | 2020-04-30 | Huawei Technologies Co., Ltd. | Autoencoder-Based Generative Adversarial Networks for Text Generation |
WO2020140421A1 (en) * | 2019-01-03 | 2020-07-09 | Boe Technology Group Co., Ltd. | Computer-implemented method of training convolutional neural network, convolutional neural network, computer-implemented method using convolutional neural network, apparatus for training convolutional neural network, and computer-program product |
CN110443864A (en) * | 2019-07-24 | 2019-11-12 | 北京大学 | A kind of characters in a fancy style body automatic generation method based on single phase a small amount of sample learning |
CN111753493A (en) * | 2019-09-29 | 2020-10-09 | 西交利物浦大学 | Style character generation method containing multiple normalization processes based on small amount of samples |
CN110992252A (en) * | 2019-11-29 | 2020-04-10 | 北京航空航天大学合肥创新研究院 | Image multi-format conversion method based on latent variable feature generation |
CN111402179A (en) * | 2020-03-12 | 2020-07-10 | 南昌航空大学 | Image synthesis method and system combining countermeasure autoencoder and generation countermeasure network |
Non-Patent Citations (2)
Title |
---|
JUNYANG CAI ET AL: "TH-GAN: Generative Adversarial Network based Transfer Learning for Historical Chinese Character Recognition", 2019 International Conference on Document Analysis and Recognition (ICDAR) * |
XIAO FENXI: "Chinese Character Font Style Transfer Based on Deep Learning", China Masters' Theses Full-text Database * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113190161A (en) * | 2021-04-25 | 2021-07-30 | 无锡乐骐科技有限公司 | Electronic writing exercise method based on convolutional neural network |
CN113516136A (en) * | 2021-07-09 | 2021-10-19 | 中国工商银行股份有限公司 | Handwritten image generation method, model training method, device and equipment |
CN113554549A (en) * | 2021-07-27 | 2021-10-26 | 深圳思谋信息科技有限公司 | Text image generation method and device, computer equipment and storage medium |
CN114973279A (en) * | 2022-06-17 | 2022-08-30 | 北京百度网讯科技有限公司 | Training method and device for handwritten text image generation model and storage medium |
CN114973279B (en) * | 2022-06-17 | 2023-02-17 | 北京百度网讯科技有限公司 | Training method and device for handwritten text image generation model and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112364838B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112364838B (en) | Method for improving handwriting OCR performance by utilizing synthesized online text image | |
Jiang et al. | Scfont: Structure-guided chinese font generation via deep stacked networks | |
CN109635883B (en) | Chinese character library generation method based on structural information guidance of deep stack network | |
CN111340122B (en) | Multi-modal feature fusion text-guided image restoration method | |
CN108537733B (en) | Super-resolution reconstruction method based on multi-path deep convolutional neural network | |
CN111179167B (en) | Image super-resolution method based on multi-stage attention enhancement network | |
CN107679462B (en) | Depth multi-feature fusion classification method based on wavelets | |
CN110276354B (en) | High-resolution streetscape picture semantic segmentation training and real-time segmentation method | |
CN113096017B (en) | Image super-resolution reconstruction method based on depth coordinate attention network model | |
Peyrard et al. | ICDAR2015 competition on text image super-resolution | |
CN110163801A (en) | A kind of Image Super-resolution and color method, system and electronic equipment | |
CN110853039B (en) | Sketch image segmentation method, system and device for multi-data fusion and storage medium | |
CN110070517A (en) | Blurred picture synthetic method based on degeneration imaging mechanism and generation confrontation mechanism | |
CN114581905B (en) | Scene text recognition method and system based on semantic enhancement mechanism | |
CN113378949A (en) | Dual-generation confrontation learning method based on capsule network and mixed attention | |
CN107578455A (en) | Arbitrary dimension sample texture synthetic method based on convolutional neural networks | |
US20230162409A1 (en) | System and method for generating images of the same style based on layout | |
CN114626984A (en) | Super-resolution reconstruction method for Chinese text image | |
Li et al. | Line drawing guided progressive inpainting of mural damages | |
CN116935043A (en) | Typical object remote sensing image generation method based on multitasking countermeasure network | |
CN109766918A (en) | Conspicuousness object detecting method based on the fusion of multi-level contextual information | |
CN116152374A (en) | Chinese character font generating method | |
CN116703725A (en) | Method for realizing super resolution for real world text image by double branch network for sensing multiple characteristics | |
CN115713462A (en) | Super-resolution model training method, image recognition method, device and equipment | |
CN117292017B (en) | Sketch-to-picture cross-domain synthesis method, system and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |