CN113901952A - Print form and handwritten form separated character recognition method based on deep learning - Google Patents
- Publication number
- CN113901952A (publication of application CN202111309350.2A)
- Authority
- CN
- China
- Prior art keywords
- model
- picture
- handwriting
- handwritten
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Character Discrimination (AREA)
Abstract
A deep-learning-based method for recognizing text in document pictures that contain both printed and handwritten characters. The method comprises the following steps: after preprocessing, a deep learning model classifies the print and the handwriting; a print picture and a handwriting picture are then produced from the classification result, and character recognition is finally performed on each picture separately. The method achieves separate recognition of printed and handwritten text in a single document picture, is fully automatic, intelligent, and adaptive, and requires no manually set parameters.
Description
Technical Field
The invention relates to a deep-learning-based method for separate recognition of printed and handwritten characters, and belongs to the field of artificial intelligence and computer vision.
Background
Text recognition on document pictures is a relatively mature technology, but when a document contains both printed and handwritten text, existing techniques recognize the two together and cannot separate them. Recognizing print and handwriting separately in such documents has many important applications, such as character recognition on bank notes, automatic grading of student test papers, and conversion of litigation documents into electronic files. The key technical problem is the classification of print versus handwriting, which can be stated as follows: for a picture containing both, achieve pixel-level classification of the picture into print, handwriting, and background. Traditional machine-vision methods struggle with such pixel-level classification, especially where print and handwriting overlap. Semantic segmentation techniques from deep learning can perform pixel-level classification and solve this problem well; popular algorithms include the fully convolutional network (FCN) and FCN variants with dilated (atrous) convolution.
Disclosure of Invention
To achieve separate recognition of printed and handwritten characters, the invention adopts the following technical scheme:
A deep-learning-based print/handwriting-separated character recognition method, comprising the following steps.
Step (1): produce a training sample data set, comprising the following sub-steps:
(1.1) prepare a sheet of paper on which the printed text is black and the blank areas are white;
(1.2) write on the paper with a red pen;
(1.3) photograph the paper carrying the written characters and record the resulting picture as picture a; alternatively, the paper may be scanned with a scanner;
(1.4) preprocess picture a with an algorithm program, the preprocessing comprising border cropping and binarization into a black-and-white image; record the result as picture b, which serves as an input sample of the training model. In picture b the print and the handwriting are black and the background is white;
(1.5) classify the print, handwriting, and background in picture a at pixel level with an algorithm program, and record the result as picture c. The classification relies on the fact that pixels of different colours have distinct values, so the red handwriting is easily separated from the black print and the white background. Picture c is the output sample of the training model; background pixels are labelled 0, print pixels 1, and handwriting pixels 2, thereby realizing pixel-level labelling of handwriting, print, and background.
Step (2): establish a deep learning model, mainly an artificial intelligence model from the field of image semantic segmentation, such as a fully convolutional network (FCN), an FCN with dilated (atrous) convolution, a SegNet network, or a U-Net network.
Step (3): train the model by feeding it the training samples, comprising the following sub-steps:
(3.1) input of training samples
Picture b is fed to the artificial intelligence model as the training input;
(3.2) output of training samples
Picture c serves as the target output of the training sample and contains 3 categories, with print labelled 1, handwriting labelled 2, and background labelled 0;
(3.3) once the model and the training samples are ready, train the model; training can run on a personal computer, a CPU server, or a GPU server, and for large sample sets a GPU server is preferable. Save the model after training, then evaluate it on the test samples; if the classification accuracy is high enough, the model can be used in practical applications.
Step (4): acquire a picture containing both print and handwriting by photographing the paper with a mobile phone, a scanner, or a document camera (high-speed scanner).
Step (5): preprocess the picture, including rectification and shadow removal; rectification here amounts to border cropping, and shadow removal amounts to binarization.
Step (6): feed the preprocessed picture to the trained model to obtain a pixel-level classification result map, comprising the following sub-steps:
(6.1) load the trained model;
(6.2) divide the preprocessed picture into N small pictures, for example N = 4; splitting into small pictures improves detection speed;
(6.3) feed each small picture to the model and obtain the classification result via multithreaded parallel computation.
Step (7): obtain a print picture and a handwriting picture from the classification result, thereby separating print from handwriting.
Step (8): perform character recognition on the separated print and handwriting pictures; the character recognition software may be Baidu's open-source PaddleOCR.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is an original picture containing both print and handwriting;
FIG. 3 shows the print separated from the original picture;
FIG. 4 shows the handwriting separated from the original picture;
FIG. 5 shows character recognition performed on the separated print;
FIG. 6 shows character recognition performed on the separated handwriting.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of the present invention, which includes the following steps.
Step (1): produce a training sample data set, comprising the following sub-steps:
(1.1) prepare a sheet of paper on which the printed text is black and the blank areas are white;
(1.2) write on the paper with a red pen;
(1.3) photograph the paper carrying the written characters and record the resulting picture as picture a; alternatively, the paper may be scanned with a scanner;
(1.4) preprocess picture a with an algorithm program, the preprocessing comprising border cropping and binarization into a black-and-white image; record the result as picture b, which serves as an input sample of the training model. In picture b the print and the handwriting are black and the background is white;
(1.5) classify the print, handwriting, and background in picture a at pixel level with an algorithm program, and record the result as picture c. The classification relies on the fact that pixels of different colours have distinct values, so the red handwriting is easily separated from the black print and the white background. Picture c is the output sample of the training model; background pixels are labelled 0, print pixels 1, and handwriting pixels 2, thereby realizing pixel-level labelling of handwriting, print, and background;
(1.6) for a good result, the more samples the better; for example, 2200 pictures can be produced, with 2000 used for training and 200 for testing. The pictures produced so far mainly contain Chinese, English, and mathematical characters; samples in other languages can be produced in the same way.
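As an illustrative sketch of sub-step (1.5), the colour-based labelling can be written with NumPy; the threshold values below are assumptions chosen for demonstration, not values given by the patent:

```python
import numpy as np

def label_pixels(rgb):
    """Classify each pixel of an RGB scan as background (0),
    print (1), or handwriting (2), exploiting the colour
    convention of step (1): black print, red pen, white paper.
    The thresholds are illustrative assumptions."""
    r = rgb[..., 0].astype(np.int16)
    g = rgb[..., 1].astype(np.int16)
    b = rgb[..., 2].astype(np.int16)
    labels = np.zeros(rgb.shape[:2], dtype=np.uint8)   # background = 0
    dark = (r < 100) & (g < 100) & (b < 100)
    labels[dark] = 1                                   # black print = 1
    reddish = (r > 150) & (r - g > 60) & (r - b > 60)
    labels[reddish] = 2                                # red handwriting = 2
    return labels
```

Running this on picture a yields picture c directly, since the labels already use the 0/1/2 convention of the training targets.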
Step (2): establish a deep learning model, mainly an artificial intelligence model from the field of image semantic segmentation, such as a fully convolutional network (FCN), an FCN with dilated (atrous) convolution, a SegNet network, or a U-Net network.
For practical engineering applications the model should be as simple as possible while keeping high classification accuracy, since overly complex models are computationally expensive and slow. One example model is as follows:
(1) the input layer is a 1024 × 1024 3-channel image;
(2) a dilated convolution layer with 32 feature maps, a 5 × 5 kernel, and dilation rate = 2;
(3) a dropout layer with dropout probability 20%;
(4) a dilated convolution layer with 32 feature maps, a 5 × 5 kernel, and dilation rate = 3;
(5) a pooling layer with a 2 × 2 down-sampling factor;
(6) a dilated convolution layer with 64 feature maps, a 5 × 5 kernel, and dilation rate = 2;
(7) a dropout layer with dropout probability 20%;
(8) a dilated convolution layer with 64 feature maps, a 5 × 5 kernel, and dilation rate = 3;
(9) a pooling layer with a 2 × 2 down-sampling factor;
(10) a dilated convolution layer with 64 feature maps, a 5 × 5 kernel, and dilation rate = 2;
(11) a dropout layer with dropout probability 20%;
(12) a dilated convolution layer with 64 feature maps, a 5 × 5 kernel, and dilation rate = 3;
(13) a convolution layer with 32 feature maps and a 1 × 1 kernel;
(14) a deconvolution (transposed convolution) layer with 3 feature maps, a 4 × 4 kernel, and a 4 × 4 stride.
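The layer listing above could be sketched in Keras roughly as follows; the `same` padding, ReLU activations, and softmax output are assumptions, since the patent does not state them:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model():
    # Layers follow items (1)-(14) of the example embodiment.
    inp = layers.Input((1024, 1024, 3))                                                # (1)
    x = layers.Conv2D(32, 5, dilation_rate=2, padding="same", activation="relu")(inp)  # (2)
    x = layers.Dropout(0.2)(x)                                                         # (3)
    x = layers.Conv2D(32, 5, dilation_rate=3, padding="same", activation="relu")(x)    # (4)
    x = layers.MaxPooling2D(2)(x)                                                      # (5)
    x = layers.Conv2D(64, 5, dilation_rate=2, padding="same", activation="relu")(x)    # (6)
    x = layers.Dropout(0.2)(x)                                                         # (7)
    x = layers.Conv2D(64, 5, dilation_rate=3, padding="same", activation="relu")(x)    # (8)
    x = layers.MaxPooling2D(2)(x)                                                      # (9)
    x = layers.Conv2D(64, 5, dilation_rate=2, padding="same", activation="relu")(x)    # (10)
    x = layers.Dropout(0.2)(x)                                                         # (11)
    x = layers.Conv2D(64, 5, dilation_rate=3, padding="same", activation="relu")(x)    # (12)
    x = layers.Conv2D(32, 1, activation="relu")(x)                                     # (13)
    # Transposed convolution with stride 4 restores the 4x spatial
    # reduction from the two pooling layers; 3 output maps = 3 classes.
    out = layers.Conv2DTranspose(3, 4, strides=4, padding="same",
                                 activation="softmax")(x)                              # (14)
    return tf.keras.Model(inp, out)
```

With these assumptions the two 2 × 2 pooling layers reduce 1024 to 256, and the stride-4 transposed convolution restores the full 1024 × 1024 resolution with one channel per class.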
Step (3): train the model by feeding it the training samples, comprising the following sub-steps:
(3.1) input of training samples
Picture b is fed to the artificial intelligence model as the training input;
(3.2) output of training samples
Picture c serves as the target output of the training sample and contains 3 categories, with print labelled 1, handwriting labelled 2, and background labelled 0;
(3.3) once the model and the training samples are ready, train the model; training can run on a personal computer, a CPU server, or a GPU server, and for large sample sets a GPU server is preferable. Save the model after training, then evaluate it on the test samples; if the classification accuracy is high enough, the model can be used in practical applications.
Step (4): acquire a picture containing both print and handwriting by photographing the paper with a mobile phone, a scanner, or a document camera (high-speed scanner). Fig. 2 shows part of an original picture obtained this way.
Step (5): preprocess the picture, including rectification and shadow removal; rectification here amounts to border cropping, and shadow removal amounts to binarization.
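The binarization part of this preprocessing could, for instance, use Otsu's method; the following NumPy sketch is an illustration under that assumption, not code taken from the patent:

```python
import numpy as np

def otsu_binarize(gray):
    """Binarize a grayscale page (uint8 array) at Otsu's threshold:
    the threshold maximizing between-class variance. Pixels above
    the threshold become white (255), the rest black (0). A real
    pipeline would also deskew and crop borders first."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    mean_all = np.dot(np.arange(256), hist) / total
    best_t, best_var = 0, -1.0
    cum, cum_mean = 0.0, 0.0
    for t in range(256):
        cum += hist[t]
        cum_mean += t * hist[t]
        if cum == 0 or cum == total:
            continue
        w0 = cum / total                                   # weight of class below t
        m0 = cum_mean / cum                                # mean of class below t
        m1 = (mean_all * total - cum_mean) / (total - cum) # mean of class above t
        var = w0 * (1 - w0) * (m0 - m1) ** 2               # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return np.where(gray > best_t, 255, 0).astype(np.uint8)
```

Because the threshold adapts to the histogram of each page, this step needs no manually set parameters, in line with the adaptive behaviour claimed for the method.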
Step (6): feed the preprocessed picture to the trained model to obtain a pixel-level classification result map, comprising the following sub-steps:
(6.1) load the trained model;
(6.2) divide the preprocessed picture into N small pictures, for example N = 4; splitting into small pictures improves detection speed;
(6.3) feed each small picture to the model and obtain the classification result via multithreaded parallel computation.
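Sub-steps (6.2) and (6.3) can be sketched as below; splitting into horizontal strips and the `predict_fn` stand-in for the trained model are assumptions for illustration:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def predict_in_patches(img, predict_fn, n=4):
    """Split the page array into n strips along the row axis, run
    predict_fn (a stand-in for the trained segmentation model) on
    each strip in a thread pool, and stitch the results back in
    order."""
    strips = np.array_split(img, n, axis=0)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(predict_fn, strips))   # map preserves order
    return np.concatenate(results, axis=0)
```

Threads give a speedup here only when `predict_fn` releases the GIL (as deep-learning inference libraries typically do during native computation); otherwise process-based parallelism would be the alternative.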
Step (7): obtain a print picture and a handwriting picture from the classification result, thereby separating print from handwriting. Fig. 3 shows the print separated from the original picture, and Fig. 4 shows the separated handwriting.
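A minimal sketch of this separation step, assuming the binarized page from step (5) and the class map from step (6) are available as NumPy arrays:

```python
import numpy as np

def separate(binary, labels):
    """Given the binarized page (0 = ink, 255 = white) and the
    pixel-level class map (0 = background, 1 = print,
    2 = handwriting), return two white-background images holding
    only the print strokes and only the handwriting strokes."""
    print_img = np.full_like(binary, 255)
    hand_img = np.full_like(binary, 255)
    print_img[labels == 1] = binary[labels == 1]
    hand_img[labels == 2] = binary[labels == 2]
    return print_img, hand_img
```

Each output keeps the original stroke pixels for its own class and paints everything else white, so the two pictures can be passed independently to the character recognizer in step (8).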
Step (8): perform character recognition on the separated print and handwriting pictures; the character recognition software may be Baidu's open-source PaddleOCR. Fig. 5 shows the recognition result for the separated print and Fig. 6 for the separated handwriting; as these results show, most characters are recognized correctly.
Those skilled in the art will appreciate that the invention may be practiced without these specific details.
Claims (9)
1. A deep-learning-based print/handwriting-separated character recognition method, characterized by comprising the following steps:
(1) producing a training sample data set containing handwritten and printed text;
(2) establishing a deep learning model;
(3) training the model;
(4) acquiring a picture containing print and handwriting;
(5) preprocessing the picture;
(6) feeding the picture to the model to obtain a pixel-level classification result map;
(7) obtaining a print picture and a handwriting picture from the classification result;
(8) performing character recognition on the print picture and the handwriting picture separately.
2. The deep-learning-based print/handwriting-separated character recognition method according to claim 1, wherein step (1), producing the training sample data set, comprises:
(1.1) preparing a sheet of paper on which the printed text is black and the blank areas are white;
(1.2) writing on the paper with a red pen;
(1.3) photographing the paper carrying the written characters and recording the picture as picture a; alternatively, the paper may be scanned with a scanner;
(1.4) preprocessing picture a with an algorithm program, the preprocessing comprising border cropping and binarization into a black-and-white image; the result is recorded as picture b, the input sample of the training model, in which the print and handwriting are black and the background is white;
(1.5) classifying the print, handwriting, and background in picture a at pixel level with an algorithm program and recording the result as picture c; the classification relies on pixels of different colours having distinct values, so the red handwriting is easily separated from the black print and the white background; picture c is the output sample of the training model, with background pixels labelled 0, print pixels 1, and handwriting pixels 2, thereby realizing pixel-level labelling of handwriting, print, and background.
3. The deep-learning-based print/handwriting-separated character recognition method according to claim 1, wherein step (2) establishes a deep learning model, mainly an artificial intelligence model from the field of image semantic segmentation, such as a fully convolutional network (FCN), an FCN with dilated (atrous) convolution, a SegNet network, or a U-Net network.
4. The deep-learning-based print/handwriting-separated character recognition method according to claim 1, wherein step (3), training the model on the training samples, comprises:
(3.1) feeding picture b to the artificial intelligence model as the training input;
(3.2) taking picture c as the target output, picture c containing 3 categories with print labelled 1, handwriting labelled 2, and background labelled 0; once the model and the training samples are ready the model is trained, on a personal computer, a CPU server, or a GPU server, a GPU server being preferable for large sample sets; the model is saved after training and evaluated on the test samples, and if the classification accuracy is high enough it can be used in practical applications.
5. The deep-learning-based print/handwriting-separated character recognition method according to claim 1, wherein step (4) acquires a picture containing both print and handwriting by photographing the paper with a mobile phone, a scanner, or a document camera (high-speed scanner).
6. The deep-learning-based print/handwriting-separated character recognition method according to claim 1, wherein step (5) preprocesses the picture, including rectification and shadow removal; rectification here amounts to border cropping, and shadow removal amounts to binarization.
7. The deep-learning-based print/handwriting-separated character recognition method according to claim 1, wherein step (6), feeding the preprocessed picture to the trained model to obtain a pixel-level classification result map, comprises:
(6.1) loading the trained model;
(6.2) dividing the preprocessed picture into N small pictures, for example N = 4, to improve detection speed;
(6.3) feeding each small picture to the model and obtaining the classification result via multithreaded parallel computation.
8. The deep-learning-based print/handwriting-separated character recognition method according to claim 1, wherein step (7) obtains a print picture and a handwriting picture from the classification result, thereby separating print from handwriting.
9. The deep-learning-based print/handwriting-separated character recognition method according to claim 1, wherein step (8) performs character recognition on the separated print and handwriting pictures, and the character recognition software may be Baidu's open-source PaddleOCR.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111309350.2A CN113901952A (en) | 2021-11-06 | 2021-11-06 | Print form and handwritten form separated character recognition method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113901952A true CN113901952A (en) | 2022-01-07 |
Family
ID=79193517
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111309350.2A Pending CN113901952A (en) | 2021-11-06 | 2021-11-06 | Print form and handwritten form separated character recognition method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113901952A (en) |
- 2021-11-06: application CN202111309350.2A filed (CN); publication CN113901952A; status: active, pending
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114690982A (en) * | 2022-03-31 | 2022-07-01 | 呼和浩特民族学院 | Intelligent teaching method for physics teaching |
CN114690982B (en) * | 2022-03-31 | 2023-03-31 | 呼和浩特民族学院 | Intelligent teaching method for physics teaching |
CN115100656A (en) * | 2022-08-25 | 2022-09-23 | 江西风向标智能科技有限公司 | Blank answer sheet identification method, system, storage medium and computer equipment |
CN115880704A (en) * | 2023-02-16 | 2023-03-31 | 中国人民解放军总医院第一医学中心 | Automatic case cataloging method, system, equipment and storage medium |
CN117115195A (en) * | 2023-10-24 | 2023-11-24 | 成都信息工程大学 | Tamper-proof identification method and tamper-proof identification system based on block chain |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |