CN110321830B - Chinese character string picture OCR recognition method based on neural network - Google Patents

Chinese character string picture OCR recognition method based on neural network

Info

Publication number
CN110321830B
CN110321830B (application CN201910576921.5A)
Authority
CN
China
Prior art keywords
picture
neural network
value
loss
vector
Prior art date
Legal status
Active
Application number
CN201910576921.5A
Other languages
Chinese (zh)
Other versions
CN110321830A (en)
Inventor
胡铮
张春红
唐晓晟
李杭
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201910576921.5A
Publication of CN110321830A
Application granted
Publication of CN110321830B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/14 Image acquisition
    • G06V 30/148 Segmentation of character regions
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a neural-network-based OCR (optical character recognition) method for pictures of Chinese character strings, belonging to the field of optical character recognition. First, several pictures to be recognized are collected and the pixel values of each picture are normalized; at the same time, the classes of the neural network model are initialized, along with a center feature vector for each class. Then the normalized pixels of each picture to be recognized are fed into the neural network model for feature extraction, yielding a depth feature matrix per picture; RoI Pooling is applied to this matrix and the features are flattened into a feature vector of length L. Finally, the feature vectors are divided into training samples and test samples, and the neural network model is trained on the training samples; the feature vector of each test sample is connected to the fully connected layer of the trained model, which outputs the category of each test sample, completing the overall recognition of each picture's character string. The method classifies and recognizes the character string picture as a whole and achieves higher recognition accuracy.

Description

Chinese character string picture OCR recognition method based on neural network
Technical Field
The invention belongs to the field of optical character recognition and concerns content recognition of Chinese character strings drawn from a limited set of categories; in particular, it relates to a neural-network-based OCR (optical character recognition) method for pictures of Chinese character strings.
Background
Recognizing the character strings in pictures is a key step of most optical character recognition (OCR) technologies. Common applications of character string picture recognition include identification of ID numbers and license plate numbers; common application scenes include receipt information recognition and street view shop name recognition.
The traditional optical character recognition pipeline proceeds as follows: the character string picture is first cut into single-character pictures, each single-character picture is recognized separately, and finally the recognition results of the single-character pictures are concatenated to obtain the final result.
However, this cut-then-recognize approach has the following problems. The final recognition result depends heavily on the quality of the character cutting model, and a cutting error propagates through the whole recognition process. The cutting model also places high demands on sample picture quality: it requires the character string to be separable and cannot correctly handle strings whose characters touch or overlap. In addition, the traditional recognition step ignores the characteristics of Chinese in a specific domain and treats the entire Chinese character set as candidate results, which degrades recognition quality.
Therefore, methods that cut characters first and recognize them afterwards give unsatisfactory experimental results, and the propagation of errors makes the overall recognition of the picture poor.
Disclosure of Invention
The invention targets Chinese character string recognition in a specific domain. It adopts a neural-network-based OCR recognition method for Chinese character string pictures that skips the character cutting step entirely, removing the influence of that step's errors on recognition; by taking the whole character string picture as input, the recognition accuracy is markedly improved.
The method comprises the following specific steps:
step one, collecting a plurality of pictures to be identified, and respectively normalizing each pixel value in each picture.
The normalization process is as follows:
first, the average, minimum, and maximum values of all pixel values on each picture are calculated.
Then, for each pixel on the picture, subtract the average value and divide by the difference between the maximum and the minimum:
img' = (img - mean(img)) / (max(img) - min(img))
where img is a pixel value of the picture, and mean(img), max(img) and min(img) are respectively the average, maximum and minimum of all pixel values of the picture.
and step two, initializing each class of the neural network model, and respectively initializing a corresponding class center feature vector for each class.
The set of class center feature vectors is {C1, C2, ..., Ci, ..., Cn}, where Ci is the center feature vector of the i-th class and n is the total number of class center feature vectors.
And step three, inputting all normalized pixels in each picture to be identified into the neural network model respectively, and performing feature extraction through the convolution layer and the pooling layer to obtain the depth feature matrix of each picture.
The N collected pictures to be recognized yield N corresponding depth feature matrices;
and step four, performing Pooling operation on the depth feature matrix of each picture through RoI Pooling for each picture to be recognized, and stretching features to obtain a feature vector with the length of L.
The specific process is as follows: first, divide each depth feature matrix into a grid of w cells by h cells;
then, take the maximum (or the average) of all values in each cell as that cell's output, obtaining a pooled feature matrix of width w and height h;
finally, flatten this pooled feature matrix into a feature vector of length L.
The feature vector set of the N pictures to be recognized is: {L1, L2, ..., Lj, ..., LN}.
Step five, divide the N feature vectors into training samples and test samples, label each training sample with the class it belongs to, and train the neural network model;
the specific training process is as follows:
Step 501, for a given training sample, calculate the squared error loss between the sample's feature vector and its labeled class center vector;
the feature vector of the m-th training sample is Lm and its labeled class center feature vector is Cm; the squared error loss is computed as:
loss_m = ||Lm - Cm||^2
Step 502, combine the squared error loss values of the training samples into the Weighted-Center-Loss function value used to optimize the neural network model;
Weighted-Center-Loss = Σ(m=1..M) softmax(w)·(Lm - Cm)^2
where the square is taken element-wise; the softmax function, acting on w, yields the weight of each dimension of the depth feature; M is the number of training samples, M < N.
Step 503, respectively connecting the feature vector of each training sample to a full connection layer of the neural network model to obtain a prediction output vector of each training sample;
step 504, converting the category marked by each training sample into a K-dimensional vector through one-hot coding;
Step 505, input the predicted output vector of each training sample together with its K-dimensional one-hot vector into the softmax-Loss function; the output is the softmax-Loss function value;
Step 506, take a weighted sum of the softmax-Loss function value and the Weighted-Center-Loss function value as the final Loss value to optimize the neural network model.
The final Loss value is calculated as:
Loss=softmax-Loss+λ*Weighted-Center-Loss
λ is a weighting coefficient.
And step six, connect the feature vector of each test sample to the fully connected layer of the trained neural network model and output the category of each test sample, thereby completing the overall recognition of each picture character string.
Input each test sample's length-L feature vector into the fully connected layer of the trained neural network model; the number of neurons in the fully connected layer equals the number of actual classes, and the outputs on its nodes are classified by a softmax function.
Softmax converts the outputs of the fully connected layer into probabilities, and the dimension with the highest probability is taken as the output category.
The invention has the following advantages:
1) The neural-network-based OCR method for Chinese character string pictures processes features with RoI Pooling and is insensitive to the dimensions of the input features, so the model can handle input data of different sizes.
2) The method combines a Weighted-Center-Loss function with a Softmax-Loss function to optimize the model, so the optimized feature vectors better exhibit the metric-learning property of large inter-class differences and small intra-class differences, and the model gains recognition accuracy from the better feature vectors.
3) The feature vectors are connected to a fully connected layer to obtain the output, and the whole input picture yields a single category output, so the model classifies and recognizes the character string picture as a whole.
Drawings
FIG. 1 is a flow chart of the neural-network-based OCR recognition method for Chinese character string pictures of the present invention;
FIG. 2 is a schematic diagram of RoI Pooling flattening data of different sizes into uniform feature vectors;
FIG. 3 is a schematic diagram of collecting several white blood cell count pictures and classifying them with the DCNN model to obtain each picture's output category;
FIG. 4 is a graph comparing the accuracy of the present invention with that of other models;
FIG. 5 is a diagram illustrating recognition examples of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The invention provides a neural-network-based OCR recognition method for Chinese character string pictures. The whole character string picture is taken as input and processed by a Convolutional Neural Network (CNN) to obtain the picture's depth feature matrix. Region-of-interest pooling (RoI Pooling) flattens the depth feature matrix into a feature vector, and the weighted class center loss (Weighted Center Loss) function is used to compute the model's loss value for training on a labeled data set. The character string picture of the task to be processed is then input into the trained neural network model for recognition: the model's output features are connected to a fully connected layer and classified by a softmax classifier, whose categories are the actual categories of the Chinese character strings of the task, for example shop names in a street view shop name recognition task, or laboratory test item names in a document information recognition task. The content of the character string picture is thus recognized as a whole.
Because the invention uses RoI Pooling and the Weighted Center Loss function, picture sizes need not be normalized: the model can process inputs of different sizes, achieves better performance, and recognizes the Chinese character string picture as a whole with high accuracy. Its advantages are that it handles inputs of different sizes, reaches higher accuracy, and recognizes the character string picture as a whole.
The specific steps are shown in fig. 1 and comprise the following steps:
step one, collecting a plurality of pictures to be identified, and respectively normalizing each pixel value in each picture.
The pixel values of the picture are normalized because input pixel values generally lie between 0 and 255; the exact pixel values matter less than the ratios of the differences between them, and normalizing input data is a standard preprocessing step for most neural network models. The normalization of the invention differs from the usual method and proceeds as follows:
first, the average, minimum, and maximum values of all pixel values on each picture are calculated.
Then, for each pixel on the picture, subtract the average value and divide by the difference between the maximum and the minimum:
img' = (img - mean(img)) / (max(img) - min(img))
where img is a pixel value of the picture, and mean(img), max(img) and min(img) are respectively the average, maximum and minimum of all pixel values of the picture.
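A minimal numpy sketch of this per-image normalization (the function name and the guard against a constant-valued picture are illustrative additions, not from the patent):

```python
import numpy as np

def normalize_image(img: np.ndarray) -> np.ndarray:
    """Normalize one picture: subtract the mean of its pixel values,
    then divide by the range (max - min) of its pixel values."""
    img = img.astype(np.float32)
    rng = float(img.max() - img.min())
    if rng == 0.0:
        # constant-valued picture: only center it, avoiding division by zero
        return img - img.mean()
    return (img - img.mean()) / rng
```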
and step two, initializing each class of the neural network model, and respectively initializing a corresponding class center feature vector for each class.
The set of class center feature vectors is {C1, C2, ..., Ci, ..., Cn}, where Ci is the center feature vector of the i-th class and n is the total number of class center feature vectors.
And step three, inputting all normalized pixels in each picture to be identified into the neural network model respectively, and performing feature extraction through the convolution layer and the pooling layer to obtain the depth feature matrix of each picture.
The input to the neural network model is a w1 × h1 × 1 matrix of all normalized pixels of a picture to be recognized, where w1 and h1 are the width and height of the picture in pixels. After 5 convolution blocks (convolution layer + pooling layer), the depth feature matrix of the picture is obtained, with shape (w2, h2, c2), where w2 is the width of the depth feature matrix, h2 its height, and c2 its depth (number of channels).
For the N collected pictures to be recognized, image features are extracted with a conventional general-purpose convolutional neural network, yielding N corresponding depth feature matrices;
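The patent fixes only the block count, five convolution blocks of convolution layer plus pooling layer, not the channel widths or kernel sizes. The following PyTorch sketch is one plausible instantiation under those assumptions:

```python
import torch
import torch.nn as nn

class ConvBackbone(nn.Module):
    """Five convolution blocks (conv + ReLU + max-pool) mapping a normalized
    1 x h1 x w1 picture to a c2 x h2 x w2 depth feature matrix. The channel
    widths below are illustrative; the patent does not specify them."""
    def __init__(self):
        super().__init__()
        chans = [1, 32, 64, 128, 256, 512]
        layers = []
        for c_in, c_out in zip(chans, chans[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(kernel_size=2)]
        self.features = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, h1, w1) -> (batch, 512, ~h1/32, ~w1/32)
        return self.features(x)
```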
and step four, performing Pooling operation on the depth feature matrix of each picture through RoI Pooling for each picture to be recognized, and stretching features to obtain a feature vector with the length of L.
As shown in FIG. 2, the input depth feature matrix is processed by RoI Pooling to obtain an output of uniform shape and size, so data inputs of different sizes can be handled.
The specific process is as follows: first, divide each depth feature matrix into a grid of w cells by h cells;
then, take the maximum (or the average) of all values in each cell as that cell's output, obtaining a pooled feature matrix of width w and height h;
finally, flatten this pooled feature matrix into a feature vector of length L.
The feature vector set of the N pictures to be recognized is: {L1, L2, ..., Lj, ..., LN}.
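The grid pooling just described coincides with adaptive max pooling, which picks the cell boundaries from the input size so that every depth feature matrix yields the same output shape. A sketch, where the grid size w = h = 4 is an assumed value:

```python
import torch
import torch.nn.functional as F

def roi_pool_flatten(feat: torch.Tensor, w: int = 4, h: int = 4) -> torch.Tensor:
    """Divide a (c2, h2, w2) depth feature matrix into an h x w grid, take
    the maximum of each cell as its output, and flatten the pooled matrix
    into a feature vector of fixed length L = c2 * h * w."""
    pooled = F.adaptive_max_pool2d(feat, output_size=(h, w))  # (c2, h, w)
    return pooled.flatten()  # length L, identical for every input size
```

Because the cell boundaries adapt to the input, two pictures of different widths both map to vectors of the same length L, which is what lets the model accept inputs of different sizes.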
Step five, divide the N feature vectors into training samples and test samples, label each training sample with the class it belongs to, and train the neural network model;
the specific training process is as follows:
Step 501, for a given training sample, calculate the squared error loss between the sample's feature vector and its labeled class center vector;
the feature vector of the m-th training sample is Lm and its labeled class center feature vector is Cm; the squared error loss is computed as:
loss_m = ||Lm - Cm||^2
Step 502, combine the squared error loss values of the training samples into the Weighted-Center-Loss function value used to optimize the neural network model;
Weighted-Center-Loss = Σ(m=1..M) softmax(w)·(Lm - Cm)^2
where the square is taken element-wise; the softmax function, acting on w, yields the weight of each dimension of the depth feature; M is the number of training samples, M < N.
The Weighted-Center-Loss function is used as an optimization target to improve the training effect of the model.
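A sketch of the Weighted-Center-Loss under the formula above, with one learnable center per class and a learnable vector w whose softmax weights each feature dimension; averaging over the batch is an added normalization for stable step sizes (the formula sums over the M samples):

```python
import torch
import torch.nn as nn

class WeightedCenterLoss(nn.Module):
    """Learnable class centers {C1..Cn} plus a learnable vector w; softmax(w)
    gives the weight of each dimension of the depth feature."""
    def __init__(self, num_classes: int, feat_len: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_len))
        self.w = nn.Parameter(torch.zeros(feat_len))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # feats: (M, L) training feature vectors; labels: (M,) class indices
        dim_weights = torch.softmax(self.w, dim=0)       # one weight per dimension
        sq_err = (feats - self.centers[labels]) ** 2     # (Lm - Cm)^2, element-wise
        return (sq_err * dim_weights).sum(dim=1).mean()  # weighted, averaged over M
```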
Step 503, respectively connecting the feature vector of each training sample to a full connection layer of the neural network model to obtain a prediction output vector of each training sample;
step 504, converting the category marked by each training sample into a K-dimensional vector through one-hot coding;
Step 505, input the predicted output vector of each training sample together with its K-dimensional one-hot vector into the softmax-Loss function; the output is the softmax-Loss function value;
Step 506, take a weighted sum of the softmax-Loss function value and the Weighted-Center-Loss function value as the final Loss value to optimize the neural network model.
The final Loss value is calculated as:
Loss=softmax-Loss+λ*Weighted-Center-Loss
λ is a weighting coefficient, typically set to 0.1. The neural network model is trained by minimizing the objective function Loss; during training, gradient descent is performed with an Adam optimizer. This step sets the optimization target for the neural network model and selects a matching optimizer, training the whole model and yielding the OCR recognition model for Chinese character strings in the specific domain.
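How the combined objective and the Adam optimizer could be wired together, reusing the ConvBackbone, roi_pool_flatten and WeightedCenterLoss sketches above; feat_len and num_classes are assumed values, and CrossEntropyLoss is used because it fuses the one-hot encoding and softmax-Loss of steps 504-505:

```python
import torch
import torch.nn as nn

feat_len = 512 * 4 * 4   # c2 * h * w from the RoI Pooling sketch (assumed)
num_classes = 30         # number of string categories in the domain (assumed)
lam = 0.1                # weighting coefficient λ, the value suggested above

backbone = ConvBackbone()
fc = nn.Linear(feat_len, num_classes)   # fully connected layer
center_loss = WeightedCenterLoss(num_classes, feat_len)
softmax_loss = nn.CrossEntropyLoss()    # one-hot comparison + softmax-Loss in one call
optimizer = torch.optim.Adam(list(backbone.parameters())
                             + list(fc.parameters())
                             + list(center_loss.parameters()))

def train_step(images: list, labels: torch.Tensor) -> float:
    """One Adam step on Loss = softmax-Loss + lam * Weighted-Center-Loss.
    `images` is a list of (1, h1, w1) tensors whose sizes may differ."""
    feats = torch.stack([roi_pool_flatten(backbone(img.unsqueeze(0))[0])
                         for img in images])  # (M, feat_len)
    loss = softmax_loss(fc(feats), labels) + lam * center_loss(feats, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```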
And step six, connect the feature vector of each test sample to the fully connected layer of the trained neural network model and output the category of each test sample, thereby completing the overall recognition of each picture character string.
Input each test sample's length-L feature vector into the fully connected layer of the trained neural network model; the number of neurons in that layer equals the number of actual classes. A softmax function classifies the outputs on the layer's nodes, converting them into probabilities, and the dimension with the highest probability is taken as the test sample's output category.
In this step, the output of the network model is connected to a fully connected layer on which the softmax function acts, realizing multi-class classification; the overall process thus recognizes the Chinese character string picture as a whole.
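A sketch of this inference step, reusing backbone, fc and roi_pool_flatten from the training sketches above:

```python
import torch

@torch.no_grad()
def predict(image: torch.Tensor) -> int:
    """Whole-string recognition of one test picture: extract its length-L
    feature vector, pass it through the trained fully connected layer, convert
    the outputs to probabilities with softmax, and return the index of the
    highest-probability dimension as the output category."""
    feat = roi_pool_flatten(backbone(image.unsqueeze(0))[0])
    probs = torch.softmax(fc(feat), dim=0)
    return int(probs.argmax())
```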
Example:
To better demonstrate the effectiveness and novelty of the method, a batch of data was used to experimentally verify and evaluate the performance of the proposed model. As shown in FIG. 3, several white blood cell count pictures are collected; the DCNN (Weighted Center Loss) model of the method produces the depth feature matrix of each picture, and RoI Pooling flattens the features into each picture's corresponding feature vector. At the output of the network model, the Softmax loss function yields the loss softmax-Loss and the Weighted Center Loss yields the loss Weighted-Center-Loss; the two loss values are summed with the weighting coefficient λ and used as the optimization target of the network. The feature vectors of the test pictures to be processed are input into the fully connected layer of the trained neural network model, and the outputs on its nodes are classified by a softmax function to obtain each picture's output category.
FIG. 4 compares the accuracy obtained by this model with that of other models on the same data; the model using the DCNN method achieves the highest accuracy, showing that the method performs better in terms of model accuracy.
FIG. 5 lists some recognition examples. With the model of this method, input pictures of all different sizes are processed and recognized, and the final result is a whole-string recognition rather than a character-by-character one. Without the RoI Pooling method of the invention, the model could not process input pictures of different sizes; using the Weighted Center Loss improves the model's accuracy, giving the model the best results. Moreover, without the whole-string recognition approach, the Chinese character string could not be recognized as a whole.

Claims (4)

1. A Chinese character string picture OCR recognition method based on a neural network is characterized by comprising the following specific steps:
step one, collecting a plurality of pictures to be identified, and respectively normalizing each pixel value in each picture;
step two, initializing each category of the neural network model, and initializing a corresponding category center feature vector for each category;
the vector set of class center features is { C1,C2,....,Ci,...,Cn};CiThe central feature vector of the ith category; n is the total number of central feature vectors of the category;
step three, inputting all normalized pixels of each picture to be recognized into the neural network model respectively, and performing feature extraction through the convolution layers and pooling layers to obtain the depth feature matrix of each picture;
aiming at the N collected pictures to be identified, corresponding to N depth feature matrixes;
step four, for each picture to be recognized, performing a Pooling operation on its depth feature matrix through RoI Pooling and flattening the features to obtain a feature vector of length L;
step five, dividing the N feature vectors into training samples and test samples, labeling each training sample with the category to which it belongs, and training the neural network model;
the specific training process is as follows:
step 501, aiming at a certain training sample, calculating a square error loss value of a feature vector of the sample and a labeled class center vector;
the feature vector of the m-th training sample is Lm and its labeled class center feature vector is Cm; the squared error loss is computed as:
loss_m = ||Lm - Cm||^2
step 502, combining the squared error loss values of the training samples into the Weighted-Center-Loss function value used to optimize the neural network model;
Weighted-Center-Loss = Σ(m=1..M) softmax(w)·(Lm - Cm)^2
where the square is taken element-wise; the softmax function, acting on w, yields the weight of each dimension of the depth feature; M is the number of training samples; M is less than N;
step 503, respectively connecting the feature vector of each training sample to a full connection layer of the neural network model to obtain a prediction output vector of each training sample;
step 504, converting the category marked by each training sample into a K-dimensional vector through one-hot coding;
step 505, inputting the predicted output vector of each training sample together with its K-dimensional one-hot vector into the softmax-Loss function, the output being the softmax-Loss function value;
step 506, taking a weighted sum of the softmax-Loss function value and the Weighted-Center-Loss function value as the final Loss value to optimize the neural network model;
the final Loss value is calculated as:
Loss=softmax-Loss+λ*Weighted-Center-Loss
λ is a weighting coefficient;
step six, respectively connecting the feature vector of each test sample to the fully connected layer of the trained neural network model and outputting the category of each test sample, thereby completing the overall recognition of each picture character string;
inputting each test sample's length-L feature vector into the fully connected layer of the trained neural network model, the number of neurons of the fully connected layer being equal to the number of actual classes, and classifying the outputs on the nodes of the fully connected layer through a softmax function.
2. The neural network-based Chinese character string image OCR recognition method as claimed in claim 1, wherein the normalization process in the first step is:
firstly, calculating the average value, the minimum value and the maximum value of all pixel values on each picture;
then, for each pixel on the picture, subtracting the average value and dividing by the difference between the maximum and the minimum;
img' = (img - mean(img)) / (max(img) - min(img))
img is a pixel value of the picture; mean(img), max(img) and min(img) are respectively the average, maximum and minimum of all pixel values of the picture.
3. The neural-network-based Chinese character string picture OCR recognition method as claimed in claim 1, wherein the fourth step is specifically: first, dividing each depth feature matrix into a grid of w cells by h cells;
then, taking the maximum (or the average) of all values in each cell as that cell's output, thereby obtaining a pooled feature matrix of width w and height h;
finally, flattening the pooled feature matrix into a feature vector of length L;
the feature vector set of the N pictures to be recognized is: {L1, L2, ..., Lj, ..., LN}.
4. The neural network-based Chinese character string image OCR recognition method as claimed in claim 1, wherein softmax in step six converts the output of the full connection layer into probability, and takes the dimension with the highest probability as the output category.
CN201910576921.5A 2019-06-28 2019-06-28 Chinese character string picture OCR recognition method based on neural network Active CN110321830B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910576921.5A CN110321830B (en) 2019-06-28 2019-06-28 Chinese character string picture OCR recognition method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910576921.5A CN110321830B (en) 2019-06-28 2019-06-28 Chinese character string picture OCR recognition method based on neural network

Publications (2)

Publication Number Publication Date
CN110321830A CN110321830A (en) 2019-10-11
CN110321830B true CN110321830B (en) 2020-11-13

Family

ID=68120591

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910576921.5A Active CN110321830B (en) 2019-06-28 2019-06-28 Chinese character string picture OCR recognition method based on neural network

Country Status (1)

Country Link
CN (1) CN110321830B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110826567B (en) * 2019-11-06 2023-04-07 北京字节跳动网络技术有限公司 Optical character recognition method, device, equipment and storage medium
CN110889414A (en) * 2019-11-27 2020-03-17 中国银行股份有限公司 Optical character recognition method and device
CN110942067A (en) * 2019-11-29 2020-03-31 上海眼控科技股份有限公司 Text recognition method and device, computer equipment and storage medium
CN111027529A (en) * 2019-12-04 2020-04-17 深圳市新国都金服技术有限公司 Method for reducing parameter and calculation amount of deep learning OCR, computer device and storage medium
CN111401136B (en) * 2020-02-22 2022-11-25 上海交通大学 Plunger pump cavitation degree detection method and device and terminal
CN112364675B (en) * 2020-10-28 2022-11-01 三维码(厦门)网络科技有限公司 Off-line reading method and device based on three-dimensional code and image recognition
CN112613393B (en) * 2020-12-18 2022-08-12 广西壮族自治区蚕业技术推广站 Silkworm disease identification system
CN113723464B (en) * 2021-08-02 2023-10-03 北京大学 Remote sensing image classification method and device
CN113792683B (en) * 2021-09-17 2024-05-10 平安科技(深圳)有限公司 Training method, training device, training equipment and training storage medium for text recognition model
CN116152612B (en) * 2023-04-21 2023-08-15 粤港澳大湾区数字经济研究院(福田) Long-tail image recognition method and related device
CN117315301B (en) * 2023-10-07 2024-04-09 长春理工大学 Collection information matching system and method based on image recognition

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7362892B2 (en) * 2003-07-02 2008-04-22 Lockheed Martin Corporation Self-optimizing classifier
CN106650741A (en) * 2016-12-30 2017-05-10 深圳市捷顺科技实业股份有限公司 License plate character recognition method and device
CN109871843B (en) * 2017-12-01 2022-04-08 北京搜狗科技发展有限公司 Character recognition method and device for character recognition
CN109492618A (en) * 2018-12-06 2019-03-19 复旦大学 Object detection method and device based on grouping expansion convolutional neural networks model

Also Published As

Publication number Publication date
CN110321830A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN110321830B (en) Chinese character string picture OCR recognition method based on neural network
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN111814584B (en) Vehicle re-identification method based on multi-center measurement loss under multi-view environment
CN108596154B (en) Remote sensing image classification method based on high-dimensional feature selection and multilevel fusion
CN110197205B (en) Image identification method of multi-feature-source residual error network
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN112800876B (en) Super-spherical feature embedding method and system for re-identification
CN111340123A (en) Image score label prediction method based on deep convolutional neural network
CN111723675A (en) Remote sensing image scene classification method based on multiple similarity measurement deep learning
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
CN110188827B (en) Scene recognition method based on convolutional neural network and recursive automatic encoder model
CN111898621A (en) Outline shape recognition method
CN111882554B (en) SK-YOLOv 3-based intelligent power line fault detection method
CN112200121A (en) Hyperspectral unknown target detection method based on EVM and deep learning
CN112464983A (en) Small sample learning method for apple tree leaf disease image classification
CN106599864A (en) Deep face recognition method based on extreme value theory
CN110414587A (en) Depth convolutional neural networks training method and system based on progressive learning
Lin et al. Determination of the varieties of rice kernels based on machine vision and deep learning technology
CN115049952A (en) Juvenile fish limb identification method based on multi-scale cascade perception deep learning network
CN115409797A (en) PCB defect image detection method based on improved deep learning algorithm
CN115063832A (en) Global and local feature-based cross-modal pedestrian re-identification method for counterstudy
CN114780767A (en) Large-scale image retrieval method and system based on deep convolutional neural network
CN109145770B (en) Automatic wheat spider counting method based on combination of multi-scale feature fusion network and positioning model
CN113963295A (en) Method, device, equipment and storage medium for recognizing landmark in video clip
CN107729945A (en) Discriminating recurrence, sorting technique and system based on rarefaction representation between class

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant