CN109993109A

CN109993109A - Image character recognition method

Info

Publication number: CN109993109A
Application number: CN201910248870.3A
Authority: CN
Inventors: 李孝杰; 罗超; 史沧红; 吴锡; 周激流; 李俊良; 刘书樵; 张宪; 伍贤宇
Original assignee: Chengdu University of Information Technology
Current assignee: Chengdu University of Information Technology
Priority date: 2019-03-29
Filing date: 2019-03-29
Publication date: 2019-07-09

Abstract

The invention discloses a kind of image character recognition methods, are related to image identification technical field.The image character recognition method the following steps are included: step S1, acquire multiple have text natural scene image information；Step S2, the character area of collected natural scene image information is manually marked, to obtain label data, and label data is pre-processed to obtain image data；Step S3, the multiple dimensioned neural network of multilayer based on convolution is established, described image data are input to the multiple dimensioned neural network of the multilayer and are trained；Step S4, it acquires natural scene image information to be identified and is pre-processed to obtain image data to be processed, image data to be processed is input in the multiple dimensioned neural network of the multilayer after training, passes through the multiple dimensioned neural network recognition of the multilayer and exports the text information in the natural scene image information to be identified.The text in automatic quickly identification natural scene image may be implemented in the present invention.

Description

Image character recognition method

Technical field

The present invention relates to image identification technical fields, more particularly to a kind of image character recognition method.

Background technique

The purpose of natural scene image words identification is the semantic information that a word is included in the image obtained after cutting. Since natural scene image and file and picture make a big difference, traditional character identifying method is not directly applicable nature field Text identification in scape image.In recent years, researcher has carried out the research of the Text region largely in natural scene image. Text region is the process that image information is converted to series of sign, these symbols can be by computer representation and processing.This In matter, Text region task is considered a kind of special translation process: picture signal is converted into " natural language ", this Similar with speech recognition and machine translation: from the perspective of mathematics, they turn one group of list entries comprising much noise It is changed to one group of output sequence of given set of tags.

In the prior art, have researcher using entire natural image to identify text: they are using based on gradual change Characteristic pattern carrys out more prefabricated word image, and the word for including in present image is determined using dynamic k neighbour, it is depended on The word image of fixed dictionary and pre-generatmg.When by 2013, integrated Fisher vector and structuring support vector machines are used Frame come establish picture and entire word coding between relationship.

Google has delivered the article of street view image number identification in 2013.One kind is described in text to mention from streetscape The system of access word.The system uses end-to-end neural network, and how author elaborates in consolidated network with Human Sperm Exactness defeats CAPTCHA (the Completely Automated Public Turing Test to Tell of Google oneself Computers and Humans Apart, the full-automatic turing test for distinguishing computer and the mankind) system, and the frame of Google Structure is verified to be suitable for CAPTCHAS.In this article, propose that maxout (activation primitive) is used to swash as non-linear first Unit living constructs a depth CNN (Convolutional Neural Networks, convolutional neural networks) to encode entire figure Picture, and the classifier using the character rank of multiple position sensings for text identification.They take in terms of determining streetscape number Obtained immense success.Model is also applied to 8 identifying code identification missions by them, and uses compound training data training pattern. This method realizes the discrimination more than 96% in Google's streetscape Number Reorganization task.Meanwhile it have been obtained for be more than 99% Google's identifying code identification mission discrimination, then obtains state-of-the-art result in text classification.But depth CNN's The disadvantage is that having pre-selected the maximum length of predictable sequence, this is more suitable for house number or license plate number.

Subtle modification has been carried out to above-mentioned model in 2014 Nian Shiyou researchers: having been eliminated for predicting character length Classifier, and using full stop instruction text ending.Then, they demonstrate the model using the training of compound training data It can be applied successfully to practical identification problem.It is a kind of feasible dictionary word recognition methods that word, which is encoded into vector, but In abandoned situation, character can be in any combination.When number of characters is enough, the method based on regular length vector coding Performance be decreased obviously.But, however it remains some shortcomings: some researchs have used depth in single character recognition step Habit technology, but overall framework still follows traditional process flow design, therefore in other steps the problem of describe in the introduction It still can be encountered in rapid.The research of Google is done directly entire identification process using pure neural network, and obtains industry-leading As a result.However, since they need to use the image of fixed size as input, and input picture is encoded to regular length Feature vector, therefore in the picture there are many in the case where character, the accuracy of identification of model will be greatly reduced.On the other hand, by In their model no clearly positioning and segmented image, therefore it can not can know that position of each character in original image It sets.

Summary of the invention

The main purpose of the present invention is to provide a kind of image character recognition methods, it is intended to can accurately identify and come from Text in right scene.

To achieve the above object, the present invention provides a kind of image character recognition method, comprising the following steps:

Step S1, multiple natural scene image information for having text are acquired；

Step S2, the character area of collected natural scene image information is manually marked, to obtain number of tags According to, and pre-processed the label data to obtain image data；

Step S3, the multiple dimensioned neural network of multilayer based on convolution is established, described image data are input to the multilayer Multiple dimensioned neural network is trained and obtains trained neural network model；

Step S4, it acquires natural scene image information to be identified and is pre-processed to obtain image data to be processed, it will The image data to be processed is input in the multiple dimensioned neural network model of the multilayer after training, passes through the more rulers of the multilayer Degree neural network recognition simultaneously exports the text information in the natural scene image information to be identified.

Preferably, the multiple dimensioned neural network of the multilayer includes input layer, down sample module, residual error module and full articulamentum Module；The down sample module is made of sizable three convolutional layers of convolution kernel；The residual error module by convolution kernel respectively not The intensive submodule of equal seven is constituted；

The input layer extracts the semantic feature in image data for receiving image data, the down sample module, and The semantic feature is input in residual error module, then via described seven intensive submodules processing after, by affiliated full articulamentum Output character information.

Preferably, in the step S2, the label data is pre-processed to obtain image data to further include following step It is rapid:

Step S21, in the label data, selection is normalized comprising the image of text, to reach default point Resolution；

Step S22, the image after normalized is cut to obtain the image data of pre-set dimension.

Preferably, the step S21 further include:

The pixel value of image after the normalized is between 0~255.

Preferably, the step S22 further include:

The size of described image data after cutting is the reception size of the multiple dimensioned neural network of multilayer.

Preferably, the step S3 the following steps are included:

Step S31, described image data are inputted in the input layer of the multiple dimensioned neural network of the multilayer；

Step S32, use the image data of preset quantity as training sample using the confirmation of five folding cross-validation methods, it is remaining Part is used as test sample；Initialize neuron weight and the parameter in the convolutional layer of the multiple dimensioned neural network of the multilayer；

Step S33,64 described image data are inputted in the input layer of the multiple dimensioned neural network of the multilayer；

Step S34, the multiple dimensioned neural network of the multilayer is trained by propagated forward algorithm and exports prediction text Word；The probability distribution of Text region in image is exported by normalization exponential function classifier；

Step S35, the predictive text of output and the error of label data are calculated by accuracy algorithm；

Step S36, optimize and update the weight parameter of the multiple dimensioned neural network of the multilayer based on the error；

Step S37, step S33 to step S36 is repeated, until training loss and test loss no longer reduce.

Preferably, the equation of the accuracy algorithm are as follows:

Wherein, T indicates to identify that correct image, F indicate to identify wrong image.

Preferably, the step S36 further include:

Optimize the weight parameter of the multiple dimensioned neural network of the multilayer by Adam majorized function.

The invention has the following beneficial effects: the text in automatic identification natural scene image may be implemented in the present invention, and can obtain Higher accuracy of identification；And the generalization ability of this method is very strong, suitable for the Text region under most of natural scenes.

Detailed description of the invention

Fig. 1 is the flow diagram of image character recognition method of the present invention；

Fig. 2 is the flow diagram of the embodiment of the present invention；

Fig. 3 is the network structure in image character recognition method of the present invention.

The embodiments will be further described with reference to the accompanying drawings for the realization, the function and the advantages of the object of the present invention.

Specific embodiment

It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention, it is not intended to limit the present invention.

The following further describes the present invention with reference to the drawings.

As shown in Figure 1 and Figure 2, the embodiment of the present invention provides a kind of image character recognition method, comprising the following steps:

Step S1, multiple natural scene image information for having text are acquired.

In a particular embodiment, published natural image data set is acquired, totally 9960 natural scene images.

Step S2, the character area of collected natural scene image information is manually marked, to obtain number of tags According to, and pre-processed the label data to obtain image data.

Specifically, in step s 2, it is pre-processed the label data to obtain image data further comprising the steps of:

Step S21, in the label data, selection is normalized comprising the image of text, to reach default point Resolution.Specifically, the pixel value of the image after the normalized is between 0~255.

Step S22, the image after normalized is cut to obtain the image data of pre-set dimension.Specifically, The size of described image data after cutting is the reception size of the multiple dimensioned neural network of multilayer.

In a particular embodiment, by being manually labeled the character area in natural scene image information, as label Data.

Step S3, the multiple dimensioned neural network of multilayer based on convolution is established, described image data are input to the multilayer Multiple dimensioned neural network is trained and obtains trained neural network model.

As shown in figure 3, the multiple dimensioned neural network of multilayer includes input layer, down sample module, residual error module and Quan Lian Connect a layer module；The down sample module is made of sizable three convolutional layers of convolution kernel；The residual error module is by convolution kernel Each unequal seven intensive submodules are constituted；The input layer extracts figure for receiving image data, the down sample module It is input in residual error module as the semantic feature in data, and by the semantic feature, then via seven intensive submodules After processing, by affiliated full articulamentum output character information.

The advantages of image character recognition method of the present invention is neural network combination dense network multiple dimensioned by multilayer is used for The method of natural image Text region.The recognition methods first passes around the down sample module of three convolutional layers composition, which can To extract deeper time, more abstract semantic feature.Then defeated each layer in residual error module using residual error module It is all added in the input of back layer out, this improves feature reuse rates, avoid the loss of feature.Finally by full articulamentum Module exports the probability of last identification text.

Specifically, pretreated image data is input in the multiple dimensioned neural network of multilayer and is trained, and saved The best model of prediction effect uses the best model of the effect when needing to carry out new natural scene image words identification Carry out Text region.

Specifically, step S3 further include:

Step S31, described image data are inputted in the input layer of the multiple dimensioned neural network of the multilayer.

Step S32, use the image data of preset quantity as training sample using the confirmation of five folding cross-validation methods, it is remaining Part is used as test sample；Initialize neuron weight and the parameter in the convolutional layer of the multiple dimensioned neural network of the multilayer.? In specific embodiment, each cross validation does training sample using the data of 7968 images (the 80% of i.e. 9960), and 1992 The data of a image (the 20% of i.e. 9960) do test sample.

Step S35, the predictive text of output and the error of label data are calculated by accuracy algorithm.

Specifically, the equation of the accuracy algorithm are as follows:

Specifically, optimize the weight parameter of the multiple dimensioned neural network of the multilayer by Adam majorized function.In other realities It applies in example, can also be optimized using other majorized functions.

Step S4, it acquires natural scene image information to be identified and is pre-processed to obtain image data to be processed, it will The image data to be processed is input in the multiple dimensioned neural network of the multilayer after training, passes through the multiple dimensioned mind of the multilayer Through network automatic identification and export the text information in the natural scene image information to be identified.

It should be understood that the above is only a preferred embodiment of the present invention, the scope of the patents of the invention cannot be therefore limited, It is all to utilize equivalent structure or equivalent flow shift made by description of the invention and accompanying drawing content, it is applied directly or indirectly in Other related technical areas are included within the scope of the present invention.

Claims

1. a kind of image character recognition method, which comprises the following steps:

Step S2, the character area of collected natural scene image information is manually marked, to obtain label data, and It is pre-processed the label data to obtain image data；

Step S3, the multiple dimensioned neural network of multilayer based on convolution is established, described image data are input to the more rulers of the multilayer Degree neural network is trained and obtains trained neural network model；

Step S4, it acquires natural scene image information to be identified and is pre-processed to obtain image data to be processed, it will be described Image data to be processed is input in the multiple dimensioned neural network model of the multilayer after training, passes through the multiple dimensioned mind of the multilayer Through network automatic identification and export the text information in the natural scene image information to be identified.

2. image character recognition method according to claim 1, which is characterized in that the multiple dimensioned neural network packet of multilayer Include input layer, down sample module, residual error module and full articulamentum module；The down sample module is by convolution kernel sizable three A convolutional layer is constituted；The residual error module is made of each unequal seven intensive submodules of convolution kernel；

For the input layer for receiving image data, the down sample module extracts the semantic feature in image data, and by institute Semantic feature is stated to be input in residual error module, then via described seven intensive submodules processing after, exported by affiliated full articulamentum Text information.

3. image character recognition method according to claim 2, which is characterized in that in the step S2, by the label Data are pre-processed to obtain image data further comprising the steps of:

Step S21, in the label data, selection is normalized comprising the image of text, to reach default resolution Rate；

4. image character recognition method according to claim 3, which is characterized in that the step S21 further include:

The pixel value of image after the normalized is between 0~255.

5. image character recognition method according to claim 3, which is characterized in that the step S22 further include:

6. image character recognition method according to claim 2, which is characterized in that the step S3 the following steps are included:

Step S32, use the image data of preset quantity as training sample, remainder using the confirmation of five folding cross-validation methods As test sample；Initialize neuron weight and the parameter in the convolutional layer of the multiple dimensioned neural network of the multilayer；

Step S34, the multiple dimensioned neural network of the multilayer is trained by propagated forward algorithm and exports predictive text； The probability distribution of Text region in image is exported by normalization exponential function classifier；

7. image character recognition method according to claim 6, which is characterized in that the equation of the accuracy algorithm are as follows:

8. image character recognition method according to claim 6, which is characterized in that the step S36 further include: