CN110245545A - Character recognition method and device - Google Patents

Character recognition method and device

Info

Publication number
CN110245545A
CN110245545A
Authority
CN
China
Prior art keywords
suggestion box
candidate
text
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811126275.4A
Other languages
Chinese (zh)
Inventor
Ren Yupeng
Lu Wei
Yin Jun
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN201811126275.4A priority Critical patent/CN110245545A/en
Publication of CN110245545A publication Critical patent/CN110245545A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Abstract

The invention discloses a character recognition method and device, for solving the problem that text in images is recognized with low accuracy. The method comprises: inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the position information of each proposal box contained in the image and a first score indicating whether the content of each proposal box is text; screening out candidate proposal boxes whose score exceeds a preset score threshold; merging the candidate proposal boxes according to their positions to obtain target proposal boxes; and inputting each target proposal box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, to recognize the text contained in each target proposal box.

Description

Character recognition method and device
Technical field
The present invention relates to the technical fields of deep learning and character recognition, and more particularly to a character recognition method and device.
Background art
With the rapid development of image capture devices, an ever-growing volume of image information needs to be managed, and automating that management with Internet technology is currently the most practical means of doing so.
Before recognizing the text in an image, the text must first be located. Current text localization methods fall broadly into two categories. The first is bounding-box regression methods based on networks such as Faster RCNN (Faster Region-based Convolutional Neural Networks), YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector); such methods directly output text-line scores and bounding boxes. The second is segmentation methods based on Fully Convolutional Networks (FCN); such methods predict pixel-level text classification results and then post-process those results into enclosing rectangles. The Faster RCNN method, which offers both good real-time performance and good precision, uses a Region Proposal Network (RPN) to generate candidate boxes for different text regions on the convolutional feature map, and then classifies and regresses those candidate regions with a neural network. However, because text-line lengths vary drastically, conventional candidate-box schemes have difficulty localizing such objects precisely; at the same time, owing to computational cost and real-time requirements, the required precision cannot be met simply by multiplying candidate-box sizes and shapes, so the existing RPN scheme needs to be improved.
In terms of image text recognition, the implementation closest to the present invention is the patent "A complex-script recognition method based on deep learning" filed by a Chengdu-based technology company. That scheme uses a single convolutional neural network to recognize individual characters, and does not take into account the context and semantic information carried by the text sequence, so its recognition accuracy is not high.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a character recognition method and device, for solving the problem that text in images is recognized with low accuracy.
An embodiment of the present invention provides a character recognition method, comprising:
inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the position information of each proposal box contained in the image and a first score indicating whether the content of each proposal box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts position proposal boxes of preset widths and heights at each window feature; the window-feature sequence corresponding to each row of the feature map is taken as the input of the recurrent neural network, and the recurrent neural network outputs the position information of each proposal box contained in the image and the first score indicating whether the content of each proposal box is text;
screening out candidate proposal boxes whose first score exceeds a preset score threshold;
merging the candidate proposal boxes according to their positions to obtain target proposal boxes;
inputting each target proposal box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, to recognize the text contained in each target proposal box.
Further, before the image containing the text to be recognized is input into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, the method further comprises:
processing the image using a threshold segmentation method and a connected-component analysis method;
and performing text-orientation correction on the processed image.
Further, merging the candidate proposal boxes according to their positions to obtain target proposal boxes comprises:
for a first candidate proposal box among the candidate proposal boxes, checking whether there exists a second candidate proposal box whose horizontal distance to the first candidate proposal box is less than a preset first threshold, whose vertical overlap with it exceeds a preset second threshold, and whose shape similarity to it exceeds a preset third threshold; if such a box exists, merging the first and second candidate proposal boxes into a new first candidate proposal box; if not, taking the first candidate proposal box as a target proposal box.
Further, determining the vertical overlap comprises:
determining the vertical overlap from the first height and first vertical coordinate of the first candidate proposal box and the second height and second vertical coordinate of the second candidate proposal box, using the formula overlap = |y_A2 - y_D1| / min(h_1, h_2), where y_A2 is the second vertical coordinate of the second candidate proposal box, y_D1 is the first vertical coordinate of the first candidate proposal box, and h_1 and h_2 are the first height of the first candidate proposal box and the second height of the second candidate proposal box, respectively.
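As a sketch, the vertical-overlap formula above can be written as a small function. The patent does not fully specify which corner of each box the coordinates y_A2 and y_D1 denote, so the parameter names and the example values below are illustrative assumptions:

```python
def vertical_overlap(y_d1, h1, y_a2, h2):
    """Vertical overlap of two candidate proposal boxes, per
    overlap = |y_A2 - y_D1| / min(h1, h2).
    y_a2 plays the role of y_A2 (a vertical coordinate of the second box),
    y_d1 the role of y_D1 (a vertical coordinate of the first box)."""
    return abs(y_a2 - y_d1) / min(h1, h2)

# Boxes of heights 20 and 40 whose relevant vertical coordinates differ by 18:
print(vertical_overlap(10, 20, 28, 40))  # 18 / min(20, 40) = 0.9
```

The result is normalized by the smaller height, so a value near 1 means the shorter box lies almost entirely within the vertical span of the other.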
Further, determining the shape similarity comprises:
determining the shape similarity from the first height of the first candidate proposal box and the second height of the second candidate proposal box, using the formula similarity = min(h_1, h_2) / max(h_1, h_2), where h_1 and h_2 are the heights of the first and second candidate proposal boxes, respectively.
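The shape-similarity formula above admits an equally direct sketch (function and variable names are illustrative, not from the patent):

```python
def shape_similarity(h1, h2):
    """Shape similarity of two candidate proposal boxes:
    similarity = min(h1, h2) / max(h1, h2).
    Equals 1.0 for equal heights and approaches 0 as heights diverge."""
    return min(h1, h2) / max(h1, h2)

print(shape_similarity(20, 40))  # 0.5
print(shape_similarity(30, 30))  # 1.0
```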
Further, the process of pre-training the first model comprises:
obtaining sample images, wherein each sample image is annotated with the position information of each proposal box and a second score indicating whether the content of each position proposal box is text;
inputting each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and training the first model according to its output for each sample image.
Further, the process of pre-training the second model comprises:
obtaining each text line annotated in the sample images;
inputting each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and training the second model according to its output for each sample image.
An embodiment of the present invention provides a character recognition device, the device comprising:
an obtaining module, configured to input an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtain the position information of each proposal box contained in the image and a first score indicating whether the content of each proposal box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts position proposal boxes of preset widths and heights at each window feature; the window-feature sequence corresponding to each row of the feature map is taken as the input of the recurrent neural network sub-model, which outputs the position information of each proposal box contained in the image and the first score indicating whether the content of each proposal box is text;
a screening module, configured to screen out candidate proposal boxes whose first score exceeds a preset score threshold;
a merging module, configured to merge the candidate proposal boxes according to their positions to obtain target proposal boxes;
a recognition module, configured to input each target proposal box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognize the text contained in each target proposal box.
Further, the device further comprises:
a correction module, configured to process the image using a threshold segmentation method and a connected-component analysis method, and to perform text-orientation correction on the processed image.
Further, the merging module is specifically configured to: for a first candidate proposal box among the candidate proposal boxes, check whether there exists a second candidate proposal box whose horizontal distance to the first candidate proposal box is less than a preset first threshold, whose vertical overlap with it exceeds a preset second threshold, and whose shape similarity to it exceeds a preset third threshold; if such a box exists, merge the first and second candidate proposal boxes into a new first candidate proposal box; if not, take the first candidate proposal box as a target proposal box.
Further, the merging module is specifically configured to determine the vertical overlap from the first height and first vertical coordinate of the first candidate proposal box and the second height and second vertical coordinate of the second candidate proposal box, using the formula overlap = |y_A2 - y_D1| / min(h_1, h_2), where y_A2 is the second vertical coordinate of the second candidate proposal box, y_D1 is the first vertical coordinate of the first candidate proposal box, and h_1 and h_2 are the heights of the first and second candidate proposal boxes, respectively.
Further, the merging module is specifically configured to determine the shape similarity from the first height of the first candidate proposal box and the second height of the second candidate proposal box, using the formula similarity = min(h_1, h_2) / max(h_1, h_2), where h_1 and h_2 are the first height of the first candidate proposal box and the second height of the second candidate proposal box, respectively.
Further, the device further comprises:
a first training module, configured to obtain sample images, wherein each sample image is annotated with the position information of each proposal box and a second score indicating whether the content of each position proposal box is text; and to input each sample image into the first model comprising a convolutional neural network and a recurrent neural network and train the first model according to its output.
Further, the device further comprises:
a second training module, configured to obtain each text line annotated in the sample images; and to input each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network and train the second model according to its output.
Embodiments of the present invention provide a character recognition method and device. The method inputs an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtains the position information of each proposal box contained in the image and a first score indicating whether the content of each proposal box is text; the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts position proposal boxes of preset widths and heights at each window feature; the window-feature sequence corresponding to each row of the feature map is taken as the input of the recurrent neural network, which outputs the position information and first score of each proposal box. Candidate proposal boxes whose first score exceeds a preset score threshold are then screened out; the candidate proposal boxes are merged according to their positions to obtain target proposal boxes; and each target proposal box is input into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, which recognizes the text contained in each target proposal box.
In the embodiments of the present invention, the image containing the text to be recognized is input into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and the position information of each proposal box contained in the image and a first score indicating whether the content of each proposal box is text are obtained. The first model can effectively capture the context information of the text sequence and incorporate it into localization; in particular, the score of a proposal box covering blank space within a line of text is raised by the sequence features of the surrounding text, so that the resulting text-line bounding boxes better match the positional characteristics of the text sequence and text-line localization becomes more accurate. Secondly, each target proposal box is input into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, which recognizes the text contained in each target proposal box. Because the second model contains a recurrent neural network, it strengthens the extraction of contextual information from the character sequence, making the prediction of the text sequence more accurate.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a character recognition method provided in Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the specific execution process of the text-line localization operation provided in Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the effect of the recurrent neural network operation provided in Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of the specific execution process of the text-line recognition operation provided in Embodiment 1 of the present invention;
Fig. 5 is a schematic diagram of the expected position information of proposal boxes provided in Embodiment 3 of the present invention;
Fig. 6 is an overall flow diagram of express waybill text recognition provided in Embodiment 7 of the present invention;
Fig. 7 is a schematic structural diagram of a character recognition device provided in Embodiment 8 of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Embodiment 1:
Fig. 1 is a schematic flow chart of a character recognition method provided in an embodiment of the present invention; the method comprises the following steps:
S101: input the image containing the text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtain the position information of each proposal box contained in the image and a first score indicating whether the content of each proposal box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts position proposal boxes of preset widths and heights at each window feature; the window-feature sequence corresponding to each row of the feature map is taken as the input of the recurrent neural network, which outputs the position information of each proposal box contained in the image and the first score indicating whether the content of each proposal box is text.
Since the text information in an image may be distributed anywhere in the image, and possibly only a small region of the image contains the text to be recognized, the text lines must first be located before the text can be recognized: a localization operation produces the position information of each text line in the image, and a recognition operation is then performed on the text contained at those positions.
The two neural networks contained in the first model are a convolutional neural network and a recurrent neural network. Since both networks serve the purpose of locating text-line positions in the image, the two are collectively referred to as the first model. The image containing the text to be recognized is input into the convolutional neural network, and after several layers of convolution and pooling a feature map of the image is obtained. A sliding-window convolution is performed on this feature map to obtain window features, and during the sliding-window operation each position proposal box is predicted at each sliding-window center according to the preset widths and heights. The window features obtained by the sliding-window convolution are input into the recurrent neural network, which finally outputs the coordinate information of each position proposal box and the first score indicating that the position proposal box contains text; this score is used to judge whether the position proposal box is a candidate proposal box.
When determining the position proposal boxes, the embodiment of the present invention predicts each position proposal box at each sliding-window center using preset widths and heights. Since the height of the text in a text line is variable, prior-art proposal generation based on boxes of fixed size and shape leads to inaccurate text-line localization; the proposal generation method provided by the embodiment of the present invention solves this problem. Judging by a preset threshold whether a position proposal box is a candidate proposal box, and removing redundant position proposal boxes, reduces the computational overhead that enlarging the set of proposal sizes and shapes would otherwise bring. Meanwhile, the text lines are located by introducing a recurrent neural network into the first model; since a recurrent neural network has memory, it can effectively capture the context information of the text sequence and incorporate it into localization. In practice, one possible situation is that the score of a proposal box covering blank space within a line of text is raised by the sequence features of the surrounding text, so that the resulting text-line proposal boxes better match the positional characteristics of the text sequence and localization becomes more accurate.
For example, taking express waybill image text recognition as an example, the specific execution process of the text-line localization operation on the waybill image to be recognized is shown in Fig. 2, where Convx_x denotes the convolution operations of different modules and the dotted connections between convolution modules denote pooling operations. BLSTM (Bidirectional Long Short-Term Memory) is a bidirectional long short-term memory neural network, and FC (Fully Connected) is a fully connected layer. In total k position proposal boxes are predicted on feature map conv5_3; after the BLSTM and FC layers, the predicted position information of each proposal box and the score indicating whether the content of each proposal box is text are output.
The image to be recognized is first input into a pre-trained VGGNet-based convolutional neural network to extract image features. The network alternates convolution and pooling operations: specifically, the image passes through 13 convolutional layers of 3 × 3 and 4 max-pooling layers of 2 × 2, finally yielding a feature map conv5_3 of shape W × H × C, where W, H and C are the width, height and number of channels of the feature map, respectively;
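The spatial size of conv5_3 follows directly from the architecture just described: 3 × 3 convolutions (with padding) preserve spatial size, and each of the 4 max-pooling layers of 2 × 2 halves it, for an overall downsampling factor of 16. A minimal sketch (the channel count of 512 is the usual VGG conv5 width, an assumption here, not stated in the patent):

```python
import math

def conv5_3_shape(img_w, img_h, channels=512):
    """Rough spatial size of the VGG conv5_3 feature map: the padded 3x3
    convolutions keep the spatial size, and each of the four 2x2 max-pool
    layers halves it, so the overall stride is 2**4 = 16."""
    scale = 2 ** 4
    return math.ceil(img_w / scale), math.ceil(img_h / scale), channels

print(conv5_3_shape(1000, 600))  # (63, 38, 512)
```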
On the feature map conv5_3 obtained above, a sliding-window convolution with stride 1 and kernel size 3 × 3 is performed, and k position proposal boxes of certain shapes and sizes are predicted at each sliding-window center;
In a specific implementation, k is set to 10, and the certain shapes and sizes are specifically a proposal-box scheme with a small fixed width and variation only in height. Specifically, the fixed width may be set to 16 pixels, and the heights are scaled down from 283 pixels to 11 pixels by a reduction ratio of 0.7, predicting 10 position proposal boxes in this manner.
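The height schedule described above can be sketched as follows. The patent does not specify how fractional heights are rounded, so the use of `round` is an assumption; with it, scaling 283 by 0.7 repeatedly yields exactly 10 heights ending at 11 pixels:

```python
def anchor_heights(h_max=283, h_min=11, ratio=0.7):
    """Proposal-box heights scaled down from h_max by `ratio` until h_min,
    paired with a fixed 16-pixel width in the scheme described above."""
    heights, h = [], float(h_max)
    while round(h) >= h_min:
        heights.append(round(h))
        h *= ratio
    return heights

hs = anchor_heights()
print(len(hs), hs[0], hs[-1])  # 10 283 11
```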
Next, the t-th 3 × 3 × C window feature of the feature map conv5_3 obtained by the sliding-window convolution is input into the BLSTM neural network as a feature sequence, and the internal state H_t of the hidden layer is cyclically updated according to the following formula:

H_t = φ(H_{t-1}, X_t), t = 1, 2, ..., W

where X_t ∈ R^(3×3×C) is the feature obtained by the t-th sliding window in each row of the feature map conv5_3, W is the width of the feature map conv5_3, C is the number of channels of the feature map conv5_3, and φ is a nonlinear function. Having obtained effective context information in this way, a connected FC layer outputs the position information of each proposal box and the first score indicating whether the content of each proposal box is text.
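To make the recurrence H_t = φ(H_{t-1}, X_t) concrete, here is a minimal uni-directional stand-in scanned over the W window features of one feature-map row. The real model is a BLSTM; the plain tanh cell and the random weights below are illustrative simplifications, not the patent's architecture:

```python
import numpy as np

def rnn_over_row(X, Wx, Wh, b):
    """Scan a plain tanh RNN over the window features of one row:
    H_t = tanh(Wx @ x_t + Wh @ H_{t-1} + b), t = 1..W, H_0 = 0."""
    H = np.zeros(Wh.shape[0])
    states = []
    for x_t in X:
        H = np.tanh(Wx @ x_t + Wh @ H + b)
        states.append(H)
    return np.stack(states)  # shape (W, hidden_dim)

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 9))  # 7 window features, dim 3*3*C with C = 1
out = rnn_over_row(X,
                   rng.normal(size=(16, 9)) * 0.1,
                   rng.normal(size=(16, 16)) * 0.1,
                   np.zeros(16))
print(out.shape)  # (7, 16)
```

Each state H_t depends on the whole prefix of the row, which is how the network accumulates the text-sequence context used in localization.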
For example, Fig. 3 shows one result from a specific implementation: the proposal boxes predicted after the BLSTM operation and the first score corresponding to each proposal box. The boxes in the third row represent the proposal boxes, the numbers in the second row represent the first scores of the proposal boxes, and the numbers in the first row represent the position index values of the corresponding proposal boxes, where the index values are used to traverse the proposal boxes.
S102: screen out candidate proposal boxes whose first score exceeds a preset score threshold.
Among the proposal boxes determined above, there may be proposal boxes that contain no text information. Therefore, based on the score obtained in the previous step indicating whether the content of each proposal box is text, redundant proposal boxes are eliminated by a preset score threshold to obtain the candidate proposal boxes. Specifically, if the score of a proposal box exceeds the preset score threshold, the proposal box is regarded as a candidate proposal box; otherwise, the proposal box is regarded as a redundant proposal box and is removed.
For example, in a specific implementation the preset score threshold may be set to 0.7: it is judged whether the score of each proposal box exceeds 0.7; if so, the proposal box is a candidate proposal box; if not, the proposal box is regarded as redundant and is eliminated.
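The screening step reduces to a one-line filter. The list-of-tuples box format below is an illustrative assumption:

```python
def screen_candidates(boxes, scores, threshold=0.7):
    """Keep proposal boxes whose text score exceeds the preset threshold;
    boxes below or at the threshold are treated as redundant and dropped."""
    return [b for b, s in zip(boxes, scores) if s > threshold]

boxes = [(0, 0, 16, 40), (16, 0, 32, 40), (32, 0, 48, 40)]
kept = screen_candidates(boxes, [0.95, 0.40, 0.71])
print(kept)  # [(0, 0, 16, 40), (32, 0, 48, 40)]
```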
S103: merge the candidate proposal boxes according to their positions to obtain target proposal boxes.
To locate the text of each line in the image, the candidate proposal boxes obtained above need to be merged into target proposal boxes. Therefore, according to the position information of the candidate proposal boxes found above, the candidate proposal boxes are merged one by one to find the target proposal boxes.
As for the process of merging two candidate proposal boxes, in a specific implementation one possible embodiment is to take the minimum enclosing rectangle of the two candidate proposal boxes as the box obtained after merging, i.e. the target proposal box.
In another possible embodiment, for any two candidate proposal boxes, it is judged whether their distance in the horizontal direction is less than a preset threshold, and if so, the two candidate proposal boxes are merged.
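The two embodiments above can be sketched together: a horizontal-gap test decides whether to merge, and the minimum enclosing rectangle is the merged result. The (x1, y1, x2, y2) box format and the threshold value of 8 pixels are illustrative assumptions:

```python
def horizontal_gap(box_a, box_b):
    """Gap between two (x1, y1, x2, y2) boxes along x; 0 if they overlap."""
    return max(0, max(box_a[0], box_b[0]) - min(box_a[2], box_b[2]))

def merge_pair(box_a, box_b):
    """Minimum enclosing rectangle of two boxes, as in the first embodiment."""
    return (min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
            max(box_a[2], box_b[2]), max(box_a[3], box_b[3]))

a, b = (0, 5, 16, 45), (20, 4, 36, 44)
if horizontal_gap(a, b) < 8:  # preset threshold, value illustrative
    print(merge_pair(a, b))   # (0, 4, 36, 45)
```

Applied repeatedly along a row, this grows one target proposal box per text line out of the narrow fixed-width candidates.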
S104: input each target proposal box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognize the text contained in each target proposal box.
After the text-line localization result is obtained, the text in each text line needs to be recognized, and the recognition accuracy is crucial for the automatic management of images. Therefore, after the text lines in the image have been located by the above operations and the target proposal boxes obtained, each target proposal box is input into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, so as to recognize the text information in the target proposal box.
Wherein, two neural networks for including in the second model comprising convolutional neural networks and Recognition with Recurrent Neural Network can be with It is: convolutional neural networks and Recognition with Recurrent Neural Network.Since the purpose of the operation of convolutional neural networks and Recognition with Recurrent Neural Network is all In order to realize the identification to text information in image, therefore by above-mentioned two neural network collectively the second model.
Using target Suggestion box obtained above as the input of convolutional neural networks, by several convolution sum maximum ponds Operation obtains image convolution feature, and using image convolution feature as the input of Recognition with Recurrent Neural Network, obtains the output of convolutional layer And it is calculated as the corresponding classification scoring of width dimension therewith.Using connectionism timing classification method by Recognition with Recurrent Neural Network Output result be converted into sequence label, and probability is defined to sequence label according to the predicted value of every frame, uses negative pair of probability Number likelihood, can be corresponding with sequence label with direct construction image as objective function training network, without marking single character.
For example, taking an express waybill image as an example, the input image is the target suggestion box obtained after the above operations, and the specific implementation process of the text-line identification operation on the express waybill image to be identified is shown in Figure 4, where Convolution represents a 3 × 3 convolutional layer, Dense Blocks represent combined 1 × 1 and 3 × 3 convolutional layers, Transition Layers represent 2 × 2 maximum pooling layers, and BGRU (Bidirectional Gated Recurrent Unit) is a recurrent neural network model based on a bidirectional GRU.
The detailed process is as follows: the located express waybill image is input into an ultra-deep network structure based on DenseNet, trained in advance, to extract image features. The image first passes through a 3 × 3 convolutional layer, and then alternately passes through several combined 1 × 1 and 3 × 3 convolutional layers, and 1 × 1 convolutional layers with 2 × 2 maximum pooling layers; the depth of the network model reaches 120 layers.
The image features obtained above are taken as input, the output of the convolutional layer is obtained through a recurrent neural network layer based on a bidirectional GRU, and the classification score corresponding to the width dimension is calculated accordingly;
Using the Connectionist Temporal Classification (CTC) method, the output result of the recurrent neural network layer is converted into a label sequence. A probability is defined for the label sequence according to the predicted value of each frame, and the negative log-likelihood of the probability is used as the objective function for training the network, so that the correspondence between the image and the label sequence can be constructed directly without labeling single characters.
In the embodiment of the present invention, compared with the Transition Layers of traditional DenseNet which use average pooling, maximum pooling layers are used to preserve the texture information of the feature map, and in the last two maximum pooling layers, a pooling operation with stride 1 is used in the width dimension to retain more feature information of the width dimension, making the detection of narrow characters more robust. The GRU used in the embodiment of the present invention is a recurrent neural network that is more efficient than the LSTM network, and can enhance the extraction of the contextual information of a text sequence, so that the prediction result of the text sequence is more accurate. The CTC method used in the embodiment of the present invention is a common transformation method for processing the output result of a recurrent neural network: it converts the output result into a label sequence, and the final text result is obtained through operations such as de-duplication and removal of blanks, where the processing object is the entire label sequence rather than single characters.
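The de-duplication and blank-removal step of CTC decoding mentioned above can be sketched as follows. The per-frame best labels and the use of "-" as the blank symbol are illustrative assumptions, not the patent's actual alphabet.

```python
# Greedy CTC collapse: merge consecutive repeated labels, then drop blanks.
# This is the "de-duplication and removal of blanks" step described above.
def ctc_collapse(frame_labels, blank="-"):
    """Collapse the per-frame label sequence into the final text."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

# Per-frame best labels along the width dimension of the feature map:
print(ctc_collapse(["-", "h", "h", "-", "e", "l", "l", "-", "l", "o", "-"]))  # hello
```

Note that a blank between two identical labels (the "l", "-", "l" run above) is what allows CTC to emit a genuinely doubled character.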
Embodiment 2:
In order to make the text identification accuracy higher, on the basis of the above embodiment, in the embodiment of the present invention, before the image comprising the text to be identified is input into the first model, trained in advance, comprising a convolutional neural network and a recurrent neural network, the method further includes:
processing the image using a threshold segmentation method and a connected-domain analysis method;
and performing text-orientation correction on the processed image.
The image comprising the text to be identified is acquired by an image acquisition device such as a camera. Since the text information may be distributed at any position in the image, and possibly only some regions of the image include the text to be identified, before the text in the image is identified, the image is first processed using a threshold segmentation method and a connected-domain analysis method to remove redundant regions and retain the region image comprising the text to be identified. In addition, in order to ensure that the text identification result of the image is more accurate, text-orientation correction is performed on the region image comprising the text to be identified so that the text lines are in the horizontal direction. The process of processing the image using the threshold segmentation method and the connected-domain analysis method and the process of performing text-orientation correction on the region image comprising the text to be identified belong to the prior art and will not be repeated here.
Embodiment 3:
Since each candidate suggestion box obtained above includes only a small part of the text information, in order to obtain the complete text-line information corresponding to each line, on the basis of the above embodiments, in the embodiment of the present invention, merging the candidate suggestion boxes according to the position of each candidate suggestion box to obtain the target suggestion box includes:
for a first candidate suggestion box among the candidate suggestion boxes, identifying whether there exists a second candidate suggestion box whose abscissa distance from the first candidate suggestion box is less than a preset first threshold, whose degree of overlap with the first candidate suggestion box in the vertical direction is greater than a preset second threshold, and whose shape similarity with the first candidate suggestion box is greater than a preset third threshold; if such a box exists, merging the first candidate suggestion box and the second candidate suggestion box and taking the result as the first candidate suggestion box; if not, taking the first candidate suggestion box as a target suggestion box.
Specifically, the abscissa distance is the absolute value of the difference between the minimum of the abscissas of the four corner points of the first candidate suggestion box and the minimum of the abscissas of the four corner points of the second candidate suggestion box, or the absolute value of the difference between the maximum of the abscissas of the four corner points of the first candidate suggestion box and the maximum of the abscissas of the four corner points of the second candidate suggestion box. The smaller this absolute value, the more likely the two candidate suggestion boxes are a pair of connected boxes. The degree of overlap in the vertical direction is determined according to the overlapping part of the first candidate suggestion box and the second candidate suggestion box in the vertical direction; the greater the degree of overlap, the more likely the two candidate suggestion boxes are a pair of connected boxes. The shape similarity is the similarity of the overall shapes of the first candidate suggestion box and the second candidate suggestion box; the greater the shape similarity, the more likely the two candidate suggestion boxes are a pair of connected boxes.
Specifically, the process of determining whether the first candidate suggestion box and the second candidate suggestion box are a pair of connected boxes is as follows: for each first candidate suggestion box, it is judged whether there exists a second candidate suggestion box such that the distance d between the second candidate suggestion box and the first candidate suggestion box in the horizontal direction is less than a preset first threshold thresh1, the degree of overlap overlap in the vertical direction exceeds a preset second threshold thresh2, and the shape similarity similarity is greater than a preset third threshold thresh3. If such a box exists, the first candidate suggestion box and the second candidate suggestion box are considered a pair of connected boxes, and the minimum circumscribed rectangle of the pair of connected boxes is taken as the first candidate suggestion box; otherwise, the first candidate suggestion box is taken as a target suggestion box.
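The pairwise merging rule can be sketched as follows, under stated assumptions: boxes are axis-aligned (x, y, w, h) tuples, the three threshold values are illustrative, and the overlap denominator uses min(h1, h2) as in one of the embodiments below.

```python
# Hedged sketch of the connected-box test (distance, vertical overlap,
# shape similarity against three thresholds) and the minimum circumscribed
# rectangle used as the merged box. Threshold values are assumptions.
def should_merge(b1, b2, thresh1=20, thresh2=0.5, thresh3=0.7):
    x1, y1, w1, h1 = b1
    x2, y2, w2, h2 = b2
    d = min(abs(x1 - x2), abs((x1 + w1) - (x2 + w2)))    # abscissa distance
    inter = max(0, min(y1 + h1, y2 + h2) - max(y1, y2))  # vertical overlap length
    overlap = inter / min(h1, h2)
    similarity = min(h1, h2) / max(h1, h2)               # shape similarity
    return d < thresh1 and overlap > thresh2 and similarity > thresh3

def merge(b1, b2):
    """Minimum circumscribed rectangle of two boxes."""
    x = min(b1[0], b2[0])
    y = min(b1[1], b2[1])
    x_max = max(b1[0] + b1[2], b2[0] + b2[2])
    y_max = max(b1[1] + b1[3], b2[1] + b2[3])
    return (x, y, x_max - x, y_max - y)

a, b = (10, 20, 16, 32), (28, 22, 16, 30)
if should_merge(a, b):
    print(merge(a, b))  # (10, 20, 34, 32)
```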
As shown in Figure 5, the dotted boxes are the candidate suggestion boxes obtained according to the above process in the embodiment of the present invention, the solid box is the target suggestion box, and the two boxes below the arrow are two candidate suggestion boxes, namely the first candidate suggestion box and the second candidate suggestion box. A1, B1, C1, D1 and A2, B2, C2, D2 respectively represent the four corner positions of the first candidate suggestion box and the second candidate suggestion box, and h1 and h2 respectively represent the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
When calculating the degree of overlap of the first candidate suggestion box and the second candidate suggestion box in the vertical direction, one possible embodiment is to divide the length of the overlapping part of the two boxes in the vertical coordinate by the maximum of h1 and h2, i.e., to calculate the degree of overlap in the vertical direction according to the following formula: overlap = |yA2 − yD1| / max(h1, h2).
Another possible embodiment is to divide the overlapping part of the two boxes in the vertical coordinate by the average of h1 and h2, i.e., to calculate the degree of overlap in the vertical direction according to the following formula: overlap = |yA2 − yD1| / mean(h1, h2).
A third possible embodiment is to divide the overlapping part of the two boxes in the vertical coordinate by the union of h1 and h2, i.e., to calculate the degree of overlap in the vertical direction according to the following formula: overlap = |yA2 − yD1| / union(h1, h2).
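The three denominators above (maximum, mean, and union of the two heights) can be compared in a small sketch. The coordinate values are illustrative assumptions; yA2 and yD1 bound the overlapping part as in Figure 5.

```python
# Sketch of the three vertical-overlap variants described above.
def vertical_overlap(y_a2, y_d1, h1, h2, mode="max"):
    inter = abs(y_a2 - y_d1)  # length of the vertical overlapping part
    if mode == "max":
        denom = max(h1, h2)
    elif mode == "mean":
        denom = (h1 + h2) / 2
    else:  # "union": total vertical extent covered by the two boxes
        denom = h1 + h2 - inter
    return inter / denom

h1, h2, inter = 32, 28, 24
print(round(vertical_overlap(0, inter, h1, h2, "max"), 3))    # 0.75
print(round(vertical_overlap(0, inter, h1, h2, "mean"), 3))   # 0.8
print(round(vertical_overlap(0, inter, h1, h2, "union"), 3))  # 0.667
```

The three variants only differ in how strict they are: for the same overlapping length, union gives the smallest value and therefore the strictest test.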
Embodiment 4:
In order to make the determination of the degree of overlap in the vertical direction more accurate, on the basis of the above embodiments, in the embodiment of the present invention, determining the degree of overlap in the vertical direction includes:
according to the first height and the first vertical coordinate of the first candidate suggestion box and the second height and the second vertical coordinate of the second candidate suggestion box, determining the degree of overlap in the vertical direction using the following formula: overlap = |yA2 − yD1| / min(h1, h2), where yA2 represents the second vertical coordinate of the second candidate suggestion box, yD1 represents the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively represent the heights of the first candidate suggestion box and the second candidate suggestion box.
In order to accurately determine whether the first candidate suggestion box and the second candidate suggestion box are a pair of connected boxes, in the embodiment of the present invention, after all the candidate suggestion boxes are determined according to the above process, for any two candidate suggestion boxes, the two boxes are respectively set as the first candidate suggestion box and the second candidate suggestion box, and the first height and the first vertical coordinate of the first candidate suggestion box and the second height and the second vertical coordinate of the second candidate suggestion box are identified. First, the absolute value of the difference between the first vertical coordinate and the second vertical coordinate is calculated; second, the smaller of the first height and the second height is determined; finally, the ratio of the absolute value of the difference to the minimum height is calculated, and this ratio is the degree of overlap of the first candidate suggestion box and the second candidate suggestion box in the vertical direction. The larger this value, the more likely the first candidate suggestion box and the second candidate suggestion box are a pair of connected boxes.
Specifically, according to the first height and the first vertical coordinate of the first candidate suggestion box and the second height and the second vertical coordinate of the second candidate suggestion box, the degree of overlap of the first candidate suggestion box and the second candidate suggestion box is determined according to the following formula: overlap = |yA2 − yD1| / min(h1, h2), where yA2 represents the second vertical coordinate of the second candidate suggestion box, yD1 represents the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively represent the heights of the first candidate suggestion box and the second candidate suggestion box.
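The min-height formula of this embodiment reduces to a one-liner; the coordinate values below are illustrative assumptions.

```python
# Sketch of overlap = |yA2 - yD1| / min(h1, h2) from this embodiment.
def overlap_min(y_a2, y_d1, h1, h2):
    return abs(y_a2 - y_d1) / min(h1, h2)

# Two boxes of heights 32 and 28 whose vertical overlapping part is 21:
print(overlap_min(41, 20, 32, 28))  # 0.75
```

Dividing by the smaller height makes this variant the most permissive of the four: a short box fully contained in a tall one scores 1.0.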
Embodiment 5:
In order to make the determination of the shape similarity more accurate, on the basis of the above embodiments, in the embodiment of the present invention, determining the shape similarity includes:
according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, determining the shape similarity using the following formula: similarity = min(h1, h2) / max(h1, h2), where h1 and h2 respectively represent the heights of the first candidate suggestion box and the second candidate suggestion box.
In order to more accurately determine whether the first candidate suggestion box and the second candidate suggestion box are a pair of connected boxes, in the embodiment of the present invention, after all the candidate suggestion boxes are determined according to the above process, for any two candidate suggestion boxes, the two boxes are respectively set as the first candidate suggestion box and the second candidate suggestion box, and the first height of the first candidate suggestion box and the second height of the second candidate suggestion box are identified. First, the minimum of the first height and the second height is determined; second, the maximum of the first height and the second height is determined; finally, the ratio of the minimum to the maximum is determined, and this ratio is the shape similarity of the first candidate suggestion box and the second candidate suggestion box. The larger this value, the more likely the first candidate suggestion box and the second candidate suggestion box are a pair of connected boxes.
Specifically, according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, the shape similarity of the first candidate suggestion box and the second candidate suggestion box is determined according to the following formula: similarity = min(h1, h2) / max(h1, h2), where h1 and h2 respectively represent the heights of the first candidate suggestion box and the second candidate suggestion box.
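The height-ratio similarity defined above can be sketched directly; the sample heights are illustrative assumptions.

```python
# Sketch of similarity = min(h1, h2) / max(h1, h2) from this embodiment.
def shape_similarity(h1, h2):
    return min(h1, h2) / max(h1, h2)

print(shape_similarity(28, 32))  # 0.875
print(shape_similarity(32, 32))  # identical heights give 1.0
```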
Embodiment 6:
In order to locate a newly input image comprising text to be identified, a pre-training process is also included before the locating. On the basis of the above embodiments, in the embodiment of the present invention, the process of training the first model in advance includes:
acquiring sample images, wherein each sample image is labeled with the location information of each suggestion box and the second score that the content included in each position suggestion box is text;
inputting each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and training the first model according to the output of the first model for each sample image.
The purpose of the first model is that an image to be identified is input into the first model to locate the text lines in the image to be identified, i.e., to obtain the location information of each position suggestion box in the image and the second score that the content included in each position suggestion box is text, where the second score is used to determine whether the position suggestion box is a candidate suggestion box. Therefore, before pre-training the first model, the image data first needs to be labeled to obtain sample images. Specifically, each image is labeled with the location information of each position suggestion box and the second score that the content included in each position suggestion box is text.
In a specific implementation, a certain number of batch sample images are input each time, and the model parameters are updated through the steps of forward propagation, error calculation, backward propagation and weight update; batch samples are continuously input and the above steps are repeated, the parameters are constantly adjusted, and the error between the network output and the reference value is corrected, so that the optimized network parameters, i.e., the trained network model, are finally obtained.
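The batch update cycle described above (forward propagation, error calculation, backward propagation, weight update) can be sketched framework-free on a toy one-parameter model y = w·x; this is an illustration of the training loop only, not the patent's detection network.

```python
# Hedged sketch of the batched training cycle: repeat forward pass,
# error computation, gradient computation and weight update per batch.
def train(batches, w=0.0, lr=0.01, epochs=50):
    for _ in range(epochs):
        for batch in batches:                      # input one batch at a time
            grad = 0.0
            for x, y in batch:
                pred = w * x                       # forward propagation
                err = pred - y                     # error calculation
                grad += 2 * err * x / len(batch)   # backward propagation
            w -= lr * grad                         # weight update
    return w

batches = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]  # samples of y = 3x
w = train(batches)
print(round(w, 2))  # converges toward 3.0
```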
In particular, before the network model starts training, the general training method is to set the initial parameter values of the model by random initialization. Although randomly initialized model parameters can theoretically converge to the optimum, the disadvantages are also obvious: the training time required for the model to converge is long, the model easily falls into a local optimum, and a high-precision network model is not easy to obtain. Therefore, in the embodiment of the present invention, a transfer learning method is adopted, in which already-trained model parameters from the prior art are transferred to the new model in place of random initialization of the original model parameters; this method accelerates and optimizes the learning efficiency and convergence speed of the new model. Specifically, the parameters of a text identification model trained on general data are used as the initial parameters of the model of the embodiment of the present invention for training.
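The transfer-learning initialization described above can be sketched as a parameter copy: pretrained values replace random initialization wherever the parameter names and shapes match, and everything else keeps its random value. The dict-of-lists "state" format and the layer names are assumptions for illustration only.

```python
# Hypothetical sketch: seed a new model's parameters from a previously
# trained model instead of keeping random initialization.
import random

def init_from_pretrained(new_state, pretrained_state):
    for name, value in new_state.items():
        src = pretrained_state.get(name)
        if src is not None and len(src) == len(value):
            new_state[name] = list(src)   # transfer matching parameters
    return new_state

random.seed(0)
new_model = {"conv1": [random.random() for _ in range(4)],   # random init
             "head": [random.random() for _ in range(2)]}    # stays random
pretrained = {"conv1": [0.5, 0.1, -0.2, 0.3]}  # general-data text model
state = init_from_pretrained(new_model, pretrained)
print(state["conv1"])  # [0.5, 0.1, -0.2, 0.3]
```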
Further, an incremental learning training method is adopted. Since the quantities of simulated-labeled samples and true labeled data differ greatly, in the embodiment of the present invention, the simulated-labeled samples, of the order of tens of millions, are trained first, and the true labeled data is then learned incrementally. In the situation where true samples increase dynamically, repeated learning of the massive simulated-labeled samples is avoided while the historical training results are fully utilized, and the final model is constantly adjusted and optimized, reducing the demand of model training on time and storage space.
Embodiment 7:
In order to identify the located image, a pre-training process is also included before the identification. On the basis of the above embodiments, in the embodiment of the present invention, the process of training the second model in advance includes:
acquiring each text line labeled in the sample images;
inputting each sample image comprising the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and training the second model according to the output of the second model for each sample image.
The purpose of the second model is that an image to be identified is input into the second model to identify the text lines in the image to be identified, i.e., to obtain the text lines in the image; after the text lines are determined, the text information in the text lines can be obtained through a Chinese dictionary. Therefore, before pre-training the second model, the image data first needs to be labeled to obtain sample images. Specifically, each text line in each image is labeled. Training is then performed in the same training manner as for the first model, and the trained second model is finally obtained for text identification of newly input images.
For example, taking the text identification of an express waybill image as an example, the entire flow of waybill text identification is shown in Figure 6.
First, for the input express waybill image, the waybill region is intercepted using the threshold segmentation and connected-domain analysis methods, and preliminary text-orientation correction is performed on the intercepted waybill region so that the text lines are all in the horizontal direction.
The waybill region image after the above operations is input into the text-line locating module. The specific operation process is as follows: the waybill region image is taken as the input of the convolutional neural network to obtain a feature map; a sliding-window operation is performed on the feature map, and k position suggestion boxes are predicted at each sliding-window center according to certain shapes and sizes; the feature map obtained above is taken as the input of the recurrent neural network to obtain the location information of the position suggestion boxes and the score that the content included in each position suggestion box is text; candidate suggestion boxes are obtained by setting a threshold on the scores of the position suggestion boxes, and the candidate suggestion boxes are merged according to the merging algorithm for candidate suggestion boxes described above to obtain the target suggestion boxes, which are the text-line locating result finally obtained by the locating module.
The text-line locating result obtained after the above operations is input into the text-line identification module. The specific operation process is as follows: the text-line locating result is taken as the input of the convolutional neural network to extract a feature map; the feature map is taken as the input of the recurrent neural network, the output of the convolutional layer is obtained, and the classification score corresponding to the width dimension is calculated accordingly; the output result of the recurrent neural network is converted into a label sequence using the CTC method, and the label sequence is compared with a Chinese dictionary to obtain the final text information. The text information obtained above is sorted according to name, phone, address, etc., so that structured express electronic waybill information for fast reading can be obtained.
Embodiment 8:
Fig. 7 shows a character recognition device provided by an embodiment of the present invention, the device comprising:
an acquisition module 701, configured to input an image comprising text to be identified into a first model, trained in advance, comprising a convolutional neural network and a recurrent neural network, and obtain the location information of each suggestion box included in the image and the first score that the content included in each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation based on the feature map, determines each window feature, and predicts each position suggestion box in each window feature according to preset widths and heights; and takes the window feature sequence corresponding to each row of the feature map as the input of the recurrent neural network, and obtains, based on the recurrent neural network, the location information of each suggestion box included in the image and the first score that the content included in each suggestion box is text;
a screening module 702, configured to screen the candidate suggestion boxes whose first scores are greater than a preset scoring threshold;
a merging module 703, configured to merge the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes;
an identification module 704, configured to input each target suggestion box into a second model, trained in advance, comprising a convolutional neural network and a recurrent neural network, and identify the text included in each target suggestion box.
The device further comprises: a correction module 705, configured to process the image using a threshold segmentation method and a connected-domain analysis method, and perform text-orientation correction on the processed image.
The merging module 703 is specifically configured to, for a first candidate suggestion box among the candidate suggestion boxes, identify whether there exists a second candidate suggestion box whose abscissa distance from the first candidate suggestion box is less than a preset first threshold, whose degree of overlap in the vertical direction is greater than a preset second threshold, and whose shape similarity is greater than a preset third threshold; if such a box exists, merge the first candidate suggestion box and the second candidate suggestion box and take the result as the first candidate suggestion box; if not, take the first candidate suggestion box as a target suggestion box.
The merging module 703 is specifically configured to determine the degree of overlap in the vertical direction according to the first height and the first vertical coordinate of the first candidate suggestion box and the second height and the second vertical coordinate of the second candidate suggestion box, using the following formula: overlap = |yA2 − yD1| / min(h1, h2), where yA2 represents the second vertical coordinate of the second candidate suggestion box, yD1 represents the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively represent the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
The merging module 703 is specifically configured to determine the shape similarity according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, using the following formula: similarity = min(h1, h2) / max(h1, h2), where h1 and h2 respectively represent the heights of the first candidate suggestion box and the second candidate suggestion box.
The device further comprises:
a first training module 706, configured to acquire sample images, wherein each sample image is labeled with the location information of each suggestion box and the second score that the content included in each position suggestion box is text; and to input each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and train the first model according to the output of the first model for each sample image.
The device further comprises:
a second training module 707, configured to acquire each text line labeled in the sample images; and to input each sample image comprising the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and train the second model according to the output of the second model for each sample image.
In conclusion the embodiment of the present invention provides a kind of character recognition method and device, comprising: will include text to be identified Image be input in the first model comprising convolutional neural networks and Recognition with Recurrent Neural Network that training in advance is completed, obtain image In include each Suggestion box location information and each Suggestion box in include content be text the first score value, wherein First model obtains the characteristic pattern of image, carries out sliding window operation based on characteristic pattern, determines each window feature, special in each window According to preset width and each position Suggestion box of Height Prediction in sign;The corresponding window feature sequence of every row of characteristic pattern is made For the input of Recognition with Recurrent Neural Network, the location information of each Suggestion box for including in image and every is obtained based on Recognition with Recurrent Neural Network The content for including in a Suggestion box is the first score value of text;Identify that the first score value is greater than the default candidate for scoring threshold value and builds Discuss frame;According to the position of each candidate Suggestion box, candidate Suggestion box is merged to obtain target Suggestion box;Each target is built View frame is input in the second model comprising convolutional neural networks and Recognition with Recurrent Neural Network that training in advance is completed, and identifies each mesh The text for including in mark Suggestion box.
Due in embodiments of the present invention, by the image comprising text to be identified be input to that training in advance completes comprising volume In first model of product neural network and Recognition with Recurrent Neural Network, the location information of each Suggestion box for including in image and every is obtained The content for including in a Suggestion box is the first score value of text.First model can effectively obtain the upper and lower of text sequence Literary information simultaneously adds it in position fixing process, specifically, the score value that the white space Suggestion box between same row text is set It can be promoted because of the sequence signature of front and back text, the line of text position frame obtained is finally made to be more in line with text sequence Position feature, line of text positioning result are more accurate.Secondly, including by what each target Suggestion box was input to training completion in advance In second model of convolutional neural networks and Recognition with Recurrent Neural Network, the text for including in each target Suggestion box is identified.This second Model is due to can be enhanced the extraction of word sequence contextual information comprising Recognition with Recurrent Neural Network, so that the prediction of text sequence As a result more accurate.
The present invention be referring to according to the method for the embodiment of the present invention, the process of equipment (system) and computer program product Figure and/or block diagram describe.It should be understood that can be realized by computer program instructions each in flowchart and/or the block diagram The combination of process and/or box in process and/or box and flowchart and/or the block diagram.It can provide these computers Processor of the program instruction to general purpose computer, special purpose computer, Embedded Processor or other programmable data processing devices To generate a machine, so that generating use by the instruction that computer or the processor of other programmable data processing devices execute In the dress for realizing the function of specifying in one or more flows of the flowchart and/or one or more blocks of the block diagram It sets.
These computer program instructions, which may also be stored in, is able to guide computer or other programmable data processing devices with spy Determine in the computer-readable memory that mode works, so that it includes referring to that instruction stored in the computer readable memory, which generates, Enable the manufacture of device, the command device realize in one box of one or more flows of the flowchart and/or block diagram or The function of being specified in multiple boxes.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to encompass them.

Claims (14)

1. A character recognition method, characterized in that the method comprises:
inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the location information of each suggestion box contained in the image and a first score indicating that the content contained in each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts each position suggestion box from each window feature according to a preset width and a preset height; and wherein the window feature sequence corresponding to each row of the feature map is taken as the input of the recurrent neural network, and the location information of each suggestion box contained in the image and the first score indicating that the content contained in each suggestion box is text are obtained based on the recurrent neural network;
screening, as candidate suggestion boxes, the suggestion boxes whose first score is greater than a preset scoring threshold;
merging the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes; and
inputting each target suggestion box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognizing the text contained in each target suggestion box.
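The sliding-window prediction with a preset width in claim 1 resembles anchor generation in CTPN-style text detectors: every feature-map position proposes fixed-width boxes at several preset heights. A minimal sketch; the 16-pixel stride and the particular height set are assumptions for illustration, not values stated in the claim:

```python
def column_anchors(fm_width, fm_height, stride=16, heights=(11, 16, 23, 33, 48)):
    """Generate fixed-width candidate suggestion boxes (x, y, w, h) in image
    coordinates: one box per preset height, centred on every feature-map cell."""
    anchors = []
    for row in range(fm_height):
        for col in range(fm_width):
            cx = col * stride + stride / 2.0  # window centre in the image
            cy = row * stride + stride / 2.0
            for h in heights:
                anchors.append((cx - stride / 2.0, cy - h / 2.0,
                                float(stride), float(h)))
    return anchors
```

In the claimed method, the recurrent network over each feature-map row would then score these boxes and refine their vertical coordinates.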
2. The method according to claim 1, characterized in that before inputting the image containing text to be recognized into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, the method further comprises:
processing the image using a threshold segmentation method and a connected component analysis method; and
performing text orientation correction on the processed image.
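The preprocessing in claim 2 pairs threshold segmentation with connected-component analysis. A pure-Python sketch of both steps on a grayscale image given as a nested list; a practical implementation would use OpenCV or similar, and the fixed threshold of 128 is an illustrative stand-in for whatever segmentation rule is actually used:

```python
from collections import deque

def binarize(gray, thresh=128):
    """Global threshold segmentation: mark dark pixels (likely ink) as 1."""
    return [[1 if px < thresh else 0 for px in row] for row in gray]

def connected_components(binary):
    """4-connected component labelling via breadth-first flood fill.
    Returns (label_map, component_count)."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if binary[i][j] and not labels[i][j]:
                n += 1
                q = deque([(i, j)])
                labels[i][j] = n
                while q:
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = n
                            q.append((ny, nx))
    return labels, n
```

The resulting components could feed the orientation-correction step, e.g. by estimating a dominant angle from component centroids.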
3. The method according to claim 1, characterized in that merging the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes comprises:
for a first candidate suggestion box among the candidate suggestion boxes, identifying whether there exists a second candidate suggestion box whose horizontal distance to the first candidate suggestion box is less than a preset first threshold, whose degree of overlap with the first candidate suggestion box in the vertical direction is greater than a preset second threshold, and whose shape similarity to the first candidate suggestion box is greater than a preset third threshold; if such a second candidate suggestion box exists, merging the first candidate suggestion box and the second candidate suggestion box into a new first candidate suggestion box; and if not, taking the first candidate suggestion box as a target suggestion box.
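The merging procedure above can be sketched as a greedy left-to-right pass that unions compatible neighbouring boxes into text-line boxes. This is a simplified illustration: the three thresholds are placeholders, and the vertical-overlap test here uses y-range intersection rather than the specific corner-point formula given in claim 4:

```python
def merge_text_boxes(boxes, max_gap=16, min_overlap=0.5, min_sim=0.7):
    """Greedily merge candidate boxes (x, y, w, h) into text-line boxes.
    Two boxes merge when the horizontal gap is small, the vertical overlap
    is large, and their heights are similar."""
    lines = []
    for x, y, w, h in sorted(boxes):                 # left-to-right by x
        if lines:
            lx, ly, lw, lh = lines[-1]
            gap = x - (lx + lw)                      # horizontal distance
            inter = min(ly + lh, y + h) - max(ly, y) # vertical intersection
            overlap = max(inter, 0) / min(lh, h)
            sim = min(lh, h) / max(lh, h)            # shape similarity
            if gap < max_gap and overlap > min_overlap and sim > min_sim:
                nx, ny = min(lx, x), min(ly, y)
                nx2 = max(lx + lw, x + w)
                ny2 = max(ly + lh, y + h)
                lines[-1] = (nx, ny, nx2 - nx, ny2 - ny)  # bounding union
                continue
        lines.append((x, y, w, h))
    return lines
```

Adjacent fixed-width proposals along a line collapse into one box, while distant boxes stay separate and become their own target suggestion boxes.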
4. The method according to claim 3, characterized in that determining the degree of overlap in the vertical direction comprises:
determining the degree of overlap in the vertical direction according to a first height and a first vertical coordinate of the first candidate suggestion box and a second height and a second vertical coordinate of the second candidate suggestion box, using the formula overlap = |yA2 - yD1| / min(h1, h2), wherein yA2 represents the second vertical coordinate of the second candidate suggestion box, yD1 represents the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively represent the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
5. The method according to claim 3, characterized in that determining the shape similarity comprises:
determining the shape similarity according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, using the formula similarity = min(h1, h2) / max(h1, h2), wherein h1 and h2 respectively represent the heights of the first candidate suggestion box and the second candidate suggestion box.
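The two per-pair measures in claims 4 and 5 translate directly into code; the variable names follow the claims' notation:

```python
def vertical_overlap(y_a2, y_d1, h1, h2):
    """Claim 4's measure: overlap = |yA2 - yD1| / min(h1, h2)."""
    return abs(y_a2 - y_d1) / min(h1, h2)

def shape_similarity(h1, h2):
    """Claim 5's measure: similarity = min(h1, h2) / max(h1, h2).
    Equals 1.0 when the two boxes have identical heights."""
    return min(h1, h2) / max(h1, h2)
```

Both quantities lie in a bounded, easily thresholded range, which is what makes the preset second and third thresholds of claim 3 meaningful.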
6. The method according to claim 1, characterized in that the process of pre-training the first model comprises:
obtaining sample images, wherein each sample image is annotated with the location information of each suggestion box and a second score indicating that the content contained in each position suggestion box is text; and
inputting each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and training the first model according to the output of the first model for each sample image.
7. The method according to claim 1, characterized in that the process of pre-training the second model comprises:
obtaining each text line annotated in sample images; and
inputting each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and training the second model according to the output of the second model for each sample image.
8. A character recognition device, characterized in that the device comprises:
an obtaining module, configured to input an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtain the location information of each suggestion box contained in the image and a first score indicating that the content contained in each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts each position suggestion box from each window feature according to a preset width and a preset height; and wherein the window feature sequence corresponding to each row of the feature map is taken as the input of the recurrent neural network, and the location information of each suggestion box contained in the image and the first score indicating that the content contained in each suggestion box is text are obtained based on the recurrent neural network;
a screening module, configured to screen, as candidate suggestion boxes, the suggestion boxes whose first score is greater than a preset scoring threshold;
a merging module, configured to merge the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes; and
an identification module, configured to input each target suggestion box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognize the text contained in each target suggestion box.
9. The device according to claim 8, characterized in that the device further comprises:
a correction module, configured to process the image using a threshold segmentation method and a connected component analysis method, and to perform text orientation correction on the processed image.
10. The device according to claim 8, characterized in that the merging module is specifically configured to: for a first candidate suggestion box among the candidate suggestion boxes, identify whether there exists a second candidate suggestion box whose horizontal distance to the first candidate suggestion box is less than a preset first threshold, whose degree of overlap with the first candidate suggestion box in the vertical direction is greater than a preset second threshold, and whose shape similarity to the first candidate suggestion box is greater than a preset third threshold; if such a second candidate suggestion box exists, merge the first candidate suggestion box and the second candidate suggestion box into a new first candidate suggestion box; and if not, take the first candidate suggestion box as a target suggestion box.
11. The device according to claim 10, characterized in that the merging module is specifically configured to determine the degree of overlap in the vertical direction according to a first height and a first vertical coordinate of the first candidate suggestion box and a second height and a second vertical coordinate of the second candidate suggestion box, using the formula overlap = |yA2 - yD1| / min(h1, h2), wherein yA2 represents the second vertical coordinate of the second candidate suggestion box, yD1 represents the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively represent the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
12. The device according to claim 10, characterized in that the merging module is specifically configured to determine the shape similarity according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, using the formula similarity = min(h1, h2) / max(h1, h2), wherein h1 and h2 respectively represent the heights of the first candidate suggestion box and the second candidate suggestion box.
13. The device according to claim 8, characterized in that the device further comprises:
a first training module, configured to obtain sample images, wherein each sample image is annotated with the location information of each suggestion box and a second score indicating that the content contained in each position suggestion box is text; and to input each sample image into the first model comprising a convolutional neural network and a recurrent neural network and train the first model according to the output of the first model for each sample image.
14. The device according to claim 8, characterized in that the device further comprises:
a second training module, configured to obtain each text line annotated in sample images; and to input each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network and train the second model according to the output of the second model for each sample image.
CN201811126275.4A 2018-09-26 2018-09-26 A kind of character recognition method and device Pending CN110245545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811126275.4A CN110245545A (en) 2018-09-26 2018-09-26 A kind of character recognition method and device


Publications (1)

Publication Number Publication Date
CN110245545A true CN110245545A (en) 2019-09-17

Family

ID=67882838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811126275.4A Pending CN110245545A (en) 2018-09-26 2018-09-26 A kind of character recognition method and device

Country Status (1)

Country Link
CN (1) CN110245545A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570497A (en) * 2016-10-08 2017-04-19 中国科学院深圳先进技术研究院 Text detection method and device for scene image
CN108073898A (en) * 2017-12-08 2018-05-25 腾讯科技(深圳)有限公司 Number of people area recognizing method, device and equipment
CN108288078A (en) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 Character identifying method, device and medium in a kind of image
CN108564084A (en) * 2018-05-08 2018-09-21 北京市商汤科技开发有限公司 character detecting method, device, terminal and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHI TIAN et al.: "Detecting Text in Natural Image with Connectionist Text Proposal Network", 14th European Conference on Computer Vision (ECCV) *
CAI Wenzhe et al.: "Image Text Detection Method Based on Dual-Threshold Gradient Pattern", Computer Science *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991435A (en) * 2019-11-27 2020-04-10 南京邮电大学 Express waybill key information positioning method and device based on deep learning
CN111310762A (en) * 2020-03-16 2020-06-19 天津得迈科技有限公司 Intelligent medical bill identification method based on Internet of things
CN111666937A (en) * 2020-04-17 2020-09-15 广州多益网络股份有限公司 Method and system for recognizing text in image
CN111611985A (en) * 2020-04-23 2020-09-01 中南大学 OCR recognition method based on model fusion
CN111639566A (en) * 2020-05-19 2020-09-08 浙江大华技术股份有限公司 Method and device for extracting form information
CN112016547A (en) * 2020-08-20 2020-12-01 上海天壤智能科技有限公司 Image character recognition method, system and medium based on deep learning
CN113139539A (en) * 2021-03-16 2021-07-20 中国科学院信息工程研究所 Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary
CN113762237A (en) * 2021-04-26 2021-12-07 腾讯科技(深圳)有限公司 Text image processing method, device and equipment and storage medium
CN113762237B (en) * 2021-04-26 2023-08-18 腾讯科技(深圳)有限公司 Text image processing method, device, equipment and storage medium
CN113392844A (en) * 2021-06-15 2021-09-14 重庆邮电大学 Deep learning-based method for identifying text information on medical film

Similar Documents

Publication Publication Date Title
CN110245545A (en) A kind of character recognition method and device
Yuliang et al. Detecting curve text in the wild: New dataset and new solution
CN107871124B (en) A kind of Remote Sensing Target detection method based on deep neural network
CN108960245B (en) Tire mold character detection and recognition method, device, equipment and storage medium
CN106548151B (en) Target analyte detection track identification method and system towards intelligent robot
CN110991435A (en) Express waybill key information positioning method and device based on deep learning
CN110647829A (en) Bill text recognition method and system
CN108509839A (en) One kind being based on the efficient gestures detection recognition methods of region convolutional neural networks
CN108875600A (en) A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO
CN106446896A (en) Character segmentation method and device and electronic equipment
CN107679997A (en) Method, apparatus, terminal device and storage medium are refused to pay in medical treatment Claims Resolution
CN106022363B (en) A kind of Chinese text recognition methods suitable under natural scene
CN111259940A (en) Target detection method based on space attention map
CN110120065B (en) Target tracking method and system based on hierarchical convolution characteristics and scale self-adaptive kernel correlation filtering
CN109284779A (en) Object detecting method based on the full convolutional network of depth
CN107609575A (en) Calligraphy evaluation method, calligraphy evaluating apparatus and electronic equipment
CN110135430A (en) A kind of aluminium mold ID automatic recognition system based on deep neural network
CN110796018A (en) Hand motion recognition method based on depth image and color image
CN110610210B (en) Multi-target detection method
CN107766349A (en) A kind of method, apparatus, equipment and client for generating text
CN104881673B (en) The method and system of pattern-recognition based on information integration
CN109343920A (en) A kind of image processing method and its device, equipment and storage medium
CN110135446A (en) Method for text detection and computer storage medium
CN110287952A (en) A kind of recognition methods and system for tieing up sonagram piece character
CN112734803A (en) Single target tracking method, device, equipment and storage medium based on character description

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination