CN110245545A - A kind of character recognition method and device - Google Patents
- Publication number: CN110245545A
- Application number: CN201811126275.4A
- Authority
- CN
- China
- Prior art keywords
- suggestion box
- candidate
- text
- model
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
Abstract
The invention discloses a character recognition method and device, for solving the problem that recognition results for text in images are not sufficiently accurate. The method comprises: inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the location information of each suggestion box contained in the image and a first score indicating that the content of each suggestion box is text; screening out the candidate suggestion boxes whose score exceeds a preset scoring threshold; merging the candidate suggestion boxes according to their positions to obtain target suggestion boxes; and inputting each target suggestion box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, to recognize the text contained in each target suggestion box.
Description
Technical field
The present invention relates to the technical fields of deep learning and character recognition, and in particular to a character recognition method and device.
Background art
With the rapid development of image-capture devices, more and more image information needs to be managed, and using Internet technology to manage image information automatically is currently the most effective means of doing so.
Before the text in an image can be recognized, it must first be located. Current methods for locating text in images fall broadly into two categories. The first is bounding-box regression based on networks such as Faster RCNN (Faster Region Convolutional Neural Networks), YOLO (You Only Look Once) or SSD (Single Shot MultiBox Detector); such methods directly output text-line scores and bounding boxes. The second is segmentation based on fully convolutional networks (Fully Convolutional Networks, FCN); such methods predict pixel-level text classification results and post-process those results to generate enclosing rectangles. The Faster RCNN method, which offers both high speed and high precision, uses a Region Proposal Network (RPN) to generate candidate boxes for different text regions on the convolutional feature map, then performs classification and box regression on the candidate regions with a neural network. However, because the length of text lines varies drastically, conventional candidate-box schemes struggle to locate such objects accurately; at the same time, owing to computational cost and real-time requirements, the required precision cannot be reached simply by adding more candidate-box sizes and shapes, so the existing RPN scheme needs improvement.
In the area of image text recognition, the implementation closest to the present invention is the patent "A complex-script recognition method based on deep learning" filed by Chengdu Shulian Mingpin Technology Co., Ltd. That scheme uses a single convolutional neural network to recognize individual characters; it does not take into account the context and semantic information contained in the text sequence, so its recognition results are not sufficiently accurate.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a character recognition method and device, for solving the problem that recognition results for text in images are not sufficiently accurate.
An embodiment of the invention provides a character recognition method, comprising:

inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the location information of each suggestion box contained in the image and a first score indicating that the content of each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts suggestion boxes at each position in each window feature according to preset widths and heights; the window-feature sequence corresponding to each row of the feature map serves as the input of the recurrent neural network, on the basis of which the location information of each suggestion box contained in the image and the first score that the content of each suggestion box is text are obtained;

screening out the candidate suggestion boxes whose first score exceeds a preset scoring threshold;

merging the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes;

inputting each target suggestion box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognizing the text contained in each target suggestion box.
Further, before inputting the image containing text to be recognized into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, the method further comprises: processing the image using threshold segmentation and connected-domain analysis, and performing text-orientation correction on the processed image.
Further, merging the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes comprises: for a first candidate suggestion box among the candidate suggestion boxes, identifying whether there exists a second candidate suggestion box whose horizontal distance from the first candidate suggestion box is less than a preset first threshold, whose vertical-direction overlap with it is greater than a preset second threshold, and whose shape similarity to it is greater than a preset third threshold; if so, merging the first candidate suggestion box and the second candidate suggestion box into a new first candidate suggestion box; if not, taking the first candidate suggestion box as a target suggestion box.
Further, determining the vertical-direction overlap comprises: according to the first height and first vertical coordinate of the first candidate suggestion box and the second height and second vertical coordinate of the second candidate suggestion box, determining the vertical-direction overlap using the formula overlap = |yA2 − yD1| / min(h1, h2), where yA2 represents the second vertical coordinate of the second candidate suggestion box, yD1 represents the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively represent the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
Further, determining the shape similarity comprises: according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, determining the shape similarity using the formula similarity = min(h1, h2) / max(h1, h2), where h1 and h2 respectively represent the heights of the first candidate suggestion box and the second candidate suggestion box.
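The two measures above, together with the merge criterion, can be sketched in Python. This is a minimal illustration, not the patent's implementation: the box layout (x, y, w, h), the choice of edges fed to the overlap formula, and the three threshold values are assumptions, since the patent only states that the thresholds are preset.

```python
def vertical_overlap(y_d1, h1, y_a2, h2):
    # overlap = |yA2 - yD1| / min(h1, h2), the formula given in the text
    return abs(y_a2 - y_d1) / min(h1, h2)

def shape_similarity(h1, h2):
    # similarity = min(h1, h2) / max(h1, h2)
    return min(h1, h2) / max(h1, h2)

def should_merge(a, b, t_dist=20, t_overlap=0.6, t_sim=0.6):
    """Merge test for boxes a, b given as (x, y, w, h) with (x, y) the
    top-left corner. The vertical coordinates passed to vertical_overlap
    (a's top edge vs. b's bottom edge) and all three thresholds are
    illustrative assumptions."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return (abs(ax - bx) < t_dist
            and vertical_overlap(ay, ah, by + bh, bh) > t_overlap
            and shape_similarity(ah, bh) > t_sim)

# Two neighbouring boxes of similar height pass all three tests:
should_merge((0, 0, 16, 32), (14, 2, 16, 30))  # -> True
```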
Further, the process of pre-training the first model comprises: obtaining sample images, in which the location information of each suggestion box and a second score indicating that the content of each position suggestion box is text are annotated; and inputting each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and training the first model according to its output for each sample image.
Further, the process of pre-training the second model comprises: obtaining each text line annotated in the sample images; and inputting each sample image containing a corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and training the second model according to its output for each sample image.
An embodiment of the invention provides a character recognition device, comprising:

an acquisition module, for inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining the location information of each suggestion box contained in the image and a first score indicating that the content of each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts suggestion boxes at each position in each window feature according to preset widths and heights; the window-feature sequence corresponding to each row of the feature map serves as the input of the recurrent neural network submodel, on the basis of which the location information of each suggestion box contained in the image and the first score that the content of each suggestion box is text are obtained;

a screening module, for screening out the candidate suggestion boxes whose first score exceeds a preset scoring threshold;

a merging module, for merging the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes;

a recognition module, for inputting each target suggestion box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognizing the text contained in each target suggestion box.
Further, the device further comprises: a correction module, for processing the image using threshold segmentation and connected-domain analysis, and performing text-orientation correction on the processed image.
Further, the merging module is specifically configured to: for a first candidate suggestion box among the candidate suggestion boxes, identify whether there exists a second candidate suggestion box whose horizontal distance from the first candidate suggestion box is less than a preset first threshold, whose vertical-direction overlap with it is greater than a preset second threshold, and whose shape similarity to it is greater than a preset third threshold; if so, merge the first candidate suggestion box and the second candidate suggestion box into a new first candidate suggestion box; if not, take the first candidate suggestion box as a target suggestion box.
Further, the merging module is specifically configured to determine the vertical-direction overlap from the first height and first vertical coordinate of the first candidate suggestion box and the second height and second vertical coordinate of the second candidate suggestion box, using the formula overlap = |yA2 − yD1| / min(h1, h2), where yA2 represents the second vertical coordinate of the second candidate suggestion box, yD1 represents the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively represent the heights of the first and second candidate suggestion boxes.
Further, the merging module is specifically configured to determine the shape similarity from the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, using the formula similarity = min(h1, h2) / max(h1, h2), where h1 and h2 respectively represent the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
Further, the device further comprises: a first training module, for obtaining sample images in which the location information of each suggestion box and a second score indicating that the content of each position suggestion box is text are annotated, inputting each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and training the first model according to its output for each sample image.
Further, the device further comprises: a second training module, for obtaining each text line annotated in the sample images, inputting each sample image containing a corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and training the second model according to its output for each sample image.
Embodiments of the present invention provide a character recognition method and device. The method inputs the image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtains the location information of each suggestion box contained in the image and a first score indicating that the content of each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts suggestion boxes at each position in each window feature according to preset widths and heights; the window-feature sequence corresponding to each row of the feature map serves as the input of the recurrent neural network, on the basis of which the location information and first scores of the suggestion boxes are obtained. The candidate suggestion boxes whose first score exceeds the preset scoring threshold are then identified; the candidate suggestion boxes are merged according to their positions to obtain target suggestion boxes; and each target suggestion box is input into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, which recognizes the text contained in each target suggestion box.
Due in embodiments of the present invention, by the image comprising text to be identified be input to that training in advance completes comprising volume
In first model of product neural network and Recognition with Recurrent Neural Network, the location information of each Suggestion box for including in image and every is obtained
The content for including in a Suggestion box is the first score value of text.First model can effectively obtain the upper and lower of text sequence
Literary information simultaneously adds it in position fixing process, specifically, the score value that the white space Suggestion box between same row text is set
It can be promoted because of the sequence signature of front and back text, the line of text position frame obtained is finally made to be more in line with text sequence
Position feature, line of text positioning result are more accurate.Secondly, including by what each target Suggestion box was input to training completion in advance
In second model of convolutional neural networks and Recognition with Recurrent Neural Network, the text for including in each target Suggestion box is identified.This second
Model is due to can be enhanced the extraction of word sequence contextual information comprising Recognition with Recurrent Neural Network, so that the prediction of text sequence
As a result more accurate.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow diagram of a character recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the specific execution process of the text-line localization operation provided by Embodiment 1;
Fig. 3 is a schematic diagram of the effect of the recurrent neural network operation provided by Embodiment 1;
Fig. 4 is a schematic diagram of the specific execution process of the text-line recognition operation provided by Embodiment 1;
Fig. 5 is a schematic diagram of the predicted position information of suggestion boxes provided by Embodiment 3;
Fig. 6 is a schematic diagram of the overall flow of express-waybill text recognition provided by Embodiment 7;
Fig. 7 is a schematic structural diagram of a character recognition device provided by Embodiment 8.
Specific embodiment
The present invention is described below in further detail with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment 1:
Fig. 1 is a schematic flow diagram of a character recognition method provided by an embodiment of the present invention. The method includes the following steps:

S101: input the image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtain the location information of each suggestion box contained in the image and a first score indicating that the content of each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts suggestion boxes at each position in each window feature according to preset widths and heights; the window-feature sequence corresponding to each row of the feature map serves as the input of the recurrent neural network, on the basis of which the location information of each suggestion box contained in the image and the first score that the content of each suggestion box is text are obtained.
Because the text in an image may be distributed anywhere in the image, and possibly only a small region of the image contains the text to be recognized, the text lines in the image must first be located before the text can be recognized, yielding the location information of each text line in the image. The text contained in those lines is then recognized according to the located text-line positions.
The two neural networks included in the first model are a convolutional neural network and a recurrent neural network. Because the operations of both networks serve the same purpose, locating the text lines in the image, the two networks are collectively referred to as the first model. The image containing text to be recognized is input into the convolutional neural network, and after several layers of convolution and pooling the feature map of the image is obtained. A sliding-window convolution is performed on this feature map to obtain window features, and during the sliding-window operation each sliding-window center predicts position suggestion boxes according to the set widths and heights. The window features obtained by the sliding-window convolution are input into the recurrent neural network, which finally outputs the coordinate information of each position suggestion box and the first score that the box contains text; this score is used to judge whether the position suggestion box is a candidate suggestion box.
When determining position suggestion boxes, the embodiment of the present invention predicts them at each sliding-window center using set widths and heights. Because the height of the text in a text line is not fixed, prior-art methods that generate suggestion boxes of fixed size and shape lead to inaccurate text-line localization; the suggestion-box generation method provided by the embodiment of the present invention solves this problem. Moreover, judging whether a position suggestion box is a candidate suggestion box against a set threshold removes redundant position suggestion boxes, which reduces the computational overhead that would otherwise result from increasing the number of suggestion-box sizes and shapes. At the same time, text lines are located by introducing a recurrent neural network model into the first model; because a recurrent neural network has memory, it can effectively capture the context information of the text sequence and incorporate it into the localization process. In practice, one possible situation is that the score of a suggestion box covering the blank space between characters in the same row is raised by the sequence features of the surrounding text, so that the resulting text-line suggestion boxes better match the positional characteristics of text sequences and localization becomes more accurate.
For example, taking express-waybill image text recognition as an example, the specific execution process of the text-line localization operation on the waybill image to be recognized is shown in Fig. 2, where Convx_x represents the convolution operation of a given module and the dotted connections between convolution modules represent pooling operations. BLSTM (Bidirectional Long Short-term Memory) is a bidirectional long short-term memory neural network, and FC (Fully Connected) refers to the fully connected layer. In total, k position suggestion boxes are predicted on feature map conv5_3; after the BLSTM and FC layers, the predicted location information of each suggestion box and the score that the content of each suggestion box is text are output.
The image to be recognized is first input into the pre-trained VGGNet-based convolutional neural network to extract image features. The network alternates convolution and pooling: the image passes through 13 convolutional layers of size 3 × 3 and 4 max-pooling layers of size 2 × 2, finally producing a feature map conv5_3 of shape W × H × C, where W, H and C respectively represent the width, height and number of channels of the feature map.
On the feature map conv5_3 obtained above, a sliding-window convolution with stride 1 and kernel size 3 × 3 is performed, and at each sliding-window center k position suggestion boxes are predicted according to certain shapes and sizes.

In a specific implementation, k is set to 10, and the shapes and sizes are chosen as follows: the width is fixed at a small scale and only the height varies. Specifically, the fixed width can be set to 16 pixels, and the heights are generated by repeatedly reducing 283 pixels by a ratio of 0.7 down to 11 pixels, which yields the 10 position suggestion boxes.
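The anchor scheme just described can be sketched as follows: a fixed 16-pixel width, and 10 heights obtained by repeatedly scaling 283 pixels by 0.7. The rounding to whole pixels is an assumption; the patent only gives the end points 283 and 11.

```python
def anchor_heights(h_max=283.0, ratio=0.7, k=10):
    """Generate k anchor heights by repeatedly scaling h_max by `ratio`,
    as described: from 283 px down to ~11 px over 10 steps."""
    return [round(h_max * ratio ** i) for i in range(k)]

heights = anchor_heights()           # [283, 198, 139, 97, 68, 48, 33, 23, 16, 11]
anchors = [(16, h) for h in heights]  # fixed 16-px width, varying height
```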
Next, the t-th 3 × 3 × C window feature of the feature map conv5_3 obtained by the sliding-window convolution is input into the BLSTM neural network as a feature sequence, and the internal state Ht of the hidden layer is cyclically updated according to the following formula:

Ht = φ(Ht−1, Xt), t = 1, 2, ..., W

where Xt ∈ R^(3×3×C) is the feature obtained from the t-th sliding window in each row of feature map conv5_3, W is the width of feature map conv5_3, C is the number of channels of feature map conv5_3, and φ is a nonlinear function. After the effective context information is obtained, the FC layer is connected to output the location information of each suggestion box and the first score that the content of each suggestion box is text.
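The patent uses a BLSTM for this recurrence; purely to illustrate the data flow of Ht = φ(Ht−1, Xt) over one feature-map row, here is a minimal unidirectional tanh cell in NumPy. All dimensions and weights are made-up stand-ins, not the patent's.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D = 3 * 3 * 8, 32                       # flattened 3x3xC window feature, hidden size (illustrative)
Wx = rng.standard_normal((D, C)) * 0.01    # input-to-hidden weights (random stand-ins)
Wh = rng.standard_normal((D, D)) * 0.01    # hidden-to-hidden weights (random stand-ins)

def run_row(X):
    """X: (W, C) window-feature sequence for one row of the feature map.
    Applies H_t = tanh(Wh @ H_{t-1} + Wx @ X_t) for t = 1..W, so each
    state carries context from the windows to its left."""
    H = np.zeros(D)
    states = []
    for x in X:
        H = np.tanh(Wh @ H + Wx @ x)
        states.append(H)
    return np.stack(states)

states = run_row(rng.standard_normal((20, C)))  # 20 sliding windows in the row
```

A BLSTM additionally runs the same kind of recurrence right-to-left and concatenates both states before the FC layer.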
For example, Fig. 3 is an example from a specific implementation; it shows the suggestion boxes predicted after the BLSTM operation and the first score corresponding to each suggestion box. The boxes in the third row represent the suggestion boxes, the numbers in the second row represent the first scores of the suggestion boxes, and the numbers in the first row represent the position index values of the corresponding suggestion boxes, where the index values are used to traverse the suggestion boxes.
S102: screen out the candidate suggestion boxes whose first score exceeds the preset scoring threshold.

Among the suggestion boxes determined above, there may be suggestion boxes that contain no text. Therefore, based on the score obtained in the previous step that the content of each suggestion box is text, redundant suggestion boxes are eliminated by a preset scoring threshold to obtain the candidate suggestion boxes. Specifically, if the score of a suggestion box exceeds the preset scoring threshold, the suggestion box is regarded as a candidate suggestion box; otherwise the suggestion box is regarded as redundant and is removed.

For example, in a specific implementation the preset scoring threshold can be set to 0.7: if a suggestion box's score is greater than 0.7 it is a candidate suggestion box; if not, it is regarded as a redundant suggestion box and eliminated.
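The screening step amounts to a simple threshold filter; a minimal sketch, with the dict layout of a proposal being an illustrative assumption:

```python
def filter_candidates(proposals, score_threshold=0.7):
    """Keep proposals whose text score exceeds the preset threshold (0.7 in
    the example above); lower-scoring boxes are discarded as redundant."""
    return [p for p in proposals if p["score"] > score_threshold]

proposals = [{"box": (0, 0, 16, 32), "score": 0.91},
             {"box": (16, 0, 16, 32), "score": 0.40}]
candidates = filter_candidates(proposals)  # only the 0.91 box survives
```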
S103: merge the candidate suggestion boxes according to the position of each candidate suggestion box to obtain the target suggestion boxes.

To locate the text corresponding to each row in the image, the candidate suggestion boxes obtained above must be merged to find the target suggestion boxes. Therefore, according to the location information of the candidate suggestion boxes found above, the candidate suggestion boxes are merged one by one to find the target suggestion boxes.

For the process of merging two candidate suggestion boxes, one possible implementation is to take the minimum circumscribed rectangle of the two candidate suggestion boxes as the box obtained after merging, i.e. the target suggestion box. Another possible implementation, for any two candidate suggestion boxes, is to judge whether their distance in the horizontal direction is less than a set threshold and, if so, to merge the two candidate suggestion boxes.
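The minimum-circumscribed-rectangle merge mentioned above can be sketched directly; the (x, y, w, h) top-left box layout is an illustrative assumption:

```python
def merge_boxes(a, b):
    """Minimum circumscribed (bounding) rectangle of two boxes given as
    (x, y, w, h) with (x, y) the top-left corner."""
    x1 = min(a[0], b[0])
    y1 = min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])  # rightmost edge
    y2 = max(a[1] + a[3], b[1] + b[3])  # bottom edge
    return (x1, y1, x2 - x1, y2 - y1)

merge_boxes((0, 0, 16, 32), (14, 2, 16, 30))  # -> (0, 0, 30, 32)
```

Applied left to right along a row, repeated pairwise merging grows one target box per text line.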
S104: input each target suggestion box into the pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognize the text contained in each target suggestion box.

After the text-line localization result is obtained, the text in each text line must be recognized, and the recognition accuracy is crucial to the automatic management of images. Therefore, after the text lines in the image have been located by the above operations and the target suggestion boxes obtained, each target suggestion box obtained above is input into the pre-trained second model comprising a convolutional neural network and a recurrent neural network, which recognizes the text information in the target suggestion box.

The two neural networks included in the second model are a convolutional neural network and a recurrent neural network. Because the operations of both networks serve the same purpose, recognizing the text information in the image, the two networks are collectively referred to as the second model.

Each target suggestion box obtained above serves as the input of the convolutional neural network; several convolution and max-pooling operations produce the image convolution features, which in turn serve as the input of the recurrent neural network, yielding the output of the convolutional layer and the corresponding classification scores along the width dimension. Using the connectionist temporal classification (CTC) method, the output of the recurrent neural network is converted into a label sequence, the probability of a label sequence is defined from the per-frame predictions, and the negative log-likelihood of this probability serves as the objective function for training the network. In this way the image can be directly associated with its label sequence, without annotating individual characters.
For example, taking an express waybill image as an example, the input image is a target suggestion box obtained by the above operations, and the text-line recognition procedure for the waybill image to be recognized is shown in Fig. 4, where Convolution denotes a 3×3 convolutional layer, Dense Blocks denote combined 1×1 and 3×3 convolutional layers, Transition Layers denote 2×2 max-pooling layers, and BGRU (Bidirectional Gated Recurrent Unit) denotes a recurrent neural network model based on bidirectional GRUs.
The detailed process is as follows: the located waybill image is input into a pre-trained ultra-deep network based on DenseNet to extract image features. The image first passes through a 3×3 convolutional layer, then alternately through several combined 1×1 and 3×3 convolutional layers, 1×1 convolutional layers and 2×2 max-pooling layers; the depth of the network model reaches 120 layers.
Taking the image features obtained above as input, the bidirectional-GRU recurrent layer computes a classification score for each position along the width dimension;
Using the Connectionist Temporal Classification (CTC) method, the output of the recurrent layer is converted into a label sequence. A probability is defined over label sequences from the per-frame predictions, and the negative log-likelihood of this probability is used as the objective function for training the network, so that images can be directly paired with label sequences without annotating individual characters.
In the embodiment of the present invention, unlike the Transition Layers of a traditional DenseNet, which use average pooling, max-pooling layers are used to preserve the texture information of the feature maps. Moreover, in the last two max-pooling layers, a pooling stride of 1 is used along the width dimension to retain more feature information in that dimension, making the detection of narrow characters more robust. The GRU used in the embodiment of the present invention is a recurrent neural network that is more efficient than an LSTM network and enhances the extraction of contextual information from character sequences, making text-sequence predictions more accurate. The CTC method used in the embodiment of the present invention is a common transformation method for processing the output of a recurrent neural network: it converts the output into a label sequence and obtains the final text result through operations such as removing duplicates and blanks, operating on the entire label sequence rather than on single characters.
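A minimal sketch of the duplicate-removal and blank-removal step that CTC decoding performs (the per-frame label indices and the choice of 0 as the blank index are illustrative assumptions):

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: collapse consecutive repeated labels, then drop blanks."""
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

Note that a blank frame between two identical labels keeps both, which is how CTC distinguishes a genuinely repeated character from a held prediction.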
Embodiment 2:
In order to make text recognition more accurate, on the basis of the above embodiment, in the embodiment of the present invention, before the image containing the text to be recognized is input into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, the method further comprises:
processing the image using a threshold segmentation method and a connected-domain analysis method;
and performing text-orientation correction on the processed image.
The image containing the text to be recognized is acquired by an image capture device such as a camera. Since the text information may be distributed anywhere in the image, and possibly only a part of the image contains the text to be recognized, before the text in the image is recognized, the image is first processed using a threshold segmentation method and a connected-domain analysis method to remove redundant areas and retain the area image containing the text to be recognized. Furthermore, to make the text recognition result of the image more accurate, text-orientation correction is performed on the area image containing the text to be recognized so that the text lines are horizontal. The processes of threshold segmentation, connected-domain analysis and text-orientation correction on the area image belong to the prior art and are not repeated here.
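As an illustrative sketch of removing redundant background and retaining the text region (the fixed threshold, the grayscale array input and the dark-text assumption are illustrative simplifications; practical implementations typically use adaptive thresholding plus connected-component labeling as the patent notes):

```python
import numpy as np

def crop_text_region(gray, threshold=128):
    """Binarize a grayscale image and crop it to the bounding box of foreground pixels."""
    mask = gray < threshold          # pixels darker than the threshold are assumed to be text
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return gray                  # no foreground detected; keep the whole image
    return gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```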
Embodiment 3:
Since each candidate suggestion box obtained above contains only a small portion of the text information, in order to obtain the complete text-line information for each row, on the basis of the above embodiments, in the embodiment of the present invention, merging the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes comprises:
For a first candidate suggestion box among the candidate suggestion boxes, identifying whether there exists a second candidate suggestion box whose horizontal-coordinate distance from the first candidate suggestion box is less than a preset first threshold, whose vertical overlap with it is greater than a preset second threshold, and whose shape similarity to it is greater than a preset third threshold; if so, merging the first candidate suggestion box and the second candidate suggestion box into a new first candidate suggestion box; if not, taking the first candidate suggestion box as a target suggestion box.
Specifically, the horizontal-coordinate distance is the absolute value of the difference between the minimum x-coordinate of the four corner points of the first candidate suggestion box and the minimum x-coordinate of the four corner points of the second candidate suggestion box, or the absolute value of the difference between their maximum x-coordinates. The smaller this absolute value, the more likely the two candidate suggestion boxes are a pair of connected boxes. The vertical overlap is determined from the overlapping portion of the first and second candidate suggestion boxes in the vertical direction; the larger the overlap, the more likely the two candidate suggestion boxes are a pair of connected boxes. The shape similarity measures how similar the overall shapes of the first and second candidate suggestion boxes are; the larger the shape similarity, the more likely the two candidate suggestion boxes are a pair of connected boxes.
Specifically, the process of determining whether the first candidate suggestion box and the second candidate suggestion box are a pair of connected boxes is as follows: for each first candidate suggestion box, judge whether there exists a second candidate suggestion box whose horizontal distance d from the first candidate suggestion box is less than a preset first threshold thresh1, whose vertical overlap overlap exceeds a preset second threshold thresh2, and whose shape similarity similarity is greater than a preset third threshold thresh3. If such a box exists, the first and second candidate suggestion boxes are considered a pair of connected boxes, and the minimum bounding rectangle of the pair becomes the new first candidate suggestion box; otherwise, the first candidate suggestion box is taken as a target suggestion box.
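The three-way test above can be sketched as follows (the box representation, helper name and threshold values are illustrative assumptions; the overlap and similarity formulas follow Embodiments 4 and 5, and taking the smaller of the min-edge and max-edge x-distances is one reading of the "or" in the distance definition):

```python
def is_connected_pair(a, b, thresh1=20, thresh2=0.6, thresh3=0.7):
    """Decide whether boxes a and b, each (x_min, y_min, x_max, y_max), form a connected pair."""
    d = min(abs(a[0] - b[0]), abs(a[2] - b[2]))   # horizontal-coordinate distance (assumed: min of the two variants)
    h_a, h_b = a[3] - a[1], b[3] - b[1]
    inter = min(a[3], b[3]) - max(a[1], b[1])     # vertical intersection length
    overlap = inter / min(h_a, h_b)               # vertical overlap, per Embodiment 4
    similarity = min(h_a, h_b) / max(h_a, h_b)    # shape similarity, per Embodiment 5
    return d < thresh1 and overlap > thresh2 and similarity > thresh3
```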
As shown in Fig. 5, the candidate suggestion boxes obtained in the embodiment of the present invention according to the above process are shown within the dashed frame, the dashed frame is the target suggestion box, and the two boxes below the arrow are the two candidate suggestion boxes, namely the first candidate suggestion box and the second candidate suggestion box. A1, B1, C1, D1 and A2, B2, C2, D2 respectively denote the four corner points of the first and second candidate suggestion boxes, and h1 and h2 respectively denote the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
When calculating the vertical overlap of the first and second candidate suggestion boxes, one possible embodiment divides the length of the overlapping portion of their vertical coordinates by the maximum of h1 and h2, i.e. the vertical overlap is calculated according to the formula: overlap = |yA2 − yD1| / max(h1, h2).
Another possible embodiment divides the overlapping portion of the vertical coordinates of the first and second candidate suggestion boxes by the average of h1 and h2, i.e. the vertical overlap is calculated according to the formula: overlap = |yA2 − yD1| / mean(h1, h2).
A third possible embodiment divides the overlapping portion of the vertical coordinates of the first and second candidate suggestion boxes by the union of h1 and h2, i.e. the vertical overlap is calculated according to the formula: overlap = |yA2 − yD1| / union(h1, h2).
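The three normalization choices above can be sketched together (the function name is an illustrative assumption, and the union length is assumed to be h1 + h2 minus the intersection length, by analogy with interval union):

```python
def vertical_overlap(inter, h1, h2, mode="max"):
    """Normalize the vertical intersection length `inter` by a function of the two box heights."""
    if mode == "max":
        return inter / max(h1, h2)
    if mode == "mean":
        return inter / ((h1 + h2) / 2)
    if mode == "union":                     # assumed: union length = h1 + h2 - inter
        return inter / (h1 + h2 - inter)
    raise ValueError(mode)
```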
Embodiment 4:
In order to determine the vertical overlap more accurately, on the basis of the above embodiments, in the embodiment of the present invention, determining the vertical overlap comprises:
According to the first height and first vertical coordinate of the first candidate suggestion box and the second height and second vertical coordinate of the second candidate suggestion box, determining the vertical overlap using the formula: overlap = |yA2 − yD1| / min(h1, h2), where yA2 denotes the second vertical coordinate of the second candidate suggestion box, yD1 denotes the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively denote the heights of the first and second candidate suggestion boxes.
In order to accurately determine whether the first candidate suggestion box and the second candidate suggestion box are a pair of connected boxes, in the embodiment of the present invention, after all candidate suggestion boxes have been determined according to the above process, any two candidate suggestion boxes are respectively set as the first and second candidate suggestion boxes, and the first height and first vertical coordinate of the first candidate suggestion box and the second height and second vertical coordinate of the second candidate suggestion box are identified. First, the absolute value of the difference between the first and second vertical coordinates is calculated; second, the minimum of the first and second heights is calculated; finally, the ratio of the absolute difference to the minimum height is calculated, which is the vertical overlap of the first and second candidate suggestion boxes. The larger this value, the more likely the first and second candidate suggestion boxes are a pair of connected boxes.
Specifically, according to the first height and first vertical coordinate of the first candidate suggestion box and the second height and second vertical coordinate of the second candidate suggestion box, the overlap of the first and second candidate suggestion boxes is determined by the formula: overlap = |yA2 − yD1| / min(h1, h2), where yA2 denotes the second vertical coordinate of the second candidate suggestion box, yD1 denotes the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively denote the heights of the first and second candidate suggestion boxes.
Embodiment 5:
In order to determine the shape similarity more accurately, on the basis of the above embodiments, in the embodiment of the present invention, determining the shape similarity comprises:
According to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, determining the shape similarity using the formula: similarity = min(h1, h2) / max(h1, h2), where h1 and h2 respectively denote the heights of the first and second candidate suggestion boxes.
In order to determine more accurately whether the first candidate suggestion box and the second candidate suggestion box are a pair of connected boxes, in the embodiment of the present invention, after all candidate suggestion boxes have been determined according to the above process, any two candidate suggestion boxes are respectively set as the first and second candidate suggestion boxes, and the first height of the first candidate suggestion box and the second height of the second candidate suggestion box are identified. First, the minimum of the first and second heights is determined; second, the maximum of the first and second heights is determined; finally, the ratio of the minimum to the maximum is determined, which is the shape similarity of the first and second candidate suggestion boxes. The larger this value, the more likely the first and second candidate suggestion boxes are a pair of connected boxes.
Specifically, according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, the shape similarity of the first and second candidate suggestion boxes is determined by the formula: similarity = min(h1, h2) / max(h1, h2), where h1 and h2 respectively denote the heights of the first and second candidate suggestion boxes.
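The two formulas of Embodiments 4 and 5 can be sketched together (function and parameter names are illustrative assumptions):

```python
def overlap_min(y_a2, y_d1, h1, h2):
    """Vertical overlap per Embodiment 4: |yA2 - yD1| / min(h1, h2)."""
    return abs(y_a2 - y_d1) / min(h1, h2)

def shape_similarity(h1, h2):
    """Shape similarity per Embodiment 5: min(h1, h2) / max(h1, h2)."""
    return min(h1, h2) / max(h1, h2)
```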
Embodiment 6:
In order to locate text in a newly input image containing text to be recognized, a pre-training process is performed beforehand. On the basis of the above embodiments, in the embodiment of the present invention, the process of pre-training the first model comprises:
Obtaining sample images, wherein each sample image is annotated with the location information of each suggestion box and a second score value indicating that the content contained in each position suggestion box is text;
Inputting each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and training the first model according to the output of the first model.
Since the purpose of the first model is to locate the text lines in an image to be recognized, i.e., the image to be recognized is input into the first model in order to obtain the location information of each position suggestion box in the image and a second score value indicating that the content contained in each position suggestion box is text, and this score value determines whether the position suggestion box becomes a candidate suggestion box, the image data must first be annotated to obtain sample images before the first model is pre-trained. Specifically, each image is annotated with the location information of each position suggestion box and a second score value indicating that the content contained in each position suggestion box is text.
In a specific implementation, a batch of sample images is input at a time, and the model parameters are updated through the steps of forward propagation, error calculation, back-propagation and weight update. Batches of samples are input repeatedly and the above steps are repeated, continually adjusting the parameters and correcting the error between the network output and the reference value, finally obtaining optimized network parameters, i.e. the trained network model.
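A minimal numpy sketch of the forward / error / backward / update cycle described above, shown on a toy linear model rather than the actual detection network (all names, the learning rate and the iteration count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))        # one batch of toy inputs
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w                      # reference values

w = np.zeros(3)                     # initial parameter values
lr = 0.1
for _ in range(200):                # repeatedly input the batch and repeat the steps
    pred = X @ w                    # forward propagation
    err = pred - y                  # error calculation
    grad = X.T @ err / len(X)       # back-propagation: gradient of mean squared error
    w -= lr * grad                  # weight update
```

After enough iterations the parameters converge toward the values that minimize the output error, which is the sense in which the patent's loop "finally obtains the optimized network parameters".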
In particular, before training of the network model begins, the usual training method sets the initial parameter values of the model by random initialization. Although randomly initialized model parameters can in theory converge to the optimum, the disadvantages are obvious: the training time required for the model to converge is longer, the model easily falls into local optima, and a high-precision network model is not easily obtained. Therefore, in the embodiment of the present invention, a transfer-learning method is adopted: parameters of a model already trained in the prior art are migrated to the new model in place of random initialization of the original model parameters, which accelerates and optimizes the learning efficiency and convergence speed of the new model. Specifically, the parameters of a text recognition model trained on general data are used as the initial parameters of the model of the embodiment of the present invention for training.
Further, an incremental-learning training method is adopted. Since the quantities of simulated annotated samples and of genuinely annotated data differ enormously, in the embodiment of the present invention the tens of millions of simulated annotated samples are trained first, and the genuinely annotated data are then learned incrementally. When the number of genuine samples grows dynamically, this avoids repeatedly re-learning the massive simulated annotated samples while making full use of previous training results, continually adjusting and optimizing the final model and reducing the time and storage requirements of model training.
Embodiment 7:
In order to recognize the located image, a pre-training process is likewise performed before recognition. On the basis of the above embodiments, in the embodiment of the present invention, the process of pre-training the second model comprises:
Obtaining each text line annotated in the sample images;
Inputting each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and training the second model according to the output of the second model.
Since the purpose of the second model is to recognize the text lines in an image to be recognized, i.e., the image to be recognized is input into the second model in order to obtain the text lines in the image, after which the text information in a determined text line can be obtained through a Chinese dictionary, the image data must first be annotated to obtain sample images before the second model is pre-trained. Specifically, each text line in each image is annotated. The second model is then trained in the same way as the first model, finally obtaining a trained second model for text recognition on newly input images.
For example, taking express waybill text recognition as an example, Fig. 6 shows the overall flow chart of waybill text recognition.
First, for the input express waybill image, the waybill region is extracted using threshold segmentation and connected-domain analysis, and preliminary text-orientation correction is applied to the extracted waybill region so that the text lines are all horizontal.
The waybill region image after the above operations is input to the text-line locating module. The specific operation process is as follows: the waybill region image is used as the input of the convolutional neural network to obtain a feature map; a sliding-window operation is performed on the feature map, and each sliding-window center predicts k position suggestion boxes according to certain shapes and sizes; the feature map obtained above is used as the input of the recurrent neural network to obtain the location information of the position suggestion boxes and the score value indicating that the content contained in each position suggestion box is text; candidate suggestion boxes are obtained by thresholding the score values of the position suggestion boxes, and are merged according to the candidate-suggestion-box merging algorithm described above to obtain target suggestion boxes, which are the text-line location result finally produced by the locating module.
The text-line location result obtained by the above operations is input to the text-line recognition module. The specific operation process is as follows: the text-line location result is used as the input of the convolutional neural network to extract a feature map; the feature map is used as the input of the recurrent neural network to obtain a classification score for each position along the width dimension; the CTC method converts the output of the recurrent neural network into a label sequence, which is compared against a Chinese dictionary to obtain the final text information. The obtained text information is then categorized by name, phone number, address and the like, yielding structured electronic waybill information.
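The end-to-end flow above can be sketched as a pipeline of stubs (all function names are illustrative assumptions; each callable stands in for the corresponding module of the patent):

```python
def recognize_waybill(image, locate, recognize, categorize):
    """Chain the locating and recognition modules into structured waybill output."""
    boxes = locate(image)                             # text-line locating module -> target boxes
    lines = [recognize(image, box) for box in boxes]  # text-line recognition module per box
    return categorize(lines)                          # sort lines into name / phone / address fields
```

Usage with trivial stand-ins:

```python
out = recognize_waybill(
    "img",
    lambda im: [0, 1],
    lambda im, b: f"line{b}",
    lambda ls: {"lines": ls},
)
```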
Embodiment 8:
Fig. 7 shows a character recognition apparatus provided by an embodiment of the present invention; the apparatus comprises:
An obtaining module 701, configured to input the image containing the text to be recognized into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtain the location information of each suggestion box contained in the image and a first score value indicating that the content contained in each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation based on the feature map to determine each window feature, and predicts each position suggestion box in each window feature according to a preset width and height; the window-feature sequence corresponding to each row of the feature map is used as the input of the recurrent neural network, and the location information of each suggestion box contained in the image and the first score value indicating that the content contained in each suggestion box is text are obtained based on the recurrent neural network;
A screening module 702, configured to screen candidate suggestion boxes whose first score value is greater than a preset scoring threshold;
A merging module 703, configured to merge the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes;
A recognition module 704, configured to input each target suggestion box into the pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognize the text contained in each target suggestion box.
The apparatus further comprises: a correction module 705, configured to process the image using a threshold segmentation method and a connected-domain analysis method, and to perform text-orientation correction on the processed image.
The merging module 703 is specifically configured to, for a first candidate suggestion box among the candidate suggestion boxes, identify whether there exists a second candidate suggestion box whose horizontal-coordinate distance from the first candidate suggestion box is less than a preset first threshold, whose vertical overlap is greater than a preset second threshold, and whose shape similarity is greater than a preset third threshold; if so, merge the first candidate suggestion box and the second candidate suggestion box into a new first candidate suggestion box; if not, take the first candidate suggestion box as a target suggestion box.
The merging module 703 is specifically configured to determine the vertical overlap according to the first height and first vertical coordinate of the first candidate suggestion box and the second height and second vertical coordinate of the second candidate suggestion box, using the formula: overlap = |yA2 − yD1| / min(h1, h2), where yA2 denotes the second vertical coordinate of the second candidate suggestion box, yD1 denotes the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively denote the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
The merging module 703 is specifically configured to determine the shape similarity according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, using the formula: similarity = min(h1, h2) / max(h1, h2), where h1 and h2 respectively denote the heights of the first and second candidate suggestion boxes.
The apparatus further comprises:
A first training module 706, configured to obtain sample images, wherein each sample image is annotated with the location information of each suggestion box and a second score value indicating that the content contained in each position suggestion box is text; and to input each sample image into the first model comprising a convolutional neural network and a recurrent neural network and train the first model according to the output of the first model.
The apparatus further comprises:
A second training module 707, configured to obtain each text line annotated in the sample images, input each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and train the second model according to the output of the second model.
In conclusion the embodiment of the present invention provides a kind of character recognition method and device, comprising: will include text to be identified
Image be input in the first model comprising convolutional neural networks and Recognition with Recurrent Neural Network that training in advance is completed, obtain image
In include each Suggestion box location information and each Suggestion box in include content be text the first score value, wherein
First model obtains the characteristic pattern of image, carries out sliding window operation based on characteristic pattern, determines each window feature, special in each window
According to preset width and each position Suggestion box of Height Prediction in sign;The corresponding window feature sequence of every row of characteristic pattern is made
For the input of Recognition with Recurrent Neural Network, the location information of each Suggestion box for including in image and every is obtained based on Recognition with Recurrent Neural Network
The content for including in a Suggestion box is the first score value of text;Identify that the first score value is greater than the default candidate for scoring threshold value and builds
Discuss frame;According to the position of each candidate Suggestion box, candidate Suggestion box is merged to obtain target Suggestion box;Each target is built
View frame is input in the second model comprising convolutional neural networks and Recognition with Recurrent Neural Network that training in advance is completed, and identifies each mesh
The text for including in mark Suggestion box.
In the embodiment of the present invention, the image containing the text to be recognized is input into the pre-trained first model comprising a convolutional neural network and a recurrent neural network to obtain the location information of each suggestion box contained in the image and a first score value indicating that the content contained in each suggestion box is text. The first model can effectively capture the contextual information of the character sequence and incorporate it into the locating process; specifically, the score values of suggestion boxes over blank spaces between characters of the same row are boosted by the sequential features of the surrounding characters, so that the resulting text-line position boxes better conform to the positional characteristics of character sequences and the text-line location result is more accurate. Second, each target suggestion box is input into the pre-trained second model comprising a convolutional neural network and a recurrent neural network to recognize the text contained in each target suggestion box. Because the second model contains a recurrent neural network, it enhances the extraction of contextual information from character sequences, making text-sequence predictions more accurate.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, may make additional changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (14)
1. A character recognition method, characterized in that the method comprises:
inputting an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtaining location information of each suggestion box included in the image and a first score indicating that the content included in each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts a suggestion box at each location in each window feature according to a preset width and preset heights; and taking the window feature sequence corresponding to each row of the feature map as the input of the recurrent neural network, and obtaining, based on the recurrent neural network, the location information of each suggestion box included in the image and the first score indicating that the content included in each suggestion box is text;
screening out candidate suggestion boxes whose first score is greater than a preset scoring threshold;
merging the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes; and
inputting each target suggestion box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognizing the text included in each target suggestion box.
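The four steps of claim 1 can be sketched end to end. The minimal Python rendering below uses hypothetical callables `detect`, `merge`, and `recognize` to stand in for the trained first model, the merge step, and the trained second model; the score threshold value is illustrative only:

```python
def recognize_image_text(image, detect, merge, recognize, score_thresh=0.7):
    """Pipeline of claim 1: propose scored suggestion boxes, keep those
    scoring above the threshold as candidates, merge candidates into
    target boxes, then recognize the text inside each target box."""
    proposals = detect(image)                     # [(box, first_score), ...]
    candidates = [box for box, score in proposals if score > score_thresh]
    targets = merge(candidates)
    return [recognize(image, box) for box in targets]
```

The two models never see each other's internals; only box coordinates flow from the detection stage to the recognition stage, which is what makes the threshold-and-merge steps in between possible.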
2. The method of claim 1, characterized in that before inputting the image containing text to be recognized into the pre-trained first model comprising a convolutional neural network and a recurrent neural network, the method further comprises:
processing the image using a threshold segmentation method and a connected-domain analysis method; and
performing text-orientation correction on the processed image.
3. The method of claim 1, characterized in that merging the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes comprises:
for a first candidate suggestion box among the candidate suggestion boxes, identifying whether there is a second candidate suggestion box whose horizontal (abscissa) distance from the first candidate suggestion box is less than a preset first threshold, whose vertical overlap with the first candidate suggestion box is greater than a preset second threshold, and whose shape similarity to the first candidate suggestion box is greater than a preset third threshold; if there is, merging the first candidate suggestion box and the second candidate suggestion box to serve as the new first candidate suggestion box; if there is not, taking the first candidate suggestion box as a target suggestion box.
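Claims 3 to 5 together define a greedy left-to-right merge. The Python sketch below is one plausible reading under stated assumptions: boxes are (x, y, w, h) tuples with y the top edge, yD1 is taken as the bottom of the first box and yA2 as the top of the second (the claim does not pin down which corners these coordinates denote), the horizontal distance is measured as the gap between box edges, and all three thresholds are illustrative:

```python
def merge_candidates(boxes, t_dist=16.0, t_overlap=0.7, t_sim=0.7):
    """Greedily merge candidate boxes (x, y, w, h) into target boxes."""
    if not boxes:
        return []

    def v_overlap(a, b):      # claim 4: |yA2 - yD1| / min(h1, h2)
        return abs(b[1] - (a[1] + a[3])) / min(a[3], b[3])

    def similarity(a, b):     # claim 5: min(h1, h2) / max(h1, h2)
        return min(a[3], b[3]) / max(a[3], b[3])

    def mergeable(a, b):      # the three conditions of claim 3
        gap = b[0] - (a[0] + a[2])
        return (gap < t_dist and v_overlap(a, b) > t_overlap
                and similarity(a, b) > t_sim)

    def union(a, b):          # merged box is the enclosing rectangle
        x, y = min(a[0], b[0]), min(a[1], b[1])
        right = max(a[0] + a[2], b[0] + b[2])
        bottom = max(a[1] + a[3], b[1] + b[3])
        return (x, y, right - x, bottom - y)

    targets, current = [], None
    for box in sorted(boxes):          # left-to-right by x coordinate
        if current is None:
            current = box
        elif mergeable(current, box):
            current = union(current, box)   # keep growing the first box
        else:
            targets.append(current)
            current = box
    targets.append(current)
    return targets
```

Two adjacent same-height boxes on one text line merge into a single enclosing box, while a distant box on the same line survives as its own target, matching the "incorporated as the first candidate suggestion box" wording of the claim.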
4. The method of claim 3, characterized in that determining the vertical overlap comprises:
according to the first height and first vertical coordinate of the first candidate suggestion box and the second height and second vertical coordinate of the second candidate suggestion box, determining the vertical overlap using the formula overlap = |yA2 - yD1| / min(h1, h2), wherein yA2 denotes the second vertical coordinate of the second candidate suggestion box, yD1 denotes the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively denote the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
5. The method of claim 3, characterized in that determining the shape similarity comprises:
according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, determining the shape similarity using the formula similarity = min(h1, h2) / max(h1, h2), wherein h1 and h2 respectively denote the heights of the first candidate suggestion box and the second candidate suggestion box.
6. The method of claim 1, characterized in that the process of pre-training the first model comprises:
obtaining sample images, wherein each sample image is annotated with the location information of each suggestion box and a second score indicating that the content included in the suggestion box at each location is text; and
inputting each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and training the first model according to the output of the first model for each sample image.
7. The method of claim 1, characterized in that the process of pre-training the second model comprises:
obtaining each text line annotated in the sample images; and
inputting each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network, and training the second model according to the output of the second model for each sample image.
8. A character recognition device, characterized in that the device comprises:
an obtaining module, configured to input an image containing text to be recognized into a pre-trained first model comprising a convolutional neural network and a recurrent neural network, and obtain location information of each suggestion box included in the image and a first score indicating that the content included in each suggestion box is text, wherein the first model obtains a feature map of the image, performs a sliding-window operation on the feature map to determine each window feature, and predicts a suggestion box at each location in each window feature according to a preset width and preset heights; and takes the window feature sequence corresponding to each row of the feature map as the input of the recurrent neural network, and obtains, based on the recurrent neural network, the location information of each suggestion box included in the image and the first score indicating that the content included in each suggestion box is text;
a screening module, configured to screen out candidate suggestion boxes whose first score is greater than a preset scoring threshold;
a merging module, configured to merge the candidate suggestion boxes according to the position of each candidate suggestion box to obtain target suggestion boxes; and
a recognition module, configured to input each target suggestion box into a pre-trained second model comprising a convolutional neural network and a recurrent neural network, and recognize the text included in each target suggestion box.
9. The device of claim 8, characterized in that the device further comprises:
a correction module, configured to process the image using a threshold segmentation method and a connected-domain analysis method, and to perform text-orientation correction on the processed image.
10. The device of claim 8, characterized in that the merging module is specifically configured to: for a first candidate suggestion box among the candidate suggestion boxes, identify whether there is a second candidate suggestion box whose horizontal (abscissa) distance from the first candidate suggestion box is less than a preset first threshold, whose vertical overlap with the first candidate suggestion box is greater than a preset second threshold, and whose shape similarity to the first candidate suggestion box is greater than a preset third threshold; if there is, merge the first candidate suggestion box and the second candidate suggestion box to serve as the new first candidate suggestion box; if there is not, take the first candidate suggestion box as a target suggestion box.
11. The device of claim 10, characterized in that the merging module is specifically configured to determine the vertical overlap using the formula overlap = |yA2 - yD1| / min(h1, h2) according to the first height and first vertical coordinate of the first candidate suggestion box and the second height and second vertical coordinate of the second candidate suggestion box, wherein yA2 denotes the second vertical coordinate of the second candidate suggestion box, yD1 denotes the first vertical coordinate of the first candidate suggestion box, and h1 and h2 respectively denote the first height of the first candidate suggestion box and the second height of the second candidate suggestion box.
12. The device of claim 10, characterized in that the merging module is specifically configured to determine the shape similarity using the formula similarity = min(h1, h2) / max(h1, h2) according to the first height of the first candidate suggestion box and the second height of the second candidate suggestion box, wherein h1 and h2 respectively denote the heights of the first candidate suggestion box and the second candidate suggestion box.
13. The device of claim 8, characterized in that the device further comprises:
a first training module, configured to obtain sample images, wherein each sample image is annotated with the location information of each suggestion box and a second score indicating that the content included in the suggestion box at each location is text; and to input each sample image into the first model comprising a convolutional neural network and a recurrent neural network, and train the first model according to the output of the first model for each sample image.
14. The device of claim 8, characterized in that the device further comprises:
a second training module, configured to obtain each text line annotated in the sample images; to input each sample image containing the corresponding text line into the second model comprising a convolutional neural network and a recurrent neural network; and to train the second model according to the output of the second model for each sample image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811126275.4A CN110245545A (en) | 2018-09-26 | 2018-09-26 | A kind of character recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110245545A true CN110245545A (en) | 2019-09-17 |
Family
ID=67882838
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811126275.4A Pending CN110245545A (en) | 2018-09-26 | 2018-09-26 | A kind of character recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110245545A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991435A (en) * | 2019-11-27 | 2020-04-10 | 南京邮电大学 | Express waybill key information positioning method and device based on deep learning |
CN111310762A (en) * | 2020-03-16 | 2020-06-19 | 天津得迈科技有限公司 | Intelligent medical bill identification method based on Internet of things |
CN111611985A (en) * | 2020-04-23 | 2020-09-01 | 中南大学 | OCR recognition method based on model fusion |
CN111639566A (en) * | 2020-05-19 | 2020-09-08 | 浙江大华技术股份有限公司 | Method and device for extracting form information |
CN111666937A (en) * | 2020-04-17 | 2020-09-15 | 广州多益网络股份有限公司 | Method and system for recognizing text in image |
CN112016547A (en) * | 2020-08-20 | 2020-12-01 | 上海天壤智能科技有限公司 | Image character recognition method, system and medium based on deep learning |
CN113139539A (en) * | 2021-03-16 | 2021-07-20 | 中国科学院信息工程研究所 | Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary |
CN113392844A (en) * | 2021-06-15 | 2021-09-14 | 重庆邮电大学 | Deep learning-based method for identifying text information on medical film |
CN113762237A (en) * | 2021-04-26 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Text image processing method, device and equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106570497A (en) * | 2016-10-08 | 2017-04-19 | 中国科学院深圳先进技术研究院 | Text detection method and device for scene image |
CN108073898A (en) * | 2017-12-08 | 2018-05-25 | 腾讯科技(深圳)有限公司 | Number of people area recognizing method, device and equipment |
CN108288078A (en) * | 2017-12-07 | 2018-07-17 | 腾讯科技(深圳)有限公司 | Character identifying method, device and medium in a kind of image |
CN108564084A (en) * | 2018-05-08 | 2018-09-21 | 北京市商汤科技开发有限公司 | character detecting method, device, terminal and storage medium |
Non-Patent Citations (2)
Title |
---|
ZHI TIAN et al.: "Detecting Text in Natural Image with Connectionist Text Proposal Network", 14th European Conference on Computer Vision (ECCV) * |
CAI Wenzhe et al.: "Image Text Detection Method Based on Dual-Threshold Gradient Pattern", Computer Science * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110245545A (en) | A kind of character recognition method and device | |
Yuliang et al. | Detecting curve text in the wild: New dataset and new solution | |
CN107871124B (en) | A kind of Remote Sensing Target detection method based on deep neural network | |
CN108960245B (en) | Tire mold character detection and recognition method, device, equipment and storage medium | |
CN106548151B (en) | Target analyte detection track identification method and system towards intelligent robot | |
CN110991435A (en) | Express waybill key information positioning method and device based on deep learning | |
CN110647829A (en) | Bill text recognition method and system | |
CN108509839A (en) | One kind being based on the efficient gestures detection recognition methods of region convolutional neural networks | |
CN108875600A (en) | A kind of information of vehicles detection and tracking method, apparatus and computer storage medium based on YOLO | |
CN106446896A (en) | Character segmentation method and device and electronic equipment | |
CN107679997A (en) | Method, apparatus, terminal device and storage medium are refused to pay in medical treatment Claims Resolution | |
CN106022363B (en) | A kind of Chinese text recognition methods suitable under natural scene | |
CN111259940A (en) | Target detection method based on space attention map | |
CN110120065B (en) | Target tracking method and system based on hierarchical convolution characteristics and scale self-adaptive kernel correlation filtering | |
CN109284779A (en) | Object detecting method based on the full convolutional network of depth | |
CN107609575A (en) | Calligraphy evaluation method, calligraphy evaluating apparatus and electronic equipment | |
CN110135430A (en) | A kind of aluminium mold ID automatic recognition system based on deep neural network | |
CN110796018A (en) | Hand motion recognition method based on depth image and color image | |
CN110610210B (en) | Multi-target detection method | |
CN107766349A (en) | A kind of method, apparatus, equipment and client for generating text | |
CN104881673B (en) | The method and system of pattern-recognition based on information integration | |
CN109343920A (en) | A kind of image processing method and its device, equipment and storage medium | |
CN110135446A (en) | Method for text detection and computer storage medium | |
CN110287952A (en) | A kind of recognition methods and system for tieing up sonagram piece character | |
CN112734803A (en) | Single target tracking method, device, equipment and storage medium based on character description |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||