CN108898131A - A method for recognizing digital instruments in complex natural scenes - Google Patents
A method for recognizing digital instruments in complex natural scenes

- Publication number: CN108898131A
- Application number: CN201810500379.0A
- Authority: CN (China)
- Prior art keywords: digital instrument, text, feature, digital, network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/29 — Graphical models, e.g. Bayesian networks
- G06V30/153 — Segmentation of character regions using recognition of characters or words
- G06V2201/02 — Recognising information on displays, dials, clocks
- G06V30/10 — Character recognition
Abstract
The present invention relates to the field of digital instrument recognition, and in particular to a method for recognizing digital instruments in complex natural scenes. The method comprises the following steps: locating the digital instrument region in a complex natural scene using the SSD algorithm; extracting features with a ResNet50 neural network and training a bidirectional LSTM network on the extracted features to obtain the text-line locations within the digital instrument region; and extracting text-line features with the ResNet50 network, training a BRNN on those features, and obtaining the digital instrument recognition result with the CTC algorithm. The invention avoids the recognition errors that character segmentation causes when the background is complex, and improves the recognition accuracy of digital instruments.
Description
Technical field
The present invention relates to the field of digital instrument recognition, and in particular to a method for recognizing digital instruments in complex natural scenes.
Background art
Digital instrument recognition refers to the technology of automatically finding the positions of numeric characters in a digital image with a computer and recognizing those characters; it belongs to the field of pattern recognition. Because digital instruments offer high precision, convenient reading, and easy configuration, they are widely used in industry and in inspection.

At present, digital instrument readings are recognized in two main ways:

1. Manual meter reading. This approach requires a person to read and record the instrument by eye, which is tedious and inefficient. During manual reading, subjective human factors or environmental conditions easily cause reading errors, degrading measurement accuracy. Moreover, in harsh environments such as chemical plants and power stations, where there may be toxic gases, extreme temperatures, or high radiation, it is unsuitable for people to read the instrument values on site.

2. Meter recognition based on machine vision. In this approach, a camera captures instrument images and machine vision algorithms recognize them, greatly improving the efficiency of meter reading. Replacing manual reading with machine vision not only reduces the errors caused by human subjectivity but also eliminates the risks of on-site manual operation. However, existing machine vision methods can only locate, segment, and recognize digital instruments against simple backgrounds, and most algorithms cannot recognize decimal points or signs. In addition, existing algorithms must perform character segmentation after locating the instrument region before recognition can proceed; when the background is complex, segmentation often fails partially, leaving some characters unrecognized. A meter recognition algorithm that handles complex natural scenes without character segmentation is therefore needed.
The Chinese patent application No. 201510920430.X, entitled "A digit recognition method based on intersection feature extraction", first binarizes the grayscale image with the maximum between-class variance (Otsu) method to separate the recognition target from the image background; next, it segments the LED digits to obtain a binary image of the LED digit display; then it scans two horizontal lines from left to right at 3/4 and 1/4 of the height of the binary image and records the number of pixel transitions on each; it further scans one vertical line from top to bottom at 1/2 of the width and records its pixel transitions; finally, it compares the row and column transition counts against those of standard digits and identifies the digit according to a predefined logic.

The disadvantage of this method is that it cannot effectively locate digital instruments in complex natural environments, and it recognizes only the digits 0-9, with no provision for decimal points or signs.
The Chinese patent application No. 201611031884.2, entitled "A digital instrument reading image recognition method", extracts a region of interest from a panoramic image by template matching against a pre-calibrated digital instrument image; it then extracts the single-character regions and the decimal-point detection region within the region of interest according to the relative positions of the calibrated characters; each single-character region is recognized with a pre-trained convolutional neural network character model; the decimal-point region is detected with a pre-trained cascade detector based on block LBP features and an Adaboost classifier, and the detection results are post-processed; the final reading is obtained from the character, decimal-point, and sign recognition results.

The disadvantage of this method is that it recognizes characters one at a time, so segmentation quality severely affects the recognition result; it can only recognize digital instruments under ideal conditions and cannot effectively recognize digital instruments in complex natural scenes.
Summary of the invention
To address the above problems in digital instrument recognition, the present invention proposes a method for recognizing digital instruments in complex natural scenes that avoids the recognition errors caused by character segmentation under complex backgrounds and improves the recognition accuracy of digital instruments.
To achieve the above goals, the present invention adopts the following technical scheme:
A method for recognizing digital instruments in complex natural scenes comprises the following steps:

Step 1: Locate the digital instrument region in the complex natural scene using the SSD algorithm.

Step 2: Extract features with a ResNet50 neural network and train a bidirectional LSTM network on the extracted features to obtain the text-line locations within the digital instrument region.

Step 3: Extract text-line features with the ResNet50 network, train a BRNN on the extracted text-line features, and obtain the digital instrument recognition result with the CTC algorithm.
Further, step 1 comprises:

Step 1.1: Preprocess the sample data to obtain preprocessed sample data.

Step 1.2: Construct the SSD network model: on the VGG16 base network, convert the sixth and seventh fully connected layers into convolutional layers, and add three convolutional layers and one average pooling layer.

Step 1.3: For each feature map after convolution, use a 3 × 3 convolution to generate the regressed coordinates and the class probabilities of the default boxes. The size of each default box is computed as

s_k = s_min + ((s_max − s_min) / (m − 1)) · (k − 1), k ∈ [1, m]

where m is the number of feature maps, s_min is the default box scale on the lowest layer, and s_max is the default box scale on the highest layer.

Step 1.4: Define the pre-annotated instrument regions as ground truth boxes and train the SSD network model with them; use the trained SSD network to accurately locate instruments at multiple angles. The training process is as follows:

Match the chosen default boxes (prior boxes) against the ground truth boxes by IOU; a prior box with IOU > T1 is a positive sample and the rest are negative samples, with T1 = 0.7. Sort the prior boxes by regression loss from high to low and select the M prior boxes with the highest regression loss as set D; with the matched positive samples as set P, the positive sample set is P − D ∩ P and the negative sample set is D − D ∩ P. The ratio of positive to negative samples is 1:4, i.e. M is 1/4 of the number of prior boxes.

Adjust the network parameters via the loss function to complete the instrument localization. The loss function is

L(x, c, l, g) = (1/N) · (L_conf(x, c) + λ · L_loc(x, l, g))

where c is the class probability, l is the predicted box, and N is the number of prior boxes matched to ground truth boxes; if N = 0, the loss is 0. L_conf is the classification loss; L_loc(x, l, g) is the regression loss between the predicted box l and the g-th ground truth box; λ is the weight of the regression loss, representing its contribution to the overall loss, and is set to 0.5.

Step 1.5: Remove duplicate boxes with the NMS algorithm and select the digital instrument region.
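The default-box scale formula in step 1.3 can be sketched as follows, using the values reported later in the description (s_min = 0.1, s_max = 0.25, m = 6 feature maps). This is a minimal illustrative sketch; the function name and structure are not from the patent.

```python
# Sketch of the default-box scale formula from step 1.3:
#   s_k = s_min + ((s_max - s_min) / (m - 1)) * (k - 1), k in [1, m]
# Values s_min = 0.1, s_max = 0.25, m = 6 are those the patent reports.

def default_box_scales(s_min: float, s_max: float, m: int) -> list:
    """Return the scale s_k of the default boxes on each of the m feature maps."""
    return [s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in range(1, m + 1)]

scales = default_box_scales(0.1, 0.25, 6)
print([round(s, 2) for s in scales])  # lowest layer gets scale 0.1, highest 0.25
```

With these values the six feature maps receive evenly spaced scales 0.10, 0.13, ..., 0.25, so low-resolution maps detect large dials and high-resolution maps detect small ones.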
Further, step 2 comprises:

Step 2.1: Train a ResNet50 neural network on the digital instrument sample data to obtain a feature map of size W × H × C.

Step 2.2: At each position of the feature map from step 2.1, take a 3 × 3 × C window of features to predict the class information and location information of the k anchors at that position.

Step 2.3: Feed the 3 × 3 × C features of all windows in each row into a bidirectional LSTM network to obtain an output matrix of size W × 256.

Step 2.4: Feed the W × 256 matrix into a 512-dimensional fully connected layer.

Step 2.5: Feed the fully connected layer's features into the classification and regression layers to obtain the class and location information of each anchor, producing multiple fine-grained digital text detection regions.

Step 2.6: Using a threshold T2, directly delete anchors with scores < T2, where the score is the class probability, and deduplicate the remaining anchor text boxes with the NMS algorithm; T2 = 0.8.

Step 2.7: Merge the text lines with a text construction algorithm.

Step 2.8: Fine-tune the horizontal positions of the predicted text boxes with a side-refinement algorithm to obtain the locations of the numeric-character text lines.
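The text-construction idea of step 2.7 can be sketched in miniature: narrow fixed-width anchor boxes on the same row are chained into one text line whenever the horizontal gap between neighbours is small. The box representation and the gap threshold below are illustrative assumptions, not the patent's algorithm.

```python
# A simplified sketch of text construction (step 2.7): merge horizontally
# adjacent (x_left, x_right) anchor spans into text-line spans. The max_gap
# value of 8 px is an illustrative assumption.

def merge_anchors(anchors, max_gap=8):
    """Merge (x_left, x_right) anchor spans into text-line spans."""
    lines = []
    for left, right in sorted(anchors):
        if lines and left - lines[-1][1] <= max_gap:
            lines[-1][1] = max(lines[-1][1], right)  # extend the current line
        else:
            lines.append([left, right])              # start a new text line
    return [tuple(line) for line in lines]

# Five 16-px-wide anchors: four contiguous, one far to the right.
print(merge_anchors([(0, 16), (16, 32), (32, 48), (48, 64), (120, 136)]))
# two text lines: (0, 64) and (120, 136)
```

The real construction also checks vertical overlap between anchors; only the horizontal chaining is shown here.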
Further, step 3 comprises:

Step 3.1: Preprocess the digital text-line image; for an image of size M × N, set the scaling ratio of M and scale N by the same ratio.

Step 3.2: Feed the preprocessed sample data into the ResNet50 neural network for feature extraction to obtain a feature map, then convert the feature map into feature vectors column by column.

Step 3.3: Recognize the feature vectors with the bidirectional LSTM algorithm of the BRNN to obtain the class sequence of each column of features.

Step 3.4: Solve for the optimal class sequence with the CTC algorithm to obtain the text-line recognition result.
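The decoding in steps 3.3-3.4 can be sketched with the greedy (best-path) approximation of CTC: take the most likely class per column, collapse repeated labels, then drop blanks. This is a simplified stand-in for the full forward-backward search, and the class set below is an illustrative subset of the patent's 14 classes.

```python
# Minimal greedy CTC decoding sketch for step 3.4 (best-path approximation,
# not the full forward-backward solution the patent uses).

def greedy_ctc_decode(columns, classes, blank="blank"):
    """columns: per-column probability lists aligned with `classes`."""
    path = [classes[max(range(len(col)), key=col.__getitem__)] for col in columns]
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)  # keep a label only when it changes and is not blank
        prev = label
    return "".join(out)

classes = ["1", "2", "blank"]
cols = [[0.9, 0.05, 0.05],   # column predicts "1"
        [0.8, 0.1, 0.1],     # "1" again: collapsed as a repeat
        [0.1, 0.1, 0.8],     # blank: separator, dropped
        [0.1, 0.8, 0.1]]     # "2"
print(greedy_ctc_decode(cols, classes))  # "12"
```

Collapsing repeats before removing blanks is what lets CTC read a two-column-wide "1" as a single character while still distinguishing "11" (separated by a blank) from "1".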
Compared with the prior art, the present invention has the following beneficial effects:

The invention first locates the digital instrument region with the SSD algorithm; it then locates the digital text lines with a ResNet50 neural network combined with a BLSTM network; finally it recognizes the digital text lines with the ResNet50 network combined with a BRNN, and selects the optimal recognition result with the CTC algorithm. Locating the meter region with the SSD algorithm improves localization accuracy for digital instruments in complex natural environments, while recognizing entire text lines with the BLSTM algorithm, without cutting out single characters first, avoids the recognition errors caused by character segmentation under complex backgrounds and improves the recognition accuracy of digital instruments.

The digital instrument recognition algorithm of the invention is trained end to end; it can accept image inputs from any scene and of any size, and recognize character strings of arbitrary length. The invention can effectively recognize digital instrument values in natural scenes.
Brief description of the drawings
Fig. 1 is the basic flow chart of the digital instrument recognition method for complex natural scenes according to an embodiment of the present invention.

Fig. 2 is the overall framework diagram of the digital instrument recognition method for complex natural scenes according to another embodiment of the present invention.

Fig. 3 is a schematic diagram of the SSD network structure of the digital instrument recognition method according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of the network structure for digital instrument region localization according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of the network structure for digital text-line localization according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of the network structure for numeric character region recognition according to an embodiment of the present invention.

Fig. 7 is a first test result of digital instrument recognition by the method according to an embodiment of the present invention.

Fig. 8 is a second test result of digital instrument recognition by the method according to an embodiment of the present invention.
Detailed description of embodiments

The present invention is further explained below with reference to the accompanying drawings and specific embodiments:
Embodiment one:
As shown in Fig. 1, the digital instrument recognition method for complex natural scenes of the present invention comprises the following steps:

Step S101: Locate the digital instrument region in the complex natural scene using the SSD algorithm.

Step S102: Extract features with a ResNet50 neural network and train a bidirectional LSTM network on the extracted features to obtain the text-line locations within the digital instrument region.

Step S103: Extract text-line features with the ResNet50 network, train a BRNN on the extracted text-line features, and obtain the digital instrument recognition result with the CTC algorithm.
Embodiment two:
As shown in Fig. 2, another digital instrument recognition method for complex natural scenes of the present invention comprises:

The neural network model of each stage in the present invention requires prior offline training. Before offline training, the collected digital instruments in natural scenes must be annotated manually, i.e., the position of the digital instrument in each image, the positions of the text lines within the instrument, and the recognition results of the text lines are labeled. The networks are trained on the labeled data, and the offline-trained networks are then applied to the test samples to realize digital instrument region localization, digital text-line localization, and numeric character recognition. These three processes are as follows:

Step S201: Digital instrument region localization.

Digital instrument region localization is the process of locating the digital instrument region in the sample data with the trained SSD network. SSD realizes object detection and recognition with a single deep neural network model. As shown in Fig. 3, the SSD network first uses the first five layers of the VGG16 base network, converts the fc6 and fc7 layers into two convolutional layers, and additionally adds three convolutional layers and one average pooling layer. Feature maps at different levels are used to predict the default box offsets and the scores of the different classes, and the final detection result is obtained through NMS.
The network structure for digital instrument region localization is shown in Fig. 4. The specific steps are as follows:

Step S2011: Preprocess the sample data to obtain samples of size 300 × 300 × 3.

Step S2012: Construct the SSD network model, whose selected feature maps are of sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3, and 1 × 1, corresponding to block4, block7, block8, block9, block10, and block11 respectively. For each feature map, a 3 × 3 convolution generates the regressed coordinates of the 4 default boxes and the 8 class probabilities.

Step S2013: Feed the preprocessed sample data into the first five layers of the VGG16 network for convolution, generate default boxes on the resulting convolutional feature layer, then continue the convolution operations and extract default boxes from each subsequent convolutional feature layer in turn. Each convolutional feature map generates k default boxes according to different scales and aspect ratios.

The size of each default box is computed as

s_k = s_min + ((s_max − s_min) / (m − 1)) · (k − 1), k ∈ [1, m]

where m is the number of feature maps, s_min is the default box scale on the lowest layer, and s_max is the default box scale on the highest layer. Since the original set of aspect ratios a_r of the default boxes is {1, 2, 3, 1/2, 1/3}, each default box has width w_k = s_k · √a_r and height h_k = s_k / √a_r. For the aspect ratio of 1, an additional default box of scale s'_k = √(s_k · s_{k+1}) is added, so that each point of each feature map generates 6 default boxes. The center of each default box is set to ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature map.

Since the values of s_min and s_max directly affect the computational cost of the meter localization algorithm, the area occupied by the dial region in the sample instrument images was analyzed statistically, giving s_min = 0.1 and s_max = 0.25. Based on observed statistics of instrument shapes in digital instrument images, and in order to cover the various digital instrument dials, the aspect ratios of the default boxes were set to {1, 2, 1/2}, further reducing the computational cost of the localization algorithm. Extensive training and testing showed that with s_min = 0.1, s_max = 0.25, and aspect ratios {1, 2, 1/2}, the digital instrument recognition algorithm attains the lowest time complexity without reducing precision.
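The computational saving from the reduced ratio set can be made concrete by counting default boxes. With the feature maps of steps S2012-S2013 and aspect ratios {1, 2, 1/2} plus the extra scale-s'_k box of ratio 1, each cell carries 4 default boxes; the counting function below is an illustrative assumption built from those numbers.

```python
# Illustrative count of default boxes for the configuration in S2012-S2013:
# square feature maps of 38, 19, 10, 5, 3 and 1 cells per side, 4 boxes per
# cell (ratios {1, 2, 1/2} plus one extra ratio-1 box at scale s'_k).

def total_default_boxes(map_sizes, boxes_per_cell):
    return sum(f * f * boxes_per_cell for f in map_sizes)

maps = [38, 19, 10, 5, 3, 1]
print(total_default_boxes(maps, 4))  # 4 * (38^2 + 19^2 + 10^2 + 5^2 + 3^2 + 1^2) = 7760
```

With the full ratio set {1, 2, 3, 1/2, 1/3} each cell would carry 6 boxes (11640 in total), so the reduced set cuts the number of candidate boxes by a third.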
Step S2014: Define the manually pre-annotated digital instrument regions in the natural scene as ground truth boxes, which consist of the correctly labeled true position data. Train the SSD network with the ground truth boxes so that the network can accurately locate the digital instrument region in complex natural scenes, i.e., guarantee the classification confidence of the default boxes while regressing the prior boxes toward the ground truth boxes as closely as possible.

First, the positive and negative samples must be determined. The prior boxes are matched against the ground truth boxes by IOU (Jaccard overlap): a prior box with IOU > T1 is a positive sample (positive example), and the others are negative samples (negative examples). Since the negative samples generated this way far outnumber the positive samples, training would be difficult to converge. The prior boxes are therefore sorted by regression loss from high to low, and the M prior boxes with the highest regression loss are selected as set D. If the set of successfully matched positive samples is P, the positive sample set is P − D ∩ P and the negative sample set is D − D ∩ P. The present invention controls the ratio of positive to negative samples through the value of M. Since the values of T1 and M are critical to accurately locating the instrument dial region, extensive comparative experiments on the sample data determined T1 = 0.7 and a positive-to-negative ratio of 1:4, with which the localization algorithm covers the dial region most completely and converges fastest.
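The matching rule of step S2014 can be sketched directly: a prior box whose IoU (Jaccard overlap) with a ground truth box exceeds T1 = 0.7 is a positive sample. The corner-coordinate box format and helper names below are illustrative assumptions.

```python
# Sketch of the IoU matching rule in step S2014 with T1 = 0.7.
# Boxes are (x1, y1, x2, y2) axis-aligned rectangles.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)  # intersection over union

def split_samples(priors, gt, t1=0.7):
    pos = [p for p in priors if iou(p, gt) > t1]
    neg = [p for p in priors if iou(p, gt) <= t1]
    return pos, neg

gt = (0, 0, 10, 10)
priors = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
pos, neg = split_samples(priors, gt)
print(len(pos), len(neg))  # 2 positives, 1 negative
```

The hard-negative selection that follows (sorting negatives by regression loss and keeping the top M) operates on the `neg` set to hold the positive-to-negative ratio at 1:4.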
Then, the network parameters are adjusted with the loss function so that the default boxes approach the ground truth boxes as closely as possible. The loss function combines the regression loss loss(loc) of the corresponding default box and the classification loss loss(conf), and is defined as

L(x, c, l, g) = (1/N) · (L_conf(x, c) + λ · L_loc(x, l, g))

where c is the class probability, l is the predicted box, and N is the number of prior boxes matched to ground truth boxes; if N = 0, the loss is 0. L_conf is the classification loss, measured with the softmax loss function; L_loc(x, l, g) is the regression loss between the predicted box l and the g-th ground truth box; λ is the weight of the regression loss, representing its contribution to the overall loss. The value of λ has a vital influence on the localization quality; considering the various complex background factors of meters in natural scenes, extensive training experiments and cross-validation showed that λ = 0.5 achieves the best localization.

The regression loss L_loc is defined as a smooth-L1 loss over the matched boxes:

L_loc(x, l, g) = Σ_{i ∈ Pos} Σ_{m ∈ {cx, cy, w, h}} x_{ij}^p · smooth_L1(l_i^m − ĝ_j^m)

where l is the predicted box, g is the ground truth (actual position), p is the class corresponding to x, and d is the default box (default bounding box) used to encode the regression targets ĝ.
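The loss combination above can be sketched with scalar stand-ins, using the patent's λ = 0.5: the total loss averages the classification loss and the λ-weighted smooth-L1 regression loss over the N matched prior boxes. The scalar inputs below are illustrative, not real network outputs.

```python
# Sketch of the SSD loss in step S2014 with lambda = 0.5.
# smooth_l1 is quadratic near zero and linear beyond |x| = 1, which damps
# the influence of outlier regression errors.

def smooth_l1(x):
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def ssd_loss(conf_loss, loc_loss, n, lam=0.5):
    """Total loss L = (L_conf + lam * L_loc) / N; defined as 0 when N = 0."""
    return 0.0 if n == 0 else (conf_loss + lam * loc_loss) / n

print(smooth_l1(0.5))            # 0.5 * 0.25 = 0.125
print(smooth_l1(2.0))            # 2.0 - 0.5 = 1.5
print(ssd_loss(4.0, 2.0, n=2))   # (4.0 + 0.5 * 2.0) / 2 = 2.5
print(ssd_loss(4.0, 2.0, n=0))   # 0.0 by the N = 0 convention
```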
Step S2015: Finally, remove duplicate boxes with the NMS algorithm and select the digital instrument region.
Step S202: Digital text-line localization.

Digital text-line localization is the process of locating the character regions within the digital instrument region detected by the SSD network. First, a ResNet50 neural network performs convolution on the sample image, with each convolutional layer producing feature vectors; a 3 × 3 sliding window then extracts features from these feature maps, the extracted features are fed into the BLSTM algorithm for training, and the output vectors are fed into an FC (fully connected) layer, yielding three classification and regression layers that determine the class and the position and size of the character text boxes. Finally, the detected character text boxes are deduplicated with the NMS algorithm and merged into text lines, determining the positions and sizes of the digital text lines in the sample data.

The network structure for digital text-line localization is shown in Fig. 5. The specific steps are as follows:
Step S2021: Train a ResNet50 neural network on the digital instrument sample data to obtain a feature map of size W × H × C.

Step S2022: At each position of the feature map, take a 3 × 3 × C window of features to predict the class information and location information of the k anchors at that position. Based on statistics of the shapes of digital text lines in instrument images and repeated experiments, k = 1, the anchor width is 16, and the anchor aspect ratio is 6:1.

Step S2023: Feed the 3 × 3 × C features of all windows in each row into a bidirectional LSTM (BLSTM) network to obtain an output matrix of size W × 256.

Step S2024: Feed the W × 256 matrix into a 512-dimensional FC (fully connected) layer.

Step S2025: Feed the fully connected layer's features into the three classification and regression layers, namely a 2k vertical-coordinates layer, a k side-refinement layer, and a 2k scores layer, where the 2k vertical coordinates and the k side-refinements regress the location information of the k anchors and the 2k scores are their class information, producing multiple fine-grained digital text detection regions.

Step S2026: Using a threshold T2, directly delete anchors with scores < T2 (the score being the class probability), then deduplicate the remaining anchor text boxes with the NMS algorithm. Since the value of T2 governs the trade-off between the recall and precision of text-line localization, analysis of many experimental results showed that T2 = 0.8 gives the best localization.
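The filtering in step S2026 can be sketched as score thresholding at T2 = 0.8 followed by greedy NMS, which keeps the highest-scoring box among heavily overlapping candidates. For brevity the sketch computes IoU on 1-D horizontal spans; real anchors would use 2-D IoU, and all names below are illustrative.

```python
# Sketch of step S2026: drop anchors with score < T2 = 0.8, then greedy NMS.

def iou_1d(a, b):
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union

def nms(boxes, scores, t2=0.8, overlap=0.5):
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if scores[i] < t2:
            continue  # thresholding step: low-confidence anchors are deleted
        if all(iou_1d(boxes[i], boxes[j]) < overlap for j in keep):
            keep.append(i)  # suppress near-duplicate anchors
    return keep

boxes = [(0, 16), (1, 17), (40, 56)]
scores = [0.95, 0.90, 0.85]
print(nms(boxes, scores))  # indices of the kept anchors: [0, 2]
```

The second box overlaps the first almost completely, so only the higher-scoring one survives, while the distant third box is kept.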
Step S2027: Using a text construction algorithm, iteratively merge pairs of similar digital character text boxes until no further merging is possible, i.e., merge them into text lines.

Step S2028: Obtain the relative offset via side-refinement (edge refinement), and use it to fine-tune the horizontal positions of the predicted text boxes, obtaining the locations of the numeric-character text lines. Here x_side is the x coordinate of the predicted box edge closest to the actual digital text line in the horizontal direction, and x*_side is the x coordinate of the real text line's edge, computed in advance from the bounding box of the real digital text line and the anchor; c_x^a is the anchor's center x coordinate, and w_a is the anchor width, fixed at 16 pixels. The relative offsets are

o = (x_side − c_x^a) / w_a, o* = (x*_side − c_x^a) / w_a.
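The side-refinement encoding of step S2028 can be sketched as a pair of inverse transforms: the edge coordinate x_side is encoded relative to the anchor center c_x^a in units of the fixed anchor width w_a = 16, and decoded back when refining the predicted box. The function names are illustrative.

```python
# Sketch of the side-refinement offset in step S2028:
#   o = (x_side - c_x^a) / w_a, with w_a fixed at 16 pixels.

W_A = 16.0  # fixed anchor width in pixels, as in the patent

def encode_side(x_side, cx_anchor):
    return (x_side - cx_anchor) / W_A

def decode_side(offset, cx_anchor):
    return cx_anchor + offset * W_A

o = encode_side(x_side=100.0, cx_anchor=96.0)
print(o)                     # (100 - 96) / 16 = 0.25
print(decode_side(o, 96.0))  # recovers 100.0
```

Training regresses the predicted offset o toward the ground-truth offset o*, so the network learns per-anchor horizontal corrections rather than absolute pixel coordinates.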
Step S203: Numeric character recognition.

Numeric character region recognition is the process of recognizing the digital text lines in the sample data. First, the ResNet50 neural network extracts the feature vectors of the digital text-line image; then the bidirectional LSTM (BLSTM) algorithm recognizes the feature vectors, producing a probability distribution for each column of features; finally, the CTC algorithm with the forward-backward algorithm solves for the optimal label sequence, yielding the digital text-line recognition result.

The network structure for numeric character region recognition is shown in Fig. 6. The specific recognition steps are as follows:

Step S2031: Preprocess the digital text-line image; for an image of size M × N, set the scaled height M′ = 16 and scale N by the same ratio as M.

Step S2032: Feed the preprocessed sample data into the ResNet50 neural network for feature extraction to obtain a feature map, then convert the feature map into feature vectors column by column.

Step S2033: Recognize the feature vectors with the bidirectional LSTM algorithm of the RNN network to obtain the label sequence, i.e., the class sequence, of each column of features. The number of output classes of the network must be specified in advance; based on the characteristics of digital instrument readings, the present invention sets the number of classes to 14, the classes including 0-9, +, −, and background.

Step S2034: Solve for the optimal label sequence with the CTC algorithm to obtain the text-line recognition result. The specific steps are:

1) Feed the obtained label sequence y = y_1, y_2, ..., y_T into the CTC algorithm, where T is the sequence length and each y_t represents a probability distribution over the set L′ = L ∪ {blank}, L being the set of all possible labels in the task and blank the blank symbol.

2) Through the sequence-to-sequence mapping function β, which removes repeated labels and then blanks, with π ∈ L′^T (T being the length) and β(π) = l, the conditional probability of a label sequence l is defined as the sum of the conditional probabilities of all paths π that map to it:

p(l | y) = Σ_{π: β(π) = l} p(π | y)

which is computed efficiently with the forward-backward algorithm.

3) For both the lexicon-free and the lexicon-based model, the label sequence with the highest predicted probability is chosen and transcribed.

Lexicon-free transcription: l* ≈ β(argmax_π p(π | y)).

Lexicon-based transcription: l* = argmax_{l ∈ N_δ(l′)} p(l | y), where l′ is the lexicon-free transcription and the candidate set N_δ(l′) is computed efficiently with a BK-tree data structure.

4) Network training: with training data set X = {I_i, l_i}, where I_i is a training digital text-line image and l_i the corresponding actual digital text-line sequence, the objective is to minimize the negative log-likelihood of the actual sequences, i.e.

O = − Σ_{(I_i, l_i) ∈ X} log p(l_i | y_i).
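The forward half of the forward-backward computation in step 2) can be sketched on a toy scale: p(l | y) sums the probabilities of every path π with β(π) = l, computed by dynamic programming over the label extended with interleaved blanks. This unnormalized, un-logged version is an illustration only, not the patent's implementation; `probs[t]` maps each symbol (with "-" as blank) to its probability at column t.

```python
# Minimal CTC forward (alpha) recursion for p(l | y), step S2034 part 2).

def ctc_forward(probs, label, blank="-"):
    ext = [blank]                           # extended label: blanks interleaved
    for ch in label:
        ext += [ch, blank]
    alpha = [probs[0].get(s, 0.0) if i < 2 else 0.0
             for i, s in enumerate(ext)]    # paths may start only in state 0 or 1
    for t in range(1, len(probs)):
        new = [0.0] * len(ext)
        for i, s in enumerate(ext):
            total = alpha[i] + (alpha[i - 1] if i > 0 else 0.0)
            if s != blank and i >= 2 and ext[i - 2] != s:
                total += alpha[i - 2]       # skip over a blank between labels
            new[i] = total * probs[t].get(s, 0.0)
        alpha = new
    return alpha[-1] + alpha[-2]            # end in the last label or final blank

# Two columns, label "a": paths (a,a), (a,-), (-,a) all map to "a".
probs = [{"a": 0.6, "-": 0.4}, {"a": 0.5, "-": 0.5}]
print(ctc_forward(probs, "a"))  # 0.6*0.5 + 0.6*0.5 + 0.4*0.5 = 0.8
```

Summing over all three valid paths by brute force gives the same 0.8, which is what makes the recursion a correct (and efficient) replacement for path enumeration.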
Combining the above three parts, digital instrument region localization, digital text-line localization, and digit character recognition, realizes digital instrument recognition under natural scenes. The digital instrument recognition algorithm of the present invention has strong recognition capability for digital instrument readings under natural scenes and is a digital instrument recognition method suitable for natural conditions.
The digital instrument recognition algorithm of the present invention is trained end to end; it can accept image input of any scene and any size and recognize character strings of arbitrary length. Figs. 7 and 8 show test results of the present invention on digital instruments under natural scenes, from which it can be seen that the present invention can effectively recognize digital instrument readings under natural scenes.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (4)
1. A digital instrument recognition method under a complex natural scene, characterized by comprising the following steps:
Step 1: locating the digital instrument region under the complex natural scene using the SSD algorithm;
Step 2: extracting features using the ResNet50 neural network and training the extracted features with a bidirectional LSTM network to obtain the text-line localization within the digital instrument region;
Step 3: extracting text-line features using the ResNet50 neural network, training the extracted text-line features with a BRNN network, and obtaining the digital instrument recognition result using the CTC algorithm.
2. The digital instrument recognition method under a complex natural scene according to claim 1, characterized in that said Step 1 comprises:
Step 1.1: preprocessing the sample data to obtain preprocessed sample data;
Step 1.2: constructing the SSD network model: on the basic VGG16 network, converting the 6th and 7th fully connected layers into convolutional layers, and adding 3 convolutional layers and one average pooling layer;
Step 1.3: for each feature map after convolution, generating the regressed coordinates and class probabilities of the default boxes using 3 × 3 convolution; the size of each default box is calculated as
s_k = s_min + (s_max − s_min)/(m − 1) · (k − 1), k ∈ [1, m],
where m is the number of feature maps, s_min is the default box size of the bottom layer, and s_max is the default box size of the top layer;
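The default-box size formula in Step 1.3 is the standard SSD linear scale schedule; a minimal sketch follows, with s_min and s_max left as parameters since their concrete values are deployment choices not fixed by the claim.

```python
# Illustrative computation of the SSD default-box scale schedule from Step 1.3.
# s_min and s_max are assumed example values, not values taken from the patent.

def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Return the m linearly spaced scales s_k, k = 1..m, one per feature map."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
```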
Step 1.4: defining the pre-annotated pointer instrument region as the ground truth box and training the SSD network model with the ground truth boxes; performing accurate multi-angle pointer instrument localization using the trained SSD network; the training process is as follows:
the actually chosen default boxes (prior boxes) are matched to the ground truth boxes according to IOU; a prior box with IOU ≥ T1 is a positive sample and the rest are negative samples, the T1 being 0.7; the regression losses of the prior boxes are sorted from high to low, the M prior boxes with the highest regression loss are selected as set D, and the successfully matched positive samples form set P; the positive sample set is then P − D ∩ P and the negative sample set is D − D ∩ P; the ratio of positive samples to negative samples in the positive and negative sample sets is 1:4, i.e. M is 1/4 of the number of prior boxes;
the network parameters are adjusted through the loss function to complete the localization of the pointer instrument;
the loss function is
L(x, c, l, g) = (1/N) (L_conf(x, c) + λ L_loc(x, l, g)),
where c is the class probability, l is the prediction box, and N is the number of prior boxes matched to the ground truth boxes; if N = 0, the loss function is 0; L_conf is the classification loss term; L_loc(x, l, g) is the regression loss term between the prediction box l and the g-th ground truth box; λ is the weight of the regression loss, representing the contribution of the regression loss to the total loss function, and λ takes the value 0.5;
Step 1.5: removing repeated boxes using the NMS algorithm and selecting the digital instrument region.
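The score thresholding and NMS de-duplication used in Step 1.5 (and again in Step 2.6 of claim 3) can be sketched as follows. This is a hypothetical greedy implementation: the (x1, y1, x2, y2) box format and the 0.5 IOU suppression threshold are illustrative assumptions.

```python
# Minimal sketch of score filtering plus greedy non-maximum suppression.
# Boxes are (x1, y1, x2, y2) tuples; thresholds here are assumed examples.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def filter_and_nms(boxes, scores, t2=0.8, iou_thresh=0.5):
    """Drop boxes scoring below t2, then keep the highest-scoring box of each
    overlapping group; returns the indices of the surviving boxes."""
    kept = []
    order = sorted((i for i, s in enumerate(scores) if s >= t2),
                   key=lambda i: scores[i], reverse=True)
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return kept
```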
3. The digital instrument recognition method under a complex natural scene according to claim 1, characterized in that said Step 2 comprises:
Step 2.1: training on the digital instrument sample data with the ResNet50 neural network to obtain a feature map of size W × H × C;
Step 2.2: at each position of the feature map from Step 2.1, taking a 3 × 3 × C window feature for predicting the class information and location information of the k anchor points (anchors) corresponding to that position;
Step 2.3: inputting the 3 × 3 × C features of all windows in each row into the bidirectional LSTM network to obtain an output matrix of size W × 256;
Step 2.4: inputting the W × 256 matrix into a 512-dimensional fully connected layer;
Step 2.5: inputting the features of the fully connected layer into the classification or regression layer to obtain the class information and location information corresponding to the anchors, thereby obtaining multiple fine digital text detection regions;
Step 2.6: according to a thresholding method with a given threshold T2, directly deleting anchors with scores < T2, the scores being the class probability, and de-duplicating the remaining anchor text boxes using the NMS algorithm, the T2 being 0.8;
Step 2.7: merging text lines using the text construction algorithm;
Step 2.8: fine-tuning the horizontal positions of the predicted text boxes using the side-refinement algorithm to obtain the localization of the digital character text lines.
4. The digital instrument recognition method under a complex natural scene according to claim 1, characterized in that said Step 3 comprises:
Step 3.1: preprocessing the digital text-line images; the digital text-line image size is M × N, the scaling ratio of M is set, and N is scaled according to the scaling ratio of M;
Step 3.2: inputting the preprocessed sample data into the ResNet50 neural network for feature extraction to obtain a feature map, and converting the feature map into feature vectors by column;
Step 3.3: recognizing the feature vectors using the bidirectional LSTM algorithm of the BRNN network to obtain the class sequence of each column feature;
Step 3.4: solving for the optimal class sequence using the CTC algorithm to obtain the text-line recognition result.
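Step 3.2's conversion of the feature map into per-column feature vectors (the map-to-sequence step used in CRNN-style recognizers) can be sketched as follows. The channel-first indexing convention is an illustrative assumption.

```python
# Sketch of the map-to-sequence step: a C x H x W feature map becomes W feature
# vectors of length C * H, one per column, read left to right for the BRNN.

def map_to_sequence(feature_map):
    """feature_map: nested list indexed [channel][row][col]."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [[feature_map[c][h][w]
             for c in range(channels) for h in range(height)]
            for w in range(width)]
```

Each column vector then becomes one time step of the bidirectional LSTM in Step 3.3, so the sequence length equals the feature map width W.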
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810500379.0A CN108898131A (en) | 2018-05-23 | 2018-05-23 | It is a kind of complexity natural scene under digital instrument recognition methods |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108898131A true CN108898131A (en) | 2018-11-27 |
Family
ID=64343094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810500379.0A Pending CN108898131A (en) | 2018-05-23 | 2018-05-23 | It is a kind of complexity natural scene under digital instrument recognition methods |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108898131A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103164692A (en) * | 2012-12-03 | 2013-06-19 | 北京科技大学 | Special vehicle instrument automatic identification system and algorithm based on computer vision |
CN106529537A (en) * | 2016-11-22 | 2017-03-22 | 亿嘉和科技股份有限公司 | Digital meter reading image recognition method |
CN108052943A (en) * | 2017-12-29 | 2018-05-18 | 杭州占峰科技有限公司 | A kind of instrument character wheel recognition methods and equipment |
Non-Patent Citations (3)
Title |
---|
BAOGUANG SHI et al.: "An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence * |
WEI LIU et al.: "SSD: Single Shot MultiBox Detector", arXiv * |
ZHI TIAN et al.: "Detecting Text in Natural Image with Connectionist Text Proposal Network", arXiv * |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109753961A (en) * | 2018-12-26 | 2019-05-14 | 国网新疆电力有限公司乌鲁木齐供电公司 | A kind of substation's spacer units unlocking method and system based on image recognition |
CN109918987B (en) * | 2018-12-29 | 2021-05-14 | 中国电子科技集团公司信息科学研究院 | Video subtitle keyword identification method and device |
CN109918987A (en) * | 2018-12-29 | 2019-06-21 | 中国电子科技集团公司信息科学研究院 | A kind of video caption keyword recognition method and device |
CN113538407B (en) * | 2018-12-29 | 2022-10-14 | 北京市商汤科技开发有限公司 | Anchor point determining method and device, electronic equipment and storage medium |
CN113538407A (en) * | 2018-12-29 | 2021-10-22 | 北京市商汤科技开发有限公司 | Anchor point determining method and device, electronic equipment and storage medium |
CN109858474A (en) * | 2019-01-08 | 2019-06-07 | 北京全路通信信号研究设计院集团有限公司 | A kind of detection of transformer oil surface temperature controller and recognition methods |
CN109858474B (en) * | 2019-01-08 | 2021-10-19 | 北京全路通信信号研究设计院集团有限公司 | Detection and identification method for transformer oil surface temperature controller |
CN109886174A (en) * | 2019-02-13 | 2019-06-14 | 东北大学 | A kind of natural scene character recognition method of warehouse shelf Sign Board Text region |
CN110059539A (en) * | 2019-02-27 | 2019-07-26 | 天津大学 | A kind of natural scene text position detection method based on image segmentation |
CN109948469A (en) * | 2019-03-01 | 2019-06-28 | 吉林大学 | The automatic detection recognition method of crusing robot instrument based on deep learning |
CN110059694A (en) * | 2019-04-19 | 2019-07-26 | 山东大学 | The intelligent identification Method of lteral data under power industry complex scene |
CN110175520A (en) * | 2019-04-22 | 2019-08-27 | 南方电网科学研究院有限责任公司 | Text position detection method, device and the storage medium of robot inspection image |
CN110135431A (en) * | 2019-05-16 | 2019-08-16 | 深圳市信联征信有限公司 | The automatic identifying method and system of business license |
CN111967287A (en) * | 2019-05-20 | 2020-11-20 | 江苏金鑫信息技术有限公司 | Pedestrian detection method based on deep learning |
CN110399882A (en) * | 2019-05-29 | 2019-11-01 | 广东工业大学 | A kind of character detecting method based on deformable convolutional neural networks |
CN110532855A (en) * | 2019-07-12 | 2019-12-03 | 西安电子科技大学 | Natural scene certificate image character recognition method based on deep learning |
CN110532855B (en) * | 2019-07-12 | 2022-03-18 | 西安电子科技大学 | Natural scene certificate image character recognition method based on deep learning |
CN110443159A (en) * | 2019-07-17 | 2019-11-12 | 新华三大数据技术有限公司 | Digit recognition method, device, electronic equipment and storage medium |
CN110929805A (en) * | 2019-12-05 | 2020-03-27 | 上海肇观电子科技有限公司 | Neural network training method, target detection device, circuit and medium |
CN110929805B (en) * | 2019-12-05 | 2023-11-10 | 上海肇观电子科技有限公司 | Training method, target detection method and device for neural network, circuit and medium |
CN111428727A (en) * | 2020-03-27 | 2020-07-17 | 华南理工大学 | Natural scene text recognition method based on sequence transformation correction and attention mechanism |
CN111428727B (en) * | 2020-03-27 | 2023-04-07 | 华南理工大学 | Natural scene text recognition method based on sequence transformation correction and attention mechanism |
CN111553345A (en) * | 2020-04-22 | 2020-08-18 | 上海浩方信息技术有限公司 | Method for realizing meter pointer reading identification processing based on Mask RCNN and orthogonal linear regression |
CN111553345B (en) * | 2020-04-22 | 2023-10-20 | 上海浩方信息技术有限公司 | Method for realizing meter pointer reading identification processing based on Mask RCNN and orthogonal linear regression |
CN116958998A (en) * | 2023-09-20 | 2023-10-27 | 四川泓宝润业工程技术有限公司 | Digital instrument reading identification method based on deep learning |
CN116958998B (en) * | 2023-09-20 | 2023-12-26 | 四川泓宝润业工程技术有限公司 | Digital instrument reading identification method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108898131A (en) | It is a kind of complexity natural scene under digital instrument recognition methods | |
Xu et al. | Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark | |
Liao et al. | Textboxes: A fast text detector with a single deep neural network | |
CN109271895B (en) | Pedestrian re-identification method based on multi-scale feature learning and feature segmentation | |
CN111126482B (en) | Remote sensing image automatic classification method based on multi-classifier cascade model | |
CN111680706B (en) | Dual-channel output contour detection method based on coding and decoding structure | |
Hoque et al. | Real time bangladeshi sign language detection using faster r-cnn | |
CN114582470B (en) | Model training method and device and medical image report labeling method | |
CN110543906B (en) | Automatic skin recognition method based on Mask R-CNN model | |
Shen et al. | In teacher we trust: Learning compressed models for pedestrian detection | |
CN107430678A (en) | Use the inexpensive face recognition of Gauss received field feature | |
CN102708384B (en) | Bootstrapping weak learning method based on random fern and classifier thereof | |
CN110929640A (en) | Wide remote sensing description generation method based on target detection | |
Tereikovskyi et al. | The method of semantic image segmentation using neural networks | |
CN113158777A (en) | Quality scoring method, quality scoring model training method and related device | |
CN112132257A (en) | Neural network model training method based on pyramid pooling and long-term memory structure | |
CN111144462A (en) | Unknown individual identification method and device for radar signals | |
Sun et al. | Image target detection algorithm compression and pruning based on neural network | |
WO2020199498A1 (en) | Palmar digital vein comparison method and device, computer apparatus, and storage medium | |
CN113762151A (en) | Fault data processing method and system and fault prediction method | |
CN110020638A (en) | Facial expression recognizing method, device, equipment and medium | |
CN114387524B (en) | Image identification method and system for small sample learning based on multilevel second-order representation | |
CN111144466A (en) | Image sample self-adaptive depth measurement learning method | |
CN115424000A (en) | Pointer instrument identification method, system, equipment and storage medium | |
CN115936003A (en) | Software function point duplicate checking method, device, equipment and medium based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20181127 |
|
RJ01 | Rejection of invention patent application after publication |