CN108898131A - A method for recognizing digital instruments in complex natural scenes - Google Patents
A method for recognizing digital instruments in complex natural scenes

- Publication number: CN108898131A
- Application number: CN201810500379.0A
- Authority: CN (China)
- Prior art keywords: digital instrument, text, feature, digital, network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/29 — Graphical models, e.g. Bayesian networks
- G06V30/153 — Segmentation of character regions using recognition of characters or words
- G06V2201/02 — Recognising information on displays, dials, clocks
- G06V30/10 — Character recognition
Abstract
The present invention relates to the field of digital instrument recognition, and in particular to a method for recognizing digital instruments in complex natural scenes. The method comprises the following steps: locating the digital instrument region in a complex natural scene using the SSD algorithm; extracting features with a ResNet50 neural network and training a bidirectional LSTM network on the extracted features to obtain the text-line locations within the digital instrument region; and extracting text-line features with the ResNet50 network, training a BRNN on those features, and obtaining the digital instrument recognition result with the CTC algorithm. The invention avoids the recognition errors that character segmentation causes when the background is complex, and improves the recognition accuracy of digital instruments.
Description
Technical field
The present invention relates to the field of digital instrument recognition, and in particular to a method for recognizing digital instruments in complex natural scenes.
Background art
Digital instrument recognition refers to the technology of automatically finding the positions of numeric characters in a digital image with a computer and recognizing those characters; it belongs to the field of pattern recognition. Because digital instruments offer high precision, convenient reading, and easy configuration, they are widely used in industry and in inspection.

At present, digital instrument readings are recognized in two main ways:

1. Manual meter reading. This approach requires a person to read and record the instrument by eye, which is tedious and inefficient. During manual reading, subjective human factors or environmental conditions easily cause reading errors, degrading measurement accuracy. Moreover, in harsh environments such as chemical plants and power stations, where there may be toxic gases, extreme temperatures, or high radiation, it is unsuitable for people to read the instrument values on site.

2. Meter recognition based on machine vision. In this approach, a camera captures instrument images and machine vision algorithms recognize them, greatly improving the efficiency of meter reading. Replacing manual reading with machine vision not only reduces the errors caused by human subjectivity but also eliminates the risks of on-site manual operation. However, existing machine vision methods can only locate, segment, and recognize digital instruments against simple backgrounds, and most algorithms cannot recognize decimal points or signs. In addition, existing algorithms must perform character segmentation after locating the instrument region before recognition can proceed; when the background is complex, segmentation often fails partially, leaving some characters unrecognized. A meter recognition algorithm that handles complex natural scenes without character segmentation is therefore needed.
The Chinese patent application No. 201510920430.X, entitled "A digit recognition method based on intersection feature extraction", first binarizes the grayscale image with the maximum between-class variance (Otsu) method to separate the recognition target from the image background; next, it segments the LED digits to obtain a binary image of the LED digit display; then it scans two horizontal lines from left to right at 3/4 and 1/4 of the height of the binary image and records the number of pixel transitions on each; it further scans one vertical line from top to bottom at 1/2 of the width and records its pixel transitions; finally, it compares the row and column transition counts against those of standard digits and identifies the digit according to a predefined logic.

The disadvantage of this method is that it cannot effectively locate digital instruments in complex natural environments, and it recognizes only the digits 0-9, with no provision for decimal points or signs.
The Chinese patent application No. 201611031884.2, entitled "A digital instrument reading image recognition method", extracts a region of interest from a panoramic image by template matching against a pre-calibrated digital instrument image; it then extracts the single-character regions and the decimal-point detection region within the region of interest according to the relative positions of the calibrated characters; each single-character region is recognized with a pre-trained convolutional neural network character model; the decimal-point region is detected with a pre-trained cascade detector based on block LBP features and an Adaboost classifier, and the detection results are post-processed; the final reading is obtained from the character, decimal-point, and sign recognition results.

The disadvantage of this method is that it recognizes characters one at a time, so segmentation quality severely affects the recognition result; it can only recognize digital instruments under ideal conditions and cannot effectively recognize digital instruments in complex natural scenes.
Summary of the invention
To address the above problems in digital instrument recognition, the present invention proposes a method for recognizing digital instruments in complex natural scenes that avoids the recognition errors caused by character segmentation under complex backgrounds and improves the recognition accuracy of digital instruments.
To achieve the above goals, the present invention adopts the following technical scheme:
A method for recognizing digital instruments in complex natural scenes comprises the following steps:

Step 1: Locate the digital instrument region in the complex natural scene using the SSD algorithm.

Step 2: Extract features with a ResNet50 neural network and train a bidirectional LSTM network on the extracted features to obtain the text-line locations within the digital instrument region.

Step 3: Extract text-line features with the ResNet50 network, train a BRNN on the extracted text-line features, and obtain the digital instrument recognition result with the CTC algorithm.
Further, step 1 comprises:

Step 1.1: Preprocess the sample data to obtain preprocessed sample data.

Step 1.2: Construct the SSD network model: on the VGG16 base network, convert the sixth and seventh fully connected layers into convolutional layers, and add three convolutional layers and one average pooling layer.

Step 1.3: For each feature map after convolution, use a 3 × 3 convolution to generate the regressed coordinates and the class probabilities of the default boxes. The size of each default box is computed as

s_k = s_min + ((s_max − s_min) / (m − 1)) · (k − 1), k ∈ [1, m]

where m is the number of feature maps, s_min is the default box scale on the lowest layer, and s_max is the default box scale on the highest layer.

Step 1.4: Define the pre-annotated instrument regions as ground truth boxes and train the SSD network model with them; use the trained SSD network to accurately locate instruments at multiple angles. The training process is as follows:

Match the chosen default boxes (prior boxes) against the ground truth boxes by IOU; a prior box with IOU > T1 is a positive sample and the rest are negative samples, with T1 = 0.7. Sort the prior boxes by regression loss from high to low and select the M prior boxes with the highest regression loss as set D; with the matched positive samples as set P, the positive sample set is P − D ∩ P and the negative sample set is D − D ∩ P. The ratio of positive to negative samples is 1:4, i.e. M is 1/4 of the number of prior boxes.

Adjust the network parameters via the loss function to complete the instrument localization. The loss function is

L(x, c, l, g) = (1/N) · (L_conf(x, c) + λ · L_loc(x, l, g))

where c is the class probability, l is the predicted box, and N is the number of prior boxes matched to ground truth boxes; if N = 0, the loss is 0. L_conf is the classification loss; L_loc(x, l, g) is the regression loss between the predicted box l and the g-th ground truth box; λ is the weight of the regression loss, representing its contribution to the overall loss, and is set to 0.5.

Step 1.5: Remove duplicate boxes with the NMS algorithm and select the digital instrument region.
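The default-box scale formula in step 1.3 can be sketched as follows, using the values reported later in the description (s_min = 0.1, s_max = 0.25, m = 6 feature maps). This is a minimal illustrative sketch; the function name and structure are not from the patent.

```python
# Sketch of the default-box scale formula from step 1.3:
#   s_k = s_min + ((s_max - s_min) / (m - 1)) * (k - 1), k in [1, m]
# Values s_min = 0.1, s_max = 0.25, m = 6 are those the patent reports.

def default_box_scales(s_min: float, s_max: float, m: int) -> list:
    """Return the scale s_k of the default boxes on each of the m feature maps."""
    return [s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in range(1, m + 1)]

scales = default_box_scales(0.1, 0.25, 6)
print([round(s, 2) for s in scales])  # lowest layer gets scale 0.1, highest 0.25
```

With these values the six feature maps receive evenly spaced scales 0.10, 0.13, ..., 0.25, so low-resolution maps detect large dials and high-resolution maps detect small ones.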
Further, step 2 comprises:

Step 2.1: Train a ResNet50 neural network on the digital instrument sample data to obtain a feature map of size W × H × C.

Step 2.2: At each position of the feature map from step 2.1, take a 3 × 3 × C window of features to predict the class information and location information of the k anchors at that position.

Step 2.3: Feed the 3 × 3 × C features of all windows in each row into a bidirectional LSTM network to obtain an output matrix of size W × 256.

Step 2.4: Feed the W × 256 matrix into a 512-dimensional fully connected layer.

Step 2.5: Feed the fully connected layer's features into the classification and regression layers to obtain the class and location information of each anchor, producing multiple fine-grained digital text detection regions.

Step 2.6: Using a threshold T2, directly delete anchors with scores < T2, where the score is the class probability, and deduplicate the remaining anchor text boxes with the NMS algorithm; T2 = 0.8.

Step 2.7: Merge the text lines with a text construction algorithm.

Step 2.8: Fine-tune the horizontal positions of the predicted text boxes with a side-refinement algorithm to obtain the locations of the numeric-character text lines.
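The text-construction idea of step 2.7 can be sketched in miniature: narrow fixed-width anchor boxes on the same row are chained into one text line whenever the horizontal gap between neighbours is small. The box representation and the gap threshold below are illustrative assumptions, not the patent's algorithm.

```python
# A simplified sketch of text construction (step 2.7): merge horizontally
# adjacent (x_left, x_right) anchor spans into text-line spans. The max_gap
# value of 8 px is an illustrative assumption.

def merge_anchors(anchors, max_gap=8):
    """Merge (x_left, x_right) anchor spans into text-line spans."""
    lines = []
    for left, right in sorted(anchors):
        if lines and left - lines[-1][1] <= max_gap:
            lines[-1][1] = max(lines[-1][1], right)  # extend the current line
        else:
            lines.append([left, right])              # start a new text line
    return [tuple(line) for line in lines]

# Five 16-px-wide anchors: four contiguous, one far to the right.
print(merge_anchors([(0, 16), (16, 32), (32, 48), (48, 64), (120, 136)]))
# two text lines: (0, 64) and (120, 136)
```

The real construction also checks vertical overlap between anchors; only the horizontal chaining is shown here.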
Further, step 3 comprises:

Step 3.1: Preprocess the digital text-line image; for an image of size M × N, set the scaling ratio of M and scale N by the same ratio.

Step 3.2: Feed the preprocessed sample data into the ResNet50 neural network for feature extraction to obtain a feature map, then convert the feature map into feature vectors column by column.

Step 3.3: Recognize the feature vectors with the bidirectional LSTM algorithm of the BRNN to obtain the class sequence of each column of features.

Step 3.4: Solve for the optimal class sequence with the CTC algorithm to obtain the text-line recognition result.
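The decoding in steps 3.3-3.4 can be sketched with the greedy (best-path) approximation of CTC: take the most likely class per column, collapse repeated labels, then drop blanks. This is a simplified stand-in for the full forward-backward search, and the class set below is an illustrative subset of the patent's 14 classes.

```python
# Minimal greedy CTC decoding sketch for step 3.4 (best-path approximation,
# not the full forward-backward solution the patent uses).

def greedy_ctc_decode(columns, classes, blank="blank"):
    """columns: per-column probability lists aligned with `classes`."""
    path = [classes[max(range(len(col)), key=col.__getitem__)] for col in columns]
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)  # keep a label only when it changes and is not blank
        prev = label
    return "".join(out)

classes = ["1", "2", "blank"]
cols = [[0.9, 0.05, 0.05],   # column predicts "1"
        [0.8, 0.1, 0.1],     # "1" again: collapsed as a repeat
        [0.1, 0.1, 0.8],     # blank: separator, dropped
        [0.1, 0.8, 0.1]]     # "2"
print(greedy_ctc_decode(cols, classes))  # "12"
```

Collapsing repeats before removing blanks is what lets CTC read a two-column-wide "1" as a single character while still distinguishing "11" (separated by a blank) from "1".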
Compared with the prior art, the present invention has the following beneficial effects:

The invention first locates the digital instrument region with the SSD algorithm; it then locates the digital text lines with a ResNet50 neural network combined with a BLSTM network; finally it recognizes the digital text lines with the ResNet50 network combined with a BRNN, and selects the optimal recognition result with the CTC algorithm. Locating the meter region with the SSD algorithm improves localization accuracy for digital instruments in complex natural environments, while recognizing entire text lines with the BLSTM algorithm, without cutting out single characters first, avoids the recognition errors caused by character segmentation under complex backgrounds and improves the recognition accuracy of digital instruments.

The digital instrument recognition algorithm of the invention is trained end to end; it can accept image inputs from any scene and of any size, and recognize character strings of arbitrary length. The invention can effectively recognize digital instrument values in natural scenes.
Brief description of the drawings
Fig. 1 is the basic flow chart of the digital instrument recognition method for complex natural scenes according to an embodiment of the present invention.

Fig. 2 is the overall framework diagram of the digital instrument recognition method for complex natural scenes according to another embodiment of the present invention.

Fig. 3 is a schematic diagram of the SSD network structure of the digital instrument recognition method according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of the network structure for digital instrument region localization according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of the network structure for digital text-line localization according to an embodiment of the present invention.

Fig. 6 is a schematic diagram of the network structure for numeric character region recognition according to an embodiment of the present invention.

Fig. 7 is a first test result of digital instrument recognition by the method according to an embodiment of the present invention.

Fig. 8 is a second test result of digital instrument recognition by the method according to an embodiment of the present invention.
Detailed description of embodiments

The present invention is further explained below with reference to the accompanying drawings and specific embodiments:
Embodiment one:
As shown in Fig. 1, the digital instrument recognition method for complex natural scenes of the present invention comprises the following steps:

Step S101: Locate the digital instrument region in the complex natural scene using the SSD algorithm.

Step S102: Extract features with a ResNet50 neural network and train a bidirectional LSTM network on the extracted features to obtain the text-line locations within the digital instrument region.

Step S103: Extract text-line features with the ResNet50 network, train a BRNN on the extracted text-line features, and obtain the digital instrument recognition result with the CTC algorithm.
Embodiment two:
As shown in Fig. 2, another digital instrument recognition method for complex natural scenes of the present invention comprises:

The neural network model of each stage in the present invention requires prior offline training. Before offline training, the collected digital instruments in natural scenes must be annotated manually, i.e., the position of the digital instrument in each image, the positions of the text lines within the instrument, and the recognition results of the text lines are labeled. The networks are trained on the labeled data, and the offline-trained networks are then applied to the test samples to realize digital instrument region localization, digital text-line localization, and numeric character recognition. These three processes are as follows:

Step S201: Digital instrument region localization.

Digital instrument region localization is the process of locating the digital instrument region in the sample data with the trained SSD network. SSD realizes object detection and recognition with a single deep neural network model. As shown in Fig. 3, the SSD network first uses the first five layers of the VGG16 base network, converts the fc6 and fc7 layers into two convolutional layers, and additionally adds three convolutional layers and one average pooling layer. Feature maps at different levels are used to predict the default box offsets and the scores of the different classes, and the final detection result is obtained through NMS.
The network structure for digital instrument region localization is shown in Fig. 4. The specific steps are as follows:

Step S2011: Preprocess the sample data to obtain samples of size 300 × 300 × 3.

Step S2012: Construct the SSD network model, whose selected feature maps are of sizes 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3, and 1 × 1, corresponding to block4, block7, block8, block9, block10, and block11 respectively. For each feature map, a 3 × 3 convolution generates the regressed coordinates of the 4 default boxes and the 8 class probabilities.

Step S2013: Feed the preprocessed sample data into the first five layers of the VGG16 network for convolution, generate default boxes on the resulting convolutional feature layer, then continue the convolution operations and extract default boxes from each subsequent convolutional feature layer in turn. Each convolutional feature map generates k default boxes according to different scales and aspect ratios.

The size of each default box is computed as

s_k = s_min + ((s_max − s_min) / (m − 1)) · (k − 1), k ∈ [1, m]

where m is the number of feature maps, s_min is the default box scale on the lowest layer, and s_max is the default box scale on the highest layer. Since the original set of aspect ratios a_r of the default boxes is {1, 2, 3, 1/2, 1/3}, each default box has width w_k = s_k · √a_r and height h_k = s_k / √a_r. For the aspect ratio of 1, an additional default box of scale s'_k = √(s_k · s_{k+1}) is added, so that each point of each feature map generates 6 default boxes. The center of each default box is set to ((i + 0.5)/|f_k|, (j + 0.5)/|f_k|), where |f_k| is the size of the k-th feature map.

Since the values of s_min and s_max directly affect the computational cost of the meter localization algorithm, the area occupied by the dial region in the sample instrument images was analyzed statistically, giving s_min = 0.1 and s_max = 0.25. Based on observed statistics of instrument shapes in digital instrument images, and in order to cover the various digital instrument dials, the aspect ratios of the default boxes were set to {1, 2, 1/2}, further reducing the computational cost of the localization algorithm. Extensive training and testing showed that with s_min = 0.1, s_max = 0.25, and aspect ratios {1, 2, 1/2}, the digital instrument recognition algorithm attains the lowest time complexity without reducing precision.
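The computational saving from the reduced ratio set can be made concrete by counting default boxes. With the feature maps of steps S2012-S2013 and aspect ratios {1, 2, 1/2} plus the extra scale-s'_k box of ratio 1, each cell carries 4 default boxes; the counting function below is an illustrative assumption built from those numbers.

```python
# Illustrative count of default boxes for the configuration in S2012-S2013:
# square feature maps of 38, 19, 10, 5, 3 and 1 cells per side, 4 boxes per
# cell (ratios {1, 2, 1/2} plus one extra ratio-1 box at scale s'_k).

def total_default_boxes(map_sizes, boxes_per_cell):
    return sum(f * f * boxes_per_cell for f in map_sizes)

maps = [38, 19, 10, 5, 3, 1]
print(total_default_boxes(maps, 4))  # 4 * (38^2 + 19^2 + 10^2 + 5^2 + 3^2 + 1^2) = 7760
```

With the full ratio set {1, 2, 3, 1/2, 1/3} each cell would carry 6 boxes (11640 in total), so the reduced set cuts the number of candidate boxes by a third.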
Step S2014: Define the manually pre-annotated digital instrument regions in the natural scene as ground truth boxes, which consist of the correctly labeled true position data. Train the SSD network with the ground truth boxes so that the network can accurately locate the digital instrument region in complex natural scenes, i.e., guarantee the classification confidence of the default boxes while regressing the prior boxes toward the ground truth boxes as closely as possible.

First, the positive and negative samples must be determined. The prior boxes are matched against the ground truth boxes by IOU (Jaccard overlap): a prior box with IOU > T1 is a positive sample (positive example), and the others are negative samples (negative examples). Since the negative samples generated this way far outnumber the positive samples, training would be difficult to converge. The prior boxes are therefore sorted by regression loss from high to low, and the M prior boxes with the highest regression loss are selected as set D. If the set of successfully matched positive samples is P, the positive sample set is P − D ∩ P and the negative sample set is D − D ∩ P. The present invention controls the ratio of positive to negative samples through the value of M. Since the values of T1 and M are critical to accurately locating the instrument dial region, extensive comparative experiments on the sample data determined T1 = 0.7 and a positive-to-negative ratio of 1:4, with which the localization algorithm covers the dial region most completely and converges fastest.
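The matching rule of step S2014 can be sketched directly: a prior box whose IoU (Jaccard overlap) with a ground truth box exceeds T1 = 0.7 is a positive sample. The corner-coordinate box format and helper names below are illustrative assumptions.

```python
# Sketch of the IoU matching rule in step S2014 with T1 = 0.7.
# Boxes are (x1, y1, x2, y2) axis-aligned rectangles.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)  # intersection over union

def split_samples(priors, gt, t1=0.7):
    pos = [p for p in priors if iou(p, gt) > t1]
    neg = [p for p in priors if iou(p, gt) <= t1]
    return pos, neg

gt = (0, 0, 10, 10)
priors = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
pos, neg = split_samples(priors, gt)
print(len(pos), len(neg))  # 2 positives, 1 negative
```

The hard-negative selection that follows (sorting negatives by regression loss and keeping the top M) operates on the `neg` set to hold the positive-to-negative ratio at 1:4.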
Then, the network parameters are adjusted with the loss function so that the default boxes approach the ground truth boxes as closely as possible. The loss function combines the regression loss loss(loc) of the corresponding default box and the classification loss loss(conf), and is defined as

L(x, c, l, g) = (1/N) · (L_conf(x, c) + λ · L_loc(x, l, g))

where c is the class probability, l is the predicted box, and N is the number of prior boxes matched to ground truth boxes; if N = 0, the loss is 0. L_conf is the classification loss, measured with the softmax loss function; L_loc(x, l, g) is the regression loss between the predicted box l and the g-th ground truth box; λ is the weight of the regression loss, representing its contribution to the overall loss. The value of λ has a vital influence on the localization quality; considering the various complex background factors of meters in natural scenes, extensive training experiments and cross-validation showed that λ = 0.5 achieves the best localization.

The regression loss L_loc is defined as a smooth-L1 loss over the matched boxes:

L_loc(x, l, g) = Σ_{i ∈ Pos} Σ_{m ∈ {cx, cy, w, h}} x_{ij}^p · smooth_L1(l_i^m − ĝ_j^m)

where l is the predicted box, g is the ground truth (actual position), p is the class corresponding to x, and d is the default box (default bounding box) used to encode the regression targets ĝ.
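The loss combination above can be sketched with scalar stand-ins, using the patent's λ = 0.5: the total loss averages the classification loss and the λ-weighted smooth-L1 regression loss over the N matched prior boxes. The scalar inputs below are illustrative, not real network outputs.

```python
# Sketch of the SSD loss in step S2014 with lambda = 0.5.
# smooth_l1 is quadratic near zero and linear beyond |x| = 1, which damps
# the influence of outlier regression errors.

def smooth_l1(x):
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def ssd_loss(conf_loss, loc_loss, n, lam=0.5):
    """Total loss L = (L_conf + lam * L_loc) / N; defined as 0 when N = 0."""
    return 0.0 if n == 0 else (conf_loss + lam * loc_loss) / n

print(smooth_l1(0.5))            # 0.5 * 0.25 = 0.125
print(smooth_l1(2.0))            # 2.0 - 0.5 = 1.5
print(ssd_loss(4.0, 2.0, n=2))   # (4.0 + 0.5 * 2.0) / 2 = 2.5
print(ssd_loss(4.0, 2.0, n=0))   # 0.0 by the N = 0 convention
```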
Step S2015: Finally, remove duplicate boxes with the NMS algorithm and select the digital instrument region.
Step S202: Digital text-line localization.

Digital text-line localization is the process of locating the character regions within the digital instrument region detected by the SSD network. First, a ResNet50 neural network performs convolution on the sample image, with each convolutional layer producing feature vectors; a 3 × 3 sliding window then extracts features from these feature maps, the extracted features are fed into the BLSTM algorithm for training, and the output vectors are fed into an FC (fully connected) layer, yielding three classification and regression layers that determine the class and the position and size of the character text boxes. Finally, the detected character text boxes are deduplicated with the NMS algorithm and merged into text lines, determining the positions and sizes of the digital text lines in the sample data.

The network structure for digital text-line localization is shown in Fig. 5. The specific steps are as follows:
Step S2021: Train a ResNet50 neural network on the digital instrument sample data to obtain a feature map of size W × H × C.

Step S2022: At each position of the feature map, take a 3 × 3 × C window of features to predict the class information and location information of the k anchors at that position. Based on statistics of the shapes of digital text lines in instrument images and repeated experiments, k = 1, the anchor width is 16, and the anchor aspect ratio is 6:1.

Step S2023: Feed the 3 × 3 × C features of all windows in each row into a bidirectional LSTM (BLSTM) network to obtain an output matrix of size W × 256.

Step S2024: Feed the W × 256 matrix into a 512-dimensional FC (fully connected) layer.

Step S2025: Feed the fully connected layer's features into the three classification and regression layers, namely a 2k vertical-coordinates layer, a k side-refinement layer, and a 2k scores layer, where the 2k vertical coordinates and the k side-refinements regress the location information of the k anchors and the 2k scores are their class information, producing multiple fine-grained digital text detection regions.

Step S2026: Using a threshold T2, directly delete anchors with scores < T2 (the score being the class probability), then deduplicate the remaining anchor text boxes with the NMS algorithm. Since the value of T2 governs the trade-off between the recall and precision of text-line localization, analysis of many experimental results showed that T2 = 0.8 gives the best localization.
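The filtering in step S2026 can be sketched as score thresholding at T2 = 0.8 followed by greedy NMS, which keeps the highest-scoring box among heavily overlapping candidates. For brevity the sketch computes IoU on 1-D horizontal spans; real anchors would use 2-D IoU, and all names below are illustrative.

```python
# Sketch of step S2026: drop anchors with score < T2 = 0.8, then greedy NMS.

def iou_1d(a, b):
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union

def nms(boxes, scores, t2=0.8, overlap=0.5):
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if scores[i] < t2:
            continue  # thresholding step: low-confidence anchors are deleted
        if all(iou_1d(boxes[i], boxes[j]) < overlap for j in keep):
            keep.append(i)  # suppress near-duplicate anchors
    return keep

boxes = [(0, 16), (1, 17), (40, 56)]
scores = [0.95, 0.90, 0.85]
print(nms(boxes, scores))  # indices of the kept anchors: [0, 2]
```

The second box overlaps the first almost completely, so only the higher-scoring one survives, while the distant third box is kept.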
Step S2027: Using a text construction algorithm, iteratively merge pairs of similar digital character text boxes until no further merging is possible, i.e., merge them into text lines.

Step S2028: Obtain the relative offset via side-refinement (edge refinement), and use it to fine-tune the horizontal positions of the predicted text boxes, obtaining the locations of the numeric-character text lines. Here x_side is the x coordinate of the predicted box edge closest to the actual digital text line in the horizontal direction, and x*_side is the x coordinate of the real text line's edge, computed in advance from the bounding box of the real digital text line and the anchor; c_x^a is the anchor's center x coordinate, and w_a is the anchor width, fixed at 16 pixels. The relative offsets are

o = (x_side − c_x^a) / w_a, o* = (x*_side − c_x^a) / w_a.
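The side-refinement encoding of step S2028 can be sketched as a pair of inverse transforms: the edge coordinate x_side is encoded relative to the anchor center c_x^a in units of the fixed anchor width w_a = 16, and decoded back when refining the predicted box. The function names are illustrative.

```python
# Sketch of the side-refinement offset in step S2028:
#   o = (x_side - c_x^a) / w_a, with w_a fixed at 16 pixels.

W_A = 16.0  # fixed anchor width in pixels, as in the patent

def encode_side(x_side, cx_anchor):
    return (x_side - cx_anchor) / W_A

def decode_side(offset, cx_anchor):
    return cx_anchor + offset * W_A

o = encode_side(x_side=100.0, cx_anchor=96.0)
print(o)                     # (100 - 96) / 16 = 0.25
print(decode_side(o, 96.0))  # recovers 100.0
```

Training regresses the predicted offset o toward the ground-truth offset o*, so the network learns per-anchor horizontal corrections rather than absolute pixel coordinates.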
Step S203: Numeric character recognition.

Numeric character region recognition is the process of recognizing the digital text lines in the sample data. First, the ResNet50 neural network extracts the feature vectors of the digital text-line image; then the bidirectional LSTM (BLSTM) algorithm recognizes the feature vectors, producing a probability distribution for each column of features; finally, the CTC algorithm with the forward-backward algorithm solves for the optimal label sequence, yielding the digital text-line recognition result.

The network structure for numeric character region recognition is shown in Fig. 6. The specific recognition steps are as follows:

Step S2031: Preprocess the digital text-line image; for an image of size M × N, set the scaled height M′ = 16 and scale N by the same ratio as M.

Step S2032: Feed the preprocessed sample data into the ResNet50 neural network for feature extraction to obtain a feature map, then convert the feature map into feature vectors column by column.

Step S2033: Recognize the feature vectors with the bidirectional LSTM algorithm of the RNN network to obtain the label sequence, i.e., the class sequence, of each column of features. The number of output classes of the network must be specified in advance; based on the characteristics of digital instrument readings, the present invention sets the number of classes to 14, the classes including 0-9, +, −, and background.

Step S2034: Solve for the optimal label sequence with the CTC algorithm to obtain the text-line recognition result. The specific steps are:

1) Feed the obtained label sequence y = y_1, y_2, ..., y_T into the CTC algorithm, where T is the sequence length and each y_t represents a probability distribution over the set L′ = L ∪ {blank}, L being the set of all possible labels in the task and blank the blank symbol.

2) Through the sequence-to-sequence mapping function β, which removes repeated labels and then blanks, with π ∈ L′^T (T being the length) and β(π) = l, the conditional probability of a label sequence l is defined as the sum of the conditional probabilities of all paths π that map to it:

p(l | y) = Σ_{π: β(π) = l} p(π | y)

which is computed efficiently with the forward-backward algorithm.

3) For both the lexicon-free and the lexicon-based model, the label sequence with the highest predicted probability is chosen and transcribed.

Lexicon-free transcription: l* ≈ β(argmax_π p(π | y)).

Lexicon-based transcription: l* = argmax_{l ∈ N_δ(l′)} p(l | y), where l′ is the lexicon-free transcription and the candidate set N_δ(l′) is computed efficiently with a BK-tree data structure.

4) Network training: with training data set X = {I_i, l_i}, where I_i is a training digital text-line image and l_i the corresponding actual digital text-line sequence, the objective is to minimize the negative log-likelihood of the actual sequences, i.e.

O = − Σ_{(I_i, l_i) ∈ X} log p(l_i | y_i).
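The forward half of the forward-backward computation in step 2) can be sketched on a toy scale: p(l | y) sums the probabilities of every path π with β(π) = l, computed by dynamic programming over the label extended with interleaved blanks. This unnormalized, un-logged version is an illustration only, not the patent's implementation; `probs[t]` maps each symbol (with "-" as blank) to its probability at column t.

```python
# Minimal CTC forward (alpha) recursion for p(l | y), step S2034 part 2).

def ctc_forward(probs, label, blank="-"):
    ext = [blank]                           # extended label: blanks interleaved
    for ch in label:
        ext += [ch, blank]
    alpha = [probs[0].get(s, 0.0) if i < 2 else 0.0
             for i, s in enumerate(ext)]    # paths may start only in state 0 or 1
    for t in range(1, len(probs)):
        new = [0.0] * len(ext)
        for i, s in enumerate(ext):
            total = alpha[i] + (alpha[i - 1] if i > 0 else 0.0)
            if s != blank and i >= 2 and ext[i - 2] != s:
                total += alpha[i - 2]       # skip over a blank between labels
            new[i] = total * probs[t].get(s, 0.0)
        alpha = new
    return alpha[-1] + alpha[-2]            # end in the last label or final blank

# Two columns, label "a": paths (a,a), (a,-), (-,a) all map to "a".
probs = [{"a": 0.6, "-": 0.4}, {"a": 0.5, "-": 0.5}]
print(ctc_forward(probs, "a"))  # 0.6*0.5 + 0.6*0.5 + 0.4*0.5 = 0.8
```

Summing over all three valid paths by brute force gives the same 0.8, which is what makes the recursion a correct (and efficient) replacement for path enumeration.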
Combining the above three parts, digital instrument region localization, digital text-line localization, and digit character recognition, realizes digital instrument recognition under natural scenes. The digital instrument recognition algorithm of the present invention has strong recognition capability for digital instrument readings under natural scenes and is a digital instrument recognition method suitable for natural conditions.
The digital instrument recognition algorithm of the present invention is trained end to end; it can accept image input of any scene and any size and recognize character strings of arbitrary length. Figs. 7 and 8 show test results of the present invention on digital instruments under natural scenes, from which it can be seen that the present invention can effectively recognize digital instrument readings under natural scenes.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (4)
1. A digital instrument recognition method under a complex natural scene, characterized by comprising the following steps:
Step 1: locating the digital instrument region under the complex natural scene using the SSD algorithm;
Step 2: extracting features using the ResNet50 neural network and training the extracted features with a bidirectional LSTM network to obtain the text-line localization within the digital instrument region;
Step 3: extracting text-line features using the ResNet50 neural network, training the extracted text-line features with a BRNN network, and obtaining the digital instrument recognition result using the CTC algorithm.
2. The digital instrument recognition method under a complex natural scene according to claim 1, characterized in that said Step 1 comprises:
Step 1.1: preprocessing the sample data to obtain preprocessed sample data;
Step 1.2: constructing the SSD network model: on the basic VGG16 network, converting the 6th and 7th fully connected layers into convolutional layers, and adding 3 convolutional layers and one average pooling layer;
Step 1.3: for each feature map after convolution, generating the regressed coordinates and class probabilities of the default boxes using 3 × 3 convolution; the size of each default box is calculated as
s_k = s_min + (s_max − s_min)/(m − 1) · (k − 1), k ∈ [1, m],
where m is the number of feature maps, s_min is the default box size of the bottom layer, and s_max is the default box size of the top layer;
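The default-box size formula in Step 1.3 is the standard SSD linear scale schedule; a minimal sketch follows, with s_min and s_max left as parameters since their concrete values are deployment choices not fixed by the claim.

```python
# Illustrative computation of the SSD default-box scale schedule from Step 1.3.
# s_min and s_max are assumed example values, not values taken from the patent.

def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Return the m linearly spaced scales s_k, k = 1..m, one per feature map."""
    if m == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
```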
Step 1.4: defining the pre-annotated pointer instrument region as the ground truth box and training the SSD network model with the ground truth boxes; performing accurate multi-angle pointer instrument localization using the trained SSD network; the training process is as follows:
the actually chosen default boxes (prior boxes) are matched to the ground truth boxes according to IOU; a prior box with IOU ≥ T1 is a positive sample and the rest are negative samples, the T1 being 0.7; the regression losses of the prior boxes are sorted from high to low, the M prior boxes with the highest regression loss are selected as set D, and the successfully matched positive samples form set P; the positive sample set is then P − D ∩ P and the negative sample set is D − D ∩ P; the ratio of positive samples to negative samples in the positive and negative sample sets is 1:4, i.e. M is 1/4 of the number of prior boxes;
the network parameters are adjusted through the loss function to complete the localization of the pointer instrument;
the loss function is
L(x, c, l, g) = (1/N) (L_conf(x, c) + λ L_loc(x, l, g)),
where c is the class probability, l is the prediction box, and N is the number of prior boxes matched to the ground truth boxes; if N = 0, the loss function is 0; L_conf is the classification loss term; L_loc(x, l, g) is the regression loss term between the prediction box l and the g-th ground truth box; λ is the weight of the regression loss, representing the contribution of the regression loss to the total loss function, and λ takes the value 0.5;
Step 1.5: removing repeated boxes using the NMS algorithm and selecting the digital instrument region.
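The score thresholding and NMS de-duplication used in Step 1.5 (and again in Step 2.6 of claim 3) can be sketched as follows. This is a hypothetical greedy implementation: the (x1, y1, x2, y2) box format and the 0.5 IOU suppression threshold are illustrative assumptions.

```python
# Minimal sketch of score filtering plus greedy non-maximum suppression.
# Boxes are (x1, y1, x2, y2) tuples; thresholds here are assumed examples.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def filter_and_nms(boxes, scores, t2=0.8, iou_thresh=0.5):
    """Drop boxes scoring below t2, then keep the highest-scoring box of each
    overlapping group; returns the indices of the surviving boxes."""
    kept = []
    order = sorted((i for i, s in enumerate(scores) if s >= t2),
                   key=lambda i: scores[i], reverse=True)
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return kept
```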
3. The digital instrument recognition method under a complex natural scene according to claim 1, characterized in that said Step 2 comprises:
Step 2.1: training on the digital instrument sample data with the ResNet50 neural network to obtain a feature map of size W × H × C;
Step 2.2: at each position of the feature map from Step 2.1, taking a 3 × 3 × C window feature for predicting the class information and location information of the k anchor points (anchors) corresponding to that position;
Step 2.3: inputting the 3 × 3 × C features of all windows in each row into the bidirectional LSTM network to obtain an output matrix of size W × 256;
Step 2.4: inputting the W × 256 matrix into a 512-dimensional fully connected layer;
Step 2.5: inputting the features of the fully connected layer into the classification or regression layer to obtain the class information and location information corresponding to the anchors, thereby obtaining multiple fine digital text detection regions;
Step 2.6: according to a thresholding method with a given threshold T2, directly deleting anchors with scores < T2, the scores being the class probability, and de-duplicating the remaining anchor text boxes using the NMS algorithm, the T2 being 0.8;
Step 2.7: merging text lines using the text construction algorithm;
Step 2.8: fine-tuning the horizontal positions of the predicted text boxes using the side-refinement algorithm to obtain the localization of the digital character text lines.
4. The digital instrument recognition method under a complex natural scene according to claim 1, characterized in that said Step 3 comprises:
Step 3.1: preprocessing the digital text-line images; the digital text-line image size is M × N, the scaling ratio of M is set, and N is scaled according to the scaling ratio of M;
Step 3.2: inputting the preprocessed sample data into the ResNet50 neural network for feature extraction to obtain a feature map, and converting the feature map into feature vectors by column;
Step 3.3: recognizing the feature vectors using the bidirectional LSTM algorithm of the BRNN network to obtain the class sequence of each column feature;
Step 3.4: solving for the optimal class sequence using the CTC algorithm to obtain the text-line recognition result.
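Step 3.2's conversion of the feature map into per-column feature vectors (the map-to-sequence step used in CRNN-style recognizers) can be sketched as follows. The channel-first indexing convention is an illustrative assumption.

```python
# Sketch of the map-to-sequence step: a C x H x W feature map becomes W feature
# vectors of length C * H, one per column, read left to right for the BRNN.

def map_to_sequence(feature_map):
    """feature_map: nested list indexed [channel][row][col]."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [[feature_map[c][h][w]
             for c in range(channels) for h in range(height)]
            for w in range(width)]
```

Each column vector then becomes one time step of the bidirectional LSTM in Step 3.3, so the sequence length equals the feature map width W.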
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810500379.0A CN108898131A (en) | 2018-05-23 | 2018-05-23 | It is a kind of complexity natural scene under digital instrument recognition methods |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108898131A true CN108898131A (en) | 2018-11-27 |
Family
ID=64343094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810500379.0A Pending CN108898131A (en) | 2018-05-23 | 2018-05-23 | It is a kind of complexity natural scene under digital instrument recognition methods |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108898131A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103164692A (en) * | 2012-12-03 | 2013-06-19 | 北京科技大学 | Special vehicle instrument automatic identification system and algorithm based on computer vision |
CN106529537A (en) * | 2016-11-22 | 2017-03-22 | 亿嘉和科技股份有限公司 | Digital meter reading image recognition method |
CN108052943A (en) * | 2017-12-29 | 2018-05-18 | 杭州占峰科技有限公司 | A kind of instrument character wheel recognition methods and equipment |
Non-Patent Citations (3)
Title |
---|
BAOGUANG SHI et al.: "An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence * |
WEI LIU et al.: "SSD: Single Shot MultiBox Detector", arXiv * |
ZHI TIAN et al.: "Detecting Text in Natural Image with Connectionist Text Proposal Network", arXiv * |
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109753961A (en) * | 2018-12-26 | 2019-05-14 | 国网新疆电力有限公司乌鲁木齐供电公司 | A kind of substation's spacer units unlocking method and system based on image recognition |
CN109918987B (en) * | 2018-12-29 | 2021-05-14 | 中国电子科技集团公司信息科学研究院 | Video subtitle keyword identification method and device |
CN109918987A (en) * | 2018-12-29 | 2019-06-21 | 中国电子科技集团公司信息科学研究院 | A kind of video caption keyword recognition method and device |
CN113538407B (en) * | 2018-12-29 | 2022-10-14 | 北京市商汤科技开发有限公司 | Anchor point determining method and device, electronic equipment and storage medium |
CN113538407A (en) * | 2018-12-29 | 2021-10-22 | 北京市商汤科技开发有限公司 | Anchor point determining method and device, electronic equipment and storage medium |
CN109858474A (en) * | 2019-01-08 | 2019-06-07 | 北京全路通信信号研究设计院集团有限公司 | A kind of detection of transformer oil surface temperature controller and recognition methods |
CN109858474B (en) * | 2019-01-08 | 2021-10-19 | 北京全路通信信号研究设计院集团有限公司 | Detection and identification method for transformer oil surface temperature controller |
CN109886174A (en) * | 2019-02-13 | 2019-06-14 | 东北大学 | A kind of natural scene character recognition method of warehouse shelf Sign Board Text region |
CN110059539A (en) * | 2019-02-27 | 2019-07-26 | 天津大学 | A kind of natural scene text position detection method based on image segmentation |
CN109948469A (en) * | 2019-03-01 | 2019-06-28 | 吉林大学 | The automatic detection recognition method of crusing robot instrument based on deep learning |
CN110059694A (en) * | 2019-04-19 | 2019-07-26 | 山东大学 | The intelligent identification Method of lteral data under power industry complex scene |
CN110175520A (en) * | 2019-04-22 | 2019-08-27 | 南方电网科学研究院有限责任公司 | Text position detection method, device and the storage medium of robot inspection image |
CN110135431A (en) * | 2019-05-16 | 2019-08-16 | 深圳市信联征信有限公司 | The automatic identifying method and system of business license |
CN111967287A (en) * | 2019-05-20 | 2020-11-20 | 江苏金鑫信息技术有限公司 | Pedestrian detection method based on deep learning |
CN110399882A (en) * | 2019-05-29 | 2019-11-01 | 广东工业大学 | A kind of character detecting method based on deformable convolutional neural networks |
CN110532855A (en) * | 2019-07-12 | 2019-12-03 | 西安电子科技大学 | Natural scene certificate image character recognition method based on deep learning |
CN110532855B (en) * | 2019-07-12 | 2022-03-18 | 西安电子科技大学 | Natural scene certificate image character recognition method based on deep learning |
CN110443159A (en) * | 2019-07-17 | 2019-11-12 | 新华三大数据技术有限公司 | Digit recognition method, device, electronic equipment and storage medium |
CN110929805A (en) * | 2019-12-05 | 2020-03-27 | 上海肇观电子科技有限公司 | Neural network training method, target detection device, circuit and medium |
CN110929805B (en) * | 2019-12-05 | 2023-11-10 | 上海肇观电子科技有限公司 | Training method, target detection method and device for neural network, circuit and medium |
CN111428727A (en) * | 2020-03-27 | 2020-07-17 | 华南理工大学 | Natural scene text recognition method based on sequence transformation correction and attention mechanism |
CN111428727B (en) * | 2020-03-27 | 2023-04-07 | 华南理工大学 | Natural scene text recognition method based on sequence transformation correction and attention mechanism |
CN111553345A (en) * | 2020-04-22 | 2020-08-18 | 上海浩方信息技术有限公司 | Method for realizing meter pointer reading identification processing based on Mask RCNN and orthogonal linear regression |
CN111553345B (en) * | 2020-04-22 | 2023-10-20 | 上海浩方信息技术有限公司 | Method for realizing meter pointer reading identification processing based on Mask RCNN and orthogonal linear regression |
CN116958998A (en) * | 2023-09-20 | 2023-10-27 | 四川泓宝润业工程技术有限公司 | Digital instrument reading identification method based on deep learning |
CN116958998B (en) * | 2023-09-20 | 2023-12-26 | 四川泓宝润业工程技术有限公司 | Digital instrument reading identification method based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108898131A (en) | It is a kind of complexity natural scene under digital instrument recognition methods | |
Xu et al. | Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark | |
Liao et al. | Textboxes: A fast text detector with a single deep neural network | |
CN109271895B (en) | Pedestrian re-identification method based on multi-scale feature learning and feature segmentation | |
CN111126482B (en) | Remote sensing image automatic classification method based on multi-classifier cascade model | |
CN111680706B (en) | Dual-channel output contour detection method based on coding and decoding structure | |
Hoque et al. | Real time bangladeshi sign language detection using faster r-cnn | |
CN114582470B (en) | Model training method and device and medical image report labeling method | |
CN110543906B (en) | Automatic skin recognition method based on Mask R-CNN model | |
Shen et al. | In teacher we trust: Learning compressed models for pedestrian detection | |
CN107430678A (en) | Use the inexpensive face recognition of Gauss received field feature | |
CN102708384B (en) | Bootstrapping weak learning method based on random fern and classifier thereof | |
CN110929640A (en) | Wide remote sensing description generation method based on target detection | |
Tereikovskyi et al. | The method of semantic image segmentation using neural networks | |
CN113158777A (en) | Quality scoring method, quality scoring model training method and related device | |
CN112132257A (en) | Neural network model training method based on pyramid pooling and long-term memory structure | |
CN111144462A (en) | Unknown individual identification method and device for radar signals | |
Sun et al. | Image target detection algorithm compression and pruning based on neural network | |
WO2020199498A1 (en) | Palmar digital vein comparison method and device, computer apparatus, and storage medium | |
CN113762151A (en) | Fault data processing method and system and fault prediction method | |
CN110020638A (en) | Facial expression recognizing method, device, equipment and medium | |
CN114387524B (en) | Image identification method and system for small sample learning based on multilevel second-order representation | |
CN111144466A (en) | Image sample self-adaptive depth measurement learning method | |
CN115424000A (en) | Pointer instrument identification method, system, equipment and storage medium | |
CN115936003A (en) | Software function point duplicate checking method, device, equipment and medium based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20181127 |
|
RJ01 | Rejection of invention patent application after publication |