CN106504233B - Method and system for recognizing small power components in UAV inspection images based on Faster R-CNN - Google Patents

Method and system for recognizing small power components in UAV inspection images based on Faster R-CNN

Info

Publication number
CN106504233B
CN106504233B (application CN201610906708.2A)
Authority
CN
China
Prior art keywords
network
electric power
training
region
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610906708.2A
Other languages
Chinese (zh)
Other versions
CN106504233A (en)
Inventor
蒋斌
王万国
刘越
刘俍
苏建军
慕世友
任志刚
杨波
李超英
傅孟潮
孙晓斌
李宗谕
李建祥
赵金龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Intelligent Technology Co Ltd
Original Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Shandong Luneng Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd, Shandong Luneng Intelligence Technology Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201610906708.2A priority Critical patent/CN106504233B/en
Publication of CN106504233A publication Critical patent/CN106504233A/en
Application granted granted Critical
Publication of CN106504233B publication Critical patent/CN106504233B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30108 Industrial image inspection

Abstract

The invention discloses a Faster R-CNN-based method and system for recognizing small power components in UAV inspection images. The method comprises the following steps: pre-train a ZFnet model and extract feature maps of the UAV inspection images; train the initialized region proposal network (RPN) model to obtain a region-extraction network, use it to generate candidate region boxes on the image feature map, extract the features inside the candidate boxes, and obtain the position features and deep features of the targets; use the position features, deep features and feature map to train the initialized Faster R-CNN detection network and obtain a power-component detection model; finally, perform actual power-component recognition and detection with the power-component detection model. Beneficial effects of the invention: multi-class power-component recognition and localization with Faster R-CNN reaches a recognition speed of roughly 80 ms per image and an accuracy of 92.7%.

Description

Method and system for recognizing small power components in UAV inspection images based on Faster R-CNN
Technical field
The present invention relates to a Faster R-CNN-based method and system for recognizing small power components in UAV inspection images.
Background art
In recent years, with the growing adoption of unmanned aerial vehicles (UAVs), power-line inspection UAVs have attracted wide attention from major grid companies and have been demonstrated, promoted and applied. UAV line patrol offers low field-work risk, low cost and flexible operation; at the same time, it greatly increases the indoor workload, since massive volumes of data must be interpreted manually before a final inspection report can be produced.
At present, power-component recognition still relies on traditional shallow features: carefully engineered descriptors such as SIFT (scale-invariant feature transform), edge detectors and HOG (histogram of oriented gradients), or image segmentation based on component contour skeletons, adaptive thresholds and the like. These methods are designed around particular categories: for single components such as insulators and conductors they can fully exploit appearance patterns, but their accuracy is low and they do not scale to new classes. Their loose structure also prevents low-level features from being combined to reach a globally optimal recognition result.
In contrast, the contour detection and hierarchical image segmentation method and the multiscale combinatorial grouping (MCG) method of Malik's team, and the target-recognition method based on selective search proposed by J. Uijlings, K. van de Sande et al., established the paradigm of globally optimizing multiple low-level features and building hierarchical models, which improved accuracy; however, these methods still cannot raise recognition accuracy as the number of samples grows.
A Chinese invention patent (application No. 201510907472.X, entitled "A transmission-line small-part recognition method") can recognize and locate conductor spacers and vibration dampers from their spatial relation to the conductor, but its recognition efficiency and performance under complex backgrounds are poor and cannot meet field requirements.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing a Faster R-CNN-based method and system for recognizing small power components in UAV inspection images. DPM, SPPnet and Faster R-CNN are compared, and the three methods are tested and verified on a dataset built from actually acquired power-component inspection data. The experiments show that deep-learning-based recognition of power components in inspection images is feasible, and that multi-class power-component recognition and localization with Faster R-CNN reaches a recognition speed of roughly 80 ms per image and an accuracy of 92.7%. RCNN (Region-based Convolutional Neural Network) denotes the convolutional neural network based on region proposals.
To achieve the above goals, the present invention adopts the following technical scheme:
The Faster R-CNN-based method for recognizing small power components in UAV inspection images comprises the following steps:
Step (1): pre-train a ZFnet model and extract feature maps of the UAV inspection images; initialize the region proposal network (RPN) model and the Faster R-CNN detection network;
Step (2): train the initialized RPN model to obtain a region-extraction network; use it to generate candidate region boxes on the image feature map, extract the features inside the candidate boxes, and obtain the position features and deep features of the targets;
Step (3): use the position features, deep features and feature map to train the initialized Faster R-CNN detection network and obtain a power-component detection model;
Step (4): perform actual power-component recognition and detection with the power-component detection model.
The ZFnet model is pre-trained on the ImageNet classification dataset to obtain the pre-trained ZFnet model.
The ZFnet model used for the pre-training is structured as follows:
the ZFnet model is an 8-layer network comprising 5 convolutional layers and 3 fully connected layers; counting from the input, a Max Pooling operation follows the first, second and fifth convolutional layers.
Pre-training the ZFnet model on the ImageNet classification dataset proceeds as follows:
Counting from the input of the ZFnet model,
the first convolutional layer convolves the input ImageNet classification images and is followed by the first Max Pooling operation;
the second convolutional layer convolves the result of the first Max Pooling operation and is followed by the second Max Pooling operation;
the third convolutional layer convolves the result of the second Max Pooling operation;
the fourth convolutional layer convolves the output of the third convolutional layer;
the fifth convolutional layer convolves the output of the fourth convolutional layer and is followed by the third Max Pooling operation;
the third Max Pooling operation yields 256 output channels, which form the feature map (Feature Map).
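A minimal sketch of this feature extractor is given below, written with PyTorch modules purely for illustration. The layer ordering and the 256-channel output follow the text; the specific kernel sizes, strides and paddings follow the published ZFnet configuration and are assumptions here.

```python
import torch
import torch.nn as nn

# Sketch of the ZFnet-style feature extractor described above.
# Layer ordering and 256-channel output follow the text; kernel sizes,
# strides and paddings are assumptions taken from the published ZFnet.
zfnet_features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=7, stride=2, padding=1),    # conv1
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                   # pool after conv1
    nn.Conv2d(96, 256, kernel_size=5, stride=2),              # conv2
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                   # pool after conv2
    nn.Conv2d(256, 384, kernel_size=3, padding=1),             # conv3
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),             # conv4
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),             # conv5: 256 channels
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),                    # pool after conv5
)

feature_map = zfnet_features(torch.randn(1, 3, 500, 500))      # -> (1, 256, H', W')
```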
The RPN model and the Faster R-CNN detection network are initialized as follows:
set the initialization information of the RPN extraction module, such as the sliding stride and the sliding-window size;
set the number of output units of the detection network according to the number of classes in the training dataset, and initialize the weight parameters of the detection network.
In step (2), a power-component image set is built from power-component images; when the initialized RPN is trained with this image set, the initialized region proposal network is fine-tuned with the back-propagation algorithm.
The RPN takes an image of arbitrary size as input and outputs a number of candidate region boxes that may contain candidate targets.
An RPN convolutional layer is added behind the fifth convolutional layer of the ZFnet model, and the convolution is applied in a sliding manner: for each manually labelled target position on the ZFnet feature map, a window is opened and convolved by the RPN, yielding a 256-dimensional feature vector for each position; this vector reflects the deep features inside the window at that position.
The RPN convolutional layer uses 9 convolution kernels, formed by combining 3 sizes and 3 aspect ratios, to extract targets at the positions of the windows containing candidate targets, obtaining the position features of the targets; the position features are passed as input training data to step (3).
From the 256-dimensional feature vector, the following can be predicted:
1) the probability that the window at the position corresponding to the 256-dimensional feature vector belongs to the foreground or background;
2) the offset of a nearby window containing a candidate target relative to the window at the position corresponding to the 256-dimensional feature vector.
The 3 aspect ratios are 1:1, 1:2 and 2:1.
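The following sketch, given only for illustration, generates the 9 reference windows (3 sizes combined with the 3 aspect ratios 1:1, 1:2 and 2:1) at every sliding-window position of the feature map. The base sizes of 128, 256 and 512 pixels and the stride of 16 are assumptions; the text fixes only the 3-by-3 combination.

```python
import numpy as np

def generate_anchors(base_sizes=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return the 9 reference windows (3 sizes x 3 aspect ratios), centred at
    the origin, as (x1, y1, x2, y2). Base sizes are assumed values."""
    anchors = []
    for s in base_sizes:
        for r in ratios:
            w = s * np.sqrt(r)
            h = s / np.sqrt(r)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

def shift_anchors(anchors, feat_h, feat_w, stride=16):
    """Replicate the 9 base windows at every sliding position of the conv5
    feature map; `stride` is the assumed image-to-feature down-sampling."""
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    shifts = np.stack([cx.ravel(), cy.ravel(), cx.ravel(), cy.ravel()], axis=1)
    return (anchors[None, :, :] + shifts[:, None, :]).reshape(-1, 4)
```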
Step (3) proceeds as follows:
Step (3-1): the deep features produced by the fifth convolutional layer are fully connected into a high-dimensional feature vector that describes the image as a whole and serves as the input of the sixth, fully connected layer FC6.
The position features produced by the RPN convolutional layer are also fed into the fully connected layer FC6;
data are exchanged between the fully connected layer FC6 and the seventh, prediction layer FC7 through full connections;
the prediction layer FC7 comprises a classification module and a regression module;
the classification module judges the type of the feature, and the regression module precisely locates the target position.
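A minimal sketch of the FC6/FC7 prediction head described in step (3-1) follows, written with PyTorch modules for illustration. The 4096-dimensional fully connected layers and the K+1-way classification module follow the text; the flattened input size (a 6x6 pooled region of the 256-channel feature map) and the per-class regression output are assumptions.

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    """Sketch of the FC6/FC7 head: pooled conv5 features are fully connected
    into a 4096-d vector (FC6), passed through a second 4096-d layer (FC7),
    then split into a classification module over K+1 classes and a regression
    module producing 4 box offsets per class. Input size is an assumption."""
    def __init__(self, roi_feat_dim=256 * 6 * 6, num_classes=3):
        super().__init__()
        self.fc6 = nn.Linear(roi_feat_dim, 4096)
        self.fc7 = nn.Linear(4096, 4096)
        self.cls_score = nn.Linear(4096, num_classes + 1)         # K+1 incl. background
        self.bbox_pred = nn.Linear(4096, 4 * (num_classes + 1))   # per-class box offsets
        self.relu = nn.ReLU(inplace=True)

    def forward(self, roi_feats):
        x = self.relu(self.fc6(roi_feats.flatten(start_dim=1)))
        x = self.relu(self.fc7(x))
        return self.cls_score(x), self.bbox_pred(x)
```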
Step (3-2): compute the overall loss function of the network and optimize the parameters of every layer of the network according to the loss function.
Loss function L(p, k*, t, t*):
The training dataset is divided into K+1 classes; k* denotes the correct class label, p = (p0, ..., pK) denotes the predicted probability of each class, and t = (t_x, t_y, t_w, t_h) denotes the box information computed by the regression module: the horizontal coordinate, vertical coordinate, width and height of the marked box, with t* the corresponding ground-truth box.
The loss is computed as
L(p, k*, t, t*) = L_cls(p, k*) + λ [k* ≥ 1] L_loc(t, t*),
where L_cls(p, k*) = -log p_{k*} is the classification loss, L_loc(t, t*) is a smooth-L1 loss summed over the four box coordinates and applied only when k* is not the background class, and λ balances the two terms.
The detection network is fine-tuned according to the labels annotated in the training set, and the network parameters are alternately optimized by stochastic gradient descent (SGD).
When the classification-module parameters are optimized, the regression-module parameters are held fixed; when the regression-module parameters are optimized, the classification-module parameters are held fixed.
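A sketch of the overall loss of step (3-2), as written out above, is given below for illustration: a cross-entropy classification term plus a smooth-L1 box-regression term applied only to foreground samples. Using a single shared 4-value box output (instead of one per class) is a simplification of this sketch.

```python
import torch
import torch.nn.functional as F

def multitask_loss(cls_logits, bbox_pred, labels, bbox_targets, lam=1.0):
    """Sketch of L(p, k*, t, t*) = L_cls(p, k*) + lam * [k* >= 1] * L_loc(t, t*).
    cls_logits: (N, K+1); bbox_pred, bbox_targets: (N, 4); labels: (N,) with
    0 meaning background. Smooth-L1 form follows Fast R-CNN."""
    l_cls = F.cross_entropy(cls_logits, labels)
    fg = labels > 0                                     # box loss only for foreground
    if fg.any():
        l_loc = F.smooth_l1_loss(bbox_pred[fg], bbox_targets[fg])
    else:
        l_loc = bbox_pred.sum() * 0.0                   # keep the graph connected
    return l_cls + lam * l_loc
```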
The high-dimensional feature vector is a 4096-dimensional feature vector.
From the prediction layer FC7, the following can be predicted:
1) the probability that a region box containing a candidate target belongs to each class;
2) the precise parameter set of the bounding box around the target object, comprising 2 translations of the horizontal and vertical coordinates of the target position in the original input image and 2 scaling factors that enlarge or shrink the box along the horizontal and vertical axes. With this set of 4 parameters, the target is precisely calibrated in the original image.
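The following sketch illustrates how the 2 translations and 2 scaling factors refine a candidate box into its calibrated position in the original image. The log-space parameterization of the scalings follows the usual R-CNN box regression and is an assumption here.

```python
import numpy as np

def decode_box(box, deltas):
    """Apply 2 translations (dx, dy) and 2 scalings (dw, dh) to a candidate
    box (x1, y1, x2, y2), returning the refined box in image coordinates."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = deltas
    cx, cy = cx + dx * w, cy + dy * h              # translate the centre
    w, h = w * np.exp(dw), h * np.exp(dh)          # scale width and height
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])
```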
The RPN and the Faster R-CNN detection network finally share their convolutional layers and together form the power-component detection model.
Step (4) proceeds as follows:
Step (4-1): initialize the parameters of the power-component detection model obtained by training;
Step (4-2): pass the input image through the power-component detection model to obtain the probability value and location information of every class contained in the image;
Step (4-3): according to the location information, mark the specific target locations on the original image.
The small power components include conductor spacers, grading rings and vibration (Stockbridge) dampers.
RPN: Region Proposal Network.
The Faster R-CNN-based system for recognizing small power components in UAV inspection images comprises:
a pre-training module, which pre-trains the ZFnet model, extracts feature maps of the UAV inspection images, and initializes the RPN model and the Faster R-CNN detection network;
a feature-extraction module, which trains the initialized RPN model to obtain a region-extraction network, uses it to generate candidate region boxes on the image feature map, extracts the features inside the candidate boxes, and obtains the position features and deep features of the targets;
a model-training module, which uses the position features, deep features and feature map to train the initialized Faster R-CNN detection network and obtain a power-component detection model;
a detection module, which performs actual power-component recognition and detection with the power-component detection model.
Beneficial effects of the present invention:
1. Deep learning algorithms such as Faster R-CNN achieve high accuracy and efficiency in power-component recognition. The experiments show that, with a dedicated GPU computing unit, the statistics-based deep learning method can detect and recognize targets in inspection video or images in real time, laying a foundation for the intelligent processing of UAV inspection images and for accurate shooting by inspection UAVs.
2. Compared with SPPnet and Fast R-CNN, the Faster R-CNN method both breaks the time bottleneck of region-proposal computation and guarantees a satisfactory recognition rate.
3. The region proposal network and detection network used by the present method generalize well: they recognize partially occluded spacers and spacers crossed by tower structures, and correctly identify components in various orientations.
Brief description of the drawings
Fig. 1 is a schematic diagram of the network structure used by the present invention;
Fig. 2 shows the joint network training process for component recognition;
Fig. 3 is a schematic diagram of the network training process;
Fig. 4 shows the detection and recognition process;
Fig. 5(a) is a raw data image;
Fig. 5(b) is a training sample image;
Fig. 6(a) shows detection results for conductor spacers and vibration dampers;
Fig. 6(b) shows detection results for grading rings and conductor spacers.
Detailed description of embodiments
The invention is further described below with reference to the accompanying drawings and embodiments.
Since 2012, with the development of high-performance GPU parallel computing, deep learning has made remarkable progress, surpassing conventional methods based on shallow features and linear classifiers and becoming the leader in the field of target recognition. The PASCAL (pattern analysis, statistical modelling and computational learning) and ILSVRC (ImageNet Large Scale Visual Recognition Challenge) competitions, as sample-database benchmarks for evaluating general recognition algorithms, have witnessed the breakthrough and steady progress of deep learning methods. Through research on deep learning for recognition, and targeting the characteristics of the power-component recognition problem and its data, the present invention builds a recognition and test sample library for three classes of power components, studies DPM (Deformable Part Models), the RCNN-based SPPnet (spatial pyramid pooling networks) and Faster R-CNN, tests and verifies them on the power-component recognition task, analyses the strengths and weaknesses of each algorithm, and provides an algorithm suited to power-component recognition, laying a foundation for the intelligent processing of UAV inspection images and accurate UAV image capture.
1 The classical DPM method and RCNN
Recognition involves two key issues: determining the target position and discriminating the target category; the two are interrelated. According to how the target position is determined, recognition methods fall into two classes: one uses a sliding window and judges window by window whether a target object is present; the other uses region proposals, first generating a set of region boxes that may contain target objects and then judging one by one whether each candidate box contains a target. The typical algorithm of the sliding-window approach is the deformable part model (DPM); the typical algorithm of the region-proposal approach is the region-based convolutional neural network (RCNN).
1.1 The deformable part model DPM
The deformable part model (DPM) is a classical target-recognition algorithm proposed by P. Felzenszwalb. In the detection stage, DPM operates as a sliding window on an image feature pyramid, usually built from HOG features. DPM assigns each sliding window a score by optimizing a scoring function that combines a part-deformation cost and an image-matching score. The best results of the PASCAL recognition challenges from 2007 to 2011 were all obtained by DPM and its variants. However, DPM has intrinsic defects: 1) the sliding-window approach is in effect an exhaustive search, which leaves DPM without scale extensibility, and as image resolution rises the computation of DPM grows geometrically; 2) experiments show that as the number of training samples increases, the recognition accuracy of DPM saturates, so it cannot fully exploit the convenience of UAV imagery to improve recognition accuracy.
1.2 The region-proposal-based convolutional neural network RCNN
The region-based convolutional neural network (RCNN) proposed by Ross et al. in 2014 greatly improved the accuracy of the PASCAL recognition challenge: taking PASCAL VOC2012 as an example, accuracy rose from 41% (DPM++) to 53.3% (RCNN), making it the typical scheme for recognition based on region proposals. Its detection stage has the following 4 steps:
1) first, generate a large number of candidate regions with a visual method such as Selective Search;
2) second, represent each candidate region with a convolutional neural network (CNN), forming a high-dimensional feature vector;
3) then, feed these feature vectors into a linear classifier to compute category scores, used to judge the contained object;
4) finally, finely regress the position and size of the bounding box around the target.
Compared with the exhaustive sliding-window search of DPM, the region proposals of the first step come from selective search; using the top 2000 scoring regions effectively reduces the subsequent feature-extraction computation and copes well with scale. The CNN uses a graphics processing unit (GPU) for parallel computation, so its efficiency is far better than DPM (implemented on a single CPU); bounding-box regression further improves localization accuracy. The training stage of RCNN also has 4 steps:
1) first, generate candidate regions for every picture in the training set with selective search, and extract features for each candidate region with a CNN; the CNN used here is a pre-trained ImageNet network (trained on the millions of images of the 1000-class ILSVRC classification challenge);
2) second, fine-tune the ImageNet network with the candidate regions and the extracted features; the tuning follows the standard back-propagation algorithm, adjusting the weights of each layer backwards from the feature layer;
3) then, train a support vector machine with the high-dimensional feature vectors output by the feature layer and the target category labels as input;
4) finally, train a regressor that finely adjusts the position and size of the target bounding box.
RCNN far exceeds DPM in both accuracy and efficiency and has become the typical deep-learning recognition scheme. In 2014 and 2015, Ross and researchers at Microsoft Research Asia successively proposed improved RCNN methods, including SPPnet, which first introduced a spatial pyramid pooling layer to relax the constraint on input image size and improve accuracy; Fast R-CNN, which uses adaptive-scale pooling so that the whole network can be fine-tuned, improving the accuracy of deep-network recognition; and finally Faster R-CNN, which builds an elaborate region proposal network to replace the time-consuming selective search, breaking the bottleneck of region-proposal time overhead and making real-time recognition based on visual features possible. The present invention mainly describes recognizing power components with the Faster R-CNN method.
2 Power-component recognition and localization based on Faster R-CNN
Compared with SPPnet and Fast R-CNN, the Faster R-CNN method both breaks the time bottleneck of region-proposal computation and guarantees a satisfactory recognition rate. Therefore, the present invention extracts recognition features of power components and verifies target recognition based on the Faster R-CNN method.
The training of the networks and the detection of test samples are carried out with the open-source Caffe CNN library. Caffe is a clear and efficient deep-learning framework with excellent readability, conciseness and performance, and it directly integrates convolutional neural network layers. Because of the nature of deep convolutional networks, accelerating the computation with a GPU greatly shortens training time, and Caffe provides the corresponding interfaces.
As shown in Fig. 1, the ZFnet model is an 8-layer network comprising 5 convolutional layers and 3 fully connected layers; counting from the input, a Max Pooling operation follows the first, second and fifth convolutional layers.
Pre-training the ZFnet model on the ImageNet classification dataset proceeds as follows:
Counting from the input of the ZFnet model,
the first convolutional layer convolves the input ImageNet classification images and is followed by the first Max Pooling operation;
the second convolutional layer convolves the result of the first Max Pooling operation and is followed by the second Max Pooling operation;
the third convolutional layer convolves the result of the second Max Pooling operation;
the fourth convolutional layer convolves the output of the third convolutional layer;
the fifth convolutional layer convolves the output of the fourth convolutional layer and is followed by the third Max Pooling operation;
the third Max Pooling operation yields 256 output channels, which form the feature map (Feature Map).
The deep features produced by the fifth convolutional layer are fully connected into a high-dimensional feature vector that describes the image as a whole and serves as the input of the sixth, fully connected layer FC6.
The position features produced by the RPN convolutional layer are also fed into the fully connected layer FC6;
data are exchanged between the fully connected layer FC6 and the seventh, prediction layer FC7 through full connections;
the prediction layer FC7 comprises a classification module and a regression module;
the classification module judges the type of the feature, and the regression module precisely locates the target position.
2.1 Training the networks for power-component recognition
The Faster R-CNN method comprises two CNN networks: the region proposal network (RPN) and the Fast R-CNN detection network. The main steps of the training stage are shown in Fig. 2; Fig. 3 shows the joint training of the RPN and the detection network.
(1) Pre-training the CNN model
Both the RPN and the detection network must be initialized with a pre-trained ImageNet network; either the ZFnet network with its 5-convolutional-layer structure (Zeiler and Fergus) or the VGG16 network with its 16-layer structure (Simonyan and Zisserman) can be chosen. Because the dataset built in the present invention is relatively small, the ZFnet network is selected.
The ZFnet model is pre-trained on the training data of the ILSVRC2012 image classification task (1.2 million images, 1000 classes). Both the region proposal network and the detection network are obtained by adding specific layers on top of ZFnet. ZFnet once achieved high classification accuracy on the ILSVRC competition dataset. ZFnet comprises 5 convolutional layers, some of which are followed by Max Pooling layers, and 3 fully connected feature layers.
The last convolutional layer of ZFnet, i.e. the fifth convolutional layer, has 256 channels and is called the feature map (Feature Map). The feature map can be intuitively interpreted as deep convolutional features of the original image: deep features of similar objects are very close, while deep features of dissimilar objects differ greatly, i.e. objects are well separable on the feature map. This is exactly where the power of deep neural networks lies: excellent feature learning and representation ability.
The present invention initializes the region proposal network and the detection network with the ZFnet network trained on the ILSVRC classification dataset (1.2 million pictures, 1000 classes). Experiments show that initializing the recognition network with a classification network trained on a sample library with more data and more classes (compared with one of less data and fewer classes) yields higher accuracy.
(2) RPN training
An image training set is built from power-component images, but the power-component image set differs greatly from the pre-training image set in class count, image quantity and image style. When the RPN is trained with the power-component image set, the ZFnet model pre-trained in the previous step directly initializes the RPN, and the region proposal network is fine-tuned with the back-propagation algorithm.
The RPN takes an image of arbitrary size as input and outputs a series of region boxes that may contain targets. As shown in Fig. 3, a small convolutional layer is added behind CONV5 of ZFnet and operates in a sliding manner: for each position on the feature map (corresponding to a position on the original image), the small convolutional layer opens a small window at that position and performs a convolution, yielding a 256-dimensional vector for that position (since there are 256 channels). This vector reflects the deep features inside the small window at that position (a window on the original image). From this 256-dimensional feature vector one can predict: 1) the probability (score) that the small window belongs to the foreground or background; 2) the offset of a nearby window containing a target relative to the small window, expressed with 4 parameters: 2 translations and 2 scalings.
Experimental analysis shows that predicting the positions of windows containing targets with the 9 reference windows formed by combining 3 sizes and 3 aspect ratios (1:1, 1:2, 2:1) makes the region proposals more accurate.
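A minimal sketch of this small convolutional layer follows, for illustration only: a 3x3 sliding convolution produces the 256-dimensional vector per position, and two further convolutions output, for each of the 9 reference windows, the foreground/background score and the 4 position parameters. Realizing the two predictors as 1x1 convolutions is an assumption of this sketch.

```python
import torch.nn as nn

class RPNHead(nn.Module):
    """Sketch of the region-proposal layers added behind CONV5: a 3x3 sliding
    convolution yields a 256-d vector per position; two 1x1 convolutions then
    predict, for each of the 9 reference windows, a foreground/background
    score (2 values) and 4 position offsets (2 shifts + 2 scales)."""
    def __init__(self, in_channels=256, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(256, num_anchors * 2, kernel_size=1)   # fg/bg per window
        self.reg = nn.Conv2d(256, num_anchors * 4, kernel_size=1)   # box offsets
        self.relu = nn.ReLU(inplace=True)

    def forward(self, conv5_feat):
        x = self.relu(self.conv(conv5_feat))
        return self.cls(x), self.reg(x)
```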
(3) Fast R-CNN detection-network training
The Fast R-CNN method trains an independent detection network based on the region proposals generated in the previous step; the detection network is likewise initialized with the pre-trained ZFnet model.
Features are extracted from the input image by the 5 convolutional layers. The fifth-layer feature map (CONV5) is a 256-channel feature map; the deep features on CONV5 corresponding to each proposal are taken out, and all features of the 256 channels are connected into a high-dimensional (4096-d) feature vector called the FC6 feature layer; another 4096-d feature layer, FC7, is added behind it, with FC6 and FC7 fully connected. From the FC7 feature layer one can predict: 1) the probability (score) that the candidate region box belongs to each class; 2) a more suitable position of the bounding box around the target object, expressed with 4 parameters: 2 translations and 2 scalings relative to the candidate region box. The detection network is fine-tuned with the pre-annotated information using the back-propagation algorithm. The data used to train ZFnet are not power-component data, so the pre-trained model does not fit power components; the parameters are therefore tuned with the power-component data. Back-propagation is the process of completing this parameter tuning with stochastic gradient descent, adjusting the weight parameters of the network.
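A hedged sketch of pulling the CONV5 features inside each proposal into a fixed-size block before flattening them into the FC6 vector is given below, using torchvision's roi_pool for illustration; the 6x6 output size and the 1/16 image-to-CONV5 scale are assumptions, since the text only states that the 256-channel features inside the proposal are connected into a 4096-dimensional vector.

```python
import torch
from torchvision.ops import roi_pool

def extract_roi_features(conv5_feat, proposals):
    """conv5_feat: (1, 256, H', W') feature map; proposals: (N, 4) boxes in
    image coordinates (x1, y1, x2, y2). Output size and spatial scale are
    assumed values for illustration."""
    boxes = [proposals]                                   # one image in the batch
    pooled = roi_pool(conv5_feat, boxes, output_size=(6, 6), spatial_scale=1.0 / 16)
    return pooled.flatten(start_dim=1)                    # (N, 256*6*6) -> FC6 input
```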
(4) Sharing the convolutional layers of the two networks and joint tuning
Training the two networks separately does not yet share the parameters of the convolutional layers.
The RPN is initialized with the detection network trained in the third step, and the shared deep convolutional layers are fixed, i.e. the same input data are used when optimizing the RPN and the classification module. As shown in Fig. 3, only the RPN-specific part is tuned; to correspond with the detection network, this part is called the FC layer of the RPN. The two networks now share the deep convolutional layers;
finally, with the shared convolutional layers fixed, the FC layers of Fast R-CNN are tuned. The two networks thus share the convolutional layers and form one unified network.
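The mechanical core of these two tuning steps, holding the shared convolutional layers fixed while only the remaining layers receive gradient updates, can be sketched as follows; the module and optimizer choices here are illustrative, not the patent's implementation.

```python
import torch

def freeze_shared_and_tune(shared_conv, head, lr=1e-3):
    """Freeze the shared convolutional layers and return an SGD optimizer
    that updates only the network-specific (FC) layers of `head`."""
    for p in shared_conv.parameters():
        p.requires_grad = False                          # shared convs stay fixed
    trainable = [p for p in head.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=lr, momentum=0.9)
```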
2.2 Detection and recognition process
From the training above, the two networks finally share the same 5 convolutional layers, so the whole detection process only needs one pass of serial convolutions to complete detection and recognition, thoroughly solving the time-overhead bottleneck of the original region-proposal step.
The detection and recognition process is shown in Fig. 4 and proceeds as follows:
1) first, perform the serial convolutions on the whole image to obtain the feature map CONV5;
2) generate a large number of candidate region boxes on the feature map with the region proposal network;
3) apply non-maximum suppression to the candidate region boxes and keep the top 300 highest-scoring boxes;
4) take the features inside the candidate region boxes on the feature map to form high-dimensional feature vectors, compute class scores with the detection network, and predict more suitable bounding-box positions.
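A minimal sketch of this detection flow follows, for illustration; backbone, rpn_head, detect_head and decode_fn stand for hypothetical wrappers around the pieces described above, and the 0.7 IoU threshold for non-maximum suppression is an assumption (the text fixes only the top-300 cut-off).

```python
import torch
from torchvision.ops import nms

def propose_and_detect(image_tensor, backbone, rpn_head, detect_head,
                       decode_fn, top_n=300, iou_thresh=0.7):
    """Sketch of the detection flow above. The callables are hypothetical
    wrappers: rpn_head returns flattened per-window scores, offsets and
    reference boxes; detect_head returns class scores and refined boxes."""
    conv5 = backbone(image_tensor)                     # 1) shared convolutions
    scores, deltas, anchors = rpn_head(conv5)          # 2) candidate boxes
    boxes = decode_fn(anchors, deltas)                 #    apply offsets to windows
    keep = nms(boxes, scores, iou_thresh)[:top_n]      # 3) NMS, keep top 300
    return detect_head(conv5, boxes[keep])             # 4) scores + refined boxes
```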
3 Comparison of experimental results
UAV images have high resolution, small targets, and diverse and somewhat random shooting angles, which makes them a sample library with sufficient tolerance for deep learning methods. In the experiments, three classes of small power components are recognized: conductor spacers, vibration dampers and grading rings.
3.1 Processing of the training samples
The dataset comes from multi-rotor UAV and helicopter inspection images and covers the four seasons of spring, summer, autumn and winter. The raw images are 5184 x 3456 pixels (Fig. 5(a)); square blocks centred on the targets are cropped and uniformly scaled to 500 x 500 (Fig. 5(b)) as training samples. This keeps the ratio of target size to sample size close to that of the samples in the PASCAL recognition challenge.
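A sketch of this sample preparation, cropping a target-centred square block from the raw image and rescaling it to 500 x 500, is given below for illustration; the amount of context kept around the target box is an assumption.

```python
from PIL import Image

def make_training_patch(image_path, box, out_size=500):
    """Crop a square block centred on the labelled target and rescale it to
    out_size x out_size. box = (x1, y1, x2, y2) in pixel coordinates; the
    2x context margin around the target is an assumed value."""
    img = Image.open(image_path)
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    side = max(x2 - x1, y2 - y1) * 2             # assumed context margin
    left, top = max(0, cx - side / 2), max(0, cy - side / 2)
    patch = img.crop((left, top, left + side, top + side))
    return patch.resize((out_size, out_size))
```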
3.2 Construction of the training and test sets
This test uses 1500 training samples for each of the three base components (conductor spacers, grading rings and vibration dampers), i.e. 4500 samples form the training set; 500 test images per class, i.e. 1500 images in total, form the test set. In every training-set picture, only the completely visible small power components are annotated with bounding boxes (incomplete or occluded components in training-set pictures are not annotated); in the test set, all power components appearing in every picture are annotated, including incomplete and occluded ones.
During testing, a detection counts as a successful recognition when the overlap between the detected bounding box and a labelled bounding box reaches 90% or more of the labelled box. Recognition quality is judged by precision and recall, where precision is the number of output boxes with the correct target category divided by the total number of output boxes, and recall is the number of correctly recognized boxes divided by the total number of ground-truth boxes. Since this recognition task has only three categories, precision and recall are counted separately for each class of power component.
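The precision and recall defined above can be computed as in the following sketch; crediting each labelled box at most once is an assumption of this sketch.

```python
def box_intersection(a, b):
    """Area of intersection of two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def precision_recall(detections, labels, thresh=0.9):
    """detections / labels: lists of (class_id, (x1, y1, x2, y2)).
    A detection is correct when its intersection with a same-class labelled
    box covers at least `thresh` of that labelled box's area."""
    matched = [False] * len(labels)
    correct = 0
    for det_cls, det_box in detections:
        for i, (gt_cls, gt_box) in enumerate(labels):
            if matched[i] or det_cls != gt_cls:
                continue
            gt_area = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
            if box_intersection(det_box, gt_box) >= thresh * gt_area:
                matched[i] = True
                correct += 1
                break
    precision = correct / len(detections) if detections else 0.0
    recall = correct / len(labels) if labels else 0.0
    return precision, recall
```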
3.3 Experimental results
The convolutional neural network models are implemented with the Caffe framework developed by the Berkeley Vision and Learning Center. With the training and test sets built in Section 3.2, the Faster R-CNN method is compared with the SPPnet method (which uses Selective Search region proposals) and the DPM method; the test results are shown in Table 1.
Table 1. Comparison of the recognition accuracy of the present method and SPPnet on the test set
As can be seen from Table 1, the recognition accuracy of the Faster R-CNN method is clearly higher than that of SPPnet and DPM, and the DPM method has the lowest accuracy. This is mainly because the region proposal network produces more accurate candidate boxes than SPPnet, while DPM detects with a sliding window over HOG features rather than deep learned features. In addition, step 2 of the present method's network training tunes the weights of all feature layers and convolutional layers, whereas SPPnet only tunes the feature layers, which limits its recognition accuracy. It is worth noting that the region proposal network and detection network used by the present method generalize well: partially occluded spacers and spacers crossed by tower structures are recognized, and components in various orientations are all correctly identified, whereas the other two methods are slightly inferior. Fig. 6(a) and Fig. 6(b) show the recognition results of the Faster R-CNN method on the three kinds of power components.
All tests of the present invention were run on the same server with 5184 x 3456 test images. The DPM method runs on a CPU, while the Faster R-CNN and SPPnet methods perform the convolutions on an Nvidia Titan Black GPU (6 GB of memory); the recognition process consumes 3 GB of GPU memory, and the non-maximum suppression of the present method is also implemented on the GPU. As can be seen from Table 2, the running time of DPM is on the order of minutes and cannot be compared with the other two methods in time efficiency; for a typical RCNN-style method such as SPPnet, region proposal occupies most of the computation time; in the present method, because the convolutional features are shared (the specific layers of the region proposal network and the detection network are both added behind the shared feature map CONV5), the region-proposal time is almost negligible and detection completes in roughly 80 ms.
The test results show that, with a dedicated graphics accelerator card, deep learning methods can achieve real-time detection on inspection images.
Table 2. Comparison of the computation time of the present method, SPPnet and DPM
Although the specific embodiments of the present invention are described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative labour still fall within the scope of protection of the present invention.

Claims (4)

1. A Faster R-CNN-based method for recognizing small power components in UAV inspection images, characterized by comprising the following steps:
step (1): pre-train a ZFnet model and extract feature maps of the UAV inspection images; initialize a region proposal network (RPN) model and a Faster R-CNN detection network; the ZFnet model is pre-trained on the ImageNet classification dataset to obtain the pre-trained ZFnet model;
the ZFnet model used for the pre-training is structured as follows:
the ZFnet model is an 8-layer network comprising 5 convolutional layers and 3 fully connected layers; counting from the input, a Max Pooling operation follows the first, second and fifth convolutional layers;
pre-training the ZFnet model on the ImageNet classification dataset proceeds as follows:
counting from the input of the ZFnet model,
the first convolutional layer convolves the input ImageNet classification images and is followed by the first Max Pooling operation;
the second convolutional layer convolves the result of the first Max Pooling operation and is followed by the second Max Pooling operation;
the third convolutional layer convolves the result of the second Max Pooling operation;
the fourth convolutional layer convolves the output of the third convolutional layer;
the fifth convolutional layer convolves the output of the fourth convolutional layer and is followed by the third Max Pooling operation;
the third Max Pooling operation yields 256 output channels, which form the feature map;
step (2): train the initialized RPN model to obtain a region-extraction network; use it to generate candidate region boxes on the image feature map, extract the features inside the candidate boxes, and obtain the position features and deep features of the targets;
in step (2), a power-component image set is built from power-component images; when the initialized RPN is trained with this image set, the initialized region proposal network is fine-tuned with the back-propagation algorithm;
the RPN takes an image of arbitrary size as input and outputs a number of candidate region boxes that may contain candidate targets;
an RPN convolutional layer is added behind the fifth convolutional layer of the ZFnet model and the convolution is applied in a sliding manner; for each manually labelled target position on the ZFnet feature map, a window is opened and convolved by the RPN, yielding a 256-dimensional feature vector for each position, which reflects the deep features inside the window at that position;
the RPN convolutional layer uses 9 convolution kernels, formed by combining 3 sizes and 3 aspect ratios, to extract targets at the positions of the windows containing candidate targets, obtaining the position features of the targets; the position features are passed as input training data to step (3);
step (3): use the position features, deep features and feature map to train the initialized Faster R-CNN detection network and obtain a power-component detection model;
step (3) proceeds as follows:
step (3-1): the deep features produced by the fifth convolutional layer are fully connected into a high-dimensional feature vector that describes the image as a whole and serves as the input of the sixth, fully connected layer FC6;
the position features produced by the RPN convolutional layer are also fed into the fully connected layer FC6;
data are exchanged between the fully connected layer FC6 and the seventh, prediction layer FC7 through full connections;
the prediction layer FC7 comprises a classification module and a regression module;
the classification module judges the type of the feature, and the regression module precisely locates the target position;
step (3-2): compute the overall loss function of the network and optimize the parameters of every layer of the network according to the loss function;
loss function L(p, k*, t, t*):
the training dataset is divided into K+1 classes; k* denotes the correct class label, p = (p0, ..., pK) denotes the predicted probability of each class, and t = (t_x, t_y, t_w, t_h) denotes the box information computed by the regression module: the horizontal coordinate, vertical coordinate, width and height of the marked box, with t* the corresponding ground-truth box;
the loss is computed as L(p, k*, t, t*) = L_cls(p, k*) + λ [k* ≥ 1] L_loc(t, t*), where L_cls(p, k*) = -log p_{k*} is the classification loss, L_loc(t, t*) is a smooth-L1 loss summed over the four box coordinates and applied only when k* is not the background class, and λ balances the two terms;
the detection network is fine-tuned according to the labels annotated in the training set, and the network parameters are alternately optimized by stochastic gradient descent;
when the classification-module parameters are optimized, the regression-module parameters are held fixed; when the regression-module parameters are optimized, the classification-module parameters are held fixed;
the RPN and the Faster R-CNN detection network finally share their convolutional layers and together form the power-component detection model;
step (4): perform actual power-component recognition and detection with the power-component detection model.
2. The Faster R-CNN-based method for recognizing small power components in UAV inspection images according to claim 1, characterized in that the following can be predicted from the 256-dimensional feature vector:
1) the probability that the window at the position corresponding to the 256-dimensional feature vector belongs to the foreground or background;
2) the offset of a nearby window containing a candidate target relative to the window at the position corresponding to the 256-dimensional feature vector.
3. The Faster R-CNN-based method for recognizing small power components in UAV inspection images according to claim 1, characterized in that step (4) proceeds as follows:
step (4-1): initialize the parameters of the power-component detection model obtained by training;
step (4-2): pass the input image through the power-component detection model to obtain the probability value and location information of every class contained in the image;
step (4-3): according to the location information, mark the specific target locations on the original image.
4. A system using the method according to claim 1, characterized by comprising:
a pre-training module, which pre-trains the ZFnet model, extracts feature maps of the UAV inspection images, and initializes the RPN model and the Faster R-CNN detection network;
a feature-extraction module, which trains the initialized RPN model to obtain a region-extraction network, uses it to generate candidate region boxes on the image feature map, extracts the features inside the candidate boxes, and obtains the position features and deep features of the targets;
a model-training module, which uses the position features, deep features and feature map to train the initialized Faster R-CNN detection network and obtain a power-component detection model;
a detection module, which performs actual power-component recognition and detection with the power-component detection model.
CN201610906708.2A 2016-10-18 2016-10-18 Method and system for recognizing small power components in UAV inspection images based on Faster R-CNN Active CN106504233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610906708.2A CN106504233B (en) 2016-10-18 2016-10-18 Method and system for recognizing small power components in UAV inspection images based on Faster R-CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610906708.2A CN106504233B (en) 2016-10-18 2016-10-18 Method and system for recognizing small power components in UAV inspection images based on Faster R-CNN

Publications (2)

Publication Number Publication Date
CN106504233A CN106504233A (en) 2017-03-15
CN106504233B true CN106504233B (en) 2019-04-09

Family

ID=58294243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610906708.2A Active CN106504233B (en) Method and system for recognizing small power components in UAV inspection images based on Faster R-CNN

Country Status (1)

Country Link
CN (1) CN106504233B (en)

Families Citing this family (103)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106877237B (en) * 2017-03-16 2018-11-30 天津大学 A method of insulator lacks in the detection transmission line of electricity based on Aerial Images
CN107016357B (en) * 2017-03-23 2020-06-16 北京工业大学 Video pedestrian detection method based on time domain convolutional neural network
CN107239731B (en) * 2017-04-17 2020-10-30 浙江工业大学 Gesture detection and recognition method based on Faster R-CNN
CN107256544A (en) * 2017-04-21 2017-10-17 南京天数信息科技有限公司 A kind of prostate cancer image diagnosing method and system based on VCG16
CN107194318B (en) * 2017-04-24 2020-06-12 北京航空航天大学 Target detection assisted scene identification method
CN107229904B (en) * 2017-04-24 2020-11-24 东北大学 Target detection and identification method based on deep learning
CN107133943B (en) * 2017-04-26 2018-07-06 贵州电网有限责任公司输电运行检修分公司 A kind of visible detection method of stockbridge damper defects detection
CN107274451A (en) * 2017-05-17 2017-10-20 北京工业大学 Isolator detecting method and device based on shared convolutional neural networks
CN107139179B (en) * 2017-05-26 2020-05-29 西安电子科技大学 Intelligent service robot and working method
CN107230205A (en) * 2017-05-27 2017-10-03 国网上海市电力公司 A kind of transmission line of electricity bolt detection method based on convolutional neural networks
CN107273836A (en) * 2017-06-07 2017-10-20 深圳市深网视界科技有限公司 A kind of pedestrian detection recognition methods, device, model and medium
CN107368845B (en) * 2017-06-15 2020-09-22 华南理工大学 Optimized candidate region-based Faster R-CNN target detection method
CN107392901A (en) * 2017-07-24 2017-11-24 国网山东省电力公司信息通信公司 A kind of method for transmission line part intelligence automatic identification
CN107451997A (en) * 2017-07-31 2017-12-08 南昌航空大学 A kind of automatic identifying method of the welding line ultrasonic TOFD D scanning defect types based on deep learning
CN107545263B (en) * 2017-08-02 2020-12-15 清华大学 Object detection method and device
CN107507172A (en) * 2017-08-08 2017-12-22 国网上海市电力公司 Merge the extra high voltage line insulator chain deep learning recognition methods of infrared visible ray
CN107742093B (en) * 2017-09-01 2020-05-05 国网山东省电力公司电力科学研究院 Real-time detection method, server and system for infrared image power equipment components
CN107808116B (en) * 2017-09-28 2020-05-05 中国科学院合肥物质科学研究院 Wheat and wheat spider detection method based on deep multilayer feature fusion learning
CN107680090A (en) * 2017-10-11 2018-02-09 电子科技大学 Based on the electric transmission line isolator state identification method for improving full convolutional neural networks
CN107808141A (en) * 2017-11-08 2018-03-16 国家电网公司 A kind of electric transmission line isolator explosion recognition methods based on deep learning
CN107895367B (en) * 2017-11-14 2021-11-30 中国科学院深圳先进技术研究院 Bone age identification method and system and electronic equipment
CN107844790A (en) * 2017-11-15 2018-03-27 上海捷售智能科技有限公司 A kind of vegetable identification and POS and method based on image recognition
CN107944860A (en) * 2017-11-15 2018-04-20 上海捷售智能科技有限公司 A kind of bakery identification and cash register system and method based on neutral net
CN108229307B (en) 2017-11-22 2022-01-04 北京市商汤科技开发有限公司 Method, device and equipment for object detection
CN109829456B (en) * 2017-11-23 2022-05-17 腾讯科技(深圳)有限公司 Image identification method and device and terminal
CN108022235B (en) * 2017-11-23 2020-07-28 中国科学院自动化研究所 Method for identifying defects of key components of high-voltage transmission iron tower
CN107944396B (en) * 2017-11-27 2021-12-28 国网安徽省电力有限公司经济技术研究院 Knife switch state identification method based on improved deep learning
CN107808425A (en) * 2017-11-28 2018-03-16 刘松林 Oil and gas pipeline inspection system and inspection method based on unmanned aerial vehicle images
CN108171117B (en) * 2017-12-05 2019-05-21 南京南瑞信息通信科技有限公司 Electric power artificial intelligence visual analysis system based on multi-core heterogeneous computing
CN108197637A (en) * 2017-12-11 2018-06-22 国网上海市电力公司 A high-voltage cabinet switch detection method based on deep learning
CN108010025B (en) * 2017-12-14 2022-05-13 浙江大学 Switch and indicator lamp positioning and state identification method of screen cabinet based on RCNN
CN108122003A (en) * 2017-12-19 2018-06-05 西北工业大学 A kind of Weak target recognition methods based on deep neural network
CN108171136B (en) * 2017-12-21 2020-12-15 浙江银江研究院有限公司 System and method for searching images by images for vehicles at multi-task gate
CN108037133B (en) * 2017-12-27 2021-01-26 武汉市智勤创亿信息技术股份有限公司 Intelligent electric power equipment defect identification method and system based on unmanned aerial vehicle inspection image
CN108875732B (en) * 2018-01-11 2022-07-12 北京旷视科技有限公司 Model training and instance segmentation method, device and system and storage medium
CN108334815A (en) * 2018-01-11 2018-07-27 深圳供电局有限公司 Inspection method, on/off state recognition method and system for secondary power equipment
CN108280855A (en) * 2018-01-13 2018-07-13 福州大学 A kind of insulator breakdown detection method based on Fast R-CNN
CN108564563A (en) * 2018-03-07 2018-09-21 浙江大学 A kind of tire X-ray defect detection method based on Faster R-CNN
CN108537170A (en) * 2018-04-09 2018-09-14 电子科技大学 A method for detecting missing pins on power equipment fittings in unmanned aerial vehicle inspection
CN108734694A (en) * 2018-04-09 2018-11-02 华南农业大学 Automatic identification method for thyroid tumor ultrasound images based on Faster R-CNN
CN108537222A (en) * 2018-04-13 2018-09-14 湖南阳光电力科技有限公司 A kind of image-recognizing method and system for electric instrument
CN108597053A (en) * 2018-04-25 2018-09-28 北京御航智能科技有限公司 Tower and corridor target identification and defect diagnosis method based on image data and neural networks
CN108615057B (en) * 2018-04-28 2020-07-14 广东电网有限责任公司 CNN-based abnormity identification method for cable tunnel lighting equipment
CN108764072B (en) * 2018-05-14 2021-04-06 浙江工业大学 Blood cell subtype image classification method based on multi-scale fusion
CN108647682A (en) * 2018-05-17 2018-10-12 电子科技大学 A brand logo detection and recognition method based on a region convolutional neural network model
CN108846418B (en) * 2018-05-24 2019-12-13 广东电网有限责任公司 Cable equipment temperature abnormity positioning and identifying method
CN108776777A (en) * 2018-05-25 2018-11-09 武汉理工大学 A Faster RCNN-based method for recognizing spatial relationships between remote sensing image objects
CN108846325A (en) * 2018-05-28 2018-11-20 广州极飞科技有限公司 Planning method, device, storage medium and processor for target area operations
CN108681752B (en) * 2018-05-28 2023-08-15 电子科技大学 Image scene labeling method based on deep learning
CN108830225B (en) * 2018-06-13 2021-07-06 广东工业大学 Method, device, equipment and medium for detecting target object in terahertz image
CN108921057B (en) * 2018-06-19 2021-06-01 厦门大学 Convolutional neural network-based prawn form measuring method, medium, terminal equipment and device
CN110766127B (en) * 2018-07-25 2022-09-23 赛灵思电子科技(北京)有限公司 Neural network computing special circuit and related computing platform and implementation method thereof
CN109190502A (en) * 2018-08-10 2019-01-11 北京百度网讯科技有限公司 Method and apparatus for generating location information
CN109117789A (en) * 2018-08-10 2019-01-01 国网上海市电力公司 A kind of switchgear inspection image segmentation configuration processing method
CN109472767B (en) * 2018-09-07 2022-02-08 浙江大丰实业股份有限公司 Stage lamp missing state analysis system
CN109376768B (en) * 2018-09-21 2021-12-17 福州大学 Aerial image tower signboard fault diagnosis method based on deep learning
CN110956060A (en) * 2018-09-27 2020-04-03 北京市商汤科技开发有限公司 Motion recognition method, driving motion analysis method, device and electronic equipment
CN109410190B (en) * 2018-10-15 2022-04-29 广东电网有限责任公司 Tower pole reverse-breaking detection model training method based on high-resolution remote sensing satellite image
CN109522930A (en) * 2018-10-17 2019-03-26 天津大学 A kind of object detecting method based on type of barrier prediction
CN111103629A (en) * 2018-10-25 2020-05-05 杭州海康威视数字技术股份有限公司 Target detection method and device, NVR (network video recorder) equipment and security check system
CN109598290A (en) * 2018-11-22 2019-04-09 上海交通大学 A small-target detection method for images based on combined hierarchical detection
CN111222372A (en) * 2018-11-26 2020-06-02 珠海格力电器股份有限公司 Person searching method, device, storage medium and processor
CN109767423B (en) * 2018-12-11 2019-12-10 西南交通大学 Crack detection method for asphalt pavement image
CN109685930A (en) * 2018-12-19 2019-04-26 深圳供电局有限公司 Power component inspection method, device, computer equipment and storage medium
CN109784375A (en) * 2018-12-22 2019-05-21 国网内蒙古东部电力有限公司 Adaptive transformer part detection recognition method based on Faster RCNN
CN109767427A (en) * 2018-12-25 2019-05-17 北京交通大学 The detection method of train rail fastener defect
CN109766884A (en) * 2018-12-26 2019-05-17 哈尔滨工程大学 A kind of airfield runway foreign matter detecting method based on Faster-RCNN
CN109784385A (en) * 2018-12-29 2019-05-21 广州海昇计算机科技有限公司 A kind of commodity automatic identifying method, system, device and storage medium
CN109886286B (en) * 2019-01-03 2021-07-23 武汉精测电子集团股份有限公司 Target detection method based on cascade detector, target detection model and system
CN109886232A (en) * 2019-02-28 2019-06-14 燊赛(上海)智能科技有限公司 A neural network-based power grid image recognition system
CN110008877B (en) * 2019-03-27 2023-07-18 国网内蒙古东部电力有限公司 Substation disconnecting switch detection and identification method based on Faster RCNN
CN109993168B (en) * 2019-04-09 2021-07-16 成都鹏业软件股份有限公司 Intelligent inspection method
EP3722991A1 (en) * 2019-04-10 2020-10-14 Axis AB Method, system, and device for detecting an object in a distorted image
CN110009702B (en) * 2019-04-16 2023-08-04 聊城大学 Fall webworm larva screen image positioning method for intelligent spraying robot
CN110136131A (en) * 2019-05-24 2019-08-16 贵州电网有限责任公司 A kind of zero resistance insulator detection method based on deep learning
CN110298287A (en) * 2019-06-24 2019-10-01 国网上海市电力公司 A kind of power equipment automatic identifying method based on deep learning
CN110379036A (en) * 2019-06-26 2019-10-25 广东康云科技有限公司 Intelligent substation patrol recognition methods, system, device and storage medium
CN110443130A (en) * 2019-07-01 2019-11-12 国网湖南省电力有限公司 A kind of electric distribution network overhead wire abnormal state detection method
CN110321867B (en) * 2019-07-09 2022-03-04 西安电子科技大学 Shielded target detection method based on component constraint network
CN110516551B (en) * 2019-07-29 2023-04-07 上海交通大学烟台信息技术研究院 Vision-based line patrol position deviation identification system and method and unmanned aerial vehicle
CN112455676A (en) * 2019-09-09 2021-03-09 中国电力科学研究院有限公司 Intelligent monitoring and analyzing system and method for health state of photovoltaic panel
CN110490342B (en) * 2019-09-19 2023-05-02 江苏新绿能科技有限公司 Contact net static geometrical parameter detection method based on Faster R-CNN
CN110826505A (en) * 2019-11-11 2020-02-21 卓迎 Toy image detection and identification method and device based on convolutional neural network
CN111144208A (en) * 2019-11-22 2020-05-12 北京航天控制仪器研究所 Automatic detection and identification method for marine vessel target and target detector
CN111209915B (en) * 2019-12-25 2023-09-15 上海航天控制技术研究所 Three-dimensional image synchronous recognition and segmentation method based on deep learning
CN111179262A (en) * 2020-01-02 2020-05-19 国家电网有限公司 Electric power inspection image hardware fitting detection method combined with shape attribute
CN111310899B (en) * 2020-02-19 2023-07-11 山东大学 Power defect identification method based on symbiotic relation and small sample learning
CN111353451A (en) * 2020-03-06 2020-06-30 深圳市赛为智能股份有限公司 Battery car detection method and device, computer equipment and storage medium
CN111401215B (en) * 2020-03-12 2023-10-31 杭州涂鸦信息技术有限公司 Multi-class target detection method and system
CN111598013A (en) * 2020-05-19 2020-08-28 广东电网有限责任公司 Nut-pin state identification method and related device
CN111626177B (en) * 2020-05-22 2023-11-21 深圳技术大学 PCB element identification method and device
CN111784633B (en) * 2020-05-26 2024-02-06 西安理工大学 Insulator defect automatic detection algorithm for electric power inspection video
CN111724130A (en) * 2020-06-17 2020-09-29 广东电网有限责任公司 Distribution network machine patrol operation management system and management method
CN112016594B (en) * 2020-08-05 2023-06-09 中山大学 Collaborative training method based on domain adaptation
CN112132826A (en) * 2020-10-12 2020-12-25 国网河南省电力公司濮阳供电公司 Pole tower accessory defect inspection image troubleshooting method and system based on artificial intelligence
CN112180984B (en) * 2020-10-22 2023-01-06 江汉大学 Unmanned aerial vehicle auxiliary flight device based on artificial intelligence and flight control method
CN112256906A (en) * 2020-10-23 2021-01-22 安徽启新明智科技有限公司 Method, device and storage medium for marking annotation on display screen
CN112257621A (en) * 2020-10-28 2021-01-22 贵州电网有限责任公司 Equipment image identification method for unmanned aerial vehicle inspection
CN112947519A (en) * 2021-02-05 2021-06-11 北京御航智能科技有限公司 Unmanned aerial vehicle inspection method and device and edge computing module
CN113095253B (en) * 2021-04-20 2023-05-23 池州学院 Insulator detection method for unmanned aerial vehicle inspection transmission line
CN113159228A (en) * 2021-05-19 2021-07-23 江苏奥易克斯汽车电子科技股份有限公司 Garbage classification identification method and device based on deep learning and intelligent garbage can
CN116432089A (en) * 2023-05-15 2023-07-14 厦门星拉科技有限公司 Electric power internet of things inspection system and method
CN117292283B (en) * 2023-11-24 2024-02-13 成都庆龙航空科技有限公司 Target identification method based on unmanned aerial vehicle

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217225B (en) * 2014-09-02 2018-04-24 中国科学院自动化研究所 A visual target detection and annotation method
CN104573669B (en) * 2015-01-27 2018-09-04 中国科学院自动化研究所 Image object detection method
CN104573731B (en) * 2015-02-06 2018-03-23 厦门大学 Fast target detection method based on convolutional neural networks
CN105825511B (en) * 2016-03-18 2018-11-02 南京邮电大学 A kind of picture background clarity detection method based on deep learning

Also Published As

Publication number Publication date
CN106504233A (en) 2017-03-15

Similar Documents

Publication Publication Date Title
CN106504233B (en) Unmanned plane inspection image electric power widget recognition methods and system based on Faster R-CNN
CN108108657B (en) Locality-sensitive hashing vehicle retrieval correction method based on multi-task deep learning
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
CN109344736B (en) Static image crowd counting method based on joint learning
CN104778242B (en) Freehand-sketch image retrieval method and system based on dynamic image partitioning
CN103514456B (en) Image classification method and device based on compressed sensing multiple kernel learning
Wang et al. Object-scale adaptive convolutional neural networks for high-spatial resolution remote sensing image classification
CN103942564B (en) High-resolution remote sensing image scene classifying method based on unsupervised feature learning
CN105956560A (en) Vehicle model identification method based on pooled multi-scale deep convolutional features
CN107133569A (en) Multi-granularity annotation method for surveillance video based on large-scale multi-label learning
CN109271991A (en) A license plate detection method based on deep learning
CN109409384A (en) Image recognition method, device, medium and equipment based on fine-grained images
CN109034035A (en) Pedestrian re-identification method based on saliency detection and feature fusion
CN107480620A (en) Remote sensing images automatic target recognition method based on heterogeneous characteristic fusion
CN110490238A (en) A kind of image processing method, device and storage medium
CN110751027B (en) Pedestrian re-identification method based on deep multi-instance learning
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN109492596A (en) A pedestrian detection method and system based on K-means clustering and a region proposal network
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN110472652A (en) A small-sample classification method based on semantic guidance
CN110532946A (en) A method for identifying the axle type of operating green-channel vehicles based on convolutional neural networks
CN107330027A (en) A weakly supervised deep learning method for TV station logo detection
CN111340034A (en) Text detection and identification method and system for natural scene
CN104281572A (en) Target matching method and system based on mutual information
CN105574545B (en) Multi-view semantic segmentation method and device for street environment images

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: No. 2000, Wang Yue Central Road, Ji'nan City, Shandong Province, 250002

Co-patentee after: State Grid Intelligent Technology Co., Ltd.

Patentee after: Electric Power Research Institute of State Grid Shandong Electric Power Company

Co-patentee after: State Grid Corporation of China

Address before: No. 2000, Wang Yue Central Road, Ji'nan City, Shandong Province, 250002

Co-patentee before: Shandong Luneng Intelligent Technology Co., Ltd.

Patentee before: Electric Power Research Institute of State Grid Shandong Electric Power Company

Co-patentee before: State Grid Corporation of China

TR01 Transfer of patent right

Effective date of registration: 20201027

Address after: Electric Power Intelligent Robot Production Project 101, south of Feiyue Avenue and east of No. 26 Road (ICT Industrial Park), Jinan City, Shandong Province, 250101

Patentee after: State Grid Intelligent Technology Co., Ltd.

Address before: No. 2000, Wang Yue Central Road, Ji'nan City, Shandong Province, 250002

Patentee before: ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER Co.

Patentee before: State Grid Intelligent Technology Co., Ltd.

Patentee before: STATE GRID CORPORATION OF CHINA
