CN110472545B - Aerial photography power component image classification method based on knowledge transfer learning

Info

Publication number: CN110472545B
Application number: CN201910721060.5A
Authority: CN
Inventors: 赵俊梅; 张利平; 任一峰; 李晓; 余永俊; 白鑫; 张灵菲
Original assignee: North University of China
Current assignee: North University of China
Priority date: 2019-08-06
Filing date: 2019-08-06
Publication date: 2022-09-23
Anticipated expiration: 2039-08-06
Also published as: CN110472545A

Abstract

The invention relates to the combination of deep learning and machine vision in artificial intelligence, and discloses a method for classifying aerial photography power component images based on knowledge transfer learning. The method is implemented in the following steps: step one, establishing the convolutional neural network GoogLeNet; step two, optimizing GoogLeNet by replacing its last three layers with a fully connected layer, a softmax layer and a classification output layer and then carrying out optimization setting, where the network training parameters are obtained by combining multiple simulation experiments with a Bayesian optimization algorithm; step three, inputting the acquired power component images, after normalization preprocessing, into the new deep convolutional neural network obtained and configured in step two for learning, and classifying them into categories such as insulators, hardware fittings and towers; and step four, carrying out simulation experiments for verification.

Description

Aerial photography power component image classification method based on knowledge transfer learning

Technical Field

The invention relates to the combination of deep learning and machine vision in artificial intelligence, and is applied to the classification and identification of power components during the inspection of transmission lines in a power system, so as to ensure the safe operation of the transmission lines.

Background

The power transmission line is a vital component of a power grid system. As the main artery of the grid, it plays a decisive role in whether the whole grid operates reliably, safely and stably over the long term, and the long-term, effective operation of the grid is directly related to the healthy development of the national economy. With the implementation of transmission network construction projects, the number of transmission lines has increased sharply and so has the line inspection workload. Complex geography and changeable climatic conditions make traditional manual inspection more dangerous, and its efficiency and accuracy are low. Compared with the traditional inspection mode, helicopter and unmanned aerial vehicle inspection offers high efficiency, a flexible inspection mode, a short image acquisition period and little influence from the natural environment, and it is gradually becoming the mainstream mode of transmission line inspection. Among the targets of power inspection, power components are numerous in the transmission network, and some of them are easily damaged. In long-term operation, the power transmission system is affected by severe weather such as strong wind, thunderstorms, hail and ice coating; insulators in particular are easily damaged, which in turn affects the normal operation of the transmission network and in serious cases can cause a large-area power failure.

In the aircraft inspection mode, a helicopter or unmanned aerial vehicle flies along the transmission line at a stable speed and a relatively fixed angle to shoot and collect images. During shooting it may pass over various landforms such as mountains, rivers, grasslands, houses and arable land, and may operate under various weather conditions (rain, snow, fog and the like). The different backgrounds and noise produced by these landforms and climatic environments pose a great challenge to the applicability and robustness of transmission line detection and fault detection. With the rapid growth of aerial image data of transmission lines and power components, the data volume is huge and the data redundancy is also large, which places higher requirements on the classification of aerial photography power component images.

In the field of machine vision, an important research direction is the extraction, classification, detection and identification of image features. Early feature extraction and classification relied on features of the target image extracted manually by humans, which consumed a great deal of time and labor. The development of artificial neural network technology has brought new momentum and vitality to machine vision, particularly to image classification and recognition. At present, deep learning, a branch of machine learning, is developing rapidly, and convolutional neural networks perform very well on image classification. With deep learning, image features no longer need to be extracted manually as in traditional digital image algorithms, because a convolutional neural network learns features automatically.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: aerial power component images have complex backgrounds, variable shooting angles and a huge data volume, which makes accurate classification and accurate localization of power components difficult.

The technical scheme adopted by the invention is as follows: the method for classifying aerial power component images based on knowledge transfer learning comprises the following steps:

Step one, establishing the convolutional neural network GoogLeNet;

Step two, optimizing the convolutional neural network GoogLeNet: on the basis of GoogLeNet, replacing its last three layers with a fully connected layer, a softmax layer and a classification output layer to obtain a new deep neural network, and then carrying out optimization setting on the basis of the original GoogLeNet settings; when the network is trained, the network parameters are obtained by combining multiple simulation experiments with a Bayesian optimization algorithm;

Step three, inputting the acquired power component images, after normalization preprocessing, into the new deep convolutional neural network obtained and configured in step two for learning, and classifying them into categories such as insulators, hardware fittings and towers;

Step four, carrying out simulation experiments for verification.

The convolutional neural network GoogLeNet has 144 layers in total, of which 22 are functional layers with learnable weights, comprising 21 convolutional layers and 1 fully connected layer. The first convolutional layer uses 64 convolution kernels of 7 × 7 with a stride of 2 and padding = [3 3 3 3]; the convolution yields 64-dimensional features, so the feature output of the first convolutional layer is 112 × 112 × 64. The output of the convolutional layer is fed directly into an excitation layer that uses the common ReLU function, followed by a pooling layer with a 3 × 3 kernel that uses max pooling (max pool); after pooling the feature dimensions are 56 × 56 × 64, and a norm layer is then added. The second convolutional layer uses 192 convolution kernels of 3 × 3 with a stride of 1 and padding = [1 1 1 1], so the feature dimensions become 56 × 56 × 192; after the excitation layer and the pooling layer, the output features are 28 × 28 × 192.

The split-branch operation uses 1 × 1, 3 × 3 and 5 × 5 convolution kernels and comprises 4 branches: (1) 64 convolution kernels of 1 × 1 are used with padding = [0 0 0 0], and the ReLU function is likewise used as the excitation layer, so a 28 × 28 × 64 vector is output; (2) the feature vector is compressed to 28 × 28 × 96 by 96 convolution kernels of 1 × 1 and passed through an excitation layer, then 128 convolution kernels of 3 × 3 with padding = [1 1 1 1] are applied, and a 28 × 28 × 128 vector is obtained after the excitation layer; (3) the data is first compressed to 28 × 28 × 16 by 16 convolution kernels of 1 × 1 and passed through an excitation layer, then a 5 × 5 convolution with padding = [2 2 2 2] is applied, and a 28 × 28 × 32 vector is finally output; (4) pooling is performed with max pool, and a 1 × 1 convolution window is then used to obtain a 28 × 28 × 32 vector. The classification output layer concatenates the feature vectors output by the four branches.

The subsequent Inception modules proceed in the same way, and so on.

The parameter optimization setting comprises: adopting stochastic gradient descent with momentum, with the momentum set to 0.9; setting the mini-batch size to 10, the mini-batch algorithm being a hybrid of the stochastic gradient descent algorithm and the batch algorithm; setting the maximum number of epochs to 3; setting the L₂ parameter regularization (weight decay) to 0.00055, where the regularization strategy brings the weights closer to the origin by adding a regularization term to the objective function; setting the learning rate to 10^-4; and setting the validation frequency to 15.

The beneficial effects of the invention are as follows. GoogLeNet not only deepens the network but also widens it, achieving aggregation of multi-scale features and improving the classification performance of the network. The Inception module is the structure proposed to increase the network width, and it has two important innovations: first, 1 × 1 convolution kernels reduce the dimensionality of the input from the previous layer, which lowers the computational complexity; second, several convolutional layers and pooling layers of different scales are connected in parallel and the resulting features are concatenated, which improves the expressive power of the features.

The invention makes full use of the advantages of convolutional neural networks in deep learning, in particular their ability to learn image features automatically. At the same time, the GoogLeNet network structure is adjusted appropriately by means of transfer learning, optimal training parameters are obtained through simulation experiments and a Bayesian optimization algorithm, and the training of the new network is completed. Finally, the images of power components (insulators, towers and hardware fittings) are classified and identified, which provides a basis for state detection of power components and helps ensure the safe operation of the power grid.

Drawings

FIG. 1 is a diagram of the Inception module;

FIG. 2 is a schematic flow diagram of the present invention;

FIG. 3 is a diagram of the training process.

Detailed Description

Inspecting transmission lines by unmanned aerial vehicle aerial photography improves inspection efficiency and also collects a large number of power component images. Because of factors such as complex backgrounds, many types of power components, complex shooting environments, unfixed shooting angles and different types of power components appearing attached to one another, traditional digital image processing algorithms classify and detect the different power components poorly and have weak applicability. The method makes full use of the advantages of convolutional neural networks in deep learning, uses transfer learning to build on the proven success of a classical convolutional neural network, and classifies the acquired power component images. First, a sample library of power component images is established; then a convolutional neural network based on GoogLeNet is created by transfer learning, the network training parameters are designed, the network is trained, the effectiveness of the network is verified, and the network parameters are adjusted as appropriate according to the training and verification results. The specific algorithm is described as follows:

I. Characteristics of the convolutional neural network GoogLeNet

The champion of ILSVRC2014 was GoogLeNet, a novel network structure proposed on the basis of the network-in-network idea. In terms of network design GoogLeNet is a deep network: it has 144 layers in total, of which 22 are functional layers with learnable weights, comprising 21 convolutional layers and 1 fully connected layer. The biggest design innovation of GoogLeNet is the use of multi-scale convolution to approximate sparse matrix operations with locally dense ones, processing features in parallel and then concatenating them.

As can be seen from the network configuration, the size of the original input image is first fixed to an RGB image of 224 × 224 × 3.

The first convolutional layer uses 64 convolution kernels of 7 × 7 with a stride of 2 and padding = [3 3 3 3]. The convolution yields 64-dimensional features, so the feature output of the first convolutional layer is 112 × 112 × 64. The output of the convolutional layer is fed directly into an excitation layer that uses the common ReLU function; a pooling layer with a 3 × 3 kernel then performs max pooling (max pool), after which the feature dimensions are 56 × 56 × 64. GoogLeNet then adds a norm layer in its network design.

The second convolutional layer uses 192 convolution kernels of 3 × 3 with a stride of 1 and padding = [1 1 1 1], so the feature dimensions become 56 × 56 × 192. An excitation layer and a pooling layer are applied in the same way, and the output features are 28 × 28 × 192.
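The layer sizes quoted above can be checked with the usual output-size formula for convolution and pooling, out = floor((n + p_total − k) / s) + 1. The following Python sketch is illustrative only. The source does not state the stride or padding of the two pooling layers, so a stride of 2 and one extra row and column of padding are assumed here because they reproduce the stated 56 × 56 and 28 × 28 sizes. The helper names `out_size` and `stem_sizes` are hypothetical.

```python
def out_size(n, kernel, stride, pad_total):
    """Spatial output size of a convolution or pooling layer.

    pad_total is the padding summed over both sides of one dimension,
    e.g. padding = [3 3 3 3] gives pad_total = 6.
    """
    return (n + pad_total - kernel) // stride + 1


def stem_sizes(n=224):
    """Trace the feature-map side length through the first layers."""
    sizes = {}
    n = out_size(n, kernel=7, stride=2, pad_total=6)  # conv1: 7x7, stride 2, padding 3
    sizes["conv1"] = n
    n = out_size(n, kernel=3, stride=2, pad_total=1)  # pool1: stride and padding ASSUMED
    sizes["pool1"] = n
    n = out_size(n, kernel=3, stride=1, pad_total=2)  # conv2: 3x3, stride 1, padding 1
    sizes["conv2"] = n
    n = out_size(n, kernel=3, stride=2, pad_total=1)  # pool2: stride and padding ASSUMED
    sizes["pool2"] = n
    return sizes


print(stem_sizes())
```

Under these assumptions the side lengths come out as 112, 56, 56 and 28, matching the dimensions given in the text.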

What follows is the key innovation of GoogLeNet, the split-branch operation. The authors of GoogLeNet proposed the concept of Inception, which for computational convenience uses 1 × 1, 3 × 3 and 5 × 5 convolution kernels and comprises 4 branches, as shown in Fig. 1:

(1) 64 convolution kernels of 1 × 1 are used with padding = [0 0 0 0], and the ReLU function is likewise used as the excitation layer, so a 28 × 28 × 64 vector is output.

(2) The feature vector is compressed to 28 × 28 × 96 by 96 convolution kernels of 1 × 1 and passed through an excitation layer; 128 convolution kernels of 3 × 3 with padding = [1 1 1 1] are then applied, and a 28 × 28 × 128 vector is obtained after the excitation layer.

(3) The data is compressed to 28 × 28 × 16 by 16 convolution kernels of 1 × 1 and passed through an excitation layer; a 5 × 5 convolution with padding = [2 2 2 2] is then applied, and a 28 × 28 × 32 vector is finally output.

(4) Pooling is performed with max pool, and a 1 × 1 convolution window is then used to obtain a 28 × 28 × 32 vector.

Finally, the feature vectors output by the four branches are concatenated. Because the convolution kernels and paddings are chosen so that every branch produces a 28 × 28 output, the features can be joined directly. The subsequent Inception modules proceed in the same way, and so on.
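As a rough check of the four branches, the sketch below traces the shape of each branch and the depth of the concatenated output. It is an illustration rather than the patented implementation. The 3 × 3 pooling window with stride 1 and padding 1 in branch (4) is an assumption, since the text only says "max pool". The names `BRANCHES` and `inception_output` are hypothetical.

```python
def conv_out(n, kernel, pad, stride=1):
    """Output side length for a square kernel with symmetric padding."""
    return (n + 2 * pad - kernel) // stride + 1


# Each branch is a list of (kernel, padding, output_channels) operations.
# output_channels = None marks a pooling step, which keeps the channel count.
BRANCHES = {
    "1x1": [(1, 0, 64)],
    "3x3": [(1, 0, 96), (3, 1, 128)],
    "5x5": [(1, 0, 16), (5, 2, 32)],
    "pool": [(3, 1, None), (1, 0, 32)],  # pooling window and padding ASSUMED
}


def inception_output(size=28, in_channels=192):
    """Return the per-branch shapes and the shape after depth concatenation."""
    shapes = {}
    for name, ops in BRANCHES.items():
        n, c = size, in_channels
        for kernel, pad, out_c in ops:
            n = conv_out(n, kernel, pad)
            c = c if out_c is None else out_c
        shapes[name] = (n, n, c)
    # Concatenation along depth requires identical spatial sizes.
    assert len({s[:2] for s in shapes.values()}) == 1
    depth = sum(s[2] for s in shapes.values())
    return shapes, (size, size, depth)


shapes, merged = inception_output()
print(shapes)
print(merged)
```

With the channel counts given in the text (64, 128, 32 and 32), the concatenated output is 28 × 28 × 256.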

II. Knowledge transfer learning on the GoogLeNet network

Knowledge transfer learning means learning from training samples in other fields: relevant knowledge is extracted from those samples and used for learning in the target field. Through transfer learning, a machine improves its ability to reuse previously learned knowledge and achieves incremental learning more effectively. The most important feature of knowledge transfer learning is that it uses knowledge from related fields to help complete the learning task in the target field.

The GoogLeNet model has rich weights and biases that can be adjusted through transfer learning for different target sample libraries. The GoogLeNet network has low complexity and high accuracy, and is very well suited to transfer learning. The invention therefore performs the transfer on the basis of the GoogLeNet network. In plain terms, transfer means drawing inferences about other cases from one instance and reasoning by analogy; it can also be understood as a kind of grafting. The ultimate purpose is to shorten the network training time and improve the utilization of the network.

Transfer learning here means re-optimizing the last three layers of GoogLeNet, which were built for classification into 1000 categories, using the number of aerial power component image categories as the new number of classes, while retaining the front part of the original network. The last three layers are replaced by a fully connected layer, a softmax layer and a classification output layer, giving a new deep neural network. The invention sets the number of classes of the fully connected layer to 5, the weight learning rate factor to 30 and the bias learning rate factor to 30. Next, the training parameters of the new network are set; the main parameter settings are as follows:

(1) Stochastic gradient descent with momentum is adopted, with the momentum set to 0.9. Stochastic gradient descent computes the error for each training sample and adjusts the weights accordingly. The momentum-based gradient descent algorithm aims to accelerate learning, especially in the face of high curvature, small but consistent gradients, or noisy gradients. The momentum algorithm accumulates an exponentially decaying moving average of past gradients and continues to move in that direction.

(2) The mini-batch size is set to 10. The mini-batch algorithm is a hybrid of the stochastic gradient descent algorithm and the batch algorithm: it first selects a part of the data set and then trains on this subset with the batch algorithm. That is, it computes the weight updates over the selected subset and then adjusts the neural network with the averaged weight update.

(3) The maximum number of epochs is set to 3; an epoch is one full pass of all training data through training. For the mini-batch algorithm, the number of training iterations in each epoch depends on the number of data points chosen per mini-batch.

(4) The L₂ parameter regularization (weight decay) is set to 0.00055. The regularization strategy brings the weights closer to the origin by adding a regularization term to the objective function. Regularization is a modification of the learning algorithm that aims to reduce the generalization error rather than the training error.

(5) The learning rate is 10^-4. The learning rate is one of the important parameters of the stochastic gradient descent algorithm. If the learning rate is too low, learning is slow and may become stuck at a relatively high cost value. If the learning rate is too high, the learning curve oscillates sharply and the cost function value usually increases significantly. The learning rate can be selected by trial and error.

(6) The validation frequency is 15, i.e. the number of iterations between evaluations of the validation metrics.
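The interaction of settings (1), (4) and (5) can be illustrated with a single-weight update step. The sketch below uses one common formulation of stochastic gradient descent with momentum and L₂ weight decay; the patent does not spell out the exact update rule, so this form is an assumption. It also assumes that a learning rate factor (30 for the new fully connected layer) simply multiplies the global learning rate. The function name `sgdm_step` is hypothetical.

```python
def sgdm_step(weight, velocity, grad, lr=1e-4, momentum=0.9,
              l2=0.00055, lr_factor=1.0):
    """One update of SGD with momentum and L2 weight decay.

    ASSUMED formulation (not taken from the patent):
        g = grad + l2 * weight                     # gradient of loss + L2 term
        v = momentum * v - (lr * lr_factor) * g    # decaying average of past steps
        w = w + v
    """
    g = grad + l2 * weight
    velocity = momentum * velocity - lr * lr_factor * g
    return weight + velocity, velocity


# A transferred layer (factor 1) versus the new classifier layer (factor 30).
w_old, v_old = sgdm_step(1.0, 0.0, grad=2.0)
w_new, v_new = sgdm_step(1.0, 0.0, grad=2.0, lr_factor=30.0)
print(w_old, w_new)

# A second step with the same gradient moves further, because the velocity
# accumulates the previous step.
w_old2, v_old2 = sgdm_step(w_old, v_old, grad=2.0)
print(abs(v_old2) > abs(v_old))
```

Under this assumption the replaced layers learn with an effective rate of 30 × 10^-4 while the transferred layers change slowly, which is the usual motivation for a large learning rate factor on newly added layers.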

The network parameter values of the invention are set by combining the results of multiple simulation experiments with a Bayesian optimization algorithm, so as to obtain optimal network parameters. Bayesian optimization is well suited to optimizing the internal parameters of classification and regression models, and can optimize objective functions that are non-differentiable, discontinuous and time-consuming to evaluate. Internally, the algorithm maintains a Gaussian process model of the objective function and trains that model on evaluations of the objective function. The Bayesian optimization process uses the well-known Bayes' theorem:

$$p(f \mid D_{1:t}) = \frac{p(D_{1:t} \mid f)\, p(f)}{p(D_{1:t})}$$

where $f$ denotes the unknown objective function; $D_{1:t} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_t, y_t)\}$ denotes the set of observations; $x_t$ denotes a decision vector; $y_t = f(x_t) + \varepsilon_t$ denotes the observed value; $\varepsilon_t$ denotes the observation error; $p(D_{1:t} \mid f)$ denotes the likelihood distribution of $y$, also called the "noise" because the observed values contain errors; $p(f)$ denotes the prior probability distribution of $f$, i.e. the assumption made about the state of the unknown objective function; $p(D_{1:t})$ denotes the marginal likelihood distribution obtained by marginalizing over $f$, or "evidence", which is mainly used to optimize the hyper-parameters; and $p(f \mid D_{1:t})$ denotes the posterior probability distribution of $f$, which describes the confidence in the unknown objective function after the prior has been corrected by the observed data set.

Bayesian optimization is an iterative process, and the optimization framework mainly comprises a probabilistic surrogate model (consisting of a prior probability model and an observation model) and an acquisition function. The prior probability model is $p(f)$. The observation model describes the mechanism by which the observed data are generated, namely the likelihood distribution $p(D_{1:t} \mid f)$. Updating the probabilistic surrogate model yields a posterior probability distribution $p(f \mid D_{1:t})$ that contains more information from the data. The acquisition function is constructed from the posterior probability distribution, and the next best evaluation point is selected by maximizing the acquisition function. At the same time, an effective acquisition function ensures that the selected sequence of evaluation points minimizes the total loss. Bayesian optimization mainly comprises three steps:

1. selecting the next evaluation point $x_t$ by maximizing the acquisition function;
2. evaluating the objective function at the selected point, $y_t = f(x_t) + \varepsilon_t$;
3. adding the newly obtained input and observation pair $(x_t, y_t)$ to the historical observation set $D_{1:t-1}$ and updating the probabilistic surrogate model in preparation for the next iteration.
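The three-step loop can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the optimizer used in the patent: the objective is a hypothetical one-dimensional stand-in for the validation error, the surrogate is a zero-mean Gaussian process with an RBF kernel of fixed length scale, the acquisition function is expected improvement, and the acquisition function is maximized by a simple grid search. All function names are hypothetical.

```python
import math


def rbf(a, b, length=0.3):
    """RBF kernel (ASSUMED choice of covariance function)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def posterior(xs, ys, x, noise=1e-6):
    """Posterior mean and variance of the surrogate at x, given D = (xs, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    mean = sum(ki * ai for ki, ai in zip(k, solve(K, ys)))
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, solve(K, k)))
    return mean, max(var, 1e-12)


def expected_improvement(mean, var, best):
    """Expected improvement for minimization."""
    s = math.sqrt(var)
    z = (best - mean) / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (best - mean) * cdf + s * pdf


def objective(x):
    """Hypothetical stand-in for 'train the network, return validation error'."""
    return (x - 0.62) ** 2


def bayes_opt(n_iter=8):
    xs = [0.0, 1.0]                      # initial observations
    ys = [objective(x) for x in xs]
    grid = [i / 200 for i in range(201)]
    for _ in range(n_iter):
        best = min(ys)

        def acq(x):
            m, v = posterior(xs, ys, x)
            return expected_improvement(m, v, best)

        x_next = max(grid, key=acq)      # step 1: maximize the acquisition function
        y_next = objective(x_next)       # step 2: evaluate the objective
        xs.append(x_next)                # step 3: extend D and refit next round
        ys.append(y_next)
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], xs


best_x, best_y, visited = bayes_opt()
print(best_x, best_y)
```

In a real hyper-parameter search the call to `objective` would be the expensive part (a full training run), which is why the surrogate model is used to decide where to evaluate next.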

The invention optimizes the parameters with the Bayesian algorithm in roughly the following steps:

(1) Determining the image data of the training set and the validation set.

(2) Selecting the parameters to be optimized. The invention focuses on optimizing the learning rate, the momentum of stochastic gradient descent, and the L₂ regularization strength.

(3) Setting the variables of the Bayesian optimization algorithm. An objective function is defined for the Bayesian optimizer; it takes the image training set and validation set as input, trains a convolutional neural network and returns the classification error on the validation set. The objective function is then passed to the Bayesian acquisition function, which is used to minimize the cross-validation loss.

(4) Setting the network structure parameters and training the network. The convolutional neural network structure is input layer → convolutional layer → batch normalization → rectified linear unit (ReLU) → max pooling layer → convolutional layer → batch normalization → rectified linear unit (ReLU) → max pooling layer → fully connected layer → softmax layer → classification layer. The main parameters of network training are as follows: stochastic gradient descent with momentum is adopted, with a momentum of 0.95, a maximum of 10 epochs, a learning rate of 10^-3 and an L₂ regularization of 0.000001.

(5) During network training, data augmentation is used: the training images are randomly flipped about the vertical axis and randomly shifted by up to four pixels in both the horizontal and vertical directions. Data augmentation helps prevent the network from overfitting and memorizing the specific details of the training images.

(6) Predicting the labels of the test set and calculating the test error. Bayesian optimization is carried out by minimizing the classification error on the validation set, which yields the optimal network and its validation accuracy and, finally, the optimized parameters.
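The augmentation described in step (5) can be sketched on a small image stored as a list of rows. This is a minimal illustration in plain Python: the zero fill value for pixels shifted in from outside the image and the 0.5 flip probability are assumptions, and the function names are hypothetical.

```python
import random


def flip_left_right(image):
    """Mirror the image about its vertical axis."""
    return [row[::-1] for row in image]


def translate(image, dx, dy, fill=0):
    """Shift the image dx pixels right and dy pixels down, padding with fill."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out


def augment(image, rng, max_shift=4):
    """Random flip (ASSUMED probability 0.5) plus a random shift of up to
    max_shift pixels in each direction."""
    if rng.random() < 0.5:
        image = flip_left_right(image)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(image, dx, dy)


rng = random.Random(0)
sample = [[x + 10 * y for x in range(8)] for y in range(8)]
augmented = augment(sample, rng)
print(len(augmented), len(augmented[0]))
```

The output keeps the input size, so augmented images can be fed to the network without further resizing.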

III. Data set

The background of aerial power component images is very complex, and the shooting angle and image quality are easily affected by the external environment, particularly severe weather and adverse geographical conditions. Power components often appear attached to one another in the images (for example insulators and hardware fittings on towers, or hardware fittings on transmission lines), and are sometimes occluded by objects such as towering trees, which makes accurate classification of aerial power component images very difficult. At present there is no standard power component image library, so the image library used here is self-built. The data set adopted by the invention contains 932 images: 720 insulator images, 86 tower images, 55 hardware fitting images (downloaded from the internet), 39 lane line images and 32 road sign images. The lane line and road sign images are added to the database to demonstrate the noise resistance and robustness of the network's classification. In addition, the Ground Truth Labeler app toolbox is used to label 177 images in the three categories of insulators, hardware fittings and towers. All images undergo preprocessing such as normalization, and the resolution is unified to 800 × 600. The data set is divided into a training set and a validation set at a ratio of 7:3.
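For orientation, the following sketch works out what the 7:3 split and the training settings stated earlier (mini-batch size 10, 3 epochs, validation frequency 15) imply for the full 932-image data set. The rounding rules are assumptions: the training count is rounded down, and an incomplete final mini-batch in each epoch is assumed to be dropped. The function names are hypothetical.

```python
CLASS_COUNTS = {
    "insulator": 720,
    "tower": 86,
    "hardware fitting": 55,
    "lane line": 39,
    "road sign": 32,
}


def split_counts(n_images, train_fraction=0.7):
    """Number of training and validation images (training count rounded down)."""
    n_train = int(n_images * train_fraction)
    return n_train, n_images - n_train


def schedule(n_train, batch_size=10, max_epochs=3, validation_frequency=15):
    """Iterations per epoch, total iterations and number of validation passes.

    ASSUMPTION: an incomplete final mini-batch in each epoch is discarded.
    """
    iters_per_epoch = n_train // batch_size
    total_iters = iters_per_epoch * max_epochs
    return iters_per_epoch, total_iters, total_iters // validation_frequency


total = sum(CLASS_COUNTS.values())
n_train, n_val = split_counts(total)
print(total, n_train, n_val)
print(schedule(n_train))
```

Under these assumptions the full data set gives 652 training and 280 validation images, 65 iterations per epoch, 195 iterations in total and 13 validation passes.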

IV. Training and classification results

The hardware environment of the simulation experiments is an i5-7200 CPU with a clock frequency of 2.7 GHz and 8 GB of memory, and training is carried out on a single CPU. Data sets of different sizes are selected for simulation on the Matlab platform, and the classification results are as follows:

TABLE 1 Classification results for different sample data sets

Number of sample images	Run time (minutes)	Classification accuracy
262	32	96%
632	381	97.8%
932	419	97.83%

Table 1 lists the run time and classification accuracy obtained when different numbers of sample images are randomly selected. The classification results show that, with transfer learning, the acquisition of a huge volume of image data can be avoided: a moderate amount of image data is enough to exploit the performance of the GoogLeNet network and classify aerial photography power component images. Fig. 2 is the flowchart. Fig. 3 is a diagram of the training process.

Claims

1. A method for classifying aerial photography power component images based on knowledge transfer learning, characterized by comprising the following steps:

Step one, establishing the convolutional neural network GoogLeNet;

Step two, migrating the convolutional neural network GoogLeNet: on the basis of GoogLeNet, replacing its last three layers with a fully connected layer, a softmax layer and a classification output layer to obtain a new deep neural network, and then performing parameter optimization setting on the basis of the original GoogLeNet settings; when the network is trained, the network parameters are obtained by combining multiple simulation experiments with a Bayesian optimization algorithm, wherein the split-branch operation uses 1 × 1, 3 × 3 and 5 × 5 convolution kernels and comprises 4 branches: 64 convolution kernels of 1 × 1 are used with padding = [0 0 0 0], and the ReLU function is likewise adopted as the excitation layer, so that a 28 × 28 × 64 vector is output; the feature vector is compressed to 28 × 28 × 96 by 96 convolution kernels of 1 × 1 and passed through an excitation layer, then 128 convolution kernels of 3 × 3 with padding = [1 1 1 1] are used, and a 28 × 28 × 128 vector is obtained after the excitation layer; the data is first compressed to 28 × 28 × 16 by 16 convolution kernels of 1 × 1 and passed through an excitation layer, then a 5 × 5 convolution with padding = [2 2 2 2] is performed, and a 28 × 28 × 32 vector is finally output; pooling is performed with max pool, and a 1 × 1 convolution window is then used to obtain a 28 × 28 × 32 vector; the classification output layer concatenates the feature vectors output by the four branches, and the subsequent Inception modules proceed in the same way, and so on;

step three, inputting the collected electric power component images after normalization preprocessing into the new deep convolution neural network obtained and set in the step two for learning, and classifying according to the categories of insulators, hardware fittings and towers;

and step four, carrying out simulation experiment for verification.

2. The method for classifying aerial power component images based on knowledge transfer learning according to claim 1, wherein: the convolutional neural network GoogLeNet has 144 layers in total, of which 22 are functional layers with learnable weights, comprising 21 convolutional layers and 1 fully connected layer; the first convolutional layer uses 64 convolution kernels of 7 × 7 with a stride of 2 and padding = [3 3 3 3], a 64-dimensional feature is obtained after convolution, and the feature output of the first convolutional layer is 112 × 112 × 64; the output of the convolutional layer is fed directly into an excitation layer that uses the common ReLU function, followed by a pooling layer with a 3 × 3 kernel that uses max pooling (max pool), after which the feature dimensions are 56 × 56 × 64, and a norm layer is then added; the second convolutional layer uses 192 convolution kernels of 3 × 3 with a stride of 1 and padding = [1 1 1 1], so that the feature dimensions become 56 × 56 × 192, and the output features of the excitation layer and the pooling layer are 28 × 28 × 192.

3. The method for classifying aerial power component images based on knowledge transfer learning according to claim 1, wherein: the parameter optimization setting comprises adopting stochastic gradient descent with momentum, with the momentum set to 0.9; setting the mini-batch size to 10, the mini-batch algorithm being a hybrid of the stochastic gradient descent algorithm and the batch algorithm; setting the maximum number of epochs to 3; setting the L₂ parameter regularization (weight decay) to 0.00055, where the regularization strategy brings the weights closer to the origin by adding a regularization term to the objective function; setting the learning rate to 10^-4; and setting the validation frequency to 15.