Intelligent identification and early warning method for abnormal events in open scenes of the electric power field based on edge computing
Technical Field
The invention discloses an intelligent identification and early warning method for abnormal events in open scenes of the electric power field based on edge computing, and belongs to the technical field of intelligent identification of abnormal events in electric power systems.
Background
With the continuous development and construction of electric power engineering, power transmission networks keep growing in scale and in the amount of equipment, and the inspection workload for power transmission equipment and lines keeps increasing accordingly. In addition, some remote areas, such as mountainous areas and snowfields, are difficult for workers to reach, which makes patrol work harder. Meanwhile, as power grid management standards continue to rise, operation and maintenance units are also exploring new ways to manage the operation of power transmission lines. At present, mainstream solutions for the identification and early warning of abnormal events include online monitoring systems and automatic inspection by unmanned aerial vehicles, but each of these solutions has disadvantages.
In an online monitoring system, a monitor installed on a transmission tower (pole) captures video or images and transmits them to a server, which identifies and handles abnormalities. The disadvantage of this scheme is that anomaly detection is delayed and unstable. Transmitting the pictures captured by the terminal monitor to the server takes time, and problems on the transmission line may prevent the pictures from reaching the server complete and intact. The server must process data from hundreds of remote monitors; this data often contains a large amount of redundant information, yet the server cannot distinguish useful from useless information and cannot assign processing priorities to the incoming data. Automatic inspection by unmanned aerial vehicle has the disadvantage that the vehicle is not suitable for long-distance patrol work and requires people to replace its batteries and copy its data.
The invention aims to identify and give early warning of abnormal events in open scenes of the electric power field by using an improved SSD target detection model based on edge computing. Performing identification and early warning at the edge reduces the communication bandwidth required between the central control system and the data acquisition terminal. Because the detection scene is large, abnormal objects occupy only a small proportion of the picture, and the scenes themselves vary. The invention therefore makes targeted improvements to the original SSD target detection model, so that the model retains high accuracy on small objects and generalizes well across different scenes.
To address the complex background, heavy noise and frequent overexposure of high-speed rail surveillance video, Chinese patent document CN109165575A studies a smoke and fire recognition algorithm based on the SSD deep learning framework for images. Its detection model training network is a reconstructed VGG16 network, to which 6 convolutional layers and 1 pooling layer are added on the basis of VGG16.
A further comparison document belongs to the technical field of target detection and relates in particular to a smoke emission video detection method based on an improved VGG16 convolutional network, which comprises the following steps: step 1: generating a chimney emission image dataset; step 2: sending the training set into the improved VGG16 convolutional network for training to obtain a plurality of weight models. The backbone of the improved VGG16 convolutional network is VGG16, in which the last two fully connected layers are replaced by two convolutional layers used to extract multi-scale image features of the chimney. These convolutional layers are connected to a global average pooling layer, the resulting matrices are used to output results, and the matrices are finally fed into a loss function for classification, which completes the network structure.
Compared with the above comparison documents and the prior art, the present invention is mainly applied to open scenes in the electric power field, processes data with several image enhancement methods according to different weather conditions, and studies an intelligent abnormal-event detection algorithm based on the SSD framework, in which the training network of the detection model is an improved VGG16 network. The improved network fuses the Conv4_x and Conv5_x feature layers of the VGG16 network and applies the fused features directly to the final prediction layer, thereby improving the accuracy of small-target detection.
Disclosure of Invention
The invention discloses an intelligent identification and early warning method for abnormal events in open scenes of the electric power field based on edge computing.
Summary of the invention:
The improved SSD target detection model is compressed and ported to the mobile terminal so that the advantages of edge computing are fully exploited; based on experiments, an Android terminal is used as the preferred scheme. The Conv4_x and Conv5_x feature layers in the VGG16 network are fused, and the fused feature layer is applied directly to the last prediction layer, which improves the accuracy of small-target detection. In addition, the invention summarizes several basic weather conditions, such as sunny, cloudy, rainy and foggy days, and uses image enhancement technology to increase the training data for these different scenes so as to improve the generalization capability of the model.
The technical scheme of the invention is as follows:
An intelligent identification and early warning method for abnormal events in open scenes of the electric power field based on edge computing, characterized by comprising the following steps:
S1: performing image enhancement on training data from different scenes, and labeling the training data with a labeling tool to obtain xml files;
S2: using a VGG16 network as the base network to extract features from the original picture, fusing the Conv4_x and Conv5_x feature layers of the VGG16 network, and applying the fused features to the prediction layer of the final model;
S3: adding further network layers after the base network, predicting on these layers the targets and the scores of the categories to which they belong, and at the same time using small convolution kernels on the feature layers to regress the accurate positions of a series of bounding boxes;
S4: for the large number of bounding boxes generated at the same target position, using non-maximum suppression to find the optimal target bounding box and eliminate redundant bounding boxes, and training the model;
S5: applying the trained model on the mobile terminal to perform intelligent identification and early warning of abnormal events. Preferably, the mobile terminal is an Android terminal.
Preferably, according to the present invention, the image enhancement method in step S1 is:
S11: setting different brightness and/or contrast for an original picture, and applying Gaussian blur, to simulate different scenes;
S12: labeling the pictures processed in step S11 with the labeling tool;
S13: for each picture processed in step S11, randomly performing one of the following operations:
1) using the original picture;
2) sampling a region according to a sampling rule, wherein the sampling rule is: the minimum intersection-over-union between the region and the objects is a value selected at random between 0 and 1;
3) randomly sampling a region;
S14: the size of each sampled region is between 0.1 and 1 times the size of the original image, and its aspect ratio is between 0.5 and 2; when the center of a label box lies within the sampled region, the overlapping part of the label box is kept;
S15: after the sampling step, each sampled region is resized to a fixed size and flipped horizontally with a probability of 0.5.
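The brightness, contrast and Gaussian blur adjustments of step S11 can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions, not the implementation used by the invention: the function names, the linear model `alpha * pixel + beta` for contrast and brightness, and all parameter values are hypothetical choices.

```python
import numpy as np

def adjust_brightness_contrast(img, alpha=1.0, beta=0.0):
    """Scale contrast by alpha and shift brightness by beta (assumed linear model)."""
    out = img.astype(np.float32) * alpha + beta
    return np.clip(out, 0, 255).astype(np.uint8)

def gaussian_kernel(sigma, radius):
    """1-D Gaussian kernel normalized so that its weights sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float32)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma=1.5):
    """Separable Gaussian blur of an H x W x C uint8 image with edge padding."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(img.astype(np.float32),
                    ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    # First pass along the vertical axis, second pass along the horizontal axis.
    tmp = sum(k[i] * padded[i:i + h, :, :] for i in range(2 * radius + 1))
    out = sum(k[j] * tmp[:, j:j + w, :] for j in range(2 * radius + 1))
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def simulate_scene(img, alpha, beta, sigma):
    """Combine both adjustments to imitate one weather or lighting condition."""
    return gaussian_blur(adjust_brightness_contrast(img, alpha, beta), sigma)

# Hypothetical settings: a random stand-in picture, a dark variant and a foggy variant.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
dark = simulate_scene(image, alpha=0.6, beta=-30, sigma=1.0)
foggy = simulate_scene(image, alpha=0.5, beta=90, sigma=2.0)
```

Lower `alpha` with a positive `beta` washes the picture out, which roughly imitates fog, while a negative `beta` imitates low light; suitable values for real scenes would have to be chosen empirically.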
Preferably, in step S2, extracting the features of the original picture includes:
S21: collecting images and sending them into the VGG16 network, where they pass in sequence through five convolutional stages (Conv1_x to Conv5_x), pooling layers and two fully connected layers, so that the features of the images are extracted;
S22: fusing the Conv4_x feature layer and the Conv5_x feature layer, and applying the fused features directly to the last prediction layer, that is, the fused features serve as part of the input of the last prediction layer.
Preferably, according to the invention, the Conv4_x feature layer and the Conv5_x feature layer are fused either by vector concatenation or by element-wise addition.
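The two fusion modes, vector splicing (concatenation) and corresponding element addition, can be sketched with NumPy as follows. The text does not say how the two feature layers are brought to the same spatial size, so the nearest-neighbour upsampling used here is an assumption, as are the function names and the example shapes (512 × 38 × 38 and 512 × 19 × 19, typical for a 300 × 300 input).

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a C x H x W feature map (assumed resizing step)."""
    return np.repeat(np.repeat(feat, factor, axis=1), factor, axis=2)

def fuse(conv4, conv5, mode="concat"):
    """Fuse two C x H x W feature maps by channel concatenation or element-wise addition."""
    factor = conv4.shape[1] // conv5.shape[1]
    conv5_up = upsample_nearest(conv5, factor)
    if conv5_up.shape[1:] != conv4.shape[1:]:
        raise ValueError("spatial sizes must match after upsampling")
    if mode == "concat":
        # Vector splicing: stack the channels of both layers.
        return np.concatenate([conv4, conv5_up], axis=0)
    if mode == "add":
        # Corresponding element addition: requires equal channel counts.
        if conv4.shape[0] != conv5_up.shape[0]:
            raise ValueError("channel counts must match for addition")
        return conv4 + conv5_up
    raise ValueError("mode must be 'concat' or 'add'")

# Hypothetical feature maps standing in for Conv4_x and Conv5_x outputs.
conv4 = np.ones((512, 38, 38), dtype=np.float32)
conv5 = np.full((512, 19, 19), 2.0, dtype=np.float32)
fused_concat = fuse(conv4, conv5, mode="concat")   # shape (1024, 38, 38)
fused_add = fuse(conv4, conv5, mode="add")         # shape (512, 38, 38)
```

Concatenation keeps both sets of features separate at the cost of doubling the channel count, whereas addition keeps the channel count unchanged.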
Preferably, the method for determining the accurate position of the bounding box in step S3 includes:
S31: dividing the picture feature layer extracted in step S2 into an 8 × 8 or 4 × 4 grid;
S32: generating a series of fixed-size bounding boxes for each grid cell on the feature layer, each bounding box including at least 5 prediction parameters: x, y, w, h and conf, wherein (x, y) represents the center coordinates of the bounding box relative to the grid cell, (w, h) represents the width and height predicted relative to the entire image, and conf represents the IOU value between the bounding box and any one of the label boxes;
S33: on each feature layer, using a series of convolution kernels to produce a series of fixed-size predicted values.
Preferably, in step S33, for an m × n feature layer with p channels, the size of the convolution kernel used is 3 × 3 × p.
According to a further embodiment of the invention, each predicted value is either a confidence score for a category or a position offset of the bounding box.
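As an illustration of steps S31 and S32, the following sketch generates fixed-size bounding boxes for every cell of an 8 × 8 or 4 × 4 grid. The scale values and aspect ratios are hypothetical, since the text does not specify them, and the box centers are expressed here in coordinates normalized to the whole image rather than to the grid cell.

```python
import math

def default_boxes(grid_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (cx, cy, w, h) boxes, normalized to [0, 1], for every grid cell.

    scale and aspect_ratios are assumed values chosen for illustration.
    """
    boxes = []
    for row in range(grid_size):
        for col in range(grid_size):
            # Center of the current grid cell.
            cx = (col + 0.5) / grid_size
            cy = (row + 0.5) / grid_size
            for ar in aspect_ratios:
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes

boxes_8 = default_boxes(8, scale=0.2)   # finer grid, smaller boxes
boxes_4 = default_boxes(4, scale=0.4)   # coarser grid, larger boxes
```

With three aspect ratios per cell, the 8 × 8 grid yields 192 boxes and the 4 × 4 grid yields 48, so the finer grid contributes most of the candidates for small targets.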
Preferably, the method for finding the optimal target bounding box in step S4 using non-maximum suppression includes:
S41: pairing each bounding box with all label boxes: when the intersection-over-union between the two is larger than a threshold, the two form a matched sample; preferably, the threshold is set between 0.5 and 0.9;
S42: sorting the bounding boxes whose prediction results are negative samples at the position of each object in the original image by confidence from large to small, and selecting a number of positive and negative samples so that their ratio is about 1:3;
S43: $x_{ij}^{k} = 1$ indicates that the i-th bounding box is matched with the j-th label box of category k; otherwise $x_{ij}^{k} = 0$ and the two are not matched;
S44: the total target loss function is obtained by weighted summation of the position loss (loc) and the confidence loss (conf):
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
In the above formula, x represents whether the bounding box and the label box are matched (x is 1 if matched, otherwise 0), c represents the category score of the bounding box, l represents the bounding box, g represents the label box, $\alpha$ is the weight coefficient, and N is the number of matched bounding boxes; if N is 0, the loss value of the objective function is set to 0. The position loss (loc) is the smooth L1 loss between the parameters of the bounding box (l) obtained after step S33 and those of the label box (g):
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$
$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$
In the above formulas, d represents a bounding box that has passed through step S32 but not yet through step S33, (cx, cy) represents the center coordinates of d, (w, h) represents the width and height of d, and $\hat{g}$ represents the label box after encoding relative to d, including the log processing of its width and height.
The confidence loss (conf) is obtained by performing a softmax operation on the confidences of the multiple categories:
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{k} \log\left(\hat{c}_i^{k}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right), \quad \hat{c}_i^{k} = \frac{\exp\left(c_i^{k}\right)}{\sum_{k'} \exp\left(c_i^{k'}\right)}$$
In the above formula, $c_i^{k}$ is the category score indicating that the i-th bounding box belongs to the k-th category, $\hat{c}_i^{k}$ is $c_i^{k}$ after softmax processing, the second sum runs over the negative samples, and category 0 denotes the background;
S45: training iteratively until the model converges, and saving the weight parameters of the feature layers.
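The pairing of bounding boxes with label boxes and the selection of negative samples described in steps S41 and S42 can be sketched as follows. This is a simplified illustration: boxes are assumed to be given as (xmin, ymin, xmax, ymax) corner coordinates, the function names are hypothetical, the threshold of 0.5 is one value from the preferred range, and the 1:3 ratio is enforced by keeping at most three negatives per positive.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match(boxes, labels, threshold=0.5):
    """Map each bounding box index to its best label box index if the IoU exceeds threshold."""
    matches = {}
    if not labels:
        return matches
    for i, box in enumerate(boxes):
        best_j, best_iou = max(((j, iou(box, g)) for j, g in enumerate(labels)),
                               key=lambda pair: pair[1])
        if best_iou > threshold:
            matches[i] = best_j
    return matches

def mine_hard_negatives(neg_confidences, num_pos, ratio=3):
    """Keep the negatives with the largest confidence, at most ratio * num_pos of them.

    neg_confidences maps a bounding box index to its confidence.
    """
    ranked = sorted(neg_confidences, key=neg_confidences.get, reverse=True)
    return ranked[:ratio * num_pos]

# Hypothetical data: six candidate boxes and one label box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60),
         (30, 30, 40, 40), (70, 70, 80, 80), (20, 0, 30, 10)]
labels = [(0, 0, 10, 10)]
positives = match(boxes, labels)
neg_conf = {2: 0.9, 3: 0.2, 4: 0.6, 5: 0.4}
negatives = mine_hard_negatives(neg_conf, num_pos=len(positives))
```

Ranking the negatives by confidence keeps the background boxes that the model is most likely to confuse with targets, which is what makes them useful for training.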
Preferably, according to the present invention, the process of performing intelligent identification and early warning of abnormal events with the model applied on the Android terminal in step S5 includes:
S51: acquiring an original image with the high-definition camera of the Android device, adjusting the brightness and contrast of the image to appropriate values, and denoising and enhancing the image;
S52: inputting the image into the model, and extracting the image features with the trained VGG16 network;
S53: generating a series of bounding boxes on the extracted picture feature layer, regressing all bounding boxes to the correct positions through the trained SSD network, and predicting the correct category of each bounding box; the confidence score of each bounding box is obtained by multiplying the classification information $\Pr(\mathrm{Class}_i \mid \mathrm{Object})$ of each grid cell by the confidence information conf of the bounding box:
$$\mathrm{score}_i = \Pr(\mathrm{Class}_i \mid \mathrm{Object}) \times \mathrm{conf}$$
S54: using non-maximum suppression to remove redundant bounding boxes, and displaying the detection result on the original image; if an anomaly is detected, an alarm is triggered.
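The removal of redundant bounding boxes by non-maximum suppression can be sketched in plain Python as follows. Boxes are assumed to be given as (xmin, ymin, xmax, ymax) corner coordinates, and the overlap threshold of 0.5 is an illustrative value that the text does not specify.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept after greedy non-maximum suppression."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Hypothetical detections: the first two boxes cover the same target.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed, while the third box, which covers a different target, is kept.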
Advantageous Effects
The invention compresses the model and ports it to the mobile terminal, so that the model can autonomously identify abnormalities and give early warning at the front end. This fully exploits the advantages of edge computing and saves both data transmission time and the computing resources of the back-end server. By fusing picture feature layers in the VGG16 network, the invention enables the model to make full use of the information surrounding a small target, thereby improving the accuracy of small-target detection. The invention also uses a series of small convolution kernels on the picture feature layers to regress the accurate positions of the bounding boxes, which further improves the detection speed. In addition, the invention eliminates redundant bounding boxes by non-maximum suppression, so that the detection result is more concise. Finally, as a data enhancement technique, the invention adjusts the brightness and contrast of the pictures and applies Gaussian blur to simulate how pictures look in different scenes, thereby enhancing the generalization capability of the model.
Drawings
FIG. 1 is a flow diagram of the model framework of the present invention;
FIG. 2 is a framework flowchart of the present invention in which Conv4_x and Conv5_x are fused by vector concatenation;
FIG. 3 is a framework flowchart of the present invention in which Conv4_x and Conv5_x are fused by element-wise addition;
FIG. 4 is a schematic diagram of application example 1 of the present invention;
FIG. 5 is a schematic diagram of application example 2 of the present invention;
FIG. 6 is a schematic diagram of application example 3 of the present invention.
Detailed Description
The invention is described in detail below with reference to the following example and the accompanying drawings, but is not limited thereto.
Example
An intelligent identification and early warning method for abnormal events in open scenes of the electric power field based on edge computing, characterized by comprising the following steps:
S1: performing image enhancement on training data from different scenes, and labeling the training data with a labeling tool to obtain xml files. Labeling means manually determining the position of each target to be detected (such as a crane, construction machinery or a tower crane) in every training picture, framing each target with a rectangular box using the labeling tool, and setting an attribute value on each rectangular box to indicate the category of the target inside it. When the model is trained in the subsequent step S4, it can thus learn which kind of target appears at which position in which picture;
S2: using a VGG16 network as the base network to extract features from the original picture, fusing the Conv4_x and Conv5_x feature layers of the VGG16 network, and applying the fused features to the prediction layer of the final model; this step enables the model trained through steps S2, S3 and S4 to make full use of the information surrounding a small target;
S3: adding further network layers after the base network, predicting on these layers the targets and the scores of the categories to which they belong, and at the same time using small convolution kernels on the feature layers to regress the accurate positions of a series of bounding boxes; the categories are the four categories set in advance: 1) normal conditions; 2) crane; 3) construction machine; 4) tower crane; for example, if the predicted score of a target for the crane category is 0.7, the model considers the probability that the target is a crane to be 70%;
S4: for the large number of bounding boxes generated at the same target position, using non-maximum suppression to find the optimal target bounding box and eliminate redundant bounding boxes, and training the model; here, the target position is a target area on the picture obtained after the originally captured picture has been processed in step S1;
S5: applying the trained model on the mobile terminal to perform intelligent identification and early warning of abnormal events. Preferably, the mobile terminal is an Android terminal.
The image enhancement method in step S1 includes:
S11: setting different brightness and/or contrast for an original picture, and applying Gaussian blur, to simulate different scenes;
S12: labeling the pictures processed in step S11 with the labeling tool; the present invention uses 4 categories: 1) normal conditions; 2) crane; 3) construction machine; 4) tower crane;
S13: for each picture processed in step S11, randomly performing one of the following operations:
1) using the original picture;
2) sampling a region according to a sampling rule, wherein the sampling rule is: the minimum intersection-over-union between the region and the objects is a value selected at random between 0 and 1;
3) randomly sampling a region;
S14: the size of each sampled region is between 0.1 and 1 times the size of the original image, and its aspect ratio is between 0.5 and 2; when the center of a label box lies within the sampled region, the overlapping part of the label box is kept;
S15: after the sampling step, each sampled region is resized to a fixed size and flipped horizontally with a probability of 0.5.
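The geometric part of steps S13 to S15 can be sketched as follows. The sketch only handles box coordinates, not pixel data, and it omits the minimum-overlap rule and the resize to a fixed size; the function names, the retry limit and the fallback to the full image are assumptions made for illustration. Boxes are (xmin, ymin, xmax, ymax) in pixels.

```python
import random

def sample_patch(img_w, img_h, rng, max_tries=50):
    """Sample a region of 0.1 to 1 times the image size with aspect ratio in [0.5, 2]."""
    for _ in range(max_tries):
        w = rng.uniform(0.1, 1.0) * img_w
        h = rng.uniform(0.1, 1.0) * img_h
        if 0.5 <= w / h <= 2.0:
            x0 = rng.uniform(0.0, img_w - w)
            y0 = rng.uniform(0.0, img_h - h)
            return (x0, y0, x0 + w, y0 + h)
    # Assumed fallback: use the whole image if no valid region was found.
    return (0.0, 0.0, float(img_w), float(img_h))

def keep_boxes(boxes, patch):
    """Keep label boxes whose center lies in the patch, clipped and shifted to patch coordinates."""
    px0, py0, px1, py1 = patch
    kept = []
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if px0 <= cx <= px1 and py0 <= cy <= py1:
            kept.append((max(x0, px0) - px0, max(y0, py0) - py0,
                         min(x1, px1) - px0, min(y1, py1) - py0))
    return kept

def hflip_boxes(boxes, width):
    """Mirror boxes horizontally inside a region of the given width."""
    return [(width - x1, y0, width - x0, y1) for x0, y0, x1, y1 in boxes]

def maybe_flip(boxes, width, rng, p=0.5):
    """Flip horizontally with probability p."""
    return hflip_boxes(boxes, width) if rng.random() < p else boxes

rng = random.Random(0)                      # fixed seed for a repeatable example
patch = sample_patch(300, 300, rng)
label_boxes = [(10, 10, 50, 50)]            # hypothetical label box
in_patch = keep_boxes(label_boxes, (0, 0, 40, 40))
augmented = maybe_flip(in_patch, 40, rng)
```

In a full pipeline the same crop and flip would also be applied to the pixel array, so that the picture and its label boxes stay aligned.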
In step S2, extracting the features of the original picture includes:
S21: collecting images and sending them into the VGG16 network, where they pass in sequence through five convolutional stages (Conv1_x to Conv5_x), pooling layers and two fully connected layers, so that the features of the images are extracted;
S22: fusing the Conv4_x feature layer and the Conv5_x feature layer, and applying the fused features directly to the last prediction layer, that is, the fused features serve as part of the input of the last prediction layer.
The Conv4_x feature layer and the Conv5_x feature layer are fused either by vector concatenation or by element-wise addition, as shown in FIG. 2 and FIG. 3 respectively.
The method for determining the accurate position of the bounding box in step S3 includes:
S31: dividing the picture feature layer extracted in step S2 into an 8 × 8 or 4 × 4 grid;
S32: generating a series of fixed-size bounding boxes for each grid cell on the feature layer, each bounding box including at least 5 prediction parameters: x, y, w, h and conf, wherein (x, y) represents the center coordinates of the bounding box relative to the grid cell, (w, h) represents the width and height predicted relative to the entire image, and conf represents the IOU value between the bounding box and any one of the label boxes;
S33: on each feature layer, using a series of convolution kernels to produce a series of fixed-size predicted values. The purpose of this step is to compute the parameters x, y, w, h and conf of the bounding boxes generated in step S32, that is, to move each bounding box to an appropriate position and compute its category scores. The most prominent advantage of this approach is its high speed.
In step S33, for an m × n feature layer with p channels, the size of the convolution kernel used is 3 × 3 × p.
Each predicted value is either a confidence score for a category or a position offset of the bounding box.
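To make the role of the 3 × 3 × p kernels concrete, the sketch below counts the kernels and outputs needed on one feature layer and shows how predicted offsets move a bounding box. It assumes the standard SSD formulation, in which every bounding box receives one score per category plus 4 position offsets; the number of boxes per grid cell (3) and the function names are illustrative assumptions, while the 4 categories come from the description.

```python
import math

def predictor_shape(m, n, p, boxes_per_cell, num_classes):
    """Count kernels, kernel weights and outputs for an m x n feature layer with p channels."""
    # One 3 x 3 x p kernel per predicted value: num_classes scores + 4 offsets per box.
    kernels = boxes_per_cell * (num_classes + 4)
    weights = kernels * 3 * 3 * p
    outputs = kernels * m * n
    return kernels, weights, outputs

def decode(default_box, offsets):
    """Apply predicted offsets to a (cx, cy, w, h) box generated in step S32."""
    d_cx, d_cy, d_w, d_h = default_box
    t_cx, t_cy, t_w, t_h = offsets
    return (d_cx + t_cx * d_w,          # shift the center by a fraction of the box size
            d_cy + t_cy * d_h,
            d_w * math.exp(t_w),        # rescale width and height
            d_h * math.exp(t_h))

kernels, weights, outputs = predictor_shape(8, 8, 512, boxes_per_cell=3, num_classes=4)
moved = decode((0.5, 0.5, 0.2, 0.2), (1.0, 0.0, 0.0, 0.0))
```

For an 8 × 8 layer with 512 channels this gives 24 kernels and 1536 predicted values, all produced in a single convolutional pass, which is consistent with the speed advantage noted above.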
The method for finding the optimal target bounding box by using non-maximum suppression in step S4 includes:
S41: pairing each bounding box with all label boxes: when the intersection-over-union between the two is larger than a threshold, the two form a matched sample; preferably, the threshold is set between 0.5 and 0.9;
S42: among the bounding boxes whose prediction results are negative samples at the position of each object in the original image, sorting by confidence from large to small and selecting those with the highest confidence, so that the ratio of positive to negative samples is about 1:3. Here, a series of bounding boxes is generated in step S32, moved to appropriate positions in step S33, and given their respective category scores; these bounding boxes are the prediction results. Positive and negative samples are defined as follows: each bounding box obtained after step S33 is compared with the label boxes; if its intersection-over-union with a label box is greater than 0.5, it is regarded as a positive sample, otherwise as a negative sample;
S43: $x_{ij}^{k} = 1$ indicates that the i-th bounding box is matched with the j-th label box of category k; otherwise $x_{ij}^{k} = 0$ and the two are not matched;
S44: the total target loss function is obtained by weighted summation of the position loss (loc) and the confidence loss (conf):
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
In the above formula, x represents whether the bounding box and the label box are matched (x is 1 if matched, otherwise 0), c represents the category score of the bounding box, l represents the bounding box, g represents the label box, and N is the number of matched bounding boxes; if N is 0, the loss value of the objective function is set to 0. The position loss (loc) is the smooth L1 loss between the parameters of the bounding box (l) obtained after step S33 and those of the label box (g):
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$
$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}$$
In the above formulas, d represents a bounding box that has passed through step S32 but not yet through step S33, (cx, cy) represents the center coordinates of d, (w, h) represents the width and height of d, and $\hat{g}$ represents the label box after encoding relative to d, including the log processing of its width and height.
The confidence loss (conf) is obtained by performing a softmax operation on the confidences of the multiple categories:
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{k} \log\left(\hat{c}_i^{k}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right), \quad \hat{c}_i^{k} = \frac{\exp\left(c_i^{k}\right)}{\sum_{k'} \exp\left(c_i^{k'}\right)}$$
In the above formula, $c_i^{k}$ is the category score indicating that the i-th bounding box belongs to the k-th category, $\hat{c}_i^{k}$ is $c_i^{k}$ after softmax processing, the second sum runs over the negative samples, and category 0 denotes the background.
A suitable value of the weight coefficient $\alpha$ can be determined by cross validation and generally lies between 0 and 1;
S45: training iteratively until the model converges, and saving the weight parameters of the feature layers. The resulting model can then be used for detection on the front-end device.
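The weighted sum of position loss and confidence loss described in step S44 can be sketched in plain Python as follows. The sketch follows the standard SSD loss, which matches the symbols defined above; the weight coefficient is called `alpha` here, index 0 is assumed to be the background (normal) category, and the data layout of the `pos` and `neg` arguments is a hypothetical choice made for this example.

```python
import math

def smooth_l1(x):
    """Smooth L1 loss of a scalar difference."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def encode(g, d):
    """Encode label box g relative to bounding box d, both given as (cx, cy, w, h)."""
    return ((g[0] - d[0]) / d[2], (g[1] - d[1]) / d[3],
            math.log(g[2] / d[2]), math.log(g[3] / d[3]))

def softmax(scores):
    """Numerically stable softmax over a list of category scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def total_loss(pos, neg, alpha=1.0):
    """Weighted sum of confidence and position loss, divided by the number of matches.

    pos: list of (predicted_offsets, label_box, bounding_box, class_scores, class_index)
    neg: list of class_scores for the selected negative samples
    """
    n = len(pos)
    if n == 0:
        return 0.0                      # no matched bounding box: loss is set to 0
    loc, conf = 0.0, 0.0
    for l, g, d, c, k in pos:
        g_hat = encode(g, d)
        loc += sum(smooth_l1(li - gi) for li, gi in zip(l, g_hat))
        conf -= math.log(softmax(c)[k])
    for c in neg:
        conf -= math.log(softmax(c)[0])  # negatives should predict category 0
    return (conf + alpha * loc) / n

# Hypothetical sample: perfect localization, uniform scores over 4 categories.
box = (0.5, 0.5, 0.2, 0.2)
sample = ((0.0, 0.0, 0.0, 0.0), box, box, [0.0, 0.0, 0.0, 0.0], 1)
loss = total_loss([sample], [])
```

With perfect localization the position loss vanishes, so the example loss reduces to the cross-entropy of a uniform prediction over four categories, log 4.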
The process of performing intelligent identification and early warning of abnormal events with the model applied on the Android terminal in step S5 includes:
S51: acquiring an original image with the high-definition camera of the Android device, adjusting the brightness and contrast of the image to appropriate values, and denoising and enhancing the image to improve the image quality and thereby the detection effect;
S52: inputting the image into the model, and extracting the image features with the trained VGG16 network;
S53: generating a series of bounding boxes on the extracted picture feature layer, regressing all bounding boxes to the correct positions through the trained SSD network, and predicting the correct category of each bounding box; the confidence score of each bounding box is obtained by multiplying the classification information $\Pr(\mathrm{Class}_i \mid \mathrm{Object})$ of each grid cell by the confidence information conf of the bounding box:
$$\mathrm{score}_i = \Pr(\mathrm{Class}_i \mid \mathrm{Object}) \times \mathrm{conf}$$
S54: using non-maximum suppression to remove redundant bounding boxes, and displaying the detection result on the original image; if an anomaly is detected, for example a crane, a construction machine or a tower crane, an alarm is triggered.
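The scoring and alarm logic of steps S53 and S54 can be sketched as follows. The score is taken as the product of the class probability and the bounding box confidence, as described in step S53; the score threshold of 0.5, the function names and the tuple layout of a detection are assumptions made for illustration.

```python
# Categories that count as abnormal events, taken from the description.
ABNORMAL = {"crane", "construction machine", "tower crane"}

def box_score(class_prob, box_conf):
    """Confidence score of a bounding box: class probability times box confidence."""
    return class_prob * box_conf

def find_alarms(detections, threshold=0.5):
    """Return (category, score) for every abnormal detection at or above the threshold.

    detections: list of (category, class_prob, box_conf) tuples (assumed layout).
    """
    alarms = []
    for category, class_prob, box_conf in detections:
        score = box_score(class_prob, box_conf)
        if category in ABNORMAL and score >= threshold:
            alarms.append((category, score))
    return alarms

# Hypothetical detections remaining after non-maximum suppression.
detections = [("crane", 0.9, 0.8),
              ("normal conditions", 0.95, 0.9),
              ("tower crane", 0.5, 0.5)]
alarms = find_alarms(detections)
trigger_alarm = len(alarms) > 0
```

Here only the crane detection exceeds the threshold, so a single alarm is raised; the threshold trades missed detections against false alarms and would need tuning on real data.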
Application Example 1
A specific application of the identification and early warning method of the present invention is shown in FIG. 4.
By fusing the feature extraction layers, the model can make full use of the information surrounding a small target, which improves the localization of small targets, as shown by the box in FIG. 4.
Application Example 2
A specific application of the identification and early warning method of the present invention is shown in FIG. 5.
Thanks to the image enhancement technology, the model still achieves a good detection effect in foggy or rainy scenes.
Application Example 3
A specific application of the identification and early warning method of the present invention is shown in FIG. 6.
Thanks to the image enhancement technology, the model can accurately detect abnormal targets in low-light conditions.