CN111091069A

CN111091069A - Power grid target detection method and system guided by blind image quality evaluation

Info

Publication number: CN111091069A
Application number: CN201911182943.XA
Authority: CN
Inventors: 李仕林; 赵旭; 李梅玉; 李宏杰; 韩凯; 孙晨曦; 马启林
Original assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Current assignee: Electric Power Research Institute of Yunnan Power Grid Co Ltd
Priority date: 2019-11-27
Filing date: 2019-11-27
Publication date: 2020-05-01

Abstract

The application belongs to the technical field of image data processing, and in particular relates to a power grid target detection method and system guided by blind image quality evaluation. The image recognition technology currently used in the power industry suffers from low accuracy and from poor algorithm robustness and extensibility. The application provides a power grid target detection method and system guided by blind image quality evaluation. The safety behavior of workers is analyzed from detected features such as the relative position of the safety equipment target and the human body target, their feature representations, and the human body contour representation, so that management standards and personnel safety are ensured. The method and system filter out low-quality images, overcome interference such as changes in viewing angle, illumination and motion, and improve detection precision; because images from any source can be preprocessed, the application range is wide and the preprocessing time is shortened.

Description

Power grid target detection method and system guided by blind image quality evaluation

Technical Field

The application relates to the technical field of image data processing, in particular to a power grid target detection method and system guided by blind image quality evaluation.

Background

In the power industry, operation safety has always been a major concern because of the high voltages present at power grid sites. To ensure that workers wear their labor protection equipment and follow working rules, a manual inspection is generally carried out before workers go on site, so as to prevent accidents caused by incomplete safety protection.

In busy and stressful day-to-day work, however, manual self-checking or mutual checking inevitably suffers from negligence and omissions, and labor protection equipment is sometimes not worn completely, which exposes workers to high risk. With the development of computer technology, computer vision can identify whether field workers are wearing their labor protection equipment conveniently and efficiently, and it has become an important way of assisting manual inspection.

Common data sets for computer image recognition, such as PASCAL VOC, ImageNet and MS COCO, mostly consist of higher-quality images. These data sets are generally used by researchers to test the performance of target detection algorithms or for related competitions. As target detection is an active field of image processing, excellent algorithms such as R-CNN, Fast R-CNN, Faster R-CNN, YOLO and SSD have emerged in succession and achieve excellent performance on these static data sets.

However, in a real environment, images obtained from video are often distorted and degraded, and the accuracy of these algorithms drops accordingly. In actual enterprise production, images from any source need to be screened in order to obtain a better recognition result, and the robustness and extensibility of the image processing algorithm must be maintained while recognition accuracy is improved. Meeting this practical requirement is a technical problem that urgently needs to be solved.

Disclosure of Invention

The application provides a power grid target detection method and system guided by blind image quality evaluation, and aims to solve the problems of low target identification accuracy and poor algorithm robustness and expansibility in the current power industry.

The technical scheme adopted by the application is as follows:

in a first aspect of the present application, a power grid target detection method guided by blind image quality evaluation is provided, which includes the following steps:

screening an input monitoring video sequence by training a blind image quality evaluation network, and selecting a video frame with the quality meeting a preset requirement;

performing target detection on the video frames with the quality meeting the preset requirement by training a target detection network;

and analyzing the result of the target detection to obtain a conclusion of the safety state of the target human body.

Optionally, the step of screening the input surveillance video sequence and selecting a video frame with a quality meeting a preset requirement by training the blind image quality evaluation network includes:

the method comprises the steps of creating a blind image quality evaluation data set by using a monitoring video acquired on the site of a power grid and combining a data enhancement technology;

training a blind image quality evaluation network according to the blind image quality evaluation data set;

separating each frame of image of an input monitoring video sequence, inputting each frame of image into a blind image quality evaluation network for evaluation, setting a quality threshold value, extracting images with the quality value larger than or equal to the quality threshold value, and arranging the images into an image set according to the original time sequence.

Optionally, in the step of training the blind image quality evaluation network according to the blind image quality evaluation data set, a convolutional neural network is trained on the blind image quality evaluation data set; the network is based on the VGG-F model, which has low computational complexity, uses convolution kernels with a large stride, and uses zero padding in all convolutional layers.

Optionally, in the step of creating an image quality evaluation data set by using the monitoring video acquired in the power grid field and combining with a data enhancement technology, the method includes:

building the image quality evaluation data set from common dedicated image quality evaluation data sets, each of which comprises images and their quality scores, wherein the common dedicated image quality evaluation data sets include the LIVE data set, the CSIQ data set and the TID2013 data set.

Optionally, in the step of performing target detection on the video frame with quality meeting the preset requirement by training the target detection network, the method includes:

constructing a target training set of a target human body, a safety helmet and a safety suit in a power grid field;

training a target detection network according to the target training set;

inputting the image sets arranged according to the original time sequence into a target detection network for target detection to obtain the confidence degrees and frame information of all targets, determining that the targets exist when the confidence degrees exceed a preset threshold value, setting the targets as quasi-targets, and reserving the frames of the targets;

and removing the redundant boxes by using a non-maximum suppression algorithm according to the quasi-target information to obtain a final detection result of the target information and the frame information.

Optionally, in the step of analyzing the result of the target detection to obtain the conclusion of the safety state of the target human body, the method includes:

and analyzing the safety state of the target human body according to the detection result of the final target information and the frame information to obtain the conclusion whether the posture of the target human body is safe or not and whether the wearing of the safety helmet and the safety clothes is safe or not.

Optionally, the step of analyzing the safety state of the target human body includes:

judging the relative relation between the safety helmet and the target human body: the safety helmet is considered correctly worn when it is detected in the vertical or approximately vertical direction of the target human body and the frames of the target human body and the safety helmet intersect; otherwise, abnormal safety helmet behavior is reported;

judging whether the target human body is wearing safety clothing: samples of human bodies wearing safety clothing are collected, their deep convolutional features are extracted, and a support vector machine (a machine learning method) is used to train a binary classification model offline; during online monitoring, the deep convolutional features within the detected frame of the target human body are extracted and input into the pre-trained binary classifier, which judges whether the target human body is wearing safety clothing, giving the conclusion of whether the wearing of safety clothing is safe;

and judging whether the posture of the target human body is safe: when the width-to-height ratio of the human body frame is greater than 1 and this remains the case for 10 consecutive seconds, a fall accident is determined to have occurred, giving the conclusion that the posture of the target human body is unsafe.
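The helmet and posture rules above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: frames are taken to be `(x, y, w, h)` tuples with the origin at the upper left corner, "approximately vertical" is interpreted as the helmet's horizontal center lying within the horizontal span of the body frame, and the function names and the per-frame tracking input are hypothetical.

```python
def boxes_intersect(a, b):
    """True if two (x, y, w, h) frames overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def helmet_worn(person, helmet):
    """Helmet rule: roughly vertically aligned with the body and frames intersect.

    Assumption: 'approximately vertical' means the helmet's horizontal
    center falls inside the horizontal span of the body frame.
    """
    helmet_center_x = helmet[0] + helmet[2] / 2
    aligned = person[0] <= helmet_center_x <= person[0] + person[2]
    return aligned and boxes_intersect(person, helmet)


def fall_detected(person_boxes, fps, hold_seconds=10.0):
    """Posture rule: width/height > 1 held for hold_seconds without a break.

    person_boxes is a per-frame list of (x, y, w, h) for one person,
    with None where the person was not detected (assumed to break the run).
    """
    needed = int(round(hold_seconds * fps))
    run = 0
    for box in person_boxes:
        # w > h is equivalent to w / h > 1 for a positive height
        if box is not None and box[3] > 0 and box[2] > box[3]:
            run += 1
            if run >= needed:
                return True
        else:
            run = 0
    return False


person = (100, 50, 60, 180)
print(helmet_worn(person, (110, 40, 40, 30)))   # helmet overlapping the head area
print(helmet_worn(person, (300, 40, 40, 30)))   # helmet far from the body
print(fall_detected([(0, 0, 180, 60)] * 20, fps=2))
```

Counting consecutive frames is one simple way to express "maintained for 10 seconds"; a deployed system would also need to associate frames with the same person over time, which the text does not describe.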

Optionally, after the step of analyzing the result of the target detection to obtain the conclusion of the safety state of the target human body, the method further includes:

and triggering an alarm at the working site to give an alarm when the conclusion of the safety state of the target human body is negative.

In a second aspect of the application, a power grid target detection system guided by blind image quality evaluation is provided, which comprises a blind image quality evaluation module, a target detection module and a target human body safety analysis module;

the blind image quality evaluation module is used for screening the input monitoring video sequence by training a blind image quality evaluation network and selecting a video frame with the quality meeting the preset requirement;

the target detection module is used for carrying out target detection on the video frames with the quality meeting the preset requirement through training a target detection network;

and the target human body safety analysis module is used for analyzing the result of target detection to obtain the safety state conclusion of the target human body.

Optionally, the blind image quality evaluation module is configured to train a blind image quality evaluation network according to a blind image quality evaluation data set, where the blind image quality evaluation module is further configured to train the blind image quality evaluation data set by using a convolutional neural network, select a large-step convolutional kernel based on a VGG-F model with low computational complexity, and use zero padding for all convolutional layers.

The technical scheme of the application has the following beneficial effects:

according to the power grid target detection method guided by blind image quality evaluation, a blind image quality evaluation network is trained, the input monitoring video sequence is screened, and video frames whose quality meets the preset requirement are selected. Because the method preprocesses images from any source, a training set can be constructed from real power grid image data and features can be learned with a convolutional neural network, so that the features are represented more accurately and the method is better suited to the power grid environment. The blind image quality evaluation network filters out low-quality images, overcomes interference such as changes in viewing angle, illumination and motion, and improves detection precision. Analyzing the safety behavior of staff helps ensure management standards and personnel safety and allows personnel safety accidents to be handled promptly. The method can preprocess images from any source and therefore has a wide application range, and the improved preprocessing shortens the preprocessing time and saves time cost.

Drawings

In order to more clearly explain the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a block flow diagram of one embodiment provided by a first aspect of the present application;

FIG. 2 is a block flow diagram of another embodiment provided by the first aspect of the present application;

FIG. 3 is a schematic structural diagram of an embodiment provided in the second aspect of the present application.

Detailed Description

Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following examples do not represent all embodiments consistent with the present application; they are merely examples of systems and methods consistent with certain aspects of the application, as recited in the claims.

Referring to fig. 1, for facilitating understanding of the first aspect of the present application, a blind image quality evaluation guided power grid target detection method is provided, which includes the following steps:

s101, screening an input monitoring video sequence by training a blind image quality evaluation network, and selecting a video frame with quality meeting a preset requirement;

s102, performing target detection on the video frame with the quality meeting the preset requirement through a training target detection network;

and S103, analyzing the target detection result to obtain a safety state conclusion of the target human body.

The images with the quality meeting the preset requirement are selected by preprocessing any source images, so that the method can be suitable for any field video images, and has wide compatibility with images shot by different places, different power grids or different monitoring devices.

s1001, creating a blind image quality evaluation data set by using a monitoring video acquired on the site of a power grid and combining a data enhancement technology;

s1002, training a blind image quality evaluation network according to the blind image quality evaluation data set;

s1003, separating each frame of image of the input monitoring video sequence, inputting each frame of image into a blind image quality evaluation network for evaluation, setting a quality threshold, extracting images with the quality value larger than or equal to the quality threshold, and arranging the images into an image set according to the original time sequence.

Referring to FIG. 2, in the present embodiment the method is combined with the power grid site, which makes it more practical and reflects its specialization. A quality threshold is set to screen each frame, so that poor-quality images are filtered out at an early stage and only those frames of the monitoring video sequence that are useful for recognition are extracted. This improves the accuracy of the subsequent target recognition at the power grid site.
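The screening step can be illustrated with a minimal Python sketch. It assumes a scoring function that returns one quality value per frame; here a stand-in lambda reads a precomputed number, whereas in the described method the score would come from the trained blind image quality evaluation network. The names `screen_frames` and `score_fn`, the toy frames and the threshold of 70 are hypothetical.

```python
def screen_frames(frames, score_fn, quality_threshold):
    """Keep frames whose quality score is >= the threshold.

    Frames are visited in their original order, so the returned
    (index, frame) pairs stay arranged in the original time sequence.
    """
    kept = []
    for index, frame in enumerate(frames):
        if score_fn(frame) >= quality_threshold:
            kept.append((index, frame))
    return kept


# Toy frames with a made-up, precomputed quality value.
frames = [
    {"id": 0, "quality": 82.0},
    {"id": 1, "quality": 55.5},
    {"id": 2, "quality": 70.0},
    {"id": 3, "quality": 91.2},
]
selected = screen_frames(frames, lambda f: f["quality"], quality_threshold=70.0)
print([index for index, _ in selected])
```

Returning the original frame index alongside each frame keeps the timing information that later steps, such as a posture rule based on elapsed seconds, would need.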

In this embodiment, a convolutional neural network based on the VGG-F model with low computational complexity is used, with convolution kernels of large stride, which reduces the resolution of the feature maps and the computational complexity and thereby shortens the data processing time; in addition, using zero padding in all convolutional layers ensures that the resolution is not affected by the convolution itself.

training a target detection network according to the target training set;

In this embodiment, real monitoring video data from a power grid environment must be collected. Video frames are distorted by imitating the synthetic distortion operations used in image quality evaluation databases, and a blind image quality evaluation training set is established; the video frames are also labeled manually to distinguish different objects such as the target human body, and a target training set is established. The target detection network is trained on the target training set, and target detection is then performed on the screened image set arranged in the original time order to obtain the confidence and frame information of all targets. A preset confidence threshold is used to eliminate interfering detections so that targets are identified accurately, and a non-maximum suppression algorithm removes redundant frames, which further ensures the accuracy of the final target information and frame information.

The purpose of the application is to detect whether the posture of workers at a power grid site is safe, whether they are wearing safety helmets, and whether they are wearing safety clothing; the analysis therefore draws the corresponding conclusions on whether the posture of the target human body is safe and whether the wearing of the safety helmet and the safety clothing is safe.

This embodiment shows one method and one set of criteria for judging the above three states. Those skilled in the art can readily conceive of other settings and values, all of which fall within the scope of the present application.

In the embodiment, when the conclusion of the safety state of the target human body is negative, an alarm of a working site is triggered to give an alarm. The system is beneficial to immediate reaction in the presence of unsafe conditions, so that the safety risk of field workers of the power grid is further reduced, and casualty accidents of the workers are reduced.

Referring to fig. 3, for convenience of understanding of the second aspect of the present application, the power grid target detection system guided by blind image quality evaluation provided includes a blind image quality evaluation module, a target detection module, and a target human body safety analysis module;

In the embodiment of the application, the applicant carries out more detailed practical verification on the power grid target detection method guided by blind image quality evaluation. The verification comprises the following specific contents:

In the training stage of the blind image quality evaluation network, in order to shorten the preprocessing time, the network follows the idea of the VGG-F model with low computational complexity and uses convolution kernels with a large stride to rapidly reduce the resolution of the feature maps, thereby reducing the computational complexity. Specifically, when designing the CNN structure, the input image size is fixed at 224 × 224 × 3, following mainstream CNNs. The first convolutional layer uses 8 convolution kernels of size 7 × 7 and reduces the feature map from 224 × 224 to 56 × 56 (a fourfold reduction per side); a max pooling layer with a stride of 2 then reduces it further to 28 × 28. The second convolutional layer uses 16 convolution kernels of size 3 × 3 with a stride of 2 to extract more discriminative features, after which the resolution drops to 14 × 14. After another max pooling layer with a stride of 2, a bilinear pooling layer converts the 7 × 7 × 16 convolution response tensor into a 1 × 1 × 256 vector by computing its outer product, and this vector is used as the image representation. A 1 × 1 × 256 × 1 fully connected layer maps the image representation to a single predicted score, and the ℓ1 loss function is used to compute the error for learning. All convolutional layers use a zero-padding strategy to ensure that the resolution is not affected by the convolution itself. Table 1 shows the lightweight IQA convolutional neural network structure; following the style and conventions of the MEON network, the parameters of a convolutional layer are written as "feature map height × feature map width | number of input channels × number of output channels | convolution stride | amount of zero padding".
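The resolution figures quoted above can be checked with a short Python sketch. It assumes that zero padding makes each layer's output size equal to the input size divided by the stride (rounded up), and that the first convolution has a stride of 4, which is inferred from the stated 224 to 56 reduction rather than stated directly. The bilinear pooling function averages the channel outer products over all spatial positions; the averaging is an assumption, since the text only says the outer product is computed.

```python
import math


def padded_output_size(size, stride):
    """Output side length of a zero-padded layer: ceil(input / stride)."""
    return math.ceil(size / stride)


# (layer name, stride); the stride of 4 for conv1 is inferred, not quoted.
stages = [("conv1", 4), ("pool1", 2), ("conv2", 2), ("pool2", 2)]
size = 224
sizes = []
for name, stride in stages:
    size = padded_output_size(size, stride)
    sizes.append((name, size))
print(sizes)


def bilinear_pool(features):
    """Average the C x C outer product of each location's channel vector.

    features holds one channel vector per spatial location
    (7 * 7 = 49 vectors of 16 channels for the network described here).
    Returns a flat list of C * C values (16 * 16 = 256).
    """
    channels = len(features[0])
    pooled = [0.0] * (channels * channels)
    for vector in features:
        for i in range(channels):
            for j in range(channels):
                pooled[i * channels + j] += vector[i] * vector[j]
    return [value / len(features) for value in pooled]


toy_features = [[1.0] * 16 for _ in range(7 * 7)]
representation = bilinear_pool(toy_features)
print(len(representation))
```

The 256-dimensional output follows directly from the 16 channels of the second convolutional layer, since the outer product of a 16-vector with itself has 16 × 16 entries.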

TABLE 1 lightweight IQA convolutional neural network architecture

To make the lightweight IQA convolutional neural network better fit data from a real power grid, a real power grid data set was constructed in practice and used to fine-tune the IQA network. Twenty-five volunteers, 13 male and 12 female, were selected, all of them professional researchers from fields other than image quality evaluation. Based on the five-level evaluation criteria of the ITU-R CCIR500-1 document (the custom MOS range is given in brackets), a volunteer rates a power grid image as one of: "excellent (90-100)", "good (80-90)", "medium (70-80)", "poor (60-70)" or "bad (60 or less)". After all volunteers have completed the subjective evaluation, the ratings received by each image are tallied and the grade with the most votes is taken as the final rating. One hundred images were selected for each grade, 500 images in total, as the data set; 80% of them were used for fine-tuning and the remaining 20% for testing.

In actual operation, a monitoring video sequence from the real power grid is input into the fine-tuned IQA network for evaluation; images rated "poor" or "bad" are removed, and images rated "excellent", "good" or "medium" are retained for subsequent processing.
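The voting and filtering rules can be written down as a brief Python sketch. The grade names follow the five-level scale described here; the function names and the example votes are hypothetical, and the text does not say how ties between grades are resolved, so the sketch simply keeps whichever tied grade was seen first.

```python
from collections import Counter

# Grades that are retained for subsequent processing.
RETAINED_GRADES = {"excellent", "good", "medium"}


def final_grade(votes):
    """Return the grade chosen by the largest number of volunteers.

    Tie handling is not specified in the text; Counter.most_common
    keeps the first-seen grade among equal counts.
    """
    return Counter(votes).most_common(1)[0][0]


def keep_image(grade):
    """True if an image with this grade passes the screening."""
    return grade in RETAINED_GRADES


# Made-up votes from 25 volunteers for one image.
votes = ["good"] * 12 + ["medium"] * 8 + ["excellent"] * 5
grade = final_grade(votes)
print(grade, keep_image(grade))
```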

The method for detecting the target of the monitoring video sequence screened by the IQA network comprises the following steps:

Constructing a target training set: common data sets such as Pascal VOC, MS COCO and ImageNet are converted into the Pascal VOC2007 data set format, which comprises the images and a label file (XML format) for each image. The label file stores the frame information $(x^*, y^*, w^*, h^*)$ and the class of each target, where $(x^*, y^*)$ are the coordinates of the upper left corner of the frame and $(w^*, h^*)$ are the frame width and height.
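As an illustration of how such a label file might be read, the following Python sketch parses a small annotation in the Pascal VOC XML layout, where each `object` has a `name` and a `bndbox` with `xmin`, `ymin`, `xmax` and `ymax`, and converts every frame to the upper-left-corner, width and height form described above. The sample file content and the class names `person` and `helmet` are made up, and the width is computed as `xmax - xmin` without any inclusive-pixel adjustment, which is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the Pascal VOC XML layout.
SAMPLE_XML = """<annotation>
  <filename>frame_000123.jpg</filename>
  <object>
    <name>person</name>
    <bndbox><xmin>100</xmin><ymin>50</ymin><xmax>160</xmax><ymax>230</ymax></bndbox>
  </object>
  <object>
    <name>helmet</name>
    <bndbox><xmin>110</xmin><ymin>40</ymin><xmax>150</xmax><ymax>70</ymax></bndbox>
  </object>
</annotation>"""


def read_targets(xml_text):
    """Return a list of (class name, (x, y, w, h)) for every labeled target."""
    root = ET.fromstring(xml_text)
    targets = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        xmin = int(box.findtext("xmin"))
        ymin = int(box.findtext("ymin"))
        xmax = int(box.findtext("xmax"))
        ymax = int(box.findtext("ymax"))
        # Convert corner coordinates to upper-left corner plus width and height.
        targets.append((obj.findtext("name"), (xmin, ymin, xmax - xmin, ymax - ymin)))
    return targets


for name, frame in read_targets(SAMPLE_XML):
    print(name, frame)
```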

Training the target detection network: the convolutional neural network VGG16 is used as the base network of the detection network. The specific structure of VGG16 is shown in Table 2; it consists of 13 convolutional layers and 3 fully connected layers.

TABLE 2 convolutional neural network VGG16

A VGG16 model is trained on the target training set, the parameters are initialized, and an improved Faster R-CNN network is trained using the approximate joint training mode. The specific improvements are as follows:

the input image size is unified by scaling the short edge to 800, which improves the detection rate of small objects such as safety equipment; because the shooting distance in surveillance video varies, target sizes differ greatly, so the number of sliding window sizes is increased to 9: {32 × 32, 64 × 64, 96 × 96, 128 × 128, 160 × 160, 192 × 192, 224 × 224, 256 × 256, 288 × 288}.
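The two adjustments can be expressed as a small Python sketch. It assumes the aspect ratio is preserved when the short edge is scaled to 800 and that rounding to whole pixels is acceptable; neither detail is stated in the text, and the function name and the 1920 × 1080 example are hypothetical.

```python
def scale_short_edge(width, height, target_short=800):
    """Scale an image size so that its shorter side equals target_short.

    Returns (new_width, new_height, scale); the aspect ratio is kept,
    which is an assumption about how the resizing is done.
    """
    scale = target_short / min(width, height)
    return round(width * scale), round(height * scale), scale


# The nine square sliding-window sizes: 32, 64, ..., 288 pixels per side.
WINDOW_SIZES = [(side, side) for side in range(32, 289, 32)]

new_width, new_height, scale = scale_short_edge(1920, 1080)
print(new_width, new_height)
print(WINDOW_SIZES)
```

The window sizes form an arithmetic sequence with a step of 32, so the list can be generated rather than typed out.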

The loss function of the detection network consists of two parts, the classification loss $L_{cls}$ and the frame regression loss $L_{reg}$, and is defined as formula (one):

In the formula, $i \in \{1, 2, \ldots, N_{win}\}$ denotes the index of a sliding window, $N_{win}$ is the number of sliding windows used in one training batch, $N_{reg}$ is the number of coordinate positions of the sliding windows, $\lambda$ is a factor that balances the classification loss and the frame regression loss ($\lambda = 10$ in this embodiment), $p_i$ is the predicted probability that window $i$ contains a target, and $p_i^*$ is the true probability, which takes one value for positive samples and another for negative samples.

The real frame coordinate information of each target in the training samples is known as $(x^*, y^*, w^*, h^*)$. The sliding window frame information in the RPN network is $(x_r, y_r, w_r, h_r)$, where $(x_r, y_r)$ are the coordinates of the upper left corner of the sliding window and $(w_r, h_r)$ are the sliding window width and height. The frame information predicted by the RPN network is $(x, y, w, h)$, where $(x, y)$ are the coordinates of the upper left corner of the predicted frame and $(w, h)$ are its width and height. $t_i$ denotes the four-dimensional parameterized vector $(t_x, t_y, t_w, t_h)$ describing the coordinate relation between the predicted frame and the sliding window, and $t_i^*$ denotes the four-dimensional parameterized vector describing the coordinate relation between the real frame and the sliding window. The specific calculation formula is as follows:

The classification loss $L_{cls}$ in formula (one) uses the cross-entropy loss; $L_{cls}$ and the frame regression loss $L_{reg}$ are specifically defined by formulas (three) to (five):

The network is trained and adjusted by minimizing $L(\{p_i\}, \{t_i\})$. The target detection network uses stochastic gradient descent with back propagation for 100000 iterations; the learning rate is set to 0.001 for the first 50000 iterations and is reduced to 0.1 for the last 10000 iterations.
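The formulas referred to as (one) to (five) do not appear in this text. As a hedged reconstruction only, and assuming they follow the standard Faster R-CNN formulation (which uses the same symbols, the same weighting factor of 10 and the same RPN structure), they would take roughly the following form. The choice of the smooth L1 loss for the regression term, the values 1 and 0 for $p_i^*$, and the exact parameterization relative to the upper left corner are assumptions, not content recovered from the source.

```latex
% (one) total loss over the sliding windows of one batch
L(\{p_i\},\{t_i\}) = \frac{1}{N_{win}} \sum_{i} L_{cls}(p_i, p_i^*)
                   + \lambda \, \frac{1}{N_{reg}} \sum_{i} p_i^* \, L_{reg}(t_i, t_i^*),
\qquad p_i^* = \begin{cases} 1 & \text{positive sample} \\ 0 & \text{negative sample} \end{cases}

% (two) parameterization of the predicted and real frames relative to the sliding window
t_x = \frac{x - x_r}{w_r}, \quad t_y = \frac{y - y_r}{h_r}, \quad
t_w = \log\frac{w}{w_r}, \quad t_h = \log\frac{h}{h_r}
\\
t_x^* = \frac{x^* - x_r}{w_r}, \quad t_y^* = \frac{y^* - y_r}{h_r}, \quad
t_w^* = \log\frac{w^*}{w_r}, \quad t_h^* = \log\frac{h^*}{h_r}

% (three) cross-entropy classification loss
L_{cls}(p_i, p_i^*) = -\log\bigl[\, p_i^* p_i + (1 - p_i^*)(1 - p_i) \,\bigr]

% (four) frame regression loss summed over the four coordinates
L_{reg}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\bigl(t_{i,j} - t_{i,j}^*\bigr)

% (five) smooth L1 function
\mathrm{smooth}_{L_1}(z) = \begin{cases} 0.5\, z^2 & \text{if } |z| < 1 \\ |z| - 0.5 & \text{otherwise} \end{cases}
```

In this form the factor $p_i^*$ switches the regression term off for negative samples, so only windows matched to a real target contribute to the frame regression loss.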

The trained network is used to perform target detection on the monitoring video sequence, giving the confidence score and the predicted frame information (x, y, w, h) of each target. The confidence of each predicted frame is calculated by the softmax classifier; when the confidence score is greater than a set threshold (T = 0.75 in this embodiment), the frame is regarded as a quasi-target and the target frame is retained.
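The confidence thresholding described here, together with the non-maximum suppression step mentioned earlier in the description, can be sketched in Python as follows. The sketch treats all detections as one class and uses an intersection-over-union limit of 0.5 for suppression; both are assumptions, since the text gives only the confidence threshold T = 0.75. Frames are `(x, y, w, h)` tuples, and the function names and sample detections are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) frames."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (aw * ah + bw * bh - inter)


def filter_detections(detections, score_threshold=0.75, iou_threshold=0.5):
    """Keep quasi-targets above the confidence threshold, then apply NMS.

    detections is a list of (frame, confidence) pairs.
    The iou_threshold value is an assumed setting, not taken from the text.
    """
    quasi_targets = sorted(
        (d for d in detections if d[1] > score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for frame, score in quasi_targets:
        # Drop a frame if it overlaps too much with a higher-scoring kept frame.
        if all(iou(frame, other) <= iou_threshold for other, _ in kept):
            kept.append((frame, score))
    return kept


detections = [
    ((100, 50, 60, 180), 0.95),   # person
    ((104, 54, 60, 180), 0.80),   # near-duplicate of the first frame
    ((400, 60, 50, 170), 0.90),   # second person
    ((10, 10, 20, 20), 0.60),     # below the confidence threshold
]
final = filter_detections(detections)
print(final)
```

Processing frames in descending order of confidence means that, among overlapping frames, the most confident one is always the one retained.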

The embodiments provided in the present application are only a few examples of the general concept of the present application, and do not limit the scope of the present application. Any other embodiments extended according to the scheme of the present application without inventive efforts will be within the scope of protection of the present application for a person skilled in the art.

Claims

1. A power grid target detection method guided by blind image quality evaluation is characterized by comprising the following steps:

2. The blind image quality evaluation guided power grid target detection method according to claim 1, wherein in the step of screening the input monitoring video sequence and selecting the video frame with the quality meeting the preset requirement by training the blind image quality evaluation network, the method comprises:

3. The blind image quality evaluation guided power grid target detection method according to claim 2, characterized in that in the step of training the blind image quality evaluation network according to the blind image quality evaluation dataset, the blind image quality evaluation dataset is trained by using a convolutional neural network, a VGG-F model with low computational complexity is based and a large-step convolutional kernel is selected, and all convolutional layers use zero padding.

4. The blind image quality evaluation guided power grid target detection method according to claim 2, wherein in the step of creating an image quality evaluation data set by using the monitoring video acquired on the power grid site in combination with a data enhancement technology, the method comprises:

5. The blind image quality evaluation guided power grid target detection method according to claim 1, wherein in the step of performing target detection on the video frame with the quality meeting the preset requirement through training the target detection network, the method comprises:

training a target detection network according to the target training set;

6. The blind image quality evaluation guided power grid target detection method according to claim 1, wherein in the step of analyzing the result of target detection to obtain the conclusion of the safety state of the target human body, the method comprises:

7. The blind image quality evaluation guided power grid target detection method according to claim 6, wherein the step of analyzing the safety state of the target human body comprises:

8. The blind image quality evaluation guided power grid target detection method according to claim 1, further comprising, after the step of analyzing the result of target detection to draw a conclusion of the safety state of the target human body:

9. A power grid target detection system guided by blind image quality evaluation is characterized by comprising a blind image quality evaluation module, a target detection module and a target human body safety analysis module;

10. The blind image quality evaluation guided power grid target detection system according to claim 9, wherein the blind image quality evaluation module is configured to train a blind image quality evaluation network according to a blind image quality evaluation data set, wherein the blind image quality evaluation module is further configured to train the blind image quality evaluation data set by using a convolutional neural network, select a large-step convolutional kernel based on a VGG-F model with low computational complexity, and use zero padding for all convolutional layers.