CN112270670B - Panoramic target detection method in power grid inspection


Info

Publication number
CN112270670B
CN112270670B (application CN202011242924.4A)
Authority
CN
China
Prior art keywords
image
panoramic
perspective projection
target detection
power grid
Prior art date
Legal status
Active
Application number
CN202011242924.4A
Other languages
Chinese (zh)
Other versions
CN112270670A (en)
Inventor
段尚琪
黄双得
陈海东
葛兴科
赵毅林
周仿荣
赵小萌
胡昌斌
宋庆
Current Assignee
Yunnan Power Grid Co Ltd
Original Assignee
Yunnan Power Grid Co Ltd
Priority date
Filing date
Publication date
Application filed by Yunnan Power Grid Co Ltd filed Critical Yunnan Power Grid Co Ltd
Priority to CN202011242924.4A
Publication of CN112270670A
Application granted
Publication of CN112270670B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06T 7/0004: Industrial image inspection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20068: Projection on vertical or horizontal image axis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20212: Image combination
    • G06T 2207/20221: Image fusion; Image merging
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S 10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S 10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a panoramic target detection method for power grid inspection, belonging to the field of target detection in computer vision. The panoramic image is re-projected according to a virtual focal length to obtain distortion-free perspective projection images, and a U-Net-based network then performs multi-scale target detection. This addresses two problems: most current target detection methods operate on perspective projection images, whereas panoramic images exhibit large distortion; and panoramic images generally have limited resolution while covering a full 360-degree scene, so they contain many small targets. The method is convenient, reliable, and easy to popularize and apply.

Description

Panoramic target detection method in power grid inspection
Technical Field
The invention belongs to the technical field of target detection of computer vision, and particularly relates to a panoramic target detection method in power grid inspection.
Background
Conventional target detection and classification generally adopts image processing methods: candidate regions are extracted from an image according to information such as the color and shape of the target, then detected and identified. For example, the AdaBoost detection algorithm combines weak classifiers built on Haar features into a strong classifier to achieve fast face detection, and HOG features combined with an SVM enable detection of human targets. Conventional detection and recognition often relies on shallow learning models, such as linear classifiers, boosting and SVMs, so feature extraction becomes the key to improving the recognition rate. The conventional approach designs features from experience, such as the widely used Haar, HOG, LBP and SIFT features, which has the advantage of speed. However, because human experience is subjective and local, the detection and recognition accuracy is generally not high, and performance varies greatly across viewing angles and scenes. With the development of deep neural networks, in particular the successful application of deep convolutional networks to image recognition, automatic target detection and recognition by deep learning has become a focus and hotspot of research, with detection schemes such as Faster R-CNN, SSD and YOLO/YOLO9000; combined with the latest deep classification networks (such as VGG, ResNet and GoogLeNet), their precision is markedly better than that of traditional methods. Meanwhile, with the powerful parallel computing capability of GPUs, SSD and YOLO achieve real-time detection.
When a panoramic image is stored as a two-dimensional image, it exhibits large deformation; the top and bottom of the image are especially distorted because of the longitude-latitude (equirectangular) projection. If panoramic images are used directly for target detection, this deformation increases the detection difficulty. In addition, the current public target detection databases provide a large amount of labeled training data, but they mostly contain images acquired by area-array perspective projection and rarely contain panoramic images. How to overcome these defects of the prior art is therefore a pressing problem in the field of target detection in computer vision.
Disclosure of Invention
The invention aims to solve the defects of the prior art and provides a panoramic target detection method in power grid inspection.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a panoramic target detection method in power grid inspection comprises the following steps:
s1, shooting power grid facilities by using panoramic acquisition equipment mounted on an unmanned aerial vehicle, and acquiring continuous panoramic image data;
s2, processing the panoramic image data acquired in the S1 by utilizing a panoramic reprojection technology, uniformly setting virtual perspective projection cameras in a panoramic imaging space, enabling the projection center of each virtual camera to coincide with the sphere center of the panorama, obtaining an imaging equation of perspective projection according to the virtual focal length and the projection view field of the virtual perspective projection cameras, and projecting the panoramic image onto a plurality of perspective projection images;
s3, performing target detection on the perspective projection image obtained in the S2 by using an improved deep neural network target detection model, and extracting a target of power grid inspection attention;
and S4, performing back projection on the target extracted in the step S3, and re-projecting the target into the panoramic image to realize target detection of the panoramic image of the power grid.
Further, in the step S1, it is preferable that the panorama acquisition device is a panorama camera, and the panorama image acquired by the panorama camera is a spherical panorama image.
Further, preferably, in the step S2, the virtual focal length of the perspective projection image is a radius value of the spherical panoramic image, and the angle of view of each perspective projection is 45 degrees; the Z-axis direction of perspective projection is the connecting line of the sphere center of the spherical panoramic image and the perspective projection center.
Further, in the step S2, preferably, 8 virtual perspective projection cameras are provided to obtain 8 perspective projection images.
Further, preferably, in S3, the improved deep neural network target detection model is trained on a deep learning platform; sample data are extracted from historical data obtained by unmanned aerial vehicle power grid inspection, the sample image size is set to 512×512 pixels, and the sample data are preprocessed to increase the number of samples; the preprocessing comprises image rotation, mirroring, and zoom-in/zoom-out operations;
the improved deep neural network target detection model takes a U-Net network in semantic segmentation as a main network, the left half part of the network carries out convolution and pooling operation on an original input three-band image, abstract features are extracted from the image, and a coded feature map is obtained; the right half of the network upsamples the encoded feature map to improve the resolution of the feature map, and the feature map of the left half is spliced with the feature map of the right half by direct connection.
Further, preferably, in the step S4, after the target position detected in the perspective projection image is back projected to the panoramic image, the 4 corner points of the target box form a quadrilateral, and the circumscribed rectangle of the quadrilateral is computed as the final detection result; the two adjacent sides of the circumscribed rectangle are parallel to the X and Y axes of the image, respectively.
Power grids are now widely subject to intelligent inspection by unmanned aerial vehicles, so a large amount of perspective projection image data captured by UAV-mounted perspective projection cameras is available as historical data.
The back projection used in the invention simply inverts the imaging formulas given in Section 1 of the detailed description.
According to the invention, the panoramic camera can collect light information in all directions at the same time, and the longitude and latitude are used as X and Y coordinates to be stored on a two-dimensional image, so that a spherical panoramic image is obtained.
The perspective projection image has no geometric distortion, and is suitable for target detection by deep learning.
The targets in the invention comprise targets such as insulators, bird nests, towers and the like in power inspection, and the size of the targets on an image cannot be too small, otherwise, the targets cannot be detected. It is recommended that the target to be detected is larger than 10 x 10 pixels.
For target detection, the invention adopts a convolutional neural network model combining U-Net with multi-scale target detection. The overall structure of the neural network model exploits the advantages of the U-Net network: the original 3-band image undergoes multiple convolution and pooling operations to extract abstract features, and up-sampling then restores the image resolution. Meanwhile, the feature maps obtained during pooling are combined with the up-sampled feature maps of the same resolution. The invention applies the U-Net network from semantic segmentation to target detection, making full use of U-Net's ability to integrate information at different resolutions and increasing the information content of the feature network. The model can be built with reference to the detailed network structure shown in Fig. 3.
Compared with the prior art, the invention has the beneficial effects that:
(1) According to the invention, the panoramic image with larger distortion is subjected to reprojection to obtain the perspective projection image without distortion, the existing perspective projection data can be fully utilized to perform training and target detection, and the accuracy of panoramic image target detection is improved;
(2) When the target detection model is designed, the image information before pooling is combined with the image after pooling, so that abstract features are obtained while image details are fully retained. The model detects using images at different scales, which improves its ability to extract small targets. Experiments show that, compared with a classical two-stage detection algorithm (Faster R-CNN), the recall is improved by 12%, reaching 95.62%, and the precision is improved by 5.1%, reaching 96.61%; compared with YOLOv3, the recall of the proposed algorithm is improved by 4.2% and the precision by 2.6%.
Drawings
FIG. 1 is a schematic view of a spherical panorama;
FIG. 2 is a schematic view of a spherical panoramic image projected from a sphere to any planar perspective projection;
FIG. 3 is a block diagram of a multi-scale object detection network;
fig. 4 is a flow chart of the method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples.
It will be appreciated by those skilled in the art that the following examples are illustrative of the present invention and should not be construed as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature of this field or the product specifications. Materials or equipment whose manufacturer is not indicated are conventional products available from commercial sources.
Because panoramic images are stored as two-dimensional images, they exhibit large deformation; the top and bottom of the image are especially distorted because of the longitude-latitude (equirectangular) projection. If panoramic images are used directly for target detection, target deformation increases the detection difficulty. In addition, the current public target detection databases provide a large amount of labeled training data, but mostly for images acquired by area-array perspective projection and rarely for panoramic images. The invention uses the virtual focal length and projection center to project the panoramic image onto a plane, turning it into an ordinary perspective projection image, so that the model can be trained with the current public databases.
The current target detection is generally completed by adopting a convolutional neural network, and images with different scales are acquired through pooling operation. The invention fully utilizes the advantages of the U-Net network, combines the up-sampling and coding stage images with the same scale, and can utilize more information to extract the multi-scale targets.
The invention corrects the distorted panoramic image to the plane projected image through the local image reprojection technology, thus the existing target detection library data can be fully utilized for training. In order to obtain targets with different scales as much as possible, the invention provides a multi-scale neural network target detection model (namely an improved deep neural network target detection model).
FIG. 4 is a general flow chart of the present invention, the core of which is panoramic reprojection and a multi-scale neural network target detection model. The aim of panoramic reprojection is to project a panoramic image with larger distortion into a common perspective projection image, and the multi-scale target detection neural network is supported by the current deep learning research, so that targets with different scales can be extracted from the image as much as possible.
A panoramic target detection method in power grid inspection comprises the following steps:
s1, shooting power grid facilities by using panoramic acquisition equipment mounted on an unmanned aerial vehicle, and acquiring continuous panoramic image data;
s2, processing the panoramic image data acquired in the S1 by utilizing a panoramic reprojection technology, uniformly setting virtual perspective projection cameras in a panoramic imaging space, enabling the projection center of each virtual camera to coincide with the sphere center of the panorama, obtaining an imaging equation of perspective projection according to the virtual focal length and the projection view field of the virtual perspective projection cameras, and projecting the panoramic image onto a plurality of perspective projection images;
s3, performing target detection on the perspective projection image obtained in the S2 by using an improved deep neural network target detection model, and extracting a target of power grid inspection attention;
and S4, performing back projection on the target extracted in the step S3, and re-projecting the target into the panoramic image to realize target detection of the panoramic image of the power grid.
In the step S1, the panorama acquisition device is a panorama camera, and the panorama image acquired by the panorama camera is a spherical panorama image.
In the step S2, the virtual focal length of the perspective projection image is a radius value of the spherical panoramic image, and the view angle of each perspective projection is 45 degrees; the Z-axis direction of perspective projection is the connecting line of the sphere center of the spherical panoramic image and the perspective projection center.
In the step S2, 8 virtual perspective projection cameras are arranged to obtain 8 perspective projection images.
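For instance, assuming the 8 cameras ring the horizon at equal yaw spacing (8 × 45° = 360°, which the numbers suggest but the patent does not state explicitly), their view normals could be generated as in the following NumPy sketch:

```python
import numpy as np

# Unit view normals of 8 virtual perspective cameras, one every 45 degrees
# of yaw around the vertical (Z) axis; all share the panorama's sphere
# center as their projection center.
yaws = np.radians(np.arange(8) * 45.0)
normals = np.stack([np.sin(yaws), np.cos(yaws), np.zeros(8)], axis=-1)
```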
In the step S3, the improved deep neural network target detection model is trained on a deep learning platform; sample data are extracted from historical data obtained by unmanned aerial vehicle power grid inspection, the sample image size is set to 512×512 pixels, and the sample data are preprocessed to increase the number of samples; the preprocessing comprises image rotation, mirroring, and zoom-in/zoom-out operations;
the improved deep neural network target detection model takes a U-Net network in semantic segmentation as a main network, the left half part of the network carries out convolution and pooling operation on an original input three-band image, abstract features are extracted from the image, and a coded feature map is obtained; the right half of the network upsamples the encoded feature map to improve the resolution of the feature map, and the feature map of the left half is spliced with the feature map of the right half by direct connection.
In the step S4, after the target position detected in the perspective projection image is back projected to the panoramic image, the 4 corner points of the target box form a quadrilateral, and the circumscribed rectangle of the quadrilateral is computed as the final detection result; the two adjacent sides of the circumscribed rectangle are parallel to the X and Y axes of the image, respectively.
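A minimal NumPy sketch of this step, assuming the 4 back-projected corners are already in panorama pixel coordinates (a box crossing the panorama's left/right seam would need extra wrap-around handling):

```python
import numpy as np

def circumscribed_rect(corners):
    """Axis-aligned circumscribed rectangle of a back-projected target box.

    corners: (4, 2) array of (u, v) panorama coordinates of the
    quadrilateral formed by the 4 back-projected corner points.
    Returns (u_min, v_min, u_max, v_max); the sides are parallel to the
    image X and Y axes, as S4 requires.
    """
    corners = np.asarray(corners)
    u_min, v_min = corners.min(axis=0)
    u_max, v_max = corners.max(axis=0)
    return u_min, v_min, u_max, v_max
```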
The method comprises the following steps:
1. Panoramic imaging equation
Fig. 1 is a schematic view of the construction of a spherical panorama. The plane (u, v) represents the stitched spherical panoramic image; each pixel point p in the image corresponds to a three-dimensional point P′ in the spherical space O′-X′Y′Z′, lying on a sphere of radius R_s. The abscissa u represents longitude and the ordinate v represents latitude. Let the width of the image be wd, the height be ht, and the longitude and latitude be (lon, lat); then:
wd = 2 × ht (1)
wd = 2πR_s (2)
lon = u/R_s, lat = v/R_s (3)
X′ = R_s cos(lat) sin(lon) (4)
Y′ = R_s cos(lat) cos(lon) (5)
Z′ = R_s sin(lat) (6)
The geodetic coordinate system O-XYZ is converted to O′-X′Y′Z′ by a translation (T_s in the figure) and a rotation (R in the figure).
Introducing homogeneous coordinates [X Y Z 1]^T, the above can be written compactly as [X′ Y′ Z′]^T = P·[X Y Z 1]^T, with:
P = [R T] (9)
T = -R·T_s (10)
where P is the projection matrix of size 3 × 4, containing the information of the translation vector and the rotation matrix.
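For illustration, here is a minimal NumPy sketch of equations (1)-(6), mapping a panorama pixel (u, v) to its three-dimensional point on the sphere; the function name and the latitude-origin convention are assumptions, not taken from the patent.

```python
import numpy as np

def pano_pixel_to_sphere(u, v, wd):
    """Map panorama pixel (u, v) to a 3D point on the viewing sphere.

    Implements equations (1)-(6): the sphere radius follows from
    wd = 2*pi*R_s, and u, v are read as arc lengths along longitude
    and latitude.
    """
    R_s = wd / (2.0 * np.pi)                  # eq. (2)
    lon = u / R_s                             # eq. (3)
    lat = v / R_s                             # may need an offset so that
                                              # lat lies in [-pi/2, pi/2]
    X = R_s * np.cos(lat) * np.sin(lon)       # eq. (4)
    Y = R_s * np.cos(lat) * np.cos(lon)       # eq. (5)
    Z = R_s * np.sin(lat)                     # eq. (6)
    return np.stack([X, Y, Z], axis=-1)
```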
2. Panoramic reprojection technology
The invention adopts the virtual focal length and the virtual perspective projection camera to realize the projection of the panoramic image to the image plane of the virtual perspective projection camera.
The plane image after perspective projection is free of distortion, so it is suitable as data input for subsequent image processing. Fig. 2 is a schematic view of projecting the spherical panoramic image from the sphere onto an arbitrary planar perspective projection; the gray plane is a plane in an arbitrary direction. The conversion from spherical projection to an arbitrary planar projection transfers the color information recorded at point P on the sphere to point p in the planar coordinate system o-uv. Because the three points O, P and p are collinear, i.e. they correspond to one three-dimensional point in the space coordinate system O-XYZ, the space coordinates of that three-dimensional point link the spherical point and the plane point, realizing the conversion from the spherical projection to any perspective projection plane. The key to projecting the sphere onto the plane is establishing the conversion between the coordinate systems o-uv and O-XYZ. For convenience of the projection conversion, we assume that the normal of the plane points toward the center O of the spherical projection coordinate system, and define a spatial auxiliary coordinate system O-X′Y′Z′ for the image of the virtual perspective projection camera, whose X′ and Y′ axes are parallel to the u and v axes of the image coordinate system. Assuming the distance from the point O to the image plane is f (the virtual focal length), connecting a point (x, y) of the virtual camera's image coordinate system with the projection center yields the three-dimensional point [x y -f]^T in the O-X′Y′Z′ system. Since the coordinate origins of O-X′Y′Z′ and O-XYZ coincide, the relationship between the two coordinate systems can be represented by a rotation matrix R, written as:
[X Y Z]^T = R·[X′ Y′ Z′]^T
where [X Y Z]^T and [X′ Y′ Z′]^T are the coordinates of a three-dimensional point in the two coordinate systems. The rotation matrix R is determined by the normal vector n of the perspective projection plane: after the rotation transformation, n becomes the Z axis [0 0 1]^T of the O-XYZ coordinate system. The normal vector n can be calculated from the spherical panoramic image, so the rotation matrix can be computed with a corresponding three-dimensional graphics algorithm. As the above shows, once the normal direction of the plane and the virtual focal length f are determined, the conversion between the spherical panoramic image and the perspective projection plane is straightforward; the normal direction is given, and the virtual focal length can be set in advance (to keep the resolution as close as possible to that of the spherical panoramic image, it is generally set to the sphere radius of the panorama). The spherical panorama can therefore be re-projected as a perspective projection directly from the panoramic image, without additional camera parameters.
With the panoramic three-dimensional space coordinate [X Y Z]^T and the rotated coordinate [X′ Y′ Z′]^T as above, the image coordinates (x, y) of the virtual perspective projection camera follow from the collinearity of [x y -f]^T and [X′ Y′ Z′]^T:
x = -f·X′/Z′, y = -f·Y′/Z′
where x and y denote a point in the image coordinate system.
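The reprojection can be sketched as follows in NumPy: for each pixel of the virtual perspective image, form the ray [x y -f]^T, rotate it into the panorama frame, and sample the panorama at the corresponding (lon, lat). The horizontal view direction, the downward image-y convention, and the nearest-neighbor sampling are illustrative assumptions.

```python
import numpy as np

def render_perspective(pano, yaw_deg, fov_deg=45.0, out_size=512):
    """Render one virtual perspective view from an equirectangular panorama.

    Per the patent: the virtual focal length f equals the panorama's
    sphere radius, and the projection center coincides with the sphere
    center. The yaw-only camera ring and sampling scheme are assumptions.
    """
    ht, wd = pano.shape[:2]
    R_s = wd / (2.0 * np.pi)                      # sphere radius, eq. (2)
    f = R_s                                       # virtual focal length
    half = f * np.tan(np.radians(fov_deg) / 2.0)  # half-extent of image plane

    a = np.radians(yaw_deg)
    forward = np.array([np.sin(a), np.cos(a), 0.0])   # view direction (horizon)
    right = np.array([np.cos(a), -np.sin(a), 0.0])    # image x axis
    down = np.array([0.0, 0.0, -1.0])                 # image y axis (downward)

    # ray [x y -f]^T for every pixel, expressed in the panorama frame
    xs = np.linspace(-half, half, out_size)
    ys = np.linspace(-half, half, out_size)
    X, Y = np.meshgrid(xs, ys)
    rays = f * forward + X[..., None] * right + Y[..., None] * down

    # invert eqs. (4)-(6) for longitude/latitude, then eq. (3) for pixels
    lon = np.arctan2(rays[..., 0], rays[..., 1]) % (2 * np.pi)
    lat = np.arcsin(rays[..., 2] / np.linalg.norm(rays, axis=-1))
    u = (lon * R_s).astype(int) % wd
    v = ((np.pi / 2 - lat) * R_s).astype(int).clip(0, ht - 1)  # v=0 at top
    return pano[v, u]
```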
3. Convolutional neural network-based target detection
After projection the panoramic image is converted into perspective projection images, whose imaging geometry matches that of images shot by ordinary cameras, so training can draw on the large existing public data sets (such as PASCAL VOC and MS COCO). The invention combines U-Net with multi-scale target detection to design a convolutional neural network model for target detection.
Fig. 3 shows the multi-scale object detection network designed by the invention; its overall structure exploits the advantages of the U-Net network. The original 3-band image undergoes multiple convolution and pooling operations to extract abstract features, and up-sampling then restores the image resolution. Meanwhile, each feature map obtained during pooling is combined with the up-sampled feature map of the same resolution; this structure is similar to a residual network and helps prevent vanishing gradients during training.
As shown in Fig. 3, the invention uses the design of the U-Net network from semantic segmentation as the backbone for target detection. The left half of the network applies convolution and pooling to the originally input three-band image; each pooling halves the width and height of the feature maps, so after 4 poolings they are 1/16 of the original size. The left half mainly extracts abstract features from the image, a process also called encoding. The right half of the network up-samples the encoded feature maps, continuously increasing their resolution, and the feature maps of the left half are concatenated with those of the right half through skip connections, so that image details are preserved as much as possible and the recognition capability of the network is improved.
TABLE 1. Detailed architecture of the network proposed by the invention
Note that: w represents the width of the input image and H represents the height of the input image.
The present invention uses the three-scale feature maps output by layers 14, 18 and 22 (see Table 1) for object detection. In detection, each pixel in a feature map is taken as a center, and 5 rectangular boxes with different aspect ratios (1:1, 1:2, 2:1, 1:3, 3:1) are used as initial candidate boxes from which the actual target box positions in the image are regressed. The feature maps output by layers 14, 18 and 22 are convolved with (5+N)×5 filter kernels, where N is the number of object classes to be detected; the first 5 comprises 4 parameters for regressing the box position plus 1 confidence score, and the second 5 is the number of candidate boxes.
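As a PyTorch sketch of this detection head (PyTorch being the training platform named below), a 1×1 convolution producing (5+N)×5 output channels can be applied to each of the three feature maps; the channel widths, feature-map scales and class count here are assumptions, since Table 1 is not reproduced.

```python
import torch
import torch.nn as nn

class MultiScaleHead(nn.Module):
    """Detection heads on the three U-Net feature maps (layers 14, 18, 22).

    Each head outputs (5 + N) * 5 channels per location: 5 candidate
    boxes, each with 4 box-regression parameters, 1 confidence score
    and N class scores.
    """
    def __init__(self, in_channels=(256, 128, 64), num_classes=3, num_anchors=5):
        super().__init__()
        out_ch = (5 + num_classes) * num_anchors
        self.heads = nn.ModuleList(
            [nn.Conv2d(c, out_ch, kernel_size=1) for c in in_channels]
        )

    def forward(self, feats):
        # feats: three feature maps, ordered from high to low resolution
        return [head(f) for head, f in zip(self.heads, feats)]

# usage with dummy feature maps from a 512x512 input
feats = [torch.randn(1, c, 512 // s, 512 // s)
         for c, s in zip((256, 128, 64), (4, 8, 16))]
outs = MultiScaleHead(num_classes=3)(feats)   # 3 classes, assumed
```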
Taking an insulator commonly used in power grid safety inspection as an example, the main steps of target detection include:
1) Collecting insulator samples. First, images with clear insulators are selected from the imagery obtained by unmanned aerial vehicle power grid inspection; the insulators are then labeled with target annotation software, recording the coordinates and the width and height of the circumscribed rectangular box containing each insulator.
2) Augmenting the samples. Deep learning requires a large number of samples to train and optimize the network, and the number and quality of samples affect the recognition accuracy of the model. The number of samples is increased by rotating, scaling, translating and mirroring them.
3) Constructing the neural network target detection model provided by the invention, setting the learning rate to 0.001 and the number of training epochs to 50, and training on the deep learning platform PyTorch.
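A minimal PyTorch training-loop sketch matching the settings of step 3) (learning rate 0.001, 50 epochs); the dummy data, the stand-in model, the Adam optimizer and the MSE loss are placeholders, since the patent specifies none of them.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy 512x512 3-band samples and regression targets (placeholders).
images = torch.randn(8, 3, 512, 512)
targets = torch.randn(8, 6, 128, 128)
loader = DataLoader(TensorDataset(images, targets), batch_size=4)

model = torch.nn.Conv2d(3, 6, 3, stride=4, padding=1)       # stand-in network
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lr from step 3)

for epoch in range(50):                                     # 50 epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)    # placeholder loss
        loss.backward()
        optimizer.step()
```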
4) Compared with a classical two-stage target detection algorithm (Faster R-CNN), the recall is improved by 12%, reaching 95.62%, and the precision is improved by 5.1%, reaching 96.61%; compared with YOLOv3, the recall of the proposed algorithm is improved by 4.2% and the precision by 2.6%.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (5)

1. A panoramic target detection method in power grid inspection, characterized by comprising the following steps:
s1, shooting power grid facilities by using panoramic acquisition equipment mounted on an unmanned aerial vehicle, and acquiring continuous panoramic image data;
s2, processing the panoramic image data acquired in the S1 by utilizing a panoramic reprojection technology, uniformly setting virtual perspective projection cameras in a panoramic imaging space, enabling the projection center of each virtual perspective projection camera to coincide with the sphere center of the panorama, obtaining an imaging equation of perspective projection according to the virtual focal length and the projection view field of the virtual perspective projection cameras, and projecting the panoramic image onto a plurality of perspective projection images;
s3, performing target detection on the perspective projection image obtained in the S2 by using an improved deep neural network target detection model, and extracting a target of power grid inspection attention;
s4, performing back projection on the target extracted in the S3, and re-projecting the target into the panoramic image to realize target detection of the panoramic image of the power grid;
in the step S3, the improved deep neural network target detection model is trained by adopting a deep learning platform, sample data is extracted from historical data obtained by unmanned aerial vehicle power network inspection, the size of a sample image is set to 512 x 512 pixels, and the sample data is preprocessed to increase the number of samples; the preprocessing comprises image rotation, mirroring, zooming-in and zooming-out operations;
the improved deep neural network target detection model takes a U-Net network in semantic segmentation as a main network, the left half part of the network carries out convolution and pooling operation on an original input three-band image, abstract features are extracted from the image, and a coded feature map is obtained; the right half part of the network carries out up-sampling on the coded characteristic diagram, the resolution of the characteristic diagram is improved, and the characteristic diagram of the left half part and the characteristic diagram of the right half part are spliced by utilizing direct connection;
the plane (u, v) represents the spliced spherical panoramic image, and each pixel point p in the image corresponds to a three-dimensional point in the spherical space O ' -X ' Y ' ZPoint->On a sphere of radius Rs; the abscissa u represents longitude and the ordinate v represents latitude; let the width of the image be wd, the height be ht, and the longitude and latitude be (lon, lat), then there are:
wd=2×ht (1)
wd=2πR s (22)
lon=u/R s ,lat=v/R s (3)
X′=R s cos(lat)sin(lon) (4)
Y′=R s cos(lat)cos(lon) (5)
Z′=R s sin(lat) (6)
the geodetic coordinate system O-XYZ is transformed into O '-X' Y 'Z' through translation and rotation;
the homogeneous coordinates of [ X Y Z1 ] are introduced] T The above formula is simplified to:
P=[R T] (9)
T=-RT s (10)
in the formula, P is a projection matrix, and the size is 3 x 4; the projection matrix contains information of translation vectors and rotation matrices.
2. The method for detecting a panoramic object in power grid inspection according to claim 1, wherein in S1, the panoramic acquisition device is a panoramic camera, and the panoramic image acquired by the panoramic camera is a spherical panoramic image.
3. The panoramic object detection method in power grid inspection according to claim 2, wherein in S2, the virtual focal length of the perspective projected image is a radius value of a spherical panoramic image, and the angle of view of each perspective projection is 45 degrees; the Z-axis direction of perspective projection is the connecting line of the sphere center of the spherical panoramic image and the perspective projection center.
4. The panoramic object detection method in power grid inspection according to claim 2, wherein in S2, 8 virtual perspective projection cameras are set to obtain 8 perspective projection images.
5. The method for detecting a panoramic object in power grid inspection according to claim 1, wherein in S4, after the detected target position in the perspective projection image is back projected to the panoramic image, the 4 corner points of the target box form a quadrilateral, and the circumscribed rectangle of the quadrilateral is computed as the final detection result; the two adjacent sides of the circumscribed rectangle are parallel to the X and Y axes of the image, respectively.
CN202011242924.4A 2020-11-09 2020-11-09 Panoramic target detection method in power grid inspection Active CN112270670B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011242924.4A CN112270670B (en) 2020-11-09 2020-11-09 Panoramic target detection method in power grid inspection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011242924.4A CN112270670B (en) 2020-11-09 2020-11-09 Panoramic target detection method in power grid inspection

Publications (2)

Publication Number Publication Date
CN112270670A CN112270670A (en) 2021-01-26
CN112270670B (en) 2023-09-12

Family

ID=74340806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011242924.4A Active CN112270670B (en) 2020-11-09 2020-11-09 Panoramic target detection method in power grid inspection

Country Status (1)

Country Link
CN (1) CN112270670B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049935B (en) * 2022-08-12 2022-11-11 松立控股集团股份有限公司 Urban illegal building division detection method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104333675A (en) * 2014-10-20 2015-02-04 长春理工大学 Panoramic electronic image stabilization method based on spherical projection
CN109348119A (en) * 2018-09-18 2019-02-15 成都易瞳科技有限公司 A kind of overall view monitoring system
CN110829271A (en) * 2019-12-11 2020-02-21 云南电网有限责任公司红河供电局 Transmission line defect inspection device and defect analysis method
CN111161138A (en) * 2019-12-31 2020-05-15 北京城市网邻信息技术有限公司 Target detection method, device, equipment and medium for two-dimensional panoramic image

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104333675A (en) * 2014-10-20 2015-02-04 长春理工大学 Panoramic electronic image stabilization method based on spherical projection
CN109348119A (en) * 2018-09-18 2019-02-15 成都易瞳科技有限公司 A kind of overall view monitoring system
CN110829271A (en) * 2019-12-11 2020-02-21 云南电网有限责任公司红河供电局 Transmission line defect inspection device and defect analysis method
CN111161138A (en) * 2019-12-31 2020-05-15 北京城市网邻信息技术有限公司 Target detection method, device, equipment and medium for two-dimensional panoramic image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design of an obstacle-crossing high-voltage power line inspection robot; 周自更; 黄修乾; 胡昌斌; 黄双得; 许保瑜; 曹家军; Electronic Devices (Issue 05); full text *

Also Published As

Publication number Publication date
CN112270670A (en) 2021-01-26

Similar Documents

Publication Publication Date Title
CN109655019B (en) Cargo volume measurement method based on deep learning and three-dimensional reconstruction
CN111462120B (en) Defect detection method, device, medium and equipment based on semantic segmentation model
CN110689008A (en) Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN111914795A (en) Method for detecting rotating target in aerial image
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN112613378B (en) 3D target detection method, system, medium and terminal
CN114666564B (en) Method for synthesizing virtual viewpoint image based on implicit neural scene representation
CN111144207A (en) Human body detection and tracking method based on multi-mode information perception
CN107507263B (en) Texture generation method and system based on image
WO2021114777A1 (en) Target detection method, terminal device, and medium
TW202225730A (en) High-efficiency LiDAR object detection method based on deep learning through direct processing of 3D point data to obtain a concise and fast 3D feature to solve the shortcomings of complexity and time-consuming of the current voxel network model
CN112270670B (en) Panoramic target detection method in power grid inspection
CN112364805A (en) Rotary palm image detection method
CN114387346A (en) Image recognition and prediction model processing method, three-dimensional modeling method and device
Chen et al. Scene segmentation of remotely sensed images with data augmentation using U-net++
CN112053407B (en) Automatic lane line detection method based on AI technology in traffic law enforcement image
CN113409242A (en) Intelligent monitoring method for point cloud of rail intersection bow net
CN112102379A (en) Unmanned aerial vehicle multispectral image registration method
CN116682105A (en) Millimeter wave radar and visual feature attention fusion target detection method
US20220301176A1 (en) Object detection method, object detection device, terminal device, and medium
CN115953447A (en) Point cloud consistency constraint monocular depth estimation method for 3D target detection
CN106909936B (en) Vehicle detection method based on double-vehicle deformable component model
CN115272450A (en) Target positioning method based on panoramic segmentation
Yeh et al. GPU Acceleration of UAV image splicing using oriented fast and rotated brief combined with PCA
CN115457120A (en) Absolute position sensing method and system under GPS rejection condition

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant