CN115661337A - Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel


Info

Publication number
CN115661337A (application number CN202211164520.7A)
Authority
CN
China
Prior art keywords
point
image
point cloud
monocular
camera
Prior art date
2022-09-23
Legal status
Pending
Application number
CN202211164520.7A
Other languages
Chinese (zh)
Inventor
陆年生
张可
黄文礼
茆骥
王柳
杨建旭
徐沛哲
陈博文
晏雨晴
Current Assignee
Anhui Nanrui Jiyuan Power Grid Technology Co ltd
NARI Group Corp
Original Assignee
Anhui Nanrui Jiyuan Power Grid Technology Co ltd
NARI Group Corp
Priority date
2022-09-23
Filing date
2022-09-23
Publication date
2023-01-31
Application filed by Anhui Nanrui Jiyuan Power Grid Technology Co ltd, NARI Group Corp
Priority to CN202211164520.7A
Publication of CN115661337A

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a binocular vision-based three-dimensional reconstruction method for transformer substation operators, which comprises the following steps: performing monocular calibration on each monocular camera and, after storing the obtained monocular parameters, calibrating the binocular camera; acquiring images in real time and converting target coordinates into the camera coordinate system; performing stereo matching and calculating disparity values; and calculating depth information, fusing the depth map formed from the depth information with the RGB (red, green and blue) image used for disparity calculation to generate a single-frame point cloud map, fusing the generated single-frame point cloud maps to obtain a complete three-dimensional point cloud, and constructing a real-time three-dimensional image through pose transformation. The invention reconstructs the substation and its operators in three dimensions simultaneously and, taking the static substation three-dimensional coordinate system as the reference, realizes the matched combination of the dynamic operator point cloud and the static substation point cloud. Because depth information is obtained in the three-dimensional reconstruction process, accurate distance calculation can be obtained while operators are detected, which facilitates operation safety control in the substation.

Description

Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel
Technical Field
The invention relates to the technical field of three-dimensional modeling, in particular to a binocular vision-based three-dimensional reconstruction method for substation operating personnel.
Background
Workers occasionally enter energized areas by mistake during power operations, and artificial intelligence technology is widely applied to monitoring power-operation behavior, mainly to judge abnormal conditions such as whether a person is wearing a safety helmet, carrying tools, or crossing an electronic fence. To ensure the personal safety of operators, they must keep a certain distance from dangerous live equipment during operation; the State Grid safety regulations specify the safe working distances when equipment is not de-energized: 10 kV and below, 0.7 m; 35 kV, 1.0 m; 110 kV, 1.5 m; 220 kV, 3.0 m; 500 kV, 5.0 m. Because spatial distances estimated on a two-dimensional image are inaccurate, an alarm cannot be raised in time before an operator mistakenly touches energized equipment, so operation accidents occur.
With the development of technologies such as three-dimensional modeling and virtual reality, the informatization, digitization, and intelligent supervision of substations are maturing day by day, and three-dimensional visualization has become a research hotspot. When monocular or RGB-D cameras are used in a vision system, heavy computation is required, the system must be used in specific environments, and the real distance between electrical devices either cannot be calculated or is calculated inaccurately. Binocular vision, by contrast, offers simple computation, relatively unconstrained usage scenarios, and more accurate distance calculation through extraction and conversion of a depth map. Some binocular vision-based substation three-dimensional reconstruction technology exists at present, but it focuses almost entirely on static substation scenes; there are few cases that identify and reconstruct substation operators in real time and calculate, within the three-dimensional reconstruction, the real-time distance between operators and dangerous live equipment.
Disclosure of Invention
The invention aims to provide a binocular vision-based method for three-dimensional reconstruction of substation operators that can model them with high quality, high precision, and high efficiency, providing a good basis for avoiding operation accidents and for intelligent supervision of the substation.
In order to achieve the purpose, the invention adopts the following technical scheme: a binocular vision-based three-dimensional reconstruction method for substation operators comprises the following steps:
(1) Perform monocular calibration on each monocular camera and, after storing the obtained monocular parameters, calibrate the binocular camera, namely calculate the relative position between the two monocular cameras;
(2) The binocular camera collects images in real time; video recognition is performed with the yolov5 target detection algorithm to track substation operators, and the target coordinates recognized by the yolov5 target detection algorithm are converted into the camera coordinate system;
(3) The images collected by the binocular camera in real time are recognized by the yolov5 target detection algorithm, stereo matching is performed, and the disparity value is calculated;
(4) Depth information is calculated; the depth map formed from the depth information is fused with the RGB (red, green and blue) image used for disparity calculation to generate a single-frame point cloud map, the generated single-frame point cloud maps are fused to obtain a complete three-dimensional point cloud, and a real-time three-dimensional image is constructed through pose transformation.
The step (1) specifically comprises the following steps:
First, the intrinsic parameters of the monocular camera are solved through a homography matrix. Let the homogeneous coordinate of a point in space be

$$\tilde{M} = [X_w, Y_w, Z_w, 1]^T$$

and the homogeneous coordinate of the corresponding point in the image be

$$\tilde{m} = [u, v, 1]^T.$$

Expressed in the homogeneous coordinate system, the projection is

$$s\,\tilde{m} = A\,[R\ \ t]\,\tilde{M},$$

wherein A is the intrinsic (internal reference) matrix of the monocular camera and [R t] is its extrinsic (external reference) matrix. For points on the planar calibration target (taking $Z_w = 0$), this is converted into

$$s\,\tilde{m} = H\,[X_w,\ Y_w,\ 1]^T, \qquad H = A\,[r_1\ \ r_2\ \ t],$$

wherein H is the homography matrix to be solved and $r_1, r_2$ are the first two columns of R. H is solved from the simultaneous equations formed by four or more points in space; after the homography matrix H is obtained, the intrinsic matrix A and the extrinsic matrix [R t] are computed from it;
after the monocular camera intrinsic and extrinsic parameters are obtained, the relative position of the two monocular cameras is solved. Let the rotation and translation matrices of the two monocular cameras be $R_l, T_l$ and $R_r, T_r$ respectively, and take any point $P(X_w, Y_w, Z_w)$ in space, whose coordinates in the two monocular camera coordinate systems are $(X_l, Y_l, Z_l)$ and $(X_r, Y_r, Z_r)$. According to the conversion relation between the monocular camera coordinate system and the three-dimensional world coordinate system, the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_l, Y_l, Z_l)$ satisfy

$$\begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} = R_l \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_l,$$

and the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_r, Y_r, Z_r)$ satisfy

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R_r \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_r.$$

Solving the two jointly gives

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R \begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} + T, \qquad R = R_r R_l^{-1}, \quad T = T_r - R\,T_l,$$

wherein R is the rotation matrix of one monocular camera relative to the other and T is the translation matrix; the relative position between the two monocular cameras is obtained through calculation according to this formula.
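By way of non-limiting illustration, the relative-pose formula above can be sketched in a few lines of Python; the world-to-camera convention $X_c = R X_w + T$ and the NumPy representation are assumptions of this sketch, not requirements of the method:

```python
# Minimal sketch, assuming extrinsics follow X_c = R @ X_w + T (world to camera).
import numpy as np

def relative_pose(R_l, T_l, R_r, T_r):
    """Return (R, T) such that X_r = R @ X_l + T."""
    R = R_r @ R_l.T        # R_l is orthonormal, so inv(R_l) equals R_l.T
    T = T_r - R @ T_l      # T = T_r - R * T_l, matching the formula above
    return R, T
```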
The step (2) specifically comprises the following steps:
substation operators are tracked and identified with the yolov5 target detection algorithm. Before the yolov5 target detection algorithm is applied, prior boxes are selected for the targets in the images collected by the binocular camera in real time using the K-means clustering method, specifically as follows: first, the pre-selected boxes are screened with confidence thresholds, filtering out pre-selected boxes with low confidence; then a non-maximum suppression algorithm further screens the remaining pre-selected boxes; finally, the final target detection boxes are obtained and the next step, recognition by the yolov5 target detection algorithm, begins;
the loss function of the yolov5 target detection algorithm is divided into three parts: the first term is the target localization loss, the second term is the confidence loss, and the third term is the classification loss:

$$Loss = L_{loc} + L_{conf} + L_{cls},$$

wherein the target localization loss is

$$L_{loc} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right],$$

the confidence loss is

$$L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2,$$

and the classification loss is

$$L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c) - \hat{p}_i(c)\right)^2,$$

wherein S denotes the grid size and B the number of boxes per grid cell; λ is a weight coefficient; x, y denote the box center coordinates and w, h its width and height; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th anchor of the i-th grid cell is responsible for this target; $\hat{C}_i$ denotes the parameter confidence; and $\hat{p}_i(c)$ denotes the class probability;
meanwhile, an attention mechanism is added to the yolov5 target detection algorithm to fuse the personnel features. Let $x_{ij}^{n \to l}$ denote the feature vector at position (i, j) on the feature map adjusted from the n-th layer to the l-th layer; the features are fused as

$$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \to l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \to l},$$

wherein $y_{ij}^{l}$ denotes the feature vector at the (i, j)-th position on the output feature map channel, and $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, $\gamma_{ij}^{l}$ are the attention weights of the 3 different layers for layer l on the feature map; they are simple scalar variables that can be shared across all channels, set such that

$$\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1, \qquad \alpha_{ij}^{l}, \beta_{ij}^{l}, \gamma_{ij}^{l} \in [0, 1].$$
the step (3) specifically comprises the following steps: the real-time three-dimensional reconstruction needs to extract the characteristics of RBG pictures, and binocular stereo matching is carried out after the characteristic points are extracted;
for any pixel point p in the images collected by the binocular camera in real time, its Hessian matrix at scale σ takes the form

$$H(p, \sigma) = \begin{bmatrix} L_{xx}(p, \sigma) & L_{xy}(p, \sigma) \\ L_{xy}(p, \sigma) & L_{yy}(p, \sigma) \end{bmatrix},$$

wherein $L_{xx}$, $L_{xy}$, $L_{yy}$ are second-order derivatives of the Gaussian-smoothed image; the value of the discriminant of the Hessian matrix of the pixel point is then solved:

$$\det(H) = L_{xx}(p, \sigma)\,L_{yy}(p, \sigma) - L_{xy}^{2}(p, \sigma);$$
after the value of the discriminant is obtained, a judgment is made: the processed pixel point is compared with its 26 neighboring pixel points across adjacent scale spaces in the image collected by the binocular camera in real time, the maximum or minimum value is selected according to the compared discriminant values, and key points are preliminarily selected; points with large noise are then filtered out, and stable feature points are screened out through three-dimensional linear interpolation as the final feature points;
after the final feature points are determined, stereo matching is carried out on the images acquired by the binocular camera in real time; the stereo matching comprises matching cost calculation, matching cost aggregation, disparity calculation, and disparity optimization;
the matching cost calculation first selects a central pixel in the image and forms a transform window in its neighborhood; the remaining pixels in the window are then compared with the central pixel: if a pixel's gray value is less than or equal to that of the central pixel, it is recorded as 0, otherwise as 1, as shown in the following formula:

$$\xi(I(p), I(q)) = \begin{cases} 0, & I(q) \le I(p) \\ 1, & I(q) > I(p) \end{cases}$$

wherein ξ(I(p), I(q)) is the comparison function, I(p) is the gray value of the central pixel point, and I(q) is the gray value of a pixel point other than the central pixel point;
after all pixel points have been processed, the resulting 1s and 0s are concatenated in order into a bit string that serves as the matching element for the gray value of the current central pixel point;
after the images of the two monocular cameras have been transformed, the bit strings of corresponding pixel points in the two images are compared and the Hamming distance is calculated; the resulting Hamming distance serves as the matching cost for stereo matching;
matching cost aggregation serves to eliminate noise and optimize the cost matrix; the specific summation expression is

$$C(x, y, d) = \sum_{(i, j) \in W(x, y)} C_0(i, j, d),$$

wherein $C_0(i, j, d)$ is the initial matching cost, W(x, y) is the aggregation window centered at (x, y), and C(x, y, d) is the aggregated matching cost;
after the aggregated matching cost is obtained, the disparity value is calculated, namely the final disparity of pixel point p is determined by the winner-takes-all method; disparity optimization is then performed, namely the final disparity of pixel point p is optimized and denoised.
The step (4) specifically comprises the following steps:
after extraction of the final feature points of the image is completed, the depth information of the single-frame image is output by calculating the disparity value, and the depth map formed from the depth information is fused with the RGB map and rendered into a single-frame point cloud map;
let two pixel points be p and q respectively; the aggregated cost is

$$C_{d}^{A}(p) = \sum_{q} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}(q),$$

wherein S(p, q) represents the sum of edge weights on the path between the two pixel points, $C_d(q)$ is the matching cost value, and σ is a constant controlling the similarity between pixels;
the computation of cost aggregation in stereo matching is optimized with a segment-tree structure; the aggregated cost of any pixel point p at disparity value d is then obtained by two passes over the tree, first from the leaves to the root,

$$C_{d}^{A\uparrow}(p) = C_{d}(p) + \sum_{q \in Ch(p)} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}^{A\uparrow}(q),$$

and then from the root to the leaves,

$$C_{d}^{A}(p) = \exp\!\left(-\frac{S(P_r(p), p)}{\sigma}\right) C_{d}^{A}(P_r(p)) + \left(1 - \exp\!\left(-\frac{2\,S(P_r(p), p)}{\sigma}\right)\right) C_{d}^{A\uparrow}(p),$$

wherein $P_r(p)$ is the parent node of pixel p on the tree and Ch(p) represents all of its child nodes. The disparities of all pixel points are solved to obtain the disparity map of a single image, and the disparity map is converted into a depth map with the conversion formula

$$Z = \frac{f \cdot B}{d},$$

wherein Z is the depth, f is the camera focal length, B is the optical-center distance (baseline) of the binocular camera, and d represents the disparity value;
the coordinates (x, y, z) of the target relative to the monocular camera are obtained from the depth map, and a single-frame point cloud map with color information is obtained for the single frame by combining the RGB image with the spatial coordinate system. After the single-frame point cloud map is obtained, pose estimation is performed to find the key frames of adjacent points, and the generated single-frame point cloud maps are fused to obtain the complete three-dimensional point cloud. The three-dimensional point cloud of the next key pose is estimated by the iterative closest point (ICP) algorithm, and whether adjacent key points satisfy the matching relation is judged according to the following formula:
$$E(R, T) = \frac{1}{m} \sum_{i=1}^{m} \left\| y_i - (R\,x_i + T) \right\|^2,$$

wherein $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$ are the two overlapping point clouds respectively, R is the rotation matrix between the two point clouds, and T is the translation vector;
when the obtained mean square error is smaller than the set threshold, the matching is considered successful; if the corresponding image meets the key-frame requirement, it can serve as the next pose point, thereby achieving the purpose of pose estimation and successfully finding the next key frame. Estimation of the camera pose and fusion of the point clouds are thus realized; at the same time, the point cloud coordinates of the moving target are matched with the point cloud coordinate system of the background to obtain the complete three-dimensional point cloud.
According to the above technical scheme, the beneficial effects of the invention are as follows. First, in the electric power field, three-dimensional reconstruction is at present basically performed on static scenes, and detection of substation operators has not been combined with three-dimensional reconstruction technology; the invention reconstructs the substation and the operators in three dimensions simultaneously and, taking the static substation three-dimensional coordinate system as the reference, realizes the matched combination of the dynamic operator point cloud and the static substation point cloud. Second, a target detection algorithm based on the RGB image alone cannot obtain distance information; here depth information is obtained in the three-dimensional reconstruction process, so an accurate distance calculation can be obtained while operators are detected, which greatly facilitates operation safety control in the substation and prevents operators from mistakenly entering a dangerous area or getting too close to dangerous live equipment.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a flow chart of calibration of a binocular camera;
FIG. 3 is a network architecture diagram of a video image based object detection algorithm;
FIG. 4 is a flow chart of converting a depth map into a three-dimensional point cloud map.
Detailed Description
As shown in FIG. 1, a binocular vision-based three-dimensional reconstruction method for substation operators comprises the following steps:
(1) Perform monocular calibration on each monocular camera and, after storing the obtained monocular parameters, calibrate the binocular camera, namely calculate the relative position between the two monocular cameras;
(2) The binocular camera collects images in real time; video recognition is performed with the yolov5 target detection algorithm to track substation operators, and the target coordinates recognized by the yolov5 target detection algorithm are converted into the camera coordinate system;
(3) The images collected by the binocular camera in real time are recognized by the yolov5 target detection algorithm, stereo matching is performed, and the disparity value is calculated;
(4) Depth information is calculated; the depth map formed from the depth information is fused with the RGB (red, green and blue) image used for disparity calculation to generate a single-frame point cloud map, the generated single-frame point cloud maps are fused to obtain a complete three-dimensional point cloud, and a real-time three-dimensional image is constructed through pose transformation.
As shown in fig. 2, the step (1) specifically includes:
First, the intrinsic parameters of the monocular camera are solved through a homography matrix. Let the homogeneous coordinate of a point in space be

$$\tilde{M} = [X_w, Y_w, Z_w, 1]^T$$

and the homogeneous coordinate of the corresponding point in the image be

$$\tilde{m} = [u, v, 1]^T.$$

Expressed in the homogeneous coordinate system, the projection is

$$s\,\tilde{m} = A\,[R\ \ t]\,\tilde{M},$$

wherein A is the intrinsic (internal reference) matrix of the monocular camera and [R t] is its extrinsic (external reference) matrix. For points on the planar calibration target (taking $Z_w = 0$), this is converted into

$$s\,\tilde{m} = H\,[X_w,\ Y_w,\ 1]^T, \qquad H = A\,[r_1\ \ r_2\ \ t],$$

wherein H is the homography matrix to be solved and $r_1, r_2$ are the first two columns of R. H is solved from the simultaneous equations formed by four or more points in space; after the homography matrix H is obtained, the intrinsic matrix A and the extrinsic matrix [R t] are computed from it;
after the intrinsic and extrinsic parameters of the monocular cameras are obtained, the relative position of the two monocular cameras is solved. Let the rotation and translation matrices of the two monocular cameras be $R_l, T_l$ and $R_r, T_r$ respectively, and take any point $P(X_w, Y_w, Z_w)$ in space, whose coordinates in the two monocular camera coordinate systems are $(X_l, Y_l, Z_l)$ and $(X_r, Y_r, Z_r)$. According to the conversion relation between the monocular camera coordinate system and the three-dimensional world coordinate system, the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_l, Y_l, Z_l)$ satisfy

$$\begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} = R_l \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_l,$$

and the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_r, Y_r, Z_r)$ satisfy

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R_r \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_r.$$

Solving the two jointly gives

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R \begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} + T, \qquad R = R_r R_l^{-1}, \quad T = T_r - R\,T_l,$$

wherein R is the rotation matrix of one monocular camera relative to the other and T is the translation matrix; the relative position between the two monocular cameras is obtained through calculation according to this formula.
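As an illustrative, non-limiting embodiment of this calibration flow, the monocular and binocular calibration can be sketched with OpenCV's checkerboard routines; the board dimensions, unit square size, and flags below are assumptions of the sketch:

```python
# Sketch: monocular calibration of each camera, then stereo calibration to
# recover the relative pose (R, T). Checkerboard size is an assumption.
import cv2
import numpy as np

PATTERN = (9, 6)  # inner-corner count of the assumed checkerboard
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)  # unit squares

def calibrate_stereo(left_imgs, right_imgs):
    obj_pts, left_pts, right_pts = [], [], []
    for li, ri in zip(left_imgs, right_imgs):
        gl = cv2.cvtColor(li, cv2.COLOR_BGR2GRAY)
        gr = cv2.cvtColor(ri, cv2.COLOR_BGR2GRAY)
        okl, cl = cv2.findChessboardCorners(gl, PATTERN)
        okr, cr = cv2.findChessboardCorners(gr, PATTERN)
        if okl and okr:
            obj_pts.append(objp); left_pts.append(cl); right_pts.append(cr)
    size = gl.shape[::-1]
    # monocular calibration: intrinsic matrix A and distortion for each camera
    _, A1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
    _, A2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
    # binocular calibration: relative rotation R and translation T
    _, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, left_pts, right_pts, A1, d1, A2, d2, size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return A1, d1, A2, d2, R, T
```

Here cv2.stereoCalibrate returns the rotation R and translation T of the second camera relative to the first, i.e., the relative position sought in step (1).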
As shown in fig. 3, the step (2) specifically refers to:
substation operators are tracked and identified with the yolov5 target detection algorithm. Before the yolov5 target detection algorithm is applied, prior boxes are selected for the targets in the images collected by the binocular camera in real time using the K-means clustering method, specifically as follows: first, the pre-selected boxes are screened with confidence thresholds, filtering out pre-selected boxes with low confidence; then a non-maximum suppression algorithm further screens the remaining pre-selected boxes; finally, the final target detection boxes are obtained and the next step, recognition by the yolov5 target detection algorithm, begins;
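For illustration only, the screening described above (confidence thresholding followed by greedy non-maximum suppression) can be sketched as follows; the confidence and IoU thresholds are assumptions:

```python
# Sketch: confidence filtering + greedy non-maximum suppression (NMS).
import numpy as np

def nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    mask = scores >= conf_thresh                 # filter low-confidence boxes
    boxes, scores = boxes[mask], scores[mask]
    idx = np.argsort(scores)[::-1]               # highest confidence first
    kept = []
    while idx.size:
        i = idx[0]
        kept.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[idx[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[idx[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[idx[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[idx[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[idx[1:], 2] - boxes[idx[1:], 0]) * \
                (boxes[idx[1:], 3] - boxes[idx[1:], 1])
        iou = inter / (area_i + areas - inter)
        idx = idx[1:][iou < iou_thresh]          # suppress overlapping boxes
    return kept
```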
the loss function of the yolov5 target detection algorithm is divided into three parts: the first term is the target localization loss, the second term is the confidence loss, and the third term is the classification loss:

$$Loss = L_{loc} + L_{conf} + L_{cls},$$

wherein the target localization loss is

$$L_{loc} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right],$$

the confidence loss is

$$L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2,$$

and the classification loss is

$$L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c) - \hat{p}_i(c)\right)^2,$$

wherein S denotes the grid size and B the number of boxes per grid cell; λ is a weight coefficient; x, y denote the box center coordinates and w, h its width and height; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th anchor of the i-th grid cell is responsible for this target; $\hat{C}_i$ denotes the parameter confidence; and $\hat{p}_i(c)$ denotes the class probability;
meanwhile, an attention mechanism is added to the yolov5 target detection algorithm to fuse the personnel features. Let $x_{ij}^{n \to l}$ denote the feature vector at position (i, j) on the feature map adjusted from the n-th layer to the l-th layer; the features are fused as

$$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \to l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \to l},$$

wherein $y_{ij}^{l}$ denotes the feature vector at the (i, j)-th position on the output feature map channel, and $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, $\gamma_{ij}^{l}$ are the attention weights of the 3 different layers for layer l on the feature map; they are simple scalar variables that can be shared across all channels, set such that

$$\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1, \qquad \alpha_{ij}^{l}, \beta_{ij}^{l}, \gamma_{ij}^{l} \in [0, 1].$$
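A minimal PyTorch sketch of this fusion step is given below for illustration; the softmax normalization enforcing α + β + γ = 1 matches the constraint above, while the tensor shapes and the source of the weight logits are assumptions:

```python
# Sketch: per-position attention weights blend three feature maps at level l.
import torch
import torch.nn.functional as F

def fuse_levels(x1, x2, x3, w_logits):
    """x1, x2, x3: (N, C, H, W) maps already resized to level l.
    w_logits: (N, 3, H, W) unnormalized attention logits."""
    w = F.softmax(w_logits, dim=1)                        # alpha + beta + gamma = 1
    alpha, beta, gamma = w[:, 0:1], w[:, 1:2], w[:, 2:3]  # broadcast over channels
    return alpha * x1 + beta * x2 + gamma * x3
```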
the step (3) specifically comprises the following steps: the real-time three-dimensional reconstruction needs to extract the characteristics of RBG pictures, and binocular stereo matching is carried out after the characteristic points are extracted;
for any pixel point p in the images collected by the binocular camera in real time, its Hessian matrix at scale σ takes the form

$$H(p, \sigma) = \begin{bmatrix} L_{xx}(p, \sigma) & L_{xy}(p, \sigma) \\ L_{xy}(p, \sigma) & L_{yy}(p, \sigma) \end{bmatrix},$$

wherein $L_{xx}$, $L_{xy}$, $L_{yy}$ are second-order derivatives of the Gaussian-smoothed image; the value of the discriminant of the Hessian matrix of the pixel point is then solved:

$$\det(H) = L_{xx}(p, \sigma)\,L_{yy}(p, \sigma) - L_{xy}^{2}(p, \sigma);$$
after the value of the discriminant is obtained, a judgment is made: the processed pixel point is compared with its 26 neighboring pixel points across adjacent scale spaces in the image collected by the binocular camera in real time, the maximum or minimum value is selected according to the compared discriminant values, and key points are preliminarily selected; points with large noise are then filtered out, and stable feature points are screened out through three-dimensional linear interpolation as the final feature points;
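For illustration only, a Hessian-based scale-space detector of this kind is available as SURF in opencv-contrib-python; this sketch is a stand-in for the detector described here, not necessarily the exact implementation, and the threshold value is an assumption:

```python
# Sketch using the SURF detector from opencv-contrib-python (assumption: the
# contrib package with its non-free module is available). The Hessian
# threshold filters out weak, noisy responses, as in the text above.
import cv2

def detect_keypoints(gray):
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    keypoints, descriptors = surf.detectAndCompute(gray, None)
    return keypoints, descriptors
```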
after the final feature points are determined, stereo matching is carried out on the images acquired by the binocular camera in real time; the stereo matching comprises matching cost calculation, matching cost aggregation, disparity calculation, and disparity optimization;
the matching cost calculation first selects a central pixel in the image and forms a transform window in its neighborhood; the remaining pixels in the window are then compared with the central pixel: if a pixel's gray value is less than or equal to that of the central pixel, it is recorded as 0, otherwise as 1, as shown in the following formula:

$$\xi(I(p), I(q)) = \begin{cases} 0, & I(q) \le I(p) \\ 1, & I(q) > I(p) \end{cases}$$

wherein ξ(I(p), I(q)) is the comparison function, I(p) is the gray value of the central pixel point, and I(q) is the gray value of a pixel point other than the central pixel point;
after all pixel points have been processed, the resulting 1s and 0s are concatenated in order into a bit string that serves as the matching element for the gray value of the current central pixel point;
after the images of the two monocular cameras have been transformed, the bit strings of corresponding pixel points in the two images are compared and the Hamming distance is calculated; the resulting Hamming distance serves as the matching cost for stereo matching;
matching cost aggregation serves to eliminate noise and optimize the cost matrix; the specific summation expression is

$$C(x, y, d) = \sum_{(i, j) \in W(x, y)} C_0(i, j, d),$$

wherein $C_0(i, j, d)$ is the initial matching cost, W(x, y) is the aggregation window centered at (x, y), and C(x, y, d) is the aggregated matching cost;
after the aggregated matching cost is obtained, the disparity value is calculated, namely the final disparity of pixel point p is determined by the winner-takes-all method; disparity optimization is then performed, namely the final disparity of pixel point p is optimized and denoised.
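The census/Hamming/winner-takes-all chain of step (3) can be sketched in NumPy as follows; the window radius, disparity range, and the box-filter aggregation (a simple stand-in for the tree-based aggregation described later, using SciPy's uniform_filter) are assumptions:

```python
# Sketch: census transform -> Hamming cost -> aggregation -> WTA disparity.
import numpy as np
from scipy.ndimage import uniform_filter

def census(img, r=3):
    """Census bit string per pixel: 1 where a neighbour is brighter than centre."""
    H, W = img.shape
    bits = np.zeros((H, W, (2 * r + 1) ** 2 - 1), dtype=bool)
    k = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            bits[:, :, k] = shifted > img
            k += 1
    return bits

def wta_disparity(left, right, max_d=64, win=5):
    cl, cr = census(left), census(right)
    H, W = left.shape
    n_bits = cl.shape[2]
    cost = np.full((H, W, max_d), float(n_bits))   # worst-case cost as padding
    for d in range(max_d):
        # Hamming distance between corresponding census strings at disparity d
        cost[:, d:, d] = np.sum(cl[:, d:, :] != cr[:, :W - d, :], axis=2)
    for d in range(max_d):
        cost[:, :, d] = uniform_filter(cost[:, :, d], size=win)  # aggregation
    return np.argmin(cost, axis=2)                 # winner-takes-all
```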
As shown in fig. 4, the step (4) specifically includes:
after extraction of the final feature points of the image is completed, the depth information of the single-frame image is output by calculating the disparity value, and the depth map formed from the depth information is fused with the RGB map and rendered into a single-frame point cloud map;
let two pixel points be p and q respectively; the aggregated cost is

$$C_{d}^{A}(p) = \sum_{q} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}(q),$$

wherein S(p, q) represents the sum of edge weights on the path between the two pixel points, $C_d(q)$ is the matching cost value, and σ is a constant controlling the similarity between pixels;
the computation of cost aggregation in stereo matching is optimized with a segment-tree structure; the aggregated cost of any pixel point p(x, y) at disparity value d is then obtained by two passes over the tree, first from the leaves to the root,

$$C_{d}^{A\uparrow}(p) = C_{d}(p) + \sum_{q \in Ch(p)} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}^{A\uparrow}(q),$$

and then from the root to the leaves,

$$C_{d}^{A}(p) = \exp\!\left(-\frac{S(P_r(p), p)}{\sigma}\right) C_{d}^{A}(P_r(p)) + \left(1 - \exp\!\left(-\frac{2\,S(P_r(p), p)}{\sigma}\right)\right) C_{d}^{A\uparrow}(p),$$

wherein $P_r(p)$ is the parent node of pixel p on the tree and Ch(p) represents all of its child nodes. The disparities of all pixel points are solved to obtain the disparity map of a single image, and the disparity map is converted into a depth map with the conversion formula

$$Z = \frac{f \cdot B}{d},$$

wherein Z is the depth, f is the camera focal length, B is the optical-center distance (baseline) of the binocular camera, and d represents the disparity value;
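As an illustrative sketch of the conversion Z = f·B/d and the subsequent back-projection to a colored single-frame point cloud (the pinhole model with principal point (cx, cy) is an assumption of the sketch):

```python
# Sketch: disparity -> depth via Z = f*B/d, then pinhole back-projection of
# each valid pixel to camera coordinates (x, y, z) with its RGB colour.
import numpy as np

def disparity_to_pointcloud(disp, rgb, f, B, cx, cy):
    H, W = disp.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    valid = disp > 0                              # zero disparity has no depth
    Z = np.where(valid, f * B / np.maximum(disp, 1e-6), 0.0)
    X = (us - cx) * Z / f                         # pinhole back-projection
    Y = (vs - cy) * Z / f
    pts = np.stack([X[valid], Y[valid], Z[valid]], axis=1)
    colors = rgb[valid]                           # (N, 3) colour per point
    return pts, colors
```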
the coordinates (x, y, z) of the target relative to the monocular camera are obtained from the depth map, and a single-frame point cloud map with color information is obtained for the single frame by combining the RGB image with the spatial coordinate system. After the single-frame point cloud map is obtained, pose estimation is performed to find the key frames of adjacent points, and the generated single-frame point cloud maps are fused to obtain the complete three-dimensional point cloud. The three-dimensional point cloud of the next key pose is estimated by the iterative closest point (ICP) algorithm, and whether adjacent key points satisfy the matching relation is judged according to the following formula:
$$E(R, T) = \frac{1}{m} \sum_{i=1}^{m} \left\| y_i - (R\,x_i + T) \right\|^2,$$

wherein $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$ are the two overlapping point clouds respectively, R is the rotation matrix between the two point clouds, and T is the translation vector;
when the obtained mean square error is smaller than the set threshold, the matching is considered successful; if the corresponding image meets the key-frame requirement, it can serve as the next pose point, thereby achieving the purpose of pose estimation and successfully finding the next key frame. Estimation of the camera pose and fusion of the point clouds are thus realized; at the same time, the point cloud coordinates of the moving target are matched with the point cloud coordinate system of the background to obtain the complete three-dimensional point cloud.
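For illustration, the key-frame matching and fusion loop can be sketched with Open3D's ICP registration as the iterative closest point solver; this is a sketch under the assumption that Open3D point clouds are used, and the distance and error thresholds are illustrative:

```python
# Sketch: align the new single-frame cloud to the accumulated cloud with ICP,
# accept it as a new key pose when the residual error is below a threshold,
# then merge the clouds.
import numpy as np
import open3d as o3d

def fuse_frame(accumulated, new_frame, dist=0.05, rmse_thresh=0.02):
    result = o3d.pipelines.registration.registration_icp(
        new_frame, accumulated, dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    if result.inlier_rmse < rmse_thresh:            # matching considered successful
        new_frame.transform(result.transformation)  # apply estimated pose (R, T)
        accumulated += new_frame                    # merge point clouds
    return accumulated
```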
In conclusion, the invention reconstructs the substation and the operators in three dimensions simultaneously and, taking the static substation three-dimensional coordinate system as the reference, realizes the matched combination of the dynamic operator point cloud and the static substation point cloud. A target detection algorithm based on the RGB image alone cannot obtain distance information; here depth information is obtained in the three-dimensional reconstruction process, and an accurate distance calculation can be obtained while operators are detected, which greatly facilitates operation safety control in the substation and prevents operators from mistakenly entering a dangerous area or getting too close to dangerous live equipment.

Claims (5)

1. A binocular vision-based three-dimensional reconstruction method for substation operators, characterized by comprising the following steps in sequence:
(1) Perform monocular calibration on each monocular camera and, after storing the obtained monocular parameters, calibrate the binocular camera, namely calculate the relative position between the two monocular cameras;
(2) The binocular camera collects images in real time; video recognition is performed with the yolov5 target detection algorithm to track substation operators, and the target coordinates recognized by the yolov5 target detection algorithm are converted into the camera coordinate system;
(3) The images collected by the binocular camera in real time are recognized by the yolov5 target detection algorithm, stereo matching is performed, and the disparity value is calculated;
(4) Depth information is calculated; the depth map formed from the depth information is fused with the RGB (red, green and blue) image used for disparity calculation to generate a single-frame point cloud map, the generated single-frame point cloud maps are fused to obtain a complete three-dimensional point cloud, and a real-time three-dimensional image is constructed through pose transformation.
2. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (1) specifically comprises the following steps:
first, the intrinsic parameters of the monocular camera are solved through a homography matrix. Let the homogeneous coordinate of a point in space be

$$\tilde{M} = [X_w, Y_w, Z_w, 1]^T$$

and the homogeneous coordinate of the corresponding point in the image be

$$\tilde{m} = [u, v, 1]^T.$$

Expressed in the homogeneous coordinate system, the projection is

$$s\,\tilde{m} = A\,[R\ \ t]\,\tilde{M},$$

wherein A is the intrinsic (internal reference) matrix of the monocular camera and [R t] is its extrinsic (external reference) matrix. For points on the planar calibration target (taking $Z_w = 0$), this is converted into

$$s\,\tilde{m} = H\,[X_w,\ Y_w,\ 1]^T, \qquad H = A\,[r_1\ \ r_2\ \ t],$$

wherein H is the homography matrix to be solved and $r_1, r_2$ are the first two columns of R. H is solved from the simultaneous equations formed by four or more points in space; after the homography matrix H is obtained, the intrinsic matrix A and the extrinsic matrix [R t] are computed from it;
after the monocular camera intrinsic and extrinsic parameters are obtained, the relative position of the two monocular cameras is solved. Let the rotation and translation matrices of the two monocular cameras be $R_l, T_l$ and $R_r, T_r$ respectively, and take any point $P(X_w, Y_w, Z_w)$ in space, whose coordinates in the two monocular camera coordinate systems are $(X_l, Y_l, Z_l)$ and $(X_r, Y_r, Z_r)$. According to the conversion relation between the monocular camera coordinate system and the three-dimensional world coordinate system, the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_l, Y_l, Z_l)$ satisfy

$$\begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} = R_l \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_l,$$

and the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_r, Y_r, Z_r)$ satisfy

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R_r \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_r.$$

Solving the two jointly gives

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R \begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} + T, \qquad R = R_r R_l^{-1}, \quad T = T_r - R\,T_l,$$

wherein R is the rotation matrix of one monocular camera relative to the other and T is the translation matrix; the relative position between the two monocular cameras is obtained through calculation according to this formula.
3. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (2) specifically comprises the following steps:
substation operators are tracked and identified with the yolov5 target detection algorithm. Before the yolov5 target detection algorithm is applied, prior boxes are selected for the targets in the images collected by the binocular camera in real time using the K-means clustering method, specifically as follows: first, the pre-selected boxes are screened with confidence thresholds, filtering out pre-selected boxes with low confidence; then a non-maximum suppression algorithm further screens the remaining pre-selected boxes; finally, the final target detection boxes are obtained and the next step, recognition by the yolov5 target detection algorithm, begins;
the loss function of the yolov5 target detection algorithm is divided into three parts: the first term is the target localization loss, the second term is the confidence loss, and the third term is the classification loss:

$$Loss = L_{loc} + L_{conf} + L_{cls},$$

wherein the target localization loss is

$$L_{loc} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right],$$

the confidence loss is

$$L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2,$$

and the classification loss is

$$L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c) - \hat{p}_i(c)\right)^2,$$

wherein S denotes the grid size and B the number of boxes per grid cell; λ is a weight coefficient; x, y denote the box center coordinates and w, h its width and height; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th anchor of the i-th grid cell is responsible for this target; $\hat{C}_i$ denotes the parameter confidence; and $\hat{p}_i(c)$ denotes the class probability;
meanwhile, an attention mechanism is added to the yolov5 target detection algorithm to fuse the personnel features. Let $x_{ij}^{n \to l}$ denote the feature vector at position (i, j) on the feature map adjusted from the n-th layer to the l-th layer; the features are fused as

$$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \to l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \to l},$$

wherein $y_{ij}^{l}$ denotes the feature vector at the (i, j)-th position on the output feature map channel, and $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, $\gamma_{ij}^{l}$ are the attention weights of the 3 different layers for layer l on the feature map; they are simple scalar variables that can be shared across all channels, set such that

$$\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1, \qquad \alpha_{ij}^{l}, \beta_{ij}^{l}, \gamma_{ij}^{l} \in [0, 1].$$
4. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (3) specifically comprises the following steps: real-time three-dimensional reconstruction requires extracting features from the RGB images, and binocular stereo matching is performed after the feature points are extracted;
for any pixel point p in the images collected by the binocular camera in real time, its Hessian matrix at scale σ takes the form

$$H(p, \sigma) = \begin{bmatrix} L_{xx}(p, \sigma) & L_{xy}(p, \sigma) \\ L_{xy}(p, \sigma) & L_{yy}(p, \sigma) \end{bmatrix},$$

wherein $L_{xx}$, $L_{xy}$, $L_{yy}$ are second-order derivatives of the Gaussian-smoothed image; the value of the discriminant of the Hessian matrix of the pixel point is solved:

$$\det(H) = L_{xx}(p, \sigma)\,L_{yy}(p, \sigma) - L_{xy}^{2}(p, \sigma);$$
after the value of the discriminant is obtained, a judgment is made: the processed pixel point is compared with its 26 neighboring pixel points across adjacent scale spaces in the image collected by the binocular camera in real time, the maximum or minimum value is selected according to the compared discriminant values, and key points are preliminarily selected; points with large noise are then filtered out, and stable feature points are screened out through three-dimensional linear interpolation as the final feature points;
after the final feature points are determined, stereo matching is carried out on the images acquired by the binocular camera in real time; the stereo matching comprises matching cost calculation, matching cost aggregation, disparity calculation, and disparity optimization;
the matching cost calculation first selects a central pixel in the image and forms a transform window in its neighborhood; the other pixels in the window are then compared with the central pixel: if a pixel's gray value is less than or equal to that of the central pixel, it is recorded as 0, otherwise as 1, as shown in the following formula:

$$\xi(I(p), I(q)) = \begin{cases} 0, & I(q) \le I(p) \\ 1, & I(q) > I(p) \end{cases}$$

wherein ξ(I(p), I(q)) is the comparison function, I(p) is the gray value of the central pixel point, and I(q) is the gray value of a pixel point other than the central pixel point;
after all pixel points have been processed, the resulting 1s and 0s are concatenated in order into a bit string that serves as the matching element for the gray value of the current central pixel point;
after the images of the two monocular cameras have been transformed, the bit strings of corresponding pixel points in the two images are compared and the Hamming distance is calculated; the resulting Hamming distance serves as the matching cost for stereo matching;
matching cost aggregation serves to eliminate noise and optimize the cost matrix; the specific summation expression is

$$C(x, y, d) = \sum_{(i, j) \in W(x, y)} C_0(i, j, d),$$

wherein $C_0(i, j, d)$ is the initial matching cost, W(x, y) is the aggregation window centered at (x, y), and C(x, y, d) is the aggregated matching cost;
after the aggregated matching cost is obtained, the disparity value is calculated, namely the final disparity of pixel point p is determined by the winner-takes-all method; disparity optimization is then performed, namely the final disparity of pixel point p is optimized and denoised.
5. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (4) specifically comprises the following steps:
after extraction of the final feature points of the image is completed, the depth information of the single-frame image is output by calculating the disparity value, and the depth map formed from the depth information is fused with the RGB map and rendered into a single-frame point cloud map;
let two pixel points be p and q respectively; the aggregated cost is

$$C_{d}^{A}(p) = \sum_{q} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}(q),$$

wherein S(p, q) represents the sum of edge weights on the path between the two pixel points, $C_d(q)$ is the matching cost value, and σ is a constant controlling the similarity between pixels;
the computation of cost aggregation in stereo matching is optimized with a segment-tree structure; the aggregated cost of any pixel point p at disparity value d is then obtained by two passes over the tree, first from the leaves to the root,

$$C_{d}^{A\uparrow}(p) = C_{d}(p) + \sum_{q \in Ch(p)} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}^{A\uparrow}(q),$$

and then from the root to the leaves,

$$C_{d}^{A}(p) = \exp\!\left(-\frac{S(P_r(p), p)}{\sigma}\right) C_{d}^{A}(P_r(p)) + \left(1 - \exp\!\left(-\frac{2\,S(P_r(p), p)}{\sigma}\right)\right) C_{d}^{A\uparrow}(p),$$

wherein $P_r(p)$ is the parent node of pixel p on the tree and Ch(p) represents all of its child nodes. The disparities of all pixel points are solved to obtain the disparity map of a single image, and the disparity map is converted into a depth map with the conversion formula

$$Z = \frac{f \cdot B}{d},$$

wherein Z is the depth, f is the camera focal length, B is the optical-center distance (baseline) of the binocular camera, and d represents the disparity value;
the coordinates (x, y, z) of the target relative to the monocular camera are obtained from the depth map, and a single-frame point cloud map with color information is obtained for the single frame by combining the RGB image with the spatial coordinate system. After the single-frame point cloud map is obtained, pose estimation is performed to find the key frames of adjacent points, and the generated single-frame point cloud maps are fused to obtain the complete three-dimensional point cloud. The three-dimensional point cloud of the next key pose is estimated by the iterative closest point (ICP) algorithm, and whether adjacent key points satisfy the matching relation is judged according to the following formula:
$$E(R, T) = \frac{1}{m} \sum_{i=1}^{m} \left\| y_i - (R\,x_i + T) \right\|^2,$$

wherein $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$ are the two overlapping point clouds respectively, R is the rotation matrix between the two point clouds, and T is the translation vector;
when the obtained mean square error is smaller than the set threshold, the matching is considered successful; if the corresponding image meets the key-frame requirement, it can serve as the next pose point, thereby achieving the purpose of pose estimation and successfully finding the next key frame. Estimation of the camera pose and fusion of the point clouds are thus realized; at the same time, the point cloud coordinates of the moving target are matched with the point cloud coordinate system of the background to obtain the complete three-dimensional point cloud.
CN202211164520.7A 2022-09-23 2022-09-23 Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel Pending CN115661337A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211164520.7A CN115661337A (en) 2022-09-23 2022-09-23 Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel


Publications (1)

Publication Number Publication Date
CN115661337A true CN115661337A (en) 2023-01-31

Family

ID=84986127



Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861407A (en) * 2023-02-28 2023-03-28 山东未来网络研究院(紫金山实验室工业互联网创新应用基地) Safe distance detection method and system based on deep learning
CN115861407B (en) * 2023-02-28 2023-06-16 山东未来网络研究院(紫金山实验室工业互联网创新应用基地) Safety distance detection method and system based on deep learning
CN116973939A (en) * 2023-09-25 2023-10-31 中科视语(北京)科技有限公司 Safety monitoring method and device
CN116973939B (en) * 2023-09-25 2024-02-06 中科视语(北京)科技有限公司 Safety monitoring method and device
CN117061720A (en) * 2023-10-11 2023-11-14 广州市大湾区虚拟现实研究院 Stereo image pair generation method based on monocular image and depth image rendering
CN117061720B (en) * 2023-10-11 2024-03-01 广州市大湾区虚拟现实研究院 Stereo image pair generation method based on monocular image and depth image rendering
CN117115145A (en) * 2023-10-19 2023-11-24 宁德思客琦智能装备有限公司 Detection method and device, electronic equipment and computer readable medium
CN117115145B (en) * 2023-10-19 2024-02-09 宁德思客琦智能装备有限公司 Detection method and device, electronic equipment and computer readable medium
CN117422754A (en) * 2023-10-30 2024-01-19 河南送变电建设有限公司 Method for calculating space distance of substation near-electricity operation personnel based on instance segmentation, readable storage medium and electronic equipment
CN117422754B (en) * 2023-10-30 2024-05-28 河南送变电建设有限公司 Method for calculating space distance of substation near-electricity operation personnel based on instance segmentation, readable storage medium and electronic equipment
CN117765174A (en) * 2023-12-19 2024-03-26 内蒙古电力勘测设计院有限责任公司 Three-dimensional reconstruction method, device and equipment based on monocular cradle head camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination