CN115661337A - Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel


Info

Publication number
CN115661337A (application number CN202211164520.7A)
Authority
CN
China
Prior art keywords
point
image
point cloud
monocular
camera
Prior art date
2022-09-23
Legal status
Pending
Application number
CN202211164520.7A
Other languages
Chinese (zh)
Inventor
陆年生
张可
黄文礼
茆骥
王柳
杨建旭
徐沛哲
陈博文
晏雨晴
Current Assignee
Anhui Nanrui Jiyuan Power Grid Technology Co ltd
NARI Group Corp
Original Assignee
Anhui Nanrui Jiyuan Power Grid Technology Co ltd
NARI Group Corp
Priority date
2022-09-23
Filing date
2022-09-23
Publication date
2023-01-31
Application filed by Anhui Nanrui Jiyuan Power Grid Technology Co ltd, NARI Group Corp
Priority to CN202211164520.7A
Publication of CN115661337A

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a binocular vision-based three-dimensional reconstruction method for transformer substation operators, which comprises the following steps: performing monocular calibration on each monocular camera and, after storing the obtained monocular parameters, calibrating the binocular camera; acquiring images in real time and converting target coordinates into the camera coordinate system; performing stereo matching and calculating disparity values; and calculating depth information, fusing the depth map formed from the depth information with the RGB (red, green and blue) image used for disparity calculation to generate a single-frame point cloud map, fusing the generated single-frame point cloud maps to obtain a complete three-dimensional point cloud, and constructing a real-time three-dimensional image through pose transformation. The invention reconstructs the substation and its operators in three dimensions simultaneously and, taking the static substation three-dimensional coordinate system as the reference, realizes the matched combination of the dynamic operator point cloud and the static substation point cloud. Because depth information is obtained in the three-dimensional reconstruction process, accurate distance calculation can be obtained while operators are detected, which facilitates operation safety control in the substation.

Description

Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel
Technical Field
The invention relates to the technical field of three-dimensional modeling, in particular to a binocular vision-based three-dimensional reconstruction method for substation operating personnel.
Background
Workers occasionally enter energized areas by mistake during power operations, and artificial intelligence technology is widely applied to monitoring power-operation behavior, mainly to judge abnormal conditions such as whether a person is wearing a safety helmet, carrying tools, or crossing an electronic fence. To ensure the personal safety of operators, they must keep a certain distance from dangerous live equipment during operation; the State Grid safety regulations specify the safe working distances when equipment is not de-energized: 10 kV and below, 0.7 m; 35 kV, 1.0 m; 110 kV, 1.5 m; 220 kV, 3.0 m; 500 kV, 5.0 m. Because spatial distances estimated on a two-dimensional image are inaccurate, an alarm cannot be raised in time before an operator mistakenly touches energized equipment, so operation accidents occur.
With the development of technologies such as three-dimensional modeling and virtual reality, the informatization, digitization, and intelligent supervision of substations are maturing day by day, and three-dimensional visualization has become a research hotspot. When monocular or RGB-D cameras are used in a vision system, heavy computation is required, the system must be used in specific environments, and the real distance between electrical devices either cannot be calculated or is calculated inaccurately. Binocular vision, by contrast, offers simple computation, relatively unconstrained usage scenarios, and more accurate distance calculation through extraction and conversion of a depth map. Some binocular vision-based substation three-dimensional reconstruction technology exists at present, but it focuses almost entirely on static substation scenes; there are few cases that identify and reconstruct substation operators in real time and calculate, within the three-dimensional reconstruction, the real-time distance between operators and dangerous live equipment.
Disclosure of Invention
The invention aims to provide a binocular vision-based method for three-dimensional reconstruction of substation operators that can model them with high quality, high precision, and high efficiency, providing a good basis for avoiding operation accidents and for intelligent supervision of the substation.
In order to achieve the purpose, the invention adopts the following technical scheme: a binocular vision-based three-dimensional reconstruction method for substation operators comprises the following steps:
(1) Perform monocular calibration on each monocular camera and, after storing the obtained monocular parameters, calibrate the binocular camera, namely calculate the relative position between the two monocular cameras;
(2) The binocular camera collects images in real time; video recognition is performed with the yolov5 target detection algorithm to track substation operators, and the target coordinates recognized by the yolov5 target detection algorithm are converted into the camera coordinate system;
(3) The images collected by the binocular camera in real time are recognized by the yolov5 target detection algorithm, stereo matching is performed, and the disparity value is calculated;
(4) Depth information is calculated; the depth map formed from the depth information is fused with the RGB (red, green and blue) image used for disparity calculation to generate a single-frame point cloud map, the generated single-frame point cloud maps are fused to obtain a complete three-dimensional point cloud, and a real-time three-dimensional image is constructed through pose transformation.
The step (1) specifically comprises the following steps:
First, the intrinsic parameters of the monocular camera are solved through a homography matrix. Let the homogeneous coordinate of a point in space be

$$\tilde{M} = [X_w, Y_w, Z_w, 1]^T$$

and the homogeneous coordinate of the corresponding point in the image be

$$\tilde{m} = [u, v, 1]^T.$$

Expressed in the homogeneous coordinate system, the projection is

$$s\,\tilde{m} = A\,[R\ \ t]\,\tilde{M},$$

wherein A is the intrinsic (internal reference) matrix of the monocular camera and [R t] is its extrinsic (external reference) matrix. For points on the planar calibration target (taking $Z_w = 0$), this is converted into

$$s\,\tilde{m} = H\,[X_w,\ Y_w,\ 1]^T, \qquad H = A\,[r_1\ \ r_2\ \ t],$$

wherein H is the homography matrix to be solved and $r_1, r_2$ are the first two columns of R. H is solved from the simultaneous equations formed by four or more points in space; after the homography matrix H is obtained, the intrinsic matrix A and the extrinsic matrix [R t] are computed from it;
after the monocular camera intrinsic and extrinsic parameters are obtained, the relative position of the two monocular cameras is solved. Let the rotation and translation matrices of the two monocular cameras be $R_l, T_l$ and $R_r, T_r$ respectively, and take any point $P(X_w, Y_w, Z_w)$ in space, whose coordinates in the two monocular camera coordinate systems are $(X_l, Y_l, Z_l)$ and $(X_r, Y_r, Z_r)$. According to the conversion relation between the monocular camera coordinate system and the three-dimensional world coordinate system, the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_l, Y_l, Z_l)$ satisfy

$$\begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} = R_l \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_l,$$

and the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_r, Y_r, Z_r)$ satisfy

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R_r \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_r.$$

Solving the two jointly gives

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R \begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} + T, \qquad R = R_r R_l^{-1}, \quad T = T_r - R\,T_l,$$

wherein R is the rotation matrix of one monocular camera relative to the other and T is the translation matrix; the relative position between the two monocular cameras is obtained through calculation according to this formula.
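By way of non-limiting illustration, the relative-pose formula above can be sketched in a few lines of Python; the world-to-camera convention $X_c = R X_w + T$ and the NumPy representation are assumptions of this sketch, not requirements of the method:

```python
# Minimal sketch, assuming extrinsics follow X_c = R @ X_w + T (world to camera).
import numpy as np

def relative_pose(R_l, T_l, R_r, T_r):
    """Return (R, T) such that X_r = R @ X_l + T."""
    R = R_r @ R_l.T        # R_l is orthonormal, so inv(R_l) equals R_l.T
    T = T_r - R @ T_l      # T = T_r - R * T_l, matching the formula above
    return R, T
```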
The step (2) specifically comprises the following steps:
substation operators are tracked and identified with the yolov5 target detection algorithm. Before the yolov5 target detection algorithm is applied, prior boxes are selected for the targets in the images collected by the binocular camera in real time using the K-means clustering method, specifically as follows: first, the pre-selected boxes are screened with confidence thresholds, filtering out pre-selected boxes with low confidence; then a non-maximum suppression algorithm further screens the remaining pre-selected boxes; finally, the final target detection boxes are obtained and the next step, recognition by the yolov5 target detection algorithm, begins;
the loss function of the yolov5 target detection algorithm is divided into three parts: the first term is the target localization loss, the second term is the confidence loss, and the third term is the classification loss:

$$Loss = L_{loc} + L_{conf} + L_{cls},$$

wherein the target localization loss is

$$L_{loc} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right],$$

the confidence loss is

$$L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2,$$

and the classification loss is

$$L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c) - \hat{p}_i(c)\right)^2,$$

wherein S denotes the grid size and B the number of boxes per grid cell; λ is a weight coefficient; x, y denote the box center coordinates and w, h its width and height; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th anchor of the i-th grid cell is responsible for this target; $\hat{C}_i$ denotes the parameter confidence; and $\hat{p}_i(c)$ denotes the class probability;
meanwhile, an attention mechanism is added to the yolov5 target detection algorithm to fuse the personnel features. Let $x_{ij}^{n \to l}$ denote the feature vector at position (i, j) on the feature map adjusted from the n-th layer to the l-th layer; the features are fused as

$$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \to l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \to l},$$

wherein $y_{ij}^{l}$ denotes the feature vector at the (i, j)-th position on the output feature map channel, and $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, $\gamma_{ij}^{l}$ are the attention weights of the 3 different layers for layer l on the feature map; they are simple scalar variables that can be shared across all channels, set such that

$$\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1, \qquad \alpha_{ij}^{l}, \beta_{ij}^{l}, \gamma_{ij}^{l} \in [0, 1].$$
the step (3) specifically comprises the following steps: the real-time three-dimensional reconstruction needs to extract the characteristics of RBG pictures, and binocular stereo matching is carried out after the characteristic points are extracted;
for any pixel point p in the images collected by the binocular camera in real time, its Hessian matrix at scale σ takes the form

$$H(p, \sigma) = \begin{bmatrix} L_{xx}(p, \sigma) & L_{xy}(p, \sigma) \\ L_{xy}(p, \sigma) & L_{yy}(p, \sigma) \end{bmatrix},$$

wherein $L_{xx}$, $L_{xy}$, $L_{yy}$ are second-order derivatives of the Gaussian-smoothed image; the value of the discriminant of the Hessian matrix of the pixel point is then solved:

$$\det(H) = L_{xx}(p, \sigma)\,L_{yy}(p, \sigma) - L_{xy}^{2}(p, \sigma);$$
after the value of the discriminant is obtained, a judgment is made: the processed pixel point is compared with its 26 neighboring pixel points across adjacent scale spaces in the image collected by the binocular camera in real time, the maximum or minimum value is selected according to the compared discriminant values, and key points are preliminarily selected; points with large noise are then filtered out, and stable feature points are screened out through three-dimensional linear interpolation as the final feature points;
after the final feature points are determined, stereo matching is carried out on the images acquired by the binocular camera in real time; the stereo matching comprises matching cost calculation, matching cost aggregation, disparity calculation, and disparity optimization;
the matching cost calculation first selects a central pixel in the image and forms a transform window in its neighborhood; the remaining pixels in the window are then compared with the central pixel: if a pixel's gray value is less than or equal to that of the central pixel, it is recorded as 0, otherwise as 1, as shown in the following formula:

$$\xi(I(p), I(q)) = \begin{cases} 0, & I(q) \le I(p) \\ 1, & I(q) > I(p) \end{cases}$$

wherein ξ(I(p), I(q)) is the comparison function, I(p) is the gray value of the central pixel point, and I(q) is the gray value of a pixel point other than the central pixel point;
after all pixel points have been processed, the resulting 1s and 0s are concatenated in order into a bit string that serves as the matching element for the gray value of the current central pixel point;
after the images of the two monocular cameras have been transformed, the bit strings of corresponding pixel points in the two images are compared and the Hamming distance is calculated; the resulting Hamming distance serves as the matching cost for stereo matching;
matching cost aggregation serves to eliminate noise and optimize the cost matrix; the specific summation expression is

$$C(x, y, d) = \sum_{(i, j) \in W(x, y)} C_0(i, j, d),$$

wherein $C_0(i, j, d)$ is the initial matching cost, W(x, y) is the aggregation window centered at (x, y), and C(x, y, d) is the aggregated matching cost;
after the aggregated matching cost is obtained, the disparity value is calculated, namely the final disparity of pixel point p is determined by the winner-takes-all method; disparity optimization is then performed, namely the final disparity of pixel point p is optimized and denoised.
The step (4) specifically comprises the following steps:
after extraction of the final feature points of the image is completed, the depth information of the single-frame image is output by calculating the disparity value, and the depth map formed from the depth information is fused with the RGB map and rendered into a single-frame point cloud map;
let two pixel points be p and q respectively; the aggregated cost is

$$C_{d}^{A}(p) = \sum_{q} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}(q),$$

wherein S(p, q) represents the sum of edge weights on the path between the two pixel points, $C_d(q)$ is the matching cost value, and σ is a constant controlling the similarity between pixels;
the computation of cost aggregation in stereo matching is optimized with a segment-tree structure; the aggregated cost of any pixel point p at disparity value d is then obtained by two passes over the tree, first from the leaves to the root,

$$C_{d}^{A\uparrow}(p) = C_{d}(p) + \sum_{q \in Ch(p)} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}^{A\uparrow}(q),$$

and then from the root to the leaves,

$$C_{d}^{A}(p) = \exp\!\left(-\frac{S(P_r(p), p)}{\sigma}\right) C_{d}^{A}(P_r(p)) + \left(1 - \exp\!\left(-\frac{2\,S(P_r(p), p)}{\sigma}\right)\right) C_{d}^{A\uparrow}(p),$$

wherein $P_r(p)$ is the parent node of pixel p on the tree and Ch(p) represents all of its child nodes. The disparities of all pixel points are solved to obtain the disparity map of a single image, and the disparity map is converted into a depth map with the conversion formula

$$Z = \frac{f \cdot B}{d},$$

wherein Z is the depth, f is the camera focal length, B is the optical-center distance (baseline) of the binocular camera, and d represents the disparity value;
the coordinates (x, y, z) of the target relative to the monocular camera are obtained from the depth map, and a single-frame point cloud map with color information is obtained for the single frame by combining the RGB image with the spatial coordinate system. After the single-frame point cloud map is obtained, pose estimation is performed to find the key frames of adjacent points, and the generated single-frame point cloud maps are fused to obtain the complete three-dimensional point cloud. The three-dimensional point cloud of the next key pose is estimated by the iterative closest point (ICP) algorithm, and whether adjacent key points satisfy the matching relation is judged according to the following formula:
$$E(R, T) = \frac{1}{m} \sum_{i=1}^{m} \left\| y_i - (R\,x_i + T) \right\|^2,$$

wherein $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$ are the two overlapping point clouds respectively, R is the rotation matrix between the two point clouds, and T is the translation vector;
when the obtained mean square error is smaller than the set threshold, the matching is considered successful; if the corresponding image meets the key-frame requirement, it can serve as the next pose point, thereby achieving the purpose of pose estimation and successfully finding the next key frame. Estimation of the camera pose and fusion of the point clouds are thus realized; at the same time, the point cloud coordinates of the moving target are matched with the point cloud coordinate system of the background to obtain the complete three-dimensional point cloud.
According to the above technical scheme, the beneficial effects of the invention are as follows. First, in the electric power field, three-dimensional reconstruction is at present basically performed on static scenes, and detection of substation operators has not been combined with three-dimensional reconstruction technology; the invention reconstructs the substation and the operators in three dimensions simultaneously and, taking the static substation three-dimensional coordinate system as the reference, realizes the matched combination of the dynamic operator point cloud and the static substation point cloud. Second, a target detection algorithm based on the RGB image alone cannot obtain distance information; here depth information is obtained in the three-dimensional reconstruction process, so an accurate distance calculation can be obtained while operators are detected, which greatly facilitates operation safety control in the substation and prevents operators from mistakenly entering a dangerous area or getting too close to dangerous live equipment.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a flow chart of calibration of a binocular camera;
FIG. 3 is a network architecture diagram of a video image based object detection algorithm;
FIG. 4 is a flow chart of converting a depth map into a three-dimensional point cloud map.
Detailed Description
As shown in FIG. 1, a binocular vision-based three-dimensional reconstruction method for substation operators comprises the following steps:
(1) Perform monocular calibration on each monocular camera and, after storing the obtained monocular parameters, calibrate the binocular camera, namely calculate the relative position between the two monocular cameras;
(2) The binocular camera collects images in real time; video recognition is performed with the yolov5 target detection algorithm to track substation operators, and the target coordinates recognized by the yolov5 target detection algorithm are converted into the camera coordinate system;
(3) The images collected by the binocular camera in real time are recognized by the yolov5 target detection algorithm, stereo matching is performed, and the disparity value is calculated;
(4) Depth information is calculated; the depth map formed from the depth information is fused with the RGB (red, green and blue) image used for disparity calculation to generate a single-frame point cloud map, the generated single-frame point cloud maps are fused to obtain a complete three-dimensional point cloud, and a real-time three-dimensional image is constructed through pose transformation.
As shown in fig. 2, the step (1) specifically includes:
First, the intrinsic parameters of the monocular camera are solved through a homography matrix. Let the homogeneous coordinate of a point in space be

$$\tilde{M} = [X_w, Y_w, Z_w, 1]^T$$

and the homogeneous coordinate of the corresponding point in the image be

$$\tilde{m} = [u, v, 1]^T.$$

Expressed in the homogeneous coordinate system, the projection is

$$s\,\tilde{m} = A\,[R\ \ t]\,\tilde{M},$$

wherein A is the intrinsic (internal reference) matrix of the monocular camera and [R t] is its extrinsic (external reference) matrix. For points on the planar calibration target (taking $Z_w = 0$), this is converted into

$$s\,\tilde{m} = H\,[X_w,\ Y_w,\ 1]^T, \qquad H = A\,[r_1\ \ r_2\ \ t],$$

wherein H is the homography matrix to be solved and $r_1, r_2$ are the first two columns of R. H is solved from the simultaneous equations formed by four or more points in space; after the homography matrix H is obtained, the intrinsic matrix A and the extrinsic matrix [R t] are computed from it;
after the intrinsic and extrinsic parameters of the monocular cameras are obtained, the relative position of the two monocular cameras is solved. Let the rotation and translation matrices of the two monocular cameras be $R_l, T_l$ and $R_r, T_r$ respectively, and take any point $P(X_w, Y_w, Z_w)$ in space, whose coordinates in the two monocular camera coordinate systems are $(X_l, Y_l, Z_l)$ and $(X_r, Y_r, Z_r)$. According to the conversion relation between the monocular camera coordinate system and the three-dimensional world coordinate system, the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_l, Y_l, Z_l)$ satisfy

$$\begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} = R_l \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_l,$$

and the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_r, Y_r, Z_r)$ satisfy

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R_r \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_r.$$

Solving the two jointly gives

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R \begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} + T, \qquad R = R_r R_l^{-1}, \quad T = T_r - R\,T_l,$$

wherein R is the rotation matrix of one monocular camera relative to the other and T is the translation matrix; the relative position between the two monocular cameras is obtained through calculation according to this formula.
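As an illustrative, non-limiting embodiment of this calibration flow, the monocular and binocular calibration can be sketched with OpenCV's checkerboard routines; the board dimensions, unit square size, and flags below are assumptions of the sketch:

```python
# Sketch: monocular calibration of each camera, then stereo calibration to
# recover the relative pose (R, T). Checkerboard size is an assumption.
import cv2
import numpy as np

PATTERN = (9, 6)  # inner-corner count of the assumed checkerboard
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)  # unit squares

def calibrate_stereo(left_imgs, right_imgs):
    obj_pts, left_pts, right_pts = [], [], []
    for li, ri in zip(left_imgs, right_imgs):
        gl = cv2.cvtColor(li, cv2.COLOR_BGR2GRAY)
        gr = cv2.cvtColor(ri, cv2.COLOR_BGR2GRAY)
        okl, cl = cv2.findChessboardCorners(gl, PATTERN)
        okr, cr = cv2.findChessboardCorners(gr, PATTERN)
        if okl and okr:
            obj_pts.append(objp); left_pts.append(cl); right_pts.append(cr)
    size = gl.shape[::-1]
    # monocular calibration: intrinsic matrix A and distortion for each camera
    _, A1, d1, _, _ = cv2.calibrateCamera(obj_pts, left_pts, size, None, None)
    _, A2, d2, _, _ = cv2.calibrateCamera(obj_pts, right_pts, size, None, None)
    # binocular calibration: relative rotation R and translation T
    _, _, _, _, _, R, T, _, _ = cv2.stereoCalibrate(
        obj_pts, left_pts, right_pts, A1, d1, A2, d2, size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    return A1, d1, A2, d2, R, T
```

Here cv2.stereoCalibrate returns the rotation R and translation T of the second camera relative to the first, i.e., the relative position sought in step (1).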
As shown in fig. 3, the step (2) specifically refers to:
substation operators are tracked and identified with the yolov5 target detection algorithm. Before the yolov5 target detection algorithm is applied, prior boxes are selected for the targets in the images collected by the binocular camera in real time using the K-means clustering method, specifically as follows: first, the pre-selected boxes are screened with confidence thresholds, filtering out pre-selected boxes with low confidence; then a non-maximum suppression algorithm further screens the remaining pre-selected boxes; finally, the final target detection boxes are obtained and the next step, recognition by the yolov5 target detection algorithm, begins;
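For illustration only, the screening described above (confidence thresholding followed by greedy non-maximum suppression) can be sketched as follows; the confidence and IoU thresholds are assumptions:

```python
# Sketch: confidence filtering + greedy non-maximum suppression (NMS).
import numpy as np

def nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    mask = scores >= conf_thresh                 # filter low-confidence boxes
    boxes, scores = boxes[mask], scores[mask]
    idx = np.argsort(scores)[::-1]               # highest confidence first
    kept = []
    while idx.size:
        i = idx[0]
        kept.append(i)
        xx1 = np.maximum(boxes[i, 0], boxes[idx[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[idx[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[idx[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[idx[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[idx[1:], 2] - boxes[idx[1:], 0]) * \
                (boxes[idx[1:], 3] - boxes[idx[1:], 1])
        iou = inter / (area_i + areas - inter)
        idx = idx[1:][iou < iou_thresh]          # suppress overlapping boxes
    return kept
```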
the loss function of the yolov5 target detection algorithm is divided into three parts: the first term is the target localization loss, the second term is the confidence loss, and the third term is the classification loss:

$$Loss = L_{loc} + L_{conf} + L_{cls},$$

wherein the target localization loss is

$$L_{loc} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right],$$

the confidence loss is

$$L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2,$$

and the classification loss is

$$L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c) - \hat{p}_i(c)\right)^2,$$

wherein S denotes the grid size and B the number of boxes per grid cell; λ is a weight coefficient; x, y denote the box center coordinates and w, h its width and height; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th anchor of the i-th grid cell is responsible for this target; $\hat{C}_i$ denotes the parameter confidence; and $\hat{p}_i(c)$ denotes the class probability;
meanwhile, an attention mechanism is added to the yolov5 target detection algorithm to fuse the personnel features. Let $x_{ij}^{n \to l}$ denote the feature vector at position (i, j) on the feature map adjusted from the n-th layer to the l-th layer; the features are fused as

$$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \to l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \to l},$$

wherein $y_{ij}^{l}$ denotes the feature vector at the (i, j)-th position on the output feature map channel, and $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, $\gamma_{ij}^{l}$ are the attention weights of the 3 different layers for layer l on the feature map; they are simple scalar variables that can be shared across all channels, set such that

$$\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1, \qquad \alpha_{ij}^{l}, \beta_{ij}^{l}, \gamma_{ij}^{l} \in [0, 1].$$
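A minimal PyTorch sketch of this fusion step is given below for illustration; the softmax normalization enforcing α + β + γ = 1 matches the constraint above, while the tensor shapes and the source of the weight logits are assumptions:

```python
# Sketch: per-position attention weights blend three feature maps at level l.
import torch
import torch.nn.functional as F

def fuse_levels(x1, x2, x3, w_logits):
    """x1, x2, x3: (N, C, H, W) maps already resized to level l.
    w_logits: (N, 3, H, W) unnormalized attention logits."""
    w = F.softmax(w_logits, dim=1)                        # alpha + beta + gamma = 1
    alpha, beta, gamma = w[:, 0:1], w[:, 1:2], w[:, 2:3]  # broadcast over channels
    return alpha * x1 + beta * x2 + gamma * x3
```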
the step (3) specifically comprises the following steps: the real-time three-dimensional reconstruction needs to extract the characteristics of RBG pictures, and binocular stereo matching is carried out after the characteristic points are extracted;
for any pixel point p in the images collected by the binocular camera in real time, its Hessian matrix at scale σ takes the form

$$H(p, \sigma) = \begin{bmatrix} L_{xx}(p, \sigma) & L_{xy}(p, \sigma) \\ L_{xy}(p, \sigma) & L_{yy}(p, \sigma) \end{bmatrix},$$

wherein $L_{xx}$, $L_{xy}$, $L_{yy}$ are second-order derivatives of the Gaussian-smoothed image; the value of the discriminant of the Hessian matrix of the pixel point is then solved:

$$\det(H) = L_{xx}(p, \sigma)\,L_{yy}(p, \sigma) - L_{xy}^{2}(p, \sigma);$$
after the value of the discriminant is obtained, a judgment is made: the processed pixel point is compared with its 26 neighboring pixel points across adjacent scale spaces in the image collected by the binocular camera in real time, the maximum or minimum value is selected according to the compared discriminant values, and key points are preliminarily selected; points with large noise are then filtered out, and stable feature points are screened out through three-dimensional linear interpolation as the final feature points;
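For illustration only, a Hessian-based scale-space detector of this kind is available as SURF in opencv-contrib-python; this sketch is a stand-in for the detector described here, not necessarily the exact implementation, and the threshold value is an assumption:

```python
# Sketch using the SURF detector from opencv-contrib-python (assumption: the
# contrib package with its non-free module is available). The Hessian
# threshold filters out weak, noisy responses, as in the text above.
import cv2

def detect_keypoints(gray):
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    keypoints, descriptors = surf.detectAndCompute(gray, None)
    return keypoints, descriptors
```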
after the final feature points are determined, stereo matching is carried out on the images acquired by the binocular camera in real time; the stereo matching comprises matching cost calculation, matching cost aggregation, disparity calculation, and disparity optimization;
the matching cost calculation first selects a central pixel in the image and forms a transform window in its neighborhood; the remaining pixels in the window are then compared with the central pixel: if a pixel's gray value is less than or equal to that of the central pixel, it is recorded as 0, otherwise as 1, as shown in the following formula:

$$\xi(I(p), I(q)) = \begin{cases} 0, & I(q) \le I(p) \\ 1, & I(q) > I(p) \end{cases}$$

wherein ξ(I(p), I(q)) is the comparison function, I(p) is the gray value of the central pixel point, and I(q) is the gray value of a pixel point other than the central pixel point;
after all pixel points have been processed, the resulting 1s and 0s are concatenated in order into a bit string that serves as the matching element for the gray value of the current central pixel point;
after the images of the two monocular cameras have been transformed, the bit strings of corresponding pixel points in the two images are compared and the Hamming distance is calculated; the resulting Hamming distance serves as the matching cost for stereo matching;
matching cost aggregation serves to eliminate noise and optimize the cost matrix; the specific summation expression is

$$C(x, y, d) = \sum_{(i, j) \in W(x, y)} C_0(i, j, d),$$

wherein $C_0(i, j, d)$ is the initial matching cost, W(x, y) is the aggregation window centered at (x, y), and C(x, y, d) is the aggregated matching cost;
after the aggregated matching cost is obtained, the disparity value is calculated, namely the final disparity of pixel point p is determined by the winner-takes-all method; disparity optimization is then performed, namely the final disparity of pixel point p is optimized and denoised.
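The census/Hamming/winner-takes-all chain of step (3) can be sketched in NumPy as follows; the window radius, disparity range, and the box-filter aggregation (a simple stand-in for the tree-based aggregation described later, using SciPy's uniform_filter) are assumptions:

```python
# Sketch: census transform -> Hamming cost -> aggregation -> WTA disparity.
import numpy as np
from scipy.ndimage import uniform_filter

def census(img, r=3):
    """Census bit string per pixel: 1 where a neighbour is brighter than centre."""
    H, W = img.shape
    bits = np.zeros((H, W, (2 * r + 1) ** 2 - 1), dtype=bool)
    k = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            bits[:, :, k] = shifted > img
            k += 1
    return bits

def wta_disparity(left, right, max_d=64, win=5):
    cl, cr = census(left), census(right)
    H, W = left.shape
    n_bits = cl.shape[2]
    cost = np.full((H, W, max_d), float(n_bits))   # worst-case cost as padding
    for d in range(max_d):
        # Hamming distance between corresponding census strings at disparity d
        cost[:, d:, d] = np.sum(cl[:, d:, :] != cr[:, :W - d, :], axis=2)
    for d in range(max_d):
        cost[:, :, d] = uniform_filter(cost[:, :, d], size=win)  # aggregation
    return np.argmin(cost, axis=2)                 # winner-takes-all
```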
As shown in fig. 4, the step (4) specifically includes:
after extraction of the final feature points of the image is completed, the depth information of the single-frame image is output by calculating the disparity value, and the depth map formed from the depth information is fused with the RGB map and rendered into a single-frame point cloud map;
let two pixel points be p and q respectively; the aggregated cost is

$$C_{d}^{A}(p) = \sum_{q} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}(q),$$

wherein S(p, q) represents the sum of edge weights on the path between the two pixel points, $C_d(q)$ is the matching cost value, and σ is a constant controlling the similarity between pixels;
the computation of cost aggregation in stereo matching is optimized with a segment-tree structure; the aggregated cost of any pixel point p(x, y) at disparity value d is then obtained by two passes over the tree, first from the leaves to the root,

$$C_{d}^{A\uparrow}(p) = C_{d}(p) + \sum_{q \in Ch(p)} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}^{A\uparrow}(q),$$

and then from the root to the leaves,

$$C_{d}^{A}(p) = \exp\!\left(-\frac{S(P_r(p), p)}{\sigma}\right) C_{d}^{A}(P_r(p)) + \left(1 - \exp\!\left(-\frac{2\,S(P_r(p), p)}{\sigma}\right)\right) C_{d}^{A\uparrow}(p),$$

wherein $P_r(p)$ is the parent node of pixel p on the tree and Ch(p) represents all of its child nodes. The disparities of all pixel points are solved to obtain the disparity map of a single image, and the disparity map is converted into a depth map with the conversion formula

$$Z = \frac{f \cdot B}{d},$$

wherein Z is the depth, f is the camera focal length, B is the optical-center distance (baseline) of the binocular camera, and d represents the disparity value;
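As an illustrative sketch of the conversion Z = f·B/d and the subsequent back-projection to a colored single-frame point cloud (the pinhole model with principal point (cx, cy) is an assumption of the sketch):

```python
# Sketch: disparity -> depth via Z = f*B/d, then pinhole back-projection of
# each valid pixel to camera coordinates (x, y, z) with its RGB colour.
import numpy as np

def disparity_to_pointcloud(disp, rgb, f, B, cx, cy):
    H, W = disp.shape
    us, vs = np.meshgrid(np.arange(W), np.arange(H))
    valid = disp > 0                              # zero disparity has no depth
    Z = np.where(valid, f * B / np.maximum(disp, 1e-6), 0.0)
    X = (us - cx) * Z / f                         # pinhole back-projection
    Y = (vs - cy) * Z / f
    pts = np.stack([X[valid], Y[valid], Z[valid]], axis=1)
    colors = rgb[valid]                           # (N, 3) colour per point
    return pts, colors
```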
the coordinates (x, y, z) of the target relative to the monocular camera are obtained from the depth map, and a single-frame point cloud map with color information is obtained for the single frame by combining the RGB image with the spatial coordinate system. After the single-frame point cloud map is obtained, pose estimation is performed to find the key frames of adjacent points, and the generated single-frame point cloud maps are fused to obtain the complete three-dimensional point cloud. The three-dimensional point cloud of the next key pose is estimated by the iterative closest point (ICP) algorithm, and whether adjacent key points satisfy the matching relation is judged according to the following formula:
$$E(R, T) = \frac{1}{m} \sum_{i=1}^{m} \left\| y_i - (R\,x_i + T) \right\|^2,$$

wherein $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$ are the two overlapping point clouds respectively, R is the rotation matrix between the two point clouds, and T is the translation vector;
when the obtained mean square error is smaller than the set threshold, the matching is considered successful; if the corresponding image meets the key-frame requirement, it can serve as the next pose point, thereby achieving the purpose of pose estimation and successfully finding the next key frame. Estimation of the camera pose and fusion of the point clouds are thus realized; at the same time, the point cloud coordinates of the moving target are matched with the point cloud coordinate system of the background to obtain the complete three-dimensional point cloud.
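For illustration, the key-frame matching and fusion loop can be sketched with Open3D's ICP registration as the iterative closest point solver; this is a sketch under the assumption that Open3D point clouds are used, and the distance and error thresholds are illustrative:

```python
# Sketch: align the new single-frame cloud to the accumulated cloud with ICP,
# accept it as a new key pose when the residual error is below a threshold,
# then merge the clouds.
import numpy as np
import open3d as o3d

def fuse_frame(accumulated, new_frame, dist=0.05, rmse_thresh=0.02):
    result = o3d.pipelines.registration.registration_icp(
        new_frame, accumulated, dist, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    if result.inlier_rmse < rmse_thresh:            # matching considered successful
        new_frame.transform(result.transformation)  # apply estimated pose (R, T)
        accumulated += new_frame                    # merge point clouds
    return accumulated
```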
In conclusion, the invention reconstructs the substation and the operators in three dimensions simultaneously and, taking the static substation three-dimensional coordinate system as the reference, realizes the matched combination of the dynamic operator point cloud and the static substation point cloud. A target detection algorithm based on the RGB image alone cannot obtain distance information; here depth information is obtained in the three-dimensional reconstruction process, and an accurate distance calculation can be obtained while operators are detected, which greatly facilitates operation safety control in the substation and prevents operators from mistakenly entering a dangerous area or getting too close to dangerous live equipment.

Claims (5)

1. A binocular vision-based three-dimensional reconstruction method for substation operators, characterized by comprising the following steps in sequence:
(1) Perform monocular calibration on each monocular camera and, after storing the obtained monocular parameters, calibrate the binocular camera, namely calculate the relative position between the two monocular cameras;
(2) The binocular camera collects images in real time; video recognition is performed with the yolov5 target detection algorithm to track substation operators, and the target coordinates recognized by the yolov5 target detection algorithm are converted into the camera coordinate system;
(3) The images collected by the binocular camera in real time are recognized by the yolov5 target detection algorithm, stereo matching is performed, and the disparity value is calculated;
(4) Depth information is calculated; the depth map formed from the depth information is fused with the RGB (red, green and blue) image used for disparity calculation to generate a single-frame point cloud map, the generated single-frame point cloud maps are fused to obtain a complete three-dimensional point cloud, and a real-time three-dimensional image is constructed through pose transformation.
2. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (1) specifically comprises the following steps:
first, the intrinsic parameters of the monocular camera are solved through a homography matrix. Let the homogeneous coordinate of a point in space be

$$\tilde{M} = [X_w, Y_w, Z_w, 1]^T$$

and the homogeneous coordinate of the corresponding point in the image be

$$\tilde{m} = [u, v, 1]^T.$$

Expressed in the homogeneous coordinate system, the projection is

$$s\,\tilde{m} = A\,[R\ \ t]\,\tilde{M},$$

wherein A is the intrinsic (internal reference) matrix of the monocular camera and [R t] is its extrinsic (external reference) matrix. For points on the planar calibration target (taking $Z_w = 0$), this is converted into

$$s\,\tilde{m} = H\,[X_w,\ Y_w,\ 1]^T, \qquad H = A\,[r_1\ \ r_2\ \ t],$$

wherein H is the homography matrix to be solved and $r_1, r_2$ are the first two columns of R. H is solved from the simultaneous equations formed by four or more points in space; after the homography matrix H is obtained, the intrinsic matrix A and the extrinsic matrix [R t] are computed from it;
after the monocular camera intrinsic and extrinsic parameters are obtained, the relative position of the two monocular cameras is solved. Let the rotation and translation matrices of the two monocular cameras be $R_l, T_l$ and $R_r, T_r$ respectively, and take any point $P(X_w, Y_w, Z_w)$ in space, whose coordinates in the two monocular camera coordinate systems are $(X_l, Y_l, Z_l)$ and $(X_r, Y_r, Z_r)$. According to the conversion relation between the monocular camera coordinate system and the three-dimensional world coordinate system, the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_l, Y_l, Z_l)$ satisfy

$$\begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} = R_l \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_l,$$

and the homogeneous forms of $(X_w, Y_w, Z_w)$ and $(X_r, Y_r, Z_r)$ satisfy

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R_r \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T_r.$$

Solving the two jointly gives

$$\begin{bmatrix} X_r \\ Y_r \\ Z_r \end{bmatrix} = R \begin{bmatrix} X_l \\ Y_l \\ Z_l \end{bmatrix} + T, \qquad R = R_r R_l^{-1}, \quad T = T_r - R\,T_l,$$

wherein R is the rotation matrix of one monocular camera relative to the other and T is the translation matrix; the relative position between the two monocular cameras is obtained through calculation according to this formula.
3. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (2) specifically comprises the following steps:
substation operators are tracked and identified with the yolov5 target detection algorithm. Before the yolov5 target detection algorithm is applied, prior boxes are selected for the targets in the images collected by the binocular camera in real time using the K-means clustering method, specifically as follows: first, the pre-selected boxes are screened with confidence thresholds, filtering out pre-selected boxes with low confidence; then a non-maximum suppression algorithm further screens the remaining pre-selected boxes; finally, the final target detection boxes are obtained and the next step, recognition by the yolov5 target detection algorithm, begins;
the loss function of the yolov5 target detection algorithm is divided into three parts: the first term is the target localization loss, the second term is the confidence loss, and the third term is the classification loss:

$$Loss = L_{loc} + L_{conf} + L_{cls},$$

wherein the target localization loss is

$$L_{loc} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right],$$

the confidence loss is

$$L_{conf} = \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2,$$

and the classification loss is

$$L_{cls} = \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c) - \hat{p}_i(c)\right)^2,$$

wherein S denotes the grid size and B the number of boxes per grid cell; λ is a weight coefficient; x, y denote the box center coordinates and w, h its width and height; $\mathbb{1}_{ij}^{obj}$ indicates whether the j-th anchor of the i-th grid cell is responsible for this target; $\hat{C}_i$ denotes the parameter confidence; and $\hat{p}_i(c)$ denotes the class probability;
meanwhile, an attention mechanism is added to the yolov5 target detection algorithm to fuse the personnel features. Let $x_{ij}^{n \to l}$ denote the feature vector at position (i, j) on the feature map adjusted from the n-th layer to the l-th layer; the features are fused as

$$y_{ij}^{l} = \alpha_{ij}^{l} \cdot x_{ij}^{1 \to l} + \beta_{ij}^{l} \cdot x_{ij}^{2 \to l} + \gamma_{ij}^{l} \cdot x_{ij}^{3 \to l},$$

wherein $y_{ij}^{l}$ denotes the feature vector at the (i, j)-th position on the output feature map channel, and $\alpha_{ij}^{l}$, $\beta_{ij}^{l}$, $\gamma_{ij}^{l}$ are the attention weights of the 3 different layers for layer l on the feature map; they are simple scalar variables that can be shared across all channels, set such that

$$\alpha_{ij}^{l} + \beta_{ij}^{l} + \gamma_{ij}^{l} = 1, \qquad \alpha_{ij}^{l}, \beta_{ij}^{l}, \gamma_{ij}^{l} \in [0, 1].$$
4. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (3) specifically comprises the following steps: real-time three-dimensional reconstruction requires extracting features from the RGB images, and binocular stereo matching is performed after the feature points are extracted;
for any pixel point p in the images collected by the binocular camera in real time, its Hessian matrix at scale σ takes the form

$$H(p, \sigma) = \begin{bmatrix} L_{xx}(p, \sigma) & L_{xy}(p, \sigma) \\ L_{xy}(p, \sigma) & L_{yy}(p, \sigma) \end{bmatrix},$$

wherein $L_{xx}$, $L_{xy}$, $L_{yy}$ are second-order derivatives of the Gaussian-smoothed image; the value of the discriminant of the Hessian matrix of the pixel point is solved:

$$\det(H) = L_{xx}(p, \sigma)\,L_{yy}(p, \sigma) - L_{xy}^{2}(p, \sigma);$$
after the value of the discriminant is obtained, a judgment is made: the processed pixel point is compared with its 26 neighboring pixel points across adjacent scale spaces in the image collected by the binocular camera in real time, the maximum or minimum value is selected according to the compared discriminant values, and key points are preliminarily selected; points with large noise are then filtered out, and stable feature points are screened out through three-dimensional linear interpolation as the final feature points;
after the final feature points are determined, stereo matching is carried out on the images acquired by the binocular camera in real time; the stereo matching comprises matching cost calculation, matching cost aggregation, disparity calculation, and disparity optimization;
the matching cost calculation first selects a central pixel in the image and forms a transform window in its neighborhood; the other pixels in the window are then compared with the central pixel: if a pixel's gray value is less than or equal to that of the central pixel, it is recorded as 0, otherwise as 1, as shown in the following formula:

$$\xi(I(p), I(q)) = \begin{cases} 0, & I(q) \le I(p) \\ 1, & I(q) > I(p) \end{cases}$$

wherein ξ(I(p), I(q)) is the comparison function, I(p) is the gray value of the central pixel point, and I(q) is the gray value of a pixel point other than the central pixel point;
after all pixel points have been processed, the resulting 1s and 0s are concatenated in order into a bit string that serves as the matching element for the gray value of the current central pixel point;
after the images of the two monocular cameras have been transformed, the bit strings of corresponding pixel points in the two images are compared and the Hamming distance is calculated; the resulting Hamming distance serves as the matching cost for stereo matching;
matching cost aggregation serves to eliminate noise and optimize the cost matrix; the specific summation expression is

$$C(x, y, d) = \sum_{(i, j) \in W(x, y)} C_0(i, j, d),$$

wherein $C_0(i, j, d)$ is the initial matching cost, W(x, y) is the aggregation window centered at (x, y), and C(x, y, d) is the aggregated matching cost;
after the aggregated matching cost is obtained, the disparity value is calculated, namely the final disparity of pixel point p is determined by the winner-takes-all method; disparity optimization is then performed, namely the final disparity of pixel point p is optimized and denoised.
5. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (4) specifically comprises the following steps:
after extraction of the final feature points of the image is completed, the depth information of the single-frame image is output by calculating the disparity value, and the depth map formed from the depth information is fused with the RGB map and rendered into a single-frame point cloud map;
let two pixel points be p and q respectively; the aggregated cost is

$$C_{d}^{A}(p) = \sum_{q} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}(q),$$

wherein S(p, q) represents the sum of edge weights on the path between the two pixel points, $C_d(q)$ is the matching cost value, and σ is a constant controlling the similarity between pixels;
the computation of cost aggregation in stereo matching is optimized with a segment-tree structure; the aggregated cost of any pixel point p at disparity value d is then obtained by two passes over the tree, first from the leaves to the root,

$$C_{d}^{A\uparrow}(p) = C_{d}(p) + \sum_{q \in Ch(p)} \exp\!\left(-\frac{S(p, q)}{\sigma}\right) C_{d}^{A\uparrow}(q),$$

and then from the root to the leaves,

$$C_{d}^{A}(p) = \exp\!\left(-\frac{S(P_r(p), p)}{\sigma}\right) C_{d}^{A}(P_r(p)) + \left(1 - \exp\!\left(-\frac{2\,S(P_r(p), p)}{\sigma}\right)\right) C_{d}^{A\uparrow}(p),$$

wherein $P_r(p)$ is the parent node of pixel p on the tree and Ch(p) represents all of its child nodes. The disparities of all pixel points are solved to obtain the disparity map of a single image, and the disparity map is converted into a depth map with the conversion formula

$$Z = \frac{f \cdot B}{d},$$

wherein Z is the depth, f is the camera focal length, B is the optical-center distance (baseline) of the binocular camera, and d represents the disparity value;
the coordinates (x, y, z) of the target relative to the monocular camera are obtained from the depth map, and a single-frame point cloud map with color information is obtained for the single frame by combining the RGB image with the spatial coordinate system. After the single-frame point cloud map is obtained, pose estimation is performed to find the key frames of adjacent points, and the generated single-frame point cloud maps are fused to obtain the complete three-dimensional point cloud. The three-dimensional point cloud of the next key pose is estimated by the iterative closest point (ICP) algorithm, and whether adjacent key points satisfy the matching relation is judged according to the following formula:
$$E(R, T) = \frac{1}{m} \sum_{i=1}^{m} \left\| y_i - (R\,x_i + T) \right\|^2,$$

wherein $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_m\}$ are the two overlapping point clouds respectively, R is the rotation matrix between the two point clouds, and T is the translation vector;
when the obtained mean square error is smaller than the set threshold, the matching is considered successful; if the corresponding image meets the key-frame requirement, it can serve as the next pose point, thereby achieving the purpose of pose estimation and successfully finding the next key frame. Estimation of the camera pose and fusion of the point clouds are thus realized; at the same time, the point cloud coordinates of the moving target are matched with the point cloud coordinate system of the background to obtain the complete three-dimensional point cloud.
CN202211164520.7A 2022-09-23 2022-09-23 Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel Pending CN115661337A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211164520.7A CN115661337A (en) 2022-09-23 2022-09-23 Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel


Publications (1)

Publication Number Publication Date
CN115661337A true CN115661337A (en) 2023-01-31

Family

ID=84986127



Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861407A (en) * 2023-02-28 2023-03-28 山东未来网络研究院(紫金山实验室工业互联网创新应用基地) Safe distance detection method and system based on deep learning
CN115861407B (en) * 2023-02-28 2023-06-16 山东未来网络研究院(紫金山实验室工业互联网创新应用基地) Safety distance detection method and system based on deep learning
CN116973939A (en) * 2023-09-25 2023-10-31 中科视语(北京)科技有限公司 Safety monitoring method and device
CN116973939B (en) * 2023-09-25 2024-02-06 中科视语(北京)科技有限公司 Safety monitoring method and device
CN117061720A (en) * 2023-10-11 2023-11-14 广州市大湾区虚拟现实研究院 Stereo image pair generation method based on monocular image and depth image rendering
CN117061720B (en) * 2023-10-11 2024-03-01 广州市大湾区虚拟现实研究院 Stereo image pair generation method based on monocular image and depth image rendering
CN117115145A (en) * 2023-10-19 2023-11-24 宁德思客琦智能装备有限公司 Detection method and device, electronic equipment and computer readable medium
CN117115145B (en) * 2023-10-19 2024-02-09 宁德思客琦智能装备有限公司 Detection method and device, electronic equipment and computer readable medium
CN117422754A (en) * 2023-10-30 2024-01-19 河南送变电建设有限公司 Method for calculating space distance of substation near-electricity operation personnel based on instance segmentation, readable storage medium and electronic equipment
CN117422754B (en) * 2023-10-30 2024-05-28 河南送变电建设有限公司 Method for calculating space distance of substation near-electricity operation personnel based on instance segmentation, readable storage medium and electronic equipment
CN117765174A (en) * 2023-12-19 2024-03-26 内蒙古电力勘测设计院有限责任公司 Three-dimensional reconstruction method, device and equipment based on monocular cradle head camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination