CN115661337A - Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel - Google Patents
- Publication number: CN115661337A (application CN202211164520.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a binocular-vision-based three-dimensional reconstruction method for transformer substation operating personnel, comprising the following steps: performing monocular calibration on each monocular camera and, after storing the obtained monocular parameters, calibrating the binocular camera; acquiring images in real time and converting the target coordinates into the camera coordinate system; performing stereo matching and calculating parallax (disparity) values; calculating depth information, fusing the depth map formed by the depth information with the RGB map used for parallax calculation to generate a single-frame point cloud map, fusing the generated single-frame point cloud maps to obtain a complete three-dimensional point cloud, and constructing a real-time three-dimensional image through pose transformation. The invention reconstructs the substation and its operating personnel in three dimensions simultaneously and, taking the static substation three-dimensional coordinate system as reference, matches the dynamic personnel point cloud with the static substation point cloud. Depth information is obtained during the reconstruction, so accurate distance calculation is available while operating personnel are detected, which facilitates operation safety control within the substation.
Description
Technical Field
The invention relates to the technical field of three-dimensional modeling, in particular to a binocular vision-based three-dimensional reconstruction method for substation operating personnel.
Background
Mistaken entry into energized bays occurs from time to time in power operations, and artificial-intelligence technology is widely applied to monitoring such operations, mainly to judge abnormal conditions such as whether personnel wear safety helmets, carry tools, or cross electronic fences. To ensure the personal safety of operators, they must keep a certain distance from dangerous live equipment during operation; the State Grid safety regulations specify the safe working distances when equipment is not de-energized: 10 kV and below, 0.7 m; 35 kV, 1.0 m; 110 kV, 1.5 m; 220 kV, 3.0 m; 500 kV, 5.0 m. Because spatial distances estimated on a two-dimensional image are inaccurate, an operator cannot be alarmed in time before mistakenly touching energized equipment, and operation accidents occur.
With the development of technologies such as three-dimensional modeling and virtual reality, informatized, digitized, and intelligent supervision of substations is maturing day by day, and three-dimensional visualization has become a research hotspot. Monocular and RGB-D cameras used in vision systems require a large amount of calculation and specific environments, and cannot calculate, or calculate inaccurately, the true distance to electrical equipment; binocular vision, by contrast, offers simpler calculation, relatively unconstrained usage scenarios, and more accurate distance calculation through extraction and conversion of a depth map. Some binocular-vision-based substation three-dimensional reconstruction technology already exists, but it basically focuses on static substation scenes; few cases identify and reconstruct substation operating personnel in real time and calculate, within the three-dimensional reconstruction, the real-time distance between operating personnel and dangerous live equipment.
Disclosure of Invention
The invention aims to provide a binocular-vision-based method for three-dimensional reconstruction of substation operators that models substation operating personnel with high quality, high precision, and high efficiency, providing a good basis for avoiding operation accidents and for intelligent supervision of the substation.
In order to achieve the purpose, the invention adopts the following technical scheme: a binocular vision-based three-dimensional reconstruction method for substation operators comprises the following steps:
(1) Respectively carrying out monocular calibration on each monocular camera, and calibrating the binocular cameras after storing the acquired monocular parameters, namely calculating to obtain the relative position between the two monocular cameras;
(2) The binocular camera collects images in real time, video recognition is carried out by utilizing a yolov5 target detection algorithm to track substation operators, and target coordinates recognized by the yolov5 target detection algorithm are converted into a camera coordinate system;
(3) Recognizing images acquired by a binocular camera in real time through a yolov5 target detection algorithm, performing stereo matching, and calculating a parallax value;
(4) Calculating depth information, fusing the depth map formed by the depth information with the RGB map used for parallax-value calculation to generate a single-frame point cloud map, fusing the generated single-frame point cloud maps to obtain a complete three-dimensional point cloud, and constructing a real-time three-dimensional image through pose transformation.
The step (1) specifically comprises the following steps:
firstly, the internal parameters of the monocular camera are solved through a homography matrix. Let the homogeneous coordinate of a point in space be M = (X_w, Y_w, Z_w, 1)^T and the homogeneous coordinate of the corresponding image point be m = (u, v, 1)^T; in the homogeneous coordinate system:

s·m = A·[R t]·M

where A is the internal reference (intrinsic) matrix of the monocular camera, [R t] is the external reference (extrinsic) matrix, and s is a scale factor. Taking the calibration plane as Z_w = 0 and writing r1, r2 for the first two columns of R, this converts into:

s·m = A·[r1 r2 t]·(X_w, Y_w, 1)^T = H·(X_w, Y_w, 1)^T

where H is the homography matrix to be obtained. H is solved from simultaneous equations formed by more than four points in space, and after the homography matrix H is obtained, the internal reference matrix A and the external reference matrix [R t] are obtained by calculation;
after the monocular camera internal and external parameters are obtained by calculation, the relative position of the two monocular cameras is solved. Let the rotation and translation matrices of the two monocular cameras be R_l, T_l and R_r, T_r respectively, and take any point P(X_w, Y_w, Z_w) in space, whose coordinates in the two monocular camera coordinate systems are (X_l, Y_l, Z_l) and (X_r, Y_r, Z_r). According to the conversion relation between the monocular camera coordinate system and the three-dimensional world coordinate system, the homogeneous forms of (X_w, Y_w, Z_w) and (X_l, Y_l, Z_l) satisfy:

(X_l, Y_l, Z_l, 1)^T = [R_l T_l; 0^T 1]·(X_w, Y_w, Z_w, 1)^T

and the homogeneous forms of (X_w, Y_w, Z_w) and (X_r, Y_r, Z_r) satisfy:

(X_r, Y_r, Z_r, 1)^T = [R_r T_r; 0^T 1]·(X_w, Y_w, Z_w, 1)^T

Solving the two homogeneous forms jointly:

(X_r, Y_r, Z_r, 1)^T = [R T; 0^T 1]·(X_l, Y_l, Z_l, 1)^T, with R = R_r·R_l^(-1) and T = T_r − R·T_l

where R is the rotation matrix of one monocular camera relative to the other and T is the translation matrix; the relative position between the two monocular cameras is obtained by calculation from these formulas.
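As an illustrative sketch of the relative-position formula above (not part of the claimed method; a minimal numpy sketch assuming the world-to-camera convention X_c = R_i·X_w + T_i, with an illustrative function name):

```python
import numpy as np

def relative_pose(R_l, T_l, R_r, T_r):
    """Relative pose of the right camera w.r.t. the left, given each
    camera's world-to-camera extrinsics: R = R_r R_l^-1, T = T_r - R T_l."""
    R = R_r @ R_l.T            # R_l is a rotation, so R_l^-1 == R_l.T
    T = T_r - R @ T_l
    return R, T
```

Any world point then satisfies X_r = R·X_l + T, which is the joint homogeneous relation above.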
The step (2) specifically comprises the following steps:
substation operating personnel are tracked and identified with the yolov5 target detection algorithm. Before the yolov5 target detection algorithm is applied, prior (anchor) boxes for targets in the images acquired in real time by the binocular camera are selected with the K-means clustering method, and the candidate boxes are screened as follows: firstly, pre-selected boxes are screened with confidence thresholds, filtering out boxes with low confidence; then the remaining pre-selected boxes are further screened with the non-maximum suppression algorithm; finally, the final target detection boxes are obtained and the next step of yolov5 target detection identification begins;
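The confidence-threshold screening and non-maximum suppression described above can be sketched as follows (a minimal numpy sketch with illustrative names and thresholds; real yolov5 pipelines perform this per class):

```python
import numpy as np

def nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Filter low-confidence boxes, then greedy non-maximum suppression.
    boxes: (N, 4) array of [x1, y1, x2, y2]; returns kept indices."""
    idx = np.where(scores >= conf_thresh)[0]        # confidence screening
    order = idx[np.argsort(-scores[idx])]           # best first
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # intersection-over-union of the best box with the remainder
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < iou_thresh]              # suppress overlaps
    return kept
```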
the loss function of the yolov5 target detection algorithm is divided into three parts: the first term is the target localization loss, the second term is the confidence loss, and the third term is the classification loss:

Loss = L_loc + L_conf + L_cls

wherein the target localization loss is:

L_loc = λ_coord · Σ_{i=0..S²} Σ_{j=0..B} I_ij^obj · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

the confidence loss is:

L_conf = Σ_{i=0..S²} Σ_{j=0..B} I_ij^obj·(C_i − Ĉ_i)² + λ_noobj · Σ_{i=0..S²} Σ_{j=0..B} I_ij^noobj·(C_i − Ĉ_i)²

the classification loss is:

L_cls = Σ_{i=0..S²} I_i^obj · Σ_{c∈classes} (p_i(c) − p̂_i(c))²

where S represents the grid size, B represents the number of boxes, and λ is a weight coefficient; x, y represent the box centre coordinates and w, h represent its width and height respectively; I_ij^obj indicates whether the j-th anchor of the i-th grid cell is responsible for this target; Ĉ_i represents the predicted confidence; p̂_i(c) represents the class probability;
meanwhile, an attention mechanism is added to the yolov5 target detection algorithm to fuse the personnel features. Let x_ij^(n→l) denote the feature vector at position (i, j) on the feature map adjusted from layer n to layer l; the features are fused as:

y_ij^l = α_ij^l·x_ij^(1→l) + β_ij^l·x_ij^(2→l) + γ_ij^l·x_ij^(3→l)

where y_ij^l represents the feature vector at the (i, j)-th position on the output feature map channel, and α_ij^l, β_ij^l, γ_ij^l are the attention weights of the 3 different layers mapped to level l on the feature map; they are simple scalar variables shared across all channels, subject to:

α_ij^l + β_ij^l + γ_ij^l = 1
The step (3) specifically comprises the following steps: real-time three-dimensional reconstruction requires extracting features from the RGB pictures, and binocular stereo matching is performed after the feature points are extracted;
for any pixel point p in the image acquired in real time by the binocular camera, its Hessian matrix takes the following form:

H(p) = [D_xx(p) D_xy(p); D_xy(p) D_yy(p)]

and the value of its discriminant is then solved:

det(H(p)) = D_xx(p)·D_yy(p) − (D_xy(p))²
after the value of the discriminant is obtained, a judgment is made: each processed pixel point is compared with its 26 neighbouring pixel points in the different scale spaces of the image acquired in real time by the binocular camera, the maximum or minimum value is selected by comparing the discriminant values, and key points are selected preliminarily; points with large noise are filtered out, and stable feature points are screened out through three-dimensional linear interpolation as the final feature points;
after the final feature points are determined, stereo matching is performed on the images acquired in real time by the binocular camera; the stereo matching comprises matching cost calculation, matching cost aggregation, parallax value calculation and parallax optimization;
for the matching cost calculation, a central pixel is first selected in the image and a domain transformation window is formed on this basis; the remaining pixels in the window are then compared with the central pixel: if the gray value of a pixel point is less than or equal to that of the central pixel point it is recorded as 0, otherwise it is recorded as 1, as shown in the following formula:

ξ(I(p), I(q)) = 0, if I(q) ≤ I(p); ξ(I(p), I(q)) = 1, if I(q) > I(p)

where ξ(I(p), I(q)) is the comparison function, I(p) is the gray value of the central pixel point, and I(q) is the gray value of a pixel point other than the centre;
after all the pixel points are set, the 1s and 0s are concatenated in order into a bit string, which serves as the matching element for the gray value of the current central pixel point;
after the images of the two monocular cameras are transformed, comparing bit strings of corresponding pixel points in the two images and calculating a Hamming distance, wherein the obtained Hamming distance is used as matching cost of stereo matching;
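The window comparison, bit-string formation, and Hamming-distance cost above can be sketched as follows (a minimal numpy sketch assuming a 3×3 window; np.roll wraps at the image borders, so border costs are only approximate, and the names are illustrative):

```python
import numpy as np

def census(img, win=1):
    """Census transform with a (2*win+1)^2 domain window: each pixel gets
    a bit string with 1 where a neighbour is brighter than the centre
    (the document's convention: <= centre -> 0, otherwise -> 1)."""
    bits = np.zeros(img.shape, dtype=np.uint64)
    for dy in range(-win, win + 1):
        for dx in range(-win, win + 1):
            if dy == 0 and dx == 0:
                continue
            neigh = np.roll(np.roll(img, -dy, axis=0), -dx, axis=1)
            bits = (bits << np.uint64(1)) | (neigh > img).astype(np.uint64)
    return bits

def hamming_cost(bits_left, bits_right):
    """Stereo matching cost: Hamming distance between census bit strings."""
    x = bits_left ^ bits_right
    return np.array([bin(int(v)).count("1") for v in x.ravel()],
                    dtype=np.int64).reshape(x.shape)
```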
matching cost aggregation serves to eliminate noise and optimize the cost matrix; the specific summation expression is:

C(x, y, d) = Σ_{(i,j)∈N(x,y)} C_0(i, j, d)

wherein C_0(i, j, d) is the initial matching cost at pixel (i, j) of the neighbourhood N(x, y), and C(x, y, d) is the aggregated matching cost;
after the aggregated matching cost is obtained, the parallax value is calculated: the final parallax of pixel point p is determined by the winner-takes-all method, and parallax optimization is then performed, namely optimizing and denoising the final parallax of pixel point p.
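The winner-takes-all parallax selection can be sketched as follows (illustrative, assuming a precomputed aggregated cost volume of shape height × width × parallax range):

```python
import numpy as np

def wta_disparity(cost_volume):
    """Winner-takes-all: pick, per pixel, the parallax (disparity) index
    with the lowest aggregated matching cost. cost_volume: (H, W, D)."""
    return np.argmin(cost_volume, axis=2)
```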
The step (4) specifically comprises the following steps:
after the extraction of the final feature points of the image is finished, the depth information of the single-frame image is output by calculating the parallax values, and the depth map formed by the depth information is fused and rendered with the RGB map into a single-frame point cloud map;
let the two pixel points be p and q respectively; the aggregation cost is:

C_A(p, d) = Σ_q S(p, q)·C_d(q)

wherein S(p, q) represents the sum of edge weights between the two pixel points and C_d(q) is the matching cost value;
the calculation of cost aggregation in stereo matching is optimized with a line segment tree structure; the aggregation cost of any pixel point p when the parallax value is d is then obtained recursively as:

C_A(p, d) = C_d(p) + Σ_{q∈Ch(p)} S(q, p)·C_A(q, d)

wherein P_r(p) is the parent node of pixel p on the tree (used when aggregating back from root to leaves) and Ch(p) represents all of its child nodes. The parallaxes of all pixel points are solved to obtain the parallax map of a single image, which is converted into a depth map; the conversion formula is as follows:
Z = f·B / d

wherein Z is the depth, f is the camera focal length, B is the optical-centre distance (baseline) of the binocular camera, and d represents the parallax value;
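The conversion formula Z = f·B/d can be sketched as follows (illustrative; pixels with no valid parallax are mapped to infinite depth):

```python
import numpy as np

def disparity_to_depth(disp, f, B):
    """Z = f * B / d; pixels with no valid parallax (d <= 0) get inf."""
    disp = np.asarray(disp, dtype=float)
    return np.where(disp > 0, f * B / np.maximum(disp, 1e-12), np.inf)
```

For example, with f = 1000 px and B = 0.12 m, a parallax of 60 px corresponds to a depth of 2 m.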
the coordinates (x, y, z) of the target relative to the monocular camera are obtained from the depth map, and a single-frame point cloud map with colour information is obtained for the single frame by combining the RGB image with the spatial coordinate system. After the single-frame point cloud map is obtained, pose estimation is performed to find the key frames of adjacent points, and the generated single-frame point cloud maps are fused to obtain the complete three-dimensional point cloud. The three-dimensional point cloud of the next key pose is estimated by the iterative closest point algorithm, and whether adjacent key points satisfy the matching relation is judged according to the following formula:
E(R, T) = (1/m)·Σ_{i=1..m} ‖y_i − (R·x_i + T)‖²

wherein X = {x_1, x_2, …, x_m} and Y = {y_1, y_2, …, y_m} are respectively the two overlapping point clouds, R is the rotation matrix between the two point clouds, and T is the translation vector;
when the obtained mean-square-error value is smaller than the set threshold, the matching is considered successful; if the corresponding image meets the key-frame requirement, it can be used as the next position point, so that the purpose of pose estimation is achieved and the next key frame is found successfully. Estimation of the camera pose and fusion of the point clouds are thus realized; meanwhile, the point cloud coordinates of the moving target are matched with the point cloud coordinate system of the background to obtain the complete three-dimensional point cloud.
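One alignment step of the iterative-closest-point idea above, with correspondences assumed already known, can be sketched with the standard SVD (Kabsch) solution; this is an illustrative sketch, not the claimed procedure:

```python
import numpy as np

def icp_step(X, Y):
    """One point-to-point alignment step with known correspondences:
    find R, T minimising mean ||R x_i + T - y_i||^2 (SVD / Kabsch),
    returning R, T and the resulting mean square error."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    H = (X - mx).T @ (Y - my)          # cross-covariance of centred clouds
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # avoid a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    T = my - R @ mx
    mse = np.mean(np.sum((X @ R.T + T - Y) ** 2, axis=1))
    return R, T, mse
```

The returned mean square error is what is compared against the set threshold to decide whether matching succeeded.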
According to the technical scheme, the beneficial effects of the invention are as follows. Firstly, in the electric power field, three-dimensional reconstruction currently targets scenes only, and detection of substation operating personnel is not combined with the three-dimensional reconstruction technology; the invention reconstructs the substation and its operating personnel simultaneously and, taking the static substation three-dimensional coordinate system as reference, realizes the matching combination of the dynamic personnel point cloud and the static substation point cloud. Secondly, target detection algorithms based on RGB images alone cannot obtain distance information; because depth information is obtained during the three-dimensional reconstruction, accurate distance calculation is achieved while detecting operating personnel, which greatly facilitates operation safety control in the substation and prevents personnel from mistakenly entering dangerous areas or approaching dangerous live equipment too closely.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a flow chart of calibration of a binocular camera;
FIG. 3 is a network architecture diagram of a video image based object detection algorithm;
fig. 4 is a flow chart of converting a depth map into a three-dimensional point cloud map.
Detailed Description
As shown in fig. 1, a binocular vision-based method for three-dimensional reconstruction of substation operators comprises the following steps:
(1) Respectively calibrating each monocular camera, storing the acquired monocular parameters, and calibrating the binocular cameras, namely calculating the relative position between the two monocular cameras;
(2) The binocular camera collects images in real time, video recognition is carried out by utilizing a yolov5 target detection algorithm to track transformer substation operators, and target coordinates recognized by the yolov5 target detection algorithm are converted into a camera coordinate system;
(3) Recognizing images acquired by a binocular camera in real time through a yolov5 target detection algorithm, performing stereo matching, and calculating a parallax value;
(4) Calculating depth information, fusing the depth map formed by the depth information with the RGB map used for parallax-value calculation to generate a single-frame point cloud map, fusing the generated single-frame point cloud maps to obtain a complete three-dimensional point cloud, and constructing a real-time three-dimensional image through pose transformation.
As shown in fig. 2, the step (1) specifically includes:
firstly, the internal parameters of the monocular camera are solved through a homography matrix. Let the homogeneous coordinate of a point in space be M = (X_w, Y_w, Z_w, 1)^T and the homogeneous coordinate of the corresponding image point be m = (u, v, 1)^T; in the homogeneous coordinate system:

s·m = A·[R t]·M

where A is the internal reference (intrinsic) matrix of the monocular camera, [R t] is the external reference (extrinsic) matrix, and s is a scale factor. Taking the calibration plane as Z_w = 0 and writing r1, r2 for the first two columns of R, this converts into:

s·m = A·[r1 r2 t]·(X_w, Y_w, 1)^T = H·(X_w, Y_w, 1)^T

where H is the homography matrix to be obtained. H is solved from simultaneous equation sets formed by more than four points in space, and after the homography matrix H is obtained, the internal reference matrix A and the external reference matrix [R t] are obtained by calculation;
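The solution of H from four or more point correspondences can be sketched with the standard direct linear transform (an illustrative numpy sketch; the function name is assumed):

```python
import numpy as np

def solve_homography(src, dst):
    """Direct linear transform: estimate H (up to scale) from >= 4
    correspondences (x, y) -> (u, v) by taking the null vector of A."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)           # null vector = smallest singular value
    return H / H[2, 2]                 # fix the arbitrary scale
```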
after the monocular camera internal and external parameters are obtained by calculation, the relative position of the two monocular cameras is solved. Let the rotation and translation matrices of the two monocular cameras be R_l, T_l and R_r, T_r respectively, and take any point P(X_w, Y_w, Z_w) in space, whose coordinates in the two monocular camera coordinate systems are (X_l, Y_l, Z_l) and (X_r, Y_r, Z_r). According to the conversion relation between the monocular camera coordinate system and the three-dimensional world coordinate system, the homogeneous forms of (X_w, Y_w, Z_w) and (X_l, Y_l, Z_l) satisfy:

(X_l, Y_l, Z_l, 1)^T = [R_l T_l; 0^T 1]·(X_w, Y_w, Z_w, 1)^T

and the homogeneous forms of (X_w, Y_w, Z_w) and (X_r, Y_r, Z_r) satisfy:

(X_r, Y_r, Z_r, 1)^T = [R_r T_r; 0^T 1]·(X_w, Y_w, Z_w, 1)^T

Solving the two homogeneous forms jointly:

(X_r, Y_r, Z_r, 1)^T = [R T; 0^T 1]·(X_l, Y_l, Z_l, 1)^T, with R = R_r·R_l^(-1) and T = T_r − R·T_l

where R is the rotation matrix of one monocular camera relative to the other and T is the translation matrix; the relative position between the two monocular cameras is obtained by calculation from these formulas.
As shown in fig. 3, the step (2) specifically refers to:
substation operating personnel are tracked and identified with the yolov5 target detection algorithm. Before the yolov5 target detection algorithm is applied, prior (anchor) boxes for targets in the images acquired in real time by the binocular camera are selected with the K-means clustering method, and the candidate boxes are screened as follows: firstly, pre-selected boxes are screened with confidence thresholds, filtering out boxes with low confidence; then the remaining pre-selected boxes are further screened with the non-maximum suppression algorithm; finally, the final target detection boxes are obtained and the next step of yolov5 target detection identification begins;
the loss function of the yolov5 target detection algorithm is divided into three parts: the first term is the target localization loss, the second term is the confidence loss, and the third term is the classification loss:

Loss = L_loc + L_conf + L_cls

wherein the target localization loss is:

L_loc = λ_coord · Σ_{i=0..S²} Σ_{j=0..B} I_ij^obj · [(x_i − x̂_i)² + (y_i − ŷ_i)² + (w_i − ŵ_i)² + (h_i − ĥ_i)²]

the confidence loss is:

L_conf = Σ_{i=0..S²} Σ_{j=0..B} I_ij^obj·(C_i − Ĉ_i)² + λ_noobj · Σ_{i=0..S²} Σ_{j=0..B} I_ij^noobj·(C_i − Ĉ_i)²

the classification loss is:

L_cls = Σ_{i=0..S²} I_i^obj · Σ_{c∈classes} (p_i(c) − p̂_i(c))²

where S represents the grid size, B represents the number of boxes, and λ is a weight coefficient; x, y represent the box centre coordinates and w, h represent its width and height respectively; I_ij^obj indicates whether the j-th anchor of the i-th grid cell is responsible for this target; Ĉ_i represents the predicted confidence; p̂_i(c) represents the class probability;
meanwhile, an attention mechanism is added to the yolov5 target detection algorithm to fuse the personnel features. Let x_ij^(n→l) denote the feature vector at position (i, j) on the feature map adjusted from layer n to layer l; the features are fused as:

y_ij^l = α_ij^l·x_ij^(1→l) + β_ij^l·x_ij^(2→l) + γ_ij^l·x_ij^(3→l)

where y_ij^l represents the feature vector at the (i, j)-th position on the output feature map channel, and α_ij^l, β_ij^l, γ_ij^l are the attention weights of the 3 different layers mapped to level l on the feature map; they are simple scalar variables shared across all channels, subject to:

α_ij^l + β_ij^l + γ_ij^l = 1
The step (3) specifically comprises the following steps: real-time three-dimensional reconstruction requires extracting features from the RGB pictures, and binocular stereo matching is performed after the feature points are extracted;
for any pixel point p in the image acquired in real time by the binocular camera, its Hessian matrix takes the following form:

H(p) = [D_xx(p) D_xy(p); D_xy(p) D_yy(p)]

and the value of its discriminant is then solved:

det(H(p)) = D_xx(p)·D_yy(p) − (D_xy(p))²
after the value of the discriminant is obtained, a judgment is made: each processed pixel point is compared with its 26 neighbouring pixel points in the different scale spaces of the image acquired in real time by the binocular camera, the maximum or minimum value is selected by comparing the discriminant values, and key points are selected preliminarily; then points with large noise are filtered out, and stable feature points are screened out through three-dimensional linear interpolation as the final feature points;
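The discriminant computation and the 26-neighbour extremum test can be sketched as follows (illustrative; finite differences stand in for the scale-space derivatives, and the scale-space stack is assumed precomputed):

```python
import numpy as np

def hessian_response(img):
    """Discriminant det(H) = Dxx*Dyy - Dxy^2 from finite differences."""
    img = np.asarray(img, dtype=float)
    dxx = np.gradient(np.gradient(img, axis=1), axis=1)
    dyy = np.gradient(np.gradient(img, axis=0), axis=0)
    dxy = np.gradient(np.gradient(img, axis=1), axis=0)
    return dxx * dyy - dxy ** 2

def is_local_extremum(stack, s, y, x):
    """Keep a candidate only if it is the max or min of the 3x3x3
    scale-space cube (its 26 neighbours) around (s, y, x)."""
    cube = stack[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    v = stack[s, y, x]
    return v == cube.max() or v == cube.min()
```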
after the final feature points are determined, stereo matching is performed on the images acquired in real time by the binocular camera; the stereo matching comprises matching cost calculation, matching cost aggregation, parallax value calculation and parallax optimization;
for the matching cost calculation, a central pixel is first selected in the image and a domain transformation window is formed on this basis; the remaining pixels in the window are then compared with the central pixel: if the gray value of a pixel point is less than or equal to that of the central pixel point it is recorded as 0, otherwise it is recorded as 1, as shown in the following formula:

ξ(I(p), I(q)) = 0, if I(q) ≤ I(p); ξ(I(p), I(q)) = 1, if I(q) > I(p)

where ξ(I(p), I(q)) is the comparison function, I(p) is the gray value of the central pixel point, and I(q) is the gray value of a pixel point other than the centre;
after all the pixel points are set, the 1s and 0s are concatenated in order into a bit string, which serves as the matching element for the gray value of the current central pixel point;
after the images of the two monocular cameras are transformed, comparing bit strings of corresponding pixel points in the two images and calculating the Hamming distance, wherein the obtained Hamming distance is used as the matching cost of stereo matching;
matching cost aggregation serves to eliminate noise and optimize the cost matrix; the specific summation expression is:

C(x, y, d) = Σ_{(i,j)∈N(x,y)} C_0(i, j, d)

wherein C_0(i, j, d) is the initial matching cost at pixel (i, j) of the neighbourhood N(x, y), and C(x, y, d) is the aggregated matching cost;
after the aggregated matching cost is obtained, the parallax value is calculated: the final parallax of pixel point p is determined by the winner-takes-all method, and parallax optimization is then performed, namely optimizing and denoising the final parallax of pixel point p.
As shown in fig. 4, the step (4) specifically includes:
after the extraction of the final feature points of the image is finished, the depth information of the single-frame image is output by calculating the parallax values, and the depth map formed by the depth information is fused and rendered with the RGB map into a single-frame point cloud map;
let the two pixel points be p and q respectively; the aggregation cost is:

C_A(p, d) = Σ_q S(p, q)·C_d(q)

wherein S(p, q) represents the sum of edge weights between the two pixel points and C_d(q) is the matching cost value;
the calculation of cost aggregation in stereo matching is optimized with a line segment tree structure; the aggregation cost of any pixel point p(x, y) when the parallax value is d is then obtained recursively as:

C_A(p, d) = C_d(p) + Σ_{q∈Ch(p)} S(q, p)·C_A(q, d)

wherein P_r(p) is the parent node of pixel p on the tree (used when aggregating back from root to leaves) and Ch(p) represents all of its child nodes. The parallaxes of all pixel points are solved to obtain the parallax map of a single image, which is converted into a depth map; the conversion formula is as follows:
Z = f·B / d

wherein Z is the depth, f is the camera focal length, B is the optical-centre distance (baseline) of the binocular camera, and d represents the parallax value;
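Back-projecting the depth map into per-pixel coordinates (x, y, z) relative to the camera, as used when forming the single-frame point cloud, can be sketched as follows (illustrative; fx, fy, cx, cy stand for the internal reference parameters):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map to camera-frame 3-D points:
    x = (u - cx) * Z / fx, y = (v - cy) * Z / fy, z = Z."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel grids
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

Attaching the RGB value of each pixel to its back-projected point then yields the coloured single-frame point cloud.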
the coordinates (x, y, z) of the target relative to the monocular camera are obtained from the depth map, and a single-frame point cloud map with colour information is obtained for the single frame by combining the RGB image with the spatial coordinate system. After the single-frame point cloud map is obtained, pose estimation is performed to find the key frames of adjacent points, and the generated single-frame point cloud maps are fused to obtain the complete three-dimensional point cloud. The three-dimensional point cloud of the next key pose is estimated by the iterative closest point algorithm, and whether adjacent key points satisfy the matching relation is judged according to the following formula:
E(R, T) = (1/m)·Σ_{i=1..m} ‖y_i − (R·x_i + T)‖²

wherein X = {x_1, x_2, …, x_m} and Y = {y_1, y_2, …, y_m} are respectively the two overlapping point clouds, R is the rotation matrix between the two point clouds, and T is the translation vector;
when the obtained mean-square-error value is smaller than the set threshold, the matching is considered successful; if the corresponding image meets the key-frame requirement, it can be used as the next position point, so that the purpose of pose estimation is achieved and the next key frame is found successfully. Estimation of the camera pose and fusion of the point clouds are thus realized; meanwhile, the point cloud coordinates of the moving target are matched with the point cloud coordinate system of the background to obtain the complete three-dimensional point cloud.
In conclusion, the invention reconstructs the substation and its operating personnel in three dimensions simultaneously and, taking the static substation three-dimensional coordinate system as reference, realizes the matching combination of the dynamic personnel point cloud and the static substation point cloud. Target detection algorithms based on RGB images alone cannot obtain distance information; because depth information is obtained during the three-dimensional reconstruction, accurate distance calculation is achieved while detecting operating personnel, which greatly facilitates operation safety control in the substation and prevents personnel from mistakenly entering dangerous areas or approaching dangerous live equipment too closely.
Claims (5)
1. A binocular vision-based three-dimensional reconstruction method for substation operators, characterized by comprising the following steps in sequence:
(1) Perform monocular calibration on each monocular camera separately; after storing the obtained monocular parameters, calibrate the binocular camera, namely calculate the relative position between the two monocular cameras;
(2) The binocular camera collects images in real time; the yolov5 target detection algorithm performs video recognition to track substation operators, and the target coordinates recognized by the yolov5 target detection algorithm are converted into the camera coordinate system;
(3) Recognize the images acquired in real time by the binocular camera with the yolov5 target detection algorithm, perform stereo matching, and calculate the disparity value;
(4) Calculate depth information from the disparity values, fuse the depth map formed by the depth information with the RGB map to generate a single-frame point cloud map, fuse the generated single-frame point cloud maps to obtain a complete three-dimensional point cloud, and construct a real-time three-dimensional image through pose transformation.
2. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (1) specifically comprises the following steps:
Firstly, the intrinsic parameters of the monocular camera are solved through a homography matrix. Let the homogeneous coordinate of a point in space be M = [X_w, Y_w, Z_w, 1]^T, and let the homogeneous coordinate of the corresponding point in the image be m = [u, v, 1]^T; expressed in the homogeneous coordinate system:
s · m = A · [R t] · M
wherein A is the internal reference (intrinsic) matrix of the monocular camera, [R t] is the external reference (extrinsic) matrix, and s is a scale factor. For points on the calibration plane (Z_w = 0) this is converted into:
s · m = H · [X_w, Y_w, 1]^T, with H = A · [r_1 r_2 t]
wherein H is the homography matrix to be solved; H is obtained by selecting more than four points in space and solving the resulting simultaneous equations, and after the homography matrix H is obtained, the internal reference matrix A and the external reference matrix [R t] are obtained by calculation;
After the monocular camera internal reference and external reference are obtained by calculation, the relative position of the two monocular cameras is solved. Let the rotation and translation matrices of the two monocular cameras be R_l, T_l and R_r, T_r respectively. Take any point P(X_w, Y_w, Z_w) in space, whose coordinates in the coordinate systems of the two monocular cameras are (X_l, Y_l, Z_l) and (X_r, Y_r, Z_r) respectively. According to the conversion relation between the monocular camera coordinate system and the three-dimensional world coordinate system, the homogeneous forms of the coordinates (X_w, Y_w, Z_w) and (X_l, Y_l, Z_l) satisfy:
[X_l, Y_l, Z_l]^T = R_l · [X_w, Y_w, Z_w]^T + T_l
and the homogeneous forms of the coordinates (X_w, Y_w, Z_w) and (X_r, Y_r, Z_r) satisfy:
[X_r, Y_r, Z_r]^T = R_r · [X_w, Y_w, Z_w]^T + T_r
Jointly solving the two relations gives:
[X_r, Y_r, Z_r]^T = R · [X_l, Y_l, Z_l]^T + T, where R = R_r · R_l⁻¹ and T = T_r − R · T_l
wherein R is the rotation matrix of one monocular camera relative to the other monocular camera and T is the translation matrix; the relative position between the two monocular cameras is obtained by calculation from this formula.
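As a sanity check on the relations above, the relative pose can be composed numerically. The sketch below assumes the world-to-camera convention X_c = R·X_w + T and exploits the fact that a rotation matrix's inverse is its transpose; the function name is illustrative, not from the patent:

```python
import numpy as np

def relative_pose(R_l, T_l, R_r, T_r):
    """Relative rotation R and translation T of the right camera with
    respect to the left, given each camera's world-to-camera extrinsics
    (X_c = R X_w + T).  R_l^-1 = R_l^T because R_l is a rotation."""
    R = R_r @ R_l.T      # rotation of the right camera relative to the left
    T = T_r - R @ T_l    # translation of the right camera relative to the left
    return R, T
```

With these R and T, any point expressed in the left camera frame maps into the right camera frame as X_r = R·X_l + T.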
3. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (2) specifically comprises the following steps:
The yolov5 target detection algorithm is adopted to track and identify substation operators. Before applying the yolov5 target detection algorithm, prior frames are selected for the targets in the images acquired in real time by the binocular camera using the K-means clustering method. The specific steps are: first, screen the pre-selected frames with different confidence thresholds, filtering out those with low confidence; then, further screen the remaining pre-selected frames with a non-maximum suppression algorithm; finally, obtain the final target detection frames and start the next step of yolov5 target detection algorithm recognition;
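The confidence screening and non-maximum suppression steps can be sketched as follows; the box format (x1, y1, x2, y2), the function names and the threshold defaults are illustrative assumptions rather than values fixed by the claim:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def filter_boxes(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Confidence-threshold screening followed by greedy NMS;
    returns the indices of the kept detection frames."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)          # highest-scoring remaining box survives
        kept.append(best)
        # drop every remaining box that overlaps the survivor too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return kept
```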
The loss function of the yolov5 target detection algorithm is divided into three parts: the first term is the target localization loss, the second term is the confidence loss, and the third term is the classification loss; the loss function of the yolov5 target detection algorithm is:
Loss = L_loc + L_conf + L_cls
wherein the target localization loss is:
L_loc = λ_coord · Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} · [ (x_i − x̂_i)² + (y_i − ŷ_i)² + (√w_i − √ŵ_i)² + (√h_i − √ĥ_i)² ]
the confidence loss is:
L_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{obj} · (C_i − Ĉ_i)² + λ_noobj · Σ_{i=0}^{S²} Σ_{j=0}^{B} I_{ij}^{noobj} · (C_i − Ĉ_i)²
the classification loss is:
L_cls = Σ_{i=0}^{S²} I_i^{obj} · Σ_{c ∈ classes} (p_i(c) − p̂_i(c))²
wherein S represents the grid size, B represents the number of boxes per grid cell, and λ is a weight coefficient; x, y represent the box centre coordinates, and w, h represent the width and height respectively; I_{ij}^{obj} indicates whether the j-th anchor of the i-th grid cell is responsible for this target; C_i represents the confidence parameter; p_i(c) represents the class probability;
Meanwhile, an attention mechanism is added to the yolov5 target detection algorithm to fuse the personnel features. Let x_{ij}^{n→l} denote the feature vector at the (i, j) position on the feature map adjusted from the n-th layer to the l-th layer; the features are fused as follows:
y_{ij}^{l} = α_{ij}^{l} · x_{ij}^{1→l} + β_{ij}^{l} · x_{ij}^{2→l} + γ_{ij}^{l} · x_{ij}^{3→l}
wherein y_{ij}^{l} is the feature vector at the (i, j)-th position on the output feature map channel, and α_{ij}^{l}, β_{ij}^{l}, γ_{ij}^{l} are the attention weights of the 3 different layers mapped to level l; they are simple scalar variables that can be shared across all channels, with:
α_{ij}^{l} + β_{ij}^{l} + γ_{ij}^{l} = 1, and α_{ij}^{l}, β_{ij}^{l}, γ_{ij}^{l} ∈ [0, 1]
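A toy illustration of the scalar attention weighting at a single (i, j) position. It assumes the three per-level weights are softmax-normalised so that they sum to 1 and lie in [0, 1]; the claim only states that they are scalars shared across channels, so the normalisation and names here are assumptions:

```python
import math

def fuse_levels(features, raw_weights):
    """Fuse scalar feature values from 3 levels at one (i, j) position with
    softmax-normalised attention weights (they sum to 1, each in [0, 1])."""
    exps = [math.exp(w) for w in raw_weights]
    total = sum(exps)
    alphas = [e / total for e in exps]                 # normalised weights
    fused = sum(a * f for a, f in zip(alphas, features))
    return fused, alphas
```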
4. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that the step (3) specifically comprises the following steps: real-time three-dimensional reconstruction requires feature extraction from the RGB pictures, and binocular stereo matching is performed after the feature points are extracted;
For any pixel point p in an image acquired in real time by the binocular camera, expressed as p = (x, y), the value of the discriminant of its Hessian matrix is solved:
det(H(p)) = L_xx(p) · L_yy(p) − (L_xy(p))²
wherein L_xx, L_yy and L_xy are the second-order partial derivatives of the image at p;
After the value of the discriminant is obtained, the processed pixel point is compared with its 26 neighbouring pixel points across adjacent scale spaces of the image acquired in real time by the binocular camera; the maximum or minimum is selected according to the compared discriminant values, and key points are preliminarily selected. Points with large noise are then filtered out, and stable feature points are screened out through three-dimensional linear interpolation to serve as the final feature points;
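As a rough stand-in for the determinant-of-Hessian discriminant at one pixel, the sketch below uses central finite differences on a 3×3 intensity patch; real detectors evaluate this with smoothed scale-space filters, so treat the function name and patch format as illustrative:

```python
def hessian_response(patch):
    """Determinant-of-Hessian discriminant at the centre of a 3x3 patch,
    det(H) = Dxx * Dyy - Dxy^2, via central finite differences."""
    Dxx = patch[1][2] - 2.0 * patch[1][1] + patch[1][0]
    Dyy = patch[2][1] - 2.0 * patch[1][1] + patch[0][1]
    Dxy = (patch[2][2] - patch[2][0] - patch[0][2] + patch[0][0]) / 4.0
    return Dxx * Dyy - Dxy ** 2
```

A bright dot on a dark background gives a strong positive response, while a flat patch gives zero, which is why thresholding this value isolates candidate key points.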
After the final feature points are determined, stereo matching is performed on the images acquired in real time by the binocular camera; stereo matching comprises matching cost calculation, matching cost aggregation, disparity calculation and disparity optimization;
The matching cost calculation first selects a central pixel in the image and builds a transform window around it; the other pixels in the window are then compared with the central pixel: if a pixel point's gray value is less than or equal to that of the central pixel point, it is recorded as 0, otherwise as 1, as shown in the following formula:
ξ(I(p), I(q)) = 0 if I(q) ≤ I(p), and ξ(I(p), I(q)) = 1 if I(q) > I(p)
wherein ξ(I(p), I(q)) is the comparison function, I(p) is the gray value of the central pixel point, and I(q) is the gray value of a pixel point other than the central pixel point;
After all the pixel points are processed, the 1s and 0s are concatenated in order into a bit string, which is used as the matching element for the gray value of the current central pixel point;
after the images of the two monocular cameras are transformed, comparing bit strings of corresponding pixel points in the two images and calculating a Hamming distance, wherein the obtained Hamming distance is used as matching cost of stereo matching;
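The bit-string construction and Hamming-distance cost described above (a Census-transform-style cost) can be sketched as follows; the window is passed as a list of gray-value rows, and the function names are illustrative:

```python
def census_bitstring(window):
    """Build the bit string for the window's central pixel: a neighbour
    gets 1 if its gray value exceeds the centre's, otherwise 0."""
    h, w = len(window), len(window[0])
    ci, cj = h // 2, w // 2
    centre = window[ci][cj]
    return [1 if window[i][j] > centre else 0
            for i in range(h) for j in range(w) if (i, j) != (ci, cj)]

def hamming(a, b):
    """Matching cost: number of positions where two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))
```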
The role of matching cost aggregation is to eliminate noise and optimize the cost matrix; the specific summation expression is:
C(x, y, d) = Σ_{(i,j) ∈ N(x,y)} C_0(i, j, d), where the sum runs over an aggregation window N(x, y) centred on the pixel (x, y)
wherein C_0(i, j, d) is the initial matching cost and C(x, y, d) is the aggregated matching cost;
After the aggregated matching cost is obtained, the disparity value is calculated, namely the final disparity of the pixel point p is determined by the winner-takes-all method; disparity optimization is then performed, namely the final disparity of the pixel point p is denoised and optimized.
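The winner-takes-all selection reduces to an argmin over the aggregated costs at each pixel; a minimal sketch (the cost-volume layout, one list of per-disparity costs per pixel, is an assumption):

```python
def winner_takes_all(cost_volume):
    """For each pixel's list of aggregated costs indexed by disparity,
    pick the disparity with the minimum cost (winner-takes-all)."""
    return [min(range(len(costs)), key=lambda d: costs[d])
            for costs in cost_volume]
```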
5. The binocular vision based three-dimensional reconstruction method for substation operators according to claim 1, characterized in that: the step (4) specifically comprises the following steps:
After the extraction of the final feature points of the image is finished, the depth information of the single-frame image is output by calculating the disparity values, and the depth map formed by the depth information is fused with the RGB map and rendered into a single-frame point cloud map;
Let the two pixel points be p and q respectively; the aggregation cost is:
C_d^A(p) = Σ_q S(p, q) · C_d(q)
wherein S(p, q) represents the sum of edge weights between the two pixel points, and C_d(q) is the matching cost value;
The calculation of cost aggregation in stereo matching is optimized by adopting a segment-tree structure; the aggregation cost of any pixel point p at the disparity value d is then obtained as follows:
wherein P_r(p) is the parent node of pixel p on the tree and Ch(p) represents all of its child nodes. Solving the disparities of all pixel points yields the disparity map of a single image, which is then converted into a depth map; the conversion formula is:
Z = f · B / d
wherein Z is the depth, f is the camera focal length, B is the optical centre distance (baseline) of the binocular camera, and d represents the disparity value;
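The depth conversion Z = f·B/d extends to a full back-projection of a pixel into camera coordinates; the principal point (cx, cy) and the pinhole relations for X and Y below are standard assumptions not spelled out in the claim:

```python
def disparity_to_point(u, v, d, f, B, cx, cy):
    """Back-project pixel (u, v) with disparity d (pixels) to camera
    coordinates for a rectified pair: Z = f * B / d, then the pinhole model."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    Z = f * B / d            # depth from the binocular geometry
    X = (u - cx) * Z / f     # lateral offset via the pinhole model
    Y = (v - cy) * Z / f
    return X, Y, Z
```

Attaching the RGB value of pixel (u, v) to each (X, Y, Z) triple yields the coloured single-frame point cloud described in the claim.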
The coordinates (x, y, z) of the target relative to the monocular camera are obtained from the depth map, and a single-frame point cloud map with colour information is obtained by combining the RGB image with the spatial coordinate system. After the single-frame point cloud map is obtained, pose estimation is performed to find the key frames of adjacent points, and the generated single-frame point cloud maps are fused to obtain the complete three-dimensional point cloud. The three-dimensional point cloud at the next key pose is estimated by the iterative closest point algorithm, and whether adjacent key points satisfy the matching relation is judged according to the following formula:
E(R, T) = (1/m) · Σ_{i=1}^{m} ‖ y_i − (R · x_i + T) ‖²
wherein X = {x_1, x_2, …, x_m} and Y = {y_1, y_2, …, y_m} are the two overlapping point clouds respectively, R is the rotation matrix between the two point clouds, and T is the translation vector;
When the obtained mean square error is smaller than a set threshold, the matching is considered successful; if the corresponding image also satisfies the key-frame requirement, it is taken as the next pose point, so that pose estimation is achieved and the next key frame is successfully found. Estimation of the camera pose and fusion of the point clouds are thereby realized; at the same time, the point cloud coordinates of the moving target are matched with the point cloud coordinate system of the background to obtain the complete three-dimensional point cloud.
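The iterative closest point procedure alternates correspondence search with a least-squares pose solve. The sketch below covers only the solve step for already-paired point clouds, using the standard SVD (Kabsch) technique and reporting the mean square error that the claim compares against a threshold; the patent does not specify this particular solver, so treat it as one common choice:

```python
import numpy as np

def align_point_clouds(X, Y):
    """Least-squares R, T minimising sum ||(R x_i + T) - y_i||^2 for paired
    point clouds X, Y (m x 3 arrays), via SVD, plus the mean square error."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (X - cx).T @ (Y - cy)                       # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                              # proper rotation (det = +1)
    T = cy - R @ cx
    mse = float(np.mean(np.sum((X @ R.T + T - Y) ** 2, axis=1)))
    return R, T, mse
```

Inside ICP, this solve is repeated after re-pairing each point with its current nearest neighbour, until the mean square error falls below the set threshold.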
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211164520.7A CN115661337A (en) | 2022-09-23 | 2022-09-23 | Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115661337A true CN115661337A (en) | 2023-01-31 |
Family
ID=84986127
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211164520.7A Pending CN115661337A (en) | 2022-09-23 | 2022-09-23 | Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115661337A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115861407A (en) * | 2023-02-28 | 2023-03-28 | 山东未来网络研究院(紫金山实验室工业互联网创新应用基地) | Safe distance detection method and system based on deep learning |
CN116973939A (en) * | 2023-09-25 | 2023-10-31 | 中科视语(北京)科技有限公司 | Safety monitoring method and device |
CN117061720A (en) * | 2023-10-11 | 2023-11-14 | 广州市大湾区虚拟现实研究院 | Stereo image pair generation method based on monocular image and depth image rendering |
CN117115145A (en) * | 2023-10-19 | 2023-11-24 | 宁德思客琦智能装备有限公司 | Detection method and device, electronic equipment and computer readable medium |
CN117422754A (en) * | 2023-10-30 | 2024-01-19 | 河南送变电建设有限公司 | Method for calculating space distance of substation near-electricity operation personnel based on instance segmentation, readable storage medium and electronic equipment |
CN117765174A (en) * | 2023-12-19 | 2024-03-26 | 内蒙古电力勘测设计院有限责任公司 | Three-dimensional reconstruction method, device and equipment based on monocular cradle head camera |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115661337A (en) | Binocular vision-based three-dimensional reconstruction method for transformer substation operating personnel | |
CN109034018B (en) | Low-altitude small unmanned aerial vehicle obstacle sensing method based on binocular vision | |
CN109559310B (en) | Power transmission and transformation inspection image quality evaluation method and system based on significance detection | |
CN109389043B (en) | Crowd density estimation method for aerial picture of unmanned aerial vehicle | |
CN112381784A (en) | Equipment detecting system based on multispectral image | |
CN107491781A (en) | A kind of crusing robot visible ray and infrared sensor data fusion method | |
CN112379231A (en) | Equipment detection method and device based on multispectral image | |
CN107767374A (en) | A kind of GIS disc insulators inner conductor hot-spot intelligent diagnosing method | |
CN112686938A (en) | Electric transmission line clear distance calculation and safety warning method based on binocular image ranging | |
CN105279485B (en) | The detection method of monitoring objective abnormal behaviour under laser night vision | |
Rong et al. | Intelligent detection of vegetation encroachment of power lines with advanced stereovision | |
WO2020135187A1 (en) | Unmanned aerial vehicle recognition and positioning system and method based on rgb_d and deep convolutional network | |
CN113947555A (en) | Infrared and visible light fused visual system and method based on deep neural network | |
CN112184628B (en) | Infrared duplex wave image and cloud early warning system and method for flood prevention and danger detection of dike | |
CN103729620A (en) | Multi-view pedestrian detection method based on multi-view Bayesian network | |
KR102678988B1 (en) | Deep learning based image preprocessing system and method for fire detection | |
CN114170535A (en) | Target detection positioning method, device, controller, storage medium and unmanned aerial vehicle | |
CN113076825A (en) | Transformer substation worker climbing safety monitoring method | |
CN116563386A (en) | Binocular vision-based substation worker near-electricity distance detection method | |
CN115965578A (en) | Binocular stereo matching detection method and device based on channel attention mechanism | |
Xu et al. | Fast detection fusion network (FDFnet): An end to end object detection framework based on heterogeneous image fusion for power facility inspection | |
CN114494427A (en) | Method, system and terminal for detecting illegal behavior of person standing under suspension arm | |
CN113887272A (en) | Violent behavior intelligent safety detection system based on edge calculation | |
CN113283290A (en) | Power distribution room monitoring method based on 3D vision and RGB characteristics | |
CN116502810B (en) | Standardized production monitoring method based on image recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||