CN118135435A - Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism - Google Patents
- Publication number: CN118135435A
- Application number: CN202410197393.3A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Landscapes
- Optical Radar Systems And Details Thereof (AREA)
Abstract
Aiming at the problem of large target scale change caused by large flying-height change in the unmanned aerial vehicle inspection process, the invention provides a vision and laser radar obstacle recognition rendering method, system, equipment and medium based on an attention mechanism. The method adopts a target recognition algorithm based on an attention mechanism and judges the correlation of obstacles between frames by combining appearance features, thereby distinguishing static and dynamic obstacles. Meanwhile, aiming at the poor obstacle-positioning accuracy of purely visual recognition schemes, radar point cloud information is fused for position calculation. Finally, rendering strategies are designed separately for static and dynamic objects and displayed superimposed on a global map, providing richer information for obstacle avoidance during inspection. The invention improves the perception of obstacles in the flight process of the unmanned aerial vehicle, improves the positioning accuracy of obstacles, and ensures the safe flight of the unmanned aerial vehicle during inspection.
Description
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to a vision and laser radar obstacle recognition rendering method, system, equipment and medium based on an attention mechanism.
Background
In the prior art, unmanned aerial vehicle autonomous inspection is an important research direction in the unmanned aerial vehicle field. In the unmanned aerial vehicle inspection process, accurate and rapid perception of the obstacle is important. Cameras and lidars are widely used in drone perception. Camera information is rich, but ranging accuracy is poor; the laser radar has high ranging precision, but less point cloud information. Therefore, how to combine the advantages of the two, and to realize accurate and rapid obstacle sensing is a problem to be solved at present.
The current mainstream obstacle recognition methods are visual recognition schemes based on neural networks, which fall into two classes. The first class must first extract candidate frames (two-stage algorithms): a candidate region is extracted and then classified and regressed by the neural network; examples include RCNN and SPP-net. The second class is regression-based (one-stage algorithms): regression and detection are realized directly by a single convolutional neural network, as in the YOLO series and the SSD series. With the development of the Transformer, target recognition algorithms based on it have obtained excellent results.
But vision-only solutions do not perform well in localizing obstacles. A monocular camera cannot recover scale information, while binocular and RGBD cameras have large measurement errors and a small ranging range. The laser radar has high ranging precision, but for recognition, processing the three-dimensional point cloud is slow and hard to deploy.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a vision and laser radar obstacle recognition rendering method, a system, equipment and a medium based on an attention mechanism so as to improve the perception capability of an obstacle in unmanned aerial vehicle inspection.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A visual and laser radar obstacle recognition rendering method based on an attention mechanism comprises the following steps:
step 1, acquiring video by a camera, acquiring depth information by a laser radar, and detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
Step 2, matching obstacles among different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
Step 3, obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting pixel coordinates of a central point of the obstacle to a camera system by utilizing the camera internal reference matrix and depth information acquired by the laser radar to obtain three-dimensional coordinates of the central point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing a rotation matrix and a translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
Step 4, converting three-dimensional coordinates of a central point of the obstacle under a radar system to a global system through a radar odometer to obtain the position of the central point of the obstacle under the global system; according to the track of the obstacle in the video frame, differentiating the position of the central point of the obstacle under the global system to obtain the speed of the obstacle under the global system;
Step 5, judging whether the speed of the obstacle in the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, entering step 6, and if not, judging that the obstacle is a static obstacle, entering step 7;
step 6, screening and coloring static obstacle point clouds from the point clouds acquired by the radar, and updating the static obstacle point cloud color based on Bayes; superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame;
step 7, screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar, and updating the colors of the dynamic obstacle point clouds based on Bayes; the colored point cloud of the dynamic obstacle is refreshed in every frame.
In order to optimize the technical scheme, the specific measures adopted further comprise:
Further, the TPH-YOLOv5 includes four prediction heads, Transformer encoder blocks, and CBAM modules; each Transformer encoder block contains 2 sublayers: the 1st sublayer is a multi-head attention layer, the 2nd sublayer is a fully connected layer, and a residual connection is used around each sublayer.
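As a rough structural illustration of the two-sublayer encoder block just described — not the patent's implementation — the attention sublayer and fully connected sublayer with their residual connections can be sketched in plain NumPy (single-head attention for brevity; the weight matrices are placeholders for learned parameters):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_block(x, Wq, Wk, Wv, W1, W2):
    """One encoder block: an attention sublayer then a fully
    connected sublayer, each wrapped in a residual connection."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v  # sublayer 1
    x = x + attn                                        # residual 1
    ff = np.maximum(x @ W1, 0.0) @ W2                   # sublayer 2 (FC)
    return x + ff                                       # residual 2
```

The real TPH-YOLOv5 block uses multiple heads and layer normalization; only the sublayer/residual layout is shown here.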
Further, step 2 specifically comprises:
the cosine distance dist(A, B) between appearance features of the obstacle in adjacent frames is calculated as follows:

dist(A, B) = 1 − (A · B) / (‖A‖ ‖B‖)

wherein A is the appearance feature vector of the obstacle in the previous frame, B is the appearance feature vector of the obstacle in the next frame, dist(A, B) is the cosine distance between the two, and the appearance features are extracted by a re-identification network;
obtaining the cost matrix of obstacles among different frames from the cosine distances dist(A, B) between appearance features;
finding the minimum element of each row of the cost matrix, subtracting the minimum element of the row from each element of each row of the cost matrix, and subtracting the minimum element of the column from each element of each column of the cost matrix to obtain the cost matrix with a plurality of zero elements; the following steps are then performed:
step a, covering all zero elements along matrix rows and columns using the fewest possible straight lines, and judging whether the number of lines equals the smaller of the number of rows and columns of the cost matrix; if yes, entering step b; if not, entering step c;
Step b, obtaining an optimal matching relation of the obstacles between the front frame and the rear frame, and further obtaining the track of the obstacles in the video frame;
Step c, finding the minimum value in the elements which are not covered by the straight line, subtracting the minimum value from each element in the row which is not covered by the straight line completely, adding the minimum value to each element in the column which is covered by the straight line completely, and returning to the step a.
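The reduction-and-covering procedure in steps a–c computes a minimum-cost assignment. As a hedged sketch — using SciPy's `linear_sum_assignment`, which yields the same optimum as the manual row/column reductions, and with illustrative function names not taken from the patent — frame-to-frame association could look like:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_dist(a, b):
    # dist(A, B) = 1 - cosine similarity of the appearance feature vectors
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def match_obstacles(prev_feats, curr_feats):
    """Build the cost matrix of pairwise cosine distances between
    obstacles of two frames and solve the assignment problem."""
    cost = np.array([[cosine_dist(a, b) for b in curr_feats]
                     for a in prev_feats])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

Each returned pair (i, j) links obstacle i of the previous frame to obstacle j of the current frame, from which the track is accumulated.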
Further, in step 3, the specific process of converting the pixel coordinates of the center point of the obstacle to the camera system by using the camera internal reference matrix and the depth information acquired by the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle in the camera system is as follows:
The pixel coordinates of the center point of the obstacle are converted into the camera system using the following formula:

Z [u, v, 1]^T = K [X_c, Y_c, Z_c]^T,  K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]

where Z represents the depth information acquired by the lidar, (u, v) are the pixel coordinates of the center point of obstacle b_i, (X_c, Y_c, Z_c) are the three-dimensional coordinates of the center point of obstacle b_i under the camera system, K is the internal reference matrix of the camera, f_x and f_y are respectively the focal lengths in the x-axis and y-axis directions, and c_x and c_y are the origin offsets along the x-axis and y-axis;
The specific process of converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and the translation vector between the camera and the laser radar is as follows:
three-dimensional coordinates of the center point of the obstacle under the camera system are converted into those under the radar system using:

[X_l, Y_l, Z_l, 1]^T = T [X_c, Y_c, Z_c, 1]^T,  T = [[R, t], [0, 1]]

where T is composed of the 3×3 rotation matrix R and the translation vector t, and (X_l, Y_l, Z_l) represents the three-dimensional coordinates of the center point of obstacle b_i under the radar system.
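The two conversions of step 3 amount to an intrinsic back-projection followed by a rigid extrinsic transform; a minimal sketch (variable names are illustrative, not from the patent):

```python
import numpy as np

def pixel_to_camera(u, v, Z, K):
    """Back-project pixel (u, v) with lidar depth Z through the
    internal reference matrix K: Z*[u, v, 1]^T = K @ [Xc, Yc, Zc]^T."""
    return Z * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

def camera_to_radar(p_cam, R, t):
    """Apply the camera-to-lidar extrinsics: rotation R, translation t."""
    return R @ p_cam + t
```

For example, with f_x = f_y = 500 and principal point (320, 240), the pixel (320, 240) at depth 2 m back-projects onto the optical axis at (0, 0, 2) in the camera system.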
Further, in step 4, the specific formula used for obtaining the speed of the obstacle under the global system, by differentiating the position of the center point of the obstacle under the global system, is:

v_i = (^G p_{i,t+Δt} − ^G p_{i,t}) / Δt

where v_i denotes the speed of the obstacle under the global system, ^G p_{i,t+Δt} denotes the position of the center point of the obstacle under the global system at time t+Δt, ^G p_{i,t} denotes the position at time t, and Δt denotes the time interval.
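This finite difference, combined with the threshold test of step 5, is all the dynamic/static classification needs; a sketch (the 0.5 m/s threshold is an illustrative assumption — the patent does not give a value):

```python
import numpy as np

DYNAMIC_SPEED_THRESHOLD = 0.5  # m/s -- illustrative, not from the patent

def obstacle_speed(p_prev, p_curr, dt):
    """Scalar speed in the global frame: |p(t+dt) - p(t)| / dt."""
    return float(np.linalg.norm((np.asarray(p_curr) - np.asarray(p_prev)) / dt))

def is_dynamic(p_prev, p_curr, dt, thresh=DYNAMIC_SPEED_THRESHOLD):
    """Step 5: classify the obstacle as dynamic if its speed exceeds
    the set threshold, otherwise as static."""
    return obstacle_speed(p_prev, p_curr, dt) > thresh
```

An obstacle moving 1 m in 0.5 s (2 m/s) is classified as dynamic; the same displacement over 10 s (0.1 m/s) is classified as static.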
Further, step 6 specifically comprises:
Screening static obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the static obstacle point cloud into the pixel coordinate system through the camera internal reference matrix and the extrinsic parameters between the camera and the laser radar, to obtain the pixel coordinates corresponding to the static obstacle point cloud, and determining the color of the static obstacle point cloud according to the color at those pixel coordinates in the image acquired by the camera;
The color of the static obstacle point cloud is updated based on Bayes, with the following formulas:

P⁻ = P_{t−1} + σ_w² Δt
K = P⁻ (P⁻ + R_g)⁻¹
ĉ_s = c_s + K (g_s − c_s)
P_t = (I − K) P⁻

where P_t represents the color covariance matrix at the current moment, P_{t−1} represents the color covariance matrix at the previous moment, σ_w² is the variance of the white noise, Δt is the observation time interval, and R_g is the covariance of the observed color;
ĉ_s represents the color of the static obstacle point cloud at the current moment, c_s represents the color of the static obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
And superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame.
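The update above is a Kalman-style Bayesian fusion. Per color channel, with scalars standing in for the covariance matrices, one step could be sketched as follows — the gain form is an assumption consistent with the variables the patent lists, not a verbatim patent formula:

```python
def bayes_color_update(c_prev, var_prev, g_obs, var_obs, sigma_w2, dt):
    """One Bayesian fusion step for a point's color channel: predict the
    covariance by adding white-noise growth over the observation interval,
    then correct the color toward the observed pixel value."""
    var_pred = var_prev + sigma_w2 * dt     # predicted covariance
    k = var_pred / (var_pred + var_obs)     # fusion gain
    c_new = c_prev + k * (g_obs - c_prev)   # updated color
    var_new = (1.0 - k) * var_pred          # updated covariance
    return c_new, var_new
```

Repeated observations shrink the maintained variance, so the point color converges to a consistent value even when individual frames (different moments, different viewing angles) disagree.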
Further, step 7 specifically comprises:
selecting dynamic obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the dynamic obstacle point cloud into the pixel coordinate system through the camera internal reference matrix and the extrinsic parameters between the camera and the laser radar, to obtain the pixel coordinates corresponding to the dynamic obstacle point cloud, and determining the color of the dynamic obstacle point cloud according to the color at those pixel coordinates in the image acquired by the camera;
the color of the dynamic obstacle point cloud is updated based on Bayes, with the following formulas:

P⁻ = P_{t−1} + σ_w² Δt
K = P⁻ (P⁻ + R_g)⁻¹
ĉ_s = c_s + K (g_s − c_s)
P_t = (I − K) P⁻

where P_t represents the color covariance matrix at the current moment, P_{t−1} represents the color covariance matrix at the previous moment, σ_w² is the variance of the white noise, Δt is the observation time interval, and R_g is the covariance of the observed color;
ĉ_s represents the color of the dynamic obstacle point cloud at the current moment, c_s represents the color of the dynamic obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
The colored point cloud of the dynamic obstacle is refreshed in every frame.
The invention also provides a vision and laser radar obstacle recognition rendering system based on an attention mechanism, which comprises the following steps:
the camera is used for collecting videos;
The laser radar is used for collecting depth information;
the obstacle detection module is used for detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
The obstacle matching module is used for matching obstacles between different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
The coordinate conversion module is used for obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting pixel coordinates of a central point of the obstacle to a camera system by utilizing the camera internal reference matrix and depth information acquired by the laser radar to obtain three-dimensional coordinates of the central point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing a rotation matrix and a translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
The radar odometer is used for converting three-dimensional coordinates of a central point of the obstacle under the radar system to the global system to obtain the position of the central point of the obstacle under the global system;
the speed calculation module is used for differentiating the central point position of the obstacle under the global system according to the track of the obstacle in the video frame to obtain the speed of the obstacle under the global system;
the judging module is used for judging whether the speed of the obstacle under the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, and if not, judging that the obstacle is a static obstacle;
The static obstacle coloring module is used for screening and coloring static obstacle point clouds from the point clouds acquired by the radar and updating the colors of the static obstacle point clouds based on Bayes; superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame;
The dynamic obstacle coloring module is used for screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar and updating the colors of the dynamic obstacle point clouds based on Bayes; the colored point cloud of the dynamic obstacle is refreshed in every frame.
The invention also proposes an electronic device comprising: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the vision and laser radar obstacle recognition rendering method based on the attention mechanism.
The invention also proposes a computer-readable storage medium storing a computer program that causes a computer to execute the vision and lidar obstacle recognition rendering method based on an attention mechanism as described above.
The beneficial effects of the invention are as follows:
The obstacle recognition algorithm adds an attention mechanism to a traditional vision perception algorithm based on a convolutional neural network, and performs three-dimensional positioning of the obstacle by combining laser radar depth information. Meanwhile, the association between obstacles is judged using appearance features, distinguishing dynamic obstacles from static obstacles. In addition, the invention applies different rendering strategies to dynamic and static obstacles, displayed superimposed on the map. The invention improves the perception of obstacles in the flight process of the unmanned aerial vehicle, improves the positioning accuracy of obstacles, and ensures the safe flight of the unmanned aerial vehicle during inspection.
Drawings
FIG. 1 is a flow chart of a vision and lidar obstacle recognition rendering method based on an attention mechanism according to the present invention;
FIG. 2 is a graph showing the recognition effect of TPH-YOLOv5.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings.
In an embodiment, the present invention provides a visual and lidar obstacle recognition rendering method based on an attention mechanism, and a flowchart of the method is shown in fig. 1, and the method includes the following steps:
step 1, acquiring video by a camera, acquiring depth information by a laser radar, and detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
Unmanned aerial vehicle inspection involves many very small targets, and TPH-YOLOv5 adds a prediction head for detecting tiny objects. This 4-head configuration, in combination with the other 3 prediction heads of YOLOv5, can mitigate the negative effects of drastic target scale changes. The added prediction head is generated from a low-level, high-resolution feature map and is more sensitive to tiny objects.
TPH-YOLOv5 replaces some of the convolution and CSP bottleneck blocks in the YOLOv5 backbone with Transformer encoder blocks, which can capture global information and rich context information. Each Transformer encoder block contains 2 sublayers: the 1st sublayer is a multi-head attention layer, and the 2nd sublayer is a fully connected layer. A residual connection is used around each sublayer. The Transformer encoder blocks increase the ability to capture diverse local information.
Meanwhile, TPH-YOLOv5 also adds CBAM modules to the architecture. CBAM is a lightweight module that can be plugged into a CNN architecture and trained in an end-to-end fashion. For a given feature map, CBAM sequentially infers attention maps along two independent dimensions, channel and spatial, and then multiplies the attention maps with the input feature map to perform adaptive feature refinement. In images captured by unmanned aerial vehicles, the large coverage area always contains interfering geographical elements. CBAM can extract the attention area to help TPH-YOLOv5 reject such information and focus on useful target objects. The recognition effect of TPH-YOLOv5 is shown in FIG. 2.
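A stripped-down illustration of the channel-then-spatial attention order that CBAM uses (the learned MLP and convolution layers of the real module are omitted, so this is a structural sketch only, not CBAM itself):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_like(fmap):
    """Refine a (C, H, W) feature map: channel attention from global
    average/max pooling, then spatial attention from channel-wise
    average/max maps, each multiplied into the features."""
    c_att = sigmoid(fmap.mean(axis=(1, 2)) + fmap.max(axis=(1, 2)))
    fmap = fmap * c_att[:, None, None]          # channel refinement
    s_att = sigmoid(fmap.mean(axis=0) + fmap.max(axis=0))
    return fmap * s_att[None, :, :]             # spatial refinement
```

Because both attention maps pass through a sigmoid, the refinement only rescales features into (0, 1)-weighted form, which is what lets the detector suppress interfering background regions.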
Step 2, matching obstacles among different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames; the method comprises the following steps:
the cosine distance dist(A, B) between appearance features of the obstacle in adjacent frames is calculated as follows:

dist(A, B) = 1 − (A · B) / (‖A‖ ‖B‖)

wherein A is the appearance feature vector of the obstacle in the previous frame, B is the appearance feature vector of the obstacle in the next frame, dist(A, B) is the cosine distance between the two, and the appearance features are extracted by a re-identification network; the re-identification network structure is shown in Table 1.
TABLE 1
Obtaining the cost matrix of obstacles among different frames from the cosine distances dist(A, B) between appearance features;
finding the minimum element of each row of the cost matrix, subtracting the minimum element of the row from each element of each row of the cost matrix, and subtracting the minimum element of the column from each element of each column of the cost matrix to obtain the cost matrix with a plurality of zero elements; the following steps are then performed:
step a, covering all zero elements along matrix rows and columns using the fewest possible straight lines, and judging whether the number of lines equals the smaller of the number of rows and columns of the cost matrix; if yes, entering step b; if not, entering step c;
Step b, obtaining an optimal matching relation of the obstacles between the front frame and the rear frame, and further obtaining the track of the obstacles in the video frame;
Step c, finding the minimum value in the elements which are not covered by the straight line, subtracting the minimum value from each element in the row which is not covered by the straight line completely, adding the minimum value to each element in the column which is covered by the straight line completely, and returning to the step a.
Step 3, obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; in this embodiment, the center coordinates of the detection frame are selected as the pixel coordinates of the obstacle on the image. Converting the pixel coordinates of the center point of the obstacle to the camera system by utilizing the camera internal reference matrix and the depth information acquired by the laser radar, to obtain the three-dimensional coordinates of the center point of the obstacle under the camera system; the pixel coordinates of the center point of the obstacle are converted into the camera system using the following formula:

Z [u, v, 1]^T = K [X_c, Y_c, Z_c]^T,  K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]

where Z represents the depth information acquired by the lidar, (u, v) are the pixel coordinates of the center point of obstacle b_i, (X_c, Y_c, Z_c) are the three-dimensional coordinates of the center point of obstacle b_i under the camera system, K is the internal reference matrix of the camera, f_x and f_y are respectively the focal lengths in the x-axis and y-axis directions, and c_x and c_y are the origin offsets along the x-axis and y-axis.
Converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar, to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system; the conversion uses:

[X_l, Y_l, Z_l, 1]^T = T [X_c, Y_c, Z_c, 1]^T,  T = [[R, t], [0, 1]]

where T is composed of the 3×3 rotation matrix R and the translation vector t, and (X_l, Y_l, Z_l) represents the three-dimensional coordinates of the center point of obstacle b_i under the radar system.
Step 4, converting the three-dimensional coordinates of the center point of the obstacle under the radar system to the global system through the radar odometer, so that the center point of the obstacle can conveniently be displayed superimposed on the global map, obtaining the position of the center point of the obstacle under the global system; according to the track of the obstacle in the video frames, differentiating the position of the center point of the obstacle under the global system to obtain the speed of the obstacle under the global system; the formula used is:

v_i = (^G p_{i,t+Δt} − ^G p_{i,t}) / Δt

where v_i denotes the speed of the obstacle under the global system, ^G p_{i,t+Δt} denotes the position of the center point of the obstacle under the global system at time t+Δt, ^G p_{i,t} denotes the position at time t, and Δt denotes the time interval.
Step 5, judging whether the speed of the obstacle in the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, entering step 6, and if not, judging that the obstacle is a static obstacle, entering step 7;
Step 6, screening and coloring static obstacle point clouds from the point clouds acquired by the radar, and updating the static obstacle point cloud colors based on Bayes; most three-dimensional reconstruction algorithms tend to render all map point clouds when rendering maps, occupying a large amount of memory and computing resources. Such rendering algorithms are not suitable for the unmanned aerial vehicle platform, so the method only renders the point cloud of identified obstacles, reducing memory and time consumption. In the unmanned aerial vehicle inspection process, the color of the same obstacle photographed at different moments and from different angles often differs; to keep the color consistent, the invention updates and maintains the point cloud color using Bayes. After updating the color of the obstacle point cloud based on Bayes, the colors of the static obstacle point clouds of the previous frame are superimposed and displayed on the basis of the current frame; the method comprises the following steps:
Screening static obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the static obstacle point cloud into the pixel coordinate system through the camera internal reference matrix and the extrinsic parameters between the camera and the laser radar, to obtain the pixel coordinates corresponding to the static obstacle point cloud, and determining the color of the static obstacle point cloud according to the color at those pixel coordinates in the image acquired by the camera;
The color of the static obstacle point cloud is updated based on Bayes, with the following formulas:

P⁻ = P_{t−1} + σ_w² Δt
K = P⁻ (P⁻ + R_g)⁻¹
ĉ_s = c_s + K (g_s − c_s)
P_t = (I − K) P⁻

where P_t represents the color covariance matrix at the current moment, P_{t−1} represents the color covariance matrix at the previous moment, σ_w² is the variance of the white noise, Δt is the observation time interval, and R_g is the covariance of the observed color;
ĉ_s represents the color of the static obstacle point cloud at the current moment, c_s represents the color of the static obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
And superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame.
Step 7, screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar, and updating the colors of the dynamic obstacle point clouds based on Bayes; the colored point cloud of the dynamic obstacle is refreshed in every frame. The method comprises the following steps:
selecting dynamic obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the dynamic obstacle point cloud into the pixel coordinate system through the camera internal reference matrix and the extrinsic parameters between the camera and the laser radar, to obtain the pixel coordinates corresponding to the dynamic obstacle point cloud, and determining the color of the dynamic obstacle point cloud according to the color at those pixel coordinates in the image acquired by the camera;
the color of the dynamic obstacle point cloud is updated based on Bayes, with the following formulas:

P⁻ = P_{t−1} + σ_w² Δt
K = P⁻ (P⁻ + R_g)⁻¹
ĉ_s = c_s + K (g_s − c_s)
P_t = (I − K) P⁻

where P_t represents the color covariance matrix at the current moment, P_{t−1} represents the color covariance matrix at the previous moment, σ_w² is the variance of the white noise, Δt is the observation time interval, and R_g is the covariance of the observed color;
ĉ_s represents the color of the dynamic obstacle point cloud at the current moment, c_s represents the color of the dynamic obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
Each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
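The screening-and-coloring step above (projecting lidar points into the image through the camera intrinsics and the camera-lidar extrinsics, then sampling pixel colors) can be sketched as follows, assuming a pinhole camera model; the intrinsic values and the identity extrinsics below are illustrative placeholders, not calibration results.

```python
import numpy as np

# Sketch of projecting obstacle points into the image to pick up their colors,
# assuming a pinhole camera with intrinsic matrix K and an extrinsic transform
# (R, t) from the lidar frame to the camera frame; all values are illustrative.

K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])          # camera intrinsic matrix (assumed)
R = np.eye(3)                            # lidar-to-camera rotation (assumed)
t = np.zeros(3)                          # lidar-to-camera translation (assumed)

def color_points(points_lidar, image):
    """Project lidar points into the image and sample per-point colors."""
    pts_cam = points_lidar @ R.T + t              # lidar frame -> camera frame
    uvw = pts_cam @ K.T                           # camera frame -> homogeneous pixels
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
    h, w = image.shape[:2]
    in_img = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
              (uv[:, 1] >= 0) & (uv[:, 1] < h) &
              (pts_cam[:, 2] > 0))                # keep points in front of the camera
    colors = np.zeros((len(points_lidar), 3))
    colors[in_img] = image[uv[in_img, 1], uv[in_img, 0]]
    return uv, colors

# A point 2 m straight ahead should land at the principal point (320, 240).
image = np.full((480, 640, 3), 128.0)
uv, colors = color_points(np.array([[0.0, 0.0, 2.0]]), image)
```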
In another embodiment, the present invention proposes a vision and lidar obstacle recognition rendering system based on an attention mechanism corresponding to the method of the first embodiment, comprising:
the camera is used for collecting videos;
The laser radar is used for collecting depth information;
The obstacle detection module is used for detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
The obstacle matching module is used for matching obstacles between different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
The coordinate conversion module is used for obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting the pixel coordinates of the center point of the obstacle to the camera system by utilizing the camera intrinsic matrix and the depth information acquired by the laser radar to obtain three-dimensional coordinates of the center point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
The radar odometer is used for converting three-dimensional coordinates of a central point of the obstacle under the radar system to the global system to obtain the position of the central point of the obstacle under the global system;
the speed calculation module is used for differentiating the central point position of the obstacle under the global system according to the track of the obstacle in the video frame to obtain the speed of the obstacle under the global system;
the judging module is used for judging whether the speed of the obstacle under the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, and if not, judging that the obstacle is a static obstacle;
The static obstacle coloring module is used for screening and coloring static obstacle point clouds from the point clouds acquired by the radar and updating the colors of the static obstacle point clouds based on the Bayesian filter; the colors of the static obstacle point cloud from the previous frame are superimposed on the current frame for display;
The dynamic obstacle coloring module is used for screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar and updating the colors of the dynamic obstacle point clouds based on the Bayesian filter; each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
The implementation of each module and its function in the system are identical to those of the first embodiment, so the description is not repeated here.
In another embodiment, the present invention provides an electronic device, including: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the vision and laser radar obstacle recognition rendering method based on the attention mechanism.
In another embodiment, the present invention proposes a computer-readable storage medium storing a computer program that causes a computer to execute the vision and lidar obstacle recognition rendering method based on the attention mechanism as described in the first embodiment.
Experiment verification
In order to verify the effectiveness of the algorithm proposed by the present invention, its performance was tested in a simulation environment. For visual obstacle recognition, the mAP of the adopted recognition algorithm is compared with those of other recognition algorithms.
Table 2 comparison of target identification performance
As can be seen from the table, the recognition performance of the TPH-YOLOv5 adopted by the invention leads the other algorithms, indicating a stronger perception capability.
For obstacle positioning, the relative positioning errors of a binocular camera, an RGBD camera, and laser radar assistance are compared, where the error is the Euclidean distance between the true value and the measured value.
Table 3 comparative relative positioning properties
As can be seen from the table, the vision/laser radar combination adopted by the invention achieves the highest relative positioning accuracy.
For map rendering, the transmitted data size of local target rendering is compared with that of full map rendering; the results show that the data size is reduced by 65.9%, and by up to 80% when there are few obstacles.
Experiments show that the vision and laser radar obstacle recognition rendering method based on the attention mechanism can improve the perception capability of the unmanned aerial vehicle on the obstacle in flight, reduce the transmission data volume and enable the unmanned aerial vehicle to patrol more safely and stably.
In the disclosed embodiments, a computer storage medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer storage medium would include one or more wire-based electrical connections, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.
Claims (10)
1. A visual and laser radar obstacle recognition rendering method based on an attention mechanism is characterized by comprising the following steps:
step 1, acquiring video by a camera, acquiring depth information by a laser radar, and detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
Step 2, matching obstacles among different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
Step 3, obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting the pixel coordinates of the center point of the obstacle to the camera system by utilizing the camera intrinsic matrix and the depth information acquired by the laser radar to obtain three-dimensional coordinates of the center point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
Step 4, converting three-dimensional coordinates of a central point of the obstacle under a radar system to a global system through a radar odometer to obtain the position of the central point of the obstacle under the global system; according to the track of the obstacle in the video frame, differentiating the position of the central point of the obstacle under the global system to obtain the speed of the obstacle under the global system;
Step 5, judging whether the speed of the obstacle under the global system is greater than a set threshold value; if so, the obstacle is judged to be a dynamic obstacle and step 7 is entered; if not, the obstacle is judged to be a static obstacle and step 6 is entered;
step 6, screening and coloring static obstacle point clouds from the point clouds acquired by the radar, and updating the colors of the static obstacle point clouds based on the Bayesian filter; superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame;
step 7, screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar, and updating the colors of the dynamic obstacle point clouds based on the Bayesian filter; each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
2. The attention-based vision and lidar obstacle recognition rendering method of claim 1, wherein the TPH-YOLOv5 includes four prediction heads, Transformer encoders, and CBAM modules; the Transformer encoder includes 2 sublayers: the 1st sublayer is a multi-head attention layer and the 2nd sublayer is a fully connected layer, with a residual connection between the sublayers.
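The encoder structure named in the claim (a multi-head attention sublayer followed by a fully connected sublayer, each wrapped in a residual connection) can be illustrated with a minimal NumPy sketch. The query/key/value projections are omitted (identity projections) and all dimensions and weights are illustrative; this is not the TPH-YOLOv5 implementation.

```python
import numpy as np

# Minimal sketch of a 2-sublayer Transformer encoder block: multi-head
# self-attention plus a fully connected sublayer, each with a residual
# connection. Identity projections are assumed; dimensions are illustrative.

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, n_heads):
    """Scaled dot-product self-attention, split across heads."""
    n, d = x.shape
    dh = d // n_heads
    out = np.zeros_like(x)
    for h in range(n_heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]     # per-head slice (identity proj.)
        scores = softmax(q @ k.T / np.sqrt(dh))   # attention weights
        out[:, h * dh:(h + 1) * dh] = scores @ v
    return out

def encoder_block(x, w_fc, n_heads=4):
    x = x + multi_head_attention(x, n_heads)      # sublayer 1 + residual
    x = x + np.maximum(x @ w_fc, 0.0)             # sublayer 2 (FC) + residual
    return x

tokens = rng.normal(size=(5, 8))                  # 5 tokens, feature dim 8
w_fc = rng.normal(size=(8, 8)) * 0.1
y = encoder_block(tokens, w_fc)
```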
3. The vision and lidar obstacle recognition rendering method based on the attention mechanism as claimed in claim 1, wherein the step 2 is specifically:
The cosine distance dist(A, B) between appearance features of the obstacle in adjacent frames is calculated as follows:
dist(A, B) = 1 - (A · B) / (||A|| ||B||)
wherein A is the appearance feature vector of the obstacle in the previous frame, B is the appearance feature vector of the obstacle in the next frame, dist(A, B) is the cosine distance between the two, and the appearance features are extracted by a re-identification network;
obtaining cost matrixes of barriers among different frames by using cosine distances dist (A, B) among appearance features;
finding the minimum element of each row of the cost matrix, subtracting the minimum element of the row from each element of each row of the cost matrix, and subtracting the minimum element of the column from each element of each column of the cost matrix to obtain the cost matrix with a plurality of zero elements; the following steps are then performed:
step a, covering all zero elements along matrix rows and matrix columns with the least number of straight lines, and judging whether the number of lines is equal to the smaller of the number of rows and the number of columns of the cost matrix; if yes, entering step b; if not, entering step c;
Step b, obtaining an optimal matching relation of the obstacles between the front frame and the rear frame, and further obtaining the track of the obstacles in the video frame;
Step c, finding the minimum value in the elements which are not covered by the straight line, subtracting the minimum value from each element in the row which is not covered by the straight line completely, adding the minimum value to each element in the column which is covered by the straight line completely, and returning to the step a.
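The matching objective of this claim can be illustrated as follows: build a cost matrix from pairwise cosine distances, then find the minimum-cost one-to-one assignment. For the handful of obstacles typically tracked per frame, a brute-force search over permutations reaches the same optimum as the Hungarian reduction steps a-c above; the feature vectors below are illustrative.

```python
from itertools import permutations

# Illustrates the assignment objective that the Hungarian algorithm
# optimizes: the obstacle matching between two frames that minimizes the
# total cosine-distance cost. Brute force is used here only because the
# matrices are tiny; the feature vectors are illustrative.

def cosine_distance(a, b):
    """dist(A, B) = 1 - (A . B) / (|A| |B|) between appearance features."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (na * nb)

def best_assignment(cost):
    """Minimum-cost one-to-one matching (rows -> columns), brute force."""
    n = len(cost)
    best, best_cols = float("inf"), None
    for cols in permutations(range(n)):
        total = sum(cost[r][c] for r, c in zip(range(n), cols))
        if total < best:
            best, best_cols = total, cols
    return best_cols, best

# Appearance features for 2 obstacles in the previous / next frame.
prev = [[1.0, 0.0], [0.0, 1.0]]
nxt = [[0.1, 1.0], [1.0, 0.1]]
cost = [[cosine_distance(a, b) for b in nxt] for a in prev]
cols, total = best_assignment(cost)
```

Here obstacle 0 of the previous frame matches obstacle 1 of the next frame (and vice versa), because those pairs have nearly identical appearance features and hence near-zero cosine distance.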
4. The vision and lidar obstacle recognition and rendering method based on the attention mechanism as set forth in claim 1, wherein in step 3, the specific process of converting the pixel coordinates of the center point of the obstacle to the camera system by using the camera intrinsic matrix and the depth information collected by the lidar to obtain the three-dimensional coordinates of the center point of the obstacle in the camera system is as follows:
The pixel coordinates of the center point of the obstacle are converted into the camera system using the following formula:
Z [u, v, 1]^T = K [X_c, Y_c, Z_c]^T
where Z represents the depth information acquired by the lidar, (u, v) are the pixel coordinates of the center point of the obstacle b_i, (X_c, Y_c, Z_c) are the three-dimensional coordinates of the center point of the obstacle b_i under the camera system, K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]] is the intrinsic matrix of the camera, f_x, f_y are the focal lengths in the x-axis and y-axis directions respectively, and c_x, c_y are the origin offsets along the x-axis and y-axis;
The specific process of converting the three-dimensional coordinates of the center point of the obstacle from the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar is as follows:
The three-dimensional coordinates of the center point of the obstacle under the camera system are converted into those under the radar system using:
[X_l, Y_l, Z_l, 1]^T = T [X_c, Y_c, Z_c, 1]^T
where T is composed of the 3×3 rotation matrix R and the translation vector t, and (X_l, Y_l, Z_l) represents the three-dimensional coordinates of the center point of the obstacle b_i under the radar system.
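The two-step conversion of claim 4 can be sketched as follows: back-project the pixel through the intrinsic matrix using the lidar depth Z, then apply the camera-to-radar rigid transform. All parameter values below are illustrative placeholders, not calibration results.

```python
import numpy as np

# Sketch of the two-step conversion of claim 4: pixel -> camera frame via
# the intrinsic matrix and lidar depth, then camera frame -> radar frame via
# the extrinsic rotation R and translation t. Values are illustrative.

fx, fy, cx, cy = 400.0, 400.0, 320.0, 240.0   # assumed intrinsics

def pixel_to_camera(u, v, Z):
    """Invert Z [u, v, 1]^T = K [Xc, Yc, Zc]^T for the camera-frame point."""
    return np.array([(u - cx) * Z / fx, (v - cy) * Z / fy, Z])

def camera_to_radar(p_cam, R, t):
    """[Xl, Yl, Zl]^T = R [Xc, Yc, Zc]^T + t."""
    return R @ p_cam + t

p_cam = pixel_to_camera(u=400.0, v=240.0, Z=2.0)  # 2 m deep, right of center
R = np.eye(3)                                     # assumed camera-to-radar rotation
t = np.array([0.1, 0.0, 0.0])                     # assumed translation (m)
p_radar = camera_to_radar(p_cam, R, t)
```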
5. The vision and lidar obstacle recognition and rendering method based on the attention mechanism as set forth in claim 1, wherein in step 4, the formula used for differentiating the center point position of the obstacle under the global system to obtain the obstacle speed under the global system is:
v_i = (gp_{i,t+Δt} - gp_{i,t}) / Δt
where v_i denotes the speed of the obstacle under the global system, gp_{i,t+Δt} denotes the position of the center point of the obstacle under the global system at time t+Δt, gp_{i,t} denotes the position of the center point of the obstacle under the global system at time t, and Δt denotes the time interval.
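The finite-difference speed of claim 5, together with the speed-threshold test of claim 1 step 5, can be sketched as follows; the threshold value is illustrative.

```python
# Sketch of the finite-difference velocity and the dynamic/static decision
# it feeds: v_i = (gp_{i,t+dt} - gp_{i,t}) / dt, compared against a speed
# threshold. The threshold value is illustrative.

def obstacle_speed(p_prev, p_next, dt):
    """Speed (m/s) of an obstacle center between two global-frame positions."""
    dist = sum((b - a) ** 2 for a, b in zip(p_prev, p_next)) ** 0.5
    return dist / dt

def classify(speed, threshold=0.2):
    """Dynamic if the obstacle moves faster than the set threshold."""
    return "dynamic" if speed > threshold else "static"

# Obstacle center moved 0.5 m in 0.1 s -> 5 m/s -> dynamic.
v = obstacle_speed((0.0, 0.0, 0.0), (0.3, 0.4, 0.0), 0.1)
label = classify(v)
```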
6. The vision and lidar obstacle recognition rendering method based on the attention mechanism as claimed in claim 1, wherein the step 6 is specifically:
Screening static obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the static obstacle point cloud into the pixel coordinate system using the camera intrinsic matrix and the extrinsic parameters between the camera and the laser radar to obtain the pixel coordinate corresponding to each static obstacle point, and determining the color of the static obstacle point cloud according to the color at that pixel coordinate in the image acquired by the camera;
The color of the static obstacle point cloud is updated based on a Bayesian filter; the specific formulas (reconstructed here in their standard Kalman-style form from the symbols defined below, as the original equation images are not reproduced) are:
P_t = P_{t-1} + σ_w^2 Δt
K_t = P_t (P_t + R)^{-1}
c_s' = c_s + K_t (g_s - c_s)
where P_t represents the color covariance matrix at the current moment, P_{t-1} represents the color covariance matrix at the previous moment, σ_w^2 is the variance of the white noise, Δt is the observation time interval, R is the covariance of the observed color, and K_t is the resulting gain;
c_s' represents the color of the static obstacle point cloud at the current moment, c_s represents the color of the static obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
The colors of the static obstacle point cloud from the previous frame are superimposed on the current frame for display.
7. The vision and lidar obstacle recognition rendering method based on the attention mechanism of claim 1, wherein the step 7 specifically comprises:
selecting dynamic obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the dynamic obstacle point cloud into the pixel coordinate system using the camera intrinsic matrix and the extrinsic parameters between the camera and the laser radar to obtain the pixel coordinate corresponding to each dynamic obstacle point, and determining the color of the dynamic obstacle point cloud according to the color at that pixel coordinate in the image acquired by the camera;
The color of the dynamic obstacle point cloud is updated based on a Bayesian filter; the specific formulas (reconstructed here in their standard Kalman-style form from the symbols defined below, as the original equation images are not reproduced) are:
P_t = P_{t-1} + σ_w^2 Δt
K_t = P_t (P_t + R)^{-1}
c_s' = c_s + K_t (g_s - c_s)
where P_t represents the color covariance matrix at the current moment, P_{t-1} represents the color covariance matrix at the previous moment, σ_w^2 is the variance of the white noise, Δt is the observation time interval, R is the covariance of the observed color, and K_t is the resulting gain;
c_s' represents the color of the dynamic obstacle point cloud at the current moment, c_s represents the color of the dynamic obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
Each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
8. A vision and lidar obstacle recognition rendering system based on an attention mechanism, comprising: the camera is used for collecting videos;
The laser radar is used for collecting depth information;
The obstacle detection module is used for detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
The obstacle matching module is used for matching obstacles between different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
The coordinate conversion module is used for obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting the pixel coordinates of the center point of the obstacle to the camera system by utilizing the camera intrinsic matrix and the depth information acquired by the laser radar to obtain three-dimensional coordinates of the center point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
The radar odometer is used for converting three-dimensional coordinates of a central point of the obstacle under the radar system to the global system to obtain the position of the central point of the obstacle under the global system;
the speed calculation module is used for differentiating the central point position of the obstacle under the global system according to the track of the obstacle in the video frame to obtain the speed of the obstacle under the global system;
the judging module is used for judging whether the speed of the obstacle under the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, and if not, judging that the obstacle is a static obstacle;
The static obstacle coloring module is used for screening and coloring static obstacle point clouds from the point clouds acquired by the radar and updating the colors of the static obstacle point clouds based on the Bayesian filter; the colors of the static obstacle point cloud from the previous frame are superimposed on the current frame for display;
The dynamic obstacle coloring module is used for screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar and updating the colors of the dynamic obstacle point clouds based on the Bayesian filter; each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the vision and lidar obstacle recognition rendering method based on an attention mechanism as claimed in any of claims 1 to 7 when the computer program is executed.
10. A computer-readable storage medium, characterized by: a computer program is stored which causes a computer to perform the vision and lidar obstacle recognition rendering method based on an attention mechanism as claimed in any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410197393.3A CN118135435A (en) | 2024-02-22 | 2024-02-22 | Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410197393.3A CN118135435A (en) | 2024-02-22 | 2024-02-22 | Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118135435A true CN118135435A (en) | 2024-06-04 |
Family
ID=91243609
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410197393.3A Pending CN118135435A (en) | 2024-02-22 | 2024-02-22 | Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118135435A (en) |
-
2024
- 2024-02-22 CN CN202410197393.3A patent/CN118135435A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI722355B (en) | Systems and methods for correcting a high-definition map based on detection of obstructing objects | |
CN111436216B (en) | Method and system for color point cloud generation | |
Levinson et al. | Traffic light mapping, localization, and state detection for autonomous vehicles | |
CN112232275B (en) | Obstacle detection method, system, equipment and storage medium based on binocular recognition | |
CN109871739B (en) | Automatic target detection and space positioning method for mobile station based on YOLO-SIOCTL | |
US11866056B2 (en) | Ballistic estimation of vehicle data | |
CN114495064A (en) | Monocular depth estimation-based vehicle surrounding obstacle early warning method | |
CN114089330B (en) | Indoor mobile robot glass detection and map updating method based on depth image restoration | |
CN114325634A (en) | Method for extracting passable area in high-robustness field environment based on laser radar | |
WO2023222671A1 (en) | Position determination of a vehicle using image segmentations | |
CN118135435A (en) | Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism | |
Cheng et al. | G-Fusion: LiDAR and Camera Feature Fusion on the Ground Voxel Space | |
Rasyidy et al. | A Framework for Road Boundary Detection based on Camera-LIDAR Fusion in World Coordinate System and Its Performance Evaluation Using Carla Simulator | |
CN115909274A (en) | Automatic driving-oriented dynamic obstacle detection method | |
CN118274833A (en) | Non-cooperative target track estimation method based on binocular panoramic event camera | |
CN113447032A (en) | Positioning method, positioning device, electronic equipment and storage medium | |
CN118225078A (en) | Vehicle positioning method and device, vehicle and storage medium | |
CN117830341A (en) | Method for removing dynamic trace of point cloud map on line | |
CN117727011A (en) | Target identification method, device, equipment and storage medium based on image fusion | |
CN114167871A (en) | Obstacle detection method and device, electronic equipment and storage medium | |
CN117576199A (en) | Driving scene visual reconstruction method, device, equipment and medium | |
CN117493823A (en) | Object pose quick sensing method and system combined with deep learning algorithm | |
CN116721394A (en) | Monocular three-dimensional target detection method, model training method and corresponding device | |
CN117191051A (en) | Method and equipment for realizing autonomous navigation and target identification of lunar surface detector | |
CN117611800A (en) | YOLO-based target grounding point detection and ranging method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |