CN118135435A - Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism - Google Patents
- Publication number: CN118135435A
- Application number: CN202410197393.3A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Landscapes
- Optical Radar Systems And Details Thereof (AREA)
Abstract
Aiming at the problem of large target scale change caused by large flying-height change in the unmanned aerial vehicle inspection process, the invention provides a vision and laser radar obstacle recognition rendering method, system, equipment and medium based on an attention mechanism. The method adopts a target recognition algorithm based on an attention mechanism and judges the correlation of obstacles between frames by combining appearance features, thereby distinguishing static and dynamic obstacles. Meanwhile, aiming at the poor obstacle-positioning accuracy of purely visual recognition schemes, radar point cloud information is fused for position calculation. Finally, rendering strategies are designed separately for static and dynamic objects and displayed superimposed on a global map, providing richer information for obstacle avoidance during inspection. The invention improves the perception of obstacles in the flight process of the unmanned aerial vehicle, improves the positioning accuracy of obstacles, and ensures the safe flight of the unmanned aerial vehicle during inspection.
Description
Technical Field
The invention relates to the technical field of unmanned aerial vehicles, in particular to a vision and laser radar obstacle recognition rendering method, system, equipment and medium based on an attention mechanism.
Background
In the prior art, unmanned aerial vehicle autonomous inspection is an important research direction in the unmanned aerial vehicle field. In the unmanned aerial vehicle inspection process, accurate and rapid perception of the obstacle is important. Cameras and lidars are widely used in drone perception. Camera information is rich, but ranging accuracy is poor; the laser radar has high ranging precision, but less point cloud information. Therefore, how to combine the advantages of the two, and to realize accurate and rapid obstacle sensing is a problem to be solved at present.
The current mainstream obstacle recognition methods are visual recognition schemes based on neural networks, which fall into two classes. The first class must first extract candidate frames (two-stage algorithms): a candidate region is extracted and then classified and regressed by the neural network; examples include RCNN and SPP-net. The second class is regression-based (one-stage algorithms): regression and detection are realized directly by a single convolutional neural network, as in the YOLO series and the SSD series. With the development of the Transformer, target recognition algorithms based on it have obtained excellent results.
But vision-only solutions do not perform well in localizing obstacles. A monocular camera cannot recover scale information, while binocular and RGBD cameras have large measurement errors and a small ranging range. The laser radar has high ranging precision, but for recognition, processing the three-dimensional point cloud is slow and hard to deploy.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a vision and laser radar obstacle recognition rendering method, a system, equipment and a medium based on an attention mechanism so as to improve the perception capability of an obstacle in unmanned aerial vehicle inspection.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A visual and laser radar obstacle recognition rendering method based on an attention mechanism comprises the following steps:
step 1, acquiring video by a camera, acquiring depth information by a laser radar, and detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
Step 2, matching obstacles among different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
Step 3, obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting pixel coordinates of a central point of the obstacle to a camera system by utilizing the camera internal reference matrix and depth information acquired by the laser radar to obtain three-dimensional coordinates of the central point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing a rotation matrix and a translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
Step 4, converting three-dimensional coordinates of a central point of the obstacle under a radar system to a global system through a radar odometer to obtain the position of the central point of the obstacle under the global system; according to the track of the obstacle in the video frame, differentiating the position of the central point of the obstacle under the global system to obtain the speed of the obstacle under the global system;
Step 5, judging whether the speed of the obstacle in the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, entering step 6, and if not, judging that the obstacle is a static obstacle, entering step 7;
step 6, screening and coloring static obstacle point clouds from the point clouds acquired by the radar, and updating the static obstacle point cloud color based on Bayes; superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame;
step 7, screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar, and updating the colors of the dynamic obstacle point clouds based on Bayes; the colored point cloud of the dynamic obstacle is refreshed in every frame.
In order to optimize the technical scheme, the specific measures adopted further comprise:
Further, the TPH-YOLOv5 includes four prediction heads, Transformer encoder blocks, and CBAM modules; each Transformer encoder block contains 2 sublayers: the 1st sublayer is a multi-head attention layer, the 2nd sublayer is a fully connected layer, and a residual connection is used around each sublayer.
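As a rough structural illustration of the two-sublayer encoder block just described — not the patent's implementation — the attention sublayer and fully connected sublayer with their residual connections can be sketched in plain NumPy (single-head attention for brevity; the weight matrices are placeholders for learned parameters):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_block(x, Wq, Wk, Wv, W1, W2):
    """One encoder block: an attention sublayer then a fully
    connected sublayer, each wrapped in a residual connection."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v  # sublayer 1
    x = x + attn                                        # residual 1
    ff = np.maximum(x @ W1, 0.0) @ W2                   # sublayer 2 (FC)
    return x + ff                                       # residual 2
```

The real TPH-YOLOv5 block uses multiple heads and layer normalization; only the sublayer/residual layout is shown here.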
Further, step 2 specifically comprises:
the cosine distance dist(A, B) between appearance features of the obstacle in adjacent frames is calculated as follows:

dist(A, B) = 1 − (A · B) / (‖A‖ ‖B‖)

wherein A is the appearance feature vector of the obstacle in the previous frame, B is the appearance feature vector of the obstacle in the next frame, dist(A, B) is the cosine distance between the two, and the appearance features are extracted by a re-identification network;
obtaining the cost matrix of obstacles among different frames from the cosine distances dist(A, B) between appearance features;
finding the minimum element of each row of the cost matrix, subtracting the minimum element of the row from each element of each row of the cost matrix, and subtracting the minimum element of the column from each element of each column of the cost matrix to obtain the cost matrix with a plurality of zero elements; the following steps are then performed:
step a, covering all zero elements along matrix rows and columns using the fewest possible straight lines, and judging whether the number of lines equals the smaller of the number of rows and columns of the cost matrix; if yes, entering step b; if not, entering step c;
Step b, obtaining an optimal matching relation of the obstacles between the front frame and the rear frame, and further obtaining the track of the obstacles in the video frame;
Step c, finding the minimum value in the elements which are not covered by the straight line, subtracting the minimum value from each element in the row which is not covered by the straight line completely, adding the minimum value to each element in the column which is covered by the straight line completely, and returning to the step a.
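The reduction-and-covering procedure in steps a–c computes a minimum-cost assignment. As a hedged sketch — using SciPy's `linear_sum_assignment`, which yields the same optimum as the manual row/column reductions, and with illustrative function names not taken from the patent — frame-to-frame association could look like:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_dist(a, b):
    # dist(A, B) = 1 - cosine similarity of the appearance feature vectors
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def match_obstacles(prev_feats, curr_feats):
    """Build the cost matrix of pairwise cosine distances between
    obstacles of two frames and solve the assignment problem."""
    cost = np.array([[cosine_dist(a, b) for b in curr_feats]
                     for a in prev_feats])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))
```

Each returned pair (i, j) links obstacle i of the previous frame to obstacle j of the current frame, from which the track is accumulated.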
Further, in step 3, the specific process of converting the pixel coordinates of the center point of the obstacle to the camera system by using the camera internal reference matrix and the depth information acquired by the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle in the camera system is as follows:
The pixel coordinates of the center point of the obstacle are converted into the camera system using the following formula:

Z [u, v, 1]^T = K [X_c, Y_c, Z_c]^T,  K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]

where Z represents the depth information acquired by the lidar, (u, v) are the pixel coordinates of the center point of obstacle b_i, (X_c, Y_c, Z_c) are the three-dimensional coordinates of the center point of obstacle b_i under the camera system, K is the internal reference matrix of the camera, f_x and f_y are respectively the focal lengths in the x-axis and y-axis directions, and c_x and c_y are the origin offsets along the x-axis and y-axis;
The specific process of converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and the translation vector between the camera and the laser radar is as follows:
three-dimensional coordinates of the center point of the obstacle under the camera system are converted into those under the radar system using:

[X_l, Y_l, Z_l, 1]^T = T [X_c, Y_c, Z_c, 1]^T,  T = [[R, t], [0, 1]]

where T is composed of the 3×3 rotation matrix R and the translation vector t, and (X_l, Y_l, Z_l) represents the three-dimensional coordinates of the center point of obstacle b_i under the radar system.
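The two conversions of step 3 amount to an intrinsic back-projection followed by a rigid extrinsic transform; a minimal sketch (variable names are illustrative, not from the patent):

```python
import numpy as np

def pixel_to_camera(u, v, Z, K):
    """Back-project pixel (u, v) with lidar depth Z through the
    internal reference matrix K: Z*[u, v, 1]^T = K @ [Xc, Yc, Zc]^T."""
    return Z * (np.linalg.inv(K) @ np.array([u, v, 1.0]))

def camera_to_radar(p_cam, R, t):
    """Apply the camera-to-lidar extrinsics: rotation R, translation t."""
    return R @ p_cam + t
```

For example, with f_x = f_y = 500 and principal point (320, 240), the pixel (320, 240) at depth 2 m back-projects onto the optical axis at (0, 0, 2) in the camera system.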
Further, in step 4, the specific formula used for obtaining the speed of the obstacle under the global system, by differentiating the position of the center point of the obstacle under the global system, is:

v_i = (^G p_{i,t+Δt} − ^G p_{i,t}) / Δt

where v_i denotes the speed of the obstacle under the global system, ^G p_{i,t+Δt} denotes the position of the center point of the obstacle under the global system at time t+Δt, ^G p_{i,t} denotes the position at time t, and Δt denotes the time interval.
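This finite difference, combined with the threshold test of step 5, is all the dynamic/static classification needs; a sketch (the 0.5 m/s threshold is an illustrative assumption — the patent does not give a value):

```python
import numpy as np

DYNAMIC_SPEED_THRESHOLD = 0.5  # m/s -- illustrative, not from the patent

def obstacle_speed(p_prev, p_curr, dt):
    """Scalar speed in the global frame: |p(t+dt) - p(t)| / dt."""
    return float(np.linalg.norm((np.asarray(p_curr) - np.asarray(p_prev)) / dt))

def is_dynamic(p_prev, p_curr, dt, thresh=DYNAMIC_SPEED_THRESHOLD):
    """Step 5: classify the obstacle as dynamic if its speed exceeds
    the set threshold, otherwise as static."""
    return obstacle_speed(p_prev, p_curr, dt) > thresh
```

An obstacle moving 1 m in 0.5 s (2 m/s) is classified as dynamic; the same displacement over 10 s (0.1 m/s) is classified as static.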
Further, step 6 specifically comprises:
Screening static obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the static obstacle point cloud into the pixel coordinate system through the camera internal reference matrix and the extrinsic parameters between the camera and the laser radar, to obtain the pixel coordinates corresponding to the static obstacle point cloud, and determining the color of the static obstacle point cloud according to the color at those pixel coordinates in the image acquired by the camera;
The color of the static obstacle point cloud is updated based on Bayes, with the following formulas:

P⁻ = P_{t−1} + σ_w² Δt
K = P⁻ (P⁻ + R_g)⁻¹
ĉ_s = c_s + K (g_s − c_s)
P_t = (I − K) P⁻

where P_t represents the color covariance matrix at the current moment, P_{t−1} represents the color covariance matrix at the previous moment, σ_w² is the variance of the white noise, Δt is the observation time interval, and R_g is the covariance of the observed color;
ĉ_s represents the color of the static obstacle point cloud at the current moment, c_s represents the color of the static obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
And superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame.
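The update above is a Kalman-style Bayesian fusion. Per color channel, with scalars standing in for the covariance matrices, one step could be sketched as follows — the gain form is an assumption consistent with the variables the patent lists, not a verbatim patent formula:

```python
def bayes_color_update(c_prev, var_prev, g_obs, var_obs, sigma_w2, dt):
    """One Bayesian fusion step for a point's color channel: predict the
    covariance by adding white-noise growth over the observation interval,
    then correct the color toward the observed pixel value."""
    var_pred = var_prev + sigma_w2 * dt     # predicted covariance
    k = var_pred / (var_pred + var_obs)     # fusion gain
    c_new = c_prev + k * (g_obs - c_prev)   # updated color
    var_new = (1.0 - k) * var_pred          # updated covariance
    return c_new, var_new
```

Repeated observations shrink the maintained variance, so the point color converges to a consistent value even when individual frames (different moments, different viewing angles) disagree.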
Further, step 7 specifically comprises:
selecting dynamic obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the dynamic obstacle point cloud into the pixel coordinate system through the camera internal reference matrix and the extrinsic parameters between the camera and the laser radar, to obtain the pixel coordinates corresponding to the dynamic obstacle point cloud, and determining the color of the dynamic obstacle point cloud according to the color at those pixel coordinates in the image acquired by the camera;
the color of the dynamic obstacle point cloud is updated based on Bayes, with the following formulas:

P⁻ = P_{t−1} + σ_w² Δt
K = P⁻ (P⁻ + R_g)⁻¹
ĉ_s = c_s + K (g_s − c_s)
P_t = (I − K) P⁻

where P_t represents the color covariance matrix at the current moment, P_{t−1} represents the color covariance matrix at the previous moment, σ_w² is the variance of the white noise, Δt is the observation time interval, and R_g is the covariance of the observed color;
ĉ_s represents the color of the dynamic obstacle point cloud at the current moment, c_s represents the color of the dynamic obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
The colored point cloud of the dynamic obstacle is refreshed in every frame.
The invention also provides a vision and laser radar obstacle recognition rendering system based on an attention mechanism, which comprises the following steps:
the camera is used for collecting videos;
The laser radar is used for collecting depth information;
the obstacle detection module is used for detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
The obstacle matching module is used for matching obstacles between different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
The coordinate conversion module is used for obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting pixel coordinates of a central point of the obstacle to a camera system by utilizing the camera internal reference matrix and depth information acquired by the laser radar to obtain three-dimensional coordinates of the central point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing a rotation matrix and a translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
The radar odometer is used for converting three-dimensional coordinates of a central point of the obstacle under the radar system to the global system to obtain the position of the central point of the obstacle under the global system;
the speed calculation module is used for differentiating the central point position of the obstacle under the global system according to the track of the obstacle in the video frame to obtain the speed of the obstacle under the global system;
the judging module is used for judging whether the speed of the obstacle under the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, and if not, judging that the obstacle is a static obstacle;
The static obstacle coloring module is used for screening and coloring static obstacle point clouds from the point clouds acquired by the radar and updating the colors of the static obstacle point clouds based on Bayes; superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame;
The dynamic obstacle coloring module is used for screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar and updating the colors of the dynamic obstacle point clouds based on Bayes; the colored point cloud of the dynamic obstacle is refreshed in every frame.
The invention also proposes an electronic device comprising: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the vision and laser radar obstacle recognition rendering method based on the attention mechanism.
The invention also proposes a computer-readable storage medium storing a computer program that causes a computer to execute the vision and lidar obstacle recognition rendering method based on an attention mechanism as described above.
The beneficial effects of the invention are as follows:
The obstacle recognition algorithm adds an attention mechanism to a traditional vision perception algorithm based on a convolutional neural network, and performs three-dimensional positioning of the obstacle by combining laser radar depth information. Meanwhile, the association between obstacles is judged using appearance features, distinguishing dynamic obstacles from static obstacles. In addition, the invention applies different rendering strategies to dynamic and static obstacles, displayed superimposed on the map. The invention improves the perception of obstacles in the flight process of the unmanned aerial vehicle, improves the positioning accuracy of obstacles, and ensures the safe flight of the unmanned aerial vehicle during inspection.
Drawings
FIG. 1 is a flow chart of a vision and lidar obstacle recognition rendering method based on an attention mechanism according to the present invention;
FIG. 2 is a graph showing the recognition effect of TPH-YOLOv5.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings.
In an embodiment, the present invention provides a visual and lidar obstacle recognition rendering method based on an attention mechanism, and a flowchart of the method is shown in fig. 1, and the method includes the following steps:
step 1, acquiring video by a camera, acquiring depth information by a laser radar, and detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
Unmanned aerial vehicle inspection involves many very small targets, and TPH-YOLOv5 adds a prediction head for detecting tiny objects. This 4-head configuration, in combination with the other 3 prediction heads of YOLOv5, can mitigate the negative effects of drastic target scale changes. The added prediction head is generated from a low-level, high-resolution feature map and is more sensitive to tiny objects.
TPH-YOLOv5 replaces some of the convolution and CSP bottleneck blocks in the YOLOv5 backbone with Transformer encoder blocks, which can capture global information and rich context information. Each Transformer encoder block contains 2 sublayers: the 1st sublayer is a multi-head attention layer, and the 2nd sublayer is a fully connected layer. A residual connection is used around each sublayer. The Transformer encoder blocks increase the ability to capture diverse local information.
Meanwhile, TPH-YOLOv5 also adds CBAM modules to the architecture. CBAM is a lightweight module that can be plugged into a CNN architecture and trained in an end-to-end fashion. For a given feature map, CBAM sequentially infers attention maps along two independent dimensions, channel and spatial, and then multiplies the attention maps with the input feature map to perform adaptive feature refinement. In images captured by unmanned aerial vehicles, the large coverage area always contains interfering geographical elements. CBAM can extract the attention area to help TPH-YOLOv5 reject such information and focus on useful target objects. The recognition effect of TPH-YOLOv5 is shown in FIG. 2.
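A stripped-down illustration of the channel-then-spatial attention order that CBAM uses (the learned MLP and convolution layers of the real module are omitted, so this is a structural sketch only, not CBAM itself):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_like(fmap):
    """Refine a (C, H, W) feature map: channel attention from global
    average/max pooling, then spatial attention from channel-wise
    average/max maps, each multiplied into the features."""
    c_att = sigmoid(fmap.mean(axis=(1, 2)) + fmap.max(axis=(1, 2)))
    fmap = fmap * c_att[:, None, None]          # channel refinement
    s_att = sigmoid(fmap.mean(axis=0) + fmap.max(axis=0))
    return fmap * s_att[None, :, :]             # spatial refinement
```

Because both attention maps pass through a sigmoid, the refinement only rescales features into (0, 1)-weighted form, which is what lets the detector suppress interfering background regions.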
Step 2, matching obstacles among different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames; the method comprises the following steps:
the cosine distance dist(A, B) between appearance features of the obstacle in adjacent frames is calculated as follows:

dist(A, B) = 1 − (A · B) / (‖A‖ ‖B‖)

wherein A is the appearance feature vector of the obstacle in the previous frame, B is the appearance feature vector of the obstacle in the next frame, dist(A, B) is the cosine distance between the two, and the appearance features are extracted by a re-identification network; the re-identification network structure is shown in Table 1.
TABLE 1
Obtaining the cost matrix of obstacles among different frames from the cosine distances dist(A, B) between appearance features;
finding the minimum element of each row of the cost matrix, subtracting the minimum element of the row from each element of each row of the cost matrix, and subtracting the minimum element of the column from each element of each column of the cost matrix to obtain the cost matrix with a plurality of zero elements; the following steps are then performed:
step a, covering all zero elements along matrix rows and columns using the fewest possible straight lines, and judging whether the number of lines equals the smaller of the number of rows and columns of the cost matrix; if yes, entering step b; if not, entering step c;
Step b, obtaining an optimal matching relation of the obstacles between the front frame and the rear frame, and further obtaining the track of the obstacles in the video frame;
Step c, finding the minimum value in the elements which are not covered by the straight line, subtracting the minimum value from each element in the row which is not covered by the straight line completely, adding the minimum value to each element in the column which is covered by the straight line completely, and returning to the step a.
Step 3, obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; in this embodiment, the center coordinates of the detection frame are selected as the pixel coordinates of the obstacle on the image. Converting the pixel coordinates of the center point of the obstacle to the camera system by utilizing the camera internal reference matrix and the depth information acquired by the laser radar, to obtain the three-dimensional coordinates of the center point of the obstacle under the camera system; the pixel coordinates of the center point of the obstacle are converted into the camera system using the following formula:

Z [u, v, 1]^T = K [X_c, Y_c, Z_c]^T,  K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]

where Z represents the depth information acquired by the lidar, (u, v) are the pixel coordinates of the center point of obstacle b_i, (X_c, Y_c, Z_c) are the three-dimensional coordinates of the center point of obstacle b_i under the camera system, K is the internal reference matrix of the camera, f_x and f_y are respectively the focal lengths in the x-axis and y-axis directions, and c_x and c_y are the origin offsets along the x-axis and y-axis.
Converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar, to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system; the conversion uses:

[X_l, Y_l, Z_l, 1]^T = T [X_c, Y_c, Z_c, 1]^T,  T = [[R, t], [0, 1]]

where T is composed of the 3×3 rotation matrix R and the translation vector t, and (X_l, Y_l, Z_l) represents the three-dimensional coordinates of the center point of obstacle b_i under the radar system.
Step 4, converting the three-dimensional coordinates of the center point of the obstacle under the radar system to the global system through the radar odometer, so that the center point of the obstacle can conveniently be displayed superimposed on the global map, obtaining the position of the center point of the obstacle under the global system; according to the track of the obstacle in the video frames, differentiating the position of the center point of the obstacle under the global system to obtain the speed of the obstacle under the global system; the formula used is:

v_i = (^G p_{i,t+Δt} − ^G p_{i,t}) / Δt

where v_i denotes the speed of the obstacle under the global system, ^G p_{i,t+Δt} denotes the position of the center point of the obstacle under the global system at time t+Δt, ^G p_{i,t} denotes the position at time t, and Δt denotes the time interval.
Step 5, judging whether the speed of the obstacle in the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, entering step 6, and if not, judging that the obstacle is a static obstacle, entering step 7;
Step 6, screening and coloring static obstacle point clouds from the point clouds acquired by the radar, and updating the static obstacle point cloud colors based on Bayes; most three-dimensional reconstruction algorithms tend to render all map point clouds when rendering maps, occupying a large amount of memory and computing resources. Such rendering algorithms are not suitable for the unmanned aerial vehicle platform, so the method only renders the point cloud of identified obstacles, reducing memory and time consumption. In the unmanned aerial vehicle inspection process, the color of the same obstacle photographed at different moments and from different angles often differs; to keep the color consistent, the invention updates and maintains the point cloud color using Bayes. After updating the color of the obstacle point cloud based on Bayes, the colors of the static obstacle point clouds of the previous frame are superimposed and displayed on the basis of the current frame; the method comprises the following steps:
Screening static obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the static obstacle point cloud into the pixel coordinate system through the camera internal reference matrix and the extrinsic parameters between the camera and the laser radar, to obtain the pixel coordinates corresponding to the static obstacle point cloud, and determining the color of the static obstacle point cloud according to the color at those pixel coordinates in the image acquired by the camera;
The color of the static obstacle point cloud is updated based on Bayes, with the following formulas:

P⁻ = P_{t−1} + σ_w² Δt
K = P⁻ (P⁻ + R_g)⁻¹
ĉ_s = c_s + K (g_s − c_s)
P_t = (I − K) P⁻

where P_t represents the color covariance matrix at the current moment, P_{t−1} represents the color covariance matrix at the previous moment, σ_w² is the variance of the white noise, Δt is the observation time interval, and R_g is the covariance of the observed color;
ĉ_s represents the color of the static obstacle point cloud at the current moment, c_s represents the color of the static obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
And superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame.
Step 7, screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar, and updating the colors of the dynamic obstacle point clouds based on Bayes; the colored point cloud of the dynamic obstacle is refreshed in every frame. The method comprises the following steps:
selecting dynamic obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the dynamic obstacle point cloud into the pixel coordinate system through the camera internal reference matrix and the extrinsic parameters between the camera and the laser radar, to obtain the pixel coordinates corresponding to the dynamic obstacle point cloud, and determining the color of the dynamic obstacle point cloud according to the color at those pixel coordinates in the image acquired by the camera;
the color of the dynamic obstacle point cloud is updated based on Bayes, with the following formulas:

P⁻ = P_{t−1} + σ_w² Δt
K = P⁻ (P⁻ + R_g)⁻¹
ĉ_s = c_s + K (g_s − c_s)
P_t = (I − K) P⁻

where P_t represents the color covariance matrix at the current moment, P_{t−1} represents the color covariance matrix at the previous moment, σ_w² is the variance of the white noise, Δt is the observation time interval, and R_g is the covariance of the observed color;
ĉ_s represents the color of the dynamic obstacle point cloud at the current moment, c_s represents the color of the dynamic obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
Each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
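The screening-and-coloring step above (projecting lidar points into the image through the camera intrinsics and the camera-lidar extrinsics, then sampling pixel colors) can be sketched as follows, assuming a pinhole camera model; the intrinsic values and the identity extrinsics below are illustrative placeholders, not calibration results.

```python
import numpy as np

# Sketch of projecting obstacle points into the image to pick up their colors,
# assuming a pinhole camera with intrinsic matrix K and an extrinsic transform
# (R, t) from the lidar frame to the camera frame; all values are illustrative.

K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])          # camera intrinsic matrix (assumed)
R = np.eye(3)                            # lidar-to-camera rotation (assumed)
t = np.zeros(3)                          # lidar-to-camera translation (assumed)

def color_points(points_lidar, image):
    """Project lidar points into the image and sample per-point colors."""
    pts_cam = points_lidar @ R.T + t              # lidar frame -> camera frame
    uvw = pts_cam @ K.T                           # camera frame -> homogeneous pixels
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
    h, w = image.shape[:2]
    in_img = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
              (uv[:, 1] >= 0) & (uv[:, 1] < h) &
              (pts_cam[:, 2] > 0))                # keep points in front of the camera
    colors = np.zeros((len(points_lidar), 3))
    colors[in_img] = image[uv[in_img, 1], uv[in_img, 0]]
    return uv, colors

# A point 2 m straight ahead should land at the principal point (320, 240).
image = np.full((480, 640, 3), 128.0)
uv, colors = color_points(np.array([[0.0, 0.0, 2.0]]), image)
```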
In another embodiment, the present invention proposes a vision and lidar obstacle recognition rendering system based on an attention mechanism corresponding to the method of the first embodiment, comprising:
the camera is used for collecting videos;
The laser radar is used for collecting depth information;
The obstacle detection module is used for detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
The obstacle matching module is used for matching obstacles between different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
The coordinate conversion module is used for obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting the pixel coordinates of the center point of the obstacle to the camera system by utilizing the camera intrinsic matrix and the depth information acquired by the laser radar to obtain three-dimensional coordinates of the center point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
The radar odometer is used for converting three-dimensional coordinates of a central point of the obstacle under the radar system to the global system to obtain the position of the central point of the obstacle under the global system;
the speed calculation module is used for differentiating the central point position of the obstacle under the global system according to the track of the obstacle in the video frame to obtain the speed of the obstacle under the global system;
the judging module is used for judging whether the speed of the obstacle under the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, and if not, judging that the obstacle is a static obstacle;
The static obstacle coloring module is used for screening and coloring static obstacle point clouds from the point clouds acquired by the radar and updating the colors of the static obstacle point clouds based on the Bayesian filter; the colors of the static obstacle point cloud from the previous frame are superimposed on the current frame for display;
The dynamic obstacle coloring module is used for screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar and updating the colors of the dynamic obstacle point clouds based on the Bayesian filter; each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
The implementation of each module and its function in the system are identical to those of the first embodiment, so the description is not repeated here.
In another embodiment, the present invention provides an electronic device, including: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the vision and laser radar obstacle recognition rendering method based on the attention mechanism.
In another embodiment, the present invention proposes a computer-readable storage medium storing a computer program that causes a computer to execute the vision and lidar obstacle recognition rendering method based on the attention mechanism as described in the first embodiment.
Experiment verification
In order to verify the effectiveness of the algorithm proposed by the present invention, its performance was tested in a simulation environment. For visual obstacle recognition, the mAP of the adopted recognition algorithm is compared with those of other recognition algorithms.
Table 2 comparison of target identification performance
As can be seen from the table, the recognition performance of the TPH-YOLOv5 adopted by the invention leads the other algorithms, indicating a stronger perception capability.
For obstacle positioning, the relative positioning errors of a binocular camera, an RGBD camera, and laser radar assistance are compared, where the error is the Euclidean distance between the true value and the measured value.
Table 3 comparative relative positioning properties
As can be seen from the table, the vision/laser radar combination adopted by the invention achieves the highest relative positioning accuracy.
For map rendering, the transmitted data size of local target rendering is compared with that of full map rendering; the results show that the data size is reduced by 65.9%, and by up to 80% when there are few obstacles.
Experiments show that the vision and laser radar obstacle recognition rendering method based on the attention mechanism can improve the perception capability of the unmanned aerial vehicle on the obstacle in flight, reduce the transmission data volume and enable the unmanned aerial vehicle to patrol more safely and stably.
In the disclosed embodiments, a computer storage medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a computer storage medium would include one or more wire-based electrical connections, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the invention without departing from the principles thereof are intended to be within the scope of the invention as set forth in the following claims.
Claims (10)
1. A visual and laser radar obstacle recognition rendering method based on an attention mechanism is characterized by comprising the following steps:
step 1, acquiring video by a camera, acquiring depth information by a laser radar, and detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
Step 2, matching obstacles among different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
Step 3, obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting the pixel coordinates of the center point of the obstacle to the camera system by utilizing the camera intrinsic matrix and the depth information acquired by the laser radar to obtain three-dimensional coordinates of the center point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
Step 4, converting three-dimensional coordinates of a central point of the obstacle under a radar system to a global system through a radar odometer to obtain the position of the central point of the obstacle under the global system; according to the track of the obstacle in the video frame, differentiating the position of the central point of the obstacle under the global system to obtain the speed of the obstacle under the global system;
Step 5, judging whether the speed of the obstacle under the global system is greater than a set threshold value; if so, the obstacle is judged to be a dynamic obstacle and step 7 is entered; if not, the obstacle is judged to be a static obstacle and step 6 is entered;
step 6, screening and coloring static obstacle point clouds from the point clouds acquired by the radar, and updating the colors of the static obstacle point clouds based on the Bayesian filter; superposing and displaying the colors of the static obstacle point clouds of the previous frame on the basis of the current frame;
step 7, screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar, and updating the colors of the dynamic obstacle point clouds based on the Bayesian filter; each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
2. The attention-based vision and lidar obstacle recognition rendering method of claim 1, wherein the TPH-YOLOv5 includes four prediction heads, Transformer encoders, and CBAM modules; the Transformer encoder includes 2 sublayers: the 1st sublayer is a multi-head attention layer and the 2nd sublayer is a fully connected layer, with a residual connection between the sublayers.
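The encoder structure named in the claim (a multi-head attention sublayer followed by a fully connected sublayer, each wrapped in a residual connection) can be illustrated with a minimal NumPy sketch. The query/key/value projections are omitted (identity projections) and all dimensions and weights are illustrative; this is not the TPH-YOLOv5 implementation.

```python
import numpy as np

# Minimal sketch of a 2-sublayer Transformer encoder block: multi-head
# self-attention plus a fully connected sublayer, each with a residual
# connection. Identity projections are assumed; dimensions are illustrative.

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, n_heads):
    """Scaled dot-product self-attention, split across heads."""
    n, d = x.shape
    dh = d // n_heads
    out = np.zeros_like(x)
    for h in range(n_heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]     # per-head slice (identity proj.)
        scores = softmax(q @ k.T / np.sqrt(dh))   # attention weights
        out[:, h * dh:(h + 1) * dh] = scores @ v
    return out

def encoder_block(x, w_fc, n_heads=4):
    x = x + multi_head_attention(x, n_heads)      # sublayer 1 + residual
    x = x + np.maximum(x @ w_fc, 0.0)             # sublayer 2 (FC) + residual
    return x

tokens = rng.normal(size=(5, 8))                  # 5 tokens, feature dim 8
w_fc = rng.normal(size=(8, 8)) * 0.1
y = encoder_block(tokens, w_fc)
```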
3. The vision and lidar obstacle recognition rendering method based on the attention mechanism as claimed in claim 1, wherein the step 2 is specifically:
The cosine distance dist(A, B) between appearance features of the obstacle in adjacent frames is calculated as follows:
dist(A, B) = 1 - (A · B) / (||A|| ||B||)
wherein A is the appearance feature vector of the obstacle in the previous frame, B is the appearance feature vector of the obstacle in the next frame, dist(A, B) is the cosine distance between the two, and the appearance features are extracted by a re-identification network;
obtaining cost matrixes of barriers among different frames by using cosine distances dist (A, B) among appearance features;
finding the minimum element of each row of the cost matrix, subtracting the minimum element of the row from each element of each row of the cost matrix, and subtracting the minimum element of the column from each element of each column of the cost matrix to obtain the cost matrix with a plurality of zero elements; the following steps are then performed:
step a, covering all zero elements along matrix rows and matrix columns with the least number of straight lines, and judging whether the number of lines is equal to the smaller of the number of rows and the number of columns of the cost matrix; if yes, entering step b; if not, entering step c;
Step b, obtaining an optimal matching relation of the obstacles between the front frame and the rear frame, and further obtaining the track of the obstacles in the video frame;
Step c, finding the minimum value in the elements which are not covered by the straight line, subtracting the minimum value from each element in the row which is not covered by the straight line completely, adding the minimum value to each element in the column which is covered by the straight line completely, and returning to the step a.
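The matching objective of this claim can be illustrated as follows: build a cost matrix from pairwise cosine distances, then find the minimum-cost one-to-one assignment. For the handful of obstacles typically tracked per frame, a brute-force search over permutations reaches the same optimum as the Hungarian reduction steps a-c above; the feature vectors below are illustrative.

```python
from itertools import permutations

# Illustrates the assignment objective that the Hungarian algorithm
# optimizes: the obstacle matching between two frames that minimizes the
# total cosine-distance cost. Brute force is used here only because the
# matrices are tiny; the feature vectors are illustrative.

def cosine_distance(a, b):
    """dist(A, B) = 1 - (A . B) / (|A| |B|) between appearance features."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (na * nb)

def best_assignment(cost):
    """Minimum-cost one-to-one matching (rows -> columns), brute force."""
    n = len(cost)
    best, best_cols = float("inf"), None
    for cols in permutations(range(n)):
        total = sum(cost[r][c] for r, c in zip(range(n), cols))
        if total < best:
            best, best_cols = total, cols
    return best_cols, best

# Appearance features for 2 obstacles in the previous / next frame.
prev = [[1.0, 0.0], [0.0, 1.0]]
nxt = [[0.1, 1.0], [1.0, 0.1]]
cost = [[cosine_distance(a, b) for b in nxt] for a in prev]
cols, total = best_assignment(cost)
```

Here obstacle 0 of the previous frame matches obstacle 1 of the next frame (and vice versa), because those pairs have nearly identical appearance features and hence near-zero cosine distance.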
4. The vision and lidar obstacle recognition and rendering method based on the attention mechanism as set forth in claim 1, wherein in step 3, the specific process of converting the pixel coordinates of the center point of the obstacle to the camera system by using the camera intrinsic matrix and the depth information collected by the lidar to obtain the three-dimensional coordinates of the center point of the obstacle in the camera system is as follows:
The pixel coordinates of the center point of the obstacle are converted into the camera system using the following formula:
Z [u, v, 1]^T = K [X_c, Y_c, Z_c]^T
where Z represents the depth information acquired by the lidar, (u, v) are the pixel coordinates of the center point of the obstacle b_i, (X_c, Y_c, Z_c) are the three-dimensional coordinates of the center point of the obstacle b_i under the camera system, K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]] is the intrinsic matrix of the camera, f_x, f_y are the focal lengths in the x-axis and y-axis directions respectively, and c_x, c_y are the origin offsets along the x-axis and y-axis;
The specific process of converting the three-dimensional coordinates of the center point of the obstacle from the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar is as follows:
The three-dimensional coordinates of the center point of the obstacle under the camera system are converted into those under the radar system using:
[X_l, Y_l, Z_l, 1]^T = T [X_c, Y_c, Z_c, 1]^T
where T is composed of the 3×3 rotation matrix R and the translation vector t, and (X_l, Y_l, Z_l) represents the three-dimensional coordinates of the center point of the obstacle b_i under the radar system.
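The two-step conversion of claim 4 can be sketched as follows: back-project the pixel through the intrinsic matrix using the lidar depth Z, then apply the camera-to-radar rigid transform. All parameter values below are illustrative placeholders, not calibration results.

```python
import numpy as np

# Sketch of the two-step conversion of claim 4: pixel -> camera frame via
# the intrinsic matrix and lidar depth, then camera frame -> radar frame via
# the extrinsic rotation R and translation t. Values are illustrative.

fx, fy, cx, cy = 400.0, 400.0, 320.0, 240.0   # assumed intrinsics

def pixel_to_camera(u, v, Z):
    """Invert Z [u, v, 1]^T = K [Xc, Yc, Zc]^T for the camera-frame point."""
    return np.array([(u - cx) * Z / fx, (v - cy) * Z / fy, Z])

def camera_to_radar(p_cam, R, t):
    """[Xl, Yl, Zl]^T = R [Xc, Yc, Zc]^T + t."""
    return R @ p_cam + t

p_cam = pixel_to_camera(u=400.0, v=240.0, Z=2.0)  # 2 m deep, right of center
R = np.eye(3)                                     # assumed camera-to-radar rotation
t = np.array([0.1, 0.0, 0.0])                     # assumed translation (m)
p_radar = camera_to_radar(p_cam, R, t)
```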
5. The vision and lidar obstacle recognition and rendering method based on the attention mechanism as set forth in claim 1, wherein in step 4, the formula used for differentiating the center point position of the obstacle under the global system to obtain the obstacle speed under the global system is:
v_i = (gp_{i,t+Δt} - gp_{i,t}) / Δt
where v_i denotes the speed of the obstacle under the global system, gp_{i,t+Δt} denotes the position of the center point of the obstacle under the global system at time t+Δt, gp_{i,t} denotes the position of the center point of the obstacle under the global system at time t, and Δt denotes the time interval.
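The finite-difference speed of claim 5, together with the speed-threshold test of claim 1 step 5, can be sketched as follows; the threshold value is illustrative.

```python
# Sketch of the finite-difference velocity and the dynamic/static decision
# it feeds: v_i = (gp_{i,t+dt} - gp_{i,t}) / dt, compared against a speed
# threshold. The threshold value is illustrative.

def obstacle_speed(p_prev, p_next, dt):
    """Speed (m/s) of an obstacle center between two global-frame positions."""
    dist = sum((b - a) ** 2 for a, b in zip(p_prev, p_next)) ** 0.5
    return dist / dt

def classify(speed, threshold=0.2):
    """Dynamic if the obstacle moves faster than the set threshold."""
    return "dynamic" if speed > threshold else "static"

# Obstacle center moved 0.5 m in 0.1 s -> 5 m/s -> dynamic.
v = obstacle_speed((0.0, 0.0, 0.0), (0.3, 0.4, 0.0), 0.1)
label = classify(v)
```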
6. The vision and lidar obstacle recognition rendering method based on the attention mechanism as claimed in claim 1, wherein the step 6 is specifically:
Screening static obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the static obstacle point cloud into the pixel coordinate system using the camera intrinsic matrix and the extrinsic parameters between the camera and the laser radar to obtain the pixel coordinate corresponding to each static obstacle point, and determining the color of the static obstacle point cloud according to the color at that pixel coordinate in the image acquired by the camera;
The color of the static obstacle point cloud is updated based on a Bayesian filter; the specific formulas (reconstructed here in their standard Kalman-style form from the symbols defined below, as the original equation images are not reproduced) are:
P_t = P_{t-1} + σ_w^2 Δt
K_t = P_t (P_t + R)^{-1}
c_s' = c_s + K_t (g_s - c_s)
where P_t represents the color covariance matrix at the current moment, P_{t-1} represents the color covariance matrix at the previous moment, σ_w^2 is the variance of the white noise, Δt is the observation time interval, R is the covariance of the observed color, and K_t is the resulting gain;
c_s' represents the color of the static obstacle point cloud at the current moment, c_s represents the color of the static obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
The colors of the static obstacle point cloud from the previous frame are superimposed on the current frame for display.
7. The vision and lidar obstacle recognition rendering method based on the attention mechanism of claim 1, wherein the step 7 specifically comprises:
selecting dynamic obstacle point clouds in a set depth range from the point clouds acquired by the radar, wherein the depth range is determined according to the types of the obstacles;
Projecting the dynamic obstacle point cloud into the pixel coordinate system using the camera intrinsic matrix and the extrinsic parameters between the camera and the laser radar to obtain the pixel coordinate corresponding to each dynamic obstacle point, and determining the color of the dynamic obstacle point cloud according to the color at that pixel coordinate in the image acquired by the camera;
The color of the dynamic obstacle point cloud is updated based on a Bayesian filter; the specific formulas (reconstructed here in their standard Kalman-style form from the symbols defined below, as the original equation images are not reproduced) are:
P_t = P_{t-1} + σ_w^2 Δt
K_t = P_t (P_t + R)^{-1}
c_s' = c_s + K_t (g_s - c_s)
where P_t represents the color covariance matrix at the current moment, P_{t-1} represents the color covariance matrix at the previous moment, σ_w^2 is the variance of the white noise, Δt is the observation time interval, R is the covariance of the observed color, and K_t is the resulting gain;
c_s' represents the color of the dynamic obstacle point cloud at the current moment, c_s represents the color of the dynamic obstacle point cloud at the previous moment, and g_s represents the pixel value of the observed color;
Each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
8. A vision and lidar obstacle recognition rendering system based on an attention mechanism, comprising: the camera is used for collecting videos;
The laser radar is used for collecting depth information;
The obstacle detection module is used for detecting the obstacle in each video frame by using TPH-YOLOv5 to obtain the center point of the obstacle in each video frame;
The obstacle matching module is used for matching obstacles between different frames by adopting the Hungarian algorithm to obtain the track of the obstacle in the video frames;
The coordinate conversion module is used for obtaining pixel coordinates of the center point of the obstacle according to the position of the center point of the obstacle in the video frame; converting the pixel coordinates of the center point of the obstacle to the camera system by utilizing the camera intrinsic matrix and the depth information acquired by the laser radar to obtain three-dimensional coordinates of the center point of the obstacle under the camera system; converting the three-dimensional coordinates of the center point of the obstacle under the camera system to the radar system by utilizing the rotation matrix and translation vector between the camera and the laser radar to obtain the three-dimensional coordinates of the center point of the obstacle under the radar system;
The radar odometer is used for converting three-dimensional coordinates of a central point of the obstacle under the radar system to the global system to obtain the position of the central point of the obstacle under the global system;
the speed calculation module is used for differentiating the central point position of the obstacle under the global system according to the track of the obstacle in the video frame to obtain the speed of the obstacle under the global system;
the judging module is used for judging whether the speed of the obstacle under the global system is greater than a set threshold value, if so, judging that the obstacle is a dynamic obstacle, and if not, judging that the obstacle is a static obstacle;
The static obstacle coloring module is used for screening and coloring static obstacle point clouds from the point clouds acquired by the radar and updating the colors of the static obstacle point clouds based on the Bayesian filter; the colors of the static obstacle point cloud from the previous frame are superimposed on the current frame for display;
The dynamic obstacle coloring module is used for screening and coloring dynamic obstacle point clouds from the point clouds acquired by the radar and updating the colors of the dynamic obstacle point clouds based on the Bayesian filter; each frame, the colored point cloud of the dynamic obstacle is refreshed and displayed.
9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the vision and lidar obstacle recognition rendering method based on an attention mechanism as claimed in any of claims 1 to 7 when the computer program is executed.
10. A computer-readable storage medium, characterized by: a computer program is stored which causes a computer to perform the vision and lidar obstacle recognition rendering method based on an attention mechanism as claimed in any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410197393.3A CN118135435A (en) | 2024-02-22 | 2024-02-22 | Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410197393.3A CN118135435A (en) | 2024-02-22 | 2024-02-22 | Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118135435A true CN118135435A (en) | 2024-06-04 |
Family
ID=91243609
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410197393.3A Pending CN118135435A (en) | 2024-02-22 | 2024-02-22 | Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118135435A (en) |
-
2024
- 2024-02-22 CN CN202410197393.3A patent/CN118135435A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI722355B (en) | Systems and methods for correcting a high-definition map based on detection of obstructing objects | |
CN111436216B (en) | Method and system for color point cloud generation | |
Levinson et al. | Traffic light mapping, localization, and state detection for autonomous vehicles | |
CN112232275B (en) | Obstacle detection method, system, equipment and storage medium based on binocular recognition | |
CN109871739B (en) | Automatic target detection and space positioning method for mobile station based on YOLO-SIOCTL | |
US11866056B2 (en) | Ballistic estimation of vehicle data | |
CN114495064A (en) | Monocular depth estimation-based vehicle surrounding obstacle early warning method | |
CN114089330B (en) | Indoor mobile robot glass detection and map updating method based on depth image restoration | |
CN114325634A (en) | Method for extracting passable area in high-robustness field environment based on laser radar | |
WO2023222671A1 (en) | Position determination of a vehicle using image segmentations | |
CN118135435A (en) | Visual and laser radar obstacle recognition rendering method, system, equipment and medium based on attention mechanism | |
Cheng et al. | G-Fusion: LiDAR and Camera Feature Fusion on the Ground Voxel Space | |
Rasyidy et al. | A Framework for Road Boundary Detection based on Camera-LIDAR Fusion in World Coordinate System and Its Performance Evaluation Using Carla Simulator | |
CN115909274A (en) | Automatic driving-oriented dynamic obstacle detection method | |
CN118274833A (en) | Non-cooperative target track estimation method based on binocular panoramic event camera | |
CN113447032A (en) | Positioning method, positioning device, electronic equipment and storage medium | |
CN118225078A (en) | Vehicle positioning method and device, vehicle and storage medium | |
CN117830341A (en) | Method for removing dynamic trace of point cloud map on line | |
CN117727011A (en) | Target identification method, device, equipment and storage medium based on image fusion | |
CN114167871A (en) | Obstacle detection method and device, electronic equipment and storage medium | |
CN117576199A (en) | Driving scene visual reconstruction method, device, equipment and medium | |
CN117493823A (en) | Object pose quick sensing method and system combined with deep learning algorithm | |
CN116721394A (en) | Monocular three-dimensional target detection method, model training method and corresponding device | |
CN117191051A (en) | Method and equipment for realizing autonomous navigation and target identification of lunar surface detector | |
CN117611800A (en) | YOLO-based target grounding point detection and ranging method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |