CN115482282A - Dynamic SLAM method with multi-target tracking capability in automatic driving scene - Google Patents


Info

Publication number
CN115482282A
CN115482282A (application number CN202211123948.7A)
Authority
CN
China
Prior art keywords
vehicle
current
frame
frames
dynamic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211123948.7A
Other languages
Chinese (zh)
Inventor
刘之涛
洪峰
苏宏业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211123948.7A priority Critical patent/CN115482282A/en
Publication of CN115482282A publication Critical patent/CN115482282A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38 - Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804 - Creation or updating of map data
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38 - Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804 - Creation or updating of map data
    • G01C21/3833 - Creation or updating of map data characterised by the source of data
    • G01C21/3841 - Data obtained from two or more sources, e.g. probe vehicles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 - Detecting or categorising vehicles

Abstract

The invention discloses a dynamic SLAM method with multi-target tracking capability in an automatic driving scene. The method comprises the following steps: first, a semantic segmentation mask is extracted for each frame of the image sequence acquired by a camera, and the corresponding optical flow results are obtained by an optical flow method; point cloud-image projection is performed on the point clouds acquired by the radar, and the points of each point cloud frame corresponding to an image are converted into the pixel coordinate system through coordinate transformation; the static point cloud and the point clouds corresponding to different vehicles are then extracted in each frame; next, an initial pose sequence of the vehicle carrying the camera and initial relative transformation matrices of the different vehicles in each image frame are computed; furthermore, the available information is used to construct a graph optimization problem that further refines the pose results of the ego vehicle and of the vehicles in the image; finally, the target pose information is obtained. The invention adopts a laser radar and a camera as sensors, combines semantic segmentation information, can run stably in a dynamic environment while outputting the motion states of the vehicles in the image, and therefore has high practical application value.

Description

Dynamic SLAM method with multi-target tracking capability in automatic driving scene
Technical Field
The invention belongs to the fields of artificial intelligence, robotics and computer vision, and relates to a SLAM method for automatic driving scenes, in particular to a multi-target tracking SLAM positioning method based on semantic segmentation and multi-sensor data in an automatic driving scene.
Background
Simultaneous localization and mapping (SLAM) is a method that, in an unknown environment, takes the observations of a sensor-carrying agent as input to estimate the agent's motion state in real time and to build a model of the surrounding environment. SLAM is currently widely applied in unmanned driving, robotics, high-precision map construction, unmanned aerial vehicles, AR and other fields.
According to the sensors used, SLAM can be broadly divided into radar SLAM and visual SLAM. Radar SLAM, currently based mainly on lidar and millimeter-wave radar, is mature, achieves stable localization and mapping and provides accurate depth information, but it cannot supply rich semantic information the way vision can. Visual SLAM mainly uses monocular, binocular and RGB-D cameras; it is low-cost, runs in real time and has broad application prospects, but it is affected by illumination and similar conditions. Each approach has its own advantages, and combining the two yields better results.
most of the prior SLAM methods have static environment hypothesis, namely, the observation results of the sensors are all static components, and the algorithm realizes the positioning and mapping effects through methods such as geometry and the like on the basis. However, a real scene often does not satisfy the assumption of a static environment, and when a scene with dynamic elements is faced, the existing method is affected by the dynamic elements to generate mismatching and continuously accumulate errors along with time, so that the SLAM algorithm fails, and thus, the requirements of precision and robustness cannot be met in practical application. Also, because existing methods focus on building static element maps, further requirements, such as in an autonomous driving scenario, cannot be met. Because the SLAM algorithm as the front-end needs to provide richer information to the back-end planning layer, such as: the position and speed of surrounding vehicles, etc., to achieve better decision-making behavior.
Existing SLAM algorithms for dealing with dynamic scenes fall into two categories: those using geometric information and those using semantic information. Geometric methods project the feature points of the previous frame onto the current frame, compute the reprojection error, and judge from its magnitude whether a feature point is dynamic so as to eliminate the dynamic part; this screens dynamic from static points, but the reprojection error is itself affected by the dynamic elements, so the accuracy gain is limited. Methods combining semantic information process the image with a pre-trained semantic neural network, obtain semantic information from the object detection or semantic segmentation results, and then selectively use the image content according to that semantic information. Eliminating dynamic elements is the main idea of current SLAM methods for handling dynamics, which means dynamic semantic information is not exploited effectively.
Disclosure of Invention
The invention aims to provide a multi-target tracking SLAM positioning method based on semantic segmentation and multi-sensor data in an automatic driving scene. The method takes the data collected by a laser radar and a camera as input and combines an object detection technique: 3D objects are constructed from the radar depth information and the semantic information obtained by detection, pose transformations are computed separately for the static components and for the 3D objects, and the computed pose results are refined with the available information through graph optimization, yielding a laser-camera fusion SLAM system capable of tracking dynamic objects. Verification shows that the algorithm can effectively localize and track the vehicles in the image in automatic driving scenes containing dynamic elements, so the method has high practical application value.
The above purpose is realized by the following technical scheme:
step 1: inputting an image sequence acquired by a vehicle-mounted camera into a semantic segmentation network to obtain a semantic segmentation mask of each image frame in the image sequence, and simultaneously processing the image sequence acquired by the vehicle-mounted camera by an optical flow method to obtain an optical flow result of the current image sequence;
step 2: Performing point cloud-image projection between an initial point cloud sequence acquired by a vehicle-mounted laser radar and an image sequence acquired by a vehicle-mounted camera to obtain a 3D point cloud sequence;
step 3: Performing point cloud semantic segmentation on each frame of 3D point cloud in the 3D point cloud sequence according to a semantic segmentation mask of each image frame in the image sequence and 2D information in the 3D point cloud sequence to obtain vehicle point cloud and static point cloud of each frame of 3D point cloud, and clustering the vehicle point cloud of each frame of 3D point cloud to obtain point clouds corresponding to different vehicles in each frame of 3D point cloud;
step 4: Determining static characteristic points and dynamic characteristic points of two current frames according to a first adjacent two-frame image frame in the image sequence, static point clouds of 3D point clouds of corresponding frames in the 3D point cloud sequence and point clouds corresponding to different vehicles, and further determining initial pose transformation of a vehicle where a vehicle-mounted camera is located between the two current adjacent frames and initial relative transformation matrixes of the different vehicles between the two current adjacent frames according to optical flow results corresponding to the two current frames;
step 5: Constructing and solving a graph optimization problem according to static characteristic points and dynamic characteristic points of current two frame image frames, initial pose transformation of a vehicle in which a vehicle-mounted camera is positioned between current adjacent frames and initial relative transformation matrixes of different vehicles between current adjacent frames, and obtaining the optimized pose transformation of the vehicle in which the vehicle-mounted camera is positioned between the current adjacent frames and the optimized relative transformation matrix of different vehicles in the next frame of the current two frames;
step 6: taking the optimized pose transformation of the vehicle in which the vehicle-mounted camera is positioned as the final pose transformation of the vehicle-mounted camera, calculating and obtaining pose information of different vehicles between current adjacent frames according to the optimized pose transformation of the vehicle in which the vehicle-mounted camera is positioned between the current adjacent frames and the optimized relative transformation matrix of different vehicles between the current adjacent frames, and calculating the speed of the corresponding vehicle according to the pose information of different vehicles between the current adjacent frames;
step 7: Repeating the step 4 to the step 6, sequentially calculating and optimizing the pose of the adjacent frames according to the remaining image frames in the image sequence, the static point cloud of the corresponding frame 3D point cloud in the 3D point cloud sequence, the point clouds corresponding to different vehicles and the optical flow result, and obtaining the pose transformation of the vehicle-mounted camera in each frame, the pose information of different vehicles and the corresponding speed.
In the step 1, the pixel belonging to the vehicle in the semantic segmentation mask of each image frame is marked as 1, the pixel containing the dynamic elements except the vehicle is marked as-1, and the rest pixels are marked as 0.
In the step 2, the initial point cloud sequence under the radar coordinate system is converted into the camera coordinate system according to the external parameters from the vehicle-mounted laser radar to the vehicle-mounted camera, and the initial point cloud sequence under the camera coordinate system is converted into the pixel coordinate system according to the internal parameters of the camera, so that the 3D point cloud sequence is obtained.
In the step 3, the vehicle point clouds of each frame of 3D point cloud are first constructed into corresponding K-D trees, and then, after the search radius is set, Euclidean clustering segmentation is performed on each frame of K-D trees to obtain the point clouds corresponding to different vehicles in each frame of 3D point cloud.
The step 4 specifically comprises the following steps:
step 4.1: taking the pixel positions of static point clouds of two current frames of 3D point clouds on corresponding image frames as static feature points of the two current frames of image frames, and taking the pixel positions of point clouds corresponding to different vehicles in the two current frames of 3D point clouds on corresponding image frames as dynamic feature points of the two current frames of image frames;
step 4.2: matching the static characteristic points of the current two-frame image frame by using the optical flow result to obtain a current two-frame 2D optical flow static matching result, and matching the dynamic characteristic points of the current two-frame image frame by using the optical flow result to obtain a current two-frame 2D optical flow dynamic matching result;
step 4.3: recording two current adjacent frames as T-1 time and T time, performing 2D coordinate matching of a 2D optical flow static matching result from the static feature point at the T-1 time to the T time to obtain a matching relation from the static feature point at the T-1 time to the 2D coordinate at the T time as a 3D-2D static matching result, and solving through a PnP and random sampling consistency algorithm based on the 3D-2D static matching result to obtain a pose transformation T between the T-1 time and the T time of a vehicle where a vehicle-mounted camera is located;
step 4.4: and checking whether the point clouds corresponding to different vehicles at the t-1 moment and the point clouds corresponding to different vehicles at the t moment form matching by using the dynamic matching result of the 2D optical flow at the t moment, if the point clouds form matching, then carrying out 2D coordinate matching on the dynamic matching result of the dynamic feature points at the t-1 moment to the 2D optical flow at the t moment to obtain the matching relation from the dynamic feature points at the t-1 moment to the 2D coordinates at the t moment and using the matching relation as a 3D-2D dynamic matching result, and solving by using a PnP and random sampling consistency algorithm based on the 3D-2D dynamic matching result to obtain an initial relative transformation matrix H between the t-1 moment and the t moment of different vehicles.
The step 5 specifically comprises the following steps:
step 5.1: constructing a graph optimization problem according to static characteristic points and dynamic characteristic points of two current frame image frames, initial pose transformation of a vehicle where a vehicle-mounted camera is located between current adjacent frames and initial relative transformation matrixes of different vehicles between the current adjacent frames, wherein the graph optimization problem specifically comprises the following steps:
determining the pose of the vehicle where the vehicle-mounted camera is located at time t according to its pose transformation T between time t-1 and time t and its pose at time t-1, forming a time window of length k ending at the current time t, and constructing, in the current time window, the pose of the vehicle where the vehicle-mounted camera is located at each moment, the static feature points of each image frame, the dynamic feature points of each image frame and the initial relative transformation matrixes H of different vehicles in each image frame as the vertexes of the graph optimization;
in the current time window, constructing the projection errors between the pose of the vehicle where the vehicle-mounted camera is located at each moment and the static feature points of the corresponding image frames into edges between the corresponding vertexes; constructing the pose transformation T of the vehicle with the vehicle-mounted camera between the poses of adjacent moments into edges between the corresponding vertexes; constructing the projection errors between the pose of the vehicle where the vehicle-mounted camera is located at each moment and the dynamic feature points at the corresponding moment into edges between the corresponding vertexes; and constructing the transformation errors between the dynamic feature points of adjacent image frames and the initial relative transformation matrixes H of the corresponding vehicles into edges between the corresponding vertexes;
step 5.2: and solving the graph optimization problem constructed in the current time window to obtain the optimized pose transformation of the vehicle where the vehicle-mounted camera is positioned between the current adjacent frames and the optimized relative transformation matrix of different vehicles when the different vehicles are positioned at the next frame of the current two frames.
In step 6, the calculation formula of the pose information of each vehicle between the current adjacent frames is as follows:
T_car = H^{-1} * T

H represents the optimized relative transformation matrix of each vehicle, T represents the optimized pose transformation of the vehicle where the vehicle-mounted camera is located at time t, and T_car represents the pose information of the current vehicle from time t-1 to time t.
In step 6, the calculation formula of the speed of each vehicle is as follows:
v = t_car - (I - R_car) * c_{t-1}

wherein t_car represents the translation part of the pose information T_car of each vehicle, I denotes a 3 x 3 identity matrix, R_car represents the rotation part of the pose information T_car of the current vehicle, and c_{t-1} represents the centroid coordinates of the 3D point cloud of the current vehicle at time t-1.
The invention has the beneficial effects that:
aiming at the situation that the traditional SLAM cannot cope with a dynamic scene, the invention provides a dynamic SLAM method with multi-target tracking capability in an automatic driving scene, effectively overcomes the defects of instability and low precision of the traditional SLAM method under dynamic target interference, and can accurately estimate the motion of a dynamic vehicle in the automatic driving scene. Meanwhile, an optical flow method and a semantic segmentation method in deep learning are introduced, and radar and picture information carried by an automatic driving vehicle are combined, so that robust information fusion is completed. The invention is different from the mainstream dynamic SLAM research which utilizes semantics to remove dynamic information, utilizes the dynamic object information in the scene and provides richer scene information such as target pose and target speed for the subsequent planning control part of the automatic driving research.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is an original picture inputted by the method of the present invention.
FIG. 3 is a diagram of the optical flow processing result of the method of the present invention.
FIG. 4 is a semantic processing result picture of the method of the present invention.
FIG. 5 is a picture of a point cloud clustering result of the method of the present invention.
Fig. 6 is a picture of the dynamic/static feature point processing results of the method of the present invention.
FIG. 7 is a geometric relationship picture of the multi-target tracking dynamic SLAM according to the method of the present invention.
FIG. 8 is a graph of the speed estimation results of the method of the present invention for vehicles in the image, obtained on training sequence 0003 of the KITTI Tracking dataset.
FIG. 9 is a graph showing the tracking and localization results of the method of the present invention for the sensor-carrying vehicle and the vehicles in the image, obtained on training sequence 0003 of the KITTI Tracking dataset.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings given in the present application. It is to be understood that the disclosed embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Compared with the traditional SLAM, the method combines the depth information of the laser point cloud and the semantic information of the picture, utilizes the point cloud clustering method to model the target in the image, and brings the target into the calculation flow of the SLAM, so that the vehicle with the sensor and the target vehicle in the image can be positioned at the same time.
As shown in fig. 1, the present invention comprises the steps of:
step 1: inputting an image sequence acquired by a vehicle-mounted camera into a semantic segmentation network to obtain a semantic segmentation mask of each image frame in the image sequence, and simultaneously processing the image sequence acquired by the vehicle-mounted camera by an optical flow method to obtain an optical flow result of the current image sequence; specifically, after each time image frame and the next time image frame are input into the optical flow network or the conventional optical flow algorithm together, the optical flow vectors of the pixels in the current time image frame are obtained, and the optical flow results of the current image sequence are composed of the optical flow vectors of the pixels in all the image frames, as shown in fig. 3.
In step 1, an RGB picture such as the one shown in fig. 2 is input into the semantic segmentation network; as shown in fig. 4, a pixel belonging to a vehicle in the semantic segmentation mask of each image frame is marked as 1, a pixel containing a dynamic element other than a vehicle is marked as -1, and the remaining pixels are marked as 0.
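As an illustration of step 1, the short sketch below builds the 1 / -1 / 0 mask from a per-pixel class map and computes a dense optical flow field. The class indices and the use of OpenCV's Farneback algorithm are assumptions made for the example only, since the patent does not name a specific segmentation network or optical flow method.

```python
# Illustrative sketch of step 1. Assumptions: OpenCV's Farneback flow stands in
# for the unspecified optical-flow method; CAR_ID / DYNAMIC_IDS are hypothetical
# class indices of whichever segmentation network is used.
import cv2
import numpy as np

CAR_ID = 13                     # assumed class index for "car"
DYNAMIC_IDS = [11, 12, 14, 15]  # assumed indices of other dynamic classes (pedestrian, rider, ...)

def semantic_mask(class_map: np.ndarray) -> np.ndarray:
    """Return the mask described above: vehicle pixels 1, other dynamic pixels -1, rest 0."""
    mask = np.zeros_like(class_map, dtype=np.int8)
    mask[class_map == CAR_ID] = 1
    mask[np.isin(class_map, DYNAMIC_IDS)] = -1
    return mask

def dense_flow(prev_bgr: np.ndarray, next_bgr: np.ndarray) -> np.ndarray:
    """Per-pixel optical flow vectors (du, dv) from the previous frame to the next frame."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
```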
Step 2: Performing point cloud-image projection between an initial point cloud sequence acquired by a vehicle-mounted laser radar and an image sequence acquired by a vehicle-mounted camera to obtain a 3D point cloud sequence;
in step 2, the initial point cloud sequence under the radar coordinate system is converted into the camera coordinate system according to the external parameters from the vehicle-mounted laser radar to the vehicle-mounted camera, and then the initial point cloud sequence under the camera coordinate system is converted into the pixel coordinate system according to the internal parameters of the camera, so that a 3D point cloud sequence is obtained, wherein points in the 3D point cloud sequence comprise depth information and pixel coordinates (namely 2D information) of images.
Specifically: each scan of the laser radar yields n data points in the radar coordinate system. The coordinate of each data point in the lidar coordinate system is p_l = [x_l, y_l, z_l], where x_l, y_l and z_l are the values of the data point on the x, y and z axes of the radar coordinate system. Through the extrinsic transformation RT from the laser radar to the camera, the data point is converted to the camera coordinate system, and its coordinate p_c in the camera coordinate system is expressed as:

p_c = RT * p_l

For the current data point, the coordinate in the camera coordinate system is p_c = [x_c, y_c, z_c], where x_c, y_c and z_c are the values of the data point on the x, y and z axes of the camera coordinate system. It can be converted to the pixel coordinate system through the camera intrinsics, expressed as:

u = f_x * x_c / z_c + c_x
v = f_y * y_c / z_c + c_y

where u represents the horizontal value of the data point in the pixel coordinate system, v represents the vertical value of the data point in the pixel coordinate system, f_x represents the horizontal scale factor of the camera intrinsics, c_x represents the horizontal offset of the camera intrinsics, f_y represents the vertical scale factor of the camera intrinsics, and c_y represents the vertical offset of the camera intrinsics.
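A minimal sketch of this projection is given below; it assumes calibrated lidar-to-camera extrinsics (R, t) and camera intrinsics (f_x, f_y, c_x, c_y), discards points behind the camera or outside the image, and keeps the depth together with the pixel coordinates for every retained point.

```python
# Sketch of the point cloud -> pixel projection described above (assumed inputs:
# extrinsic rotation R, translation t, intrinsics fx, fy, cx, cy, image size).
import numpy as np

def project_lidar_to_pixels(points_l, R, t, fx, fy, cx, cy, width, height):
    """points_l: (N, 3) lidar-frame points. Returns (u, v, depth) rows and camera-frame points."""
    points_c = points_l @ R.T + t                 # p_c = RT * p_l
    in_front = points_c[:, 2] > 1e-6              # keep points in front of the camera
    pc = points_c[in_front]
    u = fx * pc[:, 0] / pc[:, 2] + cx             # pinhole projection
    v = fy * pc[:, 1] / pc[:, 2] + cy
    in_image = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    uvz = np.stack([u[in_image], v[in_image], pc[in_image, 2]], axis=1)
    return uvz, pc[in_image]
```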
Step 3: Performing point cloud semantic segmentation on each frame of 3D point cloud in the 3D point cloud sequence according to a semantic segmentation mask of each image frame in the image sequence and 2D information in the 3D point cloud sequence to obtain vehicle point cloud and static point cloud of each frame of 3D point cloud, specifically, performing semantic segmentation on each corresponding image frame according to the semantic segmentation mask of each image frame to obtain semantic information corresponding to 2D coordinates of each point in the 3D point cloud sequence on the image, and determining the vehicle point cloud and the static point cloud of the 3D point cloud sequence according to the semantic information corresponding to the 2D coordinates; clustering vehicle point clouds of each frame of 3D point cloud to obtain point clouds corresponding to different vehicles in each frame of 3D point cloud;
in step 3, as shown in fig. 5, vehicle point clouds of each frame of 3D point cloud are first constructed into corresponding K-D trees, and then after a search radius is set, the K-D trees of each frame are subjected to euclidean clustering segmentation to obtain point clouds corresponding to different vehicles in each frame of 3D point cloud. Wherein, the point cloud corresponding to each vehicle in each frame can be represented as P i ={p i1 ,…,p iN }(i>0),p i1 Representing a first data point of the vehicle i in the current frame, and N representing the number of data points of the vehicle i in the current frame; the static point cloud of each frame of the 3D point cloud may be represented as P 0 ={p 01 ,…,p 0M },p 01 The first data point of the static point cloud is represented, and M represents the number of data points of the static point cloud.
Step 4: Determining static characteristic points and dynamic characteristic points of the current two frames according to a first adjacent two-frame image frame in the image sequence, static point clouds of 3D point clouds of corresponding frames in the 3D point cloud sequence and point clouds corresponding to different vehicles, and further determining the initial pose transformation of the vehicle where the vehicle-mounted camera (and the vehicle-mounted laser radar) is located between the current adjacent frames and the initial relative transformation matrixes of different vehicles between the current adjacent frames according to the optical flow results corresponding to the current two frames;
the step 4 specifically comprises the following steps:
step 4.1: as shown in fig. 6, the pixel positions of the static point clouds of the current two frames of 3D point clouds on the corresponding image frames are used as static feature points of the current two frames of image frames, and the pixel positions of the point clouds corresponding to different vehicles in the current two frames of 3D point clouds on the corresponding image frames are used as dynamic feature points of the current two frames of image frames;
step 4.2: matching static feature points of the current two-frame image frame by using an optical flow result to obtain a current two-frame 2D optical flow static matching result, and matching dynamic feature points of the current two-frame image frame by using the optical flow result to obtain a current two-frame 2D optical flow dynamic matching result;
step 4.3: as shown in fig. 7, two current adjacent frames are recorded as a T-1 time and a T time (namely, a T-1 frame and a T frame), 2D coordinate matching of a 2D optical flow static matching result from the static feature point at the T-1 time to the T time is performed, a matching relation from the static feature point at the T-1 time to the 2D coordinate at the T time is obtained and is used as a 3D-2D static matching result, and a pose transformation T between the T-1 time and the T time of a vehicle where the vehicle-mounted camera is located is obtained by solving through PnP (peer-n-point) and RANdom SAmple Consensus (rannac, RANdom SAmple Consensus) based on the 3D-2D static matching result;
step 4.4: and checking whether the point clouds corresponding to different vehicles at the t-1 moment and the point clouds corresponding to different vehicles at the t moment form matching by using the dynamic matching result of the 2D optical flow at the t moment, if the point clouds form matching, then carrying out 2D coordinate matching on the dynamic matching result of the dynamic feature points at the t-1 moment to the 2D optical flow at the t moment to obtain the matching relation between the dynamic feature points at the t-1 moment and the 2D coordinates at the t moment and using the matching relation as a 3D-2D dynamic matching result, and otherwise, solving by using a PnP and random sampling consistency algorithm based on the 3D-2D dynamic matching result to obtain an initial relative transformation matrix H between the t-1 moment and the t moment of different vehicles. And (7) calculating to obtain the initial pose information of each vehicle through the formula in the step (7) according to the initial pose transformation of the vehicle where the vehicle-mounted camera is and the initial relative transformation matrix of different vehicles.
And 5: constructing and solving a graph optimization problem according to static characteristic points and dynamic characteristic points of current two frames of image frames, initial pose transformation of a vehicle where the vehicle is located between current adjacent frames and initial relative transformation matrixes of different vehicles between the current adjacent frames, and obtaining the optimized pose transformation of the vehicle where the vehicle is located between the current adjacent frames and the optimized relative transformation matrix of the different vehicles when the different vehicles are in the next frame of the current two frames;
the step 5 specifically comprises the following steps:
step 5.1: constructing a graph optimization problem according to static characteristic points and dynamic characteristic points of two current frame image frames, initial pose transformation of a vehicle where a vehicle-mounted camera is located between current adjacent frames and initial relative transformation matrixes of different vehicles between the current adjacent frames, wherein the graph optimization problem specifically comprises the following steps:
determining the pose of the vehicle where the vehicle-mounted camera is located at time t according to its pose transformation T between time t-1 and time t and its pose at time t-1, forming a time window of length k ending at the current time t, and constructing, in the current time window, the pose of the vehicle where the vehicle-mounted camera is located at each moment, the static feature points of each image frame, the dynamic feature points of each image frame and the initial relative transformation matrixes H of different vehicles in each image frame as the vertexes of the graph optimization;
in the current time window, constructing projection errors between the poses of the vehicle where the vehicle-mounted camera is located at each moment and the static feature points of the corresponding image frames into edges between corresponding vertexes; the pose transformation T of the vehicle with the vehicle-mounted camera between the poses of adjacent moments forms an edge between corresponding vertexes; constructing projection errors between the pose of the vehicle where the vehicle-mounted camera is located at each moment and the dynamic feature points at the corresponding moment into edges between corresponding vertexes; constructing the transformation error between the dynamic characteristic points of the adjacent image frames and the initial relative transformation matrix H of the corresponding vehicle into edges between corresponding vertexes;
and step 5.2: and solving the graph optimization problem constructed in the current time window to obtain the optimized pose transformation of the vehicle where the vehicle-mounted camera is positioned between the current adjacent frames and the optimized relative transformation matrix of different vehicles in the next frame of the current two frames.
Step 6: and calculating to obtain the pose information of different vehicles between the current adjacent frames according to the optimized pose transformation of the vehicle in which the vehicle-mounted camera is positioned between the current adjacent frames and the optimized relative transformation matrix of different vehicles between the current adjacent frames, wherein the left trajectory is the trajectory of the vehicle in which the vehicle-mounted camera is positioned, and the right trajectory is the trajectory of all vehicles in the image, as shown in fig. 9.
In step 6, in the image frame at the time t, the calculation formula of the pose information of each vehicle is as follows:
T_car = H^{-1} * T

H represents the optimized relative transformation matrix of each vehicle, T represents the optimized pose transformation of the vehicle where the vehicle-mounted camera is located at time t, and T_car represents the pose information of the current vehicle from time t-1 to time t.
And calculating the speed of the corresponding vehicle according to the pose information of different vehicles between the current adjacent frames, wherein the calculation formula of the speed of each vehicle is as follows:
v = t_car - (I - R_car) * c_{t-1}

wherein t_car represents the translation part of the pose information T_car of each vehicle, I denotes a 3 x 3 identity matrix, R_car represents the rotation part of the pose information T_car of the current vehicle, and c_{t-1} represents the centroid coordinates of the 3D point cloud of the current vehicle at time t-1.
Specifically, as shown in fig. 8, the numbers "57.11" and "52.50" in the figure represent the speeds of the corresponding vehicles in km/h, estimated from the calculated vehicle pose transformations.
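A minimal numerical sketch of the pose and velocity formulas above follows; the frame interval of 0.1 s (roughly the 10 Hz KITTI rate) used to convert the per-frame displacement into a speed in km/h is an assumption of the example, not a value stated in the patent.

```python
# Sketch of T_car = H^{-1} * T and v = t_car - (I - R_car) * c_{t-1}
# (assumed frame interval of 0.1 s for the km/h conversion).
import numpy as np

def vehicle_pose_and_speed(H, T, vehicle_cloud_prev, frame_dt=0.1):
    """H, T: 4x4 optimized transforms; vehicle_cloud_prev: (N, 3) vehicle points at time t-1."""
    T_car = np.linalg.inv(H) @ T                     # pose of the vehicle from t-1 to t
    R_car, t_car = T_car[:3, :3], T_car[:3, 3]       # rotation and translation parts
    c_prev = vehicle_cloud_prev.mean(axis=0)         # centroid c_{t-1} of the vehicle's points
    v = t_car - (np.eye(3) - R_car) @ c_prev         # per-frame displacement of the vehicle
    speed_kmh = np.linalg.norm(v) / frame_dt * 3.6
    return T_car, v, speed_kmh
```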
And 7: and repeating the step 4 to the step 6, sequentially calculating and optimizing the pose of the adjacent frames according to the residual image frames in the image sequence, the static point cloud of the corresponding frame 3D point cloud in the 3D point cloud sequence, the point clouds corresponding to different vehicles and the light stream result, and obtaining the pose transformation of the vehicle-mounted camera in each frame, the pose information of different vehicles and the corresponding speed.
It should also be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," can explicitly or implicitly include at least one of the feature, and in the description of the invention, "plurality" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example" or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described, it will be understood that the embodiments are illustrative and not restrictive, and that various changes, modifications, substitutions and alterations may be made herein without departing from the scope of the invention as defined by the appended claims.

Claims (8)

1. A dynamic SLAM method with multi-target tracking capability in an automatic driving scene is characterized by comprising the following steps:
step 1: inputting an image sequence acquired by a vehicle-mounted camera into a semantic segmentation network to obtain a semantic segmentation mask of each image frame in the image sequence, and simultaneously processing the image sequence acquired by the vehicle-mounted camera by an optical flow method to obtain an optical flow result of the current image sequence;
step 2: performing point cloud-image projection between an initial point cloud sequence acquired by a vehicle-mounted laser radar and an image sequence acquired by a vehicle-mounted camera to obtain a 3D point cloud sequence;
step 3: Performing point cloud semantic segmentation on each frame of 3D point cloud in the 3D point cloud sequence according to the semantic segmentation mask of each image frame in the image sequence and the 2D information in the 3D point cloud sequence to obtain vehicle point cloud and static point cloud of each frame of 3D point cloud, and clustering the vehicle point cloud of each frame of 3D point cloud to obtain point clouds corresponding to different vehicles in each frame of 3D point cloud;
step 4: Determining static characteristic points and dynamic characteristic points of two current frames according to a first adjacent two-frame image frame in the image sequence, static point clouds of 3D point clouds of corresponding frames in the 3D point cloud sequence and point clouds corresponding to different vehicles, and further determining initial pose transformation of a vehicle where a vehicle-mounted camera is located between the two current adjacent frames and initial relative transformation matrixes of the different vehicles between the two current adjacent frames according to optical flow results corresponding to the two current frames;
step 5: Constructing and solving a graph optimization problem according to static characteristic points and dynamic characteristic points of current two frame image frames, initial pose transformation of a vehicle in which a vehicle-mounted camera is positioned between current adjacent frames and initial relative transformation matrixes of different vehicles between current adjacent frames, and obtaining the optimized pose transformation of the vehicle in which the vehicle-mounted camera is positioned between the current adjacent frames and the optimized relative transformation matrix of different vehicles in the next frame of the current two frames;
step 6: Taking the optimized pose transformation of the vehicle in which the vehicle-mounted camera is positioned as the final pose transformation of the vehicle-mounted camera, calculating and obtaining pose information of different vehicles between current adjacent frames according to the optimized pose transformation of the vehicle in which the vehicle-mounted camera is positioned between the current adjacent frames and the optimized relative transformation matrix of different vehicles between the current adjacent frames, and calculating the speed of the corresponding vehicle according to the pose information of different vehicles between the current adjacent frames;
step 7: Repeating the step 4 to the step 6, sequentially calculating and optimizing the poses of adjacent frames according to the remaining image frames in the image sequence, the static point clouds of the 3D point clouds in the corresponding frames in the 3D point cloud sequence, the point clouds corresponding to different vehicles and the optical flow results, and obtaining the pose transformation of the vehicle-mounted camera in each frame, the pose information of different vehicles and the corresponding speed.
2. The dynamic SLAM method with multi-target tracking capability in an automatic driving scene as claimed in claim 1, wherein in step 1, the pixel belonging to the vehicle in the semantic segmentation mask of each image frame is marked as 1, the pixel containing the dynamic elements except the vehicle is marked as-1, and the rest pixels are marked as 0.
3. The dynamic SLAM method with multi-target tracking capability in the automatic driving scene as claimed in claim 1, wherein in step 2, the initial point cloud sequence in the radar coordinate system is converted to the camera coordinate system according to the external parameters from the vehicle laser radar to the vehicle camera, and then the initial point cloud sequence in the camera coordinate system is converted to the pixel coordinate system according to the internal parameters of the camera, thereby obtaining the 3D point cloud sequence.
4. The dynamic SLAM method with multi-target tracking capability in an automatic driving scene as claimed in claim 1, wherein in step 3, the vehicle point clouds of each frame of 3D point clouds are first constructed into corresponding K-D trees, and then Euclidean clustering segmentation is performed on each frame of K-D trees after the search radius is set, so as to obtain the point clouds corresponding to different vehicles in each frame of 3D point clouds.
5. The dynamic SLAM method with multi-target tracking capability in an automatic driving scenario as claimed in claim 1, wherein said step 4 specifically comprises:
step 4.1: taking the pixel positions of static point clouds of two current frames of 3D point clouds on corresponding image frames as static feature points of the two current frames of image frames, and taking the pixel positions of point clouds corresponding to different vehicles in the two current frames of 3D point clouds on corresponding image frames as dynamic feature points of the two current frames of image frames;
step 4.2: matching the static characteristic points of the current two-frame image frame by using the optical flow result to obtain a current two-frame 2D optical flow static matching result, and matching the dynamic characteristic points of the current two-frame image frame by using the optical flow result to obtain a current two-frame 2D optical flow dynamic matching result;
step 4.3: recording two current adjacent frames as T-1 time and T time, performing 2D coordinate matching of a 2D optical flow static matching result from the static feature point at the T-1 time to the T time to obtain a matching relation from the static feature point at the T-1 time to the 2D coordinate at the T time as a 3D-2D static matching result, and solving through a PnP and random sampling consistency algorithm based on the 3D-2D static matching result to obtain a pose transformation T between the T-1 time and the T time of a vehicle where a vehicle-mounted camera is located;
step 4.4: and checking whether the point clouds corresponding to different vehicles at the t-1 moment and the point clouds corresponding to different vehicles at the t moment form matching by using a 2D optical flow dynamic matching result at the t moment, if the point clouds form matching, then performing 2D coordinate matching on the 2D optical flow dynamic matching result from the dynamic feature points at the t-1 moment to the t moment to obtain a matching relation from the dynamic feature points at the t-1 moment to the 2D coordinate at the t moment and using the matching relation as a 3D-2D dynamic matching result, and solving through a PnP and random sampling consistency algorithm based on the 3D-2D dynamic matching result to obtain an initial relative transformation matrix H between the t-1 moment and the t moment of different vehicles.
6. The dynamic SLAM method with multi-target tracking capability in an automatic driving scenario as claimed in claim 1, wherein the step 5 specifically comprises:
step 5.1: constructing a graph optimization problem according to static feature points and dynamic feature points of current two frame image frames, initial pose transformation of a vehicle where a vehicle-mounted camera is located between current adjacent frames and initial relative transformation matrixes of different vehicles between current adjacent frames, wherein the graph optimization problem specifically comprises the following steps:
determining the pose of the vehicle where the vehicle-mounted camera is located at time t according to its pose transformation T between time t-1 and time t and its pose at time t-1, forming a time window of length k ending at the current time t, and constructing, in the current time window, the pose of the vehicle where the vehicle-mounted camera is located at each moment, the static feature points of each image frame, the dynamic feature points of each image frame and the initial relative transformation matrixes H of different vehicles in each image frame as the vertexes of the graph optimization;
in the current time window, constructing projection errors between the poses of the vehicle where the vehicle-mounted camera is located at each moment and the static feature points of the corresponding image frames into edges between corresponding vertexes; the pose transformation T of the vehicle with the vehicle-mounted camera between the poses of adjacent moments forms an edge between corresponding vertexes; constructing projection errors between the pose of the vehicle where the vehicle-mounted camera is located at each moment and the dynamic feature points at the corresponding moment into edges between corresponding vertexes; constructing the transformation error between the dynamic characteristic points of the adjacent image frames and the initial relative transformation matrix H of the corresponding vehicle into edges between corresponding vertexes;
and step 5.2: and solving the graph optimization problem constructed in the current time window to obtain the optimized pose transformation of the vehicle where the vehicle-mounted camera is positioned between the current adjacent frames and the optimized relative transformation matrix of different vehicles in the next frame of the current two frames.
7. The dynamic SLAM method with multi-target tracking capability in the automatic driving scene as claimed in claim 1, wherein in the step 6, the calculation formula of the pose information of each vehicle between the current adjacent frames is as follows:
T_car = H^{-1} * T

H represents the optimized relative transformation matrix of each vehicle, T represents the optimized pose transformation of the vehicle where the vehicle-mounted camera is located at time t, and T_car represents the pose information of the current vehicle from time t-1 to time t.
8. The dynamic SLAM method with multi-target tracking capability in the automatic driving scenario as claimed in claim 1, wherein in step 6, the velocity of each vehicle is calculated as follows:
v = t_car - (I - R_car) * c_{t-1}

wherein t_car represents the translation part of the pose information T_car of each vehicle, I denotes a 3 x 3 identity matrix, R_car represents the rotation part of the pose information T_car of the current vehicle, and c_{t-1} represents the centroid coordinates of the 3D point cloud of the current vehicle at time t-1.
CN202211123948.7A 2022-09-15 2022-09-15 Dynamic SLAM method with multi-target tracking capability in automatic driving scene Pending CN115482282A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211123948.7A CN115482282A (en) 2022-09-15 2022-09-15 Dynamic SLAM method with multi-target tracking capability in automatic driving scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211123948.7A CN115482282A (en) 2022-09-15 2022-09-15 Dynamic SLAM method with multi-target tracking capability in automatic driving scene

Publications (1)

Publication Number Publication Date
CN115482282A true CN115482282A (en) 2022-12-16

Family

ID=84424208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211123948.7A Pending CN115482282A (en) 2022-09-15 2022-09-15 Dynamic SLAM method with multi-target tracking capability in automatic driving scene

Country Status (1)

Country Link
CN (1) CN115482282A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117593650A (en) * 2024-01-18 2024-02-23 上海几何伙伴智能驾驶有限公司 Moving point filtering vision SLAM method based on 4D millimeter wave radar and SAM image segmentation
CN117593650B (en) * 2024-01-18 2024-04-26 上海几何伙伴智能驾驶有限公司 Moving point filtering vision SLAM method based on 4D millimeter wave radar and SAM image segmentation

Similar Documents

Publication Publication Date Title
CN110223348B (en) Robot scene self-adaptive pose estimation method based on RGB-D camera
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN111462135B (en) Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation
CN110349250B (en) RGBD camera-based three-dimensional reconstruction method for indoor dynamic scene
CN110569704B (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
CN110956651B (en) Terrain semantic perception method based on fusion of vision and vibrotactile sense
CN106595659A (en) Map merging method of unmanned aerial vehicle visual SLAM under city complex environment
CN110487286B (en) Robot pose judgment method based on point feature projection and laser point cloud fusion
CN108009494A (en) A kind of intersection wireless vehicle tracking based on unmanned plane
CN112734765A (en) Mobile robot positioning method, system and medium based on example segmentation and multi-sensor fusion
CN113506318A (en) Three-dimensional target perception method under vehicle-mounted edge scene
CN113516664A (en) Visual SLAM method based on semantic segmentation dynamic points
CN116449384A (en) Radar inertial tight coupling positioning mapping method based on solid-state laser radar
Tian et al. Research on multi-sensor fusion SLAM algorithm based on improved gmapping
CN115482282A (en) Dynamic SLAM method with multi-target tracking capability in automatic driving scene
CN112945233A (en) Global drift-free autonomous robot simultaneous positioning and map building method
CN117115271A (en) Binocular camera external parameter self-calibration method and system in unmanned aerial vehicle flight process
CN115661341A (en) Real-time dynamic semantic mapping method and system based on multi-sensor fusion
CN110782506B (en) Method for constructing grid map by fusing infrared camera and depth camera
Guo et al. 3D Lidar SLAM Based on Ground Segmentation and Scan Context Loop Detection
Lu et al. A geometric convolutional neural network for 3d object detection
CN114332187B (en) Monocular target ranging method and device
TWI798094B (en) Method and equipment for training depth estimation model and depth estimation
CN117576218B (en) Self-adaptive visual inertial navigation odometer output method
Xie et al. Semi-direct visual SLAM algorithm based on improved keyframe selection

Legal Events

PB01 - Publication
SE01 - Entry into force of request for substantive examination