CN116385493A - Multi-moving-object detection and track prediction method in field environment

Multi-moving-object detection and track prediction method in field environment

Info

Publication number
CN116385493A
CN116385493A
Authority
CN
China
Prior art keywords
track
point cloud
dimensional
tracking
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310381363.3A
Other languages
Chinese (zh)
Inventor
王煜
黄时星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Bote Intelligent Technology Anhui Co ltd
Original Assignee
Zhongke Bote Intelligent Technology Anhui Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Bote Intelligent Technology Anhui Co ltd filed Critical Zhongke Bote Intelligent Technology Anhui Co ltd
Priority to CN202310381363.3A priority Critical patent/CN116385493A/en
Publication of CN116385493A publication Critical patent/CN116385493A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Radar Systems Or Details Thereof (AREA)
  • Aiming, Guidance, Guns With A Light Source, Armor, Camouflage, And Targets (AREA)

Abstract

The invention discloses a multi-moving-object detection and track prediction method in a field environment, comprising the following steps: point cloud image fusion three-dimensional target detection, three-dimensional multi-target tracking, three-dimensional track prediction, and static map establishment with a dynamic SLAM algorithm. The beneficial effects of the invention are that the method can provide the future tracks of moving objects and establish a static map on a self-made field environment dataset; it verifies the high accuracy of the point cloud image fusion three-dimensional target detection algorithm, the stable tracking effect of the multi-target tracking algorithm, the precision of the three-dimensional track prediction, and the high preservation rate of static points and high removal rate of dynamic points in the established static map; and it can run in real time on the experimental equipment.

Description

Multi-moving-object detection and track prediction method in field environment
Technical Field
The invention relates to the technical field of track prediction methods, in particular to a multi-moving-object detection and track prediction method in a field environment.
Background
With the rapid development of robot technology, robots are widely applied in industrial manufacturing, military operations and civilian life. In an actual field environment, an operator can remotely control a ground mobile robot to complete tasks such as material transport. However, this approach demands considerable ground mobile robot control experience from the operator. If the ground mobile robot can automatically follow a target object, the control experience required of the operator is greatly reduced.
The Chinese patent application publication CN115601397A discloses a ship track tracking and prediction method based on a monocular camera, comprising: step 1, constructing a target ship image dataset; step 2, labeling and training the dataset; step 3, evaluating and analyzing the model training result; step 4, a target ship tracking experiment based on the YOLOv5 and DeepSORT algorithms; step 5, target ship visual positioning and data preprocessing; step 6, constructing an LSTM track prediction model; step 7, selecting and evaluating the time step of the prediction model; step 8, predicting and verifying the target ship track. That method uses the YOLOv5 algorithm to identify the target ship from images acquired by the camera; since there are few interfering objects on the sea surface, a ship can be identified from images alone. In a field environment, however, there are many interfering objects with varied surface colors and texture features, so image recognition alone can hardly meet the requirements of target identification and track prediction: the accuracy is poor and the target identification error rate is high.
Some papers have realized target detection and trajectory prediction methods for automatic driving in urban environments. For example, the article "Fast and Furious: real time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net", published in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, discloses a deep learning neural network designed to complete the tasks of target detection, tracking and track prediction. However, that method is designed for automatic driving datasets in urban environments, does not consider rugged terrain, and completes the task using only three-dimensional laser radar point clouds, so it cannot solve the problem of identifying a target object among interfering objects, and the tracking effect of its multi-target tracking algorithm is unstable.
The Chinese invention patent application publication CN114972911A discloses an output data collection and processing method for an automatic driving perception algorithm model, an electronic device, a storage medium and a vehicle. It addresses how to effectively collect a supplementary training set for a preset scene so as to improve the performance of the perception algorithm model: the target perception results output by the perception algorithm model are ordered in time to obtain a plurality of perception data frames, whether each perception data frame matches the preset scene is judged, and the data acquired by the vehicle-mounted sensors in the time windows of the matching perception frames are used as the supplementary training set to train the perception algorithm model, which can improve its performance more effectively. However, in the process of tracking a target object, moving objects can occlude one another, and a large number of dynamic points may remain in the point cloud map. Simply using the information from a point cloud image fusion three-dimensional target detection module to remove dynamic points is insufficient, because the target detection algorithm may miss a moving object in some data frame, so the corresponding dynamic points cannot be removed. Moreover, the residual from a feature point to its corresponding plane then becomes too large, making the estimation of the point cloud frame pose unreasonable.
Therefore, the field environment target tracking and prediction systems in the prior art suffer from poor target identification accuracy, unstable multi-target tracking and a low removal rate of dynamic points.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a multi-moving-object detection and track prediction method in a field environment that offers good target recognition accuracy, a stable multi-target tracking effect and a high dynamic point removal rate.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
the method for detecting and predicting the multiple moving targets in the field environment comprises the following steps:
s1, detecting a three-dimensional target by fusing point cloud images: detecting and obtaining point cloud data by using a three-dimensional detector, wherein the point cloud data obtain point cloud target detection data through a point cloud target detection algorithm; obtaining image data by using an image sensor, wherein the image data obtains image target detection data through an image target detection algorithm; matching the point cloud target detection data with the image target detection data by using a Hungary matching algorithm, and outputting matching result data;
s2, three-dimensional multi-target tracking: creating a corresponding tracking example according to the matching result data of the point cloud image fusion three-dimensional target detection, constructing a three-dimensional Kalman filter to predict the position of the moving object at the next moment, and using a Hungary matching algorithm to complete matching of the target detection result and the predicted value at the next moment so as to complete multi-target tracking;
s3, three-dimensional track prediction: the method comprises the steps of combining a deep learning track prediction algorithm and a Kalman filtering algorithm to infer a target motion track, wherein the deep learning track prediction algorithm is used for inferring a future track of a target based on a historical track of the target in the past for a period of time, calculating an error of the future track by using the Kalman filtering algorithm as a reference, and completing prediction of a three-dimensional track by using the Kalman filtering algorithm when the error of the future track is larger than a preset first threshold value;
s4, establishing a static map by a dynamic SLAM algorithm: taking the last point cloud point of the point cloud data as a reference, and obtaining the pose of the three-dimensional detector corresponding to each point cloud point by using IMU integral interpolation to finish the de-distortion of the point cloud; and similarly, obtaining the initial pose of the point cloud data, removing dynamic points in the point cloud data according to the dynamic object position information obtained by a sensing algorithm, performing grid voxel downsampling, and then constructing an incremental map by using ikd-Tree to finish the establishment of a static map.
By such arrangement: the problem that targets cannot be stably detected and tracked because of the many interfering objects in the field environment is solved, the stability of real-time operation of the system is effectively guaranteed, and the multi-target tracking effect is stable. The system can complete three-dimensional track prediction at different stages and under different conditions, and the predicted results of the deep-learning-based track prediction algorithm can be evaluated to obtain the optimal predicted track, so the accuracy of target motion track prediction is high. The position information of the moving objects of the previous frame is used to remove the dynamic objects from the current three-dimensional laser radar point cloud frame, achieving a high dynamic point removal rate. Parallel computation of the CIA-SSD network process, the YOLOv5 process and the SLAM process is realized, ensuring the real-time running stability of the whole system.
Preferably, in the step S1, the method further includes the following steps:
The three-dimensional detector obtains point cloud target detection data through a CIA-SSD network, and the image sensor obtains image target detection data through a YOLOv5 network. The homogeneous coordinates of the target object center point in the image sensor coordinate system are calculated as P_C = T^C_L · P_L, wherein T^C_L denotes the homogeneous transformation matrix of the three-dimensional detector coordinate system with respect to the image sensor coordinate system and P_L denotes the homogeneous coordinates of the target object center point in the three-dimensional detector coordinate system. A two-dimensional bounding box of the moving object is obtained with the YOLOv5 network, and the center point coordinate p_img of the two-dimensional bounding box is obtained; the Euclidean distance between p_L and p_img is then found to construct a cost matrix, and the Hungarian matching algorithm is used to match the point cloud target detection data with the image target detection data.
By such arrangement: the matching of CIA-SSD point cloud target detection results with YOLOv5 image target detection results is achieved.
Preferably, in the step S2, the method further includes the following steps:
The current position P_pre of the tracking instance is predicted with the Kalman filtering algorithm, and a reasonable observation value P_L of the tracking instance is obtained through the Hungarian matching algorithm.
By such arrangement: real-time updating of the tracking instance is realized, enabling the three-dimensional multi-target tracking function.
Preferably, in the step S2, the method further includes the following steps:
Let p_pre be the projection of the tracking instance position prediction on the image, and p_img the moving object center point obtained by the YOLOv5 image target detection network; if there is a redundant p_img that is not matched, go to step S2.1; if both p_pre and p_L, and p_img and p_L, are matched, go to step S2.2; if no observation data exist, go to step S2.3; if the categories of p_img and p_pre do not match, discard all observation data at that moment;
S2.1, create a tracking instance for the target; the tracking instance is in the initial state;
S2.2, take P_L as the observation value of the corresponding tracking instance; the tracking instance enters the tracking state after obtaining 3 frames of observation data;
S2.3, use the predicted value P_pre as the current position of the tracking instance; after the current position has been obtained this way 5 consecutive times without observation data, delete the tracking instance, which enters the extinction state.
By such arrangement: the state of the tracking instance can serve as the basis for deciding the next operation on it, which helps improve the system's detection, tracking and prediction of targets.
Preferably, in the step S3, the method further includes the following steps:
The deep learning track prediction algorithm adopts a memory-augmented neural network: a historical track encoder network is used to extract the historical track feature h_i from the historical track, a future track encoder network is used to extract the future track feature f_i from the future track, the h_i and f_i features are concatenated and input into a decoder network, and the future track of the moving object is recovered.
By such arrangement: the accuracy of the memory-enhanced neural network trajectory prediction can be ensured.
Preferably, in the step S2, the method further includes the following steps:
The acceleration, velocity and position of the tracking instance are estimated using a three-dimensional Kalman filter.
By such arrangement: further monitoring and analysis of the state of the tracking instance is facilitated.
Preferably, in the step S3, the method further includes the following steps:
The three-dimensional Kalman filter is used to estimate the acceleration, velocity and position of the tracking instance and to calculate the track point for a future period; taking this Kalman-filter track point as the reference, the track point errors over that period are calculated for the several future tracks, the future track with the smallest error is selected as the optimal track, and when the error of the optimal track is larger than the preset first threshold, the future track is invalidated.
By such arrangement: the accuracy of future tracks can be further ensured, and the advantage of higher accuracy of target motion track prediction is achieved.
Preferably, in the step S4, the method further includes the following steps:
IMU pre-integration is used to obtain the pose change between two adjacent frames of three-dimensional laser radar point cloud; the pose change is used as a constraint to construct a factor graph, and after the factor graph is optimized, the corresponding state quantities, such as the IMU zero-bias value, are obtained, completing the initialization of the IMU.
By such arrangement: the accuracy of the point cloud de-distortion performed by IMU-integration-based pose interpolation can be guaranteed.
Preferably, in the step S4, the method further includes the following steps:
After the incremental map is constructed with ikd-Tree, several map points nearest to each feature point in the point cloud data are found in the ikd-Tree incremental map; whether these map points form a plane is judged, and if they do, the residual from the feature point to the plane where the map points lie is solved.
By such arrangement: by solving the residual error, the pose estimation of the current three-dimensional detector can be completed, and the positioning accuracy of the whole system is ensured.
Preferably, in the step S4, the method further includes the following steps:
The value S is calculated according to formula (7) from the coordinates of the i-th feature point of the point cloud frame, as the residual from the feature point to the nearest plane of the incremental map; if S is larger than a preset second threshold, the feature point is removed.
By such arrangement: whether the feature point is a dynamic point or a noise point can be judged, achieving a higher dynamic point removal rate while preserving static points as much as possible.
Compared with the prior art, the invention has the beneficial technical effects that:
1. and the point cloud target detection data obtained by the three-dimensional laser radar and the image target detection data obtained by the camera are subjected to fusion matching through a Hungary matching algorithm, and matching result data can be output. Meanwhile, the parallel computation of the CIA-SSD point cloud target detection process and the YOLOv5 image target detection process is realized, the problem that targets cannot be stably detected and tracked due to more field environment interferents is solved, the stability of real-time operation of a system can be effectively ensured, and the stability of multi-target tracking is improved while higher detection accuracy is achieved.
2. The deep learning-based trajectory prediction algorithm requires a historical trajectory of the moving object over a period of time to complete the inference. If the number of the historical track points of the moving object cannot meet the requirement, the deep learning neural network cannot extract effective features. However, when the actual system is running, the tracking instance is created early, and the number of the historical track points cannot meet the requirement. At the moment, three-dimensional track prediction is completed by using a Kalman filtering algorithm, so that the three-dimensional track prediction can be completed by the system under different conditions at different stages. In addition, the Kalman filtering algorithm predicts the moving object with small track error in a short time, and can evaluate the predicted result of the track prediction algorithm based on the deep learning to obtain the optimal predicted track, thereby achieving the advantage of high accuracy of target motion track prediction.
3. The method is characterized in that the method is not complete by simply using a point cloud image fusion three-dimensional target detection module to obtain information to remove dynamic points, and the corresponding dynamic points cannot be removed because the target detection algorithm can generate the condition that a certain frame of data moving object is missed. In order to solve the problem, the invention designs a dynamic object information collection strategy, which is used for acquiring the target detection result of the previous frame of data on one hand and acquiring the corresponding position estimation value from the tracking example which is not matched with the observed value of the previous frame but is not deleted on the other hand. Since the relative speed between the ground mobile robot and the moving object is not large in the course of the ground mobile robot tracking the target object. Therefore, the position change of the moving object between the adjacent data frames is not great, the position information of the moving object of the previous frame can be used for removing the dynamic object of the current three-dimensional laser radar point cloud frame, and the advantage of higher removal rate of the dynamic point is achieved. The parallel computation of the CIA-SSD network process, the YOLOv5 process and the SLAM process is realized, and the real-time running stability of the whole system is ensured.
4. The invention can provide future tracks of moving objects and establish a static map on a homemade field environment data set, and verifies the high accuracy of a point cloud image fusion three-dimensional target detection algorithm, the stable tracking effect of a multi-target tracking algorithm, the precision of three-dimensional track prediction, the high preservation rate of static points of the established static map and the high removal rate of dynamic points. Can run in real time on experimental equipment. The system has important significance for solving the problem of tracking the target object by the ground mobile robot.
Drawings
FIG. 1 is a schematic diagram of a system flow of a method for detecting multiple moving targets and predicting trajectories in a field environment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a life cycle management process of a matching policy and tracking instance according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a trace-instance state transition flow in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of a historical track encoder network, a future track encoder network, and a decoder network in an embodiment of the present invention;
FIG. 5 is a flow chart of a dynamic SLAM algorithm in an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail with reference to the following examples, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent, but the scope of the present invention is not limited to the following specific examples.
Referring to fig. 1, the method for detecting multiple moving targets and predicting trajectories in a field environment is used for detecting objects around a ground mobile robot, predicting relative movement trajectories and establishing a static map, and comprises the following steps:
s1, detecting a three-dimensional target by fusing point cloud images: detecting and obtaining point cloud data by using a three-dimensional detector, wherein the point cloud data is original data obtained by the three-dimensional detector, and obtaining point cloud target detection data by a point cloud target detection algorithm; obtaining image data by using an image sensor, wherein the image data obtains image target detection data through an image target detection algorithm; and matching the point cloud target detection data with the image target detection data by using a Hungary matching algorithm, and outputting matching result data. The three-dimensional detector is a three-dimensional laser radar, and the image sensor is a camera.
The three-dimensional detector obtains point cloud target detection data through a CIA-SSD network, and the image sensor obtains point cloud target detection data through a YOLO network v 5, obtaining image target detection data by a network, and calculating to obtain the coordinates of the center point of the moving object under the coordinate system of the image sensor by the formula (1):
Figure BDA0004172346070000071
calculating homogeneous coordinates of the target object in the image coordinate system through a formula (2):
Figure BDA0004172346070000072
let p L = (u, v) represents the projection of the moving object center point onto the image inferred from the CIA-SSD network, where (x C ,y C ,z C 1) represents homogeneous coordinates of a center point of a target object in an image sensor coordinate system,
Figure BDA0004172346070000073
representing homogeneous transformation matrix of three-dimensional detector coordinate system under image sensor coordinate system, P L The method is characterized in that the method comprises the steps of representing homogeneous coordinates of a target object center point under a three-dimensional detector coordinate system, (u, v, 1) representing homogeneous coordinates of the target object center point under an image coordinate system, and K representing an internal reference matrix of the monocular camera. Obtaining a two-dimensional bounding box of a moving object by using a YOLOv5 network, and then obtaining a center point coordinate p of the two-dimensional bounding box img Then find p L And p is as follows img And constructing a cost matrix by using the Euclidean distance between the CIA and the SSD, and matching the CIA-SSD point cloud target detection result with the YOLOv5 image target detection result by using a Hungary matching algorithm to match the point cloud target detection data with the image target detection data.
S2, three-dimensional multi-target tracking: a corresponding tracking instance is created according to the matching result data of the point cloud image fusion three-dimensional target detection, a three-dimensional Kalman filter is constructed to predict the position of the moving object at the next moment, and the Hungarian matching algorithm is used to match the target detection result at the next moment with the predicted value, completing the multi-target tracking.
Since the moving object moves over rugged terrain, three-dimensional kinematics must be used to model it. Taking the position, velocity and acceleration of the tracking instance as the state quantity, the following state equation (3) is obtained from the kinematic relations:
P_{t+1} = P_t + v_t·Δt + (1/2)·a_t·Δt², v_{t+1} = v_t + a_t·Δt, a_{t+1} = a_t ③
wherein (P_t, v_t, a_t) denote the position, velocity and acceleration of the tracking instance, Δt denotes the time interval, and the process noise Q obeys a Gaussian distribution.
Since only the position of the target object can be obtained from the point cloud image fusion three-dimensional target detection, the observation equation (4) is established as:
P_w = P_t + R ④
wherein P_w denotes the position of the tracking instance in the map coordinate system, and the observation noise R obeys a Gaussian distribution.
Since a moving object appearing in the detection range of the sensor usually moves at an essentially constant velocity while the target object is being tracked, the invention takes the average velocity v_avg of moving objects as the initial velocity of a newly found tracking instance. After the tracking instance has obtained 3 observations and accumulated 3 frames of historical track points, a new velocity estimate v_ls is obtained by a least-squares algorithm. The difference Δv between v_ls and the velocity estimate of the Kalman filtering algorithm is calculated; if the modulus of Δv is greater than a preset third threshold, v_ls is used to update the tracking instance state, and iterative computation continues with the Kalman filtering algorithm. The acceleration, velocity and position of the tracking instance are estimated with the three-dimensional Kalman filter.
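A minimal sketch of such a three-dimensional constant-acceleration Kalman filter follows, assuming numpy; the noise magnitudes q_var and r_var are illustrative placeholders, not values from the patent.

```python
import numpy as np

class Track3DKalman:
    """Constant-acceleration Kalman filter over state [P, v, a] in 3D (9-dim)."""

    def __init__(self, p0, v0, dt, q_var=1.0, r_var=0.1):
        self.x = np.zeros(9)
        self.x[:3], self.x[3:6] = p0, v0           # initial position / velocity
        self.P = np.eye(9)                          # state covariance
        I3 = np.eye(3)
        # State transition per (3): P' = P + v*dt + 0.5*a*dt^2, v' = v + a*dt
        self.F = np.block([[I3, dt * I3, 0.5 * dt**2 * I3],
                           [np.zeros((3, 3)), I3, dt * I3],
                           [np.zeros((3, 6)), I3]])
        self.H = np.hstack([I3, np.zeros((3, 6))])  # per (4): position observed
        self.Q = q_var * np.eye(9)                  # process noise (Gaussian)
        self.R = r_var * np.eye(3)                  # observation noise (Gaussian)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]                           # predicted position P_pre

    def update(self, z):
        """z: observed 3D position P_L of the matched detection."""
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(9) - K @ self.H) @ self.P
```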
The current position P_pre of a tracking instance is predicted with the Kalman filtering algorithm, and a reasonable observation value P_L of the tracking instance is obtained through the Hungarian matching algorithm, realizing real-time updating of the tracking instance so that the three-dimensional multi-target tracking function can be realized.
The algorithmic framework for the matching strategy and tracking instance life cycle management is shown in FIG. 2, where p_pre is the projection of the tracking instance position prediction on the image and p_img is the moving object center point obtained by the YOLOv5 image target detection network. If there is a redundant p_img matched to nothing, condition 1 is met and the method goes to step S2.1; if both p_pre and p_L, and p_img and p_L, are matched, condition 2 is met and the method goes to step S2.2; if no observation data exist, condition 3 is met and the method goes to step S2.3; if the categories of p_img and p_pre do not match, all observation data at that moment are discarded. The life cycle management strategy of the tracking instance is as follows (a sketch of this state machine follows the list below), and the state transitions of the tracking instance are shown in FIG. 3, where the states mainly comprise an initial state, a tracking state and an extinction state:
S2.1, create a tracking instance for the target; the tracking instance is in the initial state;
S2.2, take P_L as the observation value of the corresponding tracking instance; the tracking instance enters the tracking state after obtaining 3 frames of observation data;
S2.3, use the predicted value P_pre as the current position of the tracking instance; after the current position has been obtained this way 5 consecutive times without observation data, delete the tracking instance, which enters the extinction state.
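A minimal sketch of this life cycle management follows, reusing the Track3DKalman sketch above; the confirm/delete counts of 3 observations and 5 misses come from the text, while the class and method names are illustrative assumptions.

```python
from enum import Enum

class TrackState(Enum):
    INITIAL = 1
    TRACKING = 2
    EXTINCT = 3

class TrackInstance:
    """INITIAL -> TRACKING after 3 observations; EXTINCT (deleted)
    after 5 consecutive prediction-only updates with no observation."""

    CONFIRM_HITS = 3   # frames of observation data to enter the tracking state
    MAX_MISSES = 5     # consecutive misses before deletion

    def __init__(self, kalman):
        self.kf = kalman
        self.state = TrackState.INITIAL
        self.hits = 0
        self.misses = 0

    def on_observation(self, P_L):
        """Condition 2: matched observation P_L updates the filter."""
        self.kf.predict()
        self.kf.update(P_L)
        self.hits += 1
        self.misses = 0
        if self.hits >= self.CONFIRM_HITS:
            self.state = TrackState.TRACKING

    def on_miss(self):
        """Condition 3: no observation; use the predicted value P_pre."""
        P_pre = self.kf.predict()
        self.misses += 1
        if self.misses >= self.MAX_MISSES:
            self.state = TrackState.EXTINCT   # to be deleted by the manager
        return P_pre
```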
S3, three-dimensional track prediction: a deep learning track prediction algorithm and a Kalman filtering algorithm are combined to infer the target motion track; the deep learning track prediction algorithm infers the future track of the target based on its historical track over a past period, the error of the future track is calculated with the Kalman filtering algorithm as reference, and when the error of the future track is larger than a preset first threshold, the future track is invalid and the Kalman filtering algorithm is used to complete the three-dimensional track prediction.
The deep learning track prediction algorithm adopts a memory-augmented neural network: a historical track encoder network extracts the historical track feature h_i from the historical track, a future track encoder network extracts the future track feature f_i from the future track, and the h_i and f_i features are concatenated and input into a decoder network, which recovers the future track of the moving object. The network structures of the encoder and decoder networks are shown in FIG. 4; the encoder network is divided into a historical track encoder network and a future track encoder network, which have the same structure but learn different data.
The controller of the memory-augmented neural network is composed of a fully connected neural network, and its training process uses the encoder networks and the decoder network. First, part of the training data is selected and passed through the historical track encoder network and the future track encoder network to obtain the historical track feature memory and the future track feature memory. Formal training then begins: the historical track feature encoder produces a historical track feature h, the 5 historical track feature vectors most similar to h in the historical track feature memory are found by cosine distance, and the 5 corresponding future track feature vectors are retrieved. h is concatenated with each of the 5 future track feature vectors, and 5 future tracks are obtained through the decoder network.
The error rate of the predicted track is calculated by formula (5):
E_rate = 1 − (1/N) · Σ_{t=1}^{N} I(||P̂_t^F − P_t^F|| ≤ Th_t) ⑤
wherein I(·) is an indicator function that is 1 when the error between the predicted three-dimensional track point P̂_t^F and the true value P_t^F is within a threshold Th_t, and 0 otherwise. The threshold Th_t grows as the prediction time increases, and is set as Th_1s = 0.5 m, Th_2s = 1.0 m, Th_3s = 1.5 m, Th_4s = 2.0 m.
The maximum error rate E_rate_max among the 5 predicted tracks is selected to construct the loss function L_c of the memory-augmented neural network controller, as shown in formula (6):
L_c = E_rate_max · (1 − P(w)) + (1 − E_rate_max) · P(w) ⑥
wherein P(w) is the probability of writing to memory output by the controller. When P(w) > 0.5, the corresponding historical track feature and future track feature are written into the memory. While the controller is trained, the historical track feature memory and the future track feature memory keep growing for use in network inference. During training of the memory-augmented neural network, noise is randomly added to the historical track points as a data augmentation means, improving the generalization ability of the network and avoiding overfitting.
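A minimal sketch of the error rate (5) and controller loss (6) follows, using PyTorch; the four thresholds come from the text, while the assumption that each track holds 4 points sampled at 1 s steps is illustrative only.

```python
import torch

def error_rate(pred, truth, thresholds):
    """Fraction of predicted track points whose error exceeds Th_t, per (5).

    pred, truth: (T, 3) predicted / ground-truth track points.
    thresholds: (T,) per-step thresholds Th_t.
    """
    err = torch.linalg.norm(pred - truth, dim=1)
    return 1.0 - (err <= thresholds).float().mean()

def controller_loss(pred_tracks, truth, p_write):
    """L_c = E_rate_max * (1 - P(w)) + (1 - E_rate_max) * P(w), per (6).

    pred_tracks: (5, T, 3) candidate future tracks from the decoder.
    p_write: scalar write probability P(w) output by the controller;
    only p_write carries gradient here, the error rates act as constants.
    """
    th = torch.tensor([0.5, 1.0, 1.5, 2.0])   # Th_1s..Th_4s from the text
    rates = torch.stack([error_rate(p, truth, th) for p in pred_tracks])
    e_max = rates.max()
    return e_max * (1.0 - p_write) + (1.0 - e_max) * p_write
```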
During inference, the 5 corresponding future track feature memories are still selected and the decoder then outputs 5 future tracks; these 5 tracks must be evaluated and the optimal future track selected as the final result. The invention uses the three-dimensional Kalman filter to estimate the acceleration, velocity and position of the tracking instance and to calculate the track point 1 second in the future; taking this Kalman-filter track point as the reference, the 1-second track point errors of the 5 future tracks are calculated, the future track with the smallest error is selected as the optimal track, and when the error of the optimal track is larger than the preset first threshold, the method switches to completing the three-dimensional track prediction with the Kalman filtering algorithm.
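A minimal sketch of this evaluation step follows; it assumes the first point of each candidate track is the 1-second point, which is an illustrative layout rather than something fixed by the patent.

```python
import numpy as np

def select_future_track(candidates, kf_point_1s, first_threshold):
    """Evaluate candidate tracks against the Kalman-filter 1 s prediction.

    candidates: (5, T, 3) future tracks from the memory-augmented network.
    kf_point_1s: (3,) Kalman-filter track point 1 s in the future.
    Returns (best_track_or_None, fall_back_to_kalman).
    """
    errs = np.linalg.norm(candidates[:, 0, :] - kf_point_1s, axis=1)
    best = int(np.argmin(errs))
    if errs[best] > first_threshold:
        # even the optimal deep-learning track disagrees with the filter:
        # invalidate it and complete the prediction with the Kalman filter
        return None, True
    return candidates[best], False
```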
S4, establishing a static map with the dynamic SLAM algorithm: taking the last point cloud point of the point cloud data as reference, the pose of the three-dimensional detector corresponding to each point cloud point is obtained by IMU integral interpolation to complete the de-distortion of the point cloud; the initial pose of the point cloud data is obtained in the same way, the dynamic points in the point cloud data are removed according to the dynamic object position information obtained by the sensing algorithm, grid voxel downsampling is performed, and an incremental map is then constructed with ikd-Tree to complete the establishment of the static map. The flow chart of the dynamic SLAM algorithm is shown in FIG. 5. IMU pre-integration is used to obtain the pose change between two adjacent frames of three-dimensional laser radar point cloud; this pose change is also used as a constraint to construct a factor graph, and after the factor graph is optimized, the corresponding state quantities, such as the IMU zero-bias value, are obtained, completing the initialization of the IMU. The points in a three-dimensional laser radar point cloud are not all scanned at the same moment: while the three-dimensional laser radar scans the surrounding environment, the motion of the ground mobile robot means that the laser radar coordinate systems corresponding to point cloud points scanned at different moments are not necessarily at the same pose, which distorts the three-dimensional laser radar point cloud. The invention uses a pose interpolation method based on IMU integration to remove this motion distortion: taking the last point cloud point of the three-dimensional laser radar point cloud frame as reference, the laser radar pose corresponding to each point cloud point is obtained by IMU integral interpolation, thereby completing the de-distortion of the point cloud.
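A minimal sketch of this per-point de-distortion follows; it assumes a pose_at(t) callable produced by the IMU integration pipeline, and the frame layout is an illustrative assumption.

```python
import numpy as np

def undistort_frame(points, timestamps, pose_at, t_ref):
    """Re-express every scanned point in the laser radar pose of the last point.

    points: (N, 3) raw laser radar points; timestamps: (N,) per-point scan times.
    pose_at(t): returns a (4, 4) world-from-lidar pose from IMU-integral
    interpolation (assumed provided by the IMU initialization/integration).
    t_ref: timestamp of the last point cloud point of the frame.
    """
    T_ref_inv = np.linalg.inv(pose_at(t_ref))
    undistorted = np.empty_like(points)
    for i, (p, t) in enumerate(zip(points, timestamps)):
        T = T_ref_inv @ pose_at(t)        # pose of point i in the reference frame
        undistorted[i] = (T @ np.append(p, 1.0))[:3]
    return undistorted
```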
After the three-dimensional laser radar point cloud is de-distorted, the initial pose of the point cloud frame is obtained by the IMU integral pose interpolation method. The dynamic points in the three-dimensional laser radar point cloud are then removed according to the dynamic object position information obtained by the sensing algorithm, and grid voxel downsampling is performed, completing the processing of the three-dimensional laser radar point cloud frame.
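A minimal sketch of this dynamic point removal follows; the axis-aligned boxes and the margin value are illustrative assumptions, not details fixed by the patent.

```python
import numpy as np

def remove_dynamic_points(points, dynamic_boxes, margin=0.3):
    """Drop laser radar points inside any dynamic-object bounding box.

    points: (N, 3) de-distorted points in the map frame.
    dynamic_boxes: list of (center (3,), size (3,)) pairs collected from the
    previous frame's detections and from unmatched-but-alive tracking
    instances, per the dynamic object information collection strategy.
    margin: extra box padding in metres (an assumed value).
    """
    keep = np.ones(len(points), dtype=bool)
    for center, size in dynamic_boxes:
        half = size / 2.0 + margin
        inside = np.all(np.abs(points - center) <= half, axis=1)
        keep &= ~inside                   # mark points inside the box
    return points[keep]
```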
After the incremental map is constructed with ikd-Tree, the 5 map points nearest to each feature point in the point cloud data are found in the ikd-Tree incremental map; whether these 5 map points form a plane is judged, and if they do, the residual from the feature point to the plane where the 5 map points lie is solved. If the dynamic points have not been removed, the residual from a feature point to its corresponding plane is typically relatively large, and it is not reasonable to use such feature points to estimate the point cloud frame pose. Accordingly, the corresponding value S is calculated with formula (7) from the coordinates of the i-th feature point of the point cloud frame, as the residual from the feature point to the nearest plane of the incremental map. If S is larger than the preset second threshold, that residual is considered unreasonable: the feature point is a dynamic point or a noise point and is removed, thereby realizing the removal of dynamic points.
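A minimal sketch of the plane judgment and residual test follows; a scipy cKDTree stands in for ikd-Tree, and the planarity tolerance and second-threshold values are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def is_dynamic_or_noise(feature_pt, map_points, tree,
                        second_threshold=0.2, plane_tol=0.05, k=5):
    """Plane judgment and residual test for one feature point.

    feature_pt: (3,) feature point of the point cloud frame.
    map_points: (M, 3) incremental map points; tree: cKDTree over them
    (standing in for ikd-Tree). Returns True if the residual S exceeds
    the second threshold (dynamic point or noise point: remove it).
    """
    _, idx = tree.query(feature_pt, k=k)
    pts = map_points[idx]
    centroid = pts.mean(axis=0)
    # least-squares plane normal: smallest singular vector of centred points
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    # plane judgment: all k map points must lie close to the fitted plane
    if np.max(np.abs((pts - centroid) @ normal)) > plane_tol:
        return False                       # not a plane: no residual test
    S = abs((feature_pt - centroid) @ normal)   # point-to-plane residual S
    return S > second_threshold
```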
This embodiment has the following advantages:
the point cloud target detection data obtained by the three-dimensional laser radar and the image target detection data obtained by the camera are subjected to fusion matching through a Hungary matching algorithm, and matching result data can be output, so that parallel calculation of a CIA-SSD point cloud target detection process and a YOLOv5 image target detection process is realized, the problem that targets cannot be stably detected and tracked due to more field environment interferents is solved, the stability of real-time operation of a system can be effectively ensured, and the stability of multi-target tracking is improved while higher detection accuracy is achieved.
The deep learning-based trajectory prediction algorithm requires a historical trajectory of the moving object over a period of time to complete the inference. If the number of the historical track points of the moving object cannot meet the requirement, the deep learning neural network cannot extract effective features. However, when the actual system is running, the tracking instance is created early, and the number of the historical track points cannot meet the requirement. At the moment, three-dimensional track prediction is completed by using a Kalman filtering algorithm, so that the three-dimensional track prediction can be completed by the system under different conditions at different stages. In addition, the Kalman filtering algorithm predicts the moving object with small track error in a short time, and can evaluate the predicted result of the track prediction algorithm based on the deep learning to obtain the optimal predicted track, thereby achieving the advantage of high accuracy of target motion track prediction.
The method is characterized in that the method is not complete by simply using a point cloud image fusion three-dimensional target detection module to obtain information to remove dynamic points, and the corresponding dynamic points cannot be removed because the target detection algorithm can generate the condition that a certain frame of data moving object is missed. In order to solve the problem, the invention designs a dynamic object information collection strategy, which is used for acquiring the target detection result of the previous frame of data on one hand and acquiring the corresponding position estimation value from the tracking example which is not matched with the observed value of the previous frame but is not deleted on the other hand. Since the relative speed between the ground mobile robot and the moving object is not large in the course of the ground mobile robot tracking the target object. Therefore, the position change of the moving object between the adjacent data frames is not great, the position information of the moving object of the previous frame can be used for removing the dynamic object of the current three-dimensional laser radar point cloud frame, and the advantage of higher removal rate of the dynamic point is achieved. The parallel computation of the CIA-SSD network process, the YOLOv5 process and the SLAM process is realized, and the real-time running stability of the whole system is ensured.
The invention can provide future tracks of moving objects and establish a static map on a homemade field environment data set, and verifies the high accuracy of a point cloud image fusion three-dimensional target detection algorithm, the stable tracking effect of a multi-target tracking algorithm, the precision of three-dimensional track prediction, the high preservation rate of static points of the established static map and the high removal rate of dynamic points. The test device can run in real time on test equipment (CPU is AMD Ryzen 93950X, GPU is NVIDIAGeForceRTX 3080). The system has important significance for solving the problem of tracking the target object by the ground mobile robot.
The corresponding matching values are obtained through the Hungarian matching algorithm, and the state of the tracking instance is further judged through the life cycle management strategy of the tracking instance, so that the state of the tracking instance can serve as the basis for deciding the next operation on it, which benefits the detection, tracking and prediction of targets by the system.
The accuracy of the memory-augmented neural network track prediction can be ensured by training a historical track encoder network for extracting the historical track features from the historical track, a future track encoder network for extracting the future track features from the future track, and a decoder network for decoding and recovering the future track features into the future track.
By initializing the IMU, the accuracy of completing the point cloud de-distortion in the pose interpolation based on IMU integration can be ensured.
The acceleration, the speed and the position of the tracking instance are estimated by using the three-dimensional Kalman filter, so that estimated data of the tracking instance can be further provided, and the state of the tracking instance can be further monitored and analyzed.
The future track is evaluated through the three-dimensional Kalman filter, so that the accuracy of the future track can be further ensured, and the advantage of higher accuracy of target motion track prediction is achieved.
By solving the residual, whether the residual value from a feature point in the point cloud data to the nearest plane of the incremental map is reasonable can be judged, and on that basis whether the feature point is a dynamic point or a noise point; this improves the dynamic point removal rate and ensures the real-time running stability of the whole system. The pose estimation of the current three-dimensional detector can also be completed, ensuring the positioning accuracy of the whole system.
If the dynamic points are not removed, the residual from a feature point to its corresponding plane is typically relatively large, and it is not reasonable to use such feature points to estimate the point cloud frame pose. Accordingly, the corresponding value is calculated with formula (7); if the value is larger than the second threshold, the residual from the feature point to the nearest plane is considered unreasonable, and the feature point is a dynamic point or a noise point that needs to be removed. A method that directly removes all point cloud points near the position of a dynamic object clearly removes some ground points, so holes and similar defects appear in the ground area. The invention solves this problem by using ikd-Tree to construct an incremental map: the holes are filled in by the next three-dimensional laser radar point cloud keyframe, so that static points are preserved as much as possible. This achieves a high dynamic point removal rate while preserving static points as far as possible.
Variations and modifications to the above would be obvious to persons skilled in the art to which the invention pertains from the foregoing description and teachings. Therefore, the invention is not limited to the specific embodiments disclosed and described above, but some modifications and changes of the invention should be also included in the scope of the claims of the invention. In addition, although specific terms are used in the present specification, these terms are for convenience of description only and do not constitute any limitation on the invention.

Claims (10)

1. A multi-moving-object detection and track prediction method in a field environment, characterized by comprising the following steps:
S1, point cloud image fusion three-dimensional target detection: point cloud data are obtained by detection with a three-dimensional detector, and point cloud target detection data are obtained from the point cloud data through a point cloud target detection algorithm; image data are obtained with an image sensor, and image target detection data are obtained from the image data through an image target detection algorithm; the point cloud target detection data and the image target detection data are matched using the Hungarian matching algorithm, and matching result data are output;
S2, three-dimensional multi-target tracking: a corresponding tracking instance is created according to the matching result data of the point cloud image fusion three-dimensional target detection, a three-dimensional Kalman filter is constructed to predict the position of the moving object at the next moment, and the Hungarian matching algorithm is used to match the target detection result at the next moment with the predicted value, completing the multi-target tracking;
S3, three-dimensional track prediction: a deep learning track prediction algorithm and a Kalman filtering algorithm are combined to infer the target motion track; the deep learning track prediction algorithm infers the future track of the target based on its historical track over a past period, the error of the future track is calculated with the Kalman filtering algorithm as reference, and when the error of the future track is larger than a preset first threshold, the prediction of the three-dimensional track is completed with the Kalman filtering algorithm;
S4, establishing a static map with a dynamic SLAM algorithm: taking the last point cloud point of the point cloud data as reference, the pose of the three-dimensional detector corresponding to each point cloud point is obtained by IMU integral interpolation to complete the de-distortion of the point cloud; the initial pose of the point cloud data is obtained in the same way, the dynamic points in the point cloud data are removed according to the dynamic object position information obtained by the sensing algorithm, grid voxel downsampling is performed, and an incremental map is then constructed with ikd-Tree to complete the establishment of the static map.
2. The multi-moving-object detection and track prediction method in a field environment according to claim 1, characterized in that, in the step S1, the method further comprises the following steps:
The three-dimensional detector obtains point cloud target detection data through a CIA-SSD network, and the image sensor obtains image target detection data through a YOLOv5 network. The homogeneous coordinates of the target object center point in the image sensor coordinate system are calculated as P_C = T^C_L · P_L, wherein T^C_L denotes the homogeneous transformation matrix of the three-dimensional detector coordinate system with respect to the image sensor coordinate system and P_L denotes the homogeneous coordinates of the target object center point in the three-dimensional detector coordinate system. A two-dimensional bounding box of the moving object is obtained with the YOLOv5 network, and the center point coordinate p_img of the two-dimensional bounding box is obtained; the Euclidean distance between p_L and p_img is then found to construct a cost matrix, and the Hungarian matching algorithm is used to match the point cloud target detection data with the image target detection data.
3. The multi-moving-object detection and track prediction method in a field environment according to claim 1, characterized in that, in the step S2, the method further comprises the following steps:
The current position P_pre of the tracking instance is predicted with the Kalman filtering algorithm, and a reasonable observation value P_L of the tracking instance is obtained through the Hungarian matching algorithm.
4. The multi-moving-object detection and track prediction method in a field environment according to claim 3, characterized in that, in the step S2, the method further comprises the following steps:
Let p_pre be the projection of the tracking instance position prediction on the image, and p_img the moving object center point obtained by the YOLOv5 image target detection network; if there is a redundant p_img that is not matched, go to step S2.1; if both p_pre and p_L, and p_img and p_L, are matched, go to step S2.2; if no observation data exist, go to step S2.3; if the categories of p_img and p_pre do not match, discard all observation data at that moment;
S2.1, create a tracking instance for the target; the tracking instance is in the initial state;
S2.2, take P_L as the observation value of the corresponding tracking instance; the tracking instance enters the tracking state after obtaining 3 frames of observation data;
S2.3, use the predicted value P_pre as the current position of the tracking instance; after the current position has been obtained this way 5 consecutive times without observation data, delete the tracking instance, which enters the extinction state.
5. The multi-moving-object detection and track prediction method in a field environment according to claim 1, characterized in that, in the step S3, the method further comprises the following steps:
The deep learning track prediction algorithm adopts a memory-augmented neural network: a historical track encoder network is used to extract the historical track feature h_i from the historical track, a future track encoder network is used to extract the future track feature f_i from the future track, the h_i and f_i features are concatenated and input into a decoder network, and the future track of the moving object is recovered.
6. The multi-moving-object detection and track prediction method in a field environment according to claim 5, characterized in that, in the step S2, the method further comprises the following steps:
The acceleration, velocity and position of the tracking instance are estimated using a three-dimensional Kalman filter.
7. The multi-moving-object detection and track prediction method in a field environment according to claim 6, characterized in that, in the step S3, the method further comprises the following steps:
The three-dimensional Kalman filter is used to estimate the acceleration, velocity and position of the tracking instance and to calculate the track point for a future period; taking this Kalman-filter track point as the reference, the track point errors over that period are calculated for the future tracks predicted by the memory-augmented neural network, the future track with the smallest error is selected as the optimal track, and when the error of the optimal track is larger than the preset first threshold, the future track is invalidated.
8. The multi-moving-object detection and track prediction method in a field environment according to claim 1, characterized in that, in the step S4, the method further comprises the following steps:
IMU pre-integration is used to obtain the pose change between two adjacent frames of three-dimensional laser radar point cloud; the pose change is used as a constraint to construct a factor graph, and after the factor graph is optimized, the corresponding state quantities, such as the IMU zero-bias value, are obtained, completing the initialization of the IMU.
9. The multi-moving-object detection and track prediction method in a field environment according to claim 1, characterized in that, in the step S4, the method further comprises the following steps:
After the incremental map is constructed with ikd-Tree, several map points nearest to each feature point in the point cloud data are found in the ikd-Tree incremental map; whether these map points form a plane is judged, and if they do, the residual from the feature point to the plane where the map points lie is solved.
10. The multi-moving-object detection and track prediction method in a field environment according to claim 9, characterized in that, in the step S4, the method further comprises the following steps:
The value S is calculated according to formula (7) from the coordinates of the i-th feature point of the point cloud frame, as the residual from the feature point to the nearest plane of the incremental map; if S is larger than a preset second threshold, the feature point is removed.
CN202310381363.3A 2023-04-11 2023-04-11 Multi-moving-object detection and track prediction method in field environment Pending CN116385493A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310381363.3A CN116385493A (en) 2023-04-11 2023-04-11 Multi-moving-object detection and track prediction method in field environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310381363.3A CN116385493A (en) 2023-04-11 2023-04-11 Multi-moving-object detection and track prediction method in field environment

Publications (1)

Publication Number Publication Date
CN116385493A true CN116385493A (en) 2023-07-04

Family

ID=86963157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310381363.3A Pending CN116385493A (en) 2023-04-11 2023-04-11 Multi-moving-object detection and track prediction method in field environment

Country Status (1)

Country Link
CN (1) CN116385493A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117475397A (en) * 2023-12-26 2024-01-30 安徽蔚来智驾科技有限公司 Target annotation data acquisition method, medium and device based on multi-mode sensor
CN117475397B (en) * 2023-12-26 2024-03-22 安徽蔚来智驾科技有限公司 Target annotation data acquisition method, medium and device based on multi-mode sensor
CN117934549A (en) * 2024-01-16 2024-04-26 重庆大学 3D multi-target tracking method based on probability distribution guiding data association


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination