WO2021072696A1 - Target detection and tracking method, system, movable platform, camera and medium - Google Patents

Target detection and tracking method, system, movable platform, camera and medium (目标检测与跟踪方法、系统、可移动平台、相机及介质)

Info

Publication number
WO2021072696A1
WO2021072696A1 · PCT/CN2019/111628 · CN2019111628W
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
frame
detection
information
Prior art date
Application number
PCT/CN2019/111628
Other languages
English (en)
French (fr)
Inventor
徐斌
陈晓智
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to CN201980033189.5A priority Critical patent/CN112154444B/zh
Priority to PCT/CN2019/111628 priority patent/WO2021072696A1/zh
Publication of WO2021072696A1 publication Critical patent/WO2021072696A1/zh


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/48 Matching video sequences

Definitions

  • This application relates to the field of data processing technology, and in particular to a target detection and tracking method, system, movable platform, camera and medium.
  • the mobile platforms can detect and track three-dimensional objects on the driving route.
  • the traditional target detection and target tracking are independent of each other, that is, the target detection scheme is only responsible for target detection, and the target tracking scheme is only responsible for target tracking.
  • embodiments of the present invention provide a target detection and tracking method, system, movable platform, camera, and medium.
  • a target detection and tracking method including:
  • obtaining to-be-detected data of adjacent frames; generating the target detection information and target prediction information of the corresponding adjacent frames according to the adjacent-frame to-be-detected data, wherein target prediction is performed on the next frame according to the target detection information of the previous frame in the adjacent frames to determine the target prediction information; and
  • performing target tracking according to the target detection information of the adjacent frames and the target prediction information.
  • the adjacent frame to-be-detected data is obtained based on adjacent frame acquisition data collected by the detection device.
  • the adjacent-frame to-be-detected data includes to-be-detected data at at least two acquisition times, and the target detection information of the adjacent frames includes the first target detection information of the previous frame and the second target detection information of the next frame.
  • the target prediction information is determined based on the first target detection information and the amount of change in the target position between adjacent frames.
  • the process of determining the target position change between adjacent frames includes:
  • predicting the target position change between adjacent frames based on the feature data extracted from the adjacent-frame to-be-detected data, where each frame of to-be-detected data in the adjacent-frame to-be-detected data is obtained by same-type data fusion processing and preprocessing of multiple frames of acquisition data collected by the same detection device at adjacent acquisition times; or,
  • performing feature fusion on the feature data respectively extracted from the adjacent-frame to-be-detected data, and predicting the target position change between adjacent frames based on the fused feature data, where each frame of to-be-detected data in the adjacent-frame to-be-detected data is obtained by preprocessing the acquisition data collected by the same detection device at one acquisition time, or by same-type data fusion processing and preprocessing of multiple frames of acquisition data collected by the same detection device at adjacent acquisition times.
  • each frame of the adjacent-frame acquisition data includes the distance information between the detection device and the target; in the adjacent-frame acquisition data, the next frame is used as the reference data, and the other frames are used as the to-be-calibrated data;
  • the process of same-type data fusion processing includes: correcting the distance information in the to-be-calibrated data according to the movement displacement of the detection device, and fusing the acquisition data of the other frames containing the corrected distance information with the acquisition data of the next frame.
  • the number of frames of data subjected to similar data fusion processing has a positive correlation with the distance between the detection device and the target.
  • the feature fusion of feature data respectively extracted from the to-be-detected data of adjacent frames includes:
  • performing a specified operation on the values of corresponding elements in the feature data extracted from the to-be-detected data of adjacent frames; or splicing the corresponding elements in the feature data extracted from the to-be-detected data of adjacent frames along a specified dimension.
  • the process of determining the amount of change in the target position between adjacent frames includes: obtaining the target position change between adjacent frames according to the target speed determined from the to-be-detected data of the previous frame or several previous frames, and the time difference between the previous frame and the next frame.
  • the detection device includes detection devices that collect different types of collected data.
  • Different types of collected data collected based on different types of detection devices are preprocessed respectively, and the preprocessed data is subjected to multi-source data fusion processing.
  • the target detection information and target prediction information of the adjacent frames are obtained based on the feature data extracted from the to-be-detected data of the adjacent frames, and the process of extracting the feature data includes one or more of the following:
  • performing multi-source data fusion processing on the feature data output by a designated network layer corresponding to different types of data during feature extraction; performing multi-source data fusion processing on the feature data output by the last network layer corresponding to different types of data.
  • the process of the multi-source data fusion processing includes: splicing corresponding elements of different types of data.
  • the detection device that collects the same type of data is equipped with a main detection device and a backup detection device, and when the main detection device fails, the backup detection device is used to replace the failed main detection device for data collection.
  • the detection device includes one or more of the following: an image acquisition device, a lidar detection device, and a millimeter wave radar detection device.
  • the target detection information of the adjacent frame includes the first target detection information of the previous frame and the second target detection information of the next frame.
  • the performing target tracking based on the target detection information of the adjacent frames and the target prediction information includes:
  • comparing the target prediction information with the second target detection information; if it is determined from the comparison result that they are the same target, the target in the second target detection information is given the same identifier as the target in the first target detection information.
  • the method further includes:
  • if it is determined from the comparison result that they are not the same target, a new identifier is assigned to the target in the second target detection information.
  • a target detection and tracking system including:
  • a memory and a processor; the memory is connected to the processor through a communication bus and is used to store computer instructions executable by the processor; the processor is used to read the computer instructions from the memory to implement any of the target detection and tracking methods described above.
  • a movable platform including:
  • a body; a power system installed in the body to provide power for the movable platform; and the target detection and tracking system as described above.
  • the movable platform includes an unmanned vehicle, an unmanned aerial vehicle, or an unmanned ship.
  • a detection device including:
  • a housing; and a detector arranged in the housing and used to collect data.
  • a computer-readable storage medium having several computer instructions stored on the readable storage medium, and when the computer instructions are executed, the steps of any one of the methods described above are implemented.
  • the embodiments of the application obtain adjacent-frame to-be-detected data and generate the corresponding target detection information and target prediction information of the adjacent frames according to the adjacent-frame to-be-detected data, where the target prediction information is obtained by performing target prediction on the next frame according to the target detection information of the previous frame in the adjacent frames.
  • for this reason, target tracking can be performed based on the target detection information of the adjacent frames and the target prediction information, so that the target tracking problem and the target detection problem are integrated into one framework and solved at the same time, thereby reducing repeated calculation and avoiding waste of resources.
  • Fig. 1 is an application scenario diagram of target detection and tracking according to an exemplary embodiment of the present application.
  • Fig. 2 is a schematic flowchart of a target detection and tracking method according to an exemplary embodiment of the present application.
  • Fig. 3 is a schematic diagram showing a relative movement of a vehicle according to an exemplary embodiment of the present application.
  • Fig. 4 is a schematic diagram showing target detection information and target prediction information according to an exemplary embodiment of the present application.
  • Fig. 5 is a schematic diagram showing the framework of target detection and tracking according to an exemplary embodiment of the present application.
  • Fig. 6 is a schematic diagram showing another target detection and tracking framework according to an exemplary embodiment of the present application.
  • Fig. 7 is a schematic diagram of multi-source data fusion at various stages according to an exemplary embodiment of the present application.
  • Fig. 8 is a schematic structural diagram of a target detection and tracking system according to an exemplary embodiment of the present application.
  • Fig. 9 is a schematic structural diagram of a movable platform according to an exemplary embodiment of the present application.
  • Movable devices equipped with detection devices have been widely used in ADAS (Advanced Driving Assistance System), autonomous driving, unmanned ground vehicles, robots, drones, and other products to detect obstacles and realize obstacle avoidance and subsequent path planning functions.
  • target tracking plays an important role, and the accuracy of target tracking algorithms will directly affect the performance and reliability of the system.
  • Target tracking algorithms mainly provide reliable observations for target state estimation (such as estimation and prediction of target position, speed, and angular velocity, trajectory estimation and prediction, and behavior estimation and prediction), and target state estimation provides important information for autonomous driving path planning and control, which directly affects the safety of autonomous driving.
  • the existing target detection and target tracking are separated, that is, the target detection scheme is only responsible for target detection, and the target tracking scheme is only responsible for target tracking.
  • the embodiments of the present application provide a target detection and tracking solution, which integrates the target tracking problem and the target detection problem into one framework to solve the problem, and simultaneously solves the target detection and target tracking problems, thereby reducing repeated calculations and avoiding resource waste.
  • the target detection and tracking method provided in this embodiment can be implemented by software, or by a combination of software and hardware or hardware execution.
  • the hardware involved can be composed of two or more physical entities, or of a single physical entity.
  • the method of this embodiment can be applied to a movable platform equipped with a detection device.
  • the movable platform may be unmanned vehicles, unmanned aerial vehicles, robots, unmanned ships, etc., and the method in this embodiment may also be applied to products such as ADAS.
  • Detection devices include, but are not limited to, image acquisition devices (such as monocular cameras, binocular cameras), lidar detection devices, millimeter wave radar detection devices, and the like.
  • lidar can detect the position and speed of an object in an environment by emitting a laser beam to obtain a laser point cloud.
  • Lidar can transmit detection signals to the environment containing the target, receive the signals reflected from the target, and obtain the laser point cloud according to the transmitted detection signal, the received reflected signal, and parameters such as the interval between transmission and reception.
  • the laser point cloud may include N points, and each point may include parameter values such as x, y, z coordinates and intensity (reflectivity).
  • FIG. 1 is an application scenario diagram of target detection and tracking according to an exemplary embodiment of the present application.
  • car A can be equipped with a target detection and tracking system and one or more detection devices.
  • the detection device is arranged at the designated position of the car to detect targets in the surrounding environment.
  • Car B or pedestrian in Figure 1 can be used as the target to be detected when car A is traveling.
  • the detection device can input the collected adjacent frame acquisition data into the target detection and tracking system, and the target detection and tracking system predicts the target detection result and the target tracking result.
  • the target detection result can generally include the three-dimensional position, size, orientation, category, etc. of the target.
  • the target detection result can have multiple representation forms. Here is one representation form as an example.
  • the detected target position, size, and orientation can be expressed as the three-dimensional bounding box of the object [x0, x1, x2, x3, y0, y1, y2, y3, zmin, zmax] (denoted box), together with the target category class and its corresponding score, where (x0, y0), (x1, y1), (x2, y2), (x3, y3) are the four vertices of the three-dimensional bounding box in the top view, and zmin, zmax are the minimum and maximum z coordinates of the bounding box.
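As a minimal sketch of how this representation could be held in code (the field names and the example values are illustrative assumptions, not part of the application):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Detection3D:
    """One detected target, following the box representation described above.

    corners_xy: the four top-view vertices (x0, y0) ... (x3, y3) of the 3D bounding box.
    zmin, zmax: minimum and maximum z coordinates of the bounding box.
    """
    corners_xy: Tuple[Tuple[float, float], ...]  # ((x0, y0), (x1, y1), (x2, y2), (x3, y3))
    zmin: float
    zmax: float
    category: str        # target class, e.g. "car" or "pedestrian"
    score: float         # confidence score of the class
    track_id: int = -1   # tracking identifier; -1 means not yet associated

# Example: a detected car whose top-view footprint is a 4 m x 2 m rectangle.
det = Detection3D(
    corners_xy=((10.0, 2.0), (14.0, 2.0), (14.0, 4.0), (10.0, 4.0)),
    zmin=0.0, zmax=1.6, category="car", score=0.91,
)
```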
  • the target tracking result can be that the same target is given the same identification. Because the features of target detection and target tracking can be shared, resources are saved.
  • FIG. 2 is a schematic flowchart of a target detection and tracking method according to an exemplary embodiment of the present application.
  • the method may include the following steps 202 to 206:
  • step 202: obtain the to-be-detected data of adjacent frames;
  • step 204: generate the target detection information and target prediction information of the corresponding adjacent frames according to the adjacent-frame to-be-detected data, wherein target prediction is performed on the next frame according to the target detection information of the previous frame in the adjacent frames to determine the target prediction information;
  • step 206: perform target tracking according to the target detection information of the adjacent frames and the target prediction information.
  • the adjacent frame to-be-detected data may be obtained based on the adjacent frame acquisition data collected by the detection device.
  • the detection device may be a device used to detect a target, and the detection device may include, but is not limited to, an image acquisition device, a lidar detection device, a millimeter wave radar detection device, and the like.
  • the detection device may include a main detection device and a backup detection device. When the main detection device takes effect, that is, when it is in a non-failure state, the main detection device can perform data collection alone, or can be combined with a backup detection device for data collection. When the main detection device fails, the backup detection device is used to replace the failed main detection device for data collection.
  • the detection devices corresponding to the data that can be used for target detection and tracking can be applied in this application, and will not be listed here.
  • the multiple detection devices may be multiple detection devices of the same type.
  • the same type of detection device is a detection device that collects the same type of data.
  • multiple detection devices of the same type may include multiple lidar detection devices, multiple image acquisition devices, or multiple millimeter wave radar detection devices.
  • when there are multiple detection devices of the same type, only one of them may be used as the main detection device, and the main detection device may also be referred to as the working detection device.
  • the data collected by the main detection device is used as the collected data of similar detection devices, and other remaining detection devices can be used as backup detection devices.
  • the backup detection device is used to replace the failed main detection device for data collection, so that one of the backup detection devices serves as the new main detection device and continues collecting data, which avoids detection failure or inaccurate detection caused by the failure of a detection device.
  • the detection devices of the same type can all work and use the collected data as input data.
  • the extracted features are used for both target detection and target tracking.
  • the embodiments of the present application may use the adjacent-frame acquisition data collected by the detection device as input, and use the timing information reflected by the adjacent-frame acquisition data to assist target detection and tracking.
  • the adjacent frame to-be-detected data it may be multiple frames of to-be-detected data obtained based on the adjacent frame acquisition data collected by the detection device.
  • the adjacent frame acquisition data may be multiple frames of data acquired by the detection device at adjacent acquisition times.
  • the data collected by the detection device often cannot be used directly. For this reason, the collected data can be preprocessed into structured data that can be processed by the framework of this application.
  • the collected data may include at least one of the following: laser point cloud data, image data, millimeter wave data, etc., and the collected data may be preprocessed.
  • the following takes a laser point cloud as an example of preprocessing.
  • the laser point cloud is disordered data and the number of laser points in each frame of data is not fixed.
  • Ordered data can be obtained after preprocessing, which is also called structured data.
  • the processed structured data can be used in neural networks (such as convolutional neural networks) for point cloud feature extraction.
  • the n*4 vector is processed into the data required by CNN (Convolutional Neural Networks), and the disordered laser point cloud is converted into an ordered three-dimensional image.
  • the preprocessing may include, but is not limited to: voxelization processing, three-dimensional projection to two-dimensional plane processing, and gridding processing of the point cloud by height.
  • the geometric representation of an object is converted into the voxel representation closest to the object to generate a volume data set. It not only contains the surface information of the model, but also describes the internal properties of the model.
  • the spatial voxel representing the model is similar to the two-dimensional pixel representing the image, except that it extends from a two-dimensional point to a three-dimensional cube unit, and the three-dimensional model based on the voxel has many applications.
  • the three-dimensional space in front of the lidar is divided into multiple voxels (each voxel can be understood as a small cube with a preset length, width, and height); then, determine whether there is a laser spot in each voxel, and if it exists, then The voxel is assigned a value of 1; if it does not exist, the voxel is assigned a value of 0. It is understandable that when there are multiple laser points in a voxel, the assignment of the voxel can be the number of laser points.
  • for image data, the average grayscale value is subtracted from the grayscale value of each pixel, and the result is divided by the variance, so as to preprocess the image data.
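A minimal sketch of the two preprocessing steps described above, assuming an n*4 point cloud (x, y, z, intensity) and a grayscale image; the spatial range and voxel size are illustrative values, not ones specified by the application:

```python
import numpy as np

def voxelize(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0),
             z_range=(-3.0, 1.0), voxel=(0.2, 0.2, 0.2)):
    """Convert an unordered (n, 4) laser point cloud into an ordered 3D grid.

    Each cell stores the number of laser points falling inside it (0 when empty),
    following the voxel assignment rule described in the text.
    """
    nx = int((x_range[1] - x_range[0]) / voxel[0])
    ny = int((y_range[1] - y_range[0]) / voxel[1])
    nz = int((z_range[1] - z_range[0]) / voxel[2])
    grid = np.zeros((nx, ny, nz), dtype=np.float32)

    xyz = points[:, :3]
    mask = ((xyz[:, 0] >= x_range[0]) & (xyz[:, 0] < x_range[1]) &
            (xyz[:, 1] >= y_range[0]) & (xyz[:, 1] < y_range[1]) &
            (xyz[:, 2] >= z_range[0]) & (xyz[:, 2] < z_range[1]))
    idx = ((xyz[mask] - np.array([x_range[0], y_range[0], z_range[0]])) /
           np.array(voxel)).astype(np.int32)
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)  # count points per voxel
    return grid

def normalize_image(gray):
    """Subtract the mean grayscale value and divide by the variance, as described above."""
    gray = gray.astype(np.float32)
    return (gray - gray.mean()) / (gray.var() + 1e-6)
```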
  • the to-be-detected data of adjacent frames may include at least two acquisition-time-to-be-detected data.
  • the two acquisition times may be adjacent acquisition times.
  • for example, the adjacent-frame to-be-detected data include the T-th frame to-be-detected data obtained based on the T-th frame acquisition data and the (T+1)-th frame to-be-detected data obtained based on the (T+1)-th frame acquisition data.
  • alternatively, the adjacent-frame to-be-detected data include the T-th frame to-be-detected data obtained based on the T-th frame acquisition data and the (T+2)-th frame to-be-detected data obtained based on the (T+2)-th frame acquisition data.
  • in this case, the T-th frame to-be-detected data can be regarded as the previous frame of to-be-detected data among the adjacent frames, and the (T+2)-th frame to-be-detected data can be regarded as the next frame of to-be-detected data among the adjacent frames.
  • each frame of the adjacent-frame to-be-detected data may be preprocessed data of the acquired data. Specifically, each frame of data to be detected in adjacent frames of to-be-detected data is obtained based on: the acquisition data collected by the same detection device at one acquisition time is obtained by preprocessing. In this embodiment, the data collected by the detection device is directly preprocessed to obtain the data to be detected.
  • time sequence fusion processing is performed on multi-frame data in the data preprocessing stage.
  • Time sequence fusion can be the fusion of data corresponding to different sampling times. Specifically, each frame of to-be-detected data in the adjacent frames of to-be-detected data is obtained based on: multiple frames of acquisition data collected by the same detection device at adjacent acquisition times are obtained by fusion processing and preprocessing of similar data.
  • the fusion of similar data can be processed before or after preprocessing.
  • the fusion can be performed before voxelization or after voxelization.
  • multiple frames of acquisition data collected at different acquisition times are merged, which can provide more basis for subsequent determination of target prediction information.
  • the same kind of data fusion processing in the data preprocessing stage can also be called the time series fusion at the data level. For example, contiguous multiple frames such as the T-th frame and the T+1-th frame are merged with the same type of data.
  • the fused data can be input into a single neural network to predict the result.
  • for example, the lidar detection device acquires TM laser points in the T-th frame and TN laser points in the (T+1)-th frame. Since the laser point cloud itself is unordered, the laser points of the two frames can be directly concatenated, that is, the (TM+TN) laser points are used together for prediction.
  • the detection device is configured on a moving carrier, and the fusion data may be inaccurate due to its own motion.
  • take vehicle-mounted lidar as an example: considering that the vehicle where the lidar is located moves, the self-movement of the vehicle can be taken into account when performing time-series fusion.
  • FIG. 3 it is a schematic diagram showing the relative movement of a vehicle according to an exemplary embodiment of the present application.
  • Vehicle A is the own vehicle equipped with lidar
  • vehicle B is a distant vehicle.
  • in the T-th frame, the lidar collects laser point 1 at a distance of 50 meters; in the (T+1)-th frame, laser point 1 is collected again, and since the vehicle has driven 5 meters forward, the distance to laser point 1 is now 45 meters.
  • the stationary vehicle B has the same three-dimensional position in the physical world at different times, but due to the movement of the lidar, the lidar has collected different laser point cloud data.
  • each frame of the adjacent-frame acquisition data includes the distance information between the detection device and the target; in the adjacent-frame acquisition data, since it is usually the target in the next frame that is tracked in practical applications, the next frame is used as the reference data, and the other frames are used as the to-be-calibrated data.
  • the process of fusion processing of the same kind of data includes:
  • the collected data of other frames containing the corrected distance information is fused with the collected data of the next frame for similar data fusion processing.
  • the movement displacement of the detection device is used to correct the distance information between the detection device and the target, thereby removing the influence of the detection device's own movement on the distance and improving the accuracy of the fused data.
  • take the laser point cloud as an example: due to its physical characteristics, there are far more scanning points on near objects than on distant objects; that is, the farther an object is from the lidar, the sparser its laser point cloud. For this reason, the number of frames fused by same-type data fusion processing can be positively correlated with the distance between the detection device and the target, realizing distance-dependent point cloud fusion. For example, fewer frames of laser point cloud are fused for nearby targets and more frames are fused for distant targets, so that the laser point clouds of near and far objects are more balanced.
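A minimal sketch of this data-level temporal fusion with ego-motion correction, assuming each frame is an (n, 4) point cloud in its own sensor frame and that the ego-motion is available as a 4x4 transform into the coordinate frame of the latest (reference) frame; the distance thresholds used to choose how many frames to fuse are illustrative assumptions:

```python
import numpy as np

def frames_to_fuse(target_range_m):
    """Illustrative rule: fuse more past frames for distant (sparser) targets."""
    if target_range_m < 20.0:
        return 2
    if target_range_m < 50.0:
        return 3
    return 5

def fuse_point_clouds(clouds, transforms_to_ref):
    """Fuse several frames of laser points into the latest frame's coordinates.

    clouds: list of (n_i, 4) arrays ordered oldest -> newest; the last one is the reference frame.
    transforms_to_ref: list of 4x4 matrices mapping each cloud into the reference frame
                       (identity for the reference frame itself), i.e. the ego-motion correction.
    """
    fused = []
    for pts, T in zip(clouds, transforms_to_ref):
        xyz1 = np.hstack([pts[:, :3], np.ones((pts.shape[0], 1))])   # homogeneous coordinates
        xyz_ref = (xyz1 @ T.T)[:, :3]                                # corrected positions
        fused.append(np.hstack([xyz_ref, pts[:, 3:4]]))              # keep intensity unchanged
    return np.vstack(fused)   # simply concatenate the (TM + TN + ...) corrected points
```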
  • the adjacent frame-to-be-detected data can be used to generate corresponding target detection information and target prediction information of the adjacent frame.
  • the corresponding target detection information can be predicted.
  • the next frame can be used as the current frame to determine whether the target in the current frame and the target in the previous frame are the same object, thereby realizing tracking of the target.
  • target prediction information can also be determined.
  • the target prediction information may be detection information that predicts the feature region corresponding to the target in the next frame when the target exists in the previous frame.
  • the amount of change in the target position between adjacent frames at least includes: the amount of change in the target position between the previous frame and the next frame.
  • the target prediction information may be determined based on the first target detection information and the amount of change in the target position between adjacent frames. It can be seen that this embodiment predicts the target detection information by determining the amount of change in the target position between neighbors, which is easy to implement.
  • time sequence fusion can be performed at different stages, so as to use the fused data to predict the variation of the target position between adjacent frames.
  • the same kind of data fusion processing can be performed in the data preprocessing stage as described above.
  • the feature data extracted from the adjacent-frame to-be-detected data can be fused in the feature extraction stage; this process can be called temporal fusion at the feature level.
  • in one embodiment, the target position change between adjacent frames is predicted based on the feature data extracted from the adjacent-frame to-be-detected data, where each frame of to-be-detected data in the adjacent-frame to-be-detected data is obtained by same-type data fusion processing and preprocessing of the multiple frames of acquisition data collected by the same detection device at adjacent acquisition times.
  • each frame of to-be-detected data actually incorporates multi-frame acquisition data, so the multi-frame acquisition data can be used to predict the detection result of the feature region corresponding to the target in the next frame, and obtain the target position variation between adjacent frames.
  • for example, the adjacent-frame to-be-detected data include the T-th frame to-be-detected data (which may fuse the acquisition data at times T and T-1) and the (T+1)-th frame to-be-detected data (which may fuse the acquisition data at times T+1 and T).
  • the feature data extracted from the to-be-detected data of adjacent frames predicts the target position change between adjacent frames, which may include the target position change between the T-th frame to-be-detected data and the T+1-th frame to-be-detected data.
  • in this way, the T-th frame data are used not only to detect the target in the T-th frame but also to obtain the position of the target in the (T+1)-th frame, that is, the target prediction information.
  • feature fusion is performed on the feature data extracted from adjacent frame to-be-detected data, and the target position change between adjacent frames is predicted based on the fused feature data; in the adjacent frame to-be-detected data Each frame of to-be-detected data is based on the preprocessing of the collected data collected by the same detection device at one collection time.
  • This embodiment realizes the fusion of data collected at different times at the feature level, so as to predict the amount of change in the target position between neighbors.
  • in another embodiment, feature fusion is performed on the feature data extracted from the adjacent-frame to-be-detected data, and the target position change between adjacent frames is predicted based on the fused feature data, where each frame of to-be-detected data in the adjacent-frame to-be-detected data is obtained by same-type data fusion processing and preprocessing of the multiple frames of acquisition data collected by the same detection device at adjacent acquisition times.
  • This embodiment not only fuses the data collected at different times at the data level, but also fuses the data collected at different times at the feature level, and combines more data, so as to improve the accuracy of predicting the change of the target position between neighbors.
  • the feature fusion of the feature data extracted from the adjacent-frame to-be-detected data may include: performing a specified operation on the values of corresponding elements in the feature data extracted from the adjacent-frame to-be-detected data; or splicing corresponding elements in the feature data extracted from the adjacent-frame to-be-detected data along a specified dimension.
  • the specified operation can be addition and subtraction, averaging, and so on.
  • splicing along a specified dimension means, for example, splicing two tensors into a new tensor along a certain dimension, usually the depth dimension. It is understandable that fusion can include, but is not limited to, element-wise operations, splicing along a specific dimension, and other fusion methods.
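A minimal sketch of the two feature-fusion options named above (an element-wise specified operation, or splicing along the depth dimension), assuming the per-frame feature maps share the shape (C, H, W); the mode names are illustrative:

```python
import numpy as np

def fuse_features(feat_t, feat_t1, mode="concat"):
    """Fuse feature maps extracted from the T-th and (T+1)-th frame to-be-detected data.

    feat_t, feat_t1: arrays of shape (C, H, W) produced by the same feature extractor.
    mode: "concat" splices corresponding elements along the depth (channel) dimension;
          "add" / "mean" perform an element-wise specified operation instead.
    """
    if mode == "concat":
        return np.concatenate([feat_t, feat_t1], axis=0)   # shape (2C, H, W)
    if mode == "add":
        return feat_t + feat_t1
    if mode == "mean":
        return (feat_t + feat_t1) / 2.0
    raise ValueError(f"unknown fusion mode: {mode}")
```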
  • the detection result of the feature region corresponding to the target in the next frame is predicted. For example, when there is a detection result in the T-th frame, the detection result of the corresponding feature region in the (T+1)-th frame is directly predicted, so that the detection results of the two frames (the result for the (T+1)-th frame being predicted based on the T-th frame) can be associated to obtain ID information.
  • the ID is a globally unique identification code for each target; that is, targets with the same ID are the same target, which constitutes the tracking result.
  • for example, target A1 is detected in the T-th frame, and the position of A1 in the (T+1)-th frame is predicted as A2; A1 and A2 have the same ID. Similarly, targets B1 and B2 with the same ID are obtained (B1 is detected in the (T+1)-th frame and B2 is predicted for the (T+2)-th frame). By matching the predicted and detected results with a certain distance metric, such as the Euclidean distance, the target is tracked across the T-th, (T+1)-th, and (T+2)-th frames.
  • the position of the target in the next frame is predicted from one or more preceding frames. For example, if the target is detected to be moving in the current frame, its position in the next frame can be predicted by combining the motion of the previous frames, and the current-frame result and the next-frame result can be output at the same time.
  • the process of determining the amount of change in the target position between adjacent frames may include: obtaining the target position change between adjacent frames according to the target speed determined from the previous frame or several previous frames of data and the time difference between the previous frame and the next frame.
  • This embodiment can predict the target and the speed corresponding to each target.
  • for example, the T-th frame data predict target A1 and the speed S of A1.
  • the position A2 of A1 in the (T+1)-th frame can then be calculated as: predicted speed × (time difference between T+1 and T) + position in the T-th frame.
  • similarly, target B1 and its speed S2 can be predicted, and B2 at time T+2 can be calculated. The predicted and detected results are then compared by a certain distance metric, such as the Euclidean distance; if the distance is below a certain threshold, for example when the distance between two cars is less than 0.5 meters, the two are considered to be the same car, and a complete tracking result is finally obtained.
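A minimal sketch of this prediction-and-association step, assuming each target carries a 2D top-view position and an estimated velocity; the 0.5 m threshold is the example value from the text, and the other numbers are illustrative:

```python
import numpy as np

def predict_position(pos_t, speed_t, dt):
    """Predicted position = previous-frame position + predicted speed * time difference."""
    return np.asarray(pos_t, dtype=np.float64) + np.asarray(speed_t, dtype=np.float64) * dt

def is_same_target(predicted_pos, detected_pos, threshold_m=0.5):
    """Associate a prediction with a detection when their Euclidean distance is small enough."""
    diff = np.asarray(predicted_pos, dtype=np.float64) - np.asarray(detected_pos, dtype=np.float64)
    return float(np.linalg.norm(diff)) < threshold_m

# Example: A1 is at (10.0, 2.0) m in frame T with speed (5.0, 0.0) m/s, frame interval 0.1 s.
a2_pred = predict_position((10.0, 2.0), (5.0, 0.0), dt=0.1)   # -> (10.5, 2.0)
print(is_same_target(a2_pred, (10.45, 2.05)))                 # True: treated as the same car
```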
  • the target tracking based on the target detection information of the adjacent frames and the target prediction information may include: comparing the target prediction information with the second target detection information; if it is determined from the comparison result that they are the same target, the target in the second target detection information is given the same identifier as the target in the first target detection information.
  • the same target between the two frames is correlated to achieve target tracking.
  • a preset condition can be used to determine whether the two are the same target.
  • the preset conditions include, but are not limited to, the distance between the two meets the requirements, and the two are of the same category.
  • for example, the T-th frame has a detection result, the (T+1)-th frame has a detection result, and the two frames of data are associated: a target is found around the 100th to 120th pixels of the T-th frame, a target of the same category is found at a nearby position in the (T+1)-th frame, and since the two targets are of the same category and close in position, the targets of the two frames are considered to be the same target.
  • if it is determined that they are not the same target, a new identifier is assigned to the target in the second target detection information.
  • Predictions are often dense and have a large degree of overlap. Non-maximum suppression removes boxes with very high overlap, eliminating redundancy and improving computing efficiency.
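A minimal sketch of greedy non-maximum suppression over axis-aligned top-view boxes (a simplification of the oriented 3D boxes used in practice); boxes are (x1, y1, x2, y2) with scores, and the IoU threshold is an illustrative value:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, then repeat."""
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of box i with all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter + 1e-6)
        order = order[1:][iou < iou_thresh]  # discard highly overlapping (redundant) boxes
    return keep
```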
  • the data output by the same architecture may include information such as location (x, y, z), category, orientation, ID, and so on.
  • FIG. 5 it is a schematic diagram of the framework of target detection and tracking according to an exemplary embodiment of the present application.
  • the left part represents the processing flow of the target detection in the previous frame (T frame)
  • the right part represents the processing flow of the target detection in the next frame (also called the current frame, the T+1 frame)
  • the middle part represents the target tracking process. The following is an example of target tracking:
  • the target tracking CNN predicts the amount of change of the target position between two frames according to the input fused_feature.
  • box1 predicted by the target tracking network in the previous frame and box2 actually detected by the target detection network in the next frame are generally very close, which is also the expected prediction result of the target tracking network. Therefore, it is possible to determine which two targets are the same target based on the distance between box1 predicted by the target tracking network and the actual result box2 of the next frame target detection network, and then the target association can be completed. It is understandable that when judging whether two targets are the same target, it is also possible to compare whether the two targets are of the same category, etc., which will not be repeated here.
  • the corresponding relationship between box2 and box0 can be determined. Therefore, the ID of box0 can be copied to the corresponding box2 to complete the acquisition of the tracking ID. If the target is detected for the first time, that is, there is no corresponding box0 in the previous frame, a new ID must be assigned to this box2 to achieve target tracking.
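A minimal sketch of this ID-association step, using the box0 (previous-frame detection), box1 (predicted next-frame position), and box2 (next-frame detection) naming from the figure; matching here uses center distance only, and the threshold and data layout are assumptions:

```python
import itertools
import numpy as np

_next_id = itertools.count(1)   # global ID generator for first-time detections

def associate_ids(prev_dets, predicted_centers, curr_dets, dist_thresh=1.0):
    """Copy each box0's ID to the box2 whose detection is closest to the predicted box1.

    prev_dets:          list of dicts {"id": int, "center": (x, y)}  (box0, already tracked)
    predicted_centers:  list of (x, y) centers predicted by the tracking network (box1),
                        aligned element-by-element with prev_dets
    curr_dets:          list of dicts {"center": (x, y)}             (box2, next-frame detections)
    """
    for det in curr_dets:
        best, best_d = None, dist_thresh
        for prev, pred_center in zip(prev_dets, predicted_centers):
            d = float(np.linalg.norm(np.asarray(det["center"]) - np.asarray(pred_center)))
            if d < best_d:
                best, best_d = prev, d
        if best is not None:
            det["id"] = best["id"]          # same target: copy the tracking ID from box0
        else:
            det["id"] = next(_next_id)      # detected for the first time: assign a new ID
    return curr_dets
```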
  • a tracklet can also be maintained.
  • a tracklet is a sequence composed of the boxes and class scores of the detections of the same target in multi-frame data, and it is formed in the data association step. For example, if target box2 and a target box0 in the previous frame are determined to be the same target, box0 and box2 are stored in the same tracklet.
  • for example, the system can be preset to fuse data collected at m different acquisition times, and a check can be added to determine whether m frames have actually been acquired. If only m-1 frames are acquired, the system can still obtain detection and tracking results based on these m-1 frames.
  • the target detection algorithm only uses a single frame of data for detection without using the timing information, resulting in a lot of noise in the detection result, which causes the user to be unable to distinguish objects correctly.
  • the embodiment of the present application uses timing information in the to-be-detected data in adjacent frames to assist target detection, which will make the target detection result more stable and reliable.
  • the timing information from target tracking can also be used to assist target detection. Considering temporal continuity, if an object has been detected in the previous few frames, it will not suddenly disappear, so the target should be easier to detect near its previous position.
  • for each tracklet, the cumulative number of tracked frames N since the start of recording and the cumulative class score SUM_SCORE are maintained.
  • the class score of the target detection result can then be corrected:
  • class score* = class score + α * SUM_SCORE / N, where α is a weighting coefficient.
  • class score* is the corrected score and class score is the score before correction. If this box is associated with a tracklet (that is, it is associated with the detection result of the previous frame in this tracklet), the frame count N of this tracklet is incremented and the cumulative score SUM_SCORE is increased by the class score* of the new box; otherwise the box is stored in a new tracklet, for which N and SUM_SCORE are updated in the same way.
  • in this way, the target detection score is corrected according to the target tracking result, and combining the timing information makes the detection more stable.
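A minimal sketch of this score correction and tracklet update, assuming each tracklet stores N and SUM_SCORE as described; the value of the weighting coefficient alpha is an illustrative assumption:

```python
def corrected_class_score(class_score, tracklet, alpha=0.5):
    """Correct a detection's class score using its associated tracklet.

    tracklet: dict with cumulative frame count "N" and cumulative class score "SUM_SCORE",
              or None when the box is not yet associated with any tracklet.
    Implements class_score* = class_score + alpha * SUM_SCORE / N (alpha is assumed here).
    """
    if tracklet is None or tracklet["N"] == 0:      # first observation: no history to draw on
        return class_score
    return class_score + alpha * tracklet["SUM_SCORE"] / tracklet["N"]

def update_tracklet(tracklet, corrected_score):
    """After association, increment N and add the corrected score to SUM_SCORE."""
    if tracklet is None:                            # not associated: start a new tracklet
        tracklet = {"N": 0, "SUM_SCORE": 0.0}
    tracklet["N"] += 1
    tracklet["SUM_SCORE"] += corrected_score
    return tracklet
```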
  • the 3D target detection algorithm based on deep learning with laser point cloud as input mainly solves the problem of obtaining the 3D position, size, and orientation of the object scanned by the laser point cloud given the accumulated laser point cloud for a certain period of time. Categories and other information provide the surrounding perception information for autonomous vehicles.
  • in some embodiments, there may be multiple detection devices.
  • Data collected by multiple detection devices can be used as source input data.
  • the multiple detection devices may be multiple different types of detection devices.
  • Different types of detection devices are detection devices that collect different types of data.
  • a plurality of different types of detection devices may include a combination of at least two of a laser radar detection device, an image acquisition device, and a millimeter wave radar detection device, and the number of each detection device in each combination is variable.
  • each time the system operates, one of the different types of detection devices can be selected for data collection, or different types of detection devices can collect different types of data at the same time and the collected data can be fused.
  • when a detection device fails, the failed detection device exits data collection or its data is discarded, and the data collected by the remaining detection devices is used as the source input data, so as to ensure the validity of the source input data.
  • the types of source input data will be correspondingly reduced.
  • the corresponding detection results or tracking results can also be calculated by using these source input data.
  • the detection results or tracking results may be compared with those before exiting some detection devices. The accuracy of the tracking results will be appropriately reduced, but normal use will not be affected. In other words, in this embodiment, the robustness of the calculation result can be improved by acquiring data from multiple detection devices.
  • for multi-source data, fusion can occur at one or more of three stages: multi-source data fusion in the data preprocessing stage, multi-source data fusion during feature extraction, and multi-source data fusion after feature extraction.
  • FIG. 6 it is a schematic diagram of another target detection and tracking framework according to an exemplary embodiment of the present application.
  • the schematic diagram takes the detection device including a laser detection device (detection device 1), an image acquisition device (detection device 2), and a millimeter wave radar detection device (detection device 3) as an example.
  • the architecture can support various detection devices.
  • the detected multi-source data includes laser point clouds, images and millimeter wave radar point clouds.
  • the data collected by different detection devices can undergo multi-source data fusion in the data preprocessing stage, in the CNN feature extraction stage, and after feature extraction.
  • the schematic diagram also takes the detection device 1 as an example to perform the same data fusion processing of the T-th frame and the T+1-th frame in the data preprocessing stage. It can be seen that in this embodiment, fusion can occur both in a single frame (data at the same acquisition time) and multiple frames (data at different acquisition times). Fusion occurs during information interaction in the data preprocessing stage, information interaction in the CNN feature extraction stage, and after the CNN feature extraction stage. Realize the combination of more data, in order to achieve complementary advantages and disadvantages, and better robustness.
  • finally, the target's position (x, y, z), category, orientation, and the ID used in tracking are output as the detection result.
  • This application improves on point-cloud-based or image-based three-dimensional target detection algorithms by fusing multi-sensor data and fusing timing information, integrating the target tracking problem and the target detection problem into one framework and solving target detection and target tracking at the same time.
  • the present invention also supports using single-sensor data for detection and tracking at the same time to obtain the final perception result.
  • Different types of collected data collected based on different types of detection devices are preprocessed respectively, and the preprocessed data is subjected to multi-source data fusion processing.
  • in some embodiments, the data collected by some of the different types of detection devices can be fused with the data collected by other detection devices while the data collected by the remaining detection devices is not fused; alternatively, all the data collected by the detection devices undergo multi-source data fusion, or none of it does. This can be configured according to requirements.
  • for the first designated detection device, the collected data is preprocessed to obtain the to-be-detected data.
  • for the second designated detection device, the collected data is preprocessed and then subjected to multi-source data fusion processing together with the preprocessed data collected by the other designated detection devices.
  • the first designated detection device may be a pre-designated detection device that does not require multi-source data fusion, and there may be one or more.
  • the second designated detection device can be a pre-designated detection device that needs to perform multi-source data fusion, and the other detection devices that perform multi-source data fusion with it can also be pre-designated. There can be one or more second designated detection devices, and how to designate them can be determined according to whether the data collected by the detection devices have deficiencies and according to the specific application scenario.
  • the laser point cloud when preprocessing the laser point cloud, can be an n*4 vector, and each laser point includes x-coordinate information, y-coordinate information, z-coordinate information, and reflection intensity (intensity).
  • the image data is the RGB value of 3 channels, and the size is H*W.
  • the laser point cloud and image data are calibrated first, and then, according to the calibrated laser point cloud and image data, the image coordinates corresponding to each laser point can be determined.
  • the pixels corresponding to the laser points are found in the image and the corresponding RGB color information is extracted, so that the RGB color information of the image data is fused onto the laser points; that is, each laser point is expanded from 4 dimensions (x, y, z, intensity) to 7 dimensions (x, y, z, intensity, r, g, b).
  • this fusion achieves the effect of coloring the laser point cloud.
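A minimal sketch of this data-level lidar-image fusion ("coloring" the point cloud), assuming the calibration is available as a 3x4 projection matrix P mapping lidar coordinates to image pixels; points that fall outside the image keep zero color values:

```python
import numpy as np

def colorize_point_cloud(points, image, P):
    """Expand laser points from (x, y, z, intensity) to (x, y, z, intensity, r, g, b).

    points: (n, 4) lidar points; image: (H, W, 3) RGB image;
    P: (3, 4) calibrated projection matrix from lidar coordinates to image pixels.
    """
    h, w = image.shape[:2]
    xyz1 = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])
    uvw = xyz1 @ P.T                                  # project into the image plane
    z = uvw[:, 2]
    valid = z > 0                                     # only points in front of the camera
    u = np.full(z.shape, -1, dtype=np.int32)
    v = np.full(z.shape, -1, dtype=np.int32)
    u[valid] = (uvw[valid, 0] / z[valid]).astype(np.int32)
    v[valid] = (uvw[valid, 1] / z[valid]).astype(np.int32)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)

    rgb = np.zeros((points.shape[0], 3), dtype=np.float32)
    rgb[valid] = image[v[valid], u[valid]]            # copy RGB of the corresponding pixels
    return np.hstack([points, rgb])                   # 7-dimensional "colored" laser points
```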
  • multi-source data fusion can be carried out in the data preprocessing stage.
  • for calibration, take the fusion of the laser point cloud and the image as an example: calibration learns the correspondence between the two, so that for an acquired two-dimensional image, the relationship between the real three-dimensional coordinates and the image coordinates can be determined.
  • the image data can likewise be expanded from 3 dimensions (r, g, b) to 7 dimensions (r, g, b, x, y, z, intensity).
  • the purpose of data fusion is mainly to complement different types of data.
  • the laser point cloud records accurate three-dimensional information, but the laser point cloud collected by the existing laser radar is relatively sparse.
  • the image data is denser and contains more semantic information.
  • the image lacks accurate three-dimensional information.
  • it only contains rough three-dimensional information such as near-large and far-small. For this reason, more data should be combined to achieve complementary advantages and better robustness.
  • the multi-source data fusion and time series fusion in the data preprocessing stage can also be carried out at the same time.
  • specifically, the different types of acquisition data collected by different types of detection devices are preprocessed respectively, the preprocessed data undergo multi-source data fusion processing to obtain multi-source-fused data, and then the multi-source-fused data corresponding to different sampling times undergo same-type data fusion processing to obtain the to-be-detected data.
  • the multi-source data fusion in the process of extracting features and the multi-source data fusion after feature extraction can be considered as multi-source data fusion at the feature level.
  • the target detection information and target prediction information of the adjacent frames are obtained based on the feature data extracted from the to-be-detected data of the adjacent frames, and the process of extracting the feature data includes one or more of the following: performing multi-source data fusion processing on the feature data output by a designated network layer corresponding to different types of data during feature extraction; performing multi-source data fusion processing on the feature data output by the last network layer corresponding to different types of data.
  • the features extracted from different network layers generally have different focuses. Feature extraction can be achieved through neural networks, especially convolutional neural networks. Generally speaking, the deeper the network, the more semantic features may be.
  • the designated network layer can be configured according to requirements. For example, the designated network layer can be a network layer close to the input layer, and multi-source data fusion is performed close to the input layer, and the fusion can be local detailed features.
  • the designated network layer can also be a network layer close to the output layer. Multi-source data fusion is performed close to the output layer, and the fusion can be global features.
  • multi-source data fusion can be performed in the feature extraction stage, and the corresponding network layer of the multi-source data can be fused. It is understandable that multi-source data fusion can be carried out for each network layer, or part of the network layer can be designated for multi-source data fusion. The features extracted from different layers can be fused, and feature data of different focuses can be combined to combine more The data can complement each other and improve robustness.
  • Fusion methods include, but are not limited to, element-wise operations (addition, subtraction, averaging, and so on over corresponding values of the feature vectors) and splicing along a specific dimension (splicing two tensors along a certain dimension into a new tensor; for example, the features output for the lidar and the image sensor can be spliced along a dimension, usually the depth dimension). It should be noted that when fusing the features of data collected by different types of detection devices, the physical position relationship between them (that is, the projection correspondence considered during data fusion) can also be taken into account.
  • a neural network is a multi-layer neural network. Assuming that there are 100 network layers, if the 50th network layer extracts feature 1, and the 100th network layer extracts feature 2, then feature 1 and feature 2 can be merged, by combining different The fusion of the features extracted from the feature layer can improve the robustness.
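A minimal sketch of feature-level fusion across sources and across network layers, assuming the lidar branch and the image branch have already been brought to the same spatial grid (for example via the projection correspondence mentioned above) and that any differing spatial sizes have already been resampled to match; splicing is along the depth (channel) dimension:

```python
import numpy as np

def fuse_multisource_features(lidar_feat, image_feat):
    """Splice lidar-branch and image-branch feature maps along the depth dimension.

    Both inputs are assumed to be (C_i, H, W) feature maps aligned to the same grid.
    """
    return np.concatenate([lidar_feat, image_feat], axis=0)

def fuse_layer_features(feat_mid, feat_deep):
    """Fuse a mid-layer feature (local detail) with a deep-layer feature (more semantic).

    The shapes are assumed to already match spatially, e.g. after upsampling the deep feature.
    """
    return np.concatenate([feat_mid, feat_deep], axis=0)
```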
  • In traditional schemes, target detection is only responsible for target detection and target tracking is only responsible for target tracking, but in fact these two problems are closely related.
  • their input is the same, so the features that can be extracted are also similar or even identical. Using two different methods to perform target detection and tracking separately wastes resources, because similar features can be shared; sharing these features in the embodiments of the present application reduces repeated calculation.
  • the current target detection algorithm only uses a single frame of data for detection, and cannot use timing information, which will cause a lot of noise in the detection results.
  • the target detection results assisted by timing information may be more stable and reliable.
  • the target detection and tracking system 800 may include a memory 82 and a processor 84; the memory 82 is connected to the processor 84 through a communication bus and stores computer instructions executable by the processor 84; the processor 84 is used to read the computer instructions from the memory 82 to implement any of the target detection and tracking methods described above. For example, when the computer instructions are executed, they are used to perform the following operations:
  • obtaining the to-be-detected data of adjacent frames; generating the target detection information and target prediction information of the corresponding adjacent frames according to the adjacent-frame to-be-detected data, where the target prediction information is determined by performing target prediction on the next frame according to the target detection information of the previous frame in the adjacent frames; and
  • performing target tracking according to the target detection information of the adjacent frames and the target prediction information.
  • the processor 84 executes the program code stored in the memory 82. The processor 84 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general-purpose processor 84 may be a microprocessor or the processor 84 may also be any conventional processor or the like.
  • the memory 82 stores the program code of the target detection and tracking method.
  • the memory 82 may include at least one type of storage medium.
  • the storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like.
  • the target detection and tracking system can cooperate with a network storage device that performs the storage function of the memory 82 through a network connection.
  • the memory 82 may be an internal storage unit of the target detection and tracking system, such as a hard disk or memory of the target detection and tracking system.
  • the memory 82 may also be an external storage device of the target detection and tracking system, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the target detection and tracking system. Further, the memory 82 may include both an internal storage unit of the target detection and tracking system and an external storage device. The memory 82 is used to store the computer program code and other programs and data required by the target detection and tracking system, and can also be used to temporarily store data that has been output or will be output.
  • the various embodiments described herein can be implemented using a computer-readable medium, for example with computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described here can be implemented by using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein.
  • for a software implementation, implementations such as procedures or functions may be implemented with separate software modules that allow at least one function or operation to be executed. The software code can be implemented by a software application (or program) written in any suitable programming language, and can be stored in the memory and executed by the controller.
  • an embodiment of the present application also provides a movable platform 900, including: a body 92; a power system 94 installed in the body 92 to provide power for the movable platform; and the target detection and tracking system 800 described above.
  • FIG. 9 is only an example of a movable platform and does not constitute a limitation on the movable platform; it may include more or fewer components than those shown in the figure, or combine certain components, or use different components.
  • for example, the movable platform can also include input and output devices, network access devices, and so on.
  • the movable platform includes an unmanned vehicle, an unmanned aerial vehicle or an unmanned ship.
  • an embodiment of the present application also provides a detection device, including: a housing; a detector arranged in the housing and used to collect data; and the target detection and tracking system described above.
  • this embodiment is only an example of the detection device and does not constitute a limitation on the detection device; it may include more or fewer components than those listed above, or combine certain components, or use different components.
  • this embodiment also provides a computer-readable storage medium with a number of computer instructions stored on the readable storage medium, and when the computer instructions are executed, the steps of any one of the methods described above are implemented.
  • since the device embodiments substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant details. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units.
  • Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement this without creative work.

Abstract

A target detection and tracking method, system, movable platform, camera and medium. In the embodiments of the present application, to-be-detected data of adjacent frames is acquired, and target detection information and target prediction information of the corresponding adjacent frames are generated according to the to-be-detected data of the adjacent frames, where the target prediction information is obtained by performing target prediction on the next frame according to the target detection information of the previous frame in the adjacent frames. Target tracking can then be performed according to the target detection information of the adjacent frames and the target prediction information, so that the target tracking problem and the target detection problem are integrated and solved within one framework; solving detection and tracking together reduces repeated computation and avoids waste of resources.

Description

目标检测与跟踪方法、系统、可移动平台、相机及介质 技术领域
本申请涉及数据处理技术领域,尤其涉及一种目标检测与跟踪方法、系统、可移动平台、相机及介质。
背景技术
随着可移动平台(例如无人车、无人飞机等)技术的发展,可移动平台可以对行驶路线上的三维物体进行检测和跟踪。
然而,传统的目标检测和目标跟踪是相互独立的,即目标检测方案只负责目标检测,目标跟踪方案只负责目标跟踪。而发明人发现,两套方案在输入相同的情况下提取到的特征可能相似或者相同,若针对相同输入数据,用不同的两套方案分别做目标检测和目标跟踪,这样重复计算会造成资源浪费。
发明内容
有鉴于此,本发明实施例提供一种目标检测与跟踪方法、系统、可移动平台、相机及介质。
根据本申请实施例的第一方面,提供一种目标检测与跟踪方法,所述方法包括:
获取相邻帧待检测数据;
根据所述相邻帧待检测数据，生成相应的相邻帧的目标检测信息及目标预测信息，其中，根据所述相邻帧中的前一帧的目标检测信息对后一帧进行目标预测，确定所述目标预测信息；
根据所述相邻帧的目标检测信息以及所述目标预测信息,进行目标跟踪。
在一个可选的实施例中,所述相邻帧待检测数据基于探测装置采集的相邻帧采集数据获得。
在一个可选的实施例中,所述相邻帧待检测数据至少包括两个采集时间的待检测数据,所述相邻帧的目标检测信息包括前一帧的第一目标检测信息和后一帧的第二目标检测信息;
基于所述第一目标检测信息和相邻帧间目标位置变化量确定所述目标预测信息。
在一个可选的实施例中,所述相邻帧间目标位置变化量的确定过程包括:
依据从相邻帧待检测数据中提取的特征数据预测相邻帧间目标位置变化量;所述相邻帧待检测数据中每帧待检测数据基于:同一探测装置在相邻采集时间采集的多帧采集数据进行同类数据融合处理和预处理获得;或,
将分别从相邻帧待检测数据中提取的特征数据进行特征融合,并依据融合后的特征数据预测相邻帧间目标位置变化量;所述相邻帧待检测数据中每帧待检测数据基于:同一探测装置在一个采集时间采集的采集数据进行预处理获得,或者,同一探测装置在相邻采集时间采集的多帧采集数据进行同类数据融合处理和预处理获得。
在一个可选的实施例中,所述相邻帧采集数据中每帧采集数据包括探测装置与目标的距离信息;在相邻帧采集数据中,后一帧作为基准数据,其他帧作为待校准数据;
所述同类数据融合处理的过程,包括:
由探测装置的移动速度、待校准数据和基准数据间的时间差确定装置运动位移,并利用所述装置运动位移修正所述待校准数据中的距离信息;
将包含修正后距离信息的其他帧采集数据,与后一帧采集数据进行同类数据融合处理。
在一个可选的实施例中,进行同类数据融合处理的数据的帧数与所述探测装置和目标的距离呈正相关关系。
在一个可选的实施例中,所述将分别从相邻帧待检测数据中提取的特 征数据进行特征融合,包括:
将分别从相邻帧待检测数据中提取的特征数据中对应元素的数值进行指定运算;或,
将分别从相邻帧待检测数据中提取的特征数据中对应元素沿着指定维度进行拼接。
在一个可选的实施例中,所述相邻帧间目标位置变化量的确定过程包括:依据由前一帧或前几帧待检测数据确定的目标的速度以及前一帧与后一帧间的时间差,获得相邻帧间目标位置变化量。
在一个可选的实施例中,所述探测装置包括采集不同类采集数据的探测装置。
在一个可选的实施例中,针对所述相邻帧待检测数据中相同采集时间的待检测数据,采用以下方式获得:
基于不同类探测装置采集的不同类采集数据分别进行预处理;或,
基于不同类探测装置采集的不同类采集数据分别进行预处理,并将预处理后的数据进行多源数据融合处理。
在一个可选的实施例中,所述相邻帧的目标检测信息及目标预测信息基于从所述相邻帧待检测数据中提取的特征数据获得,提取特征数据的过程包括以下一种或多种:
在指定网络层提取特征后,将从不同类数据中提取的特征进行多源数据融合处理,并将融合处理后的数据作为下一网络层的输入数据;
将不同网络层提取的特征进行同类数据融合处理;
将与不同类数据对应的最后一层网络层输出的特征数据进行多源数据融合处理。
在一个可选的实施例中,所述多源数据融合处理的过程包括:将不同类数据对应元素进行拼接。
在一个可选的实施例中,针对采集同一类数据的探测装置配置有主探测装置和备用探测装置,主探测装置失效时,利用所述备用探测装置替换失效的主探测装置以进行数据采集。
在一个可选的实施例中,所述探测装置包括以下一种或多种:图像采集装置、激光雷达探测装置、毫米波雷达探测装置。
在一个可选的实施例中,所述相邻帧的目标检测信息包括前一帧的第一目标检测信息和后一帧的第二目标检测信息。
在一个可选的实施例中,所述根据所述相邻帧的目标检测信息以及所述目标预测信息,进行目标跟踪,包括:
将所述目标预测信息与所述第二目标检测信息进行比较;
若根据比较结果判定是同一目标,将所述第二目标检测信息中所述目标赋予与第一目标检测信息中所述目标相同的标识。
在一个可选的实施例中,所述方法还包括:
若根据比较结果判定不是同一目标,对所述第二目标检测信息中所述目标赋予新的标识。
根据本申请实施例的第二方面,提供一种目标检测与跟踪系统,包括:
存储器和处理器;所述存储器通过通信总线和所述处理器连接,用于存储所述处理器可执行的计算机指令;所述处理器用于从所述存储器读取计算机指令以实现上述任一项所述的目标检测与跟踪方法。
根据本申请实施例的第三方面,提供一种可移动平台,包括:
机体;
动力系统,安装在所述机体内,用于为所述可移动平台提供动力;以及,如上述所述的目标检测与跟踪系统。
在一个可选的实施例中,所述可移动平台包括无人车、无人机或无人船。
根据本申请实施例的第四方面,提供一种探测装置,包括:
壳体;
探测器,设于所述壳体,用于采集数据;
以及,如上述所述的目标检测与跟踪系统。
根据本申请实施例的第五方面,提供一种计算机可读存储介质,所述可读存储介质上存储有若干计算机指令,所述计算机指令被执行时实现上 述任一项所述方法的步骤。
本申请的实施例提供的技术方案可以包括以下有益效果:
本申请实施例获取相邻帧待检测数据,并根据相邻帧待检测数据,生成相应的相邻帧的目标检测信息及目标预测信息,而目标预测信息根据所述相邻帧中的前一帧的目标检测信息对后一帧进行目标预测,为此,可以根据相邻帧的目标检测信息以及所述目标预测信息,进行目标跟踪,实现把目标跟踪问题和目标检测问题整合到一个框架下解决,同时解决目标检测和目标跟踪问题,从而减少重复计算,避免资源浪费。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本说明书。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请根据一示例性实施例示意出的目标检测与跟踪的应用场景图。
图2是本申请根据一示例性实施例示出的一种目标检测与跟踪方法的流程示意图。
图3是本申请根据一示例性实施例示出的一种车辆相对运动示意图。
图4是本申请根据一示例性实施例示出的一种目标检测信息和目标预测信息的示意图。
图5是本申请根据一示例性实施例示出的目标检测与跟踪的框架示意图。
图6是本申请根据一示例性实施例示出的另一种目标检测和跟踪框架 示意图。
图7是本申请根据一示例性实施例示出的一种各阶段进行多源数据融合的示意图。
图8是本申请根据一示例性实施例示出的一种目标检测与跟踪系统的结构示意图。
图9是本申请根据一示例性实施例示出的一种可移动平台的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
附图中所示的流程图仅是示例说明,不是必须包括所有的内容和操作/步骤,也不是必须按所描述的顺序执行。例如,有的操作/步骤还可以分解、组合或部分合并,因此实际执行的顺序有可能根据实际情况改变。
下面结合附图,对本申请的一些实施方式作详细说明。在不冲突的情况下,下述的实施例及实施例中的特征可以相互组合。
具有探测装置的可移动设备得到了广泛的应用,可以用于ADAS(Advanced Driving Assistant System,高级驾驶辅助系统)、自动驾驶、UGV小车,机器人,无人机等产品中进行障碍物的感知,实现避障功能和后续的路径规划等功能。以在自动驾驶和ADAS领域为例,目标跟踪充当重要的角色,目标跟踪算法的准确性将直接影响系统的性能和可靠性。目标跟踪算法主要为目标状态估计(例如目标位置,速度,角速度的估计和预测,轨迹估计和预测,行为估计和预测)提供可靠的观测,而目标状态 估计为自动驾驶路径规划和控制提供重要信息,直接影响自动驾驶安全性。
然而,现有的目标检测和目标跟踪是分开的,即目标检测方案只负责目标检测,目标跟踪方案只负责目标跟踪。而发明人发现,两套方案在输入相同的情况下提取到的特征可能相似或者相同,若针对相同输入数据,用不同的两套方案分别做目标检测和目标跟踪,这样重复计算会造成资源浪费。
基于此,本申请实施例提供一种目标检测与跟踪方案,把目标跟踪问题和目标检测问题整合到一个框架下解决,同时解决目标检测和目标跟踪问题,从而减少重复计算,避免资源浪费。
本实施例提供的目标检测与跟踪方法可以通过软件执行,也可以通过软件和硬件相结合或者硬件执行的方式实现,所涉及的硬件可以由两个或多个物理实体构成,也可以由一个物理实体构成。本实施例方法可以应用于配置有探测装置的可移动平台。其中,可移动平台可以是无人车、无人机、机器人以及无人船等,本实施例方法也可以应用于ADAS等产品中。
探测装置包括但不限于图像采集装置(如单目相机、双目相机)、激光雷达探测装置、毫米波雷达探测装置等。以激光雷达为例,激光雷达可以通过发射激光束探测某个环境中物体的位置、速度等信息,从而获得激光点云。激光雷达可以向包括目标的环境发射探测信号,然后接受从目标反射回来的反射信号,根据反射的探测信号、接收到的反射信号,并根据发送和接收的间隔时间等数据参数,获得激光点云。激光点云可以包括N个点,每个点可以包括x,y,z坐标和intensity(反射率)等参数值。
参见图1所示,是本申请根据一示例性实施例示意出的目标检测与跟踪的应用场景图。在自动驾驶场景中,在汽车A上可以配置有目标检测与跟踪系统以及一个或多个探测装置。探测装置配置在汽车指定位置,以探测周围环境中的目标。图1中汽车B或行人可以作为汽车A行驶过程中待检测的目标。探测装置可以将采集的相邻帧采集数据输入目标检测与跟踪系统,由目标检测与跟踪系统预测出目标检测结果和目标跟踪结果。目标检测结果一般可以包括目标的三维位置、尺寸、朝向、类别等。目标检测 结果可以有多种表示形式,这里取一种表示形式为例介绍,以本车前为x轴,车右为y轴,车下为z轴,检测的目标位置尺寸和朝向可以表示为物体的三维外界框[x0,x1,x2,x3,y0,y1,y2,y3,zmin,zmax](可用box表示),目标类别为class和对应的分数score,其中(x0,y0),(x1,y1),(x2,y2),(x3,y3)为三维外接框在俯视图下的四个顶点,zmin,zmax表示三维外接框的最小z和最大z坐标。目标跟踪结果可以是相同目标赋予相同标识。由于目标检测和目标跟踪的特征可以共享,从而节约资源。
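For illustration, the detection output described above can be captured in a small data structure; the field names in this Python sketch are illustrative assumptions rather than notation taken from the disclosure:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Detection3D:
    """One detected target in the representation described above (field names are illustrative)."""
    corners_bev: List[Tuple[float, float]]  # (x0, y0) ... (x3, y3): box corners in the top view
    z_min: float                            # minimum z of the 3D bounding box
    z_max: float                            # maximum z of the 3D bounding box
    label: str                              # object class, e.g. "car"
    score: float                            # class confidence score
    track_id: Optional[int] = None          # assigned later by the tracking stage
```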
接下来对本申请目标检测与跟踪方法进行示例说明。
参见图2所示,是本申请根据一示例性实施例示出的一种目标检测与跟踪方法的流程示意图,该方法可以包括以下步骤202至步骤206:
在步骤202中,获取相邻帧待检测数据;
在步骤204中,根据所述相邻帧待检测数据,生成相应的相邻帧的目标检测信息及目标预测信息,其中,根据所述相邻帧中的前一帧的目标检测信息对后一帧进行目标预测,确定所述目标预测信息;
在步骤206中,根据所述相邻帧的目标检测信息以及所述目标预测信息,进行目标跟踪。
其中,可以基于探测装置采集的相邻帧采集数据获得所述相邻帧待检测数据。探测装置可以是用来探测目标的装置,探测装置可以包括但不限于图像采集装置、激光雷达探测装置、毫米波雷达探测装置等。探测装置可以包括主探测装置和备用探测装置。当主探测装置生效时,即处于不失效的状态时,主探测装置可以单独进行数据采集,也可以结合备用探测装置进行数据采集。当主探测装置失效时,利用所述备用探测装置替换失效的主探测装置以进行数据采集。针对能用于进行目标检测和跟踪的数据对应的探测装置都可以应用在本申请中,在此不一一列举。
本实施例中,探测装置可以有一个,也可以有多个。一个或多个探测装置采集的数据可以作为源输入数据。其中,多个探测装置可以是多个同类探测装置。同类探测装置是采集同一类数据的探测装置。例如,多个同类探测装置可以包括:多个激光雷达探测装置、多个图像采集设备或者多 个毫米波雷达探测装置。
在一个实施例中,当同类探测装置的数量为多个时,可以仅将其中一个探测装置作为主探测装置,主探测装置也可以称为工作探测装置。该主探测装置所采集的数据作为同类探测装置的采集数据,其他剩余探测装置可以作为备用探测装置。在主探测装置失效时,利用备用探测装置替换失效的主探测装置以进行数据采集,实现以其中一个备份探测装置作为新的主探测装置继续采集数据,避免由于探测装置故障而引发检测失败或检测不准的问题。当然,在其他实施例中,同类探测装置可以全部工作,并将所采集的数据作为输入数据。
为了实现在同一框架中进行目标检测和目标跟踪,所提取的特征既用于目标检测,又用于目标跟踪,本申请实施例可以以探测装置采集的相邻帧采集数据作为输入,结合相邻帧采集数据反映的时序信息辅助进行目标检测和跟踪。
关于相邻帧待检测数据,可以是基于探测装置采集的相邻帧采集数据获得的多帧待检测数据。相邻帧采集数据可以是探测装置在相邻采集时间采集的多帧数据。
由于各探测装置所采集的数据格式不同,某些情况下,探测装置采集的采集数据不能直接使用,为此,可以将采集数据进行预处理,处理成本申请架构能处理的结构化数据。例如,在获得探测装置采集的数据后,其中采集数据可以包括以下至少一种:激光点云数据、图像数据、毫米波数据等,可以对采集数据进行预处理。以下对预处理进行示例说明:
在一示例中,以激光点云为例,激光点云为无序数据且各帧数据中激光点的数量不固定,经过预处理后可以得到有序数据,又称为结构化数据。处理后的结构化数据,以用于神经网络(诸如卷积神经网络)进行点云特征的提取。例如,将n*4的向量处理成CNN(Convolutional Neural Networks,卷积神经网络)需要的数据,将无序的激光点云转换成有序的三维图像。其中,预处理可以包括但不限于:体素化处理、三维向二维平面投影处理、分高度对点云进行网格化处理。
以体素化处理为例,是将物体的几何形式表示转换成最接近该物体的体素表示形式,产生体数据集。其不仅包含模型的表面信息,而且能描述模型的内部属性。表示模型的空间体素跟表示图像的二维像素比较相似,只不过从二维的点扩展到三维的立方体单元,而且基于体素的三维模型有诸多应用。本示例中将激光雷达前方的三维空间划分为多个体素(每个体素可理解为预设长度、宽度和高度的小立方体);然后,判断各体素内是否存在激光点,若存在,则将该体素赋值为1;若不存在,则将该体素赋值为0。可理解的是,当一个体素内存在多个激光点时,该体素的赋值可以为激光点的个数。
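A minimal Python sketch of such a voxelization step follows; the detection range and voxel size are assumed example values, and each cell stores the number of laser points falling into it, as described above:

```python
import numpy as np

def voxelize(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
             z_range=(-3.0, 1.0), voxel_size=(0.1, 0.1, 0.2)):
    """Convert an unordered (N, 4) point cloud [x, y, z, intensity] into a dense
    voxel grid whose cells hold the number of points falling inside them."""
    xyz = np.asarray(points, dtype=np.float32)[:, :3]
    lo = np.array([x_range[0], y_range[0], z_range[0]], dtype=np.float32)
    hi = np.array([x_range[1], y_range[1], z_range[1]], dtype=np.float32)
    size = np.array(voxel_size, dtype=np.float32)

    # keep only points inside the region of interest
    mask = np.all((xyz >= lo) & (xyz < hi), axis=1)
    idx = ((xyz[mask] - lo) / size).astype(np.int64)

    dims = np.ceil((hi - lo) / size).astype(np.int64)
    grid = np.zeros(dims, dtype=np.float32)
    # a cell containing several points stores the point count, as described above
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return grid
```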
在另一示例中,以图像数据为例,每个像素的灰阶值,减去灰阶值均值,再除以方差,从而实现对图像数据完成预处理。当然,在一些示例中,还可以对图像数据中像素点进行滤除操作,去除过曝光或欠曝光的像素点等,从而保证图像数据的质量。
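A short sketch of this image preprocessing, assuming a single global mean and variance over the image (per-channel statistics would work the same way):

```python
import numpy as np

def preprocess_image(image):
    """Normalize an H x W x 3 image as described above: subtract the mean grey
    value and divide by the variance (per the text; many pipelines divide by the
    standard deviation instead)."""
    image = np.asarray(image, dtype=np.float32)
    mean = image.mean()
    var = image.var()
    return (image - mean) / (var + 1e-6)  # small epsilon avoids division by zero
```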
关于相邻帧待检测数据,可以至少包括两个采集时间的待检测数据。在一个实施例中,两个采集时间可以是相邻采集时间,例如,相邻帧待检测数据包括:基于第T帧采集数据获得的第T帧待检测数据,以及基于第T+1帧采集数据获得的第T+1帧待检测数据。
而在某些场景中,可能因采样频率比较高,出现对每个采样时间的数据都进行处理导致计算量大等情况,为此,在另一个实施例中,两个采集时间也可以不是相邻采集时间,而是间隔一个或多个实际采样时间的采集时间。例如,相邻帧待检测数据包括:基于第T帧采集数据获得的第T帧待检测数据,以及基于第T+2帧采集数据获得的第T+2帧待检测数据。在该例子中,第T帧待检测数据可以视为相邻帧待检测数据中的上一帧待检测数据,第T+2帧待检测数据可以视为相邻帧待检测数据中的下一帧待检测数据。
而针对如何利用探测装置采集的相邻帧采集数据获得相邻帧待检测数据,在一个实施例中,相邻帧待检测数据中每帧待检测数据可以是采集数据预处理后的数据。具体的,相邻帧待检测数据中每帧待检测数据基于: 同一探测装置在一个采集时间采集的采集数据进行预处理获得。在该实施例中,直接将探测装置采集的数据进行预处理,获得待检测数据。
在另一个实施例中,作为时序融合的一种方式,在数据预处理阶段对多帧数据进行时序融合处理。时序融合,可以是将不同采样时间对应的数据进行融合。具体的,所述相邻帧待检测数据中每帧待检测数据基于:同一探测装置在相邻采集时间采集的多帧采集数据进行同类数据融合处理和预处理获得。
其中,同类数据融合处理可以在预处理之前,也可以在预处理之后。以激光点云为例,可以在体素化之前进行融合,或者体素化之后进行融合。
该实施例在预处理阶段将不同采集时间采集的多帧采集数据进行融合,可以为后续确定目标预测信息提供更多依据。
在数据预处理阶段的同类数据融合处理,也可以称为数据层面的时序融合。例如,将第T帧、第T+1帧等连续多帧进行同类数据融合。融合后的数据可以输入单一的神经网络进行结果的预测。
以激光点云为例,假设激光雷达探测装置在第T帧获取到TM个激光点,激光雷达探测装置在第T+1帧获取到TN个激光点,由于激光点云本身是一堆无序的点,作为一种简单快速的同类数据融合处理方式,可以将两帧的激光点直接拼接,即利用这(TM+TN)个激光点进行预测。
在某些场景中,探测装置配置在移动的载体上,可能由于自身运动出现融合数据不准确的情况。以车载激光雷达为例,考虑到激光雷达所在车辆可以移动,因此在进行时序上的融合时,可考虑本车的自身运动。如图3所示,是本申请根据一示例性实施例示出的一种车辆相对运动示意图。车辆A为配置有激光雷达的本车,车辆B为远处车辆,假设车辆B为静止车辆,本车在往前开。在第T帧,激光雷达采集到激光点1,距离为50米。在第T+1帧,采集到激光点1,由于本车往前开了5米,采集到激光点1的距离为45米。可理解的是,静止的车辆B在不同时刻拥有相同的物理世界三维位置,但是由于激光雷达的移动,激光雷达却采集到了不同的激光点云数据。
鉴于此,在一个实施例中,对激光点云等具有距离信息的数据融合时,可以先确定出探测装置的自身运动,然后利用自身运动对距离信息进行校准,可以消除自身运动的影响。具体的,相邻帧采集数据中每帧采集数据包括探测装置与目标的距离信息;在相邻帧采集数据中,由于实际应用中往往是对后一帧的目标进行跟踪,为此,可以将后一帧作为基准数据,其他帧作为待校准数据。相应的,所述同类数据融合处理的过程,包括:
由探测装置的移动速度、待校准数据和基准数据间的时间差确定装置运动位移,并利用所述装置运动位移修正所述待校准数据中的距离信息;
将包含修正后距离信息的其他帧采集数据,与后一帧采集数据进行同类数据融合处理。
该实施例利用装置运动位移来修改探测装置与目标的距离信息,从而避免由于本端自身运动给距离造成的影响,从而提高融合数据的准确性。
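A hedged sketch of this same-type data fusion with ego-motion compensation; it assumes the points and the ego velocity are expressed in the same vehicle coordinate frame, with the later frame used as the reference:

```python
import numpy as np

def fuse_adjacent_frames(prev_points, curr_points, ego_velocity, dt):
    """Fuse two adjacent lidar frames into one point set. The earlier frame is
    shifted by the ego displacement (velocity * time difference) so that a static
    object lands at the same coordinates in both frames. Points are (N, 4) arrays
    [x, y, z, intensity]."""
    ego_velocity = np.asarray(ego_velocity, dtype=np.float32)   # (vx, vy, vz)
    displacement = ego_velocity * dt                            # device motion between frames

    corrected = np.asarray(prev_points, dtype=np.float32).copy()
    # the vehicle moved forward by `displacement`, so points from the earlier frame
    # must be pulled back by that amount to match the current frame's coordinates
    corrected[:, :3] -= displacement

    # same-type data fusion: simple concatenation of the two point sets
    return np.concatenate([corrected, np.asarray(curr_points, dtype=np.float32)], axis=0)
```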
在某些场景中,以激光点云为例,由于其物理特性,近处的物体上的扫描点要远多于远处的物体上的激光点,即物体距离激光雷达越远,激光点云越稀疏。为此,进行同类数据融合处理的数据的帧数可以与探测装置同目标的距离呈正相关关系,可以实现按距离进行点云融合。例如,近处的目标融合较少帧的激光点云,根据远处的目标融合更多帧的激光点云,从而可以保证远近不同的物体的激光点云更加均衡。
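As an illustrative helper, the number of frames to fuse can be made to grow with distance; the thresholds and frame counts below are assumed values:

```python
def frames_to_fuse(distance_m, near=2, far=5, near_limit=20.0, far_limit=60.0):
    """Pick how many past frames to fuse for a target at the given distance:
    fewer frames for near targets, more for far ones (thresholds are illustrative)."""
    if distance_m <= near_limit:
        return near
    if distance_m >= far_limit:
        return far
    # linear interpolation between the two limits
    ratio = (distance_m - near_limit) / (far_limit - near_limit)
    return int(round(near + ratio * (far - near)))
```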
在获得相邻帧待检测数据后,可以相邻帧待检测数据生成相应的相邻帧的目标检测信息及目标预测信息。
针对相邻帧待检测数据中单帧待检测数据,可以预测出对应的目标检测信息。以相邻帧的目标检测信息包括前一帧的第一目标检测信息和后一帧的第二目标检测信息为例,后一帧可以作为当前帧,以实现确定当前帧的目标是否与前一帧的目标为同一对象,进而实现对目标的跟踪。
为了能在同一框架下实现目标检测和目标跟踪,除了确定目标检测信息外,还可以确定目标预测信息。目标预测信息可以是在前一帧存在目标的情况下,预测该目标对应的特征区域在后一帧的检测信息。相邻帧间目标位置变化量至少包括:上一帧和下一帧间目标位置变化量。示例的,可 以基于第一目标检测信息和相邻帧间目标位置变化量确定目标预测信息。可见,该实施例通过确定相邻间目标位置变化量的方式来预测目标检测信息,易于实现。
在确定相邻帧间目标位置变化量的过程中,可以在不同阶段进行时序融合,以便利用融合后的数据预测相邻帧间目标位置变化量。一方面,可以如上所述在数据预处理阶段进行同类数据融合处理,另一方面,也可以在特征提取阶段,将分别从相邻帧待检测数据中提取的特征数据进行特征融合,该过程可以称为特征层面的时序融合。
在一个示例中,依据从相邻帧待检测数据中提取的特征数据预测相邻帧间目标位置变化量;所述相邻帧待检测数据中每帧待检测数据基于:同一探测装置在相邻采集时间采集的多帧采集数据进行同类数据融合处理和预处理获得。
在该实施例中,每帧待检测数据实际融合了多帧采集数据,因此可以利用多帧采集数据预测目标对应的特征区域在下一帧上的检测结果,获得邻帧间目标位置变化量。
例如,假设相邻帧待检测数据包括第T帧待检测数据(第T帧的数据可能融合了T、T-1时刻的采集数据),以及第T+1帧待检测数据(第T+1帧的数据可能融合了T+1、T时刻的采集数据)。从相邻帧待检测数据中提取的特征数据预测相邻帧间目标位置变化量,可以包括第T帧待检测数据与第T+1帧待检测数据间的目标位置变化量,为此,针对第T帧的数据,不仅检测获得T帧目标,还获得该目标在T+1帧的位置,即目标检测信息。
在另一个示例中,将分别从相邻帧待检测数据中提取的特征数据进行特征融合,并依据融合后的特征数据预测相邻帧间目标位置变化量;所述相邻帧待检测数据中每帧待检测数据基于:同一探测装置在一个采集时间采集的采集数据进行预处理获得。
该实施例实现在特征层面将不同时间采集的数据进行融合,从而预测相邻间目标位置变化量。
在另一个示例中,将分别从相邻帧待检测数据中提取的特征数据进行 特征融合,并依据融合后的特征数据预测相邻帧间目标位置变化量;所述相邻帧待检测数据中每帧待检测数据基于:同一探测装置在相邻采集时间采集的多帧采集数据进行同类数据融合处理和预处理获得。
该实施例既在数据层面将不同时间采集的数据进行融合,又在特征层面将不同时间采集的数据进行融合,结合更多的数据,从而可以提高预测相邻间目标位置变化量的准确性。
针对特征层面的融合,示例的,所述将分别从相邻帧待检测数据中提取的特征数据进行特征融合,可以包括:将分别从相邻帧待检测数据中提取的特征数据中对应元素的数值进行指定运算;或,将分别从相邻帧待检测数据中提取的特征数据中对应元素沿着指定维度进行拼接。
其中,指定运算可以是相加减、取平均值等。关于沿着指定维度进行拼接,例如,将两个张量沿着某个维度拼接为一个新的张量,通常是沿着深度维度这个方向进行拼接。可以理解的是,可以包括但不限于:逐元素操作,沿着特定维度进行拼接等融合手段。
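A small sketch of these feature-level fusion options for two (C, H, W) feature maps; "concat" joins them along the channel (depth) dimension, while the element-wise modes apply a specified operation to corresponding elements:

```python
import numpy as np

def fuse_features(feat_t, feat_t1, mode="concat"):
    """Fuse feature maps of shape (C, H, W) extracted from two adjacent frames."""
    if mode == "concat":
        return np.concatenate([feat_t, feat_t1], axis=0)   # (2C, H, W)
    if mode == "add":
        return feat_t + feat_t1
    if mode == "mean":
        return (feat_t + feat_t1) / 2.0
    raise ValueError(f"unknown fusion mode: {mode}")
```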
在一个实施例中,针对每帧待检测数据不仅检测出目标检测结果,还预测出该目标对应的特征区域在下一帧的检测结果。例如,当第T帧上存在检测结果时,直接预测该检测结果对应的特征区域在T+1帧上的检测结果,从而可以将前后两帧的检测结果(T+1帧的检测结果是基于第T帧预测出的T+1帧检测结果)进行相关联,获取ID信息,所谓ID信息,是每个目标的一个全局的唯一识别码,即拥有相同ID的目标便是同一个目标,得到跟踪结果。如图4所示,针对第T帧的待检测数据(第T帧的待检测数据可能融合了T、T-1时刻的数据),同时预测了T帧目标A1、以及A1在T+1帧的A2,这样,A1和A2便有相同的ID。同时,对于第T+1帧数据,也预测出了相同ID的B1和B2物体(检测到B1,预测到B2)。接着,在第T+1帧确定A2和B1的对应关系,此时,可以通过一定的距离度量(如欧氏距离)来判断A2是否和B1是同一个目标。若判断出A2和B1为同一目标,则在T、T+1、T+2帧上实现了对目标的跟踪。以此类推,形成了时序上跟踪结果的预测。
例如,在第T帧不仅有检测结果,同样预测有T+1帧的结果。通过前几帧的情况,预测下一帧的结果。当前帧在下一帧多做了一帧或者几帧的预测。例如,在当前帧检测到目标在运动,结合前几帧的运动情况,可以预测出下一帧时目标的位置,同时输出当前帧结果和下一帧结果。
关于如何确定目标位置变化量,在一个示例中,所述相邻帧间目标位置变化量的确定过程可以包括:依据由前一帧或前几帧数据上确定的目标的速度以及前一帧与后一帧间的时间差,获得相邻帧间目标位置变化量。
该实施例可以预测目标和每个目标对应的速度。同样参见图4,第T帧的数据预测出目标A1以及A1的速度S,此时便可以通过计算得到A1在T+1帧的位置A2(预测的速度*(T+1与T的时间差)+T帧的位置)。同样,对于T+1的数据,也可以预测出B1以及B1的速度S2,并计算出T+2时刻的B2。然后通过一定的距离度量,如欧氏距离。距离低于某个阈值,例如,当两车距离小于0.5米时,则认为这两个车是同一辆车,最终可以得到完整的跟踪结果。
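A minimal sketch of this velocity-based prediction and Euclidean gating; the 0.5 m threshold follows the example above and is otherwise an assumption:

```python
import numpy as np

def predict_next_position(position_t, velocity_t, dt):
    """Predict where a target detected at frame T will be at frame T+1:
    predicted position = position at T + estimated velocity * time difference."""
    return np.asarray(position_t, dtype=np.float32) + np.asarray(velocity_t, dtype=np.float32) * dt

def is_same_target(predicted_pos, detected_pos, threshold=0.5):
    """Gate the association with a Euclidean distance threshold; below the
    threshold the two are treated as the same target."""
    diff = np.asarray(predicted_pos, dtype=np.float32) - np.asarray(detected_pos, dtype=np.float32)
    return float(np.linalg.norm(diff)) < threshold
```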
在获得目标检测信息后,所述根据所述相邻帧的目标检测信息以及所述目标预测信息,进行目标跟踪,可以包括:将目标预测信息与第二目标检测信息进行比较;若根据比较结果判定是同一目标,将第二目标检测信息中目标赋予与第一目标检测信息中目标相同的标识。该实施例通过将目标预测信息与第二目标检测信息进行比较,从而将两帧间同一目标进行关联,实现目标跟踪。
例如,可以采用预设条件来判断两者是否为同一目标。预设条件包括但不限于两者距离满足要求、两者为相同类别等。例如,T帧有检测结果、T+1帧有检测结果,将两帧数据进行关联,在T帧的第100、120个像素点找到目标、在T+1帧的第101、121个像素点找到目标,这两个目标的类别相同、位置相近,因此认为这两帧的目标是同一个目标。
在一个示例中,若根据比较结果判定不是同一目标,对所述第二目标检测信息中所述目标赋予新的标识。
在获得跟踪结果后,还可以进行数据后处理操作,例如,非极大值抑 制等操作。预测往往是稠密预测,重叠度很大。非极大值抑制去除掉重叠度非常高的框,去除冗余,提高运算效率。作为一种示例,同一架构输出的数据可以包括:位置(x,y,z)、类别、朝向、ID等信息。
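For illustration, a greedy non-maximum suppression step over bird's-eye-view boxes might look like the following sketch; it uses axis-aligned IoU purely to keep the example short, whereas the actual pipeline would evaluate overlap on the rotated 3D boxes:

```python
import numpy as np

def nms_bev(boxes, scores, iou_threshold=0.5):
    """Greedy NMS on axis-aligned BEV boxes (x1, y1, x2, y2). Returns indices of kept boxes."""
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    order = np.argsort(scores)[::-1]            # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # intersection of box i with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-6)
        order = rest[iou <= iou_threshold]      # drop highly overlapping, lower-scored boxes
    return keep
```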
为了方便理解,提供一种能同时解决目标检测和目标跟踪问题的框架,如图5所示,是本申请根据一示例性实施例示出的目标检测与跟踪的框架示意图。左边部分表示前一帧(第T帧)目标检测的处理流程,右边部分表示后一帧(也可称为当前帧,第T+1帧)目标检测的处理流程,中间部分表示后一帧目标跟踪的处理流程。以下对目标跟踪进行示例说明:
获取前一帧目标检测CNN中间层的特征feature0(维度c1xHxW)和后一帧目标检测CNN中间层的特征feature1(维度c1xHxW)。将feature0和feature1做关联(correlation),得到结果(维度为c2xHxW),再和feature0和feature1拼成一个张量得到融合后的特征fused_feature(维度为(2*c1+c2)xHxW)。correlation为获取相邻帧时序变化信息的一种方式,具体公式如下:
c(i, j, p, q) = ⟨x_t(i, j), x_{t+τ}(i+p, j+q)⟩，其中 -d ≤ p, q ≤ d。假设输入特征为 x_t 和 x_{t+τ}，维度都为 (c×H×W)，输出维度是 ((2d+1)^2×H×W)，x_t(i, j)、x_{t+τ}(i+p, j+q) 表示 c×1 维向量，⟨·,·⟩ 表示两个向量的内积。
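A plain NumPy sketch of this correlation operation (loop-based for clarity; a real implementation would be vectorized or run on the GPU):

```python
import numpy as np

def correlation(feat_t, feat_t1, d=3):
    """Correlation over two feature maps of shape (C, H, W): for each location
    (i, j) and each offset (p, q) with -d <= p, q <= d, take the inner product of
    the C-dimensional vectors. Output shape is ((2d + 1)**2, H, W)."""
    c, h, w = feat_t.shape
    out = np.zeros(((2 * d + 1) ** 2, h, w), dtype=np.float32)
    # zero-pad the second feature map so that shifted lookups stay in bounds
    padded = np.pad(feat_t1, ((0, 0), (d, d), (d, d)))
    k = 0
    for p in range(-d, d + 1):
        for q in range(-d, d + 1):
            shifted = padded[:, d + p:d + p + h, d + q:d + q + w]
            out[k] = np.sum(feat_t * shifted, axis=0)   # inner product over channels
            k += 1
    return out
```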
目标跟踪CNN根据输入fused_feature预测两帧之间目标位置的变化量。
针对目标跟踪后处理:
Box转换:首先根据前一帧目标检测的结果box0和目标跟踪网络预测的目标位置变化量△box相加得到后一帧目标可能的位置box1,box1=box0+△box,这样,box0和box1是一一对应的,对于每个box0预测到一个box1。
数据关联:对于同一个目标,目标跟踪网络在前一帧预测的box1和目标检测网络在后一帧实际检测的box2一般是非常接近的,这也是目标跟踪网络期望预测的结果。所以可以根据目标跟踪网络预测的box1和实际后一 帧目标检测网络的结果box2的距离判断哪两个目标是同一个目标,即可完成目标的关联。可以理解的是,在判断两个目标是否是同一个目标时,还可以比较两个目标是为相同类别等,在此不一一赘述。
获取跟踪ID:box1和box2建立关联关系后,可以确定box2和box0的对应关系,因此,可以把box0的ID复制给对应的box2完成跟踪ID的获取。如果是目标首次被检测到,也就是上一帧没有与之对应的box0,则要给此box2赋值一个新的ID,从而实现目标跟踪。
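A hedged sketch of this tracking post-processing, using BEV box centers as a stand-in for the full boxes; the distance threshold is an assumed value:

```python
import numpy as np

def assign_track_ids(prev_boxes, prev_ids, delta_boxes, curr_boxes,
                     next_id, dist_threshold=0.5):
    """prev_boxes / curr_boxes are (N, 2) / (M, 2) BEV centers for the previous and
    current frame, delta_boxes is the displacement predicted by the tracking network,
    prev_ids holds the IDs of the previous frame. Each current detection inherits the
    ID of the closest predicted box if it is near enough, otherwise it gets a new ID."""
    prev_boxes = np.asarray(prev_boxes, dtype=np.float32)
    delta_boxes = np.asarray(delta_boxes, dtype=np.float32)
    predicted = prev_boxes + delta_boxes          # box1 = box0 + delta_box
    curr_ids = []
    for det in np.asarray(curr_boxes, dtype=np.float32):   # box2: detections at T+1
        if len(predicted) > 0:
            dists = np.linalg.norm(predicted - det, axis=1)
            j = int(np.argmin(dists))
            if dists[j] < dist_threshold:
                curr_ids.append(prev_ids[j])      # same target: copy the ID of box0
                continue
        curr_ids.append(next_id)                  # first appearance: assign a new ID
        next_id += 1
    return curr_ids, next_id
```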
在一个示例中,还可以对tracklet进行维护。对于某个目标在多帧之间跟踪的结果,形成一个tracklet,一个tracklet为同一个目标在多帧数据里目标检测的box和class score构成的一个序列,将数据关联步骤中某个检查到的目标box2和上一帧的某个目标box0是同一个目标,那么box0和box2将被存到一个tracklet中。
在一个示例中,可以预先设对m帧不同采集时间采集的数据进行融合,可以增加判断获取的帧数是否是m帧。如果只获取了m-1帧,系统也可以根据这m-1帧获取检测和跟踪结果。
相关技术中,目标检测算法只用单帧数据进行检测而未利用时序信息,造成检测结果会有很大噪声,导致用户无法正确的区分物体。而实际场景中,比如单看单帧激光点云数据无法区分哪里有没有车,但是如果观察动态的视频数据,就可以明显从动态视频中找出哪里有车。为此,本申请实施例利用相邻帧待检测数据中的有时序信息辅助目标检测,会使目标检测结果会更加稳定可靠。
在一个示例中,还可以利用目标跟踪时序信息对目标检测进行辅助。从时间的连续性考虑,如果某一个物体在前几帧都有检测到,那么时间不会突变,目标也不会突然消失,所以该目标在之前的位置应该更容易被检测到。对于一个tracklet从记录开始起累计跟踪帧数N,和累计分数(累计class score)SUM_SCORE。对于一个检测结果detection box and class score,其目标检查结果class score可以进行修正:
class score*:=class score+α*SUM_SCORE/N
class score*是修正后的分数，class score是修正前的分数。如果这个box和某tracklet关联（即和这个tracklet中上一帧目标检测结果关联），那么这个tracklet的累计跟踪帧数N加1，累计分数SUM_SCORE加上新box的class score*；否则存入新的tracklet，N和SUM_SCORE做相同操作。
该操作后就能把目标检测的分数根据目标跟踪的结果进行修正，结合了时序信息后结果会更加稳定。
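A one-function sketch of this score correction; the value of α is an assumption:

```python
def corrected_score(class_score, tracklet_sum_score, tracklet_length, alpha=0.1):
    """Score correction described above: class_score* = class_score + alpha *
    SUM_SCORE / N, where SUM_SCORE and N are the accumulated score and frame
    count of the associated tracklet."""
    if tracklet_length == 0:            # no associated tracklet yet
        return class_score
    return class_score + alpha * tracklet_sum_score / tracklet_length
```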
目前目标检测算法和目标跟踪算法仅针对单一数据源,例如图像或者激光点云。以激光点云为输入的基于深度学习的三维目标检测算法主要解决的问题是,给定某一小段时间累积的激光点云,求得激光点云扫到的物体的三维位置,尺寸,朝向,类别等信息,为自动驾驶车辆提供周围的感知信息。而发明人发现,结合多源数据可以优劣互补,达到更好的鲁棒性。例如,考虑到激光点云中激光点较为稀疏,但记录了准确的三维信息,而图像数据较为稠密包含更多的语义信息,但是缺少准确的三维信息,因此,探测装置包括采集不同类采集数据的探测装置,相邻帧待检测数据基于不同类探测装置采集的多源采集数据获得,将多源数据进行融合可以达到更好的鲁棒性。
本实施例中,探测装置可以有多个。多个探测装置采集的数据可以作为源输入数据。其中,多个探测装置可以是多个不同类探测装置。不同类探测装置是采集不同类数据的探测装置。如,多个不同类探测装置可以包括:激光雷达探测装置、图像采集设备和毫米波雷达探测装置中至少两个的组合,各组合中各探测装置的数量不定。针对配置有多个不同类探测装置的实施例中,每次工作时,可以从不同类探测装置中择一进行数据采集,也可以利用不同类探测装置同时采集不同类的数据,后续将不同类数据进行融合。
还需要说明的是,当多个探测装置采集数据时,若有探测装置失败则退出数据采集,利用剩余探测装置采集数据,从而保证源输入数据的有效性。例如,当多个探测装置中任意两个均为不同类的探测装置时,若其中部分探测装置失效,则失效的探测装置退出数据采集或者其数据丢弃,将 剩余各探测装置采集的数据作为源输入数据,从而保证源输入数据的有效性。可理解的是,由于部分探测装置退出,源输入数据的种类会相应地减少,利用这些源输入数据同样可以计算出相应的检测结果或者跟踪结果,与未退出部分探测装置之前,该检测结果或者跟踪结果的准确度会适当降低,但不影响正常使用。换言之,本实施例中通过获取多个探测装置的数据可以提高计算结果的鲁棒性。
关于多源数据的融合,可以包括三个阶段中的一个或多个阶段的融合:数据预处理阶段的多源数据融合、提取特征过程中的多源数据融合、以及提取特征后的多源数据融合。为了方便理解,如图6所示,是本申请根据一示例性实施例示出的另一种目标检测和跟踪框架示意图。该示意图以探测装置包括激光探测装置(探测装置1)、图像采集装置(探测装置2)以及毫米波雷达探测装置(探测装置3)为例,实际上本架构可以支持各类探测装置。检测到的多源数据包括激光点云、图像和毫米波雷达点云。该示意图中,不同探测装置采集的数据在数据预处理阶段、CNN提特征阶段、以及提取特征后,都可以进行多源数据融合。该示意图还以探测装置1为例,在数据预处理阶段进行第T帧和第T+1帧的同类数据融合处理。可见,该实施例可以在单帧(相同采集时间的数据)、多帧(不同采集时间的数据)阶段都会发生融合。在数据预处理阶段进行信息交互时、CNN提取特征阶段进行信息交互时、CNN提取特征阶段后,都发生融合。实现结合更多的数据,以达到优劣互补,鲁棒性更好。作为一种示例,根据多探测装置融合的结果,可以输出目标的位置(xyz)、类别、朝向、ID(应用在跟踪)。在时序融合后,输出检测结果。
本申请可以在基于点云或图像的三维目标检测算法基础上进行改进,进行多传感器数据的融合,并将时序信息进行融合,把目标跟踪问题和目标检测问题整合到一个框架下解决,同时解决目标检测和目标跟踪问题,同时,为了保证系统层面上更好的鲁棒性和稳定性,本发明同时支持单个传感器数据进行相应的检测与跟踪,得到最终的感知结果。
接下来对不同阶段的融合处理进行示例说明。
针对数据预处理阶段,在一个例子中,针对所述相邻帧待检测数据中相同采集时间的待检测数据,采用以下方式获得:
基于不同类探测装置采集的不同类采集数据分别进行预处理;或,
基于不同类探测装置采集的不同类采集数据分别进行预处理,并将预处理后的数据进行多源数据融合处理。
在数据预处理阶段,由于可以配置有多个不同类的探测装置,则可以将不同类探测装置中部分探测装置采集的数据与其他探测装置采集的数据融合,而部分探测装置采集的数据不进行融合,又或者,所有探测装置采集的数据都进行多源数据融合,或者所有探测装置采集的数据都不进行多源数据融合,具体可以根据需求配置。
相应的,针对不同探测装置中第一指定探测装置,将本探测装置的采集数据进行预处理,获得待检测数据。针对不同探测装置中第二指定探测装置,将对本探测装置采集数据进行预处理后的数据,与其他指定探测装置采集数据进行预处理后的数据进行多源数据融合处理。
其中,第一指定探测装置可以是预先指定的不需要进行多源数据融合的探测装置,可以有一个或多个。第二指定探测装置可以是预先指定的需要进行多源数据融合的探测装置,与其进行多源数据融合的其他探测装置也可以预先指定,第二指定探测装置可以有一个或多个,具体如何指定可以根据探测装置采集的数据是否有缺陷以及具体应用场景决定。
关于数据预处理阶段的多源数据融合,在一示例中,以激光点云和图像数据组合为例,对激光点云进行预处理时,激光点云可以为n*4向量,各激光点包括x坐标信息、y坐标信息、z坐标信息和反射强度(intensity)。图像数据是3通道的RGB值,尺寸为H*W。本示例中,先标定激光点云和图像数据,然后,根据标定后的激光点云和图像数据,可以确定出各激光点对应的图像坐标,其中图像数据包括3通道的RGB值,尺寸为H*W。之后,根据投影关系,从图像上找到激光点所对应的像素,提取出对应的RGB颜色信息,从而可以将图像数据中的RGB颜色信息融合到激光点,即激光点从4维(x,y,z,intensity)扩展为7维(x,y,z,intensity,r, g,b),换言之,本次融合实现对了激光点云进行着色的效果。如图7所示,示意出数据预处理阶段可以进行多源数据融合。
其中,关于标定,以对激光点云跟图像进行融合为例,得知两者的对应关系。例如,获取到二维图像,能够确定真实的三维坐标跟图像坐标的对应关系。
同理,对于图像上的每个像素,也同样可以对应到三维激光点(即将激光点云投影到图像上),这样图像数据可以从3维(r,g,b)扩展到7维(r,g,b,x,y,z,intensity)。数据融合的目的主要是将不同类型的数据进行互补。例如,激光点云记录了准确的三维信息,但是现有的激光雷达所采集到的激光点云较为稀疏。与之相对的是,图像数据更为稠密,包含更多语义信息,但是由于投影原理,图像缺少准确的三维信息,对于小孔成像模型而言,仅包含近大远小一类的粗略三维信息,为此,结合更多的数据,以达到优劣互补,鲁棒性更好。
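A sketch of this data-level fusion step that colours each laser point with the RGB value of the pixel it projects to; it assumes a pinhole camera with intrinsics K and lidar-to-camera extrinsics (R, t), which are standing in for the actual calibration:

```python
import numpy as np

def colorize_points(points, image, K, R, t):
    """Append RGB from the image to each lidar point, turning (N, 4) points
    [x, y, z, intensity] into (N, 7) points [x, y, z, intensity, r, g, b].
    Points that project outside the image keep RGB = 0."""
    points = np.asarray(points, dtype=np.float32)
    h, w, _ = image.shape
    cam = points[:, :3] @ R.T + t                 # lidar frame -> camera frame
    uvw = cam @ K.T                               # pinhole projection (homogeneous)
    z = uvw[:, 2]
    valid = z > 1e-3                              # only points in front of the camera
    u = np.zeros(len(points), dtype=np.int64)
    v = np.zeros(len(points), dtype=np.int64)
    u[valid] = (uvw[valid, 0] / z[valid]).astype(np.int64)
    v[valid] = (uvw[valid, 1] / z[valid]).astype(np.int64)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)

    rgb = np.zeros((len(points), 3), dtype=np.float32)
    rgb[valid] = image[v[valid], u[valid], :3]    # sample the colour at the projected pixel
    return np.concatenate([points, rgb], axis=1)
```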
另外,数据预处理阶段的多源数据融合和时序融合还可以同时进行。例如,将基于不同类探测装置采集的不同类采集数据分别进行预处理,并将预处理后的数据进行多源数据融合处理,获得多源数据融合处理后的数据,再将不同采样时间对应的多源数据融合处理后的数据进行同类数据融合,获得待检测数据。
针对提取特征过程中的多源数据融合、以及提取特征后的多源数据融合,可以认为是特征层面的多源数据融合。在一个实施例中,所述相邻帧的目标检测信息及目标预测信息基于从所述相邻帧待检测数据中提取的特征数据获得,提取特征数据的过程包括以下一种或多种:
在指定网络层提取特征后,将从不同类数据中提取的特征进行多源数据融合处理,并将融合处理后的数据作为下一网络层的输入数据;
将不同网络层提取的特征进行同类数据融合处理;
将与不同类数据对应的最后一层网络层输出的特征数据进行多源数据融合处理。
不同网络层提取的特征一般侧重点不同。特征提取可以通过神经网络 实现,特别是卷积神经网络。一般而言,网络越深,可能会更偏语义特征。指定网络层可以根据需求配置,例如,指定网络层可以是靠近输入层的网络层,在靠近输入层进行多源数据融合,融合的可以是局部细节特征。指定网络层也可以是靠近输出层的网络层,在靠近输出层进行多源数据融合,融合的可以是全局特征等。
如图7所示,示意出特征提取阶段可以进行多源数据融合,多源数据的对应网络层进行融合。可以理解的是,可以针对每个网络层都进行多源数据融合,也可以指定部分网络层进行多源数据融合,不同层提取的特征进行融合,可以融合不同侧重点的特征数据,结合更多的数据,达到优劣互补,提高鲁棒性。
对于各类探测装置采集的数据,最后还可以有融合模块进行数据的融合与交互。融合的方式包括但不限于,逐元素操作(相加减、取平均值等,将特征向量每个值对应运算),沿着特定维度进行拼接(将两个张量沿着某个维度拼接为一个新的张量。例如,将激光雷达、图像传感器输出的特征,沿着任何一个维度进行拼接,通常是沿着深度维度这个方向进行拼接)。需要注意的是,当对不同类探测装置采集的数据的特征进行融合时,还可以考虑到他们之间的物理位置关系(即数据融合时考虑的投影对应关系)。
在一个示例中,同类数据的不同网络层也可以进行融合,实现将不同网络层提取的特征进行同类数据融合处理。例如,神经网络是多层神经网络,假设有100层网络层,如果第50层网络层提取特征1,第100层网络层提取到特征2,可以将特征1和特征2进行融合,通过将不同特征层提取的特征进行融合,可以提高鲁棒性。
目前的目标跟踪算法往往是和目标检测分离的,目标检测只负责目标检测,目标跟踪只负责目标跟踪,但是实际上这两个问题是有很大关联的,首先输入是相同的所以可以提取到的特征也是相似的甚至是相同的,用不同两套方法分别做目标检测和跟踪会造成资源浪费,因为相似的特征是可以共用的,本申请实施例共用这些特征可以减小重复计算。目前目标检测算法只用单帧数据进行检测,无法利用时序信息,会造成检测结果会有很 大噪声,有时序信息辅助的目标检测结果或更加稳定可靠。
相应地,请参阅图8,本申请实施例还提供了一种目标检测与跟踪系统,目标检测与跟踪系统800可以包括:存储器82和处理器84;所述存储器82通过通信总线和所述处理器84连接,用于存储所述处理器84可执行的计算机指令;所述处理器84用于从所述存储器82读取计算机指令以实现上述任一项所述的目标检测与跟踪方法。例如,当计算机指令被执行时,用于执行以下操作:
获取相邻帧待检测数据;
根据所述相邻帧待检测数据,生成相应的相邻帧的目标检测信息及目标预测信息,其中,根据所述相邻帧中的前一帧的目标检测信息在所述相邻帧中的前一帧对后一帧进行目标预测得到,确定所述目标预测信息;
根据所述相邻帧的目标检测信息以及所述目标预测信息,进行目标跟踪。
所述处理器84执行所述存储器82中包括的程序代码,所述处理器84可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器84可以是微处理器或者该处理器84也可以是任何常规的处理器等。
所述存储器82存储所述的目标检测和跟踪方法的程序代码,所述存储器82可以包括至少一种类型的存储介质,存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘等等。而且,目标检测与跟踪系统可以与通过网络连接执行存储器82的存储功能的网络存储装置协作。存储器82可以是目标检测与跟踪系统的内部存储单元,例如目标检测与跟踪系统的硬盘或内存。存储器82也可以 是目标检测与跟踪系统的外部存储设备,例如目标检测与跟踪系统上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,存储器82还可以既包括目标检测与跟踪系统的内部存储单元也包括外部存储设备。存储器82用于存储计算机程序代码以及目标检测与跟踪系统所需的其他程序和数据。存储器82还可以用于暂时地存储已经输出或者将要输出的数据。
这里描述的各种实施方式可以使用例如计算机软件、硬件或其任何组合的计算机可读介质来实施。对于硬件实施,这里描述的实施方式可以通过使用特定用途集成电路(ASIC)、数字信号处理器102(DSP)、数字信号处理装置(DSPD)、可编程逻辑装置(PLD)、现场可编程门阵列(FPGA)、处理器、控制器、微控制器、微处理器、被设计为执行这里描述的功能的电子单元中的至少一种来实施。对于软件实施,诸如过程或功能的实施方式可以与允许执行至少一种功能或操作的单独的软件模块来实施。软件代码可以由以任何适当的编程语言编写的软件应用程序(或程序)来实施,软件代码可以存储在存储器中并且由控制器执行。
相应的,请参阅图9,本申请实施例还提供了一种可移动平台900,包括:
机体92;
动力系统94,安装在所述机体92内,用于为所述可移动平台提供动力;以及,
如上述所述的目标检测与跟踪系统800。
本领域技术人员可以理解,图9仅仅是可移动平台的示例,并不构成对可移动平台的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如可移动平台还可以包括输入输出设备、网络接入设备等。
在一个示例中,所述可移动平台包括无人车、无人机或无人船。
相应的,本申请实施例还提供了一种探测装置,包括:
壳体;
探测器,设于所述壳体,用于采集数据;
以及,如上述任一项所述的目标检测与跟踪系统。
本领域技术人员可以理解,该实施例仅仅是探测装置的示例,并不构成对探测装置的限定,可以包括比上述更多或更少的部件,或者组合某些部件,或者不同的部件等。
相应地,本实施例还提供了一种计算机可读存储介质,所述可读存储介质上存储有若干计算机指令,所述计算机指令被执行时实现上述任一项所述方法的步骤。
对于装置实施例而言,由于其基本对应于方法实施例,所以相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
以上对本发明实施例所提供的方法和装置进行了详细介绍,本文中应用了具体个例对本发明的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本发明的方法及其核心思想;同时,对于本领域的一般技术人员,依据本发明的思想,在具体实施方式及应用范围上均会有改变 之处,综上所述,本说明书内容不应理解为对本发明的限制。

Claims (21)

  1. 一种目标检测与跟踪方法,其特征在于,所述方法包括:
    获取相邻帧待检测数据;
    根据所述相邻帧待检测数据,生成相应的相邻帧的目标检测信息及目标预测信息,其中,根据所述相邻帧中的前一帧的目标检测信息对后一帧进行目标预测,确定所述目标预测信息;
    根据所述相邻帧的目标检测信息以及所述目标预测信息,进行目标跟踪。
  2. 根据权利要求1所述的方法,其特征在于,所述相邻帧待检测数据至少包括两个采集时间的待检测数据,所述相邻帧的目标检测信息包括前一帧的第一目标检测信息和后一帧的第二目标检测信息;
    基于所述第一目标检测信息和相邻帧间目标位置变化量确定所述目标预测信息。
  3. 根据权利要求2所述的方法,其特征在于,所述相邻帧间目标位置变化量的确定过程包括:
    依据从相邻帧待检测数据中提取的特征数据预测相邻帧间目标位置变化量;所述相邻帧待检测数据中每帧待检测数据基于:同一探测装置在相邻采集时间采集的多帧采集数据进行同类数据融合处理和预处理获得;或,将分别从相邻帧待检测数据中提取的特征数据进行特征融合,并依据融合后的特征数据预测相邻帧间目标位置变化量;所述相邻帧待检测数据中每帧待检测数据基于:同一探测装置在一个采集时间采集的采集数据进行预处理获得,或者,同一探测装置在相邻采集时间采集的多帧采集数据进行同类数据融合处理和预处理获得。
  4. 根据权利要求3所述的方法,其特征在于,所述相邻帧采集数据中每帧采集数据包括探测装置与目标的距离信息;在相邻帧采集数据中,后一帧作为基准数据,其他帧作为待校准数据;
    所述同类数据融合处理的过程,包括:
    由探测装置的移动速度、待校准数据和基准数据间的时间差确定装置运动位移,并利用所述装置运动位移修正所述待校准数据中的距离信息;
    将包含修正后距离信息的其他帧采集数据,与后一帧采集数据进行同类数据融合处理。
  5. 根据权利要求3所述的方法,其特征在于,进行同类数据融合处理的数据的帧数与所述探测装置和目标的距离呈正相关关系。
  6. 根据权利要求3所述的方法,其特征在于,所述将分别从相邻帧待检测数据中提取的特征数据进行特征融合,包括:
    将分别从相邻帧待检测数据中提取的特征数据中对应元素的数值进行指定运算;或,
    将分别从相邻帧待检测数据中提取的特征数据中对应元素沿着指定维度进行拼接。
  7. 根据权利要求2所述的方法,其特征在于,所述相邻帧间目标位置变化量的确定过程包括:依据由前一帧或前几帧待检测数据确定的目标的速度以及前一帧与后一帧间的时间差,获得相邻帧间目标位置变化量。
  8. 根据权利要求1或2所述的方法,其特征在于,所述探测装置包括采集不同类采集数据的探测装置;和/或,所述相邻帧待检测数据基于探测装置采集的相邻帧采集数据获得。
  9. 根据权利要求8所述的方法,其特征在于,针对所述相邻帧待检测数据中相同采集时间的待检测数据,采用以下方式获得:
    基于不同类探测装置采集的不同类采集数据分别进行预处理;或,
    基于不同类探测装置采集的不同类采集数据分别进行预处理,并将预处理后的数据进行多源数据融合处理。
  10. 根据权利要求8所述的方法,其特征在于,所述相邻帧的目标检测信息及目标预测信息基于从所述相邻帧待检测数据中提取的特征数据获得,提取特征数据的过程包括以下一种或多种:
    在指定网络层提取特征后,将从不同类数据中提取的特征进行多源数据融合处理,并将融合处理后的数据作为下一网络层的输入数据;
    将不同网络层提取的特征进行同类数据融合处理;
    将与不同类数据对应的最后一层网络层输出的特征数据进行多源数据融合处理。
  11. 根据权利要求9或10所述的方法,其特征在于,所述多源数据融合处理的过程包括:将不同类数据对应元素进行拼接。
  12. 根据权利要求1所述的方法,其特征在于,针对采集同一类数据的探测装置配置有主探测装置和备用探测装置,主探测装置失效时,利用所述备用探测装置替换失效的主探测装置以进行数据采集。
  13. 根据权利要求1所述的方法,其特征在于,所述探测装置包括以下一种或多种:图像采集装置、激光雷达探测装置、毫米波雷达探测装置。
  14. 根据权利要求1所述的方法,其特征在于,所述相邻帧的目标检测信息包括前一帧的第一目标检测信息和后一帧的第二目标检测信息。
  15. 根据权利要求14所述的方法,其特征在于,所述根据所述相邻帧的目标检测信息以及所述目标预测信息,进行目标跟踪,包括:
    将所述目标预测信息与所述第二目标检测信息进行比较;
    若根据比较结果判定是同一目标,将所述第二目标检测信息中所述目标赋予与第一目标检测信息中所述目标相同的标识。
  16. 根据权利要求15所述的方法,其特征在于,所述方法还包括:
    若根据比较结果判定不是同一目标,对所述第二目标检测信息中所述目标赋予新的标识。
  17. 一种目标检测与跟踪系统,其特征在于,包括:
    存储器和处理器;所述存储器通过通信总线和所述处理器连接,用于存储所述处理器可执行的计算机指令;所述处理器用于从所述存储器读取计算机指令以实现权利要求1至16任一项所述的目标检测与跟踪方法。
  18. 一种可移动平台,其特征在于,包括:
    机体;
    动力系统,安装在所述机体内,用于为所述可移动平台提供动力;以及,
    如权利要求17所述的目标检测与跟踪系统。
  19. 根据权利要求18所述的可移动平台,其特征在于,所述可移动平台包括无人车、无人机或无人船。
  20. 一种探测装置,其特征在于,包括:
    壳体;
    探测器,设于所述壳体,用于采集数据;
    以及,如权利要求17所述的目标检测与跟踪系统。
  21. 一种计算机可读存储介质,其特征在于,所述可读存储介质上存储有若干计算机指令,所述计算机指令被执行时实现权利要求1至16任一项所述方法的步骤。
PCT/CN2019/111628 2019-10-17 2019-10-17 目标检测与跟踪方法、系统、可移动平台、相机及介质 WO2021072696A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980033189.5A CN112154444B (zh) 2019-10-17 2019-10-17 目标检测与跟踪方法、系统、可移动平台、相机及介质
PCT/CN2019/111628 WO2021072696A1 (zh) 2019-10-17 2019-10-17 目标检测与跟踪方法、系统、可移动平台、相机及介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/111628 WO2021072696A1 (zh) 2019-10-17 2019-10-17 目标检测与跟踪方法、系统、可移动平台、相机及介质

Publications (1)

Publication Number Publication Date
WO2021072696A1 true WO2021072696A1 (zh) 2021-04-22

Family

ID=73891473

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/111628 WO2021072696A1 (zh) 2019-10-17 2019-10-17 目标检测与跟踪方法、系统、可移动平台、相机及介质

Country Status (2)

Country Link
CN (1) CN112154444B (zh)
WO (1) WO2021072696A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102575735B1 (ko) * 2021-02-25 2023-09-08 현대자동차주식회사 라이다 표적신호 선별 장치, 그를 포함하는 라이다 시스템 및 그 방법
CN113177931A (zh) * 2021-05-19 2021-07-27 北京明略软件系统有限公司 一种关键部件的检测追踪方法以及装置
CN113253735B (zh) * 2021-06-15 2021-10-08 同方威视技术股份有限公司 跟随目标的方法、装置、机器人及计算机可读存储介质
CN114155720B (zh) * 2021-11-29 2022-12-13 上海交通大学 一种路侧激光雷达的车辆检测和轨迹预测方法
CN114187328B (zh) * 2022-02-15 2022-07-05 智道网联科技(北京)有限公司 一种物体检测方法、装置和电子设备
CN114782496A (zh) * 2022-06-20 2022-07-22 杭州闪马智擎科技有限公司 一种对象的跟踪方法、装置、存储介质及电子装置
CN117827012B (zh) * 2024-03-04 2024-05-07 北京国星创图科技有限公司 一种3d沙盘实时视角跟踪系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107478220B (zh) * 2017-07-26 2021-01-15 中国科学院深圳先进技术研究院 无人机室内导航方法、装置、无人机及存储介质
CN108803622B (zh) * 2018-07-27 2021-10-26 吉利汽车研究院(宁波)有限公司 一种用于对目标探测数据进行处理的方法、装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292911A (zh) * 2017-05-23 2017-10-24 南京邮电大学 一种基于多模型融合和数据关联的多目标跟踪方法
US20190114804A1 (en) * 2017-10-13 2019-04-18 Qualcomm Incorporated Object tracking for neural network systems
WO2019127227A1 (en) * 2017-12-28 2019-07-04 Intel Corporation Vehicle sensor fusion
CN109635657A (zh) * 2018-11-12 2019-04-16 平安科技(深圳)有限公司 目标跟踪方法、装置、设备及存储介质
CN109532719A (zh) * 2018-11-23 2019-03-29 中汽研(天津)汽车工程研究院有限公司 一种基于多传感器信息融合的电动汽车
CN109829386A (zh) * 2019-01-04 2019-05-31 清华大学 基于多源信息融合的智能车辆可通行区域检测方法
CN110009659A (zh) * 2019-04-12 2019-07-12 武汉大学 基于多目标运动跟踪的人物视频片段提取方法

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113380039A (zh) * 2021-07-06 2021-09-10 联想(北京)有限公司 数据处理方法、装置和电子设备
CN113538516A (zh) * 2021-07-19 2021-10-22 中国兵器工业计算机应用技术研究所 基于记忆信息的目标对象跟踪方法、装置及电子设备
CN113538516B (zh) * 2021-07-19 2024-04-16 中国兵器工业计算机应用技术研究所 基于记忆信息的目标对象跟踪方法、装置及电子设备
CN113901909B (zh) * 2021-09-30 2023-10-27 北京百度网讯科技有限公司 基于视频的目标检测方法、装置、电子设备和存储介质
CN113901909A (zh) * 2021-09-30 2022-01-07 北京百度网讯科技有限公司 基于视频的目标检测方法、装置、电子设备和存储介质
CN114063079A (zh) * 2021-10-12 2022-02-18 福瑞泰克智能系统有限公司 目标置信度获取方法、装置、雷达系统和电子装置
CN114063079B (zh) * 2021-10-12 2022-06-21 福瑞泰克智能系统有限公司 目标置信度获取方法、装置、雷达系统和电子装置
CN114067353B (zh) * 2021-10-12 2024-04-02 北京控制与电子技术研究所 一种采用多功能加固处理机实现多源数据融合的方法
CN114067353A (zh) * 2021-10-12 2022-02-18 北京控制与电子技术研究所 一种采用多功能加固处理机实现多源数据融合的方法
CN114663596A (zh) * 2022-04-03 2022-06-24 西北工业大学 基于无人机实时仿地飞行方法的大场景建图方法
CN114663596B (zh) * 2022-04-03 2024-02-23 西北工业大学 基于无人机实时仿地飞行方法的大场景建图方法
CN116012949A (zh) * 2023-02-06 2023-04-25 南京智蓝芯联信息科技有限公司 一种复杂场景下的人流量统计识别方法及系统
CN116012949B (zh) * 2023-02-06 2023-11-17 南京智蓝芯联信息科技有限公司 一种复杂场景下的人流量统计识别方法及系统
CN116363163A (zh) * 2023-03-07 2023-06-30 华中科技大学 基于事件相机的空间目标检测跟踪方法、系统及存储介质
CN116363163B (zh) * 2023-03-07 2023-11-14 华中科技大学 基于事件相机的空间目标检测跟踪方法、系统及存储介质

Also Published As

Publication number Publication date
CN112154444A (zh) 2020-12-29
CN112154444B (zh) 2021-12-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19949021

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19949021

Country of ref document: EP

Kind code of ref document: A1