WO2021170030A1 - A method, device, and system for target tracking - Google Patents
A method, device, and system for target tracking
- Publication number: WO2021170030A1 (international application PCT/CN2021/077845)
- Authority: WO (WIPO, PCT)
- Prior art keywords: target, trajectory, neural network, tracking, sensors
Classifications
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/04 — Neural networks; architecture, e.g. interconnection topology
- G06N3/045 — Combinations of networks
- G06N3/049 — Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08 — Learning methods
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/33 — Determination of transform parameters for the alignment of images (image registration) using feature-based methods
- G06T2207/10016 — Video; image sequence
- G06T2207/10024 — Color image
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30196 — Human being; person
- G06T2207/30232 — Surveillance
- G06T2207/30241 — Trajectory
Definitions
- This application relates to the field of intelligent security technology, and in particular to a method, device, and system for target tracking. More specifically, it is applicable to switching the tracking target after the tracked target (such as a suspicious person) changes to a vehicle (for example, the tracking target is switched to a car).
- the aforementioned surveillance camera is also one of the main tools for tracking suspicious persons and assisting in the detection of cases.
- the current automatic target tracking methods mainly include: (1) single-camera target tracking: pedestrians or vehicles are tracked within the same camera, and when the target disappears (for example, due to occlusion), the target is re-tracked in the same shot through the person re-identification (Person Re-identification, Person Re-ID, referred to as Re-ID) algorithm; (2) cross-camera target tracking: when the target leaves the shooting range of the current camera, the target is identified in another camera through the Re-ID algorithm and tracked again. Therefore, current automatic tracking methods are limited to tracking the same target in each shot, which makes the tracking process discontinuous, that is, the target leaves the monitor's field of view for certain periods of time. In addition, the environment of cross-camera scenes (lighting, occlusion, target pose, etc.) is very complicated, and the accuracy and computational efficiency of the Re-ID algorithm still need to be considered.
- cross-target tracking refers to the switching of the tracked target during the monitoring process (for example, a suspicious person serving as the initial tracking target changes their means of transportation), after which the target after switching needs to be tracked.
- at present, this kind of cross-target tracking scenario is mainly handled by monitoring personnel manually switching the tracked target. For example, after a suspicious person gets on a bus, the monitoring personnel switch the tracked target from the person to the vehicle. This manual approach cannot guarantee the real-time performance of target tracking and occupies a lot of human resources.
- the present application provides a method, device, and system for target tracking, which can switch and continuously track the tracking target after the tracking target changes to a vehicle (i.e., target switching occurs), enabling the monitoring personnel to fully grasp the whereabouts of the tracking target and greatly improving the efficiency of target tracking.
- the first aspect of the present application provides a method applied to target tracking.
- the method includes: acquiring, by a sensor, the motion trajectories of the targets included in the scene where the first target is located, where the targets included in the scene where the first target is located include the first target and at least one other target other than the first target, and the first target is the initial tracking target; then determining a second target according to the motion trajectories of the first target and the at least one other target, and taking the second target as the new tracking target.
- the trajectory of the target in a period of time is constituted by the position of the target at each moment in the period of time.
- the location of the target can be represented by coordinates, and the coordinates may be in the east-north-up (ENU) coordinate system or in the north-east-down (NED) coordinate system. The embodiments of the present application do not specifically limit the coordinate system type.
- the video surveillance system automatically determines the updated tracking target, ensuring the continuity of the tracking target's trajectory, so that the tracking target's position information is not missed in any time segment, which improves tracking efficiency.
- the above method uses trajectory data to determine whether the original tracking target has a switching behavior. Compared with the method of directly using video data for intelligent behavior analysis, the amount of calculation is reduced, and the requirement for computing power is reduced.
- the "scene where the first target is located" in the foregoing method refers to the real scene where the first target is located.
- the range of the scene may be an area centered on the first target and a radius of 100 meters.
- the embodiment of this application does not limit the specific scope of the scene, which depends on the specific situation. It should be noted that the scene where the first target is located is always changing with the movement of the first target. When the tracking of the first target is started, the trajectory of the first target and other surrounding targets is started to be acquired.
- determining the second target according to the motion trajectory of the first target and the motion trajectory of the at least one other target includes: determining a set of candidate targets, the candidate targets being: the at least one other target, or, among the at least one other target, those targets whose distance to the first target is less than a preset threshold; for each candidate target, inputting the motion trajectory of the candidate target and the motion trajectory of the first target into the pre-trained first neural network to obtain the probability that the candidate target is the second target; and determining the second target according to the probability of at least one candidate target.
- the above-mentioned "candidate target” includes two situations: (1) all other targets in the scene except the original tracking target; (2) other targets in the scene whose distance from the original tracking target is less than a preset threshold.
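- As a minimal illustration of case (2) only (the function name, data layout, and the 20-meter threshold below are assumptions, not values fixed by this application), the candidate set could be filtered by distance as follows:

```python
import numpy as np

def candidate_targets(first_traj, other_trajs, threshold_m=20.0):
    """Case (2): keep only the other targets whose latest position lies
    within threshold_m metres of the first (original tracking) target.

    first_traj : (T, 2) array of world coordinates of the first target.
    other_trajs: dict mapping target id -> (T, 2) array of coordinates.
    """
    last_pos = first_traj[-1]
    candidates = {}
    for target_id, traj in other_trajs.items():
        if np.linalg.norm(traj[-1] - last_pos) < threshold_m:
            candidates[target_id] = traj
    return candidates
```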
- the above method inputs the motion trajectory of the first target and the motion trajectory of each candidate target into the pre-trained neural network and outputs the probability that each candidate target is the second target, which ensures both accuracy and real-time performance.
- the first neural network may be a long short-term memory (LSTM) network. Before the first neural network is used, it needs to be trained. For example, historical videos of people getting into cars can be manually selected, the video data of the person and the car during a period of time before boarding can be obtained, and their trajectory data generated. These trajectory data are converted into trajectory feature pairs, labeled as a training set, and used to train the first neural network.
- the input is the target trajectory feature pair contained in the scene where the first target is located, and the output is the probability that each candidate target is suspected to be the second target.
- determining the second target according to the obtained probabilities includes: when a first probability among the obtained probabilities is higher than a preset threshold, determining that the target corresponding to the first probability is the second target. During the tracking process, the probability that each candidate target among the other targets is the second target is continuously calculated; at a certain moment, when the probability of a certain candidate target exceeds the preset threshold, the candidate target corresponding to that probability can be determined to be the second target.
- inputting the motion trajectory of the first target and the motion trajectory of the candidate target into the pre-trained first neural network and obtaining the probability that the candidate target is the second target includes: for each candidate target, establishing at least one set of trajectory feature pairs based on the motion trajectory of the candidate target and the motion trajectory of the first target, where each set of trajectory feature pairs includes at least two trajectory feature pairs at consecutive moments, and a trajectory feature pair includes the position and velocity of the first target at a given moment, the position and velocity of the candidate target, and the angle between the movement directions of the first target and the candidate target.
- each candidate target establishes trajectory feature pairs with the first target according to the above method, so that at least one set of trajectory feature pairs can be obtained; inputting the at least one set of trajectory feature pairs into the first neural network outputs the probability that each candidate target is the second target. Exemplarily, to obtain the probability that a certain candidate target is the second target at the current moment, a set of feature pairs needs to be input. The set of feature pairs may include 10 feature pairs, namely the positions and velocities of the candidate target and the first target, and the angle between their movement directions, at the 10 moments preceding the current moment. Inputting this set of trajectory feature pairs into the neural network yields the probability that the candidate target is the second target at the current moment.
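- A sketch of how such trajectory feature pairs and the LSTM scoring could be realized is given below. The feature layout, window length of 10, and layer sizes are assumptions made for illustration; the application does not prescribe them.

```python
import numpy as np
import torch
import torch.nn as nn

def trajectory_feature_pairs(first_traj, first_vel, cand_traj, cand_vel, window=10):
    """Build one set of trajectory feature pairs over the last `window`
    consecutive moments: position and velocity of the first target, position
    and velocity of the candidate target, and the angle between their
    movement directions at each moment."""
    feats = []
    for t in range(-window, 0):
        v1, v2 = first_vel[t], cand_vel[t]
        cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        angle = np.arccos(np.clip(cos_angle, -1.0, 1.0))
        feats.append(np.concatenate([first_traj[t], v1, cand_traj[t], v2, [angle]]))
    return np.stack(feats)  # shape: (window, 9)

class TrajectoryLSTM(nn.Module):
    """Placeholder LSTM that maps a set of feature pairs to the probability
    that the candidate is the second target; sizes are assumed."""
    def __init__(self, feat_dim=9, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):  # x: (batch, window, feat_dim)
        _, (h, _) = self.lstm(x)
        return torch.sigmoid(self.head(h[-1]))  # one probability per candidate
```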
- trajectory feature pairs are respectively established from the trajectory data of the first target and the trajectory data of each other target and used as the input of the neural network.
- the moment at which the first probability becomes higher than the preset threshold is set as the first moment. The method further includes: acquiring video frames for a period of time before and after the first moment, where the video frames include the first target; the video frames are then input into the pre-trained second neural network, and the third target is determined as the tracking target after switching according to the output result.
- the above-mentioned video frames "including the first target" means that the original tracking target appears in the video frames. Exemplarily, a video frame may capture, from the side, a picture of a person getting into a car, or capture the car door from the front. This implementation is a further verification on the basis of the trajectory judgment.
- when the third target and the second target are the same target, that target is directly used as the target after switching; when the third target and the second target are not the same target, the third target is taken as the target after switching.
- the above method mainly uses computer vision to perform intelligent behavior analysis on the video data and determine whether the first target has switched to the second target. On the basis of the trajectory judgment, the relevant video data is extracted for a second judgment, which improves the accuracy of the final determination.
- the second neural network includes a convolutional neural network and a graph convolutional neural network
- inputting the video data into the second neural network to determine the third target includes: inputting the video data into the pre-trained convolutional neural network, which outputs the features and bounding boxes of all targets in the video data; constructing a graph model according to the features and bounding boxes of the targets contained in the video data; and inputting the graph model into the pre-trained graph convolutional neural network, which determines the third target as the tracking target after switching according to the output result.
- the second neural network extracts the features of the target in the video frame and generates the bounding box corresponding to the target, establishes a unique graph model, and then uses the graph convolutional neural network to judge the behavior of the target to determine the new target after switching.
- the second neural network includes a convolutional neural network and a graph convolutional neural network.
- the convolutional neural network is mainly used to extract the features of the targets in the video and to generate the bounding boxes of the targets, which are used to build the graph model; the graph convolutional neural network is mainly used to judge, based on the constructed graph model, whether the original tracking target has exhibited cross-target behavior. Each neural network needs to be pre-trained before use.
- the training set may be images with manually labeled bounding boxes, containing both people and cars.
- for the graph convolutional neural network, it is necessary to manually select videos of people getting into cars, input these videos into the above-mentioned convolutional neural network to extract target features and generate bounding boxes, generate graph models according to the method just described, label the graph models as boarding behavior, and use them as the training set to train the graph convolutional neural network.
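- The exact graph model is not spelled out in this section, so the sketch below only shows one plausible construction (nodes are detected targets carrying their CNN features, edges connect targets whose bounding-box centers are close) together with a single graph-convolution step; the distance threshold and the normalization are assumptions.

```python
import numpy as np

def build_graph(features, boxes, dist_thresh=50.0):
    """Build a simple graph: one node per detected target, an edge between
    two nodes whose box centres are closer than dist_thresh pixels.
    features: list of per-target CNN feature vectors.
    boxes   : list of (x1, y1, x2, y2) bounding boxes."""
    centres = np.array([[(x1 + x2) / 2, (y1 + y2) / 2] for x1, y1, x2, y2 in boxes])
    n = len(boxes)
    adj = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(centres[i] - centres[j]) < dist_thresh:
                adj[i, j] = adj[j, i] = 1.0
    return np.asarray(features, dtype=np.float32), adj

def graph_conv(node_feats, adj, weight):
    """One graph-convolution step: degree-normalised neighbourhood
    aggregation followed by a linear projection and ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    return np.maximum((adj / deg) @ node_feats @ weight, 0.0)
```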
- a traditional machine learning model such as a support vector machine (SVM), etc., can also be used for determination.
- the sensor includes at least two groups of sensors, and the orientations of the sensors of different groups are different. For each group of sensors in the at least two groups of sensors, a motion trajectory of each target corresponding to that group of sensors is generated according to the sensing data collected by that group of sensors, so as to obtain at least two motion trajectories of the target; the at least two motion trajectories are fused to form the fused trajectory of the target.
- “azimuth” refers to the position or direction of an object in actual space.
- “Different groups of sensors are in different orientations” refers to the fact that the actual physical positions of the sensors of each group are far apart, for example, the distance between the groups is at least two meters.
- sensing range refers to the spatial range that the sensor can sense, and the range of scenes that sensors in different orientations can sense are also different.
- a clustering algorithm can be used to associate the trajectories (sequences of positions) of the first targets under different sensing ranges, indicating that these trajectories (sequences of positions) belong to the same target (first target), and then merge them.
- the Kalman fusion method can be used.
- the same target under different sensing ranges has multiple positions; from these, a better position is selected as the measurement position, the estimated position is obtained by methods such as fitting, and the estimated position and the measured position are Kalman-fused to obtain the final position of the target at that moment.
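- A minimal sketch of this per-moment multi-view fusion is shown below, assuming a constant-velocity extrapolation for the estimated position and fixed variance values; none of these choices is prescribed by the application.

```python
import numpy as np

def fuse_multi_view(positions, history, meas_var=1.0, est_var=2.0):
    """Fuse the positions reported for the same target under different
    sensing ranges at one moment.

    positions: list of (x, y) candidate positions from the different views.
    history  : sequence of the most recent fused positions (at least two),
               used to extrapolate an estimated position."""
    history = np.asarray(history, dtype=float)
    # Estimated position: constant-velocity extrapolation of the history.
    estimate = history[-1] + (history[-1] - history[-2])
    # Pick the "better" measurement: here, the one closest to the estimate.
    measurement = min(positions, key=lambda p: np.linalg.norm(np.asarray(p) - estimate))
    # Kalman-style blend of estimate and measurement.
    gain = est_var / (est_var + meas_var)
    return estimate + gain * (np.asarray(measurement, dtype=float) - estimate)
```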
- the above method combines the target's motion trajectory under different sensing ranges to form the target's final motion trajectory.
- the angle of view of the camera in different positions is different, and the sensing range of the radar in different positions is also different.
- sensors in other directions can continue to provide position data of the target, ensuring the continuity of the target trajectory.
- the trajectory data of multiple sets of sensors in different orientations are merged to improve the accuracy of the final trajectory of the target. It also improves the efficiency of target tracking.
- each group of sensors includes at least two types of sensors, that is, cameras and at least one of the following two types of sensors: millimeter wave radar and lidar, and the two types of sensors are in the same orientation.
- a group of sensors in the same orientation means that the multiple types of sensors that make up the group are placed in nearby physical locations.
- this group of sensors includes cameras and millimeter wave radars, and these two types of sensors are installed on the same pole on the street.
- the embodiment of the present application does not specifically limit the distance between at least two types of sensors in the same group, as long as the sensing ranges of the two types of sensors are substantially the same.
- azimuth refers to the physical location of the sensor in actual space
- sensing range refers to the spatial range that the sensor can sense.
- for each type of sensor in a group, a monitoring trajectory of the target corresponding to that type of sensor is generated according to the sensing data collected by that type of sensor, so as to obtain at least two monitoring trajectories of the target; the at least two monitoring trajectories are fused to form the motion trajectory of the target.
- fusion of the trajectory data collected by different types of sensors can use an innovative Kalman fusion method: a unified prior estimate (the optimal estimate at the previous moment) is Kalman-fused, in turn, with the measured values of the different sensors at the current moment to form the optimal estimated value at the current moment, and the optimal estimated values at all moments form the final trajectory of the target.
- the above method fuses at least two types of sensor data at adjacent locations and improves the accuracy of the target's trajectory. Moreover, if only the camera were used for tracking, it would inevitably be affected by the external weather; integrating the trajectory data provided by the radar ensures that the tracking target is not lost and improves tracking efficiency.
- the target tracking method provided by the present application can automatically update the tracking target to the new target after the switch when the original tracking target is switched, so that the monitoring personnel can always grasp the whereabouts of the tracking target.
- this application uses the target's motion trajectory data to determine the new target after the original tracking target is switched, which greatly reduces the amount of calculation under the premise of ensuring the accuracy.
- a piece of video frame data is selected, a unique graph model is established, the graph convolutional neural network is used to make a second judgment on the switching behavior of the original tracking target, and more spatiotemporal information is included, which improves the reliability of target tracking.
- the present application provides another method for target tracking, including: acquiring the motion trajectory of a first target through a sensor, the first target being the initial tracking target; determining the moment at which the motion trajectory of the first target disappears as the first moment; acquiring video frames for a period of time before and after the first moment, where the video frames include a picture in which the first target may be switched to the second target; and inputting the video frames into the pre-trained neural network to determine the second target, and using the second target as the updated tracking target.
- the above method analyzes the characteristics of the trajectory to find the time point at which the original tracking target may have switched (for example, the tracking target gets into a car), obtains the relevant video data containing the original tracking target according to this time information, and performs behavior analysis to determine the new tracking target. Compared with performing behavior analysis on the original tracking target throughout the entire process, the amount of calculation is greatly reduced.
- determining, according to the motion trajectory of the first target, the initial moment at which the first target's trajectory disappears as the first moment includes: judging that the motion trajectory of the first target does not exist after the initial moment, and determining that the initial moment is the first moment.
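- One simple way to realize this check is sketched below; the tolerance window used to ignore a single dropped frame is an assumption, not part of the described method.

```python
def first_disappearance_moment(trajectory, horizon=5):
    """Return the initial moment after which the first target's trajectory
    no longer exists: the last timestamp with an observed position, provided
    no position is observed for the next `horizon` timestamps.

    trajectory: dict mapping timestamp -> position, or None when missing.
    """
    times = sorted(trajectory)
    for idx, t in enumerate(times):
        if trajectory[t] is None:
            continue
        later = times[idx + 1: idx + 1 + horizon]
        if later and all(trajectory[u] is None for u in later):
            return t  # initial moment, used as the first moment
    return None
```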
- determining the second target according to the video frame and using the second target as the updated tracking target includes: inputting the video frame to the pre-trained second neural network to determine the second target, and Use the second target as the updated tracking target.
- the second neural network includes a convolutional neural network and a graph convolutional neural network.
- inputting the video data into the second neural network to determine the second target includes: inputting the video data into the pre-trained convolutional neural network, which outputs the features and bounding boxes of all the targets in the video data; constructing a graph model according to the features and bounding boxes of all the targets in the video data; and inputting the graph model into the pre-trained graph convolutional neural network, which determines the second target as the tracking target after switching according to the output result.
- the second neural network extracts the features of the target in the video frame and generates the bounding box corresponding to the target, establishes a unique graph model, and then uses the graph convolutional neural network to judge the behavior of the target to determine the new target after switching.
- the sensor includes at least two groups of sensors, and the orientations of the sensors of different groups are different. For each group of sensors in the at least two groups of sensors, a motion trajectory of the first target corresponding to that group of sensors is generated according to the sensing data collected by that group of sensors, so as to obtain at least two motion trajectories of the first target; the at least two motion trajectories of the first target are fused to form the fused motion trajectory of the first target.
- “azimuth” refers to the position or direction of an object in actual space.
- “Different groups of sensors are in different orientations” refers to the fact that the actual physical positions of the sensors of each group are far apart, for example, the distance is at least two meters.
- “Sensing range” refers to the spatial range that the sensor can sense, and the range of scenes that sensors in different orientations can sense are also different.
- “different sensing ranges” are for cameras to shoot targets in the scene from different perspectives, and for radars, it is to detect targets in the scene from different directions.
- the Kalman fusion method can be used. Exemplarily, at each moment there are multiple position data for the first target under different sensing ranges; a better position is selected as the measurement position, the estimated position is obtained by methods such as fitting, and Kalman fusion of the estimated position and the measured position gives the final position of the first target at that moment.
- the above method combines the movement trajectory of the first target under multiple viewing angles, and forms the final movement trajectory of the first target.
- the angle of view of the camera in different positions is different, and the sensing range of the radar in different positions is also different.
- sensors in other directions can continue to provide position data of the target, ensuring the continuity of the trajectory of the first target.
- each group of sensors includes at least two types of sensors, that is, cameras and at least one of the following two types of sensors: millimeter wave radar and lidar, and the two types of sensors are in the same orientation.
- a group of sensors in the same orientation means that the multiple types of sensors that make up the group are placed in nearby physical locations.
- this group of sensors includes cameras and millimeter wave radars, and these two types of sensors are installed on the same pole on the street.
- the embodiment of the present application does not specifically limit the distance between at least two types of sensors in the same group, as long as the sensing ranges of the two types of sensors are substantially the same.
- sensing range refers to the spatial range that the sensor can sense.
- the sensor when the sensor is a camera, its sensing range refers to the space range of the scene it can shoot; when the sensor is a radar, its sensing range refers to the space of the actual scene it can detect Scope.
- for each type of sensor in the same group of sensors, a monitoring trajectory of the first target corresponding to that sensor is generated according to the sensing data collected by that sensor, so as to obtain at least two monitoring trajectories of the first target; the at least two monitoring trajectories are fused to form the motion trajectory of the first target.
- the fusion of the trajectories can adopt an innovative Kalman fusion method: a unified prior estimate (the optimal estimate at the previous moment) is Kalman-fused in sequence with the measured values of the different types of sensors at the current moment to form the optimal estimated value at the current moment, and the optimal estimated values at all moments form the trajectory of the first target.
- the above method fuses at least two types of sensor data at adjacent positions, and improves the trajectory accuracy of the first target.
- the fusion of the trajectory data provided by the radar can ensure that the tracking target will not be lost and improve the tracking efficiency.
- the present application provides an apparatus for target tracking, including an acquisition module and a processing module; the acquisition module is configured to acquire sensing data of the targets included in the scene where a first target is located, where the targets included in the scene where the first target is located include the first target and at least one other target other than the first target, and the first target is the initial tracking target; the processing module is configured to generate the motion trajectories of the first target and the at least one other target according to the sensing data; the processing module is further configured to determine the second target according to the motion trajectories of the first target and the at least one other target, and to use the second target as the updated tracking target.
- the processing module is further configured to determine a set of candidate targets, where the candidate targets are: the at least one other target, or, among the at least one other target, those targets whose distance to the first target is less than a preset threshold; for each candidate target, input the motion trajectory of the first target and the motion trajectory of the candidate target into the pre-trained first neural network to obtain the probability that the candidate target is the second target; and determine the second target according to the probability that the at least one candidate target is the second target.
- the processing module is further configured to detect that a first probability among the probabilities that the at least one candidate target is the second target is higher than a preset threshold, and to determine that the target corresponding to the first probability is the second target.
- the processing module is further configured to: for each candidate target, establish at least one set of trajectory feature pairs according to the motion trajectories of the candidate target and the first target, where each set of trajectory feature pairs includes, for at least two consecutive time points, the position and velocity of the first target, the position and velocity of the candidate target, and the angle between the movement directions of the first target and the candidate target; and input the at least one set of trajectory feature pairs into the first neural network and output the probability that the candidate target is the second target.
- the processing module is further configured to select video frames before and after the first moment, where the video frames include a picture in which the first target may be switched to the third target; the video frames are input into the pre-trained second neural network, and the third target is determined as the updated tracking target according to the output result.
- the second neural network includes a convolutional neural network and a graph convolutional neural network
- the processing module is specifically configured to input the video frames into a pre-trained convolutional neural network and output the features and bounding boxes of the targets in the video frames; construct the graph model according to the features and bounding boxes of the targets; and input the graph model into the pre-trained graph convolutional neural network, determining the third target as the tracking target after switching according to the output result.
- the sensor includes at least two groups of sensors, and different groups of sensors are located in different orientations; for each of the first target and the other targets, the processing module is specifically configured to: respectively generate at least two motion trajectories of the target according to the sensing data collected by the at least two groups of sensors, and fuse the at least two motion trajectories of the target to form the motion trajectory of the target.
- each group of sensors includes at least two types of sensors, the at least two types of sensors being a camera and at least one of the following two types of sensors: millimeter wave radar and lidar, and the at least two types of sensors are in the same orientation. For each target included in the scene where the first target is located, the processing module is specifically configured to: respectively generate at least two monitoring trajectories of the target according to the sensing data collected by the at least two types of sensors, and fuse the at least two monitoring trajectories of the target to form the motion trajectory of the target.
- the present application provides another device for target tracking, including an acquisition module and a processing module.
- the acquisition module is used to acquire sensing data of a first target through a sensor; the processing module is used to generate the motion trajectory of the first target based on the sensing data, determine the initial moment at which the first target's trajectory disappears as the first moment, and acquire video data for a period of time before and after the first moment, where the video data includes a picture in which the first target may be switched to the second target; the second target is determined according to the video data, and the second target is used as the updated tracking target.
- the processing module is further configured to determine that the trajectory of the first target after the initial time does not exist, and determine that the initial time is the first time.
- the processing module is further configured to input the video data into the pre-trained second neural network to determine the second target, and use the second target as the updated tracking target.
- the second neural network includes a convolutional neural network and a graph convolutional neural network
- the processing module is also used to input the video frames into a pre-trained convolutional neural network and output the features and bounding boxes of the targets contained in the video data; construct a graph model according to the features and bounding boxes of the targets contained in the video frames; input the graph model into a pre-trained graph convolutional neural network, determine the second target according to the output result, and use the second target as the updated tracking target.
- the sensor includes at least two groups of sensors, and different groups of sensors are located in different orientations; the processing module is specifically configured to: respectively generate at least two motion trajectories of the first target according to the sensing data collected by the at least two groups of sensors, and fuse the at least two motion trajectories of the first target to form the motion trajectory of the first target.
- each group of sensors includes at least two types of sensors, the at least two types of sensors are cameras and at least one of the following two types of sensors: millimeter wave radar and lidar, and the at least two types of sensors are The two types of sensors are in the same orientation, and the processing module is specifically configured to: respectively generate at least two monitoring trajectories of the first target according to the sensing data collected by the at least two types of sensor modules; fuse at least two monitoring trajectories of the first target to form a first The trajectory of the target.
- this application provides a device for target tracking.
- the device includes a processor and a memory, where the memory stores computer instructions, and the processor executes the computer instructions to implement the method described in the first aspect or any of its possible implementations.
- this application provides a device for target tracking.
- the device includes a processor and a memory, where the memory stores computer instructions, and the processor executes the computer instructions to implement the method described in the second aspect or any of its possible implementations.
- the present application provides a computer-readable storage medium that stores computer program code which, when run on a computer, causes the computer to execute the method described in the first aspect or any of its possible implementations.
- these computer-readable storage media include but are not limited to one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard drives.
- the present application provides a computer-readable storage medium that stores computer program code which, when run on a computer, causes the computer to execute the method described in the second aspect or any of its possible implementations.
- these computer-readable storage media include but are not limited to one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), flash memory, electrically erasable PROM (EEPROM), and hard drives.
- this application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method described in any of the first aspect and possible implementation manners.
- the present application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method described in the second aspect and possible implementation manners.
- Fig. 1 is a schematic diagram of an application scenario of a method for target tracking provided by an embodiment of the present application.
- Fig. 2 is a schematic diagram of a system architecture for target tracking provided by an embodiment of the present application.
- FIG. 3 is a schematic flowchart of a method for target tracking provided by an embodiment of the present application.
- FIG. 4 is another schematic flowchart of the method for target tracking provided by an embodiment of the present application.
- Fig. 5 is a schematic flowchart of a method for fusing video trajectories and radar trajectories provided by an embodiment of the present application.
- FIG. 6 is a table of the position of a certain target at different moments and different viewing angles according to an embodiment of the present application.
- FIG. 7 is a schematic diagram of the two-dimensional motion trajectory of the original tracking target and other targets provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of the probability that each other target is a new target after switching at each moment output by the LSTM neural network provided by an embodiment of the present application.
- Fig. 9 is a schematic structural diagram of an LSTM unit provided by an embodiment of the present application.
- Fig. 10(a) is a schematic structural diagram of an LSTM neural network provided by an embodiment of the present application.
- Fig. 10(b) is a schematic diagram of a feature pair provided by an embodiment of the present application.
- FIG. 11 is a time period distribution diagram of training samples for training an LSTM neural network provided by an embodiment of the present application.
- FIG. 12 is a schematic flowchart of a method for performing secondary judgment using computer vision according to an embodiment of the present application.
- FIG. 13 is a schematic diagram of the structure flow of a second neural network provided by an embodiment of the present application.
- FIG. 14 is a schematic diagram of a bounding box of a target provided by an embodiment of the present application.
- FIG. 15 is a schematic diagram of a graph model provided by an embodiment of the present application.
- FIG. 16 is a schematic diagram of the hardware structure of the target tracking device provided by an embodiment of the present application.
- FIG. 1 is a schematic diagram of an application scenario of the target tracking method provided by this application.
- the front monitoring device 10 includes a monitoring camera 11 and a radar 12, and the obliquely rear monitoring device 13 includes a monitoring camera 14 and a radar 15.
- the sensing ranges (angles of view) of the two groups of monitoring devices are different, and each group of monitoring devices monitors and tracks the targets that appear within its own sensing range, forming the trajectory of each target in the world coordinate system.
- when the monitoring device is a camera, the sensing range refers to the range of the scene that the camera can capture; when the monitoring device is a radar, the sensing range refers to the spatial range that the radar can detect. The angle of view refers to the range captured by the camera; different viewing angles correspond to different sensing ranges.
- the motion trajectory refers to the positions of the target over a period of time, which can be represented by coordinates. According to the position calibration of the camera and the radar, the target position captured or detected at each moment is projected into the global coordinate system, so that the target's motion trajectory can be formed.
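- As an illustration of this projection step only, the sketch below assumes a ground-plane homography for the camera and a known installation pose for the radar; the actual calibration model used is not specified by the application.

```python
import numpy as np

def pixel_to_world(pixel_xy, homography):
    """Project an image position onto the ground plane of the global
    coordinate system using a pre-calibrated 3x3 homography."""
    u, v = pixel_xy
    p = homography @ np.array([u, v, 1.0])
    return p[:2] / p[2]

def radar_to_world(radar_local_xy, radar_world_xy, radar_yaw):
    """Convert a radar measurement from the radar's local frame into the
    global frame using the radar's known installation position and heading."""
    c, s = np.cos(radar_yaw), np.sin(radar_yaw)
    rot = np.array([[c, -s], [s, c]])
    return np.asarray(radar_world_xy) + rot @ np.asarray(radar_local_xy)
```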
- in order to improve the accuracy of the trajectory within a certain sensing range, in addition to using surveillance cameras to collect video data, a millimeter-wave radar (or lidar) is also installed at a physical location adjacent to the camera. In the following, "radar" is used to refer to "millimeter-wave radar or lidar". Fusing the radar data and the video data under the same sensing range can form a more precise motion trajectory. In addition, in order to reduce the influence of occlusion on tracking and ensure the continuity of the trajectory, it is also necessary to fuse the motion trajectories under multiple sensing ranges, that is, to fuse the trajectory data collected by the monitoring device 10 and the monitoring device 13, to obtain the motion trajectory of the target. In the process of being tracked, the tracking target 16 is about to leave by bus 17.
- the solution proposed in this application is to first determine, according to the motion trajectories of the tracking target 16 and the surrounding targets (for example, the bus 17 or other targets), whether the tracking target 16 (the original tracking target) has switched (changed vehicles). If it is confirmed that the tracking target 16 has switched, for example, the tracking target 16 gets on the bus 17, then the tracking target is switched to the bus 17 until the original tracking target reappears in some monitored scene. Further, after the new tracking target is determined through the trajectories, in order to improve the accuracy of the judgment, video data including the scene of the suspected switching behavior is selected for behavior analysis to determine whether the original tracking target actually switched.
- the method for automatic tracking provided by this application can continuously track the target. Even if the target switches vehicles, the target after the switch can still be monitored and tracked without missing any time period, which greatly improves the convenience and reliability of personnel tracking and deployment.
- FIG. 2 is a schematic diagram of the system architecture of an automatic tracking system provided by this application.
- the system includes a terminal node 21 (Terminal Node, TNode), an edge node 22 (Edge Node, ENode), and a server side node 23 (Server Node, SNode).
- Each node can perform computing tasks independently, and each node can also communicate through the network to complete task delivery and upload results.
- Network transmission methods include wired transmission and wireless transmission. Among them, wired transmission methods include data transmission using Ethernet, optical fiber, etc., and wireless transmission methods include broadband cellular network transmission methods such as 3G (Third generation), 4G (Fourth generation), or 5G (Fifth generation).
- the end-side node 21 can be used to fuse video trajectories and radar trajectories in the same sensing range.
- the end-side node 21 may be the camera itself or various processor devices with computing capabilities. The data collected by cameras and radars that are physically adjacent to each other can be directly fused and calculated at the end-side node without network transmission, which reduces bandwidth occupation and delay.
- the edge node 22 (ENode) can be used to fuse trajectory data under different sensing ranges (viewing angles).
- the edge node 22 may be an edge computing box, including a switch, a storage unit, a power distribution unit, a computing unit, and so on.
- the server-side node 23 (SNode) is mainly used to perform cross-target behavior judgment on the tracking target.
- the server-side node 23 may be a cloud server, which stores and computes the data uploaded by the end-side node 21 and the edge-side node 22 to implement task deployment. This application does not limit the type of cloud server equipment or its virtualization management.
- the edge-side node 22 can also fuse video trajectories and radar trajectories within the same sensing range, and the server-side node 23 can also fuse the trajectories under different sensing ranges (multiple viewing angles).
- Specific implementation methods include but are not limited to the following three: (1) Different groups of sensors (cameras and/or radars) directly transmit the collected sensing data to the server-side node 23, and the server-side node 23 performs all calculations and judgments .
- different groups of sensors directly transmit the collected sensing data to the edge-side node 22; the edge-side node 22 first fuses the sensing data of the same group, then fuses the trajectory data of different groups to obtain the continuous motion trajectory of the target, and then transmits the calculation result to the server-side node 23.
- the server-side node 23 calculates and determines the second target according to the trajectory data (which can also be implemented by the edge-side node 22), and performs secondary verification according to the video data.
- the camera and radar data of the same group are first transmitted to the nearest end-side node 21.
- the end-side node 21 is responsible for fusing the video and radar trajectories under the same sensing range; multiple end-side nodes 21 then pass the fused trajectory data to the edge-side node 22, and the edge-side node merges the trajectories of the same target under different sensing ranges (viewing angles), thereby obtaining the continuous motion trajectory of the target. The edge-side node 22 then transmits the continuous motion trajectory of the target to the server-side node 23, and the server-side node 23 determines the second target according to the trajectory data and performs secondary verification according to the video data.
- the application of the present invention does not specifically limit which node performs the computing function, and it is mainly determined by user habits, network bandwidth, or computing power of the hardware itself.
- the overall process of the solution of the present invention will be introduced below in conjunction with FIG. 3.
- the method includes the following steps:
- the target includes the first target and other targets, and the first target refers to the original tracking target.
- the scene where the first target is located may be an area centered on the original tracking target with a radius of 20 meters.
- the other targets may be vehicles whose distance from the tracking target 16 is less than 20 meters.
- if the sensor is a surveillance camera, the sensing data is the video data captured by the surveillance camera; if the sensor is a millimeter wave radar, the sensing data is the distance between the target and the radar detected by the millimeter wave radar; if the sensor is a lidar, the sensing data is the position and speed of the target detected by the lidar.
- each target here refers to the first target and other targets in the scene where the first target is located.
- if the sensor is a camera, the camera is calibrated in advance, and the position of the target in the world coordinate system can be obtained directly from the video data; if the sensor is a radar, the position of the radar itself is fixed and known, and the measured distance between the target and the radar can be used to obtain the target's position in the world coordinate system; if the sensor is a lidar, the coordinates of the lidar itself are also known, and by emitting laser beams the three-dimensional point cloud data of the target can be obtained directly, giving information such as the target's position and speed.
- the position of the target at each moment in the world coordinate system constitutes the trajectory of the target.
- S33 Determine the second target according to the movement trajectories of the first target and other targets, and use the second target as the switched target.
- when the switching behavior of the first target occurs (cross-target behavior), the trajectory of the original tracking target and the trajectories of the other targets are used to determine the second target, and the second target is used as the new tracking target for continuous tracking.
- This scheme takes into account the actual movement of the target during the tracking process, flexibly switches the tracking target, forms a tracking without dead ends and time gaps, and improves the tracking efficiency. Moreover, this solution uses trajectory data to determine a new tracking target. Compared with the prior art using computer vision methods to determine behavior throughout the entire process, the calculation amount of this solution is greatly reduced, and real-time requirements can be more ensured.
- the original tracking target is tracking target 16.
- the complete solution mainly includes five steps (some steps are optional):
- Step S41 For each group of sensors, the sensor data of the video and radar in the group of sensors are merged to obtain the movement trajectory of the target under the sensing range of the group of sensors.
- the same group of sensors may include at least two types of sensors, and the two types of sensors are in the same orientation. Exemplarily, it is shown as the camera 11 and the radar 12 in FIG. 1.
- the same orientation refers to physical proximity, for example, it can be installed on the same pole.
- azimuth 1 in the figure means that the camera 11 and the radar 12 are at adjacent physical positions, namely azimuth 1.
- the sensing data collected by these two sensors for the tracking target 16 is Kalman-fused, and the trajectory data of the tracking target 16 sensed and calculated from azimuth 1 (that is, the motion trajectory within sensing range 1) can be obtained.
- the camera 14 and the radar 15 are positioned similarly (both at azimuth 2), and the trajectory data of the target sensed from azimuth 2 (that is, the motion trajectory within sensing range 2) can likewise be obtained by fusing their sensing data.
- the sensing range refers to the range of the scene that the camera can shoot; for the radar, it refers to the range of the scene that the radar can detect.
- the sensing ranges of sensors (for example, the camera 11 and the radar 12) in the same orientation are approximately the same.
- the target here includes the original tracking target and other targets in the scene where the original tracking target is located.
- Generating the motion trajectory of the target refers to generating its corresponding trajectory for each target.
- Optionally, when collecting video data, a 3D detection method may be used to determine the target's center of mass in order to improve the accuracy of the video trajectory.
- In addition, when the target is a car, the car's ground-contact features or its front and rear wheels can also be used to determine the center of mass, further improving the accuracy of the video trajectory.
- the trajectory data obtained by multiple types of sensors can be fused.
- the trajectory data obtained by the camera and the trajectory data obtained by the millimeter wave radar can be fused.
- The trajectory data of the camera and the lidar can also be fused, and the data of the camera, the millimeter-wave radar, and the lidar can even all be fused together; the embodiment of the present application does not limit the types of sensors to be fused, which depend on the specific situation.
- Below, taking the acquisition of the movement trajectory of the tracking target 16 within sensing range 1 as an example, the trajectory data of the camera 11 and the radar 12 are fused to illustrate the fusion method provided by the embodiment of the present application in detail. This method applies the idea of Kalman fusion.
- the trajectory of the tracking target 16 is composed of the position at each moment.
- Fusing the trajectory data of the camera and the radar means fusing, at each moment, the positions of the tracking target 16 provided by these two types of data, so as to form the movement trajectory of the tracking target 16 within the sensing range. Before fusion, the radar data (positions) of the tracking target 16 must first be matched with the corresponding video data (positions).
- the echo image of the radar can be analyzed to generate the contour image of the target included in the scene where the tracking target 16 is located, and the calibration information can be combined to distinguish which target in the video each target position monitored by the radar corresponds to.
- To facilitate describing the fusion process, the position of the tracking target 16 obtained from the data collected by the camera is called the video measurement position, and the position of the tracking target 16 obtained from the data collected by the radar is called the radar measurement position.
- Figure 5 shows how the video data and radar data of the tracking target 16 are fused at time t.
- First, there is an optimal estimated position F_{t-1} at time t-1; from this optimal estimated position at the previous moment, the predicted position E_t at time t can be obtained. The specific prediction formula can be a formula derived from experience.
- At time t, the video measurement position V_t can be obtained from the video data collected by the camera.
- Kalman fusion is performed on the predicted position E_t at time t and the video measurement position V_t to obtain the intermediate optimal estimated position M_t.
- At the same time, the radar measurement position R_t of the target at time t can be obtained from the data acquired by the radar; Kalman fusion of the intermediate optimal estimated position M_t with the radar measurement position R_t yields the final optimal estimated position F_t at time t.
- The optimal estimated position at each moment is calculated from the optimal estimated position at the previous moment; the optimal estimated position at the initial moment can be the video measurement position or the radar measurement position at the initial moment, or the position obtained by fusing the two. The optimal estimated positions obtained at each moment then constitute the trajectory of the tracking target 16 within sensing range 1.
- the above-mentioned fusion process can also be replaced with the first fusion of radar data and then the fusion of camera data.
- the embodiment of the present application does not limit the type and number of fused sensors and the sequence of fusion.
- the motion trajectory of other targets such as the bus 17 is also acquired according to the above-mentioned method.
- It should be noted that, in addition to applying the Kalman idea to fuse the video and radar trajectories, the simplest weighted-average algorithm can also be used for fusion. Exemplarily, at time t=1 the position A of the tracking target 16 can be obtained from the video data and the position B of the tracking target 16 from the radar data; a weighted average of position A and position B directly gives the final position of the tracking target 16 at time t=1.
- the fusion strategy can specify that the trajectory data of targets above 60 meters from the sensor is mainly radar (higher weight), and the trajectory data of targets within 60 meters from the sensor is mainly video (higher weight).
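- As a minimal sketch of this weighted-average alternative, assuming the 60-meter rule above and an illustrative 0.8/0.2 weight split:

```python
import numpy as np

def weighted_fuse(video_pos, radar_pos, sensor_pos, near_weight=0.8, far_weight=0.2):
    """Weighted average of one target's video and radar positions at one moment.

    Within 60 m of the sensor the video position gets the higher weight; beyond
    60 m the radar position does. The 0.8/0.2 split is only an example.
    """
    dist = np.linalg.norm(np.asarray(video_pos, float) - np.asarray(sensor_pos, float))
    w_video = near_weight if dist <= 60.0 else far_weight
    return w_video * np.asarray(video_pos, float) + (1.0 - w_video) * np.asarray(radar_pos, float)
```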
- the embodiment of the application uses an innovative Kalman filtering method to fuse measurement position data provided by different sensors in the same sensing range, thereby improving the accuracy of the target motion track.
- It should be noted that the fusion method of the embodiment of the present application does not simply perform Kalman fusion of the video predicted position with the video measurement position and of the radar predicted position with the radar measurement position, and then fuse the trajectories of these two types of sensors.
- The key point of the embodiments of the present invention is that a single, unified predicted position is adopted at a given moment, and the measurement positions of the different sensors are then fused with it in sequence, so that an intermediate optimal estimated position is generated in the process.
- Such a fusion method makes the final position of the target refer to the sensing data of multiple sensors at each moment, which improves the accuracy of the trajectory.
- It should be noted that step S41 is not essential to the entire scheme. In actual situations, there may be no radar (or millimeter-wave radar) installed around the camera; in this case, the trajectory data within a certain sensing range can be obtained directly from the video data collected by the camera, without fusion. In short, step S41 is optional and is mainly determined by the actual situation of the application scenario and the needs of the monitoring personnel.
- Step S42 Fusion of the motion trajectories of the target in different sensing ranges.
- the target here includes the original tracking target and other targets in the scene where the original tracking target is located. After acquiring the trajectory of the target in a certain sensing range, due to the continuous movement of the target, it may be blocked by foreign objects (billboards, buses), or the target may leave the monitoring range directly, causing the trajectory of the target to be interrupted. In order to obtain the continuous motion trajectory of the target, it is first necessary to associate the same target under different viewing angles, and then merge the different viewing angle trajectories of the same target.
- The specific steps of the association may be as follows: first, obtain the sensing data of each target in the different sensing ranges, and extract the features of the targets therein (the color of a person's hair, the color of the clothes, the color and shape of a car, etc.).
- Then, pair the target trajectory position P and the target feature C at time t, e.g. as (P_t^{n,k}, C_t^{n,k}), where P_t^{n,k} represents the trajectory position of target n under viewing angle k at time t, and C_t^{n,k} represents the feature information of target n under viewing angle k at time t, such as the parts of the car, the direction of the front of the car, the outline of the car body, and so on.
- A clustering algorithm is then used to cluster and associate the feature and trajectory-position pairs detected for each target in the different sensing ranges; if several pairs fall into the same cluster, it is determined that these features and trajectory positions belong to the same object.
- the clustering algorithm may be a density-based clustering algorithm (DBSCAN, Density-Based Spatial Clustering of Applications with Noise), and the embodiment of the application does not specifically limit the clustering algorithm used in association with the same target.
- It should be noted that "viewing angle" refers to the direction in which a camera shoots; each viewing angle corresponds to a sensing range, and cameras in different orientations have different viewing angles and therefore different sensing ranges.
- As shown in FIG. 6, the entire table contains the positions of the tracking target 16 at different moments and in different sensing ranges after association by the clustering algorithm; each row corresponds to the same moment, and each column corresponds to the same sensing range (viewing angle).
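- A minimal sketch of the association step, assuming scikit-learn's DBSCAN and a simplified encoding in which each detection is described by its world position concatenated with a fixed-length appearance feature vector (the eps and min_samples values are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def associate_views(detections, eps=2.0, min_samples=1):
    """Group per-view detections taken at the same moment into physical targets.

    `detections` is a list of dicts, one per (view, detection), each holding a
    world `position` (x, y) and an appearance `feature` vector of fixed length;
    detections falling into the same DBSCAN cluster are treated as the same
    target seen from different sensing ranges.
    """
    vectors = np.array([np.concatenate([d["position"], d["feature"]]) for d in detections])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(vectors)
    groups = {}
    for det, label in zip(detections, labels):
        groups.setdefault(label, []).append(det)
    return groups
```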
- the embodiment of the application applies the idea of Kalman fusion to trajectory fusion, and the specific fusion method is as follows:
- At time t=0 (the initial moment), the measurement position from any one viewing angle can be taken as the initial estimated position. At time t=1, measurement positions of the tracking target 16 are available from viewing angles 1, 2, ..., k; according to criteria such as whether a viewing angle covers an approach lane and the distance between the corresponding device and the target, the measurement position from a better viewing angle is selected as the target measurement position at t=1. The position at t=1 is also predicted from the final (optimal) position at t=0; for this prediction, a fitting method can be used, or the current position can be estimated directly from the velocity (including magnitude and direction) and the position at the previous moment. Kalman fusion of the predicted position with the selected measurement position then gives the optimal estimated position at t=1.
- Similarly, at time t=2 the target measurement position from the best viewing angle and the predicted position at t=2 (predicted from the optimal estimate at t=1) are Kalman-fused to obtain the optimal estimated position at time t=2. Repeating this at every moment yields the continuous motion trajectory of the tracking target 16 after the data from the different viewing angles have been fused.
- Other surrounding targets can also obtain continuous motion trajectories in accordance with the above method.
- It should be noted that, besides the above fusion based on the Kalman idea, the most direct weighted-average method can also be used to obtain the final position of the target at each moment; for example, if there are k viewing angles at t=1, the tracking target 16 has k measured positions at that moment, and a weighted average of these k positions directly gives its final position at t=1.
- This embodiment of the application applies the Kalman fusion idea to the fusion of the target motion trajectory under multiple viewing angles for the first time, and the measurement value at each moment selects the best viewing angle instead of random viewing angle fusion, which improves the accuracy of trajectory fusion. It can effectively solve the problem of occlusion or loss of the target in the tracking process, and ensure the continuity of the trajectory of each target.
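- The multi-view fusion can be sketched as follows; the "best view" selection is simplified here to the view whose sensor is closest to the target, the prediction to a constant-velocity extrapolation, and the Kalman gain to a scalar, so this is an illustration of the idea under assumptions rather than the exact procedure of the embodiment.

```python
import numpy as np

def fuse_multi_view(per_view_pos, per_view_dist, meas_var=1.0, process_var=0.5):
    """Fuse one target's per-view positions into a single continuous track.

    `per_view_pos[t]` is a dict {view_id: (x, y)} of associated positions at time t
    and `per_view_dist[t]` gives the target-to-sensor distance for each view; the
    view with the smallest distance serves as the "best view" measurement at t.
    """
    fused, var = [], meas_var
    for t, views in enumerate(per_view_pos):
        best_view = min(views, key=lambda v: per_view_dist[t][v])
        meas = np.asarray(views[best_view], dtype=float)
        if len(fused) < 2:
            fused.append(meas)                       # not enough history to predict yet
            continue
        pred = 2 * fused[-1] - fused[-2]             # constant-velocity prediction
        pred_var = var + process_var
        gain = pred_var / (pred_var + meas_var)      # scalar Kalman gain
        fused.append(pred + gain * (meas - pred))
        var = (1 - gain) * pred_var
    return np.array(fused)
```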
- Step S43 Determine the target after the original tracking target is switched according to the motion trajectory of the original tracking target and other targets.
- According to step S42, the continuous motion trajectories of the original tracking target and of the other targets can be obtained; inputting these continuous motion trajectories into a pre-trained neural network model makes it possible to determine whether the original tracking target has undergone a switching behavior, and thereby to determine the new tracking target.
- Exemplarily, a two-dimensional schematic diagram of the trajectories is shown in Fig. 7: the solid line 1 is the movement trajectory of the original tracking target 1 (the tracking target 16), and the dashed lines 2, 3 and 4 correspond to the movement trajectories of the other targets 2, 3 and 4, where the other target 2 corresponding to dashed line 2 may be the bus 17.
- Optionally, before determining the switched target from the trajectories, trajectories that are too far from the original tracking target may be filtered out; other targets beyond a preset distance are discarded. Assuming that the position of the original tracking target is (x_1, y_1) and the position of a certain other target is (x_2, y_2), the Euclidean distance formula can, for example, be used to calculate the distance L between the two: L = sqrt((x_1 - x_2)^2 + (y_1 - y_2)^2). The targets remaining after filtering can be called candidate targets.
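- A minimal sketch of this candidate filtering, with the distance threshold passed as a parameter (7 meters is used only as an example value):

```python
import math

def candidate_targets(tracked_pos, other_positions, max_dist=7.0):
    """Keep only the targets whose Euclidean distance to the original tracking
    target is below `max_dist`; the remaining targets become candidate targets."""
    x1, y1 = tracked_pos
    kept = {}
    for target_id, (x2, y2) in other_positions.items():
        if math.hypot(x1 - x2, y1 - y2) < max_dist:
            kept[target_id] = (x2, y2)
    return kept
```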
- The process of determining, from the trajectories, the new target after the original tracking target is switched roughly includes the following steps: first, according to the above method, filter out the distant other target 4 to obtain the candidate targets 2 and 3; then establish trajectory feature pairs of the original tracking target and each candidate target and input them into the pre-trained first neural network model to determine the new target after the original tracking target is switched.
- Below, an LSTM neural network is taken as an example of the first neural network to introduce how the trajectory data are analyzed to determine the new target after switching.
- the LSTM network is a time recurrent neural network.
- the LSTM network (as shown in Figure 10(a)) includes the LSTM unit (as shown in Figure 9).
- the LSTM unit includes three gates: Forget gate, Input gate and Output gate .
- the forget gate included in the LSTM unit is used to determine the information to be forgotten, the input gate of the LSTM unit is used to determine the updated information, and the output gate of the LSTM unit is used to determine the output value.
- Figure 9 shows LSTM units at three moments.
- The input of the first LSTM unit is the input at time t-1, the input of the second LSTM unit is the input at time t, and the input of the third LSTM unit is the input at time t+1; the structures of the first, second and third LSTM units are exactly the same.
- The core of the LSTM network is the cell state (the cell is the large box in Figure 9) and the horizontal line crossing it in Figure 9.
- The cell state (i.e., C_{t-1} and C_t in Figure 9) is like a conveyor belt that runs through the entire cell with only a few linear operations; in this way, information can pass through the entire cell almost unchanged, which achieves long-term memory retention.
- The inputs X_{t-1}, X_t and X_{t+1} correspond one to one to the three LSTM units; in Fig. 10(a), X_{t-1}, X_t and X_{t+1} are a set of feature pairs of targets 1 and 2 at times t1, t2 and t3, respectively, as shown in Fig. 10(b).
- P11 represents the location of target 1 at t1
- P12 represents the location of target 2 at t1
- V11 represents the velocity of target 1 at t1
- V12 represents the velocity of target 2 at t1
- θ1 represents the angle between the movement directions of target 1 and target 2 at t1.
- the first neural network needs to be pre-trained before using it.
- Exemplarily, first manually select some historical videos of a person getting into a taxi, and establish trajectory feature pairs of the person and the taxi, that is, the position and speed of the person and of the taxi at each moment; then take the features of different time periods as different samples, label them, and input the labelled samples into the neural network for training. For example, suppose that the trajectory sampling interval is 1 second and a video is selected manually.
- the time when a person gets on the car is 11:01:25 (01'25" in Figure 11).
- In Figure 11, a, b, c, d and e are five staggered time periods, each 10 seconds long; it should be noted that the length of the time periods is not fixed. Taking time period a as an example, its initial time is 01 minutes 10 seconds and its length is 10 seconds.
- the five sets of feature pairs obtained in the above five time periods can be used as five training samples to train the neural network.
- the actual number of training samples is determined by the actual situation or the expected model accuracy.
- After training, the real-time trajectory data can be converted into trajectory features and input into the neural network, which then outputs, for each moment, the probability that each other target is the switched target.
- In this example, the trajectory feature pairs of target 1 and target 2, and the trajectory feature pairs of target 1 and target 3, are input into the LSTM neural network.
- the above trajectory characteristics can be understood as some attributes of the target trajectory, such as position, speed, and angle, as follows:
- O 1 P t (x,y) represents the position of the original target 1 at time t
- O 1 V t represents the velocity of the original target 1 at time t
- O 2 P t (x,y) represents the position of candidate target 2 at time t
- O 2 V t represents the velocity of candidate target 2 at time t
- O 3 P t (x,y) represents the position of candidate target 3 at time t
- O 3 V t represents The velocity of candidate target 3 at time t
- In addition, the angle between the movement directions of two targets can be calculated from the directions of their velocities:
- θ 1 t represents the angle between the movement directions of target 1 and target 2 at time t
- θ 2 t represents the angle between the movement directions of target 1 and target 3 at time t.
- Formula (1) establishes the trajectory feature pair of the original tracking target 1 and the candidate target 2 at each moment, e.g. (O 1 P t , O 1 V t , O 2 P t , O 2 V t , θ 1 t ); formula (2) establishes the trajectory feature pair of the original tracking target 1 and the candidate target 3 at each moment, e.g. (O 1 P t , O 1 V t , O 3 P t , O 3 V t , θ 2 t ).
- In other words, each trajectory feature pair includes the positions and velocities of the two targets concerned (targets 1 and 2, or targets 1 and 3) and the angle between their movement directions (determined from the velocity directions).
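- The construction of such trajectory feature pairs can be sketched as follows; velocities are approximated by finite differences, and the flat 7-value layout of each feature pair is an illustrative assumption rather than the exact encoding used by the embodiment.

```python
import math

def trajectory_feature_pair(p1, v1, p2, v2):
    """Feature pair of the original target (1) and one candidate (2) at one moment:
    positions, speeds, and the angle between their movement directions."""
    angle = math.atan2(v1[1], v1[0]) - math.atan2(v2[1], v2[0])
    angle = abs((angle + math.pi) % (2 * math.pi) - math.pi)      # wrap to [0, pi]
    return [p1[0], p1[1], math.hypot(*v1), p2[0], p2[1], math.hypot(*v2), angle]

def feature_sequence(track1, track2, dt=1.0):
    """Per-moment feature pairs for two time-aligned tracks of (x, y) positions.

    Velocities are finite differences of consecutive positions; the resulting
    sequence is what gets fed to the first (LSTM) network.
    """
    seq = []
    for t in range(1, min(len(track1), len(track2))):
        v1 = ((track1[t][0] - track1[t-1][0]) / dt, (track1[t][1] - track1[t-1][1]) / dt)
        v2 = ((track2[t][0] - track2[t-1][0]) / dt, (track2[t][1] - track2[t-1][1]) / dt)
        seq.append(trajectory_feature_pair(track1[t], v1, track2[t], v2))
    return seq
```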
- After the trajectory feature pairs are input into a pre-trained neural network, such as a Long Short-Term Memory (LSTM) network, the LSTM neural network can directly output, for each moment, the probability that each candidate target is the new target after switching.
- As shown in FIG. 8, F t denotes the probability at each moment that a candidate target is the new target after switching, and the horizontal axis, i.e. the time axis, represents the current moment.
- a probability threshold c is preset. At time t 0 , the probability corresponding to candidate target 2 exceeds the preset threshold, which means that the original target is switched to candidate target 2 at time t 0.
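- A minimal PyTorch sketch of such a first neural network and of the threshold decision; the hidden size, the probability threshold, and the helper names are illustrative assumptions rather than values prescribed by the embodiment.

```python
import torch
import torch.nn as nn

class SwitchScorer(nn.Module):
    """Minimal LSTM mapping a sequence of 7-dimensional trajectory feature pairs to
    a per-moment probability that the candidate is the post-switch target."""
    def __init__(self, feat_dim=7, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                              # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # (batch, time) probabilities

def pick_switched_target(prob_by_candidate, threshold=0.8):
    """Return the first (moment, candidate) whose probability exceeds the threshold."""
    num_steps = next(iter(prob_by_candidate.values())).shape[-1]
    for t in range(num_steps):
        for cand, probs in prob_by_candidate.items():
            if float(probs[t]) > threshold:
                return t, cand
    return None
```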
- It should be noted that, in addition to using a neural network to calculate and analyze the trajectory data, traditional machine-learning classification models such as support vector machines can also be used, or the obtained trajectory data can be judged directly according to manually specified rules.
- Existing research methods mainly use video data to perform intelligent behavior analysis on the target, whereas the embodiment of the present application uses the motion trajectories of the targets to determine whether the tracking target has switched. This change not only saves computing power but also improves calculation speed and ensures real-time tracking, while the use of a pre-trained neural network for the calculation ensures the accuracy of the results.
- Step S44 Use a computer vision method to make a second judgment.
- According to step S43, the time t 0 at which the original target is switched can be obtained, and a best video around that time can then be selected for the secondary judgment; the best video may be a segment shot from the above-mentioned best viewing angle around time t 0.
- It should be noted that, in addition to using only the video from the best viewing angle selected in step S42, multiple nearby videos containing frames in which the original tracking target is switched can also be selected.
- the time period of the video is mainly concentrated near time t 0 , for example, it may be a video of about 6 seconds, and the middle time of the time period covered by the video is t 0 .
- the Region Of Interest may only include the tracked target 16 and the bus 17.
- the process of secondary judgment using computer vision is shown in Figures 12 and 13, including the following steps:
- Step S121 Input the selected video frame data into the pre-trained convolutional neural network, which may be a convolutional neural network (CNN, Convolutional Neural Networks), a three-dimensional convolutional neural network (3D CNN, 3D Convolutional Neural Networks), and so on.
- the convolutional neural network is mainly used to extract the features of the target in the video image and generate the bounding box (bbox) corresponding to the target.
- Pre-training is required before use, and the training set can be an image of a bounding box that has been manually marked.
- A convolutional neural network mainly includes an input layer, convolutional layers, pooling layers, and fully connected layers. The input layer is the input of the entire network; in a convolutional neural network that processes images, it generally represents the pixel matrix of a picture. The input of each node in a convolutional layer is only a small block of the previous layer, and the size of this small block is 3*3, 5*5 or another size; the convolutional layer analyses each small block in greater depth to obtain more abstract features. The pooling layer further reduces the number of nodes passed to the final fully connected layers, thereby reducing the number of parameters in the entire network, and the fully connected layers are mainly used to complete the classification task.
- Step S122 The convolutional neural network extracts the features of the targets in the video and generates their bounding boxes. The features of a target are represented by numerical matrices, and the position of the target determines the coordinates of the four corners of its bounding box. As shown in Figure 14, the tracking target 16 and the bus 17 each receive a bbox, and the corresponding target classifications are generated.
- Step S123 Establish a graph model according to the characteristics of the target and the bounding box corresponding to the target.
- the target identified in the video frame (which can be understood as an ID in the field of target recognition) is used as the node.
- the edges connected between nodes are mainly divided into two categories.
- The first type of edge has two components: 1) the first part represents the similarity of a target between two consecutive frames; the higher the similarity, the larger the value, with a value range of [0, 1]; 2) the second part represents the degree of overlap between the bbox of the target in the current frame and the bbox of the target in the next frame, that is, the IoU (intersection over union); if the two boxes coincide completely, the value is 1.
- The first part and the second part are combined according to preset weights to form the value of the first type of edge.
- the second type of edge represents the distance between two targets in real space. The closer the distance is, the larger the value of the edge.
- the graph model constructed according to the above method is shown in FIG. 15, the first type of edge can be understood as the horizontal edge in the figure, and the second type of edge can be understood as the longitudinal edge in the figure.
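- The graph construction can be sketched as follows, assuming networkx as the graph container; the temporal ("first type") edges combine appearance similarity with bbox IoU and the spatial ("second type") edges grow as targets get closer, as described above, while the specific weights and the 1/(1+d) distance form are illustrative choices.

```python
import numpy as np
import networkx as nx

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def build_graph(frames, w_sim=0.5, w_iou=0.5):
    """Build the two-edge-type graph from per-frame detections.

    `frames[f]` is a list of detections, each a dict with an appearance `feature`
    vector, an image-plane `bbox` and a real-world `position`.
    """
    g = nx.Graph()
    for f, dets in enumerate(frames):
        for i, det in enumerate(dets):
            g.add_node((f, i), **det)
        for i in range(len(dets)):                   # spatial edges within frame f
            for j in range(i + 1, len(dets)):
                dist = np.linalg.norm(np.asarray(dets[i]["position"], float) -
                                      np.asarray(dets[j]["position"], float))
                g.add_edge((f, i), (f, j), weight=1.0 / (1.0 + dist), kind="spatial")
        if f == 0:
            continue
        for i, prev in enumerate(frames[f - 1]):     # temporal edges to frame f-1
            for j, cur in enumerate(dets):
                sim = float(np.dot(prev["feature"], cur["feature"]) /
                            (np.linalg.norm(prev["feature"]) * np.linalg.norm(cur["feature"]) + 1e-9))
                g.add_edge((f - 1, i), (f, j),
                           weight=w_sim * sim + w_iou * iou(prev["bbox"], cur["bbox"]),
                           kind="temporal")
    return g
```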
- Step S124 Input the constructed graph model to the graph convolutional neural network.
- the general graph convolutional neural network includes graph convolutional layer, pooling layer, fully connected layer and output layer.
- the graph convolution layer is similar to image convolution, which performs the information transfer inside the graph and can fully mine the features of the graph; the pooling layer is used for dimensionality reduction; the fully connected layer is used for classification; the output layer outputs the results of the classification.
- the classification result may include the behavior of getting in the car, the behavior of getting off the car, the behavior of getting close to the car, the behavior of being short, the behavior of opening and closing the door of the car, and so on.
- The output of the neural network is not limited to behavior recognition (getting into or out of a car); it can also perform behavior detection, for example determining the characteristics of the car that a person gets into.
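- A minimal sketch of a graph convolutional classifier over such a graph, represented as a node feature matrix plus a weighted adjacency matrix; the layer sizes and the number of behavior classes are illustrative, and symmetric normalisation with self-loops is one common design choice rather than the structure prescribed by the embodiment.

```python
import torch
import torch.nn as nn

class TinyGCN(nn.Module):
    """Two graph-convolution layers, mean pooling, and a linear classifier over
    behavior classes (e.g. boarding, alighting, approaching a vehicle, ...)."""
    def __init__(self, feat_dim, hidden=64, num_classes=5):
        super().__init__()
        self.w1 = nn.Linear(feat_dim, hidden)
        self.w2 = nn.Linear(hidden, hidden)
        self.cls = nn.Linear(hidden, num_classes)

    @staticmethod
    def normalize(adj):
        adj = adj + torch.eye(adj.shape[0])          # add self-loops
        d_inv_sqrt = torch.diag(adj.sum(dim=1).pow(-0.5))
        return d_inv_sqrt @ adj @ d_inv_sqrt         # symmetric normalisation

    def forward(self, node_feats, adj):              # node_feats: (N, F), adj: (N, N)
        a = self.normalize(adj)
        h = torch.relu(self.w1(a @ node_feats))
        h = torch.relu(self.w2(a @ h))
        return self.cls(h.mean(dim=0))               # graph-level class logits
```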
- the post-switching target determined in step S44 is generally the same as the post-switching target determined in step S43. In this case, the target can be directly used as the post-switching target.
- If the switched targets determined in step S43 and step S44 are different, the switched target determined in step S44 is taken as the new tracking target.
- the above-mentioned graph convolutional neural network also needs to be pre-trained before being used.
- Exemplarily, a video of a person getting into a car is obtained manually, a graph model of the targets in the video is generated according to steps S121-S123, and the graph model is labelled as boarding behavior; such labelled graph models serve as the training set of the graph convolutional neural network. After a graph model generated from real-time video data is input, the graph convolutional neural network can therefore automatically output the classification and determine whether the tracking target exhibits cross-target behavior.
- step S44 is not necessary, and step S43 can directly determine a final new target after the handover, which is mainly determined by the preference or actual situation of the monitoring personnel.
- In the above description, a graph convolutional neural network is used to make the judgment; it should be noted that artificial-intelligence classification models other than graph convolutional neural networks can also perform behavior recognition on the selected videos.
- the embodiment of the application uses a computer vision method to make a secondary judgment on whether the original target is switched, constructs a unique graph model, incorporates more time and space information, and improves the accuracy of judgment.
- Step S45 Determine the target after the handover.
- Step S44 is optional, so there are two situations: (1) step S45 is executed directly after step S43, and the target determined in step S43 is used as the switched target; (2) step S44 is executed after step S43, and in step S45 the switched target determined in step S44 is taken as the final switched target. It should be noted that in case (2) (that is, when the switched targets determined in step S43 and step S44 differ), in addition to directly using the target determined in step S44 as the switched target, other alternative strategies can also be used, such as combining the accuracy of the models used in the two steps with the confidence of their output results and selecting one of the two as the final switched target.
- Steps S41-S45 mainly take the scene of a person boarding a bus as an example and switch the tracking target from the tracking target 16 to the bus 17. After that, the bus 17 can be tracked continuously, and at every stop the video images of passengers getting off the bus can be examined to look for the tracking target 16; when the tracking target 16 reappears in a certain frame, the tracking target is switched back to the tracking target 16, thereby achieving continuous tracking of the target.
- The method of automatic cross-target tracking provided by the embodiments of the present application can be summarized as follows. First, the improved Kalman filtering method is used to fuse video and radar data under the same viewing angle, improving the accuracy of the target movement trajectory under a single viewing angle. Next, the motion trajectories of the same target under multiple viewing angles are fused, which effectively solves the occlusion problem and yields the continuous motion trajectory of each target. Then, the trajectories of the tracking target and of other nearby targets are input into the pre-trained neural network to determine whether the original tracking target has switched; if it has, the tracking target is replaced with the new target. In addition, to further improve the accuracy of the judgment, the best video capturing the scene of the switching behavior can be selected on the basis of the previous step for behavior analysis.
- the best video is input to a neural network to extract features, and then a unique graph model is constructed, and a pre-trained graph convolutional neural network is used to make a secondary judgment on whether the target in the video has a cross-target behavior.
- the above method can also be used to help find runaway children or lost elderly people, etc. Regardless of the type of the tracked object, the technical solution provided by the present invention can be applied when tracking it.
- In summary, the automatic cross-target tracking method provided by this application can automatically switch the tracking target when the original target exhibits cross-target behavior, so that real-time, effective tracking is guaranteed and tracking is not interrupted because the target boards a vehicle. From the perspective of the specific implementation of tracking, this application proposes for the first time to use the trajectory of the target in the world coordinate system for cross-target behavior analysis and judgment, which improves the accuracy of the behavior judgment.
- To improve the accuracy of the trajectory, the present invention also provides a series of measures, including: (1) using an original, improved Kalman filtering method to fuse the video and radar trajectories under a single viewing angle; (2) using the Kalman filtering method to fuse the trajectories of the same target under multiple viewing angles.
- Further, the present invention also constructs a unique graph model based on the selected best video and then uses a graph convolutional neural network to make a secondary judgment on the cross-target behavior, incorporating more spatio-temporal information.
- the automatic cross-target tracking method provided by the present application can realize real-time, continuous, and accurate tracking of the target without missing any time period information, thereby improving the convenience and reliability of deployment control tracking.
- this application also provides another modified embodiment for target tracking on the basis of the foregoing embodiment.
- the method includes the following steps: (1) Obtain the movement trajectory of the tracking target 16 (original tracking target).
- the trajectory of the original tracking target can be obtained according to the method described in steps S41 (optional) and S42 in the above embodiment.
- the trajectory is composed of the position of the target within a period of time, and the position of the target can be represented by coordinates in the global coordinate system (for example, the east-north-sky coordinate system).
- According to the movement trajectory of the tracking target 16, the initial moment at which the trajectory disappears is determined as the first moment; that is, when the trajectory of the tracking target 16 no longer appears, the initial moment of its disappearance is taken as the first moment.
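- A minimal sketch of locating this first moment from a time-stamped trajectory; the gap threshold is an illustrative assumption.

```python
def first_disappearance(positions_by_time, max_gap=3.0):
    """Return the initial moment at which the target's trajectory disappears.

    `positions_by_time` maps observation timestamps (seconds) to world positions;
    the first timestamp followed by a gap longer than `max_gap` seconds with no
    observation, or the last timestamp overall, is returned as the "first moment"
    around which the relevant video is then retrieved.
    """
    times = sorted(positions_by_time)
    for t_prev, t_next in zip(times, times[1:]):
        if t_next - t_prev > max_gap:
            return t_prev
    return times[-1] if times else None
```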
- Next, video data for a period of time before and after the first moment is acquired; the video data includes frames in which the tracking target 16 may be updated to the second target.
- Preferably, the tracking target 16 appears complete and clear in the frames of the selected video data.
- The relevant video data can then be input into a pre-trained neural network for analysis.
- the pre-trained neural network may be the neural network used in step S44, and the video content is analyzed through the neural network to determine the target after switching.
- the above method mainly describes the scene of a person (tracking target 16) getting on a car (tracking target after switching).
- After the switched target, such as a bus, is determined, that bus is then tracked.
- When the person gets off the bus, it is still necessary to look for the tracking target 16 in the surveillance images along the bus route, and then continue to track the tracking target 16.
- Compared with the foregoing embodiment, in this modified embodiment only one target trajectory needs to be acquired at each moment throughout the process.
- the trajectory of the original tracking target is directly used to retrieve the relevant video for behavior analysis, which reduces the occupation of computing power to a certain extent, and ensures real-time and accuracy.
- The present application further provides an apparatus for target tracking, which can include an acquisition module and a processing module;
- the acquiring module is configured to acquire sensing data of a target included in the scene where the first target is located.
- The targets included in the scene where the first target is located comprise the first target and at least one other target besides the first target, and the first target is the initial tracking target;
- The processing module is configured to generate the motion trajectories of the first target and the at least one other target according to the sensing data, and is further configured to determine the second target according to the motion trajectory of the first target and the motion trajectories of the other targets, and to use the second target as the tracking target after switching.
- the processing module is specifically configured to determine a set of candidate targets, where the candidate targets are: the at least one other target, or the difference between the at least one other target and the first target Other targets whose distance is less than a preset threshold; for each candidate target, the motion trajectory of the first target and the motion trajectory of the candidate target are input to the pre-trained first neural network, and the candidate target is obtained as the The probability of the second target; the second target is determined according to the probability that the set of candidate targets are the second target.
- the processing module is further configured to detect that the first probability among the probabilities that the at least one candidate target is the second target is higher than a preset threshold, and determine that the target corresponding to the first probability is the The second goal.
- The processing module is further configured to, for each candidate target, establish at least one set of trajectory feature pairs according to the motion trajectories of the candidate target and the first target, where each set includes trajectory feature pairs at at least two consecutive moments; the trajectory feature pair at each moment includes the position and velocity of the first target at that moment, the position and velocity of the candidate target, and the included angle between the movement directions of the first target and the candidate target. The at least one set of trajectory feature pairs is input into the first neural network to obtain the probability that the candidate target is the second target.
- the processing module is further configured to detect that a first probability among the probabilities of the at least one candidate target is higher than a preset threshold, and determine that the target corresponding to the first probability is the second target.
- The processing module is further configured to obtain video data from before and after the first moment, where the video data includes frames in which the first target may exhibit the switching behavior; the video data is input into the pre-trained second neural network, and the third target is determined as the tracking target after switching according to the output result.
- the second neural network includes a convolutional neural network and a graph convolutional neural network
- The processing module is specifically configured to: input the selected video frame data into the pre-trained convolutional neural network and output the features and bounding boxes of all targets in the video frame data; construct a graph model according to the features and the bounding boxes; and input the graph model into the pre-trained graph convolutional neural network, determining the third target as the tracking target after switching according to the output result.
- The sensors include at least two groups of sensors, and different groups of sensors are in different orientations. For each of the first target and the other targets, the processing module is specifically configured to: generate at least two motion trajectories of the target according to the sensing data collected by the at least two groups of sensors respectively, and fuse the at least two motion trajectories of the target to form the motion trajectory of the target.
- Each group of sensors includes at least two types of sensors, namely a camera and at least one of the following two types: millimeter-wave radar and lidar, and the sensor types within a group are in the same orientation. For each target in the scene where the first target is located, the processing module is specifically configured to: respectively generate at least two monitoring trajectories of the target according to the sensing data collected by the at least two types of sensors, and fuse the at least two monitoring trajectories of the target to form the movement trajectory of the target.
- the present application also provides another target tracking device, including an acquisition module and a processing module.
- The acquisition module is used to acquire sensing data of a first target through a sensor, the first target being the initial tracking target; the processing module is used to generate the motion trajectory of the first target according to the sensing data, determine the initial moment at which the trajectory of the first target disappears as the first moment according to the motion trajectory of the first target, obtain video frames for a period of time before and after the first moment, where the video frames include the first target, and determine the second target according to the video frames, using the second target as the updated tracking target.
- the processing module is further configured to determine that the movement track of the first target after the initial moment does not exist, and determine that the initial moment is the first moment.
- the processing module is further configured to input the video frame into a pre-trained second neural network to determine a second target, and use the second target as an updated tracking target.
- the second neural network includes a convolutional neural network and a graph convolutional neural network
- The processing module is further configured to: input the video frames into a pre-trained convolutional neural network and output the features and bounding boxes of the targets contained in the video frames; construct a graph model according to the features and bounding boxes of the targets contained in the video frames; and input the graph model into the pre-trained graph convolutional neural network, determining the second target according to the output result and using the second target as the updated tracking target.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
- The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
- the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
- the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk, SSD), etc.
- FIG. 16 is a schematic diagram of the hardware structure of a computing device for target tracking provided by an embodiment of the application.
- the device 160 may include a processor 1601, a communication interface 1602, a memory 1603, and a system bus 1604.
- the memory 1603 and the communication interface 1602 are connected to the processor 1601 through the system bus 1604 and complete mutual communication.
- the memory 1603 is used to store computer execution instructions
- the communication interface 1602 is used to communicate with other devices
- the processor 1601 executes computer instructions to implement the solutions shown in all the foregoing embodiments.
- the system bus mentioned in FIG. 16 may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
- the system bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
- the communication interface is used to realize the communication between the database access device and other devices (such as the client, the read-write library and the read-only library).
- the memory may include random access memory (RAM), and may also include non-volatile memory, such as at least one disk memory.
- The above-mentioned processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- An embodiment of the present application further provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the method shown in the above method embodiments.
- an embodiment of the present application further provides a chip, which is configured to execute the method shown in the foregoing method embodiment.
- It should be understood that the size of the sequence numbers of the foregoing processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of this application.
Abstract
The present application discloses a method, device and system for target tracking, mainly applied to fields such as person tracking. The method includes: acquiring, through sensors, the movement trajectory of an initial tracking target (for example, a pedestrian) and the movement trajectories of other targets (for example, vehicles) in the scene where the initial tracking target is located; and determining the tracking target after switching according to the movement trajectory of the initial tracking target and the movement trajectories of the other targets in the scene. The above method can help monitoring personnel keep track of the whereabouts of the tracking target; when the tracking target changes to a means of transport (that is, when a switch of the tracking target occurs), the switching of the tracking target and continuous tracking can still be achieved. Compared with the prior art, in which analysis and judgment are performed manually, this can greatly improve the efficiency of target tracking.
Description
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 28, 2020, with application number 202010129282.0 and the title "A method, apparatus and system for cross-target tracking", and to the Chinese patent application filed with the Chinese Patent Office on May 19, 2020, with application number 202010427448.7 and the title "A method, device and system for target tracking", the entire contents of which are incorporated herein by reference.
This application relates to the technical field of intelligent security, and in particular to a method, device and system for target tracking; more specifically, it is applicable to switching the tracking target (for example, switching the tracking target to a car) after the tracking target (for example, a suspicious person) changes to a means of transport (for example, a car).
With the development of video surveillance technology, many cities at home and abroad have deployed high-precision intelligent cameras on major roads to monitor road information. Besides acting as a deterrent to criminals, these surveillance cameras are also one of the main tools for tracking suspicious persons and assisting in solving cases.
Current automatic target tracking methods mainly include: (1) single-camera target tracking, that is, tracking a pedestrian or vehicle within the same camera view; when the target disappears (for example, due to occlusion), the target is re-tracked within that view by a person re-identification (Person Re-identification, Person Re-ID, abbreviated Re-ID) algorithm; (2) cross-camera target tracking, that is, when the target leaves the shooting range of the current camera, the target is recognized in another camera view, likewise by the Re-ID algorithm, and tracking is resumed. Current automatic tracking methods are therefore limited to tracking the same target across individual camera views, which makes the tracking process discontinuous, i.e., the target leaves the monitor's field of view for certain periods of time. In addition, the environments of cross-camera scenes (lighting, occlusion, target posture, etc.) are very complex, and the accuracy and computational efficiency of Re-ID algorithms remain open to question.
In addition to the above scenarios of tracking the same target, cross-target tracking also occurs in practice. Cross-target tracking refers to the situation in which the tracked target switches during monitoring (for example, a suspicious person serving as the initial tracking target changes to a means of transport), so that the switched target needs to be tracked. At present, in such cross-target tracking scenarios the target is mainly switched manually by monitoring personnel; for example, after a suspicious person boards a bus, the monitoring personnel switch the tracked target from the person to the vehicle. Such manually assisted methods cannot guarantee the real-time performance of target tracking and occupy considerable human resources.
In summary, a method that can track a target continuously and accurately is currently needed, so as to improve the efficiency of target tracking.
发明内容
本申请提供一种用于目标追踪的方法、设备及系统,在追踪目标更换交通工具(即:发生目标切换)后,也能够实现追踪目标的切换和持续追踪,这使得监控人员可以完全掌握追踪目标的行踪,较大地提高了目标追踪的效率。
本申请的第一方面提供了一种应用于目标追踪的方法,该方法包括:通过传感器获取第一目标所在场景包含的目标的运动轨迹,其中,第一目标所在场景包含的目标包括第一目标和除所述第一目标之外的至少一个其他目标,且第一目标为初始追踪目标;然后根据第一目标和至少一个其他目标的运动轨迹确定第二目标,并将第二目标作为新的追踪目标。需要说明的是,目标在一段时间内的运动轨迹由构成这段时间的每个时刻下目标的位置构成。目标的位置可以通过坐标来表示,该坐标可以是东-北-天坐标系下的坐标也可以是北-东-地坐标系下的坐标。本发明的所有实施例对于具体的坐标系类型不做具体限定。通过上述方法,当原始追踪目标发生切换时(例如追踪目标上车),视频监控系统就自动确定更新后的追踪目标,保证了追踪目标的行动轨迹的连续性,不会遗漏追踪目标在任何时间段的位置信息,提高了追踪效率。此外,上述方法通过轨迹数据去判断原始追踪目标是否发生切换行为,比起直接利用视频数据进行智能行为分析的方法降低了计算量,减少了对算力的要求。
上述方法中“第一目标所在场景”指的是第一目标所处的真实场景,示例性的,场景的范围可以是一个以第一目标为中心,半径为100米的区域。本申请实施例对于场景的具体范围不做限定,视具体情况而定。需要说明的是,第一目标所在的场景随着第一目标的移动一直在变化,当开始对第一目标执行追踪时,即开始获取第一目标以及周围其他目标的轨迹。
在一种可能的实现方式中,根据第一目标的运动轨迹以及至少一个其他目标的运动轨迹确定第二目标包括:确定一组候选目标,所述候选目标为:所述至少一个其他目标,或在所述至少一个其他目标中,与所述第一目标之间的距离小于预设阈值的其他目标;对于每一个候选目标,将所述候选目标的运动轨迹和第一目标的运动轨迹输入至预训练的第一神经网络,获得该候选目标为所述第二目标的概率;根据至少一个候选目标的概率,确定所述第二目标。上述“候选目标”包含两种情况:(1)场景中除了原始追踪目标以外的所有其他目标;(2)场景中与原始追踪目标距离小于预设阈值的其他目标。上述方法将第一目标的运动轨迹和每一个候选目标的运动轨迹分别输入至预训练的神经网络,输出每个候选目标疑似为第二目标的概率,既保证了准确度又保证了实时性。
示例性的,第一神经网络可以是长短期记忆网络LSTM。在使用第一神经网络之前,需要对其进行训练。举个例子,可以人工筛出一些人上车的历史视频,并且获取在上车前一段时间内人和车的视频数据,生成他们的轨迹数据。将这些轨迹数据转换成轨迹特征对,打上标签作为训练集,训练第一神经网络。在使用第一神经网络时,输入是第一目标所在场景包含的目标的轨迹特征对,输出是各个候选目标疑似为第二目标的概率。
需要说明的是,除了将轨迹数据输入进神经网络以外,还可以采用其他人工智能的算法确定第二目标,例如,可以使用支持向量机(Support Vector Machine,SVM)等传统的分类模型。除了上述列举的人工智能算法以外,也可以采用人为选定的规则进行判断,示例性的,通过轨迹数据获取第一目标与其他目标之间的距离以及他们对应的速度等,将距离的变化和速度的变化等作为参考指标确定第二目标。总之,本申请实施例对于如何使用轨迹数据确定第二目标不做具体限定。
在另一种可能的实现方式中,根据所述获得的概率确定所述第二目标包括:当所述获得的概率中的第一概率高于预设阈值时,确定所述第一概率对应的目标为所述第二目标。在追 踪的过程中会不断计算其他目标中的每个候选目标为第二目标的概率,在某一时刻,当某一个候选目标的概率超过预设阈值时,即可判断该概率对应的候选目标即为第二目标。
在另一种实现方式中,对于每个候选目标,将第一目标的运动轨迹和候选目标的运动轨迹输入至预训练的第一神经网络,获得候选目标疑似为第二目标的概率包括:对于每一个候选目标,根据该候选目标的运动轨迹和第一目标的运动轨迹建立至少一组轨迹特征对,每一组轨迹特征对包括至少两个连续时刻下的轨迹特征对,每个时刻下的轨迹特征对包括在该时刻下第一目标的位置、速率,其他目标的位置、速率,以及第一目标和该其他目标运动方向的夹角。每个候选目标都按照上述方法分别和第一目标建立轨迹特征对,从而可以得到至少一组轨迹特征对,将所述至少一组轨迹特征对输入至第一神经网络即可输出每个候选目标为第二目标的概率。示例性的,如果想要获取当前时刻下某一个候选目标为第二目标的概率,则需要输入一组特征对,该组特征对可以包括10个特征对,这10个特征对分别为当前时刻之前的10个时刻下的该其他目标和第一目标的位置、速率以及夹角。将这一组轨迹特征对输入进神经网络,即可得到当前时刻下该候选目标为第二目标的概率。上述方法将第一目标的轨迹数据分别和各个其他目标的轨迹数据建立轨迹特征对作为神经网络的输入,通过分析每个其他目标的轨迹和第一目标的轨迹之间的特征关系,从至少一个其他目标中确定第二目标。
在另一种可能的实现方式中,设定第一概率高于预设阈值的时刻为第一时刻,所述方法还包括:获取第一时刻前后一段时间的视频帧,所述视频帧包括第一目标;然后将所述视频帧输入至预训练的第二神经网络,根据输出结果确定第三目标作为切换后的追踪目标。上述“包括第一目标”的视频帧指的是该视频画面中出现了原始追踪目标。示例性的,可以是从侧面拍摄到人上车画面的视频帧,也可以是正面拍摄车门的画面。本实现方式是在轨迹判断基础上的进一步验证,当第三目标和第二目标是同一个目标时,直接将该目标作为切换后的目标;当第三目标和第二目标不是同一个目标时,以第三目标作为切换后的目标。上述方法主要是利用了计算机视觉的方法对视频数据进行智能行为分析,判断所述第一目标是否切换成了第二目标。在利用轨迹判断的基础上,又提取了相关的视频数据进行二次判断,提高了最终判断的精度。
在另一种可能的实现方式中,第二神经网络包括卷积神经网络和图卷积神经网络,将视频数据输入至第二神经网络确定第三目标,包括:将视频数据输入至预训练的卷积神经网络,输出所述视频数据中所有目标的特征以及包围框;根据所述视频数据中包含的目标的特征以及所述包围框构建图模型;将所述图模型输入至预训练的图卷积神经网络,根据输出结果确定所述第三目标作为切换后的追踪目标。通过第二神经网络提取视频帧中目标的特征以及生成对应目标的包围框,建立了特有的图模型,再利用图卷积神经网络对目标的行为进行判断以确定切换后的新目标。使用图卷积神经网络,纳入更多的时空信息,提高判断的精度。
第二神经网络包括了卷积神经网络和图卷积神经网络,卷积神经网络主要用于提取视频中目标的特征以及生成目标的包围框,用于构建图模型;图卷积神经网络主要用于根据构建好的图模型判断原始追踪目标是否发生跨目标行为。在使用各个神经网络之前需要对他们进行预先训练。示例性的,对于卷积神经网络而言,训练集可以是已经进行人工标记的包含包围框的图像,图像中有人有车。示例性的,对于图卷积神经网络而言,需要人工选取一段上 车的视频并且将这些视频输入至上述卷积神经网络中提取目标的特征以及生成包围框,按照刚才描述的方法生成图模型,将该图模型标记为上车行为,以此作为训练集训练图卷积神经网络。需要说明的是,除了利用神经网络判断视频中目标是否发生跨目标行为以外,还可以利用传统的机器学习模型进行判断,例如支持向量机(SVM)等等。
在另一种可能的实现方式中,传感器包括至少两组传感器,且不同组传感器所处的方位不同。对于所述至少两组传感器中每一组传感器,根据该组传感器所采集的感应数据生成对应该组传感器的每个目标的运动轨迹,从而获得目标的至少两条运动轨迹;融合所述目标的至少两条运动轨迹,形成目标融合后的运动轨迹。其中,“方位”指的是物体在实际空间中所处的位置或方向。“不同组传感器所处的方位不同”指的是各组传感器的实际物理位置相隔较远,示例性的,组与组之间的距离至少两米以上。在融合从不同方位下获取的目标运动轨迹前需要对不同感应范围下的同一目标进行关联。其中,“感应范围”指的是传感器能感应的空间范围,处于不同方位的传感器所能感应的场景范围也不同。示例性的,可以采用聚类算法将不同感应范围下的第一目标的轨迹(位置的序列)关联起来,表示这些轨迹(位置的序列)属于同一个目标(第一目标),然后再进行融合。在融合不同组传感器的运动轨迹时,可以采用卡尔曼融合的方法。示例性的,每个时刻,在不同感应范围下的同一目标具有多个位置,从中选取一个较优位置作为测量位置,采用拟合等方法获得估计位置,将估计位置和测量位置进行卡尔曼融合即可得到该时刻下目标的最终位置。上述方法融合了目标在不同感应范围下的运动轨迹,形成目标的最终运动轨迹。不同位置的摄像机视角不同,不同位置雷达的感应范围也不同。当在某个感应范围(视角)下,目标被异物(例如广告牌)遮挡时,处于其他方位的传感器可以继续提供目标的位置数据,保证了目标轨迹的连续性。同时,由于环境、光线等不可控因素,仅仅使用一组传感器的话,目标时常会出现丢失的情况,所以融合多组处于不同方位的传感器的轨迹数据,在提高了目标最终运动轨迹精度的同时,也提高了目标追踪的效率。
在另一种可能的实现方式中,每一组传感器包括至少两类传感器,即摄像机以及如下两类传感器中的至少一个:毫米波雷达和激光雷达,且两类传感器处于相同的方位。“一组处于相同方位的传感器”表示构成该组的多类传感器设置于临近的物理位置,譬如:这组传感器包括摄像机和毫米波雷达,这两类传感器安装在街道上的同一电线杆上。本申请实施例对于同一组的至少两类传感器之间的距离不做具体限定,只要保证两类传感器的感应范围大致相同即可。需要说明的是,“方位”指的是传感器在实际空间中所处的物理位置,“感应范围”指的是传感器能感应的空间范围。示例性的,当传感器为摄像机时,它的感应范围指的就是它所能拍摄的场景的空间范围;当传感器为雷达时,它的感应范围指的就是它所能探测到的实际场景的空间范围。对于同一组传感器中的每一类传感器,根据该类传感器所采集的感应数据生成对应该类传感器的所述目标的监测轨迹,从而获得目标的至少两条监测轨迹;融合所述至少两条监测轨迹,形成目标的运动轨迹。示例性的,针对同一个目标,融合不同类传感器采集的轨迹数据可以采用创新的卡尔曼融合方法:采用统一的先验估计值(上一时刻的最优估计值),依次与当前时刻不同类传感器的测量值进行卡尔曼融合,形成当前时刻的最优估计值,每个时刻的最优估计值形成了目标的最终运动轨迹。上述方法融合了相邻位置的至少两类传感器数据,提高了目标的轨迹精度。而且,如果仅仅采用摄像机进行追踪,难免会受 到外界天气的影响,融合了雷达提供的轨迹数据,更能确保不会丢失追踪目标,提高追踪效率。
通过上述描述,本申请提供的目标追踪方法,可以在原始追踪目标发生切换时,自动将追踪目标更新为切换后的新目标,使得监控人员可以一直掌握追踪对象的行踪。另外,本申请采用了目标的运动轨迹数据去确定原始追踪目标切换后的新目标,在保证准确率的前提下大大减少了计算量。进一步的,还选取了一段视频帧数据,建立特有的图模型,使用图卷积神经网络对原始追踪目标的切换行为进行了二次判断,纳入更多时空信息,提高了目标追踪的可靠性。
第二方面,本申请提供另一种用于目标追踪的方法,包括:通过传感器获取第一目标的运动轨迹,第一目标为初始追踪目标;确定第一目标的运动轨迹消失的时刻为第一时刻,获取第一时刻前后一段时间的视频帧,所述视频帧包括所述第一目标可能更新为第二目标的画面;将所述视频帧输入至预训练的神经网络确定第二目标,并将第二目标作为更新后的追踪目标。上述方法通过分析轨迹的特点去寻找原始追踪目标可能发生切换行为(例如追踪目标上车)的时间点,根据时间信息获取包含原始追踪目标画面的相关视频数据进行行为分析从而确定新的追踪目标,比起全程对原始追踪目标进行行为分析大大减少了计算量。
在另一种可能的实现方式中,根据第一目标的运动轨迹确定第一目标轨迹消失的初始时刻为第一时刻包括:判断第一目标在初始时刻之后的运动轨迹不存在,确定所述初始时刻为第一时刻。
在另一种可能的实现方式中,根据视频帧确定第二目标,并将第二目标作为更新后的追踪目标,包括:将视频帧输入至预训练的第二神经网络确定第二目标,并将第二目标作为更新后的追踪目标。使用神经网络对相关的视频进行行为检测,提高了判断精度。
在另一种可能的实现方式中,第二神经网络包括卷积神经网络和图卷积神经网络,将视频数据输入至第二神经网络确定第二目标,包括:将视频数据输入至预训练的卷积神经网络,输出所述视频数据中所有目标的特征以及包围框;根据所述视频数据中所有目标的特征以及所述包围框构建图模型;将所述图模型输入至预训练的图卷积神经网络,根据输出结果确定所述第二目标作为切换后的追踪目标。通过第二神经网络提取视频帧中目标的特征以及生成对应目标的包围框,建立了特有的图模型,再利用图卷积神经网络对目标的行为进行判断以确定切换后的新目标。使用图卷积神经网络,纳入更多的时空信息,提高判断的精度。
在另一种可能的实现方式中,传感器包括至少两组传感器,且不同组传感器所处的方位不同。对于所述至少两组传感器中每一组传感器,根据该组传感器所采集的感应数据生成对应该组传感器的第一目标的运动轨迹,从而获得第一目标的至少两条运动轨迹;融合第一目标的至少两条运动轨迹,形成融合后的第一目标的运动轨迹。其中,“方位”指的是物体在实际空间中所处的位置或方向。“不同组传感器所处的方位不同”指的是各组传感器的实际物理位置相隔较远,示例性的,距离至少两米以上。“感应范围”指的是传感器能感应的空间范围,处于不同方位的传感器所能感应的场景范围也不同。示例性的,“不同感应范围”对于摄像机 而言就是从不同的视角去拍摄场景中的目标,对于雷达而言就是从不同的方位去探测场景中的目标。在融合不同组传感器的运动轨迹时,可以采用卡尔曼融合的方法。示例性的,每个时刻在不同感应范围下的第一目标有多个位置数据,从中选取一个较优位置作为测量位置,采用拟合等方法获得估计位置,将估计位置和测量位置进行卡尔曼融合即可得到该时刻下第一目标的最终位置。上述方法融合了第一目标在多视角下的运动轨迹,形成了第一目标的最终运动轨迹。不同位置的摄像机视角不同,不同位置雷达的感应范围也不同。当在某个感应范围(视角)下,第一目标被异物(例如广告牌)遮挡时,处于其他方位的传感器可以继续提供目标的位置数据,保证了第一目标轨迹的连续性。
在另一种可能的实现方式中,每一组传感器包括至少两类传感器,即摄像机以及如下两类传感器中的至少一个:毫米波雷达和激光雷达,且两类传感器处于相同的方位。“一组处于相同方位的传感器”表示构成该组的多类传感器设置于临近的物理位置,譬如:这组传感器包括摄像机和毫米波雷达,这两类传感器安装在街道上的同一电线杆上。本申请实施例对于同一组的至少两类传感器之间的距离不做具体限定,只要保证两类传感器的感应范围大致相同即可。需要说明的是,“方位”指的是传感器在实际空间中所处的物理位置,“感应范围”指的是传感器能感应的空间范围。示例性的,当传感器为摄像机时,它的感应范围指的就是它所能拍摄的场景的空间范围;当传感器为雷达时,它的感应范围指的就是它所能探测到的实际场景的空间范围。对于同一组传感器中的每一类传感器,根据该类传感器所采集的感应数据生成对应该类传感器的所述第一目标的监测轨迹,从而获得第一目标的至少两条监测轨迹;融合所述至少两条监测轨迹,形成第一目标的运动轨迹。示例性的,融合轨迹可以采用创新的卡尔曼融合方法:采用统一的先验估计值(上一时刻的最优估计值),依次与当前时刻不同类传感器的测量值进行卡尔曼融合,形成当前时刻的最优估计值,每个时刻的最优估计值形成了第一目标的运动轨迹。上述方法融合了相邻位置的至少两类传感器数据,提高了第一目标的轨迹精度。而且,如果仅仅采用摄像机进行追踪,难免会受到外界天气的影响,融合了雷达提供的轨迹数据,更能确保不会丢失追踪目标,提高追踪效率。
第三方面,本申请提供一种用于目标追踪的装置,包括:获取模块和处理模块;获取模块,用于获取第一目标所在场景包含的目标的感应数据,所述第一目标所在场景包含的目标包括第一目标和除所述第一目标之外的至少一个其他目标,所述第一目标为初始追踪目标;处理模块,用于根据所述感应数据生成第一目标和至少一个其他目标的运动轨迹;处理模块还用于根据第一目标以及至少一个其他目标的运动轨迹确定第二目标,并将第二目标作为更新后的追踪目标。
在另一种可能的实现方式中,处理模块还用于,确定一组候选目标,所述候选目标为:所述至少一个其他目标,或在所述至少一个其他目标中,与所述第一目标之间的距离小于预设阈值的其他目标;对于所述每一个候选目标,将所述第一目标的运动轨迹和所述候选目标的运动轨迹输入至预训练的第一神经网络,获得所述候选目标为所述第二目标的概率;根据所述至少一个候选目标为所述第二目标的概率,确定所述第二目标。
在另一种可能的实现方式中,所述处理模块还用于,检测所述至少一个候选目标为所述第 二目标的概率中的第一概率高于预设阈值,确定所述第一概率对应的目标为所述第二目标。
在另一种可能的实现方式中,处理模块还用于:对于每一个所述候选目标,根据所述候选目标和所述第一目标的运动轨迹建立至少一组轨迹特征对,每一组轨迹特征对至少两个连续时刻下第一目标的位置、速率,候选目标的位置、速率,以及所述第一目标和所述其他目标运动方向的夹角;将所述至少一组轨迹特征对输入至第一神经网络,输出所述候选目标为第二目标的概率。
在另一种可能的实现方式中,第一概率高于预设阈值的时刻为第一时刻,处理模块还用于,选取第一时刻前后的视频帧,该视频帧包括所述第一目标可能更新为第三目标的画面;将视频帧输入至预训练的第二神经网络,根据输出结果确定第三目标为更新后的追踪目标。
在另一种可能的实现方式中,第二神经网络包括卷积神经网络和图卷积神经网络,所述处理模块具体用于,将视频帧输入至预训练的卷积神经网络,输出视频帧中所包含目标的特征以及包围框;根据目标的特征以及包围框构建图模型;将图模型输入至预训练的图卷积神经网络,根据输出结果确定第三目标作为切换后的追踪目标。
在另一种可能的实现方式中,传感器包括至少两组传感器,不同组传感器所处的方位不同,针对所述第一目标和所述其他目标中的每个目标,处理模块具体用于:分别根据所述至少两组传感器模块所采集的感应数据生成目标的至少两条运动轨迹;融合所述目标的至少两条运动轨迹,形成所述目标的运动轨迹。
在另一种可能的实现方式中,每一组传感器包括至少两类传感器,所述至少两类传感器为摄像机以及如下两类传感器中的至少一类:毫米波雷达和激光雷达,且所述至少两类传感器处于同一方位,针对所述第一目标所在场景包含的目标中的每个目标,处理模块具体用于:分别根据至少两类传感器模块所采集的感应数据生成目标的至少两条监测轨迹;融合目标的至少两条监测轨迹,形成目标的运动轨迹。
第四方面,本申请提供另一种用于目标追踪的装置,包括获取模块和处理模块,获取模块用于,通过传感器获取第一目标的感应数据;处理模块用于,根据感应数据生成第一目标的运动轨迹;确定所述第一目标轨迹消失的初始时刻为第一时刻;获取所述第一时刻前后一段时间的视频数据,所述视频数据包括所述第一目标可能更新为第二目标的画面;根据所述视频数据确定所述第二目标,并将所述第二目标作为更新后的追踪目标。
在另一种可能的实现方式中,处理模块还用于,判断第一目标在初始时刻之后的运动轨迹不存在,确定所述初始时刻为第一时刻。
在另一种可能的实现方式中,处理模块还用于,将视频数据输入至预训练的第二神经网络确定第二目标,将第二目标作为更新后的追踪目标。
在另一种可能的实现方式中,第二神经网络包括卷积神经网络和图卷积神经网络,处理模块还用于将所述视频帧输入至预训练的卷积神经网络,输出所述视频数据中所包含目标的特征以及包围框;根据所述视频帧中所包含的目标的特征以及包围框构建图模型;将所述图模型输入至预训练的图卷积神经网络,根据输出结果确定所述第二目标,并将所述第二目标作为更新后的追踪目标。
在另一种可能的实现方式中,传感器包括至少两组传感器,不同组传感器所处的方位不同,处理模块具体用于:分别根据所述至少两组传感器模块所采集的感应数据生成第一目标的至少两条运动轨迹;融合所述第一目标的至少两条运动轨迹,形成所述第一目标的运动轨迹。
在另一种可能的实现方式中,每一组传感器包括至少两类传感器,所述至少两类传感器为摄像机以及如下两类传感器中的至少一类:毫米波雷达和激光雷达,且所述至少两类传感器处于同一方位,处理模块具体用于:分别根据至少两类传感器模块所采集的感应数据生成第一目标的至少两条监测轨迹;融合第一目标的至少两条监测轨迹,形成第一目标的运动轨迹。
第五方面,本申请提供一种用于目标追踪的设备,该设备包括处理器和存储器,其中:存储器中存储有计算机指令;处理器执行所述计算机指令,以实现上述第一方面及可能实现方式中任一所述的方法。
第六方面,本申请提供一种用于目标追踪的设备,该设备包括处理器和存储器,其中:存储器中存储有计算机指令;处理器执行所述计算机指令,以实现上述第二方面及可能实现方式中任一所述的方法。
第七方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序代码,当其在计算机上运行时,使得计算机执行上述第一方面及可能实现方式中任一所述的方法。这些计算机可读存储包括但不限于如下的一个或者多个:只读存储器(read-only memory,ROM)、可编程ROM(programmable ROM,PROM)、可擦除的PROM(erasable PROM,EPROM)、Flash存储器、电EPROM(electrically EPROM,EEPROM)硬盘驱动器(Hard drive)。
第八方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序代码,当其在计算机上运行时,使得计算机执行上述第二方面及可能实现方式中任一所述的方法。这些计算机可读存储包括但不限于如下的一个或者多个:只读存储器(read-only memory,ROM)、可编程ROM(programmable ROM,PROM)、可擦除的PROM(erasable PROM,EPROM)、Flash存储器、电EPROM(electrically EPROM,EEPROM)硬盘驱动器(Hard drive)。
第九方面,本申请提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面及可能实现方式中任一所述的方法。
第十方面,本申请提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计 算机执行上述第二方面及可能实现方式中任一所述的方法。
FIG. 1 is a schematic diagram of an application scenario of the target tracking method provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of a system architecture for target tracking provided by an embodiment of the present application.
FIG. 3 is a schematic flowchart of the target tracking method provided by an embodiment of the present application.
FIG. 4 is another schematic flowchart of the target tracking method provided by an embodiment of the present application.
FIG. 5 is a schematic flowchart of the method for fusing a video trajectory and a radar trajectory provided by an embodiment of the present application.
FIG. 6 is a table of the positions of a target at different moments and under different viewing angles provided by an embodiment of the present application.
FIG. 7 is a schematic diagram of the two-dimensional motion trajectories of the original tracking target and other targets provided by an embodiment of the present application.
FIG. 8 is a schematic diagram of the probability, output by the LSTM neural network, that each other target is the new target after switching at each moment, provided by an embodiment of the present application.
FIG. 9 is a schematic structural diagram of an LSTM unit provided by an embodiment of the present application.
FIG. 10(a) is a schematic structural diagram of the LSTM neural network provided by an embodiment of the present application.
FIG. 10(b) is a schematic diagram of feature pairs provided by an embodiment of the present application.
FIG. 11 is a distribution diagram of the time periods of the training samples used to train the LSTM neural network provided by an embodiment of the present application.
FIG. 12 is a schematic flowchart of the method for making a secondary judgment using computer vision provided by an embodiment of the present application.
FIG. 13 is a schematic structural flowchart of the second neural network provided by an embodiment of the present application.
FIG. 14 is a schematic diagram of target bounding boxes provided by an embodiment of the present application.
FIG. 15 is a schematic diagram of the graph model provided by an embodiment of the present application.
FIG. 16 is a schematic diagram of the hardware structure of the target tracking device provided by an embodiment of the present application.
为使本发明实施例的目的、技术方案和优点更加清楚,下面将结合附图对本发明的技术方案进行清楚、完整地描述。
应用场景介绍
图1为本申请提供的目标追踪方法的应用场景示意图。示例性的,图1中追踪目标16的前方以及斜后方各有一组监控设备。前方的监控设备10包括监控摄像头11以及雷达12,斜后方的监控设备13包括监控摄像头14以及雷达15,两组监控设备的感应范围(视角)不同,各组监控设备都会对自身感应范围内出现的目标进行监控、追踪,并形成该目标在世界坐标系下的运动轨迹。当监控设备为监控摄像头时,感应范围指的是摄像头所能拍摄到的场景范围;当监控摄像头为雷达时,感应范围指的是雷达所能探测到的空间范围;视角指的是摄像机所拍摄的方向,不同的视角对应着不同的感应范围。运动轨迹指的是目标在一段时间的位置,可以由坐标表示。根据摄像头、雷达的位置标定,将每个时刻拍摄或者探测到的目标位置投射到全局坐标系中,从而可以形成目标的运动轨迹。为了提高在某一感应范围下的轨迹精度,除了利用监控摄像头采集视频数据以外,在与摄像头的相邻的物理位置也安装有毫米波雷达(或者激光雷达)。下面直接使用“雷达”概括表示“毫米波雷达或者激光雷达”。融合相同感应范围下的雷达数据以及视频数据,可以形成精度更高的运动轨迹。此外,为了减 少遮挡对追踪的影响,保证轨迹的连续性,还需要融合多个感应范围下的运动轨迹,也就是将监控设备10和监控设备13所采集到的轨迹数据进行融合,可以得到目标的运动轨迹。在被追踪的过程中,追踪目标16即将坐公交车17离开。此时此刻,由于追踪目标16上了车导致追踪目标消失,即使已经融合了追踪目标16在不同感应范围(多视角)下的运动轨迹,也无法继续实时追踪了,因为不管在哪个镜头下都无法找到追踪目标16。所以本申请提出的方案是:首先根据追踪目标16以及周围目标(例如公交车17或者更多其他的目标)的运动轨迹去判断追踪目标16(原追踪目标)是否发生切换行为(切换交通工具),如果确认追踪目标16发生了切换行为,比如追踪目标16坐上了公交车17,则将追踪目标切换为公交车17,直到原始追踪目标又重新出现在某个监控场景画面中。进一步的,通过轨迹确定了新的追踪目标以后,为了提高判断精度,还会选取包括疑似发生切换行为画面的视频数据用于行为分析,判断验证原始追踪目标是否的确发生切换行为。
本申请提供的用于自动追踪的方法,可以对目标进行持续性追踪,即使目标切换了交通工具,也可以对切换后的目标进行监控追踪,不会遗漏任何时间段,极大程度上提高了人员布控追踪的便利性以及可靠性。
系统架构介绍
图2为本申请提供的一种自动追踪系统的系统架构示意图。如图所示,该系统包括端侧节点21(Terminal Node,TNode)、边缘侧节点22(Edge Node,ENode)以及服务器侧节点23(Server Node,SNode)。每个节点都可以单独执行计算任务,各个节点之间还可以通过网络进行通信,完成任务的下发以及结果的上传。网络传输方式包括以有线传输和无线传输两种方式。其中,有线传输方式包括利用以太、光纤等形式进行数据传输,无线传输方式包括3G(Third generation)、4G(Fourth generation)、或5G(Fifth generation)等宽带蜂窝网络传输方式。
端侧节点21(TNode)可以用于融合同一感应范围下的视频轨迹以及雷达轨迹。端侧节点21可以摄像机本身,也可以是具有计算能力的各种处理器设备。物理位置相邻的摄像机以及雷达采集到的数据可以直接在端侧节点进行融合计算,无需进行网络传输,减少了带宽的占用,降低了时延。边缘侧节点22(ENode)可以用于融合不同感应范围(视角)下的轨迹数据。边缘侧节点22可以是边缘计算盒子,包括交换机、存储单元、配电单元、计算单元等等。服务器侧节点23(SNode)主要用于对追踪目标进行跨目标行为判断。服务器侧节点23可以是云端服务器,对端侧节点21以及边缘侧节点22上传的数据进行存储和计算,实现任务的调配。本申请对云端服务器设备的类型以及虚拟化管理方式不做限定。
需要说明的是,上述各个计算节点所执行的任务并不是固定的,例如边缘侧节点22也可以融合同一感应范围下的视频轨迹和雷达轨迹,服务器侧节点23也可以融合不同感应范围(多视角)下的运动轨迹。具体实现方式包括但不限于如下三种:(1)不同组的传感器(摄像机和/或雷达)直接将采集到的感应数据传输至服务侧节点23,由服务侧节点23执行所有的计算和判断。(2)不同组的传感器(摄像机和/或雷达)直接将采集到的感应数据传输至边缘侧节点22,由边缘侧节点22先对同一组的传感器感应数据进行融合,接着融合不同组的轨迹数据,从而获得目标的连续运动轨迹,然后将计算结果传输至服务器侧节点23。服务器侧节点23根据轨迹数据计算确定第二目标(也可以由边缘侧节点22实现),以及根据视频数据进行二次验证。(3)同一组的摄像机和雷达数据先传递给距离最近的端侧节点21,端侧 节点21负责融合同一感应范围下的视频和雷达轨迹,然后多个端侧节点21将融合后的轨迹数据传递给边缘侧节点22,由边缘侧节点融合同一目标的在不同感应范围(视角)下的轨迹,从而得到目标的连续运动轨迹。然后边缘侧节点22将目标的连续运动轨迹传输至服务器侧节点23,服务器侧节点23根据轨迹数据确定第二目标,以及根据视频数据进行二次验证。总之,本发明申请对于具体由哪个节点执行计算功能,并不做具体的限定,主要由用户习惯、网络带宽或者硬件本身的计算能力等决定。
整体方案
下面将结合图3对本发明方案的整体流程进行介绍,方法包括如下步骤:
S31:获取传感器采集的第一目标所在场景包含的目标的感应数据。其中,目标包括第一目标和其他目标,第一目标指的是原始追踪目标。示例性的,第一目标所在场景可以是以原始追踪目标为中心半径20米的区域。示例性的,如果第一目标是追踪目标16,那么其他目标可以是与追踪目标16之间的距离小于20米的交通工具。如果传感器是监控摄像机,那么感应数据即为该监控摄像机所拍摄的视频数据;如果传感器是毫米波雷达,那么感应数据即为该毫米波雷达所探测到的目标与雷达之间的距离数据;如果传感器是激光雷达,那么感应数据即为该激光雷达所探测到的目标的位置、速度等数据。
S32:根据传感器采集的感应数据生成每个目标的运动轨迹。这里的每个目标指的是第一目标和第一目标所在场景中的其他目标。示例性的,当传感器为摄像机时,摄像机经过预先标定,可以直接通过视频数据获取其中目标在世界坐标系下的位置;当传感器为雷达时,雷达本身的位置是固定已知的,测量目标与雷达之间的距离即可获得目标在世界坐标系下的位置;当传感器为激光雷达时,激光雷达本身的坐标也是已知的,通过发射激光束来获取目标的三维点阵数据,可以直接获取目标的位置、速度等信息。目标在世界坐标系下每个时刻的位置构成了目标的运动轨迹。
S33:根据第一目标和其他目标的运动轨迹确定第二目标,并将第二目标作为切换后的目标。当第一目标发生切换行为时(跨目标),利用原始追踪目标的轨迹和其他追踪目标的轨迹确定第二目标,并将第二目标作为新的追踪目标进行持续追踪。
本方案考虑了在追踪过程中目标的现实运动情况,灵活切换追踪目标,形成了无死角、无时间缝隙的追踪,提高了追踪效率。而且,本方案利用轨迹数据确定新的追踪目标,比起现有技术全程通过计算机视觉方法进行行为判断,本方案的计算量大大降低,更能保证实时性的需求。
方案实现细节
介绍完了方案的整体流程之后,下面将详细展示本发明的具体实现方案。示例性的,原始追踪目标为追踪目标16,如图4所示,完整的方案主要包括五个步骤(部分步骤是可选的):
步骤S41:针对每一组传感器,融合该组传感器中视频和雷达的感应数据,获得目标在该组传感器感应范围下的运动轨迹。
同一组传感器内可以包括至少两类传感器,且这两类传感器处于相同的方位。示例性的,如图1中摄像机11和雷达12所示。相同方位指的是物理位置邻近,例如可以是安装在同一根电线杆上。图中方位1指的是摄像机11和雷达12处于相邻的物理位置即方位1,将二者 采集到的针对追踪目标16的感应数据进行卡尔曼融合,可以得到从方位1这个位置感应并计算到的追踪目标16的轨迹数据(即感应范围1下的运动轨迹)。同理,摄像机13和雷达14的位置也相近(都处于方位2),融合二者的感应数据也可以获得从方位2感应到的目标的轨迹数据(即感应范围2下的运动轨迹)。需要说明的是,感应范围对于摄像机而言,指的是摄像机所能拍摄的场景范围;对于雷达而言,指的是雷达所能探测到的场景范围。处于相同方位的传感器(例如摄像机11、雷达12)的感应范围大致相同。
这里的目标包括了原始追踪目标以及原始追踪目标所在场景中的其他目标,生成目标的运动轨迹指的是针对每个目标都生成其对应的轨迹。可选的,在采集视频数据时,为了提升视频轨迹的精度,可以采用3D检测的方法判断目标的质心。此外,当目标为车时,还可以利用车的局部贴地特征或者前后车轮来判断质心以提高视频轨迹的精度。
为了提高在某一方位下感应到的目标运动轨迹的精度,可以将多类传感器得到的轨迹数据进行融合。例如将通过摄像机获取到的轨迹数据和通过毫米波雷达获取到的轨迹数据进行融合,除此之外还可以将摄像机和激光雷达轨迹数据进行融合,甚至将摄像机、毫米波雷达、激光雷达三者的数据进行融合,本申请实施例对融合的传感器的种类不做限定,视具体情况而定。
下面以获取追踪目标16在感应范围1下的运动轨迹为例，融合摄像机11和雷达12的轨迹数据，具体展示本申请实施例提供的融合方法。本方法应用了卡尔曼融合的计算思想。追踪目标16的轨迹是由每个时刻的位置所构成的，融合摄像机和雷达的轨迹数据也就是融合这两类数据提供的追踪目标16在每个时刻下的位置，形成追踪目标16在感应范围1下的运动轨迹。在融合之前，必然需要将追踪目标16的雷达数据（位置）和视频数据（位置）进行对应。示例性的，可以分析雷达的回波图像生成追踪目标16所在场景包含的目标的轮廓图像，结合标定信息可以区分雷达监测到的各个目标位置对应于视频中的哪个目标。
为了方便描述融合的具体过程，将通过摄像机采集数据获取到的追踪目标16的位置命名为视频测量位置，将通过雷达采集数据获取到的追踪目标16的位置命名为雷达测量位置。图5展示了t时刻下，追踪目标16的视频数据和雷达数据是如何融合的。首先，t-1时刻有一个最优估计位置F_{t-1}，根据这个上一时刻的最优估计位置F_{t-1}可以预测得到t时刻的预测位置E_t，具体的预测公式可以是根据经验而推导出来的公式。t时刻，根据摄像机采集到的视频数据可以得到视频测量位置V_t。将t时刻的预测位置E_t以及视频测量位置V_t进行卡尔曼融合得到中间最优估计位置M_t。与此同时，t时刻根据雷达获取的数据可以获得目标的雷达测量位置R_t，将中间最优估计位置M_t和雷达测量位置R_t进行卡尔曼融合，可以得到t时刻最终的最优估计位置F_t。每个时刻的最优估计位置都根据上一时刻的最优估计位置计算，初始时刻最优估计位置可以选取初始时刻的视频测量位置或者雷达测量位置，或者是二者融合计算以后的位置。至此，获得的每个时刻的最优估计位置组成了追踪目标16在感应范围1下的运动轨迹。值得说明的是，上述融合的过程也可以更换为先融合雷达数据再融合摄像机的数据，总之，本申请实施例对于融合的传感器的类别、数量以及融合的先后顺序都不做限定。除了追踪目标16（第一目标）以外，其他目标例如公交车17的运动轨迹也同样按照上述方法获取。
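为便于理解上述逐时刻融合的流程，下面给出一个一维位置上的简化示意（假设匀速运动、用标量方差近似卡尔曼增益，并非本申请的完整实现；process_var、video_var、radar_var等参数均为假设值）。

```python
def fuse(pred, pred_var, meas, meas_var):
    """标准卡尔曼更新的一维形式：按方差加权融合预测值与测量值。"""
    k = pred_var / (pred_var + meas_var)      # 卡尔曼增益
    est = pred + k * (meas - pred)
    est_var = (1 - k) * pred_var
    return est, est_var

def track(video_meas, radar_meas, dt=1.0, v0=0.0,
          process_var=0.5, video_var=2.0, radar_var=0.5):
    """video_meas、radar_meas为逐时刻位置测量序列，返回每个时刻的最优估计位置F_t。"""
    f, f_var = video_meas[0], video_var        # 初始时刻直接取视频测量位置
    traj = [f]
    for t in range(1, len(video_meas)):
        e_t = f + v0 * dt                      # 预测位置E_t（由上一时刻F_{t-1}推算）
        e_var = f_var + process_var
        m_t, m_var = fuse(e_t, e_var, video_meas[t], video_var)   # 先融合视频 → M_t
        f_new, f_var = fuse(m_t, m_var, radar_meas[t], radar_var) # 再融合雷达 → F_t
        v0 = (f_new - f) / dt                  # 粗略更新速度
        f = f_new
        traj.append(f)
    return traj
```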
需要说明的是，除了应用卡尔曼思想融合视频和雷达轨迹以外，还可以采用最简单的加权平均算法进行融合。示例性的，t=1时刻根据视频数据可以得到追踪目标16的位置A，根据雷达数据可以得到追踪目标的位置B，直接将位置A和位置B进行加权平均计算即可得到t=1时刻追踪目标16在感应范围1下的最终位置。可选的，融合策略可以规定距离传感器60米以上目标的轨迹数据以雷达为主（权重更高），距离传感器60米以内目标的轨迹数据以视频为主（权重更高）。
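对应上面提到的加权平均融合策略，可以写成如下示意代码（60米阈值取自上文举例，0.7/0.3的权重取值为假设值，实际需按场景调整）。

```python
def weighted_fuse(video_pos, radar_pos, dist_to_sensor, far_threshold=60.0):
    """距离传感器较远时以雷达测量为主，较近时以视频测量为主（权重为示意值）。"""
    w_radar = 0.7 if dist_to_sensor > far_threshold else 0.3
    w_video = 1.0 - w_radar
    return (w_video * video_pos[0] + w_radar * radar_pos[0],
            w_video * video_pos[1] + w_radar * radar_pos[1])
```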
本申请实施例通过创新的卡尔曼滤波方法,融合在同一感应范围下不同传感器提供的测量位置数据,提高了目标运动轨迹的精度。需要注意的是,本申请实施例的融合方法不是单纯地将视频预测位置和视频测量位置进行卡尔曼融合,将雷达预测位置和雷达测量位置进行卡尔曼融合,然后再融合这两类传感器的轨迹。本发明申请实施例的关键点在于在同一时刻下采用了统一的预测位置,然后融合不同传感器的测量位置,所以在这过程中会产生中间最优估计位置。这样的融合方法使得目标在每个时刻的最终位置都参考了多种传感器的感应数据,提高了轨迹的精度。
需要说明的是，步骤S41在整个方案中不是必须的，在实际情况中，摄像机周围可能没有安装雷达（毫米波雷达或者激光雷达），在这样的情况下某一感应范围下的轨迹数据可以直接从摄像机采集的视频数据中获取，无需融合。总之，步骤S41是可选的，主要由应用场景的实际情况以及监控人员的需求等决定。
步骤S42:融合目标在不同感应范围下的运动轨迹。
这里的目标包括原始追踪目标和原始追踪目标所在场景中的其他目标。获取了目标在某一感应范围下的运动轨迹之后,由于目标的不断移动,有可能被异物(广告牌、公交车)遮挡,或者目标直接离开监控范围,导致目标的轨迹中断。为了获得目标的连续运动轨迹,首先需要关联不同视角下的同一目标,然后再将同一目标的不同视角轨迹进行融合。
因为在每一个感应范围内都有多个目标（追踪目标16、其他目标例如公交车17），在融合目标在不同感应范围下的轨迹之前需要先将同一目标在不同感应范围下的轨迹关联起来，也就是要确保在融合多个感应范围的轨迹前获得的运动轨迹是属于同一个目标的。示例性的，关联的具体步骤可以如下：首先，获取各个目标在不同感应范围下的感应数据，并且提取其中目标的特征（人的发色、衣服颜色，车的颜色、形状等等）。然后，将t时刻的目标轨迹位置P和目标特征C组成组对 (P_t^{n,k}, C_t^{n,k})。其中，P_t^{n,k} 代表t时刻目标n在视角k下的轨迹位置；C_t^{n,k} 代表t时刻目标n在视角k下的特征信息，例如车的部位、车头方向、车身轮廓等等。通过聚类算法对各个目标在不同的感应范围下检测到的特征和轨迹位置对进行聚类关联，如果能聚在同一个类别就判定这些特征和轨迹位置对应属于同一个物体。聚类算法可以是基于密度的聚类算法（DBSCAN，Density-Based Spatial Clustering of Applications with Noise），本申请实施例对于关联同一目标使用的聚类算法不做具体限定。聚类之后，如果某个类内有 (P_t^{n,1}, C_t^{n,1})、(P_t^{n,2}, C_t^{n,2})、(P_t^{n,4}, C_t^{n,4}) 这几个组对信息，那么这几个组对就属于同一个目标物体，示例性的，即 P_t^{n,1}、P_t^{n,2}、P_t^{n,4} 就是来自追踪目标16在不同视角（视角1、2、4）下测量到的轨迹位置。至此，在t时刻下，同一个目标在不同视角下的轨迹位置已经关联了起来，接下来就需要对该目标在多个视角下测量到的轨迹位置进行融合。需要说明的是，“视角”指的是摄像机所拍摄的方向，每个视角都对应着一个感应范围，不同的视角对应着不同的感应范围。处于不同方位的摄像机都有着不同的视角，从而对应着不同的感应范围。
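关联同一目标在不同视角下的（位置, 特征）组对，可以用如下基于scikit-learn的DBSCAN示意实现（特征向量的维度、eps等参数均为假设值，实际需要按标定精度与特征类型调参）。

```python
import numpy as np
from sklearn.cluster import DBSCAN

def associate_views(pairs, eps=1.5):
    """pairs: [(view_id, position(2,), feature(d,)), ...]；返回聚类标签，标签相同者视为同一目标。"""
    # 将轨迹位置与归一化后的外观特征拼接为聚类向量（位置与特征量纲不同，这里仅作示意处理）
    vecs = np.array([np.concatenate([p[1], p[2] / (np.linalg.norm(p[2]) + 1e-6)]) for p in pairs])
    return DBSCAN(eps=eps, min_samples=1).fit_predict(vecs)

# 用法示意：标签相同的组对属于同一个目标，可送入后续的多视角轨迹融合
pairs = [(1, np.array([10.0, 3.0]), np.random.rand(8)),
         (2, np.array([10.2, 3.1]), np.random.rand(8)),
         (4, np.array([35.0, 9.0]), np.random.rand(8))]
print(associate_views(pairs))
```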
如图6所示，整个表格都是经过聚类算法关联之后追踪目标16在不同时刻以及不同感应范围下的位置，每一行的时刻是相同的，每一列的感应范围（视角）是相同的。在这里以仅仅包括摄像机这一类传感器为例，展示融合多视角轨迹数据的方法细节。本申请实施例将卡尔曼融合的思想应用于轨迹融合，具体融合方法如下：
t=0时刻也就是初始时刻,可以随机选取某一个视角的测量位置作为初始时刻的估计位置。t=1时刻,有视角1、视角2…视角k下追踪目标16的测量位置,根据这些视角是否是进口道、拍摄该视角所对应设备和目标之间的距离等原则,选择一个较优视角下的测量位置,作为t=1时刻下的目标测量位置。根据t=0时刻的最终位置(最优位置)预测t=1时刻的位置,得到t=1时刻的目标预测位置。其中,根据上一时刻最终位置预测当前时刻位置的方法有多种,示例性的,可以采用拟合的方法或者直接根据上一时刻的速度(包括大小和方向)和位置推算当前时刻的位置。将目标预测位置和选定的t=1时刻下的目标测量位置进行卡尔曼融合,即可得到t=1时刻下的最优估计位置。同理,t=2时刻也选取一个最优视角下的目标测量位置和t=2时刻下的目标预测位置(根据t=1时刻的最优估计位置进行预测得到)进行卡尔曼融合,即可得到t=2时刻下的最优估计位置。每个时刻重复下去即可得到追踪目标16在融合了不同视角数据之后的连续运动轨迹。周围其他目标也按照上述方法可以获得连续运动轨迹。需要说明的是,除了上述利用卡尔曼的思想进行融合的方法以外,也可以采用最直接的加权平均方法得到目标每个时刻的最终位置。示例性的,t=1时刻有视角1、2、3…k,所以追踪目标16在t=1时刻有k个位置,直接将这k个位置进行加权平均计算即可得到t=1时刻追踪目标16的最终位置。
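上述“每个时刻先选取较优视角的测量位置，再与由上一时刻最优位置推算得到的预测位置做卡尔曼融合”的流程，可以用如下简化代码示意（这里仅以设备到目标的距离近似视角优劣、采用匀速预测，方差等参数均为假设值，并非本申请的完整实现）。

```python
import numpy as np

def fuse_multi_view(per_view_meas, cam_positions, meas_var=1.0, process_var=0.5):
    """per_view_meas[t][k]为t时刻视角k下的测量位置(2,)或None；cam_positions[k]为对应摄像机位置。"""
    est, est_var, vel = None, meas_var, np.zeros(2)
    traj = []
    for meas in per_view_meas:
        # 选取较优视角：这里简单取离目标最近的设备所对应的测量值（示意）
        cands = [(np.linalg.norm(m - cam_positions[k]), m)
                 for k, m in enumerate(meas) if m is not None]
        z = min(cands, key=lambda c: c[0])[1]
        if est is None:
            est = z.astype(float)                 # 初始时刻直接取某一视角的测量位置
        else:
            pred = est + vel                      # 根据上一时刻最优位置预测当前时刻位置
            pred_var = est_var + process_var
            k_gain = pred_var / (pred_var + meas_var)
            new_est = pred + k_gain * (z - pred)  # 卡尔曼融合预测位置与较优视角测量位置
            vel, est = new_est - est, new_est
            est_var = (1 - k_gain) * pred_var
        traj.append(est.copy())
    return traj
```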
本申请实施例首次将卡尔曼融合的思想应用于多视角下目标运动轨迹的融合,而且每个时刻的测量值选取的都是最佳视角,不是随机视角融合,提高了轨迹融合的准确度,可以有效地解决在追踪过程中目标被遮挡或者目标丢失的问题,确保了每一个目标运动轨迹的连续性。
步骤S43:根据原始追踪目标以及其他目标的运动轨迹,确定原始追踪目标切换后的目标。
根据步骤S42可以获得原始追踪目标和其他目标的连续运动轨迹，将原始追踪目标和其他目标的连续运动轨迹输入进预训练的神经网络模型可以判断原始追踪目标是否发生切换行为，从而确定出新的追踪目标。示例性的，轨迹的二维示意图如图7所示，实线1是原始追踪目标1（追踪目标16）的运动轨迹，虚线2、3、4对应的是其他目标2、其他目标3、其他目标4的运动轨迹，其中，虚线2对应的其他目标2可以是公交车17。可选的，在根据轨迹判断原始追踪目标切换目标之前，可以对距离原始追踪目标太远的轨迹进行过滤。示例性的，在t时刻，当追踪目标16和其他目标之间的距离大于7米时，则滤除该其他目标。假设原始追踪目标的位置为(x_1, y_1)，某一个其他目标的位置为(x_2, y_2)，示例性的，计算二者之间的距离L可以采用欧式距离计算公式：L = √((x_1 - x_2)² + (y_1 - y_2)²)。滤除过后剩下的其他目标可以被称为候选目标。
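候选目标的筛选可以按如下方式实现（7米阈值来自上文举例，距离采用欧式距离）。

```python
import math

def select_candidates(target_pos, others, max_dist=7.0):
    """target_pos为原始追踪目标位置(x1, y1)，others为{目标id: (x2, y2)}；返回距离小于阈值的候选目标id。"""
    x1, y1 = target_pos
    return [oid for oid, (x2, y2) in others.items()
            if math.hypot(x1 - x2, y1 - y2) < max_dist]
```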
根据轨迹确定原始追踪目标切换后的新目标的流程大致包括如下步骤:首先,按照上述方法,过滤较远的其他目标4,得到候选目标2、3;然后建立原始追踪目标和候选目标的轨迹特征空间,输入进预训练的第一神经网络模型,确定原始追踪目标切换后的新目标。在本申请实施例中,以LSTM神经网络(第一神经网络)为例,介绍如何对轨迹数据进行分析从而确定切换后的新目标。
在介绍使用LSTM神经网络进行轨迹分析之前，首先对LSTM神经网络进行介绍。
LSTM网络是一种时间递归神经网络。LSTM网络（如图10(a)所示）中包括LSTM单元（如图9所示），LSTM单元包括遗忘门（Forget gate）、输入门（Input gate）和输出门（Output gate）三个门。LSTM单元包括的遗忘门用于决定忘记的信息，LSTM单元的输入门用于决定更新的信息，LSTM单元的输出门用于决定输出值。示例性的，图9中示出了三个时刻下的LSTM单元，第一个LSTM单元的输入是t-1时刻的输入，第二个LSTM单元的输入是t时刻的输入，第三个LSTM单元的输入是t+1时刻的输入，第一个LSTM单元、第二个LSTM单元和第三个LSTM单元的结构完全相同。LSTM网络的核心在于细胞（cell）（细胞为图9中的大方框）的状态和图9中横穿的水平线，细胞的状态（即图9中的C_{t-1}和C_t）像一条传送带，从整个细胞中穿过，只是做了少量的线性操作。这样可以实现信息从整个细胞中穿过而不做改变，进而可以实现长时的记忆保留。LSTM单元每次进行输出后，若还有时间点的轨迹特征对没有输入LSTM单元，则将本次的输出和未输入LSTM单元的下一个时间点的轨迹特征对输入LSTM单元，直到LSTM单元某次输出后，没有要输入LSTM单元的轨迹特征对存在。需要说明的是，图9只展示了网络的部分单元，整个神经网络其实是由多个LSTM单元组合形成，如图10(a)所示，各个时刻的LSTM单元的输出会集中输入到全连接层（fully connected layer），然后通过softmax层输出分类结果（score），通过回归损失（regression loss）的方法找到输入的多个时刻中具体是哪个时刻为切换时刻，图9和图10(a)的X_{t-1}、X_t、X_{t+1}一一对应。示例性的，图10(a)中X_{t-1}、X_t、X_{t+1}分别为t1、t2、t3时刻下目标1和2的一组特征对，如图10(b)所示，P11表示目标1在t1时刻下的位置，P12表示目标2在t1时刻下的位置；V11表示目标1在t1时刻下的速率，V12表示目标2在t1时刻下的速率，θ1表示的是目标1和2在t1时刻下运动方向的夹角。将图10(b)所示的一组特征对输入进如图10(a)所示的LSTM神经网络模型，即可判断该组特征对是否包含切换行为（softmax分类为yes或者no，score为概率值），以及时刻（index）。
在上述方法中，在使用第一神经网络之前需要对其进行预先训练。示例性的，可以人工查找包含人上出租车场景的视频数据。通过视频数据获取其中人的轨迹数据以及人所上的出租车的轨迹数据，按照上述方法（如图10(b)所示），建立人和出租车的轨迹特征对，即每个时刻下人和出租车的位置、速度大小、速度夹角，然后取不同时间段的特征对组成不同的样本分别进行标记，然后将标记好的样本输入神经网络进行训练。举个例子，假设轨迹采样时间间隔为1秒，人工找到一段视频，人上车的时刻为11点01分25秒（图11中的01'25''），图11中a、b、c、d、e为五个交错的时间段，时间段的长度为10秒，需要说明的是，时间段的长度并不是固定的。以a时间段为例，其初始时刻为01分10秒，其时间长度为10秒，选取a时间段的人和出租车的视频数据并转化成对应的轨迹数据，即可得到一组在10个时刻下（a时间段包含10秒）人和出租车的轨迹特征对，将这10个轨迹特征对作为一组轨迹特征，并标记为分类结果为NO（因为a时间段内人没有上车），切换时间为none。同理b时间段得到的一组特征对的分类结果也标记为NO，切换时间为none。e时间段的初始时刻为01'20''，长度是10s，包括人上出租车的时刻，所以将e时间段得到的一组轨迹特征对（10个轨迹特征对）的分类结果标记为YES，切换时间为01'25''。上述五个时间段得到的五组特征对即可作为五个训练样本训练神经网络，实际的训练样本数量由实际情况或者期望的模型准确度所决定。训练完神经网络之后，就可以将实时轨迹数据转换成轨迹特征对输入神经网络，就能输出其他目标在每个时刻下为切换后目标的概率。
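按图11的方式构造训练样本，可以用如下示意代码从一段已知“上车时刻”的轨迹中滑窗采样并打标签（窗口长度10秒、采样间隔1秒与上文示例一致，特征的具体构造方式为示意性假设）。

```python
import math

def to_features(traj):
    """traj: 每秒一个(x, y)位置；返回逐时刻的(位置x, 位置y, 速率, 运动方向)，由相邻位置差分近似。"""
    feats = []
    for t in range(1, len(traj)):
        dx, dy = traj[t][0] - traj[t - 1][0], traj[t][1] - traj[t - 1][1]
        feats.append((traj[t][0], traj[t][1], math.hypot(dx, dy), math.atan2(dy, dx)))
    return feats

def build_samples(person_traj, vehicle_traj, switch_t, win=10):
    """按滑动窗口构造样本：返回[(一组轨迹特征对, 是否包含切换, 窗口内切换索引), ...]（示意）。"""
    p, v = to_features(person_traj), to_features(vehicle_traj)
    samples = []
    for start in range(0, min(len(p), len(v)) - win + 1):
        pairs = []
        for t in range(start, start + win):
            px, py, pv_, ph = p[t]
            vx, vy, vv_, vh = v[t]
            pairs.append([pv_, vv_, px, py, vx, vy, abs(ph - vh)])  # 两目标运动方向的夹角
        contains = start <= switch_t < start + win
        samples.append((pairs, contains, switch_t - start if contains else None))
    return samples
```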
下面讲述一下LSTM神经网络的使用。以目标1、2、3为例,输入LSTM神经网络的为目标1和目标2的轨迹特征对,以及目标1和目标3的轨迹特征对。上述轨迹特征可以理解为目标轨迹的一些属性,比如位置、速度和角度,具体如下所示:
[O_1V_t, O_2V_t, O_1P_t(x,y), O_2P_t(x,y), θ1_t]，t=0,1,2,…,m　(1)
[O_1V_t, O_3V_t, O_1P_t(x,y), O_3P_t(x,y), θ2_t]，t=0,1,2,…,m　(2)
其中，O_1P_t(x,y)表示原始目标1在t时刻的位置，O_1V_t表示原始目标1在t时刻的速率；O_2P_t(x,y)表示的是候选目标2在t时刻的位置，O_2V_t表示的是候选目标2在t时刻的速率；O_3P_t(x,y)表示的是候选目标3在t时刻的位置，O_3V_t表示的是候选目标3在t时刻的速率；根据速度的方向也可算出目标与目标之间的夹角，θ1_t表示的是目标1和目标2在t时刻下运动方向的夹角，θ2_t表示的是目标1和目标3在t时刻下运动方向的夹角。
公式(1)建立的是各个时刻下原始追踪目标1和候选目标2的轨迹特征对，公式(2)建立的是各个时刻下原始追踪目标1和候选目标3的轨迹特征对。轨迹特征对包含了各个目标的位置、速率，以及目标（目标1和2，目标1和3）之间的夹角（通过速度方向确定）。将建立的特征对分别输入至预训练的神经网络，例如长短期记忆网络（LSTM，Long Short-Term Memory），即可实时输出各个候选目标为原始目标切换后的新目标的概率。如图8所示，LSTM神经网络可以直接输出每个时刻下各个候选目标为切换后的新目标的概率。示例性的，以分析候选目标2为例，假设当前时刻是t=4，开始追踪目标1，选取通过公式(1)建立的t=0至t=4时刻目标1和2的4个特征对（一组特征对）输入至图10(a)所示的LSTM神经网络，softmax输出发生切换的概率（作为t=4时刻的概率值）；当前时刻变为t=5，选取通过公式(1)建立的t=1至t=5时刻目标1和2的4个特征对（另一组特征对）输入至图10(a)所示的LSTM神经网络，softmax输出发生切换的概率（作为t=5时刻的概率值），如此重复下去，即可绘制出候选目标2为切换后新目标的时间概率分布图（如图8中虚线2所示）。图8中，F_t为每个时刻下各个候选目标为切换后的新目标的概率，横轴即时间轴为当前时刻。预先设定一个概率阈值c，在t_0时刻，候选目标2对应的概率超过了预设阈值，即代表在t_0时刻原始目标切换为了候选目标2。
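上述“每个当前时刻取最近一段特征对送入LSTM、输出该时刻的切换概率、再与阈值比较”的在线判断流程，可以用PyTorch粗略示意如下（网络结构、窗口长度win与阈值threshold均为假设值，仅说明数据流向，并非本申请实际采用的模型）。

```python
import torch
import torch.nn as nn

class SwitchLSTM(nn.Module):
    """输入一组轨迹特征对(batch, T, feat_dim)，输出该组特征对包含切换行为的概率。"""
    def __init__(self, feat_dim=7, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 2)           # 二分类：是否发生切换

    def forward(self, x):
        out, _ = self.lstm(x)
        logits = self.fc(out[:, -1])             # 取最后时刻的隐状态做分类
        return torch.softmax(logits, dim=-1)[:, 1]

def online_switch_time(model, feature_pairs, win=4, threshold=0.8):
    """feature_pairs: (T, feat_dim)的逐时刻特征对序列；返回概率首次超过阈值的时刻t0（没有则返回None）。"""
    model.eval()
    with torch.no_grad():
        for t in range(win, feature_pairs.shape[0] + 1):
            window = feature_pairs[t - win:t].unsqueeze(0)   # 最近win个特征对
            if model(window).item() > threshold:             # 该时刻的切换概率F_t
                return t - 1
    return None
```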
需要说明的是,获得轨迹数据之后,除了使用神经网络计算分析以外,还可以采用支持向量机等传统的机器学习分类模型进行计算。或者直接根据人为规定的规则对获得的轨迹数据进行判断。现有的研究方法主要是利用视频数据对目标进行智能行为分析,而本申请实施例是利用目标的运动轨迹去判断目标是否发生切换,无需对视频数据进行大量的计算,只需考虑目标位置的变化,在节约算力的同时也提高了计算速度,保证了追踪的实时性;且利用预训练的神经网络进行计算,保证了计算结果的准确度。
步骤S44:利用计算机视觉方法进行二次判断。
在进一步判断切换行为之前需要先选取一段最佳视频。通过上述实施例可以得到一个目标发生切换的时刻t_0，在步骤S42中，每个时刻都选取了一个较优视角，因此，最佳视频可以是在上述最佳视角下拍摄的一段在t_0时刻附近的视频。需要说明的是，除了仅仅采用步骤S42中选取的最佳视角下的视频以外，还可以选取多段包含原始追踪目标发生切换的画面的视频。该视频的时间段主要集中在t_0时刻附近，示例性的，可以是一段6秒左右的视频，视频所覆盖时间段的中间时刻为t_0。为了减少计算量、提高计算准确度，还可以先提取视频帧中的感兴趣区域（ROI，Region Of Interest），然后再将提取之后的视频数据输入至预训练的神经网络。示例性的，感兴趣区域可以只包括被追踪目标16和公交车17。利用计算机视觉进行二次判断的流程如图12和13所示，包括如下几个步骤：
步骤S121:将选取的视频帧数据输入至预训练的卷积神经网络,可以是卷积神经网络(CNN, Convolutional Neural Networks)、三维卷积神经网络(3D CNN,3D Convolutional Neural Networks)等等。卷积神经网络主要是用来提取视频图像中目标的特征以及生成目标对应的包围框(bounding box,bbox)。在使用之前需要进行预先训练,训练集可以是已经人工标注好的包围框的图像。卷积神经网络主要包括输入层、卷积层、池化层、全连接层。输入层是整个神经网络的输入,在处理图像的卷积神经网络中,它一般代表了一张图片的像素矩阵;卷积层中的每一个节点的输入只是上一层神经网络中的一小块,这个小块的大小有3*3、5*5或者其他尺寸,卷积层用于将神经网络中的每一个小块进行更加深入的分析从而得到抽象程度更高的特征;池化层,可以进一步缩小最后全连接层中节点的个数,从而达到减少整个神经网络中的参数的目的;全连接层主要用于完成分类任务。
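步骤S121中“用预训练卷积神经网络提取目标特征并生成包围框”的过程，可以借助torchvision中现成的检测模型粗略示意如下（这里用在COCO上预训练的Faster R-CNN代替专门训练的网络，仅演示输入输出形式，并非本申请实际采用的模型；score_thr为假设的置信度阈值）。

```python
import torch
import torchvision

# 加载预训练检测模型（示意用，实际应按监控场景数据微调）
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect(frame_tensor, score_thr=0.5):
    """frame_tensor: (3, H, W)、取值[0,1]的视频帧；返回包围框、类别与置信度。"""
    with torch.no_grad():
        out = model([frame_tensor])[0]
    keep = out["scores"] > score_thr
    return out["boxes"][keep], out["labels"][keep], out["scores"][keep]
```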
步骤S122:卷积神经网络提取视频中目标的特征以及生成包围框。其中,目标的特征由一系列的数字矩阵表示;目标的位置决定了bounding box的四个角的坐标,如图14所示,追踪目标16和公交车17都分别产生一个bbox,并且生成对应的目标分类。
步骤S123：根据目标的特征以及该目标对应的bounding box建立图模型。构建图模型时，以视频帧中识别的目标（在目标识别领域可以理解为一个ID）作为节点。节点之间连接的边主要分为两类。第一类边有两个组成部分：1)第一部分表示的是前后两帧之间目标的相似性。相似性越高值越高，取值范围[0,1]。显然，连续帧中的同一个人、同一辆车拥有较高相似性，不同车、人拥有较低相似性，人/车之间则基本不具有相似性；2)第二部分表示的是当前帧目标的bbox与下一帧目标的bbox之间的重合度，即IoU（Intersection over Union）大小。若完全重合，则值为1。第一部分和第二部分按预设的权重组合相加，形成第一类边的值。第二类边表示的是两个目标在实际空间中的距离，距离越近，边的值越大。示例性的，按照上述方法构建的图模型如图15所示，第一类边可以理解为图中的横向边，第二类边可以理解为图中的纵向边。
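步骤S123的图模型可以按如下示意代码构建：节点是各帧中检测出的目标，第一类边由前后帧目标的外观相似度与bbox重合度（IoU）按权重相加得到，第二类边由同帧目标的实际空间距离换算得到（相似度计算方式、权重w_sim/w_iou以及距离到边值的换算均为示意性假设）。

```python
import numpy as np

def iou(a, b):
    """a、b为[x1, y1, x2, y2]形式的包围框，返回交并比IoU。"""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def build_graph(frames, w_sim=0.5, w_iou=0.5):
    """frames: 每帧为[{'id', 'feat', 'bbox', 'pos'}, ...]；返回(节点列表, 边字典)。"""
    nodes, edges = [], {}
    for f_idx, frame in enumerate(frames):
        nodes.extend((f_idx, obj['id']) for obj in frame)
        # 第二类边：同一帧内目标之间，空间距离越近边值越大
        for i, a in enumerate(frame):
            for b in frame[i + 1:]:
                d = np.linalg.norm(np.array(a['pos']) - np.array(b['pos']))
                edges[((f_idx, a['id']), (f_idx, b['id']))] = 1.0 / (1.0 + d)
        if f_idx == 0:
            continue
        # 第一类边：前后帧目标之间，外观相似度与IoU按预设权重相加
        for a in frames[f_idx - 1]:
            for b in frame:
                sim = float(np.dot(a['feat'], b['feat']) /
                            (np.linalg.norm(a['feat']) * np.linalg.norm(b['feat']) + 1e-6))
                edges[((f_idx - 1, a['id']), (f_idx, b['id']))] = w_sim * sim + w_iou * iou(a['bbox'], b['bbox'])
    return nodes, edges
```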
步骤S124:将构建好的图模型输入至图卷积神经网络。一般的图卷积神经网络包括图卷积层、池化层、全连接层和输出层。其中,图卷积层类似于图像卷积,执行了图内部的信息传递,能够充分挖掘图的特征;池化层用于降维;全连接层用于执行分类;输出层输出分类的结果。示例性的,分类结果可以包括上车行为、下车行为、人车贴近行为、人矮身行为、车子开关门行为等等。需要说明的是,神经网络的输出不仅仅是一个行为识别(上车或者下车),还可以包括行为检测,例如可以确定人所上的车的特征。需要说明的是,通过步骤S44确定的切换后的目标在一般情况下是与步骤S43确定的切换后的目标相同的,在这种情况下,直接将该目标作为切换后的目标即可。当步骤S43和步骤S44确定的切换后的目标不同时,则将步骤S44确定的切换后的目标作为新的追踪目标。
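步骤S124的图卷积分类可以用一个极简的GCN示意（这里手写对称归一化的图卷积层，层数、隐藏维度与类别数均为假设值，仅用于说明“图模型 → 行为分类”的数据流，并非本申请实际采用的网络结构）。

```python
import torch
import torch.nn as nn

class TinyGCN(nn.Module):
    """极简两层GCN：输入节点特征X与邻接矩阵A，池化后输出行为类别（上车/下车/无切换等，示意）。"""
    def __init__(self, in_dim, hidden=32, num_classes=3):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden)
        self.w2 = nn.Linear(hidden, hidden)
        self.cls = nn.Linear(hidden, num_classes)

    def forward(self, x, adj):
        a_hat = adj + torch.eye(adj.size(0))            # 加自环
        d_inv = torch.diag(a_hat.sum(1).pow(-0.5))
        a_norm = d_inv @ a_hat @ d_inv                  # 对称归一化邻接矩阵
        h = torch.relu(self.w1(a_norm @ x))             # 图卷积层1
        h = torch.relu(self.w2(a_norm @ h))             # 图卷积层2
        return self.cls(h.mean(dim=0))                  # 节点平均池化后做全连接分类
```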
上述图卷积神经网络在使用之前也需要进行预先训练。示例性的,人工获取一段人上车的视频,按照步骤S121-S123生成视频中各个目标的图模型,将所述图模型标记为上车行为,以此作为图卷积神经网络的训练集。因此,当输入实时视频数据生成的图模型之后,图卷积神经网络可以自动输出分类,判断追踪目标是否发生跨目标行为。
需要说明的是步骤S44不是必须的,可以直接仅由步骤S43确定一个最终的切换后的新目标,这主要由监控人员的偏好或者实际情况所决定。而且步骤S44中采用的是图卷积神经网络进行判断,需要说明的是,除了图卷积神经网络以外其余人工智能分类模型也可以对选取的较优视频进行行为识别。
本申请实施例利用计算机视觉的方法对原目标是否发生切换进行了二次判断，构建了特有的图模型，纳入更多的时间和空间信息，提高了判断的精度。
步骤S45:确定切换后的目标。
步骤S44是可选的,因此存在如下两种情况:(1)执行步骤S43之后直接执行步骤S45,则将通过步骤S43确定的目标作为切换后的目标;(2)在执行完步骤S43之后执行步骤S44,在步骤S45中,将步骤S44确定的切换后的目标作为最终的切换后的目标。需要说明的是,在第(2)种情况下(即步骤S43和步骤S44确定的切换后的目标不一样),除了直接将步骤S44确定的目标作为切换后的目标以外,还可以采用别的策略,例如结合两个步骤中所使用的模型的准确度以及输出的结果的置信度综合考虑,选择其中一个作为最终的切换后的目标。
步骤S41-S45主要以人上车的场景为例，将追踪目标从追踪目标16切换成了公交车17。之后，可以一直追踪公交车17，在每一个停车的时间段都获取乘客下车的视频画面，寻找追踪目标16。当在某个画面中重新出现追踪目标16时，将追踪目标由公交车17重新切换为追踪目标16，从而实现对目标的持续追踪。
综上,本申请实施例提供的自动跨目标追踪的方法可以概括如下:首先,采用改进的卡尔曼滤波方法融合相同视角下的视频和雷达数据,提高了单视角下的目标运动轨迹精度,接着,融合多个视角下的同一目标运动轨迹,有效地解决了遮挡问题,可以获得各个目标的连续运动轨迹。然后,将追踪目标以及附近其他目标的运动轨迹输入至预训练的神经网络,判断原始追踪目标是否发生切换,如果切换,则将追踪目标更换为新目标。此外,为了更进一步提高判断的准确度,还可以在上一步的基础上选取可以捕捉到发生切换行为画面的最佳视频用于行为分析。将所述最佳视频输入至神经网络提取特征,然后构建特有的图模型,采用预训练的图卷积神经网络对视频中的目标是否发生跨目标行为进行二次判断。上述方法除了应用于追踪可疑人员以外,还可以应用于帮助寻找出走儿童或者迷路老人等等,不论追踪的对象为何种类型,在对其追踪时都可应用本发明提供的技术方案。
从追踪策略来看,本申请提供的自动跨目标追踪方法,在原目标发生跨目标行为的时候,就能自动切换追踪目标,这样可以保证追踪的实时有效性,不会由于目标乘坐交通工具而导致追踪中断;从追踪的具体实现方式来看,本申请首次提出利用目标在世界坐标系下的运动轨迹进行跨目标行为分析判断,提高了行为判断的准确度。此外,为了提高目标的运动轨迹精度,本发明也提供了一系列的措施,包括:(1)采用独创的改进的卡尔曼滤波方法融合单视角下的视频和雷达轨迹;(2)采用卡尔曼滤波方法融合多个视角下同一目标的运动轨迹。在上述基础上,为了进一步提高判断精度,本发明还根据选取的最佳视频构建了特有的图模型,然后采用图卷积神经网络对跨目标行为进行二次判断,纳入更多时空信息。总之,本发明申请提供的自动跨目标追踪方法,可以实现对目标的实时、连续、准确地追踪,不会遗漏任何时间段的信息,提高了布控追踪的便利性以及可靠性。
除了上述实现方式之外，本申请在上述实施例的基础上还提供了另一种变形的实施例用于目标追踪。方法包括如下步骤：(1)获取追踪目标16（原始追踪目标）的运动轨迹。可以按照上述实施例中步骤S41（可选的）和S42所述的方法获取原始追踪目标的轨迹。轨迹由一段时间内目标的位置所构成，目标的位置可以用全局坐标系（例如东-北-天坐标系）下的坐标来表示。(2)根据追踪目标16的运动轨迹确定轨迹消失的初始时刻为第一时刻。即当追踪目标16的轨迹不再出现的时候，确定目标消失的初始时刻为第一时刻。(3)获取第一时刻前后一段时间的视频数据，该视频数据包括追踪目标16可能更新为第二目标的画面。示例性的，该视频数据的画面中追踪目标16是完整且清晰的。(4)对上述步骤中获取的视频数据进行分析，确定切换后的目标。示例性的，可以将相关的视频数据输入到预训练的神经网络中进行分析。所述预训练的神经网络可以是步骤S44中采用的神经网络，通过神经网络对视频内容进行分析，确定切换后的目标。
上述方法主要描述了人(追踪目标16)上车(切换后的追踪目标)的场景,通过判断人的轨迹消失然后调取周围的相关视频进行分析从而得到切换后的目标例如公交车,进而对公交车进行追踪。当人下车时,依然也需要在公交车沿途的监控画面中寻找追踪目标16,然后继续对其进行追踪。本申请实施例全程每个时刻都只需要获取一种目标的轨迹。直接通过原始追踪目标的轨迹去调取相关视频进行行为分析,在一定程度上减少了算力的占用,保证了实时性和准确性。
上文结合图3-图15描述了本申请实施例提供的用于目标追踪的方法，下面将描述本申请实施例提供的目标追踪装置。该装置可以包括获取模块和处理模块；
获取模块,用于获取第一目标所在场景包含的目标的感应数据,所述第一目标所在场景包含的目标包括第一目标和除所述第一目标之外的至少一个其他目标,所述第一目标为初始追踪目标;
处理模块,用于根据所述感应数据生成所述第一目标和所述至少一个其他目标的运动轨迹;还用于根据所述第一目标的运动轨迹以及所述其他目标的运动轨迹确定第二目标,并将所述第二目标作为切换后的追踪目标。
可选的,所述处理模块具体用于,确定一组候选目标,所述候选目标为:所述至少一个其他目标,或在所述至少一个其他目标中,与所述第一目标之间的距离小于预设阈值的其他目标;对于每一个候选目标,将所述第一目标的运动轨迹和所述候选目标的运动轨迹输入至预训练的第一神经网络,获得所述候选目标为所述第二目标的概率;根据所述一组候选目标为所述第二目标的概率,确定所述第二目标。
可选的,所述处理模块还用于,检测所述至少一个候选目标为所述第二目标的概率中的第一概率高于预设阈值,确定所述第一概率对应的目标为所述第二目标。
可选的,处理模块还用于,对于每一个候选目标,根据所述候选目标和所述第一目标的运动轨迹建立至少一组轨迹特征对,每组轨迹特征对包括至少两个连续时刻下的轨迹特征对,每个时刻下的所述轨迹特征对包括该时刻下所述第一目标的位置、速率,所述候选目标的位置、速率,以及所述第一目标和所述候选目标的运动方向的夹角;将所述至少一组轨迹特征对输入至第一神经网络,获得所述候选目标为所述第二目标的概率。
可选的,所述处理模块还用于,检测所述至少一个候选目标的概率中的第一概率高于预设阈值,确定所述第一概率对应的目标为所述第二目标。
可选的,设定第一概率高于预设阈值的时刻为第一时刻,所述处理模块还用于,获取第一时刻前后的视频数据,所述视频数据包括所述第一目标可能发生切换行为的画面;将所述视频数据输入至预训练的第二神经网络,根据输出结果确定所述第三目标作为切换后的追踪目标。
可选的,第二神经网络包括卷积神经网络和图卷积神经网络,所述处理模块具体用于, 将选取的视频帧数据输入至预训练的卷积神经网络,输出所述视频帧数据中所有目标的特征以及包围框;根据特征以及包围框构建图模型;将所述图模型输入至预训练的图卷积神经网络,根据输出结果确定第三目标作为切换后的追踪目标。
可选的，所述传感器包括至少两组处于不同方位的传感器，不同组传感器处于不同的方位，针对第一目标所在场景中的每个目标，所述处理模块具体用于，分别根据至少两组传感器所采集的感应数据生成目标的至少两条运动轨迹；融合目标的至少两条运动轨迹，形成所述目标的运动轨迹。
可选的,每一组传感器包括至少两类传感器,即包括摄像机以及如下两类传感器中的至少一类:毫米波雷达和激光雷达,且两类传感器处于同一方位,针对第一目标所在场景中包含的每个目标,所述处理模块具体用于:分别根据至少两类传感器模块所采集的感应数据生成目标的至少两条监测轨迹;融合所述目标的至少两条监测轨迹,形成目标的运动轨迹。
本申请还提供另一种目标追踪的装置,包括获取模块和处理模块,所述获取模块用于,通过传感器获取第一目标的感应数据,所述第一目标为初始追踪目标;所述处理模块用于,根据所述感应数据生成所述第一目标的运动轨迹;根据所述第一目标的运动轨迹确定所述第一目标轨迹消失的初始时刻为第一时刻;获取所述第一时刻前后一段时间的视频帧,所述视频帧包括所述第一目标;根据所述视频帧确定所述第二目标,并将所述第二目标作为更新后的追踪目标。
可选的,处理模块还用于,判断所述第一目标在所述初始时刻之后的运动轨迹不存在,确定所述初始时刻为所述第一时刻。
可选的,处理模块还用于,将所述视频帧输入至预训练的第二神经网络确定第二目标,将所述第二目标作为更新后的追踪目标。
可选的,所述第二神经网络包括卷积神经网络和图卷积神经网络,所述处理模块还用于:将所述视频帧输入至预训练的卷积神经网络,输出所述视频数据中所包含目标的特征以及包围框;根据所述视频帧中所包含的目标的特征以及包围框构建图模型;将所述图模型输入至预训练的图卷积神经网络,根据输出结果确定所述第二目标,并将所述第二目标作为更新后的追踪目标。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如:固态硬盘solid state disk,SSD)等。
图16为本申请实施例提供的用于目标追踪的计算设备硬件结构示意图。如图16所示, 该设备160可以包括处理器1601、通信接口1602、存储器1603和系统总线1604。存储器1603和通信接口1602通过系统总线1604和处理器1601连接,并完成相互间的通信。存储器1603用于存储计算机执行指令,通信接口1602用于和其他设备进行通信,处理器1601执行计算机指令实现上述所有实施例所示的方案。
图16中提到的系统总线可以是外设部件互连标准(peripheral component interconnect,PCI)总线或扩展工业标准结构(extended industry standard architecture,EISA)总线等。所述系统总线可以分为地址总线、数据总线、控制总线等。为便于表示,图中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。通信接口用于实现数据库访问装置与其他设备(例如客户端、读写库和只读库)之间的通信。存储器可能包含随机存取存储器(random access memory,RAM),也可能还包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。
上述的处理器可以是通用处理器,包括中央处理器CPU、网络处理器(network processor,NP)等;还可以是数字信号处理器DSP、专用集成电路ASIC、现场可编程门阵列FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。
可选的,本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行如上述方法实施例所示的方法。
可选的,本申请实施例还提供一种芯片,所述芯片用于执行如上述方法实施例所示的方法。
可以理解的是,在本申请的实施例中涉及的各种数字编号仅为描述方便进行的区分,并不用来限制本申请的实施例的范围。
可以理解的是,在本申请的实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请的实施例的实施过程构成任何限定。
最后应说明的是:以上各实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述各实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。
Claims (28)
- 一种用于目标追踪的方法,其特征在于,所述方法包括:通过传感器获取第一目标所在场景包含的目标的运动轨迹,所述第一目标所在场景包含的目标包括第一目标和除所述第一目标之外的至少一个其他目标,所述第一目标为初始追踪目标;根据所述第一目标的运动轨迹以及所述至少一个其他目标的运动轨迹确定第二目标,并将所述第二目标作为更新后的追踪目标。
- 根据权利要求1所述的方法,其特征在于,所述根据所述第一目标的运动轨迹以及所述至少一个其他目标的运动轨迹确定第二目标包括:确定一组候选目标,所述候选目标为:所述至少一个其他目标,或在所述至少一个其他目标中,与所述第一目标之间的距离小于预设阈值的其他目标;对于每一个所述候选目标,将所述第一目标的运动轨迹和所述候选目标的运动轨迹输入至预训练的第一神经网络,获得所述候选目标为所述第二目标的概率;根据所述一组候选目标为所述第二目标的概率,确定所述第二目标。
- 根据权利要求2所述的方法,其特征在于,所述根据所述一组候选目标为所述第二目标的概率,确定所述第二目标包括:检测所述一组候选目标为所述第二目标的概率中的第一概率高于预设阈值,确定所述第一概率对应的目标为所述第二目标。
- 根据权利要求2或3所述的方法,其特征在于,所述对于每一个所述候选目标,将所述第一目标的运动轨迹和所述候选目标的运动轨迹输入至预训练的第一神经网络,获得所述候选目标为所述第二目标的概率包括:根据所述候选目标的运动轨迹和所述第一目标的运动轨迹建立至少一组轨迹特征对,每组轨迹特征对包括至少两个连续时刻下的轨迹特征对,每个时刻下的所述轨迹特征对包括该时刻下所述第一目标的位置、速率,所述候选目标的位置、速率,以及所述第一目标和所述候选目标的运动方向的夹角;将所述至少一组轨迹特征对输入至所述第一神经网络,获得所述候选目标为所述第二目标的概率。
- 根据权利要求3或4所述的方法,其特征在于,确定所述第一概率高于预设阈值的时刻为第一时刻,所述方法还包括:获取第一时刻前后一段时间的视频帧,所述视频帧包括所述第一目标;将所述视频帧输入至预训练的第二神经网络,根据输出结果确定第三目标,并将所述第三目标作为更新后的追踪目标。
- 根据权利要求5所述的方法,其特征在于,所述第二神经网络包括卷积神经网络和图卷积神经网络,所述将所述视频帧输入至预训练的第二神经网络,根据输出结果确定所述第三目标,并将所述第三目标作为更新后的追踪目标包括:将所述视频帧输入至预训练的卷积神经网络,输出所述视频帧中所包含目标的特征以及包围框;根据所述视频帧中所包含的目标的特征以及包围框构建图模型;将所述图模型输入至预训练的图卷积神经网络,根据输出结果确定所述第三目标,并将所述第三目标作为更新后的追踪目标。
- 根据权利要求1-6任一所述的方法,其特征在于,所述传感器包含至少两组处于不同方位的传感器,针对所述第一目标所在场景包含的目标中的每个目标,所述通过传感器获取该目标的运动轨迹包括:对于所述至少两组传感器中每一组传感器,根据该组传感器所采集的感应数据生成对应该组传感器的所述目标的运动轨迹,从而获得所述目标的至少两条运动轨迹,所述目标的至少两条运动轨迹从不同方位拍摄得到;融合所述目标的至少两条运动轨迹,获得所述目标融合后的运动轨迹。
- 根据权利要求7所述的方法,其特征在于,每一组传感器包括至少两类传感器,所述至少两类传感器包括摄像机以及如下两类传感器中的至少一类:毫米波雷达和激光雷达,且所述至少两类传感器处于同一方位,针对所述第一目标所在场景包含的目标中的每个目标,所述对于所述至少两组传感器中每一组传感器,根据该组传感器所采集的感应数据生成对应该组传感器的所述目标的运动轨迹包括:对于所述该组传感器中的每一类传感器,根据该类传感器所采集的感应数据生成对应该类传感器的所述目标的监测轨迹,从而获得所述目标的至少两条监测轨迹;融合所述目标的至少两条监测轨迹,获得所述目标的运动轨迹。
- 一种用于目标追踪的方法,其特征在于,所述方法包括:通过传感器获取第一目标的运动轨迹,所述第一目标为初始追踪目标;根据所述第一目标的运动轨迹确定所述第一目标轨迹消失的初始时刻为第一时刻;获取所述第一时刻前后一段时间的视频帧,所述视频帧包括所述第一目标;根据所述视频帧确定第二目标,并将所述第二目标作为更新后的追踪目标。
- 根据权利要求9所述的方法,其特征在于,所述根据所述第一目标的运动轨迹确定所述第一目标轨迹消失的初始时刻为第一时刻,包括:判断所述第一目标在所述初始时刻之后的运动轨迹不存在,确定所述初始时刻为所述第一时刻。
- 根据权利要求9或10所述的方法,其特征在于,所述根据所述视频帧确定第二目标,并将所述第二目标作为更新后的追踪目标,包括:将所述视频帧输入至预训练的第二神经网络,根据输出结果确定所述第二目标,并将所述第二目标作为更新后的追踪目标。
- 根据权利要求11所述的方法，其特征在于，所述第二神经网络包括卷积神经网络和图卷积神经网络，所述将所述视频帧输入至预训练的第二神经网络，根据输出结果确定所述第二目标，并将所述第二目标作为更新后的追踪目标包括：将所述视频帧输入至预训练的卷积神经网络，输出所述视频帧中所包含目标的特征以及包围框；根据所述视频帧中所包含的目标的特征以及包围框构建图模型；将所述图模型输入至预训练的图卷积神经网络，根据输出结果确定所述第二目标，并将所述第二目标作为更新后的追踪目标。
- 一种用于目标追踪的装置,其特征在于,包括:获取模块和处理模块;所述获取模块,用于获取第一目标所在场景包含的目标的感应数据,所述第一目标所在场景包含的目标包括第一目标和除所述第一目标之外的至少一个其他目标,所述第一目标为初始追踪目标;所述处理模块,用于根据所述感应数据生成所述第一目标和所述至少一个其他目标的运动轨迹,以及根据所述第一目标的运动轨迹以及所述至少一个其他目标的运动轨迹确定第二目标,并将所述第二目标作为更新后的追踪目标。
- 根据权利要求13所述的装置,其特征在于,所述处理模块具体用于,确定一组候选目标,所述候选目标为:所述至少一个其他目标,或在所述至少一个其他目标中,与所述第一目标之间的距离小于预设阈值的其他目标;对于每一个所述候选目标,将所述第一目标的运动轨迹和所述候选目标的运动轨迹输入至预训练的第一神经网络,获得所述候选目标为所述第二目标的概率;根据所述一组候选目标为所述第二目标的概率,确定所述第二目标。
- 根据权利要求14所述的装置,其特征在于,所述处理模块还用于,检测所述一组候选目标为所述第二目标的概率中的第一概率高于预设阈值,确定所述第一概率对应的目标为所述第二目标。
- 根据权利要求14或15所述的装置,其特征在于,所述处理模块还用于:对于每一个候选目标,根据所述候选目标和所述第一目标的运动轨迹建立至少一组轨迹特征对,每组轨迹特征对包括至少两个连续时刻下的轨迹特征对,每个时刻下的所述轨迹特征对包括该时刻下所述第一目标的位置、速率,所述候选目标的位置、速率,以及所述第一目标和所述候选目标的运动方向的夹角;将所述至少一组轨迹特征对输入至所述第一神经网络,获得所述候选目标为所述第二目标的概率。
- 根据权利要求15或16所述的装置,其特征在于,所述第一概率高于预设阈值的时刻为第一时刻,所述处理模块还用于,获取第一时刻前后一段时间的视频帧,所述视频帧包括所述第一目标;将所述视频帧输入至预训练的第二神经网络,根据输出结果确定第三目标,并将所述第三目标作为更新后的追踪目标。
- 根据权利要求17所述的装置,其特征在于,所述第二神经网络包括卷积神经网络和图卷积神经网络,所述处理模块具体用于,将所述视频帧输入至预训练的卷积神经网络,输出所述视频帧中所包含目标的特征以及包围框;根据所述视频数据中所有目标的特征以及所述包围框构建图模型;将所述图模型输入至预训练的图卷积神经网络,根据输出结果确定所述第三目标,并将所述第三目标作为更新后的追踪目标。
- 根据权利要求13-18任一所述的装置,其特征在于,所述传感器包含至少两组处于不同方位的传感器,针对所述第一目标所在场景包含的目标中的每个目标,所述处理模块具体用于,对于所述至少两组传感器中每一组传感器,根据该组传感器所采集的感应数据生成对应该组传感器的所述目标的运动轨迹,从而获得所述目标的至少两条运动轨迹,所述目标的至少两条运动轨迹从不同方位拍摄得到;融合所述目标的至少两条运动轨迹,获得所述目标融合后的运动轨迹。
- 根据权利要求19所述的装置,其特征在于,每一组传感器包括至少两类传感器,所述至少两类传感器包括摄像机以及如下两类传感器中的至少一类:毫米波雷达和激光雷达,且所述至少两类传感器处于同一方位,针对所述第一目标所在场景包含的目标中的每个目标,所述处理模块具体用于,对于所述该组传感器中每一类传感器,根据该类传感器所采集的感应数据生成对应该类传感器的所述目标的监测轨迹,从而获得所述目标的至少两条监测轨迹;融合所述目标的至少两条监测轨迹,获得所述目标的运动轨迹。
- 一种用于目标追踪的装置,其特征在于,所述装置包括获取模块和处理模块,所述获取模块用于,通过传感器获取第一目标的感应数据,所述第一目标为初始追踪目标;所述处理模块用于,根据所述感应数据生成所述第一目标的运动轨迹;根据所述第一目标的运动轨迹确定所述第一目标轨迹消失的初始时刻为第一时刻;获取所述第一时刻前后一段时间的视频帧,所述视频帧包括所述第一目标;根据所述视频帧确定所述第二目标,并将所述第二目标作为更新后的追踪目标。
- 根据权利要求21所述的装置,其特征在于,所述处理模块还用于,判断所述第一目标在所述初始时刻之后的运动轨迹不存在,确定所述初始时刻为所述第一时刻。
- 根据权利要求21或22所述的装置,其特征在于,所述处理模块还用于,将所述视频帧输入至预训练的第二神经网络,根据输出结果确定所述第二目标,并将所述第二目标作为更新后的追踪目标。
- 根据权利要求23所述的装置,其特征在于,所述第二神经网络包括卷积神经网络和图卷积神经网络,所述处理模块还用于:将所述视频帧输入至预训练的卷积神经网络,输出所述视频数据中所包含目标的特征以及包围框;根据所述视频帧中所包含的目标的特征以及包围框构建图模型;将所述图模型输入至预训练的图卷积神经网络,根据输出结果确定所述第二目标,并将所述第二目标作为更新后的追踪目标。
- 一种用于目标追踪的计算设备,其特征在于,所述计算设备包括处理器和存储器,其中:所述存储器中存储有计算机指令;所述处理器执行所述存储器存储的计算机指令,以实现所述权利要求1-8中任一项所述的方法。
- 一种用于目标追踪的计算设备,其特征在于,所述计算设备包括处理器和存储器,其中:所述存储器中存储有计算机指令;所述处理器执行所述存储器存储的计算机指令,以实现所述权利要求9-12中任一项所述的方法。
- 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序代码,当所述计算机程序代码被计算机执行时,所述计算机程序代码使得所述计算机执行如权利要求1-8中任一项所述方法。
- 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序代码,当所述计算机程序代码被计算机执行时,所述计算机程序代码使得所述计算机执行如权利要求9-12中任一项所述方法。