CN111753732A - Vehicle multi-target tracking method based on target center point - Google Patents

Vehicle multi-target tracking method based on target center point

Info

Publication number
CN111753732A
Authority
CN
China
Prior art keywords
vehicle
target
tracking
data set
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010590410.1A
Other languages
Chinese (zh)
Inventor
杨航
杨海东
黄坤山
彭文瑜
林玉山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Original Assignee
Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute, Foshan Guangdong University CNC Equipment Technology Development Co. Ltd filed Critical Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute
Priority to CN202010590410.1A
Publication of CN111753732A
Legal status: Pending


Classifications

    • G06V 20/54 — Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045 — Combinations of networks (neural network architectures)
    • G06T 7/11 — Region-based segmentation
    • G06V 20/41 — Higher-level, semantic clustering, classification or understanding of video scenes
    • G06V 2201/07 — Target detection
    • G06V 2201/08 — Detecting or categorising vehicles


Abstract

The invention discloses a vehicle multi-target tracking method based on a target center point, comprising the following steps: S1, acquiring a vehicle tracking data set and performing image enhancement on it; S2, building a vehicle detection model, setting hyper-parameters, and pre-training the vehicle detection model on the vehicle tracking data set; S3, copying all weights from the vehicle detection model, adding 4 input channels and 2 output channels to the original vehicle detection model, and retraining to generate a vehicle tracking model; S4, inputting the video stream into the vehicle tracking model to obtain the vehicle multi-target tracking result. By integrating the detection and association modules into one network, the invention greatly reduces computation and running time and simplifies tracking-conditioned detection: the tracking-conditioned detector can directly extract the heat map and perform joint reasoning over targets in multiple frames when associating them.

Description

Vehicle multi-target tracking method based on target center point
Technical Field
The invention relates to the technical field of multi-target tracking, in particular to a vehicle multi-target tracking method based on a target center point.
Background
With the development of computer hardware and computer vision technology, traffic monitoring systems based on computer vision have become practical, and real-time detection and tracking of vehicles in video is a core part of an intelligent traffic monitoring system. In existing detection and tracking technology, the tracking of moving targets in complex, large-scale, multi-target scenes remains unsatisfactory and needs further improvement. Thanks to the development and application of convolutional neural networks (CNNs), many computer vision tasks have advanced greatly, and many CNN-based methods have also been applied to problems such as multi-target tracking.
Most existing mainstream target tracking methods follow the tracking-by-detection paradigm: a target detection algorithm detects the targets of interest in each frame, producing position coordinates, classes, confidences and other indicators, and the detection results are then associated in some way with the detected targets in the previous frame.
Disclosure of Invention
To address these problems, the invention provides a vehicle multi-target tracking method based on a target center point, centered on a novel tracking model that represents objects as points and detects and tracks simultaneously. By performing detection on an image and combining the target detection results of previous frames to estimate target motion in the current frame, the method is simple, online and real-time without requiring large computational resources.
The method simplifies two key steps of traditional tracking schemes. First, tracking-conditioned detection: because each object in past frames is represented by a single point, its historical information is contained in the corresponding heat map, from which the model can directly extract relevant information. Second, cross-time target association: targets in different frames can be connected through simple displacement prediction, similar to sparse optical flow. The displacement prediction is based on previous detection results, which allows objects in the current frame to be detected jointly and associated with previous detections. The method comprises the following steps:
s1, acquiring a vehicle tracking data set, and performing image enhancement on the vehicle tracking data set;
s2, building a vehicle detection model, setting a hyper-parameter, and pre-training the vehicle detection model through the vehicle tracking data set;
s3, copying all weights from the vehicle detection model, adding 4 input channels and 2 output channels on the basis of the original vehicle detection model, and retraining to generate a vehicle tracking model;
and S4, inputting the video stream into the vehicle tracking model to obtain a vehicle multi-target tracking result.
In a further refinement, the vehicle tracking data set includes MOT, KITTI, and nuScenes, and is divided into a training set, a test set, and a validation set in a 6:2:2 ratio.
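The 6:2:2 split described above can be sketched as follows; the function name, shuffling and seed are illustrative assumptions, not part of the patent:

```python
import random

def split_dataset(samples, ratios=(0.6, 0.2, 0.2), seed=42):
    """Split a list of samples into train/test/validation subsets (6:2:2)."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)   # deterministic shuffle
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val

train, test, val = split_dataset(list(range(100)))
```

In practice the split would be applied per video sequence rather than per frame, so that frames from one sequence do not leak across subsets.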
In a further improvement, the image enhancement comprises at least one of the following methods:
1) rotating the training image I_H in the vehicle tracking data set by different angles to generate four sub-images, denoted I_H^(i), i ∈ {−30°, −15°, +15°, +30°};
2) resizing the training image I_H; the resized sub-image is denoted I_H^(s);
3) applying pixel-wise binary segmentation to the training image I_H; the binarized sub-image is denoted I_H^(b), and its foreground count value is C.
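The three enhancement operations above (rotation, resizing, binary segmentation) might be sketched as follows; all function names and the threshold are illustrative, and a production pipeline would normally use an image library instead:

```python
import numpy as np

def rotate(img, angle_deg):
    """Nearest-neighbour rotation about the image centre (illustrative)."""
    h, w = img.shape
    t = np.deg2rad(angle_deg)
    ys, xs = np.indices((h, w))
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # inverse-map each output pixel back into the source image
    sx = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    sy = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    sx, sy = np.rint(sx).astype(int), np.rint(sy).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out

def resize_nn(img, new_h, new_w):
    """Nearest-neighbour size transform."""
    ys = np.arange(new_h) * img.shape[0] // new_h
    xs = np.arange(new_w) * img.shape[1] // new_w
    return img[np.ix_(ys, xs)]

def binarize(img, thresh=128):
    """Pixel-wise binary segmentation; returns the mask and its count C."""
    mask = (img >= thresh).astype(np.uint8)
    return mask, int(mask.sum())

I_H = np.arange(64, dtype=np.uint8).reshape(8, 8) * 4   # toy image
subgraphs = [rotate(I_H, a) for a in (-30, -15, 15, 30)]
mask, C = binarize(I_H)
```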
In a further improvement, the step S2 specifically includes the following steps:
s21, cropping the training images in the vehicle tracking data set to a resolution of 960×544 (so that the 4x down-sampled output is 240×136) and inputting them into the vehicle detection model, and setting the hyper-parameters: false positive rate λ_fp = 0.1, false negative rate λ_fn = 0.4, confidence threshold θ = 0.4, and heat map rendering threshold τ = 0.5;
s22, selecting deformable convolution as the up-sampling convolution to skip-connect the low layers and the output layer of the vehicle detection model, with an output stride of 4 and a batch size of 12; training for 30 epochs with the Adam optimizer, with the learning rate set to 1e-4 for the first 20 epochs and decayed from 1e-5 to 1e-6 over the last 10 epochs; and adopting as the loss function the penalty-reduced focal loss used for center-point heat maps:

L_k = -(1/N) Σ_xyc { (1 - Ŷ_xyc)^α log(Ŷ_xyc),                if Y_xyc = 1
                    { (1 - Y_xyc)^β (Ŷ_xyc)^α log(1 - Ŷ_xyc),  otherwise

where Ŷ is the predicted heat map, Y the ground-truth heat map rendered in s23, and N the number of center points;
s23, splatting each key point p onto the down-sampled feature map at the low-resolution position p̃ = ⌊p/4⌋ using a Gaussian kernel, with the Gaussian parameters adjusted according to the target size to blur the peak;
s24, comparing every response point on the heat map with its 8 connected neighbours; if a point's response value is greater than or equal to all 8 neighbour values it is kept, the first N peak points satisfying this condition are retained, and a heat map of resolution 240×136 containing the target center points is obtained.
In a further improvement, the step S24 further includes:
in order to reduce the false alarm probability, only the targets with scores higher than a preset threshold value in the detection result are rendered.
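The S24 peak-extraction rule, keep each point whose response is at least as large as all 8 neighbours and then take the top N above a score threshold, can be sketched like this; the `top_n` and `thresh` values are illustrative:

```python
import numpy as np

def extract_peaks(heatmap, top_n=100, thresh=0.4):
    """8-neighbourhood maximum test plus score thresholding (S24 sketch)."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    # stack the 8 shifted neighbour maps around each pixel
    neighbours = np.stack([
        padded[dy:dy + h, dx:dx + w]
        for dy in (0, 1, 2) for dx in (0, 1, 2)
        if not (dy == 1 and dx == 1)
    ])
    is_peak = (heatmap >= neighbours.max(axis=0)) & (heatmap > thresh)
    ys, xs = np.nonzero(is_peak)
    order = np.argsort(-heatmap[ys, xs])[:top_n]   # keep the N strongest
    return [(int(ys[i]), int(xs[i]), float(heatmap[ys[i], xs[i]]))
            for i in order]
```

On a full-size 240×136 heat map the same comparison is often done in one shot with a 3×3 max-pooling layer, which this vectorised version mirrors.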
In a further improvement, the step S3 specifically includes the following steps:
s31, copying all weights from the vehicle detection model and keeping the basic hyper-parameters unchanged; adding the previous-frame image and the generated heat map to the input part of the original vehicle detection model, and adding to the output part a two-dimensional offset vector from each target center point in the current frame to its center point in the previous frame;
s32, adding 2 additional output channels to predict the two-dimensional offset vectors, which describe the displacement of each object's position in the current frame relative to its position in the previous-frame image, so that detected targets can be linked across time;
s33, drawing target bounding boxes from the center points of the generated heat map and linking objects in the current frame to objects in the previous frame with a greedy matching strategy: if matching succeeds, the object inherits the ID of the previous-frame object; otherwise a new ID is assigned to the target.
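The S33 association step might look like the following sketch: each current centre is shifted by its predicted offset toward its previous-frame position and greedily matched to the nearest unclaimed track. The `max_dist` gating radius is an assumption; the patent does not specify one:

```python
import math

def greedy_match(curr_centers, curr_offsets, prev_tracks, max_dist=16.0):
    """Greedy cross-frame association (illustrative sketch of S33).

    curr_centers: [(x, y)] detected centre points in the current frame.
    curr_offsets: [(dx, dy)] predicted displacements back toward the
                  previous frame.
    prev_tracks:  [(track_id, (x, y))] centre points of the previous frame.
    """
    next_id = max((tid for tid, _ in prev_tracks), default=0) + 1
    used, assignments = set(), []
    for (cx, cy), (ox, oy) in zip(curr_centers, curr_offsets):
        px, py = cx + ox, cy + oy        # estimated previous-frame position
        best, best_d = None, max_dist
        for tid, (tx, ty) in prev_tracks:
            d = math.hypot(px - tx, py - ty)
            if tid not in used and d < best_d:
                best, best_d = tid, d
        if best is None:                 # no match: assign a fresh ID
            best, next_id = next_id, next_id + 1
        used.add(best)
        assignments.append((best, (cx, cy)))
    return assignments
```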
In a further improvement, the step S3 further includes:
on the prediction result of the previous frame, applying a Gaussian perturbation to each detected target point, with a hyper-parameter λ_ft, to simulate target localization errors;
randomly rendering false peaks near the centers of ground-truth targets with a certain probability, setting the hyper-parameter λ_fp = 0.1 to simulate false detections;
randomly removing part of the detection results with a predetermined probability, setting the hyper-parameter λ_fn = 0.02 to simulate missed detections.
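A minimal sketch of the three corruption operations above; the jitter standard deviation is an assumption, since the patent leaves λ_ft's value unspecified, while λ_fp = 0.1 and λ_fn = 0.02 follow the text:

```python
import random

def corrupt_detections(dets, lam_jitter=2.0, lam_fp=0.1, lam_fn=0.02, rng=None):
    """Simulate localisation noise, false peaks and misses (illustrative)."""
    rng = rng or random.Random(0)
    out = []
    for (x, y) in dets:
        if rng.random() < lam_fn:
            continue                     # drop detection: simulated miss
        # Gaussian jitter: simulated localization error
        out.append((x + rng.gauss(0, lam_jitter),
                    y + rng.gauss(0, lam_jitter)))
        if rng.random() < lam_fp:
            # render a false peak near the ground-truth centre
            out.append((x + rng.uniform(-8, 8), y + rng.uniform(-8, 8)))
    return out
```

Applied to the rendered previous-frame heat map during training, this exposes the model to the imperfect detections it will see at inference time.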
Compared with the prior art, the invention has the beneficial effects that:
1. the invention abandons the traditional design that separates the detection part from the association part, and greatly reduces computation and running time by integrating the two modules into one network. Tracking-conditioned detection is also simplified: each target in the video stream is identified by a single point, and multiple targets can be identified by a heat map containing multiple points. The tracking-conditioned detector can directly extract the heat map and perform joint reasoning over targets in multiple frames when associating them.
2. Point-based tracking simplifies target association across time. Simple displacement prediction, similar to sparse optical flow, can connect objects in different frames. The displacement prediction is based on previous detection results, which enables joint detection of objects in the current frame and their association with previous detections.
3. Since the method is completely local, only objects in adjacent frames are associated, and tracks that are lost or temporally distant are not re-initialized. Inputting the current frame together with the previous frame helps the network estimate changes in the scene and, following cues provided by the previous frame, recover objects that may not be observed in the current frame.
Drawings
The drawings are for illustrative purposes only and are not to be construed as limiting the patent; for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
FIG. 1 is a schematic overall flow chart of an embodiment of the present invention;
FIG. 2 is a diagram of a network framework according to an embodiment of the present invention;
fig. 3 is a diagram of a detection network structure according to an embodiment of the present invention.
Detailed Description
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted" and "connected" are to be interpreted broadly, e.g., as fixedly connected, detachably connected, or integrally connected; as mechanically or electrically connected; or as directly connected or indirectly connected through intervening media, i.e., as communication between the two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis. The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Referring to figs. 1-3, a method for tracking multiple targets of a vehicle based on a target center point includes the following steps:
s1, acquiring a vehicle tracking data set, and performing image enhancement on the vehicle tracking data set;
s2, building a vehicle detection model, setting a hyper-parameter, and pre-training the vehicle detection model through the vehicle tracking data set;
s3, copying all weights from the vehicle detection model, adding 4 input channels and 2 output channels on the basis of the original vehicle detection model, and retraining to generate a vehicle tracking model;
and S4, inputting the video stream into the vehicle tracking model to obtain a vehicle multi-target tracking result.
Specifically, in step S2 the vehicle detection model uses an encoder-decoder fully convolutional network, as shown in fig. 2. The fully convolutional network directly produces a 4x down-sampled heat map, and no anchor points need to be set in advance, which greatly reduces network parameters and computation. The peaks of the heat map serve as the target center points extracted by the network, and a threshold is then applied to screen them and obtain the final target center points. All up-sampling is performed by deformable convolution, which makes the network's receptive field more accurate rather than being limited to a 3×3 rectangular convolution window. Meanwhile, the 4x down-sampled feature map has much higher resolution than that of a common network, so both large and small targets can be detected well without multi-scale prediction or a feature pyramid. The up-sampling uses transposed convolution, which differs greatly from the bilinear interpolation used in common up-sampling; transposed convolution recovers the semantic and positional information of the image better. To generate a heat map close to the real situation, the localization errors, missed detections and false detections encountered in practice are simulated during training by perturbing detected target points, rendering some false peaks, and randomly removing some detection results with a certain probability.
Specifically, the input to the network in step S3 is a pair of images together with a heat map rendered from the detections of the first image. Each peak position corresponds to a target center point; a Gaussian rendering method is used, the Gaussian parameters are adjusted according to the target size to blur the peak, and, to reduce the false alarm probability, only the targets whose scores exceed a certain threshold are rendered. The model outputs an offset vector from the center of each current object to the center of the corresponding object in the previous frame; this offset vector is learned as an additional attribute of the center point and therefore adds little extra computation. Given the center points and offsets, the objects of the current frame can be linked to the corresponding objects of the previous frame by a simple greedy matching strategy. In addition, during actual training, the previous frame I_{t-1} is not necessarily the immediately preceding frame; it can be another frame in the same video sequence. This form of augmentation reduces the model's sensitivity to the video frame rate.
As a preferred embodiment of the present invention, the vehicle tracking data set includes MOT, KITTI and nuScenes, and is divided into a training set, a test set and a validation set in a ratio of 6:2: 2.
As a preferred embodiment of the present invention, the image enhancement includes at least one of the following methods:
1) rotating the training image I_H in the vehicle tracking data set by different angles to generate four sub-images, denoted I_H^(i), i ∈ {−30°, −15°, +15°, +30°};
2) resizing the training image I_H; the resized sub-image is denoted I_H^(s);
3) applying pixel-wise binary segmentation to the training image I_H; the binarized sub-image is denoted I_H^(b), and its foreground count value is C.
As a preferred embodiment of the present invention, the step S2 specifically includes the following steps:
s21, cropping the images to a resolution of 960×544 and inputting them into the network, and setting the hyper-parameters: false positive rate λ_fp = 0.1, false negative rate λ_fn = 0.4, confidence threshold θ = 0.4, and heat map rendering threshold τ = 0.5;
s22, skip-connecting the low layers and the output layer with deformable convolution, which replaces the traditional up-sampling convolution. The output stride is 4 and the batch size is 12; 30 epochs are trained with the Adam optimizer, with the learning rate set to 1e-4 for the first 20 epochs and decayed from 1e-5 to 1e-6 over the last 10 epochs. Considering the imbalance between positive and negative samples, focal loss is taken as the loss function, in the penalty-reduced form used for center-point heat maps:

L_k = -(1/N) Σ_xyc { (1 - Ŷ_xyc)^α log(Ŷ_xyc),                if Y_xyc = 1
                    { (1 - Y_xyc)^β (Ŷ_xyc)^α log(1 - Ŷ_xyc),  otherwise

s23, splatting each key point p onto the down-sampled feature map at the low-resolution position p̃ = ⌊p/4⌋ using a Gaussian kernel, with the Gaussian parameters adjusted according to the target size to blur the peak;
s24, comparing every response point on the heat map with its 8 connected neighbours; if a point's response value is greater than or equal to all eight neighbour values, it is kept, and the first N peak points satisfying this condition are retained as the target center points. To reduce the false alarm probability, only the targets whose scores exceed a certain threshold are rendered, and finally a heat map of resolution 240×136 containing the target center points is obtained.
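The Gaussian splatting described in S23 might be sketched as follows; the fixed `sigma` is illustrative, whereas the method adjusts it with the target size, and `np.maximum` keeps overlapping targets from overwriting one another:

```python
import numpy as np

def render_center(heatmap, cx, cy, sigma):
    """Splat one target centre onto the heat map with a Gaussian kernel."""
    ys, xs = np.indices(heatmap.shape)
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)   # element-wise max, in place
    return heatmap

hm = np.zeros((136, 240))                 # 4x down-sampled 960x544 input
render_center(hm, cx=480 // 4, cy=272 // 4, sigma=3.0)
```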
As a preferred embodiment of the present invention, the step S3 specifically includes the following steps:
s31, copying all model weights from step S2 and keeping the basic hyper-parameters unchanged; adding a previous-frame image (3 channels) and the generated heat map (1 channel) to the input part of the original model, and adding to the output part a two-dimensional offset vector from each target center point in the current frame to its center point in the previous frame. During actual training, the previous frame I_{t-1} is not necessarily the immediately preceding frame and can be another frame in the same video sequence; this form of augmentation reduces the model's sensitivity to the video frame rate;
s32, adding 2 extra output channels to predict the 2-dimensional offset vectors, which describe the displacement of each object's position in the current frame relative to its position in the previous-frame image, so that detected targets can be linked across time;
s33, drawing target bounding boxes from the center points of the generated heat map and linking objects in the current frame to objects in the previous frame with a greedy matching strategy: if matching succeeds, the object inherits the ID of the previous-frame object; otherwise a new ID is assigned to the target.
As a preferred embodiment of the present invention, the step S3 further includes:
during model inference, heat maps are rendered from model predictions, which may contain a variable number of missed, false, and mislocalized detections. These conditions are simulated during training to improve system robustness, specifically:
applying a Gaussian perturbation to each detected target point in the prediction result of the previous frame, with a hyper-parameter λ_ft, to simulate target localization errors;
randomly rendering false peaks near the centers of ground-truth targets with a certain probability, setting the hyper-parameter λ_fp = 0.1 to simulate false detections;
randomly removing part of the detection results with a predetermined probability, setting the hyper-parameter λ_fn = 0.02 to simulate missed detections.
In the drawings, the positional relationship is described for illustrative purposes only and is not to be construed as limiting the present patent; it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (7)

1. A vehicle multi-target tracking method based on a target center point is characterized by comprising the following steps:
s1, acquiring a vehicle tracking data set, and performing image enhancement on the vehicle tracking data set;
s2, building a vehicle detection model, setting a hyper-parameter, and pre-training the vehicle detection model through the vehicle tracking data set;
s3, copying all weights from the vehicle detection model, adding 4 input channels and 2 output channels on the basis of the original vehicle detection model, and retraining to generate a vehicle tracking model;
and S4, inputting the video stream into the vehicle tracking model to obtain a vehicle multi-target tracking result.
2. The vehicle multi-target tracking method based on the target center points as claimed in claim 1, wherein the vehicle tracking data sets comprise MOT, KITTI and nuScenes, and are divided into a training set, a testing set and a verification set according to a ratio of 6:2: 2.
3. The method for multi-target tracking of vehicles based on target center points as claimed in claim 1, wherein the image enhancement comprises at least one of the following methods:
1) rotating the training image I_H in the vehicle tracking data set by different angles to generate four sub-images, denoted I_H^(i), i ∈ {−30°, −15°, +15°, +30°};
2) resizing the training image I_H; the resized sub-image is denoted I_H^(s);
3) applying pixel-wise binary segmentation to the training image I_H; the binarized sub-image is denoted I_H^(b), and its foreground count value is C.
4. The method for multiple target tracking of vehicles based on target center points as claimed in claim 1, wherein said step S2 specifically comprises the following steps:
s21, cropping the training images in the vehicle tracking data set to a resolution of 960×544 and inputting them into the vehicle detection model, and setting the hyper-parameters: false positive rate λ_fp = 0.1, false negative rate λ_fn = 0.4, confidence threshold θ = 0.4, and heat map rendering threshold τ = 0.5;
s22, selecting deformable convolution as the up-sampling convolution to skip-connect the low layers and the output layer of the vehicle detection model, with an output stride of 4 and a batch size of 12; training for 30 epochs with the Adam optimizer, with the learning rate set to 1e-4 for the first 20 epochs and decayed from 1e-5 to 1e-6 over the last 10 epochs; and adopting focal loss as the loss function:

L_k = -(1/N) Σ_xyc { (1 - Ŷ_xyc)^α log(Ŷ_xyc),                if Y_xyc = 1
                    { (1 - Y_xyc)^β (Ŷ_xyc)^α log(1 - Ŷ_xyc),  otherwise

s23, splatting each key point p onto the down-sampled feature map at the low-resolution position p̃ = ⌊p/4⌋ using a Gaussian kernel, with the Gaussian parameters adjusted according to the target size to blur the peak;
s24, comparing every response point on the heat map with its 8 connected neighbours; if a point's response value is greater than or equal to all 8 neighbour values it is kept, the first N peak points satisfying this condition are retained, and a heat map of resolution 240×136 containing the target center points is obtained.
5. The method for multiple target tracking of vehicles according to claim 4, wherein said step S24 further comprises:
in order to reduce the false alarm probability, only the targets with scores higher than a preset threshold value in the detection result are rendered.
6. The vehicle multi-target tracking method based on a target center point according to claim 1, wherein said step S3 specifically comprises the following steps:
S31, copying all weights from the vehicle detection model while keeping the basic hyper-parameters unchanged; adding the previous frame image and its generated heat map to the input of the original vehicle detection model, and adding to its output a two-dimensional offset vector from the target center point in the current frame to the corresponding center point in the previous frame;
S32, adding 2 extra output channels that predict the two-dimensional offset vectors describing the displacement of each object's position in the current frame relative to its position in the previous frame image, so that associations between detected targets can be established over time;
and S33, drawing target bounding boxes from the center points of the generated heat map and linking objects in the current frame to objects in the previous frame with a greedy matching strategy: if the matching succeeds, the object inherits the ID from the previous frame; otherwise, a new ID is assigned to the target.
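The greedy association of step S33 can be sketched as follows. This is a hypothetical illustration, not the patent's code: the offsets are the extra output channels from S32, pointing from each current-frame center back toward its previous-frame position, and the distance threshold is an assumed parameter.

```python
import numpy as np
from itertools import count

new_ids = count(1)  # generator for IDs assigned to unmatched (new) targets

def greedy_match(prev_tracks, detections, offsets, max_dist=30.0):
    """Associate current-frame detections with previous-frame tracks.

    prev_tracks: list of (track_id, (x, y)) from the previous frame.
    detections:  list of (x, y) center points in the current frame.
    offsets:     per-detection 2D vectors toward the previous position.
    Returns (track_id, (x, y)) pairs for the current frame.
    """
    used, result = set(), []
    for det, off in zip(detections, offsets):
        back = np.add(det, off)           # displaced center in the previous frame
        best_id, best_d = None, max_dist
        for tid, pos in prev_tracks:
            if tid in used:
                continue
            d = np.linalg.norm(back - np.asarray(pos))
            if d < best_d:
                best_id, best_d = tid, d
        if best_id is None:
            best_id = next(new_ids)       # no match: assign a fresh ID
        else:
            used.add(best_id)             # match: inherit the previous ID
        result.append((best_id, det))
    return result
```

CenterTrack-style trackers typically process detections in descending confidence order, so stronger detections claim IDs first; that ordering is omitted here for brevity.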
7. The vehicle multi-target tracking method based on a target center point according to claim 6, wherein said step S3 further comprises:
applying a Gaussian perturbation to each detected target point in the prediction result of the previous frame, with a hyper-parameter λft set to simulate target localization errors;
randomly rendering false peaks near the centers of ground-truth targets with a certain probability, with the hyper-parameter λfp set to 0.1 to simulate false detections;
and randomly removing part of the detection results with a predetermined probability, with the hyper-parameter λfn set to 0.02 to simulate missed detections.
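The three corruptions in claim 7 can be sketched as a single training-time augmentation routine. The probabilities λfp = 0.1 and λfn = 0.02 follow the claim, while the jitter scale for λft and the spatial spread of false peaks are assumed values for illustration; the function name is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_prior_detections(centers, lam_ft=1.0, lam_fp=0.1, lam_fn=0.02):
    """Simulate localization error, false detections and missed detections
    on the previous frame's predicted center points, before rendering them
    into the prior heat map fed to the tracking model."""
    out = []
    for cx, cy in centers:
        if rng.random() < lam_fn:                  # missed detection: drop the point
            continue
        jx, jy = rng.normal(0.0, lam_ft, size=2)   # Gaussian localization jitter
        out.append((cx + jx, cy + jy))
        if rng.random() < lam_fp:                  # false peak near the GT center
            fx, fy = rng.normal(0.0, 8.0, size=2)  # assumed spread of false peaks
            out.append((cx + fx, cy + fy))
    return out
```

Training on such deliberately corrupted prior heat maps teaches the model not to trust the previous frame blindly, which is what makes the tracker robust to its own detection errors at inference time.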
CN202010590410.1A 2020-06-24 2020-06-24 Vehicle multi-target tracking method based on target center point Pending CN111753732A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010590410.1A CN111753732A (en) 2020-06-24 2020-06-24 Vehicle multi-target tracking method based on target center point

Publications (1)

Publication Number Publication Date
CN111753732A true CN111753732A (en) 2020-10-09

Family

ID=72677148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010590410.1A Pending CN111753732A (en) 2020-06-24 2020-06-24 Vehicle multi-target tracking method based on target center point

Country Status (1)

Country Link
CN (1) CN111753732A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532894A (en) * 2019-08-05 2019-12-03 西安电子科技大学 Remote sensing target detection method based on boundary constraint CenterNet
CN110889449A (en) * 2019-11-27 2020-03-17 中国人民解放军国防科技大学 Edge-enhanced multi-scale remote sensing image building semantic feature extraction method
CN110992317A (en) * 2019-11-19 2020-04-10 佛山市南海区广工大数控装备协同创新研究院 PCB defect detection method based on semantic segmentation
CN111127516A (en) * 2019-12-19 2020-05-08 苏州智加科技有限公司 Target detection and tracking method and system without search box

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINGYI ZHOU et al.: "Objects as Points", arXiv, pages 1-12 *
XINGYI ZHOU et al.: "Tracking Objects as Points", arXiv, pages 1-22 *
XIA XUE et al.: "Apple detection model on trees based on a lightweight anchor-free deep convolutional neural network", Smart Agriculture, vol. 2, no. 1, pages 99-110 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257609A (en) * 2020-10-23 2021-01-22 重庆邮电大学 Vehicle detection method and device based on self-adaptive key point heat map
CN112561959A (en) * 2020-12-08 2021-03-26 佛山市南海区广工大数控装备协同创新研究院 Online vehicle multi-target tracking method based on neural network
CN113033573A (en) * 2021-03-16 2021-06-25 佛山市南海区广工大数控装备协同创新研究院 Method for improving detection performance of instance segmentation model based on data enhancement
CN113538523A (en) * 2021-09-17 2021-10-22 魔视智能科技(上海)有限公司 Parking space detection tracking method, electronic equipment and vehicle
CN114283175A (en) * 2021-12-28 2022-04-05 中国人民解放军国防科技大学 Vehicle multi-target tracking method and device based on traffic video monitoring scene
CN114283175B (en) * 2021-12-28 2024-02-02 中国人民解放军国防科技大学 Vehicle multi-target tracking method and device based on traffic video monitoring scene


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination