CN114998999B - Multi-target tracking method and device based on multi-frame input and track smoothing - Google Patents
Multi-target tracking method and device based on multi-frame input and track smoothing
- Publication number
- CN114998999B · CN202210856428.0A
- Authority
- CN
- China
- Prior art keywords
- track
- target
- frame
- pedestrian
- target tracking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data; Movements or behaviour, e.g. gesture recognition
- G06N3/04 — Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology
- G06N3/08 — Computing arrangements based on biological models; Neural networks; Learning methods
- G06V10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/75 — Image or video pattern matching; Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches
- G06V10/774 — Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/53 — Recognition of crowd images, e.g. recognition of crowd congestion
Abstract
The invention discloses a multi-target tracking method and device based on multi-frame input and track smoothing. The method comprises the following steps. Step S1: acquire a pedestrian video data set, annotate pedestrian coordinates and pedestrian tracks, and generate fragment-type track data. Step S2: construct and train a pedestrian multi-target tracking network model based on multi-frame input and track smoothing. Step S3: perform inference with the trained pedestrian multi-target tracking network model to obtain the pedestrian target detection and feature-extraction results of the current frame and of the preceding frames, i.e. the coordinates and appearance features of the targets in the multi-frame images. Step S4: perform shortest-feature-distance matching using the coordinates and appearance features of the multi-frame targets, smooth the tracks with a track-curvature smoothing function, and finally obtain the tracks of the current frame. The method has low time consumption and good robustness to occlusion between targets of the same class.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to a multi-target tracking method and device based on multi-frame input and track smoothing.
Background
With the wide deployment of surveillance cameras in urban public areas, online detection and multi-target tracking of targets of interest has significant academic and commercial value, driven by the requirements of public safety and emergency response.
Most current tracking algorithms for targets such as pedestrians first use a detection network to obtain the position of the target of interest, then use a ReID network to extract the target's appearance features, and finally perform matching with the Hungarian algorithm or a greedy algorithm based on a feature-space distance metric. However, this approach has significant drawbacks: 1. during target matching, features are matched only against the previous frame or frames, so the identity (ID) numbers of occluded targets with similar features are easily swapped; 2. a fixed feature-distance threshold easily causes a newly appearing target, which has no active track to match, to be matched to a track that disappeared long ago.
To address these two problems, academia has mainly focused on proposing networks with better detection performance or with stronger, more robust feature representations. However, as shown in fig. 3, occlusion between targets of the same class causes part of one target's appearance features to be covered by those of another target: when two people meet and one blocks the other, at the moment of occlusion the appearance features of the occluded person effectively become those of the occluding person. With feature-based matching, the identity IDs of the targets are therefore easily interchanged, producing the mis-associated track indicated by line B in fig. 3, whereas the true situation is the track indicated by line A in fig. 3. This problem has not yet been solved well.
Based on the above, there is a need for a pedestrian multi-target tracking method that is computationally efficient, robust to occlusion by similar targets, and high-performing.
Disclosure of Invention
In order to solve the above technical problems in the prior art, the invention provides a multi-target tracking method and device based on multi-frame input and track smoothing. The specific technical scheme is as follows:
a multi-target tracking method based on multi-frame input and track smoothing comprises the following steps:
step S1: acquiring a pedestrian video data set, marking pedestrian coordinates and pedestrian tracks, and generating fragment type track data;
step S2: constructing and training a pedestrian multi-target tracking network model based on multi-frame input and smooth track;
step S3: performing inference based on the trained pedestrian multi-target tracking network model to obtain the pedestrian target detection and feature-extraction results of the current frame and of the preceding frames, i.e. the coordinates and appearance features of the targets in the multi-frame images;
step S4: performing shortest-feature-distance matching using the coordinates and appearance features of the multi-frame targets, smoothing the track with a track-curvature smoothing function, and finally obtaining the track of the current frame.
Further, step S1 specifically comprises: annotating the pedestrians in the acquired open-source pedestrian video sequences frame by frame using annotation software, including the target bounding box and the identity (ID) number of each target, with ID numbers accumulated from 1; then cutting the pedestrian video into fixed-length clips and binding them into track segments, where each track segment consists of 2m+1 image sequence frames, i.e. the m image frames before and the m image frames after the image frame at a given moment, m being a positive integer.
Further, the pedestrian multi-target tracking network model is formed by combining a Yolov5-L backbone network with a multi-scale feature extraction module. The multi-scale feature extraction module is arranged in parallel with the target detection head of the Yolov5-L backbone and receives the same input, and is composed of one 3 × 256 convolution layer and one 1 × 256 × 3 convolution layer. The input image passes through the Yolov5-L backbone and then through the multi-scale feature extraction module, which outputs an appearance feature map of the same size as the input image; the appearance features corresponding to each target box are then cropped from this feature map based on the preset (anchor) box to which the target box detected by the detection head belongs.
Further, the pedestrian multi-target tracking network model is trained with the fragment-type trajectory data: the image sequence frames of the fragment-type trajectory data are fed into the model simultaneously for inference; the coordinates of the targets, i.e. the target boxes, and their appearance features are computed; based on these coordinates and appearance features, the targets are matched using the shortest feature distance and the track-curvature smoothing function; meanwhile, the gradient of a total loss function is used for the backward pass of the model.
Further, the total loss function is a weighted average of the combined trajectory-feature-distance-and-fit loss function and the average L1 loss function of track-segment target detection:

$L_{total} = \lambda_1 L_{traj} + \lambda_2 L_{det}$

where $L_{traj}$ denotes the combined trajectory feature distance and fit loss function, $L_{det}$ denotes the average L1 loss function of track-segment target detection, and $\lambda_1$, $\lambda_2$ are weights.
Further, the combined trajectory feature distance and fitted loss function is obtained as a weighted average of a trajectory feature distance loss function and a trajectory curvature smoothing loss function, and supervises the training of feature extraction and track matching in the pedestrian multi-target tracking network model;
the trajectory feature distance loss function is expressed as:
wherein,,i∈[1,2m+1]representing the target frame in the ith image frameAnd the ith image frame real label target frameThe characteristic distance is represented by a cosine function of a characteristic vector included angle, and 2m +1 is the number of image sequence frames of the track segment;
the trajectory curvature smoothing loss function is expressed as:
wherein x represents the number of target tracks formed in the track segment,the curvature of the track of the jth predicted target is the average track of the 2m +1 frame image,for the curvature of the corresponding real label trajectory, j ∈ [1,x], The predicted target track and the actual label track are the average track curvature difference; the matching specifically comprises the steps that the predicted target track is matched with the real label track by adopting the rule of curve front end, middle end and rear end IOU matching;
thus, the combined trajectory feature distance and fitted penalty function is represented as:
whereinAndare all weighted based onThe learning of the pedestrian multi-target tracking network model on feature extraction and track matching is supervised.
Further, the average L1 loss function of track-segment target detection is expressed as:

$L_{det} = \frac{1}{2m+1}\sum_{i=1}^{2m+1} L_1^{(i)}$

where $L_1^{(i)}$ denotes the average L1 target-detection loss of the i-th image frame, and 2m+1 is the number of image sequence frames in the track segment.
Further, step S4 specifically comprises: using the trained pedestrian multi-target tracking network model, performing shortest-feature-distance matching with the coordinates and appearance features of the multi-frame targets and smoothing the track, such that the matched track minimizes the weighted sum of the average feature distance between the current target and the track targets of the previous 2m frames and the track curvature:

$\min\left( w_1 \cdot \frac{1}{2m}\sum_{i=1}^{2m} d_{k,i} + w_2 \cdot \rho \right)$

where $d_{k,i}$ denotes the appearance-feature distance between the current predicted image frame k and the i-th of its 2m preceding frames, $\rho$ is the curvature of the candidate track, and $w_1$, $w_2$ are weights.
A multi-target tracking device based on multi-frame input and track smoothing comprises one or more processors and is used for realizing the multi-target tracking method based on multi-frame input and track smoothing.
A computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the multi-target tracking method based on multi-frame input and trajectory smoothing.
Compared with the prior art, the invention has the following beneficial effects: 1. by constructing the fragment-type trajectory data set, different views of the same target across multiple images jointly supervise the learning of the feature-extraction module, so the learned appearance features are more robust; 2. in consecutive video frames the motion of a pedestrian is approximately linear over a short time and its track does not jump, so track smoothing over 2m+1 frames effectively filters out track jumps caused by mismatches due to occlusion by similar targets; 3. by feeding a segment of trajectory training data, the detection module learns information about the same target at different moments simultaneously, improving detection performance to some extent; 4. by sharing a cache of features and detection results, only 1 frame needs to be inferred at deployment while achieving the effect of inputting 2m+1 frames.
Drawings
Fig. 1 is a schematic flow chart of a multi-target tracking method based on multi-frame input and track smoothing according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an overall network framework of a multi-target tracking method based on multi-frame input and trajectory smoothing according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of the present invention in which an identity exchange problem exists;
fig. 4 is a schematic structural diagram of a multi-target tracking apparatus based on multi-frame input and track smoothing according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.
The invention provides a multi-target tracking method based on multi-frame input and track smoothing, aiming to solve the problem of track mis-association caused by occlusion between similar targets in existing multi-target tracking algorithms. The invention provides a single-stage network for detection and ReID feature extraction with multi-frame input: Yolov5-L serves as the backbone of the network model, a target feature extraction module is added at the end of the network, and the coordinates and appearance features of pedestrian targets are obtained simultaneously through this single-stage network. Multi-frame input, multi-frame target-feature matching and online track smoothing are adopted to reduce the mis-association rate.
As shown in fig. 1, the method specifically includes the following steps:
step S1: acquiring a pedestrian video data set, marking pedestrian coordinates and pedestrian tracks, and generating fragment type track data;
specifically, the marking of the pedestrian coordinates and the pedestrian track refers to marking the pedestrians in the video sequence frame by using professional marking software, and comprises marking a target frame and an Identification (ID) number of the target, wherein the ID numbers are accumulated from 1;
the generation of fragment type track data refers to that historical video data is cut and bound in a fixed length to generate track fragments, overlapping frames or non-overlapping frames can exist among the track fragments, a track fragment before and after a certain time of the history is supposed to be composed of 2m +1 image sequence frames, namely the data of the track fragment is composed of m image frames from the front to the back of the image frame at the certain time, m is a positive integer, and the fragment type track data only binds image frame serial numbers and does not need to relate to repeated copying of images.
Step S2: constructing and training a pedestrian multi-target tracking network model based on multi-frame input and smooth track;
specifically, as shown in fig. 2, a multi-frame input pedestrian multi-target tracking network model based on the shortest feature distance and the smooth matching of the trajectory is mainly formed by combining a Yolov5-L main network and a multi-scale feature extraction module, the size of an input picture of the network model is 960 × 960, the multi-scale feature extraction module is arranged in parallel with a target detection head Detect of the Yolov5-L main network, the Yolov5-L main network performs target detection on an input image through a target detection head, the input of the multi-scale feature extraction module is the same as that of the target detection head, and the multi-scale feature extraction module is composed of one 3 × 256 convolution layer and one 1 × 256 × 3 convolution layer and performs feature extraction on a target; the method comprises the steps that an input image passes through a Yolov5-L main network and then passes through a multi-scale feature extraction module to finally obtain an appearance feature map with the same size as the input image, and finally, appearance features corresponding to a target frame are obtained in the appearance feature map by intercepting on the basis of a preset frame to which the target frame obtained by detection of a target detection head Detect belongs, wherein the dimension of the appearance features of a single target is 256 dimensions.
Training the pedestrian multi-target tracking network model uses the fragment-type track data: the image sequence frames of a track segment are fed into the model simultaneously for inference, and the coordinates (target boxes) and appearance features of the targets are computed. The targets' coordinates and appearance features are matched using the shortest feature distance and the track-curvature smoothing function, while the gradient of a total loss function is used for the backward pass of the model.
Feature extraction and track matching of the network model are supervised by a loss function combining the trajectory feature distance and the fit, obtained as a weighted average of a trajectory feature distance loss function and a trajectory curvature smoothing loss function.
The distance in feature space is represented by the cosine of the angle between feature vectors. The feature distance between the predicted target box in the i-th image frame (with feature vector $f_i$) and the ground-truth target box of the i-th image frame (with feature vector $\hat{f}_i$) is expressed as:

$d_i = 1 - \cos\langle f_i, \hat{f}_i\rangle$

When the angle between the feature vectors approaches 0°, the prediction approaches the ground truth and the feature distance $d_i$ approaches 0; otherwise $d_i$ approaches 1. Thus, the trajectory feature distance loss function of a track segment is expressed as:

$L_{feat} = \frac{1}{2m+1}\sum_{i=1}^{2m+1} d_i$
assuming that the track segments together form the track of x objects, calculating the average track curvature of the j-th object at 2m +1 frame image as,j∈[1,x]Calculating the average track curvature difference between the predicted target track and the real label track asIn whichMatching the predicted track and the real label track by adopting a rule of curve front end, middle end and rear end IOU matching for the curvature of the corresponding real label track, namely, taking a coordinate of a first target, a coordinate of a middle target and a coordinate of a last disappearance in the predicted track and coordinates of first, middle and last time periods corresponding to the real label track to perform IOU matching, if the average IOU is maximum, considering the predicted track and the real track to be the same track, and calculating a loss function of fitting the predicted track and the real label track based on a calculation formula of an average track curvature difference value, namely a track curvature smooth loss function:
finally, the combined trajectory feature distance and fitted loss function is expressed as:
whereinAndare all weighted based onThe learning of the pedestrian multi-target tracking network model on feature extraction and track matching is supervised.
For detection, the pedestrian multi-target tracking network model is trained with the L1 loss function commonly used in single-frame target detection models. Denoting the average L1 target-detection loss of the i-th image frame by $L_1^{(i)}$, the average L1 detection loss over all images of the track segment is expressed as $L_{det} = \frac{1}{2m+1}\sum_{i=1}^{2m+1} L_1^{(i)}$.
Finally, the total loss function for training the pedestrian multi-target tracking network model is expressed as $L_{total} = \lambda_1 L_{traj} + \lambda_2 L_{det}$, where $\lambda_1$ and $\lambda_2$ are weights.
Step S3: perform inference based on the trained pedestrian multi-target tracking network model to obtain the pedestrian detection and feature-extraction results of the current frame and of the preceding frames, i.e. the coordinates and appearance features of the targets in the multi-frame images;
specifically, a trained pedestrian multi-target tracking network model is used for detecting a pedestrian target frame of an obtained frame image and corresponding appearance characteristics of the frame image, a track segment is formed by front and back m frames of images in the training process, however, a track segment is inferred by 2m +1 formed by front 2m frames of images in actual application deployment, so that the model is not required to be inferred for 2m +1 times each time in actual inference, only 1 time is required to be inferred, and the result of the front 2m frames is obtained by previous cache.
Step S4: perform shortest-feature-distance matching using the coordinates and appearance features of the multi-frame targets, smooth the track with the track-curvature smoothing function, and finally obtain the track of the current frame.
Specifically, based on the inference results of the 2m+1 image frames, shortest-feature-distance matching and the track-curvature smoothing principle are applied: the matched track minimizes the weighted sum of the average feature distance between the current predicted target and the track targets of the previous 2m frames and the track curvature:

$\min\left( w_1 \cdot \frac{1}{2m}\sum_{i=1}^{2m} d_{k,i} + w_2 \cdot \rho \right)$

where $d_{k,i}$ denotes the appearance-feature distance between the current predicted image frame k and the i-th of its 2m preceding frames, $\rho$ is the curvature of the candidate track, and $w_1$ and $w_2$ are weights.
Corresponding to the embodiment of the multi-target tracking method based on multi-frame input and track smoothing, the invention also provides an embodiment of the multi-target tracking device based on multi-frame input and track smoothing.
Referring to fig. 4, the multi-target tracking device based on multi-frame input and trajectory smoothing provided by the embodiment of the present invention includes one or more processors, and is configured to implement the multi-target tracking method based on multi-frame input and trajectory smoothing in the foregoing embodiment.
The embodiment of the multi-target tracking device based on multi-frame input and track smoothing can be applied to any device with data processing capability, such as a computer. The device embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the device in the logical sense is formed by the processor of the host device reading the corresponding computer program instructions from non-volatile memory into memory for execution. In terms of hardware, fig. 4 shows a hardware structure diagram of a device with data processing capability on which the multi-target tracking apparatus of the invention is located; in addition to the processor, memory, network interface and non-volatile memory shown in fig. 4, the device may also include other hardware according to its actual function, which is not described again here.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
The embodiment of the invention also provides a computer readable storage medium, on which a program is stored, and when the program is executed by a processor, the multi-target tracking method based on multi-frame input and track smoothing in the above embodiments is implemented.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any data processing device described in any previous embodiment. The computer readable storage medium may also be an external storage device such as a plug-in hard disk, a Smart Media Card (SMC), an SD Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any data processing capable device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the arbitrary data processing-capable device, and may also be used for temporarily storing data that has been output or is to be output.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way. Although the foregoing has described the practice of the present invention in detail, it will be apparent to those skilled in the art that modifications may be made to the practice of the invention as described in the foregoing examples, or that certain features may be substituted in the practice of the invention. All changes, equivalents and modifications which come within the spirit and scope of the invention are desired to be protected.
Claims (6)
1. A multi-target tracking method based on multi-frame input and track smoothing is characterized by comprising the following steps:
step S1: acquiring a pedestrian video data set, marking pedestrian coordinates and pedestrian tracks, and generating fragment type track data;
step S2: constructing and training a pedestrian multi-target tracking network model based on multi-frame input and smooth track;
specifically, the pedestrian multi-target tracking network model is trained with the fragment-type track data: the image sequence frames of a track fragment are fed into the pedestrian multi-target tracking network model simultaneously for inference, and the coordinates of each target, namely the target frame, and its appearance feature are calculated; based on the coordinates and appearance features of the targets, matching is performed using the shortest feature distance and a track curvature smoothing function; meanwhile, the total loss function is used to compute gradients for back-propagation through the pedestrian multi-target tracking network model;
the total loss function is a loss function combining the trajectory feature distance and the fitted loss function with the weighted average of the average L1 loss functions of the trajectory detection,A loss function representing the combined trajectory feature distance and fit,an average L1 loss function representing track segment target detection;
the combined trajectory-feature-distance-and-fit loss function is obtained as the weighted average of a trajectory feature distance loss function and a trajectory curvature smoothing loss function, and supervises the training of feature extraction and track matching in the pedestrian multi-target tracking network model;
the trajectory feature distance loss function is expressed as:

L_feat = (1/(2m+1)) · Σ_{i=1}^{2m+1} d_i

wherein d_i, i∈[1,2m+1], represents the feature distance between the predicted target frame in the i-th image frame and the real label target frame of the i-th image frame, the feature distance being represented by the cosine of the angle between the feature vectors, and 2m+1 is the number of image sequence frames of the track segment;
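A minimal sketch of this per-segment feature distance loss is shown below; the exact form d_i = 1 − cos(angle) is an assumption, since the claim states only that the distance is represented by the cosine of the feature-vector angle:

```python
import numpy as np

def feature_distance_loss(pred_feats, gt_feats):
    """Average cosine-based feature distance over the 2m+1 frames of a
    track segment. Inputs are (2m+1, D) arrays of appearance features
    for the predicted and ground-truth target frames respectively."""
    pred = np.asarray(pred_feats, dtype=float)
    gt = np.asarray(gt_feats, dtype=float)
    # cosine of the angle between each predicted/ground-truth feature pair
    cos = np.sum(pred * gt, axis=1) / (
        np.linalg.norm(pred, axis=1) * np.linalg.norm(gt, axis=1))
    return float(np.mean(1.0 - cos))  # 0 when every feature pair aligns
```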
the trajectory curvature smoothing loss function is expressed as:

L_curv = (1/x) · Σ_{j=1}^{x} |c_j − ĉ_j|

wherein x represents the number of target tracks formed in the track segment, c_j is the mean track curvature of the j-th predicted target over the 2m+1 image frames, ĉ_j is the curvature of the corresponding real label track, j∈[1,x], and |c_j − ĉ_j| is the mean track-curvature difference between the predicted target track and the real label track; the matching is specifically the matching of predicted target tracks to real label tracks, performed by IOU matching at the front end, middle end, and rear end of the curve;
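As a minimal illustrative sketch (not the patent's implementation), the curvature smoothing loss above can be computed from track centre points as follows; the discrete curvature estimate k = |x′y″ − y′x″| / (x′² + y′²)^1.5 is an assumption, since the claim does not fix a particular curvature formula:

```python
import numpy as np

def mean_curvature(track):
    """Mean discrete curvature of a 2-D track given as (2m+1, 2) centre
    points, using k = |x'y'' - y'x''| / (x'^2 + y'^2)^1.5 computed with
    finite differences (an assumed estimator, not fixed by the claim)."""
    t = np.asarray(track, dtype=float)
    dx, dy = np.gradient(t[:, 0]), np.gradient(t[:, 1])
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    denom = np.maximum((dx**2 + dy**2) ** 1.5, 1e-9)  # avoid divide-by-zero
    return float(np.mean(np.abs(dx * ddy - dy * ddx) / denom))

def curvature_smooth_loss(pred_tracks, gt_tracks):
    """Average absolute curvature difference over the x matched tracks."""
    return float(np.mean([abs(mean_curvature(p) - mean_curvature(g))
                          for p, g in zip(pred_tracks, gt_tracks)]))
```

A straight track has zero curvature, so identical straight tracks give a loss of zero, matching the intuition that the loss penalises only deviation in track smoothness.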
thus, the combined trajectory feature distance and fitted loss function is expressed as:

L_traj = γ1·L_feat + γ2·L_curv

wherein γ1 and γ2 are both weights, on the basis of which the learning of feature extraction and track matching by the pedestrian multi-target tracking network model is supervised;
step S3: performing inference with the trained pedestrian multi-target tracking network model to obtain the pedestrian target detection and feature extraction results of the current frame and of the previous several frames, namely the coordinates and appearance features of the targets in the multi-frame images;
step S4: performing shortest-feature-distance matching using the coordinates and appearance features of the multi-frame image targets, and performing track smoothing using the track curvature smoothing function, to finally obtain the track of the current frame; specifically: using the trained pedestrian multi-target tracking network model, shortest-feature-distance matching is performed with the coordinates and appearance features of the multi-frame image targets, track smoothing is performed, and the track obtained by matching minimizes the weighted sum of the average feature distance between the target and the track targets of the previous 2m frames and the track curvature, expressed as: min ( w1·d̄ + w2·c ), where d̄ is the average feature distance to the track targets of the previous 2m frames and c is the resulting track curvature.
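The association in step S4 can be sketched as minimising a combined cost over track/detection pairs. The weights, the curvature-cost formulation, and the brute-force assignment below are all illustrative assumptions (a Hungarian solver would replace the permutation search at scale):

```python
import numpy as np
from itertools import permutations

def match_tracks_to_detections(track_feats, det_feats, curv_cost,
                               w_feat=1.0, w_curv=0.5):
    """Assign current-frame detections to tracks by minimising
    w_feat * (cosine feature distance) + w_curv * (curvature cost).
    curv_cost[t][d] is the curvature change if detection d extends
    track t; weights w_feat/w_curv are illustrative, not from the patent.
    Assumes len(track_feats) <= len(det_feats)."""
    tf = np.asarray(track_feats, float)
    df = np.asarray(det_feats, float)
    tf = tf / np.linalg.norm(tf, axis=1, keepdims=True)
    df = df / np.linalg.norm(df, axis=1, keepdims=True)
    cost = w_feat * (1.0 - tf @ df.T) + w_curv * np.asarray(curv_cost, float)
    # brute-force optimal assignment over all detection permutations
    n = cost.shape[0]
    best = min(permutations(range(cost.shape[1]), n),
               key=lambda p: sum(cost[i, p[i]] for i in range(n)))
    return list(enumerate(best))  # (track index, detection index) pairs
```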
2. The multi-target tracking method based on multi-frame input and track smoothing as claimed in claim 1, wherein the step S1 specifically comprises: labeling the pedestrians in the frames of the acquired open-source pedestrian video sequence with labeling software, the labels comprising a target frame and an identification (ID) number for each target, the ID numbers accumulating from 1; then cutting the pedestrian video into fixed-length clips to generate track segments, each track segment consisting of 2m+1 image sequence frames, that is, the data of a track segment consists of the m image frames before to the m image frames after the image frame at a certain moment, m being a positive integer.
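The slicing described in claim 2 can be sketched in a few lines; the handling of the first and last m frames (skipped here, since they have no full centred window) is an assumption:

```python
def make_track_segments(frames, m):
    """Slice an annotated frame sequence into overlapping track segments
    of length 2m+1, each centred on one frame. Frames within m of either
    end of the video have no full segment and are skipped in this sketch."""
    return [frames[i - m:i + m + 1] for i in range(m, len(frames) - m)]
```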
3. The multi-frame input and track smoothing-based multi-target tracking method according to claim 1, wherein the pedestrian multi-target tracking network model is formed by combining a Yolov5-L backbone network with a multi-scale feature extraction module; the multi-scale feature extraction module is arranged in parallel with the target detection head of the Yolov5-L backbone network and shares its input, and consists of one 3 × 3 × 256 convolution layer and one 1 × 1 × 256 convolution layer; the input image passes through the Yolov5-L backbone network and then through the multi-scale feature extraction module to output an appearance feature map of the same size as the input image, and the appearance feature corresponding to a target frame is then obtained by cropping the appearance feature map based on the preset anchor frame to which the target frame detected by the target detection head belongs.
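The final cropping step of claim 3 can be sketched as pooling the feature map over the box region. Mean pooling and the stride parameter are assumptions (the claim says only that the feature is cropped from a map of the same size as the input, hence the stride-1 default):

```python
import numpy as np

def crop_appearance_feature(feat_map, box, stride=1):
    """Pool the appearance embedding for one detected target frame from
    the (C, H, W) feature map produced by the parallel extraction branch.
    box is (x1, y1, x2, y2) in input-image coordinates."""
    x1, y1, x2, y2 = [int(round(v / stride)) for v in box]
    # guard against degenerate boxes thinner than one feature cell
    region = feat_map[:, y1:max(y2, y1 + 1), x1:max(x2, x1 + 1)]
    return region.mean(axis=(1, 2))  # one (C,) embedding per target
```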
4. The multi-target tracking method based on multi-frame input and track smoothing as claimed in claim 1, wherein the average L1 loss function of track segment target detection is represented as:

L_det = (1/(2m+1)) · Σ_{i=1}^{2m+1} ‖b_i − b̂_i‖₁

wherein b_i and b̂_i denote the predicted target frame and the real label target frame of the i-th image frame, respectively.
5. A multi-target tracking device based on multi-frame input and trajectory smoothing, which is characterized by comprising one or more processors and is used for implementing the multi-target tracking method based on multi-frame input and trajectory smoothing as claimed in any one of claims 1 to 4.
6. A computer-readable storage medium, having stored thereon a program which, when executed by a processor, implements the multi-target tracking method based on multi-frame input and trajectory smoothing according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210856428.0A CN114998999B (en) | 2022-07-21 | 2022-07-21 | Multi-target tracking method and device based on multi-frame input and track smoothing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114998999A CN114998999A (en) | 2022-09-02 |
CN114998999B true CN114998999B (en) | 2022-12-06 |
Family
ID=83021963
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210856428.0A Active CN114998999B (en) | 2022-07-21 | 2022-07-21 | Multi-target tracking method and device based on multi-frame input and track smoothing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114998999B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115342822B (en) * | 2022-10-18 | 2022-12-23 | 智道网联科技(北京)有限公司 | Intersection track data rendering method, device and system |
CN115880338B (en) * | 2023-03-02 | 2023-06-02 | 浙江大华技术股份有限公司 | Labeling method, labeling device and computer readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110135314A (en) * | 2019-05-07 | 2019-08-16 | 电子科技大学 | A kind of multi-object tracking method based on depth Trajectory prediction |
CN110349187A (en) * | 2019-07-18 | 2019-10-18 | 深圳大学 | Method for tracking target, device and storage medium based on TSK Fuzzy Classifier |
CN111767847A (en) * | 2020-06-29 | 2020-10-13 | 佛山市南海区广工大数控装备协同创新研究院 | Pedestrian multi-target tracking method integrating target detection and association |
CN111797738A (en) * | 2020-06-23 | 2020-10-20 | 同济大学 | Multi-target traffic behavior fast extraction method based on video identification |
CN114677633A (en) * | 2022-05-26 | 2022-06-28 | 之江实验室 | Multi-component feature fusion-based pedestrian detection multi-target tracking system and method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103854273B (en) * | 2012-11-28 | 2017-08-25 | 天佑科技股份有限公司 | A kind of nearly positive vertical view monitor video pedestrian tracking method of counting and device |
US11341512B2 (en) * | 2018-12-20 | 2022-05-24 | Here Global B.V. | Distinguishing between pedestrian and vehicle travel modes by mining mix-mode trajectory probe data |
CN110378259A (en) * | 2019-07-05 | 2019-10-25 | 桂林电子科技大学 | A kind of multiple target Activity recognition method and system towards monitor video |
2022-07-21: CN application CN202210856428.0A filed; patent CN114998999B granted (status: Active)
Non-Patent Citations (3)
Title |
---|
A Fusion Approach for Multi-Frame Optical Flow Estimation; Zhile Ren et al.; 2019 IEEE Winter Conference on Applications of Computer Vision (WACV); 2019-03-07; entire document *
Aerial image object detection based on improved YOLOv5; Qing Wen et al.; 2022 2nd International Conference on Consumer Electronics and Computer Engineering (ICCECE); 2022-02-21; entire document *
Multi-target tracking algorithm based on YOLOv3 and Kalman filtering; Ren Jiamin et al.; Computer Applications and Software; 2020-05-12 (Issue 05); entire document *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114998999B (en) | Multi-target tracking method and device based on multi-frame input and track smoothing | |
CN111627045B (en) | Multi-pedestrian online tracking method, device and equipment under single lens and storage medium | |
Sakaridis et al. | Map-guided curriculum domain adaptation and uncertainty-aware evaluation for semantic nighttime image segmentation | |
Han et al. | Mat: Motion-aware multi-object tracking | |
Shin Yoon et al. | Pixel-level matching for video object segmentation using convolutional neural networks | |
CN109426805B (en) | Method, apparatus and computer program product for object detection | |
CN113034541B (en) | Target tracking method and device, computer equipment and storage medium | |
US9754178B2 (en) | Long-term static object detection | |
Rajasegaran et al. | Tracking people by predicting 3d appearance, location and pose | |
CN106803263A (en) | A kind of method for tracking target and device | |
WO2012127815A1 (en) | Moving object detecting apparatus and moving object detecting method | |
CN113159006B (en) | Attendance checking method and system based on face recognition, electronic equipment and storage medium | |
CN111027555B (en) | License plate recognition method and device and electronic equipment | |
CN110298867A (en) | A kind of video target tracking method | |
David | An intellectual individual performance abnormality discovery system in civic surroundings | |
CN114677633A (en) | Multi-component feature fusion-based pedestrian detection multi-target tracking system and method | |
Tao et al. | An adaptive frame selection network with enhanced dilated convolution for video smoke recognition | |
Chen et al. | Multiperson tracking by online learned grouping model with nonlinear motion context | |
Liu et al. | Real-time anomaly detection on surveillance video with two-stream spatio-temporal generative model | |
CN111382705A (en) | Reverse behavior detection method and device, electronic equipment and readable storage medium | |
CN114742112A (en) | Object association method and device and electronic equipment | |
CN111382606A (en) | Tumble detection method, tumble detection device and electronic equipment | |
CN110378515A (en) | A kind of prediction technique of emergency event, device, storage medium and server | |
Choudhury et al. | Scale aware deep pedestrian detection | |
CN114529587A (en) | Video target tracking method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||