CN112949615A - Multi-target tracking system and method based on fusion detection technology - Google Patents

Multi-target tracking system and method based on fusion detection technology

Info

Publication number
CN112949615A
CN112949615A (application CN202110519994.8A)
Authority
CN
China
Prior art keywords
target
frame image
coordinate information
tracking
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110519994.8A
Other languages
Chinese (zh)
Other versions
CN112949615B (en)
Inventor
卢朝晖
齐国栋
王润发
于慧敏
顾建波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lijia Electronic Technology Co ltd
Zhejiang University ZJU
Original Assignee
Zhejiang Lijia Electronic Technology Co ltd
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lijia Electronic Technology Co ltd, Zhejiang University ZJU
Priority to CN202110519994.8A
Publication of CN112949615A
Application granted
Publication of CN112949615B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-target tracking method that fuses detection technology. The method models surveillance video frames and stably and accurately outputs the category and position of each target of interest while tracking it. Specifically, a target detector first classifies and localizes the targets in a frame; a tracker based on motion modeling then continuously predicts each target trajectory; the detection regions and motion-prediction regions are next fed into an appearance feature acquisition network to obtain their matching features; and data association between detections and motion predictions, together with correction of the final tracking result boxes, is completed from the target positions and matching features. The method detects target information accurately and quickly, the motion-prediction positions and the matching feature network make full use of target motion and appearance, and the matching and box correction make the multi-target tracking result more accurate.

Description

Multi-target tracking system and method based on fusion detection technology
Technical Field
The invention belongs to the technical field of intelligent identification, and particularly relates to a multi-target tracking method based on a fusion detection technology.
Background
Multi-target tracking is widely applied in engineering and plays a key role in tasks such as road traffic monitoring and illegal behavior identification. Given a video, traditional tracking methods require the tracked target boxes to be initialized manually; with the development of deep learning, tracking techniques driven by neural network detection are increasingly common.
Most current approaches adopt a one-stage or two-stage target detection algorithm. A one-stage detector maps features directly to the coordinate and category information of the targets, whereas a two-stage detector first performs coarse localization of all foreground targets with a region proposal network and then feeds the proposals into a classifier for refined localization and classification. One-stage detectors are fast but less accurate; two-stage detectors are accurate but slower.
A multi-target tracking algorithm localizes multiple targets in a video sequence from detection results and links them across frames to form trajectories, so it focuses on learning the differences between individual targets. A common multi-target tracking framework measures the distance between the appearance and motion information of detected targets and, combining associations between consecutive frames, assigns a consistent identity to each detection. Such methods are effective to an extent, but this design makes tracking performance depend one-sidedly on the detection results, and obtaining discriminative visual features introduces complex mechanisms and a large computational burden, so both the tracking quality and the efficiency are limited.
Disclosure of Invention
To solve these problems, the invention aims to provide a multi-target tracking method based on a fusion detection technology that offers both high accuracy and a high detection rate.
In order to achieve the purpose, the invention adopts the following technical scheme: a multi-target tracking system based on fusion detection technology, the system comprising: the system comprises a deep convolutional neural network, a target detection network, a Kalman filter model, a matching feature acquisition network and a tracking result correction network; the output end of the deep convolutional neural network is connected with the input end of a target detection network, the output end of the target detection network is respectively connected with the input end of a matched feature acquisition network and the input end of a Kalman filter model, the output end of the Kalman filter model is connected with the input end of the matched feature acquisition network, and the output end of the matched feature acquisition network is connected with the input end of a tracking result correction network.
The invention also provides a tracking method of the multi-target tracking system based on the fusion detection technology, which specifically comprises the following steps:
(1) collecting video frame images;
(2) inputting a first frame image and a second frame image of the video frame images into a deep convolutional neural network to obtain the features of each target in the first frame image and the second frame image;
(3) respectively inputting the features of each target in the first frame image and the second frame image into a target detector network, outputting the confidence score, category type, category score and coordinate information of each target in the first frame image and the second frame image, calculating the product of the confidence score and the category score of each target, and retaining the category type and corresponding coordinate information of the targets whose product is higher than a threshold;
(4) inputting the retained category type of the first frame image and the corresponding coordinate information into a Kalman filter model for target tracking prediction, and predicting the coordinate information corresponding to each category type in the second frame image;
(5) inputting the second frame image, the predicted coordinate information corresponding to each category type in the second frame image and the retained coordinate information of the second frame image into a matching feature acquisition network to obtain a predicted appearance matching feature and an appearance matching feature;
(6) calculating a distance metric from the coordinate information of the second frame image, the predicted coordinate information of the second frame image, the predicted appearance matching feature and the appearance matching feature, and matching the targets corresponding to the coordinate information of the second frame image and the predicted coordinate information of the second frame image by using the Hungarian algorithm;
(7) inputting the predicted appearance matching feature and the appearance matching feature of the second frame image obtained in step (5), together with the matched coordinate information and predicted coordinate information, into a tracking result correction network, and outputting the corrected coordinate information of the multi-target tracking;
(8) and sequentially repeating the steps (2) to (7) on the subsequent frame images until all the video frame images are tracked.
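A minimal structural sketch of this per-frame loop is given below, with every component passed in as a callable; the function and parameter names and the dictionary-based detection format are illustrative assumptions, not the actual implementation of the patent.

def track_frame(frame, backbone, detector, predict_tracks, matcher, associate, correct, thresh=0.5):
    """One pass of steps (2)-(7) for a single video frame; all modules are supplied by the caller."""
    feats = backbone(frame)                                    # step (2): deep CNN features
    dets = [d for d in detector(feats)                         # step (3): keep detections whose
            if d["conf"] * d["cls_score"] > thresh]            # confidence x class score passes the threshold
    preds = predict_tracks()                                   # step (4): Kalman motion predictions
    det_feats = matcher(frame, [d["box"] for d in dets])       # step (5): appearance matching features
    pred_feats = matcher(frame, [p["box"] for p in preds])
    pairs = associate(dets, preds, det_feats, pred_feats)      # step (6): Hungarian data association
    return [correct(det_feats[i], pred_feats[j], dets[i]["box"], preds[j]["box"])
            for i, j in pairs]                                 # step (7): corrected tracking boxes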
Further, the matching feature acquisition network is composed of a backbone network and a feature mapping module, and step (5) is specifically: the second frame image is input into the backbone network to obtain a complete feature map; the regions of the feature map corresponding to the predicted coordinate information for each category type in the second frame image and to the retained coordinate information of the second frame image are cropped out; the two groups of cropped features are input separately into the feature mapping module, which outputs a 1 x 128-dimensional predicted appearance matching feature and appearance matching feature.
Further, the distance metric d is calculated from IoU(P, P'), the ratio of the intersection of the coordinate information P of the second frame image and the predicted coordinate information P' of the second frame image to their union, together with the predicted appearance matching feature and the appearance matching feature.
Further, if the matching in step (6) fails, the unmatched motion prediction continues to be propagated for a certain number of frames; if a match is achieved within that window the track resumes, otherwise the track operation is suspended and no longer predicted. For a detection result without a matching motion prediction result, re-identification is performed to determine whether to restart an old target or to initialize a new target.
Further, the tracking method further comprises: after the tracking task for a frame image is completed, the appearance matching feature F_P of the detection result associated with each track is used to update the appearance matching feature F_T of that tracking track, where α denotes the learning rate, F_T before the update denotes the appearance matching feature of the track before the current frame is processed, and F_T after the update denotes the track appearance matching feature after incorporating the target appearance information of the current frame.
Compared with the prior art, the invention has the following beneficial effects: the detection part of the multi-target tracking method adopts an advanced target detection network and provides more accurate initialization and observation values for target tracking. For motion prediction, the invention adopts a Kalman filtering based motion model that iteratively updates using only the motion information of each target, predicting object positions accurately at very low computational cost. By fusing the appearance matching features and position information of the detection results and the tracking results, a more accurate tracking result is obtained through the tracking result correction network.
Drawings
FIG. 1 is a flow chart of a multi-target tracking method based on fusion detection technology.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. On the contrary, the invention is intended to cover alternatives, modifications, equivalents and alternatives which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of the present invention, certain specific details are set forth in order to provide a better understanding of the present invention. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details.
The invention provides a multi-target tracking system based on fusion detection technology, which comprises: the system comprises a deep convolutional neural network, a target detection network, a Kalman filter model, a matching feature acquisition network and a tracking result correction network; the output end of the deep convolutional neural network is connected with the input end of a target detection network, the output end of the target detection network is respectively connected with the input end of a matched feature acquisition network and the input end of a Kalman filter model, the output end of the Kalman filter model is connected with the input end of the matched feature acquisition network, and the output end of the matched feature acquisition network is connected with the input end of a tracking result correction network.
Referring to fig. 1, a flowchart of a tracking method of a multi-target tracking system based on a fusion detection technology is shown, which specifically includes the following steps:
(1) Surveillance video and video frame images are obtained through a road traffic surveillance camera.
(2) The first frame image and the second frame image of the video frame images are input into a deep convolutional neural network to obtain the features of each target in the first frame image and the second frame image; ResNet-50 is adopted as the deep convolutional neural network in the invention.
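As a concrete illustration of step (2), a ResNet-50 backbone with its classification head removed can produce the per-frame feature map; the use of torchvision and the input resolution below are assumptions for the sketch, not details taken from the patent.

import torch
import torchvision

# ResNet-50 feature extractor: drop the average pooling and fully connected layers.
backbone = torch.nn.Sequential(
    *list(torchvision.models.resnet50(weights=None).children())[:-2]
)
backbone.eval()

with torch.no_grad():
    frame = torch.randn(1, 3, 608, 1088)   # dummy video frame tensor (N, C, H, W)
    feature_map = backbone(frame)          # (1, 2048, H/32, W/32) convolutional features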
(3) The features of each object in the first frame image and the second frame image are input separately into the target detector network, which outputs the confidence score S, the category type C, the category score and the coordinate information P of each object in the first frame image and the second frame image. The defined category set is C = {pedestrian, Meituan takeaway electric vehicle, Ele.me takeaway electric vehicle, non-takeaway electric vehicle}. The product of the confidence score S and the category score is calculated for each object, and the category type and the corresponding coordinate information P of the objects whose product is higher than a threshold are retained.
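The retention rule of step (3) amounts to a simple score product test; the sketch below assumes a tuple layout for the detector output and an arbitrary threshold value, neither of which is specified by the patent.

CATEGORIES = ["pedestrian", "Meituan takeaway electric vehicle",
              "Ele.me takeaway electric vehicle", "non-takeaway electric vehicle"]

def filter_detections(detections, delta=0.4):
    """detections: iterable of (box, class_id, confidence_score, class_score)."""
    kept = []
    for box, cls_id, conf, cls_score in detections:
        if conf * cls_score > delta:                 # keep only high-quality detections
            kept.append((box, CATEGORIES[cls_id]))
    return kept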
(4) Motion modeling position prediction is based on a custom constant-velocity linear Kalman filter model. The observation vector of a target is set to the center coordinates, width and height of its bounding box, and the modeled target state vector appends four further terms: the moving speed of the bounding-box center in the x and y directions and the rates of change of the width and the height. The retained category type C of the first frame image and the corresponding coordinate information P are input into the Kalman filter model for target tracking prediction, and the coordinate information corresponding to each category type in the second frame image is predicted. Specifically:
(4.1) The retained category type of the first frame image and the corresponding coordinate information P initialize the observation vector of each target, with the motion speed and the width and height change rates set to 0;
(4.2) The motion state of the first frame image and the state transition matrix H are used to predict the state of the second frame image; the error matrix of the second frame image is predicted from the error matrix of the first frame image, the state transition matrix H and the noise covariance matrix Q.
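A constant-velocity Kalman prediction consistent with step (4) can be sketched as follows; the concrete noise covariance values and the unit time step are assumptions, and the standard prediction equations are used since the patent only names the matrices involved.

import numpy as np

# State x = [cx, cy, w, h, vx, vy, vw, vh]; the transition matrix (called H in the
# patent) adds each velocity term to the corresponding position term.
DT = 1.0
H_TRANS = np.eye(8)
H_TRANS[:4, 4:] = DT * np.eye(4)
Q = 1e-2 * np.eye(8)                       # process noise covariance (illustrative)

def kalman_predict(x, sigma):
    """Predict the next motion state and error matrix from the current ones."""
    x_pred = H_TRANS @ x
    sigma_pred = H_TRANS @ sigma @ H_TRANS.T + Q
    return x_pred, sigma_pred

# Initialization per step (4.1): velocities and change rates start at zero.
x0 = np.array([320.0, 240.0, 80.0, 160.0, 0.0, 0.0, 0.0, 0.0])
sigma0 = np.eye(8)
x1_pred, sigma1_pred = kalman_predict(x0, sigma0)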
(5) The predicted coordinate information corresponding to each category in the second frame image and the retained coordinate information P of the second frame image are input, together with the second frame image, into the matching feature acquisition network to obtain the predicted appearance matching feature and the appearance matching feature. The matching feature acquisition network consists of a backbone network and a feature mapping module, and step (5) is specifically: the second frame image is input into the backbone network to obtain a complete feature map; the regions of the feature map at the positions given by the predicted coordinate information and by the retained coordinate information P of the second frame image are cropped out; the two groups of cropped features are input separately into the feature mapping module, which outputs 1 x 128-dimensional predicted appearance matching features and appearance matching features. Compared with the original output feature map, the mapped features integrate and compress the effective information and are more efficient for measuring the similarity of appearance information at different positions.
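One way to realize the cropping and feature mapping of step (5) is sketched below with RoIAlign followed by a small linear head producing the 128-dimensional matching feature; the choice of RoIAlign, the crop size and the head structure are assumptions rather than the patent's exact modules.

import torch
import torch.nn.functional as F
import torchvision

class FeatureMapper(torch.nn.Module):
    """Crop backbone features at box locations and map each crop to a 1 x 128 feature."""
    def __init__(self, in_channels=2048, roi_size=7):
        super().__init__()
        self.roi_size = roi_size
        self.head = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(in_channels * roi_size * roi_size, 128),
        )

    def forward(self, feature_map, boxes, stride=32):
        # boxes: (N, 4) tensor in image coordinates (x1, y1, x2, y2)
        rois = torchvision.ops.roi_align(
            feature_map, [boxes], output_size=self.roi_size, spatial_scale=1.0 / stride
        )
        return F.normalize(self.head(rois), dim=1)   # (N, 128) matching features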
The matching feature acquisition network realizes the matching and re-identification functions. A triplet loss is adopted during training: images of the same target in a video sequence form positive sample pairs and images of different targets form negative sample pairs. With an anchor sample whose appearance matching feature is F_a, a positive sample whose appearance matching feature is F_p and a negative sample whose appearance matching feature is F_n, the triplet loss is defined as
L_triplet = Σ_i [ d(F_a^i, F_p^i) − d(F_a^i, F_n^i) + α ]_+
where i denotes the i-th training sample, d(·,·) is the distance between appearance matching features, [·]_+ takes the original value when it is greater than 0 and 0 otherwise, and α is the minimum margin between different samples.
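The triplet training objective described above corresponds to the standard hinge form sketched below; the squared Euclidean distance and the margin value are assumptions where the patent only names the anchor, positive, negative and margin terms.

import torch

def triplet_loss(f_anchor, f_pos, f_neg, alpha=0.3):
    """Pull the anchor toward the positive feature and push it from the negative by at least alpha."""
    d_pos = torch.sum((f_anchor - f_pos) ** 2, dim=1)
    d_neg = torch.sum((f_anchor - f_neg) ** 2, dim=1)
    return torch.clamp(d_pos - d_neg + alpha, min=0).mean()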
(6) The distance metric d is calculated from the coordinate information P of the second frame image, the predicted coordinate information P' of the second frame image, the predicted appearance matching feature and the appearance matching feature. The calculation uses IoU(P, P'), the ratio of the intersection of the coordinate information P of the second frame image and the predicted coordinate information P' of the second frame image to their union, together with the two appearance matching features.
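The exact combination of position and appearance terms is not recoverable from the text; the sketch below assumes a weighted sum of (1 − IoU) and the cosine distance between the appearance matching features, which is a common choice for this kind of distance metric.

import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def distance(det_box, pred_box, det_feat, pred_feat, lam=0.5):
    """Assumed distance metric: weighted sum of box distance and appearance distance."""
    cos_sim = float(np.dot(det_feat, pred_feat) /
                    (np.linalg.norm(det_feat) * np.linalg.norm(pred_feat) + 1e-9))
    return lam * (1.0 - iou(det_box, pred_box)) + (1.0 - lam) * (1.0 - cos_sim)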
The Hungarian algorithm is then used to match the targets corresponding to the coordinate information P of the second frame image with those corresponding to the predicted coordinate information P' of the second frame image. If the matching fails for a motion prediction that has no corresponding detection result, it continues to be predicted for a certain number of frames until a match is achieved in some frame; if no match with a detection result is achieved after that number of frames, the track is suspended and no longer predicted. A detection result with no matching motion prediction is re-identified: the Euclidean distance between the appearance matching feature of each suspended track T and the appearance matching feature of the current detection result P is compared with a threshold δ; if the distance for some track is smaller than the threshold, that track's motion prediction is restarted and associated with the currently matched detection result, and if no track is below the threshold, a motion prediction model is established for a new target.
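The assignment itself can be carried out with the Hungarian algorithm as implemented in SciPy; the cost gating threshold below is an illustrative assumption used to mark leftover detections and motion predictions as unmatched.

import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(cost, max_cost=0.7):
    """cost: (num_detections, num_predictions) matrix of distance metric values."""
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    unmatched_dets = [r for r in range(cost.shape[0]) if r not in matched_r]
    unmatched_preds = [c for c in range(cost.shape[1]) if c not in matched_c]
    return matches, unmatched_dets, unmatched_preds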
For the matched pairs of coordinate information P of the second frame image and predicted coordinate information P' of the second frame image, the detection result in each matched pair is recorded as the observation value D; the Kalman gain is calculated in combination with the measurement noise covariance matrix R, and the motion state and the error matrix of the second frame image are updated accordingly.
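A sketch of this update step follows; the standard Kalman gain and update equations are assumed because the patent only names the observation D, the measurement noise covariance R, the motion state and the error matrix. H_OBS here maps the 8-dimensional state to the 4-dimensional observation.

import numpy as np

H_OBS = np.hstack([np.eye(4), np.zeros((4, 4))])   # observe [cx, cy, w, h] from the state
R = 1e-1 * np.eye(4)                               # measurement noise covariance (illustrative)

def kalman_update(x_pred, sigma_pred, d_obs):
    """Update the predicted motion state and error matrix with the matched detection D."""
    s = H_OBS @ sigma_pred @ H_OBS.T + R               # innovation covariance
    k = sigma_pred @ H_OBS.T @ np.linalg.inv(s)        # Kalman gain
    x_new = x_pred + k @ (d_obs - H_OBS @ x_pred)      # updated motion state
    sigma_new = (np.eye(8) - k @ H_OBS) @ sigma_pred   # updated error matrix
    return x_new, sigma_new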
(7) The appearance matching features of the second frame image and the matched coordinate information P and predicted coordinate information P' are input into the tracking result correction network, which outputs the corrected coordinate information of the multi-target tracking. The network outputs the offsets of the tracking result correction box relative to the motion-predicted position, namely the offsets of the correction result relative to the x and y coordinates of the center of the motion prediction box and the offsets of its width and height, thereby obtaining a more accurate tracking result.
The tracking correction network consists of 4 layers of 3 x 3 convolutional layers followed by fully connected layers; its output is an offset prediction of size (number of matched pairs) x 4, and Smooth L1 loss is adopted during training. Given the true value of the tracking result, the result of the motion prediction and the offsets output by the network, the loss is defined on the difference between the predicted offsets and the offsets of the true boxes relative to the motion predictions. Smooth L1 loss keeps the gradient from growing too large when the difference between the true and predicted values is large while remaining small when the difference is small, so the network learns a more stable offset regression capability. The final tracking result is obtained by applying the predicted offsets to the motion-predicted boxes.
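The correction step can be sketched as offset regression trained with Smooth L1 loss; the particular offset parameterization (center shift normalized by box size, log-scale width and height) is an assumption in the style of standard bounding-box regression, not a formula stated in the patent.

import torch

def apply_offsets(pred_box, offsets):
    """pred_box: (cx, cy, w, h) floats; offsets: tensor (dx, dy, dw, dh) from the correction network."""
    cx, cy, w, h = pred_box
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * torch.exp(dw), h * torch.exp(dh))

smooth_l1 = torch.nn.SmoothL1Loss()   # 0.5 * x^2 for |x| < 1, |x| - 0.5 otherwise
target_offsets = torch.tensor([0.02, -0.01, 0.05, 0.00])   # offsets of the true box w.r.t. the prediction
output_offsets = torch.tensor([0.03, -0.02, 0.01, 0.01])   # offsets regressed by the correction network
loss = smooth_l1(output_offsets, target_offsets)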
(8) Steps (2) to (7) are repeated sequentially on the subsequent frame images until all the video frame images are tracked.
The tracking method further comprises the following steps: after the tracking task of the frame image is completed, the shape matching feature F of the tracking corresponding detection result is usedPUpdate each stripShape matching feature F of tracking trajectoryT
Figure 113306DEST_PATH_IMAGE008
Wherein the content of the first and second substances,
Figure 114760DEST_PATH_IMAGE009
it is indicated that the learning rate is,
Figure 844819DEST_PATH_IMAGE010
representing the shape-matching features of the trace before processing the current frame,
Figure 654643DEST_PATH_IMAGE011
and showing the track shape matching characteristics after the track corresponding to the target shape information of the current frame is updated.
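A sketch of this per-frame feature update is given below as an exponential moving average; the exact placement of the learning rate α (here the weight on the new detection feature) is an assumption, since the text only names the terms involved.

def update_track_feature(f_track, f_det, alpha=0.1):
    """Blend the track's stored matching feature with the matched detection's feature."""
    return (1.0 - alpha) * f_track + alpha * f_det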
The tracking result is evaluated by a score computed from the numbers of missed detections, false detections and mismatches accumulated over the t frames.
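The reported percentages below are consistent with a MOTA-style score; the sketch assumes normalization by the total number of ground-truth objects, which the text does not state explicitly.

def tracking_score(missed, false_det, mismatched, ground_truth):
    """Each argument is a per-frame list of counts accumulated over the t frames."""
    errors = sum(missed) + sum(false_det) + sum(mismatched)
    return 1.0 - errors / max(sum(ground_truth), 1)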
The multi-target tracking system based on the fusion detection technology is used for multi-target tracking of pedestrians, Meituan takeaway electric vehicles, Ele.me takeaway electric vehicles and non-takeaway electric vehicles. In the image at the 1st second, the numbers of pedestrians, Meituan takeaway electric vehicles, Ele.me takeaway electric vehicles and non-takeaway electric vehicles are 2, 1, 2 and 6 respectively; the numbers of missed detections are 0, 0, 0, 0; the numbers of false detections are 0, 0, 0 and 1; the numbers of mismatches are 0, 0, 0, 1; the tracking result score is calculated to be 81.82%. In the image at 1.5 seconds, the numbers of pedestrians, Meituan takeaway electric vehicles, Ele.me takeaway electric vehicles and non-takeaway electric vehicles are 3, 3, 2 and 6 respectively; the numbers of missed detections are 0, 0, 0 and 1; the numbers of false detections are 0, 0, 0 and 1; the numbers of mismatches are 0, 0, 0, 1; the tracking result score is calculated to be 78.50%. The tracking scores of the images at other times are not described in detail. The tracking result scores show that the deep convolutional neural network and the target detection network used in the invention provide reliable results, and the Kalman filter model, the matching feature acquisition network and the tracking result correction network used in the invention make the matching and tracking results more accurate, with excellent performance.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A multi-target tracking system based on fusion detection technology, the system comprising: the system comprises a deep convolutional neural network, a target detection network, a Kalman filter model, a matching feature acquisition network and a tracking result correction network; the output end of the deep convolutional neural network is connected with the input end of a target detection network, the output end of the target detection network is respectively connected with the input end of a matched feature acquisition network and the input end of a Kalman filter model, the output end of the Kalman filter model is connected with the input end of the matched feature acquisition network, and the output end of the matched feature acquisition network is connected with the input end of a tracking result correction network.
2. The multi-target tracking system tracking method based on the fusion detection technology as claimed in claim 1, which is characterized by comprising the following steps:
(1) collecting video frame images;
(2) outputting a first frame image and a second frame image of the video frame image to a deep convolutional neural network to obtain the characteristics of each target in the first frame image and the second frame image;
(3) respectively inputting the characteristics of each target in the first frame image and the second frame image into a target detector network, outputting the confidence score, the category type, the category score and the coordinate information of each target in the first frame image and the second frame image, calculating the product of the confidence score and the category score of each target, and reserving the category type of the target with the product higher than a threshold value and the corresponding coordinate information;
(4) inputting the category type of the reserved first frame image and the corresponding coordinate information into a Kalman filter model for target tracking prediction, and predicting the coordinate information corresponding to the category type in the second frame image;
(5) inputting the second frame image, the coordinate information corresponding to the category type in the predicted second frame image and the reserved coordinate information of the second frame image into a matching feature acquisition network to obtain a predicted shape matching feature and a shape matching feature;
(6) calculating distance measurement according to the coordinate information of the second frame image, the coordinate information of the predicted second frame image, the predicted appearance matching characteristic and the appearance matching characteristic, and matching the coordinate information of the second frame image with a target corresponding to the coordinate information of the predicted second frame image by using a Hungarian algorithm;
(7) inputting the predicted shape matching feature and the shape matching feature of the second frame image obtained in the step (5) and the matched coordinate information and predicted coordinate information into a tracking result correction network, and outputting corrected coordinate information of multi-target tracking;
(8) and sequentially repeating the steps (2) to (7) on the subsequent frame images until all the video frame images are tracked.
3. The tracking method of the multi-target tracking system based on the fusion detection technology as claimed in claim 2, wherein the matching feature acquisition network is composed of a backbone network and a feature mapping module, and the step (5) is specifically as follows: inputting the second frame image into a backbone network to obtain a complete feature map, cutting the position part of the coordinate information corresponding to the category type in the predicted second frame image and the reserved coordinate information of the second frame image in the feature map, respectively inputting the two cut parts of features into a feature mapping module, and outputting a 1 x 128-dimensional predicted shape matching feature and a shape matching feature.
4. The tracking method of the multi-target tracking system based on the fusion detection technology as claimed in claim 2, wherein the distance metric is calculated from the ratio of the intersection of the coordinate information P of the second frame image and the predicted coordinate information of the second frame image to their union, together with the predicted shape matching feature and the shape matching feature.
5. The tracking method of the multi-target tracking system based on the fusion detection technology as claimed in claim 2, wherein in step (6), if the matching fails, the motion prediction result is continuously predicted for a certain number of frames until a match is achieved in some frame, or, if no match with a detection result is achieved after that number of frames, the track operation is suspended and no longer predicted; for a detection result without a matching motion prediction result, re-identification is performed to determine whether to restart an old target or to initialize a new target.
6. The tracking method of the multi-target tracking system based on the fusion detection technology as claimed in claim 2, characterized in that the tracking method further comprises: after the tracking task for a frame image is completed, the shape matching feature F_P of the detection result associated with each track is used to update the shape matching feature F_T of that tracking track, where α denotes the learning rate, F_T before the update denotes the shape matching feature of the track before the current frame is processed, and F_T after the update denotes the track shape matching feature after incorporating the target shape information of the current frame.
CN202110519994.8A 2021-05-13 2021-05-13 Multi-target tracking system and method based on fusion detection technology Active CN112949615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110519994.8A CN112949615B (en) 2021-05-13 2021-05-13 Multi-target tracking system and method based on fusion detection technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110519994.8A CN112949615B (en) 2021-05-13 2021-05-13 Multi-target tracking system and method based on fusion detection technology

Publications (2)

Publication Number Publication Date
CN112949615A true CN112949615A (en) 2021-06-11
CN112949615B CN112949615B (en) 2021-08-17

Family

ID=76233798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110519994.8A Active CN112949615B (en) 2021-05-13 2021-05-13 Multi-target tracking system and method based on fusion detection technology

Country Status (1)

Country Link
CN (1) CN112949615B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115719368A (en) * 2022-11-29 2023-02-28 上海船舶运输科学研究所有限公司 Multi-target ship tracking method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563313A (en) * 2017-08-18 2018-01-09 北京航空航天大学 Multiple target pedestrian detection and tracking based on deep learning
US20190073783A1 (en) * 2006-01-04 2019-03-07 Mobileye Vision Technologies Ltd. Estimating distance to an object using a sequence of images recorded by a monocular camera
CN111127513A (en) * 2019-12-02 2020-05-08 北京交通大学 Multi-target tracking method
CN111739053A (en) * 2019-03-21 2020-10-02 四川大学 Online multi-pedestrian detection tracking method under complex scene
CN112288770A (en) * 2020-09-25 2021-01-29 航天科工深圳(集团)有限公司 Video real-time multi-target detection and tracking method and device based on deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190073783A1 (en) * 2006-01-04 2019-03-07 Mobileye Vision Technologies Ltd. Estimating distance to an object using a sequence of images recorded by a monocular camera
CN107563313A (en) * 2017-08-18 2018-01-09 北京航空航天大学 Multiple target pedestrian detection and tracking based on deep learning
CN111739053A (en) * 2019-03-21 2020-10-02 四川大学 Online multi-pedestrian detection tracking method under complex scene
CN111127513A (en) * 2019-12-02 2020-05-08 北京交通大学 Multi-target tracking method
CN112288770A (en) * 2020-09-25 2021-01-29 航天科工深圳(集团)有限公司 Video real-time multi-target detection and tracking method and device based on deep learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115719368A (en) * 2022-11-29 2023-02-28 上海船舶运输科学研究所有限公司 Multi-target ship tracking method and system
CN115719368B (en) * 2022-11-29 2024-05-17 上海船舶运输科学研究所有限公司 Multi-target ship tracking method and system

Also Published As

Publication number Publication date
CN112949615B (en) 2021-08-17

Similar Documents

Publication Publication Date Title
Dedeoğlu et al. Silhouette-based method for object classification and human action recognition in video
Mallikarjuna et al. Traffic data collection under mixed traffic conditions using video image processing
Meuter et al. A decision fusion and reasoning module for a traffic sign recognition system
CN111310583A (en) Vehicle abnormal behavior identification method based on improved long-term and short-term memory network
Lee et al. Learning discriminative appearance models for online multi-object tracking with appearance discriminability measures
CN106570490B (en) A kind of pedestrian's method for real time tracking based on quick clustering
CN106778712A (en) A kind of multi-target detection and tracking method
Dai et al. Instance segmentation enabled hybrid data association and discriminative hashing for online multi-object tracking
CN110781785A (en) Traffic scene pedestrian detection method improved based on fast RCNN algorithm
CN113256690B (en) Pedestrian multi-target tracking method based on video monitoring
CN112738470B (en) Method for detecting parking in highway tunnel
CN111882586A (en) Multi-actor target tracking method oriented to theater environment
Liu et al. Dynamic RGB-D SLAM based on static probability and observation number
Li et al. Bi-directional dense traffic counting based on spatio-temporal counting feature and counting-LSTM network
CN112232240A (en) Road sprinkled object detection and identification method based on optimized intersection-to-parallel ratio function
Chen et al. A video-based method with strong-robustness for vehicle detection and classification based on static appearance features and motion features
CN112949615B (en) Multi-target tracking system and method based on fusion detection technology
CN114820765A (en) Image recognition method and device, electronic equipment and computer readable storage medium
Abdullah et al. Vehicle counting using deep learning models: a comparative study
CN117011341A (en) Vehicle track detection method and system based on target tracking
Lafuente-Arroyo et al. Road sign tracking with a predictive filter solution
Dong et al. An automatic object detection and tracking method based on video surveillance
CN108830182B (en) Lane line detection method based on cascade convolution neural network
Buch et al. Urban vehicle tracking using a combined 3D model detector and classifier
CN112528937A (en) Method for detecting starting and stopping of video pumping unit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant