CN113409361A - Multi-target tracking method, device, computer and storage medium - Google Patents

Multi-target tracking method, device, computer and storage medium Download PDF

Info

Publication number
CN113409361A
CN113409361A CN202110922602.2A
Authority
CN
China
Prior art keywords
target
module
information
target tracking
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110922602.2A
Other languages
Chinese (zh)
Other versions
CN113409361B (en)
Inventor
林涛
张炳振
刘宇鸣
邓普阳
张枭勇
陈振武
王宇
周勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Urban Transport Planning Center Co Ltd
Original Assignee
Shenzhen Urban Transport Planning Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Urban Transport Planning Center Co Ltd
Priority to CN202110922602.2A priority Critical patent/CN113409361B/en
Publication of CN113409361A publication Critical patent/CN113409361A/en
Application granted granted Critical
Publication of CN113409361B publication Critical patent/CN113409361B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-target tracking method, device, computer and storage medium, belonging to the technical field of artificial intelligence. First, a video is input into a fusion detection association module, down-sampled to obtain feature maps, and the feature maps are fed into a difference calculation network to obtain difference features. Second, a multi-task learning method in deep learning yields the target category, the target position information, and the same trackID for the same target across different video frames. A trajectory prediction module then predicts the likely position of each target in the current frame from its motion trajectory over consecutive frames and provides this as a reference for the fusion detection association module. Finally, the multi-target tracking information is output. The method addresses the problems of low tracking efficiency, easily lost targets and frequently changing target IDs in the prior art, improving the efficiency of multi-target tracking and avoiding tracking loss.

Description

Multi-target tracking method, device, computer and storage medium
Technical Field
The application relates to a target tracking method, in particular to a multi-target tracking method, a multi-target tracking device, a computer and a storage medium, and belongs to the technical field of artificial intelligence.
Background
Multi-target tracking simultaneously tracks a plurality of targets in a video. In typical application scenarios such as security surveillance and autonomous driving, the number of people and vehicles is not known in advance and the appearance of each target varies, yet tracking these targets is the basis of other applications (such as target localization and target density calculation). Unlike single-target tracking, multi-target tracking must assign a unique ID to each target and ensure that no target is lost during tracking. The appearance of new targets and the disappearance of old targets are also problems that multi-target tracking must handle.
Multi-target tracking has been studied extensively. The dominant strategy is DBT (detection-based tracking), in which the detection module and the data association module are independent: a video sequence first passes through a detection algorithm to obtain target position information, and a data association algorithm is then executed to obtain the final trajectory result.
A representative algorithm is DeepSORT, a data association algorithm in MOT (multi-object tracking) that can be combined with any detector to realize multi-target tracking. DeepSORT combines Kalman filtering with the Hungarian algorithm: a Kalman filter predicts the state of each detection box in the next frame, and the prediction is matched with the detection results of that frame. During matching, the Hungarian algorithm is applied to a cost matrix computed by fusing the motion features obtained from Kalman filtering with appearance features extracted by a CNN (convolutional neural network).
MOT is mainly applied in scenarios such as security surveillance and autonomous driving, which place high demands on real-time performance. For a given hardware level, the detection efficiency and accuracy of MOT should be improved as much as possible. In the prior art, MOT suffers from low efficiency in practical applications: existing real-time MOT methods usually focus only on the data association step, which is only one part of the MOT pipeline, and therefore cannot truly solve the efficiency problem.
In addition, targets frequently occlude one another in real scenes, which causes problems such as target loss and target ID switching in MOT.
Disclosure of Invention
The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to determine the key or critical elements of the present invention, nor is it intended to limit the scope of the present invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
In view of this, in order to solve the technical problems of low target tracking efficiency, easy target loss and easy target ID change in the prior art, the present invention provides a multi-target tracking method, apparatus, computer and storage medium.
The fusion detection association module outputs the position and category information of the different targets. The trajectory prediction module takes this information as input to learn the trajectory patterns of different target categories, which improves tracking efficiency and avoids tracking loss.
A multi-target tracking method comprises the following steps:
s110, inputting the video into a fusion detection correlation module, performing down-sampling processing to obtain a feature map, and inputting the feature map into a difference calculation network to obtain difference features;
s120, calculating a loss function;
s130, acquiring data association relation among the target type, the target position information and the target; inputting target position information into a track prediction module, learning target movement by using convolution operation, outputting predicted position information, forming different types of target motion rule information and transmitting the different types of target motion rule information to a database and a fusion detection association module;
s140 outputs multi-target tracking.
Preferably, the specific method for obtaining the feature map in step S110 is:
1) carrying out 1/4 down-sampling on the video frame through convolutional layer 1 to obtain feature map 1;
2) carrying out 1/8 down-sampling on feature map 1 through convolutional layer 2 to obtain feature map 2;
3) carrying out 1/16 down-sampling on feature map 2 through convolutional layer 3 to obtain feature map 3.
Preferably, the calculating the loss function in step S120 specifically includes the following three loss functions:
1) a target classification loss function;
2) a target location regression loss function;
3) multi-objective cross entropy loss function.
Preferably, the calculation methods of the three loss functions in step S120 are specifically:
1) Target classification loss function $L_{cls}$: it is computed from the true category label $y$ of the target, the model prediction $\hat{y}$, the total number of target categories $N$, and the category feature $c_y$ associated with label $y$; a class-feature balance coefficient $\alpha$, which balances the influence of the category feature on the whole loss function, takes the value 0.5. The category features are randomly initialized at the beginning of training and then updated at each training iteration; in the update formula, $\Delta_t$ denotes the difference between the current data and the category features at the $t$-th iteration and is used to correct the stored category feature into the updated value $c_y'$, while a momentum coefficient $\beta$, also taking the value 0.5, guarantees the stability of the category features.
2) Target position regression loss function $L_{reg}$: here $\hat{p}$ denotes the predicted value of the model and $p$ the true value, where $p$ ranges over $(x, y, w, h)$; $(x, y)$ are the coordinates of the center point of the detection box, $w$ is the width of the detection box and $h$ is its height, so the position and size of the target detection box are obtained through regression. If the target position predicted by the trajectory prediction module is added, the target position regression loss additionally contains a term for $p_{traj}$, the position output by the trajectory prediction module, which likewise contains $(x, y, w, h)$ information.
3) Multi-target cross-entropy loss function: the cross entropy between the true category label $y$ and the model prediction $\hat{y}$,
$L_{id} = -\sum_{i} y_i \log \hat{y}_i$.
The fusion detection association module aims to generate the target category, the target position information and the trackID information of targets across different video frames, so the above loss functions are weighted and summed into the total loss function of the module:
$L = \lambda_1 L_{cls} + \lambda_2 L_{reg} + \lambda_3 L_{id}$,
where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are multi-task weight parameters that can be set according to the requirements of the different tasks.
Preferably, in step S130, the target movement law is learned through a three-layer ConvLSTM network and the predicted position information is output: the first layer learns the feature information of the target; the second layer learns the position change of the target between consecutive frames; the third layer outputs the predicted position information.
A multi-target tracking device comprises a video input module, a fusion detection association module, a trajectory prediction module, an output module and a storage module; the video input module is connected in sequence with the fusion detection association module and the output module; the video input module and the fusion detection association module are connected with the trajectory prediction module; the trajectory prediction module is connected with the storage module. The video input module is used for inputting video information; the fusion detection association module is used for acquiring the data association relation among the target category, the target position information and the targets, and for outputting the target position information to the trajectory prediction module; the trajectory prediction module is used for acquiring the motion rule information of different target categories and outputting it to the storage module and the fusion detection association module; the output module is used for outputting the target tracking result produced by the fusion detection association module; the storage module is used for storing the motion rule information of the different target categories.
A computer comprising a memory storing a computer program and a processor implementing the steps of a multi-target tracking method when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements a multi-target tracking method.
The invention has the following beneficial effects. The scheme fuses the detection algorithm and the data association algorithm into one module, which reduces repeated calculation. The trajectory prediction module handles the matching of difficult targets well: the trackIDs generated from the data association relation among target category, target position information and targets are more stable, the accuracy of identifying the same target in consecutive frames is improved, and frequent trackID switching is avoided. The method solves the problems of low computational efficiency and poor real-time performance in existing multi-target tracking technology, while remaining robust to target loss caused by occlusion.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a fusion detection association module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a difference computing network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a trajectory prediction module according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a ConvLSTM model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a multi-target tracking device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not an exhaustive list of all embodiments. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict.
Embodiment 1, this embodiment is described with reference to fig. 1 to 3, and a multi-target tracking method includes the following steps:
s110, inputting the video into a fusion detection correlation module, performing down-sampling processing to obtain a feature map, and inputting the feature map into a difference calculation network to obtain difference features;
First, the video is input to the fusion detection association module, which obtains the target positions and the data association information in a single pass; the model structure of the fusion detection association module is shown in fig. 2.
The feature maps are obtained by down-sampling as follows. Assume the size of the input video frame is 1280 × 720 (i.e. 1280 pixels by 720 pixels); the image is first resized to 896 × 896 to simplify subsequent processing. The down-sampling process is then:
(1) the input image is down-sampled to 1/4 of the original resolution through convolution layer 1 (convolution kernel size 8 × 8, stride 8) to obtain feature map 1, of size 224 × 224;
(2) feature map 1 is down-sampled to 1/8 through convolution layer 2 (convolution kernel size 2 × 2, stride 2) to obtain feature map 2, of size 112 × 112;
(3) feature map 2 is down-sampled to 1/16 through convolution layer 3 (convolution kernel size 2 × 2, stride 2) to obtain feature map 3, of size 56 × 56, as sketched below.
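A minimal PyTorch sketch of such a down-sampling backbone is given here; it is an illustration rather than the patent's exact network. The channel counts are arbitrary assumptions, and the first convolution is given stride 4 in this sketch so that an 896 × 896 input actually yields the 224 × 224, 112 × 112 and 56 × 56 feature maps listed above.

```python
import torch
import torch.nn as nn

class DownsampleBackbone(nn.Module):
    """Produces three feature maps at 1/4, 1/8 and 1/16 of the input resolution."""
    def __init__(self, in_ch=3, ch=(32, 64, 128)):
        super().__init__()
        # stride 4 assumed here so that 896 -> 224
        self.conv1 = nn.Conv2d(in_ch, ch[0], kernel_size=8, stride=4, padding=2)
        self.conv2 = nn.Conv2d(ch[0], ch[1], kernel_size=2, stride=2)
        self.conv3 = nn.Conv2d(ch[1], ch[2], kernel_size=2, stride=2)

    def forward(self, x):
        f1 = self.conv1(x)   # 896x896 -> 224x224 (feature map 1)
        f2 = self.conv2(f1)  # 224x224 -> 112x112 (feature map 2)
        f3 = self.conv3(f2)  # 112x112 -> 56x56   (feature map 3)
        return f1, f2, f3

if __name__ == "__main__":
    frame = torch.randn(1, 3, 896, 896)
    f1, f2, f3 = DownsampleBackbone()(frame)
    print(f1.shape, f2.shape, f3.shape)
```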
At this point, three feature maps of different sizes have been obtained from the image through down-sampling. Every frame entering the fusion detection association module goes through this down-sampling calculation, and the six feature maps of the previous and current frames are passed together into the difference calculation network. The purpose is to compute and fuse difference features at different scales, and finally to use a multi-task learning method to simultaneously predict the target category, the target position information and the data association relation among targets.
The difference calculation network mainly consists of two structures, DenseBlock and Transition. A DenseBlock is composed of a BN layer + ReLU layer + 3 × 3 convolution layer, and its input and output feature maps have the same size. A Transition is composed of a BN layer + ReLU layer + 1 × 1 convolution layer + 2 × 2 average pooling layer, so the feature-map size is halved after each Transition. In actual computation, the six feature maps of the previous and current frames are input into the difference calculation network together. Similar to a Siamese (twin) network, the difference calculation network has two paths, corresponding to the three feature maps of the previous frame and the three feature maps of the current frame respectively. The two paths are identical in structure but do not share weights.
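A small PyTorch sketch of these two building blocks as just described (BN + ReLU + 3 × 3 convolution for DenseBlock, BN + ReLU + 1 × 1 convolution + 2 × 2 average pooling for Transition); the channel counts are assumptions for illustration.

```python
import torch.nn as nn

class DenseBlock(nn.Module):
    """BN + ReLU + 3x3 conv; output spatial size equals input spatial size."""
    def __init__(self, channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.block(x)

class Transition(nn.Module):
    """BN + ReLU + 1x1 conv + 2x2 average pooling; halves the feature-map size."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.AvgPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.block(x)
```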
(1) In each path, feature map 1 of size 224 × 224 is first passed through Transition1, which reduces it to 112 × 112, and then through the DenseBlock1 network to learn features, giving a 112 × 112 feature;
(2) the features obtained in the previous step are fused (added) with feature map 2 and then passed through the Transition2 and DenseBlock2 networks to obtain 56 × 56 features;
(3) similarly, the features from the previous step are fused (added) with feature map 3 and passed into the DenseBlock3 network for further feature learning;
(4) the previous frame and the current frame each yield a 56 × 56 feature map, and the difference between the two gives a difference feature of size 56 × 56; a sketch of this two-path computation follows.
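The sketch reuses the DenseBlock and Transition classes from the previous listing and the channel counts of the backbone sketch; those counts and the element-wise subtraction used for the final difference are assumptions made for illustration.

```python
import torch.nn as nn

class DifferencePath(nn.Module):
    """One path of the difference network; the two paths share structure but not weights."""
    def __init__(self, ch=(32, 64, 128)):
        super().__init__()
        self.trans1 = Transition(ch[0], ch[1])  # 224 -> 112
        self.dense1 = DenseBlock(ch[1])
        self.trans2 = Transition(ch[1], ch[2])  # 112 -> 56
        self.dense2 = DenseBlock(ch[2])
        self.dense3 = DenseBlock(ch[2])

    def forward(self, f1, f2, f3):
        x = self.dense1(self.trans1(f1))      # (1) 224x224 -> 112x112
        x = self.dense2(self.trans2(x + f2))  # (2) fuse with feature map 2 -> 56x56
        x = self.dense3(x + f3)               # (3) fuse with feature map 3
        return x                              # 56x56 path output

class DifferenceNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.prev_path = DifferencePath()  # previous-frame path
        self.curr_path = DifferencePath()  # current-frame path (separate weights)

    def forward(self, prev_maps, curr_maps):
        p = self.prev_path(*prev_maps)
        c = self.curr_path(*curr_maps)
        return c - p                       # (4) 56x56 difference feature
```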
S120, calculating the loss function. Since the network obtains the target category, the target position information and the target data association relationship (i.e. the trackID information in the tracking process) in a single pass, a corresponding loss function needs to be calculated.
The calculation of the loss function specifically includes the following three loss functions:
1) a target classification loss function;
2) a target location regression loss function;
3) multi-objective cross entropy loss function.
The target classification loss function $L_{cls}$ is calculated as follows: $y$ denotes the true category label of the target, $\hat{y}$ the probability that the model predicts for a positive sample, $N$ the total number of target categories, and $c_y$ the category feature associated with label $y$; a class-feature balance coefficient $\alpha$, which balances the influence of the category feature on the whole loss function, takes the value 0.5. The category features are randomly initialized at the beginning of training and then updated at each training iteration. In the update formula, $\Delta_t$ denotes the difference between the current data and the category features at the $t$-th iteration; this difference is used to correct the stored category feature, giving the updated value $c_y'$, while a momentum coefficient $\beta$, also taking the value 0.5, guarantees the stability of the category features across iterations.
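The exact classification-loss equations appear only as images in the original filing. The sketch below therefore assumes a center-loss-style formulation that matches the description: cross entropy plus a class-feature term weighted by α = 0.5, with per-class features updated each iteration using a momentum β = 0.5.

```python
import torch
import torch.nn.functional as F

class ClassificationLossWithCenters:
    """Cross entropy plus a class-feature term; an assumed formulation, not the patent's exact one."""
    def __init__(self, num_classes, feat_dim, alpha=0.5, beta=0.5):
        self.centers = torch.randn(num_classes, feat_dim)  # random init at start of training
        self.alpha = alpha  # class-feature balance coefficient
        self.beta = beta    # momentum keeping the class features stable

    def __call__(self, logits, feats, labels):
        ce = F.cross_entropy(logits, labels)
        # distance of each sample's feature to the stored feature of its class
        center_term = ((feats - self.centers[labels]) ** 2).sum(dim=1).mean()
        return ce + self.alpha * center_term

    @torch.no_grad()
    def update_centers(self, feats, labels):
        # delta_t: difference between the current data and the stored class features
        for c in labels.unique():
            delta = feats[labels == c].mean(dim=0) - self.centers[c]
            self.centers[c] = self.centers[c] + self.beta * delta
```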
The target position regression loss function $L_{reg}$ compares the model's predicted value $\hat{p}$ with the true value $p$, where $p$ ranges over $(x, y, w, h)$: $(x, y)$ are the coordinates of the center point of the detection box, $w$ is the width of the detection box and $h$ is its height, so the position and size of the target detection box are obtained through regression. If the target position predicted by the trajectory prediction module is added, the target position regression loss additionally contains a term for $p_{traj}$, the position output by the trajectory prediction module, which likewise contains $(x, y, w, h)$ information.
The multi-target cross-entropy loss function is the cross entropy between the true category label $y$ and the model prediction $\hat{y}$:
$L_{id} = -\sum_{i} y_i \log \hat{y}_i$.
The fusion detection association module aims to generate the target category, the target position information and the trackID information of targets across different video frames, so these loss functions are weighted and summed into the total loss function of the module:
$L = \lambda_1 L_{cls} + \lambda_2 L_{reg} + \lambda_3 L_{id}$,
where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are multi-task weight parameters that can be set according to the requirements of the different tasks.
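As a rough illustration of how the three terms might be combined during training, the following sketch uses smooth-L1 for the regression term and equal default weights; both choices are assumptions, since the exact forms and weights are left open.

```python
import torch.nn.functional as F

def fusion_module_loss(cls_logits, labels,
                       box_pred, box_gt, box_traj,
                       id_logits, track_ids,
                       w=(1.0, 1.0, 1.0)):
    """Weighted sum of classification, position-regression and trackID losses."""
    l_cls = F.cross_entropy(cls_logits, labels)
    # regression over (x, y, w, h); the trajectory-predicted box adds a second term
    l_reg = F.smooth_l1_loss(box_pred, box_gt) + F.smooth_l1_loss(box_pred, box_traj)
    l_id = F.cross_entropy(id_logits, track_ids)  # multi-target cross entropy over trackIDs
    return w[0] * l_cls + w[1] * l_reg + w[2] * l_id
```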
S130, acquiring the data association relation among the target category, the target position information and the targets; inputting the target position information into the trajectory prediction module, learning the target movement with convolution operations, outputting predicted position information, and forming motion rule information for the different target categories, which is transmitted to the database and the fusion detection association module;
the data association target is to obtain target trackID information in front and rear video frames, and if a red vehicle appears in the previous frame and the red vehicle also appears in the current frame, the two vehicles can be judged to be the same trackID through data association. In order to find that the same object has the same trackID in different frames, a model should judge that the same object is closer to the space than different objects, and a common method in the prior art, namely a triplet loss function, is used in the MOT.
The specific implementation is as follows. The difference features are followed by a fully connected layer whose number of nodes N means that at most N different trackIDs can be represented (N is a hyper-parameter that can be adjusted for the scene; a typical value is N = 20000). Whenever an object is detected, it is classified: if the target has appeared before, it is classified to its existing trackID; otherwise it is a new target with classification label -1, a new trackID is allocated, and the parameters of the fully connected layer are updated so that the object can be recognized in subsequent classifications. Meanwhile, during the model parameter update, trackIDs that have not been detected for a long time are forgotten, ensuring that the total number of trackIDs recorded by the model does not exceed N.
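A rough sketch of such a trackID head follows. The description above specifies only the fully connected layer of size N and the label -1 for new targets; the confidence threshold, the allocation of fresh IDs and the forgetting bookkeeping in this sketch are simplified assumptions.

```python
import torch
import torch.nn as nn

class TrackIDHead(nn.Module):
    """Classifies difference features into at most N trackIDs."""
    def __init__(self, feat_dim, n_ids=20000):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_ids)  # one node per possible trackID
        self.last_seen = {}                   # trackID -> frame index it was last detected in

    def forward(self, diff_feat):
        return self.fc(diff_feat)             # logits over existing trackIDs

    def assign(self, diff_feat, frame_idx, score_thresh=0.5):
        prob, tid = self.forward(diff_feat).softmax(-1).max(-1)
        if prob.item() < score_thresh:
            return -1                          # new target: label -1, a fresh ID is allocated later
        self.last_seen[int(tid)] = frame_idx   # record when this trackID was last seen
        return int(tid)
```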
S140, outputting the multi-target tracking information.
Embodiment 2 is described with reference to fig. 4. The multi-target tracking method further includes a trajectory prediction module, which can learn the historical trajectory information of targets of different categories; its model structure is shown in fig. 4. The LSTM is a classical network structure for processing time-series data, while ConvLSTM is a network structure formed by combining the LSTM structure with convolution; its model structure is shown in fig. 5, wherein
$X_{t-1}$ and $X_t$ denote the inputs at times $t-1$ and $t$, and $H_{t-1}$, $H_t$ and $H_{t+1}$ denote the outputs at times $t-1$, $t$ and $t+1$. This structure not only establishes the temporal relation between frames but also exploits the characteristics of convolution to describe the local spatial features of the image.
S210, inputting the target position information into the trajectory prediction module and calculating the output variables C and H of the LSTM. The model is fed a sequence of successive image frames, e.g. X_t and X_{t+1}; for two consecutive frame inputs, C (the cell state) and H (the hidden state) are calculated. C and H are the output variables of the LSTM.
Here C represents the cell unit of the LSTM, which stores the medium- and long-term memory of the time-series information; H represents the hidden unit, which stores the recent memory of the time-series information.
S220, estimating C and H at the target time from the C and H of past times by means of convolution operations;
s230, learning a target movement rule through a three-layer ConvLSTM network, outputting predicted position information, and forming different types of target movement rule information;
wherein the first layer learns the feature information of the target; the second layer learns the position change of the target between consecutive frames; the third layer outputs the predicted position information.
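ConvLSTM is not part of the core PyTorch library, so the following is an assumed, compact implementation of a single ConvLSTM cell together with the three-layer stack described above; the hidden width and kernel size are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: an LSTM whose gates are computed by convolution."""
    def __init__(self, in_ch, hidden_ch, kernel_size=3):
        super().__init__()
        self.hidden_ch = hidden_ch
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)  # cell state C: medium/long-term memory
        h = torch.sigmoid(o) * torch.tanh(c)                         # hidden state H: recent memory
        return h, c

class TrajectoryPredictor(nn.Module):
    """Three stacked ConvLSTM layers: target features, position change, predicted position."""
    def __init__(self, in_ch=1, hid=16):
        super().__init__()
        self.layers = nn.ModuleList([
            ConvLSTMCell(in_ch, hid),  # layer 1: feature information of the target
            ConvLSTMCell(hid, hid),    # layer 2: position change between consecutive frames
            ConvLSTMCell(hid, in_ch),  # layer 3: predicted position information
        ])

    def forward(self, seq):
        # seq: (T, B, C, H, W) sequence of target-position maps
        T, B, _, Hh, Ww = seq.shape
        states = [(torch.zeros(B, l.hidden_ch, Hh, Ww, device=seq.device),
                   torch.zeros(B, l.hidden_ch, Hh, Ww, device=seq.device))
                  for l in self.layers]
        out = None
        for t in range(T):
            x = seq[t]
            for k, layer in enumerate(self.layers):
                h, c = layer(x, states[k])
                states[k] = (h, c)
                x = h
            out = x
        return out  # map from which the next-frame position can be decoded
```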
S240, transmitting the motion rule information of the different target categories to the database and to the fusion detection association module respectively. When a target is occluded and the fusion detection association module cannot recognize it in the current frame, the motion trajectory in the next frame can still be predicted from the motion rule information of the different target categories obtained by training the trajectory prediction model.
In a traffic monitoring scene the camera viewpoint is generally fixed, so vehicle trajectories in the footage from one camera share a certain similarity. This regularity can be learned automatically by a dedicated neural network structure. The trajectory prediction module can also store the learned motion rule information in a database for the long term, to be retrieved whenever the fusion detection association module needs it.
After the target movement law has been learned in training, a frame of image and the current target position information are input, and the trajectory prediction model outputs the target position at the next moment, comprising x, y, w and h. The predicted target position can be added into the target position loss function of the fusion detection association module, improving the accuracy of position recognition.
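For example, the regression term could be extended with the trajectory-predicted box roughly as follows; the smooth-L1 form and the 0.5 weight on the trajectory term are assumptions.

```python
import torch.nn.functional as F

def position_loss(box_pred, box_gt, box_traj, traj_weight=0.5):
    """Regression over (x, y, w, h) with an extra term toward the trajectory-predicted box."""
    loss = F.smooth_l1_loss(box_pred, box_gt)
    if box_traj is not None:  # trajectory prediction available for this target
        loss = loss + traj_weight * F.smooth_l1_loss(box_pred, box_traj)
    return loss
```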
The trajectory prediction module predicts positions separately for different target categories and can optionally store the output results in a database for the fusion detection association module to use.
The English terms appearing in this embodiment or in the drawings are explained as follows:
1) ConvLSTM-Encode: convolutional long short-term memory encoding layer;
2) ConvLSTM-Position: convolutional long short-term memory position layer;
3) ConvLSTM-Decode: convolutional long short-term memory decoding layer;
4) trackID: the same target should have the same trackID in different frames;
5) CNN: convolutional neural network. Its key parameters are the convolution kernel size and the stride; the kernel size determines the receptive field of the kernel in the image, and the stride determines how far the kernel moves at each step.
Embodiment 3 is described with reference to fig. 6. The multi-target tracking device of this embodiment includes a video input module, a fusion detection association module, a trajectory prediction module, an output module and a storage module; the video input module is connected in sequence with the fusion detection association module and the output module; the video input module and the fusion detection association module are connected with the trajectory prediction module; the trajectory prediction module is connected with the storage module. The video input module is used for inputting video information; the fusion detection association module is used for acquiring the data association relation among the target category, the target position information and the targets, and for outputting the target position information to the trajectory prediction module; the trajectory prediction module is used for acquiring the motion rule information of different target categories and outputting it to the storage module and the fusion detection association module; the output module is used for outputting the target tracking result produced by the fusion detection association module; the storage module is used for storing the motion rule information of the different target categories.
The video input module feeds the video to the fusion detection association module; the fusion detection association module obtains the data association relation among the target category, the target position information and the targets, transmits the target position information to the trajectory prediction module, and transmits the target category and data association relation to the output module. The trajectory prediction module derives motion rule information for the different target categories from the received target position information and transmits it to both the storage module and the fusion detection association module. When the fusion detection association module loses track of a target, its position in the next video frame can be predicted from the target motion rule information.
The key technology of the invention is as follows:
1. The invention fuses the detection algorithm and the data association algorithm into one module, reducing repeated calculation: a single computation yields both the target position information and the data association information between continuous frames.
2. The fusion detection association module learns multi-scale information of the video frames, performs difference feature learning at different scales, and fuses features across scales on that basis; the final result is output using a multi-task learning method.
3. The trajectory prediction module can learn historical trajectory information and help predict target trajectories, avoiding target loss caused by occlusion.
4. The invention fuses the detection module and the data association module into the same neural network and shares the same low-level features, reducing the amount of computation and shortening the running time: the traditional DeepSORT algorithm runs at 26 FPS (FPS is the number of frames that can be processed per second; higher FPS means a faster algorithm and is a standard measure of execution speed), while the proposed algorithm runs at 33 FPS.
The computer device of the present invention may be a device including a processor, a memory and the like, for example a single-chip microcomputer including a central processing unit. The processor is used for implementing the steps of the above multi-target tracking method when executing the computer program stored in the memory.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
Computer-readable storage medium embodiments
The computer-readable storage medium of the present invention may be any form of storage medium readable by the processor of a computer device, including but not limited to non-volatile memory, ferroelectric memory, and the like. A computer program is stored on the computer-readable storage medium; when the stored computer program is read and executed by the processor of the computer device, the above-mentioned steps of the multi-target tracking method can be implemented.
The computer program comprises computer program code which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (8)

1. A multi-target tracking method is characterized by comprising the following steps:
s110, inputting the video into a fusion detection correlation module, performing down-sampling processing to obtain a feature map, and inputting the feature map into a difference calculation network to obtain difference features;
s120, calculating a loss function;
s130, acquiring data association relation among the target type, the target position information and the target; inputting target position information into a track prediction module, learning target movement by using convolution operation, outputting predicted position information, forming different types of target motion rule information and transmitting the different types of target motion rule information to a database and a fusion detection association module;
s140 outputs multi-target tracking.
2. The method according to claim 1, wherein the specific method for obtaining the feature map in step S110 is:
1) 1/4 downsampling the video through the convolutional layer 1 to obtain a characteristic diagram 1;
2) carrying out 1/8 downsampling on the characteristic diagram 1 through the convolutional layer 2 to obtain a characteristic diagram 2;
3) the characteristic diagram 2 is subjected to 1/16 downsampling by the convolutional layer 3 to obtain the characteristic diagram 3.
3. The method according to claim 2, wherein the calculating the loss function at step S120 specifically includes the following three loss functions:
1) a target classification loss function;
2) a target location regression loss function;
3) multi-objective cross entropy loss function.
4. The method according to claim 3, wherein the three loss functions of step S120 are calculated by:
1) the target classification loss function $L_{cls}$ is computed from the true category label $y$ of the target, the model prediction $\hat{y}$, the total number of target categories $N$, the category feature $c_y$ of label $y$, and a class-feature balance coefficient $\alpha$ whose value is 0.5; the category features are randomly initialized at the beginning of training and updated at each training iteration, wherein $\Delta_t$ denotes the difference between the current data and the category features at the $t$-th iteration and is used to correct the stored category feature into the updated value $c_y'$, and a momentum coefficient $\beta$ whose value is 0.5 guarantees the stability of the category features;
2) the target position regression loss function compares the model's predicted value $\hat{p}$ with the true value $p$, where $p$ ranges over $(x, y, w, h)$, $(x, y)$ being the coordinates of the center point of the detection box, $w$ the width of the detection box and $h$ its height, so that the position and size of the target detection box are obtained through regression; if the target position predicted by the trajectory prediction module is added, the target position regression loss function additionally contains a term for $p_{traj}$, the position output by the trajectory prediction module, which likewise contains $(x, y, w, h)$ information;
3) the multi-target cross-entropy loss function is the cross entropy between the true category label $y$ and the model prediction $\hat{y}$: $L_{id} = -\sum_{i} y_i \log \hat{y}_i$.
5. The method according to claim 4, wherein the learning of the target movement by convolution operation in S130 outputs predicted position information, wherein the first layer learns feature information of the target; the second layer learns position change information of the target between consecutive frames; and the third layer outputs the predicted position information.
6. A multi-target tracking device is characterized by comprising a video input module, a fusion detection association module, a track prediction module, an output module and a storage module; the video input module is sequentially connected with the fusion detection association module and the output module; the video input module and the fusion detection association module are connected with the track prediction module; the track prediction module is connected with the storage module; the video input module is used for inputting video information; the fusion detection association module is used for acquiring the data association relation among the target category, the target position information and the target and outputting the target position information to the track prediction module; the track prediction module is used for acquiring the motion rule information of different types of targets; outputting the motion rule information of the targets of different types to a storage module and a fusion detection association module; the output module is used for outputting the target tracking result output by the fusion detection correlation module; the storage module is used for storing the motion rule information of different types of targets.
7. A computer comprising a memory storing a computer program and a processor, the processor implementing the steps of a multi-target tracking method as claimed in any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a multi-target tracking method according to any one of claims 1 to 5.
CN202110922602.2A 2021-08-12 2021-08-12 Multi-target tracking method and device, computer and storage medium Active CN113409361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110922602.2A CN113409361B (en) 2021-08-12 2021-08-12 Multi-target tracking method and device, computer and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110922602.2A CN113409361B (en) 2021-08-12 2021-08-12 Multi-target tracking method and device, computer and storage medium

Publications (2)

Publication Number Publication Date
CN113409361A true CN113409361A (en) 2021-09-17
CN113409361B CN113409361B (en) 2023-04-18

Family

ID=77688703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110922602.2A Active CN113409361B (en) 2021-08-12 2021-08-12 Multi-target tracking method and device, computer and storage medium

Country Status (1)

Country Link
CN (1) CN113409361B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113993172A (en) * 2021-10-24 2022-01-28 河南大学 Ultra-dense network switching method based on user movement behavior prediction
CN114022509A (en) * 2021-09-24 2022-02-08 北京邮电大学 Target tracking method based on monitoring videos of multiple animals and related equipment
CN114170271A (en) * 2021-11-18 2022-03-11 安徽清新互联信息科技有限公司 Multi-target tracking method with self-tracking consciousness, equipment and storage medium
CN114419102A (en) * 2022-01-25 2022-04-29 江南大学 Multi-target tracking detection method based on frame difference time sequence motion information
CN116309692A (en) * 2022-09-08 2023-06-23 广东省机场管理集团有限公司工程建设指挥部 Method, device and medium for binding airport security inspection personal packages based on deep learning
CN117541625A (en) * 2024-01-05 2024-02-09 大连理工大学 Video multi-target tracking method based on domain adaptation feature fusion
CN117593340A (en) * 2024-01-18 2024-02-23 东方空间(江苏)航天动力有限公司 Method, device and equipment for determining swing angle of carrier rocket servo mechanism

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111127513A (en) * 2019-12-02 2020-05-08 北京交通大学 Multi-target tracking method
CN111882580A (en) * 2020-07-17 2020-11-03 元神科技(杭州)有限公司 Video multi-target tracking method and system
CN111898504A (en) * 2020-07-20 2020-11-06 南京邮电大学 Target tracking method and system based on twin circulating neural network
CN112001225A (en) * 2020-07-06 2020-11-27 西安电子科技大学 Online multi-target tracking method, system and application
US20210142489A1 (en) * 2019-11-13 2021-05-13 Beijing Baidu Netcom Science And Technology Co., Ltd. Target tracking method, device, electronic apparatus and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210142489A1 (en) * 2019-11-13 2021-05-13 Beijing Baidu Netcom Science And Technology Co., Ltd. Target tracking method, device, electronic apparatus and storage medium
CN111127513A (en) * 2019-12-02 2020-05-08 北京交通大学 Multi-target tracking method
CN112001225A (en) * 2020-07-06 2020-11-27 西安电子科技大学 Online multi-target tracking method, system and application
CN111882580A (en) * 2020-07-17 2020-11-03 元神科技(杭州)有限公司 Video multi-target tracking method and system
CN111898504A (en) * 2020-07-20 2020-11-06 南京邮电大学 Target tracking method and system based on twin circulating neural network

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114022509A (en) * 2021-09-24 2022-02-08 北京邮电大学 Target tracking method based on monitoring videos of multiple animals and related equipment
CN113993172B (en) * 2021-10-24 2022-10-25 河南大学 Ultra-dense network switching method based on user movement behavior prediction
CN113993172A (en) * 2021-10-24 2022-01-28 河南大学 Ultra-dense network switching method based on user movement behavior prediction
CN114170271B (en) * 2021-11-18 2024-04-12 安徽清新互联信息科技有限公司 Multi-target tracking method, equipment and storage medium with self-tracking consciousness
CN114170271A (en) * 2021-11-18 2022-03-11 安徽清新互联信息科技有限公司 Multi-target tracking method with self-tracking consciousness, equipment and storage medium
CN114419102A (en) * 2022-01-25 2022-04-29 江南大学 Multi-target tracking detection method based on frame difference time sequence motion information
CN114419102B (en) * 2022-01-25 2023-06-06 江南大学 Multi-target tracking detection method based on frame difference time sequence motion information
CN116309692A (en) * 2022-09-08 2023-06-23 广东省机场管理集团有限公司工程建设指挥部 Method, device and medium for binding airport security inspection personal packages based on deep learning
CN116309692B (en) * 2022-09-08 2023-10-20 广东省机场管理集团有限公司工程建设指挥部 Method, device and medium for binding airport security inspection personal packages based on deep learning
CN117541625A (en) * 2024-01-05 2024-02-09 大连理工大学 Video multi-target tracking method based on domain adaptation feature fusion
CN117541625B (en) * 2024-01-05 2024-03-29 大连理工大学 Video multi-target tracking method based on domain adaptation feature fusion
CN117593340A (en) * 2024-01-18 2024-02-23 东方空间(江苏)航天动力有限公司 Method, device and equipment for determining swing angle of carrier rocket servo mechanism
CN117593340B (en) * 2024-01-18 2024-04-05 东方空间(江苏)航天动力有限公司 Method, device and equipment for determining swing angle of carrier rocket servo mechanism

Also Published As

Publication number Publication date
CN113409361B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN113409361B (en) Multi-target tracking method and device, computer and storage medium
CN110245659B (en) Image salient object segmentation method and device based on foreground and background interrelation
CN111539370B (en) Image pedestrian re-identification method and system based on multi-attention joint learning
CN110415277B (en) Multi-target tracking method, system and device based on optical flow and Kalman filtering
CN112132119B (en) Passenger flow statistical method and device, electronic equipment and storage medium
CN111062413A (en) Road target detection method and device, electronic equipment and storage medium
CN110781262B (en) Semantic map construction method based on visual SLAM
CN112651995B (en) Online multi-target tracking method based on multifunctional aggregation and tracking simulation training
CN111382686B (en) Lane line detection method based on semi-supervised generation confrontation network
CN112215255B (en) Training method of target detection model, target detection method and terminal equipment
Akan et al. Stretchbev: Stretching future instance prediction spatially and temporally
CN111767847B (en) Pedestrian multi-target tracking method integrating target detection and association
Patil et al. Msednet: multi-scale deep saliency learning for moving object detection
CN114049382A (en) Target fusion tracking method, system and medium in intelligent network connection environment
CN112036381B (en) Visual tracking method, video monitoring method and terminal equipment
Dinh et al. Transfer learning for vehicle detection using two cameras with different focal lengths
CN113160283A (en) Target tracking method based on SIFT under multi-camera scene
CN115063447A (en) Target animal motion tracking method based on video sequence and related equipment
CN115546705A (en) Target identification method, terminal device and storage medium
CN117036397A (en) Multi-target tracking method based on fusion information association and camera motion compensation
CN115661767A (en) Image front vehicle target identification method based on convolutional neural network
CN116861262B (en) Perception model training method and device, electronic equipment and storage medium
CN115100565B (en) Multi-target tracking method based on spatial correlation and optical flow registration
US20230298335A1 (en) Computer-implemented method, data processing apparatus and computer program for object detection
CN116129386A (en) Method, system and computer readable medium for detecting a travelable region

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant