CN110956643A - Improved vehicle tracking method and system based on MDNet

Info

Publication number
CN110956643A
Authority
CN
China
Prior art keywords
mdnet
target
tracking
samples
candidate
Prior art date
2019-12-04
Legal status
Pending
Application number
CN201911227267.3A
Other languages
Chinese (zh)
Inventor
Li Aimin (李爱民)
Wang Jianwen (王建文)
Pang Yewen (逄业文)
Current Assignee
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date
2019-12-04
Filing date
2019-12-04
Publication date
2020-04-03
Application filed by Qilu University of Technology
Priority to CN201911227267.3A
Publication of CN110956643A
Legal status: Pending

Classifications

    • G06T7/215 Motion-based segmentation
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06T7/11 Region-based segmentation
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Abstract

The invention provides an improved vehicle tracking method and system based on MDNet. The improved MDNet tracking algorithm first applies Mask RCNN to perform instance segmentation on the video frame, and the candidate regions obtained by instance segmentation are used as the input of the improved MDNet algorithm, so that the foreground tracking target is strengthened, the tracking range is narrowed, and the background and the target can be clearly distinguished, improving both tracking real-time performance and tracking accuracy. Meanwhile, training and testing of the improved MDNet tracking algorithm are performed online, and the smaller network structure made possible by instance segmentation gives the improved MDNet tracking algorithm better robustness in target tracking.

Description

Improved vehicle tracking method and system based on MDNet
Technical Field
The disclosure relates to the technical field of vehicle tracking, in particular to an improved vehicle tracking method and system based on MDNet.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Computer vision is one of the most active disciplines in the field of artificial intelligence and has received wide attention from scholars at home and abroad. Visual target tracking, an important research branch of computer vision, has attracted particular interest from vision researchers. Since 2015, deep learning has entered the target tracking field: with deep learning, the features of a target can be extracted better, the target can be represented better, large changes in target appearance can be handled, tracker drift can be prevented, and the target region can be localized. Visual target tracking technology has been widely applied in many areas of civilian life and military affairs. Vehicle target tracking is a key problem in intelligent transportation research: an intelligent traffic system performs tasks such as traffic flow control and detection of vehicle traffic violations based on the acquired video images. Accurate detection and tracking of vehicle targets is therefore of great significance for traffic safety and intelligent vehicle management.
The inventors of the present disclosure have found that the algorithms currently in common use for tracking a moving vehicle mainly comprise optical-flow-based, motion-estimation-based, recognition-based, and deep-learning-based target tracking methods, and that the central difficulty of vehicle target tracking research is how to guarantee the robustness, real-time performance, and accuracy of the algorithm. Existing tracking algorithms perform well when tracking a moving vehicle against a simple background, but owing to the complexity of target motion and the time-varying nature of target features, the tracking effect deteriorates when the tracked target is occluded, rotated, changed in scale, or interfered with by the background, and a robust tracking result is difficult to obtain.
Disclosure of Invention
To overcome the deficiencies of the prior art, the present disclosure provides an improved vehicle tracking method and system based on MDNet, which improve the discrimination between background and foreground targets and thereby improve the real-time performance, accuracy, and robustness of vehicle tracking.
To achieve this purpose, the present disclosure adopts the following technical scheme:
the first aspect of the present disclosure provides an improved MDNet-based vehicle tracking method.
An improved vehicle tracking method based on MDNet, comprising the steps of:
preprocessing an acquired video sequence and inputting it into a Mask R-CNN neural network for instance segmentation to obtain candidate regions of the vehicle target to be tracked;
using the obtained candidate regions of the vehicle target to be tracked as input, performing target tracking with the MDNet network, specifically:
when predicting the target state in each frame of the video sequence, generating a number of candidate-region positive and negative samples following a Gaussian distribution around the target position predicted in the previous frame, then scoring the positive and negative samples with the MDNet network, and taking the candidate-region sample with the highest target score as the current optimal target state;
and averaging the scores of all the candidate-region samples and comparing the average with a preset threshold: when the average score is greater than the preset threshold, the target tracking is judged successful; otherwise, it is judged failed.
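For illustration only, a minimal Python sketch of this per-frame decision rule follows; the function name, the NumPy representation, and the 0.5 default threshold are assumptions rather than part of the claimed method (the embodiment below uses 0.5 as its failure threshold).

```python
import numpy as np

def judge_tracking(candidate_scores, threshold=0.5):
    """Per-frame decision sketched from the steps above: the highest-
    scoring candidate region becomes the current optimal target state,
    and tracking succeeds only if the average candidate score exceeds
    the preset threshold."""
    candidate_scores = np.asarray(candidate_scores)
    best_index = int(np.argmax(candidate_scores))   # optimal target state
    success = candidate_scores.mean() > threshold   # average vs. threshold
    return best_index, success
```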
As some possible implementations, a stochastic gradient descent method is used to train the convolutional neural network of MDNet: each time one video sequence is iterated, N1 frames are taken in turn, and in those N1 frames each frame contributes the bounding boxes of M1 positive samples and M2 negative samples, for a total of N1*M1 positive samples and N1*M2 negative samples; all the positive and negative samples form one mini-batch, and the segmentation results obtained by instance segmentation are resized uniformly to A*A as the input of the network.
As some possible implementations, the MDNet neural network uses the RReLU activation function and sets a dynamically changing learning rate according to the number of training rounds: a larger learning rate is used at the start of training, when far from the optimal solution, and the learning rate is gradually reduced as the number of iterations grows and the optimal solution is approached.
As some possible implementations, all the candidate-region samples are averaged to generate the target bounding box of the current frame; if tracking succeeds, the bounding box is fine-tuned, a number of positive and negative sample regions are generated from the target bounding box predicted for the current frame, forward propagation is performed on these sample regions separately, and the third-convolutional-layer features of these regions are saved.
As a further limitation, if the number of stored video frames exceeds a first preset number, the positive sample regions of the earliest frames beyond that number are discarded, and if it exceeds a second preset number, the negative sample regions of the earliest frames beyond that number are discarded.
As a further limitation, if tracking fails, a short-term update is performed: the positive and negative samples of the most recent preset number of frames are selected, and a preset number of rounds of iterative training are then performed;
at each iteration, the third-convolutional-layer features of S positive samples and T1 negative samples are randomly drawn to form a mini-batch, and the T1 negative samples are put through the MDNet network for a preset number of passes to compute their scores;
the T2 negative samples with the largest computed target scores are then selected from the T1 negative samples as hard negative samples, the scores of the positive samples and of the hard negative samples are computed separately, the loss is computed by forward propagation, and the parameters of the MDNet network are optimized.
As a further limitation, the video sequence is input into the Mask R-CNN neural network for instance segmentation as follows:
after the video sequence is input into the network, the corresponding feature map is obtained, and a number of candidate recognition regions are obtained in the feature map;
the candidate recognition regions are sent to an RPN for binary classification, and the candidate recognition regions that do not meet the requirements are filtered out;
and the RPN outputs the coordinates of some of the candidate recognition regions, which are input to ROI Pooling to output a feature map of size B*B for classification and localization, while an ROIAlign operation is performed on the remaining candidate recognition regions.
A second aspect of the present disclosure provides an improved MDNet based vehicle tracking system.
An improved vehicle tracking system based on MDNet, comprising:
an instance segmentation module configured to: preprocess an acquired video sequence and input it into a Mask R-CNN neural network for instance segmentation to obtain candidate regions of the vehicle target to be tracked; and
a target tracking module configured to: use the obtained candidate regions of the vehicle target to be tracked as input and perform target tracking with the MDNet network, specifically:
when predicting the target state in each frame of the video sequence, generate a number of candidate-region positive and negative samples following a Gaussian distribution around the target position predicted in the previous frame, then score the positive and negative samples with the MDNet network, and take the candidate-region sample with the highest target score as the current optimal target state;
and average the scores of all the candidate-region samples and compare the average with a preset threshold: when the average score is greater than the preset threshold, the target tracking is judged successful; otherwise, it is judged failed.
A third aspect of the present disclosure provides a medium having a program stored thereon, the program, when executed by a processor, implementing the steps in the MDNet-based improved vehicle tracking method according to the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the MDNet-based improved vehicle tracking method according to the first aspect of the present disclosure when executing the program.
Compared with the prior art, the beneficial effect of this disclosure is:
1. The present disclosure performs instance segmentation on the video frame and then uses the segmented foreground vehicle region as the input of the MDNet vehicle target tracking stage for subsequent tracking. Because instance segmentation yields a smaller tracking region, the network structure adopted by the disclosure can be correspondingly smaller, background and foreground targets are easier to distinguish, and tracking real-time performance and accuracy are improved.
2. The MDNet tracking algorithm is improved: the improved MDNet tracking algorithm first applies Mask RCNN to perform instance segmentation on the video frame and uses the candidate regions obtained by instance segmentation as the input of the improved MDNet tracking algorithm, which strengthens the foreground tracking target, narrows the tracking range, allows the background and the target to be clearly distinguished, and improves tracking real-time performance and accuracy.
3. The network architecture adopted by the present disclosure contains five hidden layers: three convolutional layers (conv1-conv3) and two fully connected layers (fc4-fc5). This smaller network structure achieves a more robust effect in target tracking.
Drawings
Fig. 1 is a schematic diagram of an overall implementation of the MDNet-based improved vehicle tracking method provided in embodiment 1 of the present disclosure.
Fig. 2 is a schematic flowchart of the Mask RCNN algorithm provided in embodiment 1 of the present disclosure.
Fig. 3 is a schematic diagram of an instance segmentation result provided in embodiment 1 of the present disclosure.
Fig. 4 is a schematic flowchart of the improved MDNet algorithm provided in embodiment 1 of the present disclosure.
Fig. 5 is a schematic diagram of a vehicle target tracking result provided in embodiment 1 of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Example 1:
As shown in fig. 1, embodiment 1 of the present disclosure provides an improved vehicle tracking method based on MDNet, in which Mask RCNN performs instance segmentation on the video frame and the candidate regions obtained by instance segmentation are used as the input of the improved MDNet algorithm, thereby strengthening the foreground tracking target, narrowing the tracking range, and distinguishing the background from the target more clearly; training and testing are performed online.
The adopted network architecture receives 107 × 107 RGB input and comprises five hidden layers: three convolutional layers (conv1-conv3) and two fully connected layers (fc4-fc5). This embodiment adopts a smaller network structure, which gives the network a more robust effect in target tracking.
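For illustration, a sketch of such a five-hidden-layer network in PyTorch follows. The patent fixes only the 107 × 107 RGB input, the conv1-conv3/fc4-fc5 layout, and the RReLU activation described later; the filter sizes and channel counts here follow the original MDNet paper's VGG-M-style layers and should be read as assumptions, not as the claimed design.

```python
import torch
import torch.nn as nn

class SmallMDNet(nn.Module):
    """Sketch of the five-hidden-layer network described above: three
    conv layers (conv1-conv3) and two fully connected layers (fc4-fc5)
    on 107x107 RGB input.  Filter sizes are VGG-M-style assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=7, stride=2), nn.RReLU(),   # conv1
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, stride=2), nn.RReLU(), # conv2
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, stride=1), nn.RReLU(),# conv3 (features cached for updates)
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 3 * 3, 512), nn.RReLU(),   # fc4
            nn.Linear(512, 512), nn.RReLU(),           # fc5
            nn.Linear(512, 2),                         # binary target/background scores
        )

    def forward(self, x):          # x: (N, 3, 107, 107)
        return self.fc(self.features(x))
```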
The method comprises the following specific steps:
step 1: preprocessing operations such as data labeling are performed on the video sequence, and the labeled video sequence is input into the neural network, as shown in fig. 2.
Step 2: after input, the corresponding feature map is obtained, and a number of candidate ROIs (regions of interest) are obtained in the feature map. The candidate ROIs are then sent to the RPN network for binary classification, and some of the candidate ROIs are filtered out.
The RPN proposes the coordinates of a number of ROIs in the form [x, y, w, h]; these are input to ROI Pooling (region-of-interest pooling), which outputs 7 × 7 feature maps for classification and localization. The purpose of ROI Pooling is to adjust ROIs of different sizes uniformly to smaller 7 × 7 feature maps. The remaining ROIs then undergo an ROIAlign (region-of-interest alignment) operation, which better resolves the region-mismatch problem of the ROI Pooling operation.
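As an aside, the difference between the two pooling operations can be sketched with torchvision's stock implementations; the feature-map size, stride, and box coordinates below are illustrative only.

```python
import torch
from torchvision.ops import roi_pool, roi_align

# Backbone feature map and one ROI in image coordinates (x1, y1, x2, y2);
# the stride-16 scale and all values are illustrative assumptions.
fmap = torch.randn(1, 256, 50, 50)
rois = [torch.tensor([[33.7, 14.2, 271.9, 250.1]])]

# ROI Pooling quantizes the ROI to the feature grid before pooling ...
pooled = roi_pool(fmap, rois, output_size=(7, 7), spatial_scale=1 / 16)

# ... whereas ROIAlign keeps fractional coordinates and samples
# bilinearly, avoiding the region mismatch noted above.
aligned = roi_align(fmap, rois, output_size=(7, 7), spatial_scale=1 / 16)
print(pooled.shape, aligned.shape)   # both: torch.Size([1, 256, 7, 7])
```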
Mask RCNN adopts an average binary cross-entropy loss function, and the loss function of Mask R-CNN can be described as follows:
$$L_{final} = L(\{p_i\}, \{t_i\}) + (L_{cls} + L_{box} + L_{mask}) \qquad (1)$$
where $L_{cls}$ and $L_{box}$ are the classification and regression losses, and $L_{mask}$ classifies each pixel; the mask branch output has $K \cdot m^2$ dimensions, K representing the number of classes and m being the size of the extracted ROI image.
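A hedged sketch of how the terms of equation (1) can be combined, with per-pixel average binary cross-entropy for the mask branch; the tensor shapes and function names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def mask_rcnn_loss(cls_logits, cls_targets, box_preds, box_targets,
                   mask_logits, mask_targets):
    """Combine the per-ROI terms of equation (1).  Shapes assumed:
    cls_logits (N, C), cls_targets (N,), box_preds/box_targets (N, 4),
    mask_logits (N, K, m, m), mask_targets (N, m, m) binary."""
    l_cls = F.cross_entropy(cls_logits, cls_targets)
    l_box = F.smooth_l1_loss(box_preds, box_targets)
    # L_mask: average binary cross-entropy over the m x m mask taken
    # from the channel of each ROI's ground-truth class, so classes do
    # not compete per pixel (the K*m^2-dimensional output above)
    n = mask_logits.size(0)
    picked = mask_logits[torch.arange(n), cls_targets]          # (N, m, m)
    l_mask = F.binary_cross_entropy_with_logits(picked, mask_targets.float())
    return l_cls + l_box + l_mask
```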
Finally, these ROIs are classified and masks are generated, yielding the segmented video frame sequence shown in fig. 3.
The loss function for training the RPN is described as follows:
$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) \qquad (2)$$
In the above formula, i is the index of an anchor within a mini-batch, $p_i$ is the predicted probability that anchor i is a target, $p_i^*$ is the corresponding label, $t_i$ holds the four parameters of the predicted box, $t_i^*$ holds the parameters of the calibrated ground-truth box, $L_{cls}$ is the classification loss function, and $L_{reg}$ is the regression loss function.
The back-propagation of ROIAlign is described as follows:
$$\frac{\partial L}{\partial x_i} = \sum_r \sum_j \big[ d(i, i^*(r, j)) < 1 \big] (1 - \Delta h)(1 - \Delta w) \frac{\partial L}{\partial y_{rj}} \qquad (3)$$
where $d(\cdot)$ denotes the distance between two points, and $\Delta h$ and $\Delta w$ denote the differences between the ordinate and the abscissa of $x_i$ and $x_{i^*(r,j)}$.
Step 3: an instance segmentation operation is performed on the targets in the video image using Mask RCNN to obtain the candidate-region information of the vehicle target to be tracked.
The result obtained in step 2 is used as the input of the improved MDNet algorithm, as shown in fig. 4, which makes it easier to improve tracking efficiency and to distinguish the tracked target from the background effectively. This approach avoids the situation in which the spatial position information of the target is diluted as the network deepens, causing the tracking effect to deteriorate or even the tracked target to be lost.
The vehicle candidate region obtained by instance segmentation is much smaller than the original image, so tracking can be achieved with a smaller network depth. Through a multi-domain learning framework, MDNet separates domain-independent representation information from domain-specific information.
The CNN employed is trained by stochastic gradient descent (SGD), with each domain processed individually in each iteration. Because SGD is used for training, the video sequences are first shuffled. The original video sequence is arranged in frame order; each time one video sequence is iterated, 8 frames are taken in turn, and in those 8 frames each frame contributes the bounding boxes of 4 positive samples (IOU ≥ 0.7) and 12 negative samples (IOU ≤ 0.3), where IOU is the overlap ratio between a generated candidate box and the ground-truth box. The segmentation results obtained by instance segmentation are resized uniformly to 107 × 107 as the input of the network, and a mini-batch consists of the 32 positive samples and 96 negative samples.
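A minimal sketch of this mini-batch construction follows; `draw_candidates`, `iou`, and `crop107` are hypothetical helpers (an IOU implementation is sketched after the formula below), and the patent does not prescribe how candidate boxes are drawn.

```python
import numpy as np

def sample_minibatch(frames, gt_boxes, draw_candidates, iou, crop107):
    """One SGD mini-batch as described above: 8 frames taken in turn,
    4 positives (IOU >= 0.7) and 12 negatives (IOU <= 0.3) per frame,
    each patch resized to 107x107, giving 32 positives and 96
    negatives.  The three helpers are supplied by the caller."""
    pos, neg = [], []
    for f in range(8):
        cands = draw_candidates(gt_boxes[f])            # (C, 4) array of (x, y, w, h)
        overlaps = np.array([iou(c, gt_boxes[f]) for c in cands])
        pos += [crop107(frames[f], c) for c in cands[overlaps >= 0.7][:4]]
        neg += [crop107(frames[f], c) for c in cands[overlaps <= 0.3][:12]]
    return np.stack(pos), np.stack(neg)                 # 32 and 96 patches
```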
The IOU equation is defined as follows:
$$IOU = \frac{area(B_c \cap B_{gt})}{area(B_c \cup B_{gt})} \qquad (4)$$
where $B_c$ is the generated candidate box and $B_{gt}$ is the ground-truth box.
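Equation (4) translates directly into code; a sketch for (x, y, w, h) boxes, matching the candidate representation used above:

```python
def iou(box_a, box_b):
    """Overlap ratio of equation (4) for (x, y, w, h) boxes:
    intersection area divided by union area."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```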
the MDNet algorithm improvement method comprises the following steps: using the RReLU activation function, if the learning rate setting is too small, the entire network convergence process may become extremely slow; if the learning rate is set too large, the gradient may wander around the minimum and may even fail to converge to achieve the desired effect. In the present embodiment, the learning rate is not fixed, but a dynamically changing learning rate is set according to the number of training rounds. When training is started, a slightly larger learning rate is adopted when the distance from the optimal solution is far, and the learning rate is gradually reduced in the process of approaching the optimal solution along with the increase of the iteration times.
The activation function employed is expressed as follows:
$$f(x_{ji}) = \begin{cases} x_{ji}, & x_{ji} \ge 0 \\ a_{ji} x_{ji}, & x_{ji} < 0 \end{cases} \qquad (5)$$
where $a_{ji} \sim U(l, u)$, $l < u$ and $l, u \in [0, 1)$.
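PyTorch's nn.RReLU implements exactly the sampling of equation (5), and a step-decay scheduler is one concrete way to realize the dynamically decreasing learning rate; the l and u values, the schedule parameters, and the dummy model below are assumptions, since the patent fixes only the decay behaviour.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# RReLU as in equation (5): negative inputs are scaled by a_ji ~ U(l, u);
# l = 1/8 and u = 1/3 here are illustrative, the patent leaves them open.
model = nn.Sequential(
    nn.Linear(512, 512),
    nn.RReLU(lower=0.125, upper=1.0 / 3.0),
    nn.Linear(512, 2),
)

# Dynamically decaying learning rate: relatively large while far from the
# optimum, shrinking as iterations accumulate.
opt = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
sched = optim.lr_scheduler.StepLR(opt, step_size=10, gamma=0.5)

for round_ in range(50):
    x = torch.randn(8, 512)              # dummy batch standing in for real features
    loss = model(x).pow(2).mean()        # placeholder loss for the sketch
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()                         # learning rate halves every 10 rounds
```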
Step 4: during target tracking, a simple network is maintained throughout, and two update methods, long-term updating and short-term updating, are adopted according to how quickly the target's appearance changes. Long-term updates are performed at regular intervals; short-term updates are performed when a potential tracking failure occurs, i.e., when the positive score of the predicted target is less than 0.5.
Step 5: when predicting the target state in each frame, N candidate samples are first drawn around the target of the previous frame, and the network then yields the positive-sample score $f^+(x^i)$ and the negative-sample score $f^-(x^i)$ for each sample. The sample with the largest score is taken as the current optimal target state $X^*$:
$$X^* = \arg\max_{x^i} f^+(x^i) \qquad (6)$$
Specifically: for each frame, 256 candidate regions following a Gaussian distribution are generated around the target position predicted in the previous frame, each generated candidate box being represented as (x, y, w, h). Each candidate box region is cropped from the original image and resized to 107 × 107 as the input of the network. The scores of the 256 candidate regions are computed by forward propagation, and the candidate region with the highest target score is selected. The candidate regions are averaged to generate the target bounding box of the current frame, and the average of the candidate-region scores is computed and compared with a preset threshold to determine whether tracking succeeded.
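A sketch of this prediction step follows; the Gaussian widths, the `crop107` helper, and averaging the top five candidates into the output box are assumptions (the text fixes only the 256 Gaussian candidates, the 107 × 107 resize, and the use of the mean score).

```python
import numpy as np
import torch

def predict_state(model, frame, prev_box, crop107, n=256):
    """Draw 256 Gaussian candidates (x, y, w, h) around the previous
    target, crop and resize each to 107x107, score them by forward
    propagation, and form the current box and mean score."""
    rng = np.random.default_rng()
    x, y, w, h = prev_box
    cands = np.stack([rng.normal(x, 0.1 * w, n),
                      rng.normal(y, 0.1 * h, n),
                      w * np.exp(rng.normal(0.0, 0.05, n)),
                      h * np.exp(rng.normal(0.0, 0.05, n))], axis=1)
    patches = torch.stack([crop107(frame, c) for c in cands])  # (256, 3, 107, 107)
    with torch.no_grad():
        scores = model(patches)[:, 1]                # target scores of all candidates
    top = torch.topk(scores, k=5).indices.numpy()
    box = cands[top].mean(axis=0)                    # averaged high-score candidates
    return box, scores.mean().item()                 # mean score vs. preset threshold
```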
Step 6: if tracking succeeds, bounding-box fine-tuning is performed, and 50 positive sample regions (IOU ≥ 0.7) and 200 negative sample regions (IOU ≤ 0.3) are generated from the target bounding box predicted for the current frame. These sample regions are then forward-propagated separately, and the conv3 (third convolutional layer) features of these regions are saved. If the number of stored video frames exceeds 100, the positive sample regions of the earliest frames are discarded; if it exceeds 20, the negative sample regions of the earliest frames are discarded.
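The 100-frame/20-frame retention policy maps naturally onto bounded queues; a sketch:

```python
from collections import deque

# Bounded feature stores implementing the retention policy above:
# positives are kept for the last 100 frames, negatives for the last 20.
pos_store = deque(maxlen=100)   # per frame: conv3 features of 50 positive regions
neg_store = deque(maxlen=20)    # per frame: conv3 features of 200 negative regions

def remember(pos_feats, neg_feats):
    """Append this frame's conv3 features; once a deque is full, the
    earliest frame's entries are discarded automatically."""
    pos_store.append(pos_feats)
    neg_store.append(neg_feats)
```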
If tracking fails, the aforementioned short-term update is performed. The positive and negative samples of the latest 20 frames are selected, and 15 rounds of iterative training are carried out, each iteration proceeding as before: the conv3 features of 32 positive samples and of 1024 negative samples are randomly drawn to form a mini-batch. The 1024 negative samples are put through the test model in 4 passes, their scores are computed, and the target scores are retained. The 96 negative samples with the largest computed target scores are then picked from the 1024 negative samples as hard negative samples. The training model is then invoked: the scores of the 32 positive samples and of the 96 hard negative samples are computed separately, the loss is computed by forward propagation, and the optimizer performs optimization and parameter updating.
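A sketch of this short-term update with hard negative mining; `model` here stands for the fully connected head applied to stored conv3 features (an assumption), and scoring the 1024 negatives in one pass rather than the 4 passes of 256 described above is a simplification.

```python
import torch
import torch.nn.functional as F

def short_term_update(model, optimizer, pos_feats, neg_feats, rounds=15):
    """Failure-recovery update sketched from the text above: per round,
    draw 32 positive and 1024 negative conv3 feature vectors, keep the
    96 negatives the model currently scores highest as hard negatives,
    and train on the resulting 32 + 96 mini-batch."""
    for _ in range(rounds):
        pos = pos_feats[torch.randperm(len(pos_feats))[:32]]
        neg = neg_feats[torch.randperm(len(neg_feats))[:1024]]
        with torch.no_grad():                        # test-mode scoring pass
            neg_scores = model(neg)[:, 1]            # target scores of negatives
        hard = neg[torch.topk(neg_scores, 96).indices]   # hardest 96 negatives
        batch = torch.cat([pos, hard])
        labels = torch.cat([torch.ones(len(pos)), torch.zeros(len(hard))]).long()
        loss = F.cross_entropy(model(batch), labels) # forward-propagation loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```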
Step 7: the tracked target is displayed, yielding the tracked video sequence shown in fig. 5.
Example 2:
Embodiment 2 of the present disclosure provides an improved vehicle tracking system based on MDNet, comprising:
an instance segmentation module configured to: preprocess an acquired video sequence and input it into a Mask R-CNN neural network for instance segmentation to obtain candidate regions of the vehicle target to be tracked; and
a target tracking module configured to: use the obtained candidate regions of the vehicle target to be tracked as input and perform target tracking with the MDNet network, specifically:
when predicting the target state in each frame of the video sequence, generate a number of candidate-region positive and negative samples following a Gaussian distribution around the target position predicted in the previous frame, then score the positive and negative samples with the MDNet network, and take the candidate-region sample with the highest target score as the current optimal target state;
and average the scores of all the candidate-region samples and compare the average with a preset threshold: when the average score is greater than the preset threshold, the target tracking is judged successful; otherwise, it is judged failed.
Example 3:
the embodiment 3 of the present disclosure provides a medium on which a program is stored, which when executed by a processor implements the steps in the MDNet-based improved vehicle tracking method according to the embodiment 1 of the present disclosure.
Example 4:
the embodiment 4 of the present disclosure provides an electronic device, which includes a memory, a processor, and a program stored on the memory and executable on the processor, and the processor executes the program to implement the steps in the MDNet-based improved vehicle tracking method according to the embodiment 1 of the present disclosure.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. An improved vehicle tracking method based on MDNet, characterized by comprising the following steps:
preprocessing an acquired video sequence and inputting it into a Mask R-CNN neural network for instance segmentation to obtain candidate regions of the vehicle target to be tracked;
using the obtained candidate regions of the vehicle target to be tracked as input, performing target tracking with the MDNet network, specifically:
when predicting the target state in each frame of the video sequence, generating a number of candidate-region positive and negative samples following a Gaussian distribution around the target position predicted in the previous frame, then scoring the positive and negative samples with the MDNet network, and taking the candidate-region sample with the highest target score as the current optimal target state;
and averaging the scores of all the candidate-region samples and comparing the average with a preset threshold: when the average score is greater than the preset threshold, the target tracking is judged successful; otherwise, it is judged failed.
2. The MDNet-based improved vehicle tracking method of claim 1, wherein the convolutional neural network of the MDNet is trained by a stochastic gradient descent method; each time one video sequence is iterated, N1 frames are taken in turn, and in those N1 frames each frame contributes the bounding boxes of M1 positive samples and M2 negative samples, for a total of N1*M1 positive samples and N1*M2 negative samples; all the positive and negative samples form one mini-batch, and the segmentation results obtained by instance segmentation are resized uniformly to A*A as the input of the network.
3. The MDNet-based improved vehicle tracking method of claim 1, wherein the MDNet neural network uses an RReLU activation function and sets a dynamically changing learning rate according to the number of training rounds: a larger learning rate is used at the start of training, when far from the optimal solution, and the learning rate is gradually reduced as the number of iterations grows and the optimal solution is approached.
4. The MDNet-based improved vehicle tracking method of claim 1, wherein all the candidate-region samples are averaged to generate the target bounding box of the current frame; if tracking succeeds, the bounding box is fine-tuned, a number of positive and negative sample regions are generated from the target bounding box predicted for the current frame, forward propagation is performed on these sample regions separately, and the third-convolutional-layer features of these regions are saved.
5. The MDNet-based improved vehicle tracking method of claim 2, wherein if the number of stored video frames exceeds a first preset number, the positive sample regions of the earliest frames beyond that number are discarded, and if it exceeds a second preset number, the negative sample regions of the earliest frames beyond that number are discarded.
6. The MDNet-based improved vehicle tracking method of claim 2, wherein if tracking fails, a short-term update is performed: the positive and negative samples of the most recent preset number of frames are selected, and a preset number of rounds of iterative training are then performed;
at each iteration, the third-convolutional-layer features of S positive samples and T1 negative samples are randomly drawn to form a mini-batch, and the T1 negative samples are put through the MDNet network for a preset number of passes to compute their scores;
the T2 negative samples with the largest computed target scores are then selected from the T1 negative samples as hard negative samples, the scores of the positive samples and of the hard negative samples are computed separately, the loss is computed by forward propagation, and the parameters of the MDNet network are optimized.
7. The MDNet-based improved vehicle tracking method of claim 2, wherein the video sequence is input into the Mask R-CNN neural network for instance segmentation as follows:
after the video sequence is input into the network, the corresponding feature map is obtained, and a number of candidate recognition regions are obtained in the feature map;
the candidate recognition regions are sent to an RPN for binary classification, and the candidate recognition regions that do not meet the requirements are filtered out;
and the RPN outputs the coordinates of some of the candidate recognition regions, which are input to ROI Pooling to output a feature map of size B*B for classification and localization, while an ROIAlign operation is performed on the remaining candidate recognition regions.
8. An improved MDNet-based vehicle tracking system, comprising:
an instance segmentation module configured to: preprocess an acquired video sequence and input it into a Mask R-CNN neural network for instance segmentation to obtain candidate regions of the vehicle target to be tracked; and
a target tracking module configured to: use the obtained candidate regions of the vehicle target to be tracked as input and perform target tracking with the MDNet network, specifically:
when predicting the target state in each frame of the video sequence, generate a number of candidate-region positive and negative samples following a Gaussian distribution around the target position predicted in the previous frame, then score the positive and negative samples with the MDNet network, and take the candidate-region sample with the highest target score as the current optimal target state;
and average the scores of all the candidate-region samples and compare the average with a preset threshold: when the average score is greater than the preset threshold, the target tracking is judged successful; otherwise, it is judged failed.
9. A medium having a program stored thereon, wherein the program, when executed by a processor, implements the steps in the MDNet based improved vehicle tracking method of any of claims 1-7.
10. An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps in the MDNet-based improved vehicle tracking method of any one of claims 1-7.
CN201911227267.3A 2019-12-04 2019-12-04 Improved vehicle tracking method and system based on MDNet Pending CN110956643A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911227267.3A CN110956643A (en) 2019-12-04 2019-12-04 Improved vehicle tracking method and system based on MDNet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911227267.3A CN110956643A (en) 2019-12-04 2019-12-04 Improved vehicle tracking method and system based on MDNet

Publications (1)

Publication Number Publication Date
CN110956643A 2020-04-03

Family

ID=69979741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911227267.3A Pending CN110956643A (en) 2019-12-04 2019-12-04 Improved vehicle tracking method and system based on MDNet

Country Status (1)

Country Link
CN (1) CN110956643A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460793A (en) * 2018-11-15 2019-03-12 腾讯科技(深圳)有限公司 A kind of method of node-classification, the method and device of model training
CN109711295A (en) * 2018-12-14 2019-05-03 北京航空航天大学 A kind of remote sensing image offshore Ship Detection
CN110059605A (en) * 2019-04-10 2019-07-26 厦门美图之家科技有限公司 A kind of neural network training method calculates equipment and storage medium
CN110276321A (en) * 2019-06-11 2019-09-24 北方工业大学 Remote sensing video target tracking method and system
CN110348363A (en) * 2019-07-05 2019-10-18 西安邮电大学 The vehicle tracking algorithm for eliminating similar vehicle interference is merged based on multiframe angle information
CN110472594A (en) * 2019-08-20 2019-11-19 腾讯科技(深圳)有限公司 Method for tracking target, information insertion method and equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HYEONSEOB NAM et al.: "Mask R-CNN" *
码农教程: "MDNet Training and Online Tracking Process" (MDNet训练与在线跟踪过程, in Chinese) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709328A (en) * 2020-05-29 2020-09-25 北京百度网讯科技有限公司 Vehicle tracking method and device and electronic equipment
WO2021238062A1 (en) * 2020-05-29 2021-12-02 北京百度网讯科技有限公司 Vehicle tracking method and apparatus, and electronic device
CN111709328B (en) * 2020-05-29 2023-08-04 北京百度网讯科技有限公司 Vehicle tracking method and device and electronic equipment
CN113129337A (en) * 2021-04-14 2021-07-16 桂林电子科技大学 Background perception tracking method, computer readable storage medium and computer device
CN113129337B (en) * 2021-04-14 2022-07-19 桂林电子科技大学 Background perception tracking method, computer readable storage medium and computer device

Similar Documents

Publication Publication Date Title
CN109145713B (en) Small target semantic segmentation method combined with target detection
CN107767405B (en) Nuclear correlation filtering target tracking method fusing convolutional neural network
CN108647577B (en) Self-adaptive pedestrian re-identification method and system for difficult excavation
CN107145889B (en) Target identification method based on double CNN network with RoI pooling
CN113326731B (en) Cross-domain pedestrian re-identification method based on momentum network guidance
CN112069874B (en) Method, system, equipment and storage medium for identifying cells in embryo light microscope image
CN109598268A (en) A kind of RGB-D well-marked target detection method based on single flow depth degree network
CN110929848B (en) Training and tracking method based on multi-challenge perception learning model
CN111680702B (en) Method for realizing weak supervision image significance detection by using detection frame
CN107609512A (en) A kind of video human face method for catching based on neutral net
CN111274917B (en) Long-time target tracking method based on depth detection
CN111160407A (en) Deep learning target detection method and system
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN111192294B (en) Target tracking method and system based on target detection
CN114693983B (en) Training method and cross-domain target detection method based on image-instance alignment network
CN114333040B (en) Multi-level target detection method and system
CN110956643A (en) Improved vehicle tracking method and system based on MDNet
CN112434599A (en) Pedestrian re-identification method based on random shielding recovery of noise channel
CN114139631A (en) Multi-target training object-oriented selectable ash box confrontation sample generation method
CN111931572B (en) Target detection method for remote sensing image
Duan [Retracted] Deep Learning‐Based Multitarget Motion Shadow Rejection and Accurate Tracking for Sports Video
Shobha et al. Deep learning assisted active net segmentation of vehicles for smart traffic management
CN116416152A (en) Image data-based processing method and device
CN113850166A (en) Ship image identification method and system based on convolutional neural network
CN110909670B (en) Unstructured road identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination