CN110033473B - Moving target tracking method based on template matching and depth classification network - Google Patents

Moving target tracking method based on template matching and depth classification network

Info

Publication number
CN110033473B
Authority
CN
China
Prior art keywords
template
target
image
residual
double
Prior art date
Legal status
Active
Application number
CN201910297980.9A
Other languages
Chinese (zh)
Other versions
CN110033473A (en)
Inventor
田小林
李芳
李帅
李娇娇
荀亮
贾楠
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910297980.9A priority Critical patent/CN110033473B/en
Publication of CN110033473A publication Critical patent/CN110033473A/en
Application granted granted Critical
Publication of CN110033473B publication Critical patent/CN110033473B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a moving target tracking method based on template matching and a depth classification network, which mainly solves the prior-art problems of low target detection speed and inaccurate tracking when the target is deformed or occluded. The implementation scheme is as follows: 1) build a double-residual depth classification network and train it; 2) extract a template network and a detection network from the double-residual depth classification network; 3) extract template features with the template network; 4) extract detection features with the detection network; 5) perform template matching of the template features over the detection features to obtain a template matching map; 6) determine the target position from the template matching map; 7) update the template features according to the target position; 8) judge whether the current frame is the last frame: if so, end target tracking; otherwise, take the updated template features as the template features for the next frame and return to step 4). The method is fast and accurate in tracking and is used for tracking video targets undergoing severe deformation and illumination change.

Description

Moving target tracking method based on template matching and depth classification network
Technical Field
The invention belongs to the technical field of image processing, and further relates to a moving target tracking method which can be used for tracking video targets with severe deformation, lens jitter, scale change, illumination change and the like.
Background
The main task of moving target tracking is to learn a tracker given only the initial-frame information of the target to be tracked, so that the tracker can accurately predict the target's position in subsequent frames of a video sequence. As understanding of computer vision has deepened, moving target tracking has been widely applied and developed in this field, and with the continued success of deep learning in image classification and image segmentation, deep learning methods have gradually been applied to target tracking as well. Compared with the hand-crafted feature extraction of traditional tracking methods, which relies heavily on the designer's prior knowledge, deep learning can exploit big data: a neural network automatically learns features from large amounts of training data, and a large number of tracking algorithms now use it to track moving targets. However, owing to objective factors such as occlusion, background clutter, appearance deformation, illumination change and viewpoint change, accurately tracking a target remains very challenging.
The patent document "an anti-occlusion target tracking method" (patent application No. 201610818828.7, publication No. 106408591a) applied by Nanjing aerospace university discloses a target tracking method based on detection, tracking and learning. Firstly, determining a target area according to an initial image frame, and forming an initial target template through the target area by a tracker; secondly, initializing parameters of a cascade detector; then, adding a shielding detection mechanism and updating a threshold value in real time; then, respectively calculating the tracking confidence and the detection confidence of the tracker and the detector to the target; and finally, integrating the tracking result according to the confidence coefficient, if the tracker fails to track, initializing by using the detection result, and updating the corresponding parameters of the detector by the tracking result through a learning module. The method has the disadvantages that the weighted result of the target template and the background template is used as a confidence value, the fluctuation condition of the response of the target to be tracked cannot be reflected, the identification capability of the classifier obtained by training is not strong enough, and the target cannot be accurately tracked for a long time when the target is in intense illumination change and moves rapidly.
The patent document "Target tracking method based on local feature learning" (patent application No. 201610024953.0, publication No. CN108038435A), filed by South China Agricultural University, discloses a method for tracking a moving target using local feature learning. The specific steps are: (1) decompose the target area and the background area into a large number of local units, and train an appearance model by deep learning; (2) compute the confidence that each local area of the next frame belongs to the target, obtaining a confidence map for locating the target; (3) set thresholds T_pos and T_neg: local areas whose confidence is greater than T_pos are added to the target sample set, local areas whose confidence is less than T_neg are added to the background sample set, and the appearance model is updated. The disadvantage of this method is that the sample type of each local area must be judged by thresholds; when the target to be tracked is occluded to a large extent, target samples or background samples may be misclassified, so the updated model cannot keep tracking the target accurately.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a moving target tracking method based on template matching and a depth classification network, so as to track a target accurately and effectively when the target is deformed, changes in scale or is occluded.
The technical idea of the invention is: first, an offline training mechanism on a classification data set is selected to address the problem of insufficient training samples; second, a template network and a detection network are built from ResNet50, the template network is used to extract features of the template image, and the detection network is used to extract features of the image to be detected; finally, the extracted template features are matched against the features extracted from the detection image to determine the position of the target. The specific steps include the following:
(1) building a double residual depth classification network model:
(1a) two depth residual error neural networks ResNet50 are used as front-end networks of a double-residual error depth classification network model, the parameters of input layers of the two depth residual error neural networks are different, and the parameters of other layers are the same.
(1b) Two 3-layer fully-connected networks are set up to serve as a back-end network of a double-residual-error depth classification network model, the first layer of each fully-connected network is an input layer, the second layer is a hidden layer, the third layer is an output layer, the parameters of the first layer of each fully-connected network are different, and the parameters of the second layer and the third layer of each fully-connected network are the same;
(2) inputting the ImageNet classification data set into the double-residual depth classification network model, and updating the weight of each node in the model by the stochastic gradient descent method to obtain the trained double-residual depth classification network model;
(3) deleting all layers behind the penultimate hidden layer of the depth residual error network ResNet50 in the trained double-residual error depth classification network model to obtain a template network model and a detection network model;
(4) extracting a template feature map by using a template network:
(4a) inputting the first frame image of a video image sequence containing the target to be tracked, and taking a rectangular frame, centered at the initial position of the target to be tracked, whose size equals the length and width of the target;
(4b) cutting the target image out of the rectangular frame and resizing it to 224×224×3 pixels to obtain a template image;
(4c) inputting the template image into the template network, extracting features of the image, forming an image feature map from all the features, and outputting 2048 template feature maps of 7×7 at the last layer of the template network;
(5) extracting a detection feature map by using a detection network:
(5a) inputting an image to be detected containing the target to be tracked, and taking a rectangular frame, centered at the initial position of the target to be tracked, whose size is twice the length and width of the target;
(5b) cutting the target image out of the rectangular frame and resizing it to 448×448×3 pixels to obtain a detection image;
(5c) inputting the detection image into the detection network, extracting features of the image, forming an image feature map from all the features, and outputting 2048 detection feature maps of 14×14 at the last layer of the detection network;
(6) template matching:
(6a) 2048 template feature maps and 2048 detection feature maps are in one-to-one correspondence to form 2048 template detection feature pairs;
(6b) in each pair of template detection feature pairs, carrying out sliding frame type convolution on a 7 × 7 template feature map on a 14 × 14 detection feature map to obtain 2048 14 × 14 template matching maps;
(6c) putting the 14 × 14 pixel points of the 2048 template matching maps into one-to-one correspondence, and summing the matching values at the corresponding points to obtain a 14 × 14 feature response map;
(7) Determining the position of the target:
(7a) sorting the response values in the 14 × 14 feature response map in descending order, selecting the normalized coordinates corresponding to the first 10 response values, and computing the average normalized coordinate value;
(7b) calculating the position of the tracking target in the video frame image according to the average normalized coordinate value through the following formula;
x′=x×m+a-w,y′=y×n+b-h
wherein x 'represents the abscissa value of the first pixel at the upper left corner of the target image in the video frame, x represents the average normalized abscissa, a represents the abscissa value of the initial position of the target to be tracked, w represents the width of the template image, m represents the width of the detection image, y' represents the ordinate value of the first pixel at the upper left corner of the target image in the video frame, y represents the average normalized ordinate, b represents the ordinate value of the initial position of the target to be tracked, h represents the height of the template image, and n represents the height of the detection image.
(8) Extracting a tracking target feature map according to the position of the tracking target in the video frame image, and updating the template feature map according to the tracking target feature map: Z = ηZ1 + (1-η)Z2, where Z represents the updated template feature map, Z1 represents the template feature map in the previous frame of image, η represents the learning rate of template updating with η ≤ 1, and Z2 represents the tracking target feature map in the current video frame;
(9) judging whether the current frame video image is the last frame of the video image sequence to be tracked: if so, ending the tracking of the moving target to be tracked; otherwise, taking the updated template feature map as the template feature map of the target in the next frame and returning to step (5). Target tracking is thereby completed.
Compared with the prior art, the invention has the following advantages:
First, the invention uses a mechanism of offline training on a classification data set, which overcomes the prior-art problems that repeatedly iterating on the first frame image to train a network easily causes overfitting and leads to inaccurate tracking when the target to be tracked deforms to a large extent; the invention can therefore track the target more accurately under large deformation.
Second, the invention builds a double-residual depth classification network model, matches the image features extracted by the template network and the detection network, and uses the response values to determine the position of the target to be tracked. This overcomes the prior-art problem that, when the target to be tracked is occluded to a large extent, positive and negative samples are easily misclassified so that the updated model can no longer track the target accurately; the invention can therefore track the target more accurately under heavy occlusion.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a diagram of simulation results of the present invention.
Detailed Description
Embodiments and effects of the present invention will be further described below with reference to the accompanying drawings.
Referring to FIG. 1, the specific steps of the present invention are as follows.
Step 1, building a double residual error depth classification network model.
1.1) setting a front-end network:
adjusting the input layer parameters of two existing deep residual neural networks ResNet50: the number of neurons of the first network's input layer is set to 224×224×3 and the number of neurons of the second network's input layer is set to 448×448×3, while the parameters of the other layers are kept unchanged; the two deep residual neural networks are used as the front-end networks of the double-residual depth classification network model;
1.2) setting a back-end network:
constructing two three-layer fully-connected networks as the back-end networks of the double-residual depth classification network model, where the first layer of each fully-connected network is an input layer, the second layer is a hidden layer and the third layer is an output layer; the parameters of the first layers of the two fully-connected networks are different, the parameters of the second and third layers are the same, and the parameters of each layer in the two fully-connected networks are as follows:
the number of the first layer neurons of the first network is set to 1 × 1 × 2048, and the number of the first layer neurons of the second network is set to 2 × 2 × 2048;
the number of the neurons of the second layer of the two networks is simultaneously set to 1024, and the activation function is simultaneously set to a modified linear unit ReLU function;
the number of neurons in the third layer of the two networks was set to 1000 at the same time, and the activation function was set to the Softmax function at the same time.
Step 2, training the double-residual depth classification network model.
Inputting the ImageNet classification data set into the double-residual depth classification network model built in step 1, and updating the weight of each node in the model by the stochastic gradient descent method to obtain the trained double-residual depth classification network model:
(2a) randomly selecting a number in the range of (0,0.1), and using the number as an initial weight of each node in the double-residual depth classification network model;
(2b) taking the initial weight of each node as the current weight of each node in the double-residual depth classification network model in the first iteration process;
(2c) randomly selecting 2^n sample images from the ImageNet classification data set, where 3 ≤ n ≤ 7, and propagating them forward through the double-residual depth classification network model, whose output layer outputs the classification results of the 2^n sample images;
(2d) calculating the average logarithmic loss value of the classification result according to the classification result of the sample image and the following formula:
L = -(1/N) × Σ_{i=1}^{N} [ y_i × log(p_i) + (1 - y_i) × log(1 - p_i) ],
wherein L represents the average log loss value of the classification result, N represents the total number of randomly selected sample images, i represents the serial number of the input sample image, y_i represents the class of the ith input sample image (y_i takes the value 1 for a positive-class sample and 0 for a negative-class sample), and p_i represents the output value of the double-residual depth classification network model for the ith sample image in the classification result;
(2e) calculating the partial derivative of the average log loss value with respect to the current weight of each node in the double-residual depth classification network, obtaining the gradient value Δw_k of the current weight of each node in the model;
(2f) Calculating the updated weight of each node in the double-residual depth classification network model according to the gradient value of the current weight of the node:
w_k' = w_k - α × Δw_k,
wherein w_k' represents the updated weight of the kth node of the double-residual depth classification network model, w_k represents the current weight of the kth node, and α represents the learning rate, with value range (0, 1);
(2g) judging whether all the sample images in the training data set have been selected: if so, the trained double-residual depth classification network model is obtained; otherwise, the updated weight of each node is taken as its current weight and (2c) is executed again.
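As a concrete illustration of steps 2d and 2f, the short NumPy sketch below evaluates the average log loss in the form written above and applies one plain gradient-descent update; the function names and the toy numbers are illustrative only.

```python
import numpy as np

def average_log_loss(y, p, eps=1e-12):
    """Step 2d: L = -(1/N) * sum_i [ y_i*log(p_i) + (1-y_i)*log(1-p_i) ],
    with y_i = 1 for positive-class samples and y_i = 0 for negative-class samples."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def sgd_step(w, grad_w, alpha=0.01):
    """Step 2f: w_k' = w_k - alpha * dL/dw_k, with learning rate alpha in (0, 1)."""
    return w - alpha * grad_w

# toy mini-batch of 2**3 = 8 samples (the method draws 2^n samples with 3 <= n <= 7)
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p = np.array([0.9, 0.2, 0.7, 0.6, 0.3, 0.1, 0.8, 0.4])
print(average_log_loss(y, p))   # approximately 0.30
```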
Step 3, extracting the template network model and the detection network model.
In the trained double-residual depth classification network model obtained in step 2, the network layers after the 49th layer of each of the two deep networks are deleted, and the remaining parts become new networks.
The template network model and the detection network model are then taken from the remaining networks according to their input layer parameters: the remaining network whose input layer is 224×224×3 is used as the template network model, and the remaining network whose input layer is 448×448×3 is used as the detection network model.
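The truncation of step 3 amounts to keeping only the convolutional part of each ResNet50. A minimal PyTorch sketch follows; `weights=None` is used purely for illustration, whereas in the method the weights would be those obtained from the training in step 2.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def truncate_resnet50():
    """Drop every layer after the last residual stage (the global pooling and the
    fully connected back-end), keeping only the convolutional feature extractor."""
    backbone = resnet50(weights=None)   # illustration only; load the step-2 weights in practice
    return nn.Sequential(*list(backbone.children())[:-2]).eval()

template_net = truncate_resnet50()   # 224x224x3 input -> 2048 feature maps of 7x7
detect_net = truncate_resnet50()     # 448x448x3 input -> 2048 feature maps of 14x14

with torch.no_grad():
    z = template_net(torch.zeros(1, 3, 224, 224))   # shape (1, 2048, 7, 7)
    x = detect_net(torch.zeros(1, 3, 448, 448))     # shape (1, 2048, 14, 14)
```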
Step 4, extracting the template feature map with the template network.
(4a) Inputting the first frame image of a video image sequence containing the target to be tracked, and taking a rectangular frame, centered at the initial position of the target to be tracked, whose size equals the length and width of the target;
(4b) cutting the target image out of the rectangular frame and resizing it to 224×224×3 pixels to obtain the template image;
(4c) inputting the template image into the template network obtained in the step 3 to perform feature extraction on the template image, forming feature maps by using the extracted features, outputting 2048 feature maps of 7 × 7 in the last layer of the template network, and taking the 2048 feature maps of 7 × 7 as the template feature maps.
Step 5, extracting the detection feature map with the detection network.
(5a) Inputting an image to be detected containing the target to be tracked, and taking a rectangular frame, centered at the initial position of the target to be tracked, whose size is twice the length and width of the target.
(5b) Cutting the target image out of the rectangular frame and resizing it to 448×448×3 pixels to obtain the detection image;
(5c) inputting the detection image into the detection network obtained in the step 3 to perform feature extraction on the detection image, forming a feature map by using the extracted features, outputting 2048 feature maps of 14 × 14 in the last layer of the detection network, and taking the 2048 feature maps of 14 × 14 as the detection feature map.
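Steps 4a-4b and 5a-5b differ only in the scale of the cropped rectangle (1× versus 2× the target size) and the output resolution (224 versus 448). A possible OpenCV helper is sketched below; clamping the rectangle at the image border is an assumption, since the method does not specify how boundary cases are handled.

```python
import cv2

def crop_and_resize(frame, center, target_w, target_h, scale, out_size):
    """Cut a (scale*target_w) x (scale*target_h) rectangle centred on `center`
    out of `frame` and resize it to out_size x out_size (3 channels)."""
    cx, cy = center
    w, h = scale * target_w, scale * target_h
    x1 = int(max(cx - w / 2, 0))
    y1 = int(max(cy - h / 2, 0))
    x2 = int(min(cx + w / 2, frame.shape[1]))
    y2 = int(min(cy + h / 2, frame.shape[0]))
    return cv2.resize(frame[y1:y2, x1:x2], (out_size, out_size))

# template image of step 4b and detection image of step 5b:
# template_img = crop_and_resize(first_frame, (cx, cy), w0, h0, scale=1, out_size=224)
# detect_img   = crop_and_resize(current_frame, (cx, cy), w0, h0, scale=2, out_size=448)
```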
Step 6, template matching.
(6a) The 2048 template feature maps obtained in the step 4 and 2048 detection feature maps obtained in the step 5 are in one-to-one correspondence to form 2048 template detection feature pairs;
(6b) in each pair of template detection feature pairs, taking the upper left corner of a detection feature map as a starting point and 1 pixel as a step length, sequentially translating the corresponding template feature maps to the upper right corner, the lower right corner and the lower left corner of the detection feature map, and finally translating the template feature maps back to the upper left corner for convolution operation to obtain 2048 template matching maps of 14 multiplied by 14;
(6c) putting the 14 × 14 pixel points of the 2048 template matching maps into one-to-one correspondence, and summing the matching values at the corresponding points to obtain a 14 × 14 feature response map.
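Step 6 is a channel-wise (depthwise) sliding-window correlation followed by a sum over channels. A possible PyTorch sketch is given below; `padding=3` is an assumption made so that the output keeps the 14 × 14 size stated above, and the tensors are assumed to come from the step-3 feature-extraction sketch.

```python
import torch
import torch.nn.functional as F

def match_template(template_feats, detect_feats):
    """template_feats: (2048, 7, 7) template feature maps,
    detect_feats:   (2048, 14, 14) detection feature maps.
    Each 7x7 template map is slid with stride 1 over its corresponding 14x14
    detection map, and the 2048 per-channel matching maps are then summed
    point by point into a single 14x14 feature response map."""
    c = template_feats.shape[0]
    maps = F.conv2d(detect_feats.unsqueeze(0),    # (1, 2048, 14, 14)
                    template_feats.unsqueeze(1),  # (2048, 1, 7, 7), one kernel per channel
                    groups=c, padding=3)          # (1, 2048, 14, 14) template matching maps
    return maps.sum(dim=1).squeeze(0)             # (14, 14) feature response map

# response = match_template(z.squeeze(0), x.squeeze(0))   # z, x from the step-3 sketch
```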
Step 7, determining the position of the target.
(7a) Sorting the response values in the 14 × 14 feature response map obtained in step 6 in descending order, selecting the normalized coordinates corresponding to the first 10 response values, and averaging these 10 normalized coordinates to obtain the average normalized coordinate value (x, y);
(7b) calculating the position of the tracking target in the video frame image according to the average normalized coordinate value by the following formula:
x′=x×m+a-w,
y′=y×n+b-h,
wherein x 'represents the abscissa value of the first pixel at the upper left corner of the target image in the video frame, a represents the abscissa value of the initial position of the target to be tracked, w represents the width of the template image, m represents the width of the detection image, y' represents the ordinate value of the first pixel at the upper left corner of the target image in the video frame, b represents the ordinate value of the initial position of the target to be tracked, h represents the height of the template image, and n represents the height of the detection image.
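The localisation of step 7 can then be written as follows; interpreting the "normalized coordinate" of a response value as its column/row index divided by the width/height of the response map is an assumption of this sketch.

```python
import torch

def locate_target(response, a, b, w, h, m, n, topk=10):
    """Step 7: average the normalised coordinates of the 10 largest responses,
    then map them to frame coordinates with
        x' = x*m + a - w,   y' = y*n + b - h,
    where (a, b) is the initial target position, (w, h) the template size and
    (m, n) the detection image size."""
    H, W = response.shape
    _, idx = torch.topk(response.flatten(), topk)                 # indices of the 10 largest values
    rows = torch.div(idx, W, rounding_mode='floor').float() / H   # normalised ordinates y
    cols = (idx % W).float() / W                                  # normalised abscissas x
    x, y = cols.mean().item(), rows.mean().item()
    return x * m + a - w, y * n + b - h
```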
Step 8, updating the template.
In the detection feature map, taking the position of the tracking target in the video frame image obtained in the step 7 as the center, and taking the initial size of the tracking target as the size to perform a cutting operation, so as to obtain a tracking target feature map, and according to the tracking target feature map, updating the template feature map:
Z = ηZ1 + (1-η)Z2,
wherein Z represents the updated template feature map, Z1 represents the template feature map in the previous frame of image, η represents the learning rate of template updating with η ≤ 1, and Z2 represents the feature map of the tracked target in the current video frame.
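A possible sketch of the cutting and blending of step 8 is shown below. Taking the tracking-target window as a 7 × 7 crop of the 14 × 14 detection feature map (the target occupies roughly half of the detection region), clamping the window at the border, and η = 0.9 are all assumptions of this sketch.

```python
def update_template(Z1, X, x_norm, y_norm, eta=0.9, size=7):
    """Z1: previous template feature map (2048, 7, 7); X: detection feature map (2048, 14, 14).
    Cut a size x size window out of X, centred on the matched position given by the
    normalised coordinates, and blend it into the template: Z = eta*Z1 + (1-eta)*Z2."""
    C, H, W = X.shape
    r = min(max(int(round(y_norm * H)) - size // 2, 0), H - size)
    c = min(max(int(round(x_norm * W)) - size // 2, 0), W - size)
    Z2 = X[:, r:r + size, c:c + size]     # tracking target feature map of step 8
    return eta * Z1 + (1 - eta) * Z2      # updated template feature map Z
```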
Step 9, judging whether the current frame video image is the last frame of the video image sequence to be tracked: if so, ending the tracking of the moving target to be tracked; otherwise, taking the template feature map updated in step 8 as the template feature map of the target in the next frame and returning to step 5. Target tracking is thereby completed.
The effect of the present invention will be further explained with the simulation experiment.
1. Simulation experiment conditions are as follows:
the hardware test platform of the simulation experiment of the invention is as follows: the CPU is intel Core i5-6500, the main frequency is 3.2GHz, the memory is 8GB, and the GPU is NVIDIA TITAN Xp; the software platform is as follows: ubuntu 16.04 LTS, 64-bit operating system, python 3.6.5.
2. Simulation content and results:
A moving target tracking simulation experiment was carried out with the method of the invention on a video image sequence collected from the Object Tracking Benchmark 2015 database; the sequence shows a man walking along a road and contains 252 frames of video images in total. The results of the simulation experiment are shown in FIG. 2, wherein:
fig. 2(a) is a 1 st frame image of a video image sequence acquired by a simulation experiment, and a white-line rectangular frame in fig. 2(a) represents an initial position of a target to be tracked.
Fig. 2(b) is a tracking result of a video image when appearance deformation and target occlusion occur in a frame of target to be tracked for target tracking of an acquired video image sequence in a simulation experiment of the present invention, wherein a gray line rectangular frame marks a predicted position of the target to be tracked, a white line rectangular frame marks a real position of the target to be tracked, and as can be seen from the diagram, the target to be tracked has appearance deformation and target occlusion compared with the target to be tracked in fig. 2 (a).
Fig. 2(c) is a tracking result of a frame of target to be tracked when appearance deformation and illumination change occur to a target to be tracked which performs target tracking on an acquired video image sequence in a simulation experiment of the present invention, wherein a gray line rectangular frame marks a predicted position of the target to be tracked, and a white line rectangular frame marks a position of the target to be tracked. As can be seen from the figure, the target to be tracked has appearance deformation and illumination enhancement compared with the target to be tracked in fig. 2 (a).
As can be seen from fig. 2(b) and 2(c), the target framed by the rectangular frame with gray lines in the figure is consistent with the target framed by the rectangular frame with white lines, which shows that the present invention can accurately and efficiently track the target when the target to be tracked in the video image is deformed, changed in illumination, and shielded.

Claims (5)

1. A moving target tracking method based on template matching and a depth classification network is characterized by comprising the following steps:
(1) building a double residual depth classification network model:
(1a) two depth residual error neural networks ResNet50 are used as front-end networks of a double-residual error depth classification network model, the parameters of input layers of the two depth residual error neural networks are different, and the parameters of other layers are the same;
(1b) two 3-layer fully-connected networks are set up to serve as a back-end network of a double-residual-error depth classification network model, the first layer of each fully-connected network is an input layer, the second layer is a hidden layer, the third layer is an output layer, the parameters of the first layer of each fully-connected network are different, and the parameters of the second layer and the third layer of each fully-connected network are the same;
(2) inputting the ImageNet classification data set into the double-residual depth classification network model, and updating the weight of each node in the model by the stochastic gradient descent method to obtain the trained double-residual depth classification network model;
(3) deleting all layers behind the penultimate hidden layer of the depth residual error network ResNet50 in the trained double-residual error depth classification network model to obtain a template network model and a detection network model;
(4) extracting a template feature map by using a template network:
(4a) inputting the first frame image of a video image sequence containing the target to be tracked, and taking a rectangular frame, centered at the initial position of the target to be tracked, whose size equals the length and width of the target;
(4b) cutting the target image out of the rectangular frame and resizing it to 224×224×3 pixels to obtain a template image;
(4c) inputting a template image into a template network, extracting the features of the image, forming an image feature map by all the features, and outputting 2048 template feature maps of 7 multiplied by 7 on the last layer of the template network;
(5) extracting a detection feature map by using a detection network:
(5a) inputting an image to be detected containing the target to be tracked, and taking a rectangular frame, centered at the initial position of the target to be tracked, whose size is twice the length and width of the target;
(5b) cutting the target image out of the rectangular frame and resizing it to 448×448×3 pixels to obtain a detection image;
(5c) inputting a detection image into a detection network, extracting the characteristics of the image, forming an image characteristic diagram by all the characteristics, and outputting 2048 detection characteristic diagrams of 14 multiplied by 14 at the last layer of the detection network;
(6) template matching:
(6a) 2048 template feature maps and 2048 detection feature maps are in one-to-one correspondence to form 2048 template detection feature pairs;
(6b) in each pair of template detection feature pairs, carrying out sliding frame type convolution on a 7 × 7 template feature map on a 14 × 14 detection feature map to obtain 2048 14 × 14 template matching maps;
(6c) putting the 14 × 14 pixel points of the 2048 template matching maps into one-to-one correspondence, and summing the matching values at the corresponding points to obtain a 14 × 14 feature response map;
(7) determining the position of the target:
(7a) sorting the response values in the 14 × 14 feature response map in descending order, selecting the normalized coordinates corresponding to the first 10 response values, and computing the average normalized coordinate value;
(7b) calculating the position of the tracking target in the video frame image according to the average normalized coordinate value through the following formula;
x′=x×m+a-w,y′=y×n+b-h,
wherein x 'represents the abscissa value of the first pixel at the upper left corner of a target image in a video frame, x represents the average normalized abscissa, a represents the abscissa value of the initial position of the target to be tracked, w represents the width of a template image, m represents the width of a detection image, y' represents the ordinate value of the first pixel at the upper left corner of the target image in the video frame, y represents the average normalized ordinate, b represents the ordinate value of the initial position of the target to be tracked, h represents the height of the template image, and n represents the height of the detection image;
(8) extracting a tracking target feature map according to the position of the tracking target in the video frame image, and updating the template feature map according to the tracking target feature map: Z = ηZ1 + (1-η)Z2, where Z represents the updated template feature map, Z1 represents the template feature map in the previous frame of image, η represents the learning rate of template updating with η ≤ 1, and Z2 represents the tracking target feature map in the current video frame;
(9) judging whether the current frame video image is the last frame of the video image sequence to be tracked: if so, ending the tracking of the moving target to be tracked; otherwise, taking the updated template feature map as the template feature map of the target in the next frame and returning to step (5). Target tracking is thereby completed.
2. The method of claim 1, wherein the number of neurons in the input layers of the two deep residual neural networks ResNet50 in (1a) is set to 224 × 224 × 3 and 448 × 448 × 3, respectively.
3. The method of claim 1, wherein the parameters of each of the two fully-connected networks in (1b) are set as follows:
the number of the first layer neurons is 1 × 1 × 2048 and 2 × 2 × 2048, respectively;
the number of the neurons of the second layer is 1024, and the activation function of the neurons is set as a modified linear unit ReLU function;
the number of neurons in the third layer is 1000, and the activation function of the neurons is set to be a Softmax function.
4. The method according to claim 1, wherein in (2) the stochastic gradient descent method is used to update the weight of each node in the double-residual depth classification network model, with the following specific steps:
(2a) randomly selecting a number in the range of (0,0.1), and using the number as an initial weight of each node in the double-residual depth classification network model;
(2b) taking the initial weight of each node as the current weight of each node in the double-residual depth classification network model in the first iteration process;
(2c) randomly selecting 2^n sample images from the ImageNet classification data set, where 3 ≤ n ≤ 7, and propagating them forward through the double-residual depth classification network model, whose output layer outputs the classification results of the 2^n sample images;
(2d) calculating the average logarithmic loss value of the classification result according to the classification result of the sample image and the following formula:
L = -(1/N) × Σ_{i=1}^{N} [ y_i × log(p_i) + (1 - y_i) × log(1 - p_i) ],
wherein L represents the average log loss value of the classification result, N represents the total number of randomly selected sample images, i represents the serial number of the input sample image, y_i represents the class of the ith input sample image (y_i takes the value 1 for a positive-class sample and 0 for a negative-class sample), and p_i represents the output value of the double-residual depth classification network model for the ith sample image in the classification result;
(2e) calculating the partial derivative of the average log loss value with respect to the current weight of each node in the double-residual depth classification network, obtaining the gradient value Δw_k of the current weight of each node in the model;
(2f) Calculating the updated weight of each node in the double-residual depth classification network model according to the gradient value of the current weight of the node:
w_k' = w_k - α × Δw_k,
wherein w_k' represents the updated weight of the kth node of the double-residual depth classification network model, w_k represents the current weight of the kth node, and α represents the learning rate, with value range (0, 1);
(2g) judging whether all the sample images in the training data set have been selected: if so, the trained double-residual depth classification network model is obtained; otherwise, the updated weight of each node is taken as its current weight and (2c) is executed again.
5. The method of claim 1, wherein the sliding frame convolution in (6b) is performed by using the top left corner of the detected feature map in each pair of template detected feature pairs as a starting point, using 1 pixel as a step size, sequentially translating the corresponding template feature map to the top right corner, the bottom right corner and the bottom left corner of the detected feature map, and finally translating back to the top left corner.
CN201910297980.9A 2019-04-15 2019-04-15 Moving target tracking method based on template matching and depth classification network Active CN110033473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910297980.9A CN110033473B (en) 2019-04-15 2019-04-15 Moving target tracking method based on template matching and depth classification network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910297980.9A CN110033473B (en) 2019-04-15 2019-04-15 Moving target tracking method based on template matching and depth classification network

Publications (2)

Publication Number Publication Date
CN110033473A CN110033473A (en) 2019-07-19
CN110033473B true CN110033473B (en) 2021-04-20

Family

ID=67238315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910297980.9A Active CN110033473B (en) 2019-04-15 2019-04-15 Moving target tracking method based on template matching and depth classification network

Country Status (1)

Country Link
CN (1) CN110033473B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647836B (en) * 2019-09-18 2022-09-20 中国科学院光电技术研究所 Robust single-target tracking method based on deep learning
CN110705479A (en) * 2019-09-30 2020-01-17 北京猎户星空科技有限公司 Model training method, target recognition method, device, equipment and medium
CN110766725B (en) * 2019-10-31 2022-10-04 北京市商汤科技开发有限公司 Template image updating method and device, target tracking method and device, electronic equipment and medium
CN110766724B (en) * 2019-10-31 2023-01-24 北京市商汤科技开发有限公司 Target tracking network training and tracking method and device, electronic equipment and medium
CN111145215B (en) * 2019-12-25 2023-09-05 北京迈格威科技有限公司 Target tracking method and device
CN110930428B (en) * 2020-02-19 2020-08-14 成都纵横大鹏无人机科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN111461010B (en) * 2020-04-01 2022-08-12 贵州电网有限责任公司 Power equipment identification efficiency optimization method based on template tracking
CN111640136B (en) * 2020-05-23 2022-02-25 西北工业大学 Depth target tracking method in complex environment
CN111815677A (en) * 2020-07-10 2020-10-23 中山大学新华学院 Target tracking method and device, terminal equipment and readable storage medium
CN112287906B (en) * 2020-12-18 2021-04-09 中汽创智科技有限公司 Template matching tracking method and system based on depth feature fusion
CN112818801B (en) * 2021-01-26 2024-04-26 每步科技(上海)有限公司 Motion counting method, recognition device, recognition system and storage medium
CN115100441B (en) * 2022-08-23 2022-11-18 浙江大华技术股份有限公司 Object detection method, electronic device, and storage medium
CN116596958B (en) * 2023-07-18 2023-10-10 四川迪晟新达类脑智能技术有限公司 Target tracking method and device based on online sample augmentation


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3576987B2 (en) * 2001-03-06 2004-10-13 株式会社東芝 Image template matching method and image processing apparatus

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6621929B1 (en) * 1999-06-22 2003-09-16 Siemens Corporate Research, Inc. Method for matching images using spatially-varying illumination change models
WO2007126780A2 (en) * 2006-03-28 2007-11-08 Object Video, Inc. Automatic extraction of secondary video streams
CN101867699A (en) * 2010-05-25 2010-10-20 中国科学技术大学 Real-time tracking method of nonspecific target based on partitioning
CN103150572A (en) * 2012-12-11 2013-06-12 中国科学院深圳先进技术研究院 On-line type visual tracking method
CN105719292A (en) * 2016-01-20 2016-06-29 华东师范大学 Method of realizing video target tracking by adopting two-layer cascading Boosting classification algorithm
CN105787963A (en) * 2016-02-26 2016-07-20 浪潮软件股份有限公司 Video target tracking method and device
CN107689052A (en) * 2017-07-11 2018-02-13 西安电子科技大学 Visual target tracking method based on multi-model fusion and structuring depth characteristic
CN108694723A (en) * 2018-05-11 2018-10-23 西安天和防务技术股份有限公司 A kind of target in complex environment tenacious tracking method
CN109345559A (en) * 2018-08-30 2019-02-15 西安电子科技大学 Expand the motion target tracking method with depth sorting network based on sample

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tracking with spatial constrained coding; Tian, Xiaolin et al.; IET Computer Vision; 2015-02-28; Vol. 9, No. 1; pp. 63-74 *

Also Published As

Publication number Publication date
CN110033473A (en) 2019-07-19

Similar Documents

Publication Publication Date Title
CN110033473B (en) Moving target tracking method based on template matching and depth classification network
CN110210463B (en) Precise ROI-fast R-CNN-based radar target image detection method
CN111259930B (en) General target detection method of self-adaptive attention guidance mechanism
Yang et al. Real-time face detection based on YOLO
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108154118B (en) A kind of target detection system and method based on adaptive combined filter and multistage detection
CN110084292B (en) Target detection method based on DenseNet and multi-scale feature fusion
CN110119728A (en) Remote sensing images cloud detection method of optic based on Multiscale Fusion semantic segmentation network
CN107977683B (en) Joint SAR target recognition method based on convolution feature extraction and machine learning
CN111862119A (en) Semantic information extraction method based on Mask-RCNN
CN109377511B (en) Moving target tracking method based on sample combination and depth detection network
CN111833322B (en) Garbage multi-target detection method based on improved YOLOv3
CN112613350A (en) High-resolution optical remote sensing image airplane target detection method based on deep neural network
CN109345559B (en) Moving target tracking method based on sample expansion and depth classification network
CN110728694A (en) Long-term visual target tracking method based on continuous learning
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN104599291B (en) Infrared motion target detection method based on structural similarity and significance analysis
CN110503090B (en) Character detection network training method based on limited attention model, character detection method and character detector
CN115393631A (en) Hyperspectral image classification method based on Bayesian layer graph convolution neural network
CN117315556A (en) Improved Vision Transformer insect fine grain identification method
CN114743045B (en) Small sample target detection method based on double-branch area suggestion network
CN113627245B (en) CRTS target detection method
CN113450321B (en) Single-stage target detection method based on edge detection
CN115223080A (en) Target segmentation method based on non-local feature aggregation neural network
CN111914751B (en) Image crowd density identification detection method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant