CN109345559B - Moving target tracking method based on sample expansion and depth classification network - Google Patents

Moving target tracking method based on sample expansion and depth classification network

Info

Publication number
CN109345559B
Authority
CN
China
Prior art keywords
target
rectangular
tracked
frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811005680.0A
Other languages
Chinese (zh)
Other versions
CN109345559A (en)
Inventor
田小林
荀亮
李芳
李帅
焦李成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201811005680.0A priority Critical patent/CN109345559B/en
Publication of CN109345559A publication Critical patent/CN109345559A/en
Application granted granted Critical
Publication of CN109345559B publication Critical patent/CN109345559B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a moving target tracking method based on sample expansion and a deep classification network. The steps of the invention are: (1) build a deep classification network model; (2) generate a positive sample set; (3) generate a negative sample set; (4) train the deep classification network model; (5) extract a target feature network model; (6) predict the target position in the next frame image; (7) judge whether the current frame image is the last frame of the video sequence to be tracked; if so, execute step (8), otherwise execute step (6); (8) finish tracking the moving target to be tracked. The invention trains the deep classification network with the expanded sample sets and determines the position of the target to be tracked from the feature response values, thereby solving the problem of inaccurate tracking caused by appearance deformation and occlusion of the target.

Description

Moving target tracking method based on sample expansion and depth classification network
Technical Field
The invention belongs to the technical field of image processing, and further relates to a moving target tracking method based on sample expansion and a deep classification network in the field of computer vision. The method can be used for tracking moving targets in video sequences acquired in complex scenes such as video surveillance, robot navigation, and unmanned aerial vehicles.
Background
The main task of moving object tracking is to detect a moving object in a continuous sequence of video images and then determine its position in each frame. With the deepening understanding of computer vision, moving target tracking has been widely applied and developed in this field, and deep learning methods are gradually being applied to target tracking. Compared with the hand-crafted feature extraction of traditional tracking methods, which relies heavily on the designer's prior knowledge, deep learning can exploit big data: given enough training data, a neural network learns features automatically, and the features obtained in this way are far better than hand-crafted ones. However, when deep learning is applied to target tracking, the main problem is the lack of training data: one of the advantages of deep models comes from efficient learning over large amounts of annotated training data, while target tracking only provides the bounding box of the first frame as training data.
The patent document "A feature extraction and target tracking method based on a convolutional neural network" (patent application No. 201711262806.8, publication No. 105678338A) filed by Sun Yat-sen University discloses a method for tracking a moving target with a deep convolutional network. The specific steps of the method are: (1) construct and pre-train a network model; (2) input the first frame of the video image into the reconstructed network for repeated iteration according to the video sequence, and train the network model online; (3) input the video sequence and compute the tracking result; (4) evaluate the tracking result of the last frame in the video sequence, and select positive sample results to feed back into the network for iteration so as to update the network parameters. The disadvantage of this method is that, when the pre-trained network model is trained online on the first frame of the video, the image is fed into the reconstructed network for repeated iteration, so overfitting occurs easily; when the target deforms to a large extent in subsequent video frames, drift occurs easily and long-term accurate tracking cannot be achieved.
A method of tracking a moving target with local feature learning is disclosed in the patent document "Target tracking method based on local feature learning" (patent application No. 201610024953.0, publication No. 108038435A) filed by South China Agricultural University. The specific steps of the method are: (1) decompose the target area and the background area into a large number of local area units, and train and construct an appearance model by deep learning; (2) compute the confidence that each local area of the next frame image belongs to the target, obtaining a confidence map for locating the target; (3) set thresholds T_pos and T_neg, add local areas whose confidence is greater than T_pos to the target sample set, add local areas whose confidence is less than T_neg to the background sample set, and update the appearance model. The disadvantage of this method is that, because the sample type of each local area of the image is decided by thresholds, when the target to be tracked is occluded to a large extent, target samples or background samples may be wrongly assigned, so the updated model cannot keep tracking the target accurately.
Disclosure of Invention
The object of the invention is to provide, in view of the shortcomings of the prior art, a moving target tracking method based on sample expansion and a deep classification network, so that a target can be tracked accurately and effectively when it deforms, changes scale, or is occluded.
The idea for achieving this object is as follows. First, to address the problem of insufficient training samples, a sample expansion method is used to generate positive and negative sample sets containing a large number of images. Second, the deep residual network ResNet50 is extended to obtain a deep classification network model, from which a target feature network model is further extracted. Finally, the image sequence cropped from the candidate region is input into the target feature network model, and the specific position of the target to be tracked is obtained from the feature response values.
The method comprises the following specific steps:
(1) constructing a deep classification network model:
(1a) building a 3-layer fully-connected network, wherein the first layer of the network is an input layer, the second layer is a hidden layer, and the third layer is an output layer;
(1b) the parameters of each layer in the fully connected network are set as follows: the number of neurons in the first layer is set to 1024 and its activation function is set to the rectified linear unit (ReLU) function; the number of neurons in the second layer is set to 2; the number of neurons in the third layer is set to 2 and its activation function is set to the Sigmoid function;
(1c) taking the output of the deep residual network ResNet50 as the input of the fully connected network to obtain a deep classification network model;
(2) generating a positive sample set:
(2a) inputting the first frame image of a video image sequence containing the target to be tracked, and determining a rectangular frame whose center is the center of the initial position of the target to be tracked and whose length and width are the length and width of the target to be tracked;
(2b) cropping 3000 rectangular target images of the same size from the rectangular frame to form a positive sample set;
(2c) randomly selecting one rectangular target image from the positive sample set, evenly dividing it into 3 equal parts along each side to obtain 3×3 small rectangular images (discarding any remainder smaller than an equal part), and randomly recombining and splicing the 9 small rectangular images into 4000 recombined images of the same size as the rectangular target image to form a recombined image set;
(2d) in a first frame of video image, acquiring a scale change image set in a single-pixel stepping mode;
(2e) adding the recombined image set and the scale change image set into a positive sample set to form an expanded positive sample set;
(3) generating a negative sample set:
(3a) in the first frame image of the video, determining 5 large rectangular frames centered on the initial position of the target to be tracked, the length and width of each large rectangular frame being 1.5, 1.6, 1.7, 1.8 and 1.9 times the length and width of the target to be tracked, respectively;
(3b) selecting a rectangular sliding frame with the same size as the target to be tracked;
(3c) sliding the rectangular sliding frame in each large rectangular frame, intercepting images in the rectangular sliding frame after each translation, and forming an expanded negative sample set by all the intercepted images;
(4) training a deep classification network model:
(4a) inputting the extended positive sample set and the extended negative sample set into a deep classification network model;
(4b) updating the weight of each node in the deep classification network model with the stochastic gradient descent method to obtain a trained deep classification network model;
(5) extracting a target feature network model:
deleting the output layer of the fully connected network in the trained deep classification network model, taking the hidden layer of the fully connected network as the output layer of the target feature network, and extracting the target feature network model;
(6) predicting the target position of the current frame image:
(6a) loading the next frame image in the video sequence to be tracked as the current frame image; taking the position of the target to be tracked in the previously loaded frame image as the center, establishing a rectangular area in the current frame image with 1.5 times the length and width of the target to be tracked, and taking this rectangular area as the candidate region of the target to be tracked;
(6b) intercepting rectangular images in a candidate region of a target to be tracked in a sliding mode with step change, and forming a candidate image sequence by all the intercepted rectangular images;
(6c) inputting the candidate image sequence into the target feature network model, obtaining the feature response value sequence corresponding to the candidate images, and selecting the maximum feature response value from the feature response value sequence;
(6d) in the current frame image, taking the position of the candidate image corresponding to the maximum feature response value as the position of the tracked target;
(7) judging whether the current frame video image is the last frame video image of the video image sequence to be tracked, if so, executing the step (8), otherwise, executing the step (6);
(8) and finishing the tracking of the moving target to be tracked.
Compared with the prior art, the invention has the following advantages:
firstly, because the invention generates expanded positive and negative sample sets, it overcomes the problems in the prior art that training the network by repeatedly iterating on the first frame image easily causes overfitting and that tracking becomes inaccurate when the target to be tracked deforms to a large extent; the invention can therefore track the target more accurately when the target to be tracked deforms to a large extent.
Secondly, the invention constructs a target feature network model and judges the position of the target to be tracked from the target feature response values, thereby overcoming the problem in the prior art that positive and negative samples are easily misassigned when the target to be tracked is occluded to a large extent, so that the updated model cannot keep tracking the target accurately; the invention can therefore track the target more accurately when the target to be tracked is occluded to a large extent.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a simulation of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The specific steps of the present invention are further described with reference to fig. 1.
Step 1, constructing a deep classification network model.
Construct a 3-layer fully connected network, wherein the first layer of the network is an input layer, the second layer is a hidden layer, and the third layer is an output layer.
The parameters of each layer in the fully connected network are set as follows: the number of neurons in the first layer is set to 1024 and its activation function is set to the rectified linear unit (ReLU) function; the number of neurons in the second layer is set to 2; the number of neurons in the third layer is set to 2 and its activation function is set to the Sigmoid function.
Take the output of the deep residual network ResNet50 as the input of the fully connected network to obtain the deep classification network model.
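As a purely illustrative sketch (not part of the claimed method), the construction above can be expressed in PyTorch; the use of PyTorch/torchvision, the 2048-dimensional pooled ResNet50 output, and the class name DeepClassificationNet are assumptions of this sketch rather than details taken from the patent:

```python
import torch
import torch.nn as nn
from torchvision import models

class DeepClassificationNet(nn.Module):
    """ResNet50 backbone followed by the 3-layer fully connected network of step 1
    (layer sizes and activations taken from the text; the framework is an assumption)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)   # deep residual network ResNet50
        backbone.fc = nn.Identity()                # keep the 2048-d pooled features
        self.backbone = backbone
        self.fc = nn.Sequential(
            nn.Linear(2048, 1024), nn.ReLU(),      # first layer: 1024 neurons, ReLU
            nn.Linear(1024, 2),                    # second (hidden) layer: 2 neurons
            nn.Linear(2, 2), nn.Sigmoid(),         # third (output) layer: 2 neurons, Sigmoid
        )

    def forward(self, x):
        return self.fc(self.backbone(x))
```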
Step 2, generating a positive sample set.
Input the first frame image of a video image sequence containing the target to be tracked, and determine a rectangular frame whose center is the center of the initial position of the target to be tracked and whose length and width are the length and width of the target to be tracked.
Crop 3000 rectangular target images of the same size from the rectangular frame to form a positive sample set.
Randomly select one rectangular target image from the positive sample set, evenly divide it into 3 equal parts along each side to obtain 3×3 small rectangular images (discarding any remainder smaller than an equal part), and randomly recombine and splice the 9 small rectangular images into 4000 recombined images of the same size as the rectangular target image, forming a recombined image set.
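A minimal sketch of this split-and-recombine augmentation is given below; the use of NumPy and a plain random permutation of the 9 tiles as the "random combination and splicing" are assumptions of this illustration:

```python
import numpy as np

def recombine_images(target_img, num_images=4000, grid=3, rng=None):
    """Split a rectangular target image into grid x grid tiles (discarding any remainder
    that does not fill a full tile) and splice randomly permuted tiles into new images."""
    rng = rng or np.random.default_rng()
    h, w = target_img.shape[:2]
    th, tw = h // grid, w // grid                       # tile size; remainder is discarded
    tiles = [target_img[r*th:(r+1)*th, c*tw:(c+1)*tw]
             for r in range(grid) for c in range(grid)]
    recombined = []
    for _ in range(num_images):
        order = rng.permutation(len(tiles))             # random arrangement of the 9 tiles
        rows = [np.concatenate([tiles[order[r*grid + c]] for c in range(grid)], axis=1)
                for r in range(grid)]
        recombined.append(np.concatenate(rows, axis=0))
    return recombined
```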
In a first frame image of the video, a scale change image set is obtained in a single-pixel stepping mode.
The specific steps of the single-pixel stepping mode are as follows:
step 1, forming a small rectangular frame by the center of the initial position of the target to be tracked and the length and the width which are 0.5 times of the initial position.
And step 2, keeping the center of the small rectangular frame unchanged, increasing the width of the small rectangular frame by 1 pixel, and taking the product of the aspect ratio of the small rectangular frame and the increased width as the length to form a temporary rectangular frame.
And 3, judging whether the width of the temporary rectangular frame is more than or equal to 3 times of the width of the small rectangular frame, if so, executing the fourth step, otherwise, intercepting and storing the rectangular image in the temporary rectangular frame and then executing the second step.
And 4, forming all the intercepted images into a scale change image set.
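The single-pixel stepping procedure above could be sketched as follows; the coordinate convention, the simplified boundary handling and the helper name scale_change_set are assumptions, while the 0.5x starting size, 1-pixel width increment, fixed aspect ratio and 3x stopping condition follow the text:

```python
def scale_change_set(frame, cx, cy, target_w, target_h):
    """Build the scale change image set: grow a frame around the target center one
    pixel of width at a time, keeping the aspect ratio of the initial small frame."""
    aspect = target_h / target_w            # length/width ratio (shared by the small frame)
    small_w = 0.5 * target_w                # small frame: 0.5 times the target size
    w, crops = small_w, []
    while True:
        w += 1                              # step 2: increase the width by 1 pixel
        h = aspect * w                      # length = aspect ratio x increased width
        if w >= 3 * small_w:                # step 3: stop once the width reaches 3x the small frame
            break
        x0, y0 = int(cx - w / 2), int(cy - h / 2)
        crops.append(frame[max(0, y0):int(y0 + h), max(0, x0):int(x0 + w)])
    return crops
```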
And adding the recombined image set and the scale change image set into the positive sample set to form an expanded positive sample set.
Step 3, generating a negative sample set.
In the first frame image of the video, determine 5 large rectangular frames centered on the initial position of the target to be tracked, the length and width of each large rectangular frame being 1.5, 1.6, 1.7, 1.8 and 1.9 times the length and width of the target to be tracked, respectively.
Select a rectangular sliding frame of the same size as the target to be tracked.
Slide the rectangular sliding frame within each large rectangular frame, crop the image inside the rectangular sliding frame after each translation, and form an expanded negative sample set from all the cropped images.
The rectangular sliding frame slides within each large rectangular frame as follows: starting from the upper left corner of each large rectangular frame, with a step of 1 pixel, translate the rectangular sliding frame in turn to the upper right corner, the lower right corner and the lower left corner of the large rectangular frame, and finally translate it back to the upper left corner.
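The border-following sweep just described can be sketched as below; the corner-to-corner path with a 1-pixel step and the 1.5-1.9x scales follow the text, while the integer rounding, the simplified boundary handling and the helper name are assumptions of this illustration:

```python
def negative_sample_set(frame, cx, cy, tw, th, scales=(1.5, 1.6, 1.7, 1.8, 1.9)):
    """Crop target-sized boxes while the sliding frame follows the border of each
    enlarged rectangle centered on the target's initial position (1-pixel steps)."""
    tw, th = int(tw), int(th)
    negatives = []
    for s in scales:
        bw, bh = int(s * tw), int(s * th)             # large rectangle, s times the target size
        x0, y0 = int(cx - bw / 2), int(cy - bh / 2)   # its upper-left corner
        # upper-left -> upper-right -> lower-right -> lower-left -> back to upper-left
        path = ([(x0 + d, y0) for d in range(bw)] +
                [(x0 + bw, y0 + d) for d in range(bh)] +
                [(x0 + bw - d, y0 + bh) for d in range(bw)] +
                [(x0, y0 + bh - d) for d in range(bh)])
        for x, y in path:
            if x < 0 or y < 0:                        # simplified boundary handling
                continue
            crop = frame[y:y + th, x:x + tw]
            if crop.shape[0] == th and crop.shape[1] == tw:
                negatives.append(crop)
    return negatives
```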
Step 4, training the deep classification network model.
The augmented positive sample set and the augmented negative sample set are input to a deep classification network model.
Update the weight of each node in the deep classification network model with the stochastic gradient descent method to obtain the trained deep classification network model.
The specific steps of the stochastic gradient descent method are as follows:
and step 1, randomly selecting a number in the range of (0,0.1), and using the number as an initial weight of each node in the deep classification network model.
And 2, taking the initial weight of each node as the current weight of each node in the deep classification network model in the first iteration process.
Step 3, randomly selecting 2 from the positive and negative sample setsnThe sample images are propagated in the depth classification network model in the forward direction, wherein n is more than or equal to 3 and less than or equal to 7, and the output layer of the depth classification network model outputs 2nAnd (5) classifying the sample images.
Step 4: calculate the average logarithmic loss value of the classification results according to the following formula:
L = -(1/N) Σ_{i=1}^{N} [ y_i·log(p_i) + (1 - y_i)·log(1 - p_i) ]
where L represents the average logarithmic loss value of the classification results, N represents the total number of randomly selected sample images, Σ represents the summation operation, i represents the index of an input sample image, y_i represents the class of the ith input sample image (y_i takes the value 1 for a positive-class sample and 0 for a negative-class sample), log denotes the base-10 logarithm operation, and p_i represents the output value of the deep classification network model for the ith sample image in the classification results.
Step 5: take the partial derivative of the average logarithmic loss value with respect to the current weight of each node in the deep classification network to obtain the gradient value of the current weight of each node in the deep classification network model.
Step 6: calculate the updated weight of each node in the deep classification network model according to the following formula:
ŵ_k = w_k - α·Δw_k
where ŵ_k represents the updated weight of the kth node of the deep classification network model, w_k represents the current weight of the kth node of the deep classification network model, α represents the learning rate, whose value range is (0, 1), and Δw_k represents the gradient value of the current weight of the kth node in the deep classification network model.
Step 7: judge whether all sample images in the training sample set have been selected; if so, the trained deep classification network model is obtained, otherwise take the updated weight of each node as the current weight and execute step 3.
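Putting the stochastic gradient descent steps above together, a training loop might look like the following sketch (assuming the hypothetical DeepClassificationNet from the earlier sketch). The (0, 0.1) uniform initialization, mini-batches of 2^n images, the base-10 average log loss and the plain w_k - α·Δw_k update follow the text, while using the first Sigmoid output as the positive-class probability and initializing every parameter (including the ResNet50 backbone) are assumptions:

```python
import torch

def train_deep_classification_net(model, samples, labels, n=5, lr=0.01, epochs=1):
    """Train the deep classification network with plain stochastic gradient descent.
    `samples` is a tensor of images, `labels` holds 1 for positive and 0 for negative
    samples; the mini-batch size is 2**n with 3 <= n <= 7 and lr lies in (0, 1)."""
    with torch.no_grad():
        for p in model.parameters():              # taken literally from the text: every node
            p.uniform_(0.0, 0.1)                   # weight starts as a random number in (0, 0.1)
    batch = 2 ** n
    for _ in range(epochs):
        perm = torch.randperm(len(samples))        # randomly select sample images
        for start in range(0, len(samples), batch):
            idx = perm[start:start + batch]
            x, y = samples[idx], labels[idx].float()
            p = model(x)[:, 0].clamp(1e-7, 1 - 1e-7)   # assumed positive-class Sigmoid output
            # average logarithmic (base-10) loss over the mini-batch
            loss = -(y * torch.log10(p) + (1 - y) * torch.log10(1 - p)).mean()
            model.zero_grad()
            loss.backward()
            with torch.no_grad():
                for w in model.parameters():
                    if w.grad is not None:
                        w -= lr * w.grad           # w_k <- w_k - alpha * delta_w_k
    return model
```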
Step 5, extracting the target feature network model.
Delete the output layer of the fully connected network in the trained deep classification network model, take the hidden layer of the fully connected network as the output layer of the target feature network, and extract the target feature network model.
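Under the same assumptions as the earlier sketches, extracting the target feature network amounts to dropping the last Linear layer and the Sigmoid of the fully connected head so that the 2-neuron hidden layer becomes the output:

```python
import torch.nn as nn

def extract_target_feature_net(trained_model):
    """Delete the output layer of the fully connected network and keep its hidden
    layer (2 neurons) as the output of the target feature network."""
    trained_model.fc = nn.Sequential(*list(trained_model.fc.children())[:-2])  # drop Linear(2, 2) + Sigmoid
    return trained_model
```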
Step 6, predicting the target position of the current frame image.
Load the next frame image in the video sequence to be tracked as the current frame image; taking the position of the target to be tracked in the previously loaded frame image as the center, establish a rectangular area in the current frame image with 1.5 times the length and width of the target to be tracked, and take this rectangular area as the candidate region of the target to be tracked.
Crop rectangular images from the candidate region of the target to be tracked in a variable-step sliding mode, and form a candidate image sequence from all the cropped rectangular images.
The specific steps of the variable-step sliding mode are as follows:
Step 1: select a rectangular sliding frame of the same size as the target to be tracked, and set the maximum and minimum sliding steps in the x-axis and y-axis directions respectively.
Step 2: place the rectangular sliding frame at the upper left corner of the candidate region of the target to be tracked.
Step 3: calculate the sliding step in the positive x-axis direction according to the following formula:
(Equation image in the original: S_x expressed in terms of S_x1, S_x2, w, u′ and u.)
where S_x represents the sliding step in the positive x-axis direction, S_x1 represents the maximum sliding step in the x-axis direction, S_x2 represents the minimum sliding step in the x-axis direction, w represents the width of the target to be tracked, u′ represents the abscissa of the center point of the rectangular sliding frame, and u represents the abscissa of the center point of the candidate region of the target to be tracked.
Step 4: slide the rectangular sliding frame by the sliding step in the positive x-axis direction, and crop the framed image.
Step 5: judge whether the rectangular sliding frame exceeds the candidate region of the target to be tracked; if so, translate the rectangular sliding frame along the negative x-axis direction to the leftmost side of the candidate region of the target to be tracked and then execute step 6, otherwise execute step 3.
Step 6: calculate the sliding step in the positive y-axis direction according to the following formula:
(Equation image in the original: S_y expressed in terms of S_y1, S_y2, h, v′ and v.)
where S_y represents the sliding step in the positive y-axis direction, S_y1 represents the maximum sliding step in the y-axis direction, S_y2 represents the minimum sliding step in the y-axis direction, h represents the length of the target to be tracked, v′ represents the ordinate of the center point of the current position of the rectangular sliding frame, and v represents the ordinate of the center point of the candidate region of the target to be tracked.
Step 7: slide the rectangular sliding frame by the sliding step in the positive y-axis direction, and crop the framed image.
Step 8: judge whether the rectangular sliding frame exceeds the candidate region of the target to be tracked; if so, execute step 9, otherwise execute step 3.
Step 9: form all the cropped images into a candidate image sequence.
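A sketch of the variable-step sliding search is shown below. The exact step formulas are given only as images in the patent, so a simple linear interpolation between the minimum and maximum step, growing with the distance of the sliding-frame center from the candidate-region center, is used here purely as a stand-in assumption:

```python
def candidate_image_sequence(frame, region, tw, th, sx_max, sx_min, sy_max, sy_min):
    """Slide a target-sized frame over the candidate region with a variable step and
    collect the cropped candidate images. The step shrinks linearly as the frame
    approaches the region center (a stand-in for the patent's step formulas)."""
    rx, ry, rw, rh = region                      # candidate region: top-left corner, width, height
    u, v = rx + rw / 2.0, ry + rh / 2.0          # center of the candidate region
    candidates, y = [], float(ry)
    while y + th <= ry + rh:
        x = float(rx)
        while x + tw <= rx + rw:
            candidates.append(frame[int(y):int(y) + th, int(x):int(x) + tw])
            u_prime = x + tw / 2.0               # abscissa of the sliding-frame center
            frac = min(abs(u_prime - u) / (tw / 2.0), 1.0)
            x += max(1.0, sx_min + (sx_max - sx_min) * frac)   # larger step farther from center
        v_prime = y + th / 2.0                   # ordinate of the sliding-frame center
        frac = min(abs(v_prime - v) / (th / 2.0), 1.0)
        y += max(1.0, sy_min + (sy_max - sy_min) * frac)
    return candidates
```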
Input the candidate image sequence into the target feature network model, obtain the feature response value sequence corresponding to the candidate images, and select the maximum feature response value from the feature response value sequence.
In the current frame image, take the position of the candidate image corresponding to the maximum feature response value as the position of the tracked target.
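These two operations reduce to a forward pass through the target feature network followed by an argmax over the response values. In the sketch below, taking the first component of the 2-neuron output as the scalar feature response value, and requiring the candidates to be preprocessed image tensors of identical shape, are assumptions:

```python
import torch

def locate_target(feature_net, candidates, positions):
    """Feed every candidate image through the target feature network and return the
    position whose candidate produced the largest feature response value."""
    feature_net.eval()
    with torch.no_grad():
        batch = torch.stack(candidates)          # candidate image sequence as one batch
        responses = feature_net(batch)[:, 0]     # assumed scalar response per candidate
        best = int(torch.argmax(responses))
    return positions[best]                       # position of the max-response candidate
```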
Step 7, judging whether the current frame video image is the last frame of the video image sequence to be tracked; if so, executing step 8, otherwise executing step 6.
Step 8, finishing tracking of the moving target to be tracked.
The effect of the present invention is further explained below with a simulation experiment.
1. Simulation experiment conditions are as follows:
the hardware test platform of the simulation experiment of the invention is as follows: the CPU is intel Core i5-6500, the main frequency is 3.2GHz, the memory is 8GB, and the GPU is NVIDIA TITAN Xp; the software platform is as follows: ubuntu 16.04 LTS, 64-bit operating system, python 3.6.5.
2. Simulation content and result analysis:
the simulation experiment of the present invention is a simulation experiment performed on a video image sequence of a man walking on a road from an Object tracking benchmark 2015 database by using the method of the present invention, wherein the video image sequence has a total of 252 frames of video images, and the result of the simulation experiment of the present invention is shown in fig. 2.
Fig. 2(a) is the first frame image of the video image sequence input in the simulation experiment of the present invention, and the solid-line rectangular frame in fig. 2(a) marks the initial position of the target to be tracked.
Fig. 2(b) is a schematic diagram of the tracking result for a video frame in which the input target to be tracked is occluded in the simulation experiment of the present invention. The man in the video image is the target to be tracked, and he is occluded. A candidate region of the target to be tracked is determined in this video image, the candidate image sequence cropped from the candidate region is input into the target feature network to obtain the feature response value sequence corresponding to the candidate images, and the position of the candidate image with the maximum feature response value in the current frame image is taken as the position of the target to be tracked. The solid-line rectangular frame in fig. 2(b) indicates the position of the target to be tracked.
Fig. 2(c) is a schematic diagram of the tracking result for a video frame in which the input target to be tracked is deformed in the simulation experiment of the present invention. The man in the video image is the target to be tracked, and he undergoes deformation. A candidate region of the target to be tracked is determined in this video image, the candidate image sequence cropped from the candidate region is input into the target feature network to obtain the feature response value sequence corresponding to the candidate images, and the position of the candidate image with the maximum feature response value in the current frame image is taken as the position of the target to be tracked. The solid-line rectangular frame in fig. 2(c) indicates the position of the target to be tracked.
As can be seen from figs. 2(b) and 2(c), the target framed by the solid-line rectangular frame is consistent with the target framed by the solid-line rectangular frame in fig. 2(a), which shows that the method can track the target accurately and effectively when the target to be tracked in the video image is deformed or occluded.

Claims (5)

1. A moving target tracking method based on sample expansion and a deep classification network is characterized in that a deep classification network model is constructed, positive and negative sample sets are generated, the deep classification network model is trained, and a target characteristic network model is extracted; the method comprises the following specific steps:
(1) constructing a deep classification network model:
(1a) building a 3-layer fully-connected network, wherein the first layer of the network is an input layer, the second layer is a hidden layer, and the third layer is an output layer;
(1b) the parameters of each layer in the fully connected network are set as follows: setting the number of neurons in the first layer to 1024 and the activation function to the rectified linear unit (ReLU) function; setting the number of neurons in the second layer to 2; setting the number of neurons in the third layer to 2 and the activation function to the Sigmoid function;
(1c) taking the output of the depth residual error network ResNet50 as the input of a full-connection network to obtain a depth classification network model;
(2) generating a positive sample set:
(2a) inputting a first frame image in a video image sequence containing a target to be tracked, and determining a rectangular frame by taking the center of the initial position of the target to be tracked as the center and the length and the width of the target to be tracked as the length and the width;
(2b) 3000 rectangular target images with the same size are intercepted from the rectangular frame to form a positive sample set;
(2c) randomly selecting one rectangular target image from the positive sample set, uniformly cutting the rectangular target image into 3 multiplied by 3 small rectangular images by 3 equal parts, discarding the part which is less than 3 equal parts, randomly combining and splicing the 9 divided small rectangular images into 4000 recombined images with the same size as the rectangular target image to form a recombined image set;
(2d) in a first frame of video image, acquiring a scale change image set in a single-pixel stepping mode;
(2e) adding the recombined image set and the scale change image set into a positive sample set to form an expanded positive sample set;
(3) generating a negative sample set:
(3a) in a first frame image of a video, determining 5 large rectangular frames by taking an initial position of a target to be tracked as a center, wherein the length and the width of each large rectangular frame are respectively 1.5, 1.6, 1.7, 1.8 and 1.9 times of the length and the width of the target to be tracked;
(3b) selecting a rectangular sliding frame with the same size as the target to be tracked;
(3c) sliding the rectangular sliding frame in each large rectangular frame, intercepting images in the rectangular sliding frame after each translation, and forming an expanded negative sample set by all the intercepted images;
(4) training a deep classification network model:
(4a) inputting the extended positive sample set and the extended negative sample set into a deep classification network model;
(4b) updating the weight of each node in the deep classification network model by using a random gradient descent method to obtain a trained deep classification network model;
(5) extracting a target characteristic network model:
deleting an output layer of a full-connection network in the trained deep classification network model, taking a hidden layer of the full-connection network as an output layer of a target characteristic network, and extracting the target characteristic network model;
(6) predicting the target position of the current frame image:
(6a) loading a next frame image in a video sequence to be tracked as a current frame image, establishing a rectangular area by using the size which is 1.5 times of the length and width of a target to be tracked in the current frame image by taking the position of the target to be tracked of the loaded previous frame image as the center, and taking the rectangular area as a candidate area of the target to be tracked;
(6b) intercepting rectangular images in a candidate region of a target to be tracked in a sliding mode with step change, and forming a candidate image sequence by all the intercepted rectangular images;
(6c) inputting the candidate image sequence into a target characteristic network model, outputting a characteristic response value sequence corresponding to each candidate image, and selecting a maximum characteristic response value from the characteristic response value sequence;
(6d) in the current frame image, taking the position of the candidate image corresponding to the maximum characteristic response value as the position of the tracking target;
(7) judging whether the current frame video image is the last frame video image of the video image sequence to be tracked, if so, executing the step (8), otherwise, executing the step (6);
(8) and finishing the tracking of the moving target to be tracked.
2. The method for tracking the moving object based on the sample expansion and depth classification network of claim 1, wherein the specific steps of the single-pixel stepping mode in the step (2d) are as follows:
step one, forming a small rectangular frame by using the center of the initial position of the target to be tracked and the length and width which are 0.5 times of the initial position;
keeping the center of the small rectangular frame unchanged, increasing the width of the small rectangular frame by 1 pixel, and taking the product of the length-width ratio of the small rectangular frame and the increased width as the length to form a temporary rectangular frame;
thirdly, judging whether the width of the temporary rectangular frame is more than or equal to 3 times of the width of the small rectangular frame, if so, executing the fourth step, otherwise, intercepting and storing the rectangular image in the temporary rectangular frame and then executing the second step;
and fourthly, forming all the intercepted images into a scale change image set.
3. The method for tracking the moving object based on the sample expansion and depth classification network of claim 1, wherein the rectangular sliding frame slides in each large rectangular frame in the step (3c) in a manner that: and (3) sequentially translating the rectangular sliding frame to the upper right corner, the lower right corner and the lower left corner of each large rectangular frame by taking the upper left corner of each large rectangular frame as a starting point and 1 pixel as a step length, and finally translating the rectangular sliding frame back to the upper left corner.
4. The method for tracking the moving object based on the sample expansion and depth classification network of claim 1, wherein the random gradient descent method in the step (4b) comprises the following specific steps:
step one, randomly selecting a number in a range of (0,0.1), and using the number as an initial weight of each node in a deep classification network model;
secondly, taking the initial weight of each node as the current weight of each node in the deep classification network model in the first iteration process;
thirdly, randomly selecting 2^n sample images from the positive and negative sample sets, where n is greater than or equal to 3 and less than or equal to 7, and propagating them forward through the deep classification network model, the output layer of the deep classification network model outputting the classification results of the 2^n sample images;
fourthly, calculating the average logarithmic loss value of the classification result according to the following formula:
L = -(1/N) Σ_{i=1}^{N} [ y_i·log(p_i) + (1 - y_i)·log(1 - p_i) ]
wherein L represents the average logarithmic loss value of the classification results, N represents the total number of randomly selected sample images, Σ represents the summation operation, i represents the index of an input sample image, y_i represents the class of the ith input sample image, y_i taking the value 1 for a positive-class sample and 0 for a negative-class sample, log denotes the base-10 logarithm operation, and p_i represents the output value of the deep classification network model for the ith sample image in the classification results;
fifthly, solving a partial derivative of the current weight of each node in the depth classification network by using the average logarithmic loss value to obtain a gradient value of the current weight of each node in the depth classification network model;
sixthly, calculating the updated weight of each node in the deep classification network model according to the following formula:
ŵ_k = w_k - α·Δw_k
wherein ŵ_k represents the updated weight of the kth node of the deep classification network model, w_k represents the current weight of the kth node of the deep classification network model, α represents the learning rate, whose value range is (0, 1), and Δw_k represents the gradient value of the current weight of the kth node in the deep classification network model;
and seventhly, judging whether all the sample images in the training sample set are selected, if so, obtaining a trained deep classification network model, and otherwise, executing the third step after taking the updated weight of each node as the current weight.
5. The method for tracking the moving object based on the sample expansion and depth classification network of claim 1, wherein the step change sliding manner in step (6b) comprises the following steps:
the method comprises the steps that firstly, a rectangular sliding frame with the same size as a target to be tracked is selected, and the maximum sliding step length and the minimum sliding step length in the x-axis direction and the y-axis direction are respectively set;
secondly, placing a rectangular sliding frame at the upper left corner of the candidate area of the target to be tracked;
thirdly, calculating the sliding step length in the positive direction of the x axis according to the following formula:
(Equation image in the original: S_x expressed in terms of S_x1, S_x2, w, u′ and u.)
wherein S_x represents the sliding step in the positive x-axis direction, S_x1 represents the maximum sliding step in the x-axis direction, S_x2 represents the minimum sliding step in the x-axis direction, w represents the width of the target to be tracked, u′ represents the abscissa of the center point of the rectangular sliding frame, and u represents the abscissa of the center point of the candidate region of the target to be tracked;
fourthly, sliding a rectangular sliding frame in the sliding step length of the positive direction of the x axis, and capturing the framed image;
fifthly, judging whether the rectangular sliding frame exceeds the candidate area of the target to be tracked, if so, translating the rectangular sliding frame to the leftmost side of the candidate area of the target to be tracked along the negative direction of the x axis, and then executing the sixth step, otherwise, executing the third step;
and sixthly, calculating the sliding step length in the positive direction of the y axis according to the following formula:
(Equation image in the original: S_y expressed in terms of S_y1, S_y2, h, v′ and v.)
wherein S_y represents the sliding step in the positive y-axis direction, S_y1 represents the maximum sliding step in the y-axis direction, S_y2 represents the minimum sliding step in the y-axis direction, h represents the length of the target to be tracked, v′ represents the ordinate of the center point of the current position of the rectangular sliding frame, and v represents the ordinate of the center point of the candidate region of the target to be tracked;
seventhly, sliding the rectangular sliding frame in the sliding step length of the positive direction of the y axis, and capturing the framed image;
eighthly, judging whether the rectangular sliding frame exceeds the candidate area of the target to be tracked, if so, executing the ninth step, otherwise, executing the third step;
and step nine, forming a candidate image sequence by all the intercepted images.
CN201811005680.0A 2018-08-30 2018-08-30 Moving target tracking method based on sample expansion and depth classification network Active CN109345559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811005680.0A CN109345559B (en) 2018-08-30 2018-08-30 Moving target tracking method based on sample expansion and depth classification network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811005680.0A CN109345559B (en) 2018-08-30 2018-08-30 Moving target tracking method based on sample expansion and depth classification network

Publications (2)

Publication Number Publication Date
CN109345559A CN109345559A (en) 2019-02-15
CN109345559B true CN109345559B (en) 2021-08-06

Family

ID=65292344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811005680.0A Active CN109345559B (en) 2018-08-30 2018-08-30 Moving target tracking method based on sample expansion and depth classification network

Country Status (1)

Country Link
CN (1) CN109345559B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110033473B (en) * 2019-04-15 2021-04-20 西安电子科技大学 Moving target tracking method based on template matching and depth classification network
CN110135365B (en) * 2019-05-20 2021-04-06 厦门大学 Robust target tracking method based on illusion countermeasure network
CN110298838A (en) * 2019-07-09 2019-10-01 国信优易数据有限公司 A kind of method, apparatus, equipment and the storage medium of determining sample image
CN112164097B (en) * 2020-10-20 2024-03-29 南京莱斯网信技术研究院有限公司 Ship video detection sample collection method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871124A (en) * 2017-11-15 2018-04-03 陕西师范大学 A kind of Remote Sensing Target detection method based on deep neural network
CN107945210A (en) * 2017-11-30 2018-04-20 天津大学 Target tracking algorism based on deep learning and environment self-adaption
CN108171112A (en) * 2017-12-01 2018-06-15 西安电子科技大学 Vehicle identification and tracking based on convolutional neural networks
CN108171752A (en) * 2017-12-28 2018-06-15 成都阿普奇科技股份有限公司 A kind of sea ship video detection and tracking based on deep learning
CN108090918A (en) * 2018-02-12 2018-05-29 天津天地伟业信息系统集成有限公司 A kind of Real-time Human Face Tracking based on the twin network of the full convolution of depth

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Convolutional neural networks based scale-adaptive kernelized correlation filter for robust visual object tracking; Bing Liu et al.; 2017 International Conference on Security, Pattern Analysis, and Cybernetics; 2017-12-31; pp. 423-428 *
基于特征在线选择的目标压缩跟踪算法 (Target compressive tracking algorithm based on online feature selection); 李庆武 et al.; 《自动化学报》 (Acta Automatica Sinica); 2015-11-30; vol. 41, no. 11; pp. 1961-1970 *
基于碎片表征的尺度自适应运动目标跟踪 (Scale-adaptive moving target tracking based on fragment representation); 朱雨莲 et al.; 《计算机工程》 (Computer Engineering); 2016-09-30; vol. 42, no. 9; pp. 268-273 *

Also Published As

Publication number Publication date
CN109345559A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
CN110033473B (en) Moving target tracking method based on template matching and depth classification network
CN111259930B (en) General target detection method of self-adaptive attention guidance mechanism
CN109345559B (en) Moving target tracking method based on sample expansion and depth classification network
CN110930347B (en) Convolutional neural network training method, and method and device for detecting welding spot defects
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN109754017B (en) Hyperspectral image classification method based on separable three-dimensional residual error network and transfer learning
CN110781924B (en) Side-scan sonar image feature extraction method based on full convolution neural network
CN108416266B (en) Method for rapidly identifying video behaviors by extracting moving object through optical flow
CN107563433B (en) Infrared small target detection method based on convolutional neural network
CN108038435B (en) Feature extraction and target tracking method based on convolutional neural network
CN107633226B (en) Human body motion tracking feature processing method
CN111640125A (en) Mask R-CNN-based aerial photograph building detection and segmentation method and device
CN110334589A (en) A kind of action identification method of the high timing 3D neural network based on empty convolution
CN112348849A (en) Twin network video target tracking method and device
CN109753996B (en) Hyperspectral image classification method based on three-dimensional lightweight depth network
CN109377511B (en) Moving target tracking method based on sample combination and depth detection network
CN113177456B (en) Remote sensing target detection method based on single-stage full convolution network and multi-feature fusion
CN111242109B (en) Method and device for manually fetching words
US11367206B2 (en) Edge-guided ranking loss for monocular depth prediction
CN110516700B (en) Fine-grained image classification method based on metric learning
CN117079098A (en) Space small target detection method based on position coding
CN113963333B (en) Traffic sign board detection method based on improved YOLOF model
CN113989631A (en) Infrared image target detection network compression method based on convolutional neural network
CN110866552B (en) Hyperspectral image classification method based on full convolution space propagation network
CN114494441B (en) Grape and picking point synchronous identification and positioning method and device based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant