CN110276784B - Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics - Google Patents


Info

Publication number
CN110276784B
CN110276784B (granted publication of application CN201910478278.2A / CN201910478278A)
Authority
CN
China
Prior art keywords
classifier
peak
target
interference
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910478278.2A
Other languages
Chinese (zh)
Other versions
CN110276784A (en)
Inventor
宋勇
王姗姗
杨昕
赵宇飞
王枫宁
郭拯坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201910478278.2A priority Critical patent/CN110276784B/en
Publication of CN110276784A publication Critical patent/CN110276784A/en
Application granted granted Critical
Publication of CN110276784B publication Critical patent/CN110276784B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20048 Transform domain processing
    • G06T2207/20056 Discrete and fast Fourier transform, [DFT, FFT]
    • G06T2207/20081 Training; Learning


Abstract

The invention provides a correlation filtering moving target tracking method based on a memory mechanism and convolution features, belonging to the technical field of computer vision. The method extracts the convolution features of the target with a pre-trained deep convolutional neural network and, inspired by the memory mechanism of the human brain in visual information processing and cognition, integrates the memory mechanism into the detection, training and updating processes of the classifier of the correlation filtering method. The memory mechanism consists of three parts: response-map decision, adaptive peak detection, and adaptive fusion coefficient. The method is strongly robust, realizing continuous and stable target tracking even when the target deforms drastically, reappears after brief disappearance, or is occluded. At the same time, the method achieves a higher target tracking speed, with reduced complexity and reduced computation.

Description

Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics
Technical Field
The invention relates to a method for tracking a moving target in an image sequence, in particular to a correlation filtering moving target tracking method based on a memory mechanism and convolution features, and belongs to the technical field of computer vision.
Background
Moving target tracking is an important research direction of computer vision and is widely applied in fields such as security surveillance, human-machine interfaces and medical diagnosis. At present, the main problem of moving target tracking technology is that tracking precision degrades because it is difficult to overcome complex interference factors such as changes in background illumination, target occlusion, shape change, scale change and rapid motion.
The discriminant tracking method is an important class of moving target tracking methods, including: the Multiple Instance Learning (MIL) tracking method, the Tracking-Learning-Detection (TLD) tracking method, the Structured Output Tracking with Kernels (Struck) method, and the like. The principle of such methods is: first, a classifier is trained with the target as positive samples and the background as negative samples; then, the search area is detected with the classifier, and the point with the maximum response is taken as the target center position for tracking. Typically, such methods train the classifier by sparse sampling, i.e., taking several equally sized windows around the target as samples. However, as the number of samples increases, the amount of computation also increases, degrading the real-time performance of the tracking method.
The correlation filtering tracking method alleviates, to a certain extent, the problems of insufficient training samples and heavy computation in discriminant tracking methods by constructing a circulant matrix of the samples. For example, the KCF algorithm proposed by Henriques et al. (Henriques J F, Caseiro R, Martins P, et al. "High-Speed Tracking with Kernelized Correlation Filters". IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 37(3): 583-596.) realizes the correlation filtering process through kernel-based ridge regression. The algorithm has high real-time performance and achieves accurate tracking of moving targets under nonlinear conditions.
In recent years, research in the field of deep learning has begun to be combined with correlation filtering tracking methods. For example, the HCF algorithm (Ma C, Huang J B, Yang X, et al. "Hierarchical Convolutional Features for Visual Tracking" [C]// IEEE International Conference on Computer Vision. IEEE Computer Society, 2015: 3074-3082.) replaces HOG features with hierarchical convolutional features within the framework of the KCF algorithm. Exploiting the fact that high-level features contain more semantic information while low-level features contain more local information such as texture and contour, the approximate position of the target is first determined with the highest-level features and then refined progressively layer by layer, giving higher robustness than traditional hand-crafted features.
Although correlation filtering algorithms using convolution features have the above advantages, they also have certain limitations. First, the classifier extracts convolution features twice, in detection and in training, which is computationally very expensive. Second, the target template and the classifier are updated at a fixed rate every frame, so the ability to adapt to drastic changes of the target is poor. Consequently, when the target undergoes abrupt shape change, severe occlusion, or reappearance after brief disappearance, tracking precision drops markedly and the target may even be lost; real-time requirements are also difficult to meet.
Disclosure of Invention
The invention aims to solve the problem of tracking a target accurately and quickly under interference conditions such as abrupt changes of pose and shape, reappearance after brief disappearance, and occlusion, and provides a correlation filtering moving target tracking method based on a memory mechanism and convolution features.
The method of the invention extracts the convolution features of the target with a pre-trained deep convolutional neural network. Inspired by the memory mechanism of the human brain in visual information processing and cognition, the memory mechanism is integrated into the detection, training and updating processes of the classifier of the correlation filtering method. The memory mechanism consists of three parts: response-map decision, adaptive peak detection, and adaptive fusion coefficient. The fusion of the memory mechanism with classifier detection, training and updating is described as follows:
(1) classifier detection based on response graph decision: after the convolution characteristics of the candidate region are extracted, all classifiers in the memory space are subjected to convolution operation with the candidate region to obtain respective response graphs, and the response graph with the maximum peak value is selected to position the target.
(2) Classifier training based on adaptive peak detection: after the target is positioned, the magnitude and position relations of the main peak and the second-highest interference peak in the response map are combined to analyze the change of the target. If the interference degree is greater than the threshold, the convolution features of the target are re-extracted and a new classifier is trained; if it is not greater than the threshold, no training or updating of the classifier is performed.
(3) Updating the classifier based on the self-adaptive fusion coefficient: after a new classifier is trained, the fusion coefficient is calculated in a self-adaptive manner according to the result of peak detection. The more severe the interference, the larger the fusion coefficient.
Through the mode, the organic integration of the memory mechanism and the tracking method is realized.
The method of the invention is realized by the following specific steps:
a correlation filtering moving target tracking method based on a memory mechanism and convolution characteristics comprises the following steps:
step 1: the memory space is initialized.
The capacity of the memory space is set to m. During frames 1 to m the memory space is first filled and the memory mechanism is not yet executed. After the classifier is trained in the i-th frame, its parameters are stored in the memory space as the i-th classifier w[i], i ∈ {1, ..., m}. Apart from initializing the memory space, the steps of the method during these frames are the same as those of the general correlation filtering tracking method. Once the memory space is filled, the memory mechanism begins to execute in subsequent frames.
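As an illustration, the memory-space bookkeeping described above can be sketched as follows (a hypothetical Python sketch; the class and field names are illustrative, not from the patent):

```python
import numpy as np

class MemorySpace:
    """Fixed-capacity store of classifier parameters (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity      # m: number of classifier slots
        self.classifiers = []         # slot i holds the parameters w[i]

    @property
    def filled(self):
        return len(self.classifiers) == self.capacity

    def store(self, w):
        # Frames 1..m: fill the next free slot; the memory mechanism
        # only starts once all m slots are occupied.
        if not self.filled:
            self.classifiers.append(w)

# Fill the space during the first m = 4 frames (dummy parameters here).
mem = MemorySpace(capacity=4)
for _ in range(4):
    mem.store({"conv5-4": np.zeros((8, 8))})
```

Once `mem.filled` is true, subsequent frames run the full memory mechanism rather than plain correlation filtering.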
Step 2: classifier detection based on response graph decision is performed.
Step 2.1: and extracting the convolution characteristics of the current frame candidate area.
Read the t-th frame image, t > m, and select a candidate region according to the target center position determined in the previous frame. The convolution features of the tracking window are extracted with a pre-trained convolutional neural network: after the candidate-region image is input into the network, the outputs of the L layers among its 19 convolutional layers are selected as the convolution feature x_t. The feature of the candidate region at layer l at time t is denoted x_t[l], l ∈ L.
After the convolution feature x_t is extracted, a circulant matrix is constructed with x_t as the generating matrix, yielding the test sample C(x_t).
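This construction relies on the standard circulant-matrix property used throughout the method: multiplication by a circulant matrix equals circular convolution with its generating vector, computable as a product in the Fourier domain. A minimal 1-D illustration (the patent operates on 2-D feature maps; all names here are illustrative):

```python
import numpy as np

def circulant(c):
    # Columns are the cyclic shifts of the generating vector c, so that
    # circulant(c) @ w equals the circular convolution c ⊛ w.
    n = len(c)
    return np.stack([np.roll(c, k) for k in range(n)], axis=1)

rng = np.random.default_rng(0)
x = rng.random(8)   # generating sample (1-D stand-in for a feature map)
w = rng.random(8)   # classifier (1-D stand-in)

direct = circulant(x) @ w
via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w)))
assert np.allclose(direct, via_fft)   # time domain == frequency domain
```

This equivalence is what lets the method avoid forming the dense circulant sample matrix explicitly.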
Step 2.2: all classifiers in the memory space are detected.
Let w_{t-1}[i,l] denote the parameters of the i-th classifier in the memory space learned before frame t, corresponding to the layer-l features, i ∈ {1, ..., m}, l ∈ L. Convolving the test sample C(x_t) with a classifier yields a response map, and the position of the maximum response value on the map is taken as the target position.
As follows from the properties of circulant matrices, the convolution of an arbitrary matrix with a circulant matrix in the time domain can be expressed in the frequency domain as the dot product with the generating matrix of the circulant matrix. The responses f_t[i,l] of the individual layer features are added with fixed weights to obtain the response map f_t[i] of the i-th classifier in the memory space at frame t:

f_t[i] = \mathcal{F}^{-1}\Big(\sum_{l \in L} \gamma[l]\, W_{t-1}[i,l] \odot X_t[l]\Big)

where \mathcal{F}^{-1}(\cdot) denotes the inverse fast Fourier transform (IFFT), \odot is the dot-product (element-wise) operator, capital letters denote the Fourier-transform forms of the corresponding variables, \gamma is the fusion weight, and X_t[l] denotes the Fourier-transform form of the layer-l feature at frame t.
Convolving all classifiers in the memory space with the cyclic sample yields m response maps. The target position is estimated from the response map with the maximum response peak, and the classifier corresponding to that response map undergoes the subsequent training and updating:

\pi = \arg\max_{i} \max f_t[i]

where \pi is the index, in the memory space, of the classifier corresponding to the maximum-peak response map.
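The response-map decision of steps 2.1-2.2 can be sketched as follows (a hedged Python/NumPy sketch: features and classifiers are toy 2-D arrays keyed by layer name, the fixed per-layer weights γ are passed in, and conjugation conventions of a real tracker are omitted — this is not the patent's implementation):

```python
import numpy as np

def response_map(w_i, x_feats, gamma):
    # f_t[i] = IFFT( sum_l gamma[l] * W_{t-1}[i, l] ⊙ X_t[l] )
    acc = 0.0
    for l, x in x_feats.items():
        acc = acc + gamma[l] * np.fft.fft2(w_i[l]) * np.fft.fft2(x)
    return np.real(np.fft.ifft2(acc))

def decide(classifiers, x_feats, gamma):
    # Evaluate every classifier in the memory space and keep the map with
    # the largest peak; its index pi selects the classifier that will be
    # trained and updated later.
    maps = [response_map(w, x_feats, gamma) for w in classifiers]
    pi = int(np.argmax([m.max() for m in maps]))
    return pi, maps[pi]
```

The target is then located at the argmax of the returned map.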
And step 3: classifier training based on adaptive peak detection is performed.
Step 3.1: adaptive peak detection.
Simultaneously compute and compare the positions and peak heights of the main peak and the interference peak on the response map, the secondary peak other than the main peak being selected as the interference peak. When the interference peak is far from the main peak, the target is considered unoccluded even if the peak is high; when the interference peak appears close to the main peak, the target is judged occluded even if the peak is low. The target state is judged by the peak interference degree, with the formula:

\rho = \max\Big(0, \frac{h - g(\vec{d})}{h}\Big), \qquad g(\vec{d}) = H\Big(\frac{\|\vec{d}\|}{M}\Big)^2

where the coordinate system of the response map is re-defined with the center of the main peak as the origin, H is the height of the main peak on the response map, h is the height of the interference peak, M is the distance from the main peak to the edge of the response map in the direction of the interference peak, \vec{d} is the position vector of the interference peak relative to the main peak, and g(\vec{d}) is a constructed paraboloid rising from the main peak toward the edge. If the interference peak is higher than this surface, the target is considered to have changed drastically. The value of \rho is the ratio of the amount by which the interference peak exceeds the surface to the full height of the interference peak. If the peak interference degree \rho = 0, all following steps are skipped, no classifier training or updating is performed, and the next frame is entered directly; when the peak interference degree \rho > 0, the following steps are executed:
step 3.2: and extracting the convolution characteristics of the current frame target area.
According to the positioning result of the current frame in step 2, a target region of the same size as the candidate region is obtained by expanding around the target center. The target region is input into the convolutional neural network, and the convolution feature x_t' of the target region is extracted.
Step 3.3: and (5) training a classifier.
A peak interference degree \rho > 0 indicates that the classifier corresponding to the maximum-peak response map (selected in step 2.2) matches the target poorly, and a new classifier w_t' needs to be trained to accommodate the change of the target.
The principle of training the classifier is the same as in the general correlation filtering method: the classifier parameters w_t'[l] corresponding to the layer-l features are trained by minimizing

\min_{w_t'[l]} \big\| w_t'[l] \ast x_t'[l] - y \big\|^2 + \lambda \big\| w_t'[l] \big\|^2

where x_t'[l] is the feature extracted at the new position during training, \ast is the convolution operator, \lambda is the l2 regularization parameter, and y is the target label function for training: a two-dimensional Gaussian function of the same size as the classifier with its peak at the center.
The closed-form solution of this minimization problem is

W_t'[l] = \frac{Y \odot \overline{X_t'[l]}}{X_t'[l] \odot \overline{X_t'[l]} + \lambda}

where Y denotes the Fourier-transform form of the target label function and the overbar denotes complex conjugation.
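Under the assumption of a single-channel feature map, the per-layer closed-form training step can be sketched as follows (illustrative Python; `gaussian_label` builds the centered 2-D Gaussian label y, and `lam` stands for the l2 regularization parameter):

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    # Two-dimensional Gaussian label of the same size as the classifier,
    # with its peak at the center (as the description specifies).
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))

def train_classifier(x, y, lam=1e-4):
    # Closed form in the Fourier domain:
    #   W = (Y ⊙ conj(X)) / (X ⊙ conj(X) + λ)
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    W = (Y * np.conj(X)) / (X * np.conj(X) + lam)
    return np.real(np.fft.ifft2(W))
```

Correlating the trained classifier with the training patch reproduces a response close to y, peaking at the label center.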
And 4, step 4: and updating the classifier based on the adaptive fusion coefficient.
After the new classifier parameters w_t' are trained, the classifier in the memory space is updated: the classifier w_{t-1}[\pi] is fused with w_t' by weighting, while the parameters of the remaining classifiers are unchanged:

w_t[\pi] = (1 - \lambda)\, w_{t-1}[\pi] + \lambda\, w_t'

where \lambda is the fusion coefficient of the classifier at the current frame, obtained adaptively with a Sigmoid function:

\lambda = \frac{1}{1 + e^{-\rho}}

\lambda increases monotonically with \rho, so that the more drastic the change in the target, the faster the classifier is updated; e is the base of the natural logarithm.
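A minimal sketch of the adaptive update (assuming the plain logistic function for the Sigmoid — the patent may use a scaled variant — and a scalar stand-in for the classifier parameters):

```python
import math

def fusion_coefficient(rho):
    # λ = 1 / (1 + e^(-ρ)): monotonically increasing in the peak
    # interference degree ρ, so a more drastic target change yields a
    # larger fusion coefficient and a faster classifier update.
    return 1.0 / (1.0 + math.exp(-rho))

def update_classifier(w_old, w_new, rho):
    # w_t[π] = (1 − λ) · w_{t−1}[π] + λ · w_t'
    lam = fusion_coefficient(rho)
    return (1.0 - lam) * w_old + lam * w_new
```

Only the selected slot π is updated this way; the other classifiers in the memory space keep their parameters.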
Advantageous effects
Compared with the existing moving target tracking method, the method of the invention has the following advantages:
(1) Strong robustness. By integrating the human-brain memory mechanism into the correlation filtering algorithm, the algorithm can memorize the states of the target during tracking. On one hand, the response-map decision is used to select the most suitable classifier from the memory space for detection. On the other hand, adaptive peak detection governs training: the convolution features of the target are re-extracted only when the target changes drastically, and the classifier is updated with a fusion coefficient computed adaptively from the peak-detection result. Target tracking can therefore be realized continuously and stably even when the target deforms drastically, reappears after brief disappearance, or is occluded.
(2) High tracking speed. On one hand, training samples for the classifier are constructed by cyclic shifts within the correlation filtering framework, and the properties of the circulant matrix allow the problem to be solved in the frequency domain, avoiding matrix inversion and greatly reducing the complexity of the algorithm. On the other hand, classifier parameters for the target in different states are stored in the memory space; when a similar state occurs again, the corresponding classifier is selected and invoked directly according to the response value, without re-extracting the CNN features of the target region for retraining, which roughly halves the amount of computation.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram illustrating the classifier detection step based on response map decision in the method of the present invention;
FIG. 3 is a schematic diagram illustrating the classifier training steps based on adaptive peak detection in the method of the present invention;
FIG. 4 is a schematic diagram illustrating the classifier updating step based on adaptive fusion coefficients in the method of the present invention;
FIG. 5 is a flow chart showing the method of the present invention;
FIG. 6 is a comparison of the tracking results of the method of the present invention and the conventional HCF method;
FIG. 7 is a graph of tracking accuracy for the method of the present invention and a conventional HCF method;
FIG. 8 is a comparison of the tracking index of the method of the present invention and the conventional HCF method.
Detailed Description
The method of the present invention will be described in detail with reference to the accompanying drawings and examples.
Examples
A correlation filtering moving target tracking method based on a memory mechanism and convolution features, the flow of which is shown in FIG. 5, comprises the following steps:
step 1: the memory space is initialized.
Set the capacity of the memory space to m = 4. In frames 1 to 4, the method of the present invention is identical to the general correlation filtering tracking method except that the memory space is initialized: after the classifier is trained in each frame, its parameters are stored in the memory space as the i-th classifier. At the end of frame 4 the memory space is filled, and the memory mechanism begins to execute in subsequent frames.
Step 2: classifier detection based on response graph decision.
Step 2.1: and extracting the convolution characteristics of the current frame candidate area.
Read the t-th frame image and select a candidate region according to the target center position determined in the previous frame. The method of the invention extracts the convolution features of the tracking window with a pre-trained VGG-19 convolutional neural network: after the candidate-region image is input into the network, the outputs of Conv3-4, Conv4-4 and Conv5-4 among its 19 convolutional layers are selected as the convolution features, i.e. L = {Conv3-4, Conv4-4, Conv5-4}. The feature of the candidate region at layer l at time t is denoted x_t[l], l ∈ L.
After the convolution feature x_t is extracted, a circulant matrix is constructed with x_t as the generating matrix, yielding the test sample C(x_t).
Step 2.2: detection of all classifiers in memory space.
Let w_{t-1}[i,l] denote the parameters of the i-th classifier in the memory space learned before frame t, corresponding to the layer-l features, i ∈ {1, 2, 3, 4}, l ∈ L. Convolving the test sample C(x_t) with a classifier yields a response map, and the position of the maximum response value on the map is taken as the target position.
As follows from the properties of circulant matrices, the convolution of an arbitrary matrix with a circulant matrix in the time domain can be expressed in the frequency domain as the dot product with the generating matrix of the circulant matrix. The responses f_t[i,l] of the individual layer features are added with fixed weights to obtain the response map f_t[i] of the i-th classifier in the memory space at frame t:

f_t[i] = \mathcal{F}^{-1}\Big(\sum_{l \in L} \gamma[l]\, W_{t-1}[i,l] \odot X_t[l]\Big)

where \mathcal{F}^{-1}(\cdot) denotes the inverse fast Fourier transform (IFFT), \odot is the dot-product (element-wise) operator, capital letters denote the Fourier-transform forms of the corresponding variables, and \gamma is the fusion weight, set to {0.25, 0.5, 1} for the three layers.
Convolving all classifiers in the memory space with the cyclic sample yields m response maps, and the target position is estimated from the response map with the maximum response peak. The classifier corresponding to that response map undergoes the subsequent training and updating:

\pi = \arg\max_{i} \max f_t[i]

where \pi is the index, in the memory space, of the classifier corresponding to the maximum-peak response map.
And step 3: classifier training based on adaptive peak detection.
Step 3.1: adaptive peak detection
The core idea of adaptive peak detection is as follows: simultaneously compute and compare the positions and peak heights of the main peak and the interference peak on the response map, the secondary peak other than the main peak being selected as the interference peak. When the interference peak is far from the main peak, the target is considered unoccluded even if the peak is high; when the interference peak appears close to the main peak, the target is judged occluded even if the peak is low. The target state is judged by the peak interference degree, calculated as:

\rho = \max\Big(0, \frac{h - g(\vec{d})}{h}\Big), \qquad g(\vec{d}) = H\Big(\frac{\|\vec{d}\|}{M}\Big)^2

where the coordinate system of the response map is re-defined with the center of the main peak as the origin, H is the height of the main peak on the response map, h is the height of the interference peak, M is the distance from the main peak to the edge of the response map in the direction of the interference peak, \vec{d} is the position vector of the interference peak relative to the main peak, and g(\vec{d}) is a constructed paraboloid rising from the main peak toward the edge. If the interference peak is higher than this surface, the target is considered to have changed drastically. The value of \rho is the ratio of the amount by which the interference peak exceeds the surface to the full height of the interference peak. If the peak interference degree \rho = 0, the following steps are skipped, no classifier training or updating is performed, and the next frame is entered directly.
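The peak-interference computation above can be sketched as follows (assuming the paraboloid g(d) = H·(‖d‖/M)² rising from the main peak, as described; illustrative Python):

```python
import numpy as np

def peak_interference(H, h, d, M):
    # g(d): paraboloid anchored at the main peak (the origin) that rises
    # to H at the response-map edge. A side peak close to the main peak
    # exceeds it easily (occlusion), while a distant side peak must be
    # very tall to register as interference.
    g = H * (np.linalg.norm(d) / M) ** 2
    # ρ: fraction of the interference peak's height exceeding the surface.
    return max(0.0, (h - g) / h)
```

For example, a half-height side peak one pixel from the main peak (with M = 10) yields ρ = 0.98, i.e. a drastic change, while the same peak at the map edge yields ρ = 0.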
When the peak interference degree ρ >0, the following steps are performed.
Step 3.2: and extracting the convolution characteristics of the current frame target area.
According to the positioning result of the current frame in step 2, a target region of the same size as the candidate region is obtained by expanding around the target center. The target region is input into the VGG-19 network, and the convolution feature x_t' of the target region is extracted.
Step 3.3: and (5) training a classifier.
A peak interference degree \rho > 0 indicates that the classifier corresponding to the maximum-peak response map (selected in step 2.2) matches the target poorly, and a new classifier w_t' needs to be trained to accommodate the change of the target.
The principle of training the classifier is the same as in the general correlation filtering method: the classifier parameters w_t'[l] corresponding to the layer-l features are trained by minimizing

\min_{w_t'[l]} \big\| w_t'[l] \ast x_t'[l] - y \big\|^2 + \lambda \big\| w_t'[l] \big\|^2

where \ast is the convolution operator, \lambda is the l2 regularization parameter, and y is the target label function for training: a two-dimensional Gaussian function of the same size as the classifier with its peak at the center.
The closed-form solution of this minimization problem is

W_t'[l] = \frac{Y \odot \overline{X_t'[l]}}{X_t'[l] \odot \overline{X_t'[l]} + \lambda}

where Y denotes the Fourier-transform form of the target label function and the overbar denotes complex conjugation.
and 4, step 4: updating the classifier based on the adaptive fusion coefficient.
After the new classifier parameters w_t' are trained, the classifier in the memory space is updated: the classifier w_{t-1}[\pi] is fused with w_t' by weighting, while the parameters of the remaining classifiers are unchanged, described by the formula:

w_t[\pi] = (1 - \lambda)\, w_{t-1}[\pi] + \lambda\, w_t'

where \lambda is the fusion coefficient of the classifier at the current frame, obtained adaptively with a Sigmoid function:

\lambda = \frac{1}{1 + e^{-\rho}}
λ monotonically increases with respect to ρ such that the more drastic the change in the target, the faster the rate of classifier update.
The simulation effect of the invention is illustrated by the following simulation experiment:
1. simulation conditions are as follows:
The simulation experiments of the invention were completed with the MATLAB 2017b platform on a PC with an Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz, 8.00 GB RAM and a GTX 1050 GPU, on video sequences from the Visual Tracker Benchmark test set.
2. And (3) simulation results:
FIG. 6 shows the tracking results on a video sequence in which the target is obviously occluded, at frames 330, 371, 390 and 410; the rectangular boxes in the figure mark the tracking results of the conventional method and of the method of the present invention. As can be seen from FIG. 6, the method accurately tracks the moving target as it reappears after obvious occlusion.
FIG. 7 compares the tracking precision curves of the method of the present invention and the conventional HCF algorithm. The abscissa of the precision curve is the Euclidean distance between the target center of the simulated tracking result and the true center annotated in the ground truth; the ordinate is the proportion of frames, over the whole test sequence, whose Euclidean distance is smaller than a given threshold. FIG. 8 compares tracking precision and tracking speed (FPS: frames per second) at a distance threshold of 20 pixels. By evaluation statistics, for the Lemming sequence the probabilities that the tracking results of the conventional HCF algorithm and of the method of the present invention lie within 20 pixels of the true target position are 0.6820 and 0.8920 respectively, an improvement in tracking precision of 30.8%. When the CNN operations run on the GPU, the speeds of the conventional HCF algorithm and the proposed algorithm are 4.4751 fps and 5.1678 fps respectively, a 15.5% improvement; when the CNN operations run on the CPU, the speeds of the two algorithms are 1.1653 fps and 2.1363 fps respectively, an 83.3% improvement.

Claims (3)

1. A correlation filtering moving target tracking method based on a memory mechanism and convolution characteristics is characterized by comprising the following steps:
firstly, initializing a memory space, and the method comprises the following steps:
setting the capacity of the memory space to m; during frames 1 to m the memory space is filled and the memory mechanism is not yet executed; after the training of the classifier is completed in the i-th frame, the parameters of the classifier are stored in the memory space as the i-th classifier w[i], i ∈ {1, ..., m}, in the memory space; when the memory space is filled, the memory mechanism begins to execute in subsequent frames;
secondly, extracting the convolutional features of the target by means of a pre-trained deep convolutional neural network, and integrating a memory mechanism into the detection, training and update-fusion process of the classifier of the correlation filtering method, the memory mechanism consisting of three parts: response graph decision, adaptive peak detection and adaptive fusion coefficient; specifically, the memory mechanism is integrated into the detection, training and update-fusion process of the classifier of the correlation filtering method as follows:
the classifier detection based on the response graph decision comprises: after the convolutional features of the candidate region are extracted, all classifiers in the memory space are convolved with the candidate region to obtain their respective response graphs, and the response graph with the maximum peak is selected to locate the target;
the classifier training based on the self-adaptive peak detection comprises the following steps:
step A: self-adaptive peak detection;
meanwhile, the positions and peak heights of the main peak and the interference peak on the response graph are calculated and compared, the largest secondary peak other than the main peak being selected as the interference peak; when the interference peak is far from the main peak, the target is considered unoccluded even if the interference peak is high, and when the interference peak is close to the main peak, the target is judged occluded even if the interference peak is not high;
and judging the target state by using the peak interference degree, wherein the formula is as follows:
ρ = max( 0, ( h − H·|d|²/M² ) / h )
wherein the coordinate system of the response graph is redefined with the center of the main peak as the origin; H is the peak height of the main peak on the response graph, h is the peak height of the interference peak, M is the distance from the main peak to the edge of the response graph in the direction of the interference peak, d is the position vector of the interference peak relative to the main peak, and H·|d|²/M² is the constructed paraboloid; if the interference peak rises above this curved surface, the target is considered to have changed violently; the value of ρ is the ratio of the amount by which the interference peak exceeds the curved surface to the full height of the interference peak. If the peak interference degree ρ = 0, the subsequent steps are skipped, the classifier is neither trained nor updated, and processing proceeds directly to the next frame; when the peak interference degree ρ > 0, the following steps are executed:
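The adaptive peak detection of step A can be sketched as follows. The paraboloid used here is an assumption (the claimed equation is an image): it is zero at the main peak and rises to H at the map edge, which matches the stated behavior that a nearby interference peak triggers occlusion even when low, while a distant one must be very high.

```python
import numpy as np

def peak_interference(H, h, d, M):
    """Peak interference degree rho (step A, sketch).

    H: main-peak height; h: interference-peak height;
    d: (dx, dy) offset of the interference peak from the main peak;
    M: distance from the main peak to the response-map edge along d.

    The tolerance surface H * (|d| / M)**2 is an assumed paraboloid.
    rho is the part of the interference peak rising above this surface,
    as a fraction of the peak's full height; rho == 0 means no update.
    """
    surface = H * (np.hypot(*d) / M) ** 2  # assumed paraboloid
    return max(0.0, (h - surface) / h)

# A nearby interference peak triggers an update even though it is low:
#   peak_interference(1.0, 0.5, (2.0, 0.0), 10.0) > 0
# The same peak far from the main peak stays under the surface (rho == 0):
#   peak_interference(1.0, 0.5, (9.0, 0.0), 10.0) == 0.0
```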
step B: extracting the convolutional features of the current-frame target region;
according to the positioning result of the current frame in the classifier detection process, a target region of the same size as the candidate region is obtained by expansion around the target center; the target region is input into the convolutional neural network, and the convolutional feature x_t' of the target region is extracted;
step C: training the classifier;
the peak interference degree ρ > 0 indicates that the classifier corresponding to the maximum-peak response graph selected in step A matches the target poorly, and a new classifier w_t' needs to be trained to accommodate changes in the target;
the classifier parameters w_t'[l] corresponding to the l-th layer features are trained by minimizing:

w_t'[l] = argmin_w ‖ w ∗ x_t'[l] − y ‖² + λ ‖ w ‖²
wherein x_t'[l] is the feature extracted at the new position during training, ∗ is the convolution operator, and λ is the l2 regularization parameter; y is the target label function for training, a two-dimensional Gaussian function of the same size as the classifier with its peak located at the center;
the closed-form solution to this minimization problem is:
W_t'[l] = ( Y ⊙ X̄_t'[l] ) / ( X_t'[l] ⊙ X̄_t'[l] + λ )

wherein capital letters denote the Fourier transform forms of the corresponding variables, the overbar denotes complex conjugation, ⊙ is the element-wise product and the division is element-wise; Y represents the Fourier transform form of the target label function;
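The closed-form training of step C can be sketched for a single feature channel (a minimal sketch of the standard Fourier-domain correlation-filter solution; the claim's multi-layer, multi-channel form reduces to this per feature map, and `train_classifier` / `gaussian_label` are illustrative names):

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """2-D Gaussian target label of size (h, w) with its peak at the center."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2 * sigma**2))

def train_classifier(x, y, lam=1e-4):
    """Closed-form filter W = (Y * conj(X)) / (X * conj(X) + lam),
    computed element-wise in the Fourier domain.

    x: feature map (2-D array); y: Gaussian label of the same size.
    Returns the filter in the Fourier domain.
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

# Detecting on the training features reproduces (approximately) the label,
# so the response peaks at the label's center:
rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32))
W = train_classifier(x, gaussian_label(32, 32))
response = np.real(np.fft.ifft2(W * np.fft.fft2(x)))
```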
updating a classifier based on a self-adaptive fusion coefficient, wherein the method comprises the following steps: after a new classifier is trained, a fusion coefficient is calculated in a self-adaptive mode according to a peak detection result, and the more severe the interference is, the larger the fusion coefficient is.
2. The method for tracking the moving object based on the correlation filtering of the memory mechanism and the convolution characteristic as claimed in claim 1, wherein the classifier detection based on the response graph decision is as follows:
step 2.1: extracting the convolutional features of the current-frame candidate region;
setting the memory space capacity as m; reading the t-th frame image, t > m, and selecting a candidate region according to the target center position determined in the previous frame; extracting the convolutional features of the tracking window by means of a pre-trained convolutional neural network; after the candidate-region image is input into the convolutional neural network, the outputs of L layers among the 19 convolutional layers are selected as the convolutional features x_t, and the feature of the candidate region at the l-th layer at time t is denoted x_t[l], l ∈ L;
after the convolutional features x_t are extracted, a circulant matrix is constructed from the matrix generated by x_t to obtain the test samples C(x_t);
step 2.2: detecting with all classifiers in the memory space;
let w_{t-1}[i,l] denote the parameters, corresponding to the l-th layer features, of the i-th classifier learned in the memory space before the t-th frame, i ∈ {1, …, m}, l ∈ L; the detection samples C(x_t) are convolved with the classifier to obtain a response graph, and the position of the maximum response value on the response graph is regarded as the target position;
the responses f_t[i,l] of the individual layer features are added according to fixed weights to obtain the response graph f_t[i] of the i-th classifier in the memory space at the t-th frame:

f_t[i] = Σ_{l∈L} γ[l] · F⁻¹( X_t[l] ⊙ W̄_{t-1}[i,l] )

wherein F⁻¹(·) denotes the Inverse Fast Fourier Transform (IFFT) operation, ⊙ is the dot-product operator, capital letters denote the Fourier transform forms of the variables, and γ is the fusion weight; X_t[l] represents the Fourier transform form of the l-th layer features at the t-th frame;
all classifiers in the memory space are convolved with the cyclic samples to obtain m response graphs; the response graph with the maximum response peak is used to estimate the target position, and the classifier corresponding to that response graph undergoes the subsequent training and updating:

π = argmax_{i ∈ {1, …, m}} max( f_t[i] )

in the formula, π is the index, in the memory space, of the classifier corresponding to the maximum-peak response graph.
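The response graph decision of step 2.2 can be sketched as follows (a single-channel Fourier-domain sketch, not the patented implementation; `detect`, `memory_W` and `gamma` are illustrative names):

```python
import numpy as np

def detect(memory_W, X_layers, gamma):
    """Response-graph decision: convolve every classifier in the memory
    with the candidate features and keep the map with the largest peak.

    memory_W: list of m classifiers, each a list of per-layer
              Fourier-domain filters W[l].
    X_layers: per-layer Fourier transforms X_t[l] of the candidate region.
    gamma:    fixed per-layer fusion weights.

    Returns (pi, position): index of the winning classifier and the
    peak location on its response map.
    """
    best = (-np.inf, None, None)
    for i, W in enumerate(memory_W):
        # Sum the per-layer responses with fixed weights gamma[l].
        resp = sum(
            g * np.real(np.fft.ifft2(Wl * Xl))
            for g, Wl, Xl in zip(gamma, W, X_layers)
        )
        peak = resp.max()
        if peak > best[0]:
            best = (peak, i, np.unravel_index(resp.argmax(), resp.shape))
    _, pi, pos = best
    return pi, pos
```

Only the classifier at index `pi` is then trained and updated; the other memory slots are left untouched.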
3. The method for tracking the moving object based on the correlation filtering of the memory mechanism and the convolution characteristic as claimed in claim 1, wherein the classifier updating method based on the adaptive fusion coefficient is as follows:
after the new classifier parameters w_t' are trained, the classifier in the memory space is updated; the classifier w_{t-1}[π] and w_t' are fused with weighting while the parameters of the remaining classifiers remain unchanged, according to the formula:

w_t[π] = (1 − λ) · w_{t-1}[π] + λ · w_t'
wherein λ is a fusion coefficient of the classifier at the current frame, and is obtained by using a Sigmoid function in a self-adaptive manner:
Figure FDA0002847701730000042
wherein λ increases monotonically with ρ, so that the more drastic the change in the target, the faster the classifier is updated; e is the base of the natural logarithm.
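The adaptive update of claim 3 can be sketched as follows. The exact Sigmoid in the claim is an equation image; the shifted form below, with λ(0) = 0 and λ → 1 as ρ grows, is an assumption chosen to be consistent with "no update when ρ = 0" and with the monotonicity stated above:

```python
import math

def fusion_coefficient(rho):
    """Adaptive fusion coefficient lambda from the peak interference rho.

    Shifted sigmoid (assumed form): lambda(0) == 0, increasing in rho,
    approaching 1 for large rho.
    """
    return 2.0 / (1.0 + math.exp(-rho)) - 1.0

def update_classifier(w_old, w_new, rho):
    """Weighted fusion of the selected memory slot with the new classifier:
    w = (1 - lambda) * w_old + lambda * w_new (element-wise over parameters);
    the other memory slots are unchanged."""
    lam = fusion_coefficient(rho)
    return [(1.0 - lam) * a + lam * b for a, b in zip(w_old, w_new)]
```

With this choice, mild interference (small ρ) barely changes the stored classifier, while severe interference pulls it strongly toward the newly trained one.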
CN201910478278.2A 2019-06-03 2019-06-03 Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics Active CN110276784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910478278.2A CN110276784B (en) 2019-06-03 2019-06-03 Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics


Publications (2)

Publication Number Publication Date
CN110276784A CN110276784A (en) 2019-09-24
CN110276784B true CN110276784B (en) 2021-04-06

Family

ID=67961901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910478278.2A Active CN110276784B (en) 2019-06-03 2019-06-03 Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics

Country Status (1)

Country Link
CN (1) CN110276784B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2021241487A1 (en) * 2020-05-25 2021-12-02
CN112183493A (en) * 2020-11-05 2021-01-05 北京澎思科技有限公司 Target tracking method, device and computer readable storage medium
CN113298846B (en) * 2020-11-18 2024-02-09 西北工业大学 Interference intelligent detection method based on time-frequency semantic perception
CN115115992B (en) * 2022-07-26 2022-11-15 中国科学院长春光学精密机械与物理研究所 Multi-platform photoelectric auto-disturbance rejection tracking system and method based on brain map control right decision

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106530340A (en) * 2016-10-24 2017-03-22 深圳市商汤科技有限公司 Appointed object tracking method
CN107016689A (en) * 2017-02-04 2017-08-04 中国人民解放军理工大学 A kind of correlation filtering of dimension self-adaption liquidates method for tracking target
CN107146238A (en) * 2017-04-24 2017-09-08 西安电子科技大学 The preferred motion target tracking method of feature based block
CN108549839A (en) * 2018-03-13 2018-09-18 华侨大学 The multiple dimensioned correlation filtering visual tracking method of self-adaptive features fusion

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100211830A1 (en) * 2009-02-13 2010-08-19 Seagate Technology Llc Multi-input multi-output read-channel architecture for recording systems
CN104574445B (en) * 2015-01-23 2015-10-14 北京航空航天大学 A kind of method for tracking target
CN107767405B (en) * 2017-09-29 2020-01-03 华中科技大学 Nuclear correlation filtering target tracking method fusing convolutional neural network
CN107818575A (en) * 2017-10-27 2018-03-20 深圳市唯特视科技有限公司 A kind of visual object tracking based on layering convolution


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
When Correlation Filters Meet Convolutional Neural Networks for Visual Tracking; Chao Ma et al.; IEEE Signal Processing Letters, Vol. 23, Issue 10, Oct. 2016; pp. 1454-1458 *
Response-adaptive tracking based on convolutional neural networks; Li Yong et al.; Chinese Journal of Liquid Crystals and Displays; July 2018; Vol. 33, No. 7; pp. 596-605 *


Similar Documents

Publication Publication Date Title
CN111354017B (en) Target tracking method based on twin neural network and parallel attention module
CN110276784B (en) Correlation filtering moving target tracking method based on memory mechanism and convolution characteristics
CN109345508B (en) Bone age evaluation method based on two-stage neural network
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN109741318B (en) Real-time detection method of single-stage multi-scale specific target based on effective receptive field
CN109859241B (en) Adaptive feature selection and time consistency robust correlation filtering visual tracking method
CN111080675B (en) Target tracking method based on space-time constraint correlation filtering
CN112184752A (en) Video target tracking method based on pyramid convolution
CN109919241B (en) Hyperspectral unknown class target detection method based on probability model and deep learning
CN109325440B (en) Human body action recognition method and system
CN111582349B (en) Improved target tracking algorithm based on YOLOv3 and kernel correlation filtering
CN111915644B (en) Real-time target tracking method of twin guide anchor frame RPN network
CN111612817A (en) Target tracking method based on depth feature adaptive fusion and context information
CN109035300B (en) Target tracking method based on depth feature and average peak correlation energy
CN111931654A (en) Intelligent monitoring method, system and device for personnel tracking
CN110503090B (en) Character detection network training method based on limited attention model, character detection method and character detector
CN110827327B (en) Fusion-based long-term target tracking method
CN109272036B (en) Random fern target tracking method based on depth residual error network
CN108257148B (en) Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking
CN113033356B (en) Scale-adaptive long-term correlation target tracking method
CN116597275A (en) High-speed moving target recognition method based on data enhancement
CN111145221A (en) Target tracking algorithm based on multi-layer depth feature extraction
Masilamani et al. Art classification with pytorch using transfer learning
CN113920159B (en) Infrared air small and medium target tracking method based on full convolution twin network
CN115984325A (en) Target tracking method for target volume searching space-time regularization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant