CN113298136B - Twin network tracking method based on alpha divergence - Google Patents


Info

Publication number
CN113298136B
Authority
CN
China
Prior art keywords
target
training
regression branch
alpha
twin network
Prior art date
Legal status
Active
Application number
CN202110556609.7A
Other languages
Chinese (zh)
Other versions
CN113298136A (en)
Inventor
胡旷伋
朱虎
邓丽珍
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202110556609.7A priority Critical patent/CN113298136B/en
Publication of CN113298136A publication Critical patent/CN113298136A/en
Application granted granted Critical
Publication of CN113298136B publication Critical patent/CN113298136B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a twin network tracking method based on alpha divergence, aiming to solve the technical problem that visual tracking with high robustness and accuracy is difficult to achieve in the prior art. The method comprises: acquiring an image to be tracked and a trained twin network, wherein the twin network is trained based on alpha divergence; extracting depth features of the image to be tracked with ResNet50; processing the depth features of the image to be tracked with a target center regression branch to obtain the predicted target position; and processing the depth features of the image to be tracked with a target frame regression branch to obtain the predicted target frame. From a probabilistic perspective, the method accounts for the noise and uncertainty introduced by manual annotation, and achieves higher accuracy and robustness.

Description

Twin network tracking method based on alpha divergence
Technical Field
The invention relates to a twin network (Siamese network) tracking method based on alpha divergence, and belongs to the technical field of computer vision.
Background
With the development of communications, computing and related fields, artificial intelligence has become a major research focus. Computer vision, which lets a camera serve as the "eye" of a computer, aims to enable the computer to process massive, high-dimensional image data much as the brain does, and is widely applied to industrial flaw detection, medical image processing, road safety, and surveillance. Target tracking is a fundamental and important direction in computer vision: given the initial state of a target, it estimates the target's motion trajectory in a video sequence. This seemingly simple task draws on image processing, pattern recognition, probability theory, optimization theory, and other fields, and has broad military and civilian applications such as sports broadcasting, autonomous vehicles, intelligent surveillance, and human-computer interaction systems.
In recent years, tracking algorithms based on discriminative correlation filters (DCF) have achieved excellent results, and tensor feature representations that preserve the structure of high-dimensional image data have been introduced into the DCF framework. With the continuous growth of computing power and the strong feature-extraction capability of deep neural networks, many researchers and technology companies now conduct computer-vision research based on deep learning, with great success. Driven by visual tracking competitions, a large number of manually annotated datasets such as OTB, TrackingNet and COCO have appeared, providing a data foundation for deep-learning-based target tracking and greatly stimulating its development. Researchers increasingly integrate traditional tracking algorithms into deep networks, relieving the burden of hand-crafted feature extraction and parameter tuning and improving the discrimination between target and background. However, deep learning is data-driven and suffers from long training times, high demands on training samples, and high hardware requirements, so it still has limitations.
Furthermore, a tracker suffers from two kinds of interference: rotation and deformation caused by the target's own movement, together with blurring and scale change caused by rapid motion; and occlusion and background clutter caused by the external environment. These disturbances pose many challenges to tracking algorithms, and achieving highly robust and accurate visual tracking remains difficult.
Target tracking is closely related to everyday life and has very broad application prospects. Although tracking technology is continuously updated to overcome various kinds of interference, designing a tracker that is both robust and real-time is still a difficult task, and research on it is of great significance.
Disclosure of Invention
In order to solve the problem that visual tracking with high robustness and accuracy is difficult to realize in the prior art, the invention provides a twin network tracking method based on alpha divergence.
In order to solve the technical problems, the invention adopts the following technical means:
the invention provides a twin network tracking method based on alpha divergence, which comprises the following steps:
acquiring an image to be tracked and a well-trained twin network, wherein the twin network is trained based on alpha divergence;
extracting the depth features of the image to be tracked by using ResNet50 in the trained twin network;
processing the depth characteristics of the image to be tracked by using a target center regression branch in the trained twin network to obtain the predicted target position of the image to be tracked;
and processing the depth characteristics of the image to be tracked by using the target frame regression branch in the trained twin network to obtain a predicted target frame of the image to be tracked.
Further, the training process of the twin network is as follows:
constructing a basic framework of a twin network, wherein the twin network comprises a main network adopting ResNet50, a target center regression branch and a target frame regression branch;
obtaining a training set and a test set of the twin network, wherein the training set or the test set comprises a plurality of training images or test images containing targets;
extracting depth features of training images in a training set by using ResNet50, and respectively transmitting the depth features to a target center regression branch and a target frame regression branch;
processing the depth features of the training image by using the target center regression branch to obtain the predicted target position of the training image, and using grid sampling to compute the alpha divergence of the target center regression branch for training;
processing the depth features of the training image by using the target frame regression branch to obtain the predicted target frame of the training image, and using Monte Carlo sampling to compute the alpha divergence of the target frame regression branch for training;
determining network parameters of the twin network through alpha divergence training to obtain a trained twin network;
and testing the trained twin network by using a test set.
Furthermore, in the twin network training process, the first three frames of images including the current frame are selected as a group of training images and input into the twin network, and the last three frames of images including the current frame are selected as a group of test images and input into the twin network for testing.
Further, an initialization layer is adopted in the target center regression branch to initialize the convolution kernel, and an optimization layer is adopted to update the filter; and the target frame regression branch, based on IoU-Net, applies a fully connected layer to the depth features of the training image or the test image to obtain a modulation vector, and then regresses the degree of overlap between each candidate window and the real target frame.
Further, the calculation formula of the alpha divergence of the target center regression branch or the target frame regression branch is as follows:
D_α[p(y|y_i) || p(y|x_i,θ)] = (1/(α(1−α))) · (1 − ∫ p(y|y_i)^α · p(y|x_i,θ)^(1−α) dy)
wherein p(y|x_i,θ) represents the conditional probability distribution output by the target center regression branch or the target frame regression branch, p(y|y_i) represents the conditional probability distribution of the true annotation in the training image, D_α[p(y|y_i)||p(y|x_i,θ)] denotes the alpha divergence between p(y|y_i) and p(y|x_i,θ), y represents the true target position or the true target frame, x_i represents the i-th training image, θ is a parameter of the target center regression branch or the target frame regression branch, α is the control coefficient of the alpha divergence, y_i represents the manually labeled target position or labeled target frame in the i-th training image, s_θ(y,x_i) represents the score output by the target center regression branch or the target frame regression branch for the sample (x_i, y), i = 1,2,…,n, and n is the number of training images in the training set.
Further, the method of using grid sampling to compute the alpha divergence of the target center regression branch for training comprises the following steps:
dividing the confidence score map output by the target center regression branch into K grid cells, so that the integral over y can be approximated by a sum over the set of sampled target positions {y^(1), y^(2), …, y^(K)}, where y^(k) denotes the sampled target position of the k-th grid point;
and expressing the alpha divergence by using a grid sampling method and using the alpha divergence as a loss function of the target center regression branch, wherein the expression of the corresponding loss function of the ith training image in the target center regression branch is as follows:
L_i = C · (1 − A Σ_{k=1}^{K} p(y^(k)|y_i)^α · p(y^(k)|x_i,θ)^(1−α)),   where p(y^(k)|x_i,θ) = exp(s_θ(y^(k),x_i)) / (A Σ_{j=1}^{K} exp(s_θ(y^(j),x_i)))
wherein L_i represents the corresponding loss function of the i-th training image in the target center regression branch, C is 1/(α(1−α)), α is the control coefficient of the alpha divergence, A is the scaling factor of the grid sampling method, p(y^(k)|y_i) represents the conditional probability of the true annotation at the k-th grid point, s_θ(y^(k),x_i) represents the confidence score output by the target center regression branch for the sample (x_i, y^(k)), x_i represents the i-th training image, θ is a parameter of the target center regression branch, i = 1,2,…,n, and n is the number of training images in the training set;
and training the network parameters of the target center regression branch with the loss function L_i to obtain a filter for determining the target position.
Further, the method of using Monte Carlo sampling to compute the alpha divergence of the target frame regression branch for training comprises the following steps:
and utilizing Monte Carlo sampling to represent alpha divergence and taking the alpha divergence as a loss function of the regression branch of the target frame, wherein the expression of the corresponding loss function of the ith training image in the regression branch of the target frame is as follows:
L'_i = C · (1 − (1/H) Σ_{h=1}^{H} p(y^(h)|y_i)^α · p(y^(h)|x_i,θ)^(1−α) / q(y^(h)|y_i))
wherein L'_i represents the corresponding loss function of the i-th training image in the target frame regression branch, C is 1/(α(1−α)), α is the control coefficient of the alpha divergence, H is the number of Monte Carlo samples, y^(h) represents the true target frame drawn in the h-th sample, p(y^(h)|y_i) denotes the true probability distribution given the labeled target frame y_i, q(y^(h)|y_i) denotes the sampling distribution given the labeled target frame y_i, s_θ(y^(h),x_i) represents the overlap output by the target frame regression branch for the sample (x_i, y^(h)), p(y^(h)|x_i,θ) is the conditional probability distribution output by the target frame regression branch evaluated at y^(h), x_i represents the i-th training image, θ is a parameter of the target frame regression branch, i = 1,2,…,n, and n is the number of training images in the training set;
and training the network parameters of the target frame regression branch with the loss function L'_i.
Further, the method comprises the following steps:
and, after the trained twin network has tracked a preset number of frames of images to be tracked, updating the network parameters of the target center regression branch in the twin network with online update samples to obtain a newly trained twin network.
Further, the value range of the preset frame number is 5-20.
By adopting the above technical means, the following advantages can be obtained:
the invention provides a twin network tracking method based on alpha divergence, which extracts deep features of an input image by using ResNet, obtains a predicted target position and a predicted target frame of a target in the input image by using a target center regression branch and a target frame regression branch respectively, and can give a motion track of the target in a video sequence. According to the method, from the angle of probability, conditional probability distribution is used as the output of the twin network, alpha divergence is used as the loss function of the network, twin network training is carried out through a large number of data sets, the distribution of network output distribution and real marking can be fitted, further, uncertainty existing in a target area of manual marking and noise introduced by manual marking are eliminated, tracking interference is reduced, and the robustness and accuracy of target tracking are improved.
Within the twin network structure, the method solves the alpha divergence of the target center regression branch and the target frame regression branch with a grid sampling method and a Monte Carlo sampling method respectively, so that the twin network can use the alpha divergence directly without being affected by the choice of loss function. In addition, the method can update the network parameters during practical application, which further guarantees the tracking performance of the twin network.
The target tracking performance of the method is better than that of existing trackers, with higher accuracy, success rate and speed, so the method has higher accuracy and robustness and very broad application prospects.
Drawings
FIG. 1 is a flowchart of the steps of an alpha divergence-based twin network tracking method of the present invention;
FIG. 2 is a network structure diagram of a twin network in an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the steps of training twin networks according to an embodiment of the present invention;
FIG. 4 is a graph of the accuracy of the method and contrast tracker of the present invention on an OTB100 data set in an embodiment of the present invention;
FIG. 5 is a graph of the success rate of the method of the present invention and a comparison tracker on an OTB100 data set in an embodiment of the present invention;
FIG. 6 is a graph of the accuracy of the method and contrast tracker of the present invention on a UAV123 data set in accordance with an embodiment of the present invention;
FIG. 7 is a graph of the success rate of the method of the present invention and the comparison trackers on the UAV123 data set in an embodiment of the present invention.
Detailed Description
The technical solution of the invention is further explained below with reference to the accompanying drawings:
the invention provides a twin network tracking method based on alpha divergence, which specifically comprises the following steps as shown in figure 1:
step 1, obtaining an image to be tracked and a well-trained twin network, wherein the twin network is trained based on alpha divergence. When one or more targets in one video need to be tracked and identified, images in the video can be extracted according to a time sequence, each frame of image is used as an image to be tracked, and the image is input into a trained twin network for target tracking and identification.
In the embodiment of the present invention, the network structure of the twin network is shown in FIG. 2. The twin network mainly comprises one backbone network and two branch networks: the backbone network adopts ResNet50, and the branch networks are the target center regression branch and the target frame regression branch, respectively, each of which can be regarded as a convolutional neural network.
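As a rough structural illustration of this one-backbone, two-branch layout, the following is a minimal PyTorch sketch. The module names, channel sizes, input resolution and the simplified branch heads are assumptions for illustration, not the patent's implementation.

```python
# Minimal structural sketch of the twin network (assumed names and shapes).
import torch
import torch.nn as nn
import torchvision

class TwinTracker(nn.Module):
    def __init__(self, hidden_dim=256):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)   # torchvision >= 0.13
        # Backbone: ResNet50 up to layer4; layer3/layer4 outputs feed the two branches.
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
                                  resnet.layer1, resnet.layer2)
        self.layer3 = resnet.layer3          # 1024-channel features
        self.layer4 = resnet.layer4          # 2048-channel features
        # Target center regression branch: a single correlation-filter head (placeholder).
        self.center_head = nn.Conv2d(1024, 1, kernel_size=4)
        # Target frame regression branch: scores pooled candidate-box features (placeholder).
        self.box_head = nn.Sequential(nn.Linear(2048, hidden_dim), nn.ReLU(),
                                      nn.Linear(hidden_dim, 1))

    def forward(self, frame, box_feats=None):
        f3 = self.layer3(self.stem(frame))
        f4 = self.layer4(f3)
        score_map = self.center_head(f3)                 # confidence scores for the target center
        iou_scores = self.box_head(box_feats) if box_feats is not None else None
        return score_map, f4, iou_scores

x = torch.randn(1, 3, 288, 288)
score_map, f4, _ = TwinTracker()(x)
```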
Step 2, extracting the depth features of the image to be tracked with ResNet50 in the trained twin network. After each frame of the image to be tracked is input into ResNet50, the features output by the third and fourth layers of ResNet50 are pooled to obtain the depth features of the image to be tracked, and these depth features are input into the target center regression branch and the target frame regression branch.
Step 3, processing the depth features of the image to be tracked with the target center regression branch in the trained twin network. The target center regression branch is equivalent to a filter that coarsely locates the target: the depth features of the image to be tracked are convolved with the filter to obtain a confidence score map, and the position corresponding to the maximum confidence score is the predicted target position of the image to be tracked.
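As an illustration of this coarse localization step, the sketch below correlates the depth feature with a learned filter and takes the argmax of the resulting confidence map; the tensor shapes and the stride value are assumptions.

```python
# Sketch of coarse target localization from the confidence score map (assumed shapes).
import torch
import torch.nn.functional as F

def predict_center(feature, filt, stride=16):
    """feature: (1, C, H, W) depth feature of the frame; filt: (1, C, kH, kW) learned filter."""
    score_map = F.conv2d(feature, filt)                       # (1, 1, H', W') confidence scores
    flat_idx = torch.argmax(score_map.view(-1))
    w = score_map.shape[-1]
    row = torch.div(flat_idx, w, rounding_mode="floor")
    col = flat_idx % w
    # Map the score-map cell with the maximum confidence back to image coordinates.
    return (int(row) * stride, int(col) * stride), score_map

center, scores = predict_center(torch.randn(1, 1024, 18, 18), torch.randn(1, 1024, 4, 4))
```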
Step 4, processing the depth features of the image to be tracked with the target frame regression branch in the trained twin network. The target frame regression branch is based on IoU-Net: a plurality of candidate target frames are generated around the target position in the image to be tracked, a modulation vector is obtained with a modulation network, and the IoU score (degree of overlap) of each candidate target frame is regressed; the candidate target frame with the maximum IoU score is the predicted target frame of the image to be tracked.
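The frame-refinement step can be pictured with the sketch below: candidate boxes are jittered around the predicted center and the one with the highest predicted IoU score is kept. The `iou_net` callable stands in for the modulation-based IoU predictor, and the perturbation magnitudes are assumptions.

```python
# Sketch of selecting the candidate box with the maximum predicted IoU score.
import torch

def refine_box(iou_net, frame_feat, prev_size, center, num_candidates=10):
    cx, cy = float(center[0]), float(center[1])
    w, h = float(prev_size[0]), float(prev_size[1])
    # Candidate boxes (cx, cy, w, h): random perturbations of position and scale.
    pos_noise = 5.0 * torch.randn(num_candidates, 2)           # pixel offsets (assumed scale)
    scale_noise = 1.0 + 0.1 * torch.randn(num_candidates, 2)   # relative size changes
    candidates = torch.cat([torch.tensor([cx, cy]) + pos_noise,
                            torch.tensor([w, h]) * scale_noise], dim=1)
    iou_scores = iou_net(frame_feat, candidates)               # one predicted IoU per candidate
    return candidates[torch.argmax(iou_scores)]

# Toy usage with a dummy IoU predictor that scores boxes at random.
dummy_iou_net = lambda feat, boxes: torch.rand(boxes.shape[0])
best = refine_box(dummy_iou_net, torch.randn(1, 2048, 9, 9), (64, 48), (120, 100))
```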
Step 5, after the trained twin network has tracked a preset number of frames of images to be tracked, updating the network parameters of the target center regression branch in the twin network with online update samples to obtain a newly trained twin network.
In order to further improve the target tracking effect, the network parameters need to be updated during practical application. The method stores every frame of the image to be tracked that is input into the twin network and takes the 50 most recently stored frames as online update samples. After the twin network has continuously tracked the preset number of frames, the target center regression branch is trained again with the online update samples, the network parameters of the filter are updated, and the updated twin network is used for target tracking in subsequent frames. The preset number of frames ranges from 5 to 20.
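A minimal sketch of this update schedule is given below. The buffer size of 50 frames and the 5-20 frame interval follow the text, while `retrain_center_branch` is an assumed placeholder for the branch re-training routine.

```python
# Sketch of the online update schedule for the target center regression branch.
from collections import deque

class OnlineUpdater:
    def __init__(self, update_interval=10, buffer_size=50):
        assert 5 <= update_interval <= 20          # preset frame number range from the text
        self.update_interval = update_interval
        self.buffer = deque(maxlen=buffer_size)    # the 50 most recently tracked frames
        self.frame_count = 0

    def step(self, frame_sample, tracker, retrain_center_branch):
        self.buffer.append(frame_sample)
        self.frame_count += 1
        if self.frame_count % self.update_interval == 0:
            # Re-train only the target center regression branch (the filter) on stored samples.
            retrain_center_branch(tracker, list(self.buffer))
```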
In order to avoid the deviation of the tracking effect caused by the choice of loss function in deep learning, the invention uses the alpha divergence as the loss function of the two branch networks for twin network training. The principle is to minimize the alpha divergence between the conditional probability distribution output by the network and the distribution of the true annotations, so that the predicted distribution approximates the true distribution. The calculation formula of the alpha divergence of the target center regression branch or the target frame regression branch is as follows:
D_α[p(y|y_i) || p(y|x_i,θ)] = (1/(α(1−α))) · (1 − ∫ p(y|y_i)^α · p(y|x_i,θ)^(1−α) dy)   (4)
wherein D_α[p(y|y_i)||p(y|x_i,θ)] denotes the alpha divergence between p(y|y_i) and p(y|x_i,θ), p(y|x_i,θ) represents the conditional probability distribution output by the target center regression branch or the target frame regression branch, p(y|y_i) represents the conditional probability distribution of the true annotation in the training image, and y represents the true value of the sample: in the target center regression branch y represents the true target position, and in the target frame regression branch y represents the true target frame. x_i represents the i-th training image, θ is a parameter of the target center regression branch or the target frame regression branch, and α is the control coefficient of the alpha divergence; by adjusting α manually, the predicted distribution can be made to fit the true distribution more closely. y_i represents the manual annotation in the i-th training image: in the target center regression branch y_i is the manually labeled target position, and in the target frame regression branch y_i is the manually labeled target frame. s_θ(y,x_i) represents the score output by the target center regression branch or the target frame regression branch for the sample (x_i, y), i.e. the output of the branch network at position y of image x_i; in the target center regression branch s_θ(y,x_i) is a confidence score, and in the target frame regression branch s_θ(y,x_i) is the degree of overlap, also called the IoU score. i = 1,2,…,n, and n is the number of training images in the training set.
p(y|x_i,θ) is defined as follows:
p(y|x_i,θ) = exp(s_θ(y,x_i)) / Z_θ(x_i)   (5)
Z_θ(x_i) = ∫ exp(s_θ(y,x_i)) dy   (6)
Given a network f_θ(·), the translational invariance of two-dimensional images can be exploited to parameterize the output efficiently as s_θ(y,x_i) = f_θ(x_i)(y).
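To make equations (4)-(6) concrete, the sketch below discretizes a score map, converts it into p(y|x_i,θ) = exp(s)/Z via a softmax, and evaluates the alpha divergence to a label distribution p(y|y_i). The 19x19 grid size and the Gaussian-shaped label are illustrative assumptions, not values taken from the patent.

```python
# Numerical sketch of equations (4)-(6) on a discretized grid (assumed sizes).
import torch

def alpha_divergence(p_label, p_pred, alpha=0.5, eps=1e-12):
    """D_alpha[p_label || p_pred] = 1/(alpha*(1-alpha)) * (1 - sum p_label^alpha * p_pred^(1-alpha))."""
    c = 1.0 / (alpha * (1.0 - alpha))
    overlap = torch.sum(p_label.clamp_min(eps) ** alpha * p_pred.clamp_min(eps) ** (1.0 - alpha))
    return c * (1.0 - overlap)

scores = torch.randn(19, 19)                           # s_theta(y, x_i) on a 19x19 grid
p_pred = torch.softmax(scores.view(-1), dim=0)         # p(y | x_i, theta) = exp(s) / Z_theta(x_i)

# Label distribution p(y | y_i): a Gaussian bump around the annotated position (9, 9).
yy, xx = torch.meshgrid(torch.arange(19.0), torch.arange(19.0), indexing="ij")
p_label = torch.exp(-((yy - 9) ** 2 + (xx - 9) ** 2) / (2 * 2.0 ** 2)).view(-1)
p_label = p_label / p_label.sum()

loss = alpha_divergence(p_label, p_pred, alpha=0.5)    # smaller when p_pred matches p_label
```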
As shown in fig. 3, the training process of the twin network is as follows:
and step A, constructing a basic framework of the twin network, wherein the specific framework is shown in figure 2.
B, obtaining a training set and a test set of the twin network, wherein the training set comprises a plurality of training images containing targets, and each training image contains an artificially labeled target position and a labeled target frame; the test set includes a plurality of test images containing the object.
When the subsequent training and testing operations are performed, the method selects the first three frames of images including the current frame as a group of training images and inputs them into the twin network, and selects the last three frames of images including the current frame as a group of test images and inputs them into the twin network for testing.
Step C, extracting the depth features of the training images in the training set with ResNet50, and transmitting the depth features to the target center regression branch and the target frame regression branch respectively. The features of the input training image are extracted with ResNet50; specifically, the features of the third and fourth layers of ResNet50 are used, pooled by the PrPool pooling layer, and then input into the branch networks.
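The pooling step can be pictured with the sketch below, which uses torchvision's roi_align as a stand-in for the PrPool layer mentioned in the text (an assumption, since PrPool is not part of torchvision); the feature-map size, ROI coordinates and output size are illustrative only.

```python
# Sketch of pooling backbone features over a target region before the branch networks.
import torch
from torchvision.ops import roi_align

feat_layer3 = torch.randn(1, 1024, 18, 18)            # assumed layer-3 feature map (stride 16)
# One region of interest: (batch_index, x1, y1, x2, y2) in image coordinates.
rois = torch.tensor([[0.0, 32.0, 32.0, 160.0, 160.0]])
pooled = roi_align(feat_layer3, rois, output_size=(5, 5), spatial_scale=1.0 / 16)
print(pooled.shape)                                    # torch.Size([1, 1024, 5, 5])
```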
Step D, processing the depth features of the training image with the target center regression branch to obtain the predicted target position of the training image, and using grid sampling to compute the alpha divergence of the target center regression branch for training.
The target-centric regression branch is used for target-centric regression, where an initialization layer is used to initialize the convolution kernel (i.e., filter parameters), and the optimization layer is used to update the filter.
Step D01, performing a convolution operation on the depth features of the training image with the target center regression branch to obtain a confidence score map, and selecting the position corresponding to the maximum confidence score as the predicted target position of the training image.
Step D02, computing the loss of the target center regression branch from the predicted target position and the manually labeled target position.
The invention uses the grid sampling method to approximate the integral in the alpha divergence: the confidence score map output by the target center regression branch is divided into K grid cells, giving a set of sampled target positions {y^(1), y^(2), …, y^(K)}, where y^(k) denotes the sampled target position of the k-th grid point.
The alpha divergence in formula (4) is expressed with the grid sampling method and used as the loss function of the target center regression branch; the corresponding loss function of the i-th training image in the target center regression branch is:
L_i = C · (1 − A Σ_{k=1}^{K} p(y^(k)|y_i)^α · p(y^(k)|x_i,θ)^(1−α)),   where p(y^(k)|x_i,θ) = exp(s_θ(y^(k),x_i)) / (A Σ_{j=1}^{K} exp(s_θ(y^(j),x_i)))
wherein L_i represents the corresponding loss function of the i-th training image in the target center regression branch, C is 1/(α(1−α)), A is the scaling factor of the grid sampling method, p(y^(k)|y_i) represents the conditional probability of the true annotation at the k-th grid point, and s_θ(y^(k),x_i) represents the confidence score output by the target center regression branch for the sample (x_i, y^(k)). The final loss function is the average loss over a mini-batch of samples.
Step D03, training the network parameters of the target center regression branch with the loss function L_i to obtain a filter for determining the target position.
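A minimal sketch of this loss computation follows: the integral over y is replaced by an area-weighted sum over the K grid points, and the partition function Z_θ(x_i) is approximated on the same grid. The tensor shapes and the stand-in label distribution are assumptions, so treat it as an illustration rather than the patent's implementation.

```python
# Sketch of the grid-sampled alpha-divergence loss for the target center regression branch.
import torch

def center_alpha_loss(scores, p_label, cell_area=1.0, alpha=0.5, eps=1e-12):
    """scores: (K,) confidence scores s_theta(y^(k), x_i) at the K grid points.
    p_label: (K,) conditional probability density p(y^(k) | y_i) of the annotation."""
    c = 1.0 / (alpha * (1.0 - alpha))
    z = cell_area * torch.sum(torch.exp(scores))             # grid estimate of Z_theta(x_i)
    p_pred = torch.exp(scores) / z                           # p(y^(k) | x_i, theta)
    overlap = cell_area * torch.sum(p_label.clamp_min(eps) ** alpha
                                    * p_pred.clamp_min(eps) ** (1.0 - alpha))
    return c * (1.0 - overlap)

# Example: a 19x19 confidence map flattened to K = 361 grid points with cell area A = 1.
scores = torch.randn(361, requires_grad=True)
p_label = torch.softmax(torch.randn(361), dim=0)             # stand-in label distribution
loss = center_alpha_loss(scores, p_label)
loss.backward()                                              # gradients flow back to the filter scores
```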
Step E, processing the depth features of the training image with the target frame regression branch to obtain the predicted target frame of the training image, and using Monte Carlo sampling to compute the alpha divergence of the target frame regression branch for training.
Step E01, processing the depth features of the training image with the target frame regression branch, generating a plurality of candidate target frames at the target position of the training image, calculating the degree of overlap between each candidate target frame and the real target frame, and selecting the best candidate target frame as the predicted target frame of the training image according to the degree of overlap.
Step E02, computing the loss of the target frame regression branch from the predicted target frame and the manually labeled target frame.
The degree of overlap output by the target frame regression branch is denoted s_θ(y,x), i.e. the predicted overlap between a candidate frame y and the real target border y_bb in training image x. The invention does not use the negative log-likelihood loss −log p(y_i|x_i,θ) = log(∫exp(s_θ(y,x_i))dy) − s_θ(y_i,x_i); instead, the alpha divergence in formula (4) is used as the loss function, and Monte Carlo sampling is used to compute it. Grid sampling is not used here because, in target frame regression, it causes a large amount of computation, is difficult to extend to high dimensions, and introduces sampling bias. The uncertainty of the target frame usually arises during manual annotation; especially for small targets, different annotators may produce different labels and thus introduce noise. The invention therefore assumes that, given the labeled target frame y_i, the probability distribution q(y|y_i) used to sample candidate true target frames is centered on the labeled target frame (a Gaussian mixture model centered on the labeled target frame, as discussed below).
Monte Carlo sampling is used to express the alpha divergence as the loss function of the target frame regression branch; the corresponding loss function of the i-th training image in the target frame regression branch is:
L'_i = C · (1 − (1/H) Σ_{h=1}^{H} p(y^(h)|y_i)^α · p(y^(h)|x_i,θ)^(1−α) / q(y^(h)|y_i))
wherein L'_i represents the corresponding loss function of the i-th training image in the target frame regression branch, C is 1/(α(1−α)), H is the number of Monte Carlo samples, y^(h) represents the true target frame drawn in the h-th sample, p(y^(h)|y_i) denotes the true probability distribution given the labeled target frame y_i, q(y^(h)|y_i) denotes the sampling distribution given the labeled target frame y_i, s_θ(y^(h),x_i) represents the overlap output by the target frame regression branch for the sample (x_i, y^(h)), and p(y^(h)|x_i,θ) is obtained from s_θ(y^(h),x_i) through formula (5).
When the distribution of the candidate target frames covers the high-probability regions of both the true conditional probability distribution of the sample and the predicted conditional probability distribution, a Gaussian mixture model centered on the labeled target frame is sufficient for the frame regression task.
Step E03, training the network parameters of the target frame regression branch with the loss function L'_i.
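A minimal sketch of the Monte Carlo loss follows. For simplicity it samples candidate boxes from a single Gaussian p(y|y_i) centered on the labeled box and uses the same distribution as the proposal q(y|y_i), so the importance-weighted sum reduces to a plain average; the Gaussian-mixture proposal described above and the exact normalization are not reproduced here, and all shapes and magnitudes are assumptions.

```python
# Sketch of the Monte-Carlo alpha-divergence loss for the target frame regression branch.
import torch

def box_alpha_loss(iou_scores, log_p_true, alpha=0.5):
    """iou_scores: (H,) predicted overlaps s_theta(y^(h), x_i) for boxes sampled from p(y | y_i);
    log_p_true: (H,) log p(y^(h) | y_i) of those samples under the annotation-noise model."""
    c = 1.0 / (alpha * (1.0 - alpha))
    h = iou_scores.numel()
    # Z_theta(x_i) estimated by importance sampling with proposal q = p(y | y_i).
    log_z = torch.logsumexp(iou_scores - log_p_true, dim=0) - torch.log(torch.tensor(float(h)))
    log_p_pred = iou_scores - log_z                          # log p(y^(h) | x_i, theta)
    ratio = torch.exp((1.0 - alpha) * (log_p_pred - log_p_true))
    return c * (1.0 - torch.mean(ratio))

# Example: H = 128 boxes jittered around the labeled box (normalized cx, cy, w, h).
H, sigma = 128, 0.05
labeled_box = torch.tensor([0.3, 0.4, 0.2, 0.25])
samples = labeled_box + sigma * torch.randn(H, 4)
log_p_true = torch.distributions.Normal(labeled_box, sigma).log_prob(samples).sum(dim=1)
iou_scores = torch.randn(H, requires_grad=True)              # stand-in for the branch output
loss = box_alpha_loss(iou_scores, log_p_true)
loss.backward()
```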
Step F, determining the network parameters of the twin network through alpha-divergence training to obtain the trained twin network.
Step G, testing the trained twin network with the test set.
To verify the effectiveness of the method of the invention, a set of comparative experiments is given below:
the hardware of the comparison experiment adopts two RTX 2080Ti display cards, one 12-core, two-process CPUs of each core and a 64G server running a memory to train and experiment.
First, the twin network is trained for 50 epochs on the COCO, GOT10K, LaSOT and TrackingNet datasets, with 1000 iterations per epoch, to obtain the final network parameters. Then the method is compared with trackers such as STRCF, LADCF, ECO-HC, GFSDCF, ARCF-H, ARCF-HC, AutoTrack, SAMF, KCF, DSST, HOG-LR, BACF, Staple+CA, SRDCF and SAMF+AT on the OTB100 and UAV123 datasets. The accuracy and success rate of the method and the comparison trackers on the two datasets are shown in FIGS. 4-7.
As can be seen from FIGS. 4 and 5, the tracking performance of the method (alphaTK) on the OTB100 dataset is significantly better than that of the other comparison trackers: 0.3% higher than the second-best STRCF on the accuracy curve and 2% higher than the second-best LADCF on the success-rate curve. As can be seen from FIGS. 6 and 7, the accuracy and success rate of the method on the UAV123 dataset are also higher than those of the other comparison trackers: 4.4% higher than the second-best AutoTrack on the accuracy curve and 14.1% higher than the second-best on the success-rate curve. The comparative experiments show that the method achieves good target tracking performance on both OTB100 and UAV123, with particularly prominent performance on UAV123.
From a probabilistic perspective, the method uses conditional probability distributions as the output of the twin network and the alpha divergence as the network loss function; by training on large datasets, the network output distribution is fitted to the distribution of the true annotations, which compensates for the uncertainty of manually annotated target regions and the noise introduced by manual annotation, reduces tracking interference, and improves the robustness and accuracy of target tracking. Within the twin network structure, the method solves the alpha divergence of the target center regression branch and the target frame regression branch with a grid sampling method and a Monte Carlo sampling method respectively, so that the twin network can use the alpha divergence directly without being affected by the choice of loss function. The target tracking performance of the method is better than that of existing trackers, and the method has very broad application prospects.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be considered as the protection scope of the present invention.

Claims (8)

1. A twin network tracking method based on alpha divergence is characterized by comprising the following steps:
acquiring an image to be tracked and a well-trained twin network, wherein the twin network is trained based on alpha divergence;
extracting the depth features of the image to be tracked by using ResNet50 in the trained twin network;
processing the depth characteristics of the image to be tracked by using a target center regression branch in the trained twin network to obtain the predicted target position of the image to be tracked;
processing the depth characteristics of the image to be tracked by using the target frame regression branch in the trained twin network to obtain a predicted target frame of the image to be tracked;
the twin network training process is as follows:
constructing a basic framework of a twin network, wherein the twin network comprises a main network adopting ResNet50, a target center regression branch and a target frame regression branch;
obtaining a training set and a test set of the twin network, wherein the training set or the test set comprises a plurality of training images or test images containing targets;
extracting depth features of training images in a training set by using ResNet50, and respectively transmitting the depth features to a target center regression branch and a target frame regression branch;
processing the depth features of the training image by using the target center regression branch to obtain the predicted target position of the training image, and using grid sampling to compute the alpha divergence of the target center regression branch for training;
processing the depth features of the training image by using the target frame regression branch to obtain the predicted target frame of the training image, and using Monte Carlo sampling to compute the alpha divergence of the target frame regression branch for training;
determining network parameters of the twin network through alpha divergence training to obtain a trained twin network;
and testing the trained twin network by using a test set.
2. The method as claimed in claim 1, wherein, during the twin network training process, the first three frames of images including the current frame are selected as a set of training images and input into the twin network, and the last three frames of images including the current frame are selected as a set of test images and input into the twin network for testing.
3. The method as claimed in claim 1, wherein an initialization layer is used in said target center regression branch to initialize the convolution kernel, and an optimization layer is used to update the filter; and the target frame regression branch, based on IoU-Net, applies a fully connected layer to the depth features of the training image or the test image to obtain a modulation vector, and then regresses the degree of overlap between each candidate window and the real target frame.
4. The twin network tracking method based on alpha divergence according to claim 1, wherein the calculation formula of the alpha divergence of the target center regression branch or the target frame regression branch is as follows:
D_α[p(y|y_i) || p(y|x_i,θ)] = (1/(α(1−α))) · (1 − ∫ p(y|y_i)^α · p(y|x_i,θ)^(1−α) dy)
wherein p(y|x_i,θ) represents the conditional probability distribution output by the target center regression branch or the target frame regression branch, p(y|y_i) represents the conditional probability distribution of the true annotation in the training image, D_α[p(y|y_i)||p(y|x_i,θ)] denotes the alpha divergence between p(y|y_i) and p(y|x_i,θ), y represents the true target position or the true target frame, x_i represents the i-th training image, θ is a parameter of the target center regression branch or the target frame regression branch, α is the control coefficient of the alpha divergence, y_i represents the manually labeled target position or labeled target frame in the i-th training image, s_θ(y,x_i) represents the score output by the target center regression branch or the target frame regression branch for the sample (x_i, y), i = 1,2,…,n, and n is the number of training images in the training set.
5. The twin network tracking method based on alpha divergence according to claim 1, wherein the method for training the alpha divergence of the regression branch at the center of the target by using grid sampling comprises:
dividing the confidence score map output by the target center regression branch into K grid cells, so that the integral over y can be approximated by a sum over the set of sampled target positions {y^(1), y^(2), …, y^(K)}, where y^(k) denotes the sampled target position of the k-th grid point;
and expressing the alpha divergence by using a grid sampling method and using the alpha divergence as a loss function of the target center regression branch, wherein the expression of the corresponding loss function of the ith training image in the target center regression branch is as follows:
L_i = C · (1 − A Σ_{k=1}^{K} p(y^(k)|y_i)^α · p(y^(k)|x_i,θ)^(1−α)),   where p(y^(k)|x_i,θ) = exp(s_θ(y^(k),x_i)) / (A Σ_{j=1}^{K} exp(s_θ(y^(j),x_i)))
wherein L_i represents the corresponding loss function of the i-th training image in the target center regression branch, C is 1/(α(1−α)), α is the control coefficient of the alpha divergence, A is the scaling factor of the grid sampling method, p(y^(k)|y_i) represents the conditional probability of the true annotation at the k-th grid point, s_θ(y^(k),x_i) represents the confidence score output by the target center regression branch for the sample (x_i, y^(k)), x_i represents the i-th training image, θ is a parameter of the target center regression branch, i = 1,2,…,n, and n is the number of training images in the training set;
and training the network parameters of the target center regression branch with the loss function L_i to obtain a filter for determining the target position.
6. The twin network tracking method based on alpha divergence according to claim 1, wherein the method for training the alpha divergence of the regression branch of the target frame by using Monte Carlo sampling comprises:
and utilizing Monte Carlo sampling to represent alpha divergence and taking the alpha divergence as a loss function of the regression branch of the target frame, wherein the expression of the corresponding loss function of the ith training image in the regression branch of the target frame is as follows:
L'_i = C · (1 − (1/H) Σ_{h=1}^{H} p(y^(h)|y_i)^α · p(y^(h)|x_i,θ)^(1−α) / q(y^(h)|y_i))
wherein L'_i represents the corresponding loss function of the i-th training image in the target frame regression branch, C is 1/(α(1−α)), α is the control coefficient of the alpha divergence, H is the number of Monte Carlo samples, y^(h) represents the true target frame drawn in the h-th sample, p(y^(h)|y_i) denotes the true probability distribution given the labeled target frame y_i, q(y^(h)|y_i) denotes the sampling distribution given the labeled target frame y_i, s_θ(y^(h),x_i) represents the overlap output by the target frame regression branch for the sample (x_i, y^(h)), p(y^(h)|x_i,θ) represents the conditional probability distribution output by the target frame regression branch evaluated at y^(h), x_i represents the i-th training image, θ is a parameter of the target frame regression branch, i = 1,2,…,n, and n is the number of training images in the training set;
and training the network parameters of the target frame regression branch with the loss function L'_i.
7. The method of alpha divergence based twin network tracking as claimed in claim 1, further comprising the steps of:
and after the trained twin network tracks the images to be tracked with the preset frame number, updating the network parameters of the target center regression branch in the twin network by using the online updating sample to obtain a new trained twin network.
8. The twin network tracking method based on alpha divergence as claimed in claim 7, wherein the value range of said preset number of frames is 5-20.
CN202110556609.7A 2021-05-21 2021-05-21 Twin network tracking method based on alpha divergence Active CN113298136B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110556609.7A CN113298136B (en) 2021-05-21 2021-05-21 Twin network tracking method based on alpha divergence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110556609.7A CN113298136B (en) 2021-05-21 2021-05-21 Twin network tracking method based on alpha divergence

Publications (2)

Publication Number Publication Date
CN113298136A CN113298136A (en) 2021-08-24
CN113298136B true CN113298136B (en) 2022-08-05

Family

ID=77323619

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110556609.7A Active CN113298136B (en) 2021-05-21 2021-05-21 Twin network tracking method based on alpha divergence

Country Status (1)

Country Link
CN (1) CN113298136B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116229052B (en) * 2023-05-09 2023-07-25 浩鲸云计算科技股份有限公司 Method for detecting state change of substation equipment based on twin network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
CN112712546A (en) * 2020-12-21 2021-04-27 吉林大学 Target tracking method based on twin neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
CN112712546A (en) * 2020-12-21 2021-04-27 吉林大学 Target tracking method based on twin neural network

Also Published As

Publication number Publication date
CN113298136A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN113610126B (en) Label-free knowledge distillation method based on multi-target detection model and storage medium
Lim et al. Isolated sign language recognition using convolutional neural network hand modelling and hand energy image
Yuan et al. Robust visual tracking with correlation filters and metric learning
CN108256421A (en) A kind of dynamic gesture sequence real-time identification method, system and device
CN109859241B (en) Adaptive feature selection and time consistency robust correlation filtering visual tracking method
CN104484890B (en) Video target tracking method based on compound sparse model
CN106909938B (en) Visual angle independence behavior identification method based on deep learning network
CN104463191A (en) Robot visual processing method based on attention mechanism
CN112307995A (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
CN109508686B (en) Human behavior recognition method based on hierarchical feature subspace learning
CN113298136B (en) Twin network tracking method based on alpha divergence
CN110084834B (en) Target tracking method based on rapid tensor singular value decomposition feature dimension reduction
Tang et al. Transound: Hyper-head attention transformer for birds sound recognition
Yao RETRACTED ARTICLE: Deep learning analysis of human behaviour recognition based on convolutional neural network analysis
Huang et al. BSCF: Learning background suppressed correlation filter tracker for wireless multimedia sensor networks
CN103996207A (en) Object tracking method
CN114038011A (en) Method for detecting abnormal behaviors of human body in indoor scene
Liu et al. Key algorithm for human motion recognition in virtual reality video sequences based on hidden markov model
Ikram et al. Real time hand gesture recognition using leap motion controller based on CNN-SVM architechture
Guo et al. An adaptive kernelized correlation filters with multiple features in the tracking application
Zhang et al. Robust correlation tracking in unmanned aerial vehicle videos via deep target-specific rectification networks
CN116343335A (en) Motion gesture correction method based on motion recognition
Zhou et al. Hybrid generative-discriminative learning for online tracking of sperm cell
CN110659576A (en) Pedestrian searching method and device based on joint judgment and generation learning
CN109492530A (en) Robustness vision object tracking algorithm based on the multiple dimensioned space-time characteristic of depth

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant