CN113888595B - Twin network single-target visual tracking method based on difficult sample mining - Google Patents
Info
- Publication number
- CN113888595B (application CN202111152770.4A)
- Authority
- CN
- China
- Prior art keywords
- target
- image
- sample
- images
- difficult
- Prior art date
- 2021-09-29
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Abstract
The invention discloses a twin network single-target tracking method based on difficult sample mining, comprising steps such as constructing a training set and constructing a convolutional twin network based on difficult sample mining. The invention introduces difficult sample mining into the target tracking method: difficult negative samples are mined as training data during training and used to update the network parameters, and a difficult-sample triplet loss is chosen as the loss function so that the difficult samples are continuously optimized. By optimizing this loss, the model keeps mining difficult negative samples throughout training, the network is fully trained, similar targets are better distinguished, the model learns discriminative features, and the target tracking effect is improved.
Description
Technical Field
The invention belongs to the technical field of computer vision, relates to an image processing technology, and particularly relates to a twin network single target tracking method based on difficult sample mining.
Background
Single-target visual tracking is one of the popular yet challenging research topics in computer vision. It is widely used in intelligent video surveillance, robot visual navigation, medical diagnosis, positioning and tracking of underwater organisms, and the like, and has broad development prospects. Visual target tracking means designating the target to be tracked in the first frame of a video sequence and calibrating its initial position, and then predicting the position and size of the target in subsequent frames so as to track it accurately.
Early classical algorithms all operate in the time domain; they involve complex computations, and the heavy computational load makes tracking poorly real-time. Algorithms based on correlation filtering then appeared: by introducing correlation filtering, these target tracking methods convert the computation into the frequency domain, which greatly reduces the amount of operations and greatly improves the speed. With the development of deep learning, researchers have introduced deep learning techniques into target tracking, and a series of methods have been proposed and have achieved good results.
In recent years, methods that track targets with a twin network have received unprecedented attention. Existing methods use a convolutional neural network to extract features for target modeling. In the target tracking process, offline training on the tracked target is one of the keys to the performance of a tracking model, and the selection of training data is particularly important during offline training. Existing twin-network-based methods only use the target region: the features extracted from the target region are directly correlated with the features of the test frame image, so the robustness is poor, complex scenes such as similar objects cannot be handled, and the discrimination capability is insufficient. Existing methods usually label an instance as positive when its coordinate distance to the object is small and as negative otherwise, and then maximize the similarity score of positive instance pairs and minimize that of negative instance pairs through a logistic loss. Such methods only use the pairwise relation between sample pairs, ignore the potential relation among the prototype, the positive instances and the negative instances, and do not consider the effect of difficult samples on the model, so they cannot handle complex scenes such as similar objects; the importance of difficult samples has already been demonstrated by researchers in fields such as object recognition.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a twin network single-target tracking method based on difficult sample mining. Difficult sample mining is introduced into the target tracking method: difficult negative samples are mined as training data during training and used to update the network parameters, and the difficult-sample triplet loss is chosen as the loss function. By optimizing this loss, the model keeps mining difficult negative samples during training, so that the network is fully trained, similar targets are better distinguished, and the model learns discriminative features.
In order to solve the technical problems, the invention adopts the following technical scheme:
a twin network single target tracking method based on difficult sample mining comprises the following steps:
Step (1), constructing a training set: cutting out a target template image Z and a search area image X for every image in an image sequence training set according to the target position and size of the image, dividing the search area images X into positive example images P and negative example images N, the image Z and the image P forming a positive sample pair, the image Z and the image N forming a negative sample pair, and the (Z, P, N) triplets formed by the target template images Z, the positive example images P and the negative example images N constituting a training data set;
Step (2), constructing a convolution twin network based on difficult sample mining, wherein the network comprises three branches and the three branches share weights of a feature extraction network; the three branches are respectively used for acquiring a feature map of a target template image, a feature map of a positive sample image of a search area and a feature map of a negative sample image, wherein during feature extraction, a difficult sample is defined, and difficult sample mining is introduced to learn features with distinguishing capability;
step (3), performing a cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest score in the response map is taken as the position most similar to the target object, and the response map is enlarged to the original image size, thereby determining the position of the target on the image to be searched;
Step (4), training a twin network based on difficult sample mining based on the training set in the step (1) to obtain a training convergence twin network;
and (5) performing online target tracking by utilizing the trained twin network.
Further, the operation of step (1) includes cropping the target region template image and cropping the search region image; the target template image is cropped as follows: the target frame of the template image is known in target tracking, a square region centered on the tracked target is cut out, the center of the target region represents the target position, the four sides of the target frame are each expanded by q pixels, and finally the cropped target image block is scaled; the search region image is cropped as follows: centered on the target region, the four sides of the target frame are each expanded by 2q pixels, and the cropped search region image block is then scaled; where q = (w+h)/4, w is the width of the target frame and h is the height of the target frame.
Further, in step (2), the feature extraction networks of the different branches of the twin network are all a fine-tuned ResNet-50, and features of the input image are extracted through the ResNet-50.
Further, the positive sample pair is an image pair with similar visual characteristics and high reference contrast, and the negative sample pair is an image pair with similar visual characteristics and low reference contrast; the difficult samples in the dataset are defined as:
P = {(i, j) | S_v(x_i, x_j) ≥ α, S_c(y_i, y_j) ≥ β}
N = {(m, n) | S_v(x_m, x_n) ≥ α, S_c(y_m, y_n) < β}
wherein S_v denotes the visual feature similarity, S_c denotes the reference contrast similarity, α denotes the threshold of the visual feature similarity, and β denotes the threshold of the reference contrast similarity;
When selecting pictures from the training set for training, for each picture the least similar positive sample and the most similar negative sample are selected to form a triplet, and the difficult-sample triplet loss is calculated; the difficult-sample triplet loss is defined as:
L_hard = Σ_{i=1..M} Σ_{a=1..N} ( max d_{A,P} − min d_{A,N} + θ )_+
wherein M denotes the M targets selected in each batch of samples, N denotes the N pictures randomly selected from each target, (z)_+ denotes max(z, 0), z denotes max d_{A,P} − min d_{A,N} + θ, θ is a threshold parameter set according to actual needs, d_{A,P} denotes the distance between the template sample and the positive sample, and d_{A,N} denotes the distance between the template sample and the negative sample;
By optimizing the loss L_hard, the model continuously mines positive sample pairs and difficult negative samples during training and learns discriminative features.
Further, step (3) operates as follows: after feature extraction, features from different layers are fused; the lower-layer features carry more target position information and the higher-layer features carry more semantic information, so the higher-layer features are first up-sampled and then fused with the lower-layer features, and the multi-layer fused feature maps of the different branches are generated iteratively; the target template image feature map is cross-correlated with the positive sample image feature map and with the negative sample image feature map of the search area respectively to obtain response maps, the response maps are enlarged to the original image size, and the position of the target on the image to be searched is determined.
Further, the specific operation of step (4) is as follows:
1) Training with the initial positive and negative samples, so that Z is pulled close to P and pushed away from N, to obtain a trained classifier;
2) Classifying the samples with the trained classifier, putting the misclassified samples into the negative sample subset as difficult negative samples, and then continuing to train the classifier;
3) The process is repeated until the performance of the classifier is no longer improved.
Further, the online tracking process in step (5) includes the following steps:
1) Reading the first frame picture of the video sequence to be tracked and obtaining its bounding box information, cutting out the target template image Z of the first frame according to the target template cropping method of step (1), inputting Z into the template branch of the training-converged twin network of step (4), extracting and fusing the multi-layer features of the template image, and then setting t = 2;
2) Reading the t-th frame of the video to be tracked, cutting out the search area image of the t-th frame according to the target position determined in frame t−1 and the search area cropping method of step (1), inputting the cropped t-th frame search area image into the search branch of the training-converged twin network of step (4), and extracting the features of the t-th frame search image;
3) Performing cross-correlation operation on the feature map obtained in the step 1) after multi-layer fusion and the feature map obtained in the step 2);
4) Setting t = t + 1 and judging whether t ≤ T, where T is the total number of frames of the video sequence to be tracked; if so, executing steps 2)-3); otherwise, ending the tracking process of the video sequence.
Compared with the prior art, the invention has the advantages that:
Aiming at the problem that existing twin network target tracking methods do not consider the effect of difficult samples on the model, a twin network target tracking method based on difficult sample mining is designed: difficult sample mining is introduced into the target tracking twin network structure, difficult negative samples are mined as training data during training, and the difficult-sample triplet loss is chosen as the loss function, so that the model is continuously optimized, learns discriminative features, and achieves a good target tracking effect.
Specifically, during training the initial positive and negative samples are used first; the trained classifier then classifies the samples, the misclassified samples are put into the negative sample subset as difficult negative samples, training continues, and this is repeated until the performance of the classifier no longer improves. Unlike conventional triplet training samples, the invention selects difficult-sample triplets and uses the difficult samples to update the network parameters during training: for each picture, the least similar positive sample and the most similar negative sample are selected to compute the difficult triplet loss. By optimizing this loss, the model keeps mining difficult negative samples during training, so the network is fully trained, similar targets are better distinguished, problems such as local change and background interference in images are handled, and the learned model has stronger generalization capability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic overall flow chart of the present invention;
FIG. 2 is a schematic diagram of a difficult sample mining strategy architecture according to the present invention;
FIG. 3 is a tracking effect of object tracking for a first video sequence using the method of the present invention;
fig. 4 is a tracking effect of object tracking for a second video sequence using the method of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples.
In combination with the overall flow shown in fig. 1, the twin network single target tracking method based on difficult sample mining comprises the following steps:
And (1) constructing a training set.
According to the target position and size of each image, a target template image Z and a search area image X are cut out for every image in the image sequence training set; the search area images X are divided into positive example images P and negative example images N; the image Z and the image P form a positive sample pair, the image Z and the image N form a negative sample pair, and the (Z, P, N) triplets formed by the target template images Z, the positive example images P and the negative example images N constitute the training data set.
Specifically, the operation of step (1) includes cropping the target area template image and cropping the search area image. The target template image is cropped as follows: the target frame of the template image is known in target tracking; a square region centered on the tracked target is cut out, with the center of the target area representing the target position; the four sides of the target frame are each expanded by q pixels, and finally the cropped target image block is scaled to 127×127. The search area image is cropped as follows: centered on the target area, the four sides of the target frame are each expanded by 2q pixels, and the cropped search area image block is then scaled to 255×255; here q = (w+h)/4, where w is the width of the target frame and h is the height of the target frame.
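Purely as a non-limiting illustration, the cropping rule above can be sketched as follows; the choice of the square side as the larger padded dimension, the mean-value padding for out-of-image pixels and the OpenCV-based resizing are assumptions of this sketch rather than features fixed by the embodiment:

```python
import cv2  # assumed available for padding and resizing


def crop_square_region(img, cx, cy, w, h, pad_factor, out_size):
    """Crop a square region centred on the target and rescale it.

    pad_factor=1 reproduces the template crop (q pixels of context per side,
    rescaled to out_size=127); pad_factor=2 the search crop (2q per side,
    out_size=255), with q = (w + h) / 4.
    """
    q = (w + h) / 4.0
    side = int(round(max(w, h) + 2 * pad_factor * q))
    half = side // 2
    x0, y0 = int(round(cx)) - half, int(round(cy)) - half

    # Pad the frame with its channel-wise mean so the crop never runs off the edge.
    pad = max(0, -x0, -y0, x0 + side - img.shape[1], y0 + side - img.shape[0])
    if pad > 0:
        mean = tuple(float(m) for m in img.mean(axis=(0, 1)))
        img = cv2.copyMakeBorder(img, pad, pad, pad, pad,
                                 cv2.BORDER_CONSTANT, value=mean)
        x0, y0 = x0 + pad, y0 + pad

    patch = img[y0:y0 + side, x0:x0 + side]
    return cv2.resize(patch, (out_size, out_size))


# Template Z and search region X are cut from the same annotated frame:
# z_img = crop_square_region(frame, cx, cy, w, h, pad_factor=1, out_size=127)
# x_img = crop_square_region(frame, cx, cy, w, h, pad_factor=2, out_size=255)
```

The template crop of the first frame and the search crops of later frames can both reuse this routine, differing only in the padding factor and output size.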
And (2) constructing a convolution twin network based on difficult sample mining, and obtaining feature graphs of different branches.
The network contains three branches that share the weights of the feature extraction network; the three branches are used to obtain the feature map of the target template image, the feature map of the positive sample image of the search area and the feature map of the negative sample image respectively; during feature extraction, difficult samples are defined and difficult sample mining is introduced to learn discriminative features.
Specifically, the feature extraction networks of the different branches of the twin network in step (2) are all a fine-tuned ResNet-50, and features are extracted from the input image by the ResNet-50.
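A minimal sketch of the shared-weight, three-branch feature extractor is given below; the exact truncation point of ResNet-50 and the module names are assumptions of the sketch (the embodiment only requires that one fine-tuned ResNet-50 be shared by all three branches):

```python
import torch.nn as nn
from torchvision.models import resnet50


class TripletSiameseBackbone(nn.Module):
    """One ResNet-50 trunk shared by the template, positive and negative branches."""

    def __init__(self):
        super().__init__()
        trunk = resnet50()  # in practice initialised from pretrained weights and fine-tuned
        # Keep only the convolutional stages; dropping avgpool/fc is an assumption.
        self.features = nn.Sequential(*list(trunk.children())[:-2])

    def extract(self, x):
        return self.features(x)

    def forward(self, z, p, n):
        # Weight sharing: the same trunk processes all three inputs.
        return self.extract(z), self.extract(p), self.extract(n)
```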
Difficult sample mining is introduced to learn discriminative features. In connection with the difficult sample mining strategy of the invention shown in fig. 2, the invention obtains valid difficult sample pairs from both visual feature similarity and reference contrast similarity: image pairs with similar visual features and high reference contrast are defined as positive sample pairs, and image pairs with similar visual features and low reference contrast are defined as negative sample pairs.
The difficult samples in the dataset are defined as:
P = {(i, j) | S_v(x_i, x_j) ≥ α, S_c(y_i, y_j) ≥ β}
N = {(m, n) | S_v(x_m, x_n) ≥ α, S_c(y_m, y_n) < β}
wherein S_v denotes the visual feature similarity, S_c denotes the reference contrast similarity, α denotes the threshold of the visual feature similarity, and β denotes the threshold of the reference contrast similarity.
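For illustration only, the construction of the positive pair set P and the negative pair set N from the two similarity measures can be sketched as follows; the concrete forms of S_v and S_c and the threshold values α and β are left open by the embodiment, so they appear here simply as inputs:

```python
def build_hard_pairs(visual_sim, contrast_sim, alpha, beta):
    """Split candidate pairs into positive pairs P and hard negative pairs N.

    visual_sim[i][j]   -- visual feature similarity S_v(x_i, x_j)
    contrast_sim[i][j] -- reference contrast similarity S_c(y_i, y_j)
    A pair is kept only if it is visually similar (>= alpha); it is a positive
    pair if its reference contrast is also high (>= beta), otherwise a hard
    negative pair.
    """
    positives, negatives = [], []
    n = len(visual_sim)
    for i in range(n):
        for j in range(n):
            if i == j or visual_sim[i][j] < alpha:
                continue
            if contrast_sim[i][j] >= beta:
                positives.append((i, j))
            else:
                negatives.append((i, j))
    return positives, negatives
```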
Conventional triplet sampling draws three pictures from the training data, which is simple, but most of the sampled pairs are simple, easily distinguished sample pairs; if a large proportion of the training pairs are simple pairs, it is not conducive to the network learning good features. Therefore, when pictures are selected from the training set for training, for each picture the least similar positive sample and the most similar negative sample are selected to form a triplet, and the difficult-sample triplet loss is calculated.
The difficult sample triplet loss is defined as:
L_hard = Σ_{i=1..M} Σ_{a=1..N} ( max d_{A,P} − min d_{A,N} + θ )_+
wherein M denotes the M targets selected in each batch of samples, N denotes the N pictures randomly selected from each target, (z)_+ denotes max(z, 0), z denotes max d_{A,P} − min d_{A,N} + θ, θ is a threshold parameter set according to actual needs, d_{A,P} denotes the distance between the template sample and the positive sample, and d_{A,N} denotes the distance between the template sample and the negative sample.
By optimizing the loss L_hard, the model continuously mines positive sample pairs and difficult negative samples during training and learns discriminative features.
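A hedged PyTorch sketch of this batch-hard reading of the loss follows: each batch holds M targets with N images each, and every anchor is paired with its least similar positive and most similar negative within the batch. The Euclidean distance, the averaging over anchors and the example margin value are assumptions of the sketch:

```python
import torch


def hard_triplet_loss(embeddings, labels, theta=0.3):
    """Batch-hard triplet loss over a batch of M targets x N images per target.

    embeddings : (M*N, D) feature vectors, labels : (M*N,) target identities.
    For every anchor, d_{A,P} is taken over its least similar positive (max
    distance) and d_{A,N} over its most similar negative (min distance); the
    hinge (.)_+ keeps only the positive part.
    """
    dist = torch.cdist(embeddings, embeddings, p=2)            # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)          # same-target mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    d_ap = dist.masked_fill(~same | eye, float("-inf")).max(dim=1).values  # hardest positive
    d_an = dist.masked_fill(same, float("inf")).min(dim=1).values          # hardest negative
    return torch.clamp(d_ap - d_an + theta, min=0).mean()
```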
And (3) performing a cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest score in the response map is taken as the position most similar to the target object, thereby determining the position of the target.
Specifically, step (3) operates as follows: after feature extraction, features from different layers are fused; the lower-layer features carry more target position information and the higher-layer features carry more semantic information, so the higher-layer features are first up-sampled and then fused with the lower-layer features, and the multi-layer fused feature maps of the different branches are generated iteratively. The target template image feature map is cross-correlated with the positive sample image feature map and with the negative sample image feature map of the search area respectively to obtain response maps. The response maps are enlarged to the original image size so as to determine the position of the target on the image to be searched.
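The upsample-and-add fusion and the cross-correlation can be expressed compactly with standard tensor operations; the following sketch assumes equal channel counts for the fused layers, bilinear upsampling, and a depthwise correlation followed by a channel sum, none of which is mandated by the embodiment:

```python
import torch.nn.functional as F


def fuse_layers(low, high):
    """Up-sample the semantically richer high-level map and add it to the
    low-level map, which carries more positional detail (equal channel counts
    are assumed; a 1x1 convolution would normally align them)."""
    high_up = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                            align_corners=False)
    return low + high_up


def cross_correlation(template_feat, search_feat):
    """Slide the template feature map over the search feature map.

    template_feat: (B, C, Hz, Wz), search_feat: (B, C, Hx, Wx).  Implemented
    as a grouped convolution so each sample is correlated with its own
    template; summing over channels yields a (B, 1, Ho, Wo) response map.
    """
    b, c, hz, wz = template_feat.shape
    search = search_feat.reshape(1, b * c, *search_feat.shape[-2:])
    kernel = template_feat.reshape(b * c, 1, hz, wz)
    resp = F.conv2d(search, kernel, groups=b * c)   # depthwise correlation
    return resp.reshape(b, c, *resp.shape[-2:]).sum(dim=1, keepdim=True)
```

The resulting response map would then be interpolated back to the search-image resolution, and its peak gives the predicted target position.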
And (4) training the twin network based on difficult sample mining based on the training set in the step (1) to obtain a training convergence twin network.
Specifically, the specific operation of step (4) is as follows:
1) Training with the initial positive and negative samples, so that Z is pulled close to P and pushed away from N, to obtain a trained classifier;
2) Classifying the samples with the trained classifier, putting the misclassified samples into the negative sample subset as difficult negative samples, and then continuing to train the classifier;
3) The process is repeated until the performance of the classifier is no longer improved.
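The iterative mining procedure of steps 1)-3) can be written schematically as the loop below; train_fn, misclassified_fn and evaluate_fn are placeholders (assumptions of this sketch) for one triplet-loss training pass, the classification of candidate samples, and the measurement of classifier performance:

```python
def train_with_hard_negative_mining(model, train_fn, misclassified_fn, evaluate_fn,
                                    positives, initial_negatives, candidate_pool,
                                    max_rounds=10, tol=1e-4):
    """Steps 1)-3): train, mine misclassified candidates as hard negatives, retrain.

    train_fn(model, pos, neg)             -- one training pass with the triplet loss
    misclassified_fn(model, pool) -> list -- samples the current classifier gets wrong
    evaluate_fn(model) -> float           -- validation score of the classifier
    """
    neg_subset = list(initial_negatives)
    best = float("-inf")
    for _ in range(max_rounds):
        train_fn(model, positives, neg_subset)        # pull Z towards P, push it from N
        hard_negatives = misclassified_fn(model, candidate_pool)
        neg_subset.extend(s for s in hard_negatives if s not in neg_subset)
        score = evaluate_fn(model)
        if score <= best + tol:                       # performance no longer improves
            break
        best = score
    return model
```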
And (5) performing online target tracking by utilizing the trained twin network.
Specifically, the online tracking process in step (5) includes the following steps:
1) Reading the first frame picture of the video sequence to be tracked and obtaining its bounding box information; cutting out the target template image Z of the first frame according to the target template cropping method of step (1), inputting Z into the template branch of the training-converged twin network of step (4), extracting and fusing the multi-layer features of the template image, and then setting t = 2.
2) Reading the t-th frame of the video to be tracked; cutting out the search area image of the t-th frame according to the target position determined in frame t−1 and the search area cropping method of step (1), inputting the cropped t-th frame search area image into the search branch of the training-converged twin network of step (4), and extracting the features of the t-th frame search image.
3) Performing the cross-correlation operation between the multi-layer fused feature map obtained in step 1) and the feature map obtained in step 2).
4) Setting t = t + 1 and judging whether t ≤ T, where T is the total number of frames of the video sequence to be tracked; if so, executing steps 2)-3); otherwise, ending the tracking process of the video sequence.
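Putting steps 1)-4) together, a per-frame tracking loop might look like the sketch below; the helper names reuse the earlier sketches, and the tensor shapes returned by the cropping helpers, the bicubic upsampling of the response map and the simplified peak-to-coordinate conversion are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def track_sequence(frames, init_box, extract, xcorr, crop_template, crop_search):
    """Online tracking loop: frame 1 fixes the template, frames 2..T are searched.

    frames    -- list of decoded video frames, the first one annotated
    init_box  -- (cx, cy, w, h) of the target in the first frame
    extract   -- fused multi-layer feature extractor shared by both branches
    xcorr     -- cross-correlation returning a (1, 1, Ho, Wo) response map
    crop_*    -- helpers returning (1, 3, 127, 127) / (1, 3, 255, 255) tensors
    The rescaling of the peak displacement from crop coordinates back to frame
    coordinates is simplified away.
    """
    cx, cy, w, h = init_box
    z_feat = extract(crop_template(frames[0], cx, cy, w, h))   # template from frame 1 only
    boxes = [(cx, cy, w, h)]

    for frame in frames[1:]:                                   # t = 2 .. T
        x = crop_search(frame, cx, cy, w, h)                   # centred on previous position
        response = xcorr(z_feat, extract(x))
        up = F.interpolate(response, size=(255, 255),
                           mode="bicubic", align_corners=False)  # back to search-image size
        peak = int(torch.argmax(up.flatten()))
        dy, dx = divmod(peak, up.shape[-1])
        cx, cy = cx + (dx - 127), cy + (dy - 127)              # offset from the crop centre
        boxes.append((cx, cy, w, h))
    return boxes
```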
Fig. 3 is a tracking effect of object tracking for a first video sequence using the method of the present invention. It can be seen that the target tracking method provided by the invention can effectively track targets with similar background interference.
Fig. 4 is a tracking effect of object tracking for a second video sequence using the method of the present invention. It can be seen that the target tracking method provided by the invention can effectively track the target with posture change and rapid movement.
In summary, the invention introduces difficult sample mining into the target tracking twin network structure and designs the difficult triplet loss; the network can thus be fully trained, the discrimination capability of the classifier is strengthened, similar targets are better distinguished, problems such as local change and background interference in images can be handled, and the learned model has stronger generalization capability.
It should be understood that the above description is not intended to limit the invention to the particular embodiments disclosed, and that various changes, modifications, additions and substitutions can be made by those skilled in the art without departing from the spirit and scope of the invention.
Claims (6)
1. The twin network single target tracking method based on difficult sample mining is characterized by comprising the following steps:
step (1), constructing a training set: cutting out a target template image Z and a search area image X for every image in an image sequence training set according to the target position and size of the image, dividing the search area images X into positive example images P and negative example images N, the image Z and the image P forming a positive sample pair, the image Z and the image N forming a negative sample pair, and the (Z, P, N) triplets formed by the target template images Z, the positive example images P and the negative example images N constituting a training data set;
The positive sample pair is an image pair with similar visual characteristics and high reference contrast, and the negative sample pair is an image pair with similar visual characteristics and low reference contrast; the difficult samples in the dataset are defined as:
P = {(i, j) | S_v(x_i, x_j) ≥ α, S_c(y_i, y_j) ≥ β}
N = {(m, n) | S_v(x_m, x_n) ≥ α, S_c(y_m, y_n) < β}
wherein S_v denotes the visual feature similarity, S_c denotes the reference contrast similarity, α denotes the threshold of the visual feature similarity, and β denotes the threshold of the reference contrast similarity;
When selecting images from the training set for training, for each image the least similar positive sample and the most similar negative sample are selected to form a triplet, and the difficult-sample triplet loss is calculated; the difficult-sample triplet loss is defined as:
L_hard = Σ_{i=1..M} Σ_{a=1..N} ( max d_{A,P} − min d_{A,N} + θ )_+
wherein M denotes the M targets selected in each batch of samples, N denotes the N images randomly selected from each target, (z)_+ denotes max(z, 0), z denotes max d_{A,P} − min d_{A,N} + θ, θ is a threshold parameter set according to actual needs, d_{A,P} denotes the distance between the template sample and the positive sample, and d_{A,N} denotes the distance between the template sample and the negative sample;
By optimizing the loss L_hard, the model continuously mines positive sample pairs and difficult negative samples during training and learns discriminative features;
Step (2), constructing a convolution twin network based on difficult sample mining, wherein the network comprises three branches and the three branches share weights of a feature extraction network; the three branches are respectively used for acquiring a feature map of a target template image, a feature map of a positive sample image of a search area and a feature map of a negative sample image, wherein during feature extraction, a difficult sample is defined, and difficult sample mining is introduced to learn features with distinguishing capability;
step (3), performing a cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest score in the response map is taken as the position most similar to the target object, and the response map is enlarged to the original image size, thereby determining the position of the target on the image to be searched;
Step (4), training a twin network based on difficult sample mining based on the training set in the step (1) to obtain a training convergence twin network;
and (5) performing online target tracking by utilizing the trained twin network.
2. The difficult sample mining-based twin network single target tracking method of claim 1, wherein the operations of step (1) comprise cropping the target region template image and cropping the search region image; the target template image is cropped as follows: the target frame of the template image is known in target tracking, a square region centered on the tracked target is cut out, the center of the target region represents the target position, the four sides of the target frame are each expanded by q pixels, and finally the cropped target image block is scaled; the search area image is cropped as follows: centered on the target area, the four sides of the target frame are each expanded by 2q pixels, and the cropped search area image block is then scaled; where q = (w+h)/4, w is the width of the target frame and h is the height of the target frame.
3. The twin network single target tracking method based on difficult sample mining according to claim 1, wherein the feature extraction networks of the different branches of the twin network in step (2) are all a fine-tuned ResNet-50, and features of the input image are extracted through the ResNet-50.
4. The twin network single target tracking method based on difficult sample mining of claim 1, wherein step (3) operates as follows: after feature extraction, features from different layers are fused; the lower-layer features carry more target position information and the higher-layer features carry more semantic information, so the higher-layer features are first up-sampled and then fused with the lower-layer features, and the multi-layer fused feature maps of the different branches are generated iteratively; the target template image feature map is cross-correlated with the positive sample image feature map and with the negative sample image feature map of the search area respectively to obtain response maps, the response maps are enlarged to the original image size, and the position of the target on the image to be searched is determined.
5. The twin network single target tracking method based on difficult sample mining of claim 1, wherein the specific operation of step (4) is as follows:
1) Training with the initial positive and negative samples, so that Z is pulled close to P and pushed away from N, to obtain a trained classifier;
2) Classifying the samples with the trained classifier, putting the misclassified samples into the negative sample subset as difficult negative samples, and then continuing to train the classifier;
3) The process is repeated until the performance of the classifier is no longer improved.
6. The twin network single target tracking method based on difficult sample mining of claim 2, wherein the online target tracking process in step (5) comprises the steps of:
1) Reading the first frame image of the video sequence to be tracked and obtaining its bounding box information, cutting out the target template image Z of the first frame according to the target template cropping method of step (1), inputting Z into the template branch of the training-converged twin network of step (4), extracting and fusing the multi-layer features of the template image, and then setting t = 2;
2) Reading the t-th frame of the video to be tracked, cutting out the search area image of the t-th frame according to the target position determined in frame t−1 and the search area cropping method of step (1), inputting the cropped t-th frame search area image into the search branch of the training-converged twin network of step (4), and extracting the features of the t-th frame search image;
3) Performing cross-correlation operation on the feature map obtained in the step 1) after multi-layer fusion and the feature map obtained in the step 2);
4) Setting t = t + 1 and judging whether t ≤ T, where T is the total number of frames of the video sequence to be tracked; if so, executing steps 2)-3); otherwise, ending the tracking process of the video sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111152770.4A CN113888595B (en) | 2021-09-29 | 2021-09-29 | Twin network single-target visual tracking method based on difficult sample mining |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111152770.4A CN113888595B (en) | 2021-09-29 | 2021-09-29 | Twin network single-target visual tracking method based on difficult sample mining |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113888595A CN113888595A (en) | 2022-01-04 |
CN113888595B true CN113888595B (en) | 2024-05-14 |
Family
ID=79008165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111152770.4A Active CN113888595B (en) | 2021-09-29 | 2021-09-29 | Twin network single-target visual tracking method based on difficult sample mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113888595B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114579783A (en) * | 2022-03-09 | 2022-06-03 | Nanjing University of Posts and Telecommunications | Unsupervised image embedding learning method based on nearest neighbor and difficult sample mining |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111340850A (en) * | 2020-03-20 | 2020-06-26 | Systems General Research Institute, Institute of Systems Engineering, Academy of Military Sciences | Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss |
CN111354017A (en) * | 2020-03-04 | 2020-06-30 | Jiangnan University | Target tracking method based on twin neural network and parallel attention module |
WO2020181685A1 (en) * | 2019-03-12 | 2020-09-17 | Nanjing University of Posts and Telecommunications | Vehicle-mounted video target detection method based on deep learning |
- 2021-09-29: CN application CN202111152770.4A filed (granted as CN113888595B, status Active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020181685A1 (en) * | 2019-03-12 | 2020-09-17 | Nanjing University of Posts and Telecommunications | Vehicle-mounted video target detection method based on deep learning |
CN111354017A (en) * | 2020-03-04 | 2020-06-30 | Jiangnan University | Target tracking method based on twin neural network and parallel attention module |
CN111340850A (en) * | 2020-03-20 | 2020-06-26 | Systems General Research Institute, Institute of Systems Engineering, Academy of Military Sciences | Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss |
Non-Patent Citations (3)
Title |
---|
A single-target tracking algorithm based on diverse positive instances; Zhang Boyan; Zhong Yong; Journal of Harbin Institute of Technology; 2020-09-25 (No. 10); full text *
A survey of Siamese-network-based tracking algorithms; Xiong Changzhen; Li Yan; Industrial Control Computer; 2020-03-25 (No. 03); full text *
Research on a vehicle tracking method based on contour features and extended Kalman filtering; Ji Xiaopeng; Wei Zhiqiang; Journal of Image and Graphics; 2011-02-16 (No. 02); full text *
Also Published As
Publication number | Publication date |
---|---|
CN113888595A (en) | 2022-01-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108665481B (en) | Self-adaptive anti-blocking infrared target tracking method based on multi-layer depth feature fusion | |
CN112069896B (en) | Video target tracking method based on twin network fusion multi-template features | |
EP1934941B1 (en) | Bi-directional tracking using trajectory segment analysis | |
CN109598684B (en) | Correlation filtering tracking method combined with twin network | |
CN112184752A (en) | Video target tracking method based on pyramid convolution | |
CN111354017A (en) | Target tracking method based on twin neural network and parallel attention module | |
CN112651998B (en) | Human body tracking algorithm based on attention mechanism and double-flow multi-domain convolutional neural network | |
CN112489081B (en) | Visual target tracking method and device | |
CN109461172A (en) | Manually with the united correlation filtering video adaptive tracking method of depth characteristic | |
CN109410247A (en) | A kind of video tracking algorithm of multi-template and adaptive features select | |
CN108520530A (en) | Method for tracking target based on long memory network in short-term | |
CN110399840B (en) | Rapid lawn semantic segmentation and boundary detection method | |
CN112668483A (en) | Single-target person tracking method integrating pedestrian re-identification and face detection | |
CN113706581B (en) | Target tracking method based on residual channel attention and multi-level classification regression | |
CN111931654A (en) | Intelligent monitoring method, system and device for personnel tracking | |
CN114861761B (en) | Loop detection method based on twin network characteristics and geometric verification | |
CN112434599A (en) | Pedestrian re-identification method based on random shielding recovery of noise channel | |
CN113920472A (en) | Unsupervised target re-identification method and system based on attention mechanism | |
CN109740552A (en) | A kind of method for tracking target based on Parallel Signature pyramid neural network | |
CN113888595B (en) | Twin network single-target visual tracking method based on difficult sample mining | |
CN114495170A (en) | Pedestrian re-identification method and system based on local self-attention inhibition | |
CN102081740B (en) | 3D image classification method based on scale invariant features | |
CN114038011A (en) | Method for detecting abnormal behaviors of human body in indoor scene | |
CN116543019A (en) | Single-target tracking method based on accurate bounding box prediction | |
CN115953570A (en) | Twin network target tracking method combining template updating and trajectory prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||