CN113888595B - Twin network single-target visual tracking method based on difficult sample mining - Google Patents

Twin network single-target visual tracking method based on difficult sample mining

Info

Publication number
CN113888595B
Authority
CN
China
Prior art keywords
target
image
sample
images
difficult
Prior art date
Legal status
Active
Application number
CN202111152770.4A
Other languages
Chinese (zh)
Other versions
CN113888595A (en)
Inventor
黄磊
高占祺
魏志强
Current Assignee
Ocean University of China
Original Assignee
Ocean University of China
Priority date
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202111152770.4A
Publication of CN113888595A
Application granted
Publication of CN113888595B
Legal status: Active

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 Image analysis
            • G06T 7/20 Analysis of motion
              • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
          • G06T 2207/00 Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 Image acquisition modality
              • G06T 2207/10016 Video; Image sequence
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 Pattern recognition
            • G06F 18/20 Analysing
              • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F 18/24 Classification techniques
                • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
              • G06F 18/25 Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a twin network single-target tracking method based on difficult sample mining, which comprises steps such as constructing a training set and constructing a convolutional twin network based on difficult sample mining. The invention introduces difficult sample mining into the target tracking method: difficult negative samples are mined as training data in the training process to update the network parameters, and the difficult sample triplet loss is selected as the loss function. By optimizing this loss, the model continuously mines difficult negative samples during training, so that the network is fully trained, similar targets are better distinguished, the model learns features with discriminative capability, and the target tracking effect is improved.

Description

Twin network single-target visual tracking method based on difficult sample mining
Technical Field
The invention belongs to the technical field of computer vision, relates to an image processing technology, and particularly relates to a twin network single target tracking method based on difficult sample mining.
Background
Single-target visual tracking is one of the popular yet challenging research topics in computer vision. It has wide applications in intelligent video surveillance, robot visual navigation, medical diagnosis, and the positioning and tracking of underwater organisms, and has broad development prospects. Visual target tracking refers to designating the target to be tracked in the first frame of a video sequence and calibrating its initial position, and then predicting the position and size of the target in subsequent frames so as to track it accurately.
Early classical algorithms all operate in the time domain; they involve complex calculations, and the heavy computational load makes tracking far from real-time. Algorithms based on correlation filtering then appeared: by introducing correlation filtering, these target tracking methods convert the computation into the frequency domain, which greatly reduces the amount of computation and greatly improves the speed. With the development of deep learning, researchers have introduced deep learning techniques into target tracking, and a series of such methods have been proposed and have achieved good results.
In recent years, target tracking methods based on twin (Siamese) networks have received unprecedented attention. Existing methods adopt a convolutional neural network to extract features for target modeling. In the target tracking process, offline training on the tracked targets is one of the keys to the performance of the tracking model, and the selection of training data is particularly important in offline training. Existing twin-network-based methods use only the target area: the features extracted from the target area are directly correlated with the features of the test frame image, which gives poor robustness, cannot handle complex scenes such as similar objects, and lacks discriminative capability. Existing methods usually label an instance as positive when its coordinate distance to the object is small, and as negative otherwise, and then maximize the similarity score of positive instance pairs and minimize the similarity score of negative instance pairs through a logistic loss. Such methods only use the pairwise relation between sample pairs, ignore the potential relation among the prototype, the positive instances and the negative instances, and do not consider the effect of difficult samples on the model, so they cannot handle complex scenes such as similar objects, even though the value of difficult samples has been demonstrated by researchers in fields such as object recognition.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a twin network single-target tracking method based on difficult sample mining. Difficult sample mining is introduced into the target tracking method: difficult negative samples are mined as training data in the training process to update the network parameters, and the difficult sample triplet loss is selected as the loss function. By optimizing this loss, the model continuously mines difficult negative samples during training, so that the network is fully trained, similar targets are better distinguished, and the model learns features with discriminative capability.
In order to solve the technical problems, the invention adopts the following technical scheme:
a twin network single target tracking method based on difficult sample mining comprises the following steps:
Step (1), constructing a training set: cutting out target template images Z and search area images X of all images in an image sequence training set according to the target positions and the sizes of the images, dividing the search area images X into positive example images P and negative example images N, forming a pair of positive sample pairs by the images Z and the images P, forming a pair of negative sample pairs by the images Z and the images N, and forming a training data set by a (Z, P, N) triplet formed by the target template images Z, the positive example images P and the negative example images N;
Step (2), constructing a convolution twin network based on difficult sample mining, wherein the network comprises three branches and the three branches share weights of a feature extraction network; the three branches are respectively used for acquiring a feature map of a target template image, a feature map of a positive sample image of a search area and a feature map of a negative sample image, wherein during feature extraction, a difficult sample is defined, and difficult sample mining is introduced to learn features with distinguishing capability;
step (3), performing cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest score in the response map is considered the position most similar to the target object; the response map is enlarged to the original image size, so that the position of the target on the image to be searched is determined;
Step (4), training a twin network based on difficult sample mining based on the training set in the step (1) to obtain a training convergence twin network;
and (5) performing online target tracking by utilizing the trained twin network.
Further, the operation of step (1) includes cropping the target region template image and cropping the search region image; the clipping method of the target template image comprises the following steps: the target frame of the template image in target tracking is known, a square area is cut out by taking the tracked target as the center, the center position of the target area represents the target position, q pixels are respectively expanded on the four sides of the target frame, and finally the size of the cut target image block is scaled; the clipping method of the search area image comprises the following steps: respectively expanding 2q pixels on the four sides of the target frame by taking the target area as the center, and then scaling the size of the cut image block of the search area; where q = (w + h)/4, w is the width of the target frame, and h is the height of the target frame.
Further, in step (2), the feature extraction networks of the different branches of the twin network are all a fine-tuned ResNet-50, and the input images are passed through the ResNet-50 to extract features.
Further, the positive sample pair is an image pair with similar visual characteristics and high reference contrast, and the negative sample pair is an image pair with similar visual characteristics and low reference contrast; the difficult samples in the dataset are defined as:
P = {(i, j) | S_v(x_i, x_j) ≥ α, S_c(y_i, y_j) ≥ β}
N = {(m, n) | S_v(x_m, x_n) ≥ α, S_c(y_m, y_n) < β}
Wherein S_v represents the visual feature similarity, S_c represents the reference contrast similarity, α represents the threshold value of the visual feature similarity, and β represents the threshold value of the reference contrast similarity;
When selecting pictures from a training set for training, selecting a least similar positive sample and a most similar negative sample for each picture to form a triplet, and calculating the triplet loss of a difficult sample; the difficult sample triplet loss is defined as:
L_hard = Σ_{i=1}^{M} Σ_{a=1}^{N} ( max d_{A,P} - min d_{A,N} + θ )_+
wherein M represents the M targets selected in each batch of samples, N represents the N pictures randomly selected from each target, (z)_+ represents max(z, 0), z represents max d_{A,P} - min d_{A,N} + θ, θ is a threshold parameter set according to actual needs, d_{A,P} represents the distance between the template sample and a positive sample, and d_{A,N} represents the distance between the template sample and a negative sample;
by optimizing the loss L_hard, the model continuously mines positive sample pairs and difficult negative samples in the training process, and learns features with discriminative capability.
Further, the operation of step (3) is as follows: after feature extraction, the features of different layers are fused. The lower-layer features contain more target position information while the higher-layer features contain more semantic information, so the higher-layer features are first up-sampled and then fused with the lower-layer features, iteratively generating, for each branch, a feature map fused from multiple layers. The target template image feature map is cross-correlated with the positive sample image feature map and the negative sample image feature map of the search area respectively to obtain response maps; the response map is enlarged to the original image size, and the position of the target on the image to be searched is determined.
Further, the specific operation of step (4) is as follows:
1) Training by using the initial positive and negative samples, with Z pulled closer to P and pushed away from N through the training, to obtain a trained classifier;
2) Classifying the samples by using the trained classifier, putting the misclassified samples into the negative sample subset as difficult negative samples, and then continuing to train the classifier;
3) The process is repeated until the performance of the classifier is no longer improved.
Further, the online tracking process in step (5) includes the following steps:
1) Reading a first frame picture of a video sequence to be tracked, acquiring bounding box information of the first frame picture, cutting out a target template image Z of the first frame according to the method for cutting out the target template image in the step (1), inputting the Z into a template branch of the training convergence twin network in the step (4), extracting multi-layer features of the template image, fusing, and then setting t=2;
2) Reading the t-th frame of the video to be tracked, cutting out the search area image of the t-th frame according to the target position determined in frame t-1 and the method for cutting out the search area image in the step (1), inputting the cropped t-th frame search area image into the search branch of the training convergence twin network in the step (4), and extracting the features of the t-th frame search image;
3) Performing cross-correlation operation on the feature map obtained in the step 1) after multi-layer fusion and the feature map obtained in the step 2);
4) Setting t=t+1, judging whether t is less than or equal to T, wherein T is the total frame number of the video sequence to be detected; if so, executing steps 2)-3), otherwise, ending the tracking process of the video sequence to be detected.
Compared with the prior art, the invention has the advantages that:
Aiming at the problem that existing twin network target tracking methods do not consider the effect of difficult samples on the model, a twin network target tracking method based on difficult sample mining is designed. Difficult sample mining is introduced into the target tracking twin network structure, difficult negative samples are mined as training data in the training process, and the difficult sample triplet loss is selected as the loss function, so that the model is continuously optimized, learns features with discriminative capability, and achieves a good target tracking effect.
Specifically, in the training process, the initial positive and negative samples are used for training; the trained classifier is then used to classify the samples, the misclassified samples are put into the negative sample subset as difficult negative samples, and training continues; this is repeated until the performance of the classifier no longer improves. Different from the conventional samples used for triplet training, the invention selects difficult sample triplets and uses the difficult samples to update the network parameters during training: for each picture, the least similar positive sample and the most similar negative sample are selected to compute the difficult triplet loss. By optimizing this loss, the model continuously mines difficult negative samples in the training process, so that the network is fully trained, similar targets are better distinguished, problems such as local changes and background interference in images are handled, and the learned model has stronger generalization capability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic overall flow chart of the present invention;
FIG. 2 is a schematic diagram of a difficult sample mining strategy architecture according to the present invention;
FIG. 3 is a tracking effect of object tracking for a first video sequence using the method of the present invention;
FIG. 4 is a tracking effect of object tracking for a second video sequence using the method of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples.
In combination with the overall flow shown in fig. 1, the twin network single target tracking method based on difficult sample mining comprises the following steps:
And (1) constructing a training set.
Cutting out target template images Z and search area images X of all images in an image sequence training set according to the target positions and the sizes of the images, dividing the search area images X into positive example images P and negative example images N, forming a pair of positive sample pairs by the images Z and the images P, forming a pair of negative sample pairs by the images Z and the images N, and forming a training data set by a (Z, P, N) triplet formed by the target template images Z, the positive example images P and the negative example images N.
Specifically, the operation of step (1) includes cropping the target area template image and cropping the search area image. The clipping method of the target template image comprises the following steps: the target frame of the template image in target tracking is known, a square area is cut out by taking the tracked target as the center, the center position of the target area is used for representing the target position, q pixels are respectively expanded on four sides of the target frame, and finally the size of the cut target image block is scaled to 127 × 127. The clipping method of the search area image comprises the following steps: respectively expanding 2q pixels on four sides of a target frame by taking the target area as the center, and then scaling the size of the cut image block of the search area to 255 × 255; where q = (w + h)/4, w is the width of the target frame, and h is the height of the target frame.
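By way of illustration, the following minimal sketch (Python, using OpenCV and NumPy) shows one way to perform the cropping described above; the geometric-mean rule used to derive the square side from the q-pixel margins and the mean-value border padding are assumptions, since the embodiment only specifies the margins and the output sizes.

```python
import cv2
import numpy as np

def crop_square(image, box, margin, out_size):
    """Crop a square region centred on the target and resize it.

    image    : H x W x 3 frame
    box      : (cx, cy, w, h) target centre and size in pixels
    margin   : context margin added to each side (q for the template, 2q for the search region)
    out_size : output side length (127 for the template, 255 for the search region)
    """
    cx, cy, w, h = box
    # Square side from the padded width/height (geometric mean; an assumption,
    # the embodiment only states that a square region with the given margins is cut).
    side = int(round(np.sqrt((w + 2 * margin) * (h + 2 * margin))))
    x0 = int(round(cx - side / 2.0))
    y0 = int(round(cy - side / 2.0))
    # Pad with the per-channel mean so crops that reach past the border stay square.
    pad = max(0, -x0, -y0, x0 + side - image.shape[1], y0 + side - image.shape[0])
    if pad > 0:
        mean = image.mean(axis=(0, 1)).tolist()
        image = cv2.copyMakeBorder(image, pad, pad, pad, pad,
                                   cv2.BORDER_CONSTANT, value=mean)
        x0, y0 = x0 + pad, y0 + pad
    patch = image[y0:y0 + side, x0:x0 + side]
    return cv2.resize(patch, (out_size, out_size))

def make_template_and_search(image, box):
    """Produce the 127 x 127 template image Z and the 255 x 255 search image X."""
    w, h = box[2], box[3]
    q = (w + h) / 4.0                                 # q = (w + h)/4 as defined above
    template = crop_square(image, box, q, 127)        # target template image Z
    search = crop_square(image, box, 2 * q, 255)      # search area image X
    return template, search
```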
And (2) constructing a convolution twin network based on difficult sample mining, and obtaining feature graphs of different branches.
This network contains three branches and the three branches share the weights of the feature extraction network; the three branches are respectively used for acquiring a characteristic image of the target template image, a characteristic image of the positive sample image of the search area and a characteristic image of the negative sample image, wherein during characteristic extraction, a difficult sample is defined, and difficult sample mining is introduced to learn the characteristic with distinguishing capability.
Specifically, the feature extraction networks of the different branches of the twin network in step (2) are all a fine-tuned ResNet-50, and the input images are passed through the ResNet-50 to extract features.
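As an illustration of the weight sharing across the three branches, the sketch below builds one ResNet-50 trunk (taken here from torchvision; the specific fine-tuning used by the patent is not reproduced and the stage grouping is an assumption) and applies it to the template, positive and negative inputs, returning the multi-level feature maps that are fused later.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SiameseBackbone(nn.Module):
    """One shared feature extractor applied to the template Z, positive example P
    and negative example N, so the three branches share all weights."""

    def __init__(self):
        super().__init__()
        trunk = resnet50(weights=None)  # fine-tuning details of the patent are not reproduced
        # Keep only the convolutional stages (drop average pooling and the classifier).
        self.stem = nn.Sequential(trunk.conv1, trunk.bn1, trunk.relu, trunk.maxpool, trunk.layer1)
        self.layer2, self.layer3, self.layer4 = trunk.layer2, trunk.layer3, trunk.layer4

    def forward_one(self, x):
        x = self.stem(x)
        low = self.layer2(x)     # lower layer: more spatial / position information
        mid = self.layer3(low)
        high = self.layer4(mid)  # higher layer: more semantic information
        return [low, mid, high]

    def forward(self, z, p, n):
        # The same module (hence the same weights) processes all three branches.
        return self.forward_one(z), self.forward_one(p), self.forward_one(n)
```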
Difficult sample mining is introduced to learn features with discriminative capability. In connection with the difficult sample mining strategy of the invention shown in FIG. 2, the invention obtains valid difficult sample pairs from both visual feature similarity and reference contrast similarity: image pairs with similar visual features and high reference contrast are defined as positive sample pairs, and image pairs with similar visual features and low reference contrast are defined as negative sample pairs.
The difficult samples in the dataset are defined as:
P = {(i, j) | S_v(x_i, x_j) ≥ α, S_c(y_i, y_j) ≥ β}
N = {(m, n) | S_v(x_m, x_n) ≥ α, S_c(y_m, y_n) < β}
Wherein S_v represents the visual feature similarity, S_c represents the reference contrast similarity, α represents the threshold value of the visual feature similarity, and β represents the threshold value of the reference contrast similarity.
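The set definitions above amount to a thresholding rule over two pairwise similarity matrices. The sketch below assumes S_v and S_c have already been computed for a pool of K samples (for example, cosine similarity of backbone features and an overlap-based contrast score; how they are computed is not specified here) and returns the index pairs belonging to P and N.

```python
import torch

def mine_hard_pairs(sim_visual, sim_context, alpha, beta):
    """Select sample pairs according to the definitions of P and N above.

    sim_visual  : [K, K] visual feature similarity matrix S_v
    sim_context : [K, K] reference contrast similarity matrix S_c
    alpha, beta : the thresholds α and β
    Returns (positives, negatives), each an [n_pairs, 2] tensor of index pairs.
    """
    visually_close = sim_visual >= alpha                                # pairs that look alike
    positives = torch.nonzero(visually_close & (sim_context >= beta))  # set P
    negatives = torch.nonzero(visually_close & (sim_context < beta))   # set N: look alike but differ
    return positives, negatives
```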
Traditional triplets simply sample three pictures from the training data. This is simple, but most of the sampled pairs are easy, easily distinguished sample pairs, and if a large proportion of the training sample pairs are easy pairs, it is not conducive to the network learning better features. Therefore, when pictures are selected from the training set for training, for each picture the least similar positive sample and the most similar negative sample are selected to form a triplet, and the difficult sample triplet loss is calculated.
The difficult sample triplet loss is defined as:
L_hard = Σ_{i=1}^{M} Σ_{a=1}^{N} ( max d_{A,P} - min d_{A,N} + θ )_+
Wherein M represents the M targets selected in each batch of samples, N represents the N pictures randomly selected from each target, (z)_+ represents max(z, 0), z represents max d_{A,P} - min d_{A,N} + θ, θ is a threshold parameter set according to actual needs, d_{A,P} represents the distance between the template sample and a positive sample, and d_{A,N} represents the distance between the template sample and a negative sample.
By optimizing the loss L_hard, the model continuously mines positive sample pairs and difficult negative samples in the training process, and learns features with discriminative capability.
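A minimal sketch of the difficult sample triplet loss is given below, assuming a batch built from M targets with N images each and Euclidean distances between embeddings (the distance metric and the batch layout are assumptions; the text above only defines z, (z)_+ and the symbols M, N, θ).

```python
import torch
import torch.nn.functional as F

def hard_triplet_loss(embeddings, labels, theta=0.3):
    """Difficult sample (batch-hard) triplet loss L_hard.

    embeddings : [M * N, D] features of a batch of M targets with N images each
    labels     : [M * N] target identity of every image
    theta      : the margin θ
    """
    dist = torch.cdist(embeddings, embeddings)               # pairwise distances d
    same = labels.unsqueeze(0) == labels.unsqueeze(1)        # same-target mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # Hardest positive (largest d_{A,P}) and hardest negative (smallest d_{A,N}) per anchor.
    d_ap = dist.masked_fill(~(same & ~eye), float("-inf")).max(dim=1).values
    d_an = dist.masked_fill(same, float("inf")).min(dim=1).values

    # (z)_+ with z = max d_{A,P} - min d_{A,N} + θ, averaged over the batch.
    return F.relu(d_ap - d_an + theta).mean()
```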
And (3) performing cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest score in the response map is considered the position most similar to the target object, so that the position of the target is determined.
Specifically, step (3) operates as follows: after feature extraction, the features of different layers are fused. The lower-layer features contain more target position information while the higher-layer features contain more semantic information, so the higher-layer features are first up-sampled and then fused with the lower-layer features, iteratively generating, for each branch, a feature map fused from multiple layers. The target template image feature map is cross-correlated with the positive sample image feature map and the negative sample image feature map of the search area respectively to obtain response maps. The response map is enlarged to the original image size so as to determine the position of the target on the image to be searched.
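The sketch below illustrates the multi-layer fusion and the cross-correlation just described. The 1 × 1 lateral convolutions, the bilinear up-sampling and the element-wise addition (an FPN-style fusion) are assumptions; the patent only states that higher-layer features are up-sampled and fused with lower-layer features before the correlation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseAndCorrelate(nn.Module):
    """Top-down fusion of multi-layer features followed by cross-correlation."""

    def __init__(self, channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # One 1x1 projection per backbone stage so the maps can be added.
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in channels])

    def fuse(self, feats):
        """feats: [low, mid, high] feature maps of one branch, each [B, C_i, H_i, W_i]."""
        maps = [lat(f) for lat, f in zip(self.lateral, feats)]
        fused = maps[-1]                                   # start from the most semantic map
        for lower in reversed(maps[:-1]):
            fused = F.interpolate(fused, size=lower.shape[-2:],
                                  mode="bilinear", align_corners=False) + lower
        return fused

    @staticmethod
    def cross_correlate(template_feat, search_feat):
        """Use the template feature map as a correlation kernel over the search map."""
        responses = []
        for t, s in zip(template_feat, search_feat):       # iterate over the batch
            # t: [C, h, w] -> kernel [1, C, h, w]; s: [C, H, W] -> input [1, C, H, W]
            responses.append(F.conv2d(s.unsqueeze(0), t.unsqueeze(0)))
        return torch.cat(responses, dim=0)                 # [B, 1, H-h+1, W-w+1] response map
```

The peak of the returned response map, once the map is enlarged to the original image size (for example with F.interpolate), marks the predicted target position.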
And (4) training the twin network based on difficult sample mining based on the training set in the step (1) to obtain a training convergence twin network.
Specifically, step (4) operates as follows (an illustrative sketch of this procedure follows the list):
1) Training by using the initial positive and negative samples, with Z pulled closer to P and pushed away from N through the training, to obtain a trained classifier;
2) Classifying the samples by using the trained classifier, putting the misclassified samples into the negative sample subset as difficult negative samples, and then continuing to train the classifier;
3) The process is repeated until the performance of the classifier is no longer improved.
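An illustrative sketch of this procedure is given below; train_fn, classify_fn and evaluate_fn are placeholders for the training, classification and validation routines of the embodiment, and the stopping rule based on a validation score is likewise an assumption.

```python
def train_with_hard_negative_mining(model, train_fn, classify_fn, evaluate_fn,
                                    positives, negatives, candidates, max_rounds=10):
    """Iterative hard-negative mining: train, collect misclassified samples as
    difficult negatives, retrain, and stop when performance no longer improves."""
    best_score = float("-inf")
    for _ in range(max_rounds):
        train_fn(model, positives, negatives)                  # step 1: train on the current sets
        # step 2: samples the classifier wrongly accepts become difficult negatives
        hard_negatives = [s for s in candidates if classify_fn(model, s) == "positive"]
        negatives = negatives + hard_negatives
        score = evaluate_fn(model)                             # e.g. accuracy on a validation split
        if score <= best_score:                                # step 3: stop when no improvement
            break
        best_score = score
    return model
```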
And (5) performing online target tracking by utilizing the trained twin network.
Specifically, the online tracking process in step (5) includes the following steps (an illustrative sketch of this loop follows the list):
1) Reading the first frame picture of the video sequence to be tracked, acquiring the bounding box information of the first frame picture, cutting out the target template image Z of the first frame according to the method for cutting out the target template image in step (1), inputting Z into the template branch of the training convergence twin network in step (4), extracting and fusing the multi-layer features of the template image, and then setting t=2.
2) Reading the t-th frame of the video to be tracked, cutting out the search area image of the t-th frame according to the target position determined in frame t-1 and the method for cutting out the search area image in step (1), inputting the cropped t-th frame search area image into the search branch of the training convergence twin network in step (4), and extracting the features of the t-th frame search image.
3) Performing cross-correlation operation on the multi-layer fused feature map obtained in step 1) and the feature map obtained in step 2).
4) Setting t=t+1 and judging whether t is less than or equal to T, wherein T is the total frame number of the video sequence to be detected; if so, executing steps 2)-3), otherwise, ending the tracking process of the video sequence to be detected.
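The loop above can be sketched as follows; extract_feat stands for the trained backbone plus multi-layer fusion, cross_correlate for the correlation of step 3), crop_square is the cropping helper sketched earlier, and the stride-based mapping of the response peak back to image coordinates is an assumption (the embodiment only states that the response map is enlarged to the original image size).

```python
import numpy as np
import torch

@torch.no_grad()
def track_sequence(frames, init_box, extract_feat, cross_correlate, total_stride=8):
    """Online tracking of one video sequence.

    frames          : list of H x W x 3 images; the target box is known in frame 1
    init_box        : (cx, cy, w, h) of the target in the first frame
    extract_feat    : callable, cropped image -> fused feature map [1, C, h, w]
    cross_correlate : callable, (template features, search features) -> response map [1, 1, Hr, Wr]
    total_stride    : output stride of the backbone (assumed value)
    """
    cx, cy, w, h = init_box
    q = (w + h) / 4.0
    # crop_square is the cropping helper sketched earlier in this description.
    z_feat = extract_feat(crop_square(frames[0], (cx, cy, w, h), q, 127))   # frame-1 template Z

    boxes = [init_box]
    for frame in frames[1:]:                                  # t = 2 ... T
        crop_side = np.sqrt((w + 4 * q) * (h + 4 * q))        # search-crop side in image pixels
        x_feat = extract_feat(crop_square(frame, (cx, cy, w, h), 2 * q, 255))
        response = cross_correlate(z_feat, x_feat)[0, 0]      # step 3): response map
        peak = np.unravel_index(int(response.argmax()), tuple(response.shape))
        # Displacement of the peak from the response-map centre, mapped back to image pixels.
        dy = (peak[0] - (response.shape[0] - 1) / 2.0) * total_stride * crop_side / 255.0
        dx = (peak[1] - (response.shape[1] - 1) / 2.0) * total_stride * crop_side / 255.0
        cx, cy = cx + dx, cy + dy                             # target size is kept fixed in this sketch
        boxes.append((cx, cy, w, h))
    return boxes
```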
Fig. 3 is a tracking effect of object tracking for a first video sequence using the method of the present invention. It can be seen that the target tracking method provided by the invention can effectively track targets with similar background interference.
Fig. 4 is a tracking effect of object tracking for a second video sequence using the method of the present invention. It can be seen that the target tracking method provided by the invention can effectively track the target with posture change and rapid movement.
In summary, the invention introduces difficult sample mining into the target tracking twin network structure, designs difficult triplet loss, can fully train the network, strengthen the discrimination capability of the classifier, can better distinguish similar targets, and can solve the problems of local change, background interference and the like in images, and the learned model has stronger generalization capability.
It should be understood that the above description is not intended to limit the invention to the particular embodiments disclosed, and that various changes, modifications, additions and substitutions can be made by those skilled in the art without departing from the spirit and scope of the invention.

Claims (6)

1. The twin network single target tracking method based on difficult sample mining is characterized by comprising the following steps:
step (1), constructing a training set: cutting out target template images Z and search area images X of all images in an image sequence training set according to the target positions and the sizes of the images, dividing the search area images X into positive example images P and negative example images N, forming a pair of positive sample pairs by the images Z and the images P, forming a pair of negative sample pairs by the images Z and the images N, and forming a training data set by the (Z, P, N) triplets formed by the target template images Z, the positive example images P and the negative example images N;
The positive sample pair is an image pair with similar visual characteristics and high reference contrast, and the negative sample pair is an image pair with similar visual characteristics and low reference contrast; the difficult samples in the dataset are defined as:
P = {(i, j) | S_v(x_i, x_j) ≥ α, S_c(y_i, y_j) ≥ β}
N = {(m, n) | S_v(x_m, x_n) ≥ α, S_c(y_m, y_n) < β}
Wherein S_v represents the visual feature similarity, S_c represents the reference contrast similarity, α represents the threshold value of the visual feature similarity, and β represents the threshold value of the reference contrast similarity;
When selecting images from a training set for training, selecting a least similar positive sample and a most similar negative sample for each image to form a triplet, and calculating the triplet loss of a difficult sample; the difficult sample triplet loss is defined as:
L_hard = Σ_{i=1}^{M} Σ_{a=1}^{N} ( max d_{A,P} - min d_{A,N} + θ )_+
Wherein M represents the M targets selected in each batch of samples, N represents the N images randomly selected from each target, (z)_+ represents max(z, 0), z represents max d_{A,P} - min d_{A,N} + θ, θ is a threshold parameter set according to actual needs, d_{A,P} represents the distance between the template sample and a positive sample, and d_{A,N} represents the distance between the template sample and a negative sample;
Through optimizing the loss L_hard, the model continuously mines positive sample pairs and difficult negative samples in the training process, and learns features with discriminative capability;
Step (2), constructing a convolution twin network based on difficult sample mining, wherein the network comprises three branches and the three branches share weights of a feature extraction network; the three branches are respectively used for acquiring a feature map of a target template image, a feature map of a positive sample image of a search area and a feature map of a negative sample image, wherein during feature extraction, a difficult sample is defined, and difficult sample mining is introduced to learn features with distinguishing capability;
step (3), performing cross-correlation operation on the target template image feature map obtained in step (2) and the search area image feature map to obtain a response map, wherein the position with the highest score in the response map is considered the position most similar to the target object; the response map is enlarged to the original image size, so that the position of the target on the image to be searched is determined;
Step (4), training a twin network based on difficult sample mining based on the training set in the step (1) to obtain a training convergence twin network;
and (5) performing online target tracking by utilizing the trained twin network.
2. The difficult sample mining-based twin network single target tracking method of claim 1, wherein the operations of step (1) comprise cropping the target region template image and cropping the search region image; the clipping method of the target template image comprises the following steps: the method comprises the steps that a target frame of a template image in target tracking is known, a square area is cut out by taking a tracked target as the center, the center position of the target area represents the target position, q pixels are respectively expanded on four sides of the target frame, and finally the size of a cut target image block is scaled; the clipping method of the search area image comprises the following steps: respectively expanding 2q pixels on four sides of a target frame by taking the target area as the center, and then scaling the size of the cut image block of the search area; where q = (w + h)/4, w is the width of the target frame, and h is the height of the target frame.
3. The twin network single target tracking method based on difficult sample mining according to claim 1, wherein the feature extraction networks of the different branches of the twin network in step (2) are all a fine-tuned ResNet-50, and the input image is subjected to feature extraction through the ResNet-50.
4. The twin network single target tracking method based on difficult sample mining of claim 1, wherein step (3) operates as follows: after feature extraction, fusing different layers of features, wherein the lower layer features have more target position information and the higher layer features have more semantic information, performing up-sampling operation on the higher layer features, then fusing the higher layer features with the lower layer features, iteratively generating feature images fused by different branch multi-layer features, performing cross-correlation operation on the target template image feature images, the positive sample image feature images and the negative sample image feature images of the search area respectively to obtain response images, expanding the response images to the original image size, and determining the position of the target on the image to be searched.
5. The twin network single target tracking method based on difficult sample mining of claim 1, wherein the specific operation of step (4) is as follows:
1) Training by using the initial positive and negative samples, with Z pulled closer to P and pushed away from N through the training, to obtain a trained classifier;
2) Classifying the samples by using the trained classifier, putting the misclassified samples into the negative sample subset as difficult negative samples, and then continuing to train the classifier;
3) The process is repeated until the performance of the classifier is no longer improved.
6. The twin network single target tracking method based on difficult sample mining of claim 2, wherein the online target tracking process in step (5) comprises the steps of:
1) Reading a first frame image of a video sequence to be tracked, acquiring bounding box information of the first frame image, cutting out a target template image Z of the first frame according to the method for cutting out the target template image in the step (1), inputting the Z into a template branch of the training convergence twin network in the step (4), extracting multi-layer features of the template image, fusing, and then setting t=2;
2) Reading a t frame of a video to be tracked, cutting out a search area image of the t frame according to the target position determined in the t-1 frame and the method for cutting out the search area image in the step (1), inputting the cut t frame search area image into a search branch of the training convergence twin network in the step (4), and extracting the characteristics of the t frame search image;
3) Performing cross-correlation operation on the feature map obtained in the step 1) after multi-layer fusion and the feature map obtained in the step 2);
4) Setting t=t+1, judging whether t is less than or equal to T, wherein T is the total frame number of the video sequence to be detected; if so, executing steps 2)-3), otherwise, ending the tracking process of the video sequence to be detected.
CN202111152770.4A 2021-09-29 2021-09-29 Twin network single-target visual tracking method based on difficult sample mining Active CN113888595B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111152770.4A CN113888595B (en) 2021-09-29 2021-09-29 Twin network single-target visual tracking method based on difficult sample mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111152770.4A CN113888595B (en) 2021-09-29 2021-09-29 Twin network single-target visual tracking method based on difficult sample mining

Publications (2)

Publication Number Publication Date
CN113888595A CN113888595A (en) 2022-01-04
CN113888595B (en) 2024-05-14

Family

ID=79008165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111152770.4A Active CN113888595B (en) 2021-09-29 2021-09-29 Twin network single-target visual tracking method based on difficult sample mining

Country Status (1)

Country Link
CN (1) CN113888595B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340850A (en) * 2020-03-20 2020-06-26 军事科学院系统工程研究院系统总体研究所 Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss
CN111354017A (en) * 2020-03-04 2020-06-30 江南大学 Target tracking method based on twin neural network and parallel attention module
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111354017A (en) * 2020-03-04 2020-06-30 江南大学 Target tracking method based on twin neural network and parallel attention module
CN111340850A (en) * 2020-03-20 2020-06-26 军事科学院系统工程研究院系统总体研究所 Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A single-target tracking algorithm based on diverse positive instances; Zhang Boyan; Zhong Yong; Journal of Harbin Institute of Technology; 2020-09-25 (No. 10); full text *
A survey of tracking algorithms based on Siamese networks; Xiong Changzhen; Li Yan; Industrial Control Computer; 2020-03-25 (No. 03); full text *
Research on a vehicle tracking method based on contour features and extended Kalman filtering; Ji Xiaopeng; Wei Zhiqiang; Journal of Image and Graphics; 2011-02-16 (No. 02); full text *

Also Published As

Publication number Publication date
CN113888595A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN108665481B (en) Self-adaptive anti-blocking infrared target tracking method based on multi-layer depth feature fusion
CN112069896B (en) Video target tracking method based on twin network fusion multi-template features
EP1934941B1 (en) Bi-directional tracking using trajectory segment analysis
CN111354017A (en) Target tracking method based on twin neural network and parallel attention module
CN112184752A (en) Video target tracking method based on pyramid convolution
CN109598684B (en) Correlation filtering tracking method combined with twin network
CN112489081B (en) Visual target tracking method and device
CN112651998B (en) Human body tracking algorithm based on attention mechanism and double-flow multi-domain convolutional neural network
CN108520530A (en) Method for tracking target based on long memory network in short-term
CN110399840B (en) Rapid lawn semantic segmentation and boundary detection method
CN113706581B (en) Target tracking method based on residual channel attention and multi-level classification regression
CN107194929B (en) Method for tracking region of interest of lung CT image
CN112668483A (en) Single-target person tracking method integrating pedestrian re-identification and face detection
CN111931654A (en) Intelligent monitoring method, system and device for personnel tracking
CN115527269B (en) Intelligent human body posture image recognition method and system
CN112434599A (en) Pedestrian re-identification method based on random shielding recovery of noise channel
CN111640138A (en) Target tracking method, device, equipment and storage medium
CN113920472A (en) Unsupervised target re-identification method and system based on attention mechanism
CN114495170A (en) Pedestrian re-identification method and system based on local self-attention inhibition
CN114038011A (en) Method for detecting abnormal behaviors of human body in indoor scene
CN113888595B (en) Twin network single-target visual tracking method based on difficult sample mining
CN116543019A (en) Single-target tracking method based on accurate bounding box prediction
CN116311387A (en) Cross-modal pedestrian re-identification method based on feature intersection
CN116051601A (en) Depth space-time associated video target tracking method and system
CN114943741A (en) Visual SLAM method based on target detection and geometric probability in dynamic scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant