CN112991385B - Twin network target tracking method based on different measurement criteria - Google Patents
Twin network target tracking method based on different measurement criteria
- Publication number
- CN112991385B (application CN202110171718.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- frame
- tracking
- frame image
- follows
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (image analysis)
- G06F18/22—Matching criteria, e.g. proximity measures (pattern recognition)
- G06N3/045—Combinations of networks (neural network architectures)
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames (scenes in video content)
- G06T2207/10016—Video; Image sequence (image acquisition modality)
Abstract
The invention discloses a twin network target tracking method based on different measurement criteria, which comprises the following specific steps: step 1, selecting a feature extraction network; step 2, acquiring a tracking video, manually selecting the region where the target is located in the first frame of the video, and obtaining the depth feature of the template; step 3, entering a subsequent frame and obtaining the depth feature of the current-frame search region from the coordinate position and the width and height of the tracked target in the previous frame; step 4, measuring the similarity between the template depth feature and the current-frame search-region depth feature with cosine similarity to obtain a response map; step 5, measuring the similarity between the template depth feature and the current-frame search-region depth feature with the Euclidean distance to obtain a second response map; and step 6, weighting and fusing the two response maps and determining the position of the target from the maximum value of the fused response map. The method addresses the problems that twin-network-based target tracking is easily disturbed by similar objects and is not robust to changes in target appearance.
Description
Technical Field
The invention belongs to the technical field of video single-target tracking, and relates to a twin network target tracking method based on different measurement criteria.
Background
In the field of computer vision, target tracking has long been an important topic and remains an active research direction. Target tracking estimates the position, shape, or occupied area of a tracked target in a continuous sequence of video images and determines motion information such as the target's speed, direction, and trajectory. It has substantial research significance and broad application prospects, chiefly in video surveillance, human-computer interaction, intelligent transportation, and autonomous navigation.
Twin (Siamese) network methods are the current mainstream in target tracking. The main idea of the twin network architecture is to find a function that maps input images into a high-dimensional space in which a simple distance approximates the "semantic" distance of the input space. More precisely, the structure seeks a set of parameters such that the learned distance is small for inputs of the same category and large for inputs of different categories. Such networks were previously used mainly for metric learning, computing the similarity of images, sounds, texts, and the like, especially in the field of face recognition. In target tracking, the twin network typically takes the first-frame target region as a template and repeatedly measures the similarity between the template and each subsequent frame to obtain the target position and size. Existing twin-network tracking methods generally adopt cosine similarity as the only metric; this single measurement mode copes poorly with large changes in target appearance and with interference from similar objects.
Disclosure of Invention
The invention aims to provide a twin network target tracking method based on different measurement criteria, which addresses the tracking failures of existing twin network tracking methods caused by interference from similar objects and by a lack of robustness to changes in target appearance.
The technical scheme adopted by the invention is a twin network target tracking method based on different measurement criteria, comprising the following steps:

Step 1, selecting a feature extraction network, denoted φ;

Step 2, acquiring a tracking video, manually selecting the region where the target is located in the first frame of the video, and inputting the first-frame target region Z as a template into the feature extraction network φ selected in step 1 to obtain the template depth feature φ(Z);

Step 3, entering a subsequent frame image of the tracking video, obtaining the search region S_t of the current frame from the coordinate position and the width and height of the tracked target in the previous frame, and inputting S_t into the feature extraction network φ to obtain the current-frame search-region depth feature φ(S_t);

Step 4, measuring the similarity between the template depth feature φ(Z) obtained in step 2 and the current-frame search-region depth feature φ(S_t) obtained in step 3 with cosine similarity, obtaining the response map h_c(Z, S_t);

Step 5, measuring the similarity between the template depth feature φ(Z) obtained in step 2 and the current-frame search-region depth feature φ(S_t) obtained in step 3 with the Euclidean distance, obtaining the response map h_d(Z, S_t);

Step 6, weighting and fusing the two response maps and determining the position of the target from the maximum value of the fused response map.
The invention is also characterized in that:
the specific process of the step 1 is as follows:
selecting the AlexNet network pre-trained on the ImageNet dataset as the feature extraction network φ of the twin network.
The specific process of the step 2 is as follows: the specific process of the step 2 is as follows:
step 2.1, acquiring a tracking video and manually selecting the region where the target is located in the first frame of the video. Let (x, y) be the center-point coordinates of the target in the first frame, and m and n the width and height of the target region. Centered on (x, y), intercept a square region with side length z_sz, calculated as:

z_sz = √((m + 2p)(n + 2p))   (1)

where p = (m + n)/4 represents the filling amount;
step 2.2, if the square region with side length z_sz exceeds the first frame image of the tracking video, filling the excess with the mean value of the first frame image, where the mean value Ī of the first frame image is calculated with the following formula (2):

Ī = (1/(3·n·m)) Σ_{i=1}^{3} Σ_{j=1}^{n} Σ_{k=1}^{m} I_{ijk}   (2)

where I_{ijk} represents the pixel value of the i-th channel, j-th row, and k-th column in the first-frame target region;
Step 2.3, scaling the square region with the side length of z_sz to b×b to obtain a target region Z of the first frame image, and inputting the target region Z of the first frame image into the feature extraction networkDepth characteristics w×h×c are obtained for the width, height, and channel number>
The specific process of the step 3 is as follows:
step 3.1, entering a subsequent frame image of the tracking video, and using the tracked-target coordinate position (x_{t-1}, y_{t-1}) and width and height (m_{t-1}, n_{t-1}) of the previous frame to intercept a square region with side length x_sz on the current frame image t, where x_sz is calculated, with a and b the input sizes of steps 3.3 and 2.3 respectively, as:

x_sz = (a/b)·√((m_{t-1} + 2p_{t-1})(n_{t-1} + 2p_{t-1}))

where p_{t-1} = (m_{t-1} + n_{t-1})/4 represents the filling amount;
step 3.2, if the square region intercepted in step 3.1 exceeds the current frame image t, filling the excess with the mean value of frame t, where the mean value of the current frame image is calculated analogously to formula (2), with I^t_{ijk} representing the pixel value of the i-th channel, j-th row, and k-th column in the subsequent-frame target region;
step 3.3, scaling the square region with side length x_sz to size a×a to obtain the search region S_t of the current frame image t, and inputting S_t into the feature extraction network φ selected in step 1 to obtain the current-frame search-region depth feature φ(S_t) with width W, height H, and channel number C, i.e. of size W×H×C.
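Continuing the sketch, the search-region crop of steps 3.1 to 3.3 reuses crop_square from above; the a/b factor linking x_sz to the template side and the sample previous-frame size are our reading of the lost formula and should be treated as assumptions.

```python
import numpy as np

a, b = 255, 127                   # search and template input sizes (embodiment 1)
m_prev, n_prev = 60, 80           # previous-frame target width and height (hypothetical)
p_prev = (m_prev + n_prev) / 4.0  # filling amount p_{t-1}
x_sz = (a / b) * np.sqrt((m_prev + 2 * p_prev) * (n_prev + 2 * p_prev))
# S_t = crop_square(frame_t, (x_prev, y_prev), x_sz, a)
```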
The specific process of the step 4 is as follows:
first, the template depth feature φ(Z) performs a sliding operation on the search-region feature φ(S_t). At every sliding position there is a sub-region φ(S_t)^{i,j} of the current-frame search region with the same size as the template depth feature φ(Z), where i denotes the index of horizontal movement of φ(Z) on φ(S_t) and j the index of vertical movement. If φ(Z) moves over φ(S_t) with a step of size s at a time, then i and j take integer values in the intervals:

i ∈ [1, (W − w)/s + 1],  j ∈ [1, (H − h)/s + 1];
since φ(Z) and φ(S_t)^{i,j} both have size w×h×C, flatten them into one-dimensional vectors v_Z and v_S^{i,j} of size (w×h×C)×1 and measure their similarity with the cosine similarity:

h_c^{i,j}(Z, S_t) = (v_Z · v_S^{i,j}) / (‖v_Z‖ ‖v_S^{i,j}‖).
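A direct NumPy transcription of this sliding cosine measurement is sketched below, assuming features laid out as width × height × channels; the function name cosine_response and the small epsilon guard are ours.

```python
import numpy as np

def cosine_response(feat_z, feat_s, s=1):
    """Slide feat_z (w x h x C) over feat_s (W x H x C) with stride s and
    return the map of cosine similarities h_c^{i,j}."""
    w, h, _ = feat_z.shape
    W, H, _ = feat_s.shape
    v_z = feat_z.reshape(-1)
    nz = np.linalg.norm(v_z)
    out = np.zeros(((W - w) // s + 1, (H - h) // s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            v_s = feat_s[i * s:i * s + w, j * s:j * s + h, :].reshape(-1)
            out[i, j] = v_z @ v_s / (nz * np.linalg.norm(v_s) + 1e-12)
    return out
```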
The specific steps of the step 5 are as follows:
first, the template depth feature φ(Z) performs the same sliding operation on the search-region feature φ(S_t); during the sliding operation, the Euclidean distance is used to measure the distance between the template depth feature φ(Z) and the sub-block φ(S_t)^{i,j} of the current-frame search region:

d^{i,j}(Z, S_t) = ‖v_Z − v_S^{i,j}‖₂.

Finally, the response map h_d(Z, S_t) obtained with the Euclidean distance metric is the set of d^{i,j}(Z, S_t) over all i and j.
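The Euclidean counterpart of the sketch above differs only in the inner measurement; here the distance is negated so that, as in the cosine map, larger values mean higher similarity. That sign convention is our assumption and is not stated in the patent.

```python
import numpy as np

def euclidean_response(feat_z, feat_s, s=1):
    """Slide feat_z over feat_s and return a map of (negated) Euclidean
    distances d^{i,j} between the flattened template and each sub-block."""
    w, h, _ = feat_z.shape
    W, H, _ = feat_s.shape
    v_z = feat_z.reshape(-1)
    out = np.zeros(((W - w) // s + 1, (H - h) // s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            v_s = feat_s[i * s:i * s + w, j * s:j * s + h, :].reshape(-1)
            out[i, j] = -np.linalg.norm(v_z - v_s)
    return out
```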
The specific process of the step 6 is as follows:
step 6.1, weight and fuse the response map h_c(Z, S_t) obtained with cosine similarity and the response map h_d(Z, S_t) obtained with the Euclidean distance to obtain the fused response map h(Z, S_t):

h(Z, S_t) = λ·h_c(Z, S_t) + (1 − λ)·h_d(Z, S_t)   (6);
Step 6.2, the fused response diagram h (Z, S) t ) Interpolation into response maps H (Z, S) of size lambda x lambda t ) Response map H (Z, S t ) The maximum point on the map is the position of the target, and then according to the response map H (Z, S t ) Maximum value and response diagram H (Z, S t ) The deviation (Deltax, deltay) of the center position corrects the target position (x) of the previous frame t-1 ,y t-1 ) Obtain the target position (x t ,y t ) The specific calculation mode is as follows:
step 6.3, update the width and height (w_t, h_t) of the current-frame target: first obtain the scale of the target width-height change by linear interpolation,

scale_t = (1 − r) + r·Δs_t,

where r is the update rate and Δs_t is the raw width-height change estimated in the current frame;
step 6.4, update the width and height (w_t, h_t) of the current-frame target by multiplying by this scale: (w_t, h_t) = scale_t·(w_{t-1}, h_{t-1});
step 6.5, the tracking process for the current frame is finished; take the next frame as the current frame and jump to step 3 to track it.
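Step 6 can be sketched as a single function; the fusion weight lam, the mapping of the response-map peak offset back to image pixels via a total stride, and the placeholder raw scale change ds are all assumptions standing in for details not recoverable from the original.

```python
import cv2
import numpy as np

def locate_and_update(h_c, h_d, box, lam=0.5, r=0.59, up_size=272, stride=8):
    """Fuse the two response maps (formula (6)), upsample, find the peak,
    and apply the damped width-height update of steps 6.2-6.4."""
    x, y, w, h = box                          # previous-frame target state
    fused = lam * h_c + (1 - lam) * h_d       # weighted fusion, formula (6)
    H_map = cv2.resize(fused, (up_size, up_size), interpolation=cv2.INTER_CUBIC)
    pi, pj = np.unravel_index(H_map.argmax(), H_map.shape)
    # Axis 0 indexes horizontal movement i, axis 1 vertical movement j,
    # matching the response sketches above.
    scale_back = stride * fused.shape[0] / up_size   # response -> image pixels (assumption)
    dx = (pi - up_size / 2.0) * scale_back
    dy = (pj - up_size / 2.0) * scale_back
    ds = 1.0                                  # raw scale change; a scale search would set this
    scale = (1 - r) * 1.0 + r * ds            # linear-interpolation damping, step 6.3
    return x + dx, y + dy, w * scale, h * scale
```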
The beneficial effects of the invention are as follows:
1. By additionally introducing the Euclidean distance metric, the network can better cope with interference from similar objects, effectively alleviating tracking failures caused by the appearance of similar targets.
2. By fusing the response map obtained with the Euclidean distance metric and the response map obtained with cosine similarity, the advantages of the two measurement modes are fully exploited, making the network more robust to changes in target appearance and effectively alleviating tracking drift caused by such changes.
Drawings
FIG. 1 is a block diagram of a network architecture of the twin network target tracking method of the present invention based on different metric criteria;
FIG. 2 is a schematic diagram of similarity measurement in the twin network target tracking method based on different measurement criteria of the present invention;
FIG. 3 is a process schematic diagram of embodiment 1 of the twin network target tracking method of the present invention based on different metric criteria.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
The invention discloses a twin network target tracking method based on different measurement criteria, which is shown in figure 1 and comprises the following specific steps:
Step 1, selecting the AlexNet network pre-trained on the ImageNet dataset as the feature extraction network φ of the twin network;

Step 2, acquiring a tracking video, manually selecting the region where the target is located in the first frame of the video, and inputting the first-frame target region Z as a template into the feature extraction network φ selected in step 1 to obtain the template depth feature φ(Z). The specific process is as follows:
Step 2.1, acquiring a tracking video, manually selecting an area where a target is located on a first frame of the video, enabling (x, y) to be the center point coordinate of the target in the first frame, m and n to be the width and height of the target area respectively, taking the center point (x, y) of the target in the first frame as the center, and intercepting a square area with the side length of z_sz, wherein the calculation formula of z_sz is as follows:
wherein p= (m+n)/4 represents the filling amount;
step 2.2, if the square region with side length z_sz exceeds the first frame image of the tracking video, filling the excess with the mean value of the first frame image, where the mean value Ī of the first frame image is calculated with the following formula (2):

Ī = (1/(3·n·m)) Σ_{i=1}^{3} Σ_{j=1}^{n} Σ_{k=1}^{m} I_{ijk}   (2)

where I_{ijk} represents the pixel value of the i-th channel, j-th row, and k-th column in the first-frame target region;
step 2.3, scaling the square region with side length z_sz to size b×b to obtain the target region Z of the first frame image, and inputting Z into the feature extraction network φ to obtain the template depth feature φ(Z) with width w, height h, and channel number C, i.e. of size w×h×C.
The specific process of the step 3 is as follows:
step 3.1, entering a subsequent frame image of the tracking video, and using the tracked-target coordinate position (x_{t-1}, y_{t-1}) and width and height (m_{t-1}, n_{t-1}) of the previous frame to intercept a square region with side length x_sz on the current frame image t, where x_sz is calculated, with a and b the input sizes of steps 3.3 and 2.3 respectively, as:

x_sz = (a/b)·√((m_{t-1} + 2p_{t-1})(n_{t-1} + 2p_{t-1}))

where p_{t-1} = (m_{t-1} + n_{t-1})/4 represents the filling amount;
step 3.2, if the square region intercepted in step 3.1 exceeds the current frame image t, filling the excess with the mean value of frame t, where the mean value of the current frame image is calculated analogously to formula (2), with I^t_{ijk} representing the pixel value of the i-th channel, j-th row, and k-th column in the subsequent-frame target region;
step 3.3, scaling the square region with side length x_sz to size a×a to obtain the search region S_t of the current frame image t, and inputting S_t into the feature extraction network φ selected in step 1 to obtain the current-frame search-region depth feature φ(S_t) with width W, height H, and channel number C, i.e. of size W×H×C.
Step 4, measure the similarity between the template depth feature φ(Z) and the current-frame search-region depth feature φ(S_t) with cosine similarity. First, the template depth feature φ(Z) performs the sliding operation shown in fig. 2 on the search-region feature φ(S_t). At every sliding position there is a sub-region of the current-frame search region with the same size as the template depth feature; define this region as φ(S_t)^{i,j}, where i denotes the index of horizontal movement of φ(Z) on φ(S_t) and j the index of vertical movement. If φ(Z) moves over φ(S_t) with a step of size s at a time, then i and j take integer values in the intervals:

i ∈ [1, (W − w)/s + 1],  j ∈ [1, (H − h)/s + 1].
Since φ(Z) and φ(S_t)^{i,j} both have size w×h×C, flatten them into one-dimensional vectors v_Z and v_S^{i,j} of size (w×h×C)×1 and measure their similarity with the cosine similarity:

h_c^{i,j}(Z, S_t) = (v_Z · v_S^{i,j}) / (‖v_Z‖ ‖v_S^{i,j}‖).

The response map h_c(Z, S_t) finally obtained with the cosine similarity metric is the set of h_c^{i,j}(Z, S_t) over all i and j, and can be written as h_c(Z, S_t) = φ(Z) * φ(S_t), where * denotes the cross-correlation metric operation.
Step 5, measure the similarity between the template depth feature φ(Z) and the current-frame search-region depth feature φ(S_t) with the Euclidean distance, using a procedure analogous to the cosine similarity measurement of step 4. First, the template depth feature φ(Z) performs the sliding operation shown in fig. 2 on the search-region feature φ(S_t). During the sliding operation, the Euclidean distance measures the distance between the template depth feature φ(Z) and the sub-block φ(S_t)^{i,j} of the current-frame search region:

d^{i,j}(Z, S_t) = ‖v_Z − v_S^{i,j}‖₂.

The response map h_d(Z, S_t) finally obtained with the Euclidean distance metric is the set of d^{i,j}(Z, S_t) over all i and j, and can be written as h_d(Z, S_t) = φ(Z) ⊙ φ(S_t), where ⊙ denotes the Euclidean distance metric operation.
Step 6, weight and fuse the two response maps to obtain the fused response map h(Z, S_t):

h(Z, S_t) = λ·h_c(Z, S_t) + (1 − λ)·h_d(Z, S_t)   (10);

then interpolate h(Z, S_t) to a response map H(Z, S_t) of fixed size L×L. The maximum point of H(Z, S_t) is the position of the target; the previous-frame target position (x_{t-1}, y_{t-1}) is corrected by the deviation (Δx, Δy) between the maximum of H(Z, S_t) and its center, mapped back to image coordinates, giving the current-frame target position (x_t, y_t) = (x_{t-1} + Δx, y_{t-1} + Δy).
Next, update the width and height (w_t, h_t) of the current-frame target. First obtain the scale of the target width-height change by linear interpolation,

scale_t = (1 − r) + r·Δs_t,

where r is the update rate and Δs_t is the raw width-height change estimated in the current frame; then update the width and height of the current-frame target by multiplying by this scale: (w_t, h_t) = scale_t·(w_{t-1}, h_{t-1}).
The tracking process for the current frame is then finished; take the next frame as the current frame and jump to step 3 to track it.
Example 1
Step 1, select the feature extraction network.

Table 1: feature extraction network parameter table

The feature extraction network φ, with parameters as shown in Table 1, consists of 5 convolutional layers and 2 pooling layers in total. Each of the first two convolutional layers is followed by a max-pooling layer, and each of the first four convolutional layers is followed by a random-inactivation (dropout) layer and a ReLU nonlinear activation function.
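As a concrete illustration of the described architecture, the following PyTorch sketch is one five-conv, two-pool network consistent with the feature sizes quoted in this embodiment (127×127 → 6×6×256 and 255×255 → 22×22×256); its kernel sizes and channel counts are assumptions in the style of the well-known SiamFC AlexNet variant, not necessarily those of Table 1.

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Five conv layers, two max-pool layers; ReLU plus dropout after each
    of the first four convs, no activation after the last conv."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 11, stride=2), nn.ReLU(), nn.Dropout2d(0.1),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, 5), nn.ReLU(), nn.Dropout2d(0.1),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 384, 3), nn.ReLU(), nn.Dropout2d(0.1),
            nn.Conv2d(384, 384, 3), nn.ReLU(), nn.Dropout2d(0.1),
            nn.Conv2d(384, 256, 3),
        )

    def forward(self, x):
        return self.features(x)

net = FeatureNet()
print(net(torch.zeros(1, 3, 127, 127)).shape)  # torch.Size([1, 256, 6, 6])
print(net(torch.zeros(1, 3, 255, 255)).shape)  # torch.Size([1, 256, 22, 22])
```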
Step 2, acquire a tracking video and manually select the region where the target is located in the first frame of the video. Let (x, y) be the center-point coordinates of the target in the first frame, and m and n the width and height of the target region. Centered on (x, y), intercept a square region with side length z_sz, calculated as in formula (1):

z_sz = √((m + 2p)(n + 2p)),

where p = (m + n)/4 represents the filling amount. If the square region exceeds the image, the excess is filled with the image mean. The square region of side z_sz is then scaled to 127×127, yielding the target region Z of the first frame. Finally, Z is input into the feature extraction network φ to obtain the template depth feature φ(Z) of size 6×6×256;
Step 3, enter a subsequent frame image of the tracking video and, using the previous-frame target position (x_{t-1}, y_{t-1}) and width and height (m_{t-1}, n_{t-1}), intercept a square region with side length x_sz on the current frame image, where p_{t-1} = (m_{t-1} + n_{t-1})/4 is the filling amount. If the square region exceeds the image, the excess is filled with the image mean. The square region of side x_sz is then scaled to 255×255 to obtain the search region S_t of the current frame, which is input into the feature extraction network to obtain the current-frame search-region depth feature φ(S_t) of size 22×22×256;
step 4, adopting cosine similarity to form template depth characteristicsAnd the current frame search region depth feature +.>And (5) similarity measurement is carried out. First of all the depth features of the template frame->Search area +.>A sliding operation as shown in fig. 2 is performed. Every time a sliding operation is made, the current frame searches for an area +.>There will always be one and template frame depth feature +.>Areas of equal size->Wherein i represents +.>At->Subscript of upper horizontal movement, j represents +.>At->And vertically moving subscripts. Let->At every time at +.>Up-shift by a step of s=1, i and j will take values within the interval:
i∈[1,2,...,17]j∈[1,2,...,17]
Since φ(Z) and φ(S_t)^{i,j} both have size 6×6×256, flatten them into one-dimensional vectors v_Z and v_S^{i,j} of size (6×6×256)×1 and measure their similarity with the cosine similarity:

h_c^{i,j}(Z, S_t) = (v_Z · v_S^{i,j}) / (‖v_Z‖ ‖v_S^{i,j}‖).

The response map h_c(Z, S_t) finally obtained with the cosine similarity metric is the set of h_c^{i,j}(Z, S_t) over all i and j, written h_c(Z, S_t) = φ(Z) * φ(S_t), where * denotes the cross-correlation metric operation.
Step 5, measure the similarity between the template depth feature φ(Z) and the current-frame search-region depth feature φ(S_t) with the Euclidean distance, using a procedure analogous to the cosine similarity measurement of step 4. First, φ(Z) performs the sliding operation shown in fig. 2 on φ(S_t). During the sliding operation, the Euclidean distance measures the distance between φ(Z) and the sub-block φ(S_t)^{i,j} of the current-frame search region:

d^{i,j}(Z, S_t) = ‖v_Z − v_S^{i,j}‖₂.

The response map h_d(Z, S_t) finally obtained with the Euclidean distance metric is the set of d^{i,j}(Z, S_t) over all i and j, written h_d(Z, S_t) = φ(Z) ⊙ φ(S_t), where ⊙ denotes the Euclidean distance metric operation.
Step 6, weight and fuse the two response maps according to formula (10), and interpolate the fused response map h(Z, S_t) to a response map H(Z, S_t) of size 272×272. The maximum point of H(Z, S_t) is the position of the target; the previous-frame target position (x_{t-1}, y_{t-1}) is then corrected by the deviation (Δx, Δy) between the maximum of H(Z, S_t) and the center of H(Z, S_t), mapped back to image coordinates, to obtain the current-frame target position (x_t, y_t) = (x_{t-1} + Δx, y_{t-1} + Δy).
Next, update the width and height (w_t, h_t) of the current-frame target. First obtain the scale of the target width-height change by linear interpolation, scale_t = (1 − r) + r·Δs_t, with the update rate set to r = 0.59; the width and height of the current-frame target are then updated as (w_t, h_t) = scale_t·(w_{t-1}, h_{t-1}).
The tracking process for the current frame is then finished; the next frame is taken as the current frame and the procedure jumps to step 3. As shown in fig. 3, combining the maximum of the response map with this width-height update localizes the target and determines its size in the current frame.
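Pulling the pieces together, a minimal per-frame loop under all of the foregoing assumptions might look as follows; `extract` is a hypothetical helper that runs a frame crop through the network and returns a width × height × channels array, and `video` is any iterable of frames.

```python
import numpy as np

def track(video, init_box, net, extract):
    """Embodiment-1 style loop: build the template once, then run steps 3-6
    (crop, two similarity maps, fusion, locate, scale update) per frame."""
    x, y, w, h = init_box                     # manually selected first-frame box
    frames = iter(video)
    first = next(frames)
    p = (w + h) / 4.0
    z_sz = np.sqrt((w + 2 * p) * (h + 2 * p))
    feat_z = extract(net, crop_square(first, (x, y), z_sz, 127))
    for frame in frames:
        p = (w + h) / 4.0
        x_sz = (255.0 / 127.0) * np.sqrt((w + 2 * p) * (h + 2 * p))
        feat_s = extract(net, crop_square(frame, (x, y), x_sz, 255))
        h_c = cosine_response(feat_z, feat_s)
        h_d = euclidean_response(feat_z, feat_s)
        x, y, w, h = locate_and_update(h_c, h_d, (x, y, w, h))
        yield x, y, w, h
```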
Claims (5)
1. A twin network target tracking method based on different measurement criteria, characterized in that the method specifically comprises the following steps:
Step 1, selecting a feature extraction network, denoted φ;

Step 2, acquiring a tracking video, manually selecting the region where the target is located in the first frame of the video, and inputting the first-frame target region Z as a template into the feature extraction network φ selected in step 1 to obtain the template depth feature φ(Z);
Step 3, entering a subsequent frame image of the tracking video, using the tracked-target coordinate position (x_{t-1}, y_{t-1}) and width and height (w_{t-1}, h_{t-1}) of the previous frame image to obtain the search region S_t of the current frame image, and inputting S_t into the feature extraction network φ selected in step 1 to obtain the current-frame search-region depth feature φ(S_t);
Step 4, measuring the similarity between the template depth feature φ(Z) obtained in step 2 and the current-frame search-region depth feature φ(S_t) obtained in step 3 with cosine similarity, obtaining the response map h_c(Z, S_t);
The specific process of the step 4 is as follows:
first, the template depth feature φ(Z) performs a sliding operation on the search-region feature φ(S_t); at every sliding position there is a sub-region φ(S_t)^{i,j} of the current-frame search region with the same size as the template depth feature φ(Z), where i denotes the index of horizontal movement of φ(Z) on φ(S_t) and j the index of vertical movement; assuming φ(Z) moves over φ(S_t) with a step of size s at a time, i and j take integer values in the intervals:

i ∈ [1, (W − w)/s + 1],  j ∈ [1, (H − h)/s + 1],

where W and H are the width and height of the search-region depth feature obtained with the feature extraction network φ selected in step 1, w and h are the width and height of the template depth feature φ(Z), and C is the channel number;

since φ(Z) and φ(S_t)^{i,j} both have size w×h×C, flatten them into one-dimensional vectors v_Z and v_S^{i,j} of size (w×h×C)×1 and measure their similarity with the cosine similarity:

h_c^{i,j}(Z, S_t) = (v_Z · v_S^{i,j}) / (‖v_Z‖ ‖v_S^{i,j}‖);
Step 5, measuring the similarity between the template depth feature φ(Z) obtained in step 2 and the current-frame search-region depth feature φ(S_t) obtained in step 3 with the Euclidean distance, obtaining the response map h_d(Z, S_t);
The specific steps of the step 5 are as follows:
first, the template depth feature φ(Z) performs the sliding operation on the search-region feature φ(S_t); during the sliding operation, the Euclidean distance is used to measure the distance between the template depth feature φ(Z) and the sub-block φ(S_t)^{i,j} of the current-frame search region:

d^{i,j}(Z, S_t) = ‖v_Z − v_S^{i,j}‖₂;

finally, the response map h_d(Z, S_t) obtained with the Euclidean distance metric is the set of d^{i,j}(Z, S_t) over all i and j;
Step 6, weighting and fusing the response map h_c(Z, S_t) obtained in step 4 and the response map h_d(Z, S_t) obtained in step 5 to obtain the final response map h(Z, S_t), interpolating h(Z, S_t) to a fixed size, taking the maximum point of the interpolated response map as the position of the tracked target, and then updating the width and height of the target by linear interpolation, thereby tracking the target in the current frame.

2. The twin network target tracking method based on different measurement criteria according to claim 1, characterized in that the specific process of step 1 is as follows: selecting the AlexNet network pre-trained on the ImageNet dataset as the feature extraction network φ of the twin network.
3. The twin network target tracking method based on different measurement criteria according to claim 2, characterized in that the specific process of step 2 is as follows:
step 2.1, acquiring a tracking video and manually selecting the region where the target is located in the first frame of the video; letting (x, y) be the center-point coordinates of the target in the first frame, and m and n the width and height of the target region; centered on (x, y), intercepting a square region with side length z_sz, calculated as:

z_sz = √((m + 2p)(n + 2p))   (1)

where p = (m + n)/4 represents the filling amount;
step 2.2, if the square region with side length z_sz exceeds the first frame image of the tracking video, filling the excess with the mean value of the first frame image, where the mean value Ī of the first frame image is calculated with the following formula (2):

Ī = (1/(3·n·m)) Σ_{i=1}^{3} Σ_{j=1}^{n} Σ_{k=1}^{m} I_{ijk}   (2)

where I_{ijk} represents the pixel value of the i-th channel, j-th row, and k-th column in the first-frame target region;
step 2.3, scaling the square region with side length z_sz to size b×b to obtain the target region Z of the first frame image, and inputting Z into the feature extraction network φ to obtain the template depth feature φ(Z) with width w, height h, and channel number C, i.e. of size w×h×C.
4. The twin network target tracking method based on different measurement criteria according to claim 3, characterized in that the specific process of step 3 is as follows:
step 3.1, entering a subsequent frame image of the tracking video, and using the tracked-target coordinate position (x_{t-1}, y_{t-1}) and width and height (m_{t-1}, n_{t-1}) of the previous frame to intercept a square region with side length x_sz on the current frame image t, where x_sz is calculated, with a and b the input sizes of steps 3.3 and 2.3 respectively, as:

x_sz = (a/b)·√((m_{t-1} + 2p_{t-1})(n_{t-1} + 2p_{t-1}))

where p_{t-1} = (m_{t-1} + n_{t-1})/4 represents the filling amount;
step 3.2, if the square region intercepted in step 3.1 exceeds the current frame image t, filling the excess with the mean value of frame t, where the mean value of the current frame image is calculated analogously to formula (2), with I^t_{ijk} representing the pixel value of the i-th channel, j-th row, and k-th column in the subsequent-frame target region;
step 3.3, scaling the square region with side length x_sz to size a×a to obtain the search region S_t of the current frame image t, and inputting S_t into the feature extraction network φ selected in step 1 to obtain the current-frame search-region depth feature φ(S_t) with width W, height H, and channel number C, i.e. of size W×H×C.
5. The twin network target tracking method based on different measurement criteria according to claim 1, characterized in that the specific process of step 6 is as follows:
step 6.1, weighting and fusing the response map h_c(Z, S_t) obtained with cosine similarity and the response map h_d(Z, S_t) obtained with the Euclidean distance to obtain the fused response map h(Z, S_t):

h(Z, S_t) = λ·h_c(Z, S_t) + (1 − λ)·h_d(Z, S_t)   (6);
step 6.2, interpolating the fused response map h(Z, S_t) to a response map H(Z, S_t) of fixed size L×L; the maximum point of H(Z, S_t) is the position of the target, and the previous-frame target position (x_{t-1}, y_{t-1}) is then corrected by the deviation (Δx, Δy) between the maximum of H(Z, S_t) and the center of H(Z, S_t), mapped back to image coordinates, to obtain the current-frame target position (x_t, y_t) = (x_{t-1} + Δx, y_{t-1} + Δy);
step 6.3, updating the width and height (w_t, h_t) of the current-frame target: first obtaining the scale of the target width-height change by linear interpolation,

scale_t = (1 − r) + r·Δs_t,

where r is the update rate and Δs_t is the raw width-height change estimated in the current frame;
step 6.4, updating the width and height (w_t, h_t) of the current-frame target by multiplying by this scale: (w_t, h_t) = scale_t·(w_{t-1}, h_{t-1});
step 6.5, the tracking process for the current frame being finished, taking the next frame as the current frame and jumping to step 3 to track it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110171718.7A CN112991385B (en) | 2021-02-08 | 2021-02-08 | Twin network target tracking method based on different measurement criteria |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110171718.7A CN112991385B (en) | 2021-02-08 | 2021-02-08 | Twin network target tracking method based on different measurement criteria |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112991385A CN112991385A (en) | 2021-06-18 |
CN112991385B true CN112991385B (en) | 2023-04-28 |
Family
ID=76347410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110171718.7A Active CN112991385B (en) | 2021-02-08 | 2021-02-08 | Twin network target tracking method based on different measurement criteria |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112991385B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113379806B (en) * | 2021-08-13 | 2021-11-09 | Nanchang Institute of Technology | Target tracking method and system based on learnable sparse conversion attention mechanism |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019128254A1 (en) * | 2017-12-26 | 2019-07-04 | Zhejiang Uniview Technologies Co., Ltd. | Image analysis method and apparatus, and electronic device and readable storage medium |
CN111161317A (en) * | 2019-12-30 | 2020-05-15 | Beijing University of Technology | Single-target tracking method based on multiple networks |
CN111179314A (en) * | 2019-12-30 | 2020-05-19 | Beijing University of Technology | Target tracking method based on residual dense twin network |
CN111639551A (en) * | 2020-05-12 | 2020-09-08 | Huazhong University of Science and Technology | Online multi-target tracking method and system based on twin network and long-short term clues |
CN111951304A (en) * | 2020-09-03 | 2020-11-17 | Hunan University of Humanities, Science and Technology | Target tracking method, device and equipment based on mutual supervision twin network |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109272530B (en) * | 2018-08-08 | 2020-07-21 | Beihang University | Target tracking method and device for space-based monitoring scene |
- 2021-02-08: application CN202110171718.7A filed in China; granted as CN112991385B, status active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019128254A1 (en) * | 2017-12-26 | 2019-07-04 | 浙江宇视科技有限公司 | Image analysis method and apparatus, and electronic device and readable storage medium |
CN111161317A (en) * | 2019-12-30 | 2020-05-15 | 北京工业大学 | Single-target tracking method based on multiple networks |
CN111179314A (en) * | 2019-12-30 | 2020-05-19 | 北京工业大学 | Target tracking method based on residual dense twin network |
CN111639551A (en) * | 2020-05-12 | 2020-09-08 | 华中科技大学 | Online multi-target tracking method and system based on twin network and long-short term clues |
CN111951304A (en) * | 2020-09-03 | 2020-11-17 | 湖南人文科技学院 | Target tracking method, device and equipment based on mutual supervision twin network |
Non-Patent Citations (2)
Title |
---|
"Visual Tracking Based on Siamese Network of Fused Score Map";L. Xu等;《IEEE Access》;20191016;第7卷;全文 * |
"基于孪生网络和多距离融合的行人再识别";秦晓飞等;《光学仪器》;20200229;第42卷(第01期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN112991385A (en) | 2021-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276785B (en) | Anti-shielding infrared target tracking method | |
CN107818571A (en) | Ship automatic tracking method and system based on deep learning network and average drifting | |
CN107563494A (en) | A kind of the first visual angle Fingertip Detection based on convolutional neural networks and thermal map | |
CN111781608B (en) | Moving target detection method and system based on FMCW laser radar | |
CN113706581B (en) | Target tracking method based on residual channel attention and multi-level classification regression | |
CN111275740B (en) | Satellite video target tracking method based on high-resolution twin network | |
CN112183675B (en) | Tracking method for low-resolution target based on twin network | |
CN112991385B (en) | Twin network target tracking method based on different measurement criteria | |
CN106408596A (en) | Edge-based local stereo matching method | |
CN107945207A (en) | A kind of real-time object tracking method based on video interframe low-rank related information uniformity | |
CN111998862A (en) | Dense binocular SLAM method based on BNN | |
CN111027586A (en) | Target tracking method based on novel response map fusion | |
CN111123953B (en) | Particle-based mobile robot group under artificial intelligence big data and control method thereof | |
CN115908539A (en) | Target volume automatic measurement method and device and storage medium | |
CN113487631B (en) | LEGO-LOAM-based adjustable large-angle detection sensing and control method | |
CN111127510B (en) | Target object position prediction method and device | |
CN112446353B (en) | Video image trace line detection method based on depth convolution neural network | |
CN107038710B (en) | It is a kind of using paper as the Vision Tracking of target | |
CN113064422A (en) | Autonomous underwater vehicle path planning method based on double neural network reinforcement learning | |
CN116777956A (en) | Moving target screening method based on multi-scale track management | |
CN116659500A (en) | Mobile robot positioning method and system based on laser radar scanning information | |
CN106408600A (en) | Image registration method applied to solar high-resolution image | |
CN116469001A (en) | Remote sensing image-oriented construction method of rotating frame target detection model | |
CN114612518A (en) | Twin network target tracking method based on historical track information and fine-grained matching | |
CN116429116A (en) | Robot positioning method and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||