CN112991385A - Twin network target tracking method based on different measurement criteria - Google Patents
Twin network target tracking method based on different measurement criteria
- Publication number
- Publication number: CN112991385A (application number CN202110171718.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- frame
- tracking
- frame image
- follows
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Abstract
The invention discloses a twin network target tracking method based on different measurement criteria, which comprises the following specific steps: step 1, selecting a feature extraction network; step 2, acquiring a tracking video, manually selecting the region where the target is located on the first frame of the video, and obtaining the depth feature of the template; step 3, entering subsequent frames, and obtaining the depth feature of the current-frame search region using the coordinate position and the width and height of the tracking target in the previous frame; step 4, performing similarity measurement between the template depth feature and the current-frame search-region depth feature using cosine similarity to obtain a response map; step 5, performing similarity measurement between the template depth feature and the current-frame search-region depth feature using the Euclidean distance to obtain a second response map; step 6, performing weighted fusion of the two response maps, and determining the position of the target from the maximum value on the fused response map. The method addresses the problems that twin-network-based target tracking is easily disturbed by similar objects and is not robust to changes in target appearance.
Description
Technical Field
The invention belongs to the technical field of video single-target tracking, and relates to a twin network target tracking method based on different measurement criteria.
Background
In the field of computer vision, target tracking has long been an important topic and research direction. Target tracking estimates the position, shape or occupied area of a tracked target in a continuous video image sequence, and determines motion information of the target such as speed, direction and trajectory. Target tracking has important research significance and broad application prospects, and is mainly applied to video surveillance, human-computer interaction, intelligent transportation, autonomous navigation and the like.
The target tracking method based on the twin network is the mainstream of current target tracking methods. The main idea of the twin network structure is to find a function that maps the input picture to a high-dimensional space, so that a simple distance in the target space approximates the "semantic" distance of the input space. More precisely, the structure tries to find a set of parameters such that the similarity measure is small when the inputs belong to the same category and large when they belong to different categories. Such networks were mainly used for metric learning to calculate the similarity of information such as images, sounds and texts, especially in the field of face recognition. In target tracking, the twin network usually adopts the target region of the first frame as a template and continuously performs similarity measurement against this template in subsequent frames to obtain the target position and size. Existing twin-network-based target tracking methods generally adopt only the cosine similarity as the measurement; this single measurement mode cannot cope well with large changes in target appearance or with interference from similar targets.
Disclosure of Invention
The invention aims to provide a twin network target tracking method based on different measurement criteria, which solves the problem that existing twin network target tracking methods are easily interfered with by similar targets, or are not robust to target appearance change, causing tracking failure.
The invention adopts the technical scheme that a twin network target tracking method based on different measurement criteria specifically comprises the following steps:

Step 1, selecting a feature extraction network φ;

Step 2, acquiring a tracking video, manually selecting the region where the target is located on the first frame of the video, and inputting the target region Z of the first frame as a template into the feature extraction network φ selected in step 1 to obtain the template depth feature φ(Z);

Step 3, entering the subsequent frame images of the tracking video, and using the tracking-target coordinate position (x_{t-1}, y_{t-1}) and width and height (m_{t-1}, n_{t-1}) of the previous frame image to obtain the search region S_t of the current frame image; S_t is input into the feature extraction network φ selected in step 1 to obtain the current-frame search-region depth feature φ(S_t);

Step 4, performing similarity measurement between the template depth feature φ(Z) obtained in step 2 and the current-frame search-region depth feature φ(S_t) obtained in step 3 using cosine similarity, to obtain the response map h_c(Z, S_t);

Step 5, performing similarity measurement between the template depth feature φ(Z) obtained in step 2 and the current-frame search-region depth feature φ(S_t) obtained in step 3 using the Euclidean distance, to obtain the response map h_d(Z, S_t);

Step 6, performing weighted fusion of the two response maps to obtain the final response map h(Z, S_t), interpolating the fused response map to a fixed size, taking the maximum-value point of the response map as the position of the tracking target, and updating the target width and height by linear interpolation, thereby realizing tracking of the current-frame target.
The invention is also characterized in that:
the specific process of the step 1 is as follows:
An AlexNet network pre-trained on the ImageNet data set is selected as the feature extraction network φ of the twin network.
The specific process of step 2 is as follows:
Step 2.1, acquiring a tracking video and manually selecting the region where the target is located on the first frame of the video. Let (x, y) be the coordinates of the center point of the target in the first frame, and let m and n be the width and height of the target region, respectively. Taking the center point (x, y) of the target in the first frame as the center, a square region of side length z_sz is cut out, where z_sz is calculated as:

z_sz = sqrt((m + 2p)(n + 2p)) (1)

where p = (m + n)/4 denotes the padding amount;

Step 2.2, if the square region of side length z_sz exceeds the first frame image of the tracking video, the exceeding part is filled with the mean value of the first frame image, calculated using the following equation (2):

x̄_i = (1/(J·K)) Σ_{j=1}^{J} Σ_{k=1}^{K} x_{ijk} (2)

where x_{ijk} represents the pixel value of the ith channel, jth row and kth column of the first frame image, and J and K are the numbers of rows and columns;

Step 2.3, the square region of side length z_sz is scaled to size b × b to obtain the target region Z of the first frame image, and Z is input into the feature extraction network φ to obtain the template depth feature φ(Z) with width, height and channel number w × h × C (see the illustrative sketch below).
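For illustration, the crop of steps 2.1–2.3 can be sketched in a few lines of Python. This is a minimal sketch under the assumption that equation (1) is the padded square crop reconstructed above; crop_square and template_side are hypothetical helper names, not code from the patent.

```python
import numpy as np
import cv2  # OpenCV, used here for mean-padding and resizing

def crop_square(image, center, side, out_size):
    """Cut a square of side `side` centered at `center` = (x, y); any part
    falling outside the frame is filled with the per-channel image mean
    (equation (2)), and the crop is resized to out_size x out_size."""
    mean = image.mean(axis=(0, 1))                      # per-channel mean
    x, y = center
    x1 = int(round(x - side / 2.0))
    y1 = int(round(y - side / 2.0))
    x2, y2 = x1 + int(round(side)), y1 + int(round(side))
    h, w = image.shape[:2]
    pad = max(0, -x1, -y1, x2 - w, y2 - h)              # overshoot beyond the frame
    if pad > 0:                                         # fill the excess with the mean
        image = cv2.copyMakeBorder(image, pad, pad, pad, pad,
                                   cv2.BORDER_CONSTANT, value=mean.tolist())
        x1, y1, x2, y2 = x1 + pad, y1 + pad, x2 + pad, y2 + pad
    return cv2.resize(image[y1:y2, x1:x2], (out_size, out_size))

def template_side(m, n):
    """Equation (1): p = (m + n) / 4, z_sz = sqrt((m + 2p)(n + 2p))."""
    p = (m + n) / 4.0
    return np.sqrt((m + 2 * p) * (n + 2 * p))
```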
The specific process of step 3 is as follows:

Step 3.1, entering the subsequent frame images of the tracking video. Using the tracking-target coordinate position (x_{t-1}, y_{t-1}) and width and height (m_{t-1}, n_{t-1}) of frame t−1, a square region of side length x_sx is cut from the current frame t, where the side length x_sx is calculated as:

x_sx = (a/b) · sqrt((m_{t-1} + 2p_{t-1})(n_{t-1} + 2p_{t-1})) (3)

where p_{t-1} = (m_{t-1} + n_{t-1})/4 denotes the padding amount;

Step 3.2, if the square region cut in step 3.1 exceeds the current frame t, the exceeding part is filled with the mean value of the frame-t image, calculated with the following formula:

x̄_i^t = (1/(J·K)) Σ_{j=1}^{J} Σ_{k=1}^{K} x_{ijk}^t (4)

where x_{ijk}^t represents the pixel value of the ith channel, jth row and kth column of the current frame image;

Step 3.3, the square region of side length x_sx is scaled to size a × a to obtain the search region S_t of the current frame t; S_t is input into the feature extraction network φ selected in step 1 to obtain the current-frame search-region depth feature φ(S_t) with width, height and channel number W × H × C (see the companion sketch below).
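A companion sketch of the step-3 crop. The (a/b) factor in search_side follows the reconstruction of equation (3) above and matches the 255/127 sizes of embodiment 1, but it is an assumption rather than the patent's literal formula; the frames and box below are placeholders.

```python
def search_side(m_prev, n_prev, a=255, b=127):
    """Equation (3): x_sx = (a / b) * sqrt((m + 2p)(n + 2p)),
    with p = (m + n) / 4 as in step 2."""
    p = (m_prev + n_prev) / 4.0
    return (a / b) * np.sqrt((m_prev + 2 * p) * (n_prev + 2 * p))

# Usage on dummy frames (placeholders, not real video data):
frame0 = np.zeros((360, 480, 3), dtype=np.uint8)
frame_t = np.zeros((360, 480, 3), dtype=np.uint8)
x, y, m, n = 240.0, 180.0, 60.0, 90.0                   # initial box: center, width, height
Z = crop_square(frame0, (x, y), template_side(m, n), 127)    # template region, step 2
S_t = crop_square(frame_t, (x, y), search_side(m, n), 255)   # search region, step 3
```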
The specific process of step 4 is as follows:

The template depth feature φ(Z) is first slid over the current-frame search-region depth feature φ(S_t). At each sliding position there is a region of φ(S_t) of the same size as φ(Z), denoted φ(S_t)^{i,j}, where i is the index of horizontal movement on φ(S_t) and j is the index of vertical movement. Assuming φ(Z) moves over φ(S_t) with step size s at each step, i and j take values in the intervals

i ∈ [1, (W − w)/s + 1], j ∈ [1, (H − h)/s + 1]

where i and j are integers;

since φ(Z) and φ(S_t)^{i,j} both have size w × h × C, they are flattened into one-dimensional vectors f_z and f_{i,j} of size (w × h × C) × 1, and the cosine similarity of the two vectors measures their degree of similarity:

cos(f_z, f_{i,j}) = (f_z · f_{i,j}) / (‖f_z‖ · ‖f_{i,j}‖)

The response map h_c(Z, S_t) is the collection of these cosine similarities over all (i, j) (a code sketch follows below).
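The sliding cosine measurement of step 4, sketched directly on NumPy feature maps of shape H × W × C; the explicit loop favors clarity over speed, and the feature tensors are assumed inputs rather than outputs of the patent's network.

```python
def cosine_response(tz, ts, s=1):
    """Slide the h x w x C template feature tz over the H x W x C search
    feature ts with step size s; each position (i, j) is scored by the
    cosine similarity of the flattened vectors f_z and f_{i,j}."""
    th, tw, _ = tz.shape
    H, W, _ = ts.shape
    fz = tz.ravel()
    fz_norm = np.linalg.norm(fz)
    out = np.empty(((H - th) // s + 1, (W - tw) // s + 1))
    for j in range(out.shape[0]):                       # vertical slide
        for i in range(out.shape[1]):                   # horizontal slide
            fij = ts[j * s:j * s + th, i * s:i * s + tw].ravel()
            out[j, i] = fz @ fij / (fz_norm * np.linalg.norm(fij) + 1e-12)
    return out
```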
The specific steps of step 5 are as follows:

The template depth feature φ(Z) is first slid over the current-frame search-region depth feature φ(S_t); during the sliding operation, the Euclidean distance is used to measure the similarity between φ(Z) and each block φ(S_t)^{i,j} of the current-frame search region:

d(f_z, f_{i,j}) = ‖f_z − f_{i,j}‖ = sqrt(Σ_k (f_z(k) − f_{i,j}(k))²)

The response map h_d(Z, S_t) obtained through the Euclidean distance measurement is the collection of these values over all (i, j) (a code sketch follows below).
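The Euclidean counterpart of step 5 under the same sliding scheme. Returning the negated distance is an assumption made here so that larger values mean "more similar" in both maps before the weighted fusion of step 6; the patent text does not reproduce this conversion.

```python
def euclidean_response(tz, ts, s=1):
    """Score each sliding position by the (negated) Euclidean distance
    between the flattened template vector and the search-region block."""
    th, tw, _ = tz.shape
    H, W, _ = ts.shape
    fz = tz.ravel()
    out = np.empty(((H - th) // s + 1, (W - tw) // s + 1))
    for j in range(out.shape[0]):
        for i in range(out.shape[1]):
            fij = ts[j * s:j * s + th, i * s:i * s + tw].ravel()
            out[j, i] = -np.linalg.norm(fz - fij)       # smaller distance, higher score
    return out
```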
The specific process of step 6 is as follows:

Step 6.1, the response map h_c(Z, S_t) obtained from the cosine similarity and the response map h_d(Z, S_t) obtained from the Euclidean distance are weighted and fused as shown below to obtain the fused response map h(Z, S_t):

h(Z, S_t) = λ·h_c(Z, S_t) + (1 − λ)·h_d(Z, S_t) (6);

Step 6.2, the fused response map h(Z, S_t) is interpolated by bicubic interpolation to a response map H(Z, S_t) of fixed size l × l. The maximum point of H(Z, S_t) is the position of the target; the deviation (Δx, Δy) of the maximum of H(Z, S_t) from the center of H(Z, S_t) is then used to correct the target position (x_{t-1}, y_{t-1}) of the previous frame and obtain the target position (x_t, y_t) of the current frame:

x_t = x_{t-1} + Δx, y_t = y_{t-1} + Δy;

Step 6.3, the width and height (w_t, h_t) of the current-frame target are updated. First, the scale of the change of the target width and height is obtained by linear interpolation:

c_t = (1 − r)·1 + r·c̃_t

where r is the update rate and c̃_t is the scale change detected in the current frame;

Step 6.4, the width and height (w_t, h_t) of the current frame t are updated by multiplying by the changed scale:

w_t = c_t · w_{t-1}, h_t = c_t · h_{t-1};

Step 6.5, the tracking process of the current-frame target image ends; the next frame is taken as the current frame and the method jumps to step 3 to track subsequent frames (a consolidated sketch of step 6 follows below).
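A consolidated sketch of step 6: fusion by equation (6), bicubic upsampling, peak localization relative to the map center, and the damped width/height update. The values lam = 0.5 and up_size = 272, and the one-to-one mapping from map offset to pixel offset, are illustrative assumptions.

```python
def fuse_and_locate(hc, hd, prev_pos, lam=0.5, up_size=272):
    """Equation (6): h = lam * hc + (1 - lam) * hd, then bicubic interpolation
    to up_size x up_size and correction of the previous position by the
    offset of the peak from the map center."""
    h = lam * hc + (1.0 - lam) * hd
    big = cv2.resize(h.astype(np.float32), (up_size, up_size),
                     interpolation=cv2.INTER_CUBIC)
    j, i = np.unravel_index(np.argmax(big), big.shape)
    c = (up_size - 1) / 2.0
    return prev_pos[0] + (i - c), prev_pos[1] + (j - c)

def update_size(w_prev, h_prev, scale_change, r=0.59):
    """Steps 6.3-6.4: damp the detected scale change toward 1 with update
    rate r, then multiply it onto the previous width and height."""
    c_t = (1.0 - r) * 1.0 + r * scale_change
    return c_t * w_prev, c_t * h_prev
```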
The invention has the following beneficial effects:
1. By additionally introducing the Euclidean distance measurement, the network copes better with interference from similar targets, effectively solving the tracking failure caused by the appearance of similar targets.
2. The response map obtained by the Euclidean distance measurement and the response map obtained by the cosine similarity measurement are fused, making full use of the advantages of the two measurement modes, so that the network is more robust to target appearance change and the tracking drift caused by such change is effectively alleviated.
Drawings
FIG. 1 is a network structure diagram of a twin network target tracking method based on different measurement criteria according to the present invention;
FIG. 2 is a schematic diagram of similarity measurement performed in a twin network target tracking method based on different measurement criteria according to the present invention;
FIG. 3 is a process diagram of embodiment 1 of the twin network target tracking method based on different measurement criteria.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The twin network target tracking method based on different measurement criteria, as shown in FIG. 1, comprises the following specific steps:

Step 1, selecting a feature extraction network: an AlexNet network pre-trained on the ImageNet data set is selected as the feature extraction network φ of the twin network;

Step 2, acquiring a tracking video, manually selecting the region where the target is located on the first frame of the video, and inputting the target region Z of the first frame as a template into the feature extraction network φ selected in step 1 to obtain the template depth feature φ(Z).
Step 2.1, acquiring a tracking video, manually selecting an area where a target is located on a first frame of the video, setting (x, y) as coordinates of a central point of the target in the first frame, setting m and n as the width and the height of the target area respectively, taking the central point (x, y) of the target in the first frame as a center, and intercepting a square area with the side length of z _ sz, wherein a calculation formula of z _ sz is as follows:
wherein p ═ m + n)/4, represents the fill level;
step 2.2, if the square area with the side length of z _ sz exceeds the first frame image of the tracking video, filling the exceeding part with the mean value of the first frame image of the tracking video, wherein the mean value of the first frame imageCalculated using the following equation (2):
wherein ,representing the pixel values of the ith channel, the jth row and the kth column in the target area of the first frame;
step 2.3, mixingThe square area with the side length of Z _ sz is scaled to b multiplied by b to obtain the target area Z of the first frame image, and the target area Z of the first frame image is input into the feature extraction networkObtaining the depth characteristics of width, height and channel number of w multiplied by h multiplied by C
The specific process of step 3 is as follows:

Step 3.1, entering the subsequent frame images of the tracking video. Using the tracking-target coordinate position (x_{t-1}, y_{t-1}) and width and height (m_{t-1}, n_{t-1}) of frame t−1, a square region of side length x_sx is cut from the current frame t, where the side length x_sx is calculated by equation (3):

x_sx = (a/b) · sqrt((m_{t-1} + 2p_{t-1})(n_{t-1} + 2p_{t-1})) (3)

where p_{t-1} = (m_{t-1} + n_{t-1})/4 denotes the padding amount;

Step 3.2, if the square region cut in step 3.1 exceeds the current frame t, the exceeding part is filled with the mean value of the frame-t image, calculated with equation (4):

x̄_i^t = (1/(J·K)) Σ_{j=1}^{J} Σ_{k=1}^{K} x_{ijk}^t (4)

where x_{ijk}^t represents the pixel value of the ith channel, jth row and kth column of the current frame image;

Step 3.3, the square region of side length x_sx is scaled to size a × a to obtain the search region S_t of the current frame t; S_t is input into the feature extraction network φ selected in step 1 to obtain the current-frame search-region depth feature φ(S_t) with width, height and channel number W × H × C.
Step 4, the cosine similarity is adopted to perform similarity measurement between the template depth feature φ(Z) and the current-frame search-region depth feature φ(S_t). φ(Z) is first slid over φ(S_t), as shown in FIG. 2. At each sliding position there is a region of φ(S_t) of the same size as φ(Z), defined as φ(S_t)^{i,j}, where i is the index of horizontal movement on φ(S_t) and j is the index of vertical movement. Assuming φ(Z) moves over φ(S_t) with step size s at each step, i and j take values in the intervals i ∈ [1, (W − w)/s + 1] and j ∈ [1, (H − h)/s + 1].

Since φ(Z) and φ(S_t)^{i,j} both have size w × h × C, they are flattened into one-dimensional vectors f_z and f_{i,j} of size (w × h × C) × 1, and the cosine similarity measures the degree of similarity of the two vectors:

cos(f_z, f_{i,j}) = (f_z · f_{i,j}) / (‖f_z‖ · ‖f_{i,j}‖)

Finally, the response map h_c(Z, S_t) obtained by the cosine similarity measurement is the collection of cos(f_z, f_{i,j}); its expression can be written as:

h_c(Z, S_t) = φ(Z) ⋆ φ(S_t)

where ⋆ denotes the cross-correlation measurement operation.
Step 5, the Euclidean distance is adopted to perform similarity measurement between the template depth feature φ(Z) and the current-frame search-region depth feature φ(S_t), using a method similar to the cosine similarity measurement of step 4. φ(Z) is first slid over φ(S_t), as shown in FIG. 2. During the sliding operation, the Euclidean distance measures the similarity between φ(Z) and each block φ(S_t)^{i,j} of the current-frame search region:

d(f_z, f_{i,j}) = ‖f_z − f_{i,j}‖

Finally, the response map h_d(Z, S_t) obtained through the Euclidean distance measurement is the collection of d(f_z, f_{i,j}); its expression can be written as:

h_d(Z, S_t) = φ(Z) ⊖ φ(S_t)

where ⊖ denotes the Euclidean-distance measurement operation.
Step 6, the two response maps are weighted and fused to obtain the fused response map h(Z, S_t):

h(Z, S_t) = λ·h_c(Z, S_t) + (1 − λ)·h_d(Z, S_t) (10);

then the fused response map h(Z, S_t) is interpolated by bicubic interpolation to a response map H(Z, S_t) of fixed size l × l. The maximum point of H(Z, S_t) is the position of the target; the deviation (Δx, Δy) of the maximum of H(Z, S_t) from the center of H(Z, S_t) is used to correct the target position (x_{t-1}, y_{t-1}) of the previous frame and obtain the target position (x_t, y_t) of the current frame:

x_t = x_{t-1} + Δx, y_t = y_{t-1} + Δy

Next, the width and height (w_t, h_t) of the current-frame target are updated. First, the scale of the change of the target width and height is obtained by linear interpolation:

c_t = (1 − r)·1 + r·c̃_t

where r is the update rate; the width and height (w_t, h_t) of the current frame are then updated by multiplying by the changed scale:

w_t = c_t · w_{t-1}, h_t = c_t · h_{t-1}

The tracking process of the current-frame target ends; the next frame is taken as the current frame and the method jumps to step 3 to track subsequent frames.
Example 1
Table 1 feature extraction network parameter table
Step 1, selecting a feature extraction network: the feature extraction network φ, as shown in Table 1, consists of 5 convolutional layers and 2 pooling layers in total. Each of the first two convolutional layers is followed by a max-pooling layer. Dropout layers and ReLU nonlinear activation functions are added after the first 4 convolutional layers.
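Since the rows of Table 1 are not reproduced above, the following PyTorch sketch shows one AlexNet-style backbone consistent with this description (5 convolutional layers, max pooling after each of the first two, ReLU and dropout after the first four) and with the 127→6 and 255→22 feature sizes of this embodiment; the exact kernel sizes, strides, channel widths and dropout rate are assumptions.

```python
import torch
import torch.nn as nn

class Backbone(nn.Sequential):
    """AlexNet-style feature extractor mapping 3 x 127 x 127 -> 256 x 6 x 6
    and 3 x 255 x 255 -> 256 x 22 x 22 (a hedged reading of Table 1)."""
    def __init__(self):
        super().__init__(
            nn.Conv2d(3, 96, kernel_size=11, stride=2), nn.ReLU(), nn.Dropout2d(0.1),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5), nn.ReLU(), nn.Dropout2d(0.1),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3), nn.ReLU(), nn.Dropout2d(0.1),
            nn.Conv2d(384, 384, kernel_size=3), nn.ReLU(), nn.Dropout2d(0.1),
            nn.Conv2d(384, 256, kernel_size=3),
        )

net = Backbone()
print(net(torch.zeros(1, 3, 127, 127)).shape)   # torch.Size([1, 256, 6, 6])
print(net(torch.zeros(1, 3, 255, 255)).shape)   # torch.Size([1, 256, 22, 22])
```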
Step 2, acquiring a tracking video and manually selecting the region where the target is located on the first frame of the video. Let (x, y) be the coordinates of the center point of the target in the first frame, and let m and n be the width and height of the target region, respectively. Taking the center point (x, y) of the target in the first frame as the center, a square region of side length z_sz is cut out, where z_sz is calculated by equation (1) with p = (m + n)/4 as the padding amount. If the square region extends beyond the image, the excess portion is filled with the image mean. The square region of side length z_sz is then scaled to 127 × 127 to obtain the target region Z of the first frame. Finally, Z is input into the feature extraction network φ to obtain the template depth feature φ(Z) of size 6 × 6 × 256;
Step 3, entering the subsequent frame images, a square region of side length x_sx is cut from the current frame t around the previous target position (x_{t-1}, y_{t-1}); x_sx is calculated by equation (3), where p_{t-1} = (m_{t-1} + n_{t-1})/4 is the padding amount. If the square region extends beyond the image, the excess portion is filled with the image mean. The square region of side length x_sx is then scaled to 255 × 255 to obtain the search region S_t of the current frame, which is input into the feature extraction network φ to obtain the current-frame search-region depth feature φ(S_t) of size 22 × 22 × 256;
step 4, adopting cosine similarity to match the depth characteristic of the templateAnd current frame search region depth featuresAnd (5) performing similarity measurement. First depth features of template framesSearching for a region in a current frameA sliding operation is performed as shown in fig. 2. Searching the area of the current frame every time sliding operation is performedThere will always be one sum template frame depth featureAreas of the same sizeWherein i representsIn thatSubscript of upper horizontal shift, j representsIn thatUp the vertically shifted subscript. Suppose thatEach time atIf the up shift s is 1 step, i and j will take values within the following interval:
i∈[1,2,...,17]j∈[1,2,...,17]
due to the fact thatAndare all 6X 256, will now beAndone-dimensional vector flattened to (6 × 6 × 256) × 1Andthe cosine similarity measures the degree of similarity of the two vectors. Solving forAndthe cosine similarity of (c) is as follows:
finally, a response graph h obtained by a cosine similarity measurement modec(Z,St) Is composed ofA collection of (a). h isc(Z,St) The expression of (c) can be written as follows:
denotes the cross-correlation metric operation.
Step 5, the Euclidean distance is adopted to perform similarity measurement between the template depth feature φ(Z) and the current-frame search-region depth feature φ(S_t), using a method similar to the cosine similarity measurement of step 4. φ(Z) is first slid over φ(S_t), as shown in FIG. 2. During the sliding operation, the Euclidean distance measures the similarity between φ(Z) and each block φ(S_t)^{i,j} of the current-frame search region:

d(f_z, f_{i,j}) = ‖f_z − f_{i,j}‖

Finally, the response map h_d(Z, S_t) obtained through the Euclidean distance measurement is the collection of d(f_z, f_{i,j}); its expression can be written as:

h_d(Z, S_t) = φ(Z) ⊖ φ(S_t)

where ⊖ denotes the Euclidean-distance measurement operation.
Step 6, the two response maps are fused according to equation (10). The fused response map h(Z, S_t) is then interpolated by bicubic interpolation to a response map H(Z, S_t) of size 272 × 272. The maximum point of H(Z, S_t) is the position of the target; the deviation (Δx, Δy) of the maximum of H(Z, S_t) from the center of H(Z, S_t) is used to correct the target position (x_{t-1}, y_{t-1}) of the previous frame and obtain the target position (x_t, y_t) of the current frame:

x_t = x_{t-1} + Δx, y_t = y_{t-1} + Δy

Next, the width and height (w_t, h_t) of the current-frame target are updated. First, the scale of the change of the target width and height is obtained by linear interpolation, c_t = (1 − r)·1 + r·c̃_t, with the update rate r set to 0.59. The width and height of the current-frame target are then updated as

w_t = c_t · w_{t-1}, h_t = c_t · h_{t-1}

The tracking process of the current-frame target ends; the next frame is taken as the current frame and the method jumps to step 3 to track subsequent frames (a consolidated sketch of the loop follows below). As shown in FIG. 3, by using the maximum value of the fused response map together with the width and height updating method, the target can be located and its size determined in the current frame.
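Tying the pieces together, a per-frame loop matching the flow of FIG. 3 might look as follows. It reuses the hypothetical helpers sketched above; phi stands for a feature extractor returning NumPy H × W × C arrays, and the multi-scale search that would supply scale_change is not described in this text, so a neutral placeholder of 1.0 is used.

```python
def track(frames, init_box, phi):
    """frames: list of H x W x 3 images; init_box: (x, y, w, h) of the target
    in the first frame; phi: feature extractor (image -> H x W x C array)."""
    x, y, w, h = init_box
    z_feat = phi(crop_square(frames[0], (x, y), template_side(w, h), 127))
    boxes = [init_box]
    for frame in frames[1:]:
        s_feat = phi(crop_square(frame, (x, y), search_side(w, h), 255))
        hc = cosine_response(z_feat, s_feat)            # step 4
        hd = euclidean_response(z_feat, s_feat)         # step 5
        x, y = fuse_and_locate(hc, hd, (x, y))          # step 6, position
        w, h = update_size(w, h, scale_change=1.0)      # step 6, size (placeholder)
        boxes.append((x, y, w, h))
    return boxes
```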
Claims (7)
1. A twin network target tracking method based on different measurement criteria, characterized in that the method specifically comprises the following steps:

step 1, selecting a feature extraction network φ;

step 2, acquiring a tracking video, manually selecting the region where the target is located on the first frame of the video, and inputting the target region Z of the first frame as a template into the feature extraction network φ selected in step 1 to obtain the template depth feature φ(Z);

step 3, entering the subsequent frame images of the tracking video, using the tracking-target coordinate position (x_{t-1}, y_{t-1}) and width and height (m_{t-1}, n_{t-1}) of the previous frame image to obtain the search region S_t of the current frame image, and inputting S_t into the feature extraction network φ selected in step 1 to obtain the current-frame search-region depth feature φ(S_t);

step 4, performing similarity measurement between the template depth feature φ(Z) obtained in step 2 and the current-frame search-region depth feature φ(S_t) obtained in step 3 using cosine similarity, to obtain the response map h_c(Z, S_t);

step 5, performing similarity measurement between the template depth feature φ(Z) obtained in step 2 and the current-frame search-region depth feature φ(S_t) obtained in step 3 using the Euclidean distance, to obtain the response map h_d(Z, S_t);

step 6, performing weighted fusion of the response map h_c(Z, S_t) obtained in step 4 and the response map h_d(Z, S_t) obtained in step 5 to obtain the final response map h(Z, S_t), interpolating the fused response map h(Z, S_t) to a fixed size, taking the maximum-value point of the response map as the position of the tracking target, and updating the width and height of the target by linear interpolation, thereby realizing tracking of the current-frame target.
2. The twin network target tracking method based on different measurement criteria as claimed in claim 1, characterized in that the specific process of step 1 is as follows: an AlexNet network pre-trained on the ImageNet data set is selected as the feature extraction network φ of the twin network.
3. The twin network target tracking method based on different measurement criteria as claimed in claim 2, characterized in that the specific process of step 2 is as follows:
step 2.1, acquiring a tracking video and manually selecting the region where the target is located on the first frame of the video; let (x, y) be the coordinates of the center point of the target in the first frame, and let m and n be the width and height of the target region, respectively; taking the center point (x, y) of the target in the first frame as the center, a square region of side length z_sz is cut out, where z_sz is calculated as:

z_sz = sqrt((m + 2p)(n + 2p)) (1)

where p = (m + n)/4 denotes the padding amount;

step 2.2, if the square region of side length z_sz exceeds the first frame image of the tracking video, the exceeding part is filled with the mean value of the first frame image, calculated using the following equation (2):

x̄_i = (1/(J·K)) Σ_{j=1}^{J} Σ_{k=1}^{K} x_{ijk} (2)

where x_{ijk} represents the pixel value of the ith channel, jth row and kth column of the first frame image;

step 2.3, the square region of side length z_sz is scaled to size b × b to obtain the target region Z of the first frame image, and Z is input into the feature extraction network φ to obtain the template depth feature φ(Z) with width, height and channel number w × h × C.
4. The twin network target tracking method based on different measurement criteria as claimed in claim 3, characterized in that the specific process of step 3 is as follows:

step 3.1, entering the subsequent frame images of the tracking video; using the tracking-target coordinate position (x_{t-1}, y_{t-1}) and width and height (m_{t-1}, n_{t-1}) of frame t−1, a square region of side length x_sx is cut from the current frame t, where the side length x_sx is calculated as:

x_sx = (a/b) · sqrt((m_{t-1} + 2p_{t-1})(n_{t-1} + 2p_{t-1})) (3)

where p_{t-1} = (m_{t-1} + n_{t-1})/4 denotes the padding amount;

step 3.2, if the square region cut in step 3.1 exceeds the current frame t, the exceeding part is filled with the mean value of the frame-t image, calculated with the following formula:

x̄_i^t = (1/(J·K)) Σ_{j=1}^{J} Σ_{k=1}^{K} x_{ijk}^t (4)

where x_{ijk}^t represents the pixel value of the ith channel, jth row and kth column of the current frame image;

step 3.3, the square region of side length x_sx is scaled to size a × a to obtain the search region S_t of the current frame t; S_t is input into the feature extraction network φ selected in step 1 to obtain the current-frame search-region depth feature φ(S_t) with width, height and channel number W × H × C.
5. The twin network target tracking method based on different measurement criteria as claimed in claim 4, characterized in that the specific process of step 4 is as follows:

the template depth feature φ(Z) is first slid over the current-frame search-region depth feature φ(S_t); at each sliding position there is a region of φ(S_t) of the same size as φ(Z), denoted φ(S_t)^{i,j}, where i is the index of horizontal movement on φ(S_t) and j is the index of vertical movement; assuming φ(Z) moves over φ(S_t) with step size s at each step, i and j take values in the intervals

i ∈ [1, (W − w)/s + 1], j ∈ [1, (H − h)/s + 1]

where i and j are integers;

since φ(Z) and φ(S_t)^{i,j} both have size w × h × C, they are flattened into one-dimensional vectors f_z and f_{i,j} of size (w × h × C) × 1, and the cosine similarity of the two vectors measures their degree of similarity:

cos(f_z, f_{i,j}) = (f_z · f_{i,j}) / (‖f_z‖ · ‖f_{i,j}‖)

the response map h_c(Z, S_t) is the collection of these cosine similarities over all (i, j).
6. The twin network target tracking method based on different measurement criteria as claimed in claim 5, characterized in that the specific steps of step 5 are as follows:

the template depth feature φ(Z) is first slid over the current-frame search-region depth feature φ(S_t); during the sliding operation, the Euclidean distance is used to measure the similarity between φ(Z) and each block φ(S_t)^{i,j} of the current-frame search region:

d(f_z, f_{i,j}) = ‖f_z − f_{i,j}‖

the response map h_d(Z, S_t) is the collection of these values over all (i, j).
7. The twin network target tracking method based on different measurement criteria as claimed in claim 6, characterized in that the specific process of step 6 is as follows:

step 6.1, the response map h_c(Z, S_t) obtained from the cosine similarity and the response map h_d(Z, S_t) obtained from the Euclidean distance are weighted and fused as shown below to obtain the fused response map h(Z, S_t):

h(Z, S_t) = λ·h_c(Z, S_t) + (1 − λ)·h_d(Z, S_t) (6);

step 6.2, the fused response map h(Z, S_t) is interpolated by bicubic interpolation to a response map H(Z, S_t) of fixed size l × l; the maximum point of H(Z, S_t) is the position of the target, and the deviation (Δx, Δy) of the maximum of H(Z, S_t) from the center of H(Z, S_t) is used to correct the target position (x_{t-1}, y_{t-1}) of the previous frame and obtain the target position (x_t, y_t) of the current frame:

x_t = x_{t-1} + Δx, y_t = y_{t-1} + Δy;

step 6.3, the width and height (w_t, h_t) of the current-frame target are updated; first, the scale of the change of the target width and height is obtained by linear interpolation:

c_t = (1 − r)·1 + r·c̃_t

where r is the update rate and c̃_t is the scale change detected in the current frame;

step 6.4, the width and height (w_t, h_t) of the current frame t are updated by multiplying by the changed scale:

w_t = c_t · w_{t-1}, h_t = c_t · h_{t-1};

step 6.5, the tracking process of the current-frame target image ends; the next frame is taken as the current frame and the method jumps to step 3 to track subsequent frames.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110171718.7A CN112991385B (en) | 2021-02-08 | 2021-02-08 | Twin network target tracking method based on different measurement criteria |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110171718.7A CN112991385B (en) | 2021-02-08 | 2021-02-08 | Twin network target tracking method based on different measurement criteria |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112991385A true CN112991385A (en) | 2021-06-18 |
CN112991385B CN112991385B (en) | 2023-04-28 |
Family
ID=76347410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110171718.7A Active CN112991385B (en) | 2021-02-08 | 2021-02-08 | Twin network target tracking method based on different measurement criteria |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112991385B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019128254A1 (en) * | 2017-12-26 | 2019-07-04 | 浙江宇视科技有限公司 | Image analysis method and apparatus, and electronic device and readable storage medium |
US20200051250A1 (en) * | 2018-08-08 | 2020-02-13 | Beihang University | Target tracking method and device oriented to airborne-based monitoring scenarios |
CN111161317A (en) * | 2019-12-30 | 2020-05-15 | 北京工业大学 | Single-target tracking method based on multiple networks |
CN111179314A (en) * | 2019-12-30 | 2020-05-19 | 北京工业大学 | Target tracking method based on residual dense twin network |
CN111639551A (en) * | 2020-05-12 | 2020-09-08 | 华中科技大学 | Online multi-target tracking method and system based on twin network and long-short term clues |
CN111951304A (en) * | 2020-09-03 | 2020-11-17 | 湖南人文科技学院 | Target tracking method, device and equipment based on mutual supervision twin network |
Non-Patent Citations (2)
Title |
---|
L. XU et al., "Visual Tracking Based on Siamese Network of Fused Score Map", IEEE ACCESS *
QIN Xiaofei et al., "Person Re-identification Based on Siamese Network and Multi-distance Fusion", Optical Instruments *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113379806A (en) * | 2021-08-13 | 2021-09-10 | 南昌工程学院 | Target tracking method and system based on learnable sparse conversion attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN112991385B (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109949375B (en) | Mobile robot target tracking method based on depth map region of interest | |
US20220366576A1 (en) | Method for target tracking, electronic device, and storage medium | |
CN108550162B (en) | Object detection method based on deep reinforcement learning | |
CN111781608B (en) | Moving target detection method and system based on FMCW laser radar | |
CN107578430B (en) | Stereo matching method based on self-adaptive weight and local entropy | |
CN107169994B (en) | Correlation filtering tracking method based on multi-feature fusion | |
CN113706581B (en) | Target tracking method based on residual channel attention and multi-level classification regression | |
CN106408596B (en) | Sectional perspective matching process based on edge | |
CN106780631A (en) | A kind of robot closed loop detection method based on deep learning | |
CN109708658B (en) | Visual odometer method based on convolutional neural network | |
CN110111370B (en) | Visual object tracking method based on TLD and depth multi-scale space-time features | |
CN111260661A (en) | Visual semantic SLAM system and method based on neural network technology | |
CN111292369B (en) | False point cloud data generation method of laser radar | |
CN108537825B (en) | Target tracking method based on transfer learning regression network | |
CN107945207A (en) | A kind of real-time object tracking method based on video interframe low-rank related information uniformity | |
WO2023169337A1 (en) | Target object speed estimation method and apparatus, vehicle, and storage medium | |
CN111998862A (en) | Dense binocular SLAM method based on BNN | |
CN112508851A (en) | Mud rock lithology recognition system based on CNN classification algorithm | |
CN112991385A (en) | Twin network target tracking method based on different measurement criteria | |
CN112802199A (en) | High-precision mapping point cloud data processing method and system based on artificial intelligence | |
CN113487631B (en) | LEGO-LOAM-based adjustable large-angle detection sensing and control method | |
CN115908539A (en) | Target volume automatic measurement method and device and storage medium | |
CN100378752C (en) | Segmentation method of natural image in robustness | |
CN113643329B (en) | Twin attention network-based online update target tracking method and system | |
CN112446353B (en) | Video image trace line detection method based on depth convolution neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||