CN113160247A - Anti-noise twin network target tracking method based on frequency separation - Google Patents

Anti-noise twin network target tracking method based on frequency separation

Info

Publication number
CN113160247A
Authority
CN
China
Prior art keywords
target
convolution
feature map
graph
map
Prior art date
Legal status
Granted
Application number
CN202110433521.6A
Other languages
Chinese (zh)
Other versions
CN113160247B (en)
Inventor
Chen Fei
Wang Zhiwei
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202110433521.6A priority Critical patent/CN113160247B/en
Publication of CN113160247A publication Critical patent/CN113160247A/en
Application granted granted Critical
Publication of CN113160247B publication Critical patent/CN113160247B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an anti-noise twin network target tracking method based on frequency separation. First, a convolutional neural network is used to extract features from the tracking target and from the search-area image of each subsequent frame. An Octave convolution structure is then used to further generate high-dimensional feature maps for the search-area feature map and the template feature map; after the cross-correlation operation is completed, the corresponding cross-correlation response maps are fused to obtain a target position regression map, an object-aware classification result map is obtained from the target position regression information, a conventional classification map is obtained in the same way, the final classification result map is obtained, and the determination of the target position is completed. The invention uses the exchange of high-frequency and low-frequency information to strengthen the anti-noise capability of the network and introduces a new feature fusion method that aggregates local and global context information, solving the problem of poor tracking performance of existing target tracking methods in noisy environments.

Description

Anti-noise twin network target tracking method based on frequency separation
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an anti-noise twin network target tracking method based on frequency separation.
Background
Target tracking has attracted wide attention because of its broad applications in autonomous driving, traffic flow monitoring, surveillance, robotics, human-machine interaction, medical diagnostic systems, and activity recognition. The task is to determine the position of an object in subsequent video frames given its initial position. In recent years, twin (Siamese) network trackers have drawn considerable attention for their balance of speed and accuracy. Pioneering work used a twin network to learn a similarity measure between the target object and candidate images, thereby modeling tracking as a search for the target over the entire image. A series of twin-network-based trackers subsequently achieved better performance; among them, trackers based on preselected anchor frames (anchors) gained a clear advantage in accuracy by introducing a region proposal network. For noisy image data, the low-frequency information carries a large amount of noise, and target tracking algorithms suffer from reduced accuracy and drift of the target frame. The reason is that, on the one hand, the added noise destabilizes target feature extraction and, on the other hand, the noise directly harms the accuracy of the subsequent target position regression and classification. With the development of deep learning, the field of image denoising has progressed rapidly, and existing denoising algorithms based on convolutional neural networks generally use a neural network to learn the mapping from a noisy image to a clean image. Directly applying such a convolutional-neural-network-based denoising algorithm to a noisy target tracking task, however, greatly increases the amount of computation. In addition, for most computer vision tasks it is very difficult to obtain enough paired clean and noisy images, so improving the noise immunity of the target tracking network itself becomes another way to solve the above problems.
Disclosure of Invention
In view of this, the present invention provides an anti-noise twin network target tracking method based on frequency separation. First, features are extracted from the tracking target and from the search-area map of each subsequent frame with a convolutional neural network. An Octave convolution structure is then used to further generate high-dimensional feature maps for the search-area feature map x and the template feature map z; after the cross-correlation operation is completed, the corresponding cross-correlation response maps are fused to obtain a target position regression map, an object-aware classification result map is obtained from the target position regression information, a conventional classification map is obtained in the same way, the final classification result map is obtained, and the determination of the target position is completed. The invention uses the exchange of high-frequency and low-frequency information to strengthen the anti-noise capability of the network and introduces a new feature fusion method that aggregates local and global context information, solving the problem of poor tracking performance of existing target tracking methods in noisy environments.
The invention specifically adopts the following technical scheme:
An anti-noise twin network target tracking method based on frequency separation is characterized in that: first, a convolutional neural network is used to extract features from the tracking target and from the search-area image of each subsequent frame; then an Octave convolution structure is used to further generate high-dimensional feature maps for the search-area feature map x and the template feature map z; after the cross-correlation operation is completed, the cross-correlation response maps are fused to obtain a target position regression map, an object-aware classification result map is obtained from the target position regression information, a conventional classification map is obtained in the same way, the final classification result map is obtained, and the determination of the target position is completed.
Further, the method comprises the following steps:
Step 1: inputting the initial-frame target into the base convolutional neural network to extract features, and obtaining and storing the template feature map;
Step 2: cropping a search-area map in the subsequent frame according to the target position of the previous frame, and inputting it into the base convolutional neural network for feature extraction;
Step 3: performing the cross-correlation operation on the template feature map and the search-area feature map to generate a regression cross-correlation response map and a classification cross-correlation response map;
Step 4: performing a convolution operation on the regression cross-correlation response map to generate a target position regression result map;
Step 5: performing a convolution operation on the classification cross-correlation response map to generate a conventional classification result map;
Step 6: generating a symmetric classification result map from the target position regression result map;
Step 7: adding the conventional classification result map and the symmetric classification result map to obtain the final classification result map, and selecting the position regression values in the target position regression result map at the position of the maximum classification value to determine the target position.
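The following is a minimal sketch of step 7, assuming PyTorch tensors and the 25 × 25 map size used in the detailed embodiment below; the function name, shapes, and return convention are illustrative assumptions and not part of the claimed method.

    import torch

    def select_target(class_regular, class_symmetric, reg):
        # class_regular, class_symmetric: [25, 25] foreground scores;
        # reg: [25, 25, 4] per-position regression values (distances to the borders).
        cls = class_regular + class_symmetric      # final classification result map
        idx = int(torch.argmax(cls))               # flat index of the maximum classification value
        row, col = divmod(idx, cls.size(1))        # convert the flat index to a 2-D position
        return (row, col), reg[row, col]           # selected position and its regression values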
Also provided is an anti-noise twin network target tracking method based on frequency separation, characterized by comprising the following steps:
Step S1: an object to be tracked is specified in the first frame of the video image, and the specified target is cropped in the current frame to generate the target template picture; features are extracted from the target template picture with the base convolutional neural network model to obtain the template feature map z;
Step S2: the target search-area picture of the subsequent frame is cropped and its features are extracted with the base convolutional neural network model to obtain the search-area feature map x of the subsequent frame; further feature extraction is then performed on the search-area feature map x and the template feature map z with three independent Octave convolution structures to obtain the feature maps (x11, x12, x13) and (z11, z12, z13), where identical subscripts indicate that the maps were generated with the same Octave convolution structure;
Step S3: using z11 as the convolution kernel, a convolution operation is performed on x11 to obtain the cross-correlation response map R1; using z12 as the convolution kernel, a convolution operation is performed on x12 to obtain the cross-correlation response map R2; using z13 as the convolution kernel, a convolution operation is performed on x13 to obtain the cross-correlation response map R3;
Step S4: the feature fusion operation is performed on the cross-correlation response maps R1 and R2 to obtain the feature fusion result map R4; R3 and R4 are then fused to obtain the feature map R'; convolution operations with five convolution kernels are applied to the feature map R' to obtain the target tracking position regression map Reg with final output size [25 × 25 × 4]; Reg represents the straight-line distances from each pixel point in the search area to the borders of the predicted target;
Step S5: for the template feature map z and the search-area feature map x, further feature extraction is performed on the search-area feature map x and the template feature map z with three independent Octave convolution structures whose parameters differ from those used in step S2, obtaining the feature maps {x21, x22, x23} and {z21, z22, z23}, where identical subscripts indicate generation by the same Octave convolution structure;
Step S6: using z21 as the convolution kernel, a convolution operation is performed on x21 to obtain the cross-correlation response map C1; using z22 as the convolution kernel, a convolution operation is performed on x22 to obtain the cross-correlation response map C2; using z23 as the convolution kernel, a convolution operation is performed on x23 to obtain the cross-correlation response map C3;
Step S7: the fixed sampling positions of the convolution kernel are aligned to the predicted regression box; each position α = (dx, dy) on the classification map has a corresponding regression prediction frame (x1, x2, y1, y2) in the target tracking position regression map Reg, where (x1, x2, y1, y2) represents the distances from the position to the target frame; using (x1, x2, y1, y2), the candidate box M = (mx, my, mw, mh) is obtained, where (mx, my) represents the coordinates of the target center point and (mw, mh) represents the width and height of the candidate box; features are further sampled from the candidate box M to obtain the classification score of the predicted position α = (dx, dy), and the object-aware classification result map Class1 is obtained in this way;
Step S8: the feature fusion operation is performed on the cross-correlation response maps C1 and C2 to obtain the feature fusion result map C4; C3 and C4 are then fused to obtain the feature map C'1; convolution operations with five convolution kernels are applied to the feature map C'1 to obtain the conventional classification map Class2 with final output size [25 × 25 × 1]; soft selection is performed on Class1 and Class2 with the parameter ratio, using the selection equation Class = ratio × Class1 + (1 - ratio) × Class2, to obtain the final comprehensive target classification map Class, where any point α ∈ Class satisfies 0 ≤ α ≤ 1 and represents the probability that α is the target foreground;
Step S9: the position with the maximum target foreground probability value is selected in the target tracking foreground-background classification map Class, and the corresponding position in the target tracking position regression map Reg is determined to obtain the corresponding target frame information (x1, x2, y1, y2), where (x1, x2, y1, y2) represents the distances from the position to the target frame.
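The cross-correlation in steps S3 and S6 uses the template feature map as a convolution kernel that slides over the search-area feature map. The sketch below shows one depthwise (per-channel) form of this operation; the PyTorch framework, channel count, and map sizes are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def cross_correlation(x, z):
        # x: search-area features [1, C, Hx, Wx]; z: template features [1, C, Hz, Wz].
        # Each template channel is used as the kernel for the matching channel of x.
        c = x.size(1)
        kernel = z.view(c, 1, z.size(2), z.size(3))
        return F.conv2d(x, kernel, groups=c)       # response map [1, C, Hx-Hz+1, Wx-Wz+1]

    # Example: a 31 x 31 search-area map correlated with a 7 x 7 template gives a 25 x 25 response.
    x11 = torch.randn(1, 256, 31, 31)
    z11 = torch.randn(1, 256, 7, 7)
    R1 = cross_correlation(x11, z11)               # [1, 256, 25, 25]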
Further, step S2 specifically includes the following steps:
Step S21: the high-low frequency division operation is performed on the search-area feature map x; taking x as the input feature map, a preliminary low-frequency feature map Xlow1 with the length and width reduced by half is first generated with an average pooling operation of size 2 × 2, and a conventional convolution operation is then applied to Xlow1 to generate the low-frequency feature map Xl1 with half the number of channels; a convolution operation is applied to x to generate the high-frequency feature map Xh1 with half the number of channels and unchanged length and width;
Step S22: for the high-frequency feature map Xh1, an average pooling operation of size 2 × 2 is performed first and a convolution operation then generates the low-frequency feature map Xl2; for the low-frequency feature map Xl1, a convolution operation generates the low-frequency feature map Xl3 of unchanged size; Xl2 and Xl3 are added to generate the low-frequency feature map Xl4; for the high-frequency feature map Xh1, a convolution operation generates the high-frequency feature map Xh2 of unchanged size; for the low-frequency feature map Xl1, a convolution operation is performed first and an upsampling operation with an upsampling rate of 2 then generates the high-frequency feature map Xh3; Xh2 and Xh3 are added to generate the high-frequency feature map Xh4;
Step S23: for the high-frequency feature map Xh4, a convolution operation generates the feature map Xh5 whose number of output channels equals the number of channels of the input feature map; for the low-frequency feature map Xl4, a convolution operation generates the feature map Xl5 whose number of output channels equals the number of channels of the input feature map, and an upsampling operation with an upsampling rate of 2 then generates the high-frequency feature map Xh6; Xh5 and Xh6 are added to generate the Octave convolution structure result x11;
Step S24: steps S21 to S23 are repeated to generate the Octave convolution results x12 and x13, respectively;
Step S25: following steps S21 to S24, the feature maps (z11, z12, z13) are generated with the template feature map z as the input feature map.
The specific operation for step S5 is similar to step S2.
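A minimal sketch of the Octave-style structure of steps S21 to S23 follows, assuming a PyTorch implementation. The kernel sizes, padding, and channel bookkeeping are illustrative choices; only the high-/low-frequency split, the exchange between the two branches, and the merge back to the input channel count follow the steps above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class OctaveBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            half = channels // 2
            # S21: split the input into a high-frequency and a low-frequency branch
            self.to_h = nn.Conv2d(channels, half, 3, padding=1)
            self.to_l = nn.Conv2d(channels, half, 3, padding=1)
            # S22: exchange information between the two branches
            self.h2l = nn.Conv2d(half, half, 3, padding=1)
            self.l2l = nn.Conv2d(half, half, 3, padding=1)
            self.h2h = nn.Conv2d(half, half, 3, padding=1)
            self.l2h = nn.Conv2d(half, half, 3, padding=1)
            # S23: project both branches back to the input channel count
            self.out_h = nn.Conv2d(half, channels, 3, padding=1)
            self.out_l = nn.Conv2d(half, channels, 3, padding=1)

        def forward(self, x):
            # S21: low-frequency map at half resolution, high-frequency map at full resolution
            x_l1 = self.to_l(F.avg_pool2d(x, 2))
            x_h1 = self.to_h(x)
            # S22: high-to-low, low-to-low, high-to-high and low-to-high paths, added per branch
            x_l4 = self.h2l(F.avg_pool2d(x_h1, 2)) + self.l2l(x_l1)
            x_h4 = self.h2h(x_h1) + F.interpolate(self.l2h(x_l1), scale_factor=2)
            # S23: restore the input channel count on both branches and fuse them
            return self.out_h(x_h4) + F.interpolate(self.out_l(x_l4), scale_factor=2)

    # Example: an input of shape [1, 256, 30, 30] yields x11 of the same shape.
    x11 = OctaveBlock(256)(torch.randn(1, 256, 30, 30))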
Further, step S4 specifically includes the following steps:
Step S41: the cross-correlation response maps R1 and R2 are added and the resulting feature map is denoted X; the local context weight map L(X) = f(δ(f(X))) and the global context weight map G(X) = f(δ(f(GPooling(X)))) of X are computed, where δ is the ReLU activation function, f denotes a point-wise convolution, and GPooling denotes the global average pooling operation; the attention weight map A(X) = L(X) + G(X) is obtained;
Step S42: the cross-correlation response maps R1 and R2 are fused, the fusion result being R4 = R1 * A(X) + R2 * (1 - A(X));
Step S43: with reference to step S41, the cross-correlation response maps R3 and R4 are fused to obtain R'; convolution operations with five convolution kernels are applied to the feature map R' to obtain the target tracking position regression map Reg with output size [25 × 25 × 4].
Further, in step S1, the base convolutional neural network model is obtained by training a convolutional neural network on a picture data set of the same type as the images to be tracked.
Compared with the prior art, the invention and the optimized scheme thereof have the following beneficial effects:
1) A twin Octave convolution feature representation method is introduced, which suppresses the redundant part of the low-frequency information while retaining the high-frequency information, and the exchange between high-frequency and low-frequency information gives the features extracted by the model stronger noise immunity; at the same time, the template feature map and the search-area feature map are processed with the same frequency-division structure, which further improves the self-similarity of the features.
2) A fusion method that aggregates global and local contexts is introduced; it generates fusion weights of the same size as the cross-correlation response maps for point-wise multiplication, so that a dynamic soft selection is performed at the element level and the model gains stronger adaptive capability.
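The context-aggregating fusion of steps S41 and S42 can be sketched as follows, again assuming PyTorch. The bottleneck width of the point-wise convolutions and the final sigmoid that keeps the blend weights in [0, 1] are assumptions; the description itself only states A(X) = L(X) + G(X).

    import torch
    import torch.nn as nn

    class ContextFusion(nn.Module):
        def __init__(self, channels, reduction=4):
            super().__init__()
            mid = max(channels // reduction, 1)
            # f(.) in step S41: point-wise (1 x 1) convolutions; delta: ReLU
            self.local_ctx = nn.Sequential(
                nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
                nn.Conv2d(mid, channels, 1))
            self.global_ctx = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),               # GPooling: global average pooling
                nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
                nn.Conv2d(mid, channels, 1))

        def forward(self, r1, r2):
            x = r1 + r2                                 # S41: element-wise addition
            a = self.local_ctx(x) + self.global_ctx(x)  # A(X) = L(X) + G(X)
            a = torch.sigmoid(a)                        # assumption: squash the weights to [0, 1]
            return r1 * a + r2 * (1.0 - a)              # S42: R4 = R1 * A(X) + R2 * (1 - A(X))

    # Example: fusing two [1, 256, 25, 25] cross-correlation response maps.
    R4 = ContextFusion(256)(torch.randn(1, 256, 25, 25), torch.randn(1, 256, 25, 25))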
Drawings
The invention is described in further detail below with reference to the following figures and detailed description:
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of a parathyroid target in accordance with an embodiment of the present invention;
FIG. 3 is a graph showing the effect of tracking parathyroid gland in accordance with an embodiment of the present invention.
Detailed Description
In order to make the features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail as follows:
As shown in FIG. 1 to FIG. 3, the present embodiment provides an anti-noise twin network target tracking method based on frequency separation, which is implemented by the following steps:
Step S1: an object to be tracked is specified in the current frame of the video image, and the specified target area is cropped in the current frame to generate the target template picture. Features are extracted from the target template picture with the base convolutional neural network model to obtain the template feature map z;
Step S2: the target search-area picture of the subsequent frame is cropped and its features are extracted with the base convolutional neural network model to obtain the search-area feature map x of the subsequent frame; further feature extraction is then performed on the search-area feature map x and the template feature map z with three independent Octave convolution structures to obtain the feature maps (x11, x12, x13) and (z11, z12, z13), where identical subscripts indicate that the maps were generated with the same Octave convolution structure;
Step S3: using z11 as the convolution kernel, a convolution operation is performed on x11 to obtain the cross-correlation response map R1; using z12 as the convolution kernel, a convolution operation is performed on x12 to obtain the cross-correlation response map R2; using z13 as the convolution kernel, a convolution operation is performed on x13 to obtain the cross-correlation response map R3;
Step S4: the feature fusion operation is performed on the cross-correlation response maps R1 and R2 to obtain the feature fusion result map R4; R3 and R4 are then fused to obtain the feature map R'; convolution operations with five convolution kernels are applied to the feature map R' to obtain the target tracking position regression map Reg with final output size [25 × 25 × 4]. Reg represents the straight-line distances from each pixel point in the search area to the borders of the predicted target.
Step S5: for the template feature map z and the search-area feature map x, further feature extraction is performed with three independent Octave convolution structures to obtain the feature maps {x21, x22, x23} and {z21, z22, z23}, where identical subscripts indicate generation by the same Octave convolution structure; it should be noted that the three independent Octave convolution structures used in this step have the same structure as those used in step S2 but do not share their parameters;
Step S6: using z21 as the convolution kernel, a convolution operation is performed on x21 to obtain the cross-correlation response map C1; using z22 as the convolution kernel, a convolution operation is performed on x22 to obtain the cross-correlation response map C2; using z23 as the convolution kernel, a convolution operation is performed on x23 to obtain the cross-correlation response map C3;
Step S7: the fixed sampling positions of the convolution kernel are aligned to the predicted regression box; each position α = (dx, dy) on the classification map has a corresponding regression prediction frame (x1, x2, y1, y2) in the target tracking position regression map Reg, where (x1, x2, y1, y2) represents the distances from the position to the target frame. Using (x1, x2, y1, y2), the candidate box M = (mx, my, mw, mh) is obtained, where (mx, my) represents the coordinates of the target center point and (mw, mh) represents the width and height of the candidate box; features are further sampled from the candidate box M to obtain the classification score of the predicted position α = (dx, dy), and the target symmetric classification result map Class1 is obtained in this way;
Step S8: the feature fusion operation is performed on the cross-correlation response maps C1 and C2 to obtain the feature fusion result map C4; C3 and C4 are then fused to obtain the feature map C'1; convolution operations with five convolution kernels are applied to the feature map C'1 to obtain the conventional classification map Class2 with final output size [25 × 25 × 1]; soft selection is performed on Class1 and Class2 with the parameter ratio, using the selection equation Class = ratio × Class1 + (1 - ratio) × Class2, to obtain the final comprehensive target classification map Class, where any point α ∈ Class satisfies 0 ≤ α ≤ 1 and represents the probability that α is the target foreground;
Step S9: the position with the maximum target foreground probability value is selected in the target tracking foreground-background classification map Class, and the corresponding position in the target tracking position regression map Reg is determined to obtain the corresponding target frame information (x1, x2, y1, y2), where (x1, x2, y1, y2) represents the distances from the position to the target frame.
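The conversion from the regression values to the candidate box M in step S7 can be sketched as below. The sign convention (x1 and x2 as distances to the left and right borders, y1 and y2 to the top and bottom borders) is an assumption; the description only states that (x1, x2, y1, y2) are distances from the position α = (dx, dy) to the target frame.

    def decode_candidate_box(dx, dy, x1, x2, y1, y2):
        # Assumed convention: x1 = distance to the left border, x2 = to the right border,
        # y1 = to the top border, y2 = to the bottom border, measured from position (dx, dy).
        mw = x1 + x2                    # candidate box width
        mh = y1 + y2                    # candidate box height
        mx = dx + (x2 - x1) / 2.0       # candidate box center, x
        my = dy + (y2 - y1) / 2.0       # candidate box center, y
        return mx, my, mw, mh           # M = (mx, my, mw, mh)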
Specifically, in this embodiment, step S2 specifically includes the following steps:
Step S21: the high-low frequency division operation is performed on the search-area feature map. Taking the search-area feature map as the input feature map, a preliminary low-frequency feature map with the length and width reduced by half is first generated with an average pooling operation of size 2 × 2, and a conventional convolution operation then generates a low-frequency feature map with half the number of channels; a convolution operation generates a high-frequency feature map with half the number of channels and unchanged length and width;
Step S22: the high-frequency feature map is subjected to an average pooling operation of size 2 × 2 and a convolution operation then generates a low-frequency feature map; for the existing low-frequency feature map, a convolution operation generates a low-frequency feature map of unchanged size; the two are added to generate the fused low-frequency feature map; for the high-frequency feature map, a convolution operation generates a high-frequency feature map of unchanged size; for the existing low-frequency feature map, a convolution operation is performed and an upsampling operation with an upsampling rate of 2 then generates a high-frequency feature map; the two are added to generate the fused high-frequency feature map;
Step S23: for the fused high-frequency feature map, a convolution operation generates a feature map whose number of output channels equals the number of channels of the input feature map; for the fused low-frequency feature map, a convolution operation generates a feature map whose number of output channels equals the number of channels of the input feature map, and an upsampling operation with an upsampling rate of 2 then generates a high-frequency feature map; the two are added to generate the Octave convolution structure result;
Step S24: steps S21 to S23 are repeated to generate the second and third Octave convolution results;
Step S25: similarly, steps S21 to S24 are repeated with the template feature map as the input feature map to generate the corresponding feature maps.
In the present embodiment, the specific operation of step S5 is similar to step S2.
Specifically, in this embodiment, step S4 specifically includes the following steps:
Step S41: the cross-correlation response maps R1 and R2 are added and the resulting feature map is denoted X; the local context weight map L(X) = f(δ(f(X))) and the global context weight map G(X) = f(δ(f(GPooling(X)))) of X are computed, where δ is the ReLU activation function, f denotes a point-wise convolution, and GPooling denotes the global average pooling operation; the attention weight map A(X) = L(X) + G(X) is obtained;
Step S42: the cross-correlation response maps R1 and R2 are fused, the fusion result being R4 = R1 * A(X) + R2 * (1 - A(X));
Step S43: similar to step S41, the cross-correlation response maps R3 and R4 are fused to obtain R'; convolution operations with five convolution kernels are applied to the feature map R' to obtain the target tracking position regression map Reg with output size [25 × 25 × 4].
the following shows a specific embodiment of the present invention.
The specific steps of the algorithm provided by the invention for tracking the parathyroid target are as follows:
1. Establishing a prior parathyroid recognition data set {q1, q2, …, qN}, and cropping each data-set picture into a target picture of size 255 × 255 and a search-area picture of size 511 × 511;
2. Transmitting the pair of data pictures obtained in the previous step into the network model for forward propagation, and outputting the frame regression result and the classification result;
3. Calculating the loss functions, where the regression branch loss function is L_reg = -Σ_i ln(IoU(preg, true)), the conventional classification branch loss function is L_class1 = -Σ [p1 log(p1) + (1 - p1) log(1 - p1)], and the symmetric classification branch loss function is L_class2 = -Σ [p2 log(p2) + (1 - p2) log(1 - p2)] (a code sketch of these losses is given after step 10);
4. Carrying out back-propagation with the SGD method and updating the network model parameters;
5. Repeating steps 2 to 4 several times to train the network model, and obtaining the network parameters after training is finished;
6. Inputting the initial-frame target into the base network to extract features, and storing the template feature map;
7. Cropping a search-area image in the subsequent frame according to the target position of the previous frame (for example, the second frame uses the object position of the first frame, and the third frame uses the predicted object position of the second frame), and inputting the search-area image into the base network for feature extraction;
8. Performing the cross-correlation operation on the template feature map and the search-area feature map to generate a regression cross-correlation response map and a classification cross-correlation response map;
9. Performing a convolution operation on the regression cross-correlation response map to generate a target position regression result map, performing a convolution operation on the classification cross-correlation response map to generate a conventional classification result map, and generating the symmetric classification result map from the target position regression result map;
10. Adding the conventional classification result map and the symmetric classification result map to obtain the final classification result map, and selecting the position regression values in the target position regression result map at the position of the maximum classification value to determine the target position.
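The following is a minimal sketch of the training losses of step 3 above, assuming PyTorch. The IoU is computed from per-position border distances; the use of ground-truth foreground labels in the classification terms and the unweighted sum of the three losses are assumptions on top of the translated formulas.

    import torch
    import torch.nn.functional as F

    def iou_from_distances(pred, gt):
        # pred, gt: [N, 4] (left, right, top, bottom) distances measured from the same positions.
        pw, ph = pred[:, 0] + pred[:, 1], pred[:, 2] + pred[:, 3]
        gw, gh = gt[:, 0] + gt[:, 1], gt[:, 2] + gt[:, 3]
        iw = torch.min(pred[:, 0], gt[:, 0]) + torch.min(pred[:, 1], gt[:, 1])
        ih = torch.min(pred[:, 2], gt[:, 2]) + torch.min(pred[:, 3], gt[:, 3])
        inter = iw.clamp(min=0) * ih.clamp(min=0)
        union = pw * ph + gw * gh - inter
        return inter / union.clamp(min=1e-6)

    def tracking_loss(reg_pred, reg_gt, cls1, cls2, labels):
        # reg_pred, reg_gt: [N, 4] distances at foreground positions;
        # cls1, cls2, labels: foreground probabilities / labels, all in [0, 1].
        l_reg = -torch.log(iou_from_distances(reg_pred, reg_gt).clamp(min=1e-6)).sum()  # L_reg
        l_cls1 = F.binary_cross_entropy(cls1, labels, reduction='sum')                  # L_class1
        l_cls2 = F.binary_cross_entropy(cls2, labels, reduction='sum')                  # L_class2
        return l_reg + l_cls1 + l_cls2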
FIG. 3 illustrates the tracking effect of the above target tracking algorithm on an example; the box in FIG. 3 shows the target localization result of the algorithm. This embodiment provides a twin Octave convolution feature representation method that exchanges information between the high-frequency and low-frequency components of the image, retaining the high-frequency information while suppressing the noise carried by the low-frequency component, and thereby strengthening the anti-noise capability of the network. It further combines a feature fusion method that aggregates global and local contexts, so that a dynamic soft selection is performed at the element level and the model gains stronger adaptive capability.
The present invention is not limited to the above preferred embodiments, and other various anti-noise twin network target tracking methods based on frequency separation can be derived by anyone based on the teaching of the present invention, and all equivalent changes and modifications made according to the claims of the present invention shall fall within the scope of the present invention.

Claims (6)

1. An anti-noise twin network target tracking method based on frequency separation, characterized in that: first, a convolutional neural network is used to extract features from the tracking target and from the search-area image of each subsequent frame; then an Octave convolution structure is used to further generate high-dimensional feature maps for the search-area feature map x and the template feature map z; after the cross-correlation operation is completed, the cross-correlation response maps are fused to obtain a target position regression map, an object-aware classification result map is obtained from the target position regression information, a conventional classification map is obtained in the same way, the final classification result map is obtained, and the determination of the target position is completed.
2. The anti-noise twin network target tracking method based on frequency separation according to claim 1, characterized by comprising the steps of:
Step 1: inputting the initial-frame target into the base convolutional neural network to extract features, and obtaining and storing the template feature map;
Step 2: cropping a search-area map in the subsequent frame according to the target position of the previous frame, and inputting it into the base convolutional neural network for feature extraction;
Step 3: performing the cross-correlation operation on the template feature map and the search-area feature map to generate a regression cross-correlation response map and a classification cross-correlation response map;
Step 4: performing a convolution operation on the regression cross-correlation response map to generate a target position regression result map;
Step 5: performing a convolution operation on the classification cross-correlation response map to generate a conventional classification result map;
Step 6: generating a symmetric classification result map from the target position regression result map;
Step 7: adding the conventional classification result map and the symmetric classification result map to obtain the final classification result map, and selecting the position regression values in the target position regression result map at the position of the maximum classification value to determine the target position.
3. An anti-noise twin network target tracking method based on frequency separation is characterized by comprising the following steps:
Step S1: an object to be tracked is specified in the first frame of the video image, and the specified target is cropped in the current frame to generate the target template picture; features are extracted from the target template picture with the base convolutional neural network model to obtain the template feature map z;
Step S2: the target search-area picture of the subsequent frame is cropped and its features are extracted with the base convolutional neural network model to obtain the search-area feature map x of the subsequent frame; further feature extraction is then performed on the search-area feature map x and the template feature map z with three independent Octave convolution structures to obtain the feature maps (x11, x12, x13) and (z11, z12, z13), where identical subscripts indicate that the maps were generated with the same Octave convolution structure;
Step S3: using z11 as the convolution kernel, a convolution operation is performed on x11 to obtain the cross-correlation response map R1; using z12 as the convolution kernel, a convolution operation is performed on x12 to obtain the cross-correlation response map R2; using z13 as the convolution kernel, a convolution operation is performed on x13 to obtain the cross-correlation response map R3;
Step S4: the feature fusion operation is performed on the cross-correlation response maps R1 and R2 to obtain the feature fusion result map R4; R3 and R4 are then fused to obtain the feature map R'; convolution operations with five convolution kernels are applied to the feature map R' to obtain the target tracking position regression map Reg with final output size [25 × 25 × 4]; Reg represents the straight-line distances from each pixel point in the search area to the borders of the predicted target;
Step S5: for the template feature map z and the search-area feature map x, further feature extraction is performed on the search-area feature map x and the template feature map z with three independent Octave convolution structures whose parameters differ from those used in step S2, obtaining the feature maps {x21, x22, x23} and {z21, z22, z23}, where identical subscripts indicate generation by the same Octave convolution structure;
Step S6: using z21 as the convolution kernel, a convolution operation is performed on x21 to obtain the cross-correlation response map C1; using z22 as the convolution kernel, a convolution operation is performed on x22 to obtain the cross-correlation response map C2; using z23 as the convolution kernel, a convolution operation is performed on x23 to obtain the cross-correlation response map C3;
Step S7: the fixed sampling positions of the convolution kernel are aligned to the predicted regression box; each position α = (dx, dy) on the classification map has a corresponding regression prediction frame (x1, x2, y1, y2) in the target tracking position regression map Reg, where (x1, x2, y1, y2) represents the distances from the position to the target frame; using (x1, x2, y1, y2), the candidate box M = (mx, my, mw, mh) is obtained, where (mx, my) represents the coordinates of the target center point and (mw, mh) represents the width and height of the candidate box; features are further sampled from the candidate box M to obtain the classification score of the predicted position α = (dx, dy), and the object-aware classification result map Class1 is obtained in this way;
Step S8: the feature fusion operation is performed on the cross-correlation response maps C1 and C2 to obtain the feature fusion result map C4; C3 and C4 are then fused to obtain the feature map C'1; convolution operations with five convolution kernels are applied to the feature map C'1 to obtain the conventional classification map Class2 with final output size [25 × 25 × 1]; soft selection is performed on Class1 and Class2 with the parameter ratio, using the selection equation Class = ratio × Class1 + (1 - ratio) × Class2, to obtain the final comprehensive target classification map Class, where any point α ∈ Class satisfies 0 ≤ α ≤ 1 and represents the probability that α is the target foreground;
Step S9: the position with the maximum target foreground probability value is selected in the target tracking foreground-background classification map Class, and the corresponding position in the target tracking position regression map Reg is determined to obtain the corresponding target frame information (x1, x2, y1, y2), where (x1, x2, y1, y2) represents the distances from the position to the target frame.
4. The anti-noise twin network target tracking method based on frequency separation according to claim 3, wherein the step S2 specifically comprises the following steps:
Step S21: the high-low frequency division operation is performed on the search-area feature map x; taking x as the input feature map, a preliminary low-frequency feature map Xlow1 with the length and width reduced by half is first generated with an average pooling operation of size 2 × 2, and a conventional convolution operation is then applied to Xlow1 to generate the low-frequency feature map Xl1 with half the number of channels; a convolution operation is applied to x to generate the high-frequency feature map Xh1 with half the number of channels and unchanged length and width;
Step S22: for the high-frequency feature map Xh1, an average pooling operation of size 2 × 2 is performed first and a convolution operation then generates the low-frequency feature map Xl2; for the low-frequency feature map Xl1, a convolution operation generates the low-frequency feature map Xl3 of unchanged size; Xl2 and Xl3 are added to generate the low-frequency feature map Xl4; for the high-frequency feature map Xh1, a convolution operation generates the high-frequency feature map Xh2 of unchanged size; for the low-frequency feature map Xl1, a convolution operation is performed first and an upsampling operation with an upsampling rate of 2 then generates the high-frequency feature map Xh3; Xh2 and Xh3 are added to generate the high-frequency feature map Xh4;
Step S23: for the high-frequency feature map Xh4, a convolution operation generates the feature map Xh5 whose number of output channels equals the number of channels of the input feature map; for the low-frequency feature map Xl4, a convolution operation generates the feature map Xl5 whose number of output channels equals the number of channels of the input feature map, and an upsampling operation with an upsampling rate of 2 then generates the high-frequency feature map Xh6; Xh5 and Xh6 are added to generate the Octave convolution structure result x11;
Step S24: steps S21 to S23 are repeated to generate the Octave convolution results x12 and x13, respectively;
Step S25: following steps S21 to S24, the feature maps (z11, z12, z13) are generated with the template feature map z as the input feature map.
5. The anti-noise twin network target tracking method based on frequency separation according to claim 3, wherein the step S4 specifically comprises the following steps:
Step S41: the cross-correlation response maps R1 and R2 are added and the resulting feature map is denoted X; the local context weight map L(X) = f(δ(f(X))) and the global context weight map G(X) = f(δ(f(GPooling(X)))) of X are computed, where δ is the ReLU activation function, f denotes a point-wise convolution, and GPooling denotes the global average pooling operation; the attention weight map A(X) = L(X) + G(X) is obtained;
Step S42: the cross-correlation response maps R1 and R2 are fused, the fusion result being R4 = R1 * A(X) + R2 * (1 - A(X));
Step S43: with reference to step S41, the cross-correlation response maps R3 and R4 are fused to obtain R'; convolution operations with five convolution kernels are applied to the feature map R' to obtain the target tracking position regression map Reg with output size [25 × 25 × 4].
6. The anti-noise twin network target tracking method based on frequency separation according to claim 3, characterized in that: in step S1, the base convolutional neural network model is obtained by training a convolutional neural network on a picture data set of the same type as the images to be tracked.
CN202110433521.6A 2021-04-22 2021-04-22 Anti-noise twin network target tracking method based on frequency separation Active CN113160247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110433521.6A CN113160247B (en) 2021-04-22 2021-04-22 Anti-noise twin network target tracking method based on frequency separation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110433521.6A CN113160247B (en) 2021-04-22 2021-04-22 Anti-noise twin network target tracking method based on frequency separation

Publications (2)

Publication Number Publication Date
CN113160247A true CN113160247A (en) 2021-07-23
CN113160247B CN113160247B (en) 2022-07-05

Family

ID=76868549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110433521.6A Active CN113160247B (en) 2021-04-22 2021-04-22 Anti-noise twin network target tracking method based on frequency separation

Country Status (1)

Country Link
CN (1) CN113160247B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019091464A1 (en) * 2017-11-12 2019-05-16 北京市商汤科技开发有限公司 Target detection method and apparatus, training method, electronic device and medium
CN111191555A (en) * 2019-12-24 2020-05-22 重庆邮电大学 Target tracking method, medium and system combining high-low spatial frequency characteristics
CN111462175A (en) * 2020-03-11 2020-07-28 华南理工大学 Space-time convolution twin matching network target tracking method, device, medium and equipment
CN112069896A (en) * 2020-08-04 2020-12-11 河南科技大学 Video target tracking method based on twin network fusion multi-template features

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019091464A1 (en) * 2017-11-12 2019-05-16 北京市商汤科技开发有限公司 Target detection method and apparatus, training method, electronic device and medium
CN111191555A (en) * 2019-12-24 2020-05-22 重庆邮电大学 Target tracking method, medium and system combining high-low spatial frequency characteristics
CN111462175A (en) * 2020-03-11 2020-07-28 华南理工大学 Space-time convolution twin matching network target tracking method, device, medium and equipment
CN112069896A (en) * 2020-08-04 2020-12-11 河南科技大学 Video target tracking method based on twin network fusion multi-template features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
C. Zhang et al.: "Deeper Siamese Network With Stronger Feature Representation for Visual Tracking", IEEE Access *
Tan Jianhao et al.: "DenseNet Siamese Network Target Tracking with a Global Context Feature Module", Journal of Electronics & Information Technology *

Also Published As

Publication number Publication date
CN113160247B (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN110210551B (en) Visual target tracking method based on adaptive subject sensitivity
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN113033570B (en) Image semantic segmentation method for improving void convolution and multilevel characteristic information fusion
CN110060286B (en) Monocular depth estimation method
CN110443849B (en) Target positioning method for double-current convolution neural network regression learning based on depth image
CN113706581B (en) Target tracking method based on residual channel attention and multi-level classification regression
CN111401436A (en) Streetscape image segmentation method fusing network and two-channel attention mechanism
CN112507920B (en) Examination abnormal behavior identification method based on time displacement and attention mechanism
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN117252904B (en) Target tracking method and system based on long-range space perception and channel enhancement
CN116185179A (en) Panoramic view visual saliency prediction method and system based on crowdsourcing eye movement data
CN116542991A (en) Network architecture for fracture image segmentation, training method and segmentation method thereof
Ukwuoma et al. Image inpainting and classification agent training based on reinforcement learning and generative models with attention mechanism
CN114358246A (en) Graph convolution neural network module of attention mechanism of three-dimensional point cloud scene
WO2022141718A1 (en) Method and system for assisting point cloud-based object detection
CN114119669A (en) Image matching target tracking method and system based on Shuffle attention
CN113313162A (en) Method and system for detecting multi-scale feature fusion target
CN113160247B (en) Anti-noise twin network target tracking method based on frequency separation
CN112085164A (en) Area recommendation network extraction method based on anchor-frame-free network
CN113379787B (en) Target tracking method based on 3D convolution twin neural network and template updating
CN113592021B (en) Stereo matching method based on deformable and depth separable convolution
Gkillas et al. Federated learning for lidar super resolution on automotive scenes
CN114708423A (en) Underwater target detection method based on improved Faster RCNN
CN109146886B (en) RGBD image semantic segmentation optimization method based on depth density
CN113112522A (en) Twin network target tracking method based on deformable convolution and template updating

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant