CN114757970B - Sample balance-based multi-level regression target tracking method and tracking system - Google Patents


Info

Publication number
CN114757970B
Authority
CN
China
Prior art keywords
iou
search image
candidate
frame
image
Prior art date
Legal status
Active
Application number
CN202210394687.6A
Other languages
Chinese (zh)
Other versions
CN114757970A (en)
Inventor
吴晶晶
楚喻棋
刘学亮
洪日昌
蒋建国
齐美彬
Current Assignee
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202210394687.6A
Publication of CN114757970A
Application granted
Publication of CN114757970B

Classifications

    • G06T 7/248 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G06F 18/24 — Pattern recognition: classification techniques
    • G06F 18/253 — Pattern recognition: fusion techniques of extracted features
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/08 — Neural networks: learning methods
    • G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T 2207/10016 — Video; image sequence
    • G06T 2207/20021 — Dividing image into blocks, subimages or windows
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]


Abstract

The invention discloses a sample-balance-based multi-level regression target tracking method and tracking system. Fusion features between the candidate boxes in a search image and the target box in a reference image are obtained, and the candidate boxes in the search image are refined through several cascaded optimization stages, whose IoU thresholds are raised stage by stage so that localization accuracy improves progressively while the training samples remain balanced. The method overcomes the difficulty, in existing single-threshold methods, of balancing sample availability against sample error.

Description

Sample balance-based multi-level regression target tracking method and tracking system
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a multi-level regression target tracking method and system based on sample balance.
Background
Given the location of an object of interest in the first frame of a video, the visual object tracking task aims to continuously locate that object in subsequent frames. Because this task has high practical value in security systems, it has received wide attention in the field of computer vision. Although deep learning techniques have been successfully applied to it and significant progress has been made, the task remains challenging due to shape changes, scale changes, object occlusion, background clutter and the like.
In existing deep-learning-based target trackers, the offline network mostly adopts a Siamese two-stream structure, and the regression of candidate positions is realized by integrating the appearance information of a given template with that of the candidate positions. See the following documents:
[1] Li B, Yan J, Wu W, et al. High performance visual tracking with Siamese region proposal network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8971-8980.
[2] Li B, Wu W, Wang Q, et al. SiamRPN++: Evolution of Siamese visual tracking with very deep networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4282-4291.
[3] Zhu Z, Wang Q, Li B, et al. Distractor-aware Siamese networks for visual object tracking[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 101-117.
[4] He A, Luo C, Tian X, et al. Towards a better match in Siamese network based visual object tracker[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018.
[5] Zhang Z, Peng H. Deeper and wider Siamese networks for real-time visual tracking[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4591-4600.
In the target tracking task, the most commonly used regression operation is bounding box regression, which directly learns the deviation between a candidate position and the true target position in order to correct the candidate so that it lies closer to the truth. Bounding box regression, however, suffers from a sample-imbalance problem: only candidates whose Intersection over Union (IoU) with the ground truth exceeds a set threshold are treated as positive samples and regressed. When the IoU threshold is set high, few positive samples remain and overfitting becomes more likely; when it is set low, more background is included in the positive samples and the error grows. How to set a reasonable IoU threshold that both balances the samples and improves tracking and localization accuracy is therefore a critical issue in this task, yet it is ignored in existing offline tracking network designs.
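The threshold trade-off can be illustrated with a short sketch (the boxes and the perturbation pattern are hypothetical; the `iou` helper is the standard definition, not code from the patent): raising the threshold from 0.5 to 0.7 sharply reduces the number of candidates counted as positive samples.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def count_positives(candidates, gt, threshold):
    """Number of candidates treated as positive samples at a given IoU threshold."""
    return sum(1 for c in candidates if iou(c, gt) > threshold)

gt = (10.0, 10.0, 30.0, 30.0)
# candidate boxes shifted away from the ground truth by increasing amounts
candidates = [(10 + d, 10 + d, 30 + d, 30 + d) for d in range(0, 10)]
# a higher threshold keeps fewer positives, hence a greater overfitting risk
assert count_positives(candidates, gt, 0.5) >= count_positives(candidates, gt, 0.7)
```

With these hypothetical candidates, four boxes survive a 0.5 threshold but only two survive 0.7, which is exactly the imbalance the paragraph above describes.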
Disclosure of Invention
The invention aims to: address the problems in the prior art by providing a sample-balance-based multi-level regression target tracking method and a corresponding tracking system. In this target tracking method, the IoU threshold is raised gradually across several cascaded localization stages, so that localization accuracy improves progressively while the samples remain balanced; the method overcomes the difficulty, in existing single-threshold methods, of balancing sample availability against sample error.
The technical scheme is as follows: the invention discloses a sample-balance-based multi-level regression target tracking method, which comprises the following steps:
S1, extract the shallow features R_1 and deep features R_2 of a reference image; from R_1 and R_2 respectively, use a PrPool layer to obtain the shallow features a_1 and deep features a_2 of the region inside the target box in the reference image.
S2, extract the shallow features S_1 and deep features S_2 of the search image; obtain an initial target box in the search image, and perturb it to generate a number of candidate boxes B_0i, i = 1, 2, …, N, where N is the number of candidate boxes in the search image.
S3, from S_1 and S_2 respectively, use the PrPool layer to obtain the shallow and deep features inside each candidate box in the search image; the shallow features inside the i-th candidate box B_0i are denoted b_1i and the deep features b_2i.
Multiply a_1 and b_1i channel-wise, and multiply a_2 and b_2i channel-wise; adjust the two products to the same size and concatenate them to obtain the first fusion feature f_i corresponding to candidate box B_0i.
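A minimal sketch of the fusion in step S3, assuming NumPy arrays in (channels, height, width) layout; the array sizes and the nearest-neighbour resize are illustrative stand-ins for the network's learned features and interpolation:

```python
import numpy as np

def fuse(a1, b1, a2, b2, size=(5, 5)):
    """Channel-wise multiply template and candidate features at two scales,
    resize both products to a common spatial size, then concatenate along
    the channel axis (the first fusion feature f_i of step S3)."""
    m1 = a1 * b1  # shallow-scale product, shape (C1, H1, W1)
    m2 = a2 * b2  # deep-scale product, shape (C2, H2, W2)

    def resize(x, hw):  # nearest-neighbour resize as a stand-in for interpolation
        c, h, w = x.shape
        ys = np.arange(hw[0]) * h // hw[0]
        xs = np.arange(hw[1]) * w // hw[1]
        return x[:, ys][:, :, xs]

    return np.concatenate([resize(m1, size), resize(m2, size)], axis=0)

# toy features: shallow scale 4 channels at 8x8, deep scale 8 channels at 4x4
f = fuse(np.ones((4, 8, 8)), np.ones((4, 8, 8)),
         np.ones((8, 4, 4)), 2 * np.ones((8, 4, 4)))
```

The result concatenates both resized products, so its channel count is the sum of the two inputs' channel counts at the common spatial size.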
S4, perform the first-stage optimization of the candidate boxes in the search image: input the first fusion feature f_i into a first head network to obtain the first encoded fusion feature f_i'; input f_i' into a first IoU prediction unit to obtain the first predicted IoU value u_i of candidate box B_0i; if u_i > U_1, optimize candidate box B_0i with a first bounding box regression unit to obtain an optimized candidate box B_1i, where U_1 is the IoU threshold of the first IoU prediction unit.
S5, perform the second-stage optimization of the candidate boxes optimized in the first stage: from S_1 and S_2 respectively, use the PrPool layer to obtain the shallow features b'_1i and deep features b'_2i of the optimized candidate box B_1i in the search image.
Multiply a_1 and b'_1i channel-wise, and multiply a_2 and b'_2i channel-wise; adjust the two products to the same size and concatenate them to obtain the second fusion feature g_i corresponding to the optimized candidate box B_1i; input g_i into a second head network to obtain the second encoded fusion feature g'_i.
Input g'_i into the second IoU prediction unit to obtain the second predicted IoU value v_i of B_1i; if v_i > U_2, optimize B_1i with a second bounding box regression unit to obtain an optimized candidate box B_2i, where U_2 is the IoU threshold of the second IoU prediction unit and U_2 > U_1.
S6, after the N candidate boxes in the search image have passed through steps S4 and S5, a number of optimized candidate boxes are obtained; select the M candidate boxes with the largest second predicted IoU values and average them to form the final target box of the search image.
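The two-stage refinement of steps S4 to S6 can be sketched as follows; `predict_iou` and `regress` are hypothetical callables standing in for the learned IoU prediction and bounding box regression units, and the thresholds (0.5, 0.7) follow the preferred embodiment described later:

```python
def cascade_refine(candidates, predict_iou, regress, thresholds=(0.5, 0.7)):
    """Each stage keeps the boxes whose predicted IoU exceeds the stage
    threshold and regresses them; the threshold rises stage to stage
    (U_2 > U_1), so later stages see fewer but better boxes."""
    boxes = list(candidates)
    for u in thresholds:
        boxes = [regress(b) for b in boxes if predict_iou(b) > u]
    return boxes

def final_box(boxes, predict_iou, m=3):
    """Average the m surviving boxes with the highest predicted IoU (step S6)."""
    top = sorted(boxes, key=predict_iou, reverse=True)[:m]
    return tuple(sum(b[k] for b in top) / len(top) for k in range(4))
```

In practice `predict_iou` would be the stage's IoU prediction unit applied to the fused features, and `regress` the stage's bounding box regression unit.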
Further, in step S2, an ATOM-based online classifier is used to obtain the initial target box in the search image.
Further, the method further comprises the following steps:
s7, taking the search image as a reference image, taking the next frame image of the search image as a new search image, and re-executing the steps S1 to S6 to realize target tracking in the video.
Further, the IoU threshold U_1 of the first IoU prediction unit is 0.5, and the IoU threshold U_2 of the second IoU prediction unit is 0.7.
Further, in steps S1 and S2, the shallow features of the reference image and the search image are extracted by a shallow feature extractor formed by sequentially connecting the initial convolution layer and Block1 of ResNet-50 with two convolution layers.
Further, in steps S1 and S2, the deep features of the reference image and the search image are extracted by a deep feature extractor formed by sequentially connecting Block2-Block4 of ResNet-50 with two convolution layers.
Further, the parameters in the first IoU prediction unit, the second IoU prediction unit, the first bounding box regression unit and the second bounding box regression unit are trained by:
s11, constructing a sample set, wherein each sample in the sample set comprises: a reference image, a search image, a target frame in the reference image, and a real bounding box of a target in the search image;
s12, processing the reference image and the search image in the sample according to steps S1 to S3, and then performing first-stage optimization processing: inputting the first coding fusion characteristic output by the first head network into a first IoU prediction unit and a real IoU calculation module in parallel; the true IoU computing module at this stage is used for computing IoU values IoU of candidate frames and true bounding frames of targets in the selected search image gt1 The method comprises the steps of carrying out a first treatment on the surface of the If IoU gt1 >U 1 Inputting the candidate frame into a first boundary frame regression unit for optimization to obtain an optimized candidate frame BB 1n N=1, 2, …, N1 are N in the search imageThe number of candidate frames obtained after the candidate frames are subjected to the first-stage optimization treatment;
Then perform the second-stage optimization: obtain the second fusion feature from BB_1n and input it into the second head network; input the second encoded fusion feature output by the second head network, in parallel, into the second IoU prediction unit and the true-IoU calculation module, which at this stage computes IoU_gt2, the IoU between BB_1n and the true bounding box of the target; if IoU_gt2 > U_2, input candidate box BB_1n into the second bounding box regression unit for optimization to obtain an optimized candidate box BB_2m, m = 1, 2, …, N2, where N2 is the number of candidate boxes remaining after the N1 first-stage-optimized candidate boxes undergo the second-stage optimization;
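During training (step S12) the gate is the true IoU with the ground-truth box rather than the predicted IoU. A minimal sketch, with `iou` and `regress` as hypothetical stand-ins for the true-IoU calculation module and the bounding box regression units:

```python
def training_stage(boxes, gt_box, threshold, iou, regress):
    """Regress only the candidates whose TRUE IoU with the ground truth
    exceeds the stage threshold (training counterpart of steps S4/S5)."""
    return [regress(b) for b in boxes if iou(b, gt_box) > threshold]

def two_stage_training_pass(boxes, gt_box, iou, regress, u1=0.5, u2=0.7):
    """One pass over a sample: N candidates -> N1 first-stage boxes BB_1n
    -> N2 second-stage boxes BB_2m, with a rising threshold."""
    bb1 = training_stage(boxes, gt_box, u1, iou, regress)   # N1 boxes
    bb2 = training_stage(bb1, gt_box, u2, iou, regress)     # N2 boxes
    return bb1, bb2
```

Because the first regression improves box quality, a box can fail the 0.7 gate initially yet pass it after the first-stage refinement, which is the mechanism the patent relies on to keep the second stage supplied with positive samples.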
s13, optimizing parameters in the first IoU prediction unit, the second IoU prediction unit, the first bounding box regression unit and the second bounding box regression unit by minimizing a loss function;
the loss function is:
where t represents the current number of training epochs,and->Representing IoU loss in the first stage and IoU loss in the second stage, respectively, for the t-1 generation training:
therein IoU 1i First predicted IoU value corresponding to ith candidate box of search image in sample, ioU 2n A second predicted IoU value corresponding to the nth candidate box representing the search image after the first-stage optimization;
and->Respectively representing the optimization error of the first boundary box regression unit and the optimization error of the second boundary box regression unit during t-1 generation training, +.>
Wherein BB is 1n Representing an nth candidate box, BB, of a search image in a sample after the search image is optimized in a first stage 2m Representing an mth candidate frame of the search image after the second-stage optimization; BB (BB) gt Representing a true bounding box of the object in the search image;mean values of the optimization errors of the first bounding box regression units obtained by training of the 1 to t-1 generation are shown.
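A sketch of per-stage loss terms consistent with the description above; the squared-error IoU loss and the L1 regression error are assumed forms, since the exact formulas are not given in this text:

```python
def iou_prediction_loss(pred_ious, true_ious):
    """Mean squared error between predicted and true IoU values
    (an assumed form of L_IoU for one stage)."""
    return sum((p - t) ** 2 for p, t in zip(pred_ious, true_ious)) / len(pred_ious)

def regression_error(boxes, gt_box):
    """Mean L1 distance between refined boxes and the true bounding box
    (an assumed form of L_reg for one stage)."""
    return sum(sum(abs(b[k] - gt_box[k]) for k in range(4))
               for b in boxes) / len(boxes)
```

The total loss would weight the two stages' `iou_prediction_loss` and `regression_error` values using the previous epoch's statistics, as described above.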
On the other hand, the invention also discloses a system for realizing the multi-level regression target tracking method based on sample balance, which comprises the following steps:
a reference image shallow feature extractor 1 for extracting shallow features R of a reference image 1
A reference image deep feature extractor 2 for extracting deep features R of the reference image 2
A reference image shallow PrPool layer 3 for R-dependent 1 Acquiring shallow layer characteristics a in target frame in reference image 1
A reference image deep PrPool layer 4 for R-dependent 2 Obtaining deep layer characteristic a in target frame in reference image 2
A candidate frame generation module 5 for acquiring an initial target frame in the search image, and disturbing the initial target frame in the search image to generate a plurality of candidate frames B 0i
A search image shallow feature extractor 6 for extracting shallow features S of the search image 1
A search image deep feature extractor 7 for extracting deep features S of the search image 2
Searching the image shallow PrPool layer 8 for the image according to S 1 Obtaining search image candidate frame B 0i Internal shallow features b 1i
Search image deep PrPool layer 9 for according to S 2 Obtaining search image candidate frame B 0i Deep-inside features b 2i
A first fusion feature acquisition module 10 for acquiring a 1 And b 1i Channel multiplication is performed to multiply a 2 And b 2i Performing channel multiplication; the two results of channel multiplication are regulated to be the same size and then are cascaded, so that a candidate frame B is obtained 0i Corresponding first fusion feature f i
A first optimizing module 11, configured to perform a first-stage optimization on candidate frames in the search image: incorporating the first fusion feature f i Inputting the first code fusion characteristic f into a first head network i 'A'; will f i ' input first IoU prediction unit, get candidate frame B 0i First predicted IoU value u i The method comprises the steps of carrying out a first treatment on the surface of the If u is i >U 1 For candidate frame B 0i Optimizing by adopting a first bounding box regression unit to obtain an optimized candidate frame B 1i ;U 1 IoU threshold for the first IoU prediction unit;
a second optimizing module 12, configured to perform a second-stage optimization on the candidate frame after the first-stage optimization: according to S respectively 1 And S is 2 Obtaining optimized candidate frame B in search image by PrPool layer 1i Shallow features b' 1i And deep features b' 2i
Will a 1 And b' 1i Channel multiplication is performed to multiply a 2 And b' 2i Performing channel multiplication; the two results of channel multiplication are regulated to be the same size and then are cascaded, so that an optimized candidate frame B is obtained 1i Corresponding second fusion feature g i The method comprises the steps of carrying out a first treatment on the surface of the Fusing the second fusion feature g i Inputting the second code fusion characteristic g 'into a second head network' i
Will g' i Inputting the second IoU prediction unit to obtain B 1i A second predicted IoU value v of (2) i The method comprises the steps of carrying out a first treatment on the surface of the If v i >U 2 For B 1i Optimizing by adopting a second bounding box regression unit to obtain an optimized candidate frame B 2i ;U 2 IoU threshold for the second IoU prediction unit, and U 2 >U 1
The final target frame obtaining module 13 is configured to select M candidate frames with the largest second prediction IoU value from the multiple optimized candidate frames obtained by processing the N candidate frames in the search image by the first optimizing module 11 and the second optimizing module 12, and average the M candidate frames to be used as the final target frame of the search image.
Further, the target tracking system also includes a loss function calculation module 14 for calculating the loss function value used to train the parameters in the first IoU prediction unit, the second IoU prediction unit, the first bounding box regression unit and the second bounding box regression unit;
the loss function is:
where t represents the current number of training epochs,and->Representing IoU loss in the first stage and IoU loss in the second stage, respectively, for the t-1 generation training:
therein IoU 1i First predicted IoU value corresponding to nth candidate box of search image in sample, ioU 2n A second predicted IoU value corresponding to the nth candidate box representing the search image after the first-stage optimization IoU gt1 And IoU gt2 Respectively, search imagesTrue IoU values of candidate boxes in the first and second stage optimizations;
and->Respectively representing the optimization error of the first boundary box regression unit and the optimization error of the second boundary box regression unit during t-1 generation training, +.>
Wherein BB is 1n Representing an nth candidate box, BB, of a search image in a sample after the search image is optimized in a first stage 2m Representing an mth candidate frame of the search image after the second-stage optimization; BB (BB) gt Representing a true bounding box of the object in the search image;
mean values of the optimization errors of the first bounding box regression units obtained by training of the 1 to t-1 generation are shown.
The beneficial effects are that: the disclosed sample-balance-based multi-level regression target tracking method and tracking system design a multi-level regression network that raises the IoU threshold of the candidate boxes stage by stage through a two-stage cascaded localization process. The first optimization stage sets a smaller IoU threshold to increase the number of positive samples (candidate boxes whose IoU exceeds the threshold are marked as positive samples), thereby balancing the training samples. After the first positional-regression stage the quality of the candidate boxes is improved, so the second optimization stage can raise the IoU threshold while still retaining a large number of positive samples, and the further localization regression of the candidate boxes improves their regression accuracy. In summary, by setting different IoU thresholds at different stages, the invention alleviates the sample-balance problem and improves localization accuracy through stage-by-stage localization.
Drawings
FIG. 1 is a flow chart of a sample balance-based multi-level regression target tracking method disclosed by the invention;
FIG. 2 is a schematic diagram of the composition of a sample balance based multi-level regression target tracking system;
FIG. 3 is a schematic diagram of the composition of a two-stage optimization module;
FIG. 4 is a process flow diagram of a two-stage optimization stage in the training process.
Detailed Description
The invention is further elucidated below in connection with the drawings and the detailed description.
The invention discloses a sample-balance-based multi-level regression target tracking method; a flow chart of the method is shown in FIG. 1, and FIG. 2 is a schematic diagram of the composition of a tracking system for realizing the target tracking method. The target tracking method comprises the following steps:
S1, extract the shallow features R_1 and deep features R_2 of a reference image; from R_1 and R_2 respectively, use a PrPool layer to obtain the shallow features a_1 and deep features a_2 of the region inside the target box in the reference image.
S2, extract the shallow features S_1 and deep features S_2 of the search image; obtain an initial target box in the search image, and perturb it to generate a number of candidate boxes B_0i, i = 1, 2, …, N, where N is the number of candidate boxes in the search image.
In step S2, an ATOM-based online classifier is used to obtain the initial target box in the search image. The online classifier is described in detail in document [6]: Danelljan M, Bhat G, Khan F S, et al. ATOM: Accurate tracking by overlap maximization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4660-4669. This classifier can obtain the approximate location of the target. The candidate box generation module 5 then perturbs the initial target box in the search image to generate a number of candidate boxes B_0i.
In steps S1 and S2 of this embodiment, a shallow extractor and a deep extractor with shared parameters are used to obtain the two-scale trunk features of the reference image and the search image. Specifically, the reference image shallow feature extractor 1 and the search image shallow feature extractor 6 are both shallow feature extractors formed by sequentially connecting the initial convolution layer and Block1 of ResNet-50 with two convolution layers; the reference image deep feature extractor 2 and the search image deep feature extractor 7 are deep feature extractors formed by sequentially connecting Block2-Block4 of ResNet-50 with two convolution layers. ResNet-50 is described in detail in: [7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770-778.
S3, from S_1 and S_2 respectively, use the PrPool layer to obtain the shallow and deep features inside each candidate box in the search image; the shallow features inside the i-th candidate box B_0i are denoted b_1i and the deep features b_2i.
The PrPool layers used in the reference image shallow PrPool layer 3, the reference image deep PrPool layer 4, the search image shallow PrPool layer 8 and the search image deep PrPool layer 9 are described in detail in document [6]: Danelljan M, Bhat G, Khan F S, et al. ATOM: Accurate tracking by overlap maximization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4660-4669.
Multiply the shallow features a_1 of the region inside the target box of the reference image channel-wise with the shallow features b_1i of the candidate box, and multiply the deep features a_2 of the region inside the target box of the reference image channel-wise with the deep features b_2i of the candidate box; adjust the two products to the same size and concatenate them to obtain the first fusion feature f_i corresponding to candidate box B_0i. This function is implemented by the first fusion feature acquisition module 10; in FIG. 2 the corresponding symbols denote channel multiplication, size adjustment and concatenation respectively.
S4, perform the first-stage optimization of the candidate boxes in the search image: input the first fusion feature f_i into a first head network to obtain the first encoded fusion feature f_i'; input f_i' into a first IoU prediction unit to obtain the first predicted IoU value u_i of candidate box B_0i; if u_i > U_1, optimize candidate box B_0i with a first bounding box regression unit to obtain an optimized candidate box B_1i, where U_1 is the IoU threshold of the first IoU prediction unit.
At this stage the IoU threshold U_1 is set to 0.5; that is, candidate positions whose first predicted IoU value is greater than 0.5 are marked as preliminarily screened candidate boxes, and the first bounding box regression unit is used to optimize the positions of these preliminarily screened candidate boxes. Because the IoU threshold at this stage is low, more screening results are retained; the number of boxes remaining after the first-stage optimization of the N candidate boxes in the search image is no greater than N. The candidate boxes obtained by this optimization are denoted B_1i.
S5, perform the second-stage optimization of the candidate boxes optimized in the first stage: from S_1 and S_2 respectively, use the PrPool layer to obtain the shallow features b'_1i and deep features b'_2i of the optimized candidate box B_1i in the search image.
Multiply a_1 and b'_1i channel-wise, and multiply a_2 and b'_2i channel-wise; adjust the two products to the same size and concatenate them to obtain the second fusion feature g_i corresponding to the optimized candidate box B_1i; input g_i into a second head network to obtain the second encoded fusion feature g'_i.
Input g'_i into the second IoU prediction unit to obtain the second predicted IoU value v_i of B_1i; if v_i > U_2, optimize B_1i with a second bounding box regression unit to obtain an optimized candidate box B_2i, where U_2 is the IoU threshold of the second IoU prediction unit and U_2 > U_1.
At this stage, the IoU threshold U_2 is set to 0.7. The higher threshold means the screened candidate frames are of higher quality, and the candidate frames B_2i optimized from them are of higher quality still; the quality of the candidate frames is thus improved progressively.
The first head network and the second head network are each a small network placed after the backbone network. In the invention, both head networks consist of several convolution layers and a fully connected layer cascaded in sequence, and output features of fixed size and dimension.
S6, after the N candidate frames in the search image pass through steps S4 and S5, a number of optimized candidate frames are obtained; select the M candidate frames with the largest second predicted IoU values and take their average as the final target frame of the search image.
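The S6 selection can be sketched as follows. The (x1, y1, x2, y2) box format is an assumption for illustration; the selection logic itself (top-M by predicted IoU, element-wise average) is as described above.

```python
import numpy as np

def final_target_frame(boxes, iou_scores, M=3):
    """Average the M candidate frames with the largest second predicted IoU."""
    top = np.argsort(iou_scores)[-M:]   # indices of the M largest scores
    return boxes[top].mean(axis=0)      # element-wise average of those boxes

boxes = np.array([[0, 0, 10, 10],
                  [2, 2, 12, 12],
                  [1, 1, 11, 11],
                  [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.85, 0.1])
final = final_target_frame(boxes, scores)   # averages the three high-IoU boxes
```

Averaging several high-confidence candidates instead of taking the single best one smooths out individual regression errors.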
Steps S4 and S5 are carried out by the first optimization module 11 and the second optimization module 12, respectively; their structures are shown in Fig. 3 (a) and (b). The final target frame acquisition module 13 selects the M candidate frames with the largest second predicted IoU values and averages them to obtain the final target frame of the search image.
S7, when tracking a target through a video, take the current search image as the reference image and the next frame as the new search image, then re-execute steps S1 to S6; repeating this process realizes target tracking in the video.
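The S7 frame-to-frame loop can be sketched as below. Here `locate` stands in for the entire S1–S6 pipeline (a placeholder assumption): given a reference image, its target frame, and a search image, it returns the final target frame of the search image.

```python
def track(frames, initial_box, locate):
    """Run the tracker over a list of frames, rolling the reference forward."""
    ref, ref_box = frames[0], initial_box
    results = [initial_box]
    for search in frames[1:]:
        box = locate(ref, ref_box, search)  # steps S1-S6 on this frame pair
        ref, ref_box = search, box          # S7: search image becomes reference
        results.append(box)
    return results
```

The key property is that each frame's result becomes the next frame's reference target frame, so the tracker adapts to appearance changes frame by frame.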
In this embodiment, the first IoU prediction unit and the second IoU prediction unit have the same structure as the IoU predictor in: Danelljan M, Bhat G, Khan F S, et al. ATOM: Accurate tracking by overlap maximization [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 4660-4669.
The parameters of the first IoU prediction unit, the second IoU prediction unit, the first bounding-box regression unit, and the second bounding-box regression unit are trained with the following steps:
S11, construct a sample set in which each sample comprises: a reference image, a search image, the target frame in the reference image, and the ground-truth bounding box of the target in the search image;
S12, process the reference image and the search image of the sample according to steps S1 to S3, then apply a first-stage optimization similar to S4: feed the first encoded fusion feature output by the first head network in parallel into the first IoU prediction unit and the true IoU calculation module; at this stage, the true IoU calculation module computes the IoU value IoU_gt1 between each candidate frame and the ground-truth bounding box of the target in the search image; if IoU_gt1 > U_1, the candidate frame is input into the first bounding-box regression unit for optimization, yielding the optimized candidate frames BB_1n, n = 1, 2, …, N1, where N1 is the number of candidate frames obtained by applying the first-stage optimization to the N candidate frames in the search image;
then apply a second-stage optimization similar to S5: from BB_1n obtain the second fusion feature and input it into the second head network; feed the second encoded fusion feature output by the second head network in parallel into the second IoU prediction unit and the true IoU calculation module, which at this stage computes the IoU value IoU_gt2 between BB_1n and the ground-truth bounding box of the target; if IoU_gt2 > U_2, candidate frame BB_1n is input into the second bounding-box regression unit for optimization, yielding the optimized candidate frames BB_2m, m = 1, 2, …, N2, where N2 is the number of candidate frames obtained by applying the second-stage optimization to the N1 first-stage candidates;
The processing flows of the first-stage and second-stage optimization during training are shown in Fig. 4 (a) and (b), respectively. They differ from S4 and S5 as follows: during training, the first encoded fusion feature output by the first head network is fed in parallel into the first IoU prediction unit and the true IoU calculation module, and the second encoded fusion feature output by the second head network is fed in parallel into the second IoU prediction unit and the true IoU calculation module; whether a candidate frame is passed to the first or second bounding-box regression unit for regression optimization is decided by the true IoU value computed by the true IoU calculation module. In other words, the first and second IoU prediction units are trained to predict the IoU score of a candidate frame, making their outputs as close as possible to the true IoU values computed by the true IoU calculation module, so that at test time, when the true candidate-frame IoU is unavailable, the predicted IoU is used instead. The first-stage optimization uses a smaller IoU threshold, which yields more positive samples for the first bounding-box regression unit to optimize and thereby keeps the training samples balanced; the second-stage optimization uses a larger IoU threshold, which raises the quality of the positive samples again so that the second bounding-box regression unit obtains higher-quality candidate frames.
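The true IoU calculation module used for gating during training is plain axis-aligned intersection-over-union. A sketch, with boxes given as (x1, y1, x2, y2) — a format assumption for illustration:

```python
def box_iou(a, b):
    """IoU between two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])   # intersection bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0
```

During training, a candidate is forwarded to a regression unit only when `box_iou(candidate, ground_truth)` exceeds the stage threshold U_1 or U_2; the IoU prediction units are trained to approximate this value so that it can be replaced by the prediction at test time.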
S13, optimize the parameters of the first IoU prediction unit, the second IoU prediction unit, the first bounding-box regression unit, and the second bounding-box regression unit by minimizing a loss function;
the loss function is:
where t denotes the current training epoch, and L_IoU^1(t-1) and L_IoU^2(t-1) denote the first-stage IoU loss and the second-stage IoU loss of the (t-1)-th training epoch, respectively. In these losses, IoU_1i is the first predicted IoU value of the i-th candidate frame of the search image in the sample, and IoU_2n is the second predicted IoU value of the n-th candidate frame of the search image after the first-stage optimization.
L_reg^1(t-1) and L_reg^2(t-1) denote the optimization errors of the first and the second bounding-box regression unit in the (t-1)-th training epoch, where BB_1n is the n-th candidate frame of the search image after the first-stage optimization, BB_2m is the m-th candidate frame after the second-stage optimization, and BB_gt is the ground-truth bounding box of the target in the search image.
The mean of the optimization errors of the first bounding-box regression unit over training epochs 1 to t-1 also enters the loss.
In the last term of the loss function, the coefficients of L_reg^1 and L_reg^2 are inversely related. Early in training, the first-stage optimization therefore has the larger influence on the loss; as the first-stage bounding-box regression unit trains and its optimization error shrinks, the influence of the second-stage optimization on the loss gradually grows. Later in training, the candidate positions are of better quality and the positive samples more numerous, so the weight of the second stage increases while sample balance is maintained. By cascading multiple regression networks, the IoU threshold is raised step by step across the localization stages: the earlier stage uses a smaller IoU threshold to increase the number of positive samples, realizing training-sample balance, and the first localization regression improves the quality of the candidate frames. Raising the IoU threshold in the second stage can therefore keep the number of positive samples essentially unchanged while the candidates undergo a further localization regression, improving regression accuracy. Localization accuracy is thus improved progressively while the samples stay balanced.
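A hedged sketch of this stage weighting: the weight on the second-stage regression loss grows as the running mean of the first-stage regression errors (epochs 1 to t-1) shrinks. The patent gives the exact coefficient formula only as an image, so this particular normalization is an assumption that merely reproduces the described behavior.

```python
def stage_weights(reg1_errors):
    """Weights for the two regression-loss terms from the running mean of
    first-stage errors over past epochs (an assumed normalization)."""
    mean_err = sum(reg1_errors) / len(reg1_errors)
    w1 = mean_err / (1.0 + mean_err)  # shrinks as the first-stage error shrinks
    w2 = 1.0 - w1                     # the second stage takes over correspondingly
    return w1, w2

early = stage_weights([0.9, 0.8])    # large first-stage errors early in training
late = stage_weights([0.1, 0.05])    # small errors once stage one has converged
```

The weights sum to one, so the schedule only redistributes emphasis between the two stages rather than rescaling the total loss.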

Claims (10)

1. A multi-level regression target tracking method based on sample balance, characterized by comprising the following steps:
S1, extract the shallow features R_1 and the deep features R_2 of a reference image; from R_1 and R_2 respectively, use a PrPool layer to obtain the shallow features a_1 and the deep features a_2 of the region inside the target frame in the reference image;
S2, extract the shallow features S_1 and the deep features S_2 of a search image; obtain an initial target frame in the search image and perturb it to generate a plurality of candidate frames B_0i, i = 1, 2, …, N, where N is the number of candidate frames in the search image;
S3, from S_1 and S_2 respectively, use the PrPool layer to obtain the shallow and deep features inside each candidate frame in the search image, the shallow features inside the i-th candidate frame B_0i being denoted b_1i and the deep features b_2i;
multiply a_1 with b_1i channel-wise and a_2 with b_2i channel-wise; resize the two products to the same size and concatenate them to obtain the first fusion feature f_i corresponding to candidate frame B_0i;
S4, first-stage optimization of the candidate frames in the search image: input the first fusion feature f_i into the first head network to obtain the first encoded fusion feature f'_i; input f'_i into the first IoU prediction unit to obtain the first predicted IoU value u_i of candidate frame B_0i; if u_i > U_1, optimize candidate frame B_0i with the first bounding-box regression unit to obtain the optimized candidate frame B_1i, where U_1 is the IoU threshold of the first IoU prediction unit;
S5, second-stage optimization of the candidate frames produced by the first stage: from S_1 and S_2 respectively, use the PrPool layer to obtain the shallow features b'_1i and the deep features b'_2i of the optimized candidate frame B_1i in the search image;
multiply a_1 with b'_1i channel-wise and a_2 with b'_2i channel-wise; resize the two products to the same size and concatenate them to obtain the second fusion feature g_i corresponding to the optimized candidate frame B_1i; input g_i into the second head network to obtain the second encoded fusion feature g'_i;
input g'_i into the second IoU prediction unit to obtain the second predicted IoU value v_i of B_1i; if v_i > U_2, optimize B_1i with the second bounding-box regression unit to obtain the optimized candidate frame B_2i, where U_2 is the IoU threshold of the second IoU prediction unit and U_2 > U_1;
S6, after the N candidate frames in the search image pass through steps S4 and S5, a number of optimized candidate frames are obtained; select the M candidate frames with the largest second predicted IoU values and take their average as the final target frame of the search image.
2. The multi-level regression target tracking method according to claim 1, wherein in step S2 the initial target frame in the search image is obtained by an ATOM-based online classifier.
3. The multi-level regression target tracking method according to claim 1, further comprising:
S7, take the search image as the reference image and the next frame of the search image as the new search image, and re-execute steps S1 to S6 to realize target tracking in a video.
4. The multi-level regression target tracking method according to claim 1, wherein the IoU threshold U_1 of the first IoU prediction unit is 0.5 and the IoU threshold U_2 of the second IoU prediction unit is 0.7.
5. The multi-level regression target tracking method according to claim 1, wherein in steps S1 and S2 the shallow features of the reference image and the search image are extracted by a shallow feature extractor consisting of the initial convolution layer of ResNet-50, Block1, and two convolution layers connected in sequence.
6. The multi-level regression target tracking method according to claim 1, wherein in steps S1 and S2 the deep features of the reference image and the search image are extracted by a deep feature extractor consisting of Block2 to Block4 of ResNet-50 and two convolution layers connected in sequence.
7. The multi-level regression target tracking method according to claim 1, wherein the parameters of the first IoU prediction unit, the second IoU prediction unit, the first bounding-box regression unit, and the second bounding-box regression unit are trained by:
S11, constructing a sample set in which each sample comprises: a reference image, a search image, the target frame in the reference image, and the ground-truth bounding box of the target in the search image;
S12, processing the reference image and the search image of the sample according to steps S1 to S3, then performing a first-stage optimization: the first encoded fusion feature output by the first head network is fed in parallel into the first IoU prediction unit and the true IoU calculation module; at this stage, the true IoU calculation module computes the IoU value IoU_gt1 between each candidate frame and the ground-truth bounding box of the target in the search image; if IoU_gt1 > U_1, the candidate frame is input into the first bounding-box regression unit for optimization, yielding the optimized candidate frames BB_1n, n = 1, 2, …, N1, where N1 is the number of candidate frames obtained by applying the first-stage optimization to the N candidate frames in the search image;
performing a second-stage optimization: from BB_1n the second fusion feature is obtained and input into the second head network; the second encoded fusion feature output by the second head network is fed in parallel into the second IoU prediction unit and the true IoU calculation module, which at this stage computes the IoU value IoU_gt2 between BB_1n and the ground-truth bounding box of the target; if IoU_gt2 > U_2, candidate frame BB_1n is input into the second bounding-box regression unit for optimization, yielding the optimized candidate frames BB_2m, m = 1, 2, …, N2, where N2 is the number of candidate frames obtained by applying the second-stage optimization to the N1 first-stage candidates;
S13, optimizing the parameters of the first IoU prediction unit, the second IoU prediction unit, the first bounding-box regression unit, and the second bounding-box regression unit by minimizing a loss function;
the loss function is:
where t denotes the current training epoch, and L_IoU^1(t-1) and L_IoU^2(t-1) denote the first-stage IoU loss and the second-stage IoU loss of the (t-1)-th training epoch, respectively; IoU_1i is the first predicted IoU value of the i-th candidate frame of the search image in the sample, and IoU_2n is the second predicted IoU value of the n-th candidate frame of the search image after the first-stage optimization;
L_reg^1(t-1) and L_reg^2(t-1) denote the optimization errors of the first and the second bounding-box regression unit in the (t-1)-th training epoch, where BB_1n is the n-th candidate frame of the search image after the first-stage optimization, BB_2m is the m-th candidate frame after the second-stage optimization, and BB_gt is the ground-truth bounding box of the target in the search image;
the mean of the optimization errors of the first bounding-box regression unit over training epochs 1 to t-1 also enters the loss.
8. A multi-level regression target tracking system based on sample balance, characterized by comprising:
a reference image shallow feature extractor (1) for extracting the shallow features R_1 of a reference image;
a reference image deep feature extractor (2) for extracting the deep features R_2 of the reference image;
a reference image shallow PrPool layer (3) for obtaining, from R_1, the shallow features a_1 inside the target frame in the reference image;
a reference image deep PrPool layer (4) for obtaining, from R_2, the deep features a_2 inside the target frame in the reference image;
a candidate frame generation module (5) for obtaining an initial target frame in a search image and perturbing it to generate a plurality of candidate frames B_0i;
a search image shallow feature extractor (6) for extracting the shallow features S_1 of the search image;
a search image deep feature extractor (7) for extracting the deep features S_2 of the search image;
a search image shallow PrPool layer (8) for obtaining, from S_1, the shallow features b_1i inside candidate frame B_0i of the search image;
a search image deep PrPool layer (9) for obtaining, from S_2, the deep features b_2i inside candidate frame B_0i of the search image;
a first fusion feature acquisition module (10) for multiplying a_1 with b_1i channel-wise and a_2 with b_2i channel-wise, resizing the two products to the same size, and concatenating them to obtain the first fusion feature f_i corresponding to candidate frame B_0i;
a first optimization module (11) for first-stage optimization of the candidate frames in the search image: the first fusion feature f_i is input into the first head network to obtain the first encoded fusion feature f'_i; f'_i is input into the first IoU prediction unit to obtain the first predicted IoU value u_i of candidate frame B_0i; if u_i > U_1, candidate frame B_0i is optimized with the first bounding-box regression unit to obtain the optimized candidate frame B_1i, where U_1 is the IoU threshold of the first IoU prediction unit;
a second optimization module (12) for second-stage optimization of the candidate frames produced by the first stage: from S_1 and S_2 respectively, the PrPool layer obtains the shallow features b'_1i and the deep features b'_2i of the optimized candidate frame B_1i in the search image;
a_1 is multiplied with b'_1i channel-wise and a_2 with b'_2i channel-wise; the two products are resized to the same size and concatenated to obtain the second fusion feature g_i corresponding to the optimized candidate frame B_1i; g_i is input into the second head network to obtain the second encoded fusion feature g'_i;
g'_i is input into the second IoU prediction unit to obtain the second predicted IoU value v_i of B_1i; if v_i > U_2, B_1i is optimized with the second bounding-box regression unit to obtain the optimized candidate frame B_2i, where U_2 is the IoU threshold of the second IoU prediction unit and U_2 > U_1;
and a final target frame acquisition module (13) for selecting, among the optimized candidate frames obtained by processing the N candidate frames of the search image through the first optimization module (11) and the second optimization module (12), the M candidate frames with the largest second predicted IoU values and averaging them into the final target frame of the search image.
9. The multi-level regression target tracking system according to claim 8, wherein the reference image shallow feature extractor (1) and the search image shallow feature extractor (6) each consist of the initial convolution layer of ResNet-50, Block1, and two convolution layers connected in sequence.
10. The multi-level regression target tracking system according to claim 8, further comprising a loss function calculation module (14) for calculating the loss function value used to train the parameters of the first IoU prediction unit, the second IoU prediction unit, the first bounding-box regression unit, and the second bounding-box regression unit;
the loss function is:
where t denotes the current training epoch, and L_IoU^1(t-1) and L_IoU^2(t-1) denote the first-stage IoU loss and the second-stage IoU loss of the (t-1)-th training epoch, respectively; IoU_1i is the first predicted IoU value of the i-th candidate frame of the search image in the sample, IoU_2n is the second predicted IoU value of the n-th candidate frame of the search image after the first-stage optimization, and IoU_gt1 and IoU_gt2 are the true IoU values of the candidate frames in the first-stage and second-stage optimization of the search image, respectively;
L_reg^1(t-1) and L_reg^2(t-1) denote the optimization errors of the first and the second bounding-box regression unit in the (t-1)-th training epoch, where BB_1n is the n-th candidate frame of the search image after the first-stage optimization, BB_2m is the m-th candidate frame after the second-stage optimization, and BB_gt is the ground-truth bounding box of the target in the search image;
the mean of the optimization errors of the first bounding-box regression unit over training epochs 1 to t-1 also enters the loss.
CN202210394687.6A 2022-04-15 2022-04-15 Sample balance-based multi-level regression target tracking method and tracking system Active CN114757970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210394687.6A CN114757970B (en) 2022-04-15 2022-04-15 Sample balance-based multi-level regression target tracking method and tracking system


Publications (2)

Publication Number Publication Date
CN114757970A CN114757970A (en) 2022-07-15
CN114757970B true CN114757970B (en) 2024-03-08

Family

ID=82330152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210394687.6A Active CN114757970B (en) 2022-04-15 2022-04-15 Sample balance-based multi-level regression target tracking method and tracking system

Country Status (1)

Country Link
CN (1) CN114757970B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110533691A (en) * 2019-08-15 2019-12-03 合肥工业大学 Method for tracking target, equipment and storage medium based on multi-categorizer
WO2020051776A1 (en) * 2018-09-11 2020-03-19 Intel Corporation Method and system of deep supervision object detection for reducing resource usage
CN112215080A (en) * 2020-09-16 2021-01-12 电子科技大学 Target tracking method using time sequence information
CN112215079A (en) * 2020-09-16 2021-01-12 电子科技大学 Global multistage target tracking method
WO2021208502A1 (en) * 2020-04-16 2021-10-21 中国科学院深圳先进技术研究院 Remote-sensing image target detection method based on smooth bounding box regression function


Non-Patent Citations (2)

Title
A survey of tracking algorithms based on Siamese networks; Xiong Changzhen; Li Yan; Industrial Control Computer; 2020-03-25 (No. 03); full text *
Strongly coupled Siamese region proposal network target tracking algorithm based on joint optimization; Shi Guoqiang; Zhao Xia; Journal of Computer Applications; 2020-10-10 (No. 10); full text *


Similar Documents

Publication Publication Date Title
CN113220919B (en) Dam defect image text cross-modal retrieval method and model
CN112507996B (en) Face detection method of main sample attention mechanism
CN110033473B (en) Moving target tracking method based on template matching and depth classification network
CN110263666B (en) Action detection method based on asymmetric multi-stream
CN112132856B (en) Twin network tracking method based on self-adaptive template updating
CN112329760A (en) Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
CN112085765B (en) Video target tracking method combining particle filtering and metric learning
CN110728694A (en) Long-term visual target tracking method based on continuous learning
CN112446900B (en) Twin neural network target tracking method and system
CN116188528B (en) RGBT unmanned aerial vehicle target tracking method and system based on multi-stage attention mechanism
CN112215080A (en) Target tracking method using time sequence information
CN114757970B (en) Sample balance-based multi-level regression target tracking method and tracking system
CN114267060A (en) Face age identification method and system based on uncertain suppression network model
CN116935411A (en) Radical-level ancient character recognition method based on character decomposition and reconstruction
Xiang et al. Transformer-based person search model with symmetric online instance matching
CN110991565A (en) Target tracking optimization algorithm based on KCF
CN116168060A (en) Deep twin network target tracking algorithm combining element learning
CN113538507B (en) Single-target tracking method based on full convolution network online training
CN109165587A (en) intelligent image information extraction method
CN115359335A (en) Training method of visual target detection network model
CN113808170B (en) Anti-unmanned aerial vehicle tracking method based on deep learning
CN113901846B (en) Video guidance machine translation method based on space-time attention
CN113223507B (en) Abnormal speech recognition method based on double-input mutual interference convolutional neural network
CN110059584B (en) Event naming method combining boundary distribution and correction
CN112949671B (en) Signal classification method and system based on unsupervised feature optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant