CN112200870B - Single-target tracking method based on combination of classification and position loss of twin network


Info

Publication number
CN112200870B
Authority
CN
China
Prior art keywords
classification
loss
block
calculating
branch
Prior art date
Legal status
Active
Application number
CN202011188664.7A
Other languages
Chinese (zh)
Other versions
CN112200870A (en)
Inventor
鄢展锋
姚敏
Current Assignee
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN202011188664.7A priority Critical patent/CN112200870B/en
Publication of CN112200870A publication Critical patent/CN112200870A/en
Application granted granted Critical
Publication of CN112200870B publication Critical patent/CN112200870B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a single-target tracking method based on the combination of classification and position loss of a twin network, comprising the following steps: determine the sizes of the cropped template and search regions and use these regions as the model input; use a residual network as the backbone and take the convolution feature maps of the last three blocks; one branch takes the template and search-region feature maps corresponding to each block, computes the response map of each block to obtain a classification result, and linearly superposes the three classification errors; the other branch computes the position-regression loss from the convolution layer corresponding to the last block, expanding the convolution features of the template region along the channel direction and cross-correlating them to obtain the offsets of the center point, width and height between the target box and the ground-truth box; finally, the total loss of the two branches is computed. In this single-target tracking method based on the combination of classification and position loss of a twin network, features are extracted from the preprocessed picture by a modified residual network, so that the output sizes of the different blocks are consistent.

Description

Single-target tracking method based on combination of classification and position loss of twin network
Technical Field
The invention relates to the technical field of computer vision and digital image processing, and in particular to a single-target tracking method based on the combination of classification and position loss of a twin network.
Background
A twin network (Siamese network) is a supervised model for metric learning. Typically, a twin network has two inputs, which are fed into two neural networks sharing weights; a similarity loss is then computed on the two feature vectors from the last layer to measure how well the two inputs match.
The residual network (ResNet) is a deeper neural network that mitigates the degradation problem that appears as networks deepen. A residual network consists of a series of residual blocks, one of which can be expressed as:
x_{l+1} = x_l + F(x_l, W_l)
where x_l is the input feature, F(x_l, W_l) denotes a series of convolution operations applied to the input feature with weights W_l, and x_{l+1} is the output feature.
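As a minimal sketch, such a block can be written in PyTorch roughly as follows (the exact convolution stack inside F and the layer widths are illustrative assumptions, not taken from the invention):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block computing x_{l+1} = x_l + F(x_l, W_l)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x_l, W_l): a small stack of convolutions applied to the input feature
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # identity shortcut plus the residual branch
        return self.relu(x + self.f(x))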
Anchors are a set of preset bounding boxes that coarsely frame the target at its likely locations; predictions are then refined relative to these preset boxes. An anchor is defined by the aspect ratio and the scale of its box; together these form a set of box-generation rules that can produce a series of boxes at any position in the image. In general, anchors generate target boxes according to these rules, centered on the points of the feature map extracted by the convolutional neural network.
Three aspect ratios, typically 0.5, 1 and 2, combined with three scales, 8, 16 and 32, give nine boxes of different shapes and sizes. For example, assume a rectangular box of area s = 16 × 16 with width w and height h; then:
w × h = s, h / w = ratio
Simplifying gives:
w = √(s / ratio), h = √(s · ratio)
Multiplying by the scale factors then yields nine different rectangular boxes, as in the sketch below.
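A minimal sketch of this generation rule (assuming ratio = h/w, a convention the text does not spell out):

import numpy as np

def make_anchors(s=16 * 16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Return (w, h) pairs for the nine anchor shapes derived from area s."""
    anchors = []
    for r in ratios:
        w = np.sqrt(s / r)  # from w * h = s and h / w = r
        h = w * r           # equivalently h = sqrt(s * r)
        for k in scales:
            anchors.append((w * k, h * k))  # apply the scale factor
    return anchors

for w, h in make_anchors():
    print(f"w = {w:7.1f}, h = {h:7.1f}")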
how to find the target position more accurately and more quickly is a problem to be solved.
Disclosure of Invention
The invention aims to provide a single-target tracking method based on combination of classification and position loss of a twin network, so as to solve the problem of how to find a target position more accurately and more rapidly.
In order to solve the above technical problem, the technical scheme of the invention is as follows: a single-target tracking method based on the combination of classification and position loss of a twin network is provided, comprising the following steps:
step one, determining the sizes of the cropped template and search regions, and using these regions as the input of the model;
step two, using a residual network as the backbone network, and taking the convolution feature maps of the last three blocks;
step three, one branch takes the template and search-region feature maps corresponding to each block, computes the response map of each block to obtain a classification result, and linearly superposes the three classification errors; in the training stage, combining the classification errors of different blocks reduces the likelihood of mislabeling the target, and in the testing stage only the last block is used for classification;
step four, the other branch computes the position-regression loss from the convolution layer corresponding to the last block: the convolution features of the template region are expanded along the channel direction to four times the preset number of target boxes, and a cross-correlation operation then yields the offsets of the center point, width and height between the target box and the ground-truth box;
step five, calculating the total loss of the two branches.
Further, in step three, a cross-correlation operation is performed on the convolution feature maps of the extracted blocks and the total classification loss is computed: the template features of each block are up-scaled and cross-correlated with the search features to obtain the probability of each target box being foreground or background, and the classification losses of the different blocks are combined as a weighted sum:
L_cls = α_1 L_1 + α_2 L_2 + α_3 L_3
where L_i is the classification loss of the i-th block, computed with a two-class cross-entropy loss, and α_i is the weight of the corresponding classification loss.
Further, in step four, the error between the regressed target position and the ground truth is computed with the smooth L1 loss:
smooth_L1(x, σ) = 0.5(σx)^2, if |x| < 1/σ^2; and |x| - 1/(2σ^2), otherwise,
where x is the element-wise difference between the predicted box and the ground-truth box, and the parameter σ controls the smooth region, with σ = 3. The regression loss is
L_reg = Σ_i p_i* · R(t_i - t_i*)
where R is the smooth L1 function, and t_i and t_i* denote the offsets of the predicted anchor and of the ground-truth box, respectively. For each anchor, the L_reg term is multiplied by p_i*, which is 1 when an object is present and 0 when there is no object.
Further, in step five, the results of the two branches are linearly combined with fixed weights; the total loss of the two branches is calculated as L_total = α L_cls + γ L_reg,
where L_cls is the loss of the classification branch, α is its weight, L_reg is the loss of the regression branch, and γ is its weight.
According to the single-target tracking method based on the combination of classification and position loss of a twin network provided by the invention, features are extracted from the preprocessed picture by a modified residual network, so that the output sizes of the different blocks are consistent. In the training stage, one branch assists in locating the center of the target by linearly weighting the classification losses of the different blocks, while the other branch computes the position-regression loss to generate a target box of more suitable size. In the testing stage, only the classification and position-regression results of the last block are used, which improves the success rate and accuracy while also increasing the speed.
Drawings
The invention is further described below with reference to the accompanying drawings:
fig. 1 is a schematic flow chart of a single-target tracking method based on combination of classification and position loss of a twin network according to an embodiment of the present invention.
Detailed Description
The single-target tracking method based on the combination of classification and position loss of a twin network proposed by the invention is further described in detail below with reference to the accompanying drawings and specific embodiments. Advantages and features of the invention will become more apparent from the following description and claims. It is noted that the drawings are in a greatly simplified form and are not drawn to precise scale; they serve only to facilitate a convenient and clear description of the embodiments of the invention.
The core idea of the invention is that the single-target tracking method based on the combination of classification and position loss of a twin network extracts features from the preprocessed picture with a modified residual network, so that the output sizes of the different blocks are consistent. In the training stage, one branch assists in locating the center of the target by linearly weighting the classification losses of the different blocks, while the other branch computes the position-regression loss to generate a target box of more suitable size. In the testing stage, only the classification and position-regression results of the last block are used, which improves the success rate and accuracy while also increasing the speed.
According to the above technical scheme, the invention provides a single-target tracking method based on the combination of classification and position loss of a twin network; fig. 1 is a flow chart of the steps of this method. Referring to fig. 1, the method comprises the following steps:
s11, determining the size of the template and the search area after cutting, and taking the area as the input of a model;
s12, taking a residual error network as a main network, and taking convolution characteristic diagrams of the last three blocks;
s13, selecting a template corresponding to a block and a feature map of a search area by one branch, calculating a response map corresponding to each block to obtain a classification result, linearly superposing three classification errors, adjusting the possibility of marking a wrong target by combining the classification errors of different blocks in a training stage, and classifying by using the last block in a testing stage;
s14, calculating the loss of position regression of a convolution layer corresponding to the last block of the other branch, carrying out dimension lifting on the product characteristic corresponding to the acquired template region according to the channel direction, changing the dimension into the original four times the set target box number, and then carrying out cross-correlation operation to obtain the deviation of the center point and the width and height of the target box and the real frame;
s15, calculating the total loss of the two branches.
First, in S11, the size of the cropped template region is set to 127 × 127; if the template region extends beyond the boundary of the original image, the mean value of the image is used to pad the edges. The size of the cropped search region is set to 255 × 255. The template and the search region are taken as the two inputs and pass through backbone networks with identical parameters, as in the preprocessing sketch below.
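A minimal sketch of this preprocessing step (the helper name and the assumption that both crops are centered on the target position are illustrative; the text only fixes the sizes and the mean-value padding):

import numpy as np

def crop_with_mean_pad(image, cx, cy, size):
    """Crop a size x size patch centered at (cx, cy), padding with the
    per-channel image mean wherever the patch leaves the image."""
    h, w = image.shape[:2]
    patch = np.empty((size, size, image.shape[2]), dtype=image.dtype)
    patch[:] = image.mean(axis=(0, 1))        # mean-value padding
    x0, y0 = cx - size // 2, cy - size // 2   # top-left corner of the patch
    sx0, sy0 = max(x0, 0), max(y0, 0)         # intersection with the image
    sx1, sy1 = min(x0 + size, w), min(y0 + size, h)
    patch[sy0 - y0:sy1 - y0, sx0 - x0:sx1 - x0] = image[sy0:sy1, sx0:sx1]
    return patch

# template: 127 x 127 and search region: 255 x 255, both centered on the target
frame = np.random.randint(0, 256, (480, 640, 3)).astype(np.float32)
template = crop_with_mean_pad(frame, cx=600, cy=50, size=127)  # partly off-image
search = crop_with_mean_pad(frame, cx=600, cy=50, size=255)
print(template.shape, search.shape)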
In S12, the original residual network is modified so that the convolution feature maps of the last three blocks have the same spatial size: the strides of the last three blocks are removed, and dilated convolutions are added to enlarge the receptive field. In the embodiment of the invention, the cropped regions are used as the input of the model; the template-region feature maps corresponding to the different blocks have sizes 15x15x512, 15x15x1024 and 15x15x2048, and the search-region feature maps have sizes 31x31x512, 31x31x1024 and 31x31x2048.
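A sketch of one way to obtain three equally sized feature maps with torchvision's ResNet (the patent does not name torchvision, and the exact output sizes depend on padding choices, so the 31x31 figures above may differ slightly from this sketch):

import torch
import torchvision.models as models

# Replace the strides of the later stages with dilated convolutions so that
# layer2, layer3 and layer4 (the last three blocks) share one spatial size.
backbone = models.resnet50(weights=None,
                           replace_stride_with_dilation=[False, True, True])

x = torch.randn(1, 3, 255, 255)               # a cropped search region
y = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
y = backbone.layer1(y)
feats = []
for stage in (backbone.layer2, backbone.layer3, backbone.layer4):
    y = stage(y)
    feats.append(y)                           # 512-, 1024- and 2048-channel maps
print([tuple(f.shape) for f in feats])        # equal spatial sizes across blocks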
The template features and the search-region features of each block are cross-correlated; the specific formula is expressed as:
f_i(z, x) = φ_i(z) ⋆ φ_i(x) + b·1
where φ_i(z) and φ_i(x) denote the feature maps of the i-th block obtained by applying the same convolution operations to the template region z and the search region x, ⋆ denotes the cross-correlation operation that produces the response map, and b·1 denotes a bias term taking the value b at every position.
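A minimal sketch of this operation for a single sample (the batch handling and the SiamFC-style use of the template feature as a correlation kernel are assumptions about the intended computation):

import torch
import torch.nn.functional as F

def xcorr(template_feat, search_feat, bias=0.0):
    """Slide the template feature map over the search feature map.

    template_feat: (1, C, Hz, Wz); search_feat: (1, C, Hx, Wx).
    F.conv2d computes exactly this cross-correlation (no kernel flipping),
    yielding a (1, 1, Hx - Hz + 1, Wx - Wz + 1) response map."""
    return F.conv2d(search_feat, template_feat) + bias

z = torch.randn(1, 512, 15, 15)   # template feature of one block
x = torch.randn(1, 512, 31, 31)   # search-region feature of the same block
print(xcorr(z, x).shape)          # torch.Size([1, 1, 17, 17])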
The classification results of the blocks are linearly weighted; the specific formula is expressed as:
L_cls = α_1 L_1 + α_2 L_2 + α_3 L_3
where L_i is the classification loss of the i-th block and α_i is the weight of the corresponding classification loss.
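A sketch of this weighted sum with a cross-entropy loss per block (the weight values α_i are illustrative assumptions; the text does not give them):

import torch
import torch.nn.functional as F

def weighted_cls_loss(logits_per_block, labels, alphas=(0.2, 0.3, 0.5)):
    """logits_per_block: three (N, 2) background/foreground logit tensors,
    one per block; labels: (N,) with 0 = background, 1 = foreground."""
    losses = [F.cross_entropy(logits, labels) for logits in logits_per_block]
    return sum(a * loss for a, loss in zip(alphas, losses))

logits = [torch.randn(8, 2) for _ in range(3)]   # one logit tensor per block
labels = torch.randint(0, 2, (8,))
print(weighted_cls_loss(logits, labels))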
The error between the regressed target position and the ground truth is computed as:
smooth_L1(x, σ) = 0.5(σx)^2, if |x| < 1/σ^2; and |x| - 1/(2σ^2), otherwise,
where x is the element-wise difference between the predicted box and the ground-truth box, and the parameter σ controls the smooth region, with σ = 3. The regression loss is
L_reg = Σ_i p_i* · R(t_i - t_i*)
where R is the smooth L1 function, and t_i and t_i* denote the offsets of the predicted anchor and of the ground-truth box, respectively. For each anchor, the L_reg term is multiplied by p_i*, which is 1 when an object is present (positive) and 0 when there is no object (negative); this means that only foreground anchors contribute to the loss, and background anchors do not.
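A sketch of this loss with σ = 3 (the (dx, dy, dw, dh) offset layout and the normalization by the number of foreground anchors are assumptions; the text only states that the smooth L1 term is multiplied by p*):

import torch

def smooth_l1(x, sigma=3.0):
    """0.5 * (sigma * x)^2 where |x| < 1 / sigma^2, else |x| - 1 / (2 * sigma^2)."""
    beta = 1.0 / sigma ** 2
    return torch.where(x.abs() < beta,
                       0.5 * (sigma * x) ** 2,
                       x.abs() - 0.5 * beta)

def reg_loss(t_pred, t_true, p_star, sigma=3.0):
    """t_pred, t_true: (N, 4) anchor offsets; p_star: (N,) with 1 for anchors
    covering an object and 0 otherwise, so only the foreground contributes."""
    per_anchor = smooth_l1(t_pred - t_true, sigma).sum(dim=1)
    return (p_star * per_anchor).sum() / p_star.sum().clamp(min=1)

t_pred, t_true = torch.randn(8, 4), torch.randn(8, 4)
p_star = torch.tensor([1., 0., 1., 1., 0., 0., 1., 0.])
print(reg_loss(t_pred, t_true, p_star))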
In step five, the results of the two branches are linearly combined with fixed weights; the total loss of the two branches is calculated as L_total = α L_cls + γ L_reg, where L_cls is the loss of the classification branch, α is its weight, L_reg is the loss of the regression branch, and γ is its weight.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (2)

1. A single-target tracking method based on the combination of classification and position loss of a twin network, characterized by comprising the following steps:
step one, determining the sizes of the cropped template and search regions, and using these regions as the input of the model;
step two, using a residual network as the backbone network, and taking the convolution feature maps of the last three blocks;
step three, one branch takes the template and search-region feature maps corresponding to each block, computes the response map of each block to obtain a classification result, and linearly superposes the three classification errors; in the training stage, combining the classification errors of different blocks reduces the likelihood of mislabeling the target, and in the testing stage only the last block is used for classification; a cross-correlation operation is performed on the convolution feature maps of the extracted blocks and the total classification loss is computed: the template features of each block are up-scaled and cross-correlated to obtain the probability of each target box being foreground or background, and the classification losses of the different blocks are combined as a weighted sum:
L_cls = α_1 L_1 + α_2 L_2 + α_3 L_3
where L_i is the classification loss of the i-th block, computed with a two-class cross-entropy loss, and α_i is the weight of the corresponding classification loss;
step four, the other branch computes the position-regression loss from the convolution layer corresponding to the last block: the convolution features of the template region are expanded along the channel direction to four times the preset number of target boxes, and a cross-correlation operation then yields the offsets of the center point, width and height between the target box and the ground-truth box; the error between the regressed position and the ground truth is computed as:
smooth_L1(x, σ) = 0.5(σx)^2, if |x| < 1/σ^2; and |x| - 1/(2σ^2), otherwise,
where x is the element-wise difference between the predicted box and the ground-truth box, and the parameter σ controls the smooth region, with σ = 3; the regression loss is
L_reg = Σ_i p_i* · R(t_i - t_i*)
where R is the smooth L1 function, and t_i and t_i* denote the offsets of the predicted anchor and of the ground-truth box, respectively; for each anchor, the L_reg term is multiplied by p_i*, where p_i* is 1 when an object is present and 0 when there is no object;
step five, calculating the total loss of the two branches.
2. The single-target tracking method based on the combination of classification and position loss of a twin network according to claim 1, wherein in step five the results of the two branches are linearly combined with fixed weights, and the total loss of the two branches is calculated as L_total = α L_cls + γ L_reg, where L_cls is the loss of the classification branch, α is its weight, L_reg is the loss of the regression branch, and γ is its weight.
CN202011188664.7A 2020-10-30 2020-10-30 Single-target tracking method based on combination of classification and position loss of twin network Active CN112200870B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011188664.7A CN112200870B (en) 2020-10-30 2020-10-30 Single-target tracking method based on combination of classification and position loss of twin network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011188664.7A CN112200870B (en) 2020-10-30 2020-10-30 Single-target tracking method based on combination of classification and position loss of twin network

Publications (2)

Publication Number Publication Date
CN112200870A CN112200870A (en) 2021-01-08
CN112200870B true CN112200870B (en) 2024-03-12

Family

ID=74012155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011188664.7A Active CN112200870B (en) 2020-10-30 2020-10-30 Single-target tracking method based on combination of classification and position loss of twin network

Country Status (1)

Country Link
CN (1) CN112200870B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052873B (en) * 2021-03-16 2022-09-09 南京理工大学 Single-target tracking method for on-line self-supervision learning scene adaptation
CN113129341B (en) 2021-04-20 2021-12-14 广东工业大学 Landing tracking control method and system based on light-weight twin network and unmanned aerial vehicle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179314A (en) * 2019-12-30 2020-05-19 北京工业大学 Target tracking method based on residual dense twin network
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
WO2020173036A1 (en) * 2019-02-26 2020-09-03 博众精工科技股份有限公司 Localization method and system based on deep learning
CN111640136A (en) * 2020-05-23 2020-09-08 西北工业大学 Depth target tracking method in complex environment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020173036A1 (en) * 2019-02-26 2020-09-03 博众精工科技股份有限公司 Localization method and system based on deep learning
CN111179307A (en) * 2019-12-16 2020-05-19 浙江工业大学 Visual target tracking method for full-volume integral and regression twin network structure
CN111179314A (en) * 2019-12-30 2020-05-19 北京工业大学 Target tracking method based on residual dense twin network
CN111640136A (en) * 2020-05-23 2020-09-08 西北工业大学 Depth target tracking method in complex environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杨康; 宋慧慧; 张开华. Real-time visual tracking based on a dual-attention Siamese network. Journal of Computer Applications, 2019, (06), full text. *
石国强; 赵霞. Target tracking algorithm with a strongly coupled Siamese region proposal network based on joint optimization. Journal of Computer Applications, 2020, (10), full text. *

Also Published As

Publication number Publication date
CN112200870A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
US11208985B2 (en) Correction method and apparatus for predicted wind speed of wind farm
CN112200870B (en) Single-target tracking method based on combination of classification and position loss of twin network
CN111126472A (en) Improved target detection method based on SSD
CN112016507A (en) Super-resolution-based vehicle detection method, device, equipment and storage medium
CN113128355A (en) Unmanned aerial vehicle image real-time target detection method based on channel pruning
CN109492596B (en) Pedestrian detection method and system based on K-means clustering and regional recommendation network
CN109583483A (en) A kind of object detection method and system based on convolutional neural networks
CN112287832A (en) High-resolution remote sensing image-based urban illegal building detection method
CN113408423A (en) Aquatic product target real-time detection method suitable for TX2 embedded platform
CN111798447B (en) Deep learning plasticized material defect detection method based on fast RCNN
CN111553348A (en) Anchor-based target detection method based on centernet
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN112070037B (en) Road extraction method, device, medium and equipment based on remote sensing image
CN111862122A (en) Corrugated board stacking layer number counting method based on deep learning
CN112365511A (en) Point cloud segmentation method based on overlapped region retrieval and alignment
CN114972759A (en) Remote sensing image semantic segmentation method based on hierarchical contour cost function
CN114782714A (en) Image matching method and device based on context information fusion
CN114140485A (en) Method and system for generating cutting track of main root of panax notoginseng
CN117253188A (en) Transformer substation grounding wire state target detection method based on improved YOLOv5
CN114419078B (en) Surface defect region segmentation method and device based on convolutional neural network
CN115984559A (en) Intelligent sample selection method and related device
CN113989267B (en) Battery defect detection method based on lightweight neural network
CN115995020A (en) Small target detection algorithm based on full convolution
CN111860332B (en) Dual-channel electrokinetic diagram part detection method based on multi-threshold cascade detector
CN112686310B (en) Anchor frame-based prior frame design method in target detection algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant