CN112200870B - Single-target tracking method based on combination of classification and position loss of twin network - Google Patents
Single-target tracking method based on combination of classification and position loss of a twin network
- Publication number: CN112200870B (application CN202011188664.7A)
- Authority
- CN
- China
- Prior art keywords
- classification
- loss
- block
- calculating
- branch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
Abstract
The invention provides a single-target tracking method based on the combination of classification and position loss in a twin (Siamese) network, comprising the following steps: determining the sizes of the cropped template and search regions, and taking these regions as the input of the model; using a residual network as the backbone and taking the convolutional feature maps of its last three blocks; in one branch, selecting the template and search-region feature maps corresponding to each block, computing the response map of each block to obtain a classification result, and linearly superposing the three classification errors; in the other branch, computing the position-regression loss from the convolutional layer corresponding to the last block, raising the dimension of the template-region convolutional features along the channel direction, and performing a cross-correlation operation to obtain the deviations in centre point, width and height between the target boxes and the ground-truth box; and computing the total loss of the two branches. In this single-target tracking method based on the combination of classification and position loss in a twin network, features are extracted from the pre-processed image by a modified residual network, so that the output sizes of the different blocks are consistent.
Description
Technical Field
The invention relates to the technical field of computer vision and digital image processing, and in particular to a single-target tracking method based on the combination of classification and position loss in a twin network.
Background
Twin networks (Siamese networks) are a supervised model for metric learning. Typically, a twin network takes two inputs, which are fed into two neural networks sharing weights; a similarity function is then applied to the two feature vectors of the last layer to measure how well the two inputs match.
The residual network (ResNet) is a deeper neural network that mitigates the degradation problem which appears as networks deepen. The residual network consists of a series of residual blocks (Residual Blocks), one of which can be expressed as:

x_{l+1} = x_l + F(x_l, W_l)

where x_l is the input feature, F(x_l, W_l) denotes several convolution operations applied to the input feature with weights W_l, and x_{l+1} is the output feature.
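The residual mapping above can be sketched in a few lines of NumPy. This is an illustrative toy only: the linear-map-plus-ReLU stand-in for F is an assumption for demonstration, not the convolutions used by the method.

```python
import numpy as np

def residual_block(x, weights, f):
    """One residual block: x_{l+1} = x_l + F(x_l, W_l).

    `f` stands in for the stacked convolutions F; any function that
    maps x to a tensor of the same shape works for this sketch."""
    return x + f(x, weights)

# Toy example: F is a linear map followed by ReLU.
x = np.array([1.0, -2.0, 3.0])
W = np.eye(3) * 0.5
out = residual_block(x, W, lambda x, W: np.maximum(W @ x, 0.0))
```

Because the input is added back to F's output, the block can fall back to the identity mapping when F contributes nothing, which is what counters degradation in deep stacks.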
Anchors are a set of preset boxes that frame the target at its approximate possible locations; predictions are then adjusted relative to these preset boxes. An anchor is defined by the aspect ratio (ratio) of its box and the scale of its box, which together correspond to a rule for generating a series of boxes at any position of the image. In general, anchors generate target boxes centred on the points of the feature map extracted by a convolutional neural network.

Typically three aspect ratios, 0.5, 1 and 2, are combined with three scales, 8, 16 and 32, yielding nine boxes of different shapes and sizes. For example, assume a rectangular box of area S = 16 × 16 with width w and height h, and let the aspect ratio be r = h / w. Then:

w · h = S,  h / w = r

Simplifying and solving gives:

w = sqrt(S / r),  h = sqrt(S · r)

Multiplying each (w, h) pair by the three scale factors yields the nine different rectangular boxes.
how to find the target position more accurately and more quickly is a problem to be solved.
Disclosure of Invention
The invention aims to provide a single-target tracking method based on the combination of classification and position loss in a twin network, so as to solve the problem of locating the target position more accurately and more quickly.
To solve this technical problem, the technical scheme of the invention is as follows: a single-target tracking method based on the combination of classification and position loss in a twin network is provided, comprising the following steps:
step one, determining the sizes of the cropped template and search regions, and taking these regions as the input of the model;
step two, using a residual network as the backbone network, and taking the convolutional feature maps of its last three blocks;
step three, in one branch, selecting the template and search-region feature maps corresponding to each block, calculating the response map of each block to obtain a classification result, and linearly superposing the three classification errors; in the training stage, the classification errors of the different blocks are combined to reduce the possibility of marking a wrong target, and in the testing stage only the last block is used for classification;
step four, in the other branch, calculating the position-regression loss from the convolutional layer corresponding to the last block: raising the dimension of the convolutional features of the template region along the channel direction to four times the preset number of target boxes, and then performing a cross-correlation operation to obtain the deviations in centre point, width and height between the target boxes and the ground-truth box;
step five, calculating the total loss of the two branches.
Further, in step three, a cross-correlation operation is performed on the convolutional feature maps extracted from each block, and the classification loss sum is calculated: the template features obtained from the same block are raised in dimension and cross-correlated to obtain the probability of each target box being classified as foreground or background, and the weighted sum of the classification losses of the different blocks is calculated by the formula:

L_cls = α_1 L_1 + α_2 L_2 + α_3 L_3

where L_i denotes the classification loss of the i-th block, a cross-entropy classification loss function, and α_i denotes the weight of the corresponding classification loss.

Further, in step four, the error between the target-position regression and the ground truth is calculated as:

smooth_L1(x, σ) = 0.5 (σx)^2, if |x| < 1/σ^2
smooth_L1(x, σ) = |x| − 1/(2σ^2), otherwise

where x denotes the element-wise difference between the predicted box and the ground-truth box, and the parameter σ, which controls the smooth region, is set to 3. The regression loss is

L_reg = Σ_i p_i* · R(t_i − t_i*)

where R is the smooth-L1 function, and t_i and t_i* denote the offsets of the predicted anchor and of the ground-truth box, respectively. For each anchor, the L_reg term is multiplied by p*, where p* is 1 when an object is present and 0 when no object is present.
Further, in step five, the results of the two branches are combined linearly with fixed weights; the total loss of the two branches is calculated as

L_total = α L_cls + γ L_reg

where L_cls is the loss of the classification branch, α is the weight of the classification branch, L_reg is the loss of the regression branch, and γ is the weight of the regression branch.
The single-target tracking method based on the combination of classification and position loss in a twin network provided by the invention extracts features from the pre-processed image through a modified residual network, so that the output sizes of the different blocks are consistent. In the training stage, one branch assists in locating the centre of the target by linearly weighting the classification losses of the different blocks, while the other branch computes the position-regression loss to generate a target box of more suitable size. In the testing stage, only the classification and position-regression results of the last block are used, which improves the success rate and precision while also increasing the speed.
Drawings
The invention is further described below with reference to the accompanying drawings:
fig. 1 is a schematic flow chart of a single-target tracking method based on combination of classification and position loss of a twin network according to an embodiment of the present invention.
Detailed Description
The single-target tracking method based on the combination of classification and position loss in a twin network proposed by the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Advantages and features of the invention will become more apparent from the following description and from the claims. It is noted that the drawings are in greatly simplified form and are not to scale; they are intended only to facilitate a convenient and clear description of the embodiments of the invention.
The core idea of the invention is that the single-target tracking method based on the combination of classification and position loss in a twin network extracts features from the pre-processed image through a modified residual network, so that the output sizes of the different blocks are consistent. In the training stage, one branch assists in locating the centre of the target by linearly weighting the classification losses of the different blocks, while the other branch computes the position-regression loss to generate a target box of more suitable size. In the testing stage, only the classification and position-regression results of the last block are used, which improves the success rate and precision while also increasing the speed.
According to the technical scheme, the invention provides a single-target tracking method based on combination of classification and position loss of a twin network, and fig. 1 is a flow chart of steps of the single-target tracking method based on combination of classification and position loss of the twin network. Referring to fig. 1, a single target tracking method based on a combination of classification and position loss of a twin network is provided, comprising the steps of:
s11, determining the size of the template and the search area after cutting, and taking the area as the input of a model;
s12, taking a residual error network as a main network, and taking convolution characteristic diagrams of the last three blocks;
s13, selecting a template corresponding to a block and a feature map of a search area by one branch, calculating a response map corresponding to each block to obtain a classification result, linearly superposing three classification errors, adjusting the possibility of marking a wrong target by combining the classification errors of different blocks in a training stage, and classifying by using the last block in a testing stage;
s14, calculating the loss of position regression of a convolution layer corresponding to the last block of the other branch, carrying out dimension lifting on the product characteristic corresponding to the acquired template region according to the channel direction, changing the dimension into the original four times the set target box number, and then carrying out cross-correlation operation to obtain the deviation of the center point and the width and height of the target box and the real frame;
s15, calculating the total loss of the two branches.
First, in S11, the size of the cropped template region is determined to be 127 × 127; if the template region exceeds the boundary of the original image, the mean value of the image is used to fill the edges. The size of the cropped search region is determined to be 255 × 255. The template and the search region are taken as the two inputs and are passed through backbone networks with identical parameters.
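The crop-with-mean-padding step can be sketched as below. This is a minimal single-channel sketch; the centre coordinates and the toy 4 × 4 image are illustrative only, and a real implementation would also handle multi-channel images and sub-pixel centres.

```python
import numpy as np

def crop_with_mean_pad(image, cx, cy, size):
    """Crop a size x size window centred at (cx, cy); regions that
    fall outside the image are filled with the per-image mean, as
    described for the 127 (template) and 255 (search) crops."""
    h, w = image.shape[:2]
    pad = size  # pad generously on every side, then slice
    padded = np.full((h + 2 * pad, w + 2 * pad), image.mean(), dtype=image.dtype)
    padded[pad:pad + h, pad:pad + w] = image
    x0 = cx - size // 2 + pad
    y0 = cy - size // 2 + pad
    return padded[y0:y0 + size, x0:x0 + size]

img = np.arange(16.0).reshape(4, 4)           # toy "image", mean = 7.5
patch = crop_with_mean_pad(img, cx=0, cy=0, size=3)  # crosses the top-left corner
```

Rows and columns that fall outside the image come back as the image mean (7.5 here), while in-image pixels are preserved.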
In S12, the original residual network is modified so that the convolutional feature maps of the last three blocks have equal spatial size: the strides of the last three blocks are removed, and dilated convolutions are added to enlarge the receptive field. In the embodiment of the invention, with the cropped regions as the input of the model, the template-region feature maps corresponding to the different blocks have sizes 15×15×512, 15×15×1024 and 15×15×2048, and the search-region feature maps have sizes 31×31×512, 31×31×1024 and 31×31×2048.
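Why removing the stride and adding dilation keeps the last three blocks at 15 × 15 (template) and 31 × 31 (search) can be checked with the standard convolution output-size formula. The helper below is a sketch for verification, not part of the patented method; the kernel/padding combinations shown are illustrative assumptions.

```python
def conv_out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution:
    floor((n + 2*padding - dilation*(kernel-1) - 1) / stride) + 1."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# A 3x3 conv with stride 1, dilation 2 and padding 2 preserves 15x15,
# whereas the same kernel with stride 2 would halve a 31x31 map.
keep = conv_out_size(15, 3, stride=1, padding=2, dilation=2)
halve = conv_out_size(31, 3, stride=2, padding=1)
```

With stride 1 and padding matched to the dilated kernel's effective extent, the spatial size is unchanged, which is exactly what makes the three blocks' outputs comparable.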
The template features and search-region features of each block are cross-correlated; the specific formula is:

f_i(z, x) = φ_i(z) ⋆ φ_i(x) + b·1

where φ_i(z) and φ_i(x) denote the feature maps of the i-th block obtained by applying the same convolution operations to the template region z and the search region x, ⋆ denotes the cross-correlation operation that produces the response map, and b·1 denotes a bias added at every position of the response map.
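The cross-correlation above — sliding the template feature map over the search feature map and taking an inner product at every offset — can be sketched naively in NumPy. Real trackers implement this as a batched convolution on the GPU; the toy feature maps below are illustrative.

```python
import numpy as np

def cross_correlate(template, search, bias=0.0):
    """Slide the template feature map over the search feature map and
    take the inner product at each offset, producing the response map
    f(z, x) = phi(z) * phi(x) + b."""
    th, tw, _ = template.shape
    sh, sw, _ = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[i:i + th, j:j + tw]) + bias
    return out

rng = np.random.default_rng(0)
z = rng.standard_normal((2, 2, 3))  # toy template features
x = np.zeros((4, 4, 3))
x[1:3, 1:3] = z                     # embed the template inside the search map
resp = cross_correlate(z, x)
```

The response peaks exactly where the template matches the search region, which is how the classification branch scores candidate positions.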
The classification results of the blocks are weighted linearly; the specific formula is:

L_cls = α_1 L_1 + α_2 L_2 + α_3 L_3

where L_i denotes the classification loss of the i-th block and α_i denotes the weight of the corresponding classification loss.
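A sketch of this weighted classification loss. The weights α_i below are illustrative assumptions, since their values are left open; the toy foreground probabilities are likewise made up.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Binary cross entropy over foreground probabilities."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

def weighted_cls_loss(block_probs, labels, weights=(0.2, 0.3, 0.5)):
    """L_cls = a1*L1 + a2*L2 + a3*L3 over the last three blocks;
    the weights here are illustrative placeholders."""
    return sum(a * cross_entropy(p, labels) for a, p in zip(weights, block_probs))

labels = np.array([1.0, 0.0, 1.0])
probs_per_block = [np.array([0.9, 0.1, 0.8])] * 3  # same toy scores from each block
loss = weighted_cls_loss(probs_per_block, labels)
```

When the three blocks agree and the weights sum to 1, the weighted loss collapses to the single-block cross entropy; disagreement between blocks shifts the loss toward the heavier-weighted blocks.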
The error between the target-position regression and the ground truth is calculated as:

smooth_L1(x, σ) = 0.5 (σx)^2, if |x| < 1/σ^2
smooth_L1(x, σ) = |x| − 1/(2σ^2), otherwise

where x denotes the element-wise difference between the predicted box and the ground-truth box, and the parameter σ, which controls the smooth region, is set to 3. The regression loss is

L_reg = Σ_i p_i* · R(t_i − t_i*)

where R is the smooth-L1 function, and t_i and t_i* denote the offsets of the predicted anchor and of the ground-truth box, respectively. For each anchor, the L_reg term is multiplied by p*, which is 1 when an object is present (positive) and 0 when no object is present (negative); that is, only foreground anchors contribute to the loss, and background anchors do not.
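The masked smooth-L1 regression loss with σ = 3 can be sketched as follows; the toy offsets and the p* mask below are illustrative.

```python
import numpy as np

def smooth_l1(x, sigma=3.0):
    """smooth_L1(x) = 0.5*(sigma*x)^2 if |x| < 1/sigma^2,
    else |x| - 1/(2*sigma^2); the two branches meet continuously."""
    x = np.abs(x)
    return np.where(x < 1.0 / sigma ** 2,
                    0.5 * (sigma * x) ** 2,
                    x - 0.5 / sigma ** 2)

def reg_loss(pred_offsets, true_offsets, p_star, sigma=3.0):
    """Sum smooth-L1 over the four offsets (dx, dy, dw, dh) per anchor,
    masked by p* so only foreground anchors contribute."""
    per_anchor = smooth_l1(pred_offsets - true_offsets, sigma).sum(axis=1)
    return np.sum(p_star * per_anchor)

pred = np.array([[0.05, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
true = np.zeros((2, 4))
p_star = np.array([1.0, 0.0])  # second anchor is background and is ignored
loss = reg_loss(pred, true, p_star)
```

Small residuals fall in the quadratic branch (here 0.5·(3·0.05)² = 0.01125), while large residuals grow only linearly, and the background anchor contributes nothing because its p* is 0.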
In step five, the results of the two branches are combined linearly with fixed weights; the total loss of the two branches is calculated as L_total = α L_cls + γ L_reg, where L_cls is the loss of the classification branch, α is the weight of the classification branch, L_reg is the loss of the regression branch, and γ is the weight of the regression branch.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (2)
1. The single target tracking method based on the combination of classification and position loss of the twin network is characterized by comprising the following steps:
step one, determining the sizes of the cropped template and search regions, and taking these regions as the input of the model;
step two, using a residual network as the backbone network, and taking the convolutional feature maps of its last three blocks;
step three, in one branch, selecting the template and search-region feature maps corresponding to each block, calculating the response map of each block to obtain a classification result, and linearly superposing the three classification errors; in the training stage, the classification errors of the different blocks are combined to reduce the possibility of marking a wrong target, and in the testing stage only the last block is used for classification; a cross-correlation operation is performed on the convolutional feature maps extracted from each block, and the classification loss sum is calculated: the template features obtained from the same block are raised in dimension and cross-correlated to obtain the probability of each target box being classified as foreground or background, and the weighted sum of the classification losses of the different blocks is calculated by the formula:
L_cls = α_1 L_1 + α_2 L_2 + α_3 L_3
where L_i denotes the classification loss of the i-th block, a cross-entropy classification loss function, and α_i denotes the weight of the corresponding classification loss;
step four, in the other branch, calculating the position-regression loss from the convolutional layer corresponding to the last block: raising the dimension of the convolutional features of the template region along the channel direction to four times the preset number of target boxes, and then performing a cross-correlation operation to obtain the deviations in centre point, width and height between the target boxes and the ground-truth box; the error between the target-position regression and the ground truth is calculated as
smooth_L1(x, σ) = 0.5 (σx)^2 if |x| < 1/σ^2, and |x| − 1/(2σ^2) otherwise
where x denotes the element-wise difference between the predicted box and the ground-truth box, and the parameter σ, which controls the smooth region, is set to 3; the regression loss is
L_reg = Σ_i p_i* · R(t_i − t_i*)
where R is the smooth-L1 function, and t_i and t_i* denote the offsets of the predicted anchor and of the ground-truth box, respectively; for each anchor, the L_reg term is multiplied by p*, where p* is 1 when an object is present and 0 when no object is present;
and step five, calculating the total loss of the two branches.
2. The single-target tracking method based on the combination of classification and position loss of a twin network according to claim 1, wherein in step five the results of the two branches are combined linearly with fixed weights, the total loss of the two branches being calculated as L_total = α L_cls + γ L_reg, where L_cls is the loss of the classification branch, α is the weight of the classification branch, L_reg is the loss of the regression branch, and γ is the weight of the regression branch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011188664.7A CN112200870B (en) | 2020-10-30 | 2020-10-30 | Single-target tracking method based on combination of classification and position loss of twin network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112200870A CN112200870A (en) | 2021-01-08 |
CN112200870B true CN112200870B (en) | 2024-03-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||