CN112200870B - Single-target tracking method based on combination of classification and position loss of twin network - Google Patents
Single-target tracking method based on combination of classification and position loss of a twin network
- Publication number: CN112200870B (application CN202011188664.7A)
- Authority
- CN
- China
- Prior art keywords
- classification
- loss
- block
- calculating
- branch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/12—Edge-based segmentation
Abstract
The invention provides a single-target tracking method based on the combination of classification and position loss in a twin (Siamese) network, comprising the following steps: determining the sizes of the cropped template and search regions, and taking these regions as the input of the model; using a residual network as the backbone and taking the convolutional feature maps of its last three blocks; in one branch, selecting the template and search-region feature maps corresponding to each block, computing the response map of each block to obtain a classification result, and linearly superposing the three classification errors; in the other branch, computing the position-regression loss from the convolutional layer corresponding to the last block, raising the dimension of the template-region convolutional features along the channel direction, and performing a cross-correlation operation to obtain the deviations in centre point, width and height between the target boxes and the ground-truth box; and computing the total loss of the two branches. In this single-target tracking method based on the combination of classification and position loss in a twin network, features are extracted from the pre-processed image by a modified residual network, so that the output sizes of the different blocks are consistent.
Description
Technical Field
The invention relates to the technical field of computer vision and digital image processing, and in particular to a single-target tracking method based on the combination of classification and position loss in a twin network.
Background
Twin networks (Siamese networks) are a supervised model for metric learning. Typically, a twin network takes two inputs, which are fed into two neural networks sharing weights; a similarity function is then applied to the two feature vectors of the last layer to measure how well the two inputs match.
The residual network (ResNet) is a deeper neural network that mitigates the degradation problem which appears as networks deepen. The residual network consists of a series of residual blocks (Residual Blocks), one of which can be expressed as:

x_{l+1} = x_l + F(x_l, W_l)

where x_l is the input feature, F(x_l, W_l) denotes several convolution operations applied to the input feature with weights W_l, and x_{l+1} is the output feature.
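The residual mapping above can be sketched in a few lines of NumPy. This is an illustrative toy only: the linear-map-plus-ReLU stand-in for F is an assumption for demonstration, not the convolutions used by the method.

```python
import numpy as np

def residual_block(x, weights, f):
    """One residual block: x_{l+1} = x_l + F(x_l, W_l).

    `f` stands in for the stacked convolutions F; any function that
    maps x to a tensor of the same shape works for this sketch."""
    return x + f(x, weights)

# Toy example: F is a linear map followed by ReLU.
x = np.array([1.0, -2.0, 3.0])
W = np.eye(3) * 0.5
out = residual_block(x, W, lambda x, W: np.maximum(W @ x, 0.0))
```

Because the input is added back to F's output, the block can fall back to the identity mapping when F contributes nothing, which is what counters degradation in deep stacks.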
Anchors are a set of preset boxes that frame the target at its approximate possible locations; predictions are then adjusted relative to these preset boxes. An anchor is defined by the aspect ratio (ratio) of its box and the scale of its box, which together correspond to a rule for generating a series of boxes at any position of the image. In general, anchors generate target boxes centred on the points of the feature map extracted by a convolutional neural network.

Typically three aspect ratios, 0.5, 1 and 2, are combined with three scales, 8, 16 and 32, yielding nine boxes of different shapes and sizes. For example, assume a rectangular box of area S = 16 × 16 with width w and height h, and let the aspect ratio be r = h / w. Then:

w · h = S,  h / w = r

Simplifying and solving gives:

w = sqrt(S / r),  h = sqrt(S · r)

Multiplying each (w, h) pair by the three scale factors yields the nine different rectangular boxes.
how to find the target position more accurately and more quickly is a problem to be solved.
Disclosure of Invention
The invention aims to provide a single-target tracking method based on the combination of classification and position loss in a twin network, so as to solve the problem of locating the target position more accurately and more quickly.
To solve this technical problem, the technical scheme of the invention is as follows: a single-target tracking method based on the combination of classification and position loss in a twin network is provided, comprising the following steps:
step one, determining the sizes of the cropped template and search regions, and taking these regions as the input of the model;
step two, using a residual network as the backbone network, and taking the convolutional feature maps of its last three blocks;
step three, in one branch, selecting the template and search-region feature maps corresponding to each block, calculating the response map of each block to obtain a classification result, and linearly superposing the three classification errors; in the training stage, the classification errors of the different blocks are combined to reduce the possibility of marking a wrong target, and in the testing stage only the last block is used for classification;
step four, in the other branch, calculating the position-regression loss from the convolutional layer corresponding to the last block: raising the dimension of the convolutional features of the template region along the channel direction to four times the preset number of target boxes, and then performing a cross-correlation operation to obtain the deviations in centre point, width and height between the target boxes and the ground-truth box;
step five, calculating the total loss of the two branches.
Further, in step three, a cross-correlation operation is performed on the convolutional feature maps extracted from each block, and the classification loss sum is calculated: the template features obtained from the same block are raised in dimension and cross-correlated to obtain the probability of each target box being classified as foreground or background, and the weighted sum of the classification losses of the different blocks is calculated by the formula:

L_cls = α_1 L_1 + α_2 L_2 + α_3 L_3

where L_i denotes the classification loss of the i-th block, a cross-entropy classification loss function, and α_i denotes the weight of the corresponding classification loss.

Further, in step four, the error between the target-position regression and the ground truth is calculated as:

smooth_L1(x, σ) = 0.5 (σx)^2, if |x| < 1/σ^2
smooth_L1(x, σ) = |x| − 1/(2σ^2), otherwise

where x denotes the element-wise difference between the predicted box and the ground-truth box, and the parameter σ, which controls the smooth region, is set to 3. The regression loss is

L_reg = Σ_i p_i* · R(t_i − t_i*)

where R is the smooth-L1 function, and t_i and t_i* denote the offsets of the predicted anchor and of the ground-truth box, respectively. For each anchor, the L_reg term is multiplied by p*, where p* is 1 when an object is present and 0 when no object is present.
Further, in step five, the results of the two branches are combined linearly with fixed weights; the total loss of the two branches is calculated as

L_total = α L_cls + γ L_reg

where L_cls is the loss of the classification branch, α is the weight of the classification branch, L_reg is the loss of the regression branch, and γ is the weight of the regression branch.
The single-target tracking method based on the combination of classification and position loss in a twin network provided by the invention extracts features from the pre-processed image through a modified residual network, so that the output sizes of the different blocks are consistent. In the training stage, one branch assists in locating the centre of the target by linearly weighting the classification losses of the different blocks, while the other branch computes the position-regression loss to generate a target box of more suitable size. In the testing stage, only the classification and position-regression results of the last block are used, which improves the success rate and precision while also increasing the speed.
Drawings
The invention is further described below with reference to the accompanying drawings:
fig. 1 is a schematic flow chart of a single-target tracking method based on combination of classification and position loss of a twin network according to an embodiment of the present invention.
Detailed Description
The single-target tracking method based on the combination of classification and position loss in a twin network proposed by the invention is described in further detail below with reference to the accompanying drawings and specific embodiments. Advantages and features of the invention will become more apparent from the following description and from the claims. It is noted that the drawings are in greatly simplified form and are not to scale; they are intended only to facilitate a convenient and clear description of the embodiments of the invention.
The core idea of the invention is that the single-target tracking method based on the combination of classification and position loss in a twin network extracts features from the pre-processed image through a modified residual network, so that the output sizes of the different blocks are consistent. In the training stage, one branch assists in locating the centre of the target by linearly weighting the classification losses of the different blocks, while the other branch computes the position-regression loss to generate a target box of more suitable size. In the testing stage, only the classification and position-regression results of the last block are used, which improves the success rate and precision while also increasing the speed.
According to the technical scheme, the invention provides a single-target tracking method based on combination of classification and position loss of a twin network, and fig. 1 is a flow chart of steps of the single-target tracking method based on combination of classification and position loss of the twin network. Referring to fig. 1, a single target tracking method based on a combination of classification and position loss of a twin network is provided, comprising the steps of:
s11, determining the size of the template and the search area after cutting, and taking the area as the input of a model;
s12, taking a residual error network as a main network, and taking convolution characteristic diagrams of the last three blocks;
s13, selecting a template corresponding to a block and a feature map of a search area by one branch, calculating a response map corresponding to each block to obtain a classification result, linearly superposing three classification errors, adjusting the possibility of marking a wrong target by combining the classification errors of different blocks in a training stage, and classifying by using the last block in a testing stage;
s14, calculating the loss of position regression of a convolution layer corresponding to the last block of the other branch, carrying out dimension lifting on the product characteristic corresponding to the acquired template region according to the channel direction, changing the dimension into the original four times the set target box number, and then carrying out cross-correlation operation to obtain the deviation of the center point and the width and height of the target box and the real frame;
s15, calculating the total loss of the two branches.
First, in S11, the size of the cropped template region is determined to be 127 × 127; if the template region exceeds the boundary of the original image, the mean value of the image is used to fill the edges. The size of the cropped search region is determined to be 255 × 255. The template and the search region are taken as the two inputs and are passed through backbone networks with identical parameters.
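The crop-with-mean-padding step can be sketched as below. This is a minimal single-channel sketch; the centre coordinates and the toy 4 × 4 image are illustrative only, and a real implementation would also handle multi-channel images and sub-pixel centres.

```python
import numpy as np

def crop_with_mean_pad(image, cx, cy, size):
    """Crop a size x size window centred at (cx, cy); regions that
    fall outside the image are filled with the per-image mean, as
    described for the 127 (template) and 255 (search) crops."""
    h, w = image.shape[:2]
    pad = size  # pad generously on every side, then slice
    padded = np.full((h + 2 * pad, w + 2 * pad), image.mean(), dtype=image.dtype)
    padded[pad:pad + h, pad:pad + w] = image
    x0 = cx - size // 2 + pad
    y0 = cy - size // 2 + pad
    return padded[y0:y0 + size, x0:x0 + size]

img = np.arange(16.0).reshape(4, 4)           # toy "image", mean = 7.5
patch = crop_with_mean_pad(img, cx=0, cy=0, size=3)  # crosses the top-left corner
```

Rows and columns that fall outside the image come back as the image mean (7.5 here), while in-image pixels are preserved.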
In S12, the original residual network is modified so that the convolutional feature maps of the last three blocks have equal spatial size: the strides of the last three blocks are removed, and dilated convolutions are added to enlarge the receptive field. In the embodiment of the invention, with the cropped regions as the input of the model, the template-region feature maps corresponding to the different blocks have sizes 15×15×512, 15×15×1024 and 15×15×2048, and the search-region feature maps have sizes 31×31×512, 31×31×1024 and 31×31×2048.
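Why removing the stride and adding dilation keeps the last three blocks at 15 × 15 (template) and 31 × 31 (search) can be checked with the standard convolution output-size formula. The helper below is a sketch for verification, not part of the patented method; the kernel/padding combinations shown are illustrative assumptions.

```python
def conv_out_size(n, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution:
    floor((n + 2*padding - dilation*(kernel-1) - 1) / stride) + 1."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# A 3x3 conv with stride 1, dilation 2 and padding 2 preserves 15x15,
# whereas the same kernel with stride 2 would halve a 31x31 map.
keep = conv_out_size(15, 3, stride=1, padding=2, dilation=2)
halve = conv_out_size(31, 3, stride=2, padding=1)
```

With stride 1 and padding matched to the dilated kernel's effective extent, the spatial size is unchanged, which is exactly what makes the three blocks' outputs comparable.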
The template features and search-region features of each block are cross-correlated; the specific formula is:

f_i(z, x) = φ_i(z) ⋆ φ_i(x) + b·1

where φ_i(z) and φ_i(x) denote the feature maps of the i-th block obtained by applying the same convolution operations to the template region z and the search region x, ⋆ denotes the cross-correlation operation that produces the response map, and b·1 denotes a bias added at every position of the response map.
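The cross-correlation above — sliding the template feature map over the search feature map and taking an inner product at every offset — can be sketched naively in NumPy. Real trackers implement this as a batched convolution on the GPU; the toy feature maps below are illustrative.

```python
import numpy as np

def cross_correlate(template, search, bias=0.0):
    """Slide the template feature map over the search feature map and
    take the inner product at each offset, producing the response map
    f(z, x) = phi(z) * phi(x) + b."""
    th, tw, _ = template.shape
    sh, sw, _ = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[i:i + th, j:j + tw]) + bias
    return out

rng = np.random.default_rng(0)
z = rng.standard_normal((2, 2, 3))  # toy template features
x = np.zeros((4, 4, 3))
x[1:3, 1:3] = z                     # embed the template inside the search map
resp = cross_correlate(z, x)
```

The response peaks exactly where the template matches the search region, which is how the classification branch scores candidate positions.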
The classification results of the blocks are weighted linearly; the specific formula is:

L_cls = α_1 L_1 + α_2 L_2 + α_3 L_3

where L_i denotes the classification loss of the i-th block and α_i denotes the weight of the corresponding classification loss.
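A sketch of this weighted classification loss. The weights α_i below are illustrative assumptions, since their values are left open; the toy foreground probabilities are likewise made up.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Binary cross entropy over foreground probabilities."""
    probs = np.clip(probs, 1e-7, 1 - 1e-7)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

def weighted_cls_loss(block_probs, labels, weights=(0.2, 0.3, 0.5)):
    """L_cls = a1*L1 + a2*L2 + a3*L3 over the last three blocks;
    the weights here are illustrative placeholders."""
    return sum(a * cross_entropy(p, labels) for a, p in zip(weights, block_probs))

labels = np.array([1.0, 0.0, 1.0])
probs_per_block = [np.array([0.9, 0.1, 0.8])] * 3  # same toy scores from each block
loss = weighted_cls_loss(probs_per_block, labels)
```

When the three blocks agree and the weights sum to 1, the weighted loss collapses to the single-block cross entropy; disagreement between blocks shifts the loss toward the heavier-weighted blocks.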
The error between the target-position regression and the ground truth is calculated as:

smooth_L1(x, σ) = 0.5 (σx)^2, if |x| < 1/σ^2
smooth_L1(x, σ) = |x| − 1/(2σ^2), otherwise

where x denotes the element-wise difference between the predicted box and the ground-truth box, and the parameter σ, which controls the smooth region, is set to 3. The regression loss is

L_reg = Σ_i p_i* · R(t_i − t_i*)

where R is the smooth-L1 function, and t_i and t_i* denote the offsets of the predicted anchor and of the ground-truth box, respectively. For each anchor, the L_reg term is multiplied by p*, which is 1 when an object is present (positive) and 0 when no object is present (negative); that is, only foreground anchors contribute to the loss, and background anchors do not.
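The masked smooth-L1 regression loss with σ = 3 can be sketched as follows; the toy offsets and the p* mask below are illustrative.

```python
import numpy as np

def smooth_l1(x, sigma=3.0):
    """smooth_L1(x) = 0.5*(sigma*x)^2 if |x| < 1/sigma^2,
    else |x| - 1/(2*sigma^2); the two branches meet continuously."""
    x = np.abs(x)
    return np.where(x < 1.0 / sigma ** 2,
                    0.5 * (sigma * x) ** 2,
                    x - 0.5 / sigma ** 2)

def reg_loss(pred_offsets, true_offsets, p_star, sigma=3.0):
    """Sum smooth-L1 over the four offsets (dx, dy, dw, dh) per anchor,
    masked by p* so only foreground anchors contribute."""
    per_anchor = smooth_l1(pred_offsets - true_offsets, sigma).sum(axis=1)
    return np.sum(p_star * per_anchor)

pred = np.array([[0.05, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
true = np.zeros((2, 4))
p_star = np.array([1.0, 0.0])  # second anchor is background and is ignored
loss = reg_loss(pred, true, p_star)
```

Small residuals fall in the quadratic branch (here 0.5·(3·0.05)² = 0.01125), while large residuals grow only linearly, and the background anchor contributes nothing because its p* is 0.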
In step five, the results of the two branches are combined linearly with fixed weights; the total loss of the two branches is calculated as L_total = α L_cls + γ L_reg, where L_cls is the loss of the classification branch, α is the weight of the classification branch, L_reg is the loss of the regression branch, and γ is the weight of the regression branch.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (2)
1. The single target tracking method based on the combination of classification and position loss of the twin network is characterized by comprising the following steps:
step one, determining the sizes of the cropped template and search regions, and taking these regions as the input of the model;
step two, using a residual network as the backbone network, and taking the convolutional feature maps of its last three blocks;
step three, in one branch, selecting the template and search-region feature maps corresponding to each block, calculating the response map of each block to obtain a classification result, and linearly superposing the three classification errors; in the training stage, the classification errors of the different blocks are combined to reduce the possibility of marking a wrong target, and in the testing stage only the last block is used for classification; a cross-correlation operation is performed on the convolutional feature maps extracted from each block, and the classification loss sum is calculated: the template features obtained from the same block are raised in dimension and cross-correlated to obtain the probability of each target box being classified as foreground or background, and the weighted sum of the classification losses of the different blocks is calculated by the formula:
L_cls = α_1 L_1 + α_2 L_2 + α_3 L_3
where L_i denotes the classification loss of the i-th block, a cross-entropy classification loss function, and α_i denotes the weight of the corresponding classification loss;
step four, in the other branch, calculating the position-regression loss from the convolutional layer corresponding to the last block: raising the dimension of the convolutional features of the template region along the channel direction to four times the preset number of target boxes, and then performing a cross-correlation operation to obtain the deviations in centre point, width and height between the target boxes and the ground-truth box; the error between the target-position regression and the ground truth is calculated as
smooth_L1(x, σ) = 0.5 (σx)^2 if |x| < 1/σ^2, and |x| − 1/(2σ^2) otherwise
where x denotes the element-wise difference between the predicted box and the ground-truth box, and the parameter σ, which controls the smooth region, is set to 3; the regression loss is
L_reg = Σ_i p_i* · R(t_i − t_i*)
where R is the smooth-L1 function, and t_i and t_i* denote the offsets of the predicted anchor and of the ground-truth box, respectively; for each anchor, the L_reg term is multiplied by p*, where p* is 1 when an object is present and 0 when no object is present;
and step five, calculating the total loss of the two branches.
2. The single-target tracking method based on the combination of classification and position loss of a twin network according to claim 1, wherein in step five the results of the two branches are combined linearly with fixed weights, the total loss of the two branches being calculated as L_total = α L_cls + γ L_reg, where L_cls is the loss of the classification branch, α is the weight of the classification branch, L_reg is the loss of the regression branch, and γ is the weight of the regression branch.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011188664.7A CN112200870B (en) | 2020-10-30 | 2020-10-30 | Single-target tracking method based on combination of classification and position loss of twin network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112200870A CN112200870A (en) | 2021-01-08 |
CN112200870B true CN112200870B (en) | 2024-03-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||