CN114332151B - Method for tracking a target of interest in shadow Video-SAR (synthetic aperture radar) - Google Patents


Info

Publication number
CN114332151B
CN114332151B (application CN202111310927.1A)
Authority
CN
China
Prior art keywords: convolution, result, adopting, classical, network
Legal status: Active
Application number
CN202111310927.1A
Other languages
Chinese (zh)
Other versions
CN114332151A (en)
Inventor
张晓玲 (Zhang Xiaoling)
王宝有 (Wang Baoyou)
鲍金宇 (Bao Jinyu)
张天文 (Zhang Tianwen)
师君 (Shi Jun)
Current Assignee: University of Electronic Science and Technology of China
Original Assignee: University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority: CN202111310927.1A
Publication of application: CN114332151A; publication of grant: CN114332151B

Abstract

The invention discloses a method for tracking a target of interest based on shadow in Video-SAR (video synthetic aperture radar). A new network, the guided anchor Siamese network, is established, comprising a Siamese subnetwork, a guided anchor network and a similarity learning subnetwork. Exploiting the advantages of moving-target shadows in SAR images, such as stable gray-scale characteristics and the absence of position offset, the method tracks any moving target of interest in shadow-based Video-SAR. Meanwhile, a guided anchor network (GA-SubNet) improves tracking precision, suppresses false alarms, and generates sparse anchors that better match the shape of SAR moving-target shadows. The tracking precision of the invention on the well-known Sandia National Laboratories dataset is 60.16%, with a tracking speed of 32 frames per second. Compared with other advanced moving-target tracking techniques for Video-SAR, the method achieves high moving-target tracking precision in Video-SAR.

Description

Method for tracking a target of interest in shadow Video-SAR (synthetic aperture radar)
Technical Field
The invention belongs to the technical field of synthetic aperture radar (SAR) image interpretation, and relates to a method for tracking a target of interest based on shadow in Video-SAR.
Background
Synthetic aperture radar (SAR) is an active remote sensing technology that works day and night in all weather. Compared with optical sensors, SAR can penetrate cloud and fog and complete observation tasks under severe meteorological conditions. SAR has become an important means of earth observation and is increasingly widely applied in national-economy fields such as terrain image generation, target detection and reconnaissance, land resource exploration and natural disaster monitoring. See Zhang Qingjun, Han Lei, Liu Jie, "Development and tendency of spaceborne synthetic aperture radar remote sensing technology", Spacecraft Engineering, 2017, 26(06): 1-8.
Video synthetic aperture radar (Video-SAR) gives an SAR image sequence an animation-like display effect, offering intuitive and continuous observation of target motion, which has very important application value for SAR moving-target tracking. The all-weather, day-and-night detection and tracking capability of Video-SAR has great strategic significance for battlefield reconnaissance, and can markedly improve precision-strike and early-warning capability in military use; it also plays an irreplaceable monitoring role in civil scenarios such as traffic monitoring in complex weather and natural disaster surveys. See the dissertation "Research on shadow-based video SAR moving target detection and tracking algorithms", University of Electronic Science and Technology of China, 2021.
The shadow of a moving target appears at the target's true position in a Video-SAR image, because the moving target occludes the ground (no energy is reflected). Since the shadow has stable gray-scale characteristics and no position offset, tracking a moving target by means of its shadow has become the approach of greatest interest. See Ding Jinshan, "Video SAR imaging and moving target shadow detection technology", Journal of Radars, 2020, 9(2).
Existing shadow-based target tracking methods for Video-SAR approach moving-target tracking from multiple angles and achieve good results. However, because backgrounds similar to shadows are hard to distinguish, and appearance-based methods cannot easily obtain training samples for arbitrary target classes, tracking an arbitrary target of interest in Video-SAR remains a challenge.
Therefore, to overcome the high false-alarm rate and low tracking precision of the prior art, the invention provides a method for tracking a target of interest based on shadow in Video-SAR.
Disclosure of Invention
The invention belongs to the technical field of synthetic aperture radar (SAR) image interpretation, and discloses a method for tracking a target of interest in shadow-based Video-SAR, which solves the problems of high false-alarm rate and low tracking precision. The method designs a new network, the guided anchor Siamese network, comprising a Siamese subnetwork, a guided anchor network and a similarity learning subnetwork, which reduces the false-alarm rate and improves moving-target tracking precision. According to experimental results on the well-known Sandia National Laboratories dataset, the method achieves state-of-the-art moving-target tracking precision in Video-SAR compared with other advanced Video-SAR moving-target tracking techniques.
For the convenience of describing the present invention, the following terms are first defined:
definition 1: classical sandia national laboratory dataset
The classical Sandia National Laboratory Dataset is a Video-SAR Dataset, which is known as Sandia National Laboratory Dataset in english, and can be used for training a deep learning model for researchers to evaluate the performance of their algorithms in this unified Dataset. In this data set containing 50 different moving objects in all 899 frames, the size of the image is 600 x 600. The Sangya national laboratory data set is detailed in the website "https:// www.sandia.gov/radar/pathfinder-radar-isr-and-synthetic-opacity-radar-sar-systems/video/.".
Definition 2: traditional image cropping and scale transformation method
Traditional image cropping and scale transformation gives all images of the Video-SAR the same feature scale for subsequent processing. Image cropping cuts out a region of interest; scale transformation rescales the image. For a pair of adjacent images in the input SAR image sequence, the template image is a 127 × 127 region centered on the shadow position center (x, y) in frame t-1, where t denotes the current time, t-1 the previous time, and x and y the abscissa and ordinate. A region of size ((w + h) × 0.5 + w, (w + h) × 0.5 + h) is first cropped centered on (x, y) and then scaled to 127 × 127, where (w, h) are the width and height of the shadow bounding box; (x, y, w, h) is known during the training phase, while during the testing phase these parameters come from the prediction for the previous frame. Similarly to the template frame, the detection image is the region of size (((w + h) × 0.5 + w) × 255/127, ((w + h) × 0.5 + h) × 255/127) centered on (x, y) in frame t, scaled to 255 × 255; it is larger than the template region to ensure that the shadow is always contained in the detection region. The method is detailed at https://blog.csdn.net/ZBC010/article/details/120584785. A sketch of this preprocessing is given below.
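The following Python sketch illustrates the cropping and scaling just described. It is a minimal illustration assuming OpenCV-style numpy images; the function name and the mean-padding at image borders are our own choices, not details fixed by the patent.

```python
import cv2
import numpy as np

def crop_and_scale(img, cx, cy, w, h, out_size):
    # Context-padded crop of size ((w+h)*0.5 + w, (w+h)*0.5 + h) centered
    # on the shadow center (cx, cy), per Definition 2; the detection
    # region (out_size=255) additionally enlarges the crop by 255/127.
    cw = ((w + h) * 0.5 + w) * (out_size / 127.0)
    ch = ((w + h) * 0.5 + h) * (out_size / 127.0)
    x0, y0 = int(cx - cw / 2), int(cy - ch / 2)
    x1, y1 = int(cx + cw / 2), int(cy + ch / 2)
    # Pad with the image mean so crops near the border keep their size.
    pad = max(0, -x0, -y0, x1 - img.shape[1], y1 - img.shape[0])
    if pad > 0:
        img = cv2.copyMakeBorder(img, pad, pad, pad, pad,
                                 cv2.BORDER_CONSTANT, value=float(img.mean()))
    crop = img[y0 + pad:y1 + pad, x0 + pad:x1 + pad]
    return cv2.resize(crop, (out_size, out_size))

frame = np.random.rand(600, 600).astype(np.float32)  # one 600x600 SAR frame
z = crop_and_scale(frame, 300, 300, 30, 20, 127)     # template region z
x = crop_and_scale(frame, 300, 300, 30, 20, 255)     # detection region x
```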
Definition 3: classical convolutional neural network
A classical convolutional neural network is a class of feed-forward neural networks that contain convolution computations and have a deep structure. Convolutional neural networks are built by imitating the biological mechanism of visual perception and can be used for both supervised and unsupervised learning; thanks to the parameter sharing of convolution kernels within hidden layers and the sparsity of inter-layer connections, they can extract features with a comparatively small amount of computation. In recent years, convolutional neural networks have developed rapidly in computer vision, natural language processing, speech recognition and other fields, and their strong feature learning ability has attracted wide attention from experts and scholars at home and abroad. See Zhang Suofei, Feng Ye, Wu Xiaofu, "Progress of object detection algorithms based on deep convolutional neural networks", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2019(05): 1-9, https://doi.org/10.14132/j.cnki.1673-5439.2019.05.010.
Definition 4: conventional convolution kernel
When a convolution kernel is used for image processing, each pixel of the output image is a weighted average of the pixels in a small region of the input image, where the weights are defined by a function called the convolution kernel. Convolution kernels perform feature extraction; a larger kernel size implies a larger receptive field, but also more parameters. As early as 1998, the LeNet-5 model published by LeCun showed that images have local spatial correlation, and convolution is an extraction of this local correlation. See Lecun Y, Bottou L, Bengio Y, et al., "Gradient-based learning applied to document recognition", Proceedings of the IEEE, 1998, 86(11): 2278-2324.
Definition 5: traditional convolution kernel size setting method
The convolution kernel size refers to the length, width and depth of the kernel, denoted L × W × D, where L is the length, W the width and D the depth. Setting the kernel size means determining the specific values of L, W and D. In general, the smaller the kernel, the fewer parameters and the less computation are needed to reach the same receptive field. Specifically, the length and width of the kernel must be greater than 1 to enlarge the receptive field, and an even-sized kernel cannot keep the input and output feature map sizes equal even when zero padding is added symmetrically, so 3 is generally used as the kernel size. See https://www.sohu.com/a/241208957_787107.
Definition 6: standard ResNet50 network
The standard ResNet50 network is a ResNet network with 50 weight layers. It is the feature extraction part of a network: different modules within it can be combined, it comprises multiple convolutional and pooling layers, and useful feature information is extracted automatically through training. See He K., Zhang X., Ren S., et al., "Deep residual learning for image recognition", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
Definition 7: classic Siamese network construction method
A Siamese (twin) network consists of two feature extraction branches with identical parameters, whose purpose is to extract the same features from two input images for comparison. Conventionally, the branch used as the comparison reference is called the template branch, and the branch being compared is called the detection (search) branch. See Bertinetto L, Valmadre J, Henriques J F, et al., "Fully-Convolutional Siamese Networks for Object Tracking".
Definition 8: classical convolution operation method
In a CNN, feature extraction is performed by the convolutional layers: the three channels of an image are each convolved with different convolution kernels. The convolution operation is essentially a dot product between a filter and a local region of the input data, so it can be implemented by matrix multiplication. The convolution operation method is detailed at https://blog.csdn.net/qq_40962368/article/details/82864606.
Definition 9: conventional deformable convolution method
Deformable convolution adds an extra offset parameter to each element of the convolution kernel, so that the kernel can spread over a larger range during training. The offsets added in a deformable convolution unit are part of the network structure and are computed by another, parallel standard convolution unit, so they can be learned end-to-end through gradient back-propagation. With learned offsets, the size and position of the deformable convolution kernel are adjusted dynamically according to the image content being recognized; visually, the sampling points of kernels at different positions adapt to the image content, accommodating geometric variations in the shape and size of different objects. The conventional deformable convolution method is detailed at https://blog.csdn.net/LEANG121/article/details/104234927. A PyTorch sketch follows.
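The sketch below builds a 3 × 3 deformable convolution in which a parallel standard convolution predicts the sampling offsets, relying on torchvision.ops.DeformConv2d; the channel count of 256 matches the kernels used later in steps 4 and 5. This is an assumed minimal implementation, not the patent's own code.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """3x3 deformable convolution with offsets predicted by a parallel
    standard convolution (2 offset values per kernel element)."""
    def __init__(self, channels=256, k=3):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(channels, channels, k, padding=k // 2)

    def forward(self, x):
        # Offsets shift each kernel sampling point per spatial position,
        # so the receptive field adapts to the image content.
        return self.deform(x, self.offset(x))

feat = torch.randn(1, 256, 17, 17)
out = DeformBlock()(feat)      # spatial size preserved: (1, 256, 17, 17)
```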
Definition 10: anchor
In the convolution process using sliding window, the center position of the sliding window is defined as anchor point, and each sliding position corresponds to 3 scales and 3 length-width ratios, so that each sliding position has k =9 anchor points, and each sliding window generates k region suggestion boxes. Thus, the regression layer will have 4k outputs (the coordinates of each region suggestion box contain 4 parameters) and the classification layer will have 2k outputs (the probability estimate of whether each region suggestion box is a target or not). For details, see the documents "Ren S., he K., girshick R., et al. Faster r-cnn: targets real-time object detection with region pro-posal networks [ J ]. ArXiv preprinting arXiv:150601497,2015.".
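For illustration, the sketch below enumerates the k = 9 anchors (3 scales × 3 aspect ratios) attached to one sliding position. The base size, scales and ratios are the usual Faster R-CNN defaults, assumed here rather than taken from the patent.

```python
import numpy as np

def make_anchors(base=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    # Each (scale, ratio) pair yields one anchor box centered on the
    # anchor point; 3 x 3 = 9 anchors per sliding position, so the
    # regression layer has 4k outputs and the classification layer 2k.
    anchors = []
    for s in scales:
        for r in ratios:               # r is the height/width ratio
            area = (base * s) ** 2
            w = np.sqrt(area / r)
            h = w * r
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return np.array(anchors)

print(make_anchors().shape)    # (9, 4): (x1, y1, x2, y2) per anchor
```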
Definition 11: classical loss function methods
A loss function (or cost function) maps a random event or the values of its associated random variables to non-negative real numbers representing the "risk" or "loss" of that event. In applications, the loss function is usually associated with an optimization problem as a learning criterion: the model is solved and evaluated by minimizing the loss function. The classical loss function is detailed at https://baike.
The focal loss function was originally proposed by Kaiming He and co-authors, and was first used in the image domain to address model performance problems caused by class imbalance. See https://zhuanlan.zhihu.com/p/266023273.
The SmoothL1 loss function was proposed in the Fast R-CNN paper. According to that paper, SmoothL1 makes the loss more robust to outliers: compared with an L2 loss it is insensitive to outliers and anomalous points, its gradient changes more gently, and training is less likely to diverge. See https://blog.csdn.net/c2250645962/article/details/106023381. Minimal sketches of both losses are given below.
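The two losses can be sketched in PyTorch as follows. The focal-loss hyperparameters alpha and gamma are the common defaults from the original paper, assumed here rather than specified by the patent.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Down-weights well-classified examples so rare positives (shadow
    # anchors) are not swamped by abundant easy background anchors.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    a_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (a_t * (1 - p_t) ** gamma * ce).mean()

cls_logits = torch.randn(8)
cls_labels = torch.randint(0, 2, (8,)).float()
print(focal_loss(cls_logits, cls_labels))

# Smooth L1 is quadratic near zero and linear for large residuals,
# keeping gradients bounded for outlier boxes.
print(F.smooth_l1_loss(torch.randn(8, 4), torch.randn(8, 4)))
```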
Definition 12: classical Adam algorithm
The classical Adam algorithm is a first-order optimization algorithm that can replace the traditional stochastic gradient descent process and iteratively update neural network weights based on training data. Adam differs from traditional stochastic gradient descent, which maintains a single learning rate for all weight updates that does not change during training; Adam instead computes first- and second-order moment estimates of the gradient to design independent adaptive learning rates for different parameters. See Kingma D., Ba J., "Adam: A Method for Stochastic Optimization", arXiv 2014, arXiv:1412.6980. A minimal usage sketch follows.
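A minimal usage sketch; the learning rate and betas are PyTorch's common defaults, not values fixed by the patent.

```python
import torch

# Adam keeps first- and second-moment estimates of each parameter's
# gradient, giving every weight its own adaptive step size.
w = torch.nn.Parameter(torch.randn(4))
opt = torch.optim.Adam([w], lr=1e-3, betas=(0.9, 0.999))
for _ in range(100):
    loss = (w ** 2).sum()          # toy objective
    opt.zero_grad()
    loss.backward()
    opt.step()
print(w.detach())                  # converges toward zero
```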
Definition 13: standard tracking network test method
The standard tracking network test method runs the tracking model on the test set to obtain the model's tracking results on that set. See Bertinetto L, Valmadre J, Henriques J F, et al., "Fully-Convolutional Siamese Networks for Object Tracking", Springer, Cham, 2016.
Definition 14: standard tracking precision calculation method
Tracking precision is characterized by the average IoU (mIoU); the larger the mIoU, the better. It is defined as follows:
mIoU = (1/N) Σ_{i=1}^{N} IoU_i, with IoU = |P ∩ G| / |P ∪ G|,
where P denotes the tracking result box, G denotes the ground-truth box of the shadow, and N is the number of images in the sequence. The IoU calculation is detailed at https://blog.csdn.net/u014061630/article/details/828112. A sketch of the computation follows.
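A minimal sketch of the IoU and mIoU computation over a tracked sequence, with boxes given as (x1, y1, x2, y2); the helper names are our own.

```python
import numpy as np

def iou(p, g):
    # Intersection-over-union of tracking box p and ground-truth box g.
    ix = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    iy = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = ix * iy
    union = ((p[2] - p[0]) * (p[3] - p[1])
             + (g[2] - g[0]) * (g[3] - g[1]) - inter)
    return inter / union if union > 0 else 0.0

def mean_iou(preds, gts):
    # mIoU: average IoU over the N images of the sequence.
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.1429
```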
Definition 15: standard center point error calculation method
The standard center location error (CLE) reflects the stability of the tracker: the smaller the CLE, the more stable the tracker. It is defined as:
CLE = sqrt((x_R − x_G)² + (y_R − y_G)²),
where (x_R, y_R) are the center coordinates of the tracking result and (x_G, y_G) are the center coordinates of the true shadow position. The standard center point error calculation is detailed at https://www.jianshu.com/p/e62baac9222c. A sketch covering CLE and FPS is given after Definition 16.
Definition 16: standard tracking speed calculation method
The standard tracking speed is the number of images the tracking model processes per unit time, the unit time being 1 second. The number of tracked frames per second (FPS) represents the tracking speed and is defined as:
FPS = N / t,
where t is the total tracking time of the video and N is the number of images in the sequence. The standard tracking speed calculation is detailed at https://blog.csdn.net/weixin_45192980/article/details/109064379.
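Both evaluation quantities reduce to one-line computations; the sketch below covers Definitions 15 and 16 (function names and the sample numbers are illustrative only).

```python
import math

def cle(center_pred, center_gt):
    # Euclidean distance between the tracked center (x_R, y_R) and the
    # true shadow center (x_G, y_G); smaller means a more stable tracker.
    return math.hypot(center_pred[0] - center_gt[0],
                      center_pred[1] - center_gt[1])

def fps(num_frames, total_seconds):
    # Tracking speed: number of images tracked per second, FPS = N / t.
    return num_frames / total_seconds

print(cle((102.0, 98.0), (100.0, 100.0)))   # ≈ 2.83 pixels
print(fps(148, 4.625))                      # 32.0 frames per second
```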
Definition 17: classical CNN feature extraction method
Classical CNN feature extraction extracts features from the original input image through a CNN: convolution operations with different kernels transform the input image into a series of feature maps. In a CNN, the convolution kernels of a convolutional layer slide over the image, while the max-pooling layer takes the maximum of each local block of the resulting inner products. The CNN thus extracts image features through its convolutional and max-pooling layers. Classical CNN feature extraction is detailed at https://blog.csdn.net/qq_30815237/article/details/86703620.
Definition 18: siamese-RPN of the prior art
siense-RPN (siense region pro-social network), which is a line-end-to-end training based on large-scale image pairs. In contrast to previous siemese networks, siemese-RPN mainly includes siemese subnets for feature extraction and RPN subnets containing classification and regression branches. In the inference phase, the method proposed by the authors is structured as a single-sample detection task (one-shot detection task). By pre-computing the template branch, i.e. the first frame, in the twin subnetwork, it is constructed as a convolutional layer inside the area extraction network in a detection branch for on-line tracking. Both traditional multi-scale testing and on-line trimming can be eliminated thanks to these improvements. See in detail the documents "Bo L, yan J, wei W, et al. High Performance Visual Tracking with parameter area register Proposal Network [ C ]//2018IEEE/CVF reference on Computer Vision and Pattern Registration (CVPR). IEEE,2018.
The invention provides a method for tracking a target of interest in shadow Video-SAR (synthetic aperture radar), which comprises the following steps:
Step 1, preparing the data set
Obtain the classical Sandia National Laboratories dataset of Definition 1 and divide it into two parts in the ratio 751:148, obtaining a training set and a test set: the training set contains the first 751 images and is denoted Train; the test set contains the last 148 images and is denoted Test.
Step 2, data preprocessing
Process the current frame image with the traditional image cropping and scale transformation method of Definition 2, denoting the result the detection region x, and process the previous frame image with the same method of Definition 2, denoting the result the template region z. That is, process the training set Train obtained in step 1 with the traditional image cropping and scale transformation method of Definition 2, obtaining network input image pairs (template region, detection region), denoted (z, x).
Step 3, constructing the Siamese subnetwork
Step 3.1: building the template branch
Following the standard ResNet50 network definition in Definition 6, establish a ResNet50 network using the classical convolutional neural network method of Definition 3, obtaining a standard ResNet50 network, i.e. the template branch, denoted CNN-1. The template branch is now established.
Step 3.2: building the search branch
Following the standard ResNet50 network definition in Definition 6, establish a ResNet50 network using the classical convolutional neural network method of Definition 3, obtaining a standard ResNet50 network, i.e. the search branch, denoted CNN-2. The search branch is now established.
Step 3.3: feature extraction
Using the classic Siamese network construction method of Definition 7, combine the template branch CNN-1 constructed in step 3.1 and the search branch CNN-2 constructed in step 3.2 into a Siamese subnetwork, denoted Siamese Subnetwork.
For the template branch constructed in step 3.1, extract features from the template region z obtained in step 2 using the classical CNN feature extraction method of Definition 17, obtaining the template branch output, denoted φ(z).
For the search branch constructed in step 3.2, extract features from the detection region x obtained in step 2 using the classical CNN feature extraction method of Definition 17, obtaining the search branch output, denoted φ(x). A sketch of this shared-backbone feature extraction is given below.
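The following PyTorch sketch shows a shared ResNet50 backbone producing φ(z) and φ(x). The 1 × 1 reduction to 256 channels is our assumption, made to match the 3 × 3 × 256 kernels used in steps 4-6; it is not spelled out by the patent.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SiameseBackbone(nn.Module):
    """Template and search branches share one ResNet50 trunk, so both
    inputs are mapped by exactly the same parameters."""
    def __init__(self):
        super().__init__()
        trunk = resnet50()
        self.features = nn.Sequential(*list(trunk.children())[:-2])
        self.neck = nn.Conv2d(2048, 256, 1)   # assumed 1x1 channel reduction

    def forward(self, z, x):
        phi_z = self.neck(self.features(z))   # template feature phi(z)
        phi_x = self.neck(self.features(x))   # search feature phi(x)
        return phi_z, phi_x

net = SiameseBackbone()
phi_z, phi_x = net(torch.randn(1, 3, 127, 127), torch.randn(1, 3, 255, 255))
print(phi_z.shape, phi_x.shape)   # (1, 256, 4, 4) and (1, 256, 8, 8)
```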
Step 4, constructing the first guided anchor network
Step 4.1: predicting anchor locations
Following the definition of anchor in Definition 10, convolve the template branch output φ(z) from step 3.3 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the anchor location prediction, denoted a_loc.
Following the classical loss function of Definition 11, process the anchor location prediction a_loc with the focal loss function of Definition 11, obtaining the location loss, denoted aloc_loss.
Step 4.2: predicting anchor shapes
Following the convolution kernel definition in Definition 4, create a convolution kernel using the classical convolutional neural network method of Definition 3, obtaining a convolution kernel denoted C1.
Set the size of convolution kernel C1 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Following the definition of anchor in Definition 10, convolve the convolution kernel C1 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the anchor shape prediction, denoted a_shape.
Following the classical loss function of Definition 11, process the anchor shape prediction a_shape with the SmoothL1 loss function method of Definition 11, obtaining the shape loss, denoted ashape_loss.
Step 4.3: adjusting the feature map
Following the traditional deformable convolution definition in Definition 9, create a deformable convolution kernel using the classical convolutional neural network method of Definition 3, obtaining a deformable convolution kernel denoted D1.
Set the size of convolution kernel D1 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Convolve the deformable convolution kernel D1 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the feature map adjustment result, denoted F1.
The first guided anchor network is now established and is denoted GA-SubNet1. A minimal sketch of such a guided anchor subnetwork is given below.
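The sketch below loosely follows the guided-anchoring design (location branch, shape branch, and deformable feature adaptation driven by the predicted shapes). The exact wiring of φ(z) and φ(x) inside GA-SubNet1 in the patent may differ, and the layer names are our own.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class GASubnet(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.loc = nn.Conv2d(channels, 1, 1)      # anchor location score map
        self.shape = nn.Conv2d(channels, 2, 1)    # anchor (dw, dh) per position
        self.offset = nn.Conv2d(2, 2 * 3 * 3, 1)  # offsets from predicted shapes
        self.adapt = DeformConv2d(channels, channels, 3, padding=1)

    def forward(self, feat):
        a_loc = torch.sigmoid(self.loc(feat))     # where anchors should live
        a_shape = self.shape(feat)                # how anchors should be shaped
        f_adj = self.adapt(feat, self.offset(a_shape))  # shape-aware features
        return a_loc, a_shape, f_adj

a_loc, a_shape, f1 = GASubnet()(torch.randn(1, 256, 8, 8))
print(a_loc.shape, a_shape.shape, f1.shape)
```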
Step 5, constructing the second guided anchor network
Step 5.1: predicting anchor locations
Following the definition of anchor in Definition 10, convolve the template branch output φ(z) from step 3.3 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the anchor location prediction, denoted b_loc.
Following the classical loss function of Definition 11, process the anchor location prediction b_loc with the focal loss function method of Definition 11, obtaining the location loss, denoted bloc_loss.
Step 5.2: predicting anchor shapes
Following the convolution kernel definition in Definition 4, create a convolution kernel using the classical convolutional neural network method of Definition 3, obtaining a convolution kernel denoted C2.
Set the size of convolution kernel C2 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Following the definition of anchor in Definition 10, convolve the convolution kernel C2 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the anchor shape prediction, denoted b_shape.
Following the classical loss function of Definition 11, process the anchor shape prediction b_shape with the SmoothL1 loss function method of Definition 11, obtaining the shape loss, denoted bshape_loss.
Step 5.3: adjusting the feature map
Following the traditional deformable convolution definition in Definition 9, create a deformable convolution kernel using the classical convolutional neural network method of Definition 3, obtaining a deformable convolution kernel denoted D2.
Set the size of convolution kernel D2 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Convolve the deformable convolution kernel D2 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the feature map adjustment result, denoted F2.
The second guided anchor network is now established and is denoted GA-SubNet2.
Step 6, constructing the similarity learning subnetwork
Step 6.1: building the classification module
Following the convolution kernel definition in Definition 4, create convolution kernels using the classical convolutional neural network method of Definition 3, obtaining 2 convolution kernels, denoted C3 and C4.
Using the convolution kernel size setting method of Definition 5, set the size of convolution kernel C3 to 3 × 3 × 256 and the number of kernels C3 to 2k, where k is the number of anchors in the anchor location prediction a_loc obtained in step 4.1. Set the size of convolution kernel C4 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Convolve the convolution kernel C3 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining a convolution result denoted DC1.
Convolve the convolution kernel C4 with the feature map adjustment result F1 from step 4.3 using the classical convolution operation method of Definition 8, obtaining a convolution result denoted DC2.
Convolve the convolution result DC1 with the convolution result DC2 using the classical convolution operation method of Definition 8, obtaining the classification result, denoted A_cls.
Step 6.2: building the regression module
Following the convolution kernel definition in Definition 4, create convolution kernels using the classical convolutional neural network method of Definition 3, obtaining 2 convolution kernels, denoted C5 and C6.
Using the convolution kernel size setting method of Definition 5, set the size of convolution kernel C5 to 3 × 3 × 256 and the number of kernels C5 to 4k, where k is the number of anchors in the anchor location prediction b_loc obtained in step 5.1. Set the size of convolution kernel C6 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Convolve the convolution kernel C5 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining a convolution result denoted DC3.
Convolve the convolution kernel C6 with the feature map adjustment result F2 from step 5.3 using the classical convolution operation method of Definition 8, obtaining a convolution result denoted DC4.
Convolve the convolution result DC3 with the convolution result DC4 using the classical convolution operation method of Definition 8, obtaining the regression result, denoted A_reg.
The guided anchor Siamese network consists of the Siamese subnetwork constructed in step 3, the guided anchor subnetworks constructed in steps 4 and 5, and the similarity learning subnetwork constructed in step 6; the construction of the guided anchor Siamese network is thus complete. A sketch of the cross-correlation underlying the classification and regression heads is given below.
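The classification and regression heads both reduce to a cross-correlation between a template-derived kernel and the adjusted search feature. The depthwise variant below is a minimal sketch of that operation; the feature sizes are illustrative, and whether the patent uses plain or depthwise correlation is not stated.

```python
import torch
import torch.nn.functional as F

def xcorr_depthwise(x, kernel):
    # Slide the template feature over the search feature channel by
    # channel: each template channel acts as a convolution kernel for
    # the matching search channel.
    b, c, h, w = x.shape
    out = F.conv2d(x.reshape(1, b * c, h, w),
                   kernel.reshape(b * c, 1, kernel.shape[2], kernel.shape[3]),
                   groups=b * c)
    return out.reshape(b, c, out.shape[2], out.shape[3])

dc1 = torch.randn(1, 256, 4, 4)    # template-side result (like DC1/DC3)
dc2 = torch.randn(1, 256, 8, 8)    # adjusted search-side result (like DC2/DC4)
response = xcorr_depthwise(dc2, dc1)
print(response.shape)              # (1, 256, 5, 5) similarity response map
```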
Step 7, establishing the target tracking model
Taking the training set Train obtained in step 1 as input, train the guided anchor Siamese network completed in step 6 using the classic Adam algorithm of Definition 12; after training, the target tracking model is obtained, denoted GASN.
Step 8, testing the target tracking model
Using the standard tracking network test method of Definition 13, run the test set Test obtained in step 1 through the target tracking model GASN obtained in step 7, obtaining the test result of the test set Test on the target tracking model GASN, denoted Result.
Step 9, evaluating the target tracking model
Taking the test result Result of the target tracking model GASN obtained in step 8 as input, compute the tracking precision using the standard tracking precision calculation method of Definition 14, denoted Accuracy.
Taking the test result Result of the target tracking model GASN obtained in step 8 as input, compute the center point error using the standard center point error calculation method of Definition 15, denoted CLE.
Taking the test result Result of the target tracking model GASN obtained in step 8 as input, compute the tracking speed using the standard tracking speed calculation method of Definition 16, denoted FPS.
The entire method is now complete.
The innovation of the invention is the establishment of a new network, the guided anchor Siamese network, which exploits the stable gray-scale characteristics and absence of position offset of SAR moving-target shadows to track any moving target of interest in shadow-based Video-SAR; meanwhile, a guided anchor network (GA-SubNet) improves tracking precision, suppresses false alarms and generates sparse anchors that better match the shape of SAR moving-target shadows. The tracking precision of the method on the well-known Sandia National Laboratories dataset is 60.16%, with a tracking speed of 32 frames per second.
The advantage of the invention is that it can track any target of interest in Video-SAR, overcoming the prior-art difficulties that backgrounds similar to shadows are hard to distinguish, that the false-alarm rate is high, and that appearance-based methods cannot easily obtain samples of arbitrary targets of interest; its tracking precision and tracking speed surpass existing moving-target tracking methods. The tracking precision is 4.55% higher, and the tracking speed 1 frame per second faster, than the prior-art Siamese-RPN.
Drawings
FIG. 1 is a schematic diagram of the network structure for tracking a target of interest in shadow-based Video-SAR in the present invention.
FIG. 2 is a flow chart of the implementation steps of the present invention.
FIG. 3 is a comparison of evaluation indices between the present invention and the prior-art Siamese-RPN.
The prior-art Siamese-RPN is detailed in Definition 18.
Detailed Description
The invention is described in further detail below with reference to FIG. 1.
Step 1, preparing the data set
Obtain the classical Sandia National Laboratories dataset of Definition 1 and divide it into two parts in the ratio 751:148, obtaining a training set and a test set: the training set contains the first 751 images and is denoted Train; the test set contains the last 148 images and is denoted Test.
Step 2, data preprocessing
Process the training set Train obtained in step 1 with the traditional image cropping and scale transformation method of Definition 2, obtaining network input image pairs (template region, detection region), denoted (z, x), where the template region z is the result of processing the previous frame image and the detection region x is the result of processing the current frame image with the method of Definition 2.
Step 3, constructing the Siamese subnetwork
Step 3.1: building the template branch
As shown in FIG. 1, following the standard ResNet50 network definition in Definition 6, establish a ResNet50 network using the classical convolutional neural network method of Definition 3, obtaining a standard ResNet50 network, denoted CNN-1. The template branch, denoted Template, is now established.
Step 3.2: building the search branch
Following the standard ResNet50 network definition in Definition 6, establish a ResNet50 network using the classical convolutional neural network method of Definition 3, obtaining a standard ResNet50 network, denoted CNN-2. The search branch, denoted Search, is now established.
Step 3.3: feature extraction
Following the classic Siamese network definition in Definition 7, construct a Siamese subnetwork from the template branch built in step 3.1 and the search branch built in step 3.2, obtaining the Siamese subnetwork, denoted Siamese Subnetwork.
Process the template region z obtained in step 2 through the template branch built in step 3.1 using the classical CNN feature extraction method of Definition 17, obtaining the template branch output, denoted φ(z).
Process the detection region x obtained in step 2 through the search branch built in step 3.2 using the classical CNN feature extraction method of Definition 17, obtaining the search branch output, denoted φ(x).
Step 4, constructing the first guided anchor network
Step 4.1: predicting anchor locations
Following the definition of anchor in Definition 10, convolve the template branch output φ(z) from step 3.3 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the anchor location prediction, denoted a_loc.
Following the classical loss function of Definition 11, process the anchor location prediction a_loc with the focal loss function of Definition 11, obtaining the location loss, denoted aloc_loss.
Step 4.2: predicting anchor shapes
Following the convolution kernel definition in Definition 4, create a convolution kernel using the classical convolutional neural network method of Definition 3, obtaining a convolution kernel denoted C1.
Set the size of convolution kernel C1 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Following the definition of anchor in Definition 10, convolve the convolution kernel C1 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the anchor shape prediction, denoted a_shape.
Following the classical loss function of Definition 11, process the anchor shape prediction a_shape with the SmoothL1 loss function of Definition 11, obtaining the shape loss, denoted ashape_loss.
Step 4.3: adjusting the feature map
Following the traditional deformable convolution definition in Definition 9, create a deformable convolution kernel using the classical convolutional neural network method of Definition 3, obtaining a deformable convolution kernel denoted D1.
Set the size of convolution kernel D1 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Convolve the deformable convolution kernel D1 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the feature map adjustment result, denoted F1.
The first guided anchor network is now established and is denoted GA-SubNet1.
Step 5, constructing the second guided anchor network
Step 5.1: predicting anchor locations
Following the definition of anchor in Definition 10, convolve the template branch output φ(z) from step 3.3 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the anchor location prediction, denoted b_loc.
Following the classical loss function of Definition 11, process the anchor location prediction b_loc with the focal loss function of Definition 11, obtaining the location loss, denoted bloc_loss.
Step 5.2: predicting anchor shapes
Following the convolution kernel definition in Definition 4, create a convolution kernel using the classical convolutional neural network method of Definition 3, obtaining a convolution kernel denoted C2.
Set the size of convolution kernel C2 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Following the definition of anchor in Definition 10, convolve the convolution kernel C2 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the anchor shape prediction, denoted b_shape.
Following the classical loss function of Definition 11, process the anchor shape prediction b_shape with the SmoothL1 loss function of Definition 11, obtaining the shape loss, denoted bshape_loss.
Step 5.3: adjusting the feature map
Following the traditional deformable convolution definition in Definition 9, create a deformable convolution kernel using the classical convolutional neural network method of Definition 3, obtaining a deformable convolution kernel denoted D2.
Set the size of convolution kernel D2 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Convolve the deformable convolution kernel D2 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining the feature map adjustment result, denoted F2.
The second guided anchor network is now established and is denoted GA-SubNet2.
Step 6, constructing the similarity learning subnetwork
Step 6.1: building the classification module
Following the convolution kernel definition in Definition 4, create convolution kernels using the classical convolutional neural network method of Definition 3, obtaining 2 convolution kernels, denoted C3 and C4.
Using the convolution kernel size setting method of Definition 5, set the size of convolution kernel C3 to 3 × 3 × 256 and the number of kernels C3 to 2k, where k is the number of anchors in the anchor location prediction a_loc obtained in step 4.1. Set the size of convolution kernel C4 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Convolve the convolution kernel C3 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining a convolution result denoted DC1.
Convolve the convolution kernel C4 with the feature map adjustment result F1 from step 4.3 using the classical convolution operation method of Definition 8, obtaining a convolution result denoted DC2.
Convolve the convolution result DC1 with the convolution result DC2 using the classical convolution operation method of Definition 8, obtaining the classification result, denoted A_cls.
Step 6.2: building the regression module
Following the convolution kernel definition in Definition 4, create convolution kernels using the classical convolutional neural network method of Definition 3, obtaining 2 convolution kernels, denoted C5 and C6.
Using the convolution kernel size setting method of Definition 5, set the size of convolution kernel C5 to 3 × 3 × 256 and the number of kernels C5 to 4k, where k is the number of anchors in the anchor location prediction b_loc obtained in step 5.1. Set the size of convolution kernel C6 to 3 × 3 × 256 using the convolution kernel size setting method of Definition 5.
Convolve the convolution kernel C5 with the search branch output φ(x) from step 3.3 using the classical convolution operation method of Definition 8, obtaining a convolution result denoted DC3.
Convolve the convolution kernel C6 with the feature map adjustment result F2 from step 5.3 using the classical convolution operation method of Definition 8, obtaining a convolution result denoted DC4.
Convolve the convolution result DC3 with the convolution result DC4 using the classical convolution operation method of Definition 8, obtaining the regression result, denoted A_reg.
Construct the guided anchor Siamese network from the Siamese subnetwork built in step 3, the guided anchor subnetworks built in steps 4 and 5, and the similarity learning subnetwork built in step 6; the construction of the guided anchor Siamese network is thus complete.
Step 7, establishing the target tracking model
Taking the training set Train obtained in step 1 as input, train the guided anchor Siamese network completed in step 6 using the classic Adam algorithm of Definition 12; after training, the target tracking model is obtained, denoted GASN.
Step 8, testing the target tracking model
Using the standard tracking network test method of Definition 13, run the test set Test obtained in step 1 through the target tracking model GASN obtained in step 7, obtaining the test result of the test set Test on the target tracking model GASN, denoted Result.
Step 9, evaluating the target tracking model
Taking the test result Result of the target tracking model GASN obtained in step 8 as input, compute the tracking precision using the standard tracking precision calculation method of Definition 14, denoted Accuracy.
Taking the test result Result of the target tracking model GASN obtained in step 8 as input, compute the center point error using the standard center point error calculation method of Definition 15, denoted CLE.
Taking the test result Result of the target tracking model GASN obtained in step 8 as input, compute the tracking speed using the standard tracking speed calculation method of Definition 16, denoted FPS.
As shown in FIG. 3, the tracking precision achieved by the invention on the well-known Sandia National Laboratories dataset is 60.16%, the highest tracking precision among the prior art, demonstrating that the method achieves high-precision target tracking in Video-SAR.

Claims (1)

1. An interested target tracking method based on shadow Video-SAR is characterized by comprising the following steps:
step 1, preparing a data set
The sandia national laboratory dataset was obtained from the classical sandia national laboratory dataset, 751:148, dividing a data set of a Sundia national laboratory into two parts to obtain a training set and a Test set, wherein the training set comprises 751 pictures at the front and is marked as Train, and the Test set comprises 148 pictures at the back and is marked as Test;
step 2, data preprocessing
Processing the current frame image by adopting a traditional image cutting and scale conversion method, recording an obtained processing result as a detection area x, processing the previous frame image by adopting the traditional image cutting and scale conversion method, and recording an obtained processing result as a template area z, namely processing the training set Train obtained in the step 1 by adopting the traditional image cutting and scale conversion method, and obtaining a network input image pair (a template area, a detection area) as (z, x);
step 3, constructing twin subnetworks
Step 3.1: building template branches
Establishing a ResNet50 network by adopting a classical convolution neural network method to obtain a standard ResNet50 network, namely a template branch, which is marked as CNN-1; at this point, the template branch is established;
step 3.2: constructing search branches
Adopting a classical convolution neural network method to establish a ResNet50 network to obtain a standard ResNet50 network, namely a search branch, which is marked as CNN-2; so far, the establishment of the search branch is finished;
step 3.3: feature extraction
Constructing a twin Subnetwork for the template branch CNN-1 constructed in the step 3.1 and the search branch CNN-2 constructed in the step 3.2 by adopting a classical twin network construction method to obtain a twin Subnetwork which is marked as a Simase Subnetwork;
for the template branch constructed in the step 3.1, the template area z obtained in the step 2 is subjected to feature extraction by adopting a classical CNN feature extraction method, and the output result of the template branch is obtained and recorded as
Figure FDA0003339325150000011
For the detection area x obtained in the step 2 of the search branch constructed in the step 3.2, a classical CNN feature extraction method is adopted to carry out feature extraction, and the output result of the search branch is obtained and recorded as
Figure FDA0003339325150000012
Step 4, constructing a first guide anchor network
Step 4.1: predicting anchor locations
Outputting the result to the template branch in step 3.3 by adopting a classical convolution operation method
Figure FDA0003339325150000021
And the search branch in step 3.3 outputs a result->
Figure FDA0003339325150000022
Performing convolution operation to obtain an anchor position prediction result, and marking as a _ loc;
processing the anchor position prediction result a _ loc by adopting a focal loss function to obtain a position loss result which is marked as aloc _ loss;
step 4.2: predicting anchor shape
Adopting a classical convolution neural network method to create a convolution kernel to obtain a convolution kernel which is marked as C1;
setting the size of a convolution kernel C1 to be 3 multiplied by 256 by adopting a convolution kernel size setting method;
outputting the result to the convolution kernel C1 and the search branch in the step 3.3 by adopting a classical convolution operation method
Figure FDA0003339325150000023
Performing convolution operation to obtain an anchor shape prediction result which is marked as a _ shape;
processing the anchor shape prediction result a _ shape by adopting a SmoothL1 Loss function method to obtain a shape Loss result which is recorded as ashape _ Loss;
step 4.3: adjusting characteristic diagram
Adopting a classical convolution neural network method to create a deformable convolution kernel to obtain a deformable convolution kernel which is marked as D1;
setting the size of a convolution kernel D1 to be 3 multiplied by 256 by adopting a convolution kernel size setting method;
outputting the result to the convolution kernel D1 and the search branch in the step 3.3 by adopting a classical convolution operation method
Figure FDA0003339325150000024
Performing convolution operation to obtain a characteristic diagram adjustment result, and marking the characteristic diagram adjustment result as F1;
at this point, the first guidance anchor network is established, and the obtained guidance anchor network is marked as GA-Subnet1;
step 5, constructing a second guide anchor network
Step 5.1: predicting anchor locations
Outputting the result to the template branch in step 3.3 by adopting a classical convolution operation method
Figure FDA0003339325150000025
And the search branch in step 3.3 outputs a result->
Figure FDA0003339325150000026
Performing convolution operation to obtain an anchor position prediction result which is recorded as b _ loc;
processing the anchor position prediction result b _ local by adopting a focal loss function method to obtain a position loss result which is marked as blob _ loss;
step 5.2: predicting anchor shape
Adopting a classical convolution neural network method to create a convolution kernel to obtain a convolution kernel which is marked as C2;
setting the size of a convolution kernel C2 to be 3 multiplied by 256 by adopting a convolution kernel size setting method;
outputting the result to the convolution kernel C2 and the search branch in step 3.3 by adopting a classical convolution operation method
Figure FDA0003339325150000031
Performing convolution operation to obtain an anchor shape prediction result, and recording the anchor shape prediction result as b _ shape;
processing the anchor shape prediction result b _ shape by adopting a SmoothL1 Loss function method to obtain a shape Loss result which is recorded as bsshape _ Loss;
step 5.3: adjusting feature maps
Adopting a classical convolution neural network method to create a deformable convolution kernel to obtain a deformable convolution kernel which is marked as D2;
setting the size of a convolution kernel D2 to be 3 multiplied by 256 by adopting a convolution kernel size setting method;
outputting the result to the convolution kernel D2 and the search branch in the step 3.3 by adopting a classical convolution operation method
Figure FDA0003339325150000032
Performing convolution operation to obtain a characteristic diagram adjustment result, and marking the characteristic diagram adjustment result as F2;
at this point, the second guidance anchor network is established, and the obtained guidance anchor network is marked as GA-Subnet2;
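Since step 5 mirrors step 4 exactly, the two guidance anchor networks can be two instances of one module; a brief usage sketch, assuming the GASubNet class sketched after step 4:

```python
ga_subnet1 = GASubNet(channels=256)   # feeds the classification branch (step 6.1)
ga_subnet2 = GASubNet(channels=256)   # feeds the regression branch (step 6.2)

a_loc, a_shape, f1 = ga_subnet1(search_feat)
b_loc, b_shape, f2 = ga_subnet2(search_feat)
```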
step 6, constructing a similarity learning sub-network
Step 6.1: building a Classification Module
adopting the classical convolutional neural network method, create two convolution kernels, respectively marked as C3 and C4;
setting the size of the convolution kernel C3 to 3 × 3 × 256 and the number of C3 kernels to 2k by adopting the convolution kernel size setting method, where k is the number of anchors in the anchor position prediction result a_loc obtained in step 4.1;
setting the size of the convolution kernel C4 to 3 × 3 × 256 by adopting the convolution kernel size setting method;
performing a convolution operation between the convolution kernel C3 and the search-branch output of step 3.3 by adopting the classical convolution operation method, to obtain a convolution result, which is marked as DC1;
performing a convolution operation between the convolution kernel C4 and the feature map adjustment result F1 of step 4.3 by adopting the classical convolution operation method, to obtain a convolution result, which is marked as DC2;
performing a convolution operation between the convolution result DC1 and the convolution result DC2 by adopting the classical convolution operation method, to obtain a convolution result, which is marked as A_cls;
step 6.2: constructing a regression Module
adopting the classical convolutional neural network method, create two convolution kernels, respectively marked as C5 and C6;
setting the size of the convolution kernel C5 to 3 × 3 × 256 and the number of C5 kernels to 4k by adopting the convolution kernel size setting method, where k is the number of anchors in the anchor position prediction result b_loc obtained in step 5.1;
setting the size of the convolution kernel C6 to 3 × 3 × 256 by adopting the convolution kernel size setting method;
performing a convolution operation between the convolution kernel C5 and the search-branch output of step 3.3 by adopting the classical convolution operation method, to obtain a convolution result, which is marked as DC3;
performing a convolution operation between the convolution kernel C6 and the feature map adjustment result F2 of step 5.3 by adopting the classical convolution operation method, to obtain a convolution result, which is marked as DC4;
performing a convolution operation between the convolution result DC3 and the convolution result DC4 by adopting the classical convolution operation method, to obtain a convolution result, which is marked as A_reg;
the guidance anchor twin network consists of the twin sub-network constructed in step 3, the two guidance anchor sub-networks constructed in steps 4 and 5, and the similarity learning sub-network constructed in step 6; the construction of the guidance anchor twin network is thus complete;
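The classification and regression modules of step 6 both reduce to a correlation between a search-branch embedding and a template-derived embedding, in the spirit of SiamRPN-style similarity learning. A minimal sketch follows, assuming 256-channel features and the depthwise cross-correlation commonly used in Siamese trackers (the claims only say "classical convolution operation", so this particular correlation form is an assumption):

```python
import torch
import torch.nn.functional as F

def xcorr_depthwise(search, kernel):
    """Depthwise cross-correlation: each channel of `kernel` (from the
    template side) is slid over the same channel of `search`."""
    batch, ch = kernel.shape[:2]
    search = search.reshape(1, batch * ch, *search.shape[2:])
    kernel = kernel.reshape(batch * ch, 1, *kernel.shape[2:])
    out = F.conv2d(search, kernel, groups=batch * ch)
    return out.reshape(batch, ch, *out.shape[2:])

# Illustrative shapes (assumed): template-side map 256 x 6 x 6, search-side 256 x 22 x 22
dc1 = torch.randn(1, 256, 6, 6)    # e.g. kernel C3 applied to the template/search output
dc2 = torch.randn(1, 256, 22, 22)  # e.g. kernel C4 applied to the adjusted map F1
a_cls = xcorr_depthwise(dc2, dc1)  # 1 x 256 x 17 x 17 similarity map
```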
step 7, establishing a target tracking model
taking the training set Train obtained in step 1 as input, train the guidance anchor twin network completed in step 6 by adopting the classical Adam algorithm; after training is finished, the target tracking model is obtained and recorded as GASN;
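A minimal training-loop sketch for step 7; the data loader, the epoch count, the learning rate, and the compute_total_loss helper (a weighted sum of the loss terms above) are assumptions, as the claims only specify the Adam optimizer and the training set Train:

```python
import torch

# `gasn` is the guidance anchor twin network; `train_loader` iterates over Train.
optimizer = torch.optim.Adam(gasn.parameters(), lr=1e-3)

for epoch in range(50):                               # epoch count assumed
    for template, search, targets in train_loader:
        outputs = gasn(template, search)              # a_loc, a_shape, A_cls, A_reg, ...
        loss = compute_total_loss(outputs, targets)   # hypothetical helper
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

torch.save(gasn.state_dict(), "gasn.pth")             # the trained model GASN
```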
step 8, testing a target tracking model
taking the Test set Test obtained in step 1 as input, test the target tracking model GASN obtained in step 7 by adopting the standard target tracking model testing method, to obtain the test result of the Test set on the model, which is recorded as Result;
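For step 8, a frame-by-frame tracking test loop could look like the sketch below; crop_template, crop_search_region, and gasn_track are hypothetical helpers used only to illustrate the flow, not part of the claims:

```python
# A sketch only: the tracker is initialized on frame 1 and then run forward.
results = []
last_box = init_box                                   # ground-truth box of frame 1
template = crop_template(frames[0], init_box)         # hypothetical helper
for frame in frames[1:]:
    search = crop_search_region(frame, last_box)      # window around previous box
    last_box = gasn_track(gasn, template, search)     # best anchor refined by A_reg
    results.append(last_box)
```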
step 9, evaluating a target tracking model
taking the test Result of the target tracking model GASN obtained in step 8 as input, calculate the tracking precision by adopting the standard tracking precision calculation method, recorded as Accuracy;
taking the test Result of the target tracking model GASN obtained in step 8 as input, calculate the center point error by adopting the standard center point error calculation method, recorded as CLE;
taking the test Result of the target tracking model GASN obtained in step 8 as input, calculate the tracking speed by adopting the standard tracking speed calculation method, recorded as FPS;
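The three metrics of step 9 can be computed as in the sketch below; a common set of definitions is assumed (Accuracy as mean IoU over frames, CLE as mean Euclidean distance between predicted and ground-truth box centers, FPS as frames divided by wall-clock time), since the claims only name "standard" methods:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def evaluate(pred_boxes, gt_boxes, elapsed_seconds):
    accuracy = np.mean([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    cle = np.mean([np.hypot((p[0] + p[2] / 2) - (g[0] + g[2] / 2),
                            (p[1] + p[3] / 2) - (g[1] + g[3] / 2))
                   for p, g in zip(pred_boxes, gt_boxes)])
    fps = len(pred_boxes) / elapsed_seconds
    return accuracy, cle, fps
```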
the entire method is now complete.
CN202111310927.1A 2021-11-05 2021-11-05 Method for tracking interested target in shadow Video-SAR (synthetic aperture radar) Active CN114332151B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111310927.1A CN114332151B (en) 2021-11-05 2021-11-05 Method for tracking interested target in shadow Video-SAR (synthetic aperture radar)

Publications (2)

Publication Number Publication Date
CN114332151A (en) 2022-04-12
CN114332151B (en) 2023-04-07

Family

ID=81045618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111310927.1A Active CN114332151B (en) 2021-11-05 2021-11-05 Method for tracking interested target in shadow Video-SAR (synthetic aperture radar)

Country Status (1)

Country Link
CN (1) CN114332151B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210157312A1 (en) * 2016-05-09 2021-05-27 Strong Force Iot Portfolio 2016, Llc Intelligent vibration digital twin systems and methods for industrial environments

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797716A (en) * 2020-06-16 2020-10-20 电子科技大学 Single target tracking method based on Siamese network
CN111915644A (en) * 2020-07-09 2020-11-10 苏州科技大学 Real-time target tracking method of twin guiding anchor frame RPN network
CN112215079A (en) * 2020-09-16 2021-01-12 电子科技大学 Global multistage target tracking method
CN112348849A (en) * 2020-10-27 2021-02-09 南京邮电大学 Twin network video target tracking method and device
CN112509003A (en) * 2020-12-01 2021-03-16 华中光电技术研究所(中国船舶重工集团公司第七一七研究所) Method and system for solving target tracking frame drift
CN112541468A (en) * 2020-12-22 2021-03-23 中国人民解放军国防科技大学 Target tracking method based on dual-template response fusion
CN112785624A (en) * 2021-01-18 2021-05-11 苏州科技大学 RGB-D characteristic target tracking method based on twin network

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
"ASR ship Detection Dataset(SSDD):Official Release and Comprehensive Data Analysis";tianwen zhang等;《Remote Sens》;第13卷(第18期);第1-41页 *
"High performance visual tracking with Siamese region proposal network";Li B等;《In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition》;第8971-8980页 *
"SiamFPN:A Deep Learning Method for Accurate and Real-Time Maritime Ship Tracking";Yunxiao Shan等;《IEEE Transactions on Circuits and Systems for Video Technology》;第31卷(第1期);第315-325页 *
"基于启发式候选区域推荐的目标孪生跟踪";覃瑞国等;《小型微型计算机系统》;第1-6页 *
"基于孪生区域候选网络的目标跟踪模型";王冠等;《小型微型计算机系统》;第42卷(第4期);第755-760页 *
"孪生导向锚框RPN网络实时目标跟踪";尚欣茹等;《中国图象图形学报》;第26卷(第2期);第415-424页 *

Also Published As

Publication number Publication date
CN114332151A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN108038445B (en) SAR automatic target identification method based on multi-view deep learning framework
CN109934282B (en) SAGAN sample expansion and auxiliary information-based SAR target classification method
CN114119582B (en) Synthetic aperture radar image target detection method
CN111797717B (en) High-speed high-precision SAR image ship detection method
CN106355151B (en) A kind of three-dimensional S AR images steganalysis method based on depth confidence network
CN102096825B (en) Graph-based semi-supervised high-spectral remote sensing image classification method
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
CN110826428A (en) Ship detection method in high-speed SAR image
CN112395987B (en) SAR image target detection method based on unsupervised domain adaptive CNN
CN103258324B (en) Based on the method for detecting change of remote sensing image that controlled kernel regression and super-pixel are split
Kim et al. Deep learning-based monitoring of overshooting cloud tops from geostationary satellite data
Zhang et al. Sparse coding of 2D-slice Zernike moments for SAR ATR
Su et al. Deep CNN-based radar detection for real maritime target under different sea states and polarizations
Li et al. Enhanced bird detection from low-resolution aerial image using deep neural networks
Long et al. Object detection research of SAR image using improved faster region-based convolutional neural network
Li et al. LF-CNN: Deep Learning-Guided Small Sample Target Detection for Remote Sensing Classification.
Li et al. Landslide detection based on shipborne images and deep learning models: a case study in the Three Gorges Reservoir Area in China
CN116363526B (en) MROCNet model construction and multisource remote sensing image change detection method and system
Yang et al. Sar images target detection based on yolov5
CN114332151B (en) Method for tracking interested target in shadow Video-SAR (synthetic aperture radar)
CN113902975B (en) Scene perception data enhancement method for SAR ship detection
CN113435510B (en) Reverse synthetic aperture radar image classification method based on sequence adjustment network
Yin et al. M2F2-RCNN: Multi-functional faster RCNN based on multi-scale feature fusion for region search in remote sensing images
CN114063063A (en) Geological disaster monitoring method based on synthetic aperture radar and point-like sensor
CN113421281A (en) Pedestrian micromotion part separation method based on segmentation theory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant