CN113222898A - Double-voyage SAR image trace detection method based on multivariate statistics and deep learning - Google Patents


Info

Publication number
CN113222898A
Authority
CN
China
Prior art keywords
image
cunet
double
voyage
trace
Prior art date
Legal status
Granted
Application number
CN202110401084.XA
Other languages
Chinese (zh)
Other versions
CN113222898B (en)
Inventor
邢孟道 (Xing Mengdao)
石鑫 (Shi Xin)
张金松 (Zhang Jinsong)
孙光才 (Sun Guangcai)
Current Assignee
Xidian University
Original Assignee
Xidian University
Priority date
Filing date
Publication date
Application filed by Xidian University
Priority to CN202110401084.XA
Publication of CN113222898A
Application granted
Publication of CN113222898B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10032: Satellite or aerial image; Remote sensing
    • G06T 2207/10044: Radar image


Abstract

The invention discloses a double-voyage SAR image trace detection method based on multivariate statistics and deep learning, which comprises the following steps: obtaining a difference image of the double-voyage SAR image by using a complex reflection change detection estimator; identifying the water and vegetation areas of the difference image and eliminating false alarms by using an unsupervised multivariate statistical method to obtain a false alarm elimination image; training a CUnet network by using inductive transfer learning and coarse-to-fine images; and performing trace identification on the image to be processed by using the trained CUnet network. The method obtains the difference image of the double-voyage SAR image through the complex reflection change detection estimator, obtains the water and vegetation areas through unsupervised multivariate statistics to construct the false alarm elimination image, combines the original image, the difference image and the false alarm elimination image into a coarse-to-fine image, and performs inductive transfer learning with the coarse-to-fine image and the CUnet network, so that double-voyage SAR image trace detection is realized under small-sample conditions with a good detection effect.

Description

Double-voyage SAR image trace detection method based on multivariate statistics and deep learning
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a double-voyage SAR image trace detection method based on multivariate statistics and deep learning.
Background
A very important application of synthetic aperture radar (SAR) systems is the detection of footprints, wheel tracks and other trace areas, which can be used for surveillance and search purposes. A double-voyage SAR image consists of two SAR images obtained by flying over the same area repeatedly at different times. Correlation change detection (CCD) has the ability to locate trace areas within a large-scale area and can be used to realize double-voyage SAR image trace detection. The CCD model includes two modules: a difference generation module, which generates a difference image from a pair of registered images acquired on repeated passes with repeated geometry; and a difference analysis module, which analyzes the difference image to obtain the unchanged and changed pixel regions of interest.
For the first part of the CCD model, two approaches can be summarized. One generates difference images by designing appropriate statistical models; although it reduces the false alarms of the generated images, it cannot completely remove the low-correlation-coefficient regions caused by natural environmental variations. The other effectively distinguishes false alarms from traces, but requires very stringent experimental conditions, obtaining multi-dimensional images over multiple flights or multiple bandwidths. Therefore, how to preserve as much change information as possible while removing the interference areas remains a very important issue for difference image generation.
For the second part of the CCD model, two approaches can also be summarized. One is the non-deep-learning approach, which uses unsupervised methods such as thresholding and clustering to classify the pixels of the difference image. The other is the deep learning approach, which classifies pixels by building neural networks. Owing to changes of the natural environment, such as wind and rivers, the difference image generated by CCD contains a large number of low-correlation-coefficient areas caused by the natural environment, which poses a great challenge to the detection methods. In addition, trace samples are scarce, so how to train a neural network with small samples is also an important problem.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a double-voyage SAR image trace detection method based on multivariate statistics and deep learning. The technical problem to be solved by the invention is realized by the following technical scheme:
The invention provides a double-voyage SAR image trace detection method based on multivariate statistics and deep learning, which comprises the following steps:
S1: obtaining a difference image of the double-voyage SAR image by using a complex reflection change detection estimator;
S2: identifying the water area and the vegetation area of the difference image and eliminating false alarms by using an unsupervised multivariate statistical method to obtain a false alarm elimination image;
S3: training a CUnet network by using inductive transfer learning and coarse-to-fine images;
S4: performing trace identification on the image to be processed by utilizing the trained CUnet network.
In an embodiment of the present invention, the S1 includes:
s11: constructing a mathematical model of the double-voyage SAR image obtained at the same geometric position at different moments;
s12: obtaining a complex reflection change detection estimator according to the mathematical model;
s13: and processing the double-voyage SAR image by using the complex reflection change detection estimator to obtain the difference image.
In one embodiment of the present invention, the complex reflection change detection estimator is expressed as:

$$\hat{\alpha} = \frac{\left|\sum_{n \in \mathcal{N}} x_n^{(1)}\left(x_n^{(2)}\right)^{*}\right|}{\sqrt{\left(\sum_{n \in \mathcal{N}}\left|x_n^{(1)}\right|^{2} - N\sigma_{n1}^{2}\right)\left(\sum_{n \in \mathcal{N}}\left|x_n^{(2)}\right|^{2} - N\sigma_{n2}^{2}\right)}}$$

where $x_n^{(1)}$ and $x_n^{(2)}$ denote the complex data of images $X_1$ and $X_2$ within the pixel neighborhood $\mathcal{N}$, $\left(x_n^{(2)}\right)^{*}$ denotes the conjugate of the complex data $x_n^{(2)}$, $N$ is the number of neighborhood pixels, and $\sigma_{n1}^{2}$ and $\sigma_{n2}^{2}$ are the additive system thermal noise estimates of images $X_1$ and $X_2$, respectively.
In an embodiment of the present invention, the S2 includes:
S21: performing intensity superposition by using the image pair consisting of the double-voyage SAR images to obtain a water area extraction statistic;
S22: obtaining a global threshold of the water area extraction statistic based on the OTSU method and extracting the water area of the difference image;
S23: subtracting the intensities of the image pair to obtain a vegetation extraction statistic;
S24: obtaining a manually determined threshold of the vegetation extraction statistic and extracting the vegetation area of the difference image;
s25: and constructing a false alarm eliminating image for eliminating the water area and the vegetation area according to the difference image.
In an embodiment of the present invention, the S22 includes:
after the water area extraction statistic is obtained, the global threshold $\tau_s$ is obtained by the OTSU method, and whether the current pixel $m$ belongs to a water area is judged by the following criterion:

$$m \in \begin{cases}\text{water area}, & S_m < \tau_s\\ \text{non-water area}, & S_m \geq \tau_s\end{cases}$$

where $S_m$ denotes the water area extraction statistic of the current pixel $m$ of the difference image.
In an embodiment of the present invention, the S24 includes:
for each pixel $m$ in the image pair, it is judged whether the condition

$$D_m \geq \tau_D \quad \text{and} \quad \hat{\alpha}_m \leq \tau_\alpha$$

is satisfied; if so, the pixel $m$ belongs to the vegetation area, where $\tau_D$ and $\tau_\alpha$ are the thresholds of the vegetation identification statistic $D$ and of the CRCD estimate $\hat{\alpha}$, respectively, $D_m$ denotes the vegetation extraction statistic corresponding to pixel $m$, and $\hat{\alpha}_m$ denotes the CRCD estimate corresponding to pixel $m$.
In an embodiment of the present invention, the S3 includes:
S31: concatenating the original SAR image, the difference image and the false alarm elimination image along the channel dimension to obtain a coarse-to-fine image;
S32: acquiring a CUnet network;
S33: dividing the coarse-to-fine image based on a transfer learning method to obtain source domain data and labels and target data and labels;
S34: pre-training the CUnet network by using a source domain task;
S35: fine-tuning the CUnet network by using the target task to obtain the trained CUnet network.
In an embodiment of the present invention, the S33 includes:
S331: slicing the regions of the CTF image that contain no trace pixels into a plurality of image blocks to establish the source domain data $\{x_i^S\}$, judging each pixel as a water, vegetation or background pixel according to the unsupervised multivariate statistical method or manual marking, and establishing the source domain labels $\{y_i^S\}$;
S332: slicing the trace regions of the CTF image into a plurality of image blocks to establish the target data $\{x_i^T\}$, marking each pixel as a trace or background pixel, and establishing the target labels $\{y_i^T\}$.
In an embodiment of the present invention, the S34 includes:
establishing a prediction function $f_S(\cdot)$ of the source domain task by using the CUnet network, and pre-training the CUnet network by using the source domain data and the source domain labels to obtain a trained source domain task prediction function $f_S(\cdot)$.
In an embodiment of the present invention, the S35 includes:
establishing a prediction function $f_T(\cdot)$ of the trace detection task by using the CUnet network, initializing the weights of $f_T(\cdot)$ with the weights of the trained source domain task prediction function $f_S(\cdot)$, and training $f_T(\cdot)$ by using the target data and the target labels to obtain the trained CUnet network.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a high-resolution SAR image trace detection method based on unsupervised multivariate statistics and small sample deep learning.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
Fig. 1 is a flowchart of a double-voyage SAR image trace detection method based on multivariate statistics and deep learning according to an embodiment of the present invention;
fig. 2 is a detailed flowchart of a double-voyage SAR image trace detection method based on multivariate statistics and deep learning according to an embodiment of the present invention;
FIG. 3 is a plot of the backscattering coefficients of water areas, sands, fields, hills and mountains when the grazing angle ranges from 0° to 80°;
FIG. 4 is a flow chart for training the CUnet using inductive transfer learning and coarse-to-fine images according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a CUnet network according to an embodiment of the present invention;
fig. 6 is a graph of a double-voyage SAR image and a trace tag thereof according to an embodiment of the present invention;
FIG. 7 is a diagram of results of different detection estimators provided by embodiments of the invention;
FIG. 8 is a ROC plot for different window sizes and decision thresholds provided by embodiments of the present invention;
FIG. 9 is a diagram of water area detection results of various methods provided by embodiments of the present invention;
FIG. 10 is a diagram of vegetation detection results for various methods provided by embodiments of the invention;
fig. 11 is a diagram of a false alarm removal result provided by an embodiment of the present invention;
FIG. 12 is a graph of CTF results provided by an embodiment of the present invention;
FIG. 13 is a diagram of the accuracy changes during the CUnet pre-training process and fine-tuning process provided by embodiments of the present invention;
FIG. 14 is a graph of intermediate process characteristics of a test sample provided by an embodiment of the present invention;
FIG. 15 is a graph of trace detection results of various methods provided by embodiments of the present invention;
fig. 16 is a graph of the detection results of different numbers of training samples according to the embodiment of the present invention.
Detailed Description
In order to further illustrate the technical means and effects of the present invention adopted to achieve the predetermined invention purpose, the following describes in detail the double-voyage SAR image trace detection method based on multivariate statistics and deep learning according to the present invention with reference to the accompanying drawings and the detailed embodiments.
The foregoing and other technical matters, features and effects of the present invention will be apparent from the following detailed description of the embodiments, which is to be read in connection with the accompanying drawings. The technical means and effects of the present invention adopted to achieve the predetermined purpose can be more deeply and specifically understood through the description of the specific embodiments, however, the attached drawings are provided for reference and description only and are not used for limiting the technical scheme of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or device that comprises a list of elements does not include only those elements but may include other elements not expressly listed. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of additional like elements in the article or device comprising the element.
Referring to fig. 1 and fig. 2, the double-voyage SAR image trace detection method based on multivariate statistics and deep learning of this embodiment includes the following steps:
S1: obtaining a difference image of the double-voyage SAR image by using a complex reflection change detection estimator.
Although the existing CCD method can be used for detecting trace changes, it suffers from high false alarms in low clutter-to-noise ratio (CNR) regions. The complex reflection change detection (CRCD) provided by this embodiment integrates the clutter and noise energy in the SAR image into a unified statistical model, reduces the false alarms in low-CNR regions, improves the detection probability of trace pixel regions, and allows the generated difference image to be applied simply and effectively in subsequent analysis. In this embodiment, the difference image of the original SAR images is obtained using the complex reflection change estimator.
Specifically, the S1 includes:
S11: constructing a mathematical model of the double-voyage SAR image obtained at the same geometric position at different moments.
The double-voyage SAR image consists of two SAR images obtained by flying over the same position repeatedly at different times: the SAR image with the earlier acquisition time contains no trace area and is called the reference image, while the SAR image with the later acquisition time contains the trace area and is called the task image.
Suppose that the two SAR images acquired at the same geometric position at different times are $X_1$ and $X_2$; the two images can be modeled as:

$$x_k^{(1)} = s_k + n_k^{(1)}$$

$$x_k^{(2)} = \left(\sqrt{1-\alpha^{2}}\, s_k + \alpha\, c_k\right) e^{j\phi} + n_k^{(2)}$$

where $x_k^{(1)}$ and $x_k^{(2)}$ denote the $k$-th complex data (i.e., data in complex form) of images $X_1$ and $X_2$, $k \in [1, M]$, $M$ is the total number of complex data in each image, $\alpha$ represents the degree of change between images $X_1$ and $X_2$ with value range $[0, 1]$, $s_k$ is the unchanged data shared by $X_1$ and $X_2$, $c_k$ is the changed data between them, $n_k^{(1)}$ and $n_k^{(2)}$ denote the additive system thermal noise of images $X_1$ and $X_2$, respectively, and the phase $\phi$, an interferometric parameter, represents a constant phase difference.
S12: performing a small amount of algebraic manipulation on the constructed mathematical models of the two SAR images to obtain the complex reflection change detection (CRCD) estimator:

$$\hat{\alpha} = \frac{\left|\sum_{n \in \mathcal{N}} x_n^{(1)}\left(x_n^{(2)}\right)^{*}\right|}{\sqrt{\left(\sum_{n \in \mathcal{N}}\left|x_n^{(1)}\right|^{2} - N\sigma_{n1}^{2}\right)\left(\sum_{n \in \mathcal{N}}\left|x_n^{(2)}\right|^{2} - N\sigma_{n2}^{2}\right)}}$$

where $\left(x_n^{(2)}\right)^{*}$ denotes the conjugate of the complex data $x_n^{(2)}$, $\mathcal{N}$ is the pixel neighborhood, $N$ is the number of neighborhood pixels, and $\sigma_{n1}^{2}$ and $\sigma_{n2}^{2}$ are the additive system thermal noise estimates of images $X_1$ and $X_2$, known from system design specifications or measured from the shadow regions of the SAR image pair. An estimate $\hat{\alpha} = 0$ represents a complete change of the neighboring pixels, and $\hat{\alpha} = 1$ represents no change. Compared with the CCD estimate $\hat{\gamma}$, the estimate obtained by the CRCD estimator reflects the influence of the clutter and thermal noise energy of the adjacent pixels. The CRCD estimator not only raises the coherence values of unchanged low-CNR regions, but also widens the gap between unchanged values and trace change values in the coherence result.
S13: processing the double-voyage SAR image with the CRCD estimator to obtain the difference image.
The complex data of the two SAR images $X_1$ and $X_2$ are substituted into the CRCD estimator to calculate the difference image. Specifically, the $M$ groups of complex data of $X_1$ and $X_2$ are substituted into the CRCD estimator to obtain $M$ estimates, and the $M$ estimates form the difference image.
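As an illustrative sketch of S11 to S13 (not part of the original disclosure: it assumes the double-voyage images are available as co-registered single-look complex numpy arrays, and the function names are hypothetical), the reconstructed estimator can be evaluated with mean filters:

```python
import numpy as np
from scipy.ndimage import uniform_filter


def _neighborhood_sum(a: np.ndarray, win: int) -> np.ndarray:
    """Sum of each pixel's win x win neighborhood (real-valued input)."""
    return uniform_filter(a, size=win, mode="reflect") * (win * win)


def crcd_difference_image(x1: np.ndarray, x2: np.ndarray, win: int = 9,
                          noise1: float = 0.0, noise2: float = 0.0) -> np.ndarray:
    """Difference image from the reconstructed CRCD estimator.

    x1, x2 : co-registered single-look complex images X1 and X2
    win    : odd neighborhood size (9 x 9 is used in the experiments)
    noise1, noise2 : additive thermal-noise power estimates, known from
        system specifications or measured in the images' shadow regions
    """
    n = win * win                                   # neighborhood pixel count N
    cross = x1 * np.conj(x2)
    # complex neighborhood sum, filtered per real/imaginary part
    num = np.abs(_neighborhood_sum(cross.real, win)
                 + 1j * _neighborhood_sum(cross.imag, win))
    # noise-compensated energy terms of the denominator
    p1 = _neighborhood_sum(np.abs(x1) ** 2, win) - n * noise1
    p2 = _neighborhood_sum(np.abs(x2) ** 2, win) - n * noise2
    denom = np.sqrt(np.clip(p1, 1e-12, None) * np.clip(p2, 1e-12, None))
    return num / denom                              # ~1 unchanged, ~0 fully changed
```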
S2: and carrying out identification and false alarm elimination on the water area and the vegetation area of the difference image by using an unsupervised multivariate statistical method to obtain a false alarm elimination image.
Specifically, the S2 includes:
s21: and carrying out intensity superposition by using an image pair consisting of the two SAR images to obtain the water area extraction statistic.
The scattering properties of various land cover types in SAR images are first analyzed using the Morchin model, in which the backscattering coefficient can be expressed as:

$$\sigma^{0} = A\,\sigma_c\,\frac{\theta_g}{\lambda} + \mu\,\cot\beta_0\,\exp\!\left[-\frac{\tan^{2}(B-\theta_g)}{\tan^{2}\beta_0}\right]$$

where $\lambda$ is the signal wavelength of the radar, $\theta_g$ is the grazing angle, $A$, $B$, $\mu$ and $\beta_0$ are characteristic parameters depending on the ground type, and $\theta_c = \arcsin(\lambda/4\pi h_c)$, with $\theta_c$ and $h_c$ intermediate parameters for auxiliary calculation. When the ground type is desert and $\theta_g < \theta_c$, $\sigma_c = \theta_g/\theta_c$; for other ground types or when $\theta_g > \theta_c$, $\sigma_c = 1$. Referring to fig. 3, fig. 3 is a plot of the backscattering coefficients of water areas, sands, fields, hills and mountains for grazing angles in the range of 0° to 80°, where the water area is approximated by the sea surface of sea state one. As shown in FIG. 3, the backscattering coefficient $\sigma^{0}$ of water areas is much lower than that of the other land cover types, which results in a lower intensity for water areas in the SAR image than for other areas. Based on this analysis, an appropriate threshold can be selected to extract the water regions from the SAR image pair. Since the incoherent summation of the intensities of the double-voyage SAR images helps to suppress speckle noise and the energy differences of different land covers, this embodiment takes the incoherent sum of the intensities of the double-voyage SAR images as the water area extraction statistic:

$$S = \frac{1}{N_s}\sum_{n \in N_s}\left(\left|x_n^{(1)}\right|^{2} + \left|x_n^{(2)}\right|^{2}\right)$$

where $x_n^{(1)}$ and $x_n^{(2)}$ are the corresponding pixel values of the SAR image pair in the selected region, and a mean filter over the $N_s$ neighborhood pixels of the selected region is used to suppress speckle noise in the estimation.
S22: and obtaining a global threshold value of the water area extraction statistic based on an OTSU method and carrying out water area extraction on the difference image.
Specifically, after the water area extraction statistic $S$ is obtained, the global threshold $\tau_s$ is obtained by the OTSU method, and whether the current pixel $m$ belongs to a water area is judged by the criterion:

$$m \in \begin{cases}\text{water area}, & S_m < \tau_s\\ \text{non-water area}, & S_m \geq \tau_s\end{cases}$$

where $S_m$ denotes the water area extraction statistic of the current pixel $m$ of the difference image.
The OTSU method (maximum between-class variance method), proposed in 1979 by the Japanese scholar Nobuyuki Otsu, is an adaptive threshold determination method, also known as the Otsu method.
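A corresponding sketch for S21 and S22 (the 3 × 3 filter size and the convention that water lies below the threshold are our assumptions, following the analysis above):

```python
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.filters import threshold_otsu


def water_mask(x1: np.ndarray, x2: np.ndarray, win: int = 3) -> np.ndarray:
    """Water mask from the mean-filtered incoherent intensity sum."""
    s = uniform_filter(np.abs(x1) ** 2 + np.abs(x2) ** 2, size=win)
    tau_s = threshold_otsu(s)      # global maximum between-class variance threshold
    return s < tau_s               # water returns less energy than land cover
```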
S23: subtracting the intensities of the image pair to obtain the vegetation extraction statistic.
Specifically, when the thermal noise levels of the double-voyage SAR images are close to each other, the overall observed correlation coefficient can be expressed as:

$$\rho_{total} = \rho_{spatial}\cdot\rho_{thermal}\cdot\rho_{temporal}$$

where $\rho_{thermal}$ and $\rho_{spatial}$ represent the thermal noise and spatial baseline decorrelation coefficients, respectively, and $\rho_{temporal}$ is the temporal decorrelation coefficient, produced by physical changes of the surface of the imaged area during the observation period (i.e., the time interval between the two repeated passes). It can be expressed as:

$$\rho_{temporal} = \exp\left\{-\frac{1}{2}\left(\frac{4\pi}{\lambda}\right)^{2}\left(\Delta y^{2}\sin^{2}\theta + \Delta z^{2}\cos^{2}\theta\right)\right\}$$

where $\theta$ is the nominal angle of incidence and $\Delta y$ and $\Delta z$ are the position changes of the scatterers in the ground-range and height directions, respectively. For a millimeter-wave SAR system, even a weak scatterer offset produces a low temporal correlation coefficient $\rho_{temporal}$, and hence a low observed correlation coefficient $\rho_{total}$; random vegetation motion therefore produces low correlation coefficients in the difference image, so the removal of false alarms caused by vegetation areas is necessary.
The motion of vegetation not only affects the correlation of the observations but also causes the backscattering coefficient to change over time. Detailed ground measurements show that the scattering coefficient varies rapidly with the grazing angle, which means that the intensity of a vegetation area differs significantly between the reference image and the task image (i.e., the two SAR images obtained by flying over the same area repeatedly). Unlike vegetation areas, trace areas are less susceptible to natural factors such as wind, so the intensity difference between the trace areas of the image pair is small compared with vegetation areas. Based on this analysis, the normalized intensity difference of the image pair is used as the vegetation extraction statistic to identify vegetation areas:

$$D = \frac{\left|\sum_{n \in N_d}\left(\left|x_n^{(1)}\right|^{2} - \left|x_n^{(2)}\right|^{2}\right)\right|}{\sum_{n \in N_d}\left(\left|x_n^{(1)}\right|^{2} + \left|x_n^{(2)}\right|^{2}\right)}$$

where $x_n^{(1)}$ and $x_n^{(2)}$ are the corresponding pixel values of the selected region of the SAR image pair, and a mean filter over the $N_d$ neighborhood pixels of the selected region is used to suppress speckle noise in the estimation.
S24: obtaining the threshold of the vegetation extraction statistic based on manual threshold determination and extracting the vegetation area of the difference image.
Specifically, a pixel $m$ of the image pair is considered to belong to the vegetation area if the following condition is satisfied:

$$D_m \geq \tau_D \quad \text{and} \quad \hat{\alpha}_m \leq \tau_\alpha$$

where $\tau_D$ and $\tau_\alpha$ are the thresholds of the vegetation identification statistic $D$ and of the CRCD estimate $\hat{\alpha}$, respectively, both of which need to be determined manually; $D_m$ denotes the vegetation extraction statistic corresponding to pixel $m$, and $\hat{\alpha}_m$ denotes the CRCD estimate corresponding to pixel $m$. The criterion for determining $\tau_D$ is to distinguish the vegetation area from other areas as well as possible; the criterion for determining $\tau_\alpha$ is to assign as few trace pixels as possible to the vegetation area.
S25: and constructing a false alarm eliminating image for eliminating the water area and the vegetation area according to the difference image.
Specifically, the pixel values of the water area and the vegetation area in the obtained difference image are set to be 1, which represents the unchanged pixel area, and the false alarm elimination image is obtained.
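S25 then reduces to overwriting the masked pixels (a minimal sketch continuing the array conventions above):

```python
import numpy as np


def false_alarm_elimination(diff: np.ndarray, water: np.ndarray,
                            veg: np.ndarray) -> np.ndarray:
    """Set water and vegetation pixels of the difference image to 1
    (the 'unchanged' value), yielding the false alarm elimination image."""
    out = diff.copy()
    out[water | veg] = 1.0
    return out
```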
S3: training a CUnet network by using inductive transfer learning and coarse-to-fine images;
referring to fig. 4, fig. 4 is a flowchart of training the CUnet using inductive migration learning and coarse-to-fine images according to an embodiment of the present invention.
In this embodiment, the S3 includes:
s31: and based on a channel parallel method, connecting the original SAR image, the difference image and the false alarm elimination image in parallel according to channels to obtain coarse-to-fine images.
Specifically, the intensity image of the original SAR image containing traces is first taken as a "rough image" I1It reflects rich land cover information of the observation area. The difference image generated by the CRCD estimator is then taken as an "intermediate image" I2It reflects the changes in the double-pass SAR image caused by traces and natural phenomena. The false alarm elimination image is then treated as a "fine image" I3It enhances the footprint area while mitigating false alarms caused by natural phenomena. Finally, the graph I1、I2And I3Connected into a picture according to the dimension. The combined image is named as a Coarse-To-Fine (CTF) image.
Unlike each individual image, the CTF image not only includes a variety of land cover information, but also weakens the false alarm areas and strengthens the trace areas. Compared with a single image, the land cover areas and the trace area of the CTF image can be easily extracted. The CTF image builds a bridge between the source task and the target task in transfer learning, and can remarkably improve the trace detection performance.
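The channel concatenation of S31 is a one-line stack (a sketch; any per-channel normalization, which the patent does not specify, would precede it):

```python
import numpy as np


def ctf_image(coarse: np.ndarray, intermediate: np.ndarray,
              fine: np.ndarray) -> np.ndarray:
    """Stack the coarse (intensity), intermediate (CRCD difference) and
    fine (false-alarm-suppressed) maps into one 3-channel CTF image."""
    return np.stack([coarse, intermediate, fine], axis=-1)   # H x W x 3
```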
S32: acquiring the CUnet network, wherein the number of weights of the CUnet network is smaller than that of the Unet network.
The Unet network structure is widely applied to the image segmentation of optical images, hyperspectral images and SAR images; its main characteristics are that the encoder can extract the important features of the input image, and the skip-connection-based decoder can effectively upsample the feature maps of different layers to obtain an accurate segmentation result. For difference image analysis, however, the Unet network has too many weight parameters to train and too many pooling layers. To address the small number and small size of trace samples, this embodiment uses a compressed Unet (CUnet) structure.
Referring to fig. 5, fig. 5 is a schematic structural diagram of the CUnet network according to an embodiment of the present invention. The encoder and decoder of the CUnet network each comprise four convolution modules, each having two convolution (Conv) layers with a convolution kernel size of 3 × 3. Each convolutional layer is followed by an activation layer and a batch normalization (BN) layer. Specifically, LeakyReLU is used as the activation function:

$$f(x_i) = \begin{cases} x_i, & x_i \geq 0 \\ x_i / a_i, & x_i < 0 \end{cases}$$

where $a_i \in (1, +\infty)$ and $x_i$ is the value of neuron $i$. To avoid overfitting, a dropout (Drop) layer is added after the first convolutional layer of each module. The pooling size of the max-pooling (MaxPooling) layers is 2 × 2, which reduces the size of the feature maps while increasing the receptive field. A convolution layer with a kernel size of 1 × 1 is added at the end of the decoder to transform the feature map into class probabilities. The total number of training weights of Unet is 7,846,723, while that of CUnet is 1,947,763. Fewer parameters mean that the CUnet can improve the utilization efficiency of the trace samples and reduce the risk of overfitting.
S33: dividing the coarse-to-fine image based on a transfer learning method to obtain the source domain data and labels and the target data and labels.
In particular, although the construction of CTF images and the CUnet network help to train the recognition model, the small number of trace samples remains an important issue for model training. In this embodiment, transfer learning is used for small-sample learning: since the training samples for trace detection are not sufficient to train an effective CUnet network, a source task with a large number of labeled samples related to the trace data is constructed to facilitate the training of the trace detection task; this learning strategy belongs to transfer learning. The widely used ImageNet, Pascal VOC and COCO data sets are typically used as the source domain. However, SAR images are based on sparse scattering centers and are not as intuitive and sharp as optical images. Under such conditions, brute-force transfer is unsuccessful and may even harm the learning of the trace detection model.
SAR images are typically obtained by a spotlight-mode SAR system with a resolution of a few centimeters to tens of centimeters. High resolution means a large imaged scene and a large image size. The trace area occupies only a small part of the SAR image; most of it consists of various land cover areas. Since the trace areas and the land areas share the same imaging conditions and imaging algorithm, the correlation between them is far stronger than the correlation between SAR images and optical images. Based on these analyses, a transfer learning training mode of the CUnet is proposed that uses the land cover in the CTF image as the source domain data, comprising a pre-training phase and a fine-tuning phase. The details are as follows:
(1) The regions of the CTF image without trace pixels are sliced into 128 × 128-pixel blocks to create the source domain data $\{x_i^S\}$; each pixel is judged as a water, vegetation or background pixel according to the unsupervised multivariate statistical method or manual marking to build the source domain labels $\{y_i^S\}$.
(2) The trace areas in the CTF image are sliced into 128 × 128-pixel blocks to establish the target data $\{x_i^T\}$, and the target labels $\{y_i^T\}$ are built accordingly, where $y_i^T = 0$ or $1$ denotes a trace pixel area or a background pixel area, respectively.
S34: and pre-training the CUnet network by utilizing a source domain task.
It should be noted that transfer learning is divided into a source domain task and a target task. The source domain task performs network training by using the source domain data and the source domain labels; the network parameters obtained by the source domain task training are then migrated to the network of the target task. The target task performs network training by using the target data and the target labels.
Specifically, a prediction function $f_S(\cdot)$ of the source domain task is established by using the CUnet network, and the CUnet is trained by using the training samples $\{x_i^S, y_i^S\}$ in the source domain data. This step is called "pre-training", and yields the trained source domain task prediction function $f_S(\cdot)$.
S35: fine-tuning the CUnet network by using the target task to obtain the trained CUnet network.
Specifically, a prediction function $f_T(\cdot)$ of the trace detection task is established by using the CUnet network; the weights of $f_T(\cdot)$ are initialized with the weights of the trained source domain task prediction function $f_S(\cdot)$; and $f_T(\cdot)$ is trained by using the target data and the target labels. This step is called "fine-tuning", and yields the trained CUnet network.
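The pre-training/fine-tuning handover can be sketched as follows (the Adam optimizer, batch size 32 and 60 iterations follow the experiments below; the learning rate and checkpoint format are our assumptions):

```python
import torch
import torch.nn as nn


def finetune(cunet: nn.Module, source_ckpt: str, target_loader,
             epochs: int = 60, lr: float = 1e-4) -> nn.Module:
    """Initialize f_T with the weights of the pre-trained f_S, then
    fine-tune on the small trace data set with cross-entropy loss."""
    cunet.load_state_dict(torch.load(source_ckpt))   # transfer f_S weights
    optimizer = torch.optim.Adam(cunet.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    cunet.train()
    for _ in range(epochs):
        for blocks, labels in target_loader:         # 128 x 128 CTF blocks
            optimizer.zero_grad()
            loss = loss_fn(cunet(blocks), labels)
            loss.backward()
            optimizer.step()
    return cunet
```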
S4: and performing trace identification on the image to be processed by utilizing the trained CUnet network.
Specifically, the trained CUnet network model is obtained through the above inductive transfer learning. Given an input image block, the model automatically recognizes each pixel area of the block as a trace area or a background area. In the above steps, the CTF image builds a bridge between the source domain task of land cover recognition and the target task of trace detection, and the strong correlation between the source domain task and the target task effectively promotes the learning of trace detection under small-sample conditions.
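A sketch of the resulting inference over a full image (the tiling and border handling are our simplifications; the two-class NCHW output convention is an assumption):

```python
import numpy as np
import torch


def detect_traces(cunet, ctf: np.ndarray, size: int = 128) -> np.ndarray:
    """Tile an H x W x C CTF image into blocks, run the trained CUnet on
    each block, and stitch the per-pixel trace/background decisions back
    together (border handling is deliberately kept simple here)."""
    h, w, _ = ctf.shape
    out = np.zeros((h, w), dtype=np.uint8)
    cunet.eval()
    with torch.no_grad():
        for r in range(0, h - size + 1, size):
            for c in range(0, w - size + 1, size):
                block = torch.from_numpy(ctf[r:r + size, c:c + size]).float()
                block = block.permute(2, 0, 1).unsqueeze(0)    # to NCHW
                classes = cunet(block).argmax(dim=1)           # 1 x size x size
                out[r:r + size, c:c + size] = classes.squeeze(0).numpy()
    return out
```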
The effect of the double-voyage SAR image trace detection method based on multivariate statistics and deep learning in the embodiment of the invention can be verified through the following experimental data.
The experimental conditions are as follows:
referring to fig. 6, fig. 6 is a graph of a double-voyage SAR image and a trace tag thereof according to an embodiment of the present invention, where the double-voyage SAR image includes two 4096 × 4096 pixels, which are obtained by the SAR system in a beamforming mode, HH polarization, and Ka bandwidth. The time interval between the two image acquisitions was 4 hours, the range resolution was 0.13m, and the azimuth resolution was 0.21 m. In the obtained SAR images, there are various land coverage areas, such as water areas, vegetation areas, and other background areas. A car travels from the top to the bottom of the upper left corner area. While leaving some footprint in this area. The generated task area and the mark data of the trace area are shown in fig. 6(b) and fig. 6(c), respectively, where white pixels mean trace areas and black pixels mean background areas.
Description of experimental evaluation criteria:
the undetected trace pixels are designated as False Negative (FN), the falsely detected background as (FP), the correctly detected trace pixels as True (TP), and the correctly undetected background as True Negative (TN). Some evaluation criteria are as follows:
PFP=FP/(TN+FP)×100%
PFN=FN/(TP+FN)×100%
POE=(FP+FN)/(TP+FP+TN+FN)×100%
PCC=(TP+TN)/(TP+FP+TN+FN)×100%
wherein, POEIndicating the proportion of total errors, PCCIndicating the proportion of correct recognition. PFP、PFN、POEAnd PCCThe value range is [0, 1]],PFP、PFN、POEThe smaller the value of (c), the lower the detection false alarm rate. PCCThe larger the value, the higher the detection accuracy. The Kappa coefficient is calculated as follows:
Kappa=(PCC-PRE)/(1-PRE)×100%
wherein the content of the first and second substances,
Figure RE-GDA0003127409630000121
the Kappa value range is usually [0, 1], the Kappa coefficient is a common method for measuring consistency among evaluators, and the detection precision is higher when the value of Kappa is larger.
Analysis of the experimental results:
CRCD results and analysis:
the CRCD estimator described in this embodiment is used
Figure RE-GDA0003127409630000122
To produce a difference image with a low false alarm rate and a high detection rate. Calculating thermal noise in CRCD equation Using shaded areas of FIGS. 6(a) and 6(b)
Figure RE-GDA0003127409630000123
And
Figure RE-GDA0003127409630000124
referring to fig. 7, fig. 7 shows a diagram of different detection estimators according to an embodiment of the present inventionAnd (5) a result chart. FIG. 7(a) and FIG. 7(b) are CCD estimators, respectively
Figure RE-GDA0003127409630000125
And CRCD estimator
Figure RE-GDA0003127409630000126
As a result, the adjacent pixel size is 9 × 9. The brighter the pixel, the higher the likelihood of being identified as an unchanged pixel, and the darker the pixel, the higher the likelihood of being identified as a changed pixel. It can be seen that next to the trace pixels of the changes of fig. 7(a) and 7(b), there are many black pixels. Vegetation areas, waters and other low correlation pixel areas cause a severe false alarm rate if these images are directly identified as trail areas or background areas based on pixel values.
To compare the CCD estimator $\hat{\gamma}$ and the CRCD estimator $\hat{\alpha}$ over the trace region, the entire trace area is cut out from FIGS. 7(a) and 7(b), as shown in FIGS. 7(c) and 7(d), respectively.
Referring to fig. 8, fig. 8 is a graph illustrating ROC curves under different window sizes and decision thresholds according to an embodiment of the present invention, where the abscissa is a false alarm rate and the ordinate is a detection accuracy rate. As can be seen from the Receiver Operating Characteristic (ROC) graph, when the false alarm rates are the same, the accuracy of both the CCD estimator (indicated by Coherence in fig. 8) and the CRCD estimator increases due to the increase of the window size. In particular, the best detection accuracy can be achieved using 9 × 9 windows and 13 × 13 windows. Since an excessive size would result in blurred boundaries of the trace area and a large computational load, a 9 × 9 window was chosen instead of a 13 × 13 window to generate the difference image. The CRCD estimator has a higher accuracy than the CCD estimator at the same false alarm rate. At the same time, the CRCD estimator has a lower false alarm rate than the CCD at the same accuracy. Therefore, the CRCD method can obtain a difference image with a low false alarm rate and high detection accuracy.
Water area detection result and analysis:
referring to fig. 9, a water area detection result chart of different methods according to an embodiment of the present invention is shown. Wherein, fig. 9(a) - (c) are water area detection results obtained by MRF method respectively using reference image intensity, task image intensity and image pair intensity and statistics, fig. 9(d) - (f) are water area detection results obtained by LevelSet method respectively using reference image intensity, task image intensity and image pair intensity and statistics, fig. 9(g) - (i) are water area detection results obtained by OTSU method respectively using reference image intensity, task image intensity and image pair intensity and statistics, and fig. 9(j) is a real label of the water area in fig. 6 (a). A Markov Random Field (MRF) is a model widely used for image segmentation, and integrates texture, context information, and the like of an image in a prior distribution manner to improve the image segmentation effect. The level set (LevelSet) method is an energy-based image segmentation method, and an expression of a target contour is obtained by solving a minimum energy functional. The OTSU method determines a global detection threshold by using the OTSU method with the intensity sum of the image pair as a statistic. It should be noted that the upper area of fig. 6(a) is a beach covered with seawater, which will rise and fall with time, and therefore the area also changes with time, and this embodiment does not mark this area as a true label. Fig. 9(a) - (c) show that a large number of terrestrial pixels are erroneously identified as waters by the MRF method. Fig. 9(d) - (f) show that although most of the water area is identified correctly, there are still a large number of isolated false alarm pixels. Fig. 9(g) - (i) show that the false alarm pixels are greatly reduced by the OTSU method. In particular, the image pair intensity sum as a statistic helps suppress speckle noise in the reference image and the task image. Please refer to table 1, wherein table 1 shows the quantitative results of different water area detection methods. The method provided by the embodiment realizes the highest PCCAnd the highest Kappa. At the same time, the method of the present embodiment takes only 0.2s for the image pair processing of 4096 × 4096 pixels.
TABLE 1 quantitative results of different water area detection methods
Vegetation detection results and analysis:
referring to fig. 10, fig. 10 is a diagram of vegetation detection results of different methods according to an embodiment of the present invention, where fig. 10(a) is a detection result of an unsupervised MRF method and uses a task image as an input, fig. 10(b) is a detection result of a genetic algorithm (abbreviated as GA-FCM) of a fuzzy C-means cluster and uses a task image as an input, fig. 10(C) is a detection result of a method according to an embodiment of the present invention and uses an intensity difference of an image as an input, a window size of a mean filter is 3 × 3, and a statistic D and an estimator are determined through previous experiments
Figure RE-GDA0003127409630000142
Threshold τ ofDIs 0.73, ταIs 0.35; fig. 10(d) shows a true label of the vegetation region. The GA-FCM method is a method for analyzing and modeling important data by using a fuzzy theory, establishes uncertainty description of sample categories, can objectively reflect the real world, is effectively applied to the field of image segmentation, optimizes clustering results by using a genetic algorithm, and can obtain satisfactory results in time performance and clustering quality. It can be seen that the MRF and GA-CFM methods can extract most of the vegetation area from the task image. However, there are a large number of false alarms caused by shadow and building areas in fig. 10(a) and (b). Compared with the methods, the method provided by the embodiment of the invention can eliminate most false alarms and extract most vegetation areas at the same time.
Table 2 shows the quantitative results of the different vegetation detection methods. As shown in Table 2, the proposed method has a low global error and high accuracy compared with the MRF and GA-FCM methods. In addition, the method has a very fast inference speed: processing the 4096 × 4096-pixel image pair takes 0.68 s. Based on this analysis, the method can effectively extract the vegetation area by using the intensity difference of the image pair and the low-correlation characteristic of the vegetation area.
TABLE 2 Quantitative results of different vegetation detection methods
And (3) false alarm elimination result and analysis of the difference image:
referring to fig. 11, fig. 11 is a diagram illustrating a false alarm removal result according to an embodiment of the present invention, where fig. 11(a) is a diagram illustrating a CRCD result without false alarm removal, fig. 11(b) is a diagram illustrating a CRCD result with false alarm removal, and fig. 11(b) removes most of the low correlation pixels in fig. 11(a), so that the trace area can be easily identified from the background. It is noted that although these false alarm regions have been removed, the other low correlation regions in fig. 11(b) still affect the trace detection performance.
CUnet inductive transfer learning results and analysis:
Referring to fig. 12, fig. 12 is a CTF result graph according to an embodiment of the present invention. The CTF image is obtained by concatenating, along the channel dimension, the original trace-containing SAR image of fig. 6(b), the difference image of fig. 7(b) and the false alarm removal image of fig. 11(b). As shown in fig. 12(a), trace pixels can easily be distinguished from the other regions, which helps to train a robust neural network. The pre-recognition results of the multivariate statistical method are shown in fig. 12(b) and fig. 12(c), where the white pixels in fig. 12(b) represent water areas and the white pixels in fig. 12(c) represent vegetation areas. For the inductive transfer learning strategy, the pixel regions outside the white frame region in fig. 12(a) are used as the source domain data, with the corresponding labels in fig. 12(b) and fig. 12(c) as the source labels, to perform the pre-training of the network. The fine-tuning of the network is performed with the pixel region of the white frame in fig. 12(a) as the target data and the corresponding labels in fig. 12(b) and fig. 12(c) as the target labels.
For the pre-training step, the source domain data of the CTF image is first cut into 128 × 128 pixel blocks with a step size of 112 pixels. The number of image blocks of the source domain data is 1330. Here, 80% was selected as the training sample and 20% as the test sample. The cross-entropy loss of the CUnet is optimized using the Stochastic Gradient Descent (SGD) method. The batch size was 32 and the number of iterations was 50. The learning rate is initialized to 0.001 and the momentum parameter of SGD is 0.9. The weight of the CUnet uses a random initialization method. During the training of the model, no data augmentation strategy is used. Referring to fig. 13, fig. 13 is a diagram of the variation of the precision rate of the compressed Unet pre-training process and the fine-tuning process according to the embodiment of the present invention. As can be seen from fig. 13(a), when the source domain data set is used as the training image and the unsupervised recognition result is used as the training label, the CUnet has a very stable convergence process.
For the fine-tuning step, the white frame region of FIG. 12(a) is cut into 128 × 128-pixel blocks with a step size of 64 pixels. Unlike the cutting step size of the source domain data, a smaller cutting step size yields more training and testing samples. To evaluate the detection result over the whole image, the pixel area below the dotted line of the white frame region is used as training samples, and the pixel area above the dotted line as test samples. In addition, some pixel blocks beside the white frame area are also selected as negative examples to improve the robustness of the method described in this embodiment. With the above strategy, 138 image blocks (49 of which contain trace areas) are obtained. Unlike the random initialization of the CUnet in the source domain task, the weights of the CUnet for trace detection are initialized with the weights trained on the source domain task. The batch size is 32 and the number of iterations is 60. The Adam method is used to optimize the cross-entropy loss function. FIG. 13(b) is the training curve of the fine-tuning process; it shows that inductive transfer learning based on land cover pre-recognition promotes the learning of the small-sample trace detection model.
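The block slicing used in both stages is plain strided cropping (a sketch; stride 112 pixels for the source domain data and 64 pixels for the trace data):

```python
import numpy as np


def slice_blocks(img: np.ndarray, size: int = 128, stride: int = 112) -> list:
    """Cut an H x W x C image into size x size blocks with a given stride."""
    blocks = []
    for r in range(0, img.shape[0] - size + 1, stride):
        for c in range(0, img.shape[1] - size + 1, stride):
            blocks.append(img[r:r + size, c:c + size])
    return blocks
```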
Referring to fig. 14, fig. 14 is a feature diagram of an intermediate process of a test sample according to an embodiment of the present invention, where a test sample is input as shown in fig. 14(a), and feature diagrams extracted through a trained CUnet network are shown in fig. 14(b) and fig. 14 (c). Fig. 14(b) shows that the CUnet network can extract important and abstract features from the footprint area, and fig. 14(c) shows that the decoder of the CUnet network can discriminate between background pixels and footprint pixels with high efficiency using the extracted features.
Comparison and analysis of different detection methods:
in order to confirm the efficiency of the CUnet network proposed by the embodiment of the present invention, some unsupervised detection models and supervised detection models are used as comparison methods. For unsupervised based methods, level set, OTSU, GA-FCM were used for trace detection of CTF images. These methods take the entire CTF image as input and use different features to detect objects of interest, such as intensity features, contour features. For the supervised deep learning method, each pixel of the CTF image is sliced into image blocks using a sliding window based DBN (depth confidence network), CNN (convolutional neural network), and whether the pixel area is a trace area or a background area is identified, the size of the sliced image block is 45 × 45 pixels. For each hidden layer in the DBN, the entire training set is used for pre-training 40 times, and a 300-250-100-2 network is used. In CNN, 3 convolution layers with convolution kernel 3 x 3 and 2 maximum pooling layers with kernel size 2 x 2 are used. For the full convolutional neural network, light weight SegNet and Unet were used as comparison methods. These deep learning methods use the same training set.
Referring to fig. 15, fig. 15 shows the trace detection results of the different methods according to an embodiment of the present invention, where fig. 15(a) is the result of the LevelSet method, fig. 15(b) of the OTSU method, fig. 15(c) of the GA-FCM method, fig. 15(d) of the DBN method, fig. 15(e) of the CNN method, fig. 15(f) of the SegNet method, fig. 15(g) of the Unet method, and fig. 15(h) of the CUnet method. The DBN is a classical neural network structure that solves the optimization problem of deep neural networks by layer-by-layer training, after which the network reaches a good solution through fine-tuning only. SegNet and Unet are convolutional networks commonly used for image segmentation, whose specific implementations differ with the application scenario. As shown in FIGS. 15(a)-(c), although the unsupervised methods easily extract a large number of trace pixels, their detection results are contaminated by noise points and complex background areas, mainly because these unsupervised methods cannot extract effective features to distinguish trace from background pixels. In contrast, the results of FIGS. 15(d)-(h) demonstrate that the supervised deep learning methods suppress the false alarms caused by background regions very well. Among these methods, the CNN, Unet and the CUnet network of the embodiment of the present invention maintain clearer trace contours, while the CUnet network has the detection result with the fewest false alarms. Table 3 shows the trace detection quantification results of the different methods. The $P_{CC}$ of the method provided by the embodiment of the present invention is 99.76%, higher than that of the other methods, and its $P_{FN}$ is 40.34%, lower than that of the other methods. Taking Kappa as the overall evaluation criterion, the method reaches 70.50%, a large improvement over the other methods. In addition, the CUnet of the embodiment of the present invention has a fast inference speed: inference on a 4096 × 4096-pixel image takes only 8.23 s.
TABLE 3 quantitative results of trace detection by different methods
Comparison results and analysis of different pre-training strategies:
to verify the effect of unsupervised pre-training models on trace detection, the CUnet network was pre-trained using two other methods. On one hand, the CTF image is still used as the input image of the source task, and unlike the method provided by the embodiment of the present invention, the land cover label, including the water area, the vegetation area and the background category, is manually labeled as the real label of the source task. On the other hand, the proposed CUnet network was pre-trained using the Pascal VOC semantic segmentation task. The detection model is initialized by using the CUnet network model trained by the pre-training strategies to obtain three corresponding task detection models, and in addition, the detection model which is not pre-trained but initializes the weight is also used here. Please refer to table 4, table 4 shows the trace detection results of different pre-training methods. It can be seen that it is not pre-trainedThe model has a lowest recognition accuracy and Kappa value. After VOC data pre-training, the Kappa coefficient is obviously improved, and the global error of detection is reduced. And VOC data is replaced by land coverage pixels of the CTF image, and the detection precision is higher by adopting the supervised pre-training and unsupervised pre-training methods. Notably, unsupervised pre-trained models achieve the highest P by detecting trace pixelsCC(99.76%) and the highest Kappa (70.50%), which confirms the effectiveness of unsupervised pre-trained inductive transfer learning based on CTF images.
TABLE 4 Trace detection results of different Pre-training methods
Comparison and analysis of different numbers of training samples:
The CUnet network and inductive transfer learning are used to overcome the problem of the small number of trace-region samples in CTF images. To verify the adaptability of the method, this embodiment randomly selects different numbers of samples from the training set to fine-tune the CUnet network. Referring to fig. 16, fig. 16 illustrates the detection results for different numbers of training samples according to the embodiment of the present invention. The variation of Kappa and P_OE with the ratio of selected samples to the total number of trace training samples is shown in figs. 16(a) and (b). As the number of training samples increases, Kappa rises continuously and P_OE falls continuously. Therefore, the CUnet network and the inductive transfer learning model provided by this embodiment adapt well to the number of training samples, and the more training samples there are, the better the detection result.
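A brief sketch of how such a sample-ratio sweep might be run; the `fine_tune` and `evaluate` helpers and the ratio grid are hypothetical.

```python
import random

# Assumed helpers: fine_tune(samples) returns a fine-tuned model,
# and evaluate(model) returns (kappa, p_oe) on a fixed test set.
def sample_ratio_sweep(train_samples, fine_tune, evaluate,
                       ratios=(0.2, 0.4, 0.6, 0.8, 1.0), seed=0):
    rng = random.Random(seed)
    results = []
    for r in ratios:
        k = max(1, int(r * len(train_samples)))
        subset = rng.sample(train_samples, k)  # random selection, as in the text
        kappa, p_oe = evaluate(fine_tune(subset))
        results.append((r, kappa, p_oe))       # expect Kappa up, P_OE down with r
    return results
```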
Comparison and analysis of different data augmentation strategies:
Data augmentation, such as image rotation and flipping, is a fundamental step in deep learning, as it provides more training samples and in most cases improves performance. This embodiment also uses different strategies to augment the trace samples, including random rotation by 0° to 30°, random scaling by 0.8 to 1.2, brightness adjustment by 0.8 to 1.2, and random flipping. The number of augmented samples is 0.5 times, 1 time, and 2 times the number of original samples. Please refer to table 5, which shows the detection results of the different data augmentation strategies. It can be seen that these augmentation strategies have a negative impact on every metric, in particular Kappa, which decays significantly compared with the unaugmented samples. The more samples are added, the higher the false alarms and errors, so these widely used data augmentation methods cannot be applied directly to the trace detection task. Because the number of samples in the trace detection task is small, augmenting them causes severe overfitting to the test samples. At the same time, the augmented samples differ from real double-voyage SAR images, and such unreasonable augmentation disturbs the learning process of the method.
TABLE 5 Detection results of different data augmentation strategies
[Table 5 appears only as an image in the original publication.]
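For illustration, the augmentation operations named above could be composed with torchvision as sketched below; the exact composition and parameterization are our assumptions, and, per Table 5, such augmentation actually degraded the trace detection results.

```python
import torchvision.transforms as T

# The operations named in the text; parameters match the reported ranges.
augment = T.Compose([
    T.RandomRotation(degrees=(0, 30)),            # random rotation 0°-30°
    T.RandomAffine(degrees=0, scale=(0.8, 1.2)),  # random scaling 0.8-1.2
    T.ColorJitter(brightness=(0.8, 1.2)),         # brightness adjustment 0.8-1.2
    T.RandomHorizontalFlip(p=0.5),                # random flip
])

# e.g. expanded = [augment(img) for img in trace_samples for _ in range(2)]
# would double the sample count, the 2x setting whose results Table 5 reports.
```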
In summary, the embodiment of the present invention provides a double-voyage SAR image trace detection method based on multivariate statistics and deep learning. A difference image of the SAR image pair is obtained by a complex reflection change detection estimator; the water area and the vegetation area are identified by unsupervised multivariate statistics to obtain a false alarm elimination image; a coarse-to-fine (CTF) image is constructed by combining the original image, the difference image and the false alarm elimination image; and inductive transfer learning is carried out with the CTF image and the CUnet network. Double-voyage SAR image trace detection is thereby realized under small-sample conditions, with a good detection effect.
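A high-level, illustrative sketch of the summarized pipeline; every callable here is a hypothetical placeholder for the corresponding step S1-S4, not code disclosed by the patent.

```python
import numpy as np

def detect_traces(x1, x2, crcd_difference, eliminate_false_alarms, cunet):
    """Illustrative end-to-end flow (all callables are assumed interfaces).

    x1, x2 : co-registered complex SAR images from the two passes.
    """
    diff = crcd_difference(x1, x2)              # S1: CRCD difference image
    fae = eliminate_false_alarms(x1, x2, diff)  # S2: water/vegetation false alarms removed
    ctf = np.stack([np.abs(x1), diff, fae])     # S3: channel-wise coarse-to-fine image
    return cunet(ctf)                           # S4: trace segmentation by trained CUnet
```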
The foregoing is a detailed description of the invention in connection with specific preferred embodiments, and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the concept of the invention, and all of these shall be considered to fall within the protection scope of the invention.

Claims (10)

1. A double-voyage SAR image trace detection method based on multivariate statistics and deep learning is characterized by comprising the following steps:
S1: obtaining a difference image of the double-voyage SAR image by using a complex reflection change detection estimator;
S2: identifying the water area and the vegetation area of the difference image and eliminating false alarms by using an unsupervised multivariate statistical method to obtain a false alarm elimination image;
S3: training a CUnet network by using inductive transfer learning and coarse-to-fine (CTF) images;
S4: performing trace identification on the image to be processed by using the trained CUnet network.
2. The multivariate statistic and deep learning-based double-voyage SAR image trace detection method according to claim 1, wherein the S1 comprises:
S11: constructing a mathematical model of the double-voyage SAR images obtained at the same geometric position at different moments;
S12: obtaining a complex reflection change detection estimator according to the mathematical model;
S13: processing the double-voyage SAR image by using the complex reflection change detection estimator to obtain the difference image.
3. The multivariate statistic and deep learning based double-voyage SAR image trace detection method according to claim 2, characterized in that the expression of the complex reflection change detection estimator is:
$$\hat{\gamma}=\frac{\left|\sum_{k=1}^{N} x_{1,k}\, x_{2,k}^{*}\right|}{\sqrt{\left(\sum_{k=1}^{N}\left|x_{1,k}\right|^{2}-N \sigma_{n1}^{2}\right)\left(\sum_{k=1}^{N}\left|x_{2,k}\right|^{2}-N \sigma_{n2}^{2}\right)}}$$

wherein $x_{1,k}$ and $x_{2,k}$ respectively represent the k-th complex data of images $X_1$ and $X_2$, $(\cdot)^{*}$ represents the conjugate operation on complex data, $N$ is the number of neighborhood pixels, and $\sigma_{n1}$ and $\sigma_{n2}$ are respectively the additive system thermal noise estimates of images $X_1$ and $X_2$.
4. The multivariate statistic and deep learning-based double-voyage SAR image trace detection method according to claim 1, wherein the S2 comprises:
S21: performing intensity superposition on the image pair consisting of the double-voyage SAR images to obtain a water area extraction statistic;
S22: obtaining a global threshold of the water area extraction statistic based on the OTSU method and extracting the water area of the difference image;
S23: subtracting the intensities of the image pair to obtain a vegetation extraction statistic;
S24: obtaining a threshold of the vegetation extraction statistic based on a manually determined threshold and extracting the vegetation area of the difference image;
S25: constructing, from the difference image, a false alarm elimination image in which the water area and the vegetation area are eliminated.
5. The multivariate statistic and deep learning-based double-voyage SAR image trace detection method according to claim 4, wherein the S22 comprises:
after the water area extraction statistic is obtained, the global threshold $\tau_s$ is obtained by the OTSU method, and whether the current pixel m belongs to the water area is judged according to the criterion:

$$m \in \text{water area} \quad \text{if } S_m \leq \tau_s$$

wherein $S_m$ represents the water area extraction statistic of the current pixel m of the difference image.
6. The multivariate statistic and deep learning-based double-voyage SAR image trace detection method according to claim 4, wherein the S24 comprises:
for each pixel m in the image pair, judging whether the pixel m satisfies the condition

$$D_m < \tau_D \quad \text{and} \quad \hat{\gamma}_m < \tau_\alpha$$

and if yes, the pixel m belongs to the vegetation area, wherein $\tau_D$ and $\tau_\alpha$ are respectively the thresholds of the statistic $D$ used for vegetation area identification and of the estimate $\hat{\gamma}$ of the CRCD estimator, $D_m$ represents the vegetation extraction statistic corresponding to the pixel m, and $\hat{\gamma}_m$ represents the estimate corresponding to the pixel m.
7. The multivariate statistic and deep learning-based double-voyage SAR image trace detection method according to claim 1, wherein the S3 comprises:
S31: concatenating the original SAR image, the difference image and the false alarm elimination image along the channel dimension to obtain a coarse-to-fine (CTF) image;
S32: acquiring a CUnet network;
S33: dividing the CTF image based on a transfer learning method to obtain source labels and target labels;
S34: pre-training the CUnet network by using the source domain task;
S35: fine-tuning the CUnet network by using the target task to obtain the trained CUnet network.
8. The multivariate statistic and deep learning-based double-voyage SAR image trace detection method according to claim 7, wherein the S33 comprises:
s331: slicing the region of the CTF image without trace pixels into a plurality of image blocks, and establishing source domain data
Figure FDA0003020376770000025
Judging each pixel area as water area, vegetation or background pixel according to unsupervised multi-statistic method or artificial marking, and establishing source area label
Figure FDA0003020376770000026
S332: slicing the trace area in the CTF picture into a plurality of image blocks, and establishing target data
Figure FDA0003020376770000031
Judging each pixel area as water area, vegetation or background pixel according to unsupervised multi-statistic method or artificial marking, and establishing target label
Figure FDA0003020376770000032
9. The multivariate statistic and deep learning-based double-voyage SAR image trace detection method according to claim 8, wherein the S34 comprises:
establishing a prediction function $f_S(\cdot)$ of the source domain task using the CUnet network, and pre-training the CUnet network using the source domain data and the source domain labels to obtain a trained source domain task prediction function $f_S(\cdot)$.
10. The double-voyage SAR image trace detection method based on multivariate statistics and deep learning according to claim 9, wherein the S35 comprises:
establishing a prediction function $f_T(\cdot)$ of the trace detection task using the CUnet network; initializing the prediction function $f_T(\cdot)$ of the trace detection task with the weights of the trained source domain task prediction function $f_S(\cdot)$; and training the prediction function $f_T(\cdot)$ using the target data and the target labels to obtain the trained CUnet network.
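To make claims 3 to 6 concrete, the sketch below implements the noise-corrected CRCD estimator and the two unsupervised threshold tests as reconstructed above. The estimator form, the directions of the inequalities and all function names are assumptions made for illustration, not code disclosed by the patent.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.filters import threshold_otsu

def _boxsum(a, win):
    # Neighborhood sum implemented via a mean filter over a win x win window.
    return uniform_filter(a, size=win) * (win * win)

def crcd_estimate(x1, x2, win=5, sig_n1=0.0, sig_n2=0.0):
    """Noise-corrected coherence magnitude (assumed form of the claim-3 estimator)."""
    cross = x1 * np.conj(x2)
    num = np.abs(_boxsum(cross.real, win) + 1j * _boxsum(cross.imag, win))
    n = win * win
    p1 = _boxsum(np.abs(x1) ** 2, win) - n * sig_n1 ** 2
    p2 = _boxsum(np.abs(x2) ** 2, win) - n * sig_n2 ** 2
    return num / np.sqrt(np.maximum(p1 * p2, 1e-12))

def false_alarm_mask(x1, x2, gamma, tau_d, tau_a):
    """Water (claim 5) and vegetation (claim 6) pixels to be eliminated."""
    s = np.abs(x1) ** 2 + np.abs(x2) ** 2            # intensity superposition (S21)
    water = s <= threshold_otsu(s)                   # OTSU threshold; direction assumed
    d = np.abs(np.abs(x1) ** 2 - np.abs(x2) ** 2)    # intensity difference statistic (S23)
    veg = (d < tau_d) & (gamma < tau_a)              # joint condition; directions assumed
    return water | veg
```

Under these assumptions, `crcd_estimate` yields the difference image of step S1, and `false_alarm_mask` marks the pixels removed when constructing the false alarm elimination image of step S2.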
CN202110401084.XA 2021-04-14 2021-04-14 Double-navigation SAR image trace detection method based on multi-element statistics and deep learning Active CN113222898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110401084.XA CN113222898B (en) 2021-04-14 2021-04-14 Double-navigation SAR image trace detection method based on multi-element statistics and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110401084.XA CN113222898B (en) 2021-04-14 2021-04-14 Double-navigation SAR image trace detection method based on multi-element statistics and deep learning

Publications (2)

Publication Number Publication Date
CN113222898A (en) 2021-08-06
CN113222898B (en) 2024-02-09

Family

ID=77087277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110401084.XA Active CN113222898B (en) 2021-04-14 2021-04-14 Double-navigation SAR image trace detection method based on multi-element statistics and deep learning

Country Status (1)

Country Link
CN (1) CN113222898B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908138A (en) * 2010-06-30 2010-12-08 北京航空航天大学 Identification method of image target of synthetic aperture radar based on noise independent component analysis
US9239384B1 (en) * 2014-10-21 2016-01-19 Sandia Corporation Terrain detection and classification using single polarization SAR
WO2019136946A1 (en) * 2018-01-15 2019-07-18 中山大学 Deep learning-based weakly supervised salient object detection method and system
CN111681197A (en) * 2020-06-12 2020-09-18 陕西科技大学 Remote sensing image unsupervised change detection method based on Siamese network structure

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG Yanheng; GAO Lianru; CHEN Zhengchao; ZHANG Bing: "Change detection of high-resolution remote sensing images combining deep learning and superpixels", Journal of Image and Graphics, no. 06 *
LU Wenxing; DAI Yiru; LI Keqing: "Short-term tourism demand forecasting based on a combined model of a back-propagation neural network and a deep belief network (DBN-APSOBP) optimized by a particle swarm algorithm with adaptive inertia weight", Science and Technology for Development, no. 05 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114966693A (en) * 2022-07-20 2022-08-30 南京信息工程大学 Airborne ship target ISAR refined imaging method based on deep learning
CN114966693B (en) * 2022-07-20 2022-11-04 南京信息工程大学 Airborne ship target ISAR refined imaging method based on deep learning
CN117437523A (en) * 2023-12-21 2024-01-23 西安电子科技大学 Weak trace detection method combining SAR CCD and global information capture
CN117437523B (en) * 2023-12-21 2024-03-19 西安电子科技大学 Weak trace detection method combining SAR CCD and global information capture

Also Published As

Publication number Publication date
CN113222898B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN109035152B (en) Non-local mean filtering method for synthetic aperture radar image
Chierchia et al. Multitemporal SAR image despeckling based on block-matching and collaborative filtering
Deng et al. Unsupervised segmentation of synthetic aperture radar sea ice imagery using a novel Markov random field model
Skakun A neural network approach to flood mapping using satellite imagery
CN101727662A (en) SAR image nonlocal mean value speckle filtering method
MX2015000035A (en) Infrared image based early detection of oil spills in water.
CN113222898B (en) Double-navigation SAR image trace detection method based on multi-element statistics and deep learning
Chen et al. Spatial–temporal convolutional gated recurrent unit network for significant wave height estimation from shipborne marine radar data
CN110889843B (en) SAR image ship target detection method based on maximum stable extremal region
CN108830808B (en) On-satellite infrared image stripe noise removing method based on similar line window mean value compensation
Yang et al. Evaluating SAR sea ice image segmentation using edge-preserving region-based MRFs
CN113327231B (en) Hyperspectral abnormal target detection method and system based on space-spectrum combination
Ravanfar et al. Low contrast sperm detection and tracking by watershed algorithm and particle filter
CN113850204A (en) Human body action recognition method based on deep learning and ultra-wideband radar
CN111368653B (en) Low-altitude small target detection method based on R-D graph and deep neural network
Chen et al. A novel scheme for extracting sea surface wind information from rain-contaminated x-band marine radar images
CN117333468B (en) Flood disaster monitoring method for multi-mode time sequence PolSAR image
CN112462367B (en) Vehicle detection method based on polarized synthetic aperture radar
CN111707999B (en) Sea surface floating small target detection method based on combination of multiple features and ensemble learning
Mercier et al. Multiscale oil slick segmentation with Markov chain model
Pelliza et al. Optimal Canny’s parameters regressions for coastal line detection in satellite-based SAR images
Li et al. Multitemporal SAR images change detection based on joint sparse representation of pair dictionaries
Lin et al. Infrared ship target detection based on the combination of Bayesian theory and SVM
CN110211124B (en) Infrared imaging frozen lake detection method based on MobileNet V2
Wang et al. A new ship detection and classification method of spaceborne SAR images under complex scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant